Real World Studies II:
Stream Basins Morphometry.
Now, technically speaking, we are not dealing with perfectly two-dimensional systems here: any drainage basin has vertical relief in its topography, and beyond this there is the general curvature of the earth. Nevertheless, if one begins with relatively small systems, the curvature complication becomes negligible (and in any case may be argued irrelevant on gravitational grounds), and in this instance the topography itself is the measured variable whose pattern in two dimensions is really what we are looking at. (For those who are still not convinced, I did in fact run a parallel set of analyses to the ones described below in which I calculated distances between sample points not from their two-dimensional coordinates, but in three dimensions, treating each sample point as intersecting the earth's surface at x feet above sea level. The results differed only trivially from the strictly two-dimensional ones.) What we are left with is a system composed of areally distributed potential energies; that is, a "field" of varying elevations above sea level, every location in which has a potential for doing work that is directly proportional to its elevation. In the case of the earth's internal zonation described in the last write-up, the four zones had evolved as a function of very large-scale forces exerted rather evenly and consistently over many millions, even billions, of years. Stream basins, however, are dynamic over much shorter periods of time and, of course, are less closed: with changes in sea level due to glaciation episodes or major tectonic events (or even local events such as stream piracy), energy conditions across any basin may change relatively rapidly. Thus equilibrium with the surrounding environment, even if it becomes possible under just the right conditions, may prove fleeting. 
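The parallel three-dimensional check mentioned above amounts to comparing two distance measures between sample points. A minimal sketch, assuming each sample point is held as hypothetical (x, y, z) coordinates in a single unit (feet here), with z the elevation above sea level; the function names are illustrative and are not taken from the original analyses.

```python
import math
from itertools import combinations

def distances(points):
    """Return (d2, d3) for every pair of (x, y, z) sample points.

    Assumes all coordinates share one unit (feet) and that no two points
    occupy the same (x, y) location.
    """
    pairs = []
    for p, q in combinations(points, 2):
        d2 = math.dist(p[:2], q[:2])  # planimetric distance, elevation ignored
        d3 = math.dist(p, q)          # straight-line distance including relief
        pairs.append((d2, d3))
    return pairs

def max_relative_difference(points):
    """Largest relative excess of the 3-D distance over the 2-D distance."""
    return max((d3 - d2) / d2 for d2, d3 in distances(points))

# Hypothetical points: 1000 ft apart horizontally, with 50 ft of relief between them.
pts = [(0.0, 0.0, 400.0), (1000.0, 0.0, 450.0)]
print(max_relative_difference(pts))  # roughly 0.00125, about an eighth of a percent
```

Because relief between neighboring sample points is usually small relative to their horizontal spacing, this ratio stays close to zero, which is consistent with the report that the three-dimensional analyses differed only trivially.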
Because stream basins are so open to such disturbances, one finds few of them in which all portions of the basin are evenly balanced, that is, exactly transitional between erosional and depositional environments. As we will see in the next write-up, the conditions of non-equilibrium within stream basins provide a number of potential secondary tests of the model under discussion here, but for the moment we will concentrate on the primary issue. If the areal distribution of potential energies (elevations) within such systems can indeed be interpreted through the model, we should expect that an efficient classification of the elevations into four maximally different ranges will yield patterns whose class-level spatial autocorrelation properties, again represented as a four-by-four matrix of values, double-standardize into a symmetric arrangement of z scores. This should be true of all natural basins, though again, there is no initial reason to expect that our measures of equilibrium/redundancy (the mean r value of the correlation matrix for the spatial autocorrelation scores, and the total of the absolute values of the column means) should necessarily approach zero as closely as they did for the earth zones model. In this instance there is no a priori separation of zones as in that problem, so the zones must be derived, and what constitutes a proper approach to such derivation is open to discussion. I have begun with what seemed the most rational starting point: taking a regular triangular grid-based sample of elevations across each basin, ordering the resulting vector by height, and then subjecting the data to a nonhierarchical, information statistic-based clustering algorithm to identify its most efficient partitioning into four ranges of elevation values. 
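The test described above hinges on double-standardizing a four-by-four matrix of class-level spatial autocorrelation scores and checking the result for symmetry. The sketch below assumes one common reading of double-standardization (alternately converting rows and then columns to z scores until both hold at once) and takes "mean r" to be the average off-diagonal Pearson correlation among the columns of the raw score matrix. Both readings, and all function names, are assumptions for illustration rather than the author's documented procedure.

```python
from statistics import fmean, pstdev

def _transpose(m):
    return [list(col) for col in zip(*m)]

def _standardize_rows(m):
    # Convert each row to z scores (mean 0, population standard deviation 1).
    out = []
    for row in m:
        mu, sd = fmean(row), pstdev(row)
        out.append([(x - mu) / sd for x in row])
    return out

def double_standardize(m, tol=1e-10, max_iter=10_000):
    """Alternate row and column standardization until the matrix stops changing."""
    z = [[float(x) for x in row] for row in m]
    for _ in range(max_iter):
        prev = z
        z = _standardize_rows(z)                          # rows -> z scores
        z = _transpose(_standardize_rows(_transpose(z)))  # columns -> z scores
        change = max(abs(a - b) for ra, rb in zip(z, prev) for a, b in zip(ra, rb))
        if change < tol:
            return z
    raise RuntimeError("double standardization did not converge")

def symmetry_gap(z):
    """Largest |z[i][j] - z[j][i]|; values near 0 indicate a symmetric arrangement."""
    n = len(z)
    return max(abs(z[i][j] - z[j][i]) for i in range(n) for j in range(n))

def _pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mean_offdiagonal_r(m):
    """Mean Pearson r over all distinct pairs of columns of a score matrix."""
    cols = _transpose(m)
    k = len(cols)
    return fmean(_pearson(cols[i], cols[j]) for i in range(k) for j in range(i + 1, k))

# Hypothetical 4 x 4 matrix of class-by-class spatial autocorrelation scores.
scores = [[3.1, 0.4, 1.7, 2.2],
          [0.9, 2.8, 1.1, 0.3],
          [1.5, 0.7, 3.6, 1.9],
          [2.4, 1.2, 0.6, 2.9]]
z = double_standardize(scores)
print("symmetry gap:", round(symmetry_gap(z), 3))
print("mean off-diagonal r:", round(mean_offdiagonal_r(scores), 4))
```

Iterating to convergence, rather than standardizing rows and columns once each, is what makes both sets of margins hold simultaneously; a single pass leaves the rows slightly off after the columns are adjusted.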
Mapping the results of a given classification yields a pattern of classes resembling a topographic map (the difference being that the isolines separating ranges of elevation are not necessarily spaced at equal intervals). [A methodological note: Those familiar with information statistic-based nonhierarchical clustering algorithms know that solutions are reached through an iterative process. This makes them prone to becoming "entrapped" in local minima and reporting suboptimal results. The way around this problem is to initialize the process from a variety of starting configurations and ultimately select the solution that accounts for the most variation in the data. In the many analyses I performed here (two- through six-class solutions for each of the twenty-five basins) this was done obsessively, to the extent that I am quite sure the solutions I accepted for further analysis were either the best ones outright or very close to them.] Once this basic plan of investigation was worked out, twenty-five drainage basins were chosen for examination. Over half were taken from 7.5-minute (1:24,000) USGS quadrangle maps of the northeastern United States, with the remainder from other U.S. areas and map series at various scales. Data collection was painfully manual, using transparent triangularly gridded overlays: each pinpointed sample location fell at some arbitrary position between successive contour lines, requiring careful interpolation to retrieve an actual (estimated) value. In this first round of studies, the number of points sampled per basin ranged from about 300 for the least sampled basin to just over 550 for the most sampled one. These choices were a gamble: I figured that for an initial look at the matter, a range of sources, geographical locations, and sample grid densities should be employed, lest complaints be raised on this basis. 
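The multi-start safeguard in the methodological note can be sketched as follows. The author's criterion is an information statistic; this sketch swaps in plain one-dimensional k-means (minimizing within-class sums of squares) as a stand-in, so it illustrates the restart strategy rather than reproducing the original algorithm. In one dimension each k-means class is a contiguous range of values, which is why ordering the sample vector by height makes the classes easy to read off. The function names and the `n_starts` and `seed` parameters are hypothetical.

```python
import random
from statistics import fmean

def kmeans_1d(values, k, rng, max_iter=100):
    """One k-means run from randomly chosen starting centers; returns the classes."""
    centers = sorted(rng.sample(values, k))
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Empty classes keep their old center so the list length stays k.
        new_centers = [fmean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]

def variance_explained(values, groups):
    """Proportion of the total sum of squares accounted for by the class means."""
    grand = fmean(values)
    sst = sum((v - grand) ** 2 for v in values)
    ssw = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    return 1 - ssw / sst

def best_partition(values, k, n_starts=100, seed=0):
    """Restart k-means many times and keep the solution explaining the most variation."""
    rng = random.Random(seed)
    best_ve, best_groups = -1.0, None
    for _ in range(n_starts):
        groups = kmeans_1d(values, k, rng)
        ve = variance_explained(values, groups)
        if ve > best_ve:
            best_ve, best_groups = ve, groups
    # Report each class as an elevation range, lowest first.
    ranges = sorted((min(g), max(g)) for g in best_groups)
    return best_ve, ranges

# Hypothetical sampled elevations (feet), partitioned into four ranges.
elevations = [410, 412, 415, 520, 525, 530, 640, 644, 650, 810, 815, 822]
ve, ranges = best_partition(elevations, 4)
print(ranges, round(ve, 4))
```

Any single run can settle into a poor local solution; keeping the best of many seeded runs is the same remedy the note describes, applied to a simpler criterion.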
At the same time, it was unclear at the outset just how dense a sampling of the surfaces would be needed to capture the essence of the postulated organization reasonably well. Table 1 provides basic statistics on the basins studied:
Table 1. Drainage basins used in the study. In the regression studies described in the next write-up, the dependent variable (Y) is the mean of the correlation coefficients associated with each system's set of spatial autocorrelation scores; the independent variables X1 through X5 are: X1 = whether the base map was derived from surface triangulation methods or from aerial photography; X2 = whether the base map used was the original map or an nth-generation copy of it; X3 = the standard deviation of the numbers of cases in the classes of each 4-class classification, converted to proportions of the total; X4 = a subjective rating of the vagueness of the basin boundary; X5 = the percentage of variation explained by the four-class (n = 4) nonhierarchical cluster analysis solution.
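The X3 predictor in Table 1 can be computed roughly as follows. This is a sketch under assumptions: it uses the population standard deviation of the class proportions, since the caption does not say whether a sample or population formula was used, and `class_balance_sd` is a hypothetical name.

```python
from statistics import pstdev

def class_balance_sd(counts):
    """Standard deviation of class sizes expressed as proportions of all sampled points.

    0 means the points are split evenly among the classes; larger values mean
    one or more classes dominate the classification.
    """
    total = sum(counts)
    return pstdev([c / total for c in counts])

# Hypothetical case counts for the four elevation classes of one basin.
print(round(class_balance_sd([120, 95, 80, 105]), 4))
```

Converting counts to proportions first keeps X3 comparable across basins sampled at different densities (about 300 to just over 550 points).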
Inasmuch as the basins studied include a wide variety of system sizes, and evolved under a considerable range of climate types and geologies, I have little doubt at this point that just about any such system, if sampled adequately, will produce the same results (don't ask me yet what to think about special cases such as karst landscapes or areas of internal drainage; one thing at a time!). This is pretty startling. It is, however, difficult or impossible to compare the results closely to any initial standard. The simulations of two-dimensional systems I reported earlier indicated that only a few percent of the patterns produced double-standardized results meeting the symmetrical z scores condition, but those tests were run on variously shaped areas, with several different numbers of sampled points, and with classes assigned at random, so a comparison of the actual systems to this standard is suggestive only. A further interesting discovery is the relative lack of variation among the arrays of z scores produced by the 25 analyses: most of them look pretty much like one another. Were much denser samplings of the basins taken, the resulting refinements in characterizing internal organization might well produce even greater similarity; it actually seems possible that a single entropy-maximized matrix of scores might describe them all. Lastly, we can consider two sets of statistics bearing on the meaning of the four-class classifications of within-basin elevations described here. As mentioned above, it should not be surprising that, unlike the internal zonation of the earth, there is no obvious division of each basin into four zones of elevation: drainage basins are much more open systems that continually respond to all sorts of disturbances, and these prevent them from attaining the kind of dynamic equilibrium that would produce obvious zonations. 
That said, a close enough look at a basin's organization might yet reveal some tendencies in that direction, and this particular data set does exhibit evidence of such. First, there is the mean of the mean r values associated with the correlation matrices for the twenty-five spatial autocorrelation matrices that were ultimately double-standardized. In the two-dimensional simulations reported earlier, the mean r values across the eighteen groups of four-class solutions ranged from .029 to .198, with only one mean below .050. Across the actual twenty-five analyses reported here, the mean r values were: for three-class classifications of the sample elevations, .0571; for four-class classifications, .0345; for five-class classifications, .0332; and for six-class classifications, .0295. Notice both the lower mean for the four-class real world systems than for the simulated systems, and the small absolute difference between the four- and five-class classifications in the real world data. One might interpret the first fact as indicating a greater general level of equilibrium in these real world systems than in "just any system." In turn, one might interpret the second as suggesting a real world structural response specific to the four-class classification, since in the absence of such structure the decrease in r values would be more directly (logarithmically, actually) proportional to the degrees of freedom involved. Were there no structure complicating the matter, one would expect the second of these four values (.0571, .0345, .0332, .0295) to be something close to .0400 or .0390. Meanwhile, if one examines the proportion of total variation explained for each of the n-class clustering solutions for each drainage basin, another interesting fact emerges. One can average the total variation explained across all twenty-four basins for each value of n. 
These averages turn out to be: for the 2-class solutions, .7045; for the 3-class solutions, .8530; for the 4-class solutions, .9122; for the 5-class solutions, .9415; and for the 6-class solutions, .9592. From these numbers one can derive, at each step, the percentage of the remaining variation explained by that step: the 2-class solution explained 70.45% of the variation; the 3-class solution added to this by explaining 50.25% of the remaining variation; the 4-class solution added 40.27% of the remaining variation; the 5-class solution, 33.37%; and the 6-class solution, 30.26% (at which point 100 - 95.92, or 4.08%, remained unexplained). Again, if no internal structure in the arrangement of the basins were complicating things, one would expect a simple function to describe the increase in explanation from one n-class solution to the next. This expected sequence turns out to be 70.45, 50.25, 39.35, 33.45, and 30.26. Notice that the third number, 39.35, is almost one percentage point less than the actually recorded value of 40.27. Thus the four-class solution explains more of the variation than one would expect in the absence of structural complications. Further, if one turns to the four-class solutions alone and examines this pattern from basin to basin, something very interesting emerges. For the twenty-four basins whose patterns double-standardized to symmetric results, I correlated each basin's mean r value (i.e., of the correlation matrix for its spatial autocorrelation scores) with its "additional variation explained going from the three-class solution to the four-class solution" value. On average, per the figures above, this latter number is .9122 - .8530 = .0592, but of course it varies from basin to basin. 
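The "percentage of remaining variation" figures follow from the cumulative proportions by dividing each step's gain by whatever was still unexplained before that step. The sketch below reproduces that arithmetic and adds a plain Pearson correlation of the kind used for the basin-by-basin comparison; the function names are hypothetical, and the per-basin data themselves are not reproduced here.

```python
from statistics import fmean

def share_of_remaining(cumulative):
    """For cumulative proportions explained (n = 2, 3, ...), return the fraction
    of the previously unexplained variation that each successive solution adds."""
    shares, prev = [], 0.0
    for ve in cumulative:
        shares.append((ve - prev) / (1.0 - prev))
        prev = ve
    return shares

def pearson(x, y):
    """Pearson r, e.g. of per-basin mean r against per-basin gain from 3 to 4 classes."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Mean total variation explained for the 2- through 6-class solutions (from the text).
cumulative = [0.7045, 0.8530, 0.9122, 0.9415, 0.9592]
print([round(100 * s, 2) for s in share_of_remaining(cumulative)])
# [70.45, 50.25, 40.27, 33.37, 30.26]
print(round(cumulative[2] - cumulative[1], 4))  # gain from 3 to 4 classes: 0.0592
```

Applied per basin, the 3-to-4-class gain and the basin's mean r would form the two lists passed to `pearson`.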
The question, then, was whether there is a significant relationship between our measure of system equilibrium and the degree to which the statistical explanation of the classification structure deviates from an assumption of no structural controls. The correlation coefficient turns out to be a rather high r = -.7125 (with the anticipated negative sign). So, in these data at least, it appears that in the four-class solutions in particular there is a heightened connection between the structural peculiarity of the system and its level of measurable equilibrium. Overall, these are pretty remarkable results, and again they generally support the validity of the spatial model being explored here. In the next write-up, I report a series of secondary analyses on the same basins that are equally revealing. _________________________
Copyright 2006 by Charles H. Smith. All
rights reserved. Feedback: charles.smith@wku.edu