The Once and Future Wallace

 

Real World Studies II: Stream Basins Morphometry.
B: Secondary Tests.


Introduction

     The twenty-five stream basins that were examined as explained in the last section were also subjected to a series of secondary analyses designed to explore various deductions about them that can be drawn from the model under study here. So as not to try the reader's patience unduly, these will next be described individually--but as succinctly as possible, and leaving the reporting of results to the very end.

Analyses Carried Out

     1. With the possible exception of one of the twenty-five (which, it will be recalled, barely missed, and was smaller, somewhat vaguely bounded, and in part artificially landscaped), all achieved the same kind of matrix-level symmetry of resulting z-scores when double-standardized. On the basis of the model under discussion here, this should be due to the four-class classifications in this instance reflecting actual, functional structure within the system. We already know from the two-dimensional spatial simulations, however, that there are many class-level patterns that will not yield this particular kind of symmetric matrix result. That is to say, one should expect that not just any partition of sampled elevations will produce autocorrelative patterns that, when summarized and double-standardized, lead to such a symmetric matrix of z-scores. I investigated this distinction in two ways. In the first, the original data (elevations and accompanying locations) were re-allocated into four classes using the class sizes that actually occurred, but with those sizes re-arranged in a different order. Thus, if classes one, two, three, and four of the analysis contained, respectively, the highest twenty points, the next-highest thirty, the next fifty, and the lowest eighty, in the new test analysis the class containing the highest elevations might now hold the highest eighty, and so on. Three new re-groupings were examined for each of the twenty-five original sets of data. The expectation was that of these seventy-five new analyses, a nontrivial number of the resulting autocorrelative patterns would not double-standardize to symmetric conditions. It was also expected, though not as strongly, that the mean r values (associated with the correlation matrices of the autocorrelation data) would now turn out somewhat higher than the ones connected to the real (perhaps I should say, "untampered with") analyses.

     2. As a slight extension of #1, the twenty-five re-arrangements most resembling the actual arrangements were looked at separately, as were the twenty-five re-arrangements that were the most different, and the twenty-five that were in between. So, for example, if the real classification ended up with sample-point totals for classes 1 through 4 of 90, 40, 30, 125, the "most resembling" one might be a re-arrangement into 90, 30, 40, 125, and the "least resembling" one, 40, 125, 90, 30. Again, these re-classifications respect the rank order of all the elevations/points in the sample: I was merely resetting the elevation limits between classes 1 through 4 in each instance (not illogically putting high elevation points into low elevation classes). Here, the expectation was that those that had been reset the least from the actual analyses would end up with the fewest results typified by asymmetric matrices of double-standardized z-scores. The mean r values were again (weakly) expected to be higher than in the real analyses.

     3. In a parallel study to #1 and #2, I simply arbitrarily assigned new numbers of points to the classes. So, if in the actual analyses the class 1 through 4 totals of samples were, say, 55, 63, 113, 122 (a total of 353 points), one new reclassification might have 50, 30, 20, 253 as the totals in the four classes. Again, the expectation was that the new elevation limits between classes of heights, being set arbitrarily and apparently not reflecting any real structure, would tend to produce a number of asymmetric double-standardized results. Three resettings of this type were done for each of the original twenty-five data sets. The mean r values were again (weakly) expected to be higher than in the real analyses.

     4. I also performed an analysis directly comparable to the "real" analyses reported in the last section, but removing three of every four sample points from consideration so as to be left with a less densely sampled basin. In theory, the smaller the sample size, the less likely one would be to capture the essential structural/pattern variation within the system through it. The anticipated results: some of the output double-standardized matrices might start to show asymmetry, and the mean r values should rise a bit.

     5. I also performed several sets of analyses of a completely different sort, employing multiple regression techniques. In these studies, the object of attention was the varying mean r values of the correlation matrices connected to the autocorrelation scores used as input to the double-standardization operations. I have suggested in some of the other spatial systems write-ups here that this mean r statistic might be used as the measure of a system's level of internal redundancy (a not very difficult stretch), and quite possibly its level of internal disequilibrium as well (a somewhat harder sell). In any case, the systems set up here do seem to double-standardize to symmetric z-scores more frequently/easily as the accompanying mean r scores decrease. For the stream basin systems described in the last section it seemed likely that, while some reduction of the mean r scores would occur with finer sampling of the patterns, they would never reduce all the way to zero. On the other hand, it also seemed likely that secondary measurable characteristics of the drainage basins might be used to construct multiple regression models significantly predicting the variation among the r scores. Secondly, it seemed likely that the best such models should arise from the best classifications; that is, the fuzzier the picture of the actual structure in the system, the fuzzier too should be the ability of various secondary statistical surrogates to capture the variation inherent in the inferior models. Otherwise put, where underlying structure has been poorly elicited through this model, one should expect surrogates for that structure to do a poorer job at reconstructing it than when it has been more precisely elicited.

     Ideally, a surrogate measure based on lack of regularity of the contours of elevation might be used to represent degree of equilibrium (i.e., between erosional and depositional forces), but I couldn't come up with one that wouldn't require hand-measuring the entire array of twenty-five sets of spatial data all over again (and doubtlessly in a manner taking a good deal longer for each cell than merely interpolating its center's elevation). I settled instead on the use of several statistics designed to measure the overall pattern of allocation of number of sample points into the four classes, for example: (1) the standard deviation of the number of cases in each 4-class classification, converted to proportions of total, and (2) the number of cases in the largest of the four classes divided by the number of cases in the smallest of the four classes.
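
     The two class-allocation statistics just named can be computed directly from the four class totals. What follows is only a minimal sketch, not a record of the original computation: the function name is hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the text does not say which was used.

```python
from statistics import pstdev


def class_size_stats(sizes):
    """Return (sd of class proportions, largest/smallest class ratio).

    sizes: number of sample points in each class, e.g. four totals.
    Assumption: population standard deviation (divide by k, not k - 1).
    """
    total = sum(sizes)
    proportions = [n / total for n in sizes]
    return pstdev(proportions), max(sizes) / min(sizes)


# Illustrative class totals borrowed from the example in #3 (353 points).
sd_of_proportions, size_ratio = class_size_stats((55, 63, 113, 122))
```

     A perfectly even allocation gives a standard deviation of zero and a ratio of one; both statistics grow as the allocation becomes more lopsided.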

     I also generated three measurement error surrogates, reasoning that those basins returning higher mean r values might have done so in significant part merely because of variations in the quality/precision of the mapped information itself. The three that I created were: (1) whether the base map used was derived from surface triangulation methods or aerial photography; (2) whether the base map used was an original map, or an nth generation copy thereof; and (3) a subjective rating of the degree of vagueness of the basin boundary. Secondarily, it could also be projected that these measures, if significant at all, would be more likely to have a goodly effect on the results obtained from the best-delineated system data sets than on parallel sets based on smaller samples or variously manipulated classifications.

     It would take several pages to explain how this general strategy of using multiple regression models was applied to the data produced in the preceding section's write-up, to analyses one through four noted above, and to a few additional efforts suggested after the fact by some of the results from the same, and I will not bore the reader with the details. In general, it was expected beforehand that none of the secondary studies would produce results as "clear" (significant) as those connected with the most efficient classifications described in the last section.
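
     The data manipulations behind analyses #1 through #4 above are simple enough to sketch in a few lines. The following is an illustration under stated assumptions, not the procedure actually used: all function names are hypothetical; the "resemblance" measure in #2 is assumed here to be the sum of absolute differences between corresponding class totals (the text gives no formula); the arbitrary totals in #3 are drawn at random, whereas the originals were simply chosen; and the thinning in #4 keeps every fourth point in list order, though how the retained points were actually selected is not stated.

```python
import random
from itertools import permutations


def classify_by_sizes(elevations, sizes):
    """Label points 1..k by elevation rank (highest first).

    sizes[i] is the number of points placed in class i + 1, so only the
    elevation limits between classes move; rank order is preserved.
    """
    if sum(sizes) != len(elevations):
        raise ValueError("class sizes must sum to the number of points")
    order = sorted(range(len(elevations)),
                   key=lambda i: elevations[i], reverse=True)
    labels = [0] * len(elevations)
    start = 0
    for cls, n in enumerate(sizes, start=1):
        for idx in order[start:start + n]:
            labels[idx] = cls
        start += n
    return labels


def reordered_sizes(sizes):
    """Analysis #1: every distinct re-ordering of the real class totals."""
    original = tuple(sizes)
    return sorted(set(permutations(original)) - {original})


def size_distance(original, rearranged):
    """Analysis #2 (assumed measure): total shift in class totals."""
    return sum(abs(a - b) for a, b in zip(original, rearranged))


def arbitrary_sizes(total, k=4, rng=None):
    """Analysis #3 (assumed method): k random positive totals summing to total."""
    rng = rng or random.Random()
    cuts = sorted(rng.sample(range(1, total), k - 1))
    bounds = [0] + cuts + [total]
    return [bounds[i + 1] - bounds[i] for i in range(k)]


def thin_sample(points, keep_every=4, offset=0):
    """Analysis #4 (assumed method): drop three of every four points."""
    return points[offset::keep_every]


# The example totals from #2: the closest re-ordering swaps the two
# most similar class totals.
real = (90, 40, 30, 125)
candidates = reordered_sizes(real)
closest = min(candidates, key=lambda c: size_distance(real, c))
print(closest)  # (90, 30, 40, 125)
```

     Under the assumed distance, the "least resembling" example given in #2 (40, 125, 90, 30) ties with several other orderings for the largest possible shift, so the measure agrees with both examples in the text without being uniquely determined by them.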

Results

     I could provide the reader with a blizzard of statistical detail describing the results of the twenty-some-odd analyses alluded to above, but fortunately for the moment I can dispense with such detail: *all* of these analyses produced the projected results. Specifically: (1) the re-analyses of the stream data using one-quarter as many cases (as described in #4 above) yielded three of twenty-five that did not double-standardize to symmetry (instead of one), and produced a mean of the mean r values for the twenty-five spatial autocorrelation matrices of .044, considerably higher than it was for the full data sets (.034), suggesting that the coarser sampling did in fact affect the ability to discern structure; (2) the re-assignment exercises (#1, #2, and #3 above) resulted in double-standardizations that included from about fifteen to thirty-five percent asymmetric matrices, with the ones that were more severely manipulated producing relatively more asymmetric results; (3) all of the multiple regression analyses came out as expected, with the "real" data providing markedly better models than did the contrived ones; (4) the three measurement error surrogates all behaved as I expected them to, being weakly significant predictors (with the right sign) in the real data-based regressions, but weaker yet in all the variously altered data-based regressions.

     Although only about half of these secondary studies produced results that were individually dramatic, their overall consistency is a strong argument in support of the kind of structures predicted here. Let us continue on to another context, this one at a rather larger scale.    

_________________________



Copyright 2006 by Charles H. Smith. All rights reserved.
Materials from this site, whole or in part, may not be reposted or otherwise reproduced for publication without the written consent of Charles H. Smith.

Feedback: charles.smith@wku.edu
http://www.wku.edu/~smithch/once/streams2.htm