Global Spatial Autocorrelation Indices - Moran's I, Geary's C and the General Cross-Product Statistic
The analysis of spatially located data is one of the basic concerns of the geographer and is becoming increasingly important in other fields (Cliff and Ord, 1973). The assessment of spatial autocorrelation is generally considered to be one of the primary tasks of geographical data analysis (Hubert and Arabie, 1991). Moran (1950) notes that the study of stochastic processes has naturally led to the study of stochastic phenomena distributed in space in two or more dimensions which is important in such investigations as the study of soil fertility distribution or the relation between velocities at different points in a turbulent fluid.
I will define spatial autocorrelation. I will explain how spatial autocorrelation works through explaining the general cross-product statistic. I will show that the common spatial autocorrelation measures of Morans I and Gearys C are simply variations of the general cross-product statistic. I will also provide worked examples of the common spatial autocorrelation measures for which there are well-defined tests of significance - which do not require randomization. I will then present a program that can be used to measure spatial autocorrelation called Rooks Case v0.9.
There are a number of different definitions of spatial autocorrelation. Upton and Fingleton (1985) define spatial autocorrelation as a property that mapped data possess whenever it exhibits an organized pattern. However, this definition is subjective because the exhibition of an 'organized pattern' can mean many things. The authors say that spatial autocorrelation exists whenever there is systematic spatial variation in values across a map, or patterns in the values recorded at locations with the locations given. If high values at one locality are associated with high values at neighboring localities the spatial autocorrelation is positive and when high values and low values alternate between adjacent localities the spatial autocorrelation is negative (e.g., a checkerboard). Thus, Upton and Fingleton (1985) say that it is more useful to define spatial autocorrelation by means of understanding lack of spatial autocorrelation. That is, if there is no connection between the variables (Xi, Xj) at any pair of regions (i, j) in the study area, then the data exhibits a lack of spatial autocorrelation. In other words, a lack of spatial autocorrelation should be found in a mapped pattern that does significantly deviate from a map where each value Xi was assigned randomly with equal probability to each (i,j) location on the map. Thus, notes Griffith (1991) and Goodchild (1987), spatial autocorrelation deals simultaneously with both location and attribute information. In order to determine if the values in a mapped pattern deviates significantly from a pattern in which the values are assigned randomly requires some sort of index of comparison.
Cliff and Ord (1973) put spatial autocorrelation in simple terms, namely, that it is often necessary to see if the distribution of some quality or quantity in the counties or states of one country makes its presence in neighboring counties more or less likely. In other words, if the presence of a quality in a county of a country makes its presence in neighboring counties more or less likely, we say that the phenomena exhibits spatial autocorrelation. Such investigations can be extended to any lattice of regions or set of points at which the value of some variate X can be obtained. Sokal and Oden (1978a) say that spatial autocorrelation analysis tests whether the observed value of a nominal, ordinal, or interval variable at one locality is independent of values of the variable at neighboring localities. However, in a companion paper Sokal and Oden (1978b) say that spatial autocorrelation analysis tests whether the observed value of a variable at one locality is significantly dependent on the values of the same variable at neighboring localities. These are just two ways of saying the same thing. Goodchild (1987) says that in its most general sense spatial autocorrelation is concerned with the degree to which objects or activities at some place on the earth's surface are similar to other objects or activities located nearby and reflects Tobler's first law of geography that "everything is related to everything else, but near things are more related than distant things".
Most published measures of spatial autocorrelation can be recast as a (normalized) cross-product statistic that indexes the degree of relation between corresponding entries from two matrices - one specifying the spatial connections among a set of n locations, and the other reflecting a very explicit definition of similarity between the set of values on some variable x realized over n locations (Hubert and Arabie, 1991).
In spatial autocorrelation analysis some measure of contiguity is required. Contiguity has a rather broad definition depending on the research question, however, most analyses in spatial autocorrelation adhere to a common definition of neighborhood relations. Namely, neighborhood relations are defined as either rooks case, bishops case or queens (kings) case. These are rather simple and intuitive as their names suggest (Figure 1). Rooks case contiguity is by
Figure 1: Different definitions of contiguity
a neighborhood of 4 locations adjacent to each cell, Bishops only considers the diagonals of the relation and queens or kings case considers a neighborhood of eight cells. These are the most common forms of contiguity used in spatial autocorrelation when considering continuous data in a raster format. Of these three the rooks case is the most commonly used and most programs only will compute this particular case. When I talk about the contiguity matrix or weight matrix below, only those cells that have a rooks case contiguous relation are given a value of 1.
However, this type of contiguity is not sufficient for vector data formats or irregularly spaced points. In these cases distances to the four or even n nearest neighbors can be used or distance between a variate X and all neighbors can be used. In vector topological systems like ArcView 3.0 for Win95/NT4.0 the polygon attribute table PAT or arc attribute table AAT defines contiguity and can be used instead.
All measures of spatial autocorrelation have a common root, that is, a matrix cross-product. This is referred to as the general cross-product statistic (Hubert et al. 1981; Upton and Fingleton, 1985).
Where the Wij matrix is called the connectivity, contiguity or spatial weight matrix whose values are a function of some measure of contiguity in the original data (e.g., rook's, queen's or bishop's cases). The Cij is a measure of the proximity of values i,j on some other dimension (e.g., Euclidean distance, spherical distance, Manhattan distance et cetera.). The general cross-product statistic can be thought of as the general linear model of spatial autocorrelation. It works in the following way, consider the mapped pattern below (Figure 2),
Figure 2: Cell locations and Values
The Wij matrix is determined by creating a 6 x 6 grid where the columns and rows are labeled by the cell locations (Figure 3). When two cells are adjacent in the original data we enter a 1 in the Wij matrix, if two cells are not adjacent we enter a 0. For example, in Figure 2, the location a has location b and d as a rooks case adjacency (see Figure 1 - diagonal neighbors are excluded). Therefore, in the Wij matrix in Figure 3, a number 1 will be placed in position ab and ad in the first column and all other entries in that column will be given zero. Likewise cell e has neighbors on all four cardinal directions, above, below and on either side, and so the number 1 is entered into the coordinates, eb, ed, ef, and eh in the Wij matrix (Figure 3). This is repeated for each of the cells in the original matrix in Figure 2 until all of the locations in the W matrix are full (Figure 3).
Figure 3: Wij matrix for map in Figure 2.
The next step before determining the cross-product is to create the Cij matrix (Figure 4) which is of the same proportions as the Wij matrix but where entries are based on some other measure of distance between all cells. Let us take for example the squared Euclidean distance (xi xj)2. To do this, we take from Figure 2, cell a (called xi) and subtract it from each of the other values in the map and square the difference, eg., b, c, d, e, f, g, h, etc. For example in Figure 4 column a row a is (a-a)2 from figure 1 and column a row b is (a-b)2 et cetera.
Figure 4: Cij matrix based on squared Euclidean distance
In order to generate a cross-product value these two matrices are multiplied and summed. The multiplication of these two matrices is seen in Figure 5.
Figure 5: The WijCij Matrix for the General Cross Product based on Figure 2-4
Taking the sum of Figure 4 arrives at a cross product of 120. However, this value does not signify anything because it has no context. The cross-product statistic tells us nothing about the structure of the original map because we do not know in what context to put the value of 120.
I believe that the general cross-product is called a statistic because it describes a possible spatial relation of values within the map. The equation presents only a single value out of a possible n! number of values that could be found under randomization of the map. That is, if there are nine distinct locations with values then there are n! possible ways which those values could be assigned to the various locations. So, to determine what the value of 120 signifies, the map can be permuted randomly 9! = 362880 times and each time the value of WijCij can be found. This will produce a distribution from which a standard error can be attained. The value of 120 can then be determined to be either extreme or common. If the value of WijCij is found to be a rare occurrence under randomization then it can be inferred that there is likely some process that lead to the distribution observed in the mapped values.
To see how this worked, I computed the value for WijCij under randomization 20 000 times in order to produce a distribution for WijCij for the mapped values. Here the values from 1 to 9 were randomly assigned to one of the cells in Figure 2 with replacement with equal probability of assignment and order being important. The value at one cell location is independent with those of adjacent locations (Figure 6).
Figure 6: General Cross Product Results for Matrix in Figure 2 under 20000 trials.
Figure 6 shows that under 20000 trials in permuting figure 2 and taking a WijCij value each time that the standard deviation of these values is 132.36 and the computed WijCij value of Figure 2 is 120. Thus this value of 120 does not seem to be very extreme. However, note that the distribution of WijCij values is somewhat skewed which may affect the results. Possibly I cannot assume that 120 is not significant.
The general cross-product is a statistic in the sense that the derived matrix is a sample of a number of possible matrices. The value of the cross product can be compared to the range of values that might be produced if a number of maps with the same set of values were created by complete random assignment of values to locations. The original map (e.g., Figure 2) contains 9 distinct values each at a distinct location. However, this map can be thought of as a permutation or one possible particular arrangement of the 9 values. In total however, there are 9! possible permutations given the same set of numbers. That is to say, there are 362880 different maps that could be produced randomly if each of the nine original values were randomly assigned - with replacement - to an individual location. If this was done then we could calculate 9! different cross-product indices. Thus we would have created a distribution for the cross-product statistic. We could then see whether our calculated T=120 is an extreme case (unlikely to be due to chance) or is an outcome or permutation that is very likely due to chance. We could then use the general cross-product as a means of assessing this mapped pattern/organization in a meaningful way. Obviously, even with a small map with only nine localities deriving a randomized distribution would be a computational burden, particularly if one did it by hand it could take up to four years! Upton and Fingleton (1985) suggest a Monte Carlo approach where a random sample is taken from the n! possible map permutations and the significance of the derived statistic is compared to this random distribution. They also suggest that if there are a large number of samples (locations) under consideration, and especially if join-count, Morans I and Gearys C are used then normal approximations or those derived from randomization experiments are usually sufficient to test departures from the null hypothesis of complete spatial randomness in the assignment of values to locations on the map.
However, through randomization Hubert et al. (1981) show that the general cross-product statistic can add additional flexibility to task specific needs and does not require a theoretical distribution in order to test significance of the result. This allows the Cij matrix to be customized to specific applications where you may want to measure the distance between objects in the Wij in different ways rather than those dictated by the Moran, Geary or the Cliff-Ord spatial autocorrelation measures. The idea is that if the values in the map pattern are considered an unusual outcome of a random process then it is likely that the particular permutation has been caused by some rule governing the distribution of the values and that some process has controlled the spatial distribution (Upton and Fingleton, 1985).
There are a number of general measures of spatial autocorrelation that are derived from the general cross product statistic: Morans I, Gearys C, the Cliff-Ord Statistic and to a lesser extent the Join-Count statistic.
Moran (1950) introduced the first measure of spatial autocorrelation in order to study stochastic phenomena which are distributed in space in two or more dimensions. Moran's I has been subsequently used in almost all studies employing spatial autocorrelation (e.g., for a review see Upton and Fingleton, 1985). It is analogous to the conventional correlation coefficient because its numerator is a product moment term (Sokal and Oden, 1978). And like a correlation coefficient the values of Moran's I range from +1 meaning strong positive spatial autocorrelation, to 0 meaning a random pattern to -1 indicating strong negative spatial autocorrelation. This particular statistic is designed for the measurement of spatial autocorrelation of ordinal, interval or ratio data. The Moran I is given as,
Here, Wij is the binary weight matrix of the general cross-product statistic, such that Wij.=1 if locations i and j (e.g., two different cells or points) are adjacent and zero for all cells, points or regions which are not adjacent and by convention Wii=0 (a cell or region is not adjacent to itself). The Cij is given by the (xi-mean(x))(xj-mean(x)) or the product of the distance of the value of xi at location i and xj at location j from the global mean of the z-values (Figure 2). The numerator (xi-mean(x))(xj-mean(x)) means "take the value of the cell being considered, i (e.g., cell a in Figure 2), subtract the mean of all z-values (mean(x)) from this i (cell a) and then multiply this by the the mean subtracted value in cell j". Do the same for each j in the original data matrix and then goto the next i value and repeat. Algorithmically the creation of the Cij matrix is a simple double loop, e.g.,
For example, from Figure 2, the mean is 5, so we take the first value (1-5) and hold this constant and multiply that by itself (1-5) and enter this into the Cij matrix in row a column a (Figure 7). Then, we take the second value (4-5 - second column first row) and multiply this by (1-5) and enter the value into the Cij matrix in column a row b. Then we take (1-5) again and multiply this by (7-5 - third column first row) and put the entry into the Cij matrix under column a row c. We continue doing this for all of the values until the first column a in the Cij matrix is full (Figure 7). Then we move to the second column first row in Figure 2 and repeat this until the column in the second row b of the Cij matrix is full et cetera. The result is a Cij matrix representing (xi-mean(x))(xj-mean(x)) in Figure 7.
Figure 7: Cij Matrix for Figure 2 under Moran's I
Now we multiply this by the Wij matrix (Figure 8) for Figure 2.
Figure 8: Wij Matrix for Figure 2 under Moran's I
The result of WijCijis Figure 9.
Figure 9: WijCij Matrix for Figure 2 under Moran's I (Figure 8 x Figure 7)
Of course, to achieve Figure 9 we only need pay attention to those Cij values which have a corresponding Wij value of 1. We could have easily modified Algorithm 1 to do this without creating any matrices at all, for example, notice that each row (or column) in the Wij matrix (Figure 8) provides all of the adjacency information for each cell in the original data matrix (Figure 2). That is, the Wij matrix tells us that cell a (Figure 2) (Figure 8 row a) is adjacent to cells b and d. For the WijCij value then, we need only be concerned with (xi-mean(x))(xj-mean(x)) values which have a corresponding Wij = 1 value. For example,
In Algorithm 2, if one is using irregularly spaced xyz data, the W(i,j) matrix itself could be substituted with a distance measure in the "If" statement on line 5, such that only cells within a certain distance of each other are considered adjacent. In any case, the WijCij matrix (Figure 9) is summed which is equal to the value of 80 in this example. Which yields gives us so far a Moran's I equation of:
Now, n is the number of elements in the original matrix of Figure 2 which is 9 elements. The term is simply the sum of the Wij matrix which is 24 (that is there are 24 ones in the matrix) and the term is the sum of each value in the original array Figure 2 (e.g., (1-5)2+(2-5)2+(3-5)2 .(9-5)2) minus the mean which is in this case equal to 60. Therefore we simply substitute these values into the remainder of the equation to arrive at a value for Moran's I:
And so Moran's I for Figure 2 is equal to 0.5. This value of 0.5 signifies positive spatial autocorrelation or the idea that similar values on the map tend to cluster together.
Note: From the equation is clear that the term in the denominator cannot be zero because the equation would be undefined in such a case. In other words, if this term is zero then there is no variance in the original data and the I measure is inappropriate. This seems intuitively problematic since a field with no variance is perfectly autocorrelated since all values are similar, that is, exactly the same. Furthermore, cannot be zero if there are adjacency relations considered. The only case where Moran's I equals zero is when equals zero.
The statistical significance of I can be assessed through a number of formulae that have been derived by either the normal approximation or by randomization experiments. For the general cross product I used randomization. However, for Moran's I there are formulae for the normal approximations and for randomization but the normal formulae are far simpler than those for the randomization tests. First, under the normality assumption the variance of I (VarN(I)) is given as:
whereas, the variance of Moran's I under Randomization - VarR(I) - is given as:
The following variables in the variance equations are defined as:
The Standard deviation and standard z-scores of I are given by:
Significance of Spatial Arrangement in Figure 2 under Moran's I.
Goodchild (1987, p.24) states that the simplest hypothesis test regarding spatial autocorrelation exhibited by a sample of n classes is that the sample was drawn from a population in which autocorrelation is zero. Goodchild also discusses at length the interpretations of the word "drawn" in the context of resampling and randomization.
The significance of a particular autocorrelation statistic can be derived in two ways: First, by a randomization procedure whose null hypothesis proposes that the sample is one randomly chosen possibility from among the n! possible arrangements of the observed attributes among the n locations (Goodchild 1987, p.24). Where each of the values in the map are assigned randomly n! times to each of the locations; Second, significance can be tested using statistics designed for SA based on the normal approximation or randomization above. Using Rooks Case v0.9 I compare these three cases for Moran's I for the permutation in Figure 2 with which has Moran's I = 0.5.
First, Griffith (1987 - Appendix E) provides the argument which shows that under randomization or normality the expected value of Moran's I is -1/(n-1) and not zero. However it is obvious that as n increases this expectation approaches the value of zero. For example, the expected value of Figure 2 if the values were randomly assigned to each location would be -1/(9-1) = -0.125. Thus, if the value of Moran's I was equal to -0.125, the spatial distribution is no different than if the values were assigned randomly to each location and there would be no spatial autocorrelation evident.
To approximate this expected Moran's I value, I used the randomization option of Rooks Case v.0.9 to compute 20000 values of I under randomization (Figure 10).
Figure 10: Random matrix generation and randomization module of Rooks Case v.0.9. Creates 20000 values of Moran's I and Geary's C where each I and C value is from a random permutation of the values in Figure 2 assigned with equal probability without replacement to the locations in Figure 2.
That is, the mapped values in Figure 2 were assigned with equal probability to each of the 9 locations twenty thousand times. After each random assignment, the values were then tested for their Moran's I and Geary's C simultaneously. The distribution of I under 20000 experiments (randomization) is seen in Figure 11.
Figure 11: Distribution of Moran's I under randomization of Figure 2's values with 20000 trials
Figure 11 shows that the values of Moran's I are approximately normal under randomization, however, the histogram seems somewhat positively skewed, I attribute this to the small number of samples 20000+ out of 9!. The mean of the random samples of I is -0.128 which is very close to that expected under randomization - 0.125. Thus the small number of random samples do tend to converge to a mean around that expected analytically. The standard deviation is 0.242 from these 20000 permutations. From the normal equation the standard deviation is given as .23049 and from the randomization equations presented above, the standard deviation is .24431. Thus, the standard deviation from the 20000 permutations compare favorably with those derived theoretically for this small sample.
Figure 12: Statistics for Moran's I and Geary's C under randomization and the normality assumption using Rooks Case v.0.9.
Second, using Rooks Case v.0.9 which computes the variance and standard z-scores of Moran's I and Geary's C under both the normality assumption and randomization assumption, I computed the z-scores and variances (Figure 12) for the Values in Figure 2 to compare these expected values with those derived from my randomization experiment. Figure 12 shows that for the spatial arrangement of values in Figure 2 where Moran's I equals 0.5 that z = 2.711 under the normal approximation and z = 2.558 under randomization. This pattern is significant at the 0.05 level (1/20 probability that this pattern is due to chance). Furthermore, the variance and standard deviation are very close to those found under randomization. That the pattern observed in Figure 2 is significantly different from zero spatial autocorrelation is also evident under randomization where almost none of the values in the distribution of I are greater than 0.5 (see Figure 10). Therefore, the analytic and randomization approximations are consistent for this small n = 9 observations. Sokal and Oden (1978) note that the standard errors under randomization or normality do not greatly differ for most applications where n is fairly large, e.g., > 10 observations.
Moran's I is simply a special case of the general cross-product statistic where the values of the variates at locations i and j are standardized distances from the mean of all values. The only difference being that the numerator has a variance term.
Geary's C is based upon a paired comparison of juxtaposed map values. It is given as:
and ranges between 1 and 2. The expectation under randomization for n! samples is 1. Positive spatial autocorrelation is found with values ranging from 0 to 1 and negative spatial autocorrelation is found between 1 and 2. However, values can be found greater than 2 on occasion (Griffith, 1987). As similar values are in juxtaposition the numerator which measures the absolute difference squared between juxtaposed values will tend towards zero. Whereas, as non-similar values become juxtaposed the statistic will tend towards larger values in the numerator and thus towards its maximum value of two. The significance of Geary's C statistically speaking is given by:
All variables are the same as defined for the Moran's I significance tests. Sokal and Oden (1978) note that empirical results computed by both Moran's I and Geary's C are similar they are not identical. The numerator of Geary's C will tend to be sensitive to absolute differences between neighboring variates due to the squared term in the numerator making up the Cij component. I will not provide an explicit example for Geary's C because it is computed identically to Moran's I with the exception of the numerator Cij term which does not include the difference of each variate from the mean but rather each variate from all other variates.
The significance of Geary's C was tested identically to that for Moran's I. In fact, Rooks Case v.0.9 was used for the randomization (Figure 10) and calculates the C values concurrently with the I values on each random matrix. The expected mean of the randomization of n! experiments is 1. The result of 20000 trials is seen in Figure 13.
The mean of Geary's C for the 20000 trials is very close to 1 and is 1.002623. The standard deviation from the randomization example is 0.232. The expected variance under the randomization and normal approximation given by Rooks Case is 0.2357 for the normal approximation and 0.2337 under the randomization equation given above. Therefore, this example shows that the formulae for testing the significance of Figure 2 for Geary's C are quite accurate and can be substituted for the computationally prohibitive randomization distribution. From what I know of the central limit theorem the differences between randomization distributions and the formulae for variance will become less apparent as n increases.
Figure 13: Distribution of Geary's C under randomization using 20000 trials
Griffith (1987) notes that simulation experiments suggest that the inverse relationship between Moran's I and Geary's C is basically linear in nature. Departures from linearity are ascribed to differences in what each of the two indices measure, that is, Geary's C deals with paired comparisons and Moran's I with covariations. From the randomization experiment above Figure 14 compares the relation between Moran's I and Geary's C.
Figure 14: Relation between Moran's I and Geary's C for 20000 statistics generated using Rooks Case v.0.9
Figure 14 suggests that the relation between Moran's I and Geary's C is linear and either statistic will essentially capture the same aspects of spatial autocorrelation.
Most often spatial autocorrelation is considered for raw data at various locations. Another application of spatial autocorrelation is in model specification. Specifically, in the testing of regression residuals for spatial autocorrelation. The use of spatial autocorrelation in the residuals of a regression is similar to that of regular temporal autocorrelation in the residuals of a regression. Cliff and Ord (1971) state that spatial autocorrelation among residuals could imply:
© University of Ottawa
If you are looking for additional information, please contact us.
Technical questions or comments about this site? Last updated: 2009.11.23