1 This paper was accepted in 2001 for publication in ...

Viewer
Transcript

This paper was accepted in 2001 for publication in the book Stochastic Modeling II being published by the American Association of Petroleum Geologists, but still not published. Available at http://www.esri.com/software/arcgis/arcgisxtensions/geostatistical/research_papers.html

Modeling the Semivariogram: New Approach, Methods Comparison and Case Study

Alexander Gribov1, Konstantin Krivoruchko13, Jay M. Ver Hoef2

1

Environmental Systems Research Institute, 380 New York Street, Redlands, CA 92373-8100

2

Alaska Department of Fish and Game, 1300 College Road, Fairbanks, AK 99701

3

All correspondence to second author

Abstract

Three steps are often used to model a semivariogram or a covariance: 1. Compute the empirical semivariogram or covariance, Choose a model among the family of valid semivariograms or covariances, and 2. Estimate the semivariogram or covariance by fitting the model to the empirical semivariogram or covariance. Once the semivariogram or covariance has been estimated, it can be used in a kriging interpolator. This chapter proposes some new methods both for computing empirical semivariograms and covariances (step 1) and for fitting semivariogram and covariance models to the empirical data 1

(step 3). For step 1, grid-based empirical semivariograms and covariances are described where the grid values smooth using triangular kernels. For step 3, model-fitting using an iterative weighted least squares modified from Cressie (1985) is described. Proposed model fitting algorithm is reliable for a wide range of data types and conditions. Comparison with restricted maximum likelihood estimation is discussed.

Introduction

Investigation of the structure functions, covariance and semivariogram, has a long history. Geostatistical textbooks usually present structural analysis and spatial interpolation in geology. However, both semivariogram and kriging were used in meteorology years before. Keller and Friedman, 1925, introduced a general form of the correlation function. They assumed that moments of order greater than two could be ignored when characterizing average motion in hydrodynamics. They considered covariance in four dimensions, three-dimensional space plus time. Kolmogorov, 1941, presented a simplification of the theory for locally homogeneous and isotropic turbulence in two dimensions. In particular, he showed that a velocity structure function (variogram) is proportional to two-thirds the power of the distance in a range for which the assumptions of local stationarity are true. After Kolmogorov’s publications on structural analysis, a number of case studies on structure functions were published, mostly in the Soviet Union. After developing optimal spatial interpolation in two dimensions (kriging) by Gandin, 1959, the variogram had been extensively used by meteorologists as a necessary step in spatial interpolation. In addition to a power model, they used such functions as power exponential and J-

2

and K-Bessel. Soviet meteorological monitoring networks in the fifties and early sixties included several dozen stations, and Gandin and his colleagues usually used all pairs of points to estimate the parameters of the variogram/covariance models, Gandin, 1963. Gandin mentioned that he used a least squares model-fitting algorithm to find the parameters of the model but did not provide details. Several reliable methods to estimate semivariogram and (cross)covariance are available nowadays. This chapter discuss, in detail, implementation in the commercial software of one of the most popular methods -- least squares model-fitting. A reason to prefer this method is that least squares estimates always look better than likelihood estimates when compared to the empirical semivariogram, because least squares find the best-fitting trend through the empirical semivariogram, while likelihood methods use actual data. However, geographers always feel better when they see visual confirmation of their expectations.

Empirical semivariogram estimation

Let z(sp) be a value observed at the pth location sp, where sp = (xp, yp) is the vector containing the x- and y-spatial coordinates. Define the lag, h = s2 − s1, as the vector from point s1 to point s2. The semivariogram cloud is defined as half the squared differences, γpq=0.5⋅[z(sp) − z(sq)]2 for all possible pairwise lags {(sp, sq); p, q = 1, 2,…,n}, plotted as a function of the distance ||h|| = ||sq − sp|| = [(xp − xq)2 + (yp − yq)2]1/2. Currently typical applications deal with hundreds and sometimes thousands of data locations, and it is often difficult to see any pattern because there are so many pairwise lags, Figure 1. 3

Not only it is difficult to see any pattern, but fitting a semivariogram model to so many data is difficult even for modern computers. Instead, the squared differences γij are usually binned into those distances and directions that are similar. Thus, the following often-used empirical semivariogram formula is obtained,

γˆ (h k ) ≡

1 2 N (h)

∑[ z (s

p

) − z (s q )]2

N (h )

where z(·) are data values, N(h) is the set of pairs of data at sp and sq that have a similar lag h ∈ T(hk), with T(hk) a tolerance region around hk and |N(h)| the number of distance pairs in the set N(h). The question remains how to best bin the data, which is equivalent to choosing the sets N(h).

Most often, the binning is computed on a tolerance region T(hk) that is a radial sector. Figure 2 shows the method most commonly used in software packages, including GSLIB (Deutsch and Journel, 1998), Splus (Kaluzny et al., 1998), and SAS (SAS Institute Inc., 1996).

A rule of thumb has been to make the maximum lag for binning as half the maximum distance between pairs in the data set, and set the number of bins so that there are more than 30 pairs per bin (Journel and Huijbregts, 1978). However, this choice seems to be based on tradition alone, and there is scant theoretical evidence to recommend its use over other schemes.

Instead of the radial sector method, the procedure described here employs a binning approach based on tolerance regions T(hk) that are rectangles, distributed uniformly on a grid, Figure 3. After averaging the values within each bin, average values are plotted for each hk. This is done in 4

two ways: making a semivariogram graph by plotting the average bin values by distance, and making a semivariogram surface by plotting the average bin values directly on their grid, preserving the spatial orientation of the lags and using colors to indicate the lag values. The semivariogram surface is given cool colors like blue and green for low values and warm colors like red and yellow for high values. The color scale next to the semivariogram graph links the semivariogram graph and the semivariogram surface, that is each point in the graph corresponds to a cell on the surface, Figure 4.

The grid method is more understandable because the semivariogram surface and semivariogram cloud are linked in an obvious manner, Figure 4. The grid method also helps to ensure that the number of pairs in each cell is approximately the same, which helps for visually assessing the fit of the model. If data are distributed uniformly on rectangular domain A by B, distance between points has the following distribution:

⎛ Δy ⎞ Δx ⎞ ⎛ ⎟ ⎟⎟ ⋅ ⎜⎜1 − ⎜⎜1 − A ⎠ ⎝ B ⎟⎠ ( A − Δx ) ⋅ (B − Δy ) ⎝ f (Δx, Δy ) = = , where Δx ≤ A and Δy ≤ B . A⋅ B A2 ⋅ B 2 The most important pairs are separated by relatively short distances, Δx << A and Δy << B , and their distance distribution is close to the uniform distribution. Hence, the number of pairs in a rectangular averaging scheme is approximately constant for small lags; in a sector averaging scheme, the number of pairs can differ significantly from sector to sector.

Figure 5, a three-dimensional view of the semivariogram surface is presented for a very large simulated data set with an anisotropic structure. The sector and grid methods are presented for the 90-degree sector for fixed size of tolerance regions T(hk) (hereafter called fixed lag sizes) and 5

with logarithmic increases in the size of tolerance regions (hereafter called logarithmic lag sizes). For small fixed lags, both sector and grid methods may have bins that are too large near the origin and logarithmic lag size is more effective at describing variability in the data. Another advantage of this method is that it is not necessary to define the lag interval.

The requirement for a relatively large number of points in each lag interval, say 30 or more for each bin, can be overcome using an algorithm proposed below. Fewer samples per bin can be used if smoothing has been done, for example, by relying on values in neighboring bins. A triangular kernel to assign weighted semivariogram values to each cell, depending on how close they are to the center of the cell, Figure 6, is used. The weight for the cell containing the blue dot can be taken as the product of the two marginal profiles. The weight wk(h) for vector h for the kth bin with center hk is ⎛ x −x wk (h) = ⎜⎜1 − k L ⎝

⎞⎛ y −y ⎟⎜1 − k ⎟⎜ L ⎠⎝

⎞ ⎟ I ( xk − x ≤ L) I ( yk − y ≤ L) , ⎟ ⎠

where L is the length of the side of a square bin, xk is the x-coordinate of hk, yk is the y-coordinate of hk, x is the x-coordinate of h, y is the y-coordinate of h, and I(•) is the indicator function, which is equal to one if its argument is true, otherwise it is zero. Modification of this method can be used for other types of averaging presented in Figure 5.

Often, a table of binned semivariogram values shows the number of pairs for that bin. Because smoothing with weights is used, an equivalent presentation in our method is to present the sum of the weights used in each bin.

6

Empirical covariance and cross-covariance estimation

It is also possible to estimate the empirical covariance function, Cˆ (h k ) ≡

1 N (h)

∑[ z (s

p

) − z ][ z (s q ) − z ]

N (h )

where z is the average of all the data, and all other notation is the same as for γˆ (h k ) . In comparison, γˆ (h k ) is an unbiased estimate of the semivariogram, whereas Cˆ (h k ) is a biased estimate of the population covariance. Thus, γˆ (h k ) is generally preferred.

A choice of structure function for modeling correlation between variables is more complicated (Ver Hoef and Cressie, 1993). The traditional cross-semivariogram is defined as

2τ ij (h) ≡ E[ zi (s) − zi (s + h)][ z j (s) − z j (s + h)] which has, as an estimator, 2τˆij (h k ) ≡

1 N (h)

∑[ z (s i

p

) − zi (s q )][ z j (s p ) − z j (s q )]

N (h )

for the ith and jth variable types. The notation follows that of γˆ (h k ) and Cˆ (h k ) . The crosssemivariogram, τ ij (h) , has two problems. First, it does not accommodate asymmetry; that is, the co-kriging equations are only obtained by assuming that Cij (h) = Cij (−h) . Second, both variables must be measured at the same location in order for that location to be used in the empirical estimator.

Alternativly, the cross-semivariogram can be defined as

7

2γ ij ( h) ≡ var[ zi (s) − z j (s + h)] which has an estimator 2γˆij ( h k ) ≡

1 N ( h)

∑ [~z (s i

p

)−~ z j (s q )]2

N (h)

where ~ zi (s p ) is standardized by subtracting off the mean and dividing by the standard deviation. The cross-semivariogram γ ij (h) is asymmetric and allows the co-kriging equations to be

obtained more generally than τ ij (h) . However, because of standardization, γ~ij (h k ) is no longer unbiased, so there appears to be no advantage in using γ ij (h) over cross-covariances. The first complete explanation of co-kriging (Gandin, 1963) used cross-covariances only. Hence, the present work concentrates on the cross-covariance, which is estimated by Cˆ ij (h k ) ≡

1 N (h)

∑[ z (s i

p

) − zi ][ z j (s q ) − z j ]

N (h )

A spatial shift is an example of asymmetry, as illustrated in Figure 7. A shift, which is present when the highest cross-covariance occurs for two variables at different locations, represents the distance and direction where the calculated cross-covariance is at its maximum value. For example, in Figure 7, variable 1 at some location will have the highest correlation with variable 2 at a location to the west. Shifted cross-correlation is common in environmental applications (Krivoruchko, 2002).

Fitting semivariogram and covariance models using iterative modified weighted least squares algorithm

8

After estimating the empirical semivariogram, the next step is to fit a theoretical model (e.g., spherical or exponential) to it. There are three main approaches for estimating the parameters of the semivariogram model: visual, (weighted) least squares, and likelihood. In practice, a true semivariogram is almost never known and consequently there is no method to determine the it exactly. Before modern computers and software became available, semivariograms were often fitted visually. More recently, statisticians have commonly use variants of maximum likelihood estimation to fit semivariograms. Researchers from physical sciences like meteorology, geology, and environment more often use variants of a least squares approach. All three methods have provided useful results, especially if a researcher has experience in spatial data processing using geostatistics. There are several good comparisons of estimators, including Cressie, 1993, and Zimmerman and Zimmerman, 1991. A modification of the weighted least squares approach, Cressie, 1985, is described below. One important reason to prefer the weighted least squares for implementation in the software for a large audience is because the results of estimation from this method can be easily visualized.

The first step in our modified approach is to select a reasonable lag size for the empirical semivariogram averaging by fitting a preliminary semivariogram using the sector method with logarithmic lags (stage one of the algorithm) and then using the range of the fitted model to define the lag used in the stage two when the final semivariogram model is fitted, using a grid method with constant lag size.

Before detailing the algorithm, it is instructive to consider an extreme situation where the method gives very good results. Figure 8 presents a simulated data set based on a spherical covariance

9

model, further described below. The data locations consist of two clusters separated by a relatively long distance. An automatically derived semivariogram is shown. It looks very reasonable and can be compared to the true semivariogram given in Figure 9. A screenshot made in ArcInfo 8.2 shows a map produced using ordinary kriging with the default semivariogram.

Stage 1

The algorithm first scales each data set, ~z j (s i ) = (z j (si ) − z j )/ σ j , where σj is the sample standard deviation for the jth variable type. Stage 1 begins by assuming an isotropic model. The empirical semivariogram (or covariance) is computed on the scaled data ~z j (s i ) , using the sector method with logarithmic lag size (Figure 5c) over a large range of lag classes that progress in a geometric series. The lag classes are formed from intervals [d k −1 / 2 , d k +1 / 2 ) , where empirical parameter d equals 1.25, and k ranges from large negative to large positive integers. Part of the

(

)

series can be seen in Figure 5c. The center of each lag class is taken to be d k −1 / 2 + d k +1 / 2 2 . Only lag classes that have data are used. Call empirical (cross)covariance Cˆ ij ( h k ) , where i indicates the ith variable, j indicates the jth variable type, and k indicates the kth lag class. The first iteration of covariance parameter estimates is obtained by minimizing: T

T

nij

∑∑∑ w (h i =1 j = i k =1

ij

k

(

~ ) Cij (h k ; θ

) − Cˆ ( h ) )

2

ij

(1)

k

where θ is the vector of parameters for the i,jth covariance function.

10

(2)

N (h )

wij (h k ) = N ij (h k ) / ∑ N ij (h m ) m =1

where Nij(hk) is the number of pairs in the empirical (cross)covariance function for variables i and j in lag class k. Call this estimate θ(1). In the next iteration, a modification of Cressie’s (1985) weighted least squares algorithm is used by minimizing (1) again. In this case N ij (h k ) ~ ~ Cii (0; θ(1) )C jj (0; θ(1) ) + Cij2 (h k ; θ(1) )

ϖ ij (h k ; θ(1) ) = ~

(3)

These weights are normalized so that each (cross)covariance gets equal weight, (4)

nij

wij (h k ) = ϖ ij (h k ; θ ) / ∑ϖ ij (h m ; θ (1) ) (1)

m =1

Call this estimate θ(2). If a semivariogram is used rather than covariance, the estimates θ(1) is ⎡ ⎛ nii 2 ~ ⎢ T ⎜ ∑ wii (h k ) γ ii ( h k ; θ ) − γˆii ( h k ) + ⎜ k =1 arg min ⎢∑ ⎜ T nij θ ⎢ ~ i =1 ⎢ ⎜⎜ ∑ ∑ wij (h k ) Cij ( h k ; θ ) − Cˆ ij ( h k ) ⎣⎢ ⎝ j =i +1 k =1

(

)

(

⎞⎤ ⎟⎥ ⎟⎥ ⎟ 2 ⎥ ⎟⎥ ⎟ ⎠⎦⎥

(5)

)

where wii(hk) is given by (2). θ( 2 ) is then obtained from (5), with weights as in (4), except that N (h )

ϖ ii (h k ; θ (1) ) = ~ 2 ii k (1) γ ii (h k ; θ ) The estimates θ(1) and θ(2) are two steps in an iteratively re-weighted least-squares algorithm. It is possible to use additional iterations as in Shapiro and Botha, 1991. The estimate θ(2) is used to provide an estimate of the range of autocorrelation, used to obtain a default lag size for the grid method in estimating the empirical semivariogram or covariance, see stage 2 below. Based on our experience, a dozen lags is usually enough to estimate the

11

parameters of the model using the grid method, so the default lag size for the grid method for the next stage of the algorithm is taken to be 2*range/12.

The main difference between this approach and that of Cressie (1985) is that weights do not change during optimization. One of the consequences of this modification is that the algorithm is not sensitive to the number of lags in the grid cells. In fact, it is possible to use even one pair in the cell – its weight will be simply less than the weights of more populated cells. A similar approach for semivariogram calculation was discussed in Shapiro and Botha (1991) for an isotropic process using one variable. Unlike Shapiro and Botha, we estimate anisotropic structure functions (covariance, semivariogram, and cross-covariance) models for transformed variables in two steps, using logarithmic lag scales and then regular lag scales, with additional kernel smoothing of empirical structure functions. All these complications arise as a result of intensive method testing using numerous simulated and real data from different areas of GIS.

Stage 2

Stage 2 essentially repeats stage 1, using an empirical semivariogram or (cross)covariance for the scaled data ~z j (si ) that employs the grid method (Figure 5b) where the default lag size is obtained from the range estimate in Stage 1. It also allows for anisotropy and linear combinations of (cross)covariance or semivariogram models for each data set, such as M ~ Cij (h; θ) = ∑ Bu (i, j ) ρu (h; φu ) . u =1

12

Here, Bu(i,j) is the i,jth component of Bu, a T × T positive-definite matrix, where T is the number of variables, θ contains the parameters Bu(i,j) and φu for all u, M is the number of different (cross)covariance models used in a linear combination, and the function ρu (h; φu ) is a normalized covariance model with ρ u (h = 0; φ u ) = 1. The third iteration of parameter estimates, θ(3), is obtained by minimizing (1) with weights (2) using the empirical covariance and the grid method. θ(4) is obtained by minimizing (1) with weights from (4) and (3) using the empirical covariance and the grid method. These formulas are modified when using semivariograms, as shown in stage 1 (formula 5). Now, it is necessary to change back to the original scale. The final (cross)covariance model is, ~ Cij (h) = σ iσ j Cij (h; θ( 4 ) ) or in the case of semivariograms,

γ ii (h) = σ i2γ~ii (h; θ( 4) ) If the user changes any of the default parameters of the semivariogram/covariance model (lag size, range, nugget, partial sill), then estimates are recalculated beginning at stage 2.

In addition to the weighted least squares, there are several other good methods of fitting spatial semivariogram models, including maximum likelihood, restricted maximum likelihood (REML), and generalized least squares. For a review, see Cressie (1993) and Genton (2001). Next section compares proposed modified weighted least squares approach and REML. Consider the covariance matrix Σ(θ), which depends on spatial parameters θ. The REML equation to be minimized for θ is L(θ) = (z − Xβˆ )′Σ(θ) −1 (z − Xβˆ ) + log(| Σ(θ) |) + log(|X′ Σ(θ) −1 X |) + (n − p) log(2π ) (6) 13

Equation (6) can be minimized by any of the computer intensive minimization methods. The generic minimizer, FMINCON in Matlab, was used.

A simulation example

This section presents data simulated using the spherical semivariogram model: ⎧ ⎡ 3 h 1 ⎛ h ⎞3 ⎤ ⎪⎪θ ⎢ − ⎜ ⎟ ⎥ for 0 ≤ h ≤ θ r , γ (h; θ) = θ n + ⎨ s ⎢ 2 θ r 2 ⎜⎝ θ r ⎟⎠ ⎥ ⎦ ⎪ ⎣ θs θr < h , for ⎪⎩ where θn ≥ 0 is the nugget, θs ≥ 0 is the partial sill and θr ≥ 0 is the range. For each of the simulations θs = 4, θr = 0.25, and θn = 1. For REML and simulations, the semivariogram to a covariance function conversion was used. For semivariogram models with sills, it is easy to go back and forth between semivariogram and covariance functions obtained using the relations,

γ (h) = C (0) − C (h) and C (h) = γ (∞) − γ (h) After converting the spherical semivariogram to a covariance function, the Cholesky decomposition method is applied (Cressie, 1993) for simulating 200 Gaussian data in the region 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1 . 22 data sets are simulated. Then the grid method is used to compute the

empirical semivariogram and fitted to the spherical semivariogram model using modified weighted least squares as described above. REML was used to fit the spherical covariance model and then converted to a semivariogram.

14

The true semivariogram model is shown as the green curve in Figure 9. The average fitted semivariogram is shown as the red curve in Figure 9, with the lower and upper bounds given by the blue curves. The curves were calculated using the following formulas:

γ minimum (h) = min(γ i (h)), i = 1...22 , γ maximum (h) = max(γ i (h)), i = 1...22 , and γ average (h) = average(γ i (h)), i = 1...22 . Figure 9 shows that on average both algorithms described above for fitting the semivariogram worked quite well.

However, REML estimated all semivariogram parameter θr, θn, and, θs more precisely, as presented in the Table 1.

Table 1. Estimation of the semivariogram parameters by REML and WLS. REML Nugget Average value

WLS Sill

Range

Nugget

Sill

Range

0.973295 4.000645 0.250886 1.199273 3.839545 0.269727

22 (1 / 22)∑i =1 (θˆr ,i − θ r ) 2 0.051663 0.513559 0.000717 0.161678 0.583832 0.004794

The difference between true and estimated models near the origin is smaller for REML. To quantify this, the mean integrated squared error (MISE) was used, b

22 1 MISE (a, b) = [γˆi (h) − γ (h)]2 dh ∑ ∫ 22(b − a ) i =1 a

This can be approximated with 15

MISE (a, b) ≅

1 22 [γˆi (h j ) − γ (h j )]2 ∑∑ j∈B 22 | B | i =1

where H = {h j : j / c} j=1, 2, … and some value c, and B ⊂ H where B = {h j : a ≤ h j ≤ b} and |B| is the number of elements in B. The larger the value c, the closer the approximation becomes. It is interesting that MISE is different for different ranges. For example MISE(0,0.1) and MISE(0.1,0.2) is smaller for REML, but MISE(0.3,0.5) is smaller for a modified WLS, Figure 9c. Thus, a REML estimated semivariogram parameters better for the most important short lags.

Figure 10 presents the 22 sets of WLS parameters estimated for the spherical semivariogram model. The parameters are scattered around the true values given in magenta. Also, there is correlation among parameters. For example, the estimate of the nugget effect is often low when the partial sill is high.

The correlation among the parameters is presented below in table 2.

16

Table 2. The correlation between the estimated parameters of the spherical semivariogram model REML Nugget

Sill

WLS Range

-0.46974 0.488066

Nugget Sill

-0.46974

0.138503

Range

0.488066 0.138503

Nugget

Sill

Range

-0.70297 0.789361

Nugget Sill

-0.70297

Range

0.789361 -0.38377

-0.38377

Correlation between semivariogram parameters estimated by WLS is larger because the range is estimated first, and range influences the estimation of other semivariogram parameters. However, this is theoretically undesirable, hence the REML algorithm performed better.

Smooth empirical semivariogram, covariance and cross-covariance surfaces

A smooth semivariogram/covariance surface is a useful exploratory data analysis tool for indicating spatial auto- and cross-correlation. An algorithm for estimating a smooth semivariogram surface is followings. The sector method with logarithmic lag size (Figure 5c) is r used for averaging the pairs. Estimation at the point v of the semivariogram surface was made

using the following formula: N

r γ (v ) =

∑n ⋅e i

i =1

N

r r − Θ⋅ ρ ( v , ci )

∑n ⋅e i =1

i

⋅ γˆi

r r − Θ ⋅ ρ ( v , ci )

,

where N is the number of sectors, γˆi is the average semivariogram value in the sector i ∈1, N , r ni is the number of pairs in sector i ∈1, N , and ci are the coordinates of the center of sector

17

8 ⋅π i ∈1, N . Parameter Θ depends on data and data locations: Θ =

N

∑n i =1

i

S

, where S is an

area of the region under investigation. Figure 11 presents examples of empirical and smooth semivariogram surfaces.

It is assumed that the smooth semivariogram/covariance surface defined above is used for data exploration. To be used in quantitative data analysis, some improvements to surface calculation can be suggested. For example, ρ (a ,b ) can be based on non-Euclidean distance, taking into r r

account anisotropy of the covariance surface, and parameter Θ depends on position on the covariance surface.

Discussion and Conclusion

This chapter described grid-based empirical semivariograms and covariances, and model-fitting using modified weighted least squares, as implemented in the Geostatistical Analyst extension to ArcGIS 8.1 (Johnston et al., 2001). Many simulated and real data sets have been used to test the software. For data sets with more than one hundred data points and with significant spatial dependence, the algorithm gives very good defaults, which allows users to easily update parameters using the cross-validation tool. However, there are situations in which it does not perform well. Examples include situations where the data set is small, where there is large measurement error, and where the data include clusters of high or low values (i.e., they are non-stationary). In these cases, the lag and, 18

consequently, the range are usually overestimated, and the default algorithm produces oversmoothed surfaces. However, the user can control this situation by selecting different lag sizes with the graphical tools. Cross-validation is also an important diagnostic in these instances. If the lag size is selected properly (usually it is obvious how to manually improve default lag estimation), then other model parameters are generally well estimated. For relatively small data sets, REML is almost always better, and this algorithm should be used as an option in the geostatistical software. The only serious drawback of this estimation technique is that it requires much more computing time than WLS, and this may be a serious disadvantage.

19

References

•

Cressie, N., 1985, Fitting variogram models by weighted least squares: Mathematical Geology v. 17, p. 653-702.

•

Cressie, N., 1993, Statistics for spatial data: John Wiley and Sons, New York. 900 p.

•

Deutsch, C.V., and Journel A.G., 1998, GSLIB: geostatistical software library and user’s guide: Oxford University Press, New York, 369 p.

•

Gandin, L.S., 1959, The problem on optimal interpolation: Trudy GGO v. 99, p. 67-75.

•

Gandin, L.S., 1963, Objective analysis of meteorological fields: Gidrometeorologicheskoe Izdatel’stvo, Leningrad, 286 p. (translated by Israel Program for Scientific Translations, Jerusalem, 1965, 242 p.)

•

Genton, M. G., 2001, Robustness problems in the analysis of spatial data, in Spatial Statistics: Methodological Aspects and Applications, M. Moore, ed., Springer Lecture Notes in Statistics 159, p. 21-37.

•

Johnston, K., Ver Hoef, J., Krivoruchko, K., and Lucas N., 2001, Using ArcGIS Geostatistical Analyst: GIS by ESRI, 300 p.

•

Journel, A.G., and Huijbregts, C.J., 1978, Mining geostatistics: Academic Press, London. 600 p.

•

Kaluzny, S.P., Vega, S.C., Cardoso, T.P., and Shelly, A.A., 1998, S+ Spatial Stats. User's manual for Windows and Unix: Springer-Verlag. 384 p.

•

Keller, L., and Friedmann, A., 1925, Differentialeichungen fur die turbulente bewegungner kompressibelen flussigkeit: Proceedings of the 1st Congress for Applied Mathematics.

•

Kolmogorov, A., 1941, Local turbulence structure in an incompressible viscous liquid for very high Reynolds numbers: Doklady AN SSSR, v. 30, No. 4.

•

Krivoruchko K., 2002, Geostatistical Analysis of California Air Quality: Available from ESRI online at http://www.esri.com/software/arcgis/arcgisxtensions/geostatistical/airqualityjgra.pdf.

20

•

SAS Institute Inc., 1996, SAS/STAT technical report: spatial prediction using the SAS system: SAS Institute Inc., Cary, NC, 80 p.

•

Shapiro A., and Botha J.D., 1991, Variogram fitting with a general class of conditionally nonnegative definite functions: Computational Statistics & Data Analysis, v. 11, p. 87-96.

•

Ver Hoef, J.M., and Cressie, N., 1993, Multivariable spatial prediction: Mathematical Geology, v.25, p. 219-240.

•

Zimmerman, D.L., and Zimmerman, M.B., 1991, A comparison of spatial semivariogram estimators and corresponding ordinary kriging predictors: Technometrics, v. 33, p. 77-91.

21

Figure Labels

Figure 1. An empirical semivariogram cloud. The y-axis of each red dot represents the empirical semivariogram of a pair of locations.

Figure 2. Radial sector tolerance regions for binning empirical semivariograms. The red vector hk is the center of the bin, which has a radial tolerance region T(hk) shown in green. Any vector that falls within the green area is assigned to the bin for hk.

Figure 3. Grid tolerance regions for binning empirical semivariograms. The red vector hk is the center of the bin, which has a radial tolerance region T(hk) shown in green. Any vector that falls within the green area is assigned to the bin for hk.

Figure 4. Semivariogram cloud and anisotropical model based on the semivariogram surface Yellow lines represent models in different directions. Ellipse of anisotropy is displayed over the semivariogram surface, blue color.

Figure 5. Empirical semivariograms: (a) sector method with fixed lag size, (b) grid method with fixed lag size, (c) sector method with logarithmic lag size, and (d) grid method with logarithmic lag size.

22

Figure 6. Smoothing the empirical semivariogram. The yellow region is the area in which lags have some positive weighting for the bin with the large red circle. The lag indicated by the blue dot will have an influence in all bins with diagonal hatching.

Figure 7. Cross-covariance modeling using the shift parameter. For the cross-covariance surface, the highest cross-covariance (red and orange colors) is shifted toward the west.

Figure 8. Estimated semivariogram (default) using an automatic model fitting algorithm. The data locations are black dots in two clusters separated by a relatively long distance. The predicted surface using kriging with the estimated semivariogram is also given. For the kriging surface, the cool colors (blue and green) represent low values and the warm colors (red and orange) represent high values.

Figure 9. Comparison of true and fitted semivariogram models. The green line is the true semivariogram, the red line is the average estimated semivariogram over the 22 simulations, and the blue lines are the minimum and maximum semivariogram values from the simulations. a) WLS; b) REML c) Quantitative comparison for different ranges.

Figure 10. 22 sets of parameter estimates of the spherical semivariogram model. Each estimate of the nugget, range, and partial sill is given in green, and the true value is given in magenta. a) WLS; b) REML.

23

Figure 11. Empirical semivariogram cloud and surface using grid method (a), and all semivariogram pairs and a smooth semivariogram surface (b) using the same data set.

Figures

Figure 1.

24

Figure 2.

Figure 3.

25

Figure 4.

a

b

c

d

Figure 5. 26

Figure 6.

Figure 7.

27

Figure 8.

a

b

MISE

Reml

0 0.1 0.981946

Wls

0 0.1 1.764688

Reml

0.1 0.2 4.407449

Wls

0.1 0.2

Reml

0.2 0.3 8.224324

Wls

0.2 0.3 7.892238

5.96006

28

a)

Reml

0.3 0.5 9.090794

Wls

0.3 0.5 6.945768

b)

c)

Figure 9.

a)

b)

Figure 10.

29

a)

b)

Figure 11. Empirical semivariogram cloud and surface using grid method (a), and all semivariogram pairs and a smooth semivariogram surface (b) using the same data set.

30

1 This paper was accepted in 2001 for publication in ...

nowadays. This chapter discuss, in detail, implementation in the commercial software of one of ..... of variables, Î¸ contains the parameters Bu(i,j) and u Ï for all u, ...

Download PDF

876KB Sizes 2 Downloads 162 Views

Report

1 This paper was accepted in 2001 for publication in ...

Recommend Documents