
Basic geostatistics

Austin Troy
How does interpolation work
In ArcGIS, to interpolate:
Create or add a point shapefile with some attribute
that will be used as a Z value
Click Spatial Analyst>>Interpolate to Raster and
then choose the method

Three methods in ArcGIS
IDW
SPLINE
Kriging
Inverse Distance Weighting
IDW weights the value of each point by the inverse of its distance to
the cell being analyzed and averages the weighted values.
IDW assumes that an unknown value is influenced more by
nearby points than by far-away points: influence diminishes with
distance, and we can control how rapid that decay is.
IDW has no built-in method of testing the quality of
predictions, so validity testing requires taking additional
observations.
IDW is sensitive to sampling, often producing circular patterns
around solitary data points.
IDW assumes that the value of an attribute z at any unsampled
point is a distance-weighted average of the sampled points
lying within a defined neighborhood around that
unsampled point. Essentially, it is a weighted moving average.

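Inverse Distance Weighting

$$\hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i \, z(x_i)$$

where the weights $\lambda_i$ are given by some weighting function and

$$\sum_{i=1}^{n} \lambda_i = 1$$

A common form of weighting function is $d^{-p}$, yielding:

$$\hat{z}(x_0) = \frac{\sum_{i=1}^{n} z(x_i) \, d_{ij}^{-p}}{\sum_{i=1}^{n} d_{ij}^{-p}}$$

where $d_{ij}$ is the distance from sample point $x_i$ to the unsampled point being predicted.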
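As a minimal sketch of the distance-weighted average above (not the ArcGIS implementation), the formula can be written in a few lines of Python. The function name `idw_predict` and the sample coordinates are hypothetical, chosen only for illustration, and every sample point is used as a neighbor.

```python
import math

def idw_predict(samples, x0, y0, p=2.0):
    """Distance-weighted average of sample z values at (x0, y0).

    samples: list of (x, y, z) tuples; p: power in the d^-p weighting.
    """
    num = 0.0
    den = 0.0
    for x, y, z in samples:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return z  # the prediction location coincides with a sample point
        w = d ** -p   # unnormalized weight d^-p
        num += w * z
        den += w
    return num / den  # dividing by the weight sum makes the weights sum to 1

# Hypothetical elevation samples (x, y, z) on the corners of a square
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0),
           (0.0, 10.0, 30.0), (10.0, 10.0, 40.0)]

print(idw_predict(samples, 5.0, 5.0))  # equidistant from all four samples
print(idw_predict(samples, 1.0, 1.0))  # pulled toward the nearest sample (z = 10)
```

At the center of the square all four weights are equal, so the prediction is the plain mean of the sample values. Near a corner, the nearest sample dominates.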
IDW-How it works
The Z value at location ij is a function of the Z values
at known points xy, each multiplied by the inverse
distance raised to a power P.
Z value field: the numeric attribute to be
interpolated
Power: determines the relationship between
weighting and distance. Where p = 0,
there is no decrease in influence with
distance; as p increases, distant points
become less influential in
interpolating the Z value at a given pixel
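The effect of the power can be seen by printing the normalized weights for a few values of p. This is an illustrative sketch: the helper `idw_weights` and the three distances are hypothetical.

```python
def idw_weights(distances, p):
    """Normalized IDW weights (they sum to 1) for the given distances."""
    raw = [d ** -p for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

distances = [1.0, 2.0, 4.0]  # hypothetical distances to three sample points

for p in (0, 1, 2, 4):
    print(p, [round(w, 3) for w in idw_weights(distances, p)])
```

With p = 0 every point receives the same weight, so the prediction is a simple average. As p grows, the weight concentrates on the nearest point.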
IDW-How it works
There are two IDW method options, variable and fixed radius:


1. Variable (or nearest neighbor): the user defines how many
neighbor points are used to define the value for each
cell
2. Fixed Radius: the user defines a radius within which every
point is used to define the value for each cell
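The two neighborhood options can be sketched as two selection functions. This is a simplified illustration, not the ArcGIS search logic: the function names and sample points are hypothetical, and details such as a minimum point count or a maximum search distance are left out.

```python
import math

def variable_neighbors(samples, x0, y0, k):
    """Variable radius: the k samples nearest to (x0, y0)."""
    by_distance = sorted(samples, key=lambda s: math.hypot(s[0] - x0, s[1] - y0))
    return by_distance[:k]

def fixed_radius_neighbors(samples, x0, y0, radius):
    """Fixed radius: every sample within `radius` of (x0, y0)."""
    return [s for s in samples if math.hypot(s[0] - x0, s[1] - y0) <= radius]

# Hypothetical samples (x, y, z)
samples = [(0.0, 0.0, 10.0), (3.0, 0.0, 12.0), (0.0, 4.0, 15.0), (9.0, 9.0, 30.0)]

print(variable_neighbors(samples, 1.0, 1.0, k=2))
print(fixed_radius_neighbors(samples, 1.0, 1.0, radius=3.5))
```

The variable option always returns the same number of points, however far away they are. The fixed option returns however many points fall inside the circle, which may be none.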

IDW-How it works
Can also define Barriers: the user chooses whether to
exclude certain points from the calculation of a
new value for a cell, even if the point is near. E.g., we wouldn't
use an elevation point on one side of a ridge to create an
elevation value on the other side of the ridge. The user chooses a
line theme to represent the barrier
IDW-How it works
What is the best P to use?
It is the P where the Root Mean Squared
Prediction Error (RMSPE) is lowest, as
in the graph on the right
To determine this, we need a test,
or validation, data set showing Z values
at x,y locations that are not included in the
prediction data. We then look for
discrepancies between actual and
predicted values, and keep changing the P
value until we get the minimum level of
error. Without this, we just guess.
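The search for the best P can be sketched as a loop over candidate powers, scoring each one by its RMSPE on a held-out validation set. This is an illustrative sketch under stated assumptions, not the Geostatistical Wizard's optimizer: the function names, the candidate list, and both data sets are hypothetical.

```python
import math

def idw_predict(samples, x0, y0, p):
    """IDW prediction at (x0, y0) from (x, y, z) samples with power p."""
    num = den = 0.0
    for x, y, z in samples:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            return z
        w = d ** -p
        num += w * z
        den += w
    return num / den

def rmspe(train, validation, p):
    """Root mean squared prediction error over the validation points."""
    squared = [(idw_predict(train, x, y, p) - z) ** 2 for x, y, z in validation]
    return math.sqrt(sum(squared) / len(squared))

def best_power(train, validation, candidates):
    """Candidate power with the lowest RMSPE on the validation set."""
    return min(candidates, key=lambda p: rmspe(train, validation, p))

# Hypothetical prediction (training) data and held-out validation data
train = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0),
         (10.0, 10.0, 40.0), (5.0, 5.0, 27.0)]
validation = [(2.0, 2.0, 16.0), (8.0, 3.0, 24.0), (3.0, 8.0, 29.0), (7.0, 7.0, 33.0)]
candidates = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]

for p in candidates:
    print(p, round(rmspe(train, validation, p), 3))
print("best p:", best_power(train, validation, candidates))
```

The chosen value is only the best among the candidates tried, and only for this validation set.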
IDW-How it works
This can be done in ArcGIS using the Geostatistical Wizard
You can look for an optimal P by testing your sample point
data against a validation data set
This validation set can be another point layer or a raster layer
Example: we have elevation data points and we generate a
DTM. We then validate our newly created DTM against an
existing DTM, or against another existing elevation point data
set. The computer determines the optimum P that
minimizes our error
IDW-How it works
Optimizing P value
Plot of model fits
The blue line indicates the degree of spatial autocorrelation (required for interpolation).
The closer it is to the dashed (1:1) line, the more perfectly autocorrelated the data.
Where it is horizontal, it indicates data independence.
A mean prediction error near zero means the predictions are unbiased
Plot of model errors
Spline Method
Another option for the interpolation method
This fits a curve through the sample data and assigns values to
other locations based on their position on the curve
Thin plate splines create a surface that passes through the sample
points with the least possible change in slope at all points,
that is, a minimum curvature surface.
Uses piecewise functions fitted to a small number of data
points, but the joins are continuous, so one part of the
curve can be modified without recomputing the whole
The overall function is continuous, with continuous first and
second derivatives.
Spline Method
SPLINE has two types: regularized and tension
Tension results in a rougher surface that more closely
adheres to abrupt changes in sample points
Regularized results in a smoother surface that smooths out
abruptly changing values somewhat
Spline Method
Weight: this controls the tautness of the curves.
A high weight value with the Regularized type will
result in an increasingly smooth output surface.
Under the Tension type, increases in the weight will
cause the surface to become stiffer, eventually
conforming closely to the input points.
Number of points: the number of points around a cell that will be used to
fit a polynomial function to a curve
Pros and Cons of Spline Method
Splines retain smaller features, in
contrast to IDW
Produce a clear overview of the data
Continuous, so it is easy to calculate
derivatives for topology
Results are sensitive to the locations of
break points
As with IDW, there is no estimate of errors
Can often result in over-smooth surfaces
Kriging Method
Like IDW interpolation, Kriging forms weights from surrounding
measured values to predict values at unmeasured locations. As with
IDW interpolation, the closest measured values usually have the
most influence. However, the kriging weights for the surrounding
measured points are more sophisticated than those of IDW. IDW
uses a simple algorithm based on distance, but kriging weights
come from a semivariogram that was developed by looking at the
spatial structure of the data. To create a continuous surface or map
of the phenomenon, predictions are made for locations in the study
area based on the semivariogram and the spatial arrangement of
measured values that are nearby.
--from ESRI Help
Kriging Method
Kriging is a geostatistical method and a probabilistic method,
unlike the others, which are deterministic. That is, there is a
probability associated with each prediction. Kriging has both a
deterministic and a probabilistic component, respectively $\mu(s)$ and $\varepsilon(s)$:
$Z(s) = \mu(s) + \varepsilon(s)$, where both are functions of location $s$
Assumes spatial variation in the variable is too irregular to be modeled
by a simple smooth function, so it is better modeled as a stochastic surface
Interpolation parameters (e.g. weights) are chosen to optimize the function
Assumes that a variable in space can be modeled as the sum of three
components: 1) a structural/deterministic part, 2) a random but spatially
correlated part, and 3) a spatially uncorrelated random part
Kriging Method
Hence, the foundation of Kriging is the notion of spatial autocorrelation,
or the tendency of values of entities closer in space to be related.
This violates the assumptions of classical statistical models, in which
observations are assumed to be independent.
Autocorrelation can be assessed using a semivariogram, which
plots the differences between pairs of values (variance) against their distances.
Where autocorrelation exists, the
semivariance should increase until a certain
distance where SV = variance around the mean,
so it flattens out. That value is called the sill.
The sloped area, or range, is where values
are related to each other. The intercept is the nugget
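Semivariance
Semivariogram(distance h) = 0.5 * average [ (value at location i - value at location j)^2 ] OR

$$\hat{\gamma}(h) = \frac{1}{2n} \sum_{i=1}^{n} \{ z(x_i) - z(x_i + h) \}^2$$

where $n$ is the number of pairs of sample points separated by distance $h$.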


Based on the scatter of points, the computer (Geostatistical Analyst)
fits a curve through those points
The inverse is the covariance matrix, which
shows correlation over space
Steps
Variogram cloud; can use bins to make
box plot
Empirical variogram: choose bins and lags
Model variogram: fit function through
empirical variogram
Functional forms?
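The empirical variogram step can be sketched as follows: compute the squared difference for every pair of points (the variogram cloud), group the pairs into distance bins, and take half the mean squared difference in each bin. This is an illustrative sketch with hypothetical names and data. It bins on distance only and ignores direction.

```python
import math
from itertools import combinations

def empirical_semivariogram(samples, lag_width, n_lags):
    """Return (mean distance, semivariance, pair count) for each non-empty bin.

    Bin b collects the pairs whose separation h satisfies
    b * lag_width <= h < (b + 1) * lag_width.
    """
    sq_sums = [0.0] * n_lags   # sum of squared z differences per bin
    d_sums = [0.0] * n_lags    # sum of pair distances per bin
    counts = [0] * n_lags      # number of pairs per bin
    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        h = math.hypot(x1 - x2, y1 - y2)
        b = int(h // lag_width)
        if b < n_lags:
            sq_sums[b] += (z1 - z2) ** 2
            d_sums[b] += h
            counts[b] += 1
    return [(d_sums[b] / counts[b], sq_sums[b] / (2 * counts[b]), counts[b])
            for b in range(n_lags) if counts[b] > 0]

# Hypothetical transect: six points one unit apart, with z rising along x
samples = [(float(i), 0.0, float(i)) for i in range(6)]

for mean_h, gamma, n_pairs in empirical_semivariogram(samples, lag_width=1.0, n_lags=4):
    print(mean_h, gamma, n_pairs)
```

In this toy transect the semivariance rises with distance, which is the pattern expected where autocorrelation exists. A model variogram would then be fitted through these binned points.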

Variogram
Plots semi-variance against
distance between points
Is binned to simplify
Can be binned based on
just distance (top) or
distance and direction
(bottom)
Where autocorrelation
exists, the semivariance
should have slope
Look at variogram to find
where slope levels
Binning based on distance only
Binning based on distance and
direction
Variogram
SV value where it flattens
out is called a sill.
The distance range for
which there is a slope is
called the neighborhood;
this is where there is
positive spatial structure
The intercept is called the
nugget and represents
random noise that is
spatially independent
[Figure: semivariogram with the sill, range, and nugget labeled]
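The sill, range, and nugget are the parameters of the curve fitted through the empirical variogram. As one example of a functional form, the sketch below uses the spherical model. The choice of model is an assumption made for illustration, as are the function name and parameter values. Here `sill` is the total sill, including the nugget.

```python
def spherical(h, nugget, sill, rng):
    """Spherical semivariogram model evaluated at distance h.

    nugget: intercept; sill: value where the curve flattens (nugget included);
    rng: the range, the distance at which the sill is reached.
    """
    if h == 0.0:
        return 0.0              # semivariance of a point with itself is zero
    if h >= rng:
        return sill             # beyond the range the curve is flat
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

# Hypothetical parameters: nugget 2, sill 10, range 50
for h in (0.0, 1.0, 10.0, 25.0, 50.0, 80.0):
    print(h, round(spherical(h, nugget=2.0, sill=10.0, rng=50.0), 3))
```

Just above h = 0 the curve starts near the nugget, climbs through the range, and stays at the sill for all larger distances.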
Functional Forms
From Fortin and Dale Spatial Analysis
Kriging Method
We can then use a scatter plot of predicted versus actual values to
see the extent to which our model actually predicts the values
If the blue line and the points lie along the 1:1 line this indicates
that the kriging model predicts the data well
Kriging Method
The fitted variogram results in a series of matrices and vectors that
are used in weighting and locally solving the kriging equation.
Basically, at this point, it is similar to other interpolation methods
in that we are taking a weighted moving average, but the weights
($\lambda_i$) are based on statistically derived autocorrelation measures.
The $\lambda_i$ are chosen so that the estimate $\hat{z}(x_0)$ is unbiased and the
estimation variance is less than for any other possible linear combination
of the observed values.
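To make the "matrices and vectors" concrete, here is a minimal ordinary kriging sketch in plain Python. It is not the ArcGIS implementation. It assumes an exponential semivariogram with zero nugget, a hypothetical sill of 1 and range of 10, and hypothetical sample data. It builds the system of semivariances between samples, adds the constraint that the weights sum to 1 through a Lagrange multiplier, and solves for the weights.

```python
import math

def exponential(h, sill=1.0, rng=10.0):
    """Assumed exponential semivariogram model with zero nugget."""
    return sill * (1.0 - math.exp(-3.0 * h / rng))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(samples, x0, y0, gamma=exponential):
    """Return (prediction, kriging variance, weights) at (x0, y0)."""
    n = len(samples)
    # Semivariances between every pair of samples, plus the sum-to-one row/column
    a = [[gamma(math.hypot(xi - xj, yi - yj)) for xj, yj, _ in samples] + [1.0]
         for xi, yi, _ in samples]
    a.append([1.0] * n + [0.0])
    # Semivariances between each sample and the prediction location
    b = [gamma(math.hypot(xi - x0, yi - y0)) for xi, yi, _ in samples] + [1.0]
    solution = solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = sum(w * z for w, (_, _, z) in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return prediction, variance, weights

# Hypothetical elevation samples (x, y, z)
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0),
           (0.0, 10.0, 30.0), (10.0, 10.0, 40.0)]

prediction, variance, weights = ordinary_kriging(samples, 5.0, 5.0)
print(prediction, variance, weights)
```

Unlike IDW, the result includes a variance for each prediction. With a zero nugget, predicting at a sample location returns that sample's value with a variance of zero.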

Kriging Method
Produces four types of prediction maps:
Prediction Map: predicted values
Probability Map: probability that the value exceeds a threshold x
Prediction Standard Error Map: fit of the model
Quantile Map: probability that the value exceeds a certain quantile
Kriging Method
Semivariograms measure the strength of statistical correlation as a
function of distance; they quantify spatial autocorrelation
Because Kriging is based on the semivariogram, it is probabilistic,
while IDW and Spline are deterministic
Kriging associates some probability with each prediction, hence it
provides not just a surface, but some measure of the accuracy of
that surface
Kriging equations are determined by fitting line through points so
as to minimize weighted sum of squares between points and line
These equations are weighted based on spatial autocorrelation,
which is determined from the semivariograms
Kriging: Ordinary vs. Universal

Known as Kriging in the presence of universal
trends.
Universal kriging is used where there is an
underlying trend beyond the simple spatial
autocorrelation
Generally this trend occurs at a different scale
Trend may be fn of some geographic feature that
occurs on one part of the map
Example
Here are some sample elevation points from which surfaces were
derived using the three methods
Example: IDW
Done with P = 2. Notice how it is not as smooth as Spline. This is
because of the weighting function introduced through P
Example: Spline
Note how smooth the curves of the terrain are; this is because
Spline is fitting a simple polynomial equation through the points
Example: Kriging
This one is kind of in between, because it fits an equation
through the points, but weights it based on probabilities
Kriging output: prediction
