Geomath Lect12 Stats

Probability and Statistics in Geology
Probability and statistics are an important aspect of Earth Science.
Understanding the details of a population from a data sample
How rounded are these pebbles ? Where did they come from ?
How likely is an earthquake here in Northridge ?
Probability and Statistics in Geology
Statistics
Histograms
Probability
Error Analysis
Regression
Discuss next week
Are Statistics Always Right ?

Can They be Misleading ?
Toss a coin 6 times....Heads or tails ?
What is most unlikely ?

.....six tails
What is more likely ?

.....3 heads and 3 tails
So is HTHTHT more likely than TTTTTT ? ...No, both are equally unlikely!
Are Statistics Always Right ?

Can They be Misleading ?
The result 3 heads and 3 tails is more likely only because there are many combinations where this can occur (e.g. HHTHTT, or HTHTTH, or HHHTTT...)
Let's try it...
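The coin argument can be checked by counting. The sketch below assumes a fair coin and independent tosses; the variable names are illustrative only.

```matlab
n = 6;                         % number of tosses
p_sequence = (1/2)^n;          % probability of any ONE exact sequence (HTHTHT, TTTTTT, ...)
ways_3H = nchoosek(n, 3);      % number of different sequences with exactly 3 heads
p_3H = ways_3H * p_sequence;   % probability of 3 heads and 3 tails in any order
p_6T = nchoosek(n, 0) * p_sequence;   % only one sequence gives six tails

fprintf('One exact sequence:        %.4f\n', p_sequence);
fprintf('Sequences with 3 heads:    %d\n', ways_3H);
fprintf('3 heads, 3 tails (any order): %.4f\n', p_3H);
fprintf('Six tails:                 %.4f\n', p_6T);
```

Each exact sequence has probability 1/64, but 20 of the 64 sequences contain 3 heads and 3 tails, so that outcome has probability 20/64.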
What is a Statistic ?
Is this a statistic ?
In 1970, the oil refining capacity of Belgium
was 32.6 million tonnes per year
This is actually just a fact, not a statistic
What is a Statistic ?
Consider a pebbly beach
How could you determine the composition, mass, length, and shape of these particular pebbles ?
Would these sizes be the same on every beach ?
What is a Statistic ? - Specimen
Let's pick up a pebble and look at it: this is a specimen
This pebble could probably give us the composition, but would it be representative of all the pebbles ? Is it typical ?
How could we improve on this specimen ?
What is a Statistic ? - Sample
We could pick up 100 pebbles: this is a sample from the beach
This should give you a much better idea of your beach rocks
Could we do any better ?
What is a Statistic ? - Population
Or we could sample ALL the pebbles on the beach!
This is the population of all pebbles
Now measure the composition, size, and shape of each
Is this a realistic plan ?
What is a Statistic ? - Population
Specimen:    One object
Sample:      A subset of the objects
Population:  All the objects
These terms are often misused in science and literature.
Faults in Southern California
Above is a map of faults found in southern California
If we just study the San Jacinto fault, what is this called statistically ?
If we study the system of the San Jacinto, Elsinore, and San Andreas faults, what is this called statistically ?
So What is a Statistic ?
Is the average mass of a pebble a statistic ?
This depends on whether this average is determined from a sample of pebbles or the total population...
If we take the average of the total population, this is considered a parameter and is now a simple fact
The average of a sample, however, is a statistic.
So What is a Statistic ?
A statistic is an attempt to estimate the average mass of all the pebbles by calculating the average mass of some of the pebbles
Statistics are generally based on a sample of the population
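The distinction can be written side by side. The notation here is an assumption for illustration, not from the slides: $\mu$ for the population average, $N_{\text{pop}}$ for the number of pebbles on the whole beach, and $n$ for the number of pebbles in the sample.

```latex
\mu = \frac{1}{N_{\text{pop}}} \sum_{i=1}^{N_{\text{pop}}} w_i
\quad \text{(parameter: every pebble on the beach)}
\qquad
\bar{w} = \frac{1}{n} \sum_{i=1}^{n} w_i
\quad \text{(statistic: only the } n \text{ sampled pebbles)}
```

The two formulas are the same calculation; only the set of pebbles summed over differs, and $\bar{w}$ is used as an estimate of $\mu$.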
Election Polls
Polling question: Who did the best job in the debate ?

Obama 54%
McCain 30%
Estimates of voter intentions obtained before an election are statistics...a sample of the population
Obama 365
McCain 162
Election Polls
Obama 66,882,230
McCain 58,343,671
The final result of an election, however, is an election parameter
The final result is a fact, a measure of the entire voting population
Back to the Pebbly Beach

Average, Mean, and Median
Pebble#     1    2    3    4    5    6    7    8    9   10
Mass (g)  374  389  395  364  224  250  378  376  330  310
The typical mass of pebbles on a particular beach can be described by the mean, $\bar{w}$ (same as the average)
$\bar{w} = \frac{1}{N} \sum_{i=1}^{N} w_i$
The mean is the total mass of the sample divided by the number of pebbles - What is the mean of these pebbles ?
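As a quick check of the question above, the mean of the ten masses in the table can be computed directly. This is a minimal sketch; the variable names are illustrative.

```matlab
% Masses (g) of the 10 pebbles listed in the table
w = [374 389 395 364 224 250 378 376 330 310];

N = numel(w);          % number of pebbles
wbar = sum(w) / N;     % total mass divided by the number of pebbles

fprintf('Total mass = %d g, N = %d, mean = %.1f g\n', sum(w), N, wbar);
```

The total mass is 3390 g, so the mean is 339 g. The built-in call mean(w) gives the same value.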

Pebble#     1    2    3    4    5    6    7    8    9   10   11
Mass (g)  225  250  310  330  364  374  376  378  389  395  399
Another way of finding the typical mass of pebbles is to use the median value.
Median means middle and is the mass of the middle pebble if all are lined up (ranked) from lightest to heaviest.
With an odd number of pebbles, the median is simply the mass of the middle pebble
In the above example, pebble #6 has a mass of 374 g, which gives the median value of this pebble sample

Will the median always be the same as the mean ?

With an even number of pebbles (e.g. 100), you can average the masses of the two middle pebbles, here the 50th and 51st.
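Both cases can be tried on the ranked table of 11 pebbles. This sketch uses the first 10 of them as a stand-in for an even-sized sample; that choice is only for illustration.

```matlab
% Ranked masses (g) of the 11 pebbles in the table
w = sort([225 250 310 330 364 374 376 378 389 395 399]);

% Odd number of pebbles: take the middle one
n = numel(w);
med_odd = w((n + 1) / 2);

% Even number of pebbles: average the two middle ones
w10 = w(1:10);
med_even = (w10(5) + w10(6)) / 2;

fprintf('Median of 11 pebbles: %d g (mean is %.1f g)\n', med_odd, mean(w));
fprintf('Median of first 10 pebbles: %.1f g\n', med_even);
```

For the 11 pebbles the median is 374 g while the mean is about 344.5 g, so the two measures do not have to agree. The built-in median(w) handles both the odd and even cases.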

Back to the Pebbly Beach - Dispersion
What about other aspects of the distribution of pebbles ?
How can we tell if the pebbles are similar in size (i.e. well or poorly sorted) ?
We could give the total range of sizes, a simple measure of the dispersion
But how much does this tell us about all the sample pebbles ?

The heaviest and lightest pebbles may not be typical
One way to get a more accurate measure of how similar your pebbles are is to use the mean square deviation from the mean, also known as the variance, $\sigma^2$
$\sigma^2 = \overline{(w_i - \bar{w})^2} = \frac{1}{N} \sum_{i=1}^{N} (w_i - \bar{w})^2$
- the bar indicates the average of all calculations
The standard deviation, $\sigma$, is the square root of this value.

$\sigma^2 = \overline{(w_i - \bar{w})^2}$
Why do we square this difference ?
Some differences will be negative; we just want the size of the deviation of each mass from the average value.
If $\sigma^2$ is small then the masses are similar and the pebbles are well sorted
If $\sigma^2$ is large then the masses are widely varying and the pebbles are poorly sorted
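The mean square deviation can be computed step by step for the 11 pebbles in the table. This is a sketch that divides by N, following the "average of all calculations" definition; the variable names are illustrative.

```matlab
% Masses (g) of the 11 pebbles in the table
w = [225 250 310 330 364 374 376 378 389 395 399];

wbar = mean(w);            % mean mass
dev = w - wbar;            % deviation of each pebble from the mean (some negative)
sigma2 = mean(dev.^2);     % mean square deviation (variance), dividing by N
sigma = sqrt(sigma2);      % standard deviation

fprintf('Sum of raw deviations: %.2e (they cancel out)\n', sum(dev));
fprintf('Variance: %.1f g^2, standard deviation: %.1f g\n', sigma2, sigma);
```

The raw deviations sum to zero, which is why they are squared before averaging. Here the variance is about 3225 g^2 and the standard deviation about 57 g. Note that the built-in var(w) and std(w) divide by N-1 by default; var(w, 1) and std(w, 1) divide by N and match the values above.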
Visualizing Distribution of Data
How can you display graphically the distribution of a large number of pebbles ?
Which sizes occur most often ? Which are fairly rare ?
Visualizing Distribution of Data: Histogram

Frequency Distribution
Range (g)   Number
200-235        1
236-260        3
261-285        7
286-315        9
316-335       16
336-365       22
366-385       19
386-415       14
416-435        6
436-465        2
[Figure: histogram of frequency against pebble mass (g)]
A histogram displays the pebble mass count in bins (10 bins shown)
We first count the number of occurrences (frequency) in each bin and list them in a table called the frequency distribution
Then plot this frequency as a bar chart against mass
Histograms in Matlab (or Octave)

[Figure: histogram of the count of all pebbles (frequency) against pebble mass (g)]
To plot histograms in Matlab:

>> x = 200:25:500;        % set bin centers: range 200-500 g in increments of 25 g
>> y = pebblefile(:,2);   % read column 2 of the pebble data (the masses)
>> hist(y,x)              % plots the histogram for data (y) and bins (x)
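The same commands can be tried without a data file. This sketch types in the 11 masses from the pebble table in place of pebblefile, and asks hist for the counts so the frequency distribution can be listed before plotting.

```matlab
% Masses (g) of the 11 pebbles from the table, used in place of pebblefile(:,2)
y = [225 250 310 330 364 374 376 378 389 395 399];
x = 200:25:500;            % bin centers, 25 g apart

counts = hist(y, x);       % with an output argument, hist returns the counts

% List the frequency distribution
for k = 1:numel(x)
    fprintf('bin centered on %3d g: %d\n', x(k), counts(k));
end

% Plot the counts as a bar chart against mass
bar(x, counts);
xlabel('Pebble mass (g)');
ylabel('Frequency');
```

With only 11 pebbles most bins are empty; the bin centered on 375 g holds the most pebbles (4).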
Visualizing Distribution of Data

Marine seismic study, Weeraratne et al., 2007
We're interested in earthquake paths which come from every possible azimuth within 360° (the back azimuth).
How can we graphically represent the distribution of cyclical data or direction ?
Visualizing Distribution of Data: Rose Diagrams

A rose diagram is like plotting a histogram on a polar graph.
The direction is represented by the angle around the plot and the frequency is proportional to distance from the center.
Here frequency ranges from 0 to 6, and an angle of 30° is the most frequent, occurring 6 times.
A list of fault dip angles could be plotted in this way.
Plotting Rose Diagrams in Matlab (or Octave)
To plot rose diagrams in Matlab:

>> dip = faultdipfile(:,1);       % reads first column of data input
>> dipradians = dip.*pi./180;     % converts angles to radians
>> bins = 100;                    % specify the number of bins
>> rose(dipradians,bins)          % plot the rose diagram
Probability
What is Probability ?
If I measure a large number of data points, how often do I obtain a particular result ?
For the pebble masses measured here, the most probable mass is about 350 grams (the 336-365 g bin)
This mass value occurs in 22 (frequency) out of 100 cases, or 22% of the time.
Thus the estimated probability of picking up a pebble in this area with a mass of about 350 grams is 22%.
[Figure: histogram of frequency against pebble mass (g)]
Probability
Frequency Distribution & Probability
Range (g)   Number   Probability
200-235        1        .01
236-260        3        .03
261-285        7        .07
286-315        9        .09
316-335       16        .16
336-365       22        .22
366-385       19        .19
386-415       14        .14
416-435        6        .06
436-465        2        .02
[Figure: histogram of probability against pebble mass (g)]
We can then add another column to the data which shows the probability for each bin
You can now plot probability in a histogram
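The probability column is each bin count divided by the number of pebbles. This sketch takes N = 100 from the slide as an assumption; note that the counts as listed add up to 99, so dividing by sum(counts) instead would give slightly different values.

```matlab
% Bin counts from the frequency distribution table
counts = [1 3 7 9 16 22 19 14 6 2];
centers = [217.5 248 273 300.5 325.5 350.5 375.5 400.5 425.5 450.5];  % bin midpoints (g)

N = 100;                   % number of pebbles, as stated on the slide (assumption)
prob = counts / N;         % estimated probability for each bin

[pmax, imax] = max(prob);  % most probable bin
fprintf('Most probable bin is centered near %.0f g with probability %.2f\n', ...
        centers(imax), pmax);

% Plot probability instead of frequency
bar(centers, prob);
xlabel('Pebble mass (g)');
ylabel('Probability');
```

The most probable bin is 336-365 g, with an estimated probability of 0.22.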
Probability: What is Normal ?

You can compare your data distribution to theoretical estimates
The most common distribution used is a normal distribution, also known as a Gaussian distribution.
Gaussian Distribution
$P(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{(x - \bar{x})^2}{2\sigma^2} \right]$
The Gaussian distribution is written as above and describes the relative probability of obtaining the value, x.
Here $\sigma$ is the standard deviation and $\bar{x}$ is the average of all x
[Figure: plot of P(x) against x]
This is a Gaussian distribution for $\bar{x} = 5.0$ and $\sigma = 2.0$
You are more likely to obtain a value between 4-6 where the graph is high
And less likely to obtain a value between 1-2, or 9-10
We can quantify this by looking at the area under the curve. The total area under the curve is 1.0
The area under the curve between 1 - 2 is shown in gray.
This area is much smaller than the dark gray block between 4 - 7.
To quantify these areas we use established values for multiples of the standard deviation from the mean
[Figure: plot of P(x) against x, with the ranges 1.0$\sigma$ and 2.0$\sigma$ marked]
The area under the curve between 3-7 is 0.683 and is termed $\pm 1.0\sigma$
(this is known as the 68% confidence limit)
The area under the curve between 1-9 is 0.954 and is termed $\pm 2.0\sigma$
(this is known as the 95% confidence limit)
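These areas can be reproduced with the error function erf, which is how the area under a Gaussian is usually evaluated. The sketch below assumes the same mean of 5.0 and standard deviation of 2.0 as the plotted example; the helper name area is illustrative.

```matlab
xbar = 5.0;     % mean of the example distribution
sigma = 2.0;    % standard deviation of the example distribution

% Area under the Gaussian between a and b
area = @(a, b) 0.5 * (erf((b - xbar) / (sigma * sqrt(2))) ...
                    - erf((a - xbar) / (sigma * sqrt(2))));

fprintf('Area between 3 and 7 (1 sigma): %.3f\n', area(3, 7));
fprintf('Area between 1 and 9 (2 sigma): %.3f\n', area(1, 9));
fprintf('Area between 1 and 2:           %.3f\n', area(1, 2));
fprintf('Area between 4 and 7:           %.3f\n', area(4, 7));
```

The first two values agree with the 0.683 and 0.954 quoted above, and the area between 1 and 2 comes out far smaller than the area between 4 and 7.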
Linear Regression:
How to Fit a Line to Scattered Data
Now that we've learned statistical analysis of a single variable, we can also consider statistical analysis of two related variables.
We may be able to approximate this relationship by a straight line.
How do we find this line ? Which line is best ?
[Figure: scatter plot of pebble diameter against distance from shore (m)]
Linear Regression:
The line drawn to the right is one possibility.
How can we determine whether this line is better than another in a quantitative way ?
We can calculate the mean square deviation by looking at the distance of each point from the predicted line
[Figure: scatter plot of pebble diameter with a candidate line]
The deviation of one point is shown by $\Delta y$ and is measured in the y direction only.
Linear Regression:
This gives you the deviation of one point from the line.
To obtain the mean square deviation, we take the average of $\Delta y^2$ for all points
We calculate this using the same mean square deviation equation which we used before for the standard deviation.
$\sigma^2 = \overline{(y - y_{\text{line}})^2}$, where $y_{\text{line}}$ is the value predicted by the line
The line with the smallest $\sigma^2$ will have the best fit to the data
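The comparison of candidate lines can be sketched as follows. The data values and the two lines "drawn by eye" are hypothetical, invented only for illustration; polyfit is used to find the least-squares line, which is the line that minimizes the mean square deviation in y.

```matlab
% Hypothetical data, invented for illustration only (not from the lecture)
x = [0 10 20 30 40];       % distance from shore (m)
y = [50 42 39 30 24];      % pebble diameter (arbitrary units)

% Mean square deviation of the points from the line y = m*x + b (y direction only)
msd = @(m, b) mean((y - (m * x + b)).^2);

msd_guess1 = msd(-0.5, 48);    % first line drawn by eye
msd_guess2 = msd(-0.7, 51);    % second line drawn by eye

p = polyfit(x, y, 1);          % least-squares line: p(1) = slope, p(2) = intercept
msd_fit = msd(p(1), p(2));

fprintf('Guess 1: %.2f   Guess 2: %.2f   Least-squares fit: %.2f\n', ...
        msd_guess1, msd_guess2, msd_fit);
fprintf('Best line: y = %.2f x + %.2f\n', p(1), p(2));
```

Of the two guessed lines the second is the better fit, and no straight line can give a smaller mean square deviation than the one returned by polyfit.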
