• Univariate Statistics
measure of location
measures of spread
measure of shape
tendency toward normal or lognormal behavior
data transforms
• Bivariate Statistics
Correlation Coefficient
Linear Regression
Q-Q / P-P Plots
• Follow instructions in handout EX‐ Log Stats‐PROB.doc with data in EX‐Log Stats‐DATA.xls
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
$\bar{x} = \frac{\sum_{i} f_i x_i}{n}$
x̄ = 0.5*0.2652 + 1.5*0.1843 + 2.5*0.1251 + 3.5*0.1169 + 4.5*0.0742 + 5.5*0.0569 + 6.5*0.0547 +
7.5*0.0464 + 8.5*0.0247 + 9.5*0.0217 + 10.5*0.0120 + 11.5*0.0090 + 12.5*0.0060 +
13.5*0.0007 + 15.5*0.0015 + 16.5*0.0007 = 3.25%
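The weighted sum above can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the 9.5% class is entered as 0.0217 (29 of the 1335 samples in the class frequency table), which is the value that makes the relative frequencies add up to one.

```python
# Class midpoints (porosity, %) and relative frequencies from the slide.
# Assumption: the 9.5 class uses 0.0217 (= 29 / 1335), so the fractions sum to 1.
midpoints = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5,
             10.5, 11.5, 12.5, 13.5, 15.5, 16.5]
rel_freq = [0.2652, 0.1843, 0.1251, 0.1169, 0.0742, 0.0569, 0.0547, 0.0464,
            0.0247, 0.0217, 0.0120, 0.0090, 0.0060, 0.0007, 0.0015, 0.0007]

# The fractions should sum to (almost exactly) one.
total_fraction = sum(rel_freq)

# Weighted mean: sum of midpoint times relative frequency.
weighted_mean = sum(x * f for x, f in zip(midpoints, rel_freq)) / total_fraction

print(f"sum of fractions = {total_fraction:.4f}")
print(f"mean porosity    = {weighted_mean:.2f} %")
```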
[Figure: histogram of Frequency with Cumulative % curve; x-axis: Porosity Range, 0.13 to 0.27]
$f(x) = 2\exp(-2x), \quad x > 0$
$f(x) = 0, \quad x \le 0$
$E(x) = \int x \, f(x) \, dx$
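As a numerical check of the expected-value integral, the sketch below assumes the density reads f(x) = 2 exp(-2x) for x > 0 and zero otherwise. It integrates with a plain trapezoid rule; the helper names and the upper cut-off of 20 are illustrative choices, not taken from the slides. The integral of f should come out close to 1 and E(x) close to 1/2.

```python
import math

def pdf(x):
    """Assumed density: 2*exp(-2x) for x > 0, zero otherwise."""
    return 2.0 * math.exp(-2.0 * x) if x > 0 else 0.0

def trapezoid(func, a, b, steps):
    """Composite trapezoid rule on [a, b] with equal steps."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h

upper = 20.0  # the tail beyond this point is negligible (about exp(-40))
area = trapezoid(pdf, 0.0, upper, 200_000)
expected = trapezoid(lambda x: x * pdf(x), 0.0, upper, 200_000)

print(f"area under f(x) = {area:.4f}")
print(f"E(x)            = {expected:.4f}")
```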
$\bar{x} = \frac{\sum_{i=a} x_i}{(b - a)}$
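$\bar{x}_g = (x_1 x_2 x_3 \cdots x_n)^{1/n}$
$\bar{x}_g = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$
$L = \sum_{i=1}^{n} h_i$
$k_{harm} = \frac{L}{\sum_{i=1}^{n} h_i / k_i}$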
• In general (for positive values):
• Arithmetic mean ≥ Geometric mean ≥ Harmonic mean, with equality only when all values are identical
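The ordering of the three averages is easy to confirm numerically. The sketch below uses Python's standard `statistics` module; the permeability and thickness values are hypothetical, and the thickness-weighted harmonic average assumes the form total thickness divided by the sum of h_i / k_i.

```python
import statistics

# Hypothetical layer permeabilities (md) and thicknesses, for illustration only.
perm = [10.0, 50.0, 200.0, 5.0]
thick = [2.0, 1.0, 3.0, 4.0]

arithmetic = statistics.fmean(perm)
geometric = statistics.geometric_mean(perm)
harmonic = statistics.harmonic_mean(perm)

# Thickness-weighted harmonic average: total thickness over sum of h_i / k_i.
total_thickness = sum(thick)
k_harm = total_thickness / sum(h / k for h, k in zip(thick, perm))

print(f"arithmetic = {arithmetic:.2f}")
print(f"geometric  = {geometric:.2f}")
print(f"harmonic   = {harmonic:.2f}")
print(f"thickness-weighted harmonic = {k_harm:.2f}")
```

Because the harmonic average is dominated by the smallest values, a single tight layer pulls it well below the arithmetic average.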
9/9/2017 Thai Ba Ngoc – Faculty of Geology & Petroleum Engineering ‐ HCMUT 46
• The midpoint of observed values arranged in increasing
order
• 50th percentile
[Figure: histogram with cumulative % curve and the Median marked; x-axis: Permeability Range, md]
[Figure: histogram with cumulative % curve; x-axis: Porosity Range, %, 0 to 17]
• The cumulative frequency is the total or the cumulative fraction of samples less than a given threshold
11, 13, 15, 15, 16, 17, 17, 17, 19, 21, 25, 26
mode
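A short sketch of the median, mode, and cumulative frequency for the twelve values listed above, using the standard `statistics` module. The helper `cumulative_fraction` is a hypothetical name introduced here to illustrate the "fraction of samples less than a threshold" definition.

```python
import statistics

values = [11, 13, 15, 15, 16, 17, 17, 17, 19, 21, 25, 26]

# Median: midpoint of the ordered values (average of the two middle ones here).
median = statistics.median(values)

# Mode: the most frequently occurring value.
mode = statistics.mode(values)

def cumulative_fraction(data, threshold):
    """Fraction of samples strictly less than the threshold."""
    return sum(1 for v in data if v < threshold) / len(data)

print("median =", median)
print("mode   =", mode)
print("fraction below 17 =", cumulative_fraction(values, 17))
```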
[Figure: observed versus theoretical frequency for a distribution with two modes (mode 1, mode 2), with markers m, TM, Md, Mo; box plot annotated with Md and IQR]
[Figure: two histograms of Frequency with Cumulative % curve; x-axis: Permeability Range, md]
Data for Box Plots: 4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11
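The quantities a box plot draws can be computed from this data set as sketched below. Quartile conventions differ between tools; this example assumes linear interpolation between ordered values (the `inclusive` method of `statistics.quantiles`), which may not be the convention used for the slide's figure.

```python
import statistics

data = [4, 17, 7, 14, 18, 12, 3, 16, 10, 4, 4, 11]

# Quartiles by linear interpolation between ordered values (one of several conventions).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print("min    =", min(data))
print("Q1     =", q1)
print("median =", median)
print("Q3     =", q3)
print("max    =", max(data))
print("IQR    =", iqr)
```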
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• Use of n−1 is not arbitrary. It is needed to make the sample variance an unbiased estimator of the population variance
• If we take a large number of samples of size n from a population with variance σ², the average of the sample variances s² approaches σ²
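The unbiasedness claim can be illustrated with a small simulation. This is a sketch under stated assumptions: a normal population with variance 4, samples of size 5, and 20,000 repetitions, all chosen here for illustration. Dividing by n−1 should average out near the population variance, while dividing by n should come out low by the factor (n−1)/n.

```python
import random

random.seed(0)            # fixed seed so the run is repeatable
pop_sigma = 2.0           # assumed population standard deviation (variance = 4)
n = 5                     # sample size
trials = 20_000           # number of repeated samples

sum_unbiased = 0.0
sum_biased = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, pop_sigma) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((v - mean) ** 2 for v in sample)   # sum of squared deviations
    sum_unbiased += ss / (n - 1)                # divide by n-1
    sum_biased += ss / n                        # divide by n

avg_unbiased = sum_unbiased / trials
avg_biased = sum_biased / trials

print(f"average s^2 with n-1: {avg_unbiased:.2f}")
print(f"average s^2 with n:   {avg_biased:.2f}")
```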
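$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$s^2 = \frac{1}{n - 1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right]$
$s = \sqrt{s^2}$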
$s^2 = \frac{\sum_i f_i (x_i - \bar{x})^2}{n - 1}$

x̄ = 3.25    n = 1335    s² = 8.18

x_i     f_i     f_i (x_i − x̄)²
0.5     354     2,672.39
1.5     246     751.28
2.5     …       …
3.5     …       …
4.5     99      155.29
5.5     76      385.58
6.5     73      772.22
7.5     62      1,121.16
8.5     33      910.41
9.5     29      1,133.70
10.5    16      841.56
11.5    12      817.23
12.5    8       684.86
13.5    1       105.11
14.5    0       -
15.5    2       300.24
16.5    1       175.63
$s^2 = \frac{1}{n - 1} \left[ \sum_i f_i x_i^2 - \frac{\left( \sum_i f_i x_i \right)^2}{n} \right]$
f_i: frequency of the i-th class
x_i: midpoint of the i-th class
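The grouped-data formula can be applied to the porosity classes as sketched below. The counts for the 2.5 and 3.5 classes (167 and 156) are an assumption: they are not legible in the table and are inferred here from the relative frequencies 0.1251 and 0.1169 multiplied by n = 1335. With that assumption the counts sum to 1335 and the results land close to the slide's x̄ = 3.25 and s² = 8.18.

```python
# Class midpoints 0.5, 1.5, ..., 16.5 (porosity, %).
midpoints = [i + 0.5 for i in range(17)]

# Class frequencies; 167 and 156 (classes 2.5 and 3.5) are inferred, not read from the table.
freq = [354, 246, 167, 156, 99, 76, 73, 62, 33, 29, 16, 12, 8, 1, 0, 2, 1]

n = sum(freq)
sum_fx = sum(f * x for f, x in zip(freq, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freq, midpoints))

mean = sum_fx / n
s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

print(f"n    = {n}")
print(f"mean = {mean:.2f}")
print(f"s^2  = {s2:.2f}")
```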
$\mathrm{Var}(x) = E\left[ (x - E(x))^2 \right] = E(x^2) - [E(x)]^2$
$\mathrm{Var}(cx) = E\left[ (cx)^2 \right] - [E(cx)]^2 = c^2 E(x^2) - c^2 [E(x)]^2 = c^2 \, \mathrm{Var}(x)$
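Both identities can be confirmed on a small data set by treating the values as an entire population, so that expectations become plain averages. The numbers and the constant c below are hypothetical.

```python
import statistics

x = [2.0, 4.0, 4.0, 7.0, 9.0]   # hypothetical population values
c = 3.0                          # hypothetical scaling constant

# Var(x) two ways: population variance, and E(x^2) - [E(x)]^2.
var_x = statistics.pvariance(x)
mean_x = statistics.fmean(x)
mean_x2 = statistics.fmean([v * v for v in x])
var_x_moments = mean_x2 - mean_x ** 2

# Scaling every value by c scales the variance by c squared.
var_cx = statistics.pvariance([c * v for v in x])

print(f"Var(x)        = {var_x:.2f}")
print(f"E(x^2)-E(x)^2 = {var_x_moments:.2f}")
print(f"Var(cx)       = {var_cx:.2f}")
print(f"c^2 Var(x)    = {c * c * var_x:.2f}")
```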
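$f(w) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(w - \mu)^2}{2\sigma^2} \right)$
$\text{Amplitude} = \frac{1}{\sigma \sqrt{2\pi}}$
$\text{Area} = \sqrt{2\pi} \cdot \sigma \cdot \text{Amplitude} = 1$
$\text{Center} = \mu$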
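The relation between amplitude, spread, and unit area of the normal density can be verified numerically. The sketch assumes the standard Gaussian form with center μ and standard deviation σ; the porosity-like values 0.20 and 0.03 are hypothetical.

```python
import math

def gaussian_pdf(w, center, sigma):
    """Normal density with the given center and standard deviation."""
    amplitude = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return amplitude * math.exp(-((w - center) ** 2) / (2.0 * sigma ** 2))

center, sigma = 0.20, 0.03      # hypothetical values, for illustration only

# The amplitude is the height of the curve at its center.
amplitude = gaussian_pdf(center, center, sigma)
area_from_amplitude = math.sqrt(2.0 * math.pi) * sigma * amplitude

# Independent check: midpoint-rule integration over center +/- 8 sigma.
steps = 10_000
low = center - 8.0 * sigma
h = 16.0 * sigma / steps
area_numeric = sum(gaussian_pdf(low + (i + 0.5) * h, center, sigma)
                   for i in range(steps)) * h

print(f"amplitude                  = {amplitude:.3f}")
print(f"sqrt(2*pi)*sigma*amplitude = {area_from_amplitude:.6f}")
print(f"numerical area             = {area_numeric:.6f}")
```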
[Figure: CPF, 0 to 0.5 on both vertical axes; x-axis: Porosity, fraction, 0.125 to 0.275]
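[Figure: pairs of distribution shapes labelled positive and negative]
$\mathrm{Kurt}(x) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - 3$
[Figure: distribution shapes with positive and negative kurtosis]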
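A minimal sketch of the shape measures follows. The kurtosis uses the fourth-power form with the −3 offset shown above; the skewness is written as the analogous third-power average, which is an assumption because no skewness formula survives on the slide. The two data sets are hypothetical.

```python
import math

def shape_stats(values):
    """Return (skewness, excess kurtosis) using the sample standard deviation s."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Assumed skewness form: average cubed standardized deviation.
    skew = sum(((v - mean) / s) ** 3 for v in values) / n
    # Kurtosis: average fourth power of standardized deviation, minus 3.
    kurt = sum(((v - mean) / s) ** 4 for v in values) / n - 3.0
    return skew, kurt

symmetric = [1, 2, 3, 4, 5, 6, 7]          # hypothetical, symmetric about 4
right_tailed = [1, 1, 1, 2, 2, 3, 10]      # hypothetical, long tail to the right

skew_sym, kurt_sym = shape_stats(symmetric)
skew_right, kurt_right = shape_stats(right_tailed)

print(f"symmetric:    skew = {skew_sym:.3f}, kurtosis = {kurt_sym:.3f}")
print(f"right-tailed: skew = {skew_right:.3f}, kurtosis = {kurt_right:.3f}")
```

A long tail toward high values gives positive skewness, and a flat distribution with no pronounced peak or tails gives negative kurtosis.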
s ~ Range/4
[Figure: histogram of Frequency with Cumulative Frequency curve for the Calculated Permeability Data Set; logarithmically spaced bins from 0.001 to 10]
[Figure: log-probability plot, 0.001 to 10 on a logarithmic axis; x-axis: Probability, % Less Than, 1 to 99]
[Figure: histogram of Frequency with Cumulative % curve; x-axis: Bin, 0 to 4]
[Figure: histogram of Frequency with Cumulative % curve; x-axis: Bin, 1.07 to 1.64; Mean = 1.3467, St. Dev = 0.1042]
Statistic                   Value
Coefficient of Variation    0.876
Kurtosis                    1.215
Skewness                    1.234
Range                       16.56
Minimum                     0.04
Maximum                     16.6
Sum                         4342
Count                       1335
Q3, Largest(334)            4.7
Q1, Smallest(334)           0.95
IQR                         3.75

[Figure: histogram of Frequency with Cumulative % curve; x-axis: Porosity Range, %, 0 to 16]
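Several entries in this summary are derived from the others, which gives a quick consistency check. The sketch below recomputes the mean, range, and IQR from the listed values, and backs out the standard deviation implied by the coefficient of variation, assuming CV is defined as standard deviation divided by mean.

```python
# Values copied from the summary statistics.
total, count = 4342, 1335
minimum, maximum = 0.04, 16.6
q1, q3 = 0.95, 4.7
cv = 0.876

mean = total / count              # Sum / Count
value_range = maximum - minimum   # Maximum - Minimum
iqr = q3 - q1                     # Q3 - Q1

# Assumption: CV = s / mean, so s = CV * mean.
std_from_cv = cv * mean

print(f"mean  = {mean:.2f}")
print(f"range = {value_range:.2f}")
print(f"IQR   = {iqr:.2f}")
print(f"s implied by CV = {std_from_cv:.2f}")
```

The quartile labels Largest(334) and Smallest(334) are consistent with taking the 334th value from each end, since 1335 / 4 is roughly 334.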
[Figure: histogram of Frequency with Cumulative % curve; x-axis: K Range, md]
[Figure: histogram of Frequency with Cumulative % curve; x-axis: Permeability Range, md]
• Why?
• Identifying key spatial controls on reservoir properties
– Which facies, lithotypes, or layers are really different and need to be modelled separately
• Quality control: comparing predicted (estimated) distributions
• Establishing statistical homogeneity
• If the points lie on the 45° line, the two distributions are the same
• A systematic departure above or below the 45° line (but parallel to it) indicates a difference in the means
– Above the 45° line: mean of Y > mean of X
– Below the 45° line: mean of Y < mean of X
• A slope different from 45° indicates a difference in spread
– Slope greater than 1 (>45°): variance of Y > variance of X
– Slope less than 1 (<45°): variance of Y < variance of X
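The reading rules above can be tried out on synthetic data. The sketch pairs matching quantiles of two samples and fits a straight line through the resulting Q-Q points; the function names, the choice of 20 quantiles, and the test data (Y = 2X + 1) are all illustrative assumptions. A slope above 1 signals that Y has the larger spread, and a nonzero intercept signals a shift between the distributions.

```python
import statistics

def qq_points(x, y, n_quantiles=20):
    """Pair up matching quantiles of two samples (sample sizes may differ)."""
    qx = statistics.quantiles(x, n=n_quantiles, method="inclusive")
    qy = statistics.quantiles(y, n=n_quantiles, method="inclusive")
    return list(zip(qx, qy))

def fit_line(points):
    """Least-squares slope and intercept through the Q-Q points."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Synthetic example: Y is X stretched by 2 and shifted by 1.
x = [float(v) for v in range(1, 101)]
y = [2.0 * v + 1.0 for v in x]

slope, intercept = fit_line(qq_points(x, y))
print(f"Q-Q slope = {slope:.3f}, intercept = {intercept:.3f}")
```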
[Figure: log-log plot of Permeability (1.E-03 to 1.E+01) against Porosity (0.001 to 1.000)]
• Match quantiles
Can you think of any problems that might arise?
1) Tie-breaking spikes (vertical sections of the CDF)
2) Reproducing tails on back transformation
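One common way to match quantiles is to map each datum to the standard normal value with the same cumulative probability. The sketch below assumes that target distribution, which the slide does not name, and assumes the plotting position (rank + 0.5) / n. Tied values are separated by their order of appearance, which is one simple answer to the tie-breaking problem listed above. The permeability values are hypothetical.

```python
from statistics import NormalDist

def normal_scores(values):
    """Map each value to the standard normal quantile of its rank-based probability."""
    n = len(values)
    # sorted() is stable, so tied values keep their input order (arbitrary tie-break).
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        p = (rank + 0.5) / n          # assumed plotting position, strictly inside (0, 1)
        scores[idx] = NormalDist().inv_cdf(p)
    return scores

# Hypothetical permeability values (md) containing a three-way tie at 3.0.
perm = [0.5, 12.0, 3.0, 3.0, 150.0, 0.04, 3.0, 40.0]
z = normal_scores(perm)

for k, score in zip(perm, z):
    print(f"k = {k:7.2f}  ->  z = {score:6.3f}")
```

The back transformation reverses the lookup, and because the table only spans the observed minimum and maximum, values in the tails need an explicit extrapolation rule.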
[Figure: Core Plug Permeability, md (logarithmic axis, 1.0E-03 to 1.0E+02) against Porosity, % (0 to 20); series: Core plug k and Conditional median k]
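$Y = a_0 + a_1 X$
$a_0 = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N \sum X^2 - (\sum X)^2}$
$a_1 = \frac{N \sum XY - (\sum X)(\sum Y)}{N \sum X^2 - (\sum X)^2}$
$X = b_0 + b_1 Y$
$b_0 = \frac{(\sum X)(\sum Y^2) - (\sum Y)(\sum XY)}{N \sum Y^2 - (\sum Y)^2}$
$b_1 = \frac{N \sum XY - (\sum X)(\sum Y)}{N \sum Y^2 - (\sum Y)^2}$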
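The summation formulas translate directly into code. The sketch below uses one helper (a hypothetical name, `regress`) for both regressions, swapping the roles of X and Y for the second line; the five data pairs are made up for illustration.

```python
def regress(xs, ys):
    """Least-squares intercept and slope of ys on xs from the summation formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2
    intercept = (sy * sxx - sx * sxy) / denom
    slope = (n * sxy - sx * sy) / denom
    return intercept, slope

# Hypothetical data pairs.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

a0, a1 = regress(X, Y)   # regression of Y on X
b0, b1 = regress(Y, X)   # regression of X on Y

print(f"Y = {a0:.2f} + {a1:.2f} * X")
print(f"X = {b0:.2f} + {b1:.2f} * Y")
print(f"a1 * b1 = {a1 * b1:.2f}")
```

The two lines coincide only when the correlation is perfect; in general the product of the two slopes equals r².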
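$\sum (Y - \bar{Y})^2 = \sum (Y - Y_{est})^2 + \sum (Y_{est} - \bar{Y})^2$
$r = \pm \sqrt{\frac{\sum (Y_{est} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}}$
$r = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}}$
where
$x = X - \bar{X}$
$y = Y - \bar{Y}$
This formula automatically gives the proper sign of r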
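The two routes to r can be compared on a small example. The sketch computes r from the ratio of explained to total variation, which yields only the magnitude, and from the product-moment form r = Σxy / sqrt(Σx² Σy²) with x and y as deviations from the means, which is assumed here to be the formula the slide refers to as giving the sign automatically. The data pairs are hypothetical.

```python
import math

# Hypothetical data pairs.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

# Deviations from the means (lower-case x and y in the slide's notation).
x = [v - mean_x for v in X]
y = [v - mean_y for v in Y]
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

# Product-moment form: the sign comes out automatically.
r_product = sxy / math.sqrt(sxx * syy)

# Variation form: fit Y on X, then split the total variation.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
y_est = [intercept + slope * v for v in X]
total = syy
explained = sum((ye - mean_y) ** 2 for ye in y_est)
unexplained = sum((yv - ye) ** 2 for yv, ye in zip(Y, y_est))
r_from_variation = math.sqrt(explained / total)

print(f"total = {total:.2f}, explained = {explained:.2f}, unexplained = {unexplained:.2f}")
print(f"r (product-moment) = {r_product:.4f}")
print(f"|r| (variation)    = {r_from_variation:.4f}")
```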
[Figure: log k (-2.00 to 3.00) against Porosity, Fraction (0.00 to 0.30), with linear fit: log k = 27.376·φ − 3.7015, R² = 0.7429]
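The fitted line can be used to estimate permeability from porosity, as sketched below. Two assumptions are made that the figure does not state outright: the logarithm is base 10, and the symbol lost after 27.376 is porosity as a fraction (the x-axis variable). The function names are illustrative.

```python
def log_k_from_porosity(phi):
    """Fitted line from the plot: log k = 27.376 * porosity - 3.7015."""
    return 27.376 * phi - 3.7015

def k_from_porosity(phi):
    """Back-transform, assuming a base-10 logarithm."""
    return 10.0 ** log_k_from_porosity(phi)

for phi in (0.05, 0.10, 0.15, 0.20, 0.25):
    print(f"porosity = {phi:.2f}  log k = {log_k_from_porosity(phi):6.3f}  "
          f"k = {k_from_porosity(phi):9.3f}")
```

Because the regression was fitted in log space, the back-transformed value estimates something closer to a median permeability than a mean at a given porosity.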
$r_{rank} = 1 - \frac{6 \sum d^2}{n (n^2 - 1)}$
where
d = difference between ranks of corresponding x and y
n = number of pairs of values (x, y) in the data
• Rank the two data sets. Give rank 1 to the biggest number in a column, rank 2 to the second biggest, and so on, so that the smallest value in the column receives the last rank. Tied values share the average of the ranks they occupy. Do this for both sets of measurements.
• Find the difference in the ranks (d): this is the difference between the ranks of the two values on each row of the table. The rank of the second value (price) is subtracted from the rank of the first (distance from the museum).
• Square the differences (d²) to remove negative values, and then sum them (Σd²).
No.   Distance   Rank (distance)   Price   Rank (price)   d      d²
1     50         10                1.80    2              8      64
2     175        9                 1.20    3.5            5.5    30.25
3     270        8                 2.00    1              7      49
4     375        7                 1.00    6              1      1
5     425        6                 1.00    6              0      0
6     580        5                 1.20    3.5            1.5    2.25
7     710        4                 0.80    9              -5     25
8     790        3                 0.60    10             -7     49
9     890        2                 1.00    6              -4     16
10    980        1                 0.85    8              -7     49
Σd² = 285.5
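The ranking steps and the rank correlation formula can be reproduced on the table's data as sketched below. The helper name `rank_descending` is illustrative; it gives rank 1 to the largest value and assigns tied values the average of the ranks they occupy, which matches the 3.5 and 6 entries in the table.

```python
def rank_descending(values):
    """Rank 1 = largest value; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks

distance = [50, 175, 270, 375, 425, 580, 710, 790, 890, 980]
price = [1.80, 1.20, 2.00, 1.00, 1.00, 1.20, 0.80, 0.60, 1.00, 0.85]

rank_distance = rank_descending(distance)
rank_price = rank_descending(price)

# d = rank of distance minus rank of price, row by row.
d = [a - b for a, b in zip(rank_distance, rank_price)]
sum_d2 = sum(v * v for v in d)

n = len(d)
r_rank = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("sum of d^2 =", sum_d2)
print(f"r_rank = {r_rank:.4f}")
```

The negative result indicates that price tends to fall as distance increases. With tied ranks the simplified formula is an approximation to the exact rank correlation.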
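[Figure: scatter plot, -1 to 1.5 on the vertical axis; x-axis: Porosity, Fraction, 0 to 0.3]
[Figure: scatter plot of y against x on linear axes (x: 0 to 400, y: 0 to 200)]
[Figure: scatter plot of y against x with a logarithmic y-axis (0.001 to 1)]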