
Statistics and DOE

Mayank

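Applied Statistics
Measures of central tendency (central position of data)
Mean, Median, Mode
Population mean: $\mu = \frac{\sum x_i}{N}$
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$

Measures of dispersion (spread of data)
Variance, Standard deviation, Coefficient of variation
Population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$, $\sigma = \sqrt{\sigma^2}$, $CV = \frac{\sigma}{\mu} \times 100\%$
Sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$, $CV = \frac{s}{\bar{x}} \times 100\%$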

Measures of Central tendency


Data: 34, 43, 81, 106, 106 and 115
Mean (average): $\sum x / n$ = 80.83
Median (middle score): (81 + 106)/2 = 93.5
Mode (highest frequency): 106
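The three measures for this data set can be checked with a short Python sketch. It uses only the standard `statistics` module; the variable names are illustrative and not part of the slides.

```python
import statistics

# Data set from the slide above.
data = [34, 43, 81, 106, 106, 115]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle score (average of the two middle values)
mode = statistics.mode(data)      # value with the highest frequency

print(round(mean, 2), median, mode)
```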

Measures of dispersion
Variance and standard deviation

x       $(x - \bar{x})$   $(x - \bar{x})^2$
44      -0.5              0.3
50      5.5               30.3
38      -6.5              42.3
49      4.5               20.3
42      -2.5              6.3
47      2.5               6.3
40      -4.5              20.3
39      -5.5              30.3
46      1.5               2.3
50      5.5               30.3

Mean: $\bar{x}$ = 44.5
Sum of squares: $SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$ = 188.5
Variance (mean square): $MS = SS/(n - 1)$ = 20.9
Standard deviation: $sd = \sqrt{MS}$ = 4.57

Most of the data lies between 44.5 ± 4.57, i.e. roughly 40 to 49.

Measures of dispersion
Coefficient of variation
$CV = \frac{s}{\bar{x}} \times 100\%$

4.576/44.5 × 100% = 10.28%
The standard deviation is 10.28% of the mean.
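The variance, standard deviation and coefficient of variation from the worked example can be reproduced with a minimal Python sketch. It follows the same SS, MS and sd steps; the variable names are chosen here for illustration.

```python
import math

# Ten observations from the worked example above.
x = [44, 50, 38, 49, 42, 47, 40, 39, 46, 50]

n = len(x)
mean = sum(x) / n
ss = sum((xi - mean) ** 2 for xi in x)  # sum of squares, SS
ms = ss / (n - 1)                       # sample variance, MS = SS/(n - 1)
sd = math.sqrt(ms)                      # sample standard deviation
cv = sd / mean * 100                    # coefficient of variation in percent

print(mean, ss, round(ms, 1), round(cv, 2))
```

Note that the unrounded standard deviation is about 4.5765, which is why the coefficient of variation comes out at 10.28% rather than 10.27%.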

Measures of dispersion
Normal Distribution Example: IQ Score

Grade             Score
Low               <55
Borderline low    55-69
Below average     70-84
Lower average     85-99
Higher average    100-114
Above average     115-129
Gifted            130-144
Genius            145 and above

Measures of dispersion
Normal Distribution: IQ Score
[Figure: histogram of IQ scores; x-axis: Score (<55, 55, 70, 85, 100, 115, 130, 145, >145); y-axis: Count]

Measures of dispersion
Normal Distribution
[Figure: normal curve, Probability vs Score, with the x-axis in sd from the mean (-6 to +6)]

Band (sd from mean, each side)   Probability
0 to 1                           34.13 %
1 to 2                           13.59 %
2 to 3                           2.14 %
3 to 4                           0.13 %
4 to 5                           0.0031 %
5 to 6                           0.000028 %

Range (sd from mean)   Data covered
±1                     68.2689 %
±2                     95.4499 %
±3                     99.7300 %
±4                     99.9936 %
±5                     99.999942669 %
±6                     99.999999802 %
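The coverage percentages for a normal distribution can be recomputed from the error function. The sketch below is a minimal illustration using the standard `math` module; the function name `coverage_within` is an assumption made here, not something from the slides.

```python
import math

def coverage_within(k):
    """Probability that a normally distributed value lies within +/- k sd of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the share of data covered for 1 to 6 standard deviations.
for k in range(1, 7):
    print(f"+/-{k} sd: {coverage_within(k) * 100:.9f} %")
```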

Measures of dispersion
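Normal Distribution: Six Sigma
[Figure: normal curve with LSL and USL at ±6 sd from the mean]
99.999999802 % of the data lies between LSL and USL.
Defects per hundred opportunities (DPHO): 0.000000198
Defects per million opportunities (DPMO): 0.00198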

Measures of dispersion
Normal Distribution: Six Sigma with a 1.5 sd shift
[Figure: normal curve shifted by 1.5 sd between LSL and USL; x-axis in sd from -6 to +6]
With the process mean shifted by 1.5 sd, the defect rate is 3.4 DPMO.
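The DPMO figures quoted for Six Sigma can be checked numerically. The sketch below assumes specification limits at ±6 sd from the target and a process mean shifted by a given number of sd toward one limit; the function names and defaults are illustrative assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def dpmo(sigma_level=6.0, shift=1.5):
    """Defects per million opportunities with limits at +/- sigma_level sd
    and the process mean shifted by `shift` sd toward the upper limit."""
    upper_tail = normal_cdf(-(sigma_level - shift))  # beyond the nearer limit
    lower_tail = normal_cdf(-(sigma_level + shift))  # beyond the farther limit
    return (upper_tail + lower_tail) * 1e6

print(round(dpmo(), 1))           # shifted process
print(round(dpmo(shift=0.0), 5))  # centred process
```

With no shift the result is close to the 0.00198 DPMO of the centred case, and with a 1.5 sd shift it is close to 3.4 DPMO.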

Measures of dispersion
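Normal Distribution: Process capability
[Figure: normal curve between LSL and USL; x-axis in sd from -6 to +6]
a = specification width (USL - LSL)
b = process spread (6 sd)
c, d = distance from the process mean to LSL and to USL
Cp = a/b
Cpk = (smaller of c or d)/0.5b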
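As a rough illustration of the Cp and Cpk ratios, the sketch below assumes that a is the specification width, b is six standard deviations, and c and d are the distances from the process mean to each limit. The numeric inputs are hypothetical and chosen only to show the calculation.

```python
def process_capability(lsl, usl, mean, sd):
    """Return (Cp, Cpk) for a process with the given limits, mean and sd."""
    a = usl - lsl   # specification width
    b = 6 * sd      # process spread
    c = mean - lsl  # distance from mean to lower limit
    d = usl - mean  # distance from mean to upper limit
    cp = a / b
    cpk = min(c, d) / (0.5 * b)
    return cp, cpk

# Hypothetical process: limits 94 to 106, mean 101, sd 1.
cp, cpk = process_capability(lsl=94, usl=106, mean=101, sd=1)
print(cp, round(cpk, 3))
```

Cpk is lower than Cp whenever the process is off-centre, because only the nearer limit is counted.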

Measures of dispersion
Non-normal distribution measurements: Skewness, Kurtosis
[Figure: example curves with negative (-ve) and positive (+ve) skewness and kurtosis]

Statistical significance tests


Significance tests

Z test, t test, F test, ANOVA

Statistical significance tests


Z test
Z value: How many standard deviations away from the mean?
+ve z: values are above the mean; -ve z: values are below the mean

Sample: $z = \frac{x - \bar{x}}{s}$
Population:
1 point compared to population: $z_i = \frac{x_i - \mu}{\sigma}$
Group compared to population: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Statistical significance tests


Z test
BMI sample: mean ($\bar{x}$) = 26.20, standard deviation ($s$) = 6.57

What is the probability of a person having a BMI below 19.2? Above 19.2?

A person with a BMI of 19.2 has a z score of:

$z = \frac{x - \bar{x}}{s} = \frac{19.2 - 26.20}{6.57} = -1.07$

So this person has a BMI 1.07 standard deviations below the mean.

Statistical significance tests


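Z test
Sample: at z = -1 (1 standard deviation below the mean, BMI = 26.20 - 6.57 ≈ 19.6)
[Figure: normal curve, Probability vs sd]
Probability of BMI < 19.6: 16 %
Probability of BMI > 19.6: 84 %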
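The z score and the matching tail probability for the BMI example can be sketched in Python. The normal CDF is built from `math.erfc`; the helper names are assumptions made for this illustration.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def normal_cdf(z):
    """Probability that a standard normal value is below z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

z = z_score(19.2, mean=26.20, sd=6.57)
p_below = normal_cdf(z)  # share of people expected below a BMI of 19.2
p_above = 1 - p_below

print(round(z, 2), round(p_below, 3), round(p_above, 3))
```

For z = -1 the same function gives roughly 16 % below and 84 % above, matching the figures on the slide.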

Statistical significance tests


Z test
Test group: Employees having a two wheeler
Test: Commuting time from home to Biocon
Claim: Average commuting time is less than 24 min

At 0.01 level of significance ($\alpha$ = 0.01): Is there enough evidence to support the research claim?

Samples (n = 30): 18 16 15 16 21 16 23 18 24 19 16 15 25 29 6 48 15 11 13 8 14 17 19 23 20 20 18 23 7 12

Statistical significance tests


Z test
Population:
Assumption: Population is normally distributed
[Figure: normal curve, Probability vs Score, with mean = 24]

Statistical significance tests


Z test
Population: Test vs Population, comparison of means

Hypothesis testing

Null hypothesis (H0): No difference (claim not true)
H0: $\mu = 24$

Alternate hypothesis (H1): It is different (claim is true)
H1: $\mu < 24$

Statistical significance tests


Z test
Population:
[Figure: normal curve, Probability vs Score, with mean = 24, and the same curve on the Z value axis]
Level of significance $\alpha$ = 0.01
Critical Z value: -2.33

Statistical significance tests


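Z test
Population: left-tailed test with $Z_{critical} = -2.33$
$Z_{test} < Z_{critical}$: rejection region
$Z_{test} > Z_{critical}$: acceptance region

$\bar{x} = 18.2$, $s = 7.7$, $\mu = 24$, $n = 30$

$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

$Z_{test} = -4.13$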

Statistical significance tests


Z test
Population:
[Figure: normal curve with the rejection region below -2.33 and the test value at -4.13]

Is the test value significantly different from (lower than) the mean?
Yes: there is significant evidence to reject the null hypothesis and therefore accept the claim.

H0: $\mu = 24$   Rejected
H1: $\mu < 24$   Significantly supported
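The commuting-time test can be reproduced end to end with a short sketch. It treats the 30 listed commuting times as the sample, uses the sample standard deviation in place of the population value (as the slide does), and takes the critical value of -2.33 from the slide; the variable names are illustrative.

```python
import math
import statistics

# The 30 commuting times (minutes) listed in the example above.
times = [18, 16, 15, 16, 21, 16, 23, 18, 24, 19, 16, 15, 25, 29, 6,
         48, 15, 11, 13, 8, 14, 17, 19, 23, 20, 20, 18, 23, 7, 12]

mu0 = 24            # population mean under the null hypothesis
z_critical = -2.33  # left-tailed critical value at alpha = 0.01

n = len(times)
mean = statistics.mean(times)
s = statistics.stdev(times)             # sample standard deviation
z = (mean - mu0) / (s / math.sqrt(n))   # test statistic
reject_h0 = z < z_critical              # left-tailed decision rule

print(round(mean, 1), round(s, 1), round(z, 2), reject_h0)
```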

Statistical significance tests


t test
Comparison of means between two groups
H0: $\bar{x}_1 = \bar{x}_2$
H1: $\bar{x}_1 \neq \bar{x}_2$

$t_{test} > t_{critical}$: Null hypothesis will be rejected
$t_{test} < t_{critical}$: Null hypothesis will not be rejected

Statistical significance tests


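t test
Comparison of means between two groups

t = Signal / Noise = (Difference between group means) / (Variability of groups)

$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$

$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$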

Statistical significance tests


t test
Case 1: Effect of fertilizer on plant height

Plant height
Fertilizer:       30 25 35 21 14 46 28 40 16 30 32 40 31 25 35 35 25 36 21
w/o Fertilizer:   19 27 31 7 19 0 34 22 25 12 15 12 16 26 29 14 22 20 38

             Fertilizer   w/o Fertilizer
$\bar{x}$    29.7         20.4
$s^2$        70.20        89.59

$t_{test} = \frac{29.7 - 20.4}{\sqrt{\frac{70.20}{19} + \frac{89.59}{19}}} = 3.2$

$t_{critical}$ with 36 df (df = 2n - 2) at 0.05 significance level = 2.02

$t_{test} > t_{critical}$, so $\bar{x}_1$ is significantly different from $\bar{x}_2$.

H0: $\bar{x}_1 = \bar{x}_2$   Rejected
H1: $\bar{x}_1 \neq \bar{x}_2$
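The Case 1 statistic can be recomputed from the raw plant heights. The sketch below applies the signal-to-noise formula for two groups using sample variances; the critical value 2.02 is taken from the slide rather than computed, and the function name is an assumption.

```python
import math
import statistics

# Plant heights from Case 1 above.
fertilizer = [30, 25, 35, 21, 14, 46, 28, 40, 16, 30,
              32, 40, 31, 25, 35, 35, 25, 36, 21]
no_fertilizer = [19, 27, 31, 7, 19, 0, 34, 22, 25, 12,
                 15, 12, 16, 26, 29, 14, 22, 20, 38]

def t_statistic(g1, g2):
    """t = difference of means / sqrt(s1^2/n1 + s2^2/n2)."""
    noise = math.sqrt(statistics.variance(g1) / len(g1)
                      + statistics.variance(g2) / len(g2))
    return (statistics.mean(g1) - statistics.mean(g2)) / noise

t_test = t_statistic(fertilizer, no_fertilizer)
t_critical = 2.02  # from the slide: 36 df at the 0.05 level
print(round(t_test, 2), t_test > t_critical)
```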

Statistical significance tests


t test
Case 2

Plant height
Fertilizer:       30 25 46 70 18 56 15 44 18 14 2 48 31 25 35 35 25 4 23
w/o Fertilizer:   0.5 26 28 6 17 0 46 36 24 7 0.5 16 16 26 29 33 14 26 37

             Fertilizer   w/o Fertilizer
$\bar{x}$    29.7         20.4
$s^2$        303          181

$t_{test}$ = 1.8
$t_{critical}$ = 2.02
$t_{test} < t_{critical}$, so $\bar{x}_1$ is not significantly different from $\bar{x}_2$.

H0: $\bar{x}_1 = \bar{x}_2$   Not rejected
H1: $\bar{x}_1 \neq \bar{x}_2$   Not supported

Statistical significance tests


t test Overview

Statistical significance tests


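F test
Comparison of variances

$F = \frac{s_1^2}{s_2^2}$
where $s_1^2$ and $s_2^2$ are the sample variances

The F hypothesis test is defined as:
H0: $\sigma_1^2 = \sigma_2^2$
Ha: $\sigma_1^2 < \sigma_2^2$ or $\sigma_1^2 > \sigma_2^2$
H0 is rejected if $F_{test} > F_{critical}$ (at the chosen significance level)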
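As a small illustration of the variance ratio, the sketch below reuses the two sample variances reported in Case 2 of the t test (303 and 181). Putting the larger variance in the numerator is a common convention assumed here; the critical value still has to be looked up in an F table for the relevant degrees of freedom.

```python
def f_ratio(var_a, var_b):
    """Ratio of two sample variances, with the larger one in the numerator."""
    return max(var_a, var_b) / min(var_a, var_b)

# Sample variances from Case 2 of the t test example.
f_test = f_ratio(303, 181)
print(round(f_test, 3))
```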

Statistical significance tests


ANOVA: ANalysis Of VAriance

One way:
Effect of one factor (variable)

Two way:
Effect of two factors (variables)
Effect of interaction

Statistical significance tests


One way ANOVA

Strategy:
Compare the variability within groups ($MS_{wg}$) to the variability between groups ($MS_{bg}$)

[Figure: Group 1 and Group 2 distributions, showing between-groups and within-groups variability]

$F = \frac{MS_{bg}}{MS_{wg}}$

Statistical significance tests


One way ANOVA

Is there any impact of day on number of attendees?

Factor (Independent Variable): Day (Mon, Wed, Sat)
Effect (Dependent Variable): Number of attendees

Is there any effect of presentation day on number of attendees?
Null hypothesis (H0): No effect ($\mu_1 = \mu_2 = \mu_3$)
Alternate hypothesis (H1): There is an effect (at least one of $\mu_1$, $\mu_2$, $\mu_3$ differs)

Statistical significance tests


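One way ANOVA

Number of attendees
Mon:   55 60 51 65 72 65 55 72 68 60 75 67   (mean 63.75)
Wed:   75 65 80 75 67 68 77 83 67 56 65 83   (mean 71.75)
Sat:   67 53 65 49 54 61 65 72 63 64 54 65   (mean 61)

Squared deviations from the group mean, $(x - \bar{x})^2$
Mon:   77 14 163 2 68 2 77 68 18 14 127 11   ($SS_M$ = 638)
Wed:   11 46 68 11 23 14 28 127 23 248 46 127   ($SS_W$ = 768)
Sat:   36 64 16 144 49 0 16 121 4 9 49 16   ($SS_S$ = 524)

Within groups:
$SS_{wg} = SS_M + SS_W + SS_S$ = 1930   (df = (12 x 3) - 3 = 33)
$MS_{wg} = SS_{wg}/df$ = 58.5

Between groups:
Grand mean = (63.75 + 71.75 + 61)/3 = 65.5
Squared deviations of the group means from the grand mean: 3.06, 39.06, 20.25
$SS_{bg} = n \sum (\bar{x}_{group} - \bar{x}_{grand})^2$ = 12 x (3.06 + 39.06 + 20.25) ≈ 748.5   (df = 3 - 1 = 2)
$MS_{bg} = SS_{bg}/df$ = 374.25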

Statistical significance tests


One way ANOVA

$F = \frac{MS_{bg}}{MS_{wg}} = \frac{374.25}{58.5} = 6.40$

F critical for:
Numerator degrees of freedom: 2
Denominator degrees of freedom: 33
At significance level ($\alpha$): 0.05
F critical ≈ 3.28

$F_{test} > F_{critical}$
So there is enough evidence to reject the null hypothesis.
H0: All means are the same (no effect of Day)   Rejected

At 95% confidence level we can say:
The variation between means is not just by chance.
Day of presentation matters significantly.
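The sums of squares and the F ratio for the attendance example can be recomputed from the raw counts. This is a minimal pure-Python sketch of a one-way ANOVA; the function and variable names are assumptions made for illustration.

```python
# Number of attendees per presentation day, from the example above.
mon = [55, 60, 51, 65, 72, 65, 55, 72, 68, 60, 75, 67]
wed = [75, 65, 80, 75, 67, 68, 77, 83, 67, 56, 65, 83]
sat = [67, 53, 65, 49, 54, 61, 65, 72, 63, 64, 54, 65]

def one_way_anova(groups):
    """Return (SS_bg, SS_wg, F) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    # Between groups: group size times squared distance of group mean to grand mean.
    ss_bg = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within groups: squared distance of each value to its own group mean.
    ss_wg = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_bg = ss_bg / (k - 1)
    ms_wg = ss_wg / (n_total - k)
    return ss_bg, ss_wg, ms_bg / ms_wg

ss_bg, ss_wg, f_test = one_way_anova([mon, wed, sat])
print(ss_bg, ss_wg, round(f_test, 2))
```

The unrounded within-groups sum is 1930.5 rather than 1930, because the slide rounds each squared deviation to a whole number before adding.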

Statistical significance tests


Two way ANOVA

Factors (Independent Variables):
1) Gender: Man, Woman
2) Type of sport: Indoor, Outdoor

Effect (Dependent Variable):
1) Number of participants

Relative impact of gender or type of sport?
Any interaction between gender and type of sport?
Null hypothesis (H0a): No effect of gender
Null hypothesis (H0b): No effect of type of sport
Null hypothesis (H0c): No interaction
Alternate hypothesis (H1): There is an effect

Statistical significance tests


Two way ANOVA
Number of participants by gender (g) and type of sport (s)

            Man             Woman
Indoor      30, 40, 50      60, 70, 80
Outdoor     140, 150, 160   5, 10, 15

Source    Df              SS            MS            F
Gender    g-1             SS_G          MS_G          MS_G/MS_within
Sports    s-1             SS_s          MS_s          MS_s/MS_within
G x S     (g-1)(s-1)      SS_Gxs        MS_Gxs        MS_Gxs/MS_within
Within    g x s x (k-1)   SS_within     MS_within
(k = number of observations per cell)

Source    Df   SS      MS      F        F critical (α = 0.01)
Gender    1    9075    9075    111.69   11.3
Sport     1    1875    1875    23.08    11.3
G x S     1    21675   21675   266.77   11.3
Within    8    650     81.3
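The sums of squares in the two-way table can be verified from the twelve observations. The sketch below assumes a balanced design (the same number of observations in every cell) and uses illustrative names throughout.

```python
# Participants per (gender, sport) cell, from the table above.
cells = {
    ("Man", "Indoor"): [30, 40, 50],
    ("Man", "Outdoor"): [140, 150, 160],
    ("Woman", "Indoor"): [60, 70, 80],
    ("Woman", "Outdoor"): [5, 10, 15],
}

def mean(values):
    return sum(values) / len(values)

def two_way_anova(cells):
    """Return SS and F values for a balanced two-factor design with replicates."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    k = len(next(iter(cells.values())))  # observations per cell
    grand = mean([x for v in cells.values() for x in v])
    a_mean = {a: mean([x for (ai, _), v in cells.items() if ai == a for x in v])
              for a in a_levels}
    b_mean = {b: mean([x for (_, bi), v in cells.items() if bi == b for x in v])
              for b in b_levels}
    ss_a = k * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = k * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = k * sum((mean(cells[(a, b)]) - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_within = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)
    ms_within = ss_within / (len(a_levels) * len(b_levels) * (k - 1))
    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    return {
        "ss": (ss_a, ss_b, ss_ab, ss_within),
        "f": (ss_a / df_a / ms_within,
              ss_b / df_b / ms_within,
              ss_ab / (df_a * df_b) / ms_within),
    }

result = two_way_anova(cells)
print(result["ss"], [round(f, 2) for f in result["f"]])
```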

Statistical significance tests


Two way ANOVA

Null hypothesis (H0a): No effect of gender   Rejected
Null hypothesis (H0b): No effect of type of sport   Rejected
Null hypothesis (H0c): No interaction   Rejected

[Figure: interaction plot of cell means for Indoor and Outdoor; Woman: 70 (Indoor), 10 (Outdoor); Man: 40 (Indoor), 150 (Outdoor)]

Statistical significance tests


Two way ANOVA

Factors (Independent Variables):
1) Temperature: 30, 35 °C
2) pH: 5, 7

Effect (Dependent Variable):
1) Total product (g)

        30 °C   35 °C
pH 7    70      50
pH 5    10      150

[Figure: interaction plot of total product at 30 °C and 35 °C for pH 7 and pH 5]

Regression and correlation

Regression analysis: Investigation of relationship between variables

X:   2 19 34 40 8 12 20 20 37 19 30 46
Y:   48 30 17.5 11 41 42 35 31 18 35 16 8.3

Simple linear regression (one independent variable): y = ax + b
Fitted line: y = -0.951x + 50.49, R² = 0.955
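The fitted line for this data can be recomputed with ordinary least squares. The sketch below is a plain-Python illustration; `fit_line` is a name chosen here, and the returned goodness-of-fit value is the coefficient of determination.

```python
# X and Y values from the regression example above.
x = [2, 19, 34, 40, 8, 12, 20, 20, 37, 19, 30, 46]
y = [48, 30, 17.5, 11, 41, 42, 35, 31, 18, 35, 16, 8.3]

def fit_line(x, y):
    """Least-squares fit of y = a*x + b; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx     # slope
    b = my - a * mx   # intercept
    ss_error = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_error / ss_total

a, b, r2 = fit_line(x, y)
print(round(a, 3), round(b, 2), round(r2, 3))
```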

Regression and correlation


Regression analysis:
Simple linear regression: y = ax + b
Multiple linear regression:
Linear: y = a1x1 + a2x2 + a3x3 + b
Non linear: y = a1x1 + a2x2 + a11x1² + a12x1x2 + b

Regression and correlation


Correlation analysis: To find how well (or badly) a line fits the observations

What is the strength of this relationship?
- r² (coefficient of determination) or adjusted r²

Is the relationship we have described statistically significant?
- Significance tests

Regression and correlation


Correlation analysis:

$\hat{y} = ax + b$   (a = slope, b = intercept)

$\hat{y}$ = predicted value, $y_i$ = true value, $\varepsilon$ = residual error = $y - \hat{y}$

The values of a and b are calculated to minimize the Sum of Squares (SS) of residuals: $\sum (y - \hat{y})^2$ = minimum

Regression and correlation


Correlation analysis: r², coefficient of determination

$SS_{Total} = \sum (y_i - \bar{y})^2$
$SS_{Error} = \sum (y - \hat{y})^2$

$r^2 = 1 - \frac{SS_{Error}}{SS_{Total}}$
Always between 0 and 1
Increases with the number of predictors

Adjusted $r^2 = 1 - \frac{SS_{Error}/(n - p - 1)}{SS_{Total}/(n - 1)}$
It can also be negative
Representative of the true relationship
n = total observations
p = number of predictors
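The two formulas can be expressed as small helper functions. The sketch below is illustrative only: the function names and the four-point data set are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_Error / SS_Total."""
    y_bar = sum(y) / len(y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_error / ss_total

def adjusted_r_squared(y, y_hat, p):
    """Adjusted r^2 for n observations and p predictors."""
    n = len(y)
    y_bar = sum(y) / n
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))

# Hypothetical observed and predicted values for a one-predictor model.
y_obs = [1, 2, 3, 4]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y_obs, y_pred), 2),
      round(adjusted_r_squared(y_obs, y_pred, p=1), 2))
```

Because the adjustment divides the error by n - p - 1, adding predictors that do not reduce the error lowers the adjusted value even though r² itself never decreases.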

Regression and correlation


Correlation analysis: Statistical significance of relationship

ANOVA (Group 1 vs Group 2): $F = \frac{MS_{bg}}{MS_{wg}}$

Regression (Model vs Error): $F = \frac{MS_{Model}}{MS_{Error}}$

Design of experiment
Traditional method: One factor at a time (OFAT)
Design of experiment: Multiple factors at a time (MFAT)

Design of experiment
How to select a design?

Number of factors   Screening                                   Optimization                       Robustness
2-4                 Full or fractional factorial                Central composite or Box-Behnken   Taguchi
5 or more           Fractional factorial or Plackett-Burman     Screen first to reduce factors     Taguchi

Design of experiment - terminology


Factors: Independent variable/s

Continuous: Numeric, any value between the lower and upper value
e.g. temperature, pH, concentration

Categorical: Numeric/non-numeric, only characters or levels
e.g. gender, operator, type, temperature

Levels: Range of a factor/s
-1 (lower), 0 (middle), +1 (higher)

Effects: Dependent variable/s (Response)

Main effect/s: Effect/s due to individual factor/s

Interaction effect/s: Effect/s due to interaction of multiple factors

Confounding/Aliasing: When two or more effects cannot be distinguished
e.g. Main effect is confounded with interaction effects
Main effects and interaction effects are aliased

Design of experiment
Resolution of a design
Power of a design

Resolution type   Order of interaction effects confounded with main effect   Experiment type
III               2 (e.g. A with AB, AC or BC etc.)                          Screening
IV                3 (e.g. A with ABC)                                        Optimization
V                 4 (e.g. A with ABCD)                                       Optimization

Higher-order interactions are less significant than lower-order interactions.

Design of experiment
Factorial design

Full factorial: $L^f$ experiments (L = number of levels, f = number of factors)

Design type   No. of Levels   No. of Factors   Number of experiments
2²            2               2                2x2 = 4
2³            2               3                2x2x2 = 8
3²            3               2                3x3 = 9
3³            3               3                3x3x3 = 27
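The run counts in the table follow from listing every combination of levels. The sketch below generates a full factorial design with `itertools.product`; the coded levels -1, 0 and +1 and the function name are assumptions made for illustration.

```python
from itertools import product

def full_factorial(levels, n_factors):
    """All combinations of the given levels across n_factors factors."""
    return list(product(levels, repeat=n_factors))

two_level = (-1, 1)
three_level = (-1, 0, 1)

design_2_3 = full_factorial(two_level, 3)  # 2^3 design
for run in design_2_3:
    print(run)

print(len(full_factorial(three_level, 3)))  # number of runs in a 3^3 design
```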

Design of experiment
Factorial design
[Figures: design points on factor axes a and b for each design]
2²: 4 experiments
2³: 8 experiments
3²: 9 experiments
3³: 27 experiments

Design of experiment
Fractional Factorial design

2³: 8 experiments

$2^{3-1}$: 4 experiments
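One common way to build a half fraction of a 2³ design is to run a full 2² design in two factors and set the third factor to their product. The sketch below assumes that generator (C = AB); the slides do not state which generator is used, so this choice is illustrative.

```python
from itertools import product

def half_fraction_2_3():
    """2^(3-1) design: full factorial in A and B, with C set to A*B (assumed generator)."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

runs = half_fraction_2_3()
for run in runs:
    print(run)
```

With this generator the main effect of C cannot be separated from the A x B interaction, which is the confounding that the design resolution describes.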

Design of experiment
Response surface methodology

Design of experiment
Geometry of some important response surface designs

Box-Behnken design

e.g. 3 factors, 3 levels: 12 experiments

Design of experiment
Geometry of some important response surface designs

Central composite design

e.g. 2 factors, 2 levels

Design of experiment
Geometry of some important response surface designs

Taguchi design
Inner array: Controllable variables during production (Signal), e.g. media, pH, feed rate
Outer array: Uncontrollable variables during production (Noise), e.g. temp, DO
