IGS
Statistical Conclusion Validity
Glossary
Formal Statistical Inference
Formal Statistical Inference (contd)
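All $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$ possible samples of size $n$ from a population of $N$ units are equally likely
Example with $N = 8$ and $n = 3$: $\frac{8!}{3!\,5!} = \frac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)} = 56$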
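The count in the worked example can be checked directly. The following is a minimal sketch, assuming a population of 8 units labelled 0 to 7 and samples of size 3; the variable names are illustrative and do not come from the slides.

```python
from itertools import combinations
from math import comb, factorial

N, n = 8, 3  # population size and sample size from the worked example

# Closed-form count: N! / (n! * (N - n)!)
count_from_formula = factorial(N) // (factorial(n) * factorial(N - n))

# Brute-force count: list every distinct sample of n units
all_samples = list(combinations(range(N), n))

print(count_from_formula, comb(N, n), len(all_samples))
```

Both routes give the same number of samples; under simple random sampling each of them has probability one over that count.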
The Mean
Unbiasedness of the Sample Mean
$\mu = E[\bar{Y}]$ (the expected value of the sample mean equals the population mean)
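Unbiasedness can be illustrated by enumerating every possible sample from a small population and averaging the resulting sample means. This is only a sketch: the eight population values are made up for illustration, and exact fractions are used so that the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean

# Hypothetical population of 8 values (not from the slides)
population = [Fraction(v) for v in (2, 4, 4, 5, 7, 9, 10, 15)]
n = 3  # sample size

# Sample mean for each of the equally likely samples of size n
sample_means = [mean(sample) for sample in combinations(population, n)]

# Averaging over all equally likely samples gives the expected sample mean
expected_sample_mean = mean(sample_means)
population_mean = mean(population)

print(expected_sample_mean, population_mean)
```

Individual sample means differ from the population mean, but their average over all possible samples matches it exactly.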
Variability of the Sample Mean
Sampling variance: $V(\bar{Y}) = \sigma^2 / n$
Std. Dev. = standard error: $SE(\bar{Y}) = \sigma / \sqrt{n}$
SE summarizes the variability in an estimate due to random sampling
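The idea that the standard error describes variability across repeated random samples can be checked by simulation. The sketch below assumes a normal population with mean 50 and standard deviation 10 and samples of size 25; all of these values, and the seed, are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import stdev

rng = random.Random(0)           # fixed seed so the run is reproducible
mu, sigma, n = 50.0, 10.0, 25    # assumed population parameters and sample size
replications = 5000

# Draw many samples and record each sample mean
sample_means = []
for _ in range(replications):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

simulated_se = stdev(sample_means)   # spread of the sample means
theoretical_se = sigma / sqrt(n)     # sigma / sqrt(n)

print(round(simulated_se, 3), theoretical_se)
```

The spread of the simulated sample means should land close to the theoretical value, which is what the standard error is meant to summarize.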
Estimated standard error
T-statistic for the sample mean
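$t(\mu) = \frac{\bar{Y} - \mu}{\widehat{SE}(\bar{Y})} = \frac{\bar{Y} - \mu}{S / \sqrt{n}}$, where $S$ is the sample standard deviation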
T-statistic for the sample mean (contd)
For the hypothesis $\mu = 0$, the t-statistic is $t = \bar{Y} / \widehat{SE}(\bar{Y})$
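The t-statistic for the sample mean divides the distance between the sample mean and a hypothesized value by the estimated standard error. A minimal sketch, assuming a small made-up sample; the function name `t_statistic` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0=0.0):
    """(sample mean - hypothesized mean) / estimated standard error."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    estimated_se = stdev(sample) / sqrt(n)  # sample std. dev. over sqrt(n)
    return (mean(sample) - mu0) / estimated_se

scores = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical data
t_against_zero = t_statistic(scores)         # hypothesized mean of 0
t_against_five = t_statistic(scores, mu0=5)  # hypothesized mean of 5

print(round(t_against_zero, 3), round(t_against_five, 3))
```

A hypothesized value far from the sample mean produces a large t-statistic, while a value equal to the sample mean produces zero.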
Central limit theorem
Distribution of a t-statistic
Hypothesis testing
Confidence interval
$\left[\bar{Y} - 2 \times \widehat{SE}(\bar{Y}),\ \bar{Y} + 2 \times \widehat{SE}(\bar{Y})\right]$
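An interval of the sample mean plus or minus two estimated standard errors can be computed in a few lines. This is a sketch under the assumption that the multiplier is 2 (a common rule of thumb); the data and the function name `two_se_interval` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def two_se_interval(sample, multiplier=2.0):
    """Return (lower, upper) = sample mean -/+ multiplier * estimated SE."""
    n = len(sample)
    estimated_se = stdev(sample) / sqrt(n)
    centre = mean(sample)
    return centre - multiplier * estimated_se, centre + multiplier * estimated_se

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
lower, upper = two_se_interval(scores)

print(round(lower, 3), round(upper, 3))
```

A hypothesized mean outside this interval corresponds to a t-statistic larger than 2 in absolute value.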
Confidence Level
Comparison of Two Group Averages
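$\mu_1 = E[Y_i \mid D_i = 1]$
$\mu_0 = E[Y_i \mid D_i = 0]$
$H_0: \mu_1 - \mu_0 = \mu = 0$
$t = \frac{\bar{Y}_1 - \bar{Y}_0}{\widehat{SE}(\bar{Y}_1 - \bar{Y}_0)} = \frac{\bar{Y}_1 - \bar{Y}_0}{S \sqrt{\frac{1}{n_1} + \frac{1}{n_0}}}$
where $D_i$ indicates group membership, $n_1$ and $n_0$ are the group sizes, and $S$ is the pooled sample standard deviation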
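The comparison of two group averages can be sketched as a difference in means divided by its estimated standard error. The version below assumes a pooled standard deviation across the two groups, which is one common choice and may differ from the exact formula on the slide; the data and function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def two_group_t(group1, group0):
    """t-statistic for H0: no difference in means, using a pooled std. dev."""
    n1, n0 = len(group1), len(group0)
    # Pooled variance: weighted average of the two sample variances (assumption)
    pooled_var = ((n1 - 1) * variance(group1) + (n0 - 1) * variance(group0)) / (n1 + n0 - 2)
    estimated_se = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n0)
    difference = mean(group1) - mean(group0)
    return difference, estimated_se, difference / estimated_se

treated = [5, 7, 9]   # hypothetical outcomes in group 1
control = [2, 4, 6]   # hypothetical outcomes in group 0
difference, estimated_se, t_value = two_group_t(treated, control)

print(difference, round(estimated_se, 3), round(t_value, 3))
```

With such small groups the standard error is large relative to the difference, so even a visible gap in averages yields a modest t-statistic.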
Significance vs. Effect magnitude
Null Hypothesis Significance Testing (NHST)
The null hypothesis ($H_0$) is a claim to be tested, usually a hypothesis of no difference (e.g., no difference between test scores in group A and group B)
The alternative hypothesis ($H_1$) is the one we would believe if the null hypothesis is rejected
Rejecting $H_0$ does not prove $H_0$ to be false nor $H_1$ to be true
The only way $H_0$ can be proven false (or true) is to know the value of the population parameter(s) specified in the null hypothesis; sample data do not provide that kind of information
p-value
More on NHST
DECISION
More on NHST (contd)
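$\alpha = P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ true})$
Power $= 1 - \beta = 1 - P(\text{Type II error})$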
Statistical Conclusion Validity
1. Do X and Y covary?
Type I error (false positive): We may incorrectly
conclude that X and Y covary when they do not
Type II error (false negative): We may incorrectly
conclude that X and Y do not covary when they do
2. How strongly do X and Y covary?
We can over/underestimate
The magnitude of covariation
The degree of confidence that magnitude estimate
warrants
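The two error types can be made concrete with a small simulation. The sketch below uses a one-sample test of a mean as a stand-in for "X and Y covary", and assumes a two-sided z-test with known standard deviation 1, samples of size 25, a significance level of 0.05, and a true effect of 0.5 in the second scenario; all of these settings are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejection_rate(true_mu, rng, n=25, sigma=1.0, alpha=0.05, reps=4000):
    """Share of simulated samples in which H0: mu = 0 is rejected."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (sum(sample) / n) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

rng = random.Random(1)  # fixed seed for reproducibility

# No true effect: every rejection is a false positive (Type I error)
type_i_rate = rejection_rate(0.0, rng)

# True effect present: every non-rejection is a false negative (Type II error)
type_ii_rate = 1 - rejection_rate(0.5, rng)

print(type_i_rate, type_ii_rate)
```

The false positive rate should sit near the chosen significance level, while the false negative rate depends on the effect size, the noise, and the sample size.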
Threats to Statistical Conclusion Validity
1. Low Statistical Power
Power $= 1 - \beta = 1 - P(\text{Type II error})$
The ability of a test to detect relationships that exist in
the population
The probability that a statistical test will reject the null
hypothesis when it is false
1. Low Statistical Power (contd)
Figure 1: The relationship between sample size ($n$) and power for $H_0$: $\mu = 75$, real $\mu = 80$, one-tailed $\alpha = 0.05$, for $\sigma$'s of 10 and 15.
Source: Lane 2015
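The relationship described in the caption of Figure 1 can be approximated numerically. The sketch below assumes an upper-tailed z-test with known standard deviation, which may not be exactly how the original figure was produced; the function name `power_one_tailed_z` and the chosen sample sizes are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0) when the true mean is mu_true (upper-tailed z-test)."""
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha)   # one-tailed critical value
    shift = (mu_true - mu0) * sqrt(n) / sigma     # true effect in standard-error units
    return 1 - standard_normal.cdf(z_crit - shift)

# Settings taken from the caption: H0 mean 75, real mean 80, alpha 0.05
for n in (5, 10, 20, 40):
    power_sigma_10 = power_one_tailed_z(75, 80, 10, n)
    power_sigma_15 = power_one_tailed_z(75, 80, 15, n)
    print(n, round(power_sigma_10, 3), round(power_sigma_15, 3))
```

Under these assumptions power rises with sample size and is lower for the larger standard deviation, which is the pattern the figure describes.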
Figure 2. The relationship between $\mu$ and power for $H_0$: $\mu = 75$, one-tailed $\alpha = 0.05$, for $\sigma$'s of 10 and 15
Figure 3. The relationship between significance level and power with one-tailed test: $H_0$: $\mu = 75$, real $\mu = 80$, and $\sigma = 10$.
Source: Lane 2015
2. Violated Assumptions of the Test Statistics
3. Fishing and the Error Rate Problem
4. Unreliability of Measures
4. Unreliability of Measures (contd)
5. Restriction of Range
7. Extraneous Variance in the Experimental Setting
Some features of an experimental setting may inflate error,
making detection of an effect more difficult (Shadish et al.
2002, 55)
Example: Fire drill or concert downstairs during lab
experiment
Particularly frequent in field experiments
When sources of extraneous variance cannot be controlled,
we should measure them and include them in the statistical
analysis
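The benefit of measuring a source of extraneous variance and including it in the analysis can be illustrated with simulated data. This is a rough sketch: the disturbance variable, the effect sizes, and the seed are all made up, and the adjustment shown (removing the part of the outcome predicted by the disturbance before comparing groups) is a simple stand-in for including the variable as a covariate in a regression.

```python
import random
from math import sqrt
from statistics import mean, variance

rng = random.Random(42)
n = 200

treated = [0, 1] * (n // 2)
rng.shuffle(treated)                                 # random assignment
disturbance = [rng.gauss(0, 1) for _ in range(n)]    # measured nuisance, e.g. noise level
outcome = [1.0 * d + 3.0 * z + rng.gauss(0, 1)       # assumed true effect of 1.0
           for d, z in zip(treated, disturbance)]

def diff_in_means(y, d):
    """Difference in group means and its estimated standard error."""
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    se = sqrt(variance(y1) / len(y1) + variance(y0) / len(y0))
    return mean(y1) - mean(y0), se

# Slope of the outcome on the disturbance, then subtract its contribution
z_bar, y_bar = mean(disturbance), mean(outcome)
slope = (sum((z - z_bar) * (y - y_bar) for z, y in zip(disturbance, outcome))
         / sum((z - z_bar) ** 2 for z in disturbance))
adjusted = [y - slope * (z - z_bar) for y, z in zip(outcome, disturbance)]

raw_effect, raw_se = diff_in_means(outcome, treated)
adj_effect, adj_se = diff_in_means(adjusted, treated)

print(round(raw_effect, 2), round(raw_se, 2), round(adj_effect, 2), round(adj_se, 2))
```

Accounting for the measured disturbance removes much of the error variance, so the standard error of the estimated effect shrinks and the effect becomes easier to detect.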
8. Heterogeneity of Units
9. Inaccurate Effect Size Estimation
Internal Validity
Threats to Internal Validity
1. Ambiguous Temporal Precedence
2. Selection
3. History
4. Maturation
5. Regression Artifacts
5. Regression Artifacts (contd)
High scores will tend to have more positive random error pushing them up; low scores will tend to have more negative random error pulling them down
On the same measure at a later time, or on other measures at the
same time, the random error is less likely to be so extreme
Examples
A compensatory tutoring program for kids in the lowest 10 percent on a pretest will seem more effective than it actually is because those kids will tend to improve anyway on the post-test
People tend to go to psychotherapy after a shock and organizations tend to hire consultants after a downturn; clients' measured progress is partly a movement back toward their stable mean as the temporary shock grows less acute
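The tutoring example can be reproduced with simulated scores in which no program is applied at all. The sketch below assumes each score is a stable ability plus independent random error on each test; the score distributions, the number of students, and the seed are illustrative assumptions.

```python
import random
from statistics import mean

rng = random.Random(7)
n_students = 10000

# Stable ability plus fresh random error on each test; no treatment anywhere
ability = [rng.gauss(50, 10) for _ in range(n_students)]
pretest = [a + rng.gauss(0, 10) for a in ability]
posttest = [a + rng.gauss(0, 10) for a in ability]

# Select the students in the lowest 10 percent on the pretest
cutoff = sorted(pretest)[n_students // 10]
lowest = [i for i in range(n_students) if pretest[i] < cutoff]

pre_mean = mean(pretest[i] for i in lowest)
post_mean = mean(posttest[i] for i in lowest)
gain = post_mean - pre_mean   # apparent "improvement" without any program

print(round(pre_mean, 1), round(post_mean, 1), round(gain, 1))
```

The selected group scores noticeably higher on the post-test even though nothing was done, because the unusually negative errors that put them in the bottom group are unlikely to repeat.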
6. Attrition/Mortality
7. Testing
8. Instrumentation