
Power Analysis for Traditional and Modern Hypothesis Tests

Kevin R. Murphy Pennsylvania State University

Power Analysis
- Helps you plan better studies
- Helps you make better sense of existing studies
- Is not limited to traditional null hypothesis tests
- Application of power analysis to minimum-effect tests will be discussed

Errors in Null Hypothesis Tests


                          True State of Affairs
Your Decision         No Effect (H0)                 Some Effect
-------------------   ----------------------------   ----------------------------
Reject Null           Type I error: reject the       Power = 1 - β
                      null when it is true (α)
Fail to Reject Null   Correct decision               Type II error: fail to reject
                                                     the null when you should (β)

Power Depends On
- Effect size: How large is the effect in the population?
- Sample size (N): You are using a sample to make inferences about the population. How large is the sample?
- Decision criteria (α): How do you define "significant," and why?

Power Analysis and the F Distribution


- The power of most statistical tests in the social sciences (e.g., ANOVA, regression, t tests, and other linear model statistics) can be evaluated via the familiar F distribution.
- F is a ratio of observed effect to error:

  F = MS_treatments / MS_error

  F = (True Effect + Error) / Error

- The larger the true treatment effect, the larger the F you expect to find.
- If the null hypothesis is correct, E(F) ≈ 1.0. (A quick check follows.)
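A minimal sketch of that last point in Python with SciPy (the tooling and the df values are our choices, not the slides'):

```python
# Under H0, F follows the central F distribution, whose mean
# dfd / (dfd - 2) is close to 1.0 for any reasonable denominator df.
from scipy import stats

dfn, dfd = 7, 200                # illustrative numerator / denominator df
print(stats.f.mean(dfn, dfd))    # 200 / 198 = 1.0101..., i.e. E(F) ~ 1.0
```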

How Does Power Analysis Work?


In the familiar central F distribution with df = (7, 200), 95% of the values fall below 2.00.

[Figure: central F distribution; x-axis: F value, y-axis: frequency. F = 2.0 marks the cutoff for rejecting H0.]
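A one-line check of that 95% figure (SciPy, our choice of tooling):

```python
# alpha = .05 cutoff of the central F with df = (7, 200);
# the slide rounds this to 2.00.
from scipy import stats

print(stats.f.ppf(0.95, 7, 200))   # ~2.06
```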

The Noncentral F Distribution


If the null hypothesis is false, the noncentral F distribution is needed. In the noncentral F distribution below, 75% of the values fall below 2.00; therefore, power = .25.

[Figure: central vs. noncentral F distributions; x-axis: F value, y-axis: frequency, cutoff at F = 2.0.]

A Larger Effect
In the noncentral F distribution below, where the effect is larger, only 30% of the values fall below 2.00; therefore, power = .70. (A sketch covering both cases follows.)

[Figure: central vs. noncentral F distributions with a larger noncentrality; x-axis: F value, y-axis: frequency, cutoff at F = 2.0.]
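Both power figures can be reproduced from the noncentral F CDF. In this sketch the noncentrality values are illustrative guesses chosen to land near the powers quoted above, not numbers taken from the slides:

```python
# Power = P(F > cutoff) under the noncentral F; a larger noncentrality
# (a larger true effect) shifts the distribution rightward and raises power.
from scipy import stats

dfn, dfd, cutoff = 7, 200, 2.0
for nc in (4.0, 12.0):                        # illustrative noncentralities
    power = 1 - stats.ncf.cdf(cutoff, dfn, dfd, nc)
    print(f"nc = {nc}: power = {power:.2f}")  # roughly .25 and .70
```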

Power Functions

[Figure: power function; x-axis: effect size (0 to 1), y-axis: likelihood of rejecting H0 (0 to 1).]

Power Functions
[Figure: power function; x-axis: sample size (25 to 275), y-axis: likelihood of rejecting H0 (0 to 1).]
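One way to generate a curve like this, assuming a one-way design with illustrative values for the number of groups and the effect size (our numbers, not the slides'):

```python
# Power as a function of N for a one-way design with k groups.
# k and f2 (Cohen's f-squared) are illustrative assumptions.
from scipy import stats

k, f2, alpha = 4, 0.05, 0.05
for N in range(25, 301, 25):
    dfn, dfd = k - 1, N - k
    crit = stats.f.ppf(1 - alpha, dfn, dfd)   # nil-test cutoff at this N
    nc = N * f2                               # noncentrality grows with N
    print(N, round(1 - stats.ncf.cdf(crit, dfn, dfd, nc), 2))
```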

How to Increase Power


- Increase N
  - The effects of adding more subjects are not identical to those of adding more observations
- Increase ES
  - Choose a different research question
  - Use stronger treatments or interventions
  - Use better measures
- Use a more lenient alpha
  - p < .05 is driven by force of habit, not necessarily by substantive concerns

Effects of Implementing Power Analysis


- Stronger studies: larger samples, better measures
- Fewer studies: adequate studies are harder to do than most people realize
- Less emphasis, in the long term, on null hypothesis testing

Conducting a Power Analysis


- The classic text in this field is still one of the best sources:
  - Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
- More current (and more accessible) sources include:
  - Lipsey, M. (1990). Design sensitivity. Sage.
  - Murphy, K. & Myors, B. (2004). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Erlbaum.

Conducting a Power Analysis


Power analysis software:
- Power and Precision (Biostat): www.PowerAnalysis.com
- One-Stop F Calculator: included in Murphy & Myors (2004)
- PASS (NCSS Software): www.ncss.com/pass.html

Conducting a Power Analysis


In planning studies, you should:
- Assume relatively small effects. If it were reasonable to expect a large effect, you probably wouldn't need to do the study or the test.
- Aim for power of .80 or better. Power of .50 means that significance tests have become a coin flip.

Effect Size Conventions


In the behavioral and social sciences, there are widely followed conventions for describing small, moderate, and large effects:

                                     Small   Moderate   Large
d (standardized mean difference)     .20     .50        .80
Percentage of variance explained     1%      10%        25%

Applications of Power Analysis


Study planning: given ES and α, solve for N.
- Suppose you wanted to compare the effects of four types of training programs, and:
  - You expect small to moderate effects (programs account for 5% of the variation in performance)
  - You use an α level of .05
- Then you need N = 214 to achieve power = .80. (A sketch of this calculation follows.)
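A sketch of that calculation, scanning N until power reaches .80. Texts differ on whether the noncentrality uses PV or f² = PV / (1 - PV), so this lands near, rather than exactly on, 214:

```python
# Solve for N: four groups, programs account for 5% of variance, alpha = .05.
from scipy import stats

k, pv, alpha, target = 4, 0.05, 0.05, 0.80
f2 = pv / (1 - pv)          # Cohen's f-squared from proportion of variance
N = k + 2
while True:
    dfn, dfd = k - 1, N - k
    crit = stats.f.ppf(1 - alpha, dfn, dfd)
    if 1 - stats.ncf.cdf(crit, dfn, dfd, N * f2) >= target:
        break
    N += 1
print(N)   # in the neighborhood of the slide's N = 214
```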

Applications of Power Analysis


Study evaluation: given N and α, solve for ES.
- Suppose you wanted to compare the effects of four safety interventions, and:
  - You have 44 subjects available
  - You use an α level of .05
- Then you will achieve power = .80 only if the effects of the interventions are truly large (accounting for 25% of the variance in outcomes). (A sketch follows.)
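The same machinery run in the other direction, scanning effect size at a fixed N (same caveat about the noncentrality parameterization):

```python
# Solve for ES: N = 44, four interventions, alpha = .05; scan PV upward
# until power reaches .80.
from scipy import stats

N, k, alpha = 44, 4, 0.05
dfn, dfd = k - 1, N - k
crit = stats.f.ppf(1 - alpha, dfn, dfd)
pv = 0.01
while 1 - stats.ncf.cdf(crit, dfn, dfd, N * pv / (1 - pv)) < 0.80:
    pv += 0.01
print(round(pv, 2))   # lands in the "large" range (roughly PV = .20-.25)
```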

Applications of Power Analysis


Making a rational choice regarding α: given N and ES, solve for α.
- Suppose you wanted to compare the effects of two leadership development programs, and:
  - You have 200 subjects available
  - You expect a small difference (d = .20, or 1% of the variance explained by programs)
- Then you will achieve power = .64 using α = .05, but only power = .37 using α = .01. (A sketch follows.)
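A sketch of the α tradeoff. Exact power values depend on the test and tables used, so treat the .64 and .37 above as the authors' figures rather than something this snippet reproduces exactly; what matters is the direction of the change:

```python
# Power at two alpha levels for a small effect (PV ~ .01), N = 200, k = 2.
from scipy import stats

N, k, pv = 200, 2, 0.01
dfn, dfd = k - 1, N - k
nc = N * pv / (1 - pv)
for alpha in (0.05, 0.01):
    crit = stats.f.ppf(1 - alpha, dfn, dfd)
    print(alpha, round(1 - stats.ncf.cdf(crit, dfn, dfd, nc), 2))
# tightening alpha from .05 to .01 costs a substantial amount of power
```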

Moving Beyond Traditional Significance Testing


- Traditional null hypothesis tests are the focus of most power analyses.
- These tests are deeply flawed, and there is relatively little research on the power of alternatives.
- Minimum-effect tests represent one useful alternative.

Nil Hypothesis Testing


- Testing the hypothesis that treatments, interventions, etc. have no effect at all (the nil hypothesis test, NHT) is the most common and least useful thing social and behavioral scientists do.
- Two problems loom largest:
  - Confusion over Type I errors
  - The likelihood of rejecting the null hypothesis eventually reaches 1.0 as N grows, regardless of the research question

Type I Errors are Very Rare


- Type I error: rejecting H0 when it is true.
- If H0 is never true, it is impossible to make a Type I error; if H0 is very unlikely, a Type I error is even less likely.
  - H0: the treatment had NO effect at all
  - H1: SOMETHING happened
- Most things we do to minimize Type I errors lead to more Type II errors.

This Implies
- The large literature on protecting yourself from Type I errors is not really useful.
- NHTs yield one of two outcomes:
  - They confirm the obvious: you reject H0, which you already know is likely to be wrong.
  - They confuse you: you accept H0 even though you know it is likely to be wrong.

In NHT, All You Need Is N

- As N increases, the likelihood of rejecting the nil hypothesis approaches 1.0.
- Power to reject H0 does not depend all that much on the phenomenon:
  - If N is big enough, you will reject H0; if N is small enough, you won't.
- Significance tests are an indirect index of how many subjects showed up. (The sketch below illustrates the point.)
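A sketch of that first claim, using a deliberately tiny true effect (PV = .005 is our assumption):

```python
# Even for a trivially small true effect, nil-test power climbs toward 1.0
# as N grows; rejection eventually indexes N more than the phenomenon.
from scipy import stats

pv, alpha, k = 0.005, 0.05, 2
for N in (100, 1_000, 10_000, 100_000):
    dfn, dfd = k - 1, N - k
    crit = stats.f.ppf(1 - alpha, dfn, dfd)
    print(N, round(1 - stats.ncf.cdf(crit, dfn, dfd, N * pv / (1 - pv)), 3))
```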

There Must be a Better Way


- Stop doing significance tests (e.g., Schmidt, 1992)
- Confidence intervals (e.g., APA Task Force, American Psychologist, August 1999)
- Bayesian methods (e.g., Rouanet, Psychological Bulletin, 1996)

There Must be a Better Way


Minimum-effect tests: test the hypothesis that something nontrivial happened.
- Murphy, K. & Myors, B. (2004). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests (2nd ed.). Mahwah, NJ: Erlbaum.
- Murphy, K. & Myors, B. (1999). Testing the hypothesis that treatments have negligible effects: Minimum-effect tests in the general linear model. Journal of Applied Psychology, 84, 234-248.

Minimum-Effect Tests
- H0: treatments have a negligible effect (e.g., they account for 1% or less of the variance)
- H1: the effect of treatments is big enough to care about
- This approach addresses the two biggest flaws of traditional tests:
  - H0 really is plausible; treatments rarely have zero effect, but they often have negligible effects
  - Increasing N does not automatically increase the likelihood of rejecting H0

Minimum-Effect Tests
With minimum-effect tests (METs):
- Type I errors are once again possible, but can be minimized
- The question asked in a MET is no longer trivial; you can actually learn something by doing the test
- Power analysis works exactly the same way in MET as in NHT

Performing Minimum-Effect Tests


- Put your test statistic in a simple, common form (e.g., F).
- Decide what you mean by a "negligible" effect.
- Find or create an F table based on that definition of a negligible effect; this is where the noncentral F distribution comes in.
- Proceed as you would for any traditional NHT. (A sketch follows.)
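Putting those steps together, a minimal sketch; the design numbers are illustrative, and the key move is that the MET cutoff comes from a noncentral F whose noncentrality encodes the "negligible" effect:

```python
# Minimum-effect test: H0 is "treatments account for <= 1% of variance".
from scipy import stats

N, k, alpha = 150, 3, 0.05                       # illustrative design
negligible_pv = 0.01                             # our negligible-effect definition
dfn, dfd = k - 1, N - k
nc0 = N * negligible_pv / (1 - negligible_pv)
crit = stats.ncf.ppf(1 - alpha, dfn, dfd, nc0)   # MET cutoff (> nil cutoff)
F_obs = 5.2                                      # hypothetical observed F
print(crit, "reject negligible-effect H0:", F_obs > crit)
```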

Working with the Noncentral F


- Calculating or deriving noncentral F distributions was once a daunting task; many simple calculators are now available, e.g.:
  http://calculators.stat.ucla.edu/cdf/ncf/ncfcalc.php
- The noncentrality parameter (λ) is a measure of effect size:

  λ = [df_h * (MS_h - MS_e)] / MS_e
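The formula translates directly to code (the MS values below are made up):

```python
# lambda = [df_h * (MS_h - MS_e)] / MS_e, per the slide.
def noncentrality(df_h: float, ms_h: float, ms_e: float) -> float:
    return df_h * (ms_h - ms_e) / ms_e

print(noncentrality(df_h=3, ms_h=12.0, ms_e=4.0))   # 6.0
```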

What Constitutes a Negligible Effect?

- Standards for negligible effects depend on the research area and on the consequences of decisions.
- Aspirin use accounts for very little variance in heart attacks, but the use of aspirin saves thousands of lives at minimal cost.
- In personnel selection, it is relatively easy to account for a large proportion of the variance in performance with simple cognitive tests, so the increase in effectiveness that is defined as negligible might be larger.

Defining a Negligible Effect


- Effect size conventions are useful, but by themselves may not be sufficient; the consequences of errors must also be considered.

                                     Small   Moderate
d (standardized mean difference)     .20     .50
Percentage of variance explained     1%      10%

Power Analysis for MET: Small Effect (d = .20, PV = .01)

[Figure: likelihood of rejecting H0 (0 to 1) as a function of effect size (0 to 1).]

Power Analysis for MET: Small Effect (d = .20, PV = .01)

[Figure: likelihood of rejecting H0, given population d = .30, as a function of sample size (25 to 275).]

Power Analysis for MET: Small Effect (d = .20, PV = .01)

[Figure: likelihood of rejecting H0 (0 to 1) as a function of effect size (0 to 1).]

Power Analysis for MET: Small Effect (d = .20, PV = .01)

[Figure: likelihood of rejecting H0 (0 to .07) as a function of effect size (0 to .25).]

Errors in MET
The potential downsides of METs are:
- Type I errors could actually occur
- Lower power than the corresponding NHT

However:
- You can reduce Type I errors by using larger samples
- The loss of power is more than balanced by the fact that the hypothesis being tested is not a trivial one

Type I Error Rates of Minimum-Effect Tests

[Figure: Type I error rate (0 to .08) as a function of effect size (0 to .25), shown for a smaller and a larger sample.]

Type I vs Type II Errors


- The tradeoff between Type I and Type II errors is more complicated in METs than in nil tests.
- In a MET, alpha is precise only if the true effect size is exactly the same as your definition of "negligible."
- Type II errors are more of a problem with METs:
  - METs are less powerful than NHTs (it is easier to reject the hypothesis that nothing happened than the hypothesis that nothing important happened), but this is not necessarily a bad thing.
  - METs place an even greater premium on large samples, but small samples cause problems even when there is substantial power.

Examples: Comparing Two Treatments

N needed:

True effect    Nil test    MET (1% = negligible)
PV = .05       149         375
PV = .10       79          117

(A sketch below attempts to reproduce these numbers.)
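A sketch assuming the deck's usual conventions (power = .80, α = .05, df_h = 1 for two treatments). Exact values depend on the noncentrality parameterization, so expect numbers near, not necessarily identical to, those in the table:

```python
# N needed for power = .80: nil test vs. MET with 1% defined as negligible.
from scipy import stats

def n_needed(pv_true, negligible_pv=0.0, alpha=0.05, target=0.80):
    N = 6
    while True:
        dfn, dfd = 1, N - 2
        nc0 = N * negligible_pv / (1 - negligible_pv)
        crit = (stats.f.ppf(1 - alpha, dfn, dfd) if negligible_pv == 0
                else stats.ncf.ppf(1 - alpha, dfn, dfd, nc0))
        nc1 = N * pv_true / (1 - pv_true)
        if 1 - stats.ncf.cdf(crit, dfn, dfd, nc1) >= target:
            return N
        N += 1

for pv in (0.05, 0.10):
    print(pv, n_needed(pv), n_needed(pv, negligible_pv=0.01))
```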
