
Econometrics

Review of Basic Statistics



Topics
1. Descriptive Statistics:
- 1 variable: Mean and Variance
- 2 variables: Covariance, Correlation

2. Hypothesis Testing
Descriptive Statistics
Inferential Statistics
Involves:
- Estimation
- Hypothesis Testing

Purpose:
- Make decisions about population characteristics
Descriptive Statistics
Mean
Measure of central tendency
Affected by extreme values
Formula:
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i

Median
Measure of central tendency
Middle value in ordered series
- If n is even, the mean of the 2 middle values
Value that splits the distribution into two halves
Not affected by extreme values
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
Median = (16 + 16)/2 = 16
Mode
Measure of central tendency
Value that occurs most often
Not affected by extreme values
There may be more than one mode

Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Mode = 16 (occurs twice)


Sample Variance
Measure of Dispersion around the Mean
Formula:
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2

Sample Standard Deviation
Measure of Dispersion around the Mean
Has the same unit of measurement as the
variable itself
Formula:
s = \sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}
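
These formulas are easy to check in code. A minimal sketch using only Python's standard-library statistics module, applied to the raw data from the examples above:

```python
# Descriptive statistics for the raw data used in the slides.
import statistics

data = [17, 16, 21, 18, 13, 16, 12, 11]

print(statistics.mean(data))      # 15.5   -> affected by extreme values
print(statistics.median(data))    # 16.0   -> (16 + 16)/2, since n = 8 is even
print(statistics.mode(data))      # 16     -> most frequent value
print(statistics.variance(data))  # ~11.14 -> sample variance, divides by n - 1
print(statistics.stdev(data))     # ~3.34  -> same units as the data itself
```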

Random Variables
Random variable: numerical summary of a
random outcome
1. Discrete: only a discrete set of possible values
=> summarized by probability distribution: list of
all possible values of the variable and the
probability that each value will occur.
2. Continuous: continuum of possible values
=> summarized by the probability density
function (pdf)
Probability Distribution
1. List of pairs [ Xi, P(Xi) ]
Xi = Value of Random Variable (Outcome)
P(Xi) = Probability Associated with Value

2. 0 \le P(X_i) \le 1 - Mutually exclusive (no overlap)

3. \sum_i P(X_i) = 1 - Collectively exhaustive (nothing left out)
Mean and Variance: Discrete Case
Mean, or Expected Value
Weighted Average of All Possible Values
E(X) = \mu_X = \sum_i X_i P(X_i)
Variance
Weighted Average Squared Deviation about
the Mean
\mathrm{var}(X) = \sigma_X^2 = E[(X - \mu_X)^2] = \sum_i (X_i - \mu_X)^2 P(X_i)
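
As a quick illustration of both formulas, the sketch below computes E(X) and var(X) for a fair six-sided die (a hypothetical example, not from the slides):

```python
# Mean and variance of a discrete random variable: a fair die.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # P(X_i); non-negative and summing to 1

mean = sum(x * p for x, p in zip(outcomes, probs))               # E(X) = 3.5
var = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))  # ~2.917

print(mean, var)
```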

Covariance
- measures joint variability of X and Y
For discrete RVs,

\mathrm{cov}(X,Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y) P(X_i, Y_i)

Can take any value in the real numbers
Depends on units of measurement (e.g., dollars vs. cents)

cov(X,Y) > 0 means that X and Y tend to move together:
when Y is above its mean, X tends to be above its mean
cov(X,Y) < 0 means that X and Y tend to move in opposite directions
Correlation
A more convenient measure of the relationship between X and Y is correlation, since it is normalized to lie in the interval [-1, 1]:

\mathrm{corr}(X,Y) = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}

-1 \le \mathrm{corr}(X,Y) \le 1
If corr(X,Y) = 0, then X and Y are uncorrelated.
If corr(X,Y) > 0, then X and Y are positively correlated.
If corr(X,Y) < 0, then X and Y are negatively correlated.
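
A minimal NumPy sketch (with made-up data) showing both measures, and that rescaling X changes the covariance but not the correlation:

```python
# Sample covariance and correlation with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

print(np.cov(x, y)[0, 1])       # off-diagonal of the 2x2 covariance matrix: 4.9
print(np.corrcoef(x, y)[0, 1])  # ~0.999, inside [-1, 1]

# Units matter for covariance only: measure x in "cents" instead of "dollars".
print(np.cov(100 * x, y)[0, 1])       # 100 times larger
print(np.corrcoef(100 * x, y)[0, 1])  # unchanged
```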
Note
Covariance and correlation measure only linear
dependence!
Example: cov(X,Y) = 0 does not necessarily imply
that X and Y are independent.
They may be non-linearly related.
But if X and Y are jointly normally distributed and
cov(X,Y) = 0, then they are independent.
[Figure: scatterplots - the correlation coefficient measures linear association]
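
The classic illustration is Y = X^2: the sketch below simulates it and finds a covariance of (approximately) zero even though Y is a deterministic function of X:

```python
# Zero covariance does not imply independence.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # symmetric around 0
y = x ** 2                          # perfectly (but non-linearly) dependent on x

print(np.cov(x, y)[0, 1])       # ~0: E[X^3] = 0 for a symmetric distribution
print(np.corrcoef(x, y)[0, 1])  # ~0, yet X and Y are clearly dependent
```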
The Mean and Variance of Sums of
Random Variables
E(X + Y) = E(X) + E(Y)
\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y) + 2\,\mathrm{cov}(X,Y)
\mathrm{var}(X - Y) = \mathrm{var}(X) + \mathrm{var}(Y) - 2\,\mathrm{cov}(X,Y)
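
These identities can be verified directly on sample moments; a small NumPy sketch with artificially correlated draws:

```python
# var(X + Y) = var(X) + var(Y) + 2 cov(X, Y), checked numerically.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)
y = 0.5 * x + rng.standard_normal(100_000)  # y co-moves with x

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
print(lhs, rhs)  # identical up to floating-point error (same ddof everywhere)
```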
Continuous Probability Distributions:
Normal Distribution
The notation X \sim N(\mu, \sigma^2) reads: X is normally distributed
with mean \mu and variance \sigma^2.
The PDF for a normal RV is

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

The normal distribution has a familiar bell shape.
The normal density is symmetric around its mean, and 95% of the
probability density lies in the region (\mu - 1.96\sigma, \mu + 1.96\sigma).
Effects of Varying Parameters
Infinite Number of Normal
Distribution Tables
Normal distributions differ by mean and
standard deviations
Each distribution would require its own table
That's an infinite number of tables!

Standard Normal Distribution
If X is a normal RV with mean \mu and variance \sigma^2, then

Z = \frac{X - \mu}{\sigma}

has a normal distribution with mean 0 and variance 1, i.e. the
standard normal distribution.
Example
Values of the standard normal CDF, \Phi(z) = P(Z \le z),
are tabulated in Appendix Table 1
To compute probabilities for a normal RV, it
must be standardized by subtracting its mean
and dividing by its standard deviation
Example: Suppose Y ~ N(2, 16), and we need
P(Y < 0)

Z = \frac{Y - 2}{4} \sim N(0, 1)

P(Y \le 0) = P\left(Z \le \frac{0 - 2}{4}\right) = P\left(Z \le -\frac{1}{2}\right) = 0.3085
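
The same number can be checked with SciPy, either directly or after standardizing (a sketch assuming scipy is available):

```python
# P(Y < 0) for Y ~ N(2, 16), i.e. mean 2 and standard deviation 4.
from scipy.stats import norm

print(norm.cdf(0, loc=2, scale=4))  # direct:       0.3085...
print(norm.cdf(-0.5))               # standardized: Phi(-1/2), same value
```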
Moments: Skewness, Kurtosis
\text{skewness} = \frac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}

- measures asymmetry in a distribution
- The larger the skewness (in absolute value), the more
asymmetric the distribution
- skewness = 0: distribution is symmetric
- skewness > (<) 0: distribution has a long right (left) tail
\text{kurtosis} = \frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}

- measures the thickness of the tails
- kurtosis = 3: normal distribution
- kurtosis > 3: heavy tails (leptokurtic), i.e. extreme events are
more likely to occur
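
A sketch of both moments on simulated normal data with SciPy. Note that scipy.stats.kurtosis reports excess kurtosis (kurtosis minus 3) unless fisher=False is passed:

```python
# Sample skewness and kurtosis of simulated normal data.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(2)
y = rng.standard_normal(1_000_000)

print(skew(y))                    # ~0: the normal density is symmetric
print(kurtosis(y, fisher=False))  # ~3: the benchmark value for the normal
```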
Central Limit Theorem
As the sample size n grows, the distribution of the standardized
sample mean, (\bar{Y} - \mu_Y) / (\sigma_Y / \sqrt{n}), approaches the
standard normal distribution.
Important Continuous Distributions
All derived from the normal distribution:
- \chi^2 distribution: arises from sums of squared normal random variables
- t distribution: arises from ratios of normal and \chi^2 variables
- F distribution: arises from ratios of \chi^2 variables
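
These constructions can be mimicked by simulation; the sketch below builds draws from all three distributions out of standard normals (the degrees of freedom k1, k2 are arbitrary choices for illustration):

```python
# Chi-squared, t, and F variables built from standard normals.
import numpy as np

rng = np.random.default_rng(3)
n, k1, k2 = 100_000, 3, 5

z = rng.standard_normal((n, k1))
chi2 = (z ** 2).sum(axis=1)                      # chi-squared with k1 d.o.f.

t = rng.standard_normal(n) / np.sqrt(chi2 / k1)  # t with k1 d.o.f.

w = (rng.standard_normal((n, k2)) ** 2).sum(axis=1)
f = (chi2 / k1) / (w / k2)                       # F with (k1, k2) d.o.f.

print(chi2.mean(), t.mean(), f.mean())           # ~k1, ~0, ~k2/(k2 - 2)
```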
Hypothesis Testing
Identifying Hypotheses
1. Formulate the question, e.g. test that the
population mean is equal to 3
2. State the question statistically (H_0: \mu = 3)
3. State its alternative statistically (H_1: \mu \ne 3)
4. Choose the level of significance \alpha
Typical values are 0.01, 0.05, 0.10
Rejection region of sampling distribution: the
unlikely values of sample statistic if null
hypothesis is true
Identifying Hypotheses: Examples
1. Is the population average amount of TV
viewing 12 hours?
H_0: \mu = 12
H_1: \mu \ne 12
Choosing the Level of Significance:
Type I and Type II Errors
Type I Error: Reject a true null hypothesis.
Type II Error: Do not reject a false null.
We would like probabilities of both errors to be
small. BUT, we cannot make both very small at
the same time.
In statistics, we fix the probability of Type I error
at a significance level (e.g. 5%) and minimize the
probability of Type II error.
Hypothesis Testing: Basic Idea
Method 1: Compare Test Statistic to
Critical Value from the Table
1. Convert the sample statistic (e.g., \bar{Y}) to a standardized
Z variable:

Z = \frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}}

2. Compare to Critical Value from the table
If Z-test statistic falls in the rejection region,
reject H0;
Otherwise do not reject H0
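
A sketch of Method 1 as a two-sided z-test of the mean; the sample values and the known population standard deviation are made-up assumptions for illustration:

```python
# Two-sided z-test of H0: mu = 12 at the 5% significance level.
import math

y = [11.2, 13.1, 12.7, 10.9, 13.5, 12.2, 11.8, 12.9]  # hypothetical sample
mu_0 = 12.0
sigma = 1.0                                  # population sd, assumed known

n = len(y)
y_bar = sum(y) / n
z = (y_bar - mu_0) / (sigma / math.sqrt(n))  # standardized test statistic

critical = 1.96                              # 5% two-sided critical value
print(z, abs(z) > critical)                  # reject H0 iff the boolean is True
```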

Two-Sided Test: Rejection Regions
One-Sided Test: Rejection Region
Method 2: Compute the P-value
Probability of obtaining a test statistic more
extreme (\le or \ge) than the actual sample value, given
H_0 is true.
The lowest significance level at which we reject H_0
Compute the p-value for the test

Use this p-value to make the rejection decision:
If p-value \ge \alpha, do not reject H_0
If p-value < \alpha, reject H_0
P-values for 2-sided and 1-sided tests
Two-sided test: H_0: \mu = \mu_0 vs. H_1: \mu \ne \mu_0

p\text{-value} = P\left( |Z| > \left| \frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}} \right| \right) = 2\,\Phi\left( -\left| \frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}} \right| \right)

One-sided tests:
a. H_0: \mu \ge \mu_0 vs. H_1: \mu < \mu_0

p\text{-value} = \Phi\left( \frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}} \right)

b. H_0: \mu \le \mu_0 vs. H_1: \mu > \mu_0

p\text{-value} = 1 - \Phi\left( \frac{\bar{Y} - \mu_0}{\sigma_{\bar{Y}}} \right)
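
A sketch of all three p-values for a hypothetical realized test statistic z_act, using the standard normal CDF from SciPy:

```python
# p-values from a realized z statistic.
from scipy.stats import norm

z_act = -1.7                      # hypothetical realized value

print(2 * norm.cdf(-abs(z_act)))  # two-sided: H1: mu != mu_0
print(norm.cdf(z_act))            # one-sided: H1: mu <  mu_0
print(1 - norm.cdf(z_act))        # one-sided: H1: mu >  mu_0
```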
P-value for a 2-Sided Test
Method 3: Confidence Intervals
Confidence interval: set of values that contains
the true population mean with a pre-specified
probability, say 95%.
This pre-specified probability is called the confidence level.
A 95% confidence interval for \mu_Y contains the true
value of \mu_Y in 95% of repeated samples.
The 90% CI is: \bar{Y} \pm 1.645\,SE(\bar{Y})
The 95% CI is: \bar{Y} \pm 1.96\,SE(\bar{Y})
The 99% CI is: \bar{Y} \pm 2.58\,SE(\bar{Y})
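
A sketch of a 95% CI on made-up data, estimating SE(Y_bar) by s/sqrt(n) (an assumption for illustration; the slides leave SE(Y_bar) unspecified):

```python
# 95% confidence interval for the population mean.
import math
import statistics

y = [11.2, 13.1, 12.7, 10.9, 13.5, 12.2, 11.8, 12.9]  # hypothetical sample
y_bar = statistics.mean(y)
se = statistics.stdev(y) / math.sqrt(len(y))          # estimated SE(Y_bar)

print(y_bar - 1.96 * se, y_bar + 1.96 * se)
```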
Jarque-Bera Test for Normality
Assesses whether a given sample of data is
normally distributed
Aggregates information in the data about both
skewness and kurtosis
Test of the hypothesis that S = 0 and K = 3:

JB = \frac{n}{6}\left( S^2 + \frac{(K - 3)^2}{4} \right)

The 5% critical value is 5.99; if JB > 5.99, reject
the null of normality.
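
SciPy ships this test as scipy.stats.jarque_bera; a sketch on simulated data:

```python
# Jarque-Bera test on normal vs. skewed samples.
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(4)

res = jarque_bera(rng.standard_normal(5000))   # normal data
print(res.statistic, res.pvalue)               # JB small: do not reject normality

res = jarque_bera(rng.exponential(size=5000))  # heavily skewed data
print(res.statistic, res.pvalue)               # JB >> 5.99: reject normality
```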
