
Introduction to the Course

Statistics – Vietnamese-Belgium program

 Teaching staff: Dr. Tran Thi Bich
Department of Statistics, NEU
Email: bichtt@neu.edu.vn
 Lectures:
- 16th of Aug: 6-9:00pm
- 17th of Aug: 6-9:00pm
- 18th of Aug: 9-12:00pm; 14-17:00pm
- 19th of Aug: 9-12:00pm; 14-17:00pm
 Tutorials: 30 minutes - 1 hour, at the end of the lecture, from lectures 2 to 6
 Textbook:
- Statistics for Management and Economics. 7th Edition, Keller.
 Assessment: one in-class exam at the end of the course


Section 1: Introduction to Statistics and SPSS

Outline

 Introduction to statistics
 Basic concepts: variables and data
 Getting acquainted with SPSS

Reading materials:
Chap 1, 2 (Keller)


What is statistics?

 Statistics is all about collecting, organising and interpreting data
 Statistics is a way to get information from data and make decisions under uncertainty
 Statistical analysis of data uses statistical modelling and probability; our main focus is on data and techniques for analysing data

Why is statistics important?

 Financial management (capital budgeting)
 Marketing management (pricing)
 Marketing research (consumer behaviour)
 Operations management (inventory)
 Accounting (forecasting sales)
 Human resources management (performance appraisal)
 Information systems
 Economics (summarising, predicting)
Types of statistics

 Descriptive statistics:
Collecting, organising, summarising, and presenting data
 E.g.: graphical techniques; numerical techniques
 Inferential statistics:
Estimating, predicting, and making decisions about a population based on sample data
 E.g.: estimation; hypothesis testing

Basic concepts: variables and data

 A variable is some characteristic of a population or sample
 E.g.:
• Height of female students
• Occupation of students in this class
 Data are the observed values of a variable
 E.g.:
• Heights of 10 female students: 1.6, 1.7, 1.55, 1.59, 1.5, 1.58, 1.64, 1.67, 1.58, 1.55
• Occupations of 5 students: teller, accountant, IT, marketing manager, teacher

Types of data

 Data are classified as qualitative or quantitative (also called interval) data:

Data
├─ Qualitative → Nominal, Ordinal
└─ Quantitative (Interval) → Discrete, Continuous

Qualitative data

 Qualitative data are the kind of data that cannot be measured (quantified)
 Marital status: single, married, divorced, and widowed
 Study performance of students: poor, fair, good, very good, excellent
 Further classification: qualitative data can be classified as nominal and ordinal data
 Nominal data (also called categorical data): cannot be quantified with any meaningful unit
- Marital status: single, married, divorced, and widowed
 Ordinal data: a sort of nominal data, but their values are in order
- Study performance of students: poor, fair, good, very good, excellent
- Opinions of consumers: strongly disagree, somewhat disagree, neither disagree nor agree, agree, strongly agree

Quantitative data

 Quantitative (interval) data are real numbers (can be measured)
 E.g.:
 Mid-term test marks of 10 students: 7, 8, 10, 5, 5, 6, 8, 9, 9, 7
 Weights of postal packages
 Monthly salary
 Further classification: quantitative data can be divided into two types: discrete or continuous
◦ Discrete data: take only integer values
 E.g.: number of children in a family: 1, 2, 4, 7, 2; number of owned houses
◦ Continuous data: can take any value
 E.g.: weights of postal packages; monthly salary

Activity 1

 For each of the following examples of data, determine the type:
i. The number of miles joggers run per week
ii. The starting salaries of graduates of the advanced program
iii. The months in which a firm's employees choose to take their vacations
iv. The occupation of graduates of the advanced program
v. Teachers' ranking
Population versus sample

 A population is the set of all items or people that share some common characteristics
 A census is obtained by collecting information about every member of a population
- Collect the height of all Vietnamese citizens
- Verify the quality of all products that are produced by factory X
 Parameter: a descriptive measure of a population (μ, σ²)

Population versus sample (cont.)

 A sample is a smaller group of the population.
 A sample survey is obtained by collecting information from some members of the population
- Collect the height of 1,000 Vietnamese citizens
- Verify the quality of a proportion of the products that are produced by factory X
 Statistic: a descriptive measure of a sample (x̄, s²)
 Sampling: taking a sample from the population
 An important requirement: a sample must be representative of the population. That means the profile of the sample is the same as that of the population

Moving from population to sample

Population → Sampling frame (a list of all items of the population) → Sample

Reasons to take a sample

 A census can give accurate data, but collecting information from the entire population is sometimes impossible
 A census is time-consuming and expensive
 A sample allows us to investigate more detailed information
 A sufficiently large sample ensures that results from the sample are nearly as accurate as those from the population

Types of sample

 Random sampling => random sample
 Quasi-random sampling => systematic sample, stratified sample, multistage sample
 Non-random sampling => quota sample, cluster sample

Getting acquainted with SPSS

 Import the file 'assignment 1 data set.xls' into SPSS and get familiar with SPSS.
Data presentation: tables and charts – Outline

 Frequency distribution
- Simple frequency table
- Grouped frequency table
 Charts
- Bar and pie charts
- Histograms
- Boxplot
- Stem-and-leaf
- Ogive

Reading materials:
Chap 2, 3 (Keller)

Why do we have to summarise data?

 Recap
◦ In the previous chapter you learned how to collect data. Data collected through surveys are called 'raw' data.
◦ Raw data may include thousands of observations and often provide too much information => we need to summarise before presenting to an audience
 Requirements
◦ A data summary clears away details but should give the overall pattern.
◦ Summarised information is concise but should reflect an accurate view of the original data
 Methods to summarise and present data
◦ Tables
◦ Charts
◦ Numerical summaries (measures of location and dispersion)

Tables: frequency distribution

 Frequency is the number of times a certain event has happened
 A frequency distribution records the number of times each value occurs and is presented in the form of a table
 Types of frequency distribution:
• Simple frequency distribution
• Grouped frequency distribution
• Cumulative, percentage, and cumulative percentage frequency distribution

Simple frequency distribution

 Applications:
• Qualitative data
• Discrete variable with few values
 Example of a discrete variable with few values:
• You are given raw data of the midterm marks of 20 students as follows: 7, 7, 10, 8, 5, 4, 5, 6, 4, 9, 8, 7, 6, 4, 8, 5, 7, 10, 10, 9
• Create a simple frequency table manually

Simple frequency table: example 1

Marks | Number of students (frequency)
4     | 3
5     | 3
6     | 2
7     | 4
8     | 3
9     | 2
10    | 3
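As a quick cross-check, the same frequency table can be built in a few lines of Python (a sketch using only the standard library; the variable names are illustrative):

```python
from collections import Counter

# Midterm marks of the 20 students from example 1
marks = [7, 7, 10, 8, 5, 4, 5, 6, 4, 9, 8, 7, 6, 4, 8, 5, 7, 10, 10, 9]

# Count how many times each mark occurs and print rows in ascending order
freq = Counter(marks)
for mark in sorted(freq):
    print(mark, freq[mark])
```

Running this reproduces the table above: 4 appears 3 times, 5 appears 3 times, and so on.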

Simple frequency distribution: nominal variable

 Example 2: We have a data set of 686 international students studying at UNSW, Australia. Create a frequency table.
 Large data set => can't create a frequency table manually
 Creating a simple frequency table using SPSS:
 Go to 'Analyse' => 'Tables' => 'Tables of frequency'
 When the dialog box appears, choose a variable for the box 'Frequencies for', then click OK
 Copy the table to Excel for more manipulations

Simple frequency distribution: example 2

Nationality       | Number of students (frequency)
Australia         | 179
New Zealand       | 1
Hong Kong         | 21
Singapore         | 48
Malaysia          | 70
Indonesia         | 76
Philippines       | 6
Thailand          | 18
China             | 99
Vietnam           | 9
India             | 11
USA, Canada       | 14
UK, Ireland       | 35
Other Europe      | 42
Rest of the world | 57
Total             | 686

Grouped frequency table: discrete variable with many values

 Example 3: the marks scored by 58 candidates seeking promotion in a personnel selection test were recorded as follows. Construct a frequency table using a class width of ten marks.

37 49 58 59 56 79
62 82 53 58 34 45
40 43 44 50 42 61
54 30 49 54 76 47
64 53 64 54 60 39
49 44 47 44 25 38
55 57 54 55 59 40
31 41 53 47 58 55
59 64 56 42 38 37
33 33 47 50

Grouped frequency table: discrete variable with many values (cont.)

Marks (class interval) | Number of candidates (frequency)
21 – 30 | 2
31 – 40 | 11
41 – 50 | 17
51 – 60 | 20
61 – 70 | 5
71 – 80 | 2
81 – 90 | 1
Total   | 58

Note: The decision on the number of classes and the class intervals is subjective, but the number should be chosen carefully.
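The grouped table can be cross-checked the same way (a sketch in plain Python; the class limits 21-30, …, 81-90 follow the slide's choice of a class width of ten marks):

```python
# The 58 test scores from example 3, read row by row
data = [37, 49, 58, 59, 56, 79, 62, 82, 53, 58, 34, 45,
        40, 43, 44, 50, 42, 61, 54, 30, 49, 54, 76, 47,
        64, 53, 64, 54, 60, 39, 49, 44, 47, 44, 25, 38,
        55, 57, 54, 55, 59, 40, 31, 41, 53, 47, 58, 55,
        59, 64, 56, 42, 38, 37, 33, 33, 47, 50]

# Count the scores falling in each class 21-30, 31-40, ..., 81-90
for lower in range(21, 90, 10):
    upper = lower + 9
    count = sum(lower <= x <= upper for x in data)
    print(f"{lower}-{upper}: {count}")
```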

Grouped frequency table: continuous variable

 Example 4: draw a frequency table of the wages (in USD) paid to 30 people, as follows:

202 277 554 145 361
457 87 94 240 144
310 391 362 437 429
176 325 221 374 216
480 120 274 398 282
153 470 303 338 209

Grouped frequency table: continuous variable (cont.)

Wages (class interval) | Number of people (frequency)
< $100       | 2
$100 – <$200 | 5
$200 – <$300 | 8
$300 – <$400 | 9
$400 – <$500 | 5
$500 – <$600 | 1
Total        | 30

Terminology:
- Lower value: the lowest value of one class.
- Upper value: the highest value of one class.
- Class interval: the range from lower to upper value.
- Open-ended class: the first or last class in the range may be open-ended, meaning it has no lower or upper value (e.g. <$100). Open-ended classes are designed for uncommon values: too low or too high.
Frequency distribution: summary

1. Simple frequency distribution: an easy task; can be done either manually or with statistical software
2. Grouped frequency distribution: more difficult. The hardest task is to decide the number of classes and the class width or class intervals. Ideal: each class reflects differences in the nature of the data. The more you work on it, the more reasonable the number and size of classes you decide on.
3. The upper value of the previous class should not coincide with the lower value of the following class, to make sure each value falls in only one class.

Cumulative, percentage, and cumulative percentage frequency distribution

Wages (class interval) | Frequency | Cumulative frequency | Percentage frequency | Cumulative percentage frequency
< $100       | 2 | 2  | 6.7  | 6.7
$100 – <$200 | 5 | 7  | 16.7 | 23.3
$200 – <$300 | 8 | 15 | 26.7 | 50.0
$300 – <$400 | 9 | 24 | 30.0 | 80.0
$400 – <$500 | 5 | 29 | 16.7 | 96.7
$500 – <$600 | 1 | 30 | 3.3  | 100.0
Total        | 30 |   |      |
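The cumulative and percentage columns are simple running computations; a sketch in plain Python using the wage frequencies from example 4:

```python
classes = ["< $100", "$100 - <$200", "$200 - <$300",
           "$300 - <$400", "$400 - <$500", "$500 - <$600"]
freq = [2, 5, 8, 9, 5, 1]
total = sum(freq)

cum = 0
for label, f in zip(classes, freq):
    cum += f  # running (cumulative) count
    print(label, f, cum,
          round(100 * f / total, 1),    # percentage frequency
          round(100 * cum / total, 1))  # cumulative percentage frequency
```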


Charts

 Tools for qualitative and discrete data:
• Simple bar charts
• Pie charts
 Tools for continuous data:
• Histograms
• Stem-and-leaf plots
• Cumulative frequency curve (ogive)
• Boxplots (discussed in lecture 3)

Bar and pie charts

 Back to the UNSW survey example: create a bar chart and a pie chart
 Reduce the number of classes for an easier visual look

Nationality       | Number of students (frequency) | Percentage frequency
Australia & NZ    | 180 | 26.24%
China             | 120 | 17.49%
South East Asia   | 227 | 33.09%
India             | 11  | 1.60%
USA & Canada      | 14  | 2.04%
UK & Ireland      | 35  | 5.10%
Other Europe      | 42  | 6.12%
Rest of the world | 57  | 8.31%
Total             | 686 | 100.00%

Bar charts: example of UNSW

[Bar chart: number of international students at UNSW; frequency on the vertical axis, region (Australia & NZ, China, South East Asia, India, USA & Canada, UK & Ireland, Other Europe, Rest of the world) on the horizontal axis]

Pie charts: example of UNSW

[Pie chart: percentage of international students at UNSW by region, using the percentage frequencies from the table above]
Histograms

 Raw data => frequency table => histogram
 A histogram looks like a bar chart except that the bars are joined together
 Two types of histograms:
 Equal-width histogram
 Unequal-width histogram

Equal-width histograms

 All bars have the same width (the same class intervals)
 The height of each bar represents the frequency of the class interval
 Using the raw data in example 4, draw a histogram representing wages

Shapes of histograms – symmetric

[Histogram of symmetric data: a bell shape centred near 0, tailing off evenly on both sides]

Shapes of histograms – positive skew (long tail to right)

[Histogram of positively skewed data: frequencies highest at low values, with a long tail stretching to the right]

Shapes of histograms – negative skew (long tail to left)

[Histogram of negatively skewed data: frequencies highest at high values, with a long tail stretching to the left]

Shapes of histograms – bimodal

[Histogram of bimodal data: two distinct peaks]
Histogram terms

 Modal class – the class with the highest number of observations
 Uni-modal, bi-modal, tri-modal, multi-modal
 Skewness, symmetry
 Relative frequency histogram: replace the frequency for each class by class frequency / total number of obs.

Stem-and-leaf display

 Raw data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
 Rearranging the data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
 Stem-and-leaf display:

2 | 144677
3 | 028
4 | 1

Ogive

 An ogive is a cumulative frequency curve which shows the number of items less than one particular value
 E.g. given a frequency table of salary => draw an ogive

Class    | Frequency | Cumulative frequency
<100     | 22 | 22
100-<150 | 44 | 66
150-<200 | 79 | 145
200-<250 | 96 | 241
250-<300 | 44 | 285
300-<350 | 15 | 300

How to draw an ogive

 An ogive is a line chart of cumulative frequency and can be drawn in Excel using a line graph
[Ogive: line chart of cumulative frequency against class (<100 up to 300-<350), rising from 22 to 300]
Numerical summaries: central tendency and dispersion – Outline

 Measures of location:
 Mean, median, mode
 Selection of measures of location
 Measures of dispersion:
 Range, quartile range, quartile deviation, variance, standard deviation
 Chebyshev's law
 Coefficient of variation
 Coefficient of skewness

Reading materials:
Chap 4 (Keller)

Measures of location (central tendency)

 A measure of location shows where the centre of the data is
 Three most useful measures of location:
 Arithmetic mean/average
 Median
 Mode

Arithmetic mean

 Arithmetic mean of a population: μ = (Σᵢ Xᵢ) / N
 Arithmetic mean of a sample: x̄ = (Σᵢ xᵢ) / n
Where: Xᵢ, xᵢ – the value of each item
N, n – the total number of items

Easy example – mean

 Data: 5, 7, 1, 2, 4
x̄ = (1/n) Σᵢ xᵢ = (1/5)(5 + 7 + 1 + 2 + 4) = (1/5) × 19 = 3.8

Advantages and disadvantages of arithmetic mean

 Advantages:
◦ Easy to understand and calculate
◦ The values of all items are included => representative of the whole set of data
 Disadvantages:
◦ Sensitive to outliers:
Sample (43; 38; 37; . . . ; 27; 34): x̄ = 33.5
Contaminated sample (43; 38; 37; . . . ; 27; 1934): x̄ = 71.5
(Source: Slide #23, Dehon's statistics lecture, Université libre de Bruxelles, SBS-EM)
Median

 The median is the value of the observation which is located in the middle of the data set
 Steps to find the median:
1. Arrange the observations in order of size (normally ascending order)
2. Find the number of observations and hence the middle observation
3. The median is the value of the middle observation

Calculate the median from raw data

 If the data has an odd number of observations:
◦ The middle observation is the ((n+1)/2)th:
Median = x_((n+1)/2)
 If the data has an even number of observations:
◦ There are two observations located in the middle, and
Median = (x_(n/2) + x_(n/2 + 1)) / 2

Example

 E.g. 1. Raw data: 11, 11, 13, 14, 17 => find the median
 E.g. 2. Raw data: 11, 11, 13, 14, 16, 17 => find the median

Advantages and disadvantages of median

 Advantages:
◦ Easy to understand and calculate
◦ Not affected by outlying values => can thus be used when the mean would be misleading
 Disadvantages:
◦ The value of one observation => fails to reflect the whole data set
◦ Not easy to use in other analysis

Mode

 The mode is the value which occurs most frequently in the data set
 Steps to find the mode:
1. Draw a frequency table for the data
2. Identify the mode as the most frequent value

Example to calculate mode

X  | Frequency
8  | 3
12 | 7
16 | 12
17 | 8
19 | 5

Here the mode is 16, the value with the highest frequency (12).
Bimodal and multimodal data

[Figures: a bimodal distribution (two modes) and a multimodal distribution (several modes)]

Mean, median and mode in normal and skewed distributions

[Figures: positions of the mean, median and mode in a normal distribution and in skewed distributions]

Which measure of centre is best?

 Mean: generally the most commonly used
 Sensitive to extreme values
 If the data are skewed or extreme values are present, the median is better, e.g. real estate prices
 Mode: generally best for categorical data – e.g. restaurant service quality (below): the mode is "very good" (ordinal)

Rating       | # customers
Excellent    | 20
Very good    | 50
Good         | 30
Satisfactory | 12
Poor         | 10
Very Poor    | 6

Measures of dispersion

 Measures of dispersion tell you how spread out all the other values of the distribution are from the central tendency
 Measures of dispersion:
• The range, quartile range, and quartile deviation
• Variance and standard deviation

Why do we need measures of dispersion?

 Two data sets of midterm marks of 5 students:
◦ First set: 100, 40, 40, 35, 35 => mean: 50 => the measure of location is less representative, and thus less reliable
◦ Second set: 70, 55, 50, 40, 35 => mean: 50 => the measure of location is more representative, and thus more reliable
 We need to know the spread of the other values around the central tendency; this is especially important in analysing the stock market.
Range

 The range is the difference between the largest and smallest values => sort the data before computing the range
 Formula: Range = maximum value − minimum value
 Advantages of the range: easy to calculate for ungrouped data; easy to interpret the result.
 Disadvantages:
◦ Takes into account only two values
◦ Affected by one or two extreme values
◦ More difficult to calculate for grouped data

Variance

 Variance of a population: σ² = Σ(Xᵢ − μ)² / N
 Variance of a sample: s² = Σ(xᵢ − x̄)² / (n − 1)
 Advantages:
• Takes into account all values
 Disadvantages: the unit of variance has no meaning

Standard deviation (σ)

 Standard deviation (S.D.) is the square root of the variance
 S.D. of a population: σ = √σ²
 S.D. of a sample: s = √s²
 Advantages:
• Overcomes the disadvantage of the meaningless unit of variance
• The most widely used measure of dispersion (the bigger its value => the more spread out the data are)

Application of this in finance

 The variance (or S.D.) of an investment can be used as a measure of risk, e.g. on profits/returns.
 Larger variance => larger risk
 Usually, the higher the rate of return, the higher the risk

Chebyshev's law or the law of 3σ

 For a normal or symmetrical distribution:
◦ 68.26% of all obs fall within 1 standard deviation of the mean, i.e. in the range (x̄ − 1s) to (x̄ + 1s)
◦ 95.45% of all obs fall within 2 standard deviations of the mean, i.e. in the range (x̄ − 2s) to (x̄ + 2s)
◦ 99.73% of all obs fall within 3 standard deviations of the mean, i.e. in the range (x̄ − 3s) to (x̄ + 3s)

Example – 2 funds over 10 years

 Rates of return (%):
A: 8.3, −6.2, 20.9, −2.7, 33.6, 42.9, 24.4, 5.2, 3.1, 30.5
B: 12.1, −2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, −1.3, 11.4

x̄_A = 16%, x̄_B = 12%
s²_A = 280.34 (%²), s²_B = 99.37 (%²)
 Fund A: higher risk, but also a higher average rate of return.
Boxplots

 Need the MEDIAN and QUARTILES to create a boxplot
 MEDIAN = middle of the observations, i.e. ½ way through the observations
 QUARTILES = mark the quarter points of the observations, i.e. ¼ (Q1) and ¾ (Q3) of the way through the data [(n+1)/4; 3(n+1)/4]
 INTERQUARTILE RANGE = Q3 − Q1
 Whiskers: max length is 1.5×IQR; they stretch from the box to the furthest data point (within this range)
 Points further out from the box are marked with stars; these are called outliers

Boxplot

 Here is the boxplot of the height of international students studying at UNSW
[Boxplot of height (about 150-200 cm): the box runs from the lower quartile to the upper quartile with the median inside; whiskers extend from the box on both sides]

Shapes of boxplots

 Boxplots show skewness/symmetry, modality and range
[Boxplots of symmetric, positively skewed, negatively skewed and bimodal data]

Coefficient of variation (C of V)

 The standard deviation can compare the dispersion of two distributions with similar means
 For distributions having different means, we use the coefficient of variation to compare their dispersions
 The bigger the coefficient of variation, the wider the dispersion
 E.g.: two sets of data have the following information:

Data               | A   | B
Mean               | 120 | 125
Standard deviation | 50  | 51

 Which one is more spread out?

Coefficient of variation (cont.)

 Formula: coefficient of variation = standard deviation / mean = s / x̄
 C of V_A = 0.417 and C of V_B = 0.408 => A is more spread out than B

Coefficient of skewness (C of S)

 This measures the shape of a distribution
 There are several measures of skewness.
 Below is a common one: Pearson's coefficient of skewness.
Coefficient of skewness = 3 × (mean − median) / standard deviation
 If C of S is nearly +1 or −1, the distribution is highly skewed
 If C of S is positive => the distribution is skewed to the right (positive skew)
 If C of S is negative => the distribution is skewed to the left (negative skew)
Activity 1

 Summary statistics of two data sets are as follows:

                   | Set 1: Ages of students studying at UNSW | Set 2: Wages of staff
Mean               | 22.4839 | 294.3
Median             | 21      | 292.5
Standard deviation | 6.3756  | 125.93

 Compute Pearson's coefficient of skewness for these data sets and describe the shapes of their distributions

Distribution shapes

[Histograms: ages are skewed to the right; wages are nearly normal]

Measuring correlation between two variables

 Used when we have two measurements on one observation, e.g. height and weight of a person, or weekly income and amount spent on rent per week:
◦ Scatterplot (discussed in lecture 8)
◦ Covariance
◦ Correlation

Covariance

 Measures the strength of the linear relationship between X and Y.
 Calculated as
cov(X, Y) = Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1)
          = [Σᵢ XᵢYᵢ − n X̄ Ȳ] / (n − 1)

Values of covariance

 If cov > 0, then as X increases, Y increases; as X decreases, Y decreases (positive slope)
[Scatterplot: points rising from left to right]
 If cov < 0, then as X increases, Y decreases; as X decreases, Y increases (negative slope)
[Scatterplot: points falling from left to right]
Values of covariance (cont.)

 If cov = 0, then as X changes, Y doesn't change => the variables are not linearly related
[Scatterplot: no linear pattern between X and Y]

Coefficient of correlation

 Also measures the strength of the linear relationship between X and Y.
 Is bounded between −1 and +1.
 Calculated as
ρ = COV(X, Y) / (σ_X σ_Y),  r = cov(X, Y) / (s_X s_Y)

If correlation equals…

 If r = −1: perfect negative linear relationship
 If r = +1: perfect positive linear relationship
 If r = 0: no LINEAR relationship

Example

 Calculate the covariance and correlation for the following data.

xᵢ | yᵢ | xᵢ − x̄ | (xᵢ − x̄)² | yᵢ − ȳ | (yᵢ − ȳ)² | (xᵢ − x̄)(yᵢ − ȳ)
1  | 7  | −2.5 | 6.25 | 3  | 9  | −7.5
2  | 5  | −1.5 | 2.25 | 1  | 1  | −1.5
3  | 5  | −0.5 | 0.25 | 1  | 1  | −0.5
4  | 4  | 0.5  | 0.25 | 0  | 0  | 0
5  | 2  | 1.5  | 2.25 | −2 | 4  | −3
6  | 1  | 2.5  | 6.25 | −3 | 9  | −7.5
Total | 21 | 24 | 0 | 17.5 | 0 | 24 | −20
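The worked solution follows on the next slide; as a cross-check, the whole computation takes a few lines of plain Python (a sketch; no libraries needed):

```python
x = [1, 2, 3, 4, 5, 6]
y = [7, 5, 5, 4, 2, 1]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
sx = (sum((xi - mx) ** 2 for xi in x) / (n - 1)) ** 0.5
sy = (sum((yi - my) ** 2 for yi in y) / (n - 1)) ** 0.5

print(cov)              # -4.0
print(cov / (sx * sy))  # about -0.976
```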

Covariance cont'd

cov(X, Y) = Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1) = −20/5 = −4
s²_x = Σᵢ (Xᵢ − X̄)² / (n − 1) = 17.5/5 = 3.5
s²_y = Σᵢ (Yᵢ − Ȳ)² / (n − 1) = 24/5 = 4.8
r = cov(x, y) / (s_x s_y) = −4 / (√3.5 × √4.8) = −0.976
 The correlation implies a strong negative relationship – view the graph over.

Summary

 Techniques for summarising data:
 Bar charts, pie charts
 Histograms and boxplots – shape of distribution
 Centre, spread, modality, skewness
 Cumulative relative density function (ogive)
 Numerical measures:
◦ Central tendency – mean, median, mode
◦ Dispersion – variance, standard deviation, coefficient of variation, range, interquartile range
 Two sets of data: scatterplot, covariance, correlation
Section 2

Probability and Random Variables

Reading materials:
Chap 6, 7, 8 (Keller)

Why do we need to study probability and probability distributions?

 Probability is a crucial component in obtaining information about populations from samples
 Probability provides the link between populations and samples.
 E.g.:
◦ From sample means => infer population means
◦ From a known population => measure the likelihood of obtaining a particular event or sample.

Terminology (1)

 A random experiment is a process that results in a number of possible outcomes, none of which can be predicted with certainty.
 E.g.:
◦ Roll a die: outcomes 1, 2, 3, 4, 5, 6.
◦ Flip a coin: outcomes heads, tails
◦ Take an exam: pass or fail

Terminology (2)

 The sample space of a random experiment is a list of all possible outcomes
 Outcomes must be mutually exclusive and exhaustive.
◦ No two outcomes can both occur on any one trial
◦ All possible outcomes must be included
 E.g. roll a die: sample space S = {1, 2, 3, 4, 5, 6}.
Continued

 An event is a collection of one or more simple (individual) outcomes or events.
 E.g. roll a die: event A = an odd number comes up. Then A = {1, 3, 5}.
 In general, use sample space S = {E1, E2, …, En} where there are n possible outcomes.
 The probability of an event Ei occurring on a single trial is written as P(Ei)

Probabilities

Probability of an event = number of favourable outcomes / total number of outcomes
 For the sample space S, P(S) = 1
 E.g. roll a die: sample space S = {1, 2, 3, 4, 5, 6}.
Examples of events:
Obtain the number '1': A = {1} and P(A) = 1/6
Obtain an odd number: B = {1, 3, 5} and P(B) = 1/2
Obtain a number larger than 6: C = {} and P(C) = 0
Obtain a number smaller than 7: D = {1, 2, 3, 4, 5, 6} and P(D) = 1
Two rules about probabilities

 The probability assigned to each simple event Ei must satisfy:
1. 0 ≤ P(Ei) ≤ 1 for all i
2. Σᵢ P(Ei) = 1

Probabilities of combined events

 Consider two events, A and B.
P(A or B) = P(A ∪ B) = P(A union with B) = P(A occurs, or B occurs, or both occur)
P(A and B) = P(A ∩ B) = P(A intersection with B) = P(A and B both occur)
P(Ā) = P(Aᶜ) = P(A complement) = P(A does not occur)
P(A|B) = P(A occurs given that B has occurred)

Joint probabilities

 E.g.: mutual funds (http://www.howtosavemoney.com/how-do-mutual-funds-work/)

Probabilities               | B1 = Mutual fund outperforms market | B2 = Mutual fund does not outperform market
A1 = Top-20 MBA program     | 0.11 | 0.29
A2 = Not top-20 MBA program | 0.06 | 0.54

 Joint probabilities = P(A ∩ B)
P(mutual fund outperforms AND top-20 MBA) = 0.11
P(mutual fund outperforms AND not top-20) = 0.06
P(mutual fund does not outperform AND top-20) = 0.29
P(mutual fund does not outperform AND not top-20) = 0.54

Marginal probabilities (1)

 Marginal probabilities:
◦ Computed by adding across rows or down columns
◦ Named because they are calculated in the margins of the table
Marginal probabilities (2)

Probabilities | B1   | B2   | Totals
A1            | 0.11 | 0.29 | 0.40
A2            | 0.06 | 0.54 | 0.60
Totals        | 0.17 | 0.83 | 1.00

P(A1) = P(A1 and B1) + P(A1 and B2) = 0.11 + 0.29 = 0.40
P(A2) = P(A2 and B1) + P(A2 and B2) = 0.06 + 0.54 = 0.60
P(B1) = P(B1 and A1) + P(B1 and A2) = 0.11 + 0.06 = 0.17
P(B2) = P(B2 and A1) + P(B2 and A2) = 0.29 + 0.54 = 0.83

Conditional probability

 The conditional probability that A occurs, given that B has occurred:
P(A|B) = P(A and B) / P(B)
 Want to see whether a fund managed by a graduate of a top-20 MBA program will outperform the market:
P(B1|A1) = P(B1 and A1) / P(A1) = 0.11 / 0.40 = 0.275
Some rules of probability

 Additive rule: for the union of two events,
P(A or B) = P(A) + P(B) − P(A and B)
 Multiplicative rule: for the joint probability of two events,
P(A|B) = P(A and B) / P(B), so P(A and B) = P(A|B)·P(B) = P(B|A)·P(A)
 Complement rule: for A and its complement Ā,
P(A) + P(Ā) = 1; therefore P(Ā) = 1 − P(A)

Independence

 Two events are independent if
P(A|B) = P(A) or P(B|A) = P(B)
 Note: if A and B are independent, then P(A and B) = P(A)·P(B). Note: only if independent!
 Then P(A|B) = P(A and B) / P(B) = [P(A)·P(B)] / P(B) = P(A)

Activity 1

 Check whether the event that a manager graduated from a top-20 MBA program is independent of the event that the fund outperforms the market.

Random variables

 Imagine tossing three unbiased coins.
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
 8 equally likely outcomes.
 Let X = the number of heads that occur.
 X can take values 0, 1, 2, 3.
 The actual value of X depends on chance – call it a random variable (r.v.)
 Definition: a random variable is a function that assigns a numeric value to each simple event in a sample space.

Notation

 Denote random variables (X, Y, …) in upper case
 Denote actual realised values (x, y, …) in lower case

Discrete vs continuous r.v.s

 A discrete random variable has a countable number of possible values, e.g. number of heads, number of sales etc.
 A continuous random variable has an infinite number of possible values – the number of elements in the sample space is infinite as a result of continuous variation, e.g. height, weight etc.
Discrete probability distributions

 Definition: a table or formula listing all possible values that a discrete r.v. can take, together with the associated probabilities.
 E.g. for our toss-three-coins example:

x      | 0   | 1   | 2   | 3
P(X=x) | 1/8 | 3/8 | 3/8 | 1/8

(Check the probabilities in the table)

More on discrete prob distns

 If x is the value taken by an r.v. X, then p(x) = P(X=x) = the sum of all the probabilities associated with the simple events for which X = x.
 If an r.v. X can take values xᵢ, then
1. 0 ≤ p(xᵢ) ≤ 1 for all xᵢ
2. Σ_xᵢ p(xᵢ) = 1

Activity 2

o What is the probability of at most one head?
o What is the probability of at least one head?

Describing the probability distribution

 The expected value or mean of a discrete random variable X, which takes on values xᵢ with probability p(xᵢ), is:
μ = E(X) = Σ_xᵢ xᵢ · p(xᵢ)
Back to the coin tossing

x      | 0   | 1   | 2   | 3
P(X=x) | 1/8 | 3/8 | 3/8 | 1/8

μ = E(X) = Σ xᵢ·p(xᵢ) = 0×(1/8) + 1×(3/8) + 2×(3/8) + 3×(1/8) = 12/8 = 1.5

Rules for expectations

If X and Y are random variables, and c is any constant, then the following hold:
 E(c) = c
 E(cX) = cE(X)
 E(X−Y) = E(X) − E(Y)
 E(X+Y) = E(X) + E(Y)
 E(XY) = E(X)·E(Y) only if X and Y are independent
Variance

 Measures the spread/dispersion of a distribution
 Let X be a discrete random variable with values xᵢ that occur with probability p(xᵢ), and E(X) = μ.
 The variance of X is defined as
σ² = E[(X − μ)²] = Σ_xᵢ (xᵢ − μ)²·p(xᵢ)

Variance continued

σ² = E[(X − μ)²]
   = E(X²) − μ²
   = Σ_xᵢ xᵢ²·p(xᵢ) − μ²

Tossing three coins – again…

x      | 0   | 1   | 2   | 3
P(X=x) | 1/8 | 3/8 | 3/8 | 1/8

V(X) = Σ_xᵢ xᵢ²·p(xᵢ) − μ²
     = 0²×(1/8) + 1²×(3/8) + 2²×(3/8) + 3²×(1/8) − 1.5²
     = 0.75
Std Dev(X) = √0.75 = 0.866 (to 3 dp)

Laws for variances

 If X and Y are r.v.s and c is a constant,
1. V(c) = 0
2. V(cX) = c²V(X)
3. V(X+c) = V(X)
4. V(X+Y) = V(X) + V(Y) if X and Y are independent
5. V(X−Y) = V(X) + V(Y) if X and Y are independent

Bivariate distributions

 Distribution of a single variable – univariate
 Distribution of two variables together – bivariate
 So, if X and Y are discrete random variables, then we say
p(x,y) = P(X=x and Y=y) is the joint probability that X=x and Y=y.

Example

 Toss three coins.
 Let X be the number of heads.
 Let Y be the number of changes of sequence, i.e. the number of times we change from H→T or T→H.
◦ HHH: x=3, y=0   TTT: x=0, y=0
◦ HHT: x=2, y=1   TTH: x=1, y=1
◦ HTH: x=2, y=2   THT: x=1, y=2
◦ THH: x=2, y=1   HTT: x=1, y=1
Example continued

Outcome (S) | x | y
HHH | 3 | 0
HHT | 2 | 1
HTH | 2 | 2
THH | 2 | 1
TTH | 1 | 1
THT | 1 | 2
HTT | 1 | 1
TTT | 0 | 0

Bivariate probability distribution

      | y=0 | y=1 | y=2 | px(x)
x=0   | 1/8 | 0   | 0   | 1/8
x=1   | 0   | 2/8 | 1/8 | 3/8
x=2   | 0   | 2/8 | 1/8 | 3/8
x=3   | 1/8 | 0   | 0   | 1/8
py(y) | 2/8 | 4/8 | 2/8 | 1

Independence of random variables

 If the random variables X and Y are independent, then
P(X=x and Y=y) = P(X=x) · P(Y=y)
p(x,y) = px(x) · py(y)
 In the previous example, X and Y are clearly not independent:
p(0, 0) = 1/8
px(0) · py(0) = 1/8 × 2/8 = 1/32
p(0, 0) ≠ px(0) · py(0)

Covariance

 Consider the r.v.s X and Y with joint pdf p(x,y); x = x₁,…,x_m; y = y₁,…,y_n.
 If E(X) = μx and E(Y) = μy, then the covariance between X and Y is given by
σxy = cov(X, Y) = E[(X − μx)(Y − μy)] = Σᵢ Σⱼ xᵢ yⱼ p(xᵢ, yⱼ) − μx·μy

Correlation coefficient

 Associated with covariance:
ρ = cov(x, y) / (σx σy); −1 ≤ ρ ≤ 1

Return to 3 coins tossed

 X = number of heads, Y = number of sequence changes.
 Check for yourself:
μx = 3/2, μy = 1
σ²x = 3/4, σ²y = 1/2
Covariance for example

σxy = Σᵢ Σⱼ xᵢ yⱼ p(xᵢ, yⱼ) − μx·μy
    = 0·0·(1/8) + 1·1·(2/8) + 1·2·(1/8) + 2·1·(2/8) + 2·2·(1/8) + 3·0·(1/8) − (3/2)·1
    = 12/8 − 3/2 = 0
cov(x, y) = 0 and ρ = σxy / (σx σy) = 0.
 X and Y are uncorrelated.

The sum of two random variables

o Consider two real estate agents.
X = the number of houses sold by Albert in a week
Y = the number of houses sold by Beatrice in a week
o The bivariate distribution of X and Y is shown on the next slide
Bivariate distribution of X and Y

      | x=0  | x=1  | x=2  | py(y)
y=0   | 0.12 | 0.42 | 0.06 | 0.60
y=1   | 0.21 | 0.06 | 0.03 | 0.30
y=2   | 0.07 | 0.02 | 0.01 | 0.10
px(x) | 0.40 | 0.50 | 0.10 | 1

We can show (check these at home!)

 E(X) = 0.7
 V(X) = 0.41
 E(Y) = 0.5
 V(Y) = 0.45
Suppose interest is in X+Y

 That is, the total number of houses Albert and Beatrice sell in a week.
 Possible values of X+Y: 0, 1, 2, 3, 4.
 Then, P(X+Y=2) = the sum of all joint probabilities for which x+y=2;
 That is, P(X+Y=2) = p(0,2) + p(1,1) + p(2,0) = 0.07 + 0.06 + 0.06 = 0.19

Repeat this for 0, 1, 2, 3, 4…

x+y    | 0    | 1    | 2    | 3    | 4
p(x+y) | 0.12 | 0.63 | 0.19 | 0.05 | 0.01

 Can evaluate the mean and variance of (X+Y):
E(X+Y) = 1.2
V(X+Y) = 0.56
(check these at home!)

7
Law of expected value and variance of the sum of two variables

If a and b are constants, and X and Y are random variables, then
E(aX + bY) = aE(X) + bE(Y)
V(aX + bY) = a²V(X) + b²V(Y) + 2ab·cov(X, Y)

Application of this – portfolio diversification and asset allocation

 See Keller
◦ Pages 210-214 (7th edition)
 In finance, variance and standard deviation are used to assess the risk of an investment.
 Analysts reduce risk by diversifying their investments – that is, combining investments where the correlation is small.
Continuous probability distributions

 Remember: discrete data have a limited (finite) number of possible values => discrete probability distributions can be put in tables
 Continuous data have an infinite number of possible values => we use a smooth function, f(x), to describe the probabilities

About the function

 f(x) must satisfy the following:
1. f(x) ≥ 0 for all x, that is, it must be non-negative.
2. The total area underneath the curve representing f(x) = 1.
Notes about continuous pdfs

1) P(a < X < b) = the area under the curve between a and b, evaluated as
P(a < X < b) = ∫ₐᵇ f(x) dx
2) For a continuous pdf, the probability that X will take any specific value is zero.
Let a → b – see that the area → 0.
Notes about continuous pdfs

3) A continuous random variable has a mean and a variance!
The mean measures the location of the distribution; the variance measures the spread of the distribution.

The normal distribution

 Bell-shaped, symmetric about µ, reaches its highest point at x = µ, tends to zero as x → ±∞.
Notes about the normal distribution

1. E(X) = µ; V(X) = σ².
2. Area under curve = 1
3. Different means – shift the curve up and down the x-axis
4. Different variances – the curve becomes more or less peaked
5. Shorthand notation: X ~ N(µ, σ²).

Different means

[Figure: normal curves with different means, shifted along the x-axis]
Different variances

[Figure: normal density curves with σ = 0.5, σ = 1 and σ = 2; smaller σ gives a taller, narrower curve]

Probabilities from the normal distribution (1)

 Generally, we require probabilities P(X < a)
[Figure: normal curve with the area to the left of a shaded]
OR we require (2)

 P(a < X < b)

So, need to find the area under the curve… (3)

 That is, we need to integrate as follows:
Area = ∫ₐᵇ f(x) dx = ∫ₐᵇ [1/(σ√(2π))] e^(−½((x−µ)/σ)²) dx
 Not easy to do!
Tabulated values

 Tables were made to provide probabilities.
 However, obviously, different values are needed for each different μ and σ² – infinitely many possible values, so it is impossible to have all the tables needed!
 So we select one particular normal distribution – μ = 0, σ² = 1 – call this the standard normal distribution, and tabulate all the probabilities for it.
 Call an r.v. from this a standard normal r.v., and use the notation Z ~ N(0, 1)
 Now we just need a way to convert any other normal distribution to the standard normal – then we can use the existing tables

Standardising

 The process of converting any normal random variable to a standard normal random variable.
 If X ~ N(μ, σ²), then use the linear transformation below:
Z = (X − μ)/σ ~ N(0, 1)
Standardising (cont.)

 So, for ANY random variable that comes from a normal distribution, if we subtract the mean and divide by the standard deviation, we get an r.v. ~ N(0, 1).
 See the Z-table in Appendix B-8. This table provides P(Z < z) for various values of z
 Others give P(Z > z) for various values of z.

Rules to find probabilities from normal tables

 Symmetry
 P(Z < −a) = P(Z > a)
 P(Z > a) = 1 − P(Z < a)
 P(a < Z < b) = P(Z < b) − P(Z < a)
 The total area under the curve is 1; the total area under each half of the curve is 0.5, i.e. P(Z < 0) = P(Z > 0) = 0.5
 Draw the curve, shade the area, break it up into areas you can find (differences or sums)
Examples using tables

1) P(Z < 1.5) = 0.9332 (from table)
2) P(Z > 1) = 1 − P(Z < 1) = 1 − 0.8413 (from tables) = 0.1587
3) P(Z < −1) = P(Z > 1) by symmetry = 0.1587 (from (2))
4) P(1 < Z < 1.5) = P(Z < 1.5) − P(Z < 1) = 0.9332 − 0.8413 = 0.0919
In general

 Given X ~ N(μ, σ²), suppose we require P(X < a).
 Know that Z = (X − μ)/σ ~ N(0, 1).
 So, P(X < a) = P((X − μ)/σ < (a − μ)/σ) = P(Z < (a − μ)/σ), where Z ~ N(0, 1).
Section 3

Sampling Distribution

Outline

 Distribution of sample means
 The central limit theorem

Reading materials:
Chap 9 (Keller)

Distribution of sample means: example (1)

 Data were collected on the time taken for a pizza order to be completed, in minutes (from order taken to pizza handed over to the customer). Below is a histogram of 50 observations and some summary statistics.

[Histogram of 50 pizza times]
Variable: Pizza time | N = 50 | Mean = 17.256 | Median = 17.041 | StDev = 3.743

Another 50 observations; 1,000 observations on the time to complete a pizza order (2)

[Histograms of another 50 observations and of 1,000 observations]
Variable: Pizza time | N = 50   | Mean = 17.585 | Median = 17.374 | StDev = 3.872
Variable: Pizza time | N = 1000 | Mean = 17.934 | Median = 17.627 | StDev = 4.009

10,000 observations on the time to complete a pizza order (3)

[Histogram of 10,000 pizza times]
Variable: Pizza time | N = 10000 | Mean = 18.046 | Median = 17.744 | StDev = 4.006

In general (4)

 One thousand datasets, each with 10 observations in it (that is, 1,000 samples of size 10), are generated (simulated data) from this model, and for each sample the average (sample mean), median (sample median) and sample standard deviation are calculated and recorded.

Variable: average | N = 1000 | Mean = 18.007 | Median = 18.020 | StDev = 1.231
Variable: median  | N = 1000 | Mean = 17.757 | Median = 17.804 | StDev = 1.433

[Histograms of the 1,000 sample averages and the 1,000 sample medians]

S.D. for the 1,000 random samples of size 10

[Histogram of the 1,000 sample standard deviations]
Variable: stdev | N = 1000 | Mean = 3.8183 | Median = 3.7282 | StDev = 0.9505

More random numbers

 Another thousand datasets are generated from the same model, but this time each dataset has 25 observations.

[Histograms of the 1,000 sample averages and the 1,000 sample medians for samples of size 25]
Variable: average | N = 1000 | Mean = 17.991 | Median = 17.982 | StDev = 0.814
Variable: median  | N = 1000 | Mean = 17.711 | Median = 17.675 | StDev = 1.017

S.D. for samples of size 25

[Histogram of the 1,000 sample standard deviations for samples of size 25]
Variable: stdev | N = 1000 | Mean = 3.9637 | Median = 3.9391 | StDev = 0.6048

Notice as we take larger samples…

 The histograms for all three statistics (sample mean, sample median and sample standard deviation) are becoming more and more symmetric and bell-shaped, and less variable, particularly those for the sample mean
 Also notice that the estimated standard deviation of the sample mean is not only decreasing as the sample size increases, but is also approximately the same for the same sample sizes.

A general result of great importance

 No matter what model a random sample is taken from, as the sample size (number of random observations) increases, the distribution of the sample mean becomes closer and closer to the normal distribution, and
 No matter what model a random sample is taken from, and for any sample size n, the standard deviation of the sample mean is the model standard deviation σ (the theoretical standard deviation) divided by √n, that is, σ/√n => called the standard error of the mean (SE).

The central limit theorem

 Whatever the population distribution looks like (normal or not), when the sample size is large enough, the distribution of sample means will be normal, and we can use the Z-statistic to calculate the probability of any mean value
This is the central limit theorem

 If X is a random variable with mean µ and variance σ², then in general,
X̄ ~ N(µ, σ²/n)
Z = (X̄ − µ)/(σ/√n) ~ N(0, 1) as n → ∞.

So, how large does n need to be?

 Generally, it depends on the original distribution of X.
◦ If X has a normal distribution, then the sample mean has a normal distribution for all sample sizes.
◦ If X has a distribution that is close to normal, the approximation is good for small sample sizes (e.g. n = 20).
◦ If X has a distribution that is far from normal, the approximation requires larger sample sizes (e.g. n = 50).

Activity 1

 The average height of Vietnamese women is 1.6m, with a standard deviation of 0.2m. If I choose 25 women at random, what is the probability that their average height is less than 1.53m?
Estimation – Outline

 Concepts of estimation – point and interval estimators; unbiasedness and consistency
 Estimating the population mean when the population variance is known
 Estimating the population mean when the population variance is unknown
 Selecting the sample size

Reading materials:
Chap 10 (Keller)

Recap: the central limit theorem

 As n → ∞, the distribution of the sample mean becomes normal, with centre µ and standard deviation σ/√n.
 This happens regardless of the shape of the original population.
 i.e. X̄ follows a normal distribution with
E(X̄) = µ and var(X̄) = σ²/n

Recap: What size n?

 If the distribution of X is normal, then for all n the sample mean will follow a normal distribution.
 If the distribution of X is VERY non-normal, then we will need a large n for us to see the normality of the distribution of the sample mean.
 In all cases, as n gets larger, the distribution of the mean gets more normal.
How does this help?

 This means that if we have a large enough sample, we can always find probabilities to do with the mean, since it will have a normal distribution no matter what the original distribution is.

Estimation

 The aim of estimation is to determine the approximate value of a parameter of the population using statistics calculated in respect of a sample drawn from that population.
• As an example, we estimate the mean of a population using the mean of a sample drawn from that population. That is, the sample mean is an estimator of the population mean.
• The actual statistic we calculate in respect of the sample is called an estimate of the population parameter. For example, a calculated sample mean is an estimate of the population mean.
Estimators

 There are two types of estimators:
Point estimate: a single value or point, i.e. sample mean = 4 is a point estimate of the population mean, µ.
Interval estimate: draws inferences about a population by estimating a parameter using an interval (range).
• E.g. we are 95% confident that the unknown mean score lies between 56 and 78.

Desirable qualities of estimators

 We want our estimators to be precise and accurate
Accurate: on average, our estimator is getting towards the true value
Precise: our estimates are close together
 The sample mean is a precise and accurate estimator of the population mean. (Sometimes, accurate and precise together is referred to as unbiased.)

Point and interval estimators

 A point estimate is just that; an interval gives some idea of how sure we are.
 Interval estimator:
 Gives an interval (range) based on a sample statistic
 This interval corresponds to a probability, and this probability is never equal to 100%

Interval estimators for µ, σ known

 We know that
x̄ ~ N(µ, σ²/n).
 So, Z = (x̄ − µ)/(σ/√n) ~ N(0, 1).

Interval estimators (cont.)

 We also know that, for a standard normal distribution, 95% of the area is contained between −1.96 and +1.96:
P(−1.96 < Z < 1.96) = 0.95

Put these things together… and rearranging…

P(−1.96 < Z < 1.96) = 0.95
P(−1.96 < (x̄ − µ)/(σ/√n) < 1.96) = 0.95
P(−1.96·σ/√n < x̄ − µ < 1.96·σ/√n) = 0.95
P(x̄ − 1.96·σ/√n < µ < x̄ + 1.96·σ/√n) = 0.95
P(x̄ − 1.96·σ/√n < µ < x̄ + 1.96·σ/√n) = 0.95

 This is called a 95% confidence interval for μ.
 What this means:
• In repeated sampling, 95% of the intervals created this way would contain μ and 5% would not.
 Can change how confident we are by changing the 1.96:
• Use 1.64 to get a 90% confidence interval
• Use 2.57 to get a 99% confidence interval

Example 1

 Suppose we know from experience that a random variable X ~ N(μ, 1.66), and for a sample of size 10 from this population the sample mean is 1.58.
 Now,
P(x̄ − 1.96·σ/√n < µ < x̄ + 1.96·σ/√n) = 0.95
P(1.58 − 1.96·√(1.66/10) < µ < 1.58 + 1.96·√(1.66/10)) = 0.95
P(0.78 < µ < 2.38) = 0.95

 Interpretation: if the experiment were carried out multiple times, 95% of the intervals created in this way would contain μ.
 Lower confidence limit: 0.78; upper confidence limit: 2.38

General notation

 In general, a 100(1−α)% confidence interval estimator for μ is given by
P(x̄ − Zα/2·σ/√n < µ < x̄ + Zα/2·σ/√n) = 100(1−α)%
 Notation:
Confidence level: 100(1−α)% – the probability that the parameter falls into the CI
CI: x̄ ± Zα/2·σ/√n
LCL: x̄ − Zα/2·σ/√n; UCL: x̄ + Zα/2·σ/√n
What does 100(1−α)% mean?

 If we want 95% confidence, α = 0.05 (or 5%).
 If we want 90% confidence, α = 0.10 (or 10%).
 If we want 99% confidence, α = 0.01 (or 1%).

What does Zα/2 mean?

 We want to find the middle 100(1−α)% area of the standard normal curve:
◦ So the area left in each tail will be α/2.
◦ Zα/2 is the point which marks off an area of α/2 in the tail
◦ Need to look up normal tables to find this!
Factors influencing the width of the interval

 σ is fixed; it can't be changed
 Vary the sample size: as n gets bigger, the interval gets narrower.
 Vary the confidence level: if we want to be more confident, then we simply change the 1.96 to another number from the standard normal; 2.33 will give 98% confidence, 2.575 will give 99% confidence; increasing confidence will make the interval wider.

IMPORTANT!

 Remember that it is the INTERVAL that changes from sample to sample.
 µ is a fixed and constant value. It is either within the interval or not.
 You should interpret a 95% confidence interval as saying "In repeated sampling, 95% of such intervals created would contain the true population mean".

Example 2

 The average height of a sample of 25 men is found to be 178cm. Assume that the standard deviation of male heights is known to be 10cm, and that heights follow a normal distribution. Find:
1. A 95% confidence interval for the population mean height.
2. A 90% confidence interval for the population mean height.

1. A 95% confidence interval for the population mean height

P(x̄ − 1.96·σ/√n < µ < x̄ + 1.96·σ/√n) = 0.95
P(178 − 1.96·10/√25 < µ < 178 + 1.96·10/√25) = 0.95
P(174.08 < µ < 181.92) = 0.95

 So, in repeated sampling, we would expect 95% of the intervals created this way to contain μ.

2. A 90% confidence interval for the population mean height

P(−1.645 < Z < 1.645) = 0.90, that is, Zα/2 = 1.645
P(x̄ − 1.645·σ/√n < µ < x̄ + 1.645·σ/√n) = 0.90
P(178 − 1.645·10/√25 < µ < 178 + 1.645·10/√25) = 0.90
P(174.71 < µ < 181.29) = 0.90

 So, in repeated sampling, we would expect 90% of the intervals created this way to contain μ.

Interval estimators for µ, σ unknown

 We can't simply substitute s in for σ, since (x̄ − µ)/(s/√n) does not have a standard normal distribution!
 However, it does follow a known distribution: it follows a t-distribution with n−1 degrees of freedom. The statistic is called the t-statistic:
t = (x̄ − µ)/(s/√n)

4
About the t-distribution (1)

 Found by Gosset, published under the pseudonym "Student".
 Called "Student's t-distribution"
 It is symmetric around 0 and mound-shaped (like a normal), but has a higher variance than a normal distribution.
 The higher the degrees of freedom, the more normal the curve looks.

About the t-distribution (2)

[Figure: the standard normal (Z) curve compared with t (df = 13) and t (df = 5); the t curves are bell-shaped and symmetric but more spread out]
Degrees of freedom (df)

 The number of obs whose values are free to vary after calculating the sample mean
 E.g.: df = n − 1 = 3 − 1 = 2
◦ X1 = 1 (or another value)
X2 = 2 (or another value)
X3 = 3 (can't be changed)

Hints for using the t-tables

 The bottom row has df = ∞; these are the standard normal probabilities.
◦ If df is very large, use Z tables even if σ is unknown
 If the exact df is not in the tables, use whatever df is closest
◦ The difference between values for large df is small
◦ E.g. df = 74: use the values for df = 70 as this is closest. Then say:
t0.05,74 ≈ t0.05,70 = 1.667

Confidence interval for µ, σ unknown

P(x̄ − tα/2·s/√n < µ < x̄ + tα/2·s/√n) = 100(1−α)%
 CI: x̄ − tα/2·s/√n < µ < x̄ + tα/2·s/√n

Note: (i) the population must follow a normal distribution for the t-statistic to apply
(ii) use the t-table to find the t-value

Example 3

 A random sample: size n = 25, x̄ = 50, s = 8. Use a 95% confidence level to estimate µ.
x̄ − tα/2·s/√n < µ < x̄ + tα/2·s/√n
50 − 2.0639·8/√25 < µ < 50 + 2.0639·8/√25
46.70 < µ < 53.30

5
Determine the sample size

 Suppose that before we gather data, we know that we want to get an average within a certain distance of the true population value.
 We can use the CLT to find the minimum sample size required to meet this condition, if the standard deviation of the population is known.

Sample size required

 Example 4: Assume that the standard deviation of a population is 5. I want to estimate the true population mean to within a range of 3, with 99% certainty.
 Step 1: set up the equation needed.
P(|X̄ − µ| < 3) = 0.99

Sample size continued

 Step 2: standardise.
P(|X̄ − µ|/(σ/√n) < 3/(σ/√n)) = 0.99
P(|Z| < 3√n/5) = 0.99

 Step 3: solve for n.
P(|Z| < 2.575) = 0.99
3√n/5 = 2.575
√n = 2.575 × 5/3
n = (2.575 × 5/3)² = 18.42
 Therefore, I need a minimum sample size of 19 to be able to estimate the true population mean to within 3, with 99% certainty
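The three steps collapse into one formula, n = (z·σ/d)² rounded up; a sketch in Python (the helper name is illustrative):

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(sigma, distance, confidence):
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 2.576 for 99%
    return ceil((z * sigma / distance) ** 2)

print(min_sample_size(5, 3, 0.99))  # 19, matching example 4
```

The same function answers Activity 1 below with sigma = 10 and distance = 2.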

Activity 1

 Suppose that we know the standard deviation of men's heights is 10cm. How many men should we measure to ensure that the sample mean we obtain is no more than 2cm from the population mean, with 99% confidence?
Section 4

Hypothesis Testing

Outline

 Hypothesis testing: basic concepts
 Testing µ when σ is known
 Testing µ when σ is unknown
 Testing for the difference of two means (independent samples)

Reading materials:
Chap 11, 12 (Keller)

Hypothesis testing

 Making decisions in the face of uncertainty
 Hypothesis testing is a structure for making these decisions
 We have in mind two competing ideas – call these hypotheses
◦ First idea: null hypothesis
◦ Second idea: alternative hypothesis
 The ideas must be distinct; e.g.
◦ Idea 1 (H0): it will rain today
◦ Idea 2 (HA): it will not rain today

Plan

 Collect data and use this to decide which idea is most likely to be correct
 Depending on the decision, we either will or will not carry an umbrella.
 Decision matrix – thinking about consequences:

What you decide \ What actually happens (truth) | It rains | It doesn't rain
Take umbrella       | right decision | wrong decision
Don't take umbrella | wrong decision | right decision

In statistics:

Decision \ Truth | H0 true      | HA true
Accept H0        | correct      | Type 2 error
Accept HA        | Type 1 error | correct

α = significance level = P(type 1 error)
β = 1 − power = P(type 2 error)
Power = P(reject H0 when it is false)

An analogy for hypothesis testing – criminal law

Criminal law | Hypothesis testing
Accused is innocent | Null hypothesis
Accused is guilty | Alternative hypothesis
Gathering evidence | Gathering data
Building a case – presenting and summarising evidence | Presenting and summarising data, building a test statistic
Analogy continued – outcomes (1)

Criminal law | Hypothesis testing
Accused is acquitted | Choose H0
Accused is convicted | Choose HA
Convict an innocent person | Type 1 error
Acquit a guilty person | Type 2 error
"Beyond reasonable doubt" | "95% certainty of making the right decision"

Analogy continued – outcomes (2)

 If we say we have a 95% chance of making the right decision, it means we have a 5% chance of making an error. But what type of error do we have a 5% chance of making?
 A Type 1 error is considered to be more serious than a Type 2 error. Therefore, by convention, we set up testing so the probability of making a Type 1 error, α, is small;
 Ideally, we would also like the probability of making a Type 2 error, β, to be small. But reducing the chance of a Type 1 error increases the chance of a Type 2 error;
 Therefore, we choose to set α to 5% (i.e. a 5% chance we reject H0 when it is true), or some other fixed, low probability, and ignore β
Analogy continued – outcomes (3)

 In hypothesis testing, we also make a "presumption of innocence". This means that, when we test a hypothesis, we start by assuming the null is true. Then we gather data, and if we find enough evidence, we will reject the null hypothesis and accept the alternative hypothesis.

Steps for hypothesis tests

1. State the null and alternative hypotheses
2. Calculate the test statistic
3. Formulate a decision rule using either the rejection region, the p-value (found from the appropriate distribution, e.g. the standard normal), or the confidence interval approach
4. Reach a conclusion regarding whether to accept the null or the alternative hypothesis.

Testing µ when σ is known

 Example 1: A store manager is considering a new billing system for credit customers. The new system will only be cost-effective if the mean monthly account is more than $170. A random sample of 400 monthly accounts gives a sample average of $178. The manager knows that accounts are approximately normally distributed, with a standard deviation of $65. Can the manager conclude from this data that the new system will be cost-effective?
 We want to find out if µ, the true mean monthly account, is bigger than $170.

Rules for hypotheses

 Null hypothesis:
 Always about a population value (Greek letter)
 Always has an "="
 Alternative hypothesis:
 Always about a population value (Greek letter)
 Has one of <, > or ≠
 Looks like the null, but "=" has been replaced.
2
Applying the rules to example 1

 Null hypothesis:
H0: µ = 170
 Alternative hypothesis:
HA: µ > 170
 Having done this, the question now becomes: "is $178 far enough away from $170 to conclude that µ is bigger than $170?"

Recap: the central limit theorem

 The central limit theorem says that a sample average has a normal distribution with centre µ and standard deviation σ/√n. So, if we calculate the test statistic below, it should follow a standard normal distribution:
Z = (X̄ − µ)/(σ/√n) ~ N(0, 1) as n → ∞.

Applying this to the example (1)

 We calculate a test statistic – this measures (in standardised units) how far our sample average is from the hypothesised µ.
 Formula:
Z = (X̄ − µ)/(σ/√n)
 Z should follow a standard normal distribution IF the true µ is equal to the one in our null hypothesis.

Applying this to example 1 (2)

 We have σ = 65.
 Test statistic in this case:
Z = (X̄ − µ)/(σ/√n) = (178 − 170)/(65/√400) = 8/3.25 = 2.46
Decision Rule

• Three methods – rejection region, p-value, or confidence interval.
• Rejection region:
  ◦ We want to be 95% certain. This means a 5% chance of rejecting H0 when it is true.
  ◦ So, we find the EXTREME 5% of the standard normal (in the direction of our alternative hypothesis) and this will be our rejection region.

Applying this to example 1

• The point that marks off the top 5% of a standard normal is 1.645.
• So, we will reject the null hypothesis if our test statistic lies above 1.645.
• Here, the test statistic = 2.46.
• So we reject the null hypothesis in favor of the alternative hypothesis. In other words, there is sufficient evidence to conclude that the mean monthly account is higher than $170.
P-value approach (by hand or computer)

• The p-value is the probability of getting our test statistic, or one further away from the middle, if the null is true.
• Draw a diagram – it is the area more extreme than our test statistic; for the last example, the p-value is P(Z > 2.46).
• A small p-value is evidence against the null hypothesis.
• Rule:
  ◦ If p-value < α => reject the null hypothesis;
  ◦ If p-value > α => do not reject the null hypothesis.

Applying this to example 1 (by hand)

• From the standard normal tables:

  P(Z > 2.46) = 1 − P(Z < 2.46) = 1 − 0.9931 = 0.0069

• This means that the probability of observing a sample mean at least as large as 178, for a population whose mean is 170, is 0.0069 – extremely small (much smaller than 0.05). Therefore, we reject the null and conclude that the mean monthly account is higher than $170 (the same conclusion as we reached using the rejection region approach).
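Worked numerically in Python (a sketch of my own, scipy assumed available), Example 1 gives the same figures as the slides:

```python
from math import sqrt

from scipy import stats

x_bar, mu0, sigma, n = 178, 170, 65, 400           # Example 1 data

z = (x_bar - mu0) / (sigma / sqrt(n))              # 8 / 3.25 = 2.46
z_crit = stats.norm.ppf(0.95)                      # top-5% point: 1.645
p_value = 1 - stats.norm.cdf(z)                    # P(Z > 2.46) = 0.0069

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
# z exceeds 1.645 and p-value < 0.05, so reject H0
```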

Confidence interval (CI) approach

• For a 5% significance level, we set up a rejection region:

  (X̄ − µ)/(σ/√n) < −1.96  or  (X̄ − µ)/(σ/√n) > 1.96

• The acceptance region is: −1.96 ≤ (X̄ − µ)/(σ/√n) ≤ 1.96
• Then the 95% CI for µ is: X̄ − 1.96·σ/√n < µ < X̄ + 1.96·σ/√n

Applying this to example 1

• The 95% confidence interval for µ is:

  178 − 1.96·(65/√400) < µ < 178 + 1.96·(65/√400)
  171.63 < µ < 184.37

• Because the hypothesised value µ0 = 170 does not lie within this CI, we reject the null in favor of the alternative.
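The interval itself is a short computation in Python (my own sketch, again assuming scipy):

```python
from math import sqrt

from scipy import stats

x_bar, sigma, n = 178, 65, 400
half_width = stats.norm.ppf(0.975) * sigma / sqrt(n)    # 1.96 * 3.25
print(f"95% CI: ({x_bar - half_width:.2f}, {x_bar + half_width:.2f})")
# (171.63, 184.37): 170 lies outside, so reject H0
```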

One tailed vs two tailed tests

• If the alternative hypothesis is “<” or “>”:
  ◦ This is a one tailed test
  ◦ The rejection region will be in either the upper or the lower tail
  ◦ The p-value is the probability of getting a more extreme result
• If the alternative hypothesis is “≠”:
  ◦ This is a two tailed test
  ◦ The rejection region needs to be split between both tails
  ◦ The p-value will involve an absolute value – i.e. it will be the probability of getting further away from the hypothesised mean on either side

So if the alternative is “≠”

• Two sided or two tailed test
• Rejection region will be Z < −Zα/2 or Z > +Zα/2
• P-value will be P(Z > |T.S.|) + P(Z < −|T.S.|) = 2·P(Z > |T.S.|)
So if the alternative is “>”

• Right tailed test
• Rejection region will be Z > +Zα
• P-value will be P(Z > T.S.)

If the alternative is “<”

• Left tailed test
• Rejection region will be Z < −Zα
• P-value will be P(Z < T.S.)

Testing µ when σ is unknown

• Similar to the case of estimation, we can substitute s for σ and calculate a t-statistic.
• The basic process of hypothesis testing remains the same, with the following changes:
  ◦ The test statistic is now calculated as

    t = (X̄ − µ) / (s/√n)

  ◦ It follows the t-distribution with n − 1 degrees of freedom (use the t-table to find the rejection region or p-value).

Example 2

• Use the gssft.sav file to test the hypothesis that college graduates work a 40-hour work week. (A code sketch of the equivalent test appears below.)
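Outside SPSS, the equivalent one-sample t-test can be sketched in Python; the hours array below is a hypothetical stand-in for the gssft.sav variable used on the slides:

```python
import numpy as np
from scipy import stats

# hypothetical weekly hours; in practice these would come from gssft.sav
hours = np.array([40, 45, 50, 38, 60, 42, 55, 40, 48, 44])

t_stat, p_two_tailed = stats.ttest_1samp(hours, popmean=40)   # H0: mu = 40
print(f"t = {t_stat:.3f}, two-tailed p = {p_two_tailed:.4f}")
```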

Hypotheses, test statistic for 2-tailed test

• H0: µ = 40
• HA: µ ≠ 40
• Here are the results from SPSS:

One-Sample Test (Test Value = 40)

                                   t       df   Sig. (2-tailed)  Mean Difference  95% CI Lower  95% CI Upper
Number of hours worked last week   14.326  436  .000             6.995            6.04          7.96

CI for µ when σ is unknown

• Also use the t-distribution for confidence intervals for µ when σ is unknown.
• If σ has been estimated from the data, the confidence interval will be of the form

  X̄ ± t_{α/2, n−1} · s/√n

  i.e.  X̄ − t_{α/2, n−1}·s/√n < µ < X̄ + t_{α/2, n−1}·s/√n
Conclusion

• Based on the t-statistic, the p-value, or the confidence interval approach, we reject the null hypothesis. In other words, there is sufficient statistical evidence to conclude that full-time workers work more than 40 hours per week.
Section 5
Regression analysis

Reading materials:
Chap 17, 18 (Keller)

Outline

• Simple Regression:
  ◦ Form of the general model
  ◦ Procedure in SPSS
  ◦ Interpretation of SPSS output
  ◦ Testing significance of a slope/intercept
  ◦ Assumption checking
• Multiple Regression:
  ◦ As above

Regression analysis

• Regression analysis investigates whether and how variables are related to each other. More specifically, regression analysis can be used to:
  ◦ Determine whether the value of one variable has any effect on the values of another;
  ◦ Determine whether, as one variable changes, another tends to increase or decrease;
  ◦ Predict the values of one variable based on the values of one or more other variables.
• E.g:
  ◦ How price is related to product demand => if we change the price, how will product demand change?
  ◦ How do staff salaries depend on education and experience?

Types of relationships

[Figure: four example scatterplots – positive linear relationship, negative linear relationship, non-linear relationship, no relationship.]

Simple linear relationship

• In a simple linear relationship, we want to see whether a linear relationship exists between one dependent variable (Y) and one independent variable (X).
• Example: we want to see whether the time persons have lived in a city (in years) affects their attitude towards that city in a linear manner. Attitude towards the city is measured on an 11-point scale (1 = do not like, 11 = very much like).

Simple linear relationship: example

Respondent   Duration of   Quality of        Attitude
Number       Residence     Infrastructure    Towards City
1            10            3                 6
2            12            11                9
3            12            4                 8
4            4             1                 3
5            12            11                10
6            6             1                 4
7            8             7                 5
8            2             4                 2
9            18            8                 11
10           9             10                9
11           17            8                 10
12           2             5                 2
Steps in regression analysis

1. Analyse the nature of the relationship between the independent and dependent variables
2. Make a scatterplot
3. Formulate the mathematical model that describes the relationship between the independent and dependent variables
4. Estimate and interpret the coefficients of the model
5. Test the model
6. Evaluate the strength of the relationship and prediction accuracy

Simple linear regression: notation

• Simple regression – one predictor
• We have n observations.
• Xi = value of the independent variable for the ith observation
• Yi = value of the dependent variable for the ith observation
• sx = sample standard deviation of the independent variable
• sy = sample standard deviation of the dependent variable
• X̄ = sample average of the independent variable
• Ȳ = sample average of the dependent variable

Simple linear regression: scatterplot

• Step 2: Make a scatterplot.
• Example – city attitudes vs duration of residence:

[Figure: scatterplot of Attitude Towards City (vertical axis, 2–11) against Duration of Residence (horizontal axis, 0–20).]

Simple linear regression: model

• Step 3: Formulate the general model.
• Fit a straight line to the data, using the following model:

  Yi = β0 + β1·Xi + εi

  where β0 is the intercept, β1 is the slope, and εi is the error term (residual).

• The slope and intercept are estimated by the ordinary least squares (OLS) method.

OLS method and assumptions

• We want Σ εi² to be a minimum, where the εi are the error terms in

  Yi = β0 + β1·Xi + εi

[Figure: scatterplot showing an observed value Yi, the fitted line E(Y|X) = β0 + β1·X, and the error term εi as the vertical distance between them. Source: Dehon’s lecture.]

Gauss–Markov assumptions

• Assumption on the linear relation:
  ◦ A0 (linear model): Yi = β0 + β1·Xi + εi
• Assumption on the factor:
  ◦ A5 (exogeneity assumption): Cov(X, ε) = 0
• Assumptions on the error terms:
  ◦ A1: E(εi) = 0, i = 1, ..., n
  ◦ A2 (normality of error terms): ε ~ N
  ◦ A3 (non-autocorrelation of error terms): cov(εi, εj) = 0 for i ≠ j
  ◦ A4 (homoskedasticity): Var(εi) = σ², i = 1, ..., n
Estimate the parameters

• Step 4: Estimate the parameters (slope and intercept):

  Ŷi = β̂0 + β̂1·Xi

• We can calculate estimates of the slope and intercept using formulae derived from the OLS criterion:

  β̂1 = (n·ΣXiYi − ΣXi·ΣYi) / (n·ΣXi² − (ΣXi)²)
  β̂0 = Ȳ − β̂1·X̄

Applying this to example

• Slope = 16.333/27.697 = 0.5897
• Intercept = 6.5833 − 0.5897 × 9.333 = 1.0796
• Fitted equation: Ŷi = 1.0796 + 0.5897·Xi
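As a cross-check (my own sketch, not part of the slides), the same slope and intercept can be reproduced with numpy from the 12 observations in the example table:

```python
import numpy as np

# duration of residence (X) and attitude towards city (Y) from the table
x = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
y = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])

n = len(x)
b1 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x**2).sum() - x.sum()**2)
b0 = y.mean() - b1 * x.mean()
print(f"slope = {b1:.4f}, intercept = {b0:.4f}")   # approx 0.5897 and 1.0796
```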

Interpreting the coefficients

• β̂1 = 0.5897 means that with each additional year of staying in the city, attitude towards the city increases by an average of 0.5897 points.
• β̂0 = 1.0796 is the predicted value when X = 0. This means that reasons unrelated to the duration of residence give an attitude towards the city of 1.0796 points.
• Note: sometimes β̂0 makes no sense when X = 0; in that case we do not interpret this coefficient.

Step 5: Testing for significance of estimated parameters

• We can test the significance of the linear relationship:
  ◦ H0: β1 = 0
  ◦ HA: β1 ≠ 0
• Test statistic:

  T = (β̂1 − β1) / s_β̂1,  where s_β̂1 is the standard error of β̂1

• Decision rule: compare T to a t-distribution with n − 2 degrees of freedom.

Applying this to example

• H0: β1 = 0
• HA: β1 ≠ 0
• Test statistic:

  t = (β̂1 − β1) / s_β̂1 = (0.5897 − 0) / 0.0701 = 8.412

• Compare this t-value with the t-distribution to reach a decision.

Decision rule

• The rejection region is t > 2.2281 or t < −2.2281 at 5% significance (using df = 10).
• OR, from SPSS, p-value = 0.000.
• Conclusion: reject the null hypothesis. There is a significant linear relationship between duration of residence and attitude to the city.
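The standard error and t-value can also be reproduced with numpy (my own sketch, using the example data again):

```python
import numpy as np

x = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])   # duration
y = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])      # attitude
n = len(x)

b1, b0 = np.polyfit(x, y, 1)                      # same OLS fit as before
resid = y - (b0 + b1 * x)
s2 = (resid ** 2).sum() / (n - 2)                 # residual variance
se_b1 = np.sqrt(s2 / ((x - x.mean()) ** 2).sum()) # standard error of slope
print(f"se = {se_b1:.4f}, t = {b1 / se_b1:.3f}")  # approx 0.0701 and 8.41
```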
Step 6: Determine the strength and significance of association

• Measured by r² – the coefficient of determination.
• r² measures the proportion of the total variation in Y explained by the variation in X, i.e.

  r² = explained variation / total variation = SS_reg / SS_y

Applying this to example

• Here is the output from SPSS:

  S = 1.22329   R-Sq = 87.6%   R-Sq(adj) = 86.4%

• So, 87.6% of the variation in Y is explained by the variation in X.

Step 6: Check prediction accuracy

• We can use the standard error of the estimate, sε:

  sε = √( SS_res / (n − k − 1) )

• Interpretation: the average residual; the average error in predicting Y from the regression equation.
• Used to construct confidence intervals:
  ◦ for the mean value of Y for a given X
  ◦ for all values of Y for a given X

Checking assumptions

• Regression analysis makes several assumptions:
  ◦ Error terms are normally distributed
  ◦ Error terms have mean 0 and constant variance
  ◦ Error terms are independent
• These should be checked with plots (see the multiple regression section).

Example using SPSS

• Use the cntry15.sav data file for SPSS practice.

Multiple Regression

• Data:
  ◦ one dependent variable
  ◦ two or more independent variables
• Example: Are consumers’ perceptions of quality determined by their perceptions of prices, brand image and brand attributes?
4
Model – general form

  Y = β0 + β1·X1 + β2·X2 + ... + βk·Xk + ε

which is estimated by

  Ŷ = β̂0 + β̂1·X1 + β̂2·X2 + ... + β̂k·Xk

• β̂0 = estimated intercept
• β̂i = estimated partial regression coefficient
• As before, we use the least squares method to estimate the parameters, minimising the error (residual) sum of squares.

Interpreting a Partial Regression Coefficient

• Imagine a case with two predictors:

  Y = β0 + β1·X1 + β2·X2 + ε

• β1 represents the expected change in Y when X1 is increased by one unit, but X2 is held constant or otherwise controlled.

Example 2

• Attitude to city is now being explained by:
  ◦ Duration of residence
  ◦ Quality of infrastructure

General Model

• Let:
  ◦ Y = attitude to city
  ◦ X1 = duration of residence
  ◦ X2 = quality of infrastructure

  Y = β0 + β1·X1 + β2·X2 + ε

Estimation (SPSS)

The regression equation is:

  Attitude Towards City = 0.337 + 0.481 × Duration of Residence + 0.289 × Quality of Infrastructure

Coefficients (a. Dependent Variable: attitude)

Model          Unstandardized B   Std. Error   Standardized Beta   t       Sig.
1  (Constant)  .337               .567                             .595    .567
   duration    .481               .059         .764                8.160   .000
   quality     .289               .086         .314                3.353   .008

Strength of relationship (R²)

• As before, R² is the proportion of variation explained by the model:

  R² = explained variation / total variation = SS_reg / SS_y

• In the example, 94.5% of the variation in Y can be explained by the variation in X1 and X2.
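These coefficients can be cross-checked in Python (my own numpy sketch, using the example’s 12 observations):

```python
import numpy as np

# the 12 observations from the example table
duration = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
quality  = np.array([3, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5])
attitude = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])

# design matrix with an intercept column
X = np.column_stack([np.ones_like(duration), duration, quality])
beta, *_ = np.linalg.lstsq(X, attitude, rcond=None)
print(beta.round(3))   # intercept, duration, quality: approx [0.337, 0.481, 0.289]
```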
Points about R²

• Now called the coefficient of multiple determination.
• R² will go up as we add more explanatory terms to the model, whether they are “important” or not.
• Often we use the “adjusted R²” – it compensates for adding more variables, so it is lower than R² when the added variables are not “important”.

Significance Testing

• We can test two different things:
  1. Significance of the overall regression
  2. Significance of specific partial regression coefficients

1. Significance of the overall regression

• H0: β1 = β2 = β3 = ... = βk = 0
• HA: not all slopes = 0
• Test statistic:

  F = (SS_reg/k) / (SS_res/(n − k − 1)) = (R²/k) / ((1 − R²)/(n − k − 1))

• Decision rule: compare to an F-distribution with k and (n − k − 1) degrees of freedom.
• If H0 is rejected, one or more slopes are not zero. Additional tests are needed to determine which slopes are significant.

Applying this to example – SPSS output

• This is the test done in the ANOVA section of the output.
• In this case, we reject the null hypothesis – at least one of the slopes is significantly different from zero.
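As a quick numeric illustration (assuming the example’s R² = 0.945 with k = 2 and n = 12), the F statistic and its p-value could be computed as:

```python
from scipy import stats

r2, k, n = 0.945, 2, 12
F = (r2 / k) / ((1 - r2) / (n - k - 1))      # about 77
p = 1 - stats.f.cdf(F, k, n - k - 1)         # p-value from F(2, 9)
print(f"F = {F:.1f}, p = {p:.5f}")
```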

2. Significance of specific partial regression coefficients

• H0: βi = 0
• HA: βi ≠ 0
• Test statistic:

  t = (β̂i − βi) / s_β̂i = β̂i / s_β̂i

• Decision rule: compare to a t-distribution with (n − k − 1) degrees of freedom (i.e. the residual d.f.).
• If H0 is rejected, the slope of the ith variable is significantly different from zero. That is, once the other variables are considered, the ith predictor has a significant linear relationship with the response.

Applying this to example

Coefficients (a. Dependent Variable: attitude)

Model          Unstandardized B   Std. Error   Standardized Beta   t       Sig.
1  (Constant)  .337               .567                             .595    .567
   duration    .481               .059         .764                8.160   .000
   quality     .289               .086         .314                3.353   .008

• Once the quality of infrastructure is considered, the duration of residence still has a significant linear relationship with the attitude to a city.
Check residuals

• Assumptions made:
  ◦ Error terms are normally distributed
  ◦ Error terms have mean 0 and constant variance
  ◦ Error terms are independent
• Definition: a residual (also called error term) is the difference between the observed response value Yi and the value predicted by the regression equation, Ŷi (the vertical distance between point and line).

Error terms normally distributed

• Can be checked by looking at a histogram of the residuals – look for a bell-shaped distribution.
• Also a normal probability plot – look for a straight line.
• For preference, use standardised residuals – they have a standard deviation of 1.

Error terms have mean 0, constant variance

• Check in the previous plots; also in plots of residuals vs predicted values and residuals vs independent variables.
• Look for a random scatter of points around zero.
• If not, this may indicate that linear regression is not appropriate – we may need to transform the data.

Error terms are independent

• Checked by using plots of residuals vs time/order.
• Look for a random scatter of residuals.

Example

[Figure: residual plots for Attitude Towards City – normal probability plot of the standardised residuals, standardised residuals versus the fitted values, histogram of the standardised residuals, and standardised residuals versus the order of the data.]

Section 6
Introduction to Time Series

Reading materials:
Chap 20 (Keller)

Outline

• Overview of time series;
• Stationarity hypothesis;
• Autoregressive processes;
• Determining process order.

Different Types of Data

• Cross sectional data: you observe each member in your sample ONCE (usually, but not necessarily, at the same time).
  ◦ Examples of cross sectional data:
    - Observing the heights and weights of 1000 people.
    - Observing the income, education, and experience of 1000 people.
    - Observing the per-capita GDP, population, and real defence spending of 80 nations.
• Time series: you observe each variable once per time period for a number of periods.
  ◦ Examples of time series:
    - Observing U.S. inflation and unemployment from 1961-1995.
    - Observing the profitability of one firm over 20 years.
    - Observing the daily closing price of gold over 30 years.
• Pooled time series (= "panel data"): you observe each member in your sample once per time period for a number of periods.
  ◦ Examples of pooled time series (= "panel data"):
    - Observing the output and prices of 100 industries over 12 quarters.
    - Observing the profitability of 20 firms over 20 years.
    - Observing the annual rate of return of 300 mutual funds over the 1960-1997 period.

Overview of Time Series

• Time series is time-ordered data.
  ◦ We will assume that the observations are made at equally spaced time intervals. This assumption enables us to use the interval between two successive observations as the unit of time.
• The total number of observations in a time series is called the length of the time series (or the length of the data).
  ◦ More examples of time series:
    - Daily closing stock prices; and,
    - Monthly unemployment figures.

Overview of Time Series

• Univariate time series models:
  ◦ Model and predict financial variables using only information contained in their own past values and possibly current and past values of an error term.
• Virtually any quantity recorded over time yields a time series. To "visualize" a time series we plot our observations as a function of time. This is called a time plot.
• Think of a time series as a random or stochastic process.
  ◦ We do not know the outcome until the experiment is implemented:
    - The closing value of the next trading day of the Dow Jones Index.
    - The annual output growth of Malaysia next year.
  ◦ When collecting a time series data set, we get one possible outcome under a certain number of conditions. Changing the conditions => we get a different set of outcomes (like different cross-sectional samples from a population).

Stationary time series

• Recall the Gauss-Markov assumptions for OLS estimation with cross sectional data:
  ◦ Error terms are normally distributed. If not, apply the LLN and CLT.
• The LLN and CLT hold for a time series if the process satisfies stationarity conditions.
• Strict stationarity: a time series is stationary if the joint probability distribution of any set of times is not affected by an arbitrary shift along the time axis.
  ◦ More clearly: the joint distribution of (y_{t1}, y_{t2}, ..., y_{tm}) is the same as the joint distribution of (y_{t1+h}, y_{t2+h}, ..., y_{tm+h}).
• Weak (or covariance) stationarity: the covariances between y_t and y_{t+h} for any h do not depend upon t.

Covariance stationary

• Then:

  E(Y_t) = µ
  V(Y_t) = E(Y_t − µ)² = γ_0
  cov(Y_t, Y_{t−k}) = E[(Y_t − µ)(Y_{t−k} − µ)] = γ_k,  k = 1, 2, 3, ...

• Autocorrelation: standardising γ_k gives the autocorrelation ρ_k:

  ρ_k = cov(y_t, y_{t−k}) / V(y_t) = γ_k / γ_0

  which measures the dependency among observations k lags apart.

Autoregressive Processes

• An autoregressive model is one where the current value of a variable, y, depends only upon the values that the variable took in previous periods, plus an error term.
• For example, a first-order process is one where y is influenced by 1 lag. This is known as an AR(1) model, or an autoregressive model of order 1. It is formalised below:

  y_t = µ + φ1·y_{t−1} + u_t

• In general, an autoregressive model of order p, denoted AR(p), is expressed as:

  y_t = µ + φ1·y_{t−1} + φ2·y_{t−2} + φ3·y_{t−3} + ... + φp·y_{t−p} + u_t
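To build intuition about what an AR(1) looks like, here is a minimal simulation sketch (the parameter choices µ = 0 and φ1 = 0.7 are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, phi, n = 0.0, 0.7, 500            # assumed AR(1) parameters
y = np.zeros(n)
for t in range(1, n):
    # y_t = mu + phi * y_{t-1} + u_t, with u_t ~ N(0, 1)
    y[t] = mu + phi * y[t - 1] + rng.normal()
```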

Autoregressive Processes

• What does an AR(1) look like?
• What does a white noise process look like?
• How can the lag order be determined?
  1. ACF;
  2. PACF;
  3. AIC and SIC criteria; and,
  4. White noise residuals.

Determining Process Order

1. Autocorrelation Function (ACF)
  ◦ The ACF measures the correlation between the current observation and the k’th lag, i.e. the correlation between y_t and y_{t−k}.
  ◦ For an AR process the ACF can decay slowly or rapidly, but it will decay geometrically to zero.

Determining Process Order

[Figure: autocorrelation function for the ASX All Ordinaries price index, with 5% significance limits for the autocorrelations, plotted for lags 1 to 80.]
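For reference, a sample autocorrelation at lag k (γ̂_k/γ̂_0, as defined earlier) can be computed directly with numpy; this helper is my own sketch:

```python
import numpy as np

def sample_acf(y, k):
    """Sample autocorrelation at lag k >= 1: gamma_hat_k / gamma_hat_0."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    gamma0 = (d * d).sum() / len(y)
    gammak = (d[k:] * d[:-k]).sum() / len(y)
    return gammak / gamma0
```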

Determining Process Order

[Figure: partial autocorrelation function for the ASX All Ordinaries price index, with 5% significance limits for the partial autocorrelations, plotted for lags 1 to 80.]

Determining Process Order

2. Partial Autocorrelation Function (PACF)
  ◦ The PACF measures the correlation between the observation k periods ago and the current observation, after controlling for observations at intermediate lags (i.e. all lags < k).
    - For example, the PACF at lag 3 measures the correlation between y_t and y_{t−3}, after controlling for the effects of y_{t−1} and y_{t−2}.
    - Note: at lag 1, the autocorrelation and partial autocorrelation coefficients are equal, since there are no intermediate lag effects to eliminate.

Determining Process Order

3. AIC and SIC criteria
  ◦ Akaike information criterion (AIC) and Schwarz information criterion (SIC):

    AIC = e^(2k/n) · RSS/n
    SIC = n^(k/n) · RSS/n

    where k = lag order and n = number of observations.

  ◦ Technique:
    1. Fit a model with k lags. Calculate AIC and SIC;
    2. Fit another model with k+1 or k−1 lags; and,
    3. The best model will have the lowest AIC/SIC.
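A small numpy sketch of this technique, using the slide’s AIC/SIC formulas (my own code; y is assumed to be a 1-D array holding the series):

```python
import numpy as np

def aic_sic(y, k):
    """Fit AR(k) by OLS and return the slide's AIC and SIC criteria."""
    y = np.asarray(y, dtype=float)
    Y = y[k:]                                          # dependent values
    lags = [y[k - j:-j] for j in range(1, k + 1)]      # lagged regressors
    X = np.column_stack([np.ones(len(Y))] + lags)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = ((Y - X @ beta) ** 2).sum()
    n = len(Y)
    return np.exp(2 * k / n) * rss / n, n ** (k / n) * rss / n

# the best order has the lowest criterion value, e.g.:
# for k in (1, 2, 3): print(k, aic_sic(y, k))
```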

Determining Process Order

4. White noise approach
  ◦ Recall that y_t is autocorrelated.
  ◦ If we have fitted the correct number of lags (to take the autocorrelation into account), then there should be no autocorrelation left in the residuals.
  ◦ That is, the residuals are white noise.
  ◦ White noise properties:
    - Homoscedastic;
    - Constant mean; and,
    - No autocorrelation.
