
Simple Linear Regression

Regression is a statistical method that attempts to represent the relationship between two variables by approximating this relationship with a straight line.
Since not all relationships are linear (straight-line) in fashion, simple LR only works well for bivariate data that has a linear relationship.
Regression analysis develops a linear equation showing how the two variables are related.
Requires a Y-intercept (b0) and a slope (b1).

Computing Regression Line

[Figure: scatter plot of data versus time; a straight line is drawn through the scattered points]

Computing Regression Line

[Figure: the same scatter plot of data versus time with a fitted line, showing the vertical deviations e1, e2, e3, e4 between the observations and the line]

Computing Regression Line: Least Squares Line

The least squares line is the straight line that best passes through the points of a scatter diagram.
The least squares line is the line through the data that minimizes the sum of the squared differences between the observations and the line (these differences are commonly called residuals):

$$\sum e^2 = e_1^2 + e_2^2 + e_3^2 + \cdots + e_n^2$$

Sum of Squares of Error

How can we determine the sum of squares of error? By using the following formulas; remember, the regression line minimizes this value (SSE):

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \hat{y}_i = b_0 + b_1 x_i$$

$\hat{y}$ (Y-hat) is the y value predicted by the regression line; $y$ is the actual observed y value.

Least Squares Formula

$$b_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$
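As a quick sanity check, here is a minimal NumPy sketch of these formulas, applied to the salary-and-experience data used later in this deck (variable names are illustrative):

```python
import numpy as np

# Salary-and-experience example used later in this deck
x = np.array([15, 10, 20, 5, 15, 5], dtype=float)    # experience (years)
y = np.array([30, 35, 55, 22, 40, 27], dtype=float)  # salary ($thousand)

n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))

b1 = Sxy / Sxx                  # slope
b0 = y.mean() - b1 * x.mean()   # Y-intercept

y_hat = b0 + b1 * x
SSE = np.sum((y - y_hat) ** 2)  # the quantity the least squares line minimizes

print(b0, b1)  # approx. 15.32 and 1.673, matching the example below
```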

Straight-Line Relationship
Y = b0 + b1X
b0 represents the Y-intercept, which is the value of Y when X = 0.
b1 is the slope of the line, which is the amount of change in Y for a unit change in X.

Assumptions of Simple LR

[Figure: at each value of X (x = 1, 2, 3, 4), the conditional distribution of Y given X is a normal curve whose mean lies on the regression line]

Note

$$E(y \mid x) = \mu_{y|x} = \beta_0 + \beta_1 x$$
$$\hat{y} = b_0 + b_1 x$$
$$y = b_0 + b_1 x + e$$

The first two equations are deterministic model representations for the mean of Y given X, but not for an actual y value (which includes an error term). In any case, the derived LR equation, as long as it has proven stable and reliable, can be used to predict either the average y-value or the actual y-value.

Assumptions of Simple LR

$E(\varepsilon \mid X) = 0$ (zero mean)
$\mathrm{Var}(\varepsilon \mid X) = \sigma^2$ (constant, homogeneous/homoscedastic variance)
$\varepsilon$ is normally distributed.
The value of $\varepsilon$ associated with any particular value of Y is independent of the $\varepsilon$ associated with any other value of Y, as if the errors came from a random sample:

$$Y_1 = \beta_0 + \beta_1 x_1 + \varepsilon_1$$
$$Y_2 = \beta_0 + \beta_1 x_2 + \varepsilon_2$$

with $\varepsilon_1$ and $\varepsilon_2$ independent.

Another Note

$$\varepsilon \mid X \sim N(0, \sigma) \quad \Rightarrow \quad Y \mid X \sim N(\beta_0 + \beta_1 x, \; \sigma)$$

An unbiased estimate of $\sigma^2$ is:

$$s^2 = \frac{SSE}{n-2} = \frac{S_{yy} - b_1 S_{xy}}{n-2}, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
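Continuing the NumPy sketch from earlier (reusing x, y, n, b1, Sxy), this estimate is a one-liner:

```python
# Unbiased estimate of the error variance (continues the earlier sketch)
Syy = np.sum((y - y.mean()) ** 2)
s2 = (Syy - b1 * Sxy) / (n - 2)   # equivalently SSE / (n - 2)
s = np.sqrt(s2)                   # approx. 6.52 for the salary data
```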

Inferences on Regression Coefficients


On $\beta_1$ using $b_1$:
Confidence interval: $b_1 \pm t_{\alpha/2} \, \dfrac{s}{\sqrt{S_{xx}}}$
Hypothesis testing: $t = \dfrac{b_1 - \beta_{1,0}}{s / \sqrt{S_{xx}}}$

On $\beta_0$ using $b_0$:
Confidence interval: $b_0 \pm t_{\alpha/2} \, s \sqrt{\dfrac{\sum_{i=1}^{n} x_i^2}{n S_{xx}}}$
Hypothesis testing: $t = \dfrac{b_0 - \beta_{0,0}}{s \sqrt{\sum_{i=1}^{n} x_i^2 / (n S_{xx})}}$

Use $v = n - 2$ degrees of freedom for all of these inferences.
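Continuing the sketch (reusing n, x, b0, b1, Sxx, s), the 95% intervals can be computed with SciPy:

```python
from scipy import stats

# 95% confidence intervals for the coefficients (continues the sketch)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

half_b1 = t_crit * s / np.sqrt(Sxx)
ci_b1 = (b1 - half_b1, b1 + half_b1)

half_b0 = t_crit * s * np.sqrt(np.sum(x ** 2) / (n * Sxx))
ci_b0 = (b0 - half_b0, b0 + half_b0)
```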

Hypothesis Test on the Slope of the Regression Line

How do we know that a significant linear relationship exists using regression? The slope, b1, will give us an indication.

Hypothesis Test on the Slope of the Regression Line

If the slope of the least squares line is zero, there is no linear relationship. However, if the slope of the least squares line is significantly greater than 0 or significantly less than 0, then we can conclude that a linear relationship exists. Therefore, we want to test the following hypotheses:

H0: β1 = 0 (X provides no information)
Ha: β1 ≠ 0 (X does provide information)

Hypothesis Test on the Slope of the Regression Line

Two-tailed: H0: β1 = 0 vs. Ha: β1 ≠ 0; reject H0 if |t*| > t(α/2, n-2)
Upper-tailed: H0: β1 ≤ 0 vs. Ha: β1 > 0; reject H0 if t* > t(α, n-2)
Lower-tailed: H0: β1 ≥ 0 vs. Ha: β1 < 0; reject H0 if t* < -t(α, n-2)

In every case the test statistic is

$$t^* = \frac{b_1}{s_{b_1}}, \qquad s_{b_1} = \frac{s}{\sqrt{S_{xx}}}, \qquad df = n - 2$$
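Continuing the sketch, the two-tailed version of this test:

```python
# Two-tailed t-test of H0: beta1 = 0 (continues the sketch)
s_b1 = s / np.sqrt(Sxx)
t_star = b1 / s_b1
p_value = 2 * stats.t.sf(abs(t_star), df=n - 2)
reject_H0 = p_value < 0.05
```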

Prediction

Confidence interval on the mean response $\mu_{Y|x_0}$:

$$\hat{y}_0 \pm t_{\alpha/2} \, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$

Prediction interval on a single response $y_0$:

$$\hat{y}_0 \pm t_{\alpha/2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$
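Continuing the sketch, both intervals at a chosen x0 (here x0 = 20, Mary's experience in the example that follows):

```python
# Interval estimates at x0 = 20 years of experience (continues the sketch)
x0 = 20.0
y0_hat = b0 + b1 * x0

se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / Sxx)      # mean response
se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx)  # single response

ci_mean = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
pi_single = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)
```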

Example: Salary and Experience

Mary earns $55,000 per year and has 20 years of experience.

For n = 6 employees:

Experience (years): 15, 10, 20, 5, 15, 5
Salary ($thousand): 30, 35, 55, 22, 40, 27

Linear (straight-line) relationship
Increasing relationship: higher salary generally goes with higher experience
Correlation r = 0.8667

[Figure: scatter plot of Salary ($thousand) vs. Experience (years)]

Example: Salary and Experience

The least squares line summarizes the bivariate data: it predicts Y from X with the smallest errors (in the vertical direction, for the Y axis).
Intercept is 15.32 salary (at 0 years of experience)
Slope is 1.673 salary (for each additional year of experience, on average)

Y = b0 + b1X = 15.32 + 1.673X

[Figure: scatter plot of Salary (Y) vs. Experience (X) with the fitted line Y = 15.32 + 1.673X]

Predicted Values and Residuals

The predicted value comes from the prediction equation:
ŷ = b0 + b1X = 15.32 + 1.673X
For example, Mary (with 20 years of experience) has a predicted salary = 15.32 + 1.673(20) = 48.8. So does anyone with 20 years of experience.

The residual is the actual Y minus the predicted Y (Y - Ŷ).
Mary's residual is 55 - 48.8 = 6.2: she earns about $6,200 more than the predicted salary for a person with 20 years of experience.
A person who earns less than predicted will have a negative residual.

Predicted and Residual (continued)

Mary's residual is 6.2 (55 - 48.8).

[Figure: scatter plot of Salary vs. Experience with the fitted line; Mary earns 55 thousand, her predicted value is 48.8, and the vertical gap between the two is her residual]

Simple Linear Regression Model

When we use a straight line to predict values, we use a statistical model of the form:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

$\beta_0 + \beta_1 X$ is the assumed line about which all values of X and Y will fall; $\varepsilon$ is the error term, which contains all other variability not explained by the independent variable (X).

Note: $\beta_0$ and $\beta_1$ refer to the straight line for the population; we will be using sample data and will use b0 and b1 to refer to the straight line for the sample.

Error Variance

The measures most commonly used to assess how well a line fits through a set of points are the error variance and the error standard deviation:

$$s^2 = \frac{SSE}{n-2}, \qquad s = \sqrt{\frac{SSE}{n-2}}$$

What is s?
A measure of the variation of the Y values around the least squares line
The average distance of a prediction from the actual value
The average size of the residuals
The standard deviation of the residuals

Error Variance

Interpretation: similar to a standard deviation. Move the least-squares line up and down by s: about 68% of the data fall within one standard error of estimate of the least-squares line (for a bivariate normal distribution).

[Figure: Salary vs. Experience scatter plot showing the least squares line with parallel bands at (least squares line) + s and (least squares line) - s]

Example: Salary and Experience

Regression and Prediction Error
Predicting Y as ȳ (not using regression): errors are approximately SY = 11.686
Predicting Y as b0 + b1X (using regression): errors are approximately s = 6.52
Errors are smaller when regression is used!
This is often the true payoff for using regression.

Measuring the Strength of the Model


Another item of interest is to determine how well the regression model fits the data. To determine this, we use the coefficient of determination (r²), which gives the percentage of explained variation in the dependent variable using the model.

Coefficient of Determination

$$r^2 = 1 - \frac{SSE}{S_{yy}}$$

r², the coefficient of determination, is the percentage of explained variation in the dependent variable using the simple linear regression model. Taking the square root of the coefficient of determination gives the correlation coefficient r.

Correlation Coefficient

The sample correlation coefficient, r, measures the strength of the linear relationship that exists within a sample of n bivariate data:

$$r = b_1 \sqrt{\frac{S_{xx}}{S_{yy}}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

Note: When one computes r by taking the square root of r², affix the sign of b1 to the final value.
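Continuing the sketch (reusing SSE, Syy, Sxx, Sxy, b1), both quantities in one step:

```python
# Coefficient of determination and correlation (continues the sketch)
r2 = 1 - SSE / Syy
r = np.sign(b1) * np.sqrt(r2)   # affix the sign of b1
# equivalently r = Sxy / np.sqrt(Sxx * Syy); approx. 0.8667 for the salary data
```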

Interpreting the Correlation Coefficient

If r = 1, then X and Y have a perfect positive linear relationship.
If r = -1, then X and Y have a perfect negative linear relationship.
If r = 0, then X and Y have no linear relationship.
If 0 < r < 1, then X and Y are positively related. The closer r is to 1, the stronger the linear relationship.
If 0 > r > -1, then X and Y are negatively related. The closer r is to -1, the stronger the linear relationship.

Correlation Coefficient Summary

r ranges from -1.0 to 1.0.
The larger |r| is, the stronger the linear relationship.
r near zero indicates that there is no linear relationship; X and Y are uncorrelated.
The sign of r tells you whether the relationship between X and Y is positive or negative.
The value of r tells you very little about the slope of the line, except that if r is positive the slope of the line is positive, and if r is negative then the slope is negative.

Examples: Interpreting Correlation

[Figure: scatter plots for various values of rxy. rxy = 1: a perfect straight line tilting up to the right. rxy = 0: no overall tilt, no linear relationship. rxy = -1: a perfect straight line tilting down to the right.]

Significance Test for the Correlation

Due to sampling error, the value of r may not reflect the true relationship in the entire population, especially if the sample is quite small. Therefore, a formal hypothesis test may be needed. Hypotheses tested:

H0: ρ = 0 (no correlation)
Ha: ρ ≠ 0 (correlation exists)

Note: ρ = population correlation coefficient

Hypothesis Test

Another way to test whether a linear relationship exists between the two variables of interest is to use the relationship between b1 and r (they are closely related):

H0: β1 = 0 (no linear relationship exists)
Ha: β1 ≠ 0 (a linear relationship exists)

$$t^* = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = r \sqrt{\frac{n-2}{1-r^2}}$$

This gives exactly the same value for t* as $t^* = b_1 / s_{b_1}$. Reject H0 if |t*| > t(α/2, n-2).
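Continuing the sketch, one can verify the equivalence numerically:

```python
# t-test for the correlation; identical to the slope test (continues the sketch)
t_star_r = r * np.sqrt((n - 2) / (1 - r ** 2))
# t_star_r matches t_star = b1 / s_b1 computed earlier, up to rounding
```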

Hypothesis Test Continued

If one desires to carry out the general test:

H0: ρ = ρ0
Ha: ρ ≠ ρ0 / ρ > ρ0 / ρ < ρ0

one can use:

$$z = \frac{\sqrt{n-3}}{2} \ln\left[\frac{(1+r)(1-\rho_0)}{(1-r)(1+\rho_0)}\right]$$

This works on the assumption that both X and Y follow the bivariate normal distribution.
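Continuing the sketch, with a hypothetical null value rho0 chosen purely for illustration:

```python
# Fisher z-test of H0: rho = rho0 (continues the sketch)
rho0 = 0.5   # hypothetical null value, for illustration only
z = (np.sqrt(n - 3) / 2) * np.log(((1 + r) * (1 - rho0)) / ((1 - r) * (1 + rho0)))
p_two_sided = 2 * stats.norm.sf(abs(z))
```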

Exercise Problems

Problem #5, p. 359 (manual)
Problem #6, p. 359 (Excel)
Problem #7, p. 359 (Excel)
Problem #7, p. 371
Problems #1 and 2, p. 396
Problem #5, p. 380

Check for Model Significance and Adequacy

The ANOVA approach:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$S_{yy} = b_1 S_{xy} + SSE$$
$$SST = SSR + SSE$$

This is similar to testing:

H0: β1 = 0
Ha: β1 ≠ 0

Check for Model Significance and Adequacy

The ANOVA table:

Source of Variation | Sum of Squares | DoF | Mean Square    | Computed F
Regression          | SSR            | 1   | SSR            | SSR/s²
Error               | SSE            | n-2 | s² = SSE/(n-2) |
Total               | SST            | n-1 |                |

Reject H0 if computed F > F_α(1, n-2).
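Continuing the sketch, the same F-test in a few lines (for simple LR, F equals t*²):

```python
# ANOVA F-test for the regression (continues the sketch)
SSR = b1 * Sxy            # regression sum of squares, since Syy = SSR + SSE
F = SSR / s2              # MS(regression) / MS(error)
p_F = stats.f.sf(F, 1, n - 2)
# For simple linear regression, F equals t_star ** 2
```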

Check for Model Significance and Adequacy

If repeated observations are made at several x values, the SSE term shown previously can be further divided into error due to lack of fit and pure experimental error.

Computational formulas, with k = number of distinct values of x, n_i = number of observations at x_i, y_ij = the j-th value of the random variable Y_i, T_i. = Σ_j y_ij, and ȳ_i. = T_i./n_i:

$$s_i^2 = \frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2}{n_i - 1}$$

$$s^2 = \frac{\sum_{i=1}^{k} (n_i - 1) s_i^2}{n - k} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2}{n - k} \quad \text{(mean square of pure experimental error)}$$

$$SSE(\text{pure}) = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2, \qquad SS(\text{lack of fit}) = SSE - SSE(\text{pure})$$

Check for Model Significance and Adequacy

The ANOVA table becomes:

Source of Variation | Sum of Squares  | DoF | Mean Square (MS)        | Computed F
Regression          | SSR             | 1   | SSR                     | SSR/s²
Error               | SSE             | n-2 |                         |
  Lack of Fit       | SSE - SSE(pure) | k-2 | (SSE - SSE(pure))/(k-2) | MS(lack of fit)/s²
  Pure Error        | SSE(pure)       | n-k | s²                      |
Total               | SST             | n-1 |                         |

Model significant if F_reg > F_α(1, n-k)
Model adequate if F_lack-of-fit < F_α(k-2, n-k)
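A minimal sketch of this decomposition, assuming NumPy arrays x, y and fitted values y_hat as defined in the earlier sketch:

```python
import numpy as np

def lack_of_fit_ss(x, y, y_hat):
    """Split SSE into lack-of-fit and pure-error sums of squares."""
    sse = np.sum((y - y_hat) ** 2)
    sse_pure = 0.0
    for xv in np.unique(x):              # group the replicates at each distinct x
        grp = y[x == xv]
        sse_pure += np.sum((grp - grp.mean()) ** 2)
    return sse - sse_pure, sse_pure      # SS(lack of fit), SSE(pure)
```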

Model Adequacy

Significant lack of fit means that considerable variation is being caused by higher-order terms: terms in x other than the linear, or first-order, terms.
Illustrations are given in Figures 11.11 and 11.12, pp. 378-379 of the book.

Checking Model Assumptions

1. The errors are normally distributed with a mean of zero.
   Construct a normal probability plot (plot of residuals); if the resulting graph is linear, the normality assumption is verified.
   Conduct a goodness-of-fit test: chi-square, KS, Shapiro-Wilk.
   Run statistical tests on kurtosis and skewness.
2. The variance of the error component is the same for each value of X.
   Plot the residuals against the independent variable, X; if no pattern exists, this assumption holds.
3. The errors are independent of each other.
   Look for autocorrelation.
   Plot sample residuals by time (time series analysis).

Normal Probability Plot

Compares the cumulative distribution of the actual data values with the cumulative distribution of a normal distribution. (If normal, the points should fall around the diagonal straight line.)
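A minimal sketch, assuming matplotlib is available and reusing y and y_hat from the earlier fit:

```python
import matplotlib.pyplot as plt
from scipy import stats

residuals = y - y_hat                  # from the earlier sketch
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()                             # points near the line suggest normality holds
```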

Deviations from Normality

[Figure: normal probability plots illustrating typical deviations from normality]

Statistical Checking for Normality


Deviations from Normality
Kurtosis refers to the peakedness or flatness of the distribution. If normal, kurtosis is zero.
Skewness deals with the symmetry of the distribution; a skewed variable is a variable whose mean is not in the center of the distribution. If normal, skewness is zero.

Checking for Normality

Statistical test for kurtosis:
$$z = \frac{\text{kurtosis}}{\sqrt{24/N}}$$

Statistical test for skewness:
$$z = \frac{\text{skewness}}{\sqrt{6/N}}$$
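A minimal sketch of both z-tests, assuming SciPy's conventions (excess kurtosis, zero for a normal distribution) and reusing the residuals from above:

```python
from scipy import stats
import numpy as np

N = len(residuals)
z_kurt = stats.kurtosis(residuals) / np.sqrt(24 / N)  # excess kurtosis z-score
z_skew = stats.skew(residuals) / np.sqrt(6 / N)       # skewness z-score
# |z| > 1.96 suggests a departure from normality at the 5% level
```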

Checking for Normality

Other Tools
Histogram (good for numerous data points)
Goodness-of-fit tests (good for many data points, 30 or more; but overly sensitive for very large samples, 1000 or more)

Checking Model Assumptions: Equal Variance

[Figure: two residual plots. A: residuals show no pattern, the assumption holds. B: residuals fan out, the assumption is violated]
Checking Model Assumptions: Autocorrelation

[Figure: residuals plotted in time order showing a systematic pattern. Autocorrelation exists, a violation of the errors being independent of each other]

Importance of Assumptions

Normality: t-test, F-test.
Homoscedasticity: t-test, F-test; ensures the variance used in explanation and prediction is distributed across the range of independent variable values.
Absence of correlated errors: confidence that prediction errors are independent of the levels at which one is trying to predict; assurance that no other systematic variable is affecting the results while being left out of the analysis.

On Violation of Assumptions

One violation can be the result of another. For example, a violation of normality can be linked to, or result from, non-constant variance. Likewise, a remedy applied to one can solve another.
Remedy available: data or variable transformation.

Notes on Transformation

Two purposes:
1. Correct violations of statistical assumptions.
2. Improve the correlation between variables.

How to choose?
1. Theoretical basis / nature of the data (e.g., the square root transform works well with frequency count data, the arcsine transform with proportion data)
2. Trial and error

Transformation is not a magic cure for all violations. It will not eliminate all violations, but it can lead to very significant improvements.

Suggested Transforms for Non-normality

[Table: suggested transforms for various non-normal distribution shapes]
Note: The inverse transform usually works well with flat distributions.

Suggested Transforms for Heteroscedasticity

If the cone opens to the right, try the inverse transform.
If the cone opens to the left, try the square root transform.

Some General Guidelines on Transformation

For a noticeable effect of transformation, the ratio of the variable's mean to its standard deviation should be < 4.
If the transformation can be performed on either of two variables (in non-linearity), select the variable with the smallest mean-to-s ratio.
Transformations should be applied to IVs, except in cases of heteroscedasticity. If the relationship is both heteroscedastic and non-linear, there may be a need to transform both the IV and the DV.
Transformation may change the interpretation of the variables. Be careful!

Suggested Transforms for Non-linearity

[Figure: quadrants of curvature patterns with candidate transformations]

In any of the given illustrations, the transformation could be carried out on either the independent or the dependent variable. When multiple transformation possibilities are shown, start with the top method in each quadrant and move downward until linearity is achieved.

Simple Non-linear Regression by Linearization

Function                   | Proper Transformation  | Form of Simple Linear Regression
Exponential: Y = αe^(βx)   | Y* = ln Y              | Regress Y* against x
Power: Y = αx^β            | Y* = log Y, X* = log X | Regress Y* against X*
Reciprocal: Y = α + β(1/X) | X* = 1/X               | Regress Y against X*
Hyperbolic: Y = x/(α + βx) | Y* = 1/Y, X* = 1/X     | Regress Y* against X*
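A minimal sketch of the first row, exponential linearization, with illustrative data only:

```python
import numpy as np

# Fit Y = a * exp(b * x) by regressing ln(y) on x (illustrative data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.7, 7.4, 20.1, 54.6, 148.4])   # roughly exp(x)

ystar = np.log(y)                   # transform: Y* = ln Y
b, ln_a = np.polyfit(x, ystar, 1)   # straight-line fit in the transformed variables
a = np.exp(ln_a)
y_fit = a * np.exp(b * x)           # judge fit on the original, untransformed scale
```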

Notes on Non-Linear Regression by Linearization

A model in the transformed variables that has a proper additive error structure results from a model in the natural variables with a different type of error structure.
Performance criteria (s² and R²) for the transformed model should be based on the values of the residuals in the metric of the untransformed response.
A sample problem will be given in class.

Obtaining The Regression Output

To fit a linear regression using Excel:
Choose Data Analysis, then Regression.
Choose the two data columns for which the regression is to be calculated; the Y variable will be on the vertical axis.
Click the residual plot and normal probability plot options to check the assumptions.

Statistica could also be used.
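If Python is at hand instead, a minimal statsmodels sketch gives comparable output (using the salary data from earlier):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([15, 10, 20, 5, 15, 5], dtype=float)
y = np.array([30, 35, 55, 22, 40, 27], dtype=float)

model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.summary())   # coefficients, standard errors, R-squared, t- and F-tests
```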

Caveats in Simple LR

Linear model may be wrong: nonlinear? unequal variability? clustering?
Predicting intervention from experience is hard: the relationship may become different if you intervene.
The intercept may not be meaningful if there are no data near X = 0.
Explaining Y from X vs. explaining X from Y: use care in selecting the Y variable to be predicted.
Is there a hidden third factor? Use it to predict better with multiple regression.
