12 views

Uploaded by Anjali Shah

Research paper on regression & hypothesis testing

- Econometrics Unit 2
- Chapter 09 W12 L1 Multiple Regression Analysis 2015 UTP C10.pdf
- fuzzy logic
- Multiple Regression Diagnostics Collection
- Final 0708 sem 2 answer
- 2014 Statistical Officer Paper
- STA302_Mid_2012S
- Competitive Priorities
- Case Regression Analysis
- 54199223 a Report on Comparative Study of Home Loan Amp Customer Satisfaction
- Sreeram_1.pdf
- Questions on ANOVA Correln and Regn
- IJEAIS171118
- Multiple Regression
- SPSS Regression Output
- Cost Behavior_CH5_session_3_B (1).ppt
- AP Statistics 1st Semester Study Guide
- Session-Correlation and Regression
- Regression.pdf
- Problem Set 2

You are on page 1of 24

Regression

Goals

Linear regression in R

Estimating parameters and hypothesis testing

with linear models

Develop basic concepts of linear regression from

a probabilistic framework

Regression

Technique used for the modeling and analysis of

numerical data

Exploits the relationship between two or more

variables so that we can gain information about one of

them through knowing values of the other

Regression can be used for prediction, estimation,

hypothesis testing, and modeling causal relationships

Regression Lingo

Y = X

1

+ X

2

+ X

3

Dependent Variable

Outcome Variable

Response Variable

Independent Variable

Predictor Variable

Explanatory Variable

Why Linear Regression?

Suppose we want to model the dependent variable Y in terms

of three predictors, X

1

, X

2

, X

3

Y = f(X

1

, X

2

, X

3

)

Typically will not have enough data to try and directly

estimate f

Therefore, we usually have to assume that it has some

restricted form, such as linear

Y = X

1

+ X

2

+ X

3

Linear Regression is a Probabilistic Model

Much of mathematics is devoted to studying variables

that are deterministically related to one another

!

y = "

0

+ "

1

x

!

"

0

!

y

!

x

!

"

1

=

#y

#x

!

"y

!

"x

But were interested in understanding the relationship

between variables related in a nondeterministic fashion

A Linear Probabilistic Model

!

"

0

!

y

!

x

!

y = "

0

+ "

1

x + #

Denition: There exists parameters , , and , such that for

any xed value of the independent variable x, the dependent

variable is related to x through the model equation

!

"

0

!

"

1

!

"

2

!

" is a rv assumed to be N(0, #

2

)

!

y = "

0

+ "

1

x

True Regression Line

!

"

1

!

"

2

!

"

3

Implications

The expected value of Y is a linear function of X, but for xed

x, the variable Y differs from its expected value by a random

amount

Formally, let x* denote a particular value of the independent

variable x, then our linear probabilistic model says:

!

E(Y | x*) =

Y| x*

= mean value of Y when x is x *

!

V(Y | x*) = "

Y| x*

2

= variance of Y when x is x *

Graphical Interpretation

!

y = "

0

+ "

1

x

!

"

0

+ "

1

x

1

!

"

0

+ "

1

x

2

!

x

1

!

x

2

!

y

!

x

For example, if x = height and y = weight then is the average

weight for all individuals 60 inches tall in the population

!

Y| x

1

=

!

Y| x

2

=

!

Y| x=60

One More Example

Suppose the relationship between the independent variable height

(x) and dependent variable weight (y) is described by a simple

linear regression model with true regression line

y = 7.5 + 0.5x and

Q2: If x = 20 what is the expected value of Y?

!

Y| x=20

= 7.5 + 0.5(20) = 17.5

Q3: If x = 20 what is P(Y > 22)?

Q1: What is the interpretation of = 0.5?

!

"

1

The expected change in height associated with a 1-unit increase

in weight !

" = 3

!

P(Y > 22 | x = 20) = P

22 -17.5

3

"

#

$

%

&

'

=1( )(1.5) = 0.067

Estimating Model Parameters

Point estimates of and are obtained by the principle of least

squares

!

"

0

!

"

1

!

f ("

0

,"

1

) = y

i

#("

0

+ "

1

x

i

)

[ ]

i=1

n

$

2

!

"

0

!

y

!

x

!

"

0

= y #

"

1

x

Predicted and Residual Values

Predicted, or tted, values are values of y predicted by the least-

squares regression line obtained by plugging in x

1

,x

2

,,x

n

into the

estimated regression line

!

y

1

=

"

0

#

"

1

x

1

!

y

2

=

"

0

#

"

1

x

2

Residuals are the deviations of observed and predicted values

!

e

1

= y

1

"

y

1

e

2

= y

2

"

y

2

!

y

!

x

!

e

1

!

e

2

!

e

3

!

y

1

!

y

1

Residuals Are Useful!

!

SSE = (e

i

i=1

n

"

)

2

= (y

i

i=1

n

"

#

y

i

)

2

They allow us to calculate the error sum of squares (SSE):

Which in turn allows us to estimate :

!

"

2

!

"

2

=

SSE

n #2

As well as an important statistic referred to as the coefcient of

determination:

!

r

2

=1"

SSE

SST

!

SST = (y

i

" y )

2

i=1

n

#

Multiple Linear Regression

Extension of the simple linear regression model to two or

more independent variables

!

y = "

0

+ "

1

x

1

+ "

2

x

2

+ ... + "

n

x

n

+#

Partial Regression Coefcients: !

i

! effect on the

dependent variable when increasing the i

th

independent

variable by 1 unit, holding all other predictors

constant

Expression = Baseline + Age + Tissue + Sex + Error

Categorical Independent Variables

Qualitative variables are easily incorporated in regression

framework through dummy variables

Simple example: sex can be coded as 0/1

What if my categorical variable contains three levels:

x

i

=

0 if AA

1 if AG

2 if GG

Categorical Independent Variables

Previous coding would result in colinearity

Solution is to set up a series of dummy variable. In general

for k levels you need k-1 dummy variables

x

1

=

1 if AA

0 otherwise

x

2

=

1 if AG

0 otherwise

AA

AG

GG

x

1

x

2

1

1

0

0

0 0

Hypothesis Testing: Model Utility Test (or

Omnibus Test)

The rst thing we want to know after tting a model is whether

any of the independent variables (Xs) are signicantly related to

the dependent variable (Y):

!

H

0

: "

1

= "

2

= ... = "

k

= 0

H

A

: At least one "

1

# 0

f =

R

2

(1$ R

2

)

k

n $(k +1)

!

Rejection Region : F

",k,n#(k+1)

Equivalent ANOVA Formulation of Omnibus Test

We can also frame this in our now familiar ANOVA framework

- partition total variation into two components: SSE (unexplained

variation) and SSR (variation explained by linear model)

Equivalent ANOVA Formulation of Omnibus Test

We can also frame this in our now familiar ANOVA framework

!

Rejection Region : F

",k,n#(k+1)

- partition total variation into two components: SSE (unexplained

variation) and SSR (variation explained by linear model)

n-1 Total

n-2 Error

k Regression

F MS Sum of Squares df Source of

Variation

!

SSR

k

!

SSE

n "2

!

MS

R

MS

E

!

SSR = (

y

i

" y )

2

#

!

SSE = (y

i

"

y

i

)

2

#

!

SST = (y

i

" y )

2

#

F Test For Subsets of Independent Variables

A powerful tool in multiple regression analyses is the ability to

compare two models

For instance say we want to compare:

!

Full Model : y = "

0

+ "

1

x

1

+ "

2

x

2

+ "

3

x

3

+ "

4

x

4

+#

!

Reduced Model : y = "

0

+ "

1

x

1

+ "

2

x

2

+#

!

f =

(SSE

R

" SSE

F

) /(k " l)

SSE

F

/([n "(k +1)]

Again, another example of ANOVA:

SSE

R

= error sum of squares for

reduced model with predictors

!

l

SSE

F

= error sum of squares for

full model with k predictors

Example of Model Comparison

We have a quantitative trait and want to test the effects at two

markers, M1 and M2.

!

f =

(SSE

R

" SSE

F

) /(3"2)

SSE

F

/([100 "(3+1)]

=

(SSE

R

" SSE

F

)

SSE

F

/96

Full Model: Trait = Mean + M1 + M2 + (M1*M2) + error

Reduced Model: Trait = Mean + M1 + M2 + error

!

Rejection Region : F

a, 1, 96

Hypothesis Tests of Individual Regression

Coefficients

Hypothesis tests for each can be done by simple t-tests:

!

"

i

!

H

0

:

"

i

= 0

H

A

:

"

i

# 0

T =

"

i

$"

i

se("

i

)

Condence Intervals are equally easy to obtain:

!

"

i

t

# / 2,n$(k$1)

se(

"

i

)

!

Critical value : t

" / 2,n#(k#1)

Checking Assumptions

Critically important to examine data and check assumptions

underlying the regression model

! Outliers

! Normality

! Constant variance

! Independence among residuals

Standard diagnostic plots include:

! scatter plots of y versus x

i

(outliers)

! qq plot of residuals (normality)

! residuals versus tted values (independence, constant variance)

! residuals versus x

i

(outliers, constant variance)

Well explore diagnostic plots in more detail in R

Fixed -vs- Random Effects Models

In ANOVA and Regression analyses our independent variables can

be treated as Fixed or Random

Fixed Effects: variables whose levels are either sampled

exhaustively or are the only ones considered relevant to the

experimenter

Random Effects: variables whose levels are randomly sampled

from a large population of levels

Expression = Baseline + Population + Individual + Error

Example from our recent AJHG paper:

- Econometrics Unit 2Uploaded byHasnat Khan
- Chapter 09 W12 L1 Multiple Regression Analysis 2015 UTP C10.pdfUploaded byJesse Oh
- fuzzy logicUploaded byDoğan Bıçakcı
- Multiple Regression Diagnostics CollectionUploaded byJuan Jose Torres
- Final 0708 sem 2 answerUploaded byviknesh5555
- 2014 Statistical Officer PaperUploaded byRameez Abbasi
- STA302_Mid_2012SUploaded byexamkiller
- Competitive PrioritiesUploaded bysaraaqil
- Case Regression AnalysisUploaded byvishalLala
- 54199223 a Report on Comparative Study of Home Loan Amp Customer SatisfactionUploaded bySunil Soni
- Sreeram_1.pdfUploaded bySreeram Venkataraman
- Questions on ANOVA Correln and RegnUploaded byIrshad Shah
- IJEAIS171118Uploaded byNandhitha Ram
- Multiple RegressionUploaded byruthpalupi
- SPSS Regression OutputUploaded bydivyavartika
- Cost Behavior_CH5_session_3_B (1).pptUploaded byRPhArsalan
- AP Statistics 1st Semester Study GuideUploaded bySusan Huynh
- Session-Correlation and RegressionUploaded byDhivya Sivanantham
- Regression.pdfUploaded byJhuma Debnath
- Problem Set 2Uploaded byRaga
- Praktkum Analisis Regresi 16 March 17Uploaded byNezarAbdilahPrakasa
- 5F16E004d01Uploaded byMai Hậu
- Linking Rural Road Environment, Speed and Safety Factors With a ʻTwo-Stageʼ Model a Feasibility Study 16-5048_revisedUploaded byramadan1978
- Vol 5 _2_ - Cont. J. Soc. Sci. 22-31_2Uploaded byharuna1
- AlgorithmsUploaded byTimotius Danny
- 10 Mws Gen Reg Ppt LinearUploaded byMDjMalik
- Uts AplikomUploaded byYahdi Furqon Busnia
- Uji AutocorrelationUploaded bytripramono
- leastsquareUploaded bycokbin
- Agren CNPinPlants EcLet04 CopyUploaded byEsteban Klop

- Mis PPTUploaded byAnjali Shah
- Ch 1Uploaded byAnjali Shah
- Decision MakingUploaded byAnjali Shah
- Final ResearchUploaded byAnjali Shah
- PMUploaded byYaser Mohamed
- Diffusion of InnovationUploaded byAnjali Shah
- omUploaded byroyalviren
- Annual Report,apollo tyresUploaded byAnjali Shah
- Annual Report 2009-10 Part3,apoll tyresUploaded byAnjali Shah
- Omr Paper IIIUploaded byRavinder Singh
- Nature and ScopeUploaded byAnjali Shah
- managementUploaded byAnjali Shah
- 071011 Apollo Annual Report Fy11Uploaded bygardianjoe86
- Shipping,Research,port managementUploaded byAnjali Shah
- Shipping,Research,port managementUploaded byAnjali Shah
- jun_9_mbaUploaded byVinodh Veluswamy
- consumer behaviourUploaded byAnjali Shah
- Uplifting earthUploaded byAnjali Shah
- GemsandJewellery_sectoralUploaded byAnkit Nindra

- 362assn7-solnsUploaded byKasih Pandu
- Introduction to Survival Analysis With Stata SeminarUploaded byhubik38
- SPC basicUploaded byamy
- Midterm QMBUploaded byAnh Bui
- 3640001.pdfUploaded byHiren Chauhan
- chap3-1-forecasting (1)Uploaded byvundavilliravindra
- linear-correlation-1205885176993532-3Uploaded byAnonymous 7Un6mnqJzN
- Chi – SquareUploaded byglemarmonsales
- Capm Indian ContextUploaded byankitrao31
- 11651 2. Assump CLRM Remedial Measures (2)Uploaded byKshitij Tripathi
- orthogonalized regressorsUploaded byceffone
- Standard DeviationUploaded bysababathy
- Ancova No LinealUploaded by290971
- ancova.pdfUploaded byRizal Mustofa
- regression modelsUploaded bysbc
- supplement 5 - multiple regression.pdfUploaded bymonte
- Day1Uploaded byAnonymous BKnJpeyh
- Pr StatistikUploaded byYuda Fraizal
- Stata Cheat SheetsUploaded byameliasoeharman
- Hanley 2003, Orientation to GEEUploaded byHarshoi Krishanna
- Computer Oriented Statistical MethodsUploaded byRavi Kushwaha
- IS4240 - AY1314S2 - Assignment - DM1Uploaded byAnantha Lakshmi
- Kimia komputasiUploaded byDinda Saharaa
- Random regression models in the evaluation of the growth curve of Simbrasil beef cattleUploaded byLuciano Pinheiro da Silva
- coefficient corelation.pdfUploaded byLiza Dwi Wahyuni
- GDP vs RainfallUploaded byscottrolik
- Sampling DistributionsUploaded byGary Omar Pacana
- Buku R StatisticUploaded byYonathanRacing
- HW3 managerial financeUploaded byAndrew Hawkins
- TI guideUploaded byfocus16hoursgmailcom