
MULTIPLE REGRESSION

Mr. Pranav Ranjan & Ms. Razia Sehdev ICTC, LPU

The Multiple Regression Model


Idea: Examine the linear relationship between 1 dependent variable (Y) and 2 or more independent variables (X_i)
Multiple Regression Model with k Independent Variables:
Y_i = β_0 + β_1 X_1i + β_2 X_2i + … + β_k X_ki + ε_i

where β_0 is the Y-intercept, β_1, …, β_k are the population slopes, and ε_i is the random error term.
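A minimal sketch of how the β's might be estimated by ordinary least squares, using NumPy on made-up data (the values below are illustrative, not from these slides):

```python
import numpy as np

# Hypothetical sample: n = 6 observations of k = 2 independent variables.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([7.1, 6.9, 13.2, 12.8, 19.1, 18.9])

# Prepend a column of ones so the first estimate is the Y-intercept b0.
X_design = np.column_stack([np.ones(len(y)), X])

# Ordinary least squares: b minimizes ||y - X_design @ b||^2.
b, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)
print("Estimates (b0, b1, b2):", b)
```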


Assumptions

- The error term is normally distributed; for each fixed value of X, the distribution of Y is normal.
- The mean of the error term is 0.
- The variance of the error term is constant and does not depend on the values assumed by X.
- The error terms are uncorrelated; in other words, the observations have been drawn independently.
- The regressors are independent amongst themselves.

Assumptions

- Independent variables should be uncorrelated with the residual.
- The model should be properly specified.
- The number of observations should be greater than the number of parameters.
- The model is linear in parameters.
- Independent variables are fixed in repeated samples.
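Several of these assumptions can be checked from the residuals of a fitted model. A rough sketch using statsmodels and scipy on simulated data (the data and model below are illustrative only):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.stattools import durbin_watson

# Simulated data standing in for a real sample.
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(40, 2)))
y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(size=40)

resid = sm.OLS(y, X).fit().resid

# Normality of errors: large p-value -> no evidence against normality.
print("Shapiro-Wilk p-value:", stats.shapiro(resid).pvalue)

# Uncorrelated errors: Durbin-Watson near 2 suggests no autocorrelation.
print("Durbin-Watson:", durbin_watson(resid))

# Mean of residuals is 0 by construction when an intercept is included.
print("Mean residual:", resid.mean())
```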

Statistics Associated with Multiple Regression


Coefficient of multiple determination

The strength of association in multiple regression is measured by the square of the multiple correlation coefficient, R², which is also called the coefficient of multiple determination.

Adjusted R²

R², the coefficient of multiple determination, is adjusted for the number of independent variables and the sample size to account for diminishing returns. After the first few variables, additional independent variables do not make much contribution.
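For reference, adjusted R² is conventionally computed as R²_adj = 1 - (1 - R²)(n - 1)/(n - k - 1). A quick check using the sums of squares from the pie-sales output shown later in these slides:

```python
# Values taken from the pie-sales regression output later in these slides.
SSR, SST = 29460.027, 56493.333
n, k = 15, 2  # observations, independent variables

r2 = SSR / SST
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R-squared:          {r2:.5f}")      # ~0.52148
print(f"Adjusted R-squared: {r2_adj:.5f}")  # ~0.44172
```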

Statistics Associated with Multiple Regression


F test. Used to test the null hypothesis that the coefficient of multiple determination in the population, R²_pop, is zero.

The test statistic has an F distribution with k and (n - k - 1) degrees of freedom.
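Equivalently, F = MSR/MSE. A quick sketch of the computation for the pie-sales example that follows, with scipy supplying the upper-tail p-value (values taken from the ANOVA table):

```python
from scipy import stats

# Mean squares from the ANOVA table of the pie-sales example.
MSR, MSE = 14730.013, 2252.776
k, n = 2, 15  # independent variables, observations

F = MSR / MSE                          # ~6.5386
p_value = stats.f.sf(F, k, n - k - 1)  # upper tail of F(2, 12)

print(f"F = {F:.4f}, p-value = {p_value:.5f}")  # p ~ 0.012
```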



Statistics Associated with Multiple Regression


Partial regression coefficient. The partial regression coefficient, b_1, denotes the change in the predicted value, Ŷ_i, per unit change in X_1 when the other independent variables, X_2 to X_k, are held constant.


Multiple Regression Output


Regression Statistics
  Multiple R          0.72213
  R Square            0.52148
  Adjusted R Square   0.44172
  Standard Error      47.46341
  Observations        15

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

ANOVA
              df    SS          MS          F        Significance F
  Regression   2    29460.027   14730.013   6.53861  0.01201
  Residual    12    27033.306    2252.776
  Total       14    56493.333

              Coefficients  Standard Error   t Stat    P-value   Lower 95%   Upper 95%
  Intercept      306.52619      114.25389    2.68285   0.01993    57.58835   555.46404
  Price          -24.97509       10.83213   -2.30565   0.03979   -48.57626    -1.37392
  Advertising     74.13096       25.96732    2.85478   0.01449    17.55303   130.70888
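Output of this form comes from spreadsheet or statistical software. A rough sketch of how a comparable summary might be produced with statsmodels (the data arrays are placeholders, not the actual 15 weeks of pie-sales data, so the numbers will differ):

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data: price in $, advertising in $100s, sales in pies/week.
price = np.array([5.5, 6.0, 7.0, 5.0, 6.5, 7.5, 5.5, 6.0])
advertising = np.array([3.5, 3.0, 4.0, 3.0, 4.5, 3.5, 4.0, 3.0])
sales = np.array([430.0, 380.0, 350.0, 420.0, 440.0, 310.0, 450.0, 370.0])

# Design matrix with an intercept column, then ordinary least squares.
X = sm.add_constant(np.column_stack([price, advertising]))
model = sm.OLS(sales, X).fit()
print(model.summary())  # R-square, ANOVA F, coefficients, t stats, CIs
```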

The Multiple Regression Equation


Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

where
  Sales is in number of pies per week
  Price is in $
  Advertising is in $100s

b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price

Using The Equation to Make Predictions


Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62

Note that Advertising is in $100s, so $350 means that X_2 = 3.5.

Predicted sales is 428.62 pies
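The same arithmetic wrapped in a small helper function (the coefficients are copied from the fitted equation; the function name is ours):

```python
def predict_sales(price_dollars: float, advertising_dollars: float) -> float:
    """Predicted weekly pie sales from the fitted equation."""
    x2 = advertising_dollars / 100  # advertising enters the model in $100s
    return 306.526 - 24.975 * price_dollars + 74.131 * x2

print(predict_sales(5.50, 350))  # ~428.62 pies
```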



Multiple Coefficient of Determination


r² = SSR / SST = 29460.0 / 56493.3 = 0.52148
52.1% of the variation in pie sales is explained by the variation in price and advertising

Adjusted r²

r²_adj = 0.44172
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables

F Test for Overall Significance


F = MSR / MSE = 14730.0 / 2252.8 = 6.5386

with 2 and 12 degrees of freedom. The p-value for the F test (Significance F) is 0.01201.


Are Individual Variables Significant?


t value for Price: t = -2.306, with p-value .0398
t value for Advertising: t = 2.855, with p-value .0145
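Each t statistic is the coefficient divided by its standard error, and the two-sided p-value comes from a t distribution with n - k - 1 = 12 degrees of freedom. A quick check against the output values (scipy assumed available):

```python
from scipy import stats

df = 12  # n - k - 1 = 15 - 2 - 1

for name, coef, se in [("Price", -24.97509, 10.83213),
                       ("Advertising", 74.13096, 25.96732)]:
    t = coef / se
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    print(f"{name}: t = {t:.3f}, p = {p:.4f}")
```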

Multicollinearity
Multicollinearity arises when intercorrelations among the predictors are very high. It results in several problems, including:

- The partial regression coefficients may not be estimated precisely; the standard errors are likely to be high.
- The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.
- It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
- Predictor variables may be incorrectly included or removed in stepwise regression.

Multicollinearity
A simple procedure for adjusting for multicollinearity consists of using only one of the variables in a highly correlated set of variables.
Alternatively, the set of independent variables can be transformed into a new set of predictors that are mutually independent by using techniques such as principal components analysis. More specialized techniques, such as ridge regression and latent root regression, can also be used.
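A minimal sketch of the principal-components idea with scikit-learn (assumed available; the correlated predictor matrix is simulated):

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulated predictors: x2 is highly correlated with x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
X = np.column_stack([x1, 0.9 * x1 + 0.1 * rng.normal(size=50)])

# Transform into mutually uncorrelated components; regress on these instead.
components = PCA(n_components=2).fit_transform(X)
print(np.corrcoef(components.T).round(6))  # off-diagonal entries ~0
```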

Multicollinearity Diagnostics

Variance Inflation Factor (VIF) measures how much the variance of a regression coefficient is inflated by multicollinearity. A VIF of 1 indicates no correlation between that predictor and the other independent measures; values somewhat above 1 indicate some association between predictor variables, but generally not enough to cause problems. A maximum acceptable VIF value is typically 10; anything higher indicates a problem with multicollinearity.

Tolerance is the amount of variance in an independent variable that is not explained by the other independent variables. If the other variables explain a lot of the variance of a particular independent variable, we have a problem with multicollinearity; thus, small values of tolerance indicate multicollinearity problems. The cutoff value for tolerance is typically .10: a tolerance value below .10 indicates a problem of multicollinearity.
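Both diagnostics are easy to compute; tolerance is simply the reciprocal of VIF. A sketch using statsmodels on simulated predictors (the data are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated design matrix with two highly correlated predictors.
rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=50)
X = sm.add_constant(np.column_stack([x1, x2]))

# VIF for each predictor (column 0 is the constant, so skip it).
for j, name in [(1, "x1"), (2, "x2")]:
    vif = variance_inflation_factor(X, j)
    print(f"{name}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
```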