
Multiple Regression and Dummy Variable Analysis

Executive team: Shridara Manisha Dsouza Shwetha Vedaprakash Keshav Murthy Basavaraj

Multiple Regression
Multiple regression is a statistical technique that allows us to predict someone's score on one variable on the basis of their scores on several other variables.

EXAMPLE:
Suppose we were interested in predicting how much an individual enjoys their job. Variables such as salary, extent of academic qualifications, age, sex, number of years in full-time employment and socioeconomic status might all contribute towards job satisfaction.

Purposes:
Prediction
Explanation
Theory building

Uses of Multiple Regression


1. You can use this statistical technique when exploring linear relationships between the predictor and criterion variables, that is, when the relationship follows a straight line.
2. The criterion variable that you are seeking to predict should be measured on a continuous scale (such as an interval or ratio scale).

Contd.
3. Multiple regression requires a large number of observations. The number of cases (participants) must substantially exceed the number of predictor variables you are using in your regression. The absolute minimum is that you have five times as many participants as predictor variables. A more acceptable ratio is 10:1, but some people argue that this should be as high as 40:1 for some statistical selection methods.
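The ratios above can be turned into a quick check. The sketch below is a hypothetical helper (the function name and the wording of the returned labels are assumptions, not part of the slides) that classifies a study against the 5:1, 10:1 and 40:1 rules of thumb.

```python
def sample_size_check(n_cases: int, n_predictors: int) -> str:
    """Classify the cases-to-predictors ratio against the rules of thumb."""
    if n_predictors <= 0:
        raise ValueError("need at least one predictor variable")
    ratio = n_cases / n_predictors
    if ratio < 5:
        return "below the 5:1 minimum"
    if ratio < 10:
        return "meets the 5:1 minimum but not 10:1"
    if ratio < 40:
        return "meets 10:1 but not 40:1"
    return "meets 40:1"


# Hypothetical study: 60 participants and 6 predictor variables.
print(sample_size_check(60, 6))
```

With 60 participants and 6 predictors the ratio is exactly 10:1, so the study meets the more acceptable ratio but would fall short of the 40:1 figure suggested for some statistical selection methods.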

Simple vs. Multiple Regression


Simple regression:
One dependent variable Y predicted from one independent variable X
One regression coefficient
\(r^2\): proportion of variation in dependent variable Y predictable from X

Multiple regression:
One dependent variable Y predicted from a set of independent variables (X1, X2, ..., Xk)
One regression coefficient for each independent variable
\(R^2\): proportion of variation in dependent variable Y predictable by the set of independent variables (Xs)
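As a small illustration of the simple case, the sketch below fits one predictor by least squares and computes the proportion of variation in Y that is predictable from X. The data values and the function name are made up for illustration.

```python
def simple_regression(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Squared correlation: share of the variation in y predictable from x.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope, r_squared = simple_regression(xs, ys)
print(f"Y = {intercept:.2f} + {slope:.2f} X, r^2 = {r_squared:.2f}")
```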

Assumptions
Independence: the scores of any particular subject are independent of the scores of all other subjects.
Normality: in the population, the scores on the dependent variable are normally distributed for each of the possible combinations of the levels of the X variables; each of the variables is normally distributed.
Homoscedasticity: in the population, the variances of the dependent variable for each of the possible combinations of the levels of the X variables are equal.
Linearity: in the population, the relation between the dependent variable and an independent variable is linear when all the other independent variables are held constant.

Methods
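Multiple regression is an extension of bivariate linear regression. The generalized equation is

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon \]

where
\(\beta_0\) = a constant, the value of Y when all X values are zero
\(\beta_1\) = the slope of the regression surface with respect to \(X_1\) (likewise \(\beta_2, \dots, \beta_n\) for the other predictors)
\(\varepsilon\) = an error term, normally distributed about a mean of zero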
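One way to estimate the constant and the slopes is ordinary least squares via the normal equations. The sketch below is a minimal pure-Python version under stated assumptions: the function names are hypothetical, and the data are made up and generated without an error term from Y = 1 + 2 X1 + 3 X2, so the fit should recover those three values. In practice a statistics package would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(xs, ys):
    """Return [b0, b1, ..., bn] minimising the sum of squared errors."""
    design = [[1.0] + [float(v) for v in row] for row in xs]  # constant column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data generated from Y = 1 + 2*X1 + 3*X2.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coeffs = fit_ols(xs, ys)
print([round(c, 6) for c in coeffs])
```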

Multiple Regression with Dummy Variables


\[ Y = \alpha + \beta_1 D_1 + \dots + \beta_k D_k + e \]

Example: Explaining house prices (continued)
Regress Y = house price on D1 = driveway dummy and D2 = rec room dummy.
Four types of houses:
Houses with a driveway and a rec room (D1=1, D2=1)
Houses with a driveway but no rec room (D1=1, D2=0)
Houses with a rec room but no driveway (D1=0, D2=1)
Houses with no driveway and no rec room (D1=0, D2=0)

Example: Explaining house prices (continued)


          Coeff.    St. Error   t Stat   P-value   Lower 95%   Upper 95%
Inter.    47099.1   2837.6      16.60    2.E-50    41525       52673
D1        21159.9   3062.4       6.91    1.E-11    15144       27176
D2        16023.7   2788.6       5.75    1.E-08    10546       21502

If D1=1 and D2=1, then

\[ \hat{Y} = \hat{\alpha} + \hat{\beta}_1 + \hat{\beta}_2 = 47{,}099 + 21{,}160 + 16{,}024 = 84{,}283 \]

The average price of houses with a driveway and rec room is $84,283.

Example: Explaining house prices (continued)


If D1=1 and D2=0, then

\[ \hat{Y} = \hat{\alpha} + \hat{\beta}_1 = 47{,}099 + 21{,}160 = 68{,}259 \]

The average price of houses with a driveway but no rec room is $68,259.

If D1=0 and D2=1, then

\[ \hat{Y} = \hat{\alpha} + \hat{\beta}_2 = 47{,}099 + 16{,}024 = 63{,}123 \]

The average price of houses with a rec room but no driveway is $63,123.

If D1=0 and D2=0, then

\[ \hat{Y} = \hat{\alpha} = 47{,}099 \]

The average price of houses with no driveway and no rec room is $47,099.
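The four cases above can be reproduced directly from the coefficient table. The sketch below uses the unrounded coefficients from that table; the constant and function names are hypothetical, chosen only for this illustration.

```python
# Coefficients taken from the regression output table (Inter., D1, D2).
INTERCEPT = 47099.1
COEF_DRIVEWAY = 21159.9   # coefficient on D1
COEF_REC_ROOM = 16023.7   # coefficient on D2


def predicted_price(d1: int, d2: int) -> float:
    """Fitted house price for driveway dummy d1 and rec room dummy d2."""
    return INTERCEPT + COEF_DRIVEWAY * d1 + COEF_REC_ROOM * d2


for d1 in (1, 0):
    for d2 in (1, 0):
        print(f"D1={d1}, D2={d2}: ${predicted_price(d1, d2):,.0f}")
```

Each dummy simply switches its coefficient on or off, so a house with neither feature gets the intercept alone.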

Dummy variables
A dummy variable is a variable which takes on the value 1 or 0 depending upon the answer to a yes-or-no question. For example, a dummy variable might take on the value 1 if male and 0 if female, or 1 if Republican and 0 otherwise, or 1 if Jewish and 0 otherwise, or 1 if the year was 1980 or later and 0 otherwise.
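Coding a dummy variable amounts to asking the yes-or-no question of every observation. A minimal sketch, with hypothetical function names and made-up observations:

```python
def category_dummy(values, yes_value):
    """1 where the observation equals yes_value, 0 otherwise."""
    return [1 if v == yes_value else 0 for v in values]


def year_dummy(years, cutoff=1980):
    """1 if the year is the cutoff or later, 0 otherwise."""
    return [1 if y >= cutoff else 0 for y in years]


# Made-up observations, for illustration only.
sexes = ["male", "female", "female", "male"]
years = [1975, 1980, 1992, 1979]

print(category_dummy(sexes, "male"))
print(year_dummy(years))
```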

Example (1)
\(\text{WAGE} = \beta_1 + \beta_2\,\text{ED} + \beta_3\,\text{MALE}\), where MALE = 1 if the person is male and 0 if the person is female. This generates two equations, one for females and one for males.
(FEMALE): \(\text{WAGE} = \beta_1 + \beta_2\,\text{ED}\)
(MALE): \(\text{WAGE} = \beta_1 + \beta_2\,\text{ED} + \beta_3 = (\beta_1 + \beta_3) + \beta_2\,\text{ED}\)
A test of the hypothesis \(H_0: \beta_3 = 0\) is a test of the hypothesis that the wage equation is the same for men and women.
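The effect of the MALE dummy is to shift the intercept while leaving the slope on ED unchanged. The sketch below illustrates this with hypothetical coefficient values (5.0, 1.5 and 2.0 are invented for the example, not estimates from any data).

```python
# Hypothetical coefficients, for illustration only.
B1 = 5.0   # intercept
B2 = 1.5   # slope on ED (years of education)
B3 = 2.0   # shift for MALE = 1


def wage(ed: float, male: int) -> float:
    """Wage predicted by the equation with an intercept dummy for MALE."""
    return B1 + B2 * ed + B3 * male


for ed in (10, 12, 16):
    gap = wage(ed, 1) - wage(ed, 0)
    print(f"ED={ed}: female={wage(ed, 0)}, male={wage(ed, 1)}, gap={gap}")
```

The gap between the two lines equals the dummy coefficient at every level of education, which is why testing whether that coefficient is zero tests whether the two equations coincide.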

USES
Analysis of financial and economic data; correct interpretation of the regression report is therefore possible.
Used to detect any systematic differences which are attributable to industrial classes.
Used to explain industrial differences which cannot be explained by the control variables


(leverage, growth, pay-out, log size)

Conclusion
Multiple regression and dummy variable analysis help to describe the relationship between a dependent variable and several independent variables. To some extent they can give an accurate interpretation, but when the number of independent variables is large, interpretation becomes quite complex.

THANK YOU
