
REGRESSION

The most commonly used form of regression is linear regression, and the most common type of linear regression is called ordinary least squares regression. Linear regression uses the values from an existing data set, consisting of measurements of two variables, X and Y, to develop a model that is useful for predicting the value of the dependent variable, Y, for given values of X.

ELEMENTS OF A REGRESSION EQUATION

The regression equation is written as Y = a + bX + e

Y is the value of the Dependent variable (Y), what is being predicted or explained
a or Alpha, a constant; equals the value of Y when the value of X=0
b or Beta, the coefficient of X; the slope of the regression line; how much Y changes for each one-unit change in X
X is the value of the Independent variable (X), what is predicting or explaining the value of Y
e is the error term; the error in predicting the value of Y, given the value of X (it is not displayed in most regression equations).

For example, say we know what the average speed is of cars on the freeway when we have 2 highway patrols deployed (average speed=75 mph) or 10 highway patrols deployed (average speed=35 mph). But what will be the average speed of cars on the freeway when we deploy 5 highway patrols?

Average Speed on Freeway (Y)    Number of Patrol Cars Deployed (X)
75                              2
35                              10

From our known data, we can use the regression formula (calculations not shown) to compute the values of a and b and obtain the following equation: Y = 85 + (-5) X, where

Y is the average speed of cars on the freeway
a=85, or the average speed when X=0
b=(-5), the impact on Y of each additional patrol car deployed
X is the number of patrol cars deployed

That is, the average speed of cars on the freeway when there are no highway patrols working (X=0) will be 85 mph. For each additional highway patrol car working, the average speed will drop by 5 mph. For five patrols (X=5), Y = 85 + (-5)(5) = 85 - 25 = 60 mph

There may be some variations on how regression equations are written in the literature. For example, you may sometimes see the dependent variable term (Y) written with a little "hat" ( ^ ) on it, or called Y-hat. This refers to the predicted value of Y. The plain Y refers to observed values of Y in the data set used to calculate the regression equation. You may see the symbols for alpha (a) and beta (b) written in Greek letters, or you may see them written in English letters. The coefficient of the independent variable may have a subscript, as may the term for X, for example, b1X1 (this is common in multiple regression).
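As a minimal sketch of the calculations not shown above, the ordinary least squares formulas for a single X can be written out directly. The function name fit_line is an illustrative choice and does not come from the original text; the data are the two known points from the patrol example.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Known data from the example: patrol cars deployed (X), average speed (Y).
patrols = [2, 10]
speeds = [75, 35]

a, b = fit_line(patrols, speeds)
predicted_speed = a + b * 5  # five patrols deployed
print(a, b, predicted_speed)
```

With only two data points the fitted line passes through both of them exactly, so this reproduces a=85, b=-5, and a predicted speed of 60 mph for five patrols.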

ASSESSING THE REGRESSION EQUATION

We now have a regression equation. But how good is the equation at predicting values of Y, for given values of X? For that assessment, we turn to measures of association and measures of statistical significance that are used with regression equations.

r2
r2 is a measure of association; it represents the percent of the variance in the values of Y that can be explained by knowing the value of X. r2 varies from a low of 0.0 (none of the variance is explained), to a high of 1.0 (all of the variance is explained).

s.e.b
s.e.b is the standard error of the computed value of b. A t-test for statistical significance of the coefficient is conducted by dividing the value of b by its standard error. By rule of thumb, a t-value greater than 2.0 in absolute value is usually statistically significant but you must consult a t-table to be sure. If the t-value indicates that the b coefficient is statistically significant, this means that the independent variable or X (number of patrol cars deployed) should be kept in the regression equation, since it has a statistically significant relationship with the dependent variable or Y (average speed in mph). If the relationship is not statistically significant, the value of the b coefficient is (statistically speaking) indistinguishable from zero.

F
F is a test for statistical significance of the regression equation as a whole. It is obtained by dividing the explained variance by the unexplained variance. By rule of thumb, an F-value of greater than 4.0 is usually statistically significant but you must consult an F-table to be sure. If F is significant, then the regression equation helps us to understand the relationship between X and Y.

For our example above, say we obtained the following values:

r2 = .9
Knowing the value of X (the number of patrol cars deployed), we can explain 90% of the variance in Y (the average speed of motorists on the freeway).

s.e.b = 1.5
Dividing b by s.e.b, we obtain a value for t = -5/1.5 = -3.3. Consulting a t-table, we find that the coefficient is statistically significant. This means that the independent variable X (number of patrol cars deployed) should be kept in the regression equation, since it has a statistically significant relationship with the dependent variable Y (average speed in mph).

F = 8.4
From the F-table, we see that the regression equation as a whole is statistically significant. This means that the regression equation is helping us to understand the relationship between X and Y.
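The three measures can be computed from a data set using their standard textbook definitions for a regression with one X. The sketch below is illustrative only: the function name assess and the six data points are hypothetical, invented for this example, and are not the data behind the values quoted above.

```python
import math


def assess(xs, ys):
    """Return r2, s.e.b, t and F for the least squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x

    # Split the total variation in Y into explained and unexplained parts.
    total = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    explained = total - unexplained

    r2 = explained / total
    # Unexplained variance uses n - 2 degrees of freedom (a and b are estimated).
    unexplained_variance = unexplained / (n - 2)
    se_b = math.sqrt(unexplained_variance / sxx)
    t = b / se_b
    # Explained variance has 1 degree of freedom when there is a single X.
    f = explained / unexplained_variance
    return {"r2": r2, "se_b": se_b, "t": t, "F": f}


# HYPOTHETICAL data: patrol cars deployed (X) and average speed in mph (Y).
patrols = [0, 2, 4, 6, 8, 10]
speeds = [86, 74, 66, 54, 46, 34]

results = assess(patrols, speeds)
print(results)

# The t ratio quoted in the text, from b = -5 and s.e.b = 1.5.
t_from_text = -5 / 1.5
print(round(t_from_text, 1), abs(t_from_text) > 2.0)
```

With a single X, F equals the square of t, which is a convenient check on the arithmetic.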

STEPS IN LINEAR REGRESSION

1. State the hypothesis.
2. State the null hypothesis.
3. Gather the data.
4. Compute the regression equation.
5. Examine tests of statistical significance and measures of association.
6. Relate statistical findings to the hypothesis. Accept or reject the null hypothesis.
7. Reject, accept or revise the original hypothesis. Make suggestions for research design and management aspects of the problem.

Example: The motor pool wants to know if it costs more to maintain cars that are driven more often.

Hypothesis: maintenance costs are affected by car mileage
Null hypothesis: there is no relationship between mileage and maintenance costs
Dependent variable: Y is the cost in dollars of yearly maintenance on a motor vehicle
Independent variable: X is the yearly mileage on the same motor vehicle

Data are gathered on each car in the motor pool, regarding number of miles driven in a given year, and maintenance costs for that year. Here is a sample of the data collected.

Car Number    Miles Driven (X)    Repair Costs (Y)
1             80,000              $1,200
2             29,000              $150
3             53,000              $650
4             13,000              $200
5             45,000              $325

The regression equation is computed as (computations not shown): Y = 50 + .03 X

For example, if X=50,000 then Y = 50 + .03 (50,000) = $1,550

a=50 or the cost of maintenance when X=0; if there is no mileage on the car, then the yearly cost of maintenance=$50
b=.03 the value that Y increases for each unit increase in X; for each extra mile driven (X), the cost of yearly maintenance increases by $.03
s.e.b = .0005; the value of b divided by s.e.b=60.0; the t-table indicates that the b coefficient of X is statistically significant (it is related to Y)
r2=.90 we can explain 90% of the variance in repair costs for different vehicles if we know the vehicle mileage for each car

Conclusion: Reject the null hypothesis of no relationship and accept the research hypothesis, that mileage affects repair costs.
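Applying the motor pool equation is simple arithmetic. The sketch below uses the coefficients reported above (a=50, b=.03, s.e.b=.0005); the function name predict_cost is an illustrative assumption. The five cars listed earlier are only a sample of the motor pool data, so fitting them alone would not be expected to reproduce these coefficients.

```python
A = 50.0       # cost of maintenance when X = 0, in dollars
B = 0.03       # added yearly cost per extra mile driven, in dollars
SE_B = 0.0005  # standard error of b


def predict_cost(miles):
    """Predicted yearly maintenance cost for a given yearly mileage."""
    return A + B * miles


cost = predict_cost(50_000)
t_ratio = B / SE_B
print(round(cost, 2))     # yearly cost predicted for 50,000 miles
print(round(t_ratio, 1))  # b divided by s.e.b
```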

ASSUMPTIONS OF LINEAR REGRESSION

In theory, there are several important assumptions that must be satisfied if linear regression is to be used. These are:

1. Both the independent (X) and the dependent (Y) variables are measured at the interval or ratio level.
2. The relationship between the independent (X) and the dependent (Y) variables is linear.
3. Errors in prediction of the value of Y are distributed in a way that approaches the normal curve.
4. Errors in prediction of the value of Y are all independent of one another.
5. The distribution of the errors in prediction of the value of Y is constant regardless of the value of X.

There are a number of advanced statistical tests that can be used to examine whether or not these assumptions are true for any given regression equation. However, these are beyond the scope of this discussion.

TIME SERIES REGRESSION

Linear regression is useful for exploring the relationship of an independent variable that marks the passage of time to a dependent variable when the relationship is linear; that is, when there is an obvious downward, or upward, trend in the data over time.

However, if the trend of the dependent variable over time is not linear, then linear regression will not capture the relationship. Linear regression fails to capture seasonal, cyclical, and counter-cyclical trends in time series data. Nor does it capture the effects of changes in direction of time series data, or changes in the rate of change over time. For time series regression, it is important to obtain a plot of the data over time and inspect it for possible non-linear trends.

There is also a problem if the values at one point in the time series are determined or strongly influenced by values at a previous time. This is called auto-correlation. This occurs when the values of the dependent variable over time are not randomly distributed.

Linear regression can be used with interrupted time series research designs. For example, say a policy is implemented to reduce the number of accidents among teenage drivers.

1. Data are gathered for at least 20 or 30 time periods (months or quarters) before the policy is implemented, and then for another 20 or 30 time periods after the policy is implemented.
2. One linear regression is performed for the accident rate data on the pre-policy time periods.
3. Another linear regression is performed for the accident rate data on the post-policy time periods.
4. There should be differences in the values of the constant, b coefficient, s.e.b, and r2 for the two equations. If there is a difference between the two equations, then the policy has had an effect.

If all the data points (both pre- and post-) had been included in the regression equation, the amount of variance explained (r2) would be quite low. This is because, if there is a change after the policy is introduced, the trend is no longer linear. Instead, there are two different linear trends, one before the policy was introduced, and another, different one after it was introduced.

In setting up the data for time series regression, the researcher must remember to number the years (or other time periods) consecutively from 1 to n. These are the values for the independent (X) variable. The value of the dependent variable is the accident rate. For example,

Independent Variable (X) - Year    Dependent Variable (Y) - Accident Rate
1                                  50,000
2                                  51,000
3                                  52,000
4                                  53,000
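The interrupted time series comparison can be sketched as follows. The pre-policy values are the four accident rates shown above; the post-policy values are hypothetical, invented for illustration, and continuing the period numbering past the policy date is an assumption. The function name fit_with_r2 is also illustrative. Four periods per side is far too few for a real analysis, which would need at least 20 or 30 periods before and after the policy.

```python
def fit_with_r2(xs, ys):
    """Return (a, b, r2) for the least squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    total = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - unexplained / total
    return a, b, r2


# Accident rates from the table above (pre-policy periods 1 to 4).
pre_rates = [50_000, 51_000, 52_000, 53_000]
# HYPOTHETICAL post-policy accident rates (periods 5 to 8).
post_rates = [52_500, 52_000, 51_500, 51_000]

# Number the time periods consecutively from 1 to n.
all_rates = pre_rates + post_rates
periods = list(range(1, len(all_rates) + 1))
split = len(pre_rates)

pre = fit_with_r2(periods[:split], pre_rates)
post = fit_with_r2(periods[split:], post_rates)
pooled = fit_with_r2(periods, all_rates)

for label, (a, b, r2) in [("pre", pre), ("post", post), ("pooled", pooled)]:
    print(f"{label}: a={a:.0f} b={b:.0f} r2={r2:.2f}")
```

In this made-up case each segment is fitted well by its own line, while the pooled equation explains little of the variance. That is the pattern to expect when the trend changes after the policy is introduced.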
