Our earlier look at estimating a demand function demonstrated how multiple regression
could be used to estimate the demand for gasoline as a function of various predictors,
including its price. The simple model described at the end of the case was based on the
price index of gasoline (logPG), per capita income (logI) and Year² (YRSQ):
Regression Analysis
Analysis of Variance
Source          DF        SS        MS       F      P
Regression       3  0.145410  0.048470  332.44  0.000
Residual Error  32  0.004666  0.000146
Total           35  0.150076
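As a quick check on the table above, the mean squares, the F statistic and R² can be recovered from the sums of squares and degrees of freedom. This is only an arithmetic sketch; the variable names are illustrative, and small differences from the printed F arise because the table entries are rounded.

```python
# Values copied from the Analysis of Variance table above.
ss_regression, df_regression = 0.145410, 3
ss_error, df_error = 0.004666, 32
ss_total = 0.150076

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# Overall F statistic and the proportion of variability accounted for.
f_statistic = ms_regression / ms_error
r_squared = ss_regression / ss_total

print(f"MS regression = {ms_regression:.6f}")
print(f"MS error      = {ms_error:.6f}")
print(f"F             = {f_statistic:.2f}")
print(f"R-squared     = {r_squared:.4f}")
```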
Although this model fits the data reasonably well, it does suffer from a difficulty —
it does not address the time ordering of the data. In fact, the residuals from this model
exhibit autocorrelation, as can be seen from this time series plot:
[Figure: time series plot of the standardized residuals SRES1 versus Index]
The Durbin–Watson statistic supports this, as it equals 0.50; so does the runs test
(although a bit more weakly):
SRES1
K = 0.0100
[Figure: autocorrelation function plot for SRES1, lags 1 to 9]
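The two diagnostics mentioned above can be sketched in a few lines. This is a minimal illustration, not the Minitab computation itself: the function names are hypothetical, the runs test shown is the usual large-sample runs test about the mean, and the residuals are simulated from an AR(1) process purely to have positively autocorrelated input (they are not the gasoline residuals).

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def runs_about_mean(residuals):
    """Return (observed runs, expected runs, z) for a runs test about the mean."""
    e = np.asarray(residuals, dtype=float)
    above = e > e.mean()
    n1 = int(above.sum())          # observations above the mean
    n2 = int((~above).sum())       # observations at or below the mean
    n = n1 + n2
    runs = 1 + int(np.sum(above[1:] != above[:-1]))
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / np.sqrt(variance)
    return runs, expected, z

# Simulated (assumed) residuals with positive autocorrelation.
rng = np.random.default_rng(0)
e = np.zeros(36)
for t in range(1, 36):
    e[t] = 0.75 * e[t - 1] + rng.normal()

print("Durbin-Watson:", durbin_watson(e))
print("Runs test (runs, expected, z):", runs_about_mean(e))
```

Values of the Durbin–Watson statistic well below 2, and fewer runs than expected, both point toward positive autocorrelation.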
As we’ve discussed, one approach for handling autocorrelation is to use a lagged version
of the target variable as a predictor (Lagged logGpc, saying that the previous year’s gas
consumption goes a long way to predicting this year’s consumption, due to basic stability
in the process). Also, in thinking about the dynamics of how people decide to use their
automobiles, it seems reasonable to consider also using a lagged version of the price index
of gasoline, Lagged logPG (saying that consumption might be affected not only by current
price, but previous price, because of the perception of people that prices are increasing
or decreasing). Generally speaking, using lagged versions of predictors is not designed to
specifically address autocorrelation (as the use of the lagged target as a predictor often is),
but is instead justified when such use makes sense in context.
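To make the construction concrete, here is a minimal sketch of building the lagged target and lagged predictor and fitting the regression by least squares. The helper name `add_lags` and the simulated series are assumptions for illustration only; the coefficients used to generate the data are arbitrary and are not estimates from the gasoline data.

```python
import numpy as np

def add_lags(y, x):
    """Align y_t with y_{t-1}, x_t and x_{t-1}; the first observation is lost."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    response = y[1:]
    design = np.column_stack([
        np.ones(len(y) - 1),  # intercept
        y[:-1],               # lagged target (previous year's demand)
        x[1:],                # current predictor (this year's price)
        x[:-1],               # lagged predictor (previous year's price)
    ])
    return response, design

# Simulated (assumed) logged price and demand series of length 36.
rng = np.random.default_rng(1)
n = 36
log_price = np.cumsum(rng.normal(0.0, 0.02, n))
log_demand = np.empty(n)
log_demand[0] = 0.85
for t in range(1, n):
    log_demand[t] = (0.1 + 0.9 * log_demand[t - 1]
                     - 0.3 * log_price[t] + 0.25 * log_price[t - 1]
                     + rng.normal(0.0, 0.005))

response, design = add_lags(log_demand, log_price)
coef, *_ = np.linalg.lstsq(design, response, rcond=None)
print("observations used:", len(response))
print("intercept, lagged demand, price, lagged price:", coef)
```

Because the first year has no previous value, a series of 36 observations leaves 35 usable cases once lags are included.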
Here is a scatter plot of logged per capita consumption on the previous year’s logged
per capita consumption. We can see that there is a strong relationship, although it is
apparently weaker for the higher values. I haven’t bothered to give the plot of logged per
capita consumption versus previous year’s price index, since it looks very similar to the
one for current year’s price index that we saw earlier.
Regression Analysis
Analysis of Variance
Source          DF        SS        MS       F      P
Regression       3  0.127284  0.042428  637.51  0.000
Residual Error  31  0.002063  0.000067
Total           34  0.129347
[Figure: time series plot of the standardized residuals SRES2 versus Index]
1 1960 0.85772 * * *
2 1961 0.85583 -2.19803 0.166530 0.241328
3 1962 0.86806 0.02270 0.174651 0.000027
4 1963 0.87579 -0.80749 0.145953 0.027857
5 1964 0.89125 0.07544 0.130924 0.000214
6 1965 0.90611 0.61353 0.113090 0.011999
7 1966 0.92591 0.88899 0.089974 0.019534
8 1967 0.93752 -0.14895 0.073089 0.000437
This year (1991) was the year of a serious recession and the first Gulf War (Operation Desert
Storm), so apparently gasoline consumption decreased during this time period. As an
outlier, we could contemplate removing this case and reanalyzing the data. Unfortunately,
if we do that, we will disturb the natural time ordering in the data. An alternative approach
is to substitute a “reasonable” value, such as the average of the two neighboring values,
for the outlying value, and then reanalyze the entire adjusted data set. This is admittedly
an ad hoc solution, and more complex (and theoretically justified) substitution methods
are possible. Still, very simple techniques like this can work quite adequately.
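The neighbor-average substitution described above might look like the following sketch. The function name and the toy series are hypothetical. Note that when a lagged version of the target is used as a predictor, the lagged column should be rebuilt from the adjusted series, since the filled-in value also serves as the next period's lagged value.

```python
import numpy as np

def fill_with_neighbor_mean(series, index):
    """Return a copy of series with series[index] replaced by the
    average of its two neighboring values."""
    s = np.array(series, dtype=float)  # copy, so the input is not modified
    if index <= 0 or index >= len(s) - 1:
        raise ValueError("the outlying value needs a neighbor on each side")
    s[index] = 0.5 * (s[index - 1] + s[index + 1])
    return s

# Toy (assumed) series with an unusually low value in position 3.
original = [1.00, 1.02, 1.03, 0.90, 1.05, 1.06]
adjusted = fill_with_neighbor_mean(original, 3)
print(adjusted)

# Rebuild the lagged target from the adjusted series before refitting.
lagged = adjusted[:-1]
current = adjusted[1:]
```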
Regression Analysis
Analysis of Variance
Source          DF        SS        MS       F      P
Regression       3  0.128926  0.042975  912.79  0.000
Residual Error  31  0.001460  0.000047
Total           34  0.130386
The model fits slightly better, but the coefficients have changed little. More importantly,
there is no autocorrelation, and no outliers are apparent:
[Figure: normal probability plot, Normal Score versus Standardized Residual]
[Figure: time series plot of the standardized residuals SRES3 versus Index]
[Figure: autocorrelation function plot of the residuals, lags 1 to 8]
SRES2
K = -0.0148
1 1960 0.85772 * * *
2 1961 0.85583 -2.55989 0.166530 0.327330
3 1962 0.86806 0.09034 0.174651 0.000432
4 1963 0.87579 -0.90839 0.145953 0.035255
5 1964 0.89125 0.13495 0.130924 0.000686
6 1965 0.90611 0.78300 0.113090 0.019544
7 1966 0.92591 1.09185 0.089974 0.029467
8 1967 0.93752 -0.15210 0.073089 0.000456
9 1968 0.96368 1.61432 0.068043 0.047567
10 1969 0.98779 1.42412 0.067586 0.036752
The residual versus fitted plot gives a slight indication of structure, but given the very
high R² here, it is unlikely that any corrective action would make much of a difference.
[Figure: plot of Standardized Residual versus Fitted Value]
This new gas demand function has an appealing intuitive justification. Given the last
two years’ prices, gasoline demand is directly related to last year’s demand (1% higher demand
last year is associated with 1.08% estimated expected increase this year). Given last year’s
demand and price, this year’s demand is inversely related to this year’s price, which is the
inverse demand / price relationship expected from economic theory (1% higher price is
associated with .33% estimated expected decrease in demand). Further, given this year’s
price and last year’s demand, this year’s demand is directly related to last year’s price
(1% higher price last year is associated with .30% estimated expected increase in demand
this year). This also makes sense, since a higher value of last year’s price, given this year’s
price is fixed, is consistent with a decreasing price trend, which would encourage additional
consumption. The standard error of the estimate implies that per capita gas demand can
be predicted to within 3% (10^0.013724 = 1.03, where 0.013724 is twice the standard error
of the estimate on the logged scale).
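The arithmetic behind these interpretations can be checked with a short sketch. It assumes, as the base-10 back-transformation in the text suggests, that the logs are common (base 10) logarithms, and that "within 3%" refers to plus or minus two standard errors of the estimate; the function name `percent_change` is illustrative. The coefficients 1.08, -0.33 and 0.30 are the values quoted in the text.

```python
import math

# Residual sum of squares and degrees of freedom from the fill-in model's
# Analysis of Variance table.
ss_error, df_error = 0.001460, 31
s = math.sqrt(ss_error / df_error)   # standard error of the estimate
two_s = 2 * s                        # rough 95% half-width on the log10 scale
factor = 10 ** two_s                 # multiplicative error on the original scale

def percent_change(elasticity, percent=1.0):
    """Percent change in demand implied by a log-log coefficient when the
    predictor is `percent` percent higher, other predictors held fixed."""
    return ((1 + percent / 100) ** elasticity - 1) * 100

print(f"2s = {two_s:.6f}, 10^(2s) = {factor:.4f}")
print(f"lagged demand (1.08): {percent_change(1.08):+.2f}%")
print(f"current price (-0.33): {percent_change(-0.33):+.2f}%")
print(f"lagged price (0.30):  {percent_change(0.30):+.2f}%")
```

For small changes the implied percent change is very close to the coefficient itself, which is why the coefficients can be read directly as elasticities.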
The fill–in method for handling an outlier used here has two limitations that are
worth noting. First, adjusting the target (y) value will not fix leverage points, since they
are characterized by unusual predictor values, not unusual target values. Second, unusual
observations often occur in “patches” in time series data, reflecting a temporary change in
the underlying structure of the process; a constant fill–in for four or five (say) consecutive
time periods is obviously not accurately reflecting what we think the series really should
be.
Regression Analysis
Analysis of Variance
Source          DF        SS        MS       F      P
Regression       4  0.128105  0.032026  773.86  0.000
Residual Error  30  0.001242  0.000041
Total           34  0.129347
The fitted coefficients are virtually the same as when the fill–in method is used. One
additional piece of information from this approach is the coefficient for Year1991: given
previous year’s gasoline consumption, and this and last year’s gasoline price index, the
observed logged per capita consumption for 1991 is seen to have been .0298 lower than