
Chapter 13 Time Series Forecasting

Topics to be covered in this chapter:


Time Series Plot
Trend Analysis
Seasonal Patterns
Decomposition
Autocorrelation
Moving Average Models
Exponential Smoothing Models

Time Series Plot


Example 13.1 of PBS discusses monthly retail sales of General Merchandise Stores beginning January 1992 and ending May 2002 (125 months). The data are provided in EG13_001.MTW. Minitab can be used to plot the monthly retail sales by selecting

Stat > Time Series > Time Series Plot

from the menu. This command plots measurement data on the y-axis versus time data on the x-axis. Minitab assumes that the y-axis data occurred in the order that the values appear in the column, in equally spaced time intervals. If the data did not occur that way, you may want to use

Graph > Plot

to plot the y-axis data versus a date/time column on the x-axis. In the dialog box, Graph variables defines the variables to be used in each graph. Under Y enter a column containing the observations on the graph. The x-axis is automatically the time axis. The x-axis time scales can be labeled with values from a date/time column, called a date/time stamp, or with time units chosen in the dialog box: index units, calendar units, or clock units.


Click on the Annotation button to add a title to the plot. You can also change the default start time, suppress time unit scales, or tell Minitab to use a different time interval by clicking on the Options button. For each time unit axis shown in the Options sub-dialog box, check the box to show the time scale, or uncheck the box to suppress the axis. This is useful for getting rid of cluttered axes. For the retail sales data, we uncheck the box to suppress the month axis:


Since January 1992, overall sales have gradually increased and a distinct pattern repeats itself approximately every 12 months.

Trend Analysis
Example 13.2 of PBS discusses monthly retail sales of General Merchandise Stores beginning January 1992 and ending May 2002 (125 months). The pattern of increasing growth in the time series plot of the retail sales data is an example of a linear trend. Minitab can be used to estimate the linear trend. Select

Stat > Time Series > Trend Analysis

from the menu. This command fits a general trend model to time series data and provides forecasts.

You can choose among linear, quadratic, exponential, and S-curve models. To forecast future sales, check Generate forecasts and enter a number in Number of forecasts. In the Starting from origin box, enter a positive integer to specify a starting point for the forecasts. If you leave this space blank, Minitab generates forecasts from the end of the data. For the retail sales data, we enter Sales as the Variable and under Model Type, choose Linear. Minitab estimates the linear trend to be

Yt = 18736.5 + 145.53 t

where t is the number of months elapsed beginning with the first month of the time series.

Trend Analysis

Data      Sales
Length    125.000
NMissing  0

Fitted Trend Equation
Yt = 18736.5 + 145.531*t

Accuracy Measures
MAPE:  12.4750
MAD:   3756.77
MSD:   36632916
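For readers who want to check the trend output outside Minitab, the following Python sketch fits a least-squares line against t = 1, 2, ..., n and computes the three accuracy measures. It is an illustration and not Minitab's own code. The function names and the toy series are hypothetical, and we assume the usual definitions: MAPE is the mean absolute percentage error, MAD the mean absolute deviation, and MSD the mean squared deviation, each averaged over the number of observations.

```python
def fit_linear_trend(y):
    """Least-squares fit of y_t = b0 + b1 * t, with t = 1, 2, ..., n."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


def accuracy_measures(y, fitted):
    """Return (MAPE, MAD, MSD) for observed values y and fitted values."""
    errors = [yi - fi for yi, fi in zip(y, fitted)]
    # Percentage errors are only defined where the observation is non-zero.
    pct = [abs(e / yi) for e, yi in zip(errors, y) if yi != 0]
    mape = 100 * sum(pct) / len(pct)
    mad = sum(abs(e) for e in errors) / len(errors)
    msd = sum(e ** 2 for e in errors) / len(errors)
    return mape, mad, msd


# Hypothetical toy series, used only to exercise the functions.
toy_sales = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_linear_trend(toy_sales)

# Forecast for month t = 126 from the fitted equation reported by Minitab.
forecast_126 = 18736.5 + 145.531 * 126
```

Applied to the Sales column, fit_linear_trend should reproduce the intercept and slope in the fitted trend equation, up to rounding.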

The trend analysis includes a graphic display plot as shown below. To turn the graphics display on (or off), click on the Results button in the trend analysis dialog box.

We can also use regression techniques to fit a linear model to the above data. This gives additional output, such as the R² value for the model. First, select

Calc > Make Patterned Data > Simple Set of Numbers

from the menu to create a variable x (or t), where x is the number of months elapsed, beginning with the first month of the time series. That is, x = 1 corresponds to January 1992, x = 2 corresponds to February 1992, etc. Next, select


Stat > Regression > Regression

from the menu. Enter Sales as the response variable and x as the predictor variable and click OK.

Regression Analysis: Sales versus x

The regression equation is
Sales = 18736 + 146 x

Predictor     Coef  SE Coef      T      P
Constant     18736     1098  17.06  0.000
x           145.53    15.12   9.62  0.000

S = 6102    R-Sq = 42.9%    R-Sq(adj) = 42.5%

Analysis of Variance
Source           DF          SS          MS      F      P
Regression        1  3446933715  3446933715  92.59  0.000
Residual Error  123  4579114499    37228573
Total           124  8026048215
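The summary statistics in this output can be recovered from the Analysis of Variance table. The short Python sketch below redoes that arithmetic with the sums of squares and degrees of freedom printed above. The variable names are ours, chosen for illustration.

```python
# Values copied from the Analysis of Variance table above.
ss_regression = 3446933715
ss_error = 4579114499
ss_total = 8026048215
df_regression, df_error, df_total = 1, 123, 124

# R-Sq is the share of total variation explained by the regression.
r_sq = ss_regression / ss_total

# R-Sq(adj) compares mean squares instead of sums of squares.
ms_error = ss_error / df_error
r_sq_adj = 1 - ms_error / (ss_total / df_total)

# The F statistic is the regression mean square over the error mean square.
ms_regression = ss_regression / df_regression
f_stat = ms_regression / ms_error

print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%, F = {f_stat:.2f}")
```

The printed values agree with the 42.9%, 42.5%, and 92.59 reported by Minitab.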

The trend-only model ignores the seasonal variation in the retail sales time series. Notice that the R² value for the trend-only model is 42.9% or 0.429.

Case 13.1 of PBS discusses the sales of DVD players since the introduction of the DVD format in March 1997. At the end of June 2002, nearly 33 million DVD players had been sold in the U.S., with over 18,000 titles available in the DVD format. The Consumer Electronics Association tracks monthly sales of DVD players. The data are provided in CA13_001.MTW. Select Stat > Time Series > Time Series Plot from the menu to plot the DVD sales data.

The pattern of increasing growth in this plot is an example of an exponential trend. Minitab can be used to estimate the exponential trend. Select Stat > Time Series > Trend Analysis from the menu. In the dialog box, enter Sales as the Variable. Under Model Type, choose Exponential growth. As shown in the trend analysis, Minitab estimates the exponential trend to be

Yt = 29523.6(1.07137^t)


where t is the number of months elapsed, beginning with the first month of the time series. Since 1.07137 is equal to e^0.06894, this is equivalent to the model given in Case 13.1 of PBS.

Trend Analysis

Data      Sales
Length    63.0000
NMissing  0

Fitted Trend Equation
Yt = 29523.6*(1.07137**t)

Accuracy Measures
MAPE:  40.6388
MAD:   224482
MSD:   129023909665

The graphic display plot illustrates the exponential growth together with the raw data.
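The equivalence between the two forms of the exponential trend can be checked numerically. The Python sketch below uses the coefficients reported by Minitab. The function names are ours and purely illustrative.

```python
import math

A = 29523.6         # multiplier in the fitted trend equation
GROWTH = 1.07137    # monthly growth factor in the fitted trend equation

# The continuous growth rate is the natural log of the growth factor.
rate = math.log(GROWTH)


def trend_power_form(t):
    """Yt = A * GROWTH**t, the form printed by Minitab."""
    return A * GROWTH ** t


def trend_exp_form(t):
    """Yt = A * e**(rate * t), the form with base e."""
    return A * math.exp(rate * t)


print(round(rate, 5))
```

Both functions return the same fitted value for any t, apart from floating-point rounding, and the rate rounds to 0.06894.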

Seasonal Patterns
A trend equation may be a good description of the long-run behavior of the data, but we need to account for short-run phenomena like seasonal variation to improve the accuracy of our forecasts. As in Example 13.3 of PBS, we can use indicator variables to add the seasonal pattern to the trend model for the monthly retail sales data. First, select Calc > Make Patterned Data > Simple Set of Numbers from the menu to create a new variable Month that takes the value 1 for each January, 2 for each February, ..., and 12 for each December in the data set. Next, select

Calc > Make Indicator Variables

from the menu. In the dialog box, enter Month under Indicator variables for, and specify 12 new columns in the Store results in text box. This will create 12 indicator variables, one for each month. Name the indicator variables X1, X2, ..., X12. Select Stat > Regression > Regression from the menu and enter Sales as the response variable. Enter x and all 12 indicator variables as the predictor variables and click OK. (Recall that we defined the variable x in the previous section as the number of months elapsed beginning with the first month of the time series.) We get the following output from Minitab:
Regression Analysis: Sales versus x, X1, ...

* X12 is highly correlated with other X variables
* X12 has been removed from the equation

The regression equation is
Sales = 37473 + 140 x - 24276 X1 - 23749 X2 - 20271 X3 - 20250 X4 - 18518 X5
        - 19575 X6 - 20324 X7 - 18627 X8 - 20878 X9 - 18933 X10 - 13842 X11

Predictor       Coef  SE Coef       T      P
Constant     37472.7    343.4  109.14  0.000
x            140.130    2.393   58.57  0.000
X1          -24276.2    421.4  -57.61  0.000
X2          -23748.7    421.4  -56.36  0.000
X3          -20271.2    421.3  -48.11  0.000
X4          -20250.5    421.3  -48.07  0.000
X5          -18518.0    421.3  -43.96  0.000
X6          -19574.5    431.4  -45.37  0.000
X7          -20323.6    431.3  -47.12  0.000
X8          -18626.9    431.3  -43.19  0.000
X9          -20878.1    431.2  -48.42  0.000
X10         -18933.0    431.2  -43.91  0.000
X11         -13842.2    431.2  -32.10  0.000

S = 964.1    R-Sq = 98.7%    R-Sq(adj) = 98.6%

Analysis of Variance
Source           DF          SS         MS       F      P
Regression       12  7921942577  660161881  710.22  0.000
Residual Error  112   104105638     929515
Total           124  8026048215

Minitab automatically removes the last indicator variable from the equation because it is highly correlated with the first 11 indicator variables. In fact, X12 always equals 1 minus the sum of X1 through X11, so it carries no additional information once the constant is in the model. The R² value for the trend-and-season model is 98.7%, which is a dramatic improvement over the trend-only model. Recall that the R² value for the trend-only model was 42.9%.
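To see how the fitted trend-and-season equation produces a forecast, and why the twelfth indicator is redundant, consider the Python sketch below. The coefficients are copied from the Minitab output above. The function names are hypothetical, and the month lookup assumes that t = 1 is January 1992, as stated earlier in this chapter.

```python
INTERCEPT = 37472.7
SLOPE = 140.130

# Coefficients of X1 through X11 from the regression output.
MONTH_EFFECT = {
    1: -24276.2, 2: -23748.7, 3: -20271.2, 4: -20250.5,
    5: -18518.0, 6: -19574.5, 7: -20323.6, 8: -18626.9,
    9: -20878.1, 10: -18933.0, 11: -13842.2,
    12: 0.0,  # December is the baseline because X12 was removed
}


def month_of(t):
    """Calendar month (1-12) for time index t, where t = 1 is a January."""
    return (t - 1) % 12 + 1


def indicator_row(month):
    """The 12 indicator values X1..X12 for one observation."""
    return [1 if month == k else 0 for k in range(1, 13)]


def predict_sales(t):
    """Fitted trend-and-season value for time index t."""
    return INTERCEPT + SLOPE * t + MONTH_EFFECT[month_of(t)]


# The 12 indicators always sum to 1, which duplicates the constant term.
assert all(sum(indicator_row(m)) == 1 for m in range(1, 13))

# Forecast for June 2002 (t = 126), one month past the end of the data.
june_2002 = predict_sales(126)
```

Because the indicators sum to 1 in every row, the intercept 37472.7 is the December level of the trend line, and each month coefficient measures the difference from December.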

Decomposition
Another approach to accounting for seasonal variation is to calculate an adjustment factor for each season. The trend is adjusted for each particular season by multiplying it by the appropriate seasonality factor. Using seasonality factors views the model as a trend component times a seasonal component:

Y = TREND × SEASON

Example 13.4 of PBS calculates the seasonality factors for the retail sales data. Minitab can be used to calculate seasonality factors by selecting

Stat > Time Series > Decomposition

from the menu. You can use decomposition to separate the time series into linear trend and seasonal components. You can choose whether the seasonal component is additive or multiplicative with the trend.


In the dialog box, enter the column containing the time series under Variable, and a positive integer as the Seasonal length. Since the retail sales data is monthly data, we use a seasonal length of 12. Under Model Type choose Multiplicative, and under Model Components choose Seasonal only. By default the first observation is in seasonal period one because Minitab assumes that the first data value in the series corresponds to the first seasonal period. Enter a different number to specify a different starting value. Check Generate forecasts if you want to generate forecasts.

Minitab calculates the following seasonality factors for the retail sales data:
Time Series Decomposition

Data      Sales(NSA)
Length    125.000
NMissing  0

Seasonal Indices
Period  Index
1       0.781569
2       0.805703
3       0.919066
4       0.930090
5       0.988080
6       0.958632
7       0.931024
8       0.984559
9       0.910419
10      0.982894
11      1.15661
12      1.65135

These seasonal indices (or seasonality factors) differ slightly from the indices in Example 13.4 of PBS. This is because in Minitab the data is smoothed before the seasonal indices are found.
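The multiplicative model Y = TREND × SEASON can be sketched in Python as follows. The seasonal indices are the ones Minitab reports above. Pairing them with the linear trend equation from the Trend Analysis section is our assumption for illustration, in the spirit of Example 13.4 of PBS, and the function names are hypothetical.

```python
# Seasonal indices for periods 1 (January) through 12 (December).
SEASONAL_INDEX = [
    0.781569, 0.805703, 0.919066, 0.930090, 0.988080, 0.958632,
    0.931024, 0.984559, 0.910419, 0.982894, 1.15661, 1.65135,
]

# Multiplicative indices are scaled so that they average to about one.
mean_index = sum(SEASONAL_INDEX) / len(SEASONAL_INDEX)


def linear_trend(t):
    """Linear trend from the Trend Analysis section (t = 1 is January 1992)."""
    return 18736.5 + 145.531 * t


def seasonal_forecast(t):
    """Trend value for time t multiplied by that month's seasonal index."""
    period = (t - 1) % 12  # 0 for January, ..., 11 for December
    return linear_trend(t) * SEASONAL_INDEX[period]


def seasonally_adjust(value, t):
    """Remove the seasonal effect from an observed value at time t."""
    return value / SEASONAL_INDEX[(t - 1) % 12]


june_2002 = seasonal_forecast(126)
```

An index below one, such as January's 0.781569, pulls the forecast below the trend line, while December's 1.65135 raises it well above the trend.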


Autocorrelation
The residuals from a regression model that uses time as an explanatory variable should be examined for signs of autocorrelation. Examples 13.7 and 13.8 of PBS examine the residuals that result from fitting an exponential trend to the DVD player sales data. The data are provided in CA13_001.MTW. First, select Calc > Calculator from the menu and calculate the natural logarithm of Sales, log_e(Sales). Name this column lnSales. Next, select Stat > Regression > Regression from the menu and regress lnSales on the predictor variable x, where x is the number of months elapsed beginning with the first month of the time series. In the dialog box, click on the Storage button and check Residuals to obtain the residuals from the exponential trend model. Finally, select Graph > Plot from the menu and plot the residuals versus x, i.e., in time order.

The pattern in the plot indicates positive autocorrelation among the residuals. An alternative plot for detecting autocorrelation is a lagged residual plot. Select

Stat > Time Series > Lag

from the menu. In the dialog box, enter the column containing the variable that you want to lag under Series. Under Store lags in, select the storage column for the lags and then specify the value for the lag. To lag the residuals of the DVD Sales data, specify a lag of one and name the output column lag_RES.


Since the lag selected is one, Minitab moves the row elements of a column down one row. There will be one missing value symbol (*) at the top of the output column. The output column has the same number of rows as the input column, so the last value of the input column does not appear in the output column.
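The lag operation just described can be sketched in a few lines of Python. This is an illustration of the row shift and not Minitab's implementation. The function name is ours, and None stands in for Minitab's missing value symbol (*).

```python
def lag(values, k=1):
    """Shift values down k rows, padding the top with None.

    The result has the same length as the input, so the last k
    values of the input do not appear in the output.
    """
    values = list(values)
    n = len(values)
    k = min(k, n)
    return [None] * k + values[:n - k]


# Hypothetical residuals, used only to show the shift.
residuals = [0.4, 0.1, -0.3, -0.2]
lag_res = lag(residuals, 1)

# Pairs (lagged residual, residual) for a lagged residual plot.
# The first row is skipped because its lagged value is missing.
pairs = [(x, y) for x, y in zip(lag_res, residuals) if x is not None]
```

Here lag_res is [None, 0.4, 0.1, -0.3], and the three remaining pairs are the points of the lagged residual plot.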

Next, select Graph > Plot from the menu and plot the residuals versus the lagged residuals. The resulting graph shows a linear pattern with positive slope. This indicates that the residuals may have positive autocorrelation.


Minitab can be used to calculate the autocorrelation of the residuals. Select

Stat > Time Series > Autocorrelation

from the menu. In the dialog box, enter the column containing the residuals under Series. Specify one as the number of lags to use instead of the default and check Nongraphical ACF.

Minitab calculates the autocorrelation to be 0.614.


Autocorrelation Function: RESI1

ACF of RESI1

      -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0
        +----+----+----+----+----+----+----+----+----+----+
0.614                            XXXXXXXXXXXXXXXX
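A lag-k autocorrelation can also be computed by hand. The Python sketch below uses the standard sample autocorrelation formula, in which the lagged cross-products of deviations from the mean are divided by the total sum of squared deviations. We assume, but have not verified against Minitab's internals, that this is the quantity reported in the output above. The function name and the toy series are hypothetical.

```python
def autocorrelation(x, k=1):
    """Sample autocorrelation of the sequence x at lag k."""
    n = len(x)
    mean = sum(x) / n
    # Total sum of squared deviations from the mean.
    denom = sum((v - mean) ** 2 for v in x)
    # Cross-products of deviations that are k steps apart.
    numer = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return numer / denom


# A steadily rising toy series gives a positive lag-1 autocorrelation,
# while an alternating series gives a negative one.
rising = autocorrelation([1, 2, 3, 4, 5])
alternating = autocorrelation([1, -1, 1, -1])
```

Applied to the stored residual column RESI1 with k = 1, this calculation corresponds to the single value shown in the Minitab output.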


Moving Average Models


Moving average models use the average of the last several values of the time series to forecast the next value. Example 13.12 of PBS calculates moving averages for quarterly JCPenney sales data with the moving average forecast model based on a span of k = 4. The data are available in TA13_001.MTW. Select

Stat > Time Series > Moving Average

from the menu. In the dialog box, enter the column containing the sales data under Variable. In the Moving Average (MA) length box, enter the span k = 4. Check Generate forecasts and enter one for Number of forecasts.

Minitab calculates the forecasted value for the first quarter of 2002 and generates a time series plot.
Moving average

Data      Sales
Length    24.0000
NMissing  0

Moving Average
Length:  4

Accuracy Measures
MAPE:  10
MAD:   809
MSD:   1072724

Row  Period  Forecast    Lower    Upper
1    25      8001      5970.98  10031.0

[Figure: Moving Average plot of Sales versus Time, showing the Actual, Predicted, and Forecast series. Moving Average Length: 4; MAPE: 10; MAD: 809; MSD: 1072724.]
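The moving average forecast itself is simple arithmetic: the forecast for the next period is the average of the last k observations. The Python sketch below illustrates this with a hypothetical toy series. The function names are ours, and this is not Minitab's code.

```python
def moving_average_forecast(y, k):
    """Forecast the next value as the mean of the last k observations."""
    if k <= 0 or len(y) < k:
        raise ValueError("need at least k observations and k > 0")
    return sum(y[-k:]) / k


def moving_average_fits(y, k):
    """One-period-ahead forecasts for each period of the series.

    The forecast for position t uses the k observations before it, so
    the first k positions have no forecast and are set to None.
    """
    fits = [None] * min(k, len(y))
    for t in range(k, len(y)):
        fits.append(sum(y[t - k:t]) / k)
    return fits


# Hypothetical quarterly values, used only to exercise the functions.
toy = [10, 20, 30, 40, 50]
next_value = moving_average_forecast(toy, 4)   # mean of 20, 30, 40, 50
past_fits = moving_average_fits(toy, 4)
```

With the 24 quarterly JCPenney values and k = 4, moving_average_forecast would give the forecast for period 25, the first quarter of 2002.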

Exponential Smoothing Models


Example 13.15 of PBS uses an exponential smoothing model to forecast monthly returns on Philip Morris stock. The data are available in TA01_010.MTW. In this example, we use the smoothing constant w = 0.5. Select

Stat > Time Series > Single Exp Smoothing

from the menu. In the dialog box, enter the column containing the time series under Variable. Under Weight to Use in Smoothing, choose Use to enter a specific weight, then type 0.5 in the text box.


With the Use option, Minitab uses the average of the first six observations for the initial smoothed value by default. You can change this default by specifying a different value in the Single Exponential Smoothing - Options sub-dialog box.

Minitab forecasts a 0.1939% return for Philip Morris stock in August 2001.
Single Exponential Smoothing

** Note ** Zero values of Yt exist; MAPE calculated only for non-zero Yt

Data      Percent retu
Length    134.000
NMissing  0

Smoothing Constant
Alpha:  0.5

Accuracy Measures
MAPE:  226.740
MAD:   7.288
MSD:   81.604

Row  Period  Forecast     Lower    Upper
1    135     0.193989  -17.6624  18.0504

[Figure: Single Exponential Smoothing plot of Percent retu versus Time, showing the Actual, Predicted, and Forecast series. Smoothing Constant Alpha: 0.500; MAPE: 226.740; MAD: 7.288; MSD: 81.604.]

To calculate all past forecasts as shown in Example 13.15 of PBS, click on the Storage button and select Fits (one-period-ahead forecasts).
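The smoothing recursion behind these fits can be sketched in Python. Each new smoothed value is w times the current observation plus (1 - w) times the previous smoothed value, and the one-period-ahead forecast for a period is the smoothed value from the period before. Following the default described above, the sketch starts from the average of the first six observations. The function name and toy data are hypothetical, and details such as rounding may differ from Minitab.

```python
def single_exp_smoothing(y, w, n_init=6):
    """Return (fits, forecast) for single exponential smoothing.

    fits[t] is the one-period-ahead forecast of y[t], made before y[t]
    is seen. forecast is the prediction for the period after the data.
    """
    if not y:
        raise ValueError("series is empty")
    first = y[:n_init]
    level = sum(first) / len(first)  # initial smoothed value
    fits = []
    for obs in y:
        fits.append(level)
        # Blend the new observation with the previous smoothed value.
        level = w * obs + (1 - w) * level
    return fits, level


# Hypothetical toy series with smoothing constant w = 0.5.
fits, forecast = single_exp_smoothing([2, 4, 6, 8, 10, 12], 0.5)
```

A larger w makes the smoothed series follow recent observations more closely, while a smaller w gives a smoother series that reacts more slowly.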


EXERCISES

13.1 Table 13.1 of PBS and TA13_001.MTW contain retail sales for JCPenney in millions of dollars. The data are quarterly beginning with the first quarter of 1996 and ending with the fourth quarter of 2001.
(a) Before plotting these data, inspect the values in the table. Do you see any interesting features of JCPenney quarterly sales?
(b) Now, select Stat > Time Series > Time Series Plot from the menu to make a time plot of the data. Be sure to connect the points in your plot to highlight patterns.
(c) Is there an obvious trend in JCPenney quarterly sales? If so, is the trend positive or negative?
(d) Is there an obvious repeating pattern in the data? If so, clearly describe the repeating pattern.

13.3 In Exercise 13.1, you took a first look at the data in Table 13.1 of PBS and TA13_001.MTW. Use Minitab to further investigate the JCPenney sales data.
(a) Select Stat > Regression > Regression from the menu and find the least-squares line for the sales data. Use 1, 2, 3, ... as the values for the explanatory variable with X = 1 corresponding to the first quarter of 1996, X = 2 corresponding to the second quarter of 1996, etc.
(b) The intercept is a prediction of sales for what quarter?
(c) Interpret the slope in the context of JCPenney quarterly sales.
(d) Using the equation of the least-squares line, forecast sales for the first quarter of 2002 and for the fourth quarter of 2002.
(e) Which forecast in part (d) do you believe will be more accurate when compared to actual JCPenney sales? Why?

13.4 Table 13.2 of PBS and TA13_002.MTW display the time series of number of Macintosh computers shipped in each of eight consecutive fiscal quarters. Select Stat > Time Series > Time Series Plot from the menu to make a time plot of the data. With only eight quarters, a strong quarterly pattern is hard to detect. Select Stat > Regression > Regression from the menu and calculate the least-squares regression line for predicting the number of Macs shipped (in thousands of units). The explanatory variable Time simply takes on the values 1, 2, 3, ..., 8 in time order. Next, add indicator variables for first, second, and third quarters to the linear trend model. Call these indicator variables X1, X2, and X3, respectively. Select Stat > Regression > Regression from the menu to fit this multiple regression model.
(a) Write down the estimated trend-and-season model.
(b) Explain why no indicator variable is needed for fourth quarters.
(c) What does the ANOVA F test indicate about this model?

13.5 In Exercise 13.1, you made a time plot of the JCPenney sales data in Table 13.1 of PBS and TA13_001.MTW. Sales seem to follow a pattern of ups and downs that repeats every four quarters. Add indicator variables for first, second, and third quarters to the linear trend model. Call these indicator variables X1, X2, and X3, respectively. Select Stat > Regression > Regression from the menu to fit this multiple regression model.
(a) Write down the estimated trend-and-season model.
(b) Explain why no indicator variable is needed for fourth quarters.
(c) Does the intercept still predict sales for a specific quarter? If so, what quarter? Compare the estimated intercept of this model with that of the trend-only model. Given the pattern of seasonal variation, which appears to be the better estimate?
(d) Using the equation of the trend-and-season model, forecast sales for the first quarter of 2002 and for the fourth quarter of 2002.
(e) Compare your forecasts to the same forecasts based on the trend-only model of Exercise 13.3.

13.6 In Exercise 13.4, you fit a linear trend-only model to the Macs shipped time series in TA13_002.MTW. Starting with this trend-only model, incorporate seasonality factors for each quarter.
(a) Select Stat > Time Series > Decomposition from the menu to calculate the seasonality factor for each quarter. Since the data is quarterly data, use a seasonal length of 4 in the dialog box.
(b) Average the four seasonality factors. Is this average close to one? If so, interpret the seasonality factor for first quarters.
(c) Select Graph > Plot from the menu and make a scatterplot of seasonality factor versus quarter. Connect the points to see the general pattern of seasonal variation. Also, draw a horizontal line at the average of the four seasonality factors.

13.7 In Exercise 13.3, you fit a linear trend-only model to the JCPenney sales data. Starting with this trend model, we want to incorporate seasonality factors to account for the pattern that repeats every four quarters.
(a) Select Stat > Time Series > Decomposition from the menu to calculate the seasonality factor for each quarter. Since the data is quarterly data, use a seasonal length of 4 in the dialog box.
(b) Average the four seasonality factors. Is this average close to one? If so, interpret the seasonality factor for fourth quarters.
(c) Select Graph > Plot from the menu and make a scatterplot of seasonality factor versus quarter. Connect the points to see the general pattern of seasonal variation. Also, draw a horizontal line at the average of the four seasonality factors.
(d) Using the linear trend-only model and the seasonality factors, forecast sales for the first quarter of 2002 and for the fourth quarter of 2002.
(e) Compare your forecasts to the same forecasts based on the trend-only model of Exercise 13.3.
(f) Compare your forecasts to the same forecasts based on the trend-and-season model of Exercise 13.5.

13.16 Select Stat > Time Series > Decomposition from the menu to calculate the seasonality factor for each quarter for the JCPenney sales data. (Since the data is quarterly data, use a seasonal length of four.)
(a) In the dialog box, click on the Storage button and check Seasonally adjusted data to calculate the seasonally adjusted JCPenney sales time series.
(b) Select Stat > Time Series > Time Series Plot from the menu to make a time plot of the original sales data with the seasonally-adjusted sales data superimposed. In the dialog box, fill in two rows as the Graph variables: the original time series and the seasonally adjusted time series. Click on the Frame button and choose Multiple Graphs. In the sub-dialog box, choose Overlay graphs on the same page and click OK.
(c) Did seasonally-adjusting JCPenney's sales data smooth the time series to the degree that seasonally-adjusting the sales data in Figure 13.7 of PBS did? What does this imply about the strength of the seasonal pattern in these two time series?

13.17 In Exercise 13.3, a linear trend-only model was fit to the JCPenney sales data. Using the residuals from this model, look for evidence of autocorrelation.
(a) Select Stat > Time Series > Time Series Plot from the menu and make a time plot of the residuals. Describe any pattern you see in this plot.
(b) Select Stat > Time Series > Lag from the menu and calculate the lagged residuals. Select Graph > Plot from the menu and plot the residuals versus the lagged residuals. Select Stat > Time Series > Autocorrelation from the menu and calculate the correlation between successive residuals. Do we have evidence of autocorrelation?

13.18 In Exercise 13.5, a trend-and-season model was fit to the JCPenney sales data. Using the residuals from this model, look for evidence of autocorrelation.
(a) Select Stat > Time Series > Time Series Plot from the menu and make a time plot of the residuals. Describe any pattern you see in this plot.
(b) Select Stat > Time Series > Lag from the menu and calculate the lagged residuals. Select Graph > Plot from the menu and plot the residuals versus the lagged residuals. Select Stat > Time Series > Autocorrelation from the menu and calculate the correlation between successive residuals. Do we have evidence of autocorrelation?

13.22 The United States Department of Agriculture (USDA) tracks prices received by Montana farmers for winter wheat crops. The prices are tracked monthly in dollars per bushel. EX13_022.MTW has the wheat prices time series beginning in July 1929 and ending with October 2002 (880 months). Use Minitab to analyze this time series.
(a) Select Stat > Time Series > Time Series Plot from the menu and make a time plot of the wheat prices time series.
(b) Describe any important features of the time series. Be sure to comment on trend, seasonal patterns, and significant shifts in the series.
(c) Select Stat > Time Series > Moving Average from the menu to calculate 12-month moving averages and generate a time series plot. In the dialog box, enter the Moving Average (MA) length = 12.
(d) Select Stat > Time Series > Moving Average from the menu to calculate 120-month moving averages and generate a time series plot. In the dialog box, enter the Moving Average (MA) length = 120.
(e) Compare the 12-month and 120-month moving averages. Which features of the wheat prices time series does each capture? Which features does each smooth?

13.29 Example 1.7 of PBS looked at the trend and seasonal variation in the average monthly price of oranges. Figure 1.7 of PBS is a time series plot of the data. The data is found in FG01_007.MTW.
(a) Select Stat > Time Series > Single Exp Smoothing from the menu to calculate exponential smoothing models using smoothing constants of w = 0.1, 0.5, and 0.9.
(b) Comment on the smoothness of each exponential smoothing model in part (a). Which model would be best for forecasting monthly ups and downs in orange prices?
(c) Select Stat > Time Series > Single Exp Smoothing from the menu to calculate and compare forecasts for January 2001 orange prices for each of the models in part (a). Which model provided the most accurate forecast? (The actual value of the orange prices time series for January 2001 is 224.2.)
(d) Update your data by appending the January 2001 observed value of 224.2. Now select Stat > Time Series > Single Exp Smoothing from the menu to forecast the February 2001 orange price with each of the models from part (a). Which model provided the most accurate forecast? (The actual value of the orange prices time series for February 2001 is 229.6.)
