
Hasan Abuelreich

912373854
Section: A01
STA137 Take Home Final
Viewing the Data
First, in order to get a clear picture of our data, we plot it before
making any modifications.

It is evident that this data has trend and seasonality, meaning there
are patterns in the series that we need to remove before we can make a
proper forecast.
We will use differencing, a method that removes these patterns. Our
goal is to make the data stationary, meaning it has no trend or
seasonality; this will allow us to make a more accurate forecast based
on the residuals.
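
These steps mirror the R appendix: a seasonal difference at lag 12 removes
the yearly pattern, and a further first difference removes the remaining
trend.

library(TSA)   # the co2 data set used here comes from the TSA package
data(co2)
plot(co2, main = 'CO2 levels')               # raw series: clear trend and seasonality
y = diff(co2, lag = 12)                      # seasonal difference at lag 12
z = diff(y)                                  # first difference of the result
plot(z, ylab = 'CO2', main = 'CO2 levels')   # twice-differenced series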


The plot above shows the data after the differencing has been applied.
The series has been differenced twice because after the first
(seasonal) difference it did not appear stationary and still showed a
pattern of trend and seasonality. It is clear that the data no longer
follows an exponential pattern as it did before. However, to be sure
the data is stationary, we run a KPSS test, whose null hypothesis is
stationarity, so a high p-value supports that the data is stationary.
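
The test is run on the twice-differenced series; kpss.test comes from the
tseries package.

library(tseries)
kpss.test(z)   # null hypothesis: the series is level stationary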
KPSS Test for Level Stationarity
data: z
KPSS Level = 0.0357, Truncation lag parameter = 2, p-value = 0.1

The p-value is considered large in this case, which further assures us
that the data we are now dealing with is stationary. Now that the data
is stationary, we view the ACF and PACF plots, which aid us in choosing
a model. We choose a model based on these plots to correct any
autocorrelation remaining in the differenced series.
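
As in the R appendix, the plots are drawn out to lag 60 so that several
seasonal cycles are visible:

acf(z, main = 'CO2 Series', lag.max = 60)    # sample autocorrelations
pacf(z, main = 'CO2 Series', lag.max = 60)   # sample partial autocorrelations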


According to our plot, the ACF cuts off after one lag, where we count
every 12th lag as one seasonal lag because of the seasonal component.
The seasonal period is 12 because this is monthly data and there are
12 months in every year; this allows us to account for seasonality.
The PACF, on the other hand, gradually approaches zero after lag 1.
Hence, it seems that this is an MA(1) model, and we will fit an MA(1)
model. However, to be certain that there isn't a better model, we will
also use the auto.arima function in R, which searches for an optimal
model while taking the seasonal component into consideration. It turns out

both models are MA(1), so we look at the AIC, which is a criterion for
goodness of fit; usually, the lower the AIC, the better the fit. The
following results are obtained from our models:
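
These are the two fits from the R appendix: a plain MA(1) fit with arima,
and the seasonal model chosen by auto.arima from the forecast package.

fit = arima(z, order = c(0, 0, 1))   # model 1: MA(1) with intercept
auto_arima_fit = auto.arima(z)       # model 2: seasonal model selected automatically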
1.
Call:
arima(x = z, order = c(0, 0, 1))

Coefficients:
          ma1  intercept
      -0.5707     0.0015
s.e.   0.0671     0.0382

sigma^2 estimated as 0.9227:  log likelihood = -164.26,  aic = 332.52

2.
Series: z
ARIMA(0,0,1)(0,0,1)[12] with zero mean

Coefficients:
          ma1     sma1
      -0.5791  -0.8205
s.e.   0.0791   0.1137

sigma^2 estimated as 0.5448:  log likelihood = -139.55
AIC=285.1   AICc=285.3   BIC=293.43

It appears from the above results that the AIC of our second model,
the seasonal model fitted using auto.arima, is lower than the AIC of
the plain MA(1) model. Hence, we will use the second model for our
forecasting. We now have a model we can use to forecast; however, it
is important to first check that the residuals are normal, because the
forecast intervals rely on normality. As a visual check, we can look
at the residuals on a normal Q-Q plot, where the residuals should
mainly follow the Q-Q line if they are normal.
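
The Q-Q plot is drawn from the residuals of the chosen model, as in the
R appendix:

qqnorm(auto_arima_fit$resid)   # residual quantiles against normal quantiles
qqline(auto_arima_fit$resid)   # reference line a normal sample would follow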


According to our plot, few points deviate from the line, so we may
assume that the residuals follow a normal distribution. To be more
certain, we run a Shapiro-Wilk test, where a high p-value would
further support that the residuals follow a normal distribution. The
following results are obtained:
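
The test is applied to the same residuals (as in the R appendix):

shapiro.test(auto_arima_fit$resid)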
Shapiro-Wilk normality test
data: auto_arima_fit$resid
W = 0.9821, p-value = 0.1135

The p-value obtained is considered high, which further supports that
the residuals are normal.
Now that the appropriate model has been selected and normality checks
have been run, we may proceed to the next step, which is to make a
forecast from the fitted model. The following forecast is obtained for
our model:
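
The forecast is produced with the forecast function from the forecast
package, a lightly simplified version of the call in the R appendix:

fc = forecast(auto_arima_fit, h = 12)   # forecast the next 12 months
plot(fc)                                # point forecasts with shaded intervals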


The plot above displays a forecast of the CO2 levels in 2005. However,
to get a better understanding of the forecast, we should also construct
prediction intervals. The intervals that we will look at are 90% and
95% intervals, meaning we are 90% and 95% confident, respectively, that
the true value lies within the interval. The intervals, along with the
point forecast, which is a single best estimate of each value, are
displayed below.
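
Note that the appendix call uses level = c(0, .95); to match the 90% and
95% intervals described here, the levels would be set as follows:

fc = forecast(auto_arima_fit, h = 12, level = c(90, 95))
fc   # prints the point forecasts with 90% and 95% lower and upper bounds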

It can be seen from the point forecasts and intervals that the forecast
carbon dioxide levels fluctuate from month to month.

However, one should note that in January, typically the coldest month
of the year, the forecast values are negative and lower, while in July,
usually the hottest month of the year with more vehicles on the road,
the values are relatively high. This may be just a coincidence, but it
is worth noting. In conclusion, the plot and values above are forecasts
of the CO2 levels for the year 2005 based on the given data.


R appendix:
library(TSA)        # provides the co2 data set
library(tseries)    # provides kpss.test
library(forecast)   # provides auto.arima and forecast

data(co2)
plot(co2, main = 'CO2 levels')

x = co2
y = diff(x, lag = 12)   # seasonal difference (period 12)
plot(y, main = 'CO2 levels', ylab = 'CO2')
z = diff(y)             # first difference of the seasonally differenced series
plot(z, ylab = 'CO2', main = 'CO2 levels')

# seems stationary; run kpss.test to make sure
kpss.test(z)

acf(z, main = 'CO2 Series', lag.max = 60)
pacf(z, main = 'CO2 Series', lag.max = 60)
# hence it appears to be a seasonal MA(1)

# fit an MA(1) model
fit = arima(z, order = c(0, 0, 1))
summary(fit)
auto_arima_fit = auto.arima(z)   # let auto.arima pick a seasonal model
auto_arima_fit

# normality check: Q-Q plot and shapiro.test()
qqnorm(auto_arima_fit$resid)
qqline(auto_arima_fit$resid)

shapiro.test(auto_arima_fit$resid)

# forecasting the next 12 months with 90% and 95% intervals
fc = forecast(auto_arima_fit, h = 12, level = c(90, 95))
fc
plot(fc)
fitted(auto_arima_fit)   # in-sample fitted values

# let auto.arima do the differencing on the original series
fit2 = auto.arima(x)
fc = forecast(fit2, 12)
plot(fc)

