Professional Documents
Culture Documents
Time Series
My Introduction
Name: Bharani Kumar
Educa+on: IIT Hyderabad
Indian School of Business
Professional cer+fica+ons:
PMP Project Management Professional
4 DATA SCIENTIST
RESEARCH in
ANALYTICS, DEEP
LEARNING & IOT
3
2 DeloiHe
Driven using US policies
1 Infosys
Driven using Indian policies under Large enterprises
ITC Infotech
Driven using Indian policies SME
HSBC
Driven using UK policies
01 Cross-sec4onal Data
02
Time Series Data
02 Data Collec4on
04 Pre-Process Data
05 Par44on Series
Data
Collec-on
#3 Series Granularity? #4. Domain exper4se
• Coverage of the data – • Necessary informaHon source
Geographical, populaHon, Hme,… • Affects modeling process from start
• Should be aligned with goal to end
• Level of communicaHon/
coordinaHon between forecasters &
domain experts
Response
Summer
Response
Mo., Qtr.
1000
Power usage (KilowaHs)
900
800
700
600
500
400
1980 1982 1984 1986 1988 1990
Year
1000
800
600
400
200
0
0 2 4 6 8 10 12 14
Age of bus
= ObservaHon in Hme period t
= ObservaHon in Hme period t – k
= Mean of the values of the series
= AutocorrelaHon coefficient for k-step lag
© 2013 ExcelR Solutions. All Rights Reserved
Standard error of rk
• The standard error is
The standard error of the mean esHmates the
variability between samples whereas the
standard deviaHon measures the variability within
a single sample.
Is there a TREND?
Is there a
Are the data
SEASONALITY?
RANDOM?
80
80
60
60
40
40
y
y
20
20
0
0
0 5 10 15 20 0 1 2 3
t Log t
4.5
4.5
3 3.5 4
3.5 4
Log y
Log y
3
2.5
2.5
0 5 10 15 20 0 1 2 3
t Log t
Strategy to choose
Valida4on Data
Period
Length of Underlying
series condi4ons
affec4ng series
© 2013 ExcelR Solutions. All Rights Reserved
Rolling-forward forecasts
Lag plot
ACF plot
Histogram
ME 0 0
MAD 1 10
MSE 1 100
ME 0 0
• MSE has the added disadvantage that its unit is in square. RMSE
does not have this added disadvantage
Zero Counts
Missing values MAE/RMSE: no problem
Cannot compute MAPE
Compute average metrics
Exclude Zero count
Exclude missing values
Use alternate measure - MASE
1
Probability of 95% that the
value will be in the range [a,b]
2
If the forecast errors are normal, predic4on
interval is
σ = es4mated standard devia4on of forecast errors
k = some mul4ple
(k=2 corresponds to 95% probability)
Challenges to formula
3
• Errors ofen non-normal
• If model is biased (over/under-forecasts), symmetric interval around Ft+k?
• EsHmaHng the error standard deviaHon is tricky
One soluHon is transforming errors to normal
© 2013 ExcelR Solutions. All Rights Reserved
Forecast / Prediction Interval – Non-Normal
To construct predicHon interval for 1-step-ahead forecasts
1. Create roll-forward forecasts (Ft+1) on validaHon period
2. Compute forecast errors
3. Compute percenHles of error distribuHon
(e(5)=5th percenHle; e(95)=95th percenHle)
4. PredicHon interval:
[ Ft+1 + e(5) , Ft+1 + e(95) ]
In Excel =percen+le
5th percenHle = -307.0
95th percenHle = 292.8
95% predicHon interval for 1-step ahead forecast Ft+1:
[Ft+1 – 307 , Ft+1 + 292.8]
© 2013 ExcelR Solutions. All Rights Reserved
Forecasting Different Methods
• Linear regression
• Autoregressive models
• ARIMA Model Data • Naïve forecasts
• Smoothing
•
•
LogisHc regression
Econometric models
based driven • Neural nets
Sales(t) =
ForecasHng Amount spend in g{ f(sales(t-1, t-2, ... , t-6),
Internet Sales adverHsements a1*SQRT[AdSpend(t-1)] + ... +
a6*SQRT[AdSpend(t-6)] }
1 2 3
Global Trend Seasonality Irregular PaHerns
• Linear Trend • AddiHve (Y)
(constant growth)
• ExponenHal Trend • MulHplicaHve
(% growth) log(Y)
© 2013 ExcelR Solutions. All Rights Reserved
Autoregressive (AR) Models
• AR model is used to forecast errors
• AutocorrelaHon measures how strong the values of a Hme series are related to
their own past values
• How to find out whether there in random walk to not in the data?
• Run AR(1) model & check for the value of β1
• Do a differenced series and run ACF plot
• If the drif parameter is 0, then the k-step-ahead forecast is Ft+k = Yt for all k
• Success depends on
– Moving averages choosing window width
• Balance between over &
under smoothing
– ExponenHal smoothing
Removing Seasonality
& CompuHng seasonal
indexes
© 2013 ExcelR Solutions. All Rights Reserved
Moving Average - Calculations
Centered Moving Average Trailing Moving Average
It is calculated based on a It is calculated based on
window centered around a window from Hme ‘t’ &
Hme ‘t’ backwards
6000
5250
4500
Sales
3750
3000
2250
1500
86
86
Q 7
87
Q 8
Q 8
89
Q 9
90
Q 0
Q 1
91
Q 2
Q 2
93
Q 3
94
Q 4
Q 5
95
96
8
8
8
9
9
9
1-
3-
1-
3-
1-
3-
1-
3-
1-
3-
1-
3-
1-
3-
1-
3-
1-
3-
1-
3-
1-
Q
Q
Q
Quarter
W=5
t-4 t-3 t-2 t-1 t
3. Compute average of values in the window:
yt −W +1 + yt −W + 2 + ! + yt −1 + yt
MAt =
W
Winter’s method
• Assigns more • Trend
weight to most • Seasonality
Selecting α
“Typical” values: 0.1, 0.2
Trial & error: effect on visualization
Minimize RMSE or MAPE of training data
MA ES
• BeKer to forecast when
data & environment is • BeKer to forecast when
not volaHle data & environment is
volaHle
• Window width is key to
success • Smoothing constant (α)
value is key to success
• Simple & popular for removing trend and / or seasonality from a Hme series 2
• Lag-1 difference: Yt – Yt-1 (For removing trend) ; Lag-M difference: Yt – Yt-M (For
removing seasonality)
Differencing
• Double – differencing: difference the differenced series
• It is always beKer to choose default ‘α’ & ‘β’ values (0.2, 0.15)
• St = seasonal index of period ‘t’
• M = number of seasons
Yt
Level: Lt = α + (1- α )( L t -1 + Tt -1 )
St − M
Trend (same as Holt’s): Tt = β ( L t − L t -1 ) + (1- β ) Tt -1
Seasonality (mulHplicaHve): S t = γ Yt + (1- γ ) S t -M
Lt
• Yt = θ0 + εt + θ1 εt – 1 , εt white noise
θ1 = 0.8
θ1 = – 0.8
• Such a model has non-zero ACF and non-zero PACF at all lags
• AR(p) is ARMA(p,0)
• MA(q) is ARMA(0,q)