Outline

- Introduction
- Background
- Setup for AR models
- Model Selection Using Minimum Description Length (MDL)
  - General principles
  - Application to AR models with breaks
- Optimization Using Genetic Algorithms
  - Basics
  - Implementation for structural break estimation
- Simulation Examples
  - Piecewise autoregressions
  - Slowly varying autoregressions
- Applications
  - Multivariate time series
  - EEG example
- Application to Nonlinear Models (parameter-driven state-space models)
  - Poisson model
  - Stochastic volatility model
Introduction

Structural breaks:
- Kitagawa and Akaike (1978): fitting locally stationary autoregressive models using AIC, with computations facilitated by the Householder transformation.
- Davis, Huang, and Yao (1995): likelihood ratio test for a change in the parameters and/or order of an AR process.
- Kitagawa, Takanami, and Matsumoto (2001): signal extraction in seismology; estimating the arrival time of a seismic signal.
- Ombao, Raz, von Sachs, and Malow (2001): orthogonal complex-valued transforms that are localized in time and frequency, the smooth localized complex exponential (SLEX) transform; applications to EEG time series and speech data.
Introduction (cont.)

Locally stationary:
- Dahlhaus (1997, 2000): estimation for locally stationary processes.
- Adak (1998): piecewise stationary processes, with applications to seismology and biomedical signal processing.

MDL and coding theory:
- Lee (2001, 2002): estimation of discontinuous regression functions.
- Hansen and Yu (2001): model selection.
Introduction (cont.)

Time series: y_1, …, y_n. Piecewise AR model:

Y_t = γ_j + φ_{j1} Y_{t−1} + ⋯ + φ_{jp_j} Y_{t−p_j} + σ_j ε_t, if τ_{j−1} ≤ t < τ_j,

where τ_0 = 1 < τ_1 < … < τ_{m−1} < τ_m = n + 1 are the break points and {ε_t} is IID(0, 1). The number of segments m, the break points τ_j, the AR orders p_j, and the segment parameters ψ_j = (γ_j, φ_{j1}, …, φ_{jp_j}, σ_j) are all unknown.
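To make the setup concrete, the following sketch simulates a piecewise AR process in Python. The break point, AR orders, and coefficients in the usage line are hypothetical illustrations, not values from this talk:

```python
import numpy as np

def simulate_piecewise_ar(n, breaks, phis, sigmas, seed=0):
    """Simulate a piecewise AR process.

    breaks: interior break points (0-based); segment j runs on
    [tau_{j-1}, tau_j).  phis[j]: AR coefficient vector of segment j
    (an empty list means AR(0)); sigmas[j]: innovation s.d. of segment j.
    """
    rng = np.random.default_rng(seed)
    tau = [0] + list(breaks) + [n]
    y = np.zeros(n)
    for j in range(len(tau) - 1):
        phi, sig = np.asarray(phis[j], dtype=float), sigmas[j]
        for t in range(tau[j], tau[j + 1]):
            # past values; zeros before the start of the series
            past = [y[t - k] if t - k >= 0 else 0.0
                    for k in range(1, len(phi) + 1)]
            y[t] = phi @ np.array(past) + sig * rng.standard_normal()
    return y

# hypothetical example: AR(1) whose coefficient flips sign at t = 512
y = simulate_piecewise_ar(1024, breaks=[512],
                          phis=[[0.9], [-0.9]], sigmas=[1.0, 1.0])
```

The same generator also handles AR(0) (white-noise) segments by passing an empty coefficient list.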
Introduction (cont.)

Motivation for using piecewise AR models: piecewise AR is a special case of a piecewise stationary process (see Adak 1998),

Ỹ_{t,n} = Σ_{j=1}^{m} Y_t^{(j)} I_{[τ_{j−1}, τ_j)}(t/n),

where {Y_t^{(j)}}, j = 1, …, m, is a sequence of stationary processes. It is argued in Ombao et al. (2001) that if {Y_{t,n}} is a locally stationary process (in the sense of Dahlhaus), then there exists a piecewise stationary process {Ỹ_{t,n}}, with m_n → ∞ and m_n/n → 0 as n → ∞, that approximates {Y_{t,n}} in average mean square. Roughly speaking, {Y_{t,n}} is locally stationary if it has a time-varying spectrum that is approximately |A(t/n, ω)|², where A(u, ω) is a continuous function in u.
[Figure: differenced counts plotted against year, 1976-1984; the differenced series is modeled as b g(t) + N_t]
Model Selection Using MDL

L_F(y) = code length of y relative to the model class F.

Typically this can be decomposed into two pieces (a two-part code):

L_F(y) = L(F̂ | y) + L(ê | F̂),

where

L(F̂ | y) = code length of the fitted model F̂,
L(ê | F̂) = code length of the residuals based on the fitted model.
First term, L(F̂|y): Let n_j = τ_j − τ_{j−1} and ψ_j = (γ_j, φ_{j1}, …, φ_{jp_j}, σ_j) denote the length of the jth segment and the parameter vector of the jth AR process, respectively. Then

L(F̂|y) = L(m) + L(τ_1, …, τ_m) + L(p_1, …, p_m) + L(ψ̂_1|y) + ⋯ + L(ψ̂_m|y)
        = L(m) + L(n_1, …, n_m) + L(p_1, …, p_m) + L(ψ̂_1|y) + ⋯ + L(ψ̂_m|y).

Encoding:
- integer I: log_2 I bits (if I is unbounded); log_2 I_U bits (if I is bounded by I_U)
- MLE ψ̂: (1/2) log_2 N bits per parameter, where N is the number of observations used to compute ψ̂ (Rissanen 1989)
Second term, L(ê|F̂): approximately

L(ê|F̂) ≈ Σ_{j=1}^{m} (n_j/2) (log_2(2π σ̂_j²) + 1).

For fixed values of m and (τ_1, p_1), …, (τ_m, p_m), we define the MDL as

MDL(m, (τ_1, p_1), …, (τ_m, p_m))
  = log_2 m + m log_2 n + Σ_{j=1}^{m} log_2 p_j + Σ_{j=1}^{m} ((p_j + 2)/2) log_2 n_j
    + Σ_{j=1}^{m} (n_j/2) log_2(2π σ̂_j²) + n/2.

The strategy is to find the segmentation that minimizes MDL(m, (τ_1, p_1), …, (τ_m, p_m)). To speed things up, we use Yule-Walker estimates of the AR parameters.
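A direct transcription of this criterion (a sketch: function names are mine, Yule-Walker estimation is done with plain numpy, and the order term log_2 p_j is set to 0 when p_j = 0):

```python
import numpy as np

def yule_walker_sigma2(x, p):
    """Yule-Walker estimate of the innovation variance for an AR(p) fit."""
    x = x - x.mean()
    n = len(x)
    # sample autocovariances gamma(0), ..., gamma(p)
    gamma = np.array([x[:n - k] @ x[k:] / n for k in range(p + 1)])
    if p == 0:
        return gamma[0]
    R = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, gamma[1:])      # Yule-Walker equations
    return gamma[0] - phi @ gamma[1:]

def mdl(y, taus, orders):
    """MDL of a segmentation; taus = interior break points (0-based)."""
    n = len(y)
    tau = [0] + list(taus) + [n]
    m = len(orders)
    total = np.log2(m) + m * np.log2(n)
    for j, pj in enumerate(orders):
        seg = y[tau[j]:tau[j + 1]]
        nj = len(seg)
        s2 = yule_walker_sigma2(seg, pj)
        total += np.log2(pj) if pj > 0 else 0.0   # code length of the order
        total += (pj + 2) / 2 * np.log2(nj)       # p_j + 2 parameters, (1/2) log2 n_j bits each
        total += nj / 2 * np.log2(2 * np.pi * s2) # residual code length
    return total + n / 2
```

For a series with a pronounced break, the MDL of the true two-segment model should come out well below that of a single-segment fit.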
After a sufficient number of offspring are produced to form a second generation, the process restarts to produce a third generation, and so on. Following Darwin's theory of natural selection, successive generations should yield smaller (or larger, when maximizing) values of the objective function.
where

δ_t = −1, if there is no break point at time t,
δ_t = p_j, if a break point occurs at time t = τ_{j−1} and the AR order of the new segment is p_j.

For example,

c = (2, −1, −1, −1, −1, 0, −1, −1, −1, −1, 0, −1, −1, −1, 3, −1, −1, −1, −1, −1)

(segment starts at t = 1, 6, 11, 15) would correspond to the following process: AR(2) for t = 1:5; AR(0) for t = 6:10; AR(0) for t = 11:14; AR(3) for t = 15:20.
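A minimal decoder for this chromosome representation (the function name is mine):

```python
def decode_chromosome(c):
    """Decode a GA chromosome into (segments, AR orders).

    c[t] == -1 : no break at time t+1; c[t] == p >= 0 : a segment with
    AR order p starts at time t+1 (c[0] must be >= 0).
    """
    starts = [t for t, gene in enumerate(c) if gene != -1]
    orders = [c[t] for t in starts]
    bounds = starts + [len(c)]
    # 1-based, inclusive (start, end) pairs, matching the slide's notation
    segments = [(bounds[j] + 1, bounds[j + 1]) for j in range(len(starts))]
    return segments, orders

c = [2, -1, -1, -1, -1, 0, -1, -1, -1, -1, 0, -1, -1, -1, 3, -1, -1, -1, -1, -1]
segments, orders = decode_chromosome(c)
# segments: [(1, 5), (6, 10), (11, 14), (15, 20)]; orders: [2, 0, 0, 3]
```

Applied to the example chromosome above, it recovers the four segments and their AR orders.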
Various strategies:
- Elitism: include the top ten chromosomes from the last generation in the next generation.
- Use multiple islands, in which populations evolve independently, and allow migration after a fixed number of generations. This implementation is amenable to parallel computing.
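These ideas can be sketched as a compact island-model GA. All tuning constants here are illustrative, and the toy fitness in the last lines is a stand-in for MDL, which would be the objective in practice:

```python
import numpy as np

def evolve(fitness, chrom_len, max_order=10, n_islands=3, pop_size=40,
           n_gens=60, migrate_every=15, n_elite=10, seed=0):
    """Island-model GA minimizing `fitness` over break-point chromosomes.

    Genes: -1 (no break) or an AR order 0..max_order (a segment starts
    here); gene 0 always starts a segment.  The top n_elite chromosomes
    of each island survive unchanged (elitism); islands pass their best
    member around a ring every `migrate_every` generations.
    """
    rng = np.random.default_rng(seed)

    def random_chrom():
        c = np.where(rng.random(chrom_len) < 0.1,
                     rng.integers(0, max_order + 1, chrom_len), -1)
        c[0] = rng.integers(0, max_order + 1)
        return c

    def mutate(c):
        c = c.copy()
        t = rng.integers(0, chrom_len)
        c[t] = rng.integers(0, max_order + 1) if rng.random() < 0.5 else -1
        c[0] = max(c[0], 0)              # keep a valid first segment
        return c

    def crossover(a, b):
        cut = rng.integers(1, chrom_len)  # single-point crossover
        return np.concatenate([a[:cut], b[cut:]])

    islands = [[random_chrom() for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(n_gens):
        for i in range(n_islands):
            pop = sorted(islands[i], key=fitness)
            # offspring are bred from the elite members only
            children = [mutate(crossover(pop[a], pop[b]))
                        for a, b in rng.integers(0, n_elite,
                                                 (pop_size - n_elite, 2))]
            islands[i] = pop[:n_elite] + children
        if (gen + 1) % migrate_every == 0:  # ring migration of best members
            bests = [min(pop, key=fitness) for pop in islands]
            for i in range(n_islands):
                islands[i][-1] = bests[(i - 1) % n_islands]
    return min((c for pop in islands for c in pop), key=fitness)

# toy objective (a stand-in for MDL): prefer chromosomes with 4 segments
toy_fitness = lambda c: abs(int((np.asarray(c) != -1).sum()) - 4)
best = evolve(toy_fitness, chrom_len=20)
```

Because each island evolves independently between migrations, the outer island loop is the natural place to parallelize.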
Span configuration for model selection (max AR order K = 10; m_p = minimum span for a segment of order p, π_p = probability of proposing order p):

p      m_p   π_p
0      25    .08
1      25    .10
2      30    .10
3      35    .09
4      40    .09
5      45    .09
6-10   50    .09
Fitted Model

[Figure: fitted time-varying spectrum, frequency vs. rescaled time t/n]
ASE = (33 n)^{−1} Σ_{t=1}^{n} Σ_{j=0}^{32} { log f̂(t/n, ω_j) − log f(t/n, ω_j) }²,  ω_j = 2πj/64.
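Given the true and estimated log spectra evaluated on this time-frequency grid, the ASE is a one-line computation (array names are mine):

```python
import numpy as np

def ase(log_f_hat, log_f):
    """Average squared error between log spectra on a time-frequency grid.

    Both arrays have shape (n, 33): rows index t/n for t = 1..n, columns
    index the frequencies w_j = 2*pi*j/64, j = 0, ..., 32.
    """
    n = log_f_hat.shape[0]
    return np.sum((log_f_hat - log_f) ** 2) / (33 * n)
```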
GA — number of segments selected (% of realizations):

# of segments   %      %
2               0      0
3               60.0   82.0
4               34.0   17.5
5               5.0    0.5
6               1.0    0
Number of segments selected (% of realizations):

# of segments
2    0    86.0   89.0
3    0    11.6   10.4
4    0    1.8    0.6
5    0    0.6    0
6    0    0      0
Y_t = .9 Y_{t−1} + ε_t, if 1 ≤ t < 198,
Y_t = .9 Y_{t−1} + ε_t, if 198 ≤ t ≤ 1024,

where {ε_t} ~ IID N(0, 1).

[Figure: realization of the series, n = 1024]
True Model

[Figure: true time-varying spectrum, frequency vs. rescaled time t/n]
Auto-SLEX: # of segments 7, 8, 9 selected in 44.0%, 37.0%, 3.0% of realizations; ASE: 1.81 (.039), .220 (.056), .211 (.047), .290 (.096), .255 (.100).
GA: # of segments 1, 2; ASE: .017 (.013), .060 (.044).
[Figure: realization of the series, n = 500]
GA: 1 segment selected in 89.0% of realizations; 3 and 4 segments in 2.0%; ASE values .096 (.017), .048 (.020), .092 (.011).
[Figure: realization of the series Y_t (top) and the process a_t (bottom), n = 1000]
Fitted Model

[Figure: fitted time-varying spectrum, frequency vs. rescaled time t/n]
Comparison of Auto-SLEX and GA (% of realizations; ASE, mean (sd)):

Auto-SLEX: # of segments 7, 8, 9 selected in 14.0%, 27.0%, 35.0% of cases; ASE values .238 (.030), .228 (.025), .232 (.029), .243 (.033), .269 (.040), .308.
GA: # of segments 1, 2 selected in 63.0%, 1.0% of cases.

Number of segments selected (%), continued:
3: 1.4, 2.8;  4: 1.4, 0;  5: 0, 0
3: 0, 1.6, 1.6;  4: 0, 0, 0.8;  5: 0, 0, 0
In the graph below right, we average the spectrogram over the GA fitted models generated from each of the 200 simulated realizations.
True Model / Average Model

[Figure: true time-varying spectrum (left) and the average over the 200 GA fitted models (right), frequency vs. rescaled time t/n]
[Figure: monthly counts plotted against year, 1976-1984]
[Figure: GA segmentation of the series, 1976-1984]

Results from the GA: 3 pieces; time = 4.4 secs.
Piece 1 (t = 1, …, 98): IID; Piece 2 (t = 99, …, 108): IID; Piece 3 (t = 109, …, 120): AR(1).
Multivariate series: the model is the natural multivariate analogue of the piecewise AR model,

Y_t = γ_j + Φ_{j1} Y_{t−1} + ⋯ + Φ_{jp_j} Y_{t−p_j} + Σ_j^{1/2} ε_t, if τ_{j−1} ≤ t < τ_j,

where τ_0 = 1 < τ_1 < … < τ_{m−1} < τ_m = n + 1, and {ε_t} is IID(0, I_d). In this case,

MDL(m, (τ_1, p_1), …, (τ_m, p_m))
  = log_2 m + m log_2 n + Σ_{j=1}^{m} log_2 p_j
    + Σ_{j=1}^{m} ((p_j d² + d + d(d+1)/2)/2) log_2 n_j
    + Σ_{j=1}^{m} (1/2) Σ_{t=τ_{j−1}}^{τ_j − 1} ( log(|V_t|) + (Y_t − Ŷ_t)ᵀ V_t^{−1} (Y_t − Ŷ_t) ),

where Ŷ_t is the one-step predictor of Y_t and V_t its prediction error covariance matrix.
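A sketch of the per-segment ingredients of this criterion, with the exact Gaussian likelihood replaced by a conditional least-squares VAR fit (a common simplification; function names are mine):

```python
import numpy as np

def var_segment_code_length(Y, p):
    """Gaussian code length (nats) of a d-variate segment under a VAR(p) fit.

    Uses conditional least squares on the last n - p observations.  With
    V_t fixed at the fitted residual covariance V, the quadratic terms
    sum to (n - p) * d, giving 0.5 * (n - p) * (log|V| + d).
    """
    n, d = Y.shape
    X = np.hstack([np.ones((n - p, 1))] +
                  [Y[p - k:n - k] for k in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)  # intercept + AR matrices
    E = Y[p:] - X @ B                              # residuals
    V = E.T @ E / (n - p)
    _, logdet = np.linalg.slogdet(V)
    return 0.5 * (n - p) * (logdet + d)

def mv_penalty(p, d, n_j):
    """Penalty term ((p d^2 + d + d(d+1)/2) / 2) * log(n_j)."""
    return (p * d * d + d + d * (d + 1) / 2) / 2 * np.log(n_j)
```

The parameter count in `mv_penalty` is the p d² AR coefficients plus the d intercepts plus the d(d+1)/2 free entries of the noise covariance.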
[Figure: realizations of the component series, n = 1000]
EEG example: T3 and P3 channels

[Figure: EEG recordings at the T3 (left) and P3 (right) channels; time in seconds]
[Figure: time-varying spectra for the T3 (left) and P3 (right) channels; frequency (Hertz) vs. time in seconds]
Application to nonlinear models:
- use minimum description length (MDL) as the objective function
- use a genetic algorithm for the optimization
Poisson Model Example

[Figure: simulated count series (left) and MDL as a function of the breaking point (right), n = 500]
True model:
Y_t | α_t ~ Pois(exp{.7 + α_t}), α_t = .5 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .3), t < 250,
Y_t | α_t ~ Pois(exp{.7 + α_t}), α_t = .5 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .3), t > 250.
GA estimate: 251; time: 267 secs.
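A simulation sketch for a parameter-driven Poisson model with a break at t = 250, using the parameter values shown above for both calls (a second regime with different values would be substituted to create an actual break):

```python
import numpy as np

def simulate_poisson_ssm(n, beta, phi, sig2, seed=0):
    """Parameter-driven Poisson model:
    Y_t | a_t ~ Pois(exp(beta + a_t)), a_t = phi * a_{t-1} + e_t,
    e_t ~ IID N(0, sig2)."""
    rng = np.random.default_rng(seed)
    a = 0.0
    y = np.empty(n, dtype=np.int64)
    for t in range(n):
        a = phi * a + rng.normal(0.0, np.sqrt(sig2))
        y[t] = rng.poisson(np.exp(beta + a))
    return y

# two regimes concatenated at t = 250 (parameters from the slide)
y = np.concatenate([simulate_poisson_ssm(250, 0.7, 0.5, 0.3, seed=0),
                    simulate_poisson_ssm(250, 0.7, 0.5, 0.3, seed=1)])
```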
SV Process Example

Model: Y_t | α_t ~ N(0, exp{α_t}), α_t = γ + φ α_{t−1} + ε_t, {ε_t} ~ IID N(0, σ²)
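A simulation sketch for this SV model; the parameter values follow the first example below (break at t = 750), and the total length n = 3000 is an assumption read off the plotted axis:

```python
import numpy as np

def simulate_sv(n, gamma, phi, sig2, seed=0):
    """Stochastic volatility model:
    Y_t | a_t ~ N(0, exp(a_t)), a_t = gamma + phi * a_{t-1} + e_t,
    e_t ~ IID N(0, sig2)."""
    rng = np.random.default_rng(seed)
    a = gamma / (1 - phi)  # start from the stationary mean of {a_t}
    y = np.empty(n)
    for t in range(n):
        a = gamma + phi * a + rng.normal(0.0, np.sqrt(sig2))
        y[t] = rng.normal(0.0, np.exp(a / 2.0))
    return y

# two regimes concatenated at t = 750
y = np.concatenate([simulate_sv(750, -0.05, 0.975, 0.05, seed=0),
                    simulate_sv(2250, -0.25, 0.900, 0.25, seed=1)])
```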
[Figure: simulated series (left) and MDL as a function of the breaking point (right), n = 3000]
True model:
Y_t | α_t ~ N(0, exp{α_t}), α_t = −.05 + .975 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .05), t ≤ 750,
Y_t | α_t ~ N(0, exp{α_t}), α_t = −.25 + .900 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .25), t > 750.
GA estimate: 754; time: 1053 secs.
SV Process Example

Model: Y_t | α_t ~ N(0, exp{α_t}), α_t = γ + φ α_{t−1} + ε_t, {ε_t} ~ IID N(0, σ²)
[Figure: simulated series (left) and MDL as a function of the breaking point (right), n = 500]
True model:
Y_t | α_t ~ N(0, exp{α_t}), α_t = −.175 + .977 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .1810), t ≤ 250,
Y_t | α_t ~ N(0, exp{α_t}), α_t = −.010 + .996 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .0089), t > 250.
GA estimate: 251; time: 269 secs.
SV Process Example (cont.)

True model:
Y_t | α_t ~ N(0, exp{α_t}), α_t = −.175 + .977 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .1810), t ≤ 250,
Y_t | α_t ~ N(0, exp{α_t}), α_t = −.010 + .996 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .0089), t > 250.

Fitted model based on no structural break:
Y_t | α_t ~ N(0, exp{α_t}), α_t = −.0645 + .9889 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .0935)
[Figure: original series (left) and a series simulated from the no-break fitted model (right), n = 500]
SV Process Example (cont.)

Fitted model based on no structural break:
Y_t | α_t ~ N(0, exp{α_t}), α_t = −.0645 + .9889 α_{t−1} + ε_t, {ε_t} ~ IID N(0, .0935)
[Figure: series simulated from the no-break fit (left) and its MDL as a function of the breaking point (right), n = 500]
Summary Remarks

1. MDL appears to be a good criterion for detecting structural breaks.
2. Optimization using a genetic algorithm is well suited to finding a near-optimal value of the MDL.
3. The procedure extends easily to multivariate problems.
4. While estimating structural breaks for nonlinear time series models is more challenging, the paradigm of using MDL together with a GA holds promise for break detection in parameter-driven models and other nonlinear models.