
Control Engineering Practice 18 (2010) 990-1002


Change point detection in time series data with random forests


Lidia Auret, Chris Aldrich *
Department of Process Engineering, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa

Article history:
Received 1 August 2009
Accepted 16 April 2010
Available online 26 May 2010

Keywords:
Time series analysis
Detection algorithms
Machine learning
Subspace methods
Singular value decomposition

Abstract: A large class of monitoring problems can be cast as the detection of a change in the parameters of a static or dynamic system, based on the effects of these changes on one or more observed variables. In this paper, the use of random forest models to detect change points in dynamic systems is considered. The approach is based on the embedding of multivariate time series data associated with normal process conditions, followed by the extraction of features from the resulting lagged trajectory matrix. The features are extracted by recasting the data into a binary classification problem, which can be solved with a random forest model. A proximity matrix can be calculated from the model, and from this matrix features can be extracted that represent the trajectory of the system in phase space. The results of the study suggest that the random forest approach may afford distinct advantages over a previously proposed linear equivalent, particularly when complex nonlinear systems need to be monitored.

© 2010 Elsevier Ltd. All rights reserved.

1. Background

Many monitoring problems can be cast as the detection of a change in the parameters of a static or dynamic system, represented by one or more observed variables. The task of change point detection is then to identify a shift in the process state generating the observed variables, based on a description of the preceding state observations. This can be a challenging task, since the key difficulty is to detect intrinsic changes that are not necessarily directly observed and that are measured together with other types of perturbations. A distinction is made between unsupervised fault detection and unsupervised change detection: the reference or normal operating conditions for fault detection remain constant, while those for change detection are continually updated.

Over the last three or more decades, the online detection of change points has assumed major significance in a wide range of real-world problems, such as condition-based maintenance of industrial equipment and processes (Wang, Chen, Wu, & Wu, 2001; Zang & Imregun, 2001), infectious diseases (Hohle, Paul, & Held, 2009), biomedicine (e.g. Staudacher, Telser, Amann, Hinterhuber, & Ritsch-Marte, 2005; Piryatinska et al., 2009), financial markets (e.g. Oh & Han, 2000; Andreou & Ghysels, 2006), intrusion detection in computer networks (Tartakovsky, Rozovskii, Blazek, & Kim, 2006), prediction of natural catastrophic events, such as earthquakes (Ogata, 1989), detection of ecological trends and environmental conditions (Ha & Ha, 2005; Carslaw & Carslaw, 2007; Andersen, Carstensen, Hernandez-Garcia, & Duarte, 2009), and safety of complex systems, such as aircraft (Boller, 2000), rockets (Qing, Chan, Beard, & Kumar, 2006) and nuclear power plants (Bartlett & Uhrig, 1991).

These problems are driven by the growing availability of sophisticated sensors in industrial and natural environments, the increasing complexity of most technological processes, as well as the widespread use of advanced information processing systems (Basseville & Nikiforov, 1993). Solutions to these problems are critically important for safety, ecological and economical reasons, and over the last few decades a variety of complex monitoring algorithms have been proposed.

These algorithms do not require a priori knowledge of fault conditions, but require only reference or normal operating conditions, and are termed unsupervised approaches. Examples of data-driven unsupervised techniques include the use of CUSUM tests (Shu, Jiang, & Wu, 2008; Gombay & Serban, 2009; Messaoud & Weihs, 2009; Luo, Li, & Wang, 2009), chronological clustering (Andersen et al., 2009), regression models (Li & Nilkitsaranont, 2009), maximum likelihood methods (Apostolopoulos, 2008) and principal component analysis (Borguet & Leonard, 2009). Nonlinear methods in particular have received considerable attention recently, such as the use of artificial neural networks (Oh, Moon, & Kim, 2005; Oh & Han, 2000), various energy statistics (Kim, Marzban, Percival, & Stuetzle, 2009), one-class classification with support vector machines (Camci & Chinnam, 2008), generative topographical mappings (Olier & Vellido, 2008), distance similarity matrices (Lopez & Sarigul-Klijn, 2009), kernel principal component analysis (Nielsen & Canty, 2008) and information theoretic approaches (Srivastav, Ray, & Gupta, 2009).

Despite these rapid developments, the detection of changes in a large variety of systems remains challenging and in this paper, a novel nonlinear approach is investigated based on the use of random forests. With this approach, no assumptions have to be

* Corresponding author. Tel.: +27 21 808 4496; fax: +27 21 808 2059.
E-mail address: ca1@sun.ac.za (C. Aldrich).

doi:10.1016/j.conengprac.2010.04.005

Nomenclature

A  BZ and autocatalytic reactant
A  autocatalytic model parameter
a  reduced dimensionality
B  indication of base conditions (subscript)
B  autocatalytic reactant
B  autocatalytic model parameter
b  autocatalytic model parameter
b  extent of base matrix
C  covariance matrix
C  BZ ion concentration and autocatalytic reactant
C  number of classes
c  rate constant
D  dissimilarity matrix
D  autocatalytic model parameter
d  pair-wise dissimilarity
E  cost function for multidimensional scaling
e  noise variable
e  distance metric instance
ẽ  normalized distance metric instance
F  joint response and predictor variable distribution
f  simulated time series parameter
G  forward mapping functions (subscript)
g  BZ parameter
H  Hankel matrix
H  reverse mapping functions (subscript)
H  BZ reactant
h  Hankel column vectors
h  BZ parameter
K  number of ensemble members
k  base learner indicator
L  learning sample matrix
l  learning sample vector
M  dimensionality of predictor vector
m  number of random split variables
N  number of observations
n  time interval iteration
P  distribution of base learner parameter vectors
p̂  fraction of samples
Q  autocatalytic system volumetric flow rate
Q  node impurity measure
R  decision tree local region
r  reaction rate
S  similarity matrix
s  similarity measure element
T  feature matrix
T  indication of test conditions (subscript)
T  simulated time series system
T  BZ variable
t  feature vector
t  time index
U  eigenvector matrix
V  indication of validation conditions (subscript)
V  BZ reactant and autocatalytic volume
v  BZ variable
w  lag parameter
X  predictor matrix
X′  synthetic predictor matrix
X  BZ reactant
X  autocatalytic dimensionless number
x  predictor vector
x  BZ variable
Y  BZ reactant
Y  autocatalytic dimensionless number
y  response vector
y  response scalar
Z  concatenation of X and X′
Z  BZ reactant
Z  autocatalytic dimensionless number
z  BZ variable
α  significance level
β  size of base matrix
γ  ratio of species
δ  Euclidian distance in the mapping
ε  relative distance metric instance
Z  node identifier
H  set of ensemble functions
Y  ensemble function
θ  base learner parameter vector
κ  class outcome
λ  eigenvalue
μ  BZ and autocatalytic time variable
Σ  size of test matrix
σ  test matrix termination index
B  split position on variable
B*  best split position among variables
τ  time lag between test and base matrices
φ  base learner function
ℝ  dimensional space

made with regard to the nature or distribution of the time series data, and as will be shown by means of simulated processes, it can deal with complex nonlinear dynamic systems. The paper is organized as follows. In the next section, a brief introduction to random forest algorithms and a methodology for change point detection based on these algorithms are given. In Section 3, a conceptually related methodology previously proposed by Moskvina and Zhigljavsky (2003) is explained and in Sections 4 and 5, these two algorithms are compared in two case studies. Section 6 presents results on simulations for noise and parameter settings, while the conclusions of the paper are summarized in Section 7.

2. Random forest algorithms

Random forests (RF) are nonlinear regression or classification models consisting of an ensemble of regression or classification trees, such that each tree depends on a random vector sampled independently from the data (Breiman, 2001).

2.1. Classification and regression trees

A classification tree divides the feature space into recursive binary partitions, allocating a fixed-value prediction to each partition. In order to select optimal splitting variables and split positions for recursive partitioning, node purity criteria are used. For a classification problem with class outcome $\kappa = 1, 2, \ldots, C$, the following is defined for region $R_Z$ with $N_Z$ observations (node Z):

$\hat{p}_{Z\kappa} = \frac{1}{N_Z} \sum_{x_i \in R_Z} I(y_i = \kappa)$  (1)

$\hat{p}_{Z\kappa}$ is the fraction of samples allocated to node Z with class outcome $\kappa$, with y the response vector and x the predictor vector. Observations reporting to node Z are classified as class $\kappa(Z)$, the

majority class in that node, where

$\kappa(Z) = \arg\max_{\kappa} \hat{p}_{Z\kappa}$  (2)

The node impurity measure $Q_Z$ for node Z of tree $\varphi_k$ can be defined as

$Q_Z = \sum_{\kappa=1}^{C} \hat{p}_{Z\kappa}(1 - \hat{p}_{Z\kappa})$  (3)

where $Q_Z$ is the Gini index often used in classification problems. Other indices can also be used, if so desired.

2.2. Ensemble models

The regression or classification problem can be formalized by considering a learning sample $l = (y_i, x_i)$, $i = 1, 2, \ldots, N$, where $y \in \mathbb{R}^1$ is a response and $x \in \mathbb{R}^M$ an M-dimensional vector of predictor variables. The learning sample matrix consists of N independent realizations of $L = (y, X)$, whose joint distribution can be denoted by F.

An ensemble model, such as a random forest algorithm, starts with a base learner $\varphi(X, \theta)$, where $\theta$ is a vector of parameters controlling the fitting. If $P(\theta|L)$ is some distribution on $\theta$, typically depending on the learning sample, then an ensemble of models is constructed by K independent draws from the distribution $P(\theta|L)$. These models $\varphi_k(X, \theta)$, for $k = 1, 2, \ldots, K$, can be combined to yield an aggregated estimator Y, according to Eq. (4):

$Y(X, \theta) = \frac{1}{K} \sum_{k=1}^{K} \varphi_k(X, \theta_k)$  (4)

2.3. Random forest algorithm

The base learners in Eq. (4) are constructed as follows. Let the number of training cases be N, and the number of predictor variables be M:

i) Take a bootstrap sample of size N from the N available samples, i.e. by selecting N samples with replacement from all N available training samples. The samples that are not in the training set are used to estimate the error of the forest.
ii) At node Z in the tree $\varphi_k$, randomly sample $m \ll M$ of the predictor variables and find the best split $B_j$ among all possible splits for the jth variable ($j = 1, 2, \ldots, m$).
iii) Select the best split $B^*$ among the $j = 1, 2, \ldots, m$ splits $B_j$ and split the data on node Z by partitioning the samples based on whether $x_{ij} < B^*$.
iv) Repeat steps ii) and iii) on all descendent nodes, to grow an unpruned tree.
v) Predict new data by averaging the predictions of the K trees.

Note that error rates can be estimated from the training data, by using the tree grown from the bootstrap sample to predict the data not in the bootstrap sample (also referred to as the out of bag sample) and aggregating the predictions or out of bag (OOB) estimates of all the trees in the forest.

In summary, the user has to specify the parameters m (the number of variables selected for splitting) and K, the number of trees in the forest. The parameters N (the number of samples) and M (the number of variables) are fixed by the input data. When the number of trees in the forest is sufficiently large, the generalization error of the model converges to a limit. This error depends on the strength of the trees in the forest (higher is better) and the correlation between the trees (lower is better).

2.4. Feature extraction with random forests

Breiman and Cutler (2003) and Shi and Horvath (2006) have explored the use of random forests in unsupervised learning. Feature extraction with random forests is based on a proximity matrix calculated from the data, while the inverse of the proximity matrix results in a random forest dissimilarity measure. The method for obtaining the random forest dissimilarity matrix and subsequent multidimensional scaling for dimension reduction based on this matrix is briefly outlined below (Breiman & Cutler, 2003; Shi & Horvath, 2006).

i) For an unlabeled learning set $X \in \mathbb{R}^{N \times M}$, create a synthetic data set $X' \in \mathbb{R}^{N \times M}$ by random sampling from the product of the marginal distributions of X.
ii) Label X as class 0 and X′ as class 1 and construct the matrix $Z = [X; X'] \in \mathbb{R}^{2N \times M}$.
iii) Construct a random forest classifier on Z, designed to predict the (0, 1) labels of the samples in Z.
iv) Initialize a null matrix $S = 0 \in \mathbb{R}^{N \times N}$.
v) For each data point pair $x_i$ and $x_j$ (for $i, j = 1, 2, \ldots, N$) and each decision tree in the random forest classifier, determine whether the data points report to the same terminal (leaf) node of the tree.
vi) If data points $x_i$ and $x_j$ report to the same leaf node, increase $s_{ij}$ by 1, where $s_{ij}$ is the (i, j)th element of S. Repeat for all trees and all data point pairings.
vii) Scale S by dividing by the number of trees in the forest. The resulting similarity matrix S is symmetric, positive definite, with entries ranging between 0 and 1.
viii) Construct the dissimilarity matrix $D = I - S$.
ix) From D, calculate the multidimensional scaling (MDS) coordinates in a dimensions by preserving the pair-wise dissimilarities of the original data points ($d_{ij}$) as Euclidian distances ($\delta_{ij}$) of the new feature vectors t in the mapping. The mapping can be induced by minimizing the cost function E in Eq. (5):

$E = \sum_{ij} (d_{ij} - \delta_{ij})^2$  (5)

2.5. Structure of change point algorithm

The random forest change point detection algorithm constructs a subspace $\mathbb{R}^a$ of a extracted features for a moving window of normal process or reference conditions, and measures the distance of new test data to this reference subspace. This can be done in three steps, viz. construction of an a-dimensional subspace, construction of a test matrix and computation of suitable test statistics or diagnostic sequences.

The embedding of the time series to create the base matrix and the subsequent feature extraction and mapping are illustrated in Fig. 1. The definitions of the base and test matrices are given in Figs. 2 and 3. Given a stationary multivariate time series of observations, a window of width b is slid across the time series data.

2.5.1. Construction of a-dimensional space

A lagged trajectory matrix (Hankel matrix) $H_B^n$ for the reference conditions is constructed, with w lags across the time interval $[n+1, n+b]$:

$H_B^n = \begin{pmatrix} x_{n+1} & x_{n+2} & \cdots & x_{n+\beta} \\ x_{n+2} & x_{n+3} & \cdots & x_{n+\beta+1} \\ \vdots & \vdots & & \vdots \\ x_{n+w} & x_{n+w+1} & \cdots & x_{n+b} \end{pmatrix}$  (6)
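The lagged-trajectory embedding of Eq. (6) is compact to express in code. The sketch below (NumPy; the function name, stand-in series and parameter values are illustrative assumptions, not from the paper) builds the w × β base matrix, with β = b − w + 1:

```python
import numpy as np

def base_matrix(x, n, w, b):
    """Lagged trajectory (Hankel) matrix H_B^n of Eq. (6).

    Column j (j = 1..beta, beta = b - w + 1) holds the lagged vector
    (x_{n+j}, x_{n+j+1}, ..., x_{n+j+w-1}), using the paper's 1-based indexing.
    """
    beta = b - w + 1
    H = np.empty((w, beta))
    for j in range(1, beta + 1):
        # paper entry x_{n+j+i-1} maps to 0-based x[n + j + i - 2]
        H[:, j - 1] = x[n + j - 1 : n + j - 1 + w]
    return H

x = np.arange(100, dtype=float)      # stand-in univariate series
H = base_matrix(x, n=0, w=25, b=50)
print(H.shape)                       # (25, 26)
```

Successive columns are unit shifts of each other, so anti-diagonals of H are constant, as required of a Hankel matrix.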

Fig. 1. Schematic of random forest phase space feature extraction and regression mappings for change point detection (the mapping and reverse mapping regression
forests created for the base matrix are used for the test matrix feature extraction and reverse mapping steps).
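The proximity-based feature extraction of Section 2.4 can be sketched with scikit-learn as a stand-in for the authors' implementation. The synthetic-class construction, leaf co-occurrence counting and MDS embedding follow steps i)-ix), but all names, sizes and parameter values below are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # stand-in "real" data

# i)-ii) synthetic class: permute each column independently, which samples
# from the product of the marginal distributions of X
X_synth = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
Z = np.vstack([X, X_synth])
labels = np.r_[np.zeros(len(X)), np.ones(len(X_synth))]

# iii) random forest classifier separating real from synthetic samples
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(Z, labels)

# iv)-vii) similarity: fraction of trees in which two real points share a leaf
leaves = forest.apply(X)                          # (N, n_trees) leaf indices
S = np.mean(leaves[:, None, :] == leaves[None, :, :], axis=2)

# viii)-ix) dissimilarity and a = 2 MDS feature coordinates
D = 1.0 - S
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
T_feats = mds.fit_transform(D)
print(T_feats.shape)                              # (100, 2)
```

The metric MDS objective minimized by scikit-learn (stress) corresponds to the squared-difference cost of Eq. (5).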

Fig. 3. Base and test matrix coordinates for random forest change point detection, where w = lag parameter, b = extent of base matrix, β = size of base matrix (number of columns), β = b − w + 1, n = iteration number, τ = time lag between test and base matrices, σ = test matrix termination index, Σ = size of test matrix (number of columns), Σ = σ − τ.
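The forward and reverse mapping model sets (H_G and H_H) described in Section 2.5.1 require one single-output regression forest per MDS feature and one per lagged variable, respectively, since a random forest predicts only one output at a time. A minimal sketch (scikit-learn, synthetic stand-in data; sizes and hyperparameters are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
w, a, n_cols = 6, 2, 40
H_base = rng.normal(size=(n_cols, w))    # base matrix columns, stored as rows
T_feats = rng.normal(size=(n_cols, a))   # their MDS features (stand-in values)

# Forward mapping set H_G: one regression forest per extracted feature
H_G = [RandomForestRegressor(n_estimators=50, random_state=0)
       .fit(H_base, T_feats[:, i]) for i in range(a)]

# Reverse mapping set H_H: one regression forest per lagged variable
H_H = [RandomForestRegressor(n_estimators=50, random_state=0)
       .fit(T_feats, H_base[:, j]) for j in range(w)]

def map_forward(H_new):
    """Lagged vectors -> a-dimensional feature space."""
    return np.column_stack([f.predict(H_new) for f in H_G])

def map_reverse(T_new):
    """Feature space -> reconstructed lagged vectors in w dimensions."""
    return np.column_stack([f.predict(T_new) for f in H_H])

H_test = rng.normal(size=(5, w))
H_hat = map_reverse(map_forward(H_test))  # reconstruction of the test columns
print(H_hat.shape)                        # (5, 6)
```

New test columns are pushed through both sets in sequence, and the reconstruction error then feeds the detection statistic.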

Fig. 2. Shifting base and test matrices for random forest change point detection.

The columns of the base matrix are labeled $h_j^n$:

$h_j^n = (x_{n+j}, x_{n+j+1}, \ldots, x_{n+j+w-1})^T$ with $j = 1, 2, \ldots, \beta$  (7)

The matrix is then augmented to set up a binary classification problem, and from this random forest model a proximity matrix is derived, followed by the generation of multidimensional scaling features, as described in Section 2.4. A set ($H_G$) of a random forest models is then constructed to enable direct mapping from lagged trajectory matrices to MDS features, one for each feature, since random forests cannot handle more than one output at a time. Another set ($H_H$) of w random forest models is constructed to enable reverse mapping from the feature space to the original lagged variable space (one random forest for each variable).

The result is that the time series data in the normal operating region of the process are represented by a low-dimensional set of features, with a set of random forest models ($H_G$) to map new data to this feature space, as well as a different set of random forest models ($H_H$) to reconstruct the time series data from the mapped features.

2.5.2. Construction of test matrix

A test matrix of size $w \times \Sigma$ is then constructed from time series data following on from the data associated with the base matrix, but still part of the normal operating region of the observations:

$H_T^n = \begin{pmatrix} x_{n+\tau+1} & x_{n+\tau+2} & \cdots & x_{n+\sigma} \\ x_{n+\tau+2} & x_{n+\tau+3} & \cdots & x_{n+\sigma+1} \\ \vdots & \vdots & & \vdots \\ x_{n+\tau+w} & x_{n+\tau+w+1} & \cdots & x_{n+\sigma+w-1} \end{pmatrix}$  (8)

The test matrix consists of columns $h_j^n$ with $j = \tau+1, \tau+2, \ldots, \tau+\Sigma$.

2.5.3. Computation of test statistics

The sum of squared Euclidian distances of the test matrix columns $h_j^n$ to the a-dimensional space defined by the base matrix serves as a detection statistic from which a diagnostic sequence can be built. To determine the distance of the new lagged test matrix to the reference base matrix, the following three steps are followed. The test data are projected to the non-linear manifold by using the existing regression random forest mapping models ($H_G$). The test data features are used to determine the reconstructed test data in w dimensions, using the regression random forest reverse

mapping models ($H_H$). The distance metric for the test window $e_T$ is calculated as follows:

$e_T = \sum_{j=\tau+1}^{\sigma} \| H_j^n - \hat{H}_j^n \|^2$, with $\hat{H}_j^n$ the reconstructed test data in w dimensions  (9)

The sum of squared distances is normalized in terms of the number of elements in the test matrix:

$\tilde{e}_T = \frac{1}{w\Sigma} e_T$  (10)

In order to compensate for possible overfitting of the random forest mapping and demapping ensembles ($H_G$ and $H_H$), a fraction of the base matrix samples is not included in training, but rather forms a validation set. The distance metric for validation base conditions ($e_V$) is calculated as for $e_T$ in Eq. (9), with the validation base matrix and reconstructed validation base matrix substituting the test matrices, and the relevant number of samples used for normalization.

The relative distance metric $\varepsilon$ is defined as the ratio of the distance metric for the test window to the distance metric for the validation base window samples:

$\varepsilon = \frac{\tilde{e}_T}{\tilde{e}_V}$  (11)

2.6. Detection threshold

The diagnostic is the ratio of, on the one side, the mean sum of square residuals between the test lag matrix and the reconstructed test lag matrix, and on the other, the mean sum of square residuals between the validation base lag matrix and the reconstructed validation base lag matrix. Given that the proposed method is an unsupervised approach, certain assumptions are inevitable. For the RF diagnostic, it is assumed that the first $2b + w$ instances of the time series can be considered as representing conditions where no change has occurred. This assumption should be considered during data preprocessing and the selection of the change detection technique to use.

When the assumption is valid, the next question is the actual value of the threshold. One could determine the distribution to which the diagnostic statistic would belong, estimate the parameters of said distribution, and select a threshold to represent a certain Type I and II error range. As the base window cannot be excessively large (to avoid smoothing over changes), only a restricted sample size is available to estimate the distribution parameters. To circumvent the issues of picking a distribution and collecting enough samples to estimate the distribution parameters accurately, a simple percentile approach is used.

The relative distance metric for the first $2b + w$ instances of the time series is calculated and the upper threshold $\varepsilon_{crit}$ is defined as the $1 - \alpha$ percentile of this sequence.

2.7. Choice of parameters

Significant changes in time series structure will be detected for any reasonable choice of parameters (Moskvina & Zhigljavsky, 2003). To detect a small change in noisy data, tuning of parameters may be required:

2.7.1. Choice of lag w and number of features a
The recommendation (Moskvina & Zhigljavsky, 2003) is to choose w = b/2 and the first a components to provide a good description of the signal.

2.7.2. Length and location of test samples τ and σ
Choose $\tau \geq b$. If the difference between τ and σ is too large (a large test window), then the behaviour of ε becomes too smooth. If it is too small (a small test window), ε is sensitive to noise.

2.7.3. Window width b
The choice of b depends on what kind of structural change is searched for. If b is too large, changes can be missed or smoothed out. If b is too small, an outlier may register as a structural change.

2.7.4. Subset of parameter settings
To obtain a robust diagnostic sequence, three parameter settings are used and the average inspected. The three settings as used in Moskvina and Zhigljavsky (2003) are applied, as demonstrated in Fig. 4.

Fig. 4. Test matrix positions for different parameter settings: (a) w = b/2, τ = w + 1, σ = b + 1; (b) w = b/2, τ = b, σ = b + w; (c) w = b/2, τ = b, σ = b + 1.

3. Change point detection based on singular spectrum analysis (SSA)

The random forest approach to change point detection in time series data described in Section 2 is a nonlinear counterpart of an approach previously described by Moskvina and Zhigljavsky (2003) and applied by others (Salgado & Alonso, 2006; Palomo,

Sanchis, Verdu, & Ginestar, 2003). This approach is based on the use of principal component analysis (PCA) and for comparative purposes, both algorithms will be evaluated in the case studies described in Sections 4 to 6. A brief summary of the PCA-based approach is therefore in order.

The SSA and RF algorithms are similar in that two segments of the multivariate time series are likewise embedded into a base and a test matrix. The SSA approach differs from the random forest model in that principal component analysis or singular value decomposition is used to embed the data (Moskvina & Zhigljavsky, 2003). As a result, it is not necessary to construct forward and reverse models to map new data to the feature space (principal components) or to reconstruct time series data from the feature space, as required by the random forest approach ($H_G$ and $H_H$). The details of the SSA change point detection approach are found in Moskvina and Zhigljavsky (2003). In summary, the distance metric is the sum of squared distances between the actual test lag matrix and the SSA reconstruction of the test lag matrix, using a eigenvectors for projection (U). The detection diagnostic is normalized by dividing by the number of elements in the test matrix, and standardized by dividing by the normalized sum of squared distances $\tilde{e}_B$ between the actual base matrix and the SSA reconstruction of the base matrix:

$e_T = \sum_{j=\tau+1}^{\sigma} \left[ (H_j^n)^T H_j^n - (H_j^n)^T U U^T H_j^n \right]$  (12)

$\tilde{e}_T = \frac{1}{w\Sigma} e_T$  (13)

$\varepsilon = \frac{\tilde{e}_T}{\tilde{e}_B}$  (14)

3.1. RF and SSA parameter selection

For both algorithms, the parameters were selected as follows:

- An appropriate base matrix extent (b) was chosen based on the autocorrelation function of the multivariate time series, as the largest lag value before the autocorrelation assumes an insignificant value.
- The lag parameter (w) was calculated as proposed by Moskvina and Zhigljavsky (2003), as w = b/2.
- The parameters τ and σ were likewise calculated, as used by Moskvina and Zhigljavsky (2003).

A significance level of $1 - \alpha = 0.99$ was employed.

4. Case study 1: Simulated time series data

In the first case study, four systems (T1, T2, T3, T4) are considered. The first three (T1-T3) represent univariate time series observations and the fourth (T4) multivariate time series. The systems are briefly described below:

- T1: 500 samples, of which the first 250 are identically, independently distributed Gaussian data, with zero mean and unit variance. The last 250 samples have a mean of 2 and unit variance to simulate an abrupt mean shift in the data.
- T2: 500 samples, of which the first 250 are identically, independently distributed Gaussian data, with zero mean and unit variance. The last 250 samples have a mean of 0 and a variance of 2 to simulate an abrupt shift in the variance of the data.
- T3: 500 samples of data generated by an autoregressive process of the form $x(t+1) = f x(t) + e(0, 0.1)$, where e has a zero mean Gaussian distribution with a variance of 0.1. In the first 250 samples, parameter f = 0.9 and in the second 250 samples, f = 0.5, to simulate an abrupt shift in the autocorrelation of the data.
- T4: Two cross-correlated time series of 500 samples each. The first 250 time series samples have a covariance matrix of $C = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, while the last 250 observations have a covariance matrix of $C = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix}$. This simulates an abrupt shift in the correlation between the variables.

Both algorithms were constructed on the first 250 observations of each of the four systems, with parameters as summarized in Tables 1 and 2. The subscripts in τ and σ refer to the three different settings for the time lag and test matrix parameters. $\tau_j$ and $\sigma_j$ (for j = 1, 2, 3) correspond with each of the base and test matrices shown in Fig. 4. As shown in Table 1, the dimensionality of the subspace to which the data were projected in the SSA-based algorithm ranged from 13 to 38 (a in Table 1), explaining more than 90% of the variance in the aggregated trajectory matrix. A significantly lower-dimensional subspace was generated with the random forest model, where a = 2 could explain more than 80% of the variance in all four data sets.

The performance of the two algorithms on the T1, T2, T3 and T4 data is given in Figs. 5-8. In all these figures, the observations themselves are shown on top, while the average diagnostic sequences of the SSA and random forest models are shown in the middle and bottom panels, respectively.

From Fig. 5, it is evident that the random forest algorithm detected the mean shift at the time index of 250, as did the SSA method. The random forest diagnostic sequence shows a lag before returning to a condition below the detection threshold, once the base window enters the mean-shifted section of the time series and the model is updated. Both algorithms flagged a change in the variance of the time series in Fig. 6, with a longer delay for the SSA-based approach. In Fig. 7, it can be seen that the SSA and RF algorithms both failed to detect the change in the autocorrelation structure of the T3 data after the 250th sample.

The change in the correlation between the two time series at the 250th sample in data set T4 was detected by the RF algorithm

Table 1
SSA parameter values for case study 1.

Data set  b   w   a   τ1  τ2  τ3  σ1  σ2  σ3  Σλa (%)
T1        50  25  19  26  50  50  51  75  51  90.8
T2        50  25  22  26  50  50  51  75  51  91.0
T3        50  25  13  26  50  50  51  75  51  90.7
T4        50  25  38  26  50  50  51  75  51  90.0

Table 2
RF parameter values for case study 1.

Data set  b   w   a   τ1  τ2  τ3  σ1  σ2  σ3  R²a (a)
T1        50  25  2   26  50  50  51  75  51  0.834
T2        50  25  2   26  50  50  51  75  51  0.840
T3        50  25  2   26  50  50  51  75  51  0.904
T4        50  25  2   26  50  50  51  75  51  0.849

(a) Determined with random forest model.
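The four test systems are straightforward to regenerate for experimentation. A sketch with NumPy (the seed and generator choices are assumptions; the paper does not specify them):

```python
import numpy as np

rng = np.random.default_rng(42)

# T1: abrupt mean shift (0 -> 2) at sample 250
t1 = np.r_[rng.normal(0, 1, 250), rng.normal(2, 1, 250)]

# T2: abrupt variance shift (1 -> 2) at sample 250
t2 = np.r_[rng.normal(0, 1, 250), rng.normal(0, np.sqrt(2), 250)]

# T3: AR(1) process x(t+1) = f x(t) + e, with f shifting 0.9 -> 0.5
t3 = np.zeros(500)
for t in range(499):
    f = 0.9 if t < 250 else 0.5
    t3[t + 1] = f * t3[t] + rng.normal(0, np.sqrt(0.1))  # noise variance 0.1

# T4: two series, uncorrelated then correlated (rho = 0.8) from sample 250
c0 = np.eye(2)
c1 = np.array([[1.0, 0.8], [0.8, 1.0]])
t4 = np.vstack([rng.multivariate_normal([0, 0], c0, 250),
                rng.multivariate_normal([0, 0], c1, 250)])

print(t1.shape, t4.shape)   # (500,) (500, 2)
```

Only the first 250 observations of each system would be used to construct the base and validation windows, matching the experimental protocol above.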

Fig. 5. Change point detection in simulated dataset (T1).

Fig. 6. Change point detection in simulated dataset (T2).

(with a delay), but not by the SSA algorithm. The SSA algorithm exhibited erratic behavior, as can be seen from the scale of the vertical axis of the graph in the middle panel.

5. Case study 2: Simulation of nonlinear chemical reactions

5.1. Belousov-Zhabotinsky (BZ) reaction

The Belousov-Zhabotinsky (BZ) reaction is an unstable chemical reaction that maintains self-oscillations and propagating waves, which may display chaos under certain conditions. A simplified model by Gyorgyi and Field (1991) is considered here. It has the following chemical scheme (Zhang et al., 1993):

X + Y + H → 2V  (15)

Y + A + 2H → V + X  (16)

2X → V  (17)

½X + A + H → X + Z  (18)

X + Z → ½X  (19)

V + Z → Y  (20)

Fig. 7. Change point detection in simulated dataset (T3).

Fig. 8. Change point detection in simulated dataset (T4).

Z + M → products  (21)

In (15)-(21), A = BrO3⁻, H = H⁺ and M = MA are chemical species with constant concentrations, while the concentrations of Y = Br⁻, X = HBrO2, Z = Ce(IV) and V = BrMA are the model variables. The rates (r) and rate constants (c) are (with [.] indicating the concentration of the relevant reactant):

$r_1 = c_1[H][X][Y], \quad c_1 = 4.0 \times 10^6$  (22)

$r_2 = c_2[A][H]^2[Y], \quad c_2 = 2$  (23)

$r_3 = c_3[X]^2, \quad c_3 = 3000$  (24)

$r_4 = c_4[A]^{0.5}[H]^{1.5}[C-Z][X]^{0.5}, \quad c_4 = 55.2$  (25)

$r_5 = c_5[X][Z], \quad c_5 = 7000$  (26)

$r_6 = g c_6[Z][V], \quad c_6 = 0.09$  (27)

$r_7 = h c_7[M][Z], \quad c_7 = 0.23$  (28)

By assuming that Y is a fast variable, the system of scaled differential equations is

$\frac{dx}{d\mu} = T\left( -c_1 H Y_0 x \tilde{y} + \frac{c_2 A H^2 Y_0}{X_0} \tilde{y} - 2 c_3 X_0 x^2 + \frac{1}{2} c_4 A^{0.5} H^{1.5} X_0^{-0.5} (C - Z_0 z) x^{0.5} - \frac{1}{2} c_5 Z_0 x z - c_f x \right)$  (29)

' % & (
dz C system states. As indicated in Fig. 9, the change in the parameters
T c4 A0:5 H1:5 X0:5 $z x0:5 $c5 X0 xz$gc6 V0 zv$hc7 Mz$cf x
dm 0
\[
\frac{dv}{dm} = T\left(\frac{2c_1 H X_0 Y_0}{V_0}\,x\tilde{y} + \frac{c_2 A H^2 Y_0}{V_0}\,\tilde{y} + \frac{c_3 X_0^2}{V_0}\,x^2 - g c_6 Z_0 z v - c_f v\right) \qquad (31)
\]

where m = t/T is the scaled time, with T = (10c2AHC)^-1, and x = X/X0, X0 = c2AH2/c5, Y0 = 4c2AH2/c5, z = Z/Z0, Z0 = CA/(40M), v = V/V0 and V0 = 4AHC/M2 are the scaled concentration variables. The approximation of the fast variable Y is

\[
\tilde{y} = \frac{g c_6 Z_0 V_0 z v}{\left(c_1 H X_0 x + c_2 A H^2 + c_f\right) Y_0} \qquad (32)
\]

where C is the total cerium ion concentration and g, h and cf are adjustable parameters. Zhang et al. (1993) have shown chaotic behaviour for these chemical conditions at A = 0.1 M, M = 0.25 M, H = 0.26 M, C = 0.000833 M, g = 6000/9 and h = 8/23 for different windows of cf, which is the flow rate.

The reaction model (29)-(31) was simulated with the ODE45 subroutine in MATLAB 7.2 with initial values of x0 = 0.0099, z0 = 2.2001 and v0 = 0.4582. After the first 1000 stationary data points were obtained, a fault condition was introduced in the form of a slow drift in the flow rate cf, from 4.5 x 10^-4 s^-1 to 5.0 x 10^-4 s^-1 over the next 1000 observations. For the last 1000 observations, the parameter cf was again kept constant. The parameters used for change point detection of the BZ reaction time series are given in Table 2.

In order to visualize the dynamics of the reaction system and the simulated parameter drift, singular value decomposition was performed on the covariance matrix of the lagged trajectory matrix of the entire time series. The three-dimensional scores were calculated with principal component analysis from the lag matrix, and are displayed in Fig. 9. The vertical legend on the right hand side indicates the history of the system, with dark blue indicating older system states and red indicating more recent states. The simulated drift of the system resulted in an outward shift of the trajectory.

5.2. Autocatalytic reaction

The autocatalytic process considered in this simulation consists of two parallel, isothermal autocatalytic reactions taking place in a continuous stirred tank reactor (CSTR) (Lee & Chang, 1996). The system is capable of producing self-sustained oscillations based on cubic autocatalysis with catalyst decay at certain parameters. The system proceeds mechanistically as follows:

A + 2B → 3B (33)

B → C (34)

D + 2B → 3B (35)

where A, B, C and D are participating chemical species. The reaction rates are governed by the following rate equations:

\[-r_A = c_1 [A][B]^2 \qquad (36)\]

\[r_C = c_2 [B] \qquad (37)\]

\[-r_D = c_3 [D][B]^2 \qquad (38)\]

where c1, c2 and c3 are the rate constants for the chemical reactions and [.] signifies concentrations. To describe the system by three ordinary differential equations, the dimensionless concentrations, ratios of species in the feed and dimensionless time are defined as follows:

\[X = \frac{[A]}{[A]_0}, \quad Y = \frac{[D]}{[D]_0}, \quad Z = \frac{[B]}{[B]_0} \qquad (39)\]

\[A = \frac{c_1 [B]_0^2 V}{Q}, \quad B = \frac{c_3 [B]_0^2 V}{Q}, \quad D = \frac{c_2 V}{Q} \qquad (40)\]

\[g_1 = \frac{[A]_0}{[B]_0}, \quad g_2 = \frac{[D]_0}{[B]_0} \qquad (41)\]
Fig. 9. BZ reaction system unfolded in phase space.
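The visualization step described above (lagged trajectory matrix, covariance matrix, eigendecomposition, three-dimensional scores) can be sketched in Python as follows. The window length and the toy oscillatory input series are illustrative assumptions, not the settings actually used to produce Fig. 9.

```python
import numpy as np

def lag_trajectory_matrix(series, window):
    """Stack lagged copies of a 1-D series into a trajectory (Hankel) matrix."""
    n = len(series) - window + 1
    return np.array([series[i:i + window] for i in range(n)])

def pca_scores(series, window=20, n_components=3):
    """Project the lagged trajectory matrix onto its leading principal components."""
    X = lag_trajectory_matrix(series, window)
    Xc = X - X.mean(axis=0)                  # centre the lag vectors
    cov = np.cov(Xc, rowvar=False)           # covariance of the lagged variables
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition of the covariance
    order = np.argsort(eigvals)[::-1]        # sort components by explained variance
    return Xc @ eigvecs[:, order[:n_components]]

# Illustrative use on a toy oscillatory series (not the BZ data itself)
t = np.linspace(0, 20 * np.pi, 2000)
scores = pca_scores(np.sin(t) + 0.3 * np.sin(3 * t), window=20, n_components=3)
print(scores.shape)  # -> (1981, 3)
```

Plotting the three score columns against each other unfolds the attractor in phase space in the manner of Figs. 9 and 10, with the sample index providing the colour coding.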



\[m = \frac{tQ}{V} \qquad (42)\]

The ordinary differential equations governing the system are subsequently defined by

\[\frac{dX}{dm} = 1 - X - AXZ^2 \qquad (43)\]

\[\frac{dY}{dm} = 1 - Y - BYZ^2 \qquad (44)\]

\[\frac{dZ}{dm} = 1 - (1 + D)Z + g_1 AXZ^2 + g_2 BYZ^2 \qquad (45)\]

Lee and Chang (1996) have shown that for A = 18,000; B = 400; D = 80; g1 = 1.5; g2 = 4.2 and with initial conditions X0 = 0; Y0 = 0; Z0 = 0, the system exhibits chaotic behaviour.

The system (43)-(45) was simulated with the ODE45 subroutine in MATLAB 7.2 at a sampling rate of 0.005. For the first 1000 observations the same parameters as specified by Lee and Chang (1996) were used. A simulated fault condition was then introduced through a slight increase in the feed concentrations of A and D, reflected in an increase in the parameters g1 and g2. The parameters g1 and g2 were allowed to drift slowly from their starting values, g1 = 1.5 and g2 = 4.2, to g1 = 1.55 and g2 = 4.25, after which the parameters were kept constant at their new values for another 1000 observations.

Fig. 10 shows a multivariate embedding of the samples of the Y and Z variables, where the colour coding gives an indication of the history of the data, exemplifying the gradual transition between two equilibrium states (red and blue). As before, the attractor shifted only slightly.

5.3. Monitoring of the nonlinear reaction system

The first 1000 samples were used to calculate the random forest and singular spectrum based change point detection models. The parameters of both models are shown in Tables 3 and 4. The dimensionality of the SSA and RF subspaces was identical in each data set, as indicated in the 4th column of each table. tj and sj (for j = 1, 2, 3) correspond with each of the base and test matrices shown in Fig. 4.

Table 3
SSA parameter values for case study 2.

Data set        b    w   a  t1  t2   t3   s1   s2   s3   Sl^a
BZ reaction     100  50  3  51  100  100  101  150  101  89.6
Autocatalytic   100  50  9  51  100  100  101  150  101  91.2

a Determined with random forest model.

Table 4
RF parameter values for case study 2.

Data set        b    w   a  t1  t2   t3   s1   s2   s3   R2^a
BZ reaction     100  50  3  51  100  100  101  150  101  0.984
Autocatalytic   100  50  9  51  100  100  101  150  101  0.886

a Determined with random forest model.

Fig. 11 shows the progression of variable Y in the BZ reaction as a time series (top panel). The middle and bottom panels show the performance of the SSA- and RF-based algorithms, respectively. Only the RF-based algorithm could detect the change (at around the 1300th observation). The SSA-based algorithm failed to detect any change in the system and instead gave a false alarm at the 900th sample.

Fig. 12 shows the progression of variables X and Y in the autocatalytic reaction as a time series (top panel). The middle and bottom panels show the performance of the SSA- and RF-based algorithms, respectively. Only the RF-based algorithm could detect the change (at around the 2000th observation), although the diagnostic is noisy. The SSA-based algorithm failed to detect any change in the system and instead gave several false alarms in the first 1000 samples.

Fig. 10. Autocatalytic reaction system unfolded in phase space.
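The simulation of system (43)-(45) can be reproduced in outline with SciPy's RK45 integrator, the counterpart of MATLAB's ODE45. This is a sketch under stated assumptions: the integration horizon and tolerances are illustrative, and the parameter change is applied as a single step rather than the gradual ramp used in the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Dimensionless parameters from Lee and Chang (1996)
A, B, D = 18000.0, 400.0, 80.0

def autocatalytic(m, s, g1, g2):
    """Right-hand side of the dimensionless CSTR model, Eqs. (43)-(45)."""
    X, Y, Z = s
    dX = 1.0 - X - A * X * Z**2
    dY = 1.0 - Y - B * Y * Z**2
    dZ = 1.0 - (1.0 + D) * Z + g1 * A * X * Z**2 + g2 * B * Y * Z**2
    return [dX, dY, dZ]

def simulate(g1, g2, n_samples=400, dm=0.005, s0=(0.0, 0.0, 0.0)):
    """Integrate with RK45 (SciPy's analogue of ODE45), sampled every dm = 0.005."""
    m_eval = dm * np.arange(n_samples)
    sol = solve_ivp(autocatalytic, (0.0, float(m_eval[-1])), s0, args=(g1, g2),
                    method="RK45", t_eval=m_eval, rtol=1e-8, atol=1e-10)
    return sol.y.T  # rows are observations of (X, Y, Z)

normal = simulate(1.5, 4.2)    # normal operating condition
faulty = simulate(1.55, 4.25)  # increased feed concentrations of A and D
print(normal.shape)
```

Stacking the two condition segments (with a ramp between them) yields a multivariate time series of the kind monitored in Figs. 10 and 12.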



Fig. 11. Detection of parameter drift in B-Z system by means of SSA and RF algorithms. A 90%-confidence limit was used to signal change.

Fig. 12. Detection of parameter drift in autocatalytic system by means of SSA and RF algorithms.
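The SSA benchmark used in these comparisons can be sketched as follows, in the spirit of Moskvina and Zhigljavsky (2003): lag vectors from a trailing base window define a signal subspace of dimension a via the singular value decomposition, and lag vectors in the leading test window are scored by their squared distance from that subspace. The window sizes and the toy step-change series below are illustrative assumptions, not the settings of Tables 3 and 4.

```python
import numpy as np

def hankel(series, w):
    """Trajectory matrix whose columns are length-w lag vectors."""
    return np.column_stack([series[i:i + w] for i in range(len(series) - w + 1)])

def ssa_statistic(series, b=100, w=50, a=3, test_len=50):
    """Sliding SSA change detection diagnostic: squared distance of test-window
    lag vectors from the base-window signal subspace of dimension a."""
    d = []
    for n in range(b + test_len, len(series)):
        base = hankel(series[n - b - test_len:n - test_len], w)  # trailing window
        test = hankel(series[n - test_len:n], w)                 # leading window
        U, _, _ = np.linalg.svd(base, full_matrices=False)
        P = U[:, :a]                      # leading a left singular vectors
        resid = test - P @ (P.T @ test)   # component outside the signal subspace
        d.append(np.sum(resid**2) / test.size)
    return np.array(d)

# Toy series with a mean shift at sample 400
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.1, 400), rng.normal(1.0, 0.1, 200)])
d = ssa_statistic(x)
print(d.shape)
```

An alarm is raised when the diagnostic exceeds a control limit estimated from its values under normal conditions, analogous to the 90%-confidence limits used in Figs. 11 and 12.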

6. Parameter and noise sensitivity

When using the parameter groupings proposed by Moskvina and Zhigljavsky (2003), the only free parameters to estimate are a and b. The number of components required, a, can be accurately determined from reconstruction errors. The choice of b is not as simple. To investigate the effect of b on the RF diagnostic, different values of b were used for detecting change in time series T1 employed in case study 1. The resulting diagnostic sequences are given in Fig. 13.

From Fig. 13, it is apparent that a low value of b (10) results in a noisy diagnostic sequence. As b increases, the diagnostic sequence becomes less noisy. The probable indicated change points for b-values of 30-90 remain around the actual change point of 250. For this time series, the RF diagnostic sequence is robust to changes in b, bar quite small values of b.

To investigate the effect of additive noise on the RF diagnostic, different levels of Gaussian noise were added to time series T1 employed in case study 1. Three levels of Gaussian noise, with standard deviations of 10%, 20% and 50% of the difference in means of the signal before and after the change, were added. The parameter b was set to 50. The results are shown in Fig. 14. From Fig. 14, the RF diagnostic seems to be adversely affected only when 50% Gaussian noise is added. This suggests that the RF relative distance diagnostic is, at least for this time series, not sensitive to reasonable noise levels.

7. Discussion and conclusions

In this paper, the use of random forest algorithms to detect changes in time series data is proposed and compared with the

Fig. 13. Random forest change point metrics for different parameter settings on T1.

Fig. 14. Random forest change point metrics for different noise levels on T1.

methodology of singular spectrum analysis, previously proposed by Moskvina and Zhigljavsky (2003). The two methodologies are similar in that two windows in series are slid across the time series data being monitored, and projections of the data in the leading window are compared with projections of the data in the trailing window. However, they differ in the way in which the data are projected and reconstructed for comparative purposes. Among other things, singular spectrum analysis is a linear method of projection, based on the covariance matrix of the observed variables and lagged copies of these variables. In contrast, random forests generate nonlinear projections of the data from a rank-based proximity matrix. In principle, this gives an advantage over the linear projections, in that these nonlinear projections can approximate more complex data manifolds than linear projections based on linear principal component models.

In the case studies considered in this investigation, the random forest based algorithms performed better than the algorithms based on singular spectrum analysis, especially with regard to the two nonlinear dynamic systems. Unlike the singular spectrum approach, the random forest algorithm has the drawback of high computational expense, which scales as O(N^2) with an increase in the observation sample size. Despite this, the use of random forests to monitor time series data appears to be promising and justifies further work in this area.

References

Andersen, T., Carstensen, J., Hernandez-Garcia, E., & Duarte, C. M. (2009). Ecological thresholds and regime shifts: approaches to identification. Trends in Ecology & Evolution, 24(1), 49-57.
Andreou, E., & Ghysels, E. (2006). Monitoring disruptions in financial markets. Journal of Econometrics, 135(1-2), 77-124.
Apostolopoulos, G. (2008). On-line statistical processing of radiation detector pulse trains with time-varying count rates. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 595(2), 464-473.
Bartlett, & Uhrig (1991). Nuclear power plant diagnostics using artificial neural networks. In Proceedings of the Frontiers in Innovative Computing for the Nuclear Industry (pp. 644-653). Jackson, Wyoming, USA, September 15-18.
Basseville, M., & Nikiforov, I. V. (1993). Detection of abrupt changes: theory and application. Englewood Cliffs, NJ, USA: Prentice-Hall.
Boller, C. (2000). Next generation structural health monitoring and its integration into aircraft design. International Journal of Systems Science, 31(11), 1333-1349.
Borguet, S., & Leonard, O. (2009). Coupling principal component analysis and Kalman filtering algorithms for on-line aircraft engine diagnostics. Control Engineering Practice, 17(4), 494-502.
Breiman, L., & Cutler, A. (2003). Random Forests Manual v4.0. Technical report. UC Berkeley, ftp://ftp.stat.berkeley.edu/pub/users/breiman/Using_random_forests_v4.0.pdf.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
Camci, F., & Chinnam, R. B. (2008). General support vector representation machine for one-class classification of non-stationary classes. Pattern Recognition, 41(10), 3021-3034.
Carslaw, D. C., & Carslaw, N. (2007). Detecting and characterising small changes in urban nitrogen dioxide concentrations. Atmospheric Environment, 41(22), 4723-4733.
Gombay, E., & Serban, D. (2009). Monitoring parameter change in AR(p) time series models. Journal of Multivariate Analysis, 100(4), 715-725.
Gyorgyi, L., & Field, R. J. (1991). Simple models of deterministic chaos in the Belousov-Zhabotinsky reaction. Journal of Physical Chemistry, 95(17), 6594-6602.
Ha, K.-J., & Ha, E. (2005). Climatic change and interannual fluctuations in the long-term record of monthly precipitation for Seoul. International Journal of Climatology, 26(5), 607-618.
Hohle, M., Paul, M., & Held, L. (2009). Statistical approaches to the monitoring and surveillance of infectious diseases for veterinary public health. Preventive Veterinary Medicine, 91(1), 2-10.
Kim, A. Y., Marzban, C., Percival, D. B., & Stuetzle, W. (2009). Using labeled data to evaluate change detectors in a multivariate streaming environment. Signal Processing, 89(12), 2529-2536.

Lee, J. S., & Chang, K. S. (1996). Applications of chaos and fractals in process systems engineering. Journal of Process Control, 6(2/3), 71-87.
Li, Y. G., & Nilkitsaranont, P. (2009). Gas turbine performance prognostic for condition-based maintenance. Applied Energy, 86(10), 2152-2161.
Lopez, I., & Sarigul-Klijn, N. (2009). Distance similarity matrix using ensemble of dimensional data reduction techniques: Vibration and aeroacoustic case studies. Mechanical Systems and Signal Processing, 23(7), 2287-2300.
Luo, Y., Li, Z., & Wang, Z. (2009). Adaptive CUSUM control chart with variable sampling intervals. Computational Statistics and Data Analysis, 53(7), 2693-2701.
Messaoud, A., & Weihs, C. (2009). Monitoring a deep hole drilling process by nonlinear time series modeling. Journal of Sound and Vibration, 321(3-5), 620-630.
Moskvina, V., & Zhigljavsky, A. (2003). An algorithm based on singular spectrum analysis for change-point detection. Communications in Statistics: Simulation and Computation, 32(2), 319-352.
Nielsen, A. A., & Canty, M. J. (2008). Kernel principal component analysis for change detection. SPIE Europe Remote Sensing Conference. Cardiff, Great Britain, 15-18 September 2008. http://www.imm.dtu.dk/pubdb/p.php?5667.
Ogata, Y. (1989). Statistical model for standard seismicity and detection of anomalies by residual analysis. Tectonophysics, 169(1-3), 159-174.
Oh, K. J., & Han, I. (2000). Using change-point detection to support artificial neural networks for interest rates forecasting. Expert Systems with Applications, 19(2), 105-115.
Oh, K. J., Moon, M. S., & Kim, T. Y. (2005). Variance change point detection via artificial neural networks for data separation. Neurocomputing, 68, 239-250.
Olier, I., & Vellido, A. (2008). Advances in clustering and visualization of time series using GTM through time. Neural Networks, 21(7), 904-913.
Palomo, M. J., Sanchis, R., Verdu, G., & Ginestar, D. (2003). Analysis of pressure signals using a singular system analysis (SSA) methodology. Progress in Nuclear Energy, 43(1-4), 329-336.
Piryatinska, A., Terdik, G., Woyczynski, W. A., Loparo, K. A., Scher, M. S., & Zlotnik, A. (2009). Automated detection of neonate EEG sleep stages. Computer Methods and Programs in Biomedicine, 95(1), 31-46.
Qing, X. P., Chan, H.-L., Beard, S. J., & Kumar, A. (2006). An active diagnostic system for structural health monitoring of rocket engines. Journal of Intelligent Material Systems and Structures, 17(7), 619-628.
Salgado, D. R., & Alonso, F. J. (2006). Tool wear detection in turning operations using singular spectrum analysis. Journal of Materials Processing Technology, 171, 451-458.
Shi, T., & Horvath, S. (2006). Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics, 15(1), 118-138.
Shu, L., Jiang, W., & Wu, Z. (2008). Adaptive CUSUM procedures with Markovian mean estimation. Computational Statistics & Data Analysis, 52(9), 4395-4409.
Srivastav, A., Ray, A., & Gupta, S. (2009). An information-theoretic measure for anomaly detection in complex dynamical systems. Mechanical Systems and Signal Processing, 23(2), 358-371.
Staudacher, M., Telser, S., Amann, A., Hinterhuber, H., & Ritsch-Marte, M. (2005). A new method for change-point detection developed for on-line analysis of the heart beat variability during sleep. Physica A: Statistical Mechanics and its Applications, 349(3-4), 582-596.
Tartakovsky, A. G., Rozovskii, B. L., Blazek, R. B., & Kim, H. (2006). A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods. IEEE Transactions on Signal Processing, 54(9), 3372-3382.
Wang, W. J., Chen, J., Wu, X. K., & Wu, Z. T. (2001). The application of some non-linear methods in rotating machinery fault diagnosis. Mechanical Systems and Signal Processing, 15(4), 697-705.
Zang, C., & Imregun, M. (2001). Structural damage detection using artificial neural networks and measured frequency response function data reduced via principal component projection. Journal of Sound and Vibration, 242(5), 813-827.
Zhang, D., Gyorgyi, L., & Peltier, W. R. (1993). Deterministic chaos in the Belousov-Zhabotinsky reaction: Experiments and simulations. Chaos, 3(4), 723-745.
