Abstract
Support vector machine, chaos theory, and particle swarm optimization are combined to build a prediction model of
dam safety, and approaches are proposed to optimize the input and the parameters of the model. First, phase space
reconstruction of the prototype monitoring data series on dam behavior is implemented, and a method for identifying
chaotic characteristics in the monitoring data series is presented. Second, support vector machine is adopted to build the
prediction model of dam safety. The characteristic vector of the historical monitoring data, which is taken as the support
vector machine input, is extracted by phase space reconstruction. The chaotic particle swarm optimization algorithm is
introduced to determine the support vector machine parameters, and a chaotic support vector machine-based prediction
model of dam safety is built. Finally, the displacement behavior of an actual dam is taken as an example, and the prediction
capability of the built model of dam displacement is evaluated. The results indicate that the proposed chaotic support vector
machine-based model provides more accurate forecasts and is better suited to identifying dam behavior efficiently.
Keywords
Dam safety, prediction model, support vector machine, chaos theory, particle swarm optimization
system can be extracted from the historical data. The nonlinear model is selected to approximate the dynamic system characteristics in the reconstructed phase space, and the behavior within a certain period can be forecasted. Considering the chaotic characteristics in prototype monitoring data series on dam safety, this article adopts phase space reconstruction to extract the characteristic vector of historical monitoring data, which is taken as the SVM input. The chaotic support vector machine (CSVM)-based prediction model of dam safety is studied.

The SVM parameters remarkably influence the generalization ability of an SVM-based model. Conventionally, the grid search method and the genetic algorithm can be used to select the SVM parameters. However, the grid search method requires a large amount of calculation and a long operation time. The genetic algorithm is often complicated and needs different crossover or mutation modes to be designed for different optimization problems. Particle swarm optimization (PSO), which searches for the optimal solution through individual collaboration, has advantages such as a simple structure and easy implementation. However, it is difficult to ensure the quality of the initial particles, and the algorithm easily falls into a local optimal solution at a later stage. In this article, chaotic particle swarm optimization (CPSO) is introduced to obtain the optimal parameters of the SVM model.

Phase space reconstruction and chaotic characteristics identification of prototype monitoring data series on dam behavior

Phase space reconstruction method of prototype monitoring data series

To analyze a chaotic time series, phase space reconstruction needs to be implemented first. It is considered that the evolution of any component of a system is determined by the other interacting components and is the comprehensive reflection of their interaction. The development process of any component therefore implies information on the other components of the system. So, the phase space of a system can be reconstructed by revealing the implicit information in the time series of one component. For the observed component, its observations at delay points of some fixed time are regarded as new components. The new components can be taken to reconstruct an equivalent phase space.

Assume that x represents the observed component and x(t) (t = 1, 2, ..., N) is a monitoring data series. The phase space reconstruction can be fulfilled as follows. An appropriate delay time $\tau$ and embedding dimension m are chosen. According to x(t), $\tau$, and m, a new series is obtained as follows

$$X(t) = \{x(t),\, x(t+\tau),\, \ldots,\, x(t+(m-1)\tau)\} \tag{1}$$

where t = 1, 2, ..., M and $M = N - (m-1)\tau$. This m-dimensional state space, which consists of the observed value and its delayed values, is the reconstructed phase space. Obviously, the determination of the delay time $\tau$ and the embedding dimension m in equation (1) is a key step of phase space reconstruction.

The delay time $\tau$ and the embedding dimension m are traditionally determined separately. In fact, a certain correlation exists between the two parameters. The quality of phase space reconstruction is affected by the independent choice of an appropriate delay time $\tau$ and embedding dimension m and, especially, by the determination of the embedding window width $\tau_w = (m-1)\tau$.

In the C-C method proposed by Kim et al.,11 a statistic constructed with the correlation integral of the embedding time series is used to describe the correlation of a nonlinear time series. The delay time $\tau$ and the embedding window width $\tau_w$ are estimated, and the embedding dimension is then calculated. The C-C method has advantages such as a small amount of calculation and easy operation. A detailed description of the C-C method is as follows.

Assume that x(n) (n = 1, 2, ..., N) represents a monitoring data series on which the phase space reconstruction is carried out, $\tau$ is the delay time, and m is the embedding dimension. The number of phase points is $M = N - (m-1)\tau$. The number of related vector pairs among the M points in the phase space is calculated. Vectors whose distance is less than a given positive number r are called related vectors. In the reconstructed phase space, the correlation integral of the embedding time series X(i) is defined as the proportion of related vector pairs among the $M^2$ possible pairs, namely

$$C(m, N, r, \tau) = \frac{1}{M^2} \sum_{i,j=1}^{M} \theta\left(r - \|X(i) - X(j)\|\right) \tag{2}$$

where

$$\theta(x) = \begin{cases} 0, & x \le 0 \\ 1, & x > 0 \end{cases} \tag{3}$$

The time series x(n) (n = 1, 2, ..., N) is divided into $\tau$ disjoint subsequences with the length of $N/\tau$, namely

$$\begin{aligned} &\{x(1),\, x(\tau+1),\, x(2\tau+1),\, \ldots\} \\ &\{x(2),\, x(\tau+2),\, x(2\tau+2),\, \ldots\} \\ &\qquad\vdots \\ &\{x(\tau),\, x(\tau+\tau),\, x(2\tau+\tau),\, \ldots\} \end{aligned}$$
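As an illustration of equations (1) to (3), the following minimal Python sketch builds the delay vectors and evaluates the correlation integral for a short series. It is a sketch under stated assumptions rather than the authors' implementation: the function names `delay_embed`, `max_norm`, and `correlation_integral` are hypothetical, and the maximum norm is assumed because the text does not specify which norm is used. As in equation (2), all $M^2$ ordered pairs are counted, including the pairs with i = j.

```python
def delay_embed(x, m, tau):
    """Return the delay vectors X(t) = (x(t), x(t+tau), ..., x(t+(m-1)tau))."""
    M = len(x) - (m - 1) * tau          # number of phase points, M = N - (m-1)tau
    if M <= 0:
        raise ValueError("series too short for the chosen m and tau")
    return [[x[t + k * tau] for k in range(m)] for t in range(M)]


def max_norm(a, b):
    """Maximum-norm distance between two vectors (an assumed choice of norm)."""
    return max(abs(p - q) for p, q in zip(a, b))


def correlation_integral(x, m, tau, r):
    """Share of the M^2 ordered pairs (i, j) whose distance is below r, as in equation (2)."""
    X = delay_embed(x, m, tau)
    M = len(X)
    # theta(r - d) equals 1 only when r - d > 0, following equation (3).
    related = sum(1 for a in X for b in X if r - max_norm(a, b) > 0)
    return related / M ** 2


# Toy series 0, 1, ..., 9 (illustrative data only).
x = [float(i) for i in range(10)]
X = delay_embed(x, m=3, tau=2)
print(len(X), X[0])                          # 6 [0.0, 2.0, 4.0]
print(correlation_integral(x, 3, 2, r=1.5))  # 16 of the 36 pairs are related
```

The double loop costs on the order of $M^2$ distance evaluations, which is acceptable for a sketch but would normally be vectorized for series with thousands of points.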
Su et al. 641
The statistic S(m, N, r, $\tau$) of the subsequences is calculated as follows

$$S(m, N, r, \tau) = \frac{1}{\tau} \sum_{l=1}^{\tau} \left\{ C_l(m, N/\tau, r, \tau) - C_l^{m}(1, N/\tau, r, \tau) \right\} \tag{4}$$

where $C_l$ is the correlation integral obtained from the lth subsequence.

S(m, N, r, $\tau$) reflects the autocorrelation characteristic of the time series. If the time series is independent and identically distributed, then for given m and $\tau$, S(m, N, r, $\tau$) is equal to 0 for all r when $N \to \infty$. However, an actual time series is finite and its elements are significantly correlated. Therefore, S(m, N, r, $\tau$) is not equal to 0 in general. A locally optimal time can be selected where the curve of S(m, N, r, $\tau$) against $\tau$ crosses zero or where it shows the minimum variation over all neighborhood radii r. In these cases, the points in phase space are almost uniformly distributed, and the chaotic trajectory is completely unfolded in phase space. So the radii r corresponding to the maximum and minimum of S(m, N, r, $\tau$) are chosen, and the following difference is defined

$$\Delta S(m, N, \tau) = \max\{S(m, N, r_i, \tau)\} - \min\{S(m, N, r_j, \tau)\}, \quad i \ne j \tag{5}$$

The curve of $\Delta S(m, N, \tau)$ against $\tau$ can be used to measure the maximum deviation of S(m, N, r, $\tau$) over all neighborhood radii r. The zero crossings of S(m, N, r, $\tau$) against $\tau$ are the same for all m and r, and the minima of $\Delta S(m, N, \tau)$ against $\tau$ are also the same for all m. The delay time $\tau$ corresponds to the smallest of these locally optimal times.

It is indicated that good results can be obtained when $2 \le m \le 5$, $\sigma/2 \le r \le 2\sigma$, and $N \ge 500$, where $\sigma$ is the standard deviation of the time series. According to equations (4) and (5), three statistics are defined as follows

$$\bar{S}(\tau) = \frac{1}{16} \sum_{m=2}^{5} \sum_{j=1}^{4} S(m, N, r_j, \tau) \tag{6}$$

$$\Delta\bar{S}(\tau) = \frac{1}{4} \sum_{m=2}^{5} \Delta S(m, N, \tau) \tag{7}$$

$$S_{cor}(\tau) = |\bar{S}(\tau)| + \Delta\bar{S}(\tau) \tag{8}$$

In the C-C method, the delay time $\tau$ is taken as the smaller of the two locally optimal times corresponding to the first zero crossing of $\bar{S}(\tau)$ and the first minimum of $\Delta\bar{S}(\tau)$, and the embedding window width $\tau_w$ is taken as the time corresponding to the minimum of $S_{cor}(\tau)$.

Chaotic characteristic identification method of prototype monitoring data series

Chaotic characteristics of prototype monitoring data series can be identified by calculating the characteristic parameters of the strange attractor of a chaotic signal. The common parameters include the Lyapunov exponent (or the largest Lyapunov exponent), which describes the divergence rate of adjacent trajectories; the correlation dimension, which describes the attractor dimension; and the Kolmogorov entropy, which reflects the information frequency. These three parameters are attractor invariants. In this article, the largest Lyapunov exponent and the correlation dimension are used to identify the chaotic characteristics of prototype monitoring data series on dam behavior.

Lyapunov exponent. A chaotic system is sensitive to its initial value. A slight variation of the initial system state causes exponential divergence of the system behavior with time, although the behavior eventually converges to a strange attractor. The Lyapunov exponent is a parameter describing the divergence rate of adjacent trajectories. The Lyapunov exponent-based method identifies the chaotic characteristics of prototype monitoring data series according to the diffusion motion of the phase orbit.

A chaotic system has at least one Lyapunov exponent which is greater than zero. This is an important feature distinguishing a strange attractor from other attractors. So, usually only the largest Lyapunov exponent $\lambda_1$ is estimated to implement the chaotic identification of actual dynamic systems, with the Wolf method, the Jacobian method, or the small data sets-based method.12 The small data sets-based method requires little calculation and is easy to operate. Its calculation steps can be described as follows:

1. Determine the delay time $\tau$, the embedding dimension m, and the average period p.
2. Reconstruct the phase space

$$X(t) = \{x(t),\, x(t+\tau),\, \ldots,\, x(t+(m-1)\tau)\}, \quad t = 1, 2, \ldots, M$$

3. Search for the nearest point $X(\hat{t})$ of every point X(t) in the phase space and limit short separations, namely

$$d_t(0) = \min_{\hat{t}} \|X(t) - X(\hat{t})\|, \quad |t - \hat{t}| > p \tag{9}$$

where $\hat{t}$ = 1, 2, ..., M and $t \ne \hat{t}$.
4. Calculate the distance $d_t(i)$ of neighborhood point for every point X(t) in the phase space after i
642 Structural Health Monitoring 15(6)
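Steps 1 to 3 of the small data sets-based method can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the Euclidean norm is assumed for the distance in equation (9), the names `delay_embed` and `nearest_neighbors` are hypothetical, and the delay time, embedding dimension, and average period are passed in by the caller rather than estimated from the data. The sine series is toy data and is not chaotic; it only exercises the neighbor search.

```python
import math


def delay_embed(x, m, tau):
    """Step 2: delay vectors X(t) = (x(t), x(t+tau), ..., x(t+(m-1)tau))."""
    M = len(x) - (m - 1) * tau
    return [[x[t + k * tau] for k in range(m)] for t in range(M)]


def nearest_neighbors(X, p):
    """Step 3: for every X(t), find the nearest X(t_hat) with |t - t_hat| > p.

    Returns a list of (t_hat, d_t(0)) tuples, one per phase point.
    """
    result = []
    for t, a in enumerate(X):
        best_idx, best_dist = None, math.inf
        for s, b in enumerate(X):
            if abs(t - s) <= p:          # exclude temporally close points
                continue
            d = math.dist(a, b)          # Euclidean distance (assumed norm)
            if d < best_dist:
                best_idx, best_dist = s, d
        result.append((best_idx, best_dist))
    return result


# Toy periodic series with a period of about 21 samples (illustrative only).
x = [math.sin(0.3 * n) for n in range(200)]
X = delay_embed(x, m=3, tau=5)           # step 1 values chosen by hand here
pairs = nearest_neighbors(X, p=21)
print(len(X), pairs[0])
```

Excluding points closer than the average period p in time keeps the search from pairing a point with its own immediate past or future on the same orbit segment, which would understate the divergence rate.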
case such as h = 1, the following first component of X(t + h) can be obtained from equation (17)

$$x(t+1) = f\{x(t),\, x(t-\tau),\, \ldots,\, x(t-(m-1)\tau)\} \tag{18}$$

Obviously, $\{x(t),\, x(t-\tau),\, \ldots,\, x[t-(m-1)\tau]\}$ can be regarded as the input vector of the SVM which is used to fulfill the regression prediction of x(t + 1).

CSVM parameter selection

The SVM parameter selection has a great influence on the learning effect and generalization ability of the SVM which is used to carry out the regression analysis.15-18 The grid search method is usually used to select the penalty factor C and the parameter $\gamma$ of the radial basis function (RBF) kernel. It searches for the optimal parameter combination in an exhaustive series of parameter combinations, which takes a long time. Therefore, some intelligent optimization algorithms such as PSO are combined with SVM to search for the SVM parameters. PSO first randomly initializes a particle group in a search space. The position of each particle is a solution of the optimization problem. Each particle has a fitness value measuring its performance, and a speed which decides its flight direction and distance. Then, the particles track the current optimal particle and dynamically adjust their speeds and positions, and the optimal solution is found by iteration. In each iteration, a particle updates itself by tracking the following two extremes. One is the individual extreme pbest, namely, the current optimal solution found by the particle itself. The other is the global extreme gbest, namely, the current optimal solution found by the entire population. Compared with the genetic algorithm, the PSO algorithm finds the optimal solution through individual collaboration. The PSO algorithm has two shortcomings.19 One is that the initialization process is random, so the quality of each particle cannot be guaranteed. The other is that it easily falls into a local optimal solution at a later stage.

As an improved PSO, the CPSO takes advantage of the ergodicity of chaotic motion to find a particle swarm with good individual quality when the particle swarm is initialized. A chaotic disturbance is applied to the particles when they are updated so that the solution can escape a local extreme interval. Logistic mapping is a typical chaotic system, and its iterative formula can be expressed as follows

$$z_{i+1} = \mu z_i (1 - z_i), \quad i = 0, 1, 2, \ldots, \quad \mu \in (2, 4] \tag{19}$$

When the control parameter $\mu = 4$ and $0 \le z_0 \le 1$, the logistic map is completely in a chaotic state. According to the chaotic motion characteristics, the optimal search can be implemented as follows. A set of chaotic variables, equal in number to the optimized variables, is generated. The chaos is added to the optimized variables in a way similar to a signal carrier, so that the variables present the chaotic state. At the same time, the ergodic range of the chaotic motion is extended to the value range of the optimized variables. Then, the chaotic variables are directly taken to search. Because chaotic motion has characteristics such as randomness, ergodicity, and sensitivity to initial conditions, the chaos-based search is better than other random searches.

The selection of the penalty factor C and the RBF kernel function parameter $\gamma$ is equivalent to solving the following two-dimensional (2D) optimization problem

$$\begin{aligned} &\min f(x_1, x_2) \\ &\text{s.t. } a_i \le x_i \le b_i, \quad i = 1, 2 \end{aligned} \tag{20}$$

where the optimized variable $x_1$ represents the penalty factor C and the optimized variable $x_2$ represents the parameter $\gamma$ of the RBF kernel function. The objective function, namely, the fitness value f, is the average of the forecast mean square errors over k validation sets. Considering that a large penalty factor C can decrease the generalization ability of the model, C should be kept as small as possible while a certain prediction accuracy on cross validation of the training set is ensured.

The CPSO can be used to select the penalty factor C and the RBF kernel function parameter $\gamma$ as follows. The flowchart is shown in Figure 1:

1. Implement the chaotic particle initialization.
   Generate randomly one 2D vector $z_1 = (z_{11}, z_{12})$ whose components are between 0 and 1. According to equation (19), $z_{i+1,j} = 4 z_{ij}(1 - z_{ij})$, i = 1, 2, ..., N - 1 and j = 1, 2. N particles, $z_1, z_2, \ldots, z_N$, can be obtained.
   Add the components of $z_i$ to the optimized variables in the signal carrier way, $x_{ij} = a_j + (b_j - a_j) z_{ij}$, i = 1, 2, ..., N and j = 1, 2. N particles, $x_1, x_2, \ldots, x_N$, can be generated.
   Calculate the fitness values, $f_{N \times 1}$, for all particles, and select the m particles with better performance among the N particles to form the initial particle swarm $x_{m \times 2}$. For the selected m particles, the fitness values $f_i$ and the first components $x_{i1}$ (namely, the penalty factor C) are both smaller. The initial speed is $v_{m \times 2}$ = zeros(m, 2).
   Initialize the individual extreme, $pxbest_0 = x_{m \times 2}$. The corresponding fitness value is $pfbest_0 = f_{m \times 1}$.
   Initialize the global extreme, $gxbest_0$, and the corresponding fitness value, $gfbest_0$, according to the fitness values $f_i$ and the first components $x_{i1}$ (namely, the penalty factor C) of the m particles.
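The chaotic initialization in step 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: the names `logistic_sequence`, `chaotic_init`, and `toy_cvmse` are hypothetical, the fitness function is a toy stand-in for the cross validation error of a trained SVM, and ranking by fitness first and by the smaller first component (C) second is one assumed reading of "both smaller". The search range of $2^{-8}$ to $2^{8}$ for both variables and the sizes N = 50 and m = 10 are used here only as example values.

```python
import math
import random


def logistic_sequence(z1, n):
    """Iterate z_{i+1} = 4 z_i (1 - z_i) component-wise and return n chaotic vectors."""
    seq = [list(z1)]
    for _ in range(n - 1):
        seq.append([4.0 * z * (1.0 - z) for z in seq[-1]])
    return seq


def chaotic_init(fitness, bounds, n_random, n_keep, seed=0):
    """Generate n_random chaotic particles and keep the n_keep best as the initial swarm."""
    rng = random.Random(seed)
    z1 = [rng.uniform(0.01, 0.99) for _ in bounds]       # random start in (0, 1)
    zs = logistic_sequence(z1, n_random)
    # Carrier mapping x_ij = a_j + (b_j - a_j) z_ij onto the search range.
    xs = [[a + (b - a) * z for (a, b), z in zip(bounds, row)] for row in zs]
    # Assumed ranking: smaller fitness first, then smaller first component (C).
    ranked = sorted(xs, key=lambda x: (fitness(x), x[0]))
    swarm = ranked[:n_keep]
    velocity = [[0.0] * len(bounds) for _ in swarm]      # zeros(m, 2)
    return swarm, velocity


def toy_cvmse(x):
    """Hypothetical stand-in for the CVMSE of an SVM trained with C = x[0], gamma = x[1]."""
    return (math.log2(x[0]) - 6.0) ** 2 + (math.log2(x[1]) + 7.0) ** 2


bounds = [(2.0 ** -8, 2.0 ** 8), (2.0 ** -8, 2.0 ** 8)]   # ranges of C and gamma
swarm, velocity = chaotic_init(toy_cvmse, bounds, n_random=50, n_keep=10)
pxbest = [list(p) for p in swarm]                         # individual extremes
pfbest = [toy_cvmse(p) for p in swarm]
gxbest, gfbest = pxbest[0], pfbest[0]                     # best of the ranked swarm
print(len(swarm), gxbest, gfbest)
```

In a real run the fitness call would train and cross-validate an SVM for each candidate pair, so evaluating all N particles once and keeping only m of them trades a modest up-front cost for a better starting swarm.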
Figure 3. Time curve on observed horizontal displacement of No. 5 dam section crest.
fitting and forecasting ability of built model and conventional model is compared.

reconstruction of monitoring data series, namely x(t), t = 1, 2, ..., 1825, is implemented. The phase space vector series is obtained as follows
Table 1. CPSO algorithm parameters.

Parameter                            Value
Random particle number (N)           50
Chaotic particle number (m)          10
Maximum iteration number (k_max)     10
Maximum particle speed (v_max)       0.6 × 2^8
Acceleration constant (c1)           1.5 × rand
Acceleration constant (c2)           1.7 × rand
Chaotic disturbance range [-β, β]    [-1, 1]

the output variable. For the 1746 sample points of dam displacement, the 1382 sample points observed from 2003 to 2006 form a training set used to establish the prediction model of dam displacement, and the 364 sample points observed in 2007 form a test set used to judge the prediction performance of the built model.

Model parameters selection. The RBF kernel function is selected and the insensitive loss function parameter is $\varepsilon$ = 0.001. The grid search method and the CPSO are used to determine the penalty factor C and the parameter $\gamma$ of the RBF kernel function. The ranges of C and $\gamma$ are both set as $2^{-8}$ to $2^{8}$. The evaluation indexes of model performance are taken as the penalty factor C and the cross validation mean square error (CVMSE). When the CVMSE difference between two parameter sets is not more than $10^{-5}$, the parameter set with the smaller C is better.

The grid search method uses the logarithmic form, namely, $\log_2 C$ and $\log_2 \gamma$, to construct the grid. The step length of $\log_2 C$ and $\log_2 \gamma$ is 0.8; that is, C and $\gamma$ change by a factor of $2^{0.8} \approx 1.7411$ per step. Figure 8 shows the search results on optimal parameters by the grid search method. The obtained optimal parameters are the penalty factor C = 84.4485 and the RBF kernel function parameter $\gamma$ = 0.0068. The corresponding CVMSE is 0.0026, and the time consumed is 168 s.

The CPSO algorithm takes x = (C, $\gamma$) as the particle and the CVMSE as the fitness value. Table 1 lists the parameters related to the CPSO algorithm. The search results on optimal parameters by the CPSO algorithm are shown in Figure 9. The obtained optimal parameters are the penalty factor C = 63.6498 and the RBF kernel function parameter $\gamma$ = 0.0039. The corresponding CVMSE is 0.0026, and the time consumed is 88 s.

Prediction model of dam displacement and its performance. The determined parameters and training set of CSVM are taken to train CSVM. The
CSVM-based prediction model of dam displacement can be obtained. Figure 10 shows the fitting and prediction results of the CSVM-based prediction model with the optimal parameters searched by the CPSO algorithm. Table 2 lists the evaluation indexes on the fitting and prediction performance of the two CSVM-based prediction models with the optimal parameters searched by the CPSO algorithm and by the grid search method. It can be seen from Table 2 that the two models are well matched in their fitting and prediction abilities of dam displacement; however, the CPSO algorithm spends less time than the grid search method on searching for the optimal parameters.

Figure 10. Calculated results of CSVM-based model with optimal parameters searched by CPSO algorithm.

Table 2. Fitting and prediction abilities of two CSVM-based models with optimal parameters searched by CPSO (Model I) and by grid search method (Model II).

Conclusion

Considering the chaotic characteristics in prototype monitoring data series on dam behavior, the learning method of SVM and chaos theory are integrated to study the establishment problem on prediction model