Article info
Article history:
Received 24 September 2010
Received in revised form 5 January 2011
Accepted 14 August 2011
Available online 1 September 2011
Keywords:
Particle swarm optimization (PSO)
Gradient descent
Global optimization techniques
Stochastic optimization
Abstract
Stochastic optimization algorithms like genetic algorithms (GAs) and particle swarm optimization (PSO) algorithms perform global optimization but waste computational effort by doing a random search. On the other hand, deterministic algorithms like gradient descent converge rapidly but may get stuck in the local minima of multimodal functions. Thus, an approach that combines the strengths of stochastic and deterministic optimization schemes but avoids their weaknesses is of interest. This paper presents a new hybrid optimization algorithm that combines the PSO algorithm and gradient-based local search algorithms to achieve faster convergence and better accuracy of the final solution without getting trapped in local minima. In the new gradient-based PSO algorithm, referred to as the GPSO algorithm, the PSO algorithm is used for global exploration and a gradient-based scheme is used for accurate local exploration. The global minimum is located by a process of finding progressively better local minima. The GPSO algorithm avoids the use of inertial weights and constriction coefficients, which can cause the PSO algorithm to converge to a local minimum if improperly chosen. The De Jong test suite of benchmark optimization problems was used to test the new algorithm and facilitate comparison with the classical PSO algorithm. The GPSO algorithm is compared to four different refinements of the PSO algorithm from the literature and shown to converge faster to a significantly more accurate final solution for a variety of benchmark test functions.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
The problem of finding the global minimum of a function with large numbers of local minima arises in many scientific applications. In typical applications the search space is large and multidimensional, prior information about the function is not available, and traditional mathematical techniques are not applicable. Global optimization is an NP-complete problem, and heuristic approaches like genetic algorithms (GAs) and simulated annealing (SA) have been used historically to find near-optimal solutions [1,2].
The particle swarm optimization (PSO) algorithm is a sociologically inspired stochastic optimization algorithm introduced by Kennedy and Eberhart in 1995 [3-5]. The PSO algorithm is easy to implement, has few parameters, and has been shown to converge faster than traditional techniques like GAs for a wide variety of benchmark optimization problems [6,7]. Because of its simplicity, ease of implementation, and high convergence rates, the PSO algorithm has been applied to a variety of problems like evolving the structure as well as the weights of artificial neural nets, power system optimization, process control, dynamic optimization, adaptive control, and electromagnetic optimization.
The general unconstrained optimization problem considered is the minimization of a cost function of N variables,

f(x1, x2, . . . , xN),   f : R^N -> R.   (1)

In the classical PSO algorithm the velocity of the ith particle in the dth dimension is updated as

V_id(k + 1) = V_id(k) + γ_1d(g_id − X_id(k)) + γ_2d(G_id − X_id(k)),   (2)

where g_id is the best position found so far by particle i, G_id is the best position found by the swarm, and γ_1d, γ_2d are random coefficients. Refinements of the PSO algorithm introduce an inertial weight w(k),

V_id(k + 1) = w(k)V_id(k) + γ_1d(g_id − X_id(k)) + γ_2d(G_id − X_id(k)),   (3)

or a constriction coefficient χ,

V_id(k + 1) = χ(V_id(k) + γ_1d(g_id − X_id(k)) + γ_2d(G_id − X_id(k))),   (4)

and in all cases the position is updated as X_id(k + 1) = X_id(k) + V_id(k).
Use of an inertial weight or constriction coefficient essentially narrows the region of search as the search progresses. In (3) the inertial weight w(k) is initially set to a value of 1 but is reduced to 0 as the iterations progress. A value near 1 encourages exploration of new regions of the search space, while a small value encourages exploitation, or detailed search, of the current region. In (4) the value of the constriction coefficient χ is less than 1, which reduces the velocity step size to zero as the iterations progress. This is because when the velocity is repeatedly multiplied by a factor less than one (χ or w(k)), the velocity becomes small and consequently the region of search is narrowed.
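For concreteness, a minimal Python sketch of the velocity and position updates (2)-(4) is given below; the array layout and the ranges of the random coefficients are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

def pso_step(X, V, pbest, gbest, w=1.0, chi=1.0, c1=2.0, c2=2.0, rng=np.random):
    """One PSO iteration.

    X, V  : (particles, dimensions) arrays of positions and velocities
    pbest : per-particle best positions (g_id in the text)
    gbest : swarm best position (G_id in the text)
    w = 1, chi = 1 gives the basic update (2); w < 1 gives the inertial-weight
    form (3); chi < 1 gives the constriction-coefficient form (4).
    """
    r1 = rng.uniform(0.0, c1, size=X.shape)   # random coefficient gamma_1d
    r2 = rng.uniform(0.0, c2, size=X.shape)   # random coefficient gamma_2d
    V = chi * (w * V + r1 * (pbest - X) + r2 * (gbest - X))  # velocity update
    X = X + V                                                # position update
    return X, V
```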
However, since the location of the global minimum is not known a priori, reducing the step size to a small value too soon might result in premature convergence to a local minimum. Also, a step size that is too small discourages exploration and wastes computational effort. Thus, to accurately locate the global minimum without getting trapped in local minima, the step size must be reduced only in the neighborhood of a tentative global minimum. However, there is no procedure that allows the values of the inertial weight or the constriction coefficient to be set or adjusted over the optimization period to manipulate the step size as needed to maintain an optimal balance between exploration and exploitation in the PSO algorithm. Proper choice of inertial weights or constriction coefficients is problematic because the number of iterations necessary to locate the global minimum is not known a priori. This paper presents an approach where the balance between exploration and exploitation is achieved by using the PSO algorithm without a constriction coefficient or inertial weight for global exploration and a deterministic local search algorithm for accurate location of good local minima. This approach is advantageous because it allows for exploration of new regions of the search space while retaining the ability to improve good solutions already found.
4. A new gradient based PSO (GPSO) algorithm
In a classical gradient descent scheme [12] a point is chosen randomly in the search space and small steps are made in the direction of the negative of the gradient according to (5):

X(k + 1) = X(k) − η∇C(X(k)),   (5)

where η is the step size, X(k) is the approximation to the local minimum at iteration k, and ∇C(X(k)) is the gradient of the cost function evaluated at X(k). The gradient is the row vector composed of the first-order partial derivatives of the cost function with respect to X. The gradient can be computed using a forward-difference approximation for the partial derivatives, or by more advanced methods. With gradient descent, a large η will result in fast convergence of X(k) towards the minimum, but once in the vicinity of the nearest local minimum oscillations will occur as X(k) overshoots the minimum. On the other hand, a small step size will result in slow convergence towards the minimum but the final solution will be more accurate. Since the choice of η is problematic, a random step size can be used; for example, η uniformly distributed in the interval [0, 0.5]. Since the negative gradient always points in the direction of steepest decrease of the function, the nearest local minimum will be reached eventually. Since the gradient is zero at a local minimum, smaller steps will automatically be taken when a minimum is approached. Also, movement in a direction other than the negative gradient produces a slower local decrease of the cost function.
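A minimal sketch of update (5), using a forward-difference gradient and a step size drawn uniformly from [0, 0.5] as suggested above, could look as follows; the perturbation size h and the iteration count are assumptions made for illustration.

```python
import numpy as np

def forward_diff_grad(f, x, h=1e-6):
    """Forward-difference approximation of the gradient of f at x."""
    g = np.empty_like(x)
    fx = f(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += h
        g[i] = (f(xp) - fx) / h
    return g

def gradient_descent(f, x0, iters=200, rng=np.random):
    """Eq. (5): X(k+1) = X(k) - eta * grad C(X(k)), with eta ~ U[0, 0.5]."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        eta = rng.uniform(0.0, 0.5)            # random step size
        x = x - eta * forward_diff_grad(f, x)  # move against the gradient
    return x
```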
Fig. 1. Flowchart of the GPSO algorithm: each of the LMAX iterations consists of a PSO update of the NP particles followed by NG iterations of a deterministic derivative-based local search started from the global best solution G, with the best solution found stored in L.
In the GPSO algorithm, the local search takes the best
solution found by the PSO algorithm as its starting point. If the best solution found by the PSO algorithm (G) has a larger cost than the final solution found by the local search during the previous iteration (L), then L is used as the starting point for the local search. This ensures that the local search is done in the neighborhood of the best solution found by the GPSO algorithm in all previous iterations. Thus, the PSO algorithm is used to reach the vicinity of a good local minimum, and the gradient descent scheme is used to find the local minimum accurately. Next, this accurately computed local minimum is used as the global best solution in the PSO algorithm to identify still better local minima, and the cycle is repeated. In this way the GPSO algorithm locates the global minimum by locating progressively better local minima.
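As an illustration only, a minimal sketch of this GPSO cycle is given below; it reuses the pso_step and gradient_descent sketches above, and the default swarm size, iteration counts, and local-search schedule are assumptions rather than the settings used in the paper.

```python
import numpy as np

def gpso(f, lower, upper, n_particles=20, l_max=100, n_g=5, rng=np.random):
    """Sketch of the GPSO cycle: a PSO update for global exploration, followed
    by n_g iterations of a gradient-based local search (here the
    gradient_descent sketch above) started from the better of G and L."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    X = rng.uniform(lower, upper, size=(n_particles, lower.size))
    V = np.zeros_like(X)
    pbest, pcost = X.copy(), np.array([f(x) for x in X])
    G = pbest[np.argmin(pcost)].copy()                     # swarm best
    L = G.copy()                                           # best refined solution so far
    for _ in range(l_max):
        gbest = G if f(G) < f(L) else L                    # refined minimum acts as swarm best
        X, V = pso_step(X, V, pbest, gbest, rng=rng)       # global exploration
        cost = np.array([f(x) for x in X])
        better = cost < pcost
        pbest[better], pcost[better] = X[better], cost[better]
        G = pbest[np.argmin(pcost)].copy()
        start = G if f(G) < f(L) else L                    # start from the best point known
        L = gradient_descent(f, start, iters=n_g, rng=rng) # accurate local exploration
    return G if f(G) < f(L) else L
```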
Table 1
Test functions used for comparison of GPSO and PSO algorithms.

Test function   Range of search   Equation
Sphere          [−100, 100]^N     f1(x) = Σ_{i=1}^{N} x_i^2
Ellipsoid       [−100, 100]^N     f2(x) = Σ_{i=1}^{N} i x_i^2
Rosenbrock      [−100, 100]^N     f3(x) = Σ_{i=1}^{N−1} [100(x_{i+1} − x_i^2)^2 + (1 − x_i)^2]
Rastrigin       [−100, 100]^N     f4(x) = 10N + Σ_{i=1}^{N} [x_i^2 − 10 cos(2πx_i)]
Griewangk       [−600, 600]^N     f5(x) = 1 + (1/4000) Σ_{i=1}^{N} x_i^2 − Π_{i=1}^{N} cos(x_i/√i)
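The benchmark functions of Table 1 can be written compactly in NumPy; the sketch below follows the standard definitions given in the table and assumes the argument is a one-dimensional NumPy array.

```python
import numpy as np

def sphere(x):      # f1
    return np.sum(x**2)

def ellipsoid(x):   # f2
    i = np.arange(1, x.size + 1)
    return np.sum(i * x**2)

def rosenbrock(x):  # f3
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

def rastrigin(x):   # f4
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def griewangk(x):   # f5
    i = np.arange(1, x.size + 1)
    return 1.0 + np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))
```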
Fig. 2. Solid line shows convergence of the GPSO algorithm with 50 local search iterations done every 10 PSO iterations. Dotted line shows convergence of the GPSO algorithm with 5 local search iterations done every PSO iteration. The total numbers of PSO and local search iterations are 100 and 500, respectively, in both cases. Rastrigin's test function was used.
Fig. 1 shows the flowchart for the GPSO algorithm. The number of particles NP, the number of iterations LMAX, and the number of local search iterations NG are defined at the start of the algorithm.
In this paper the quasi-Newton-Raphson (QNR) algorithm was used to perform the local search in Fig. 1. The QNR algorithm optimizes by locally fitting a quadratic surface and finding the minimum of that quadratic surface. The QNR algorithm uses second-derivative (Hessian) information and thus makes use of the curvature in addition to the gradient. The GPSO algorithm given in Fig. 1 can be generalized by using more powerful local search techniques to refine the best solution found in the PSO portion. Derivative-free methods like the Nelder-Mead simplex method can be used for the local search when the objective function is not continuously differentiable.
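The local-search step of Fig. 1 can also be delegated to an off-the-shelf routine; a possible sketch using SciPy (an assumed dependency, not one used in the paper) selects a quasi-Newton method when gradients are meaningful and the Nelder-Mead simplex otherwise.

```python
from scipy.optimize import minimize

def local_search(f, start, n_g=50, differentiable=True):
    """Refine a PSO solution: BFGS (a quasi-Newton method) when the objective
    is smooth, Nelder-Mead simplex when it is not continuously differentiable."""
    method = "BFGS" if differentiable else "Nelder-Mead"
    result = minimize(f, start, method=method, options={"maxiter": n_g})
    return result.x
```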
The effect of performing the local search periodically, instead of every PSO iteration as in Fig. 1, was considered. Fig. 2 shows that for multimodal functions, distributing the local search iterations evenly among the PSO iterations leads to higher convergence rates. In the results presented, the best solution found is plotted as a function of the number of function evaluations rather than iterations. Convergence of different optimization algorithms cannot be compared on the basis of iterations because an algorithm can do more computation per iteration and appear to converge faster. Hence the number of function evaluations provides a more objective measure of convergence rates than the number of iterations.
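A simple way to obtain such counts is to wrap the objective so that every evaluation is tallied; the helper below is a convenience assumed for illustration, not part of the paper.

```python
class CountingFunction:
    """Wrap an objective so every call is counted; plot best cost vs. self.calls."""
    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)

# usage sketch: counted = CountingFunction(rastrigin); gpso(counted, lower, upper); print(counted.calls)
```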
Fig. 3. Mean best solution versus number of function evaluations for the Sphere test
function.
The Rastrigin and Griewangk test functions are highly multimodal and test the ability of an optimization algorithm to escape from local minima.
A population size of 20 was used for the GPSO and PSO algorithms. When the PSO algorithm was used with a constriction coefficient (4), the value of χ was set to 0.7. Since the best solution found at a given iteration is a random variable affected by the random initialization and the stochastic nature of the optimization process, average values of the cost over 50 independent runs were used in evaluating and comparing the convergence performance of the algorithms. Figs. 2-6 show the mean best solution over 50 independent runs versus the number of function evaluations for the GPSO and PSO algorithms.
The Sphere and Ellipsoid test functions are unimodal, convex, and differentiable test functions without flat regions. For such test functions, the classical gradient descent algorithm will perform better than stochastic optimization techniques, which waste computational effort by making a random search of the solution space, as discussed in Section 4 (Figs. 3 and 4).
Fig. 5 shows the performance of the GPSO and PSO algorithms for the Rosenbrock test function, which has flat regions. Flat regions pose a challenge to deterministic local optimization algorithms since small changes about a point in a flat region do not lead to improvement. Stochastic optimization algorithms explore flat regions by performing a biased random search; however, a pure random search is computationally costly and ineffective in large, multidimensional search spaces.
Fig. 4. Mean best solution versus number of function evaluations for the Ellipsoid test function.
Fig. 5. Mean best solution versus number of function evaluations for the Rosenbrock test function.
Fig. 6. Mean best solution versus number of function evaluations for the Rastrigin test function.
Fig. 7. Mean best solution versus number of function evaluations for the Griewangk test function.
Table 2
Mean best solution found in 50 independent trials.

Test function   Dimension   GPSO          PSO-1 [13]   PSO-2 [13]   PSO-3 [13]
Sphere          10          0             0.01         0.01         0.01
Sphere          20          0             0.01         0.01         0.01
Sphere          30          0             0.01         0.01         0.01
Rosenbrock      10          1.3553E−06    27.11        2.102        9.946
Rosenbrock      20          6.2106E−11    51.56        28.17        17.94
Rosenbrock      30          6.9424E−11    63.35        35.27        28.97
Rastrigin       10          0             2.069        4.63         2.268
Rastrigin       20          0             11.74        26.29        15.32
Rastrigin       30          0             29.35        69.72        36.23
Griewangk       10          0             0.067        0.066        0.054
Griewangk       20          0             0.028        0.027        0.029
Griewangk       30          0             0.016        0.017        0.019
Table 3
Test functions used for comparison of GPSO and NM-PSO algorithms.

Test function       Range of search   Equation
B2                  [−100, 100]^2     (standard form; see the sketch below)
Easom               [−100, 100]^2     (standard form; see the sketch below)
Goldstein & Price   [−100, 100]^2     GP(x) = [1 + (x1 + x2 + 1)^2 (19 − 14x1 + 3x1^2 − 14x2 + 6x1x2 + 3x2^2)] [30 + (2x1 − 3x2)^2 (18 − 32x1 + 12x1^2 + 48x2 − 36x1x2 + 27x2^2)]
Zakharov            [−600, 600]^N     Z_N(x) = Σ_{j=1}^{N} x_j^2 + (Σ_{j=1}^{N} 0.5 j x_j)^2 + (Σ_{j=1}^{N} 0.5 j x_j)^4
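Only part of the equation column is recoverable above; the sketch below therefore uses the standard textbook forms of the four benchmarks, which is an assumption wherever the paper's exact variants cannot be recovered.

```python
import numpy as np

def b2(x):      # standard B2 function (assumed form)
    x1, x2 = x
    return x1**2 + 2*x2**2 - 0.3*np.cos(3*np.pi*x1) - 0.4*np.cos(4*np.pi*x2) + 0.7

def easom(x):   # standard Easom function (assumed form)
    x1, x2 = x
    return -np.cos(x1) * np.cos(x2) * np.exp(-((x1 - np.pi)**2 + (x2 - np.pi)**2))

def goldstein_price(x):
    x1, x2 = x
    a = 1 + (x1 + x2 + 1)**2 * (19 - 14*x1 + 3*x1**2 - 14*x2 + 6*x1*x2 + 3*x2**2)
    b = 30 + (2*x1 - 3*x2)**2 * (18 - 32*x1 + 12*x1**2 + 48*x2 - 36*x1*x2 + 27*x2**2)
    return a * b

def zakharov(x):
    j = np.arange(1, x.size + 1)
    s = np.sum(0.5 * j * x)
    return np.sum(x**2) + s**2 + s**4
```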
Table 4
Comparison of the performance of GPSO and NM-PSO algorithms.

Test function       Dim.   Mean function evaluations (GPSO)   Mean function evaluations (NM-PSO [14])   Mean error (NM-PSO [14])   Mean error (GPSO)
Sphere              3      130.84                             291                                       5E−05                      0
Rosenbrock          5      2119                               3308                                      3E−05                      1.18E−08
B2                  2      187                                240                                       3E−05                      0
Goldstein & Price   2      5248.8                             217                                       3E−05                      5.33E−15
Zakharov            5      1195                               1394                                      2.6E−04                    0
Easom               2      6709                               165                                       4E−05                      1.33E−15
Table 4 shows that the performance of the NM-PSO algorithm degrades with increasing dimension. Tables 2 and 4 show that the quality of the final solution is significantly better for the GPSO algorithm for all test cases, although the GPSO algorithm required more function evaluations in a minority of cases. Costly deterministic local search is done only around the global best solution in the case of the GPSO algorithm, while in the case of the NM-PSO algorithm roughly one third of the particles perform local search. The NM-PSO algorithm requires 3N + 1 particles to solve an N-dimensional problem, making it computationally inefficient for higher dimensions. Thus, to optimize a test function of dimension 30, the NM-PSO algorithm requires 91 particles. On the other hand, Tables 1 and 2 show that good results can be obtained with the GPSO algorithm while using only 20 particles.
6. Conclusion
In this paper, a new hybrid optimization algorithm, referred to as the GPSO algorithm, that combines the stochastic PSO algorithm and the deterministic gradient descent algorithm is presented. The key ideas explored are the use of the PSO algorithm to provide an initial point within the domain of convergence of the quasi-Newton algorithm, the synergistic interaction between local and global search, and the avoidance of the inertial weights and constriction coefficients that can cause premature convergence if improperly chosen.

References
[9] R.C. Eberhart, Y. Shi, A modified particle swarm optimizer, in: Proceedings of the IEEE International Conference on Evolutionary Computation, 1998, pp. 69-73.
[10] M. Clerc, J. Kennedy, The particle swarm: explosion, stability and convergence in a multi-dimensional complex space, IEEE Transactions on Evolutionary Computation 6 (1) (2002) 58-73.
[11] R.C. Eberhart, Y. Shi, Parameter selection in particle swarm optimization, in: Evolutionary Programming VII, Lecture Notes in Computer Science 1447, Springer, 1998, pp. 591-600.
[12] R. Horst, H. Tuy, Global Optimization: Deterministic Approaches, Springer-Verlag, New York, 1996.
[13] A. Ratnaweera, S.K. Halgamuge, H.C. Watson, Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients, IEEE Transactions on Evolutionary Computation 8 (3) (2004) 240-255.
[14] S.-K. Fan, Y.-C. Liang, E. Zahara, Hybrid simplex search and particle swarm optimization for the global optimization of multimodal functions, Engineering Optimization 36 (4) (2004) 401-418.