A Spectral KRMI Conjugate Gradient Method under the
Strong-Wolfe Line Search
Wan Khadijah1,a), Mohd Rivaie2,b), Mustafa Mamat1,c) and Ibrahim Jusoh2,d)

1Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin (UniSZA), Terengganu, Malaysia.
2Department of Computer Sciences and Mathematics, Universiti Teknologi MARA (UiTM), Terengganu, Malaysia.

a)Corresponding author: wankhadijahws@gmail.com
b)rivaie75@yahoo.com
c)must@unisza.edu.my
d)ibrahimju@tganu.uitm.edu.my

Abstract. In this paper, a modification of the spectral conjugate gradient (CG) method, named spectral Khadijah-Rivaie-Mustafa-Ibrahim (SKRMI), is proposed for solving unconstrained optimization problems; it combines the advantages of the spectral CG method and the RMIL method. Under an inexact line search, the proposed method generates a sufficient descent direction, and its global convergence is proved. Moreover, the method reduces to the standard RMIL method when the exact line search is applied. Numerical results are also presented to examine the efficiency of the proposed method.

INTRODUCTION
The capability of conjugate gradient (CG) methods to solve large-scale unconstrained optimization problems has made them widely used in practice. Their simplicity and low memory requirements also play an important role in their popularity. Nonlinear CG methods are used to obtain the minimum value of the objective function. Consider the unconstrained optimization problem:
\[
\min_{x \in \mathbb{R}^n} f(x),
\]
where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. CG methods generate iterates by
\[
x_{k+1} = x_k + \alpha_k d_k, \qquad k = 0, 1, 2, \ldots,
\]
with the search direction $d_k$ defined by
\[
d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -g_k + \beta_k d_{k-1}, & \text{if } k \ge 1, \end{cases}
\]
where $x_k$ is the current iterate, $\alpha_k > 0$ is a positive stepsize, $g_k = \nabla f(x_k)$ is the gradient of $f$ at $x_k$, and $\beta_k$ is a scalar known as the CG coefficient. Some commonly used and widely known formulas for $\beta_k$ are presented as follows:
\[
\beta_k^{FR} = \frac{g_k^T g_k}{\|g_{k-1}\|^2} \qquad \text{(Fletcher--Reeves [1])}, \tag{1}
\]
\[
\beta_k^{DY} = \frac{g_k^T g_k}{(g_k - g_{k-1})^T d_{k-1}} \qquad \text{(Dai--Yuan [2])}, \tag{2}
\]
\[
\beta_k^{CD} = -\frac{g_k^T g_k}{d_{k-1}^T g_{k-1}} \qquad \text{(Conjugate Descent [3])}. \tag{3}
\]
Several line searches, such as the Armijo [4], Wolfe [5], Goldstein [6], or Grippo-Lucidi [7] line searches, can be applied in the algorithm to perform the one-dimensional search that determines the stepsize. In this paper, we analyze the convergence of the method under an inexact line search, namely the strong-Wolfe conditions:
\[
f(x_k + \alpha_k d_k) \le f(x_k) + \rho\,\alpha_k\, d_k^T g_k, \tag{4}
\]
\[
|g(x_k + \alpha_k d_k)^T d_k| \le \sigma\, |d_k^T g_k|, \tag{5}
\]
with $0 < \rho < \sigma < 1$.
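For illustration, the classical coefficients (1)-(3) and the strong-Wolfe test (4)-(5) can be transcribed directly as below (a minimal Python sketch; the function names and default parameter values are our own choices and not part of the original implementation):

```python
import numpy as np

def beta_fr(g_new, g_old):
    # Fletcher-Reeves coefficient, eq. (1)
    return g_new @ g_new / (g_old @ g_old)

def beta_dy(g_new, g_old, d_old):
    # Dai-Yuan coefficient, eq. (2)
    return g_new @ g_new / ((g_new - g_old) @ d_old)

def beta_cd(g_new, g_old, d_old):
    # Conjugate Descent coefficient, eq. (3)
    return -(g_new @ g_new) / (d_old @ g_old)

def strong_wolfe(f, grad, x, d, alpha, rho=0.01, sigma=0.1):
    # Check the strong-Wolfe conditions (4)-(5) for a trial stepsize alpha.
    g = grad(x)
    sufficient_decrease = f(x + alpha * d) <= f(x) + rho * alpha * (d @ g)
    curvature = abs(grad(x + alpha * d) @ d) <= sigma * abs(d @ g)
    return sufficient_decrease and curvature
```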

Rivaie et al. [8] suggested a new and simple $\beta_k$, defined by
\[
\beta_k^{RMIL} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (d_{k-1} - g_k)}, \tag{6}
\]
where, throughout, $\|\cdot\|$ denotes the Euclidean norm of vectors. Based on the numerical results, this method performs better than the other classical CG methods. Its global convergence was established under the exact line search, where the method satisfies the sufficient descent condition and the global convergence properties.
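A direct transcription of (6), under the same hypothetical naming conventions as the sketch above:

```python
def beta_rmil(g_new, g_old, d_old):
    # RMIL coefficient, eq. (6): g_k^T (g_k - g_{k-1}) / (d_{k-1}^T (d_{k-1} - g_k))
    return g_new @ (g_new - g_old) / (d_old @ (d_old - g_new))
```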
Birgin and Martinez [9] introduced three types of spectral conjugate gradient methods by combining the conjugate gradient method and the spectral gradient method, with the direction
\[
d_k = -\theta_k g_k + \beta_k s_{k-1},
\]
where the parameter $\theta_k$ is taken to be the spectral gradient parameter, computed as
\[
\theta_k = \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}},
\]
and the parameter $\beta_k$ is computed by one of
\[
\beta_k^{1} = \frac{(\theta_k y_{k-1} - s_{k-1})^T g_k}{s_{k-1}^T y_{k-1}}, \qquad
\beta_k^{2} = \frac{\theta_k\, y_{k-1}^T g_k}{\alpha_{k-1}\theta_{k-1}\, g_{k-1}^T g_{k-1}}, \qquad
\beta_k^{3} = \frac{\theta_k\, g_k^T g_k}{\alpha_{k-1}\theta_{k-1}\, g_{k-1}^T g_{k-1}},
\]
where $y_{k-1} = g_k - g_{k-1}$ and $s_{k-1} = x_k - x_{k-1}$. The numerical results show that these formulas perform well and efficiently. Unfortunately, the spectral conjugate gradient method cannot be guaranteed to generate descent directions.
Recently, Liu and Jiang [10] combined the CD method and the spectral gradient method, where the direction, motivated by Zhang et al. [11], is given by
\[
d_k = \begin{cases} -g_k, & \text{if } k = 0, \\ -\theta_k g_k + \beta_k d_{k-1}, & \text{if } k \ge 1, \end{cases} \tag{7}
\]
where
\[
\beta_k = \begin{cases} \beta_k^{CD}, & \text{if } g_k^T d_{k-1} \le 0, \\ 0, & \text{otherwise}, \end{cases}
\qquad
\theta_k = 1 - \frac{g_k^T d_{k-1}}{g_{k-1}^T d_{k-1}}.
\]
This formula fulfills the sufficient descent condition and has been proven globally convergent under the strong-Wolfe line search, with good numerical performance.
Zhang et al. [11] made a modification of the Fletcher-Reeves method (MFR) such that MFR is globally convergent with an Armijo-type line search. The search direction is defined as in (7), where $\theta_k$ and $\beta_k$ are as follows:
\[
\theta_k = \frac{d_{k-1}^T y_{k-1}}{\|g_{k-1}\|^2}, \qquad \beta_k = \beta_k^{FR},
\]
which can be written as
\[
d_k = -\left(1 + \beta_k^{FR}\,\frac{g_k^T d_{k-1}}{\|g_k\|^2}\right) g_k + \beta_k^{FR}\, d_{k-1}. \tag{8}
\]

In this paper, we modify the RMIL method such that the generated direction is always a descent direction of the objective function. The modified method is named SKRMI, since the spectral CG approach is combined with the RMIL conjugate gradient method. Under mild conditions, we prove that the SKRMI method with the strong-Wolfe line search is globally convergent.
The remainder of this paper is organized as follows. First, we present the new spectral conjugate gradient method and its algorithm. Next, we prove the global convergence of the modified RMIL method. Then, we report the numerical results and discussion, and compare the performance of the modified RMIL method with other methods. Lastly, the conclusions are presented.

NEW SPECTRAL CONJUGATE GRADIENT METHOD


In this section, we describe the spectral KRMI method, which is similar to the formula of Zhang et al. [11] but uses different parameters $\beta_k$ and $\theta_k$. Let $x_k$ be the current iterate and let $d_k$ be defined by equation (7), where $\beta_k$ is given by equation (6) (with $\beta_k$ set to zero when $g_k^T d_{k-1} > 0$, as in the case analysis below) and
\[
\theta_k = 1 + \beta_k^{RMIL}\,\frac{g_k^T d_{k-1}}{\|g_k\|^2}, \tag{9}
\]
which can be rewritten as
\[
d_k = -\left(1 + \beta_k^{RMIL}\,\frac{g_k^T d_{k-1}}{\|g_k\|^2}\right) g_k + \beta_k^{RMIL}\, d_{k-1}.
\]
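For illustration, the SKRMI direction update can be sketched as follows (a hypothetical Python transcription of (6), (7), and (9); it is not the authors' MATLAB implementation, and the switching rule for $\beta_k$ follows the case analysis given below):

```python
import numpy as np

def skrmi_direction(g_new, g_old, d_old):
    # Spectral KRMI search direction: d_k = -theta_k * g_k + beta_k * d_{k-1}.
    beta_rmil = g_new @ (g_new - g_old) / (d_old @ (d_old - g_new))  # eq. (6)
    theta = 1.0 + beta_rmil * (g_new @ d_old) / (g_new @ g_new)      # eq. (9)
    beta = beta_rmil if g_new @ d_old <= 0 else 0.0                  # switch as in (7)
    return -theta * g_new + beta * d_old
```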
It is easy to see from (6), (7), and (9) that
\[
g_k^T d_k \le -\|g_k\|^2.
\]
If $g_k^T d_{k-1} \le 0$ and $\beta_k = \beta_k^{RMIL}$, we have
\[
g_k^T d_k = -\left(1 + \beta_k^{RMIL}\,\frac{g_k^T d_{k-1}}{\|g_k\|^2}\right)\|g_k\|^2 + \beta_k^{RMIL}\, g_k^T d_{k-1} = -\|g_k\|^2.
\]
If $g_k^T d_{k-1} > 0$ and $\beta_k = 0$, we have
\[
g_k^T d_k = -\left(1 + \beta_k^{RMIL}\,\frac{g_k^T d_{k-1}}{\|g_k\|^2}\right)\|g_k\|^2 = -\|g_k\|^2 - \beta_k^{RMIL}\, g_k^T d_{k-1} \le -\|g_k\|^2.
\]
This indicates that $d_k$ is a descent direction of $f$ at $x_k$. It is also clear that if the exact line search is used, then $g_k^T d_{k-1} = 0$. In this case, we have
\[
\theta_k = 1 + \beta_k^{RMIL}\,\frac{g_k^T d_{k-1}}{\|g_k\|^2} = 1.
\]

Consequently, the SKRMI method reduces to the original RMIL method. Based on the above interpretation, the algorithm is given as follows:

Algorithm 1 (SKRMI method with strong-Wolfe line search)

Step 1: Initialization. Given constants $0 < \rho < \sigma < 1$, choose an initial point $x_0 \in \mathbb{R}^n$ and set $k = 0$.
Step 2: Computing the CG coefficient. Compute $\beta_k$ based on formula (1), (2), (3), or (6).
Step 3: Computing the search direction. Calculate the spectral search direction based on the spectral formula as in equation (8). If $g_k = 0$, then stop.
Step 4: Computing the stepsize $\alpha_k$ such that
\[
f(x_k + \alpha_k d_k) \le f(x_k) + \rho\,\alpha_k\, d_k^T g_k,
\]
\[
|g(x_k + \alpha_k d_k)^T d_k| \le \sigma\, |d_k^T g_k|.
\]
Step 5: Updating the new point: $x_{k+1} = x_k + \alpha_k d_k$.
Step 6: Convergence test and stopping criteria. If $f(x_{k+1}) < f(x_k)$ and $\|g_k\| \le \varepsilon$, then stop. Otherwise, set $k = k + 1$ and go to Step 2.
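A minimal Python sketch of Algorithm 1 is given below for illustration (not the authors' MATLAB code); it relies on scipy.optimize.line_search, which computes a stepsize satisfying Wolfe-type conditions with parameters c1 and c2 standing in for $\rho$ and $\sigma$, and the helper names are our own:

```python
import numpy as np
from scipy.optimize import line_search

def skrmi(f, grad, x0, rho=0.01, sigma=0.1, eps=1e-6, max_iter=10000):
    # Algorithm 1: spectral KRMI CG method with a Wolfe-type line search.
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                        # d_0 = -g_0
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:              # stopping criterion ||g_k|| <= eps
            return x
        alpha = line_search(f, grad, x, d, gfk=g, c1=rho, c2=sigma)[0]
        if alpha is None:                         # line search failed; take a small safeguard step
            alpha = 1e-4
        x = x + alpha * d                         # x_{k+1} = x_k + alpha_k d_k
        g_new = grad(x)
        if np.linalg.norm(g_new) <= eps:
            return x
        beta_rmil = g_new @ (g_new - g) / (d @ (d - g_new))        # eq. (6)
        theta = 1.0 + beta_rmil * (g_new @ d) / (g_new @ g_new)    # eq. (9)
        beta = beta_rmil if g_new @ d <= 0 else 0.0                # switch as in (7)
        d = -theta * g_new + beta * d
        g = g_new
    return x

# Example usage on a simple quadratic:
# f = lambda x: 0.5 * x @ x
# grad_f = lambda x: x
# x_star = skrmi(f, grad_f, np.array([10.0, -7.0]))
```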

CONVERGENCE ANALYSIS
In this section, the new spectral CG method is proven globally convergent under the following assumptions. First, we simplify $\beta_k^{RMIL}$ so that the convergence proof becomes significantly easier. From (6),
\[
\beta_k^{RMIL} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (d_{k-1} - g_k)} = \frac{\|g_k\|^2 - g_k^T g_{k-1}}{\|d_{k-1}\|^2 - d_{k-1}^T g_k}.
\]
Hence, we obtain
\[
\beta_k^{RMIL} \le \frac{\|g_k\|^2}{\|d_{k-1}\|^2}. \tag{10}
\]

The following basic assumptions are required in the analysis of the global convergence properties of CG methods.

Assumption 1
1. $f$ is bounded below on the level set $\ell = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$, where $x_0$ is the initial point, and $f$ is continuous and differentiable in a neighbourhood $N$ of $\ell$.
2. The gradient $g(x)$ is Lipschitz continuous in $N$, so a constant $L > 0$ exists such that $\|g(x) - g(y)\| \le L\|x - y\|$ for any $x, y \in N$.

Under Assumption 1, we have the following lemma, which was proven by Zoutendijk [12] to validate the global
convergence of nonlinear CG methods.

Lemma 1. Suppose that Assumption 1 holds. Consider any CG method of the form (7), where $\beta_k$ is defined by equation (6). Then the following condition, known as the Zoutendijk condition, holds:
\[
\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty.
\]

The proof of this lemma can be found in [12]. Using Lemma 1 and (10), we obtain the following convergence theorem for the CG method.

Theorem 1. Suppose that Assumption 1 holds. Let the sequences $\{g_k\}$ and $\{d_k\}$ be generated by Algorithm 1, with the stepsize $\alpha_k$ computed from equations (4) and (5). Then
\[
\liminf_{k \to \infty} \|g_k\| = 0. \tag{11}
\]
Proof. To prove Theorem 1, suppose that Lemma 1 holds and argue by contradiction to establish (11). That is, if Theorem 1 is not true, then a positive constant $c > 0$ exists such that
\[
\|g_k\| \ge c \tag{12}
\]
holds for all $k \ge 0$. Rewriting (7) as
\[
d_k = -\theta_k g_k + \beta_k d_{k-1}, \qquad d_k + \theta_k g_k = \beta_k d_{k-1},
\]
and squaring both sides of the equation, we obtain
\[
\|d_k\|^2 + 2\theta_k g_k^T d_k + \theta_k^2 \|g_k\|^2 = \beta_k^2 \|d_{k-1}\|^2,
\]
\[
\|d_k\|^2 = \beta_k^2 \|d_{k-1}\|^2 - 2\theta_k g_k^T d_k - \theta_k^2 \|g_k\|^2.
\]
Dividing both sides of this equality by $(g_k^T d_k)^2$, we get
\[
\frac{\|d_k\|^2}{(g_k^T d_k)^2} = \frac{\beta_k^2 \|d_{k-1}\|^2}{(g_k^T d_k)^2} - \frac{2\theta_k g_k^T d_k}{(g_k^T d_k)^2} - \frac{\theta_k^2 \|g_k\|^2}{(g_k^T d_k)^2}.
\]
Note that $g_k^T d_k \le -\|g_k\|^2$. By applying this condition, we find that
\[
\frac{\|d_k\|^2}{(g_k^T d_k)^2} \le \frac{(\beta_k^{RMIL})^2 \|d_{k-1}\|^2}{(g_k^T d_k)^2} - \frac{2\theta_k g_k^T d_k}{(g_k^T d_k)^2} - \frac{\theta_k^2 \|g_k\|^2}{(g_k^T d_k)^2},
\]
\[
\frac{\|d_k\|^2}{(g_k^T d_k)^2} \le \frac{(\beta_k^{RMIL})^2 \|d_{k-1}\|^2}{\|g_k\|^4} + \frac{\theta_k^2 - 2\theta_k}{\|g_k\|^2}.
\]
By completing the square, we gain
\[
\frac{\|d_k\|^2}{(g_k^T d_k)^2} \le \frac{(\beta_k^{RMIL})^2 \|d_{k-1}\|^2}{\|g_k\|^4} + \frac{\theta_k^2 - 2\theta_k + 1 - 1}{\|g_k\|^2}
= \frac{(\beta_k^{RMIL})^2 \|d_{k-1}\|^2}{\|g_k\|^4} + \frac{(\theta_k - 1)^2}{\|g_k\|^2} - \frac{1}{\|g_k\|^2}.
\]
Applying (10) yields
\[
\frac{\|d_k\|^2}{(g_k^T d_k)^2} \le \left(\frac{\|g_k\|^2}{\|d_{k-1}\|^2}\right)^{\!2} \frac{\|d_{k-1}\|^2}{\|g_k\|^4} + \frac{(\theta_k - 1)^2}{\|g_k\|^2} - \frac{1}{\|g_k\|^2}
= \frac{1}{\|d_{k-1}\|^2} + \frac{(\theta_k - 1)^2}{\|g_k\|^2} - \frac{1}{\|g_k\|^2},
\]
\[
\frac{\|d_k\|^2}{(g_k^T d_k)^2} \le \frac{1}{\|d_{k-1}\|^2} + \frac{1}{\|g_k\|^2}. \tag{13}
\]
By noting that $1/\|d_0\|^2 = 1/\|g_0\|^2$, from (13) we have
\[
\frac{\|d_k\|^2}{(g_k^T d_k)^2} \le \frac{1}{\|g_0\|^2} + \cdots + \frac{1}{\|g_k\|^2}.
\]
Thus,
\[
\frac{\|d_k\|^2}{(g_k^T d_k)^2} \le \sum_{i=0}^{k} \frac{1}{\|g_i\|^2} \le \frac{k+1}{c^2}.
\]
Taking the summation on both sides, we obtain
\[
\sum_{k \ge 1} \frac{(g_k^T d_k)^2}{\|d_k\|^2} \ge c^2 \sum_{k \ge 1} \frac{1}{k+1} = \infty. \tag{14}
\]
Since the right-hand side of (14) contains the harmonic series, it diverges. Then, from (14) and (12), it follows that
\[
\sum_{k=0}^{\infty} \frac{(g_k^T d_k)^2}{\|d_k\|^2} = \infty.
\]
This contradicts the Zoutendijk condition in Lemma 1. Hence, condition (11) holds and the proof is complete.

NUMERICAL EXPERIMENTS
This part is about the numerical results of SKRMI are compared with the other CG methods above by the
Algorithm 1 based on [13] and [14] test problems. The parameters in the strong Wolfe line search are set by the
requirements: G 0.01 and V 0.1. We considered g k d 10 6 and the gradient value as stopping criteria. All
codes were typewritten in MATLAB R2011b and run using Intel Core i5 with RAM 2GB and Window 8 operation
system. Table 1 presents a list of problem functions, dimension and the initial points to test the proposed method.
Figures show the performance profiles based on Dolan and More [15] with respect to the iterations number and
CPU time. Based on the figures, SKRMI denotes spectral Khadijah,-Rivaie-Mustafa-Ibrahim, MFR denotes
modified Fletcher-Reeves [11], MDY denotes modified Dai-Yuan and MCD denotes modified CD method. For each
method, we plot the fraction ܲ௦ ሺ‫ݐ‬ሻ of problems solved for which the method is within the number of iterations for
Fig. 1 and CPU time for Fig. 2. The left side of the figure gives the percentage of the test problems for which
method is the fastest; while the right side gives the percentage of the test problems that are successfully solved by
each of the methods. The top curve is the method that solved the most problems at a time of a factor ‫ ݐ‬of the best
time. From the figures, it is clearly shown that the SKRMI method is the best methods when compared to the other
spectral CG methods (MFR, MDY and MCD) since it is converge faster than other CG methods and able to deal
with all the test problems.
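For reference, a performance profile in the sense of Dolan and Moré [15] can be computed from a matrix of per-problem costs as in the following sketch (a hypothetical Python illustration; the function name, the cost-matrix layout, and the use of np.inf to mark failed runs are our own conventions, not taken from the paper):

```python
import numpy as np

def performance_profile(costs, taus):
    # costs[p, s]: cost (iterations or CPU time) of solver s on problem p,
    # with np.inf marking a failed run.
    best = np.min(costs, axis=1, keepdims=True)   # best cost on each problem
    ratios = costs / best                         # performance ratios r_{p,s}
    # P_s(tau): fraction of problems solver s solves within a factor tau of the best.
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(costs.shape[1])])

# Example: taus on a log scale, matching the horizontal axis of Figs. 1 and 2.
# profile = performance_profile(costs, np.exp(np.linspace(0, 4, 50)))
```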

TABLE 1: A list of test functions.

No | Problem | Dimension | Initial points
1 | Six Hump Function | 2 | (25,25), (50,50), (100,100), (200,200)
2 | Booth Function | 2 | (25,25), (50,50), (100,100), (200,200)
3 | Trecanni Function | 2 | (25,25), (50,50), (100,100), (200,200)
4 | Zettl Function | 2 | (25,25), (50,50), (100,100), (200,200)
5 | Leon Function | 2 | (2,2), (10,10), (25,25), (50,50)
6 | Extended White and Holst Function | 2 | (-3,-3), (3,3), (6,6), (9,9)
  |  | 4, 10, 100, 500, 1000 | (-3,-3,...,-3), (3,3,...,3), (6,6,...,6), (9,9,...,9)
7 | Extended Rosenbrock Function | 2 | (13,13), (16,16), (20,20), (30,30)
  |  | 4, 10, 100, 500, 1000 | (13,13,...,13), (16,16,...,16), (20,20,...,20), (30,30,...,30)
8 | Extended DENSCHNB Function | 2 | (10,10), (50,50), (100,100), (200,200)
  |  | 4, 10, 100, 500, 1000 | (10,10,...,10), (50,50,...,50), (100,100,...,100), (200,200,...,200)
9 | Shallow Function | 2 | (10,10), (50,50), (100,100), (200,200)
  |  | 4, 10, 100, 500, 1000 | (10,10,...,10), (50,50,...,50), (100,100,...,100), (200,200,...,200)
10 | Sum Squares Function | 2 | (10,10), (50,50), (100,100), (200,200)
  |  | 4, 10, 100, 500, 1000 | (10,10,...,10), (50,50,...,50), (100,100,...,100), (200,200,...,200)
11 | Extended Tridiagonal 1 Function | 2 | (10,10), (50,50), (100,100), (200,200)
  |  | 4, 10, 100, 500, 1000 | (10,10,...,10), (50,50,...,50), (100,100,...,100), (200,200,...,200)
12 | Extended Himmelblau Function | 2 | (10,10), (50,50), (100,100), (200,200)
  |  | 4, 10, 100, 500, 1000 | (10,10,...,10), (50,50,...,50), (100,100,...,100), (200,200,...,200)
13 | Extended Freudenstein & Roth Function | 2 | (2,2), (20,20), (75,75), (202,202)
  |  | 4, 10, 100, 500, 1000 | (2,2,...,2), (20,20,...,20), (75,75,...,75), (202,202,...,202)

FIGURE 1: Performance profile based on the number of iterations. [The figure plots P_s(t) against t on a log scale (e^0 to e^4) for SKRMI, MDY, and MFR/MCD.]

FIGURE 2: Performance profile based on the CPU time. [The figure plots P_s(t) against t on a log scale (e^0 to e^5) for SKRMI, MDY, and MFR/MCD.]

CONCLUSION
Recent modifications and various studies of CG methods have led to new variants. In this study, we have implemented the coefficient $\beta_k^{RMIL}$ in combination with the spectral conjugate gradient approach, yielding the SKRMI method. Numerical results show that the performance of the SKRMI method is the best when compared with the MFR, MDY, and MCD methods. We have also proved that the method is globally convergent under the strong-Wolfe line search.

ACKNOWLEDGEMENT
We gratefully acknowledge the Ministry of Higher Education of Malaysia (MOHE) and the Fundamental Research Grant Scheme (Vot 59256) for funding this research, and the Ministry of Education for the MyPhD KPM sponsorship.

REFERENCES
1. R. Fletcher and C. Reeves, Comput. J. 7, 149–154 (1964).
2. Y. H. Dai and Y. Yuan, SIAM J. Optim. 10, 177–182 (1999).
3. R. Fletcher, Practical Methods of Optimization, Vol. 1: Unconstrained Optimization (John Wiley & Sons, New York, 1987).
4. L. Armijo, Pacific J. Math. 16, 1–3 (1966).
5. P. Wolfe, SIAM Rev. 11, 226–235 (1969).
6. A. A. Goldstein, SIAM J. Control 3, 147–151 (1965).
7. L. Grippo and S. Lucidi, Mathematical Programming 78(3), 375-391 (1997).
8. M. Rivaie, M. Mustafa, I. Mohd, and M. Fauzi, Journal of Interdisciplinary Maths. 13(3), 241-251 (2010).
9. E. G. Birgin and J. M. Martinez, J. Appl. Maths. Optim. 43, 117–128 (2001).
10. J. Liu and Y. Jiang, Abstract and Applied Analysis 2012, Article ID: 758287 (2012).
11. L. Zhang, W. Zhou, and D. Li, Numer. Math. 104, 561-572 (2006).
12. G. Zoutendijk, “Nonlinear Programming Computational Methods,” in Integer and Nonlinear Programming, by
J. Abadie (Ed.) (North Holland, Amsterdam, 1970), pp. 37-86.
13. M. Jamil and X. S. Yang, Int. Journal of Mathematical Modelling and Numerical Optimisation 4(2), 150-194
(2013).
14. J. J. More, B. S. Garbow, and K. E. Hillstrom, ACM Transactions on Mathematical Software 7(1), 17-41
(1981).
15. E. D. Dolan and J. J. Moré, Math. Program. 91, 201–213 (2002).
