
Connexions module: m11240

Optimization Theory ∗

Don Johnson
This work is produced by The Connexions Project and licensed under the
Creative Commons Attribution License †

Optimization theory is the study of the extremal values of a function: its minima and maxima. Topics in this theory range from conditions for the existence of a unique extremal value to methods, both analytic and numeric, for finding the extremal values and the values of the independent variables at which the function attains its extremes. In this book, minimizing an error criterion is an essential step toward deriving optimal signal processing algorithms. This appendix summarizes the key results of optimization theory needed to understand optimal algorithms.

1 Unconstrained Optimization
The simplest optimization problem is to find the minimum of a scalar-valued function of a scalar variable f(x), the so-called objective function, and where that minimum is located. Assuming the function is differentiable, the well-known conditions for finding the minima, local and global, are1

d/dx f(x) = 0

d²/dx² f(x) > 0

All values of the independent variable x satisfying these relations are locations of local minima.
Without the second condition, solutions to the first could be maxima, minima, or inflection points. Solutions to the first equation are termed the stationary points of the objective function. To find the global minimum, that value (or values) where the function achieves its smallest value, each candidate extremum must be tested: the objective function must be evaluated at each stationary point and the smallest selected. If, however, the objective function can be shown to be strictly convex, then only one solution of d/dx f(x) = 0 exists and that solution corresponds to the global minimum. The function f(x) is strictly convex if, for any choice of distinct x1, x2 and any scalar a with 0 < a < 1, f(ax1 + (1 − a)x2) < af(x1) + (1 − a)f(x2). Convex objective functions occur often in practice and are more easily minimized because of this property.
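The two conditions above can be illustrated numerically. The following sketch (not part of the original module) uses a hypothetical objective f(x) = (x − 2)² + 1, locates its stationary point by bisecting on a finite-difference approximation of f′, and then checks the second-derivative condition:

```python
def f(x):
    return (x - 2.0) ** 2 + 1.0   # hypothetical strictly convex objective

def dfdx(x, h=1e-6):
    # central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def d2fdx2(x, h=1e-4):
    # central finite-difference approximation of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

# Bisect on f'(x) = 0 to locate the stationary point
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if dfdx(lo) * dfdx(mid) <= 0:
        hi = mid
    else:
        lo = mid
x_star = 0.5 * (lo + hi)

assert abs(x_star - 2.0) < 1e-6   # stationary point at x = 2
assert d2fdx2(x_star) > 0         # second-order condition: a minimum
```

Because this f is strictly convex, the single stationary point found is also the global minimum, as the text states.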
When the objective function f(·) depends on a complex variable z, subtleties enter the picture. If the function f(z) is differentiable, its extremes can be found in the obvious way: find the derivative, set it equal to zero, and solve for the locations of the extrema. However, there are many situations in which this function is not differentiable. In contrast to functions of a real variable, non-differentiable functions of a complex variable occur frequently. The simplest example is f(z) = |z|². The minimum value of this function obviously occurs at the origin. To calculate this obvious answer, a complication arises: the function f(z) = z z̄ is not analytic with respect to z and hence not differentiable. More generally, the derivative of a function with respect to a complex-valued variable cannot be evaluated directly when the function depends on the variable's conjugate. See Churchill [2] for more about the analysis of functions of a complex variable.
∗ Version 1.4: Mar 22, 2008 4:24 pm GMT-5
† http://creativecommons.org/licenses/by/1.0
1 The maximum of a function is found by finding the minimum of its negative.

http://cnx.org/content/m11240/1.4/
This complication can be resolved with either of two methods tailored for optimization problems. The first is to express the objective function in terms of the real and imaginary parts of z and find the function's minimum with respect to these two variables.2 This approach is unnecessarily tedious but will yield the solution. The second, more elegant, approach relies on two results from complex variable theory. First, the quantities z and z̄ can be treated as independent variables, each considered a constant with respect to the other. A variable and its conjugate are thus viewed as the result of applying an invertible linear transformation to the variable's real and imaginary parts. Thus, if the real and imaginary parts can be considered as independent variables, so can the variable and its conjugate, with the advantage that the mathematics is far simpler. In this way, ∂/∂z (|z|²) = z̄ and ∂/∂z̄ (|z|²) = z. Seemingly, the next step to minimizing the objective function is to set the derivatives with respect to each quantity to zero and then solve the resulting pair of equations. As the following theorem suggests, that solution is overly complicated.

Theorem 1:
If the function f(z, z̄) is real-valued and analytic with respect to z and z̄, all stationary points can be found by setting the derivative (in the sense just given) with respect to either z or z̄ to zero [1].

Thus, to find the minimum of |z|², compute the derivative with respect to either z or z̄. In most cases, the derivative with respect to z̄ is the most convenient choice.3 Thus, ∂/∂z̄ (|z|²) = z and the stationary point is z = 0. As this objective function is strictly convex, the objective function's sole stationary point is its global minimum.
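The identity ∂/∂z̄ (|z|²) = z can be checked numerically from the definition of the conjugate (Wirtinger) derivative, ∂/∂z̄ = ½(∂/∂x + i ∂/∂y). A minimal sketch, assuming only that definition and a hypothetical test point:

```python
def f(z):
    return abs(z) ** 2   # f(z) = |z|^2 = z * conj(z), real-valued

def wirtinger_conj(f, z, h=1e-6):
    # ∂f/∂z̄ = (1/2)(∂f/∂x + i ∂f/∂y), via central differences
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dfdx + 1j * dfdy)

z0 = 1.5 - 0.5j                   # hypothetical evaluation point
g = wirtinger_conj(f, z0)
assert abs(g - z0) < 1e-6         # matches the predicted derivative z
```

Setting this derivative to zero yields the stationary point z = 0, in agreement with the text.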
When the objective function depends on a vector-valued quantity x, the evaluation of the function's stationary points is a simple extension of the scalar-variable case. However, testing stationary points as possible locations for minima is more complicated [3]. The gradient of the scalar-valued function f(x) of a vector x (dimension N) equals an N-dimensional vector where each component is the partial derivative of f(·) with respect to each component of x.

∇x(f(x)) = (∂f(x)/∂x1, …, ∂f(x)/∂xN)^T

For example, the gradient of x^T A x is Ax + A^T x. This result is easily derived by expressing the quadratic form as a double sum (Σi Σj Aij xi xj) and evaluating the partials directly. When A is symmetric, which is often the case, this gradient becomes 2Ax.
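The gradient formula Ax + A^T x can be verified against finite differences; a small sketch with a hypothetical 2×2 non-symmetric A:

```python
# Verify ∇(x^T A x) = A x + A^T x by finite differences (2x2 illustration)
A = [[2.0, 1.0],
     [0.0, 3.0]]   # deliberately non-symmetric

def quad(x):
    # x^T A x written as the double sum Σi Σj Aij xi xj
    return sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

def grad_fd(x, h=1e-6):
    # central finite-difference gradient
    g = []
    for k in range(2):
        xp = list(x); xp[k] += h
        xm = list(x); xm[k] -= h
        g.append((quad(xp) - quad(xm)) / (2 * h))
    return g

x = [1.0, -2.0]
analytic = [sum((A[i][j] + A[j][i]) * x[j] for j in range(2)) for i in range(2)]
numeric = grad_fd(x)
assert all(abs(a - n) < 1e-6 for a, n in zip(analytic, numeric))
```

With a symmetric A the `analytic` expression reduces to 2Ax, matching the text.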
The gradient "points" in the direction of the maximum rate of increase of the function f(·). This fact is often used in numerical optimization algorithms. The method of steepest descent is an iterative algorithm where a candidate minimum is augmented by a quantity proportional to the negative of the objective function's gradient to yield the next candidate.

xk = xk−1 − α∇x(f(xk−1)), α > 0

If the objective function is sufficiently "smooth" (there aren't too many minima and maxima), this approach will yield the global minimum. Strictly convex functions are certainly smooth enough for this method to work.
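The iteration above can be sketched for the strictly convex quadratic x^T A x, whose gradient for symmetric A is 2Ax; the matrix and step size α here are illustrative choices, not from the original:

```python
# Steepest descent x_k = x_{k-1} - α ∇f(x_{k-1}) on f(x) = x^T A x,
# with A symmetric positive definite (2x2 illustration)
A = [[2.0, 0.5],
     [0.5, 1.0]]

def grad(x):
    # ∇(x^T A x) = 2 A x for symmetric A
    return [2 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

alpha = 0.1                       # small fixed step size (assumed)
x = [4.0, -3.0]                   # arbitrary starting candidate
for _ in range(200):
    g = grad(x)
    x = [x[i] - alpha * g[i] for i in range(2)]

assert all(abs(c) < 1e-6 for c in x)   # converges to the global minimum at 0
```

A step size that is too large would make the iteration diverge; convergence here relies on α being small relative to the largest eigenvalue of 2A.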
The gradient of the gradient of f(x), denoted by ∇²x(f(x)), is a matrix where the jth column is the gradient of the jth component of f's gradient. This quantity is known as the Hessian, defined to be the matrix of all the second partials of f(·).

[∇²x(f(x))]ij = ∂²f(x)/∂xi∂xj
2 The multi-variate minimization problem is discussed in a few paragraphs.
3 Why should this be? In the next few examples, try both and see which you feel is "easier".


The Hessian is always a symmetric matrix.


The minima of the objective function f(x) occur when

∇x(f(x)) = 0

and

∇²x(f(x)) > 0

i.e., the Hessian is positive definite. Thus, for a stationary point to be a minimum, the Hessian evaluated at that point must be a positive definite matrix. When the objective function is strictly convex, this test need not be performed. For example, the objective function f(x) = x^T A x is convex whenever A is positive definite and symmetric.4
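For a 2×2 Hessian, positive definiteness can be tested with Sylvester's criterion (all leading principal minors positive). A sketch reusing a hypothetical symmetric A, whose quadratic form has Hessian 2A:

```python
# Second-order test: the Hessian of f(x) = x^T A x is 2A (A symmetric);
# for a 2x2 symmetric matrix, positive definiteness is equivalent to
# positive leading principal minors (Sylvester's criterion)
A = [[2.0, 0.5],
     [0.5, 1.0]]
H = [[2 * A[i][j] for j in range(2)] for i in range(2)]   # Hessian 2A

def is_positive_definite_2x2(M):
    # leading principal minors: M[0][0] and det(M)
    return M[0][0] > 0 and (M[0][0] * M[1][1] - M[0][1] * M[1][0]) > 0

assert is_positive_definite_2x2(H)   # the stationary point x = 0 is a minimum
```

For larger matrices, an eigenvalue or Cholesky test serves the same purpose.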
When the independent vector is complex-valued, the issues discussed in the scalar case also arise. Because of the complex-valued quantities involved, how to evaluate the gradient becomes an issue: is ∇z or ∇z̄ more appropriate? In contrast to the case of complex scalars, the choice in the case of complex vectors is unique.

Theorem 2:
Let f(z, z̄) be a real-valued function of the vector-valued complex variable z where the dependence on the variable and its conjugate is explicit. By treating z and z̄ as independent variables, the quantity pointing in the direction of the maximum rate of change of f(z, z̄) is ∇z̄(f(z)) [1].

To show this result, consider the variation of f given by

δf = Σi [(∂f/∂zi) δzi + (∂f/∂z̄i) δz̄i] = (∇z(f))^T δz + (∇z̄(f))^T δz̄

This quantity is concisely expressed as δf = 2Re[(∇z̄(f))^H δz]. By the Schwarz inequality, the maximum value of this variation occurs when δz is in the same direction as ∇z̄(f). Thus, the direction corresponding to the largest change in the quantity f(z, z̄) is the direction of its gradient with respect to z̄. To implement the method of steepest descent, for example, the gradient with respect to the conjugate must be used.
To find the stationary points of a scalar-valued function of a complex-valued vector, we must solve

∇z̄(f(z)) = 0 (1)

For solutions of this equation to be minima, the Hessian, defined to be the matrix of mixed partials given by ∇z(∇z̄(f(z))), must be positive definite. For example, the required gradient of the objective function z^H A z is given by Az, implying for positive definite A that the stationary point is z = 0. The Hessian of the objective function is simply A, confirming that the minimum of a quadratic form is always at the origin.
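The complex steepest-descent recipe, stepping along −∇z̄(f) = −Az for the quadratic form z^H A z, can be sketched as follows; the Hermitian positive definite A and the step size are illustrative assumptions:

```python
# Minimize z^H A z over complex vectors by descending along -∇z̄ f = -A z
# (A Hermitian positive definite; 2x2 illustration)
A = [[2.0 + 0.0j, 0.5 - 0.5j],
     [0.5 + 0.5j, 1.0 + 0.0j]]

def grad_conj(z):
    # ∂/∂z̄ (z^H A z) = A z, treating z and z̄ as independent variables
    return [sum(A[i][j] * z[j] for j in range(2)) for i in range(2)]

alpha = 0.2                        # small fixed step size (assumed)
z = [1.0 + 2.0j, -1.0 - 1.0j]      # arbitrary starting candidate
for _ in range(400):
    g = grad_conj(z)
    z = [z[i] - alpha * g[i] for i in range(2)]

assert all(abs(c) < 1e-6 for c in z)   # the quadratic form's minimum is z = 0
```

Note that each step decreases f, since δf = 2Re[(∇z̄ f)^H δz] = −2α‖∇z̄ f‖² < 0 away from the stationary point.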

References

[1] D. H. Brandwood. A complex gradient operator and its application in adaptive array theory. IEE Proc., Pts. F and H, 130:11–16, 1983.

[2] R. V. Churchill and J. W. Brown. Complex Variables and Applications. McGraw-Hill, 1989.

[3] D. G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.

4 Note that the Hessian of x^T A x is 2A.
