
Positive Definite Matrices

Notes on Linear Algebra


Chia-Ping Chen

Department of Computer Science and Engineering


National Sun Yat-Sen University
Kaohsiung, Taiwan ROC



Introduction
The signs of the eigenvalues of a matrix can be
important.
For example, in a system of differential equations,
the signs determine whether the system is stable.
The signs can also be related to the minima, maxima,
and saddle points of a multivariate function.



Expansion of a Function
For a “well-behaved” function of $x, y$, the value near the
origin can be written as

$$f(x, y) = f(0) + \frac{\partial f}{\partial x}x + \frac{\partial f}{\partial y}y + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}x^2 + \frac{\partial^2 f}{\partial x\partial y}xy + \frac{1}{2}\frac{\partial^2 f}{\partial y^2}y^2 + \dots,$$

where the derivatives are evaluated at the origin.


Near a point $(\alpha, \beta)$, the value can be written as

$$f(x, y) = f(\alpha, \beta) + \frac{\partial f}{\partial x}(x - \alpha) + \frac{\partial f}{\partial y}(y - \beta) + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(x - \alpha)^2 + \frac{\partial^2 f}{\partial x\partial y}(x - \alpha)(y - \beta) + \frac{1}{2}\frac{\partial^2 f}{\partial y^2}(y - \beta)^2 + \dots,$$

where the derivatives are evaluated at $(\alpha, \beta)$.



Stationary Points
By definition, a point $(\alpha, \beta)$ is stationary if the first-order
partial derivatives evaluated at $(\alpha, \beta)$ are all 0.
A stationary point is either a maximum, a minimum, or a
saddle point.
It is a maximum if the function values at all nearby
points are not greater.
It is a minimum if the function values at all nearby
points are not smaller.
It is a saddle point if it is neither a maximum nor a
minimum.
How do we decide?



Expansion Near a Stationary Point
The expansion of a function near a stationary point is

$$f(x, y) = f(\alpha, \beta) + \frac{\partial f}{\partial x}(x - \alpha) + \frac{\partial f}{\partial y}(y - \beta) + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(x - \alpha)^2 + \frac{\partial^2 f}{\partial x\partial y}(x - \alpha)(y - \beta) + \frac{1}{2}\frac{\partial^2 f}{\partial y^2}(y - \beta)^2 + \dots$$

Since the first-order partial derivatives vanish at a stationary point, writing $f_0 = f(\alpha, \beta)$, $\delta x = x - \alpha$, $\delta y = y - \beta$, this reduces to

$$f(x, y) \doteq f_0 + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(\delta x)^2 + \frac{\partial^2 f}{\partial x\partial y}(\delta x)(\delta y) + \frac{1}{2}\frac{\partial^2 f}{\partial y^2}(\delta y)^2.$$



Hessian Matrix
The Hessian matrix at a point $\mathbf{x} = (x, y)$ is defined by

$$H(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\partial y} \\ \dfrac{\partial^2 f}{\partial x\partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$

It follows that near a stationary point $\mathbf{x}$,

$$\delta f = \frac{1}{2}\,\delta\mathbf{x}^T H(\mathbf{x})\,\delta\mathbf{x}.$$



Quadratic Form
By definition, a function $f$ is quadratic in the variables $x, y$
if $f$ has the following form:

$$f(x, y) = ax^2 + 2bxy + cy^2.$$

$f(x, y)$ can be expressed by a real symmetric matrix:

$$f(x, y) = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}.$$

Note that the expansion of a function near a stationary
point is quadratic in the variables $\delta\mathbf{x} = (\delta x, \delta y)$.



Positive/Negative (Semi-)Definite
By definition, a quadratic function $f$ is positive definite
if its value is positive at every point except the origin.
$f$ is said to be positive semidefinite if its values are
non-negative.
$f$ is said to be negative definite if its values are
negative except at the origin.
$f$ is said to be negative semidefinite if its values are
non-positive.



Necessary Conditions
For $f(x, y) = ax^2 + 2bxy + cy^2$ to be p.d., $a > 0$ and $c > 0$.
This can be shown by looking at the points $(1, 0)$ and
$(0, 1)$.
But these are merely necessary conditions, as the
following example demonstrates:

$$f(x, y) = x^2 - 4xy + y^2 \quad\Rightarrow\quad f(1, 1) = 1 - 4 + 1 = -2 < 0.$$



Sufficient Condition
We can express $f$ in terms of squares by completing the square:

$$f(x, y) = ax^2 + 2bxy + cy^2 = a\left(x + \frac{b}{a}y\right)^2 + \left(c - \frac{b^2}{a}\right)y^2.$$

Since squares are non-negative, we see from the above that
$f$ is positive definite if (sufficient condition)

$$a > 0, \quad ac > b^2.$$
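As a quick numerical cross-check (a numpy sketch, not part of the original notes; the coefficients are illustrative), the $a > 0$, $ac > b^2$ test agrees with the positive-eigenvalue test discussed later:

    import numpy as np

    # illustrative coefficients for f(x, y) = ax^2 + 2bxy + cy^2
    a, b, c = 2.0, 1.0, 3.0
    A = np.array([[a, b], [b, c]])

    # the sufficient condition from this slide
    print((a > 0) and (a * c > b**2))               # True

    # cross-check: a symmetric matrix is p.d. iff all eigenvalues are positive
    print(bool(np.all(np.linalg.eigvalsh(A) > 0)))  # True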



Negative Definite
$f(x, y) = ax^2 + 2bxy + cy^2$ is negative definite if
$f(x, y) < 0$ for all $(x, y)$ other than $(0, 0)$.
$f$ is negative definite iff $-f$ is positive definite.
A sufficient condition for negative definiteness is

$$(-a) > 0, \quad (-a)(-c) > (-b)^2.$$

Equivalently, getting rid of the negative signs,

$$a < 0, \quad ac > b^2.$$

example and diagram



Singular Case
We have a singular case if

$$ac = b^2.$$

If $a > 0$, $f$ is still non-negative everywhere, since

$$f(x, y) = a\left(x + \frac{b}{a}y\right)^2.$$

The surface $z = f(x, y)$ degenerates from a bowl to a
valley, along the line $ax + by = 0$.
$f$ is said to be positive semidefinite (psd) if $a > 0$ and
negative semidefinite if $a < 0$.
example and diagram



Saddle Point
The remaining case is when

$$ac < b^2.$$

In this case, (0, 0) is a saddle point.


Along one direction (0, 0) is a minimum, and along
another direction (0, 0) is a maximum.
f is said to be indefinite.



Example and Diagram
Let
$$f(x, y) = x^2 - y^2.$$
Note
$$ac = -1 < 0 = b^2.$$
The point $(0, 0)$ is a minimum along the $x$-axis, and a
maximum along the $y$-axis.



Matrix and Quadratic Form
A matrix $A$ defines a quadratic form,

$$\mathbf{x}^T A \mathbf{x} = \begin{bmatrix} x_1 & \dots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \sum_{i,j=1}^{n} a_{ij} x_i x_j.$$

Note that for $i \neq j$, the coefficient of $x_i x_j$ is $a_{ij} + a_{ji}$.
Given a quadratic form, we make $A$ symmetric by
requiring $a_{ij} = a_{ji}$.
The positive-definiteness of a matrix is defined via
$f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$.



Positive Definite Matrices
A real symmetric matrix $A$ is said to be positive definite
if $\mathbf{x}^T A \mathbf{x} > 0$ except for $\mathbf{x} = 0$.
From the earlier discussion, we have, in the
two-dimensional case,

$$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$$

is positive definite iff $a > 0$ and $ac > b^2$.
We generalize to the case that $A$ is $n \times n$.



Conditions for Positive Definiteness
The following are equivalent conditions for positive
definiteness.
All eigenvalues are positive.
All determinants of the leading principal submatrices are
positive.
All pivots are positive.
example
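As an illustration (a numpy sketch with a made-up matrix, not part of the original notes), all three conditions can be checked directly:

    import numpy as np

    A = np.array([[2., -1., 0.],
                  [-1., 2., -1.],
                  [0., -1., 2.]])

    # 1. all eigenvalues are positive
    print(np.linalg.eigvalsh(A))                             # all > 0

    # 2. all leading principal minors are positive
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
    print(minors)                                            # 2, 3, 4

    # 3. all pivots are positive: d_k = |A_k| / |A_{k-1}|
    minors = [1.0] + minors
    print([minors[k] / minors[k - 1] for k in range(1, 4)])  # 2, 3/2, 4/3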



Positive Eigenvalues
If a matrix is positive definite, then it has all positive
eigenvalues.
Let $x_i$ be an eigenvector with eigenvalue $\lambda_i$:

$$x_i^T A x_i = \lambda_i (x_i^T x_i) > 0 \Rightarrow \lambda_i > 0.$$

Conversely, if all eigenvalues are positive, then the
matrix is positive definite.
Via a complete set of eigenvectors (the columns of $Q$), with $c = Q^T x$,

$$x^T A x = x^T Q \Lambda Q^T x = \sum_i c_i^2 \lambda_i > 0.$$

example



Positive Determinants
If $A$ is positive definite, then the leading principal
submatrices $A_k$ have positive determinants.
$A_k$ is positive definite. To prove this, let $x$ be a vector
whose last $n - k$ components are 0:

$$x^T A x = \begin{bmatrix} x_k^T & 0 \end{bmatrix} \begin{bmatrix} A_k & * \\ * & * \end{bmatrix} \begin{bmatrix} x_k \\ 0 \end{bmatrix} = x_k^T A_k x_k > 0.$$

Since all eigenvalues of $A_k$ are positive,

$$|A_k| = \prod_i \lambda_i > 0.$$

example
Positive Pivots
Suppose $A$ is positive definite; then all pivots $d_i$ are
positive:

$$d_k = \frac{|A_k|}{|A_{k-1}|} > 0,$$

since $|A_k| > 0$ for all $k$.
Conversely, if $d_i > 0$ for all $i$, then $A$ is positive definite:

$$A = LDL^T \Rightarrow x^T A x = \sum_i d_i (L^T x)_i^2 > 0.$$

example
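A small numerical sketch of both directions (not from the notes; it assumes scipy is available and uses an illustrative matrix):

    import numpy as np
    from scipy.linalg import ldl

    A = np.array([[4., 2.], [2., 3.]])
    L, D, _ = ldl(A)        # A = L D L^T; no pivoting is needed for p.d. A
    print(np.diag(D))       # pivots: 4, 2 -- both positive

    x = np.array([1., -2.])
    y = L.T @ x
    print(x @ A @ x, np.sum(np.diag(D) * y**2))  # both 8: x^T A x = sum d_i (L^T x)_i^2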



Relation to Least Squares
For a least-squares problem $Rx = b$, we solve the
normal equation

$$R^T R \bar{x} = R^T b.$$

Note that the matrix $A = R^T R$ is symmetric.
(theorem) $A$ is positive definite if $R$ has linearly
independent columns:

$$x^T A x = x^T R^T R x = (Rx)^T (Rx) \begin{cases} = 0, & x = 0, \\ > 0, & x \neq 0. \end{cases}$$

example
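For instance (a numpy sketch, not part of the original notes; $R$ is illustrative):

    import numpy as np

    # R with linearly independent columns
    R = np.array([[1., 0.],
                  [1., 1.],
                  [1., 2.]])
    A = R.T @ R

    print(np.linalg.eigvalsh(A))                # all positive: A = R^T R is p.d.

    x = np.array([3., -1.])
    print(x @ A @ x, np.linalg.norm(R @ x)**2)  # equal: x^T A x = |Rx|^2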



Cholesky Decomposition
(Cholesky decomposition) A positive definite matrix is
the product of a lower-triangular matrix and its
transpose.
A special case comes from the $LDU$ decomposition:

$$A = LDL^T = LD^{1/2} D^{1/2} L^T = R^T R.$$

There are infinitely many ways to decompose a positive
definite $A = R^T R$. In fact, $R' = QR$, where $Q$ is
orthogonal, also satisfies $A = R'^T R'$.
example
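One such factorization can be computed with numpy (a sketch, not from the notes; the matrix is illustrative). numpy returns the lower-triangular factor $L$ with $A = LL^T$, so $R = L^T$ matches the $A = R^T R$ convention above:

    import numpy as np

    A = np.array([[4., 2.], [2., 3.]])

    L = np.linalg.cholesky(A)        # lower triangular, A = L L^T
    R = L.T
    print(R)                         # [[2, 1], [0, sqrt(2)]]
    print(np.allclose(R.T @ R, A))   # True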



Ellipsoids in n Dimensions
Consider the equation $\mathbf{x}^T A \mathbf{x} = 1$, where $A$ is p.d.
If $A$ is diagonal, the graph is easily seen to be an ellipsoid.
If $A$ is not diagonal, then using the eigenvectors as the
columns of $Q$, we have

$$\mathbf{x}^T A \mathbf{x} = \mathbf{x}^T Q \Lambda Q^T \mathbf{x} = \mathbf{y}^T \Lambda \mathbf{y} = \lambda_1 y_1^2 + \dots + \lambda_n y_n^2.$$

$y_i = q_i^T \mathbf{x}$ is the component of $\mathbf{x}$ along the $i$th
eigenvector $q_i$.



Principal Axes
The axes of the ellipsoid defined by $\mathbf{x}^T A \mathbf{x} = 1$ point
toward the eigenvectors $q_i$ of $A$.
They are called principal axes.
The principal axes are mutually orthogonal.
The length of the axis along $q_i$ is $1/\sqrt{\lambda_i}$.
example
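For instance (a numpy sketch, not from the notes; the matrix is illustrative):

    import numpy as np

    A = np.array([[5., 4.], [4., 5.]])   # p.d., so x^T A x = 1 is an ellipse

    lam, Q = np.linalg.eigh(A)           # eigenvalues 1 and 9
    print(1 / np.sqrt(lam))              # axis half-lengths: 1 and 1/3
    print(Q)                             # columns: directions of the axes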



Semidefinite Matrices
A matrix $A$ is said to be positive semidefinite if

$$\mathbf{x}^T A \mathbf{x} \ge 0 \quad\text{for all } \mathbf{x}.$$

Each of the following conditions is equivalent to
positive semidefiniteness.
All eigenvalues are non-negative.
$|A_k| \ge 0$ for all principal submatrices $A_k$.
All pivots are non-negative.
$A = R^T R$ for some $R$.



Example
The following matrix is positive semidefinite:

$$A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$$

quadratic form:

$$\mathbf{x}^T A \mathbf{x} = (x_1 - x_2)^2 + (x_2 - x_3)^2 + (x_3 - x_1)^2 \ge 0.$$

eigenvalues: $0, 3, 3 \ge 0$; pivots: $2, \frac{3}{2} \ge 0$.
determinants:

$$|A_1| = 2, \quad |A_2| = 3, \quad |A_3| = 0 \ge 0.$$


Congruence Transform
The congruence transform of $A$ by a non-singular $C$ is
defined by

$$A \to C^T A C.$$

The congruence transform is related to the quadratic form:

$$x = Cy \Rightarrow x^T A x = y^T C^T A C y.$$

If $A$ is symmetric, so is its congruence transform
$C^T A C$.
example



Sylvester’s Law
The signs of the eigenvalues are invariant under a
congruence transform.
(proof) For simplicity, suppose $A$ is nonsingular. Let

$$C = QR, \quad C(t) = tQ + (1 - t)QR.$$

The eigenvalues of $C(t)^T A C(t)$ change gradually as
we vary $t$ from 0 to 1, but they are never 0, since
$C(t) = Q(tI + (1 - t)R)$ is never singular. So the signs
of the eigenvalues never change.
Since $Q^T A Q$ and $A$ have the same eigenvalues, the
law is proved.



Signs of Pivots
The $LDU$ decomposition of a symmetric matrix $A$ is

$$A = LDU = U^T D U.$$

So $A$ is a congruence transform of $D$.
As a result of Sylvester's law, the signs of the
pivots (the eigenvalues of $D$) agree with the signs of the
eigenvalues.
example



Locating Eigenvalues
The relation between pivots and eigenvalues can be
used to locate eigenvalues.
First, note that if $A$ has an eigenvalue $\lambda$, then $A - cI$
has the eigenvalue $\lambda - c$ with the same eigenvector:

$$Ax = \lambda x \Rightarrow (A - cI)x = (\lambda - c)x.$$



Example
Consider

$$A = \begin{bmatrix} 3 & 3 & 0 \\ 3 & 10 & 7 \\ 0 & 7 & 8 \end{bmatrix}, \quad B = A - 2I = \begin{bmatrix} 1 & 3 & 0 \\ 3 & 8 & 7 \\ 0 & 7 & 6 \end{bmatrix}.$$

$B$ has a negative pivot, so it has a negative eigenvalue.
$A$ is positive definite.
It follows that

$$\lambda_A > 0, \quad \lambda_B = \lambda_A - 2 < 0 \quad\Rightarrow\quad 0 < \lambda_A < 2.$$
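This can be verified numerically (a numpy sketch, not part of the original notes):

    import numpy as np

    A = np.array([[3., 3., 0.],
                  [3., 10., 7.],
                  [0., 7., 8.]])
    B = A - 2 * np.eye(3)

    # pivots of B from ratios of leading principal minors
    m = [1.0] + [np.linalg.det(B[:k, :k]) for k in range(1, 4)]
    print([m[k] / m[k - 1] for k in range(1, 4)])  # 1, -1, 55: one negative pivot

    print(np.linalg.eigvalsh(A))   # all positive, and the smallest is below 2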



Generalized Eigenvalue Problem
A generalized eigenvalue problem has the form

$$Ax = \lambda M x,$$

where $A, M$ are given matrices.
We consider only the case where $A$ is symmetric and $M$
is positive definite.
example
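scipy can solve this directly (a sketch, not from the notes; the matrices are illustrative). scipy.linalg.eigh accepts a second matrix for the generalized problem:

    import numpy as np
    from scipy.linalg import eigh

    A = np.array([[2., 0.], [0., 1.]])   # symmetric
    M = np.array([[4., 1.], [1., 2.]])   # positive definite

    lam, X = eigh(A, M)                  # solves A x = lambda M x
    print(lam)

    # scipy normalizes the eigenvectors to be M-orthonormal, which also
    # yields the simultaneous diagonalization discussed a few slides later
    print(np.allclose(X.T @ M @ X, np.eye(2)))     # True
    print(np.allclose(X.T @ A @ X, np.diag(lam)))  # True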



Equivalent Eigenvalue Problem
A generalized eigenvalue problem can be converted to
an equivalent ordinary eigenvalue problem.
We can write $M = R^T R$, where $R$ is invertible.
Let $y = Rx$:

$$Ax = \lambda M x = \lambda R^T R x \Rightarrow A R^{-1} y = \lambda R^T y.$$

Let $C = R^{-1}$, so $(R^T)^{-1} = C^T$. Then

$$C^T A C y = \lambda y.$$

This is an equivalent eigenvalue problem: same
eigenvalues, related eigenvectors $x = Cy$.



Properties
Since $C^T A C$ is symmetric, the eigenvectors $y_j$ can be
made orthonormal.
The eigenvectors $x_j$ are $M$-orthonormal, i.e.,

$$x_i^T M x_j = x_i^T R^T R x_j = y_i^T y_j = \delta_{ij}.$$

example



Simultaneous Diagonalization
Both $M$ and $A$ can be diagonalized by the generalized
eigenvectors $x_i$:

$$x_i^T M x_j = y_i^T y_j = \delta_{ij}, \quad x_i^T A x_j = \lambda_j x_i^T M x_j = \lambda_j \delta_{ij}.$$

That is, using the $x_i$ as the columns of $S$, we have

$$S^T A S = \Lambda, \quad S^T M S = I.$$

example
Note that these are congruence transforms to diagonal
matrices rather than similarity transforms, since $S^T$ is
used, not $S^{-1}$.



Singular Value Decomposition
The singular value decomposition (SVD) of a matrix $A$
is defined by

$$A = U \Sigma V^T,$$

where $U, \Sigma, V$ are related to the matrices $A^T A$ and $AA^T$.
Here $A$ is not limited to square matrices.



SVD Theorem
Any $m \times n$ real matrix $A$ with rank $r$ can be factored as

$$A = U \Sigma V^T = (\text{orthogonal})(\text{diagonal})(\text{orthogonal}).$$

$U$ is an eigenvector matrix of $AA^T$.
$V$ is an eigenvector matrix of $A^T A$.
$\Sigma$ contains the $r$ singular values on the diagonal.
A singular value is the square root of a non-zero
eigenvalue of $A^T A$.
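As a numerical illustration (a numpy sketch, not from the notes; the matrix is random):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))      # an arbitrary rectangular matrix

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(np.allclose(U @ np.diag(s) @ Vt, A))   # True: A = U Sigma V^T

    # squared singular values = eigenvalues of A^T A
    print(np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A))))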



Proof
Suppose $v_1, \dots, v_n$ are orthonormal eigenvectors of
$A^T A$, with eigenvalues in non-increasing order:

$$v_i^T A^T A v_j = \lambda_j v_i^T v_j = \lambda_j \delta_{ij}, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n.$$

Since $A^T A$ has the same nullspace as $A$, there are $r$
non-zero eigenvalues.
These non-zero eigenvalues are positive since $A^T A$ is
p.s.d.
Define the singular value for each positive eigenvalue:

$$\sigma_j = \sqrt{\lambda_j}, \quad j = 1, \dots, r.$$



Proof
$u_j = \dfrac{A v_j}{\sigma_j}$, $j = 1, \dots, r$, is an eigenvector of $AA^T$, and the $u_j$
are orthonormal:

$$AA^T u_j = \frac{A A^T A v_j}{\sigma_j} = \lambda_j \frac{A v_j}{\sigma_j} = \lambda_j u_j, \quad u_i^T u_j = \delta_{ij}.$$

Construct $V$ with the $v$'s, and $U$ with the $u$'s together with
eigenvectors for 0:

$$(U^T A V)_{ij} = u_i^T A v_j = \begin{cases} 0 & \text{if } j > r, \\ \sigma_j u_i^T u_j = \sigma_j \delta_{ij} & \text{if } j \le r. \end{cases}$$

That is, $U^T A V = \Sigma$. So $A = U \Sigma V^T$.



Remarks
$A$ multiplied by a column of $V$ produces a multiple of a
column of $U$:

$$AV = U\Sigma, \quad\text{or}\quad A v_j = \sigma_j u_j.$$

$U$ is the eigenvector matrix of $AA^T$ and $V$ is the
eigenvector matrix of $A^T A$:

$$AA^T = U \Sigma \Sigma^T U^T; \quad A^T A = V \Sigma^T \Sigma V^T.$$

The non-zero eigenvalues of $AA^T$ and $A^T A$ are the
same. They are in $\Sigma \Sigma^T$.



Example
Consider

$$A = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}, \quad A^T A = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}.$$

The singular values are $\sqrt{3}$ and $1$.
Finding the $v_i$ and $u_i$, one has

$$A = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1/\sqrt{6} & -2/\sqrt{6} & 1/\sqrt{6} \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \end{bmatrix}.$$
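The singular values can be confirmed with numpy (a sketch, not in the original slides):

    import numpy as np

    A = np.array([[-1., 1., 0.],
                  [0., -1., 1.]])

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(s)   # [1.732..., 1.0] = [sqrt(3), 1]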
Applications of SVD
Through the SVD, we can represent a matrix as a sum of
rank-1 matrices:

$$A = U \Sigma V^T = \sigma_1 u_1 v_1^T + \dots + \sigma_r u_r v_r^T.$$

Suppose we have a $1000 \times 1000$ matrix, for a total of $10^6$
entries. Using the above expansion and keeping only the
50 most significant terms requires only
$50(1 + 1000 + 1000)$ numbers, a saving of almost
90% of the space.
commonly used in image processing
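A minimal sketch of the storage count (not from the notes; a random matrix illustrates the bookkeeping only, since its singular values decay slowly, while real images compress far better):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 1000))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 50
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]         # rank-50 approximation
    print(U[:, :k].size + s[:k].size + Vt[:k].size)  # 100050 numbers vs 10^6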



SVD for Image



Pseudo-Inverse
Consider the normal equation

$$A^T A \hat{x} = A^T b.$$

$A^T A$ is not always invertible, and $\hat{x}$ is not unique.
Among all solutions, we denote the one with the
minimum length by $x^+$:

$$A^T A x^+ = A^T b.$$

The matrix that produces $x^+$ from $b$ is called the
pseudo-inverse of $A$, denoted by $A^+$:

$$x^+ = A^+ b.$$
Pseudoinverse and SVD
$A^+$ is related to the SVD $A = U \Sigma V^T$ by

$$A^+ = V \Sigma^+ U^T.$$

Note that $\Sigma^+$ is the $n \times m$ matrix with diagonal entries
$\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}$ and all other entries 0.
If $A$ is invertible, then

$$A A^+ = U \Sigma V^T V \Sigma^+ U^T = I \Rightarrow A^+ = A^{-1}.$$
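numpy's pinv computes exactly this construction (a sketch, not from the notes; the rank-deficient matrix is illustrative):

    import numpy as np

    A = np.array([[1., 1.],
                  [1., 1.],
                  [0., 0.]])           # rank 1, so A^T A is singular
    b = np.array([1., 1., 1.])

    x_plus = np.linalg.pinv(A) @ b     # pinv is computed via the SVD
    print(x_plus)                      # [0.5, 0.5]: the minimum-length solution

    # building V Sigma^+ U^T by hand gives the same matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_plus = np.array([1 / si if si > 1e-12 else 0.0 for si in s])
    print(np.allclose(Vt.T @ np.diag(s_plus) @ U.T, np.linalg.pinv(A)))  # True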



Minimum Length
Consider $|Ax - b|$.
Multiplying by $U^T$ leaves the length unchanged:

$$|Ax - b| = |U \Sigma V^T x - b| = |\Sigma V^T x - U^T b| = |\Sigma y - U^T b|.$$

$x$ and $y$ have the same length, since $y = V^T x = V^{-1} x$.
$\Sigma y$ has at most $r$ non-zero components, which are
equated to those of $U^T b$. The other components of $y$ are set to 0 to
minimize $|y|$, so $y^+ = \Sigma^+ U^T b$.
The minimum-length least-squares solution for $x$ is

$$x^+ = V y^+ = V \Sigma^+ U^T b = A^+ b.$$



Different Perspectives
We have solved several classes of problems, including
systems of linear equations, and
eigenvalue problems.
The same solutions can be achieved by completely
different problem formulations.



Optimization Problem for Ax = b
If $A$ is positive definite, then

$$P(x) = \frac{1}{2} x^T A x - x^T b$$

reaches its minimum where

$$Ax = b.$$

This is proved by showing

$$P(y) - P(x) = \frac{1}{2}(y - x)^T A (y - x) \ge 0.$$
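A quick numerical check (a numpy sketch, not part of the notes; $A$ and $b$ are illustrative):

    import numpy as np

    A = np.array([[2., 1.], [1., 2.]])   # positive definite
    b = np.array([1., 0.])

    P = lambda x: 0.5 * x @ A @ x - x @ b
    x_star = np.linalg.solve(A, b)       # the point where A x = b

    rng = np.random.default_rng(0)
    print(all(P(x_star + rng.standard_normal(2)) >= P(x_star)
              for _ in range(100)))      # True: no other point does better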



$A^T A x = A^T b$

The least-squares problem for $Ax = b$ has a flavor of
minimization.
The function to be minimized is

$$E(x) = |Ax - b|^2 = (Ax - b)^T (Ax - b) = x^T A^T A x - x^T A^T b - b^T A x + b^T b.$$

$E(x)$ has essentially the same form as $P(x)$, with $A' = A^T A$
and $b' = A^T b$. The minimum is achieved where

$$A' x = b', \quad\text{i.e.,}\quad A^T A x = A^T b.$$



Rayleigh’s Quotient
The Rayleigh quotient is defined by

$$R(x) = \frac{x^T A x}{x^T x},$$

where $A$ is real symmetric.
The smallest eigenvalue of

$$Ax = \lambda x$$

can be found by minimizing $R(x)$.
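Numerically, a generic minimizer recovers the smallest eigenvalue (a sketch assuming scipy is available; the matrix is illustrative):

    import numpy as np
    from scipy.optimize import minimize

    A = np.array([[2., -1., 0.],
                  [-1., 2., -1.],
                  [0., -1., 2.]])

    R = lambda x: (x @ A @ x) / (x @ x)

    res = minimize(R, x0=np.ones(3))   # generic numerical minimization of R
    print(res.fun)                     # ~0.5858
    print(np.linalg.eigvalsh(A)[0])    # 2 - sqrt(2) ~ 0.5858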



Rayleigh’s Principle
The minimum value of the Rayleigh quotient $R(x)$ is
the smallest eigenvalue $\lambda_1$ of $A$, achieved by the
corresponding eigenvector $x_1$.
This follows from writing $x = Qy$:

$$R(x) = \frac{(Qy)^T A (Qy)}{(Qy)^T (Qy)} = \frac{y^T \Lambda y}{y^T y} = \frac{\lambda_1 y_1^2 + \dots + \lambda_n y_n^2}{y_1^2 + \dots + y_n^2} \ge \frac{\lambda_1 (y_1^2 + \dots + y_n^2)}{y_1^2 + \dots + y_n^2} = \lambda_1.$$

