
Positive Definite Matrices

Notes on Linear Algebra


Chia-Ping Chen

Department of Computer Science and Engineering


National Sun Yat-Sen University
Kaohsiung, Taiwan ROC



Introduction
The signs of the eigenvalues of a matrix can be
important.
For example, in a system of differential equations,
the signs determine whether the system is stable.
The signs can also be related to the minima, maxima,
and saddle points of a multivariate function.



Expansion of a Function
For a “well-behaved” function of $x, y$, the value near the
origin can be written as

$$f(x, y) = f(0) + \frac{\partial f}{\partial x}x + \frac{\partial f}{\partial y}y + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}x^2 + \frac{\partial^2 f}{\partial x\partial y}xy + \frac{1}{2}\frac{\partial^2 f}{\partial y^2}y^2 + \dots,$$

where the derivatives are evaluated at the origin.


Near a point $(\alpha, \beta)$, the value can be written as

$$f(x, y) = f(\alpha, \beta) + \frac{\partial f}{\partial x}(x - \alpha) + \frac{\partial f}{\partial y}(y - \beta) + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(x - \alpha)^2 + \frac{\partial^2 f}{\partial x\partial y}(x - \alpha)(y - \beta) + \frac{1}{2}\frac{\partial^2 f}{\partial y^2}(y - \beta)^2 + \dots,$$

where the derivatives are evaluated at $(\alpha, \beta)$.



Stationary Points
By definition, a point $(\alpha, \beta)$ is stationary if the first-order
partial derivatives evaluated at $(\alpha, \beta)$ are all 0.
A stationary point is either a maximum, a minimum, or a
saddle point.
It is a maximum if the function values at all nearby
points are not greater.
It is a minimum if the function values at all nearby
points are not smaller.
It is a saddle point if it is neither a maximum nor a
minimum.
How do we decide?



Expansion Near a Stationary Point
The expansion of a function near a stationary point is

$$f(x, y) = f(\alpha, \beta) + \frac{\partial f}{\partial x}(x - \alpha) + \frac{\partial f}{\partial y}(y - \beta) + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(x - \alpha)^2 + \frac{\partial^2 f}{\partial x\partial y}(x - \alpha)(y - \beta) + \frac{1}{2}\frac{\partial^2 f}{\partial y^2}(y - \beta)^2 + \dots$$

Since the first-order partial derivatives vanish at a stationary point, writing $f_0 = f(\alpha, \beta)$, $\delta x = x - \alpha$, $\delta y = y - \beta$, this reduces to

$$f(x, y) \doteq f_0 + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(\delta x)^2 + \frac{\partial^2 f}{\partial x\partial y}(\delta x)(\delta y) + \frac{1}{2}\frac{\partial^2 f}{\partial y^2}(\delta y)^2.$$



Hessian Matrix
The Hessian matrix at a point $\mathbf{x} = (x, y)$ is defined by

$$H(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\partial y} \\ \dfrac{\partial^2 f}{\partial x\partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix}$$

It follows that near a stationary point $\mathbf{x}$,

$$\delta f = \frac{1}{2}\,\delta\mathbf{x}^T H(\mathbf{x})\,\delta\mathbf{x}.$$



Quadratic Form
By definition, a function $f$ is quadratic in the variables $x, y$
if $f$ has the following form:

$$f(x, y) = ax^2 + 2bxy + cy^2.$$

$f(x, y)$ can be expressed by a real symmetric matrix:

$$f(x, y) = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}.$$

Note that the expansion of a function near a stationary
point is quadratic in the variables $\delta\mathbf{x} = (\delta x, \delta y)$.



Positive/Negative (Semi-)Definite
By definition, a quadratic function $f$ is positive definite
if its value is positive at every point except the origin.
$f$ is said to be positive semidefinite if its values are
non-negative.
$f$ is said to be negative definite if its values are
negative except at the origin.
$f$ is said to be negative semidefinite if its values are
non-positive.



Necessary Conditions
For $f(x, y) = ax^2 + 2bxy + cy^2$ to be p.d., $a > 0$ and $c > 0$.
This can be shown by looking at the points $(1, 0)$ and
$(0, 1)$.
But these are merely necessary conditions, as the
following example demonstrates:

$$f(x, y) = x^2 - 4xy + y^2 \quad\Rightarrow\quad f(1, 1) = 1 - 4 + 1 = -2 < 0.$$



Sufficient Condition
We can express $f$ in terms of squares by completing the square:

$$f(x, y) = ax^2 + 2bxy + cy^2 = a\left(x + \frac{b}{a}y\right)^2 + \left(c - \frac{b^2}{a}\right)y^2.$$

Since squares are non-negative, we see from the above that
$f$ is positive definite if (sufficient condition)

$$a > 0, \quad ac > b^2.$$
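As a quick numerical cross-check (a numpy sketch, not part of the original notes; the coefficients are illustrative), the $a > 0$, $ac > b^2$ test agrees with the positive-eigenvalue test discussed later:

    import numpy as np

    # illustrative coefficients for f(x, y) = ax^2 + 2bxy + cy^2
    a, b, c = 2.0, 1.0, 3.0
    A = np.array([[a, b], [b, c]])

    # the sufficient condition from this slide
    print((a > 0) and (a * c > b**2))               # True

    # cross-check: a symmetric matrix is p.d. iff all eigenvalues are positive
    print(bool(np.all(np.linalg.eigvalsh(A) > 0)))  # True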



Negative Definite
$f(x, y) = ax^2 + 2bxy + cy^2$ is negative definite if
$f(x, y) < 0$ for all $(x, y)$ other than $(0, 0)$.
$f$ is negative definite iff $-f$ is positive definite.
A sufficient condition for negative definiteness is

$$(-a) > 0, \quad (-a)(-c) > (-b)^2.$$

Equivalently, getting rid of the negative signs,

$$a < 0, \quad ac > b^2.$$

example and diagram



Singular Case
We have a singular case if

$$ac = b^2.$$

If $a > 0$, $f$ is still non-negative everywhere, since

$$f(x, y) = a\left(x + \frac{b}{a}y\right)^2.$$

The surface $z = f(x, y)$ degenerates from a bowl to a
valley, along the line $ax + by = 0$.
$f$ is said to be positive semidefinite (psd) if $a > 0$ and
negative semidefinite if $a < 0$.
example and diagram



Saddle Point
The remaining case is when

$$ac < b^2.$$

In this case, (0, 0) is a saddle point.


Along one direction (0, 0) is a minimum, and along
another direction (0, 0) is a maximum.
f is said to be indefinite.



Example and Diagram
Let
$$f(x, y) = x^2 - y^2.$$
Note
$$ac = -1 < 0 = b^2.$$
The point $(0, 0)$ is a minimum along the $x$-axis, and a
maximum along the $y$-axis.



Matrix and Quadratic Form
A matrix $A$ defines a quadratic form,

$$\mathbf{x}^T A \mathbf{x} = \begin{bmatrix} x_1 & \dots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \sum_{i,j=1}^{n} a_{ij} x_i x_j.$$

Note that for $i \neq j$, the coefficient of $x_i x_j$ is $a_{ij} + a_{ji}$.
Given a quadratic form, we make $A$ symmetric by
requiring $a_{ij} = a_{ji}$.
The positive-definiteness of a matrix is defined via
$f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$.



Positive Definite Matrices
A real symmetric matrix $A$ is said to be positive definite
if $\mathbf{x}^T A \mathbf{x} > 0$ except for $\mathbf{x} = 0$.
From the earlier discussion, we have, in the
two-dimensional case,

$$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$$

is positive definite iff $a > 0$ and $ac > b^2$.
We generalize to the case that $A$ is $n \times n$.



Conditions for Positive Definiteness
The following are equivalent conditions for positive
definiteness.
All eigenvalues are positive.
All determinants of the leading principal submatrices are
positive.
All pivots are positive.
example
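As an illustration (a numpy sketch with a made-up matrix, not part of the original notes), all three conditions can be checked directly:

    import numpy as np

    A = np.array([[2., -1., 0.],
                  [-1., 2., -1.],
                  [0., -1., 2.]])

    # 1. all eigenvalues are positive
    print(np.linalg.eigvalsh(A))                             # all > 0

    # 2. all leading principal minors are positive
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
    print(minors)                                            # 2, 3, 4

    # 3. all pivots are positive: d_k = |A_k| / |A_{k-1}|
    minors = [1.0] + minors
    print([minors[k] / minors[k - 1] for k in range(1, 4)])  # 2, 3/2, 4/3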



Positive Eigenvalues
If a matrix is positive definite, then it has all positive
eigenvalues.
Let $x_i$ be an eigenvector with eigenvalue $\lambda_i$:

$$x_i^T A x_i = \lambda_i (x_i^T x_i) > 0 \Rightarrow \lambda_i > 0.$$

Conversely, if all eigenvalues are positive, then the
matrix is positive definite.
Via a complete set of eigenvectors (the columns of $Q$), with $c = Q^T x$,

$$x^T A x = x^T Q \Lambda Q^T x = \sum_i c_i^2 \lambda_i > 0.$$

example



Positive Determinants
If $A$ is positive definite, then the leading principal
submatrices $A_k$ have positive determinants.
$A_k$ is positive definite. To prove this, let $x$ be a vector
whose last $n - k$ components are 0:

$$x^T A x = \begin{bmatrix} x_k^T & 0 \end{bmatrix} \begin{bmatrix} A_k & * \\ * & * \end{bmatrix} \begin{bmatrix} x_k \\ 0 \end{bmatrix} = x_k^T A_k x_k > 0.$$

Since all eigenvalues of $A_k$ are positive,

$$|A_k| = \prod_i \lambda_i > 0.$$

example
Positive Pivots
Suppose $A$ is positive definite; then all pivots $d_i$ are
positive:

$$d_k = \frac{|A_k|}{|A_{k-1}|} > 0,$$

since $|A_k| > 0$ for all $k$.
Conversely, if $d_i > 0$ for all $i$, then $A$ is positive definite:

$$A = LDL^T \Rightarrow x^T A x = \sum_i d_i (L^T x)_i^2 > 0.$$

example
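A small numerical sketch of both directions (not from the notes; it assumes scipy is available and uses an illustrative matrix):

    import numpy as np
    from scipy.linalg import ldl

    A = np.array([[4., 2.], [2., 3.]])
    L, D, _ = ldl(A)        # A = L D L^T; no pivoting is needed for p.d. A
    print(np.diag(D))       # pivots: 4, 2 -- both positive

    x = np.array([1., -2.])
    y = L.T @ x
    print(x @ A @ x, np.sum(np.diag(D) * y**2))  # both 8: x^T A x = sum d_i (L^T x)_i^2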



Relation to Least Squares
For a least-squares problem $Rx = b$, we solve the
normal equation

$$R^T R \bar{x} = R^T b.$$

Note that the matrix $A = R^T R$ is symmetric.
(theorem) $A$ is positive definite if $R$ has linearly
independent columns:

$$x^T A x = x^T R^T R x = (Rx)^T (Rx) \begin{cases} = 0, & x = 0, \\ > 0, & x \neq 0. \end{cases}$$

example
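For instance (a numpy sketch, not part of the original notes; $R$ is illustrative):

    import numpy as np

    # R with linearly independent columns
    R = np.array([[1., 0.],
                  [1., 1.],
                  [1., 2.]])
    A = R.T @ R

    print(np.linalg.eigvalsh(A))                # all positive: A = R^T R is p.d.

    x = np.array([3., -1.])
    print(x @ A @ x, np.linalg.norm(R @ x)**2)  # equal: x^T A x = |Rx|^2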



Cholesky Decomposition
(Cholesky decomposition) A positive definite matrix is
the product of a lower-triangular matrix and its
transpose.
A special case comes from the $LDU$ decomposition:

$$A = LDL^T = LD^{1/2} D^{1/2} L^T = R^T R.$$

There are infinitely many ways to decompose a positive
definite $A = R^T R$. In fact, $R' = QR$, where $Q$ is
orthogonal, also satisfies $A = R'^T R'$.
example
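One such factorization can be computed with numpy (a sketch, not from the notes; the matrix is illustrative). numpy returns the lower-triangular factor $L$ with $A = LL^T$, so $R = L^T$ matches the $A = R^T R$ convention above:

    import numpy as np

    A = np.array([[4., 2.], [2., 3.]])

    L = np.linalg.cholesky(A)        # lower triangular, A = L L^T
    R = L.T
    print(R)                         # [[2, 1], [0, sqrt(2)]]
    print(np.allclose(R.T @ R, A))   # True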



Ellipsoids in n Dimensions
Consider the equation $\mathbf{x}^T A \mathbf{x} = 1$, where $A$ is p.d.
If $A$ is diagonal, the graph is easily seen to be an ellipsoid.
If $A$ is not diagonal, then using the eigenvectors as the
columns of $Q$, we have

$$\mathbf{x}^T A \mathbf{x} = \mathbf{x}^T Q \Lambda Q^T \mathbf{x} = \mathbf{y}^T \Lambda \mathbf{y} = \lambda_1 y_1^2 + \dots + \lambda_n y_n^2.$$

$y_i = q_i^T \mathbf{x}$ is the component of $\mathbf{x}$ along the $i$th
eigenvector $q_i$.



Principal Axes
The axes of the ellipsoid defined by $\mathbf{x}^T A \mathbf{x} = 1$ point
toward the eigenvectors $q_i$ of $A$.
They are called principal axes.
The principal axes are mutually orthogonal.
The length of the axis along $q_i$ is $1/\sqrt{\lambda_i}$.
example
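For instance (a numpy sketch, not from the notes; the matrix is illustrative):

    import numpy as np

    A = np.array([[5., 4.], [4., 5.]])   # p.d., so x^T A x = 1 is an ellipse

    lam, Q = np.linalg.eigh(A)           # eigenvalues 1 and 9
    print(1 / np.sqrt(lam))              # axis half-lengths: 1 and 1/3
    print(Q)                             # columns: directions of the axes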



Semidefinite Matrices
A matrix $A$ is said to be positive semidefinite if

$$\mathbf{x}^T A \mathbf{x} \ge 0 \quad\text{for all } \mathbf{x}.$$

Each of the following conditions is equivalent to
positive semidefiniteness.
All eigenvalues are non-negative.
$|A_k| \ge 0$ for all principal submatrices $A_k$.
All pivots are non-negative.
$A = R^T R$ for some $R$.



Example
The following matrix is positive semidefinite:

$$A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$$

quadratic form:

$$\mathbf{x}^T A \mathbf{x} = (x_1 - x_2)^2 + (x_2 - x_3)^2 + (x_3 - x_1)^2 \ge 0.$$

eigenvalues: $0, 3, 3 \ge 0$; pivots: $2, \frac{3}{2} \ge 0$.
determinants:

$$|A_1| = 2, \quad |A_2| = 3, \quad |A_3| = 0 \ge 0.$$


Congruence Transform
The congruence transform of $A$ by a non-singular $C$ is
defined by

$$A \to C^T A C.$$

The congruence transform is related to the quadratic form:

$$x = Cy \Rightarrow x^T A x = y^T C^T A C y.$$

If $A$ is symmetric, so is its congruence transform
$C^T A C$.
example



Sylvester’s Law
The signs of the eigenvalues are invariant under a
congruence transform.
(proof) For simplicity, suppose $A$ is nonsingular. Let

$$C = QR, \quad C(t) = tQ + (1 - t)QR.$$

The eigenvalues of $C(t)^T A C(t)$ change gradually as
we vary $t$ from 0 to 1, but they are never 0, since
$C(t) = Q(tI + (1 - t)R)$ is never singular. So the signs
of the eigenvalues never change.
Since $Q^T A Q$ and $A$ have the same eigenvalues, the
law is proved.



Signs of Pivots
The $LDU$ decomposition of a symmetric matrix $A$ is

$$A = LDU = U^T D U.$$

So $A$ is a congruence transform of $D$.
As a result of Sylvester's law, the signs of the
pivots (the eigenvalues of $D$) agree with the signs of the
eigenvalues.
example



Locating Eigenvalues
The relation between pivots and eigenvalues can be
used to locate eigenvalues.
First, note that if $A$ has an eigenvalue $\lambda$, then $A - cI$
has the eigenvalue $\lambda - c$ with the same eigenvector:

$$Ax = \lambda x \Rightarrow (A - cI)x = (\lambda - c)x.$$



Example
Consider

$$A = \begin{bmatrix} 3 & 3 & 0 \\ 3 & 10 & 7 \\ 0 & 7 & 8 \end{bmatrix}, \quad B = A - 2I = \begin{bmatrix} 1 & 3 & 0 \\ 3 & 8 & 7 \\ 0 & 7 & 6 \end{bmatrix}.$$

$B$ has a negative pivot, so it has a negative eigenvalue.
$A$ is positive definite.
It follows that

$$\lambda_A > 0, \quad \lambda_B = \lambda_A - 2 < 0 \quad\Rightarrow\quad 0 < \lambda_A < 2.$$
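This can be verified numerically (a numpy sketch, not part of the original notes):

    import numpy as np

    A = np.array([[3., 3., 0.],
                  [3., 10., 7.],
                  [0., 7., 8.]])
    B = A - 2 * np.eye(3)

    # pivots of B from ratios of leading principal minors
    m = [1.0] + [np.linalg.det(B[:k, :k]) for k in range(1, 4)]
    print([m[k] / m[k - 1] for k in range(1, 4)])  # 1, -1, 55: one negative pivot

    print(np.linalg.eigvalsh(A))   # all positive, and the smallest is below 2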



Generalized Eigenvalue Problem
A generalized eigenvalue problem has the form

$$Ax = \lambda M x,$$

where $A, M$ are given matrices.
We consider only the case where $A$ is symmetric and $M$
is positive definite.
example
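scipy can solve this directly (a sketch, not from the notes; the matrices are illustrative). scipy.linalg.eigh accepts a second matrix for the generalized problem:

    import numpy as np
    from scipy.linalg import eigh

    A = np.array([[2., 0.], [0., 1.]])   # symmetric
    M = np.array([[4., 1.], [1., 2.]])   # positive definite

    lam, X = eigh(A, M)                  # solves A x = lambda M x
    print(lam)

    # scipy normalizes the eigenvectors to be M-orthonormal, which also
    # yields the simultaneous diagonalization discussed a few slides later
    print(np.allclose(X.T @ M @ X, np.eye(2)))     # True
    print(np.allclose(X.T @ A @ X, np.diag(lam)))  # True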



Equivalent Eigenvalue Problem
A generalized eigenvalue problem can be converted to
an equivalent ordinary eigenvalue problem.
We can write $M = R^T R$, where $R$ is invertible.
Let $y = Rx$:

$$Ax = \lambda M x = \lambda R^T R x \Rightarrow A R^{-1} y = \lambda R^T y.$$

Let $C = R^{-1}$, so $(R^T)^{-1} = C^T$. Then

$$C^T A C y = \lambda y.$$

This is an equivalent eigenvalue problem: same
eigenvalues, related eigenvectors $x = Cy$.



Properties
Since $C^T A C$ is symmetric, the eigenvectors $y_j$ can be
made orthonormal.
The eigenvectors $x_j$ are $M$-orthonormal, i.e.,

$$x_i^T M x_j = x_i^T R^T R x_j = y_i^T y_j = \delta_{ij}.$$

example



Simultaneous Diagonalization
Both $M$ and $A$ can be diagonalized by the generalized
eigenvectors $x_i$:

$$x_i^T M x_j = y_i^T y_j = \delta_{ij}, \quad x_i^T A x_j = \lambda_j x_i^T M x_j = \lambda_j \delta_{ij}.$$

That is, using the $x_i$ as the columns of $S$, we have

$$S^T A S = \Lambda, \quad S^T M S = I.$$

example
Note that these are congruence transforms to diagonal
matrices rather than similarity transforms, since $S^T$ is
used, not $S^{-1}$.



Singular Value Decomposition
The singular value decomposition (SVD) of a matrix $A$
is defined by

$$A = U \Sigma V^T,$$

where $U, \Sigma, V$ are related to the matrices $A^T A$ and $AA^T$.
Here $A$ is not limited to square matrices.



SVD Theorem
Any $m \times n$ real matrix $A$ with rank $r$ can be factored as

$$A = U \Sigma V^T = (\text{orthogonal})(\text{diagonal})(\text{orthogonal}).$$

$U$ is an eigenvector matrix of $AA^T$.
$V$ is an eigenvector matrix of $A^T A$.
$\Sigma$ contains the $r$ singular values on the diagonal.
A singular value is the square root of a non-zero
eigenvalue of $A^T A$.
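As a numerical illustration (a numpy sketch, not from the notes; the matrix is random):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))      # an arbitrary rectangular matrix

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(np.allclose(U @ np.diag(s) @ Vt, A))   # True: A = U Sigma V^T

    # squared singular values = eigenvalues of A^T A
    print(np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A))))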



Proof
Suppose $v_1, \dots, v_n$ are orthonormal eigenvectors of
$A^T A$, with eigenvalues in non-increasing order:

$$v_i^T A^T A v_j = \lambda_j v_i^T v_j = \lambda_j \delta_{ij}, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n.$$

Since $A^T A$ has the same nullspace as $A$, there are $r$
non-zero eigenvalues.
These non-zero eigenvalues are positive since $A^T A$ is
p.s.d.
Define the singular value for each positive eigenvalue:

$$\sigma_j = \sqrt{\lambda_j}, \quad j = 1, \dots, r.$$



Proof
$u_j = \dfrac{A v_j}{\sigma_j}$, $j = 1, \dots, r$, is an eigenvector of $AA^T$, and the $u_j$
are orthonormal:

$$AA^T u_j = \frac{A A^T A v_j}{\sigma_j} = \lambda_j \frac{A v_j}{\sigma_j} = \lambda_j u_j, \quad u_i^T u_j = \delta_{ij}.$$

Construct $V$ with the $v$'s, and $U$ with the $u$'s together with
eigenvectors for 0:

$$(U^T A V)_{ij} = u_i^T A v_j = \begin{cases} 0 & \text{if } j > r, \\ \sigma_j u_i^T u_j = \sigma_j \delta_{ij} & \text{if } j \le r. \end{cases}$$

That is, $U^T A V = \Sigma$. So $A = U \Sigma V^T$.



Remarks
$A$ multiplied by a column of $V$ produces a multiple of a
column of $U$:

$$AV = U\Sigma, \quad\text{or}\quad A v_j = \sigma_j u_j.$$

$U$ is the eigenvector matrix of $AA^T$ and $V$ is the
eigenvector matrix of $A^T A$:

$$AA^T = U \Sigma \Sigma^T U^T; \quad A^T A = V \Sigma^T \Sigma V^T.$$

The non-zero eigenvalues of $AA^T$ and $A^T A$ are the
same. They are in $\Sigma \Sigma^T$.



Example
Consider

$$A = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}, \quad A^T A = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}.$$

The singular values are $\sqrt{3}$ and $1$.
Finding the $v_i$ and $u_i$, one has

$$A = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1/\sqrt{6} & -2/\sqrt{6} & 1/\sqrt{6} \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \end{bmatrix}.$$
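The singular values can be confirmed with numpy (a sketch, not in the original slides):

    import numpy as np

    A = np.array([[-1., 1., 0.],
                  [0., -1., 1.]])

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    print(s)   # [1.732..., 1.0] = [sqrt(3), 1]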
Applications of SVD
Through the SVD, we can represent a matrix as a sum of
rank-1 matrices:

$$A = U \Sigma V^T = \sigma_1 u_1 v_1^T + \dots + \sigma_r u_r v_r^T.$$

Suppose we have a $1000 \times 1000$ matrix, for a total of $10^6$
entries. Using the above expansion and keeping only the
50 most significant terms requires only
$50(1 + 1000 + 1000)$ numbers, a saving of almost
90% of the space.
commonly used in image processing
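A minimal sketch of the storage count (not from the notes; a random matrix illustrates the bookkeeping only, since its singular values decay slowly, while real images compress far better):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 1000))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 50
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]         # rank-50 approximation
    print(U[:, :k].size + s[:k].size + Vt[:k].size)  # 100050 numbers vs 10^6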



SVD for Image



Pseudo-Inverse
Consider the normal equation

$$A^T A \hat{x} = A^T b.$$

$A^T A$ is not always invertible, and $\hat{x}$ is not unique.
Among all solutions, we denote the one with the
minimum length by $x^+$:

$$A^T A x^+ = A^T b.$$

The matrix that produces $x^+$ from $b$ is called the
pseudo-inverse of $A$, denoted by $A^+$:

$$x^+ = A^+ b.$$
Pseudoinverse and SVD
$A^+$ is related to the SVD $A = U \Sigma V^T$ by

$$A^+ = V \Sigma^+ U^T.$$

Note that $\Sigma^+$ is the $n \times m$ matrix with diagonal entries
$\frac{1}{\sigma_1}, \dots, \frac{1}{\sigma_r}$ and all other entries 0.
If $A$ is invertible, then

$$A A^+ = U \Sigma V^T V \Sigma^+ U^T = I \Rightarrow A^+ = A^{-1}.$$
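numpy's pinv computes exactly this construction (a sketch, not from the notes; the rank-deficient matrix is illustrative):

    import numpy as np

    A = np.array([[1., 1.],
                  [1., 1.],
                  [0., 0.]])           # rank 1, so A^T A is singular
    b = np.array([1., 1., 1.])

    x_plus = np.linalg.pinv(A) @ b     # pinv is computed via the SVD
    print(x_plus)                      # [0.5, 0.5]: the minimum-length solution

    # building V Sigma^+ U^T by hand gives the same matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_plus = np.array([1 / si if si > 1e-12 else 0.0 for si in s])
    print(np.allclose(Vt.T @ np.diag(s_plus) @ U.T, np.linalg.pinv(A)))  # True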



Minimum Length
Consider $|Ax - b|$.
Multiplying by $U^T$ leaves the length unchanged:

$$|Ax - b| = |U \Sigma V^T x - b| = |\Sigma V^T x - U^T b| = |\Sigma y - U^T b|.$$

$x$ and $y$ have the same length, since $y = V^T x = V^{-1} x$.
$\Sigma y$ has at most $r$ non-zero components, which are
equated to those of $U^T b$. The other components of $y$ are set to 0 to
minimize $|y|$, so $y^+ = \Sigma^+ U^T b$.
The minimum-length least-squares solution for $x$ is

$$x^+ = V y^+ = V \Sigma^+ U^T b = A^+ b.$$



Different Perspectives
We have solved several classes of problems, including
systems of linear equations, and
eigenvalue problems.
The same solutions can be achieved by completely
different problem formulations.



Optimization Problem for Ax = b
If $A$ is positive definite, then

$$P(x) = \frac{1}{2} x^T A x - x^T b$$

reaches its minimum where

$$Ax = b.$$

This is proved by showing

$$P(y) - P(x) = \frac{1}{2}(y - x)^T A (y - x) \ge 0.$$
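A quick numerical check (a numpy sketch, not part of the notes; $A$ and $b$ are illustrative):

    import numpy as np

    A = np.array([[2., 1.], [1., 2.]])   # positive definite
    b = np.array([1., 0.])

    P = lambda x: 0.5 * x @ A @ x - x @ b
    x_star = np.linalg.solve(A, b)       # the point where A x = b

    rng = np.random.default_rng(0)
    print(all(P(x_star + rng.standard_normal(2)) >= P(x_star)
              for _ in range(100)))      # True: no other point does better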



$A^T A x = A^T b$

The least-squares problem for $Ax = b$ has a flavor of
minimization.
The function to be minimized is

$$E(x) = |Ax - b|^2 = (Ax - b)^T (Ax - b) = x^T A^T A x - x^T A^T b - b^T A x + b^T b.$$

$E(x)$ has essentially the same form as $P(x)$, with $A' = A^T A$
and $b' = A^T b$. The minimum is achieved where

$$A' x = b', \quad\text{i.e.,}\quad A^T A x = A^T b.$$



Rayleigh’s Quotient
The Rayleigh quotient is defined by

$$R(x) = \frac{x^T A x}{x^T x},$$

where $A$ is real symmetric.
The smallest eigenvalue of

$$Ax = \lambda x$$

can be found by minimizing $R(x)$.
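Numerically, a generic minimizer recovers the smallest eigenvalue (a sketch assuming scipy is available; the matrix is illustrative):

    import numpy as np
    from scipy.optimize import minimize

    A = np.array([[2., -1., 0.],
                  [-1., 2., -1.],
                  [0., -1., 2.]])

    R = lambda x: (x @ A @ x) / (x @ x)

    res = minimize(R, x0=np.ones(3))   # generic numerical minimization of R
    print(res.fun)                     # ~0.5858
    print(np.linalg.eigvalsh(A)[0])    # 2 - sqrt(2) ~ 0.5858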



Rayleigh’s Principle
The minimum value of the Rayleigh quotient $R(x)$ is
the smallest eigenvalue $\lambda_1$ of $A$, achieved by the
corresponding eigenvector $x_1$.
This follows from writing $x = Qy$:

$$R(x) = \frac{(Qy)^T A (Qy)}{(Qy)^T (Qy)} = \frac{y^T \Lambda y}{y^T y} = \frac{\lambda_1 y_1^2 + \dots + \lambda_n y_n^2}{y_1^2 + \dots + y_n^2} \ge \frac{\lambda_1 (y_1^2 + \dots + y_n^2)}{y_1^2 + \dots + y_n^2} = \lambda_1.$$

