
Vector Space and Matrices

CU PG-I
Anirban Kundu

Contents

1 Introduction: Two-dimensional Vectors

2 Linear Vector Space
  2.1 Dual Space
  2.2 Cauchy-Schwarz Inequality
  2.3 Metric Space
  2.4 Linear Independence and Basis

3 Linear Operators
  3.1 Some Special Operators
  3.2 Projection Operators
  3.3 Eigenvalues and Eigenvectors

4 Matrices
  4.1 Some Special Matrices
  4.2 Representation
  4.3 Eigenvalues and Eigenvectors, Again
  4.4 Degenerate Eigenvalues
  4.5 Functions of a Matrix: The Cayley-Hamilton Theorem

This note is mostly based upon: Dennery and Krzywicki, Mathematics for Physicists.

1 Introduction: Two-dimensional Vectors

Let us recapitulate what we learnt about vectors (for simplicity, consider 2-dimensional vectors in
the cartesian coordinates).

Any vector A can be written as
    A = a1 i + a2 j ,                                                    (1)
where i and j are unit vectors along the x- and y-axes respectively, and a1 and a2 are real
numbers. From now on, we will use the shorthand A = (a1, a2) for eq. (1). This is called an
ordered pair, because (a1, a2) ≠ (a2, a1). A set of n such numbers where ordering is important
is known as an n-tuple.

Two two-dimensional vectors A = (a1, a2) and B = (b1, b2) can be added to give another
two-dimensional vector C = (c1, c2) with c1(2) = a1(2) + b1(2).

We can multiply any vector A = (a1, a2) by a real number d to get the vector D = dA.
The individual components are multiplied by d, so the magnitude of the vector increases by
a factor of d.

The null vector 0 = (0, 0) always satisfies A + 0 = 0 + A = A. Also, there is a vector
−A = (−a1, −a2) so that A + (−A) = 0.

The scalar product of two vectors A and B is defined as
    A.B = ∑_{i=1}^{2} a_i b_i .                                          (2)
We can also write this simply as a_i b_i with the convention that every repeated index is summed
over. This is known as the Einstein convention.

The magnitude or length of A is given by a = √(A.A) = √(a_i a_i). Thus, A.B = ab cos θ, where
θ is the angle between the two vectors. Obviously, |A.B| ≤ |A||B|, which is ab. The equality
sign applies only if θ = 0 or π.

There is, of course, nothing sacred about the choice of i and j as the basis vectors. One can
rotate the coordinate axes by an angle θ in the counterclockwise direction, so that for the
new axes,
    i′ = i cos θ + j sin θ ,    j′ = −i sin θ + j cos θ .                (3)
It follows from i.i = j.j = 1 and i.j = 0 that i′.i′ = j′.j′ = 1 and i′.j′ = 0.

Q. What are the components of A in the primed frame?
Q. Show that A.i projects out the i-th component of A. Hence show that A − (A.i)i is orthogonal
to i. This is something that we will use later.
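As a quick numerical cross-check of eq. (3) and the questions above, here is a minimal Python/numpy sketch; the vector A = 3i + 2j and the angle θ = 0.3 are illustrative choices, not part of the notes.

```python
import numpy as np

theta = 0.3                      # arbitrary rotation angle (radians)
i_hat, j_hat = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# primed basis, eq. (3): rotate the axes counterclockwise by theta
i_prime = np.cos(theta) * i_hat + np.sin(theta) * j_hat
j_prime = -np.sin(theta) * i_hat + np.cos(theta) * j_hat

A = 3.0 * i_hat + 2.0 * j_hat    # an illustrative vector

# components in the primed frame are the projections A.i' and A.j'
print(A @ i_prime, A @ j_prime)

# the primed basis is still orthonormal
print(i_prime @ i_prime, j_prime @ j_prime, i_prime @ j_prime)

# A - (A.i)i is orthogonal to i
print((A - (A @ i_hat) * i_hat) @ i_hat)   # ~0
```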

2 Linear Vector Space

Let us consider an assembly of some abstract objects, which we will denote as | ⟩. If we want to
label them, we might call them |1⟩ or |j⟩. (You will see this object a lot in quantum mechanics; this
is called a ket. In fact, we will develop the idea of vector spaces keeping its application in quantum
mechanics in mind.) Let this assembly be called S. We say that the kets live in the space S. This
will be called a linear vector space if the kets satisfy the following properties.

1. If |a⟩, |b⟩ ∈ S, then (|a⟩ + |b⟩) ∈ S.

2. If |a⟩ ∈ S, then α|a⟩ ∈ S too, where α is a complex number.

3. There exists a null element |0⟩ ∈ S such that ∀ |a⟩ ∈ S, |a⟩ + |0⟩ = |0⟩ + |a⟩ = |a⟩ (∀ is a
shorthand for "for all").

4. ∀ |a⟩ ∈ S, there exists an |a′⟩ ∈ S such that |a⟩ + |a′⟩ = |0⟩.

The addition and multiplication as defined above satisfy the standard laws:

1. |a⟩ + |b⟩ = |b⟩ + |a⟩ ,  |a⟩ + (|b⟩ + |c⟩) = (|a⟩ + |b⟩) + |c⟩ .

2. 1.|a⟩ = |a⟩ .

3. α(β|a⟩) = (αβ)|a⟩ ,  (α + β)|a⟩ = α|a⟩ + β|a⟩ ,  α(|a⟩ + |b⟩) = α|a⟩ + α|b⟩.

If all the conditions are satisfied, the set S is called a linear vector space (LVS) and the objects | ⟩
are called vectors.

We will often write the null vector |0⟩ as just 0, because any vector |a⟩ multiplied by 0 gives
|0⟩. We can write
    |a⟩ = 1|a⟩ = (0 + 1)|a⟩ = 0|a⟩ + 1|a⟩ = 0|a⟩ + |a⟩ .                 (4)
Now suppose |a⟩ + |b⟩ = |0⟩ (|b⟩ is known as the inverse of |a⟩), so
    |0⟩ = |a⟩ + |b⟩ = 0|a⟩ + |a⟩ + |b⟩ = 0|a⟩ + |0⟩ = 0|a⟩ .             (5)
Thus, 0|a⟩ = |0⟩ ∀ |a⟩ ∈ S, and we can safely write 0 for |0⟩.

This also defines subtraction of vectors,
    |i⟩ − |j⟩ = |i⟩ + (−1)|j⟩ = |i⟩ + |j′⟩ ,                             (6)
where |j′⟩ is the inverse of |j⟩.


Examples:
1. The two-dimensional vectors form an LVS. Note that the maximum number of linearly
independent vectors is 2 here. This is known as the dimensionality of the vector space. A more
formal definition will be given later.
2. All complex numbers z = a + ib form an LVS. z can be written as an ordered pair (a, b), and
the equivalent of the dot product, z1*z2, is (a1 a2 + b1 b2, a1 b2 − a2 b1). Note that if z1 = z2, the product
is real.

3. All arrows in a two-dimensional space, when multiplied only by a real number, form a 2-dimensional
vector space. This is because such multiplications cannot change the orientations of
the arrows. If the multiplication is by a complex number, both magnitude and direction change,
and the LVS is 1-dimensional.

4. All sinusoidal waves of period 2π form an LVS. Any such wave can be described as sin(x + φ) =
cos φ sin x + sin φ cos x, so one can treat sin x and cos x as basis vectors. This is again a 2-dimensional
LVS.

2.1 Dual Space

The scalar product of two vectors |a⟩ and |b⟩ in S is a number, denoted by ⟨a|b⟩ (the symbol ⟨ | is
called a bra, so that ⟨ | ⟩ gives a closed bracket. The notation is due to Dirac.) The properties of
the scalar product are as follows.

1. ⟨a|b⟩ = ⟨b|a⟩*. Thus, in general, ⟨a|b⟩ ≠ ⟨b|a⟩, but ⟨a|a⟩ is real. Also, ⟨a|a⟩ ≥ 0, where the
equality sign comes only if |a⟩ = 0. This defines √⟨a|a⟩ as the magnitude of the vector |a⟩.

2. If |d⟩ = α|a⟩ + β|b⟩, then ⟨c|d⟩ = α⟨c|a⟩ + β⟨c|b⟩ is a linear function of α and β. However,
⟨d|c⟩ = α*⟨a|c⟩ + β*⟨b|c⟩ is a linear function of α* and β* and not of α and β.

If somehow ⟨a|b⟩ = ⟨b|a⟩, the LVS is called a real vector space. Otherwise, it is complex. The
LVS of two-dimensional vectors is a real vector space as A.B = B.A. That of the complex numbers
(example 2 above) is a complex vector space.

In quantum mechanics, the space in which the wavefunctions live is also an LVS. This is known
as the Hilbert space 1, after the celebrated German mathematician David Hilbert. We can indeed
check that the Hilbert space is an LVS; in particular, that is why the superposition principle in
quantum mechanics holds. The wavefunctions are, however, complex quantities, and the scalar
product is defined as
    ⟨ψ1|ψ2⟩ = ∫ ψ1* ψ2 d³x .                                             (7)
This is a complex vector space as ⟨ψ1|ψ2⟩ = ⟨ψ2|ψ1⟩*.


The vectors |a⟩ and |b⟩ are orthogonal if they satisfy ⟨a|b⟩ = 0. Thus, if a vector |a⟩ is orthogonal
to all vectors of S, it must be a null vector, as ⟨a|a⟩ = 0.

If a set of vectors satisfy ⟨i|j⟩ = δ_ij, where the Kronecker delta is defined as
    δ_ij = 0 if i ≠ j ,    δ_ij = 1 if i = j ,                           (8)
the vectors are said to be orthonormal.


To facilitate the scalar product, and also taking into account the fact 2 above, we define, for every
LVS S with kets |i⟩, another space S_D with bras ⟨i|, so that there is a one-to-one correspondence
between |i⟩ and ⟨i| (that's why we use the same label i). The space S_D is called the dual space of
S, and |i⟩ and ⟨i| are dual vectors to each other. The dual space must satisfy:

1. The product of ⟨a| and |b⟩ is just the scalar product:
    ⟨a| . |b⟩ = ⟨a|b⟩ .                                                  (9)

2. If |d⟩ = α|a⟩ + β|b⟩, then
    ⟨d| = α*⟨a| + β*⟨b| .                                                (10)

That is the rule to get a dual vector: change the ket to the corresponding bra, and complex
conjugate the coefficients.

The scalar product is defined only between a vector from S and another vector from the dual
space S_D. Of course, a lot of spaces, like the space for cartesian vectors, are self-dual; there is no
distinction between the original space and the dual space. And that is why you never knew about
dual space when learning the dot product of ordinary vectors.

1 The proper definition of the Hilbert space may be found in, say, Dennery and Krzywicki. It is an infinite-dimensional
space, but most of the time in quantum mechanics we work with a small subset of the original space
which is finite.

2.2 Cauchy-Schwarz Inequality

Consider the ket
    |c⟩ = |a⟩ − x⟨b|a⟩|b⟩ ,                                              (11)
where x is a real number. So ⟨c| = ⟨a| − x⟨a|b⟩⟨b|. Thus,
    ⟨c|c⟩ = x²⟨b|a⟩⟨a|b⟩⟨b|b⟩ − 2x⟨b|a⟩⟨a|b⟩ + ⟨a|a⟩ ≥ 0 ,                (12)
because of the definition of the length of |c⟩.

This is a quadratic in x, with all real coefficients (⟨b|a⟩⟨a|b⟩ is real), and can never be negative.
Thus, the function can never go below the x-axis; it can at most touch it from above. In other
words, there cannot be two real roots, or
    4 (⟨b|a⟩⟨a|b⟩)² ≤ 4 (⟨b|a⟩⟨a|b⟩⟨b|b⟩) ⟨a|a⟩  ⟹  ⟨a|a⟩⟨b|b⟩ ≥ ⟨b|a⟩⟨a|b⟩ .    (13)
Eq. (13) is known as the Cauchy-Schwarz inequality. For ordinary vectors, this just means
    |A|²|B|² ≥ |A.B|²  ⟹  |cos θ| ≤ 1 .                                  (14)
If either |a⟩ or |b⟩ is a null vector, this results in a trivial equality.
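The inequality (13), and the triangle inequality derived from it just below, are easy to verify numerically. A minimal Python/numpy sketch, with two randomly chosen complex vectors standing in for |a⟩ and |b⟩ (an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
# two random complex vectors standing in for |a> and |b>
a = rng.normal(size=4) + 1j * rng.normal(size=4)
b = rng.normal(size=4) + 1j * rng.normal(size=4)

# <a|b> with the first argument conjugated, as in the scalar-product rules above
ab = np.vdot(a, b)

lhs = np.vdot(a, a).real * np.vdot(b, b).real   # <a|a><b|b>
rhs = (ab * np.conj(ab)).real                   # <b|a><a|b> = |<a|b>|^2
print(lhs >= rhs)                               # Cauchy-Schwarz, eq. (13)

# triangle inequality, eq. (15), with |3> = |1> + |2>
norm = lambda v: np.sqrt(np.vdot(v, v).real)
print(norm(a + b) <= norm(a) + norm(b))
```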


The triangle inequality of Euclidean geometry follows from (13). Consider three vectors |1⟩,
|2⟩, and |3⟩ in a two-dimensional plane forming a triangle, so that |3⟩ = |1⟩ + |2⟩ (this is a vector
sum). Thus,
    ⟨3|3⟩ = ⟨1|1⟩ + ⟨2|2⟩ + ⟨1|2⟩ + ⟨2|1⟩        [⟨1|2⟩ + ⟨2|1⟩ = 2 Re⟨1|2⟩]
          ≤ ⟨1|1⟩ + ⟨2|2⟩ + 2|⟨1|2⟩|
          ≤ ⟨1|1⟩ + ⟨2|2⟩ + 2√(⟨1|1⟩⟨2|2⟩)      (using the CS inequality) ,    (15)
so that √⟨3|3⟩ ≤ √⟨1|1⟩ + √⟨2|2⟩.

2.3 Metric Space

A set R is called a metric space if a real, positive number ρ(a, b) is associated with any pair of
its elements a, b ∈ R (remember that a and b need not be numbers) and (1) ρ(a, b) = ρ(b, a); (2)
ρ(a, b) = 0 only when a = b; (3) ρ(a, b) + ρ(b, c) ≥ ρ(a, c). The number ρ(a, b) may be called the
distance between a and b. The third condition is nothing but the triangle inequality.

Do not confuse ρ(a, b) with ⟨a|b⟩. In particular, ρ(a, a) = 0 (where a is some point in the LVS)
but √⟨a|a⟩ (where |a⟩ is a vector) defines the length or norm of that vector. For example, ⟨ψ|ψ⟩ = 1
means that the wavefunction has been normalized to unity. More precisely, if one thinks of |a⟩ as the
radius vector starting at the origin and ending at the point a, and similarly for b, then ρ(a, b) is
the norm of the vector |a⟩ − |b⟩ (or the other way round).

If we have three vectors |a⟩, |b⟩, and |c⟩ in an LVS and we define
    |1⟩ = |a⟩ − |b⟩ ,    |2⟩ = |b⟩ − |c⟩ ,    |3⟩ = |a⟩ − |c⟩ ,           (16)
then |1⟩, |2⟩, |3⟩ satisfy the triangle inequality, and also the first two conditions of a metric space,
so we can say:

If the scalar product is defined in an LVS, it is a metric space.

Note that the scalar product need not be defined for all linear vector spaces, but we will not
discuss those spaces.

1. The metric associated with the vectors A = (a1, a2, a3) and B = (b1, b2, b3) in three-dimensional
cartesian coordinates is √[(a1 − b1)² + (a2 − b2)² + (a3 − b3)²].

2. If two points (x, y, z) and (x + dx, y + dy, z + dz) are sufficiently close, the metric is
√(dx² + dy² + dz²). We can also write this as √(dx_i dx_i) using the Einstein convention. We
may also remove the root by writing
    ds² = dx² + dy² + dz² = dx_i dx_i .                                  (17)

3. Similarly, in the two-dimensional plane polar coordinate system, the separation between two
points (r, θ) and (r + dr, θ + dθ) is
    ds² = dr² + r² dθ² .                                                 (18)
In three-dimensional spherical polar coordinates,
    ds² = dr² + r² dθ² + r² sin²θ dφ² ,                                  (19)
and a good exercise is to deduce this.

4. In four-dimensional space-time, something that we use in special relativity, the separation
between two space-time coordinates (ct, x, y, z) and (c(t + dt), x + dx, y + dy, z + dz) is
    ds² = c² dt² − dx² − dy² − dz² .                                     (20)
Note the minus sign. This is a special property of the space-time coordinates, called the
Minkowski coordinates. Special relativity tells us that this interval is invariant no matter
which inertial frame you are in. One can trace the extra minus sign in front of the spatial
coordinates to the fact that there are two distinct vector spaces for 4-dimensional vectors, one
dual to the other, unlike the self-dual nature of ordinary 3-dimensional space. When you take
a vector from one space and the dual of another vector from the dual space, there comes the
minus sign, because the dual vector is formed by keeping the time component of the original
vector unchanged, while flipping the sign of the spatial components 2.

It is nontrivial to write (20) in the Einstein convention because of the relative minus sign
between time and space coordinates. How one deals with this is discussed in detail later.

2 It can be the other way around, only flipping the time component but keeping the spatial components unchanged.
The only thing that matters is the relative minus sign between the time and the space components.

2.4 Linear Independence and Basis

A set of n vectors are said to be linearly independent if an equation of the form
    ∑_{i=1}^{n} a_i |i⟩ = 0                                              (21)
necessarily means all a_i = 0. If there are at least two a_i's that are nonzero, the vectors are called
linearly dependent.
The maximum number of linearly independent vectors in a space is called its dimension. If this
number is finite, the space is finite. If the number is infinite, the space is infinite too.

The three-dimensional space can have at most three linearly independent vectors, that is why
we call it three-dimensional. On the other hand, there are infinitely many independent states for a
particle in, say, an infinitely deep one-dimensional potential well (we take the depth to be infinite
so that all such states are bound; for a well of finite depth, there will be a finite number of bound
states and an infinite number of unbound states). When we expand any arbitrary function in a
Fourier series, there are infinitely many sine or cosine functions in the expansion, and they are
linearly independent 3, so this is another infinite dimensional LVS.
If any vector |a⟩ in an LVS can be written as
    |a⟩ = ∑_{i=1}^{n} a_i |i⟩ ,                                          (22)
the set of |i⟩ vectors form a basis of the LVS. The number of basis vectors is obviously the dimension
of the space. We say that the basis vectors |i⟩ span the space. The numbers a_i are called components
of the vector |a⟩ in the |i⟩ basis (components depend on the choice of basis).

Given a basis, the components are unique. Suppose the vector |a⟩ can be written both as
∑ a_i |i⟩ and ∑ b_i |i⟩. Subtracting one from the other, ∑ (a_i − b_i)|i⟩ = 0, so by the condition of linear
independence of the basis vectors, a_i = b_i for all i.

Starting from any basis |a_i⟩, where i can be finite or infinite (but these basis vectors need not
be either orthogonal or normalized), one can always construct another orthonormal basis |i⟩. This
is known as Gram-Schmidt orthogonalization. The procedure is as follows.
1. Normalize the first vector of the original basis:
    |1⟩ = (1/√⟨a1|a1⟩) |a1⟩ .                                            (23)
Thus, ⟨1|1⟩ = 1.

2. Construct |2′⟩ by taking |a2⟩ and projecting out the part proportional to |1⟩:
    |2′⟩ = |a2⟩ − ⟨1|a2⟩|1⟩ ,                                            (24)
which ensures ⟨1|2′⟩ = 0. Divide |2′⟩ by its norm to get |2⟩.

3. Repeat this procedure, so that
    |m′⟩ = |a_m⟩ − ∑_{i=1}^{m−1} ⟨i|a_m⟩|i⟩ .                            (25)
It is easy to check that |m′⟩ is orthogonal to |i⟩, i = 1 to m − 1. Normalize |m′⟩ to unit norm,
    |m⟩ = (1/√⟨m′|m′⟩) |m′⟩ .                                            (26)
This completes the proof.
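A minimal numpy sketch of the procedure above; the three starting vectors are an arbitrary non-orthogonal basis chosen for illustration.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent (possibly complex) vectors,
    following steps 1-3 above: subtract the projections on the kets already
    built, then normalize."""
    basis = []
    for a in vectors:
        v = a.astype(complex)
        for e in basis:
            v = v - np.vdot(e, v) * e          # |m'> = |a_m> - sum_i <i|a_m>|i>
        v = v / np.sqrt(np.vdot(v, v).real)    # normalize to unit norm
        basis.append(v)
    return basis

# illustrative non-orthogonal starting basis in 3 dimensions
a1 = np.array([1.0, 1.0, 1.0])
a2 = np.array([1.0, 0.0, -2.0])
a3 = np.array([0.0, 2.0, 1.0])
e = gram_schmidt([a1, a2, a3])

# check orthonormality: <i|j> = delta_ij
G = np.array([[np.vdot(x, y) for y in e] for x in e])
print(np.allclose(G, np.eye(3)))
```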


Q. If the basis vectors are orthonormal, ⟨i|j⟩ = δ_ij, show that the components of any vector |a⟩ are
projected out by the projection operator |i⟩⟨i|.
Q. A two-dimensional vector is written as A = 3i + 2j. What will be its components in a basis
obtained by rotating the original basis by π/4 in the counterclockwise direction?
Q. In a two-dimensional space, the basis vectors |i⟩ and |j⟩ are such that ⟨i|i⟩ = 1, ⟨j|j⟩ = 2,
⟨i|j⟩ = 1. Construct an orthonormal basis.
Q. Suppose we take in the three-dimensional cartesian space the following vectors as basis: a =
i + j + k, b = i − 2k, c = 2j + k. Normalize a and construct two other orthonormal basis vectors.
(This shows you that the orthonormal basis is not unique; of course we can take i, j, k as another
orthonormal basis.)

3 How do we know that the sine and cosine functions are linearly independent? Easy; one cannot express, e.g., sin 2x
as a linear combination of sin x and cos x. sin x cos x is not a linear combination.

3 Linear Operators

A function f(x) associates a number y with another number x according to a certain rule. For
example, f(x) = x² associates, with every number x, its square. The space for x and y need not
be identical. For example, if x is any real number, positive, negative, or zero, f(x) is confined only
to the non-negative part of the real number space.

Similarly, we can assign with every vector |x⟩ of an LVS, another vector |y⟩, either of the same
LVS or of a different one, according to a certain rule. We simply write this as
    |y⟩ = O|x⟩ ,                                                         (27)
and O is called an operator, which, acting on |x⟩, gives |y⟩. We often put a hat on O, like Ô, to
indicate that this is an operator. Unless a possible confusion can occur, we will not use the hat.

We will be interested in linear operators, satisfying
    O[α|a⟩ + β|b⟩] = αO|a⟩ + βO|b⟩ .                                     (28)
In very special circumstances, we may need an antilinear operator:
    O[α|a⟩ + β|b⟩] = α*O|a⟩ + β*O|b⟩ .                                   (29)

A function f(x) may not be defined for all x; f(x) = √x is not defined for x < 0 if both x and f(x)
are confined to be real. Similarly, O|x⟩ may not be defined for all |x⟩. The set of vectors |x⟩ ∈ S
for which O|x⟩ is defined is called the domain of the operator O.

O|x⟩ may take us outside S. The totality of all such O|x⟩, where |x⟩ is any vector in S and in
the domain of O, is called the range of the operator O. In quantum mechanics, we often encounter
situations where the range is S itself, or a part of it. We'll see examples of both.

The identity operator 1 takes a vector to itself without any multiplicative factors: 1|x⟩ = |x⟩ ∀ |x⟩ ∈ S.

The null operator 0 annihilates all vectors in S: 0|x⟩ = |0⟩ = 0 ∀ |x⟩ ∈ S.

If A and B are two linear operators acting on S, A = B means A|x⟩ = B|x⟩ ∀ |x⟩ ∈ S.

C = A + B means C|x⟩ = A|x⟩ + B|x⟩ ∀ |x⟩ ∈ S.

D = AB means D|x⟩ = A[B|x⟩] ∀ |x⟩ ∈ S. Note that AB is not necessarily the same as BA.
A good example is the angular momentum operators in quantum mechanics: J_x J_y ≠ J_y J_x.
If AB = BA, the commutator [A, B] = AB − BA is zero, and we say that the operators
commute.

The identity operator obviously commutes with any other operator A, as A1|x⟩ = A|x⟩, and
1[A|x⟩] = A|x⟩.

One can multiply an operator with a number. If A|x⟩ = |y⟩, then αA|x⟩ = α|y⟩. Obviously,
αA = Aα.

One can formally write higher powers of the operators. For example, A²|x⟩ = A[A|x⟩].
Similarly,
    e^A ≡ 1 + A + (1/2!) A² + (1/3!) A³ + ⋯                              (30)
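Eq. (30) can be checked numerically for matrices, which (as discussed later) represent operators. The sketch below compares a truncated series with scipy's matrix exponential; the random 3 × 3 matrix is an illustrative stand-in for A, and the snippet assumes scipy is available.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))          # illustrative matrix standing in for the operator

# truncated series of eq. (30): 1 + A + A^2/2! + A^3/3! + ...
series = np.eye(3)
term = np.eye(3)
for n in range(1, 30):
    term = term @ A / n
    series = series + term

print(np.allclose(series, expm(A)))  # matches scipy's matrix exponential
```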

The operator A can also act on the dual space S_D. If A|a⟩ = |c⟩, one may write ⟨b|A|a⟩ = ⟨b|c⟩.
The vector ⟨d| = ⟨b|A is defined in such a way that ⟨d|a⟩ = ⟨b|c⟩.

This is quite a common practice in quantum mechanics, e.g.,
    ⟨ψ1|H|ψ2⟩ = ∫ ψ1* H ψ2 d³x .                                         (31)

Note that ⟨b|A is not the dual of A|b⟩. To see this, consider the operator λ1. Acting on |x⟩,
this gives λ|x⟩. The dual of this is λ*⟨x|, which can be obtained by operating λ*1, and not
λ1, on ⟨x|.
Q. If A and B are linear operators, show that A + B and AB are also linear operators.
Q. If A + B = 1 and AB = 0, what is the value of A² + B²?
Q. Show that e^A e^{−A} = 1.
Q. If [A, B] = 0, show that e^A e^B = e^{A+B}.
Q. If [A, B] = B, show that e^A B e^{−A} = e B.
Q. If O|x⟩ = λ|x⟩, check whether O is linear. Do the same if O|x⟩ = [|x⟩]*.

3.1 Some Special Operators

The inverse operator of A is the operator B if AB = 1 or BA = 1. The operator B is generally
denoted by A⁻¹. If A⁻¹A = 1, A⁻¹ is called the left inverse of A, and sometimes denoted by A_l⁻¹.
Similarly, if AA⁻¹ = 1, A⁻¹ is called the right inverse, and written as A_r⁻¹. Note that there is no
guarantee that any one or both the inverse operators will exist. Also, in general, A A_l⁻¹ ≠ 1 and
A_r⁻¹ A ≠ 1.

However, if both the inverses exist, they must be equal:
    A_l⁻¹ A = 1  ⟹  A_l⁻¹ A A_r⁻¹ = A_r⁻¹  ⟹  A_l⁻¹ = A_r⁻¹ ,            (32)
where we have used the fact that any operator O multiplying 1 gives O.

The inverse in this case is also unique. To prove this, suppose we have two different inverses
A₁⁻¹ and A₂⁻¹ (whether left or right does not matter any more). Now
    A₁⁻¹ A = 1  ⟹  A₁⁻¹ A A₂⁻¹ = A₂⁻¹  ⟹  A₁⁻¹ = A₂⁻¹ ,                  (33)
leading to a reductio ad absurdum.

One can now define a unique inverse operator A⁻¹ for the operator A. It also follows that
(AB)⁻¹ = B⁻¹A⁻¹, because
    (AB)⁻¹(AB) = B⁻¹A⁻¹AB = B⁻¹1B = B⁻¹B = 1 .                           (34)
Similarly, (AB)(AB)⁻¹ = 1.
Suppose the scalar product is defined in S. If there is an operator B corresponding to an
operator A such that
    ⟨a|A|b⟩ = ⟨b|B|a⟩*   ∀ |a⟩, |b⟩ ∈ S ,                                (35)
then B is called the adjoint operator of A and denoted by A†. Thus, it follows that ⟨b|A† is the
dual vector of A|b⟩.

Now, ⟨a|(A†)†|b⟩ = ⟨b|A†|a⟩* = ⟨a|A|b⟩, so (A†)† = A. Also,
    ⟨a|A†B†|b⟩ = [⟨a|A†][B†|b⟩] = {[⟨b|B][A|a⟩]}* = ⟨b|BA|a⟩* = ⟨a|(BA)†|b⟩ ,    (36)
where we have used the duality property of the vectors. Thus, for any two operators A and B,
    A†B† = (BA)† .                                                       (37)

If A = A†, the operator is called self-adjoint or hermitian. If A† = −A, it is anti-hermitian. The
hermitian operators play an extremely important role in quantum mechanics.

The operator d/dx is anti-hermitian but i d/dx is hermitian. This is because
    ⟨ψ1| d/dx |ψ2⟩ = ∫ ψ1* (dψ2/dx) dx = ∫ d/dx (ψ1* ψ2) dx − ∫ (dψ1*/dx) ψ2 dx .    (38)
But the first integral is zero as both wavefunctions must vanish at the boundary of the integration region.
Now, complete the proof by showing that i d/dx is hermitian. This shows that momentum is indeed a
hermitian operator in quantum mechanics.
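One way to see eq. (38) concretely is to discretize d/dx. The sketch below uses a central-difference matrix with the wavefunctions taken to vanish at the boundary; the grid size and the finite-difference scheme are illustrative assumptions, not part of the notes.

```python
import numpy as np

# central-difference representation of d/dx on a grid, wavefunctions vanishing
# at the boundary (an illustrative discretization)
N, h = 200, 0.01
D = (np.diag(np.ones(N - 1), 1) - np.diag(np.ones(N - 1), -1)) / (2 * h)

print(np.allclose(D.T, -D))                  # d/dx is anti-hermitian (real antisymmetric)
P = 1j * D
print(np.allclose(P.conj().T, P))            # i d/dx is hermitian
print(np.max(np.abs(np.linalg.eigvals(P).imag)) < 1e-10)   # so its eigenvalues are real
```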

Another important class of operators is where U† = U⁻¹. They are called unitary operators.
One can write
    |U|a⟩|² = [⟨a|U†][U|a⟩] = ⟨a|U†U|a⟩ = ⟨a|U⁻¹U|a⟩ = ⟨a|a⟩ = ||a⟩|² ,   (39)
which means that operation by unitary operators keeps the length or norm of any vector unchanged.
The nomenclature is quite similar to those used for matrices. We will show later how one can
represent 4 the action of an operator on a vector by conventional matrix multiplication.

Note that the combination |a⟩⟨b| acts as a linear operator. Operating on a ket, this gives a ket;
operating on a bra, this gives a bra.
    (|a⟩⟨b|)|c⟩ = ⟨b|c⟩|a⟩ ,    ⟨d|(|a⟩⟨b|) = ⟨d|a⟩⟨b| .                  (40)
Also,
    ⟨x|(|a⟩⟨b|)|y⟩ = (⟨x|a⟩)(⟨b|y⟩) = [⟨y|b⟩⟨a|x⟩]* = [⟨y|(|b⟩⟨a|)|x⟩]* ,  (41)
so |b⟩⟨a| is the adjoint of |a⟩⟨b|.


To sum up the important points once again:

1. The dual of A|x⟩ is ⟨x|A†. This is the definition of the adjoint operator. If A = A†, the
operator is hermitian.

2. In an expression like ⟨x|A|x⟩, A can act either on ⟨x| or |x⟩. But if A|x⟩ = |y⟩, ⟨x|A ≠ ⟨y|
unless A is hermitian.

3. The operator ∇ is a 3-dimensional vector as it satisfies all the transformation properties of a
vector (there is a 4-dimensional analogue too). Thus, ∇.A is a scalar. ∇ × A is a cross product,
which is just a way to combine two vectors to get another vector: A = B × C ⟹ A_i = ε_ijk B_j C_k.
∇ is a vector operator, a vector whose components are operators. Note that d/dx is an
operator whose inverse does not exist unless you specify the integration constant.

3.2 Projection Operators

Consider the LVS S of two-dimensional vectors, schematically written as |x⟩. Let |i⟩ and |j⟩ be the
two unit vectors along the x- and y-axes. The operator P_i = |i⟩⟨i|, acting on any vector |x⟩, gives
⟨i|x⟩|i⟩, a vector along the x-direction with a magnitude ⟨i|x⟩.

4 We will also explain what representation means.

Obviously, the set P_i|x⟩ is a one-dimensional LVS. It contains all those vectors of S that lie
along the x-direction, contains the null element, and also the unit vector |i⟩, which can be obtained
by P_i|i⟩. Such a space S′, all whose members are members of S but not the other way round, is
called a nontrivial subspace of S. The null vector, and the whole set S itself, are trivial subspaces.

The operator P_i is an example of the class known as projection operators. We will denote them
by P. These operators project out a subspace of S. Once a part is projected out, another projection
cannot do anything more, so P² = P. A projection operator must also be hermitian, since it is
necessary that it projects out the same part of the original space S and the dual space S_D. Any
operator that is hermitian and satisfies P² = P is called a projection operator.

Suppose P1 and P2 are two projection operators. They project out different parts of the original
LVS. Is P1 + P2 a projection operator too? If P1† = P1 and P2† = P2, (P1 + P2)† = P1 + P2. However,
    (P1 + P2)² = P1² + P2² + P1P2 + P2P1 = (P1 + P2) + P1P2 + P2P1 ,     (42)
so that P1P2 + P2P1 must be zero. Multiply from the left by P1 and use P1² = P1; this gives P1P2 +
P1P2P1 = 0. Similarly, multiply by P1 from the right, and subtract one from the other, to get
    P1P2 − P2P1 = 0 ,                                                    (43)
so that the only solution is P1P2 = P2P1 = 0. Projection operators like this are called orthogonal
projection operators. As an important example, for any P, 1 − P is an orthogonal projection
operator. They sum up to 1, which projects the entire space onto itself.

In short, if several projection operators P1, P2, ..., Pn satisfy
    P_i P_j = P_i for i = j ,    0 otherwise ,                           (44)
then ∑_i P_i is also a projection operator.
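A quick numerical illustration of these properties, with the projectors P_i = |i⟩⟨i| built from the standard orthonormal basis of a 3-dimensional space (an illustrative choice, assuming numpy):

```python
import numpy as np

# projection operators P_i = |i><i| built from an orthonormal basis of C^3
basis = np.eye(3)
P = [np.outer(e, e.conj()) for e in basis]

print(np.allclose(P[0] @ P[0], P[0]))               # P^2 = P
print(np.allclose(P[0].conj().T, P[0]))             # hermitian
print(np.allclose(P[0] @ P[1], np.zeros((3, 3))))   # orthogonal projectors: P_i P_j = 0
print(np.allclose(sum(P), np.eye(3)))               # the sum projects the whole space onto itself

# 1 - P is again a projection operator, orthogonal to P
Q = np.eye(3) - P[0]
print(np.allclose(Q @ Q, Q), np.allclose(Q @ P[0], 0))
```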

Q. Show that d^m/dx^m is anti-hermitian if m is odd and hermitian if m is even.
Q. Show that if P is a projection operator, so is 1 − P, and it is orthogonal to P.
Q. For a two-dimensional cartesian space, show that |i⟩⟨i| and |j⟩⟨j| are orthogonal projection
operators (|i⟩ and |j⟩ are the unit vectors along the x and y axes respectively).

3.3 Eigenvalues and Eigenvectors

If the effect of an operator A on a vector |a⟩ is to yield the same vector multiplied by some constant,
    A|a⟩ = a|a⟩ ,                                                        (45)
we call it an eigenvalue equation, the vector |a⟩ an eigenvector of A, and a the eigenvalue of A.

If there is even one vector |x⟩ which is a simultaneous eigenvector of both A and B, with
eigenvalues a and b respectively, then A and B commute. This is easy to show, as
    (AB)|x⟩ = A(b|x⟩) = bA|x⟩ = ab|x⟩ ,    (BA)|x⟩ = B(a|x⟩) = aB|x⟩ = ab|x⟩ ,    (46)
and so [A, B]|x⟩ = 0, therefore [A, B] is the null operator 0, or AB − BA = 0.

The reverse is not necessarily true. It is true only if both A and B have non-degenerate
eigenvectors. If there are two or more eigenvectors for which the eigenvalues of an operator are the
same, the eigenvectors are called degenerate. If no two eigenvectors have the same eigenvalue, they
are called non-degenerate. If an operator A has two degenerate eigenvectors |a1⟩ and |a2⟩ with
the same eigenvalue a, any linear combination c|a1⟩ + d|a2⟩ is also an eigenvector, with the same
eigenvalue (prove it). This is not true for non-degenerate eigenvectors.
Suppose both A and B have non-degenerate eigenvectors, and [A, B] = 0. Also suppose |x⟩ is
an eigenvector (often called an eigenket) of A with eigenvalue a. We can write
    [A, B]|x⟩ = 0|x⟩ = 0  ⟹  AB|x⟩ = BA|x⟩  ⟹  A(B|x⟩) = a(B|x⟩) ,       (47)
or B|x⟩ is also an eigenvector of A with the same eigenvalue a. But A has non-degenerate
eigenvalues; so this can only happen if B|x⟩ is just some multiplicative constant times |x⟩, or B|x⟩ = b|x⟩.
Thus, commuting operators must have simultaneous eigenvectors if they are non-degenerate.

One can have a counterexample from the angular momentum algebra of quantum mechanics.
The vectors are labelled by the angular momentum j and its projection m on some axis, usually
taken to be the z-axis. These vectors, |jm⟩, are eigenvectors of the operator J² = J_x² + J_y² + J_z² 5.
They are also eigenvectors of J_z but not of J_x or J_y. So here is a situation where J² and J_x
commute but they do not have simultaneous eigenvectors. The reason is that all these |jm⟩ states
are degenerate with respect to J² with an eigenvalue of j(j + 1)ℏ².

5 Although we have used the cartesian symbols x, y, z, the angular momentum operators can act on a completely
different space.
The eigenvalues of hermitian operators are necessarily real. Suppose A is hermitian, A = A†,
and A|a⟩ = a|a⟩. Then
    ⟨a|A|a⟩ = a⟨a|a⟩ ,    ⟨a|A|a⟩ = ⟨a|A†|a⟩ = ⟨a|A|a⟩* = a*⟨a|a⟩ ,       (48)
as the scalar product ⟨a|a⟩ is real. So a = a*, or hermitian operators have real eigenvalues.

If a hermitian operator has two different eigenvalues corresponding to two different eigenvectors,
these eigenvectors must be orthogonal to each other, i.e., their scalar product must be zero.
Suppose for a hermitian operator A, A|a⟩ = a|a⟩ and A|b⟩ = b|b⟩. So,
    ⟨b|A|a⟩ = a⟨b|a⟩ ,    ⟨a|A|b⟩ = b⟨a|b⟩  ⟹  ⟨b|A|a⟩ = ⟨b|A†|a⟩ = ⟨a|A|b⟩* = b⟨b|a⟩ ,    (49)
using the fact that b is real and ⟨a|b⟩* = ⟨b|a⟩. Subtracting one from the other, and noting that
a ≠ b, we get ⟨b|a⟩ = 0, or they are orthogonal to each other.
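Both statements are easy to check numerically once operators are represented by matrices (see the next section). A sketch with a randomly generated hermitian matrix, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = M + M.conj().T                          # an illustrative hermitian matrix

w, V = np.linalg.eig(H)                     # generic eigensolver, no hermiticity assumed
print(np.max(np.abs(w.imag)) < 1e-10)       # hermitian => real eigenvalues
# eigenvectors belonging to different eigenvalues are orthogonal (eig normalizes them)
print(np.allclose(V.conj().T @ V, np.eye(4), atol=1e-8))
```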

4 Matrices

An m × n matrix A has m rows and n columns, and the ij-th element A_ij lives in the i-th row and
the j-th column. Thus, 1 ≤ i ≤ m and 1 ≤ j ≤ n. If m = n, A is called a square matrix.

The sum of two matrices A and B is defined only if they are of the same dimensionality, i.e.,
both have an equal number of rows and an equal number of columns. In that case, C = A + B means
C_ij = A_ij + B_ij for every pair (i, j).

The inner product C = AB is defined if and only if the number of columns of A is equal to the
number of rows of B. In this case, we write
    C = AB  ⟹  C_ij = ∑_{k=1}^{n} A_ik B_kj ;                            (50)

one can also drop the explicit summation sign using the Einstein convention for repeated indices.

If A is an m × n matrix, and B is an n × p matrix, C will be an m × p matrix. Only if m = p
are both AB and BA defined. They are of the same dimensionality if m = n = p, i.e., both A and B
are square matrices. Even if the product is defined both ways, they need not commute; AB is not
necessarily equal to BA, and in this respect matrices differ from ordinary numbers, whose products
always commute.
The direct, outer, or Kronecker product of two matrices is defined as follows. If A is an m × m
matrix and B is an n × n matrix, then the direct product C = A ⊗ B is an mn × mn matrix with
elements C_pq = A_ij B_kl, where p = n(i − 1) + k and q = n(j − 1) + l. For example, if A and B are
both 2 × 2 matrices,

    A ⊗ B = ( a11 B   a12 B )
            ( a21 B   a22 B )

          = ( a11 b11   a11 b12   a12 b11   a12 b12 )
            ( a11 b21   a11 b22   a12 b21   a12 b22 )
            ( a21 b11   a21 b12   a22 b11   a22 b12 )
            ( a21 b21   a21 b22   a22 b21   a22 b22 ) .                  (51)
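numpy provides the Kronecker product directly as np.kron; the sketch below builds the 4 × 4 matrix of eq. (51) for two arbitrary illustrative 2 × 2 matrices and checks one element against the index rule quoted above.

```python
import numpy as np

# the Kronecker (direct) product of eq. (51); entries here are arbitrary
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

C = np.kron(A, B)          # a 4x4 matrix built from the blocks a_ij * B
print(C)

# check one element against C_pq = A_ij B_kl with p = n(i-1)+k, q = n(j-1)+l
n = B.shape[0]
i, j, k, l = 2, 1, 1, 2    # 1-based indices as in the text
p, q = n * (i - 1) + k, n * (j - 1) + l
print(C[p - 1, q - 1] == A[i - 1, j - 1] * B[k - 1, l - 1])
```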

A row matrix R of dimensionality 1 × m has only one row and m columns. A column matrix C
of dimensionality m × 1 similarly has only one column but m rows. Here, both RC and
CR are defined; the first is a number (or a 1 × 1 matrix), and the second an m × m square matrix.

The unit matrix of dimension n is an n × n square matrix whose diagonal entries are 1 and all
other entries are zero: 1_ij = δ_ij. The unit matrix commutes with any other matrix: A1 = 1A = A,
assuming that the product is defined both ways (so that A is also a square matrix of the same dimension).

From now on, unless mentioned explicitly, all matrices will be taken to be square ones.

If two matrices P and Q satisfy PQ = QP = 1, P and Q are called inverses of each other, and
we denote Q by P⁻¹. It is easy to show that left and right inverses are identical; the proof is along
the same lines as the proof for linear operators.
The necessary and sufficient condition for the inverse of a matrix A to exist is a nonzero determinant:
det A ≠ 0. The matrices with zero determinant are called singular matrices and do not
have an inverse. Note that for a square array

    ( a1  a2  a3 )
    ( b1  b2  b3 )
    ( c1  c2  c3 )

the determinant is defined as ε_ijk... a_i b_j c_k ..., where ε_ijk... is an extension of the usual Levi-Civita
symbol: +1 for an even permutation of (i, j, k, ...) = (1, 2, 3, ...), −1 for an odd permutation, and 0
if any two indices are repeated.
If we strike out the i-th row and the j-th column of the n × n determinant, the determinant of
the reduced (n − 1) × (n − 1) matrix is called the ij-th minor of the original matrix. For example,
if we omit the first row (with the a_i) and each of the columns in turn, we get the minors M_1j. The
determinant D_n for this n × n matrix can also be written as
    D_n = ∑_{j=1}^{n} (−1)^{1+j} a_j M_1j .                              (52)
If the i-th row is omitted, the first factor would have been (−1)^{i+j}.

As A⁻¹A = 1, (det A⁻¹)(det A) = 1, as unit matrices of any dimension always have unit
determinant.

A similarity transformation on a matrix A is defined by
    A′ = R⁻¹AR .                                                         (53)
Similarity transformations keep the determinant invariant, as
    det A′ = det(R⁻¹AR) = det R⁻¹ det A det R = det A .                  (54)

Another thing that remains invariant under a similarity transformation is the trace of a matrix,
which is just the algebraic sum of the diagonal elements: tr A = ∑_i A_ii. Even if A and B do not
commute, their traces commute, as tr(AB) = tr(BA). It can be generalized: the trace
of the product of any number of matrices remains invariant under a cyclic permutation of those
matrices. The proof follows from the definition of trace, and the product of matrices:
    tr(ABC ⋯ P) = ∑_i (ABC ⋯ P)_ii = ∑_{i,j,k,l,...,p} A_ij B_jk C_kl ⋯ P_pi .    (55)
All the indices are summed over, so we can start from any point; e.g., if we start from the index k,
we get the trace as tr(C ⋯ PAB).
Note that this is valid only if the matrices are finite-dimensional. For infinite-dimensional matrices,
tr(AB) need not be equal to tr(BA). A good example can be given from quantum mechanics. One can write
both position and momentum operators, x and p, as infinite-dimensional matrices. The uncertainty relation,
written in the form of matrices, now reads [x, p] = iℏ1. The trace of the right-hand side is definitely nonzero;
in fact, it is infinity because the unit matrix is infinite-dimensional. The trace of the left-hand side is also
nonzero, as tr(xp) ≠ tr(px); they are infinite-dimensional matrices too.
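A short numerical check of the cyclic property for finite-dimensional matrices (the three random 3 × 3 matrices are illustrative, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
A, B, C = (rng.normal(size=(3, 3)) for _ in range(3))

# tr(AB) = tr(BA) even though AB != BA
print(np.allclose(np.trace(A @ B), np.trace(B @ A)))
print(np.allclose(A @ B, B @ A))                               # generally False

# trace is invariant under cyclic permutations, but not arbitrary ones
print(np.allclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))   # cyclic: equal
print(np.allclose(np.trace(A @ B @ C), np.trace(B @ A @ C)))   # non-cyclic: generally not
```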

Under a similarity transformation (53),
    tr A′ = tr(R⁻¹AR) = tr(RR⁻¹A) = tr(1A) = tr A .                      (56)

Now, some definitions.

A diagonal matrix A_d has zero or nonzero entries along the diagonal, but necessarily zero
entries in all the off-diagonal positions. Two diagonal matrices always commute. Suppose the
ii-th entry of A_d is a_i and the jj-th entry of B_d is b_j. Then
    (A_d B_d)_ik = (A_d)_ij (B_d)_jk = a_i δ_ij b_j δ_jk = a_i b_i δ_ik ,    (57)
as the product is nonzero only when i = j = k. We get an identical result for (B_d A_d)_ik, so
they always commute. A diagonal matrix need not commute with a nondiagonal matrix.

The complex conjugate A* of a matrix A is given by (A*)_ij = (A_ij)*, i.e., by simply taking
the complex conjugate of each entry. A* need not be diagonal.

The transpose Aᵀ of a matrix A is given by (Aᵀ)_ij = A_ji, i.e., by interchanging the row and
the column. The transpose of an m × n matrix is an n × m matrix; the transpose of a row
matrix is a column matrix, and vice versa. We have
    (AB)ᵀ_ij = (AB)_ji = A_jk B_ki = Bᵀ_ik Aᵀ_kj = (BᵀAᵀ)_ij ,            (58)
or (AB)ᵀ = BᵀAᵀ.

The hermitian conjugate A† of a matrix A is given by A†_ij = (A_ji)*, i.e., by interchanging the
row and the column entries and then taking the complex conjugate (the order of these
operations does not matter). If A is real, A† = Aᵀ.

Q. Show that (AB)† = B†A†.
Q. If two matrices anticommute, i.e., AB = −BA, show that their product has trace zero.
Q. Show that for each of the three Pauli matrices

    σ1 = ( 0  1 )     σ2 = ( 0  −i )     σ3 = ( 1   0 )                  (59)
         ( 1  0 )          ( i   0 )          ( 0  −1 )

σ_i⁻¹ = σ_i. What are the hermitian conjugates of these matrices?
Q. Show that
    exp(iθσ2/2) = cos(θ/2) + iσ2 sin(θ/2) .
Q. The Pauli matrices satisfy [σ_i, σ_j] = 2iε_ijk σ_k and {σ_i, σ_j} = 2δ_ij. Show that for any two vectors
A and B,
    (σ⃗.A)(σ⃗.B) = A.B + iσ⃗.(A × B) .                                     (60)

4.1 Some Special Matrices

Some more definitions, valid for square matrices only.

1. If a real matrix O is the inverse of its transpose Oᵀ, i.e., Oᵀ = O⁻¹, or OOᵀ = OᵀO = 1, it
is called an orthogonal matrix.

2. If a complex matrix U is the inverse of its hermitian conjugate U†, i.e., U† = U⁻¹, or
UU† = U†U = 1, it is called a unitary matrix.

3. If a matrix H is equal to its hermitian conjugate, i.e., H = H†, it is called a hermitian
matrix. As you can see, all the three Pauli matrices are hermitian.

4. If S_ij = S_ji, i.e., S = Sᵀ, it is called a symmetric matrix. For a symmetric matrix of
dimensionality n × n, the ½n(n − 1) entries above the diagonal are identical to the ½n(n − 1) entries
below the diagonal. Thus, a symmetric matrix has only n² − ½n(n − 1) = ½n(n + 1) independent
entries. If the matrix is complex, we have to multiply by a factor of 2 to get the number of
independent elements.

5. If A_ij = −A_ji, i.e., A = −Aᵀ, it is called an antisymmetric matrix. For an antisymmetric
matrix of dimensionality n × n, the ½n(n − 1) entries above the diagonal are the algebraic opposites
of the ½n(n − 1) entries below the diagonal. The diagonal entries are obviously all zero. Thus,
an antisymmetric matrix has only ½n(n − 1) independent entries; multiply by 2 if the entries are
complex.

Any matrix P can be written as a sum of a symmetric and an antisymmetric matrix. S = P + Pᵀ
is obviously symmetric, and A = P − Pᵀ is antisymmetric, so P can be written as ½(S + A).
The n × n orthogonal matrix O has ½n(n − 1) independent elements. We have n² elements to
start with, but the condition
    (OOᵀ)_ij = O_ik Oᵀ_kj = O_ik O_jk = δ_ij                             (61)
gives several constraints. There are n such equations with the right-hand side equal to 1, which
look like ∑_k O_1k² = 1 for i = j = 1, and so on. There are ⁿC₂ = ½n(n − 1) conditions with the
right-hand side equal to zero, which look like
    ∑_k O_1k O_2k = 0 .                                                  (62)
Thus, the total number of independent elements is n² − n − ½n(n − 1) = ½n(n − 1). Note that
OᵀO = 1 does not give any new constraints; it is just the transpose of the original equation.

Rotation in an n-dimensional space is nothing but transforming a vector by operators which can
be represented (we are yet to come to the exact definition of representation) by n × n orthogonal
matrices, with ½n(n − 1) independent elements, or angles. Thus, a 2-dimensional rotation can be
parametrized by only one angle; a 3-dimensional rotation by three, which are known as Eulerian
angles 6.

One can have an identical exercise for the n × n unitary matrix U. We start with 2n² real
elements, as the entries are complex numbers. The condition
    (UU†)_ij = U_ik U†_kj = U_ik U*_jk = δ_ij                            (63)
gives the constraints. There are again n such equations with the right-hand side equal to 1, which
look like
    ∑_k |U_1k|² = 1                                                      (64)
for i = j = 1, and so on. All entries on the left-hand side are necessarily real. There are ⁿC₂ =
½n(n − 1) conditions with the right-hand side equal to zero, which look like
    ∑_k U_1k U*_2k = 0 .                                                 (65)
However, the entries are complex, so a single such equation is actually two equations, for the real
and the imaginary parts. Thus, the total number of independent elements is 2n² − n − n(n − 1) = n².
Again, U†U = 1 does not give any new constraints; it is just the hermitian conjugate of the original
equation.

6 There is a conventional choice of Eulerian angles, but it is by no means unique.

4.2 Representation

Suppose we have an orthonormal basis |i⟩, so that any vector |a⟩ can be written as in (22). If the
space is n-dimensional, one can express these basis vectors as n-component column matrices, with
all entries equal to zero except one, which is unity. For example, in a 3-dimensional space, one can
write the orthonormal basis vectors as

    |1⟩ = ( 1 )    |2⟩ = ( 0 )    |3⟩ = ( 0 )                            (66)
          ( 0 )          ( 1 )          ( 0 )
          ( 0 )          ( 0 )          ( 1 )

Of course there is nothing sacred about the orthonormal basis, but it makes the calculation easier.
The vector |a⟩ can be expressed as

    |a⟩ = ( a1 )                                                         (67)
          ( a2 )
          ( a3 )

Consider an operator A that takes |a⟩ to |b⟩, i.e., A|a⟩ = |b⟩. Obviously |b⟩ has the same dimensionality
as |a⟩, and can be written in a form similar to (67). The result is the same if we express the operator
A as an n × n matrix A with the following property:
    A_ij a_j = b_i .                                                     (68)
We now call the matrix A a representation of the operator A, and the column matrices a, b
representations of the vectors |a⟩ and |b⟩ respectively.

Examples:

1. In a two-dimensional space, suppose A|1⟩ = |1⟩ and A|2⟩ = −|2⟩. Then a11 = 1, a22 = −1,
a12 = a21 = 0, so that
    A = ( 1   0 )                                                        (69)
        ( 0  −1 )

2. In a three-dimensional space, take A|1⟩ = |2⟩, A|2⟩ = |3⟩, A|3⟩ = |1⟩. Thus, a21 = a32 =
a13 = 1 and the rest of the entries are zero, and
    A = ( 0  0  1 )                                                      (70)
        ( 1  0  0 )
        ( 0  1  0 )

3. Suppose the Hilbert space is 2-dimensional (i.e., the part of the original infinite-dimensional
space in which we are interested) and the operator A acts like A|ψ1⟩ = (1/√2)[|ψ1⟩ + |ψ2⟩] and
A|ψ2⟩ = (1/√2)[|ψ1⟩ + |ψ2⟩]. Thus, a11 = a22 = a12 = a21 = 1/√2.
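In practice one builds the representing matrix by letting the operator act on each basis vector and storing the resulting components as columns, since a_ij = ⟨i|A|j⟩ for an orthonormal basis. A sketch that rebuilds example 2 this way, assuming numpy:

```python
import numpy as np

# representation of an operator from its action on an orthonormal basis:
# column j of the matrix holds the components of A|j>.  Here we rebuild
# example 2 above, where A|1> = |2>, A|2> = |3>, A|3> = |1>.
basis = [np.array([1.0, 0, 0]), np.array([0, 1.0, 0]), np.array([0, 0, 1.0])]
images = [basis[1], basis[2], basis[0]]        # A|1>, A|2>, A|3>

A = np.column_stack(images)                    # a_ij = <i|A|j>
print(A)                                       # reproduces eq. (70)

# acting on a vector |a> = (a1, a2, a3) is just matrix multiplication, eq. (68)
a = np.array([1.0, 2.0, 3.0])                  # illustrative components
print(A @ a)
```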

4.3 Eigenvalues and Eigenvectors, Again

If there is a square matrix A and a column matrix a such that Aa = λa, then a is called an
eigenvector of A and λ is the corresponding eigenvalue. Again, this is exactly the same as what we got
for operators and vectors, eq. (45).

A square matrix can be diagonalized by a similarity transformation: A_d = RAR⁻¹. For a diagonal
matrix, the eigenvectors are just the orthonormal basis vectors, with the corresponding diagonal
entries as eigenvalues. (A note of caution: this is strictly true only for non-degenerate eigenvalues,
i.e., when all diagonal entries are different. Degenerate eigenvalues pose more complications which
will be discussed later.) If the matrix A is real symmetric, it can be diagonalized by an orthogonal
transformation, i.e., R becomes an orthogonal matrix. If A is hermitian, it can be diagonalized by
a unitary transformation:
    A_d = UAU† ,                                                         (71)
where U† = U⁻¹. While the inverse does not exist if the determinant is zero, even such a matrix
can be diagonalized. However, the determinant remains invariant under a similarity transformation, so
at least one of the eigenvalues will be zero for such a singular matrix.

The trace also remains invariant under similarity transformations. Thus, it is really easy to find
out the eigenvalues of a 2 × 2 matrix. Suppose the matrix is ( a  b ; c  d ), and the eigenvalues are λ1
and λ2. We need to solve two simultaneous equations,
    λ1 λ2 = ad − bc ,    λ1 + λ2 = a + d ,                               (72)
and that gives the eigenvalues.

We can find the eigenvalues by inspection for some special cases in higher-dimensional matrices
too. Consider, for example, the matrix

    A = ( 1  1  1 )                                                      (73)
        ( 1  1  1 )
        ( 1  1  1 )

The determinant is zero (all minors are zero for A) and there must be at least one zero eigenvalue.
How to know how many eigenvalues are actually zero?

Suppose the ij-th element of an n × n matrix A is denoted by a_ij. If the system of equations
    a_11 x_1 + a_12 x_2 + ⋯ + a_1n x_n = 0 ,
    a_21 x_1 + a_22 x_2 + ⋯ + a_2n x_n = 0 ,
    ⋯
    a_n1 x_1 + a_n2 x_2 + ⋯ + a_nn x_n = 0 ,                             (74)
has only one unique solution (the trivial one, all x_i = 0), then the equations are linearly independent
and the matrix is nonsingular, i.e., det A is nonzero. In this case no eigenvalue can be zero, and the
matrix is said to be of rank n.

If one of these equations can be expressed as a linear combination of the others, then no unique
solution of (74) is possible. The matrix is singular, i.e., A⁻¹ does not exist, and one of
the eigenvalues is zero. If there are m linearly dependent rows (or columns) and n − m linearly
independent rows (or columns), the matrix is said to be of rank n − m, and there are m
zero eigenvalues.

Only one row of (73) is independent; the other two rows are identical to it, so linearly dependent,
and the rank is 1. Therefore, two of the eigenvalues are zero. The trace must be invariant, so the
eigenvalues are (0, 0, 3).
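A numerical check of this example, assuming numpy:

```python
import numpy as np

A = np.ones((3, 3))                        # the matrix of eq. (73)
print(np.linalg.matrix_rank(A))            # 1
print(np.round(np.linalg.eigvals(A), 10))  # eigenvalues 0, 0, 3 (in some order)
print(np.trace(A))                         # trace 3 fixes the nonzero eigenvalue
```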


The eigenvectors are always arbitrary up to an overall sign. Consider the matrix A = ( 1  1 ; 1  1 ).
The secular equation is

    | 1 − λ    1   |
    |   1    1 − λ | = 0 ,                                               (75)

which boils down to λ(λ − 2) = 0, so the eigenvalues are 0 and 2 (this can be checked just by
looking at the determinant and trace, without even caring about the secular equation). For λ = 0,
the equation of the eigenvector is

    ( 1 − 0    1   ) ( x )
    (   1    1 − 0 ) ( y ) = 0 ,                                         (76)

or x + y = 0. Thus, we can choose the normalized eigenvector as (1/√2, −1/√2), but we could have
taken the minus sign in the first component too. Similarly, the second eigenvector, corresponding
to λ = 2 or x − y = 0, can either be (1/√2, 1/√2) or (−1/√2, −1/√2).




Q. For what value of x will the matrix ( 2  7 ; 6  x ) have a zero eigenvalue? What is the other
eigenvalue? Show that in this case the second row is linearly dependent on the first row.
Q. What is the rank of the matrix whose eigenvalues are (i) 2, 1, 0; (ii) 1, 1, 2, 2; (iii) i, −i, 0, 0?
Q. The 3 rows of a 3 × 3 matrix are (a, b, c); (2a, b, c); and (6a, 0, 4c). What is the rank of this
matrix?
Q. Write down the secular equation for the matrix A for which a12 = a21 = 1 and the other elements
are zero. Find the eigenvalues and eigenvectors.

4.4 Degenerate Eigenvalues

If the eigenvalues of a matrix (or an operator) are degenerate, the eigenvectors are not unique.
Consider the operator A with two eigenvectors |x⟩ and |y⟩ having the same eigenvalue a, so that
    A|x⟩ = a|x⟩ ,    A|y⟩ = a|y⟩ .                                       (77)
Any linear combination of |x⟩ and |y⟩ will have the same eigenvalue. Consider the combination
|m⟩ = α|x⟩ + β|y⟩, for which
    A[α|x⟩ + β|y⟩] = α(A|x⟩) + β(A|y⟩) = a[α|x⟩ + β|y⟩] = a|m⟩ .          (78)
Thus one can take any linearly independent combination of the basis vectors for which the eigenvalues
are degenerate (technically, we say the basis vectors that span the degenerate subspace) and
those new vectors are equally good as a basis. One can, of course, find an orthonormal basis too
using the Gram-Schmidt method. The point to remember is that if a matrix, or an operator, has
degenerate eigenvalues, the eigenvectors are not unique.

Examples:

1. The unit matrix in any dimension has all degenerate eigenvalues, equal to 1. The eigenvectors
can be chosen to be the standard orthonormal set, with one element unity and the others zero.
But any linear combination of them is also an eigenvector. But any vector in that LVS is a linear
combination of those orthonormal basis vectors, so any vector is an eigenvector of the unit matrix,
with eigenvalue 1, which is obvious: 1|a⟩ = |a⟩.


2. Suppose the matrix A = ( a  b ; c  d ) has eigenvalues λ1 and λ2, and the eigenvectors
(p1, q1)ᵀ and (p2, q2)ᵀ. The matrix A + 1 must have the same eigenvectors, as they are also the
eigenvectors of the 2 × 2 unit matrix 1. The new eigenvalues, μ1 and μ2, will satisfy
    μ1 + μ2 = (a + 1) + (d + 1) = a + d + 2 = λ1 + λ2 + 2 ,
    μ1 μ2 = (a + 1)(d + 1) − bc = (ad − bc) + a + d + 1 = λ1 λ2 + λ1 + λ2 + 1 ,    (79)
whose obvious solutions are μ1 = λ1 + 1, μ2 = λ2 + 1, as it should be.


3. Consider the matrix

    A = ( 1  0  0 )                                                      (80)
        ( 0  0  1 )
        ( 0  1  0 )

for which the secular equation is (λ − 1)(λ² − 1) = 0, so that the three eigenvalues are 1, 1,
and −1. First, we find the eigenvector for the non-degenerate eigenvalue −1, which gives x = 0 and
y + z = 0. So a suitably normalized eigenvector is
    |1⟩ = (1/√2) (0, 1, −1)ᵀ .                                           (81)
For λ = 1, the only equation that we have is y − z = 0 and there are infinitely many possible ways to solve
this equation. We can just pick a suitable choice:
    |2⟩ = (1/√2) (0, 1, 1)ᵀ .                                            (82)
The third eigenvector, if we want the basis to be orthonormal, can be found by the Gram-Schmidt
method. Another easy way is to take the cross product of these two eigenvectors, and we find
⟨3| = (1, 0, 0).

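A short numerical check of example 3, which also illustrates that any combination of the degenerate eigenvectors is again an eigenvector (assuming numpy):

```python
import numpy as np

A = np.array([[1.0, 0, 0],
              [0, 0, 1],
              [0, 1, 0]])                 # the matrix of eq. (80)

w, V = np.linalg.eigh(A)                  # A is real symmetric, so eigh applies
print(w)                                  # eigenvalues -1, 1, 1 (ascending order)

# any normalized combination of the two degenerate eigenvectors is again
# an eigenvector with eigenvalue 1
v = V[:, 1] + 2 * V[:, 2]
v = v / np.linalg.norm(v)
print(np.allclose(A @ v, v))
```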

4.5 Functions of a Matrix: The Cayley-Hamilton Theorem

One can write a function of a square matrix just as one wrote the functions of operators. In fact,
to a very good approximation, what goes for operators goes for square matrices too. Thus, if |a⟩ is
an eigenvector of A with eigenvalue a, then A²|a⟩ = a²|a⟩ and Aⁿ|a⟩ = aⁿ|a⟩.

Suppose A is some n × n matrix. Consider the determinant of λ1 − A, which is a polynomial in
λ, with highest power λⁿ, and can be written as
    det(λ1 − A) = λⁿ + c_{n−1} λ^{n−1} + ⋯ + c_1 λ + c_0 .               (83)
The equation
    det(λ1 − A) = λⁿ + c_{n−1} λ^{n−1} + ⋯ + c_1 λ + c_0 = 0             (84)
is known as the secular or characteristic equation for A. The n roots correspond to the n eigenvalues
of A. The Cayley-Hamilton theorem states that if we replace λ by A in (84), the polynomial
in A should be equal to zero:
    Aⁿ + c_{n−1} A^{n−1} + ⋯ + c_1 A + c_0 1 = 0 .                       (85)
In other words, a matrix always satisfies its characteristic equation.


Proof:
First, a wrong, or bogus proof. It is tempting to write det(A1 − A) = det(A − A) = det(0) = 0,
so the proof seems to be trivial. This is a bogus proof because (i) λ1 is not supposed to be replaced by A1,
and (ii) (84) is an ordinary equation while (85) is a matrix equation, i.e., a set of n² equations,
so they cannot be compared as such.

Now, the actual proof 7. Suppose |a⟩ is an eigenvector of A with eigenvalue a. Obviously, the
characteristic equation (84) is satisfied for λ = a. Applying the left-hand side of (85) on |a⟩, we get
    [Aⁿ + c_{n−1} A^{n−1} + ⋯ + c_1 A + c_0 1] |a⟩ = [aⁿ + c_{n−1} a^{n−1} + ⋯ + c_1 a + c_0] |a⟩ = 0 ,    (86)
from (84). This is true for all eigenvectors, so the matrix polynomial must identically be zero.

7 This is not actually a watertight proof, but will do for us.

To see what we exactly mean by the Cayley-Hamilton theorem, consider the matrix
A = ( a  b ; c  d ). The characteristic equation is

    | λ − a    −b   |
    |  −c     λ − d | = λ² − (a + d)λ + (ad − bc) = 0 .                  (87)

If we replace λ by A, we get a matrix polynomial

    ( a  b ) ( a  b ) − (a + d) ( a  b ) + (ad − bc) ( 1  0 )            (88)
    ( c  d ) ( c  d )          ( c  d )             ( 0  1 )

and it is straightforward to check that this is a 2 × 2 null matrix.
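The theorem is also easy to verify numerically. In the sketch below, np.poly returns the coefficients of the characteristic polynomial of a matrix, and that polynomial is then evaluated on the matrix itself by Horner's scheme; the 4 × 4 random matrix is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 4))               # an arbitrary illustrative matrix

# coefficients of det(lambda*1 - A), highest power first
c = np.poly(A)

# Cayley-Hamilton: the same polynomial evaluated on A is the null matrix
P = np.zeros_like(A)
for coeff in c:
    P = P @ A + coeff * np.eye(4)         # Horner evaluation of the matrix polynomial
print(np.allclose(P, 0))
```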


Examples:
1. Suppose the matrix A satisfies A² − 5A + 4 = 0, where 4 means 4 times the unit matrix. The
characteristic equation is then λ² − 5λ + 4 = 0, so that the two eigenvalues are 1 and 4.
2. The Pauli matrices satisfy σ_i² = 1, so the eigenvalues must be either +1 or −1.