
Benjamin McKay

Linear Algebra

January 2, 2008

Preface
Up close, smooth things look flat: that is the picture behind differential calculus. In mathematical language, we can approximate smoothly varying functions by linear functions. In calculus of several variables, the resulting linear functions can be complicated: you need to study linear algebra.

Problems appear throughout the text, which you must learn to solve. They often provide vital results used in the course. Most of these problems have hints, particularly the more important ones. There are also review problems at the end of each section, and you should try to solve a few from each section. Try to solve each problem first before looking up the hint. Never use decimal approximations (for instance, from a calculator) on any problem, except to check your work; many problems are very sensitive to small errors and must be worked out precisely. Whenever ridiculously large numbers appear in the statement of a problem, this is a hint that they must play little or no role in the solution.

The prerequisites for this course are basic arithmetic and elementary algebra, typically learned in high school, and some comfort and facility with proofs, particularly using mathematical induction. You can't prove that all men are wearing hats just by pointing out one example of a man in a hat; most proofs require an argument, and not just examples. Polya [9] and Solow [12] explain induction and provide help with proofs. Bretscher [2] and Strang [15] are excellent introductory textbooks of linear algebra.


Contents
Matrix Calculations
1 Solving Linear Equations
2 Matrices
3 Important Types of Matrices
4 Elimination Via Matrix Arithmetic
5 Finding the Inverse of a Matrix
6 The Determinant
7 The Determinant via Elimination

Bases and Subspaces
8 Span
9 Bases
10 Kernel and Image

Eigenvectors
11 Eigenvalues and Eigenvectors
12 Bases of Eigenvectors

Orthogonal Linear Algebra
13 Inner Product
14 The Spectral Theorem
15 Complex Vectors

Abstraction
16 Vector Spaces
17 Fields

Geometry and Combinatorics
18 Permutations and Determinants
19 Volume and Determinants
20 Geometry and Orthogonal Matrices
21 Orthogonal Projections

Jordan Normal Form
22 Direct Sums of Subspaces
23 Jordan Normal Form
24 Decomposition and Minimal Polynomial
25 Matrix Functions of a Matrix Variable
26 Symmetric Functions of Eigenvalues
27 The Pfaffian

Factorizations
28 Dual Spaces and Quotient Spaces
29 Singular Value Factorization
30 Factorizations

Tensors
31 Quadratic Forms
32 Tensors and Indices
33 Tensors
34 Exterior Forms

A Hints
Bibliography
List of Notation
Index

Matrix Calculations

1 Solving Linear Equations


In this chapter, we learn how to solve systems of linear equations by a simple recipe, suitable for a computer.

1.1 Elimination
Consider the equations

-6 - x3 + x2 = 0
3 x1 + 7 x2 + 4 x3 = 9
3 x1 + 5 x2 + 8 x3 = 3.

They are called linear because they are sums of constants and constant multiples of variables. How can we solve them (or teach a computer to solve them)? To solve means to find values for each of the variables x1, x2 and x3 satisfying all three of the equations.

Preliminaries
a. Line up the variables:

     x2 -  x3 = 6
3x1 + 7x2 + 4x3 = 9
3x1 + 5x2 + 8x3 = 3

All of the x1's are in the same column, etc., and all constants are on the right hand side.

b. Drop the variables and equals signs, just writing the numbers:

[ 0  1  -1  6 ]
[ 3  7   4  9 ]
[ 3  5   8  3 ]

This saves rewriting the variables at each step. We put brackets around the numbers for decoration.

c. Draw a box around the entry in the top left corner, and call that entry the pivot:

[ 0  1  -1  6 ]
[ 3  7   4  9 ]
[ 3  5   8  3 ]

Forward elimination

(1) If the pivot is zero, then swap rows with a lower row to get the pivot to be nonzero. This gives

[ 3  7   4  9 ]
[ 0  1  -1  6 ]
[ 3  5   8  3 ]

(Going back to the linear equations we started with, we are swapping the order in which we write them down.) If you can't find any row to swap with (because every lower row also has a zero in the pivot column), then move the pivot one step to the right and repeat step (1).

(2) Add whatever multiples of the pivot row you need to each lower row, in order to kill off every entry under the pivot. ("Kill off" means "make into 0".) This requires us to add -(row 1) to row 3 to kill off the 3 under the pivot, giving

[ 3   7   4   9 ]
[ 0   1  -1   6 ]
[ 0  -2   4  -6 ]

(Going back to the linear equations, we are adding equations together, which doesn't change the answers; we could reverse this step by subtracting again.)

(3) Make a new pivot one step down and to the right:

[ 3   7   4   9 ]
[ 0   1  -1   6 ]
[ 0  -2   4  -6 ]

and start again at step (1). In our example, our next pivot, 1, must kill everything beneath it: the -2. So we add 2(row 2) to (row 3), giving

[ 3   7   4   9 ]
[ 0   1  -1   6 ]
[ 0   0   2   6 ]


Figure 1.1: Forward elimination on a large matrix. The shaded boxes are nonzero entries, while zero entries are left blank. The pivots are outlined. You can see the first few steps, and then a step somewhere in the middle of the calculation, and then the final result.

We are done with that pivot. Move one step down and to the right:

[ 3   7   4   9 ]
[ 0   1  -1   6 ]
[ 0   0   2   6 ]

Forward elimination is done. Let's turn the numbers back into equations, to see what we have:

3x1 + 7x2 + 4x3 = 9
      x2 -  x3 = 6
           2x3 = 6

Each pivot solves for one variable in terms of later variables. Problem 1.1. Apply forward elimination to 0 0 1 0 1 0 1 0 3 1 1 1

1 1 0 1

Problem 1.2. Apply forward elimination to 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1

Back Substitution
Starting at the last pivot, and working up: a. Rescale the entire row to turn the pivot into a 1.


Figure 1.2: Back substitution on the large matrix from figure 1.1. You can see the first few steps, and then a step somewhere in the middle of the calculation, and then the final result. You can see the pivots turn into 1's.

b. Add whatever multiples of the pivot row you need to each higher row, in order to kill off every entry above the pivot.

Applied to our example:

[ 3   7   4   9 ]
[ 0   1  -1   6 ]
[ 0   0   2   6 ]

Scale row 3 by 1/2:

[ 3   7   4   9 ]
[ 0   1  -1   6 ]
[ 0   0   1   3 ]

Add row 3 to row 2, and -4(row 3) to row 1:

[ 3   7   0  -3 ]
[ 0   1   0   9 ]
[ 0   0   1   3 ]

Add -7(row 2) to row 1:

[ 3   0   0  -66 ]
[ 0   1   0    9 ]
[ 0   0   1    3 ]

Scale row 1 by 1/3:

[ 1   0   0  -22 ]
[ 0   1   0    9 ]
[ 0   0   1    3 ]

Done. Turn back into equations:

x1 = -22
x2 = 9
x3 = 3.


Denition 1.1. Forward elimination and back substitution together are called GaussJordan elimination or just elimination. (Forward elimination is often called Gaussian elimination.) Remark 1.2. Forward elimination already shows us what is going to happen: which variables are solved for in terms of which other variables. So for answering most questions, we usually only need to carry out forward elimination, without back substitution.

1.2 Examples
Example 1.3 (More than one solution).

x1 + x2 + x3 + x4 = 7
x1 + 2x3 = 1
x2 + x3 = 0

Write down the numbers:

[ 1  1  1  1  7 ]
[ 1  0  2  0  1 ]
[ 0  1  1  0  0 ]

Kill everything under the pivot: add -(row 1) to row 2.

[ 1   1  1   1   7 ]
[ 0  -1  1  -1  -6 ]
[ 0   1  1   0   0 ]

Done with that pivot; move one step down and to the right. Kill: add row 2 to row 3.

[ 1   1  1   1   7 ]
[ 0  -1  1  -1  -6 ]
[ 0   0  2  -1  -6 ]

Move the pivot. Forward elimination is done. Let's look at where the pivots lie:

[ 1   1  1   1   7 ]
[ 0  -1  1  -1  -6 ]
[ 0   0  2  -1  -6 ]

Let's turn back into equations:

x1 + x2 + x3 + x4 = 7
    -x2 + x3 - x4 = -6
         2x3 - x4 = -6

Look: each pivot solves for one variable, in terms of later variables. There was never any pivot in the x4 column, so x4 is a free variable: x4 can take on any value, and then we just use each pivot to solve for the other variables, bottom up.

Problem 1.3. Back substitute to find the values of x1, x2, x3 in terms of x4.

Example 1.4 (No solutions). Consider the equations

x1 + x2 + 2x3 = 1
2x1 + x2 + x3 = 0
4x1 + 3x2 + 5x3 = 1.

Forward eliminate:

[ 1  1  2  1 ]
[ 2  1  1  0 ]
[ 4  3  5  1 ]

Add -2(row 1) to row 2, and -4(row 1) to row 3.

[ 1   1   2   1 ]
[ 0  -1  -3  -2 ]
[ 0  -1  -3  -3 ]

Move the pivot one step down and to the right.

[ 1   1   2   1 ]
[ 0  -1  -3  -2 ]
[ 0  -1  -3  -3 ]

Add -(row 2) to row 3.

[ 1   1   2   1 ]
[ 0  -1  -3  -2 ]
[ 0   0   0  -1 ]

Move the pivot down and to the right, and then (since its column has no nonzero entry to use) move it one more step to the right, into the constants column.

[ 1   1   2   1 ]
[ 0  -1  -3  -2 ]
[ 0   0   0  -1 ]

Turn back into equations:

x1 + x2 + 2x3 = 1
    -x2 - 3x3 = -2
             0 = -1.

You can't solve these equations: 0 can't equal -1. So you can't solve the original equations either: there are no solutions. Two lessons that save you time and effort:
a. If a pivot appears in the constants column, then there are no solutions.
b. You don't need to back substitute for this problem; forward elimination already tells you if there are any solutions.

1.3 Summary
We can turn linear equations into a box of numbers. Start a pivot at the top left corner, swap rows if needed, move right if swapping won't work, kill off everything under the pivot, and then make a new pivot one step down and to the right from the last one. After forward elimination, we will say that the resulting equations are in echelon form (often called row echelon form). The echelon form equations have the same solutions as the original equations. Each column except the last (the column of constants) represents a variable. Each pivot solves for one variable in terms of later variables (each pivot binds a variable, so that the variable is not free). The original equations have no solutions just when the echelon equations have a pivot in the column of constants. Otherwise there are solutions, and any pivotless column (besides the column of constants) gives a free variable (a variable whose value is not fixed by the equations). The value of any free variable can be picked as we like. So if there are solutions, there is either only one solution (no free variables), or there are infinitely many solutions (free variables). Setting free variables to different values gives different solutions. The number of pivots is called the rank. Forward elimination makes the pattern of pivots clear; often we don't need to back substitute.
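The recipe above is mechanical enough to hand to a computer. Below is a minimal Python sketch of forward elimination followed by back substitution, written to mirror the steps of this chapter; the function names and the list-of-lists matrix format are illustrative choices, not anything fixed by the text.

def forward_eliminate(rows):
    # Bring a matrix (list of row lists) to echelon form, as in section 1.1.
    rows = [row[:] for row in rows]
    pivot_row, pivot_col = 0, 0
    while pivot_row < len(rows) and pivot_col < len(rows[0]):
        # Step (1): if the pivot is zero, swap with a lower row; if none, move right.
        swap = next((r for r in range(pivot_row, len(rows))
                     if rows[r][pivot_col] != 0), None)
        if swap is None:
            pivot_col += 1
            continue
        rows[pivot_row], rows[swap] = rows[swap], rows[pivot_row]
        # Step (2): kill off every entry under the pivot.
        for r in range(pivot_row + 1, len(rows)):
            factor = rows[r][pivot_col] / rows[pivot_row][pivot_col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        # Step (3): new pivot one step down and to the right.
        pivot_row += 1
        pivot_col += 1
    return rows

def back_substitute(rows):
    # Scale each pivot to 1 and kill off every entry above it, from the bottom up.
    rows = [row[:] for row in rows]
    for r in reversed(range(len(rows))):
        pivots = [c for c, a in enumerate(rows[r]) if a != 0]
        if not pivots:
            continue
        c = pivots[0]
        rows[r] = [a / rows[r][c] for a in rows[r]]
        for above in range(r):
            rows[above] = [a - rows[above][c] * b
                           for a, b in zip(rows[above], rows[r])]
    return rows

# The worked example of section 1.1: x2 - x3 = 6, 3x1 + 7x2 + 4x3 = 9, 3x1 + 5x2 + 8x3 = 3.
A = [[0, 1, -1, 6], [3, 7, 4, 9], [3, 5, 8, 3]]
print(back_substitute(forward_eliminate(A)))   # last column: -22, 9, 3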


Remark 1.5. We often encounter systems of linear equations for which all of the constants are zero (the right hand sides). When this happens, to save time we wont write out a column of constants, since the constants would just remain zero all the way through forward elimination and back substitution. Problem 1.4. Use elimination to solve the linear equations 2 x2 + x3 = 1 4 x1 x2 + x3 = 2 4 x1 + 3 x2 + 3 x3 = 4

1.3 Review Problems


Problem 1.5. Apply forward elimination 2 0 1 0 0 2 to 2 0 2

Problem 1.6. Apply forward elimination to 1 1 1 1 1 1 1 1 0 Problem 1.7. Apply forward elimination to 1 2 2 1 2 2 2 0 0 Problem 1.8. Apply forward elimination 0 0 1 0 1 1 to 1 1 1

1 2 1

Problem 1.9. Apply forward elimination to 0 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1

0 0 0 1


Problem 1.10. Apply forward elimination to 1 3 2 6 2 5 4 1 3 8 6 7 Problem 1.11. Apply back substitution to the result of problem 1.2 on page 5.

Problem 1.12. Apply back substitution to 1 1 0 0 2 0 0 0 1 Problem 1.13. Apply back substitution to 1 0 1 0 1 1 0 0 0 Problem 1.14. Apply back substitution to 2 1 1 0 3 1 0 0 0 Problem 1.15. Apply back substitution to 3 0 2 2 0 0 0 0 3 0 0 0

2 1 2 2

Problem 1.16. Use elimination to solve the linear equations x1 + 2 x2 + x3 + x4 = 1 x 1 + 2 x 2 + 2 x 3 + x 4 = 0 x3 + 2 x4 = 0 x4 = 2


Problem 1.17. Use elimination to solve the linear equations x1 + 2 x2 + 3 x3 + 4 x4 = 5 2 x1 + 5 x2 + 7 x3 + 11 x4 = 12 x2 + x3 + 4 x4 = 3

Problem 1.18. Use elimination to solve the linear equations 2 x 1 + x 2 + x 3 + x 4 = 0 x1 2 x2 + x3 + x4 = 0 x1 + x2 2 x3 + x4 = 0 x1 + x2 + x3 2 x4 = 0

Problem 1.19. Write down the simplest example you can to show that adding one to each entry in a row can change the answers to the linear equations. So adding numbers to rows is not allowed.

Problem 1.20. Write down the simplest systems of linear equations you can come up with that have
a. One solution.
b. No solutions.
c. Infinitely many solutions.

Problem 1.21. If all of the constants in some linear equations are zeros, must the equations have a solution?
1 2 Problem 1.22. Draw the two lines 1 2 x1 x2 = 2 and 2 x1 + x2 = 3 in R . In your drawing indicate the points which satisfy both equations.


Problem 1.23. Which pair of equations cuts out which pair of lines? How many solutions does each pair of equations have? x1 x2 = 0 x1 + x2 = 1 x1 x2 = 4 2 x 1 + 2 x 2 = 1 x1 x2 = 1 3 x 1 + 3 x 2 = 3 (3) (2) (1)

(a)

(b)

(c)

Problem 1.24. Draw the two lines 2 x1 + x2 = 1 and x1 2 x2 = 1 in the x1 x2 -plane. Explain geometrically where the solution of this pair of equations lies. Carry out forward elimination on the pair, to obtain a new pair of equations. Draw the lines corresponding to each new equation. Explain why one of these lines is parallel to one of the axes. Problem 1.25. Find the quadratic function y = ax2 + bx + c which passes through the points (x, y ) = (0, 2), (1, 1), (2, 6). Problem 1.26. Give a simple example of a system of linear equations which has a solution, but for which, if you alter one of the coecients by a tiny amount (as tiny as you like), then there is no solution. Problem 1.27. If you write down just one linear equation in three variables, like 2x1 + x2 x3 = 1, the solutions draw out a plane. So a system of three linear equations draws out three dierent planes. The solutions of two of the equations lie on the intersections of the two corresponding planes. The solutions of the whole system are the points where all three planes intersect. Which system of equations in table 1.1 on the next page draws out which picture of planes from gure 1.3 on page 15?


x1 x2 + 2 x3 = 2 2 x1 + 2 x2 + x3 = 2 3 x 1 + 3 x 2 x 3 = 0 x 1 x 3 = 0 x1 2 x2 x3 = 0 2 x 1 2 x 2 2 x 3 = 1 x1 + x2 + x3 = 1 x1 + x2 + x3 = 0 x 1 + x 2 + x 3 = 1 2 x1 + x2 + x3 = 2 2 x 1 x 2 + 2 x 3 = 0 4 x 2 + 2 x 3 = 4 2 x2 x3 = 0 x1 x2 x3 = 1 3x1 3x2 3x3 = 0 Table 1.1: Five systems of linear equations

(1)

(2)

(3)

(4)

(5)


(a)

(b)

(c)

(d)

(e)

Figure 1.3: When you have three equations in three variables, each one draws a plane. Solutions of a pair of equations lie where their planes intersect. Solutions of all three equations lie where all three planes intersect.

2 Matrices
The boxes of numbers we have been writing are called matrices. Lets learn the arithmetic of matrices.

2.1 Denitions
Definition 2.1. A matrix is a finite box A of numbers, arranged in rows and columns. We write it as

A = [ A11  A12  ...  A1q ]
    [ A21  A22  ...  A2q ]
    [  .    .    .    .  ]
    [ Ap1  Ap2  ...  Apq ]

and say that A is p × q if it has p rows and q columns. If there are as many rows as columns, we will say that the matrix is square.

Remark 2.2. A31 is in row 3, column 1. If we have 10 or more rows or columns (which won't happen in this book), we might write A1,1 instead of A11. For example, we can distinguish A11,1 from A1,11.

Definition 2.3. A matrix x with only one column is called a vector and written

x = [ x1 ]
    [ x2 ]
    [ .. ]
    [ xn ].

The collection of all vectors with n real number entries is called Rn. Think of R2 as the xy-plane, writing each point as a column

[ x ]
[ y ]

instead of (x, y). We draw a vector, for example the vector

[ 2 ]
[ 3 ],

as an arrow, pointing out of the origin, with the arrow head at the point x = 2, y = 3.

Figure 2.1: Echelon form: a staircase, each step down by only 1, but across to the right by maybe more than one. The pivots are the steps down, and below them, in the unshaded part, are zeros.

Similarly, think of R3 as 3-dimensional space. Problem 2.1. Draw the vectors: 1 , 0 0 , 1 1 , 1 2 , 1 3 , 1 4 , 2 4 . 2

We draw vectors either as dots or more often as arrows. If there are too many vectors, pictures of arrows can get cluttered, so we prefer to draw dots. Sometimes we distinguish between points (drawn as dots) and vectors (drawn as arrows), but algebraically they are the same objects: columns of numbers.

2.1 Review Problems


Problem 2.2. Find points in R3 which form the vertices of a regular (a) cube (b) octahedron (c) tetrahedron.

2.2 Echelon Form


Definition 2.4. A matrix is in echelon form if (as in figure 2.1) each row is either all zeros or starts with more zeros than any earlier row. The first nonzero entry of each row is called a pivot.
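Definition 2.4 is easy to turn into a mechanical test. The sketch below is plain Python with illustrative function names; it checks the "starts with more zeros than any earlier row" condition directly.

def leading_zeros(row):
    # Number of zeros before the first nonzero entry (the pivot, if any).
    for count, entry in enumerate(row):
        if entry != 0:
            return count
    return len(row)            # an all-zero row

def is_echelon(rows):
    # True when each row is all zeros or starts with more zeros than any earlier row.
    previous = -1
    for row in rows:
        z = leading_zeros(row)
        if z < len(row):       # this row has a pivot
            if z <= previous:
                return False
            previous = z
        else:                  # an all-zero row; any later pivot row would fail the test
            previous = len(row)
    return True

print(is_echelon([[3, 7, 4], [0, 1, -1], [0, 0, 2]]))   # True
print(is_echelon([[0, 1, -1], [3, 7, 4], [0, 0, 2]]))   # False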


Problem 2.3. Draw dots where the pivots are in gure 2.1 on the facing page.

Problem 2.4. Give the simplest examples you can of two matrices which are not in echelon form, each for a dierent reason.

Problem 2.5. The entries A11 , A22 , . . . of a square matrix A are called the diagonal. Prove that every square matrix in echelon form has all pivots lying on or above the diagonal. Problem 2.6. Prove that a square matrix in echelon form has a zero row just when it is either all zeroes or it has a pivot above the diagonal. Problem 2.7. Prove that a square matrix in echelon form has a column with no pivot just when it has a zero row. Thus all diagonal entries are pivots or else there is a zero row. Theorem 2.5. Forward elimination brings any matrix to echelon form, without altering the solutions of the associated linear equations. Obviously proof is by induction, but the result is clear enough, so we wont give a proof.

2.2 Review Problems


Problem 2.8. In one colour, draw the locations of the pivots, and in another draw the staircase (as in gure 2.1 on the preceding page) for the matrices 1 2 0 1 0 1 0 A= , B = 0 3 , C = 0 0 ,D= 1 0 0 0 0 1 0 0

0 .

2.3 Matrices in Blocks


If A= then write A B = 1 3 2 4 5 7 6 8 and A B = 1 3 5 7 2 4 6 8 . 1 3 2 4 and B = 5 7 6 8 ,

(We will often colour various rows and columns of matrices, just to make the discussion easier to follow. The colours have no mathematical meaning.)


Denition 2.6. Any matrix which has only zero entries will be written 0. Problem 2.9. What could (0 0) mean?

Problem 2.10. What could (A

0) mean if 1 3 2 ? 4

A=

2.4 Matrix Arithmetic


Example 2.7. Add matrices like: A= 1 3 2 ,B = 4 5 7 6 ,A + B = 8 1+5 3+7 2+6 . 4 + 8.

Denition 2.8. If two matrices have matching numbers of rows and columns, we add them by adding their components: (A + B )ij = Aij + Bij . Similarly for subtracting. Problem 2.11. Let A= Find A + B . When we add matrices in blocks, A B + C D = A+C B+D 1 3 2 , B= 4 1 1 2 . 2

(as long as A and C have the same numbers of rows and columns and B and D do as well). Problem 2.12. Draw the vectors u= 2 , v= 1 3 , 1

and the vectors 0 and u + v . In your picture, you should see that they form the vertices of a parallelogram (a quadrilateral whose opposite sides are parallel). Multiply by numbers like: 7 1 3 2 4 = 71 73 72 . 74


Denition 2.9. If A is a matrix and c is a number, cA is the matrix with (cA)ij = cAij . Example 2.10. Let x= 1 . 2

The multiples x, 2x, 3x, . . . and x, 2x, 3x, . . . live on a straight line through 0: 3x 2x x 0 x 2x 3x

2.5 Matrix Multiplication


Surprisingly, matrix multiplication is more dicult. Example 2.11. To multiply a single row by a single column, just multiply entries in order, and add up: 1 2 3 4 = 1 3 + 2 4 = 3 + 8 = 11.

Put your left hand index nger on the row, and your right hand index nger on the column, and as you run your left hand along, run your right hand down: .

As your ngers travel, you multiply the entries you hit, and add up all of the products. Problem 2.13. Multiply 8 2 1 3

22 Example 2.12. To multiply the matrices A= 1 3 2 ,B = 4 5 7 6 , 8


multiply any row of A by any column of B : 1 2 5 7 = 15+27 .

As your left hand nger travels along a row, and your right hand down a column, you produce the entry in that row and column; the second row of A times the rst column of B gives the entry of AB in second row, rst column. Problem 2.14. Multiply 1 2 2 3 1 3 2 4 3 2 3 4

Definition 2.13. We write Σ_k in front of an expression to mean the sum of that expression for k taking on all possible values for which the expression makes sense. For example, if x is a vector with 3 entries,

x = [ x1 ]
    [ x2 ]
    [ x3 ],

then

Σ_k x_k = x1 + x2 + x3.

Definition 2.14. If A is p × q and B is q × r, then AB is the p × r matrix whose entries are

(AB)_ij = Σ_k A_ik B_kj.
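Definition 2.14 translates directly into two nested loops and a sum over k. The sketch below is plain Python with illustrative names; it is the definition written out, not an efficient implementation.

def mat_mul(A, B):
    # (AB)_ij = sum over k of A_ik * B_kj, for A p-by-q and B q-by-r.
    p, q, r = len(A), len(A[0]), len(B[0])
    assert len(B) == q, "columns of A must match rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(q)) for j in range(r)]
            for i in range(p)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_mul(A, B))   # [[19, 22], [43, 50]]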

2.5 Review Problems


Problem 2.15. If A is a matrix and x a vector, what constraints on dimensions need to be satised to multiply Ax? What about xA? Problem 2.16. 2 0 A = 2 0 , B = 2 0

0 0

1 , C= 1

2 0

1 1

2 , D= 1

0 2

0 1

Compute all of the following which are dened: AB, AC, AD, BC, CA, CD.


Problem 2.17. Find some 2 2 matrices A and B with no zero entries for which AB = 0. Problem 2.18. Find a 2 2 matrix A with no zero entries for which A2 = 0.

Problem 2.19. Suppose that we have a matrix A, so that whenever x is a vector with integer entries, then Ax is also a vector with integer entries. Prove that A has integer entries. Problem 2.20. A matrix is called upper triangular if all entries below the diagonal are zero. Prove that the product of upper triangular square matrices is upper triangular, and if, for example A11 A12 A13 A14 ... A1n A22 A23 A24 ... A2n A33 A34 ... A3n . A= , .. .. . . . . . .. . . . Ann (with zeroes under the diagonal) and B11 B= then AB = A11 B11 A22 B22 A33 B33 .. . ... ... ... .. . .. . . . . Ann Bnn . B12 B22 B13 B23 B33 B14 B24 B34 .. . ... ... ... .. . .. . B1n B2n B3n . , . . . . . Bnn

Problem 2.21. Prove the analogous result for lower triangular matrices.


Algebraic Properties of Matrix Multiplication


Problem 2.22. If A and B matrices, and AB is dened, and c is any number, prove that c(AB ) = (cA)B = A(cB ).

Problem 2.23. Prove that matrix multiplication is associative: (AB )C = A(BC ) (and that if either side is dened, then the other is, and they are equal).

Problem 2.24. Prove that matrix multiplication is distributive: A(B + C ) = AB + AC and (P + Q)R = P R + QR for any matrices A, B, C, P, Q, R (again if one side is dened, then both are and they are equal). Running your nger along rows and columns, you see that blocks multiply like: A etc. Problem 2.25. To make sense of this last statement, what do we need to know about the numbers of rows and columns of A, B, C and D? B C D = AC + BD

3 Important Types of Matrices


Some matrices, or types of matrices, are especially important.

3.1 The Identity Matrix


Define matrices

I1 = ( 1 ),   I2 = [ 1  0 ],   I3 = [ 1  0  0 ],  ...
                   [ 0  1 ]         [ 0  1  0 ]
                                    [ 0  0  1 ]

Definition 3.1. The n × n matrix with 1's on the diagonal and zeros everywhere else is called the identity matrix, and written In. We often write it as I to be deliberately ambiguous about what size it is. An equivalent definition:

Iij = 1 if i = j, and Iij = 0 if i ≠ j.

Problem 3.1. What could I13 mean? (Careful: it has two meanings.) What does I2 mean? Problem 3.2. Prove that IA = AI = A for any matrix A.

Problem 3.3. Suppose that B is an n n matrix, and that AB = A for any n n matrix A. Prove that B = In .

Problem 3.4. If A and B are two matrices and Ax = Bx for any vector x, prove that A = B . Denition 3.2. The columns of In are vectors called e1 , e2 , . . . , en . Problem 3.5. Consider the identity matrix I3 . What are the vectors e1 , e2 , e3 ? 25


Problem 3.6. The vector ej has a one in which row? And zeroes in which rows?

Problem 3.7. If A is a matrix, prove that Ae1 is the first column of A.

Problem 3.8. If A is any matrix, prove that Aej is the j-th column of A.

If A is a p × q matrix, by the previous exercise,

A = ( Ae1  Ae2  ...  Aeq ).

In particular, when we multiply matrices,

AB = ( ABe1  ABe2  ...  ABeq )

(and if either side of this equation is defined, then both sides are and they are equal). In other words, the columns of AB are A times the columns of B. This next exercise is particularly vital:

Problem 3.9. If A is a matrix, and x a vector, prove that Ax is a sum of the columns of A, each weighted by entries of x:

Ax = x1 (Ae1) + x2 (Ae2) + ... + xn (Aen).
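The fact in problem 3.9 is worth checking numerically before proving it: Ax agrees with the weighted sum of the columns of A. A small plain-Python check follows (the function names and the particular numbers are only illustrative).

def mat_vec(A, x):
    # A times a vector x, straight from the definition of matrix multiplication.
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

def column(A, j):
    # The j-th column of A, that is A e_j (counting from 0 in the code).
    return [row[j] for row in A]

A = [[0, 1, -1], [3, 7, 4], [3, 5, 8]]
x = [2, -1, 3]

weighted = [sum(x[j] * column(A, j)[i] for j in range(len(x)))
            for i in range(len(A))]
print(mat_vec(A, x))   # [-4, 11, 25]
print(weighted)        # the same vector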

3.1 Review Problems


Problem 3.10. True or false (if false, give a counterexample): a. If the second column of B is 3 times the rst column, then the same is true of AB . b. Same question for rows instead of columns.

Problem 3.11. Can you nd matrices A and B so that A is 3 5 and B is 5 3, and AB = I ?

Problem 3.12. Prove that the rows of AB are the rows of A multiplied by B .


Figure 3.1: Faces formed by y = Ax; panel (a) is the original, and panels (b)-(e) are transformed faces. Each face is centered at the origin.

Problem 3.13. The Fibonacci numbers are the numbers x0 = 1, x1 = 1, xn+1 = xn + xn1 . Write down x0 , x1 , x2 , x3 and x4 . Let A= Prove that xn+1 xn Problem 3.14. Let x= 1 , y= 0 0 . 2 = An 1 . 1 1 1 1 . 0

Draw these vectors in the plane. Let A= 1 0 1 1 0 , B= 1 0 , E= 0 2 0 0 0 0 , C= 3 0 , F = 0 1 0 0 1 1 , 0 1 . 0

D=

For each matrix M = A, B, C, D, E, F draw M x and M y (in a dierent colour for each matrix), and explain in words what each matrix is doing (for example, rotating, attening onto a line, expanding, contracting, etc.). Problem 3.15. The rst picture in gure 3.1 is the original in the x1 , x2 plane, and the center of the circular face is at the origin. If we pick a matrix A and set y = Ax, and draw the image in the y1 , y2 plane, which matrix below draws which picture?

28 2 1 0 , 0 0 1 1 , 0 2 1


0 , 1

1 1

0 1

Problem 3.16. Can you gure out which matrices give rise to the pictures in the last problem, just by looking at the pictures? Assume that you known that all the entries of each matrix are integers between -2 and 2.

Problem 3.17. What are the simplest examples you can nd of 2 2 matrices A for which taking vectors x to Ax (a) contracts the plane, (b) dilates the plane, (c) dilates one direction, while contracting another, (d) rotates the plane by a right angle, (e) reects the plane in a line, (f) moves the vertical axis, but leaves every point of the horizontal axis where it is (a shear)?

3.2 Inverses
Definition 3.3. A matrix is called square if it has the same number of rows as columns.

Definition 3.4. If A is a square matrix, a square matrix B of the same size as A is called the inverse of A and written A^{-1} if AB = BA = I.

Problem 3.18. If

A = [ 2  1 ]
    [ 1  1 ],

check that

A^{-1} = [  1  -1 ]
         [ -1   2 ].

Problem 3.19. If A, B and C are square matrices, and AB = I and CA = I , prove that B = C . In particular, there is only one inverse (if there is one at all).

Problem 3.20. Which 1 × 1 matrices have inverses, and what are their inverses?

Problem 3.21. By multiplying out the matrices, prove that any 2 × 2 matrix

A = [ a  b ]
    [ c  d ]

has inverse

A^{-1} = (1/(ad - bc)) [  d  -b ]
                       [ -c   a ]

as long as ad - bc ≠ 0.

Problem 3.22. If A and B are invertible matrices, prove that AB is invertible, and (AB)^{-1} = B^{-1} A^{-1}.

Problem 3.23. Prove that (A^{-1})^{-1} = A, for any invertible square matrix A.

Problem 3.24. If A is invertible, prove that Ax = 0 only for x = 0.

Problem 3.25. If A is invertible, and AB = I , prove that A = B 1 and B = A 1 .

3.2 Review Problems


Problem 3.26. Write down a pair of nonzero 2 2 matrices A and B for which AB = 0. Problem 3.27. If A is an invertible matrix, prove that Ax = Ay just when x = y.

Problem 3.28. If a matrix M splits up into square blocks like M= A 0 B D

explain how to nd M 1 in terms of A1 and D1 . (Warning: for a matrix which splits into blocks like M= A C B D

the inverse of M cannot be expressed in any elementary way in terms of the blocks and their inverses.)

Figure 3.2: Images coming from some matrices (left hand column), and from their inverse matrices (right hand column).

Problem 3.29. Figure 3.2 shows how various matrices (on the left hand side) and their inverses (on the right hand side) affect vectors. But the two columns are scrambled up. Which right hand side picture is produced by the inverse matrix of each left hand side picture?

3.3 Permutation Matrices


Permutations
Denition 3.5. Write down the numbers 1, 2, . . . , n in any order. Call this a permutation of the numbers 1, 2, . . . , n.


Problem 3.30. Suppose that I write down a list of numbers, and (a) they are all dierent and (b) there are 25 of them and (c) all of them are integers between 1 and 25. Prove that this list is a permutation of the numbers 1, 2, . . . , 25. We can draw a picture of a permutation: the permutation 2,4,1,3 is
1 2 3 4 2 4 1 3

In general, write the numbers 1, 2, . . . , n down the left side, and the permutation down the right side, and then connect left to right, 1 to 1, 2 to 2, etc. We dont need to write out the labels, so from now on lets draw 2,4,1,3 as Problem 3.31. Which permutations are , , ?

Inverting a Permutation
Denition 3.6. Flip a picture of a permutation left to right to give the inverse permutation : to . So the inverse of 2, 4, 1, 3 is 3, 1, 4, 2. Problem 3.32. Find the inverses of a. 1,2,4,3 b. 4,3,2,1 c. 3,1,2

Multiplying
We need to write down names for permutations. Write down the numbers in a permutation as p(1), p(2), . . . , p(n), and call the permutation p. If p and q are permutations, write pq for the permutation p(q (1)), p(q (2)), . . . , p(q (n)) (the product of p and q ). So pq scrambles up numbers by rst getting q to scramble them, and then getting p to scramble them. Multiply by drawing the pictures beside each other: . p = , q = , pq = Write the inverse permutation of p as p1 . So pp1 = p1 p = 1, where 1 (the identity permutation ) means the permutation that just leaves all of the numbers where they were: 1, 2, . . . , n. Problem 3.33. Let p be 2, 3, 1, 4 and q be 1, 4, 2, 3. What is pq ?

Problem 3.34. Write down two permutations p and q for which pq ≠ qp.


Transpositions
Denition 3.7. A transposition is a permutation which swaps precisely two numbers, leaving all of the others alone. For example, is a transposition.

Problem 3.35. Prove that every permutation is a product of transpositions.

Problem 3.36. Draw each of the following permutations as a product of transpositions: (a) 3,1,2 (b) 4,3,2,1 (c) 4,1,2,3 Flipping the picture of a transposition left to right gives the same picture: a transposition is its own inverse. When a permutation is written as a product of transpositions, its inverse is written as the same transpositions, taken in reverse order. Problem 3.37. Prove that every permutation is a product of transpositions swapping successive numbers, i.e. swapping 1 with 2, or 2 with 3, etc.

Permutation Matrices
Problem 3.38. Let

P = [ 0  1 ]
    [ 1  0 ].

Prove that, for any vector x in R2, P x is x with rows 1 and 2 swapped.

Definition 3.8. The permutation matrix associated to a permutation p of the numbers 1, 2, ..., n is

P = ( e_{p(1)}  e_{p(2)}  ...  e_{p(n)} ).

Example 3.9. Let p be the permutation 3, 1, 2. The permutation matrix of p is

P = ( e3  e1  e2 ) = [ 0  1  0 ]
                     [ 0  0  1 ]
                     [ 1  0  0 ].

Problem 3.39. What is the permutation matrix P associated to the permutation 4, 2, 5, 3, 1?


Problem 3.40. What permutation p has permutation matrix 0 0 1 0 0 0 0 1 ? 0 1 0 0 0 0 0 1 Problem 3.41. What is the permutation matrix of the identity permutation? Problem 3.42. Prove that a matrix is the permutation matrix of some permutation just when a. its entries are all 0s or 1s and b. it has exactly one 1 in each column and c. it has exactly one 1 in each row.

Lemma 3.10. Let P be the permutation matrix associated to a permutation p. Then for any vector x, P x is just x with the x1 entry moved to row p(1), the x2 entry moved to row p(2), etc.

Remark 3.11. All we need to remember is that P x is x with rows permuted somehow, and that we could permute the rows of x any way we want by choosing a suitable permutation matrix P. We will never have to actually find the permutation.

Proof.

P x = P Σ_j x_j e_j = Σ_j x_j P e_j = Σ_j x_j e_{p(j)}.

Proposition 3.12. Let P be the permutation matrix associated to a permutation p. Then for any matrix A, P A is just A with row 1 moved to row p(1), row 2 moved to row p(2), etc.

Proof. The columns of P A are just P multiplied by the columns of A.

Remark 3.13. It is faster and easier to work directly with permutations than with permutation matrices. Avoid writing down permutation matrices if you can; otherwise you end up wasting time juggling huge numbers of 0's. Replace permutation matrices by permutations.
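Lemma 3.10 and Proposition 3.12 can be checked in a few lines of Python. Here a permutation is stored as the list p(1), ..., p(n), written 0-based in the code, and the permutation matrix is built column by column from Definition 3.8; the names are illustrative only.

def permutation_matrix(p):
    # Columns are e_{p(1)}, e_{p(2)}, ..., following Definition 3.8 (0-based here).
    n = len(p)
    P = [[0] * n for _ in range(n)]
    for j in range(n):
        P[p[j]][j] = 1          # column j is the standard basis vector e_{p(j)}
    return P

def apply(P, x):
    return [sum(P[i][k] * x[k] for k in range(len(x))) for i in range(len(P))]

p = [2, 0, 1]                   # the permutation 3, 1, 2 of example 3.9, written 0-based
P = permutation_matrix(p)
print(P)                        # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(apply(P, [10, 20, 30]))   # entry x1 moves to row p(1), etc.: [20, 30, 10]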


Problem 3.43. If p and q are two permutations with permutation matrices P and Q, prove that pq has permutation matrix P Q.

3.3 Review Problems


Problem 3.44. Let 0 A = 0 3 1 0 0 2 0 . 0

Which permutation p do we need to make sure that P A is in echelon form, with P the permutation matrix of p? Problem 3.45. Confusing: Prove that the permutation matrix P of a permutation p is the result of permuting the columns of the identity matrix by p, or the rows of the identity matrix by p1 .

Problem 3.46. What happens to a permutation matrix when you carry out forward elimination? back substitution? Problem 3.47. Let p be a permutation with permutation matrix P . Prove that P 1 is the permutation matrix of p1 .

4 Elimination Via Matrix Arithmetic


In this chapter, we will describe linear equations and elimination using only matrix multiplication.

4.1 Strictly Lower Triangular Matrices


Definition 4.1. A square matrix is strictly lower triangular if it has the form

S = [ 1          ]
    [ *  1       ]
    [ *  *  ...  ]
    [ *  *  *  1 ]

with 1's on the diagonal, 0's above the diagonal, and anything below.

Problem 4.1. Let S be strictly lower triangular. Must it be true that Sij = 0 for i > j? What about for j > i?

Lemma 4.2. If S is a strictly lower triangular matrix, and A any matrix, then SA is A with Sij (row j) added to row i. In particular, S adds multiples of rows to lower rows.

Proof. For x a vector,

S x = [ 1             ] [ x1 ]   [ x1                    ]
      [ S21  1        ] [ x2 ] = [ x2 + S21 x1           ]
      [ S31  S32  1   ] [ x3 ]   [ x3 + S31 x1 + S32 x2  ]
      [ ...        ...] [ .. ]   [ ...                   ]


adds S21 x1 to x2 , etc. If A is any matrix then the columns of SA are S times columns of A. Problem 4.2. Let S be a matrix so that for any matrix A (of appropriate size), SA is A with multiples of some rows added to later rows. Prove that S is strictly lower triangular.

Problem 4.3. Which 3 3 matrix S adds 5 row 2 to row 3, and 7 row 1 to row 2?

Problem 4.4. Prove that if R and S are strictly lower triangular, then RS is too.

Problem 4.5. Say that a strictly lower triangular matrix is elementary if it has only one nonzero entry below the diagonal. Prove that every strictly lower triangular matrix is a product of elementary strictly lower triangular matrices.

Lemma 4.3. Every strictly lower triangular matrix is invertible, and its inverse is also strictly lower triangular. Proof. Clearly true for 1 1 matrices. Lets consider an n n strictly lower triangular matrix S , and assume that we have already proven the result for all matrices of smaller size. Write S= 1 c 0 A

where c is a column and A is a smaller strictly lower triangular matrix. Then S 1 = which is strictly lower triangular. Denition 4.4. A matrix M is strictly upper triangular if it has ones down the diagonal zeroes everywhere below the diagonal. Problem 4.6. For each fact proven above about strictly lower triangular matrices, prove an analogue for strictly upper triangular matrices. 1 A1 c 0 A
1


Problem 4.7. Draw a picture indicating where some vectors lie in the x1 x2 plane, and where they get mapped to in the y1 y2 plane by y = Ax with 1 2 0 . 1

A=

4.2 Diagonal Matrices


A diagonal matrix is one like t1 D= t2 .. . tn (with blanks representing 0 entries). Problem 4.8. Show by calculation that 1 1 4 1 7 7 2 5 8 3 1 6 = 4 9 1 2 5
8 7

3 6 .
9 7

Problem 4.9. Prove that a diagonal matrix D is invertible just when none of its diagonal entries are zero. Find its inverse.

Lemma 4.5. If

D = [ t1             ]
    [     t2         ]
    [         ...    ]
    [             tn ],

then DA is A with row 1 scaled by t1, row 2 scaled by t2, etc.

Proof. For a vector x,

D x = [ t1             ] [ x1 ]   [ t1 x1 ]
      [     t2         ] [ x2 ] = [ t2 x2 ]
      [         ...    ] [ .. ]   [  ...  ]
      [             tn ] [ xn ]   [ tn xn ]

(just running your fingers along rows and down columns). So D scales row i by ti. For any matrix A, the columns of DA are D times columns of A.
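Lemmas 4.2 and 4.5 say that adding multiples of rows to lower rows and rescaling rows are both left multiplications. The plain-Python check below builds the two kinds of matrices explicitly (names and numbers are illustrative) and multiplies them against a sample matrix.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[0, 1, -1], [3, 7, 4], [3, 5, 8]]

# Strictly lower triangular S: add -1 times row 1 to row 3 (Lemma 4.2).
S = identity(3)
S[2][0] = -1
print(mat_mul(S, A))   # [[0, 1, -1], [3, 7, 4], [3, 4, 9]]

# Diagonal D: scale row 1 by 5 and row 3 by -2 (Lemma 4.5).
D = [[5, 0, 0], [0, 1, 0], [0, 0, -2]]
print(mat_mul(D, A))   # [[0, 5, -5], [3, 7, 4], [-6, -10, -16]]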

4.2 Review Problems


Problem 4.10. Which diagonal matrix D takes the matrix A= to the matrix DA = 1 5
4 3

3 5

4 6

Problem 4.11. Multiply a 0 0 0 b 0 0 d 0 0 c 0 0 e 0 0 0 f

Problem 4.12. Draw a picture indicating where some vectors lie in the x1 x2 plane, and where they get mapped to in the y1 y2 plane by y = Ax with each of the following matrices playing the part of A: 2 0 0 , 3 1 0 0 , 1 2 0 0 . 3


4.3 Encoding Linear Equations in Matrices


Linear equations

     x2 -  x3 = 6
3x1 + 7x2 + 4x3 = 9
3x1 + 5x2 + 8x3 = 3

can be written in matrix form as

[ 0  1  -1 ] [ x1 ]   [ 6 ]
[ 3  7   4 ] [ x2 ] = [ 9 ]
[ 3  5   8 ] [ x3 ]   [ 3 ].

Any linear equations

A11 x1 + A12 x2 + ... + A1q xq = b1
A21 x1 + A22 x2 + ... + A2q xq = b2
...
Ap1 x1 + Ap2 x2 + ... + Apq xq = bp

become

[ A11  A12  ...  A1q ] [ x1 ]   [ b1 ]
[ A21  A22  ...  A2q ] [ x2 ] = [ b2 ]
[  .    .    .    .  ] [ .. ]   [ .. ]
[ Ap1  Ap2  ...  Apq ] [ xq ]   [ bp ]

which we write as Ax = b.

Problem 4.13. Write the linear equations

x1 + 2x2 = 7
3x1 + 4x2 = 8

in matrices.

4.4 Forward Elimination Encoded in Matrix Multiplication


Forward elimination on a matrix A is carried out by multiplying on the left of A by a sequence of permutation matrices and strictly lower triangular matrices.

Example 4.6.

A = [ 0  1  -1 ]
    [ 3  7   4 ]
    [ 3  5   8 ]

Swap rows 1 and 2 (and let's write out the permutation matrix):

[ 0  1  0 ]       [ 3  7   4 ]
[ 1  0  0 ] A  =  [ 0  1  -1 ]
[ 0  0  1 ]       [ 3  5   8 ]

Add -(row 1) to (row 3):

[  1  0  0 ] [ 0  1  0 ]       [ 3   7   4 ]
[  0  1  0 ] [ 1  0  0 ] A  =  [ 0   1  -1 ]
[ -1  0  1 ] [ 0  0  1 ]       [ 0  -2   4 ]

The string of matrices in front of A just gets longer at each step. Add 2(row 2) to (row 3):

[ 1  0  0 ] [  1  0  0 ] [ 0  1  0 ]       [ 3  7   4 ]
[ 0  1  0 ] [  0  1  0 ] [ 1  0  0 ] A  =  [ 0  1  -1 ]
[ 0  2  1 ] [ -1  0  1 ] [ 0  0  1 ]       [ 0  0   2 ]

Call this U. This is the echelon form:

U = [ 1  0  0 ] [  1  0  0 ] [ 0  1  0 ]
    [ 0  1  0 ] [  0  1  0 ] [ 1  0  0 ] A.
    [ 0  2  1 ] [ -1  0  1 ] [ 0  0  1 ]

Remark 4.7. We won't write out these tedious matrices on the left side of A ever again, but it is important to see it done once.

Remark 4.8. Back substitution is similarly carried out by multiplying by strictly upper triangular and invertible diagonal matrices.

4.4 Review Problems


Problem 4.14. Let P be the 3 3 permutation matrix which swaps rows 1 and 2. What does the matrix P 99 do? Write it down.

Problem 4.15. Let S be the 3 3 strictly lower triangular matrix which adds 2 (row 1) to row 3. What does the 3 3 matrix S 101 do? Write it down.

Problem 4.16. Which 3 3 matrix adds twice the rst row to the second row when you multiply by it? Problem 4.17. Which 4 4 matrix swaps the second and fourth rows when you multiply by it?


Problem 4.18. Which 4 4 matrix doubles the second and quadruples the third rows when you multiply by it? Problem 4.19. If P is the permutation matrix of a permutation p, what is AP ? Problem 4.20. If we start with 0 A = 2 0 and end up with 2 P A = 0 0 what permutation matrix is P ? Problem 4.21. If A is a 2 2 matrix, and AP = P A for every 2 2 permutation matrix P or strictly lower triangular matrix, then prove that A = c I for some number c. Problem 4.22. If the third and fourth columns of a matrix A are equal, are they still equal after we carry out forward elimination? After back substitution? Problem 4.23. How many pivots can there be in a 3 5 matrix in echelon form? 3 5 0 4 6 1 0 3 5 1 4 6

Problem 4.24. Write down the simplest 3 5 matrices you can come up with in echelon form and for which a. The second and third variables are the only free variables. b. There are no free variables. c. There are pivots in precisely the columns 3 and 4.

Problem 4.25. Write down the simplest matrices A you can for which the number of solutions to Ax = b is
a. 1 for any b;
b. 0 for some b, and infinitely many for other b;
c. 0 for some b, and 1 for other b;
d. infinitely many for any b.

Problem 4.26. Suppose that A is a square matrix. Prove that all entries of A are positive just when, for any nonzero vector x which has no negative entries, the vector Ax has only positive entries.


Problem 4.27. Prove that short matrices kill. A matrix is called short if it is wider than it is tall. We say that a matrix A kills a vector x if x ≠ 0 but Ax = 0.

4.5 Summary
The many steps of elimination can each be encoded into a matrix multiplication. The resulting matrices can all be multiplied together to give the single equation U = V A, where A is the matrix we started with, U is the echelon matrix we end up with and V is the product of the various matrices that carry out all of our elimination steps. There is a big idea at work here: encode a possibly huge number of steps into a single algebraic equation (in this case the equation U = V A), turning a large computation into a simple piece of algebra. We will return to this idea periodically.
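The summary's equation U = V A can be demonstrated directly: carry out forward elimination on A while applying exactly the same row operations to an identity matrix, so that the identity accumulates the product V of all the step matrices. The sketch below is plain Python with illustrative names, a sketch of the idea rather than the book's notation.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def forward_eliminate_with_record(A):
    # Return (U, V) with U the echelon form and V the product of the step matrices, so U = V A.
    U = [row[:] for row in A]
    V = [[1 if i == j else 0 for j in range(len(A))] for i in range(len(A))]   # starts as I
    row = 0
    for col in range(len(A[0])):
        if row == len(U):
            break
        swap = next((r for r in range(row, len(U)) if U[r][col] != 0), None)
        if swap is None:
            continue
        U[row], U[swap] = U[swap], U[row]     # each row operation is applied to U ...
        V[row], V[swap] = V[swap], V[row]     # ... and, in lockstep, to V
        for r in range(row + 1, len(U)):
            f = U[r][col] / U[row][col]
            U[r] = [a - f * b for a, b in zip(U[r], U[row])]
            V[r] = [a - f * b for a, b in zip(V[r], V[row])]
        row += 1
    return U, V

A = [[0, 1, -1], [3, 7, 4], [3, 5, 8]]
U, V = forward_eliminate_with_record(A)
print(U)              # the echelon form from example 4.6
print(mat_mul(V, A))  # equals U entrywise: U = V A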

5 Finding the Inverse of a Matrix


Lets use elimination to calculate the inverse of a matrix.

5.1 Finding the Inverse of a Matrix By Elimination


Example 5.1. If Ax = y then multiplying both sides by A^{-1} gives x = A^{-1} y, solving for x. We can write out Ax = y as linear equations, and solve these equations for x. For example, if

A = [ 1  -2 ]
    [ 2  -3 ],

then writing out Ax = y:

x1 - 2x2 = y1
2x1 - 3x2 = y2.

Let's apply Gauss-Jordan elimination, but watch the equations instead of the matrices. Add -2(equation 1) to equation 2:

x1 - 2x2 = y1
      x2 = -2y1 + y2.

Add 2(equation 2) to equation 1:

x1 = -3y1 + 2y2
x2 = -2y1 + y2.

So

A^{-1} = [ -3  2 ]
         [ -2  1 ].

Theorem 5.2. Let A be a square matrix. Suppose that GaussJordan elimination applied to the matrix (A I ) ends up with (U V ) with U and V square matrices. A is invertible just when U = I , in which case V = A1 . 43


Example 5.3. Before the proof, let's have an example. Let's invert

A = [ 1  -2 ]
    [ 2  -3 ].

( A  I ) = [ 1  -2  1  0 ]
           [ 2  -3  0  1 ]

Add -2(row 1) to row 2.

[ 1  -2   1  0 ]
[ 0   1  -2  1 ]

Make a new pivot.

[ 1  -2   1  0 ]
[ 0   1  -2  1 ]

Add 2(row 2) to row 1.

( U  V ) = [ 1  0  -3  2 ]
           [ 0  1  -2  1 ]

Obviously these are the same steps we used in the example above; the shaded part (the right hand block) represents coefficients in front of the y vector above. Since U = I, A is invertible and

A^{-1} = V = [ -3  2 ]
             [ -2  1 ].

Proof. Gauss-Jordan elimination on ( A  I ) is carried out by multiplying by various invertible matrices (strictly lower triangular, permutation, invertible diagonal and strictly upper triangular), say like

( U  V ) = M_N M_{N-1} ... M_2 M_1 ( A  I ).

So

U = M_N M_{N-1} ... M_2 M_1 A,
V = M_N M_{N-1} ... M_2 M_1,

which we summarize as U = V A. Clearly V is a product of invertible matrices, so invertible. Thus U is invertible just when A is.

First suppose that U has pivots all down the diagonal. Every pivot is a 1. Entries above and below each pivot are 0, so U = I. Since U = V A, we find I = V A. Multiply both sides on the left by V^{-1}, to see that V^{-1} = A. But then multiply on the right by V to see that I = AV. So A and V are inverses of one another.

Next suppose that U doesn't have pivots all down the diagonal. We always start Gauss-Jordan elimination on the diagonal, so we fail to place a pivot somewhere along the diagonal just because we move right during forward elimination. That move makes a pivotless column, hence a free variable for the equation Ax = 0. Setting the free variable to a nonzero value produces a nonzero x with Ax = 0. By problem 3.24 on page 29, A is not invertible.

Problem 5.1. Find the inverse of 0 A = 1 1

0 1 2

1 0 . 1
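Theorem 5.2 is itself an algorithm: glue I onto the right of A, run Gauss-Jordan elimination, and read off the inverse if the left block became I. A compact Python sketch follows; the function name is illustrative, and it returns None when A is not invertible.

def inverse(A):
    # Gauss-Jordan elimination on (A I); return the inverse, or None if A is not invertible.
    n = len(A)
    M = [row[:] + [1 if i == j else 0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                      # a pivotless column: not invertible
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [a / M[col][col] for a in M[col]]       # scale the pivot to 1
        for r in range(n):
            if r != col and M[r][col] != 0:              # kill entries above and below
                M[r] = [a - M[r][col] * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]            # the right block is the inverse

A = [[1, -2], [2, -3]]
print(inverse(A))                            # [[-3.0, 2.0], [-2.0, 1.0]], as in example 5.3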

5.1 Review Problems


Problem 5.2. Find the inverse of 1 A = 1 1 Problem 5.3. Find the inverse of 1 A = 1 1 Problem 5.4. Find the inverse of 1 A = 0 1 3 0 . 0

2 2 0

1 1 1

1 1 . 1

1 1 1

1 3 . 3

Problem 5.5. Is there a faster method than GaussJordan elimination to nd the inverse of a permutation matrix?


5.2 Invertibility and Forward Elimination


Proposition 5.4. A square matrix U in echelon form is invertible just when U has pivots all the way down the diagonal, which occurs just when U has no zero rows. Proof. Applying back substitution to a matrix U which is already in echelon form preserves the locations of the pivots, and just rescales them to be 1, killing everything above them. So back substitution takes U to I just when U has pivots all the way down the diagonal. Example 5.5. 1 0 is invertible, while 0 0 0 is not invertible. Theorem 5.6. A square matrix A is invertible just when its echelon form U is invertible. Remark 5.7. So we can quickly decide if a matrix is invertible by forward elimination. We only need back substitution if we actually need to compute out the inverse. Proof. U = V A, and V is invertible, so U is invertible just when A is. Example 5.8. A= has echelon form U= so A is invertible. Problem 5.6. Is 0 1 invertible? 1 0 1 0 1 1 0 1 1 1 1 0 0 2 3 0 2 7


Problem 5.7. Is

0 1 1

1 0 1

0 1 1

invertible?

Problem 5.8. Prove that a square matrix A is invertible just when the only solution x to the equation Ax = 0 is x = 0.

Inversion and Solvability of Linear Equations


Theorem 5.9. Take a square matrix A. The equation Ax = b has a solution x for every vector b just when A is invertible. For each b, the solution x is then unique. On the other hand, if A is not invertible, then Ax = b has either no solution or innitely many, and both of these possibilities occur for dierent choices of b. Proof. If A is invertible, then multiplying both sides of Ax = b by A1 , we see that we have to have x = A1 b. On the other hand, suppose that A is not invertible. There is a free variable for Ax = b, so no solutions or innitely many. Lets see that for dierent choices of b both possibilities occur. Carry out forward elimination, say U = V A. Then U has a zero row, say row n. We cant solve U x = en (look at row n). So set b = V 1 en and we cant solve Ax = b. But now instead set b = 0 and we can solve Ax = 0 (for example with x = 0) and therefore solve Ax = 0 with innitely many solutions x, since there is a free variable. Example 5.10. The equations x1 + 2x2 = 9845039843453455938453 x1 2x2 = 90853809458394034464578

have a unique solution, because they are Ax = b with A= which has echelon form U= 1 0 2 4 . 1 1 2 2


Problem 5.9. Suppose that A and B are n n matrices and AB = I . Prove that A and B are both invertible, and that B = A1 and that A = B 1 .

Problem 5.10. Prove that for square matrices A and B of the same size (AB )
1

= B 1 A 1

(and if either side is dened, then the other is and they are equal).

5.2 Review Problems


Problem 5.11. Is A= invertible? 0 1 1 0

Problem 5.12. How many solutions are there to the following equations? x1 + 2x2 + 3x3 = 284905309485083 x1 + 2x2 + x3 = 92850234853408 x2 + 15x3 = 4250348503489085.

Problem 5.13. Let A be the n n matrix which has 1 in every entry on or under the diagonal, and 0 in every entry above the diagonal. Find A1 . Problem 5.14. Let A be the n n matrix which has 1 in every entry on or above the diagonal, and 0 in every entry below the diagonal. Find A1 . Problem 5.15. Give an example of a 3 3 invertible matrix A for which A and At have dierent values for their pivots.

Problem 5.16. Imagine that you start with a matrix invertible, and carry out forward elimination on (A I ). (U V ) = 0 0 0 0 2 8 0 0 0 0 0 1

A which might not be Suppose you arrive at , 3 9 5 2

5.2. Invertibility and Forward Elimination

49

with some pivots somewhere on the rst two rows of U . Fact: you can solve Ax = b just for those vectors b which solve the equations 2b1 +8b2 +3b3 +9b4 = 0 . b2 +5b3 +2b4 = 0 Explain why.

6 The Determinant
We can see whether a matrix is invertible by computing a single number, the determinant.

Problem 6.1. Use forward elimination to prove that a 2 × 2 matrix

A = [ a  b ]
    [ c  d ]

is invertible just when ad - bc ≠ 0.

For any 2 × 2 matrix

[ a  b ]
[ c  d ],

the determinant is ad - bc. For larger matrices, the determinant is complicated.

6.1 Definition

Determinants are computed as in figure 6.1 on the next page. To compute a determinant, run your finger down the first column, writing down plus and minus signs in the pattern +, -, +, -, ... in front of the entry your finger points at, and then writing down the determinant of the matrix you get by deleting the row and column where your finger lies (always the first column), and add up.

Problem 6.2. Prove that

det [ a  b ] = ad - bc.
    [ c  d ]

6.1 Review Problems


Problem 6.3. Find the determinant of 3 1 51 1 3


det [ 3  2  1 ]
    [ 1  4  5 ]  =  + (3) det [ 4  5 ]  - (1) det [ 2  1 ]  + (6) det [ 2  1 ]
    [ 6  7  2 ]               [ 7  2 ]            [ 7  2 ]            [ 4  5 ]

                 =  3 det [ 4  5 ]  -  det [ 2  1 ]  +  6 det [ 2  1 ]
                          [ 7  2 ]         [ 7  2 ]           [ 4  5 ]

Figure 6.1: Computing a 3 × 3 determinant.
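The rule of figure 6.1 (expand down the first column with alternating signs) is naturally recursive. Below is a direct Python transcription, fine for small matrices though far slower than the elimination formula of chapter 7; the names are illustrative.

def det(A):
    # Expand down the first column with signs +, -, +, -, ...
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        minor = [row[1:] for r, row in enumerate(A) if r != i]   # delete row i and column 1
        total += (-1) ** i * A[i][0] * det(minor)
    return total

print(det([[3, 2, 1], [1, 4, 5], [6, 7, 2]]))   # the matrix of figure 6.1: -42
print(det([[1, 2], [3, 4]]))                    # 1*4 - 2*3 = -2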

Problem 6.4. Find the determinant of 1 1 0 1 1 0

0 1 1

Problem 6.5. Does A2 11 appear in the expression for det A, when you expand out all of the determinants in the expression completely? Problem 6.6. Prove that the 0 0 0 0 0 0 A= 0 0 0 0 0 0 determinant of 0 0 0 0 0 0

is zero, no matter what number we put in place of the s, even if the numbers are all dierent.


Problem 6.7. Give an example of a matrix all of whose entries are positive, even though its determinant is zero. Problem 6.8. What is det I ? Justify your answer. Problem 6.9. Prove that det A 0 B C = det A det C,

for A and C any square matrices, and B any matrix of appropriate size to t in here.

6.2 Easy Determinants


Example 6.1. Let's find the determinant of

A = [ 7       ]
    [    4    ]
    [       2 ]

(There are zeros wherever entries are not written.) Running down the first column, we only hit the 7. So

det A = 7 det [ 4    ]
              [    2 ].

By the same trick, det A = (7)(4) det(2) = (7)(4)(2). Summing up:

Lemma 6.2. The determinant of a diagonal matrix

A = [ p1             ]
    [     p2         ]
    [         ...    ]
    [             pn ]

is det A = p1 p2 ... pn.

We can easily do better: recall that a matrix A is upper triangular if all entries below the diagonal are 0. By the same trick again:


Lemma 6.3. The determinant of an upper triangular square matrix

U = [ U11  U12  U13  ...  U1n ]
    [      U22  U23  ...  U2n ]
    [           U33  ...  U3n ]
    [                ...  ... ]
    [                     Unn ]

is the product of the diagonal terms: det U = U11 U22 ... Unn.

Corollary 6.4. A square matrix A is invertible just when det U ≠ 0, with U obtained from A by forward elimination.

Proof. The matrix U is upper triangular. The fact that det U ≠ 0 says just precisely that all diagonal entries of U are not zero, so are pivots: a pivot in every column. Apply theorem 5.6 on page 46.

6.2 Review Problems


Problem 6.10. Find 1 det 0 0 2 4 0 3 5 . 6

Problem 6.11. Suppose that U is an invertible upper triangular matrix. a. Prove that U 1 is upper triangular. b. Prove that the diagonal entries of U 1 are the reciprocals of the diagonal entries of U . c. How can you calculate by induction the entries of U 1 in terms of the entries of U ?

Problem 6.12. Let U be any upper triangular matrix with integer entries. Prove that U^{-1} has integer entries just when det U = ±1.

6.3 Tricks to Find Determinants


Lemma 6.5. Swapping any two neighbouring rows of a square matrix changes the sign of the determinant. For example,

det [ 1  2 ] = - det [ 3  4 ]
    [ 3  4 ]         [ 1  2 ].


Proof. It is obvious for 1 × 1 (you can't swap anything). It is easy to check for a 2 × 2. Picture a 3 × 3 matrix A (like example 6.1 on page 52). For simplicity, let's swap rows 1 and 2. Then the plus sign of row 1 and the minus sign of row 2 are clearly switched in the 1st and 2nd terms in the determinant. In the 3rd term, the leading plus sign is not switched. Look at the determinant in the 3rd term: rows 1 and 2 don't get crossed out, and have been switched, so the determinant factor changes sign. So all terms in the determinant formula have changed sign, and therefore the determinant has changed sign. The argument goes through identically with any size of matrix (by induction) and any two neighboring rows, instead of just rows 1 and 2.

Lemma 6.6. Swapping any two rows of a square matrix changes the sign of the determinant, so det P A = - det A for P the permutation matrix of a transposition.

Proof. Suppose that we want to swap two rows, not neighboring. For concreteness, imagine rows 1 and 4. Swapping the first with the second, then second with third, etc., a total of 3 swaps will drive row 1 into place in row 4, and drives the old row 4 into row 3. Two more swaps (of row 3 with row 2, row 2 with row 1) puts everything where we want it. More generally, to swap two rows, start by swapping the higher of the two with the row immediately under it, repeatedly until it fits into place. Some number s of swaps will do the trick. Now the row which was the lower of the two has become the higher of the two, and we have to swap it s - 1 swaps into place. So 2s - 1 swaps in all, an odd number.

Problem 6.13. If a square matrix has two rows the same, prove that it has determinant 0.

Problem 6.14. Find 2 × 2 matrices A and B for which

det(A + B) ≠ det A + det B.

So det doesn't behave well under adding matrices. But it does behave well under adding rows of matrices.

Example 6.7. Watch each row:

det [ 1+5  2+6 ] = det [ 1  2 ] + det [ 5  6 ]
    [  3    4  ]       [ 3  4 ]       [ 3  4 ].

Theorem 6.8. The determinant of any square matrix scales when you scale across any row, like

det [ 7·1  7·2 ] = 7 det [ 1  2 ]
    [  3    4  ]         [ 3  4 ],

or when you scale down any column, like

det [ 7·1  2 ] = 7 det [ 1  2 ]
    [ 7·3  4 ]         [ 3  4 ].

It adds when you add across any row, like

det [ 1+5  2+6 ] = det [ 1  2 ] + det [ 5  6 ]
    [  3    4  ]       [ 3  4 ]       [ 3  4 ],

or when you add down any column, like

det [ 1  2+5 ] = det [ 1  2 ] + det [ 1  5 ]
    [ 3  4+6 ]       [ 3  4 ]       [ 3  6 ].

Proof. To compute a determinant, you pick an entry from the rst column, and then delete its row and column. You then multiply it by the determinant of what is left over, which is computed by picking out an entry from the second column, not from the same row, etc. If we ignore for a moment the plus and minus signs, we can see the pattern emerging: you just pick something from the rst column, and cross out its row and column,

and then something from the second column, and cross out its row and column, , , ...,

and so on. Finally, you have picked one entry from each column, all from dierent rows. In our example, we picked A31 , A52 , A23 , A14 , A45 . Multiply these together, and you get just one term from the determinant: A31 A52 A23 A14 A45 . Your term has exactly one entry from the rst column, and then you crossed out the rst column and moved on. Suppose that you double all of the entries in the rst column. Your term contains exactly one entry from that column, A31 in our example, so your term doubles. Adding up the terms, the determinant doubles. The determinant is the sum over all choices you could make of rows to pick at each step; and of course, there are some plus and minus signs which we are


still ignoring. For example, with this kind of picture, a 2 × 2 determinant looks like det A = A11 A22 - A21 A12.

In the same way, scaling any column, you scale your entry from that column, so you scale your term. You scale all of the terms, so you scale the determinant. When you cobbled together your term, you picked out an entry from some row, and then crossed out that row. So you didnt use the same row twice. There are as many rows as columns, and you picked an entry in each column, so you have picked as many entries as there are rows, never using the same row twice. So you must have picked out exactly one entry from each row. In our example term above, we see this clearly: the rows used were 3, 5, 2, 1, 4. By the same argument as for columns, if you scale row 2, you must scale the entry A23 , any only that entry, so you scale the term. Adding up all possible terms, you scale the determinant. Lets see why we can add across rows. If I try to add entries across the rst row, a single term looks like = (4 + 9) (. . . ) where the (. . . ) indicates all of the other factors from the lower rows, which we will leave unspecied, = 4 ( . . . ) + 9 (. . . ) 1 2 3 = 1+6 2+7 3+8 4+9 5 + 10

10

since we keep all of the entries in the lower rows exactly the same in each matrix. This shows that each term adds when you add across a single row, so the sum of the terms, the determinant, must add. This reasoning works for any size of matrix in the same way. Moreover, it works for columns just in the same way as for rows. Problem 6.15. What happens to the determinant if I double the rst row and then triple the second row?


Proposition 6.9. Suppose that S is the strictly upper or strictly lower triangular matrix which adds a multiple of one row to another row. Then det SA = det A, i.e. we can add a multiple of any row to any other row without affecting the determinant.

Proof. We can always swap rows as needed, to get the rows involved to be the first and second rows. Then swap back again. This just changes signs somehow, and then changes them back again. So we need only work with the first and second rows. For simplicity, picture a 3 × 3 matrix as 3 rows:

A = [ a1 ]
    [ a2 ]
    [ a3 ].

Adding s (row 1) to (row 2) gives

[ a1        ]
[ a2 + s a1 ]
[ a3        ]

which has determinant

det [ a1        ]       [ a1 ]         [ a1 ]
    [ a2 + s a1 ]  = det [ a2 ]  + s det [ a1 ]
    [ a3        ]       [ a3 ]         [ a3 ]

by the last lemma. The second determinant vanishes because it has two identical rows. The general case is just the same with more notation: we stu more rows around the three rows we had above. Problem 6.16. Which property of the determinant is illustrated in each of these examples? (a) 10 5 5 2 1 1 1 0 2 = 5 1 0 2 1 0 1 1 0 1 (b) 1 1 3 2 0 0 3 1 2 = 3 2 1 2 0 0 3 2 2
(c) 1 4 2 1 0 2 1 2 2 = 4 0 1 1 0 0 2 2 3

7 The Determinant via Elimination


The fast way to compute the determinant of a large matrix is via elimination. The fast formula for the determinant:

Theorem 7.1. Via forward elimination,

det A = +/- (product of the pivots)  if there is a pivot in each column,
det A = 0                            otherwise,

where the sign is + if we make an even number of row swaps during forward elimination, and - otherwise.

Example 7.2. Forward elimination takes

A =
  0  7
  2  3
to
U =
  2  3
  0  7

with one row swap, so det A = -(2)(7) = -14.

Remark 7.3. The fast formula isn't actually any faster for small matrices, so for a 2 x 2 or 3 x 3 you wouldn't use it. But we need the fast formula anyway; each of the two formulas gives different insight.

Proof. We can see how the determinant changes during elimination: adding multiples of rows to other rows does nothing, swapping rows changes sign.

Problem 7.1. Use the fast formula to find the determinant of

A =
  2  5  5
  2  5  7
  2  6  11
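The recipe of theorem 7.1 is easy to hand to a computer. Here is a small sketch in Python (with numpy); the function name det_by_elimination is our own, not anything from the text, and it is only meant to illustrate the idea of tracking row swaps.

    import numpy as np

    def det_by_elimination(A):
        # Forward eliminate a copy of A, tracking the sign coming from row swaps.
        U = np.array(A, dtype=float)
        n = U.shape[0]
        sign = 1.0
        for j in range(n):
            # Find a row at or below row j with a nonzero entry in column j.
            p = j + np.argmax(np.abs(U[j:, j]))
            if abs(U[p, j]) < 1e-12:
                return 0.0                # no pivot in this column
            if p != j:
                U[[j, p]] = U[[p, j]]     # row swap: the determinant changes sign
                sign = -sign
            for i in range(j + 1, n):
                U[i] -= (U[i, j] / U[j, j]) * U[j]   # doesn't change the determinant
        return sign * np.prod(np.diag(U))

    A = np.array([[0., 7.], [2., 3.]])
    print(det_by_elimination(A), np.linalg.det(A))   # both give -14.0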
Problem 7.2. Just by looking, find

det
  1001  1002  1003  1004
  2002  2004  2006  2008
  2343  6787  1938  4509
  9873  7435  2938  9038
Problem 7.3. Prove that a square matrix is invertible just when its determinant is not zero.

7.0 Review Problems


Problem 7.4. Find the determinant of

  1  0  1  1
  0  1  1  0
  1  0  0  1
  0  1  1  0

Problem 7.5. Find the determinant of

  0  1  2  0
  2  1  0  0
  1  2  0  1
  1  0  1  1

Problem 7.6. Find the determinant of

  2  1  1
  1  2  0
  1  1  1

Problem 7.7. Find the determinant of

  0  0  0
  2  2  1
  0  2  1
Problem 7.8. Find the determinant of

  2  0  1
  1  0  2
  2  2  1

Problem 7.9. Find the determinant of

  0  1  1
  1  0  2
  0  1  0

Problem 7.10. Prove that a square matrix with a zero row has determinant 0.

Problem 7.11. Prove that det P A = (-1)^N det A if P is the permutation matrix of a product of N transpositions.

Problem 7.12. Use the fast formula to find the determinant of

A =
  0  2  1
  3  1  2
  3  5  2

Problem 7.13. Prove that the determinant of any lower triangular square matrix

L =
  L11
  L21   L22
  L31   L32   L33
  L41   L42   L43   ...
  ...
  Ln1   Ln2   Ln3   ...   Ln(n-1)   Lnn

(with zeroes above the diagonal) is the product of the diagonal terms: det L = L11 L22 . . . Lnn.

7.1 Determinants Multiply


Theorem 7.4. det (AB ) = det A det B. (7.1)

Proof. Suppose that det A = 0. By the fast formula, A is not invertible. Problem 5.10 on page 48 tells us that therefore AB is not invertible, and both sides of equation 7.1 are 0. So we can safely suppose that det A is not 0. Via Gauss-Jordan elimination, any invertible matrix is a product of matrices each of which adds a multiple of one row to another, or scales a row, or swaps two rows. Write A as a product of such matrices, and peel off one factor at a time, applying lemma 6.5 on page 54 and proposition 6.9 on page 58. Example 7.5. If 1 A = 0 0 4 2 0 1 6 , B = 2 5 7 3 0 2 5 0 0 , 4

then it is hard to compute out AB , and then compute out det AB . But det AB = det A det B = (1)(2)(3)(1)(2)(4) = 48.
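To see the theorem at work on a computer, here is a small Python (numpy) sketch. The two triangular matrices below are our own, chosen for illustration; they are not the matrices of example 7.5.

    import numpy as np

    A = np.array([[1., 4., 1.],
                  [0., 2., 6.],
                  [0., 0., 3.]])
    B = np.array([[2., 0., 0.],
                  [5., 2., 0.],
                  [7., 5., 4.]])

    # det A and det B are just products of diagonals (6 and 16),
    # so det AB is known without ever computing AB.
    print(np.prod(np.diag(A)) * np.prod(np.diag(B)))   # 96.0
    print(np.linalg.det(A @ B))                        # 96.0 (up to round-off)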

7.2 Transpose
Definition 7.6. The transpose of a matrix A is the matrix At whose entries are (At)ij = Aji (switching rows with columns).

Example 7.7. Flip over the diagonal:

A =
  10   2
   3  40
   5   6

At =
  10   3   5
   2  40   6

Problem 7.14. Find the transpose of

A =
  1  2  3
  4  5  6
  0  0  0

Problem 7.15. Prove that (AB)t = Bt At. (The transpose of the product is the product of the transposes, in the reverse order.)
Problem 7.16. Prove that the transpose of any permutation matrix is a permutation matrix. How is the permutation of the transpose related to the original permutation?

Corollary 7.8. det A = det At

Proof. Forward elimination gives U = V A, with U upper triangular and V a product of permutation and strictly lower triangular matrices. Transpose: Ut = At Vt. But Vt is a product of permutation and strictly upper triangular matrices, with the same number of row swaps as V, so det Vt = det V = +/-1. The matrix Ut is lower triangular, so det Ut is the product of the diagonal entries of Ut (by problem 7.13 on page 63), which are the diagonal entries of U, so det Ut = det U.

7.3 Expanding Down Any Column or Across Any Row


Consider the checkerboard pattern

  +  -  +  -  ...
  -  +  -  +  ...
  +  -  +  -  ...
  .  .  .  .

Theorem 7.9. We can compute the determinant of any square matrix A by picking any column (or any row) of A, writing down plus and minus signs from the same column (or row) of the checkerboard matrix, writing down the entries of A from that column (or row), multiplying each of these entries by the determinant obtained from deleting the row and column of that entry, and adding all of these up.

Example 7.10. For

A =
  3  2  1
  1  4  5
  6  7  2

if we expand along the second row, deleting the second row and the column of each entry in turn, we get

det A = -(1) det
  2  1
  7  2
+ (4) det
  3  1
  6  2
- (5) det
  3  2
  6  7

Proof. By swapping columns (or rows), we change signs of the determinant. Swap columns (or rows) to get the required column (or row) to slide over to become the first column (or row). Take the sign changes into account with the checkerboard pattern: changing all plus and minus signs for each swap.

Problem 7.17. Use this to calculate the determinant of

A =
  1  4  0  839
  2  0  0  1702
  0  0  0  1
  1  3  2  493
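The expansion of theorem 7.9 can also be written as a short recursive program. Here is a Python (numpy) sketch; the function name det_expand is ours, and the 3 x 3 example is just for checking against numpy's built-in determinant.

    import numpy as np

    def det_expand(A, col=0):
        # Expand down a chosen column, with checkerboard signs.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for row in range(n):
            sign = (-1) ** (row + col)
            minor = np.delete(np.delete(A, row, axis=0), col, axis=1)
            total += sign * A[row, col] * det_expand(minor)
        return total

    A = np.array([[3, 2, 1],
                  [1, 4, 5],
                  [6, 7, 2]])
    print(det_expand(A), np.linalg.det(A))   # -42.0 for both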

7.4 Summary
Determinants
(a) scale when you scale across a row (or down a column),
(b) add when you add across a row (or down a column),
(c) switch sign when you swap two rows (or when you swap two columns),
(d) don't change when you add a multiple of one row to another row (or a multiple of one column to another column),
(e) don't change when you transpose,
(f) multiply when you multiply matrices.
The determinant of
(a) an upper (or lower) triangular matrix is the product of the diagonal entries.
(b) a permutation matrix is (-1)^(# of transpositions).
(c) a matrix is not zero just when the matrix is invertible.
(d) any matrix is det A = (-1)^N det U, if A is taken by forward elimination with N row swaps to a matrix U.

Problem 7.18. If A is a square matrix, prove that det A^k = (det A)^k for k = 1, 2, 3, . . . .

Problem 7.19. Use this last exercise to find det A^2222444466668888 where

A =
  0  1
  1  1234567890

Problem 7.20. If A is invertible, prove that det A⁻¹ = 1 / det A.

7.4 Review Problems


Problem 7.21. What are all of the dierent ways you know to calculate determinants?

Problem 7.22. How many solutions are there to the following equations? x1 + 1010x2 + 130923x3 = 2839040283 2x2 + 23932x3 = 2390843248 3x3 = 98234092384

Problem 7.23. Prove that no matter which entry of an n n matrix you pick (n > 1), you can nd some invertible n n matrix for which that entry is zero.

Bases and Subspaces
8 Span
We want to think not only about vectors, but also about lines and planes. We will nd a convenient language in which to describe lines and planes and similar objects.

8.1 The Problem


Look at a very simple linear equation: x1 + 2x2 + x3 = 0. (8.1) The solutions of an equation forming a plane.

There are many solutions. Each is a point in R3, and together they draw out a plane. But how do we write down this plane? The picture is useless: we can't see for sure which vectors live on it. We need a clear method to write down planes, lines, and similar things, so that we can communicate about them (e.g. over the telephone or to a computer). One method to describe a plane is to write down an equation, like x1 + 2x2 + x3 = 0, cutting out the plane. But there is another method, which we will often prefer, building up the plane out of vectors.

8.2 Span
Example 8.1. Consider the equations

x1 + 2x2 - 7x4 = 0
x3 + x4 = 0

Solutions have

x1 = -2x2 + 7x4
x3 = -x4,

giving

x =
  x1        -2x2 + 7x4        -2        7
  x2    =   x2          = x2   1  + x4   0
  x3        -x4                0        -1
  x4        x4                 0         1

But x2 and x4 are free: they can be anything. The solutions are just arbitrary combinations of

  -2        7
   1        0
   0  and  -1
   0        1

We can just remember these two vectors, to describe all of the solutions.

Definition 8.2. A multiple of a vector v is a vector cv where c is a number. A linear combination of some vectors v1, v2, . . . , vp in Rn is a vector v = c1 v1 + c2 v2 + . . . + cp vp, for some numbers c1, c2, . . . , cp (a sum of multiples). The span of some vectors is the collection of all of their linear combinations.

8.3 The Solution


We can describe the plane of solutions of equation 8.1 on the previous page: it is the span of the vectors

  -2       0
   1  ,    1
   0      -2

That isn't obvious. (You can apply forward elimination to check that it is correct.) But immediately we see our next problem: you might describe it as this span, and I might describe it as the span of

  -2      -1
   1  ,    0
   0       1

How do we see that these are the same thing?

8.4 How to Tell if a Vector Lies in a Span


If we have some vectors, let's say

x1 =
  1
  2
  3
, x2 =
   1
   0
  -1

how do we tell if another vector lies in their span? Let's ask if

y =
  1
  4
  7

lies in the span of x1 and x2. So we are asking if y is a linear combination c1 x1 + c2 x2.

Example 8.3. Solving the linear equations

c1 + c2 = 1
2c1     = 4
3c1 - c2 = 7

just means finding numbers c1 and c2 for which

c1
  1
  2
  3
+ c2
   1
   0
  -1
=
  1
  4
  7
,

writing y as a linear combination of x1 and x2. Solving linear equations is exactly the same problem as asking whether one vector is a linear combination of some other vectors.

Problem 8.1. Write down some linear equations, so that solving them is the same problem as asking whether

   1
   0
  -2

is a linear combination of

  1     2     1
  0  ,  1  ,  1
  1     1     0

Definition 8.4. A pivot column of a matrix A is a column in which a pivot appears when we forward eliminate A.

Example 8.5. The matrix

A =
  0  0  1
  1  1  1

has echelon form

U =
  1  1  1
  0  0  1

so columns 1 and 3 of A are pivot columns.

Lemma 8.6. Write some vectors into the columns of a matrix, say

A = ( x1 x2 . . . xp y )

and apply forward elimination. Then y lies in the span of x1, x2, . . . , xp just when y is not a pivot column.

Proof. As in example 8.3 on the preceding page, the problem is precisely whether we can solve the linear equations whose matrix is A, with y the column of constants. We already know that linear equations have solutions just when the column of constants is not a pivot column.

Applied to our example, this gives

A = ( x1 x2 y ) =
  1   1   1
  2   0   4
  3  -1   7

to which we apply forward elimination:

  1   1   1
  2   0   4
  3  -1   7

Add -2(row 1) to row 2, and -3(row 1) to row 3.

  1   1  1
  0  -2  2
  0  -4  4

Add -2(row 2) to row 3.

  1   1  1
  0  -2  2
  0   0  0

There is no pivot in the last column, so y is a linear combination of x1 and x2, i.e. lies in their span. (In fact, in the echelon form, we see that the last column is twice the first column minus the second column. So this must hold in the original matrix too: y = 2 x1 - x2.)

Problem 8.2. What if we have a lot of vectors y to test? Prove that vectors y1, y2, . . . , yq all lie in the span of vectors x1, x2, . . . , xp just when the matrix

( x1 x2 . . . xp y1 y2 . . . yq )

has no pivots in the last q columns.
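Lemma 8.6 turns "does y lie in the span?" into a question about pivots, which a computer can settle by comparing ranks. Here is a Python (numpy) sketch; the helper name in_span is ours, and the vectors are the ones from the example above.

    import numpy as np

    def in_span(y, *xs):
        # y lies in the span of x1, ..., xp exactly when appending y as an extra
        # column does not create a new pivot, i.e. does not raise the rank.
        X = np.column_stack(xs)
        return np.linalg.matrix_rank(np.column_stack([X, y])) == np.linalg.matrix_rank(X)

    x1 = np.array([1., 2., 3.])
    x2 = np.array([1., 0., -1.])
    y = np.array([1., 4., 7.])
    print(in_span(y, x1, x2))   # True, since y = 2 x1 - x2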

8.4 Review Problems


Problem 8.3. Is the span of the vectors 1 1 1 , 1 0 1 the same as the span of the vectors 0 4 , 4 2? 2 3 Problem 8.4. Describe the span of the vectors 1 1 1 0 , 0 , 1 . 1 0 0

76

Span

Problem 8.5. Does the vector 1 0 1 lie in the span of the vectors

1 2 0 1 , 1 , 1? 1 2 0

Problem 8.6. Does the vector

0 2 0

lie in the span of the vectors 4 2 2 , , 0 1 0? 0 1 0 Problem 8.7. Does the vector 1 0 1 lie in the span of the vectors

0 2 1 1 , 0 , 1 ? 1 1 0 0 3 6

Problem 8.8. Does the vector

lie in the span of the vectors

1 1 3 1 , 2 , 0 ? 0 6 6

8.5. Subspaces

77

Problem 8.9. Find a linear equation satised on the span of the vectors 1 2 1 , 0 1 1

8.5 Subspaces
Picture a straight line through the origin, or a plane through the origin. We generalize this picture: Denition 8.7. A subspace P of Rn is a collection of vectors in Rn so that a. P is not empty (i.e. some vector belongs to the collection P ) b. If x belongs to P , then ax does too, for any number a. c. If x and y belong to P , then x + y does too. We can see in pictures that a plane through the origin is a subspace:

My plane is not empty: the origin lies in my plane

Scale a vector from my plane: it stays in that plane

Add vectors from my plane: the sum also lies in my plane

Problem 8.10. Prove that 0 belongs to every subspace. Problem 8.11. Prove that if a subspace contains some vectors, then it contains their span. Intuitively, a subspace is a at object, like a line or a plane, passing through the origin 0 of Rn . Example 8.8. The set P of vectors x= x1 x2

for which x1 + 2x2 = 0 is a subspace of R2 , because a. x = 0 satises x1 + 2x2 = 0 (so P is not empty). b. If x satises x1 + 2x2 = 0, then ax satises (ax)1 + 2(ax)2 = a x1 + 2a x2 = a (x1 + 2 x2 ) = 0.

78 c. If x and y are points of P , satisfying x1 + 2 x2 = 0 y1 + 2y2 = 0 then x + y satises (x1 + y1 ) + 2 (x2 + y2 ) = (x1 + 2x2 ) + (y1 + 2y2 ) = 0. Problem 8.12. Is the set S of all points x= of the plane with x2 = 1 a subspace? x1 x2

Span

Problem 8.13. Is the set P of all points x1 x = x2 x3 with x1 + x2 + x3 = 0 a subspace? The word subspace really means just the same as the span of some vectors, as we will eventually see. Proposition 8.9. The span of a set of vectors is a subspace; in fact, it is the smallest subspace containing those vectors. Conversely, every subspace is a span: the span of all of the vectors inside it. Remark 8.10. In order to make this proposition true, we have to change our denitions just a little: if we have an empty collection of vectors (i.e. we dont have any vectors at all), then we will declare that the span of that empty collection is the origin. Remark 8.11. If we have an innite collection of vectors, then their span just means the collection of all linear combinations we can build up from all possible choices we can make of any nite number of vectors from our collection. We dont allow innite sums. We would really like to avoid using spans of innite sets of vectors; we will address this problem in chapter 9. Proof. Given any set of vectors X in Rn , let U be their span. So any vector in U is a linear combination of vectors from X . Scaling any linear combination yields another linear combination, and adding two linear combinations yields a further linear combination, so U is a subspace. If W is any other subspace

8.5. Subspaces

79

containing X , then we can add and scale vectors from W , yielding more vectors from W , so we can make linear combinations of any vectors from W making more vectors from W . Therefore W contains the span of X , i.e. contains U . Finally, if V is any subspace, then we can add and scale vectors from V to make more vectors from V , so V is the span of all vectors in V . Problem 8.14. Prove that every subspace is the span of the vectors that it contains. (Warning: this fact isnt very helpful, because any subspace will either contain only the origin, or contain innitely many vectors. We would really rather only think about spans of nitely many vectors. So we will have to reconsider this problem later.)

Problem 8.15. What are the subspaces of R ?

Problem 8.16. If U and V are subspaces of Rn : a. Let W be the set of vectors which either belong to U or belong to V . Is W a subspace? b. Let Z be the set of vectors which belong to U and to V . Is Z a subspace?

8.5 Review Problems


Problem 8.17. Is the set X of all points x= of the plane with x2 = x2 1 a subspace? x1 x2

Problem 8.18. a. The set of b. The set of c. The set of d. The set of

Which of the following are subspaces of R4 ? points x for which x1 x4 = x2 x3 . points x for which 2x1 = 3x2 . points x for which x1 + x2 + x3 + x4 = 0. points x for which x1 , x2 , x3 and x4 are all 0.

Problem 8.19. Is a circle in the plane a subspace? Prove your answer. Draw pictures to explain your answer. Problem 8.20. Which lines in the plane are subspaces? Draw pictures to explain your answer.

80

Span

8.6 Summary
We have solved the problem of this chapter: to describe a subspace. You write down a set of vectors spanning it. If I write down a dierent set of vectors, you can check to see if mine are linear combinations of yours, and if yours are linear combinations of mine, so you know when yours and mine span the same subspace.

9 Bases
Our goal in this book is to greatly simplify equations in many variables by changing to new variables. In linear algebra, the concept of changing variables is replaced with the more concrete concept of a basis.

9.1 Denition
A basis is a list of just enough vectors to span a subspace. For example, we should be able to span a line by writing down just one vector lying in it, a plane with just two vectors, etc. Denition 9.1. A linear relation among some vectors x1 , x2 , . . . , xp in Rn is an equation c1 x1 + c2 x2 + + cp xp = 0, where c1 , c2 , . . . , cp are not all zero. A set of vectors is linearly independent if the vectors admit no linear relation. A set of vectors is a basis of Rn if (1) the vectors are linearly independent and (2) adding any other vector into the set would render them no longer linearly independent. Example 9.2. The vectors x1 = 1 , x2 = 2 2 4 The vector sticking up is linearly independent of the other two vectors.

Only this plane contains 0 and these two vectors. Threelegged tables dont wobble, unless all of the feet of the table legs lie on the same straight line.

satisfy the linear relation 2x1 x2 = 0.

9.2 Properties
Lemma 9.3. The columns of a matrix are linearly independent just when each one is a pivot column. Proof. Obvious from lemma 8.6 on page 74. Problem 9.1. Is 0 , 1 a basis of R2 ? 81 1 1

82

Bases

Problem 9.2. The standard basis of Rn is the basis e1 , e2 , . . . , en (where e1 is the rst column of In , etc.). Prove that the standard basis of Rn is a basis.

Problem 9.3. Prove that there is a linear relation between some vectors w1, w2, . . . , wq just when one of those vectors, say wk, is a linear combination of earlier vectors w1, w2, . . . , wk-1.

Theorem 9.4. Every linearly independent set of vectors in Rn consists in at most n vectors, and consists in exactly n vectors just when it is a basis.

Proof. Suppose that x1, x2, . . . , xp are linearly independent. Let A = ( x1 x2 . . . xp ).

There is either one pivot or no pivot in each row. So the number of rows is at least as large as the number of pivots. There are n rows. There is one pivot in each column, so p pivots. So p is at most n. If p = n, we have one pivot in each row, so adding another vector (another column) can't add another pivot. Therefore adding any other vector to the vectors x1, x2, . . . , xp would break linear independence. If p < n, then we have zero rows after forward elimination. Suppose that forward elimination yields U = V A. Then ( U ep+1 ) has more pivot columns than U has, so ( A V⁻¹ ep+1 ) has more pivot columns than A has. Thus adding a new vector xp+1 = V⁻¹ ep+1 to the collection of vectors x1, x2, . . . , xp, we have a larger linearly independent collection.

Problem 9.4. Prove that every linearly independent set of vectors in Rn belongs to a basis.
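The pivot count in this proof is exactly what a computer checks when testing linear independence. A small Python (numpy) sketch, with our own helper name linearly_independent:

    import numpy as np

    def linearly_independent(*vectors):
        # Vectors are linearly independent exactly when the matrix having them
        # as columns has a pivot in every column, i.e. full column rank.
        A = np.column_stack(vectors)
        return np.linalg.matrix_rank(A) == A.shape[1]

    print(linearly_independent(np.array([1., 0., 0.]),
                               np.array([1., 1., 0.])))    # True
    print(linearly_independent(np.array([1., 2.]),
                               np.array([2., 4.])))        # False: second = 2 * first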

Lemma 9.5. A set of vectors u1 , u2 , . . . , un is a basis of Rn just when every vector b in Rn can be written as a linear combination b = a1 u1 + a2 u2 + + an un , for a unique choice of numbers a1 , a2 , . . . , an . Proof. Let A = u1 u2 a1 a2 , a= . . . an ... un ,

and apply theorem 5.9 on page 47 to the equation Aa = b.

9.3. The Change of Basis Matrix

83

9.2 Review Problems


Problem 9.5. Are the vectors 1 0 1 1 , 1 , 0 0 1 1 linearly independent?

Problem 9.6. Are the vectors 2 , 1 a basis? 1 2

Problem 9.7. Can you nd matrices A and B so that A is 3 5 and B is 5 3, and AB = 1? Problem 9.8. Suppose that A is 3 5 and B is 5 3, and that AB is invertible. Must the columns of B be linearly independent? the rows of B ? the columns of A? the rows of A? Problem 9.9. Give an example of a 3 3 matrix for which any two columns are linearly independent, but the three columns together are not linearly independent. Can such a matrix be invertible?

9.3 The Change of Basis Matrix


Denition 9.6. The change of basis matrix F associated to a basis u1 , u2 , . . . , un of Rn is the matrix F = u1 u2 ... un .

Note that F e1 = u1 , F e2 = u2 , . . . , F en = un . So taking x to F x is a change of basis, taking the standard basis to the new basis. Problem 9.10. Prove that an n n matrix A is the change of basis matrix of a basis just when the equation Ax = 0 has x = 0 as its only solution, which occurs just when A is invertible. Suppose that you and I look at the sky and watch a falling star. You measure its position against the xed choice of basis e1 , e2 , e3 , while I measure against

84

Bases

some funny choice of basis u1, u2, u3. The actual position is some vector p in R3. Let's say

p = x1 e1 + x2 e2 + x3 e3 as you measure it,
  = y1 u1 + y2 u2 + y3 u3 as I measure it.

Let F = ( u1 u2 u3 ) be the change of basis matrix, so that F e1 = u1, etc. So F takes your basis to mine. If we let

x =
  x1
  x2
  x3
and y =
  y1
  y2
  y3

then in your basis:

p = x1 e1 + x2 e2 + x3 e3 = x

but in mine:

p = y1 u1 + y2 u2 + y3 u3
  = y1 F e1 + y2 F e2 + y3 F e3
  = F (y1 e1 + y2 e2 + y3 e3)
  = F y.

So x = F y converts my measurements to yours.

Remark 9.7. Suppose that we change variables by x = F y and so y = F⁻¹ x, with F some invertible matrix. Then any matrix A acting on the x variables by taking x to Ax is represented in the y variables as the matrix F⁻¹AF: first F turns ys into xs, then A acts on the xs, then F⁻¹ turns the xs back into ys.

Problem 9.11. Take

1 F = 0 0

0 1 0

1 1 0 , A = 0 1 0

0 2 0

0 0 . 2

Compute F⁻¹AF.
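Computing F⁻¹AF is a one-line job on a computer. Here is a Python (numpy) sketch using a hypothetical 2 x 2 basis and matrix of our own (not the ones of problem 9.11), chosen so the answer comes out diagonal.

    import numpy as np

    # A hypothetical basis u1, u2 of R^2; F has the basis vectors as its columns.
    u1 = np.array([1., 1.])
    u2 = np.array([1., -1.])
    F = np.column_stack([u1, u2])
    A = np.array([[0., 1.],
                  [1., 0.]])

    # The same transformation written in the new coordinates y = F^{-1} x.
    B = np.linalg.inv(F) @ A @ F
    print(B)   # [[1, 0], [0, -1]], since u1 and u2 happen to be eigenvectors of A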

9.3. The Change of Basis Matrix

85

Problem 9.12. A shower of falling stars fall to Earth. Each star falls from a position x1 x = x2 x3 to a position on the ground x1 Ax = x2 . 0 What is the matrix A? Suppose that I measure the positions of the stars against the basis 1 2 0 u1 = 0 , u2 = 1 , u3 = 2 . 0 0 1 Find the change of basis matrix F , and nd F 1 AF , the matrix that describes how each star falls from the sky as measured against my basis.

9.3 Review Problems


Problem 9.13. Is 1 0 1 , , 2 2 1 0 0 1

a basis of R3 ? Problem 9.14. If A is a matrix, show how each vector which A kills determines a linear relation between the columns of A, and vice versa.

Problem 9.15. Are 1 , 0 linearly independent? 0 0

Problem 9.16. Write down a basis of R2 other than the standard basis, and prove that your basis really is a basis.

86 Problem 9.17. Is 1 , 1 a basis of R2 ? 2 1

Bases

Problem 9.18. Is 0 1 1 1

a change of basis matrix? If so, for what basis? Problem 9.19. If x1 , x2 , . . . , xn and y1 , y2 , . . . , yn are two bases of Rn , prove that there is a unique invertible matrix A so that A x1 = y1 , A x2 = y2 , etc.

9.4 Bases of Subspaces


We can write down a subspace, by writing down a spanning set of vectors. But you might write down more vectors than you need to. We want to squeeze the description down to the bare simplest minimum, throwing out redundant information. Denition 9.8. If V is a subspace of Rn , a basis of V is a set of linearly independent vectors from V , so that adding any other vector from V into the set would render them no longer linearly independent. Example 9.9. The vectors 1 0 , 0 1 0 0 are a basis for the subspace V in R3 of vectors of the form x1 x2 . 0 Obviously: Lemma 9.10. If some vectors span a subspace, then putting them into the columns of a matrix, the pivot columns form a basis of the subspace. Example 9.11. Lets nd a basis for 1 1 , 1 the span of the vectors 1 0 1 , 0 . 0 1

9.5. Dimension

87

Put them into a matrix 1 1 1 Forward eliminate: 1 0 0 1 1 0 0 1 , 0 1 1 0 0 0 . 1

so the rst and second columns are pivot columns. Therefore 1 1 , 1 1 1 0 are a basis for the span. Are there any bases? Proposition 9.12. Every subspace of Rn has a basis. Moreover, any basis v1 , v2 , . . . , vp of a subspace V of Rn lives in a basis v1 , v2 , . . . , vp , w1 , w2 , . . . , wq of Rn . Proof. If V only contains the 0 vector, then we can take no vectors as a basis for V , and let w1 , w2 , . . . , wn be any basis for Rn . On the other hand, if V contains a nonzero vector, then pick as many linearly independent vectors from V as possible. By theorem 9.4 on page 82, we could only pick at most n vectors. They must span V , because otherwise we could pick another one. If V = Rn , then we are nished. Otherwise, pick as many vectors from Rn as possible which are linearly independent of v1 , v2 , . . . , vp . Clearly we stop just when we hit a total of n vectors.

9.5 Dimension
Do all bases look pretty much the same? Theorem 9.13. Any two bases of a subspace have the same number of vectors. Proof. Imagine two bases, say x1 , x2 , . . . , xp and y1 , y2 , . . . , yq , for the same subspace. Forward eliminate x1 x2 ... xp y1 y2 ... yq

88 yielding .

Bases

Each x vector generates a pivot, p pivots in all, straight down the diagonal. Forward eliminate the right hand portion of the matrix, yielding ,

giving at most p pivots because of the zero rows. Each y vector generates a pivot. So there arent more than p of these y vectors. Thus no more y vectors than x vectors. Reversing the roles of x and y vectors, we nd that there cant be more x vectors than y vectors. Problem 9.20. Prove that every subspace of Rn has a basis with at most n vectors. Denition 9.14. The dimension of a subspace is the number of vectors in any basis. Write the dimension of a subspace U as dim U .

9.5 Review Problems


Problem 9.21. Consider the vectors in Rn of the form ei ej (for all possible values of i and j from 1 to n). Find a basis for the subspace they span.

9.6 Summary
A subspace is a at thing passing through 0. A basis for a subspace is a collection of just enough vectors to span the subspace. A change of basis matrix is a basis organized into the columns of a matrix.

9.7. Uniqueness of Reduced Echelon Form

89

9.7 Uniqueness of Reduced Echelon Form


When we carry out elimination, we choose rows to swap. Problem 9.22. Find the simplest matrix A you can with two dierent ways of carrying out forward elimination, with dierent results. Recall that GaussJordan elimination means forward elimination followed by back substitution. The matrix resulting from GaussJordan elimination is said to be in reduced echelon form. Theorem 9.15. The result of GaussJordan elimination does not depend on the choices made of which rows to swap. Proof. Suppose that U and W are two dierent eliminations of the same matrix A, obtained using dierent choices of rows to swap. The rst pivot column is just the rst nonzero column of A. The second pivot column is the earliest column which is linearly independent of the rst pivot column, etc. This is true for A, and doesnt change under forward elimination or back substitution. Therefore A and U and W have the same pivot columns. After elimination, the rst pivot column becomes e1 , the second becomes e2 , etc. So all of the pivot columns of U and W must be identical. Every pivotless column is a linear combination of earlier pivot columns, and the coecients in this linear combination are not aected by GaussJordan elimination. Therefore the pivotless columns of U and W are the same linear combinations of the pivot columns. The pivot columns are the same, so all columns are the same. Problem 9.23. The rank is the number of pivots in the forward elimination. Prove that the rank of a matrix does not depend on which rows you choose when forward eliminating.

10 Kernel and Image


Each matrix A has two important subspaces associated to it: its kernel (the vectors it kills), and its image (the vectors b for which you can solve Ax = b).

10.1 Kernel
Definition 10.1. If A is any matrix, say p x q, then the vectors x in Rq for which Ax = 0 (vectors killed by A) form a subspace of Rq called the kernel of A, and written ker A. The kernel is a subspace, because
a. 0 belongs to the kernel of any matrix A, since A0 = 0 (everything kills 0).
b. If Ax = 0 and Ay = 0, then A(x + y) = Ax + Ay = 0 (when you kill two vectors, you kill their sum).
c. If Ax = 0, then A(ax) = aAx = 0 (when you kill a vector, you kill its multiples).

Problem 10.1. If a matrix is wider than it is tall (a short matrix), then its kernel contains nonzero vectors.

Problem 10.2. Prove that the kernel of AB contains the kernel of B. Does it have to contain the kernel of A?

We will often need to find kernels of matrices. To rapidly calculate the kernel of a matrix, for example

A =
  2   0  1  1
  1   1  2  1
  3  -1  0  1

a. Carry out forward elimination and back substitution.

  1  0  1/2  1/2
  0  1  3/2  1/2
  0  0   0    0

b. Cut out all zero rows.

  1  0  1/2  1/2
  0  1  3/2  1/2

c. Change the signs of all entries after each pivot.

  1  0  -1/2  -1/2
  0  1  -3/2  -1/2

(This corresponds to changing equations like x1 + 1/2 x3 + 1/2 x4 = 0 to x1 = -1/2 x3 - 1/2 x4. Think of it as moving everything after the pivot over to the right hand side, although we won't actually move anything.)

d. Stuff in whatever rows from the identity matrix you need into your matrix, so that it ends up with nonzero entries all down the diagonal.

  1  0  -1/2  -1/2
  0  1  -3/2  -1/2
  0  0    1     0
  0  0    0     1

We won't mark the new rows with pivots. Each new row corresponds to setting one of the free variables to 1 and the others to 0.

e. Cut out all of the pivot columns. The remaining columns are a basis for the kernel.

  -1/2      -1/2
  -3/2   ,  -1/2
    1         0
    0         1

Problem 10.3. Apply this algorithm to the matrix 0 1 2 A= 2 2 2 2 0 2 and check that Ax = 0 for each vector x in your resulting basis for the kernel.
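The recipe above is mechanical enough to program directly. Here is a sketch in Python using exact fractions; the function name kernel_basis is ours, not the book's, and the code simply follows steps a through e.

    from fractions import Fraction

    def kernel_basis(A):
        # Reduce to reduced echelon form, then read off one basis vector
        # of the kernel for each pivotless (free) column.
        M = [[Fraction(x) for x in row] for row in A]
        rows, cols = len(M), len(M[0])
        pivots = []                      # (row, column) positions of the pivots
        r = 0
        for c in range(cols):
            p = next((i for i in range(r, rows) if M[i][c] != 0), None)
            if p is None:
                continue
            M[r], M[p] = M[p], M[r]
            M[r] = [x / M[r][c] for x in M[r]]
            for i in range(rows):
                if i != r and M[i][c] != 0:
                    M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
            pivots.append((r, c))
            r += 1
        free = [c for c in range(cols) if c not in [pc for _, pc in pivots]]
        basis = []
        for f in free:
            v = [Fraction(0)] * cols
            v[f] = Fraction(1)
            for pr, pc in pivots:
                v[pc] = -M[pr][f]        # solve each pivot variable for this free variable
            basis.append(v)
        return basis

    A = [[2, 0, 1, 1], [1, 1, 2, 1], [3, -1, 0, 1]]
    for v in kernel_basis(A):
        print([str(x) for x in v])   # (-1/2, -3/2, 1, 0) and (-1/2, -1/2, 0, 1)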

Lemma 10.2. The algorithm works, giving a basis for the kernel of any matrix. Proof. Each vector in the kernel is obtained by setting arbitrary values for the free variables, and letting the pivots solve for the other variables. Let v1 be the vector in the kernel which has value 1 for the 1st free variable, and 0 for all other free variables. Similarly, make a vector v2 , v3 , . . . , vs for each free variablesuppose that there are s free variables. The kernel is a subspace, so each linear combination c1 v1 + c2 v2 + + cs vs lies in the kernel. This linear combination has value c1 for the rst free variable, c2 for the second, etc. (just looking at the rows of the free variables). Each

vector in the kernel has some values c1, c2, . . . , cs for the free variables. So each vector in the kernel is a unique linear combination of v1, v2, . . . , vs. Suppose we find a linear relation among v1, v2, . . . , vs, say c1 v1 + c2 v2 + . . . + cs vs = 0. Look at the row in which v1 has a 1 and all of the other vectors have 0s: the linear relation gives c1 = 0 in that row. Similarly all of c1, c2, . . . , cs must vanish, so there is no linear relation among these vectors. Therefore the vectors v1, v2, . . . , vs form a basis for the kernel. Finally, we need to see why these vectors v1, v2, . . . , vs are precisely the vectors which come out of our process above. First, look at our example. The reduced echelon form turns back into equations as

x1 + 1/2 x3 + 1/2 x4 = 0
x2 + 3/2 x3 + 1/2 x4 = 0

Solving for pivots means subtracting off:

x1 = -1/2 x3 - 1/2 x4
x2 = -3/2 x3 - 1/2 x4.
All free variables line up on the right hand side, and we have changed the signs of their coecients. Setting x3 = 1 and x4 = 0, go down the right hand side, killing the x4 entries, and putting x3 = 1 in each x3 entry, i.e. writing down just the entries from the x3 column: 1 2 3 v1 = 2 . 1 0 The general algorithm works in the same way: if we put all free variables on to the right hand side, and then set one free variable to 1 (turn it on) and the others to 0s (turn them o), we can picture this as turning on the column associated to that free variable. Each pivot solves for a pivot variablethe value of that pivot variable is the entry in the corresponding row of the turned on column. Problem 10.4. Give an example of a square matrix whose kernel is not the kernel of its transpose. Problem 10.5. Draw a picture of the kernel for each of A= 1 0 0 1 0 ,B = 0 1 2 0 ,C = 1 ,D = 0 0 0 .

Corollary 10.3. The dimension of the kernel of a matrix is the number of pivotless columns after forward elimination. Remark 10.4. Another way to say it: the dimension of the kernel of a matrix A is the number of free variables in the equation Ax = 0.

94

Kernel and Image

10.1 Review Problems


Problem 10.6. Find a basis for the 1 0 1 0 0 0 0 0 kernel of 0 0 1 0 0 0 2 0 0 1 2 0

Problem 10.7. Find a basis for the 0 1 0 0 0 0 0 0

kernel of 0 1 0 0 0 0 1 0 0 0 0 1

Problem 10.8. Find a basis 1 2 2 1

for the kernel of 1 2 2 0 1 0 0 2 1 0 1 1 1 1 0 2

Problem 10.9. Find a basis for the kernel of 1 1 2 1 1 0 1 1 1

Problem 10.10. Find a basis for the 1 1 1

kernel of 1 1 1 2 0 1

Problem 10.11. Find a basis for the 1 2 1

kernel of 0 1 1 1 1 0

10.2. Image

95

10.2 Image
Denition 10.5. The image of a matrix is the set of vectors y of the form y = Ax for some vector x, written im A. Problem 10.12. Prove that the image of a matrix is the span of its columns.

Example 10.6. The image of

1 A= 0 1

1 1 0

2 2 0

3 3 0

is the span of 1 1 2 3 0 , 1 , 2 , 3 . 1 0 0 0 Problem 10.13. Prove that the equation Ax = b has a solution x just when b lies in the image of A.

10.2 Review Problems


Problem 10.14. Describe the image of the matrix 2 0 A = 0 3 . 0 0

Problem 10.15. (a) Suppose that A is a 2 2 matrix which, when taking a vector x to the vector Ax, takes multiples of e1 to multiples of e1 . Show that A is upper triangular. (b) Similarly, if A is a 3 3 matrix which takes multiples of e1 to multiples of e1 , and takes linear combinations of e1 and e2 to other such linear combinations, then A is upper triangular. (c) Generalize this to Rn , and use it to show that the inverse of an invertible upper triangular matrix is upper triangular. Problem 10.16. Prove that if a matrix is taller than it is wide (a tall matrix), then some vector does not belong to its image.

96 Problem 10.17. Find the dimension of the kernel of A= 0 0 1 1 0 , B= 1 1 , E= 1 1 0 1 0 0 , C= 0 1 0 0 0 0 , 0 1 0

Kernel and Image

D=

1 , F = 1

0 . 1

Problem 10.18. Prove that the kernel of A is the kernel of A . A Problem 10.19. If you know the kernel of a p q matrix A, how do you nd the dimension of the kernel of A A A B = A A A? A A A

10.3 Kernel and Image


Problem 10.20. If A and B are matrices, and there is an invertible matrix C for which B = CA, prove that A and B have the same kernel.

Problem 10.21. If A and B are two matrices, and B = CA for an invertible matrix C , prove that A and B have images of the same dimension.

Theorem 10.7. For any matrix A, dim ker A + dim im A = number of columns. Proof. The image of A is the span of the columns. By lemma 8.6 on page 74, each pivotless column is a linear combination of earlier pivot columns. So the pivot columns span the image. Pivot columns are linearly independent: a basis. Each pivotless column contributes (in our algorithm) to our basis for the kernel. Example 10.8. The matrix 1 A = 1 2 1 1 2 1 1 1 1 1 1

10.4. Summary

97

has echelon form 1 U = 0 0 1 0 0 1 2 0 1 0 1

So columns 1, 3 and 4 of A (not of U ) are a basis for the image of A: 1 1 1 , , 1 1 1 . 1 1 2 The image has dimension 3 because there are 3 pivot columns. The kernel has dimension 1, because there is one pivotless column. The pivotless column is not a basis for the kernel. It just shows you the dimension of the kernel. (In this example, the pivotless column isnt even in the kernel.) Problem 10.22. Find the rank of 0 1 A= 0 0

2 2 1 2

2 0 2 2

2 0 2 2

and explain what this tells you about image and kernel.
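Theorem 10.7 is easy to check numerically: the rank counts the pivot columns and the kernel dimension counts the pivotless columns. A small Python (numpy) sketch on a made-up 3 x 4 matrix of our own:

    import numpy as np

    A = np.array([[1.,  1., 1.,  1.],
                  [1., -1., 1., -1.],
                  [2., -2., 1., -1.]])

    rank = np.linalg.matrix_rank(A)          # dimension of the image (pivot columns)
    nullity = A.shape[1] - rank              # dimension of the kernel (pivotless columns)
    print(rank, nullity, rank + nullity)     # rank + nullity = number of columns = 4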

10.3 Review Problems


Problem 10.23. Find two matrices A and B , which have dierent images, but for which B = CA for an invertible matrix C . Prove that the images really are dierent, but of the same dimension.

Problem 10.24. Prove that rank A = rank At for any matrix A.

10.4 Summary
a. The kernel of a matrix is the set of vectors it kills. It is large just when linear equations Ax = b with one solution have lots of solutions (measures plurality of solutions when they exist). b. Our algorithm makes a basis for the kernel out of the pivotless columns. c. The image of a matrix is the stu that comes out of itthe vectors b for which you can solve Ax = b (measures existence of solutions). d. The pivot columns are a basis of the image.

98

Kernel and Image

10.4 Review Problems


Problem 10.25. Suppose that Ax = b. Prove that A(x + y ) = b too, just when y lies in the kernel. So the kernel measures the plurality of solutions of equations, while the image measures existence of solutions. Problem 10.26. What is the maximum possible rank of a 4 3 matrix? A 3 5 matrix? Problem 10.27. If a 3 5 matrix A has rank 3 must the equation Ax = b have a solution x? Can it have more than one solution? If it has one solution, must it have innitely many? Problem 10.28. As for the previous question, but with a 5 3 matrix A. Problem 10.29. If A = BC and B is 5 4 and C is 4 5, prove that det A = 0. Problem 10.30. Write down a 2 2 matrix A so that if I choose any vector x with positive entries, then the vector Ax also has positive entries, and lies between (but not on) the horizontal axis and a diagonal line. Problem 10.31. The Fredholm Alternative: for any matrix A and vector b, prove that just one of the following two problems has a solution: (1) Ax = b or (2) At y = 0 with bt y = 0.

Problem 10.32. Prove that the image of AB is contained in the image of A. Problem 10.33. Prove that the rank of AB is never more than the rank of A or of B . Problem 10.34. Prove that the rank of a sum of matrices is never more than the sum of the ranks.

Problem 10.35. Which of the following can change when you carry out forward elimination? a. image, b. kernel, c. dimension of image, d. dimension of kernel?

Problem 10.36. Prove that the rank of AB is no larger than the ranks of A and B .

Eigenvectors

99

11 Eigenvalues and Eigenvectors


In this chapter, we study certain special vectors, called eigenvectors, associated to a square matrix. When a vector x is struck by a matrix A, it becomes a new vector Ax. Usually the new vector is unrelated to the old one. Rarely, the new vector might just be the old one stretched or squished; we will then call x an eigenvector of A. If we have a basis worth of eigenvectors, then the matrix A just squishes or stretches each one, and we can completely recover the matrix if we know the basis of eigenvectors and their eigenvalues.

An eigenvector just gets stretched

11.1 Eigenvalues and Characteristic Polynomial


Definition 11.1. An eigenvector x of a square matrix A is a nonzero vector for which

Ax = λ x

for some number λ, called the eigenvalue of x. The eigenvalue λ is the factor that the eigenvector gets stretched by.

Problem 11.1. For each of the pictures in problem 3.15 on page 27, calculate the eigenvalues of the associated matrix and draw on each face the directions that the eigenvectors point in.

Lemma 11.2. A number λ is an eigenvalue of a square matrix A (which is to say that there is an eigenvector x with that eigenvalue) just when det(A - λ I) = 0.

Proof. Rewrite the equation Ax = λ x as (A - λ I) x = 0. Recall that the equation Bx = 0 (with B a square matrix) has a nonzero solution x just when B is not invertible, so just when det B = 0. Let's pick B to be A - λ I; there is an eigenvector x with eigenvalue λ just when det(A - λ I) = 0.

Definition 11.3. The expression det(A - λ I) is called the characteristic polynomial of the matrix A. We can restate the lemma:

Lemma 11.4. The eigenvalues of a square matrix A are precisely the roots of its characteristic polynomial.

Example 11.5. The matrix

A =
  2  0
  0  3

has characteristic polynomial

det( A - λ I ) = det
  2-λ   0
  0    3-λ
= (2 - λ)(3 - λ).

So the eigenvalues are λ = 2 and λ = 3.

Problem 11.2. Prove that the eigenvalues of an upper (or lower) triangular matrix are the diagonal entries. For example:

A =
  1  2  3
  0  4  5
  0  0  6

has eigenvalues λ = 1, λ = 4, λ = 6.

Problem 11.3. Find a 2 2 matrix A whose eigenvalues are not the same as its diagonal entries.

Problem 11.4. Find 2 2 matrices A and B for which A + B has an eigenvalue which is not a sum of some eigenvalue of A with some eigenvalue of B .

Denition 11.6. The set of eigenvalues of a matrix is called its spectrum. Example 11.7. A= 0 1 1 0

has det (A I ) = 2 + 1, which has no roots, so there are no eigenvalues (among real numbers ). Problem 11.5. Why is the characteristic polynomial a polynomial in ?

Problem 11.6. What is the highest order term of the variable in the characteristic polynomial of a matrix A?

11.1. Eigenvalues and Characteristic Polynomial

103

Appendix: Why the Fast Formula is So Slow


We use the slow formula to calculate determinants when we compute out characteristic polynomials. Why not the fast formula? Lets try it on an example. Take 1 1 2 A = 2 3 0 . 1 1 4 Lets hunt down the eigenvalues of A, by computing the characteristic polynomial as before. But this time, lets try the fast formula for the determinant, to nd det (A I ). We apply forward elimination to 1 1 2 A I = 2 3 0 . 1 1 4 to nd 1 2 1 1 3 1 2 0 4

2 1 Add 1 (row 1) to row 2, 1 (row 1) to row 3.

1 0 0 Move the pivot 1 0 0 Add


14 +2 (row

1
4 + 1 1+ 1+
2

2 1 4 ( 1 + ) 2 5 + 2 1+

. 1 1 4 + 2 1 +
1+

2 4 (1 + )
5 + 2 1+

2) to row 3. 1 1 4 + 2 1 + 0 2 4 (1 + )
3

1 0 0

8 2 +15 2 14 +2

104 Move the pivot . 1 0 0

Eigenvalues and Eigenvectors

1 1 4 + 2 1 + 0

2
1

4 (1 + ) 3 8 2 + 15 2 1 4 + 2

The point: at each step, the expressions are rational functions of , accumulating to become more complicated at each step. This is not any faster than the slow process, which gives: det (A I ) = + (1 ) det 2 det 1 1 1 3 3 1 2 4 2 0 0 4

+ 1 det

=2 15 + 8 2 3 . Lets always use the slow process. There is actually a faster method to nd eigenvalues (see section 24.2 on page 230) of large matrices, but it is slower on small matrices, and we wont ever want to work with large matrices.

11.1 Review Problems


Problem 11.7. Find the eigenvalues of the matrices A= 4 2 2 , B= 1 2 1 3 , C= 0 0 0 1 1 , D = 1 1 0 1 1 1 0 0 0

Problem 11.8. Prove that a square matrix A and its transpose At have the same eigenvalues.

Problem 11.9. Prove that det F 1 AF I = det (A I ) for any square matrix A, and any invertible matrix F . So the characteristic polynomial is unchanged by change of basis.

11.1. Eigenvalues and Characteristic Polynomial

105

Problem 11.10. If all of the entries of a square matrix are positive, are its eigenvalues positive? Problem 11.11. Are the eigenvalues of AB equal to those of BA?

Problem 11.12. Give an example of 2 2 matrices A and B for which the eigenvalues of AB are not products of eigenvalues of A with those of B . Problem 11.13. What are the eigenvalues of 1 1 1 A = 1 1 1? 1 1 1 Problem 11.14. The multiplicity of an eigenvalue j is the number of factors of j appearing in the characteristic polynomial. Suppose that the characteristic polynomial of some n n matrix A splits into a product of linear factors. Prove that the determinant of A is the product of its eigenvalues (each taken with multiplicity), by setting = 0 in the characteristic polynomial. Problem 11.15. From the previous exercise, if a 2 2 matrix A has eigenvalues 0 and 1, what is its rank? Problem 11.16. Write out the characteristic polynomial of an n n matrix A as det (A I ) = s0 (A) s1 (A) + s2 (A)2 + + (1)n sn (A)n . a. Find sn (A). b. Prove that s0 (A) = det A. c. Prove that sj (A) is a sum of products of precisely n j entries of A. In particular, sn1 (A) is a polynomial of degree 1 as a function of each entry of A. d. Use this to prove that sn1 (A) = A11 + A22 + + Ann . (This quantity A11 + A22 + + Ann is called the trace of A). e. Prove that sj F 1 AF = sj (A) for any invertible matrix F , so the coecients of the characteristic polynomial are unchanged by change of basis. f. Take a basis u1 , u2 , . . . , un for which the vectors ur+1 , ur+2 , . . . , un form a basis of the kernel, let F be the associated change of basis matrix, and look at F 1 AF . Prove that F 1 AF = P Q 0 0

for some invertible r r matrix P , and some matrix Q.

106

Eigenvalues and Eigenvectors

g. If A has rank r, prove that sk (A) = 0 for k n r. h. Write down two 2 2 matrices of dierent ranks with the same characteristic polynomial.

11.2 Eigenvectors
To nd the eigenvectors of a matrix A: once you have the eigenvalues, pick each eigenvalue , and nd the kernel of A I . Example 11.8. The matrix A= has eigenvalues = 2 and = 3. Lets start with = 2: A I = 2 1 0 1 0 3 0 1 2 1 0 0 1 2 1 0 3

Our algorithm (from section 10.1) for nding the kernel yields a basis 1 1 for the = 2-eigenvectors. Problem 11.17. Do the same for = 3.

Problem 11.18. Find the eigenvectors and eigenvalues of

A =
  1  3  0
  2  6  0
  0  0  4

Example 11.9. Let's put it all together. How do we calculate the eigenvectors of

A =
  3  2
  0  1
?

a. Find the eigenvalues:

0 = det( A - λ I ) = det
  3-λ  2
  0    1-λ
= (3 - λ)(1 - λ)

So the eigenvalues are λ = 3 and λ = 1.

b. Find the eigenvectors: for each eigenvalue λ, compute a basis for the kernel of A - λ I.

A - 3 I =
  3-3   2        0   2
  0    1-3   =   0  -2

The kernel of A - 3 I has basis

  1
  0

The nonzero linear combinations of this basis are the eigenvectors with eigenvalue λ = 3. For λ = 1,

A - I =
  3-1   2        2  2
  0    1-1   =   0  0

The kernel of A - I has basis

  -1
   1

The nonzero linear combinations of this basis are the λ = 1-eigenvectors.
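The same two-step recipe (characteristic polynomial, then kernels) is packaged in numpy. Here is a Python sketch on a small upper triangular example of our own; numpy returns one eigenvector per eigenvalue, as the columns of a matrix.

    import numpy as np

    A = np.array([[3., 2.],
                  [0., 1.]])

    # The eigenvalues are the roots of det(A - lambda I).
    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)                   # [3. 1.]
    for lam, v in zip(eigenvalues, eigenvectors.T):
        print(lam, v, A @ v - lam * v)   # A v - lambda v is (numerically) zero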

11.2 Review Problems


Problem 11.19. Without any calculation, what are the eigenvalues and eigenvectors of 5 0 0 A = 0 6 0 ? 0 0 7

108

Eigenvalues and Eigenvectors

Problem 11.20. Find the eigenvalues and eigenvectors of A= 1 0 1 1

Problem 11.21. Find the eigenvalues and eigenvectors of A= 0 2 3 1

Problem 11.22. Find the eigenvalues and eigenvectors of 2 0 0 A = 2 1 3 2 0 3

Problem 11.23. Prove that every eigenvector of any square matrix A is an eigenvector of A1 , of A2 , of 3A and of A 7 I . How are the eigenvalues related? Problem 11.24. Forward elimination messes up eigenvalues and eigenvectors. Back substitution messes them up further. Give the simplest examples you can. Problem 11.25. What are the eigenvalues and eigenvectors of the permutation matrix of a transposition? Problem 11.26. What are the eigenvalues and eigenvectors of a 2 2 strictly lower triangular matrix?

12 Bases of Eigenvectors
In this chapter, we try (and dont always succeed) to organize eigenvectors into bases.

12.1 Eigenspaces
Denition 12.1. The -eigenspace of a square matrix A is the set of vectors x for which (A I )x = 0 (i.e. the kernel of A I ). The eigenvectors are precisely the nonzero vectors in the eigenspace. In particular, if is not an eigenvalue, then the -eigenspace is just the 0 vector. Problem 12.1. Prove that for any value , the -eigenspace of any square matrix is a subspace.

12.1 Review Problems


Problem 12.2. Suppose that A and B are n n matrices, and AB = BA. Prove that if x is in the -eigenspace of A, then so is Bx.

Figure 12.1: An eigenspace with eigenvalue = 2: anything you draw in that subspace get doubled. 109

110

Bases of Eigenvectors

Figure 12.2: A basis of eigenvectors of a matrix. Each vector starts o as a thickly drawn vector, and gets stretched into the thinly drawn vector. A negative stretching factor reverses the direction of the vector. We can recover the entire matrix A if we know the directions of the basis vectors that are stretched and how much the matrix stretches vectors in each of those directions.

12.2 Bases of Eigenvectors


Diagonal matrices are very easy to work with:

  2   0     x1        2 x1
  0  -3     x2   =   -3 x2

each variable simply getting scaled by a factor. The next easiest are matrices that become diagonal when we change variables.

Theorem 12.2 (Decoupling Theorem). If u1, u2, . . . , un is a basis of Rn, and each of u1, u2, . . . , un is an eigenvector of a square matrix A, say Au1 = λ1 u1, Au2 = λ2 u2, . . . , Aun = λn un, then

F⁻¹ A F =
  λ1
      λ2
          ...
              λn

(a diagonal matrix), where F = ( u1 u2 . . . un ) is the change of basis matrix of the basis u1, u2, . . . , un.

Definition 12.3. We say that the matrix F (or the basis u1, u2, . . . , un) diagonalizes the matrix A.

Remark 12.4. We call this the decoupling theorem, because the transformation taking x to Ax is usually very complicated, mixing up the variables x1, x2, . . . , xn in a tangled mess. But if we can somehow change the variables and make A into a diagonal matrix, then each of the new variables is just being stretched or squished by a factor λi, independently of any of the other variables, so the variables appear decoupled from one another.

12.2. Bases of Eigenvectors

111

Proof. F takes es to us, A scales the us, and then F 1 turns the scaled us back into es. So F 1 AF ej = j ej , giving the j -th column of F 1 AF . So 1 2 . F 1 AF = .. . n

Problem 12.3. Diagonalize 1 A = 0 2 0 2 0 0 0 . 3

We save a lot of time if we notice that: Theorem 12.5. Eigenvectors with dierent eigenvalues are linearly independent. Remark 12.6. This saves us time because we dont have to check to see if the eigenvectors we come up with are linearly independent, since we generate a basis for each eigenspace, and there are no relations between eigenspaces. Proof. Take a square matrix A. Pick some eigenvectors, say x1 with eigenvalue 1 , x2 with eigenvalue 2 , etc., up to some xp . Suppose that all of these eigenvalues 1 , 2 , . . . , p are dierent from one another. If we found a linear relation c1 x1 = 0 involving just one vector x1 , we would divide by c1 to see that x1 = 0. But x1 = 0 (being an eigenvector), so there is no linear relation involving just one eigenvector. Lets suppose we found a linear relation involving just two eigenvectors, x1 and x2 , like c1 x1 + c2 x2 = 0. We could just replace x1 by c1 x1 and x2 by c2 x2 to arrange a linear relation x1 + x2 = 0. Since x1 is an eigenvector with eigenvalue 1 , we know that (A 1 I ) x1 = 0. Apply A 1 I to both sides of our relation to get (2 1 ) x2 = 0. Since the eigenvalues are distinct, we can divide by 2 1 to get x2 = 0, again a contradiction. So there are no linear relations involving just two eigenvectors. Lets imagine a linear relation c1 x1 + c2 x2 + + cp xp = 0, involving any number of eigenvectors, and see why that leads us into a contradiction. If any of the terms are 0, just drop them, so we can assume that there are no 0 terms, i.e. that all coecients c1 , c2 , . . . , cp are nonzero. So we can rescale, replacing x1 by c1 x1 , etc., to arrange that our relation is now x1 + x2 + + xp = 0.

112 Applying A 1 I to our linear relation: 0 = (A 1 I ) (x1 + x2 + + xp ) = (2 1 ) x2 + + (p 1 ) xp

Bases of Eigenvectors

a linear relation with fewer terms. Since 1 = 2 , the coecient of x2 wont become 0 in the new linear relation unless it was already 0, so this new linear relation still has nonzero terms. In this way, each linear relation leads to a linear relation with fewer terms, until we get down to one or two terms, which we already saw cant happen. Therefore there are no linear relations among x1 , x2 , . . . , xn . Problem 12.4. Diagonalize 1 A= 2 3 2 2 6 2 2 . 6

Problem 12.5. Diagonalize 1 A = 3 6 3 5 6 3 3 . 4

Problem 12.6. Prove that a square matrix is diagonalizable (i.e. diagonalized by some matrix) just when it has a basis of eigenvectors.

Remark 12.7. Following remark 9.7 on page 84, F diagonalizes A just when the change of coordinates y = F 1 x changes the matrix A into a diagonal matrix. Problem 12.7. Give an example of a matrix which is not diagonalizable.

Problem 12.8. If A is diagonalized by F, say F⁻¹AF = diagonal, then prove that A² is also diagonalized by F. Apply induction to prove that all powers of A are diagonalized by F.

Problem 12.9. Use the result of the previous exercise to compute A^100000 where

A =
   3   2
  -4  -3
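This is exactly how a computer takes huge powers of a diagonalizable matrix: diagonalize once, then power up the diagonal. A Python (numpy) sketch, using a 2 x 2 matrix whose eigenvalues are 1 and -1:

    import numpy as np

    A = np.array([[3., 2.],
                  [-4., -3.]])

    # Diagonalize: F has a basis of eigenvectors as columns, D the eigenvalues.
    eigenvalues, F = np.linalg.eig(A)

    # A = F D F^{-1}, so A^k = F D^k F^{-1}, and powers of D are cheap.
    k = 100000
    A_to_the_k = F @ np.diag(eigenvalues ** k) @ np.linalg.inv(F)
    print(np.round(A_to_the_k))   # the identity matrix (up to round-off)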

12.3. Summary

113

12.2 Review Problems


Problem 12.10. Find the eigenvalues and eigenvectors of 1 A= 2 . 3 Problem 12.11. Find the matrix A which has eigenvalues 1 and 3 and corresponding eigenvectors 2 4 , . 4 2 Problem 12.12. Imagine that a quarter of all people who are healthy become sick each month, and a quarter die. Imagine that a quarter of all people who are sick die each month, and a quarter become healthy. What happens to the dead people? Write a matrix A to show how the numbers hn , sn , dn of healthy, sick and dead people change from month n to month n + 1. Diagonalize A. What happens to the population in the long run? Prove your answer. (Keep in mind that no one is being born in this story.)

Problem 12.13. Let A be the 5 5 matrix all of whose entries are 1. a. Without any calculation, what is the kernel of A? b. Use this to diagonalize A. Problem 12.14. Lets investigate which 2 2 matrices are diagonalizable. a. Prove that every 2 2 matrix A can be written uniquely as A= p+q rs r+s pq

b. c. d. e.

for some numbers p, q, r, s. 2 Prove that the characteristic polynomial of A is (p ) + s2 q 2 r2 . Prove that A has two dierent eigenvalues just when q 2 + r2 > s2 . Prove that any 2 2 matrix with two dierent eigenvalues is diagonalizable. Prove that any 2 2 matrix with only one eigenvalue is diagonalizable just when it is diagonal. f. Prove that any 2 2 matrix with no eigenvalues is not diagonalizable.

12.3 Summary
Linear algebra has two problems: a. Solving linear equations Ax = b for the unknown x. This problem is truly linear. It has a solution x whenever b lies in the image, and the solution x is unique up to adding on vectors from the kernel.

114

Bases of Eigenvectors

b. Find eigenvectors and eigenvalues Ax = x. This problem is nonlinear, in fact quadratic, since and x are multiplied by one another. The nonlinear part is nding the eigenvalues , which are the roots of the characteristic polynomial det (A I ). There is an eigenspace of solutions x for each , and nding a basis of each eigenspace is a linear problem. If we get lucky (which doesnt always happen ), then the eigenvectors might form a basis of Rn , diagonalizing A. Table 12.1: Invertibility criteria (Strangs nutshell [15]). A is n n. U is any matrix obtained from A by forward elimination. 5.1 5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.2 7.4 7.4 9.2 9.2 9.2 10.2 10.3 11.1 Invertible Just When . . . GaussJordan on A yields 1. U is invertible. Pivots lie all the way down the diagonal. U has no zero rows U has n pivots. Ax = b has a solution x for each b. Ax = b has exactly one solution x for each b. Ax = b has exactly one solution x for some b. Ax = 0 only for x = 0. A has rank n. At is invertible. det A = 0. The columns are linearly independent. The columns form a basis. The rows form a basis. The kernel of A is just the 0 vector. The image of A is all of Rn . 0 is not an eigenvalue of A.

Problem 12.15. Take each of the criteria in table 12.1, and describe an analogous criterion for showing that A is not invertible. For example, instead of det A = 0, you would write det A = 0. Make sure that as many as possible of your criteria express the failure of invertibility in terms of the rank r of the matrix A. For example, instead of turning U has no zero rows into U has a zero row,

12.3. Summary

115

you should turn it into U has n r zero rows.

Orthogonal Linear Algebra

117

13 Inner Product
So far, we havent thought about distances or angles. The elegant algebraic way to describe these geometric notions is in terms of the inner product, which measures something like how strongly in agreement two vectors are.

13.1 Denition and Simplest Properties


Definition 13.1. The inner product (also called the dot product or scalar product) of two vectors x and y in Rn is the number

⟨x, y⟩ = x1 y1 + x2 y2 + . . . + xn yn.

Example 13.2.

x =
  1
  0
  2
, y =
  4
  5
  6

have inner product ⟨x, y⟩ = (1)(4) + (0)(5) + (2)(6) = 16.

Example 13.3.

⟨ei, ej⟩ = 1 if i = j, and 0 if i is not j.

Problem 13.1. Prove that ⟨Aej, ei⟩ = Aij. Recall that the transpose At of a matrix A is the matrix with entries (At)ij = Aji, i.e. with rows and columns switched.

Problem 13.2. Prove that ⟨x, y⟩ = xt y.

Problem 13.3. Let P be a permutation matrix. Use the result of problem 13.1 to prove that P⁻¹ = Pt.




Figure 13.1: The Pythagorean theorem. Rearrange the 4 triangles into 2 rectangles to find the area of all 4 triangles. Add the area of the small white square.

Definition 13.4. Vectors u and v are perpendicular if ⟨u, v⟩ = 0.

Definition 13.5. The length of a vector x in Rn is ∥x∥ = √⟨x, x⟩.

This agrees in the plane with the Pythagorean theorem: if

x =
  a
  b

then we can draw x as a point of the plane, and the length along x is √(a² + b²).

Problem 13.4. Prove that for any vectors u and v, with u not 0, the vector

v - (⟨v, u⟩ / ⟨u, u⟩) u

is perpendicular to u.
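The statement of problem 13.4 is easy to test numerically before proving it. A short Python (numpy) sketch with two vectors of our own choosing:

    import numpy as np

    u = np.array([2., 1., 0.])
    v = np.array([1., 3., 5.])

    # Subtract from v its component along u; what is left is perpendicular to u.
    w = v - (np.dot(v, u) / np.dot(u, u)) * u
    print(np.dot(w, u))          # 0.0 (up to round-off)
    print(np.linalg.norm(v))     # the length of v, i.e. the square root of <v, v>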

13.1 Review Problems


Problem 13.5. How many vectors x in Rn have integer coordinates x1 , x2 , . . . , xn and have (a) x = 0? (b) x = 1? (c) x = 2? (d) x = 3?

Problem 13.6. (Due to Ian Christie) a. What is wrong with the clock?


b. At what times of day are the minute and hour hands of a properly functioning clock a) perpendicular? b) parallel? (The answer isn't very pretty.)

Problem 13.7. Prove that ⟨Ax, y⟩ = ⟨x, A^t y⟩ for vectors x in Rq, y in Rp and A any p × q matrix.

13.2 Symmetric Matrices


Definition 13.6. A matrix A is symmetric if A^t = A.
Problem 13.8. Which of the following are symmetric?
[ 1 2 ]   [ 1 0 ]   [ 1 0 2 ]
[ 2 1 ] , [ 0 1 ] , [ 1 3 4 ]
                    [ 2 4 5 ]

Problem 13.9. Prove that an n × n matrix is symmetric just when
⟨Ax, y⟩ = ⟨x, Ay⟩
for x and y any vectors in Rn.
Clearly a symmetric matrix is square.
Problem 13.10. Prove that
a. The sum and difference of symmetric matrices is symmetric.
b. If A is a symmetric matrix, then 3A is also a symmetric matrix.
Problem 13.11. Give an example of a pair of symmetric 2 × 2 matrices A and B for which AB is not symmetric.


13.2 Review Problems


Problem 13.12. For which matrices A is the matrix
B = [ 1 A ]
    [ A 1 ]
symmetric?

Problem 13.13. If A is symmetric, and F an invertible matrix, is F A F⁻¹ symmetric? If not, can you give a 2 × 2 example?
Problem 13.14. If A and B are symmetric, is AB + BA symmetric? Is AB − BA symmetric?

13.3 Orthogonal Matrices


Definition 13.7. A matrix F is orthogonal if F^t F = 1.
Example 13.8. In problem 13.3, you proved that permutation matrices are orthogonal.
Problem 13.15. Which of the following are orthogonal?
1 0 1 0 1 , 1 0 , 1 1 0 0 , 1
1 2 1 2

2 0 1 1

0 , 1 1 , 1

2 0 0 1

0 , 2 1 0

1 2 1 2

Orthogonal matrices are important because they preserve inner products:
Problem 13.16. Prove that a matrix is orthogonal just when
⟨Fx, Fy⟩ = ⟨x, y⟩
for x and y any vectors.
Clearly any orthogonal matrix is square.
Problem 13.17. Prove that
a. The product of orthogonal matrices is orthogonal.
b. The inverse of an orthogonal matrix is orthogonal.
Problem 13.18. If F is orthogonal, and c is a real number, prove that cF is also orthogonal only when c = ±1.
Problem 13.19. Which diagonal matrices are orthogonal?


Problem 13.20. Prove that the matrices
P = [ cos θ  −sin θ ]
    [ sin θ   cos θ ]
are orthogonal. Give an example of an orthogonal 2 × 2 matrix not of this form.

Problem 13.21. Give an example of a pair of orthogonal 2 2 matrices A and B for which A + B is not orthogonal.

Problem 13.22. By expanding out the expression ⟨x + y, x + y⟩ using the properties of inner products, express the inner product ⟨x, y⟩ of two vectors in terms of their lengths. Use this to prove that a matrix A is orthogonal just when ∥Ax∥ = ∥x∥ for any vector x.

13.4 Orthonormal Bases


Some bases are much easier to use than others.
Definition 13.9. A basis u1, u2, . . . , un is orthonormal if
⟨ui, uj⟩ = 1 if i = j, and 0 if i ≠ j.

Example 13.10. The standard basis is orthonormal.
Example 13.11. The basis
u1 = (√3/2, 1/2), u2 = (−1/2, √3/2)
is orthonormal.


Why Are Orthonormal Bases Better Than Other Bases?


Take any basis u1, u2, . . . , un for Rn. Every vector x in Rn can be written as a linear combination x = c1 u1 + c2 u2 + · · · + cn un. How do you find the coefficients c1, c2, . . . , cn? You apply elimination to the matrix
( u1 u2 . . . un | x ).
This is a big job. But if the basis is orthonormal then you can just read off the coefficients as
c1 = ⟨x, u1⟩, c2 = ⟨x, u2⟩, . . . , cn = ⟨x, un⟩.
Problem 13.23. Prove that if u1, u2, . . . , un is an orthonormal basis for Rn and x is any vector in Rn then
x = ⟨x, u1⟩ u1 + ⟨x, u2⟩ u2 + · · · + ⟨x, un⟩ un.
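A minimal numpy sketch of this "read off the coefficients" idea, with an orthonormal basis of R2 of our own choosing (decimals only to check your work):

    import numpy as np

    u1 = np.array([1., 1.]) / np.sqrt(2)
    u2 = np.array([1., -1.]) / np.sqrt(2)
    x = np.array([3., 5.])

    c1, c2 = np.dot(x, u1), np.dot(x, u2)
    print(np.allclose(c1 * u1 + c2 * u2, x))   # x = <x,u1> u1 + <x,u2> u2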

How Do We Tell If a Basis is Orthonormal?


Proposition 13.12. A square matrix is orthogonal just when its columns are orthonormal.
Proof. Write the matrix in terms of its columns, F = ( u1 u2 . . . un ). Then F^t has rows u1^t, u2^t, . . . , un^t, so the entry of F^t F in row i and column j is
ui^t uj = ⟨ui, uj⟩.
So F^t F = 1 just when these inner products are 1 on the diagonal and 0 off it, i.e. just when the columns u1, u2, . . . , un are orthonormal.


Figure 13.2: The Gram–Schmidt process. The original vectors; project the second vector perpendicular to the first; shrink/stretch all vectors to length 1; done: orthonormal.

Problem 13.24. Is the basis (1, 2), (2, 1) orthonormal? Draw a picture of these two vectors.
Problem 13.25. Prove that a square matrix is orthogonal just when its rows are orthonormal.

13.5 GramSchmidt Orthogonalization


The idea: if I start with a basis v1, v2 of R2 which is not orthonormal, I can fix it up (as in figure 13.2).


The formal definition: given any linearly independent vectors v1, v2, . . . , vp (as input), the output are orthonormal vectors u1, u2, . . . , up:
w1 = v1,
w2 = v2 − (⟨v2, w1⟩ / ⟨w1, w1⟩) w1,
w3 = v3 − (⟨v3, w1⟩ / ⟨w1, w1⟩) w1 − (⟨v3, w2⟩ / ⟨w2, w2⟩) w2,
wj = vj − Σ_{i<j} (⟨vj, wi⟩ / ⟨wi, wi⟩) wi,
uj = wj / √⟨wj, wj⟩.

Each wj is just vj with all parts pulled off that head in the directions of previous wi's (the directions we are already finished with). At the final step, each uj is just wj rescaled to unit length. We say that we are orthogonalizing the vectors v1, v2, . . . , vp. (The sketch after problem 13.26 below can be used to check your work on these problems.)
Problem 13.26. Orthogonalize

v1 = (1, 1, 0), v2 = (2, 0, 2).
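A minimal numpy sketch of the Gram–Schmidt formulas above, usable to check answers to these orthogonalization problems (decimals only to check your work); the sample input is our own choice:

    import numpy as np

    def gram_schmidt(vectors):
        """Orthogonalize a list of linearly independent vectors."""
        ws = []
        for v in vectors:
            w = v.astype(float)
            for wi in ws:
                w = w - (np.dot(v, wi) / np.dot(wi, wi)) * wi   # subtract projections
            ws.append(w)
        return [w / np.sqrt(np.dot(w, w)) for w in ws]          # rescale to unit length

    u = gram_schmidt([np.array([1., 1., 0.]), np.array([2., 0., 2.])])
    print(np.round(u, 4))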

Problem 13.27. Orthogonalize v1 = (1, 1, 0), v2 = (1, 0, 1), v3 = (0, 0, 1).
Problem 13.28. Orthogonalize v1 = (1, 1), v2 = (1, 0),

and then draw pictures explaining the process.

Problem 13.29. Orthogonalize v1 = (1, 1, 2), v2 = (2, 2, 0), v3 = (0, 0, 1).


Problem 13.30. Prove that if v1 , v2 , . . . , vp are linearly independent vectors, then each step of GramSchmidt makes sense (no dividing by zero), and the resulting u1 , u2 , . . . , up are an orthonormal basis for the span of v1 , v2 , . . . , vp .

Problem 13.31. Prove that any set of vectors, all of unit length, and perpendicular to one another, is contained in an orthonormal basis. Problem 13.32. If u and v are two vectors in Rn , and every vector w which is perpendicular to u is perpendicular to v , then v = au for some number a.

13.5 Review Problems


Problem 13.33. Orthogonalize 1 5 , . 1 3

Problem 13.34. Orthogonalize 1 0 , . 1 2

Problem 13.35. Orthogonalize 1 1 , . 1 2

Problem 13.36. Orthogonalize 2 1 , . 1 1

Problem 13.37. Orthogonalize 1 0 , . 2 2

Problem 13.38. Orthogonalize 2 1 . , 0 1


Problem 13.39. Orthogonalize 1 0 , . 1 2

Problem 13.40. Orthogonalize 1 1 , . 1 2

Problem 13.41. Orthogonalize 0 1 , . 1 1

Problem 13.42. Orthogonalize 2 1 , . 1 2

Problem 13.43. Orthogonalize 1 1 , . 0 1

Problem 13.44. Orthogonalize 2 0 , . 1 1

Problem 13.45. Orthogonalize 1 1 , . 1 0


Problem 13.46. Orthogonalize 1 1 , . 2 0 Problem 13.47. Orthogonalize 0 1 . , 1 1 Problem 13.48. Orthogonalize 1 1 , . 2 1 Problem 13.49. Orthogonalize 0 1 , . 1 2 Problem 13.50. Orthogonalize 1 0 , . 1 2 Problem 13.51. Orthogonalize 1 2 , . 1 1 Problem 13.52. Orthogonalize 1 1 , . 1 0 Problem 13.53. Orthogonalize 1 2 , . 2 2


Problem 13.54. Orthogonalize 2 1 , . 2 2 Problem 13.55. Orthogonalize 0 1 . , 1 0 Problem 13.56. Orthogonalize 1 1 , . 1 0 Problem 13.57. Orthogonalize 1 2 , . 1 0 Problem 13.58. Orthogonalize 1 2 , . 1 1 Problem 13.59. Orthogonalize 0 1 , . 2 1 Problem 13.60. Orthogonalize 1 2 , . 1 1 Problem 13.61. Orthogonalize 2 1 , . 1 2


Problem 13.62. Orthogonalize 1 1 , . 2 1 Problem 13.63. Orthogonalize 2 2 . , 0 1 Problem 13.64. Orthogonalize 1 2 , . 1 1 Problem 13.65. Orthogonalize 1 0 , . 1 1 Problem 13.66. Orthogonalize 1 2 , . 1 0 Problem 13.67. Orthogonalize 2 0 , . 0 2 Problem 13.68. Orthogonalize 0 1 , . 1 1 Problem 13.69. Orthogonalize 1 2 , . 0 2


Problem 13.70. Orthogonalize 0 2 , . 2 1 Problem 13.71. Orthogonalize 2 0 . , 0 2 Problem 13.72. Orthogonalize 2 1 , . 2 1 Problem 13.73. Orthogonalize 0 1 , . 2 1 Problem 13.74. Orthogonalize 2 1 , . 0 1 Problem 13.75. Orthogonalize 1 0 , . 0 1 Problem 13.76. Orthogonalize 2 0 , . 1 1 Problem 13.77. Orthogonalize 2 2 , . 0 1


Problem 13.78. Orthogonalize 1 2 . , 1 2

Problem 13.79. Orthogonalize 2 2 . , 1 0

Problem 13.80. Orthogonalize 1 0 , . 2 1

Problem 13.81. Orthogonalize 1 1 , . 2 0

Problem 13.82. Orthogonalize 2 0 , . 2 1

Problem 13.83. Orthogonalize 1 1 , . 1 1

Problem 13.84. What happens to a basis when you carry out GramSchmidt, if it was already orthonormal to begin with?

14 The Spectral Theorem


Finally, our goal: we want to prove that symmetric matrices can be made into diagonal matrices by orthogonal changes of variable.

14.1 Statement and Proof


Proposition 14.1 (The Minimum Principle). Let A be a symmetric n × n matrix. The function Q(x) = ⟨Ax, x⟩ is a quadratic polynomial function. Restrict x to lie on the sphere of unit length vectors. Then Q(x) reaches a minimum among all vectors on that sphere at some vector x = u. This vector u is an eigenvector of A.
Proof. In the appendix to this chapter, we prove that the minimum occurs. So there is some vector x = u so that ⟨Ax, x⟩ ≥ ⟨Au, u⟩ for any x of unit length. Fixing u, consider the quadratic function
H(x) = ⟨Ax, x⟩ − ⟨Au, u⟩ ⟨x, x⟩.
For x of unit length, ⟨x, x⟩ = 1, so H(x) = ⟨Ax, x⟩ − ⟨Au, u⟩ ≥ 0. But if we scale x, say to ax, clearly H(x) is quadratic in x, so H(ax) = a² H(x). So rescaling, we find that H(x) ≥ 0 for any vector x, of any length. Pick w any vector perpendicular to u. For any number t:
0 ≤ H(u + tw) = ⟨A(u + tw), u + tw⟩ − ⟨Au, u⟩ ⟨u + tw, u + tw⟩
= ⟨Au, u⟩ + 2t ⟨Au, w⟩ + t² ⟨Aw, w⟩ − ⟨Au, u⟩ (1 + t² ⟨w, w⟩)
= 2t ⟨Au, w⟩ + t² H(w)
= t (2 ⟨Au, w⟩ + t H(w)).


First let's try t positive, so we can divide by t, and find
0 ≤ 2 ⟨Au, w⟩ + t H(w).
Let t go to zero, to see that 0 ≤ ⟨Au, w⟩. Next try t negative, and divide by t and then let t go to zero, and see that 0 ≥ ⟨Au, w⟩. Therefore ⟨Au, w⟩ = 0. So every vector w perpendicular to u is also perpendicular to Au. By problem 13.32 on page 127, Au is a multiple of u, so u is an eigenvector.
Problem 14.1. If two eigenvectors of a symmetric matrix have different eigenvalues, prove that they are perpendicular.

Theorem 14.2 (Spectral Theorem). Each symmetric matrix A is diagonalized by an orthogonal matrix F. The columns of F form an orthonormal basis of eigenvectors. We say that F orthogonally diagonalizes A.
Proof. Start with a unit eigenvector u1, given by the minimum principle. Take any orthonormal basis u1, u2, . . . , un that starts with this vector, and let F be the matrix with these vectors as columns. Replace A with F^t A F. After replacement, A has e1 as eigenvector: A e1 = λ e1, so the first column of A is λ e1. Because A is symmetric, we see that
A = [ λ 0 ]
    [ 0 B ]
with B a smaller symmetric matrix. By induction on the size of matrix, we can orthogonally diagonalize B.
The previous exercise is vital for calculations: to orthogonally diagonalize, find all eigenvalues, and for each eigenvalue λ find an orthonormal basis u1, u2, . . . of eigenvectors of that eigenvalue λ. All of the eigenvectors of all of the other eigenvalues will automatically be perpendicular to u1, u2, . . . , so put together they make an orthonormal basis of Rn.
Problem 14.2. Find a matrix F which orthogonally diagonalizes the matrix
A = [ 7  6 ]
    [ 6 12 ]
by finding the eigenvectors u1, u2 and eigenvalues λ1, λ2.
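To check an orthogonal diagonalization numerically (decimals only to check your work), here is a minimal numpy sketch; the sample symmetric matrix is our own, not the one in the problem:

    import numpy as np

    A = np.array([[2., 1.],
                  [1., 2.]])
    eigenvalues, F = np.linalg.eigh(A)        # eigh: symmetric/Hermitian matrices
    print(eigenvalues)                        # [1. 3.]
    print(np.allclose(F.T @ A @ F, np.diag(eigenvalues)))   # F^t A F is diagonal
    print(np.allclose(F.T @ F, np.eye(2)))                   # F is orthogonal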

Problem 14.3. Prove that a square matrix is orthogonally diagonalizable just when it is symmetric.
Remark 14.3. An elegant (but longer) proof of the spectral theorem can be made along the following lines. Once we have used the minimum principle to find one eigenvector u1, we can then look among all unit length vectors x perpendicular


to u1, and see which of these vectors has smallest value for ⟨Ax, x⟩. Call that vector u2. Look among all unit vectors x which are perpendicular to both u1 and u2 for one which has the smallest value of ⟨Ax, x⟩, and call it u3, etc. This recipe will actually generate the eigenvectors for us, although it isn't easy to use either by hand or by computer.

14.1 Review Problems


Problem 14.4. Find a matrix F which orthogonally diagonalizes the matrix
A = [ 4 2 ]
    [ 2 1 ]

Problem 14.5. Find a matrix F which orthogonally diagonalizes the matrix
A = [ 4 2 0 ]
    [ 2 1 0 ]
    [ 0 0 3 ]

Problem 14.6. Find a matrix F which orthogonally diagonalizes the matrix
A = [ 7/6 1/6 1/3 ]
    [ 1/6 7/6 1/3 ]
    [ 1/3 1/3 5/3 ]

Problem 14.7. Find a matrix F which orthogonally diagonalizes
A = [ 7/25   0  24/25 ]
    [  0     1    0   ]
    [ 24/25  0 −7/25  ]

Problem 14.8. Let
A = [ 1 0 0 ]
    [ 0 2 0 ]
    [ 0 0 3 ]
What are all of the orthogonal matrices F for which F^t A F is diagonal with entries increasing as we move down the diagonal?

Problem 14.9. If A is symmetric, prove that A² has the same rank as A.
Problem 14.10. If A is n × n, prove that A^t A has no negative eigenvalues, and its eigenvalues are all positive just when A is invertible.


14.2 Quadratic Forms


Denition 14.4. A quadratic form is a polynomial in several variables, with all terms being quadratic (like x2 or xy ). In particular, no linear or constant terms can appear in a quadratic form.

Symmetrizing
If we have a quadratic form in variables x1 and x2, we can write it more symmetrically; for example:
x1 x2 = (1/2) x1 x2 + (1/2) x2 x1.

We just leave alone a term like x1², so for example
x1² + 8 x1 x2 = x1² + 4 x1 x2 + 4 x2 x1.

Problem 14.11. Symmetrize:
a. x2²
b. x1² + x2²
c. x1² + 3 x1 x2
d. x1 (x1 + x2)

Making a matrix
Pluck out the quadratic terms in the polynomial to make a matrix. For example:
a x1² + b x1 x2 + c x2² = a x1² + (b/2) x1 x2 + (b/2) x2 x1 + c x2²
becomes
A = [  a  b/2 ]
    [ b/2  c  ]
More generally, Σij Aij xi xj becomes A = (Aij). Because we symmetrized, the matrix is symmetric.
Problem 14.12. Make matrices for
a. x2²
b. x1² + x2²
c. x1² + (3/2) x1 x2 + (3/2) x2 x1
d. x1² + (1/2) x1 x2 + (1/2) x2 x1
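A small numerical sketch (our own sample coefficients) of passing from a quadratic form a x1² + b x1 x2 + c x2² to its symmetric matrix and back:

    import numpy as np

    a, b, c = 1., 8., 0.                   # e.g. x1^2 + 8 x1 x2
    A = np.array([[a, b / 2],
                  [b / 2, c]])             # symmetrize: off-diagonal entries b/2

    x = np.array([1., -1.])
    quadratic_form = a * x[0]**2 + b * x[0] * x[1] + c * x[1]**2
    print(np.isclose(quadratic_form, x @ A @ x))   # <Ax, x> recovers the form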


Diagonalizing
Diagonalize our matrix, by orthogonal change of variables. Then the same orthogonal change of variables will simplify our quadratic form, turning it into a sum of quadratic forms in one variable each. For example, take the quadratic form
23 x1² + 72 x1 x2 + 2 x2².
Symmetrize:
23 x1² + 36 x1 x2 + 36 x2 x1 + 2 x2².
The associated matrix is
A = [ 23 36 ]
    [ 36  2 ]
We let the reader check that A is orthogonally diagonalized by
F = [ 3/5  4/5 ]
    [ −4/5 3/5 ]
so that
F^t A F = [ −25  0 ]
          [  0  50 ]
We also let the reader check that if we take new variables y, defined by y = F^t x, i.e. by x = F y, then the same quadratic form is
−25 y1² + 50 y2².
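A numerical check of this worked example (decimals only to check your work), using standard numpy calls:

    import numpy as np

    A = np.array([[23., 36.],
                  [36., 2.]])
    eigenvalues, F = np.linalg.eigh(A)
    print(eigenvalues)                                    # [-25.  50.]
    print(np.allclose(F.T @ A @ F, np.diag(eigenvalues))) # the form becomes -25 y1^2 + 50 y2^2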

Theorem 14.5 (Decoupling Theorem). Any quadratic form in any number of variables becomes a sum of quadratic forms in one variable each, after a change of variables x = F y given by an orthogonal matrix F.
Remark 14.6. We will say that the quadratic form is diagonalized by the orthogonal matrix.
Proof. The problem comes from the mixed terms, like x1 x2. Symmetrize and write a symmetric matrix A out of the coefficients. Then the quadratic form is Σij Aij xi xj = ⟨Ax, x⟩. Diagonalize A to Λ = F^t A F. Let y = F^t x. Then x = F y, so
⟨Ax, x⟩ = ⟨A F y, F y⟩ = ⟨F^t A F y, y⟩ = ⟨Λ y, y⟩ = λ1 y1² + · · · + λn yn².


14.2 Review Problems


Problem 14.13. Diagonalize (a) 3 x1 2 2 x1 x2 + 3 x2 2 (b) 9 x1 2 + 18 x1 x2 + 9 x2 2 (c) 11 x1 2 + 6 x1 x2 + 3 x2 2 (d) 3 x1 2 + 4 x1 x2 + 6 x2 2 (e) 8 x1 2 12 x1 x2 + 3 x2 2 (f) 10 x1 2 + 6 x1 x2 + 2 x2 2 Problem 14.14. Diagonalize (a) 4 x1 x2 + 3 x2 2 (b) 6 x1 2 12 x1 x2 + 11 x2 2 (c) 10 x1 2 12 x1 x2 + 5 x2 2 (d) 2 x1 2 4 x1 x2 + x2 2 (e) 2 x1 2 6 x1 x2 + 10 x2 2 (f) 2 x1 2 + 6 x1 x2 + 6 x2 2 Problem 14.15. Diagonalize (a) 10 x1 2 + 12 x1 x2 + 5 x2 2 (b) 10 x1 2 18 x1 x2 + 10 x2 2 (c) 11 x1 2 + 12 x1 x2 + 6 x2 2 (d) 2 x1 2 + 4 x1 x2 + x2 2 (e) x1 2 4 x1 x2 + 4 x2 2 (f) 8 x1 2 + 18 x1 x2 + 8 x2 2 Problem 14.16. Diagonalize (a) 4 x1 2 4 x1 x2 + x2 2 (b) 11 x1 2 + 18 x1 x2 + 11 x2 2 (c) 9 x1 2 + 12 x1 x2 + 4 x2 2 (d) x1 2 2 x1 x2 + x2 2 (e) 3 x1 2 + 4 x1 x2 (f) 4 x1 2 + 8 x1 x2 + 4 x2 2 Problem 14.17. Diagonalize (a) x1 2 + 4 x1 x2 2 x2 2 (b) x1 2 12 x1 x2 + 6 x2 2 (c) x1 2 6 x1 x2 + 7 x2 2 (d) x1 2 + 2 x1 x2 x2 2 (e) 6 x1 2 18 x1 x2 + 6 x2 2 (f) 8 x1 2 + 12 x1 x2 + 3 x2 2 Problem 14.18. Diagonalize (a) x1 2 8 x1 x2 + x2 2 (b) 6 x1 2 8 x1 x2 + 6 x2 2 (c) 2 x1 2 + 4 x1 x2 + 5 x2 2 (d) 3 x1 2 12 x1 x2 + 8 x2 2 (e) x1 2 4 x1 x2 2 x2 2 (f) 2 x1 2 6 x1 x2 + 6 x2 2


Problem 14.19. Diagonalize (a) 7 x1 2 + 12 x1 x2 + 2 x2 2 (b) 2 x1 2 4 x1 x2 x2 2 (c) 6 x1 2 + 18 x1 x2 + 6 x2 2 (d) 3 x1 2 + 6 x1 x2 + 11 x2 2 (e) 9 x1 2 + 6 x1 x2 + x2 2 (f) 8 x1 2 6 x1 x2 Problem 14.20. Diagonalize (a) 8 x1 2 + 18 x1 x2 + 8 x2 2 (b) 7 x1 2 6 x1 x2 x2 2 (c) 11 x1 2 18 x1 x2 + 11 x2 2 (d) 9 x1 2 6 x1 x2 + x2 2 (e) 11 x1 2 12 x1 x2 + 6 x2 2 (f) 5 x1 2 + 12 x1 x2 + 10 x2 2 Problem 14.21. Diagonalize (a) 4 x1 2 + 12 x1 x2 + 9 x2 2 (b) 6 x1 2 6 x1 x2 2 x2 2 (c) 2 x1 x2 (d) 2 x1 2 + 8 x1 x2 + 2 x2 2 (e) 3 x1 2 6 x1 x2 + 11 x2 2 (f) 7 x1 2 + 18 x1 x2 + 7 x2 2 Problem 14.22. Diagonalize (a) 3 x1 2 4 x1 x2 (b) x1 2 + 6 x1 x2 + 7 x2 2 (c) 3 x1 2 + 12 x1 x2 + 8 x2 2 (d) 2 x1 2 2 x1 x2 2 x2 2 (e) 6 x1 x2 + 8 x2 2 (f) 3 x1 2 + 8 x1 x2 + 3 x2 2 Problem 14.23. Diagonalize (a) x1 2 + 2 x1 x2 + x2 2 (b) 7 x1 2 18 x1 x2 + 7 x2 2 (c) 6 x1 2 + 12 x1 x2 + 11 x2 2 (d) 2 x1 2 8 x1 x2 + 2 x2 2 (e) 5 x1 2 + 4 x1 x2 + 2 x2 2 (f) 2 x1 2 + 4 x1 x2 x2 2 Problem 14.24. Diagonalize (a) 4 x1 2 8 x1 x2 + 4 x2 2 (b) 11 x1 2 6 x1 x2 + 3 x2 2 (c) x1 2 6 x1 x2 + 9 x2 2 (d) 2 x1 2 + 6 x1 x2 + 10 x2 2 (e) 4 x1 2 + 4 x1 x2 + x2 2 (f) x1 2 + 12 x1 x2 + 6 x2 2 Problem 14.25. Diagonalize

(a) 4 x1 x2 + 3 x2²
(b) 6 x1² − 4 x1 x2 + 3 x2²
(c) 6 x1² + 6 x1 x2 − 2 x2²
(d) 10 x1² − 6 x1 x2 + 2 x2²
(e) x1² + 6 x1 x2 + 9 x2²
(f) x1² − 4 x1 x2 + 2 x2²


Problem 14.26. Diagonalize (a) x1 2 + 4 x1 x2 + 2 x2 2 (b) 2 x1 2 4 x1 x2 + 5 x2 2 (c) 7 x1 2 + 6 x1 x2 x2 2 (d) 8 x1 2 18 x1 x2 + 8 x2 2 (e) 3 x1 2 + 2 x1 x2 + 3 x2 2 (f) 9 x1 2 12 x1 x2 + 4 x2 2 Problem 14.27. Diagonalize (a) 5 x1 2 + 8 x1 x2 + 5 x2 2 (b) 6 x1 2 + 12 x1 x2 + x2 2 (c) 3 x1 2 8 x1 x2 + 3 x2 2 (d) 2 x1 x2 (e) x1 2 + 4 x1 x2 + 4 x2 2 (f) 9 x1 2 18 x1 x2 + 9 x2 2 Problem 14.28. Diagonalize (a) 7 x1 2 + 6 x1 x2 x2 2 (b) 2 x1 2 2 x1 x2 + 2 x2 2 (c) 2 x1 2 + 2 x1 x2 2 x2 2 (d) 8 x1 2 + 6 x1 x2 (e) 5 x1 2 12 x1 x2 + 10 x2 2 (f) 6 x1 2 12 x1 x2 + x2 2 Problem 14.29. Diagonalize (a) 2 x1 2 + 2 x1 x2 + 2 x2 2 (b) 6 x1 x2 + 8 x2 2 (c) x1 2 + 8 x1 x2 + x2 2 (d) 5 x1 2 8 x1 x2 + 5 x2 2 (e) 6 x1 2 + 4 x1 x2 + 3 x2 2 (f) 6 x1 2 + 8 x1 x2 + 6 x2 2 Problem 14.30. Diagonalize (a) 4 x1 2 12 x1 x2 + 9 x2 2 (b) 3 x1 2 4 x1 x2 + 6 x2 2 (c) 2 x1 2 12 x1 x2 + 7 x2 2 (d) 5 x1 2 4 x1 x2 + 2 x2 2 (e) 10 x1 2 + 18 x1 x2 + 10 x2 2 (f) 2 x1 2 + 12 x1 x2 + 7 x2 2


Figure 14.1: Some examples of solutions of quadratic equations: (a) circle, (b) ellipse, (c) pair of lines, (d) hyperbola.

14.3 Application to Quadratic Equations


In the plane, with two variables x1 and x2, a quadratic equation Q(x) = c (with Q(x) a quadratic form and c a constant number) cuts out a circle, ellipse, hyperbola, pair of lines, single line, point, or empty set.
Example 14.7. The quadratic equation
4 x1 x2 + 3 x2² = 0
involves the quadratic form with matrix
[ 0 2 ]
[ 2 3 ]
Its eigenvalues are λ = −1 and λ = 4. So we can change variables (somehow) to get to
−x1² + 4 x2² = 0.
This is just x1 = ±2 x2, a pair of lines intersecting at a point. Since the change of variables is linear, the original quadratic equation also cuts out a pair of lines intersecting at a point.
Example 14.8. The equation
x1² + 4 x1 x2 + x2² = 1
contains the quadratic form with matrix
[ 1 2 ]
[ 2 1 ]
The eigenvalues are λ = −1 and λ = 3. So after a linear change of variables, we get
−x1² + 3 x2² = 1.
(The right hand side is a constant, so doesn't change.)

It is well known that an equation of the form
a x1² + b x2² = 1
with a and b of different signs is a hyperbola, while if a and b have the same signs then it is an ellipse. So our last example must be a hyperbola.
Warning: until you diagonalize the associated matrix, and look at the eigenvalues, you can't easily see what shape a quadratic equation cuts out. You can't just look at whether the coefficients are positive, or anything obvious like that.

14.3 Review Problems


Problem 14.31. Just by finding eigenvalues (without finding eigenvectors), determine what geometric shape (ellipse, hyperbola, pair of lines, line, empty set) the following are:
a. 5 x1² + 2 x2² − 4 x1 x2 = 1
b. 3 x1² + 8 x1 x2 + 3 x2² = 1
c. x1² − 4 x1 x2 + 4 x2² = 1
d. 6 x1 x2 + 8 x2² = 1
e. x1² − 6 x1 x2 + 9 x2² = 0
f. 4 x1 x2 − 3 x1² − 6 x2² = 1

Problem 14.32. What more can you do to normalize a quadratic form if you allow arbitrary invertible matrices instead of orthogonal ones?

14.4 Positivity
Definition 14.9. A quadratic form Q(x) is positive definite if Q(x) > 0 except if x = 0. (Clearly if x = 0 then Q(x) = 0.) For example, Q(x) = x² is a positive definite quadratic form on R, while Q(x) = x1² + x2² + · · · + xn² is positive definite on Rn. But it is not at all clear whether Q(x) = 6 x1² − 12 x1 x2 + 11 x2² is positive definite, because it has positive terms and negative ones. As in figure 14.2 on the facing page, we can also define positive semidefinite forms (Q(x) ≥ 0), negative definite forms (Q(x) < 0 for x ≠ 0), and indefinite forms (not positive semidefinite or negative semidefinite), but they are less important.
Lemma 14.10. A quadratic form Q(x) = ⟨x, Ax⟩ (with a symmetric matrix A) is positive definite just if all of the eigenvalues of A are positive.
Proof. Let λ1, λ2, . . . , λn be the eigenvalues, and change variables to y = F⁻¹ x, so x = F y, to diagonalize the quadratic form:
Q = λ1 y1² + λ2 y2² + · · · + λn yn².


Figure 14.2: Various behaviours of quadratic forms in two variables: (a) positive definite, (b) positive semidefinite, (c) indefinite, (d) negative definite.

Suppose that all of the eigenvalues are positive. Clearly this quantity is then positive for nonzero vectors y, because each term is positive or zero, and at least one of y1, y2, . . . , yn is not zero, so gives a positive term. On the other hand, if one of these λj is negative, then take y = ej, and you get Q ≤ 0 but x = F y = F ej ≠ 0.
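A small sketch of lemma 14.10 in numpy (our own sample matrices): a symmetric matrix gives a positive definite form just when all of its eigenvalues are positive.

    import numpy as np

    def is_positive_definite(A):
        """A is assumed symmetric; check that every eigenvalue is positive."""
        return bool(np.all(np.linalg.eigvalsh(A) > 0))

    print(is_positive_definite(np.array([[6., -6.], [-6., 11.]])))   # True
    print(is_positive_definite(np.array([[1., 2.], [2., 1.]])))      # False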

14.4 Review Problems


Problem 14.33. Which of the quadratic forms in problem 14.13 on page 140 are positive denite?

14.5 Appendix: Continuous Functions and Maxima


There is one gap in our proof of the spectral theorem for symmetric matrices: we need to know that a quadratic function on the sphere in Rn has a maximum. This appendix gives the proof. Warning: students are not required or expected to work through this appendix, which is advanced and is included only for completeness. Denition 14.11. A sequence of numbers x1 , x2 , . . . converges to a number x if, in order to make xj stay as close as we like to x, we have only to ensure that j is kept large enough. A sequence of points x1 y1 in R2 converges to a point x y if x1 , x2 , . . . converges to x and y1 , y2 , . . . converges to y . Similarly, a sequence of points in Rn converges if all of the coordinates of those points converge. , x2 y2 ,...


Any sequence of increasing real numbers, all of which are bounded from above by some large enough number, must converge to something. This fact is a property of real numbers which we cannot prove without giving an explicit and precise denition of the real numbers; see Spivak [14] for the complete story. We will just assume that this fact is true. Denition 14.12. A function f (x) of any number of variables x1 , x2 , . . . , xn (writing x for x1 . x= . . xn as a point of Rn ) is continuous if, in order to get f (y ) to stay as close to f (x) as you like, you have only to ensure that y is kept close enough to x. If two numbers are close, their sums, products and dierences are clearly close. The reader should try to prove: Lemma 14.13. The function f (x) = x1 is continuous. Constant functions are continuous. The sum, dierence and product of continuous functions is continuous. Corollary 14.14. Any polynomial function in any nite number of variables is continuous. Proof. Induction on the degree and number of terms of the polynomial. Denition 14.15. A ball in Rn is a set B consisting of all points closer than some distance to some chosen point (called the center of the ball). The distance is called the radius. A closed ball includes also the points of distance equal to the radius (an apple with the skin), while an open ball does not include any such points, only including points of distance less than the radius (an apple without the skin). Denition 14.16. A set S Rn is bounded if it lies in a ball (open or closed). Denition 14.17. A set S Rn is called closed if every point x of Rn not belonging to S can be surrounded by an open ball not belonging to S . Denition 14.18. A closed box is the set of points x = (x1 , . . . , xn ) for which each xj lies in some chosen interval, aj xj bj . An open box is the same but with aj < xj < bj . Lemma 14.19. A closed ball is a closed set, as is a closed box. Proof. Given a closed ball, say of radius r, take any point p not belonging to it, say of distance R from the center, and draw an open ball of radius R r about p. By the triangle inequality, no point in the open ball lies in the closed ball. Given a closed box, and a point p not belonging to it, there must be some coordinate of p which does not satisfy the inequalities dening the closed


box. For example, suppose that the box is cut out by inequalities including a1 x1 b1 , and p fails to satisfy these bounds because p1 > b1 . Then every point q closer to p than p1 b1 will still fail: q1 > b1 . So then a ball of radius p1 b1 around p will not overlap the closed box. Theorem 14.20. Every innite sequence of points in a closed, bounded set has a convergent subsequence. Proof. Suppose that the set is a box. Cover the box with a nite number of small closed boxes (perhaps overlapping). There are innitely many points x1 , x2 , . . . , and only nitely many of the small boxes, so there must be innitely many xj lying in the same small box. Similarly, subdivide that small box into much smaller closed boxes. Repeating, we nd a sequence of closed boxes, like Russian dolls, each contained entirely in the previous one, with innitely many xj in each. We get to choose how small the boxes are going to be at each step, so lets make them get much smaller at each step, with side lengths decreasing as rapidly as we like. Pick out one of these xi points, call it yj , from the j -th nested box, as the point with the smallest possible coordinates among all points of that box. The sequence of points y1 , y2 , . . . must converge, since all of the coordinates of the point yj are constrained by the box yj lies in, and each coordinate only increases with j . If we face a closed, bounded set S , which is not a closed box, then nd a closed box B containing it, and repeat the argument above. The problem is to ensure that the limit x of the sequence constructed belongs to the set S . Even if not, it certainly belongs to B . Since S is closed, if x does not belong to S ,then there must be an open ball around x not containing any points of S . But that open ball can not contain any of the points in the nested boxes, and therefore x cannot be their limit. Theorem 14.21. Every continuous function f on a closed, bounded set attains a maximum and a minimum. Proof. For the moment, lets suppose that our closed, bounded set is just a closed box. Suppose that f has no maximum. So the values of f can get larger and larger, but never peak. Let M be the smallest positive number so that f never exceeds M ; if there is no such number let M = . By denition, f gets as close to M as we like (which, if M = , means simply that f gets as large as we like), but never reaches M . Let x1 , x2 , x3 , . . . be any points of the closed bounded set on which f (xj ) approaches M . Taking a subsequence, we nd xj approaching a limit point x, and by continuity f (xj ) must approach f (x), so f (x) = M . So every continuous function on a closed bounded set has a maximum. If f is a continuous function on a closed bounded set, then f is too, and has a maximum, so f has a minimum.

15 Complex Vectors
The entire story so far can be retold with a cast of complex numbers instead of real numbers. Most of this is straightforward. But there turns out to be an important twist in the complex theory of the inner product. The minimum principle doesn't make any sense in the setting of complex numbers, and the spectral theorem as it was stated just isn't true any more for complex matrices. Moreover, the natural notion of inner product itself is quite different for complex vectors; this new notion leads directly to the complex spectral theorem.

15.1 Complex Numbers


Definition 15.1. A complex number is a pair (x, y) of real numbers. Write 1 to mean the pair (1, 0), and i to mean (0, 1). Addition is defined by the rule
(x, y) + (X, Y) = (x + X, y + Y),
subtraction by
(x, y) − (X, Y) = (x − X, y − Y),
and multiplication by
(x, y)(X, Y) = (xX − yY, xY + yX).
We henceforth write any pair as x + iy. We call x the real part and y the imaginary part. When working with complex numbers, we draw them as points of the xy-plane, which we call the complex plane. Complex numbers are associative, commutative and distributive, and every nonzero complex number z = x + iy has a reciprocal:
1/z = (x − iy) / (x² + y²).
You can easily check all of this, but you may assume it if you prefer.

15.2 Polar Coordinates


Trigonometry tells us that any point (x, y) of the plane can be written as x = r cos θ, y = r sin θ in polar coordinates. Therefore
x + iy = r cos θ + i r sin θ.


The number r is called the modulus of the complex number (written |z| if z is the complex number). The angle θ is called the argument of the complex number (written arg z if z is the complex number). The modulus is the distance from 0, or the length if we think of (x, y) as a vector.
Theorem 15.2 (de Moivre). Under multiplication of complex numbers, moduli multiply, while arguments add. Under division of complex numbers, moduli divide, while arguments subtract.
Proof. Let z and w be two complex numbers. Write them as z = r cos θ + i r sin θ and w = ρ cos φ + i ρ sin φ. Then calculate
z w = r ρ [(cos θ cos φ − sin θ sin φ) + i (sin θ cos φ + cos θ sin φ)].
A trigonometric identity:
cos θ cos φ − sin θ sin φ = cos(θ + φ),
sin θ cos φ + cos θ sin φ = sin(θ + φ).
Division is similar.
Problem 15.1. Explain why every complex number has a square root.
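A quick numerical illustration of de Moivre's theorem (our own sample numbers) using Python's built-in complex numbers: moduli multiply and arguments add.

    import cmath

    z = 1 + 1j
    w = 2j
    print(abs(z * w), abs(z) * abs(w))                          # equal moduli
    print(cmath.phase(z * w), cmath.phase(z) + cmath.phase(w))  # equal arguments (up to 2*pi)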

Definition 15.3. The conjugate z̄ of a complex number z = x + iy is the number z̄ = x − iy.
Problem 15.2. If z is a complex number, prove that |z|² = z z̄.
We write C for the set of complex numbers, and Cn for the set of vectors z = (z1, z2, . . . , zn) with each of z1, z2, . . . , zn a complex number. We won't go through the effort of translating the theorems above into complex linear algebra, except to say that all of the results before chapter 13 (on inner products) are still true for complex matrices, with identical proofs.

15.2 Review Problems


1 1 Problem 15.3. Draw the point z = 2 +1 , 2z , z 2 , z . 3 i on the plane. Draw z

Problem 15.4. Draw w = 1 + 2i and z = 2 + i, and draw z + w, zw,

z w.


Problem 15.5. The unit disk in the complex plane is the set of complex numbers of modulus less than 1. The unit circle is the set of complex numbers of modulus 1. Draw the unit circle. Pick z a nonzero complex number, and consider its integer powers z^n (n ranging over the integers). Prove that either infinitely many of these powers lie inside the unit disk, or all of them lie on the unit circle.
Problem 15.6. Let
zk = cos(2kπ/n) + i sin(2kπ/n).
Use de Moivre's theorem to show that zk^n = 1. These are the so-called nth roots of 1. Why are they all different for k = 0, 1, . . . , n − 1? Draw the 3rd roots of 1 and (in another colour) the 4th roots of 1.

Problem 15.7. With pictures and words, explain what you know about a. z + z , b. z 2 if |z | > 1, c. z if |z | = 1, d. zw if |z | = |w| = 1.

15.3 Complex Linear Algebra


The main differences between real and complex linear algebra are (1) eigenvalues and (2) inner products. We will consider inner products soon, but first let's consider eigenvalues.
Theorem 15.4. Every square complex matrix has a complex eigenvalue.
Proof. The eigenvalues are the roots of the characteristic polynomial det(A − λ I); a polynomial with complex number coefficients in a complex variable λ. The existence of a root of any complex polynomial is proven in the appendix.
Example 15.5. It may be that there are not very many eigenvectors. The matrix
A = [ 0 1 ]
    [ 0 0 ]
as a complex matrix still only has one eigenvalue, and (up to rescaling) one eigenvector. Two linearly independent eigenvectors are just what we need to diagonalize a 2 × 2 matrix. Clearly we cannot diagonalize A. So complex numbers don't resolve all of the subtleties.
Problem 15.8. Find the (complex) eigenvalues and eigenvectors of
A = [ 0 −1 ]
    [ 1  0 ]


and write down a matrix F which diagonalizes A. Moral of the story: even a matrix like A, which has only real number entries, can have complex number eigenvalues, and complex eigenvectors.

15.4 Hermitian Inner Product


The spectral theorem for symmetric matrices breaks down:
Problem 15.9. Prove that the symmetric complex matrix
[ 1  i ]
[ i −1 ]
is not diagonalizable.
We will find a complex spectral theorem, but with a different concept replacing symmetric matrices. The equation |z|² = z z̄ is very important. Think of a complex number as if it were a vector in the plane: z = x + iy = (x, y). Then |z|² = z z̄ is the squared length x² + y². This motivates:
Definition 15.6. The Hermitian inner product of two vectors z and w in Cn is the complex number
⟨z, w⟩ = z1 w̄1 + z2 w̄2 + · · · + zn w̄n.
The curious bars on top of the w terms allow us to write ∥z∥ = √⟨z, z⟩, just as we would for real vectors. Warning: the Hermitian inner product ⟨z, w⟩ is a complex number, not the real number we had in inner products before.
Problem 15.10. Compute ⟨z, w⟩, ∥z∥ and ∥w∥ for
z = (1, 2i), w = (i, 2 + 2i).

Problem 15.11. Prove that
a. ⟨w, z⟩ is the complex conjugate of ⟨z, w⟩
b. ⟨c z, w⟩ = c ⟨z, w⟩
c. ⟨z + w, u⟩ = ⟨z, u⟩ + ⟨w, u⟩
d. ⟨z, z⟩ ≥ 0
e. ⟨z, z⟩ = 0 just when z = 0
for z, w and u any complex vectors in Cn and c any complex number.
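A sketch of the Hermitian inner product in numpy, with sample vectors of our own choosing; note that numpy's vdot conjugates its first argument, so the arguments are swapped to match the convention above:

    import numpy as np

    z = np.array([1.0, 2j])
    w = np.array([1j, 2 + 2j])

    inner = np.sum(z * np.conj(w))               # <z, w> as defined above
    print(inner)
    print(np.vdot(w, z))                          # same number: vdot conjugates w
    print(np.sqrt(np.sum(z * np.conj(z)).real))   # the length of z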


15.5 Adjoint of a Matrix


Definition 15.7. The adjoint A* of a matrix A is the conjugate of the transpose: its entry (A*)ij is the complex conjugate of Aji. Note that (A*)* = A.
Problem 15.12. Prove that ⟨Az, w⟩ = ⟨z, A* w⟩ for any vectors z and w (if one side is defined, then they both are and they are equal).
Problem 15.13. Prove that if some matrices A and B satisfy ⟨Az, w⟩ = ⟨z, Bw⟩ for any vectors z and w for which this is defined, then B = A*.

15.6 Self-adjoint Matrices


Definition 15.8. A complex matrix A is self-adjoint if A = A*. This is the complex analogue of a symmetric matrix. Clearly self-adjoint matrices are square.
Problem 15.14. Prove that sums and differences of self-adjoint matrices are self-adjoint, and that any real multiple of a self-adjoint matrix is self-adjoint.
Example 15.9. The matrices
[ 1 0 ]   [ 0 1 ]   [ 0  i ]   [ 1  0 ]
[ 0 1 ] , [ 1 0 ] , [ −i 0 ] , [ 0 −1 ]
are self-adjoint.
Problem 15.15. Which diagonal matrices are self-adjoint?
Problem 15.16. Prove that the eigenvalues of a self-adjoint matrix are real numbers.

15.6 Review Problems


Problem 15.17. A matrix A is called skew-adjoint if A* = −A. Prove that a matrix A is skew-adjoint just when iA is self-adjoint, and vice versa.


15.7 Unitary Matrices


Definition 15.10. A complex matrix A is unitary if A* = A⁻¹. This is the complex analogue of an orthogonal matrix. Clearly every unitary matrix is square.
Problem 15.18. Prove that a matrix is unitary just when ⟨Az, Aw⟩ = ⟨z, w⟩ for any vectors z and w.
Problem 15.19. Which diagonal matrices are unitary?
Problem 15.20. If A is a real orthogonal matrix, prove that A is also unitary.

15.7 Review Problems


Problem 15.21. Prove that the eigenvalues of a unitary matrix are complex numbers of modulus 1.

15.8 Unitary Bases


Definition 15.11. A unitary basis of Cn is a complex basis u1, u2, . . . , un for which
⟨up, uq⟩ = 1 if p = q, and 0 if p ≠ q.
It may be helpful to use letters like p and q as subscripts, rather than i, j, to avoid confusion with the complex number i.
Problem 15.22. Any complex basis v1, v2, . . . , vn determines a unitary basis u1, u2, . . . , un by the complex Gram–Schmidt process:
wp = vp − Σ_{q<p} ⟨vp, uq⟩ uq,
up = wp / ∥wp∥.

Problem 15.23. Apply the complex Gram–Schmidt process to find a unitary basis for the basis
v1 = (1, i), v2 = (1, 2).


15.9 Normal Matrices


Definition 15.12. A complex matrix A is normal if AA* = A*A.
Problem 15.24. Prove that self-adjoint, skew-adjoint and unitary matrices are normal.
Problem 15.25. Which diagonal matrices are normal?
Problem 15.26. If A is normal, and c is a constant, prove that A + c 1 is also normal.
Lemma 15.13. If A is normal, and Az = 0 for some vector z, then A* z = 0.
Proof. Suppose that Az = 0. Then
0 = ∥Az∥² = ⟨Az, Az⟩ = ⟨z, A*A z⟩ = ⟨z, AA* z⟩ = ⟨A* z, A* z⟩ = ∥A* z∥².
So A* z = 0.
Lemma 15.14. If A is normal, then every eigenvector z of A with eigenvalue λ is also an eigenvector of A*, but with eigenvalue λ̄.
Proof. Let B = A − λ. Then Bz = 0. Moreover, B is normal since A is. By the previous lemma, B* z = 0, so A* z = λ̄ z.

15.10 The Spectral Theorem for Normal Matrices


At last, the complex version of our main theorem: normal matrices are diagonal after a unitary change of variables.
Theorem 15.15. A square matrix is normal just when it is unitarily diagonalizable.
Proof. Let A be unitarily diagonalizable. So F* A F = Λ is diagonal. Then A = F Λ F*, and one easily checks that AA* = A*A.
Let A be a normal matrix. Pick any eigenvector u1 of A, say with eigenvalue λ. Scale u1 to be a unit vector. Pick unit vectors u2, u3, . . . , un so that u1, u2, u3, . . . , un is a unitary basis. Let F be the associated unitary change of basis matrix, F = ( u1 u2 . . . un ).


Replace A by F* A F. After replacement A is still normal, and A e1 = λ e1; the first column of A is λ e1. So
A = [ λ B ]
    [ 0 C ]
for some smaller matrices B and C. By lemma 15.14, A* e1 = λ̄ e1, so the first column of A* is λ̄ e1, and so B = 0. Moreover, C is also normal, so by induction we can unitarily diagonalize C, and therefore A.
Corollary 15.16. Self-adjoint, skew-adjoint and unitary matrices are unitarily diagonalizable.
Problem 15.27. Let
A = [ 7/2   i/2 ]
    [ −i/2  7/2 ]
a. Is A self-adjoint or skew-adjoint?
b. Find the eigenvalues and eigenvectors of A.
c. Find a unitary matrix F which diagonalizes A, and unitarily diagonalize A.
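A numerical sketch of unitary diagonalization for a self-adjoint matrix (the sample matrix is our own choice, not the one in the problem); numpy's eigh handles complex Hermitian matrices and returns real eigenvalues and a unitary eigenvector matrix:

    import numpy as np

    A = np.array([[3.5, 0.5j],
                  [-0.5j, 3.5]])
    eigenvalues, F = np.linalg.eigh(A)
    print(eigenvalues)                                            # real: [3. 4.]
    print(np.allclose(F.conj().T @ A @ F, np.diag(eigenvalues)))  # F* A F is diagonal
    print(np.allclose(F.conj().T @ F, np.eye(2)))                 # F is unitary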

15.11 Appendix: The Fundamental Theorem of Algebra


There was one missing ingredient in the proof of the complex spectral theorem: we need to know that complex polynomials have zeros. This gap is lled in here. Warning: students are not required or expected to work through this appendix, which is advanced and is included only for completeness. Lemma 15.17. Take p(z ) = a0 + a1 z + + an z n any nonconstant polynomial. In order to keep p(z ) at as large a modulus as we like, we only have to keep z at a large enough modulus. Proof. We can assume that an = 0. Write p(z ) a0 a1 = n + n1 + + an . n z z z All of the terms get as small as we like, for z of large modulus, except the last one. So for large enough z , p(z )/z n is close to an . Since z n has large modulus, p(z ) must as well. Corollary 15.18. For any polynomial p(z ), there must be a point z = z0 at which p(z ) has smallest modulus.


Proof. By lemma 15.17, if we choose a large enough disk containing z0 then p(z ) has large modulus at each point z around the edge of that disk. Making the disk even larger if need be, we can ensure that the modulus all around the edge is larger than at some chosen point inside the disk. By theorem 14.21 on page 147, there is a point of the disk where p(z ) has minimum modulus among all points of that disk. The minimum cant be on the edge. Moreover, the modulus stays large as we move past the edge. Thus any minimum modulus point in that large disk is a minimum modulus point among all points of the plane. Lemma 15.19. The modulus |p(z )| of any nonconstant polynomial function reaches a minimum just where p(z ) reaches zero. Proof. Take any point z0 . Suppose that p (z0 ) = 0, and lets nd a reason why z0 is not a minimum modulus point. Replace p(z ) by p (z z0 ) if needed, to arrange that z0 = 0. Write out p(z ) = a0 + a1 z + a2 z 2 + + an z n . It might happen that a1 = 0, and maybe a2 too. So write p(z ) = a0 + ak z k + + an z n , writing down only the nonzero terms, in increasing order of their power of z . Clearly a0 = 0 because p(0) = 0. We can divide by a0 if we wish, which alters modulus only by a positive factor, so lets assume that a0 = 1. We can rotate the z variable, and rescale it, which rotates and scales each coecient. Thereby arrange ak = 1, so p(z ) = 1 z k + + an z n . Calculate |p(z )|2 = p(z )p(z ) = 1 zk z k + . . . , where the dots indicate terms involving more z and z factors. Write z = r cos + ir sin . De Moivres theorem gives |p(z )| = 1 2 rk cos k + rk+1 (. . . ) . The error term (. . . ) is some (probably very complicated) polynomial in r with (complicated) coecients involving cos and sin . We dont need to work it out. We only need to know that it is bounded for z near enough to 0, which is clear whatever the terms involved are. For r > 0 suciently small, 2 r(. . . ) > 0. Multiplying by rk , 2 rk + rk+1 (. . . ) < 0. Therefore |p(z )|2 gets even smaller at the point z = r than at z = 0.
2


Corollary 15.20. Every nonconstant complex polynomial has a root.
Theorem 15.21 (Fundamental Theorem of Algebra). Every nonconstant complex polynomial p(z) can be factored into linear factors. More specifically,
p(z) = c (z − z1)^d1 (z − z2)^d2 · · · (z − zk)^dk
where c is a constant, z1, z2, . . . , zk are the roots of p(z), and d1, d2, . . . , dk are positive integers, with sum d1 + d2 + · · · + dk equal to the degree of p(z).
Proof. We have a root, say z1. Therefore p(z)/(z − z1) is a polynomial, and we apply induction.

Abstraction


16 Vector Spaces
The ideas of linear algebra apply more widely, in more abstract spaces than Rn .

16.1 Definition
Denition 16.1. A vector space V is a set (whose elements are called vectors ) equipped with two operations, addition (written +) and scaling (written ), so that a. Addition laws: a) u + v is in V b) (u + v ) + w = u + (v + w) c) u + v = v + u for any vectors u, v, w in V , b. Zero laws: a) There is a vector 0 in V so that 0 + v = v for any vector v in V . b) For each vector v in V , there is a vector w in V , for which v + w = 0. c. Scaling laws: a) av is in V b) 1 v = v c) a(bv ) = (ab)v d) (a + b)v = av + bv e) a(u + v ) = au + av for any real numbers a and b, and any vectors u and v in V . Because (u + v ) + w = u + (v + w), we never need parentheses in adding up vectors. Example 16.2. Rn is a vector space, with the usual addition and scaling. Example 16.3. The set V of all real-valued functions of a real variable is a vector space: we can add functions (f + g )(x) = f (x) + g (x), and scale functions: (c f )(x) = c f (x). This example is the main motivation for developing an abstract theory of vector spaces. Example 16.4. Take some region inside Rn , like a box, or a ball, or several boxes and balls glued together. Let V be the set of all real-valued functions of that region. Unlike Rn , which comes equipped with the standard basis, there is no standard basis of V . By this, we mean that there is no collection of 161


functions fi we know how to write down so that every function f is a unique linear combination of the fi . Even still, we can generalize a lot of ideas about linear algebra to various spaces like V instead of just Rn . Practically speaking, there are only two types of vector spaces that we ever encounter: Rn (and its subspaces) and the space V of real-valued functions dened on some region in Rn (and its subspaces). Example 16.5. The set Rpq of p q matrices is a vector space, with usual matrix addition and scaling. Problem 16.1. If V is a vector space, prove that a. 0 v = 0 for any vector v , and b. a 0 = 0 for any scalar a.

Problem 16.2. Let V be the set of real-valued polynomial functions of a real variable. Prove that V is a vector space, with the usual addition and scaling.
Problem 16.3. Prove that there is a unique vector w for which v + w = 0. (Let's always call that vector −v.) Prove also that −v = (−1)v.
We will write u − v for u + (−v) from now on. We define linear relations, linear independence, bases, subspaces, bases of subspaces, and dimension using exactly the same definitions as for Rn.
Remark 16.6. Thinking as much as possible in terms of abstract vector spaces saves a lot of hard work. We will see many reasons why, but the first is that every subspace of any vector space is itself a vector space.

16.1 Review Problems


Problem 16.4. Prove that if u + v = u + w then v = w. Problem 16.5. Imagine that the population pj at year j is governed (at least roughly) by some equation pj +1 = apj + bpj 1 + cpj 2 . Prove that for xed a, b, c, the set of all sequences . . . , p1 , p2 , . . . which satisfy this law is a vector space. Problem 16.6. Give examples of subsets of the plane a. invariant under scaling of vectors (sending u to au for any number a), but not under addition of vectors. (In other words, if you scale vectors from your subset, they have to stay inside the subset, but if you add some vectors from your subset, you dont always get a vector from your subset.) b. invariant under addition but not under scaling or subtraction. c. invariant under addition and subtraction but not scaling. Problem 16.7. Take positive real numbers and add by the law u v = uv and scale by a u = ua . Prove that the positive numbers form a vector space with these funny laws for addition and multiplication.


16.2 Linear Maps


Denition 16.7. A linear map between vector spaces U and V is a rule which associates to each vector x from U a vector y from V , (which we write as y = T x) for which a. T (x0 + x1 ) = T x0 + T x1 b. T (ax) = aT x for any vectors x0 , x1 and x in U and real number a. We will write T : U V to mean that T is a linear map from U to V . Example 16.8. Let U be the vector space of all real-valued functions of real variable. Imagine 16 scientists standing one at each kilometer along a riverbank, each measuring the height of the river at the same time. The height at that time is a function h of how far you are along the bank. The 16 measurements of the function, say h(1), h(2), . . . , h(16), sit as the entries of a vector in R16 . So we have a map T : U R16 , given by sampling values of functions h(x) at various points x = 1, x = 2, . . . , x = 16. h(1) h(2) . Th = . . . h(16) This T is a linear map, possibly the most important type of map in all of science. Example 16.9. Any p q matrix A determines a linear map T : Rq Rp , by the equation T x = Ax. Conversely, given a linear map T : Rq Rp , dene a p q matrix A by letting the j -th column of A be T ej . Then T x = Ax. We say that A is the matrix associated to T . In this way we can identify the space of linear maps T : Rq Rp with the space of p q matrices. Example 16.10. There is an obvious linear map 1 : V V given by 1v = v for any vector v in V , and called the identity map We could easily confuse the number 1 with the identity map 1, once again a deliberate ambiguity. Similarly, we will write 2 : V V for the map which associates to each vector v the vector 2v , etc. Denition 16.11. If S : U V and T : V W are linear maps, then T S : U W is their composition. Problem 16.8. Prove that if A is the matrix associated to a linear map S : Rp Rq and B the matrix associated to T : Rq Rr , then BA is the matrix associated to their composition. Remark 16.12. From now on, we wont distinguish a linear map T : Rq Rp from its associated matrix, which we will also write as T . Once again, deliberate ambiguity has many advantages.
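A small numpy sketch of the recipe in example 16.9 (the linear map below is our own sample): the matrix of a linear map T from Rq to Rp has j-th column T(ej).

    import numpy as np

    def matrix_of(T, q):
        """Build the matrix of a linear map T acting on vectors in R^q."""
        columns = [T(np.eye(q)[:, j]) for j in range(q)]
        return np.column_stack(columns)

    T = lambda x: np.array([x[0] + 2 * x[1], 3 * x[1]])   # a sample linear map
    A = matrix_of(T, 2)
    print(A)                                              # [[1. 2.] [0. 3.]]
    print(np.allclose(A @ np.array([5., 7.]), T(np.array([5., 7.]))))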


Remark 16.13. A linear map between abstract vector spaces doesnt have an associated matrix; this idea only makes sense for maps T : Rq Rp . Example 16.14. Let U and V be two vector spaces. The set W of all linear maps T : U V is a vector space: we add linear maps by (T1 + T2 ) (u) = T1 (u) + T2 (u), and scale by (cT )u = cT u. Denition 16.15. The kernel of a linear map T : U V is the set of vectors u in U for which T u = 0. The image is the set of vectors v in V of the form v = T u for some u in U . Denition 16.16. A linear map T : U V is an isomorphism if a. T x = T y just when x = y (one-to-one) for any x and y in U , and b. For any z in W , there is some x in U for which T x = z (onto). Two vector spaces U and V are called isomorphic if there is an isomorphism between them. Being isomorphic means eectively being the same for purposes of linear algebra. Problem 16.9. Prove that a linear map T : U V is an isomorphism just when its kernel is 0, and its image is V . Problem 16.10. Let V be a vector space. Prove that 1 : V V is an isomorphism.

Problem 16.11. Prove that an isomorphism T : U → V has a unique inverse map T⁻¹ : V → U so that T⁻¹ T = 1 and T T⁻¹ = 1, and that T⁻¹ is linear.

Problem 16.12. Let V be the set of polynomials of degree at most 2, and map T : V → R3 by, for any polynomial p, T p = (p(0), p(1), p(2)). Prove that T is an isomorphism.

16.2 Review Problems


Problem 16.13. Prove that the space of all p × q matrices is isomorphic to R^(pq).


16.3 Subspaces
The denition of a subspace is identical to that for Rn . Example 16.17. Let V be the set of real-valued functions of a real variable. The set P of continuous real-valued functions of a real variable is a subspace of V . Example 16.18. Let V be the set of all innite sequences of real numbers. We add a sequence x1 , x2 , x3 , . . . to a sequence y1 , y2 , y3 , . . . to make the sequence x1 + y1 , x2 + y2 , x3 + y3 , . . . . We scale a sequence by scaling each entry. The set of convergent innite sequences of real numbers is a subspace of V . In these last two examples, we see that a large part of analysis is encoded into subspaces of innite dimensional vector spaces. (We will dene dimension shortly.) Problem 16.14. Describe some subspaces of the space of all real-valued functions of a real variable.

16.3 Review Problems


Problem 16.15. Which of the following are subspaces of the space of real-valued functions of a real variable? a. The set of everywhere positive functions. b. The set of nowhere positive functions. c. The set of functions which are positive somewhere. d. The set of polynomials which vanish at the origin. e. The set of increasing functions. f. The set of functions f (x) for which f (x) = f (x). g. The set of functions f (x) each of which is bounded from above and below by some constant functions.

Problem 16.16. Which of the following are subspaces of vector space of all 3 3 matrices? a. The invertible matrices. b. The noninvertible matrices. c. The matrices with positive entries. d. The upper triangular matrices. e. The symmetric matrices. f. The orthogonal matrices.

Problem 16.17. a. Let H be an n × n matrix. Let P be the set of all matrices A for which AH = HA. Prove that P is a subspace of the space V of all n × n matrices.

b. Describe this subspace P for
H = [ 1  0 ]
    [ 0 −1 ]


16.4 Bases
We define linear combinations, linear relations, linear independence, bases, the span of a set of vectors, eigenvalues, and eigenvectors identically.
Problem 16.18. Find bases for the following vector spaces:
a. The set of polynomial functions of degree 3 or less.
b. The set of 3 × 2 matrices.
c. The set of n × n upper triangular matrices.
d. The set of polynomial functions p(x) of degree 3 or less which vanish at the origin x = 0.

Remark 16.19. When working with an abstract vector space V, the role that has up to now been played by a change of basis matrix will henceforth be played by an isomorphism F : Rn → V. Equivalently, F e1, F e2, . . . , F en is a basis of V.
Example 16.20. Let V be the vector space of polynomials p(x) = a + bx + cx² of degree at most 2. Let F : R3 → V be the map
F(a, b, c) = a + bx + cx².
Clearly F is an isomorphism.
Definition 16.21. The dimension of a vector space V is n if there is an isomorphism F : Rn → V. If there is no such value of n, then we say that V has infinite dimension.
Remark 16.22. We can include the possibility that n = 0 by defining R0 to consist in just a single vector 0, a zero dimensional vector space.
Problem 16.19. Prove that the definition of dimension is well-defined, i.e. that there is either only one such value of n, or no such value of n.

Problem 16.20. Let V be the set of polynomials of degree at most p in n variables. Find the dimension of V . Problem 16.21. Prove that if linear maps satisfy P S = T and P is an isomorphism, then S and T have the same kernel, and isomorphic images.


Problem 16.22. Prove that if linear maps satisfy SP = T , and P is an isomorphism, then S and T have the same image and isomorphic kernels. Problem 16.23. Prove that dimension is invariant under isomorphism.

Problem 16.24. Prove that for any subspace Z of a nite dimensional vector space U , there is basis for U z1 , z2 , . . . , zp , u1 , u2 , . . . , uq so that z1 , z2 , . . . , zp , form a basis for Z .

Theorem 16.23. If v1 , v2 , . . . , vn is a basis for a vector space V , and w1 , w2 , . . . , wn are any vectors in a vector space W , then there is a unique linear map T : V W so that T vi = wi . Proof. If there were two such maps, say S and T , then S T would vanish on v1 , v2 , . . . , vn , and therefore by linearity would vanish on any linear combination of v1 , v2 , . . . , vn , therefore on any vector, so S = T . To see that there is such a map, we know that each vector x in V can be written uniquely as x = x 1 v 1 + x 2 v2 + + x n vn . So lets dene T x = x1 w1 + x2 w2 + + xn wn . If we take two vectors, say x and y , and write them as linear combinations of basis vectors, say with x = x 1 v 1 + x 2 v2 + + x n vn , y = y 1 v1 + y 2 v2 + + y n vn ,

then T (x + y ) = (x1 + y1 ) w1 + (x2 + y2 ) w2 + + (xn + yn ) wn = T x + T y. Similarly, if we scale a vector x by a number a, then ax = a x1 v1 + a x2 v2 + + a xn vn ,

so that
T(ax) = a x1 w1 + a x2 w2 + · · · + a xn wn = a T x.
Therefore T is linear.


Denition 16.24. If T : U V is a linear map, and W is a subspace of U , the restriction, written T |W : W V , is the linear map dened by T |W (w) = T w for w in W , only allowing vectors from W to map through T . Theorem 16.25. Let T : U V be a linear transformation of nite dimensional vector spaces. Then dim ker T + dim im T = dim U. Proof. Problem 16.24 on the previous page shows that we can pick a basis z1 , z2 , . . . , zp , u1 , u2 , . . . , uq for U so that z1 , z2 , . . . , zp is a basis for ker T . Let w1 = T u1 , w2 = T u2 , . . . , wq = T uq . Every vector in im T can be written as y = T x for some vector x in U . But then x can be written in terms of the basis, say as x = a1 z1 + a2 z2 + + ap zp + b1 u1 + b2 u2 + + bq uq . So y = Tx = a1 T z1 + a2 T z2 + + ap T zp + b1 T u1 + b2 T u2 + + bq T uq = b1 T u1 + b2 T u2 + + bq T uq = b1 w1 + b2 w2 + + bq wq . Therefore w1 , w2 , . . . , wq span im T . If there is a linear relation between w1 , w2 , . . . , wq , say 0 = c1 w1 + c2 w2 + + cq wq , then 0 = T (c1 u1 + c2 u2 + + cq uq ) . Therefore c1 u1 + c2 u2 + + cq uq lies in ker T , and so can be written in terms of the basis z1 , z2 , . . . , zp of ker T , say as c1 u1 + c2 u2 + + cq uq = d1 z1 + d2 z2 + + dp zp , a linear relation among the vectors of our basis of U , impossible. So w1 , w2 , . . . , wq are linearly independent, and so form a basis of im T . Finally, we have a basis of U with p + q vectors in it, a basis of ker T with p vectors in it, and a basis of im T with q vectors in it, so dim ker T + dim im T = dim U . Remark 16.26. In this theorem, we could also allow U , V , or both to have innite dimension, as well as allowing kernel and image to have innite dimension, with the understanding that + = . However, in this book we will content ourselves with nite dimensional vector spaces.


16.5 Determinants
Definition 16.27. If T : V → V is a linear map taking a finite dimensional vector space to itself, define det T to be det T = det A, where F : Rn → V is an isomorphism, and A is the matrix associated to F⁻¹ T F : Rn → Rn.
Remark 16.28. There is no definition of determinant for a linear map of an infinite dimensional vector space, and there is no general theory to handle such things, although there are many important examples.
Remark 16.29. A map T : U → V between different vector spaces doesn't have a determinant.
Problem 16.25. Prove that the value of the determinant is independent of the choice of isomorphism F.

Problem 16.26. Let V be the vector space of polynomials of degree at most 2, and let T : V -> V be the linear map T p(x) = 2 p(x - 1) (shifting a polynomial p(x) to 2 p(x - 1)). For example, T 1 = 2, T x = 2(x - 1), T x^2 = 2(x - 1)^2.
a. Prove that T is a linear map.
b. Prove that T is an isomorphism.
c. Find det T.

16.5 Review Problems


Problem 16.27. Let T : V -> V be the linear map T x = 2x. Suppose that V has dimension n. What is det T?

Problem 16.28. Let V be the vector space of all 2 x 2 matrices. Let A be a 2 x 2 matrix with two different eigenvalues, λ1 and λ2, and eigenvectors x1 and x2 corresponding to these eigenvalues. Consider the linear map T : V -> V given by T B = AB (multiplying B on the left by A). What are the eigenvalues of T and what are the eigenvectors? (Warning: the eigenvectors are vectors from V, so they are matrices.) What is det T?

Problem 16.29. The same problem, but with T B = BA.


Problem 16.30. Let V be the vector space of polynomials of degree at most 2, and let T : V -> V be defined by T q(x) = q'(x), the derivative of q. What is the characteristic polynomial of T? What are the eigenspaces of T? Is T diagonalizable?

Problem 16.31. (Due to Peter Lax [6].) Consider the problem of finding a polynomial p(x) with specified average values on each of a dozen intervals on the x-axis. (Suppose that the intervals don't overlap.) Does this problem have a solution? Does it have many solutions? (All you need is a naive notion of average value, but you can consult a calculus book, for example [14], for a precise definition.)
(a) For each polynomial p of degree less than n, let T p be the vector whose entries are the averages. Suppose that the number of intervals is at least n. Show that T p = 0 only if p = 0.
(b) Suppose that the number of intervals is no more than n. Show that we can solve T p = b for any given vector b.

Problem 16.32. How much of the nutshell (table 12.1 on page 114) can you translate into criteria for invertibility of a linear map T : U -> V? How much more if we assume that U and V are finite dimensional? How much more if we assume that U = V?

16.6 Complex Vector Spaces


If we change the definition of a vector space, a linear map, etc. to use complex numbers instead of real numbers, we have a complex vector space, complex linear map, etc. All of the examples so far in this chapter work just as well with complex numbers replacing real numbers. We will refer to a real vector space or a complex vector space to distinguish the sorts of numbers we are using to scale the vectors. Some examples of complex vector spaces:
a. Cn
b. The space of p x q matrices with complex entries.
c. The space of complex-valued functions of a real variable.
d. The space of infinite sequences of complex numbers.

16.7 Inner Product Spaces


Definition 16.30. An inner product on a real vector space V is a choice of a real number ⟨x, y⟩ for each pair of vectors x and y so that
a. ⟨x, y⟩ is a real-valued linear map in x for each fixed y
b. ⟨x, y⟩ = ⟨y, x⟩
c. ⟨x, x⟩ ≥ 0, and equal to 0 just when x = 0.


A real vector space equipped with an inner product is called an inner product space. A linear map between inner product spaces is called orthogonal if it preserves inner products.

Theorem 16.31. Every inner product space of dimension n is carried by some orthogonal isomorphism to Rn with its usual inner product.

Proof. Use the Gram-Schmidt process to construct an orthonormal basis, using the same formulas we have used before, say u1, u2, ..., un. Define a linear map F x = x1 u1 + ... + xn un, for x in Rn. Clearly F is an orthogonal isomorphism.

Example 16.32. Take A any symmetric n x n matrix with positive eigenvalues, and let ⟨x, y⟩_A = ⟨Ax, y⟩ (with the usual inner product on Rn appearing on the right hand side). Then the expression ⟨x, y⟩_A is an inner product. Therefore by the theorem, we can find a change of variables taking it to the usual inner product.

Definition 16.33. A linear map T : V -> V from an inner product space to itself is symmetric if ⟨T v, w⟩ = ⟨v, T w⟩ for any vectors v and w.

Theorem 16.34 (Spectral Theorem). Given a symmetric linear map T on a finite dimensional inner product space V, there is an orthogonal isomorphism F : Rn -> V for which F^-1 T F is the linear map of a diagonal matrix.
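For matrices, the spectral theorem is easy to check numerically. The following sketch is my own illustration (it assumes numpy, and the symmetric matrix is chosen arbitrarily); eigh is the routine for symmetric matrices, and returns an orthogonal matrix of eigenvectors.

import numpy as np

# An arbitrary real symmetric matrix.
A = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
])

eigenvalues, F = np.linalg.eigh(A)

# F is orthogonal, and F^T A F is (numerically) diagonal.
print(np.allclose(F.T @ F, np.eye(3)))                 # True
print(np.allclose(F.T @ A @ F, np.diag(eigenvalues)))  # True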

16.8 Hermitian Inner Product Spaces


Definition 16.35. A Hermitian inner product on a complex vector space V is a choice of a complex number ⟨z, w⟩ for each pair of vectors z and w from V so that
a. ⟨z, w⟩ is a complex-valued linear map in z for each fixed w
b. ⟨z, w⟩ is the complex conjugate of ⟨w, z⟩
c. ⟨z, z⟩ ≥ 0, and equal to 0 just when z = 0.

16.8 Review Problems


Problem 16.33. Let V be the vector space of complex-valued polynomials of a complex variable of degree at most 3. Prove that for any four distinct points z0, z1, z2, z3, the expression

⟨p(z), q(z)⟩ = p(z0) q(z0)* + p(z1) q(z1)* + p(z2) q(z2)* + p(z3) q(z3)*,

where * denotes complex conjugation, is a Hermitian inner product.

Problem 16.34. Continuing the previous question, if the points z0, z1, z2, z3 are z0 = 1, z1 = -1, z2 = i, z3 = -i, prove that the map T : V -> V given by T p(z) = p(-z) is unitary.


Problem 16.35. Continuing the previous two questions, unitarily diagonalize T.

Problem 16.36. State and prove a spectral theorem for normal complex linear maps T : V -> V on a Hermitian inner product space, and define the terms adjoint, normal and unitary for complex linear maps V -> V.

17 Fields
Instead of real or complex numbers, we can dream up wilder notions of numbers.

Definition 17.1. A field is a set F equipped with operations + and · so that
a. Addition laws
   a) x + y is in F
   b) (x + y) + z = x + (y + z)
   c) x + y = y + x
   for any x, y and z from F.
b. Zero laws
   a) There is an element 0 of F for which x + 0 = x for any x from F
   b) For each x from F there is a y from F so that x + y = 0.
c. Multiplication laws
   a) xy is in F
   b) x(yz) = (xy)z
   c) xy = yx
   for any x, y and z in F.
d. Identity laws
   a) There is an element 1 in F for which x1 = 1x = x for any x in F.
   b) For each x ≠ 0 there is a y ≠ 0 for which xy = 1. (This y is called the reciprocal or inverse of x.)
   c) 1 ≠ 0
e. Distributive law
   a) x(y + z) = xy + xz for any x, y and z in F.

We will not ask the reader to check all of these laws in any of our examples, because there are just too many of them. We will only give some examples; for a proper introduction to fields, see Artin [1].

Example 17.2. Of course, the set of real numbers R is a field (with the usual addition and multiplication), as is the set C of complex numbers and the set Q of rational numbers. The set Z of integers is not a field, because the integer 2 has no integer reciprocal.


Example 17.3. Let F be the set of all rational functions p(x)/q(x), with p(x) and q(x) polynomials, and q(x) not the 0 polynomial. Clearly for any pair of rational functions, the sum

p1(x)/q1(x) + p2(x)/q2(x) = (p1(x) q2(x) + q1(x) p2(x)) / (q1(x) q2(x))

is also rational, as is the product, and the reciprocal.

Problem 17.1. Suppose that F is a field. Prove the uniqueness of 0, i.e. that there is only one element z in F which satisfies x + z = x for every element x.

Problem 17.2. Prove the uniqueness of 1.

Problem 17.3. Let x be an element of a field F. Prove the uniqueness of the element y for which x + y = 0. Henceforth, we write this y as -x.

Problem 17.4. Let x be an element of a field F. If x ≠ 0, prove the uniqueness of the reciprocal. Henceforth, we write the reciprocal of x as 1/x, and write x + (-y) as x - y.

17.1 Some Finite Fields


Example 17.4. Let F be the set of numbers F = {0, 1}. Carry out multiplication by the usual rule, but when you add, x + y won't mean the usual addition: instead it will mean the usual addition except when x = y = 1, and then we set 1 + 1 = 0. F is a field called the field of Boolean numbers.

Problem 17.5. Prove that for Boolean numbers, -x = x and 1/x = x.

Example 17.5. Suppose that p is a positive integer. Let F be the set of numbers Fp = {0, 1, 2, ..., p - 1}. Define addition and multiplication as usual for integers, but if the result is bigger than p - 1, then subtract multiples of p from the result until it lands in Fp, and let that be the definition of addition and multiplication. F2 is the field of Boolean numbers. We usually write x = y (mod p) to mean that x and y differ by a multiple of p. For example, if p = 7, we find

5 · 6 = 30
      = 30 - 28 (mod 7)
      = 2 (mod 7).

This is arithmetic in F7. It turns out that Fp is a field for any prime number p.
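A minimal Python sketch of this arithmetic (my own illustration; the % operator does the "subtract off multiples of p" step):

p = 7

def add(x, y):
    # add as integers, then reduce into Fp
    return (x + y) % p

def mul(x, y):
    return (x * y) % p

print(mul(5, 6))   # 2, matching the computation above
print(add(5, 6))   # 4, since 11 - 7 = 4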


Problem 17.6. Prove that Fp is not a field if p is not prime.

The only trick in seeing that Fp is a field is to see why there is a reciprocal. It can't be the usual reciprocal as a number. For example, if p = 7,

6 · 6 = 36 (mod 7)
      = 36 - 35 (mod 7)
      = 1 (mod 7)

(because 35 is a multiple of 7). So 6 has reciprocal 6 in F7.

The Euclidean Algorithm


To compute reciprocals, we first need to find greatest common divisors, using the Euclidean algorithm. The basic idea: given two numbers, for example 12132 and 2304, divide the smaller into the larger, writing a quotient and remainder:

12132 - 5 · 2304 = 612.

Take the two last numbers in the equation (2304 and 612 in this example), and repeat the process on them, and so on:

2304 - 3 · 612 = 468
612 - 1 · 468 = 144
468 - 3 · 144 = 36
144 - 4 · 36 = 0.

Stop when you hit a remainder of 0. The greatest common divisor of the numbers you started with is the last nonzero remainder (36 in our example). Now that we can find the greatest common divisor, we will need to write the greatest common divisor as an integer linear combination of the original numbers. If we write the two numbers we started with as a and b, then our goal is to compute integers u and v for which ua + vb = gcd(a, b). To do this, let's go backwards. Start with the second to last equation, giving the greatest common divisor:

36 = 468 - 3 · 144

Plug the previous equation into it:

   = 468 - 3 (612 - 1 · 468)

Simplify:

   = -3 · 612 + 4 · 468

Plug in the equation before that:

   = -3 · 612 + 4 (2304 - 3 · 612)
   = 4 · 2304 - 15 · 612
   = 4 · 2304 - 15 (12132 - 5 · 2304)
   = -15 · 12132 + 79 · 2304.

We have it: gcd(a, b) = u a + v b, in our case 36 = -15 · 12132 + 79 · 2304. What does this algorithm do? At each step downward, we are facing an equation like a - bq = r, so any number which divides into a and b must divide into r and b (the next a and b) and vice versa. The remainders r get smaller at each step, always smaller than either a or b. On the last line, b divides into a. Therefore b is the greatest common divisor of a and b on the last line, and so is the greatest common divisor of the original numbers. We express each remainder in terms of previous a and b numbers, so we can plug them in, cascading backwards until we express the greatest common divisor in terms of the original a and b. In the example, that gives (-15)(12132) + (79)(2304) = 36.

Let's compute a reciprocal modulo an integer: say 17^-1 modulo 1001. Take a = 1001, and b = 17.

1001 - 58 · 17 = 15
17 - 1 · 15 = 2
15 - 7 · 2 = 1
2 - 2 · 1 = 0.

Going backwards,

1 = 15 - 7 · 2
  = 15 - 7 (17 - 1 · 15)
  = -7 · 17 + 8 · 15
  = -7 · 17 + 8 (1001 - 58 · 17)
  = -471 · 17 + 8 · 1001.

So finally, (-471)(17) + (8)(1001) = 1. Modulo 1001, (-471)(17) = 1. So 17^-1 = -471 = 1001 - 471 = 530 (mod 1001). This is how we can compute reciprocals in Fp: we take a = p, and b the number to reciprocate, and apply the process. If p is prime, the resulting greatest common divisor is 1, and so we get up + vb = 1, and so vb = 1 (mod p), so v is the reciprocal of b.

Problem 17.7. Compute 15^-1 in F79.
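Here is a short Python sketch of the extended Euclidean algorithm (my own illustration, using only basic arithmetic); it reproduces the computation above.

def extended_gcd(a, b):
    # returns (g, u, v) with g = gcd(a, b) and u*a + v*b = g
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    # g = u*b + v*(a - (a//b)*b) = v*a + (u - (a//b)*v)*b
    return g, v, u - (a // b) * v

def reciprocal_mod(b, p):
    g, u, v = extended_gcd(p, b)
    if g != 1:
        raise ValueError("no reciprocal: gcd is not 1")
    return v % p

print(extended_gcd(1001, 17))    # (1, 8, -471)
print(reciprocal_mod(17, 1001))  # 530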

Problem 17.8. Solve the linear equation 3x + 1 = 0 in F5 .


Problem 17.9. Prove that Fp is a field whenever p is a prime number.

17.2 Matrices
Matrices with entries from any field F are added, subtracted, and multiplied by the same rules. We can still carry out forward elimination, back substitution, calculate inverses, determinants, characteristic polynomials, eigenvectors and eigenvalues, using the same steps.

Problem 17.10. Let F be the Boolean numbers, and A the matrix

A = | 0  1  0 |
    | 1  0  1 |
    | 1  1  0 |,

thought of as having entries from F. Is A invertible? If so, find A^-1.

All of the ideas of linear algebra worked out for the real and complex numbers have obvious analogues over any field, except for the concept of inner product, which is much more sophisticated. From now on, we will only state and prove results for real vector spaces, but those results which do not require inner products (or orthogonal or unitary matrices) continue to hold with identical proofs over any field.

Problem 17.11. If A is a matrix whose entries are rational functions of a variable t, prove that the rank of A is constant in t, except for finitely many values of t.
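The following Python sketch (my own illustration) runs Gauss-Jordan elimination modulo a prime p; for pivots it uses Python's built-in modular inverse, pow(x, -1, p), available in Python 3.8 and later. Over F2, the Boolean numbers, the only nonzero pivot is 1, so the reciprocals are trivial.

def inverse_mod_p(A, p):
    # Gauss-Jordan elimination on [A | I], all arithmetic mod p.
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # not invertible over Fp
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)          # reciprocal in Fp
        M[col] = [x * inv % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [(M[r][j] - factor * M[col][j]) % p for j in range(2 * n)]
    return [row[n:] for row in M]

A = [[0, 1, 0], [1, 0, 1], [1, 1, 0]]
print(inverse_mod_p(A, 2))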

Geometry and Combinatorics


18 Permutations and Determinants


In this chapter, we present explicit, theoretically important, but computationally infeasible formulas for determinants, matrix inversion and solving linear equations.

18.1 Determinants Via Permutations


Recall the definition of determinant:

Definition 18.1. Let A(ij) be the matrix obtained by cutting out row i and column j from a matrix A. The determinant of an n x n matrix A is
a. If A is 1 x 1, say A = (a), then det A = a.
b. Otherwise

det A = A11 det A(11) - A21 det A(21) + A31 det A(31) - A41 det A(41) + ...
      = Σ_i (-1)^(i+1) Ai1 det A(i1).

Remark 18.2. There is no standard notation in the mathematical literature for A(ij). We won't refer to A(ij) again after this chapter.

We can also expand down any column, or across any row:

Theorem 18.3. For any square matrix A,

det A = Σ_i (-1)^(i+j) Aij det A(ij), for any column j
      = Σ_j (-1)^(i+j) Aij det A(ij), for any row i.

Finally, we can expand into a sum of permutations:

Theorem 18.4.

det A = Σ (-1)^N Ai1 1 Ai2 2 ... Ain n,     (18.1)


for any square matrix A, where the sum is over all permutations i1, i2, ..., in of 1, 2, ..., n, and N is the number of transpositions in some sequence of transpositions taking the permutation i1, i2, ..., in back to 1, 2, ..., n.

Remark 18.5. This permutation formula for the determinant is important for the theory. However, it is too slow as a method for computing determinants. For a 20 x 20 matrix, Gauss-Jordan elimination takes approximately 2489 multiplications and divisions, while the permutation formula takes approximately 46225138155356160000 multiplications, so about 10^16 times as long: too long for any supercomputer that will ever be built.

Remark 18.6. The number (-1)^N is called the sign of the permutation i1, i2, ..., in. For example, the sign of 1, 3, 2 is (-1)^1 = -1. Keep in mind that a permutation might factor into transpositions in different ways, using different numbers of transpositions. It is not obvious whether there could be two different ways to carry out a permutation in terms of transpositions, with two different values for (-1)^N (i.e. one with an odd number of transpositions, and the other an even number of transpositions). However, this will follow from the proof below.

Proof. First, let's forget about the minus signs. Run your finger down the first column, and you pick up Ai1 1, and multiply it by det A(i1 1). This det A(i1 1) is calculated similarly, except that row i1 and column 1 have been deleted, so none of the terms come from that row or column. Therefore (inductively) all of the terms look just right, except for the (very confusing) minus signs. In each term, the numbers i1, i2, ..., in label deleted rows, in the order in which we delete them, and 1, 2, ..., n label deleted columns. There is only one way to generate each term, given by the permutation i1, i2, ..., in, since i1 labels which row we delete first, etc. So each term we have written above shows up just once, with a plus or minus sign. Let's fix the minus signs. The term A11 A22 ... Ann occurs with a plus sign, because we start with a plus sign when we run our finger down the first column, so it is clear by induction. When we swap rows, we switch the sign of the whole determinant. Each term is different from any other (coming from a different permutation), and has just a plus or minus sign in front of it. Swapping rows can alter signs of terms, and changes the order in which the terms appear, but the same terms are still sitting there. So every term must switch sign when we swap any two rows. The sign in front of A12 A21 A33 A44 ... Ann, for example, must be -, since it comes about from one swapping (of rows 1 and 2) applied to A11 A22 ... Ann. The number N of transpositions needed to reorder the permutation i1, i2, ..., in of rows into 1, 2, ..., n is the number of minus signs in front of that term. (The minus signs cancel in pairs.)
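As a concrete (and deliberately slow) illustration of formula 18.1, here is a brute-force determinant in Python, a sketch of my own; it uses itertools from the standard library and computes the sign by counting inversions, which has the same parity as the number of transpositions.

from itertools import permutations

def sign(perm):
    # (-1)^N: count inversions (pairs out of order); each inversion
    # can be removed by one transposition of neighbours.
    inversions = sum(1 for a in range(len(perm)) for b in range(a + 1, len(perm))
                     if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for j in range(n):
            term *= A[perm[j]][j]   # the factor A_{i_j, j} with i_j = perm[j]
        total += term
    return total

A = [[1, 2, 0], [3, 1, 4], [0, 2, 2]]
print(det_by_permutations(A))   # -18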


Corollary 18.7.

det A = Σ (-1)^N A1 j1 A2 j2 ... An jn,

for any square matrix A, where the sum is over all permutations j1, j2, ..., jn of 1, 2, ..., n, and N is the number of transpositions in some sequence of transpositions taking the permutation j1, j2, ..., jn back to 1, 2, ..., n.

Proof. Take a term in the row permutation formula 18.1, say (-1)^N Ai1 1 Ai2 2 ... Ain n. Scramble the factors back into order by first index, say (-1)^N A1 j1 A2 j2 ... An jn. So j1, j2, ..., jn is the inverse permutation of i1, i2, ..., in. The inverse permutation can be brought about by reversing the transpositions that bring about the original permutation i1, i2, ..., in. So it has the same number of transpositions N, and therefore the same sign.

18.2 Determinants Multiply: Permutation Proof


Let's see a different proof that det AB = det A det B, using the permutation formulas above instead of forward elimination. Write the columns of AB as

AB ej = A Σ_i Bij ei = Σ_i Bij A ei.

Calculate the determinant (the sums with sign (-1)^N run over permutations i1, i2, ..., in):

det AB = Σ (-1)^N (AB)i1 1 (AB)i2 2 ... (AB)in n.

Use (AB)ij = Σ_k Aik Bkj to expand each factor:

det AB = Σ (-1)^N (Σ_k1 Ai1 k1 Bk1 1)(Σ_k2 Ai2 k2 Bk2 2) ... (Σ_kn Ain kn Bkn n)
       = Σ_{k1, k2, ..., kn} Bk1 1 Bk2 2 ... Bkn n Σ (-1)^N Ai1 k1 Ai2 k2 ... Ain kn
       = Σ_{k1, k2, ..., kn} Bk1 1 Bk2 2 ... Bkn n det ( A ek1   A ek2   ...   A ekn ).

If k1 = k2, then two columns in here are equal, so the resulting determinant is zero. Therefore this is really a sum over permutations k1, k2, ..., kn. Reorder the columns:

det AB = Σ Bk1 1 ... Bkn n (-1)^N det ( A e1   ...   A en )
       = Σ Bk1 1 ... Bkn n (-1)^N det A
       = det A Σ (-1)^N Bk1 1 ... Bkn n
       = det A det B.

18.2 Review Problems


Problem 18.1. Prove that the sign of a permutation p is the determinant of its permutation matrix.

Problem 18.2. Prove that the sign of qp is the product of the signs of q and p, and that the sign of the inverse permutation p^-1 is the sign of p.

Problem 18.3. Let Q(x1, x2, ..., xn) be the product of all possible terms of the form xi - xj where i < j. For example, if n = 2, then Q(x1, x2) = x1 - x2, while if n = 3, then

Q(x1, x2, x3) = (x1 - x2)(x1 - x3)(x2 - x3).

(a) If n = 4, what is Q(x1, x2, x3, x4)?
(b) Prove that if we permute the variables by a permutation p, then

Q(xp(1), xp(2), ..., xp(n)) = ε Q(x1, x2, ..., xn),

where ε is the sign of p. (We could use this equation as the definition of the sign of a permutation.)

Problem 18.4. Prove that a matrix is a permutation matrix just when it is a product of permutation matrices of transpositions.

Problem 18.5. For the reader who has read chapter 13: why are permutation matrices orthogonal? For which permutations is the associated permutation matrix symmetric?

Problem 18.6. Let's write n! (read as n factorial) for how many permutations there are of the numbers 1, 2, ..., n. Prove that n! = n(n - 1)(n - 2) ... (2)(1).

Problem 18.7. If all entries of an n x n matrix A satisfy |Aij| ≤ R:
a. Prove that |det A| ≤ n! R^n.
b. Using caution with how you pick your pivots, by forward elimination and induction, prove that |det A| ≤ 2^(n(n-1)/2) R^n.
c. Give some evidence as to which is the better bound.

18.3 Cramer's Rule


The permutation formula for the determinant is no help in calculation, but it has some theoretical power. There are similarly impractical formulas for the inverse of a matrix, and for the solutions of linear equations. The formulas give insight into how complicated inverses and solutions can get as functions of matrix entries. Recall the notation: A(ij) means the matrix A with row i and column j deleted. Recall from theorem 7.9 on page 65: for any square matrix A,

det A = Σ_i (-1)^(i+j) Aij det A(ij), for any column j
      = Σ_j (-1)^(i+j) Aij det A(ij), for any row i.

Definition 18.8. The adjugate matrix adj A of a square matrix A is the matrix whose entries are

(adj A)ij = (-1)^(j+i) det A(ji).

Note the placement of i and j here: reversed from the formula for determinants. So clearly

det A = Σ_j Aij (adj A)ji

for any fixed index i. Some more notation: if A is a matrix, and b a vector, write Ab,i for the matrix obtained by replacing column i of A by b.

Theorem 18.9 (Cramer's Rule). If A is invertible, then the solution to Ax = b has entries

xj = det Ab,j / det A.

Proof. Write A as columns A = ( a1  a2  ...  an ).

Expand out:

det Ab,j = det ( a1  a2  ...  b  ...  an )
         = det ( a1  a2  ...  Ax  ...  an )
         = det ( a1  a2  ...  (x1 a1 + x2 a2 + ... + xn an)  ...  an ).

But a1 already appears in the first column, so x1 a1 has no effect on the result. Similarly for x2 a2, etc., so

det Ab,j = det ( a1  a2  ...  xj aj  ...  an )
         = xj det A.
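A quick numerical check of Cramer's rule (my own sketch, using numpy; the system is an arbitrary example): solve Ax = b by replacing each column of A with b.

import numpy as np

def cramer_solve(A, b):
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        A_bj = A.copy()
        A_bj[:, j] = b                  # the matrix Ab,j: column j replaced by b
        x[j] = np.linalg.det(A_bj) / d
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([5.0, 5.0])
print(cramer_solve(A, b))        # [1. 2.]
print(np.linalg.solve(A, b))     # same answer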

Problem 18.8. If A and b have integer entries, and det A = ±1, prove that the solution x of Ax = b has integer entries.

Corollary 18.10. If a matrix A is invertible, then

A^-1 = (1 / det A) adj A.

Proof. We need to invert A, that is, to solve Ax = ej for each vector ej, and then put each solution x in column j of a matrix A^-1. By Cramer's rule, the solution of Ax = ej has entries

xi = det Aej,i / det A.

Putting these together into the columns of a matrix, we find components

(A^-1)ij = det Aej,i / det A.

Calculate the numerator: Aej,i is the matrix A with column i replaced by ej, so its column i has a single 1, in row j, and zeros elsewhere. Expanding the determinant along that column,

det Aej,i = (-1)^(i+j) det A(ji).


Remark 18.11. Cramer's rule shows us that the solution x of Ax = b is a smooth function of A and b, as long as A is invertible. (By smooth, we mean differentiable as many times as you like, with respect to all variables. In this case, the variables are the entries of A and b.) In particular, the entries of A^-1 are smooth functions of the entries of A.

Definition 18.12. We will say that two square matrices A and B commute if AB = BA. Similarly, we will say that two linear maps S : V -> V and T : V -> V of a vector space V commute if ST = TS.

Lemma 18.13 (Lax [6]). Suppose that P(x) and Q(x) are polynomials with n x n matrix coefficients:

P(x) = P0 + P1 x + P2 x^2 + ... + Pp x^p
Q(x) = Q0 + Q1 x + Q2 x^2 + ... + Qq x^q.

Their product PQ(x) = P(x)Q(x) is also a polynomial with matrix coefficients. If A is an n x n matrix, then write P(A) to mean

P(A) = P0 + P1 A + P2 A^2 + ... + Pp A^p.

If A is an n x n matrix commuting with all of the coefficient matrices Q0, Q1, ..., Qq of Q(x), then PQ(A) = P(A)Q(A).

Proof. Calculate

PQ(x) = P0 Q0 + (P1 Q0 + P0 Q1) x + ... + Pp Qq x^(p+q) = Σ_{j,k} Pj Qk x^(j+k).

So

PQ(A) = Σ_{j,k} Pj Qk A^(j+k) = Σ_{j,k} Pj A^j Qk A^k = P(A) Q(A).

Theorem 18.14 (Cayley-Hamilton). Every square matrix A satisfies p(A) = 0 where p(λ) = det(A - λ I) is the characteristic polynomial of A.

Remark 18.15. The proof works equally well over any field, not just the real or complex numbers.

Proof. Let P(λ) = adj(A - λ I), Q(λ) = A - λ I. Then P(λ)Q(λ) = p(λ) I. Clearly A commutes with the coefficient matrices of Q(λ) (i.e. A commutes with A), so P(A)Q(A) = PQ(A) = p(A). But Q(A) = A - A = 0.
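A numerical sanity check of the Cayley-Hamilton theorem (my own sketch, using numpy; note that np.poly returns the coefficients of det(λ I - A), which differs from p(λ) only by a sign, so evaluating it at A still gives zero):

import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [2.0, 0.0, 3.0]])

coeffs = np.poly(A)          # characteristic polynomial coefficients, highest power first
n = A.shape[0]

# Evaluate the polynomial at the matrix A: c0 A^n + c1 A^(n-1) + ... + cn I.
result = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
print(np.allclose(result, 0))   # True: A satisfies its characteristic polynomial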


18.3 Review Problems


Problem 18.9. Use Cramer's rule to solve

2 x1 + x2 = 1
x1 - x2 = 3.

Problem 18.10. By finding the adjugate, compute the inverse of

A = | 1  2 |
    | 3  4 |.

Problem 18.11. If an invertible matrix A has integer entries, prove that A^-1 also has integer entries just when det A = ±1.

Problem 18.12. Without any calculation, what can you say about the inverse of the matrix

A = | 1   1 + x + 5x^3   1 + 2x^3        2 + 7x + x^3     6 + 3x^3      |
    | 0   1              5 + 7x + 7x^3   7 + 7x + 8x^3    1 + 5x^3      |
    | 0   0              1               2 + 4x + 3x^3    6x + 4x^3     |
    | 0   0              0               1                3 + 3x + 2x^3 |
    | 0   0              0               0                1             | ?

19 Volume and Determinants


We will see that the determinant of a matrix measures volume growth.

19.1 Shears
Consider the matrix

A = | 1  c |
    | 0  1 |.

To each vector x, associate the vector y = Ax,

| y1 |   | 1  c | | x1 |   | x1 + c x2 |
| y2 | = | 0  1 | | x2 | = | x2        |.

We call the transformation taking x to y = Ax a shear. Consider the x1 , x2 plane. Draw the x1 axis horizontally, and x2 vertically. Take a rectangule in the x1 , x2 -plane, with sides along the two axes. The rectangle gets mapped into a parallelogram in the y1 , y2 -plane. Problem 19.1. Prove that the bottom of the rectangle stays put, but the top is shifted over by the amount c, while the height stays the same. The parallelogram has the same area as the rectangle. So the shear preserves the area of the rectangle. If we slide the rectangle away from the origin, the parallelogram slides too in the same way. Consequently all rectangles with sides parallel to the axes have area preserved by any shear.

19.2 Reflections
The map y = Ax with

A = | 0  1 |
    | 1  0 |

(Figure: cut the parallelogram, then slide the triangle over to the right side to recover the rectangle.)

swaps the x1, x2 coordinates, so clearly takes rectangles to congruent rectangles, and so preserves their area.



19.3 Hypotheses About Volume


We will make the following assumptions about volumes of subsets of Rn:
a. A single point in R has volume 0.
b. If we scale the real line by a factor, say c ≠ 0, then we scale lengths of line segments by a factor of |c|.
c. If A is a subset of Rp and B a subset of Rq, with volumes Vol(A) = a and Vol(B) = b, then A x B in Rp+q has Vol(A x B) = ab.
d. If a map preserves the volumes of all rectangular boxes with sides parallel to the coordinate axes, then it preserves volumes of any subset that has a finite volume. (Because it should be possible to approximate complicated objects with little boxes.)
e. If a set X has a finite volume, then so does AX for any linear map A.
These hypotheses are justified in any book of multivariable calculus or basic analysis (for example, Wheeden & Zygmund [16]). Complicated shapes can be approximated by small rectangular boxes, so the area of any shape is preserved as long as the areas of rectangular boxes are preserved.

19.4 Determinant and Volume


Theorem 19.1. Consider a transformation y = Ax with A an n x n matrix. Taking any set X in Rn, let AX be its image under the transformation. If X has some volume Vol(X), then AX has volume

Vol(AX) = |det A| Vol(X).

Proof. The strictly lower and strictly upper triangular matrices with only a single nonzero entry off the diagonal are shears of two coordinates. The permutation matrices of transpositions are swaps of two coordinates. We have seen that these preserve areas in the plane of those two coordinates. They have no effect on any other coordinates. Therefore, by our first hypothesis, they preserve volumes of boxes. By our second hypothesis, they preserve volumes. The scalings of rows we encountered in reduced echelon form are just rescalings of variables, so they rescale volume by their determinant. Any matrix can be brought to reduced echelon form by finitely many products with these matrices. If det A ≠ 0, then the reduced echelon form is 1, so the result is true. If det A = 0, we carry out more simplifications after Gauss-Jordan elimination: multiplying A by strictly lower and strictly upper triangular matrices on the right side of A, we can add any column of A to any other column, thereby killing all entries after each pivot. Each step preserves volumes. We arrange

A = | 1p  0 |
    | 0   0 |

for some size p x p of identity matrix 1p. So the image of A lies inside Rp. Take any set X, and suppose that X has volume Vol(X). The set AX is AX = Y x Z for Y some set in Rp, and Z the single point 0 in Rn-p. Hence Vol(Z) = 0 and Vol(AX) = Vol(Y) Vol(Z) = 0.
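A quick numerical illustration of the theorem (my own sketch, using numpy and an arbitrary 2 x 2 matrix): estimate the area of the image of the unit square under A by Monte Carlo sampling, and compare with |det A|.

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])

# Sample points in a box that contains A([0,1]^2), then count how many
# have their preimage under A inside the unit square.
corners = A @ np.array([[0, 1, 0, 1], [0, 0, 1, 1]], dtype=float)
lo, hi = corners.min(axis=1), corners.max(axis=1)
samples = rng.uniform(lo, hi, size=(200000, 2))
pre = np.linalg.solve(A, samples.T).T
inside = np.all((pre >= 0) & (pre <= 1), axis=1)
box_area = np.prod(hi - lo)
print(inside.mean() * box_area)   # roughly 2.5
print(abs(np.linalg.det(A)))      # 2.5 exactly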


19.4 Review Problems


Problem 19.2. (a) What is the volume of a cube in n dimensions in terms of the length of each side? (b) How far apart are the opposite corners of the cube from one another? (c) How many faces are there of one lower dimension? (d) How many vertices are there? (e) How many faces are there of all possible lower dimensions?

Problem 19.3. Use a determinant to find the area of the ellipse

(x/a)^2 + (y/b)^2 = 1,

from the well-known fact that a circle of unit radius x^2 + y^2 = 1 has area π.

20 Geometry and Orthogonal Matrices


What does it mean geometrically for a matrix to be orthogonal?

20.1 Distances and Orthogonal Matrices


Inequalities
Lemma 20.1 (The Schwarz Inequality). For x and y in Rn,

|⟨x, y⟩| ≤ ‖x‖ ‖y‖.

Equality holds just when one of the vectors is a multiple of the other.

Proof. If y = 0 then the result is obvious. If y ≠ 0, then

0 ≤ ‖ ‖y‖^2 x - ⟨x, y⟩ y ‖^2
  = ⟨ ‖y‖^2 x - ⟨x, y⟩ y, ‖y‖^2 x - ⟨x, y⟩ y ⟩
  = ‖y‖^4 ‖x‖^2 - 2 ‖y‖^2 ⟨x, y⟩^2 + ‖y‖^2 ⟨x, y⟩^2
  = ‖y‖^4 ‖x‖^2 - ‖y‖^2 ⟨x, y⟩^2.

Therefore ‖x‖^2 ‖y‖^2 ≥ ⟨x, y⟩^2, which gives the inequality. On the first line, you see that equality holds just when x is a multiple of y.

Lemma 20.2 (The Triangle Inequality). If x and y are vectors in Rn, then

‖x + y‖ ≤ ‖x‖ + ‖y‖.

Equality holds just when one of the vectors is a nonnegative multiple of the other.

Proof.

‖x + y‖^2 = ⟨x + y, x + y⟩
          = ‖x‖^2 + 2 ⟨x, y⟩ + ‖y‖^2
          ≤ ‖x‖^2 + 2 ‖x‖ ‖y‖ + ‖y‖^2
          = (‖x‖ + ‖y‖)^2.

Equality requires ⟨x, y⟩ = ‖x‖ ‖y‖, and then, by the Schwarz inequality, one is a multiple of the other, say y = a x; ⟨x, y⟩ = ‖x‖ ‖y‖ implies that a ≥ 0, and similarly if x is a multiple of y.

20.1 Review Problems


Problem 20.1. Suppose that A is a matrix. (a) Among all vectors x of length 1, how large can the first entry of Ax (the entry (Ax)1) be? (b) Show that if x is a vector of length 1, then y = Ax sits inside a rectangular box whose sides are parallel to the axes, with the side along the yj-axis being of length 2 ‖Aᵗ ej‖. (c) Apply this to the matrix

A = | 4  3 |
    | 0  2 |,

and draw a picture.

Isometries and Orthogonal Matrices


Problem 20.2. The distance between two points x and y of Rn is ‖x - y‖. Prove that for any three points x, y, z in Rn,

‖x - z‖ + ‖z - y‖ ≥ ‖x - y‖.

Definition 20.3. Define the line connecting two distinct points x and y in Rn to be the set of points of the form t x + (1 - t) y, and the line segment between x and y to be the set of such points with 0 ≤ t ≤ 1.

Problem 20.3. Recall that for any three points x, y, z in Rn, ‖x - z‖ + ‖z - y‖ ≥ ‖x - y‖. Prove that when x and y are distinct, equality holds just when z lies on the line segment between x and y.


Problem 20.4. If x and y are distinct, and z = t x + (1 - t) y is on the line segment between x and y, prove that no other point of Rn has the same distances to x and y.

Definition 20.4. A map T : Rn -> Rp is a rule associating to each point x of Rn a point T(x) of Rp. An isometry T of Rn is a map T : Rn -> Rn, so that the distance between any two points x and y of Rn is the same as the distance between T(x) and T(y).

Theorem 20.5. The isometries of Rn are precisely the maps T(x) = Ax + b where A is any orthogonal matrix, and b is any vector in Rn.

Proof. From the exercises,

T(t x + (1 - t) y) = t T(x) + (1 - t) T(y),

for x and y distinct, and t between 0 and 1, because T preserves distances, so preserves the triangle inequality, the lines, line segments, etc. Replace the map T(x) by T(x) - T(0) if needed (just shifting T(x) over) to ensure that T(0) = 0. Taking y = 0, and x ≠ 0, we get T(t x) = t T(x), for 0 ≤ t ≤ 1. For t > 1,

t T(x) = t T((1/t)(t x)) = t (1/t) T(t x) = T(t x).

To handle minus signs, set y = -x and t = 1/2:

0 = T(0) = T((1/2) x + (1/2)(-x)) = (1/2) T(x) + (1/2) T(-x).

So T(-x) = -T(x), and this ensures that T(t x) = t T(x) for all real values of t. Take any nonzero vectors x and y, and any t strictly between 0 and 1:

T(x + y) = T( t (x/t) + (1 - t) (y/(1 - t)) )
         = t T(x/t) + (1 - t) T(y/(1 - t))
         = T(x) + T(y).


Set A to be the matrix whose columns are T(e1), T(e2), ..., T(en). Then

T(x) = T(Σ_j xj ej) = Σ_j xj T(ej) = Σ_j xj A ej = Ax.

So T(x) = Ax for a matrix A, and ‖Ax‖ = ‖x‖, so A is orthogonal.

Problem 20.5. Which maps alter distance by a constant factor?

20.2 Rotations and Orthogonal Matrices


How can we picture what an orthogonal matrix does, in any number of dimensions?

Problem 20.6. Prove that a 2 x 2 orthogonal matrix has the form

A = | cos θ  -sin θ |
    | sin θ   cos θ |

just when it has det A = 1.

Consider the matrix

| 1                    |
|    cos θ  -sin θ     |
|    sin θ   cos θ     |
|                    1 |

(where each 1 could be an identity matrix of any size). Call this a rotation in the xk xl-plane if the sines and cosines appear in rows k and l. More generally, picking any two perpendicular vectors of unit length, say u and v, and an angle θ, we can define an orthogonal matrix R by asking that Rx = x for x perpendicular to u and v, and that R rotate the plane spanned by u and v by the angle θ:

R u = cos θ u + sin θ v
R v = -sin θ u + cos θ v.

Problem 20.7. The minus signs look funny. Check that this gives the expected matrix R if u = e1 and v = e2.

Problem 20.8. How can we be sure that such a matrix R exists?
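One way to convince yourself that such an R exists is to build it explicitly. The following is a numerical sketch, not a proof (numpy assumed, and the formula below is my own way of assembling R from u and v):

import numpy as np

def rotation(u, v, theta):
    # Assumes u and v are unit length and perpendicular. Fixes everything
    # perpendicular to u and v, and rotates their plane:
    #   R u = cos(theta) u + sin(theta) v,   R v = -sin(theta) u + cos(theta) v.
    u, v = np.asarray(u, float), np.asarray(v, float)
    I = np.eye(len(u))
    return (I
            + (np.cos(theta) - 1) * (np.outer(u, u) + np.outer(v, v))
            + np.sin(theta) * (np.outer(v, u) - np.outer(u, v)))

u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
R = rotation(u, v, np.pi / 3)
print(np.allclose(R.T @ R, np.eye(3)))                               # R is orthogonal
print(np.allclose(R @ u, np.cos(np.pi/3)*u + np.sin(np.pi/3)*v))     # R rotates the u, v plane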


Careful: if we swap the choice of which is u and which is v, we will rotate in the wrong direction. Another kind of orthogonal map is the map taking x to -x, which (for want of a better word) we can call a reversal.

Theorem 20.6. Every orthogonal matrix is a product of rotations in mutually perpendicular planes together with a reversal in some subspace perpendicular to all of those planes.

Remark 20.7. The reversal occurs precisely in the λ = -1 eigenspace of the orthogonal matrix.

Remark 20.8. There might be some vectors which are perpendicular to all of the planes and to the reversing subspace. Such vectors must be fixed: Ax = x, and so form the λ = 1 eigenspace of the orthogonal matrix.

Proof. Call the matrix A. This matrix A is a real matrix, but we can think of it as a complex matrix, which just happens to have only real number entries. Since A is an orthogonal matrix, it is normal. By the complex spectral theorem we can find a unitary basis u1, u2, ..., un in Cn of complex eigenvectors of A, say A u1 = λ1 u1, A u2 = λ2 u2, ..., A un = λn un. These eigenvalues λ1, λ2, ..., λn are complex numbers. What sort of complex numbers are they? As in problem 15.21 on page 154, since A is unitary, the eigenvalues of A are complex numbers of modulus 1:

1 = |λ1| = |λ2| = ... = |λn|.

Moreover, since A is a matrix of real numbers, taking complex conjugates of the equation A u1 = λ1 u1 gives A ū1 = λ̄1 ū1, where the bar denotes complex conjugation. Because A is orthogonal, and therefore unitary, ⟨A u1, A ū1⟩ = ⟨u1, ū1⟩. Therefore

⟨u1, ū1⟩ = ⟨A u1, A ū1⟩ = ⟨λ1 u1, λ̄1 ū1⟩.

Pulling the scalars through the Hermitian inner product,

⟨u1, ū1⟩ = λ1 λ1 ⟨u1, ū1⟩ = λ1^2 ⟨u1, ū1⟩.

Therefore either (1) λ1^2 = 1, or else (2) u1 is perpendicular to ū1. If (1), then λ1 = ±1, so we are looking at an eigenvector in the reversing subspace, or a fixed vector. The case (2) is trickier: write u1 = x1 + i y1, with x1 and y1 real vectors. Then calculate

⟨u1, ū1⟩ = ‖x1‖^2 - ‖y1‖^2 + 2i ⟨x1, y1⟩.


Therefore in case (2), we find that x1 and y1 have the same length, and are perpendicular. We can scale x1 and y1 to both have length 1. Since |λ1| = 1, we can write λ1 = cos θ1 + i sin θ1. Expand out the equation A u1 = λ1 u1 into real and imaginary parts, and you find

A x1 = cos θ1 x1 - sin θ1 y1
A y1 = sin θ1 x1 + cos θ1 y1,

a rotation by an angle of θ1. The same results hold for u2, u3, ..., un in place of u1.

Problem 20.9. Prove that the reversal of an even dimensional space R2n is a product of rotations, each rotating some plane by an angle of π. We can choose any planes we like, as long as they are mutually perpendicular and together span R2n.

Corollary 20.9. An orthogonal matrix is a product of rotations in mutually perpendicular planes just when it has determinant 1.

Definition 20.10. A rotation is an orthogonal matrix of unit determinant.

20.3 Reflections and Orthogonal Matrices


Definition 20.11. A reflection by a nonzero vector u in Rn is an isometry R of Rn for which R(u) = -u and R(x) = x for any vector x perpendicular to u.

Lemma 20.12. There is a unique reflection R by any nonzero vector u, and it is

R x = x - 2 (⟨x, u⟩ / ⟨u, u⟩) u.     (20.1)

Proof. If R is a reflection, then R is an isometry and R(0) = 0, so R(x) = Rx for a unique matrix R (which we deliberately call by the same name). Clearly R^2 = 1, so R^-1 = R. If R and S are two reflections in the same vector u, then RS(u) = u and RS(x) = x for x perpendicular to u, so RS(a u + b x) = a u + b x fixes every vector, and therefore RS = 1, so S = R^-1 = R. So there is a unique reflection, if one exists. Let u1 = u / ‖u‖. Pick unit vectors u2, u3, ..., un so that u1, u2, ..., un is an orthonormal basis. Write a vector x as x = Σ_i ai ui. Then equation 20.1 gives

R x = -a1 u1 + a2 u2 + a3 u3 + ... + an un.

Therefore ‖Rx‖ = ‖x‖, so R is an isometry, and clearly a reflection.
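Equation 20.1 is easy to test numerically. A small sketch of my own (numpy assumed):

import numpy as np

def reflection_matrix(u):
    # The matrix of equation 20.1: R x = x - 2 <x, u>/<u, u> u.
    u = np.asarray(u, float)
    return np.eye(len(u)) - 2.0 * np.outer(u, u) / u.dot(u)

u = np.array([1.0, 2.0, 2.0])
R = reflection_matrix(u)
print(np.allclose(R @ u, -u))             # R u = -u
print(np.allclose(R.T @ R, np.eye(3)))    # R is orthogonal
print(np.allclose(np.linalg.det(R), -1))  # determinant -1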


Problem 20.10. If A is orthogonal, prove that det A = ±1.

Theorem 20.13. Every orthogonal matrix is either a rotation (if it has determinant 1) or the product of a rotation with a reflection (if it has determinant -1).

Proof. By the same procedure as in corollary 20.9, we can arrange by induction, without loss of generality, that the first n - 1 columns of our orthogonal matrix A are e1, e2, e3, ..., e(n-1). But the proof breaks down at the last column, because to get each column fixed up, the proof needs to have at least two rows. So the last column is ±en. Hence det A = ±1, and A = 1 just when det A = 1, and A is reflection in en just when det A = -1.

Problem 20.11. Prove that every unitary matrix is a rotation.

Problem 20.12. Use the spectral theorem to show that every n x n unitary matrix is a product of rotations in n mutually perpendicular planes.

Theorem 20.14 (Cartan-Dieudonné). Every n x n orthogonal matrix is a product of at most n reflections. The number of reflections is an even number if it is a rotation, and odd otherwise.

Proof. For a 1 x 1 matrix, the result is obvious. Let A be our n x n orthogonal matrix. Our matrix A is a product of reflections, say in vectors u1, u2, ..., un, just when F A Fᵗ is a product of reflections in F u1, F u2, ..., F un. So we may change orthonormal basis as we please. If u is a fixed vector, i.e. Au = u, then we can rescale u to have unit length, and change basis to get u = e1. Then by induction,

A = | 1  0 |
    | 0  B |

is a product of at most n - 1 reflections. If A doesn't fix any vector, then take any nonzero vector v. Consider the reflection R in the vector u = Av - v. A simple calculation: R v = Av and R Av = v. Therefore A R Av = Av, so that AR has a fixed vector Av. By induction AR is a product of at most n - 1 reflections, so A is a product of at most n reflections.

21 Orthogonal Projections
In chapter 20, we studied geometry of the inner product in Rn . In this chapter, we continue the study in abstract inner product spaces.

21.1 Orthonormal Bases


An orthonormal basis in an inner product space has the same definition as in Rn (see definition 13.9 on page 123).

Lemma 21.1. Suppose that W is a subspace of a finite dimensional inner product space V, say of dimension p. Then there is an orthonormal basis v1, v2, ..., vn for V so that v1, v2, ..., vp is an orthonormal basis for W.

Proof. Follow the process given in the hint for problem 16.24 on page 167 to produce a basis for V whose first p vectors are a basis for W. Apply the Gram-Schmidt process to these vectors to obtain an orthonormal basis.

21.2 Orthogonal Projections


Recall that a vector space which is equipped with an inner product is called an inner product space.

Definition 21.2. If W is a subspace of an inner product space V, let W⊥ (called the orthogonal complement of W) be the set of vectors of V perpendicular to all vectors of W.

Problem 21.1. Prove that W⊥ is a subspace of V.

Theorem 21.3. Let W be a subspace of a finite dimensional inner product space V. Then every vector v in V can be written in precisely one way as v = x + y with x from W and y from W⊥. The vector x is called the orthogonal projection of v to W. The map P : V -> W taking each vector to its orthogonal projection is linear.

202

Orthogonal Projections

Proof. First, let's show that there is at most one way to break up a vector v into a sum x + y. Suppose that v = x0 + y0 = x1 + y1, with x0 and x1 from W and y0 and y1 from W⊥. Then x0 - x1 = y1 - y0. The left hand side lies in W while the right hand side lies in W⊥. But the left hand side must be perpendicular to the right hand side, so perpendicular to itself, so 0. Therefore the right hand side must be 0 too, so x0 = x1 and y0 = y1: there is at most one way to break up each vector v. Next, let's show that there is a way. Take an orthonormal basis for V, say v1, v2, ..., vn, for which v1, v2, ..., vp form an orthonormal basis for W. Therefore vp+1, vp+2, ..., vn lie in W⊥. Write each vector v in V as

v = (a1 v1 + a2 v2 + ... + ap vp) + (ap+1 vp+1 + ap+2 vp+2 + ... + an vn),

where the first sum lies in W and the second lies in W⊥.

Problem 21.2. Finish the proof by proving that orthogonal projection is a linear map.

Problem 21.3. Suppose that W is a subspace of a finite dimensional inner product space V. Prove that (W⊥)⊥ = W.

Problem 21.4. Prove the Pythagorean theorem: if x and y are perpendicular vectors in an inner product space, then ‖x + y‖^2 = ‖x‖^2 + ‖y‖^2.

Problem 21.5. Prove that v = P v + Q v, where P is the orthogonal projection to a subspace W and Q is the orthogonal projection to W⊥.

Lemma 21.4. The orthogonal projection P v of a vector v to a subspace W is the closest point of W to the vector v.

Proof. Write v = x + y with x in W and y in W⊥. Let's see how close a vector w from W can get to v:

‖v - w‖^2 = ‖x - w + y‖^2 = ‖x - w‖^2 + ‖y‖^2

by the Pythagorean theorem. As we vary w through W, this expression is clearly smallest when w = x.


Problem 21.6. Let P be the orthogonal projection to a subspace W of a finite dimensional inner product space V. Prove Bessel's inequality: ‖P v‖ ≤ ‖v‖, and equality holds just when P v = v, which holds just when v lies in W.

Problem 21.7. Prove that a linear map P : V -> V of a finite dimensional inner product space V is the projection to some subspace if and only if P = P^2 = Pᵗ.

Problem 21.8. Find analogues of all of the results of this chapter for complex vectors in a finite dimensional Hermitian inner product space.
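In Rn, the orthogonal projection to a subspace can be computed from an orthonormal basis of that subspace. A small sketch of my own (numpy assumed; the QR factorization supplies the orthonormal basis, and the example subspace is arbitrary):

import numpy as np

# Columns of B span a subspace W of R^4.
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [1.0, 2.0]])

Q, _ = np.linalg.qr(B)        # orthonormal basis of W in the columns of Q
P = Q @ Q.T                   # matrix of the orthogonal projection onto W

v = np.array([1.0, 2.0, 3.0, 4.0])
x = P @ v                     # component in W
y = v - x                     # component in the orthogonal complement

print(np.allclose(B.T @ y, 0))                  # y is perpendicular to W
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # P^2 = P and P is symmetric
print(np.linalg.norm(x) <= np.linalg.norm(v))   # Bessel's inequality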

Jordan Normal Form


22 Direct Sums of Subspaces


Subspaces have a kind of arithmetic.

Definition 22.1. The intersection of two subspaces is the collection of vectors which belong to both of the subspaces. We will write the intersection of subspaces U and V as U ∩ V.

Example 22.2. The subspace U of R3 given by the vectors of the form

| x1      |
| x2      |
| x1 - x2 |

intersects the subspace V consisting in the vectors of the form

| x1 |
| 0  |
| x3 |

in the subspace written U ∩ V, which consists in the vectors of the form

| x1 |
| 0  |
| x1 |.

Definition 22.3. If U and V are subspaces of a vector space W, write U + V for the set of vectors w of the form w = u + v for some u in U and v in V; call U + V the sum.

Problem 22.1. Prove that U + V is a subspace of W.

Definition 22.4. If U and V are two subspaces of a vector space W, we will write U + V as U ⊕ V (and say that U ⊕ V is a direct sum) to mean that every vector x of U + V can be written uniquely as a sum x = y + z with y in U and z in V. We will also say that U and V are complementary, or complements of one another.

Example 22.5. R3 = U ⊕ V for U the subspace consisting of the vectors

    | x1 |
x = | x2 |
    | 0  |

and V the subspace consisting of the vectors

    | 0  |
x = | 0  |
    | x3 |,

since we can write any vector x uniquely as

    | x1 |   | 0  |
x = | x2 | + | 0  |.
    | 0  |   | x3 |


Problem 22.2. Give an example of two subspaces of R3 which are not complementary.

Theorem 22.6. U + V is a direct sum U ⊕ V just when U ∩ V consists of just the 0 vector.

Proof. If U + V is a direct sum, then we need to see that U ∩ V only contains the zero vector. If it contains some vector x, then we can write x uniquely as a sum x = y + z, but we can also write x = (1/2)x + (1/2)x or as x = (1/3)x + (2/3)x, as a sum of vectors from U and V. Therefore x = 0. On the other hand, if there is more than one way to write x = y + z = Y + Z for some vectors y and Y from U and z and Z from V, then 0 = (y - Y) + (z - Z), so Y - y = z - Z, a nonzero vector from U ∩ V.

Lemma 22.7. If U ⊕ V is a direct sum of subspaces of a vector space W, then the dimension of U ⊕ V is the sum of the dimensions of U and V. Moreover, putting together any basis of U with any basis of V gives a basis of U ⊕ V.

Proof. Pick a basis for U, say u1, u2, ..., up, and a basis for V, say v1, v2, ..., vq. Then consider the set of vectors given by throwing all of the u's and v's together. The u's and v's are linearly independent of one another, because any linear relation

0 = a1 u1 + a2 u2 + ... + ap up + b1 v1 + b2 v2 + ... + bq vq

would allow us to write

a1 u1 + a2 u2 + ... + ap up = -(b1 v1 + b2 v2 + ... + bq vq),

so that a vector from U (the left hand side) belongs to V (the right hand side), which is impossible unless that vector is zero, because U and V intersect only at 0. But that forces 0 = a1 u1 + a2 u2 + ... + ap up. Since the u's are a basis, this forces all a's to be zero. The same for the b's, so it isn't a linear relation. Therefore the u's and v's put together give a basis for U ⊕ V.

22.1. Application: Simultaneously Diagonalizing Several Matrices

209

We can easily extend these ideas to direct sums with many summands U1 ⊕ U2 ⊕ ... ⊕ Uk.

Problem 22.3. Prove that if U ⊕ V = W, then any linear maps S : U -> Rp and T : V -> Rp determine a unique linear map Q : W -> Rp, written Q = S ⊕ T, so that Q|U = S and Q|V = T.

22.1 Application: Simultaneously Diagonalizing Several Matrices


Theorem 22.8. Suppose that T1, T2, ..., TN are linear maps taking a vector space V to itself, each diagonalizable, and each commuting with the others (which means T1 T2 = T2 T1, etc.). Then there is a single invertible linear map F : Rn -> V diagonalizing all of them.

Proof. Since T1 and T2 commute, if x is an eigenvector of T1 with eigenvalue λ, then T2 x is too:

T1 (T2 x) = T2 T1 x = T2 λ x = λ (T2 x).

So each eigenspace of T1 is invariant under T2. The same is true for any two of the linear maps T1, T2, ..., TN. Because T1 is diagonalizable, V is a direct sum of the eigenspaces of T1. So it suffices to find a basis for each eigenspace of T1 which diagonalizes all of the linear maps; that is, it suffices to prove the theorem on each eigenspace separately. So let's restrict to an eigenspace of T1, where T1 = λ1, so that T1 is diagonal in any basis. By the same reasoning applied to T2, etc., we can work on a common eigenspace of all of the Tj, arranging that T2 = λ2, etc., diagonal in any basis.

22.2 Transversality
Lemma 22.9. If V is a finite dimensional vector space, containing two subspaces U and W, then

dim U + dim W = dim(U + W) + dim(U ∩ W).

Proof. Take any basis for U ∩ W. Then while you pick some more vectors from U to extend it to a basis of U, I will simultaneously pick some more vectors from W to extend it to a basis of W. Clearly we can throw our vectors together to get a basis of U + W. Count them up.

This lemma makes certain inequalities on dimensions obvious.

210

Direct Sums of Subspaces

Lemma 22.10. If U and W are subspaces of an n dimensional vector space V , say of dimensions p and q , then max {0, p + q n} dim (U W ) min {p, q } , max {p, q } dim (U + W ) min {n, p + q } . Proof. All inequalities but the rst are obvious. The rst follows from the last by applying lemma 22.9 on the previous page. Problem 22.4. How few dimensions can the intersection of subspaces of dimensions 5 and 3 in R7 have? How many? Denition 22.11. Two subspaces U and W of a nite dimensional vector space V are transverse if U + W = V . Problem 22.5. How few dimensions can the intersection of transverse subspaces of dimensions 5 and 3 in R7 have? How many? Problem 22.6. Must subspaces in direct sums be transverse?

22.3 Computations
In Rn , all abstract concepts of linear algebra become calculations. Problem 22.7. Suppose that U and W are subspaces of Rn . Take a basis for U , and put it into the columns of a matrix, and call that matrix A. Take a basis for W , and put it into the columns of a matrix, and call that matrix B . How do you nd a basis for U + W ? How do you see if U + W is a direct sum?

Proposition 22.12. Suppose that U and W are subspaces of Rn and that A and B are matrices whose columns give bases for U and W respectively. Apply the algorithm of chapter 10 to find a basis for the kernel of (A B), say

| x1 |   | x2 |         | xs |
| y1 | , | y2 | , ... , | ys | ,

where each kernel vector is split into a piece x (one entry for each column of A) and a piece y (one entry for each column of B). Then the vectors A x1, A x2, ..., A xs form a basis for the intersection of U and W.

Proof. For example, A x1 + B y1 = 0, so A x1 = -B y1 = B(-y1) lies in the image of A and of B. Therefore the vectors A x1, A x2, ..., A xs lie in U ∩ W. Suppose that some vector v also lies in U ∩ W. Then v = A x = B(-y) for some vectors x and y. But then A x + B y = 0, so

| x |
| y |

22.3. Computations

211

is a multiple of the vectors x1 y1 , x2 y2 ,..., xs ys .

so x = aj Axj , for some numbers aj . Therefore these Axj span the intersection. Suppose they suer some linear relation: 0 = cj Axj . So 0 = A cj xj . But the columns of A are linearly independent, so A is 1-1. Therefore 0 = cj xj . At the same time, 0= = cj Axj cj B (yj ) cj yj ).

= B ( But B is also 1-1, so 0 = cj yj . So 0= cj

xj yj

But these vectors are linearly independent by construction.

23 Jordan Normal Form


We can't quite diagonalize every matrix, but we will see how close we can come: the Jordan normal form. We will generalize this form to linear maps of an abstract vector space.

23.1 Jordan Normal Form and Strings


Diagonalizing is powerful. By diagonalizing a matrix, we can see what it does completely, and we can easily carry out immense computations; for example, finding large powers of a matrix. The trouble is that some matrices can't be diagonalized. How close can we come?

Example 23.1.

A = | 0  1 |
    | 0  0 |

is the simplest possible example. Its only eigenvalue is λ = 0. As a real or complex matrix, it has only one eigenvector,

| 1 |
| 0 |,

up to rescaling. Not enough eigenvectors to form a basis of R2, so not enough to diagonalize. We will build this entire chapter from this simple example.

Problem 23.1. What does the map taking x to Ax look like for this matrix A?

Let's write

Δ1 = (0),

Δ2 = | 0  1 |
     | 0  0 |,

Δ3 = | 0  1  0 |
     | 0  0  1 |
     | 0  0  0 |,

214

Jordan Normal Form

A matrix of the form + is called a Jordan block. Our goal in this chapter is to prove: Theorem 23.2. Every square complex matrix A can be brought by change of F 1 AF to Jordan normal form 1 + 2 + 1 , F AF = .. . N + broken into Jordan blocks. We will not give the simplest possible proof (which is probably the one given by Hartl [4]), but instead give an explicit algorithm for computing F , and then prove that the algorithm works. Denition 23.3. If is an eigenvalue of a matrix A, a vector x is a generalized eigenvector of A with eigenvalue if (A )k x = 0, for some positive integer k . If k = 1 then x is an eigenvector in the usual sense. Example 23.4. A= 0 0 1 0

satises A2 = 0, so every vector x in R2 is a generalized eigenvector of A, with eigenvalue 0. In the generalized sense, we have lots of eigenvectors. Problem 23.3. Prove that every vector in Cn is a generalized eigenvector of with eigenvalue 0. Then prove that for any number , every vector in Cn is a generalized eigenvector of + with eigenvalue . Problem 23.4. Prove that no nonzero vector can be a generalized eigenvector with two dierent eigenvalues.

Problem 23.5. Prove that nonzero generalized eigenvectors of a square matrix, with dierent eigenvalues, are linearly independent.

Denition 23.5. A string is a collection of linearly independent vectors of the form 2 k x, (A ) x, (A ) x, . . . , (A ) x, each a generalized eigenvector with eigenvalue . We want to make our strings as long as possible, not contained in any longer string.

23.2. Algorithm

215

Example 23.6. For A = , en , en1 , . . . , e1 is a string with eigenvalue = 0. Problem 23.6. For A = + , show that en , en1 , . . . , e1 is a string with eigenvalue . Problem 23.7. Find strings of 2 0 0 2 0 0 A= 0 0 0 0 0 0

0 1 2 0 0 0

0 0 0 3 0 0

0 0 0 1 3 0

0 0 0 . 0 1 3

Problem 23.8. Prove that every nonzero generalized eigenvector x belongs to a string, and the string can be lengthened until the last entry of the string is an eigenvector (not generalized).

23.2 Algorithm
First we give the algorithm, and then an example, and then a proof that it works. To compute out the strings that put a matrix A into Jordan normal form: a. If A is already in Jordan normal form, then just look beside the diagonal of A to nd the blocks, and read o the strings. So we can assume that A is not in Jordan normal form. b. Find an eigenvalue of A. Replace A with A . (So from now on A is not invertible.) c. Apply forward elimination to (A 1). Call the result (U V ). d. Put the pivot columns of A into a matrix A , and the pivot columns of U into a matrix U . e. Solve the system of equations U X = UA for an unknown square matrix X , by back substitution. This matrix X has size r r, where r is the rank of A (i.e. the number of pivot columns of A). The matrix X is smaller than A, because not all columns of A are pivot columns. f. Apply the algorithm to the matrix X , to nd its strings. (It may save you time to notice that X has the same eigenvalues as A, except perhaps 0.) g. For each string of X , applying A to the string gives a string of A. For example, a string y1 , y2 , . . . becomes A y1 , A y2 , . . . .

216

Jordan Normal Form

h. From among the strings we have just constructed in our last step, take each vector z which starts a string with eigenvalue 0. Solve the equations U x = V z by back substitution, and add x to the start of this string. i. We are still missing the strings of length 1 and zero eigenvalue, i.e. vectors from the kernel. Solve U x = 0 by back substitution, and take a basis x1 , x2 , . . . , xk of solutions. j. Each of these vectors x1 , x2 , . . . , xk constitutes a string of length 1 and zero eigenvalue. Add into our list of strings enough of these vectors to produce a basis. k. Put all of the strings into the columns of a single matrix F , each string listed in reverse order. Then F 1 AF is in Jordan normal form. A computer can carry out the algorithm, using symbolic algebra software. Lets see an example, and then prove that this algorithm always works. Example 23.7. Let 4 0 A= 0 0 0 1 0 1 0 0 3 1 0 0 . 0 3

a. A is not already in Jordan normal form. However, we can see that A is built out of blocks: a 1 1 block, and a 3 3 block. Each block can be separately brought to Jordan normal form, so the strings will divide up into strings for each block. The 1 1 block has e1 as eigenvector (a string of length 1) with eigenvalue 4. So it suces to nd the Jordan normal form for 1 0 0 A = 0 3 0 . 1 1 3 (We will still call this matrix A, even after we make various changes to it, to avoid a mess of notation.) b. The eigenvalues of A are 1 and 3. Replace A by A 3, so 2 0 0 A= 0 0 0 . 1 1 0 c. Forward elimination applied to (A 1) yields 2 0 0 1 1 (U V ) = 0 0 1 2 0 0 0 0 so 2 U = 0 0 0 1 0 0 1 0 , V = 1 2 0 0

0 0 1 0 0 1

0 1 , 0 0 1 . 0

23.2. Algorithm

217

d. The rst two columns of A are pivot columns. Cutting out nonpivot columns, 2 0 2 0 A = 0 0 , U = 0 1 . 1 1 0 0 e. Solving U X = U A , we nd 2 X11 4 U X UA = X21 0 Therefore X= 2 0 0 . 0 2 X12 X22 . 0

f. This matrix X is already in Jordan normal form. The strings of X are = 2 =0 g. These give strings in A: = 2 2 A e1 = 0 1 0 A e2 = 0 1 e1 e2

=0

h. One of the strings has zero eigenvalue, and starts with 0 z = 0 . 1 Solve U x = V z for an unknown vector x1 x = x2 , x3 nding 2x1 U x V z = x2 1 . 0

218 Back substitution gives 0 x=1 x3

Jordan Normal Form

for any number x3 . We will pick x3 = 0 for simplicity. Add this vector x to the front of the string: e2 , Ae2 = e3 . So the two strings for A are 2 = 2 0 1 =0 e2 , e3

i. We can see a basis already, so skip this step. j. And skip this step too, for the same reason. Returning the original problem, we have to rst shift back the eigenvalues by 3 to restore the matrix back to 1 0 0 A = 0 3 0 . 1 1 3 This shifts eigenvalues, but preserves strings: 2 =1 0 1 =3 e2 , e3

Next we add back in the original block structure of A, so return to 4 0 0 0 1 0 0 0 A= . 0 0 3 0 0 1 1 3 This requires us to relabel our vectors for the second block, giving strings: =4 =1 e1 0 2 0 1 e3 , e4

=3

23.2. Algorithm

219

Writing each string down in reverse order, they become the columns of the matrix

F = [ 1  0 0 0 ]
    [ 0 -2 0 0 ]
    [ 0  0 0 1 ]
    [ 0  1 1 0 ]

This matrix F brings A to Jordan normal form:

F⁻¹AF = [ 4 0 0 0 ]
        [ 0 1 0 0 ]
        [ 0 0 3 1 ]
        [ 0 0 0 3 ]
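As noted above, a computer algebra system can carry out this computation. Here is a minimal sketch in Python using sympy's built-in jordan_form (not the book's algorithm, but a quick way to check the answer for Example 23.7); the variable names are ours.

```python
from sympy import Matrix

# The matrix of Example 23.7.
A = Matrix([[4, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 3, 0],
            [0, 1, 1, 3]])

# sympy returns P and J with A = P * J * P**-1, J in Jordan normal form.
P, J = A.jordan_form()
print(J)                      # 1 x 1 blocks for eigenvalues 4 and 1, and a
                              # 2 x 2 block for eigenvalue 3 (block order may differ)
print(P.inv() * A * P == J)   # True
```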

Proposition 23.8. The algorithm takes any complex n × n matrix, and yields strings whose entries are a basis of Cⁿ.
Proof. Clearly this is true for a 1 × 1 matrix. Let's imagine that A is n × n, and that we have proven this proposition for any smaller complex matrix. U = V A, so dropping pivotless columns gives U′ = V A′. Therefore U′X = U A′ just when A′X = A A′. This implies A′(X − λ) = (A − λ)A′, and so A′(X − λ)^k = (A − λ)^k A′ for any integer k. Since A′ has linearly independent columns, only 0 belongs to its kernel. Since the columns of A′ and of A both span the image of A, the images of A′ and A are the same. So mapping a vector y in Cʳ to the vector A′y in Cⁿ makes a correspondence between vectors in Cʳ and vectors in the image of A. A vector y starts a string for X just when A′y starts a corresponding string for A (of the same length with the same eigenvalue). By induction, we can assume that the strings for X produce a basis of Cʳ. We write down the corresponding strings for A in step g. So we have a basis of strings for the image of A, but not enough to make a basis of Cⁿ. Let's make some more. First, let's think about strings of A of eigenvalue λ = 0. They look like x, Ax, A²x, . . . . Every string of A that we have constructed so far lies in the image of A. We can always lengthen the λ = 0-strings because if the first vector in the string, say z, lies in the image, say z = Ax, then we can add x to the front of the string. After that first vector, which is not in the image, the rest of the string is in the image. This explains step h. But there could still be λ = 0-strings which have no vectors in the image. Such a λ = 0-string must have length 1, since any longer λ = 0-string x, Ax, . . . has Ax in the image. This explains step i. We can't build any more linearly independent λ = 0-strings, or lengthen any of the λ = 0-strings we have. We need to see that our strings form a basis of Cⁿ. First we will check that all of the vectors in all of the strings are linearly independent, and then we will count them. Take any vectors x1, x2, . . . , xq, so that any two are either from different strings, or from different positions on the same string. Suppose that they satisfy


a linear relation. If none of these xj head up a λ = 0-string, then they live in the image, and are linearly independent by construction. There cannot be a linear relation between generalized eigenvectors with different eigenvalues, so we can assume that all of these xj live in λ = 0-strings. Any linear relation 0 = c1 x1 + c2 x2 + · · · + cp xp entails a linear relation 0 = c1 Ax1 + c2 Ax2 + · · · + cp Axp, pushing each vector down the string. None of these vectors can occur at the same position, so there is no such relation unless all of the Axj vanish. Therefore we can assume that all of the xj are in λ = 0-strings of length 1. But these are linearly independent by construction. We have to count how many vectors we have written down in all of the strings put together. A is n × n, and has rank r. The image of A has dimension r and the kernel has dimension n − r. In step f, we picked up strings containing r vectors. In step h, we added one vector to the front of each λ = 0-string. Such a string ends in the intersection of kernel and image. Suppose that the kernel and image have m dimensional intersection: there are m such strings. Therefore we have taken r vectors, and added m more. In step i, we added one more vector, for each dimension of the kernel outside the image. This adds n − r − m more vectors, giving a total of r + m + (n − r − m) = n vectors.
Theorem 23.9. Every complex square matrix A can be brought to Jordan normal form

F⁻¹AF = [ λ1 + Δ                      ]
        [         λ2 + Δ              ]
        [                  ⋱          ]
        [                     λN + Δ  ]

by the invertible complex matrix F obtained from the algorithm.
Proof. Take the strings and write each one down in reverse order into the columns of a matrix F. Each string now becomes ei, ei−1, ei−2, . . . , just as in a Jordan block. Replace A by F⁻¹AF, so we can assume that A ei = λi ei + ei−1. This gives us the i-th column of A, the same as for the Jordan normal form.
Corollary 23.10. The same theorem is true for real square matrices, as long as all of the complex eigenvalues are real numbers.
Proof. Use the same proof.


Problem 23.9. For an n × n matrix A with entries in a field F, show that we can put A into Jordan normal form as P A P⁻¹ using a square matrix P with entries from F just when the characteristic polynomial det(A − λ I) splits into n linear factors with coefficients from F.

Cracking of Jordan Blocks


Jordan normal form is very sensitive to small changes in matrix entries. For this reason, we cannot compute Jordan normal form unless we know the matrix entries precisely, to infinitely many decimals.
Problem 23.10. If A is n × n and has n different eigenvalues, show that A is diagonalizable.

Example 23.11. The matrix

Δ = [ 0 1 ]
    [ 0 0 ]

is not diagonalizable, but the nearby matrix

[ ε1  1 ]
[ 0  ε2 ]

is, as long as ε1 ≠ ε2, since these ε1 and ε2 are the eigenvalues of this matrix. It doesn't matter how small ε1 and ε2 are. The same idea clearly works for Δ of any size, and for λ + Δ of any size, and so for any matrix in Jordan normal form.
Theorem 23.12. Every complex square matrix can be approximated as closely as we like by a diagonalizable square matrix.
Remark 23.13. By approximated, we mean that we can make a new matrix with entries all as close as we wish to the entries of the original matrix.
Proof. Put your matrix into Jordan normal form, say F⁻¹AF is in Jordan normal form, and then use the trick from the last example, to make diagonalizable matrices B close to F⁻¹AF. Then F B F⁻¹ is diagonalizable too, and is close to A.
Remark 23.14. Using the same proof, we can also approximate any real matrix arbitrarily closely by diagonalizable real matrices (i.e. diagonalized by real change of basis matrices), just when its eigenvalues are real.
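The sensitivity described above is easy to see numerically. Below is a minimal sketch (ours, not from the text) using numpy: perturbing one entry of a 2 × 2 Jordan block by a tiny ε splits the double eigenvalue by roughly √ε, so the Jordan structure "cracks".

```python
import numpy as np

eps = 1e-10
J = np.array([[2.0, 1.0],
              [0.0, 2.0]])       # a 2 x 2 Jordan block with eigenvalue 2
J_perturbed = J.copy()
J_perturbed[1, 0] = eps           # a tiny perturbation below the diagonal

print(np.linalg.eigvals(J))             # [2., 2.]
print(np.linalg.eigvals(J_perturbed))   # roughly 2 +/- 1e-5: the block has split
```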


23.3 Uniqueness of Jordan Normal Form


Proposition 23.15. A square matrix has only one Jordan normal form, up to reordering the Jordan blocks.
Proof. Clearly the eigenvalues are independent of change of basis. The problem is to figure out how to measure, for each eigenvalue, the number of blocks of each size. Fix an eigenvalue λ. Let dm(λ, A) = dim ker (A − λ)^m. (We will only use this notation in this proof.) Clearly dm(λ, A) is independent of choice of basis. For example, d1(λ, A) is the number of blocks with eigenvalue λ, while d2(λ, A) counts vectors at or next to the end of strings with eigenvalue λ. All 1 × 1 blocks contribute to both d1(λ, A) and d2(λ, A). The difference d2(λ, A) − d1(λ, A) is the number of blocks of size at least 2 × 2. Similarly the number d3(λ, A) − d2(λ, A) measures the number of blocks of size at least 3 × 3, etc. But then the difference

(d2(λ, A) − d1(λ, A)) − (d3(λ, A) − d2(λ, A)) = 2 d2(λ, A) − d1(λ, A) − d3(λ, A)

is the number of blocks of size at least 2 × 2, but not 3 × 3 or more, i.e. exactly 2 × 2. The number of m × m blocks is the difference between the number of blocks at least m × m and the number of blocks at least (m + 1) × (m + 1), so

number of m × m blocks = (dm(λ, A) − dm−1(λ, A)) − (dm+1(λ, A) − dm(λ, A))
                       = 2 dm(λ, A) − dm−1(λ, A) − dm+1(λ, A).

(We can just define d0(λ, A) = 0 to allow this equation to hold even for m = 1.) Therefore the number of blocks of any size is independent of the choice of basis.
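The block-counting formula in this proof translates directly into a computation. Here is a small sketch (our own illustration, not the book's algorithm) that counts the m × m Jordan blocks of eigenvalue λ from the numbers dm(λ, A) = dim ker (A − λ)^m, computed as n minus a matrix rank; it is exact for the integer example below, but for general floating-point matrices the rank computation needs a tolerance, as the next section warns.

```python
import numpy as np

def jordan_block_count(A, lam, m):
    """Number of m x m Jordan blocks of A with eigenvalue lam,
    via 2*d_m - d_{m-1} - d_{m+1} with d_k = dim ker (A - lam)^k."""
    n = A.shape[0]
    B = A - lam * np.eye(n)
    def d(k):
        if k == 0:
            return 0
        return n - np.linalg.matrix_rank(np.linalg.matrix_power(B, k))
    return 2 * d(m) - d(m - 1) - d(m + 1)

# Example: one 1 x 1 block and one 2 x 2 block, both with eigenvalue 2.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])
print(jordan_block_count(A, 2.0, 1), jordan_block_count(A, 2.0, 2))  # 1 1
```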

23.3 Review Problems


Compute the matrix F for which F⁻¹AF is in Jordan normal form, and the Jordan normal form itself, for
Problem 23.11.

A = [ 1 1 ]
    [ 1 1 ]

Problem 23.12.

A = [ 0 0 1 ]
    [ 0 0 0 ]
    [ 0 0 0 ]


Problem 23.13. Thinking about the fact that Δ has string en, en−1, . . . , e1, what is the Jordan normal form of A = Δ²? (Don't try to find the matrix F bringing A to that form.)

Problem 23.14. Use the algorithm to compute the Jordan normal form of

A = [ 0 1 0 0 ]
    [ 0 0 1 0 ]
    [ 0 0 0 1 ]
    [ 1 0 2 0 ]

Problem 23.15. Use the algorithm to compute the Jordan normal form of

A = [ 0 0 1 0 0 ]
    [ 0 0 0 1 0 ]
    [ 0 0 0 0 1 ]
    [ 0 0 0 0 0 ]
    [ 0 0 0 0 0 ]

Problem 23.16. Without computation (and without finding the matrix F taking A to Jordan normal form), explain how you can see the Jordan normal form of

[ 1 10 100 ]
[ 0 20 200 ]
[ 0  0 300 ]

Problem 23.17. If a square complex matrix A satisfies a complex polynomial equation f(A) = 0, show that each eigenvalue of A must satisfy the same equation.
Problem 23.18. Prove that any reordering of Jordan blocks can occur by changing the choice of the matrix F we use to bring a matrix A to Jordan normal form.
Problem 23.19. Prove that every matrix is a product of two diagonalizable matrices.

Problem 23.20. Suppose that to each n n matrix A we assign a number D(A), and that D(AB ) = D(A)D(B ).

(a) Prove that D(P⁻¹AP) = D(A) for any invertible matrix P.
(b) Define a function f(x) by

f(x) = D [ x           ]
         [    1        ]
         [       1     ]
         [         ⋱   ]
         [           1 ]

Prove that f(x) is multiplicative; i.e. f(ab) = f(a)f(b) for any numbers a and b.
(c) Prove that on any diagonal matrix

A = [ a1             ]
    [     a2         ]
    [         ⋱      ]
    [            an  ]

we have D(A) = f(a1) f(a2) . . . f(an).
(d) Prove that on any diagonalizable matrix A, D(A) = f(λ1) f(λ2) . . . f(λn), where λ1, λ2, . . . , λn are the eigenvalues of A (counted with multiplicity).
(e) Use the previous exercise to show that D(A) = f(det(A)) for any matrix A.
(f) In this sense, det is the unique multiplicative quantity associated to a matrix, up to composing with a multiplicative function f. What are all continuous multiplicative functions f? (Warning: it is a deep result that there are many discontinuous multiplicative functions.)

24 Decomposition and Minimal Polynomial


The Jordan normal form has an abstract version for linear maps, called the decomposition of a linear map. Most results in this chapter are obvious consequences of the Jordan normal form from the previous chapter, but because Jordan normal form is so complicated we wont use it in this chapter. We will assume that the reader has read the last chapter up to, but perhaps not including, the algorithm.

24.1 The Spectral Theorem for Complex Linear Maps


Polynomial Division
Problem 24.1. Divide x² + 1 into x⁵ + 3x² + 4x + 1, giving quotient and remainder.

Problem 24.2. Use the Euclidean algorithm (subsection 17.5 on page 175) applied to polynomials instead of integers, to compute the greatest common divisor r(x) of a(x) = x⁴ + 2x³ + 4x² + 4x + 4 and b(x) = x⁵ + 2x³ + x² + 2. Find polynomials u(x) and v(x) so that u(x)a(x) + b(x)v(x) = r(x).
Given any pair of polynomials a(x) and b(x), the Euclidean algorithm writes their greatest common divisor r(x) as r(x) = u(x) a(x) + v(x) b(x), a linear combination of a(x) and b(x). Similarly, if we have any number of polynomials, we can write the greatest common divisor of any pair of them as a linear combination. Pick two pairs of polynomials, and write the greatest common divisor of the greatest common divisors as a linear combination, etc. Keep going until you hit the greatest common divisor of the entire collection. We can unwind this process, to write the greatest common divisor of the entire collection of polynomials as a linear combination of the polynomials themselves.
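As a quick illustration of this "unwinding" (not part of the original text), sympy's gcdex runs the extended Euclidean algorithm for polynomials, returning the coefficients u, v along with the greatest common divisor. The polynomials below are the ones from Problem 24.2, so treat the output only as a way to check a hand computation.

```python
from sympy import symbols, gcdex, simplify

x = symbols('x')
a = x**4 + 2*x**3 + 4*x**2 + 4*x + 4
b = x**5 + 2*x**3 + x**2 + 2

# Extended Euclidean algorithm: u*a + v*b = r, with r = gcd(a, b).
u, v, r = gcdex(a, b, x)
print(r)                               # the greatest common divisor
print(simplify(u*a + v*b - r) == 0)    # True
```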


Problem 24.3. For integers 2310, 990 and 1386 (instead of polynomials) express their greatest common divisor as an integer linear combination of them.

Generalized Eigenvectors
Lemma 24.1. Let T : V → V be a complex linear map on a finite dimensional vector space V of dimension n. For each vector v in V there is a polynomial q(x) of degree at most n for which q(T)v = 0, and for which the roots of q(x) are eigenvalues of T.
Remark 24.2. We could just let q(x) be the characteristic polynomial of T, and employ the Cayley–Hamilton theorem (theorem 18.14 on page 187). But we will opt for a more elementary proof.
Proof. Take a vector v in V, and consider the vectors v, Tv, T²v, . . . , Tⁿv. There are n + 1 vectors in this list, so there must be a linear relation

0 = a0 v + a1 Tv + · · · + an Tⁿv.

Let q(x) = a0 + a1 x + · · · + an xⁿ. Clearly q(T)v = 0. Rescale to get the leading coefficient (the highest nonzero aj) to be 1. Suppose that q(λ) = 0. If λ is not an eigenvalue of T, then T − λ is invertible, and we can drop the x − λ factor from q(x) and still satisfy q(T)v = 0.
Theorem 24.3. Let T : V → V be a complex linear map on a finite dimensional vector space. Every vector in V can be written as a sum of generalized eigenvectors, in a unique way. In other words, V is the direct sum of the generalized eigenspaces of T.
Proof. We have seen in the previous chapter that generalized eigenvectors with different eigenvalues are linearly independent. We need to show that every vector v can be written as a sum of generalized eigenvectors. Pick any vector v in V. Suppose that T has eigenvalues λ1, λ2, . . . , λs, all distinct from one another. Pick q(x) as in lemma 24.1. By the fundamental theorem of algebra, we can write q(x) as a product of linear factors, each of the form x − λj. Let qj(x) be the result of dividing out as many factors of x − λj as possible from q(x), say:

qj(x) = q(x) / (x − λj)^(dj)


The various qj(x) have no common divisors by construction. Therefore we can find polynomials uj(x) for which

u1(x)q1(x) + u2(x)q2(x) + · · · + us(x)qs(x) = 1.

Let vj = uj(T)qj(T)v. Clearly

v1 + v2 + · · · + vs = (u1(T)q1(T) + · · · + us(T)qs(T)) v = v.

These vj are generalized eigenvectors:

(T − λj)^(dj) vj = (T − λj)^(dj) qj(T) uj(T) v = q(T) uj(T) v = uj(T) q(T) v = 0.

24.2 The Minimal Polynomial


We are interested in equations satisfied by a matrix.
Lemma 24.4. Every linear map T : V → V on a finite dimensional vector space V satisfies a polynomial equation p(T) = 0 with p(x) a nonzero polynomial.
Remark 24.5. The Cayley–Hamilton theorem (theorem 18.14 on page 187) proves this lemma easily, but again we prefer an elementary proof.
Proof. The set of all linear maps V → V is a finite dimensional vector space, of dimension n² where n = dim V. Therefore the elements 1, T, T², . . . , T^(n²) cannot be linearly independent.
Problem 24.4. Prove that Δᵏ has zeros down the diagonal, for any integer k ≥ 1.
Problem 24.5. Prove that, for any number λ, the diagonal entries of (λ + Δ)ᵏ are all λᵏ.

Definition 24.6. The minimal polynomial of a square matrix A (or a linear map T : V → V) is the smallest degree polynomial m(x) = x^d + a_(d−1) x^(d−1) + · · · + a0 (with complex coefficients) for which m(A) = 0 (or m(T) = 0).
Lemma 24.7. There is a unique minimal polynomial for any linear map T : V → V on a finite dimensional vector space V. The minimal polynomial divides every other polynomial s(x) for which s(T) = 0.


Remark 24.8. The Cayley–Hamilton theorem (theorem 18.14 on page 187) coupled with this lemma ensures that the minimal polynomial divides the characteristic polynomial.
Proof. For example, if T satisfies two polynomials, say 0 = T³ + 3T + 1 and 0 = 2T³ + 1, then we can rescale the second equation by 1/2 to get 0 = T³ + 1/2, and then we have two equations which both start with T³, so just take the difference: 0 = (T³ + 3T + 1) − (T³ + 1/2). The point is that the T³ terms wipe each other out, giving a new equation of lower degree. Keep going until you get the lowest degree possible nonzero polynomial. Rescale to get the leading coefficient to be 1. If s(x) is some other polynomial, and s(T) = 0, then divide m(x) into s(x), say s(x) = q(x)m(x) + r(x), with remainder r(x) of smaller degree than m(x). But then 0 = s(T) = q(T)m(T) + r(T) = r(T), so r(x) has smaller degree than m(x), and r(T) = 0. But m(x) is already the smallest degree possible without being 0. So r(x) = 0, and m(x) divides s(x).
Problem 24.6. Prove that the minimal polynomial of the n × n matrix Δ is m(x) = xⁿ.

Problem 24.7. Prove that the minimal polynomial of an n × n Jordan block λ + Δ is m(x) = (x − λ)ⁿ.

Lemma 24.9. If A and B are square matrices with minimal polynomials mA(x) and mB(x), then the matrix

C = [ A 0 ]
    [ 0 B ]

has minimal polynomial mC(x) the least common multiple of the polynomials mA(x) and mB(x).
Proof. Calculate that

C² = [ A²  0 ]
     [ 0  B² ]

etc., so for any polynomial q(x),

q(C) = [ q(A)   0   ]
       [  0   q(B)  ]

Let g(x) be the least common multiple of the polynomials mA(x) and mB(x). Then clearly g(C) = 0. So mC(x) divides g(x). But mC(C) = 0, so mC(A) = 0. Therefore mA(x) divides mC(x). Similarly, mB(x) divides mC(x). So mC(x) is the least common multiple.


Using the same proof:
Lemma 24.10. If a linear map T : V → V has invariant subspaces U and W so that V = U ⊕ W, then the minimal polynomial of T is the least common multiple of the minimal polynomials of T|U and T|W.
Lemma 24.11. The minimal polynomial m(x) of a complex linear map T : V → V on a finite dimensional vector space V is

m(x) = (x − λ1)^(d1) (x − λ2)^(d2) . . . (x − λs)^(ds)

where λ1, λ2, . . . , λs are the eigenvalues of T and dj is no larger than the dimension of the generalized eigenspace of λj.
Remark 24.12. In fact dj is the size of the largest Jordan block with eigenvalue λj in the Jordan normal form. Let's first prove the result using Jordan normal form.
Proof. We can assume that T is already in Jordan normal form: the minimal polynomial is the least common multiple of the minimal polynomials of the blocks.
Remark 24.13. Next a proof which doesn't use Jordan normal form.
Proof. We need only prove the result on each generalized eigenspace since they form a direct sum. We can assume that V is a single generalized eigenspace, say with eigenvalue λ. The result holds for T just if it holds for T − λ, so we can assume that λ = 0 is our only eigenvalue. By lemma 24.1 on page 226, every vector v satisfies Tⁿ v = 0. So the minimal polynomial must divide xⁿ.
Corollary 24.14. A square matrix A (or linear map T : V → V) is diagonalizable just when it satisfies a polynomial equation s(A) = 0 (or s(T) = 0) with

s(x) = (x − λ1)(x − λ2) . . . (x − λs),

for some distinct numbers λ1, λ2, . . . , λs, which happens just when its minimal polynomial is a product of distinct linear factors.

24.2 Review Problems


Problem 24.8. Find the minimal polynomial of

A = [ 1 2 ]
    [ 3 4 ]


Problem 24.9. Prove that the minimal polynomial of any 2 × 2 matrix A is m(λ) = λ² − (tr A) λ + det A (where tr A is the trace of A), unless A is a multiple of the identity matrix, say A = c for some number c, in which case m(λ) = λ − c.
Problem 24.10. Use Jordan normal form to prove the Cayley–Hamilton theorem: every complex square matrix A satisfies p(A) = 0, where p(λ) = det(A − λ I) is the characteristic polynomial of A.

Problem 24.11. Prove that if A is a square matrix with real entries, then the minimal polynomial of A has real coefficients.

Problem 24.12. If A is a square matrix, and Aⁿ = 1, prove that A is diagonalizable over the complex numbers. Give an example to show that A need not be diagonalizable over the real numbers.

Appendix: How to Find the Minimal Polynomial


Given a square matrix A, to find its minimal polynomial requires that we find linear relations among powers of A. If we find a relation like A³ = I + 5A, then we can multiply both sides by A to obtain a relation A⁴ = A + 5A². In particular, once some power of A is a linear combination of lower powers of A, then every higher power is also a linear combination of lower powers. For each n × n matrix A, just for this appendix let's write A̲ (A with an underline) to mean the vector you get by chopping out the columns of A and stacking them on top of one another. For example, if

A = [ 1 2 ]
    [ 3 4 ]

then A̲ is the column vector (1, 3, 2, 4). Clearly a linear relation like A³ = I + 5A will hold just when it holds underlined. Now let's suppose that A is n × n, and let's form the matrix whose columns are the underlined powers

B = ( I̲   A̲   A̲²   . . .   A̲^(n²) ).


Clearly B has n² rows and n² + 1 columns. Apply forward elimination to B, and call the resulting matrix U. If one of the columns, let's say the column of A̲³, is not a pivot column, then A³ is a linear combination of lower powers of A, so therefore A⁴ is too, etc. So as soon as you hit a pivotless column of B, all subsequent columns are pivotless. Therefore U looks like a staircase, with pivots straight down the diagonal, until you hit rows of zeros. Cut out all of the pivotless columns of U except the first pivotless column. Also cut out the zero rows. Then apply back substitution, turning U into

[ 1             a0 ]
[    1          a1 ]
[       ⋱       ⋮  ]
[          1    ap ]

Then the minimal polynomial is

m(x) = x^(p+1) − a0 − a1 x − · · · − ap x^p.

To see that this works, you notice that we have cut out all but the column of A̲^(p+1), the smallest power of A that is a linear combination of lower powers. So the minimal polynomial has to express A^(p+1) as a linear combination of lower powers, i.e. solving the linear equations

a0 I + a1 A + · · · + ap A^p = A^(p+1).

These equations yield the matrix B, with a0, a1, . . . , ap as the unknowns, and we just apply elimination. On large matrices, this process is faster than finding the determinant. But it has the danger that small perturbations of the matrix entries alter the minimal polynomial drastically, so we can only apply this process when we know the matrix entries precisely.
Problem 24.13. Find the minimal polynomial of

[ 0 2 0 ]
[ 1 1 1 ]
[ 2 0 1 ]

What are the eigenvalues?
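Here is a short Python/sympy sketch of this appendix's recipe (our own illustration; the function name, and the use of rank checks instead of literal forward elimination, are ours). It stacks the vectorized ("underlined") powers of A as columns and stops at the first power that is a linear combination of the lower ones.

```python
from sympy import Matrix, eye, symbols

def minimal_polynomial_of(A):
    """Minimal polynomial of a square sympy Matrix A, found by detecting the
    first power of A that lies in the span of the lower powers."""
    n = A.shape[0]
    x = symbols('x')
    powers = [eye(n)]
    for _ in range(n):
        powers.append(powers[-1] * A)
    vecs = [P.reshape(n * n, 1) for P in powers]       # the "underlined" powers
    for p in range(1, n + 1):
        B = Matrix.hstack(*vecs[:p])
        if B.rank() == Matrix.hstack(B, vecs[p]).rank():
            coeffs, _ = B.gauss_jordan_solve(vecs[p])  # A^p = sum a_k A^k, k < p
            return x**p - sum(coeffs[k] * x**k for k in range(p))
    return None  # unreachable: Cayley-Hamilton gives a relation with p <= n

A = Matrix([[1, 2], [3, 4]])
print(minimal_polynomial_of(A))   # x**2 - 5*x - 2 for this A
```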

24.3 Decomposition of a Linear Map


There is a more abstract version of the Jordan normal form, applicable to an abstract nite dimensional vector space.


Denition 24.15. If T : V W is a linear map, and U is a subspace of V , recall that the restriction T |U : U W is dened by T |U u = T u for u in U . Denition 24.16. Suppose that T : V V is a linear map from a vector space back to itself, and U is a subspace of V . We say that U is invariant under T if whenever u is in U , T u is also in U . A dicult result to prove by any other means: Corollary 24.17. If a linear map T : V V on a nite dimensional vector space is diagonalizable, then its restriction to any invariant subspace is diagonalizable. Proof. The linear map satises the same polynomial equation, even after restricting to the subspace. Problem 24.14. Prove that a linear map T : V V on a nite dimensional vector space is diagonalizable just when every subspace invariant under T has a complementary subspace invariant under T . Denition 24.18. A linear map N : V V from a vector space back to itself is called nilpotent if there is some positive integer k for which N k = 0. Clearly a linear map on a nite dimensional vector space is nilpotent just when its minimal polynomial is p(x) = xk for some positive integer k . Corollary 24.19. A linear map N : V V is nilpotent just when the restriction of N to any N invariant subspace is nilpotent. Problem 24.15. Prove that the only nilpotent which is diagonalizable is 0. Problem 24.16. Give an example to show that the sum of two nilpotents might not be nilpotent. Denition 24.20. Two linear maps S : V V and T : V V commute when ST = T S . Lemma 24.21. The sum and dierence of commuting nilpotents is nilpotent. Proof. If S and T are nilpotent linear maps V V , say S p = 0 and T q = 0. r Then take any number r p + q and expand out the sum (S + T ) . Because S and T commute, every term can be written with all its S factors on the left, and all its T factors on the right, and there are r factors in all, so either p of the factors must be S or q must be T , hence each term vanishes. Lemma 24.22. If two linear maps S, T : V V commute, then S preserves each generalized eigenspace of T , and vice versa.


Proof. If ST = TS, then clearly ST² = T²S, etc., so that S p(T) = p(T) S for any polynomial p(T). Suppose that T has an eigenvalue λ. If x is a generalized eigenvector, i.e. (T − λ)^p x = 0 for some p, then (T − λ)^p Sx = S (T − λ)^p x = 0, so that Sx is also a generalized eigenvector with the same eigenvalue.
Theorem 24.23. Take T : V → V any linear map from a finite dimensional vector space back to itself. T can be written in just one way as a sum T = D + N with D diagonalizable, N nilpotent, and all three of T, D and N commuting.
Example 24.24. For T = λ + Δ, set D = λ, and N = Δ.
Remark 24.25. If any two of T, D and N commute, and T = D + N, then it is easy to check that all three commute.
Proof. First, let's prove that D and N exist, and then prove they are unique. One proof that D and N exist, which doesn't require Jordan normal form: split up V into the direct sum of the generalized eigenspaces of T. It is enough to find some D and N on each one of these spaces. But on each of these generalized eigenspaces, say with eigenvalue λ, we can let D = λ and let N = T − λ. So existence of D and N is obvious. Another proof that D and N exist, which uses Jordan normal form: pick a basis in which the matrix of T is in Jordan normal form. Let's also call this matrix T, say

T = [ λ1 + Δ                      ]
    [         λ2 + Δ              ]
    [                  ⋱          ]
    [                     λN + Δ  ]

Let

D = [ λ1              ]             [ Δ             ]
    [     λ2          ]   and   N = [    Δ          ]
    [         ⋱       ]             [       ⋱       ]
    [            λN   ]             [           Δ   ]

This proves that D and N exist. Why are D and N uniquely determined? All generalized eigenspaces of T are D and N invariant. So we can restrict to a single generalized eigenspace


of T, and need only show that D and N are uniquely determined there. If λ is the eigenvalue, then D − λ = (T − λ) − N is a difference of commuting nilpotents, so nilpotent by lemma 24.21 on page 232. Therefore D − λ is both nilpotent and diagonalizable, and so vanishes: D = λ and N = T − λ, uniquely determined.
Theorem 24.26. If T0 : V → V and T1 : V → V are commuting complex linear maps (i.e. T0 T1 = T1 T0) on a finite dimensional complex vector space V, then splitting each into its diagonalizable and nilpotent parts, T0 = D0 + N0 and T1 = D1 + N1, any two of the maps T0, D0, N0, T1, D1, N1 commute.
Proof. If x is a generalized eigenvector of T0 (so that (T0 − λ)ᵏ x = 0 for some λ and integer k > 0), then T1 x is also (because (T0 − λ)ᵏ T1 x = 0 too, by pulling the T1 across to the left). Since V is a direct sum of generalized eigenspaces of T0, we can restrict to a generalized eigenspace of T0 and prove the result there. So we can assume that T0 = λ0 + N0, for some complex number λ0. Switching the roles of T0 and T1, we can assume that T1 = λ1 + N1. Clearly D0 = λ0 and D1 = λ1 commute with one another and with anything else. The commuting of T0 and T1 is equivalent to the commuting of N0 and N1.
Remark 24.27. All of the results of this chapter apply equally well to any linear map T : V → V on any finite dimensional vector space over any field, as long as the characteristic polynomial of T splits into a product of linear factors.
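For a concrete matrix, the first existence proof translates directly into a computation: diagonalize what can be diagonalized and call the rest N. A minimal sympy sketch (ours; it leans on the built-in jordan_form rather than the generalized-eigenspace argument, which gives the same D and N):

```python
from sympy import Matrix, diag, zeros

def d_plus_n(T):
    """Split T = D + N with D diagonalizable, N nilpotent, and D N = N D."""
    P, J = T.jordan_form()                        # T = P * J * P**-1
    D_jordan = diag(*[J[i, i] for i in range(J.shape[0])])
    D = P * D_jordan * P.inv()                    # keep only the diagonal of J
    N = T - D                                     # what is left over is nilpotent
    return D, N

T = Matrix([[3, 1, 0],
            [0, 3, 0],
            [0, 0, 2]])
D, N = d_plus_n(T)
print(D, N)
print(D*N == N*D, N**3 == zeros(3, 3))            # True True
```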

24.3 Review Problems


Problem 24.17. Find the decomposition T = D + N of

T = [ 1 1 ]
    [ 0 2 ]

(using the same letter T for the linear map and its associated matrix).

Problem 24.18. If two linear maps S : V V and T : V V on a nite dimensional complex vector space commute, show that the eigenvalues of ST are products of eigenvalues of S with eigenvalues of T .

25 Matrix Functions of a Matrix Variable


We will make sense out of expressions like eT , sin T, cos T for square matrices T , and for linear maps T : V V . We expect that the reader is familiar with calculus and innite series. Denition 25.1. A function f (x) is analytic if near each point x = x0 , f (x) is the sum of a convergent Taylor series, say f (x) = a0 + a1 (x x0 ) + a2 (x x0 ) + . . . We will henceforth allow the variable x (and the point x = x0 around which we take the Taylor series) to take on real or complex values. Denition 25.2. If f (x) is an analytic function and T : V V is a linear map on a nite dimensional vector space, dene f (T ) to mean the innite series f (T ) = a0 + a1 (T x0 ) + a2 (T x0 ) + . . . , just plugging the linear map T into the expansion. Lemma 25.3. Under an isomorphism F : U V of nite dimensional vector spaces f F 1 T F = F 1 f (T )F, with each side dened when the other is. Proof. Expanding out, we see that F 1 T F
1 k 1 k 2 2 2

= F 1 T 2 F . By induction,

F T F = F T F for k = 1, 2, 3, . . . . So for any polynomial function p(x), we see that p F 1 T F = F 1 p(T )F . Therefore the partial sums of the Taylor expansion converge on the left hand side just when they converge on the right, approaching the same value. Remark 25.4. If a square matrix is in square blocks, A= B 0 235 0 C ,

236 then clearly f (A) = f (B ) 0

Matrix Functions of a Matrix Variable

0 . f (C )

So we only need to work one block at a time.
Theorem 25.5. Let f(x) be an analytic function. If A is a single n × n Jordan block, A = λ + Δ, and the series for f(x) converges near x = λ, then

f(A) = f(λ) + f′(λ) Δ + f″(λ) Δ²/2 + f‴(λ) Δ³/3! + · · · + f^(n−1)(λ) Δ^(n−1)/(n−1)!.

Proof. Expand out the Taylor series, keeping in mind that Δⁿ = 0.
Corollary 25.6. The value of f(A) does not depend on which Taylor series we use for f(x): we can expand f(x) about any point as long as the series converges on the spectrum of A. If we change the choice of the point to expand around, the resulting expression for f(A) determines the same matrix.
Proof. Entries will be given by the formulas above, which don't depend on the particular choice of Taylor series, only on the values of the function f(x) for x in the spectrum of A.
Corollary 25.7. Suppose that T : V → V is a linear map of a finite dimensional vector space. Split T into T = D + N, diagonalizable and nilpotent parts. Take f(x) an analytic function given by a Taylor series converging on the spectrum of T (which is the spectrum of D). Then

f(T) = f(D + N) = f(D) + f′(D) N + f″(D) N²/2! + f‴(D) N³/3! + · · · + f^(n−1)(D) N^(n−1)/(n−1)!

where n is the dimension of V.
Remark 25.8. In particular, the result is independent of the choice of point about which we expand f(x) into a Taylor series, as long as the series converges on the spectrum of T.
Example 25.9. Consider the matrix

A = [ 0 1 ]  =  0 + Δ.
    [ 0 0 ]


237

Then

sin A = sin(0 + Δ) = sin(0) + sin′(0) Δ = 0 + cos(0) Δ = Δ = [ 0 1 ]
                                                             [ 0 0 ]

Problem 25.1. Find

√(1 + Δ),  log(1 + Δ),  e^Δ.
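Theorem 25.5 can also be checked symbolically. A small sympy sketch (ours): compare the built-in matrix exponential of a 3 × 3 Jordan block λ + Δ with the finite sum f(λ) + f′(λ)Δ + f″(λ)Δ²/2 for f = exp.

```python
from sympy import Matrix, symbols, exp, eye, simplify, zeros

lam = symbols('lambda')
Delta = Matrix([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]])
A = lam * eye(3) + Delta                            # a single 3 x 3 Jordan block

lhs = A.exp()                                       # sympy's matrix exponential
rhs = exp(lam) * (eye(3) + Delta + Delta**2 / 2)    # Theorem 25.5 with f = exp
print((lhs - rhs).applyfunc(simplify) == zeros(3, 3))   # True
```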

Remark 25.10. If f (x) is analytic on the spectrum of a linear map T : V V on nite dimensional vector space, then we could actually dene f (T ) by using a dierent Taylor series for f (x) around each eigenvalue, but that would require a more sophisticated theory. (We could, for example, split up V into generalized eigenspaces for T , and compute out f (T ) on each generalized eigenspace separately; this proves convergence.) However, we will never use such a complicated theorywe will only dene f (T ) when f (x) has a single Taylor series converging on the entire spectrum of T .

25.1 The Exponential and Logarithm


Example 25.11. The series

e^x = 1 + x + x²/2! + x³/3! + · · ·

converges for all values of x. Therefore eT is dened for any linear map T : V V on any nite dimensional vector space. Problem 25.2. Recall that the trace of an n n matrix A is tr A = A11 + A22 + + Ann . Prove that det eA = etr A . Example 25.12. For any positive number r,

log (x + r) =
k=0

1 x k r

For |x| < r, this series converges by the ratio test. (Clearly x can actually be real or complex.) If T : V V is a linear map on a nite dimensional vector space, and if every eigenvalue of T has positive real part, then we can pick r larger than the largest eigenvalue of T . Then

log T =
k=0

1 k

T r r

converges. The value of this sum does not depend on the value of r, since the value of log x = log (x r + r) doesnt.

238

Matrix Functions of a Matrix Variable

Remark 25.13. The same tricks work for complex linear maps. We wont ever be tempted to consider f (T ) for f (x) anything other than a real-valued function of a real variable; the reader may be aware that there is a sophisticated theory of complex functions of a complex variable.
Problem 25.3. Find the Taylor series of f(x) = 1/x around the point x = r (as long as r ≠ 0). Prove that f(A) = A⁻¹, if all eigenvalues of A have positive real part.

Problem 25.4. If f (x) is the sum of a Taylor series converging on the spectrum of a matrix A, why are the entries of f (A) smooth functions of the entries of A? (A function is called smooth to mean that we can dierentiate the function any number of times with respect to any of its variables, in any order.) Problem 25.5. For any complex number z = x + iy , prove that ez converges to ex (cos y + i sin y ). Problem 25.6. Use the result of the previous exercise to prove that elog A = A if all eigenvalues of T have positive real part. Problem 25.7. Use the results of the previous two exercises to prove that log eA = A if all eigenvalues of A have imaginary part strictly between /2 and /2. Lemma 25.14. If A and B are two n n matrices, and AB = BA, then eA+B = eA eB . Proof. We expand out the product eA eB and collect terms. (The process proceeds term by term exactly as it would if A and B were real numbers, because AB = BA.) Corollary 25.15. eA is invertible for all square matrices A, and eA Proof. A commutes with A, so eA eA = e0 = 1. Denition 25.16. A real matrix A is skew-symmetric if At = A. A complex matrix A is skew-adjoint if A = A. Corollary 25.17. If A is skew-symmetric/complex skew-adjoint then eA is orthogonal/unitary. Proof. Term by term in the Taylor series, eA other cases.
t 1

= e A .

= eA = eA . Similarly for the
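Corollary 25.17 is easy to check numerically. A small sketch (ours) using scipy's matrix exponential: the exponential of a random skew-symmetric matrix is orthogonal to machine precision, and in fact has determinant 1 (compare Problem 25.2).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
M = rng.normal(size=(4, 4))
A = M - M.T                       # skew-symmetric: A.T == -A

Q = expm(A)                       # Corollary 25.17: Q should be orthogonal
print(np.allclose(Q.T @ Q, np.eye(4)))      # True
print(np.isclose(np.linalg.det(Q), 1.0))    # True: in fact a rotation
```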

Lemma 25.18. If two n n matrices A and B commute (AB = BA) and the eigenvalues of A and B have positive real part and the products of their eigenvalues also have positive real part, then log (AB ) = log A + log B .

25.1. The Exponential and Logarithm

239

Proof. The eigenvalues of AB will be products of eigenvalues of A and of B , as see in section 24.3 on page 231. Again the result proceeds as it would for A and B numbers, term by term in the Taylor series. Corollary 25.19. If A is orthogonal/unitary and all eigenvalues of A have positive real part, then log A is skew-symmetric/complex skew-adjoint. Problem 25.8. What do corollaries 25.17 and 25.19 say about the matrix A= 0 1 1 ? 0

Problem 25.9. What can you say about eA for A symmetric? For A selfadjoint? If we slightly alter a matrix, we only slightly alter its spectrum. Theorem 25.20 (Continuity of the spectrum). Suppose that A is a n n complex matrix, and pick some disks in the complex plane which together contain exactly k eigenvalues of A (counting each eigenvalue by its multiplicity). In order to ensure that a complex matrix B also has exactly k eigenvalues (also counted by multiplicity) in those same disks, it is sucient to ensure that each entry of B is close enough to the corresponding entry of A. Proof. Eigenvalues are the roots of the characteristic polynomial det (A I ). If each entry of B is close enough to the corresponding entry of A, then each coecient of the characteristic polynomial of B is close to the corresponding coecient of the characteristic polynomial of A. The result follows by the argument principle (theorem 25.32 on page 243 in the appendix to this chapter).

Remark 25.21. The eigenvalues vary as dierentiable functions of the matrix entries as well, except where eigenvalues collide (i.e. at matrices for which two eigenvalues are equal), when there might not be any way to write the eigenvalues in terms of dierentiable functions of matrix entries. In a suitable sense, the eigenvectors can also be made to depend dierentiably on the matrix entries away from eigenvalue collisions. See Kato [5] for more information. Problem 25.10. Find the eigenvalues of A= 0 t 1 0

as a function of t. What happens at t = 0? Problem 25.11. Prove that if an n n complex matrix A has n distinct eigenvalues, then so does every complex matrix whose entries are close enough to the entries of A.

240

Matrix Functions of a Matrix Variable

Corollary 25.22. If f (x) is an analytic function given by a Taylor series converging on the spectrum of a matrix A, then f (B ) is dened by the same Taylor expansion as long as each entry of B is close enough to the corresponding entry of A. Problem 25.12. Prove that a complex square matrix A is invertible just when it has the form A = eL for some square matrix L.

25.2 Appendix: Analysis of Innite Series


Proposition 25.23. Suppose that f (x) is dened by a convergent Taylor series f (x) =
k

ak (x x0 ) ,

converging for x near x0 . Then both of the series |ak | |x x0 | ,


k k

kak (x x0 )
k

k1

converge for x near x0 . Proof. We can assume just by replacing x by x x0 that x0 = 0. Lets suppose that our Taylor series converges for b < x < b. Then it must converge for x = b/r for any r > 1. So the terms must get small eventually, i.e. ak For large enough k , |ak | < Pick any R < r. Then |ak | b R
k

b r

0.

r b

<

b R r = R

r b

Therefore if |x| b/R, we have |ak | |x|


k

r R

25.2. Appendix: Analysis of Innite Series

241

a geometric series of diminishing terms, so convergent. Similarly, kak xk which converges by the comparison test. Corollary 25.24. Under the same conditions, the series k
k

r R

ak (x x0 )

converges in the same domain. Proof. The same trick works. Lemma 25.25. If f (x) is the sum of a convergent Taylor series, then f (x) is too. More specically, if f (x) = then f ( x) = Proof. Let f1 ( x) = kak (x x0 )
k 1

ak (x x0 ) ,
k 1

kak (x x0 )

which we know converges in the same open interval where the Taylor series for f (x) converges. We have to show that f1 (x) = f (x). Pick any points x + h and x where f (x) converges. Expand out: f (x + h) f (x) f1 (x) = h =
k k

ak (x + h) h
k

ak xk

kak xk1

ak

(x + h) x kxk1 h
k

=
k

ak
=0 k

xk h xk h
k

xk h1 kxk1

=h
k

ak
=2

=h
k

ak k (k 1)
=0

1 k 2 k 2 x h. ( + 2)( + 1)

242

Matrix Functions of a Matrix Variable

Each term has absolute value no more than ak k (k 1) k (x + h)k2

which are the terms of a convergent series. The expression f (x + h) f (x) f1 (x) h is governed by a convergent series multiplied by h. In particular the limit as h 0 is 0. Corollary 25.26. Any function f (x) which is the sum of a convergent Taylor series in a disk has derivatives of all orders everywhere in the interior of that disk, given by formally dierentiating the Taylor series of f (x). All of these tricks work equally well for complex functions of a complex variable, as long as they are sums of Taylor series.
z p z p

25.3 Appendix: Perturbing Roots of Polynomials


Lemma 25.27. As a point z travels counterclockwise around a circle, it travels once around every point inside the circle, and does not travel around any point outside the circle. Remark 25.28. Lets state the result more precisely. The angle between any points z and p is dened only up to 2 multiples. As we will see, if z travels around a circle counterclockwise, and p doesnt lie on that circle, we can select this angle to be a continuously varying function . If p is inside the circle, then this function increases by 2 as z travels around the circle. If p is outside the circle, then this function is periodic as z travels around the circle. Proof. Rotate the picture to get p to lie on the positive x-axis, say p = (p0 , 0). Scale to get the circle to be the unit circle, so z = (cos , sin ). The vector from z to p is p z = (p0 cos , sin ) . This vector has angle from the horizontal, where (cos , sin ) = with r= (p0 cos ) + sin2 .
2

A point z travelling around a circle winds around each point p inside, and doesnt wind around any point p outside.
z p z p

As z travels around the circle, the angle from p to z increases by 2 if p is inside, but is periodic if p is outside.

p0 cos sin , r r

If p0 > 1, then cos > 0 so that after adding multiples of 2 , we must have contained inside the domain of the arcsin function: = arcsin sin r ,

25.3. Appendix: Perturbing Roots of Polynomials

243

a continuous function of , and ( + 2 ) = (). This continuous function is uniquely determined up to adding integer multiples of 2 . On the other hand, suppose that p0 < 1. Consider the angle between Z = (p0 cos , p0 sin ) and P = (1, 0). By the argument above = arcsin where r=
2 (1 p0 cos ) + p2 0 sin . 2

p0 sin r

Rotating by takes P to z and Z to p. Therefore the angle of the ray from z to p is = () + , a continuous function increasing by 2 every time increases by 2 . Since is uniquely determined up to adding integer multiples of 2 , so is . Corollary 25.29. Consider the complex polynomial function P (z ) = (z p1 ) (z p2 ) . . . (z pn ) . Suppose that p1 lies inside some disk, and all other roots p2 , p3 , . . . , pn lie outside that disk. Then as z travels once around the boundary of that disk, the argument of the complex number w = P (z ) increases by 2 . Proof. The argument of a product is the sum of the arguments of the factors, so the argument of P (z ) is the sum of the arguments of z p1 , z p2 , etc. Corollary 25.30. Consider the complex polynomial function P (z ) = a (z p1 ) (z p2 ) . . . (z pn ) . Suppose that some roots p1 , p2 , . . . , pk all lie inside some disk, and all other roots pk+1 , pk+2 , . . . , pn lie outside that disk. Then as z travels once around the boundary of that disk, the argument of the complex number w = P (z ) increases by 2k . Corollary 25.31. Consider two complex polynomial functions P (z ) and Q(z ). Suppose that P (z ) has k roots lying inside some disk, and Q(z ) has roots lying inside that same disk, and all other roots of P (z ) and Q(z ) lie outside that disk. (So no roots of P (z ) or Q(z ) lie on the boundary of the disk.) Then as z travels once around the boundary of that disk, the argument of the complex number w = P (z )/Q(z ) increases by 2 (k ). Theorem 25.32 (The argument principle). If P (z ) is a polynomial, with k roots inside a particular disk, and no roots on the boundary of that disk, then every polynomial Q(z ) of the same degree as P (z ) and whose coecients are suciently close to the coecients of P (z ) has exactly k roots inside the same disk, and no roots on the boundary.

244

Matrix Functions of a Matrix Variable

Proof. To apply corollary 25.31 on the previous page, we have only to ensure that Q(z )/P (z ) is not going to change in argument (or vanish) as we travel around the boundary of that disk. So we have only to ensure that while z stays on the boundary of the disk, Q(z )/P (z ) lies in a particular half-plane, for example that Q(z )/P (z ) is never a negative real number (or 0). So it is enough to ensure that |P (z ) Q(z )| < |P (z )| for z on the boundary of the disk. Let m be the minimum value of |P (z )| for z on the boundary of the disk. Suppose that the furthest point of our disk from the origin is some point z with |z | = R. Then if we write out Q(z ) = P (z ) + cj z j , we only need to ensure that the coecients c0 , c1 , . . . , cn satisfy |cj |Rj < m, to be sure that Q(z ) will have the same number of roots as P (z ) in that disk.

26 Symmetric Functions of Eigenvalues


26.1 Symmetric Functions
Denition 26.1. A function f (x1 , x2 , . . . , xn ) is symmetric if its value is unchanged by permuting the variables x1 , x2 , . . . , xn . For example, x1 + x2 + + xn is clearly symmetric. Denition 26.2. The elementary symmetric functions are the functions s1 (x) = x1 + x2 + . . . + xn s2 (x) = x1 x2 + x1 x3 + + x1 xn + x2 x3 + + xn1 xn . . . sk (x) =
i1 <i2 <<ik

xi1 xi2 . . . xik .

For any (real or complex) numbers x = (x1 , x2 , . . . , xn ) let Px (t) = (t x1 ) (t x2 ) . . . (t xn ) . Clearly the roots of Px (t) are precisely the entries of the vector x. Problem 26.1. Prove that Px (t) = tn s1 (x)tn1 + s2 (x)tn2 + + (1) sk (x)tnk + + (1)n sn (x). Denition 26.3. Let s(x) = (s1 (x), s2 (x), . . . , sn (x)), so that s : Rn Rn (If we work with complex numbers, then s : Cn Cn .) Lemma 26.4. The map s is onto, i.e. for each complex vector w in Cn there is a complex vector z in Cn so that s(z ) = w. Proof. Pick any w in Cn . Let z1 , z2 , . . . , zn be the complex roots of the polynomial P (t) = tn w1 tn1 + w2 tn2 + + (1)n wn . Such roots exist by the fundamental theorem of algebra (see theorem 15.21 on page 158). Clearly Pz (t) = P (t), since these polynomial functions have the same roots and same leading term. 245
k

246

Symmetric Functions of Eigenvalues

Lemma 26.5. The entries of two vectors z and w are permutations of one another just when s(z ) = s(w). Proof. The roots of Pz (t) and Pw (t) are the same numbers. Corollary 26.6. A function is symmetric just when it is a function of the elementary symmetric functions. Remark 26.7. This means that every symmetric function f : Cn C has the form f (z ) = h(s(z )), for a unique function h : Cn C , and conversely if h is any function at all, then f (z ) = h(s(z )) determines a symmetric function. Theorem 26.8. A symmetric function of some complex variables is continuous just when it is expressible as a continuous function of the elementary symmetric functions, and this expression is uniquely determined. Proof. If h(z ) is continuous, clearly f (z ) = h(s(z )) is. If f (z ) is continuous and symmetric, then given any sequence w1 , w2 , . . . in Cn converging to a point w, we let z1 , z2 , . . . be a sequence in Cn for which s (zj ) = wj , and z a point for which s(z ) = w. The entries of zj are the roots of the polynomial Pzj (t) = tn wj 1 tn1 + wj 2 tn2 + + (1)n wjn . By the argument principle (theorem 25.32 on page 243), we can rearrange the entries of each of the various z1 , z2 , . . . vectors so that they converge to z . Therefore h (wj ) = f (zj ) converges to f (z ) = h(w). If there are two expressions, f (z ) = h1 (s(z )) and f (z ) = h2 (s(z )), then because s is onto, h1 = h2 .
a1 a2 an If a = (a1 , a2 , . . . , an ), write z a to mean z1 z2 . . . zn . Call a the weight of a the monomial z . We will order weights by alphabetical order, for example so that (2, 1) > (1, 2). Dene the weight of a polynomial to be the highest weight of any of its monomials. (The zero polynomial will not be assigned any weight.) The weight of a product of nonzero polynomials is the sum of the weights. The weight of a sum is at most the highest weight of any term. The weight of sj (z ) is (1, 1, . . . , 1, 0, 0, . . . , 0). j

Theorem 26.9. Every symmetric polynomial f has exactly one expression as a polynomial in the elementary symmetric polynomials. If f has real/rational/integer coecients, then f is a real/rational/integer coecient polynomial of the elementary symmetric polynomials. Proof. For any monomial z a , let
za = p

z p(a)

26.1. Symmetric Functions

247

a sum over all permutations p. Every symmetric polynomial, if it contains a monomial z a , must also contain z p(a) , for any permutation p. Hence every symmetric polynomial is a sum of z a polynomials. Consequently the weight a of a symmetric polynomial f must satisfy a1 a2 an . We have only to write the z a in terms of the elementary symmetric functions, with integer coecients. Let bn = an , bn1 = an1 bn , bn2 = an2 bn1 , . . . , b1 = a1 b2 . Then s(z )b has leading monomial z a , so z a s(z )b has lower weight. Apply induction on the weight.
Example 26.10. z1² + z2² = (z1 + z2)² − 2 z1 z2. To compute out these expressions: f(z) = z1² + z2² has weight (2, 0). The polynomials s1(z) and s2(z) have weights (1, 0) and (1, 1). So we subtract off the appropriate power of s1(z) from f(z), and find f(z) − s1(z)² = −2 z1 z2 = −2 s2(z).
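A quick sympy check of this reduction (our own illustration), together with the three-variable analogue z1² + z2² + z3² = s1² − 2 s2:

```python
from sympy import symbols, expand

z1, z2, z3 = symbols('z1 z2 z3')

# Two variables: z1^2 + z2^2 = s1^2 - 2 s2 with s1 = z1 + z2, s2 = z1*z2.
print(expand((z1 + z2)**2 - 2*z1*z2))       # z1**2 + z2**2

# Three variables: same recipe, one more elementary symmetric function.
s1 = z1 + z2 + z3
s2 = z1*z2 + z1*z3 + z2*z3
print(expand(s1**2 - 2*s2))                 # z1**2 + z2**2 + z3**2
```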

Sums of powers
j j j Dene pj (z ) = z1 + z2 + + zn , the sums of powers.

Lemma 26.11. The sums of powers are related to the elementary symmetric functions by 0 = k sk p1 sk1 + p2 sk2 + (1)k1 pk1 s1 + (1)k pk . Proof. Lets write z ( ) for z with the -th entry removed, so if z is a vector in Cn , then z ( ) is a vector in Cn1 . pj skj = zj
i1 <i2 <<ikj

zi1 zi2 . . . zikj

Either we cant pull a z factor out of the second sum, or we can:


i1 ,i2 ,= i1 ,i2 ,=

= =

zj
i1 <i2 <<ikj

zi1 zi2 . . . zikj +


)

z j +1
i1 <i2 <<ikj 1 )

zi1 zi2 . . . zikj1

z j skj z (

z j +1 skj 1 z (

Putting in successive terms of our sum, pj skj pj +1 skj 1 = = z j skj z (


)

+
)

z j +1 skj 1 z (

z j +1 skj 1 z ( z j skj z (
)

z j +2 skj 2 z (
)

z j +2 skj 2 z (

248 Hence the sum collapses to p1 sk p2 sk1 + + (1)k1 pk1 s1 =

Symmetric Functions of Eigenvalues

z sk1 z (

+ (1)k1

z k s0 z (

= k sk + (1)k1 pk .

Proposition 26.12. Every symmetric polynomial is a polynomial in the sums of powers. If the coecients of the symmetric polynomial are real (or rational), then it is a real (or rational) polynomial function of the sums of powers. Every continuous symmetric function of complex variables is a continuous function of the sums of powers. Proof. We can solve recursively for the sums of powers in terms of the elementary symmetric functions and conversely. Remark 26.13. The standard reference on symmetric functions is [7].

26.2 The Invariants of a Square Matrix


Denition 26.14. A complex-valued or real-valued function f (A), depending on the entries of a square matrix A, is an invariant if f F AF 1 = f (A) for any invertible matrix F . So an invariant is independent of change of basis. If T : V V is a linear map on an n-dimensional vector space, we can dene the value f (T ) of any invariant f of n n matrices, by letting f (T ) = f (A) where A is the matrix associated to F T F 1 , for any isomorphism F : Rn V . Example 26.15. For any n n matrix A, write det(A ) = sn (A) sn1 (A) + sn2 (A)2 + + (1)n n . The functions s1 (A), s2 (A), . . . , sn (A) are invariants. Example 26.16. The functions pk (A) = tr Ak . are invariants. Problem 26.2. If A is diagonal, say z1 z2 A=

.. , zn

then prove that sj (A) = sj (z1 , z2 , . . . , zn ), the elementary symmetric functions of the eigenvalues.
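To see these invariants in action numerically (an illustration of ours, not part of the text), one can check that the coefficients of the characteristic polynomial and the power sums tr Aᵏ do not change under conjugation A → F A F⁻¹:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
F = rng.normal(size=(4, 4))            # almost surely invertible
B = F @ A @ np.linalg.inv(F)

# Characteristic polynomial coefficients (elementary symmetric functions of
# the eigenvalues, up to sign) agree for A and B ...
print(np.allclose(np.poly(A), np.poly(B)))             # True
# ... and so do the power sums p_k = tr A^k.
print([np.allclose(np.trace(np.linalg.matrix_power(A, k)),
                   np.trace(np.linalg.matrix_power(B, k))) for k in (1, 2, 3)])
```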

26.2. The Invariants of a Square Matrix

249

Problem 26.3. Generalize the previous exercise to A diagonalizable. Theorem 26.17. Each continuous (or polynomial) invariant function of a complex matrix has exactly one expression as a continuous (or polynomial) function of the elementary symmetric functions of the eigenvalues. Each polynomial invariant function of a real matrix has exactly one expression as a polynomial function of the elementary symmetric functions of the eigenvalues. Remark 26.18. We can replace the elementary symmetric functions of the eigenvalues by the sums of powers of the eigenvalues. Proof. Every continuous invariant function f (A) determines a continuous function f (z ) by setting z1 z2 . A= .. . zn Taking F any permutation matrix, invariance tells us that f F AF 1 = f (A). But f F AF 1 is given by applying the associated permutation to the entries of z . Therefore f (z ) is a symmetric function. If f (A) is continuous (or polynomial) then f (z ) is too. Therefore f (z ) = h(s(z )), for some continuous (or polynomial) function h; so f (A) = h(s(A)) for diagonal matrices. By invariance, the same is true for diagonalizable matrices. If we work with complex matrices, then every matrix can be approximated arbitrarily closely by diagonalizable matrices (by theorem 23.12 on page 221). Therefore by continuity of h, the equation f (A) = h(s(A)) holds for all matrices A. For real matrices, the equation only holds for those matrices whose eigenvalues are real. However, for polynomials this is enough, since two polynomial functions equal on an open set must be equal everywhere. Remark 26.19. Consider the function f (A) = sj (|1 | , |2 | , . . . , |n |), where A has eigenvalues 1 , 2 , . . . , n . This function is a continuous invariant of a real matrix A, and is not a polynomial in 1 , 2 , . . . , n .

27 The Pfaan
Skew-symmetric matrices have a surprising additional polynomial invariant, called the Pfaan, but it is only invariant under rotations, and only exists for skew-symmetric matrices with an even number of rows and columns.

27.1 Skew-Symmetric Normal Form


Theorem 27.1. For any skew-symmetric matrix A with an even number of rows and columns, there is a rotation matrix F so that 0 a1 a1 0 0 a2 1 a2 0 F AF = . . .. 0 an an 0 (We can say that F brings A to skew-symmetric normal form.) If A is any skewsymmetric matrix with an odd number of rows and columns, we can arrange the same equation, again via a rotation matrix F , but the normal form has an extra row of zeroes and an extra column of zeroes: 0 a1 0 a1 0 a 2 a 0 2 F AF 1 = . .. . 0 an an 0 0 Proof. Because A is skew-symmetric, A is skew-adjoint, so normal when thought of as a complex matrix. So there is a unitary basis of C2n of complex eigenvectors 251

252

The Pfaan

of A. If is an complex eigenvalue of A, with complex eigenvector z , scale z to have unit length, and then = z, z = z, z = Az, z = z, Az = z, z z, z = = . , i.e. has the form ia for some real number a. So there are Therefore = two dierent possibilities: = 0 or = ia with a = 0. If = 0, then z lies in the kernel, so if we write z = x + iy then both x and y lie in the kernel. In particular, we can write a real orthonormal basis for the kernel, and then x and y will be real linear combinations of those basis vectors, and therefore z will be a complex linear combination of those basis vectors. Lets take u1 , u2 , . . . , us to be a real orthonormal basis for the kernel of A, and clearly then the same vectors u1 , u2 , . . . , un form a complex unitary basis for the complex kernel of A. Next lets take care of the nonzero eigenvalues. If = ia is a nonzero eigenvalue, with unit length eigenvector z , then taking complex conjugates on the equation Az = z = iaz , we nd Az = iaz , so z is another eigenvector with eigenvalue ia. So they come in pairs. Since the eigenvalues ia and ia are distinct, the eigenvectors z and z must be perpendicular. So we can always make a new unitary basis of eigenvectors, throwing out any = ia eigenvector and replacing it with z if needed, to ensure that for each eigenvector z in our unitary basis of eigenvectors, z also belongs to our unitary basis. Moreover, we have three equations: Az = iaz , z, z = 1, and z, z = 0. Write z = x + iy with x and y real vectors, and expand out all three equations in terms of x and y to nd Ax = ay, Ay = ax, x, x + y, y = 1, x, x y, y = 0 and 1 1 x, y = 0. So if we let X = x and Y = y , then X and Y are unit vectors, 2 2 and AX = aY and AY = aX . Now if we carry out this process for each eigenvalue = ia with a > 0, then we can write down vectors X1 , Y1 , X2 , Y2 , . . . , Xt , Yt , one pair for each eigenvector from our unitary basis with a nonzero eigenvalue. These vectors are each unit length, and each Xi is perpendicular to each Yi . We also have AXi = ai Yi and AYi = ai Xi . If zi and zj are two dierent eigenvectors from our original unitary basis of eigenvectors, and their eigenvalues are i = iai and j = iaj with ai , aj > 0, then we want to see why Xi , Yi , Xj and Yj must be perpendicular. This follows immediately from zi , z i , zj and z j being perpendicular, by just expanding into real and imaginary parts. Similarly, we can see that u1 , u2 , . . . , us are

27.2. Partitions

253

perpendicular to each Xi and Yi . So nally, we can let F = X1 Y1 X2 Y2 ... Xt Yt u1 u2 ... us .

Clearly these vectors form a real orthonormal basis, so F is an orthogonal matrix. We want to arrange that F be a rotation matrix. Lets suppose that F is not a rotation. We can either change the sign of one of the vectors u1 , u2 , . . . , us (if there are any), or replace X1 by X1 , which switches the sign of a1 , to make F orthogonal.

27.2 Partitions
A partition of the numbers 1, 2, . . . , 2n is a choice of division of these numbers into pairs. For example, we could choose to partition 1, 2, 3, 4, 5, 6 into {4, 1} , {2, 5} , {6, 3} . This is the same partition if we write the pairs down in a dierent order, like {2, 5} , {6, 3} , {4, 1} , or if we write the numbers inside each pair down in a dierent order, like {1, 4} , {5, 2} , {6, 3} . It isnt really important that the objects partitioned be numbers. Of course, you cant partition an odd number of objects into pairs. Each permutation p of the numbers 1, 2, 3, . . . , 2n has an associated partition {p(1), p(2)} , {p(3), p(4)} , . . . , {p(2n 1), p(2n)} . For example, the permutation 3, 1, 4, 6, 5, 2 has associated partition {3, 1} , {4, 6} , {5, 2} . Clearly two dierent permutations p and q could have the same associated partition, i.e. we could rst transpose various of the pairs of the partition of p, keeping each pair in order, and then transpose entries within each pair, but not across dierent pairs. Consequently, there are n!2n dierent permutations associating the same partition: n! ways to permute pairs, and 2n ways to swap the order within each pair. When you permute a pair, like changing 3, 1, 4, 6, 5, 2 to 4, 6, 3, 1, 5, 2, this is the eect of a pair of transpositions (one to permute 3 and 4 and another to permute 5 and 6), so has no eect on signs. Therefore if two permutations have the same partition, the root cause of any dierence in sign must be from transpositions inside each pair. For example, while it is complicated to nd the signs of the permutations 3, 1, 4, 6, 5, 2 and of 4, 6, 1, 3, 5, 2, it is easy to see that these signs must be dierent.

254

The Pfaan

On the other hand, we can write each partition in alphabetical order, like for example rewriting {4, 1} , {2, 5} , {6, 3} as {1, 4} , {2, 5} , {3, 6} so that we put each pair in order, and then order the pairs among one another by their lowest elements. This in term determines a permutation, called the natural permutation of the partition, given by putting the elements in that order; in our example this is the permutation 1, 5, 2, 5, 3, 6. We write the sign of a permutation p as sgn(p), and dene the sign of a partition P to be sign of its natural permutation. Watch out: if we start with a permutation p, like 6, 2, 4, 1, 3, 5, then the associated partition P is {6, 2} , {4, 1} , {3, 5} . This is the same partition as {1, 4} , {2, 6} , {3, 5} (just written in alphabetical order). The natural permutation q of P is therefore 1, 4, 2, 6, 3, 5, so the original permutation p is not the natural permutation of its associated partition. Problem 27.1. How many partitions are there of the numbers 1, 2, . . . , 2n?

Problem 27.2. Write down all of the partitions of (a) 1, 2; (b) 1, 2, 3, 4; (c) 1, 2, 3, 4, 5, 6.
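The recursive structure of partitions into pairs is easy to experiment with on a computer. Here is a small sketch in Python (the function name pair_partitions is our own, not the text's) that lists every partition of an even collection into pairs; it may help in checking answers to problems 27.1 and 27.2.

def pair_partitions(items):
    # yield each partition of `items` (assumed to have even length) into unordered pairs
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in pair_partitions(remaining):
            yield [(first, partner)] + sub

# for example, list(pair_partitions([1, 2, 3, 4])) gives the three partitions
# [(1, 2), (3, 4)], [(1, 3), (2, 4)] and [(1, 4), (2, 3)]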

27.3 The Pfaffian


We want to write down a square root of the determinant.
Example 27.2. If A is a 2 × 2 skew-symmetric matrix,
A = ( 0 a ; −a 0 )
(writing the rows of the matrix separated by semicolons), then det A = a², so the entry a = A12 is a polynomial function of the entries of A which squares to det A.


Example 27.3. A huge calculation shows that if A is a 4 × 4 skew-symmetric matrix, then
det A = (A12 A34 − A13 A24 + A14 A23)².
So A12 A34 − A13 A24 + A14 A23 is a polynomial function of the entries of A which squares to det A.
Definition 27.4. For any 2n × 2n skew-symmetric matrix A, let
Pf A = (1/(n! 2^n)) Σ_p sgn(p) A_{p(1)p(2)} A_{p(3)p(4)} . . . A_{p(2n−1)p(2n)},
where the sum is over all permutations p of 1, 2, . . . , 2n and sgn(p) is the sign of the permutation p. Pf is called the Pfaffian.
Remark 27.5. Don't ever try to use this horrible formula to compute a Pfaffian. We will find a better way soon.
Lemma 27.6. For any 2n × 2n skew-symmetric matrix A,
Pf A = Σ_P sgn(P) A_{p(1)p(2)} A_{p(3)p(4)} . . . A_{p(2n−1)p(2n)},
where the sum is over partitions P and the permutation p is the natural permutation of the partition P. In particular, Pf A is an integer coefficient polynomial of the entries of the matrix A.
Proof. Each permutation p has an associated partition P. So we can write the Pfaffian as a sum
Pf A = (1/(n! 2^n)) Σ_P Σ_p sgn(p) A_{p(1)p(2)} A_{p(3)p(4)} . . . A_{p(2n−1)p(2n)},
where the first sum is over all partitions P, and the second over all permutations p which have P as their associated partition. But if two permutations p and q both have the same associated partition
{p(1), p(2)}, {p(3), p(4)}, . . . , {p(2n − 1), p(2n)},
then p and q give the same pairs of indices in the expression A_{p(1)p(2)} A_{p(3)p(4)} . . . A_{p(2n−1)p(2n)}. Perhaps some of the indices in these pairs might be reversed. For example, we might have partition P being {1, 5}, {2, 6}, {3, 4}, and permutations p being 1, 5, 2, 6, 3, 4 and q being 5, 1, 2, 6, 3, 4. The contribution to the sum coming from p is sgn(p) A15 A26 A34, while that from q is sgn(q) A51 A26 A34.


But then A51 = −A15, a sign change which is perfectly offset by the sign sgn(q): each transposition inside a pair changes the sign of the permutation. So put together, we find that for any two permutations p and q with the same partition, their contributions are the same:
sgn(q) A_{q(1)q(2)} A_{q(3)q(4)} . . . A_{q(2n−1)q(2n)} = sgn(p) A_{p(1)p(2)} A_{p(3)p(4)} . . . A_{p(2n−1)p(2n)}.
Therefore the n! 2^n permutations with associated partition P all contribute the same amount as the natural permutation of P.

27.4 Rotation Invariants of Skew-Symmetric Matrices


Theorem 27.7. Pf² A = det A. Moreover, for any 2n × 2n matrix B, Pf(BAB^t) = Pf(A) det(B). If A is in skew-symmetric normal form, say the block diagonal matrix with the 2 × 2 blocks
( 0 a1 ; −a1 0 ), ( 0 a2 ; −a2 0 ), . . . , ( 0 an ; −an 0 )
down the diagonal and zeroes elsewhere, then Pf A = a1 a2 . . . an.
Proof. Let's start by proving that Pf A = a1 a2 . . . an for A in skew-symmetric normal form. Looking at the terms that appear in the Pfaffian, we find that at least one of the factors A_{p(2j−1)p(2j)} in each term will vanish unless these factors come from among the entries A_{1,2}, A_{3,4}, . . . , A_{2n−1,2n}. So all terms vanish, except when the partition is (in alphabetical order) {1, 2}, {3, 4}, . . . , {2n − 1, 2n}, yielding Pf A = a1 a2 . . . an. In particular, Pf² A = det A.


For any 2n × 2n matrix B,
n! 2^n Pf(BAB^t)
= Σ_p sgn(p) (BAB^t)_{p(1)p(2)} (BAB^t)_{p(3)p(4)} . . . (BAB^t)_{p(2n−1)p(2n)}
= Σ_p sgn(p) ( Σ_{i1 i2} B_{p(1)i1} A_{i1 i2} B_{p(2)i2} ) ( Σ_{i3 i4} B_{p(3)i3} A_{i3 i4} B_{p(4)i4} ) . . . ( Σ_{i2n−1 i2n} B_{p(2n−1)i2n−1} A_{i2n−1 i2n} B_{p(2n)i2n} )
= Σ_{i1 i2 . . . i2n} ( Σ_p sgn(p) B_{p(1)i1} B_{p(2)i2} B_{p(3)i3} . . . B_{p(2n)i2n} ) A_{i1 i2} A_{i3 i4} . . . A_{i2n−1 i2n}
= Σ_{i1 i2 . . . i2n} det( Be_{i1}  Be_{i2}  . . .  Be_{i2n} ) A_{i1 i2} A_{i3 i4} . . . A_{i2n−1 i2n}.
If any two of the indices i1, i2, . . . , i2n are equal, then two columns are equal inside the determinant and the term vanishes, so we can write this as a sum over permutations q:
= Σ_q det( Be_{q(1)}  Be_{q(2)}  . . .  Be_{q(2n)} ) A_{q(1)q(2)} A_{q(3)q(4)} . . . A_{q(2n−1)q(2n)}
= Σ_q sgn(q) det( Be_1  Be_2  . . .  Be_{2n} ) A_{q(1)q(2)} A_{q(3)q(4)} . . . A_{q(2n−1)q(2n)}
= n! 2^n det B Pf A.

Finally, to prove that Pf² A = det A for any skew-symmetric A, we just need to get A into skew-symmetric normal form via a rotation matrix B, and then Pf² A = Pf²(BAB^t) = det(BAB^t) = det A.

How do you calculate the Pfaffian in practice? It is like the determinant, except that you start running your finger down the first column under the diagonal, and you write down −, +, −, +, . . . in front of each entry from the first column, and then the Pfaffian you get by crossing out that row and column,

and symmetrically the corresponding column and row. So for
A = (  0  −2  −1  −3 ;
       2   0  −8  −4 ;
       1   8   0  −5 ;
       3   4   5   0 )
we get
Pf A = −(2) Pf ( 0 −5 ; 5 0 ) + (1) Pf ( 0 −4 ; 4 0 ) − (3) Pf ( 0 −8 ; 8 0 )
     = −(2)(−5) + (1)(−4) − (3)(−8) = 30.

Let's prove that this works:
Lemma 27.8. If A is a skew-symmetric matrix with an even number of rows and columns, larger than 2 × 2, then
Pf A = −A21 Pf A^[21] + A31 Pf A^[31] − . . . = Σ_{i>1} (−1)^{i+1} A_{i1} Pf A^[i1],
where A^[ij] is the matrix A with rows i and j and columns i and j removed.
Proof. Let's define a polynomial P(A) in the entries of a skew-symmetric matrix A (with an even number of rows and columns) by setting P(A) = Pf A if A is 2 × 2, and setting
P(A) = Σ_{i>1} (−1)^{i+1} A_{i1} P(A^[i1])
for larger A. We need to show that P(A) = Pf A. Clearly P(A) = Pf A if A is in skew-symmetric normal form. Each term in Pf A corresponds to a partition, and each partition must put 1 into one of its pairs, say in a pair {1, i}. It then can't use 1 or i in any other pair. Clearly P(A) also has exactly one factor like A_{i1} in each term, and then no other factors get to have i or 1 as subscripts. Moreover, all terms in P(A) and in Pf A have a coefficient of 1 or −1. So it is clear that the terms of P(A) and of Pf A are the same, up to sign. We have to fix the signs.
Suppose that we swap rows 2 and 3 of A and columns 2 and 3. Let's show that this changes the signs of P(A) and of Pf A. For Pf A, this is immediate from theorem 27.7 on page 256. Let Q be the permutation matrix of the


transposition of 2 and 3. (To be more precise, let Qn be the n × n permutation matrix of the transposition of 2 and 3, for any value of n ≥ 3. But let's write all such matrices Qn as Q.) Let B = QAQ^t, i.e. A with rows 2 and 3 and columns 2 and 3 swapped. So Bij is just Aij unless i or j is either 2 or 3. So
P(B) = −B21 P(B^[21]) + B31 P(B^[31]) − B41 P(B^[41]) + . . .
     = −A31 P(A^[31]) + A21 P(A^[21]) − A41 P(QA^[41]Q^t) + . . .
By induction, the sign changes in the last term (and in each of the later terms):
     = +A21 P(A^[21]) − A31 P(A^[31]) + A41 P(A^[41]) − . . .
     = −P(A).
So swapping rows 2 and 3 changes a sign. In the same way, P(QAQ^t) = sgn(q) P(A), for Q the permutation matrix of any permutation q of the numbers 2, 3, . . . , 2n. If we start with A in skew-symmetric normal form, letting the numbers a1, a2, . . . , an in the skew-symmetric normal form be some abstract variables, then Pf A is just a single term of Pf and of P, and these terms have the same sign. All of the terms of Pf are obtained by permuting indices in this term, i.e. as Pf(QAQ^t) for suitable permutation matrices Q. Indeed you just need to take Q the permutation matrix of the natural permutation of each partition. Therefore the signs of Pf and of P are the same for each term, so P = Pf.
Problem 27.3. Prove that the odd degree elementary symmetric functions of the eigenvalues vanish on any skew-symmetric matrix.
Let s1(a), . . . , sn(a) be the usual symmetric functions of some numbers a1, a2, . . . , an. For any vector a let t(a) be the vector
t(a) = ( s1(a1², a2², . . . , an²), s2(a1², a2², . . . , an²), . . . , s_{n−1}(a1², a2², . . . , an²), a1 a2 . . . an ).
Lemma 27.9. Two complex vectors a and b in C^n satisfy t(a) = t(b) just when b can be obtained from a by permutation of entries and changing signs of an even number of entries. A function f(a) is invariant under permutations and even numbers of sign changes just when f(a) = h(t(a)) for some function h.
Proof. Clearly sn(a1², a2², . . . , an²) = (a1 a2 . . . an)² = tn(a)². In particular, the symmetric functions of a1², a2², . . . , an² are all functions of t1, t2, . . . , tn. Therefore

if we have two vectors a and b with t(a) = t(b), then a1², a2², . . . , an² are a permutation of b1², b2², . . . , bn². So after permutation, a1 = ±b1, a2 = ±b2, . . . , an = ±bn, equality up to some sign changes. Since we also know that tn(a) = tn(b), we must have a1 a2 . . . an = b1 b2 . . . bn. If none of the ai vanish, then a1 a2 . . . an = b1 b2 . . . bn ensures that none of the bi vanish either, and that the number of sign changes is even. It is possible that one of the ai vanishes, in which case we can change its sign as we like to arrange that the number of sign changes is even.

Lemma 27.10. Two skew-symmetric matrices A and B with the same even numbers of rows and columns can be brought one to another, say B = FAF^t, by some rotation matrix F, just when they have skew-symmetric normal forms with 2 × 2 blocks
( 0 a1 ; −a1 0 ), ( 0 a2 ; −a2 0 ), . . . , ( 0 an ; −an 0 )  and  ( 0 b1 ; −b1 0 ), ( 0 b2 ; −b2 0 ), . . . , ( 0 bn ; −bn 0 )

respectively, with t(a) = t(b).
Proof. If we have a skew-symmetric normal form for a matrix A, with numbers a1, a2, . . . , an as above, then t1(a), t2(a), . . . , t_{n−1}(a) are the elementary symmetric functions of the squares of the eigenvalues, while tn(a) = Pf A, so clearly t(a) depends only on the invariants of A under rotation. In particular, suppose that I find two different skew-symmetric normal forms, one with numbers a1, a2, . . . , an and one with numbers b1, b2, . . . , bn. Then the numbers b1, b2, . . . , bn must be given from the numbers a1, a2, . . . , an by permutation and switching of an even number of signs. In fact we can attain these changes by actual rotations as follows. For example, think about 4 × 4 matrices. The permutation matrix F of 3, 4, 1, 2 permutes the first two and second two basis vectors, and is a rotation because the number of transpositions is even. When we replace A by FAF^t, we


swap a1 with a2 . Similarly, we can take the matrix F which reects e1 and e3 , changing the sign of a1 and of a2 . So we can clearly carry out any permutations, and any even number of sign changes, on the numbers a1 , a2 , . . . , an . Lemma 27.11. Any polynomial in a1 , a2 , . . . , an can be written in only one way as h(t(a)). Proof. Recall that every complex number has a square root (a vector with half the argument and the square root of the modulus). Clearly 0 has only 0 as square root, while all other complex numbers z have two square roots, which we write as z . Given any complex vector b, I can solve t(a) = b by rst constructing a solution c to b1 b2 . s(c) = . , . bn1 b2 n and then letting aj = cj . Clearly t(a) = b unless tn (a) has the wrong sign. If we change the sign of one of the aj then we can x this. So t : Cn Cn is onto. Theorem 27.12. Each polynomial invariant of a skew-symmetric matrix with even number of rows and columns can be expressed in exactly one way as a polynomial function of the even degree symmetric functions of the eigenvalues and the Pfaan. Two skew-symmetric matrices A and B with the same even numbers of rows and columns can be brought one to another, say B = F AF t , by some rotation matrix F , just when their even degree symmetric functions and their Pfaan agree. Proof. If f (A) is a polynomial invariant under rotations, i.e. f (F AF t ) = f (A), then we can write f (A) = h(t(a)), with a1 , a2 , . . . , an the numbers in the skewsymmetric normal form of A, and h some function. Lets write the restriction of f to the normal form matrices as as polynomial f (a). We can split f into a sum of homogeneous polynomials of various degrees, so lets assume that f is already homogeneous of some degree. We can pick any monomial in f and sum it over permutations and over changes of signs of any even number of variables, and f will be a sum over such quantities. So we only have to consider each such quantity, i.e. assume that f= (a1 )
dp(1)

(a2 )

dp(2)

. . . (an )

dp(n)

(27.1)

where the sum is over all choices of any even number of minus signs and all permutations p of the degrees d1 , d2 , . . . , dn . If all degrees d1 , d2 , . . . , dn are even,

then f is a symmetric function of a1², a2², . . . , an², so a polynomial in t1(a), t2(a), . . . , t_{n−1}(a). If all degrees are odd, then they are all positive, and we can divide out a factor of a1 a2 . . . an = tn(a). So let's assume that at least one degree is even, say d1, and that at least one degree is odd, say d2. All terms in equation 27.1 that put a plus sign in front of a1 and a2 cancel those terms which put a minus sign in front of both a1 and a2. Similarly, terms putting a minus sign in front of a1 and a plus sign in front of a2 cancel those which do the opposite. So f = 0. Consequently, invariant polynomial functions f(A) are polynomials in the Pfaffian and the symmetric functions of the squared eigenvalues.
The characteristic polynomial of a 2n × 2n skew-symmetric matrix A is clearly
det(A − λI) = (λ² + a1²)(λ² + a2²) . . . (λ² + an²),
so that
s_{2j}(A) = s_j(a1², a2², . . . , an²),   s_{2j−1}(A) = 0,
for any j = 1, . . . , n. Consequently, invariant polynomial functions f(A) are polynomials in the Pfaffian and the even degree symmetric functions of the eigenvalues.

27.5 The Fast Formula for the Pfaffian


Since Pf(BAB^t) = det B Pf A, we can find the Pfaffian of a large matrix by a sort of Gaussian elimination process, picking B to be a permutation matrix, or a strictly lower triangular matrix, to move A one step towards skew-symmetric normal form. Careful:
Problem 27.4. Prove that replacing A by BAB^t, with B a permutation matrix, permutes the rows and columns of A.
Problem 27.5. Prove that if B is a strictly lower triangular matrix which adds a multiple of, say, row 2 to row 3, then BAB^t is A with that row addition carried out, and with the same multiple of column 2 added to column 3.
We leave the reader to formulate the obvious notion of Gaussian elimination of skew-symmetric matrices to find the Pfaffian.
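As a concrete illustration, here is a rough numerical sketch in Python of one way this elimination can go (the function name, the pivoting strategy and the tolerance are our own choices, not the text's). It clears each first column using the symmetric row-and-column operations of problems 27.4 and 27.5, flipping the sign of the Pfaffian for each swap, and then peels off one factor at a time as in lemma 27.8.

def pfaffian(A, tol=1e-12):
    # Pfaffian of a real skew-symmetric matrix A, given as a list of lists.
    A = [row[:] for row in A]          # work on a copy
    n = len(A)
    if n % 2 == 1:
        return 0.0                     # odd size: the Pfaffian is 0
    sign, result = 1.0, 1.0
    for k in range(0, n, 2):
        # pick the largest entry in column k below row k as pivot;
        # a symmetric swap of rows and columns is B A B^t with det B = -1
        p = max(range(k + 1, n), key=lambda i: abs(A[i][k]))
        if abs(A[p][k]) < tol:
            return 0.0                 # the whole column vanishes, so Pf A = 0
        if p != k + 1:
            A[k + 1], A[p] = A[p], A[k + 1]
            for row in A:
                row[k + 1], row[p] = row[p], row[k + 1]
            sign = -sign
        result *= A[k][k + 1]          # the factor from lemma 27.8
        # clear column k below row k+1 by row additions, and symmetrically for
        # columns: this is B A B^t with B strictly lower triangular, det B = 1
        for i in range(k + 2, n):
            c = A[i][k] / A[k + 1][k]
            for j in range(n):
                A[i][j] -= c * A[k + 1][j]
            for j in range(n):
                A[j][i] -= c * A[j][k + 1]
    return sign * result

# e.g. the 4 x 4 matrix displayed above gives pfaffian(...) = 30.0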

Factorizations


28 Dual Spaces and Quotient Spaces


In this chapter, we learn how to manipulate whole vector spaces, rather than just individual vectors. Out of any abstract vector space, we will construct some new vector spaces, giving algebraic operations on vector spaces rather than on vectors.

28.1 The Vector Space of Linear Maps Between Two Vector Spaces
If V and W are two vector spaces, then a linear map T : V → W is also often called a homomorphism of vector spaces, or a homomorphism for short, or a morphism to be even shorter. We won't use this terminology, but we will nevertheless write Hom(V, W) for the set of all linear maps T : V → W.
Definition 28.1. A linear map is onto if every output w in W comes from some input: w = T v, for some input v in V. A linear map is 1-to-1 if any two distinct vectors v1 ≠ v2 get mapped to distinct vectors T v1 ≠ T v2.
Problem 28.1. Turn Hom(V, W) into a vector space.

Problem 28.2. Prove that a linear map is an isomorphism just when it is 1-to-1 and onto. Problem 28.3. Give the simplest example you can of a 1-to-1 linear map which is not onto. Problem 28.4. Give the simplest example you can of an onto linear map which is not 1-to-1. Problem 28.5. Prove that a linear map is 1-to-1 just when its kernel consists precisely in the zero vector.

Remark 28.2. If V and W are complex vector spaces, we will write Hom (V, W ) to mean the set of linear maps of complex vector spaces, etc. We will only prove results for real vector spaces, and the reader can imagine how to generalize them appropriately. 265


28.2 The Dual Space


The simplest possible real vector space is R.
Definition 28.3. If V is a vector space, let V* = Hom(V, R), i.e. V* is the set of linear maps T : V → R, i.e. the set of real-valued linear functions on V. We call V* the dual space of V. We will usually write vectors in V with Roman letters, and vectors in V* with Greek letters. The vectors in V* are often called covectors.
Example 28.4. If V = R^n, every linear function looks like
ξ(x) = a1 x1 + a2 x2 + · · · + an xn.
We can write this as
ξ(x) = ( a1 a2 . . . an ) ( x1 ; x2 ; . . . ; xn ).

So we will identify (R^n)* with the set of row matrices. We will write e^1, e^2, . . . , e^n for the obvious basis: e^i is the i-th row of the identity matrix.
Problem 28.6. Why is V* a vector space?

Problem 28.7. What is dim V*?
Remark 28.5. V and V* have the same dimension, but we should think of them as quite different vector spaces.
Lemma 28.6. Suppose that V is a vector space with basis v1, v2, . . . , vn. There is a unique basis for V*, called the basis dual to v1, v2, . . . , vn, which we will write as v^1, v^2, . . . , v^n, so that
v^i(vj) = 1 if i = j, and v^i(vj) = 0 if i ≠ j.

Remark 28.7. The hard part is getting used to the notation: v^1, v^2, . . . , v^n are each a linear function taking vectors from V to numbers: v^1, v^2, . . . , v^n : V → R.
Proof. For each fixed i, the equations above uniquely determine a linear function v^i, by theorem 16.23 on page 167, since we have defined the linear function on a basis. The functions v^1, v^2, . . . , v^n are linearly independent, because if they satisfy Σ ai v^i = 0, then applying this linear function Σ ai v^i to the


basis vector vj we find aj = 0, and this holds for each j, so all the numbers a1, a2, . . . , an vanish. Finally if we have any linear function f on V, then we can set a1 = f(v1), a2 = f(v2), . . . , an = f(vn), and find f(v) = Σ aj v^j(v) for v = v1 or v = v2, etc., and therefore for v any linear combination of v1, v2, etc. Therefore f = Σ aj v^j, and we see that these functions v^1, v^2, . . . , v^n span V*.
Problem 28.8. Find the dual basis v^1, v^2, v^3 to the basis
v1 = (1, 0, 3), v2 = (1, 2, 0), v3 = (1, −2, 0)
of R³.
Problem 28.9. Prove that the dual basis v^1, v^2, . . . , v^n to any basis v1, v2, . . . , vn of R^n satisfies: the matrix whose rows are v^1, v^2, . . . , v^n is F^{−1}, where F = ( v1 v2 . . . vn ) is the matrix whose columns are v1, v2, . . . , vn.
Lemma 28.8. Let V be a finite dimensional vector space. V and V** are isomorphic, by associating to each vector x from V the linear function fx on V* defined by fx(ξ) = ξ(x).
Remark 28.9. This lemma is very confusing, but very simple, and therefore very important.
Proof. First, let's ask what V** means. Its vectors are linear functions on V*, by definition. Next, let's pick a vector x in V and construct a linear function on V*. How? Take any covector ξ in V*, and let's assign to it some number f(ξ). Since ξ is (by definition again) a linear function on V, ξ(x) is a number. Let's take the number f(ξ) = ξ(x). Let's call this function f = fx. The rest of the proof is a series of exercises.
Problem 28.10. Check that fx is a linear function.

Problem 28.11. Check that the map T(x) = fx is a linear map T : V → V**.

Problem 28.12. Check that T : V → V** is one-to-one: i.e. if we pick two different vectors x and y in V, then fx ≠ fy.
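A quick numerical illustration of the dual basis (a sketch assuming the numpy library; the particular basis below is made up): by the relation in problem 28.9, the rows of F^{−1} are the dual basis covectors.

import numpy as np

# columns of F are a basis v1, v2, v3 of R^3
F = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [3.0, 0.0, 1.0]])
dual = np.linalg.inv(F)          # row i of F^{-1} is the dual covector v^i
# check: row i applied to column j gives 1 when i = j and 0 otherwise
print(np.round(dual @ F, 10))    # prints the identity matrix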


Remark 28.10. Although V and V** are identified as above, V and V* cannot be identified in any natural manner, and should be thought of as different.
Definition 28.11. If T : V → W is a linear map, write T* : W* → V* for the linear map given by T*(ξ)(v) = ξ(T v). Call T* the transpose of T.
Problem 28.13. Prove that T* : W* → V* is a linear map.
Problem 28.14. What does this notion of transpose have to do with the notion of transpose of matrices?

28.3 Quotient Spaces


A subspace W of a vector space V doesn't usually have a natural choice of complementary subspace. For example, if V = R², and W is the vertical axis, then we might like to choose the horizontal axis as a complement to W. But this choice is not natural, because we could carry out a linear change of variables, fixing the vertical axis but not the horizontal axis (for example, a shear along the vertical direction). There is a natural choice of vector space which plays the role of a complement, called the quotient space.
Definition 28.12. If V is a vector space and W a subspace of V, and v a vector in V, the translate v + W of W is the set of vectors in V of the form v + w where w is in W.
Example 28.13. The translates of the horizontal plane through 0 in R³ are just the horizontal planes.
Problem 28.15. Prove that any subspace W will have w + W = W, for any w from W.
Remark 28.14. If we take W the horizontal plane (x3 = 0) in R³, then the translates
(0, 0, 1) + W and (7, 2, 1) + W
are the same, because we can write
(7, 2, 1) + W = (0, 0, 1) + (7, 2, 0) + W = (0, 0, 1) + W.
This is the main idea behind translates: two vectors make the same translate just when their difference lies in the subspace.


Definition 28.15. If x + W and y + W are translates, we add them by (x + W) + (y + W) = (x + y) + W. If s is a number, let s(x + W) = sx + W.
Problem 28.16. Prove that addition and scaling of translates is well-defined, independent of the choice of x and y in a given translate.

Definition 28.16. The quotient space V/W of a vector space V by a subspace W is the set of all translates v + W of all vectors v in V.
Example 28.17. Take V the plane, V = R², and W the vertical axis. The translates of W are the vertical lines in the plane. The quotient space V/W has the various vertical lines as its points. Each vertical line passes through the horizontal axis at a single point, uniquely determining the vertical line. So the translates are the points
(x, 0) + W.
The quotient space V/W is just identified with the horizontal axis, by taking (x, 0) + W to x.

Lemma 28.18. The quotient space V/W of a vector space by a subspace is a vector space. The map T : V → V/W given by the rule T x = x + W is an onto linear map.
Remark 28.19. The concept of quotient space can be circumvented by using some complicated matrices, as can everything in linear algebra, so that one never really needs to use abstract vector spaces. But that approach is far more complicated and confusing, because it involves a choice of basis, and there is usually no natural choice to make. It is always easiest to carry out linear algebra as abstractly as possible, descending into choices of basis at the latest possible stage.
Proof. One has to check that (x + W) + (y + W) = (y + W) + (x + W), but this follows clearly from x + y = y + x. Similarly all of the laws of vector spaces hold. The 0 element of V/W is the translate 0 + W, i.e. W itself. To check that T is linear, consider scaling: T(sx) = sx + W = s(x + W), and addition: T(x + y) = x + y + W = (x + W) + (y + W).
Lemma 28.20. If U and W are subspaces of a vector space V, and V = U ⊕ W a direct sum of subspaces, then the map T : V → V/W taking vectors v to v + W restricts to an isomorphism T|U : U → V/W.
Remark 28.21. So, while there is no natural complement to W, every choice of complement is naturally identified with the quotient space.


Proof. The kernel of T|U is clearly U ∩ W = 0. To see that T|U is onto, take a vector v + W in V/W. Because V = U ⊕ W, we can somehow write v as a sum v = u + w with u from U and w from W. Therefore v + W = u + W = T|U u lies in the image of T|U.
Theorem 28.22. If V is a finite dimensional vector space and W a subspace of V, then
dim V/W = dim V − dim W.

28.4 The Three Isomorphism Theorems


Theorem 28.23 (The First Isomorphism Theorem). Any linear map T : V → W of vector spaces yields an isomorphism T̄ : V/ker T → im T by the rule T̄(v + ker T) = T v.
Remark 28.24. For simplicity, mathematicians often write V/ker T = im T, with the isomorphism T̄ above implicitly understood.
Proof. Clearly if a vector k belongs to ker T, then T(v + k) = T v for any vector v in V. Therefore T is constant on each translate v + ker T, and we can define T̄(v + ker T) to be T v. We need to see that T̄ is a linear map. If we pick v0 and v1 any vectors in V, then
T̄((v0 + ker T) + (v1 + ker T)) = T̄(v0 + v1 + ker T) = T(v0 + v1) = T v0 + T v1 = T̄(v0 + ker T) + T̄(v1 + ker T),
and a similar argument shows that T̄(a v0 + ker T) = a T̄(v0 + ker T), so that T̄ is linear.
Next, we need to ensure that T̄ : V/ker T → im T is an isomorphism. Clearly T̄ has the same image as T, because T̄(v + ker T) = T v. Therefore T̄ : V/ker T → im T is onto.
The kernel of T̄ consists in the translates v + ker T so that T̄(v + ker T) = 0, i.e. so that T v = 0. But then v lies in ker T, so v + ker T = ker T is the 0 vector in V/ker T. Therefore T̄ has kernel 0, i.e. is 1-to-1, and so is an isomorphism.
Theorem 28.25 (The Second Isomorphism Theorem). Suppose that U and V are two subspaces of a vector space W. Then there is an isomorphism
φ : V/(U ∩ V) → (U + V)/U,
given by the rule φ(v + (U ∩ V)) = v + U.


Remark 28.26. For simplicity, mathematicians often write V/(U ∩ V) = (U + V)/U, with the isomorphism above implicitly understood.
Proof. Define a linear map ψ : V → (U + V)/U by ψ(v) = v + U. This map has kernel U ∩ V, clearly, so defines a monomorphism φ : V/(U ∩ V) → (U + V)/U, as in the first isomorphism theorem, by φ(v + (U ∩ V)) = ψ(v). Take any vector in (U + V)/U, say u + v + U. Clearly u + v + U = v + U (as a translate), so every vector in (U + V)/U has the form v + U for some vector v from V. Therefore v + U = ψ(v) = φ(v + (U ∩ V)), so φ is onto, and so is an isomorphism.
Theorem 28.27 (The Third Isomorphism Theorem). If U is a subspace of V which is itself a subspace of a vector space W, then there is an isomorphism
φ : (W/U)/(V/U) → W/V,
given by the rule φ((w + U) + V/U) = w + V.
Remark 28.28. For simplicity, mathematicians often write (W/U)/(V/U) = W/V, with the isomorphism above implicitly understood.
Proof. Define a map ψ : W/U → W/V by the rule ψ(w + U) = w + V. Clearly ψ is an epimorphism, with ker ψ = V/U. Therefore by the first isomorphism theorem, ψ induces an isomorphism φ : (W/U)/(V/U) → W/V.

29 Singular Value Factorization


We will analyse statistical data, using the spectral theorem.

29.1 Principal Components


Consider a large collection of data coming in from some kind of measuring equipment. Let's suppose that the data consists in a large number of vectors, say vectors v1, v2, . . . , vN in R^n. How can we get a good rough description of what these vectors look like? We can take the mean of the vectors,
μ = (1/N)(v1 + v2 + · · · + vN),

as a good description of where they lie. How do they arrange themselves around the mean? To keep things simple, let's subtract the mean from each of the vectors. So assume that the mean is μ = 0, and we are asking how the vectors arrange themselves around the origin.
Imagine that these vectors v1, v2, . . . , vN tend to lie along a particular line through the origin. Let's try to take an orthonormal basis of R^n, say u1, u2, . . . , un, so that u1 points along that line. How can we find the direction of that line? We look at the quantity ⟨vk, x⟩. If the vectors lie nearly on a line through 0, then for x on that line, ⟨vk, x⟩ should be large positive or negative, while for x perpendicular to that line, ⟨vk, x⟩ should be nearly 0. If we square, we can make sure the large positive or negative becomes large positive, so we take the quantity
Q(x) = ⟨v1, x⟩² + ⟨v2, x⟩² + · · · + ⟨vN, x⟩².
The spectral theorem guarantees that we can pick an orthonormal basis u1, u2, . . . , un of eigenvectors of the symmetric matrix A associated to Q. We will arrange the eigenvalues λ1, λ2, . . . , λn from largest to smallest. Because Q(x) ≥ 0, we see that none of the eigenvalues are negative. Clearly Q(x) grows fastest in the direction x = u1.
Problem 29.1. The symmetric matrix A associated to Q(x) (for which ⟨Ax, x⟩ = Q(x) for every vector x) is
Aij = ⟨v1, ei⟩⟨v1, ej⟩ + ⟨v2, ei⟩⟨v2, ej⟩ + · · · + ⟨vN, ei⟩⟨vN, ej⟩.
If we rescale all of the vectors v1, v2, . . . , vN by the same nonzero scalar, then the resulting vectors tend to lie along the same lines or planes as the original vectors did. So it is convenient to replace Q(x) by the quadratic polynomial function
Q(x) = Σ_k ⟨vk, x⟩² / Σ_k ‖vk‖².
This has associated symmetric matrix
Aij = Σ_k ⟨vk, ei⟩⟨vk, ej⟩ / Σ_k ‖vk‖²,
which we will call the covariance matrix associated to the data.
Lemma 29.1. Given any set of nonzero vectors v1, v2, . . . , vN in R^n, write them as the columns of a matrix V. Their covariance matrix
A = V V^t / Σ_k ‖vk‖²
has an orthonormal basis of eigenvectors u1, u2, . . . , un with eigenvalues 1 ≥ λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0.
Remark 29.2. The square roots of the eigenvalues are the correlation coefficients, each indicating how much the data tends to lie in the direction of the associated eigenvector.
Proof. We have only to check that ⟨x, V V^t x⟩ = Σ_k ⟨vk, x⟩², an exercise for the reader, to see that A is the covariance matrix. Eigenvalues of A can't be negative, as mentioned already. For any vector x of length 1, the Schwarz inequality (lemma 20.1 on page 193) says that
Σ_k ⟨vk, x⟩² ≤ Σ_k ‖vk‖².
Therefore, by the minimum principle, eigenvalues of A can't exceed 1.
Our data lies mostly along a line through 0 just when λ1 is large, and the remaining eigenvalues λ2, λ3, . . . , λn are much smaller. More generally, if we find that the first dozen or so eigenvalues are relatively large, and the rest are relatively much smaller, then our data must lie very close to a subspace of dimension a dozen or so. The data tends most strongly to lie along the u1 direction; fluctuations about that direction are mostly in the u2 direction, etc.


(a) Data points. The mean is marked as a cross.

(b) The same data. Lines indicate the directions of eigenvectors. Vectors sticking out from the mean are drawn in those directions. The lengths of the vectors give the correlation coecients.

Figure 29.1: Applying principal components analysis to some data points.

Every vector x can be written as x = a1 u1 + a2 u2 + · · · + an un, and the numbers a1, a2, . . . , an are recovered from the formula ai = ⟨x, ui⟩. If the eigenvalues λ1, λ2, . . . , λd are relatively much larger than the rest, we can say that our data live near the subspace spanned by u1, u2, . . . , ud, and say that our data has d effective dimensions. The numbers a1, a2, . . . , ad are called the principal components of a vector x. To store the data, instead of remembering all of the vectors v1, v2, . . . , vN, we just keep track of the eigenvectors u1, u2, . . . , ud, and of the principal components of the vectors v1, v2, . . . , vN. In matrices, this means that instead of storing V, we store the matrix F = (u1 u2 . . . ud) of those eigenvectors, and the matrix W = F^t V of principal components. Coming out of storage, we can approximately recover the vectors v1, v2, . . . , vN as the columns of F W. The matrix F^t represents an orthogonal projection putting the vectors v1, v2, . . . , vN nearly into the subspace spanned by e1, e2, . . . , ed, and mostly along the e1 direction, with fluctuations mostly along the e2 direction, etc. So it is often useful to take a look at the columns of W themselves, as a convenient picture of the data.
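Here is a rough sketch of this storage scheme in Python (assuming the numpy library; the function name compress is ours): it builds the covariance matrix of section 29.1, keeps the top d eigenvectors, and stores only F and W.

import numpy as np

def compress(V, d):
    # V is an n x N array whose columns are the mean-subtracted data vectors
    A = V @ V.T / np.sum(V * V)          # covariance matrix of section 29.1
    eigvals, vecs = np.linalg.eigh(A)    # eigh returns eigenvalues in increasing order
    order = np.argsort(eigvals)[::-1]    # reorder from largest to smallest
    F = vecs[:, order[:d]]               # F = (u1 u2 ... ud)
    W = F.T @ V                          # principal components of each data vector
    return F, W

# approximate recovery of the data: V_approx = F @ W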

29.2 Singular Value Factorization


Theorem 29.3. Every real matrix A can be written as A = U Σ V^t, with U and V orthogonal, and Σ has the same dimensions as A, with the block form
Σ = ( D 0 ; 0 0 ),
with D diagonal with nonnegative diagonal entries:
D = diag(σ1, σ2, . . . , σr).

Proof. Suppose that A is p × q. Just like when we worked out principal components, we order the eigenvalues of A^t A from largest to smallest. For each eigenvalue λj, let σj = √λj. (Since we saw that the eigenvalues λj of A^t A aren't negative, the square root makes sense.) Let V be the matrix whose columns are an orthonormal basis of eigenvectors of A^t A, ordered by eigenvalue. Write V = ( V1 V2 )

with V1 the eigenvectors with positive eigenvalues, and V2 those with 0 eigenvalue. For each nonzero eigenvalue, define a vector
uj = (1/σj) A vj.

Suppose that there are r nonzero eigenvalues. Let's check that these vectors u1, u2, . . . , ur are orthonormal:
⟨ui, uj⟩ = ⟨(1/σi) A vi, (1/σj) A vj⟩ = (1/(σi σj)) ⟨A vi, A vj⟩ = (1/(σi σj)) ⟨vi, A^t A vj⟩ = (1/(σi σj)) ⟨vi, λj vj⟩ = (λj/(σi σj)) ⟨vi, vj⟩,
which is λj/σj² = 1 if i = j, and 0 otherwise.

If there aren't enough vectors u1, u2, . . . , ur to make up a basis (i.e. if r < p), then just write down some more vectors to make up an orthonormal basis, say


vectors u_{r+1}, u_{r+2}, . . . , up, and let
U1 = ( u1 u2 . . . ur ),  U2 = ( u_{r+1} u_{r+2} . . . up ),  U = ( U1 U2 ).

By definition of these uj, A vj = σj uj, so A V1 = U1 D. Calculate
U Σ V^t = ( U1 U2 ) ( D 0 ; 0 0 ) ( V1^t ; V2^t ) = U1 D V1^t = A V1 V1^t = A.

Corollary 29.4. Any square matrix A can be written as A = KP (the Cartan decomposition, also called the polar decomposition), where K is orthogonal and P is symmetric and positive semidefinite.
Proof. Write A = U Σ V^t and set K = U V^t and P = V Σ V^t.
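A minimal numerical sketch of this corollary, assuming the numpy library (numpy's svd returns V^t directly; the function name is ours):

import numpy as np

def polar_decomposition(A):
    # Cartan/polar decomposition A = K P via the singular value factorization
    U, s, Vt = np.linalg.svd(A)
    K = U @ Vt                       # orthogonal factor
    P = Vt.T @ np.diag(s) @ Vt       # symmetric positive semidefinite factor
    return K, P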

30 Factorizations
Most theorems in linear algebra are obvious consequences of simple factorizations.

30.1 LU Factorization
Forward elimination is messy: we swap rows and add rows to lower rows. We want to put together all of the row swaps into one permutation matrix, and all of the row additions into one strictly lower triangular matrix.

Algorithm
To forward eliminate a matrix A (let's say with n rows), start by setting p to be the permutation 1, 2, 3, . . . , n (the identity permutation), L = I, U = A. To start with, no entries of L are painted. Carry out forward elimination on U.
a. Each time you find a nonzero pivot in U, you paint a larger square box in the upper left corner of L. (The number of rows in this painted box is always the number of pivots in U.)
b. When you swap rows k and ℓ of U, a) swap entries k and ℓ of the permutation p and b) swap rows k and ℓ of L, but only swap unpainted entries which lie beneath painted ones.
c. If you add s (row k) to row ℓ in U, then put −s into column k, row ℓ in L.
The painted box in L is always square, with number of rows and columns equal to the number of pivots drawn in U.
Remark 30.1. As each step begins, the pivot rows of U with nonzero pivots in them are finished, and the entries inside the painted box and the entries on and above the diagonal of L are finished.
Theorem 30.2. By following the algorithm above, every matrix A can be written as A = P^{−1} L U where P is the permutation matrix of a permutation p, L a strictly lower triangular matrix, and U an upper triangular matrix.
Proof. Let's show that after each step, we always have PA = LU and always have L strictly lower triangular. For the first forward elimination step, we might have to swap rows. There is no painted box yet, so the algorithm says that

Figure 30.1: Computing the LU factorization, tracking the permutation p and the matrices L and U at each step of forward elimination.

the row swap leaves all entries of L alone. Let Q be the permutation matrix of the required row swap, and q the permutation. Our algorithm will pass from p = 1, L = I, U = A to p = q, L = I, U = QA, and so PA = LU. Next, we might have to add some multiples of the first row of U to lower rows. We carry this out by a strictly lower triangular matrix, say
S = ( 1 0 ; s I ),
with s a vector. Notice that
S^{−1} = ( 1 0 ; −s I )
subtracts the corresponding multiples of row 1 from lower rows. So U becomes U_new = SU, while the permutation p (and hence the matrix P) stays the same. The matrix L becomes L_new = S^{−1} L = S^{−1}, strictly lower triangular, and P_new A = L_new U_new.

30.1. LU Factorization

281

corner of U to echelon form, say
U = ( U0 U1 ; 0 U2 ),

with U0 in echelon form. Suppose that
L = ( L0 0 ; L1 I )

is strictly lower triangular, and that P is some permutation matrix. Finally, suppose that PA = LU. Our next step in forward elimination could be to swap rows k and ℓ in U, and these we can assume are rows in the bottom of U, i.e. rows of U2. Suppose that Q is the permutation matrix of a transposition so that QU2 is U2 with the appropriate rows swapped. In particular, Q² = I since Q is the permutation matrix of a transposition. Let
P_new = ( I 0 ; 0 Q ) P,   L_new = ( I 0 ; 0 Q ) L ( I 0 ; 0 Q ),   U_new = ( I 0 ; 0 Q ) U.

Check that P_new A = L_new U_new. Multiplying out:
L_new = ( L0 0 ; Q L1 I ),

strictly lower triangular. The upper left corner L0 is the painted box. So L_new is just L with rows k and ℓ swapped under the painted box. If we add s (row k) of U to row ℓ, this means multiplying by a strictly lower triangular matrix, say S. Then PA = LU implies that PA = L S^{−1} S U. But L S^{−1} is just L with s (column ℓ) subtracted from column k.
Problem 30.1. Find the LU-factorization of each of
A = ( 0 1 ; 1 0 ),  B = ( 1 ),  C = ( 1 ; 0 ).

Problem 30.2. Suppose that A is an invertible matrix. Prove that any two LU-factorizations of A which have the same permutation matrix P must be the same.
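For readers who want to experiment, here is a rough Python sketch of the algorithm of this section (the function name is ours; for simplicity it assumes A is square, stores the multipliers in L directly rather than painting boxes, and pivots on the largest available entry):

def lu_factor(A):
    # returns (p, L, U) with P A = L U, where P is the permutation matrix of p,
    # L is lower triangular with ones on the diagonal and U is upper triangular
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    p = list(range(n))
    for k in range(n):
        # find a pivot in column k at or below row k
        pivot = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[pivot][k] == 0.0:
            continue                       # no pivot in this column; move on
        if pivot != k:
            U[k], U[pivot] = U[pivot], U[k]
            p[k], p[pivot] = p[pivot], p[k]
            # swap only the entries of L lying under the painted box (columns < k)
            for j in range(k):
                L[k][j], L[pivot][j] = L[pivot][j], L[k][j]
        for i in range(k + 1, n):
            s = U[i][k] / U[k][k]          # we add -s times row k to row i of U ...
            L[i][k] = s                    # ... and record the multiplier in L
            for j in range(k, n):
                U[i][j] -= s * U[k][j]
    return p, L, U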

Tensors


31 Quadratic Forms
Quadratic forms generalize the concept of inner product, and play a crucial role in modern physics.

31.1 Bilinear Forms


Definition 31.1. A bilinear form on a vector space V is a rule associating to each pair of vectors x and y from V a number b(x, y) which is linear as a function of x for each fixed y and also linear as a function of y for each fixed x.
Example 31.2. If V = R, then every bilinear form on V is b(x, y) = cxy where c could be any constant.
Example 31.3. Every inner product on a vector space is a bilinear form. Moreover, a bilinear form b on V is an inner product just when it is symmetric (b(v, w) = b(w, v)) and positive definite (b(v, v) > 0 unless v = 0).
Example 31.4. If V = R^p, each p × p matrix A determines a bilinear form b by the rule b(v, w) = ⟨v, Aw⟩. Conversely, given a bilinear form b on R^p, we can define a matrix A by setting Aij = b(ei, ej), and then clearly if we expand out,
b(x, y) = Σ_{ij} xi yj b(ei, ej) = Σ_{ij} xi yj Aij = ⟨x, Ay⟩.
So every bilinear form on R^p has the form b(x, y) = ⟨x, Ay⟩ for a uniquely determined matrix A.
Of course, we can add and scale bilinear forms in the obvious way to make more bilinear forms, and the bilinear forms on a fixed vector space V form a vector space.
Lemma 31.5. Fix a basis v1, v2, . . . , vp of V. Given any collection of numbers bij (with i, j = 1, 2, . . . , p), there is precisely one bilinear form b with bij = b(vi, vj). Thus the vector space of bilinear forms on V is isomorphic to the vector space of p × p matrices (bij).

286

Quadratic Forms

Proof. Given any bilinear form b we can calculate out the numbers bij = b(vi, vj). Conversely given any numbers bij, and vectors x = Σ_i xi vi and y = Σ_j yj vj, we can let
b(x, y) = Σ_{i,j} bij xi yj.
Clearly adding bilinear forms adds the associated numbers bij, and scaling bilinear forms scales those numbers.
Lemma 31.6. Let B be the set of all bilinear forms on V. If V has finite dimension, then dim B = (dim V)².
Proof. There are (dim V)² numbers bij.

31.1 Review Problems


Problem 31.1. Let V be any vector space. Prove that b(x0 + ξ0, x1 + ξ1) = ξ0(x1) is a bilinear form on V ⊕ V*.
Problem 31.2. What are the numbers bij if b(x, y) = ⟨x, y⟩ (the usual inner product on R^n)?
Problem 31.3. Suppose that V is a finite dimensional vector space, and let B be the vector space of all bilinear forms on V. Prove that B is isomorphic to the vector space Hom(V, V*), by the isomorphism F : Hom(V, V*) → B given by taking each linear map T : V → V* to the bilinear form b given by b(v, w) = (Tw)(v). (This gives another proof that dim B = (dim V)².)
Problem 31.4. A bilinear form b on V is degenerate if b(x, y) = 0 for all x for some fixed nonzero y, or for all y for some fixed nonzero x. (a) Give the simplest example you can of a nonzero bilinear form which is degenerate. (b) Give the simplest example you can of a bilinear form which is nondegenerate.
Problem 31.5. Let V be the vector space of all 2 × 2 matrices. Let b(A, B) = tr AB. Prove that b is a nondegenerate bilinear form.
Problem 31.6. Let V be the vector space of polynomial functions of degree at most 2. For each of the expressions
(a) b(p(x), q(x)) = ∫_{−1}^{1} p(x) q(x) dx,
(b) b(p(x), q(x)) = ∫ p(x) q(x) e^{−x} dx,
(c) b(p(x), q(x)) = p(0) q(0),


(d) b(p(x), q (x)) = p(1) + q (1), (e) b(p(x), q (x)) = p(0)q (0) + p(1)q (1) + p(2)q (2), is b bilinear? Is b degenerate?

31.2 Quadratic Forms


Definition 31.7. A bilinear form b on a vector space V is symmetric if b(x, y) = b(y, x) for all x and y in V. A bilinear form b on a vector space V is positive definite if b(x, x) > 0 for all x ≠ 0.
Problem 31.7. Which of the bilinear forms in problem 31.6 on the facing page are symmetric?
Definition 31.8. The quadratic form Q of a bilinear form b on a vector space V is the real-valued function Q(x) = b(x, x).
Example 31.9. The squared length of a vector in R^n,
Q(x) = ‖x‖² = ⟨x, x⟩ = Σ_i xi²,
is the quadratic form of the inner product on R^n.
Example 31.10. Every quadratic form on R^n has the form
Q(x) = Σ_{ij} Aij xi xj,
for some numbers Aij = Aji. We could make a symmetric matrix A with those numbers as entries, so that Q(x) = ⟨x, Ax⟩. Of course, as in section 14.2 on page 138, the symmetric matrix A is uniquely determined by the quadratic form Q and uniquely determines Q.
Problem 31.8. What are the quadratic forms of the bilinear forms in problem 31.6 on the facing page?
Lemma 31.11. Every quadratic form Q determines a symmetric bilinear form b by
b(x, y) = (1/2)(Q(x + y) − Q(x) − Q(y)).
Moreover, Q is the quadratic form of b.
Proof. There are various identities we have to check on b to ensure that b is a bilinear form. Each identity involves a finite number of vectors. Therefore it suffices to prove the result over a finite dimensional vector space V (replacing V by the span of the vectors involved in each identity). Be careful: the identities

288

Quadratic Forms

have to hold for all vectors from V, but we can first pick vectors from V, and then replace V by their span and then check the identity. Since we can assume that V is finite dimensional, we can take a basis for V and therefore assume that V = R^n. Therefore we can write Q(x) = ⟨x, Ax⟩, for a symmetric matrix A. Expanding out,
b(x, y) = (1/2)(⟨x + y, A(x + y)⟩ − ⟨x, Ax⟩ − ⟨y, Ay⟩) = ⟨x, Ay⟩,
which is clearly bilinear.
Problem 31.9. The results of this chapter are still true over any field (although we won't try to make sense of being positive definite if our field is not R), except for lemma 31.11. Find a counterexample to lemma 31.11 on the previous page over the field of Boolean numbers.
Theorem 31.12. The equation
b(x, y) = (1/2)(Q(x + y) − Q(x) − Q(y))
gives an isomorphism between the vector space of symmetric bilinear forms b and the vector space of quadratic forms Q.
The proof is obvious just looking at the equation: if you scale the left side, then you scale the right side, and vice versa, and similarly if you add bilinear forms on the left side, you add quadratic forms on the right side.
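A tiny sketch in Python of the correspondence in theorem 31.12 (the function name polarize and the sample quadratic form are ours):

def polarize(Q):
    # return the symmetric bilinear form b with b(x, y) = (Q(x+y) - Q(x) - Q(y)) / 2
    def b(x, y):
        xy = [xi + yi for xi, yi in zip(x, y)]
        return (Q(xy) - Q(x) - Q(y)) / 2
    return b

# e.g. Q = lambda x: x[0]**2 + 4*x[0]*x[1]; b = polarize(Q); then b([1,0],[0,1]) == 2.0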

31.3 Sylvester's Law of Inertia


Theorem 31.13 (Sylvester's Law of Inertia). Given a quadratic form Q on a finite dimensional vector space V, there is an isomorphism F : V → R^n for which
Q(x) = x1² + x2² + · · · + xp² − x_{p+1}² − x_{p+2}² − · · · − x_{p+q}²
(p positive terms and q negative terms), where
F x = (x1, x2, . . . , xn).
We cannot by any linear change of variables alter the value of p (the number of positive terms) or the value of q (the number of negative terms).


Remark 31.14. Sylvester's Law of Inertia tells us what all quadratic forms look like, if we allow ourselves to change variables. The numbers p and q are the only invariants. The reader should keep in mind that in our study of the spectral theorem, we only allowed orthogonal changes of variable, so we got eigenvalues as invariants. But here we allow any linear change of variable; in particular we can rescale, so only the signs of the eigenvalues are invariant.
Proof. We could apply the spectral theorem (theorem 14.2 on page 136), but we will instead use elementary algebra. Take any basis for V, so that we can assume that V = R^n, and that Q(x) = ⟨x, Ax⟩ for some symmetric matrix A. In other words, Q(x) = A11 x1² + A12 x1 x2 + . . . . Suppose that A11 ≠ 0. Let's collect together all terms containing x1 and complete the square:
A11 x1² + Σ_{j>1} A1j x1 xj + Σ_{i>1} Ai1 xi x1 = A11 ( x1 + (1/A11) Σ_{j>1} A1j xj )² − (1/A11) ( Σ_{j>1} A1j xj )².
Let
y1 = x1 + (1/A11) Σ_{j>1} A1j xj.
Then
Q(x) = A11 y1² + . . .
Q(x) = A11 x1² + A22 x2² + · · · + Ann xn².
We can rescale x1 by any nonzero constant c, which scales A11 by 1/c². Let's choose c so that c² = |A11|.
Problem 31.10. Apply this method to the quadratic form Q(x) = x1² + x1 x2 + x2 x1 + x2 x3 + x3 x2.

290

Quadratic Forms

Next we have to show that the numbers p of positive terms and q of negative terms cannot be altered by any linear change of variables. We can assume (by using the linear change of variables we have just constructed) that V = Rn and that
2 2 2 2 2 Q(x) = x2 1 + x2 + + xp xp+1 + xp+2 + + xp+q .

We want to show that p is the largest dimension of any subspace on which Q is positive denite, and similarly that q is the largest dimension of any subspace on which Q is negative denite. Consider the subspace V+ of vectors of the form x1 x2 . . . xp x= 0 . 0 . . . 0 Clearly Q is positive denite on V+ . Similarly, Q is negative denite on the subspace V of vectors of the form 0 0 . . .

0 x p+1 xp+2 x= . . . . xp+q 0 0 . . . 0 Suppose that we can nd some subspace W of V of greater dimension than p, so that Q is positive denite on W . Let T be the orthogonal projection to V+ .

31.3. Sylvesters Law of Inertia

291

In other words, for any vector x in R^n, let
P+ x = (x1, x2, . . . , xp, 0, . . . , 0).
Then P+|W : W → V+ is a linear map, and dim W > dim V+, so
dim W = dim ker P+|W + dim im P+|W ≤ dim ker P+|W + dim V+ < dim ker P+|W + dim W,
so, subtracting dim W from both sides, 0 < dim ker P+|W. Therefore there is a nonzero vector x in W for which P+ x = 0, i.e.
x = (0, . . . , 0, x_{p+1}, x_{p+2}, . . . , x_{p+q}, x_{p+q+1}, x_{p+q+2}, . . . , xn).
Clearly Q(x) > 0 since x is a nonzero vector in W. But clearly
Q(x) = −x_{p+1}² − x_{p+2}² − · · · − x_{p+q}² ≤ 0,

a contradiction.

292

Quadratic Forms

Remark 31.15. Much of the proof works over any field, as long as we can divide by 2, i.e. as long as 2 ≠ 0. However, there could be a problem when we try to rescale: even if 2 ≠ 0, we can only arrange
Q(x) = λ1 x1² + λ2 x2² + · · · + λn xn²,
where each λi can be rescaled by any nonzero number of the form c². (There is no reasonable analogue of the numbers p and q over a general field.) In particular, since every complex number has a square root, the same theorem is true for complex quadratic forms, but in the stronger form that we can arrange q = 0, i.e. we can arrange
Q(x) = x1² + x2² + · · · + xp².
Problem 31.11. Prove that a quadratic form on any real n-dimensional vector space is nondegenerate just when p + q = n, with p and q as in Sylvester's law of inertia.
Problem 31.12. For a complex quadratic form, prove that if we arrange our quadratic form to be Q(x) = x1² + x2² + · · · + xp², then we are stuck with the resulting value of p, no matter what linear change of variables we employ.
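A small computational sketch related to Sylvester's law of inertia, assuming the numpy library (the function name and the tolerance are ours): an orthogonal change of variables diagonalizes a symmetric matrix, and further rescaling only changes the sizes of the diagonal entries, not their signs, so counting signs of eigenvalues gives the numbers p and q.

import numpy as np

def inertia(A, tol=1e-12):
    # signature (p, q) of the quadratic form Q(x) = <x, Ax> for a symmetric matrix A
    eigvals = np.linalg.eigvalsh(A)
    p = int(np.sum(eigvals > tol))
    q = int(np.sum(eigvals < -tol))
    return p, q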

31.3 Review Problems


Problem 31.13. Find a function Q(x) for x in R^n which satisfies Q(ax) = a² Q(x), but which is not a quadratic form.

31.4 Kernel and Null Cone


Definition 31.16. Take a vector space V. The kernel of a symmetric bilinear form b on V is the set of all vectors x in V for which b(x, y) = 0 for any vector y in V.
Problem 31.14. Find the kernel of the symmetric bilinear form b(x, y) = x1 y1 − x2 y2 for x and y in R³.
Definition 31.17. The null cone of a symmetric bilinear form b(x, y) is the set of all vectors x in V for which b(x, x) = 0.
Example 31.18. The vector x = (1, 1) lies in the null cone of the symmetric bilinear form b(x, y) = x1 y1 − x2 y2 for x and y in R². Indeed the null cone of that symmetric bilinear form is 0 = b(x, x) = x1² − x2², so it's the pair of lines x1 = x2 and x1 = −x2 in R².

31.5. Orthonormal Bases

293

Problem 31.15. Prove that the kernel of a symmetric bilinear form lies in its null cone.

Problem 31.16. Find the null cone of the symmetric bilinear form b(x, y) = x1 y1 − x2 y2 for x and y in R³. What part of the null cone is the kernel?
Problem 31.17. Prove that the kernel is a subspace. For which symmetric bilinear forms is the null cone a subspace? For which symmetric bilinear forms is the kernel equal to the null cone?

31.4 Review Problems


Problem 31.18. Prove that the kernel of a symmetric bilinear form consists precisely in the vectors x for which Q(x + y ) = Q(y ) for all vectors y , with Q the quadratic form of that bilinear form. (In other words, we can translate in the x direction without altering the value of the function Q(y ).)

31.5 Orthonormal Bases


Definition 31.19. If b is a nondegenerate symmetric bilinear form on a vector space V, a basis v1, v2, . . . , vn for V is called orthonormal for b if
b(vi, vj) = ±1 if i = j, and b(vi, vj) = 0 if i ≠ j.
Corollary 31.20. Any nondegenerate symmetric bilinear form on any finite dimensional vector space has an orthonormal basis.
Proof. Take the linear change of variables guaranteed by Sylvester's law of inertia, and then the standard basis will be orthonormal.
Problem 31.19. Find an orthonormal basis for the symmetric bilinear form b(x, y) = x1 y2 + x2 y1 on R².

Problem 31.20. Suppose that b is a symmetric bilinear form on a finite dimensional vector space V. (a) For each vector x in V, define a linear map ξ : V → R by ξ(y) = b(x, y). Write this covector as ξ = Tx. Prove that the map T : V → V* given by ξ = Tx is linear. (b) Prove that the kernel of T is the kernel of b. (c) Prove that T is an isomorphism just when b is nondegenerate. (The moral of the story is that a nondegenerate symmetric bilinear form b identifies the vector space V with its dual space V* via the map T.)

294

Quadratic Forms

(d) If b is nondegenerate, prove that for each covector ξ, there is a unique vector x in V so that b(x, y) = ξ(y) for every vector y in V.

31.5 Review Problems


Problem 31.21. What is the linear map T of problem 31.20 on the previous page (i.e. write down the associated matrix) for each of the following symmetric bilinear forms?
(a) b(x, y) = x1 y2 + x2 y1 on R²
(b) b(x, y) = x1 y1 + x2 y2 on R²
(c) b(x, y) = x1 y1 − x2 y2 on R²
(d) b(x, y) = x1 y1 − x2 y2 − x3 y3 − x4 y4 on R⁴
(e) b(x, y) = x1 (y1 + y2 + y3) + (x1 + x2 + x3) y1

32 Tensors and Indices


In this chapter, we define the concept of a tensor in R^n, following an approach common in physics.

32.1 What is a Tensor?


Vectors x have entries xi:
x = (x1, x2, . . . , xn).
A matrix A has entries Aij. To describe the entries of a vector x, we use a single index, while for a matrix we use two indices. A tensor is just an object whose entries have any number of indices. (Entries will also be called components.)
Example 32.1. In a physics course, one learns that stress applied to a crystal causes an electric field (see Feynman et al. [3] II-31-12). The electric field is a vector E (at each point in space) with components Ei, while the stress is a symmetric matrix S with components Sij = Sji. These are related by the piezoelectric tensor P, which has components Pijk:
Ei = Σ_{jk} Pijk Sjk.

Just as a matrix is a rectangle of numbers, a tensor with three indices is a box of numbers. For this chapter, all of our tensors will be tensors in Rn , which means that the indices all run from 1 to n. For example, our vectors are literally in Rn , while our matrices are n n, etc. The subject of tensors is almost trivial, since there is really nothing much we can say in any generality about them. There are two subtle points: upper versus lower indices and summation notation. 295



Upper versus lower indices


It is traditional (following Einstein) to write components of vectors x not as xi but as x^i, so not as x = (x1, x2, . . . , xn) but instead as x = (x^1, x^2, . . . , x^n). In particular, x^2 doesn't mean x·x, but means the second entry of x. Next we write elements of (R^n)* as
y = ( y1 y2 . . . yn ),

with indices down. We will call the elements of (R^n)* covectors. Finally, we write matrices as A = (A^i_j), with entries A^1_1, A^1_2, . . . , A^p_q, so A^row_column. In general, a tensor can have as many upper and lower indices as we need, and we will treat upper and lower indices as being different. For example, the components of a matrix look like A^i_j, never like A_{ij} or A^{ij}, which would represent tensors of a different type.

Summation Notation
Following Einstein further, whenever we write an expression with some letter j appearing once as an upper index and once as a lower index, like A^i_j x^j, this means Σ_j A^i_j x^j, i.e. a sum is implicitly understood over the repeated j index. We will often refer to a vector x as x^i. This isn't really fair, since it confuses a single component x^i of a vector with the entire vector, but it is standard. Similarly, we write a matrix A as A^i_j and a tensor with 2 upper and 3 lower indices as t^{ij}_{klm}. The names of the indices have no significance and will usually change during the course of calculations.

32.2. Operations

297

32.2 Operations
What can we do with tensors? Very little. At first sight, they look complicated. But there are very few operations on tensors. We can
(1) Add tensors that have the same numbers of upper and of lower indices; for example add s^{ijk}_{lm} to t^{ijk}_{lm} to get
s^{ijk}_{lm} + t^{ijk}_{lm}.
If the tensors are vectors or matrices, this is just adding in the usual way.
(2) Scale; for example 3 t^{ij}_{klm} means the obvious thing: triple each component of t^{ij}_{klm}.
(3) Swap indices of the same type; for example, take a tensor t^i_{jk} and make the tensor t^i_{kj}. There is no nice notation for doing this.
(4) Take tensor product: just write down two tensors beside one another, with distinct indices; for example, the tensor product of s^i_j and t^{ij} is s^i_j t^{kl}. Note that we have to change the names on the indices of t before we write it down, so that we don't use the same index names twice.
(5) Finally, contract: take any one upper index, and any one lower index, and set them equal and sum. For example, we can contract the i and k indices of a tensor t^i_{jk} to produce t^i_{ji}. Note that t^i_{ji} has only one free index j, since the summation convention tells us to sum over all possibilities for the i index. So t^i_{ji} is a covector.
In tensor calculus there are some additional operations on tensor quantities (various methods for differentiating and integrating), and these additional operations are essential to physical applications, but tensor calculus is not in the algebraic spirit of this book, so we will never consider any other operations than those listed above.
Example 32.2. If x^i is a vector and yi a covector, then we can't add them, because the indices don't match. But we can take their tensor product x^i yj, and then we can contract to get x^i yi. This is of course just y(x), thinking of every covector y as a linear function on R^n.
Example 32.3. If x^i is a vector and A^i_j is a matrix, then their tensor product is A^i_j x^k, and contracting gives A^i_j x^j, which is the vector Ax. So matrix multiplication is tensor product followed by contraction.
Example 32.4. If A^i_j and B^i_j are two matrices, then A^i_k B^k_j is the matrix AB. Similarly, A^k_j B^i_k is the matrix BA. These are the two possible contractions of the tensor product A^i_j B^k_l.
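These contractions are easy to spell out on a computer; here is a sketch assuming the numpy library (the matrices are made-up examples), using numpy's einsum, which implements exactly this index bookkeeping:

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [5.0, 2.0]])

AB = np.einsum('ik,kj->ij', A, B)   # the contraction A^i_k B^k_j, i.e. the matrix AB
BA = np.einsum('kj,ik->ij', A, B)   # the contraction A^k_j B^i_k, i.e. the matrix BA
tr = np.einsum('ii->', A)           # the contraction A^i_i, i.e. the trace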

298

Tensors and Indices


Example 32.5. A matrix A^i_j has only one contraction: A^i_i, the trace, which is a number (because it has no free indices).

Example 32.6. It is standard in working with tensors to write the entries of the identity matrix not as I^i_j but as δ^i_j: δ^i_j = 1 if i = j, and δ^i_j = 0 otherwise.
The trace of the identity matrix is δ^i_i = n, since for each value of i we add one. Tensor products of the identity matrix give many other tensors, like δ^i_j δ^k_l.

Problem 32.1. Take the tensor product of the identity matrix and a covector ξ, and simplify all possible contractions.

Example 32.7. If a tensor has various lower indices, we can average over all permutations of them. This process is called symmetrizing over indices. For example, a tensor t^i_{jk} can be symmetrized to a tensor (1/2) t^i_{jk} + (1/2) t^i_{kj}. Obviously we can also symmetrize over any two upper indices. But we can't symmetrize over an upper and a lower index. If we fix our attention on a pair of indices, we can also antisymmetrize over them, say taking a tensor t_{jk} and producing (1/2) t_{jk} − (1/2) t_{kj}. A tensor is symmetric in some indices if it doesn't change when they are permuted, and is antisymmetric in those indices if it changes by the sign of the permutation when the indices are permuted. Again focusing on just two lower indices, we can split any tensor into a sum
t_{jk} = ( (1/2) t_{jk} + (1/2) t_{kj} ) + ( (1/2) t_{jk} − (1/2) t_{kj} )
of a symmetric part (the first bracket) and an antisymmetric part (the second).
Problem 32.2. Suppose that a tensor t_{ijk} is symmetric in i and j, and antisymmetric in j and k. Prove that t_{ijk} = 0. Of course, we write a tensor as 0 to mean that all of its components are 0.
Example 32.8. Let's look at some tensors with lots of indices. Working in R³, define a tensor ε by setting
ε_{ijk} = 1 if i, j, k is an even permutation of 1, 2, 3; ε_{ijk} = −1 if i, j, k is an odd permutation of 1, 2, 3; ε_{ijk} = 0 if i, j, k is not a permutation of 1, 2, 3.


Of course, i, j, k fails to be a permutation just when two or three of i, j or k are equal. For example, ε_{123} = 1, ε_{221} = 0, ε_{321} = −1, ε_{222} = 0.

Problem 32.3. Take three vectors x, y and z in R^3, and calculate the contraction ε_{ijk} x^i y^j z^k.

Problem 32.4. Prove that every tensor t_{ijk} which is antisymmetric in all lower indices is a constant multiple t_{ijk} = c ε_{ijk}.

Example 32.9. More generally, working in R^n, we can define a tensor by

ε_{i_1 i_2 ... i_n} = 1 if i_1, i_2, ..., i_n is an even permutation of 1, 2, ..., n,
ε_{i_1 i_2 ... i_n} = −1 if i_1, i_2, ..., i_n is an odd permutation of 1, 2, ..., n,
ε_{i_1 i_2 ... i_n} = 0 if i_1, i_2, ..., i_n is not a permutation of 1, 2, ..., n.

We can restate equation 18.1 on page 181 as

ε_{i_1 i_2 ... i_n} A^{i_1}_1 A^{i_2}_2 ⋯ A^{i_n}_n = det A,

for a matrix A.
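As a numerical sanity check of this identity, here is an added sketch (assuming NumPy; the helper levi_civita is hypothetical code, not notation from the text) that builds ε for n = 3 and compares the contraction against the determinant.

```python
import numpy as np
from itertools import permutations

def levi_civita(n):
    """Build the tensor eps_{i1 ... in} of example 32.9."""
    eps = np.zeros((n,) * n)
    for perm in permutations(range(n)):
        # Sign of the permutation: count inversions.
        sign, p = 1, list(perm)
        for i in range(n):
            for j in range(i + 1, n):
                if p[i] > p[j]:
                    sign = -sign
        eps[perm] = sign
    return eps

eps = levi_civita(3)
A = np.random.rand(3, 3)

# eps_{i1 i2 i3} A^{i1}_1 A^{i2}_2 A^{i3}_3 = det A.
lhs = np.einsum('ijk,i,j,k->', eps, A[:, 0], A[:, 1], A[:, 2])
print(np.isclose(lhs, np.linalg.det(A)))   # True
```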

32.3 Changing Variables


Let's change variables from x to y = F x, with F an invertible matrix. We want to make a corresponding tensor F∗t out of any tensor t, so that sums, scalings, tensor products, and contractions will correspond, and so that on vectors F∗x will be F x. These requirements will determine what F∗t has to be. Let's start with a covector ξ_i. We need to preserve the contraction ξ(x) on any vector x, so we need to have (F∗ξ)(F x) = ξ(x). Replacing the vector x by some vector y = F x, we get (F∗ξ)(y) = ξ(F^{−1} y), for any vector y, which identifies F∗ξ as F∗ξ = ξ F^{−1}. In indices:

(F∗ξ)_i = ξ_j (F^{−1})^j_i.

In other words, ξ is contracted against F^{−1}. So vectors transform as F∗x = F x (contract with F), and covectors as F∗ξ = ξ F^{−1} (contract with F^{−1}). We can contract any tensor with as many vectors and covectors as needed to form a number; in order to preserve these contractions, the tensor's upper indices must transform like vectors, and its lower indices like covectors, when we carry out F∗. For example,

(F∗t)^{ij}_k = F^i_p F^j_q t^{pq}_r (F^{−1})^r_k.


In other words, we contract one copy of F with each upper index and contract one copy of F^{−1} with each lower index. For example, let's see the invariance of contraction under F∗ in the simplest case of a matrix:

(F∗A)^i_i = F^i_j A^j_k (F^{−1})^k_i
          = A^j_k (F^{−1})^k_i F^i_j
          = A^j_k δ^k_j
          = A^j_j,

since A^j_k δ^k_j has a sum over j and k, but each term vanishes unless j = k, in which case we find A^j_j being added.
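The same computation is easy to confirm numerically. The added sketch below (assuming NumPy) transforms a random matrix by a random invertible F and checks that the contraction, i.e. the trace, is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
F = rng.standard_normal((3, 3))       # almost surely invertible
Finv = np.linalg.inv(F)

# (F*A)^i_j = F^i_p A^p_q (F^{-1})^q_j
FA = np.einsum('ip,pq,qj->ij', F, A, Finv)

# The contraction (trace) is preserved.
print(np.isclose(np.trace(FA), np.trace(A)))   # True
```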

Problem 32.5. Prove that both of the contractions of a tensor t^i_{jk} are preserved by linear change of variables.

Problem 32.6. Find F∗ε. (Recall the tensor ε from example 32.9 on the previous page.)

Remark 32.10. Note how abstract the subject is: it is rare that we would write down examples of tensors, with actual numbers in their entries. It is more common that we think about tensors as abstract algebraic gadgets for storing multivariate data from physics. Writing down examples, in this rarefied air, would only make the subject more confusing.

32.4 Two Problems


Two problems:
a. How can you tell if two tensors can be made equal by a linear change of variable?
b. How can a tensor be split into a sum of tensors, in a manner that is unaltered by linear change of variables?

The first problem has no solution. For a matrix, the answer is Jordan normal form (theorem 23.2 on page 214). For a quadratic form, the answer is Sylvester's law of inertia (theorem 31.13 on page 288). But a general tensor can store an enormous amount of information, and no one knows how to describe it. We can give some invariants of a tensor, to help to distinguish tensors. These invariants are similar to the symmetric functions of the eigenvalues of a matrix. The second problem is much easier, and we will provide a complete solution.


Tensor Invariants
The contraction of indices is invariant under F∗, as is tensor product. So given a tensor t, we can try to write down some invariant numbers out of it by taking any number of tensor products of that tensor with itself, and any number of contractions, until there are no indices left. For example, a matrix A^i_j has among its invariants the numbers

A^i_i,  A^i_j A^j_i,  A^i_j A^j_k A^k_i,  ...

In the notation of earlier chapters, these numbers are tr A, tr A^2, tr A^3, ..., i.e. the functions we called p_k(A) in example 26.16 on page 248. We already know from that chapter that every real-valued polynomial invariant of a matrix is a function of p_1(A), p_2(A), ..., p_n(A). More generally, all real-valued polynomial invariants of a tensor are polynomial functions of those obtained by taking some number of tensor products followed by some number of contractions. (The proof is very difficult; see Olver [8] and Procesi [10].)

Problem 32.7. Describe as many invariants of a tensor t^{ij}_{kl} as you can.

It is difficult to decide how many of these invariants you need to write down before you can be sure that you have a complete set, in the sense that every invariant is a polynomial function of the ones you have written down. General theorems (again see Olver [8] and Procesi [10]) ensure that eventually you will produce a finite complete set of invariants. If a tensor has more lower indices than upper indices, then so does every tensor product of it with itself any number of times, and so does every contraction. Therefore there are no polynomial invariants of such tensors. Similarly, if there are more upper indices than lower indices then there are no polynomial invariants. For example, a vector has no polynomial invariants. Likewise, a quadratic form Q_{ij} = Q_{ji} has no upper indices, so has no polynomial invariants. (This agrees with Sylvester's law of inertia (theorem 31.13 on page 288), which tells us that the only invariants of a quadratic form (over the real numbers) are the integers p and q, which are not polynomials in the Q_{ij} entries.)
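For a concrete matrix these invariants are quick to compute as contractions; the sketch below is an added illustration, assuming NumPy.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])

p1 = np.einsum('ii', A)               # A^i_i = tr A
p2 = np.einsum('ij,ji', A, A)         # A^i_j A^j_i = tr A^2
p3 = np.einsum('ij,jk,ki', A, A, A)   # A^i_j A^j_k A^k_i = tr A^3

print(p1, p2, p3)                              # 5.0 13.0 35.0
print(np.trace(A @ A), np.trace(A @ A @ A))    # 13.0 35.0
```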

Breaking into Irreducible Components

32.5 Cartesian Tensors


Tensors in engineering and applied physics never have any upper indices. However, the engineers and applied physicists never carry out any linear changes of variable except for those described by orthogonal matrices. This practice is (quite misleadingly) referred to as working with Cartesian tensors. The point to keep in mind is that they are working in R^n in a physical context in which lengths (and distances) are physically measurable. Theorem 20.5 on page 195


tells us that in order to preserve lengths, the only linear changes of variable we can employ are those given by orthogonal matrices. If we had a tensor with both upper and lower indices, like t^i_{jk}, we can see that it transforms under a linear change of variables y = F x as

(F∗t)^i_{jk} = F^i_p t^p_{qr} (F^{−1})^q_j (F^{−1})^r_k.

Let's define a new tensor by letting s_{ijk} = t^i_{jk}, just dropping the upper index. If F is orthogonal, i.e. F^{−1} = F^t, then F^i_p = (F^t)^p_i = (F^{−1})^p_i. Therefore

(F∗s)_{ijk} = (F^{−1})^p_i s_{pqr} (F^{−1})^q_j (F^{−1})^r_k
            = F^i_p t^p_{qr} (F^{−1})^q_j (F^{−1})^r_k
            = (F∗t)^i_{jk}.

We summarize this calculation:

Theorem 32.11. Dropping upper indices to become lower indices is an operation on tensors which is invariant under any orthogonal linear change of variable.

Note that this trick only works for orthogonal matrices F, i.e. orthogonal changes of variable.

Problem 32.8. Prove that doubling (i.e. the linear map y = F x = 2x) acts on a vector x by doubling it, and on a covector by scaling by 1/2. (In general, this linear map F x = 2x acts on any tensor t with p upper and q lower indices by scaling by 2^{p−q}, i.e. F∗t = 2^{p−q} t.) What happens if you first lower the index of a vector and then apply F∗? What happens if you apply F∗ and then lower the index?

Problem 32.9. Prove that contracting two lower indices with one another is an operation on tensors which is invariant under orthogonal linear change of variable, but not under rescaling of variables.

Engineers often prefer their approach to tensors: only lower indices, and all of the usual operations. However, their approach makes rescalings, and other nonorthogonal transformations (like shears, for example), more difficult. There is a similar approach to lower indices in relativistic physics: by contracting with a quadratic form.
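A numerical illustration of theorem 32.11, added here and assuming NumPy (the random orthogonal F built from a QR factorization is an assumption of the sketch, not a construction from the text): for orthogonal F, lowering the upper index of t^i_{jk} commutes with the change of variables.

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.standard_normal((3, 3, 3))   # components t^i_{jk}

# A random orthogonal matrix F.
F, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Finv = F.T                            # orthogonal, so F^{-1} = F^t

# Transform t^i_{jk}: one F on the upper index, one F^{-1} on each lower index.
Ft = np.einsum('ip,pqr,qj,rk->ijk', F, t, Finv, Finv)

# Lower the index first (s_{ijk} = t^i_{jk}), then transform s as a tensor
# with three lower indices: one F^{-1} contracted onto each index.
s = t
Fs = np.einsum('pi,pqr,qj,rk->ijk', Finv, s, Finv, Finv)

print(np.allclose(Fs, Ft))            # True, because F is orthogonal
```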

33 Tensors
We give a mathematical definition of tensor, and show that it agrees with the more concrete definition of chapter 32. We will continue to use Einstein's summation convention throughout this chapter.

33.1 Multilinear Functions and Multilinear Maps


Definition 33.1. If V_1, V_2, ..., V_p are some vector spaces, then a function t(x_1, x_2, ..., x_p) is multilinear if t(x_1, x_2, ..., x_p) is a linear function of each vector when all of the other vectors are held constant. (The vector x_1 comes from the vector space V_1, etc.) Similarly, a map t taking vectors x_1 in V_1, x_2 in V_2, ..., x_p in V_p to a vector w = t(x_1, x_2, ..., x_p) in a vector space W is called a multilinear map if t(x_1, x_2, ..., x_p) is a linear map of each vector x_i when all of the other vectors are held constant.

Example 33.2. The function t(x, y) = xy is multilinear for x and y real numbers, being linear in x when y is held fixed and linear in y when x is held fixed. Note that t is not linear as a function of the vector (x, y).

Example 33.3. Any two linear functions ξ : V → R and η : W → R on two vector spaces determine a multilinear map t(x, y) = ξ(x) η(y) for x from V and y from W.

33.2 What is a Tensor?


We already have a definition of tensor, but only for tensors in R^n. We want to define some kind of object which we will call a tensor in an abstract vector space, so that when we pick a basis (identifying the vector space with R^n), the object becomes a tensor in R^n. The clue is that a tensor in R^n has lower and upper indices, which can be contracted against vectors and covectors. For now, let's only think about tensors with just upper indices. Upper indices contract against covectors, so we should be able to plug in covectors, motivating the definition:


Definition 33.4. For any finite dimensional vector spaces V_1, V_2, ..., V_p, let V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p (called the tensor product of the vector spaces V_1, V_2, ..., V_p) be the set of all multilinear maps t(ξ_1, ξ_2, ..., ξ_p), where ξ_1 is a covector from V_1∗, ξ_2 is a covector from V_2∗, etc. Each such multilinear map t is called a tensor.

Example 33.5. A tensor t^{ij} in R^n following our old definition (from chapter 32) yields a tensor t(ξ, η) = t^{ij} ξ_i η_j following this new definition. On the other hand, if t(ξ, η) is a tensor in R^n ⊗ R^n, then we can define a tensor following our old definition by letting t^{ij} = t(e^i, e^j), where e^1, e^2, ..., e^n is the usual dual basis to the standard basis of R^n.

Example 33.6. Let V be a finite dimensional vector space. Recall that there is a natural isomorphism V → V∗∗, given by sending any vector v to the linear function f_v on V∗ given by f_v(ξ) = ξ(v). We will henceforth identify any vector v with the function f_v; in other words, we will from now on use the symbol v itself instead of writing f_v, so that we think of a covector ξ as a linear function ξ(v) on vectors, and also think of a vector v as a linear function on covectors ξ, by the bizarre definition v(ξ) = ξ(v). In this way, a vector is the simplest type of tensor.

Definition 33.7. Let V and W be finite dimensional vector spaces, and take v a vector in V and w a vector in W. Then write v ⊗ w for the multilinear map v ⊗ w(ξ, η) = ξ(v) η(w). So v ⊗ w is a tensor in V ⊗ W, called the tensor product of v and w.

Definition 33.8. If s is a tensor in V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p, and t is a tensor in W_1 ⊗ W_2 ⊗ ⋯ ⊗ W_q, then let s ⊗ t, called the tensor product of s and t, be the tensor in V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p ⊗ W_1 ⊗ W_2 ⊗ ⋯ ⊗ W_q given by

s ⊗ t (ξ_1, ξ_2, ..., ξ_p, η_1, η_2, ..., η_q) = s(ξ_1, ξ_2, ..., ξ_p) t(η_1, η_2, ..., η_q).

Definition 33.9. Similarly, we can define the tensor product of several tensors. For example, given finite dimensional vector spaces U, V and W, and vectors u from U, v from V and w from W, let u ⊗ v ⊗ w mean the multilinear map u ⊗ v ⊗ w(ξ, η, ζ) = ξ(u) η(v) ζ(w), etc.

Problem 33.1. Prove that

(a v) ⊗ w = a (v ⊗ w) = v ⊗ (a w),
(v_1 + v_2) ⊗ w = v_1 ⊗ w + v_2 ⊗ w,
v ⊗ (w_1 + w_2) = v ⊗ w_1 + v ⊗ w_2

for any vectors v, v_1, v_2 from V and w, w_1, w_2 from W and any number a.


Problem 33.2. Take U, V and W any finite dimensional vector spaces. Prove that (u ⊗ v) ⊗ w = u ⊗ (v ⊗ w) = u ⊗ v ⊗ w for any three vectors u from U, v from V and w from W.

Theorem 33.10. If V and W are two finite dimensional vector spaces, with bases v_1, v_2, ..., v_p and w_1, w_2, ..., w_q, then V ⊗ W has as a basis the vectors v_i ⊗ w_J for i running over 1, 2, ..., p and J running over 1, 2, ..., q.

Proof. Take the dual bases v^1, v^2, ..., v^p and w^1, w^2, ..., w^q. Every tensor t from V ⊗ W has the form

t(ξ, η) = t(ξ_i v^i, η_J w^J) = ξ_i η_J t(v^i, w^J),

so let t^{iJ} = t(v^i, w^J) to find

t(ξ, η) = t^{iJ} ξ_i η_J = t^{iJ} v_i ⊗ w_J (ξ, η).

So the v_i ⊗ w_J span. Any linear relation between the v_i ⊗ w_J, just reading these lines from bottom to top, would yield a vanishing multilinear map, so would have to satisfy

0 = t(v^k, w^L) = t^{kL},

forcing all coefficients in the linear relation to vanish.

Remark 33.11. A similar theorem, with a similar proof, holds for any tensor products: take any finite dimensional vector spaces V_1, V_2, ..., V_p, and pick any basis for V_1 and any basis for V_2, etc. Then taking one vector from each basis, and taking the tensor product of these vectors, we obtain a tensor in V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p. These tensors, when we throw in all possible choices of basis vectors for all of those bases, yield a basis for V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p, called the tensor product basis.

Problem 33.3. Let V = R^3, W = R^2 and let x = (1, 2, 3) in R^3 and y = (4, 5) in R^2. What is x ⊗ y in terms of the standard basis vectors e_i ⊗ e_J?

Definition 33.12. Tensors of the form v ⊗ w are called pure tensors.

Problem 33.4. Prove that every tensor in V ⊗ W can be written as a sum of pure tensors.
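In coordinates, a pure tensor v ⊗ w is just the outer product of the two coordinate vectors, so pure tensors and their sums are easy to experiment with. The sketch below is an added illustration, assuming NumPy; it checks the bilinearity of problem 33.1 numerically and expands a tensor in the tensor product basis.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])    # a vector in R^3
w = np.array([4.0, 5.0])         # a vector in R^2

# The pure tensor v ⊗ w, with components v^i w^J.
t = np.outer(v, w)               # shape (3, 2)

# Bilinearity of the tensor product (problem 33.1), checked numerically.
v2 = np.array([0.0, 1.0, -1.0])
print(np.allclose(np.outer(v + v2, w), np.outer(v, w) + np.outer(v2, w)))  # True
print(np.allclose(np.outer(2 * v, w), 2 * np.outer(v, w)))                 # True

# A general tensor in R^3 ⊗ R^2 expands in the tensor product basis e_i ⊗ e_J.
s = np.zeros((3, 2))
for i in range(3):
    for J in range(2):
        s += t[i, J] * np.outer(np.eye(3)[i], np.eye(2)[J])
print(np.allclose(s, t))                                                   # True
```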

Example 33.13. Consider in R^3 ⊗ R^3 the tensor e_1 ⊗ e_1 + e_2 ⊗ e_2 + e_3 ⊗ e_3. This tensor is not pure (which is certainly not obvious just looking at it). Let's see why. Any pure tensor x ⊗ y must be

x ⊗ y = (x^1 e_1 + x^2 e_2 + x^3 e_3) ⊗ (y^1 e_1 + y^2 e_2 + y^3 e_3)
      = x^1 y^1 e_1 ⊗ e_1 + x^2 y^1 e_2 ⊗ e_1 + x^3 y^1 e_3 ⊗ e_1
      + x^1 y^2 e_1 ⊗ e_2 + x^2 y^2 e_2 ⊗ e_2 + x^3 y^2 e_3 ⊗ e_2
      + x^1 y^3 e_1 ⊗ e_3 + x^2 y^3 e_2 ⊗ e_3 + x^3 y^3 e_3 ⊗ e_3.

If we were going to have x ⊗ y = e_1 ⊗ e_1 + e_2 ⊗ e_2 + e_3 ⊗ e_3, we would need x^1 y^1 = 1, x^2 y^2 = 1, x^3 y^3 = 1, but also x^1 y^2 = 0, so x^1 = 0 or y^2 = 0, contradicting x^1 y^1 = x^2 y^2 = 1.

Definition 33.14. The rank of a tensor is the minimum number of pure tensors that can appear when it is written as a sum of pure tensors.

Definition 33.15. If U, V and W are finite dimensional vector spaces and b : U × V → W is a map for which b(u, v) is linear in u for any fixed v and linear in v for any fixed u, say that b is a bilinear map.

Theorem 33.16 (Universal Mapping Theorem). Every bilinear map b : U × V → W induces a unique linear map B : U ⊗ V → W, by the rule B(u ⊗ v) = b(u, v). Sending b to B = T b gives an isomorphism T between the vector space Z of all bilinear maps f : U × V → W and the vector space Hom(U ⊗ V, W).

Problem 33.5. Prove the universal mapping theorem.

33.2 Review Problems


Problem 33.6. Write down an isomorphism V∗ ⊗ W∗ → (V ⊗ W)∗, and prove that it is an isomorphism.

Problem 33.7. If S : V_0 → V_1 and T : W_0 → W_1 are linear maps of finite dimensional vector spaces, prove that there is a unique linear map, which we will write as S ⊗ T : V_0 ⊗ W_0 → V_1 ⊗ W_1, so that S ⊗ T (v_0 ⊗ w_0) = (S v_0) ⊗ (T w_0).

Problem 33.8. What is the rank of e_1 ⊗ e_1 + e_2 ⊗ e_2 + e_3 ⊗ e_3 as a tensor in R^3 ⊗ R^3?

Problem 33.9. Let any tensor v ⊗ w in V ⊗ W eat a covector ξ from V∗ by the rule (v ⊗ w)(ξ) = ξ(v) w. Prove that this makes v ⊗ w into a linear map V∗ → W. Prove that this definition extends to a linear map V ⊗ W → Hom(V∗, W). Prove that the rank of a tensor as defined above is the rank of the associated linear map V∗ → W. Use this to find the rank of Σ_i e_i ⊗ e_i in R^n ⊗ R^n.


Problem 33.10. Take two vector spaces V and W and define a vector space V ⊠ W to be the collection of all real-valued functions on V × W which are zero except at finitely many points. Careful: these functions don't have to be linear. Picking any vectors v from V and w from W, let's write the function

f(x, y) = 1 if x = v and y = w, and f(x, y) = 0 otherwise,

as v ⊠ w. So clearly V ⊠ W is a vector space, whose elements are linear combinations of elements of the form v ⊠ w. Let Z be the subspace of V ⊠ W spanned by the vectors

(a v) ⊠ w − a (v ⊠ w),
v ⊠ (a w) − a (v ⊠ w),
(v_1 + v_2) ⊠ w − v_1 ⊠ w − v_2 ⊠ w,
v ⊠ (w_1 + w_2) − v ⊠ w_1 − v ⊠ w_2,

for any vectors v, v_1, v_2 from V and w, w_1, w_2 from W and any number a.

a. Prove that if V and W both have positive and finite dimension, then V ⊠ W and Z are infinite dimensional.
b. Write down a linear map V ⊠ W → V ⊗ W.
c. Prove that your linear map has kernel containing Z.

It turns out that (V ⊠ W)/Z is isomorphic to V ⊗ W. We could have defined V ⊗ W to be (V ⊠ W)/Z, and this definition has many advantages for various generalizations of tensor products.

Remark 33.17. In the end, what we really care about is that tensors using our abstract definitions should turn out to have just the properties they had with the more concrete definition in terms of indices. So even if the abstract definition is hard to swallow, we will really only need to know that tensors have tensor products, contractions, sums and scaling, change according to the usual rules when we linearly change variables, and that when we tensor together bases, we obtain a basis for the tensor product. This is the spirit behind problem 33.10.

33.3 Lower Indices


Fix a finite dimensional vector space V. Consider the tensor product V ⊗ V∗. A basis v_1, v_2, ..., v_n for V has a dual basis v^1, v^2, ..., v^n for V∗, and a tensor product basis v_i ⊗ v^j. Every tensor t in V ⊗ V∗ has the form

t = t^i_j v_i ⊗ v^j.

So it is clear where the lower indices come from when we pick a basis: they come from V∗.


Problem 33.11. We saw in chapter 32 that a matrix A is written in indices as A^i_j. Describe an isomorphism between Hom(V, V) and V ⊗ V∗.

Let's define contractions.

Theorem 33.18. Let V and W be finite dimensional vector spaces. There is a unique linear map

V ⊗ V∗ ⊗ W → W,

called the contraction map, that on pure tensors takes v ⊗ ξ ⊗ w to ξ(v) w.

Remark 33.19. We can generalize this idea in the obvious way to any tensor product of any finite number of finite dimensional vector spaces: if one of the vector spaces is the dual of another one, then we can contract. For example, we can contract V∗ ⊗ W ⊗ V by a linear map which on pure tensors takes ξ ⊗ w ⊗ v to ξ(v) w.

Proof. Pick a basis v_1, v_2, ..., v_p of V and a basis w_1, w_2, ..., w_q of W. Define T on the basis v_i ⊗ v^j ⊗ w_K by

T(v_i ⊗ v^j ⊗ w_K) = w_K if i = j, and T(v_i ⊗ v^j ⊗ w_K) = 0 if i ≠ j.

By theorem 16.23 on page 167, there is a unique linear map T : V ⊗ V∗ ⊗ W → W which has these values on these basis vectors. Writing any vector v in V as v = a^i v_i, any covector ξ in V∗ as ξ = b_i v^i, and any vector w in W as w = c^J w_J, we find

T(v ⊗ ξ ⊗ w) = a^i b_i c^J w_J = ξ(v) w.

Therefore there is a linear map T : V ⊗ V∗ ⊗ W → W that on pure tensors takes v ⊗ ξ ⊗ w to ξ(v) w. Any other such map, say S, which agrees with T on pure tensors, must agree on all linear combinations of pure tensors, so on all tensors.
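In coordinates, the contraction map of theorem 33.18 simply pairs the V index against the V∗ index; the following is an added sketch, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(2)

# A tensor in R^3 ⊗ (R^3)* ⊗ R^2: an array t[i, j, K].
t = rng.standard_normal((3, 3, 2))

# Contraction: pair the upper R^3 index against the lower (R^3)* index.
c = np.einsum('iik->k', t)            # a vector in R^2

# On a pure tensor v ⊗ xi ⊗ w the result is xi(v) w.
v = np.array([1.0, -2.0, 0.5])
xi = np.array([3.0, 1.0, 2.0])        # a covector, written by its coefficients
w = np.array([2.0, 7.0])
pure = np.einsum('i,j,k->ijk', v, xi, w)
print(np.allclose(np.einsum('iik->k', pure), (xi @ v) * w))   # True
```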

33.4 Swapping Indices


We have one more tensor operation to generalize to abstract vector spaces: when working in indices we can associate to a tensor t^i_{jk} the tensor t^i_{kj}, i.e. swap indices. This generalizes in the obvious manner.

Theorem 33.20. Take V and W finite dimensional vector spaces. There is a unique linear isomorphism V ⊗ W → W ⊗ V which on pure tensors takes v ⊗ w to w ⊗ v.


Problem 33.12. Prove theorem 33.20, by imitating the proof of theorem 33.18.

Remark 33.21. In the same fashion, we can make a unique linear isomorphism reordering the factors in any tensor product of vector spaces. To be more specific, take any permutation q of the numbers 1, 2, ..., p. Then (with basically the same proof) there is a unique linear isomorphism

V_1 ⊗ V_2 ⊗ ⋯ ⊗ V_p → V_{q(1)} ⊗ V_{q(2)} ⊗ ⋯ ⊗ V_{q(p)}

which takes each pure tensor v_1 ⊗ v_2 ⊗ ⋯ ⊗ v_p to the pure tensor v_{q(1)} ⊗ v_{q(2)} ⊗ ⋯ ⊗ v_{q(p)}.
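In coordinates, this reordering just permutes the axes of the array of components; the added sketch below (assuming NumPy) illustrates remark 33.21 for three factors.

```python
import numpy as np

rng = np.random.default_rng(3)
t = rng.standard_normal((2, 3, 4))    # a tensor in V1 ⊗ V2 ⊗ V3

# The isomorphism V1 ⊗ V2 ⊗ V3 -> V3 ⊗ V1 ⊗ V2: s[k, i, j] = t[i, j, k].
s = np.transpose(t, (2, 0, 1))        # shape (4, 2, 3)

# On pure tensors it reorders the factors.
u, v, w = rng.standard_normal(2), rng.standard_normal(3), rng.standard_normal(4)
pure = np.einsum('i,j,k->ijk', u, v, w)
print(np.allclose(np.transpose(pure, (2, 0, 1)),
                  np.einsum('k,i,j->kij', w, u, v)))   # True
```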

33.5 Summary
We have now achieved our goal: we have defined tensors on an abstract finite dimensional vector space, and defined the operations of addition, scaling, tensor product, contraction and index swapping for tensors on an abstract vector space. All there is to know about tensors is that (1) they are sums of pure tensors v ⊗ w, (2) the pure tensor v ⊗ w depends linearly on v and linearly on w, and (3) the universal mapping property. Another way to think about the universal mapping property is that there are no identities satisfied by tensors other than those which are forced by (1) and (2); if there were, then we couldn't turn a bilinear map which didn't satisfy that identity into a linear map on tensors, i.e. we would contradict the universal mapping property. Roughly speaking, there is nothing else that you could know about tensors besides (1) and (2) and the fact that there is nothing else to know.

33.6 Cartesian Tensors


If V is a finite dimensional inner product space, then we can ask how to lower indices in this abstract setting. Given a single vector v from V, we can lower its index by turning v into a covector. We do this by constructing the covector ξ(x) = ⟨v, x⟩. One often sees this covector written as v∗ or some such notation, and it is usually called the dual to v.

Problem 33.13. Prove that the map V → V∗ given by taking v to v∗ is an isomorphism of finite dimensional vector spaces.

Careful: the covector v∗ depends on the choice not only of the vector v, but also of the inner product.

Problem 33.14. In the usual inner product in R^n, what is the map v ↦ v∗?

Problem 33.15. Let ⟨x, y⟩_0 be the usual inner product on R^n, and define a new inner product by the rule ⟨x, y⟩_1 = 2 ⟨x, y⟩_0. Calculate the map which gives the dual covector in the new inner product.


The inverse map is usually also written with a ∗, and we write the vector dual to a covector ξ as ξ∗. Naturally, we can define an inner product on V∗ by ⟨ξ, η⟩ = ⟨ξ∗, η∗⟩.

Problem 33.16. Prove that this defines an inner product on V∗.

If V and W are finite dimensional inner product spaces, we then define an inner product on V ⊗ W by setting

⟨v_1 ⊗ w_1, v_2 ⊗ w_2⟩ = ⟨v_1, v_2⟩ ⟨w_1, w_2⟩.

This expression only determines the inner product on pure tensors, but since the inner product is required to be bilinear and every tensor is a sum of pure tensors, we only need to know the inner product on pure tensors.

Problem 33.17. Prove that this defines an inner product on V ⊗ W.

Let's write V^{⊗2} to mean V ⊗ V, etc. We refer to tensors in a vector space V to mean elements of V^{⊗p} ⊗ (V∗)^{⊗q} for some positive integers p and q, i.e. tensor products of vectors and covectors. The elements of V^{⊗p} are called covariant tensors: they are sums of tensor products of vectors. The elements of (V∗)^{⊗p} are called contravariant tensors: they are sums of tensor products of covectors.

Problem 33.18. Prove that an inner product yields a unique linear isomorphism

V^{⊗p} ⊗ (V∗)^{⊗q} → V^{⊗(p+q)}

so that

v_1 ⊗ v_2 ⊗ ⋯ ⊗ v_p ⊗ ξ_1 ⊗ ξ_2 ⊗ ⋯ ⊗ ξ_q ↦ v_1 ⊗ v_2 ⊗ ⋯ ⊗ v_p ⊗ ξ_1∗ ⊗ ξ_2∗ ⊗ ⋯ ⊗ ξ_q∗.

This isomorphism raises indices. Similarly, we can define a map to lower indices.
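In coordinates, if the inner product is written with a symmetric positive definite Gram matrix G (the usual dot product being G = I), then lowering an index is contraction with G and raising an index is contraction with G^{-1}. The sketch below is an added illustration under that coordinate assumption, using NumPy; it is not the text's abstract construction.

```python
import numpy as np

# Inner product <x, y> = x^t G y with G symmetric positive definite.
G = np.array([[2.0, 1.0],
              [1.0, 3.0]])
Ginv = np.linalg.inv(G)

v = np.array([1.0, 4.0])

flat_v = G @ v            # the dual covector of v: components G_ij v^j
sharp_v = Ginv @ flat_v   # raising the index again recovers the vector

print(np.allclose(sharp_v, v))            # True
print(np.isclose(flat_v @ v, v @ G @ v))  # the dual covector of v applied to v is <v, v>
```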

33.7 Polarization
We will generalize the isomorphism between symmetric bilinear forms and quadratic forms to an isomorphism between symmetric tensors and polynomials.

Definition 33.22. Let V be a finite dimensional vector space. If t is a tensor in (V∗)^{⊗p}, i.e. a multilinear function t(v_1, v_2, ..., v_p) depending on p vectors v_1, v_2, ..., v_p from V, then we can define the polarization of t to be the function (also written traditionally with the same letter t) t(v) = t(v, v, ..., v).

Example 33.23. If t is a covector, so a linear function t(v) of a single vector v, then the polarization is the same linear function.

Example 33.24. In R^2, if t = e^1 ⊗ e^2, then the polarization is t(x) = x^1 x^2 for x = (x^1, x^2) in R^2.


Example 33.25. The antisymmetric tensor t = e^1 ⊗ e^2 − e^2 ⊗ e^1 in R^2 has polarization t(x) = x^1 x^2 − x^2 x^1 = 0, vanishing.

Definition 33.26. A function f : V → R on a finite dimensional vector space is called a polynomial if there is a linear isomorphism F : R^n → V for which f(F(x)) is a polynomial in the usual sense. Clearly the choice of linear isomorphism F is irrelevant. A different choice would only alter the linear functions by linear combinations of one another, and therefore would alter the polynomial functions by substituting linear combinations of new variables in place of old variables. In particular, the degree of a polynomial function is well defined. A polynomial function f : V → R is called homogeneous of degree d if f(λ x) = λ^d f(x) for any vector x in V and number λ. Clearly every polynomial function splits into a unique sum of homogeneous polynomial functions.

There are two natural notions of multiplying symmetric tensors, which simply differ by a factor. The first product of s and t is the tensor

(x_1, x_2, ..., x_{a+b}) ↦ Σ_p s(x_{p(1)}, x_{p(2)}, ..., x_{p(a)}) t(x_{p(a+1)}, x_{p(a+2)}, ..., x_{p(a+b)}),

where s has a lower indices, t has b, and the sum is over all permutations p of the numbers 1, 2, ..., a + b. The second product is the same expression divided by (a + b)!.

Theorem 33.27. Polarization is a linear isomorphism taking symmetric contravariant tensors to polynomials, preserving degree, and taking products to products (using the second multiplication above).

Proof. ???
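For a symmetric tensor with two lower indices, the polarization is just the associated quadratic form; here is a small added numerical illustration, assuming NumPy.

```python
import numpy as np

# A symmetric tensor t_{ij} on R^2, viewed as a bilinear function of two vectors.
T = np.array([[1.0, 2.0],
              [2.0, 5.0]])
t = lambda u, v: u @ T @ v

# Its polarization t(x) = t(x, x) is the quadratic polynomial
#   x1^2 + 4 x1 x2 + 5 x2^2.
x = np.array([3.0, -1.0])
print(t(x, x), x[0]**2 + 4 * x[0] * x[1] + 5 * x[1]**2)   # both print 2.0
```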

34 Exterior Forms
This chapter develops the definition and basic properties of exterior forms.

34.1 Why Forms?


This section is just for motivation; the ideas of this section will not be used subsequently. Antisymmetric contravariant tensors are also called exterior forms. In terms of indices, they have no upper indices, and they are skew-symmetric in their lower indices. The reason for the importance of exterior forms is quite deep, coming from physics. Consider fluid passing into and out of a region in space. We would like to measure how rapidly some quantity (for instance heat) flows into that region. We do this by integrating the flux of the quantity across the boundary: counting how much is going in, and subtracting off how much is going out. This flux is an integral along the boundary surface. But if we change our mind about which part is the inside and which is the outside, the sign of the integral has to change. So this type of integral (called a surface integral) has to be sensitive to the choice of inside and outside, called the orientation of the surface. This sign sensitivity has to be built into the integral. We are used to integrating functions, but they don't change sign when we change orientation the way a flux integral should. For example, the area of a surface is not a flux integral, because it doesn't depend on orientation.

Let's play around with some rough ideas to get a sense of how exterior forms provide just the right sign changes for flux integrals: we can integrate exterior forms. For precise definitions and proofs, Spivak [13] is an excellent introduction. Let's imagine some kind of object ω that we can integrate over any surface S (as long as S is reasonably smooth, except maybe for a few sharp edges: let's not make that precise since we are only playing around). Suppose that the integral ∫_S ω is a number. Of course, S must be a surface with a choice of orientation. Moreover, if we write −S for the same surface with the opposite orientation, then ∫_{−S} ω = −∫_S ω. Suppose that ∫_S ω varies smoothly as we smoothly deform S. Moreover, suppose that if we cut a surface S into two surfaces S_1 and S_2, then ∫_S ω = ∫_{S_1} ω + ∫_{S_2} ω: the integral is a sum of locally measurable quantities. Fix a point, which we can translate to become the origin, and scale up the picture so that eventually, for S a little piece of surface, ∫_S ω is very nearly invariant under small translations of S. (We can do this because the integral


varies smoothly, so after rescaling the picture the integral hardly varies at all.) Just for simplicity, let's assume that ∫_S ω is unchanged when we translate the surface S. Any two opposite sides of a box are translations of one another, but with opposite orientations. So they must have opposite signs for ∫ ω. Therefore any small box has as much of our quantity entering as leaving. Approximating any region with small boxes, we must get total flux ∫_S ω = 0 when S is the boundary of the region.

Pick two linearly independent vectors u and v, and let P be the parallelogram at the origin with sides u and v. Pick any vector w perpendicular to u and v and with det (u v w) > 0. Orient P so that the outside of P is the side in the direction of w. If we swap u and v, then we change the orientation of the parallelogram P. Let's write ω(u, v) for ∫_P ω. Slicing the parallelogram into 3 equal pieces, say into 3 parallelograms with sides u/3, v, we see that ω(u/3, v) = ω(u, v)/3. In the same way, we can see that ω(λ u, v) = λ ω(u, v) for any positive rational number λ (dilate by the numerator, and cut into a number of pieces given by the denominator). Because ω(u, v) is a smooth function of u and v, we see that ω(λ u, v) = λ ω(u, v) for λ > 0. Similarly, ω(0, v) = 0, since the parallelogram is flattened into a line. Moreover, ω(−u, v) = −ω(u, v), since the parallelogram of −u, v is the parallelogram of u, v reflected, reversing its orientation. So ω(u, v) scales in u and v. By reversing orientation, ω(v, u) = −ω(u, v). A shear applied to the parallelogram will preserve the area, and after the shear we can cut and paste the parallelogram as in chapter 19. The integral must be preserved, by translation invariance, so ω(u + v, v) = ω(u, v).

The hard part is to see why ω is linear as a function of u. This comes from a drawing of a solid region with edges built from u, v, w, u + v and v + w. The integral over the boundary of this region must vanish. If we pick three vectors u, v and w, which we draw as the standard basis vectors, then the region has boundary given by various parallelograms and triangles (each triangle being half a parallelogram), and the vanishing of the integral gives

0 = ω(u, v) + ω(v, w) + (1/2) ω(w, u) − (1/2) ω(w, u) − ω(u + w, v).

Therefore ω(u, v) + ω(v, w) = ω(u + w, v), so that finally ω is a tensor. If you like indices, you can write ω as ω_{ij} with ω_{ji} = −ω_{ij}. Our argument is only slightly altered if we keep in mind that the integral should not really be exactly translation invariant, but only vary slightly


with small translations, and that the integral around the boundary of a small region should be small. We can still carry out the same argument, but throwing in error terms proportional to the area of surface and extent of translation, or to the volume of a region. We end up with ω being an exterior form whose coefficients are functions. If we imagine a flow contained inside a surface, we can similarly measure flux across a curve. We also need to be sensitive to orientation: which side of the boundary of a surface is the inside of the surface. Again the correct object to work with in order to have the correct sign sensitivity is an exterior form (whose coefficients are functions, not just numbers). Similar remarks hold in any number of dimensions. So exterior forms play a vital role because they are the objects we integrate. We can easily change variables when we integrate exterior forms.

34.2 Definition
Definition 34.1. A tensor t in (V∗)^{⊗p} is called a p-form if it is antisymmetric, i.e. t(v_1, v_2, ..., v_p) is antisymmetric as a function of the vectors v_1, v_2, ..., v_p: for any permutation q,

t(v_{q(1)}, v_{q(2)}, ..., v_{q(p)}) = (−1)^N t(v_1, v_2, ..., v_p),

where (−1)^N is the sign of the permutation q.

Example 34.2. The form ω in R^n

ω(v_1, v_2, ..., v_n) = det (v_1 v_2 ... v_n)

is called the volume form of R^n (because of its interpretation as an integrand: ∫_R ω is the volume of any region R).

Example 34.3. A covector ξ in V∗ is a 1-form, because there are no permutations you can carry out on ξ(v).

Example 34.4. In R^n we traditionally write points as x = (x^1, x^2, ..., x^n), and write dx^1 for the covector given by the rule dx^1(y) = y^1 for any vector y in R^n. Then ω = dx^1 ⊗ dx^2 − dx^2 ⊗ dx^1 is a 2-form:

ω(u, v) = u^1 v^2 − u^2 v^1.
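A small added numerical illustration (assuming NumPy): the 2-form ω of example 34.4 computes the signed area, i.e. the determinant, of its two arguments, and changes sign when they are swapped.

```python
import numpy as np

def omega(u, v):
    """The 2-form dx^1 (x) dx^2 - dx^2 (x) dx^1 on R^2."""
    return u[0] * v[1] - u[1] * v[0]

u = np.array([2.0, 1.0])
v = np.array([0.0, 3.0])

print(omega(u, v))                                                       # 6.0
print(np.isclose(omega(u, v), np.linalg.det(np.column_stack([u, v]))))   # True
print(omega(v, u))                                                       # -6.0: antisymmetry
```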

Problem 34.1. If t is a 3-form, prove that (a) t(x, y, y) = 0 and (b) t(x, y + 3x, z) = t(x, y, z) for any vectors x, y, z.


A Hints

1 Solving Linear Equations


1.1. 0 0 1 1 0 0 0 1 1 1 3 1 1 1 0 1

Swap rows 1 and 3. 1 0 0 1 0 0 0 1 3 1 1 1 0 1 1 1

Add (row 1) to row 4. 1 0 0 0 0 0 0 1 3 1 1 2 0 1 1 1

Move the pivot

. 1 0 0 0 0 0 0 1 317 3 1 1 2 0 1 1 1

318 Swap rows 2 and 4. 1 0 0 0 0 1 0 0 3 2 1 1 0 1 1 1

Hints

Move the pivot

. 1 0 0 0 0 1 0 0 3 2 1 1 0 1 1 1

Add (row 3) to row 4. 1 0 0 0 0 1 0 0 3 2 1 0 0 1 1 0

Move the pivot

. 1 0 0 0 0 1 0 0 3 2 1 0 0 1 1 0

Move the pivot . 1 0 0 0 0 1 0 0 3 2 1 0 0 1 1 0

1.2. 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1

Hints

319 . 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1

Move the pivot

Move the pivot

. 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1

Move the pivot . 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1

Swap rows 3 and 4. 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0

1.3. x1 = 7 x4 x4 x2 = 3 2 x4 x 3 = 3 + . 2 1.4. Forward eliminate:

0 4 4

2 1 3

1 1 3

1 2 4

320 Swap rows 1 and 2. 4 0 4 Add (row 1) to row 3. 4 0 0 Move the pivot . 4 0 0 Add 2(row 2) to row 3. 4 0 0 Move the pivot . 4 0 0 Move the pivot . 4 0 0 Move the pivot . 4 0 0 1 2 0 1 1 0 2 1 0 1 2 0 1 1 0 2 1 0 1 2 0 1 1 0 2 1 0 1 2 0 1 1 0 2 1 0 1 2 4 1 1 2 2 1 2 1 2 4 1 1 2 2 1 2 1 2 3 1 1 3 2 1 4

Hints

Hints

321

Back substitute:
1 Scale row 2 by 2 .

4 0 0 Add row 2 to row 1. 4 0 0


1 . Scale row 1 by 4

1 1 0

1
1 2

2 0

1 2

0 1 0

3 2 1 2

5 2 1 2

1 0 0

0 1 0

3 8 1 2

5 8 1 2

x1 = 3/8 x3 + 5/8 x2 = 1/2 x3 + 1/2

1.5. 2 0 0 1.6. 1 0 0 1.9. 0 0 0 1 1 1 0 1 1 0 0 0 0 1 0 1 0 0 0 1 1 2 0 1 1 2 0 2 0 2 2 1

322 Swap rows 1 and 4. 1 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0 1 0 0 0

Hints

Move the pivot

. 1 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0 1 0 0 0

Add (row 2) to row 4. 1 0 0 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 0 0

Move the pivot

. 1 0 0 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 0 0

Swap rows 3 and 4. 1 0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 0 0 0

Move the pivot

. 1 0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 0 0 0

Hints

323

Move the pivot . Move the pivot . 1 0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 0 0 0

1.10. 1 2 3 3 5 8 2 4 6 6 1 7

Add 2(row 1) to row 2, 3(row 1) to row 3. 1 3 2 6 0 1 0 11 0 1 0 11 Move the pivot . 1 0 0 Add (row 2) to row 3. 3 1 1 2 0 0 6 11 11

1 0 0

3 1 0

2 0 0

6 11 0

Move the pivot

. 1 0 0 3 1 0 2 0 0 6 11 0

324 Move the pivot . 1 0 0 Move the pivot . 1 0 0 3 1 0 2 0 0 6 11 0 3 1 0 2 0 0 6 11 0

Hints

1.11. Scale row 3 by 1. Add (row 3) to row 1. Add (row 2) to row 1. 1 0 0 0 0 1 0 0 1 1 0 0 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0

1.12. Scale row 2 by 1 2. 1 0 0 1 1 0 0 0 1

Hints

325

Add (row 2) to row 1. 1 0 0 Scale row 1 by 1. I 1.13. Scale row 2 by 1. 1 0 0 0 1 0 1 1 0 0 1 0 0 0 1

1.14.

1 0 0 1.15. I 1.16. Forward eliminate: 1 1 0 0 Add (row 1) to row 2. 1 0 0 0 Move the pivot . 1 0 0 0 2 0 0 0

0 1 0

1 3 1 3 0

2 2 0 0

1 2 1 0

1 1 2 1

1 0 0 2

2 0 0 0

1 1 1 0

1 0 2 1

1 1 0 2

1 1 1 0

1 0 2 1

1 1 0 2

326 Move the pivot . 1 0 0 0 2 0 0 0 1 1 1 0 1 0 2 1 1 1 0 2

Hints

Add (row 2) to row 3. 1 0 0 0 Move the pivot . 1 0 0 0

2 0 0 0

1 1 0 0

1 0 2 1

1 1 1 2

2 0 0 0

1 1 0 0

1 0 2 1

1 1 1 2

Add 1 2 (row 3) to row 4. 1 0 0 0 Move the pivot . 1 0 0 0

2 0 0 0

1 1 0 0

1 0 2 0

1 1 1
3 2

2 0 0 0

1 1 0 0

1 0 2 0

1 1 1 3 2

Back substitute: Scale row 4 by 2 3. 1 0 0 0 2 0 0 0 1 1 0 0 1 0 2 0 1 1 1 1

Hints

327

Add (row 4) to row 1, row 4 to row 2, (row 4) to row 3. 1 0 0 0 2 0 0 0 1 1 0 0 1 0 2 0 0 0 0 1

Scale row 3 by 1 2. 1 0 0 0 2 0 0 0 1 1 0 0 1 0 1 0 0 0 0 1

Add (row 3) to row 1. 1 0 0 0 2 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1

Add (row 2) to row 1. 1 0 0 0 2 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1

Scale row 1 by 1. 1 0 0 0 2 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1

There are no solutions. 1.17. Forward eliminate: 1 2 0

2 5 1

3 7 1

4 11 4

5 12 3

328 Add 2(row 1) to row 2.

Hints

1 0 0

2 1 1

3 1 1

4 3 4

5 2 3

Move the pivot

. 1 0 0 2 1 1 3 1 1 4 3 4 5 2 3

Add (row 2) to row 3.

1 0 0

2 1 0

3 1 0

4 3 1

5 2 1

Move the pivot

. 1 0 0 2 1 0 3 1 0 4 3 1 5 2 1

Move the pivot . 1 0 0 2 1 0 3 1 0 4 3 1 5 2 1

Back substitute: Add 4(row 3) to row 1, 3(row 3) to row 1 2 3 0 1 1 0 0 0 Add 2(row 2) to row 1. 1 0 0 2. 0 0 1 1 1 1

0 1 0

1 1 0

0 0 1

3 1 1

Hints

329 x 1 = x 3 + 3 x 2 = x 3 1 x4 = 1

1.18. Forward eliminate: 2 1 2 1 1 1 1 1

1 1 2 1

1 1 1 2

0 0 0 0

1 1 Add 1 2 (row 1) to row 2, 2 (row 1) to row 3, 2 (row 1) to row 4.

2 0 0 0

1 3 2
3 2 3 2

1
3 2 3 2 3 2

1
3 2 3 2 3 2

0 0 0 0

Move the pivot

. 2 1 3 2
3 2 3 2

1
3 2 3 2 3 2

1
3 2 3 2 3 2

0 0 0

0 0 0

Add row 2 to row 3, row 2 to row 4. 2 1 3 2 0 0 1


3 2

1
3 2

0 0 0 Move the pivot . 2

0 3

3 0

0 0 0 0

0 0 0

1 3 2 0 0

1
3 2

1
3 2

0 3

3 0

0 0 0

330 Swap rows 3 and 4. 2 1 3 2 0 0 1


3 2

Hints

1
3 2

0 0 0 Move the pivot . 2

3 0

0 3

0 0 0

0 0 0

1 3 2 0 0

1
3 2

1
3 2

3 0

0 3

0 0 0

Back substitute:
1 Scale row 4 by 3 .

0 0 0

1 3 2 0 0

1
3 2

1
3 2

3 0

0 1

0 0 0

3 (row 4) to row 2. Add (row 4) to row 1, 2

0 0 0
1 Scale row 3 by 3 .

1 3 2 0 0

1
3 2

0 0 0 1

3 0

0 0 0

0 0 0

1 3 2 0 0

1
3 2

0 0 0 1

1 0

0 0 0

Hints

331

Add (row 3) to row 1, 3 2 (row 3) to row 2. 2 1 3 2 0 0 0 0 1 0 0 0 0 1 0

0 0 0 Scale row 2 by 2 3. 2 0 0 0

0 0 0

1 1 0 0

0 0 1 0

0 0 0 1

0 0 0 0

Add (row 2) to row 1. 2 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0

Scale row 1 by 1 2. 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0

x1 = 0 x2 = 0 x3 = 0 x4 = 0

1.20. You could try: a. One solution: x_1 = 0. b. No solutions: x_1 = 1, x_2 = 1, x_1 + x_2 = 0. c. Infinitely many solutions: x_1 + x_2 = 0.

332 1.22.

Hints

1.27. (a)=(2), (b)=(4), (c)=(1), (d)=(5), (e)=(3)

2 Matrices
2.2. (a) All coordinates of each vertex are ±1. (b) The vertices of a regular octahedron lie in the centers of the faces of a cube. (c) Try an equilateral triangle in the plane first. This should lead you to the points (±1, ±1, ±1) with an even number of minus signs.
2.3.
2.4. A = (1, 1, 0), B = (0, 0, 1).

2.9. Any matrix full of zeros with at least two columns.
2.13. 1 · 8 + (−1) · 1 + 2 · 3 = 13.
2.14.
14 20
20 29



2.16. 0 AB = 0 0 4 4 0 AD = 0 0 AC = BC = 0 0 10 0 2 2 2 2 2 0 0 0 1 1 0 . 0 1 1 4 4

CA = 2.18. You could try A=

1 1

1 . 1

2.20. In an upper triangular matrix, A_ij = 0 if i > j. So nonzero terms have i ≤ j. The product: (AB)_ij = Σ_k A_ik B_kj. The A_ik vanishes unless i ≤ k, and the second factor vanishes unless k ≤ j, so the whole sum consists in terms with i ≤ k ≤ j. The sum vanishes unless i ≤ j, hence upper triangular. Moreover, the terms with i = j must have i ≤ k ≤ j, so just the one A_ii B_ii term.
2.22. (c(AB))_ij = c (AB)_ij = c
k

Aik Bkj cAik Bkj


k

= =
k

(cA)ik Bkj

= ((cA)B )ij . 2.23. ((AB ) C )ij =


k

(AB )ik Ckj Ai B k Ckj .


k

334 On the other hand, (A (BC ))ij =


k

Hints

Aik (BC )kj Aik Bk C j .


k

Since k and are just used to add up, we can change their names to anything we like. In particular, the resulting sums wont change if we rename k to and to k . Moreover, we can carry out the sums in any order. (You still have to show that each side is dened just when the other is.) 2.24. (A(B + C ))ij =
k

Aik (B + C )kj Aik (Bkj + Ckj )


k

= =
k

Aik Bkj +
k

Aik Ckj

= (AB )ij + (AC )ij .

3 Important Types of Matrices


3.2. One proof: (IA)ij =
k

Iik Akj

= Aij because Iik = 1 just when k = i. A hint for a dierent proof (without using ): if A is 1 1, then the result is clear. Now suppose that we have proven the result already for all matrices of some size smaller than p q , but that we face a matrix A which is p q . Then split A into blocks, in any way you like, say as A= and write out IA = I 0 0 I P R Q , S P R Q , S

and calculate out the result, using the fact that since P, Q, R and S are smaller matrices, we can pretend that have already checked the result for them.

Hints

335

3.3. IB = BI = I but IB = B . 3.5.

1 e1 = 0 . 0

3.6. Row j . All rows except row j . 3.7. One proof: A11 A12 A21 A22 Ae1 = . . . . . . An 1 An 2

... ... ... ...

1 A1n A2n 0 . . . . . . Ann 0

so running your ngers along the rows of A and column of e1 : A11 1 + A12 0 + + A1n 0 A21 1 + A22 0 + + A2n 0 = . . . An1 1 + An2 0 + + Ann 0 A11 A21 = . . . An 1 Another proof: e1 has entries (e1 )i = 1 if i = 1 and (e1 )i = 0 if i = 1. So Ae1 has entries (Ae1 )i = k Aik (e1 )k . Each term is zero except if k = 1, in which case it is Ai1 . But the entries of the rst column of A are A11 , A21 , . . . , An1 . 3.9. Write x = xj ej , and multiply both sides by A. 3.10. Use the fact that the columns of AB are A times columns of B . 3.11. You could try A = I3 3.15. 0 , B= I3 0 .

2 1

0 0

0 1

1 0

2 1

0 1

1 1

0 1

336

Hints

3.16. Mostly yes. You only have to try to gure out where e1 goes to, and where e2 goes to. The vector e1 is close to her left eye. The vector e2 is close to the top of her head, which is not really marked by anything, so harder to follow. But you cant gure out what matrix gives the straight line segment. Why? 3.19. C = CI = C (AB ) = (CA)B = IB = B . 3.22. B 1 A1 (AB ) = B 1 A1 A B = B 1 IB = B 1 B = I. and similarly multiplying out (AB ) B 1 A1 . 3.23. By denition, AA1 = A1 A = I . But these equations say exactly that A is the inverse of A1 . 3.24. Multiply both sides of the equation Ax = 0 by A1 . 3.25. Multiply both sides of AB = I by A1 to nd B = A1 . Therefore BA = I , and so A = B 1 . 3.27. If x = y then clearly Ax = Ay . If Ax = Ay , then multiply both sides by A1 . 3.28. M 1 = A 1 0 A1 BD1 D 1

3.29. See gure A.1 on the next page. 3.31. 2, 3, 4, 1; 3, 4, 1, 2; and 4, 2, 3, 1. 3.33. pq is 2, 4, 3, 1. 3.35. Put the number 1 into the spot where the permutation wants it to go, by swapping 1 with whatever number sits in that spot. Then put the number 2 into its spot, etc., swapping two numbers at each step. 3.36. (a) 3,1,2 is (b) 4,3,2,1 is (c) 4,1,2,3 is 3.37. These transpositions allow us to shift any number we want over to the left or to the right, one step. Keep doing this until it lands in its place. So we can put whatever number we want to in the last place, and then proceed by induction.

Hints

337

Figure A.1: Images coming from some matrices, and from their inverses

3.39. P = e4 0 0 = 0 1 0 e2 0 1 0 0 0 0 0 0 0 1 e5 0 0 1 0 0 e3 e1 1 0 0 . 0 0

3.40. 2,3,1,4 3.42. Each column has to be a column of the identity matrix, since it has all 0s except for a single 1. But all of the columns have to be dierent, in order that no two columns have 1s in the same row. So they are dierent columns of the identity matrix. If some column of the identity matrix doesnt show up anywhere in our matrix, say the third column for example, then the third row has only 0s. So every column of the identity matrix shows up, precisely once, scrambled in some permutation.

338 3.43. P Qej = P eq(j ) = ep(q(j )) .

Hints

3.45. By denition, column j of P is ep(j ) . So we permuted by p. For the rows, P A is A with row 1 moved to row p(1), etc. Take A = I : then P is 1 with row 1 moved to row p(1), etc. So row p(1) of P is e1 , etc. So row 1 of P is ep1 (1) , etc. 3.47. The rows of P are those of 1 swapped by p1 .

4 Elimination Via Matrix Arithmetic


4.2. Sej must be ej with multiples of rows added only to later rows, so column j of S has zeros above row j and 1 on row j . Hence S1j = S2j = = Sj 1 j = 0 and Sjj = 1. 4.3. 1 0 0 S = 7 1 0 . 0 5 1 4.4. Proof (1): Multiplying by R adds entries only to lower entries, preserving the 1s on and 0s above the diagonal. Proof (2): A matrix R is strictly lower triangular just when Rij = 0 unless i j and Rij = 1 for i = j . (RS )ij =
k

Rik Skj .

But this sum is all 0s unless we nd i k and k j , so we need i j . If i = j , then only the term k = i = j makes a contribution, which is Rii Sii = 1. Proof (3): Obvious for 1 1 matrices. Suppose that we have proven the result for all matrices of size smaller than n n. If R and S are n n, write R= 1 a 0 B , S= 1 p 0 Q

where a and p are columns, and B and Q are strictly lower triangular. Then RS = 1 a + Bp 0 BQ

which is strictly lower triangular because B and Q are strictly lower triangular of smaller size. 4.5. Suppose that we want to make a matrix S which is p q . Start with the identity matrix. Multiply it by the elementary matrix with S1q in row 1,

Hints

339

column q . We get one element into place. Keep going, rst getting things set up properly along the bottom row. 4.7.

4.9. If

t1 D=

t2 .. . tn ,

then
1

D =

1 t 1 1 t 2

.. .

.
1 t n

If one of these ti is 0, then there cant be an inverse, because Dei = 0, so if D has an inverse, then D1 Dei = ei , but also D1 Dei = D1 0 = 0. 4.11. ad 0 0 0 be 0 0 0 cf 4.12. The original picture is

The three resulting pictures are:

4.13. 1 3 2 4 x1 x2 = 7 8

340 4.14. P 99 = P swaps rows 1 and 2. 0 P 99 = P = 1 0

Hints

1 0 0

0 0 . 1

4.15. S 101 adds 202(row 1) to row 3. 1 0 101 S = 0 1 202 0 4.23. Any number between 0 and 3. 4.24. a. 1 2 3 0 0 0 0 0 0

0 0 . 1

4 6 0

5 7 8

b. Impossible: can only use 1 pivot in each row, so at most 3 pivots binding up variables, so must have at least 2 left over free variables (or at least one free variable, if the last column is a column of constants). c. 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 4.27. There is at most one pivot in each row. There are more columns than rows. So there must be a pivotless column: a free variable for the equation Ax = 0.

5 Finding the Inverse of a Matrix


5.1. Forward eliminate: 0 0 1 1 1 2 Swap rows 1 and 2. 1 0 1 1 0 2 0 1 1 0 1 0 1 0 0 0 0 1

1 0 1

1 0 0

0 1 0

0 0 1

Hints

341

Add (row 1) to row 3. 1 0 0 Move the pivot . 1 0 1 0 1 1 0 1 0 1 0 1 0 0 1

1 0 0 Swap rows 2 and 3. Move the pivot .

1 0 1

0 1 1

0 1 0

1 0 1

0 0 1

1 0 0

1 1 0

0 1 1

0 0 1

1 1 0

0 1 0

1 0 0

1 1 0

0 1 1

0 0 1

1 1 0

0 1 0

Back substitute: Add (row 3) to row 2. 1 0 0 Add (row 2) to row 1. 1 0 0 0 1 0 0 0 1 1 1 1 2 1 0 1 1 0 1 1 0 0 0 1 0 1 1 1 1 0 0 1 0

A 1

1 = 1 1

2 1 0

1 1 . 0

342 5.2. Forward eliminate: 1 2 1 2 1 0

Hints

3 0 0

1 0 0

0 1 0

0 0 1

Add (row 1) to row 2, (row 1) to row 3. 1 2 3 1 0 0 3 1 0 2 3 1 Move the pivot . 1 0 0 Swap rows 2 and 3. Move the pivot . 1 0 0 2 2 0 3 3 3 1 1 1 2 0 2 3 3 3 1 1 1

0 1 0

0 0 1

0 1 0

0 0 1

1 0 0

2 2 0

3 3 3

1 1 1

0 0 1

0 1 0

0 0 1

0 1 0

Back substitute: Scale row 3 by 1 3. 1 0 0 2 2 0 3 3 1 1 1


1 3

0 0 1 3

0 1 0

Add 3(row 3) to row 1, 3(row 3) to row 2. 1 2 0 0 2 0 0 0 1 0 0 1 3

1 1 1 3

0 1 0

Hints

343

Scale row 2 by 1 2. 1 0 0 2 1 0 0 0 1 0 0
1 3

1
1 2 1 3

0 1 2 0

Add 2(row 2) to row 1. 1 0 0 0 1 0 0 0 1 0 0


1 3

0
1 2 1 3

1 1 2 0

A 1 5.3. Not invertible. 5.4.

0 = 0
1 3

0
1 2 1 3

1 1 . 2 0

A1 = 3 2 1 2

1 1 0

1 3 . 2
1 2

5.6. Yes, invertible. 5.7. No, not invertible. 5.8. If A is invertible, A1 Ax = x = 0. On the other hand, if Ax = 0 holds only for x = 0, then the same is true of U x = 0, after GaussJordan elimination. Any column of U with no pivot gives a free variable, so there must be a pivot in each column, going straight down the diagonal, so U is invertible. 5.9. Lets use theorem 5.9. Is there a solution x to the equation Ax = b for every choice of b? Yes: try x = Bb. So A is invertible. Multiply AB = I on both sides by A1 to get B = A1 . 5.10. We have already seen that if A and B are both invertible, then 1 1 (AB ) = B 1 A1 . Suppose that AB is invertible. Then (AB ) (AB ) = I so A B (AB )
1 1

= I . Therefore A is invertible. Multiply on the left by A1 :


1

B (AB ) = A1 . Multiply by A on the right: B (AB ) invertible. 5.11. Yes 5.12. Forward elimination: 1 2 3 1 2 1 0 1 15

A = I . So B is

344 Add 1(row 1) to row 2. 1 0 0 Swap rows 2 and 3 1 0 0 1 0 0 2 1 0 2 1 0 3 15 2 3 15 2 2 0 1 3 2 15

Hints

The matrix is invertible, so the equations have a unique solution. 5.15. You could try 0 1 0 A = 0 2 1 1 0 0 which has forward elimination 1 0 0 0 2 0 0 1 1 2 ,

while its transpose At has forward elimination 1 0 0 2 1 0 0 0 . 1

5.16. U x = V b just when V Ax = V b, just when Ax = b. So to solve Ax = b we need b to solve U x = V b. The last two rows of U x = V b give the conditions above. If those are satisfied, we can then drop those rows, and start solving the first two rows with the pivots.

Hints

345

6 The Determinant
6.1. If a ≠ 0, use it as a pivot, and forward elimination yields
a  b
0  d − bc/a.
Therefore if a ≠ 0, then A is invertible just when d − bc/a ≠ 0. Multiplying by a, we see that A is invertible just when ad − bc ≠ 0. What if a = 0? We try to swap. Forward elimination yields
c  d
0  b.

Invertibility (when a = 0) is just precisely both b and c not vanishing. But (when a = 0) ad bc = bc vanishes just when b or c does, so just when invertibility fails. 6.2. det a c b d = a det = ad bc. 6.3. +3 (3) 1 (1) = 10 6.4. +1 det 1 0 1 1 0 det 1 0 0 1 + 1 det 1 1 0 1 = 2 a c b d c det a c b d

6.10. 1 4 6 = 24 6.11. For 1 1 matrices U , the results are obvious: U = (u) , U 1 = 1 u .

Suppose that U is n n and assume that we have already checked that all smaller invertible upper triangular matrices. Split into blocks, in any manner at all A B U= 0 C with A and C square and upper triangular. You will have to nd a way to see that A and C are invertible. Once you do that, check that U 1 = A1 0 A1 BC 1 C 1 .

346

Hints

We see by induction that (1) U 1 is upper triangular, (2) the diagonal entries of U 1 are the reciprocals of the diagonal entries of U , and (3) we can compute the entries U 1 inductively in terms of the entries of U . 6.13. Swapping changes the sign of the determinant. But swapping doesnt change the matrix, so it doesnt change the sign of the determinant. Therefore the determinant is 0. 6.14. You can use A= 1 0 0 ,B = 0 0 0 0 1

for example. 6.15. The determinant goes up by 6 times.

7 The Determinant via Elimination


7.1. 2 2 2 5 5 6 5 7 11

Add (row 1) to row 2, (row 1) to row 3. 5 5 2 0 2 0 0 1 6 Make a new pivot . 2 0 0 Swap rows 2 and 3 2 0 0 Make a new pivot . 2 0 0 5 1 0 5 6 2 5 1 0 5 6 2 5 0 1 5 2 6

Hints

347

So det A = (2)(1)(2): the minus sign because of one row swap. 7.2. 0 because the second row is a multiple of the rst. 7.3. From the fast formula, det A = 0 just when there is a pivot in each column. 7.4. GaussJordan elimination with one row swap yields 1 0 1 1 0 1 1 1 0 0 1 1 0 so 1 1 det 0 1 0 0 1 0 1 0 1 1 1 0 = 1 1 0 0 0 1

7.5. GaussJordan elimination 1 0 0 0 so 0 1 det 1 2

with one row swap yields 1 0 0 2 2 0 0 2 1 3 0 0 2 2 1 1 0 2 0 0 1 0 0 =6 1 1

7.6. GaussJordan elimination with no row swaps yields 1 1 2 3 0 1 2 2 0 0 0 so 2 det 1 2 7.7. +0 det 0 2 1 1 0 det 2 2 0 1 + 2 det 2 0 0 1 = 4 1 1 1 1 0 =0 1

348 7.8. +2 det 7.9. +0 det 1 1 2 0 0 det 1 1 1 0 + 0 det 1 1 1 2 0 2 2 1 2 det 1 2 1 1 + 0 det 1 0 1 2 = 14

Hints

=0

7.10. Rescaling that row rescales the determinant. But rescaling that row doesnt change anything, so it must not change the determinant. The determinant must not change when scaled, so must be 0. 7.12. To see that det A = 12, we compute 0 2 1 1 2 3 3 5 2 Swap rows 1 and 2 3 0 3 Add (row 1) to row 3. 3 0 0 Make a new pivot . 3 0 0 Add 2(row 2) to row 3. 3 0 0 1 2 0 2 1 2 1 2 4 2 1 0 1 2 4 2 1 0 1 2 5 2 1 2

Hints

349 . 3 0 0 1 2 0 2 1 2

Make a new pivot

7.13. If L11 = 0, then we have a zero row, so det = 0, and the result is obviously true. If L11 = 0, then use it as a pivot to kill everything underneath it: L11 L22 0 0 L32 L33 .. 0 . L L 42 43 . . . . . . . . . . . . . . . 0 Ln2 Ln3 ... Ln(n1) Lnn Proceed by induction. 7.15. Here is one proof: the i-th row of At is obvious the transpose of t the i-th column of A: ei t At = (Aei ) . Writing out any vector x as x = t x1 e1 + x2 e2 + + xn en , and adding up, we see that xt At = (Ax) for any vector x. Apply this to a column of B , say x = Bei . ei t B t At = (Bei ) At = (ABei )
t t t

= ei t (AB ) . Here is another proof, using lots of indices: (AB )ij = (AB )ji =
k t

Ajk Bki Bki Ajk


k t Bik At kj k

= =

= B t At

ij

7.16. It is the inverse permutation; see the exercises in subsection 3.3. 7.17. 4: expand down the third column.

350 7.18. For k = 1, A1 = A, so obvious. By induction, det Ak+1 = det A Ak = (det A) det Ak = (det A) (det A) = (det A)
k+1 k

Hints

7.19. det A2222444466668888 = (det A) = (1)2222444466668888 = 1 (an even number of minus signs). 7.20. AA1 = I so det A det A1 = 1. 7.21. a. By expanding down any column. b. By expanding across any row. c. By forward elimination, and then taking the product of the diagonal entries. (The fastest way for a big matrix.) 7.22. One, because the determinant of the coecients is 1 2 3 = 6.

2222444466668888

8 Span
8.1. x1 + 2x2 + x3 = 1 x2 + x3 = 0 x1 + x2 = 2 8.3. Yes. 8.4. Lets call these vectors x1 , x2 , x3 . Make the matrix A = x1 Apply forward elimination: 1 A = 0 1 Add 1(row 1) to row 3. 1 0 0 1 0 1 1 1 1 1 0 0 1 1 0 x2 x3 .

Hints

351

Swap rows 2 and 3 1 0 0 1 0 0 1 1 0 1 1 0 1 1 1 1 1 1

If we add any vector y , we cant add another pivot, so every vector y is a linear combination of x1 , x2 , x3 . Therefore the span is all of R3 . 8.5. Yes. Forward eliminate: 0 2 1 1 1 1 1 0 0 2 1 1 Swap rows 1 and 2. 1 0 0 Move the pivot . 1 0 0 Add (row 2) to row 3. 1 0 0 Move the pivot . 1 0 0 1 2 0 1 1 2 0 1 2 1 2 0 1 1 2 0 1 2 1 2 2 1 1 1 0 1 1 1 2 2 1 1 1 0 1 1

352

Hints

There is no pivot in the nal column, so the nal column is a linear combination of earlier columns. 8.6. No. Forward eliminate: 2 2 4 0 0 1 0 2 0 1 0 0 Move the pivot . 2 0 0 Add row 2 to row 3. 2 0 0 Move the pivot . 2 0 0 Move the pivot . 2 0 0 2 1 0 4 0 0 0 2 2 2 1 0 4 0 0 0 2 2 2 1 0 4 0 0 0 2 2 2 1 1 4 0 0 0 2 0

There is a pivot in the nal column, so the nal column is linearly independent of earlier columns. 8.9. x1 + x2 + 2x3 = 0 8.11. You can rescale and add as many times as you need to, forming any linear combination. 8.12. No: it doesnt contain 0. 8.13. Yes 8.14. Obviously every vector in a subspace is a linear combination of vectors in the subspace: x = 1 x. So the subspace lies inside the span of its vectors. Conversely, every linear combination of vectors in a subspace belongs

Hints

353

to the subspace, so the span of the vectors in the subspace lies in the subspace. Therefore a subspace is its own span. 8.15. {0} and R . 8.16. a. Not always. Take the x and y axes in the (x, y ) plane. b. Yes. 8.17. No: it contains 1 x= 1 but doesnt contain 2x = 8.18. a. no b. yes c. yes d. no 8.20. The lines through 0. 2 . 2

9 Bases
9.2. Put the standard basis into the columns of a matrix, and you have the identity matrix. Look: there is a pivot in each column. 9.4. Try adding a vector to the set. If you cant then you are done: a basis. If you can, then keep going. If you end up with more than n vectors, then use theorem 9.4. 9.5. Put them into the columns of matrix A. You nd det A = 0, so these vectors are linearly dependent. 9.6. Put them into the columns of a matrix, and apply forward elimination to nd pivots: 2 0 1 3 . 2

A square matrix, a pivot in every column, so a basis. 9.10. Write A as columns A = u1 u2 ... un .

The equation Ax = 0 is the equation x1 u1 + x2 u2 + + xn un = 0, which imposes a linear relation. Therefore u1 , u2 , . . . , un are linearly independent just when Ax = 0 has x = 0 as its only solution.

354 9.11. F 1 1 = 0 0 0 1 0 1 1 1 , F AF = 0 0 1 0 0 2 0 1 0 . 2

Hints

9.12. 1 A = 0 0 1 F 1 = 0 0 0 1 0 2 1 0 1 0 0 , F = 0 0 0 4 1 2 , F AF 1 2 1 0 1 = 0 0 0 2 , 1 0 1 0 4 2 . 0

9.14. Expand out x = x1 e1 + x2 e2 + + xq eq to give Ax = x1 (Ae1 ) + x2 (Ae2 ) + + xq (Aeq ) . So Ax = 0 just when x1 , x2 , . . . , xq give a linear relation among the columns of A. 9.15. No 9.17. Yes 9.19. Let F = x1 G = y1 x2 y2 ... ... xn yn ,

and let A = G F 1 . But why is there only one such matrix? 9.21. The idea is that e2 e3 = (e1 e3 ) (e1 e2 ), etc. So consider the vectors e1 e2 , e1 e3 , . . . , e1 en1 . Clearly if i = 1, then ei ej is one of these vectors. But if i = 1, then ei ej = (1) (e1 ei ) + (e1 ej ) . So the vectors e1 e2 , e1 e3 , . . . , e1 en1 span the subspace. Clearly these vectors are linearly independent, because each one has a nonzero entry just at a spot where all of the others have a zero entry. Alternatively, to see linear independence, any linear relation among them: 0 = c2 (e1 e2 ) + c3 (e1 e3 ) + . . . cn (e1 en ) = (c2 + c3 + + cn ) e1 + c2 e2 + c3 e3 + + cn en determines a linear relation among the standard basis vectors, forcing 0 = c2 = c3 = = cn . The subspace is actually the set of vectors x for which x1 + x2 + + xn = 0.

Hints

355

9.22. You could try 0 A = 1 1 1 1 . 2

Depending on whether you swap row 1 with row 2 or with row 3, forward elimination yields 1 1 1 2 0 1 or 0 1 . 0 0 0 0

10 Kernel and Image


10.3. The reduced echelon form is
1 0 1
0 1 2
0 0 0
The basis for the kernel is the single vector (-1, -2, 1).

10.6. Remove zero rows. 1 0 0

0 1 0

0 0 1

0 0 2

0 1 2

Change signs of the entries after each pivot. 1 0 0 0 0 1 0 0 0 0 1 2

0 1 2

Pad with rows from the identity matrix, to get 1s down the diagonal. 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 2 1 0 0 1 2 0 1

Keep the pivotless columns. 0 0 2 , 1 0 10.7. Remove zero rows. 0 0 0 0 0 1 2 0 1


1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 1

Change signs of the entries after each pivot. 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1

Pad with rows from the identity matrix, to get 1s down the diagonal. 1 0 0 0 0 Keep the pivotless columns. 1 0 0 0 0 10.8. The reduced echelon 1 0 0 0 form is 0 1 0 0 0 0 1 0 0 0 0 1 2 5 2 5 2 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1


Remove zero rows. 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 2 5 2 5 2 1

Change signs of the entries after each pivot. 1 0 0 0 1 0 0 0 1 0 0 0

0 0 0 1

5 2 5 2

Pad with rows from the identity matrix, to get 1s down the diagonal. 1 0 0 0 2 0 5 0 0 1 2 0 0 1 0 5 2 0 0 0 1 1 0 0 0 0 1 Keep the pivotless columns. 2 5 2 5 2 1 1 10.9. 1 1 0 1 1 0

10.10.

10.11. The only vector in the kernel is 0. 10.12. y = Ax =A


j

xj ej xj Aej

=
j


so a linear combination (with coecients x1 , x2 , . . . , xq ) of the columns Aej of A. Therefore the vectors of the form y = Ax are precisely the linear combinations of the columns of A. 10.14. It is the horizontal plane in R3 , the xy -plane in xyz coordinates. 10.16. If A is tall, then At is short, so there are nonzero vectors c so that t A c = 0, i.e. ct A = 0. Since c = 0, there must be some entry ci = 0. If Ax = ei , we nd 0 = ct Ax = ct ei = ci a contradiction. So ei is not in the image. 10.17. 1, 1, 2, 1, 1, 0 10.19. The kernel of B consists in the vectors x v = y z for which Bv = 0. This is just asking for Ax + Ay + Az = 0, i.e. for A(x + y + z ) = 0. We can pick x and y arbitrarily, and pick an arbitrary vector w in the kernel of A, and set z = w (x + y ). In particular, B has kernel of dimension 2q + k where k is the dimension of the kernel of A. 10.20. Ax = 0 implies that Bx = CAx = 0. Conversely, Bx = 0 implies that Ax = C 1 Bx = 0. 10.21. Linear relations among vectors pass through C and through C 1 . 10.22. Compute echelon form: 0 1 0 0 2 2 1 2 2 0 2 2 2 0 2 2

Swap rows 1 and 2 1 0 0 0 2 2 1 2 2 2 1 2 0 2 2 2 0 2 2 2 0 2 2 2 0 2 2 2

1 0 0 0 Add 1/2(row 2) to row 3.


Add 1(row 2) to row 4. 1 0 0 0 2 2 0 0 0 2 1 0 0 2 1 0

So the rank is 3, the number of pivots. There is 1 pivotless column. The image is 3-dimensional, while the kernel is 1-dimensional. 10.23. You could try A= 1 0 0 , B= 0 0 1 0 , C= 0 0 1 1 . 0

The image of A is the span of the columns, so the span of 1 0 while the image of B is the span of its columns, so the span of 0 , 1 which are clearly not the same subspaces, but both one dimensional. 10.24. The rank of At is the number of linearly independent columns of t A (rows of A). Let U be the forward elimination of A. When we compute forward elimination, we add rows to other rows, and swap rows, so the rows of U are linear combinations of the rows of A, and vice versa. Thus the rank of At is the number of linearly independent rows of U . None of the zero rows of U count toward this number, while pivot rows are clearly linearly independent. Therefore the rank of At is the number of pivots, so equal to the rank of A. 10.31. If they both have solutions, then At y = 0 so y t A = 0, so y t Ax = 0. But y t Ax = y t b, so bt y = 0, a contradiction. On the other hand, if neither has a solution, then b is not in the image of A. So b is not a linear combination of the columns of A, and the matrix M = (A b) has rank 1 higher than the matrix A. Therefore the matrix Mt = At bt

also has rank one higher than At . So the dimension of the kernel of M t is one lower than the dimension of the kernel of At , and therefore there is a vector y in the kernel of At but not in the kernel of M t . 10.34. A A+B = 1 1 B


so you can use the previous exercise. 10.35. (a) only 10.36. The rank of AB is the dimension of the image. But (AB )x = A(Bx), so every vector in the image of AB lies in the image of A. The clever bit: on the other hand, the rank of AB is the same as the rank of AB t , as shown in problem 10.24 on page 97. But AB t = B t At , so the rank is at most the rank of B t , which is the rank of B .
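The rank, kernel and image computations used throughout these chapter 10 hints (the pivot count of 10.22, the kernel-basis recipe of 10.6-10.8, and the rank inequality of 10.36) can all be checked with sympy. The sketch below is my own; the matrices in it are invented and are not the matrices from the problems.

from sympy import Matrix

A = Matrix([[1, 2, 0, 2],
            [0, 2, 1, 0],
            [1, 4, 1, 2]])        # row 3 = row 1 + row 2, so the rank is 2

print(A.rank())                   # 2 = dimension of the image
print(len(A.nullspace()))         # 2 = dimension of the kernel (4 - 2)
for v in A.nullspace():           # a basis of the kernel, one vector per pivotless column
    print(v.T)

B = Matrix([[1, 1], [0, 1], [2, 0], [0, 0]])
print((A * B).rank() <= min(A.rank(), B.rank()))   # True, as in hint 10.36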

11 Eigenvalues and Eigenvectors


11.2. The determinant det (A I ) is the determinant of an upper (lower) triangular matrix if A is upper (lower) triangular, with diagonal entries. 11.3. You could try 0 1 A= 1 0 which has eigenvalues = 1 and = 1. 11.4. You could try A= 1 0 1 , B= 1 1 1 0 ,A + B = 1 2 1 1 . 2

Their eigenvalues are: for A, = 1; for B , = 1, for A + B , = 1 or = 3. 11.5. Suppose that A is an n n matrix. The determinant of a matrix A is a sum of terms, each one linear in any of the entries of A which appear in it. The characteristic polynomial det (A I ) therefore involves at most n terms with a in them, coming from the n diagonal entries of A , so a polynomial in of degree at most n. 11.6. (1)n n 11.7. A : 0, 5, B : 1, 3, C : 0, 1, D : 0, 0, 2 11.8. det A = det At for any square matrix A, so I t = I and det (A I ) = det (At I ) . 11.9. det F 1 AF I = det F 1 (A I ) F = det F 1 det (A I ) det (F ) = det (A I ) . 11.11. If A is invertible, then det (AB I ) = det A1 (AB I ) A = det (BA I ) . If B is invertible, the same trick works. But if neither A nor B is invertible, we have to work harder. Pick any number which is not an eigenvalue of A. Then


A I is invertible, so (A I ) B and B (A I ) have the same eigenvalues. Therefore det ((A I ) B I ) = det (B (A I ) I ) as polynomials in . Now we can plug in = 0. 11.13. 0,3 11.16. a. sn (A) = 1 b. s0 (A) = det (A 0) = det A c. The expression det A is a sum ofterms, each a product of precisely n entries of A, as we have seen. So det (A I ) is a sum of terms, each involving some A entries and some s, with a total of n factors in each term. d. If A is upper triangular, or lower triangular, then the result is obvious. But each term of sn1 (A) involves precisely one entry of A, hence linear in those entries. So sn1 (A) is a linear function of A, i.e. sn1 (A + B ) = sn1 (A) + sn1 (B ). We can write any matrix as a sum of lower triangular and an upper triangular. e. Follows immediately from problem 11.9. f. Clearly F 1 AF ej = F 1 Auj = 0 for j > r. So we get zeros just where we need them, to be able to write F 1 AF = P Q 0 . 0

P must have rank r, since there is nowhere else to put the r pivots, so P is invertible. Since A has rank r, so must F 1 AF . g. det (A I ) = det P I Q 0 I
nr

= det (P I ) () has no terms of degree less than n r in . h. You could try 0 1 0 A= , B= 0 0 0 11.17. The = 3-eigenvectors are multiples of x= 11.20. =1 0 1 0 . 1

0 . 0

362

Hints

11.21. = 2 =3 11.22. =1
3 2 1

1 1 0 1 0 1 2 2 1 0 3 2 1

=2

=3

12 Bases of Eigenvectors
12.1. The kernel of any matrix is a subspace. 12.2. Ax = x so ABx = BAx = Bx = Bx. 12.3. Depending on the order in which you write down your eigenvalues, you could get: 1 0 0 F = 0 1 0 1 0 1 1 0 0 F 1 = 0 1 0 1 0 1 1 0 0 F 1 AF = 0 2 0 0 0 3


12.4. Again, it depends on the order you choose to write the eigenvalues, and the order in which you write down basis vectors for each eigenspace. You could get:

= 3

= 2

=0

1 0 1 2 1 0 0 1 1

1 F = 0 1 1 F 1 = 1 1 3 F 1 AF = 0 0

2 1 0 2 1 2 0 2 0

0 1 1 2 1 1 0 0 0

12.5. You could get

=2

1 1 0 , 1 0 1
1

= 4

2 1 2 1


1 F = 0 1 1 1 F 1 = 2 1 2 1 F AF = 0 0

1 1 0 1
3 2

1 2 1 2

1 0 1 2 1

1 0 2 0

0 0 4

12.6. Suppose we have a matrix F for which 1 F 1 AF = 2 .. . n Call this diagonal matrix . Therefore AF = F . Lets check that the columns of F are eigenvectors. We need only see that F is just F with columns scaled by the diagonal entries of . 12.7. You could try A= because it has only one eigenvector, 1 0 0 0 1 0 .

x=

(up to rescaling), with eigenvalue = 0. Therefore there is no basis of eigenvectors. 12.9. You could get 1 1 1 2 1

= 1 =1

Hints

365

F = F 1 = F 1 AF =

1 1 2 2 1 0

1 2 1

1 2 0 1

Let = F 1 AF . So A = F F 1 . Clearly 100000 = 1, so A100000 = F 100000 F 1 = 1. 12.11. 1 13 4 A= 5 4 7 12.12. A=


1 2 1 4 1 4

3 4

=1

1 4

0 0 1 1 2 v1 = 1 2 1 0 v2 = 0 1 1 v3 = 1 0
1 4 1 2 1 4

1 2 1 F = 2 1 1 F 1 = 1 1 2
3

0 0 1 1 1
1 2

4 AF = 0 0

0 1 0

0 0
1 4

1 1 0 0 1 0

366 Consider the coordinates of the vector h0 s0 . d0

Hints

Numbers of people cant be negative. Clearly the vector must be a linear combination of eigenvectors, with a positive coecient for the = 1 eigenvector: h0 s0 = a1 v1 + a2 v2 + a3 v3 , d0 with a2 > 0. Over time, the numbers develop according to hn hn1 sn = A sn1 dn dn1 h0 = An s0 d0 = 3 4
n

a1 v1 + a2 v2 +

1 4

a 3 v3 .

Since the other eigenvalues are smaller than 1, their powers become very small, and their components in the resulting vector gradually decay away. Therefore the result comes ever closer to
a2 v2 = (0, 0, a2),
everybody dead. So everyone dies, in an exponential decay of population. (It should be obvious, because we didn't allow any births in our model.) A numerical sketch of this decay appears after table A.1 below.
12.15. See table A.1.
Table A.1: Invertibility criteria. A is n x n of rank r. U is any matrix obtained from A by forward elimination.
5.1, 5.2. Invertible: Gauss-Jordan on A yields 1; U is invertible. Not invertible: Gauss-Jordan on A yields a matrix with n - r zero rows; U has n - r zero rows.


5.2 5.2 5.2 5.2

Invertible Pivots lie on the diagonal. U has no zero rows U has n pivots. Ax = b has a solution x for each b. Ax = b has exactly one solution x for each b. Ax = b has exactly one solution x for some b. Ax = 0 only for x = 0. A has rank n. At is invertible. det A = 0. The columns are linearly independent. The columns form a basis.

Not invertible Some pivot lies above the diagonal, and all pivots after it. U has n r zero rows. U has r < n pivots. Ax = b has no solution for some b, nr dimensions worth for other b. Ax = b has no solution for some b, many for other b. Ax = b has no solution for some b, many for other b. Ax = 0 for many x. A has rank r < n. At is not invertible. Every square block larger than r r has det = 0. The n r pivotless columns are linear combinations of the r pivot columns. Each of the n r pivotless columns is a linear combination of earlier pivot columns. One row is a linear combination of earlier rows. The kernel has positive dimension n r. The image has positive dimension r. The = 0 eigenspace has positive dimension n r.

5.2 5.2 5.2 5.2 7.4 7.4 9.2

9.2

9.2 10.2 10.3 11.1

The rows form a basis. The kernel of A is just the 0 vector. The image of A is all of Rn . 0 is not an eigenvalue of A.
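To illustrate the decay described in hint 12.12, here is a small numerical sketch (my own, with an invented 3 x 3 transition matrix rather than the matrix of that problem): each column sums to 1, the last state is absorbing, and the other eigenvalues are smaller than 1, so repeated multiplication kills everything except the eigenvalue-1 component.

import numpy as np

A = np.array([[0.50, 0.00, 0.0],
              [0.50, 0.75, 0.0],
              [0.00, 0.25, 1.0]])    # invented transition matrix, columns sum to 1

x = np.array([1.0, 0.0, 0.0])        # everyone starts in the first state
for n in range(100):
    x = A @ x
print(np.round(x, 6))                # approaches (0, 0, 1): only the eigenvalue-1 part survives
print(np.linalg.eigvals(A))          # the eigenvalues: 1 and two values smaller than 1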

13 Inner Product
13.1. Aej , ei = Akj ek , ei = Aij .

13.2. xᵗy = Σk (xᵗ)k yk = Σk xk yk.
13.3. Pij = ⟨P ei, ej⟩ = ⟨e_p(i), ej⟩ = 1 if p(i) = j, and 0 otherwise; equivalently 1 if i = p⁻¹(j), and 0 otherwise; so Pij = ⟨ei, e_p⁻¹(j)⟩ = ⟨ei, P⁻¹ej⟩ = (P⁻¹)ji.
13.4. v v, u u
2

u, u

= v, u

v, u u
2

u, u

= v, u v, u = 0. 13.5. (a) One: x = 0. (b) 2n: one xi is 1, all other xj are 0. (c) 2n + 16 n 4 : one xi is 2, all other xj are 0, or 4 xi s are 1 and all other are 0. n 6 9 n (d) 2n + 8(n 2) n 2 + 2 (n 5) 5 + 2 9 : a. one 3 or b. two 2s and one 1 or c. one 2 and ve 1s or d. nine 1s. 13.6. The hour hand starts at angle /2, and completes a revolution every 12 hours. So the hour hand is at an angle of 2 t, 2 12 after t hours. The minute hand, if we measure time in hours, revolves every hour, so has angle = 2t. 2 =

a. t = (3/11)(2k + 1), for any integer k.
b. t = 6k/11, for any integer k.

13.11. You could try:

A=

0 1

1 , B= 1

0 1

1 , AB = 2

1 1

2 . 3

13.12. For A symmetric. 13.19. Those which have 1 in each diagonal entry. 13.20. 1 0 0 1

P =

is not of that form. 13.21. You could try:

A=

0 1

1 , B = A. 0

13.22. x, y = 1 2 x+y
2

Preserve the right hand side, and you must preserve the left hand side.
13.25. The rows of A are orthonormal just when Aᵗ is orthogonal, which occurs just when A = (Aᵗ)⁻¹, which occurs just when Aᵗ = A⁻¹, which occurs just when A is orthogonal.
13.26. See table A.2 on the following page and table A.3 on the next page.
13.27. u1 =
2 2 22 ,

u2 =

6 6 6 , 6 6 3

33 3 u3 = 3 ,
3 3

13.28. The pictures should look like


w 1 = v1

1 = 1 0

v2 , w1 w 2 = v2 w1 w1 , w1 2 1 (2)(1) + (0)(1) + (2)(0) = 0 1 (1)(1) + (1)(1) + (0)(0) 2 0 1 = 1 2

Table A.2: Orthogonalizing vectors: the projections

u1 =

1 w1 , w1

w1

1 1 = 1 (1)(1) + (1)(1) + (0)(0) 0 1 2 21 = 2 2 0 1 u2 = w2 w2 , w2 1 1 = 1 (1)(1) + (1)(1) + (2)(2) 2 1 6 6 1 = 6 6 1 6 3 Table A.3: Orthogonalizing vectors: rescaling
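The Gram-Schmidt recipe used in hints 13.26-13.28 and in the worked problems that follow (project away the earlier directions, then rescale to unit length) is easy to code. This is my own short numpy version, applied to vectors like those of table A.2; it is a sketch, not the book's computation.

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of vectors: project away earlier directions, then rescale."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in basis:
            w -= np.dot(w, u) * u          # subtract the projection onto u
        length = np.linalg.norm(w)
        if length > 1e-12:                 # skip vectors that are linearly dependent
            basis.append(w / length)
    return basis

v1 = np.array([1, 1, 0])
v2 = np.array([2, 0, 2])
for u in gram_schmidt([v1, v2]):
    print(np.round(u, 4))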


The original vectors; project the second vector perpendicular to the first; projected, then shrink/stretch all vectors to length 1; done: orthonormal.

13.29. u1 =
1 16 , 6 1 2 6

u2 =

1 13 . 3 1 3

Notice that v1 , v2 , v3 did not give a basis, so when we try to compute u3 , we run into trouble. 13.30. The only problem that can come up is division by zero. But that happens only when we divide by a length wj . If this length is 0, then wj is zero, so vj =
i

vj , ui ui .

But each ui is a linear combination of vectors v1 , v2 , . . . , vi , so this is a linear dependence. 13.32. If v is perpendicular to u, then set w = v , see that v is perpendicular to v , so v = 0. Otherwise, if v is not perpendicular to u, then w = v v,u u is u 2 perpendicular to u and v , and therefore perpendicular to any linear combination of u and v . In particular, since w is a linear combination of u and v , w must be perpendicular to w, so w = 0. So v= v, u u
2

u.

372 13.33.

Hints

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (5)(1) + (3)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 5 3 4 4

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1 (4)(4) + (4)(4) 1 2 2 1 2 2 4 4

See gure A.2 on the facing page.

Hints

373

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.2: Orthogonalizing vectors in the plane

13.34.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (0)(1) + (2)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 0 2 1 1

374

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.3: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1

w1 1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1 (1)(1) + (1)(1) 1 2 2 1 2 2 1 1

See gure A.3.

Hints

375

13.35.

w 1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (1)(1) + (2)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 1 2
1 2 1 2

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2 1 2 1 2

See gure A.4 on the following page.

376

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.4: Orthogonalizing vectors in the plane

13.36.

w 1 = v1 = 2 1 v2 , w1 w1 w1 , w1 (1)(2) + (1)(1) (2)(2) + (1)(1) 2 1

w 2 = v2 = 1 1
1 5 2 5

Hints

377

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.5: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 2 1

= u2 = =

(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1
1 2 2 (1 5 )( 5 ) + ( 5 )( 5 ) 1 5 5 2 5 5 1 5 2 5

See gure A.5.

378 13.37.

Hints

w1 = v1 = 1 2 v2 , w1 w1 w1 , w1 (0)(1) + (2)(2) (1)(1) + (2)(2) 1 2

w2 = v2 = 0 2
4 5 2 5

u1 = =

1 w1 , w1 1

w1 1 2

= u2 = =

(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1

4 5 2 5

4 2 2 ( 4 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5

See gure A.6 on the facing page.

Hints

379

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.6: Orthogonalizing vectors in the plane

13.38.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (2)(1) + (0)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 2 0 1 1

380

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.7: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1

(1)(1) + (1)(1) 1 2 2 1 2 2

1 1

See gure A.7.

Hints

381

13.39.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (0)(1) + (2)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 0 2 1 1

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1

(1)(1) + (1)(1) 1 2 2 1 2 2

1 1

See gure A.8 on the following page.

382

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.8: Orthogonalizing vectors in the plane

13.40.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (1)(1) + (2)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 1 2 3 2
3 2

Hints

383

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.9: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1

3 2 3 2

3 3 3 ( 3 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2

See gure A.9.

384 13.41.

Hints

w 1 = v1 = 0 1 v2 , w1 w1 w1 , w1 (1)(0) + (1)(1) (0)(0) + (1)(1) 0 1

w2 = v2 = 1 1 1 0

u1 = =

1 w1 , w1 1

w1 0 1

(0)(0) + (1)(1) 0 1 1 w2 , w2 1 w2

= u2 = =

(1)(1) + (0)(0) 1 0

1 0

See gure A.10 on the facing page.

Hints

385

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.10: Orthogonalizing vectors in the plane

13.42.

w1 = v1 = 2 1 v2 , w1 w1 w1 , w1 (1)(2) + (2)(1) (2)(2) + (1)(1) 2 1

w2 = v2 = 1 2 1 2

386

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.11: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 2 1

= u2 = =

(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1

(1)(1) + (2)(2) 1 5 5 2 5 5

1 2

See gure A.11.

Hints

387

13.43.

w 1 = v1 = 1 0 v2 , w1 w1 w1 , w1 (1)(1) + (1)(0) (1)(1) + (0)(0) 1 0

w2 = v2 = 1 1 0 1

u1 = =

1 w1 , w1 1

w1 1 0

(1)(1) + (0)(0) 1 0 1 w2 , w2 1 (0)(0) + (1)(1) 0 1 w2

= u2 = =

0 1

See gure A.12 on the following page.

388

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.12: Orthogonalizing vectors in the plane

13.44.

w 1 = v1 = 2 1 v2 , w1 w1 w1 , w1 (0)(2) + (1)(1) (2)(2) + (1)(1) 2 1

w 2 = v2 = 0 1 2 5 4 5

Hints

389

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.13: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 2 1

= u2 = =

(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1

2 4 4 ( 2 5 )( 5 ) + ( 5 )( 5 ) 1 5 5 2 5 5

2 5 4 5

See gure A.13.

390 13.45.

Hints

w 1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (1)(1) + (0)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 1 0 1 2
1 2

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1

1 2 1 2

1 1 1 ( 1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2

See gure A.14 on the facing page.

Hints

391

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.14: Orthogonalizing vectors in the plane

13.46.

w1 = v1 = 1 2 v2 , w1 w1 w1 , w1 (1)(1) + (0)(2) (1)(1) + (2)(2) 1 2

w2 = v2 = 1 0 4 5
2 5

392

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.15: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 2

= u2 = =

(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1

4 5 2 5

4 2 2 ( 4 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5

See gure A.15.

Hints

393

13.47.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (0)(1) + (1)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 0 1
1 2 1 2

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1

1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2

1 2 1 2

See gure A.16 on the following page.

394

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.16: Orthogonalizing vectors in the plane

13.48.

w 1 = v1 = 1 2 v2 , w1 w1 w1 , w1 (1)(1) + (1)(2) (1)(1) + (2)(2) 1 2

w 2 = v2 = 1 1
2 5 1 5

Hints

395

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.17: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 2

= u2 = =

(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1
2 1 1 (2 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5 2 5 1 5

See gure A.17.

396 13.49.

Hints

w 1 = v1 = 0 1 v2 , w1 w1 w1 , w1 (1)(0) + (2)(1) (0)(0) + (1)(1) 0 1

w2 = v2 = 1 2 1 0

u1 = =

1 w1 , w1 1

w1 0 1

(0)(0) + (1)(1) 0 1 1 w2 , w2 1 w2

= u2 = =

(1)(1) + (0)(0) 1 0

1 0

See gure A.18 on the facing page.

Hints

397

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.18: Orthogonalizing vectors in the plane

13.50.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (0)(1) + (2)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 0 2 1 1

398

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.19: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1

(1)(1) + (1)(1) 1 2 2 1 2 2

1 1

See gure A.19.

Hints

399

13.51.

w 1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (2)(1) + (1)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 2 1
1 2 1 2

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2 1 2 1 2

See gure A.20 on the following page.

400

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.20: Orthogonalizing vectors in the plane

13.52.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (1)(1) + (0)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 1 0 1 2
1 2

Hints

401

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.21: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1

1 2 1 2

1 1 1 ( 1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2

See gure A.21.

402 13.53.

Hints

w1 = v1 = 1 2 v2 , w1 w1 w1 , w1 (2)(1) + (2)(2) (1)(1) + (2)(2) 1 2

w2 = v2 = 2 2
4 5 2 5

u1 = =

1 w1 , w1 1

w1 1 2

= u2 = =

(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1

4 2 2 (4 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5

4 5 2 5

See gure A.22 on the facing page.

Hints

403

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.22: Orthogonalizing vectors in the plane

13.54.

w1 = v1 = 2 2 v2 , w1 w1 w1 , w1 (1)(2) + (2)(2) (2)(2) + (2)(2) 2 2

w2 = v2 = 1 2 1 2
1 2

404

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.23: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 2 2

= u2 = =

(2)(2) + (2)(2) 1 2 2 1 2 2 1 w2 w2 , w2 1

1 2 1 2

1 1 1 ( 1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2

See gure A.23.

Hints

405

13.55.

w1 = v1 = 1 0 v2 , w1 w1 w1 , w1 (0)(1) + (1)(0) (1)(1) + (0)(0) 1 0

w2 = v2 = 0 1 0 1

u1 = =

1 w1 , w1 1

w1 1 0

(1)(1) + (0)(0) 1 0 1 w2 , w2 1 w2

= u2 = =

(0)(0) + (1)(1) 0 1

0 1

See gure A.24 on the following page.

406

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.24: Orthogonalizing vectors in the plane

13.56.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (1)(1) + (0)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 1 0
1 2 1 2

Hints

407

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.25: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2 1 2 1 2

See gure A.25.

408 13.57.

Hints

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (2)(1) + (0)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 2 0 1 1

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1 (1)(1) + (1)(1) 1 2 2 1 2 2 1 1

See gure A.26 on the facing page.

Hints

409

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.26: Orthogonalizing vectors in the plane

13.58.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (2)(1) + (1)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 2 1
1 2 1 2

410

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.27: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1

1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2

1 2 1 2

See gure A.27.

Hints

411

13.59.

w1 = v1 = 0 2 v2 , w1 w1 w1 , w1 (1)(0) + (1)(2) (0)(0) + (2)(2) 0 2

w2 = v2 = 1 1 1 0

u1 = =

1 w1 , w1 1

w1 0 2

(0)(0) + (2)(2) 0 1 1 w2 , w2 1 (1)(1) + (0)(0) 1 0 w2

= u2 = =

1 0

See gure A.28 on the following page.

412

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.28: Orthogonalizing vectors in the plane

13.60.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (2)(1) + (1)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 2 1
1 2 1 2

Hints

413

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.29: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1

1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2

1 2 1 2

See gure A.29.

414 13.61.

Hints

w1 = v1 = 2 1 v2 , w1 w1 w1 , w1 (1)(2) + (2)(1) (2)(2) + (1)(1) 2 1

w2 = v2 = 1 2
3 5 6 5

u1 = =

1 w1 , w1 1

w1 2 1

= u2 = =

(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1

3 5 6 5

3 6 6 ( 3 5 )( 5 ) + ( 5 )( 5 ) 1 5 5 2 5 5

See gure A.30 on the facing page.

Hints

415

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.30: Orthogonalizing vectors in the plane

13.62.

w1 = v1 = 1 2 v2 , w1 w1 w1 , w1 (1)(1) + (1)(2) (1)(1) + (2)(2) 1 2

w2 = v2 = 1 1 6 5 3 5

416

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.31: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 2

= u2 = =

(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1

6 3 3 ( 6 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5

6 5 3 5

See gure A.31.

Hints

417

13.63.

w1 = v1 = 2 1 v2 , w1 w1 w1 , w1 (2)(2) + (0)(1) (2)(2) + (1)(1) 2 1

w2 = v2 = 2 0
2 5 4 5

u1 = =

1 w1 , w1 1

w1 2 1

= u2 = =

(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1

2 4 4 (2 5 )( 5 ) + ( 5 )( 5 ) 1 5 5 2 5 5

2 5 4 5

See gure A.32 on the following page.

418

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.32: Orthogonalizing vectors in the plane

13.64.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (2)(1) + (1)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 2 1
3 2 3 2

Hints

419

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.33: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1

w1 1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1
3 3 3 (3 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2

3 2 3 2

See gure A.33.

420 13.65.

Hints

w 1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (0)(1) + (1)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 0 1
1 2 1 2

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1

1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2

1 2 1 2

See gure A.34 on the facing page.

Hints

421

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.34: Orthogonalizing vectors in the plane

13.66.

w1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (2)(1) + (0)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 2 0 1 1

422

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.35: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1

w1 1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 2 1 2 1 w2 w2 , w2 1 (1)(1) + (1)(1) 1 2 2 1 2 2 1 1

See gure A.35.

Hints

423

13.67.

w1 = v1 = 2 0 v2 , w1 w1 w1 , w1 (0)(2) + (2)(0) (2)(2) + (0)(0) 2 0

w2 = v2 = 0 2 0 2

u1 = =

1 w1 , w1 1

w1 2 0

(2)(2) + (0)(0) 1 0 1 w2 , w2 1 (0)(0) + (2)(2) 0 1 w2

= u2 = =

0 2

See gure A.36 on the following page.

424

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.36: Orthogonalizing vectors in the plane

13.68.

w1 = v1 = 0 1 v2 , w1 w1 w1 , w1 (1)(0) + (1)(1) (0)(0) + (1)(1) 0 1

w2 = v2 = 1 1 1 0

Hints

425

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.37: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 0 1

(0)(0) + (1)(1) 0 1 1 w2 , w2 1 (1)(1) + (0)(0) 1 0 w2 1 0

= u2 = =

See gure A.37.

426 13.69.

Hints

w1 = v1 = 1 0 v2 , w1 w1 w1 , w1 (2)(1) + (2)(0) (1)(1) + (0)(0) 1 0

w2 = v2 = 2 2 0 2

u1 = =

1 w1 , w1 1

w1 1 0

(1)(1) + (0)(0) 1 0 1 w2 , w2 1 (0)(0) + (2)(2) 0 1 w2

= u2 = =

0 2

See gure A.38 on the facing page.

Hints

427

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.38: Orthogonalizing vectors in the plane

13.70.

w1 = v1 = 0 2 v2 , w1 w1 w1 , w1 (2)(0) + (1)(2) (0)(0) + (2)(2) 0 2

w2 = v2 = 2 1 2 0

428

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.39: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 0 2

(0)(0) + (2)(2) 0 1 1 w2 , w2 1 (2)(2) + (0)(0) 1 0 w2

= u2 = =

2 0

See gure A.39.

Hints

429

13.71.

w1 = v1 = 0 2 v2 , w1 w1 w1 , w1 (2)(0) + (0)(2) (0)(0) + (2)(2) 0 2

w2 = v2 = 2 0 2 0

u1 = =

1 w1 , w1 1

w1 0 2

(0)(0) + (2)(2) 0 1 1 w2 , w2 1 (2)(2) + (0)(0) 1 0 w2

= u2 = =

2 0

See gure A.40 on the following page.

430

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.40: Orthogonalizing vectors in the plane

13.72.

w1 = v1 = 2 2 v2 , w1 w1 w1 , w1 (1)(2) + (1)(2) (2)(2) + (2)(2) 2 2

w2 = v2 = 1 1 1 1

Hints

431

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.41: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 2 2

= u2 = =

(2)(2) + (2)(2) 1 2 2 1 2 2 1 w2 w2 , w2 1

(1)(1) + (1)(1) 1 2 2 1 2 2

1 1

See gure A.41.

432 13.73.

Hints

w1 = v1 = 0 2 v2 , w1 w1 w1 , w1 (1)(0) + (1)(2) (0)(0) + (2)(2) 0 2

w2 = v2 = 1 1 1 0

u1 = =

1 w1 , w1 1

w1 0 2

(0)(0) + (2)(2) 0 1 1 w2 , w2 1 (1)(1) + (0)(0) 1 0 w2

= u2 = =

1 0

See gure A.42 on the facing page.

Hints

433

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.42: Orthogonalizing vectors in the plane

13.74.

w 1 = v1 = 2 0 v2 , w1 w1 w1 , w1 (1)(2) + (1)(0) (2)(2) + (0)(0) 2 0

w 2 = v2 = 1 1 0 1

434

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.43: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 2 0

(2)(2) + (0)(0) 1 0 1 w2 , w2 1 w2

= u2 = =

(0)(0) + (1)(1) 0 1

0 1

See gure A.43.

Hints

435

13.75.

w 1 = v1 = 1 0 v2 , w1 w1 w1 , w1 (0)(1) + (1)(0) (1)(1) + (0)(0) 1 0

w2 = v2 = 0 1 0 1

u1 = =

1 w1 , w1 1

w1 1 0

(1)(1) + (0)(0) 1 0 1 w2 , w2 1 (0)(0) + (1)(1) 0 1 w2 0 1

= u2 = =

See gure A.44 on the following page.

436

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.44: Orthogonalizing vectors in the plane

13.76.

w 1 = v1 = 2 1 v2 , w1 w1 w1 , w1 (0)(2) + (1)(1) (2)(2) + (1)(1) 2 1

w 2 = v2 = 0 1 2 5 4 5

Hints

437

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.45: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 2 1

= u2 = =

(2)(2) + (1)(1) 2 5 5 1 5 5 1 w2 w2 , w2 1

2 4 4 ( 2 5 )( 5 ) + ( 5 )( 5 ) 1 5 5 2 5 5

2 5 4 5

See gure A.45.

438 13.77.

Hints

w1 = v1 = 2 0 v2 , w1 w1 w1 , w1 (2)(2) + (1)(0) (2)(2) + (0)(0) 2 0

w2 = v2 = 2 1 0 1

u1 = =

1 w1 , w1 1

w1 2 0

(2)(2) + (0)(0) 1 0 1 w2 , w2 1 (0)(0) + (1)(1) 0 1 w2

= u2 = =

0 1

See gure A.46 on the facing page.

Hints

439

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.46: Orthogonalizing vectors in the plane

13.78.

w1 = v1 = 2 2 v2 , w1 w1 w1 , w1 (1)(2) + (1)(2) (2)(2) + (2)(2) 2 2

w2 = v2 = 1 1 1 1

440

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.47: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 2 2

= u2 = =

(2)(2) + (2)(2) 1 2 2 1 2 2 1 w2 w2 , w2 1

(1)(1) + (1)(1) 1 2 2 1 2 2

1 1

See gure A.47.

Hints

441

13.79.

w1 = v1 = 2 0 v2 , w1 w1 w1 , w1 (2)(2) + (1)(0) (2)(2) + (0)(0) 2 0

w2 = v2 = 2 1 0 1

u1 = =

1 w1 , w1 1

w1 2 0

(2)(2) + (0)(0) 1 0 1 w2 , w2 1 (0)(0) + (1)(1) 0 1 w2

= u2 = =

0 1

See gure A.48 on the following page.

442

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.48: Orthogonalizing vectors in the plane

13.80.

w1 = v1 = 1 2 v2 , w1 w1 w1 , w1 (0)(1) + (1)(2) (1)(1) + (2)(2) 1 2

w2 = v2 = 0 1
2 5 1 5

Hints

443

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.49: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 1 2

= u2 = =

(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1

2 1 1 (2 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5

2 5 1 5

See gure A.49.

444 13.81.

Hints

w1 = v1 = 1 2 v2 , w1 w1 w1 , w1 (1)(1) + (0)(2) (1)(1) + (2)(2) 1 2

w2 = v2 = 1 0
4 5 2 5

u1 = =

1 w1 , w1 1

w1 1 2

= u2 = =

(1)(1) + (2)(2) 1 5 5 2 5 5 1 w2 w2 , w2 1

4 2 2 (4 5 )( 5 ) + ( 5 )( 5 ) 2 5 5 1 5 5

4 5 2 5

See gure A.50 on the facing page.

Hints

445

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.50: Orthogonalizing vectors in the plane

13.82.

w1 = v1 = 2 2 v2 , w1 w1 w1 , w1 (0)(2) + (1)(2) (2)(2) + (2)(2) 2 2

w2 = v2 = 0 1
1 2 1 2

446

Hints

The original vectors.

Project the second vector perpendicular to the rst.

Projected. Shrink/stretch all vectors to length 1.

Done: orthonormal.

Figure A.51: Orthogonalizing vectors in the plane

u1 = =

1 w1 , w1 1

w1 2 2

= u2 = =

(2)(2) + (2)(2) 1 2 2 1 2 2 1 w2 w2 , w2 1

1 1 1 (1 2 )( 2 ) + ( 2 )( 2 ) 1 2 2 1 2 2

1 2 1 2

See gure A.51.

Hints

447

13.83. w 1 = v1 = 1 1 v2 , w1 w1 w1 , w1 (1)(1) + (1)(1) (1)(1) + (1)(1) 1 1

w2 = v2 = 1 1 1 1

u1 = =

1 w1 , w1 1

w1 1 1

= u2 = =

(1)(1) + (1)(1) 1 2 2 1 2 2 1 w2 w2 , w2 1 (1)(1) + (1)(1) 1 2 2 1 2 2 1 1

See figure A.52 on the following page. 13.84. Nothing

14 The Spectral Theorem


14.1. If Au = λu and Av = μv with λ ≠ μ, then λ⟨u, v⟩ = ⟨Au, v⟩ = ⟨u, Av⟩ = μ⟨u, v⟩. So ⟨u, v⟩ = 0.


Figure A.52: Orthogonalizing vectors in the plane. (Panels: the original vectors; project the second vector perpendicular to the first; projected, then shrink/stretch all vectors to length 1; done: orthonormal.)

14.2. = 16 =3
2 3 1 3 2

GramSchmidt the eigenvectors, and 2 13


3 13 3 13 2 13 3 13 2 13

F = F 1 = F 1 AF =

2 13
3 13

16 0

0 3

Each eigenvector comes from a different eigenvalue, so they are already perpendicular; you only have to rescale them to have unit length.
14.4. det (A - λI) = λ² - 5λ = λ(λ - 5)


2 1
1 2

=5 =0 After orthogonalizing the eigenvectors, F = F 1 = F 1 AF = 14.5.

2 5 1 5 2 5 1 5

1 5 2 5 1 5 2 5

5 0

0 0

det (A I ) =3 8 2 + 15 = ( 3) ( 5) 0 =3 0 1
1

=0

=5 After orthogonalizing the eigenvectors, 0 F = 0 1 0

2 1 0 2 1 0

1 5 2 5

2 5 1 5

0 0
2 5 1 5

0 1 0 0

1 F 1 = 5 2 5 3 0 F 1 AF = 0 0 0 0

0 0 5

14.6. det (A - λI) = λ³ - 4λ² + 5λ - 2 = (λ - 2)(λ - 1)²

=1

=2

2 1 1 , 0 1 0 1 12 2 1

After orthogonalizing the eigenvectors, F =

1 12 2

1 3 1 3 1 3 1 2 1 3 1 6

1 6 1 6 2 6

F 1 =

1 12 3 1 6

1 3 2 6

1 F 1 AF = 0 0

0 1 0

0 0 2

14.7.

= 1

=1

4 3 0 1 3 0 4 1 , 0 0 1

Hints

451

After orthogonalizing the eigenvectors, 4 5 F = 0


3

F 1

5 4 5 = 0
3

0 1 0 0 1 0 0 1 0

3 5 4 5 3 5 4 5

0 0 0 1

5 1 F 1 AF = 0 0

14.8. Such an F must preserve the eigenspaces, so preserve the span of e1 , the span of e2 , and the span of e3 . Therefore 1 F = 1 . 1 14.11. You only change x1 x2 terms: a. x2 2 2 b. x2 1 + x2 3 2 c. x1 + 2 x1 x2 + 3 2 x2 x1 1 1 d. x2 + x x + 1 2 1 2 2 x2 x1 14.12. a. A= b. A= c. A= d. A= 1
1 2 1 2

0 0 0 0 1
3 2

0 1 0 1
3 2

14.31. Look at the eigenvalues of the symmetric matrix, to get started. a. ellipse b. hyperbola c. pair of lines d. hyperbola


e. line f. empty set
14.32. First, by orthogonal transformations, you can get your quadratic form to look like λ1 x1² + λ2 x2² + ... + λn xn². Then you can get every eigenvalue λj to be 0, 1 or -1, by rescaling the associated variable xj. Then you can permute the order of the variables. So you can get
±x1² ± x2² ± ... ± xs²,
for some s between 1 and n.
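The two steps of hint 14.32 (orthogonally diagonalize the symmetric matrix of the form, then rescale each variable so nonzero eigenvalues become ±1) can be carried out numerically. The numpy sketch below is mine, with an arbitrary example matrix, not one taken from the exercises.

import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # symmetric matrix of a quadratic form

eigenvalues, Q = np.linalg.eigh(S)         # orthogonal Q with S = Q diag(eigenvalues) Q^T
scale = np.diag([1 / np.sqrt(abs(t)) if abs(t) > 1e-12 else 1.0 for t in eigenvalues])
P = Q @ scale                              # change of variables x = P y
normal_form = P.T @ S @ P                  # diagonal entries are 0, +1 or -1
print(np.round(normal_form, 10))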

15 Complex Vectors
15.1. Take the complex number with half as much argument, and square root as much modulus.
15.10. ⟨z, w⟩ = 1(-i) + i(2 - 2i) = 2 + i
15.16. Since A is self-adjoint, ⟨Az, z⟩ = ⟨z, Az⟩. If we pick z an eigenvector, with eigenvalue λ, then the left side becomes ⟨Az, z⟩ = λ⟨z, z⟩, and the right side becomes λ̄⟨z, z⟩. Since ⟨z, z⟩ = ‖z‖² ≠ 0, we find λ = λ̄.
15.21. For an eigenvector z, with eigenvalue λ,
⟨z, z⟩ = ⟨Az, Az⟩ (because A is unitary) = ⟨λz, λz⟩ = λλ̄⟨z, z⟩ = |λ|²⟨z, z⟩.
Since z ≠ 0, we can divide ⟨z, z⟩ off of both sides.
15.23. u1 = 15.27.


1 2 i 2

, u2 =

1+2 i 10 2i 10


a. A is self-adjoint. b. =3 =4 c. F = F AF =
1 2 i 2 1 2 (1 1 2 (1 1 2 i 2 1 2 (1 1 2 (1

+ i) i)

+ i) i)

3 4

16 Vector Spaces
16.1. a. 0 v = (0 + 0) v = 0 v + 0 v . Add the vector w for which 0 v + w = 0 to both sides. b. a 0 = a (0 + 0) = a 0 + a 0. Similar. 16.3. If there are two such vectors, w1 and w2 , then v + w1 = v + w2 = 0. Therefore w1 = 0 + w1 = (v + w2 ) + w1 = v + (w2 + w1 ) = v + (w1 + w2 ) = (v + w1 ) + w2 = 0 + w2 = w2 . 0 = (1 + (1)) v = 1 v + (1) v = v + (1) v so (1) v = v . 16.8. Ax = Sx and By = T y for any x in Rp and y in Rq . Therefore T Sx = BSx = BAx for any x in Rp . So (BA)ej = T Sej , and therefore BA has the required columns to be the associated matrix. 16.10.

a. 1x = 1y just when x = y, since 1x = x.

b. Given any z in V , we nd z = 1x just for x = z . 16.11. Given z in V , we can nd some x in U so that T x = z . If there are two choices for this x, say z = T x = T y , then we know that x = y . Therefore x is uniquely determined. So let T 1 z = x. Clearly T 1 is uniquely determined, by the equation T 1 T = 1, and satises T T 1 = 1 too. Lets prove that T 1 is linear. Pick z and w in V . Then let T x = z and T y = w. We know that x and y are uniquely determined by these equations. Since T is linear, T x + T y = T (x + y ). This gives z + w = T (x + y ). So T 1 (z + w) = T 1 T (x + y ) =x+y = T 1 z + T 1 w. Similarly, T (ax) = aT x, and taking T 1 of both sides gives aT 1 z = T 1 az . So T 1 is linear. 16.12. Write p(x) = a + bx + cx2 . Then a Tp = a + b + c . a + 2 b + 4c To solve for a, b, c in terms of p(0), p(1), p(2), we solve 1 1 1 Apply forward elimination: 1 1 1 1 0 0 1 0 0 0 1 2 0 1 2 0 1 0 0 1 4 0 1 4 0 1 2 p(0) p(1) p(2) p(0) p(1) p(0) p(2) p(0) p(0) p(1) p(0) p(2) 2p(1) + p(0) a p(0) 1 b = p(1) . 4 c p(2)

1 2


Back substitute to nd: 1 1 p(0) p(1) + p(2) 2 2 b = p(1) p(0) c 3 1 = p(0) + 2p(1) p(2) 2 2 a = p(0). c= Therefore we can recover p = a+bx+cx2 completely from knowing p(0), p(1), p(2), so T is one-to-one and onto. 16.14. Some examples you might think of: The set of constant functions. The set of linear functions. The set of polynomial functions. The set of polynomial functions of degree at most d (for any xed d). The set of functions f (x) which vanish when x < 0. The set of functions f (x) for which there is some interval outside of which f (x) vanishes. The set of functions f (x) for which f (x)p(x) goes to zero as x gets large, for every polynomial p(x). The set of functions which vanish at the origin. 16.15. a. no b. no c. no d. yes e. no f. yes g. yes 16.16. a. no b. no c. no d. yes e. yes f. no 16.17. a. If AH = HA and BH = HB , then (A + B )H = H (A + B ), clearly. Similarly 0H = H 0 = 0, and (cA)H = H (cA). b. P is the set of diagonal 2 2 matrices. 16.18. You might take: a. 1, x, x2 , x3 b. 1 0 1 0 0 0 0 0 0 , 0 0 , 0 1 0 1 0 0 0 0 0 0 , 0 0 , 0 1 0 1 0 0 0 0 0 0 , 0 0 0


c. The set of matrices e(i, j ) for i j , where e(i, j ) has zeroes everywhere, except for a 1 at row i and column j . d. A polynomial p(x) = a + bx + cx2 + dx3 vanishes at the origin just when a = 0. A basis: x, x2 , x3 . 16.19. If F : Rp V and G : Rq V are isomorphisms, then A = G1 F : R Rq is an isomorphism. Hence A is a matrix with 0 kernel and image Rq . So A must have rank q . By theorem 10.7 on page 96, the dimension of the kernel plus the dimension of the image must be p. Therefore p = q . 16.21. The kernel of T is the set of x for which T x = 0. But T x = 0 implies Sx = P 1 T x = 0, and Sx = 0 implies T x = P Sx = 0. So the same kernel. The image of S is the set of vectors of the form Sx, and each is carried by P to a vector of the form P Sx = T x. Conversely P 1 carries the image of T to the image of S . Check that this is an isomorphism. 16.23. If T : U V is an isomorphism, and F : Rn U is a isomorphism, prove that T F : Rn V is also an isomorphism. 16.24. The same proof as for Rn ; see proposition 9.12 on page 87. 16.25. Take F and G two isomorphisms. Determinants of matrices multiply. Let A be the matrix associated to F 1 T F : Rn Rn and B the matrix associated to G1 T G : Rn Rn . Let C be the matrix associated to G1 F . Therefore CAC 1 = B .
p

det B = det(CAC⁻¹) = det C · det A · (det C)⁻¹ = det A.

16.26. a. T (p(x) + q (x)) = 2 p(x 1) + 2 q (x 1) = T p(x) + T q (x), and T (ap(x)) = 2a p(x 1) = a T p(x). b. If T p(x) = 0 then 2 p(x 1) = 0, so p(x 1) = 0 for any x, so p(x) = 0 for any x. Therefore T has kernel {0}. As for the image, if q (x) is any polynomial of degree at most 2, then let p(x) = 1 2 q (x + 1). Clearly T p(x) = q (x). So T is onto. c. To nd the determinant, we need an isomorphism. Let F : R3 V , a F b = a + bx + cx2 . c


Calculate the matrix A of F⁻¹TF by
F⁻¹TF(a, b, c) = F⁻¹ T(a + bx + cx²)
= F⁻¹ ( 2(a + b(x - 1) + c(x - 1)²) )
= F⁻¹ ( 2a + 2b(x - 1) + 2c(x² - 2x + 1) )
= F⁻¹ ( (2a - 2b + 2c) + (2b - 4c)x + 2cx² )
= (2a - 2b + 2c, 2b - 4c, 2c).
So the associated matrix is
A =
2 -2 2
0 2 -4
0 0 2
giving det T = det A = 8.
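This computation is easy to double-check with sympy. The sketch below is my own check, not part of the text: it builds the matrix of T p(x) = 2 p(x - 1) in the basis 1, x, x² and takes its determinant.

from sympy import symbols, Poly, Matrix

x = symbols('x')

def T(p):
    """The map of hint 16.26: T p(x) = 2 p(x - 1)."""
    return 2 * p.subs(x, x - 1)

def coords(p):
    """Coefficients of p in the basis 1, x, x^2."""
    cs = Poly(p, x).all_coeffs()[::-1]      # constant term first
    return cs + [0] * (3 - len(cs))

A = Matrix([coords(T(b)) for b in (x**0, x, x**2)]).T
print(A)          # Matrix([[2, -2, 2], [0, 2, -4], [0, 0, 2]])
print(A.det())    # 8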

16.27. det T = 2n 16.28. det T = det A2 . The eigenvalues of T are the eigenvalues of A. The eigenvectors with eigenvalue j are spanned by xj 0 , 0 xj .

16.29. Let y1 and y2 be the eigenvectors of Aᵗ with eigenvalues λ1 and λ2 respectively. Then the eigenvalues of T are those of A, with multiplicity 2, and the λi-eigenspace is spanned by (yiᵗ, 0) and (0, yiᵗ).
16.30. The characteristic polynomial is p(λ) = (λ + 1)(λ - 1)². The 1-eigenspace is the space of polynomials q(x) for which q(-x) = q(x), so q(x) = a + cx². This eigenspace is spanned by 1, x². The (-1)-eigenspace is the space of polynomials q(x) for which q(-x) = -q(x), so q(x) = bx. The eigenspace is spanned by x. Indeed T is diagonalizable, and diagonalized by the isomorphism F e1 = 1, F e2 = x, F e3 = x², for which F⁻¹TF is the diagonal matrix with entries 1, -1, 1.
16.31.


(a) A polynomial with average value 0 on some interval must take on the value 0 somewhere on that interval, being either 0 throughout the interval or positive somewhere and negative somewhere else. A polynomial in one variable cant have more zeros than its degree. (b) It is enough to assume that the number of intervals is n, since if it is smaller, we can just add some more intervals and specify some more choices for average values on those intervals. But then T p = 0 only for p = 0, so T is an isomorphism. 16.33. Clearly the expression is linear in p(z ) and conjugate linear in q (z ). Moreover, if p(z ), p(z ) = 0, then p(z ) has roots at z0 , z1 , z2 and z3 . But p(z ) has degree at most 3, so has at most 3 roots or else is everywhere 0.

17 Fields
17.1. If there were two, say z1 and z2, then z1 + z2 = z1, but z1 + z2 = z2 + z1 = z2.
17.2. Same proof, but with · instead of +.
17.6. If p = ab, then in Fp arithmetic ab = p (mod p) = 0 (mod p). If a has a reciprocal, say c, then ab = 0 (mod p) implies that b = cab (mod p) = c0 (mod p) = 0 (mod p). So b is a multiple of p, and p is a multiple of b, so p = b and a = 1.
17.7. You find -21 as answer from the Euclidean algorithm. But you can add 79 any number of times to the answer, to get it to be between 0 and 78, since we are working modulo 79, so the final answer is -21 + 79 = 58.
17.8. x = 3
17.9. It is easy to check that Fp satisfies addition laws, zero laws, multiplication laws, and the distributive law: each one holds in the integers, and to see that it holds in Fp, we just keep track of multiples of p. For example, x + y in Fp is just addition up to a multiple of p, say x + y + ap, usual integer addition, some integer a. So (x + y) + z in Fp is (x + y) + z in the integers, up to a multiple of p, and so equals x + (y + z) up to a multiple of p, etc. The tricky bit is the reciprocal law. Since p is prime, nothing divides into p except p and 1. Therefore for any integer a between 0 and p - 1, the greatest common divisor gcd(a, p) is 1. The Euclidean algorithm computes out integers u and v so that ua + vp = 1, so that ua = 1 (mod p). Adding or subtracting enough multiples of p to u, we find a reciprocal for a.
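The computation behind hints 17.7 and 17.9 is the extended Euclidean algorithm. Here is a small stand-alone Python version of it (my own sketch, not from the text); the input 15 below is only an example of my choosing, but the shift from -21 to 58 modulo 79 mirrors the arithmetic described in hint 17.7.

def extended_gcd(a, b):
    """Return (g, u, v) with u*a + v*b = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

def reciprocal_mod(a, p):
    g, u, _ = extended_gcd(a, p)
    assert g == 1, "a and p must be coprime"
    return u % p                      # shift by multiples of p into 0 .. p - 1

print(extended_gcd(15, 79))           # (1, -21, 4): the algorithm produces -21
print(reciprocal_mod(15, 79))         # 58, since -21 + 79 = 58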


1): 0 0 1 0 0 1 0 0 1 0 0 1 1 0 1

17.10. Gauss-Jordan elimination applied to (A 0 1 0 1 0 1 0 1 0 1 1 1 0 0 0 0 1 0 1 1 1 0 1 0 0 1 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0 1 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 1 1 1 1 0 0 1 0 0 1 0 1 0 0 0 1 1 1 So A 1 1 = 1 1 0 0 1 1 0 . 1

17.11. Carry out GaussJordan elimination thinking of the entries as living in the eld of rational functions. The result has rational functions as entries. For any value of t for which none of the denominators of the entries are zero, and none of the pivot entries are zero, the rank is just the number of pivots. There are nitely many entries, and the denominator of each entry is a polynomial, so has nitely many roots.

18 Permutations and Determinants


18.3. (a) Q(x1, x2, x3, x4) = (x1 - x2)(x1 - x3)(x1 - x4)(x2 - x3)(x2 - x4)(x3 - x4). (b) It is enough to show that the sign changes under a transposition swapping successive numbers, like swapping 1 and 2 or 2 and 3, etc. So let's swap j and j + 1. The terms that are affected: xj - xj+1 has its sign changed, and xi - xj is swapped with xi - xj+1 for i < j, and xj - xk is swapped with xj+1 - xk for j + 1 < k.


18.6. Each permutation chooses somewhere for the number 1 to go to, for which there are n choices, and then there are n - 1 choices left for where it can put the number 2, etc.
18.12. A⁻¹ is upper triangular, with 1s down the diagonal, and its entries above the diagonal are polynomials with integer coefficients, of degree at most 12.
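The sign discussed in hint 18.3 can be computed by counting the pairs that are out of order; each transposition of neighbours flips the sign. A small Python version of this (my own sketch, not the book's):

def sign(p):
    """Sign of a permutation given as a tuple of 1, 2, ..., n in some order."""
    inversions = sum(1 for i in range(len(p))
                       for j in range(i + 1, len(p))
                       if p[i] > p[j])
    return -1 if inversions % 2 else 1

print(sign((1, 2, 3, 4)))   # +1: the identity
print(sign((2, 1, 3, 4)))   # -1: a single transposition
print(sign((2, 3, 1, 4)))   # +1: a 3-cycle is two transpositions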

20 Geometry and Orthogonal Matrices


20.2. Set X = x - z and Y = z - y.
20.4. The distances tell us that z lies on the line segment. But then ‖z - y‖ = t ‖x - y‖, so we recover t, and we know which point of the line segment we are at.
20.6. Suppose that A is a 2 x 2 orthogonal matrix. Then the columns of A are orthonormal. In particular, each column is a unit vector, so the first column can be written as (cos θ, sin θ) for a unique angle θ (unique up to adding integer multiples of 2π). The second column must be perpendicular to the first, and of unit length, so at an angle of π/2 from the first, so (-sin θ, cos θ).
20.8. Any vector x can be written as x = ⟨x, u⟩ u + ⟨x, v⟩ v + (x - ⟨x, u⟩ u - ⟨x, v⟩ v). Apply R to both sides: Rx = ⟨x, u⟩ (cos θ u + sin θ v) + ⟨x, v⟩ (-sin θ u + cos θ v) + (x - ⟨x, u⟩ u - ⟨x, v⟩ v).
20.9. cos θ = 1, sin θ = 0

21 Orthogonal Projections
21.1. Each element x of W yields a linear equation x, y = 0. The vectors y that satisfy all of these linear equations form precisely W . Clearly if we add vectors perpendicular to such a vector x, or scale them, they remain perpendicular to x. 21.2. If we have two vectors, say v1 and v2 in W , then we can write each one uniquely in the form v1 = x1 + y1 and v2 = x2 + y2 . But then

Hints

461

v1 + v2 = (x1 + x2 ) + (y1 + y2 ), writes v1 + v2 as a sum of a vector in W and a vector in W . Uniqueness of such a representation tells us that P (v1 + v2 ) = P v1 + P v2 . A similar argument works for rescaling vectors. 21.3. All vectors in W and in W are perpendicular. So W lies in the set of vectors perpendicular to W , i.e. in W . Conversely, take v any vector in W . Write v = x + y with x in W and y in W . 0 = v, y because v lies in W while y lies in W

= x + y, y = y, y . So y = 0. 21.4. To say that x and y are perpendicular is to say that x, y = 0. x+y


2

= x + y, x + y = x, x + x, y + y, x + y, y = x, x + y, y = x
2

+ y

21.7. It is easy to show that the projection P to any subspace satises P = P 2 = P t . On the other hand, any linear map P with P = P 2 = P t has some image, say W , a subspace of V . Take x + y any vector in V , with x in W and y in W . We need to see that P x = x and P y = 0. Since W is the image of P , and x lies in W , we must have x = P z for some vector z in V . Therefore P x = P 2 z = P z = x, so P x = x. Next, take any vector v in V . Then P y, z = y, P t z = y, P z = 0, since y is perpendicular to the image of P , so to P z . Therefore P y = 0.

22 Direct Sums of Subspaces


22.2. You could try letting U be the horizontal plane, consisting in the vectors x1 x = x2 0

462 and V any other plane, for instance the plane consisting in the vectors x1 x = 0 . x3

Hints

22.3. Any vector w in Rn splits uniquely into w = u + v for some u in U and v in V . So we can unambiguously dene a map Q by the equation Qw = Su + T v . It is easy to check that this map Q is linear. Conversely, given any map Q : Rn Rp , dene S = Q|U and T = Q|V . 22.7. The pivot columns of (A B ) form a basis for U + W . All columns are pivot columns just when U + W is a direct sum.

23 Jordan Normal Form


23.1. Take any point of the plane, project it horizontally onto the vertical axis, and then rotate the vertical axis clockwise by a right angle to become the horizontal axis. If you do it twice, clearly you end up at the origin. 23.4. Take a generalized eigenvector x = 0 with two dierent eigenvalues: k (A ) x = 0 and (A ) x = 0. Pick x so that the powers k and are as small as possible. Then let y = (A ) x. Check that y is a generalized eigenvector of A with two dierent eigenvalues, but with smaller powers than x had. This is a contradiction unless y = 0. So (A ) x = 0, giving power k = 1. Using the same reasoning, switching the roles of and , we nd that (A ) x = 0. So Ax = x = x, forcing = . 23.5. Take the shortest possible linear relation between nonzero generalized eigenvectors. Suppose that it involves vectors xi with dierent eigenvalues i : k (A i ) i xi = 0. If you can nd a shorter relation, or one of the same length with lower values for all powers ki , use it instead. Let yi = (A 1 ) xi . These yi are still generalized eigenvectors with the same eigenvalue, and same powers ki but a smaller power for k1 . So these yi must all vanish. But then all of the vectors xi each have two dierent eigenvalues. 23.7. =2 =3 e1 e3 , e2 e6 , e5 , e4

23.8. Clearly x by itself is a string of length 1. Let k be the length of the longest string starting from x. So the string is x, (A ) x, . . . , (A )
k1

x = y.



Lets try to take another step. The next step must be (A ) y. If that vanishes, we cant step. If it doesnt vanish, then we can step, as long as this next step is linearly independent of all of the vectors earlier in the string. If the next step isnt linearly independent, then we must have a relation: c0 x + c1 (A ) x + + ck (A ) x = 0. Multiply both sides with a big enough power of A to kill o all but the rst term. This power exists, because all of the vectors in the string are generalized eigenvectors, so killed by some power of A , and the further down the list, the smaller a power of A you need to do the job, since you already have some A sitting around. So this forces c0 = 0. Hit with the next smallest power of A to force c1 = 0, etc. There is no linear relation, and we can take this next step. The last step y must be an eigenvector, because (A ) y = 0. 23.10. Two ways:(1) There is an eigenvector for each eigenvalue. Pick one for each. Eigenvectors with dierent eigenvalues are linearly independent, so they form a basis. (2) Each Jordan block of size k has an eigenvalue of multiplicity k . So all blocks must be 1 1, and hence A is diagonal in Jordan normal form. 23.11. F = 23.12. 1 F = 0 0 0 1 1 0 0 1 1 , F AF = 0 0 0 1 0 0 0 0 0
1 2 1 2 1 2 1 2 k

, F 1 AF =

0 0

0 2

23.13. Two blocks, each at most n/2 long, and a zero block if needed to pad it out. 23.14. i 4 1 F = 4 i 4 1 4
1 2 i 4 i 4 1 4 i 4 1 4 1 2

0
i 4

i 4 , 0 i 4

i 0 F 1 AF = 0 0

1 i 0 0

0 0 i 0

0 0 1 i

23.15. A is in echelon form, so U = A, and 1 0 U = A = 0 0 0

0 1 0 0 0

0 0 1 . 0 0

464 We can save time writing U = A = 2 , and I 0


U =A =

(where I here is actually I3 ). Solving U X = U A is easy: I X 2 0 X 0 2 0 I 0 .

U X UA =

(Note how we can allow ourselves to just play with I and as if they were numbers. Dont write I3 or 5 , i.e. forget the subscripts.) So X = 2 , and X is 3 3. We leave the reader to check that X has strings: e2 e3 , e1

=0

giving A the strings: A e2 A e3 , A e1

=0

But A = 1 0

so A e1 = e1 , A e2 = e2 and A e3 = e3 . Therefore the corresponding strings of A are =0 e2 e3 , e1

Take the start z of each 0-string, here z = e2 and z = e3 , and try to solve U x = V z . But U = 2 shifts labels back by 2. Therefore the strings of A are e4 , e2 e5 , e3 , e1

=0



This is a basis, so we have our strings. 0 1 F = 0 0 0 0 0 F 1 AF = 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 , 0 1 0 . 1 0

23.19. Clearly it is enough to prove the result for a single Jordan block. Given an n n Jordan block + , let A be any diagonal matrix, A= a1 a2 .. . an with all of the aj distinct and nonzero. Compute out the matrix B = A( + ) to see that all diagonal entries of B are distinct and that B is upper triangular. Why is B diagonalizable? Then + = A1 B , a product of diagonalizable matrices.

24 Decomposition and Minimal Polynomial


24.1. Using long division of polynomials, divide x⁵ + 3x² + 4x + 1 by x² + 1:
x⁵ + 3x² + 4x + 1 - x³(x² + 1) = -x³ + 3x² + 4x + 1,
-x³ + 3x² + 4x + 1 - (-x)(x² + 1) = 3x² + 5x + 1,
3x² + 5x + 1 - 3(x² + 1) = 5x - 2,
so the quotient is x³ - x + 3 and the remainder is 5x - 2.
24.2.

466 x 2 x + 2x + 4x + 4x + 4
4 3 2

Hints

x + 2x + x +2 x5 2x4 4x3 4x2 4x 2x4 2x3 3x2 4x + 2 2 x4 + 4 x3 + 8 x2 + 8 x + 8 2x3 + 5x2 + 4x + 10 1 1 2x 4

2x3 + 5x2 + 4x + 10

x4 + 2 x3 + 4 x2 + 4 x + 4 5 3 x4 2 x 2x2 5x
1 3 2 x + 2 x2 x + 4 5 2 1 3 +x + 5 2x + 4x 2 13 2 4 x

13 2

2x + 5 x +2
2

2x + 5x + 4x + 10 2x3 4x 5x2 5x2 + 10 10

0 Clearly r(x) = x2 + 2 (up to scaling; we will always scale to get the leading term to be 1.) Solving backwards for r(x): r(x) = u(x)a(x) + v (x)b(x) with u(x) = v (x) = 2 13 1 , 2 5 x2 x + 3 . 2 x

2 13
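These divisions are easy to check by machine. The sketch below is mine (it uses numpy's polydiv, with coefficients listed from the highest power down; the helper name poly_gcd is made up): it redoes the division of problem 24.1 and runs the Euclidean algorithm on the two polynomials of problem 24.2, recovering the greatest common divisor x^2 + 2.

    import numpy as np

    # 24.1: divide x^5 + 3x^2 + 4x + 1 by x^2 + 1.
    print(np.polydiv(np.array([1.0, 0, 0, 3, 4, 1]), np.array([1.0, 0, 1])))
    # quotient [1, 0, -1, 3] = x^3 - x + 3, remainder [5, -2] = 5x - 2

    def poly_gcd(p, q, tol=1e-9):
        # Euclidean algorithm: replace (p, q) by (q, remainder) until the remainder vanishes.
        while np.max(np.abs(q)) > tol:
            _, r = np.polydiv(p, q)
            p, q = q, r
        return p / p[0]                      # scale so the leading coefficient is 1

    a = np.array([1.0, 0, 2, 1, 0, 2])       # x^5 + 2x^3 + x^2 + 2
    b = np.array([1.0, 2, 4, 4, 4])          # x^4 + 2x^3 + 4x^2 + 4x + 4
    print(poly_gcd(a, b))                    # [1. 0. 2.], that is x^2 + 2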

24.3. The Euclidean algorithm yields

2310 - 2 · 990 = 330
990 - 3 · 330 = 0

and

1386 - 1 · 990 = 396
990 - 2 · 396 = 198
396 - 2 · 198 = 0.

Therefore the greatest common divisors are 330 and 198 respectively. Apply the Euclidean algorithm to these:

330 - 1 · 198 = 132
198 - 1 · 132 = 66
132 - 2 · 66 = 0.


Therefore the greatest common divisor of 2310, 990 and 1386 is 66. Turning these equations around, we can write

66 = 198 - 1 · 132
   = 198 - 1 · (330 - 1 · 198)
   = 2 · 198 - 1 · 330
   = 2 · (990 - 2 · 396) - 1 · (2310 - 2 · 990)
   = 4 · 990 - 4 · 396 - 1 · 2310
   = 4 · 990 - 4 · (1386 - 1 · 990) - 1 · 2310
   = 8 · 990 - 4 · 1386 - 1 · 2310.

24.6. Clearly Δn^n = 0, since Δn shifts each vector of the string en, e(n-1), ..., e1 one step (and sends e1 to 0). Therefore the minimal polynomial of Δn must divide x^n. But for k < n, Δn^k en = e(n-k) is not 0, so x^k is not the minimal polynomial.

24.7. If the minimal polynomial of λ + Δ is m(x), then the minimal polynomial of Δ must be m(x + λ).

24.8. m(λ) = λ² - 5λ - 2.

24.10. Take A to have Jordan normal form, and you find characteristic polynomial

det(A - λI) = (λ1 - λ)^n1 (λ2 - λ)^n2 ... (λN - λ)^nN

with n1 the sum of the sizes of all Jordan blocks with eigenvalue λ1, etc. The characteristic polynomial is clearly divisible by the minimal polynomial.

24.11. Split the minimal polynomial m(x) into real and imaginary parts. Check that s(A) = 0 for s(x) either of these parts, a polynomial equation of the same or lower degree. The imaginary part has lower degree, so vanishes.

24.12. Let

zk = cos(2πk/n) + i sin(2πk/n).

By de Moivre's theorem, zk^n = 1. If we take k = 1, 2, ..., n, these are the so-called n-th roots of 1, so that

z^n - 1 = (z - z1)(z - z2) ... (z - zn).

Clearly each root of 1 is at a different angle. A^n = 1 implies that (A - z1)(A - z2) ... (A - zn) = 0, so by corollary 24.14, A is diagonalizable. Over the real numbers, we can take

A = [0 -1; 1 0],

which satisfies A^4 = 1, but has complex eigenvalues, so is not diagonalizable over the real numbers.
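A short numerical illustration of the two halves of this hint (my own check, using numpy): the n-th roots of 1 really do satisfy z^n = 1, and the rotation matrix above satisfies A^4 = 1 while its eigenvalues are the non-real fourth roots of 1.

    import numpy as np

    n = 4
    roots = np.exp(2j * np.pi * np.arange(1, n + 1) / n)   # the n-th roots of 1
    print(np.allclose(roots**n, 1))                        # True, by de Moivre

    A = np.array([[0.0, -1.0],
                  [1.0,  0.0]])                            # rotation by a right angle
    print(np.allclose(np.linalg.matrix_power(A, 4), np.eye(2)))   # A^4 = 1
    print(np.linalg.eigvals(A))                            # approximately i and -i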

24.13. Apply forward elimination to the 9 × 4 matrix whose columns are the entries of I, A, A² and A³ (each 3 × 3 matrix strung out as a column of 9 numbers). Add -(row 1) to row 5 and to row 9, and then the appropriate multiples of row 2 (±1/2 of it, and row 2 itself for row 8) to rows 4, 5, 7, 8 and 9, so that everything below the first two pivots vanishes; after that, moving the pivot only passes over pivotless columns. So the only pivots lie in the first two columns, and A² is already a linear combination of I and A. Cutting out all of the pivotless columns after the first one, and all of the zero rows, and scaling row 2 by 1/2, we can solve for that combination. The minimal polynomial is therefore λ² - λ - 2 = (λ + 1)(λ - 2). The eigenvalues are λ = -1 and λ = 2. We can't see which eigenvalue has multiplicity 2 and which has multiplicity 1.
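This elimination is easy to automate. The sketch below is mine (the 3 × 3 matrix A is made up, chosen to have the same minimal polynomial λ² - λ - 2, since the matrix of the original problem is not reproduced here): it strings out I, A, A², A³ as the columns of a 9 × 4 matrix and finds the first power of A that is a linear combination of the lower powers.

    import numpy as np

    A = np.array([[0.0, 1.0, 0.0],
                  [2.0, 1.0, 0.0],
                  [0.0, 0.0, 2.0]])       # made-up example with minimal polynomial x^2 - x - 2

    n = A.shape[0]
    powers = [np.linalg.matrix_power(A, k).reshape(-1) for k in range(n + 1)]
    M = np.column_stack(powers)           # 9 x 4 matrix whose columns are I, A, A^2, A^3

    for k in range(1, n + 1):
        # Is A^k a linear combination of I, A, ..., A^(k-1)?
        c = np.linalg.lstsq(M[:, :k], M[:, k], rcond=None)[0]
        if np.allclose(M[:, :k] @ c, M[:, k]):
            # Then A^k - c[k-1] A^(k-1) - ... - c[0] I = 0, so these are the
            # coefficients of the minimal polynomial (highest power first).
            print(np.concatenate(([1.0], -c[::-1])))   # [ 1. -1. -2.]
            break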


24.17. If λ1 ≠ λ2, then D = T and N = 0. If λ1 = λ2, then

D = [λ1 0; 0 λ2],   N = [0 1; 0 0].

If we imagine λ1 and λ2 as variables, then D jumps (its top right corner flips between 1 and 0 suddenly) as λ1 and λ2 collide.

24.18. We have seen previously that S and T will preserve each other's generalized eigenspaces: if (S - λ)^k x = 0, then (S - λ)^k Tx = 0. Therefore we can restrict to a generalized eigenspace of S, and then further restrict to a generalized eigenspace of T. So we can assume that S = λ0 + N0 and T = λ1 + N1, with λ0 and λ1 complex numbers and N0 and N1 commuting nilpotent linear maps. But then ST = λ0 λ1 + N where N = λ0 N1 + λ1 N0 + N0 N1. Clearly large enough powers of N will vanish, because they will be sums of terms like λ0^j λ1^k N0^p N1^q with p + q at least as large as the power of N taken.
So N is nilpotent.

25 Matrix Functions of a Matrix Variable


25.11. If we have a matrix A with n distinct eigenvalues, then every nearby matrix has nearby eigenvalues, so still distinct.

26 Symmetric Functions of Eigenvalues


26.2. det(A - λI) = (z1 - λ)(z2 - λ) ... (zn - λ) = (-1)^n Pz(λ) = sn(z) - s(n-1)(z) λ + s(n-2)(z) λ² - ... + (-1)^n λ^n.
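A numerical spot check of this formula (my own sketch; the symmetric matrix A is made up): numpy's poly gives the coefficients of det(λI - A), which are, up to alternating signs, the elementary symmetric functions of the eigenvalues.

    import numpy as np
    from itertools import combinations

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 1.0]])       # made-up symmetric matrix
    z = np.linalg.eigvalsh(A)             # its eigenvalues z_1, z_2, z_3

    def s(k, z):
        # k-th elementary symmetric function of z_1, ..., z_n
        return sum(np.prod(c) for c in combinations(z, k))

    print(np.poly(A))                                      # det(lambda I - A), highest power first
    print([(-1)**k * s(k, z) for k in range(len(z) + 1)])  # 1, -s1, s2, -s3: the same numbers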

27 The Pfaffian
27.1. There are (2n)! permutations of 1, 2, ..., 2n, and each partition into pairs is associated to n! 2^n different permutations, so there are (2n)!/(n! 2^n) different partitions of 1, 2, ..., 2n into pairs.

27.2. In alphabetical order, the first pair must always start with 1. Then we can choose any number to sit beside the 1, and the smallest number not chosen starts the second pair, etc.; the short program after the list below follows this recipe.

(a) {1, 2}

(b) {1, 2}, {3, 4}
    {1, 3}, {2, 4}
    {1, 4}, {2, 3}

(c) {1, 2}, {3, 4}, {5, 6}
    {1, 2}, {3, 5}, {4, 6}
    {1, 2}, {3, 6}, {4, 5}
    {1, 3}, {2, 4}, {5, 6}
    {1, 3}, {2, 5}, {4, 6}
    {1, 3}, {2, 6}, {4, 5}
    {1, 4}, {2, 3}, {5, 6}
    {1, 4}, {2, 5}, {3, 6}
    {1, 4}, {2, 6}, {3, 5}
    {1, 5}, {2, 3}, {4, 6}
    {1, 5}, {2, 4}, {3, 6}
    {1, 5}, {2, 6}, {3, 4}
    {1, 6}, {2, 3}, {4, 5}
    {1, 6}, {2, 4}, {3, 5}
    {1, 6}, {2, 5}, {3, 4}
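Both the count and the list can be checked with a short program. The sketch below is mine: it enumerates partitions into pairs exactly as described in 27.2 (the first pair starts with 1, each later pair starts with the smallest unused number), reproducing the three partitions in (b), the fifteen in (c), and the count (2n)!/(n! 2^n).

    from math import factorial

    def pair_partitions(items):
        # items is a sorted tuple; each partition is produced in alphabetical order.
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for i, partner in enumerate(rest):
            remaining = rest[:i] + rest[i + 1:]
            for tail in pair_partitions(remaining):
                yield [(first, partner)] + tail

    for n in (1, 2, 3):
        count = len(list(pair_partitions(tuple(range(1, 2 * n + 1)))))
        print(2 * n, count, factorial(2 * n) // (factorial(n) * 2 ** n))   # counts agree

    print(list(pair_partitions((1, 2, 3, 4))))   # the three partitions listed in (b)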


28 Dual Spaces and Quotient Spaces


28.1. See example 16.14 on page 164.

28.5. If a linear map T : V → W is 1-to-1 then T0 ≠ Tv unless v = 0, so there are no vectors v in the kernel of T except v = 0. On the other hand, suppose that T : V → W has only the zero vector in its kernel. If v1 ≠ v2 but Tv1 = Tv2, then v1 - v2 ≠ 0 but T(v1 - v2) = Tv1 - Tv2 = 0. So the vector v1 - v2 is a nonzero vector in the kernel of T.

28.6. Hom(V, W) is always a vector space, for any vector space W.

28.7. If dim V = n, then V is isomorphic to R^n, so without loss of generality we can just assume V = R^n. But then V* is identified with row matrices, so has dimension n as well: dim V* = dim V.

28.10. fx(α + β) = (α + β)(x) = α(x) + β(x) = fx(α) + fx(β). Similarly for fx(sα).

28.11. fsx(α) = α(sx) = s α(x) = s fx(α). Similarly for any two vectors x and y in V, expand out fx+y.

28.12. We need only find a linear function α for which α(x) ≠ α(y). Identifying V with R^n using some choice of basis, we can assume that V = R^n, and thus x and y are different vectors in R^n. So some entry of x is not equal to some entry of y, say xj ≠ yj. Let α be the function α(z) = zj, i.e. α = ej.

28.16. If x + W is a translate, we might find that we can write this translate two different ways, say as x + W but also as z + W. So x and z are equal up to adding a vector from W, i.e. x - z lies in W. Then after scaling, clearly sx - sz = s(x - z) also lies in W. So sx + W = sz + W, and therefore scaling is defined independent of any choices. A similar argument works for addition of translates.

29 Singular Value Factorization


29.1. Calculate out ⟨Ax, x⟩ for x = x1 e1 + x2 e2 + ... + xn en to see that this gives Q(x). We know that the equation ⟨Ax, x⟩ = Q(x) determines the symmetric matrix A uniquely.
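In coordinates this is just the identity ⟨Ax, x⟩ = Σ aij xi xj. A two-line numerical check (my own sketch, with a made-up symmetric matrix):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])                  # made-up symmetric matrix
    x = np.array([1.5, -0.5])

    print(x @ A @ x)                            # <Ax, x>
    print(2*x[0]**2 + 2*x[0]*x[1] + 3*x[1]**2)  # the quadratic form Q(x), written out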

30 Factorizations
30.1.
p = (1, 2), L = [1 0; 0 1], U = [0 1; 1 0]
p = (2, 1), L = [1 0; 0 1], U = [1 0; 0 1]
p = (2, 1), L = [1 0; 0 1], U = [1 0; 0 1]

31 Quadratic Forms
31.13. Try

Q(x) = (x1^4 + x2^4 + ... + xn^4)/(x1^2 + x2^2 + ... + xn^2) if x ≠ 0, and Q(0) = 0.

31.15. If b(x, y) = 0 for all y, then plug in y = x to find b(x, x) = 0.

31.19. (1/√2)(1, 1) and (1/√2)(1, -1).

31.20.
(a) Suppose that α = Tx and β = Ty, i.e. α(z) = b(x, z) and β(z) = b(y, z) for any vector z. Then α(z) + β(z) = b(x, z) + b(y, z) = b(x + y, z), so Tx + Ty = α + β = T(x + y). Similarly for scaling: aTx = T(ax).
(b) If Tx = 0 then 0 = b(x, y) for all vectors y from V, so x lies in the kernel of b, i.e. the kernel of T is the kernel of b. Since b is nondegenerate, the kernel of b consists precisely in the 0 vector. Therefore T is 1-to-1.
(c) T : V → V* is a 1-to-1 linear map, and V and V* have the same dimension, say n. Since dim ker T + dim im T = dim V = n and dim ker T = 0, we get dim im T = n, so T is onto, hence T is an isomorphism.
(d) You have to take x = T⁻¹α.

32 Tensors and Indices


32.1. The tensor product is δ^i_j ξ_k. There are two contractions:
(a) δ^i_i ξ_k = n ξ_k (in other words, n ξ), and
(b) δ^i_j ξ_i = ξ_j (in other words, ξ).

32.3. ε_ijk x^i y^j z^k = det (x y z).

32.5. One contraction is s_k = t^i_ik. Under a change of basis,

(F t)^i_ik = F^i_l (F⁻¹)^p_i (F⁻¹)^q_k t^l_pq
           = δ^p_l (F⁻¹)^q_k t^l_pq
           = (F⁻¹)^q_k t^l_lq
           = (F⁻¹)^q_k s_q
           = (F s)_k.
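The index computation above can be verified numerically with einsum (my own sketch; the tensor t and the change of basis matrix F are random, and the index conventions follow the display above):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    t = rng.standard_normal((n, n, n))     # t^l_{pq}: one upper index, two lower indices
    F = rng.standard_normal((n, n))
    Finv = np.linalg.inv(F)

    # (F t)^i_{jk} = F^i_l (F^{-1})^p_j (F^{-1})^q_k t^l_{pq}
    Ft = np.einsum('il,pj,qk,lpq->ijk', F, Finv, Finv, t)

    s = np.einsum('iik->k', t)             # s_k = t^i_{ik}
    Fs = np.einsum('qk,q->k', Finv, s)     # (F s)_k = (F^{-1})^q_k s_q

    print(np.allclose(np.einsum('iik->k', Ft), Fs))   # contraction commutes with change of basis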

33 Tensors
33.3. x ⊗ y = 4 e1 ⊗ e1 + 5 e1 ⊗ e2 + 8 e2 ⊗ e1 + 10 e2 ⊗ e2 + 12 e3 ⊗ e1 + 15 e3 ⊗ e2.

33.4. Picking a basis v1, v2, ..., vp for V and w1, w2, ..., wq for W, we have already seen that every tensor in V ⊗ W has the form Σ t^iJ vi ⊗ wJ, so is a sum of pure tensors.

33.11. Take any linear map T : V → V and define a tensor t in V ⊗ V*, which is thus a bilinear map t(α, v) for α in V* and v in V** = V, by the rule

t(α, v) = α(Tv).   (A.1)

Clearly if we scale T, then we scale t by the same amount. Similarly, if we add linear maps on the right side of equation A.1, then we add tensors on the left side. Therefore the mapping taking T to t is linear. If t = 0 then α(Tv) = 0 for any vector v and covector α. Thus Tv = 0 for any vector v, and so T = 0. Therefore the map taking T to t is 1-to-1. Finally, we need to see that the map taking T to t is onto, the tricky part. But we can count dimensions for that: if dim V = n then dim Hom(V, V) = n² and dim V ⊗ V* = n².
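A coordinate version of this correspondence (my own sketch, with a random matrix standing in for T): the tensor of equation (A.1) is the pairing t(α, v) = α(Tv), and the dimension count at the end is n² on both sides.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 3
    T = rng.standard_normal((n, n))        # a linear map V -> V, written in coordinates

    def t(alpha, v):
        # the tensor of equation (A.1): t(alpha, v) = alpha(T v)
        return alpha @ (T @ v)

    alpha = rng.standard_normal(n)         # a covector, as a row of numbers
    v, w = rng.standard_normal(n), rng.standard_normal(n)

    print(np.isclose(t(alpha, v + w), t(alpha, v) + t(alpha, w)))   # bilinearity in v
    print(T.size, n * n)                   # dim Hom(V, V) = n^2 = dim V (tensor) V*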


List of Notation
||x||   The length of a vector, 116
⟨x, y⟩   Inner product of two vectors, 115
⟨z, w⟩   Hermitian inner product, 142
0   Any matrix whose entries are all zeroes, 20
1   identity map, 153
1   identity permutation, 31
A⁻¹   The inverse of a square matrix A, 28
A[ij]   A with rows i and j and columns i and j removed, 248
A*   Adjoint of a complex matrix or complex linear map A, 143
A_b,i   The matrix obtained by replacing column i of A by b, 175
adj A   adjugate, 175
A(ij)   The matrix obtained from cutting out row i and column j from A, 51
arg z   Argument (angle) of a complex number, 140
A^t   The transpose of a matrix A, 61
C   The set of all complex numbers, 140
C^n   The set of all complex vectors with n entries, 140
det   The determinant of a square matrix, 51
dim   Dimension of a subspace, 84
e_i   i-th row of the identity matrix, 256
e_i   The i-th standard basis vector (also the i-th column of the identity matrix), 25
Hom(V, W)   The set of linear maps T : V → W, 255
I   The identity matrix, 25
im A   The image of a matrix A, 91
I_n   The n × n identity matrix, 25
ker A   The kernel of a matrix A, 87
p × q   Matrix with p rows and q columns, 17
Pf   Pfaffian of a skew-symmetric matrix, 245
R^n   The space of all vectors with n real number entries, 17
Σ   Sum, 22
T*   Transpose of a linear map, 258
T|_W   Restriction of a linear map to a subspace, 158
U + V   Sum of two subspaces, 197
U ⊕ V   Direct sum of two subspaces, 197
V*   The dual space of a vector space V, 256
v + W   Translate of a subspace, 258
V/W   Quotient space, 259
v*   Covector dual to the vector v, 299
W⊥   Orthogonal complement, 191
|z|   Modulus (length) of a complex number, 140

Index
adjoint, 153 adjugate, 185 alphabetical order see partition, alphabetical order, 254 analytic function, 235 antisymmetrize, 298 argument, 150 associated matrix, see matrix, associated back substitution, 5 ball, 146 center, 146 closed, 146 open, 146 radius, 146 basis, 81, 86 dual, 266 orthonormal, 123 standard, 82 unitary, 154 Bessel, see inequality, Bessel bilinear form, 285 degenerate, 286 nondegenerate, 286 positive denite, 287 symmetric, 287 bilinear map, 306 block Jordan, see Jordan block Boolean numbers, 174 bounded, 146 box closed, 146 CayleyHamilton, see theorem, Cayley Hamilton theorem, see theorem, CayleyHamilton center of ball, see ball, center change of variables, 84 change of basis, see matrix, change of basis characteristic polynomial, see polynomial, characteristic circle unit, 151 closed ball, see ball, closed box, see box, closed set, 146 column permutation formula, see determinant, permutation formula, column combination linear, see linear combination commuting, 187, 232 complement, see subspace, complementary complementary subspace, see subspace, complementary complex linear map, see linear map, complex vector space, see vector space, complex complex number, 149 imaginary part, 149 real part, 149 complex plane, 149 component, 295 components see principal components, 273 composition, 163 conjugate, 150


continuous, 146 convergence, 145 of points, 145 covector, 266, 296 dual, 309 de Moivre, see theorem, de Moivre decoupling theorem, see theorem, decoupling degenerate bilinear form, see bilinear form, degenerate determinant, 51 permutation formula column, 183 row, 182 diagonal of a matrix, 19 diagonalize, 110, 139 orthogonally, 136 dimension eective, 275 of subspace, 88 direct sum, see subspace, direct sum distance, 194 dot product, see inner product echelon form, 18 equation, 9 reduced, 89 eective dimension, see dimension, effective eigenspace, 109 eigenvalue, 101 complex, 151 multiplicity, 105 eigenvector, 101 complex, 151 generalized, 214 elimination, 3 forward, 4 GaussJordan, 7 fast formula for the determinant, 61 Fibonacci, 27 eld, 173 form bilinear, see bilinear form volume, 315


form, exterior, 315 forward, see elimination, forward free variable, 8 fundamental theorem of algebra, 158 GaussJordan, see elimination, Gauss Jordan generalized eigenvector, see eigenvector, generalized Hermitian inner product, see inner product, Hermitian, see inner product, Hermitian homomorphism, 265 identity permutation, see permutation, identity identity map, see linear, map, identity identity matrix, see matrix, identity image, 95 independence linear, see linear independence inequality Bessel, 203 Schwarz, 193 triangle, 193 inertia law of, 288 inner product, 119, 170 Hermitian, 152, 171 space, 171, 201 intersection, 207 invariant subspace, see subspace, invariant inverse, 173 of a matrix, see matrix, inverse of permutation, see permutation, inverse isometry, 195 isomorphism, 164 Jordan block, 214 normal form, 214 kernel, 91 kill, 42, 91


nondegenerate bilinear form, see bilinear form, nondegenerate normal form Jordan, see Jordan normal form normal matrix, see matrix, normal open ball, see ball, open orthogonal, 122 orthogonal complement, 201 orthogonally diagonalize, see diagonalize, orthogonally orthonormal basis, see basis, orthonormal partition, 253 alphabetical order, 254 associated to a permutation, 253 permutation, 30 identity, 31, 279 inverse, 31 matrix, see matrix, permutation natural, of partition, 254 product, 31 sign, 182 permutation formula column, see determinant, permutation formula, column row, see determinant, permutation formula, row perpendicular, 120 Pfaan, 255 pivot, 4, 18 column, 74 plane complex, see complex plane polar coordinates, 149 polarization, 310 polynomial, 311 characteristic, 101 homogeneous, 311 minimal, 227 positive denite, see quadratic form, positive denite positive semidenite, see quadratic form, positive semidenite principal component, 275 principal components, 273 product

length, 120 line, 194 segment, 194 linear combination, 72 equation, 3 independence, 81 map, 163 complex, 170 identity, 163 relation, 81 map, 195 linear, see linear map matrix, 17 addition, 20 associated, 163 change of basis, 83 diagonal entries, 19 identity, 25 inverse, 28, 43 multiplication, 22 normal, 155 permutation, 32 self-adjoint, 153 short, 42 skew-adjoint, 153, 238 skew-symmetric, 238 square, 17, 28 strictly lower triangular, 35 strictly upper triangular, 36 subtraction, 20 symmetric, 121 unitary, 154 upper triangular, 53 minimal polynomial, see polynomial, minimal minimum principle, 135 modulus, 150 multilinear, 303 multiplicative function, 224 multiplicity eigenvalue, see eigenvalue, multiplicity negative denite, see quadratic form, negative denite nilpotent, 232



dot, see inner product skew-symmetric normal form, 251 of permutations, see prmutation, product31 spectral scalar, see inner product theorem, see theorem, spectral spectrum, 102 tensor, see tensor, product square, see matrix, square product, inner, see inner product matrix, see matrix, square see inner product, 119 standard basis, see basis, standard projection, 201 string, 214 pure subspace, 77 tensor, see tensor, pure complement, see subspace, comPythagorean theorem, see theorem, Pythagorean plementary quadratic form, 138 complementary, 207 kernel, 292 direct sum, 207 negative denite, 144 invariant, 232 positive denite, 144 sum, 207 positive semidenite, 144 substitution quotient space, 269 back, see back substitution sum radius of subspaces, see subspace, sum of ball, see ball, radius sum, direct, see subspace, direct sum rank, 9 Sylvester, 288 tensor, see tensor, rank symmetric real matrix, see matrix, symmetric vector space, see vector space, real symmetrize, 298 reciprocal, 173 tensor, 295, 304 reduced echelon form, see echelon form, antisymmetric, 298 reduced contraction, 297 reection, 198 contravariant, 310 relation covariant, 310 linear, see linear relation product, 297 restriction, 168, 232 pure, 305 reversal, 197 rank, 306 rotation, 196, 198 symmetric, 298 row tensor product permutation formula, see determibasis, 305 nant, permutation formula, of tensors, 304 row of vector spaces, 304 row echelon form, 9 of vectors, 304 scalar product, see inner product theorem Schwarz inequality, see inequality, Schwarz CayleyHamilton, 187, 230 self-adjoint, see matrix, self-adjoint de Moivre, 150, 157 decoupling, 110 shear, 189 Pythagorean, 120, 202 short matrix, see matrix, short spectral, 136 sign trace, 105, 237, 248, 298 of a permutation, see permutatranslate, 268 tion, sign transpose, 64, 268 skew-adjoint matrix, see matrix, skewadjoint transposition, 32



transverse, 210 triangle inequality, see inequality, triangle unit circle see circle, unit, 151 unitary basis, see basis, unitary matrix, see matrix, unitary variable free, see free variable vector, 17, 161 vector space, 161 complex, 170 real, 170 weight of polynomial, 246
