
Unit 5
Inner Product Spaces and Orthogonal Vectors

5.1 Introduction

In previous units we have looked at general vector spaces defined by a very small number of defining
axioms, and have looked at many specific examples of such vector spaces. However, in our usual view
of the space in which we live we always think of it as having certain properties, such as distance and
angles, that do not occur in many vector spaces. In this unit we investigate vector spaces that have one
additional property, an inner product, that allows us to define distance, angle and other properties; we
call these inner product spaces. We have already studied some special cases of these in the
Euclidean vector spaces ℝⁿ, for n = 1, 2, …, but in this unit we show how the same results apply to
other vector spaces. In particular, the Euclidean space aspect is covered in the subsection Geometry
of Linear Transformations Between ℝ², ℝ³ and ℝⁿ of Unit 3, Section 3: Linear Transformations from ℝⁿ
to ℝᵐ.
The approach in this section follows a very important and powerful mathematical approach in which a
very general concept, such as inner product spaces, is developed from only a very few basic defining
properties, or axioms. The properties developed for the general concept apply to a wide variety of
specific and often very different-looking manifestations. That is, any theorem or property that is proved
using the defining axioms of an inner product space, will then apply to every specific example or
manifestation of inner product space.
In particular, theorems like that of Pythagoras (sum of squares of the two sides adjacent to a right angle
in a triangle is equal to the square of the length of the hypotenuse) are shown to apply in all inner
product spaces. A method for creating an orthogonal basis (any two basis vectors are orthogonal to
each other), called the Gram-Schmidt process, is developed for all inner product spaces. Orthonormal
bases (orthogonal bases of unit vectors) were previously shown to be important in using
eigenvalues and eigenvectors to compute powers of a matrix (see Unit 4, Section 5: Diagonalizing a Matrix).
A method called least squares approximation is shown to apply to all inner product spaces. Least
squares approximation has many important applications in Euclidean vector spaces, and in finding best
approximations of data sets, such as linear regression.
The topics are:

Basic definitions and properties of inner product spaces

Constructing an orthonormal basis, using the Gram-Schmidt process, and applications

Least squares approximation and applications

5.2 Learning objectives

Upon completion of this unit you should be able to:

write down the defining axioms of an inner product space;

define and give properties satisfied by basic concepts of an inner product space, such as angle,
orthogonality, length/distance, norm, orthogonal complement;

write down and prove a number of basic results that are true in inner product spaces, such as the
triangle inequality, Cauchy-Schwarz inequality, Pythagoras Theorem, the parallelogram theorem;

describe and analyse a number of specific examples of inner product spaces;

describe the Gram-Schmidt process for creating an orthogonal, or orthonormal, basis from any
other basis of an inner product space;

apply the Gram-Schmidt process to find an orthogonal/orthonormal basis from any given basis of
a specific inner product space;

show how an orthogonal basis allows properties of vectors to be easily calculated, including
coordinates, norms, inner products, distances, and orthogonal projections;

show how the Gram-Schmidt process is equivalent to finding a QR decomposition of a certain
matrix;

calculate the QR decomposition of a matrix;

explain the basic concept of best approximation of a vector in terms of projections in an inner
product space;

explain how the best approximation theory is applied to find an approximate solution, called the
least squares solution, of a system of linear equations Ax = b that has no exact solution;

show how the least squares solution of the linear system Ax = b is given by x = (AᵀA)⁻¹Aᵀb,
provided the columns of A are linearly independent; and

derive least squares solutions for a variety of practical problems.

5.3 Assigned readings

Section 5.5, read sections 6.1 and 6.2 in your textbook.

Section 5.6, read sections 6.3, 6.5 and 6.6 in your textbook.

Section 5.7, read section 6.4 in your textbook.

5.4 Unit activities

1. Read each section in the unit and carefully work through each illustrative example. Make sure you
understand each concept, process, or theorem and how it is used. Add all key points to your
personal course summary sheets. Work through on your own all examples and exercises throughout
the unit to check your understanding of each concept.

2. Read through the corresponding sections in your textbook and work through the sample problems
and exercises.

3. If you have difficulty with a problem after giving it a serious attempt, check the discussion topic for
this unit to see if others are having similar problems. The link to the discussion area is found in
the left-hand menu of your course. If you are still having difficulty with the problem then ask your
instructor for assistance.

4. After completing the unit, review the learning objectives again. Make sure that you are familiar
with each objective and understand the meaning and scope of the objective.

5. Review the course content in preparation for the final examination.

6. Complete the online course evaluation.

5.5 Basic definitions and properties of inner product spaces

An inner product space is any vector space that has, in addition, a special kind of function, called the
inner product. The inner product computes a real number from any two vectors, in a way similar to the
previously encountered scalar product in the Euclidean spaces ℝⁿ. The inner product function has linearity
properties, is commutative, and the inner product of a non-zero vector with itself is a positive number.
These properties are exactly the properties satisfied by the scalar product (dot product) of vectors in a
Euclidean vector space, and so the Euclidean spaces ℝ², ℝ³ and, more generally, ℝⁿ, with the scalar
product, are already inner product spaces. In the Euclidean spaces our intuitive idea of the distance
between two points/vectors u, v is given in terms of the scalar product by √((u − v) · (u − v)) = ‖u − v‖, and other
concepts such as perpendicular vectors and the angle between vectors are defined in terms of the
scalar product. Analogously, an inner product can be used to define distance, perpendicularity, and
angles in any inner product space, as shown in this section. Many theorems of Euclidean spaces that
depend on these concepts hold true in inner product spaces, such as the well known Pythagoras
Theorem for right angle triangles. A number of examples of inner product spaces are given in this
section. The following sections develop more extensive applications for orthogonal bases and least
squares approximations.

5.5.1 Definition of inner product space

Definition.
In a vector space V an inner product is a function, written as ⟨u, v⟩ for any two vectors u, v ∈ V,
satisfying the following five axioms (properties):

(1) ⟨u, v⟩ is a real number for every u, v ∈ V (real number axiom).

(2) ⟨u, v⟩ = ⟨v, u⟩ for every u, v ∈ V (symmetry or commutative axiom).

(3) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ for every u, v, w ∈ V (additive linearity axiom for the first variable).

(4) ⟨ku, v⟩ = k⟨u, v⟩ for every u, v ∈ V and every k ∈ ℝ (homogeneity or scalar linearity axiom for the
first variable).

(5) ⟨v, v⟩ > 0 for every v ≠ 0 in V, and ⟨v, v⟩ = 0 if v = 0 (positivity axiom).

Note: The symmetry axiom (2) shows that the linearity of axioms (3) and (4) also applies to the second
variable. That is, for every w, u, v ∈ V, and every k ∈ ℝ:
⟨w, u + v⟩ = ⟨w, u⟩ + ⟨w, v⟩
⟨v, ku⟩ = k⟨v, u⟩
Note: Combining properties (3) and (4) shows that the inner product is fully linear in the first variable:
⟨ku + lv, w⟩ = k⟨u, w⟩ + l⟨v, w⟩

Using the note above it also follows that a similar result holds for the second variable:
⟨w, ku + lv⟩ = k⟨w, u⟩ + l⟨w, v⟩
Note: Setting k = 1 and l = −1 in the linearity results above shows that it is also true that:
⟨u − v, w⟩ = ⟨u, w⟩ − ⟨v, w⟩
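Note: If you have Python available, the following short sketch (not part of the assigned materials) spot-checks axioms (2) to (5) numerically for one candidate inner product on ℝ². The function name weighted_ip and the random test vectors are simply choices made for this illustration.

```python
import numpy as np

def weighted_ip(u, v):
    """Candidate inner product on R^2: <u, v> = 2*u1*v1 + 3*u2*v2."""
    return 2 * u[0] * v[0] + 3 * u[1] * v[1]

rng = np.random.default_rng(0)
for _ in range(1000):
    u, v, w = rng.normal(size=(3, 2))
    k = rng.normal()
    assert np.isclose(weighted_ip(u, v), weighted_ip(v, u))            # axiom (2)
    assert np.isclose(weighted_ip(u + v, w),
                      weighted_ip(u, w) + weighted_ip(v, w))           # axiom (3)
    assert np.isclose(weighted_ip(k * u, v), k * weighted_ip(u, v))    # axiom (4)
    if np.any(u != 0):
        assert weighted_ip(u, u) > 0                                   # axiom (5)
print("all random checks passed")
```

A numerical check like this can never replace a proof, but it is a quick way to catch a candidate formula that is not an inner product.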
Definition.
A real vector space that has an inner product is called an inner product space.

Example 5.5.1.
This example has two parts:

(a) In ℝ², for any two vectors u = (u₁, u₂) and v = (v₁, v₂) we have previously defined the scalar
product u · v = u₁v₁ + u₂v₂, which is a real number. Show that ⟨u, v⟩ = u · v is an inner product,
called the Euclidean inner product.

(b) Similarly in ℝⁿ, for any n ≥ 1, show that the scalar product
⟨u, v⟩ = u · v = u₁v₁ + u₂v₂ + ⋯ + uₙvₙ is an inner product.

Solution. It is left as an exercise for the reader to show that the scalar product satisfies all five axioms
of the definition. If you have difficulty with this then please consult your instructor or textbook.

A vector space can have more than one choice of inner product, as the next example shows for
Euclidean vector spaces. This means that there are other ways to define distance in Euclidean
spaces that are different from our normal definition of distance.
Example 5.5.2.
Show that each of the following ⟨u, v⟩ is an inner product (the first three are called weighted Euclidean
inner products):

(a) In ℝ², for any two vectors u = (u₁, u₂) and v = (v₁, v₂) define ⟨u, v⟩ = u₁v₁ + 2u₂v₂.

(b) In ℝ², define ⟨u, v⟩ = au₁v₁ + bu₂v₂, where a, b ∈ ℝ are any two positive numbers.

(c) In ℝⁿ, for a fixed n ≥ 1, define ⟨u, v⟩ = a₁u₁v₁ + a₂u₂v₂ + ⋯ + aₙuₙvₙ, where a₁, a₂, …, aₙ are
any positive real numbers.

(d) In ℝ², for any two vectors u = (u₁, u₂) and v = (v₁, v₂) define
⟨u, v⟩ = 2u₁v₁ − u₁v₂ − u₂v₁ + 2u₂v₂.

Solution. We show here the proofs for parts (b) and (d). The proofs for parts (a) and (c) are similar
and are left as an exercise for the reader.
(b) Suppose u = (u₁, u₂), v = (v₁, v₂), w = (w₁, w₂) are any three vectors in ℝ².

Proof of axiom (1):
au₁v₁ + bu₂v₂ is clearly a real number since it consists of products and sums of the real numbers
a, b, u₁, u₂, v₁, v₂.

Proof of axiom (2):
Using the definition, ⟨u, v⟩ = au₁v₁ + bu₂v₂ and ⟨v, u⟩ = av₁u₁ + bv₂u₂, but
au₁v₁ + bu₂v₂ = av₁u₁ + bv₂u₂ since real number multiplication is commutative. Hence,
⟨u, v⟩ = ⟨v, u⟩.

Proof of axiom (3):
⟨u + v, w⟩ = ⟨(u₁ + v₁, u₂ + v₂), (w₁, w₂)⟩
= a(u₁ + v₁)w₁ + b(u₂ + v₂)w₂
= (au₁w₁ + bu₂w₂) + (av₁w₁ + bv₂w₂)
= ⟨u, w⟩ + ⟨v, w⟩

Proof of axiom (4):
⟨ku, v⟩ = ⟨(ku₁, ku₂), (v₁, v₂)⟩
= aku₁v₁ + bku₂v₂
= k(au₁v₁ + bu₂v₂)
= k⟨u, v⟩

Proof of axiom (5):
⟨v, v⟩ = av₁² + bv₂²
and av₁² + bv₂² ≥ 0 since a > 0, b > 0, v₁² ≥ 0, v₂² ≥ 0. That is, this inner product, being composed
of products and sums of non-negative numbers, is also non-negative and so satisfies:
⟨v, v⟩ = av₁² + bv₂² ≥ 0
Furthermore, the only way that ⟨v, v⟩ = 0 is if v₁ = v₂ = 0, that is, when v = (v₁, v₂) is the zero vector.

(d) Proof of axiom (1):
⟨u, v⟩ = 2u₁v₁ − u₁v₂ − u₂v₁ + 2u₂v₂ is clearly a real number.

Proofs of axioms (2), (3), (4):
These proofs are left as an exercise for the reader and should present no problems.

Proof of axiom (5):
⟨v, v⟩ = 2v₁² − 2v₁v₂ + 2v₂² = (v₁ − v₂)² + v₁² + v₂². Hence:
⟨v, v⟩ = (v₁ − v₂)² + v₁² + v₂² ≥ 0
and ⟨v, v⟩ = 0 only if v₁ = v₂ = 0, in which case v = 0.
Example 5.5.3.
In ℝ² let any two vectors be given by u = (u₁, u₂) and v = (v₁, v₂). Show that each of the following is
not an inner product on ℝ²:

(a) ⟨u, v⟩ = 2u₁v₁ − 3u₂v₂

(b) ⟨u, v⟩ = u₁v₁

(c) ⟨u, v⟩ = √(u₁v₁ + u₂v₂)

(d) ⟨u, v⟩ = u₁²v₁² + u₂²v₂²

Note: In order to show that a function ⟨u, v⟩ is not an inner product it is only necessary to find specific
vectors u, v for which one of the five axioms fails to hold.

Solution. It is left for the reader to show that each one fails to satisfy at least one axiom of the
definition, as follows:
Show that (a) does not satisfy the positivity axiom (5) (for example, when u = (0, 1), v = (1, 1)).
Show that (b) does not satisfy axiom (5) (for example, if u = (0, 1), v = (1, 1)), but for a different reason
than in part (a).
Show that (c) does not satisfy axiom (1), because it is not even defined as a real number for some
choices of vectors u, v. In addition, even when it is defined, axioms (3) and (4) are not satisfied.
Show that (d) does not satisfy axiom (3). In addition it does not satisfy axiom (4).


Example 5.5.4.
In ℝⁿ show that any n × n real non-singular matrix A generates an inner product, defined in terms of the
Euclidean inner product by:
⟨u, v⟩ = Au · Av

Note: Treating the vectors as column matrices, this can be re-written in the standard way as a matrix product:
Au · Av = (Au)ᵀ(Av) = uᵀ(AᵀA)v

Solution. Proving each axiom:
Axiom 1 (it is a real number) is clearly satisfied.
Axiom 2: ⟨u, v⟩ = uᵀAᵀAv and ⟨v, u⟩ = vᵀAᵀAu appear to be different, but are in fact the same.
This is because vᵀAᵀAu is a real number (a 1 × 1 matrix) and so transposing it does not change its value. Hence,
transposing, and using the usual rule that the transpose of a product is the product of the transposes in
reverse order:
⟨v, u⟩ = vᵀAᵀAu = (vᵀAᵀAu)ᵀ = uᵀAᵀ(Aᵀ)ᵀ(vᵀ)ᵀ = uᵀ(AᵀA)v = ⟨u, v⟩
Axiom 3: For any u, v, w ∈ ℝⁿ:
⟨u + v, w⟩ = (u + v)ᵀ(AᵀA)w = (uᵀ + vᵀ)(AᵀA)w = uᵀAᵀAw + vᵀ(AᵀA)w = ⟨u, w⟩ + ⟨v, w⟩
Axiom 4: For any u, v ∈ ℝⁿ and k ∈ ℝ:
⟨ku, v⟩ = (ku)ᵀ(AᵀA)v = k(uᵀAᵀAv) = k⟨u, v⟩
Axiom 5: For any v ∈ ℝⁿ:
⟨v, v⟩ = vᵀAᵀAv = (Av)ᵀ(Av) = ‖Av‖² (the usual Euclidean norm)
Since A is non-singular it follows that Av = 0 only when v = 0, and so ‖Av‖ > 0 when v ≠ 0. Hence:
⟨v, v⟩ > 0 when v ≠ 0
⟨v, v⟩ = 0 when v = 0
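Note: As an optional illustration of Example 5.5.4, the short Python sketch below evaluates ⟨u, v⟩ = Au · Av both directly and as uᵀ(AᵀA)v. The particular non-singular matrix A and the vectors u, v are arbitrary choices made for this illustration only.

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0]])   # any non-singular matrix will do
u = np.array([1.0, -1.0])
v = np.array([3.0, 2.0])

ip_direct = (A @ u) @ (A @ v)            # <u, v> = Au . Av
ip_matrix = u @ (A.T @ A) @ v            # the same number written as u^T (A^T A) v
print(ip_direct, ip_matrix)              # the two values agree
print((A @ u) @ (A @ u) > 0)             # <u, u> > 0 since u != 0 and A is non-singular
```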
Theorem 5.1. An inner product exists on every finite dimensional vector space V. That is, every finite
dimensional vector space can be made into an inner product space.

Proof. Suppose V has dimension n and has a basis {b₁, b₂, b₃, …, bₙ}. For any two vectors
u, v ∈ V suppose that the unique linear combinations of the basis vectors are:
u = k₁b₁ + k₂b₂ + k₃b₃ + ⋯ + kₙbₙ
v = l₁b₁ + l₂b₂ + l₃b₃ + ⋯ + lₙbₙ
Define the inner product as the scalar product (Euclidean inner product) of the coordinates of the two
vectors:
⟨u, v⟩ = k₁l₁ + k₂l₂ + k₃l₃ + ⋯ + kₙlₙ
This is clearly a real number, so Axiom 1 holds. It is easy to show that ⟨u, v⟩ = ⟨v, u⟩,
⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ and ⟨ku, v⟩ = k⟨u, v⟩, so Axioms (2), (3), (4) hold (verify these for
yourself). For Axiom 5, ⟨u, u⟩ = (k₁)² + (k₂)² + (k₃)² + ⋯ + (kₙ)² ≥ 0, and clearly ⟨u, u⟩ = 0 only if
k₁ = k₂ = k₃ = ⋯ = kₙ = 0, which means u = 0.

Theorem 5.2. If V is an inner product space, and S is a subspace of V, then S is also an inner product
space with the same inner product function as V.
Proof. This is left as an exercise for the reader. Convince yourself that all five axioms for the inner
product of V will also be true in the subspace S.
Example 5.5.5.
In ℝ³ find a formula for the inner product induced, as in Theorem 5.1, by the basis
B = {(1, 0, 0), (1, 2, 0), (1, 1, 1)}.

Solution. Put the basis vectors as the columns of the matrix P. To express a vector v = (x, y, z), given with
respect to the standard basis, in terms of the basis B we need to find the coordinate vector that, multiplied
by P, gives the vector v. That is, using column matrices for vectors, the required coordinates X, Y, Z with
respect to the basis B satisfy:
(x, y, z)ᵀ = P (X, Y, Z)ᵀ, where P = [[1, 1, 1], [0, 2, 1], [0, 0, 1]]
so that
(X, Y, Z)ᵀ = P⁻¹ (x, y, z)ᵀ, where P⁻¹ = [[1, −1/2, −1/2], [0, 1/2, −1/2], [0, 0, 1]]
That is, the B-coordinates of v = (x, y, z) are:
(X, Y, Z) = (x − y/2 − z/2, y/2 − z/2, z)

Hence, using this formula for the coordinates, the induced inner product of two vectors
u = (u₁, u₂, u₃), v = (v₁, v₂, v₃) is the dot product of their B-coordinates:
⟨u, v⟩ = (u₁ − u₂/2 − u₃/2)(v₁ − v₂/2 − v₃/2) + (u₂/2 − u₃/2)(v₂/2 − v₃/2) + u₃v₃
Expanding and collecting terms gives:
⟨u, v⟩ = u₁v₁ − ½u₁v₂ − ½u₂v₁ − ½u₁v₃ + ½u₂v₂ − ½u₃v₁ + (3/2)u₃v₃
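Note: The following optional Python sketch checks the formula just derived by comparing it, for a random pair of vectors, against the dot product of the B-coordinates computed with P⁻¹. It assumes NumPy is available; the function names are invented for this illustration.

```python
import numpy as np

# Columns of P are the basis vectors of B from Example 5.5.5.
P = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
P_inv = np.linalg.inv(P)

def induced_ip(u, v):
    """Inner product induced by B: dot product of the B-coordinates."""
    return (P_inv @ u) @ (P_inv @ v)

def expanded_ip(u, v):
    """The expanded formula derived in Example 5.5.5."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u1*v1 - 0.5*u1*v2 - 0.5*u2*v1 - 0.5*u1*v3
            - 0.5*u3*v1 + 0.5*u2*v2 + 1.5*u3*v3)

rng = np.random.default_rng(1)
u, v = rng.normal(size=(2, 3))
print(np.isclose(induced_ip(u, v), expanded_ip(u, v)))   # True
```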

5.5.2 Norm, length, distance, angle, and projections

Definition.
In an inner product space with inner product ⟨u, v⟩, define:

1. the norm or length of a vector v as:
‖v‖ = √⟨v, v⟩

2. the distance d(u, v) between the two points/vectors u, v as the length of the vector u − v, namely:
d(u, v) = ‖u − v‖ = √⟨u − v, u − v⟩

3. the angle between two non-zero vectors u, v as the angle θ, 0 ≤ θ ≤ π, satisfying:
cos θ = ⟨u, v⟩ / (‖u‖ ‖v‖)

Note: The angle definition is analogous to the formula found for angles in a Euclidean space defined in
terms of the scalar product. That is, we previously saw the scalar product formula u · v = ‖u‖ ‖v‖ cos θ,
where θ is the angle between the Euclidean vectors u, v. This gives the analogous Euclidean space
formula:
cos θ = (u · v) / (‖u‖ ‖v‖)
This is the same as the inner product formula above because u · v is an inner product (see Example
5.5.1).
Note: For any angle θ the cosine function satisfies −1 ≤ cos θ ≤ 1. Hence, the above formula for angles
in an inner product space can only make sense if −1 ≤ ⟨u, v⟩/(‖u‖ ‖v‖) ≤ 1 for every pair of non-zero vectors u, v in an
inner product space. This result is true and it is known as the Cauchy-Schwarz inequality, described
next in Theorem 5.3.
Theorem 5.3. The Cauchy-Schwarz inequality. For any two vectors u, v in an inner product space the
inner product ⟨u, v⟩ satisfies:
|⟨u, v⟩| ≤ ‖u‖ ‖v‖

Note: The inner product ⟨u, v⟩ is a positive or negative real number, so |⟨u, v⟩| means the absolute
value of ⟨u, v⟩, whereas u, v are vectors and so ‖u‖ = √⟨u, u⟩ and ‖v‖ = √⟨v, v⟩ are the norms or
lengths of the vectors.

Proof. The proof is short but non-intuitive, and so is not given here. The proof may be found in most
textbooks.

Note: Since |⟨u, v⟩| = ⟨u, v⟩ or |⟨u, v⟩| = −⟨u, v⟩ (whichever is positive), the theorem can be
restated as the pair of inequalities:
⟨u, v⟩ ≤ ‖u‖ ‖v‖ and −⟨u, v⟩ ≤ ‖u‖ ‖v‖
The second of these is equivalent to ⟨u, v⟩ ≥ −‖u‖ ‖v‖, since multiplying an inequality by a negative
number reverses its direction. Hence, the theorem is equivalent to:
−1 ≤ ⟨u, v⟩ / (‖u‖ ‖v‖) ≤ 1
This justifies the definition of the angle between vectors given by cos θ = ⟨u, v⟩/(‖u‖ ‖v‖), since cos θ assumes
every value between −1 and 1 for just one value of θ with 0 ≤ θ ≤ π; this is often written in terms of the inverse
cosine formula, θ = arccos(⟨u, v⟩/(‖u‖ ‖v‖)).

Definition.
Two non-zero vectors u, v in an inner product space are said to be perpendicular if the angle θ
between them is θ = π/2 radians (90 degrees), which is equivalent to ⟨u, v⟩ = 0 (since for angles
between 0 and π, cos θ = 0 exactly when θ = π/2, and therefore ⟨u, v⟩ = ‖u‖ ‖v‖ cos θ = 0 exactly when θ = π/2).
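Note: The three definitions above translate directly into code. The sketch below is an illustration only (the helper names norm, distance and angle are not from the textbook); it works for any inner product supplied as a Python function, and the clipping guards against tiny rounding errors, which Theorem 5.3 says cannot occur in exact arithmetic.

```python
import math
import numpy as np

def norm(u, ip):
    return math.sqrt(ip(u, u))

def distance(u, v, ip):
    return norm(u - v, ip)

def angle(u, v, ip):
    # Cauchy-Schwarz guarantees the ratio lies in [-1, 1]; clipping guards rounding error.
    c = ip(u, v) / (norm(u, ip) * norm(v, ip))
    return math.acos(max(-1.0, min(1.0, c)))

# Example: the weighted inner product <u, v> = 2*u1*v1 + 3*u2*v2.
ip = lambda u, v: 2*u[0]*v[0] + 3*u[1]*v[1]
u, v = np.array([1.0, 0.0]), np.array([2.0, 1.0])
print(norm(u, ip), distance(u, v, ip), angle(u, v, ip))
```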


Theorem 5.4. The norm or length of a vector in an inner product space satisfies the normal properties
that we expect of length and distance. That is, for any two vectors u, v of an inner product space:

(a) ‖v‖ ≥ 0, and ‖v‖ = 0 only if v = 0 (the zero vector).

(b) ‖kv‖ = |k| ‖v‖ for any k ∈ ℝ; multiplying a vector by a scalar changes the length of the vector by
the absolute value of that scalar.

(c) ‖u + v‖ ≤ ‖u‖ + ‖v‖, sometimes called the triangle inequality. That is, the length of the sum of two
vectors cannot exceed the sum of the lengths of the two individual vectors.

Proof. The following outlines the proof:

(a) See the exercise set.

(b) For any k ∈ ℝ:
‖kv‖ = √⟨kv, kv⟩ = √(k⟨v, kv⟩)   (by Axiom 4)
= √(k²⟨v, v⟩)   (by Axioms 2, 3, 4; see the Note after the axioms)
= |k| √⟨v, v⟩ = |k| ‖v‖   (since √(k²) = |k| for any k ∈ ℝ)

(c) Starting with the square of the left side and using the axioms of inner products:
‖u + v‖² = ⟨u + v, u + v⟩ = ⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩ = ‖u‖² + 2⟨u, v⟩ + ‖v‖²
Using the Cauchy-Schwarz inequality, ⟨u, v⟩ ≤ ‖u‖ ‖v‖, this becomes:
‖u + v‖² ≤ ‖u‖² + 2‖u‖ ‖v‖ + ‖v‖² = (‖u‖ + ‖v‖)²
Taking square roots of both sides gives the result.

Example 5.5.6.
In ℝ³ let u = (1, 0, 0) and v = (2, −1, 3). Define the inner product by ⟨u, v⟩ = Au · Av = uᵀAᵀAv,
where A is the matrix:
A = [[1, 0, 2], [0, 3, −1], [2, 0, 2]]

(a) Compute ⟨u, v⟩.

(b) Compute the norms ‖u‖, ‖v‖.

(c) Find all vectors w perpendicular to u.

(d) Find the equation satisfied by all vectors w = (x, y, z) with ‖w‖ = 1.

Solution. The following outlines the solution:

(a) Using the scalar product form, Au = (1, 0, 2) and Av = (2 + 6, −3 − 3, 4 + 6) = (8, −6, 10), so:
⟨u, v⟩ = Au · Av = (1, 0, 2) · (8, −6, 10) = 8 + 0 + 20 = 28

(b) ‖u‖² = ⟨u, u⟩ = Au · Au = (1, 0, 2) · (1, 0, 2) = 5, so ‖u‖ = √5.
Show for yourself that ‖v‖² = 200, so ‖v‖ = √200 = 10√2.

(c) If w = (x, y, z) then Aw = (x + 2z, 3y − z, 2x + 2z), and w is perpendicular to u = (1, 0, 0) if
⟨u, w⟩ = Au · Aw = 0. That is:
0 = Au · Aw = (1, 0, 2) · (x + 2z, 3y − z, 2x + 2z) = (x + 2z) + 2(2x + 2z) = 5x + 6z
Hence, w = (x, y, z) is perpendicular to u exactly when 5x + 6z = 0.
Note: The equation 5x + 6z = 0 defines a set of vectors w that is a plane through the origin of ℝ³
(that is, the vectors w go from the origin to points on the plane 5x + 6z = 0).

(d) Since ‖w‖ > 0 when w ≠ 0, it follows that ‖w‖ = 1 if, and only if, ‖w‖² = 1. Hence, the vectors
satisfy:
1 = ‖w‖² = Aw · Aw = (x + 2z)² + (3y − z)² + (2x + 2z)² = 5x² + 9y² + 9z² + 12xz − 6yz
Hence, ‖w‖ = 1 exactly when 5x² + 9y² + 9z² + 12xz − 6yz = 1.
Note: In a Euclidean coordinate system this is the equation of an ellipsoid with centre at the
origin.

Example 5.5.7.
Using the weighted Euclidean inner product on ℝ² given by:
⟨u, v⟩ = 2u₁v₁ + 3u₂v₂
let u = (3, −2), v = (1, 4).

(a) Find the norms ‖u‖, ‖v‖.

(b) Find the inner product ⟨u, v⟩.

(c) Find the distance between the vectors/points u, v.

(d) Find the angle between the vectors u, v.

(e) Find the set of all vectors perpendicular to u.

Solution. The following outlines the solution:

(a) ‖u‖ = √⟨u, u⟩ = √(2u₁² + 3u₂²) = √30. Similarly ‖v‖ = √50.

(b) ⟨u, v⟩ = 2u₁v₁ + 3u₂v₂ = 2(3)(1) + 3(−2)(4) = −18

(c) ‖u − v‖ = √⟨u − v, u − v⟩ = √⟨(2, −6), (2, −6)⟩ = √(2(2)² + 3(−6)²) = √116, and this is the distance between the
vectors/points u, v.

(d) The angle satisfies cos θ = ⟨u, v⟩/(‖u‖ ‖v‖) = −18/(√30 √50) = −9/(5√15). Using a calculator, the approximate value
of the angle is:
θ = arccos(−9/(5√15)) ≈ 2.0542 radians
In degrees the approximate value is:
2.0542 × (180/π) ≈ 117.70 degrees
Note: This angle is different from the angle calculated using the standard scalar product, which is
cos θ = (u · v)/(‖u‖ ‖v‖) = (u · v)/(√(u · u) √(v · v)) = −5/(√13 √17), giving θ ≈ 1.9138 radians or about 109.65 degrees.

(e) The vector w = (w₁, w₂) is perpendicular to u if:
0 = ⟨u, w⟩ = 2u₁w₁ + 3u₂w₂ = 6w₁ − 6w₂
The set of vectors perpendicular to u therefore satisfies 6w₁ − 6w₂ = 0, or simply w₁ = w₂. In a
Euclidean coordinate system this is a line through the origin at 45 degrees to the axes. That is,
every vector from the origin along this line is perpendicular to u.
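Note: A quick, optional way to check the arithmetic of Example 5.5.7 is the short Python sketch below; it simply re-evaluates the formulas above for the same vectors.

```python
import math

ip = lambda u, v: 2*u[0]*v[0] + 3*u[1]*v[1]
norm = lambda u: math.sqrt(ip(u, u))

u, v = (3.0, -2.0), (1.0, 4.0)
diff = (u[0] - v[0], u[1] - v[1])

print(norm(u)**2, norm(v)**2)                  # 30.0 50.0
print(ip(u, v))                                # -18.0
print(norm(diff)**2)                           # 116.0
theta = math.acos(ip(u, v) / (norm(u) * norm(v)))
print(theta, math.degrees(theta))              # about 2.054 radians, 117.7 degrees
```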

Theorem 5.5. This theorem is in two parts:

(a) Pythagoras Theorem. For any two perpendicular vectors u, v in an inner product space both of
the following hold:
‖u‖² + ‖v‖² = ‖u − v‖²
‖u‖² + ‖v‖² = ‖u + v‖²

(b) The cosine law. If θ is the angle between two vectors u, v in an inner product space, then:
‖u − v‖² = ‖u‖² + ‖v‖² − 2‖u‖ ‖v‖ cos θ

Note: To see the connection with the usual theorem of Pythagoras and the cosine law in ℝ², think of
u, v as being two vectors in ℝ², starting from the origin with angle θ between them. The vector joining
the end of v to the end of u (the hypotenuse of the triangle) is u − v. Hence, part (b) of the theorem in
ℝ² states that the square of the length of the hypotenuse is the sum of the squares on the other two
sides of the triangle minus 2‖u‖ ‖v‖ cos θ. This last term is zero when the vectors are perpendicular,
thus giving Pythagoras Theorem. See Figure 5.1.

Figure 5.1: Pythagoras Theorem

Proof. By the definition of norm and the axioms of the inner product it follows that:
‖u − v‖² = ⟨u − v, u − v⟩
= ⟨u − v, u⟩ − ⟨u − v, v⟩
= ⟨u, u⟩ − ⟨v, u⟩ − ⟨u, v⟩ + ⟨v, v⟩
= ‖u‖² − 2⟨u, v⟩ + ‖v‖²
Using the definition of the angle θ between the vectors, ⟨u, v⟩ = ‖u‖ ‖v‖ cos θ, this becomes the part (b)
result:
‖u − v‖² = ‖u‖² + ‖v‖² − 2‖u‖ ‖v‖ cos θ
If the vectors u, v are perpendicular then cos θ = 0, and part (a) of the theorem follows:
‖u − v‖² = ‖u‖² + ‖v‖²
Replacing v by −v gives the alternate form of part (a).

Theorem 5.6. The parallelogram theorem. Given two vectors u, v in an inner product space:
2(‖u‖² + ‖v‖²) = ‖u − v‖² + ‖u + v‖²

Note: This can be interpreted as follows. The sum of the squares of the lengths of the four sides of a
parallelogram is equal to the sum of the squares of the lengths of the diagonals, as in Figure 5.2.

Figure 5.2: Parallelogram Theorem

Proof. Try this for yourself. Use the norm property ‖w‖² = ⟨w, w⟩ applied to ‖u − v‖² and ‖u + v‖²,
together with the axioms satisfied by the inner product.
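Note: The parallelogram identity is easy to test numerically. The optional sketch below (the weighted inner product and the random vectors are arbitrary choices, not from the text) checks it for one inner product on ℝ².

```python
import numpy as np

def check_parallelogram(ip, u, v):
    """Return True if 2(||u||^2 + ||v||^2) equals ||u - v||^2 + ||u + v||^2."""
    sq = lambda w: ip(w, w)
    return np.isclose(2 * (sq(u) + sq(v)), sq(u - v) + sq(u + v))

# Try it with the weighted inner product of Example 5.5.7 and random vectors.
ip = lambda u, v: 2*u[0]*v[0] + 3*u[1]*v[1]
rng = np.random.default_rng(2)
u, v = rng.normal(size=(2, 2))
print(check_parallelogram(ip, u, v))   # True
```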

Example 5.5.8.
In M₂₂ (all 2 by 2 matrices):

(a) Prove that the following is an inner product:
⟨A, B⟩ = ⟨[[a₁₁, a₁₂], [a₂₁, a₂₂]], [[b₁₁, b₁₂], [b₂₁, b₂₂]]⟩ = a₁₁b₁₁ + a₁₂b₁₂ + a₂₁b₂₁ + a₂₂b₂₂
That is, multiply the matrix entries in the corresponding positions and add them together.

(b) Prove that the two vectors/matrices below are orthogonal with respect to this inner product:
C = [[1, 3], [2, −4]],  D = [[0, 2], [3, 3]]

(c) Find the hypotenuse H of the triangle for which two of the sides are the orthogonal
vectors/matrices C, D, and confirm that these satisfy Pythagoras Theorem.

(d) Find the angles between C and H and between D and H. Do the angles of this triangle add up to
180 degrees?

Solution. The following outlines the solution:

(a) We could prove that each of the five axioms holds for this formula. However, note that the matrix
shape plays no role in the formula for ⟨A, B⟩. That is, if we re-write the matrix entries as vectors:
A ↔ (a₁₁, a₁₂, a₂₁, a₂₂),  B ↔ (b₁₁, b₁₂, b₂₁, b₂₂)
then the formula is exactly the same as the Euclidean inner product on ℝ⁴, and so it must be an
inner product on M₂₂.

(b) ⟨C, D⟩ = 1·0 + 3·2 + 2·3 + (−4)·3 = 0. Hence, the matrices/vectors are orthogonal.

(c) The hypotenuse vector/matrix is given by:
H = C − D = [[1, 3], [2, −4]] − [[0, 2], [3, 3]] = [[1, 1], [−1, −7]]
Computing the norms (squaring all of the entries and adding them):
‖H‖² = ⟨H, H⟩ = 52, ‖C‖² = ⟨C, C⟩ = 30, ‖D‖² = ⟨D, D⟩ = 22
and we have:
‖C‖² + ‖D‖² = 30 + 22 = 52 = ‖H‖²

(d) The angle between C and H is given by:
cos θ = ⟨C, H⟩/(‖C‖ ‖H‖) = 30/(√30 √52), so θ ≈ 0.708 radians, or about 40.6 degrees
The angle between D and H is given by:
cos θ = ⟨D, H⟩/(‖D‖ ‖H‖) = −22/(√22 √52), so θ ≈ 2.279 radians, or about 130.6 degrees
Since the third angle in the triangle is π/2 radians or 90 degrees, the angles of this triangle clearly
do not add up to 180 degrees. That theorem only works with the standard Euclidean inner product.
Definition.
In an inner product space, the projection (also called orthogonal projection) of a vector q onto a vector
t is a vector p parallel to t such that p − q is orthogonal to t, as in Figure 5.3.

Figure 5.3: Projection onto a Line

Note: This is exactly analogous to the previously-defined projection in a Euclidean vector space (see
Unit 3, Section 3: Linear Transformations from ℝⁿ to ℝᵐ), where the projection is given in terms of the
scalar product by p = ((q · t)/‖t‖²) t = ((q · t)/(t · t)) t.

Definition.
In an inner product space, the reflection of a vector q in the line formed by a vector t is a vector r such
that q − r is orthogonal to t and such that the mean, ½(q + r), is the projection of q onto t, as in Figure
5.4.

Note: This is exactly analogous to the previously-defined reflection in a Euclidean vector space (see
Unit 3, Section 3: Linear Transformations from ℝⁿ to ℝᵐ), where the reflection is given in terms of the
scalar product by r = (2(q · t)/‖t‖²) t − q.

Figure 5.4: Reflection in a Line
Theorem 5.7. If q, t are any two vectors in an inner product space V, with t ≠ 0, then:

(a) There exists a unique vector p that is the projection of the vector q onto the vector t, given by the
formula:
p = (⟨q, t⟩/‖t‖²) t = (⟨q, t⟩/⟨t, t⟩) t

(b) There exists a unique vector r that is the reflection of q in t, given by the formula:
r = 2(⟨q, t⟩/‖t‖²) t − q = 2(⟨q, t⟩/⟨t, t⟩) t − q

Proof. The proofs are exactly the same as for Euclidean spaces, except that the scalar product is
replaced by the inner product. Try it for yourself, and look at the proofs in Unit 3 if you have difficulty.

Example 5.5.9.
Using the M₂₂ inner product and matrix D of Example 5.5.8, find the projection of the matrix C onto the
matrix D and the projection of the matrix E onto D, where:
E = [[1, 3], [2, 4]]

Solution. From Theorem 5.7 the matrix M that is the projection of C onto D is given by:
M = (⟨C, D⟩/⟨D, D⟩) D = (0/22) [[0, 2], [3, 3]] = [[0, 0], [0, 0]]
and so the projection is the zero vector, because the two matrices are orthogonal to each other.
From Theorem 5.7 the matrix M that is the projection of E onto D is given by:
M = (⟨E, D⟩/⟨D, D⟩) D = (24/22) [[0, 2], [3, 3]] = [[0, 24/11], [36/11, 36/11]]
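Note: The following optional Python sketch reproduces Example 5.5.9 numerically; the names m22_ip and project are invented for this illustration.

```python
import numpy as np

def m22_ip(A, B):
    """Inner product on M22: sum of products of corresponding entries."""
    return float(np.sum(A * B))

def project(Q, T):
    """Projection of Q onto T: (<Q, T> / <T, T>) T."""
    return (m22_ip(Q, T) / m22_ip(T, T)) * T

D = np.array([[0.0, 2.0], [3.0, 3.0]])
C = np.array([[1.0, 3.0], [2.0, -4.0]])
E = np.array([[1.0, 3.0], [2.0, 4.0]])

print(project(C, D))   # zero matrix, since <C, D> = 0
print(project(E, D))   # [[0, 24/11], [36/11, 36/11]]
```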


Example 5.5.10.
Requires calculus. In P₄ find the reflection of the polynomial f(x) = x in the polynomial
g(x) = 1 + x + x² + x³, using the inner product ⟨f, g⟩ = ∫₋₁¹ f(x)g(x) dx.

Solution. By Theorem 5.7 the reflection polynomial h(x) is given by the formula:
h(x) = 2 (⟨f, g⟩/⟨g, g⟩) g(x) − f(x) = 2 ((∫₋₁¹ f(x)g(x) dx)/(∫₋₁¹ g(x)g(x) dx)) g(x) − f(x)
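Note: Requires calculus. If you wish to carry the integration through, the following optional SymPy sketch evaluates the reflection formula symbolically and checks that the reflection has the same norm as f. It is an illustration, not a required method.

```python
import sympy as sp

x = sp.symbols('x')
f = x
g = 1 + x + x**2 + x**3

# <p, q> = integral of p*q over [-1, 1]
ip = lambda p, q: sp.integrate(p * q, (x, -1, 1))

# Reflection of f in the line spanned by g: h = 2 (<f, g>/<g, g>) g - f.
h = sp.expand(2 * ip(f, g) / ip(g, g) * g - f)
print(h)                                 # the reflected polynomial
print(sp.simplify(ip(f, f) - ip(h, h)))  # 0: a reflection preserves the norm
```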

Section 5.5 exercise set

Check your understanding by answering the following questions.

1. In ℝ² prove that ⟨u, v⟩ is an inner product (show it satisfies the five axioms):
⟨u, v⟩ = 2u₁v₁ + 3u₂v₂
for any two vectors u = (u₁, u₂) and v = (v₁, v₂).

2. In ℝ² let any two vectors be given by u = (u₁, u₂) and v = (v₁, v₂). Show that each of the
following is not an inner product on ℝ². Recall that you need only produce one example where the
result fails in order to disprove something.

(a) ⟨u, v⟩ = −2u₁v₁ + 3u₂v₂

(b) ⟨u, v⟩ = u₁v₁ + u₂

(c) ⟨u, v⟩ = u₁v₁ + u₂v₂ + 1

(d) ⟨u, v⟩ = |u₁v₁ + u₂v₂| (absolute value)

(e) ⟨u, v⟩ = u₁/v₁ + u₂/v₂

3. Using the inner product on ℝ² given by ⟨u, v⟩ = 2u₁v₁ + 3u₂v₂:

(a) Find the lengths of the vectors (1, 0), (2, 1).

(b) Find the inner product of the two vectors above.

(c) Find the angle between the two vectors.

4. Define an inner product on ℝ³ by:
⟨(u₁, u₂, u₃), (v₁, v₂, v₃)⟩ = 2u₁v₁ + 3u₂v₂ + u₃v₃

(a) Find the length of the vector (3, 2, −1).

(b) Find the inner product of the vector (3, 2, −1) with the vector (0, 1, 2).

(c) Find all vectors perpendicular to the vector (3, 2, −1).

(d) Find the distance from (3, 2, −1) to (0, 1, 2).

5. Define a function on P₃ for any two polynomials p(x) = a₀ + a₁x + a₂x², q(x) = b₀ + b₁x + b₂x²
by:
⟨p, q⟩ = a₀b₀ + a₁b₁ + a₂b₂

(a) Prove that it is an inner product.

(b) Find the length of the vector p(x) = 2 − 3x + 4x².

(c) Find all vectors perpendicular to the vector p(x) = 1.

(d) Find all vectors perpendicular to the vector p(x) = 1 + 2x − x².

6. Note: Requires calculus. Repeat the previous problem, parts (a), (b), (c), but with the inner
product defined for any two polynomials p, q ∈ P₃ by:
⟨p, q⟩ = ∫₋₁¹ p(x) q(x) dx

7. Using the matrix-based inner product for ℝ² defined by ⟨u, v⟩ = uᵀAᵀAv (see Examples 5.5.4
and 5.5.6), where:
A = [[2, 3], [0, 1]]

(a) Find the length of the vector (1, 1).

(b) Find the inner product of the vector (1, 1) with the vector (0, 1).

(c) Find the distance from (1, 1) to (0, 1).

(d) Find the angle between the vectors (1, 1) and (0, 1).

8. In ℝ² define a function ⟨u, v⟩ = uᵀAᵀAv where:
A = [[1, 0], [0, 1], [1, 1]]
Is it an inner product for ℝ²? Justify your answer.

9. In ℝ² with inner product ⟨u, v⟩ = 2u₁v₁ + 3u₂v₂, find the equation satisfied by all vectors u with
‖u‖ = 1.

10. Find a formula for the inner product on P₃ given in Theorem 5.1, when the basis of P₃ is
{1, x, 1 + x²}.

11. In M₂₂, for any two matrices A = [aᵢⱼ], B = [bᵢⱼ] define:
⟨A, B⟩ = a₁₁b₁₁ + 2a₁₂b₁₂ + 3a₂₁b₂₁ + 4a₂₂b₂₂
That is, multiply elements in corresponding positions in each matrix and form a weighted sum.

(a) Show this is an inner product.

(b) Compute the norm of the matrix/vector A given in question 7.

(c) Compute the inner product of the two matrices/vectors:
C = [[1, 0], [0, 1]],  D = [[0, 3], [1, 0]]

(d) Compute the distance between the two matrices/vectors above.

(e) Compute the angle between the two matrices/vectors above.

12. Use the Cauchy-Schwarz inequality to prove:

(a) For any two vectors u, v ∈ ℝ²:
|u₁v₁ + u₂v₂| ≤ √(u₁² + u₂²) √(v₁² + v₂²)

(b) For any two vectors u, v ∈ ℝⁿ:
|u₁v₁ + u₂v₂ + ⋯ + uₙvₙ| ≤ √(u₁² + u₂² + ⋯ + uₙ²) √(v₁² + v₂² + ⋯ + vₙ²)

(c) Requires calculus. For any two functions f, g continuous on [0, 1]:
(∫₀¹ f(x)g(x) dx)² ≤ (∫₀¹ [f(x)]² dx)(∫₀¹ [g(x)]² dx)

(d) Requires calculus. For any two functions f, g continuous on [0, 1]:
√(∫₀¹ [f(x) + g(x)]² dx) ≤ √(∫₀¹ [f(x)]² dx) + √(∫₀¹ [g(x)]² dx)

13. Let V be any inner product space. Prove that the inner product function can be expressed in
terms of norms of the vectors by:
⟨u, v⟩ = ¼‖u + v‖² − ¼‖u − v‖²

14. Show that Pythagoras Theorem can be written in the following form and prove this result. For all
non-zero vectors p, q, r satisfying ⟨p − r, r − q⟩ = 0, it is true that:
‖p − r‖² + ‖r − q‖² = ‖p − q‖²

15. In any inner product space prove that if ‖v‖ = 0 then v = 0 (the zero vector).

16. If V is an inner product space then prove:

(a) If u ∈ V is a fixed vector, then the set S = {v ∈ V | ⟨u, v⟩ = 0} of all vectors
perpendicular to u is a subspace of V.

(b) If w is perpendicular to each of the vectors in T = {v₁, v₂, …, vₖ} then w is perpendicular
to every vector in the linear span of T (that is, to all linear combinations of the vectors vⱼ).

(c) The set S of all vectors w ∈ V perpendicular to T is a subspace of V. Note: Sometimes S is
designated by the symbol T⊥.

(d) If the set T in part (b) is a basis of V then w must be the zero vector (that is, only the zero
vector is perpendicular to every vector of a basis).

17. In an inner product space V prove that any three vectors satisfy:
‖v₁ + v₂ + v₃‖ ≤ ‖v₁‖ + ‖v₂‖ + ‖v₃‖

Solutions

1. ⟨u, v⟩ = 2u₁v₁ + 3u₂v₂ is clearly a real number so Axiom 1 is satisfied, and clearly
⟨u, v⟩ = ⟨v, u⟩ = 2u₁v₁ + 3u₂v₂, so Axiom 2 is satisfied.
Axiom 3 follows from the linearity of real number multiplication/addition:
⟨u + w, v⟩ = 2(u₁ + w₁)v₁ + 3(u₂ + w₂)v₂ = (2u₁v₁ + 3u₂v₂) + (2w₁v₁ + 3w₂v₂) = ⟨u, v⟩ + ⟨w, v⟩
Axiom 4 follows in a similar way:
⟨ku, v⟩ = 2ku₁v₁ + 3ku₂v₂ = k(2u₁v₁ + 3u₂v₂) = k⟨u, v⟩
Axiom 5 follows easily:
⟨u, u⟩ = 2u₁² + 3u₂² ≥ 0
and clearly ⟨u, u⟩ = 0 only if u₁ = u₂ = 0.
2. Note: In each case one axiom is shown to fail, but it is noted that other axioms fail as well. Try
for yourself to find examples of failure for these other axioms.

(a) ⟨u, v⟩ = −2u₁v₁ + 3u₂v₂ satisfies the first four axioms, but Axiom 5 fails since, for example,
when u = (1, 0), ⟨u, u⟩ = −2 is negative.

(b) ⟨u, v⟩ = u₁v₁ + u₂ satisfies Axiom 1, but none of the other axioms. For example, Axiom 2
fails when u = (1, 1), v = (1, 0) because ⟨u, v⟩ = 2 ≠ ⟨v, u⟩ = 1.

(c) ⟨u, v⟩ = u₁v₁ + u₂v₂ + 1 satisfies Axioms 1, 2, 5 but not Axioms 3 and 4. For example, to
disprove Axiom 3, if u = (1, 0), v = (1, 1), w = (1, 0) then:
⟨u + w, v⟩ = 3, ⟨u, v⟩ + ⟨w, v⟩ = 2 + 2 = 4
so ⟨u + w, v⟩ ≠ ⟨u, v⟩ + ⟨w, v⟩.

(d) ⟨u, v⟩ = |u₁v₁ + u₂v₂| satisfies Axioms 1, 2, 5 but not Axioms 3 and 4. For example, to
disprove Axiom 4, if u = (1, 0), v = (1, 0), k = −2 then:
⟨u, v⟩ = 1, ⟨ku, v⟩ = 2, k⟨u, v⟩ = −2
so ⟨ku, v⟩ ≠ k⟨u, v⟩.

(e) ⟨u, v⟩ = u₁/v₁ + u₂/v₂ does not satisfy any of the axioms. For example, Axiom 1 does not hold if
v = (1, 0) because ⟨u, v⟩ is then not defined (cannot divide by zero).

3. If u = (1, 0), v = (2, 1) and ⟨u, v⟩ = 2u₁v₁ + 3u₂v₂, then:

(a) ‖u‖ = √⟨u, u⟩ = √(2u₁² + 3u₂²) = √2. Similarly ‖v‖ = √11.

(b) ⟨u, v⟩ = 2u₁v₁ + 3u₂v₂ = 4

(c) The angle between 0 and π is given by cos θ = ⟨u, v⟩/(‖u‖ ‖v‖) = 4/√22, and it is approximately (using
a calculator) θ = arccos(4/√22) ≈ 0.54947 radians (or 31.482 degrees). Note: This is not the
usual angle between these two vectors given by the Euclidean inner product, which is about 0.46365
radians or 26.565 degrees; check this for yourself.

4. If ⟨(u₁, u₂, u₃), (v₁, v₂, v₃)⟩ = 2u₁v₁ + 3u₂v₂ + u₃v₃ and u = (3, 2, −1), v = (0, 1, 2), then:

(a) ‖u‖ = √⟨u, u⟩ = √(2u₁² + 3u₂² + u₃²) = √31

(b) ⟨u, v⟩ = 2u₁v₁ + 3u₂v₂ + u₃v₃ = 4

(c) The vector w = (w₁, w₂, w₃) is perpendicular to u if:
2u₁w₁ + 3u₂w₂ + u₃w₃ = 0, that is, 6w₁ + 6w₂ − w₃ = 0
This is the equation of a plane through the origin. That is, all vectors from the origin to a
point on this plane are perpendicular to u.

(d) ‖u − v‖ = √⟨u − v, u − v⟩ = √⟨(3, 1, −3), (3, 1, −3)⟩ = √30

5. If f(x) = a₀ + a₁x + a₂x², g(x) = b₀ + b₁x + b₂x² and ⟨f, g⟩ = a₀b₀ + a₁b₁ + a₂b₂, then:

(a) Axioms 1 and 2 clearly hold since a₀b₀ + a₁b₁ + a₂b₂ is a real number, and the real number
multiplications are all commutative. Axiom 3 holds because if h(x) = c₀ + c₁x + c₂x² then:
⟨f + h, g⟩ = (a₀ + c₀)b₀ + (a₁ + c₁)b₁ + (a₂ + c₂)b₂ = (a₀b₀ + a₁b₁ + a₂b₂) + (c₀b₀ + c₁b₁ + c₂b₂) = ⟨f, g⟩ + ⟨h, g⟩
Axiom 4 holds because ⟨kf, g⟩ = ka₀b₀ + ka₁b₁ + ka₂b₂ = k⟨f, g⟩. Axiom 5 holds because:
⟨f, f⟩ = a₀² + a₁² + a₂² ≥ 0
and ⟨f, f⟩ = 0 only if a₀ = a₁ = a₂ = 0 (that is, when f(x) is the zero function).

(b) ‖p(x)‖ = √⟨p, p⟩ = √(p₀² + p₁² + p₂²) = √29

(c) q(x) = q₀ + q₁x + q₂x² is perpendicular to p(x) = p₀ + p₁x + p₂x² = 1 if:
q₀p₀ + q₁p₁ + q₂p₂ = 0, that is, q₀ = 0 (since p₀ = 1, p₁ = p₂ = 0)
Hence, the vectors/polynomials perpendicular to p(x) = 1 are all polynomials of the form
q(x) = q₁x + q₂x², for any real values q₁, q₂.

(d) Similar to the previous part, q(x) must satisfy:
q₀p₀ + q₁p₁ + q₂p₂ = 0
where p₀ = 1, p₁ = 2, p₂ = −1. That is:
q₀ + 2q₁ − q₂ = 0
Hence, replacing q₂ by q₀ + 2q₁, all vectors/polynomials perpendicular to
p(x) = 1 + 2x − x² are q(x) = q₀ + q₁x + (q₀ + 2q₁)x², for any real values q₀, q₁.
6. If f(x) = a₀ + a₁x + a₂x², g(x) = b₀ + b₁x + b₂x² and ⟨f, g⟩ = ∫₋₁¹ f(x)g(x) dx, then:

(a) Axiom 1 holds because the integral always exists (when the functions are continuous) and is
a real number.
Axiom 2 holds because ⟨f, g⟩ = ⟨g, f⟩ = ∫₋₁¹ f(x)g(x) dx.
Axiom 3 holds because if h(x) = c₀ + c₁x + c₂x² then:
⟨f + h, g⟩ = ∫₋₁¹ (f(x) + h(x))g(x) dx = ∫₋₁¹ (f(x)g(x) + h(x)g(x)) dx = ∫₋₁¹ f(x)g(x) dx + ∫₋₁¹ h(x)g(x) dx = ⟨f, g⟩ + ⟨h, g⟩
Axiom 4 holds because, if k ∈ ℝ:
⟨kf, g⟩ = ∫₋₁¹ kf(x)g(x) dx = k ∫₋₁¹ f(x)g(x) dx = k⟨f, g⟩
Axiom 5 holds because:
⟨f, f⟩ = ∫₋₁¹ [f(x)]² dx ≥ 0
since the integral of a non-negative function over any interval is a non-negative value. A
theorem of calculus shows that the only way this integral can be equal to zero, when f is
continuous on the interval [−1, 1], is when f(x) is equal to zero for every x in the interval.
Note: This proof also works for any vector space of functions for which the integrals exist,
such as the vector space of all functions continuous on [−1, 1]. The proof also works if
different limits of integration are used to define the inner product formula.

(b) ‖p(x)‖ = √⟨p, p⟩ = √(∫₋₁¹ [p(x)]² dx), and this is given by:
√(∫₋₁¹ (2 − 3x + 4x²)² dx) = √(∫₋₁¹ (16x⁴ − 24x³ + 25x² − 12x + 4) dx)
= √([16x⁵/5 − 6x⁴ + 25x³/3 − 6x² + 4x] evaluated from −1 to 1)
= √(466/15)

(c) q(x) = q₀ + q₁x + q₂x² is perpendicular to p(x) = p₀ + p₁x + p₂x² = 1 if:
0 = ⟨p, q⟩ = ∫₋₁¹ p(x)q(x) dx = ∫₋₁¹ (q₀ + q₁x + q₂x²) dx = [q₀x + q₁x²/2 + q₂x³/3] evaluated from −1 to 1 = 2q₀ + (2/3)q₂
so that 0 = q₀ + q₂/3. Hence, substituting q₀ = −q₂/3, the polynomials perpendicular to p(x) = 1 are
q(x) = −q₂/3 + q₁x + q₂x², for any real values q₁, q₂.
7. Let u = (1, 1), v = (0, 1). First compute:
AᵀA = [[2, 0], [3, 1]] [[2, 3], [0, 1]] = [[4, 6], [6, 10]]

(a) ‖u‖ = √⟨u, u⟩ = √(uᵀAᵀAu), and this is given by:
uᵀAᵀAu = [1, 1] [[4, 6], [6, 10]] [1, 1]ᵀ = 26
Hence, ‖u‖ = √26. Similarly, ‖v‖ = √10.

(b) This is given by:
⟨u, v⟩ = uᵀAᵀAv = [1, 1] [[4, 6], [6, 10]] [0, 1]ᵀ = 16

(c) ‖u − v‖ = √⟨u − v, u − v⟩, and ⟨u − v, u − v⟩ is given by:
⟨u − v, u − v⟩ = (u − v)ᵀAᵀA(u − v) = [1, 0] [[4, 6], [6, 10]] [1, 0]ᵀ = 4
Hence, the distance from (1, 1) to (0, 1) is √4 = 2.

(d) The angle between u and v is given by:
cos θ = ⟨u, v⟩/(‖u‖ ‖v‖) = 16/(√26 √10) = 8/√65
The approximate angle (using a calculator) is θ = arccos(8/√65) ≈ 0.12435 radians, or about
7.1247 degrees. Note: This has no relationship to the angle calculated using the Euclidean
inner product, which is π/4 or 45 degrees.
8. The function ⟨u, v⟩ = uᵀAᵀAv is defined for all vectors u, v ∈ ℝ² and is a real number (since
the sizes of the four factors of the product are compatible: 1 × 2, 2 × 3, 3 × 2, 2 × 1, and the result
has size 1 × 1). Hence, Axiom 1 is satisfied.
Axiom 2 is satisfied because ⟨v, u⟩ = vᵀAᵀAu = (vᵀAᵀAu)ᵀ (since the transpose of a number
is the same number). Hence:
⟨v, u⟩ = vᵀAᵀAu = (vᵀAᵀAu)ᵀ = uᵀ(AᵀA)v = ⟨u, v⟩
Axiom 3 is satisfied since:
⟨u + w, v⟩ = (u + w)ᵀ(AᵀA)v = uᵀAᵀAv + wᵀ(AᵀA)v = ⟨u, v⟩ + ⟨w, v⟩
Axiom 4 is satisfied since, for k ∈ ℝ:
⟨ku, v⟩ = (ku)ᵀAᵀAv = k(uᵀAᵀAv) = k⟨u, v⟩
Axiom 5 is satisfied since:
⟨v, v⟩ = vᵀAᵀAv = (Av)ᵀ(Av) = ‖Av‖² (standard Euclidean norm)
and so ⟨v, v⟩ = ‖Av‖² ≥ 0. Furthermore, Av = 0 only if:
[[1, 0], [0, 1], [1, 1]] [v₁, v₂]ᵀ = (v₁, v₂, v₁ + v₂) = (0, 0, 0)
which shows v = 0.
Hence, all five axioms are satisfied and so the function is an inner product.


9. ‖u‖ = 1 if, and only if, ‖u‖² = 1, which is 2u₁² + 3u₂² = 1. In the standard Euclidean axis system,
this is an ellipse.

10. Any vector/polynomial p(x) = p₀ + p₁x + p₂x² can be expressed in terms of the basis by:
p(x) = (p₀ − p₂)·1 + p₁·x + p₂·(1 + x²)
Hence, by Theorem 5.1 the inner product defined as the dot product of coordinates with respect to this basis is:
⟨p, q⟩ = (p₀ − p₂, p₁, p₂) · (q₀ − q₂, q₁, q₂) = (p₀ − p₂)(q₀ − q₂) + p₁q₁ + p₂q₂
Hence, the inner product is:
⟨p, q⟩ = p₀q₀ − p₀q₂ + p₁q₁ − p₂q₀ + 2p₂q₂

11.
(a) We could prove that each of the five axioms holds for this formula. However, we can avoid this
by using the method of Example 5.5.8. Note that the matrix shape plays no role in the
formula for ⟨A, B⟩. The formula is exactly the same as a weighted (weights 1, 2, 3, and 4)
Euclidean inner product on ℝ⁴, where the matrix entries are re-written as vectors:
A ↔ (a₁₁, a₁₂, a₂₁, a₂₂),  B ↔ (b₁₁, b₁₂, b₂₁, b₂₂)
Since a weighted Euclidean inner product (with positive weights) is always an inner product, the matrix formula
⟨A, B⟩ is also an inner product.

(b) For the matrix A = [[2, 3], [0, 1]] of question 7:
⟨A, A⟩ = 1·2² + 2·3² + 3·0² + 4·1² = 26, so ‖A‖ = √26.

(c) ⟨C, D⟩ = 1·1·0 + 2·0·3 + 3·0·1 + 4·1·0 = 0

(d) ⟨C − D, C − D⟩ = ⟨[[1, −3], [−1, 1]], [[1, −3], [−1, 1]]⟩ = 1·1² + 2·(−3)² + 3·(−1)² + 4·1² = 26. Hence:
‖C − D‖ = √⟨C − D, C − D⟩ = √26

(e) If θ is the angle between C and D then:
cos θ = ⟨C, D⟩/(‖C‖ ‖D‖) = 0, since ⟨C, D⟩ = 0
Hence, θ = π/2 (90 degrees). The matrices are perpendicular to each other.

12. The Cauchy-Schwarz result (Theorem 5.3) is: |⟨u, v⟩| ≤ ‖u‖ ‖v‖.

(a) If u = (u₁, u₂), v = (v₁, v₂), with the inner product being the standard scalar or dot product, then
⟨u, v⟩ = u₁v₁ + u₂v₂ and ‖u‖ = √(u₁² + u₂²), ‖v‖ = √(v₁² + v₂²). Applying these in the
Cauchy-Schwarz formula gives the required result:
|u₁v₁ + u₂v₂| ≤ √(u₁² + u₂²) √(v₁² + v₂²)

(b) The proof is very similar to part (a). Try this for yourself.

(c) The function ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx is an inner product for the vector space of functions
continuous on [0, 1] (see the proof of question 6(a)). With this inner product,
‖f‖ = √(∫₀¹ [f(x)]² dx), ‖g‖ = √(∫₀¹ [g(x)]² dx), and the Cauchy-Schwarz result,
|⟨f, g⟩| ≤ ‖f‖ ‖g‖, becomes:
|∫₀¹ f(x)g(x) dx| ≤ √(∫₀¹ [f(x)]² dx) √(∫₀¹ [g(x)]² dx)
Both sides are non-negative, so squaring both sides gives the required result:
(∫₀¹ f(x)g(x) dx)² ≤ (∫₀¹ [f(x)]² dx)(∫₀¹ [g(x)]² dx)

(d) This result is simply the triangle inequality (see Theorem 5.4, part (c)),
‖f + g‖ ≤ ‖f‖ + ‖g‖
written out for the same integral inner product.

13. By the definition of the norm in terms of the inner product and the axioms of the inner product:
¼‖u + v‖² − ¼‖u − v‖² = ¼⟨u + v, u + v⟩ − ¼⟨u − v, u − v⟩
= ¼[⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩] − ¼[⟨u, u⟩ − 2⟨u, v⟩ + ⟨v, v⟩]
= ⟨u, v⟩

14. One version of Pythagoras Theorem (Theorem 5.5) states that ‖u‖² + ‖v‖² = ‖u + v‖² when
⟨u, v⟩ = 0. Replace u by p − r and v by r − q, so that u + v is replaced by p − q and ⟨u, v⟩ = 0 is
replaced by ⟨p − r, r − q⟩ = 0, thus giving the required result:
‖p − r‖² + ‖r − q‖² = ‖p − q‖²

15. Proof of the "if" part: if v = 0 then ⟨v, v⟩ = 0 by Axiom 5. Hence, ‖v‖ = √⟨v, v⟩ = 0.
Proof of the "only if" part: if ‖v‖ = 0 then ‖v‖² = ⟨v, v⟩ = 0. Axiom 5 states that ⟨v, v⟩ = 0 only
when v = 0 and so the result follows.

16.
(a) The set S of vectors in V perpendicular to a particular vector u is a subset of the vector space
V. To prove S is a vector space it is only necessary to prove it is closed under addition and
scalar multiplication (see Theorem 2.3 of Unit 2, Section 3: Subspaces of a Vector Space).
Suppose k ∈ ℝ and v, w are any two vectors in S, so ⟨v, u⟩ = ⟨w, u⟩ = 0. Using the axioms
of the inner product:
⟨v + w, u⟩ = ⟨v, u⟩ + ⟨w, u⟩ = 0 + 0 = 0
⟨kv, u⟩ = k⟨v, u⟩ = k · 0 = 0
Hence, S is closed under addition and scalar multiplication and so it is a vector space (a
subspace of V). Note: S must also be an inner product space, since the inner product of V is
also an inner product for S.

(b) If w satisfies ⟨v₁, w⟩ = 0, ⟨v₂, w⟩ = 0, …, ⟨vₖ, w⟩ = 0 then, by the axioms of inner
products, for any scalars r₁, r₂, …, rₖ:
⟨r₁v₁ + r₂v₂ + ⋯ + rₖvₖ, w⟩ = ⟨r₁v₁, w⟩ + ⟨r₂v₂, w⟩ + ⋯ + ⟨rₖvₖ, w⟩ = r₁⟨v₁, w⟩ + r₂⟨v₂, w⟩ + ⋯ + rₖ⟨vₖ, w⟩ = 0
Hence, w is perpendicular to every vector in the linear span of T.

(c) As in part (a), it is only necessary to prove that S is closed under addition and scalar
multiplication. The proof is very similar to the proof of part (a) and is not given here.

(d) The vector w is perpendicular to all of the basis vectors, and w is also a linear combination
of the basis vectors. However, w is orthogonal to all linear combinations of the basis vectors,
by part (b), and so w is perpendicular to itself:
0 = ⟨w, w⟩
By Axiom 5 for inner products, it follows that w = 0.

17. The triangle inequality for any two vectors u, v ∈ V states that ‖u + v‖ ≤ ‖u‖ + ‖v‖. Writing
u = v₁ + v₂ and v = v₃ changes this to:
‖(v₁ + v₂) + v₃‖ ≤ ‖v₁ + v₂‖ + ‖v₃‖
Applying the triangle inequality a second time, to v₁, v₂, shows that ‖v₁ + v₂‖ ≤ ‖v₁‖ + ‖v₂‖, and
so the above inequality becomes the required result:
‖v₁ + v₂ + v₃‖ ≤ ‖v₁‖ + ‖v₂‖ + ‖v₃‖

5.6 Orthogonal bases, the Gram-Schmidt process and QR factorization

We have previously seen that a basis of a vector space can be used to develop most processes and
properties of interest. Usually it does not matter which particular basis is used, but sometimes a special
basis is easier to use than other bases. In particular, if the vector space is an inner product space then
it is often very advantageous to work with an orthogonal basis (each basis vector is perpendicular to
every other basis vector). Orthogonal bases are required in some applications, such as the method for
diagonalizing a matrix using eigenvalues and eigenvectors (see Unit 4 Eigenvalues, Eigenvectors, and
Diagonalization of Matrices, Diagonalizing a Matrix).
In this section an algorithm, the Gram-Schmidt process, is described for converting any
non-orthogonal basis of an inner product space into an orthogonal basis. If the vectors of the original
basis are the columns of a matrix A, then the Gram-Schmidt process is shown to be equivalent to
finding a QR factorization, A = QR, where Q is an orthogonal matrix and R is upper triangular. The
QR factorization is used in the QR algorithm, one of the most successful numerical methods for
finding the eigenvalues of a matrix (see Unit 4 Eigenvalues, Eigenvectors and Diagonalization of Matrices,
Methods for finding Eigenvalues and Eigenvectors).
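Note: NumPy computes a QR factorization directly, which you can use later in this section to check hand calculations. The sketch below is illustrative only; the particular 2 × 2 matrix A is just the basis of Example 5.6.1 written as columns.

```python
import numpy as np

# Columns of A are the (linearly independent) basis vectors to be orthogonalized.
A = np.array([[1.0, 1.0],
              [1.0, 0.0]])

Q, R = np.linalg.qr(A)   # A = Q R, Q has orthonormal columns, R is upper triangular
print(Q)
print(R)
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: columns of Q are orthonormal
print(np.allclose(Q @ R, A))             # True: the factorization reproduces A
```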

5.6.1 The Gram-Schmidt process

Definition.
A set of vectors {v₁, v₂, v₃, …, vₙ} of an inner product space is said to be orthogonal if the vectors
are mutually orthogonal, meaning:
⟨vᵢ, vⱼ⟩ = 0 for i ≠ j, where i, j = 1, 2, 3, …, n
The set {v₁, v₂, v₃, …, vₙ} is said to be orthonormal if it is orthogonal and all vectors are unit
vectors. That is, for i, j = 1, 2, 3, …, n:
⟨vᵢ, vᵢ⟩ = 1 and ⟨vᵢ, vⱼ⟩ = 0 if i ≠ j

Note: A vector v is a unit vector if its length is 1, meaning ‖v‖ = 1. Since ‖v‖² = ⟨v, v⟩, and so
‖v‖ = √⟨v, v⟩, it follows that v is a unit vector exactly when ⟨v, v⟩ = 1.
Note: An orthogonal set of (non-zero) vectors can always be converted to an orthonormal set by simply converting
each vector to a unit vector, dividing it by its length (change each vector v to the vector
(1/‖v‖)v = (1/√⟨v, v⟩)v).

The Gram-Schmidt process, or algorithm, uses a known basis of an inner product space to construct
an orthogonal basis. The next two examples show how this process works in simple cases, and
Theorem 5.8 gives the general process. The process makes extensive use of the formula, derived in the
previous section, for the orthogonal projection of one vector u onto another vector v:
proj_v u = (⟨u, v⟩/‖v‖²) v = (⟨u, v⟩/⟨v, v⟩) v   (orthogonal projection of u onto v)
The formula for the projection of u onto a vector v in Euclidean spaces was developed in Unit 3, Linear
Transformations from ℝⁿ to ℝᵐ, in the subsection Projection Operators ℝ² → ℝ², and is the same
formula with the scalar product as the inner product:
proj_v u = ((u · v)/‖v‖²) v = ((u · v)/(v · v)) v
Example 5.6.1.
Given the basis {u₁, u₂} = {(1, 1), (1, 0)} of ℝ², construct an orthogonal basis {v₁, v₂}.

Solution. Note that the inner product here is the normal scalar product, and the existing basis is not
orthogonal, since
u₁ · u₂ = (1, 1) · (1, 0) = 1 ≠ 0
First we will compute an orthogonal basis.
Step 1. Choose arbitrarily v₁ as one of the basis vectors, say:
v₁ = u₁ = (1, 1)
Step 2. Choose v₂ as the second original basis vector, u₂, minus the projection of u₂ onto v₁:
v₂ = u₂ − ((u₂ · v₁)/‖v₁‖²) v₁
= (1, 0) − (((1, 0) · (1, 1))/2) (1, 1)
= (1, 0) − ½(1, 1)
= (1/2, −1/2)
Multiplying v₂ by 2 to simplify it, without changing the orthogonality, gives the orthogonal basis
{v₁, v₂} = {(1, 1), (1, −1)}. Note: Normalizing the vectors gives the orthonormal basis:
{v₁, v₂} = {(1/√2, 1/√2), (1/√2, −1/√2)}
Note: Check for yourself that these are orthogonal.


Note: The process used here is followed in more general cases. Start with one of the original basis
vectors, then modify the second one by subtracting its projection on the first vector. In the next
example, with three vectors, the third original basis vector is modified by subtracting its projections on
the first two vectors of the orthogonal basis.

Example 5.6.2.
Given the basis {u₁, u₂, u₃} = {(1, 1, 1), (1, 0, −1), (2, −3, 4)} of ℝ³, use the Gram-Schmidt process to
construct an orthonormal basis {v₁, v₂, v₃}.

Solution. Note that the inner product here is the normal scalar product, and the existing basis is
clearly not orthogonal. It is easiest to construct an orthogonal basis first, then normalize the basis (make
the vectors into unit vectors) afterwards. Start as in the previous example:
Step 1. Choose arbitrarily v₁ as one of the basis vectors, say:
v₁ = u₁ = (1, 1, 1)
Step 2. You might notice that u₂ is already orthogonal to v₁ and so we can choose
v₂ = u₂ = (1, 0, −1). If you did not notice this then the solution process gives the same result, as
follows. Choose v₂ as the second original basis vector, u₂, minus the projection of u₂ onto v₁:
v₂ = u₂ − ((u₂ · v₁)/‖v₁‖²) v₁
= (1, 0, −1) − (((1, 0, −1) · (1, 1, 1))/((1, 1, 1) · (1, 1, 1))) (1, 1, 1)
= (1, 0, −1) − (0/3)(1, 1, 1)
= (1, 0, −1)
Step 3. In order to find v₃ apply the method of Step 2 to u₃, but this time subtract off the projections of
u₃ onto both v₁ and v₂:
v₃ = u₃ − ((u₃ · v₁)/‖v₁‖²) v₁ − ((u₃ · v₂)/‖v₂‖²) v₂
= (2, −3, 4) − (((2, −3, 4) · (1, 1, 1))/((1, 1, 1) · (1, 1, 1))) (1, 1, 1) − (((2, −3, 4) · (1, 0, −1))/((1, 0, −1) · (1, 0, −1))) (1, 0, −1)
= (2, −3, 4) − (3/3)(1, 1, 1) − (−2/2)(1, 0, −1)
= (2, −4, 2)
Hence, the orthogonal basis is {v₁, v₂, v₃} = {(1, 1, 1), (1, 0, −1), (2, −4, 2)}.

Note: Check for yourself that this set is orthogonal. Notice that v₃ = (2, −4, 2) can be simplified, by
dividing by 2, to give v₃ = (1, −2, 1), and the set is still orthogonal.
Note: Normalizing the orthogonal basis gives an orthonormal basis:
{v₁, v₂, v₃} = {(1/√3, 1/√3, 1/√3), (1/√2, 0, −1/√2), (1/√6, −2/√6, 1/√6)}
The general method for producing an orthogonal basis is given next in Theorem 5.8. It is a simple
extension of the process in Examples 5.6.1 and 5.6.2 above, except that a dot product like u₂ · v₁ is
replaced by the inner product ⟨u₂, v₁⟩.

Theorem 5.8. The Gram-Schmidt Process. If {u₁, u₂, u₃, …, uₘ} is a set of linearly independent
vectors spanning a subspace S of an inner product space V, then an orthogonal set of vectors
{v₁, v₂, v₃, …, vₘ}, spanning the same subspace S, is produced by the following process
(algorithm):
Step 1. Set v₁ = u₁.
Step 2. Set v₂ = u₂ − (⟨u₂, v₁⟩/‖v₁‖²) v₁.
Step 3. Set v₃ = u₃ − (⟨u₃, v₁⟩/‖v₁‖²) v₁ − (⟨u₃, v₂⟩/‖v₂‖²) v₂.
Step 4. Set v₄ = u₄ − (⟨u₄, v₁⟩/‖v₁‖²) v₁ − (⟨u₄, v₂⟩/‖v₂‖²) v₂ − (⟨u₄, v₃⟩/‖v₃‖²) v₃,
and continue this pattern until step m is reached:
Step m. Set vₘ = uₘ − (⟨uₘ, v₁⟩/‖v₁‖²) v₁ − (⟨uₘ, v₂⟩/‖v₂‖²) v₂ − (⟨uₘ, v₃⟩/‖v₃‖²) v₃ − ⋯ − (⟨uₘ, vₘ₋₁⟩/‖vₘ₋₁‖²) vₘ₋₁.

Note: In the equations above we can replace ‖vⱼ‖² by ⟨vⱼ, vⱼ⟩ for each j = 1, 2, …, m, and an
orthonormal basis is obtained by changing each vⱼ into the unit vector (1/‖vⱼ‖)vⱼ = (1/√⟨vⱼ, vⱼ⟩)vⱼ.
Note: If {u₁, u₂, u₃, …, uₘ} is a basis of V then {v₁, v₂, v₃, …, vₘ} will be an orthogonal basis of
V.
Proof. Consult your textbook or other source for a proof of this result. The proof is conceptually simple
but rather messy and confusing.
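Note: The steps of Theorem 5.8 translate into a short loop. The sketch below is an illustration only (the function name gram_schmidt is not from the textbook); it defaults to the Euclidean dot product but accepts any inner product supplied as a Python function.

```python
import numpy as np

def gram_schmidt(vectors, ip=np.dot):
    """Return an orthogonal list of vectors spanning the same subspace as `vectors`.

    `ip` is the inner product to use; it defaults to the Euclidean dot product.
    """
    ortho = []
    for u in vectors:
        v = np.array(u, dtype=float)
        for w in ortho:
            v = v - (ip(v, w) / ip(w, w)) * w   # subtract the projection onto each earlier w
        ortho.append(v)
    return ortho

# Example 5.6.2 revisited: the basis {(1,1,1), (1,0,-1), (2,-3,4)}.
basis = [(1, 1, 1), (1, 0, -1), (2, -3, 4)]
for v in gram_schmidt(basis):
    print(v)          # (1,1,1), (1,0,-1), (2,-4,2), up to floating point
```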

The Gram-Schmidt process can be written in a slightly different and, in some ways, simpler form in
order to directly produce an orthonormal basis, as in the next theorem.

Theorem 5.9. If {u₁, u₂, u₃, …, uₘ} is a set of linearly independent vectors spanning a subspace S
of an inner product space V, then an orthonormal set of vectors {w₁, w₂, w₃, …, wₘ}, spanning
the same subspace S, is produced by the following process (algorithm):
Step 1. Set v₁ = u₁ and define w₁ = (1/‖v₁‖)v₁. That is, normalize the vector v₁ so it has length one by
dividing by ‖v₁‖ = √⟨v₁, v₁⟩.
Step 2. Set v₂ = u₂ − ⟨u₂, w₁⟩w₁ and define w₂ = (1/‖v₂‖)v₂.
Step 3. Set v₃ = u₃ − ⟨u₃, w₁⟩w₁ − ⟨u₃, w₂⟩w₂ and define w₃ = (1/‖v₃‖)v₃.
Step 4. Set v₄ = u₄ − ⟨u₄, w₁⟩w₁ − ⟨u₄, w₂⟩w₂ − ⟨u₄, w₃⟩w₃ and define w₄ = (1/‖v₄‖)v₄,
and continue this pattern until step m is reached.
Step m. Set vₘ = uₘ − ⟨uₘ, w₁⟩w₁ − ⟨uₘ, w₂⟩w₂ − ⟨uₘ, w₃⟩w₃ − ⋯ − ⟨uₘ, wₘ₋₁⟩wₘ₋₁, and
define wₘ = (1/‖vₘ‖)vₘ.

Proof. In Step 2 of Theorem 5.8, show that the formula there is the same as the one used here by
verifying that:
v₂ = u₂ − (⟨u₂, v₁⟩/‖v₁‖²) v₁ = u₂ − ⟨u₂, v₁/‖v₁‖⟩ (v₁/‖v₁‖) = u₂ − ⟨u₂, w₁⟩w₁
Show that the other formulae are also the same by following the same method.
Example 5.6.3.
The set of polynomials {g₁(x), g₂(x), g₃(x), g₄(x)} = {1, 1 + x, 1 − 2x², x + x³} is a basis of P₃,
the vector space of all polynomials of degree 3 or less. P₃ is an inner product space with the inner
product computed as the scalar product of the coefficients of the polynomials (see the previous section of
this unit for details):
⟨a₀ + a₁x + a₂x² + a₃x³, b₀ + b₁x + b₂x² + b₃x³⟩ = a₀b₀ + a₁b₁ + a₂b₂ + a₃b₃
Use the Gram-Schmidt process to find an orthogonal basis {f₁, f₂, f₃, f₄} for P₃.

Solution.
Step 1: Define f₁(x) = g₁(x) = 1.
Step 2: Define
f₂(x) = g₂(x) − (⟨g₂, f₁⟩/⟨f₁, f₁⟩) f₁(x) = 1 + x − (⟨1 + x, 1⟩/⟨1, 1⟩)·1 = 1 + x − 1 = x
Step 3: Define
f₃(x) = g₃(x) − (⟨g₃, f₁⟩/⟨f₁, f₁⟩) f₁(x) − (⟨g₃, f₂⟩/⟨f₂, f₂⟩) f₂(x)
= 1 − 2x² − (⟨1 − 2x², 1⟩/⟨1, 1⟩)·1 − (⟨1 − 2x², x⟩/⟨x, x⟩)·x
= 1 − 2x² − 1 − (0/1)·x
= −2x²
Step 4: Define
f₄(x) = g₄(x) − (⟨g₄, f₁⟩/⟨f₁, f₁⟩) f₁(x) − (⟨g₄, f₂⟩/⟨f₂, f₂⟩) f₂(x) − (⟨g₄, f₃⟩/⟨f₃, f₃⟩) f₃(x)
= x + x³ − (⟨x + x³, 1⟩/⟨1, 1⟩)·1 − (⟨x + x³, x⟩/⟨x, x⟩)·x − (⟨x + x³, −2x²⟩/⟨−2x², −2x²⟩)·(−2x²)
= x + x³ − (0/1)·1 − (1/1)·x − (0/4)·(−2x²)
= x³
Hence, the orthogonal basis is {1, x, −2x², x³}.

Note: We can divide the third polynomial by −2 without changing the orthogonality, thus giving the
standard basis of P₃:
{1, x, x², x³}
In fact we could have done this at Step 3 when we found f₃(x) = −2x², thus simplifying Step 4 slightly.
Satisfy yourself that this basis is orthogonal.
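Note: Because this inner product is just the dot product of coefficient vectors, the gram_schmidt sketch given after Theorem 5.8 reproduces this example when applied to the coefficient vectors. This assumes that sketch is available in the same Python session; the coefficient ordering (constant, x, x², x³) is a choice made for this illustration.

```python
import numpy as np

# Coefficient vectors (constant, x, x^2, x^3) of the basis in Example 5.6.3.
g = [np.array(c, dtype=float) for c in
     [(1, 0, 0, 0), (1, 1, 0, 0), (1, 0, -2, 0), (0, 1, 0, 1)]]

for f in gram_schmidt(g):      # gram_schmidt as sketched after Theorem 5.8
    print(f)                   # (1,0,0,0), (0,1,0,0), (0,0,-2,0), (0,0,0,1)
```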


Example 5.6.4.
Requires calculus. The set of functions f(x) that are continuous on an interval [a, b], written C[a, b],
is an infinite dimensional vector space and is an inner product space with the inner product defined
as the integral of the product of the two functions between a and b:

    ⟨f, g⟩ = ∫ₐᵇ f(x) g(x) dx

This is also an inner product for the subspaces of C[a, b] of polynomials Pn, n = 1, 2, 3, …. Given
a = −1, b = 1 and the four functions g1(x) = 1, g2(x) = x, g3(x) = x², g4(x) = x³, find four
orthogonal functions, f1, f2, f3, f4, that span the same subspace.

Solution. The following is the solution:

Step 1: Define f1(x) = g1(x) = 1

Step 2: Define:
    f2(x) = g2(x) − (⟨g2, f1⟩/⟨f1, f1⟩) f1(x) = x − (∫₋₁¹ x dx / ∫₋₁¹ dx)·1 = x − (0/2)·1
    f2(x) = x

Step 3: Define:
    f3(x) = g3(x) − (⟨g3, f1⟩/⟨f1, f1⟩) f1(x) − (⟨g3, f2⟩/⟨f2, f2⟩) f2(x)
          = x² − (∫₋₁¹ x² dx / ∫₋₁¹ dx)·1 − (∫₋₁¹ x³ dx / ∫₋₁¹ x² dx)·x
          = x² − ((2/3)/2)·1 − (0/(2/3))·x
    f3(x) = x² − 1/3

Step 4: Define:
    f4(x) = g4(x) − (⟨g4, f1⟩/⟨f1, f1⟩) f1(x) − (⟨g4, f2⟩/⟨f2, f2⟩) f2(x) − (⟨g4, f3⟩/⟨f3, f3⟩) f3(x)
          = x³ − (∫₋₁¹ x³ dx / ∫₋₁¹ dx)·1 − (∫₋₁¹ x⁴ dx / ∫₋₁¹ x² dx)·x − (∫₋₁¹ x³(x² − 1/3) dx / ∫₋₁¹ (x² − 1/3)² dx)·(x² − 1/3)
          = x³ − (0/2)·1 − ((2/5)/(2/3))·x − 0·(x² − 1/3)
    f4(x) = x³ − (3/5)x

Note: The orthogonal polynomials f1(x) = 1, f2(x) = x, f3(x) = x² − 1/3, f4(x) = x³ − (3/5)x are
multiples of the first four Legendre Polynomials, which are:

    p0(x) = 1, p1(x) = x, p2(x) = (1/2)(3x² − 1), p3(x) = (1/2)(5x³ − 3x)

These multiples are chosen so that pk(1) = 1, so they are an orthogonal set but are not normalized
with respect to the inner product and so do not form an orthonormal set. Legendre Polynomials are
important in some Engineering applications and more details can be found on the web at
http://en.wikipedia.org/wiki/Legendre_polynomials
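A symbolic check of this computation (again only a sketch, using SymPy rather than anything from the course) runs the same subtractions with the integral inner product:

```python
import sympy as sp

x = sp.symbols('x')
inner = lambda f, g: sp.integrate(f * g, (x, -1, 1))   # <f, g> on [-1, 1]

gs = [sp.Integer(1), x, x**2, x**3]
fs = []
for g in gs:
    f = g
    for p in fs:                      # subtract the projection onto each earlier orthogonal polynomial
        f = f - inner(g, p) / inner(p, p) * p
    fs.append(sp.expand(f))

print(fs)   # [1, x, x**2 - 1/3, x**3 - 3*x/5]
```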

5.6.2 Uses of orthogonal and orthonormal bases
When vectors are expressed as linear combinations of orthogonal basis vectors, many operations are
much easier to carry out, as is shown in the following theorems. If the basis is orthonormal, then it
becomes even easier, and vectors with respect to this orthonormal basis interact very like vectors in
Euclidean spaces, Rn . The first theorem notes the fairly obvious fact that orthogonal sets of vectors
must be linearly independent.
Theorem 5.10. If S = {v1 , v2 , v3 , , vn } is an orthogonal set of (non-zero) vectors in an inner
product space then the vectors are linearly independent.
Note: Hence, if S is produced by the Gram-Schmidt process applied to a basis of an inner product
space V , then S must also be a basis of V. Hence, every inner product space has an orthogonal basis.
Proof. If the vectors are linearly dependent then a non-zero linear combination of the vectors gives the
zero vector:
    k1 v1 + k2 v2 + k3 v3 + ⋯ + kn vn = 0
Take the inner product with v1, using the axioms:
    ⟨(k1 v1 + k2 v2 + k3 v3 + ⋯ + kn vn), v1⟩ = ⟨0, v1⟩
    k1 ⟨v1, v1⟩ + k2 ⟨v2, v1⟩ + k3 ⟨v3, v1⟩ + ⋯ + kn ⟨vn, v1⟩ = 0
    k1 ⟨v1, v1⟩ = 0
This shows that k1 = 0, since ⟨v1, v1⟩ = ‖v1‖² ≠ 0. Repeating the above process, taking inner products
with v2, v3, …, vn, shows that all of the coefficients are zero:
    k1 = k2 = k3 = ⋯ = kn = 0
Hence, there is no linear combination of the vectors giving the zero vector except the trivial one with all
coefficients equal to zero, and so the vectors are linearly independent.

Theorem 5.11. If B = {b1, b2, b3, …, bn} is an orthonormal basis of an inner product space and
vectors u, v have coordinates with respect to this basis:
    (u)B = (u1, u2, u3, …, un) and (v)B = (v1, v2, v3, …, vn)
(that is: u = u1 b1 + u2 b2 + u3 b3 + ⋯ + un bn and similarly for v), then:

(a)  ⟨u, v⟩ = u1 v1 + u2 v2 + u3 v3 + ⋯ + un vn

(b)  ‖u‖ = √(u1² + u2² + u3² + ⋯ + un²)

(c)  d(u, v) = √((u1 − v1)² + (u2 − v2)² + (u3 − v3)² + ⋯ + (un − vn)²)

Note: These quantities do not depend on the values of the basis vectors, but only on the coordinates
relative to the basis, and are exactly the same formulae as for Euclidean vectors in Rn with respect to
the standard basis of Rn.

Proof. The following is the proof:

(a)  When n = 2: Using the linearity of the inner product:
         ⟨u, v⟩ = ⟨u1 b1 + u2 b2, v1 b1 + v2 b2⟩
                = u1 v1 ⟨b1, b1⟩ + u1 v2 ⟨b1, b2⟩ + u2 v1 ⟨b2, b1⟩ + u2 v2 ⟨b2, b2⟩
     Using the orthonormality ⟨b1, b1⟩ = ⟨b2, b2⟩ = 1, and ⟨b1, b2⟩ = ⟨b2, b1⟩ = 0, this simplifies to:
         ⟨u, v⟩ = u1 v1 + u2 v2

Note: Try yourself the same proof for n = 3, and attempt to generalize your proof for all values of n.
Note: Try yourself to prove (b) and (c) for the cases n = 2, 3 and for all n.
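A small numerical illustration of Theorem 5.11 in R³ (our own sketch, with an arbitrary orthonormal basis): the coordinates of u and v with respect to the orthonormal basis reproduce the dot product and the norm computed directly.

```python
import numpy as np

# an orthonormal basis of R^3 (any orthonormal basis would do)
b1 = np.array([1.0, 0.0, 0.0])
b2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
b3 = np.array([0.0, 1.0, -1.0]) / np.sqrt(2)
B = np.column_stack([b1, b2, b3])          # orthogonal matrix, so coordinates are B.T @ vector

u, v = np.array([1.0, 2.0, 3.0]), np.array([-1.0, 0.0, 4.0])
uc, vc = B.T @ u, B.T @ v                  # coordinates with respect to {b1, b2, b3}

print(np.dot(u, v), np.dot(uc, vc))            # same inner product (Theorem 5.11(a))
print(np.linalg.norm(u), np.linalg.norm(uc))   # same norm (Theorem 5.11(b))
```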

Theorem 5.12. If B = {v1, v2, v3, …, vn} is an orthogonal basis of an inner product space and a
vector u has coordinates (u)B = (u1, u2, u3, …, un) with respect to this basis then:

    u1 = ⟨u, v1⟩/‖v1‖², u2 = ⟨u, v2⟩/‖v2‖², …, un = ⟨u, vn⟩/‖vn‖², so

    u = (⟨u, v1⟩/‖v1‖²) v1 + (⟨u, v2⟩/‖v2‖²) v2 + ⋯ + (⟨u, vn⟩/‖vn‖²) vn

If the basis is also orthonormal then:

    u1 = ⟨u, v1⟩, u2 = ⟨u, v2⟩, …, un = ⟨u, vn⟩ so that u = ⟨u, v1⟩ v1 + ⟨u, v2⟩ v2 + ⋯ + ⟨u, vn⟩ vn

Note: Recall that (⟨u, v1⟩/‖v1‖²) v1 is the formula for the perpendicular projection of the vector u onto the
vector v1. Hence, the coordinates with respect to an orthogonal basis are given by the perpendicular
projections onto the basis vectors. Recall also that for any vector v, ‖v‖² = ⟨v, v⟩.

Proof. Proof for n = 2: Suppose that u = u1 v1 + u2 v2, then taking the inner product with v1 and using
the linearity axioms:
    ⟨u, v1⟩ = ⟨u1 v1 + u2 v2, v1⟩
            = u1 ⟨v1, v1⟩ + u2 ⟨v2, v1⟩
    ⟨u, v1⟩ = u1 ⟨v1, v1⟩
Hence, solving for u1:
    u1 = ⟨u, v1⟩/⟨v1, v1⟩ = ⟨u, v1⟩/‖v1‖²
Taking the inner product of u with v2 gives in the same way the formula for u2:
    u2 = ⟨u, v2⟩/⟨v2, v2⟩ = ⟨u, v2⟩/‖v2‖²

Note: Try yourself the proof for n = 3, and think about the generalization to the proof for all n.

Generalizing the previous theorem to projections onto a subspace of an inner product space gives the
following result.

Theorem 5.13. Suppose that W is an r-dimensional subspace with orthogonal basis
B = {v1, v2, …, vr} of an inner product space V and u ∈ V, then:

(a)  The perpendicular projection of u onto W is given by:

         projW u = (⟨u, v1⟩/‖v1‖²) v1 + (⟨u, v2⟩/‖v2‖²) v2 + ⋯ + (⟨u, vr⟩/‖vr‖²) vr

     or if {v1, v2, …, vr} is orthonormal then:

         projW u = ⟨u, v1⟩ v1 + ⟨u, v2⟩ v2 + ⋯ + ⟨u, vr⟩ vr

(b)  The basis B can be extended to an orthogonal basis of V:
         {v1, v2, …, vr, vr+1, …, vn}
     and the additional basis vectors vr+1, …, vn are a basis of W⊥ (the subspace of all vectors in
     V that are perpendicular to every vector in W).

(c)  Every vector u ∈ V can be expressed in exactly one way:
         u = u1 + u2 where u1 ∈ W and u2 ∈ W⊥

Note: Part (a) states that the projection of u onto W is simply the sum of the projections onto each
individual vector of B (the orthogonal basis of W).

Note: The formula in part (a) for the projection of u onto the space S spanned by the orthogonal set
{v1, v2, …, vr} is exactly the same as the formula from Theorem 5.12 for expressing u as a linear
combination of the {v1, v2, …, vr}. That is, the formula gives that linear combination if it exists (if u
is in the span of S) and otherwise gives the projection of the vector u onto S.

Proof. The following is the proof:

(a)  If w = projW u = k1 v1 + k2 v2 + ⋯ + kr vr then use the fact that w − u is orthogonal to every
     vector in W, first taking the inner product with v1:
         0 = ⟨w − u, v1⟩
         0 = ⟨k1 v1 + k2 v2 + ⋯ + kr vr, v1⟩ − ⟨u, v1⟩
     Using the linearity of the inner product and orthogonality of B:
         0 = k1 ⟨v1, v1⟩ − ⟨u, v1⟩  ⟹  k1 = ⟨u, v1⟩/⟨v1, v1⟩ = ⟨u, v1⟩/‖v1‖²
     The formulae for the other ki values can be established using the same method, computing
     0 = ⟨w − u, vj⟩ for j = 2, 3, …, r.

(b)  (outline only) The basis B = {v1, v2, …, vr} can be extended to a basis of V by first simply
     adding any vector wr+1 not in W, then repeating this by adding another vector not in the span of
     {v1, v2, …, vr, wr+1}, and so on until a basis of V is created. Apply the Gram-Schmidt process
     to this basis, starting with v1, v2, …, vr (i.e., leave them unchanged) to produce an orthogonal
     basis {v1, v2, …, vr, vr+1, …, vn} of V. By orthogonality all of the basis vectors
     vr+1, …, vn are orthogonal to every vector v1, v2, …, vr. Linearity of the inner product shows
     that every vector of the subspace spanned by vr+1, …, vn is perpendicular to every vector
     of the subspace W spanned by v1, v2, …, vr. Furthermore, any vector orthogonal to W can
     easily be shown to belong to the span of vr+1, …, vn, so these vectors are a basis of W⊥.

(c)  This immediately follows from the proof of (b). That is, express u as a (unique) linear combination
     of the basis vectors {v1, v2, …, vr, vr+1, …, vn}. The part involving v1, v2, …, vr will be
     u1 and the part involving vr+1, …, vn will be u2.

Example 5.6.5.
This example has four parts:

(a)  Write the vector u = (1, 2, 3) of R³ as a linear combination of the orthogonal basis vectors:
         v1 = (1, 0, 0), v2 = (0, 1, 1), v3 = (0, 1, −1)
     Note: Check for yourself that this is an orthogonal set of vectors.

(b)  Use Theorem 5.11 to compute ‖u‖ and compare it with the value by the standard calculation.

(c)  Find the projection of u onto the subspace S of R³ spanned by v2 and v3.

(d)  Write u = u1 + u2 where u1 ∈ S and u2 ∈ S⊥.

Solution. The following is the solution:

(a)  We could solve for x, y, z the system of equations formed by equating the components:
         (1, 2, 3) = x(1, 0, 0) + y(0, 1, 1) + z(0, 1, −1)
     However, Theorem 5.12 gives us an easier way to do this when the basis is orthogonal, namely:

         u = (⟨u, v1⟩/‖v1‖²) v1 + (⟨u, v2⟩/‖v2‖²) v2 + (⟨u, v3⟩/‖v3‖²) v3
           = ((1,2,3)·(1,0,0) / (1,0,0)·(1,0,0)) (1, 0, 0) + ((1,2,3)·(0,1,1) / (0,1,1)·(0,1,1)) (0, 1, 1)
             + ((1,2,3)·(0,1,−1) / (0,1,−1)·(0,1,−1)) (0, 1, −1)
         u = (1, 0, 0) + (5/2)(0, 1, 1) − (1/2)(0, 1, −1)

(b)  Converting the basis vectors to unit vectors, the expression for u becomes:
         u = 1·(1, 0, 0) + (5/√2)(0, 1/√2, 1/√2) − (1/√2)(0, 1/√2, −1/√2)
     By Theorem 5.11 the norm is:
         ‖u‖ = √(1² + (5/√2)² + (1/√2)²) = √(1 + 25/2 + 1/2) = √14
     In comparison the direct calculation of the norm is:
         ‖(1, 2, 3)‖ = √(1² + 2² + 3²) = √14

(c)  The projection of u onto S is, according to Theorem 5.13:
         projS u = (5/2)(0, 1, 1) − (1/2)(0, 1, −1)
     (that part of the linear combination for u from part (a) that involves the basis vectors of S).

(d)  From Theorem 5.13, u1 = (5/2)(0, 1, 1) − (1/2)(0, 1, −1) ∈ S (the projection from part (c) above), and
     u2 = (1, 0, 0) ∈ S⊥ is the
remaining part of the linear combination of orthogonal basis vectors given in part (a).
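Part (c) can be reproduced numerically. The short NumPy sketch below (our own illustration) sums the projections onto each orthogonal basis vector, as in Theorem 5.13(a).

```python
import numpy as np

u  = np.array([1.0, 2.0, 3.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([0.0, 1.0, -1.0])

# Theorem 5.13(a): sum of the projections onto each orthogonal basis vector of S
proj = sum(np.dot(u, v) / np.dot(v, v) * v for v in (v2, v3))
print(proj)        # [0. 2. 3.]  =  (5/2) v2 - (1/2) v3
print(u - proj)    # [1. 0. 0.]  =  the component of u in S-perp
```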

Example 5.6.6.
M22 is an inner product space with inner product ⟨U, V⟩ = trace(Uᵀ V). Recall that the trace is the
sum of the diagonal entries (this inner product is the same as multiplying the entries in the same
position in U and V and adding those four products).

(a)  Show that the four matrices A1, A2, A3, A4 are mutually orthogonal.

(b)  Show that {A1, A2, A3, A4} is a basis of M22.

(c)  Express the matrix B as a linear combination of A1, A2, A3, A4.

    A1 = [ 1  1 ]   A2 = [  2 −1 ]   A3 = [ 0 −1 ]   A4 = [ 0  0 ]        B = [ 1  2 ]
         [ 1  0 ]        [ −1  0 ]        [ 1  0 ]        [ 0  3 ]            [ 3  4 ]

Solution. The following is the solution:

(a)  We must show that the trace is zero for each of the six products (there are 12 products but the
     others are transposes of these and so have the same trace):
         A1ᵀA2, A1ᵀA3, A1ᵀA4, A2ᵀA3, A2ᵀA4, A3ᵀA4
     The first one is:
         A1ᵀ A2 = [ 1  1 ] [  2 −1 ]  =  [ 1 −1 ]     with trace 1 + (−1) = 0
                  [ 1  0 ] [ −1  0 ]     [ 2 −1 ]
     An alternate, perhaps simpler, way of computing this is to multiply the corresponding entries in
     each matrix and form the sum:
         1·2 + 1·(−1) + 1·(−1) + 0·0 = 0
     Verify that the other five inner products are also zero. Hence, {A1, A2, A3, A4} is an orthogonal
     set.

(b)  The set {A1, A2, A3, A4} is linearly independent by Theorem 5.10. Hence, it must be a basis of
     M22 since the dimension of M22 is four.

(c)  By Theorem 5.12, omitting details of the computations of the inner products:

         B = (⟨B, A1⟩/⟨A1, A1⟩) A1 + (⟨B, A2⟩/⟨A2, A2⟩) A2 + (⟨B, A3⟩/⟨A3, A3⟩) A3 + (⟨B, A4⟩/⟨A4, A4⟩) A4
           = (6/3) A1 + (−3/6) A2 + (1/2) A3 + (12/9) A4

     Note: Check for yourself that the result is correct by computing the matrices on the right hand
     side as follows:

         B =? 2 [ 1  1 ] − (1/2) [  2 −1 ] + (1/2) [ 0 −1 ] + (4/3) [ 0  0 ]
               [ 1  0 ]          [ −1  0 ]         [ 1  0 ]         [ 0  3 ]
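The trace inner product and the Theorem 5.12 coefficients are easy to check numerically; the following NumPy sketch (ours, not from the text) reproduces part (c).

```python
import numpy as np

inner = lambda U, V: np.trace(U.T @ V)      # the trace inner product on M22

A = [np.array(M, dtype=float) for M in
     ([[1, 1], [1, 0]], [[2, -1], [-1, 0]], [[0, -1], [1, 0]], [[0, 0], [0, 3]])]
B = np.array([[1.0, 2.0], [3.0, 4.0]])

coeffs = [inner(B, Ai) / inner(Ai, Ai) for Ai in A]   # Theorem 5.12 coefficients
print(coeffs)                                         # [2.0, -0.5, 0.5, 1.333...]
print(sum(c * Ai for c, Ai in zip(coeffs, A)))        # reproduces B exactly
```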

Example 5.6.7.
Requires calculus. Compute the projection of the polynomial g(x) = x + x³ onto the subspace S of
P4 spanned by multiples of the first three Legendre polynomials:
    f1(x) = 1, f2(x) = x, f3(x) = x² − 1/3
with the inner product defined by:
    ⟨f, g⟩ = ∫₋₁¹ f(x) g(x) dx

Solution. It was shown in Example 5.6.4 that the set {f1, f2, f3} is orthogonal and spans the same
subspace as {1, x, x²}. Consequently, it is tempting to suppose that the projection of g(x) on S is the
polynomial x (that is, drop the x³ term). However, we will confirm this, or otherwise, using Theorem
5.13.

According to Theorem 5.13, the projection is given by:

    projS g = (⟨g, f1⟩/⟨f1, f1⟩) f1(x) + (⟨g, f2⟩/⟨f2, f2⟩) f2(x) + (⟨g, f3⟩/⟨f3, f3⟩) f3(x)

            = (∫₋₁¹ g(x) f1(x) dx / ∫₋₁¹ (f1(x))² dx) f1(x) + (∫₋₁¹ g(x) f2(x) dx / ∫₋₁¹ (f2(x))² dx) f2(x)
              + (∫₋₁¹ g(x) f3(x) dx / ∫₋₁¹ (f3(x))² dx) f3(x)

            = (∫₋₁¹ (x + x³) dx / ∫₋₁¹ 1 dx)·1 + (∫₋₁¹ (x + x³)x dx / ∫₋₁¹ x² dx)·x
              + (∫₋₁¹ (x + x³)(x² − 1/3) dx / ∫₋₁¹ (x² − 1/3)² dx)·(x² − 1/3)

            = (0/2)·1 + ((16/15)/(2/3))·x + 0·(x² − 1/3)

    projS g = (8/5) x

Note: The intuitive argument above, that projS g = x, is clearly wrong. This is because the projection of
the term x³ of g(x) onto S is not zero, but is in fact the polynomial (3/5)x (check this for yourself). Since
the other term in g(x) is x, and this is already in S, it follows that the projection of g onto S is
(3/5)x + x = (8/5)x, as was found above.

5.6.3 The QR factorization of a matrix

The Gram-Schmidt Process takes a linearly independent set of vectors {u1, u2, u3, …, um} and
converts it into an orthogonal set {v1, v2, v3, …, vm} that spans the same vector space, and that can be
changed to an orthonormal basis {w1, w2, w3, …, wm} by normalizing the vectors: wj = (1/‖vj‖) vj.
This process can be expressed as a matrix factorization, A = QR, called the QR factorization or
QR decomposition, of the matrix A. In this factorization the columns of A are the vectors uj, and the
columns of Q are the normalized vectors wj. The matrix R is upper triangular (entries below the main
diagonal are zero), and each non-zero row i, column j entry is equal to the inner product of the form
⟨wi, uj⟩. In fact the whole Gram-Schmidt orthogonalization process can be carried out very efficiently
by working with matrices, rather than vectors.

The QR factorization is usually applied to Euclidean spaces Rⁿ, and the components of the vectors
ui are written as columns of the matrix A. Similarly the wi components are columns of Q. In this case,
if the number of vectors m = n, the dimension of the vector space, then the matrix Q will be an n × n
orthogonal matrix. However, the matrix form applies to any inner product space.

Examples 5.6.8 and 5.6.9 show the QR factorization in a simple two-vector case previously examined
in Example 5.6.1. The complete result is given in Theorem 5.14.


Example 5.6.8.
Construct the QR factorization for orthogonalizing any set of two linearly independent vectors
{u1, u2}.

Solution. The set {u1, u2} is used to create an orthonormal basis {w1, w2}. Refer to Theorem 5.9 to
see the equations used in the Gram-Schmidt Process, which are rewritten here with u1, u2 on the left,
as follows:

    v1 = u1 and w1 = (1/‖v1‖) v1                    ⟹   u1 = v1
    v2 = u2 − ⟨u2, w1⟩ w1 and w2 = (1/‖v2‖) v2      ⟹   u2 = v2 + ⟨u2, w1⟩ w1

    ⟹   u1 = ‖v1‖ w1 + 0·w2
         u2 = ⟨u2, w1⟩ w1 + ‖v2‖ w2

Writing the equations as matrix products, with vectors being columns, gives the QR factorization:

    [u1] = [w1 | w2] [ ‖v1‖ ]        [u2] = [w1 | w2] [ ⟨u2, w1⟩ ]
                     [  0   ]                          [  ‖v2‖   ]

    A = [u1 | u2] = [w1 | w2] [ ‖v1‖   ⟨u2, w1⟩ ]        (Q = [w1 | w2])
                              [   0      ‖v2‖   ]

To give the matrix R a more consistent look, note that ‖v1‖ = ⟨w1, u1⟩ and ‖v2‖ = ⟨w2, u2⟩ because:
    ⟨u1, w1⟩ = ⟨(‖v1‖ w1), w1⟩ = ‖v1‖ ⟨w1, w1⟩ = ‖v1‖
    ⟨u2, w2⟩ = ⟨⟨u2, w1⟩ w1 + ‖v2‖ w2, w2⟩
             = ⟨u2, w1⟩ ⟨w1, w2⟩ + ‖v2‖ ⟨w2, w2⟩
             = ‖v2‖   since ⟨w1, w2⟩ = 0 and ⟨w2, w2⟩ = 1
Hence, using the symmetry of the inner product, ⟨ui, wj⟩ = ⟨wj, ui⟩, the A = QR formula becomes:

    [u1 | u2] = [w1 | w2] [ ⟨w1, u1⟩   ⟨w1, u2⟩ ]
                          [    0       ⟨w2, u2⟩ ]

Note: That is, the columns uj of A are the original vectors, the columns wj of Q are the normalized
vectors produced by the Gram-Schmidt process, and the row i column j entry of R is ⟨wi, uj⟩ (same
as ⟨uj, wi⟩).
Example 5.6.9.
Find the actual QR decomposition for Example 5.6.1, that starts with the two vectors in R²:
    {u1, u2} = {(1, 1), (1, 0)}

Solution. In Example 5.6.1 the Gram-Schmidt process was used to derive the following orthogonal set:
    {v1, v2} = {(1, 1), (1/2, −1/2)}
Hence, the orthonormal set is:
    {w1, w2} = {(1/√2, 1/√2), (1/√2, −1/√2)}

The scalar product replaces the inner product in the formulae of the previous Example 5.6.8. Noting
that u1·w1 = √2, u2·w2 = 1/√2, and u2·w1 = 1/√2, and putting these values into the formula derived in
Example 5.6.8 (with the vectors becoming columns of the matrices):

    A = [u1 | u2] = [w1 | w2] [ u1·w1   w1·u2 ]
                              [   0     w2·u2 ]

    [ 1  1 ]  =  [ 1/√2   1/√2 ] [ √2   1/√2 ]
    [ 1  0 ]     [ 1/√2  −1/√2 ] [  0   1/√2 ]

Note: Check for yourself that this decomposition is correct.


Theorem 5.14. Let {u1, u2, u3, …, um} be a set of m linearly independent vectors in an inner
product space V of dimension n. If {w1, w2, w3, …, wm} is the orthonormal set produced by the
Gram-Schmidt process (spanning the same subspace as the uj), produced as in Theorem 5.9, then
these satisfy the matrix equation:
    A = QR
where A has the vectors uj as columns, Q has the orthonormal vectors wj as columns (an orthogonal
matrix when m = n), and R is the non-singular upper triangular matrix given by:

    A = [u1 | u2 | ⋯ | um] = [w1 | w2 | ⋯ | wm] [ ⟨w1, u1⟩  ⟨w1, u2⟩  ⟨w1, u3⟩  ⋯  ⟨w1, um⟩ ]
                                                 [    0      ⟨w2, u2⟩  ⟨w2, u3⟩  ⋯  ⟨w2, um⟩ ]
                                                 [    0         0      ⟨w3, u3⟩  ⋯  ⟨w3, um⟩ ]
                                                 [    ⋮         ⋮         ⋮      ⋱     ⋮     ]
                                                 [    0         0         0      ⋯  ⟨wm, um⟩ ]

Proof. Multiplying out the matrix product in A = QR gives the same formulae as the Gram-Schmidt
orthonormalization formulae of Theorem 5.9.
The details are complicated, but you may be able to prove it yourself using the results that
vj = ‖vj‖ wj and ‖vj‖ = ⟨uj, wj⟩ for each j = 1, 2, …, m.

Example 5.6.10.
Given the basis {u1, u2, u3} = {(1, 1, 1), (1, 0, −1), (2, −3, 4)} of R³, write out the associated
QR factorization, using the Gram-Schmidt orthogonalization previously found in Example 5.6.2.

Solution. Example 5.6.2 used the Gram-Schmidt orthogonalization algorithm to find the orthogonal set
of vectors:
    {v1, v2, v3} = {(1, 1, 1), (1, 0, −1), (2, −4, 2)}
Normalizing these gives the orthonormal set:
    {w1, w2, w3} = {(1/√3, 1/√3, 1/√3), (1/√2, 0, −1/√2), (1/√6, −2/√6, 1/√6)}

Using the scalar product as the inner product, the QR factorization is therefore:

    A = [u1 | u2 | u3] = [w1 | w2 | w3] [ w1·u1   w1·u2   w1·u3 ]
                                        [   0     w2·u2   w2·u3 ]
                                        [   0       0     w3·u3 ]

The inner products of R are easily calculated as:
    w1·u1 = √3, w1·u2 = 0, w1·u3 = √3, w2·u2 = √2, w2·u3 = −√2, w3·u3 = 2√6,
so the QR factorization is:

        A                    Q                      R
    [ 1  1  2 ]     [ 1/√3   1/√2   1/√6 ]     [ √3   0    √3 ]
    [ 1  0 −3 ]  =  [ 1/√3    0    −2/√6 ]     [  0   √2  −√2 ]
    [ 1 −1  4 ]     [ 1/√3  −1/√2   1/√6 ]     [  0   0   2√6 ]

Note: Check for yourself that the matrix equation is correct, and that Q is orthogonal (QQᵀ = I).
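As a sketch (not part of the text), numpy.linalg.qr can be used to confirm this factorization; note that NumPy may return some columns of Q (and the corresponding rows of R) with opposite signs, so only the product A = QR and the absolute values of the entries are expected to agree with the hand computation.

```python
import numpy as np

A = np.array([[1.0,  1.0,  2.0],
              [1.0,  0.0, -3.0],
              [1.0, -1.0,  4.0]])        # columns u1, u2, u3 of Example 5.6.10

Q, R = np.linalg.qr(A)
print(np.round(R, 4))                    # |R| matches [[sqrt(3), 0, sqrt(3)], [0, sqrt(2), sqrt(2)], [0, 0, 2*sqrt(6)]]
print(np.allclose(Q @ R, A))             # True
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns of Q are orthonormal
```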
Example 5.6.11.
Find an orthonormal set of vectors spanning the same subspace of R⁴ as the vectors:
    u1 = (1, 0, 0, 2), u2 = (0, 1, 1, 0), u3 = (1, 1, 0, −1)
carrying out the calculations within the matrices A = QR.

Solution. The following is the solution:

Step 1: By Theorem 5.9, the first column of Q is the normalized u1, that is
w1 = (1/√5, 0, 0, 2/√5). The first row of R is then computed:
    [w1·u1, w1·u2, w1·u3] = [√5, 0, −1/√5]

Step 2: The second column of Q is equal to the vector
    v2 = u2 − ⟨u2, w1⟩ w1 = (0, 1, 1, 0) − 0·(1/√5, 0, 0, 2/√5) = (0, 1, 1, 0),
which is then normalized to w2 = (0, 1/√2, 1/√2, 0). The second row of R is then computed:
    [0, w2·u2, w2·u3] = [0, √2, 1/√2]

Step 3: The third column of Q is equal to the vector:
    v3 = u3 − ⟨u3, w1⟩ w1 − ⟨u3, w2⟩ w2
       = (1, 1, 0, −1) + (1/√5)(1/√5, 0, 0, 2/√5) − (1/√2)(0, 1/√2, 1/√2, 0)
       = (6/5, 1/2, −1/2, −3/5)
which is then normalized to w3 = √(10/23) (6/5, 1/2, −1/2, −3/5). The bottom right entry of R is then
w3·u3 = ‖v3‖ = √(23/10), giving the complete factorization:

        A                        Q                                 R
    [ 1  0  1 ]     [ 1/√5    0      6√10/(5√23) ]     [ √5   0     −1/√5   ]
    [ 0  1  1 ]  =  [  0    1/√2      √10/(2√23) ]     [  0   √2     1/√2   ]
    [ 0  1  0 ]     [  0    1/√2     −√10/(2√23) ]     [  0   0    √(23/10) ]
    [ 2  0 −1 ]     [ 2/√5    0     −3√10/(5√23) ]

Note: Verify for yourself that the matrix equation is correct.
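One way to verify it (our own sketch, not part of the text) is with numpy.linalg.qr in reduced mode; as before, NumPy may negate individual columns of Q and rows of R, so only absolute values and the product are expected to match.

```python
import numpy as np

A = np.column_stack([(1.0, 0.0, 0.0, 2.0),     # u1
                     (0.0, 1.0, 1.0, 0.0),     # u2
                     (1.0, 1.0, 0.0, -1.0)])   # u3

Q, R = np.linalg.qr(A, mode='reduced')         # Q is 4x3 with orthonormal columns, R is 3x3
print(np.round(R, 4))    # |R| = [[sqrt(5), 0, 1/sqrt(5)], [0, sqrt(2), 1/sqrt(2)], [0, 0, sqrt(23/10)]]
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True
```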


Note: Some authors make Q into an orthogonal matrix by adding appropriate extra columns (one is
needed in this case), and by adding rows of zeros to the bottom of R to make the matrix multiplication
compatible. The extra column of Q is found by choosing an arbitrary fourth vector u4 (it is very unlikely
to be in the span of {u1, u2, u3}) and then continuing the Gram-Schmidt algorithm for one more step,
but leaving out the extra column in the final matrix equation. In this example the extra column is
w4 = (1/√23)(2, −3, 3, −1) (up to sign), and this form of the factorization is:

        A                        Q                                          R
    [ 1  0  1 ]     [ 1/√5    0      6√10/(5√23)    2/√23 ]     [ √5   0     −1/√5   ]
    [ 0  1  1 ]  =  [  0    1/√2      √10/(2√23)   −3/√23 ]     [  0   √2     1/√2   ]
    [ 0  1  0 ]     [  0    1/√2     −√10/(2√23)    3/√23 ]     [  0   0    √(23/10) ]
    [ 2  0 −1 ]     [ 2/√5    0     −3√10/(5√23)   −1/√23 ]     [  0   0       0     ]
For completeness we give a QR factorization for P3 , but in fact this matrix factorization is rarely used
outside of Euclidean spaces.
Example 5.6.12.
Referring to Example 5.6.3, starting with the polynomials
{g1(x), g2(x), g3(x), g4(x)} = {1, 1 + x, 1 − 2x², x + x³} and inner product defined by:
    ⟨a0 + a1x + a2x² + a3x³, b0 + b1x + b2x² + b3x³⟩ = a0b0 + a1b1 + a2b2 + a3b3
write the Gram-Schmidt orthogonalization as a QR factorization.

Solution. The orthogonal set found in Example 5.6.3 is:
    {f1(x), f2(x), f3(x), f4(x)} = {1, x, −2x², x³}
Normalizing these gives the orthonormal basis of P3:
    {h1(x), h2(x), h3(x), h4(x)} = {1, x, −x², x³}
The QR factorization is therefore:

    [g1(x) g2(x) g3(x) g4(x)] = [h1(x) h2(x) h3(x) h4(x)] [ ⟨h1, g1⟩  ⟨h1, g2⟩  ⟨h1, g3⟩  ⟨h1, g4⟩ ]
                                                           [    0      ⟨h2, g2⟩  ⟨h2, g3⟩  ⟨h2, g4⟩ ]
                                                           [    0         0      ⟨h3, g3⟩  ⟨h3, g4⟩ ]
                                                           [    0         0         0      ⟨h4, g4⟩ ]

and ⟨h1, g1⟩ = 1, ⟨h1, g2⟩ = 1, ⟨h1, g3⟩ = 1, ⟨h1, g4⟩ = 0, ⟨h2, g2⟩ = 1, ⟨h2, g3⟩ = 0, ⟨h2, g4⟩ = 1,
⟨h3, g3⟩ = 2, ⟨h3, g4⟩ = 0, ⟨h4, g4⟩ = 1, so

    [1   1 + x   1 − 2x²   x + x³] = [1   x   −x²   x³] [ 1  1  1  0 ]
                                                         [ 0  1  0  1 ]
                                                         [ 0  0  2  0 ]
                                                         [ 0  0  0  1 ]

Note: Check for yourself that the matrix equality is correct.

Section 5.6 exercise set

Check your understanding by answering the following questions.

1.  Let S be the subspace of R³ spanned by the following set of two vectors:
        {(1, 2, −1), (3, 0, 3)}
    (a)  Check that the set of vectors is orthogonal.
    (b)  Convert it to an orthonormal set.
    (c)  Extend it to an orthonormal basis of R³ (Hint: add to the two orthogonal vectors any other
         vector not in S - a random choice usually works - and apply one step of the Gram-Schmidt
         process).
    (d)  Find the projection of the vector (1, 2, 3) onto S (use Theorem 5.13).

2.  For the following set of vectors in R³:
        {(1, 2, 0), (1, 1, 0), (3, −2, 1)}
    (a)  Use the Gram-Schmidt process of Theorem 5.8 to find an orthogonal basis of R³.
    (b)  Find an orthonormal basis of R³.
    (c)  Can you be certain that the original three vectors are linearly independent?

3.  Apply the Gram-Schmidt orthogonalization process (Theorem 5.8) to the following set of vectors
    in R⁴. What do you conclude about this set?
        {(1, 2, 0, 3), (1, 1, 1, −1), (1, 0, 2, −5)}

4.  For the following set of vectors in R³:
        {(1, 0, 1), (1, 1, 0), (1, 2, 1)}
    (a)  Use the Gram-Schmidt process of Theorem 5.8 to find an orthonormal basis of R³.
    (b)  Use the Gram-Schmidt process of Theorem 5.9 to find an orthonormal basis of R³. Discuss
         which of the two methods in (a) and (b) is easiest.
    (c)  Use Theorem 5.12 to find linear combinations of the basis vectors giving the two vectors
         u1 = (1, 3, −2), u2 = (4, 0, 1).
    (d)  Show that Theorem 5.11 gives the correct values for u1·u2, for the norms of u1 and u2, and
         for the distance between u1 and u2.

5.  The following set of polynomials is a basis of P2:
        {1 − x², x + 2x², −2 + x + 3x²}
    (a)  Find an orthogonal basis using the Gram-Schmidt orthogonalization process (Theorem 5.8)
         with the inner product of Example 5.6.3 (scalar product of the vectors of coefficients).
    (b)  Find an orthogonal basis using the Gram-Schmidt orthogonalization process (Theorem 5.8)
         with the inner product:
             ⟨a0 + a1x + a2x², b0 + b1x + b2x²⟩ = a0b0 + 2a1b1 + 3a2b2
    (c)  Requires calculus and more difficult. Find an orthogonal basis using the Gram-Schmidt
         orthogonalization process (Theorem 5.8) with the inner product of Example 5.6.7 (the
         integral of the product from x = −1 to x = 1).
    (d)  Find an orthonormal basis with the inner product of part (a).
    (e)  Find an orthonormal basis with the inner product of part (b).

6.  For the basis vectors found in the previous question:
    (a)  Express the polynomial 2 + 3x − 4x² as a linear combination of the orthogonal polynomials
         found in part (a) of the previous question (see Theorems 5.11 and 5.12).
    (b)  Express the polynomial 2 + 3x − 4x² as a linear combination of the orthogonal polynomials
         found in part (b) of the previous question.
    (c)  Express the Gram-Schmidt process covered in parts (a) and (d) of the previous question as a
         QR factorization.

7.  Apply the Gram-Schmidt orthogonalization process (Theorem 5.8), with the inner product of
    Example 5.6.3 (scalar product of the vectors of coefficients), to the following set of polynomials in P3.
    What do you conclude?
        {1 − x², x + 2x², −2 + x + 4x²}

8.  Let S be the subspace of M22 spanned by the three matrices:
        [ 1  3 ]   [ 2  1 ]   [ 0  3 ]
        [ 2  0 ] , [ 2  1 ] , [ 2  1 ]
    (a)  Find an orthogonal basis of S using the Gram-Schmidt orthogonalization process (Theorem
         5.8) with the inner product of Example 5.6.6 (scalar product of the 4-vectors formed by the
         entries of the matrices).
    (b)  Find an orthonormal basis of S.
    (c)  Attempt to express the following matrix as a linear combination of the orthonormal basis
         matrices of S (see Theorems 5.11 and 5.12):
             [ 5  2 ]
             [ 0  1 ]
    (d)  Attempt to express the following matrix as a linear combination of the orthonormal basis
         matrices of S:
             [ 3  0 ]
             [ 1  2 ]

9.  Requires calculus and is more difficult. Starting with the following five basis polynomials of P4
    and the inner product ⟨f, g⟩ = ∫₋₁¹ f(x) g(x) dx:
        {1, x, x², x³, x⁴}
    (a)  Use the Gram-Schmidt orthogonalization process (Theorem 5.8) to find the fifth function of
         the orthogonal basis (the first four are given in Example 5.6.4).
         Note: These orthogonal basis polynomials are multiples of the first five Legendre
         Polynomials that are very useful in some Engineering applications.
    (b)  Convert the basis from part (a) into an orthonormal basis of P4.

10. Let W be the subspace of R⁴ spanned by the two vectors:
        {(1, 0, 0, 1), (1, 2, 0, 1)}
    (a)  Find a basis of W⊥.
    (b)  If u = (2, 3, 0, 4) then find two (unique) vectors u1 ∈ W, u2 ∈ W⊥ such that u = u1 + u2.
    (c)  Find the QR factorization equivalent to the Gram-Schmidt process in part (a) (that is,
         including all four basis vectors of W and W⊥).

11. Find, if possible, the QR factorization of the following matrices:

    (a) A = [ 1  3 ]   (b) B = [  1  1  0 ]   (c) C = [ 3  5 ]   (d) D = [  1  0  2 ]
            [ 0  4 ]           [  0  1  1 ]           [ 4  0 ]           [  0  1  3 ]
                               [ −1  1 −1 ]           [ 0  2 ]           [ −2  1 −1 ]

Solutions

1.  For the vectors (1, 2, −1), (3, 0, 3):

    (a)  They are orthogonal since (1, 2, −1)·(3, 0, 3) = 1·3 + 2·0 + (−1)·3 = 0.

    (b)  Normalizing (dividing each vector by its norm/length) the vectors gives the orthonormal set:
             w1 = (1/√6, 2/√6, −1/√6), w2 = (1/√2, 0, 1/√2)

    (c)  Choose, randomly, the vector u = (1, 0, 0) and apply the last step of the Gram-Schmidt
         algorithm (Theorem 5.9) to the set {w1, w2, u} to form an orthonormal set {w1, w2, w3},
         using the intermediate vector v:
             v = u − (w1·u) w1 − (w2·u) w2
               = (1, 0, 0) − (1/√6)(1/√6, 2/√6, −1/√6) − (1/√2)(1/√2, 0, 1/√2)
               = (1, 0, 0) − (1/6, 1/3, −1/6) − (1/2, 0, 1/2)
             v = (1/3, −1/3, −1/3)
         Normalizing v gives the required result w3 = (1/√3, −1/√3, −1/√3) and the orthonormal basis:
             w1 = (1/√6, 2/√6, −1/√6), w2 = (1/√2, 0, 1/√2), w3 = (1/√3, −1/√3, −1/√3)

    (d)  The projection of the vector u = (1, 2, 3) onto the subspace S with orthonormal basis
         {w1, w2} is given by (see Theorem 5.13):
             projS u = ⟨u, w1⟩ w1 + ⟨u, w2⟩ w2
                     = ((1, 2, 3)·(1/√6, 2/√6, −1/√6)) w1 + ((1, 2, 3)·(1/√2, 0, 1/√2)) w2
                     = (2/√6)(1/√6, 2/√6, −1/√6) + (4/√2)(1/√2, 0, 1/√2)
                     = (1/3, 2/3, −1/3) + (2, 0, 2)
             projS u = (7/3, 2/3, 5/3)

2.  For the vectors:
        u1 = (1, 2, 0), u2 = (1, 1, 0), u3 = (3, −2, 1)

    (a)  From Theorem 5.8, steps 1, 2 and 3:
             v1 = u1 = (1, 2, 0)
             v2 = u2 − (u2·v1/‖v1‖²) v1 = (1, 1, 0) − (3/5)(1, 2, 0)
             v2 = (2/5, −1/5, 0)
             v3 = u3 − (u3·v1/‖v1‖²) v1 − (u3·v2/‖v2‖²) v2
                = (3, −2, 1) + (1/5)(1, 2, 0) − 8(2/5, −1/5, 0)
             v3 = (0, 0, 1)
         Note: Check for yourself that the three vectors v1, v2, v3 are mutually orthogonal by showing
         v1·v2 = 0, v1·v3 = 0, v2·v3 = 0.

    (b)  Normalizing the three vectors from part (a) gives the orthonormal basis:
             {(1/√5, 2/√5, 0), (2/√5, −1/√5, 0), (0, 0, 1)}

    (c)  The original vectors must be linearly independent because otherwise the Gram-Schmidt
         process would not be able to produce three mutually orthogonal (and therefore linearly
         independent by Theorem 5.10) vectors as linear combinations of the original vectors. For
         example, if u3 is linearly dependent on u1 and u2 then v3 is also linearly dependent on u1
         and u2. This is because v3 is defined as a linear combination of u3, v1, v2, and each of
         these is itself a linear combination of, or is equal to, one of u1 and u2. In that case v3 could
         not be orthogonal to u1 and u2, since orthogonal vectors are always linearly independent
         (Theorem 5.10).
         Note: If the Gram-Schmidt process is applied to three vectors u1, u2, u3 for which u3 is
         linearly dependent on u1 and u2 then a zero vector will be produced: v3 = (0, 0, 0). In fact,
         whenever the Gram-Schmidt process produces a zero vector, it shows that the original
         vectors used up to that point are linearly dependent. An example of this is in the next
         question.

3.  For the set of vectors:
        {u1 = (1, 2, 0, 3), u2 = (1, 1, 1, −1), u3 = (1, 0, 2, −5)}
    From Theorem 5.8, steps 1, 2 and 3:
        v1 = u1 = (1, 2, 0, 3)
        v2 = u2 − (u2·v1/‖v1‖²) v1 = (1, 1, 1, −1) − (0/14)(1, 2, 0, 3)
        v2 = (1, 1, 1, −1)
    That is, v2 = u2 because u2 was already orthogonal to v1. Finally,
        v3 = u3 − (u3·v1/‖v1‖²) v1 − (u3·v2/‖v2‖²) v2
           = (1, 0, 2, −5) + (14/14)(1, 2, 0, 3) − (8/4)(1, 1, 1, −1)
           = (1, 0, 2, −5) + (1, 2, 0, 3) − (2, 2, 2, −2)
        v3 = (0, 0, 0, 0)
    Hence, v3 = 0, the zero vector, and this indicates that it is not possible to find a non-zero vector
    orthogonal to v1 and v2. This only happens when the original set of vectors is linearly dependent.
    Note: Check for yourself that the linear dependence is: u3 = 2u2 − u1.
4.  For the vectors:
        u1 = (1, 0, 1), u2 = (1, 1, 0), u3 = (1, 2, 1)

    (a)  From Theorem 5.8, steps 1, 2 and 3:
             v1 = u1 = (1, 0, 1)
             v2 = u2 − (u2·v1/‖v1‖²) v1 = (1, 1, 0) − (1/2)(1, 0, 1)
             v2 = (1/2, 1, −1/2)
             v3 = u3 − (u3·v1/‖v1‖²) v1 − (u3·v2/‖v2‖²) v2
                = (1, 2, 1) − (2/2)(1, 0, 1) − (2/(3/2))(1/2, 1, −1/2)
             v3 = (−2/3, 2/3, 2/3)
         Hence, simplifying by multiplying two of the vectors by an appropriate constant, an
         orthogonal set is:
             {v1, 2v2, −(3/2)v3} = {(1, 0, 1), (1, 2, −1), (1, −1, −1)}
         Hence, normalizing the vectors gives the orthonormal set:
             {w1, w2, w3} = {(1/√2)(1, 0, 1), (1/√6)(1, 2, −1), (1/√3)(1, −1, −1)}

    (b)  Repeat the process, using Theorem 5.9, steps 1, 2 and 3, starting with the vectors
         u1 = (1, 0, 1), u2 = (1, 1, 0), u3 = (1, 2, 1):
             w1 = (1/‖u1‖) u1 = (1/√2, 0, 1/√2)
             v2 = u2 − (u2·w1) w1 = (1, 1, 0) − (1/√2)(1/√2, 0, 1/√2) = (1/2, 1, −1/2)
             w2 = (1/‖v2‖) v2 = (1/√6, 2/√6, −1/√6)
             v3 = u3 − (u3·w1) w1 − (u3·w2) w2
                = (1, 2, 1) − √2 (1/√2, 0, 1/√2) − (4/√6)(1/√6, 2/√6, −1/√6)
                = (−2/3, 2/3, 2/3)
             w3 = (1/‖v3‖) v3 = (−1/√3, 1/√3, 1/√3)
         Note: There is a slight difference from part (a) in that w3 is the negative of the vector found
         in part (a), but that just reflects the fact that a unit vector along a given line can have two
         different directions.

    (c)  Part (b) has simpler formulae and fewer computations (for example, in part (a) division by
         ‖v1‖ occurs at steps (2), (3) and in the final normalization, but only occurs once in part (b)).
         However, when using hand calculations the part (b) method produces more complicated
         calculations.
         With u1 = (1, 3, −2), u2 = (4, 0, 1) and using {w1, w2, w3} from part (a), by Theorem 5.12:
             u1 = (u1·w1) w1 + (u1·w2) w2 + (u1·w3) w3
                = ((1, 3, −2)·(1/√2)(1, 0, 1)) w1 + ((1, 3, −2)·(1/√6)(1, 2, −1)) w2 + ((1, 3, −2)·(1/√3)(1, −1, −1)) w3
             u1 = −(1/√2) w1 + (9/√6) w2 + 0·w3
         Similarly, show for yourself that:
             u2 = (u2·w1) w1 + (u2·w2) w2 + (u2·w3) w3 = (5/√2) w1 + (3/√6) w2 + (3/√3) w3

    (d)  Using the coefficients of w1, w2, w3 found in part (c) for u1 and u2 in the formulae of
         Theorem 5.11 gives:
             u1·u2 = (−1/√2)(5/√2) + (9/√6)(3/√6) + 0·(3/√3) = −5/2 + 27/6 = 2
             ‖u1‖ = √((−1/√2)² + (9/√6)² + 0²) = √(1/2 + 81/6) = √14
             ‖u2‖ = √((5/√2)² + (3/√6)² + (3/√3)²) = √(25/2 + 9/6 + 3) = √17
             d(u1, u2) = √((−1/√2 − 5/√2)² + (9/√6 − 3/√6)² + (0 − 3/√3)²) = √(18 + 6 + 3) = √27
         Computing u1·u2, ‖u1‖, ‖u2‖, d(u1, u2) for u1 = (1, 3, −2), u2 = (4, 0, 1) in the usual way
         confirms that these values are correct. That is:
             u1·u2 = 1·4 + 3·0 + (−2)·1 = 2
             ‖u1‖ = √(1² + 3² + (−2)²) = √14 and ‖u2‖ = √(4² + 0² + 1²) = √17
             d(u1, u2) = √((1 − 4)² + (3 − 0)² + (−2 − 1)²) = √27

5.  For g1(x) = 1 − x², g2(x) = x + 2x², g3(x) = −2 + x + 3x²:

    (a)  From Theorem 5.8, steps 1, 2 and 3 (using f1, f2, f3 as the orthogonal polynomials) with
         inner product ⟨a0 + a1x + a2x², b0 + b1x + b2x²⟩ = a0b0 + a1b1 + a2b2:
             f1(x) = g1(x) = 1 − x²
             f2(x) = g2(x) − (⟨g2, f1⟩/⟨f1, f1⟩) f1(x) = x + 2x² − ((−2)/2)(1 − x²)
             f2(x) = 1 + x + x²
             f3(x) = g3(x) − (⟨g3, f1⟩/⟨f1, f1⟩) f1(x) − (⟨g3, f2⟩/⟨f2, f2⟩) f2(x)
                   = −2 + x + 3x² − ((−5)/2)(1 − x²) − (2/3)(1 + x + x²)
             f3(x) = −1/6 + (1/3)x − (1/6)x²
         For a simpler answer we multiply f3(x) by 6 (this does not change the orthogonality) so that:
             f3(x) = −1 + 2x − x²
         Hence, the orthogonal set is:
             {f1(x), f2(x), f3(x)} = {1 − x², 1 + x + x², −1 + 2x − x²}

    (b)  With g1, g2, g3 as above and with inner product
         ⟨a0 + a1x + a2x², b0 + b1x + b2x²⟩ = a0b0 + 2a1b1 + 3a2b2, Theorem 5.8, steps 1, 2 and 3
         (using f1, f2, f3 as the orthogonal polynomials) are:
             f1(x) = g1(x) = 1 − x²
             f2(x) = g2(x) − (⟨g2, f1⟩/⟨f1, f1⟩) f1(x) = x + 2x² − ((−6)/4)(1 − x²)
             f2(x) = 3/2 + x + (1/2)x²
         For simpler calculations we multiply f2(x) by 2 (this does not change the orthogonality) so
         that:
             f2(x) = 3 + 2x + x²
             f3(x) = g3(x) − (⟨g3, f1⟩/⟨f1, f1⟩) f1(x) − (⟨g3, f2⟩/⟨f2, f2⟩) f2(x)
                   = −2 + x + 3x² − ((−11)/4)(1 − x²) − (7/20)(3 + 2x + x²)
             f3(x) = −3/10 + (3/10)x − (1/10)x²
         For a simpler answer we multiply f3(x) by 10 (this does not change the orthogonality) so that:
             f3(x) = −3 + 3x − x²
         Hence, the orthogonal set is:
             {f1(x), f2(x), f3(x)} = {1 − x², 3 + 2x + x², −3 + 3x − x²}
         Note: Check for yourself that these three polynomials are mutually orthogonal, but only for
         the inner product ⟨a0 + a1x + a2x², b0 + b1x + b2x²⟩ = a0b0 + 2a1b1 + 3a2b2.

    (c)  With g1, g2, g3 as above and with inner product ⟨p(x), q(x)⟩ = ∫₋₁¹ p(x) q(x) dx, Theorem 5.8,
         steps 1, 2 and 3, using f1, f2, f3 as the orthogonal polynomials, and omitting details of the
         integrations, are:
             f1(x) = g1(x) = 1 − x²
             f2(x) = g2(x) − (⟨g2, f1⟩/⟨f1, f1⟩) f1(x)
                   = x + 2x² − (∫₋₁¹ (x + 2x²)(1 − x²) dx / ∫₋₁¹ (1 − x²)² dx)(1 − x²)
                   = x + 2x² − ((8/15)/(16/15))(1 − x²)
             f2(x) = −1/2 + x + (5/2)x²
         For simpler calculations we multiply f2(x) by 2 (this does not change the orthogonality) so
         that:
             f2(x) = −1 + 2x + 5x²
             f3(x) = g3(x) − (⟨g3, f1⟩/⟨f1, f1⟩) f1(x) − (⟨g3, f2⟩/⟨f2, f2⟩) f2(x)
                   = −2 + x + 3x² − ((−28/15)/(16/15))(1 − x²) − ((8/3)/8)(−1 + 2x + 5x²)
             f3(x) = 1/12 + (1/3)x − (5/12)x²
         For a simpler answer we multiply f3(x) by 12 (this does not change the orthogonality) so
         that:
             f3(x) = 1 + 4x − 5x²
         Hence, the orthogonal set is:
             {f1(x), f2(x), f3(x)} = {1 − x², −1 + 2x + 5x², 1 + 4x − 5x²}
         Note: Check for yourself that these three polynomials are mutually orthogonal, but only for
         the inner product used in this part.

    (d)  The orthonormal basis in part (a) is (recall that ‖f‖ = √⟨f, f⟩):
             {(1/‖f1‖) f1(x), (1/‖f2‖) f2(x), (1/‖f3‖) f3(x)}
             = {(1/√2)(1 − x²), (1/√3)(1 + x + x²), (1/√6)(−1 + 2x − x²)}

    (e)  The orthonormal basis in part (b) is:
             {(1/‖f1‖) f1(x), (1/‖f2‖) f2(x), (1/‖f3‖) f3(x)}
             = {(1/2)(1 − x²), (1/√20)(3 + 2x + x²), (1/√30)(−3 + 3x − x²)}

6.  To illustrate Theorem 5.12 fully, the part (a) answer uses the orthogonal polynomials from the
    previous question - part (a), whereas part (b) of this question uses the orthonormal polynomials
    from the previous question - part (e).

    (a)  Let g(x) = 2 + 3x − 4x² and take f1, f2, f3 to be the orthogonal polynomial set from part
         (a) of the previous question:
             {f1(x), f2(x), f3(x)} = {1 − x², 1 + x + x², −1 + 2x − x²}
         From Theorem 5.12, with the inner product used in the previous question - part (a):
             g(x) = (⟨g, f1⟩/⟨f1, f1⟩) f1(x) + (⟨g, f2⟩/⟨f2, f2⟩) f2(x) + (⟨g, f3⟩/⟨f3, f3⟩) f3(x)
                  = (6/2)(1 − x²) + (1/3)(1 + x + x²) + (8/6)(−1 + 2x − x²)
         That is,
             g(x) = 2 + 3x − 4x² = 3(1 − x²) + (1/3)(1 + x + x²) + (4/3)(−1 + 2x − x²)
         Note: Check for yourself that g(x) does equal the given linear combination of f1, f2, f3.

    (b)  Let g(x) = 2 + 3x − 4x² and take f1, f2, f3 to be the orthonormal polynomial set from part
         (e) of the previous question:
             {f1(x), f2(x), f3(x)} = {(1/2)(1 − x²), (1/√20)(3 + 2x + x²), (1/√30)(−3 + 3x − x²)}
         Using the Theorem 5.12 formula for orthonormal sets and with the inner product used in the
         previous question - part (b):
             g(x) = ⟨g, f1⟩ f1(x) + ⟨g, f2⟩ f2(x) + ⟨g, f3⟩ f3(x)
             g(x) = 7 f1(x) + (6/√20) f2(x) + (24/√30) f3(x)
         Note: Check for yourself that g(x) does equal the given linear combination of f1, f2, f3.

    (c)  Recall that the starting polynomials in the previous question are
         g1(x) = 1 − x², g2(x) = x + 2x², g3(x) = −2 + x + 3x², and the orthonormal set of
         polynomials found in part (d) of the previous question is:
             {f1(x), f2(x), f3(x)} = {(1/√2)(1 − x²), (1/√3)(1 + x + x²), (1/√6)(−1 + 2x − x²)}
         The QR factorization formula uses the inner product from part (a) of the previous question,
         and it is:
             [g1(x) g2(x) g3(x)] = [f1(x) f2(x) f3(x)] [ ⟨f1, g1⟩  ⟨f1, g2⟩  ⟨f1, g3⟩ ]
                                                       [    0      ⟨f2, g2⟩  ⟨f2, g3⟩ ]
                                                       [    0         0      ⟨f3, g3⟩ ]
         Hence, the factorization is:
             [1 − x²   x + 2x²   −2 + x + 3x²]
                 = [(1/√2)(1 − x²)   (1/√3)(1 + x + x²)   (1/√6)(−1 + 2x − x²)] [ √2  −√2  −5/√2 ]
                                                                                [  0   √3   2/√3 ]
                                                                                [  0   0    1/√6 ]
         Note: It is a somewhat challenging exercise to show that the product of the two right-hand
         side matrices does give the left-hand side matrix.
7.  With the original polynomials g1(x) = 1 − x², g2(x) = x + 2x², g3(x) = −2 + x + 4x², from
    Theorem 5.8, steps 1, 2 and 3 (using f1, f2, f3 as the orthogonal polynomials) with inner product
    ⟨a0 + a1x + a2x², b0 + b1x + b2x²⟩ = a0b0 + a1b1 + a2b2:
        f1(x) = g1(x) = 1 − x²
        f2(x) = g2(x) − (⟨g2, f1⟩/⟨f1, f1⟩) f1(x) = x + 2x² − ((−2)/2)(1 − x²)
        f2(x) = 1 + x + x²
        f3(x) = g3(x) − (⟨g3, f1⟩/⟨f1, f1⟩) f1(x) − (⟨g3, f2⟩/⟨f2, f2⟩) f2(x)
              = −2 + x + 4x² − ((−6)/2)(1 − x²) − (3/3)(1 + x + x²)
        f3(x) = 0
    That is, the polynomial f3(x) is the zero polynomial, and this indicates that it is not possible to
    find a non-zero polynomial orthogonal to f1 and f2. This only happens when the original set of
    polynomials is linearly dependent. In this case the linear dependence is:
        g3(x) = g2(x) − 2 g1(x)

8.  For the matrices A1 = [1 1; 2 0], A2 = [2 0; −4 1], A3 = [1 1; 0 1] (written row by row):

    (a)  The Gram-Schmidt process of Theorem 5.8 gives the orthogonal matrices B1, B2, B3 as
         follows:
             B1 = A1 = [1 1; 2 0]
             B2 = A2 − (⟨A2, B1⟩/⟨B1, B1⟩) B1 = A2 − ((−6)/6) B1 = A2 + B1
             B2 = [3 1; −2 1]
             B3 = A3 − (⟨A3, B1⟩/⟨B1, B1⟩) B1 − (⟨A3, B2⟩/⟨B2, B2⟩) B2 = A3 − (2/6) B1 − (5/15) B2
             B3 = [−1/3 1/3; 0 2/3]
         For a simpler answer we multiply B3 by −3 (this does not affect orthogonality), and so:
             B3 = [1 −1; 0 −2]
         Hence, the orthogonal set is:
             {B1, B2, B3} = { [1 1; 2 0], [3 1; −2 1], [1 −1; 0 −2] }

    (b)  Normalizing the orthogonal set (‖B1‖ = √6, ‖B2‖ = √15, ‖B3‖ = √6) gives the orthonormal set:
             {(1/√6) B1, (1/√15) B2, (1/√6) B3}

    (c)  The matrix is C = [3 1; 0 0]. Using Theorem 5.12 with the orthogonal basis found in part (a),
         the required linear combination is (if it exists):
             C = (⟨C, B1⟩/⟨B1, B1⟩) B1 + (⟨C, B2⟩/⟨B2, B2⟩) B2 + (⟨C, B3⟩/⟨B3, B3⟩) B3
               = (4/6) B1 + (10/15) B2 + (2/6) B3
               = (2/3)[1 1; 2 0] + (2/3)[3 1; −2 1] + (1/3)[1 −1; 0 −2]
         Checking this result by multiplying out the matrices and computing the sums shows that it is
         correct:
               = [3 1; 0 0] = C
         so C does lie in the subspace S.

    (d)  Carrying out the same computation for the matrix in part (d), and adding up the resulting
         linear combination of B1, B2, B3, gives a matrix that is not equal to the original matrix. In this
         case the linear combination of the orthogonal matrices is not equal to the given matrix, which
         means the matrix is not in the span of B1, B2, B3. By Theorem 5.13, the linear combination
         is actually the projection of the matrix onto the space S.
9.  For the polynomials:
        g1(x) = 1, g2(x) = x, g3(x) = x², g4(x) = x³, g5(x) = x⁴
    it was shown in Example 5.6.4 that the first four orthogonal polynomials produced by the
    Gram-Schmidt process (Theorem 5.8) are:
        {f1(x), f2(x), f3(x), f4(x)} = {1, x, x² − 1/3, x³ − (3/5)x}

    (a)  The fifth orthogonal polynomial is given by:
             f5(x) = g5(x) − (⟨g5, f1⟩/⟨f1, f1⟩) f1(x) − (⟨g5, f2⟩/⟨f2, f2⟩) f2(x)
                     − (⟨g5, f3⟩/⟨f3, f3⟩) f3(x) − (⟨g5, f4⟩/⟨f4, f4⟩) f4(x)
                   = x⁴ − (∫₋₁¹ x⁴ dx / ∫₋₁¹ 1 dx)·1 − (∫₋₁¹ x⁵ dx / ∫₋₁¹ x² dx)·x
                     − (∫₋₁¹ x⁴(x² − 1/3) dx / ∫₋₁¹ (x² − 1/3)² dx)·(x² − 1/3)
                     − (∫₋₁¹ x⁴(x³ − (3/5)x) dx / ∫₋₁¹ (x³ − (3/5)x)² dx)·(x³ − (3/5)x)
                   = x⁴ − ((2/5)/2)·1 − 0·x − ((16/105)/(8/45))·(x² − 1/3) − 0·(x³ − (3/5)x)
                   = x⁴ − 1/5 − (6/7)(x² − 1/3)
             f5(x) = x⁴ − (6/7)x² + 3/35
         Hence, the orthogonal basis of P4 is:
             {f1(x), f2(x), f3(x), f4(x), f5(x)} = {1, x, x² − 1/3, x³ − (3/5)x, x⁴ − (6/7)x² + 3/35}
         Note: The Legendre Polynomials are multiples of these polynomials, and the first five
         Legendre Polynomials are:
             p0(x) = 1, p1(x) = x, p2(x) = (1/2)(3x² − 1), p3(x) = (1/2)(5x³ − 3x), p4(x) = (1/8)(35x⁴ − 30x² + 3)
         These multiples are chosen so that pk(1) = 1, so they are an orthogonal set but are not
         normalized with respect to the inner product and so do not form an orthonormal set. For
         more details see: http://en.wikipedia.org/wiki/Legendre_polynomials

    (b)  The corresponding orthonormal basis is:
             {(1/‖f1‖) f1(x), (1/‖f2‖) f2(x), (1/‖f3‖) f3(x), (1/‖f4‖) f4(x), (1/‖f5‖) f5(x)}
         where ‖fk‖ = √⟨fk, fk⟩ = √(∫₋₁¹ fk(x)² dx). Evaluating the integrals gives
             ‖f1‖ = √2, ‖f2‖ = √(2/3), ‖f3‖ = √(8/45), ‖f4‖ = √(8/175), ‖f5‖ = √(128/11025),
         so the orthonormal basis is:
             { 1/√2,  √(3/2) x,  (3√5/(2√2))(x² − 1/3),  (5√7/(2√2))(x³ − (3/5)x),  (105/(8√2))(x⁴ − (6/7)x² + 3/35) }

10. For the vectors:
        u1 = (1, 0, 0, 1), u2 = (1, 2, 0, 1)

    (a)  Using the method of Theorem 5.9 to find an orthonormal basis of W:
             v1 = u1 and w1 = (1/‖v1‖) v1 = (1/√2, 0, 0, 1/√2)
             v2 = u2 − (u2·w1) w1 = (1, 2, 0, 1) − √2 (1/√2, 0, 0, 1/√2) = (0, 2, 0, 0)
             w2 = (1/‖v2‖) v2 = (0, 1, 0, 0)
         Hence, an orthonormal basis of W is:
             {w1, w2} = {(1/√2, 0, 0, 1/√2), (0, 1, 0, 0)}
         In order to find a basis of W⊥ we add vectors, and use the Gram-Schmidt process to find an
         orthonormal basis of the whole space R⁴ - the extra vectors are an orthonormal basis of
         W⊥. That is, we first add a third vector w3 to the set w1, w2. The choice of w3 is arbitrary
         (as long as it is not in W), but by noticing that the third components of w1, w2 are both zero
         we can choose w3 = (0, 0, 1, 0) so that it is already normalized and orthogonal to the first
         two vectors.
         We add a fourth vector u4 to the set w1, w2, w3 and arbitrarily choose it to be
         u4 = (1, 0, 0, 3). We use one step of the method from Theorem 5.9 to make the whole set
         orthonormal:
             v4 = u4 − (u4·w1) w1 − (u4·w2) w2 − (u4·w3) w3
                = (1, 0, 0, 3) − 2√2 (1/√2, 0, 0, 1/√2) − 0·(0, 1, 0, 0) − 0·(0, 0, 1, 0) = (−1, 0, 0, 1)
             w4 = (1/‖v4‖) v4 = (−1/√2, 0, 0, 1/√2)
         Hence, we now have an orthonormal basis of R⁴:
             {w1, w2, w3, w4} = {(1/√2, 0, 0, 1/√2), (0, 1, 0, 0), (0, 0, 1, 0), (−1/√2, 0, 0, 1/√2)}
         and by Theorem 5.13, the extra two vectors are a basis for W⊥, namely:
             {w3, w4} = {(0, 0, 1, 0), (−1/√2, 0, 0, 1/√2)}

    (b)  By Theorem 5.12 we can express u = (2, 3, 0, 4) as a linear combination of w1, w2, w3, w4
         by the formula:
             u = (u·w1) w1 + (u·w2) w2 + (u·w3) w3 + (u·w4) w4
               = 3√2 w1 + 3 w2 + 0·w3 + √2 w4
         Hence, by Theorem 5.13, u = u1 + u2 with u1 ∈ W, u2 ∈ W⊥ and:
             u1 = 3√2 w1 + 3 w2 = 3√2 (1/√2, 0, 0, 1/√2) + 3 (0, 1, 0, 0) = (3, 3, 0, 3)
             u2 = 0·w3 + √2 w4 = √2 (−1/√2, 0, 0, 1/√2) = (−1, 0, 0, 1)

    (c)  The QR factorization formula is, with the vectors as columns, and
         u1 = (1, 0, 0, 1), u2 = (1, 2, 0, 1), w1 = (1/√2, 0, 0, 1/√2), w2 = (0, 1, 0, 0),
         w3 = (0, 0, 1, 0), w4 = (−1/√2, 0, 0, 1/√2):

             [ 1  1 ]     [ 1/√2   0   0   −1/√2 ] [ √2  √2 ]
             [ 0  2 ]  =  [  0     1   0     0   ] [  0   2 ]
             [ 0  0 ]     [  0     0   1     0   ] [  0   0 ]
             [ 1  1 ]     [ 1/√2   0   0    1/√2 ] [  0   0 ]

11. The Gram-Schmidt process is given in matrix form, as shown in Example 5.6.11. The columns
    of the original matrix are referred to as u1, u2, …, and the columns of Q are referred to as
    w1, w2, ….

    (a)  Step 1: No special work is needed since the first column is already normalized: w1 = (1, 0).
         Step 2: The second column of Q is equal to the vector
         v2 = u2 − (u2·w1) w1 = (3, 4) − 3(1, 0) = (0, 4), which is then normalized to w2 = (0, 1).
         The second row of R has non-zero entry w2·u2 = 4:

             [ 1  3 ]  =  [ 1  0 ] [ 1  3 ]
             [ 0  4 ]     [ 0  1 ] [ 0  4 ]

         Note: This is rather a trivial case with R = A and Q = I. That is because A is already upper
         triangular.

    (b)  Step 1: The first column of Q is the normalized u1 = (1, 0, −1), that is w1 = (1/√2, 0, −1/√2),
         and the first row of R is [w1·u1, w1·u2, w1·u3] = [√2, 0, 1/√2].
         Step 2: The second column of Q is equal to the vector
         v2 = u2 − (w1·u2) w1 = (1, 1, 1) − 0·w1 = (1, 1, 1), which is then normalized to
         w2 = (1/√3, 1/√3, 1/√3). The second row of R is [0, w2·u2, w2·u3] = [0, √3, 0].
         Step 3: The third column of Q is equal to the vector
             v3 = u3 − (w1·u3) w1 − (w2·u3) w2 = (0, 1, −1) − (1/√2)(1/√2, 0, −1/√2) − 0·w2 = (−1/2, 1, −1/2),
         which is then normalized to w3 = (−1/√6, 2/√6, −1/√6). The last row of R is
         [0, 0, w3·u3] = [0, 0, 3/√6], giving the QR decomposition of B:

             [  1  1  0 ]     [  1/√2   1/√3  −1/√6 ] [ √2   0   1/√2 ]
             [  0  1  1 ]  =  [   0     1/√3   2/√6 ] [  0   √3    0  ]
             [ −1  1 −1 ]     [ −1/√2   1/√3  −1/√6 ] [  0   0   3/√6 ]

    (c)  Step 1: The first column of Q is the normalized u1 = (3, 4, 0), that is w1 = (3/5, 4/5, 0), and
         the first row of R is [w1·u1, w1·u2] = [5, 3].
         Step 2: The second column of Q is equal to the vector
             v2 = u2 − (w1·u2) w1 = (5, 0, 2) − 3(3/5, 4/5, 0) = (16/5, −12/5, 2),
         which is then normalized to w2 = (1/(5√5))(8, −6, 5). The second row of R is
         [0, w2·u2] = [0, 2√5], giving the QR factorization:

             [ 3  5 ]     [ 3/5   8/(5√5) ]
             [ 4  0 ]  =  [ 4/5  −6/(5√5) ] [ 5    3  ]
             [ 0  2 ]     [  0    1/√5    ] [ 0   2√5 ]

    (d)  Step 1: By Theorem 5.9, the first column of Q is the normalized u1 = (1, 0, −2), that is
         w1 = (1/√5, 0, −2/√5), and the first row of R is [w1·u1, w1·u2, w1·u3] = [√5, −2/√5, 4/√5].
         Step 2: The second column of Q is equal to the vector
             v2 = u2 − (u2·w1) w1 = (0, 1, 1) + (2/5)(1, 0, −2) = (2/5, 1, 1/5),
         which is then normalized to w2 = (1/√30)(2, 5, 1). The second row of R is
         [0, w2·u2, w2·u3] = [0, √30/5, 3√30/5].
         Step 3: The third column of Q is equal to the vector:
             v3 = u3 − (w1·u3) w1 − (w2·u3) w2 = (2, 3, −1) − (4/5)(1, 0, −2) − (3/5)(2, 5, 1)
             v3 = (0, 0, 0)
         The zero vector found for v3 indicates that it is not possible to find a third non-zero vector
         orthogonal to the first two columns of Q. This happens because the original matrix D has a
         linear dependence in its columns (the third column equals 3 times the second plus 2 times
         the first), and so the columns of D only span a 2-dimensional subspace of R³.
         Note: Some authors produce a QR factorization anyway, by finding an extra non-zero
         column that makes Q orthogonal and then putting the bottom row of R to be all zeros, giving:

             [  1  0  2 ]     [  1/√5   2/√30   2/√6 ] [ √5  −2/√5    4/√5  ]
             [  0  1  3 ]  =  [   0     5/√30  −1/√6 ] [  0  √30/5   3√30/5 ]
             [ −2  1 −1 ]     [ −2/√5   1/√30   1/√6 ] [  0    0       0    ]

5.7 Least squares approximations

When S is a subspace of an inner product space V, any vector b not in S is shown to have a unique
orthogonal projection vector in the space S, denoted projS b, such that b − projS b is orthogonal to
every vector in S. The orthogonal projection vector solves an optimization problem because it is the
nearest vector in S to the vector b, when distance is defined in terms of the inner product of V. In this
optimization context, projS b is often referred to as the least squares approximation of the vector b
when V is a Euclidean vector space.
We have previously encountered some orthogonal projection formulae. In Euclidean spaces in Unit 3
Section 7: Linear Transformations from Rn to Rm , Theorem 3.14 gives the orthogonal projection onto a
line through the origin, and Theorem 3.19 gives the orthogonal projection onto a plane through the
origin. We also encountered the general inner product space formula for the orthogonal projection onto
a line/vector through the origin in Theorem 5.7 of Unit 5 Section 5, Basic Properties of Inner Product
Spaces. Finally we saw a formula for the orthogonal projection onto a subspace of an inner product
space with a known orthogonal basis, in Theorem 5.13 of Unit 5, Section 6, Orthogonal Bases, the
Gram-Schmidt Process and QR factorization. In this section we develop some formulae for
orthogonal projections that work for any subspace of an inner product space and specific
specializations of these formulae for Euclidean spaces. The orthogonal projection, or least squares
approximation, has a wide variety of applications in Engineering, Science, and Mathematics.
Note: Some of the proofs of theorems are omitted here but can be found in your textbook. Vectors are
shown interchangeably as column matrices and as comma-separated row vectors. The word
projection is often used here to mean orthogonal projection.

5.7.1 Properties of orthogonal projections in inner product spaces

Definition.
If S is a subspace of an inner product space V, and b ∉ S is any vector in V, then the orthogonal
projection (or simply projection) of b in S is a vector y ∈ S such that b − y is orthogonal to every
vector in S. If b is in S we formally define the projection of b in S to be b (the same vector). The
projection is denoted as projS b.

Note: Think of going from b along a line that is perpendicular to every vector in S until the line meets S
at the vector y = projS b. You can also think of it as being analogous to the Euclidean space projection
that is easy to visualize, at least for R² and R³. The picture for R² is shown in Figure 5.5, and a similar
picture for R³ can be found in Unit 3, Linear Transformations from Rn to Rm.
Theorem 5.15. In any inner product space, the orthogonal projection projS b of a vector b into a finite
dimensional subspace S always exists and is unique.

Proof. The subspace S must have a basis, and this basis can be converted to an orthogonal basis
using the Gram-Schmidt algorithm of the previous section, Orthogonal Bases, the Gram-Schmidt
Process, and QR factorization. Furthermore, Theorem 5.13 of the previous section gives a formula for
the orthogonal projection in terms of this orthogonal basis - hence, the projection always exists.

In order to show it is unique, suppose y1 and y2 are both projections of b. Since y1, y2 ∈ S:
    ⟨b − y1, y1⟩ = 0  ⟹  ⟨b, y1⟩ = ⟨y1, y1⟩
    ⟨b − y2, y2⟩ = 0  ⟹  ⟨b, y2⟩ = ⟨y2, y2⟩
and
    ⟨b − y1, y2⟩ = 0  ⟹  ⟨b, y2⟩ = ⟨y1, y2⟩
    ⟨b − y2, y1⟩ = 0  ⟹  ⟨b, y1⟩ = ⟨y2, y1⟩
Since ⟨y1, y2⟩ = ⟨y2, y1⟩ (basic axiom that inner products are commutative) it follows that:
    ⟨y1, y1⟩ = ⟨y2, y2⟩ = ⟨y1, y2⟩ = ⟨y2, y1⟩
Hence, the square of the distance from y1 to y2 is (using the linearity and commutativity axioms of the
inner product):
    [d(y1, y2)]² = ‖y1 − y2‖² = ⟨y1 − y2, y1 − y2⟩
                 = ⟨y1, y1⟩ + ⟨y2, y2⟩ − 2⟨y1, y2⟩
                 = 0   (by the equations derived above)
Hence, ‖y1 − y2‖ = 0, but the fifth axiom of inner products (see the previous section, Basic Definitions
and Properties of Inner Product Spaces) ensures this can only happen if y1 − y2 = 0. That is, y1 = y2,
so the orthogonal projection is uniquely defined.

Figure 5.5: Projection onto a Line in R²

Theorem 5.16. In any inner product space, the orthogonal projection projS b is the nearest vector in S
to the vector b. That is, any other vector x ∈ S satisfies:
    d(b, x) > d(b, projS b)   or equivalently:   ‖b − x‖ > ‖b − projS b‖

Proof. By the definition of projection, b − projS b is perpendicular to any vector in S and, in particular,
if x ≠ projS b is any other vector in S, then b − projS b is perpendicular to x − projS b. Recall
Theorem 5.5 (Pythagoras) of the previous section, Basic Definitions and Properties of Inner Product
Spaces: for two perpendicular vectors u, v,
    ‖u − v‖² = ‖u‖² + ‖v‖²
Apply this with u = b − projS b, v = x − projS b to give the result:
    ‖(b − projS b) − (x − projS b)‖² = ‖b − projS b‖² + ‖x − projS b‖²
    ‖b − x‖² = ‖b − projS b‖² + ‖x − projS b‖²
However, ‖x − projS b‖² > 0, since x − projS b ≠ 0, and so:
    ‖b − x‖² > ‖b − projS b‖²  ⟹  ‖b − x‖ > ‖b − projS b‖, as asserted.

Theorem 5.17. Suppose V is an inner product space, and S is a 2-dimensional subspace with a basis
{v1, v2}. If the orthogonal projection vector is given by the linear combination of these basis vectors as:
    projS b = k1 v1 + k2 v2 where k1, k2 ∈ R
then k1, k2 are solutions of the linear equations with non-singular coefficient matrix:

    [ ⟨v1, v1⟩  ⟨v1, v2⟩ ] [ k1 ]  =  [ ⟨v1, b⟩ ]
    [ ⟨v2, v1⟩  ⟨v2, v2⟩ ] [ k2 ]     [ ⟨v2, b⟩ ]

Note: The matrix is symmetric because ⟨v2, v1⟩ = ⟨v1, v2⟩.

Proof. If b − projS b is orthogonal to v1 and v2 then it is left as an exercise for the reader to show that
b − projS b is orthogonal to all vectors in S. Hence, we have the two equations (using projS b − b
instead of b − projS b):

    ⟨projS b − b, v1⟩ = 0  ⟹  ⟨k1 v1 + k2 v2 − b, v1⟩ = 0  ⟹  k1 ⟨v1, v1⟩ + k2 ⟨v2, v1⟩ = ⟨b, v1⟩
    ⟨projS b − b, v2⟩ = 0  ⟹  ⟨k1 v1 + k2 v2 − b, v2⟩ = 0  ⟹  k1 ⟨v1, v2⟩ + k2 ⟨v2, v2⟩ = ⟨b, v2⟩

The matrix form of the two equations is the required result (using the commutativity ⟨b, v1⟩ = ⟨v1, b⟩,
⟨b, v2⟩ = ⟨v2, b⟩, ⟨v2, v1⟩ = ⟨v1, v2⟩).

The determinant of the coefficient matrix is (where θ is the angle between the two basis vectors):
    ⟨v1, v1⟩⟨v2, v2⟩ − ⟨v1, v2⟩² = ‖v1‖²‖v2‖² − ‖v1‖²‖v2‖² cos²θ
                                = ‖v1‖²‖v2‖² (1 − cos²θ)
                                = ‖v1‖²‖v2‖² sin²θ
This is zero only if one of ‖v1‖, ‖v2‖ is zero, in which case that vector is the zero vector, and so not a
basis vector, or sin θ = 0, in which case θ = 0 and v1, v2 are parallel vectors and so cannot be a basis
of S. Hence, the determinant cannot be zero when {v1, v2} is a basis, and so the coefficient matrix is
non-singular.

Theorem 5.18. Suppose V is an inner product space, and S is a k-dimensional subspace with a
basis {v1, v2, ..., vk}. If the orthogonal projection vector is given by the linear combination of these
basis vectors as:

projS b = k1 v1 + k2 v2 + ... + kk vk   where k1, k2, ..., kk ∈ R

then k1, k2, ..., kk are solutions of the k × k system of linear equations with k × k non-singular
coefficient matrix:

[ ⟨v1, v1⟩  ⟨v1, v2⟩  ...  ⟨v1, vk⟩ ] [ k1 ]   [ ⟨v1, b⟩ ]
[ ⟨v2, v1⟩  ⟨v2, v2⟩  ...  ⟨v2, vk⟩ ] [ k2 ]   [ ⟨v2, b⟩ ]
[    ...       ...    ...     ...   ] [ .. ] = [   ...   ]
[ ⟨vk, v1⟩  ⟨vk, v2⟩  ...  ⟨vk, vk⟩ ] [ kk ]   [ ⟨vk, b⟩ ]

Note: The matrix is symmetric because ⟨vi, vj⟩ = ⟨vj, vi⟩ for each i, j.


Proof. The proof is not given but is similar to the proof of Theorem 5.17.
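The same pattern works for any k: build the k × k Gram matrix and solve. A compact sketch under the same
caveats as before (illustrative names, NumPy only for convenience):

```python
import numpy as np

def project(b, basis, inner=np.dot):
    """Project b onto span(basis) using the k x k Gram system of Theorem 5.18.

    basis is a list of k vectors; inner() is the inner product (dot product by default).
    """
    k = len(basis)
    G   = np.array([[inner(basis[i], basis[j]) for j in range(k)] for i in range(k)])
    rhs = np.array([inner(v, b) for v in basis])
    coeffs = np.linalg.solve(G, rhs)
    return sum(c * v for c, v in zip(coeffs, basis))
```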

Example 5.7.1.
In P3 find the projection of the polynomial f(x) = 2 − 3x + 4x² onto the subspace S with basis
{b1(x), b2(x)} = {x, x²}

(a) Using the inner product ⟨a0 + a1x + a2x², b0 + b1x + b2x²⟩ = a0b0 + a1b1 + a2b2

(b) Requires calculus. Using the inner product ⟨f, g⟩ = ∫₋₁¹ f(x) g(x) dx
Solution. The following is the solution:

(a) If the projection is the linear combination of the basis vectors p(x) = k1 x + k2 x² then by Theorem
    5.17, k1, k2 satisfy:

    [ ⟨b1, b1⟩  ⟨b1, b2⟩ ] [ k1 ]   [ ⟨b1, f⟩ ]        [ 1  0 ] [ k1 ]   [ −3 ]
    [ ⟨b2, b1⟩  ⟨b2, b2⟩ ] [ k2 ] = [ ⟨b2, f⟩ ]   ⟹   [ 0  1 ] [ k2 ] = [  4 ]

    with solution k1 = −3, k2 = 4. Hence, the projection of f(x) = 2 − 3x + 4x² onto the subspace
    with basis {x, x²} is the polynomial:

    p(x) = −3x + 4x²

    Note: The projection just drops the constant term of the polynomial, which seems intuitively
    reasonable.
(b) If the projection is the linear combination of the basis vectors p(x) = k1 x + k2 x² then by Theorem
    5.17, k1, k2 satisfy:

    [ ⟨b1, b1⟩  ⟨b1, b2⟩ ] [ k1 ]   [ ⟨b1, f⟩ ]
    [ ⟨b2, b1⟩  ⟨b2, b2⟩ ] [ k2 ] = [ ⟨b2, f⟩ ]

    [ ∫₋₁¹ x² dx   ∫₋₁¹ x³ dx ] [ k1 ]   [ ∫₋₁¹ x (2 − 3x + 4x²) dx  ]
    [ ∫₋₁¹ x³ dx   ∫₋₁¹ x⁴ dx ] [ k2 ] = [ ∫₋₁¹ x² (2 − 3x + 4x²) dx ]

    Evaluating the integrals (details not shown) the equations become:

    [ 2/3   0  ] [ k1 ]   [  −2   ]
    [  0   2/5 ] [ k2 ] = [ 44/15 ]

    with solution k1 = −3, k2 = 22/3. Hence, the projection of f(x) = 2 − 3x + 4x² onto the subspace
    with basis {x, x²} is the polynomial:

    p(x) = −3x + (22/3) x²

    Note: To check the accuracy of the result in part (b), compute f(x) − p(x) = 2 − (10/3) x² (that is,
    f(x) − projS f), and show it is orthogonal to the basis polynomials, b1(x) and b2(x). That is,
    check that:

    ∫₋₁¹ x (2 − (10/3) x²) dx = 0   and   ∫₋₁¹ x² (2 − (10/3) x²) dx = 0
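Part (b) above is easy to check numerically: Gauss-Legendre quadrature reproduces the integral inner product
exactly for polynomials, so the Gram system can be built and solved in a few lines. A minimal sketch (the
names are mine, not from the text):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

# Gauss-Legendre nodes/weights give exact integrals of low-degree polynomials on [-1, 1]
nodes, weights = leggauss(8)

def inner(p, q):
    # <p, q> = integral of p(x) q(x) over [-1, 1]
    return float(np.sum(weights * p(nodes) * q(nodes)))

f  = lambda x: 2 - 3*x + 4*x**2
b1 = lambda x: x
b2 = lambda x: x**2

G   = np.array([[inner(b1, b1), inner(b1, b2)],
                [inner(b2, b1), inner(b2, b2)]])
rhs = np.array([inner(b1, f), inner(b2, f)])
k1, k2 = np.linalg.solve(G, rhs)
print(k1, k2)   # approximately -3 and 22/3
```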


Example 5.7.2.
Find the orthogonal projection of the matrix A onto the subspace S of M22 with basis {B, C}, using
the usual inner product (scalar product of the vectors formed from the four coefficients of the matrices):

A = [ 2  0 ]    B = [ 1  0 ]    C = [ 0   1 ]
    [ 3  2 ]        [ 0  1 ]        [ 1  −1 ]

Solution. If the projection is the linear combination of the basis vectors D = k1 B + k2 C then by
Theorem 5.17, k1, k2 satisfy:

[ ⟨B, B⟩  ⟨B, C⟩ ] [ k1 ]   [ ⟨B, A⟩ ]        [  2  −1 ] [ k1 ]   [ 4 ]
[ ⟨C, B⟩  ⟨C, C⟩ ] [ k2 ] = [ ⟨C, A⟩ ]   ⟹   [ −1   3 ] [ k2 ] = [ 1 ]

with solution k1 = 13/5, k2 = 6/5. Hence, the projection is:

D = (13/5) [ 1  0 ] + (6/5) [ 0   1 ]  =  [ 13/5  6/5 ]
           [ 0  1 ]         [ 1  −1 ]     [  6/5  7/5 ]

Note: To check the accuracy of the result, compute

A − D = [ −3/5  −6/5 ]
        [  9/5   3/5 ]

(that is, A − projS A), and show it is orthogonal to the basis matrices B, C. That is, check that:

⟨A − D, B⟩ = 0   and   ⟨A − D, C⟩ = 0
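Because this inner product is just the dot product of the flattened matrices, the example can be checked by
reshaping each 2 × 2 matrix into a length-4 vector. A short sketch using the matrices above:

```python
import numpy as np

A = np.array([[2.0, 0.0], [3.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0]])
C = np.array([[0.0, 1.0], [1.0, -1.0]])

# Flatten, so the matrix inner product becomes an ordinary dot product.
a, b, c = A.ravel(), B.ravel(), C.ravel()

G   = np.array([[b @ b, b @ c],
                [c @ b, c @ c]])
rhs = np.array([b @ a, c @ a])
k1, k2 = np.linalg.solve(G, rhs)

D = k1 * B + k2 * C                               # the projection of A onto span{B, C}
print(k1, k2)                                     # 13/5 and 6/5
print((A - D).ravel() @ b, (A - D).ravel() @ c)   # both zero: A - D is orthogonal to B and C
```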

5.7.2  Properties of orthogonal projections in Euclidean spaces Rn

Euclidean spaces, Rn, n = 1, 2, 3, ..., are inner product spaces. We assume the usual inner product
⟨u, v⟩ = u · v (the scalar product). The orthogonal projection in a Euclidean space Rn is often called
the Least Squares Approximation. The reason for this name is that if b is a vector and y = projS b is
its projection on a subspace S then y is the nearest vector to b in S. Hence, if e = b − y is the error
vector of the approximation of b by y then the length ‖e‖ is minimized. However:

‖e‖ = √( e1² + e2² + ... + en² )

where e1, e2, ..., en are the components of e, and so the orthogonal projection minimizes the sum of
the squares of the ei; that is, it finds the least value of the sum of squares.
The results in the previous section for general inner product spaces hold in Euclidean spaces, but have
some extra features. Theorem 5.17 becomes:

Theorem 5.19. Suppose that in Rn, for some n ≥ 2, S is a 2-dimensional subspace with a basis
{v1, v2}. If the orthogonal projection of the vector b is given by the linear combination of these basis
vectors as:

projS b = k1 v1 + k2 v2   where k1, k2 ∈ R

then k1, k2 are solutions of the linear equations with symmetric non-singular coefficient matrix:

[ v1 · v1   v1 · v2 ] [ k1 ]   [ v1 · b ]
[ v2 · v1   v2 · v2 ] [ k2 ] = [ v2 · b ]

This can also be re-written in terms of products of matrices as:

AᵀAX = AᵀB


where:

Aᵀ = [ v1ᵀ ]     A = [ v1 | v2 ]     X = [ k1 ]     B = [ b ]
     [ v2ᵀ ]                             [ k2 ]

That is, the matrix A has the column vectors v1, v2 as columns, the transpose matrix Aᵀ has the
vectors v1ᵀ, v2ᵀ as rows, and B has the single vector b as its column.
Note: The matrix equations AᵀAX = AᵀB are called the normal equations and are usually derived
directly using matrix/vector methods in Rn without appealing to the general theorem of inner product
spaces.

Proof. The first matrix equation containing the scalar products follows directly from Theorem 5.17. The
proof that this is the same as the second matrix form follows directly by using block matrix multiplication
of the rows v1ᵀ, v2ᵀ of Aᵀ by the columns v1, v2, b of A and B:

[ v1ᵀ ] [ v1 | v2 ] [ k1 ]   [ v1ᵀ ] [ b ]        [ v1ᵀ v1   v1ᵀ v2 ] [ k1 ]   [ v1ᵀ b ]
[ v2ᵀ ]             [ k2 ] = [ v2ᵀ ]         ⟹   [ v2ᵀ v1   v2ᵀ v2 ] [ k2 ] = [ v2ᵀ b ]

Since viᵀ vj = vi · vj and viᵀ b = vi · b this last equation is the same as the first matrix equation of the
theorem.
Theorem 5.20. Suppose that in Rn, for some n ≥ k, S is a k-dimensional subspace with a basis
{v1, v2, ..., vk}. If the orthogonal projection of the vector b is given by the linear combination of these
basis vectors as:

projS b = k1 v1 + k2 v2 + ... + kk vk   where k1, k2, ..., kk ∈ R

then k1, k2, ..., kk are solutions of the linear equations with symmetric non-singular coefficient matrix:

[ v1 · v1   v1 · v2   ...   v1 · vk ] [ k1 ]   [ v1 · b ]
[ v2 · v1   v2 · v2   ...   v2 · vk ] [ k2 ]   [ v2 · b ]
[    ...       ...    ...      ...  ] [ .. ] = [   ...  ]
[ vk · v1   vk · v2   ...   vk · vk ] [ kk ]   [ vk · b ]
This can also be re-written in terms of products of matrices as:

AᵀAX = AᵀB

where:

Aᵀ = [ v1ᵀ ]     A = [ v1 | v2 | ... | vk ]     X = [ k1 ]     B = [ b ]
     [ v2ᵀ ]                                        [ k2 ]
     [ ... ]                                        [ .. ]
     [ vkᵀ ]                                        [ kk ]

That is, the matrix A has the column vectors v1, v2, ..., vk as columns, the transpose matrix Aᵀ has
the row vectors v1ᵀ, v2ᵀ, ..., vkᵀ as rows, and B has the single vector b as its column.

Proof. The proof is not given here, but it would use methods analogous to the proof of Theorem
5.19.


Theorem 5.21. Suppose that in Rn, for some n ≥ k, S is a k-dimensional subspace with a basis
{v1, v2, ..., vk}, and A has the vi as columns:

A = [ v1 | v2 | ... | vk ]

The projection matrix P, such that for any b ∈ Rn, P b is the projection of b onto S, is given by:

P = A (AᵀA)⁻¹ Aᵀ

Proof. From Theorem 5.20, the matrix AᵀA is non-singular, and the projection satisfies:

projS b = k1 v1 + k2 v2 + ... + kk vk = [ v1 | v2 | ... | vk ] [ k1 ]
                                                               [ k2 ]
                                                               [ .. ]
                                                               [ kk ]  = AX

X is the solution of AᵀAX = AᵀB, where B = [b]. This is given by the inverse matrix method as:

X = (AᵀA)⁻¹ Aᵀ B

and so the projection is:

AX = A (AᵀA)⁻¹ Aᵀ B

and so the projection matrix is A (AᵀA)⁻¹ Aᵀ.
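Theorem 5.21 is one line of matrix algebra in NumPy. A minimal sketch, using the basis and vector of
Example 5.7.3 below so the output can be compared with the worked answer:

```python
import numpy as np

# Basis vectors of S as the columns of A (Example 5.7.3)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, -3.0]])
b = np.array([1.0, 2.0, 3.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T     # projection matrix of Theorem 5.21
print(P @ b)                             # the projection of b onto S: (2, 1/2, 5/2)
print(P @ P - P)                         # P is idempotent: P^2 = P (up to round-off)
```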

Example 5.7.3.
Find the orthogonal projection of the vector b = (1, 2, 3) onto the subspace S with basis
{v1, v2} = {(1, 0, 2), (0, 1, −3)}.

Solution. If the projection is p = k1 v1 + k2 v2 then, using Theorem 5.19, the equations satisfied by
k1, k2 are:

[ v1 · v1   v1 · v2 ] [ k1 ]   [ v1 · b ]        [  5  −6 ] [ k1 ]   [  7 ]
[ v2 · v1   v2 · v2 ] [ k2 ] = [ v2 · b ]   ⟹   [ −6  10 ] [ k2 ] = [ −7 ]

with solution k1 = 2, k2 = 1/2. Hence, the orthogonal projection is:

p = 2 (1, 0, 2) + (1/2) (0, 1, −3) = (2, 1/2, 5/2)

Note: To check the accuracy of this result, show that b − p = (−1, 3/2, 1/2) is orthogonal to the basis
vectors v1, v2. That is, check that:

(−1, 3/2, 1/2) · (1, 0, 2) = 0   and   (−1, 3/2, 1/2) · (0, 1, −3) = 0

Note: The alternative way of setting up the equations, used in most textbooks, is as follows. Construct
the matrices, A with the basis vectors as columns and B with b as its column:

    [ 1   0 ]        [ 1 ]
A = [ 0   1 ],   B = [ 2 ]
    [ 2  −3 ]        [ 3 ]


Compute the normal equations AᵀAX = AᵀB (these will be the same as the equations used above to
solve this example):

[ 1  0   2 ] [ 1   0 ] [ k1 ]   [ 1  0   2 ] [ 1 ]
[ 0  1  −3 ] [ 0   1 ] [ k2 ] = [ 0  1  −3 ] [ 2 ]
             [ 2  −3 ]                       [ 3 ]

After multiplying the first two matrices on each side these become:

[  5  −6 ] [ k1 ]   [  7 ]
[ −6  10 ] [ k2 ] = [ −7 ]

5.7.3  Applications of least squares in Euclidean spaces Rn

Approximate solutions of inconsistent systems of equations


In many applications the solution of a problem is given by the solution of a linear system of equations,
but this system turns out to be inconsistent (it does not have a solution). This can happen when there is
an excess of data that produces more equations than unknowns and inaccuracies in the data or model
used ensure this system is inconsistent. In other applications the model deliberately creates an
inconsistent system and requires a best possible approximate solution to the system (see Example
5.7.5 below).
There are many ways to find an approximate solution to an inconsistent system. The Least Squares
Approximation of an inconsistent system AX = B uses results established in Theorems 5.19, 5.20
and 5.21 that are, for convenience, re-written in equation terminology in Theorem 5.22.
Theorem 5.22. Suppose a linear system is AX = B, where A is an m × n matrix with m ≥ n, X is the
n × 1 column of unknowns, and B is the m × 1 column vector of right hand values. The least squares
approximate solution for X of the system is defined to be the orthogonal projection of B onto the space
spanned by the columns of A, and it is given by the solution of the normal equations:

AᵀAX = AᵀB

The least squares approximation minimizes the sum of the squares of the differences between the right
hand side and left hand side of each of the equations (that is, each row of AX = B). If the columns of
A are linearly independent then the coefficient matrix AᵀA is non-singular and then the solution can be
written:

X = (AᵀA)⁻¹ Aᵀ B

The orthogonal projection of B onto the column space of A is given by:

AX = A (AᵀA)⁻¹ Aᵀ B,   and the projection matrix is:   A (AᵀA)⁻¹ Aᵀ

Note: This formula holds even if the original system is consistent (has an exact solution). In that case
the least squares approximation is the same as the exact solution of the system. When the columns of
A are linearly dependent, the coefficient matrix AᵀA is singular, and the normal equations have
infinitely many solutions, but all of these will give the same value of AX.
Proof. This theorem is a variation of Theorems 5.19 and 5.20. To make the connection clearer, note
that AX is simply a linear combination of the column vectors of A. That is, if, in column form,
A = [ v1 | v2 | ... | vn ] and Xᵀ = [ x1  x2  ...  xn ] then:

AX = [ v1 | v2 | ... | vn ] [ x1 ]
                            [ x2 ]
                            [ .. ]
                            [ xn ]  = x1 v1 + x2 v2 + ... + xn vn


Hence, the problem is to find the orthogonal projection of the right hand side B onto the subspace
spanned by the columns of A, which is exactly the problem solved by Theorems 5.19 and 5.20.
Theorems 5.19 and 5.20 did not cover the case where the matrix AᵀA is singular (because the
columns of A are linearly dependent). However, in that case, if X is any one solution of the normal
equations AᵀAX = AᵀB then all other solutions are given by X + Y where Y is any vector in the null
space of A. In that case all solutions give the same value for the projection AX because
A(X + Y) = AX + AY = AX.
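In practice the least squares solution is computed either from the normal equations or with a library routine;
the two agree whenever AᵀA is non-singular. A small sketch (the random system is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))       # 6 equations, 3 unknowns: almost surely inconsistent
B = rng.normal(size=6)

# Normal equations of Theorem 5.22
X_normal = np.linalg.solve(A.T @ A, A.T @ B)

# Library least squares solver (more numerically stable than forming A^T A explicitly)
X_lstsq, residual, rank, _ = np.linalg.lstsq(A, B, rcond=None)

print(np.allclose(X_normal, X_lstsq))   # True: the two approaches agree
```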

Example 5.7.4.
Find the least squares approximation, and find the error in the solution, for the system of equations
AX = B:

[ 1  3 ]           [  2 ]
[ 2  4 ] [ x1 ]  = [ −1 ]
[ 1  4 ] [ x2 ]    [  6 ]

Solution. Solving the first two equations gives x1 = −11/2, x2 = 5/2, but these two values do not satisfy
the third equation, so the system is inconsistent. Using the method of Theorem 5.19, the least squares
approximation X is the solution of the normal equations AᵀAX = AᵀB, and this is:

[ 1  2  1 ] [ 1  3 ] [ x1 ]   [ 1  2  1 ] [  2 ]
[ 3  4  4 ] [ 2  4 ] [ x2 ] = [ 3  4  4 ] [ −1 ]
            [ 1  4 ]                      [  6 ]

[  6  15 ] [ x1 ]   [  6 ]
[ 15  41 ] [ x2 ] = [ 26 ]

with solution x1 = −48/7 and x2 = 22/7. The differences between the left and right sides give the error
vector:

[ 1  3 ]              [  2 ]   [  18/7 ]   [  2 ]   [  4/7 ]
[ 2  4 ] [ −48/7 ]  − [ −1 ] = [  −8/7 ] − [ −1 ] = [ −1/7 ]
[ 1  4 ] [  22/7 ]    [  6 ]   [  40/7 ]   [  6 ]   [ −2/7 ]

Hence, the errors in the three equations are 4/7, −1/7, −2/7.

Note: The method actually minimizes the sum of the squares of the errors. That is, in this example,
(4/7)² + (1/7)² + (2/7)² = 3/7 is as small as it possibly can be.
Note: In many inconsistent systems of equations it would be better to minimize the sums of the
absolute values of the errors (rather than the squares of these), but this is a more difficult problem to
solve. In particular, when an inconsistent system has an equation constructed from data that is totally
wrong, the least squares method tends to exaggerate the effect of that invalid equation on the
approximate solution (rather than minimizing it).
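The numbers in this example are easy to reproduce; a short check in NumPy:

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, 4.0],
              [1.0, 4.0]])
B = np.array([2.0, -1.0, 6.0])

X = np.linalg.solve(A.T @ A, A.T @ B)   # normal equations
print(X)                                # [-48/7, 22/7] = [-6.857..., 3.142...]
print(A @ X - B)                        # errors [4/7, -1/7, -2/7]
print(np.sum((A @ X - B) ** 2))         # 3/7, the minimal sum of squares
```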
Linear regression

Example 5.7.5.
A large set of data in R2: (xi, yi), i = 1, 2, ..., m is obtained in an experiment. The data should lie on a
straight line but experimental errors, or inaccuracies in the model, have ensured that the data does not
lie on a straight line. The experimenters want to find a straight line, y = a + bx, that best fits the data,
and they decide to minimize the sums of the squares of the deviations in the y values. Show that the
values for a, b are given by:

a = ( Σy Σx² − Σx Σxy ) / ( m Σx² − (Σx)² ),    b = ( m Σxy − Σx Σy ) / ( m Σx² − (Σx)² )


where the formulae use the following abbreviations for clarity:

Σy means Σ(i=1 to m) yi,   Σx means Σ(i=1 to m) xi,   Σxy means Σ(i=1 to m) xi yi,   Σx² means Σ(i=1 to m) xi².

Solution. Set up each data point for the equation in the form a + bx = y, with unknowns a and b:

a + bx1 = y1          [ 1   x1 ]         [ y1 ]
a + bx2 = y2          [ 1   x2 ] [ a ]   [ y2 ]
a + bx3 = y3    ⟹    [ 1   x3 ] [ b ] = [ y3 ]
  ...                 [ ..  .. ]         [ .. ]
a + bxm = ym          [ 1   xm ]         [ ym ]

This is (almost certainly) an inconsistent system of the form AX = B, and so we can solve it by the
method of Theorem 5.22, by solving the system AᵀAX = AᵀB:

[ 1   1   1   ...  1  ] [ 1   x1 ] [ a ]   [ 1   1   1   ...  1  ] [ y1 ]
[ x1  x2  x3  ...  xm ] [ 1   x2 ] [ b ] = [ x1  x2  x3  ...  xm ] [ y2 ]
                        [ ..  .. ]                                 [ .. ]
                        [ 1   xm ]                                 [ ym ]

[ m    Σx  ] [ a ]   [ Σy  ]
[ Σx   Σx² ] [ b ] = [ Σxy ]

Noting that Σ(i=1 to m) 1 = m, the solution given by the formula for the inverse of a 2 × 2 matrix is:

[ a ]          1         [  Σx²  −Σx ] [ Σy  ]          1         [ Σy Σx² − Σx Σxy ]
[ b ] =  -------------   [ −Σx    m  ] [ Σxy ]  =  -------------  [ m Σxy − Σx Σy   ]
         m Σx² − (Σx)²                              m Σx² − (Σx)²

Note: The regression line is a basic result in statistics, where it is often written in a simpler form using
statistical constructs: x̄ = (Σx)/m (the mean of the xi values), ȳ = (Σy)/m. The formulae become (see the
exercise set for more details):

b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²    and    a = ȳ − b x̄

Note: See the exercise set for numerical examples using this formula.
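As a computational sketch of Example 5.7.5 (the data set is the one used in exercise 6 of this section, so the
result can be compared with the worked solution there):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.3, 1.9, 2.6, 3.0])
m = len(x)

# Closed-form formulas from Example 5.7.5
denom = m * np.sum(x**2) - np.sum(x)**2
a = (np.sum(y) * np.sum(x**2) - np.sum(x) * np.sum(x*y)) / denom
b = (m * np.sum(x*y) - np.sum(x) * np.sum(y)) / denom
print(a, b)          # 0.9 and 0.53

# The same result via the normal equations A^T A X = A^T B
A = np.column_stack([np.ones(m), x])
print(np.linalg.solve(A.T @ A, A.T @ y))   # [0.9, 0.53]
```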
Least squares approximations for nonlinear functions

The least squares method uses linear methods, but it can also be used to find least squares data fits using
polynomials and exponential functions.


Example 5.7.6.
A given set of data in R2: (xi, yi), i = 1, 2, ..., m is expected to fit a quadratic function but experimental
errors, or inaccuracies in the model, have ensured that no quadratic fits it exactly. We want to find a
quadratic, y = a + bx + cx², by minimizing the sums of the squares of the deviations in the y values.
Show that the values of a, b, c are given by the solutions of the normal equations AᵀAX = AᵀB, where:

    [ 1   x1   x1² ]         [ a ]        [ y1 ]
    [ 1   x2   x2² ]         [ b ]        [ y2 ]
A = [ 1   x3   x3² ],    X = [ c ],   B = [ y3 ]
    [ ..  ..   ..  ]                      [ .. ]
    [ 1   xm   xm² ]                      [ ym ]

Solution. Set up each data point for the equation in the form a + bx + cx² = y, with unknowns a, b, c:

a + bx1 + cx1² = y1          [ 1   x1   x1² ]         [ y1 ]
a + bx2 + cx2² = y2          [ 1   x2   x2² ] [ a ]   [ y2 ]
a + bx3 + cx3² = y3    ⟹    [ 1   x3   x3² ] [ b ] = [ y3 ]
  ...                        [ ..  ..   ..  ] [ c ]   [ .. ]
a + bxm + cxm² = ym          [ 1   xm   xm² ]         [ ym ]

This is (almost certainly) an inconsistent system of the form AX = B and so we can solve it by the
method of Theorem 5.22 (and Example 5.7.5) by solving the system AᵀAX = AᵀB as asserted.
Note: For numerical examples, see the exercise set.
Note: A method exactly analogous to this example can be used to find polynomials of any degree that
approximate a data set. However, the problem becomes numerically unstable if the degree of the
polynomial is large. The process generally works reasonably well with polynomials of degree 3 and 4.
Note: The method will also become unstable and may give meaningless results if the data really does
not all lie reasonably close to a polynomial of the degree used in the approximation.
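A sketch of the quadratic fit in NumPy (the data are those of exercise 8(a) below, so the coefficients can be
compared with the worked solution):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 0.0, -1.0, 0.0, 1.0])

# Build the matrix of Example 5.7.6 and solve the normal equations
A = np.column_stack([np.ones_like(x), x, x**2])
a, b, c = np.linalg.solve(A.T @ A, A.T @ y)
print(a, b, c)                       # 68/35, -87/35, 4/7

# np.polyfit does the same least squares fit (coefficients in decreasing degree)
print(np.polyfit(x, y, deg=2))       # [c, b, a]
```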
Example 5.7.7.
A given set of data in R2: (xi, yi), i = 1, 2, ..., m is expected to fit an exponential function, y = a e^(bx),
but experimental errors, or inaccuracies in the model, have ensured that no such function fits it exactly.
We want to find an exponential function y = a e^(bx) by minimizing the sums of the squares of the
deviations in the y values. Show that the values of a, b are given by the solutions of the normal
equations AᵀAX = AᵀB, where:

    [ 1   x1 ]        [ ln y1 ]
    [ 1   x2 ]        [ ln y2 ]        [ ln a ]
A = [ 1   x3 ],   B = [ ln y3 ],   X = [   b  ]
    [ ..  .. ]        [  ...  ]
    [ 1   xm ]        [ ln ym ]

Solution. Set up each data point for the equation in the form a e^(bx) = y, with unknowns a, b:

a e^(bx1) = y1                        ln a + b x1 = ln y1          [ 1   x1 ]            [ ln y1 ]
a e^(bx2) = y2    Take logs of        ln a + b x2 = ln y2          [ 1   x2 ] [ ln a ]   [ ln y2 ]
a e^(bx3) = y3    both sides    ⟹    ln a + b x3 = ln y3    ⟹    [ 1   x3 ] [   b  ] = [ ln y3 ]
   ...                                   ...                       [ ..  .. ]            [  ...  ]
a e^(bxm) = ym                        ln a + b xm = ln ym          [ 1   xm ]            [ ln ym ]

This is (almost certainly) an inconsistent system of the form AX = B, and so we can solve it by the
method of Theorem 5.22 (and Examples 5.7.5, 5.7.6) by solving the system AᵀAX = AᵀB as
asserted.
Note: For numerical examples, see the exercise set.
Note: The solution does not actually solve the problem as stated, because it actually finds the least
squares solution of the equations formed by taking logs of both sides. That is, the solution is not a least
squares solution of the original equations a e^(bxi) = yi.
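A sketch of the log-transformed fit (the data are those of exercise 9 below); note, as the last remark says,
this minimizes the squared errors in ln y, not in y:

```python
import numpy as np

x = np.array([-1.0, 0.0, 0.5, 1.0, 1.5])
y = np.array([0.5, 1.0, 2.0, 3.0, 5.0])

# Fit ln y = ln a + b x by ordinary least squares, then undo the log
A = np.column_stack([np.ones_like(x), x])
ln_a, b = np.linalg.solve(A.T @ A, A.T @ np.log(y))
a = np.exp(ln_a)
print(a, b)                    # roughly 1.18 and 0.94
print(y - a * np.exp(b * x))   # residuals of the original (un-logged) data
```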

Section 5.7 exercise set


Check your understanding by answering the following questions.
1. Find the orthogonal projection in R3 of the vector (1, 2, 3) onto the two-dimensional subspace
   spanned by the vectors {(2, 0, −3), (2, 1, 1)}, using the inner product:

   (a) The normal scalar product

   (b) The inner product: ⟨(a, b, c), (e, f, g)⟩ = ae + bf + 2cg

2. Find the orthogonal projection in P4 of the polynomial f(x) = x + x⁴ onto P2 with basis
   {b1(x), b2(x), b3(x)} = {1, x, x²} using the inner product:

   (a) ⟨a0 + a1x + a2x² + a3x³ + a4x⁴, b0 + b1x + b2x² + b3x³ + b4x⁴⟩
       = a0b0 + a1b1 + a2b2 + a3b3 + a4b4

   (b) Requires calculus: ⟨p, q⟩ = ∫₋₁¹ p(x) q(x) dx

3. Find the projection of the matrix A onto the subspace of M22 spanned by the matrices B, C, D,
   with the inner product defined in the usual way as the scalar product of the vectors formed from
   the four entries in each matrix or, equivalently, as the trace (sum of diagonal values) of the
   transpose of one matrix multiplied by the other matrix.

   A = [ 2  1 ]    B = [ 0  1 ]    C = [ 0  0 ]    D = [ 1  0 ]
       [ 0  2 ]        [ 0  0 ]        [ 1  1 ]        [ 1  0 ]

4. Find the projection matrix P for the projection in R4 onto the subspace S spanned by the two
   vectors:
   {(1, 3, 2, 1), (0, 1, 0, 1)}
   That is, P v gives the orthogonal projection of v onto S.

5. Find the least squares approximation, and find the errors in each equation, for the linear systems
   AX = B:

   (a) when:
           [ 2  0 ]        [  2 ]
       A = [ 1  2 ],   B = [  1 ]
           [ 0  1 ]        [ −3 ]

   (b) when:
           [ 2  0 ]        [ 2 ]
       A = [ 1  2 ],   B = [ 3 ]
           [ 0  1 ]        [ 1 ]

       and explain the meaning of the unusual set of errors.


   (c) when:
           [ 2  0  1 ]        [  2 ]
       A = [ 1  2  0 ],   B = [  1 ]
           [ 0  1  0 ]        [ −3 ]
           [ 0  0  1 ]        [  2 ]

6. Find the regression line for the data:

   {(0, 1), (1, 1.3), (2, 1.9), (3, 2.6), (4, 3)}

7. More difficult and needs a calculator or computer. Find a plane that fits the data:

   (xi, yi, zi) = {(0, 1, 1), (1, 1.3, 1.9), (2, 1.9, 3.1), (3, 2.6, 4.2), (4, 3, 4.8)}

   That is, assume the plane is z = a + bx + cy, and do the least squares fit on the z values.

8. More difficult. Given the data:

   {(0, 2), (1, 0), (2, −1), (3, 0), (4, 1)}

   (a) Find a quadratic, y = a + bx + cx², that gives the best least squares fit to the data. Find the
       sum of squares of the errors.

   (b) Find a cubic, y = a + bx + cx² + dx³, that gives the best least squares fit to the data. Find the
       sum of squares of the errors.

9. More difficult. Given the data:

   {(−1, 0.5), (0, 1), (0.5, 2), (1, 3), (1.5, 5)}

   (a) Find an exponential, y = a e^(bx), that gives the best least squares fit to the data. Find all of
       the errors in the y values.

   (b) Find a quadratic, y = a + bx + cx², that gives the best least squares fit to the data. Find the
       errors in the y values.

Requires calculus and difficult. Find the orthogonal


projection
of f (x) = ex onto the set of

2
3
polynomials {g1 (x) , g2 (x) , g3 (x) , g4 (x)} = 1, x, x , x in the (infinite dimensional) vector
space of all continuous functions, with the inner product:
Z 1
hp, qi =
f (x) g (x) dx
1

11. More difficult. Show that the simpler forms of the formulae given in Example 5.7.5 are correct:

    b = ( m Σxy − Σx Σy ) / ( m Σx² − (Σx)² ) = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

    a = ( Σy Σx² − Σx Σxy ) / ( m Σx² − (Σx)² ) = ȳ − b x̄

Solutions
1. For the vector b = (1, 2, 3), and subspace S spanned by {v1, v2} = {(2, 0, −3), (2, 1, 1)}, the
   orthogonal projection of b onto S is p = k1 v1 + k2 v2, where k1, k2 are given by the solution of
   (Theorem 5.17):

   [ ⟨v1, v1⟩  ⟨v1, v2⟩ ] [ k1 ]   [ ⟨v1, b⟩ ]
   [ ⟨v2, v1⟩  ⟨v2, v2⟩ ] [ k2 ] = [ ⟨v2, b⟩ ]

   and this becomes:


   (a) For the normal scalar product:

       [ v1 · v1   v1 · v2 ] [ k1 ]   [ v1 · b ]        [ 13   1 ] [ k1 ]   [ −7 ]
       [ v2 · v1   v2 · v2 ] [ k2 ] = [ v2 · b ]   ⟹   [  1   6 ] [ k2 ] = [  7 ]

       with solution k1 = −7/11, k2 = 14/11. Hence, the orthogonal projection is:

       p = (−7/11)(2, 0, −3) + (14/11)(2, 1, 1) = (14/11, 14/11, 35/11)

   (b) For the inner product ⟨(a, b, c), (e, f, g)⟩ = ae + bf + 2cg this becomes:

       [ 22  −2 ] [ k1 ]   [ −16 ]
       [ −2   7 ] [ k2 ] = [  10 ]

       with solution k1 = −46/75, k2 = 94/75. Hence, the orthogonal projection is:

       p = (−46/75)(2, 0, −3) + (94/75)(2, 1, 1) = (32/25, 94/75, 232/75)

2. Given f(x) = x + x⁴ and basis {b1(x), b2(x), b3(x)} = {1, x, x²} of P2, the projection of f onto
   P2 is p(x) = k1 b1(x) + k2 b2(x) + k3 b3(x) where k1, k2, k3 are given by:

   (a) With inner product ⟨a0 + a1x + a2x² + a3x³ + a4x⁴, b0 + b1x + b2x² + b3x³ + b4x⁴⟩
       = a0b0 + a1b1 + a2b2 + a3b3 + a4b4, solve:

       [ ⟨b1, b1⟩  ⟨b1, b2⟩  ⟨b1, b3⟩ ] [ k1 ]   [ ⟨b1, f⟩ ]
       [ ⟨b2, b1⟩  ⟨b2, b2⟩  ⟨b2, b3⟩ ] [ k2 ] = [ ⟨b2, f⟩ ]
       [ ⟨b3, b1⟩  ⟨b3, b2⟩  ⟨b3, b3⟩ ] [ k3 ]   [ ⟨b3, f⟩ ]

       [ 1  0  0 ] [ k1 ]   [ 0 ]         k1 = 0
       [ 0  1  0 ] [ k2 ] = [ 1 ]   ⟹    k2 = 1
       [ 0  0  1 ] [ k3 ]   [ 0 ]         k3 = 0

       Hence, the projection of f(x) = x + x⁴ is p(x) = x.

   (b) With inner product ⟨p, q⟩ = ∫₋₁¹ p(x) q(x) dx, solve:

       [ ∫₋₁¹ 1 dx    ∫₋₁¹ x dx    ∫₋₁¹ x² dx ] [ k1 ]   [ ∫₋₁¹ (x + x⁴) dx    ]
       [ ∫₋₁¹ x dx    ∫₋₁¹ x² dx   ∫₋₁¹ x³ dx ] [ k2 ] = [ ∫₋₁¹ x (x + x⁴) dx  ]
       [ ∫₋₁¹ x² dx   ∫₋₁¹ x³ dx   ∫₋₁¹ x⁴ dx ] [ k3 ]   [ ∫₋₁¹ x² (x + x⁴) dx ]

       [  2    0   2/3 ] [ k1 ]   [ 2/5 ]         k1 = −3/35
       [  0   2/3   0  ] [ k2 ] = [ 2/3 ]   ⟹    k2 = 1
       [ 2/3   0   2/5 ] [ k3 ]   [ 2/7 ]         k3 = 6/7

       Hence, the projection of f(x) = x + x⁴ is p(x) = −3/35 + x + (6/7) x².

       Note: To check your result, show that ⟨f − p, bi⟩ = 0 for each basis function bi. Since
       f(x) − p(x) = 3/35 − (6/7) x² + x⁴, we have to show (the correct results):

       ∫₋₁¹ (3/35 − (6/7) x² + x⁴) dx = 0,    ∫₋₁¹ x (3/35 − (6/7) x² + x⁴) dx = 0,

       ∫₋₁¹ x² (3/35 − (6/7) x² + x⁴) dx = 0


3. Given:

   A = [ 2  1 ]    B = [ 0  1 ]    C = [ 0  0 ]    D = [ 1  0 ]
       [ 0  2 ]        [ 0  0 ]        [ 1  1 ]        [ 1  0 ]

   the projection is the linear combination of the basis matrices P = k1 B + k2 C + k3 D. By Theorem
   5.17, k1, k2, k3 satisfy:

   [ ⟨B, B⟩  ⟨B, C⟩  ⟨B, D⟩ ] [ k1 ]   [ ⟨B, A⟩ ]
   [ ⟨C, B⟩  ⟨C, C⟩  ⟨C, D⟩ ] [ k2 ] = [ ⟨C, A⟩ ]
   [ ⟨D, B⟩  ⟨D, C⟩  ⟨D, D⟩ ] [ k3 ]   [ ⟨D, A⟩ ]

   [ 1  0  0 ] [ k1 ]   [ 1 ]         k1 = 1
   [ 0  2  1 ] [ k2 ] = [ 2 ]   ⟹    k2 = 2/3
   [ 0  1  2 ] [ k3 ]   [ 2 ]         k3 = 2/3

   Hence, the required projection is P = B + (2/3) C + (2/3) D:

   P = [ 0  1 ] + (2/3) [ 0  0 ] + (2/3) [ 1  0 ]  =  [ 2/3   1  ]
       [ 0  0 ]         [ 1  1 ]         [ 1  0 ]     [ 4/3  2/3 ]

   Note: To check your result, show that ⟨(P − A), B⟩ = 0, ⟨(P − A), C⟩ = 0, ⟨(P − A), D⟩ = 0,
   or equivalently, that the trace (sum of diagonal entries) is zero for each of
   (P − A)ᵀ B, (P − A)ᵀ C, (P − A)ᵀ D.
4. If the basis vectors of S are the columns of the matrix A:

       [ 1  0 ]
   A = [ 3  1 ]
       [ 2  0 ]
       [ 1  1 ]

   then the projection matrix is given by Theorem 5.21 as P = A (AᵀA)⁻¹ Aᵀ:

   AᵀA = [ 1  3  2  1 ] [ 1  0 ]   [ 15  4 ]
         [ 0  1  0  1 ] [ 3  1 ] = [  4  2 ]
                        [ 2  0 ]
                        [ 1  1 ]

   (AᵀA)⁻¹ = (1/14) [  2  −4 ]  =  [  1/7   −2/7  ]
                    [ −4  15 ]     [ −2/7   15/14 ]

   A (AᵀA)⁻¹ = [ 1  0 ] [  1/7   −2/7  ]   [  1/7   −2/7  ]
               [ 3  1 ] [ −2/7   15/14 ] = [  1/7    3/14 ]
               [ 2  0 ]                    [  2/7   −4/7  ]
               [ 1  1 ]                    [ −1/7   11/14 ]

   P = A (AᵀA)⁻¹ Aᵀ = [  1/7   −2/7  ] [ 1  3  2  1 ]   [  1/7   1/7    2/7  −1/7  ]
                      [  1/7    3/14 ] [ 0  1  0  1 ] = [  1/7   9/14   2/7   5/14 ]
                      [  2/7   −4/7  ]                  [  2/7   2/7    4/7  −2/7  ]
                      [ −1/7   11/14 ]                  [ −1/7   5/14  −2/7   9/14 ]

   Note: To check this is correct, you can show that for any vector v ∈ R4,
   (v − P v) · (1, 3, 2, 1) = 0 and (v − P v) · (0, 1, 0, 1) = 0, but this is somewhat complicated.


5. The least squares approximation of the system AX = B is given, by Theorem 5.22, as the solution
   of the normal equations AᵀAX = AᵀB:

   (a)
       [ 2  1  0 ] [ 2  0 ]       [ 2  1  0 ] [  2 ]          [ 5  2 ]       [  5 ]
       [ 0  2  1 ] [ 1  2 ] X  =  [ 0  2  1 ] [  1 ]    ⟹    [ 2  5 ] X  =  [ −1 ]
                   [ 0  1 ]                   [ −3 ]

       with solution Xᵀ = [ 9/7  −5/7 ] (the least squares approximation for the original system). The
       errors for each equation are given by AX − B:

       [ 2  0 ] [  9/7 ]   [  2 ]   [  4/7 ]
       [ 1  2 ] [ −5/7 ] − [  1 ] = [ −8/7 ]
       [ 0  1 ]            [ −3 ]   [ 16/7 ]

   (b) This is the same matrix A as part (a) and so the left hand side matrix of the normal
       equations is the same:

       [ 5  2 ]       [ 2  1  0 ] [ 2 ]   [ 7 ]
       [ 2  5 ] X  =  [ 0  2  1 ] [ 3 ] = [ 7 ]
                                  [ 1 ]

       with solution Xᵀ = [ 1  1 ] (the least squares approximation for the original system). The errors
       for each equation are given by AX − B:

       [ 2  0 ]           [ 2 ]   [ 0 ]
       [ 1  2 ] [ 1 ]  −  [ 3 ] = [ 0 ]
       [ 0  1 ] [ 1 ]     [ 1 ]   [ 0 ]

       The errors are all zero. This means that the original system, even though it has more
       equations than variables, does have an exact solution, and it is the solution given by the
       least squares method.

   (c)
       [ 2  1  0  0 ] [ 2  0  1 ]       [ 2  1  0  0 ] [  2 ]          [ 5  2  2 ]       [  5 ]
       [ 0  2  1  0 ] [ 1  2  0 ] X  =  [ 0  2  1  0 ] [  1 ]    ⟹    [ 2  5  0 ] X  =  [ −1 ]
       [ 1  0  0  1 ] [ 0  1  0 ]       [ 1  0  0  1 ] [ −3 ]          [ 2  0  2 ]       [  4 ]
                      [ 0  0  1 ]                      [  2 ]

       with solution Xᵀ = [ 7/11  −5/11  15/11 ] (the least squares approximation for the original
       system). The errors for each equation are given by AX − B:

       [ 2  0  1 ] [  7/11 ]   [  2 ]   [   7/11 ]
       [ 1  2  0 ] [ −5/11 ] − [  1 ] = [ −14/11 ]
       [ 0  1  0 ] [ 15/11 ]   [ −3 ]   [  28/11 ]
       [ 0  0  1 ]             [  2 ]   [  −7/11 ]


Figure 5.6: Least Squares Line


6. By Example 5.7.5, the regression line y = a + bx for five data points is given by the least squares
   solution of AX = B:

   [ 1  x1 ]         [ y1 ]   [ 1  0 ]         [  1  ]
   [ 1  x2 ]         [ y2 ]   [ 1  1 ]         [ 1.3 ]
   [ 1  x3 ] [ a ] = [ y3 ] = [ 1  2 ] [ a ] = [ 1.9 ]
   [ 1  x4 ] [ b ]   [ y4 ]   [ 1  3 ] [ b ]   [ 2.6 ]
   [ 1  x5 ]         [ y5 ]   [ 1  4 ]         [  3  ]

   and the least squares solution is the solution of AᵀAX = AᵀB:

   [ 1  1  1  1  1 ] [ 1  0 ]         [ 1  1  1  1  1 ] [  1  ]
   [ 0  1  2  3  4 ] [ 1  1 ] [ a ] = [ 0  1  2  3  4 ] [ 1.3 ]
                     [ 1  2 ] [ b ]                     [ 1.9 ]
                     [ 1  3 ]                           [ 2.6 ]
                     [ 1  4 ]                           [  3  ]

   [  5  10 ] [ a ]   [  9.8 ]
   [ 10  30 ] [ b ] = [ 24.9 ]

   with solution a = 0.9, b = 0.53 and regression line y = 0.9 + 0.53x. To illustrate this solution, the
   graph of the data and regression line are shown in Figure 5.6.

   Alternate solution: The solution can also be obtained using the formulae derived in Example
   5.7.5, but these are complicated, and generally it is easier to develop the equations and solve the
   problem as shown above. The formulae for the least squares line y = a + bx are:

   a = ( Σy Σx² − Σx Σxy ) / ( m Σx² − (Σx)² ),    b = ( m Σxy − Σx Σy ) / ( m Σx² − (Σx)² )

   Since m = 5, Σx = 10, Σx² = 30, Σy = 9.8, Σxy = 24.9 the formulae give the same values
   for a, b:

   a = ( 9.8 × 30 − 10 × 24.9 ) / ( 5 × 30 − 10² ) = 0.9

   b = ( 5 × 24.9 − 10 × 9.8 ) / ( 5 × 30 − 10² ) = 0.53

7. Extending the method of Example 5.7.5, the regression equation z = a + bx + cy for five data
   points is given by the least squares solution of the equations AX = B formed by inserting the data
   in the form a + bx + cy = z:

   [ 1  x1  y1 ]         [ z1 ]   [ 1  0   1  ]         [  1  ]
   [ 1  x2  y2 ] [ a ]   [ z2 ]   [ 1  1  1.3 ] [ a ]   [ 1.9 ]
   [ 1  x3  y3 ] [ b ] = [ z3 ] = [ 1  2  1.9 ] [ b ] = [ 3.1 ]
   [ 1  x4  y4 ] [ c ]   [ z4 ]   [ 1  3  2.6 ] [ c ]   [ 4.2 ]
   [ 1  x5  y5 ]         [ z5 ]   [ 1  4   3  ]         [ 4.8 ]

   and the least squares solution is the solution of AᵀAX = AᵀB:

   [ 1   1    1    1    1 ] [ 1  0   1  ] [ a ]   [ 1   1    1    1    1 ] [  1  ]
   [ 0   1    2    3    4 ] [ 1  1  1.3 ] [ b ] = [ 0   1    2    3    4 ] [ 1.9 ]
   [ 1  1.3  1.9  2.6   3 ] [ 1  2  1.9 ] [ c ]   [ 1  1.3  1.9  2.6   3 ] [ 3.1 ]
                            [ 1  3  2.6 ]                                  [ 4.2 ]
                            [ 1  4   3  ]                                  [ 4.8 ]

   [  5    10    9.8  ] [ a ]   [ 15.0  ]
   [ 10    30   24.9  ] [ b ] = [ 39.9  ]
   [ 9.8  24.9  22.06 ] [ c ]   [ 34.68 ]

   with approximate solution a = 0.329, b = 0.583, c = 0.767 and regression equation
   z = 0.329 + 0.583x + 0.767y.
8. (a) By Example 5.7.6 the quadratic, y = a + bx + cx², that approximates the five data points
       (xi, yi) in the least squares sense is given by the least squares approximation for the
       system AX = B:

       [ 1  x1  x1² ]         [ y1 ]   [ 1  0   0 ]         [  2 ]
       [ 1  x2  x2² ] [ a ]   [ y2 ]   [ 1  1   1 ] [ a ]   [  0 ]
       [ 1  x3  x3² ] [ b ] = [ y3 ] = [ 1  2   4 ] [ b ] = [ −1 ]
       [ 1  x4  x4² ] [ c ]   [ y4 ]   [ 1  3   9 ] [ c ]   [  0 ]
       [ 1  x5  x5² ]         [ y5 ]   [ 1  4  16 ]         [  1 ]

       with least squares approximation X given by the solution of AᵀAX = AᵀB:

       [  5   10   30 ] [ a ]   [  2 ]
       [ 10   30  100 ] [ b ] = [  2 ]
       [ 30  100  354 ] [ c ]   [ 12 ]

       with solution a = 68/35, b = −87/35, c = 4/7 and approximating quadratic
       y = a + bx + cx² = 68/35 − (87/35)x + (4/7)x².

       The sum of the squares of the errors is given by ‖AX − B‖²:

       AX − B = [ −2/35,  1/35,  9/35,  −13/35,  1/7 ]ᵀ,   so   ‖AX − B‖² = 8/35

       To illustrate this solution, the graph of the data and the approximating quadratic are shown in
       Figure 5.7.

       Figure 5.7: Least Squares Parabola
   (b) The cubic is found in exactly the same way, with an extra column of xi³ values in A (details
       not shown).
       Note: Notice that the cubic has a significantly smaller error than the quadratic.

9. (a) By Example 5.7.7 the exponential y = a e^(bx) that approximates the five data points (xi, yi)
       in the least squares sense is given by the least squares approximation for the system AX = B:

       [ 1  x1 ]            [ ln y1 ]   [ 1  −1  ]            [ ln 0.5 ]   [ −0.693 ]
       [ 1  x2 ]            [ ln y2 ]   [ 1   0  ]            [ ln 1   ]   [  0     ]
       [ 1  x3 ] [ ln a ] = [ ln y3 ] = [ 1  0.5 ] [ ln a ] = [ ln 2   ] ≈ [  0.693 ]
       [ 1  x4 ] [   b  ]   [ ln y4 ]   [ 1   1  ] [   b  ]   [ ln 3   ]   [  1.098 ]
       [ 1  x5 ]            [ ln y5 ]   [ 1  1.5 ]            [ ln 5   ]   [  1.609 ]

       The least squares solution is given by the solution of AᵀAX = AᵀB:

       [ 1   1  1   1   1  ] [ 1  −1  ] [ ln a ]   [ 1   1  1   1   1  ] [ −0.693 ]
       [ −1  0  0.5 1  1.5 ] [ 1   0  ] [   b  ] = [ −1  0  0.5 1  1.5 ] [  0     ]
                             [ 1  0.5 ]                                  [  0.693 ]
                             [ 1   1  ]                                  [  1.098 ]
                             [ 1  1.5 ]                                  [  1.609 ]

       [ 5   2  ] [ ln a ]   [ 2.707 ]
       [ 2  9/2 ] [   b  ] = [ 4.551 ]

       with approximate solution ln a = 0.16646, b = 0.937, giving a ≈ 1.181. Hence, the
       approximating exponential is y = 1.181 e^(0.937x).
       The errors in the y values are given by:

       [ y1 − 1.181 e^(0.937 x1) ]   [ 0.5 − 1.181 e^(0.937 × (−1)) ]   [  0.037 ]
       [ y2 − 1.181 e^(0.937 x2) ]   [ 1 − 1.181 e^(0.937 × 0)      ]   [ −0.181 ]
       [ y3 − 1.181 e^(0.937 x3) ] = [ 2 − 1.181 e^(0.937 × 0.5)    ] ≈ [  0.113 ]
       [ y4 − 1.181 e^(0.937 x4) ]   [ 3 − 1.181 e^(0.937 × 1)      ]   [ −0.014 ]
       [ y5 − 1.181 e^(0.937 x5) ]   [ 5 − 1.181 e^(0.937 × 1.5)    ]   [  0.184 ]

       To illustrate this solution, the graph of the data and the approximating exponential are shown
       in Figure 5.8.

       Figure 5.8: Least Squares Exponential

   (b) By Example 5.7.6 the quadratic y = a + bx + cx² that approximates the five data points
       (xi, yi) in the least squares sense is given by the least squares approximation for the
       system AX = B:

       [ 1  x1  x1² ]         [ y1 ]   [ 1  −1    1   ]         [ 0.5 ]
       [ 1  x2  x2² ] [ a ]   [ y2 ]   [ 1   0    0   ] [ a ]   [  1  ]
       [ 1  x3  x3² ] [ b ] = [ y3 ] = [ 1  0.5  0.25 ] [ b ] = [  2  ]
       [ 1  x4  x4² ] [ c ]   [ y4 ]   [ 1   1    1   ] [ c ]   [  3  ]
       [ 1  x5  x5² ]         [ y5 ]   [ 1  1.5  2.25 ]         [  5  ]

       with least squares approximation X given by the solution of AᵀAX = AᵀB:

       [  5    2    9/2  ] [ a ]   [ 23/2 ]
       [  2   9/2   7/2  ] [ b ] = [  11  ]
       [ 9/2  7/2   57/8 ] [ c ]   [ 61/4 ]

       with approximate solution a = 0.996, b = 1.337, c = 0.854 and approximating quadratic
       y = a + bx + cx² = 0.996 + 1.337x + 0.854x².
       The errors in the y values are:

       [ 0.5 ]   [ 1  −1    1   ]             [ 0.5 ]   [ 0.513 ]   [ −0.013 ]
       [  1  ]   [ 1   0    0   ] [ 0.996 ]   [  1  ]   [ 0.996 ]   [  0.004 ]
       [  2  ] − [ 1  0.5  0.25 ] [ 1.337 ] = [  2  ] − [ 1.878 ] ≈ [  0.122 ]
       [  3  ]   [ 1   1    1   ] [ 0.854 ]   [  3  ]   [ 3.187 ]   [ −0.187 ]
       [  5  ]   [ 1  1.5  2.25 ]             [  5  ]   [ 4.923 ]   [  0.077 ]

       To illustrate this solution, the graph of the data and the approximating quadratic are shown in
       Figure 5.9.

       Figure 5.9: Least Squares Cubic

       Note: The errors with the quadratic approximation are somewhat smaller than for the
       exponential approximation, suggesting that the data is closer to a quadratic form.


10. By Theorem 5.18 the projection p(x) = k1 g1(x) + k2 g2(x) + k3 g3(x) + k4 g4(x) satisfies:

    [ ⟨g1, g1⟩  ⟨g1, g2⟩  ⟨g1, g3⟩  ⟨g1, g4⟩ ] [ k1 ]   [ ⟨g1, f⟩ ]
    [ ⟨g2, g1⟩  ⟨g2, g2⟩  ⟨g2, g3⟩  ⟨g2, g4⟩ ] [ k2 ] = [ ⟨g2, f⟩ ]
    [ ⟨g3, g1⟩  ⟨g3, g2⟩  ⟨g3, g3⟩  ⟨g3, g4⟩ ] [ k3 ]   [ ⟨g3, f⟩ ]
    [ ⟨g4, g1⟩  ⟨g4, g2⟩  ⟨g4, g3⟩  ⟨g4, g4⟩ ] [ k4 ]   [ ⟨g4, f⟩ ]

    [ ∫₋₁¹ 1 dx    ∫₋₁¹ x dx    ∫₋₁¹ x² dx   ∫₋₁¹ x³ dx ] [ k1 ]   [ ∫₋₁¹ eˣ dx    ]
    [ ∫₋₁¹ x dx    ∫₋₁¹ x² dx   ∫₋₁¹ x³ dx   ∫₋₁¹ x⁴ dx ] [ k2 ] = [ ∫₋₁¹ x eˣ dx  ]
    [ ∫₋₁¹ x² dx   ∫₋₁¹ x³ dx   ∫₋₁¹ x⁴ dx   ∫₋₁¹ x⁵ dx ] [ k3 ]   [ ∫₋₁¹ x² eˣ dx ]
    [ ∫₋₁¹ x³ dx   ∫₋₁¹ x⁴ dx   ∫₋₁¹ x⁵ dx   ∫₋₁¹ x⁶ dx ] [ k4 ]   [ ∫₋₁¹ x³ eˣ dx ]

    [  2    0   2/3   0  ] [ k1 ]   [ 2 sinh 1    ]
    [  0   2/3   0   2/5 ] [ k2 ] = [ 2e⁻¹        ]
    [ 2/3   0   2/5   0  ] [ k3 ]   [ e − 5e⁻¹    ]
    [  0   2/5   0   2/7 ] [ k4 ]   [ 16e⁻¹ − 2e  ]

    with approximate solution k1 = 0.996, k2 = 0.998, k3 = 0.537, k4 = 0.176 and approximating
    polynomial:

    y = 0.996 + 0.998x + 0.537x² + 0.176x³

    Note: The first four terms of the Taylor series approximation to eˣ are:

    eˣ ≈ 1 + x + (1/2)x² + (1/6)x³ ≈ 1 + x + 0.5x² + 0.167x³

    These values are reasonably close to those found by the least squares method.
11. Using the formulae for the means, x̄ = (1/m) Σx, ȳ = (1/m) Σy, the numerator and denominator of
    the simplified formula b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² are computed separately:

    Σ(x − x̄)(y − ȳ) = Σ( xy − x̄ y − ȳ x + x̄ ȳ )
                    = Σxy − x̄ Σy − ȳ Σx + m x̄ ȳ
                    = Σxy − (1/m) Σx Σy − (1/m) Σy Σx + (1/m) Σx Σy
                    = Σxy − (1/m) Σx Σy
                    = (1/m) ( m Σxy − Σx Σy )

    Σ(x − x̄)² = Σ( x² − 2 x̄ x + x̄² )
              = Σx² − 2 x̄ Σx + m x̄²
              = Σx² − (2/m)(Σx)² + (1/m)(Σx)²
              = (1/m) ( m Σx² − (Σx)² )

    Hence the result follows:

    b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² = ( m Σxy − Σx Σy ) / ( m Σx² − (Σx)² ),
    the formula found in Example 5.7.5.

    Starting with the simplified formula for a:

    a = ȳ − b x̄ = ȳ − x̄ · Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
      = [ ȳ Σ(x − x̄)² − x̄ Σ(x − x̄)(y − ȳ) ] / Σ(x − x̄)²

    Computing the numerator separately, using the two sums derived above:

    ȳ Σ(x − x̄)² − x̄ Σ(x − x̄)(y − ȳ)
        = (1/m) Σy · (1/m)( m Σx² − (Σx)² ) − (1/m) Σx · (1/m)( m Σxy − Σx Σy )
        = (1/m²) [ m Σy Σx² − Σy (Σx)² − m Σx Σxy + (Σx)² Σy ]
        = (1/m) ( Σy Σx² − Σx Σxy )

    Putting this back with the denominator and using the formula for the denominator derived above:

    a = (1/m)( Σy Σx² − Σx Σxy ) / [ (1/m)( m Σx² − (Σx)² ) ] = ( Σy Σx² − Σx Σxy ) / ( m Σx² − (Σx)² ),
    the formula found in Example 5.7.5.
