
1 Lists

The notation for points, vectors, and matrices can be awkward. To try to alleviate some of this, we make
use a number of conventions. The rst is that lists of numbers will be presented as a single generic value.
For example,
$$(p_1, \ldots, p_n) = (p_i) \qquad \text{and} \qquad \langle v_1, \ldots, v_n \rangle = \langle v_i \rangle.$$
If we have lists with different lengths, we will try to indicate this by choosing different indices on the generic
term.
Sometimes we want to concatenate lists of numbers together. We will use a semi-colon for this operation.
That means that if we have
$$p = (p_i) = (p_1, \ldots, p_n)$$
$$q = (q_j) = (q_1, \ldots, q_m)$$
then
$$(p\,;q) = (p_i\,;q_j) = (p_1, \ldots, p_n, q_1, \ldots, q_m)$$
Matrices have both a row and a column index (row first, column second) and will be presented as
$$\begin{pmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{pmatrix} = (a_{i,j})$$
Sometimes we want to concatenate columns together to form a matrix. Assuming the matrices A and B
have the same number of rows, we use the notation (A; B) to refer to the matrix
$$(A\,;B) = \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} & b_{1,1} & \cdots & b_{1,p} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,n} & b_{m,1} & \cdots & b_{m,p} \end{pmatrix}$$
We will also eliminate multiple delimiters when we are applying a function to a point or vector. For
example, instead of using the (technically correct) notation $F((x, y))$ to apply the function $F$ to the point
$(x, y)$, we will write $F(x, y)$. Similarly, for vectors, instead of using $F(\langle x, y \rangle)$, we will write $F\langle x, y \rangle$. In
fact, we did that above when we discussed concatenation and used $(p_i\,;q_j)$ instead of the correct notation
$((p_i)\,;(q_j))$.
2 Euclidean Spaces
We start by defining n-dimensional Euclidean space
$$\mathbb{E}^n = \{ (p_1, \ldots, p_n) \mid p_i \in \mathbb{R} \}.$$
An element of $\mathbb{E}^n$ is an n-dimensional point and is written in bold face. The n real numbers that make up
a point are called the coordinates of the point, and the function $u_i : \mathbb{E}^n \to \mathbb{R}$ that maps $p$ to $p_i$ is called the
coordinate map. The point $0^n \in \mathbb{E}^n$ is the point all of whose coordinates are 0 and is sometimes referred
to as the origin. The superscript n on $0^n$ is often omitted if it can be determined from context.
We can determine the distance between two points using the Pythagorean Theorem.
$$\operatorname{dist}(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
This distance function satisfies the triangle inequality.
Theorem 1 (Triangle Inequality). If $p, q, r \in \mathbb{E}^n$, then we have
$$\operatorname{dist}(p, q) \le \operatorname{dist}(p, r) + \operatorname{dist}(r, q)$$
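As a quick numerical illustration (a Python/numpy sketch with arbitrarily chosen points), we can compute distances and spot-check the triangle inequality:

```python
# Sketch only: compute dist(p, q) = sqrt(sum_i (p_i - q_i)^2) and check Theorem 1
# for three example points in E^3 (the points below are made up).
import numpy as np

def dist(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((p - q) ** 2))

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])
r = np.array([0.0, 0.0, 1.0])
print(dist(p, q))                              # 5.0
print(dist(p, q) <= dist(p, r) + dist(r, q))   # True (triangle inequality)
```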
There isn't much to say about Euclidean space; it doesn't have a lot of structure. Basically, a point just
provides us with a location. In particular, there is no notion of adding or scaling points. What does it
mean to add the library and Burger King, or to scale Burger King by some number? In order to add and scale, we
need a vector space.
3 Vector Spaces
A vector space is a collection of objects that can be added and scaled. You can find the abstract definition
in almost any linear algebra book and I won't repeat it here. Notationally, variables representing vectors
have an arrow over them and the zero vector is written as $\vec{0}$. Since most of the vector spaces we are dealing
with in this course are subsets of a known vector space, we do not need to check all the vector space axioms.
In fact, if we can show that a non-empty subset of a vector space is closed under addition and scaling, then
that set is a vector space as well.
3.1 Real Spaces
We define n-dimensional real space to be
$$\mathbb{R}^n = \{ \langle v_1, \ldots, v_n \rangle \mid v_i \in \mathbb{R} \}$$
which is often referred to as "R n", and elements of $\mathbb{R}^n$ are called n-dimensional vectors. The n real
numbers that make up a vector are called the components of the vector, and the zero vector is the vector all of
whose components are 0 and is written as $\vec{0}^n$. As with the point $0^n$, we will omit the n in the superscript if
it can be determined from context.
While points refer to locations, vectors refer to directions. By this, I mean both which way to head
as well as how far to go (something like "go three miles west"). Alternatively, this can be read as which
way to head as well as how fast to travel (something like "go west at a speed of three"). The two ideas are
equivalent, but the heading plus speed is a more common idiom for advanced mathematics. This idea of
direction is also sometimes referred to as a directed distance or "magnitude and direction."
The space $\mathbb{R}^n$ is a vector space, which means we can add and scale vectors. These operations are defined
component-wise.
$$\vec{v} + \vec{w} = \langle v_i + w_i \rangle \qquad \lambda\vec{v} = \langle \lambda v_i \rangle$$
The vector space $\mathbb{R}^n$ also has an inner product (also called the scalar product or dot product).
$$\vec{v} \cdot \vec{w} = \sum_{i=1}^{n} v_i w_i$$
We can use the inner product to define a norm (or magnitude) of a vector
$$|\vec{v}| = \sqrt{\vec{v} \cdot \vec{v}}$$
which is sometimes written with double bars $\|\vec{v}\|$. I prefer the single bars (easier to write), but you need to
be aware of the context. The inner product satisfies the Cauchy-Schwarz Inequality.
Theorem 2 (Cauchy-Schwarz Inequality). If $\vec{v}, \vec{w} \in \mathbb{R}^n$, then
$$|\vec{v} \cdot \vec{w}| \le |\vec{v}|\,|\vec{w}|$$
Because of this inequality, we can also define the angle between two vectors
$$\theta(\vec{v}, \vec{w}) = \arccos\left( \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|} \right)$$
which is sometimes just written as $\theta$ if the two vectors are understood. If two vectors have a zero dot product,
then we say that they are orthogonal to each other. If neither vector is the zero vector, then this reduces
to the concept of perpendicular.
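A small numpy sketch (the example vectors are chosen arbitrarily) of the dot product, the norm, the angle, and an orthogonality check:

```python
# Sketch: dot product, norm, and the angle theta(v, w) = arccos(v.w / (|v||w|)).
import numpy as np

v = np.array([1.0, 2.0, 2.0])
w = np.array([2.0, 0.0, -1.0])

dot = np.dot(v, w)                 # v . w = 0, so v and w are orthogonal
norm_v = np.sqrt(np.dot(v, v))     # |v| = sqrt(v . v) = 3
theta = np.arccos(dot / (norm_v * np.linalg.norm(w)))
print(dot, norm_v, theta)          # 0.0  3.0  1.5707... (= pi/2)
```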
We can also use the dot product to project one vector along another. Given two vectors, the value
$$\vec{u} \cdot \frac{\vec{v}}{|\vec{v}|} = \frac{\vec{u} \cdot \vec{v}}{|\vec{v}|}$$
is the length of $\vec{u}$ that is in the direction of $\vec{v}$. So if $\vec{v}$ and $\vec{u}$ are perpendicular, none of $\vec{u}$ points in the
direction of $\vec{v}$.
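For instance, a short sketch of the scalar projection (again with made-up vectors):

```python
# Sketch: the scalar projection (u . v)/|v| of u along v.
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])

scalar_proj = np.dot(u, v) / np.linalg.norm(v)   # length of u in the direction of v
print(scalar_proj)                               # 3.0
```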
There is clearly a direct translation between n-dimensional points and n-dimensional vectors. The vector
$\overrightarrow{pq} = \langle q_i - p_i \rangle$ is called the vector from $p$ to $q$, and $\operatorname{dist}(p, q) = |\overrightarrow{pq}|$. The literature often identifies
the point $p$ with the vector $\overrightarrow{0p}$, which can make everything very confusing. It is even more confusing in 1
dimension where they identify numbers $x$, points $(x)$, and vectors $\langle x \rangle$, usually without mentioning the change
in type of object. I'm going to try very hard to keep the types of the objects I am working with clear. I will
also expect you to do the same.
3.2 Real Matrix Spaces
We define $m \times n$-dimensional matrix space to be
$$\mathbb{R}^{m \times n} = \left\{ \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{pmatrix} \;\middle|\; a_{i,j} \in \mathbb{R} \right\}$$
An element of $\mathbb{R}^{m \times n}$ is referred to as an $m \times n$ matrix.
The set of matrices of the same dimension forms a vector space, with addition and scaling defined
component-wise.
$$A + B = (a_{i,j} + b_{i,j}) \qquad \lambda A = (\lambda a_{i,j})$$
Where the vector spaces $\mathbb{R}^n$ had an inner product, matrices have a matrix product. Given two matrices
$A \in \mathbb{R}^{p \times n}$ and $B \in \mathbb{R}^{m \times p}$, we can define the product $BA \in \mathbb{R}^{m \times n}$. I won't present the formulas here for
this product (they can be found in any introductory linear algebra text). This product is not commutative,
but distributes over addition, is associative, and has an identity matrix $I^n$, which is the $n \times n$ matrix with
1s down the diagonal and 0s elsewhere. In other words $I^n = (\delta_{i,j})$, where $\delta_{i,j}$ is Kronecker's Delta, which
is 1 if $i = j$ and 0 otherwise. As usual, the superscript n will be omitted if it is clear from context. An $n \times n$
matrix $A$ is invertible if there exists another $n \times n$ matrix $B$ such that $AB = I^n = BA$.
We can also switch the rows and the columns of a matrix. Doing this creates the transpose of a matrix,
and we will use the notation
$$(a_{i,j})^t = (a_{j,i})$$
A matrix is said to be symmetric if it is equal to its transpose and it is said to be orthogonal if its inverse
is equal to its transpose.
There is a map $\det : \mathbb{R}^{n \times n} \to \mathbb{R}$ called the determinant. Again, we omit the definition of the determinant
(see any introductory linear algebra text). It is true that $\det(AB) = \det(A)\det(B)$. Another property
of matrices is the rank of the matrix. An $n \times m$ matrix has rank $p$ provided there exists a $p \times p$ sub-matrix
that has a non-zero determinant but there is no larger such submatrix. An $n \times m$ matrix is said to have full
rank provided it has rank equal to the minimum of $n$ and $m$.
Theorem 3. Let $A$ be an $n \times n$ matrix, then the following are equivalent:
$A$ is invertible;
$\det(A) \ne 0$;
$A$ is of full rank.
We conclude by presenting the trace of an $n \times n$ matrix. This is the map $\operatorname{Tr} : \mathbb{R}^{n \times n} \to \mathbb{R}$, which is equal
to
$$\operatorname{Tr}(a_{i,j}) = \sum_{i=1}^{n} a_{i,i}.$$
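A brief numpy sketch (with arbitrary example matrices) of the operations just listed: product, transpose, determinant, rank, trace, and an invertibility check:

```python
# Sketch: matrix product, transpose, determinant, rank, trace, invertibility.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(A @ B)                         # matrix product (in general A @ B != B @ A)
print(A.T)                           # transpose
print(np.linalg.det(A))              # determinant, -2.0, so A is invertible
print(np.linalg.matrix_rank(A))      # rank, 2 = full rank
print(np.trace(A))                   # trace, 5.0
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))   # A A^{-1} = I
```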
3.3 Linear Transformations
When dealing with functions between vector spaces, we are primarily interested in functions that preserve
the linear structure. A function $T$ between two vector spaces is a linear transformation (sometimes called
a linear operator) provided we have
$$T(\lambda\vec{v} + \mu\vec{w}) = \lambda T(\vec{v}) + \mu T(\vec{w})$$
The range of a linear transformation is the set of all outputs, and the null space of a linear transformation
is the set of all vectors in the domain which map to $\vec{0}$ under the transformation. The rank of a
linear transformation is the dimension of the range of the linear transformation. The nullity of a linear
transformation is the dimension of the null space of the transformation. These numbers are related by the
following theorem.
Theorem 4 (Rank Nullity Theorem). If T is a linear transformation
from an n-dimensional vector space, then the rank of T plus the nullity of
T is equal to n.
We can use this to show that a linear transformation between two vector spaces of the same dimension is
invertible if its nullity is 0 or its rank is n.
For an inner product space $V$, if we have a linear transformation $T : V \to V$, then we can find a linear
transformation $T^* : V \to V$ such that
$$T(\vec{v}) \cdot \vec{w} = \vec{v} \cdot T^*(\vec{w})$$
The linear transformation $T^*$ is called the adjoint of $T$, and a linear transformation is self-adjoint if $T = T^*$.
A linear transformation is orthogonal if $T^{-1} = T^*$. The term orthogonal arises here because if $T$ is an
orthogonal linear transformation, then
$$T(\vec{v}) \cdot T(\vec{w}) = \vec{v} \cdot T^*(T(\vec{w})) = \vec{v} \cdot T^{-1}(T(\vec{w})) = \vec{v} \cdot \vec{w}$$
for any $\vec{v}, \vec{w} \in V$. This means that $T$ preserves the inner product and hence preserves information about
lengths of vectors and the angles between them.
If you have a linear transformation $T$ and a vector $\vec{v}$ such that $T(\vec{v}) = \lambda\vec{v}$ for some real number $\lambda$,
then $\vec{v}$ is called an eigenvector and $\lambda$ is called the associated eigenvalue. The eigenvectors associated
with a particular eigenvalue $\lambda$ form a subspace of $V$ called the eigenspace associated with $\lambda$, sometimes
denoted $V_\lambda$.
Finally, if we have a linear transformation $T$, then there is a number $\det^{+}(T)$ such that for any region
$R$ which has volume $\operatorname{vol}(R)$, we will have $\operatorname{vol}(T(R)) = \det^{+}(T)\operatorname{vol}(R)$. The value of $\det^{+}(T)$ is called the
unsigned determinant of $T$. Notice we haven't defined what the volume of a region means; for now
just assume it matches what you expect.
Theorem 5. If $T$ is a linear transformation between two n-dimensional vector spaces, then the following statements are equivalent:
$T$ is invertible;
$T$ has a nullity of 0;
$\det^{+}(T) \ne 0$.
3.4 Bases
A basis of a vector space is an ordered collection of vectors $B = (\vec{b}_1, \ldots, \vec{b}_n)$ such that every vector $\vec{v}$
can be written uniquely as a linear combination of the basis
$$\vec{v} = \sum_{i=1}^{n} \alpha_i \vec{b}_i.$$
A vector space has dimension n if it has a basis consisting of n elements. The standard basis for $\mathbb{R}^n$
consists of the vectors which have a 1 in the ith position and 0s in the remaining positions. So $\mathbb{R}^n$ has dimension
n. We won't name these vectors here; they will be more important as a vector field, which we will describe
later.
The numbers $\alpha_i$ are called the coordinates of the vector with respect to the basis $B$. If we have
a basis $B$ for a vector space $V$ with dimension n, we define the coordinate map relative to $B$ as the
function $\varphi_B : V \to \mathbb{R}^n$ taking $\vec{v} \mapsto \langle \alpha_i \rangle$. This function is one-to-one and onto (because $B$ is a basis) and so
has an inverse taking $\langle \alpha_i \rangle \mapsto \langle \alpha_i \rangle_B = \sum_{i=1}^{n} \alpha_i \vec{b}_i$. We will also use the coordinates in the form of a column
matrix
$$[\vec{v}]_B = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}$$
when we talk about linear transformations.
Keep in mind that there are really three contexts that we are discussing: the vector space context
($\vec{v} = \langle \alpha_i \rangle_B \in V$), the coordinates context ($\varphi_B(\vec{v}) = \langle \alpha_i \rangle \in \mathbb{R}^n$), and the matrix context ($[\vec{v}]_B \in \mathbb{R}^{n \times 1}$).
The coordinates depend on which basis we are using; changing bases changes the values of the coordinates.
However, changing the bases does not change the vector itself, only how it is being described.
Before talking about good and bad bases, we present the following theorem which can be used to test for
bases.
Theorem 6 (Basis Test). If V is a vector space with dimension n, then
any two of the following statements imply that a set of vectors is a basis
(which implies that all three of the statements are true).
the vectors are linearly independent;
the vectors span the vector space;
there are n vectors in the set.
Some bases are better than other bases. In particular, we shall tend to prefer orthonormal bases. These
are bases such that $\vec{b}_i \cdot \vec{b}_j = \delta_{i,j}$. This means that each basis element has norm 1 and that any two distinct
basis elements are orthogonal. Given any basis in any inner product space, you can use the Gram-Schmidt
Process to generate an orthonormal basis.
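A minimal sketch of the Gram-Schmidt Process for $\mathbb{R}^n$ with the standard dot product (the input vectors below are arbitrary but linearly independent):

```python
# Sketch: Gram-Schmidt turns linearly independent vectors into an orthonormal basis.
import numpy as np

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for b in basis:
            w = w - np.dot(w, b) * b          # remove the component along each earlier b
        basis.append(w / np.linalg.norm(w))   # normalize to length 1
    return basis

B = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
# b_i . b_j = delta_ij, so the matrix of dot products is the identity:
print(np.round([[np.dot(bi, bj) for bj in B] for bi in B], 10))
```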
If you have a collection of eigenvectors from distinct eigenvalues, then these eigenvectors are linearly
independent. So, if you have n distinct eigenvalues, you will get n linearly independent eigenvectors and
these will form a basis. If the linear transformation is self-adjoint, then it will have such a basis and,
furthermore, it will have such a basis which is orthonormal.
If we choose a basis $B$ for the domain and a basis $B'$ for the range, then every linear transformation
between them can be represented as a matrix $[T]_B^{B'} \in \mathbb{R}^{m \times n}$ as follows. Let $B = (\vec{b}_1, \ldots, \vec{b}_n)$ be a basis for the
domain and $B' = (\vec{b}'_1, \ldots, \vec{b}'_m)$ be a basis for the range. Then for each vector $\vec{b}_i$ in the domain basis,
compute the coordinates of $T(\vec{b}_i)$ with respect to the range basis. This creates the ith column of $[T]_B^{B'}$.
Using the concatenation notation, we have
$$[T]_B^{B'} = \left( [T(\vec{b}_1)]_{B'} \,;\, \ldots \,;\, [T(\vec{b}_n)]_{B'} \right)$$
This means that once we choose bases for the domain and range, we can compute the effect of any linear
transformation on any vector as a matrix multiplication.
$$[T(\vec{v})]_{B'} = [T]_B^{B'}\,[\vec{v}]_B$$
For a basis consisting of eigenvectors, the matrix $[T]_B^B$ will be a diagonal matrix with the eigenvalues down
the diagonal.
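A small sketch of the recipe above, assuming for simplicity that $T$ is given as a Python function and that both bases are the standard basis of $\mathbb{R}^2$ (so the coordinate maps are trivial):

```python
# Sketch: build [T] column by column from T(b_i), then check [T(v)] = [T][v].
import numpy as np

def T(v):                      # an example linear map T(x, y) = (x + 2y, 3y)
    x, y = v
    return np.array([x + 2 * y, 3 * y])

b1, b2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])    # domain basis B (standard)
M = np.column_stack([T(b1), T(b2)])                     # column i is [T(b_i)] in the range basis

v = np.array([5.0, -1.0])
print(M @ v)        # matrix times the column of coordinates ...
print(T(v))         # ... agrees with applying T directly: [ 3. -3.]
```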
The process of taking linear transformations to matrices can be reversed. If we start with an $m \times n$
matrix $A$ (and bases $B$ of $V$ and $B'$ of $W$), then we can create a linear transformation $\langle A \rangle_B^{B'} : V \to W$.
This linear transformation is defined by
$$A\,[\vec{v}]_B = [\langle A \rangle(\vec{v})]_{B'}$$
Got that? It isn't as tricky as the notation makes it appear. You start with $\vec{v}$, compute its column matrix
with respect to $B$, multiply this matrix by $A$, and this product is the column matrix of the result with respect
to $B'$.
Things work out as expected if we pass from matrix to transformation and back to matrix, or from
transformation to matrix and back to transformation.
$$\left[ \langle A \rangle_B^{B'} \right]_B^{B'} = A \qquad \left\langle [T]_B^{B'} \right\rangle_B^{B'} = T$$
If $S$ and $T$ are linear transformations with the domain of $T$ contained within the range of $S$, and we have
bases $B$ of the domain of $S$, $B'$ of the range of $S$, and $B''$ of the range of $T$, then we have
$$[T \circ S]_B^{B''} = [T]_{B'}^{B''}\,[S]_B^{B'}$$
This also works from the other direction. If $A$ and $B$ are matrices with sizes $p \times n$ and $m \times p$ respectively,
then
$$\langle B \rangle_{B'}^{B''} \circ \langle A \rangle_B^{B'} = \langle BA \rangle_B^{B''}.$$
If you have the column matrix of a vector in one basis $B$ and want it in terms of another basis $C$, then
you can use the change of basis matrix $[I]_B^C$, which does nothing to the vector but changes the basis from
$B$ to $C$:
$$[I]_B^C\,[\vec{v}]_B = [\vec{v}]_C$$
Changing bases in the other direction just involves inverting the matrix, as $[I]_C^B = \left([I]_B^C\right)^{-1}$. When working
with $\mathbb{R}^n$, it is usually easy to determine $[I]_B$ (where the range is using the standard basis). Then combine
matrices of this type (and their inverses) to obtain the change of basis matrix desired.
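A numerical sketch of this recipe, assuming the bases $B$ and $C$ of $\mathbb{R}^2$ are given by their vectors written in the standard basis (so $[I]_B$ and $[I]_C$ are just the matrices whose columns are those vectors):

```python
# Sketch: build [I]^C_B = ([I]_C)^{-1} [I]_B and check it converts B-coordinates to C-coordinates.
import numpy as np

I_B = np.column_stack([[1.0, 1.0], [1.0, -1.0]])   # B-coordinates -> standard coordinates
I_C = np.column_stack([[2.0, 0.0], [0.0, 1.0]])    # C-coordinates -> standard coordinates

change_B_to_C = np.linalg.inv(I_C) @ I_B           # the change of basis matrix [I]^C_B

v_B = np.array([3.0, 2.0])                          # coordinates of some vector in basis B
v_std = I_B @ v_B                                   # the same vector in standard coordinates
v_C = change_B_to_C @ v_B
print(np.allclose(I_C @ v_C, v_std))                # True: both describe the same vector
```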
If you have the matrix of a linear transformation with respect to one pair of bases $B$ and $B'$ and want
it with respect to another pair of bases $C$ and $C'$, then this can be accomplished by the formula
$$[T]_C^{C'} = [I \circ T \circ I]_C^{C'} = [I]_{B'}^{C'}\,[T]_B^{B'}\,[I]_C^B$$
A number of concepts translate between the linear transformation context and the matrix context.
Transformation           Matrix
$T$ is self-adjoint      $[T]_B^B$ is symmetric if $B$ is orthonormal
$T$ is orthogonal        $[T]_B^B$ is orthogonal if $B$ is orthonormal
$\det^{+}(T)$            $= \left| \det [T]_B^{B'} \right|$
rank of $T$              $=$ rank of $[T]_B^{B'}$
Furthermore, the sign of $\det [T]_B^B$ does not depend on the basis, so we define $\det(T)$ to be the value of $\det [T]_B^B$. The
sign of $\det(T)$ is explained in the next section. It also turns out that $\operatorname{Tr}(PMP^{-1}) = \operatorname{Tr}(M)$, so we can define
the trace of a linear transformation. There is a meaning to this value for linear transformations, but it isn't
readily accessible.
Notice that if you have a basis of eigenvectors, then $T$ can be represented by a diagonal matrix with
eigenvalues $\lambda_i$ along the diagonal. In this case, $\det(T) = \prod_{i=1}^{n} \lambda_i$ and $\operatorname{Tr}(T) = \sum_{i=1}^{n} \lambda_i$, where we multiply or
sum over all the eigenvalues, using duplicate values as needed.
Orientation
Another property of bases of $\mathbb{R}^n$ is their orientation. Given two bases $B$ and $B'$ for $\mathbb{R}^n$, we can compute the
change of bases matrix $M = [I]_B^{B'}$. If the determinant of this matrix is positive, then we say that the two bases
have the same orientation; otherwise, we say that the two bases have the opposite orientation. Note
that the determinant cannot be 0 (as this would show that one of the bases was not a linearly independent
set). This means that bases of $\mathbb{R}^n$ fall into two categories: those with the same orientation as the standard
basis are called positively oriented and those with the opposite orientation are negatively oriented.
Alternatively, these bases are called right-handed and left-handed.
Given a linearly independent set of $n - 1$ vectors in $\mathbb{R}^n$, we can extend it to a positive basis by adding
a new vector as the last vector in the basis. For $\mathbb{R}^2$, we start out with a non-zero vector $\vec{v}$ and we can
extend the set to a basis by including the vector
$$J(\vec{v}) = \langle -v_2, v_1 \rangle$$
The function $J$ is sometimes called the complex structure on $\mathbb{R}^2$. For $\mathbb{R}^3$, we start out with two linearly
independent vectors $\vec{v}, \vec{w}$ and we can extend this set to a basis by including the vector
$$\vec{v} \times \vec{w} = \langle v_2 w_3 - w_2 v_3,\; -v_1 w_3 + w_1 v_3,\; v_1 w_2 - w_1 v_2 \rangle$$
This is called the cross product (or vector product). If the $n - 1$ vectors we started with are orthonormal,
then these constructions result in orthonormal bases.
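A quick check (with arbitrary example vectors) that these constructions really do produce positively oriented bases, by looking at the sign of the determinant:

```python
# Sketch: appending J(v) in R^2, or v x w in R^3, gives a positively oriented basis.
import numpy as np

v2 = np.array([3.0, 1.0])
Jv = np.array([-v2[1], v2[0]])                        # J(v) = <-v_2, v_1>
print(np.linalg.det(np.column_stack([v2, Jv])))       # 10.0 > 0

v3, w3 = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, -1.0])
cross = np.cross(v3, w3)
print(np.linalg.det(np.column_stack([v3, w3, cross])) > 0)   # True
```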
The complex structure on $\mathbb{R}^2$ allows us to define better angles between vectors in $\mathbb{R}^2$. In general, the
angle $\theta(\vec{v}, \vec{w})$ between two vectors is an angle between 0 and $\pi$. However, using the complex structure, we
can find an angle $\theta$ with $0 \le \theta < 2\pi$ such that
$$\cos(\theta) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|} \qquad \sin(\theta) = \frac{\vec{v} \cdot J(\vec{w})}{|\vec{v}|\,|J(\vec{w})|}$$
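A short sketch recovering this angle numerically (the helper function and example vectors are mine):

```python
# Sketch: recover theta in [0, 2*pi) from cos(theta) and sin(theta) via the complex structure J.
import numpy as np

def signed_angle(v, w):
    J_w = np.array([-w[1], w[0]])
    c = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    s = np.dot(v, J_w) / (np.linalg.norm(v) * np.linalg.norm(J_w))
    return np.arctan2(s, c) % (2 * np.pi)

print(signed_angle(np.array([0.0, 1.0]), np.array([1.0, 0.0])))   # pi/2
print(signed_angle(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # 3*pi/2, not pi/2
```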
4 Calculus
4.1 Continuity
Topology is the branch of mathematics that focuses on continuity. Below are some of the basic definitions
and results that will be used in this class.
If we have a point $p \in \mathbb{E}^n$, then we can define the open ball around $p$ of radius $r$
$$B^n_r(p) = \{ q \in \mathbb{E}^n \mid \operatorname{dist}(p, q) < r \}$$
and the punctured ball around $p$ of radius $r$
$$\dot{B}^n_r(p) = \{ q \in \mathbb{E}^n \mid 0 < \operatorname{dist}(p, q) < r \}$$
As usual, the superscript indicating dimension will often be omitted. A set $U$ is open in $\mathbb{E}^n$ provided for
every $p \in U$, there exists some $r > 0$ such that $B_r(p) \subset U$. A set $N$ is a neighborhood of $p$ provided
there is an open set $U$ with $p \in U \subset N$.
If $F : \mathbb{E}^n \to \mathbb{E}^m$, we define the limit of $F(q)$ as $q$ approaches $p$ as any point $L$ such that for all
$\varepsilon > 0$, there exists a $\delta > 0$ such that
$$0 < \operatorname{dist}(q, p) < \delta \implies \operatorname{dist}(F(q), L) < \varepsilon.$$
Alternatively, for every $\varepsilon$-ball around $L$, we can find a punctured $\delta$-ball around $p$ such that the $\delta$-ball is
mapped completely inside the $\varepsilon$-ball.
$$F(\dot{B}_\delta(p)) \subset B_\varepsilon(L)$$
If the limit $L$ exists, then it is unique.
A function $F : \mathbb{E}^n \to \mathbb{E}^m$ is continuous at $p \in \mathbb{E}^n$ provided we can allow the $\delta$-ball to be unpunctured.
$$F(B_\delta(p)) \subset B_\varepsilon(L)$$
This forces the value of $L$ to equal $F(p)$. So we can characterize continuity in terms of limits: $F$ is
continuous if and only if
$$\lim_{q \to p} F(q) = F(p)$$
If $U$ is an open set, then we say that $F$ is continuous on $U$ if $F$ is continuous for all points $p \in U$.
Often times, we do not have a function defined on all of $\mathbb{R}^n$. For example, if $A \subset \mathbb{R}^n$ and $F : A \to \mathbb{R}^m$ is
defined, then $F$ is continuous on $A$ provided there is an open set $U$ containing $A$ and a continuous function
on $U$, $\widetilde{F} : U \to \mathbb{R}^m$, such that $\widetilde{F}(p) = F(p)$ for all $p \in A$.
Showing a function is continuous tends to be difficult, and we will summarize these results as a single
omnibus theorem.
Theorem 7 (Continuity). If $F : \mathbb{E}^n \to \mathbb{E}^m$ is a function such that each of its coordinates is an algebraic combination or composition of the functions listed below, then $F$ is continuous everywhere it is defined.
Power Functions: $x \mapsto x^n$ and their inverses (the root functions) $x \mapsto \sqrt[n]{x}$;
Trigonometric Functions: based on $x \mapsto \sin(x), \cos(x)$ and their inverses $x \mapsto \arcsin(x), \arccos(x)$;
Exponential Functions: $x \mapsto e^x$ and its inverse (the log function) $x \mapsto \ln(x)$.
A continuous function $\alpha : \mathbb{R} \to \mathbb{E}^n$ is referred to as a path. Often times, paths can be used to show that
a function cannot have a limit at a particular point. For example, if there are two paths $\alpha, \beta : \mathbb{R} \to \mathbb{E}^n$ such
that $\alpha(0) = \beta(0) = p$ and $\lim_{t \to 0} F(\alpha(t)) \ne \lim_{t \to 0} F(\beta(t))$, then $\lim_{q \to p} F(q)$ cannot exist (as it wouldn't
be unique).
Continuous functions have a number of nice properties. The two most important from calculus are the
following.
Theorem 8 (Intermediate Value Theorem). If $f : [a, b] \to \mathbb{E}^1$ is continuous with $y$ between $f(a)$ and $f(b)$, then there exists $c \in [a, b]$ with $f(c) = y$.
From a topological perspective, this result states that continuous images of connected sets are connected.
Theorem 9 (Extreme Value Theorem). If $f : [a, b] \to \mathbb{E}^1$ is continuous, then there exist $c, C \in [a, b]$ such that
$$f(c) \le f(x) \le f(C) \quad \text{for all } x \in [a, b].$$
4.2 Dierentiability
A function $F : \mathbb{E}^n \to \mathbb{E}^m$ is singly-differentiable at $p$ if there exists a linear function $\lambda : \mathbb{E}^n \to \mathbb{E}^m$
such that
$$\lim_{q \to p} \frac{\left| \overrightarrow{F(p)F(q)} - \lambda(\overrightarrow{pq}) \right|}{\left| \overrightarrow{pq} \right|} = 0$$
When this happens, the function $\lambda$ is uniquely defined and called the differential of $F$, and we will use the
notation $F_{*p}$ to refer to it. The subscript $p$ will be omitted if the point is clear from the context. The singly-differentiability
of a function $F$ says that if we define the function $G : \mathbb{R}^n \to \mathbb{R}^m$ by $G(\overrightarrow{pq}) = \overrightarrow{F(p)F(q)}$,
then the function $G$ can be well-approximated by a linear function.
Probably the most important theorem of differential calculus is the following.
Theorem 10 (Mean Value Theorem). If $f : [a, b] \to \mathbb{E}^1$ is differentiable, then there is a value $c \in [a, b]$ such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}$$
This is what makes calculus work. It says that global behavior (the right side of the equation) is reflected
by infinitesimal behavior somewhere (the left side of the equation).
Notice that $F_{*p}$ can be thought of as another function from $\mathbb{E}^n$ to $\mathbb{R}^{m \times n}$, and we can differentiate $F_*$ to
arrive at higher order derivatives. The notation $\mathcal{C}^r$ is used to refer to functions whose derivatives up to order
$r$ exist and are continuous. So $\mathcal{C}^0$ refers to continuous functions and $\mathcal{C}^1$ refers to those that are singly-differentiable.
The notation $\mathcal{C}^\infty$ refers to those functions all of whose partial derivatives of all orders are continuous. A
function is differentiable if it is of class $\mathcal{C}^\infty$. Many of the theorems do not require that many derivatives
(four is usually the most necessary), but this definition means that we do not need to count derivatives while
we work.
We can also consider the partial derivatives of the function $F$ by holding all but one variable constant.
In other words, if $(y_1, \ldots, y_m) = F(x_1, \ldots, x_n)$, then we can compute $\frac{\partial y_j}{\partial x_i}$, which indicates how the variable
$x_i$ affects the variable $y_j$. These partial derivatives are usually collected into a matrix, $J(F)$, called the
Jacobian matrix
$$J(F) = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n} \end{pmatrix}$$
which follows the linear algebra convention that the outputs determine the rows and the inputs determine
the columns. Notice that the Jacobian is a function that takes points $p \in \mathbb{R}^n$ and produces a matrix in
$\mathbb{R}^{m \times n}$. We will sometimes place the input value $p$ into a subscript, so that we have
$$J(F)_p = J(F)(p)$$
The determinant of this matrix is called the Jacobian and will be important when we consider change-
of-variables and integration.
Theorem 11 (Criteria for Differentiability). If $F$ is singly-differentiable, then the partial derivatives of $F$ all exist and $[F_{*p}] = J(F)_p$.
Conversely, if all the first order partial derivatives of $F$ exist and are continuous, then the function is singly-differentiable.
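A small numerical sketch of the Jacobian matrix: approximate $J(F)_p$ by finite differences and compare with the exact partial derivatives (the example function is made up):

```python
# Sketch: finite-difference Jacobian for F(x, y) = (x*y, x + y**2) at p = (2, 3).
import numpy as np

def F(p):
    x, y = p
    return np.array([x * y, x + y ** 2])

def jacobian_fd(F, p, h=1e-6):
    p = np.asarray(p, dtype=float)
    cols = []
    for i in range(len(p)):
        e = np.zeros_like(p); e[i] = h
        cols.append((F(p + e) - F(p - e)) / (2 * h))   # central difference in x_i
    return np.column_stack(cols)

p = np.array([2.0, 3.0])
exact = np.array([[3.0, 2.0],      # row 1: [y, x]
                  [1.0, 6.0]])     # row 2: [1, 2y]
print(np.allclose(jacobian_fd(F, p), exact, atol=1e-5))   # True
```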
If we have continuity of the second order derivatives, then the order of differentiation does not matter.
Theorem 12 (Clairaut's Theorem). If $(y_k) = F(x_i)$ and $F$ is a function all of whose second order partial derivatives are continuous, then for all $i, j, k$ we have
$$\frac{\partial^2 y_k}{\partial x_i \partial x_j} = \frac{\partial^2 y_k}{\partial x_j \partial x_i}$$
Another variation occurs if we only wish to include some of the columns of the Jacobian matrix. In this
case, we will subscript the $J$ symbol with the variables for the columns that we are retaining. For example,
if we have $F(x, y, z) = (u, v)$ and we wish to consider only the way $x$ and $y$ affect $u$ and $v$, then we will
look at the matrix
$$J_{(x,y)}(F) = \begin{pmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{pmatrix}$$
which is only part of the entire Jacobian matrix
$$J(F) = \begin{pmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} & \frac{\partial u}{\partial z} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} & \frac{\partial v}{\partial z} \end{pmatrix}$$
There are a number of basic rules of differentiation that you learned in calculus; the most important is
the following.
Theorem 13 (The Chain Rule). Let $F : \mathbb{E}^n \to \mathbb{E}^p$ and $G : \mathbb{E}^p \to \mathbb{E}^m$ be two differentiable functions and $q = F(p)$. Then $G \circ F : \mathbb{E}^n \to \mathbb{E}^m$ is also differentiable and
$$(G \circ F)_{*p} = G_{*q} \circ F_{*p}$$
Boy, that was easier than remembering all those formulas they give you in Multivariable Calculus. Each
of those formulas is just looking at one entry of the matrix multiplication. Here's an example.
Example: Assume $F(s, t) = (x, y, z)$ and $G(x, y, z) = (u, v)$. Then the matrix equation says that
$$\begin{pmatrix} \frac{\partial u}{\partial s} & \frac{\partial u}{\partial t} \\ \frac{\partial v}{\partial s} & \frac{\partial v}{\partial t} \end{pmatrix} = J(G \circ F) = J(G)\,J(F) = \begin{pmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} & \frac{\partial u}{\partial z} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} & \frac{\partial v}{\partial z} \end{pmatrix} \begin{pmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial t} \end{pmatrix}$$
$$= \begin{pmatrix} \frac{\partial u}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial s} & \quad \frac{\partial u}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial t} \\ \frac{\partial v}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial v}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial v}{\partial z}\frac{\partial z}{\partial s} & \quad \frac{\partial v}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial v}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial v}{\partial z}\frac{\partial z}{\partial t} \end{pmatrix}$$
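The same identity can be spot-checked numerically; here is a sketch using finite-difference Jacobians and made-up functions $F$ and $G$:

```python
# Sketch: verify J(G o F) = J(G) J(F) numerically at one point.
import numpy as np

def F(st):
    s, t = st
    return np.array([s * t, s + t, s ** 2])           # F(s, t) = (x, y, z)

def G(xyz):
    x, y, z = xyz
    return np.array([x + y * z, x * z])               # G(x, y, z) = (u, v)

def jac(f, p, h=1e-6):
    p = np.asarray(p, dtype=float)
    return np.column_stack([(f(p + h * e) - f(p - h * e)) / (2 * h)
                            for e in np.eye(len(p))])

p = np.array([1.0, 2.0])
lhs = jac(lambda st: G(F(st)), p)          # J(G o F) at (s, t)
rhs = jac(G, F(p)) @ jac(F, p)             # J(G) at F(s, t) times J(F) at (s, t)
print(np.allclose(lhs, rhs, atol=1e-4))    # True
```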
The notation for the differential map, the Jacobian matrix, and the Jacobian is not at all standardized.
Here's a translation table between some of the more common notations from texts by Do Carmo [1] and Spivak [2].

Object             Gray                Do Carmo                                                                      Spivak
Differential       $F_{*p}$            $dF_p$                                                                        $DF(p)$
Jacobian Matrix    $J(F)_p$            $\left( \frac{\partial F_i}{\partial x_j} \right)\Big|_{(x_i) = p}$           $F'(p)$
Jacobian           $\det(J(F))_p$      $\frac{\partial(y_1, \ldots, y_m)}{\partial(x_1, \ldots, x_n)}\Big|_{(x_i) = p}$   $\det(F'(p))$
Inverse and Implicit Function Theorems
The most important theorems in multi-variable calculus for our purposes are the Inverse and Implicit Func-
tion Theorems. Each is easy to prove if you are willing to assume the other one.
Warning: these theorems are tough (not just to prove, but also to understand). They typically come
late in a Junior or Senior level real-analysis course. I'm putting them out here and will try to explain them
as we use them. That means, don't worry if you don't understand them now, but you might want to think
about them a little bit before I try to explain them in context.
Theorem 14 (Inverse Function Theorem). Assume that $F : \mathbb{R}^n \to \mathbb{R}^n$ is a differentiable function and that $J(F)$ is invertible at $p$.
Then there is a small neighborhood $U$ of $p$ such that $F$ has a differentiable inverse $F^{-1} : F(U) \to U$. Furthermore, for each $p \in U$ with $q = F(p)$ we have
$$J(F^{-1})_q = \left( J(F)_p \right)^{-1}$$
The inverse function theorem asserts that if the derivative (as a linear operator) is invertible at a point, then
the original function is also invertible in a neighborhood of that point. Again, like the Mean Value Theorem,
this theorem shows that there is a connection between the infinitesimal behavior of a function and the local
behavior of the function.
Theorem 15 (Implicit Function Theorem). Assume that $F : \mathbb{R}^{n+m} \to \mathbb{R}^m$ is a differentiable function with notation $(z_j) = F(x_i\,;y_j)$. Assume that $(a_i\,;b_j)$ is a point in $\mathbb{R}^{n+m}$ with $(c_j) = F(a_i\,;b_j)$ and that the matrix
$$J_{(y_j)}(F) = \begin{pmatrix} \frac{\partial z_1}{\partial y_1} & \cdots & \frac{\partial z_1}{\partial y_m} \\ \vdots & \ddots & \vdots \\ \frac{\partial z_m}{\partial y_1} & \cdots & \frac{\partial z_m}{\partial y_m} \end{pmatrix}$$
is invertible at the point $(a_i\,;b_j)$.
Then there exists a neighborhood $U$ of $(a_i) \in \mathbb{R}^n$ and a neighborhood $V$ of $(b_j) \in \mathbb{R}^m$ such that each $(x_i) \in U$ is associated with a unique $(y_j) \in V$, thus defining a function $(y_j) = f(x_i)$ with $F((x_i); f(x_i)) = (c_j)$.
Furthermore, the function $f$ is differentiable.
Okay, that was more complicated. However, it says that if we can invert the derivative of the function
$(y_j) \mapsto (z_j)$, then we can solve the equation $F(x_i\,;y_j) = (c_j)$ for the variables $y_j$.
[1] Differential Geometry of Curves and Surfaces
[2] Calculus on Manifolds
4.3 Integration
Now that we know all about derivatives, we need to set up some facts about integrals.
Given a bounded region $A \subset \mathbb{E}^n$ and a function $F : A \to \mathbb{R}$, we extend $A$ to a rectangle $R = [a_1, b_1] \times \cdots \times [a_n, b_n]$
and define $F : R \to \mathbb{R}$ by setting $F(p) = 0$ for all $p \notin A$. Then we partition $R$ into small rectangular regions
$P = \{R_i\}$. Since these regions are rectangular, we can compute their volume $\operatorname{vol}(R_i)$ as the product of their
side lengths. We define the diameter of these regions as follows
$$\operatorname{diam}(R_i) = \max\{ \operatorname{dist}(p, q) \mid p, q \in R_i \}$$
and the mesh of the partition as
$$\operatorname{mesh}(P) = \max_i \operatorname{diam}(R_i).$$
From each of these $R_i$, we choose a sample point $p_i \in R_i$. Finally, we compute the sum
$$\sum_i F(p_i) \operatorname{vol}(R_i).$$
We say that $F$ is integrable over $A$ provided there exists a number $L$ such that for any $\varepsilon > 0$,
there exists a $\delta > 0$ such that
$$\operatorname{mesh}(P) < \delta \implies \left| \sum_i F(p_i) \operatorname{vol}(R_i) - L \right| < \varepsilon$$
If such an $L$ exists, it is unique and is denoted by $\int_A F$. If the function $F(p) = 1$ is integrable over the region
$A$, then we say that $A$ is measurable and define
$$\operatorname{vol}(A) = \int_A 1.$$
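A Riemann-sum sketch of this definition (sample points at the centers of the small squares; the region and integrands are arbitrary examples):

```python
# Sketch: approximate the integral of F over the unit disk A in E^2 by summing
# F(p_i) vol(R_i) over a fine grid of small squares, extending F by 0 outside A.
import numpy as np

def riemann_sum(F, steps=400):
    xs = np.linspace(-1, 1, steps, endpoint=False) + 1.0 / steps   # centers of the squares
    vol = (2.0 / steps) ** 2                                        # area of each small square
    total = 0.0
    for x in xs:
        for y in xs:
            if x * x + y * y < 1.0:            # sample point lies inside the region A
                total += F(x, y) * vol
    return total

print(riemann_sum(lambda x, y: 1.0))            # ~ pi, the volume (area) of the disk
print(riemann_sum(lambda x, y: x * x + y * y))  # ~ pi/2
```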
Theorem 16 (Integrability). If A is measurable and F is continuous on
A, then F is integrable over A.
In one dimension, we normally denote $\int_{[a,b]} f$ by the notation $\int_a^b f(x)\,dx$. Furthermore, if $a < b$, we define
$\int_b^a f(x)\,dx$ as $-\int_{[a,b]} f$. The first result that I will present is the Fundamental Theorem of Calculus.
Theorem 17 (Fundamental Theorem of Calculus). If $f$ is an integrable function on the interval $[a, b]$ and $c$ is any point in this interval, then the function
$$F(x) = \int_c^x f(t)\,dt$$
is differentiable and $F'(x) = f(x)$ on the interval $[a, b]$.
This can be used in two directions. The first is an initial value problem, where $F'$, $c$, and $C$ are known and
we wish to find $F$ such that $F(c) = C$. The solution is $F(x) = \int_c^x F'(t)\,dt + C$. The opposite
occurs when $F$, $a$, $b$ are known and we wish to find $\int_a^b F'(t)\,dt$; the solution in this case is that the integral
equals $F(b) - F(a)$.
Now we prove that we can do this integration one dimension at a time (and hence use the previous
theorem to compute higher dimensional integrals).
Theorem 18 (Fubini's Theorem). Let $A \subset \mathbb{E}^n$ and $B \subset \mathbb{E}^m$ be closed rectangles, and let $F : A \times B \to \mathbb{R}$ be integrable. For $x \in A$, let $G_x : B \to \mathbb{R}$ be defined by $G_x(y) = F(x, y)$. Furthermore, assume $G_x$ is integrable.
Set
$$G(x) = \int_B G_x.$$
Then $G$ is integrable on $A$ and
$$\int_{A \times B} F = \int_A G$$
In less formal notation, this says
$$\int_{A \times B} F = \int_{x \in A} \left( \int_{y \in B} F(x, y)\,dy \right) dx$$
where the integrals on the right are referred to as iterated integrals for $F$.
The last theorem is the change of variables theorem for integration.
Theorem 19 (Change of Variables). Assume that $A$ is an open measurable set in $\mathbb{E}^n$, $F : G(A) \to \mathbb{R}$ is integrable, and $G : A \to \mathbb{R}^n$ is one-to-one and differentiable such that $\det(J(G))_p \ne 0$ for all $p \in A$. Then $F \circ G$ is integrable and
$$\int_{G(A)} F = \int_A (F \circ G)\,\left| \det(J(G)) \right|$$