
On the Perturbation of Pseudo-Inverses, Projections and Linear Least Squares Problems

Author(s): G. W. Stewart
Source: SIAM Review, Vol. 19, No. 4 (Oct., 1977), pp. 634-662
Published by: Society for Industrial and Applied Mathematics
Stable URL: http://www.jstor.org/stable/2030248 .

SIAM REVIEW
Vol. 19, No. 4, October 1977

ON THE PERTURBATION OF PSEUDO-INVERSES, PROJECTIONS AND LINEAR LEAST SQUARES PROBLEMS*

G. W. STEWART†

Abstract. This paper surveys perturbation theory for the pseudo-inverse (Moore-Penrose generalized inverse), for the orthogonal projection onto the column space of a matrix, and for the linear least squares problem.

1. Introduction. The pseudo-inverse (or Moore-Penrose generalized inverse) of a matrix $A$ may be defined as the unique matrix $A^\dagger$ satisfying the following conditions [due to Penrose (1955)]:

$$A^\dagger A A^\dagger = A^\dagger, \tag{1.1a}$$

$$A A^\dagger A = A, \tag{1.1b}$$

$$(A A^\dagger)^H = A A^\dagger, \tag{1.1c}$$

$$(A^\dagger A)^H = A^\dagger A. \tag{1.1d}$$

The pseudo-inverse and its generalizations have been extensively investigated and widely applied. One reason for this interest in the pseudo-inverse is that it permits the succinct expression of some important geometric constructions in n-dimensional space. This paper will be concerned with the pseudo-inverse and two related geometric constructions: the orthogonal projection onto a subspace and the linear least squares problem.
The orthogonal projection onto a subspace $\mathcal{X}$ is the unique Hermitian, idempotent matrix $P$ whose column space [denoted by $\mathcal{R}(P)$] is $\mathcal{X}$. It follows from (1.1c) that the matrix

$$P_A = A A^\dagger$$

is Hermitian and from (1.1b) that $P_A$ is idempotent and $\mathcal{R}(P_A) = \mathcal{R}(A)$. Hence $P_A$ is the orthogonal projection onto $\mathcal{R}(A)$. A similar argument shows that

$$R_A = A^\dagger A \tag{1.2}$$

is the projection onto $\mathcal{R}(A^H)$, the row space of $A$.
The second construction is the solution of the linear least squares problem of choosing a vector $x$ to minimize

$$\rho(x) = \|b - Ax\|_2, \tag{1.3}$$

where $b$ is a fixed vector and $\|\cdot\|_2$ denotes the usual Euclidean norm. The solutions of this problem are given by

$$x = A^\dagger b + (I - R_A) z, \tag{1.4}$$

* Received by the editors August 18, 1975, and in revised form February 15, 1976.

† Computer Science Department, University of Maryland, College Park, Maryland 20742. This work was supported in part by the Office of Naval Research.

where $z$ is arbitrary. When $A$ has full column rank, $R_A = I$ and the solution $x = A^\dagger b$ is unique. Otherwise, it is easily verified from (1.1) and (1.2) that $A^\dagger b$ is orthogonal to $(I - R_A)z$, so that by the Pythagorean theorem

$$\|x\|_2^2 = \|A^\dagger b\|_2^2 + \|(I - R_A) z\|_2^2.$$

It follows that $x = A^\dagger b$ is the unique solution of (1.3) that has minimal norm.
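As an illustration only, the following sketch checks the four Penrose conditions (1.1) numerically and compares the minimum norm solution with another solution of the form (1.4). It assumes NumPy; the random test matrix, the helper `penrose_residuals`, and the cutoff `1e-10` passed to `numpy.linalg.pinv` are choices made for this example, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test data: a 5 x 4 matrix of rank 2 and a right-hand side b.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(5)

# Explicit cutoff so the roundoff-level singular values are treated as zero.
A_pinv = np.linalg.pinv(A, 1e-10)

def penrose_residuals(A, X):
    """Frobenius norms of the residuals of conditions (1.1a)-(1.1d)."""
    return (
        np.linalg.norm(X @ A @ X - X),
        np.linalg.norm(A @ X @ A - A),
        np.linalg.norm((A @ X).conj().T - A @ X),
        np.linalg.norm((X @ A).conj().T - X @ A),
    )

residuals = penrose_residuals(A, A_pinv)

# General solution (1.4): x = A^+ b + (I - R_A) z with R_A = A^+ A.
R_A = A_pinv @ A
x_min = A_pinv @ b
null_part = (np.eye(4) - R_A) @ rng.standard_normal(4)
x_other = x_min + null_part

# Both vectors give the same residual norm rho(x); x_min has the smaller norm.
rho_min = np.linalg.norm(b - A @ x_min)
rho_other = np.linalg.norm(b - A @ x_other)
```

Under these assumptions the two residual norms agree to roundoff, while the squared norm of `x_other` exceeds that of `x_min` by the squared norm of `null_part`, as the Pythagorean identity predicts.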
The object of this paper is to describe the effects of perturbations in $A$ on $A^\dagger$, on $P_A$, and on $A^\dagger b$; i.e., on the pseudo-inverse, on the projection onto $\mathcal{R}(A)$, and on the solution of the linear least squares problem. Such descriptions are important for three reasons. First, the results are useful mathematical tools. Second, in numerical applications the elements of $A$ will seldom be known exactly, and it is necessary to have bounds on the effects of the uncertainties in $A$. Finally, many numerical processes for computing projections and least squares solutions behave as if exact computations had been performed on a perturbed matrix $A + E$, where $E$ is a small matrix whose size depends on the algorithm and the arithmetic used in its execution.

We shall be concerned with three kinds of results: perturbation bounds, asymptotic expressions, and derivatives. The perturbation bounds are needed in the applications mentioned above. Asymptotic expressions and derivatives are useful computational tools when the perturbation is actually known. Moreover they can be used to check the sharpness of the perturbation bounds. Not surprisingly it is rather difficult to obtain a reasonably sharp perturbation bound that tells the complete story of the effects of the perturbations. Asymptotic forms and derivatives are easier to come by.

In order to make this survey reasonably self-contained, we begin in § 2 with a review of the necessary background. In § 3 we develop the perturbation theory for the pseudo-inverse, in § 4 for the projection $P_A$, and in § 5 for the least squares solution $A^\dagger b$.
Notes and references. For background on the generalized inverse see the books by Ben-Israel and Greville (1974), Boullion and Odell (1971), and Rao and Mitra (1971). The expression (1.1) is due to Penrose (1955), (1956), whose papers initiated the current interest in the pseudo-inverse.

Many articles on perturbation theory for pseudo-inverses and related problems have appeared in the literature. To date the most complete survey of the problem has been given by Wedin (1973). In addition to collecting and unifying earlier material, this paper will present some new results.
2. Preliminaries.

Notation. Throughout this paper we shall use the notational conventions of Householder (1964). Specifically, matrices are denoted by upper case italic and Greek letters, vectors by lower case italic letters, and scalars by lower case Greek letters. The symbol $\mathbb{C}$ denotes the set of complex numbers, $\mathbb{C}^n$ the set of complex n-vectors, and $\mathbb{C}^{m\times n}$ the set of complex $m \times n$ matrices. The matrix $A^H$ is the conjugate transpose of $A$. The column space of $A$ is denoted by $\mathcal{R}(A)$, and its orthogonal complement by $\mathcal{R}(A)^\perp$.

We shall be concerned with a fixed matrix $A \in \mathbb{C}^{m\times n}$ with $\operatorname{rank}(A) = r$.

The matrix $E \in \mathbb{C}^{m\times n}$ will denote a perturbation of $A$ and we shall set

$$B = A + E.$$

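Since we are concerned with the geometry of $\mathbb{C}^n$, we shall be at some pains to cast our results in such a way that they are not affected by unitary transformations (cf. the section on unitarily invariant norms below). We may use this fact to transform our perturbation problems into a simpler form. Specifically, let $U = (U_1, U_2) \in \mathbb{C}^{m\times m}$ be a unitary matrix with $\mathcal{R}(U_1) = \mathcal{R}(A)$ and let $V = (V_1, V_2)$ be a unitary matrix with $\mathcal{R}(V_1) = \mathcal{R}(A^H)$. Then $U^H A V$ has the form

$$U^H A V = \begin{pmatrix} A_{11} & 0 \\ 0 & 0 \end{pmatrix}, \tag{2.1}$$

where $A_{11} \in \mathbb{C}^{r\times r}$ is nonsingular. We shall partition $U^H E V$ and $U^H B V$ conformally with $U^H A V$:

$$U^H E V = \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}, \qquad U^H B V = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11} + E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}.$$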

These forms will be called reduced forms of the matrices $A$, $B$, and $E$, and in the sequel we shall often assume that the matrices are in reduced form. In this case, the pseudo-inverse is given by

$$A^\dagger = \begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix}. \tag{2.2}$$
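One convenient way to obtain a reduced form in practice is the singular value decomposition, whose factors supply suitable unitary matrices $U$ and $V$. The sketch below, which assumes NumPy and a randomly generated rank 2 matrix (both choices made only for this example), builds the reduced form (2.1), forms the pseudo-inverse of the reduced form as in (2.2), and transforms back.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# The full SVD supplies unitary U = (U1, U2) and V = (V1, V2) with
# R(U1) = R(A) and R(V1) = R(A^H).
U, s, Vh = np.linalg.svd(A, full_matrices=True)
V = Vh.conj().T

reduced = U.conj().T @ A @ V          # the reduced form (2.1)
A11 = reduced[:r, :r]                 # nonsingular r x r block

# Pseudo-inverse of the reduced form, as in (2.2): an n x m matrix.
reduced_pinv = np.zeros((n, m))
reduced_pinv[:r, :r] = np.linalg.inv(A11)

# Transform back to the original coordinates: A^+ = V (reduced form)^+ U^H.
A_pinv = V @ reduced_pinv @ U.conj().T
```

With this particular choice of $U$ and $V$ the block $A_{11}$ is diagonal, which is the special case described in the discussion of singular values.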

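Singular values. It is a well-known result that in the reduced form (2.1) the matrices $U_1$ and $V_1$ may be chosen so that

$$A_{11} = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r),$$

where

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0.$$

This reduced form is called the singular value decomposition of the matrix $A$, and the numbers $\sigma_i$ are called the singular values of $A$. From the relation (2.2) and the fact that $(U^H A V)^\dagger = V^H A^\dagger U$, it follows that

$$A^\dagger = V \begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H.$$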

The $i$th singular value of a matrix $A$, which will be denoted by $\sigma_i(A)$, can be written in the form

$$\sigma_i(A) = \sup_{\dim(\mathcal{X}) = i}\ \inf_{\substack{x \in \mathcal{X} \\ \|x\|_2 = 1}} \|Ax\|_2 \qquad (i = 1, 2, \ldots, n), \tag{2.3}$$

where

$$\|y\|_2^2 = y^H y \tag{2.4}$$

is the usual Euclidean norm. This characterization provides a natural convention for numbering the singular values of a rectangular matrix: $A \in \mathbb{C}^{m\times n}$ has $n$ singular values of which $n - r$ are zero; $A^H$ has $m$ singular values of which $m - r$ are zero. The nonzero singular values of $A$ and $A^H$ are the same.

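Two inequalities that we shall need in the sequel follow fairly directly from (2.3). They are

$$\sigma_i(A) - \sigma_1(E) \le \sigma_i(A + E) \le \sigma_i(A) + \sigma_1(E)$$

and

$$\sigma_i(AC) \le \sigma_i(A)\,\sigma_1(C), \qquad \sigma_i(AC) \le \sigma_1(A)\,\sigma_i(C). \tag{2.5}$$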
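The two inequalities can be spot-checked numerically. The following sketch assumes NumPy; the matrix sizes, the random data, and the small tolerance `tol` that absorbs roundoff are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
E = 0.1 * rng.standard_normal((6, 4))
C = rng.standard_normal((4, 4))
tol = 1e-12

def singular_values(M):
    # Returned in nonincreasing order, so index 0 is sigma_1.
    return np.linalg.svd(M, compute_uv=False)

sA, sE, sB = singular_values(A), singular_values(E), singular_values(A + E)

# sigma_i(A) - sigma_1(E) <= sigma_i(A + E) <= sigma_i(A) + sigma_1(E)
additive_lower = np.all(sA - sE[0] <= sB + tol)
additive_upper = np.all(sB <= sA + sE[0] + tol)

# The two product inequalities labelled (2.5).
sAC, sC = singular_values(A @ C), singular_values(C)
product_first = np.all(sAC <= sA * sC[0] + tol)
product_second = np.all(sAC <= sA[0] * sC + tol)
```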

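Unitarily invariant matrix norms. A norm on $\mathbb{C}^{m\times n}$ is a function $\|\cdot\| : \mathbb{C}^{m\times n} \to \mathbb{R}$ that satisfies the conditions

$$\begin{aligned} &1.\ A \ne 0 \ \Rightarrow\ \|A\| > 0, \\ &2.\ \|\alpha A\| = |\alpha|\,\|A\|, \\ &3.\ \|A + B\| \le \|A\| + \|B\|. \end{aligned} \tag{2.6}$$

A norm is unitarily invariant if

$$\|U^H A V\| = \|A\|$$

for all unitary matrices $U$ and $V$. The perturbation bounds in this paper will be cast in terms of unitarily invariant norms, whose properties will now be described.

Let $U$ and $V$ be the unitary matrices realizing the singular value decomposition of the matrix $A \in \mathbb{C}^{m\times n}$. Then for any unitarily invariant norm $\|\cdot\|_{m,n}$

$$\|A\|_{m,n} = \|U^H A V\|_{m,n} = \left\| \begin{pmatrix} A_{11} & 0 \\ 0 & 0 \end{pmatrix} \right\|_{m,n}. \tag{2.7}$$

Thus $\|A\|_{m,n}$ is a function of the singular values of $A$, say

$$\|A\|_{m,n} = \varphi_{m,n}(\sigma_1, \sigma_2, \ldots, \sigma_n). \tag{2.8}$$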

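It follows from (2.6) that $\varphi_{m,n}$ regarded as a function on $\mathbb{R}^n$ is a norm. Since the interchange of two rows or two columns of a matrix is a unitary transformation of the matrix, the function $\varphi_{m,n}$ is symmetric in its arguments $\sigma_1, \sigma_2, \ldots, \sigma_n$. It can also be shown that $\varphi_{m,n}$ is nondecreasing in the sense that

$$0 \le \sigma_i \le \sigma_i' \quad (i = 1, 2, \ldots, n) \ \Rightarrow\ \varphi_{m,n}(\sigma_1, \ldots, \sigma_n) \le \varphi_{m,n}(\sigma_1', \ldots, \sigma_n'). \tag{2.9}$$

We shall say that the norm $\|\cdot\|_{m,n}$ is generated by $\varphi_{m,n}$.

An important norm is the spectral norm $\|\cdot\|_2$ generated by the function $\varphi_2$ defined by

$$\varphi_2(\sigma_1, \sigma_2, \ldots, \sigma_n) = \max\{|\sigma_1|, \ldots, |\sigma_n|\}.$$

This norm can also be defined by the equation

$$\|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2, \tag{2.10}$$

where $\|\cdot\|_2$ on the right denotes the Euclidean norm defined by (2.4).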

The spectral norm satisfies an important consistency relation with other unitarily invariant norms. If $\|\cdot\|$ is a unitarily invariant norm generated by $\varphi$, then it follows from (2.5) and (2.9) that

$$\|CD\| \le \|C\|_2\,\|D\| \quad\text{and}\quad \|CD\| \le \|C\|\,\|D\|_2 \tag{2.11}$$

whenever $CD, D \in \mathbb{C}^{m\times n}$ or $CD, C \in \mathbb{C}^{m\times n}$, respectively.
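A second example of a unitarily invariant norm is the Frobenius norm generated by the function

$$\varphi_F(\sigma_1, \ldots, \sigma_n) = (\sigma_1^2 + \cdots + \sigma_n^2)^{1/2}.$$

For any matrix $A \in \mathbb{C}^{m\times n}$

$$\|A\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |\alpha_{ij}|^2 = \operatorname{trace}(A^H A).$$

The Frobenius norm satisfies the consistency relation

$$\|CD\|_F \le \|C\|_F\,\|D\|_F.$$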
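To make the role of the generating functions concrete, the sketch below evaluates the spectral and Frobenius norms directly from singular values and spot-checks unitary invariance and the mixed consistency relation (2.11). It assumes NumPy; the helper names and the random data are hypothetical and serve only this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def singular_values(M):
    return np.linalg.svd(M, compute_uv=False)

def spectral_norm(M):
    # phi_2: the largest singular value.
    return singular_values(M).max()

def frobenius_norm(M):
    # phi_F: the Euclidean length of the vector of singular values.
    return np.sqrt(np.sum(singular_values(M) ** 2))

C = rng.standard_normal((4, 3))
D = rng.standard_normal((3, 5))

# Random orthogonal matrices from QR factorizations (real unitary matrices).
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))

invariance_gap = abs(frobenius_norm(U.T @ C @ V) - frobenius_norm(C))
mixed_left = frobenius_norm(C @ D) <= spectral_norm(C) * frobenius_norm(D) + 1e-12
mixed_right = frobenius_norm(C @ D) <= frobenius_norm(C) * spectral_norm(D) + 1e-12
```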
Since we shall be dealing with matrices of varying dimensions, we shall work with a family of unitarily invariant norms defined on $\bigcup_{m,n=1}^{\infty} \mathbb{C}^{m\times n}$. It is important that the individual norms so defined interact with one another properly. Accordingly, we make the following definition.

DEFINITION 2.1. Let $\|\cdot\| : \bigcup_{m,n=1}^{\infty} \mathbb{C}^{m\times n} \to \mathbb{R}$ be a family of unitarily invariant norms. Then $\|\cdot\|$ is uniformly generated if there is a symmetric function $\varphi$, defined for all infinite sequences with only a finite number of nonzero terms, such that

$$\|A\| = \varphi(\sigma_1(A), \sigma_2(A), \ldots, \sigma_n(A), 0, 0, \ldots)$$

for all $A \in \mathbb{C}^{m\times n}$. It is normalized if

$$\|x\| = \|x\|_2$$

for any vector $x$ considered as a matrix.
The function $\varphi$ in the above definition must satisfy the conditions (2.6). Any norm defined by such a function can be normalized. Indeed we have

$$\|x\| = \varphi(\sigma_1(x), 0, 0, \ldots) = \varphi(\|x\|_2, 0, 0, \ldots),$$

from which it follows that $\|x\| = \mu \|x\|_2$ for some constant $\mu$ that is independent of the dimension of $x$. The function $\mu^{-1}\varphi$ then generates the normalized family of norms.
A uniformly generated family of norms has some nice properties. First, since the nonzero singular values of a matrix and its conjugate transpose are the same, we have

$$\|A^H\| = \|A\|.$$

Second, if a matrix is bordered by zero matrices, its norm remains unchanged; i.e.,

$$\left\| \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix} \right\| = \|A\|. \tag{2.12}$$

In particular if $A$ is in reduced form, then

$$\|A\| = \|A_{11}\| \quad\text{and}\quad \|A^\dagger\| = \|A_{11}^{-1}\|.$$

It is also a consequence of (2.12) that (2.11) holds for a uniformly generated family of norms whenever the product $CD$ is defined, as may be seen by bordering $C$ and $D$ with zero matrices until they are both square.
A third property is that if $\|\cdot\|$ is normalized then

$$\|A\|_2 \le \|A\|. \tag{2.13}$$

In fact from (2.11) and the fact that $\|x\| = \|x\|_2$, we have

$$\|Ax\|_2 = \|Ax\| \le \|A\|\,\|x\|_2 \tag{2.14}$$

for all $x$. But by (2.10) $\|A\|_2$ is the smallest number for which (2.14) holds for all $x$, from which (2.13) follows. A trivial corollary of (2.11) and (2.13) is that $\|\cdot\|$ is consistent:

$$\|CD\| \le \|C\|\,\|D\|.$$

Finally we observe that

$$\forall x\ \ \|Cx\|_2 \le \|Dx\|_2 \ \Rightarrow\ \|C\| \le \|D\|. \tag{2.15}$$

To prove this implication note that by (2.3) the hypothesis implies that $\sigma_i(C) \le \sigma_i(D)$. Hence the inequality $\|C\| \le \|D\|$ follows from (2.9).

In the sequel $\|\cdot\|$ will always refer to a uniformly generated, normalized, unitarily invariant norm.
Perturbation of matrix inverses. We shall later need some results on the inverses of perturbations of nonsingular matrices. These are summarized in the following theorem.

THEOREM 2.2. If $A$ and $B = A + E$ are nonsingular, then

$$\frac{\|B^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \hat\kappa\,\frac{\|E\|}{\|A\|}, \tag{2.16}$$

where

$$\hat\kappa = \|A\|\,\|B^{-1}\|_2. \tag{2.17}$$

If $A$ is nonsingular and

$$\|A^{-1}\|_2\,\|E\| < 1, \tag{2.18}$$

then $B$ is a fortiori nonsingular. In this case

$$\|B^{-1}\| \le \|A^{-1}\| / \gamma \tag{2.19}$$

and

$$\frac{\|B^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \frac{\kappa}{\gamma}\,\frac{\|E\|}{\|A\|}, \tag{2.20}$$

where

$$\kappa = \|A\|\,\|A^{-1}\|_2 \tag{2.21}$$

and

$$\gamma = 1 - \kappa\,\|E\| / \|A\| > 0.$$

The bound (2.16) places no restrictions on the size of $E$; however, its use requires some estimate of the size of $B^{-1}$. When $E$ satisfies (2.18) one such estimate is given by (2.19), from which the bound (2.20) follows. This bound has the advantage that it can be stated entirely in terms of the matrix $A$. Pairs of bounds analogous to (2.16) and (2.20) will repeat themselves through a number of subsequent theorems, as will the pairs $\hat\kappa$ and $\kappa$. The number $\kappa$ measures the sensitivity of $A^{-1}$ to perturbations in $A$ and is usually called the condition number of $A$ (with respect to inversion).
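A small numerical experiment may help to see how the two bounds relate. The sketch below assumes NumPy and uses the spectral norm throughout; the test matrix with prescribed singular values 4, 3, 2, 1 and the size of the perturbation are hypothetical choices. It evaluates both sides of (2.16) and (2.20), writing `kappa_hat` for the quantity in (2.17).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4

# Test matrix with known singular values 4, 3, 2, 1 (illustrative choice).
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag([4.0, 3.0, 2.0, 1.0]) @ Q2.T
E = 1e-2 * rng.standard_normal((n, n))
B = A + E

def norm2(M):
    return np.linalg.norm(M, 2)

A_inv = np.linalg.inv(A)
B_inv = np.linalg.inv(B)
rel_change = norm2(B_inv - A_inv) / norm2(A_inv)

# Bound (2.16), which needs the size of the inverse of B.
kappa_hat = norm2(A) * norm2(B_inv)
bound_216 = kappa_hat * norm2(E) / norm2(A)

# Bound (2.20), stated entirely in terms of A and the size of E.
kappa = norm2(A) * norm2(A_inv)
gamma = 1.0 - kappa * norm2(E) / norm2(A)
bound_220 = (kappa / gamma) * norm2(E) / norm2(A)
```

Since (2.19) bounds the inverse of $B$ by $\|A^{-1}\|/\gamma$, the first bound can never exceed the second when the spectral norm is used.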
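Projections. We have already observed that the orthogonal projections $P_A$ and $R_A$ onto the column space and the row space of $A$ can be expressed in terms of the pseudo-inverse. The projection onto $\mathcal{R}(A)^\perp$ will be denoted by

$$P_A^\perp = I - P_A.$$

Likewise

$$R_A^\perp = I - R_A$$

will denote the projection onto $\mathcal{R}(A^H)^\perp$.

When $A$ is in reduced form, its projections can be easily written out:

$$P_A = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{C}^{m\times m}, \qquad P_A^\perp = \begin{pmatrix} 0 & 0 \\ 0 & I_{m-r} \end{pmatrix},$$

$$R_A = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{C}^{n\times n}, \qquad R_A^\perp = \begin{pmatrix} 0 & 0 \\ 0 & I_{n-r} \end{pmatrix}.$$

It follows that

$$\|P_A A R_A\| = \|A_{11}\|$$

and

$$\|P_A E R_A\| = \|E_{11}\|, \quad \|P_A E R_A^\perp\| = \|E_{12}\|, \quad \|P_A^\perp E R_A\| = \|E_{21}\|, \quad \|P_A^\perp E R_A^\perp\| = \|E_{22}\|.$$

These identities enable us to pass from results for the reduced form to general results stated in terms of projections of $A$ and $E$.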
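We shall need some properties of norms of projections later. These are summarized in the following theorem.

THEOREM 2.3. For any $A$ and $B$ the following statements are true.

1. If $\operatorname{rank}(A) = \operatorname{rank}(B)$, then the singular values of $P_A P_B^\perp$ and $P_B P_A^\perp$ are the same so that

$$\|P_A P_B^\perp\| = \|P_B P_A^\perp\|.$$

Moreover the nonzero singular values $\sigma$ of $P_A P_B^\perp$ correspond to pairs $\pm\sigma$ of eigenvalues of $P_B - P_A$, so that

$$\|P_B - P_A\|_2 = \|P_A P_B^\perp\|_2 = \|P_B P_A^\perp\|_2.$$

2. If $\|P_B - P_A\|_2 < 1$, then $\operatorname{rank}(A) = \operatorname{rank}(B)$.

3. If $\operatorname{rank}(B) \ge \operatorname{rank}(A)$, then

$$\|P_B^\perp P_A\| \le \|P_B P_A^\perp\|.$$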

Proof. Proofs of parts 1 and 2 are readily found in the literature; however, a proof of part 1, based on a useful matrix decomposition, is given in the Appendix to this paper. For part 3 write $P_B = P_1 + P_2$ where $\operatorname{rank}(P_1) = \operatorname{rank}(A)$ and $P_A P_2 = 0$ (i.e., $\mathcal{R}(P_2)$ is orthogonal to $\mathcal{R}(A)$). Then

$$\|P_A P_B^\perp\| = \|P_A(I - P_1 - P_2)\| = \|P_A(I - P_1)\| = \|P_1 P_A^\perp\|,$$

the last equality following from part 1. Now for any $x$

$$\|P_1 P_A^\perp x\|_2 \le \|P_B P_A^\perp x\|_2,$$

and the result follows from (2.15). □
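When $B = A + E$, we can estimate $\|P_B P_A^\perp\|$ in terms of $E$.

THEOREM 2.4. The product $P_B P_A^\perp$ can be written in the form

$$P_B P_A^\perp = (B^\dagger)^H R_B E^H P_A^\perp. \tag{2.22}$$

Hence

$$\|P_B P_A^\perp\| \le \|B^\dagger\|_2\,\|E\|, \tag{2.23}$$

and if $\operatorname{rank}(A) = \operatorname{rank}(B)$, then

$$\|P_B P_A^\perp\| \le \min\{\|A^\dagger\|_2, \|B^\dagger\|_2\}\,\|E\|. \tag{2.24}$$

Proof. We have

$$\begin{aligned} P_B P_A^\perp = P_B^H P_A^\perp &= (B^\dagger)^H B^H P_A^\perp = (B^\dagger)^H (A + E)^H P_A^\perp = (B^\dagger)^H E^H P_A^\perp \\ &= (B^\dagger)^H B^H (B^\dagger)^H E^H P_A^\perp = (B^\dagger)^H R_B E^H P_A^\perp, \end{aligned}$$

which establishes (2.22). The inequality (2.23) follows upon taking norms in (2.22). Finally (2.24) follows from part 1 of Theorem 2.3. □

Theorems 2.3 and 2.4 have obvious analogues for other combinations of projectors (e.g. $R_A R_B^\perp = -A^\dagger E R_B^\perp$). In the sequel a reference to these theorems will also cover any trivial variants.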
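The identity (2.22) and the bound (2.23) are easy to test numerically. The sketch below assumes NumPy and the spectral norm; the rank 2 test matrix, the perturbation size, and the cutoff passed to `numpy.linalg.pinv` are illustrative choices, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def pinv(M):
    # Explicit relative cutoff for singular values regarded as zero.
    return np.linalg.pinv(M, 1e-10)

def norm2(M):
    return np.linalg.norm(M, 2)

m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank 2
E = 1e-2 * rng.standard_normal((m, n))
B = A + E                                                       # generically rank 4

A_p, B_p = pinv(A), pinv(B)
P_A, P_B, R_B = A @ A_p, B @ B_p, B_p @ B
P_A_perp = np.eye(m) - P_A

lhs = P_B @ P_A_perp
rhs = B_p.conj().T @ R_B @ E.conj().T @ P_A_perp   # right-hand side of (2.22)

bound_223 = norm2(B_p) * norm2(E)                  # right-hand side of (2.23)
```

In this example the ranks differ, so the pseudo-inverse of $B$ is large and (2.23) is far from sharp; the identity (2.22) nevertheless holds exactly.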
The case when $\|P_B - P_A\|_2 < 1$ will be particularly important later. We have seen in part 2 of Theorem 2.3 that in this case $\operatorname{rank}(A) = \operatorname{rank}(B)$. However more is true: no vector in $\mathcal{R}(A)$ can be orthogonal to $\mathcal{R}(B)$ and vice versa. For suppose that $x \ne 0$ satisfies $P_A x = x$ and $P_B x = 0$. Then $(P_B - P_A)x = -x$, which implies that $\|P_B - P_A\|_2 \ge 1$. Conversely if $\|P_B - P_A\|_2 = 1$ then there is a vector in $\mathcal{R}(A)$ or $\mathcal{R}(B)$ that is orthogonal to $\mathcal{R}(B)$ or $\mathcal{R}(A)$. To see this, note that by Theorem 2.3, part 1, there is a vector $x$ such that $(P_B - P_A)x = x$. If $P_A x = 0$ then $P_B x = x$, which shows that $x \in \mathcal{R}(B)$ and $x \in \mathcal{R}(A)^\perp$. If, on the other hand, $P_A x \ne 0$, then since $P_A x = -(I - P_B)x$ we have $P_B(P_A x) = 0$, which shows that $P_A x \in \mathcal{R}(A)$ and $P_A x \in \mathcal{R}(B)^\perp$.

Because of the above considerations we shall say that $\mathcal{R}(A)$ and $\mathcal{R}(B)$ are acute whenever $\|P_B - P_A\|_2 < 1$. We shall say that the matrices $A$ and $B$ are acute if $\mathcal{R}(A)$ and $\mathcal{R}(B)$ are acute and $\mathcal{R}(A^H)$ and $\mathcal{R}(B^H)$ are acute. In this case we shall also say that $B$ is an acute perturbation of $A$. The following theorem gives necessary and sufficient conditions for $A$ and $B$ to be acute.

THEOREM 2.5. The matrices $A$ and $B$ are acute if and only if

$$\operatorname{rank}(A) = \operatorname{rank}(B) = \operatorname{rank}(P_A B R_A). \tag{2.25}$$

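Proof. We shall use the reduced forms of $A$ and $B$. First suppose (2.25) holds. Then $\operatorname{rank}(B_{11}) = \operatorname{rank}(A_{11})$, and $B_{11}$ is nonsingular. Thus

$$\mathcal{R}(B) = \mathcal{R}\left[\begin{pmatrix} B_{11} \\ B_{21} \end{pmatrix}\right].$$

But

$$\mathcal{R}(A) = \mathcal{R}\left[\begin{pmatrix} I \\ 0 \end{pmatrix}\right],$$

from which it is easily seen that no vector in $\mathcal{R}(A)$ can be orthogonal to $\mathcal{R}(B)$ and vice versa. A similar argument shows that $\mathcal{R}(A^H)$ and $\mathcal{R}(B^H)$ are also acute.

Now assume that $A$ and $B$ are acute. Then $\operatorname{rank}(A) = \operatorname{rank}(B) \ge \operatorname{rank}(B_{11})$. Assume that $\operatorname{rank}(B_{11}) < \operatorname{rank}(A)$, so that $B_{11}$ is singular. Let $p$ and $q$ be left and right null vectors of $B_{11}$ of 2-norm unity, and let $P$ and $Q$ be unitary matrices whose first columns are $p$ and $q$. Consider the reduced forms

$$\begin{pmatrix} P^H A_{11} Q & 0 \\ 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} P^H B_{11} Q & P^H E_{12} \\ E_{21} Q & E_{22} \end{pmatrix}.$$

The first row and column of $P^H B_{11} Q$ is zero. If $E_{21} q \ne 0$, then the nonzero vector

$$\begin{pmatrix} 0 \\ E_{21} q \end{pmatrix}$$

is in $\mathcal{R}(B)$ and is orthogonal to $\mathcal{R}(A)$. Similarly if $p^H E_{12} \ne 0$, then the nonzero vector

$$\begin{pmatrix} 0 \\ E_{12}^H p \end{pmatrix}$$

is in $\mathcal{R}(B^H)$ and is orthogonal to $\mathcal{R}(A^H)$. If $E_{21} q = 0$ and $p^H E_{12} = 0$, then the unit vector $e_1$ is in $\mathcal{R}(A)$ and $\mathcal{R}(A^H)$ and is orthogonal to $\mathcal{R}(B)$ and $\mathcal{R}(B^H)$. In all cases the contradiction establishes that $B_{11}$ is nonsingular, or equivalently that $\operatorname{rank}(A) = \operatorname{rank}(B) = \operatorname{rank}(B_{11})$. □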

Beyond the requirement that $\operatorname{rank}(B) = \operatorname{rank}(A)$, Theorem 2.5 shows that $B_{11}$ must be nonsingular for $A$ and $B$ to be acute. By Theorem 2.2 this will be true whenever $\|A_{11}^{-1}\|_2\,\|E_{11}\| < 1$ or equivalently whenever $\|A^\dagger\|_2\,\|P_A E R_A\| < 1$. This condition is always satisfied when $E_{11}$ is sufficiently small.
Notes and references. The properties of singular values are well known. See Stewart (1973) for an introduction and Gohberg and Krein (1969) for a more detailed treatment in an infinite dimensional setting.

Von Neumann (1937) was the first to prove that unitarily invariant norms can be written as a function of singular values (the function $\varphi_{m,n}$ in (2.7) is usually called a symmetric gauge function). Systematic treatments of unitarily invariant norms may be found in Mirsky (1960) and Gohberg and Krein (1969).

The treatment of unitarily invariant norms in finite dimensional spaces has often been a little sloppy. In infinite dimensional settings there is usually only one space and one generating function, and the same is true in a finite dimensional setting when one is concerned with square matrices. However, when one considers rectangular matrices with varying dimensions, different norms can be used for different dimensions, and there is no reason why these norms should interact nicely. How bad things can get is illustrated by the family of norms $\|\cdot\|$ defined for $A \in \mathbb{C}^{m\times n}$ by

=- IIA112.
IIA11
n

This family is unitarily invariant and consistent, but $\|A^H\| \ne \|A\|$ unless $A$ is square, and the relation (2.13) does not hold in general. Definition 2.1 represents a return to the simplicity of the infinite dimensional case.

Theorem 2.2 is classical and is usually proved by an appeal to the Neumann series representation $(I - A)^{-1} = I + A + A^2 + \cdots$. Wilkinson (1965) gives a proof that does not use series and discusses at some length the notion of condition number. The result is usually proved under the assumption that $\|I\| = 1$; however, the proofs can be extended to establish the result for any consistent norm.

The results in Theorem 2.3 are well known to people who work closely with orthogonal projectors (e.g., see Afriat (1957) or Wedin (1969)). The decomposition in Theorem 2.4 was established in a slightly weaker form by Wedin (1973). In some cases, when $E$ is small, $R_B$ will be near $R_A$ and the approximation $\|P_A^\perp E R_B\| \cong \|E_{21}\|$ will be more realistic in (2.23).

The number $\|P_B - P_A\|_2$ is closely related to various measures of separation between subspaces. See Kato (1966) and especially Davis and Kahan (1970) where further references may be found. Theorem 2.4, with $\|P_A E R_A\|$ replaced by $\|E\|$, is proved by Wedin (1973). The term "acute" ordinarily refers to the angle subtended by two line segments, not to the segments themselves, and it is technically misapplied when subspaces are said to be acute. But this usage will cause no confusion and it is better than the ugly phrase "in the acute case." The term "acute perturbation" is new, but the notion is introduced in Wedin (1973).
3. The pseudo-inverse. In this section we shall consider the problem of bounding $\|B^\dagger - A^\dagger\|$ in terms of $\|E\|$. We shall obtain three basic theorems: one for when $\operatorname{rank}(A) \ne \operatorname{rank}(B)$, one for when $\operatorname{rank}(A) = \operatorname{rank}(B)$, and one for when $B$ is an acute perturbation of $A$. All these theorems are based on expressions for $B^\dagger$, which also yield asymptotic expressions for $B^\dagger$ and expressions for the derivative of $A^\dagger$.

Lower bounds. Before proceeding to obtain bounds on $\|B^\dagger - A^\dagger\|$, we shall show how bad things can be by deriving lower bounds.

THEOREM 3.1. If $A$ and $B$ are not acute, then

$$\|B^\dagger - A^\dagger\|_2 \ge 1/\|E\|_2. \tag{3.1}$$

If, further, $\operatorname{rank}(B) \ge \operatorname{rank}(A)$, then

$$\|B^\dagger\|_2 \ge 1/\|E\|_2. \tag{3.2}$$

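Proof. Suppose for definiteness that $\operatorname{rank}(B) \ge \operatorname{rank}(A)$. Then there is, say, a vector $y \in \mathcal{R}(B)$ with $\|y\|_2 = 1$ such that $y \in \mathcal{R}(A)^\perp$ (otherwise work with $A^H$ and $B^H$). Thus

$$1 = y^H y = y^H P_B y = y^H B B^\dagger y = y^H (A + E) B^\dagger y = y^H E B^\dagger y \le \|E\|_2\,\|B^\dagger y\|_2,$$

which shows that $\|B^\dagger y\|_2$, and hence $\|B^\dagger\|_2$, is not less than $1/\|E\|_2$. From this and the fact that $A^\dagger y = A^\dagger P_A y = 0$ we have

$$\frac{1}{\|E\|_2} \le \|B^\dagger y\|_2 = \|(B^\dagger - A^\dagger) y\|_2 \le \|B^\dagger - A^\dagger\|_2. \qquad \Box$$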

Theorem 3.1 shows that the pseudo-inverse of a general matrix is not a continuous function of its elements, unless the class of perturbations is restricted. It also says that if two nearby matrices do not have acute column and row spaces, then one of them at least must have a large pseudo-inverse. Moreover if they are of the same rank, then both of them must have large pseudo-inverses.
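The discontinuity is already visible for 2 x 2 diagonal matrices. In the sketch below, which assumes NumPy and uses a hypothetical helper `pinv_gap`, the matrix $A = \operatorname{diag}(1, 0)$ is perturbed to $B = \operatorname{diag}(1, \epsilon)$. The ranks differ, so $A$ and $B$ are not acute, and the lower bound (3.1) is attained with equality.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])

def pinv_gap(eps):
    """Return (||B^+ - A^+||_2, 1 / ||E||_2) for B = A + diag(0, eps)."""
    E = np.array([[0.0, 0.0],
                  [0.0, eps]])
    B = A + E
    # Cutoff 1e-12 keeps eps as a genuine singular value for the sizes used here.
    difference = np.linalg.pinv(B, 1e-12) - np.linalg.pinv(A, 1e-12)
    return np.linalg.norm(difference, 2), 1.0 / np.linalg.norm(E, 2)

gaps = [pinv_gap(eps) for eps in (1e-1, 1e-3, 1e-6)]
```

As the perturbation shrinks, the difference between the two pseudo-inverses grows like $1/\epsilon$ instead of tending to zero.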
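A decomposition of $B^\dagger - A^\dagger$. In spite of the negative results in Theorem 3.1, it is possible to obtain bounds on $\|B^\dagger - A^\dagger\|$ in the general case, although these bounds need not remain finite as $B$ approaches $A$. The basis for obtaining such bounds is contained in the following theorem.

THEOREM 3.2. The following two decompositions of $B^\dagger - A^\dagger$ are valid:

$$B^\dagger - A^\dagger = -B^\dagger P_B E R_A A^\dagger + B^\dagger P_B P_A^\perp - R_B^\perp R_A A^\dagger, \tag{3.3}$$

$$B^\dagger - A^\dagger = -B^\dagger P_B E R_A A^\dagger + (B^H B)^\dagger R_B E^H P_A^\perp + R_B^\perp E^H P_A (A A^H)^\dagger. \tag{3.4}$$

Proof. Both expressions can be verified directly by replacing $E$ with $B - A$, replacing the projectors by their expressions in terms of pseudo-inverses, and simplifying. □

It should be noted that (3.4) can be obtained directly from (3.3) by using Theorem 2.4 to express $P_B P_A^\perp$ and $R_B^\perp R_A$ in terms of $E$; in particular $R_B^\perp R_A = -R_B^\perp E^H (A^\dagger)^H$, which accounts for the plus sign on the last term of (3.4).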
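Both decompositions can be confirmed numerically. The sketch below assumes NumPy, with random test matrices and a `pinv` cutoff chosen only for this illustration. The last term of (3.4) is entered with a plus sign, which is what results from substituting $R_B^\perp R_A = -R_B^\perp E^H (A^\dagger)^H$ into the last term of (3.3).

```python
import numpy as np

rng = np.random.default_rng(8)

def pinv(M):
    return np.linalg.pinv(M, 1e-10)

m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank 2
E = 0.1 * rng.standard_normal((m, n))
B = A + E

A_p, B_p = pinv(A), pinv(B)
P_A, P_B = A @ A_p, B @ B_p
R_A, R_B = A_p @ A, B_p @ B
P_A_perp = np.eye(m) - P_A
R_B_perp = np.eye(n) - R_B
AH, BH, EH = A.conj().T, B.conj().T, E.conj().T

target = B_p - A_p

# Right-hand side of (3.3).
rhs_33 = -B_p @ P_B @ E @ R_A @ A_p + B_p @ P_B @ P_A_perp - R_B_perp @ R_A @ A_p

# Right-hand side of (3.4).
rhs_34 = (
    -B_p @ P_B @ E @ R_A @ A_p
    + pinv(BH @ B) @ R_B @ EH @ P_A_perp
    + R_B_perp @ EH @ P_A @ pinv(A @ AH)
)
```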
The general theorem. We are now in a position to prove the general theorem bounding $\|B^\dagger - A^\dagger\|$.

THEOREM 3.3. For any $A$ and $B$ with $B = A + E$,

$$\|B^\dagger - A^\dagger\| \le \mu \max\{\|A^\dagger\|_2^2, \|B^\dagger\|_2^2\}\,\|E\|,$$

where $\mu$ is given in the following table:

    norm      arbitrary    spectral              Frobenius
    $\mu$     $3$          $(1+\sqrt{5})/2$      $\sqrt{2}$

Proof. The proof is a slight modification of the proof given by Wedin (1973). We shall give only the proof for the Frobenius norm.

Suppose for definiteness that $\operatorname{rank}(B) \le \operatorname{rank}(A)$. Let $F_1$, $F_2$, and $F_3$ denote the three terms on the right-hand side of (3.3). Then the column spaces of $F_1$ and $F_2$ are orthogonal to the column space of $F_3$. Hence

$$\|B^\dagger - A^\dagger\|_F^2 = \|F_1 + F_2\|_F^2 + \|F_3\|_F^2. \tag{3.5}$$

Now since $F_1 + F_2 = B^\dagger(-P_B E A^\dagger P_A + P_B P_A^\perp)$,

$$\|F_1 + F_2\|_F^2 \le \|B^\dagger\|_2^2 \left( \|P_B E A^\dagger P_A\|_F^2 + \|P_B P_A^\perp\|_F^2 \right).$$

But from Theorems 2.3 and 2.4

$$\|P_B E A^\dagger P_A\|_F^2 + \|P_B P_A^\perp\|_F^2 \le \|P_B E A^\dagger\|_F^2 + \|P_B^\perp E A^\dagger\|_F^2 = \|E A^\dagger\|_F^2 \le \|E\|_F^2\,\|A^\dagger\|_2^2.$$

Hence

$$\|F_1 + F_2\|_F \le \|A^\dagger\|_2\,\|B^\dagger\|_2\,\|E\|_F. \tag{3.6}$$

Also from Theorem 2.4

$$\|F_3\|_F \le \|A^\dagger\|_2\,\|R_B^\perp R_A\|_F = \|A^\dagger\|_2\,\|R_A R_B^\perp\|_F = \|A^\dagger\|_2\,\|A^\dagger E R_B^\perp\|_F \le \|A^\dagger\|_2^2\,\|E\|_F, \tag{3.7}$$

and the result follows on combining (3.5), (3.6), and (3.7). Since the final bound is symmetric in $A$ and $B$, it also holds when $\operatorname{rank}(B) \ge \operatorname{rank}(A)$. □

It should be noted that these bounds do not imply that $\|B^\dagger - A^\dagger\|$ is small when $\|E\|$ is small, since $B^\dagger$ may grow unboundedly as $E$ approaches zero.
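The Frobenius norm case of Theorem 3.3 can be explored numerically. Combining (3.5), (3.6), and (3.7) gives a bound with constant $\sqrt{2}$ in front of $\max\{\|A^\dagger\|_2^2, \|B^\dagger\|_2^2\}\,\|E\|_F$, and the sketch below checks that bound on a few random examples. It assumes NumPy; the helper `frobenius_check`, the matrix sizes, and the perturbation scale are hypothetical choices.

```python
import numpy as np

def pinv(M):
    return np.linalg.pinv(M, 1e-10)

def frobenius_check(seed, m=6, n=4, r=2, scale=1e-3):
    """Return both sides of the Frobenius bound for one random rank-r example."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    E = scale * rng.standard_normal((m, n))
    B = A + E                       # generically of full rank n > r
    A_p, B_p = pinv(A), pinv(B)
    lhs = np.linalg.norm(B_p - A_p, 'fro')
    worst = max(np.linalg.norm(A_p, 2), np.linalg.norm(B_p, 2))
    rhs = np.sqrt(2.0) * worst**2 * np.linalg.norm(E, 'fro')
    return lhs, rhs

results = [frobenius_check(seed) for seed in range(5)]
```

Because the perturbation raises the rank here, both sides of the inequality are large, which illustrates the remark that the bound does not force the change in the pseudo-inverse to be small.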
The case rank(A) = rank(B). When $A$ and $B$ have the same rank, we can strengthen Theorem 3.3 in two ways. First, we can replace the term $\max\{\|A^\dagger\|_2^2, \|B^\dagger\|_2^2\}$ with the product $\|A^\dagger\|_2\,\|B^\dagger\|_2$. Second, we can distinguish more cases for the constant $\mu$. In the following theorem recall that $A \in \mathbb{C}^{m\times n}$.

THEOREM 3.4. If $\operatorname{rank}(A) = \operatorname{rank}(B)$, then

$$\|B^\dagger - A^\dagger\| \le \mu\,\|A^\dagger\|_2\,\|B^\dagger\|_2\,\|E\|, \tag{3.8}$$

where $\mu$ is given in the following table.

    case                                    arbitrary    spectral              Frobenius
    rank(A) < min(m, n)                     $3$          $(1+\sqrt{5})/2$      $\sqrt{2}$
    rank(A) = min(m, n), m ≠ n              $2$          $\sqrt{2}$            $1$
    rank(A) = m = n                         $1$          $1$                   $1$

The proof of this theorem may be found in Wedin (1973). The bound (3.8) may be recast in the form

$$\frac{\|B^\dagger - A^\dagger\|}{\|B^\dagger\|_2} \le \mu\,\kappa\,\frac{\|E\|}{\|A\|}, \tag{3.9}$$

where

$$\kappa = \|A\|\,\|A^\dagger\|_2.$$

In this form the result is almost analogous to the bound (2.20) for the inverse in Theorem 2.2. The bound (3.9) also implies that as $E$ approaches zero, the relative error in $B^\dagger$ approaches zero, which further implies that $B^\dagger$ approaches $A^\dagger$. Remembering, on the other hand, that if $\operatorname{rank}(B) \ne \operatorname{rank}(A)$ then $A$ and $B$ cannot be acute, we have from Theorem 3.1 the following corollary of Theorem 3.4.

COROLLARY 3.5. A necessary and sufficient condition that

$$\lim_{B \to A} B^\dagger = A^\dagger$$

is that $\operatorname{rank}(B) = \operatorname{rank}(A)$ as $B$ approaches $A$.

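Acute perturbations. It is evident from the proofs of Theorems 3.3 and 3.4 that we have given away much in deriving the bounds. In particular, if $B$ is a small acute perturbation of $A$ then $P_A$ and $P_B$ are nearly equal, and the same is true of $R_A$ and $R_B$. Thus it follows from (3.4) that $B^\dagger - A^\dagger$ can be decomposed into three terms: one essentially depending on $P_A E R_A$, one on $P_A^\perp E R_A$, and one on $P_A E R_A^\perp$. However, this does not tell the whole story; for we shall show that the dependency of $B^\dagger - A^\dagger$ on $P_A^\perp E R_A$ and $P_A E R_A^\perp$ is bounded, no matter how large these projections may be.

In order to state our theorems concisely, we must first introduce some additional notation. Let $\|\cdot\|$ be generated by $\varphi$ and for any $F \in \mathbb{C}^{k\times r}$ define

$$\psi_\varphi(F) = \varphi\!\left( \frac{\sigma_1(F)}{[1 + \sigma_1^2(F)]^{1/2}}, \ldots, \frac{\sigma_r(F)}{[1 + \sigma_r^2(F)]^{1/2}}, 0, 0, \ldots \right). \tag{3.10}$$

The function $\psi_\varphi$ is not a norm; however, it has some useful properties. First, from (2.5) and the monotonicity of $\varphi$,

$$\psi_\varphi(GF) \le \psi_\varphi(\|G\|_2 F) \le \psi_\varphi(\|G\| F).$$

Second, since for $\alpha \ge 1$

$$\frac{\alpha\sigma}{(1 + \alpha^2\sigma^2)^{1/2}} \le \frac{\alpha\sigma}{(1 + \sigma^2)^{1/2}},$$

we have

$$\alpha \ge 1 \ \Rightarrow\ \psi_\varphi(\alpha F) \le \alpha\,\psi_\varphi(F).$$

For small $F$, $\psi_\varphi(F)$ is asymptotic to $\|F\|$:

$$\psi_\varphi(F) = \|F\| + o(\|F\|).$$

For large $F$, $\psi_\varphi(F)$ is bounded:

$$\psi_\varphi(F) \le \|I_r\|.$$

Finally, for the spectral norm

$$\psi_2(F) = \|F\|_2 / (1 + \|F\|_2^2)^{1/2}.$$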

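Our first result concerns a rather special matrix.

LEMMA 3.6. The matrix

$$\begin{pmatrix} I \\ F \end{pmatrix}$$

satisfies

$$\left\| \begin{pmatrix} I \\ F \end{pmatrix}^\dagger \right\|_2 \le 1 \tag{3.11}$$

and

$$\left\| \begin{pmatrix} I \\ F \end{pmatrix}^\dagger - (I \ \ 0) \right\| = \psi_\varphi(F). \tag{3.12}$$

Proof. It is easily verified that

$$\begin{pmatrix} I \\ F \end{pmatrix}^\dagger = (I + F^H F)^{-1} (I \ \ F^H), \tag{3.13}$$

whose singular values are

$$\frac{1}{[1 + \sigma_i^2(F)]^{1/2}},$$

from which (3.11) follows. Also if

$$G = \begin{pmatrix} I \\ F \end{pmatrix}^\dagger - (I \ \ 0),$$

then

$$G G^H = I - (I + F^H F)^{-1}.$$

It follows that the singular values of $G$ are given by

$$\frac{\sigma_i(F)}{[1 + \sigma_i^2(F)]^{1/2}},$$

which establishes (3.12). □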
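Lemma 3.6 lends itself to a direct numerical check. The sketch below assumes NumPy and a random 5 x 3 matrix $F$ (an arbitrary choice). It compares the pseudo-inverse of the stacked matrix with the closed form (3.13) and evaluates the function defined in (3.10) for the spectral and Frobenius norms, under the names `psi_spectral` and `psi_frobenius` introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(10)
k, r = 5, 3
F = rng.standard_normal((k, r))

# The stacked matrix (I; F) and its pseudo-inverse.
J = np.vstack([np.eye(r), F])
J_pinv = np.linalg.pinv(J, 1e-10)

# Closed form (3.13): (I + F^H F)^{-1} (I  F^H).
FH = F.conj().T
closed_form = np.linalg.solve(np.eye(r) + FH @ F, np.hstack([np.eye(r), FH]))

# G = (I; F)^+ - (I  0), whose singular values are sigma_i / sqrt(1 + sigma_i^2).
G = J_pinv - np.hstack([np.eye(r), np.zeros((r, k))])

sigma = np.linalg.svd(F, compute_uv=False)
scaled = sigma / np.sqrt(1.0 + sigma**2)
psi_spectral = scaled.max()                  # psi for the spectral norm
psi_frobenius = np.sqrt(np.sum(scaled**2))   # psi for the Frobenius norm
```

Every entry of `scaled` is below one, which is the boundedness property that makes the function useful for large $F$.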
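The main result is based on an explicit representation of $B^\dagger$. We shall work with the reduced forms of $A$ and $B$.

THEOREM 3.7. Let $B$ be an acute perturbation of $A$. Then

$$B^\dagger = (I \ \ F_{12})^\dagger\, B_{11}^{-1} \begin{pmatrix} I \\ F_{21} \end{pmatrix}^\dagger, \tag{3.14}$$

where

$$F_{21} = E_{21} B_{11}^{-1}, \qquad F_{12} = B_{11}^{-1} E_{12}.$$

Proof. As in the proof of Theorem 2.5, we have

$$\mathcal{R}(B) = \mathcal{R}\left[\begin{pmatrix} B_{11} \\ E_{21} \end{pmatrix}\right].$$

Thus the columns of

$$\begin{pmatrix} E_{12} \\ E_{22} \end{pmatrix}$$

can be expressed as a linear combination of the columns of

$$\begin{pmatrix} B_{11} \\ E_{21} \end{pmatrix}.$$

Since $B_{11}(B_{11}^{-1} E_{12}) = E_{12}$, we must have

$$\begin{pmatrix} E_{12} \\ E_{22} \end{pmatrix} = \begin{pmatrix} B_{11} \\ E_{21} \end{pmatrix} B_{11}^{-1} E_{12},$$

from which it follows that

$$B = \begin{pmatrix} I \\ F_{21} \end{pmatrix} B_{11} (I \ \ F_{12}). \tag{3.15}$$

The result now follows from Penrose's conditions. □

It is interesting to observe that, from (3.15),

$$B_{22} = E_{22} = F_{21} B_{11} F_{12},$$

which is of second order in $\|E\|$. In other words, if $\operatorname{rank}(A + E) = \operatorname{rank}(A)$, then $P_A^\perp E R_A^\perp$ must approach zero quadratically as $E$ approaches zero.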
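The representation (3.14) can be tried out on a small example in reduced form. In the sketch below, which assumes NumPy, the blocks $A_{11}$, $E_{11}$, $E_{12}$, and $E_{21}$ are hypothetical data, and $E_{22}$ is set to $F_{21} B_{11} F_{12}$ so that $B$ has the same rank as $A$, as the observation following (3.15) requires.

```python
import numpy as np

rng = np.random.default_rng(11)
m, n, r = 5, 4, 2

# Reduced-form blocks (illustrative data).
A11 = np.diag([2.0, 1.0])
E11 = 0.1 * rng.standard_normal((r, r))
E12 = 0.1 * rng.standard_normal((r, n - r))
E21 = 0.1 * rng.standard_normal((m - r, r))

B11 = A11 + E11
B11_inv = np.linalg.inv(B11)
F21 = E21 @ B11_inv
F12 = B11_inv @ E12
E22 = F21 @ B11 @ F12          # forced by rank(B) = rank(A)

B = np.block([[B11, E12], [E21, E22]])

# Factors of (3.15): B = J21 B11 J12.
J21 = np.vstack([np.eye(r), F21])
J12 = np.hstack([np.eye(r), F12])

# Representation (3.14) of the pseudo-inverse.
B_pinv_formula = np.linalg.pinv(J12, 1e-10) @ B11_inv @ np.linalg.pinv(J21, 1e-10)
B_pinv_direct = np.linalg.pinv(B, 1e-10)
```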
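We turn now to the perturbation theorem.

THEOREM 3.8. Let $B$ be an acute perturbation of $A$, and let

$$\hat\kappa = \|A\|\,\|B_{11}^{-1}\|_2.$$

Then

$$\frac{\|B^\dagger - A^\dagger\|}{\|A^\dagger\|} \le \hat\kappa\,\frac{\|E_{11}\|}{\|A\|} + \psi_\varphi\!\left(\frac{\hat\kappa E_{12}}{\|A\|}\right) + \psi_\varphi\!\left(\frac{\hat\kappa E_{21}}{\|A\|}\right), \tag{3.16}$$

where $\psi_\varphi$ is defined by (3.10).

Proof. Let $F_{ij}$ be defined as in Theorem 3.7. Let

$$I_{21} = \begin{pmatrix} I_r \\ 0 \end{pmatrix}, \quad J_{21} = \begin{pmatrix} I_r \\ F_{21} \end{pmatrix}, \quad I_{12} = (I_r \ \ 0), \quad J_{12} = (I_r \ \ F_{12}).$$

From (3.14), $B^\dagger = J_{12}^\dagger B_{11}^{-1} J_{21}^\dagger$; hence

$$B^\dagger - A^\dagger = (J_{12}^\dagger - I_{12}^\dagger) A_{11}^{-1} I_{21}^\dagger + J_{12}^\dagger A_{11}^{-1} (J_{21}^\dagger - I_{21}^\dagger) + J_{12}^\dagger (B_{11}^{-1} - A_{11}^{-1}) J_{21}^\dagger. \tag{3.17}$$

From Theorem 2.2 we have the following bound:

$$\|J_{12}^\dagger (B_{11}^{-1} - A_{11}^{-1}) J_{21}^\dagger\| \le \|A_{11}^{-1}\|\,\hat\kappa\,\frac{\|E_{11}\|}{\|A_{11}\|}. \tag{3.18}$$

By Lemma 3.6

$$\|(J_{12}^\dagger - I_{12}^\dagger) A_{11}^{-1} I_{21}^\dagger\| \le \|A_{11}^{-1}\|\,\|J_{12}^\dagger - I_{12}^\dagger\| = \|A_{11}^{-1}\|\,\psi_\varphi(F_{12}) = \|A_{11}^{-1}\|\,\psi_\varphi(B_{11}^{-1} E_{12}) \le \|A_{11}^{-1}\|\,\psi_\varphi\!\left(\frac{\hat\kappa E_{12}}{\|A_{11}\|}\right), \tag{3.19}$$

and likewise

$$\|J_{12}^\dagger A_{11}^{-1} (J_{21}^\dagger - I_{21}^\dagger)\| \le \|A_{11}^{-1}\|\,\psi_\varphi\!\left(\frac{\hat\kappa E_{21}}{\|A_{11}\|}\right). \tag{3.20}$$

The bound (3.16) follows on combining (3.17), (3.18), (3.19), and (3.20) and remembering that $\|A_{11}\| = \|A\|$ and $\|A_{11}^{-1}\| = \|A^\dagger\|$. □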

The bound (3.16) gives a rather nice dissection of $\|B^\dagger - A^\dagger\|$. Asymptotically, for $E_{12}$ and $E_{21}$ small, it reduces to the bound that would be obtained by taking norms in (3.4), i.e.,

$$\frac{\|B^\dagger - A^\dagger\|}{\|A^\dagger\|} \lesssim \kappa\,\frac{\|E_{11}\| + \|E_{12}\| + \|E_{21}\|}{\|A\|}.$$

However, the bound additionally shows that $E_{12}$ and $E_{21}$ can have at most a bounded effect on $\|B^\dagger - A^\dagger\|$.

When $A$ is square and nonsingular, $E_{12}$ and $E_{21}$ are void, and the bound reduces to that of Theorem 2.2. Note that the number $\hat\kappa$, defined in analogy with (2.17), plays an analogous role here.

As in the second part of Theorem 2.2, if $E_{11}$ is sufficiently small, we can estimate $\|B_{11}^{-1}\|_2$ in terms of $\|A_{11}^{-1}\|_2$ and $\|E_{11}\|$. This gives the following corollary.

COROLLARY 3.9. In Theorem 3.8, let

$$\kappa = \|A\|\,\|A^\dagger\|_2 \tag{3.21}$$

and suppose that

$$\|A^\dagger\|_2\,\|E_{11}\| < 1,$$

so that

$$\gamma = 1 - \kappa\,\|E_{11}\| / \|A\| > 0.$$

Then

$$\|B^\dagger\| \le \|A^\dagger\| / \gamma \tag{3.22}$$

and

$$\frac{\|B^\dagger - A^\dagger\|}{\|A^\dagger\|} \le \frac{\kappa}{\gamma}\,\frac{\|E_{11}\|}{\|A\|} + \psi_\varphi\!\left(\frac{\kappa E_{12}}{\gamma\|A\|}\right) + \psi_\varphi\!\left(\frac{\kappa E_{21}}{\gamma\|A\|}\right). \tag{3.23}$$

Proof. From the equation $B^\dagger = J_{12}^\dagger B_{11}^{-1} J_{21}^\dagger$ we have

$$\|B^\dagger\| \le \|B_{11}^{-1}\|.$$

By Theorem 2.2

$$\|B_{11}^{-1}\| \le \|A_{11}^{-1}\| / \gamma = \|A^\dagger\| / \gamma,$$

which establishes (3.22). Also $\hat\kappa \le \kappa/\gamma$, and (3.23) follows from (3.16). □

The number $\kappa$ is defined in analogy with (2.21). For $E_{11}$ sufficiently small, $\hat\kappa \cong \kappa$, and (3.16) and (3.23) give essentially the same bound.
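Asymptotic forms and derivatives. Asymptotic forms for $B^\dagger$ may be obtained from either (3.4) or (3.14). Of course for $B^\dagger$ to approach $A^\dagger$ we must have $\operatorname{rank}(A) = \operatorname{rank}(B)$; and since we are assuming that $E$ is arbitrarily small, $B$ may be assumed to be an acute perturbation of $A$. In this case

$$B^\dagger = A^\dagger + O(\|E\|),$$

and

$$P_B = B B^\dagger = (A + E)[A^\dagger + O(\|E\|)] = P_A + O(\|E\|)$$

with similar expressions for the other projections. Hence from (3.4)

$$B^\dagger = A^\dagger - A^\dagger P_A E R_A A^\dagger + (A^H A)^\dagger R_A E^H P_A^\perp + R_A^\perp E^H P_A (A A^H)^\dagger + O(\|E\|^2). \tag{3.24}$$

It follows immediately from (3.24) that if $A(\tau)$ is a differentiable function of $\tau$ with

$$\operatorname{rank}[A(\tau)] = \operatorname{rank}[A(\tau')]$$

for all $\tau$ and $\tau'$, then $A(\tau)^\dagger$ is a differentiable function of $\tau$ and

$$\frac{dA^\dagger}{d\tau} = -A^\dagger P_A \frac{dA}{d\tau} R_A A^\dagger + (A^H A)^\dagger R_A \frac{dA^H}{d\tau} P_A^\perp + R_A^\perp \frac{dA^H}{d\tau} P_A (A A^H)^\dagger. \tag{3.25}$$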
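The derivative formula (3.25) can be compared with a finite difference. The sketch below assumes NumPy and a hypothetical one-parameter family $A(\tau) = (X_0 + \tau X_1)(Y_0 + \tau Y_1)$, which has constant rank 2 for small $\tau$; the step size `h` and the `pinv` cutoff are likewise illustrative. All three terms are entered as in (3.25), with plus signs on the second and third.

```python
import numpy as np

rng = np.random.default_rng(12)

def pinv(M):
    return np.linalg.pinv(M, 1e-10)

m, n, r = 5, 4, 2
Qm, _ = np.linalg.qr(rng.standard_normal((m, m)))
Qn, _ = np.linalg.qr(rng.standard_normal((n, n)))
X0 = Qm[:, :r] @ np.diag([2.0, 1.0])    # A(0) has singular values 2 and 1
Y0 = Qn[:, :r].T
X1 = rng.standard_normal((m, r))
Y1 = rng.standard_normal((r, n))

def A_of(tau):
    # Rank-2 family: both the column space and the row space move with tau.
    return (X0 + tau * X1) @ (Y0 + tau * Y1)

A = A_of(0.0)
dA = X1 @ Y0 + X0 @ Y1                  # dA/dtau at tau = 0
A_p = pinv(A)
P_A, R_A = A @ A_p, A_p @ A
P_perp, R_perp = np.eye(m) - P_A, np.eye(n) - R_A
AH, dAH = A.conj().T, dA.conj().T

# Formula (3.25) evaluated at tau = 0.
derivative = (
    -A_p @ P_A @ dA @ R_A @ A_p
    + pinv(AH @ A) @ R_A @ dAH @ P_perp
    + R_perp @ dAH @ P_A @ pinv(A @ AH)
)

# Central difference approximation of the same derivative.
h = 1e-6
central_difference = (pinv(A_of(h)) - pinv(A_of(-h))) / (2.0 * h)
```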

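The asymptotic form obtained from (3.14) can be useful computationally when $A$ has been put in reduced form as a preliminary to computing $A^\dagger$. We have from (3.24) that

$$B_{11}^{-1} = A_{11}^{-1} - A_{11}^{-1} E_{11} A_{11}^{-1} + O(\|E_{11}\|^2).$$

From (3.13) in the proof of Lemma 3.6 we have

$$\begin{pmatrix} I \\ F_{21} \end{pmatrix}^\dagger = (I \ \ A_{11}^{-H} E_{21}^H) + O(\|E_{11}\|\,\|E_{21}\|)$$

and

$$(I \ \ F_{12})^\dagger = \begin{pmatrix} I \\ E_{12}^H A_{11}^{-H} \end{pmatrix} + O(\|E_{11}\|\,\|E_{12}\|).$$

Hence from (3.14)

$$B^\dagger = \begin{pmatrix} A_{11}^{-1} - A_{11}^{-1} E_{11} A_{11}^{-1} + O(\|E_{11}\|^2) & (A_{11}^H A_{11})^{-1} E_{21}^H + O(\|E_{11}\|\,\|E_{21}\|) \\ E_{12}^H (A_{11} A_{11}^H)^{-1} + O(\|E_{12}\|\,\|E_{11}\|) & E_{12}^H (A_{11}^H A_{11} A_{11}^H)^{-1} E_{21}^H + O(\|E_{11}\|\,\|E_{12}\|\,\|E_{21}\|) \end{pmatrix}.$$

This expression is in perfect agreement with (3.24) when the $E_{ij}$ are interpreted appropriately as projections of $E$.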

Notes and references. For expository reasons the results of this section have not been presented in the historical order of their development. Penrose (1955) established Corollary 3.5 using techniques that do not give explicit perturbation bounds. The subject was revived by Golub and Wilkinson (1966), whose interest in stable algorithms for solving least squares problems [cf. Golub (1965)] led them to derive first-order perturbation bounds for least squares solutions (more of this later). The first perturbation bounds for the pseudo-inverse itself were given by Ben-Israel (1966), who restricts his class of perturbations so that (in reduced form) only $E_{11}$ is nonzero. More general theorems for acute perturbations were established by Hanson and Lawson (1969), Pereyra (1969), and Stewart (1969). Theorem 3.7 is a refinement and extension of Stewart's bound. An identity in terms of projections related to (3.14) is given by Wedin (1973), who uses it to derive bounds for acute perturbations.

The decompositions (3.3) and (3.4) and the consequent Theorem 3.4 are due to Wedin (1973). Theorem 3.3 is a slight extension of these results. Theorem 3.1 is also due to Wedin (1973), although a slightly restricted form of the result may be found in Stewart (1969). In an earlier report Wedin (1969) considers the sharpness of the constants $\mu$ in Theorem 3.4 and shows that for the spectral norm $\mu$ cannot be made smaller.

Early differentiability results have been given by Pavel-Parvu and Korganoff (1969) and Hearon and Evans (1968). Wedin (1969) derived the formula (3.25) as we did from the decomposition (3.4). The same result for functions of several variables was derived independently by Golub and Pereyra (1973) in connection with separable nonlinear least squares problems. For further references see Golub and Pereyra (1975).
4. Projections. In this section we shall consider how the projection $P_A$ varies with $A$. Since $P_A = AA^\dagger$, it might be thought that the perturbation theory for $P_A$ could be derived from the theory developed in the last section for $A^\dagger$. However this approach gives too much away, and sharper bounds may be obtained by working directly with one of the decompositions of $B^\dagger$. In particular we shall work with the decomposition (3.15) based on the reduced forms of $A$ and $B$.

If $\mathcal{R}(A)$ and $\mathcal{R}(B)$ are not acute, then $\|P_B - P_A\|_2 = 1$. Consequently we can restrict ourselves to the case where $\mathcal{R}(A)$ and $\mathcal{R}(B)$ are acute. More particularly we shall only consider the case where $B$ is an acute perturbation of $A$.
THEOREM 4.1. Let $B$ be an acute perturbation of $A$, and let $\hat\kappa$ be defined as in Theorem 3.8. Then

$$\|P_B - P_A\|_2 \le \frac{\hat\kappa\,\|E_{21}\|_2/\|A\|_2}{\left[1 + (\hat\kappa\,\|E_{21}\|_2/\|A\|_2)^2\right]^{1/2}} < 1. \tag{4.1}$$

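Proof. With $F_{21}$ defined as in the last section we have [cf. (3.15)]

$$\mathcal{R}(B) = \mathcal{R}\begin{pmatrix} I \\ F_{21} \end{pmatrix}.$$

The matrix

$$\begin{pmatrix} I \\ F_{21} \end{pmatrix}(I + F_{21}^HF_{21})^{-1}(I \quad F_{21}^H)$$

is a Hermitian idempotent whose column space is $\mathcal{R}(B)$; hence it is $P_B$. It follows that

$$P_B - P_A = \begin{pmatrix} (I + F_{21}^HF_{21})^{-1} - I & (I + F_{21}^HF_{21})^{-1}F_{21}^H \\ F_{21}(I + F_{21}^HF_{21})^{-1} & F_{21}(I + F_{21}^HF_{21})^{-1}F_{21}^H \end{pmatrix}, \tag{4.2}$$

from which it is easily verified that

$$(P_B - P_A)^2 = \begin{pmatrix} F_{21}^HF_{21}(I + F_{21}^HF_{21})^{-1} & 0 \\ 0 & F_{21}F_{21}^H(I + F_{21}F_{21}^H)^{-1} \end{pmatrix}. \tag{4.3}$$

Now the nonzero singular values of the diagonal blocks in (4.3) are given by

$$\sigma_i^2(F_{21})/[1 + \sigma_i^2(F_{21})],$$

where the $\sigma_i(F_{21})$ are the nonzero singular values of $F_{21}$. The result follows from the fact that the largest singular value $\sigma_1$ of $F_{21}$ satisfies

$$\sigma_1(F_{21}) = \|F_{21}\|_2 \le \hat\kappa\,\|E_{21}\|_2/\|A\|_2. \qquad \square$$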
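As a numerical illustration of (4.1)-(4.3), the following sketch compares $\|P_B - P_A\|_2$ with $\sigma_1/(1+\sigma_1^2)^{1/2}$, where $\sigma_1 = \|F_{21}\|_2$. It is only a sanity check under stated assumptions: $A$ is taken real with full column rank, so that every small perturbation is acute; the reduced form is built from the SVD of $A$; and the function name and parameters are hypothetical. The upper bound uses the elementary estimate $\|F_{21}\|_2 \le \|B_{11}^{-1}\|_2\|E_{21}\|_2$, which coincides with $\hat\kappa\|E_{21}\|_2/\|A\|_2$ if $\hat\kappa = \|A\|_2\|B_{11}^{-1}\|_2$, as Theorem 3.8 is assumed to define it.

```python
import numpy as np

def projection_gap_check(m=7, n=3, eps=1e-2, seed=0):
    """Compare ||P_B - P_A||_2 with the value implied by (4.3) and a (4.1)-type bound."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))            # full column rank with probability one
    B = A + eps * rng.standard_normal((m, n))  # B = A + E

    P_A = A @ np.linalg.pinv(A)
    P_B = B @ np.linalg.pinv(B)

    # Reduced form: U^T A V = [diag(s); 0], so A21 = 0 and E21 = B21.
    U, s, Vh = np.linalg.svd(A)                # full SVD: U is m x m, Vh is n x n
    B_red = U.T @ B @ Vh.T
    B11, B21 = B_red[:n, :], B_red[n:, :]
    B11_inv = np.linalg.inv(B11)
    F21 = B21 @ B11_inv

    sigma1 = np.linalg.norm(F21, 2)
    gap = np.linalg.norm(P_B - P_A, 2)
    exact = sigma1 / np.sqrt(1.0 + sigma1**2)  # square root of the largest eigenvalue in (4.3)

    t = np.linalg.norm(B11_inv, 2) * np.linalg.norm(B21, 2)
    bound = t / np.sqrt(1.0 + t**2)            # right-hand side of (4.1) under the assumption above
    return gap, exact, bound
```

The first two returned values should agree to rounding error, and both should lie below the third, which is itself always less than one.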

In terms of projections, the bound (4.1) can be written in the form

$$\|P_B - P_A\|_2 \le \frac{\hat\kappa\,\|P_A^\perp ER_A\|_2/\|A\|_2}{\left[1 + (\hat\kappa\,\|P_A^\perp ER_A\|_2/\|A\|_2)^2\right]^{1/2}}.$$

The bound is interesting in several ways. First it depends not at all on $E_{12}$ and $E_{22}$. Second its dependence on $E_{11}$ is only through the constant $\hat\kappa$. Third the bound is always less than unity. Finally, it goes to zero along with $E_{21}$. We may summarize this last observation in the following corollary.
COROLLARY 4.2. Regarding $B$ as variable, a sufficient condition for

$$\lim_{B \to A} P_B = P_A$$

is that $A$ and $B$ are acute and

$$\lim_{B \to A} P_A^\perp BR_A = 0.$$

If the hypotheses of Corollary 3.9 are satisfied (i.e., if $\|A^\dagger\|_2\|E_{11}\|_2 < 1$) then we may replace $\hat\kappa$ by $\kappa/\gamma$ in (4.1).

Asymptotic forms and derivatives. Asymptotic forms may be obtained in the usual way from (4.2). Indeed

$$P_B - P_A = \begin{pmatrix} O(\|E_{21}\|^2) & F_{21}^H + O(\|E_{21}\|^3) \\ F_{21} + O(\|E_{21}\|^3) & O(\|E_{21}\|^2) \end{pmatrix}. \tag{4.4}$$

In terms of projections

$$P_B = P_A + P_A^\perp ER_AA^\dagger + (A^\dagger)^HR_AE^HP_A^\perp + O(\|P_A^\perp ER_A\|^2). \tag{4.5}$$


It follows that if $A(\tau)$ is differentiable and varies without changing rank, then $P_{A(\tau)}$ is differentiable and

$$\frac{dP_A}{d\tau} = P_A^\perp\frac{dA}{d\tau}R_AA^\dagger + (A^\dagger)^HR_A\frac{dA^H}{d\tau}P_A^\perp. \tag{4.6}$$
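The derivative formula (4.6) can be checked against a central finite difference. The sketch below is illustrative only and makes several assumptions: the matrices are real, $A(\tau) = A_0 + \tau A_1$ is a hypothetical linear path of full column rank (so the rank does not change near $\tau = 0$), and the simplification $R_AA^\dagger = A^\dagger$ is used, which makes the second term of (4.6) the transpose of the first.

```python
import numpy as np

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.pinv(A)

def dproj_formula(A, dA):
    """Right-hand side of (4.6) for real A, using R_A A^+ = A^+."""
    A_pinv = np.linalg.pinv(A)
    P_perp = np.eye(A.shape[0]) - A @ A_pinv
    T = P_perp @ dA @ A_pinv        # P_A^perp (dA/dtau) A^+
    return T + T.T                  # second term is the transpose of the first

def derivative_check(h=1e-6, seed=1):
    """Norm of the difference between a central difference and (4.6) at tau = 0."""
    rng = np.random.default_rng(seed)
    A0 = rng.standard_normal((6, 3))
    A1 = rng.standard_normal((6, 3))  # dA/dtau along the path A0 + tau*A1
    fd = (proj(A0 + h * A1) - proj(A0 - h * A1)) / (2.0 * h)
    return np.linalg.norm(fd - dproj_formula(A0, A1), 2)
```

For a well-conditioned $A_0$ the returned discrepancy should be at the level of the finite-difference error, far below the size of the derivative itself.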

Notes and references. Theorem 4.1 and its corollary appear to be new. The expression (4.6) for the derivative of $P_A$ was first given by Golub and Pereyra (1973). Per-Ake Wedin has pointed out to the author that the asymptotic form (4.5) and the expression (4.6) can be derived from the identity

$$P_B - P_A = P_B(I - P_A) - (I - P_B)P_A = (B^\dagger)^HR_BE^HP_A^\perp + P_B^\perp ER_AA^\dagger.$$

5. The linear least squares problem. In this section we shall derive perturbation bounds for the least squares problem of minimizing $\|b - Ax\|_2$. Although the solution of minimum norm is given by $x = A^\dagger b$, the perturbation theory of § 3 again does not give the best possible results.

We shall assume throughout this section that $B$ is an acute perturbation of $A$, and we shall work with the reduced form of the problem. In this form $x$ is replaced by $V^Hx$ and $b$ is replaced by $U^Hb$ (cf. § 2). If $x$ and $b$ are partitioned in the forms

$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix},$$

where $x_1, b_1 \in \mathbb{C}^r$, then

$$x_1 = A_{11}^{-1}b_1 \tag{5.1}$$

and

$$x_2 = 0.$$

Moreover the norm of the residual vector

$$r = b - Ax$$

is given by

$$\|r\|_2 = \|b_2\|_2.$$

In the theorems to follow we shall freely use the definitions made in the previous sections (e.g., $\hat\kappa$, $\kappa$ and $\gamma$). As in §§ 3 and 4 the number $\hat\kappa$ may be replaced by $\kappa/\gamma$ whenever $\|A^\dagger\|_2\|E_{11}\|_2 < 1$. One additional piece of notation will be needed; namely, we shall define $\eta$ as that nonnegative constant such that

$$\|b_1\|_2 = \eta\,\|A\|_2\|x\|_2.$$

Since $b_1 = A_{11}x_1$, we have $\eta \le 1$. Also $\|x\|_2 \le \|A^\dagger\|_2\|b_1\|_2$, which shows that

$$\kappa^{-1} \le \eta \le 1.$$

When $A$ is ill-conditioned, that is when $A^\dagger$ is large, the vector $x$ may be either large or small. In the first case $\eta$ is near zero, and we shall say that "$x$ reflects the ill-conditioning of $A$."
We first consider perturbations in the vector $b$.

THEOREM 5.1. Let $x = A^\dagger b$ and $x + h = A^\dagger(b + k)$. Then

$$\frac{\|h\|_2}{\|x\|_2} \le \kappa\eta\,\frac{\|P_Ak\|_2}{\|P_Ab\|_2}. \tag{5.2}$$


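Proof. With the obvious partitioning of $k$ we have $h_1 = A_{11}^{-1}k_1$ and $h_2 = 0$, so that

$$\|h\|_2 \le \|A_{11}^{-1}\|_2\|k_1\|_2. \tag{5.3}$$

But $\|x\|_2 = \eta^{-1}\|b_1\|_2/\|A\|_2$, which combined with (5.3) and the relations $\|k_1\|_2 = \|P_Ak\|_2$, $\|b_1\|_2 = \|P_Ab\|_2$, and $\kappa = \|A\|_2\|A_{11}^{-1}\|_2$ yields (5.2). $\square$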
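The bound (5.2) is easy to test numerically. The following sketch is an illustration only: it assumes real data, takes $\kappa = \|A\|_2\|A^\dagger\|_2$, and computes $\eta$ from $\|P_Ab\|_2 = \eta\|A\|_2\|x\|_2$, which is the reduced-form definition rewritten with projections. The function name and the random test problem are hypothetical.

```python
import numpy as np

def b_perturbation_check(m=8, n=4, seed=2):
    """Return both sides of (5.2) together with kappa and eta for a random problem."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    k = 1e-3 * rng.standard_normal(m)     # perturbation of the right-hand side

    A_pinv = np.linalg.pinv(A)
    P_A = A @ A_pinv
    x = A_pinv @ b
    h = A_pinv @ (b + k) - x

    norm_A = np.linalg.norm(A, 2)
    kappa = norm_A * np.linalg.norm(A_pinv, 2)
    eta = np.linalg.norm(P_A @ b) / (norm_A * np.linalg.norm(x))

    lhs = np.linalg.norm(h) / np.linalg.norm(x)
    rhs = kappa * eta * np.linalg.norm(P_A @ k) / np.linalg.norm(P_A @ b)
    return lhs, rhs, kappa, eta
```

Besides `lhs <= rhs`, the returned `eta` should satisfy $\kappa^{-1} \le \eta \le 1$.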

Theorem 5.1 shows that the perturbation in $x$ is determined by the projection of $k$ onto $\mathcal{R}(A)$. However, $P_Ak$ is normalized by $\|P_Ab\|_2$, and if this latter quantity is small, the perturbation may be large. Since

$$\|b\|_2^2 = \|P_Ab\|_2^2 + \|r\|_2^2,$$

this observation may be summarized by saying that large residuals are troublesome, a statement which will be amply supported later.

Since $\eta$ can be as small as $\kappa^{-1}$, the number $\kappa$ cannot be taken as a condition number for perturbations in $b$ without further qualification. If $x$ does not reflect the ill-conditioning of $A$, then $\eta$ is near unity and $\kappa$ is a condition number. Otherwise the solution will be relatively insensitive to perturbations in $b$.

We next turn to assessing the effects on $x$ of a perturbation in $A$.

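THEOREM 5.2. Let $x = A^\dagger b$ and $x + h = B^\dagger b$, where $B = A + E$ is an acute perturbation of $A$. Then

$$\frac{\|h\|_2}{\|x\|_2} \le \hat\kappa\,\frac{\|E_{11}\|_2}{\|A\|_2} + \frac{\hat\kappa\,\|E_{12}\|_2/\|A\|_2}{\left[1 + (\hat\kappa\,\|E_{12}\|_2/\|A\|_2)^2\right]^{1/2}} + \hat\kappa^2\,\frac{\|E_{21}\|_2}{\|A\|_2}\left(\eta\,\frac{\|b_2\|_2}{\|b_1\|_2} + \hat\kappa\,\frac{\|E_{21}\|_2}{\|A\|_2}\right). \tag{5.4}$$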

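Proof. Write

$$h = J_{12}^\dagger(B_{11}^{-1} - A_{11}^{-1})b_1 + (J_{12}^\dagger - I_{12}^\dagger)A_{11}^{-1}b_1 + J_{12}^\dagger B_{11}^{-1}(J_{21}^\dagger - I_{21}^\dagger)b. \tag{5.5}$$

Then

$$\|J_{12}^\dagger(B_{11}^{-1} - A_{11}^{-1})b_1\|_2 \le \hat\kappa\,\frac{\|E_{11}\|_2}{\|A\|_2}\,\|x\|_2, \tag{5.6}$$

and

$$\|(J_{12}^\dagger - I_{12}^\dagger)A_{11}^{-1}b_1\|_2 \le \frac{\hat\kappa\,\|E_{12}\|_2/\|A\|_2}{\left[1 + (\hat\kappa\,\|E_{12}\|_2/\|A\|_2)^2\right]^{1/2}}\,\|x\|_2. \tag{5.7}$$

Now

$$J_{12}^\dagger B_{11}^{-1}(J_{21}^\dagger - I_{21}^\dagger)b = J_{12}^\dagger B_{11}^{-1}[(I + F_{21}^HF_{21})^{-1} - I]b_1 + J_{12}^\dagger B_{11}^{-1}(I + F_{21}^HF_{21})^{-1}F_{21}^Hb_2. \tag{5.8}$$

To bound the first term in (5.8), note that

$$(I + F_{21}^HF_{21})^{-1} - I = -(I + F_{21}^HF_{21})^{-1}F_{21}^HF_{21}.$$

Hence

$$\begin{aligned}
\|J_{12}^\dagger B_{11}^{-1}[(I + F_{21}^HF_{21})^{-1} - I]b_1\|_2
&\le \|B_{11}^{-1}\|_2\,\|(I + F_{21}^HF_{21})^{-1}\|_2\,\|F_{21}^H\|_2\,\|F_{21}b_1\|_2 \\
&\le \|B_{11}^{-1}\|_2^3\,\|E_{21}\|_2^2\,\|A\|_2\|x\|_2 \\
&= \left[\hat\kappa\,\frac{\|E_{21}\|_2}{\|A\|_2}\right]^2\hat\kappa\,\|x\|_2.
\end{aligned} \tag{5.9}$$

For the second term in (5.8) we have

$$\begin{aligned}
\|J_{12}^\dagger B_{11}^{-1}(I + F_{21}^HF_{21})^{-1}F_{21}^Hb_2\|_2
&\le \|B_{11}^{-1}\|_2\,\|F_{21}\|_2\,\|b_2\|_2 \\
&\le \|B_{11}^{-1}\|_2^2\,\|E_{21}\|_2\,\|b_2\|_2 \\
&= \hat\kappa^2\eta\,\frac{\|E_{21}\|_2}{\|A\|_2}\,\frac{\|b_2\|_2}{\|b_1\|_2}\,\|x\|_2.
\end{aligned} \tag{5.10}$$

The bound (5.4) follows on combining (5.5)-(5.10). $\square$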
The first two terms in (5.4) are unexceptionable. The first term corresponds to the classical result for linear systems and is the only nonzero term when $A$ is square and nonsingular. The second term depends on $P_AER_A^\perp$ and vanishes when $A$ is of full column rank, as it is in many applications.

The third term requires more explanation. If terms of second order in $\|E_{21}\|$ are ignored, this expression becomes essentially

$$\hat\kappa^2\eta\,\frac{\|E_{21}\|_2}{\|A\|_2}\,\frac{\|b_2\|_2}{\|b_1\|_2} = \hat\kappa^2\eta\tan\theta\,\frac{\|E_{21}\|_2}{\|A\|_2}, \tag{5.11}$$

where $\theta$ is the angle subtended by $b$ and $\mathcal{R}(A)$. The number $\hat\kappa^2\eta\tan\theta$ can vary from $0$ to $\infty$. It is small when $\theta$ is small (i.e. the residual vector is small). It is also reduced in size when $\|E_{11}\|_2$ is small and $x$ reflects the ill-conditioning of $A$ so that $\hat\kappa\eta \cong 1$. When $x$ does not reflect the ill-conditioning of $A$ and $\theta$ is significant, it is of order $\kappa^2$, thus making the third term in (5.4) the dominant one.

We have bounded the third term in the decomposition (5.5) in such a way as to reflect its behavior when $E_{21}$ is small. In fact it is bounded for all values of $E_{21}$, and the third term in (5.4) may be replaced by
-||b||2

Kl

11

E21\
+

IA1

The residual. Since the residual vector is given by $r = P_A^\perp b$, the theory of § 4 may be applied to give perturbation bounds for the residual. Specifically, if

$$\hat x = B^\dagger b$$

and

$$\hat r = b - B\hat x = P_B^\perp b,$$

then

$$\|\hat r - r\|_2 \le \|P_B - P_A\|_2\|b\|_2,$$

and $\|P_B - P_A\|_2$ can be bounded by (4.1) in Theorem 4.1.

In applications one may not be interested in $\hat r$; rather one is interested in the residual $\tilde r$ of $\hat x$ with respect to the matrix $A$:

$$\tilde r = b - A\hat x.$$

If we write

$$\tilde r - r = (P_B^\perp - P_A^\perp)b + E\hat x,$$

then

$$\|\tilde r - r\|_2 \le \|P_B - P_A\|_2\|b\|_2 + \|E\|_2\|\hat x\|_2.$$

Theorem 5.2 provides the necessary estimate of $\|\hat x\|_2$.

If we concern ourselves with only the change in $\|r\|_2$ we can derive a slightly stronger result. Since $r$ is the minimizing residual, we have $\|r\|_2 \le \|\tilde r\|_2$. Likewise $\|b - (A + E)\hat x\|_2 \le \|b - (A + E)x\|_2$, from which it follows that

$$\|r\|_2 \le \|\tilde r\|_2 \le \|r\|_2 + \|E\|_2(\|x\|_2 + \|\hat x\|_2).$$
Asymptotic forms and derivatives. An asymptotic form for the perturbed least squares solution $\hat x$ can be obtained from (3.4):

$$\hat x = x - A^\dagger P_AER_Ax + R_A^\perp E^HP_A(A^H)^\dagger x + (A^HA)^\dagger R_AE^HP_A^\perp b + O(\|E\|^2). \tag{5.12}$$

An equivalent asymptotic formula, which may be useful in computational work, can be derived from the reduced form (3.23). The derivative formula corresponding to (5.12) is

$$\frac{dx}{d\tau} = -A^\dagger P_A\frac{dA}{d\tau}R_Ax + R_A^\perp\frac{dA^H}{d\tau}P_A(A^H)^\dagger x + (A^HA)^\dagger R_A\frac{dA^H}{d\tau}P_A^\perp b.$$
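A first-order formula such as (5.12) can be tested by checking that its error shrinks quadratically with the perturbation. The sketch below does this for a rank-deficient real matrix, so that all three correction terms are active. Everything in it is an assumption made for illustration: $A = GH$ is a hypothetical rank-2 factorization, the perturbation $B = (G + \varepsilon\,dG)(H + \varepsilon\,dH)$ keeps the rank fixed (and is therefore acute for small $\varepsilon$), the signs follow the standard derivative of the pseudo-inverse, $(A^HA)^\dagger$ is formed as $A^\dagger(A^\dagger)^H$, and an explicit `rcond` is passed to `numpy.linalg.pinv` so that rounding noise is not mistaken for extra rank.

```python
import numpy as np

def pinv_r(M):
    """Pseudo-inverse with an explicit cutoff so rounding noise does not raise the rank."""
    return np.linalg.pinv(M, rcond=1e-10)

def first_order_solution(A, b, E):
    """First-order approximation to pinv(A + E) @ b in the form of (5.12), real case."""
    m, n = A.shape
    A_pinv = pinv_r(A)
    P_A = A @ A_pinv
    R_A = A_pinv @ A
    x = A_pinv @ b
    t1 = -A_pinv @ P_A @ E @ R_A @ x
    t2 = (np.eye(n) - R_A) @ E.T @ P_A @ A_pinv.T @ x                   # (A^H)^+ = (A^+)^H
    t3 = (A_pinv @ A_pinv.T) @ R_A @ E.T @ (np.eye(m) - P_A) @ b        # (A^H A)^+ = A^+ (A^+)^H
    return x + t1 + t2 + t3

def first_order_error(eps, seed=3):
    """Distance between the exact perturbed solution and the first-order approximation."""
    rng = np.random.default_rng(seed)
    G, H = rng.standard_normal((6, 2)), rng.standard_normal((2, 4))
    dG, dH = rng.standard_normal((6, 2)), rng.standard_normal((2, 4))
    b = rng.standard_normal(6)
    A = G @ H                                  # rank 2
    B = (G + eps * dG) @ (H + eps * dH)        # rank 2 as well
    x_hat = pinv_r(B) @ b
    return np.linalg.norm(x_hat - first_order_solution(A, b, B - A))
```

Reducing `eps` by a factor of ten should reduce the returned error by roughly a factor of one hundred, which is the behavior expected of an $O(\|E\|^2)$ remainder.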

An inverse perturbation theorem. Theorem 5.2 shows how a perturbation in $A$ can affect the least squares solution. Here we consider the question: given a vector $\hat x$, under what conditions is $\hat x$ the least squares solution of a slightly perturbed problem? One such condition is given in the following theorem.

THEOREM 5.3. Let $\hat x \in \mathbb{C}^n$ be given. Let $x = A^\dagger b$, $r = b - Ax$, and $\hat r = b - A\hat x$. If

$$\|\hat r\|_2^2 = \|r\|_2^2 + \varepsilon^2,$$

then there is a matrix $E$ of rank unity with

$$\|E\|_2 = \varepsilon/\|\hat x\|_2 \tag{5.13}$$

such that $\|b - (A + E)\hat x\|_2$ is a minimum.

Proof. Let

$$e = \hat r - r = A(x - \hat x) \in \mathcal{R}(A).$$

Since $r \in \mathcal{R}(A)^\perp$,

$$\|\hat r\|_2^2 = \|r\|_2^2 + \|e\|_2^2,$$

which shows that $\|e\|_2 = \varepsilon$. Let

$$E = e\hat x^H/\|\hat x\|_2^2.$$

Then $E$ satisfies (5.13) and $\mathcal{R}(E) \subset \mathcal{R}(A)$. Hence $\mathcal{R}(A + E) \subset \mathcal{R}(A)$. But

$$b - (A + E)\hat x = r \in \mathcal{R}(A)^\perp,$$

which shows that the residual $b - (A + E)\hat x \in \mathcal{R}(A + E)^\perp$, and $\hat x$ solves the required least squares problem. $\square$
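The construction in the proof is explicit enough to be carried out directly. The following sketch builds $E = e\hat x^H/\|\hat x\|_2^2$ for real data and checks the three claims: the normal equations of the perturbed problem hold at $\hat x$, the norm of $E$ equals $\varepsilon/\|\hat x\|_2$, and $E$ has rank one. The function names and the random test problem are hypothetical.

```python
import numpy as np

def inverse_perturbation(A, b, x_hat):
    """Rank-one E such that x_hat minimizes ||b - (A + E) x||_2 (real case)."""
    x = np.linalg.pinv(A) @ b
    e = A @ (x - x_hat)                        # e = r_hat - r, a vector in R(A)
    return np.outer(e, x_hat) / np.dot(x_hat, x_hat)

def inverse_perturbation_check(seed=4):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((7, 3))
    b = rng.standard_normal(7)
    x = np.linalg.pinv(A) @ b
    x_hat = x + 1e-2 * rng.standard_normal(3)  # an inexact "solution"

    E = inverse_perturbation(A, b, x_hat)
    r = b - A @ x
    r_hat = b - A @ x_hat
    eps = np.sqrt(r_hat @ r_hat - r @ r)       # epsilon of Theorem 5.3

    # x_hat is a least squares solution for A + E iff the residual is orthogonal to R(A + E).
    normal_eq = np.linalg.norm((A + E).T @ (b - (A + E) @ x_hat))
    norm_gap = abs(np.linalg.norm(E, 2) - eps / np.linalg.norm(x_hat))
    return normal_eq, norm_gap, np.linalg.matrix_rank(E)
```

Both returned discrepancies should be at rounding level, and the reported rank should be one.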


A consequence of this theorem is that there is little use hunting for the exact minimizing $x$. Provided the residual is nearly minimal, the approximate solution $\hat x$, however inaccurate, is the exact solution of a slightly perturbed problem.

It is sometimes desirable that the perturbation matrix $E$ in Theorem 5.3 not alter some of the columns of $A$ (e.g. a column may be dates in years). This can be done as follows. Let $\tilde x$ be the vector obtained from $\hat x$ by setting to zero the components corresponding to the columns that are not to be disturbed. Then

$$\tilde E = e\tilde x^H/\|\tilde x\|_2^2$$

is the required matrix. Of course $\|\tilde x\|_2 \le \|\hat x\|_2$ so that $\|\tilde E\|_2 \ge \|E\|_2$; however $\|\tilde E\|_2$ may still be small enough for practical purposes.

Notes and references. Much of the perturbation theory for pseudo-inverses has been a byproduct of the search for bounds for the linear least squares problem. Golub and Wilkinson (1966) gave a first order analysis of the problem and were the first to note the dependence of the solution on $\kappa^2$. Rigorous upper bounds were derived by Hanson and Lawson (1969), Pereyra (1969), and Stewart (1969). Wedin (1969) also gives bounds. More recent treatments have been given by Lawson and Hanson (1974) and Abdelmalek (1974). Van der Sluis (1975) was the first to point out the mitigating effect of $\eta$ in (5.11). The inverse perturbation theorem is new.
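Appendix. In this Appendix, we shall give a proof of part one of Theorem 2.3 that is based on a general decomposition of unitary matrices, a decomposition that is of independent interest. In establishing the decomposition, we shall use the notation in which a matrix $A$ is bordered by its dimensions $m$ and $n$ to mean $A \in \mathbb{C}^{m \times n}$.

THEOREM A.1. Let the unitary matrix $W \in \mathbb{C}^{n \times n}$ be partitioned in the form

$$W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix},$$

with row and column blocks of dimensions $r$ and $n - r$, where $W_{11} \in \mathbb{C}^{r \times r}$ with $r \le n/2$. Then there are unitary matrices $U = \operatorname{diag}(U_1, U_2)$ and $V = \operatorname{diag}(V_1, V_2)$, with diagonal blocks of orders $r$ and $n - r$, such that

$$U^HWV = \begin{pmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{pmatrix}, \tag{A.1}$$

with row and column blocks of dimensions $r$, $r$, and $n - 2r$, where

$$\Gamma = \operatorname{diag}(\gamma_1, \gamma_2, \ldots, \gamma_r) \ge 0$$

and

$$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r) \ge 0.$$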

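Proof. Let

$$\Gamma = U_1^HW_{11}V_1$$

be the singular value decomposition of $W_{11}$, with the diagonal elements of $\Gamma$ ordered so that $\gamma_1 \le \gamma_2 \le \cdots \le \gamma_k < 1 = \gamma_{k+1} = \cdots = \gamma_r$; i.e.,

$$\Gamma = \operatorname{diag}(\Gamma', I_{r-k}),$$

where the diagonal elements of $\Gamma'$ are less than unity. The matrix

$$\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix}V_1$$

has orthonormal columns. Hence

$$I = \left[\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix}V_1\right]^H\left[\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix}V_1\right] = \Gamma^2 + (W_{21}V_1)^H(W_{21}V_1).$$

Since $I$ and $\Gamma^2$ are diagonal matrices, so is $(W_{21}V_1)^H(W_{21}V_1)$, which says that the columns of $W_{21}V_1$ are orthogonal. Since the $i$th diagonal entry of $I - \Gamma^2$ is the squared norm of the $i$th column of $W_{21}V_1$, only the first $k \le r \le n - r$ columns of $W_{21}V_1$ are nonzero. Let $U_2 \in \mathbb{C}^{(n-r) \times (n-r)}$ be any unitary matrix whose first $k$ columns are the normalized columns of $W_{21}V_1$. Then

$$U_2^HW_{21}V_1 = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix},$$

with row blocks of dimensions $r$ and $n - 2r$, where

$$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0) \equiv \operatorname{diag}(\Sigma', 0_{r-k}).$$

Since

$$\operatorname{diag}(U_1, U_2)^H\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix}V_1 = \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix}$$

has orthonormal columns, we must have

$$\gamma_i^2 + \sigma_i^2 = 1 \qquad (i = 1, 2, \ldots, r). \tag{A.2}$$

In particular, $\Sigma'$ is nonsingular.

In a like manner we may determine a unitary matrix $V_2 \in \mathbb{C}^{(n-r) \times (n-r)}$ such that

$$U_1^HW_{12}V_2 = (T, 0),$$

where $T = \operatorname{diag}(\tau_1, \tau_2, \ldots, \tau_r)$ and $\tau_i \le 0$ $(i = 1, 2, \ldots, r)$. Since, as above, $\gamma_i^2 + \tau_i^2 = 1$, it follows from (A.2) that $T = -\Sigma$.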
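Set $U = \operatorname{diag}(U_1, U_2)$ and $V = \operatorname{diag}(V_1, V_2)$. Then the foregoing shows that the matrix

$$X = U^HWV$$

can be partitioned in the form

$$X = \begin{pmatrix}
\Gamma' & 0 & -\Sigma' & 0 & 0 \\
0 & I & 0 & 0 & 0 \\
\Sigma' & 0 & X_{33} & X_{34} & X_{35} \\
0 & 0 & X_{43} & X_{44} & X_{45} \\
0 & 0 & X_{53} & X_{54} & X_{55}
\end{pmatrix}, \tag{A.3}$$

with row and column blocks of dimensions $k$, $r - k$, $k$, $r - k$, and $n - 2r$. Since columns 1 and 4 in the partition (A.3) are orthogonal, we have

$$\Sigma'X_{34} = 0,$$

and since $\Sigma'$ is nonsingular, we have $X_{34} = 0$. Likewise $X_{35}$, $X_{43}$, and $X_{53}$ are zero.

From the orthogonality of columns 1 and 3 in (A.3) it follows that

$$-\Gamma'\Sigma' + \Sigma'X_{33} = 0,$$

from which it follows that $X_{33} = \Gamma'$.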
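The matrix $X$ is thus seen to have the form

$$X = \begin{pmatrix}
\Gamma' & 0 & -\Sigma' & 0 & 0 \\
0 & I & 0 & 0 & 0 \\
\Sigma' & 0 & \Gamma' & 0 & 0 \\
0 & 0 & 0 & X_{44} & X_{45} \\
0 & 0 & 0 & X_{54} & X_{55}
\end{pmatrix},$$

with row and column blocks of dimensions $k$, $r - k$, $k$, $r - k$, and $n - 2r$. The matrix

$$U_3 = \begin{pmatrix} X_{44} & X_{45} \\ X_{54} & X_{55} \end{pmatrix}$$

is unitary. Set

$$\hat U_2 = U_2\operatorname{diag}(I_k, U_3)$$

and $\hat U = \operatorname{diag}(U_1, \hat U_2)$. Then

$$\hat U^HWV = \operatorname{diag}(I_{r+k}, U_3^H)X = \begin{pmatrix}
\Gamma' & 0 & -\Sigma' & 0 & 0 \\
0 & I & 0 & 0 & 0 \\
\Sigma' & 0 & \Gamma' & 0 & 0 \\
0 & 0 & 0 & I & 0 \\
0 & 0 & 0 & 0 & I
\end{pmatrix},$$

which, considering the dimensions of the matrices involved, is precisely (A.1). $\square$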

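To establish part one of Theorem 2.3, let $\mathcal{X} = \mathcal{R}(A)$ and $\mathcal{Y} = \mathcal{R}(B)$ and let $\mathcal{X}^\perp$ and $\mathcal{Y}^\perp$ denote their orthogonal complements. Assume that

$$r = \dim(\mathcal{X}) = \dim(\mathcal{Y}) \le m/2$$

(there is no loss of generality in assuming the last inequality, since in the sequel we can also work with $\mathcal{X}^\perp$ and $\mathcal{Y}^\perp$). Let $X = (X_1, X_2)$ and $Y = (Y_1, Y_2)$ be unitary matrices with $\mathcal{R}(X_1) = \mathcal{X}$ and $\mathcal{R}(Y_1) = \mathcal{Y}$. Let

$$W = X^HY = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}$$

be partitioned conformally with $X$ and $Y$. If $U = \operatorname{diag}(U_1, U_2)$ and $V = \operatorname{diag}(V_1, V_2)$ are the matrices whose existence is insured by Theorem A.1 and we set

$$\hat X_i = X_iU_i \quad (i = 1, 2), \qquad \hat X = (\hat X_1, \hat X_2)$$

and

$$\hat Y_i = Y_iV_i \quad (i = 1, 2), \qquad \hat Y = (\hat Y_1, \hat Y_2),$$

then

$$\hat X_1^H\hat Y_1 = \Gamma, \qquad \hat X_1^H\hat Y_2 = (-\Sigma, 0), \qquad \hat X_2^H\hat Y_1 = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix}, \qquad \hat X_2^H\hat Y_2 = \begin{pmatrix} \Gamma & 0 \\ 0 & I_{n-2r} \end{pmatrix}.$$

Note that $\mathcal{R}(\hat X_1) = \mathcal{X}$ and $\mathcal{R}(\hat Y_1) = \mathcal{Y}$.

If we now make the transformation $\mathbb{C}^n \to \hat X^H\mathbb{C}^n$, the bases $\hat X$ and $\hat Y$ become

$$\hat X_1 = \begin{pmatrix} I_r \\ 0 \\ 0 \end{pmatrix}, \quad \hat X_2 = \begin{pmatrix} 0 & 0 \\ I_r & 0 \\ 0 & I_{n-2r} \end{pmatrix}, \quad \hat Y_1 = \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix}, \quad \hat Y_2 = \begin{pmatrix} -\Sigma & 0 \\ \Gamma & 0 \\ 0 & I_{n-2r} \end{pmatrix}, \tag{A.4}$$

and it is with these bases that we shall prove the first part of Theorem 2.3. First note that

$$P_AP_B^\perp = (\hat X_1\hat X_1^H)(\hat Y_2\hat Y_2^H) = \begin{pmatrix} \Sigma^2 & -\Sigma\Gamma & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

Likewise

$$P_BP_A^\perp = (\hat Y_1\hat Y_1^H)(\hat X_2\hat X_2^H) = \begin{pmatrix} 0 & \Gamma\Sigma & 0 \\ 0 & \Sigma^2 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

and the nonzero singular values of both matrices are easily seen to be the nonzero numbers $\sigma_i$. Now consider

$$P_B - P_A = \hat Y_1\hat Y_1^H - \hat X_1\hat X_1^H = \begin{pmatrix} -\Sigma^2 & \Gamma\Sigma & 0 \\ \Gamma\Sigma & \Sigma^2 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

The nonzero eigenvalues of this matrix are the eigenvalues of the $2 \times 2$ matrices

$$\begin{pmatrix} -\sigma_i^2 & \gamma_i\sigma_i \\ \gamma_i\sigma_i & \sigma_i^2 \end{pmatrix},$$

which are easily seen to be $\pm\sigma_i$.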

Notes and references. The matrix decomposition in Theorem A.1 has apparently not been explicitly stated before; however, it is implicit in the works of Davis and Kahan (1970) and Bjork and Golub (1973). The diagonal elements of $\Gamma$ are the cosines of the "canonical angles" between the subspaces $\mathcal{R}(A)$ and $\mathcal{R}(B)$ and the columns of $\hat X_1$ and $\hat Y_1$ form biorthogonal bases subtending these angles. The use of these canonical bases, particularly when they have been transformed into the forms (A.4), often enables one to obtain routine computational proofs of geometric theorems that would otherwise require considerable ingenuity to establish.
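The statements about canonical angles can be illustrated numerically. The sketch below is a hypothetical example and not a construction from the paper: it draws two random $r$-dimensional subspaces of $\mathbb{R}^m$ with $r \le m/2$, obtains the cosines $\gamma_i$ as the singular values of $X_1^HY_1$ for orthonormal bases $X_1$, $Y_1$ computed by QR, sets $\sigma_i = (1 - \gamma_i^2)^{1/2}$, and returns the eigenvalues of $P_B - P_A$ and the singular values of $P_AP_B^\perp$ for comparison.

```python
import numpy as np

def canonical_angle_check(m=8, r=3, seed=5):
    """Return the sines of the canonical angles, eig(P_B - P_A), and svd(P_A P_B^perp)."""
    rng = np.random.default_rng(seed)
    X1, _ = np.linalg.qr(rng.standard_normal((m, r)))   # orthonormal basis of the first subspace
    Y1, _ = np.linalg.qr(rng.standard_normal((m, r)))   # orthonormal basis of the second subspace

    gamma = np.linalg.svd(X1.T @ Y1, compute_uv=False)  # cosines, in descending order
    sigma = np.sqrt(np.clip(1.0 - gamma**2, 0.0, None)) # sines, in ascending order

    P_A = X1 @ X1.T
    P_B = Y1 @ Y1.T
    eigs = np.linalg.eigvalsh(P_B - P_A)                # ascending order
    svals = np.linalg.svd(P_A @ (np.eye(m) - P_B), compute_uv=False)
    return sigma, eigs, svals
```

The eigenvalues of $P_B - P_A$ should be $\pm\sigma_i$ together with $m - 2r$ zeros, and the $r$ largest singular values of $P_AP_B^\perp$ should be the $\sigma_i$. Computing sines as $(1 - \gamma_i^2)^{1/2}$ loses accuracy for very small angles, so this is adequate only as an illustration.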
REFERENCES
N. N. ABDELMALEK (1974), On the solution of the linear least squares problem and pseudo-inverses, Computing, 13, pp. 215-228.
S. N. AFRIAT (1957), Orthogonal and oblique projectors and the characteristics of pairs of vector spaces, Proc. Cambridge Philos. Soc., 53, pp. 800-816.
A. BEN-ISRAEL (1966), On error bounds for generalized inverses, SIAM J. Numer. Anal., 3, pp. 585-592.
A. BEN-ISRAEL AND T. N. E. GREVILLE (1974), Generalized Inverses: Theory and Applications, John Wiley, New York.
A. BJORK AND G. H. GOLUB (1973), Numerical methods for computing angles between linear subspaces, Math. Comp., 27, pp. 579-594.
T. L. BOULLION AND P. L. ODELL (1971), Generalized Inverse Matrices, John Wiley, New York.
CHANDLER DAVIS AND W. M. KAHAN (1970), The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal., 7, pp. 1-46.
I. C. GOHBERG AND M. G. KREIN (1969), Introduction to the Theory of Nonself-adjoint Operators, American Mathematical Society, Providence, R.I.
G. H. GOLUB (1965), Numerical methods for solving linear least squares problems, Numer. Math., 7, pp. 206-216.
G. H. GOLUB AND J. H. WILKINSON (1966), Note on the iterative refinement of least squares solution, Numer. Math., 9, pp. 139-148.
G. H. GOLUB AND V. PEREYRA (1973), The differentiation of pseudoinverses and nonlinear least squares problems whose variables separate, SIAM J. Numer. Anal., 10, pp. 413-432.
G. H. GOLUB AND V. PEREYRA (1975), Differentiation of pseudoinverses, separable nonlinear least squares problems, and other tales, manuscript.
R. J. HANSON AND C. L. LAWSON (1969), Extensions and applications of the Householder algorithm for solving linear least squares problems, Math. Comp., 23, pp. 787-812.
J. Z. HEARON AND J. W. EVANS (1968), Differentiable generalized inverses, J. Res. Nat. Bur. Stand., Sect. B, 72B, pp. 109-113.
A. S. HOUSEHOLDER (1964), The Theory of Matrices in Numerical Analysis, Dover, New York.
T. KATO (1966), Perturbation Theory for Linear Operators, Springer-Verlag, Berlin.
C. L. LAWSON AND R. J. HANSON (1974), Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, N.J.
L. MIRSKY (1960), Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford Ser., 11, no. 2, pp. 55-59.
J. VON NEUMANN (1937), Some matrix-inequalities and metrization of matric-space, Tomsk. Univ. Rev., 1, pp. 286-300.
M. PAVEL-PARVU AND A. KORGANOFF (1969), Iteration functions for solving polynomial equations, Constructive Aspects of the Fundamental Theorem of Algebra, B. Dejon and P. Henrici, eds., John Wiley, New York.
R. PENROSE (1955), A generalized inverse for matrices, Proc. Cambridge Philos. Soc., 51, pp. 506-513.
R. PENROSE (1956), On best approximate solution of linear matrix equations, Ibid., 52, pp. 17-19.
V. PEREYRA (1969), Stability of general systems of linear equations, Aequat. Math., 2, pp. 194-206.
C. R. RAO AND S. K. MITRA (1971), Generalized Inverse of Matrices and Its Applications, John Wiley, New York.
A. VAN DER SLUIS (1975), Stability of the solutions of linear least squares problems, Numer. Math., 23, pp. 241-254.
G. W. STEWART (1969), On the continuity of the generalized inverse, SIAM J. Appl. Math., 17, pp. 33-45.
G. W. STEWART (1973), Introduction to Matrix Computations, Academic Press, New York.
P.-A. WEDIN (1969), On pseudo-inverses of perturbed matrices, Lund Univ. Comput. Sci. Tech. Rep., Lund, Sweden.
P.-A. WEDIN (1973), Perturbation theory for pseudo-inverses, BIT, 13, pp. 217-232.
J. H. WILKINSON (1965), The Algebraic Eigenvalue Problem, Oxford University Press, London.
