On The Perturbation of The Pseudoinverse

On the Perturbation of Pseudo-Inverses, Projections and Linear Least Squares Problems
Author(s): G. W. Stewart
Source: SIAM Review, Vol. 19, No. 4 (Oct., 1977), pp. 634-662
Published by: Society for Industrial and Applied Mathematics
Stable URL: http://www.jstor.org/stable/2030248 .
ON THE PERTURBATION OF PSEUDO-INVERSES, PROJECTIONS AND LINEAR LEAST SQUARES PROBLEMS*
G. W. STEWART†
Abstract. This paper surveys perturbation theory for the pseudo-inverse (Moore-Penrose generalized inverse), for the orthogonal projection onto the column space of a matrix, and for the linear least squares problem.
1. Introduction. The pseudo-inverse (or Moore-Penrose generalized inverse) of a matrix $A$ may be defined as the unique matrix $A^\dagger$ satisfying the following conditions [due to Penrose (1955)]:

$$A^\dagger A A^\dagger = A^\dagger, \tag{1.1a}$$
$$A A^\dagger A = A, \tag{1.1b}$$
$$(A A^\dagger)^H = A A^\dagger, \tag{1.1c}$$
$$(A^\dagger A)^H = A^\dagger A. \tag{1.1d}$$

The pseudo-inverse and its generalizations have been extensively investigated and widely applied. One reason for this interest in the pseudo-inverse is that it permits the succinct expression of some important geometric constructions in n-dimensional space. This paper will be concerned with the pseudo-inverse and two related geometric constructions: the orthogonal projection onto a subspace and the linear least squares problem.

The orthogonal projection onto a subspace $\mathcal{X}$ is the unique Hermitian, idempotent matrix $P$ whose column space [denoted by $R(P)$] is $\mathcal{X}$. It follows from (1.1c) that the matrix

$$P_A = A A^\dagger$$

is Hermitian and from (1.1b) that $P_A$ is idempotent and $R(P_A) = R(A)$. Hence $P_A$ is the orthogonal projection onto $R(A)$. A similar argument shows that

$$R_A = A^\dagger A \tag{1.2}$$

is the projection onto $R(A^H)$, the row space of $A$.

The second construction is the solution of the linear least squares problem of choosing a vector $x$ to minimize

$$\rho(x) = \|b - Ax\|_2, \tag{1.3}$$

where $b$ is a fixed vector and $\|\cdot\|_2$ denotes the usual Euclidean norm. The solutions of this problem are given by

$$x = A^\dagger b + (I - R_A)z, \tag{1.4}$$
* Received by the editors August 18, 1975, and in revised form February 15, 1976.
† Computer Science Department, University of Maryland, College Park, Maryland 20742. This work was supported in part by the Office of Naval Research.
where $z$ is arbitrary. When $A$ has full column rank, $R_A = I$ and the solution $x = A^\dagger b$ is unique. Otherwise, it is easily verified from (1.1) and (1.2) that $A^\dagger b$ is orthogonal to $(I - R_A)z$, so that by the Pythagorean theorem

$$\|x\|_2^2 = \|A^\dagger b\|_2^2 + \|(I - R_A)z\|_2^2.$$

It follows that $x = A^\dagger b$ is the unique solution of (1.3) that has minimal norm.
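The constructions above can be checked numerically. The following is a minimal NumPy sketch, not taken from the paper: the rank-2 test matrix, the helper name `penrose_residuals`, and the cutoff `rcond=1e-10` are all illustrative assumptions. It evaluates the four Penrose conditions, forms $P_A$ and $R_A$, and compares the minimal-norm solution $A^\dagger b$ with another least squares solution of the form (1.4).

```python
import numpy as np

rng = np.random.default_rng(0)
# Rank-2 matrix in R^{5x4}, built as a product so that R_A differs from I.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(5)
# rcond is an assumed cutoff that discards rounding-level singular values.
A_pinv = np.linalg.pinv(A, rcond=1e-10)

def penrose_residuals(A, X):
    """Largest entrywise violation of each of the conditions (1.1a)-(1.1d)."""
    return [
        np.abs(X @ A @ X - X).max(),
        np.abs(A @ X @ A - A).max(),
        np.abs((A @ X).conj().T - A @ X).max(),
        np.abs((X @ A).conj().T - X @ A).max(),
    ]

P_A = A @ A_pinv                 # orthogonal projection onto R(A)
R_A = A_pinv @ A                 # orthogonal projection onto R(A^H), cf. (1.2)

x_min = A_pinv @ b               # minimal-norm least squares solution
z = rng.standard_normal(4)
null_part = (np.eye(4) - R_A) @ z
x_other = x_min + null_part      # another solution of the form (1.4)

res_min = np.linalg.norm(b - A @ x_min)
res_other = np.linalg.norm(b - A @ x_other)
```

Both solutions should give the same residual, while the squared norm of `x_other` should split as in the Pythagorean identity above.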
The object of this paper is to describe the effects of perturbations in $A$ on $A^\dagger$, on $P_A$, and on $A^\dagger b$; i.e., on the pseudo-inverse, on the projection onto $R(A)$, and on the solution of the linear least squares problem. Such descriptions are important for three reasons. First, the results are useful mathematical tools. Second, in numerical applications the elements of $A$ will seldom be known exactly, and it is necessary to have bounds on the effects of the uncertainties in $A$. Finally, many numerical processes for computing projections and least squares solutions behave as if exact computations had been performed on a perturbed matrix $A + E$, where $E$ is a small matrix whose size depends on the algorithm and the arithmetic used in its execution.

We shall be concerned with three kinds of results: perturbation bounds, asymptotic expressions, and derivatives. The perturbation bounds are needed in the applications mentioned above. Asymptotic expressions and derivatives are useful computational tools when the perturbation is actually known. Moreover they can be used to check the sharpness of the perturbation bounds. Not surprisingly it is rather difficult to obtain a reasonably sharp perturbation bound that tells the complete story of the effects of the perturbations. Asymptotic forms and derivatives are easier to come by.

In order to make this survey reasonably self-contained, we begin in § 2 with a review of the necessary background. In § 3 we develop the perturbation theory for the pseudo-inverse, in § 4 for the projection $P_A$, and in § 5 for the least squares solution $A^\dagger b$.

Notes and references. For background on the generalized inverse see the books by Ben-Israel and Greville (1974), Boullion and Odell (1971), and Rao and Mitra (1971). The expression (1.1) is due to Penrose (1955), (1956), whose papers initiated the current interest in the pseudo-inverse.

Many articles on perturbation theory for pseudo-inverses and related problems have appeared in the literature. To date the most complete survey of the problem has been given by Wedin (1973). In addition to collecting and unifying earlier material, this paper will present some new results.
2. Preliminaries.

Notation. Throughout this paper we shall use the notational conventions of Householder (1964). Specifically, matrices are denoted by upper case italic and Greek letters, vectors by lower case italic letters, and scalars by lower case Greek letters. The symbol $C$ denotes the set of complex numbers, $C^n$ the set of complex n-vectors, and $C^{m\times n}$ the set of complex $m \times n$ matrices. The matrix $A^H$ is the conjugate transpose of $A$. The column space of $A$ is denoted by $R(A)$, and its orthogonal complement by $R(A)^\perp$.

We shall be concerned with a fixed matrix $A \in C^{m\times n}$ with

$$\operatorname{rank}(A) = r.$$

The matrix $E \in C^{m\times n}$ will denote a perturbation of $A$ and we shall set

$$B = A + E.$$
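Since we are concerned with the geometry of $C^n$, we shall be at some pains to cast our results in such a way that they are not affected by unitary transformations (cf. the section on unitarily invariant norms below). We may use this fact to transform our perturbation problems into a simpler form. Specifically, let $U = (U_1, U_2) \in C^{m\times m}$ be a unitary matrix with $R(U_1) = R(A)$ and let $V = (V_1, V_2)$ be a unitary matrix with $R(V_1) = R(A^H)$. Then $U^H A V$ has the form

$$U^H A V = \begin{pmatrix} A_{11} & 0 \\ 0 & 0 \end{pmatrix}, \tag{2.1}$$

where $A_{11} \in C^{r\times r}$ is nonsingular. We shall partition $U^H E V$ and $U^H B V$ conformally with $U^H A V$:

$$U^H E V = \begin{pmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}, \qquad U^H B V = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11} + E_{11} & E_{12} \\ E_{21} & E_{22} \end{pmatrix}.$$

These forms will be called reduced forms of the matrices $A$, $B$, and $E$, and in the sequel we shall often assume that the matrices are in reduced form. In this case, the pseudo-inverse is given by

$$A^\dagger = \begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix}. \tag{2.2}$$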
Singular values. It is a well-known result that in the reduced form (2.1) the matrices $U_1$ and $V_1$ may be chosen so that

$$A_{11} = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r),$$

where

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0.$$

This reduced form is called the singular value decomposition of the matrix $A$, and the numbers $\sigma_i$ are called the singular values of $A$. From the relation (2.2) and the fact that $(U^H A V)^\dagger = V^H A^\dagger U$, it follows that

$$A^\dagger = V \begin{pmatrix} A_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H.$$

The $i$th singular value of a matrix $A$, which will be denoted by $\sigma_i(A)$, can be written in the form

$$\sigma_i(A) = \sup_{\dim(\mathcal{X}) = i} \; \inf_{\substack{x \in \mathcal{X} \\ \|x\|_2 = 1}} \|Ax\|_2 \qquad (i = 1, 2, \ldots, n), \tag{2.3}$$

where

$$\|y\|_2^2 = y^H y \tag{2.4}$$

is the usual Euclidean norm. This characterization provides a natural convention for numbering the singular values of a rectangular matrix: $A \in C^{m\times n}$ has $n$ singular values of which $n - r$ are zero; $A^H$ has $m$ singular values of which $m - r$ are zero. The nonzero singular values of $A$ and $A^H$ are the same.

Two inequalities that we shall need in the sequel follow fairly directly from (2.3). They are

$$\sigma_i(A) - \sigma_1(E) \le \sigma_i(A + E) \le \sigma_i(A) + \sigma_1(E)$$

and

$$\sigma_i(AC) \le \sigma_i(A)\,\sigma_1(C), \qquad \sigma_i(AC) \le \sigma_1(A)\,\sigma_i(C). \tag{2.5}$$
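As an illustration of the two inequalities above, the following NumPy sketch checks them on random matrices. The matrix sizes, the random seed, and the helper name `singular_values` are assumptions made for the example; it is a spot check on one instance, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
E = 0.1 * rng.standard_normal((6, 4))
C = rng.standard_normal((4, 4))

def singular_values(M):
    """Singular values in nonincreasing order, as in the numbering of (2.3)."""
    return np.linalg.svd(M, compute_uv=False)

sA, sE, sB = singular_values(A), singular_values(E), singular_values(A + E)
sC, sAC = singular_values(C), singular_values(A @ C)

tol = 1e-12  # slack for rounding error
# sigma_i(A) - sigma_1(E) <= sigma_i(A + E) <= sigma_i(A) + sigma_1(E)
weyl_ok = bool(np.all(sA - sE[0] <= sB + tol) and np.all(sB <= sA + sE[0] + tol))
# The two product inequalities of (2.5)
prod_ok = bool(np.all(sAC <= sA * sC[0] + tol) and np.all(sAC <= sA[0] * sC + tol))
```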
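Unitarily invariant matrix norms. A norm on $C^{m\times n}$ is a function $\|\cdot\| : C^{m\times n} \to R$ that satisfies the conditions

$$\begin{aligned} &1.\ A \ne 0 \Rightarrow \|A\| > 0, \\ &2.\ \|\alpha A\| = |\alpha|\,\|A\|, \\ &3.\ \|A + B\| \le \|A\| + \|B\|. \end{aligned} \tag{2.6}$$

A norm is unitarily invariant if

$$\|U^H A V\| = \|A\|$$

for all unitary matrices $U$ and $V$. The perturbation bounds in this paper will be cast in terms of unitarily invariant norms, whose properties will now be described.

Let $U$ and $V$ be the unitary matrices realizing the singular value decomposition of the matrix $A \in C^{m\times n}$. Then for any unitarily invariant norm $\|\cdot\|_{m,n}$

$$\|A\|_{m,n} = \|U^H A V\|_{m,n} = \left\| \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix} \right\|_{m,n}, \tag{2.7}$$

where $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$. Thus $\|A\|_{m,n}$ is a function of the singular values of $A$, say

$$\|A\|_{m,n} = \varphi_{m,n}(\sigma_1, \sigma_2, \ldots, \sigma_n). \tag{2.8}$$

It follows from (2.6) that $\varphi_{m,n}$ regarded as a function on $R^n$ is a norm. Since the interchange of two rows or two columns of a matrix is a unitary transformation of the matrix, the function $\varphi_{m,n}$ is symmetric in its arguments $\sigma_1, \sigma_2, \ldots, \sigma_n$. It can also be shown that $\varphi_{m,n}$ is nondecreasing in the sense that

$$0 \le \sigma_i \le \tau_i \ (i = 1, 2, \ldots, n) \;\Rightarrow\; \varphi_{m,n}(\sigma_1, \ldots, \sigma_n) \le \varphi_{m,n}(\tau_1, \ldots, \tau_n). \tag{2.9}$$

We shall say that the norm $\|\cdot\|_{m,n}$ is generated by $\varphi_{m,n}$.

An important norm is the spectral norm $\|\cdot\|_2$ generated by the function $\varphi_2$ defined by

$$\varphi_2(\sigma_1, \sigma_2, \ldots, \sigma_n) = \max\{|\sigma_1|, \ldots, |\sigma_n|\}.$$

This norm can also be defined by the equation

$$\|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2, \tag{2.10}$$

where $\|\cdot\|_2$ on the right denotes the Euclidean norm defined by (2.4).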
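The spectral norm satisfies an important consistency relation with other unitarily invariant norms. If $\|\cdot\|$ is a unitarily invariant norm generated by $\varphi$, then it follows from (2.5) and (2.9) that

$$\|CD\| \le \|C\|_2\,\|D\|, \qquad \|CD\| \le \|C\|\,\|D\|_2, \tag{2.11}$$

where the first inequality holds whenever $CD$ and $D$ belong to $C^{m\times n}$ and the second whenever $CD$ and $C$ belong to $C^{m\times n}$.

A second example of a unitarily invariant norm is the Frobenius norm generated by the function

$$\varphi_F(\sigma_1, \ldots, \sigma_n) = (\sigma_1^2 + \cdots + \sigma_n^2)^{1/2}.$$

For any matrix $A \in C^{m\times n}$

$$\|A\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |\alpha_{ij}|^2 = \operatorname{trace}(A^H A).$$

The Frobenius norm satisfies the consistency relation

$$\|CD\|_F \le \|C\|_F\,\|D\|_F.$$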
Since we shall be dealing with matrices of varying dimensions, we shall work with a family of unitarily invariant norms defined on $\bigcup_{m,n=1}^{\infty} C^{m\times n}$. It is important that the individual norms so defined interact with one another properly. Accordingly, we make the following definition.

DEFINITION 2.1. Let $\|\cdot\| : \bigcup_{m,n=1}^{\infty} C^{m\times n} \to R$ be a family of unitarily invariant norms. Then $\|\cdot\|$ is uniformly generated if there is a symmetric function $\varphi$, defined for all infinite sequences with only a finite number of nonzero terms, such that

$$\|A\| = \varphi(\sigma_1(A), \sigma_2(A), \ldots, \sigma_n(A), 0, 0, \ldots)$$

for all $A \in C^{m\times n}$. It is normalized if

$$\|x\| = \|x\|_2$$

for any vector $x$ considered as a matrix.

The function $\varphi$ in the above definition must satisfy the conditions (2.6). Any norm defined by such a function can be normalized. Indeed we have

$$\varphi(\sigma_1(x), 0, 0, \ldots) = \varphi(\|x\|_2, 0, 0, \ldots),$$

from which it follows that $\|x\| = \mu\|x\|_2$ for some constant $\mu$ that is independent of the dimension of $x$. The function $\mu^{-1}\varphi$ then generates the normalized family of norms.

A uniformly generated family of norms has some nice properties. First, since the nonzero singular values of a matrix and its conjugate transpose are the same, we have

$$\|A^H\| = \|A\|.$$

Second, if a matrix is bordered by zero matrices, its norm remains unchanged; i.e.,

$$\left\| \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix} \right\| = \|A\|. \tag{2.12}$$

In particular if $A$ is in reduced form, then

$$\|A\| = \|A_{11}\| \quad\text{and}\quad \|A^\dagger\| = \|A_{11}^{-1}\|.$$

It is also a consequence of (2.12) that (2.11) holds for a uniformly generated family of norms whenever the product $CD$ is defined, as may be seen by bordering $C$ and $D$ with zero matrices until they are both square.

A third property is that if $\|\cdot\|$ is normalized then

$$\|A\|_2 \le \|A\|. \tag{2.13}$$

In fact from (2.11) and the fact that $\|x\| = \|x\|_2$, we have

$$\|Ax\|_2 = \|Ax\| \le \|A\|\,\|x\|_2 \tag{2.14}$$

for all $x$. But by (2.10) $\|A\|_2$ is the smallest number for which (2.14) holds for all $x$, from which (2.13) follows. A trivial corollary of (2.11) and (2.13) is that $\|\cdot\|$ is consistent:

$$\|CD\| \le \|C\|\,\|D\|.$$

Finally we observe that

$$\forall x\ \ \|Cx\|_2 \le \|Dx\|_2 \;\Rightarrow\; \|C\| \le \|D\|. \tag{2.15}$$

To prove this implication note that by (2.3) the hypothesis implies that $\sigma_i(C) \le \sigma_i(D)$. Hence the inequality $\|C\| \le \|D\|$ follows from (2.9).

In the sequel $\|\cdot\|$ will always refer to a uniformly generated, normalized, unitarily invariant norm.
Perturbation of matrix inverses. We shall later need some results on the inverses of perturbations of nonsingular matrices. These are summarized in the following theorem.

THEOREM 2.2. If $A$ and $B = A + E$ are nonsingular, then

$$\frac{\|B^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \hat\kappa\,\frac{\|E\|}{\|A\|}, \tag{2.16}$$

where

$$\hat\kappa = \|A\|\,\|B^{-1}\|_2. \tag{2.17}$$

If $A$ is nonsingular and

$$\|A^{-1}\|_2\,\|E\| < 1, \tag{2.18}$$

then $B$ is a fortiori nonsingular. In this case

$$\|B^{-1}\| \le \|A^{-1}\|/\gamma \tag{2.19}$$

and

$$\frac{\|B^{-1} - A^{-1}\|}{\|A^{-1}\|} \le \frac{\kappa}{\gamma}\,\frac{\|E\|}{\|A\|}, \tag{2.20}$$

where

$$\kappa = \|A\|\,\|A^{-1}\|_2 \tag{2.21}$$

and

$$\gamma = 1 - \kappa\,\|E\|/\|A\| > 0.$$

The bound (2.16) places no restrictions on the size of $E$; however, its use requires some estimate of the size of $B^{-1}$. When $E$ satisfies (2.18) one such estimate is given by (2.19), from which the bound (2.20) follows. This bound has the advantage that it can be stated entirely in terms of the matrix $A$. Pairs of bounds analogous to (2.16) and (2.20) will repeat themselves through a number of subsequent theorems, as will the pairs $\hat\kappa$ and $\kappa$. The number $\kappa$ measures the sensitivity of $A^{-1}$ to perturbations in $A$ and is usually called the condition number of $A$ (with respect to inversion).
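The two bounds of Theorem 2.2 can be compared on a small example. The sketch below uses the spectral norm throughout; the test matrix, the size of the perturbation, and the variable names are assumptions chosen so that (2.18) holds comfortably.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
# A well-conditioned test matrix (an assumption made for the example).
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) + 0.1 * rng.standard_normal((n, n))
E = 1e-2 * rng.standard_normal((n, n))
B = A + E

def norm2(M):
    """Spectral norm."""
    return np.linalg.norm(M, 2)

A_inv, B_inv = np.linalg.inv(A), np.linalg.inv(B)

kappa_hat = norm2(A) * norm2(B_inv)            # (2.17)
kappa = norm2(A) * norm2(A_inv)                # (2.21)
gamma = 1.0 - kappa * norm2(E) / norm2(A)      # positive when (2.18) holds

relative_change = norm2(B_inv - A_inv) / norm2(A_inv)
bound_216 = kappa_hat * norm2(E) / norm2(A)        # requires knowledge of B^{-1}
bound_220 = (kappa / gamma) * norm2(E) / norm2(A)  # stated in terms of A only
```

On this example one expects `relative_change <= bound_216 <= bound_220`, the second bound being the price paid for not knowing $B^{-1}$.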
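Projections. We have already observed that the orthogonal projections $P_A$ and $R_A$ onto the column space and the row space of $A$ can be expressed in terms of the pseudo-inverse. The projection onto $R(A)^\perp$ will be denoted by

$$P_A^\perp = I - P_A.$$

Likewise

$$R_A^\perp = I - R_A$$

will denote the projection onto $R(A^H)^\perp$.

When $A$ is in reduced form, its projections can be easily written out:

$$P_A = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \in C^{m\times m}, \qquad P_A^\perp = \begin{pmatrix} 0 & 0 \\ 0 & I_{m-r} \end{pmatrix} \in C^{m\times m},$$

$$R_A = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \in C^{n\times n}, \qquad R_A^\perp = \begin{pmatrix} 0 & 0 \\ 0 & I_{n-r} \end{pmatrix} \in C^{n\times n}.$$

It follows that

$$\|P_A A R_A\| = \|A_{11}\|$$

and

$$\|P_A E R_A\| = \|E_{11}\|, \quad \|P_A E R_A^\perp\| = \|E_{12}\|, \quad \|P_A^\perp E R_A\| = \|E_{21}\|, \quad \|P_A^\perp E R_A^\perp\| = \|E_{22}\|.$$

These identities enable us to pass from results for the reduced form to general results stated in terms of projections of $A$ and $E$.

We shall need some properties of norms of projections later. These are summarized in the following theorem.

THEOREM 2.3. For any $A$ and $B$ the following statements are true.

1. If $\operatorname{rank}(A) = \operatorname{rank}(B)$, then the singular values of $P_A P_B^\perp$ and $P_B P_A^\perp$ are the same so that

$$\|P_A P_B^\perp\| = \|P_B P_A^\perp\|.$$

Moreover the nonzero singular values $\sigma$ of $P_A P_B^\perp$ correspond to pairs $\pm\sigma$ of eigenvalues of $P_B - P_A$, so that

$$\|P_B - P_A\|_2 = \|P_A P_B^\perp\|_2 = \|P_B P_A^\perp\|_2.$$

2. If $\|P_B - P_A\|_2 < 1$, then $\operatorname{rank}(A) = \operatorname{rank}(B)$.

3. If $\operatorname{rank}(B) \ge \operatorname{rank}(A)$, then

$$\|P_A P_B^\perp\| \le \|P_B P_A^\perp\|.$$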
Proof. Proofs of parts 1 and 2 are readily found in the literature; however, a proof of part 1, based on a useful matrix decomposition, is given in the Appendix to this paper. For part 3 write $P_B = P_1 + P_2$ where $\operatorname{rank}(P_1) = \operatorname{rank}(A)$ and $P_A P_2 = 0$ (i.e., $R(P_2)$ is orthogonal to $R(A)$). Then

$$\|P_A P_B^\perp\| = \|P_A(I - P_1 - P_2)\| = \|P_A(I - P_1)\| = \|P_1 P_A^\perp\|,$$

the last equality following from part 1. Now for any $x$

$$\|P_1 P_A^\perp x\|_2 \le \|P_B P_A^\perp x\|_2,$$

and the result follows from (2.15). □

When $B = A + E$, we can estimate $\|P_B P_A^\perp\|$ in terms of $E$.

THEOREM 2.4. The product $P_B P_A^\perp$ can be written in the form

$$P_B P_A^\perp = (B^\dagger)^H R_B E^H P_A^\perp. \tag{2.22}$$

Hence

$$\|P_B P_A^\perp\| \le \|B^\dagger\|_2\,\|E\|, \tag{2.23}$$

and if $\operatorname{rank}(A) = \operatorname{rank}(B)$, then

$$\|P_B P_A^\perp\| \le \min\{\|B^\dagger\|_2, \|A^\dagger\|_2\}\,\|E\|. \tag{2.24}$$

Proof. We have

$$\begin{aligned} P_B P_A^\perp &= P_B^H P_A^\perp = (B^\dagger)^H B^H P_A^\perp \\ &= (B^\dagger)^H (A + E)^H P_A^\perp \\ &= (B^\dagger)^H E^H P_A^\perp \\ &= (B^\dagger)^H B^H (B^\dagger)^H E^H P_A^\perp \\ &= (B^\dagger)^H R_B E^H P_A^\perp, \end{aligned}$$

which establishes (2.22). The inequality (2.23) follows upon taking norms in (2.22). Finally (2.24) follows from part 1 of Theorem 2.3. □

Theorems 2.3 and 2.4 have obvious analogues for other combinations of projectors (e.g. $R_A R_B^\perp = -A^\dagger E R_B^\perp$). In the sequel a reference to these theorems will also cover any trivial variants.
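The identity (2.22), the bound (2.23), and part 1 of Theorem 2.3 lend themselves to a quick numerical check. The following sketch is illustrative only: the rank-preserving perturbation is produced by perturbing the factors of a rank-3 product, and the names `pinv`, `norm2`, and the cutoff `rcond=1e-10` are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 6, 5, 3
X, Y = rng.standard_normal((m, r)), rng.standard_normal((r, n))
A = X @ Y
# Perturbing the factors keeps rank(B) = rank(A) = r.
B = (X + 0.05 * rng.standard_normal((m, r))) @ (Y + 0.05 * rng.standard_normal((r, n)))
E = B - A

def pinv(M):
    """Pseudo-inverse with an assumed cutoff for rounding-level singular values."""
    return np.linalg.pinv(M, rcond=1e-10)

def norm2(M):
    return np.linalg.norm(M, 2)

B_pinv = pinv(B)
P_A, P_B, R_B = A @ pinv(A), B @ B_pinv, B_pinv @ B
P_A_perp, P_B_perp = np.eye(m) - P_A, np.eye(m) - P_B

lhs_222 = P_B @ P_A_perp
rhs_222 = B_pinv.conj().T @ R_B @ E.conj().T @ P_A_perp   # right side of (2.22)

gap = norm2(P_B - P_A)
```

For matrices of equal rank, `gap` should agree with the spectral norms of both `P_B @ P_A_perp` and `P_A @ P_B_perp`.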
The case when $\|P_B - P_A\|_2 < 1$ will be particularly important later. We have seen in part 2 of Theorem 2.3 that in this case $\operatorname{rank}(A) = \operatorname{rank}(B)$. However more is true: no vector in $R(A)$ can be orthogonal to $R(B)$ and vice versa. For suppose that $x \ne 0$ satisfies $P_A x = x$ and $P_B x = 0$. Then $(P_B - P_A)x = -x$, which implies that $\|P_B - P_A\|_2 \ge 1$. Conversely if $\|P_B - P_A\|_2 = 1$ then there is a vector in $R(A)$ or $R(B)$ that is orthogonal to $R(B)$ or $R(A)$. To see this, note that by Theorem 2.3, part 1, there is a vector $x$ such that $(P_B - P_A)x = x$. If $P_A x = 0$ then $P_B x = x$, which shows that $x \in R(B)$ and $x \in R(A)^\perp$. If, on the other hand, $P_A x \ne 0$, then since $P_A x = -(I - P_B)x$ we have $P_B(P_A x) = 0$, which shows that $P_A x \in R(A)$ and $P_A x \in R(B)^\perp$.

Because of the above considerations we shall say that $R(A)$ and $R(B)$ are acute whenever $\|P_B - P_A\|_2 < 1$. We shall say that the matrices $A$ and $B$ are acute if $R(A)$ and $R(B)$ are acute and $R(A^H)$ and $R(B^H)$ are acute. In this case we shall also say that $B$ is an acute perturbation of $A$. The following theorem gives necessary and sufficient conditions for $A$ and $B$ to be acute.
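THEOREM 2.5. The matrices $A$ and $B$ are acute if and only if

$$\operatorname{rank}(A) = \operatorname{rank}(B) = \operatorname{rank}(P_A B R_A). \tag{2.25}$$

Proof. We shall use the reduced forms of $A$ and $B$. First suppose (2.25) holds. Then $\operatorname{rank}(B_{11}) = \operatorname{rank}(A_{11})$, and $B_{11}$ is nonsingular. Thus

$$R(B) = R\left[\begin{pmatrix} B_{11} \\ B_{21} \end{pmatrix}\right].$$

But

$$R(A) = R\left[\begin{pmatrix} I \\ 0 \end{pmatrix}\right],$$

from which it is easily seen that no vector in $R(A)$ can be orthogonal to $R(B)$ and vice versa. A similar argument shows that $R(A^H)$ and $R(B^H)$ are also acute.

Now assume that $A$ and $B$ are acute. Then $\operatorname{rank}(A) = \operatorname{rank}(B) \ge \operatorname{rank}(B_{11})$. Assume that $\operatorname{rank}(B_{11}) < \operatorname{rank}(A)$, so that $B_{11}$ is singular. Let $p$ and $q$ be left and right null vectors of $B_{11}$ of 2-norm unity, and let $P$ and $Q$ be unitary matrices whose first columns are $p$ and $q$. Consider the reduced forms

$$\begin{pmatrix} P^H A_{11} Q & 0 \\ 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} P^H B_{11} Q & P^H E_{12} \\ E_{21} Q & E_{22} \end{pmatrix}.$$

The first row and column of $P^H B_{11} Q$ is zero. If $E_{21} q \ne 0$, then the nonzero vector

$$\begin{pmatrix} 0 \\ E_{21} q \end{pmatrix}$$

is in $R(B)$ and is orthogonal to $R(A)$. Similarly if $p^H E_{12} \ne 0$, then the nonzero vector

$$\begin{pmatrix} 0 \\ E_{12}^H p \end{pmatrix}$$

is in $R(B^H)$ and is orthogonal to $R(A^H)$. If $E_{21} q = 0$ and $p^H E_{12} = 0$, then the unit vector $e_1$ is in $R(A)$ and $R(A^H)$ and is orthogonal to $R(B)$ and $R(B^H)$. In all cases the contradiction establishes that $B_{11}$ is nonsingular, or equivalently that $\operatorname{rank}(A) = \operatorname{rank}(B) = \operatorname{rank}(B_{11})$. □

Beyond the requirement that $\operatorname{rank}(B) = \operatorname{rank}(A)$, Theorem 2.5 shows that $B_{11}$ must be nonsingular for $A$ and $B$ to be acute. By Theorem 2.2 this will be true whenever $\|A_{11}^{-1}\|_2\,\|E_{11}\| < 1$ or equivalently whenever $\|A^\dagger\|_2\,\|P_A E R_A\| < 1$. This condition is always satisfied when $E_{11}$ is sufficiently small.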
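The rank criterion (2.25) and the defining condition on the projections can be compared on small examples. The sketch below is one possible numerical test, not a method from the paper; the function names, the absolute rank tolerance, and the safety margin used to decide whether a norm is "less than 1" in floating point are all assumptions.

```python
import numpy as np

def pinv(M):
    return np.linalg.pinv(M, rcond=1e-10)

def rank(M, tol=1e-10):
    """Numerical rank with an assumed absolute tolerance."""
    return int(np.sum(np.linalg.svd(M, compute_uv=False) > tol))

def acute_by_rank(A, B):
    """Criterion (2.25): rank(A) = rank(B) = rank(P_A B R_A)."""
    P_A, R_A = A @ pinv(A), pinv(A) @ A
    return rank(A) == rank(B) == rank(P_A @ B @ R_A)

def acute_by_definition(A, B, margin=1e-8):
    """Both ||P_B - P_A||_2 and ||R_B - R_A||_2 are below 1 (up to a margin)."""
    dP = np.linalg.norm(B @ pinv(B) - A @ pinv(A), 2)
    dR = np.linalg.norm(pinv(B) @ B - pinv(A) @ A, 2)
    return bool(dP < 1 - margin and dR < 1 - margin)

A = np.diag([1.0, 0.0])
B_tilted = np.array([[1.0, 0.0], [0.5, 0.0]])   # same rank, column space tilted
B_full = np.diag([1.0, 1e-3])                    # rank increases
B_swapped = np.diag([0.0, 1.0])                  # same rank, orthogonal column space

results = [(acute_by_rank(A, B), acute_by_definition(A, B))
           for B in (B_tilted, B_full, B_swapped)]
```

Only the first perturbation is acute; in the other two cases a vector in $R(B)$ is orthogonal to $R(A)$.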
Notes and references. The properties of singular values are well known. See Stewart (1973) for an introduction and Gohberg and Krein (1969) for a more detailed treatment in an infinite dimensional setting.

Von Neumann (1937) was the first to prove that unitarily invariant norms can be written as a function of singular values (the function $\varphi_{m,n}$ in (2.7) is usually called a symmetric gauge function). Systematic treatments of unitarily invariant norms may be found in Mirsky (1960) and Gohberg and Krein (1969).

The treatment of unitarily invariant norms in finite dimensional spaces has often been a little sloppy. In infinite dimensional settings there is usually only one space and one generating function, and the same is true in a finite dimensional setting when one is concerned with square matrices. However, when one considers rectangular matrices with varying dimensions, different norms can be used for different dimensions, and there is no reason why these norms should interact
nicely. How bad things can get is illustrated by the family of norms $\|\cdot\|$ defined for $A \in C^{m\times n}$ by
=- IIA112.
IIA11
n
This family is unitarily invariant and consistent, but $\|A^H\| \ne \|A\|$, unless $A$ is square, and the relation (2.13) does not hold in general. Definition 2.1 represents a return to the simplicity of the infinite dimensional case.

Theorem 2.2 is classical and is usually proved by an appeal to the Neumann series representation $(I - A)^{-1} = I + A + A^2 + \cdots$. Wilkinson (1965) gives a proof that does not use series and discusses at some length the notion of condition number. The result is usually proved under the assumption that $\|I\| = 1$; however, the proofs can be extended to establish the result for any consistent norm.

The results in Theorem 2.3 are well known to people who work closely with orthogonal projectors (e.g., see Afriat (1957) or Wedin (1969)). The decomposition in Theorem 2.4 was established in a slightly weaker form by Wedin (1973). In some cases, when $E$ is small, $R_B$ will be near $R_A$ and the approximation $\|P_A^\perp E R_B\| \cong \|E_{21}\|$ will be more realistic in (2.23).

The number $\|P_B - P_A\|_2$ is closely related to various measures of separation between subspaces. See Kato (1966) and especially Davis and Kahan (1970) where further references may be found. Theorem 2.4, with $\|P_A E R_A\|$ replaced by $\|E\|$, is proved by Wedin (1973). The term "acute" ordinarily refers to the angle subtended by two line segments, not to the segments themselves, and it is technically misapplied when subspaces are said to be acute. But this usage will cause no confusion and it is better than the ugly phrase "in the acute case." The term "acute perturbation" is new, but the notion is introduced in Wedin (1973).
3. The pseudo-inverse. In this section we shall consider the problem of bounding $\|B^\dagger - A^\dagger\|$ in terms of $\|E\|$. We shall obtain three basic theorems: one for when $\operatorname{rank}(A) \ne \operatorname{rank}(B)$, one for when $\operatorname{rank}(A) = \operatorname{rank}(B)$, and one for when $B$ is an acute perturbation of $A$. All these theorems are based on expressions for $B^\dagger$, which also yield asymptotic expressions for $B^\dagger$ and expressions for the derivative of $A^\dagger$.

Lower bounds. Before proceeding to obtain bounds on $\|B^\dagger - A^\dagger\|$, we shall show how bad things can be by deriving lower bounds.

THEOREM 3.1. If $A$ and $B$ are not acute, then

$$\|B^\dagger - A^\dagger\|_2 \ge 1/\|E\|_2. \tag{3.1}$$

If, further, $\operatorname{rank}(B) \ge \operatorname{rank}(A)$, then

$$\|B^\dagger\|_2 \ge 1/\|E\|_2. \tag{3.2}$$

Proof. Suppose for definiteness that $\operatorname{rank}(B) \ge \operatorname{rank}(A)$. Then there is, say, a vector $y \in R(B)$ with $\|y\|_2 = 1$ such that $y \in R(A)^\perp$ (otherwise work with $A^H$ and $B^H$). Thus

$$1 = y^H y = y^H P_B y = y^H B B^\dagger y = y^H (A + E) B^\dagger y = y^H E B^\dagger y \le \|E\|_2\,\|B^\dagger y\|_2,$$

which shows that $\|B^\dagger y\|_2$, and hence $\|B^\dagger\|_2$, is not less than $1/\|E\|_2$. From this and the fact that $A^\dagger y = A^\dagger P_A y = 0$ we have

$$\frac{1}{\|E\|_2} \le \|B^\dagger y\|_2 = \|(B^\dagger - A^\dagger)y\|_2 \le \|B^\dagger - A^\dagger\|_2. \qquad □$$

Theorem 3.1 shows that the pseudo-inverse of a general matrix is not a continuous function of its elements, unless the class of perturbations is restricted. It also says that if two nearby matrices do not have acute column and row spaces, then one of them at least must have a large pseudo-inverse. Moreover if they are of the same rank, then both of them must have large pseudo-inverses.
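A two-by-two example makes the discontinuity concrete. In the sketch below (an illustration, not an example from the paper) the perturbation raises the rank of $A = \operatorname{diag}(1, 0)$, so $A$ and $B$ are not acute, and the lower bound (3.1) is attained with equality.

```python
import numpy as np

A = np.diag([1.0, 0.0])
records = []
for eps in (1e-1, 1e-3, 1e-5):
    E = np.diag([0.0, eps])        # perturbation that raises the rank
    B = A + E
    gap = np.linalg.norm(np.linalg.pinv(B) - np.linalg.pinv(A), 2)
    lower_bound = 1.0 / np.linalg.norm(E, 2)   # right-hand side of (3.1)
    records.append((eps, gap, lower_bound))
```

As `eps` shrinks, `gap` grows like `1/eps`, so $B^\dagger$ moves away from $A^\dagger$ even though $B$ approaches $A$.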
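A decomposition of $B^\dagger - A^\dagger$. In spite of the negative results in Theorem 3.1, it is possible to obtain bounds on $\|B^\dagger - A^\dagger\|$ in the general case, although these bounds need not remain finite as $B$ approaches $A$. The basis for obtaining such bounds is contained in the following theorem.

THEOREM 3.2. The following two decompositions of $B^\dagger - A^\dagger$ are valid:

$$B^\dagger - A^\dagger = -B^\dagger P_B E R_A A^\dagger + B^\dagger P_B P_A^\perp - R_B^\perp R_A A^\dagger, \tag{3.3}$$

$$B^\dagger - A^\dagger = -B^\dagger P_B E R_A A^\dagger + (B^H B)^\dagger R_B E^H P_A^\perp + R_B^\perp E^H P_A (A A^H)^\dagger. \tag{3.4}$$

Proof. Both expressions can be verified directly by replacing $E$ with $B - A$, replacing the projectors by their expressions in terms of pseudo-inverses, and simplifying. □

It should be noted that (3.4) can be obtained directly from (3.3) by using Theorem 2.4 to express $P_B P_A^\perp$ and $R_B^\perp R_A$ in terms of $E$. In particular, since $R_B^\perp R_A = -R_B^\perp E^H (A^\dagger)^H$ and $(A^\dagger)^H A^\dagger = (A A^H)^\dagger$, the last term of (3.3) becomes $+R_B^\perp E^H P_A (A A^H)^\dagger$.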
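Both decompositions can be verified numerically on matrices of different rank. The following sketch is a spot check under stated assumptions: real random test matrices (so that $E^H$ is `E.T`), an assumed cutoff `rcond=1e-10` for the rank-deficient $A$, and the standard identities $(B^H B)^\dagger = B^\dagger (B^\dagger)^H$ and $(A A^H)^\dagger = (A^\dagger)^H A^\dagger$ to form the two Gram pseudo-inverses. The last term of (3.4) is taken with a plus sign, as follows from (3.3).

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 4, 5
A = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))   # rank 2
E = 0.5 * rng.standard_normal((m, n))
B = A + E                                                       # generically rank 4

A_pinv = np.linalg.pinv(A, rcond=1e-10)
B_pinv = np.linalg.pinv(B)

P_A, P_B = A @ A_pinv, B @ B_pinv
R_A, R_B = A_pinv @ A, B_pinv @ B
P_A_perp = np.eye(m) - P_A
R_B_perp = np.eye(n) - R_B

first = -B_pinv @ P_B @ E @ R_A @ A_pinv
rhs_33 = first + B_pinv @ P_B @ P_A_perp - R_B_perp @ R_A @ A_pinv
rhs_34 = (first
          + (B_pinv @ B_pinv.T) @ R_B @ E.T @ P_A_perp       # (B^H B)^+ R_B E^H P_A^perp
          + R_B_perp @ E.T @ P_A @ (A_pinv.T @ A_pinv))      # R_B^perp E^H P_A (A A^H)^+

diff = B_pinv - A_pinv
err_33 = np.linalg.norm(rhs_33 - diff) / np.linalg.norm(diff)
err_34 = np.linalg.norm(rhs_34 - diff) / np.linalg.norm(diff)
```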
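The general theorem. We are now in a position to prove the general theorem bounding $\|B^\dagger - A^\dagger\|$.

THEOREM 3.3. For any $A$ and $B$ with $B = A + E$,

$$\|B^\dagger - A^\dagger\| \le \lambda \max\{\|A^\dagger\|_2^2, \|B^\dagger\|_2^2\}\,\|E\|,$$

where $\lambda$ is given in the following table:

$$\begin{array}{c|ccc} \|\cdot\| & \text{arbitrary} & \text{spectral} & \text{Frobenius} \\ \hline \lambda & 3 & (1+\sqrt{5})/2 & \sqrt{2} \end{array}$$

Proof. The proof is a slight modification of the proof given by Wedin (1973). We shall give only the proof for the Frobenius norm.

Suppose for definiteness that $\operatorname{rank}(B) \le \operatorname{rank}(A)$. Let $F_1$, $F_2$, and $F_3$ denote the three terms on the right-hand side of (3.3). Then the column spaces of $F_1$ and $F_2$ are orthogonal to the column space of $F_3$. Hence

$$\|B^\dagger - A^\dagger\|_F^2 = \|F_1 + F_2\|_F^2 + \|F_3\|_F^2. \tag{3.5}$$

Now since $F_1 + F_2 = B^\dagger(-P_B E A^\dagger P_A + P_B P_A^\perp)$,

$$\|F_1 + F_2\|_F^2 \le \|B^\dagger\|_2^2\,(\|P_B E A^\dagger P_A\|_F^2 + \|P_B P_A^\perp\|_F^2).$$

But from Theorems 2.3 and 2.4

$$\|P_B E A^\dagger P_A\|_F^2 + \|P_B P_A^\perp\|_F^2 \le \|P_B E A^\dagger\|_F^2 + \|P_B^\perp P_A\|_F^2 = \|P_B E A^\dagger\|_F^2 + \|P_B^\perp E A^\dagger\|_F^2 = \|E A^\dagger\|_F^2.$$

Hence

$$\|F_1 + F_2\|_F \le \|B^\dagger\|_2\,\|E A^\dagger\|_F \le \|A^\dagger\|_2\,\|B^\dagger\|_2\,\|E\|_F. \tag{3.6}$$

Also from Theorem 2.4

$$\|F_3\|_F \le \|A^\dagger\|_2\,\|R_B^\perp R_A\|_F = \|A^\dagger\|_2\,\|R_A R_B^\perp\|_F = \|A^\dagger\|_2\,\|A^\dagger E R_B^\perp\|_F \le \|A^\dagger\|_2^2\,\|E\|_F, \tag{3.7}$$

and the result follows on combining (3.5), (3.6), and (3.7). Since the final bound is symmetric in $A$ and $B$, it also holds when $\operatorname{rank}(B) \ge \operatorname{rank}(A)$. □

It should be noted that these bounds do not imply that $\|B^\dagger - A^\dagger\|$ is small when $\|E\|$ is small, since $B^\dagger$ may grow unboundedly as $E$ approaches zero.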
The case rank(A) = rank(B). When $A$ and $B$ have the same rank, we can strengthen Theorem 3.3 in two ways. First, we can replace the term $\max\{\|A^\dagger\|_2^2, \|B^\dagger\|_2^2\}$ with the product $\|A^\dagger\|_2\,\|B^\dagger\|_2$. Second we can distinguish more cases for the constant $\lambda$. In the following theorem recall that $A \in C^{m\times n}$.

THEOREM 3.4. If $\operatorname{rank}(A) = \operatorname{rank}(B)$, then

$$\|B^\dagger - A^\dagger\| \le \lambda\,\|A^\dagger\|_2\,\|B^\dagger\|_2\,\|E\|, \tag{3.8}$$

where $\lambda$ is given in the following table.

$$\begin{array}{l|ccc} \|\cdot\| & \text{Arbitrary} & \text{Spectral} & \text{Frobenius} \\ \hline \operatorname{rank}(A) < \min(m,n) & 3 & (1+\sqrt{5})/2 & \sqrt{2} \\ \operatorname{rank}(A) = \min(m,n),\ m \ne n & 2 & \sqrt{2} & 1 \\ \operatorname{rank}(A) = m = n & 1 & 1 & 1 \end{array}$$

The proof of this theorem may be found in Wedin (1973). The bound (3.8) may be recast in the form

$$\frac{\|B^\dagger - A^\dagger\|}{\|B^\dagger\|_2} \le \lambda\,\kappa\,\frac{\|E\|}{\|A\|}, \tag{3.9}$$

where

$$\kappa = \|A\|\,\|A^\dagger\|_2.$$

In this form the result is almost analogous to the bound (2.20) for the inverse in Theorem 2.2. The bound (3.9) also implies that as $E$ approaches zero, the relative error in $B^\dagger$ approaches zero, which further implies that $B^\dagger$ approaches $A^\dagger$. Remembering, on the other hand, that if $\operatorname{rank}(B) \ne \operatorname{rank}(A)$ then $A$ and $B$ cannot be acute, we have from Theorem 3.1 the following corollary of Theorem 3.4.

COROLLARY 3.5. A necessary and sufficient condition that

$$\lim_{B \to A} B^\dagger = A^\dagger$$

is that $\operatorname{rank}(B) = \operatorname{rank}(A)$ as $B$ approaches $A$.
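Acute perturbations. It is evident from the proofs of Theorems 3.3 and 3.4 that we have given away much in deriving the bounds. In particular, if $B$ is a small acute perturbation of $A$ then $P_A$ and $P_B$ are nearly equal, and the same is true of $R_A$ and $R_B$. Thus it follows from (3.4) that $B^\dagger - A^\dagger$ can be decomposed into three terms: one essentially depending on $P_A E R_A$, one on $P_A^\perp E R_A$, and one on $P_A E R_A^\perp$. However, this does not tell the whole story; for we shall show that the dependency of $B^\dagger - A^\dagger$ on $P_A^\perp E R_A$ and $P_A E R_A^\perp$ is bounded, no matter how large these projections may be.

In order to state our theorems concisely, we must first introduce some additional notation. Let $\|\cdot\|$ be generated by $\varphi$ and for any $F \in C^{k\times r}$ ($k \ge r$) define

$$\psi_\varphi(F) = \varphi\!\left( \frac{\sigma_1(F)}{[1 + \sigma_1^2(F)]^{1/2}}, \ldots, \frac{\sigma_r(F)}{[1 + \sigma_r^2(F)]^{1/2}}, 0, 0, \ldots \right). \tag{3.10}$$

The function $\psi_\varphi$ is not a norm; however, it has some useful properties. First, from (2.5) and the monotonicity of $\varphi$,

$$\psi_\varphi(GF) \le \psi_\varphi(\|G\|_2 F) \le \psi_\varphi(\|G\| F).$$

Second, since for $\alpha \ge 1$

$$\frac{\alpha\sigma}{(1 + \alpha^2\sigma^2)^{1/2}} \le \frac{\alpha\sigma}{(1 + \sigma^2)^{1/2}},$$

we have

$$\alpha \ge 1 \;\Rightarrow\; \psi_\varphi(\alpha F) \le \alpha\,\psi_\varphi(F).$$

For small $F$, $\psi_\varphi(F)$ is asymptotic to $\|F\|$:

$$\psi_\varphi(F) = \|F\| + o(\|F\|).$$

For large $F$, $\psi_\varphi(F)$ is bounded:

$$\psi_\varphi(F) \le \|I_r\|.$$

Finally, for the spectral norm

$$\psi_2(F) = \|F\|_2 / (1 + \|F\|_2^2)^{1/2}.$$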
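Our first result concerns a rather special matrix.

LEMMA 3.6. The matrix $\begin{pmatrix} I \\ F \end{pmatrix}$ satisfies

$$\left\| \begin{pmatrix} I \\ F \end{pmatrix}^{\dagger} \right\|_2 \le 1 \tag{3.11}$$

and

$$\left\| \begin{pmatrix} I \\ F \end{pmatrix}^{\dagger} - (I \ \ 0) \right\| = \psi_\varphi(F). \tag{3.12}$$

Proof. It is easily verified that

$$\begin{pmatrix} I \\ F \end{pmatrix}^{\dagger} = (I + F^H F)^{-1} (I \ \ F^H), \tag{3.13}$$

whose singular values are

$$\frac{1}{[1 + \sigma_i^2(F)]^{1/2}},$$

from which (3.11) follows. Also if

$$G = \begin{pmatrix} I \\ F \end{pmatrix}^{\dagger} - (I \ \ 0),$$

then

$$G G^H = I - (I + F^H F)^{-1}.$$

It follows that the singular values of $G$ are given by

$$\frac{\sigma_i(F)}{[1 + \sigma_i^2(F)]^{1/2}},$$

which establishes (3.12). □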
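The statements of Lemma 3.6 can be checked directly. The following sketch, with an arbitrary real test matrix `F` chosen for illustration, compares the closed form (3.13) with a computed pseudo-inverse and compares the singular values of $G$ with the values appearing in (3.10).

```python
import numpy as np

rng = np.random.default_rng(8)
k, r = 3, 2
F = rng.standard_normal((k, r))
I = np.eye(r)

M = np.vstack([I, F])                                   # the matrix (I; F)
M_pinv = np.linalg.pinv(M)
closed_form = np.linalg.inv(I + F.T @ F) @ np.hstack([I, F.T])   # (3.13), real case

G = M_pinv - np.hstack([I, np.zeros((r, k))])
sigma_F = np.linalg.svd(F, compute_uv=False)
sigma_G = np.linalg.svd(G, compute_uv=False)
expected_G = sigma_F / np.sqrt(1.0 + sigma_F**2)        # arguments of (3.10)
```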
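The main result is based on an explicit representation of $B^\dagger$. We shall work with the reduced forms of $A$ and $B$.

THEOREM 3.7. Let $B$ be an acute perturbation of $A$. Then

$$B^\dagger = (I \ \ F_{12})^\dagger\, B_{11}^{-1} \begin{pmatrix} I \\ F_{21} \end{pmatrix}^{\dagger}, \tag{3.14}$$

where

$$F_{21} = E_{21} B_{11}^{-1}, \qquad F_{12} = B_{11}^{-1} E_{12}.$$

Proof. As in the proof of Theorem 2.5, we have

$$R(B) = R\left[\begin{pmatrix} B_{11} \\ E_{21} \end{pmatrix}\right].$$

Thus the columns of

$$\begin{pmatrix} E_{12} \\ E_{22} \end{pmatrix}$$

can be expressed as a linear combination of the columns of $\begin{pmatrix} B_{11} \\ E_{21} \end{pmatrix}$. Since $B_{11}(B_{11}^{-1} E_{12}) = E_{12}$, we must have

$$\begin{pmatrix} E_{12} \\ E_{22} \end{pmatrix} = \begin{pmatrix} B_{11} \\ E_{21} \end{pmatrix} B_{11}^{-1} E_{12},$$

from which it follows that

$$B = \begin{pmatrix} I \\ F_{21} \end{pmatrix} B_{11}\, (I \ \ F_{12}). \tag{3.15}$$

The result now follows from Penrose's conditions. □

It is interesting to observe that, from (3.15),

$$B_{22} = E_{22} = F_{21} B_{11} F_{12},$$

which is of second order in $\|E\|$. In other words, if $\operatorname{rank}(A + E) = \operatorname{rank}(A)$, then $E_{22}$ must approach zero quadratically as $E$ approaches zero.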
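We turn now to the perturbation theorem.

THEOREM 3.8. Let $B$ be an acute perturbation of $A$, and let

$$\hat\kappa = \|A\|\,\|B_{11}^{-1}\|_2.$$

Then

$$\frac{\|B^\dagger - A^\dagger\|}{\|A^\dagger\|} \le \hat\kappa\,\frac{\|E_{11}\|}{\|A\|} + \psi_\varphi\!\left(\frac{\hat\kappa E_{12}}{\|A\|}\right) + \psi_\varphi\!\left(\frac{\hat\kappa E_{21}}{\|A\|}\right), \tag{3.16}$$

where $\psi_\varphi$ is defined by (3.10).

Proof. Let $F_{ij}$ be defined as in Theorem 3.7. Let

$$I_{21} = \begin{pmatrix} I_r \\ 0 \end{pmatrix}, \quad J_{21} = \begin{pmatrix} I_r \\ F_{21} \end{pmatrix}, \quad I_{12} = (I_r \ \ 0), \quad J_{12} = (I_r \ \ F_{12}).$$

From (3.14), $B^\dagger = J_{12}^\dagger B_{11}^{-1} J_{21}^\dagger$; hence

$$B^\dagger - A^\dagger = (J_{12}^\dagger - I_{12}^\dagger) A_{11}^{-1} I_{21}^\dagger + J_{12}^\dagger A_{11}^{-1} (J_{21}^\dagger - I_{21}^\dagger) + J_{12}^\dagger (B_{11}^{-1} - A_{11}^{-1}) J_{21}^\dagger. \tag{3.17}$$

From Theorem 2.2 we have the following bound:

$$\|J_{12}^\dagger (B_{11}^{-1} - A_{11}^{-1}) J_{21}^\dagger\| \le \|A_{11}^{-1}\|\,\hat\kappa\,\frac{\|E_{11}\|}{\|A_{11}\|}. \tag{3.18}$$

By Lemma 3.6

$$\|(J_{12}^\dagger - I_{12}^\dagger) A_{11}^{-1} I_{21}^\dagger\| \le \|A_{11}^{-1}\|\,\psi_\varphi(F_{12}) = \|A_{11}^{-1}\|\,\psi_\varphi(B_{11}^{-1} E_{12}) \le \|A_{11}^{-1}\|\,\psi_\varphi\!\left(\frac{\hat\kappa E_{12}}{\|A\|}\right), \tag{3.19}$$

and likewise

$$\|J_{12}^\dagger A_{11}^{-1} (J_{21}^\dagger - I_{21}^\dagger)\| \le \|A_{11}^{-1}\|\,\psi_\varphi\!\left(\frac{\hat\kappa E_{21}}{\|A\|}\right). \tag{3.20}$$

The bound (3.16) follows on combining (3.17), (3.18), (3.19), and (3.20) and remembering that $\|A_{11}\| = \|A\|$ and $\|A_{11}^{-1}\| = \|A^\dagger\|$. □

The bound (3.16) gives a rather nice dissection of $\|B^\dagger - A^\dagger\|$. Asymptotically, for $E_{12}$ and $E_{21}$ small, it reduces to the bound that would be obtained by taking norms in (3.4), i.e.,

$$\frac{\|B^\dagger - A^\dagger\|}{\|A^\dagger\|} \le \hat\kappa\,\frac{\|E_{11}\| + \|E_{12}\| + \|E_{21}\|}{\|A\|}.$$

However, the bound additionally shows that $E_{12}$ and $E_{21}$ can have at most a bounded effect on $\|B^\dagger - A^\dagger\|$.

When $A$ is square and nonsingular, $E_{12}$ and $E_{21}$ are void, and the bound reduces to that of Theorem 2.2. Note that the number $\hat\kappa$, defined in analogy with (2.17), plays an analogous role here.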
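The representation (3.14) and the bound (3.16) can be tried out on a small acute perturbation in reduced form. The sketch below is an illustration under several assumptions: real matrices, the spectral norm, hand-picked block sizes, and an $E_{22}$ block chosen as $F_{21} B_{11} F_{12}$ so that $\operatorname{rank}(B) = \operatorname{rank}(A)$.

```python
import numpy as np

rng = np.random.default_rng(6)
r, m, n = 2, 4, 5
A11 = np.diag([2.0, 1.0])
E11 = 0.05 * rng.standard_normal((r, r))
E12 = 0.3 * rng.standard_normal((r, n - r))
E21 = 0.3 * rng.standard_normal((m - r, r))

B11 = A11 + E11
B11_inv = np.linalg.inv(B11)
F21, F12 = E21 @ B11_inv, B11_inv @ E12
E22 = F21 @ B11 @ F12                    # forces rank(B) = r, cf. (3.15)

A = np.block([[A11, np.zeros((r, n - r))],
              [np.zeros((m - r, r)), np.zeros((m - r, n - r))]])
B = np.block([[B11, E12], [E21, E22]])

def norm2(M):
    return np.linalg.norm(M, 2)

def psi2(F):
    """Spectral-norm case of (3.10)."""
    t = norm2(F)
    return t / np.sqrt(1.0 + t**2)

I = np.eye(r)
# Right-hand side of (3.14).
B_pinv_formula = (np.linalg.pinv(np.hstack([I, F12])) @ B11_inv
                  @ np.linalg.pinv(np.vstack([I, F21])))
B_pinv = np.linalg.pinv(B, rcond=1e-10)
A_pinv = np.linalg.pinv(A, rcond=1e-10)

kappa_hat = norm2(A) * norm2(B11_inv)
lhs = norm2(B_pinv - A_pinv) / norm2(A_pinv)
rhs = (kappa_hat * norm2(E11) / norm2(A)
       + psi2(kappa_hat * E12 / norm2(A))
       + psi2(kappa_hat * E21 / norm2(A)))
```

Scaling `E12` or `E21` up by a large factor leaves the last two terms of `rhs` below 1, which is the bounded effect described in the text.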
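As in the second part of Theorem 2.2, if $E_{11}$ is sufficiently small, we can estimate $\|B_{11}^{-1}\|_2$ in terms of $\|A_{11}^{-1}\|_2$ and $\|E_{11}\|$. This gives the following corollary.

COROLLARY 3.9. In Theorem 3.8, let

$$\kappa = \|A\|\,\|A^\dagger\|_2 \tag{3.21}$$

and suppose that

$$\|A^\dagger\|_2\,\|E_{11}\| < 1,$$

so that

$$\gamma = 1 - \kappa\,\|E_{11}\|/\|A\| > 0.$$

Then

$$\|B^\dagger\| \le \|A^\dagger\|/\gamma \tag{3.22}$$

and

$$\frac{\|B^\dagger - A^\dagger\|}{\|A^\dagger\|} \le \frac{\kappa}{\gamma}\,\frac{\|E_{11}\|}{\|A\|} + \psi_\varphi\!\left(\frac{\kappa E_{12}}{\gamma\|A\|}\right) + \psi_\varphi\!\left(\frac{\kappa E_{21}}{\gamma\|A\|}\right). \tag{3.23}$$

Proof. From the equation $B^\dagger = J_{12}^\dagger B_{11}^{-1} J_{21}^\dagger$ we have

$$\|B^\dagger\| \le \|B_{11}^{-1}\|.$$

By Theorem 2.2

$$\|B_{11}^{-1}\| \le \|A_{11}^{-1}\|/\gamma = \|A^\dagger\|/\gamma,$$

which establishes (3.22). Also $\hat\kappa \le \kappa/\gamma$, and (3.23) follows from (3.16). □

The number $\kappa$ is defined in analogy with (2.21). For $E_{11}$ sufficiently small, $\hat\kappa \cong \kappa$, and (3.16) and (3.23) give essentially the same bound.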
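Asymptotic forms and derivatives. Asymptotic forms for $B^\dagger$ may be obtained from either (3.4) or (3.14). Of course for $B^\dagger$ to approach $A^\dagger$ we must have $\operatorname{rank}(A) = \operatorname{rank}(B)$; and since we are assuming that $E$ is arbitrarily small, $B$ may be assumed to be an acute perturbation of $A$. In this case

$$B^\dagger = A^\dagger + O(\|E\|),$$

and

$$P_B = B B^\dagger = (A + E)[A^\dagger + O(\|E\|)] = P_A + O(\|E\|)$$

with similar expressions for the other projections. Hence from (3.4)

$$B^\dagger = A^\dagger - A^\dagger P_A E R_A A^\dagger + (A^H A)^\dagger R_A E^H P_A^\perp + R_A^\perp E^H P_A (A A^H)^\dagger + O(\|E\|^2). \tag{3.24}$$

It follows immediately from (3.24) that if $A(\tau)$ is a differentiable function of $\tau$ with

$$\operatorname{rank}[A(\tau)] = \operatorname{rank}[A(\tau')]$$

for all $\tau$ and $\tau'$, then $A(\tau)^\dagger$ is a differentiable function of $\tau$ and

$$\frac{dA^\dagger}{d\tau} = -A^\dagger P_A \frac{dA}{d\tau} R_A A^\dagger + (A^H A)^\dagger R_A \frac{dA^H}{d\tau} P_A^\perp + R_A^\perp \frac{dA^H}{d\tau} P_A (A A^H)^\dagger. \tag{3.25}$$

The asymptotic form obtained from (3.14) can be useful computationally when $A$ has been put in reduced form as a preliminary to computing $A^\dagger$. We have from (3.24) that

$$B_{11}^{-1} = A_{11}^{-1} - A_{11}^{-1} E_{11} A_{11}^{-1} + O(\|E_{11}\|^2).$$

From (3.13) in the proof of Lemma 3.6 we have

$$\begin{pmatrix} I \\ F_{21} \end{pmatrix}^{\dagger} = (I \ \ A_{11}^{-H} E_{21}^H) + O(\|E_{11}\|\,\|E_{21}\|)$$

and

$$(I \ \ F_{12})^\dagger = \begin{pmatrix} I \\ E_{12}^H A_{11}^{-H} \end{pmatrix} + O(\|E_{11}\|\,\|E_{12}\|).$$

Hence from (3.14)

$$B^\dagger = \begin{pmatrix} A_{11}^{-1} - A_{11}^{-1} E_{11} A_{11}^{-1} + O(\|E_{11}\|^2) & (A_{11}^H A_{11})^{-1} E_{21}^H + O(\|E_{11}\|\,\|E_{21}\|) \\ E_{12}^H (A_{11} A_{11}^H)^{-1} + O(\|E_{12}\|\,\|E_{11}\|) & E_{12}^H (A_{11}^H A_{11} A_{11}^H)^{-1} E_{21}^H + O(\|E_{11}\|\,\|E_{12}\|\,\|E_{21}\|) \end{pmatrix}.$$

This expression is in perfect agreement with (3.24) when the $E_{ij}$ are interpreted appropriately as projections of $E$.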
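A first-order expansion of this kind can be tested by watching how its error scales with the size of the perturbation. The sketch below is a numerical experiment under stated assumptions: real matrices, a rank-2 matrix $A$ with singular values 2 and 1, and a rank-preserving path $B(t)$ obtained by perturbing the factors of $A$. The last term of the expansion is taken with a plus sign, and the Gram pseudo-inverses are formed from $(A^H A)^\dagger = A^\dagger (A^\dagger)^H$ and $(A A^H)^\dagger = (A^\dagger)^H A^\dagger$.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, r = 6, 5, 2
X, _ = np.linalg.qr(rng.standard_normal((m, r)))   # orthonormal columns
Y, _ = np.linalg.qr(rng.standard_normal((n, r)))
S = np.diag([2.0, 1.0])
X1, Y1 = rng.standard_normal((m, r)), rng.standard_normal((n, r))

def pinv(M):
    return np.linalg.pinv(M, rcond=1e-10)

A = X @ S @ Y.T
A_pinv = pinv(A)
P_A, R_A = A @ A_pinv, A_pinv @ A
P_A_perp, R_A_perp = np.eye(m) - P_A, np.eye(n) - R_A

def first_order(E):
    """Right-hand side of the expansion without the O(||E||^2) remainder."""
    return (A_pinv
            - A_pinv @ P_A @ E @ R_A @ A_pinv
            + (A_pinv @ A_pinv.T) @ R_A @ E.T @ P_A_perp
            + R_A_perp @ E.T @ P_A @ (A_pinv.T @ A_pinv))

def remainder(t):
    B = (X + t * X1) @ S @ (Y + t * Y1).T    # rank stays r for small t
    return np.linalg.norm(pinv(B) - first_order(B - A))

err_coarse, err_fine = remainder(1e-3), remainder(1e-4)
```

Reducing `t` by a factor of 10 should reduce the remainder by roughly a factor of 100, as expected of a second-order term.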
Notes and references. For expository reasons the results of this section have not been presented in the historical order of their development. Penrose (1955) established Corollary 3.5 using techniques that do not give explicit perturbation bounds. The subject was revived by Golub and Wilkinson (1966), whose interest in stable algorithms for solving least squares problems [cf. Golub (1965)] led them to derive first-order perturbation bounds for least squares solutions (more of this later). The first perturbation bounds for the pseudo-inverse itself were given by Ben-Israel (1966), who restricts his class of perturbations so that (in reduced form) only $E_{11}$ is nonzero. More general theorems for acute perturbations were established by Hanson and Lawson (1969), Pereyra (1969), and Stewart (1969). Theorem 3.8 is a refinement and extension of Stewart's bound. An identity in terms of projections related to (3.14) is given by Wedin (1973), who uses it to derive bounds for acute perturbations.

The decompositions (3.3) and (3.4) and the consequent Theorem 3.4 are due to Wedin (1973). Theorem 3.3 is a slight extension of these results. Theorem 3.1 is also due to Wedin (1973), although a slightly restricted form of the result may be found in Stewart (1969). In an earlier report Wedin (1969) considers the sharpness of the constants $\lambda$ in Theorem 3.4 and shows that for the spectral norm $\lambda$ cannot be made smaller.

Early differentiability results have been given by Pavel-Parvu and Korganoff (1969) and Hearon and Evans (1968). Wedin (1969) derived the formula (3.25) as we did from the decomposition (3.4). The same result for functions of several variables was derived independently by Golub and Pereyra (1973) in connection with separable nonlinear least squares problems. For further references see Golub and Pereyra (1975).
4. Projections. In this section we shall consider how the projection $P_A$ varies
with $A$. Since $P_A = AA^\dagger$, it might be thought that the perturbation theory for $P_A$
could be derived from the theory developed in the last section for $A^\dagger$. However
this approach gives too much away, and sharper bounds may be obtained by
working directly with one of the decompositions of $B^\dagger$. In particular we shall work
with the decomposition (3.15) based on the reduced forms of $A$ and $B$.
If $\mathcal{R}(A)$ and $\mathcal{R}(B)$ are not acute, then $\|P_B - P_A\|_2 = 1$. Consequently we can
restrict ourselves to the case where $\mathcal{R}(A)$ and $\mathcal{R}(B)$ are acute. More particularly
we shall only consider the case where $B$ is an acute perturbation of $A$.
THEOREM 4.1. Let $B$ be an acute perturbation of $A$, and let $\hat\kappa$ be defined as in
Theorem 3.8. Then

(4.1)    $\|P_B - P_A\|_2 \le \dfrac{\hat\kappa\,\|E_{21}\|_2/\|A\|_2}{\left[1 + (\hat\kappa\,\|E_{21}\|_2/\|A\|_2)^2\right]^{1/2}} < 1.$
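Proof. With $F_{21}$ defined as in the last section we have [cf. (3.15)]

$\mathcal{R}(B) = \mathcal{R}\begin{pmatrix} I \\ F_{21} \end{pmatrix}.$

The matrix

$\begin{pmatrix} I \\ F_{21} \end{pmatrix}(I + F_{21}^H F_{21})^{-1}\begin{pmatrix} I & F_{21}^H \end{pmatrix}$

is a Hermitian idempotent whose column space is $\mathcal{R}(B)$; hence it is $P_B$. It follows
that

(4.2)    $P_B - P_A = \begin{pmatrix} (I + F_{21}^H F_{21})^{-1} - I & (I + F_{21}^H F_{21})^{-1}F_{21}^H \\ F_{21}(I + F_{21}^H F_{21})^{-1} & F_{21}(I + F_{21}^H F_{21})^{-1}F_{21}^H \end{pmatrix},$

from which it is easily verified that

(4.3)    $(P_B - P_A)^2 = \begin{pmatrix} F_{21}^H F_{21}(I + F_{21}^H F_{21})^{-1} & 0 \\ 0 & F_{21}(I + F_{21}^H F_{21})^{-1}F_{21}^H \end{pmatrix}.$

Now the nonzero singular values of the diagonal blocks in (4.3) are given by

$\sigma_i^2(F_{21})/[1 + \sigma_i^2(F_{21})],$

where the $\sigma_i(F_{21})$ are the nonzero singular values of $F_{21}$. The result follows from
the fact that the largest singular value $\sigma_1$ of $F_{21}$ satisfies

$\sigma_1(F_{21}) = \|F_{21}\|_2 \le \hat\kappa\,\|E_{21}\|_2/\|A\|_2.$  □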
In terms of projections, the bound (4.1) can be written in the form

$\|P_B - P_A\|_2 \le \dfrac{\hat\kappa\,\|P_A^\perp E R_A\|_2/\|A\|_2}{\left[1 + (\hat\kappa\,\|P_A^\perp E R_A\|_2/\|A\|_2)^2\right]^{1/2}}.$

The bound is interesting in several ways. First, it depends not at all on $E_{12}$ and $E_{22}$.
Second, its dependence on $E_{11}$ is only through the constant $\hat\kappa$. Third, the bound is
always less than unity. Finally, it goes to zero along with $E_{21}$. We may summarize
this last observation in the following corollary.
COROLLARY 4.2. Regarding $B$ as variable, a sufficient condition for

$\lim_{B \to A} P_B = P_A$

is that $A$ and $B$ are acute and

$\lim_{B \to A} P_A^\perp B R_A = 0.$
If the hypotheses of Corollary 3.9 are satisfied (i.e., if $\|A^\dagger\|_2\|E_{11}\|_2 < 1$) then we
may replace $\hat\kappa$ by $\kappa/\gamma$ in (4.1).
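As a numerical illustration of the bound (4.1), the following sketch builds a rank-preserving perturbation and compares $\|P_B - P_A\|_2$ with the right-hand side. The test matrices, the seed, and the helper `range_projector` are assumptions made for this illustration only; the reduced form is obtained here from the singular value decomposition of $A$, and real matrices are used so that transposes stand in for conjugate transposes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 4, 2

# Hypothetical test data: a rank-r matrix A and a nearby rank-r matrix B.
L = rng.standard_normal((m, r))
R = rng.standard_normal((r, n))
A = L @ R
B = (L + 1e-2 * rng.standard_normal((m, r))) @ (R + 1e-2 * rng.standard_normal((r, n)))
E = B - A

# Reduced form: U1 spans R(A), U2 its orthogonal complement, V1 spans R(A^H).
U, s, Vh = np.linalg.svd(A)
U1, U2, V1 = U[:, :r], U[:, r:], Vh[:r].T

B11 = U1.T @ B @ V1   # leading r x r block of U^H B V
E21 = U2.T @ E @ V1   # block that carries R(A^H) out of R(A)


def range_projector(M, rank):
    """Orthogonal projector onto the span of the leading `rank` left singular vectors."""
    Q = np.linalg.svd(M)[0][:, :rank]
    return Q @ Q.T


P_A = range_projector(A, r)
P_B = range_projector(B, r)

# t = kappa_hat * ||E21||_2 / ||A||_2, where kappa_hat = ||A||_2 * ||B11^{-1}||_2.
t = np.linalg.norm(np.linalg.inv(B11), 2) * np.linalg.norm(E21, 2)
lhs = np.linalg.norm(P_B - P_A, 2)
bound = t / np.sqrt(1.0 + t**2)
print(lhs, bound)
```

In this construction the left-hand side equals $\|F_{21}\|_2/(1+\|F_{21}\|_2^2)^{1/2}$, so any slack in the bound comes only from replacing $\|F_{21}\|_2$ by $\|E_{21}\|_2\|B_{11}^{-1}\|_2$.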
Asymptotic forms and derivatives. Asymptotic forms may be obtained in the
usual way from (4.2). Indeed

(4.4)    $P_B - P_A = \begin{pmatrix} O(\|E_{21}\|_2^2) & F_{21}^H + O(\|E_{21}\|_2^3) \\ F_{21} + O(\|E_{21}\|_2^3) & O(\|E_{21}\|_2^2) \end{pmatrix}.$

In terms of projections

(4.5)    $P_B = P_A + P_A^\perp E R_A A^\dagger + (A^\dagger)^H R_A E^H P_A^\perp + O(\|P_A^\perp E R_A\|_2^2).$
It follows that if $A(\tau)$ is differentiable and varies without changing rank, then $P_{A(\tau)}$
is differentiable and

(4.6)    $\dfrac{dP_A}{d\tau} = P_A^\perp \dfrac{dA}{d\tau} R_A A^\dagger + (A^\dagger)^H R_A \dfrac{dA^H}{d\tau} P_A^\perp.$
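The derivative formula (4.6) can be checked against a finite difference. The sketch below is only an illustration under stated assumptions: the curve `A_of` (a product of two affine factors, which keeps the rank at $r$ for generic data), the helper `pinv_rank`, and the step size are all choices made here, and real matrices are used so that $A^H = A^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 4, 2
L0, L1 = rng.standard_normal((m, r)), rng.standard_normal((m, r))
R0, R1 = rng.standard_normal((r, n)), rng.standard_normal((r, n))


def A_of(tau):
    """A hypothetical smooth curve of rank-r matrices."""
    return (L0 + tau * L1) @ (R0 + tau * R1)


def pinv_rank(M, rank):
    """Pseudo-inverse assembled from the leading `rank` singular triplets."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return Vh[:rank].T @ np.diag(1.0 / s[:rank]) @ U[:, :rank].T


def P_of(tau):
    """Orthogonal projector onto the column space of A(tau)."""
    M = A_of(tau)
    return M @ pinv_rank(M, r)


A = A_of(0.0)
dA = L1 @ R0 + L0 @ R1            # dA/dtau at tau = 0
A_dag = pinv_rank(A, r)
P_perp = np.eye(m) - A @ A_dag    # P_A^perp

# Right-hand side of (4.6); since R_A A^+ = A^+, the factor R_A is absorbed.
half = P_perp @ dA @ A_dag
dP_formula = half + half.T

h = 1e-6
dP_fd = (P_of(h) - P_of(-h)) / (2 * h)   # central difference approximation
err = np.linalg.norm(dP_formula - dP_fd)
print(err)
```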
Notes and references. Theorem 4.1 and its corollary appear to be new. The
expression (4.6) for the derivative of $P_A$ was first given by Golub and Pereyra
(1973). Per-Åke Wedin has pointed out to the author that the asymptotic form
(4.5) and the expression (4.6) can be derived from the identity

$P_B - P_A = P_B(I - P_A) - (I - P_B)P_A = (B^\dagger)^H R_B E^H P_A^\perp + P_B^\perp E R_A A^\dagger.$
5. The linear least squares problem. In this section we shall derive perturbation
bounds for the least squares problem of minimizing $\|b - Ax\|_2$. Although the
solution of minimum norm is given by $x = A^\dagger b$, the perturbation theory of § 3
again does not give the best possible results.
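We shall assume throughout this section that $B$ is an acute perturbation of $A$,
and we shall work with the reduced form of the problem. In this form $x$ is replaced
by $V^H x$ and $b$ is replaced by $U^H b$ (cf. § 2). If $x$ and $b$ are partitioned in the forms

$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix},$

where $x_1, b_1 \in \mathbb{C}^r$, then

(5.1)    $x_1 = A_{11}^{-1} b_1$

and

$x_2 = 0.$

Moreover the norm of the residual vector

$r = b - Ax$

is given by

$\|r\|_2 = \|b_2\|_2.$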
In the theorems to follow we shall freely use the definitions made in the
previous sections (e.g., $\hat\kappa$, $\kappa$ and $\gamma$). As in §§ 3 and 4 the number $\hat\kappa$ may be replaced
by $\kappa/\gamma$ whenever $\|A^\dagger\|_2\|E_{11}\|_2 < 1$. One additional piece of notation will be needed;
namely, we shall define $\eta$ as that nonnegative constant such that

$\|b_1\|_2 = \eta\,\|A\|_2\|x\|_2.$

Since $b_1 = A_{11}x_1$, we have $\eta \le 1$. Also $\|x\|_2 \le \|A^\dagger\|_2\|b_1\|_2$, which shows that
$\eta \ge \kappa^{-1}$.
When $A$ is ill-conditioned, that is when $A^\dagger$ is large, the vector $x$ may be either
large or small. In the first case $\eta$ is near zero, and we shall say that "$x$ reflects the
ill-conditioning of $A$."
We first consider perturbations in the vector $b$.
THEOREM 5.1. Let $x = A^\dagger b$ and $x + h = A^\dagger(b + k)$. Then

(5.2)    $\dfrac{\|h\|_2}{\|x\|_2} \le \kappa\eta\,\dfrac{\|P_A k\|_2}{\|P_A b\|_2}.$
Proof. With the obvious partitioning of $k$ we have $h = A_{11}^{-1}k_1$, so that

(5.3)    $\|h\|_2 \le \|A_{11}^{-1}\|_2\,\|k_1\|_2.$

But $\|x\|_2 = \eta^{-1}\|b_1\|_2/\|A\|_2$, which combined with (5.3) yields (5.2).  □
Theorem 5.1 shows that the perturbation in $x$ is determined by the projection
of $k$ onto $\mathcal{R}(A)$. However, $P_A k$ is normalized by $\|P_A b\|_2$, and if this latter quantity
is small, the perturbation may be large. Since

$\|b\|_2^2 = \|P_A b\|_2^2 + \|r\|_2^2,$

this observation may be summarized by saying that large residuals are troublesome,
a statement which will be amply supported later.

Since $\eta$ can be as small as $\kappa^{-1}$, the number $\kappa$ cannot be taken as a condition
number for perturbations in $b$ without further qualification. If $x$ does not reflect
the ill-conditioning of $A$, then $\eta$ is near unity and $\kappa$ is a condition number.
Otherwise the solution will be relatively insensitive to perturbations in $b$.
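A small numerical sketch may help to fix the roles of $\kappa$ and $\eta$ in (5.2). The data below are hypothetical (a random full-column-rank $A$, a random $b$, and a small perturbation $k$); the code only evaluates both sides of (5.2) and the range $\kappa^{-1} \le \eta \le 1$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 3
A = rng.standard_normal((m, n))      # full column rank (hypothetical data)
b = rng.standard_normal(m)
k = 1e-3 * rng.standard_normal(m)    # perturbation of the right-hand side

A_dag = np.linalg.pinv(A)
P_A = A @ A_dag                      # orthogonal projector onto R(A)
x = A_dag @ b
h = A_dag @ (b + k) - x              # change in the least squares solution

norm_A = np.linalg.norm(A, 2)
kappa = norm_A * np.linalg.norm(A_dag, 2)
# eta is defined by ||b_1||_2 = eta ||A||_2 ||x||_2, and ||b_1||_2 = ||P_A b||_2.
eta = np.linalg.norm(P_A @ b) / (norm_A * np.linalg.norm(x))

lhs = np.linalg.norm(h) / np.linalg.norm(x)
rhs = kappa * eta * np.linalg.norm(P_A @ k) / np.linalg.norm(P_A @ b)
print(lhs, rhs, eta, 1.0 / kappa)
```

Note that $\kappa\eta\,\|P_A k\|_2/\|P_A b\|_2$ simplifies to $\|A^\dagger\|_2\|P_A k\|_2/\|x\|_2$, which is why the bound tightens when $x$ is large relative to $b_1$.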
We next turn to assessing the effects on $x$ of a perturbation in $A$.
THEOREM 5.2. Let $x = A^\dagger b$ and $x + h = B^\dagger b$, where $B = A + E$ is an acute
perturbation of $A$. Then
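(5.4)    $\dfrac{\|h\|_2}{\|x\|_2} \le \hat\kappa\,\dfrac{\|E_{11}\|_2}{\|A\|_2} + \psi_2\!\left(\hat\kappa\,\dfrac{\|E_{12}\|_2}{\|A\|_2}\right) + \hat\kappa^2\eta\,\dfrac{\|E_{21}\|_2}{\|A\|_2}\left(\dfrac{\|b_2\|_2}{\|b_1\|_2} + \hat\kappa\,\dfrac{\|E_{21}\|_2}{\|A\|_2}\right).$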
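Proof. Write

(5.5)    $h = \hat I_{12}^\dagger(B_{11}^{-1} - A_{11}^{-1})b_1 + (\hat I_{12}^\dagger - I_{12}^\dagger)A_{11}^{-1}b_1 + \hat I_{12}^\dagger B_{11}^{-1}(\hat I_{21}^\dagger - I_{21}^\dagger)b.$

Then

(5.6)    $\|\hat I_{12}^\dagger(B_{11}^{-1} - A_{11}^{-1})b_1\|_2 \le \hat\kappa\,\dfrac{\|E_{11}\|_2}{\|A\|_2}\,\|x\|_2,$

and

(5.7)    $\|(\hat I_{12}^\dagger - I_{12}^\dagger)A_{11}^{-1}b_1\|_2 \le \psi_2\!\left(\hat\kappa\,\dfrac{\|E_{12}\|_2}{\|A\|_2}\right)\|x\|_2.$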
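Now

(5.8)    $\hat I_{12}^\dagger B_{11}^{-1}(\hat I_{21}^\dagger - I_{21}^\dagger)b = \hat I_{12}^\dagger B_{11}^{-1}[(I + F_{21}^H F_{21})^{-1} - I]b_1 + \hat I_{12}^\dagger B_{11}^{-1}(I + F_{21}^H F_{21})^{-1}F_{21}^H b_2.$

To bound the first term in (5.8), note that

$(I + F_{21}^H F_{21})^{-1} - I = -(I + F_{21}^H F_{21})^{-1}F_{21}^H F_{21}.$

Hence

(5.9)    $\|\hat I_{12}^\dagger B_{11}^{-1}[(I + F_{21}^H F_{21})^{-1} - I]b_1\|_2 \le \|B_{11}^{-1}\|_2\,\|(I + F_{21}^H F_{21})^{-1}F_{21}^H\|_2\,\|F_{21}b_1\|_2 \le \|B_{11}^{-1}\|_2\,\|F_{21}\|_2^2\,\|b_1\|_2 \le \hat\kappa\eta\left[\hat\kappa\,\dfrac{\|E_{21}\|_2}{\|A\|_2}\right]^2\|x\|_2.$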
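For the second term in (5.8) we have

(5.10)    $\|\hat I_{12}^\dagger B_{11}^{-1}(I + F_{21}^H F_{21})^{-1}F_{21}^H b_2\|_2 \le \|B_{11}^{-1}\|_2\,\|F_{21}\|_2\,\|b_2\|_2 \le \|B_{11}^{-1}\|_2^2\,\|E_{21}\|_2\,\|b_2\|_2 = \hat\kappa^2\eta\,\dfrac{\|E_{21}\|_2}{\|A\|_2}\,\dfrac{\|b_2\|_2}{\|b_1\|_2}\,\|x\|_2.$

The bound (5.4) follows on combining (5.5)-(5.10).  □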
The first two terms in (5.4) are unexceptionable. The first term corresponds to
the classical result for linear systems and is the only nonzero term when $A$ is
square and nonsingular. The second term depends on $P_A E R_A^\perp$ and vanishes when
$A$ is of full column rank, as it is in many applications.
The third term requires more explanation. If terms of second order in $\|E_{21}\|_2$
are ignored, this expression becomes essentially

(5.11)    $\hat\kappa^2\eta\,\dfrac{\|b_2\|_2}{\|b_1\|_2}\,\dfrac{\|E_{21}\|_2}{\|A\|_2} = \hat\kappa^2\eta\tan\theta\,\dfrac{\|E_{21}\|_2}{\|A\|_2},$

where $\theta$ is the angle subtended by $b$ and $\mathcal{R}(A)$. The number $\hat\kappa^2\eta\tan\theta$ can vary
from 0 to $\infty$. It is small when $\theta$ is small (i.e. the residual vector is small). It is also
reduced in size when $\|E_{11}\|_2$ is small and $x$ reflects the ill-conditioning of $A$ so that
$\hat\kappa\eta \cong 1$. When $x$ does not reflect the ill-conditioning of $A$ and $\theta$ is
significant, it is of order $\kappa^2$, thus making the third term in (5.4) the dominant
one.
We have bounded the third term in the decomposition (5.5) in such a way as
to reflect its behavior when $E_{21}$ is small. In fact it is bounded for all values of $E_{21}$,
and the third term in (5.4) may be replaced by
$\hat\kappa\eta\,\dfrac{\|b\|_2}{\|b_1\|_2}\,\psi_2\!\left(\hat\kappa\,\dfrac{\|E_{21}\|_2}{\|A\|_2}\right).$
The residual. Since the residual vector is given by $r = P_A^\perp b$, the theory of § 4
may be applied to give perturbation bounds for the residual. Specifically, if

$\tilde x = B^\dagger b$

and

$\tilde r = b - B\tilde x = P_B^\perp b,$

then

$\|\tilde r - r\|_2 \le \|P_B - P_A\|_2\,\|b\|_2,$

and $\|P_B - P_A\|_2$ can be bounded by (4.1) in Theorem 4.1.
In applications one may not be interested in $\tilde r$; rather one is interested in the
residual $\hat r$ of $\tilde x$ with respect to the matrix $A$:

$\hat r = b - A\tilde x.$

If we write

$\hat r - r = (P_A - P_B)b + E\tilde x,$

then

$\|\hat r - r\|_2 \le \|P_B - P_A\|_2\,\|b\|_2 + \|E\|_2\,\|\tilde x\|_2.$

Theorem 5.2 provides the necessary estimate of $\|\tilde x\|_2$.
If we concern ourselves with only the change in $\|r\|_2$ we can derive a slightly
stronger result. Since $r$ is the minimizing residual, we have $\|r\|_2 \le \|\hat r\|_2$. Likewise
$\|b - (A + E)\tilde x\|_2 \le \|b - (A + E)x\|_2$, from which it follows that

$\|r\|_2 \le \|\hat r\|_2 \le \|r\|_2 + \|E\|_2(\|x\|_2 + \|\tilde x\|_2).$
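The two-sided estimate for the residual norm is easy to confirm numerically. The sketch below uses hypothetical random data with full column rank, so that `numpy.linalg.lstsq` returns $A^\dagger b$ and $B^\dagger b$; here `x_t` plays the role of the perturbed solution and `r_hat_norm` is the norm of its residual with respect to $A$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 3
A = rng.standard_normal((m, n))
E = 1e-2 * rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = np.linalg.lstsq(A, b, rcond=None)[0]        # minimizer for A
x_t = np.linalg.lstsq(A + E, b, rcond=None)[0]  # minimizer for B = A + E

r_norm = np.linalg.norm(b - A @ x)              # minimal residual norm
r_hat_norm = np.linalg.norm(b - A @ x_t)        # residual of x_t with respect to A
upper = r_norm + np.linalg.norm(E, 2) * (np.linalg.norm(x) + np.linalg.norm(x_t))
print(r_norm, r_hat_norm, upper)
```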
Asymptotic forms and derivatives. An asymptotic form for the perturbed
least squares solution $\tilde x$ can be obtained from (3.4):

(5.12)    $\tilde x = x - A^\dagger P_A E R_A x + R_A^\perp E^H P_A (A^H)^\dagger x + (A^H A)^\dagger R_A E^H P_A^\perp b + O(\|E\|_2^2).$

An equivalent asymptotic formula, which may be useful in computational work,
can be derived from the reduced form (3.23). The derivative formula corresponding
to (5.12) is

$\dfrac{dx}{d\tau} = -A^\dagger P_A \dfrac{dA}{d\tau} R_A x + R_A^\perp \dfrac{dA^H}{d\tau} P_A (A^H)^\dagger x + (A^H A)^\dagger R_A \dfrac{dA^H}{d\tau} P_A^\perp b.$
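The first-order expansion of the perturbed solution can be checked on a rank-deficient example. The sketch below is an illustration under assumptions made here: the rank-preserving perturbation is built from perturbed factors, `pinv_rank` is a helper written for this example, and real matrices are used. It uses the simplifications $A^\dagger P_A = A^\dagger$, $R_A x = x$, $(A^H)^\dagger = (A^\dagger)^H$ and $(A^H A)^\dagger = A^\dagger (A^\dagger)^H$, and it takes the sign of each term from the identity $B^\dagger - A^\dagger = -B^\dagger E A^\dagger + B^\dagger P_A^\perp - R_B^\perp A^\dagger$.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, r = 6, 4, 2
L0, L1 = rng.standard_normal((m, r)), rng.standard_normal((m, r))
R0, R1 = rng.standard_normal((r, n)), rng.standard_normal((r, n))
b = rng.standard_normal(m)


def pinv_rank(M, rank):
    """Pseudo-inverse assembled from the leading `rank` singular triplets."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return Vh[:rank].T @ np.diag(1.0 / s[:rank]) @ U[:, :rank].T


A = L0 @ R0
A_dag = pinv_rank(A, r)
R_A = A_dag @ A                       # projector onto R(A^H)
x = A_dag @ b

t = 1e-5
B = (L0 + t * L1) @ (R0 + t * R1)     # still rank r, so B is an acute perturbation for small t
E = B - A
x_t = pinv_rank(B, r) @ b             # perturbed minimum norm solution

# First-order terms of the expansion of x_t about x.
first_order = (
    -A_dag @ E @ x
    + (np.eye(n) - R_A) @ E.T @ A_dag.T @ x
    + A_dag @ A_dag.T @ E.T @ (b - A @ x)
)
change = np.linalg.norm(x_t - x)
err = np.linalg.norm(x_t - x - first_order)
print(change, err)
```

The remainder `err` should be of second order in `t`, hence far smaller than `change`.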
An inverse perturbation theorem. Theorem 5.2 shows how a perturbation in
$A$ can affect the least squares solution. Here we consider the question: given a
vector $\tilde x$, under what conditions is $\tilde x$ the least squares solution of a slightly
perturbed problem? One such condition is given in the following theorem.
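THEOREM 5.3. Let $\tilde x \in \mathbb{C}^n$ be given. Let $x = A^\dagger b$, $r = b - Ax$, and $\hat r = b - A\tilde x$. If

$\|\hat r\|_2^2 = \|r\|_2^2 + \epsilon^2,$

then there is a matrix $E$ of rank unity with

(5.13)    $\|E\|_2 = \epsilon/\|\tilde x\|_2$

such that $\|b - (A + E)\tilde x\|_2$ is a minimum.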

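Proof. Let

$e = \hat r - r = A(x - \tilde x) \in \mathcal{R}(A).$

Since $r \in \mathcal{R}(A)^\perp$,

$\|\hat r\|_2^2 = \|r\|_2^2 + \|e\|_2^2,$

which shows that $\|e\|_2 = \epsilon$. Let

$E = e\tilde x^H/\|\tilde x\|_2^2.$

Then $E$ satisfies (5.13) and $\mathcal{R}(E) \subset \mathcal{R}(A)$. Hence $\mathcal{R}(A + E) \subset \mathcal{R}(A)$. But

$b - (A + E)\tilde x = \hat r - e = r \in \mathcal{R}(A)^\perp,$

which shows that the residual $b - (A + E)\tilde x \in \mathcal{R}(A + E)^\perp$, and $\tilde x$ solves the
required least squares problem.  □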
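The construction in the proof is short enough to try directly. In the sketch below the data and the approximate solution `x_t` are hypothetical; the code forms the rank-one matrix $E = e\tilde x^H/\|\tilde x\|_2^2$ and checks that `x_t` is returned as the least squares solution for $A + E$ and that $\|E\|_2 = \epsilon/\|\tilde x\|_2$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = np.linalg.lstsq(A, b, rcond=None)[0]   # exact minimizer
x_t = x + 1e-2 * rng.standard_normal(n)    # an arbitrary approximate solution

r = b - A @ x
r_hat = b - A @ x_t
eps = np.sqrt(np.linalg.norm(r_hat) ** 2 - np.linalg.norm(r) ** 2)

e = A @ (x - x_t)                          # e = r_hat - r lies in R(A)
E = np.outer(e, x_t) / np.dot(x_t, x_t)    # rank-one perturbation from the proof

x_check = np.linalg.lstsq(A + E, b, rcond=None)[0]
norm_E = np.linalg.norm(E, 2)
print(norm_E, eps / np.linalg.norm(x_t))
```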
A consequence of this theorem is that there is little use hunting for the exact
minimizing $x$. Provided the residual is nearly minimal, the approximate solution $\tilde x$,
however inaccurate, is the exact solution of a slightly perturbed problem.
It is sometimes desirable that the perturbation matrix $E$ in Theorem 5.3 not
alter some of the columns of $A$ (e.g. a column may be dates in years). This can be
done as follows. Let $\bar x$ be the vector obtained from $\tilde x$ by setting to zero the
components corresponding to the columns that are not to be disturbed. Then

$\bar E = e\bar x^H/\|\bar x\|_2^2$

is the required matrix. Of course $\|\bar x\|_2 \le \|\tilde x\|_2$ so that $\|\bar E\|_2 \ge \|E\|_2$; however $\|\bar E\|_2$ may
still be small enough for practical purposes.
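The column-preserving variant can be sketched in the same way. The choice of protected column and all data below are hypothetical; the check is that the protected column of the perturbation is exactly zero, that `x_t` still satisfies the normal equations for the perturbed matrix, and that the price is a norm no smaller than that of the unrestricted $E$.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = np.linalg.lstsq(A, b, rcond=None)[0]
x_t = x + 1e-2 * rng.standard_normal(n)    # approximate solution

protected = [0]                            # hypothetical: column 0 must not change
x_bar = x_t.copy()
x_bar[protected] = 0.0                     # zero the protected components

e = A @ (x - x_t)
E = np.outer(e, x_t) / np.dot(x_t, x_t)            # unrestricted perturbation
E_bar = np.outer(e, x_bar) / np.dot(x_bar, x_bar)  # restricted perturbation

B_bar = A + E_bar
grad = B_bar.T @ (b - B_bar @ x_t)   # zero exactly when x_t solves the normal equations
print(np.linalg.norm(grad), np.linalg.norm(E, 2), np.linalg.norm(E_bar, 2))
```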
Notes and references. Much of the perturbation theory for pseudo-inverses
has been a byproduct of the search for bounds for the linear least squares problem.
Golub and Wilkinson (1966) gave a first order analysis of the problem and were
the first to note the dependence of the solution on $\kappa^2$. Rigorous upper bounds
were derived by Hanson and Lawson (1969), Pereyra (1969), and Stewart (1969).
Wedin (1969) also gives bounds. More recent treatments have been given by
Lawson and Hanson (1974) and Abdelmalek (1974). Van der Sluis (1975) was the
first to point out the mitigating effect of $\eta$ in (5.11).
The inverse perturbation theorem is new.
Appendix. In this Appendix, we shall give a proof of part one of Theorem
2.3 that is based on a general decomposition of unitary matrices, a decomposition
that is of independent interest.
In establishing the decomposition, we shall mark a matrix $A$ with the
dimensions $m$ and $n$ to mean $A \in \mathbb{C}^{m \times n}$.
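THEOREM A.1. Let the unitary matrix $W \in \mathbb{C}^{n \times n}$ be partitioned in the form

$W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}$  (row and column blocks of sizes $r$ and $n - r$),

where $W_{11} \in \mathbb{C}^{r \times r}$ with $r \le n/2$. Then there are unitary matrices $U = \operatorname{diag}(U_1, U_2)$
and $V = \operatorname{diag}(V_1, V_2)$, with diagonal blocks of orders $r$ and $n - r$, such that

(A.1)    $U^H W V = \begin{pmatrix} \Gamma & -\Sigma & 0 \\ \Sigma & \Gamma & 0 \\ 0 & 0 & I \end{pmatrix}$  (row and column blocks of sizes $r$, $r$, and $n - 2r$),

where

$\Gamma = \operatorname{diag}(\gamma_1, \gamma_2, \ldots, \gamma_r) \ge 0$

and

$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r) \ge 0.$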
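Proof. Let

$\Gamma = U_1^H W_{11} V_1$

be the singular value decomposition of $W_{11}$, with the diagonal elements of $\Gamma$
ordered so that $\gamma_1 \le \gamma_2 \le \cdots \le \gamma_k < 1 = \gamma_{k+1} = \cdots = \gamma_r$; i.e.,

$\Gamma = \operatorname{diag}(\Gamma', I_{r-k}),$

where the diagonal elements of $\Gamma'$ are less than unity. The matrix

$\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_1$

has orthonormal columns. Hence

$I = \left[\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_1\right]^H \left[\begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_1\right] = \Gamma^2 + (W_{21}V_1)^H(W_{21}V_1).$

Since $I$ and $\Gamma^2$ are diagonal matrices, so is $(W_{21}V_1)^H(W_{21}V_1)$, which says that the
columns of $W_{21}V_1$ are orthogonal. Since the $i$th diagonal entry of $I - \Gamma^2$ is the
square of the norm of the $i$th column of $W_{21}V_1$, only the first $k \le r \le n - r$ columns of $W_{21}V_1$
are nonzero. Let $U_2 \in \mathbb{C}^{(n-r) \times (n-r)}$ be any unitary matrix whose first $k$ columns are
the normalized columns of $W_{21}V_1$. Then

$U_2^H W_{21} V_1 = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix},$

where

$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0) \equiv \operatorname{diag}(\Sigma', 0_{r-k}).$

Since

$\operatorname{diag}(U_1, U_2)^H \begin{pmatrix} W_{11} \\ W_{21} \end{pmatrix} V_1 = \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix}$

has orthonormal columns, we must have

(A.2)    $\gamma_i^2 + \sigma_i^2 = 1 \qquad (i = 1, 2, \ldots, r).$

In particular, $\Sigma'$ is nonsingular.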
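In a like manner we may determine a unitary matrix $V_2 \in \mathbb{C}^{(n-r) \times (n-r)}$ such
that

$U_1^H W_{12} V_2 = (T, 0),$

where $T = \operatorname{diag}(\tau_1, \tau_2, \ldots, \tau_r)$ and $\tau_i \le 0$ $(i = 1, 2, \ldots, r)$. Since, as above,
$\gamma_i^2 + \tau_i^2 = 1$, it follows from (A.2) that $T = -\Sigma$.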
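Set $U = \operatorname{diag}(U_1, U_2)$ and $V = \operatorname{diag}(V_1, V_2)$. Then the foregoing shows that
the matrix

$X = U^H W V$

can be partitioned in the form

(A.3)    $X = \begin{pmatrix} \Gamma' & 0 & -\Sigma' & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \Sigma' & 0 & X_{33} & X_{34} & X_{35} \\ 0 & 0 & X_{43} & X_{44} & X_{45} \\ 0 & 0 & X_{53} & X_{54} & X_{55} \end{pmatrix}$  (row and column blocks of sizes $k$, $r-k$, $k$, $r-k$, and $n-2r$).

Since columns 1 and 4 in the partition (A.3) are orthogonal, we have

$\Sigma' X_{34} = 0,$

and since $\Sigma'$ is nonsingular, we have $X_{34} = 0$. Likewise $X_{35}$, $X_{43}$, and $X_{53}$ are zero.
From the orthogonality of columns 1 and 3 in (A.3) it follows that

$-\Gamma'\Sigma' + \Sigma' X_{33} = 0,$

from which it follows that $X_{33} = \Gamma'$.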
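The matrix $X$ is thus seen to have the form

$X = \begin{pmatrix} \Gamma' & 0 & -\Sigma' & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \Sigma' & 0 & \Gamma' & 0 & 0 \\ 0 & 0 & 0 & X_{44} & X_{45} \\ 0 & 0 & 0 & X_{54} & X_{55} \end{pmatrix}$  (row and column blocks of sizes $k$, $r-k$, $k$, $r-k$, and $n-2r$).

The matrix

$U_3 = \begin{pmatrix} X_{44} & X_{45} \\ X_{54} & X_{55} \end{pmatrix}$

is unitary. Set

$\hat U_2 = U_2 \operatorname{diag}(I_k, U_3)$

and $\hat U = \operatorname{diag}(U_1, \hat U_2)$. Then

$\hat U^H W V = \operatorname{diag}(I_{r+k}, U_3^H)X = \begin{pmatrix} \Gamma' & 0 & -\Sigma' & 0 & 0 \\ 0 & I & 0 & 0 & 0 \\ \Sigma' & 0 & \Gamma' & 0 & 0 \\ 0 & 0 & 0 & I & 0 \\ 0 & 0 & 0 & 0 & I \end{pmatrix},$

which, considering the dimensions of the matrices involved, is precisely (A.1).  □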
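To establish part one of Theorem 2.3, let $\mathcal{X} = \mathcal{R}(A)$ and $\mathcal{Y} = \mathcal{R}(B)$ and let $\mathcal{X}^\perp$
and $\mathcal{Y}^\perp$ denote their orthogonal complements. Assume that

$r = \dim(\mathcal{X}) = \dim(\mathcal{Y}) \le m/2$

(there is no loss of generality in assuming the last inequality, since in the sequel we
can also work with $\mathcal{X}^\perp$ and $\mathcal{Y}^\perp$). Let $X = (X_1, X_2)$ and $Y = (Y_1, Y_2)$ be unitary
matrices with $\mathcal{R}(X_1) = \mathcal{X}$ and $\mathcal{R}(Y_1) = \mathcal{Y}$. Let

$W = X^H Y = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix}$

be partitioned conformally with $X$ and $Y$. If $U = \operatorname{diag}(U_1, U_2)$ and $V = \operatorname{diag}(V_1, V_2)$
are the matrices whose existence is insured by Theorem A.1 and we set

$\hat X_i = X_i U_i \quad (i = 1, 2), \qquad \hat X = (\hat X_1, \hat X_2)$

and

$\hat Y_i = Y_i V_i \quad (i = 1, 2), \qquad \hat Y = (\hat Y_1, \hat Y_2),$

then

$\hat X_1^H \hat Y_1 = \Gamma, \qquad \hat X_1^H \hat Y_2 = (-\Sigma, 0), \qquad \hat X_2^H \hat Y_1 = \begin{pmatrix} \Sigma \\ 0 \end{pmatrix}, \qquad \hat X_2^H \hat Y_2 = \begin{pmatrix} \Gamma & 0 \\ 0 & I_{n-2r} \end{pmatrix}.$

Note that $\mathcal{R}(\hat X_1) = \mathcal{X}$ and $\mathcal{R}(\hat Y_1) = \mathcal{Y}$.
If we now make the transformation $\mathbb{C}^n \to \hat X^H \mathbb{C}^n$, the bases $\hat X$ and $\hat Y$ become

(A.4)    $\hat X_1 = \begin{pmatrix} I_r \\ 0 \\ 0 \end{pmatrix}, \quad \hat X_2 = \begin{pmatrix} 0 & 0 \\ I_r & 0 \\ 0 & I_{n-2r} \end{pmatrix}, \quad \hat Y_1 = \begin{pmatrix} \Gamma \\ \Sigma \\ 0 \end{pmatrix}, \quad \hat Y_2 = \begin{pmatrix} -\Sigma & 0 \\ \Gamma & 0 \\ 0 & I_{n-2r} \end{pmatrix},$

and it is with these bases that we shall prove the first part of Theorem 2.3. First
note that

$P_A P_B^\perp = (\hat X_1 \hat X_1^H)(\hat Y_2 \hat Y_2^H) = \begin{pmatrix} \Sigma^2 & -\Gamma\Sigma & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$

Likewise

$P_B P_A^\perp = (\hat Y_1 \hat Y_1^H)(\hat X_2 \hat X_2^H) = \begin{pmatrix} 0 & \Gamma\Sigma & 0 \\ 0 & \Sigma^2 & 0 \\ 0 & 0 & 0 \end{pmatrix},$

and the nonzero singular values of both matrices are easily seen to be the numbers
$\sigma_i$. Now consider

$P_B - P_A = \hat Y_1 \hat Y_1^H - \hat X_1 \hat X_1^H = \begin{pmatrix} \Gamma^2 - I & \Gamma\Sigma & 0 \\ \Gamma\Sigma & \Sigma^2 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} -\Sigma^2 & \Gamma\Sigma & 0 \\ \Gamma\Sigma & \Sigma^2 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$

The nonzero eigenvalues of this matrix are the eigenvalues of the $2 \times 2$ matrices

$\begin{pmatrix} -\sigma_i^2 & \gamma_i\sigma_i \\ \gamma_i\sigma_i & \sigma_i^2 \end{pmatrix},$

which are easily seen to be $\pm\sigma_i$.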
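The conclusion of this argument can be observed numerically without forming the decomposition (A.1) explicitly. The sketch below, which uses hypothetical random subspaces and real arithmetic, computes the cosines $\gamma_i$ as the singular values of $X_1^H Y_1$, sets $\sigma_i = (1 - \gamma_i^2)^{1/2}$, and compares them with the eigenvalues of $P_B - P_A$ and the singular values of $P_A P_B^\perp$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, r = 7, 3                                          # r <= n/2
X1 = np.linalg.qr(rng.standard_normal((n, r)))[0]    # orthonormal basis for R(A)
Y1 = np.linalg.qr(rng.standard_normal((n, r)))[0]    # orthonormal basis for R(B)

gamma = np.linalg.svd(X1.T @ Y1, compute_uv=False)   # cosines of the canonical angles
sigma = np.sqrt(np.clip(1.0 - gamma**2, 0.0, None))  # sines of the canonical angles

P_A = X1 @ X1.T
P_B = Y1 @ Y1.T

# Eigenvalues of P_B - P_A: +sigma_i, -sigma_i, and n - 2r zeros.
eigs = np.sort(np.linalg.eigvalsh(P_B - P_A))
expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(n - 2 * r)]))

# Nonzero singular values of P_A P_B^perp are the sigma_i.
sv = np.linalg.svd(P_A @ (np.eye(n) - P_B), compute_uv=False)[:r]
print(eigs, expected)
```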
Notes and references. The matrix decomposition in Theorem A.1 has
apparently not been explicitly stated before; however, it is implicit in the works of
Davis and Kahan (1970) and Bjork and Golub (1973). The diagonal elements of $\Gamma$
are the cosines of the "canonical angles" between the subspaces $\mathcal{R}(A)$ and $\mathcal{R}(B)$
and the columns of $\hat X_1$ and $\hat Y_1$ form biorthogonal bases subtending these angles.
The use of these canonical bases, particularly when they have been transformed
into the forms (A.4), often enables one to obtain routine computational proofs of
geometric theorems that would otherwise require considerable ingenuity to
establish.
REFERENCES
N. N. ABDELMALEK (1974), On the solution of the linear least squares problem and pseudo-inverses,
Computing, 13, pp. 215-228.
S. N. AFRIAT (1957), Orthogonal and oblique projectors and the characteristics of pairs of vector spaces,
Proc. Cambridge Philos. Soc., 53, pp. 800-816.
A. BEN-ISRAEL (1966), On error bounds for generalized inverses, SIAM J. Numer. Anal., 3, pp.
585-592.
A. BEN-ISRAEL AND T. N. E. GREVILLE (1974), Generalized Inverses: Theory and Applications,
John Wiley, New York.
A. BJORK AND G. H. GOLUB (1973), Numerical methods for computing angles between linear
subspaces, Math. Comp., 27, pp. 579-594.
T. L. BOULLION AND P. L. ODELL (1971), Generalized Inverse Matrices, John Wiley, New York.
CHANDLER DAVIS AND W. M. KAHAN (1970), The rotation of eigenvectors by a perturbation. III,
SIAM J. Numer. Anal., 7, pp. 1-46.
I. C. GOHBERG AND M. G. KREIN (1969), Introduction to the Theory of Nonself-adjoint Operators,
American Mathematical Society, Providence, R.I.
G. H. GOLUB (1965), Numerical methods for solving linear least squares problems, Numer. Math., 7,
pp. 206-216.
G. H. GOLUB AND J. H. WILKINSON (1966), Note on the iterative refinement of least squares solution,
Numer. Math., 9, pp. 139-148.
G. H. GOLUB AND V. PEREYRA (1973), The differentiation of pseudoinverses and nonlinear least
squares problems whose variables separate, SIAM J. Numer. Anal., 10, pp. 413-432.
G. H. GOLUB AND V. PEREYRA (1975), Differentiation of pseudoinverses, separable nonlinear least
squares problems, and other tales, manuscript.
R. J. HANSON AND C. L. LAWSON (1969), Extensions and applications of the Householder algorithm
for solving linear least squares problems, Math. Comp., 23, pp. 787-812.
J. Z. HEARON AND J. W. EVANS (1968), Differentiable generalized inverses, J. Res. Nat. Bur. Stand.,
Sect. B, 72B, pp. 109-113.
A. S. HOUSEHOLDER (1964), The Theory of Matrices in Numerical Analysis, Dover, New York.
T. KATO (1966), Perturbation Theory for Linear Operators, Springer-Verlag, Berlin.
C. L. LAWSON AND R. J. HANSON (1974), Solving Least Squares Problems, Prentice-Hall,
Englewood Cliffs, N.J.
L. MIRSKY (1960), Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford
Ser., 11, no. 2, pp. 55-59.
J. VON NEUMANN (1937), Some matrix-inequalities and metrization of matric-space, Tomsk. Univ.
Rev., 1, pp. 286-300.
M. PAVEL-PARVU AND A. KORGANOFF (1969), Iteration functions for solving polynomial equations,
Constructive Aspects of the Fundamental Theorem of Algebra, B. Dejon and P. Henrici,
eds., John Wiley, New York.
R. PENROSE (1955), A generalized inverse for matrices, Proc. Cambridge Philos. Soc., 51, pp.
506-513.
R. PENROSE (1956), On best approximate solution of linear matrix equations, Ibid., 52, pp. 17-19.
V. PEREYRA (1969), Stability of general systems of linear equations, Aequat. Math., 2, pp. 194-206.
C. R. RAO AND S. K. MITRA (1971), Generalized Inverse of Matrices and Its Applications, John
Wiley, New York.
A. VAN DER SLUIS (1975), Stability of the solutions of linear least squares problems, Numer. Math., 23,
pp. 241-254.
G. W. STEWART (1969), On the continuity of the generalized inverse, SIAM J. Appl. Math., 17, pp.
33-45.
G. W. STEWART (1973), Introduction to Matrix Computations, Academic Press, New York.
P.-A. WEDIN (1969), On pseudo-inverses of perturbed matrices, Lund Univ. Comput. Sci. Tech. Rep.,
Lund, Sweden.
P.-A. WEDIN (1973), Perturbation theory for pseudo-inverses, BIT, 13, pp. 217-232.
J. H. WILKINSON (1965), The Algebraic Eigenvalue Problem, Oxford University Press, London.

* Received by the editors August 18, 1975, and in revised form February 15, 1976.

† Computer Science Department, University of Maryland, College Park, Maryland 20742. This
