Linear Algebra and its Applications 287 (1999) 77-85
Orthogonality of matrices and some distance problems
Rajendra Bhatia a,*, Peter Šemrl b,1
a Indian Statistical Institute, 7, S.J.S. Sansanwal Marg, New Delhi 110016, India
b Faculty of Mechanical Engineering, University of Maribor, Smetanova 17, 2000 Maribor, Slovenia
Received 18 February 1998; accepted 29 June 1998 Submitted by V. Mehrmann
Dedicated to Ludwig Elsner on the occasion of his 60th birthday
Abstract
If A and B are matrices such that $\|A + zB\| \ge \|A\|$ for all complex numbers z, then A is said to be orthogonal to B. We find necessary and sufficient conditions for this to be the case. Some applications and generalisations are also discussed. © 1999 Elsevier Science Inc. All rights reserved.
Keywords: Birkhoff-James orthogonality; Derivative; Norms; Distance problems
Let A and B be two $n \times n$ matrices. The matrix A will be identified with an operator acting on an n-dimensional Hilbert space H in the usual way. The symbol $\|A\|$ stands for the norm of this operator. A is said to be orthogonal to B (in the Birkhoff-James sense [7]) if $\|A + zB\| \ge \|A\|$ for every complex number z. In Section 1 of this note we give a necessary and sufficient condition for A to be orthogonal to B. The special case when $B = I$ can be applied to get some distance formulas for matrices as well as a simple proof of a well-known result of Stampfli on the norm of a derivation. In Section 2 we consider the analogous problem when the norm $\|\cdot\|$ is replaced by the Schatten p-norm. The special case $A = I$ of this problem has been studied by Kittaneh [8], and used to characterise matrices whose trace is zero. In Section 3 we make some remarks on how to extend some results from Section 1 to infinite-dimensional Hilbert spaces, and formulate a conjecture about orthogonality with respect to induced matrix norms.
* Corresponding author. E-mail: rbh@isid.ernet.in.
1 E-mail: peter.semrl@uni-mb.si.
0024-3795/99/$ - see front matter © 1999 Elsevier Science Inc. All rights reserved.
PII: S0024-3795(98)10134-9
1. The operator norm
Theorem 1.1. A matrix A is orthogonal to B if and only if there exists a unit vector $x \in H$ such that $\|Ax\| = \|A\|$ and $\langle Ax, Bx \rangle = 0$.
Proof. If such a vector x exists then
$$\|A + zB\|^2 \ge \|(A + zB)x\|^2 = \|Ax\|^2 + |z|^2 \|Bx\|^2 \ge \|Ax\|^2 = \|A\|^2.$$
So, the sufficiency of the condition is obvious.
Before proving the converse in full generality we make a remark that serves three purposes. It gives a proof in a special case, indicates why the condition of the theorem is a natural one, and establishes a connection with the theorem in Section 2.
It is well known that the operator norm $\|\cdot\|$ is not Fréchet differentiable at every point. However, if A is a point at which this norm is differentiable, then there exists a unit vector x, unique up to a scalar multiple, such that $\|Ax\| = \|A\|$, and such that for all B
$$\frac{d}{dt}\Big|_{t=0} \|A + tB\| = \operatorname{Re} \left\langle \frac{A}{\|A\|} x, Bx \right\rangle.$$
See Theorem 3.1 of [1]. Using this, one can easily see that the statement of the theorem is true for all matrices A that are points of differentiability of the norm $\|\cdot\|$.
Now let A be any matrix and suppose A is orthogonal to B. Let $A = UP$ be a polar decomposition of A with U unitary and P positive. Then we have
$$\|P + zU^*B\| \ge \|P\| = \|A\|$$
for all z. In other words, the distance of P to the linear span of $U^*B$ is $\|P\|$.
Hence, by the Hahn-Banach theorem, there exists a linear functional $\phi$ on the space of matrices such that $\|\phi\| = 1$, $\phi(P) = \|P\|$, and $\phi(U^*B) = 0$. We can find a matrix T such that $\phi(X) = \operatorname{tr}(XT)$ for all X. Since $\|\phi\| = 1$, the trace norm (the sum of singular values) of T must be 1. So, T has a polar decomposition
$$T = \Big( \sum_{j=1}^{n} s_j u_j u_j^* \Big) V,$$
where $s_j$ are the singular values of T in decreasing order, $\sum_{j=1}^{n} s_j = 1$, the vectors $u_j$ form an orthonormal basis for H, and V is unitary. We have
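$$\|P\| = \operatorname{tr}(PT) = \sum_{j=1}^{n} s_j \operatorname{tr}[P u_j (V^* u_j)^*] = \sum_{j=1}^{n} s_j \langle P u_j, V^* u_j \rangle \le \sum_{j=1}^{n} s_j \|P u_j\| \le \sum_{j=1}^{n} s_j \|P\| = \|P\|.$$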
Hence, if k is the rank of T (i.e., $s_k \ne 0$, but $s_{k+1} = 0$), then $\|P u_j\| = \|P\|$ for $j = 1, \ldots, k$; and hence $P u_j = \|P\| u_j$. From the conditions for the Cauchy-Schwarz inequality to be an equality we conclude that $V^* u_j$ is a scalar multiple of $P u_j$, $j = 1, \ldots, k$. Obviously, these scalars must be positive, and so, $V^* u_j = u_j$ for all $j = 1, \ldots, k$. It follows that T is of the form
$$T = \sum_{j=1}^{k} s_j u_j u_j^*,$$
where the $u_j$ belong to the eigenspace K of P corresponding to its maximal eigenvalue $\|P\|$. Then $\phi(U^*B) = 0$ implies
$$\sum_{j=1}^{k} s_j \langle B^* U u_j, u_j \rangle = 0.$$
If Q is the orthoprojector on the linear span of the $u_j$, then this equality can be rewritten as
$$\sum_{j=1}^{k} s_j \langle Q B^* U Q u_j, u_j \rangle = 0.$$
Since the numerical range of any operator is a convex set, there exists a unit vector $x \in K$ such that
$$0 = \langle Q B^* U Q x, x \rangle = \langle B^* U x, x \rangle = \langle Ux, Bx \rangle.$$
So,
$$\langle Ax, Bx \rangle = \langle UPx, Bx \rangle = \|P\| \langle Ux, Bx \rangle = 0. \qquad \Box$$
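As an illustrative numerical check of Theorem 1.1 (not part of the original argument), the following Python sketch takes A = I and B = diag(1, -1) on a two-dimensional space. The unit vector x = (1, 1)/sqrt(2) satisfies $\|Ax\| = \|A\|$ and $\langle Ax, Bx \rangle = 0$, so the theorem predicts $\|A + zB\| \ge \|A\|$ for all z. Replacing B by I, whose numerical range is {1}, gives a non-example. The helper names (`op_norm`, `add_scaled`, `mat_vec`, `inner`) and the sampling grid are assumptions made for this sketch, and a finite grid can only suggest the inequality, not prove it.

```python
import math

def op_norm(m):
    """Operator norm (largest singular value) of a 2x2 complex matrix."""
    (a, b), (c, d) = [[complex(v) for v in row] for row in m]
    # Entries of the Gram matrix G = M* M, which is Hermitian and positive.
    g11 = abs(a) ** 2 + abs(c) ** 2
    g22 = abs(b) ** 2 + abs(d) ** 2
    g12 = a.conjugate() * b + c.conjugate() * d
    # Largest eigenvalue of a 2x2 Hermitian matrix in closed form.
    lam_max = 0.5 * (g11 + g22 + math.sqrt((g11 - g22) ** 2 + 4 * abs(g12) ** 2))
    return math.sqrt(lam_max)

def add_scaled(a, z, b):
    """Return the matrix A + z B."""
    return [[a[i][j] + z * b[i][j] for j in range(2)] for i in range(2)]

def mat_vec(m, x):
    """Return the vector M x."""
    return [m[i][0] * x[0] + m[i][1] * x[1] for i in range(2)]

def inner(u, v):
    """Inner product <u, v>, linear in u and conjugate-linear in v."""
    return sum(ui * complex(vi).conjugate() for ui, vi in zip(u, v))

A = [[1, 0], [0, 1]]
B = [[1, 0], [0, -1]]
x = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # candidate witness vector

# The two conditions of Theorem 1.1 for the witness x.
Ax, Bx = mat_vec(A, x), mat_vec(B, x)
witness_norm = math.sqrt(abs(inner(Ax, Ax)))
witness_inner = inner(Ax, Bx)

# Sample ||A + zB|| over a square grid of complex z that contains z = 0.
grid = [complex(re / 10, im / 10) for re in range(-20, 21) for im in range(-20, 21)]
min_norm = min(op_norm(add_scaled(A, z, B)) for z in grid)

# Non-example: I is not orthogonal to I, since ||I - 0.5 I|| < ||I||.
non_example = op_norm(add_scaled(A, -0.5, A))

print(witness_norm, abs(witness_inner), min_norm, non_example)
```

On the grid the smallest sampled value of $\|A + zB\|$ does not fall below $\|A\|$, in line with the theorem, while the non-example drops below it at z = -0.5.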
Notice that orthogonality is not a symmetric relation. The special cases when A or B is the identity are of particular interest [3,4,8,10].
Theorem 1.1 says that I is orthogonal to B if and only if W(B), the numerical range of B, contains 0. For another proof of this see Remark 4 of [8].
The more complicated case when $B = I$ has been important in problems related to derivations and operator approximations. In this case the theorem (in infinite dimensions) was proved by Stampfli ([10], Theorem 2). A different proof attributed to Ando [3] can be found in [4] (p. 206). It is this proof that we have adopted for the general case.
Problems of approximating an operator by a simpler one have been of interest to operator theorists [4], numerical analysts [6], and statisticians [9]. The second special result gives a formula for the distance of an operator to the class of scalar operators. We have, by definition,
$$\operatorname{dist}(A, \mathbb{C}I) = \min_{z \in \mathbb{C}} \|A + zI\|. \tag{1.1}$$
If this minimum is attained at $A_0 = A + z_0 I$ then $A_0$ is orthogonal to the identity. Theorem 1.1 then says that
$$\operatorname{dist}(A, \mathbb{C}I) = \|A_0\| = \max\{ |\langle A_0 x, y \rangle| : \|x\| = \|y\| = 1 \text{ and } x \perp y \} = \max\{ |\langle A x, y \rangle| : \|x\| = \|y\| = 1 \text{ and } x \perp y \}. \tag{1.2}$$
This result is due to Ando [3]. We will use it to calculate the diameter of the unitary orbit of a matrix.
The unitary orbit of a matrix A is the set of all matrices of the form $UAU^*$ where U is unitary. The diameter of this set is
$$d_A = \max\{ \|VAV^* - UAU^*\| : U, V \text{ unitary} \} = \max\{ \|A - UAU^*\| : U \text{ unitary} \}. \tag{1.3}$$
Notice that this diameter is zero if and only if A is a scalar matrix. The following theorem is, therefore, interesting.
Theorem 1.2. For every matrix A we have
$$d_A = 2 \operatorname{dist}(A, \mathbb{C}I). \tag{1.4}$$
Proof. For every unitary U and scalar z we have
$$\|A - UAU^*\| = \|(A - zI) - U(A - zI)U^*\| \le 2\|A - zI\|.$$
So,
$$d_A \le 2 \operatorname{dist}(A, \mathbb{C}I).$$
As before we choose $A_0 = A + z_0 I$ and an orthogonal pair of unit vectors x and y such that
$$\operatorname{dist}(A, \mathbb{C}I) = \|A_0\| = \langle A_0 x, y \rangle.$$
By the condition for equality in the Cauchy-Schwarz inequality we must have $A_0 x = \|A_0\| y$. We can find a unitary U satisfying $Ux = x$ and $Uy = -y$. Then $UA_0U^*x = -\|A_0\| y$. We have
$$d_A = d_{A_0} \ge \|A_0 x - UA_0U^*x\| = 2\|A_0\| = 2 \operatorname{dist}(A, \mathbb{C}I). \qquad \Box$$
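The construction in the proof can be traced on a small example. The sketch below is an illustration, not part of the original paper. It takes the 2x2 nilpotent matrix A with a single 1 above the diagonal. For this matrix the minimum in (1.1) is attained at $z_0 = 0$, because every $A + zI$ keeps the off-diagonal entry 1 and so has norm at least 1. The vectors $x = e_2$ and $y = e_1$ satisfy $Ax = \|A\| y$, and U = diag(-1, 1) fixes x and negates y. The helper names and the grid used to estimate the distance are assumptions of this sketch.

```python
import math

def op_norm(m):
    """Operator norm (largest singular value) of a 2x2 complex matrix."""
    (a, b), (c, d) = [[complex(v) for v in row] for row in m]
    g11 = abs(a) ** 2 + abs(c) ** 2
    g22 = abs(b) ** 2 + abs(d) ** 2
    g12 = a.conjugate() * b + c.conjugate() * d
    lam_max = 0.5 * (g11 + g22 + math.sqrt((g11 - g22) ** 2 + 4 * abs(g12) ** 2))
    return math.sqrt(lam_max)

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]

def sub(a, b):
    """Entrywise difference A - B."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

A = [[0, 1], [0, 0]]
I2 = [[1, 0], [0, 1]]

# Estimate dist(A, CI) = min_z ||A + zI|| on a grid that contains z = 0.
grid = [complex(re / 10, im / 10) for re in range(-20, 21) for im in range(-20, 21)]
dist = min(
    op_norm([[A[i][j] + z * I2[i][j] for j in range(2)] for i in range(2)])
    for z in grid
)

# Unitary from the proof: U x = x for x = e_2 and U y = -y for y = e_1.
U = [[-1, 0], [0, 1]]
orbit_gap = op_norm(sub(A, matmul(matmul(U, A), adjoint(U))))

print(dist, orbit_gap)
```

Since `orbit_gap` is a lower bound for $d_A$ and $2 \operatorname{dist}(A, \mathbb{C}I)$ is an upper bound, their agreement on this example matches (1.4).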
From (1.3) and (1.4) we have
$$\max\{ \|AU - UA\| : U \text{ unitary} \} = 2 \operatorname{dist}(A, \mathbb{C}I). \tag{1.5}$$
If X is any operator with $\|X\| = 1$, then X can be written as $X = \frac{1}{2}(V + W)$ where V and W are unitary. (Use the singular value decomposition of X, and observe that every positive number between 0 and 1 can be expressed as $\frac{1}{2}(e^{i\theta} + e^{-i\theta})$.) Hence we have
$$\max_{\|X\| = 1} \|AX - XA\| = 2 \operatorname{dist}(A, \mathbb{C}I). \tag{1.6}$$
Recall that the operator $\delta_A(X) = AX - XA$ on the space of matrices is called an inner derivation. The preceding remark shows that the norm of $\delta_A$ is $2 \operatorname{dist}(A, \mathbb{C}I)$. This was proved (for operators in a Hilbert space) by Stampfli [10]. The proof we have given for matrices is simpler. In Section 3 we will show how to prove the result for infinite-dimensional Hilbert spaces.
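The decomposition $X = \frac{1}{2}(V + W)$ used to pass from (1.5) to (1.6) rests on a scalar identity: a number c in [0, 1] equals $\frac{1}{2}(e^{i\theta} + e^{-i\theta}) = \cos\theta$ with $\theta = \arccos c$. A minimal sketch for a positive diagonal contraction, where the singular value decomposition is trivial, follows. The sample entries and variable names are assumptions chosen for illustration.

```python
import cmath
import math

# Diagonal entries of a positive diagonal contraction X (assumed sample values).
diag_x = [1.0, 0.6, 0.25, 0.0]

# For each entry c, theta = arccos(c) gives c = (e^{i theta} + e^{-i theta}) / 2.
thetas = [math.acos(c) for c in diag_x]
diag_v = [cmath.exp(1j * t) for t in thetas]    # diagonal of the unitary V
diag_w = [cmath.exp(-1j * t) for t in thetas]   # diagonal of the unitary W

# A diagonal matrix is unitary exactly when its entries have modulus one.
moduli = [abs(v) for v in diag_v] + [abs(w) for w in diag_w]

# Averaging V and W entrywise should give back the diagonal of X.
recovered = [(v + w) / 2 for v, w in zip(diag_v, diag_w)]

print(moduli)
print(recovered)
```

For a general X with singular value decomposition $X = U_1 S U_2$, the same diagonal factors sandwiched between $U_1$ and $U_2$ would give the unitaries V and W.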
A trivial upper bound for $d_A$ is $2\|A\|$. This bound can be attained. For example, any block diagonal matrix of the form
[matrix display not recoverable from the source]
is unitarily similar to
[matrix display not recoverable from the source]
A simple lower bound for $d_A$ is given in our next proposition.
Proposition 1.3. Let A be any matrix with singular values $s_1(A) \ge \cdots \ge s_n(A)$. Then
$$d_A \ge s_1(A) - s_n(A). \tag{1.7}$$
Proof. Let z be any complex number with polar form $z = re^{i\theta}$. Let $A = UP$ be a polar decomposition of A. Then
$$\|A - zI\| = \|P - zU^*\| \ge \inf\{ \|P - zV\| : V \text{ unitary} \} = \inf\{ \|P - rV\| : V \text{ unitary} \}.$$
By a theorem of Fan and Hoffman, the value of the last infimum is $\|P - rI\|$ (see [5], p. 276). So
$$\min_{z \in \mathbb{C}} \|A - zI\| \ge \min_{r \ge 0} \|P - rI\| = \min_{r \ge 0} \max_j |s_j - r| = \tfrac{1}{2}(s_1(A) - s_n(A)).$$
The proposition now follows from Theorem 1.2. $\Box$
If A is a Hermitian matrix then there is equality in (1.7).
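The last equality in the proof of Proposition 1.3, $\min_{r \ge 0} \max_j |s_j - r| = \frac{1}{2}(s_1 - s_n)$, is a one-dimensional minimax: the best r is the midpoint of the largest and smallest singular values. A brief sketch with assumed sample singular values (not taken from the paper) scans r over a grid and compares the result with the closed form.

```python
# Singular values of a sample matrix, in decreasing order (assumed values).
s = [5.0, 3.0, 1.0]

def worst_gap(r):
    """max_j |s_j - r|, i.e. the norm ||P - rI|| when P has eigenvalues s."""
    return max(abs(sj - r) for sj in s)

# Scan r >= 0 on a grid fine enough to contain the midpoint (s_1 + s_n) / 2.
grid = [k / 100 for k in range(0, 601)]
best_r = min(grid, key=worst_gap)
best_value = worst_gap(best_r)

# Closed form from the proof: half the spread of the singular values.
half_spread = (s[0] - s[-1]) / 2

print(best_r, best_value, half_spread)
```

The grid minimiser lands on the midpoint of $s_1$ and $s_n$, and the minimal value agrees with the half spread used in the proof.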
2. The Schatten norms
For $1 \le p < \infty$, the Schatten p-norm of A is defined as
$$\|A\|_p = \Big( \sum_{j=1}^{n} (s_j(A))^p \Big)^{1/p},$$
where $s_1(A) \ge \cdots \ge s_n(A)$ are the singular values of A.
If $1 < p < \infty$, then the norm $\|\cdot\|_p$ is Fréchet differentiable at every A. In this case
$$\frac{d}{dt}\Big|_{t=0} \|A + tB\|_p^p = p \operatorname{Re} \operatorname{tr} |A|^{p-1} U^* B, \tag{2.1}$$
for every B, where $A = U|A|$ is a polar decomposition of A. Here $|A| = (A^*A)^{1/2}$. If $p = 1$ this is true if A is invertible. See [2] (Theorem 2.1) and [1] (Theorems 2.2 and 2.3).
As before, we say that A is orthogonal to B in the Schatten p-norm (for a given $1 \le p < \infty$) if
$$\|A + zB\|_p \ge \|A\|_p \quad \text{for all } z. \tag{2.2}$$
The case $p = 2$ is special. The quantity
$$\langle A, B \rangle = \operatorname{tr} A^* B$$
defines an inner product on the space of matrices, and the norm associated with this inner product is $\|\cdot\|_2$. The condition (2.2) for orthogonality is then equivalent to the usual Hilbert space condition $\langle A, B \rangle = 0$. Our next theorem includes this as a very special case.
Theorem 2.1. Let A have a polar decomposition $A = U|A|$. If for any $1 \le p < \infty$ we have
$$\operatorname{tr} |A|^{p-1} U^* B = 0, \tag{2.3}$$
then A is orthogonal to B in the Schatten p-norm. The converse is true for all A, if $1 < p < \infty$, and for all invertible A, if $p = 1$.
Proof. If (2.3) is satisfied, then for all z
$$\operatorname{tr} |A|^p = \operatorname{tr} |A|^{p-1} (|A| + zU^*B).$$
Hence, by Hölder's inequality ([5], p. 88),
$$\operatorname{tr} |A|^p \le \| |A|^{p-1} \|_q \, \| |A| + zU^*B \|_p = \| |A|^{p-1} \|_q \, \|A + zB\|_p = [\operatorname{tr} |A|^{(p-1)q}]^{1/q} \, \|A + zB\|_p = (\operatorname{tr} |A|^p)^{1/q} \, \|A + zB\|_p,$$
where q is the index conjugate to p (i.e., $1/p + 1/q = 1$). Since
$$(\operatorname{tr} |A|^p)^{1 - 1/q} = (\operatorname{tr} |A|^p)^{1/p} = \|A\|_p,$$
this shows that
$$\|A\|_p \le \|A + zB\|_p \quad \text{for all } z.$$
Conversely, if (2.2) is true, then
$$\|e^{i\theta} A + tB\|_p \ge \|e^{i\theta} A\|_p$$
for all real t and $\theta$. Using the expression (2.1) we see that this implies
$$\operatorname{Re} \operatorname{tr} (|A|^{p-1} e^{-i\theta} U^* B) = 0,$$
for all A if $1 < p < \infty$, and for invertible A if $p = 1$. Since this is true for all $\theta$, we get (2.3). $\Box$
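The following example shows that the case $p = 1$ is exceptional. If
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},$$
then
$$\|A + zB\|_1 \ge \|A\|_1 \quad \text{for all } z.$$
However,
$$\operatorname{tr} U^* B = \operatorname{tr} B \ne 0.$$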
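Both halves of this discussion can be probed numerically on 2x2 matrices. The sketch below is an illustration under stated assumptions, not the authors' computation. For p = 3 it uses the hypothetical pair A = diag(2, 1) and B = [[1, 5], [7, -4]], chosen so that U = I, $|A|^{p-1}$ = diag(4, 1) and condition (2.3) holds. For p = 1 it uses A = diag(1, 0) and B = diag(0, 1), the reading of the example above in which tr B = 1. The helper names and the sampling grid are assumptions of the sketch.

```python
import math

def singular_values(m):
    """Singular values (s1 >= s2) of a 2x2 complex matrix."""
    (a, b), (c, d) = [[complex(v) for v in row] for row in m]
    # Entries of the Gram matrix M* M and its two eigenvalues in closed form.
    g11 = abs(a) ** 2 + abs(c) ** 2
    g22 = abs(b) ** 2 + abs(d) ** 2
    g12 = a.conjugate() * b + c.conjugate() * d
    root = math.sqrt((g11 - g22) ** 2 + 4 * abs(g12) ** 2)
    lam1 = 0.5 * (g11 + g22 + root)
    lam2 = max(0.0, 0.5 * (g11 + g22 - root))   # clamp rounding noise
    return math.sqrt(lam1), math.sqrt(lam2)

def schatten(m, p):
    """Schatten p-norm of a 2x2 matrix for 1 <= p < infinity."""
    s1, s2 = singular_values(m)
    return (s1 ** p + s2 ** p) ** (1.0 / p)

def add_scaled(a, z, b):
    """Return the matrix A + z B."""
    return [[a[i][j] + z * b[i][j] for j in range(2)] for i in range(2)]

grid = [complex(re / 10, im / 10) for re in range(-20, 21) for im in range(-20, 21)]

# Case p = 3: A is positive, so U = I and |A|^{p-1} = diag(4, 1).
A3 = [[2, 0], [0, 1]]
B3 = [[1, 5], [7, -4]]
trace_cond = 4 * B3[0][0] + 1 * B3[1][1]       # tr |A|^2 U* B, condition (2.3)
min_p3 = min(schatten(add_scaled(A3, z, B3), 3) for z in grid)

# Case p = 1: orthogonality holds although the trace in (2.3) is not zero.
A1 = [[1, 0], [0, 0]]
B1 = [[0, 0], [0, 1]]
min_p1 = min(schatten(add_scaled(A1, z, B1), 1) for z in grid)
trace_B1 = B1[0][0] + B1[1][1]                 # tr U* B with U = I

print(trace_cond, min_p3, schatten(A3, 3))
print(trace_B1, min_p1, schatten(A1, 1))
```

In the first case the trace condition vanishes and the sampled norms stay above $\|A\|_3$, as Theorem 2.1 predicts. In the second case the sampled norms stay above $\|A\|_1$ even though the trace equals 1, which is consistent with the converse failing for non-invertible A when p = 1.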
The ideas used in our proof of Theorem 2.1 are adopted from Kittaneh [8] who restricted himself to the special case $A = I$.
3. Remarks
Remark 3.1. Theorem 1.1 can be extended to the infinite-dimensional case with a small modification. Let A and B be bounded operators on an infinite-dimensional Hilbert space H. Then A is orthogonal to B if and only if there exists a sequence $\{x_n\}$ of unit vectors such that
$$\|Ax_n\| \to \|A\| \quad \text{and} \quad \langle Ax_n, Bx_n \rangle \to 0.$$
Indeed, if such a sequence $\{x_n\}$ exists then
$$\|A + zB\|^2 \ge \|(A + zB)x_n\|^2 = \|Ax_n\|^2 + |z|^2 \|Bx_n\|^2 + 2 \operatorname{Re}(\bar{z} \langle Ax_n, Bx_n \rangle).$$
So,
$$\|A + zB\|^2 \ge \limsup_{n \to \infty} \|(A + zB)x_n\|^2 \ge \|A\|^2.$$
To prove the converse we first note that Theorem 1.1 can be reformulated in the following way: if A and B are operators acting on a finite-dimensional Hilbert space H then
$$\min_{z \in \mathbb{C}} \|A + zB\| = \max\{ |\langle Ax, y \rangle| : \|x\| = \|y\| = 1 \text{ and } y \perp Bx \}.$$
It follows that for operators A and B acting on an infinite-dimensional Hilbert space H we have
$$\min_{z \in \mathbb{C}} \|A + zB\| = \sup\{ |\langle Ax, y \rangle| : \|x\| = \|y\| = 1 \text{ and } y \perp Bx \}.$$
This implication was proved in the special case when $B = I$ in [4] (p. 207). A slight modification of the proof yields the general case. Assume now that A is orthogonal to B. Then $\min_{z \in \mathbb{C}} \|A + zB\| = \|A\|$. Therefore we can find sequences of unit vectors $\{x_n\}, \{y_n\} \subset H$ such that $\langle Ax_n, y_n \rangle \to \|A\|$ and $y_n \perp Bx_n$. It follows that $\|Ax_n\| \to \|A\|$, and consequently
$$y_n - \frac{Ax_n}{\|Ax_n\|} \to 0$$
and
$$\lim_{n \to \infty} \langle Ax_n, Bx_n \rangle = \lim_{n \to \infty} \|Ax_n\| \langle y_n, Bx_n \rangle = 0.$$
This completes the proof.
Remark 3.2. The statement following (1.6) about norms of derivations can also be proved for infinite-dimensional Hilbert spaces by a limiting argument.
Let H be an infinite-dimensional separable Hilbert space, and let A be a bounded operator on H. Let $\{P_n\}$ be a sequence of finite rank projections increasing to the identity. Denote by $A_n$ the finite rank operator $P_n A$ restricted to the range of $P_n$. Let $\min_{z \in \mathbb{C}} \|A_n - zI\| = \|A_n - z_n I\|$. For each n we have
$$\begin{aligned}
\sup_{\|X\| \le 1} \|AX - XA\| &\ge \sup_{\|X\| \le 1} \|A P_n X P_n - P_n X P_n A\| \\
&\ge \sup_{\|X\| \le 1} \|P_n (A P_n X P_n - P_n X P_n A) P_n\| \\
&= \sup_{\|X\| \le 1} \|(P_n A P_n)(P_n X P_n) - (P_n X P_n)(P_n A P_n)\| \\
&= 2 \|A_n - z_n I\|.
\end{aligned}$$
Passing to a subsequence, if necessary, assume that $z_n \to z_0$. Then
$$\lim_{n \to \infty} \|A_n - z_n I\| = \|A - z_0 I\| \ge \operatorname{dist}(A, \mathbb{C}I).$$
Hence,
$$\sup_{\|X\| \le 1} \|AX - XA\| \ge 2 \operatorname{dist}(A, \mathbb{C}I).$$
Thus the norm of the derivation $\delta_A$ is equal to $2 \operatorname{dist}(A, \mathbb{C}I)$.
Remark 3.3. In view of Theorem 1.1 we are tempted to make the following conjecture. Let $\|\cdot\|$ now represent any norm on the vector space $\mathbb{C}^n$, and also the norm it induces on the space of $n \times n$ matrices acting as linear operators on $\mathbb{C}^n$. We conjecture that
$$\|A + zB\| \ge \|A\| \quad \text{for all } z$$
if and only if there exists a unit vector x such that
$$\|Ax\| = \|A\| \quad \text{and} \quad \|Ax + zBx\| \ge \|Ax\| \quad \text{for all } z.$$
Acknowledgement
This work was begun during the first author's visit to Slovenia in September 1997. Both authors are thankful to the Slovene Ministry of Science and Technology for its support.
References
[1] T.J. Abatzoglou, Norm derivatives on spaces of operators, Math. Ann. 239 (1979) 129-135.
[2] J.G. Aiken, J.A. Erdos, J.A. Goldstein, Unitary approximation of positive operators, Illinois J. Math. 24 (1980) 61-72.
[3] T. Ando, Distance to the set of thin operators, unpublished report, 1972.
[4] C. Apostol, L.A. Fialkow, D.A. Herrero, D. Voiculescu, Approximation of Hilbert Space Operators II, Pitman, Boston, 1984.
[5] R. Bhatia, Matrix Analysis, Springer, New York, 1997.
[6] N.J. Higham, Matrix nearness problems and applications, in: Applications of Matrix Theory, Oxford University Press, Oxford, 1989.
[7] R.C. James, Orthogonality and linear functionals in normed linear spaces, Trans. Amer. Math. Soc. 61 (1947) 265-292.
[8] F. Kittaneh, On zero-trace matrices, Linear Algebra Appl. 151 (1991) 119-124.
[9] C.R. Rao, Matrix approximations and reduction of dimensionality in multivariate statistical analysis, in: Multivariate Analysis - V, North-Holland, Amsterdam, 1980.
[10] J.G. Stampfli, The norm of a derivation, Pacific J. Math. 33 (1970) 737-747.