Orthogonality of matrices and some distance problems

Linear Algebra and its Applications 287 (1999) 77-85


Rajendra Bhatia a,*, Peter Šemrl b,1

a Indian Statistical Institute, 7, S.J.S. Sansanwal Marg, New Delhi 110016, India
b Faculty of Mechanical Engineering, University of Maribor, Smetanova 17, 2000 Maribor, Slovenia

Received 18 February 1998; accepted 29 June 1998 Submitted by V. Mehrmann

Dedicated to Ludwig Elsner on the occasion of his 60th birthday


If $A$ and $B$ are matrices such that $\|A + zB\| \ge \|A\|$ for all complex numbers $z$, then $A$ is said to be orthogonal to $B$. We find necessary and sufficient conditions for this to be the case. Some applications and generalisations are also discussed. © 1999 Elsevier Science Inc. All rights reserved.

Keywords: Birkhoff-James orthogonality; Derivative; Norms; Distance problems

Let $A$ and $B$ be two $n \times n$ matrices. The matrix $A$ will be identified with an operator acting on an $n$-dimensional Hilbert space $H$ in the usual way. The symbol $\|A\|$ stands for the norm of this operator. $A$ is said to be orthogonal to $B$ (in the Birkhoff-James sense [7]) if $\|A + zB\| \ge \|A\|$ for every complex number $z$. In Section 1 of this note we give a necessary and sufficient condition for $A$ to be orthogonal to $B$. The special case when $B = I$ can be applied to get some distance formulas for matrices as well as a simple proof of a well-known result of Stampfli on the norm of a derivation. In Section 2 we consider the analogous problem when the norm $\|\cdot\|$ is replaced by the Schatten $p$-norm. The special case $A = I$ of this problem has been studied by Kittaneh [8], and used to characterise matrices whose trace is zero. In Section 3 we make some remarks on how to extend some results from Section 1 to infinite-dimensional Hilbert spaces, and formulate a conjecture about orthogonality with respect to induced matrix norms.

* Corresponding author. E-mail: rbh@isid.ernet.in.

1 E-mail: peter.semrl@uni-mb.si.

0024-3795/99/$ see front matter © 1999 Elsevier Science Inc. All rights reserved.

PII: S0024-3795(98)10134-9

1. The operator norm

Theorem 1.1. A matrix $A$ is orthogonal to $B$ if and only if there exists a unit vector $x \in H$ such that $\|Ax\| = \|A\|$ and $\langle Ax, Bx\rangle = 0$.

Proof. If such a vector $x$ exists then

$$\|A + zB\|^2 \ge \|(A + zB)x\|^2 = \|Ax\|^2 + |z|^2\|Bx\|^2 \ge \|Ax\|^2 = \|A\|^2.$$

So, the sufficiency of the condition is obvious.
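As a numerical illustration of the sufficiency argument, the following sketch samples complex $z$ on a grid for one hand-picked pair of $2 \times 2$ matrices. The matrices, the grid, and the helper names `op_norm_2x2` and `combo` are assumptions made for this illustration and do not come from the paper. Here $x = e_1$ gives $\|Ax\| = \|A\| = 1$ and $\langle Ax, Bx\rangle = 0$, so the theorem predicts $\|A + zB\| \ge \|A\|$; the identity matrix is included as a contrasting case where the condition fails.

```python
import math


def op_norm_2x2(m):
    """Operator (spectral) norm of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = [[complex(v) for v in row] for row in m]
    # Entries of the Hermitian Gram matrix M* M.
    g11 = abs(a) ** 2 + abs(c) ** 2
    g22 = abs(b) ** 2 + abs(d) ** 2
    g12 = a.conjugate() * b + c.conjugate() * d
    # Largest eigenvalue of a 2x2 Hermitian matrix, in closed form.
    top = (g11 + g22) / 2 + math.sqrt(((g11 - g22) / 2) ** 2 + abs(g12) ** 2)
    return math.sqrt(top)


def combo(a_mat, b_mat, z):
    """Return the matrix A + z B."""
    return [[a_mat[i][j] + z * b_mat[i][j] for j in range(2)] for i in range(2)]


# Illustrative matrices (assumed): A e1 = e1 and B e1 = e2, so <A e1, B e1> = 0.
A = [[1, 0], [0, 0.5]]
B = [[0, 2], [1, 3]]
IDENTITY = [[1, 0], [0, 1]]

norm_a = op_norm_2x2(A)
zs = [complex(u, v) / 4 for u in range(-8, 9) for v in range(-8, 9)]

# Smallest value of ||A + zB|| - ||A|| over the sampled grid.
gap = min(op_norm_2x2(combo(A, B, z)) - norm_a for z in zs)

# Contrast: the only norm-attaining direction is e1 and <A e1, e1> = 1 != 0,
# so A is not orthogonal to I; z = -1/4 already lowers the norm.
shifted_norm = op_norm_2x2(combo(A, IDENTITY, -0.25))

print(norm_a, gap, shifted_norm)
```

A grid check of this kind cannot prove orthogonality, but it gives a quick sanity test of the two conditions in Theorem 1.1 on a concrete pair.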

Before proving the converse in full generality we make a remark that serves three purposes. It gives a proof in a special case, indicates why the condition of the theorem is a natural one, and establishes a connection with the theorem in Section 2.

It is well-known that the operator norm $\|\cdot\|$ is not Fréchet differentiable at all points. However, if $A$ is a point at which this norm is differentiable, then there exists a unit vector $x$, unique up to a scalar multiple, such that $\|Ax\| = \|A\|$, and such that for all $B$

$$\frac{d}{dt}\Big|_{t=0} \|A + tB\| = \operatorname{Re}\Bigl\langle \frac{A}{\|A\|}x, Bx \Bigr\rangle.$$

See Theorem 3.1 of [1]. Using this, one can easily see that the statement of the theorem is true for all matrices $A$ that are points of differentiability of the norm $\|\cdot\|$.

Now let $A$ be any matrix and suppose $A$ is orthogonal to $B$. Let $A = UP$ be a polar decomposition of $A$ with $U$ unitary and $P$ positive. Then we have

$$\|P + zU^*B\| \ge \|P\| = \|A\|$$

for all $z$. In other words, the distance of $P$ to the linear span of $U^*B$ is $\|P\|$.

Hence, by the Hahn-Banach theorem, there exists a linear functional $\phi$ on the space of matrices such that $\|\phi\| = 1$, $\phi(P) = \|P\|$, and $\phi(U^*B) = 0$. We can find a matrix $T$ such that $\phi(X) = \operatorname{tr}(XT)$ for all $X$. Since $\|\phi\| = 1$, the trace norm (the sum of singular values) of $T$ must be 1. So, $T$ has a polar decomposition

$$T = \Bigl(\sum_{j=1}^{n} s_j u_j u_j^*\Bigr) V,$$

where $s_j$ are the singular values of $T$ in decreasing order, $\sum_{j=1}^{n} s_j = 1$, the vectors $u_j$ form an orthonormal basis for $H$, and $V$ is unitary. We have

$$\|P\| = \operatorname{tr}(PT) = \sum_{j=1}^{n} s_j \operatorname{tr}[Pu_j(V^*u_j)^*] = \sum_{j=1}^{n} s_j \langle Pu_j, V^*u_j\rangle \le \sum_{j=1}^{n} s_j \|Pu_j\| \le \sum_{j=1}^{n} s_j \|P\| = \|P\|.$$

Hence, if $k$ is the rank of $T$ (i.e., $s_k \ne 0$, but $s_{k+1} = 0$), then $\|Pu_j\| = \|P\|$ for $j = 1, \ldots, k$; and hence $Pu_j = \|P\|u_j$. From the conditions for the Cauchy-Schwarz inequality to be an equality we conclude that $V^*u_j$ is a scalar multiple of $Pu_j$, $j = 1, \ldots, k$. Obviously, these scalars must be positive, and so, $V^*u_j = u_j$ for all $j = 1, \ldots, k$. It follows that $T$ is of the form

$$T = \sum_{j=1}^{k} s_j u_j u_j^*,$$

where the $u_j$ belong to the eigenspace $K$ of $P$ corresponding to its maximal eigenvalue $\|P\|$. Then $\phi(U^*B) = 0$ implies

$$\sum_{j=1}^{k} s_j \langle B^*Uu_j, u_j\rangle = 0.$$

If $Q$ is the orthoprojector on the linear span of the $u_j$, then this equality can be rewritten as

$$\sum_{j=1}^{k} s_j \langle QB^*UQu_j, u_j\rangle = 0.$$

Since the numerical range of any operator is a convex set, there exists a unit vector $x \in K$ such that

$$0 = \langle QB^*UQx, x\rangle = \langle B^*Ux, x\rangle = \langle Ux, Bx\rangle.$$

Since $x \in K$, we have $\|Ax\| = \|Px\| = \|P\| = \|A\|$, and

$$\langle Ax, Bx\rangle = \langle UPx, Bx\rangle = \|P\|\langle Ux, Bx\rangle = 0. \qquad \square$$

Notice that orthogonality is not a symmetric relation. The special cases when A or B is the identity are of particular interest [3,4,8,10].

Theorem 1.1 says that $I$ is orthogonal to $B$ if and only if $W(B)$, the numerical range of $B$, contains 0. For another proof of this see Remark 4 of [8].
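A small worked instance of this special case may help. The sketch below takes the assumed example $B = \operatorname{diag}(1, -1)$, whose numerical range is the segment $[-1, 1]$, exhibits a unit vector $x$ with $\langle Bx, x\rangle = 0$, and samples $\|I + zB\|$ on a grid. The matrix, the vector, the grid, and the function name `norm_i_plus_zb` are illustrative choices, not taken from the paper.

```python
import math

# Assumed example: B = diag(1, -1), so W(B) is the real segment [-1, 1].
b_diag = [1.0, -1.0]

# Unit vector x = (1, 1) / sqrt(2) realises the value 0 in W(B).
s = 1 / math.sqrt(2)
x = [s, s]
quad_form = sum(b * abs(c) ** 2 for b, c in zip(b_diag, x))  # <Bx, x>


def norm_i_plus_zb(z):
    """Operator norm of I + zB; for a diagonal matrix it is the largest |entry|."""
    return max(abs(1 + z * b) for b in b_diag)


zs = [complex(u, v) / 4 for u in range(-8, 9) for v in range(-8, 9)]
smallest = min(norm_i_plus_zb(z) for z in zs)

print(quad_form, smallest)
```

In this example the inequality can also be seen by hand: $|1+z|^2 + |1-z|^2 = 2 + 2|z|^2 \ge 2$, so the larger of the two moduli is at least 1.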

The more complicated case when $B = I$ has been important in problems related to derivations and operator approximations. In this case the theorem (in infinite dimensions) was proved by Stampfli ([10], Theorem 2). A different proof attributed to Ando [3] can be found in [4] (p. 206). It is this proof that we have adopted for the general case.


Problems of approximating an operator by a simpler one have been of interest to operator theorists [4], numerical analysts [6], and statisticians [9]. The second special result gives a formula for the distance of an operator to the class of scalar operators. We have, by definition,

$$\operatorname{dist}(A, \mathbb{C}I) = \min_{z \in \mathbb{C}} \|A + zI\|. \tag{1.1}$$


If this minimum is attained at $A_0 = A + z_0I$ then $A_0$ is orthogonal to the identity. Theorem 1.1 then says that

$$\operatorname{dist}(A, \mathbb{C}I) = \|A_0\| = \max\{|\langle A_0x, y\rangle| : \|x\| = \|y\| = 1 \text{ and } x \perp y\} = \max\{|\langle Ax, y\rangle| : \|x\| = \|y\| = 1 \text{ and } x \perp y\}. \tag{1.2}$$

This result is due to Ando [3]. We will use it to calculate the diameter of the unitary orbit of a matrix.

The unitary orbit of a matrix $A$ is the set of all matrices of the form $UAU^*$ where $U$ is unitary. The diameter of this set is

$$d_A = \max\{\|VAV^* - UAU^*\| : U, V \text{ unitary}\} = \max\{\|A - UAU^*\| : U \text{ unitary}\}. \tag{1.3}$$

Notice that this diameter is zero if and only if $A$ is a scalar matrix. The following theorem is, therefore, interesting.

Theorem 1.2. For every matrix $A$ we have

$$d_A = 2\operatorname{dist}(A, \mathbb{C}I). \tag{1.4}$$

Proof. For every unitary $U$ and scalar $z$ we have

$$\|A - UAU^*\| = \|(A - zI) - U(A - zI)U^*\| \le 2\|A - zI\|.$$

So

$$d_A \le 2\operatorname{dist}(A, \mathbb{C}I).$$

As before we choose $A_0 = A + z_0I$ and an orthogonal pair of unit vectors $x$ and $y$ such that

$$\operatorname{dist}(A, \mathbb{C}I) = \|A_0\| = \langle A_0x, y\rangle.$$

By the condition for equality in the Cauchy-Schwarz inequality we must have $A_0x = \|A_0\|y$. We can find a unitary $U$ satisfying $Ux = x$ and $Uy = -y$. Then $UA_0U^*x = -\|A_0\|y$. We have

$$d_A = d_{A_0} \ge \|A_0x - UA_0U^*x\| = 2\|A_0\| = 2\operatorname{dist}(A, \mathbb{C}I). \qquad \square$$
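The construction in this proof can be traced on a small example. The sketch below assumes $A = \operatorname{diag}(3, 1)$ and the shift $z_0 = -2$, so that $A_0 = \operatorname{diag}(1, -1)$; the vectors $x$, $y$ and the reflection $U = xx^* - yy^*$ are then built as in the proof. All names and the choice of matrix are illustrative. The upper bound $2\|A - zI\|$ holds for any $z$, so matching it with $\|A - UAU^*\|$ pins down $d_A$ for this example.

```python
import math


def matmul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(p):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(p[j][i]).conjugate() for j in range(2)] for i in range(2)]


def op_norm_2x2(m):
    """Operator norm via the largest eigenvalue of the Gram matrix M* M."""
    (a, b), (c, d) = [[complex(v) for v in row] for row in m]
    g11 = abs(a) ** 2 + abs(c) ** 2
    g22 = abs(b) ** 2 + abs(d) ** 2
    g12 = a.conjugate() * b + c.conjugate() * d
    top = (g11 + g22) / 2 + math.sqrt(((g11 - g22) / 2) ** 2 + abs(g12) ** 2)
    return math.sqrt(top)


# Assumed example: A = diag(3, 1), shifted by z0 = -2 to A0 = diag(1, -1).
A = [[3, 0], [0, 1]]
z0 = -2
A0 = [[A[i][j] + (z0 if i == j else 0) for j in range(2)] for i in range(2)]

# Orthogonal unit vectors with A0 x = ||A0|| y (here ||A0|| = 1).
s = 1 / math.sqrt(2)
x = [s, s]
y = [s, -s]

# Reflection U = x x* - y y* fixes x and sends y to -y (x, y are real here).
U = [[x[i] * x[j] - y[i] * y[j] for j in range(2)] for i in range(2)]

orbit_point = matmul(matmul(U, A), dagger(U))
diff = [[A[i][j] - orbit_point[i][j] for j in range(2)] for i in range(2)]

lower = op_norm_2x2(diff)      # ||A - U A U*||, a lower bound for d_A
upper = 2 * op_norm_2x2(A0)    # 2 ||A + z0 I||, an upper bound for d_A

print(lower, upper)
```

For this matrix $U$ turns out to be the coordinate swap, and both bounds equal 2.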

From (1.3) and (1.4) we have

$$\max\{\|AU - UA\| : U \text{ unitary}\} = 2\operatorname{dist}(A, \mathbb{C}I). \tag{1.5}$$

If $X$ is any operator with $\|X\| = 1$, then $X$ can be written as $X = \frac{1}{2}(V + W)$ where $V$ and $W$ are unitary. (Use the singular value decomposition of $X$, and observe that every positive number between 0 and 1 can be expressed as $\frac{1}{2}(e^{i\theta} + e^{-i\theta})$.) Hence we have

$$\max_{\|X\| = 1} \|AX - XA\| = 2\operatorname{dist}(A, \mathbb{C}I). \tag{1.6}$$
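The scalar observation behind the decomposition $X = \frac{1}{2}(V + W)$ is easy to check directly. The sketch below, with the hypothetical helper `unit_pair`, writes a number $c \in [0, 1]$ as the average of two unimodular numbers by taking $\theta = \arccos c$; applying this to each singular value of $X$ is what yields the two unitaries.

```python
import cmath
import math


def unit_pair(c):
    """Return (v, w) on the unit circle with (v + w) / 2 equal to c, 0 <= c <= 1."""
    theta = math.acos(c)
    return cmath.exp(1j * theta), cmath.exp(-1j * theta)


# Illustrative sample values standing in for singular values of X.
samples = [0.0, 0.25, 0.5, 1.0]
pairs = [unit_pair(c) for c in samples]

errors = [abs((v + w) / 2 - c) for c, (v, w) in zip(samples, pairs)]
moduli = [abs(v) for v, _ in pairs] + [abs(w) for _, w in pairs]

print(max(errors), moduli)
```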


Recall that the operator $\delta_A(X) = AX - XA$ on the space of matrices is called an inner derivation. The preceding remark shows that the norm of $\delta_A$ is $2\operatorname{dist}(A, \mathbb{C}I)$. This was proved (for operators in a Hilbert space) by Stampfli [10].

The proof we have given for matrices is simpler. In Section 3 we will show how to prove the result for infinite-dimensional Hilbert spaces.

A trivial upper bound for $d_A$ is $2\|A\|$. This bound can be attained. For example, any block diagonal matrix of the form

$$\begin{pmatrix} A_1 & 0 \\ 0 & -A_1 \end{pmatrix}$$

is unitarily similar to

$$\begin{pmatrix} -A_1 & 0 \\ 0 & A_1 \end{pmatrix}.$$

A simple lower bound for $d_A$ is given in our next proposition.

Proposition 1.3. Let $A$ be any matrix with singular values $s_1(A) \ge \cdots \ge s_n(A)$. Then

$$d_A \ge s_1(A) - s_n(A). \tag{1.7}$$

Proof. Let $z$ be any complex number with polar form $z = re^{i\theta}$. Let $A = UP$ be a polar decomposition of $A$. Then

$$\|A - zI\| = \|P - zU^*\| \ge \inf\{\|P - zV\| : V \text{ unitary}\} = \inf\{\|P - rV\| : V \text{ unitary}\}.$$

By a theorem of Fan and Hoffman, the value of the last infimum is $\|P - rI\|$ (see [5], p. 276). So

$$\min_{z \in \mathbb{C}} \|A - zI\| \ge \min_{r \ge 0} \|P - rI\| = \min_{r \ge 0} \max_j |s_j - r| = \tfrac{1}{2}\bigl(s_1(A) - s_n(A)\bigr).$$

The proposition now follows from Theorem 1.2. $\square$
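The last equality in the proof is a one-dimensional min-max problem: the best $r$ is the midpoint of the largest and smallest singular values. The sketch below checks this by brute force on a grid for an assumed list of singular values; the list, the grid resolution, and the name `worst_gap` are illustrative only.

```python
def worst_gap(singular_values, r):
    """max_j |s_j - r|, i.e. the norm of P - rI for positive P with these eigenvalues."""
    return max(abs(s - r) for s in singular_values)


# Assumed singular values s_1 >= s_2 >= s_3.
sing = [5.0, 3.0, 2.0]

# Brute-force search over r in [0, 6] with step 0.001.
grid = [k / 1000 for k in range(0, 6001)]
best_r = min(grid, key=lambda r: worst_gap(sing, r))
best = worst_gap(sing, best_r)

# Closed form from the proof: half the spread of the singular values.
half_spread = (max(sing) - min(sing)) / 2

print(best_r, best, half_spread)
```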


If $A$ is a Hermitian matrix then there is equality in (1.7).

2. The Schatten norms

For $1 \le p < \infty$, the Schatten $p$-norm of $A$ is defined as

$$\|A\|_p = \Bigl(\sum_{j=1}^{n} \bigl(s_j(A)\bigr)^p\Bigr)^{1/p},$$

where $s_1(A) \ge \cdots \ge s_n(A)$ are the singular values of $A$.

If $1 < p < \infty$, then the norm $\|\cdot\|_p$ is Fréchet differentiable at every $A$. In this case

$$\frac{d}{dt}\Big|_{t=0} \|A + tB\|_p^p = p \operatorname{Re} \operatorname{tr} |A|^{p-1}U^*B, \tag{2.1}$$

for every $B$, where $A = U|A|$ is a polar decomposition of $A$. Here $|A| = (A^*A)^{1/2}$. If $p = 1$ this is true if $A$ is invertible. See [2] (Theorem 2.1) and [1] (Theorems 2.2 and 2.3).

As before, we say that $A$ is orthogonal to $B$ in the Schatten $p$-norm (for a given $1 \le p < \infty$) if

$$\|A + zB\|_p \ge \|A\|_p \quad \text{for all } z. \tag{2.2}$$

The case $p = 2$ is special. The quantity

$$\langle A, B\rangle = \operatorname{tr} A^*B$$

defines an inner product on the space of matrices, and the norm associated with this inner product is $\|\cdot\|_2$. The condition (2.2) for orthogonality is then equivalent to the usual Hilbert space condition $\langle A, B\rangle = 0$. Our next theorem includes this as a very special case.

Theorem 2.1. Let $A$ have a polar decomposition $A = U|A|$. If for any $1 \le p < \infty$ we have

$$\operatorname{tr} |A|^{p-1}U^*B = 0, \tag{2.3}$$

then $A$ is orthogonal to $B$ in the Schatten $p$-norm. The converse is true for all $A$, if $1 < p < \infty$, and for all invertible $A$, if $p = 1$.

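Proof. If (2.3) is satisfied, then for all $z$

$$\operatorname{tr} |A|^p = \operatorname{tr} |A|^{p-1}(|A| + zU^*B).$$

Hence, by Hölder's Inequality ([5], p. 88),

$$\operatorname{tr} |A|^p \le \bigl\| |A|^{p-1} \bigr\|_q \, \bigl\| |A| + zU^*B \bigr\|_p = \bigl\| |A|^{p-1} \bigr\|_q \, \|A + zB\|_p = \bigl[\operatorname{tr} |A|^{(p-1)q}\bigr]^{1/q} \|A + zB\|_p = (\operatorname{tr} |A|^p)^{1/q} \|A + zB\|_p,$$

where $q$ is the index conjugate to $p$ (i.e., $1/p + 1/q = 1$). Since

$$(\operatorname{tr} |A|^p)^{1 - 1/q} = (\operatorname{tr} |A|^p)^{1/p} = \|A\|_p,$$

this shows that

$$\|A\|_p \le \|A + zB\|_p \quad \text{for all } z.$$

Conversely, if (2.2) is true, then

$$\|e^{i\theta}A + tB\|_p \ge \|e^{i\theta}A\|_p$$

for all real $t$ and $\theta$. Using the expression (2.1) we see that this implies

$$\operatorname{Re} \operatorname{tr}\bigl(|A|^{p-1} e^{-i\theta} U^*B\bigr) = 0,$$

for all $A$ if $1 < p < \infty$, and for invertible $A$ if $p = 1$. Since this is true for all $\theta$, we get (2.3). $\square$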
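The sufficiency direction can be checked numerically on a small case. The sketch below assumes $p = 3$, the positive matrix $A = \operatorname{diag}(2, 1)$ (so $U = I$ and $|A| = A$), and a matrix $B$ chosen so that $\operatorname{tr} |A|^{p-1}B = 0$. The matrices, the grid, and the helper names are illustrative assumptions; the identity matrix serves as a contrasting case in which the trace condition fails.

```python
import math


def singular_values_2x2(m):
    """Both singular values of a 2x2 matrix, from the eigenvalues of M* M."""
    (a, b), (c, d) = [[complex(v) for v in row] for row in m]
    g11 = abs(a) ** 2 + abs(c) ** 2
    g22 = abs(b) ** 2 + abs(d) ** 2
    g12 = a.conjugate() * b + c.conjugate() * d
    mean = (g11 + g22) / 2
    rad = math.sqrt(((g11 - g22) / 2) ** 2 + abs(g12) ** 2)
    # Clamp at zero to guard against rounding for rank-deficient matrices.
    return math.sqrt(mean + rad), math.sqrt(max(mean - rad, 0.0))


def schatten(m, p):
    """Schatten p-norm: the l_p norm of the singular values."""
    return sum(s ** p for s in singular_values_2x2(m)) ** (1 / p)


def combo(a_mat, b_mat, z):
    """Return the matrix A + z B."""
    return [[a_mat[i][j] + z * b_mat[i][j] for j in range(2)] for i in range(2)]


p = 3
A = [[2, 0], [0, 1]]       # positive diagonal, so U = I and |A| = A (assumed)
B = [[1, 2], [-1, -4]]     # chosen so that tr(|A|^(p-1) B) = 4*1 + 1*(-4) = 0
IDENTITY = [[1, 0], [0, 1]]

# |A|^(p-1) is diagonal here, so the trace only involves diagonal entries of B.
trace_cond = A[0][0] ** (p - 1) * B[0][0] + A[1][1] ** (p - 1) * B[1][1]

norm_a = schatten(A, p)
zs = [complex(u, v) / 4 for u in range(-8, 9) for v in range(-8, 9)]
gap = min(schatten(combo(A, B, z), p) - norm_a for z in zs)

# Contrast: tr(|A|^(p-1) I) = 5 != 0, and z = -1/2 lowers the norm.
shifted_norm = schatten(combo(A, IDENTITY, -0.5), p)

print(trace_cond, gap, shifted_norm, norm_a)
```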

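The following example shows that the case $p = 1$ is exceptional. If

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},$$

then $\|A + zB\|_1 \ge \|A\|_1$ for all $z$. However, taking $U = I$, $\operatorname{tr} U^*B = \operatorname{tr} B \ne 0$.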
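This example is simple enough to verify directly, since $A + zB = \operatorname{diag}(1, z)$ has singular values 1 and $|z|$. The sketch below samples $z$ on an assumed grid and confirms that the trace norm never drops below $\|A\|_1 = 1$ even though $\operatorname{tr} B = 1$; the helper name `trace_norm_diag` is hypothetical.

```python
# A = diag(1, 0) and B = diag(0, 1), stored by their diagonals.
a_diag = [1.0, 0.0]
b_diag = [0.0, 1.0]


def trace_norm_diag(entries):
    """Trace norm of a diagonal matrix: the sum of the moduli of its entries."""
    return sum(abs(e) for e in entries)


norm_a = trace_norm_diag(a_diag)
zs = [complex(u, v) / 4 for u in range(-8, 9) for v in range(-8, 9)]

# Smallest value of ||A + zB||_1 - ||A||_1 over the sampled grid.
gap = min(
    trace_norm_diag([a + z * b for a, b in zip(a_diag, b_diag)]) - norm_a
    for z in zs
)

trace_b = sum(b_diag)

print(gap, trace_b)
```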

The ideas used in our proof of Theorem 2.1 are adopted from Kittaneh [8] who restricted himself to the special case $A = I$.

3. Remarks

Remark 3.1. Theorem 1.1 can be extended to the infinite-dimensional case with a small modification. Let $A$ and $B$ be bounded operators on an infinite-dimensional Hilbert space $H$. Then $A$ is orthogonal to $B$ if and only if there exists a sequence $\{x_n\}$ of unit vectors such that

$$\|Ax_n\| \to \|A\| \quad \text{and} \quad \langle Ax_n, Bx_n\rangle \to 0.$$

Indeed, if such a sequence $\{x_n\}$ exists then

$$\|A + zB\|^2 \ge \|(A + zB)x_n\|^2 = \|Ax_n\|^2 + |z|^2\|Bx_n\|^2 + 2\operatorname{Re}\bigl(\bar{z}\langle Ax_n, Bx_n\rangle\bigr).$$

So

$$\|A + zB\|^2 \ge \limsup_{n \to \infty} \|(A + zB)x_n\|^2 \ge \|A\|^2.$$

To prove the converse we first note that Theorem 1.1 can be reformulated in the following way: if $A$ and $B$ are operators acting on a finite-dimensional Hilbert space $H$ then

$$\min_{z \in \mathbb{C}} \|A + zB\| = \max\{|\langle Ax, y\rangle| : \|x\| = \|y\| = 1 \text{ and } y \perp Bx\}.$$

It follows that for operators $A$ and $B$ acting on an infinite-dimensional Hilbert space $H$ we have

$$\min_{z \in \mathbb{C}} \|A + zB\| = \sup\{|\langle Ax, y\rangle| : \|x\| = \|y\| = 1 \text{ and } y \perp Bx\}.$$

This implication was proved in the special case when $B = I$ in [4] (p. 207). A slight modification of the proof yields the general case. Assume now that $A$ is orthogonal to $B$. Then $\min_{z} \|A + zB\| = \|A\|$. Therefore we can find sequences of unit vectors $\{x_n\}, \{y_n\} \subset H$ such that $\langle Ax_n, y_n\rangle \to \|A\|$ and $y_n \perp Bx_n$. It follows that $\|Ax_n\| \to \|A\|$, and consequently

$$y_n - \frac{Ax_n}{\|Ax_n\|} \to 0$$

and

$$\lim_{n \to \infty} \langle Ax_n, Bx_n\rangle = \lim_{n \to \infty} \|Ax_n\| \Bigl\langle \frac{Ax_n}{\|Ax_n\|} - y_n, Bx_n \Bigr\rangle = 0.$$

This completes the proof.

Remark 3.2. The statement following (1.6) about norms of derivations can also be proved for infinite-dimensional Hilbert spaces by a limiting argument.

Let $H$ be an infinite-dimensional separable Hilbert space, and let $A$ be a bounded operator on $H$. Let $\{P_n\}$ be a sequence of finite rank projections increasing to the identity. Denote by $A_n$ the finite rank operator $P_nA$ restricted to the range of $P_n$. Let $\min_{z \in \mathbb{C}} \|A_n - zI\| = \|A_n - z_nI\|$. For each $n$ we have

$$\sup_{\|X\| \le 1} \|AX - XA\| \ge \sup_{\|X\| \le 1} \|AP_nXP_n - P_nXP_nA\| \ge \sup_{\|X\| \le 1} \|P_n(AP_nXP_n - P_nXP_nA)P_n\| = \sup_{\|X\| \le 1} \|(P_nAP_n)(P_nXP_n) - (P_nXP_n)(P_nAP_n)\| = 2\|A_n - z_nI\|.$$

Passing to a subsequence, if necessary, assume that $z_n \to z_0$. Then

$$\lim_{n \to \infty} \|A_n - z_nI\| = \|A - z_0I\| \ge \operatorname{dist}(A, \mathbb{C}I).$$

Hence,

$$\sup_{\|X\| \le 1} \|AX - XA\| \ge 2\operatorname{dist}(A, \mathbb{C}I).$$

Thus the norm of the derivation $\delta_A$ is equal to $2\operatorname{dist}(A, \mathbb{C}I)$.

Remark 3.3. In view of Theorem 1.1 we are tempted to make the following conjecture. Let $\|\cdot\|$ now represent any norm on the vector space $\mathbb{C}^n$, and also the norm it induces on the space of $n \times n$ matrices acting as linear operators on $\mathbb{C}^n$. We conjecture that

$$\|A + zB\| \ge \|A\| \quad \text{for all } z$$

if and only if there exists a unit vector $x$ such that

$$\|Ax\| = \|A\| \quad \text{and} \quad \|Ax + zBx\| \ge \|Ax\| \quad \text{for all } z.$$


This work was begun during the first author's visit to Slovenia in September 1997. Both authors are thankful to the Slovene Ministry of Science and Technology for its support.


[1] T.J. Abatzoglou, Norm derivatives on spaces of operators, Math. Ann. 239 (1979) 129-135.

[2] J.G. Aiken, J.A. Erdos, J.A. Goldstein, Unitary approximation of positive operators, Illinois J. Math. 24 (1980) 61-72.

[3] T. Ando, Distance to the set of thin operators, unpublished report, 1972.

[4] C. Apostol, L.A. Fialkow, D.A. Herrero, D. Voiculescu, Approximation of Hilbert Space Operators II, Pitman, Boston, 1984.

[5] R. Bhatia, Matrix Analysis, Springer, New York, 1997.

[6] N.J. Higham, Matrix nearness problems and applications, in: Applications of Matrix Theory, Oxford University Press, Oxford, 1989.

[7] R.C. James, Orthogonality and linear functionals in normed linear spaces, Trans. Amer. Math. Soc. 61 (1947) 265-292.

[8] F. Kittaneh, On zero-trace matrices, Linear Algebra Appl. 151 (1991) 119-124.

[9] C.R. Rao, Matrix approximations and reduction of dimensionality in multivariate statistical analysis, in: Multivariate Analysis - V, North-Holland, Amsterdam, 1980.

[10] J.G. Stampfli, The norm of a derivation, Pacific J. Math. 33 (1970) 737-747.



