ELSEVIER Linear Algebra and its Applications 287 (1999) 77-85

**Orthogonality of matrices and some distance problems**

### Rajendra Bhatia a,*, Peter Šemrl b,1

*a Indian Statistical Institute, 7, S.J.S. Sansanwal Marg, New Delhi 110016, India*
*b Faculty of Mechanical Engineering, University of Maribor, Smetanova 17, 2000 Maribor, Slovenia*

Received 18 February 1998; accepted 29 June 1998

Submitted by V. Mehrmann

Dedicated to Ludwig Elsner on the occasion of his 60th birthday

**Abstract**

If $A$ and $B$ are matrices such that $\|A + zB\| \ge \|A\|$ for all complex numbers $z$, then $A$ is said to be orthogonal to $B$. We find necessary and sufficient conditions for this to be the case. Some applications and generalisations are also discussed. © 1999 Elsevier Science Inc. All rights reserved.

*Keywords:* Birkhoff-James orthogonality; Derivative; Norms; Distance problems

Let $A$ and $B$ be two $n \times n$ matrices. The matrix $A$ will be identified with an operator acting on an $n$-dimensional Hilbert space $H$ in the usual way. The symbol $\|A\|$ stands for the norm of this operator. $A$ is said to be orthogonal to $B$ (in the Birkhoff-James sense [7]) if $\|A + zB\| \ge \|A\|$ for every complex number $z$. In Section 1 of this note we give a necessary and sufficient condition for $A$ to be orthogonal to $B$. The special case when $B = I$ can be applied to get some distance formulas for matrices as well as a simple proof of a well-known result of Stampfli on the norm of a derivation. In Section 2 we consider the analogous problem when the norm $\|\cdot\|$ is replaced by the Schatten $p$-norm. The special case $A = I$ of this problem has been studied by Kittaneh [8], and used to characterise matrices whose trace is zero. In Section 3 we make some remarks on how to extend some results from Section 1 to infinite-dimensional Hilbert spaces, and formulate a conjecture about orthogonality with respect to induced matrix norms.

\* Corresponding author. E-mail: rbh@isid.ernet.in.

1 E-mail: peter.semrl@uni-mb.si.

0024-3795/99/\$ see front matter © 1999 Elsevier Science Inc. All rights reserved.

PII: S0024-3795(98)10134-9

**1. The operator norm**

**Theorem 1.1.** *A matrix $A$ is orthogonal to $B$ if and only if there exists a unit vector $x \in H$ such that $\|Ax\| = \|A\|$ and $\langle Ax, Bx\rangle = 0$.*

Proof. If such a vector $x$ exists then

$$\|A + zB\|^2 \ge \|(A + zB)x\|^2 = \|Ax\|^2 + |z|^2\|Bx\|^2 \ge \|Ax\|^2 = \|A\|^2.$$

So, the sufficiency of the condition is obvious.
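As an illustrative check of this sufficiency argument (not part of the original paper), the sketch below takes a hypothetical pair: $A = \operatorname{diag}(2, 1)$ and $B$ the coordinate swap, for which $x = e_1$ satisfies both conditions of the theorem. It samples $\|A + zB\|$ at a few complex $z$. The helper names and the sample points are assumptions made for this example; `op_norm_2x2` uses the closed form for the largest singular value of a $2 \times 2$ matrix.

```python
import cmath
import math

def op_norm_2x2(m):
    """Operator norm (largest singular value) of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    frob_sq = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det_abs = abs(a * d - b * c)
    # s1^2 and s2^2 are the roots of t^2 - frob_sq*t + det_abs^2 = 0.
    disc = max(frob_sq ** 2 - 4 * det_abs ** 2, 0.0)
    return math.sqrt((frob_sq + math.sqrt(disc)) / 2)

def add_scaled(a, z, b):
    """Return the 2x2 matrix a + z*b."""
    return [[a[i][j] + z * b[i][j] for j in range(2)] for i in range(2)]

# Hypothetical example: with x = e1 we get Ax = 2*e1 (so ||Ax|| = ||A|| = 2)
# and Bx = e2, hence <Ax, Bx> = 0.
A = [[2, 0], [0, 1]]
B = [[0, 1], [1, 0]]

norm_A = op_norm_2x2(A)
samples = [r * cmath.exp(1j * t)
           for r in (0.0, 0.1, 0.5, 1.0, 3.0)
           for t in (0.0, 0.7, 1.9, 3.1, 4.5)]
min_sampled = min(op_norm_2x2(add_scaled(A, z, B)) for z in samples)

# The sampled minimum should not fall below ||A||.
print(norm_A, min_sampled)
```

A finite sample cannot prove orthogonality; it only fails to find a counterexample, which is what the inequality above predicts.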

Before proving the converse in full generality we make a remark that serves three purposes. It gives a proof in a special case, indicates why the condition of the theorem is a natural one, and establishes a connection with the theorem in Section 2.

It is well-known that the operator norm $\|\cdot\|$ is not Fréchet differentiable at all points. However, if $A$ is a point at which this norm is differentiable, then there exists a unit vector $x$, unique up to a scalar multiple, such that $\|Ax\| = \|A\|$, and such that for all $B$

$$\frac{d}{dt}\bigg|_{t=0} \|A + tB\| = \operatorname{Re}\left\langle \frac{A}{\|A\|}x, Bx \right\rangle.$$

See Theorem 3.1 of [1]. Using this, one can easily see that the statement of the theorem is true for all matrices $A$ that are points of differentiability of the norm $\|\cdot\|$.

Now let $A$ be any matrix and suppose $A$ is orthogonal to $B$. Let $A = UP$ be a polar decomposition of $A$ with $U$ unitary and $P$ positive. Then we have

$$\|P + zU^*B\| \ge \|P\| = \|A\|$$

for all $z$. In other words, the distance of $P$ to the linear span of $U^*B$ is $\|P\|$.

Hence, by the Hahn-Banach theorem, there exists a linear functional $\phi$ on the space of matrices such that $\|\phi\| = 1$, $\phi(P) = \|P\|$, and $\phi(U^*B) = 0$. We can find a matrix $T$ such that $\phi(X) = \operatorname{tr}(XT)$ for all $X$. Since $\|\phi\| = 1$ the trace norm (the sum of singular values) of $T$ must be 1. So, $T$ has a polar decomposition

$$T = \left(\sum_{j=1}^{n} s_j u_j u_j^*\right) V,$$

where $s_j$ are singular values of $T$ in decreasing order, $\sum_{j=1}^{n} s_j = 1$, the vectors $u_j$ form an orthonormal basis for $H$, and $V$ is unitary. We have

$$\|P\| = \operatorname{tr}(PT) = \sum_{j=1}^{n} s_j \operatorname{tr}\left[Pu_j(V^*u_j)^*\right] = \sum_{j=1}^{n} s_j \langle Pu_j, V^*u_j\rangle \le \sum_{j=1}^{n} s_j \|Pu_j\| \le \sum_{j=1}^{n} s_j \|P\| = \|P\|.$$

Hence, if $k$ is the rank of $T$ (i.e., $s_k \ne 0$, but $s_{k+1} = 0$), then $\|Pu_j\| = \|P\|$ for $j = 1, \ldots, k$; and hence $Pu_j = \|P\|u_j$. From the conditions for the Cauchy-Schwarz inequality to be an equality we conclude that $V^*u_j$ is a scalar multiple of $Pu_j$, $j = 1, \ldots, k$. Obviously, these scalars must be positive, and so, $V^*u_j = u_j$ for all $j = 1, \ldots, k$. It follows that $T$ is of the form

$$T = \sum_{j=1}^{k} s_j u_j u_j^*,$$

where $u_j$ belong to the eigenspace $K$ of $P$ corresponding to its maximal eigenvalue $\|P\|$. Then $\phi(U^*B) = 0$ implies

$$\sum_{j=1}^{k} s_j \langle B^*Uu_j, u_j\rangle = 0.$$

If $Q$ is the orthoprojector on the linear span of the $u_j$, then this equality can be rewritten as

$$\sum_{j=1}^{k} s_j \langle QB^*UQu_j, u_j\rangle = 0.$$

Since the numerical range of any operator is a convex set, there exists a unit vector $x \in K$ such that

$$0 = \langle QB^*UQx, x\rangle = \langle B^*Ux, x\rangle = \langle Ux, Bx\rangle.$$

So,

$$\langle Ax, Bx\rangle = \langle UPx, Bx\rangle = \|P\|\langle Ux, Bx\rangle = 0. \qquad \square$$

Notice that orthogonality is not a symmetric relation. The special cases when A or B is the identity are of particular interest [3,4,8,10].

Theorem 1.1 says that $I$ is orthogonal to $B$ if and only if $W(B)$, the numerical range of $B$, contains 0. For another proof of this see Remark 4 of [8].

The more complicated case when $B = I$ has been important in problems related to derivations and operator approximations. In this case the theorem (in infinite dimensions) was proved by Stampfli ([10], Theorem 2). A different proof attributed to Ando [3] can be found in [4] (p. 206). It is this proof that we have adopted for the general case.

Problems of approximating an operator by a simpler one have been of interest to operator theorists [4], numerical analysts [6], and statisticians [9]. The second special result gives a formula for the distance of an operator to the class of scalar operators. We have, by definition,

$$\operatorname{dist}(A, \mathbb{C}I) = \min_{z \in \mathbb{C}} \|A + zI\|. \tag{1.1}$$

If this minimum is attained at $A_0 = A + z_0I$ then $A_0$ is orthogonal to the identity. Theorem 1.1 then says that

$$\operatorname{dist}(A, \mathbb{C}I) = \|A_0\| = \max\{|\langle A_0x, y\rangle| : \|x\| = \|y\| = 1 \text{ and } x \perp y\} = \max\{|\langle Ax, y\rangle| : \|x\| = \|y\| = 1 \text{ and } x \perp y\}. \tag{1.2}$$

This result is due to Ando [3]. We will use it to calculate the diameter of the unitary orbit of a matrix.

The *unitary orbit* of a matrix $A$ is the set of all matrices of the form $UAU^*$ where $U$ is unitary. The diameter of this set is

$$d_A = \max\{\|VAV^* - UAU^*\| : U, V \text{ unitary}\} = \max\{\|A - UAU^*\| : U \text{ unitary}\}. \tag{1.3}$$

Notice that this diameter is zero if and only if $A$ is a scalar matrix. The following theorem is, therefore, interesting.

**Theorem 1.2.** *For every matrix $A$ we have*

$$d_A = 2 \operatorname{dist}(A, \mathbb{C}I). \tag{1.4}$$

Proof. For every unitary $U$ and scalar $z$ we have

$$\|A - UAU^*\| = \|(A - zI) - U(A - zI)U^*\| \le 2\|A - zI\|.$$

So,

$$d_A \le 2 \operatorname{dist}(A, \mathbb{C}I).$$

As before we choose $A_0 = A + z_0I$ and an orthogonal pair of unit vectors $x$ and $y$ such that

$$\operatorname{dist}(A, \mathbb{C}I) = \|A_0\| = \langle A_0x, y\rangle.$$

By the condition for equality in the Cauchy-Schwarz inequality we must have $A_0x = \|A_0\|y$. We can find a unitary $U$ satisfying $Ux = x$ and $Uy = -y$. Then $UA_0U^*x = -\|A_0\|y$. We have

$$d_A = d_{A_0} \ge \|A_0x - UA_0U^*x\| = 2\|A_0\| = 2 \operatorname{dist}(A, \mathbb{C}I). \qquad \square$$
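The construction in this proof can be traced on a small case. The sketch below is an illustration under stated assumptions, not the authors' computation: it uses the hypothetical nilpotent matrix $A$ with a single 1 above the diagonal, for which $z_0 = 0$, $x = e_2$ and $y = e_1$, and the unitary $U = \operatorname{diag}(-1, 1)$. The distance to the scalars is estimated by a crude grid search, which happens to contain the minimiser $z = 0$.

```python
import math

def op_norm_2x2(m):
    """Operator norm (largest singular value) of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    frob_sq = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det_abs = abs(a * d - b * c)
    disc = max(frob_sq ** 2 - 4 * det_abs ** 2, 0.0)
    return math.sqrt((frob_sq + math.sqrt(disc)) / 2)

def matmul(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]

# Hypothetical example: A0 = A (z0 = 0), x = e2, y = e1, A0 x = ||A0|| y.
A = [[0, 1], [0, 0]]
U = [[-1, 0], [0, 1]]  # unitary with Ux = x and Uy = -y

UAUstar = matmul(matmul(U, A), adjoint(U))
gap = op_norm_2x2([[A[i][j] - UAUstar[i][j] for j in range(2)]
                   for i in range(2)])

# Crude grid estimate of dist(A, CI) = min_z ||A - zI||.
grid = [complex(u / 20, v / 20) for u in range(-40, 41) for v in range(-40, 41)]
dist_est = min(op_norm_2x2([[A[0][0] - z, A[0][1]], [A[1][0], A[1][1] - z]])
               for z in grid)

print(gap, 2 * dist_est)
```

For this matrix the single unitary $U$ already realises the diameter, so the two printed values agree.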

From (1.3) and (1.4) we have

$$\max\{\|AU - UA\| : U \text{ unitary}\} = 2 \operatorname{dist}(A, \mathbb{C}I). \tag{1.5}$$

If $X$ is any operator with $\|X\| = 1$, then $X$ can be written as $X = \frac{1}{2}(V + W)$ where $V$ and $W$ are unitary. (Use the singular value decomposition of $X$, and observe that every positive number between 0 and 1 can be expressed as $\frac{1}{2}(e^{i\theta} + e^{-i\theta})$.) Hence we have

$$\max_{\|X\| = 1} \|AX - XA\| = 2 \operatorname{dist}(A, \mathbb{C}I). \tag{1.6}$$
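The scalar identity behind the decomposition $X = \frac{1}{2}(V + W)$ can be sketched as follows. The function name and the sample singular values are hypothetical. Applying the same split to each singular value in a singular value decomposition of $X$ gives the two unitaries.

```python
import cmath
import math

def unitary_halves(c):
    """For 0 <= c <= 1 return unit-modulus v, w with c = (v + w) / 2."""
    theta = math.acos(c)
    return cmath.exp(1j * theta), cmath.exp(-1j * theta)

# Hypothetical singular values of an operator X with ||X|| = 1.
singular_values = [1.0, 0.6, 0.0]
pairs = [unitary_halves(s) for s in singular_values]

# Averaging each pair recovers the singular value (the imaginary parts cancel).
recovered = [((v + w) / 2).real for v, w in pairs]
print(recovered)
```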

Recall that the operator $\delta_A(X) = AX - XA$ on the space of matrices is called an inner derivation. The preceding remark shows that the norm of $\delta_A$ is $2 \operatorname{dist}(A, \mathbb{C}I)$. This was proved (for operators in a Hilbert space) by Stampfli [10].

The proof we have given for matrices is simpler. In Section 3 we will show how to prove the result for infinite-dimensional Hilbert spaces.

A trivial upper bound for $d_A$ is $2\|A\|$. This bound can be attained. For example, any block diagonal matrix of the form

is unitarily similar to

**[ oj0 **

A simple lower bound for $d_A$ is given in our next proposition.

**Proposition 1.3.** *Let $A$ be any matrix with singular values $s_1(A) \ge \cdots \ge s_n(A)$. Then*

$$d_A \ge s_1(A) - s_n(A). \tag{1.7}$$

Proof. Let $z$ be any complex number with polar form $z = re^{i\theta}$. Let $A = UP$ be a polar decomposition of $A$. Then

$$\|A - zI\| = \|P - zU^*\| \ge \inf\{\|P - zV\| : V \text{ unitary}\} = \inf\{\|P - rV\| : V \text{ unitary}\}.$$

By a theorem of Fan and Hoffman, the value of the last infimum is $\|P - rI\|$ (see [5], p. 276). So

$$\min_{z \in \mathbb{C}} \|A - zI\| \ge \min_{r \ge 0} \|P - rI\| = \min_{r \ge 0} \max_j |s_j - r| = \tfrac{1}{2}(s_1(A) - s_n(A)).$$

The proposition now follows from Theorem 1.2. □

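If $A$ is a positive semidefinite matrix then there is equality in (1.7). (For a Hermitian matrix that is not semidefinite the inequality can be strict: $A = \operatorname{diag}(1, -1)$ has $d_A = 2$ but $s_1(A) - s_n(A) = 0$.)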
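A small numerical comparison of the two sides of (1.7) is sketched below for two hypothetical $2 \times 2$ matrices. It evaluates $d_A$ through Theorem 1.2 as $2 \operatorname{dist}(A, \mathbb{C}I)$, with the distance estimated on a grid (in general only an upper estimate; for these examples the grid contains the minimiser). All names and parameters are assumptions made for illustration.

```python
import math

def singular_values_2x2(m):
    """Return (s1, s2) with s1 >= s2 for a 2x2 complex matrix."""
    (a, b), (c, d) = m
    frob_sq = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det_abs = abs(a * d - b * c)
    disc = max(frob_sq ** 2 - 4 * det_abs ** 2, 0.0)
    s1 = math.sqrt((frob_sq + math.sqrt(disc)) / 2)
    s2 = det_abs / s1 if s1 > 0 else 0.0  # uses s1 * s2 = |det m|
    return s1, s2

def dist_to_scalars(m, half_width=4.0, steps=80):
    """Grid estimate of min_z ||m - zI|| over a square in the complex plane."""
    pts = [-half_width + 2 * half_width * k / steps for k in range(steps + 1)]
    return min(
        singular_values_2x2([[m[0][0] - complex(u, v), m[0][1]],
                             [m[1][0], m[1][1] - complex(u, v)]])[0]
        for u in pts for v in pts)

examples = {"positive": [[3, 0], [0, 1]], "nilpotent": [[0, 1], [0, 0]]}
results = {}
for name, m in examples.items():
    s1, s2 = singular_values_2x2(m)
    # Pair (2 * dist estimate, s1 - s2): left and right sides of (1.7).
    results[name] = (2 * dist_to_scalars(m), s1 - s2)
    print(name, results[name])
```

The positive example attains equality, while the nilpotent one shows that the bound can be strict.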

**2. The Schatten norms**

For $1 \le p < \infty$, the Schatten $p$-norm of $A$ is defined as

$$\|A\|_p = \left(\sum_{j=1}^{n} (s_j(A))^p\right)^{1/p},$$

where $s_1(A) \ge \cdots \ge s_n(A)$ are the singular values of $A$.

If $1 < p < \infty$, then the norm $\|\cdot\|_p$ is Fréchet differentiable at every $A$. In this case

$$\frac{d}{dt}\bigg|_{t=0} \|A + tB\|_p^p = p \operatorname{Re} \operatorname{tr} |A|^{p-1}U^*B, \tag{2.1}$$

for every $B$, where $A = U|A|$ is a polar decomposition of $A$. Here $|A| = (A^*A)^{1/2}$.

If $p = 1$ this is true if $A$ is invertible. See [2] (Theorem 2.1) and [1] (Theorems 2.2 and 2.3).

As before, we say that $A$ is orthogonal to $B$ in the Schatten $p$-norm (for a given $1 \le p < \infty$) if

$$\|A + zB\|_p \ge \|A\|_p \quad \text{for all } z. \tag{2.2}$$

The case $p = 2$ is special. The quantity $\langle A, B\rangle = \operatorname{tr} A^*B$ defines an inner product on the space of matrices, and the norm associated with this inner product is $\|\cdot\|_2$. The condition (2.2) for orthogonality is then equivalent to the usual Hilbert space condition $\langle A, B\rangle = 0$. Our next theorem includes this as a very special case.
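For $p = 2$ the equivalence is just the Pythagorean identity in the trace inner product. The sketch below checks it on a hypothetical pair of $2 \times 2$ matrices chosen so that $\operatorname{tr} A^*B = 0$. The helper names and sample points are assumptions for this illustration.

```python
import cmath

def trace_inner(a, b):
    """tr(a* b), computed as the sum of conj(a_ij) * b_ij."""
    return sum(complex(a[i][j]).conjugate() * b[i][j]
               for i in range(2) for j in range(2))

def frob_sq(m):
    """Squared Schatten 2-norm (Frobenius norm) of a 2x2 matrix."""
    return sum(abs(m[i][j]) ** 2 for i in range(2) for j in range(2))

A = [[1, 2], [3, 4]]
B = [[2, -1], [0, 0]]  # chosen so that tr(A* B) = 1*2 + 2*(-1) = 0

samples = [r * cmath.exp(1j * t)
           for r in (0.0, 0.5, 2.0) for t in (0.0, 1.0, 2.5, 4.0)]

# When tr(A* B) = 0: ||A + zB||_2^2 = ||A||_2^2 + |z|^2 ||B||_2^2 >= ||A||_2^2.
defects = [
    abs(frob_sq([[A[i][j] + z * B[i][j] for j in range(2)] for i in range(2)])
        - (frob_sq(A) + abs(z) ** 2 * frob_sq(B)))
    for z in samples]

print(trace_inner(A, B), max(defects))
```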

**Theorem 2.1.** *Let $A$ have a polar decomposition $A = U|A|$. If for any $1 \le p < \infty$ we have*

$$\operatorname{tr} |A|^{p-1}U^*B = 0, \tag{2.3}$$

*then $A$ is orthogonal to $B$ in the Schatten $p$-norm. The converse is true for all $A$, if $1 < p < \infty$, and for all invertible $A$, if $p = 1$.*

Proof. If (2.3) is satisfied, then for all $z$

$$\operatorname{tr} |A|^p = \operatorname{tr} |A|^{p-1}(|A| + zU^*B).$$

Hence, by Hölder's inequality ([5], p. 88),

$$\operatorname{tr} |A|^p \le \||A|^{p-1}\|_q \, \||A| + zU^*B\|_p = \||A|^{p-1}\|_q \, \|A + zB\|_p = \left[\operatorname{tr} |A|^{(p-1)q}\right]^{1/q} \|A + zB\|_p = (\operatorname{tr} |A|^p)^{1/q} \|A + zB\|_p,$$

where $q$ is the index conjugate to $p$ (i.e., $1/p + 1/q = 1$). Since

$$(\operatorname{tr} |A|^p)^{1 - 1/q} = (\operatorname{tr} |A|^p)^{1/p} = \|A\|_p,$$

this shows that

$$\|A\|_p \le \|A + zB\|_p \quad \text{for all } z.$$

Conversely, if (2.2) is true, then

$$\|e^{i\theta}A + tB\|_p \ge \|e^{i\theta}A\|_p$$

for all real $t$ and $\theta$. Using the expression (2.1) we see that this implies

$$\operatorname{Re} \operatorname{tr}(|A|^{p-1}e^{-i\theta}U^*B) = 0,$$

for all $A$ if $1 < p < \infty$, and for invertible $A$ if $p = 1$. Since this is true for all $\theta$, we get (2.3). □
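To see the trace condition (2.3) at work for an exponent other than 2, the following sketch takes $p = 3$ and a hypothetical positive matrix $A = \operatorname{diag}(2, 1)$, so that $U = I$ and $|A| = A$. It picks $B$ with $\operatorname{tr} |A|^{2}B = 0$ and samples $\|A + zB\|_3$. Matrices, helper name and sample points are all assumptions for illustration.

```python
import cmath
import math

def schatten_norm_2x2(m, p):
    """Schatten p-norm of a 2x2 complex matrix from its two singular values."""
    (a, b), (c, d) = m
    frob_sq = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det_abs = abs(a * d - b * c)
    disc = max(frob_sq ** 2 - 4 * det_abs ** 2, 0.0)
    s1 = math.sqrt((frob_sq + math.sqrt(disc)) / 2)
    s2 = det_abs / s1 if s1 > 0 else 0.0
    return (s1 ** p + s2 ** p) ** (1 / p)

p = 3
# Hypothetical example: A is positive, so U = I and |A| = A = diag(2, 1).
A = [[2, 0], [0, 1]]
B = [[1, 0.5], [-2, -4]]

# Condition (2.3) for diagonal |A|: tr(|A|^(p-1) B) = 2^(p-1)*B[0][0] + B[1][1].
condition = 2 ** (p - 1) * B[0][0] + B[1][1]

norm_A = schatten_norm_2x2(A, p)
samples = [r * cmath.exp(1j * t)
           for r in (0.0, 0.05, 0.3, 1.0, 5.0)
           for t in (0.0, 0.9, 2.2, 3.6, 5.0)]
min_sampled = min(
    schatten_norm_2x2([[A[i][j] + z * B[i][j] for j in range(2)]
                       for i in range(2)], p)
    for z in samples)

print(condition, norm_A, min_sampled)
```

As with any sampled check, this only illustrates the inequality at the chosen points; the proof above is what guarantees it for all $z$.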

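The following example shows that the case $p = 1$ is exceptional. If

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},$$

then

$$\|A + zB\|_1 \ge \|A\|_1 \quad \text{for all } z.$$

However,

$$\operatorname{tr} U^*B = \operatorname{tr} B \ne 0.$$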
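This example is easy to confirm numerically. The sketch below reads the two matrices as $A = \operatorname{diag}(1, 0)$ and $B = \operatorname{diag}(0, 1)$ (an assumed reading of the example) and takes $U = I$. The helper uses the $2 \times 2$ identity $(s_1 + s_2)^2 = \|M\|_2^2 + 2|\det M|$ for the trace norm; its name and the sample points are hypothetical.

```python
import cmath
import math

def trace_norm_2x2(m):
    """Schatten 1-norm s1 + s2 of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    frob_sq = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    # (s1 + s2)^2 = s1^2 + s2^2 + 2*s1*s2 = frob_sq + 2*|det m|.
    return math.sqrt(frob_sq + 2 * abs(a * d - b * c))

A = [[1, 0], [0, 0]]
B = [[0, 0], [0, 1]]

samples = [r * cmath.exp(1j * t)
           for r in (0.0, 0.01, 0.5, 2.0) for t in (0.0, 1.3, 2.9, 4.4)]
norms = [trace_norm_2x2([[A[i][j] + z * B[i][j] for j in range(2)]
                         for i in range(2)])
         for z in samples]

trace_B = B[0][0] + B[1][1]  # equals tr(U* B) for the choice U = I

# A is orthogonal to B in the trace norm, yet the trace condition fails.
print(min(norms), trace_norm_2x2(A), trace_B)
```

Here $\|A + zB\|_1 = 1 + |z|$, so orthogonality holds even though the trace in (2.3) is 1; this is consistent with the theorem because $A$ is not invertible.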

The ideas used in our proof of Theorem 2.1 are adopted from Kittaneh [8] who restricted himself to the special case $A = I$.

**3. Remarks**

Remark 3.1. Theorem 1.1 can be extended to the infinite-dimensional case with a small modification. Let $A$ and $B$ be bounded operators on an infinite-dimensional Hilbert space $H$. Then $A$ is orthogonal to $B$ if and only if there exists a sequence $\{x_n\}$ of unit vectors such that $\|Ax_n\| \to \|A\|$, and $\langle Ax_n, Bx_n\rangle \to 0$.

Indeed, if such a sequence $\{x_n\}$ exists then

$$\|A + zB\|^2 \ge \|(A + zB)x_n\|^2 = \|Ax_n\|^2 + |z|^2\|Bx_n\|^2 + 2\operatorname{Re}(\bar{z}\langle Ax_n, Bx_n\rangle).$$

So,

$$\|A + zB\|^2 \ge \limsup \|(A + zB)x_n\|^2 \ge \|A\|^2.$$

To prove the converse we first note that Theorem 1.1 can be reformulated in the following way: if $A$ and $B$ are operators acting on a finite-dimensional Hilbert space $H$ then

$$\min_z \|A + zB\| = \max\{|\langle Ax, y\rangle| : \|x\| = \|y\| = 1 \text{ and } y \perp Bx\}.$$

It follows that for operators $A$ and $B$ acting on an infinite-dimensional Hilbert space $H$ we have

$$\min_z \|A + zB\| = \sup\{|\langle Ax, y\rangle| : \|x\| = \|y\| = 1 \text{ and } y \perp Bx\}.$$

This implication was proved in the special case when $B = I$ in [4] (p. 207). A slight modification of the proof yields the general case. Assume now that $A$ is orthogonal to $B$. Then $\min_z \|A + zB\| = \|A\|$. Therefore we can find sequences of unit vectors $\{x_n\}, \{y_n\} \subset H$ such that $\langle Ax_n, y_n\rangle \to \|A\|$ and $y_n \perp Bx_n$. It follows that $\|Ax_n\| \to \|A\|$, and consequently

$$y_n - \frac{Ax_n}{\|Ax_n\|} \to 0$$

and

$$\lim \langle Ax_n, Bx_n\rangle = \lim \|Ax_n\| \langle y_n, Bx_n\rangle = 0.$$

This completes the proof.

Remark 3.2. The statement following (1.6) about norms of derivations can also be proved for infinite-dimensional Hilbert spaces by a limiting argument.

Let $H$ be an infinite-dimensional separable Hilbert space, and let $A$ be a bounded operator on $H$. Let $\{P_n\}$ be a sequence of finite rank projections increasing to the identity. Denote by $A_n$ the finite rank operator $P_nA$ restricted to the range of $P_n$. Let $\min_{z \in \mathbb{C}} \|A_n - zI\| = \|A_n - z_nI\|$. For each $n$ we have

$$\begin{aligned}
\sup_{\|X\| \le 1} \|AX - XA\| &\ge \sup_{\|X\| \le 1} \|AP_nXP_n - P_nXP_nA\| \\
&\ge \sup_{\|X\| \le 1} \|P_n(AP_nXP_n - P_nXP_nA)P_n\| \\
&= \sup_{\|X\| \le 1} \|(P_nAP_n)(P_nXP_n) - (P_nXP_n)(P_nAP_n)\| \\
&= 2\|A_n - z_nI\|.
\end{aligned}$$

Passing to a subsequence, if necessary, assume that $z_n \to z_0$. Then

$$\lim_{n \to \infty} \|A_n - z_nI\| = \|A - z_0I\| \ge \operatorname{dist}(A, \mathbb{C}I).$$

Hence,

$$\sup_{\|X\| \le 1} \|AX - XA\| \ge 2 \operatorname{dist}(A, \mathbb{C}I).$$

Thus the norm of the derivation $\delta_A$ is equal to $2 \operatorname{dist}(A, \mathbb{C}I)$.

Remark 3.3. In view of Theorem 1.1 we are tempted to make the following conjecture. Let $\|\cdot\|$ now represent any norm on the vector space $\mathbb{C}^n$, and also the norm it induces on the space of $n \times n$ matrices acting as linear operators on $\mathbb{C}^n$. We conjecture that

$$\|A + zB\| \ge \|A\| \quad \text{for all } z$$

if and only if there exists a unit vector $x$ such that

$$\|Ax\| = \|A\| \quad \text{and} \quad \|Ax + zBx\| \ge \|Ax\| \quad \text{for all } z.$$

**Acknowledgement**

This work was begun during the first author's visit to Slovenia in September 1997. Both authors are thankful to the Slovene Ministry of Science and Technology for its support.

**References**

[1] T.J. Abatzoglou, Norm derivatives on spaces of operators, Math. Ann. 239 (1979) 129-135.

[2] J.G. Aiken, J.A. Erdos, J.A. Goldstein, Unitary approximation of positive operators, Illinois J. Math. 24 (1980) 61-72.

[3] T. Ando, Distance to the set of thin operators, unpublished report, 1972.

[4] C. Apostol, L.A. Fialkow, D.A. Herrero, D. Voiculescu, Approximation of Hilbert Space Operators II, Pitman, Boston, 1984.

[5] R. Bhatia, Matrix Analysis, Springer, New York, 1997.

[6] N.J. Higham, Matrix nearness problems and applications, in: Applications of Matrix Theory, Oxford University Press, Oxford, 1989.

[7] R.C. James, Orthogonality and linear functionals in normed linear spaces, Trans. Amer. Math. Soc. 61 (1947) 265-292.

[8] F. Kittaneh, On zero-trace matrices, Linear Algebra Appl. 151 (1991) 119-124.

[9] C.R. Rao, Matrix approximations and reduction of dimensionality in multivariate statistical analysis, in: Multivariate Analysis - V, North-Holland, Amsterdam, 1980.

[10] J.G. Stampfli, The norm of a derivation, Pacific J. Math. 33 (1970) 737-747.