Subject: Statistics
Paper: Multivariate Analysis
Module: Factor Analysis: Estimation Techniques 1
Development Team

Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Paper co-ordinator: Dr. Sugata SenRoy, Professor, Department of Statistics, University of Calcutta
Content writer: Souvik Bandyopadhyay, Senior Lecturer, Indian Institute of Public Health, Hyderabad
Content reviewer: Dr. Kalyan Das, Professor, Department of Statistics, University of Calcutta
Factor Model

- X = (X1, X2, ..., Xm)' : the observed random vector.
- F = (F1, F2, ..., Fp)' : the set of common factors.
- ε = (ε1, ε2, ..., εm)' : the specific factors (or errors).
- Model: X = µ + LF + ε, where L is the m × p matrix of factor loadings.
- p is usually much smaller than m.
- Assume that E(F) = 0 and Cov(F) = Ip.
- Also E(ε) = 0 and Cov(ε) = Ψ = diag((ψjj)).
- Cov(F, ε) = 0.
Implication

The assumptions lead to

- E(X) = µ + L E(F) + E(ε) = µ.
- Σ = Cov(X) = E[(X − µ)(X − µ)']
            = E[(LF + ε)(LF + ε)']
            = L E(FF')L' + E(εF')L' + L E(Fε') + E(εε')
            = LL' + Ψ.
- Also (X − µ)F' = (LF + ε)F' = LFF' + εF',
  ⇒ Cov(X, F) = E[(X − µ)F'] = L E(FF') + E(εF') = L.
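The identity Σ = LL' + Ψ can be checked numerically by simulating from the model. The loading matrix and specific variances below are arbitrary illustrative values, not taken from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical loadings (m = 4 observed variables, p = 2 factors)
L = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.8]])
Psi = np.diag([0.18, 0.27, 0.47, 0.35])  # specific variances psi_jj
mu = np.zeros(4)

# Simulate X = mu + L F + eps with Cov(F) = I_p and Cov(eps) = Psi
n = 200_000
F = rng.standard_normal((n, 2))
eps = rng.standard_normal((n, 4)) * np.sqrt(np.diag(Psi))
X = mu + F @ L.T + eps

# The implied covariance LL' + Psi should match the sample covariance of X
Sigma = L @ L.T + Psi
S = np.cov(X, rowvar=False)
print(np.abs(S - Sigma).max())  # small, shrinking as n grows
```

The sampling error decays at the usual O(1/√n) rate, so the agreement tightens as n grows.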
Estimation of the parameters

- The parameters involved are (L, Ψ, µ).
- There are mp unknowns in L and m unknowns in each of Ψ and µ, a total of m(p + 2) parameters.
- Let X1, X2, ..., Xn be a sample of size n.
- µ is estimated as µ̂ = X̄ = (1/n) Σ_{i=1}^{n} Xi.
- Henceforth we will work with Xi − X̄, so that we have mp + m parameters of L and Ψ to estimate.
- Usually one of two estimation techniques is used:
  - Principal Component Method
  - Maximum Likelihood Method
Principal Component Method

- Σ = Cov(X) is positive definite.
- Let λ1 ≥ λ2 ≥ ... ≥ λm be the eigenvalues of Σ, with p1, p2, ..., pm the corresponding eigenvectors.
- Then the spectral decomposition of Σ can be obtained as
  Σ = λ1 p1 p1' + λ2 p2 p2' + ... + λm pm pm'
    = [√λ1 p1  √λ2 p2  ...  √λm pm] [√λ1 p1  √λ2 p2  ...  √λm pm]'
    = L̃L̃'.
- Here we have a representation of Σ similar to that of the model, except that Ψ = 0, since we have m factors and hence no reduction.
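The decomposition Σ = L̃L̃' can be sketched directly with an eigendecomposition; the covariance matrix below is an illustrative stand-in:

```python
import numpy as np

# A small positive definite covariance matrix (illustrative values)
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

# eigh returns eigenvalues in ascending order; reverse to get lam1 >= ... >= lam_m
lam, P = np.linalg.eigh(Sigma)
lam, P = lam[::-1], P[:, ::-1]

# L_tilde has columns sqrt(lambda_j) * p_j, so Sigma = L_tilde L_tilde'
L_tilde = P * np.sqrt(lam)
assert np.allclose(L_tilde @ L_tilde.T, Sigma)
```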
Principal Component Method

- The principal component method suggests dropping the last (m − p) columns of L̃ and approximating Σ as
  Σ ≈ [√λ1 p1  √λ2 p2  ...  √λp pp] [√λ1 p1  √λ2 p2  ...  √λp pp]' = LL'.
- Since the λj's are ordered, this means dropping the contributions of the smallest (m − p) eigenvalues.
Principal Component Method

- The difference matrix is
  L̃L̃' − LL' = [√λp+1 pp+1  ...  √λm pm] [√λp+1 pp+1  ...  √λm pm]'.
- The diagonal elements of Ψ are then obtained from the diagonal elements of L̃L̃' − LL', i.e.
  ψjj = σjj − Σ_{k=1}^{p} ljk².
- Ψ is then constructed by taking ψjj on the diagonal and 0's elsewhere.
Principal Component Method

How to use the technique?

- Estimate Σ by the sample covariance matrix
  S = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)(Xi − X̄)'.
- Obtain the eigenvalues and eigenvectors of S:
  λ̂1 ≥ λ̂2 ≥ ... ≥ λ̂m and p̂1, p̂2, ..., p̂m.
- Then
  L̂ = [√λ̂1 p̂1  √λ̂2 p̂2  ...  √λ̂p p̂p]  and  ψ̂jj = sjj − Σ_{k=1}^{p} l̂jk²,
  with Ψ̂ = diag((ψ̂jj)).
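The steps above translate into a short routine; the simulated data and its loading matrix are illustrative assumptions, not values from the source:

```python
import numpy as np

def pc_factor_estimates(S, p):
    """Principal component estimates of the loading matrix L and the
    specific-variance matrix Psi from a sample covariance matrix S."""
    lam, vecs = np.linalg.eigh(S)
    lam, vecs = lam[::-1], vecs[:, ::-1]              # descending eigenvalues
    L_hat = vecs[:, :p] * np.sqrt(lam[:p])            # columns sqrt(lam_j) * p_j
    psi_hat = np.diag(S) - (L_hat ** 2).sum(axis=1)   # s_jj - sum_k l_jk^2
    return L_hat, np.diag(psi_hat)

# Illustrative use on simulated data
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.6, 0.8, 0.0, 0.0],
              [0.5, 0.2, 0.7, 0.0],
              [0.3, 0.3, 0.3, 0.8]])
X = rng.standard_normal((500, 4)) @ A.T
S = np.cov(X, rowvar=False)
L_hat, Psi_hat = pc_factor_estimates(S, p=2)

# By construction, diag(L_hat L_hat' + Psi_hat) reproduces diag(S) exactly
assert np.allclose(np.diag(L_hat @ L_hat.T + Psi_hat), np.diag(S))
```

The exact reproduction of the diagonal is the defining feature of this method: all lack of fit is pushed into the off-diagonal entries.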
How good is the fit?

- The residual matrix is S − (L̂L̂' + Ψ̂).
- Notice that this has diagonal elements 0, by construction of Ψ̂.
- For a good fit the off-diagonal elements of the residual matrix must be small.
- Since
  sum of squared entries of S − (L̂L̂' + Ψ̂) ≤ λ̂p+1² + λ̂p+2² + ... + λ̂m²,
  small λ̂p+1, λ̂p+2, ..., λ̂m ensure that the fit is good.
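The residual check can be scripted as follows; the matrix S below is a random positive definite stand-in for a sample covariance matrix, used only to exercise the bound:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
S = A @ A.T / 6          # positive definite stand-in for a sample covariance
p = 2

lam, vecs = np.linalg.eigh(S)
lam, vecs = lam[::-1], vecs[:, ::-1]
L_hat = vecs[:, :p] * np.sqrt(lam[:p])
Psi_hat = np.diag(np.diag(S) - (L_hat ** 2).sum(axis=1))

resid = S - (L_hat @ L_hat.T + Psi_hat)
assert np.allclose(np.diag(resid), 0.0)   # diagonal is zero by construction

# Sum of squared residual entries is bounded by the sum of squares
# of the discarded eigenvalues lam_{p+1}, ..., lam_m
ssq = (resid ** 2).sum()
assert ssq <= (lam[p:] ** 2).sum() + 1e-12
```

The bound holds because the residual equals the spectral tail Σ_{j>p} λ̂j p̂j p̂j' with its diagonal zeroed out, and zeroing the diagonal can only reduce the sum of squares.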
The choice of p

- But what should be the choice of p?
- It is obvious from the spectral decomposition that the contribution of the 1st factor to sjj, the variability of Xj, is l̂j1².
- Observe that the total variability of the observed values is
  tr S = s11 + s22 + ... + smm.
- Thus the contribution of the 1st factor to the total variability is
  l̂11² + l̂21² + ... + l̂m1² = (√λ̂1 p̂1)'(√λ̂1 p̂1) = λ̂1.
- Thus the proportion of variability explained by the 1st factor is
  λ̂1/(s11 + s22 + ... + smm).
The choice of p

- In general, the proportion of variability explained by the jth factor is
  λ̂j/(s11 + s22 + ... + smm).
- The cumulative proportion of variability explained by the first p factors is
  (λ̂1 + ... + λ̂p)/(s11 + s22 + ... + smm).
- Choose p such that this proportion is high (say 90% or 95%).
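This rule is easy to automate; the diagonal covariance matrix below is a toy example with a known spectrum, chosen only to make the cumulative proportions easy to verify by hand:

```python
import numpy as np

def choose_p(S, threshold=0.90):
    """Smallest p whose first p eigenvalues explain at least `threshold`
    of the total variability tr(S)."""
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]
    cum = np.cumsum(lam) / np.trace(S)
    return int(np.argmax(cum >= threshold)) + 1, cum

# Toy covariance: eigenvalues 4, 2, 1, 0.5, 0.5, so tr(S) = 8 and the
# cumulative proportions are 0.5, 0.75, 0.875, 0.9375, 1.0
S = np.diag([4.0, 2.0, 1.0, 0.5, 0.5])
p, cum = choose_p(S, threshold=0.90)
print(p)  # 4, the first p with cumulative proportion >= 0.90
```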
Example (Johnson & Wichern, 2009)

- In a consumer preference study, customers were asked to rate the attributes of a new product on a 7-point scale. The attributes were Taste (X1), Good buy for money (X2), Flavour (X3), Suitable for snack (X4), and Provides lots of energy (X5). The correlation matrix (R) thus obtained was

        1.00  0.02  0.96  0.42  0.01
        0.02  1.00  0.13  0.71  0.85
  R =   0.96  0.13  1.00  0.50  0.11
        0.42  0.71  0.50  1.00  0.79
        0.01  0.85  0.11  0.79  1.00

- The first eigenvalue of R is λ̂1 = 2.85.
- Since tr(R) = 5, the first factor explains 2.85/5 = 57% of the variability.
Example (Johnson & Wichern, 2009)

- The second eigenvalue of R is λ̂2 = 1.81.
- Together, the first two factors explain (2.85 + 1.81)/5 = 93.2% of the variability.
- So one can work with 2 factors instead of the 5 variables.
- The factor loadings, communalities and uniquenesses are obtained as

         F1      F2     hj²    ψjj
  X1    0.56    0.82    0.98   0.02
  X2    0.78   -0.53    0.88   0.12
  X3    0.65    0.75    0.98   0.02
  X4    0.94   -0.10    0.89   0.11
  X5    0.80   -0.54    0.93   0.07
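The example's eigenvalues and variance proportions can be reproduced from the stated correlation matrix (loadings are recovered up to sign flips of the eigenvectors, so they may differ in sign from the table):

```python
import numpy as np

# Correlation matrix from the consumer preference example
R = np.array([[1.00, 0.02, 0.96, 0.42, 0.01],
              [0.02, 1.00, 0.13, 0.71, 0.85],
              [0.96, 0.13, 1.00, 0.50, 0.11],
              [0.42, 0.71, 0.50, 1.00, 0.79],
              [0.01, 0.85, 0.11, 0.79, 1.00]])

lam, vecs = np.linalg.eigh(R)
lam, vecs = lam[::-1], vecs[:, ::-1]
print(np.round(lam[:2], 2))                 # approximately [2.85, 1.81]
print(round((lam[0] + lam[1]) / np.trace(R), 3))   # approximately 0.932

# Two-factor loadings, communalities and uniquenesses
L_hat = vecs[:, :2] * np.sqrt(lam[:2])
h2 = (L_hat ** 2).sum(axis=1)               # communalities h_j^2
psi = 1 - h2                                # uniquenesses psi_jj
```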
Principal Factor Solution

- The principal factor solution is a technique often used to solve for the eigenvalues and eigenvectors.
- Notice that the diagonal elements of R should satisfy 1 = hj² + ψj, i.e. hj² = 1 − ψj.
- Suppose ψj = ψj*.
- Then, with hj*² = 1 − ψj*, write the reduced correlation matrix R* as

        h1*²   r12    ...   r1m
        r21    h2*²   ...   r2m
        ...    ...    ...   ...
        rm1    rm2    ...   hm*²
Principal Factor Solution

- Use the principal component method on R* to get
  L* = [√λ1* p1*  √λ2* p2*  ...  √λm* pm*],
  where the λj*'s and pj*'s are respectively the eigenvalues and eigenvectors of R*.
- Also ψjj* = 1 − Σ_{k=1}^{p} l̂jk*².
- Updating with this new ψjj*, iterate till convergence.
- A possible initial solution is
  hj*² = 1 − ψj* = 1 − 1/r^jj, where the r^jj are the diagonal elements of R⁻¹.
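The iteration can be sketched as below; the small correlation matrix at the end is an illustrative assumption, and the clipping of small negative eigenvalues of the reduced matrix is a practical safeguard not discussed in the source:

```python
import numpy as np

def principal_factor(R, p, n_iter=50, tol=1e-8):
    """Iterated principal factor solution: replace the diagonal of R by
    communality estimates, extract p factors, update, and repeat."""
    # Initial communalities: squared multiple correlations 1 - 1/r^jj,
    # where r^jj are the diagonal elements of R inverse
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    L = None
    for _ in range(n_iter):
        R_star = R.copy()
        np.fill_diagonal(R_star, h2)              # reduced correlation matrix
        lam, vecs = np.linalg.eigh(R_star)
        lam, vecs = lam[::-1], vecs[:, ::-1]
        # R_star need not be p.s.d.; clip small negative eigenvalues
        L = vecs[:, :p] * np.sqrt(np.clip(lam[:p], 0.0, None))
        h2_new = (L ** 2).sum(axis=1)             # updated communalities
        if np.max(np.abs(h2_new - h2)) < tol:
            h2 = h2_new
            break
        h2 = h2_new
    return L, 1.0 - h2                            # loadings and uniquenesses

# Illustrative run on a small correlation matrix
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L, psi = principal_factor(R, p=1)
```

Note that, unlike the principal component method, the iterated solution can produce uniquenesses outside [0, 1] (Heywood cases) for some data, which is one reason the iteration is usually monitored.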
Summary

- Methods of estimation of factor models are introduced.
- The Principal Component technique of estimating the factor model parameters is discussed.
- A numerical illustration is used to explain the method.
- The principal factor solution technique is discussed.