**Subject: Statistics**

**Paper: Biostatistics**

**Module 39: Matched Analysis-I**

**Development Team**

Principal investigator: *Dr. Bhaswati Ganguli, Department of Statistics, University of Calcutta*

Paper co-ordinator: *Dr. Sugata SenRoy, Department of Statistics, University of Calcutta*

Content writer: *Dr. Atanu Bhattacharjee, Division of Clinical Research and Biostatistics, Malabar Cancer Centre*

Content reviewer: *Dr. Indranil Mukhopadhyay, Indian Statistical Institute, Kolkata*

**Introduction**

* Case-control studies are an appropriate and effective means of studying rare diseases.

* Matching of cases and controls is frequently adopted to control the effects of known potential confounding variables.

* The analysis of matched data needs specific statistical methods.

**Stratification and Matching**

Stratified analysis yields an adjusted estimate and a test of the risk difference between the treatment or exposure groups by adjusting for the effect of an intervening confounding factor.

* Stratified analysis is similar to using a regression model as the basis for the adjustment.

Matched analysis is an alternative approach to control for the effects of a covariate.

* Each member of a group is matched to a member of the other group with respect to the values of one or more covariates.

**Frequency Matching**

Frequency matching: The members of the comparison group are sampled within separate categories of a discrete covariate, such as sex (male/female) or decade of age (0-9 years, 10-19 years, etc.), so that the members of each group are matched within each category.

Example: The cases may be stratified by sex and decade of age. Then, within each category, such as females aged 40-49 years, a separate sample of controls is selected from the control population in that category (i.e. females aged 40-49 years), and the frequencies in the case and control groups for that category are then compared.

**Frequency Matched Study**

* Separate samples of cases and controls are observed within covariate categories.

* The goal is to get adequate numbers of subjects from each group for each stratum to provide a sufficient overall comparison between groups.

* A sufficient number of exposed and non-exposed cases and controls should be sampled within each stratum to calculate and compare disease frequency.
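The sampling scheme described above can be sketched in Python. This is only an illustration under stated assumptions: the record layout (dictionaries with a `stratum` key), the function name `frequency_match`, the `ratio` parameter and the example records are all hypothetical and do not come from the module.

```python
import random
from collections import Counter


def frequency_match(cases, control_pool, ratio=1, seed=0):
    """Sample controls within each covariate stratum in proportion to the cases.

    `cases` and `control_pool` are lists of dicts with a "stratum" key
    (a hypothetical record layout). For each stratum, up to
    ratio * (number of cases) controls are drawn without replacement.
    """
    rng = random.Random(seed)
    cases_per_stratum = Counter(case["stratum"] for case in cases)

    # Group the available controls by stratum.
    pool_by_stratum = {}
    for control in control_pool:
        pool_by_stratum.setdefault(control["stratum"], []).append(control)

    selected = []
    for stratum, n_cases in cases_per_stratum.items():
        available = pool_by_stratum.get(stratum, [])
        k = min(len(available), ratio * n_cases)
        selected.extend(rng.sample(available, k))
    return selected


# Hypothetical records, for illustration only.
cases = [
    {"id": 1, "stratum": "F 40-49"},
    {"id": 2, "stratum": "F 40-49"},
    {"id": 3, "stratum": "M 30-39"},
]
control_pool = [
    {"id": 101, "stratum": "F 40-49"},
    {"id": 102, "stratum": "F 40-49"},
    {"id": 103, "stratum": "F 40-49"},
    {"id": 104, "stratum": "M 30-39"},
    {"id": 105, "stratum": "M 30-39"},
    {"id": 106, "stratum": "M 20-29"},
]
matched_controls = frequency_match(cases, control_pool)
print(Counter(c["stratum"] for c in matched_controls))
```

Controls in strata that contain no cases (here the hypothetical "M 20-29" record) are never selected, which is what keeps the covariate distribution of the controls aligned with that of the cases.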

**Odds Ratio Estimation**

Table 1: 2×2 contingency table from a matched analysis

| | Control exposed | Control not exposed | Total |
|---|---|---|---|
| Case exposed | a | b | a+b |
| Case not exposed | c | d | c+d |
| Total | a+c | b+d | T |

**Odds Ratio Estimation**

The odds ratio (OR) is invariant to study design.

$E = 1$ for exposed, $0$ for not exposed

$D = 1$ for diseased, $0$ for not diseased

$$\operatorname{logit} P[D = 1 \mid E] = \alpha_0 + \beta E$$

$$\operatorname{logit} P[E = 1 \mid D] = \alpha_0^{*} + \beta D$$

To eliminate the nuisance parameter $\alpha_0^{*}$, we usually condition on the marginal table, i.e. the total number of exposed subjects in a pair.

**Odds Ratio Estimation**

Consider each pair as a 2×2 table.

| Pair 1 | Control exposed | Control not exposed | Total |
|---|---|---|---|
| Case exposed | 1 | 0 | 1 |
| Case not exposed | 0 | 0 | 0 |
| Total | 1 | 0 | 1 |

| Pair 2 | Control exposed | Control not exposed | Total |
|---|---|---|---|
| Case exposed | 0 | 0 | 0 |
| Case not exposed | 0 | 1 | 1 |
| Total | 0 | 1 | 1 |

| Pair 3 | Control exposed | Control not exposed | Total |
|---|---|---|---|
| Case exposed | 0 | 1 | 1 |
| Case not exposed | 0 | 0 | 0 |
| Total | 0 | 1 | 1 |

| Pair 4 | Control exposed | Control not exposed | Total |
|---|---|---|---|
| Case exposed | 0 | 0 | 0 |
| Case not exposed | 1 | 0 | 1 |
| Total | 1 | 0 | 1 |
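As a rough sketch of how the four pair types above aggregate into the cells a, b, c and d of the matched 2×2 table, the Python snippet below tallies pairs coded as `(case_exposed, control_exposed)`. The coding, the function name `tally_pairs` and the example list of pairs are assumptions made for illustration, not part of the module.

```python
from collections import Counter


def tally_pairs(pairs):
    """Return (a, b, c, d) for pairs coded as (case_exposed, control_exposed).

    a: both exposed, b: only the case exposed,
    c: only the control exposed, d: neither exposed.
    """
    counts = Counter(pairs)
    a = counts[(1, 1)]
    b = counts[(1, 0)]
    c = counts[(0, 1)]
    d = counts[(0, 0)]
    return a, b, c, d


# Hypothetical data: one pair of each type shown above, plus one extra
# pair in which only the case is exposed.
example_pairs = [(1, 1), (0, 0), (1, 0), (0, 1), (1, 0)]
a, b, c, d = tally_pairs(example_pairs)
print(a, b, c, d)
```

Only the discordant counts b and c carry information about the odds ratio in the conditional analysis; the concordant pairs (a and d) drop out once we condition on the number of exposed subjects in each pair.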

**Odds Ratio Estimation**

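Now, write $\psi = e^{\beta}$ for the odds ratio. Given that exactly one member of a pair is exposed, the conditional probability that the exposed member is the case is

$$\frac{\dfrac{e^{\alpha_0^{*}+\beta}}{1+e^{\alpha_0^{*}+\beta}}\cdot\dfrac{1}{1+e^{\alpha_0^{*}}}}{\dfrac{e^{\alpha_0^{*}+\beta}}{1+e^{\alpha_0^{*}+\beta}}\cdot\dfrac{1}{1+e^{\alpha_0^{*}}}+\dfrac{1}{1+e^{\alpha_0^{*}+\beta}}\cdot\dfrac{e^{\alpha_0^{*}}}{1+e^{\alpha_0^{*}}}} = \frac{e^{\beta}}{1+e^{\beta}} = \frac{\psi}{1+\psi} \qquad (1)$$

and the conditional probability that the exposed member is the control is $1/(1+\psi)$. Thus, with $b$ pairs in which only the case is exposed and $c$ pairs in which only the control is exposed, the conditional likelihood is

$$L = \left(\frac{\psi}{\psi+1}\right)^{b}\left(\frac{1}{1+\psi}\right)^{c} \qquad (2)$$

$$\log L = b\,[\log\psi - \log(\psi+1)] - c\,\log(1+\psi) \qquad (3)$$

$$\log L = b\,\log\psi - (b+c)\,\log(\psi+1) \qquad (4)$$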
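The cancellation of the nuisance parameter in the conditional probability above can be checked numerically. The following is a minimal sketch in Python; the function names and the chosen values of the intercept and of the odds ratio are arbitrary assumptions used only for the check.

```python
import math


def expit(x):
    """Inverse logit: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_case_is_exposed_member(alpha_star, beta):
    """P(case exposed, control unexposed | exactly one member exposed)."""
    p_case = expit(alpha_star + beta)  # P[E = 1 | D = 1]
    p_control = expit(alpha_star)      # P[E = 1 | D = 0]
    case_only = p_case * (1.0 - p_control)
    control_only = (1.0 - p_case) * p_control
    return case_only / (case_only + control_only)


psi = 2.5                  # assumed odds ratio, illustration only
beta = math.log(psi)
for alpha_star in (-2.0, 0.0, 1.5):  # arbitrary nuisance-parameter values
    value = prob_case_is_exposed_member(alpha_star, beta)
    print(alpha_star, value, psi / (1.0 + psi))
```

For every value of the intercept the conditional probability agrees with $\psi/(1+\psi)$, which is why conditioning on the number of exposed subjects per pair removes $\alpha_0^{*}$ from the likelihood.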

**Odds Ratio Estimation**

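Thus,

$$\frac{\partial \log L}{\partial \psi} = \frac{b}{\psi} - \frac{b+c}{1+\psi} = 0 \qquad (5)$$

$$b + b\psi - b\psi - c\psi = 0 \qquad (6)$$

$$\hat{\psi} = \frac{b}{c} = \text{MLE} \qquad (7)$$

The standard error of $\hat{\psi}$ may be derived from the inverse information matrix.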
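As a hedged illustration of the last step, differentiating the log-likelihood in (4) twice and evaluating at $\hat{\psi} = b/c$ gives an information of $c^{3}/\{b(b+c)\}$, so that $V(\hat{\psi}) \approx b(b+c)/c^{3}$ and, on the log scale, $V(\log\hat{\psi}) \approx 1/b + 1/c$. The Python sketch below computes these quantities; the function name, the Wald-type interval on the log scale and the counts used are assumptions for illustration, not results from the module.

```python
import math


def conditional_odds_ratio(b, c, z=1.96):
    """Conditional MLE of the odds ratio from discordant pair counts.

    b: pairs with only the case exposed; c: pairs with only the control exposed.
    Returns (psi_hat, se_psi, se_log_psi, (lower, upper)), where the interval
    is a Wald interval computed on the log scale and back-transformed.
    """
    if b <= 0 or c <= 0:
        raise ValueError("both discordant counts must be positive")
    psi_hat = b / c
    se_psi = math.sqrt(b * (b + c) / c**3)   # from the inverse information
    se_log_psi = math.sqrt(1.0 / b + 1.0 / c)
    lower = math.exp(math.log(psi_hat) - z * se_log_psi)
    upper = math.exp(math.log(psi_hat) + z * se_log_psi)
    return psi_hat, se_psi, se_log_psi, (lower, upper)


# Hypothetical discordant counts, for illustration only.
psi_hat, se_psi, se_log_psi, ci = conditional_odds_ratio(b=25, c=10)
print(psi_hat, se_psi, se_log_psi, ci)
```

Working on the log scale is a common design choice because the sampling distribution of $\log\hat{\psi}$ is usually closer to normal than that of $\hat{\psi}$ itself.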

**Frequency Matching Example**

Table 2: Smoker and non-smoker data

| Age (years) | Cases: smoker | Cases: non-smoker | Controls: smoker | Controls: non-smoker |
|---|---|---|---|---|
| 20-29 | 16 | 2 | 12 | 5 |
| 30-39 | 18 | 4 | 22 | 15 |
| 40-49 | 20 | 6 | 18 | 15 |

* The sampling procedure was to first select all 66 cases and stratify them by age interval.

* Within each age stratum, controls were then selected, and exposure status was ascertained for all cases and controls.

* This yields 3 independent strata, with an independent $2 \times 2$ table within each stratum.

* Apply a stratified analysis (such as a Mantel-Haenszel analysis).
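One way to carry out such a stratified analysis is the Mantel-Haenszel summary odds ratio, $\sum_k (a_k d_k / n_k) \big/ \sum_k (b_k c_k / n_k)$, where in stratum $k$ the counts are $a_k$ (case smokers), $b_k$ (case non-smokers), $c_k$ (control smokers) and $d_k$ (control non-smokers), and $n_k$ is the stratum total. The Python sketch below applies it to the smoker data; the function name and data layout are assumptions, and the module itself does not report a numerical result.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio over a list of 2x2 tables.

    Each table is (a, b, c, d) =
    (case exposed, case unexposed, control exposed, control unexposed).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Counts from the smoker and non-smoker table, one 2x2 table per age stratum.
strata = {
    "20-29": (16, 2, 12, 5),
    "30-39": (18, 4, 22, 15),
    "40-49": (20, 6, 18, 15),
}

total_cases = sum(a + b for a, b, _, _ in strata.values())
or_mh = mantel_haenszel_or(list(strata.values()))
stratum_ors = {age: (a * d) / (b * c) for age, (a, b, c, d) in strata.items()}

print(total_cases)   # 66
print(stratum_ors)
print(round(or_mh, 2))
```

For these counts the summary odds ratio is about 2.98, which lies between the smallest and largest stratum-specific odds ratios.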

**McNemar’s Large Sample Test**

* Large-sample tests can be derived from the multinomial/binomial distribution through a normal approximation.

**McNemar’s Large Sample Test**

Let the cell proportion for the $ij$th cell be $p_{ij} = n_{ij}/N$, with $E(p_{ij}) = \pi_{ij}$, $i = 1,2$; $j = 1,2$. Then, for large $N$,

$$p = (p_{11}, p_{12}, p_{21}, p_{22})^{T} \sim N(\pi, \Sigma), \qquad \pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})^{T},$$

$$\Sigma = \frac{1}{N}
\begin{pmatrix}
\pi_{11}(1-\pi_{11}) & -\pi_{11}\pi_{12} & -\pi_{11}\pi_{21} & -\pi_{11}\pi_{22} \\
-\pi_{11}\pi_{12} & \pi_{12}(1-\pi_{12}) & -\pi_{12}\pi_{21} & -\pi_{12}\pi_{22} \\
-\pi_{11}\pi_{21} & -\pi_{12}\pi_{21} & \pi_{21}(1-\pi_{21}) & -\pi_{21}\pi_{22} \\
-\pi_{11}\pi_{22} & -\pi_{12}\pi_{22} & -\pi_{21}\pi_{22} & \pi_{22}(1-\pi_{22})
\end{pmatrix}.$$

**McNemar’s Large Sample Test**

To test $H_0: \pi_{12} = \pi_{21}$, we wish to apply a test statistic based on the difference in the discordant proportions $p_{12} - p_{21}$, of the form

$$z = \frac{p_{12} - p_{21}}{\sqrt{\hat{V}(p_{12} - p_{21} \mid H_0)}} \qquad (8)$$

with the variance of the difference evaluated under the null hypothesis. In general,

$$V(p_{12} - p_{21}) = \frac{(\pi_{12} + \pi_{21}) - (\pi_{12} - \pi_{21})^{2}}{N}$$
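The variance of the difference follows from the multinomial covariance matrix as $w^{T} \Sigma w$ with contrast $w = (0, 1, -1, 0)^{T}$. As a small numerical check, the Python sketch below builds $\Sigma$ and compares the quadratic form with the closed-form expression; the cell probabilities and sample size are hypothetical values chosen only for the check.

```python
def multinomial_cov(pi, n):
    """Covariance matrix of multinomial cell proportions for sample size n."""
    k = len(pi)
    return [
        [(pi[i] * (1.0 - pi[i]) if i == j else -pi[i] * pi[j]) / n
         for j in range(k)]
        for i in range(k)
    ]


def contrast_variance(cov, w):
    """Variance of the linear contrast w' p, i.e. the quadratic form w' cov w."""
    k = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(k) for j in range(k))


# Hypothetical cell probabilities in the order (11, 12, 21, 22); they sum to 1.
pi = [0.40, 0.25, 0.15, 0.20]
N = 200

cov = multinomial_cov(pi, N)
v_quadratic_form = contrast_variance(cov, [0, 1, -1, 0])
v_closed_form = ((pi[1] + pi[2]) - (pi[1] - pi[2]) ** 2) / N

print(v_quadratic_form, v_closed_form)
```

Both routes give the same value, since $\pi_{12}(1-\pi_{12}) + \pi_{21}(1-\pi_{21}) + 2\pi_{12}\pi_{21} = (\pi_{12}+\pi_{21}) - (\pi_{12}-\pi_{21})^{2}$.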

**McNemar’s Large Sample Test**

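If $\pi_{12} = \pi_{21}$ under $H_0$, then $\pi_{12} = \pi_{21} = \pi$ with $\pi = \pi_d/2$, where $\pi_d = \pi_{12} + \pi_{21}$ is the probability of a discordant pair. Then

$$V(p_{12} - p_{21} \mid H_0) = \frac{\pi_{12} + \pi_{21}}{N} = \frac{\pi_d}{N} \simeq \frac{p_{12} + p_{21}}{N}$$

and

$$Z_{\text{Modified}} = \frac{p_{12} - p_{21}}{\sqrt{(p_{12} + p_{21})/N}} = \frac{f - g}{\sqrt{f + g}},$$

where $f = n_{12}$ and $g = n_{21}$ are the discordant cell counts, is asymptotically normally distributed under $H_0$.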
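A minimal sketch of this large-sample test in Python is given below. The function name, the two-sided p-value based on the standard normal distribution and the discordant counts in the example are assumptions for illustration; no continuity correction is applied.

```python
import math


def mcnemar_large_sample(f, g):
    """McNemar's large-sample test from the discordant counts f = n12, g = n21.

    Returns (z, two_sided_p), with z = (f - g) / sqrt(f + g) referred to the
    standard normal distribution.
    """
    if f + g == 0:
        raise ValueError("at least one discordant pair is required")
    z = (f - g) / math.sqrt(f + g)
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical discordant counts, for illustration only.
z, p_value = mcnemar_large_sample(f=25, g=10)
print(z, p_value)
```

The concordant cells do not enter the statistic; only the split of the discordant pairs between the two off-diagonal cells matters.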