Subject: Statistics
Paper: Multivariate Analysis
Module: Introduction to Multivariate Analysis
Development Team
Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta
Paper co-ordinator: Dr. Sugata SenRoy, Professor, Department of Statistics, University of Calcutta
Content writer: Souvik Bandyopadhyay, Senior Lecturer, Indian Institute of Public Health, Hyderabad
Content reviewer: Dr. Kalyan Das, Professor, Department of Statistics, University of Calcutta
What is Multivariate Analysis?
Multivariate analysis is the simultaneous study of several variables.
I Rather than considering each variable individually, we consider all the variables under study together.
I Formally, in multivariate analysis, we have a set of m related variables
$X = (X_1, X_2, \ldots, X_m)'$ which are studied together.
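As a minimal sketch of this set-up, the observations on n individuals can be stored as an n × m table whose rows are realisations of the vector X. The variable names and all numbers below are invented for illustration and do not come from the slides.

```python
# Hypothetical data: n = 4 individuals, m = 3 variables
# (systolic BP, diastolic BP, pulse rate). All values are invented.
data = [
    [120.0, 80.0, 72.0],
    [130.0, 85.0, 76.0],
    [110.0, 70.0, 68.0],
    [140.0, 90.0, 80.0],
]

n = len(data)       # number of individuals
m = len(data[0])    # number of variables studied together

# Each row is one realisation of X = (X1, ..., Xm)'.
# The sample mean vector averages each variable over the n individuals.
mean_vector = [sum(row[j] for row in data) / n for j in range(m)]

print(n, m)
print(mean_vector)
```

The mean vector is the simplest joint summary: one entry per variable, all computed from the same set of individuals.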
Why Multivariate Analysis?
I A major advantage of multivariate analysis is that, unlike univariate studies, it takes into account the interdependences between the variables under study and hence is more informative.
I However, this also makes the analyses more complex.
I In spite of this, a multivariate analysis of the m variables is most often preferred to m separate univariate analyses.
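The point about interdependences can be illustrated with a small sketch. Two invented data sets have exactly the same values for each variable taken separately, so separate univariate analyses cannot tell them apart, yet the variables are related in opposite ways. The Pearson correlation is used here only as one convenient measure of interdependence.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented data: the second variable holds the same values in both data
# sets, only paired differently with the first variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_same_order = [2.0, 4.0, 6.0, 8.0, 10.0]
y_reversed = [10.0, 8.0, 6.0, 4.0, 2.0]

# Univariate view: the values of the second variable are identical.
same_marginals = sorted(y_same_order) == sorted(y_reversed)

# Joint view: the relationship with x is completely different.
r_a = correlation(x, y_same_order)
r_b = correlation(x, y_reversed)

print(same_marginals, r_a, r_b)
```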
Why Multivariate Analysis?
I Most univariate methods can be extended to multivariate methods, at the cost of more complexity but with more informative results.
I However, there are several techniques which are peculiar to multivariate analysis. These address problems which are
I either too trivial in univariate studies
I or do not arise in univariate problems
I It is mainly with this latter class of techniques, those peculiar to multivariate analysis, that we will be concerned here.
Some Examples
Let us next look at a few examples of multivariate data in different fields :
I Social Science : Characteristics of an individual like (Gender, Age, Nationality)
I Climatology : Weather on a particular day like (Minimum temperature, Maximum temperature, Rainfall, Humidity)
I Economics : Management of a firm like (Input costs, Production, Profit)
I Socio-Demographic : Profile of a country like (GDP, Life expectancy, Literacy rate)
Some Examples
There can be several examples from Health Sciences :
I General Health : Health profile of an individual like (Systolic BP, Diastolic BP, Pulse rate)
I Pathology : The pathological profile of a patient like (Blood sugar level, Uric acid concentration, Hemoglobin count)
I Administrative : Day-to-day administration in a hospital (Admissions, Operations, Discharges, Deaths)
I Pharmaceutical : Quantity of sales per day in a pharmacy (Drug A, Drug B, ..., Drug Z)
Visualization
Unlike univariate, bivariate or trivariate data, multivariate data with m ≥ 4 variables cannot be plotted directly, since at most three coordinate axes can be drawn. Even for trivariate data (as shown in the diagram), the plots are often difficult to understand.
Example
Problem: Each face has several features. It is impossible to quantify and then plot these features to analyse their characteristics.
Different aspects of Multivariate Analysis
A major aspect of multivariate analysis is to extend the univariate results to the multivariate set-up.
This is mostly done to study the inter-relationships between the variables under study, which are lost in univariate studies. Hence the emphasis here is on joint rather than marginal studies.
I This may mean the extension of the Binomial or Normal distributions to their multivariate counterparts.
I It can also mean the extensions of the inference techniques to multivariate data. This can relate both to the estimation and hypothesis testing problems.
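For the Normal case, the standard form of the m-variate Normal density is sketched below. The symbols $\mu$ (an m × 1 mean vector) and $\Sigma$ (an m × m positive definite covariance matrix) are notation introduced here for illustration and do not appear in the slides.

```latex
f(x) = (2\pi)^{-m/2} \, |\Sigma|^{-1/2}
       \exp\!\left\{ -\tfrac{1}{2} \, (x - \mu)' \, \Sigma^{-1} (x - \mu) \right\},
\qquad x \in \mathbb{R}^m
```

With m = 1 and $\Sigma = \sigma^2$, this reduces to the familiar univariate Normal density. The off-diagonal entries of $\Sigma$ are what carry the joint, as opposed to marginal, information.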
Cause-Effect Relationships
I A major extension is in the study of cause-effect relationships.
The multiple regression model with one response, the ANOVA model and the ANCOVA models can all be extended to their multivariate counterparts where several responses are studied simultaneously.
I The question asked here is whether one set of variables has an effect on another set of variables, and if so, how?
I As the model equations are linear in the parameters, the models are broadly referred to as Multivariate Linear Models.
I Special cases of this, depending on the nature of the covariates, are Multivariate Regression, MANOVA and MANCOVA.
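One common way to write a multivariate linear model is sketched below. The notation is an assumption made here and is not taken from the slides: $Y$ holds the m responses for n individuals, $Z$ holds q covariates, $B$ is the matrix of unknown parameters and $E$ is the matrix of errors.

```latex
Y_{n \times m} \;=\; Z_{n \times q} \, B_{q \times m} \;+\; E_{n \times m}
```

The model is linear in the entries of $B$. With m = 1 it reduces to the usual multiple regression model with a single response, and each column of $B$ describes how the covariates affect one of the responses.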
Different aspects of Multivariate Analysis
But there are aspects of multivariate analysis which are unique to itself. Two such broad aspects are
I Classification of Individuals
I Dimension Reduction
The problem of Classification of Individuals is too trivial for univariate data, while the Dimension Reduction problem does not arise for single variable studies.
Classification of Individuals
At the exploratory stage in the classification problem, the question often asked is
Can a group of individuals or units be separated into smaller subgroups according to similarity or dissimilarity?
I The aim is to segregate similar units into homogeneous groups, so that the heterogeneity is treated as between-group variability.
I The greater this between-group variability, the more distinctly defined the groups are.
I The technique to do this is known as Cluster Analysis.
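The idea of homogeneous groups separated by large between-group variability can be sketched with k-means, which is only one of many clustering methods and is chosen here for brevity. The data points and the starting centres are invented, and the helper names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

def k_means(points, start_centres, max_iter=20):
    """Alternate between assigning points and updating centres until stable."""
    centres = list(start_centres)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centres)), key=lambda g: sq_dist(p, centres[g]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for g in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:  # keep the old centre if a group is empty
                centres[g] = centroid(members)
    return labels, centres

# Invented bivariate data with two visibly separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centres = k_means(points, start_centres=[points[0], points[3]])

# Split the total variability into within-group and between-group parts.
overall = centroid(points)
within_ss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
between_ss = sum(
    labels.count(g) * sq_dist(centres[g], overall) for g in range(len(centres))
)

print(labels)
print(within_ss, between_ss)
```

A between-group sum of squares that is large relative to the within-group sum of squares indicates distinctly defined groups.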
Classification of Individuals
Following the formation of such clusters, it is necessary to characterize these clusters. Thus the next natural question is
Why and how are these clusters different?
I Such answers are generally provided by Discriminant Analysis, which helps to distinguish between the characteristics of the clusters.
I Classification techniques are then used to assign new individuals to one of these well-defined clusters.
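Assigning a new individual to an existing cluster can be sketched with a nearest-centroid rule. This is a deliberately simplified stand-in, not discriminant analysis itself, which would also take the within-group covariance into account. The group labels, the data and the function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))

# Invented clusters that are assumed to have been formed already.
training = {
    "A": [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)],
    "B": [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)],
}

# Characterise each cluster by its centroid.
centroids = {name: centroid(members) for name, members in training.items()}

def classify(x):
    """Assign x to the cluster whose centroid is closest."""
    return min(centroids, key=lambda name: sq_dist(x, centroids[name]))

print(classify((2.0, 2.0)))
print(classify((7.0, 9.0)))
```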
Dimension Reduction
I Very often, a data set may contain a large number of variables.
I This may make the analysis complicated and the interpretation difficult.
I A question thus asked is
Is it necessary to look at all the variables,
or is it possible to capture the same information through a smaller set of variables?
I The solutions to this problem are obtained through Dimension Reduction methods.
Dimension Reduction
There are several Dimension Reduction techniques. Four of the important ones are
I Principal Component Analysis
I Factor Analysis
I Canonical Correlation methods
I Multidimensional Scaling
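As a minimal sketch of the first technique listed, Principal Component Analysis, consider just two variables, where the eigenvalues of the 2 × 2 sample covariance matrix have a closed form. The data are invented, and the closed-form direction used below assumes that the sample covariance between the two variables is non-zero.

```python
from math import sqrt

# Invented data on two correlated variables.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
n = len(x)

# Centre each variable at its sample mean.
mx, my = sum(x) / n, sum(y) / n
xc = [a - mx for a in x]
yc = [b - my for b in y]

# Sample covariance matrix [[s_xx, s_xy], [s_xy, s_yy]] with divisor n - 1.
s_xx = sum(a * a for a in xc) / (n - 1)
s_yy = sum(b * b for b in yc) / (n - 1)
s_xy = sum(a * b for a, b in zip(xc, yc)) / (n - 1)

# Eigenvalues of a symmetric 2 x 2 matrix in closed form.
mid = (s_xx + s_yy) / 2
radius = sqrt(((s_xx - s_yy) / 2) ** 2 + s_xy ** 2)
lam1, lam2 = mid + radius, mid - radius

# Unit eigenvector for the larger eigenvalue (valid because s_xy != 0 here).
v = (s_xy, lam1 - s_xx)
norm = sqrt(v[0] ** 2 + v[1] ** 2)
v = (v[0] / norm, v[1] / norm)

# Scores on the first principal component: one number per individual.
scores = [a * v[0] + b * v[1] for a, b in zip(xc, yc)]
score_var = sum(s * s for s in scores) / (n - 1)

# Share of the total variance retained by the single new variable.
share = lam1 / (lam1 + lam2)

print(lam1, lam2, share)
```

When the share is close to 1, the single column of scores carries most of the variability of the two original variables, which is the sense in which a smaller set of variables can capture nearly the same information.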
Summary
In summary, in this course we will be restricting ourselves to
I The extensions of univariate inferential techniques to multivariate data
I The study of Multivariate Linear Models
I The study of Dimension reduction techniques
I The study of different Classification techniques