
Paper: Multivariate Analysis


Academic year: 2022


Subject: Statistics

Paper: Multivariate Analysis

Module: Introduction to Multivariate Analysis


Development Team

Principal investigator: Dr. Bhaswati Ganguli, Professor, Department of Statistics, University of Calcutta

Paper co-ordinator: Dr. Sugata SenRoy, Professor, Department of Statistics, University of Calcutta

Content writer: Souvik Bandyopadhyay, Senior Lecturer, Indian Institute of Public Health, Hyderabad

Content reviewer: Dr. Kalyan Das, Professor, Department of Statistics, University of Calcutta


What is Multivariate Analysis?

Multivariate analysis is the simultaneous study of several variables.

- Rather than considering each variable individually, we consider all the variables under study together.

- Formally, in multivariate analysis, we have a set of m related variables $X = (X_1, X_2, \ldots, X_m)'$, which are studied together.
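The notation can be made concrete with a small numerical sketch. The data values below are hypothetical and chosen only for illustration: each row is one observation of the vector X and each column is one of the m variables. The sample mean vector, covariance matrix and correlation matrix are the basic joint summaries of such data.

```python
import numpy as np

# Hypothetical data: n = 5 days, m = 3 variables
# (minimum temperature, maximum temperature, rainfall).
X = np.array([
    [21.0, 30.0, 0.0],
    [22.0, 31.0, 2.0],
    [20.0, 28.0, 10.0],
    [23.0, 33.0, 0.0],
    [19.0, 27.0, 15.0],
])

n, m = X.shape

# Sample mean vector: one mean per variable.
mean_vector = X.mean(axis=0)

# Sample covariance matrix (m x m); rowvar=False treats columns as variables.
cov_matrix = np.cov(X, rowvar=False)

# Sample correlation matrix: the off-diagonal entries measure the
# interdependence that m separate univariate analyses would ignore.
corr_matrix = np.corrcoef(X, rowvar=False)
```

The diagonal of `cov_matrix` holds the m univariate variances; everything off the diagonal is information that only a joint analysis uses.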


Why Multivariate Analysis?

- A major advantage of multivariate analysis is that, unlike univariate studies, it takes into account the interdependences between the variables under study and hence is more informative.

- However, this also brings more complexity into the analyses.

- In spite of this, a multivariate analysis of the m variables is most often preferred to m separate univariate analyses.


Multivariate Analysis?

- Most univariate methods can be extended to multivariate methods, with more complexity but better results.

- However, there are several multivariate techniques which are peculiar to the multivariate setting. These are

  - either too trivial in univariate studies

  - or do not arise in univariate problems

- It is mainly with these latter techniques that we will be concerned here.


Some Examples

Let us next look at a few examples of multivariate data in different fields:

- Social Science: Characteristics of an individual like (Gender, Age, Nationality)

- Climatology: Weather on a particular day like (Minimum temperature, Maximum temperature, Rainfall, Humidity)

- Economics: Management of a firm like (Input costs, Production, Profit)

- Socio-Demographic: Profile of a country like (GDP, Life expectancy, Literacy rate)


Some Examples

There can be several examples from Health Sciences :

- General Health: Health profile of an individual like (Systolic BP, Diastolic BP, Pulse rate)

- Pathology: The pathological profile of a patient like (Blood sugar level, Uric acid concentration, Hemoglobin count)

- Administrative: Day-to-day administration in a hospital (Admissions, Operations, Discharges, Deaths)

- Pharmaceutical: Quantity of sales per day in a pharmacy (Drug A, Drug B, ..., Drug Z)


Visualization

Unlike univariate, bivariate or trivariate data, multivariate data (m ≥ 4) cannot be plotted directly in a single diagram. Even for trivariate data (as shown in the diagram), the plots are often difficult to understand.


Example

Problem: Each face has several features. It is impossible to quantify and then plot all of these features together to analyse their characteristics.


Different aspects of Multivariate Analysis

A major aspect of multivariate analysis is to extend the univariate results to the multivariate set-up.

This is mostly done to study the inter-relationships between the variables under study, which are lost in univariate studies. Hence the emphasis here is on joint rather than marginal studies.

- This may mean the extension of the Binomial or Normal distributions to their multivariate counterparts.

- It can also mean the extension of the inference techniques to multivariate data. This can relate to both estimation and hypothesis testing problems.
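As one concrete instance of such an extension, the univariate Normal density generalizes to the m-variate Normal density shown below. The symbols $\mu$ (mean vector) and $\Sigma$ (positive definite covariance matrix) are introduced here for illustration and do not appear in the slides.

```latex
f(x) = (2\pi)^{-m/2} \, |\Sigma|^{-1/2}
       \exp\left\{ -\tfrac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\},
\qquad x \in \mathbb{R}^m .
```

For m = 1, with $\Sigma = \sigma^2$, this reduces to the familiar univariate Normal density. The off-diagonal entries of $\Sigma$ carry the interdependence between the components, which is exactly what the marginal distributions alone do not capture.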


Cause-Effect Relationships

- A major extension is in the study of cause-effect relationships.

The multiple regression model with one response, the ANOVA model and the ANCOVA models can all be extended to their multivariate counterparts where several responses are studied simultaneously.

- The question asked here is whether a set of variables has an effect on another set of variables, and if so, how?

- As the model equations are linear in the parameters, the models are broadly referred to as Multivariate Linear Models.

- Special cases of this, depending on the nature of the covariates, are Multivariate Regression, MANOVA and MANCOVA.
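A minimal sketch of a multivariate linear model may help here. It assumes the common form Y = ZB + E, where Y holds several responses per observation, Z is the design matrix and B is a matrix of coefficients with one column per response. The data are simulated, and all names and values (`Z`, `B_true`, the sample size, the noise level) are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, q = 200, 2, 3          # observations, covariates, responses (hypothetical)
Z = rng.normal(size=(n, p))  # covariate matrix
design = np.column_stack([np.ones(n), Z])  # prepend an intercept column

# Hypothetical true coefficient matrix: (p + 1) rows, one column per response.
B_true = np.array([
    [1.0, 0.0, -1.0],
    [2.0, 0.5, 0.0],
    [0.0, -1.5, 3.0],
])
Y = design @ B_true + 0.1 * rng.normal(size=(n, q))

# Least-squares estimate of B; lstsq fits all q responses in one call.
B_hat, _, _, _ = np.linalg.lstsq(design, Y, rcond=None)

# Estimated covariance of the errors across the q responses.
residuals = Y - design @ B_hat
Sigma_hat = residuals.T @ residuals / (n - p - 1)
```

The coefficient estimates coincide with those from q separate regressions; what the joint model adds is `Sigma_hat`, the covariance between the responses, which is needed for simultaneous inference about all responses.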


Different aspects of Multivariate Analysis

But there are aspects of multivariate analysis which are unique to itself. Two such broad aspects are

- Classification of Individuals

- Dimension Reduction

The problem of Classification of Individuals is too trivial for univariate data, while the Dimension Reduction problem does not arise for single variable studies.


Classification of Individuals

At the exploratory stage in the classification problem, the question often asked is

Can a group of individuals or units be separated into smaller subgroups according to similarity or dissimilarity?

- The aim is to segregate similar units into homogeneous groups, so that the remaining heterogeneity is treated as between-group variability.

- The greater this between-group variability, the more distinctly defined are the groups.

- The technique to do this is known as Cluster Analysis.
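The idea can be sketched with k-means, which is one of many clustering methods and is used here only as an illustration; the slides do not name a specific algorithm. The data are simulated, and the starting rule (`farthest_point_init`) is a simple deterministic heuristic assumed for this sketch.

```python
import numpy as np


def farthest_point_init(X, k):
    """Pick k starting centers: the first row, then repeatedly the row
    farthest from all centers chosen so far (a simple heuristic)."""
    centers = [X[0]]
    for _ in range(1, k):
        dists = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2)
        centers.append(X[dists.min(axis=1).argmax()])
    return np.array(centers)


def k_means(X, k, n_iter=100):
    """Alternate between assigning units to the nearest center and
    recomputing each center as the mean of its assigned units."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return labels, centers


# Hypothetical data: two well-separated groups of 30 units, 2 variables each.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(30, 2))
group_b = rng.normal(loc=[6.0, 6.0], scale=0.5, size=(30, 2))
X = np.vstack([group_a, group_b])

labels, centers = k_means(X, k=2)

# Split the total variability into within-group and between-group parts.
within_ss = sum(((X[labels == j] - centers[j]) ** 2).sum() for j in range(2))
total_ss = ((X - X.mean(axis=0)) ** 2).sum()
between_ss = total_ss - within_ss
```

A large `between_ss` relative to `within_ss` corresponds to the distinctly defined groups described above.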


Classification of Individuals

Following the formation of such clusters, it is necessary to characterize these clusters. Thus the next natural question is

Why and how are these clusters different?

- Such answers are generally provided by Discriminant Analysis, which helps to distinguish between the characteristics of the clusters.

- Classification techniques are then used to assign new individuals into one of these well-defined clusters.
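As a hedged illustration of these two steps, the sketch below uses Fisher's linear discriminant for two groups. It assumes equal prior probabilities and a common covariance matrix for the groups, assumptions the slides do not state. The data are simulated, and the group labels "A" and "B" are hypothetical.

```python
import numpy as np

# Hypothetical training data: two known groups, 50 units each, 2 variables.
rng = np.random.default_rng(2)
group_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
group_b = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(50, 2))

mean_a = group_a.mean(axis=0)
mean_b = group_b.mean(axis=0)

# Pooled within-group covariance matrix (assumes a common covariance).
n_a, n_b = len(group_a), len(group_b)
S_pooled = (
    (n_a - 1) * np.cov(group_a, rowvar=False)
    + (n_b - 1) * np.cov(group_b, rowvar=False)
) / (n_a + n_b - 2)

# Fisher's discriminant direction: the linear combination of the variables
# that best separates the two group means relative to within-group spread.
w = np.linalg.solve(S_pooled, mean_a - mean_b)

# Midpoint between the projected group means (equal priors assumed).
threshold = w @ (mean_a + mean_b) / 2.0


def classify(x):
    """Assign a new individual x to group 'A' or 'B'."""
    return "A" if w @ x > threshold else "B"


new_label = classify(np.array([3.5, 4.2]))
```

The entries of `w` show which variables drive the difference between the groups, and `classify` assigns a new individual to the group whose projected mean is nearer.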


Dimension Reduction

- Very often, a data set may contain a large number of variables.

- This may make the analysis complicated and the interpretations difficult.

- A question thus asked is

Is it necessary to look at all the variables, or is it possible to capture the same information through a smaller set of variables?

- The solutions to this problem are obtained through Dimension Reduction methods.


Dimension Reduction

There are several Dimension Reduction techniques. Four of the important ones are

- Principal Component Analysis

- Factor Analysis

- Canonical Correlation methods

- Multidimensional Scaling
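To illustrate the first technique in this list, here is a minimal sketch of Principal Component Analysis through the eigen-decomposition of the sample covariance matrix. The data are simulated so that three observed variables are driven by a single hidden variable `t`; this setup and all names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Hypothetical data: three observed variables driven by one hidden variable t,
# plus a small amount of independent noise.
t = rng.normal(size=n)
X = np.column_stack([
    t + 0.1 * rng.normal(size=n),
    2.0 * t + 0.1 * rng.normal(size=n),
    -t + 0.1 * rng.normal(size=n),
])

# Center the data and compute the sample covariance matrix.
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)

# eigh returns eigenvalues in ascending order; reorder to descending.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Proportion of total variance captured by each principal component.
explained = eigvals / eigvals.sum()

# Scores on the first principal component: one new variable replacing three.
scores = Xc @ eigvecs[:, :1]
```

In this constructed example nearly all of the variance lies along the first component, so a single derived variable carries almost the same information as the original three.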


Summary

In summary, in this course we will be restricting ourselves to

- The extensions of univariate inferential techniques to multivariate data

- The study of Multivariate Linear Models

- The study of Dimension Reduction techniques

- The study of different Classification techniques


Thank You
