(1)

Kernel Learning for Multi-modal Recognition Tasks

J. Saketha Nath

CSE, IIT-B

IBM Workshop

(2)

Multi-modal Learning Tasks

Multiple views or descriptions of the data are available

E.g. Object Categorization

(3)

Object Categorization

Figure: Daffodils and Dandelions can be distinguished using shape features.

(4)

Object Categorization — Features

Typical Features:

HSV: color features.

SIFT: local texture and shape features.

HOG: global shape features.
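As a concrete illustration (my addition, not part of the original slides), here is a minimal sketch of extracting these three feature modalities with OpenCV. The file name, histogram bins, and window size are illustrative assumptions; SIFT ships with the main opencv-python package from version 4.4 on.

```python
# Sketch: extracting the three feature modalities for one flower image.
# Assumes opencv-python (cv2) and numpy; all sizes are placeholders.
import cv2
import numpy as np

img = cv2.imread("flower.jpg")          # hypothetical input image

# HSV color histogram (one fixed-length vector per image)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
color_feat = cv2.calcHist([hsv], [0, 1], None, [16, 16],
                          [0, 180, 0, 256]).flatten()
color_feat /= color_feat.sum()          # normalize to a distribution

# SIFT descriptors (local texture/shape; a variable-size set of 128-d vectors)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
sift = cv2.SIFT_create()
_, sift_desc = sift.detectAndCompute(gray, None)

# HOG descriptor (global shape; fixed window gives a fixed-length vector)
hog = cv2.HOGDescriptor()
hog_feat = hog.compute(cv2.resize(gray, (64, 128))).flatten()
```

In practice the variable-size set of SIFT descriptors is usually quantized into a fixed-length bag-of-visual-words histogram before any kernel is computed.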

(5)

Multi-modal Learning Tasks

Multiple views or descriptions of the data are available

E.g. Object Categorization, multi-modal Speech/Speaker/Activity recognition, Text Mining

Build a classifier using all the features (say an SVM)

Can we do better?

E.g. Nilsback and Zisserman (CVPR06) found that:

All three kinds of features are critical

Not all features within each kind may be important

Exploit the natural grouping of features.


(6)

Scope and Objective

Scope:

Binary classification tasks; kernel methods like the SVM

Simultaneous feature selection and classifier construction

Multiple Kernel Learning [Lanckriet et al., '02]

Objective:

Customized for multi-modal tasks

Exploit the group structure in the features (prior information)

(7)

Problem Setting

Given:

$\mathcal{D} = \{(x_i, y_i) \mid x_i \in \mathcal{X},\ y_i \in \{-1, 1\},\ i = 1, \ldots, m\}$

Feature maps $\Phi_j,\ j = 1, \ldots, n$ ($n$ is the modality of the data)

E.g. $\Phi_1(x_i)$: the feature vector describing the color of flower $x_i$

Using each $\Phi_j$, generate various kernels (linear, polynomial, Gaussian)
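To make this step concrete (my addition), here is a minimal sketch of such a kernel bank using scikit-learn's pairwise kernels; the feature matrices, kernel choices, and parameter grids are placeholder assumptions.

```python
# Sketch: building a bank of kernels K_jk from per-modality feature matrices.
# Assumes numpy and scikit-learn; the matrices below are placeholders.
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

# One (m x d_j) feature matrix per modality j, e.g. from the extractors above.
features = {
    "color": np.random.rand(100, 32),    # Phi_1: HSV histograms
    "sift":  np.random.rand(100, 500),   # Phi_2: bag-of-words over SIFT
    "hog":   np.random.rand(100, 3780),  # Phi_3: HOG vectors
}

def kernel_bank(X):
    """Generate n_j Gram matrices (linear, polynomial, Gaussian) from one modality."""
    return ([linear_kernel(X)]
            + [polynomial_kernel(X, degree=d) for d in (2, 3)]
            + [rbf_kernel(X, gamma=g) for g in (0.01, 0.1, 1.0)])

# kernels[j][k] is the Gram matrix K_jk of the k-th kernel from the j-th mode.
kernels = [kernel_bank(X) for X in features.values()]
```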

(8)

Problem Setting

Given:

Kernels $K_{jk},\ j = 1, \ldots, n,\ k = 1, \ldots, n_j$

$n$: modality of the data

$n_j$: number of kernels generated from the $j$-th mode

MKL Task:

Simultaneously determine the weights given to the kernels (features) and the classifier

Utilize prior information regarding the kernels


(9)

Existing Methods

SVM:

$$\max_{\alpha \in S}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top Y K Y \alpha, \qquad K = \sum_{j=1}^{n}\sum_{k=1}^{n_j} K_{jk} \quad \text{(concatenation of all features)}$$

MKL:

$$\min_{\lambda_{jk} \ge 0,\ \sum_{j,k}\lambda_{jk} = 1}\ \max_{\alpha \in S}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top Y K Y \alpha, \qquad K = \sum_{j=1}^{n}\sum_{k=1}^{n_j} \lambda_{jk} K_{jk} \quad \text{(convex combination of kernels)}$$

Equivalent to selecting the single best kernel!
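For concreteness (my addition, not from the slides), the concatenation baseline is just an ordinary SVM trained on the summed Gram matrix; this sketch reuses the `kernels` bank from above with placeholder labels and a placeholder C.

```python
# Sketch: the "concatenation" baseline, an SVM on the uniform sum of all
# Gram matrices in the bank (K = sum_jk K_jk). Labels and C are placeholders.
import numpy as np
from sklearn.svm import SVC

y = np.where(np.random.rand(100) > 0.5, 1, -1)     # placeholder labels

K_sum = sum(K for bank in kernels for K in bank)   # K = sum_jk K_jk

svm = SVC(C=10.0, kernel="precomputed")
svm.fit(K_sum, y)                                  # train on the summed Gram matrix
train_acc = svm.score(K_sum, y)
```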


(10)

Proposed Methodology

$\Phi_{jk}(\cdot)$: the implicit map defined by $K_{jk}$

$$f(x) = \sum_{j=1}^{n}\sum_{k=1}^{n_j} w_{jk}^\top \Phi_{jk}(x) - b$$

SVM Formulation:

$$\min_{w_{jk},\, b,\, \xi_i}\ \frac{1}{2}\sum_{j=1}^{n}\sum_{k=1}^{n_j} \|w_{jk}\|_2^2 + C\sum_i \xi_i$$

$$\text{s.t.}\quad y_i\Big(\sum_{j=1}^{n}\sum_{k=1}^{n_j} w_{jk}^\top \Phi_{jk}(x_i) - b\Big) \ge 1 - \xi_i,\quad \xi_i \ge 0$$

Proposed Convex Formulation:

$$\min_{w_{jk},\, b,\, \xi_i}\ \frac{1}{2}\left[\max_j \Big(\sum_{k=1}^{n_j} \|w_{jk}\|_2\Big)^2\right] + C\sum_i \xi_i$$

$$\text{s.t.}\quad y_i\Big(\sum_{j=1}^{n}\sum_{k=1}^{n_j} w_{jk}^\top \Phi_{jk}(x_i) - b\Big) \ge 1 - \xi_i,\quad \xi_i \ge 0$$
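The dual on the next slide follows from two standard variational identities (added here for completeness; they are not spelled out in the slides). The $\ell_1$-type sum within a mode and the $\ell_\infty$-type max across modes each admit a simplex representation:

$$\Big(\sum_{k=1}^{n_j}\|w_{jk}\|_2\Big)^2 = \min_{\lambda_j \in \Delta_{n_j}} \sum_{k=1}^{n_j} \frac{\|w_{jk}\|_2^2}{\lambda_{jk}}, \qquad \max_j d_j = \max_{\gamma \in \Delta_n} \sum_{j=1}^{n} \gamma_j d_j.$$

The first is minimized at $\lambda_{jk} \propto \|w_{jk}\|_2$, which is why the learned $\lambda_{jk}$ are sparse within each mode; the second couples the modes through $\gamma$, and at the saddle point every $\gamma_j$ is nonzero when the $K_{jk}$ are positive definite (as noted on the dual slide).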

(11)

Dual

Formulation:

$$\min_{\lambda_j \in \Delta_{n_j}\,\forall j}\ \max_{\gamma \in \Delta_n,\ \alpha \in S}\ \mathbf{1}^\top\alpha - \frac{1}{2}\,\alpha^\top Y \left(\sum_{j=1}^{n} \frac{\sum_{k=1}^{n_j} \lambda_{jk} K_{jk}}{\gamma_j}\right) Y \alpha \qquad (1)$$

Comments:

Equivalent to the SVM formulation with $K \equiv \sum_{j=1}^{n} \frac{\sum_{k=1}^{n_j} \lambda_{jk} K_{jk}}{\gamma_j}$.

$\gamma_j$: weight for the $j$-th mode; $\lambda_{jk}$: weight for the $k$-th kernel in the $j$-th mode.

$\gamma_j \neq 0,\ j = 1, \ldots, n$, provided the $K_{jk}$ are positive definite.

$\lambda_{jk}$ is highly sparse for each $j$. $n = 1$ gives back MKL!
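A small sketch (my addition) of assembling this effective Gram matrix from given weights; `kernels` is the bank from the earlier sketch, and `lam`, `gamma` are placeholder weight lists.

```python
# Sketch: effective Gram matrix K = sum_j (sum_k lam[j][k] * K_jk) / gamma[j].
# kernels[j][k] is the Gram matrix K_jk; lam[j] lies on the simplex of
# mode j and gamma on the simplex over modes (placeholder values).
def effective_kernel(kernels, lam, gamma):
    return sum(
        sum(l * K for l, K in zip(lam_j, bank)) / g
        for bank, lam_j, g in zip(kernels, lam, gamma)
    )
```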


(12)

Efficient Solver

Pose as an SOCP; solve using SeDuMi or Mosek

Extensions of iterative algorithms in the MKL literature suffer from non-convexity problems

Mirror-descent based algorithm (see the sketch below):

An iterative algorithm that solves an SVM at each step

Far more scalable than state-of-the-art MKL solvers ($n = 1$)
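Here is a minimal sketch of the mirror-descent scheme for the plain MKL case ($n = 1$). It is my reconstruction of the generic recipe (train an SVM on the current weighted kernel, then take an exponentiated-gradient step on the simplex weights), not the authors' exact algorithm; the step size and iteration count are arbitrary.

```python
# Sketch: mirror descent on the kernel weights for plain MKL (n = 1).
# Each step trains an SVM on the weighted Gram matrix, then updates the
# simplex weights by exponentiated gradient. Hyperparameters are placeholders.
import numpy as np
from sklearn.svm import SVC

def mirror_descent_mkl(kernel_list, y, C=10.0, steps=50, eta=1.0):
    p = len(kernel_list)
    lam = np.full(p, 1.0 / p)                    # start at the simplex centre
    Y = np.diag(y)
    for _ in range(steps):
        K = sum(l * Kk for l, Kk in zip(lam, kernel_list))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        # Recover the dual vector alpha (dual coefs live on the support vectors).
        alpha = np.zeros(len(y))
        alpha[svm.support_] = np.abs(svm.dual_coef_.ravel())
        # Gradient of the dual value w.r.t. lambda_k: -0.5 * (Y a)^T K_k (Y a).
        a = Y @ alpha
        grad = np.array([-0.5 * a @ Kk @ a for Kk in kernel_list])
        lam *= np.exp(-eta * grad)               # exponentiated-gradient step
        lam /= lam.sum()                         # renormalize onto the simplex
    return lam
```

Exponentiated-gradient updates are the mirror-descent step under the entropy mirror map, which keeps the weights on the simplex without an explicit projection.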


(13)

Results — Object Categorization

[Two panels: average gain (%) in accuracy over the object categories, relative to L1-MKL (left) and L2-MKL (right).]

Figure: Plot of average gain (%) in accuracy with MixNorm-MKL on Caltech-101.

(14)

Results — Scaling

[Two panels: training time (seconds) vs. $\log_{10}$(number of kernels) for MixNorm-MKL and L1-MKL.]

(15)

THANK YOU
