(1)

Support Vector Machines

Ganesh Ramakrishnan

Adjunct Professor, Department of Computer Science and Engineering, IIT Bombay

&

Research Staff Member, IBM India Research Labs

(2)

Outline

Introduction and Motivation
Perceptron
Support Vector Machines
Kernel Perceptron
Kernels
Least Squares Support Vector Machines
Proximal Support Vector Machines
Support Vector Regression

(3)

Basics..

(4)

Notations

(5)

Perceptron

(6)

Perceptron Update Rule

(7)

What does “w” turn out to be?
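The slide body is not preserved in this extraction, but the standard answer can be sketched: the perceptron adds yi xi to w on every mistake, so the learned w ends up a linear combination of the training points, w = Σi αi yi xi, with αi counting the mistakes made on xi. This is exactly what motivates the dual representation on the next slides. A minimal sketch on made-up data (learning rate 1 assumed):

```python
import numpy as np

# Made-up linearly separable toy data.
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w, b = np.zeros(2), 0.0
alpha = np.zeros(len(y))                # mistake counts: the dual variables
for _ in range(10):                     # a few passes over the data
    for i in range(len(y)):
        if y[i] * (w @ X[i] + b) <= 0:  # mistake: update toward y_i x_i
            w, b = w + y[i] * X[i], b + y[i]
            alpha[i] += 1

print(np.allclose(w, (alpha * y) @ X))  # True: w = sum_i alpha_i y_i x_i
```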

(8)

Dual Representation

(9)

Dual Representation (contd)

(10)

Duality

(11)

Linear Classifiers

(12)

Statistical Learning Theory

Minimize an upper bound on the generalization error by maximizing the margin between two disjoint half planes

Minimize empirical risk

Find f such that the following expression is minimized:

Remp[f] = (1/2n) Σi=1..n |yi − f(xi)|

Where:
yi: known decision of object i
xi: its feature vector
n: number of objects
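As a concrete illustration of the expression above (on made-up data, with a hypothetical fixed linear classifier), the empirical risk is just the fraction of misclassified objects:

```python
import numpy as np

# Made-up objects x_i with known decisions y_i in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

def f(X, w=np.array([1.0, 1.0]), b=0.0):
    """A hypothetical fixed linear decision function f(x) = sign(w.x + b)."""
    return np.sign(X @ w + b)

# R_emp[f] = (1/2n) * sum_i |y_i - f(x_i)|: each error contributes 1.
n = len(y)
R_emp = np.abs(y - f(X)).sum() / (2 * n)
print(R_emp)  # 0.0 here: the toy set is perfectly classified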

(13)

Motivation SVM

[Figure: customers plotted on Age and Income axes, labeled "Churn" and "No churn", separated by a hyperplane with normal W.]

(14)

Classifier Margin

(15)

Classifier Margin (contd)

(16)

Handling Noise through margins

(17)

Handling Noise through margins (contd)

(18)

Standard SVM formulation

(19)

SVM: Linearly separable case

n objects: xi ∈ Rm, i = 1,…,n
yi: label of object i; yi ∈ {-1, 1}

Assume ∃ hyperplane w⋅x + b = 0 separating positive from negative objects:

w⋅xi + b ≥ +1 if yi = +1
w⋅xi + b ≤ −1 if yi = −1

equivalently:

yi (w⋅xi + b) − 1 ≥ 0 ∀i
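A quick numerical check of the equivalent condition, assuming a candidate hyperplane (all values illustrative):

```python
import numpy as np

# Candidate hyperplane w.x + b = 0 (illustrative values) and toy data.
w, b = np.array([1.0, 1.0]), 0.0
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

# Separability with unit margin: y_i * (w.x_i + b) - 1 >= 0 for all i.
print(np.all(y * (X @ w + b) - 1 >= 0))  # True for this choice of w, b
```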

(20)

[Figure: the hyperplane w⋅x + b = 0 with the margin planes w⋅x + b = ±1; their distances from the origin (0,0) are |1 − b| / ‖w‖ and |−1 − b| / ‖w‖, so the margin width is 2 / ‖w‖.]

(21)

What if…

(22)

Error handling

(23)

Support Vector Machines

(For Classification)

Find function f such that:

Classification error is minimized (in training set)

Margin is maximized (generalization!)

Two objectives:

Minimize Error (train model)

Maximize Margin (generalization)

(24)

SVM: Linearly not separable case

Slack variables qi:

w⋅xi + b ≥ +1 − qi if yi = +1
w⋅xi + b ≤ −1 + qi if yi = −1

Objective function:

‖w‖²/2 + C Σi qi

(25)

Primal formulation

Minimize ‖w‖²/2 + C Σi qi

subject to: yi (w⋅xi + b) + qi ≥ 1, qi ≥ 0 ∀i

The term ‖w‖²/2 corresponds to 1/Margin; C Σi qi is the Classification Error.

W: Normal to hyperplane.
b: Position of hyperplane
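For reference, scikit-learn's SVC solves this soft-margin problem (through its dual); a minimal sketch on toy data, assuming sklearn is installed, with C controlling the error/margin trade-off:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)   # C trades error vs. margin
print(clf.coef_[0], clf.intercept_[0])        # learned W and b
print(2.0 / np.linalg.norm(clf.coef_))        # margin width = 2 / ||W||
```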

(26)

QP Dual Formulation

Maximize Σi αi − (1/2) Σi Σs αi αs yi ys (xi ⋅ xs)

subject to: 0 ≤ αi ≤ C, i = 1,…,n
Σi αi yi = 0

The data enter only through the inner products xi ⋅ xs: KERNELS!!!
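A small-scale sketch of this dual, solved with a general-purpose optimizer just to make the formulation concrete (real SVM trainers use specialized decomposition methods such as SMO, covered later):

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C, n = 10.0, len(y)

# G[i, s] = y_i y_s (x_i . x_s): the data enter only via inner products.
G = (y[:, None] * X) @ (y[:, None] * X).T

res = minimize(lambda a: 0.5 * a @ G @ a - a.sum(),   # negated dual objective
               np.zeros(n),
               bounds=[(0.0, C)] * n,                 # 0 <= alpha_i <= C
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
print(res.x.round(4))                                 # the alphas
```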

(27)

Classifier

Linear classifier:

f(x) = Σi αi yi (xi ⋅ x) + b = W⋅x + b

Application to new objects:

sign(f(x)) = +1 => class +1
sign(f(x)) = -1 => class -1
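Applying the classifier is then a dot product; alpha and b below are illustrative placeholders for trained values (e.g. from the dual sketch above):

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, b = np.array([0.2, 0.0, 0.2, 0.0]), 0.0   # placeholder trained values

W = (alpha * y) @ X                # W = sum_i alpha_i y_i x_i
x_new = np.array([1.5, 1.0])
f = W @ x_new + b                  # f(x) = W.x + b
print(+1 if f >= 0 else -1)        # sign(f(x)) is the predicted class
```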

(28)

Support Vectors

(29)

Error-Margin tradeoff

(30)

Error-Margin tradeoff

(31)

More Complex Pattern Sets…

(32)

Limitations of Linear Machines

(33)

Handling the limitations?

(34)

Handling the limitations…

(35)

Projecting onto higher dimension

(36)

Ex-OR Gate

(37)

Ex-OR Gate

(38)

Ex-OR Gate: Non-linear Mapping

Gives Linear Separability
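A minimal sketch of the idea in ±1 coordinates; the product feature is one common choice of non-linear mapping for XOR:

```python
import numpy as np

# XOR in {-1,+1} coordinates is not linearly separable in 2-D, but the
# quadratic map Phi(x1, x2) = (x1, x2, x1*x2) makes it so.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])                    # XOR labels

Phi = np.column_stack([X, X[:, 0] * X[:, 1]])   # add the product feature

w, b = np.array([0.0, 0.0, -1.0]), 0.0          # hyperplane in feature space
print(np.all(y * (Phi @ w + b) > 0))            # True: linearly separable
```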

(39)

Projecting into higher dimensions

(40)

Learning in Feature Space

(41)

Which mapping to choose?

(42)

High dimensions - challenging

(43)

Kernels: Mapping implicitly to

Feature Space

(44)

Use the Dual!

(45)

Kernels

(46)

Using Kernels

(47)

Kernel Matrix

(48)

Kernel Matrix
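Concretely, the kernel (Gram) matrix has entries Kij = K(xi, xj); for a valid kernel it is symmetric positive semi-definite, which is what Mercer's theorem (next slide) characterizes. A quick check with the linear kernel on toy data:

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5]])
K = X @ X.T                                      # K_ij = x_i . x_j (linear kernel)

print(np.allclose(K, K.T))                       # symmetric: True
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))   # PSD: eigenvalues >= 0
```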

(49)

Mercer’s Theorem

(50)

Mercer’s Theorem

(51)

Kernel Examples

(52)

Also…

(53)

Polynomial Kernels

(54)

Polynomial Kernels

(55)

Efficiency

(56)

Gaussian Kernels

(57)

Gaussian Kernel

(58)

Gaussian Kernel
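Minimal sketches of the two kernel families named on these slides (parameter conventions vary; the degree d, offset c, and width sigma are free parameters):

```python
import numpy as np

def polynomial_kernel(x, z, d=2, c=1.0):
    """K(x, z) = (x.z + c)^d: implicit map to monomials up to degree d."""
    return (x @ z + c) ** d

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2)): infinite-dim. feature space."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(x, z), gaussian_kernel(x, z))
```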

(59)

Closure Properties

(60)

Kernel Machines

Map inputs to feature space: X = Φ(x), Xi = Φ(xi), and define the kernel K(⋅,⋅) by K(xi, x) = Φ(xi) ⋅ Φ(x).

Decision function in feature space:

y = sign( Σi∈S αi yi (Xi ⋅ X) + b ) = sign( Σi∈S αi yi (Φ(xi) ⋅ Φ(x)) + b )

Mercer's condition guarantees that a valid kernel computes this inner product, Xi ⋅ X = Φ(xi) ⋅ Φ(x) = K(xi, x), so:

y = sign( Σi∈S αi yi K(xi, x) + b )
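The last line is all that is needed at prediction time: the mapping Φ is never computed explicitly. A sketch with the Gaussian kernel and illustrative support-vector values:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

X_sv = np.array([[1.0, 2.0], [-1.0, -1.5]])    # support vectors (illustrative)
y_sv = np.array([1.0, -1.0])
alpha, b = np.array([0.5, 0.5]), 0.0           # illustrative multipliers

def predict(x):
    s = sum(a * yi * gaussian_kernel(xi, x) for a, yi, xi in zip(alpha, y_sv, X_sv))
    return np.sign(s + b)

print(predict(np.array([0.8, 1.5])))   # +1: closest to the positive support vector
```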

(61)

Training SVMs

(62)

Training SVMs

(63)

Avoiding Memory Problems

(64)

Does it really work?

(65)

Example

(66)

Decomposition Methods

(67)

Caching

(68)

Caching Issues

(69)

Shrinking

(70)

Least Squares Support Vector Machines (LSSVM)

The classical SVM classifier aims to minimize an upper bound on the generalization error by maximizing the margin between two disjoint half planes.

This involves solving a quadratic programming problem, which can be prohibitive on large data sets.

To overcome this problem, Suykens and Vandewalle proposed the “least squares SVM” (LSSVM) formulation. The formulation considers equality constraints and adds an extra term to the cost function. As a result, the solution follows from directly solving a set of linear equations.

(71)

LSSVM

Given a set of M patterns xk, where xk = (…, )T, with corresponding labels yk ∈ {-1, +1}, the LS-SVM determines a separating surface of the form

wTφ(x) + b = 0

by solving a problem of the form

Minimize over w, b, q: (1/2) wTw + (C/2) Σi qi²
subject to: yi [wTφ(xi) + b] = 1 − qi, i = 1, 2, …, M

where C is a parameter. The first term in the objective is a regularization term, while the second term is the empirical error. The constant C determines the relative importance of the two.
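Under the usual derivation, the KKT conditions of the problem above reduce training to a single linear system; a numpy sketch assuming a linear kernel, where Ωij = yi yj K(xi, xj):

```python
import numpy as np

# LS-SVM: solve  [ 0   y^T         ] [ b     ]   [ 0 ]
#                [ y   Omega + I/C ] [ alpha ] = [ 1 ]
# with Omega_ij = y_i y_j K(x_i, x_j); linear kernel K(x, z) = x.z assumed.
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C, M = 10.0, len(y)

Omega = (y[:, None] * X) @ (y[:, None] * X).T
A = np.zeros((M + 1, M + 1))
A[0, 1:], A[1:, 0] = y, y
A[1:, 1:] = Omega + np.eye(M) / C

sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(M))))
b, alpha = sol[0], sol[1:]
print(b, alpha.round(3))
```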

(72)

Acknowledgements

Some slides were borrowed from Prof.

Jayadeva, Department of Electrical

Engineering, IIT Delhi
