Choosing the working set in SVM

SVM and SMO

Instructor: Prof. Ganesh Ramakrishnan

November 2, 2015

Joachims' SVMlight


Choosing the working set in SVMlight

Let f(α) = (1/2) αᵀQα − eᵀα. SVMlight chooses the working set B by solving for the descent direction d (where Δα = α^{k+1} − α^k):

    min_d ∇f(α^k)ᵀd    s.t.

    |{d_i : d_i ≠ 0}| ≤ q

(intuitively, if q non-zero d_i's are possible, they will be picked, since such a set reduces the objective further than a smaller set would);

    yᵀd = 0

(since (α^k)ᵀy = 0 and (α^{k+1})ᵀy = 0, we need (α^k + d)ᵀy = 0, and thus yᵀd = 0);

    d_i ∈ [−1, 1]
    d_i ≥ 0, for (α^k)_i = 0
    d_i ≤ 0, for (α^k)_i = C

Solving for d in SVMlight

The intuition is that the descent directions d_i for the most violating (α^k)_i's correspond to the (∇f(α^k))_i's that are farthest from 0, taking care that we also want yᵀd = 0, i.e. ∑_i y_i d_i = 0 over all i chosen as above (subject to |{d_i : d_i ≠ 0}| ≤ q).


Solving for d in SVMlight

1. Sort y_i (∇f(α^k))_i in decreasing order.

2. Symmetrically do the following: from the top of the sorted list, sequentially set d_i = −y_i; from the bottom, sequentially set d_i = y_i. Continue until either
   - q/2 'd_i = −y_i's have been selected from the top and q/2 'd_i = y_i's have been selected from the bottom, or
   - we can no longer find a d_i = −y_i from the top and a d_i = y_i from the bottom at the same time.

   At any point,
   - if (α^k)_i = 0 and d_i = −1, set d_i = 0 and bypass index i, and
   - if (α^k)_i = C and d_i = 1, set d_i = 0 and bypass index i.

   The goal is to balance the two signs picked from the opposite ends, i.e. ∑_i y_i d_i = 0.

3. All d_i's not yet considered are assigned 0.

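The three steps above can be sketched in code. A minimal NumPy sketch of the sort-and-pick scheme (the function name `select_working_set` and the bookkeeping details are my own assumptions, not Joachims' actual implementation):

```python
import numpy as np

def select_working_set(grad, alpha, y, C, q):
    """Illustrative sketch: sort y_i * grad_i decreasingly, pick up to q/2
    directions d_i = -y_i from the top and q/2 directions d_i = +y_i from
    the bottom, bypassing indices pinned by the box constraints."""
    order = np.argsort(-(y * grad))                # step 1: decreasing y_i * grad_i
    d = np.zeros_like(alpha)
    top, bottom = [], []
    ti, bi = 0, len(order) - 1
    while len(top) < q // 2 and ti < len(order):   # step 2: from the top, d_i = -y_i
        i = order[ti]; ti += 1
        di = -y[i]
        # bypass: alpha_i = 0 forces d_i >= 0, alpha_i = C forces d_i <= 0
        if (alpha[i] == 0 and di < 0) or (alpha[i] == C and di > 0):
            continue
        d[i] = di; top.append(i)
    while len(bottom) < q // 2 and bi >= ti:       # from the bottom, d_i = +y_i
        i = order[bi]; bi -= 1
        if i in top:
            continue
        di = y[i]
        if (alpha[i] == 0 and di < 0) or (alpha[i] == C and di > 0):
            continue
        d[i] = di; bottom.append(i)
    # balance the two ends so that sum_i y_i d_i = 0 (step 3: the rest stays 0)
    k = min(len(top), len(bottom))
    for i in top[k:] + bottom[k:]:
        d[i] = 0
    return d
```

Each top pick contributes y_i d_i = −1 and each bottom pick +1, so trimming both lists to equal length enforces ∑_i y_i d_i = 0.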

If q/2 'd_i = −y_i's from the top and q/2 'd_i = y_i's from the bottom could not be selected (or if q is large enough), the algorithm will stop at some index i_t from the top and i_b from the bottom.

One of the following will happen:
- i_t is just before i_b, or
- there is one position i between i_t and i_b with 0 < (α^k)_i < C.


When the algorithm stops, d is an optimal solution for

    min_d ∇f(α^k)ᵀd    s.t.
    |{d_i : d_i ≠ 0}| ≤ q
    yᵀd = 0
    d_i ∈ [−1, 1]
    d_i ≥ 0, for (α^k)_i = 0
    d_i ≤ 0, for (α^k)_i = C

When the algorithm stops at i_t, if the next index in the sorted list of y_i (∇f(α^k))_i is ī_t, there are three possible situations:

    (α^k)_{ī_t} ∈ (0, C)
    (α^k)_{ī_t} = 0 and y_{ī_t} = −1
    (α^k)_{ī_t} = C and y_{ī_t} = 1

If the last two do not hold, we can move down further by assigning d_{ī_t} = 0.


Decomposition in Joachims' SVMlight (continued)


Choice of the working set size q

In the decomposition algorithm, a working set size q ≤ l must be chosen.

There is a tradeoff between q and the number of iterations needed for the algorithm to converge: the higher the working set size q, the fewer iterations are needed; however, with a larger q, individual iterations become far more expensive.


Correctness of the algorithm


Verify that the algorithm actually minimizes the objective.¹ When an iteration of the algorithm stops, d is an optimal solution for

    min_d ∇f(α^k)ᵀd    s.t.
    |{d_i : d_i ≠ 0}| ≤ q
    yᵀd = 0
    d_i ∈ [−1, 1]
    d_i ≥ 0, for (α^k)_i = 0
    d_i ≤ 0, for (α^k)_i = C

¹ Full proof in Chih-Jen Lin, On the Convergence of the Decomposition Method for Support Vector Machines, http://www.csie.ntu.edu.tw/~cjlin/papers/conv.pdf

When an iteration of the algorithm stops, the following KKT conditions are satisfied, showing that d is an optimal solution:

    ∇f(α^k) = −b y + λ − ξ
    yᵀd = 0
    λ_i (d_i + 1) = 0, if 0 < α_i^k ≤ C;    λ_i d_i = 0, if α_i^k = 0
    ξ_i (1 − d_i) = 0, if 0 ≤ α_i^k < C;    ξ_i d_i = 0, if α_i^k = C
    λ_i ≥ 0, ξ_i ≥ 0, ∀i = 1, …, l


Assume that B is the working set at the kth iteration, and N = {1, …, l} \ B.

If we define s = α^{k+1} − α^k, then s_N = 0 and

    f(α^{k+1}) − f(α^k)
    = (1/2) sᵀQs + sᵀQα^k − eᵀs
    = (1/2) s_Bᵀ Q_BB s_B + s_Bᵀ(Qα^k)_B − e_Bᵀ s_B

Thus, in the kth iteration, we solve the following problem in the variable s_B:

    min_{s_B} (1/2) s_Bᵀ Q_BB s_B + s_Bᵀ(Qα^k)_B − e_Bᵀ s_B    s.t.
    0 ≤ (α^k + s)_i ≤ C, i ∈ B
    y_Bᵀ s_B = 0

This is written purely in terms of the working-set components B, ignoring the part of the objective that involves only s_N (= 0) and hence does not depend on s_B.
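The reduction to the B-components is a purely algebraic identity (the terms involving only s_N cancel in the difference), which can be checked numerically. A small NumPy sketch with made-up data — note it checks only the identity, not feasibility of s:

```python
import numpy as np

rng = np.random.default_rng(0)

# random PSD Q of the dual's form, and an arbitrary alpha (illustrative data)
X = rng.standard_normal((6, 3))
y = np.array([1., -1., 1., 1., -1., -1.])
Q = (y[:, None] * y[None, :]) * (X @ X.T)
e = np.ones(6)
f = lambda a: 0.5 * a @ Q @ a - e @ a          # dual objective

alpha = rng.uniform(0, 1, 6)
B = [1, 4]                                     # working set; s_N = 0 on the rest
s = np.zeros(6)
s[B] = rng.standard_normal(2)

lhs = f(alpha + s) - f(alpha)
sB, QBB = s[B], Q[np.ix_(B, B)]
rhs = 0.5 * sB @ QBB @ sB + sB @ (Q @ alpha)[B] - e[B] @ sB
assert np.isclose(lhs, rhs)                    # holds for any s with s_N = 0
```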


Using the KKT conditions that the optimal solution s_B must satisfy, one can show a sufficient decrease of f(α) over the iterations:

    f(α^{k+1}) ≤ f(α^k) − (σ/2) ‖α^{k+1} − α^k‖²

where σ = min_I (min(eig(Q_II)))

At every step, the function decreases by an amount that does not become insignificant.


Convergence of SMO


SVM dual objective:

    min_α (1/2) αᵀQα − eᵀα    s.t.
    0 ≤ α_i ≤ C, ∀i
    yᵀα = 0

where:
- Q is positive-semidefinite, with Q_ij = y_i y_j ϕ(x_i)ᵀϕ(x_j)
- e = (1, …, 1)ᵀ ∈ ℝⁿ
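The dual's data can be instantiated numerically. A small sketch (linear kernel ϕ(x) = x and random data, purely illustrative) builds Q and confirms it is positive-semidefinite:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((6, 2))            # rows play the role of phi(x_i) (linear kernel)
y = np.where(rng.random(6) < 0.5, 1.0, -1.0)
K = X @ X.T                                # Gram matrix: phi(x_i)^T phi(x_j)
Q = (y[:, None] * y[None, :]) * K          # Q_ij = y_i y_j phi(x_i)^T phi(x_j)

# Q = diag(y) K diag(y) is positive-semidefinite whenever K is a Gram matrix
assert np.allclose(Q, Q.T)
assert np.linalg.eigvalsh(Q).min() >= -1e-10
```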

We split the constraint 0 ≤ α_i ≤ C, ∀i into:
- −α_i ≤ 0, with Lagrange multiplier θ_i
- α_i ≤ C, with Lagrange multiplier Γ_i

and consider yᵀα = 0 with Lagrange multiplier β. Thus, we can write the Lagrangian as:

    L(α, θ, Γ, β) = (1/2) αᵀQα − eᵀα − θᵀα + Γᵀ(α − Ce) + β yᵀα

with, ∀i,

    θ_i ≥ 0
    Γ_i ≥ 0
    θ_i α_i = 0
    Γ_i (α_i − C) = 0

Setting ∇_α L = 0, we get:


If α_i = 0: then α_i − C ≠ 0 and thus Γ_i = 0, so

    (Qα)_i − 1 − θ_i + β y_i = 0
    ⟹ θ_i = (Qα)_i − 1 + β y_i

As θ_i ≥ 0,

    (Qα)_i − 1 + β y_i ≥ 0

If α_i = C: then θ_i = 0, so

    (Qα)_i − 1 + Γ_i + β y_i = 0
    ⟹ −Γ_i = (Qα)_i − 1 + β y_i

As Γ_i ≥ 0,

    (Qα)_i − 1 + β y_i ≤ 0

If α_i ∈ (0, C): then θ_i = 0 and Γ_i = 0. Thus,

    (Qα)_i − 1 + β y_i = 0

Let us define the following sets of indices i:

    I0(α) = {i : 0 < α_i < C}
    I1(α) = {i : y_i = +1, α_i = 0}
    I2(α) = {i : y_i = −1, α_i = C}
    I3(α) = {i : y_i = +1, α_i = C}
    I4(α) = {i : y_i = −1, α_i = 0}

Let us now consider:

1. I0(α) ∪ I1(α) ∪ I2(α) = {i : y_i = +1, α_i < C} ∪ {i : y_i = −1, α_i > 0}. For these i, the conditions above give

    ((Qα)_i − 1) y_i ≥ −β

2. I0(α) ∪ I3(α) ∪ I4(α) = {i : y_i = +1, α_i > 0} ∪ {i : y_i = −1, α_i < C}. For these i,

    ((Qα)_i − 1) y_i ≤ −β


Thus, we have:

    min_{i ∈ I0∪I1∪I2} ((Qα)_i − 1) y_i ≥ max_{i ∈ I0∪I3∪I4} ((Qα)_i − 1) y_i

Negating both sides and writing (∇f(α))_i = (Qα)_i − 1, we get:

    min( min_{y_i=+1, α_i>0} −(∇f(α))_i ,  min_{y_i=−1, α_i<C} (∇f(α))_i )
      ≥ max( max_{y_i=+1, α_i<C} −(∇f(α))_i ,  max_{y_i=−1, α_i>0} (∇f(α))_i )

Let the min be attained at index l, and the max be attained at index j.

If the inequality is violated for the pair (l, j), the KKT conditions are violated.

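This stopping check is easy to express in code. A sketch (the helper name `kkt_violating_pair`, the boolean masks, and the tolerance are my own assumptions):

```python
import numpy as np

def kkt_violating_pair(grad, alpha, y, C, tol=1e-12):
    """Evaluate min over {y=+1, a>0} u {y=-1, a<C} of -y_i * grad_i against
    max over {y=+1, a<C} u {y=-1, a>0}, returning the attaining pair (l, j)
    and whether the optimality inequality (min >= max) is violated."""
    v = -y * grad                                          # -y_i (grad f(alpha))_i
    min_set = ((y == 1) & (alpha > tol)) | ((y == -1) & (alpha < C - tol))
    max_set = ((y == 1) & (alpha < C - tol)) | ((y == -1) & (alpha > tol))
    l = np.flatnonzero(min_set)[np.argmin(v[min_set])]     # attains the min side
    j = np.flatnonzero(max_set)[np.argmax(v[max_set])]     # attains the max side
    return l, j, bool(v[l] < v[j] - tol)                   # True => KKT violated
```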

We need to prove that, for all such choices of l and j across iterations,

    ∀k:  f(α^{k+1}) ≤ f(α^k) − (σ/2) ‖α^{k+1} − α^k‖²

with σ > 0, whenever α^{k+1} ≠ α^k.

Once we find l and j, we find closed-form solutions

    α_l^{k+1} = g(α_l^k, α_j^k, α_N^k)
    α_j^{k+1} = ḡ(α_l^k, α_j^k, α_N^k)

(which have been discussed before).


Whatever be the values of α_l^{k+1} and α_j^{k+1}, we will have:

    y_l α_l^{k+1} + y_j α_j^{k+1} = −y_Nᵀ α_N^k  (a constant)

Thus, if α_l changes linearly, then α_j also changes linearly. We can therefore parametrize α_l^{k+1} and α_j^{k+1} as α_l(t) and α_j(t), both varying linearly with t:

    α_l(t) = α_l^k + t/y_l = α_l^k + y_l t
    α_j(t) = α_j^k − t/y_j = α_j^k − y_j t

(using 1/y_i = y_i, since y_i ∈ {−1, +1})

Let f(α) = ψ(t); ψ is a function of α_N, α_l(t), and α_j(t).

We need the value t̄ that minimizes ψ(t) subject to the constraints

    ∑_i α_i y_i = 0
    α_i ∈ [0, C]

That would give

    α^k = (α_N^k, α_j^k, α_l^k)ᵀ    and    α^{k+1} = (α_N^k, α_j^k − y_j t̄, α_l^k + y_l t̄)ᵀ

so that

    α^{k+1} − α^k = (0, −y_j t̄, y_l t̄)ᵀ


Now, ψ(t) is a quadratic function of t. Thus,

    ψ(t) = ψ(0) + ψ′(0) t + ψ′′(0) t²/2

    ψ′(t) = ∑_{i=1}^m (∇f(α(t)))_i dα_i(t)/dt
          = y_l (∇f(α(t)))_l − y_j (∇f(α(t)))_j
          = y_l (∑_{i=1}^m Q_li α_i(t) − 1) − y_j (∑_{i=1}^m Q_ji α_i(t) − 1)

    ψ′(0) = y_l (∇f(α^k))_l − y_j (∇f(α^k))_j

    ψ′′(t) = Q_ll + Q_jj − 2 y_l y_j Q_lj
           = ϕ(x_l)ᵀϕ(x_l) + ϕ(x_j)ᵀϕ(x_j) − 2 ϕ(x_l)ᵀϕ(x_j)

    ψ′′(0) = ‖ϕ(x_l) − ϕ(x_j)‖²

t̄ minimizes ψ(t) subject to ∑_i α_i y_i = 0 and α_i ∈ [0, C], ∀i. Note that

    |t̄| = (1/√2) ‖α^{k+1} − α^k‖

Suppose t* minimizes ψ(t) without constraints. Solving ψ′(t) = 0 gives:

    t* = −ψ′(0)/ψ′′(0)

and ψ(t̄) ≥ ψ(t*).

We can say that t̄ = γ t*, where γ ∈ [0, 1] (you could have gone till t* but had to halt at t̄ due to the constraints).
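The quantities above — ψ′(0), ψ′′(0), the unconstrained minimizer t*, and the clipped step t̄ = γt* — can be put together in a sketch of one such step. All data and the pair (l, j) are made up for illustration; this is not the lecture's exact update rule:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy kernel setup: Q_ij = y_i y_j phi(x_i)^T phi(x_j), f(a) = a^T Q a / 2 - e^T a
Phi = rng.standard_normal((5, 3))                 # rows play the role of phi(x_i)
y = np.array([1., -1., 1., -1., 1.])
Q = (y[:, None] * y[None, :]) * (Phi @ Phi.T)
f = lambda a: 0.5 * a @ Q @ a - a.sum()           # dual objective
grad = lambda a: Q @ a - 1.0                      # (grad f)_i = (Q a)_i - 1

C = 1.0
alpha = rng.uniform(0.2, 0.8, 5)                  # feasible, strictly inside the box
l, j = 0, 1                                       # a (hypothetical) violating pair

# psi(t) = f(alpha(t)) with alpha_l(t) = alpha_l + y_l t, alpha_j(t) = alpha_j - y_j t
g = grad(alpha)
psi1 = y[l] * g[l] - y[j] * g[j]                  # psi'(0)
psi2 = Q[l, l] + Q[j, j] - 2 * y[l] * y[j] * Q[l, j]  # psi''(0) = ||phi_l - phi_j||^2
t_star = -psi1 / psi2                             # unconstrained minimizer t*

# feasible t-range keeping both alpha_l(t) and alpha_j(t) in [0, C];
# clipping t* into it realizes t_bar = gamma * t_star with gamma in [0, 1]
lo_l, hi_l = sorted([(0 - alpha[l]) * y[l], (C - alpha[l]) * y[l]])
lo_j, hi_j = sorted([(alpha[j] - C) * y[j], alpha[j] * y[j]])
t_bar = float(np.clip(t_star, max(lo_l, lo_j), min(hi_l, hi_j)))

alpha_new = alpha.copy()
alpha_new[l] += y[l] * t_bar
alpha_new[j] -= y[j] * t_bar
```

The update preserves yᵀα (the two changes cancel) and, since ψ is convex with 0 in the feasible t-range, it can only decrease f.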


    ψ(t̄) = ψ(γ t*) = ψ(−γ ψ′(0)/ψ′′(0))
         = ψ(0) + ψ′(0)(−γ ψ′(0)/ψ′′(0)) + (ψ′′(0)/2)(−γ ψ′(0)/ψ′′(0))²
         = ψ(0) − γ (ψ′(0))²/ψ′′(0) + (γ²/2)(ψ′(0))²/ψ′′(0)

Since γ ∈ [0, 1], γ² ≤ γ, and so

    γ²/2 − γ ≤ −γ²/2

Thus,

    ψ(t̄) ≤ ψ(0) − (γ²/2)(ψ′(0))²/ψ′′(0)
    ⟹ ψ(t̄) − ψ(0) ≤ −(γ²/2)(ψ′(0))²/ψ′′(0)
    ⟹ ψ(t̄) − ψ(0) ≤ −(ψ′′(0)/4) ‖α^{k+1} − α^k‖²

(the last step uses γ²(ψ′(0)/ψ′′(0))² = t̄² = (1/2)‖α^{k+1} − α^k‖²)

This becomes:

    f(α^{k+1}) − f(α^k) ≤ −(σ/2) ‖α^{k+1} − α^k‖²

where σ = ψ′′(0)/2 = (1/2) ‖ϕ(x_l) − ϕ(x_j)‖².

σ > 0 except when the feature vector ϕ(x_l) is the same as ϕ(x_j).

We assume Q to be positive-semidefinite, so that ψ′′(0) ≥ 0. But in the analysis of the general decomposition method, we assumed Q_II to be positive-definite for every submatrix Q_II of Q (so that σ = min_I min(eig(Q_II)) > 0), which is a stronger assumption.
