(1)


CS621 : Artificial Intelligence

Pushpak Bhattacharyya
CSE Dept., IIT Bombay

Lecture 29: PAC Learning and VC Dimension

(2)

[Figure: the concept c and the hypothesis h drawn as overlapping regions inside the universe U]

c Δ h = error region (the symmetric difference of c and h)

P(c Δ h) ≤ ε

where ε is the accuracy parameter and P is the probability distribution over the universe.

(3)


P(x) = probability that x is generated by the teacher (the "oracle") and labeled.

<x, +> : positive example.

<x, -> : negative example.

(4)

Learning means the following should happen:

Pr(P(c Δ h) ≤ ε) ≥ 1 - δ

where δ is the confidence parameter. This is the PAC model of learning: Probably Approximately Correct.

(5)


Example

[Figure: an axis-parallel rectangle ABCD in the plane; + examples inside the rectangle, - examples outside]

Universe: the 2-dimensional plane
Inductive bias: the concept is an axis-parallel rectangle

(6)

Key insights from 40 years of machine learning research:

1) What is it that is being learnt, and how should the hypothesis be produced? Specifying this is a "must". This is called the inductive bias.

2) "Learning in a vacuum" is not possible. A learner already has crucial pieces of knowledge at its disposal.

(7)


[Figure: the target axis-parallel rectangle ABCD in the x-y plane; + examples inside the rectangle, - examples scattered outside]

(8)

Algorithm:

1. Ignore the negative examples.

2. Find the tightest-fitting axis-parallel rectangle around the positive examples (a minimal sketch follows below).
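A minimal Python sketch of this learner (the function names and the example format are illustrative, not from the slides):

    # Tightest-fit axis-parallel rectangle learner (illustrative sketch).
    # Each example is ((x, y), label) with label '+' or '-'.
    def learn_rectangle(examples):
        positives = [point for point, label in examples if label == '+']
        if not positives:
            return None  # no positive example seen: predict '-' everywhere
        xs = [x for x, _ in positives]
        ys = [y for _, y in positives]
        # Hypothesis h = smallest axis-parallel rectangle containing all positives.
        return (min(xs), max(xs), min(ys), max(ys))

    def predict(rect, point):
        if rect is None:
            return '-'
        x_min, x_max, y_min, y_max = rect
        x, y = point
        return '+' if x_min <= x <= x_max and y_min <= y <= y_max else '-'

Since h is always contained in the target rectangle, it can only err on positive points it has not yet seen; the analysis below bounds the probability of that error region.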

(9)


Case 1: If P(□ABCD) < ε, then the algorithm is PAC.

[Figure: the target rectangle c = □ABCD in the x-y plane, with the learned rectangle h inside it; + examples inside h, - examples outside c]

Since the learned rectangle h always lies inside the target c, the error region c Δ h = c - h has probability at most P(c) < ε, so

Pr(P(c Δ h) ≤ ε) ≥ 1 - δ

holds trivially.

(10)

Case 2: P(□ABCD) > ε.

[Figure: the target rectangle □ABCD in the x-y plane, with four border strips marked Top, Bottom, Left and Right]

The four strips are chosen so that P(Top) = P(Bottom) = P(Left) = P(Right) = ε/4.

(11)


Let the number of examples = m.

• Probability that a point comes from the Top strip = ε/4

• Probability that none of the m examples comes from the Top strip = (1 - ε/4)^m

(12)

Probability that at least one of the Top/Bottom/Left/Right strips receives none of the m examples ≤ 4(1 - ε/4)^m (union bound).

Probability that every one of the 4 strips receives at least one example ≥ 1 - 4(1 - ε/4)^m.

If each strip receives an example, each side of the learned rectangle h lies within the corresponding strip, so the error region c - h is contained in the union of the four strips and has probability at most 4 · ε/4 = ε.

(13)


This must hold with probability at least 1 - δ:

1 - 4(1 - ε/4)^m ≥ 1 - δ,  i.e.,  4(1 - ε/4)^m ≤ δ

(14)

[Figure: the rectangle ABCD in the x-y plane with + examples inside]

(15)


Let's say we want 10% error with 90% confidence (ε = δ = 0.1). Then m > (4/0.1) ln(4/0.1),

which is approximately 148, i.e., about 150 examples suffice.

(16)

Since (1 - ε/4)^m ≤ e^(-εm/4), it suffices to have

4 e^(-εm/4) ≤ δ,

i.e., m ≥ (4/ε) ln(4/δ).
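A quick numerical check of this bound in Python (our own sketch; the helper name is not from the slides):

    import math

    # m >= (4/eps) * ln(4/delta) examples suffice for the rectangle learner.
    def rectangle_sample_bound(eps, delta):
        return math.ceil((4.0 / eps) * math.log(4.0 / delta))

    m = rectangle_sample_bound(0.1, 0.1)
    print(m)  # 148, matching the worked example above

    # Check: with m examples the failure probability 4(1 - eps/4)^m stays below delta.
    print(4 * (1 - 0.1 / 4) ** m)  # about 0.094 <= 0.1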

(17)


Criticisms of PAC learning

1. The model produces too many negative results.

2. The constraint of allowing an arbitrary probability distribution is too restrictive.

(18)

In spite of these negative results, so much learning takes place around us.

(19)


VC-dimension

Gives a necessary and sufficient condition for PAC learnability.

(20)

Definition:

Let C be a concept class, i.e., it has members c1, c2, c3, … as concepts.

[Figure: the class C drawn as a region containing the concepts c1, c2, c3]

(21)


Let S be a subset of U (the universe).

Now, if every subset of S can be produced by intersecting S with some ci, then we say C shatters S.

(22)

The highest-cardinality set S that can be shattered gives the VC-dimension of C:

VC-dim(C) = |S|

VC-dim: Vapnik-Chervonenkis dimension.

(23)


Example: the universe is the 2-dimensional plane, and C = { half-planes }.

[Figure: x and y axes in the plane]

(24)

|S| = 1 can be shattered:
S1 = { a }; both subsets, {a} and Ø, can be picked out by half-planes.

[Figure: a single point a in the x-y plane]

(25)


|S| = 2 can be shattered:
S2 = { a, b }; all four subsets, {a, b}, {a}, {b} and Ø, can be picked out by half-planes.

[Figure: two points a and b in the x-y plane]

(26)

|S| = 3 can be shattered:
S3 = { a, b, c } (three points not all on one line); all eight subsets can be picked out by half-planes.

[Figure: three points a, b and c in the x-y plane]

(27)


(28)

|S| = 4 cannot be shattered:
S4 = { a, b, c, d }; for example, no half-plane can pick out two diagonally opposite corners of the rectangle while excluding the other two.

[Figure: four points at the corners A, B, C, D of a rectangle in the x-y plane]
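The shattering claims on the last few slides can be checked mechanically. The sketch below is our own illustration (it assumes SciPy is available and uses a linear-programming feasibility test for half-plane separation); it reports that three non-collinear points are shattered while the four corners of a rectangle are not:

    from itertools import combinations
    from scipy.optimize import linprog

    def half_plane_separates(inside, outside):
        """True if some half-plane contains every point of `inside` and no point of `outside`."""
        # Feasibility LP: find (w1, w2, b) with w.x + b >= 1 on `inside` and <= -1 on `outside`.
        A_ub, b_ub = [], []
        for x, y in inside:
            A_ub.append([-x, -y, -1.0]); b_ub.append(-1.0)
        for x, y in outside:
            A_ub.append([x, y, 1.0]); b_ub.append(-1.0)
        res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * 3, method="highs")
        return res.success

    def shattered_by_half_planes(points):
        points = list(points)
        for k in range(len(points) + 1):
            for subset in combinations(points, k):
                rest = [p for p in points if p not in subset]
                if not half_plane_separates(list(subset), rest):
                    return False
        return True

    print(shattered_by_half_planes([(0, 0), (1, 0), (0, 1)]))          # True:  |S| = 3
    print(shattered_by_half_planes([(0, 0), (1, 0), (0, 1), (1, 1)]))  # False: |S| = 4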

(29)


Fundamental Theorem of PAC learning (Ehrenfeucht et al., 1989)

• A concept class C is learnable for all probability distributions and all concepts in C if and only if the VC dimension of C is finite.

• If the VC dimension of C is d, then… (next page)

(30)

Fundamental theorem (contd.)

(a) For 0 < ε < 1 and sample size at least

max[ (4/ε) log(2/δ), (8d/ε) log(13/ε) ],

any consistent function A : S_c → C is a learning function for C.

(b) For 0 < ε < 1/2 and sample size less than

max[ ((1-ε)/ε) ln(1/δ), d(1 - 2(ε(1-δ) + δ)) ],

no function A : S_c → H, for any hypothesis space H, is a learning function for C.
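As a rough illustration (our own sketch, assuming the logarithms in part (a) are to base 2; the slide does not state the base), the bound in (a) can be evaluated for the half-plane class, whose VC dimension is 3 by the previous slides:

    import math

    # Part (a) of the theorem: sample size max[(4/eps)*log(2/delta), (8d/eps)*log(13/eps)].
    def theorem_sample_bound(d, eps, delta):
        return math.ceil(max((4 / eps) * math.log2(2 / delta),
                             (8 * d / eps) * math.log2(13 / eps)))

    # Half-planes in the plane: d = 3, eps = delta = 0.1
    print(theorem_sample_bound(3, 0.1, 0.1))  # 1686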

(31)


References

Book

1. Computational Learning Theory, M. H. G. Anthony and N. Biggs, Cambridge Tracts in Theoretical Computer Science, 1997.

Papers

1. A theory of the learnable, L. G. Valiant, Communications of the ACM 27(11):1134-1142, 1984.

2. Learnability and the VC-dimension, A. Blumer, A. Ehrenfeucht, D. Haussler and M. Warmuth, Journal of the ACM, 1989.
