
(1)

Learning Sets of Rules

[Read Ch. 10]

[Recommended exercises 10.1, 10.2, 10.5, 10.7, 10.8]

Sequential covering algorithms

FOIL

Induction as inverse of deduction

Inductive Logic Programming

(2)

Learning Disjunctive Sets of Rules

Method 1: Learn decision tree, convert to rules

Method 2: Sequential covering algorithm:

1. Learn one rule with high accuracy, any coverage
2. Remove positive examples covered by this rule
3. Repeat

(3)

Sequential Covering Algorithm

Sequential-Covering(Target_attribute, Attributes, Examples, Threshold)

  Learned_rules ← {}
  Rule ← Learn-One-Rule(Target_attribute, Attributes, Examples)
  while Performance(Rule, Examples) > Threshold, do
    - Learned_rules ← Learned_rules + Rule
    - Examples ← Examples − {examples correctly classified by Rule}
    - Rule ← Learn-One-Rule(Target_attribute, Attributes, Examples)
  Learned_rules ← sort Learned_rules according to Performance over Examples
  return Learned_rules
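A minimal Python sketch of this loop, assuming propositional rules represented as (preconditions dict, predicted class) and examples as dicts with a "class" key; the helper names (covers, learn_one_rule, performance) are illustrative, not part of the original slides:

def covers(preconditions, example):
    """True if the example satisfies every precondition of the rule."""
    return all(example.get(a) == v for a, v in preconditions.items())

def sequential_covering(attributes, examples, threshold,
                        learn_one_rule, performance):
    learned_rules = []
    rule = learn_one_rule(attributes, examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # Remove examples correctly classified by this rule, so the
        # next rule is learned from what remains.
        examples = [ex for ex in examples
                    if not (covers(rule[0], ex) and ex["class"] == rule[1])]
        rule = learn_one_rule(attributes, examples)
    # Sort rules by performance so they can be applied in order.
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules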

(4)

Learn-One-Rule

[Figure: general-to-specific search through the space of rule preconditions. The most general rule, IF THEN PlayTennis=yes, is specialized by adding one attribute test at a time, e.g. IF Wind=weak THEN PlayTennis=yes, IF Wind=strong THEN PlayTennis=no, IF Humidity=normal THEN PlayTennis=yes, IF Humidity=high THEN PlayTennis=no. The branch IF Humidity=normal THEN PlayTennis=yes is specialized further to IF Humidity=normal ∧ Wind=weak, IF Humidity=normal ∧ Wind=strong, IF Humidity=normal ∧ Outlook=sunny, and IF Humidity=normal ∧ Outlook=rain, each THEN PlayTennis=yes.]

(5)

Learn-One-Rule

  Pos ← positive Examples
  Neg ← negative Examples
  while Pos, do
    Learn a NewRule
    - NewRule ← most general rule possible
    - NewRuleNeg ← Neg
    - while NewRuleNeg, do
        Add a new literal to specialize NewRule
        1. Candidate_literals ← generate candidates
        2. Best_literal ← argmax over L ∈ Candidate_literals of
           Performance(SpecializeRule(NewRule, L))
        3. Add Best_literal to NewRule preconditions
        4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
    - Learned_rules ← Learned_rules + NewRule
    - Pos ← Pos − {members of Pos covered by NewRule}
  Return Learned_rules
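A Python sketch of this covering loop for the propositional case, reusing covers from the earlier sketch; preconditions are attribute=value dicts, and performance scores a candidate precondition set on (pos, neg). Tie-breaking and the case where no literal separates the data are ignored, just as in the slide's pseudocode:

def learn_one_rule_covering(attributes, pos, neg, performance):
    learned_rules = []
    pos = list(pos)
    while pos:
        new_rule = {}                      # most general rule possible
        new_rule_neg = list(neg)
        while new_rule_neg:
            # 1. Candidate literals: unused attribute=value tests seen in data.
            candidates = {(a, ex[a]) for ex in pos + neg
                          for a in attributes if a not in new_rule}
            # 2. Best literal under the performance measure.
            a, v = max(candidates,
                       key=lambda lit: performance({**new_rule, lit[0]: lit[1]},
                                                   pos, neg))
            # 3. Add it to the rule's preconditions.
            new_rule[a] = v
            # 4. Keep only negatives still satisfying the preconditions.
            new_rule_neg = [ex for ex in new_rule_neg if covers(new_rule, ex)]
        learned_rules.append(new_rule)
        # Remove members of Pos covered by the new rule.
        pos = [ex for ex in pos if not covers(new_rule, ex)]
    return learned_rules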

(6)

1. May use beam search

2. Easily generalizes to multi-valued target functions

3. Choose evaluation function to guide search:

Entropy (i.e., information gain)

Sample accuracy: n_c / n, where n_c = number of correct rule predictions and n = number of all predictions

m-estimate: (n_c + m p) / (n + m), where p is the prior probability of the class and m is the weight given to that prior (an equivalent sample size)
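These measures are short enough to state directly in Python; a sketch, with argument names chosen for illustration:

import math

def sample_accuracy(n_c, n):
    return n_c / n

def m_estimate(n_c, n, p, m):
    # p = prior probability of the class, m = weight given to the prior.
    return (n_c + m * p) / (n + m)

def entropy(class_counts):
    """Entropy of the class distribution among examples a rule covers."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)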

(7)

Variants of Rule Learning Programs

Sequential or simultaneous covering of data?

General → specific, or specific → general?

Generate-and-test, or example-driven?

Whether and how to post-prune?

What statistical evaluation function?

(8)

Learning First Order Rules

Why do that?

Can learn sets of rules such as

Ancestor(x, y) ← Parent(x, y)

Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)

General purpose programming language

Prolog: programs are sets of such rules

(9)

First Order Rule for Classifying Web Pages

[Slattery, 1997]

course(A) ←
  has-word(A, instructor),
  NOT has-word(A, good),
  link-from(A, B),
  has-word(B, assign),
  NOT link-from(B, C)

Train: 31/31, Test: 31/34

(10)

FOIL(Target_predicate, Predicates, Examples)

  Pos ← positive Examples
  Neg ← negative Examples
  while Pos, do
    Learn a NewRule
    - NewRule ← most general rule possible
    - NewRuleNeg ← Neg
    - while NewRuleNeg, do
        Add a new literal to specialize NewRule
        1. Candidate_literals ← generate candidates
        2. Best_literal ← argmax over L ∈ Candidate_literals of
           Foil_Gain(L, NewRule)
        3. Add Best_literal to NewRule preconditions
        4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
    - Learned_rules ← Learned_rules + NewRule
    - Pos ← Pos − {members of Pos covered by NewRule}
  Return Learned_rules

(11)

Specializing Rules in FOIL

Learning rule: P(x_1, x_2, ..., x_k) ← L_1 ... L_n

Candidate specializations add new literal of form:

Q(v_1, ..., v_r), where at least one of the v_i in the created literal must already exist as a variable in the rule

Equal(x_j, x_k), where x_j and x_k are variables already present in the rule

The negation of either of the above forms of literals
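A sketch of this candidate generation in Python, under simplifying assumptions: predicates maps a predicate name to its arity, rule_vars are the variables already in the rule, fresh variables are named v0, v1, ..., and a literal is encoded as (negated?, predicate, args):

from itertools import product

def candidate_literals(predicates, rule_vars, n_new=1):
    fresh = [f"v{i}" for i in range(n_new)]
    pool = list(rule_vars) + fresh
    out = []
    for pred, arity in predicates.items():
        for args in product(pool, repeat=arity):
            # At least one argument must be a variable already in the rule.
            if any(a in rule_vars for a in args):
                out.append((False, pred, args))   # Q(v_1, ..., v_r)
                out.append((True, pred, args))    # its negation
    for x, y in product(rule_vars, repeat=2):
        if x < y:
            out.append((False, "Equal", (x, y)))  # Equal(x_j, x_k)
            out.append((True, "Equal", (x, y)))   # its negation
    return out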

(12)

Information Gain in FOIL

Foil_Gain(L, R) ≡ t ( log2 (p_1 / (p_1 + n_1)) − log2 (p_0 / (p_0 + n_0)) )

Where

L is the candidate literal to add to rule R

p_0 = number of positive bindings of R
n_0 = number of negative bindings of R
p_1 = number of positive bindings of R + L
n_1 = number of negative bindings of R + L

t is the number of positive bindings of R also covered by R + L

Note: − log2 (p_0 / (p_0 + n_0)) is the optimal number of bits to indicate the class of a positive binding covered by R
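A direct Python transcription of the formula; the binding counts p0, n0 (for R), p1, n1 (for R + L), and t are assumed to have been computed by enumerating variable bindings elsewhere:

import math

def foil_gain(p0, n0, p1, n1, t):
    if p1 == 0:
        return float("-inf")   # specialization covers no positive bindings
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))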

(13)

Induction as Inverted Deduction

Induction is finding h such that

(∀ ⟨x_i, f(x_i)⟩ ∈ D)  B ∧ h ∧ x_i ⊢ f(x_i)

where

x_i is the ith training instance

f(x_i) is the target function value for x_i

B is other background knowledge

So let's design inductive algorithms by inverting operators for automated deduction!

(14)

Induction as Inverted Deduction

"pairs of people, ⟨u, v⟩ such that child of u is v,"

f(x_i):  Child(Bob, Sharon)

x_i:  Male(Bob), Female(Sharon), Father(Sharon, Bob)

B:  Parent(u, v) ← Father(u, v)

What satisfies (∀ ⟨x_i, f(x_i)⟩ ∈ D)  B ∧ h ∧ x_i ⊢ f(x_i)?

h_1:  Child(u, v) ← Father(v, u)

h_2:  Child(u, v) ← Parent(v, u)

(15)

Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; ... it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction....

(Jevons 1874)

(16)

Induction as Inverted Deduction

We have mechanical deductive operators F(A, B) = C, where A ∧ B ⊢ C

We need inductive operators O(B, D) = h where

(∀ ⟨x_i, f(x_i)⟩ ∈ D)  (B ∧ h ∧ x_i) ⊢ f(x_i)

(17)

Positives:

Subsumes earlier idea of finding h that "fits" training data

Domain theory B helps define meaning of "fit" the data:

B ∧ h ∧ x_i ⊢ f(x_i)

Suggests algorithms that search H guided by B

(18)

Negatives:

Doesn't allow for noisy data. Consider

(∀ ⟨x_i, f(x_i)⟩ ∈ D)  (B ∧ h ∧ x_i) ⊢ f(x_i)

First order logic gives a huge hypothesis space H

→ overfitting...

→ intractability of calculating all acceptable h's

(19)

Deduction: Resolution Rule

P ∨ L,   ¬L ∨ R
---------------
     P ∨ R

1. Given initial clauses C_1 and C_2, find a literal L from clause C_1 such that ¬L occurs in clause C_2

2. Form the resolvent C by including all literals from C_1 and C_2, except for L and ¬L. More precisely, the set of literals occurring in the conclusion C is

C = (C_1 − {L}) ∪ (C_2 − {¬L})

where ∪ denotes set union, and "−" denotes set difference.
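A Python sketch of this rule, under a simple encoding: a clause is a frozenset of literals, a literal is a string, and "~" marks negation. The example uses the clauses from the next slide:

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 against c2."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            # C = (C1 - {L}) ∪ (C2 - {¬L})
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

# (PassExam ∨ ¬KnowMaterial) and (KnowMaterial ∨ ¬Study)
# resolve to (PassExam ∨ ¬Study).
c1 = frozenset({"PassExam", "~KnowMaterial"})
c2 = frozenset({"KnowMaterial", "~Study"})
print(resolvents(c1, c2))   # [frozenset({'PassExam', '~Study'})]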

(20)

Inverting Resolution

[Figure: inverting resolution. Forward direction: C_1: PassExam ∨ ¬KnowMaterial and C_2: KnowMaterial ∨ ¬Study resolve to C: PassExam ∨ ¬Study. Inverse direction: given the resolvent C: PassExam ∨ ¬Study and one parent C_1: PassExam ∨ ¬KnowMaterial, inverse resolution reconstructs the other parent C_2: KnowMaterial ∨ ¬Study.]

(21)

Inverted Resolution (Propositional)

1. Given initial clauses C_1 and C, find a literal L that occurs in clause C_1, but not in clause C.

2. Form the second clause C_2 by including the following literals:

C_2 = (C − (C_1 − {L})) ∪ {¬L}
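A Python sketch of this inverse step, using the same literal encoding as the resolution sketch above: given the resolvent C and one parent C_1, reconstruct a candidate second parent C_2. The example recovers C_2 from the slide-20 figure:

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def inverse_resolve(c, c1, lit):
    """lit must occur in c1 but not in c; returns one candidate C2."""
    assert lit in c1 and lit not in c
    # C2 = (C - (C1 - {L})) ∪ {¬L}
    return (c - (c1 - {lit})) | {negate(lit)}

# Recover C2 = KnowMaterial ∨ ¬Study from C = PassExam ∨ ¬Study
# and C1 = PassExam ∨ ¬KnowMaterial.
c  = frozenset({"PassExam", "~Study"})
c1 = frozenset({"PassExam", "~KnowMaterial"})
print(inverse_resolve(c, c1, "~KnowMaterial"))
# frozenset({'KnowMaterial', '~Study'})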

(22)

First order resolution

1. Find a literal L_1 from clause C_1, literal L_2 from clause C_2, and substitution θ such that L_1 θ = ¬L_2 θ

2. Form the resolvent C by including all literals from C_1 θ and C_2 θ, except for L_1 θ and ¬L_2 θ. More precisely, the set of literals occurring in the conclusion C is

C = (C_1 − {L_1})θ ∪ (C_2 − {L_2})θ

(23)

Inverting First order resolution

C_2 = (C − (C_1 − {L_1})θ_1) θ_2⁻¹ ∪ {¬L_1 θ_1 θ_2⁻¹}

where the unifying substitution is factored as θ = θ_1 θ_2, with θ_1 containing the substitutions involving variables of C_1 and θ_2 those involving variables of C_2.

(24)

Cigol

[Figure: two inverse resolution steps in Cigol. Starting from GrandChild(Bob, Shannon) and the fact Father(Shannon, Tom), inverse resolution with substitution {Shannon/x} yields GrandChild(Bob, x) ∨ ¬Father(x, Tom); combining that with the fact Father(Tom, Bob) and substitution {Bob/y, Tom/z} yields the general rule GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y).]

(25)

Progol

Progol: Reduce combinatorial explosion by generating the most specific acceptable h

1. User specifies H by stating predicates, functions, and forms of arguments allowed for each

2. Progol uses sequential covering algorithm. For each ⟨x_i, f(x_i)⟩, find the most specific hypothesis h_i s.t. B ∧ h_i ∧ x_i ⊢ f(x_i)
   - actually, considers only k-step entailment

3. Conduct general-to-specific search bounded by specific hypothesis h_i, choosing hypothesis with minimum description length
