Learning Sets of Rules
[Read Ch. 10]
[Recommended exercises 10.1, 10.2, 10.5, 10.7, 10.8]
Sequential covering algorithms
FOIL
Induction as inverse of deduction
Inductive Logic Programming
Learning Disjunctive Sets of Rules
Method 1: Learn decision tree, convert to rules

Method 2: Sequential covering algorithm:
1. Learn one rule with high accuracy, any coverage
2. Remove positive examples covered by this rule
3. Repeat
Sequential Covering Algorithm

Sequential-covering(Target_attribute, Attributes, Examples, Threshold):

- Learned_rules ← {}
- Rule ← learn-one-rule(Target_attribute, Attributes, Examples)
- while performance(Rule, Examples) > Threshold, do
  - Learned_rules ← Learned_rules + Rule
  - Examples ← Examples − {examples correctly classified by Rule}
  - Rule ← learn-one-rule(Target_attribute, Attributes, Examples)
- Learned_rules ← sort Learned_rules according to performance over Examples
- return Learned_rules
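As a concrete reference, here is a minimal Python sketch of this loop. The helpers learn_one_rule, performance, and correctly_classifies are assumed (learn-one-rule is sketched on a later slide); this illustrates the control flow under those assumptions, not a full implementation.

    def sequential_covering(target, attributes, examples, threshold):
        """Greedy covering: learn one rule at a time, then remove the
        examples that rule already classifies correctly."""
        learned_rules = []
        rule = learn_one_rule(target, attributes, examples)
        while performance(rule, examples) > threshold:
            learned_rules.append(rule)
            # Keep only the examples not yet correctly covered.
            examples = [ex for ex in examples
                        if not correctly_classifies(rule, ex)]
            rule = learn_one_rule(target, attributes, examples)
        # Sort rules by performance so better rules are tried first.
        learned_rules.sort(key=lambda r: performance(r, examples),
                           reverse=True)
        return learned_rules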
Learn-One-Rule

[Figure: general-to-specific search over rule preconditions. The root is the
most general rule; each level adds one attribute test, with further
candidates elided (...):

    IF THEN PlayTennis=yes
      IF Wind=weak THEN PlayTennis=yes
      IF Wind=strong THEN PlayTennis=no
      IF Humidity=normal THEN PlayTennis=yes
      IF Humidity=high THEN PlayTennis=no
        IF Humidity=normal AND Wind=weak THEN PlayTennis=yes
        IF Humidity=normal AND Wind=strong THEN PlayTennis=yes
        IF Humidity=normal AND Outlook=sunny THEN PlayTennis=yes
        IF Humidity=normal AND Outlook=rain THEN PlayTennis=yes]
Learn-One-Rule

- Pos ← positive Examples
- Neg ← negative Examples
- while Pos is nonempty, do

  Learn a NewRule:

  - NewRule ← most general rule possible
  - NewRuleNeg ← Neg
  - while NewRuleNeg is nonempty, do

    Add a new literal to specialize NewRule:

    1. Candidate_literals ← generate candidates
    2. Best_literal ← argmax over L ∈ Candidate_literals of
       Performance(SpecializeRule(NewRule, L))
    3. Add Best_literal to NewRule preconditions
    4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions

  - Learned_rules ← Learned_rules + NewRule
  - Pos ← Pos − {members of Pos covered by NewRule}

- Return Learned_rules
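A Python sketch of the inner specialization loop ("Learn a NewRule"); the outer loop over Pos mirrors the sequential-covering sketch above. It assumes a simplified, hypothetical representation: an example is a dict of attribute values, a rule is a dict of attribute=value preconditions, and a performance function is supplied by the caller.

    def covers(rule, example):
        """An example satisfies a rule iff it meets every precondition."""
        return all(example.get(a) == v for a, v in rule.items())

    def learn_one_rule(attributes, pos, neg, performance):
        """General-to-specific greedy search for one conjunctive rule."""
        rule = {}                # most general rule: no preconditions
        rule_neg = list(neg)     # negatives the rule still covers
        while rule_neg:
            # Candidate literals: attribute=value tests not yet in the rule.
            candidates = [(a, v) for a in attributes if a not in rule
                          for v in {ex[a] for ex in pos + neg}]
            best_a, best_v = max(
                candidates,
                key=lambda c: performance({**rule, c[0]: c[1]}, pos, neg))
            rule[best_a] = best_v   # specialize with the best literal
            rule_neg = [ex for ex in rule_neg if covers(rule, ex)]
        return rule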
1. May use beam search
2. Easily generalizes to multi-valued target functions
3. Choose evaluation function to guide search:
   - Entropy (i.e., information gain)
   - Sample accuracy: n_c / n, where n_c = correct rule predictions and
     n = all predictions
   - m-estimate: (n_c + m·p) / (n + m), where p is the prior probability of
     the class the rule predicts and m weights the prior
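The last two evaluation measures, transcribed directly into Python (p and m are supplied by the caller):

    def sample_accuracy(n_c, n):
        """Fraction of the rule's predictions that are correct."""
        return n_c / n

    def m_estimate(n_c, n, p, m):
        """Accuracy smoothed toward the prior p; m acts as a virtual
        sample size (m = 0 recovers sample accuracy)."""
        return (n_c + m * p) / (n + m)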
Variants of Rule Learning Programs
Sequential or simultaneous covering of data?
General → specific, or specific → general?
Generate-and-test, or example-driven?
Whether and how to post-prune?
What statistical evaluation function?
Learning First Order Rules
Why do that?
- Can learn sets of rules such as

    Ancestor(x, y) ← Parent(x, y)
    Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)

- General purpose programming language Prolog: programs are sets of such rules
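For intuition, the two Ancestor clauses behave like the following recursive Python check over a parent relation given as a set of (parent, child) pairs; this rendering is purely illustrative and not part of any rule learner.

    def ancestor(x, y, parent):
        """Ancestor(x, y) <- Parent(x, y)
           Ancestor(x, y) <- Parent(x, z) ^ Ancestor(z, y)"""
        if (x, y) in parent:                      # first clause
            return True
        return any(ancestor(z, y, parent)         # second clause
                   for (p, z) in parent if p == x)

    parent = {("Tom", "Bob"), ("Bob", "Sharon")}
    print(ancestor("Tom", "Sharon", parent))      # True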
First Order Rule for Classifying Web Pages

[Slattery, 1997]

course(A) ←
    has-word(A, instructor),
    NOT has-word(A, good),
    link-from(A, B),
    has-word(B, assign),
    NOT link-from(B, C)

Train: 31/31, Test: 31/34
FOIL(Target_predicate, Predicates, Examples)

- Pos ← positive Examples
- Neg ← negative Examples
- while Pos is nonempty, do

  Learn a NewRule:

  - NewRule ← most general rule possible
  - NewRuleNeg ← Neg
  - while NewRuleNeg is nonempty, do

    Add a new literal to specialize NewRule:

    1. Candidate_literals ← generate candidates
    2. Best_literal ← argmax over L ∈ Candidate_literals of
       Foil_Gain(L, NewRule)
    3. Add Best_literal to NewRule preconditions
    4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions

  - Learned_rules ← Learned_rules + NewRule
  - Pos ← Pos − {members of Pos covered by NewRule}

- Return Learned_rules
Specializing Rules in FOIL

Learning rule: P(x1, x2, ..., xk) ← L1 ... Ln

Candidate specializations add a new literal of form:

- Q(v1, ..., vr), where at least one of the v_i in the created literal must
  already exist as a variable in the rule
- Equal(x_j, x_k), where x_j and x_k are variables already present in the rule
- The negation of either of the above forms of literals
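A sketch of candidate generation under these constraints. It assumes predicates is a dict mapping predicate names to arities and rule_vars is a list of variable names already in the rule; fresh variable names are invented as needed (all names here are hypothetical).

    from itertools import product

    def candidate_literals(predicates, rule_vars):
        """Enumerate FOIL's candidate specializations: Q(v1, ..., vr)
        reusing at least one existing variable, Equal(xj, xk) over
        existing variables, and the negation of each."""
        candidates = []
        for q, arity in predicates.items():
            fresh = ["v%d" % i for i in range(arity)]   # new variables
            for args in product(rule_vars + fresh, repeat=arity):
                if any(a in rule_vars for a in args):   # reuse >= 1 variable
                    candidates.append((q, args))
        candidates += [("Equal", (xj, xk))
                       for xj in rule_vars for xk in rule_vars if xj < xk]
        return candidates + [("NOT",) + lit for lit in candidates]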
Information Gain in FOIL

    Foil_Gain(L, R) ≡ t ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )

Where

- L is the candidate literal to add to rule R
- p0 = number of positive bindings of R
- n0 = number of negative bindings of R
- p1 = number of positive bindings of R + L
- n1 = number of negative bindings of R + L
- t is the number of positive bindings of R also covered by R + L

Note: −log2( p0 / (p0 + n0) ) is the optimal number of bits to indicate the
class of a positive binding covered by R.
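The formula transcribed directly; computing the binding counts from the data is assumed to happen elsewhere.

    from math import log2

    def foil_gain(t, p0, n0, p1, n1):
        """t times the reduction in the number of bits needed to encode
        the class of a positive binding after adding literal L to rule R.
        Assumes p0 > 0 and p1 > 0 (the rules cover some positives)."""
        return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))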
Induction as Inverted Deduction

Induction is finding h such that

    (∀ ⟨x_i, f(x_i)⟩ ∈ D)  B ∧ h ∧ x_i ⊢ f(x_i)

where

- x_i is the ith training instance
- f(x_i) is the target function value for x_i
- B is other background knowledge

So let's design an inductive algorithm by inverting operators for automated
deduction!
Induction as Inverted Deduction

"pairs of people ⟨u, v⟩ such that the child of u is v"

    f(x_i):  Child(Bob, Sharon)
    x_i:     Male(Bob), Female(Sharon), Father(Sharon, Bob)
    B:       Parent(u, v) ← Father(u, v)

What satisfies (∀ ⟨x_i, f(x_i)⟩ ∈ D) B ∧ h ∧ x_i ⊢ f(x_i)?

    h1:  Child(u, v) ← Father(v, u)
    h2:  Child(u, v) ← Parent(v, u)

"Induction is, in fact, the inverse operation of deduction, and cannot be
conceived to exist without the corresponding operation, so that the question
of relative importance cannot arise. Who thinks of asking whether addition or
subtraction is the more important process in arithmetic? But at the same time
much difference in difficulty may exist between a direct and inverse
operation; ... it must be allowed that inductive investigations are of a far
higher degree of difficulty and complexity than any questions of
deduction ..." (Jevons 1874)
Induction as Inverted Deduction

We have mechanical deductive operators F(A, B) = C, where A ∧ B ⊢ C.
We need inductive operators:

    O(B, D) = h  such that  (∀ ⟨x_i, f(x_i)⟩ ∈ D)  (B ∧ h ∧ x_i) ⊢ f(x_i)

Positives:
- Subsumes earlier idea of finding h that "fits" training data
- Domain theory B helps define the meaning of "fit" the data:
  B ∧ h ∧ x_i ⊢ f(x_i)
- Suggests algorithms that search H guided by B

Negatives:
- Doesn't allow for noisy data. Consider
  (∀ ⟨x_i, f(x_i)⟩ ∈ D) (B ∧ h ∧ x_i) ⊢ f(x_i)
- First order logic gives a huge hypothesis space H
  → overfitting...
  → intractability of calculating all acceptable h's
Deduction: Resolution Rule

    P ∨ L
    ¬L ∨ R
    -----------
    P ∨ R

1. Given initial clauses C1 and C2, find a literal L from clause C1 such that
   ¬L occurs in clause C2.
2. Form the resolvent C by including all literals from C1 and C2, except for
   L and ¬L. More precisely, the set of literals occurring in the conclusion
   C is

    C = (C1 − {L}) ∪ (C2 − {¬L})

where ∪ denotes set union and "−" denotes set difference.
Inverting Resolution

[Figure: resolution and its inverse on an example.
Resolution: from C1 = PassExam ∨ ¬KnowMaterial and
C2 = KnowMaterial ∨ ¬Study, derive the resolvent C = PassExam ∨ ¬Study.
Inverted resolution: from C = PassExam ∨ ¬Study and
C1 = PassExam ∨ ¬KnowMaterial, recover C2 = KnowMaterial ∨ ¬Study.]
Inverted Resolution (Propositional)

1. Given initial clauses C1 and C, find a literal L that occurs in clause C1
   but not in clause C.
2. Form the second clause C2 by including the following literals:

    C2 = (C − (C1 − {L})) ∪ {¬L}
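The construction in step 2 as a Python sketch, reusing the clause encoding and the negate helper from the resolution sketch above; note that C2 is not unique in general, so this returns one candidate per choice of L:

    def inverse_resolve(c, c1):
        """Candidate clauses C2 = (C - (C1 - {L})) | {~L}, one for each
        literal L of C1 that does not occur in C."""
        return [(c - (c1 - {lit})) | {negate(lit)}
                for lit in c1 if lit not in c]

    # From C = PassExam v ~Study and C1 = PassExam v ~KnowMaterial,
    # this recovers C2 = KnowMaterial v ~Study.
    print(inverse_resolve(frozenset({"PassExam", "~Study"}),
                          frozenset({"PassExam", "~KnowMaterial"})))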
First Order Resolution

1. Find a literal L1 from clause C1, a literal L2 from clause C2, and a
   substitution θ such that L1θ = ¬L2θ.
2. Form the resolvent C by including all literals from C1θ and C2θ, except
   for L1θ and ¬L2θ. More precisely, the set of literals occurring in the
   conclusion C is

    C = (C1 − {L1})θ ∪ (C2 − {L2})θ
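A sketch of the combination in step 2, assuming the unifying substitution θ is already known (unification itself is not shown). A literal is a (sign, predicate, args) tuple and a clause a frozenset of literals; this encoding is a hypothetical simplification.

    def apply_sub(lit, theta):
        """Apply substitution theta to a literal (sign, pred, args)."""
        sign, pred, args = lit
        return (sign, pred, tuple(theta.get(a, a) for a in args))

    def fo_resolve(c1, l1, c2, l2, theta):
        """C = (C1 - {L1})theta U (C2 - {L2})theta, assuming theta makes
        l1 and l2 complementary (theta comes from unification)."""
        return (frozenset(apply_sub(l, theta) for l in c1 - {l1}) |
                frozenset(apply_sub(l, theta) for l in c2 - {l2}))

    # Resolving GrandChild(Bob, x) v ~Father(x, Tom) with Father(Shannon, Tom)
    # under theta = {x: Shannon} yields GrandChild(Bob, Shannon).
    c1 = frozenset({("+", "GrandChild", ("Bob", "x")),
                    ("-", "Father", ("x", "Tom"))})
    c2 = frozenset({("+", "Father", ("Shannon", "Tom"))})
    print(fo_resolve(c1, ("-", "Father", ("x", "Tom")),
                     c2, ("+", "Father", ("Shannon", "Tom")),
                     {"x": "Shannon"}))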
Inverting First Order Resolution

    C2 = (C − (C1 − {L1})θ1) θ2⁻¹ ∪ {¬L1 θ1 θ2⁻¹}

where the unifying substitution is factored as θ = θ1 θ2, with θ1 containing
the variable bindings for C1 and θ2 those for C2.
Cigol

[Figure: Cigol working bottom-up by repeated inverted resolution.
From GrandChild(Bob, Shannon) and the fact Father(Shannon, Tom), with
substitution {Shannon/x}, it infers GrandChild(Bob, x) ∨ ¬Father(x, Tom);
combining that with Father(Tom, Bob), with substitution {Bob/y, Tom/z},
yields GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y).]
Progol

Progol: reduce combinatorial explosion by generating the most specific
acceptable h.

1. User specifies H by stating predicates, functions, and forms of arguments
   allowed for each.
2. Progol uses a sequential covering algorithm. For each ⟨x_i, f(x_i)⟩:
   - Find the most specific hypothesis h_i such that B ∧ h_i ∧ x_i ⊢ f(x_i)
   - (actually, considers only k-step entailment)
3. Conduct a general-to-specific search bounded by the specific hypothesis.