Rule Induction
Given: Set of positive and negative examples of some concept
Example: (x1, x2, … , xn, y)
y: concept (Boolean)
x1, x2, … , xn: attributes (assume Boolean)
Goal: Induce a set of rules that cover all positive examples and no negative ones
Rule: xa ^ xb ^ … ⇒ y (xa: literal, i.e., xi or its negation)
Same as a Horn clause: Body ⇒ Head
Rule r covers example x iff x satisfies body of r
Eval(r): Accuracy, info. gain, coverage, support, etc.
Learning a Single Rule
head ← y
body ← Ø
repeat
    for each literal x
        rx ← r with x added to body
        compute Eval(rx)
    body ← body ^ best x
until no x improves Eval(r)
return r
Learning a Set of Rules
R ← Ø
S ← examples
repeat
    learn a single rule r
    R ← R U { r }
    S ← S − positive examples covered by r
until S = Ø
return R
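The two procedures above can be sketched concretely for the Boolean-attribute setting. This is a minimal illustration, not any particular system's implementation: Eval(r) is taken to be accuracy on the covered examples, and all names (`learn_rule`, `learn_rules`, etc.) are made up for this sketch.

```python
def covers(body, x):
    """A rule covers example x iff x satisfies every literal in its body."""
    return all(x[attr] == val for attr, val in body)

def eval_rule(body, examples):
    """Eval(r): accuracy of the rule on the examples it covers."""
    labels = [y for x, y in examples if covers(body, x)]
    return sum(labels) / len(labels) if labels else 0.0

def learn_rule(examples, attributes):
    """Greedily add the single best literal until none improves Eval(r)."""
    body = []
    while True:
        best, best_score = None, eval_rule(body, examples)
        for attr in attributes:
            if any(a == attr for a, _ in body):
                continue                      # each attribute used at most once
            for val in (True, False):         # literal: xi or its negation
                cand = body + [(attr, val)]
                score = eval_rule(cand, examples)
                if score > best_score:
                    best, best_score = cand, score
        if best is None:
            return body
        body = best

def learn_rules(examples, attributes):
    """Sequential covering: learn rules until every positive is covered."""
    R, S = [], list(examples)
    while any(y for _, y in S):
        r = learn_rule(S, attributes)
        remaining = [(x, y) for x, y in S if not (y and covers(r, x))]
        if len(remaining) == len(S):
            break                             # safeguard: no positive covered
        R.append(r)
        S = remaining
    return R
```

Note that, as in the pseudocode, each round removes only the positive examples covered by the new rule; negatives are kept so later rules are still penalized for covering them.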
First-Order Rule Induction
y and xi are now predicates with arguments
E.g.: y is Ancestor(x,y), xi is Parent(x,y)
Literals to add are predicates or their negations
Literal to add must include at least one variable already appearing in rule
Adding a literal changes # groundings of rule
E.g.: Ancestor(x,z) ^ Parent(z,y) ⇒ Ancestor(x,y)
Eval(r) must take this into account
E.g.: Multiply by # positive groundings of rule still covered after adding literal
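One common concrete choice of this kind is the weighted information gain used by Quinlan's FOIL, which multiplies the per-grounding gain by the number t of positive groundings of the rule still covered after adding literal L:

    Gain(L) = t × ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )

where p0, n0 are the numbers of positive and negative groundings of the rule before adding L, and p1, n1 after.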
“Inductive” Logic Programming
More “Interesting” ILP
Induction by inverting deduction
First investigated in depth mathematically by the 19th-century political economist and philosopher of science William Stanley Jevons
From Jevons' book on inductive inference:
Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; the integral calculus, for instance, is infinitely more difficult than the differential calculus of which it is the inverse. Similarly, it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction; ...
Hypothesis formation and justification
Abduction. Process of hypothesis formation.
Justification. The degree of belief assigned to a hypothesis given a certain amount of evidence.
Specific logical setting for abduction
Hypothesis Formation
Hypothesis Formation (contd)
Recall that if more than one H satisfies this, the one with highest posterior probability is chosen
Occam's razor can be taken as an instance of a prior distribution that assigns higher prior probability to simpler hypotheses.
The Di can be found by examining clauses that "relatively subsume" at least one example
Subsumption and Partial Ordering
Partial Ordering Example
Partial Ordering and Lattice
A lattice is a partially ordered set (poset) in which every pair of elements has a unique supremum (the elements' least upper bound) and a unique infimum (greatest lower bound)
Example of Lattice
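A standard concrete example (illustrative, not specific to clauses) is the powerset of a set ordered by inclusion: the supremum of two sets is their union and the infimum is their intersection.

```python
# Subsets of {1, 2, 3} under the subset ordering (⊆) form a lattice:
# sup(a, b) = a ∪ b  (least upper bound)
# inf(a, b) = a ∩ b  (greatest lower bound)
a, b = {1, 2}, {2, 3}
sup = a | b   # least upper bound of a and b
inf = a & b   # greatest lower bound of a and b
```

The subsumption lattice over clauses used later in these slides has the same structure, with θ-subsumption as the ordering.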
Aleph & Progol: Bottom [Most Specific] Clause
Bottom clause = Negation of Conjunction of Ground Literals derived from B and e
Abduction
Example of Bottom clause
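To make the idea concrete, here is a toy, much-simplified sketch: the facts in B, the example e, and all function names are illustrative, and real systems (Aleph, Progol) restrict the body via mode declarations and depth bounds rather than including every background fact. The bottom clause takes the example as head and the ground background literals as body; replacing each distinct constant with a fresh variable then yields a general clause.

```python
# Illustrative background knowledge B and positive example e.
B = [('parent', 'ann', 'bob'), ('parent', 'bob', 'carol')]
e = ('grandparent', 'ann', 'carol')

def bottom_clause(e, B):
    """Toy most-specific clause: example as head, all ground facts as body."""
    return e, list(B)

def variabilize(clause):
    """Replace each distinct constant by a fresh variable (X, Y, Z, ...)."""
    head, body = clause
    mapping, names = {}, iter('XYZUVW')
    def var(c):
        if c not in mapping:
            mapping[c] = next(names)
        return mapping[c]
    lift = lambda lit: (lit[0],) + tuple(var(a) for a in lit[1:])
    return lift(head), [lift(l) for l in body]

head, body = variabilize(bottom_clause(e, B))
# Recovers the familiar rule: grandparent(X,Y) :- parent(X,Z), parent(Z,Y).
```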
How to get to a general theory from here?
From Bottom clause to Theory
A sufficient Implementation
(Given B,E)
May be infinite
May be redundant
Solving Redundancy
Depth-bounded mode language
Use constrained subset of definite clauses to construct finite most-specific clauses
Definite Mode Language
Depth of Variables
Depth bounded definite mode language
Examples of Depth bounded definite mode language
Revised Greedy Implementation
Search and Redundancy
Entailment vs Subsumption
If A subsumes B then A implies (or entails) B. But the reverse is not true.
The difference arises in sentences involving self-implication.
For example, consider the clause:
A: nat(s(X)):- nat(X).
It follows logically that:
B: nat(s(s(X))) :- nat(X).
That is A |= B.
However, A does not subsume B.
Subsumption would require a substitution for X in A that yields B; the obvious choice X/s(X) will not do, since applying it to A also changes the body literal nat(X) to nat(s(X)).
Subsumption: Revisited
Recall that a clause like:
C: grandparent(X,Z):- parent(X,Y), parent(Y,Z) is really:
C: {grandparent(X,Z), ~parent(X,Y), ~parent(Y,Z)}
Given two clauses C1 and C2, C1 is said to subsume C2 iff
there exists a substitution θ s.t. C1θ ⊆ C2
Thus for the following pair of clauses:
C1: mem(A,[B|C]):- mem(A,C).
C2: mem(0,[1,0]):- nat(0), nat(1), mem(0,[0]).
It is the case that C1 subsumes C2. Note that of the two clauses C1 "looks" more general.
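A brute-force θ-subsumption check can be sketched in a few lines of Python (a minimal sketch: atoms are tuples, variables are uppercase strings, lists like [B|C] are encoded as cons terms, and all names are illustrative). It confirms both examples above: C1 subsumes C2, while the nat clause A entails but does not subsume B.

```python
def match(pat, term, theta):
    """Extend theta so that pat instantiated by theta equals term, else None."""
    if isinstance(pat, str) and pat[:1].isupper():   # variable
        if pat in theta:
            return theta if theta[pat] == term else None
        theta = dict(theta)
        theta[pat] = term
        return theta
    if isinstance(pat, tuple):                        # compound term
        if not (isinstance(term, tuple) and len(pat) == len(term)
                and pat[0] == term[0]):
            return None
        for p, t in zip(pat[1:], term[1:]):
            theta = match(p, t, theta)
            if theta is None:
                return None
        return theta
    return theta if pat == term else None             # constant

def subsumes(c1, c2, theta=None):
    """True iff some substitution theta maps every literal of c1 into c2.

    Clauses are lists of (sign, atom) literals; atoms are tuples
    (functor, arg1, ...); variables are uppercase strings.
    """
    if theta is None:
        theta = {}
    if not c1:
        return True
    (sign, atom), rest = c1[0], c1[1:]
    for sign2, atom2 in c2:
        if sign != sign2:
            continue
        t = match(atom, atom2, theta)
        if t is not None and subsumes(rest, c2, t):
            return True
    return False

# C1: mem(A,[B|C]) :- mem(A,C)        ([B|C] encoded as cons(B,C))
C1 = [(True,  ('mem', 'A', ('cons', 'B', 'C'))),
      (False, ('mem', 'A', 'C'))]
# C2: mem(0,[1,0]) :- nat(0), nat(1), mem(0,[0])
C2 = [(True,  ('mem', '0', ('cons', '1', ('cons', '0', 'nil')))),
      (False, ('nat', '0')), (False, ('nat', '1')),
      (False, ('mem', '0', ('cons', '0', 'nil')))]
# A: nat(s(X)) :- nat(X)   and   B: nat(s(s(X))) :- nat(X)
A = [(True, ('nat', ('s', 'X'))), (False, ('nat', 'X'))]
B = [(True, ('nat', ('s', ('s', 'X')))), (False, ('nat', 'X'))]
```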
Subsumption Lattice examples
Completeness and Consistency of a Hypothesis
Pruning search in Lattice
Refinement through Lattice
Search Methods
Different Kinds of Guided Search
Optimal Search using Branch-and-Bound
Choice of node to branch on in the active set is based on comparisons of a dual (primary and secondary) search key associated with each node. E.g.:
• Search = breadth first: primary key = number of literals in a clause
• Eval function = coverage: secondary key = Pos − Neg
Refinement steps
Further reducing search space
Use a smaller clauselength or nodes setting. Try relaxing minacc or noise to allow clauses with lower accuracy.
Set minpos to some value larger than the default. Set a different value for best.
Write constraints and prune statements.
Use a refinement operator that enumerates a smaller space.
Restrict the search space by setting the beam width (using parameter openlist); using an iterative beam-width search (setting search to ibs); using randomised local search (setting search to rls) with appropriate settings for its associated parameters; or using Camacho's language search (using parameter language, or setting search to ils).
Use a time-bounded search by setting searchtime to some small value.
Redundancy 1: Literal Redundancy
Redundancy 2: Clause Redundancy
Implementation Issues
Example: Trainspotting
Trainspotting: Modes
Trainspotting: Examples and Background
Trainspotting: Search
ILP Techniques
Generalization techniques
Relative least general generalization
Inverse resolution
A unifying framework for generalization
Specialization techniques
Top-down search of refinement graphs
Other issues in ILP
Single vs multiple concept learner
Batch learners vs incremental learners
Interactive vs non-interactive
Typically two ends of a spectrum
Batch non-interactive systems that learn single predicates from scratch
Incremental, interactive theory revisers that learn multiple predicates
Empirical ILP
Interactive ILP
Other issues in ILP (contd)
Handling imperfect data
Noise/Random errors in training and background data
Sparse training examples
Insufficient description language
Missing values in training examples
Acknowledgement
Some Slides borrowed from
Ashwin Srinivasan
Pedro Domingos' tutorial on Statistical Relational Learning at ICML'07