Rule Induction
Given: Set of positive and negative examples of some concept
Example: (x1, x2, … , xn, y)
y: concept (Boolean)
x1, x2, … , xn: attributes (assume Boolean)
Goal: Induce a set of rules that cover all positive examples and no negative ones
Rule: xa ^ xb ^ … ⇒ y (xa: literal, i.e., xi or its negation)
Same as a Horn clause: Body ⇒ Head
Rule r covers example x iff x satisfies body of r
Eval(r): Accuracy, info. gain, coverage, support, etc.
Learning a Single Rule
head ← y
body ← Ø
repeat
    for each literal x
        rx ← r with x added to body
        compute Eval(rx)
    body ← body ^ best x
until no x improves Eval(r)
return r
Learning a Set of Rules
R ← Ø
S ← examples
repeat
    learn a single rule r
    R ← R U { r }
    S ← S − positive examples covered by r
until S = Ø
return R
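The two procedures above can be sketched concretely for the Boolean-attribute setting. This is a minimal illustration, not any particular system's implementation: Eval(r) is taken to be accuracy on the covered examples, and all names (`learn_rule`, `learn_rules`, etc.) are made up for this sketch.

```python
def covers(body, x):
    """A rule covers example x iff x satisfies every literal in its body."""
    return all(x[attr] == val for attr, val in body)

def eval_rule(body, examples):
    """Eval(r): accuracy of the rule on the examples it covers."""
    labels = [y for x, y in examples if covers(body, x)]
    return sum(labels) / len(labels) if labels else 0.0

def learn_rule(examples, attributes):
    """Greedily add the single best literal until none improves Eval(r)."""
    body = []
    while True:
        best, best_score = None, eval_rule(body, examples)
        for attr in attributes:
            if any(a == attr for a, _ in body):
                continue                      # each attribute used at most once
            for val in (True, False):         # literal: xi or its negation
                cand = body + [(attr, val)]
                score = eval_rule(cand, examples)
                if score > best_score:
                    best, best_score = cand, score
        if best is None:
            return body
        body = best

def learn_rules(examples, attributes):
    """Sequential covering: learn rules until every positive is covered."""
    R, S = [], list(examples)
    while any(y for _, y in S):
        r = learn_rule(S, attributes)
        remaining = [(x, y) for x, y in S if not (y and covers(r, x))]
        if len(remaining) == len(S):
            break                             # safeguard: no positive covered
        R.append(r)
        S = remaining
    return R
```

Note that, as in the pseudocode, each round removes only the positive examples covered by the new rule; negatives are kept so later rules are still penalized for covering them.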
First-Order Rule Induction
y and xi are now predicates with arguments
E.g.: y is Ancestor(x,y), xi is Parent(x,y)
Literals to add are predicates or their negations
Literal to add must include at least one variable already appearing in rule
Adding a literal changes # groundings of rule
E.g.: Ancestor(x,z) ^ Parent(z,y) ⇒ Ancestor(x,y)
Eval(r) must take this into account
E.g.: Multiply by # positive groundings of rule still covered after adding literal
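One common concrete choice of this kind is the weighted information gain used by Quinlan's FOIL, which multiplies the per-grounding gain by the number t of positive groundings of the rule still covered after adding literal L:

    Gain(L) = t × ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )

where p0, n0 are the numbers of positive and negative groundings of the rule before adding L, and p1, n1 after.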
“Inductive” Logic Programming
More “Interesting” ILP
Induction by inverting deduction
First investigated in depth mathematically by the 19th-century political economist and philosopher of science William Stanley Jevons
From Jevons' book on inductive inference:
Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; the integral calculus, for instance, is infinitely more difficult than the differential calculus of which it is the inverse. Similarly, it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction; ...
Hypothesis formation and justification
Abduction. Process of hypothesis formation.
Justification. The degree of belief assigned to a hypothesis given a certain amount of evidence.
Specific logical setting for abduction
Hypothesis Formation
Hypothesis Formation (contd)
Recall that if more than one H satisfies this, the one with highest posterior probability is chosen
Occam's razor can be taken as an instance of a prior distribution that assigns higher prior probability to simpler hypotheses.
The Di can be found by examining clauses that "relatively subsume" at least one example
Subsumption and Partial Ordering
Partial Ordering Example
Partial Ordering and Lattice
A lattice is a partially ordered set (poset) in which every pair of elements has a unique supremum (the elements' least upper bound) and a unique infimum (greatest lower bound)
Example of Lattice
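A standard concrete example (illustrative, not specific to clauses) is the powerset of a set ordered by inclusion: the supremum of two sets is their union and the infimum is their intersection.

```python
# Subsets of {1, 2, 3} under the subset ordering (⊆) form a lattice:
# sup(a, b) = a ∪ b  (least upper bound)
# inf(a, b) = a ∩ b  (greatest lower bound)
a, b = {1, 2}, {2, 3}
sup = a | b   # least upper bound of a and b
inf = a & b   # greatest lower bound of a and b
```

The subsumption lattice over clauses used later in these slides has the same structure, with θ-subsumption as the ordering.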
Aleph & Progol: Bottom [Most Specific] Clause
Bottom clause = Negation of Conjunction of Ground Literals derived from B and e
Abduction
Example of Bottom clause
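To make the idea concrete, here is a toy, much-simplified sketch: the facts in B, the example e, and all function names are illustrative, and real systems (Aleph, Progol) restrict the body via mode declarations and depth bounds rather than including every background fact. The bottom clause takes the example as head and the ground background literals as body; replacing each distinct constant with a fresh variable then yields a general clause.

```python
# Illustrative background knowledge B and positive example e.
B = [('parent', 'ann', 'bob'), ('parent', 'bob', 'carol')]
e = ('grandparent', 'ann', 'carol')

def bottom_clause(e, B):
    """Toy most-specific clause: example as head, all ground facts as body."""
    return e, list(B)

def variabilize(clause):
    """Replace each distinct constant by a fresh variable (X, Y, Z, ...)."""
    head, body = clause
    mapping, names = {}, iter('XYZUVW')
    def var(c):
        if c not in mapping:
            mapping[c] = next(names)
        return mapping[c]
    lift = lambda lit: (lit[0],) + tuple(var(a) for a in lit[1:])
    return lift(head), [lift(l) for l in body]

head, body = variabilize(bottom_clause(e, B))
# Recovers the familiar rule: grandparent(X,Y) :- parent(X,Z), parent(Z,Y).
```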
How to get to a general theory from here?
From Bottom clause to Theory
A sufficient Implementation
(Given B,E)
May be infinite
May be redundant
Solving Redundancy
Depth-bounded mode language
Use constrained subset of definite clauses to construct finite most-specific clauses
Definite Mode Language
Depth of Variables
Depth bounded definite mode language
Examples of Depth bounded definite mode language
Revised Greedy Implementation
Search and Redundancy
Entailment vs Subsumption
If A subsumes B then A implies (or entails) B. But the reverse is not true.
The difference arises in sentences involving self-implication.
For example, consider the clause:
A: nat(s(X)):- nat(X).
It follows logically that:
B: nat(s(s(X))) :- nat(X).
That is A |= B.
However, A does not subsume B.
Subsumption would require a substitution for X in A that yields B; the obvious choice X/s(X) will not do, since applying it to A also changes the body literal nat(X) to nat(s(X)).
Subsumption: Revisited
Recall that a clause like:
C: grandparent(X,Z):- parent(X,Y), parent(Y,Z) is really:
C: {grandparent(X,Z), ~parent(X,Y), ~parent(Y,Z)}
Given two clauses C1 and C2, C1 is said to subsume C2 iff
there exists a substitution θ s.t. C1θ ⊆ C2
Thus for the following pair of clauses:
C1: mem(A,[B|C]):- mem(A,C).
C2: mem(0,[1,0]):- nat(0), nat(1), mem(0,[0]).
It is the case that C1 subsumes C2. Note that of the two clauses C1 "looks" more general.
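A brute-force θ-subsumption check can be sketched in a few lines of Python (a minimal sketch: atoms are tuples, variables are uppercase strings, lists like [B|C] are encoded as cons terms, and all names are illustrative). It confirms both examples above: C1 subsumes C2, while the nat clause A entails but does not subsume B.

```python
def match(pat, term, theta):
    """Extend theta so that pat instantiated by theta equals term, else None."""
    if isinstance(pat, str) and pat[:1].isupper():   # variable
        if pat in theta:
            return theta if theta[pat] == term else None
        theta = dict(theta)
        theta[pat] = term
        return theta
    if isinstance(pat, tuple):                        # compound term
        if not (isinstance(term, tuple) and len(pat) == len(term)
                and pat[0] == term[0]):
            return None
        for p, t in zip(pat[1:], term[1:]):
            theta = match(p, t, theta)
            if theta is None:
                return None
        return theta
    return theta if pat == term else None             # constant

def subsumes(c1, c2, theta=None):
    """True iff some substitution theta maps every literal of c1 into c2.

    Clauses are lists of (sign, atom) literals; atoms are tuples
    (functor, arg1, ...); variables are uppercase strings.
    """
    if theta is None:
        theta = {}
    if not c1:
        return True
    (sign, atom), rest = c1[0], c1[1:]
    for sign2, atom2 in c2:
        if sign != sign2:
            continue
        t = match(atom, atom2, theta)
        if t is not None and subsumes(rest, c2, t):
            return True
    return False

# C1: mem(A,[B|C]) :- mem(A,C)        ([B|C] encoded as cons(B,C))
C1 = [(True,  ('mem', 'A', ('cons', 'B', 'C'))),
      (False, ('mem', 'A', 'C'))]
# C2: mem(0,[1,0]) :- nat(0), nat(1), mem(0,[0])
C2 = [(True,  ('mem', '0', ('cons', '1', ('cons', '0', 'nil')))),
      (False, ('nat', '0')), (False, ('nat', '1')),
      (False, ('mem', '0', ('cons', '0', 'nil')))]
# A: nat(s(X)) :- nat(X)   and   B: nat(s(s(X))) :- nat(X)
A = [(True, ('nat', ('s', 'X'))), (False, ('nat', 'X'))]
B = [(True, ('nat', ('s', ('s', 'X')))), (False, ('nat', 'X'))]
```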
Subsumption Lattice examples
Completeness and Consistency of a Hypothesis
Pruning search in Lattice
Refinement through Lattice
Search Methods
Different Kinds of Guided Search
Optimal Search using Branch-and-Bound
Choice of node to branch on in the active set is based on comparisons of a dual (primary and secondary) search key associated with each node. E.g.:
• Search = breadth first: primary key = number of literals in a clause
• Eval function = coverage: secondary key = Pos − Neg
Refinement steps
Further reducing search space
Use a smaller clauselength or nodes setting. Try relaxing minacc or noise to allow clauses with lower accuracy.
Set minpos to some value larger than the default. Set a different value for best.
Write constraints and prune statements.
Use a refinement operator that enumerates a smaller space.
Restrict the search space by setting the beam width (using parameter openlist); using an iterative beam-width search (setting search to ibs); using randomised local search (setting search to rls) with appropriate settings for its associated parameters; or using Camacho's language search (using parameter language, or setting search to ils).
Use a time-bounded search by setting searchtime to some small value.
Redundancy 1: Literal Redundancy
Redundancy 2: Clause Redundancy
Implementation Issues
Example: Trainspotting
Trainspotting: Modes
Trainspotting: Examples and Background
Trainspotting: Search
ILP Techniques
Generalization techniques
Relative least general generalization
Inverse resolution
A unifying framework for generalization
Specialization techniques
Top-down search of refinement graphs
Other issues in ILP
Single vs multiple concept learner
Batch learners vs incremental learners
Interactive vs non-interactive
Typically two ends of a spectrum
Batch non-interactive systems that learn single predicates from scratch
Incremental, interactive theory revisers that learn multiple predicates
Empirical ILP
Interactive ILP
Other issues in ILP (contd)
Handling imperfect data
Noise/Random errors in training and background data
Sparse training examples
Insufficient description language
Missing values in training examples
Acknowledgement
Some Slides borrowed from
Ashwin Srinivasan
Pedro Domingos' tutorial on Statistical Relational Learning at ICML'07