
Inductive Logic Programming



Presentation Transcript


  1. Inductive Logic Programming and its use in Data Mining Filip Zelezny, Center of Applied Cybernetics, Faculty of Electrotechnics, Czech Technical University in Prague

  2. Structure of Talk • Intro: ML & Datamining • ILP: Motivation, Concept • Basic Technique • Some Applications • Novel Approaches • Conclusions

  3. Introduction • Machine Learning (ML) • a subfield of artificial intelligence that studies artificial systems which improve their behavior on the basis of experience, described formally by data. This is often achieved by reasoning analogically, or by building a model of the given domain on the basis of the data. • E.g. pattern recognition by a trained neural network • Data Mining (DM) • is concerned with discovering understandably formulated knowledge that is valid but previously unknown in given data. This is often achieved by employing ML methods that produce human-understandable models with predictive (e.g. predict an object attribute knowing the other attributes) or descriptive (e.g. find a frequently repeating pattern in data) capabilities. • E.g. 'shopping bag rule': sausage → mustard

  4. ILP: Points of View • Software Engineering View • ILP synthesizes logic programs from examples • ... but the programs may be used for data classification • Machine Learning View • ILP develops theories about data using predicate logic • ... but the theories are as expressive as algorithms (Turing machine)

  5. A Motivation

  6. Data Mining Example 1 • Table of cars (attribute-value table; shown as an image in the original slide) • Predict the attribute 'affordable'! • Attribute learning is appropriate. • Rule discovered: size=small & luxury=low → affordable
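In Prolog terms the discovered rule is a single clause over a single relation. A minimal sketch (the car/3 facts below are invented for illustration, since the original table is not in the transcript):

    % car(Name, Size, Luxury) - hypothetical rows of the table of cars
    car(c1, small, low).
    car(c2, large, high).
    car(c3, small, high).

    % the discovered rule: size=small & luxury=low -> affordable
    affordable(Car) :- car(Car, small, low).

    % ?- affordable(C).
    % C = c1.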

  7. Data Mining Example 2 (1) [L. De Raedt, 2000] • Positive Examples / Negative Examples (Bongard-style pictures of geometric objects; shown as images in the original slide)

  8. Data Mining Example 2 (2) [L. De Raedt, 2000] • How to represent this in an attribute-value language (AVL)? • Assume a fixed number of objects • Problem 1: exchanging objects 1 & 2 → an exponential number of different representations for the same entity

  9. Data Mining Example 2 (3) [L. De Raedt, 2000] • Problem 2: positional relations → explosion of false attributes • Problem 3: a variable number of objects → explosion of empty fields, explosion of the entire table ⇒ We need a structural representation! (a minimal sketch follows)
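With a structural (relational) representation, each scene becomes a small set of ground facts: renaming objects or adding another object just changes facts, never the table schema. A minimal sketch, using the object vocabulary of the later slides and an assumed example identifier e1:

    % one positive example, represented structurally
    triangle(e1, t, up).      % example e1 contains triangle t, pointing up
    circle(e1, c1).
    inside(e1, c1, t).        % circle c1 lies inside triangle t
    circle(e1, c2).
    right_of(e1, c2, t).      % circle c2 lies to the right of t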

  10. Data Mining Example 2 (4) • Could be done with more relations (tables) • BUT! Standard ML / data mining algorithms can work with a single relation only • Neural nets, AQ (rules), C4.5 (decision trees), ... ⇒ We need multi-relational learning algorithms!

  11. The language of Prolog

  12. The Language of Prolog - Informal Introduction (1) • Ground facts (predicate with constants): add(1,1,2). • Variables: add(X,0,X). • Functions: e.g. s(X) - the successor of X • Rules (implications): add(s(X),Y,s(Z)) ← add(X,Y,Z). add(0,X,X).
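In executable Prolog syntax, '←' is written ':-' and '&' is ','. The addition program from this slide, over successor-encoded naturals, then runs as-is:

    % natural numbers encoded by s/1: 0, s(0), s(s(0)), ...
    add(0, X, X).                          % 0 + X = X
    add(s(X), Y, s(Z)) :- add(X, Y, Z).    % (X+1) + Y = (Z+1) if X + Y = Z

    % ?- add(s(0), s(0), R).
    % R = s(s(0)).                         % i.e. 1 + 1 = 2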

  13. The Language of Prolog - Informal Introduction (2) • Invertibility: minus(A,B,C) ← add(B,C,A). • Functions can be avoided (flattening): suc(X,Y) ← X is Y-1. (built-in arithmetic) add(0,X,X). add(X,Y,Z) ← suc(A,X) & suc(B,Z) & add(A,Y,B).
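Invertibility in action: because Prolog computes by unification, the same s/1-encoded add/3 program answers subtraction queries when a different argument is left unbound, so minus/3 needs no new code. (The flattened version is less flexible here: is/2 requires its right-hand side to be instantiated, so suc/2 only runs in modes where its second argument is bound.)

    % which X satisfies X + 1 = 3?
    % ?- add(X, s(0), s(s(s(0)))).
    % X = s(s(0)).

    minus(A, B, C) :- add(B, C, A).    % A - B = C because B + C = A

    % ?- minus(s(s(s(0))), s(0), C).
    % C = s(s(0)).                     % i.e. 3 - 1 = 2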

  14. The ILP Concept

  15. Deduction (in Logic Programming) • A priori (background) knowledge about integers: suc(X,Y) ← X is Y-1. • Theory (hypothesis) about addition: add(0,X,X). add(X,Y,Z) ← suc(A,X) & suc(B,Z) & add(A,Y,B). • Positive examples of addition (derived): add(1,1,2), add(3,5,8), add(4,1,5), ... • Negative examples of addition (not derivable): add(1,3,5), add(8,7,6), add(1,1,1), ...
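Deduction here is simply running the program: the examples follow from background knowledge plus theory. A runnable sketch over built-in integers; the X > 0 guard is an addition so that queries for non-derivable facts fail finitely rather than recurse forever:

    suc(X, Y) :- X is Y - 1.              % background knowledge
    add(0, X, X).                         % theory about addition
    add(X, Y, Z) :-
        X > 0,                            % guard added for termination
        suc(A, X), suc(B, Z), add(A, Y, B).

    % ?- add(3, 5, 8).    % positive example: derivable, so true
    % ?- add(1, 3, 5).    % negative example: not derivable, so false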

  16. Induction (in Inductive Logic Programming) • A priori (background) knowledge about integers: suc(X,Y) ← X is Y-1. • Positive examples of addition: add(1,1,2), add(3,5,8), add(4,1,5), ... • Negative examples of addition: add(1,3,5), add(8,7,6), add(1,1,1), ... • Induced theory (hypothesis) about addition: add(0,X,X). add(X,Y,Z) ← suc(A,X) & suc(B,Z) & add(A,Y,B).
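The induction task itself is easy to state in Prolog: given background knowledge and labelled ground facts, find clauses that cover all positives and no negatives. A minimal sketch of the learner's input (the pos/1 and neg/1 wrappers are a common convention assumed here, not fixed by the slide):

    suc(X, Y) :- X is Y - 1.                             % background knowledge B

    pos(add(1,1,2)).  pos(add(3,5,8)).  pos(add(4,1,5)). % E+
    neg(add(1,3,5)).  neg(add(8,7,6)).  neg(add(1,1,1)). % E-

    % goal: find a hypothesis H such that B and H together entail
    % every positive example and no negative example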

  17. Basic ILP Technique (1) • Search through a clause implication lattice • From general to specific (top-down) • From specific to general (bottom-up) • E.g. part of the lattice for add/3, from the most general clause down to the target (a sketch of one refinement step follows below):

    add(X,Y,Z)
    add(X,Y,Z) ← suc(A,X)        add(X,Y,Z) ← suc(B,Z)
    add(X,Y,Z) ← suc(A,X), suc(B,X)        ... etc.
    add(X,Y,Z) ← suc(A,X) & suc(B,Z) & add(A,Y,B)
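A minimal sketch of one top-down step: a refinement operator that specializes a clause by appending a literal drawn from a fixed candidate set. The Head-BodyList clause representation and the candidate table are simplifying assumptions:

    % candidate body literals, sharing variables with the head
    candidate(add(X, _, _), suc(_, X)).        % ... <- suc(A,X)
    candidate(add(_, _, Z), suc(_, Z)).        % ... <- suc(B,Z)
    candidate(add(_, Y, _), add(_, Y, _)).     % ... <- add(A,Y,B)

    % refine(+Clause, -Specialization): add one candidate literal
    refine(Head-Body, Head-[Lit|Body]) :-
        candidate(Head, Lit).

    % ?- refine(add(X,Y,Z)-[], C).
    % C = add(X,Y,Z)-[suc(_A,X)] ;
    % C = add(X,Y,Z)-[suc(_B,Z)] ; ...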

  18. Basic ILP Technique (2) • Clauses are usually constructed one by one • e.g. specialize until the clause covers no negatives, then begin a new clause for the remaining positives • Implication is undecidable • instead, use syntactic θ-subsumption (NP-hard) • measure the generality of a clause with respect to the background knowledge • Efficiency: use a strong bias! • syntactical: indicate input/output variables; bound the maximum clause length • semantical: e.g. preference heuristics
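A minimal sketch of the syntactic test, with clauses represented as lists of literals: C1 θ-subsumes C2 if some substitution maps every literal of C1 onto a literal of C2. The naive search below tries all literal matchings on backtracking, which is exactly where the NP-hardness bites:

    % subsumes_clause(+C1, +C2): does C1 theta-subsume C2?
    subsumes_clause(C1, C2) :-
        copy_term(C1-C2, C1c-C2c),
        numbervars(C2c, 0, _),       % freeze C2's variables as constants
        subset_unify(C1c, C2c).

    subset_unify([], _).
    subset_unify([L|Ls], C2) :-
        member(L, C2),               % bind variables of L by unification
        subset_unify(Ls, C2).

    % ?- subsumes_clause([p(X), q(X,Y)], [p(a), q(a,b), r(b)]).
    % true.                          % substitution {X/a, Y/b}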

  19. Applications

  20. Protein Structure Prediction (1) [Muggleton, 1992] • Predict the secondary structure of a protein • examples: alpha(Protein, Position) - the residue at Position in Protein is in an alpha helix • negatives: all other residues • background knowledge: • position(Protein, Pos, Residue) • chemical properties of residues • basic arithmetic • etc.
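A hypothetical fragment of such input, just to make the representation concrete (the protein identifier, positions, and residues are invented; the predicate names follow the slide):

    alpha(prot1, 104).              % example: residue at position 104 is in an alpha helix
    position(prot1, 104, ala).      % background: the residue there is alanine
    position(prot1, 105, leu).
    hydrophobic(ala).               % chemical properties of residues
    hydrophobic(leu).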

  21. Protein Structure Prediction (2) [Muggleton, 1992] • Result of the 1st search: alpha0(A,B) ← ... position(A,D,O) & not_aromatic(O) & small_or_polar(O) & position(A,B,C) & very_hydrophobic(C) & not_aromatic(C) ... etc. (22 literals) • added to the background knowledge, then a 2nd search: alpha1(A,B) ← oct(D,E,F,G,B,H,I,J,K) & alpha0(A,F) & alpha0(A,G). • again added to B for the 3rd search: alpha2(A,B) ← oct(C,D,E,F,B,G,H,I,J) & alpha1(A,B) & alpha1(A,G) & alpha1(A,H).

  22. Protein Structure Prediction (3) [Muggleton, 1992] • Final accuracy on the test set: 81% • Best previous result (neural net): 76% • The general-purpose bottom-up ILP system Golem was used. • The experiment was published in the journal Protein Engineering.

  23. Mutagenicity Prediction [Srinivasan, 1995] • Predict mutagenicity (carcinogenicity) of chemical compounds with the general-purpose system Progol [Muggleton] • Examples: compounds labeled Active / Inactive (structures shown as images in the original slide) • Result: a structural alert

  24. Data Mining in Telephony [Zelezny, Stepankova, Zidek 2000] • Discover frequent patterns of operations in an enterprise telephone exchange • Examples: history of calls + related attributes • Predicates day, prefix, etc. are in the background knowledge • Result: e.g. the rule (lower case ≈ constant): redirection(A,B,C,10) ← day(tuesday,A) & prefix(C,[5,0],2). which covers: redirection([15], [13,14,48], [5,0,0,0,0,0,0,0], 10). redirection([15], [14,18,58], [5,0,9,6,0,1,8,9], 10). redirection([22], [18,50,30], [5,0,0,0,0,0,0,0], 10). redirection([29], [13,35,56], [5,0,0,0,0,0,0,0], 10). redirection([29], [13,57,36], [5,0,0,0,0,0,0,0], 10).

  25. Other Applications • Finite element mesh design • Control of dynamical systems • qualitative simulation • Software Engineering • Many more, especially in data mining

  26. Novel Approaches

  27. Descriptive ILP • Examples are interpretations (models) • e.g. triangle(t,up) & circle(c1) & inside(c1,t) & circle(c2) & right_of(c2,t) & class(positive) is one example • A hypothesis must be true in all examples, e.g. class(positive) ← triangle(X,Y) & circle(Z) & inside(Z,X). • Suited for data mining • finds ALL true hypotheses - maximum characterisation
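A minimal sketch of the "true in every example" test, storing each interpretation as a list of ground facts (the example/1 and holds/2 encoding is an illustrative assumption):

    % the example interpretation from the slide
    example([triangle(t,up), circle(c1), inside(c1,t),
             circle(c2), right_of(c2,t), class(positive)]).

    % Head-Body holds in interpretation I if every grounding of Body
    % that is true in I also makes Head true in I
    holds(I, Head-Body) :-
        \+ ( solve(Body, I), \+ member(Head, I) ).

    solve((A, B), I) :- !, solve(A, I), solve(B, I).
    solve(A, I) :- member(A, I).

    % ?- example(I),
    %    holds(I, class(positive)-(triangle(X,_), circle(Z), inside(Z,X))).
    % true.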

  28. Descriptive ILP – Application [Zelezny, Stepankova, Zidek / ILP 2000] • Call logging (mixed events) • Examples of single events (sets of actions and their logs) • Such as: t(time(19,43,48),[1,2],time(19,43,48),e,li,empty,d,empty,empty,ex,[0,6,0,2,3,3,0,5,3,3],empty,anstr([0,0,5,0,0,0]),fe,fe,id(4)). t(time(19,43,48),[1,2],time(19,43,50),e,lb,e(relcause),d,dr,06,ex,[0,6,0,0,0,0,0,0,0,0],empty,anstr([0,0,5,0,0,0]),fe,fe,id(5)). ex_ans([0,6,0,2,3,3,0,5,3,3],[1,2]). hangsup([0,6,0,2,3,3,0,5,3,3]).

  29. Descriptive ILP – Application [Zelezny, Stepankova, Zidek / ILP 2000] • Results • Rules that describe actions in terms of logging records • Such as ex_ans(RNCA1,DN1):- t(D1,IT1,DN1,ET1,e,li,empty,d,EF1,FI1,ex,RNCA1,empty,ANTR1,CO1,DE1,ID1), IT2=ET1, ANTR2=ANTR1, t(D2,IT2,DN2,ET2,e,lb,RC2,d,EF2,FI2,ex,RNCA2,empty,ANTR2,CO2,DE2,ID2), samenum(RNCA1,RNCA2).

  30. Upgrades of Propositional Learners: 1st-order Decision Trees • Upgrades the C4.5 algorithm • E.g. Tilde [Blockeel, De Raedt] • Example tree (drawn as a diagram in the original slide; internal nodes are Prolog queries, leaves are classes): node tests ?- circle(C1), ?- triangle(T,up) & inside(C1,T), ?- circle(C2) & inside(C1,C2); leaves class(positive) (three of them) and class(negative). A sketch of such a tree compiled to Prolog follows.
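Such a tree compiles into an ordered list of Prolog rules in which the first matching rule wins. A minimal sketch under one assumed yes/no branch layout (the tree diagram itself does not survive in the transcript, so the branch assignment below is a guess for illustration):

    % classify(-Class): the current example's facts are in the database;
    % the cuts make rule order encode the tree structure
    classify(positive) :- circle(C1), triangle(T, up), inside(C1, T), !.
    classify(positive) :- circle(C1), circle(C2), inside(C1, C2), !.
    classify(negative) :- circle(_), !.
    classify(positive).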

  31. More Upgrades of Propositional Learners • 1st-order association rules • the WARMR system [Dehaspe] • upgrade of Apriori • 1st-order Bayesian Nets • 1st-order Clustering • 1st-order Distance Based Learning [Zelezny / ILP 2001]

  32. Concluding Remarks • Advantages of ILP • Theoretical: Turing-equivalent expressive power • Practical: rich but understandable language, integration of background knowledge, MULTI-relational data mining • Problems still to be solved... • efficiency, handling numbers, user interfaces

  33. Find out more about • ML and DM literature, sources • Our ML and DM group • What we do • How you can participate • Etc. • http://cyber.felk.cvut.cz/gerstner/machine-learning
