
Machine Learning Supports Processes



Presentation Transcript


  1. Machine Learning Supports Processes Prof. Dr. Katharina Morik TU Dortmund, Computer Science LS VIII http://www-ai.cs.uni-dortmund.de

  2. Overview • Learning programs from examples -- ILP • Real programs • Restricted logic programs • Learning in XML • Schema learning • Frequent sequences in web applications • Collaborative clustering of folksonomies • Learning about programs using meta-data • Co-update of files • Bug prediction: where, how often

  3. The Old Dream • Programming a computer by examples • Alan Turing • Gordon Plotkin • Wray Buntine • Jörg-Uwe Kietz, Stefan Wrobel • Stephen Muggleton • Jean-François Puget, Céline Rouveirol member(A, [A|B]). member(A, [B|C]) :- member(A, C).

  4. ... but • the cut (!): control structures are too hard to learn. • Negation: not(H ⊨ E) ≠ H ⊨ not(E). Negation by failure is not sufficient for planning (Puget 1989). • Humans easily express control structures. • Rule learning instead of learning programs.

  5. Rule Learning from Examples Given: • a set of examples E = E⁺ ∪ E⁻ in LE, • a set of facts B in LB Find a set of rules H in LH such that • M⁺(B ∪ E) ⊆ M(H), i.e., H is true in all minimal models of B and E (correct) • For all h ∈ H there is an e ∈ E such that B, E \{e} ⊭ e, but B, E \{e}, h ⊨ e (necessary) • For each h ∈ LH fulfilling 1) and 2), H ⊨ h holds. (complete) • H is minimal. (not redundant)
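The "correct" condition can be illustrated in code under a strong simplification that is not in the slide: if rules are assumed ground (variable-free), the minimal model of B ∪ E is just the fact set itself, and H is true in it iff no rule is violated there. All encodings below are illustrative, not the slide's formalism.

```python
def true_in(model, rule):
    """A ground Horn rule (head, body) holds in a model unless its
    body is satisfied while its head is not."""
    head, body = rule
    return head in model or not set(body) <= model

def correct(hypothesis, background, examples):
    """H is 'correct': every rule of H holds in the minimal model
    of B union E, which for ground facts is the fact set itself."""
    model = background | examples
    return all(true_in(model, h) for h in hypothesis)

# Toy data echoing the grandma example from a later slide:
B = {("mother", "ann", "britta"), ("mother", "britta", "celine")}
E = {("grandma", "ann", "celine")}
h_good = (("grandma", "ann", "celine"),
          [("mother", "ann", "britta"), ("mother", "britta", "celine")])
h_bad = (("grandma", "ann", "bart"),   # head not in the model, body holds
         [("mother", "ann", "britta")])
```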

  6. Difficulties • Rule learning is more difficult than concept learning. • Induction inherits the difficulties of deduction: D is more general than C if D ⊨ C. • Whether an example clause is covered by a hypothesis clause is hard to decide.

  7. Example B: mother(ann,bart). mother(ann,britta). father(arno,bart). father(arno,britta). mother(britta,celine). father(bernd,celine). parent(bernd,celine). parent(britta,celine). D = grandma(X,Z) :- mother(X,Y), mother(Y,Z). Substitution: {Y/britta}. C = grandma(ann,celine) :- mother(ann,britta), mother(britta,celine), father(arno,britta), father(bernd,celine). D is generative and deterministic w.r.t. B, hence D ⊨ C can be computed efficiently, bounded by depth (i) and arity (j). • D' = grandma(X,Z) :- mother(X,Y), parent(Y,Z). Candidate substitutions: {Y/bart}, {Y/britta} from mother; {Y/bernd}, {Y/britta} from parent. • D' is generative and indeterministic w.r.t. B, hence deciding D' ⊨ C is NP in the general case. If the indeterministic part is restricted to k literals (here: 2-local), learnability is polynomial.
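The coverage test behind this example is θ-subsumption: D covers C if some substitution θ maps D's head to C's head and every body literal of D onto a body literal of C. A minimal backtracking checker, with an illustrative encoding (literals as (predicate, args) tuples, variables as capitalized strings):

```python
def is_var(term):
    # Prolog convention: variables start with an uppercase letter
    return term[0].isupper()

def unify_lit(pat, lit, theta):
    """Extend substitution theta so pattern literal pat matches
    ground literal lit; return the new theta or None on clash."""
    (pred1, args1), (pred2, args2) = pat, lit
    if pred1 != pred2 or len(args1) != len(args2):
        return None
    theta = dict(theta)
    for a, b in zip(args1, args2):
        if is_var(a):
            if theta.get(a, b) != b:
                return None
            theta[a] = b
        elif a != b:
            return None
    return theta

def subsumes(d, c):
    """True if some theta maps clause d into clause c (D theta subset C)."""
    (dh, db), (ch, cb) = d, c
    theta0 = unify_lit(dh, ch, {})
    if theta0 is None:
        return False
    def search(body, theta):
        if not body:
            return True
        first, rest = body[0], body[1:]
        return any(search(rest, t) for lit in cb
                   if (t := unify_lit(first, lit, theta)) is not None)
    return search(db, theta0)

# The slide's D and C:
D = (("grandma", ("X", "Z")),
     [("mother", ("X", "Y")), ("mother", ("Y", "Z"))])
C = (("grandma", ("ann", "celine")),
     [("mother", ("ann", "britta")), ("mother", ("britta", "celine")),
      ("father", ("arno", "britta")), ("father", ("bernd", "celine"))])
```

For deterministic clauses each body literal admits at most one extension of θ, so the search degenerates to a linear pass; the backtracking above is what makes the indeterministic case expensive.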

  8. The Borderline • Learnability: ij-deterministic clauses (Muggleton et al. 1992) and k-l-local indeterministic clauses are polynomially learnable (Kietz 1996). D = D₀ :- D_DET, D_NONDET, where each local part LOCᵢ of D_NONDET shares no variables with D_DET, no LOCⱼ shares variables with LOCᵢ, and k ≥ |LOCᵢ| (at most k literals per local part).

  9. The Old Dream Revisited Logic-based approach to robotics: • no maps necessary • no correct measures necessary but only relations • acting as long as a perception feature is valid • learnable representation at all levels of abstraction • easy communication with humans Morik, Katharina and Klingspor, Volker and Kaiser, Michael (editors). Making Robots Smarter -- Combining Sensing and Action through Robot Learning. Kluwer Academic Press, 1999. Klingspor, Volker and Morik, Katharina and Rieger, Anke. Learning Concepts from Sensor Data of a Mobile Robot. In Machine Learning, Vol. 23, No. 2/3, pages 305-332, 1996.

  10. Navigation of a Mobile Robot [Architecture diagram, top to bottom: planning, plan execution — induction of operational concepts (chain clauses) — example generation, signal to symbol — rule compilation (real-time planning, high-level commands) — measurements (cheap sensors, low-level behaviors).]

  11. Experiments • 25 tours given, 18 tours in rooms not used for training • Goal: crossing doorway • 23 tours successful (door found, doorway passed) • 21 tours: all perception features correctly recognized • 2 tours: obstacle not recognized • along_door: 1 false positive, 1 false negative.

  12. 1st Lessons -- ILP • Spatiotemporal relations can be expressed very well. • First-order logic is difficult for engineers, though easy for linguists and philosophers. • Calculations and numerical relationships can hardly be expressed.

  13. Lessons: ILP • Prerequisites needed: • Inference engine • Preprocessing tools constraining logic further: • Sort taxonomy • Predicate taxonomy • Declarative bias • MOBAL (Morik et al. 1993) ~ 30 person years

  14. XML is it! • Business processes are described in XML. • Web applications are based on XML. • Software development tools use XML (Castor, JAXB). • Data integration relies on XML. • Even machine learning experiments are expressed in XML (RapidMiner). • DTDs (a subset of XML Schema) are context-free grammars with regular expressions on the right-hand side. These regular expressions are deterministic.

  15. Learning regular expressions • Given XML, induce the underlying schema through forming regular expressions covering all instances. • If every tag occurs just once, first a Single Occurrence Automaton is inferred which is transformed into a regular expression. (Bex et al. 2006)

  16. Example [Automaton over symbols a, b, c, d, e] Input: bacacdacde, cbacdbacde, abccaadcde. Inferred expression: ((b?(a+c)+)+d)+e. Method: forming 2-grams and drawing edges between the two symbols (Garcia, Vidal 1999), simplifying the automaton, applying rewrite rules. E.g.: authors, citation, (volume|month), year, pages?, (title|description)?, xrefs?
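The first step of this inference, building the single-occurrence automaton from 2-grams, can be sketched in a few lines: each symbol becomes one state, and an edge a→b is drawn whenever "ab" occurs in some sample. The simplification and rewrite steps are omitted here.

```python
def soa(samples):
    """Build a single-occurrence automaton skeleton from sample strings:
    start symbols, 2-gram edges, and end symbols."""
    edges, starts, ends = set(), set(), set()
    for s in samples:
        starts.add(s[0])
        ends.add(s[-1])
        for a, b in zip(s, s[1:]):   # all adjacent 2-grams
            edges.add((a, b))
    return starts, edges, ends

# The slide's three sample strings:
starts, edges, ends = soa(["bacacdacde", "cbacdbacde", "abccaadcde"])
```

The resulting graph is then simplified and translated into a regular expression such as ((b?(a+c)+)+d)+e; that translation step needs the rewrite rules the slide mentions and is not shown.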

  17. Frequent Sequences • GSP (Srikant, Agrawal 1996) • finds frequent sequences of event types. • Event types are described by items. • (GET), (URL, p1, p2), (POST, URL, q1, q2, q3) • The application defines the attributes of items; GSP defines event types by means of items.
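The support-counting core of GSP can be sketched as follows: a candidate sequence is frequent if it occurs as a (not necessarily contiguous) subsequence in at least min_sup of the observed sessions. This omits GSP's candidate generation and time constraints; event types are reduced to plain strings for illustration.

```python
from itertools import product

def is_subseq(cand, session):
    """True if cand occurs in session in order (gaps allowed)."""
    it = iter(session)
    return all(ev in it for ev in cand)  # 'in' consumes the iterator

def support(cand, sessions):
    return sum(is_subseq(cand, s) for s in sessions)

def frequent_pairs(sessions, min_sup):
    """All length-2 event-type sequences meeting the support threshold."""
    events = {e for s in sessions for e in s}
    return {c for c in product(events, repeat=2)
            if support(c, sessions) >= min_sup}

# Hypothetical sessions of request event types:
sessions = [["GET", "POST", "GET"], ["GET", "GET"], ["POST", "GET"]]
```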

  18. Web Applications Security • Positive model (allowed access) to be acquired from observed logs (audit) • XML language for • Resource tree • Parameters (regular expressions) • Finding frequent sequences of resource requests using GSP (Srikant, Agrawal 1996) Bockermann 2007 [(GET, form.html)(POST, register.pl, salut, name)]

  19. Clustering based on Taggings (Users, Tags, Resources) [Tag cloud: games, photoshop, shopping, imported, photography, programming, mac, java, linux, javascript, free, news, reference, howto, music, php]

  20. Multi-objective Term Clustering • Given (users U, resources R, terms T) and relations Y ⊆ U × R × T • Find a hierarchical clustering of term sets, each containing a set of resources • Frequent Term-Based Clustering (Ester et al. 2002) • Selecting Pareto-optimal clusterings via NSGA • Orthogonal criteria of completeness vs. childcount, coverage vs. overlap Kaspari, Wurst 2007
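The Pareto-selection step can be illustrated for two of the slide's criteria, coverage (to maximize) versus overlap (to minimize); the objective values per candidate clustering are assumed to be precomputed, and the candidates below are made up.

```python
def pareto_front(candidates):
    """candidates: (name, coverage, overlap) triples.
    Keep the non-dominated ones: no other candidate is at least as good
    on both objectives and strictly better on one."""
    front = []
    for name, cov, ov in candidates:
        dominated = any(
            c2 >= cov and o2 <= ov and (c2 > cov or o2 < ov)
            for _, c2, o2 in candidates)
        if not dominated:
            front.append(name)
    return front

# Hypothetical clusterings scored on (coverage, overlap):
cands = [("flat", 0.9, 0.3), ("deep", 0.8, 0.1), ("mixed", 0.7, 0.2)]
```

NSGA performs this non-dominated sorting inside a genetic search over clusterings; the quadratic scan above only shows the dominance criterion itself.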

  21. Lessons: XML • Schema induction can be used for schema cleaning and enhancement. • Frequent sequences of web usage data deliver an XML positive access model. • Multi-objective frequent termset clustering delivers Pareto-optimal navigation structures.

  22. Co-update of files • For N = 4700 source files from a telephone switch system (PBX), all N-choose-2 pairs are formed. • A pair is classified relevant if it is co-updated. • Each pair has attributes: same extension, common prefix length, number of shared types/routines... Word vectors for documentation and bug reports. • Decision tree learning -- best results with bug report words. Shirabad, Lethbridge, Matwin 2004
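The pair construction can be sketched as follows: enumerate all N-choose-2 file pairs, label a pair relevant iff both files appear in a common change set, and attach per-pair attributes. The attribute names follow the slide; the file names and change sets are hypothetical, and the learning step itself is omitted.

```python
from itertools import combinations
import os

def pair_features(f1, f2):
    """Two of the slide's pair attributes, computed from file names."""
    ext1 = os.path.splitext(f1)[1]
    ext2 = os.path.splitext(f2)[1]
    common = os.path.commonprefix([f1, f2])
    return {"same_extension": ext1 == ext2,
            "common_prefix_length": len(common)}

def label_pairs(files, change_sets):
    """All unordered file pairs, labelled relevant iff co-updated."""
    rows = []
    for f1, f2 in combinations(sorted(files), 2):
        relevant = any({f1, f2} <= cs for cs in change_sets)
        rows.append((f1, f2, pair_features(f1, f2), relevant))
    return rows

# Hypothetical files and observed change sets:
files = ["ring.c", "call.c", "call.h"]
changes = [{"call.c", "call.h"}, {"ring.c"}]
rows = label_pairs(files, changes)
```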

  23. Predicting Bug Reports • Predicting the class where failure is likely by a learned decision tree. • Predicting the number of future defects by a regression tree. • Number of revisions and reported problems are the best features. Bernstein et al. 2007

  24. Process Optimization? • Directly XML-based: finding independent sub-processes • Based on meta-descriptions of processes: • Finding appropriate feature sets • Optimizing according to which objective function?

  25. Thank you for your attention!
