
Pattern Evaluation and Process Control



  1. Pattern Evaluation and Process Control
  Wei-Min Shen
  Information Sciences Institute
  University of Southern California
  UCLA Data Mining Short Course

  2. Outline
  • Intuition of Interestingness
  • Principles for Measuring Interestingness
  • Existing Measurement Systems
  • Minimal Description Length Principle
  • Methods for Process Control

  3. Why Is a Pattern “Interesting”?
  • I did not know X before
  • It contradicts my thinking (surprise)
  • It is supported by the majority of the data
  • It is an exception to the usual cases
  • Occam’s Razor: simple is better
  • More?

  4. The Types of Classification Rule
  • Let h be a hypothesis and e the evidence; then, with respect to any given tuple, we have:
  • Characteristic rule: h → e
  • Discriminant rule: e → h
  • e and h can be interpreted as the sets of tuples satisfying e and h, respectively

  5. A Few Definitions
  • Given a discriminant rule R: e → h
  • |e| is the cover of the rule
  • |h ∧ e| / |e| is the confidence, reliability, or certainty factor of the rule
  • R is “X% complete” if |h ∧ e| / |h| = X% (e covers X% of the tuples in h)
  • R is “Y% discriminant” if |¬h ∧ e| / |¬h| = (100 − Y)% (e covers (100 − Y)% of the tuples in ¬h)
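
Below is a minimal Python sketch (not from the course; the function name and toy data are hypothetical) that computes these measures for a rule e → h, treating h and e as sets of tuple identifiers:

```python
def rule_measures(h, e, universe):
    """Cover, confidence, completeness, and discriminability of rule e -> h."""
    not_h = universe - h
    cover = len(e)                                   # |e|
    confidence = len(h & e) / len(e)                 # |h ∧ e| / |e|
    completeness = len(h & e) / len(h)               # |h ∧ e| / |h|
    # R is Y% discriminant when |¬h ∧ e| / |¬h| = (100 - Y)%
    discriminability = 1.0 - len(not_h & e) / len(not_h)
    return cover, confidence, completeness, discriminability

# Hypothetical toy data: tuples 0..9, h holds for 0..5, e observed on 3..7.
universe = set(range(10))
h = set(range(6))
e = set(range(3, 8))
print(rule_measures(h, e, universe))
```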

  6. Principles for Measuring “I”
  • 1. I = 0 if h and e are statistically independent
  • e and h have no relation at all
  • 2. I increases monotonically with |h ∧ e| when |h|, |¬h|, and |e| remain the same
  • I relates to reliability

  7. Principles for Measuring “I”
  • 3. I decreases monotonically with |h| (or |e|) when |h ∧ e|, |e| (or |h|), and |¬h| remain the same
  • I relates to completeness
  • 4. I increases monotonically with |e| when the reliability |h ∧ e| / |e|, |h|, and |¬h| remain the same
  • I relates to cover when reliability is the same

  8. Treat Discriminant and Characteristic Rules Differently
  • Principles 1, 2, 3, 4 apply to both discriminant and characteristic rules
  • 5. Treat discriminant and characteristic rules differently:

      Rule   E        H     Discrim   Complete
      A      Fever    Flu   80%       30%
      B      Sneeze   Flu   30%       80%

  • As a discriminant rule, I(A) > I(B)
  • As a characteristic rule, I(B) > I(A)

  9. Existing Measurement Systems
  • RI (Piatetsky-Shapiro 91)
  • J (Smyth and Goodman 92)
  • CE (Hong and Mao 91)
  • IC++ (Kamber and Shinghal 96)

  10. IC++ Measurement for Characteristic Rules
  • Given h and e, let rule d: e → h and rule c: h → e
  • Nec(d) = P(¬e|h) / P(¬e|¬h)
  • Suf(d) = P(e|h) / P(e|¬h)
  • For h → e: C++ = (1 − Nec(d)) · P(h) if 0 ≤ Nec(d) < 1, else 0
  • For h → ¬e: C+− = (1 − Suf(d)) · P(h) if 0 ≤ Suf(d) < 1, else 0
  • For ¬h → e: C−+ = (1 − 1/Nec(d)) · P(¬h) if 1 < Nec(d) < ∞, else 0
  • For ¬h → ¬e: C−− = (1 − 1/Suf(d)) · P(¬h) if 1 < Suf(d) < ∞, else 0
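
A hedged Python sketch of these four measures, estimating probabilities as relative frequencies over a universe of tuples; the helper names and toy data are illustrative, not from the course:

```python
def ic_measures(h, e, universe):
    not_h, not_e = universe - h, universe - e

    def p(s):                       # P(s) as a relative frequency
        return len(s) / len(universe)

    def cond(a, b):                 # P(a | b)
        return len(a & b) / len(b)

    nec = cond(not_e, h) / cond(not_e, not_h)   # Nec(d) = P(¬e|h)/P(¬e|¬h)
    suf = cond(e, h) / cond(e, not_h)           # Suf(d) = P(e|h)/P(e|¬h)

    c_pp = (1 - nec) * p(h) if 0 <= nec < 1 else 0.0       # h -> e
    c_pm = (1 - suf) * p(h) if 0 <= suf < 1 else 0.0       # h -> ¬e
    c_mp = (1 - 1 / nec) * p(not_h) if nec > 1 else 0.0    # ¬h -> e
    c_mm = (1 - 1 / suf) * p(not_h) if suf > 1 else 0.0    # ¬h -> ¬e
    return c_pp, c_pm, c_mp, c_mm

# Hypothetical toy data for a quick check.
universe = set(range(10))
h = set(range(6))
e = {0, 1, 2, 3, 6}
print(ic_measures(h, e, universe))
```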

  11. Minimal Description Length Principle
  • The goodness of a theory or hypothesis H relative to a set of data D is measured by
  • the sum of:
  • the length of H
  • the length of the explanation of D using H
  • assuming both use the optimal coding scheme

  12. The Derivation of MDL
  • By probability theory, the best hypothesis H with respect to D is:
  • the one maximizing P(H)P(D|H)
  • or, equivalently, maximizing log P(H) + log P(D|H)
  • or minimizing −log P(H) − log P(D|H)
  • Since the optimal encoding length of an element is determined by its probability, we obtain MDL:
  • minimize |coding1(H)| + |coding2(D|H)|
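
As a worked illustration of the identity above (Shannon: optimal code length = −log₂ P), the sketch below compares two hypotheses by their two-part code length; the probabilities are made up purely for illustration:

```python
import math

def description_length(p_h, p_d_given_h):
    """Two-part code length: -log2 P(H) - log2 P(D|H), in bits."""
    return -math.log2(p_h) - math.log2(p_d_given_h)

# Hypothetical hypotheses: (P(H), P(D|H)).
hypotheses = {"simple H1": (0.50, 0.10), "complex H2": (0.05, 0.60)}
for name, (ph, pdh) in hypotheses.items():
    print(name, round(description_length(ph, pdh), 2), "bits")
best = min(hypotheses, key=lambda n: description_length(*hypotheses[n]))
print("MDL choice:", best)        # H1: 4.32 bits beats H2: 5.06 bits
```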

  13. An Illustration of MDL
  • One-line theory: explanation length = 294.9
  • Two-line theory: explanation length = 298.7

  14. Fit Points with Lines
  • Theory = lines (#, angle, length, center)
  • Explanation: for each point,
  • the line it belongs to
  • the position on the line
  • the distance to the line
  • Notice that the current coding is (x, y)
  • The result differs if we instead choose the coding (r, θ)
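
As a rough sketch of this coding scheme, the code below charges a fixed number of bits per line in the theory, plus per-point bits for the line id, the position on the line, and the residual distance. All the constants and the toy residuals are invented for illustration; they are not the course's actual coding:

```python
import math

def explanation_bits(residuals, num_lines, bits_per_residual_unit=1.0):
    """Per-point cost: line id + position on the line + distance to it."""
    line_id_bits = math.log2(num_lines) if num_lines > 1 else 0.0
    position_bits = 6.0                    # assumed fixed-precision position
    return sum(line_id_bits + position_bits +
               bits_per_residual_unit * abs(r) for r in residuals)

def total_bits(residuals, num_lines, bits_per_line=40.0):
    """MDL total: theory length + explanation length."""
    return num_lines * bits_per_line + explanation_bits(residuals, num_lines)

# Hypothetical data: one line leaves larger residuals than two lines,
# but the two-line theory pays for an extra line and a line id per point.
one_line, two_lines = [2.0] * 20, [0.3] * 20
print(total_bits(one_line, 1), total_bits(two_lines, 2))
```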

  15. Process Control
  • The Goal: to predict the future from the past
  • The Given: the past data sequence
  • The Methods:
  • Adaptive Control Theory
  • Chaos Theory
  • State Machines

  16. Chaos Theory
  • The data sequence may appear chaotic
  • The underlying model may be very simple
  • Extremely sensitive to initial conditions
  • Difficult to make long-term predictions
  • Short-term prediction is possible

  17. An Example Chaotic Sequence
  [Figure: s(k) plotted over time steps k = 0..100, with values in [0, 1]]
  The simple logistic map model: s_{k+1} = a · s_k · (1 − s_k), where a = 4
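
For concreteness, here is a short Python sketch that generates this sequence and demonstrates the sensitivity to initial conditions (the two nearby starting values are arbitrary):

```python
def logistic_sequence(s0, a=4.0, steps=100):
    """Iterate s_{k+1} = a * s_k * (1 - s_k) from s_0."""
    s, seq = s0, []
    for _ in range(steps + 1):
        seq.append(s)
        s = a * s * (1.0 - s)
    return seq

# Two nearby initial conditions diverge quickly (sensitive dependence).
for s0 in (0.400000, 0.400001):
    tail = logistic_sequence(s0, steps=30)[-3:]
    print(s0, [round(v, 3) for v in tail])
```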

  18. Steps of Using Chaos Theory
  • Reconstruction of the state space:
  • x_k = [s_k, s_{k−τ}, …, s_{k−(m−1)τ}]^T
  • where τ is a time delay and m is the embedding dimension
  • Takens’ theorem: one can always find an embedding dimension m ≥ 2⌊d⌋ + 1, where ⌊d⌋ is the integer part of the attractor’s dimension, that preserves the invariant measures
  • Central task: choose m and τ
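
A minimal sketch of the reconstruction step, with illustrative (not prescribed) choices m = 3 and τ = 1; the input series is regenerated from the logistic map of the previous slide:

```python
def delay_embed(s, m=3, tau=1):
    """Build state vectors x_k = [s_k, s_{k-tau}, ..., s_{k-(m-1)tau}]."""
    start = (m - 1) * tau
    return [[s[k - i * tau] for i in range(m)] for k in range(start, len(s))]

# Toy scalar sequence: the logistic map with a = 4, s_0 = 0.4.
s, seq = 0.4, []
for _ in range(101):
    seq.append(s)
    s = 4.0 * s * (1.0 - s)

states = delay_embed(seq, m=3, tau=1)
print(len(states), states[0])
```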

  19. State Machine Approach
  • Identify the number of states by clustering all points in the sequence
  • Construct a transition function by learning from the sequence
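
One hedged reading of these two steps in Python: a crude equal-width binning stands in for the clustering step (k-means would be closer to the slide's intent), and the transition function is estimated from successor frequencies; all names here are illustrative:

```python
from collections import Counter, defaultdict

def to_states(seq, num_states=4):
    """Discretize values in [0, 1] into num_states equal-width bins."""
    return [min(int(v * num_states), num_states - 1) for v in seq]

def learn_transitions(states):
    """Count observed successors and keep the most frequent one per state."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Toy input: the logistic-map sequence from the earlier sketch.
s, seq = 0.4, []
for _ in range(200):
    seq.append(s)
    s = 4.0 * s * (1.0 - s)
print(learn_transitions(to_states(seq)))
```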

  20. Construction & Synchronization
  • Environment = (A, P, Q, r), where |P| < |Q|
  • Model = (A, P, S, t)
  • Visibly equivalent
  • Perfect
  • Synchronized
  • The Construction problem:
  • when and how to construct new model states
  • The Synchronization problem:
  • how to determine which model state is current

  21. Learning with a Reset Button
  • Two environmental states p and q (which may appear the same to the learner) are different if and only if there exists a sequence e of actions that leads from p and q to states that are visibly different
  • The interaction with the environment:
  • Membership Query
  • Equivalence Query: “yes” or a counterexample

  22. Observation Table
  • Rows are indexed by S ∪ S·A (action sequences from the initial state), columns by E (experiments), and the entries are the observations T
  • Model states: {row(s) : s ∈ S}
  • Initial state: row(λ)
  • Final states: {row(s) : s ∈ S and T(s) = 1}
  • Transitions: δ(row(s), a) = row(s·a)
  • Closed table: ∀s, a ∃s′ such that row(s·a) = row(s′)
  • Consistent table: row(s) = row(s′) ⇒ row(s·a) = row(s′·a)

  23. L* Algorithm
  • Initialize T for λ and each action in A
  • Loop:
  • use membership queries to make T complete, closed, and consistent
  • if EQ(T) = w /* a counterexample */, then add w and all its prefixes into S
  • Until EQ(T) = yes
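
A condensed sketch of the table bookkeeping behind this loop; T maps action strings to observations, and only the closed/consistent tests are shown, not the full query loop:

```python
def row(T, s, E):
    """The row of action string s: its observations under each experiment."""
    return tuple(T[s + e] for e in E)

def is_closed(T, S, E, A):
    """Closed: every one-action extension's row already appears in S."""
    rows_S = {row(T, s, E) for s in S}
    return all(row(T, s + a, E) in rows_S for s in S for a in A)

def find_inconsistency(T, S, E, A):
    """Consistent: equal rows must stay equal after any action a."""
    for s1 in S:
        for s2 in S:
            if s1 != s2 and row(T, s1, E) == row(T, s2, E):
                for a in A:
                    if row(T, s1 + a, E) != row(T, s2 + a, E):
                        return s1, s2, a   # extending E via a distinguishes them
    return None

# Tiny demo: alphabet {"a"}, S = {λ}, E = {λ}; T is keyed by action strings.
A, S, E = ["a"], [""], [""]
T = {"": 0, "a": 1}
print(is_closed(T, S, E, A))           # False: row("a") is new, so "a" joins S
print(find_inconsistency(T, S, E, A))  # None: S holds a single row
```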

  24. The Little Prince Example
  • A counterexample ftf for M3 (Fig 5.3): the model ends at rose, but the real observation is volcano
  • An inconsistency in T4 (Tab 5.5), where row(f) = row(ft) but row(ff) ≠ row(ftf)

  25. Homing Sequence
  • L* is limited by its reliance on a reset button
  • Homing sequence h: if two observation sequences from executing h are the same, then the two executions lead to the same state
  • Let q<h> be the observation sequence and q·h the ending state; then h is a homing sequence iff
  • for all p, q: [p<h> = q<h>] ⇒ [p·h = q·h]
  • e.g., {fwd} is a homing sequence for the Little Prince
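
The definition translates directly into a checker; the machine encoding below (dicts keyed by (state, action)) is an assumption for illustration:

```python
def is_homing(states, step, observe, h):
    """True iff equal observation traces under h imply equal ending states."""
    trace_to_end = {}
    for q in states:
        cur, obs = q, []
        for a in h:
            cur = step[(cur, a)]
            obs.append(observe[cur])
        key = tuple(obs)
        if trace_to_end.get(key, cur) != cur:
            return False          # same trace from two starts, different ends
        trace_to_end[key] = cur
    return True

# Hypothetical 2-state machine: action "f" toggles the state.
step = {(0, "f"): 1, (1, "f"): 0}
observe = {0: "rose", 1: "volcano"}
print(is_homing([0, 1], step, observe, ["f"]))   # True
```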

  26. Properties of Homing Sequences
  • Every FDA has a homing sequence
  • It can be constructed from an FDA by appending action sequences (each of length < n) that distinguish a pair of states
  • The length of this construction is at most n²
  • There are FDAs whose shortest h is n² long
  • h can be used as a reset
  • but h cannot guarantee reaching a fixed state

  27. L* with a Homing Sequence h
  • Every time a reset is needed, repeat h until you see the desired observation sequence
  • Or, for each possible observation sequence of h, make a copy of L* (see Fig 5.6)

  28. Learning the Homing Sequence
  • If h is not a homing sequence, then we may discover that the same observation sequence produced by executing h leads us to two different states, p and q, because there is a sequence of actions x such that p<x> ≠ q<x>
  • Then a better approximation of the homing sequence is h·x
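
A sketch of this refinement step: given two states p and q that produced the same trace under h, search for a short distinguishing suffix x and return h·x. The breadth-first search and the machine encoding are illustrative assumptions, not the course's algorithm:

```python
from collections import deque

def refine_homing(h, p, q, step, observe, actions, max_len=3):
    """Find a short x with p<x> != q<x> and extend h to h·x."""
    queue = deque([((), p, q)])
    while queue:
        x, sp, sq = queue.popleft()
        if x and observe[sp] != observe[sq]:
            return h + list(x)        # better approximation: h·x
        if len(x) < max_len:
            for a in actions:
                queue.append((x + (a,), step[(sp, a)], step[(sq, a)]))
    return h                          # no short distinguishing suffix found

# Hypothetical 3-state machine: 0 and 1 look alike now but differ under "t".
step = {(0, "t"): 2, (1, "t"): 0, (2, "t"): 2,
        (0, "f"): 0, (1, "f"): 1, (2, "f"): 2}
observe = {0: "A", 1: "A", 2: "B"}
print(refine_homing(["f"], 0, 1, step, observe, ["t", "f"]))  # ['f', 't']
```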

  29. L* + Learning h
  • Assume a homing sequence h; initially h = λ (the empty sequence)
  • When h is shown to be incorrect, extend h, discard all copies of L*, and start again
  • h is shown to be incorrect when there exists x such that q·h<x> ≠ p·h<x> even though q<h> = p<h>

  30. Learning h and the Model
  • Rivest and Schapire’s algorithm (Fig 5.7)
  • Little Prince example (notice the inconsistency produced by ff in Fig 5.10)
