Pattern Evaluation and Process Control - PowerPoint PPT Presentation

Presentation Transcript

Pattern Evaluation and Process Control

Wei-Min Shen

Information Sciences Institute

University of Southern California

UCLA Data Mining Short Course

Outline

• Intuition of Interestingness

• Principles for Measuring Interestingness

• Existing Measurement Systems

• Minimal Description Length Principle

• Methods for Process Control

Intuition of Interestingness

• I did not know X before

• It contradicts my thinking (surprise)

• It is supported by the majority of the data

• It is an exception to the usual cases

• Occam’s Razor: Simple is better

• More?

• Let h be a hypothesis and e the evidence; then, with respect to any given tuple, we have

• Characteristic rule: h → e

• Discriminate rule: e → h

• e and h can be interpreted as sets of tuples satisfying e and h respectively

• Given a discriminate rule R: e → h

• |e| is the cover of the rule

• |h ∧ e|/|e| is the confidence, reliability, or certainty factor of the rule

• R is “X% complete” if |h ∧ e|/|h| = X% (e satisfies X% of h)

• R is “Y% discriminate” if |¬h ∧ e|/|¬h| = (100-Y)% (e satisfies (100-Y)% of ¬h)
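
The four quantities above can be sketched in a few lines if e and h are treated as sets of tuple ids. The function name `rule_measures`, the toy data, and the choice to return fractions instead of percentages are assumptions made for this illustration, not part of the slides.

```python
def rule_measures(e, h, universe):
    """Measures for the discriminate rule e -> h over a finite set of tuples.

    e, h and universe are sets of tuple ids (a hypothetical representation).
    """
    not_h = universe - h
    both = e & h
    return {
        "cover": len(e),                        # |e|
        "confidence": len(both) / len(e),       # |h and e| / |e|
        "completeness": len(both) / len(h),     # |h and e| / |h|
        # "Y% discriminate" means e satisfies (100 - Y)% of not-h.
        "discriminability": 1 - len(e & not_h) / len(not_h),
    }


universe = set(range(10))
h = {0, 1, 2, 3}     # tuples satisfying the hypothesis
e = {0, 1, 2, 4}     # tuples satisfying the evidence
m = rule_measures(e, h, universe)
print(m)
```

Here three of the four tuples covered by e also satisfy h, so confidence and completeness are both 0.75, and e picks up only one of the six tuples outside h.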

Principles for Measuring Interestingness

• 1. I = 0 if h and e are statistically independent

• e and h have no relation at all

• 2. I monotonically with |he| when |h|, |¬h|, and |e| remain the same

• I relates to reliability

• 3. I monotonically with |h|(or|e|) when |he|, |e| (or |h|), and |¬h| remain the same

• I relates to completeness

• 4. I monotonically with |e| when reliability |he|/|e|, |h|, and |¬h| remain the same

• I relates to cover when reliability is the same

• Principles 1,2,3,4 apply to both discriminate and characteristic rules

• 5. Treat discriminate and characteristic rules differently

  Rule  E       H    Discrim  Complete
  A     Fever   Flu  80%      30%
  B     Sneeze  Flu  30%      80%

• As discriminate rule I(A) > I(B)

• As characteristic rule I(B) > I(A)

Existing Measurement Systems

• RI (Piatetsky-Shapiro 91)

• J (Smyth and Goodman 92)

• CE (Hong and Mao 91)

• IC++ (Kamber and Shinghal 96)
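
The slides only name these systems. As a hedged illustration, the sketch below computes the first two from raw counts, using the formulas as they are commonly stated: RI = |h ∧ e| - |h||e|/N, and the J-measure as P(e) times the relative entropy between P(h|e) and P(h). The function names and the counts are made up for the example, and the code assumes 0 < P(h) < 1 and |e| > 0.

```python
import math


def rule_interest(n_he, n_h, n_e, n):
    """RI (Piatetsky-Shapiro): |h and e| - |h||e| / N."""
    return n_he - n_h * n_e / n


def j_measure(n_he, n_h, n_e, n):
    """J-measure (Smyth and Goodman) for the rule e -> h, in bits."""
    p_e = n_e / n
    p_h = n_h / n
    p_h_given_e = n_he / n_e

    def term(p, q):
        # p * log2(p / q), with the usual convention 0 * log 0 = 0
        return 0.0 if p == 0 else p * math.log2(p / q)

    return p_e * (term(p_h_given_e, p_h) + term(1 - p_h_given_e, 1 - p_h))


# Independent case: |h and e| = |h||e| / N, so both measures are zero.
print(rule_interest(n_he=20, n_h=40, n_e=50, n=100))
print(j_measure(n_he=20, n_h=40, n_e=50, n=100))
# Associated case: h and e co-occur more often than independence predicts.
print(rule_interest(n_he=35, n_h=40, n_e=50, n=100))
print(j_measure(n_he=35, n_h=40, n_e=50, n=100))
```

Both measures return zero when h and e are statistically independent and a positive value when the rule holds more often than chance.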

• Given h, e, let rule d: e → h and rule c: h → e

• Nec(d) = P(¬e|h)/P(¬e|¬h)

• Suf(d) = P(e|h)/P(e|¬h)

• for h → e, C++ = if 0 ≤ Nec(d) < 1 then (1-Nec(d))*P(h), else 0.

• for h → ¬e, C+- = if 0 ≤ Suf(d) < 1 then (1-Suf(d))*P(h), else 0.

• for ¬h → e, C-+ = if 0 < Nec(d) < ∞ then (1-1/Nec(d))*P(¬h), else 0.

• for ¬h → ¬e, C-- = if 0 < Suf(d) < ∞ then (1-1/Suf(d))*P(¬h), else 0.

Minimal Description Length Principle

• The goodness of a theory or hypothesis (H) relative to a set of data (D) is measured by:

• The sum of

• The length of H

• The length of the explanation of D using H

• Assuming both use the optimal coding scheme

• Based on probability theory, the best hypothesis H with respect to D is:

• the max of P(H)P(D|H)

• or the max of logP(H) + logP(D|H)

• or the min of -logP(H) - logP(D|H)

• Since the length of the optimal encoding of a set is related to the probabilities of its elements, we have MDL:

• the min of |coding1(H)| + |coding2(D|H)|
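
The equivalence between maximizing P(H)P(D|H) and minimizing the total code length can be checked on a toy problem. The coin-flip data, the three candidate biases, and their prior probabilities below are all made up for the illustration, and code lengths are taken to be -log2 of the probability.

```python
import math


def description_length(p_h, p_d_given_h):
    """-log2 P(H) - log2 P(D|H): bits for H plus bits for D given H."""
    return -math.log2(p_h) - math.log2(p_d_given_h)


def likelihood(bias, heads, flips):
    """P(D|H) for one particular sequence with the given number of heads."""
    return bias ** heads * (1 - bias) ** (flips - heads)


# Hypothetical data: 8 heads in 10 flips; candidate bias -> prior P(H).
heads, flips = 8, 10
candidates = {0.5: 0.6, 0.8: 0.3, 0.9: 0.1}

lengths = {
    bias: description_length(prior, likelihood(bias, heads, flips))
    for bias, prior in candidates.items()
}

best_mdl = min(lengths, key=lengths.get)
best_map = max(
    candidates,
    key=lambda b: candidates[b] * likelihood(b, heads, flips),
)
print(best_mdl, best_map)   # both pick the same hypothesis
```

Because -log2 is monotonically decreasing, the hypothesis with the shortest total description is always the one with the largest P(H)P(D|H); here both criteria select the bias 0.8.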

One line theory:

explanation length = 294.9

Two line theory:

explanation length = 298.7

• Theory = lines (#,angle,length,center)

• Explanation: for each point:

• the line it belongs to

• the position on the line

• the distance to line

• Notice that the current coding is (x,y)

• It is different if we choose coding (r,theta)

Methods for Process Control

• The Goal: to predict the future from the past

• The Given: the past data sequence

• The methods:

• Chaos theory

• State Machines

• The data sequence may appear chaotic

• The underlying model may be very simple

• Extremely sensitive to initial conditions

• Difficult to make long term prediction

• Short term prediction is possible

[Figure: s(k), between 0.0 and 1.0, plotted against time step k up to 100]

The simple logistic map model:

s_{k+1} = a s_k (1 - s_k), where a = 4
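
The following sketch iterates the logistic map from two nearly identical starting points to show why short-term prediction is feasible and long-term prediction is not. The starting value 0.3, the perturbation 1e-10, and the function name `logistic_series` are arbitrary choices for the illustration.

```python
def logistic_series(s0, a=4.0, steps=100):
    """Iterate s_{k+1} = a * s_k * (1 - s_k) and return [s_0, ..., s_steps]."""
    series = [s0]
    for _ in range(steps):
        s = series[-1]
        series.append(a * s * (1 - s))
    return series


x = logistic_series(0.3)
y = logistic_series(0.3 + 1e-10)     # almost the same initial condition
gap = [abs(u - v) for u, v in zip(x, y)]

print(max(gap[:10]))   # the two runs still agree closely
print(max(gap))        # over 100 steps they no longer agree
```

The gap can grow by at most a factor of 4 per step, so the first few values remain close, but the growth compounds and the two runs eventually become unrelated.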

• Reconstruction of state space:

• x(k) = [x_k, x_{k-τ}, …, x_{k-(m-1)τ}]^T

• where τ is a time delay, m is the embedding dimension

• Takens’ theorem: one can always find an embedding dimension m ≥ 2[d]+1, where [d] is the integer part of the attractor’s dimension, to preserve the invariant measures

• Central task: choose m and τ
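
The state-space reconstruction above can be sketched as a function that turns a scalar series into delay vectors. The name `delay_embed` and the toy series are assumptions for the illustration; choosing good values of m and the delay is the hard part and is not addressed here.

```python
def delay_embed(series, m, tau):
    """Return [x_k, x_{k-tau}, ..., x_{k-(m-1)tau}] for every k with a full history."""
    start = (m - 1) * tau            # first index with m delayed samples available
    return [
        [series[k - j * tau] for j in range(m)]
        for k in range(start, len(series))
    ]


xs = [0, 1, 2, 3, 4, 5, 6]
vectors = delay_embed(xs, m=3, tau=2)
print(vectors)
# [[4, 2, 0], [5, 3, 1], [6, 4, 2]]
```

Each row is one reconstructed state; a predictor would then be fitted on these vectors instead of on the raw scalar sequence.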

• Identify the number of states by clustering all points in the sequence

• Construct a transition function by learning from the sequence
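
A minimal sketch of these two steps follows. It uses equal-width binning of values in [0, 1] as a crude stand-in for clustering, and it learns the transition function by counting which state most often follows each state. The binning, the function name `learn_transitions`, and the data are all assumptions for the illustration.

```python
from collections import Counter, defaultdict


def learn_transitions(values, n_states):
    """Bin values in [0, 1] into n_states states and count observed transitions."""
    states = [min(int(v * n_states), n_states - 1) for v in values]
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    # Predict the most frequently observed successor of each state.
    predict = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    return states, predict


values = [0.1, 0.6, 0.9, 0.2, 0.7, 0.95, 0.15, 0.55]
states, predict = learn_transitions(values, n_states=3)
print(states)    # [0, 1, 2, 0, 2, 2, 0, 1]
print(predict)   # {0: 1, 1: 2, 2: 0}
```

A real application would replace the binning with a proper clustering step and would need far more data than this toy sequence.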

• Environment = (A, P, Q, r) where |P|<|Q|

• Model = (A, P, S, t)

• Visibly equivalent

• Perfect

• Synchronized

• The Construction problem

• when and how to construct new model states

• The Synchronization problem

• how to determine which model state is current

• Two environmental states p and q (they may appear the same to the learner) are different if and only if there exists a sequence e of actions that leads from p and q to states that are visibly different

• The interaction with the environment

• Membership Query

• Equivalence Query: “yes” or a counterexample

• Model states: {row(s) : s in S}

• Initial state: row(λ)

• Final state: {row(s) : s in S and T(s)=1}

• Transitions: δ(row(s), a) = row(sa)

• Closed table: ∀s, a ∃s′: row(sa) = row(s′)

• Consistent table: row(s) = row(s′) ⇒ row(sa) = row(s′a)

[Figure: the observation table T. Columns: E (experiments). Upper rows: S, the states (actions from the initial state). Lower rows: SxA, the transitions. Entries: observations.]
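
The closed and consistent conditions can be checked mechanically. In the sketch below the table T is a dictionary from action strings to observations, the empty string plays the role of λ, and the toy target (observation 1 after an even number of a's) is invented for the illustration.

```python
def row(T, E, s):
    """row(s): the observations recorded for s followed by each experiment in E."""
    return tuple(T[s + e] for e in E)


def is_closed(T, S, A, E):
    """Every row(sa) in the S x A part must equal row(s') for some s' in S."""
    state_rows = {row(T, E, s) for s in S}
    return all(row(T, E, s + a) in state_rows for s in S for a in A)


def is_consistent(T, S, A, E):
    """row(s) = row(s') must imply row(sa) = row(s'a) for every action a."""
    return all(
        row(T, E, s1 + a) == row(T, E, s2 + a)
        for s1 in S for s2 in S for a in A
        if row(T, E, s1) == row(T, E, s2)
    )


# Toy target: observation is 1 after an even number of a's, else 0.
T = {w: int(len(w) % 2 == 0) for w in ["", "a", "aa"]}
A, E = ["a"], [""]

print(is_closed(T, [""], A, E))          # False: row("a") matches no state row
print(is_closed(T, ["", "a"], A, E))     # True once "a" is added to S
print(is_consistent(T, ["", "a"], A, E)) # True: the two state rows differ
```

When the table is not closed, the offending sa is moved into S; when it is not consistent, a new experiment is added to E.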

L* Algorithm

• Initialize T for λ and each action in A

• Loop
    Use membership queries to make T complete, closed, and consistent
    If EQ(T) = w /* a counterexample */ then add w and all its prefixes to S
  Until EQ(T) = yes

• A counterexample ftf for M3 (Fig 5.3): the model ends at rose, but the real observation is volcano

• An inconsistency in T4 (Tab 5.5), where row(f) = row(ft), but row(ff) ≠ row(ftf).

• L* is limited by its need for a reset button

• Homing sequence h: if two observation sequences of executing h are the same, then these two sequences lead to the same state

• Let q<h> be the observation sequence and qh the ending state of executing h from state q; then h is defined by

• for all p, q: [p<h> = q<h>] ⇒ [ph = qh]

• e.g., {fwd} is a homing seq for Little Prince
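
The definition can be tested directly on a small machine. The three-state machine below is invented for the illustration (it is not the Little Prince environment), and the sketch assumes that q<h> includes the observation at the starting state.

```python
def run(delta, obs, q, h):
    """Execute action sequence h from state q; return (observation sequence, end state)."""
    seen = [obs[q]]
    for a in h:
        q = delta[(q, a)]
        seen.append(obs[q])
    return tuple(seen), q


def is_homing(delta, obs, states, h):
    """h is homing if equal observation sequences always imply equal end states."""
    end_state = {}
    for q in states:
        seen, end = run(delta, obs, q, h)
        if end_state.setdefault(seen, end) != end:
            return False
    return True


# Hypothetical machine: one action "f" cycles 0 -> 1 -> 2 -> 0.
states = [0, 1, 2]
delta = {(0, "f"): 1, (1, "f"): 2, (2, "f"): 0}
obs = {0: "x", 1: "x", 2: "y"}     # states 0 and 1 look the same

print(is_homing(delta, obs, states, []))      # False: "x" alone is ambiguous
print(is_homing(delta, obs, states, ["f"]))   # True: xx, xy, yx identify the end state
```

After executing ["f"], the learner knows which state it is in from the observations alone, which is what lets a homing sequence stand in for a reset.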

• Every FDA has a homing sequence

• Can be constructed from an FDA by appending action sequences (of length < n) that distinguish a pair of states

• The length of this construction is n²

• There are FDA whose shortest h is n² long

• h can be used as a reset

• h cannot guarantee reaching a fixed state
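
The construction by appending distinguishing sequences can be sketched as follows. The machine is a made-up three-state example, the brute-force search for a distinguishing sequence is exponential and only meant for tiny machines, and the code assumes every pair of states is distinguishable.

```python
from itertools import product


def run(delta, obs, q, h):
    """Execute h from q; return (observation sequence, end state)."""
    seen = [obs[q]]
    for a in h:
        q = delta[(q, a)]
        seen.append(obs[q])
    return tuple(seen), q


def distinguish(delta, obs, actions, p, q, max_len):
    """Shortest action sequence giving different observations from p and q, or None."""
    for n in range(max_len + 1):
        for x in product(actions, repeat=n):
            if run(delta, obs, p, x)[0] != run(delta, obs, q, x)[0]:
                return list(x)
    return None


def build_homing(delta, obs, states, actions):
    """Extend h until equal observation sequences always end in the same state."""
    h = []
    while True:
        ends, clash = {}, None
        for q in states:
            seen, end = run(delta, obs, q, h)
            other = ends.setdefault(seen, end)
            if other != end:
                clash = (other, end)
                break
        if clash is None:
            return h
        x = distinguish(delta, obs, actions, *clash, max_len=len(states))
        if x is None:
            raise ValueError("two states cannot be told apart")
        h = h + x      # append the distinguishing actions


states = [0, 1, 2]
delta = {(0, "f"): 1, (1, "f"): 2, (2, "f"): 0}
obs = {0: "x", 1: "x", 2: "y"}
h = build_homing(delta, obs, states, ["f"])
print(h)   # ['f']
```

On this machine the empty sequence confuses states 0 and 1, one step of "f" tells them apart, and the result ["f"] is already a homing sequence.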

L* with a Homing Sequence h

• Every time a reset is needed, repeat h until you see the desired observation sequence

• Or for each possible observation sequence of h, make a copy of L* (see Fig 5.6)

• If h is not a homing sequence, then we may discover that the same observation sequence produced by executing h leads us to two different states, p and q, because there is a sequence of actions x such that p<x> ≠ q<x>

• Then a better approximation of a homing sequence is hx

L* + Learning h

• Assume a homing sequence h, initially h = λ

• When h is shown to be incorrect, extend h, discard all copies of L*, and start again

• When h is incorrect, there exists x such that qh<x> ≠ ph<x>, even though q<h> = p<h>

Learning h and the Model

• Rivest and Schapire’s algorithm (Fig 5.7)

• Little Prince Example (notice the inconsistency produced by ff in Fig 5.10)
