# Pattern Evaluation and Process Control

## Pattern Evaluation and Process Control

Wei-Min Shen

Information Sciences Institute

University of Southern California

UCLA Data Mining Short Course

### Outline

• Intuition of Interestingness

• Principles for Measuring Interestingness

• Existing Measurement Systems

• Minimal Description Length Principle

• Methods for Process Control

### Why Is a Pattern “Interesting”?

• I did not know X before

• It contradicts my thinking (surprise)

• It is supported by the majority of the data

• It is an exception of the usual cases

• Occam’s Razor: Simple is better

• More?

### The Types of Classification Rule

• Let h be a hypothesis and e the evidence; then, with respect to any given tuple, we have

• Characteristic rule: h → e

• Discriminate rule: e → h

• e and h can be interpreted as sets of tuples satisfying e and h respectively

### A Few Definitions

• Given a discriminate rule R: e → h

• |e| is the cover of the rule

• |h ∧ e|/|e| is the confidence, reliability, or certainty factor of the rule

• R is “X% complete” if |h ∧ e|/|h| = X% (e satisfies X% of |h|)

• R is “Y% discriminate” if |¬h ∧ e|/|¬h| = (100-Y)% (e satisfies (100-Y)% of |¬h|)
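
The definitions above can be sketched in a few lines, treating e and h as sets of tuple identifiers. The function name `rule_measures` and the ten-tuple universe are illustrative assumptions, not part of the course material.

```python
def rule_measures(universe, e, h):
    """Measures of the discriminate rule e -> h over sets of tuple ids."""
    not_h = universe - h
    cover = len(e)                            # |e|
    confidence = len(h & e) / len(e)          # |h and e| / |e|
    completeness = len(h & e) / len(h)        # |h and e| / |h|
    # "Y% discriminate" means e satisfies (100 - Y)% of not-h
    discrimination = 1 - len(not_h & e) / len(not_h)
    return cover, confidence, completeness, discrimination


# Hypothetical data: 10 tuples, 4 satisfy h, 4 satisfy e, 3 satisfy both.
universe = set(range(10))
h = {0, 1, 2, 3}
e = {0, 1, 2, 4}
cover, confidence, completeness, discrimination = rule_measures(universe, e, h)
```

Here the rule covers 4 tuples and is 75% reliable and 75% complete; e matches one of the six ¬h tuples, so the rule is about 83% discriminate.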

### Principles for Measuring “I”

• 1. I = 0 if h and e are statistically independent

• e and h have no relation at all

• 2. I monotonically with |he| when |h|, |¬h|, and |e| remain the same

• I relates to reliability

### Principles for Measuring “I”

• 3. I monotonically with |h|(or|e|) when |he|, |e| (or |h|), and |¬h| remain the same

• I relates to completeness

• 4. I monotonically with |e| when reliability |he|/|e|, |h|, and |¬h| remain the same

• I relates to cover when reliability is the same

### Treat Discriminate and Characteristic Rules Differently

• Principles 1,2,3,4 apply to both discriminate and characteristic rules

• 5. Treat discriminate and characteristic rules differently

| Rule | E | H | Discrim | Complete |
|---|---|---|---|---|
| A | Fever | Flu | 80% | 30% |
| B | Sneeze | Flu | 30% | 80% |

• As discriminate rule I(A) > I(B)

• As characteristic rule I(B) > I(A)

### Existing Measurement Systems

• RI (Piatetsky-Shapiro 91)

• J (Smyth and Goodman 92)

• CE (Hong and Mao 91)

• IC++ (Kamber and Shinghal 96)

### IC++ Measurement for Characteristic Rules

• Given h and e, let rule d: e → h and rule c: h → e

• Nec(d) = P(¬e|h)/P(¬e|¬h)

• Suf(d) = P(e|h)/P(e|¬h)

• for h → e, C++ = if 0 ≤ Nec(d) < 1 then (1-Nec(d))*P(h), else 0.

• for h → ¬e, C+- = if 0 ≤ Suf(d) < 1 then (1-Suf(d))*P(h), else 0.

• for ¬h → e, C-+ = if 0 < Nec(d) < ∞ then (1-1/Nec(d))*P(¬h), else 0.

• for ¬h → ¬e, C-- = if 0 < Suf(d) < ∞ then (1-1/Suf(d))*P(¬h), else 0.

### Minimal Description Length Principle

• The goodness of a theory or hypothesis (H) relative to a set of data (D) is measured by

• The sum of

• The length of H

• The length of the explanation of D using H

• Assuming both use the optimal coding scheme

### The Derivation of MDL

• Based on probability theory, the best hypothesis H with respect to D is:

• the max of P(H)P(D|H)

• or the max of logP(H) + logP(D|H)

• or the min of -logP(H) - logP(D|H)

• Since the optimal code length of an element is determined by its probability (about -log P bits), we have MDL

• the min of |coding1(H)| + |coding2(D|H)|
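
A small numeric sketch of this equivalence follows, assuming two hypothetical coin hypotheses with made-up priors and a made-up data set of 8 heads and 2 tails. It only checks that minimizing -log2 P(H) - log2 P(D|H) selects the same hypothesis as maximizing P(H)P(D|H).

```python
import math


def description_length(prior, likelihood):
    """Total code length in bits: -log2 P(H) - log2 P(D|H)."""
    return -math.log2(prior) - math.log2(likelihood)


def coin_likelihood(p_heads, heads, tails):
    """P(D|H) for one fixed flip sequence under a coin with P(heads) = p_heads."""
    return p_heads ** heads * (1 - p_heads) ** tails


heads, tails = 8, 2                 # hypothetical data D
hypotheses = {                      # name: (assumed prior P(H), P(heads))
    "fair": (0.75, 0.5),
    "biased": (0.25, 0.8),
}

lengths = {
    name: description_length(prior, coin_likelihood(p, heads, tails))
    for name, (prior, p) in hypotheses.items()
}
joint = {
    name: prior * coin_likelihood(p, heads, tails)
    for name, (prior, p) in hypotheses.items()
}

best_by_mdl = min(lengths, key=lengths.get)
best_by_probability = max(joint, key=joint.get)
```

The biased coin pays 2 bits for its lower prior but saves almost 3 bits when encoding the data, so it wins under both criteria.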

### An Illustration of MDL

One line theory:

explanation length = 294.9

Two line theory:

explanation length = 298.7

### Fit Points with Lines

• Theory = lines (#,angle,length,center)

• Explanation: for each point:

• the line it belongs to

• the position on the line

• the distance to line

• Notice that the current coding is (x,y)

• It is different if we choose coding (r,theta)

### Process Control

• The Goal: to predict the future from the past

• The Given: the past data sequence

• The methods:

• Chaotic theory

• State Machines

### Chaotic Theory

• The data sequence may appear chaotic

• The underlying model may be very simple

• Extremely sensitive to initial conditions

• Difficult to make long term prediction

• Short term prediction is possible

### An Example Chaotic Sequence

[Figure: the sequence s(k), with values between 0.0 and 1.0, plotted against time step k from 0 to 100]

The simple logistic map model:

$s_{k+1} = a\,s_k(1 - s_k)$, where $a = 4$
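
The sketch below iterates this map and compares two runs whose starting values differ by 1e-9, to illustrate why short-term prediction is feasible while long-term prediction is not. The starting value 0.3, the perturbation size, and the function name are arbitrary choices for the illustration.

```python
def logistic_sequence(s0, a=4.0, n=100):
    """Return [s_0, s_1, ..., s_n] with s_{k+1} = a * s_k * (1 - s_k)."""
    seq = [s0]
    for _ in range(n):
        seq.append(a * seq[-1] * (1 - seq[-1]))
    return seq


run_a = logistic_sequence(0.3)
run_b = logistic_sequence(0.3 + 1e-9)   # almost identical initial condition

# Gap after a few steps versus the largest gap late in the run
short_term_gap = abs(run_a[5] - run_b[5])
long_term_gap = max(abs(x - y) for x, y in zip(run_a[40:], run_b[40:]))
```

After five steps the two runs are still nearly indistinguishable, but by step 40 and beyond they are effectively unrelated.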

### Steps of Using Chaotic Theory

• Reconstruction of state space:

• $\mathbf{x}_k = [x_k, x_{k-\tau}, \dots, x_{k-(m-1)\tau}]^T$

• where $\tau$ is a time delay and $m$ is the embedding dimension

• Takens' theorem: one can always find an embedding dimension $m \le 2[d]+1$, where $[d]$ is the integer part of the attractor's dimension, to preserve the invariant measures

• Central task: choose $m$ and $\tau$
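
A minimal sketch of the state-space reconstruction step, assuming the sequence is a plain Python list and that m and the delay have already been chosen. The helper name `delay_embed` is hypothetical.

```python
def delay_embed(x, m, tau):
    """Build delay vectors [x[k], x[k - tau], ..., x[k - (m - 1) * tau]].

    One vector is produced for every k that has enough history.
    """
    first_k = (m - 1) * tau
    return [[x[k - j * tau] for j in range(m)] for k in range(first_k, len(x))]


# Toy sequence so the indexing is easy to follow
vectors = delay_embed([0, 1, 2, 3, 4, 5], m=3, tau=2)
```

With m = 3 and a delay of 2, the first usable index is k = 4, giving the vectors [4, 2, 0] and [5, 3, 1].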

### State Machine Approach

• Identify the number of states by clustering all points in the sequence

• Construct a transition function by learning from the sequence

### Construction & Synchronization

• Environment = (A, P, Q, r) where |P|<|Q|

• Model = (A, P, S, t)

• Visibly equivalent

• Perfect

• Synchronized

• The Construction problem

• when and how to construct new model states

• The Synchronization problem

• how to determine which model state is current

### Learning with a Reset Button

• Two environmental states p and q (they may appear the same to the learner) are different if and only if there exists a sequence e of actions that leads from p and q to states that are visibly different

• The interaction with the environment

• Membership Query

• Equivalence Query: “yes” or a counterexample

### Observation Table

• Model states: {row(s) : s in S}

• Initial state: row(λ)

• Final states: {row(s) : s in S and T(s) = 1}

• Transitions: δ(row(s), a) = row(sa)

• Closed table: ∀s, a ∃s′: row(sa) = row(s′)

• Consistent table: row(s) = row(s′) ⇒ row(sa) = row(s′a)

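[Figure: the observation table T. Columns are indexed by E (experiments). The upper rows are indexed by S (states, given as action sequences from the initial state) and the lower rows by S×A (transitions). The entries are the observations.]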
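
The closed and consistent conditions can be checked mechanically. The sketch below assumes a toy membership oracle (strings over {a, b} with an even number of a's) standing in for the environment; the oracle, the alphabet, and the function names are illustrative assumptions rather than anything from the course text.

```python
from itertools import product

ALPHABET = "ab"


def member(w):
    """Hypothetical membership oracle: accept strings with an even number of 'a'."""
    return w.count("a") % 2 == 0


def row(s, E):
    """Row of the table for access string s: one observation per experiment in E."""
    return tuple(member(s + e) for e in E)


def is_closed(S, E):
    """Every one-step extension row(sa) already appears as some row(s')."""
    rows_of_S = {row(s, E) for s in S}
    return all(row(s + a, E) in rows_of_S for s in S for a in ALPHABET)


def is_consistent(S, E):
    """Equal rows stay equal after appending any single action."""
    for s1, s2 in product(S, repeat=2):
        if row(s1, E) == row(s2, E):
            if any(row(s1 + a, E) != row(s2 + a, E) for a in ALPHABET):
                return False
    return True


E = [""]                      # the empty experiment only
closed_before = is_closed([""], E)
closed_after = is_closed(["", "a"], E)
consistent_after = is_consistent(["", "a"], E)
```

With S containing only the empty string the table is not closed, because row(a) differs from every row in S; adding "a" to S closes it.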

### L* Algorithm

• Initialize T for λ and each action in A

• Loop:

• Use membership queries to make T complete, closed, and consistent

• If EQ(T) = w (a counterexample), then add w and all its prefixes to S

• Until EQ(T) = yes

### The Little Prince Example

• A counterexample ftf for M3 (Fig 5.3): the model ends at rose, but the real observation is volcano

• An inconsistency in T4 (Tab 5.5), where row(f) = row(ft), but row(ff) ≠ row(ftf).

### Homing Sequence

• L* is limited by a reset button

• Homing Sequence h: if two observation sequences of executing h are the same, then these two sequences lead to the same state

• Let q<h> be the observation sequence and qh the ending state; then h is defined as

• for all p, q: [p<h> = q<h>] ⇒ [ph = qh]

• e.g., {fwd} is a homing seq for Little Prince
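
The definition can be tested by brute force on a small automaton. The three-state machine below is a made-up example, not the Little Prince world, and the sketch assumes that q<h> records the observation seen after each action.

```python
def run(delta, obs, q, h):
    """Execute action sequence h from state q; return (q<h>, qh)."""
    seen = []
    for a in h:
        q = delta[q][a]
        seen.append(obs[q])
    return tuple(seen), q


def is_homing(delta, obs, h):
    """True if equal observation sequences always imply equal ending states."""
    end_state_for = {}
    for q in delta:
        seen, end = run(delta, obs, q, h)
        # Remember the first ending state for this observation sequence
        if end_state_for.setdefault(seen, end) != end:
            return False
    return True


# Hypothetical automaton: a 3-cycle under the single action "x";
# states 0 and 1 look alike ("A"), state 2 looks different ("B").
delta = {0: {"x": 1}, 1: {"x": 2}, 2: {"x": 0}}
obs = {0: "A", 1: "A", 2: "B"}

one_step = is_homing(delta, obs, ["x"])
two_steps = is_homing(delta, obs, ["x", "x"])
```

A single x is not homing here, since starting from state 0 or state 2 both yield the observation A but end in different states; two steps produce a distinct observation sequence for each start state.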

### Properties of Homing Seq

• Every FDA has a homing sequence

• One can be constructed from an FDA by repeatedly appending action sequences (each shorter than n) that distinguish a pair of states

• The length of this construction is n²

• There are FDAs whose shortest h is n² long

• h can be used as a reset

• h cannot guarantee reaching a fixed state

### L* with a Homing Sequence h

• Every time a reset is needed, repeat h until you see the desired observation sequence

• Or for each possible observation sequence of h, make a copy of L* (see Fig 5.6)

### Learning the Homing Sequence

• If h is not a homing sequence, then we may discover that the same observation sequence produced by executing h leads to two different states, p and q, because there is a sequence of actions x such that p<x> ≠ q<x>

• then a better approximation of the homing sequence is hx

### L* + Learning h

• Assume a homing sequence h, initially h = λ

• When h is shown to be incorrect, extend h, discard all copies of L*, and start again

• When h is incorrect, there exists x such that qh<x> ≠ ph<x>, even if q<h> = p<h>

### Learning h and the Model

• Rivest and Schapire’s algorithm (Fig 5.7)

• Little Prince Example (notice the inconsistency produced by ff in Fig 5.10)
