Pattern Evaluation and Process Control

Wei-Min Shen

Information Sciences Institute

University of Southern California

UCLA Data Mining Short Course

Outline

- Intuition of Interestingness
- Principles for Measuring Interestingness
- Existing Measurement Systems
- Minimal Description Length Principle
- Methods for Process Control

Why Is a Pattern “Interesting”?

- I did not know X before
- It contradicts my thinking (surprise)
- It is supported by the majority of the data
- It is an exception to the usual cases
- Occam’s Razor: Simple is better
- More?

The Types of Classification Rule

- Let h be a hypothesis and e the evidence; then, with respect to any given tuple, we have
- Characteristic rule: h → e
- Discriminate rule: e → h

- e and h can be interpreted as sets of tuples satisfying e and h respectively

A Few Definitions

- Given a discriminate rule R: e → h
- |e| is the cover of the rule
- |h ∧ e|/|e| is the confidence, reliability, or certainty factor of the rule

- R is “X% complete” if |h ∧ e|/|h| = X% (e satisfies X% of |h|)
- R is “Y% discriminate” if |¬h ∧ e|/|¬h| = (100−Y)% (e satisfies (100−Y)% of |¬h|)
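
The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `rule_stats`, the predicate arguments, and the toy patient data are all hypothetical and not from the slides.

```python
def rule_stats(tuples, e, h):
    """Cover, confidence, completeness and discrimination of the rule e -> h.

    `e` and `h` are predicates over tuples; each is read as the set of
    tuples that satisfy it.
    """
    E = {t for t in tuples if e(t)}
    H = {t for t in tuples if h(t)}
    not_H = set(tuples) - H
    cover = len(E)                                     # |e|
    confidence = len(H & E) / len(E)                   # |h and e| / |e|
    completeness = len(H & E) / len(H)                 # |h and e| / |h|
    discrimination = 1 - len(not_H & E) / len(not_H)   # Y, as a fraction
    return cover, confidence, completeness, discrimination


# Hypothetical toy data: (patient id, has_fever, has_flu)
patients = [
    (1, True, True), (2, True, True), (3, True, False), (4, False, True),
    (5, False, False), (6, False, False), (7, False, False), (8, False, False),
]
cover, confidence, completeness, discrimination = rule_stats(
    patients, e=lambda t: t[1], h=lambda t: t[2]
)
```

On this toy data the rule fever → flu covers 3 tuples, and one of the five non-flu patients has a fever, so the rule is 80% discriminate.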

Principles for Measuring “I”

- 1. I = 0 if h and e are statistically independent
- e and h have no relation at all

- 2. I increases monotonically with |h ∧ e| when |h|, |¬h|, and |e| remain the same
- I relates to reliability

Principles for Measuring “I”

- 3. I decreases monotonically with |h| (or |e|) when |h ∧ e|, |e| (or |h|), and |¬h| remain the same
- I relates to completeness

- 4. I increases monotonically with |e| when reliability |h ∧ e|/|e|, |h|, and |¬h| remain the same
- I relates to cover when reliability is the same

Treat Discriminate and Characteristic Rules Differently

- Principles 1,2,3,4 apply to both discriminate and characteristic rules
- 5. Treat discriminate and characteristic rules differently

| Rule | E | H | Discrim | Complete |
| --- | --- | --- | --- | --- |
| A | Fever | Flu | 80% | 30% |
| B | Sneeze | Flu | 30% | 80% |

- As discriminate rules, I(A) > I(B)
- As characteristic rules, I(B) > I(A)

Existing Measurement Systems

- RI (Piatetsky-Shapiro 91)
- J (Smyth and Goodman 92)
- CE (Hong and Mao 91)
- IC++ (Kamber and Shinghal 96)
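
As a concrete instance of such a measure, RI is commonly stated as |h ∧ e| − |h|·|e|/N, where N is the total number of tuples. The slides do not give this formula, so treat the sketch below as an assumption; the function name `rule_interest` and the counts are hypothetical.

```python
def rule_interest(n_he, n_h, n_e, n_total):
    """RI as commonly stated: observed |h and e| minus the count expected
    if h and e were statistically independent."""
    return n_he - n_h * n_e / n_total


independent = rule_interest(n_he=10, n_h=20, n_e=50, n_total=100)  # 10 expected, 10 seen
more_overlap = rule_interest(n_he=15, n_h=20, n_e=50, n_total=100)
larger_h = rule_interest(n_he=15, n_h=40, n_e=50, n_total=100)
```

With these counts the value is zero under independence, grows when |h ∧ e| grows, and shrinks when |h| grows while the overlap stays fixed.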

IC++ Measurement for Characteristic Rules

- Given h, e, let rule d: e → h and rule c: h → e
- Nec(d) = P(¬e|h)/P(¬e|¬h)
- Suf(d) = P(e|h)/P(e|¬h)
- for h → e, C++ = if 0 ≤ Nec(d) < 1 then (1 − Nec(d))·P(h), else 0.
- for h → ¬e, C+− = if 0 ≤ Suf(d) < 1 then (1 − Suf(d))·P(h), else 0.
- for ¬h → e, C−+ = if 1 < Nec(d) < ∞ then (1 − 1/Nec(d))·P(¬h), else 0.
- for ¬h → ¬e, C−− = if 1 < Suf(d) < ∞ then (1 − 1/Suf(d))·P(¬h), else 0.
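
A minimal sketch of these four measures, computed from P(h), P(e|h) and P(e|¬h). The function names and the example probabilities are hypothetical. The sketch assumes the two ¬h cases apply only when the ratio is greater than 1, which keeps every measure non-negative.

```python
import math


def _ratio(num, den):
    """num/den, with infinity for a positive numerator over zero."""
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def ic_measures(p_h, p_e_given_h, p_e_given_not_h):
    """The four measures for the characteristic rules built from h and e."""
    nec = _ratio(1 - p_e_given_h, 1 - p_e_given_not_h)   # Nec(d)
    suf = _ratio(p_e_given_h, p_e_given_not_h)           # Suf(d)
    p_not_h = 1 - p_h
    return {
        "C++": (1 - nec) * p_h if 0 <= nec < 1 else 0.0,                # h -> e
        "C+-": (1 - suf) * p_h if 0 <= suf < 1 else 0.0,                # h -> not e
        # Assumption: lower bound 1 for the two not-h cases.
        "C-+": (1 - 1 / nec) * p_not_h if 1 < nec < math.inf else 0.0,  # not h -> e
        "C--": (1 - 1 / suf) * p_not_h if 1 < suf < math.inf else 0.0,  # not h -> not e
    }


scores = ic_measures(p_h=0.2, p_e_given_h=0.8, p_e_given_not_h=0.1)
independent = ic_measures(p_h=0.2, p_e_given_h=0.5, p_e_given_not_h=0.5)
```

When e is independent of h, both ratios equal 1 and all four measures are 0, in line with the first principle for measuring I.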

Minimal Description Length Principle

- The goodness of a theory or hypothesis (H) relative to a set of data (D) is measured by the sum of
- The length of H
- The length of the explanation of D using H

- Assuming both use the optimal coding scheme

The Derivation of MDL

- Based on probability theory, the best hypothesis H with respect to D is:
- the max of P(H)P(D|H)
- or the max of logP(H) + logP(D|H)
- or the min of -logP(H) - logP(D|H)

- Since the length of an optimal encoding of an element is related to its probability (about −log P bits), we have MDL:
- the min of |coding1(H)| + |coding2(D|H)|
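
The equivalence between maximizing P(H)P(D|H) and minimizing the total code length can be checked numerically. The sketch below is illustrative only: the two coin hypotheses, their priors, and the data string are made up, and code lengths are taken to be −log2 of the probability.

```python
import math


def coin_likelihood(p_heads, data):
    """P(D|H) for a string of 'H'/'T' flips under a coin with P(heads) = p_heads."""
    heads = data.count("H")
    tails = len(data) - heads
    return p_heads ** heads * (1 - p_heads) ** tails


def description_length(prior, likelihood):
    """Bits to encode H plus bits to encode D given H."""
    return -math.log2(prior) - math.log2(likelihood)


data = "HHHTHHHH"
# Hypothetical hypotheses: name -> (prior P(H), P(heads) under H)
hypotheses = {"fair": (0.9, 0.5), "biased": (0.1, 0.875)}

posteriors = {n: p * coin_likelihood(ph, data) for n, (p, ph) in hypotheses.items()}
lengths = {n: description_length(p, coin_likelihood(ph, data))
           for n, (p, ph) in hypotheses.items()}

best_by_probability = max(posteriors, key=posteriors.get)
best_by_length = min(lengths, key=lengths.get)
```

Both criteria pick the same hypothesis, because −log2 is a decreasing function of P(H)P(D|H).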

An Illustration of MDL

- One-line theory: explanation length = 294.9
- Two-line theory: explanation length = 298.7

Fit Points with Lines

- Notice that the current coding is (x, y). It is different if we choose the coding (r, theta).

- Theory = lines (#,angle,length,center)
- Explanation: for each point:
- the line it belongs to
- the position on the line
- the distance to line

Process Control

- The Goal: to predict the future from the past
- The Given: the past data sequence
- The methods:
- Adaptive Control Theory
- Chaotic theory
- State Machines

Chaotic Theory

- The data sequence may appear chaotic
- The underlying model may be very simple
- Extremely sensitive to initial conditions
- Difficult to make long term prediction
- Short term prediction is possible

An Example Chaotic Sequence

[Figure: the sequence s(k) plotted against time step k, for k up to 100, with s(k) ranging from 0.0 to 1.0]

The simple logistic map model:

s(k+1) = a·s(k)·(1 − s(k)), where a = 4
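
The logistic map is easy to simulate. The sketch below iterates it from two starting points that differ by 1e-9 to show why short-term prediction is feasible while long-term prediction is not. The starting value 0.3, the perturbation size, and the function name are arbitrary choices, not from the slides.

```python
def logistic_map(s0, a=4.0, steps=100):
    """Iterate s(k+1) = a * s(k) * (1 - s(k)) and return [s(0), ..., s(steps)]."""
    seq = [s0]
    for _ in range(steps):
        seq.append(a * seq[-1] * (1 - seq[-1]))
    return seq


base = logistic_map(0.3)
nudged = logistic_map(0.3 + 1e-9)   # almost the same initial condition

short_gap = abs(base[5] - nudged[5])                                  # after 5 steps
long_gap = max(abs(x - y) for x, y in zip(base[40:], nudged[40:]))    # steps 40..100
```

After five steps the two trajectories are still nearly identical, but by the later steps they have separated to a gap comparable to the full range of s(k).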

Steps of Using Chaotic Theory

- Reconstruction of state space:
- X(k) = [x(k), x(k−τ), …, x(k−(m−1)τ)]^T
- where τ is a time delay and m is the embedding dimension

- Takens’ theorem: one can always find an embedding dimension m ≤ 2[d]+1, where [d] is the integer part of the attractor’s dimension, to preserve the invariant measures
- Central task: choose m and τ
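
The state-space reconstruction step can be sketched as a delay embedding: each state vector collects the current sample and m − 1 earlier samples spaced by the time delay. The function name `delay_embed` and the toy series are hypothetical. The sketch does not address how to choose m and the delay.

```python
def delay_embed(series, m, tau):
    """Return the delay vectors [x(k), x(k - tau), ..., x(k - (m - 1) * tau)]
    for every k that has enough history."""
    start = (m - 1) * tau
    return [[series[k - j * tau] for j in range(m)]
            for k in range(start, len(series))]


series = list(range(10))            # toy data: x(k) = k
vectors = delay_embed(series, m=3, tau=2)
first, last = vectors[0], vectors[-1]
```

With m = 3 and a delay of 2, the first usable index is k = 4, so `first` is [4, 2, 0] and `last` is [9, 7, 5].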

State Machine Approach

- Identify the number of states by clustering all points in the sequence
- Construct a transition function by learning from the sequence
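
A rough sketch of these two steps follows. As a simplification it replaces clustering with equal-width binning of the values, and it learns the transition function by taking the most frequent successor of each state. Both choices, the function names, and the toy sequence are assumptions for illustration.

```python
from collections import Counter, defaultdict


def to_states(sequence, n_states):
    """Map each value to a state index by equal-width binning
    (a stand-in for a real clustering step)."""
    lo, hi = min(sequence), max(sequence)
    width = (hi - lo) / n_states or 1.0      # avoid zero width on constant data
    return [min(int((x - lo) / width), n_states - 1) for x in sequence]


def learn_transitions(states):
    """For each state, keep the successor state seen most often."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


sequence = [0.1, 0.9, 0.5, 0.1, 0.9, 0.5, 0.1, 0.9]
states = to_states(sequence, n_states=3)
transitions = learn_transitions(states)
prediction = transitions[states[-1]]        # predicted next state
```

The toy sequence cycles low, high, middle, so the learned machine predicts the middle state after the final high value.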

Construction & Synchronization

- Environment = (A, P, Q, r) where |P|<|Q|
- Model = (A, P, S, t)
- Visibly equivalent
- Perfect
- Synchronized

- The Construction problem
- when and how to construct new model states

- The Synchronization problem
- how to determine which model state is current

Learning with a Reset Button

- Two environmental states p and q (they may appear the same to the learner) are different if and only if there exists a sequence e of actions that leads from p and q to states that are visibly different
- The interaction with the environment
- Membership Query
- Equivalence Query: “yes” or a counterexample

Observation Table

- Model states: {row(s) : s in S}
- Initial state: row(λ)
- Final states: {row(s) : s in S and T(s) = 1}
- Transitions: δ(row(s), a) = row(sa)
- Closed table: ∀s, a ∃s’: row(sa) = row(s’)
- Consistent table: row(s) = row(s’) ⇒ row(sa) = row(s’a)

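[Figure: the observation table T. The columns are the experiments E. The upper rows are the states S (action sequences from the initial state) and the lower rows are the transitions S×A. The entries are the observations.]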
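
The closed and consistent conditions can be written as two small checks. In this sketch action sequences are Python strings, the empty string stands for the empty sequence, and the table entry T(w) is supplied by a hypothetical `member` function (here: w contains an even number of a's).

```python
def row(s, E, member):
    """row(s): the observations T(s + e) for every experiment e in E."""
    return tuple(member(s + e) for e in E)


def is_closed(S, A, E, member):
    """Every row(sa) already appears as row(s') for some s' in S."""
    rows_of_S = {row(s, E, member) for s in S}
    return all(row(s + a, E, member) in rows_of_S for s in S for a in A)


def is_consistent(S, A, E, member):
    """Equal rows stay equal after any single action."""
    return all(row(s1 + a, E, member) == row(s2 + a, E, member)
               for s1 in S for s2 in S
               if row(s1, E, member) == row(s2, E, member)
               for a in A)


def member(word):
    # Hypothetical target behaviour, used only for this example.
    return word.count("a") % 2 == 0


closed_small = is_closed([""], "ab", [""], member)        # row("a") is missing
closed_big = is_closed(["", "a"], "ab", [""], member)
consistent_big = is_consistent(["", "a"], "ab", [""], member)
```

With S containing only the empty sequence the table is not closed, because row("a") matches no row in S. Adding "a" to S closes it.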

L* Algorithm

- Initialize T for λ and each action in A
- Loop
  - Use membership queries to make T complete, closed, and consistent
  - If EQ(T) = w /* a counterexample */ then add w and all its prefixes into S
- Until EQ(T) = yes
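
The loop can be sketched compactly in Python. This is a simplified illustration, not the slides' implementation: sequences are strings, the target (number of a's divisible by 3) is hypothetical, and the equivalence query is approximated by brute-force comparison on all words up to length 6.

```python
from itertools import product


def lstar(alphabet, member, equivalent):
    """Learn an acceptor from membership and equivalence queries."""
    S, E = [""], [""]            # "" plays the role of the empty sequence

    def row(s):
        return tuple(member(s + e) for e in E)

    while True:
        changed = True
        while changed:           # repair the table until closed and consistent
            changed = False
            rows_of_S = {row(s) for s in S}
            for s, a in product(S, alphabet):
                if row(s + a) not in rows_of_S:        # not closed
                    S.append(s + a)
                    changed = True
                    break
            if changed:
                continue
            for s1, s2, a, e in product(S, S, alphabet, E):
                if row(s1) == row(s2) and member(s1 + a + e) != member(s2 + a + e):
                    E.append(a + e)                    # not consistent
                    changed = True
                    break

        rep = {row(s): s for s in reversed(S)}         # one representative per state

        def accepts(word):
            state = row("")
            for a in word:
                state = row(rep[state] + a)
            return state[0]                            # E[0] == "", so this is T(s)

        w = equivalent(accepts)
        if w is None:                                  # EQ(T) = yes
            return accepts, S, E
        for i in range(1, len(w) + 1):                 # add w and all its prefixes
            if w[:i] not in S:
                S.append(w[:i])


def member(word):
    # Hypothetical target: the number of a's is divisible by 3.
    return word.count("a") % 3 == 0


def equivalent(accepts, max_len=6):
    """Stand-in equivalence query: compare on all words up to max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if accepts(word) != member(word):
                return word
    return None


learned, S, E = lstar("ab", member, equivalent)
```

On this target the first hypothesis has two states, the counterexample "aaa" is returned, and the resulting inconsistency between row("a") and row("aa") adds the experiment "a", which yields the correct three-state model.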

The Little Prince Example

- A counterexample ftf for M3 (Fig 5.3): the model ends at rose, but the real observation is volcano
- An inconsistency in T4 (Tab 5.5), where row(f) = row(ft), but row(ff) ≠ row(ftf).

Homing Sequence

- L* is limited by a reset button
- Homing Sequence h: if two observation sequences of executing h are the same, then these two sequences lead to the same state
- Let q<h> be the observation sequence and qh the ending state of executing h from state q; then h is a homing sequence if
- for all p, q: [p<h> = q<h>] ⇒ [ph = qh]
- e.g., {fwd} is a homing seq for Little Prince
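
The definition translates directly into a check over all starting states. The machine below is a hypothetical four-state ring with one action "f" and is not the Little Prince environment from the course. The sketch also assumes that the observation sequence includes the observation at the starting state.

```python
def run(q, h, delta, obs):
    """Execute h from state q; return (observation sequence, ending state)."""
    seen = [obs[q]]
    for a in h:
        q = delta[(q, a)]
        seen.append(obs[q])
    return tuple(seen), q


def is_homing(h, states, delta, obs):
    """h is homing if equal observation sequences imply equal ending states."""
    end_for = {}
    for q in states:
        seen, end = run(q, h, delta, obs)
        if end_for.setdefault(seen, end) != end:
            return False
    return True


# Hypothetical environment: a ring 0 -> 1 -> 2 -> 3 -> 0 under action "f";
# only state 0 shows "rose", every other state shows "nothing".
states = [0, 1, 2, 3]
delta = {(q, "f"): (q + 1) % 4 for q in states}
obs = {0: "rose", 1: "nothing", 2: "nothing", 3: "nothing"}

one_step = is_homing("f", states, delta, obs)
two_steps = is_homing("ff", states, delta, obs)
```

In this ring "f" is not homing, since starting in state 1 or 2 gives the same observations but different ending states, while "ff" is.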

Properties of Homing Seq

- Every FDA has a homing sequence
- Can be constructed from an FDA by repeatedly appending an action sequence (of length < n) that distinguishes a pair of states
- The sequence constructed this way has length at most n²
- There are FDA whose shortest h is on the order of n² long
- h can be used as a reset
- h cannot guarantee reaching a fixed state
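
The construction can be sketched as follows, assuming the machine is fully known and reduced (every pair of distinct states can be told apart). The four-state ring is hypothetical, and the breadth-first search for a distinguishing sequence is one possible way to find the appended actions, not necessarily the one used in the course.

```python
from collections import deque


def run(q, h, delta, obs):
    """Execute h from q; return (observation sequence, ending state)."""
    seen = [obs[q]]
    for a in h:
        q = delta[(q, a)]
        seen.append(obs[q])
    return tuple(seen), q


def distinguishing(p, q, actions, delta, obs):
    """Shortest action sequence after which p and q look different (BFS)."""
    queue, visited = deque([(p, q, "")]), {(p, q)}
    while queue:
        u, v, x = queue.popleft()
        if obs[u] != obs[v]:
            return x
        for a in actions:
            nxt = (delta[(u, a)], delta[(v, a)])
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt[0], nxt[1], x + a))
    raise ValueError("states are indistinguishable; machine is not reduced")


def build_homing(states, actions, delta, obs):
    """Extend h until equal observations always imply equal ending states."""
    h = ""
    while True:
        ends, clash = {}, None
        for q in states:
            seen, end = run(q, h, delta, obs)
            if ends.setdefault(seen, end) != end:
                clash = (ends[seen], end)
                break
        if clash is None:
            return h
        h += distinguishing(clash[0], clash[1], actions, delta, obs)


# Hypothetical ring: 0 -> 1 -> 2 -> 3 -> 0 under "f"; only state 0 shows "rose".
states = [0, 1, 2, 3]
delta = {(q, "f"): (q + 1) % 4 for q in states}
obs = {0: "rose", 1: "nothing", 2: "nothing", 3: "nothing"}
h = build_homing(states, "f", delta, obs)
```

Starting from the empty sequence, states 1 and 2 look alike, the search finds "ff" to tell them apart, and "ff" then passes the homing check.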

L* with a Homing Sequence h

- Every time a reset is needed, repeat h until you see the desired observation sequence
- Or for each possible observation sequence of h, make a copy of L* (see Fig 5.6)

Learning the Homing Sequence

- If h is not a homing sequence, then we may discover that the same observation sequence produced by executing h leads to two different states, p and q, because there is a sequence of actions x such that p<x> ≠ q<x>
- Then a better approximation of the homing sequence is hx

L* + Learning h

- Assume a homing sequence h, initially h = λ (the empty sequence)
- When h is shown to be incorrect, extend h, discard all copies of L*, and start again
- When h is incorrect, there exists x such that qh<x> ≠ ph<x>, even if q<h> = p<h>

Learning h and the Model

- Rivest and Schapire’s algorithm (Fig 5.7)
- Little Prince Example (notice the inconsistency produced by ff in Fig 5.10)
