Pattern Evaluation and Process Control

Wei-Min Shen

Information Sciences Institute

University of Southern California

UCLA Data Mining Short Course


Outline

  • Intuition of Interestingness

  • Principles for Measuring Interestingness

  • Existing Measurement Systems

  • Minimal Description Length Principle

  • Methods for Process Control

Why Is a Pattern “Interesting”?

  • I did not know X before

  • It contradicts my thinking (surprise)

  • It is supported by the majority of the data

  • It is an exception to the usual cases

  • Occam’s Razor: Simple is better

  • More?

The Types of Classification Rule

  • Let h be a hypothesis and e the evidence; then, with respect to any given tuple, we have

    • Characteristic rule: h → e

    • Discriminate rule: e → h

  • e and h can also be interpreted as the sets of tuples satisfying e and h, respectively

A Few Definitions

  • Given a discriminate rule R: e → h

    • |e| is the cover of the rule

    • |h ∩ e| / |e| is the confidence, reliability, or certainty factor of the rule

  • R is “X% complete” if |h ∩ e| / |h| = X% (e covers X% of the tuples in h)

  • R is “Y% discriminate” if |¬h ∩ e| / |¬h| = (100 − Y)% (e covers (100 − Y)% of the tuples in ¬h)
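
A minimal sketch of these measures in Python, representing e and h as the sets of tuples satisfying them (the function name and the example numbers are illustrative, not from the slides):

    def rule_measures(e, h, universe):
        # e, h: sets of tuple ids satisfying the evidence / hypothesis
        not_h = universe - h
        cover = len(e)                                  # |e|
        confidence = len(h & e) / len(e)                # |h ∩ e| / |e|
        completeness = len(h & e) / len(h)              # X%: |h ∩ e| / |h|
        # "Y% discriminate" means |¬h ∩ e| / |¬h| = (100 - Y)%
        discriminability = 1 - len(not_h & e) / len(not_h)
        return cover, confidence, completeness, discriminability

    # 100 tuples; 40 satisfy h, 30 satisfy e, 24 satisfy both
    universe = set(range(100))
    h, e = set(range(40)), set(range(16, 46))
    print(rule_measures(e, h, universe))   # (30, 0.8, 0.6, 0.9)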

Principles for Measuring “I”

  • 1. I = 0 if h and e are statistically independent

    • e and h have no relation at all

  • 2. I monotonically with |he| when |h|, |¬h|, and |e| remain the same

    • I relates to reliability

Principles for Measuring “I” (cont.)

  • 3. I monotonically with |h|(or|e|) when |he|, |e| (or |h|), and |¬h| remain the same

    • I relates to completeness

  • 4. I increases monotonically with |e| when the reliability |h ∩ e| / |e|, |h|, and |¬h| remain the same

    • I relates to cover when reliability is the same

Treat Discriminate and Characteristic Rules Differently

  • Principles 1–4 apply to both discriminate and characteristic rules

  • 5. Treat discriminate and characteristic rules differently:

      Rule   E        H     Discriminate   Complete
      A      Fever    Flu   80%            30%
      B      Sneeze   Flu   30%            80%

  • As a discriminate rule, I(A) > I(B)

  • As a characteristic rule, I(B) > I(A)

Existing Measurement Systems

  • RI (Piatetsky-Shapiro 91)

  • J (Smyth and Goodman 92)

  • CE (Hong and Mao 91)

  • IC++ (Kamber and Shinghal 96)
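
The slide lists these systems without their formulas. For orientation, Piatetsky-Shapiro's rule-interest function RI is commonly stated as RI = |h ∩ e| − |e||h| / N, which is 0 under statistical independence (principle 1) and grows with |h ∩ e| (principle 2); a sketch under that reading:

    def rule_interest(n_he, n_e, n_h, n):
        # RI = |h ∩ e| - |e||h| / N: zero when e and h are independent,
        # positive when they co-occur more often than chance predicts
        return n_he - n_e * n_h / n

    print(rule_interest(n_he=24, n_e=30, n_h=40, n=100))   # 24 - 12 = 12.0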

IC++ Measurement for Characteristic Rules

  • Given h and e, let rule d: e → h and rule c: h → e

  • Nec(d) = P(¬e|h) / P(¬e|¬h)

  • Suf(d) = P(e|h) / P(e|¬h)

  • for h → e:   C++ = (1 − Nec(d))·P(h) if 0 ≤ Nec(d) < 1, else 0

  • for h → ¬e:  C+− = (1 − Suf(d))·P(h) if 0 ≤ Suf(d) < 1, else 0

  • for ¬h → e:  C−+ = (1 − 1/Nec(d))·P(¬h) if 1 < Nec(d) < ∞, else 0

  • for ¬h → ¬e: C−− = (1 − 1/Suf(d))·P(¬h) if 1 < Suf(d) < ∞, else 0

Minimal Description Length Principle

  • The goodness of a theory or hypothesis H relative to a set of data D is measured as:

    • the sum of

      • the length of H

      • the length of the explanation of D using H

    • assuming both use the optimal coding scheme

The Derivation of MDL

  • Based on probability theory, the best hypothesis H with respect to D is:

    • the max of P(H)P(D|H)

    • or the max of logP(H) + logP(D|H)

    • or the min of -logP(H) - logP(D|H)

  • Since the length of an optimal encoding is tied to probability (an optimal code gives an element of probability P a codeword of length −log P), we have MDL:

    • the min of |coding1(H)| + |coding2(D|H)|
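
A toy sketch of this selection rule (the candidate probabilities are made up for illustration):

    import math

    def description_length(p_h, p_d_given_h):
        # -log2 P(H) - log2 P(D|H) = |coding1(H)| + |coding2(D|H)| in bits,
        # assuming optimal (Shannon) codes on both parts
        return -math.log2(p_h) - math.log2(p_d_given_h)

    candidates = {
        "H1 (simple, fits loosely)":  (0.25, 0.10),
        "H2 (complex, fits tightly)": (0.01, 0.60),
    }
    for name, (p_h, p_dh) in candidates.items():
        print(name, round(description_length(p_h, p_dh), 2), "bits")
    # MDL picks the smaller total: H1 at 5.32 bits beats H2 at 7.38 bits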

An Illustration of MDL

[Figure: the same point set fit by one line and by two lines]

  • One-line theory: explanation length = 294.9

  • Two-line theory: explanation length = 298.7

Fit Points with Lines

  • Theory = lines (number of lines; each line’s angle, length, and center)

  • Explanation: for each point,

    • the line it belongs to

    • the position on the line

    • the distance to the line

  • Notice that the current coding of points is (x, y)

  • The result would differ if we chose the coding (r, θ) instead
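
A rough sketch of this two-part cost (every bit budget below is an illustrative placeholder, not the slides' actual encoding):

    import math

    def mdl_cost(n_lines, residuals, bits_per_line=40.0,
                 bits_per_position=8.0, bits_per_unit_error=2.0):
        # Theory: the lines themselves.  Explanation, per point: which line
        # it belongs to, its position on the line, its distance to the line.
        theory = n_lines * bits_per_line
        line_id = math.log2(n_lines) if n_lines > 1 else 0.0
        explanation = sum(line_id + bits_per_position
                          + bits_per_unit_error * abs(r) for r in residuals)
        return theory + explanation

    # One loose line vs. two tight lines over the same 20 points:
    print(mdl_cost(1, [0.5] * 20))   # 220.0
    print(mdl_cost(2, [0.1] * 20))   # 264.0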

Process Control

  • The Goal: to predict the future from the past

  • The Given: the past data sequence

  • The Methods:

    • Adaptive Control Theory

    • Chaotic Theory

    • State Machines

Chaotic Theory

  • The data sequence may appear chaotic

  • The underlying model may be very simple

  • Extremely sensitive to initial conditions

  • Difficult to make long-term predictions

  • Short-term prediction is possible

An Example Chaotic Sequence

[Plot: s(k) against time step k for k = 0..100; the values wander erratically over [0, 1].]

The simple logistic map model:

s(k+1) = a·s(k)·(1 − s(k)), where a = 4
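
Generating the sequence is a two-liner (the initial value s0 = 0.3 is an arbitrary choice):

    def logistic_sequence(s0=0.3, a=4.0, n=100):
        seq = [s0]
        for _ in range(n):
            seq.append(a * seq[-1] * (1 - seq[-1]))   # s(k+1) = a·s(k)·(1 − s(k))
        return seq

    print(logistic_sequence()[:5])   # erratic-looking values in [0, 1]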

Steps of Using Chaotic Theory

  • Reconstruction of the state space:

    • x_k = [x_k, x_{k−τ}, …, x_{k−(m−1)τ}]^T

    • where τ is the time delay and m is the embedding dimension

  • Takens’ theorem: one can always find an embedding dimension m ≥ 2⌊d⌋ + 1, where ⌊d⌋ is the integer part of the attractor’s dimension, that preserves the invariant measures

  • Central task: choose m and τ
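
A sketch of the delay-coordinate reconstruction (m = 3 and τ = 1 are arbitrary choices for the demo):

    def delay_embed(s, m, tau):
        # x_k = [s_k, s_{k-τ}, ..., s_{k-(m-1)τ}]^T for each usable k
        start = (m - 1) * tau
        return [[s[k - i * tau] for i in range(m)] for k in range(start, len(s))]

    s = [0.3]                                # logistic-map sequence, as above
    for _ in range(99):
        s.append(4.0 * s[-1] * (1 - s[-1]))
    print(delay_embed(s, m=3, tau=1)[0])     # [s_2, s_1, s_0]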

State Machine Approach

  • Identify the number of states by clustering all points in the sequence

  • Construct a transition function by learning from the sequence
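
A toy sketch of both steps, using fixed-width binning as a crude stand-in for clustering (all parameters are illustrative):

    from collections import Counter, defaultdict

    def learn_state_machine(seq, n_states=10):
        # Step 1: "cluster" points in [0, 1] into n_states bins
        states = [min(int(x * n_states), n_states - 1) for x in seq]
        # Step 2: learn the transition function from consecutive pairs
        counts = defaultdict(Counter)
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
        return {a: succ.most_common(1)[0][0] for a, succ in counts.items()}

    s = [0.3]
    for _ in range(500):
        s.append(4.0 * s[-1] * (1 - s[-1]))
    print(learn_state_machine(s))   # most likely successor of each state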

Construction & Synchronization

  • Environment = (A, P, Q, r), where |P| < |Q|

  • Model = (A, P, S, t)

    • Visibly equivalent

    • Perfect

    • Synchronized

  • The Construction problem:

    • when and how to construct new model states

  • The Synchronization problem:

    • how to determine which model state is current

Learning with a Reset Button

  • Two environmental states p and q (they may appear the same to the learner) are different if and only if there exists a sequence e of actions that leads from p and from q to states that are visibly different

  • The interaction with the environment:

    • Membership Query

    • Equivalence Query: “yes” or a counterexample

Observation Table

  • Model states: {row(s) : s in S}

  • Initial state: row(λ), where λ is the empty string

  • Final states: {row(s) : s in S and T(s) = 1}

  • Transitions: δ(row(s), a) = row(s·a)

  • Closed table: for every s in S and a in A, there is an s′ in S with row(s·a) = row(s′)

  • Consistent table: row(s) = row(s′) implies row(s·a) = row(s′·a) for every a in A

[Diagram: the table T of observations, with rows indexed by S (action strings from the initial state) plus the transition rows S×A, and columns indexed by the experiments E.]
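
A sketch of the two table tests, assuming row(s) returns the tuple of observations (T(s·e) for each experiment e in E) and that action strings concatenate with +:

    def is_closed(S, A, row):
        # every row(s·a) must already appear as row(s') for some s' in S
        rows_of_S = {row(s) for s in S}
        return all(row(s + a) in rows_of_S for s in S for a in A)

    def is_consistent(S, A, row):
        # row(s) = row(s') must imply row(s·a) = row(s'·a) for every action a
        return all(row(s + a) == row(t + a)
                   for s in S for t in S if row(s) == row(t)
                   for a in A)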

L* Algorithm

  • Initialize T for λ and each action in A

  • Loop:

    • use membership queries to make T complete, closed, and consistent

    • if EQ(T) = w /* a counterexample */, then add w and all its prefixes into S

  • Until EQ(T) = yes
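
A skeleton of this loop (the three helpers passed in are hypothetical: they maintain the table via membership queries, answer the equivalence query, and read a model off the table):

    def l_star_loop(make_complete_closed_consistent, eq_query, build_model):
        S = {""}                                  # prefix set, initially just λ
        while True:
            make_complete_closed_consistent(S)    # membership queries happen here
            answer = eq_query(build_model(S))
            if answer == "yes":
                return build_model(S)
            w = answer                            # a counterexample string
            S |= {w[:i] for i in range(len(w) + 1)}   # add w and all its prefixes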

The Little Prince Example

  • A counterexample ftf for M3 (Fig 5.3): the model ends at rose, but the real observation is volcano

  • An inconsistency in T4 (Tab 5.5), where row(f) = row(ft) but row(ff) ≠ row(ftf)

Homing Sequence

  • L* is limited by its need for a reset button

  • Homing sequence h: if the observation sequences from two executions of h are the same, then the two executions end in the same state

  • Let q⟨h⟩ be the observation sequence and q·h the ending state; then h is defined by

    • for all p, q: [p⟨h⟩ = q⟨h⟩] implies [p·h = q·h]

  • e.g., {fwd} is a homing sequence for the Little Prince
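
A sketch that checks this definition directly, given assumed accessors step(q, a) for the environment's transition and observe(q) for its percept:

    def is_homing_sequence(h, states, step, observe):
        outcome = {}                    # maps q<h> to the ending state q·h
        for q0 in states:
            obs, q = [], q0
            for a in h:
                q = step(q, a)
                obs.append(observe(q))
            key = tuple(obs)
            if key in outcome and outcome[key] != q:
                return False            # same q<h> but different q·h
            outcome[key] = q
        return True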

Properties of Homing Seq

  • Every FDA (finite deterministic automaton) has a homing sequence

  • One can be constructed from an FDA by repeatedly appending action sequences (each shorter than n) that distinguish a pair of states

  • The length of this construction is at most n²

  • There are FDAs whose shortest h is n² long

  • h can be used as a reset

  • but h cannot guarantee reaching a fixed state

L* with a Homing Sequence h

  • Every time a reset is needed, repeat h until you see the desired observation sequence

  • Or, for each possible observation sequence of h, make a copy of L* (see Fig 5.6)

Learning the Homing Sequence

  • If h is not a homing sequence, then we may discover that the same observation sequence σ produced by executing h leads us to two different states, p and q, in that there is a sequence of actions x with p⟨x⟩ ≠ q⟨x⟩

  • Then a better approximation of the homing sequence is h·x

L* + Learning h

  • Assume a homing sequence h; initially h = λ (the empty sequence)

  • When h is shown to be incorrect, extend h, discard all copies of L*, and start again

  • h is shown incorrect when there exists an x such that (q·h)⟨x⟩ ≠ (p·h)⟨x⟩ even though q⟨h⟩ = p⟨h⟩

Learning h and the Model

  • Rivest and Schapire’s algorithm (Fig 5.7)

  • The Little Prince example (notice the inconsistency produced by ff in Fig 5.10)
