
Fuzzy Interpretation of Discretized Intervals - PowerPoint PPT Presentation



Fuzzy Interpretation of Discretized Intervals Dr. Xindong Wu. Andrea Porter April 11, 2002. Plan For Presentation. Introduction to Problem, HCV Discretization Techniques/Fuzzy Borders A Hybrid Solution for HCV Experiments and Results Conclusion. Introduction.

Presentation Transcript

Fuzzy Interpretation of Discretized Intervals

Dr. Xindong Wu

Andrea Porter

April 11, 2002


Plan For Presentation

  • Introduction to Problem, HCV

  • Discretization Techniques/Fuzzy Borders

  • A Hybrid Solution for HCV

  • Experiments and Results

  • Conclusion


Introduction

  • Real-world data contains both numerical and nominal attributes, so an induction system must be able to deal with both types.

  • Existing systems discretize numerical domains into intervals and treat intervals as nominal values during induction.

  • Problems occur if test examples are not covered in training data (no-match, multiple match)

  • The solution is a hybrid approach using fuzzy intervals for no-match problem.


HCV

  • Attribute based rule induction algorithm, extension matrix approach

    • Divide positive examples into intersecting groups

    • Find a heuristic conjunctive rule in each group that covers all positive examples (PE) in the group and no negative examples (NE)

  • HCV can find a rule in the form of variable-valued logic

  • More compact than the decision trees/rules of ID3 and C4.5


Variable Valued Logic and Selectors

  • Represents decisions where variables can take a range

  • Selector:

    [ X # R ]

    X = attribute

    # = relational operator ( = , <, >, . . . )

    R = Reference, list of 1 or more values

    e.g. [Windy = true] [Temp > 90]
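A minimal Python sketch of the selector idea follows. It is only an illustration, not HCV's C++ data structures: the `Selector` class, the `rule_matches` helper, and the set of supported operators are all hypothetical names chosen for this example.

```python
import operator
from dataclasses import dataclass

# Hypothetical mapping from ordering symbols to Python comparisons.
_ORDERING = {"<": operator.lt, ">": operator.gt,
             "<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class Selector:
    attribute: str    # X
    relation: str     # '#': "=", "!=", "<", ">", "<=" or ">="
    reference: tuple  # R: one or more values

    def satisfied_by(self, example: dict) -> bool:
        value = example[self.attribute]
        if self.relation == "=":
            return value in self.reference       # any listed value matches
        if self.relation == "!=":
            return value not in self.reference
        # Ordering relations compare against a single reference value.
        return _ORDERING[self.relation](value, self.reference[0])


def rule_matches(selectors, example):
    """A conjunctive rule holds when every selector is satisfied."""
    return all(s.satisfied_by(example) for s in selectors)


# The slide's example: [Windy = true] [Temp > 90]
rule = [Selector("Windy", "=", (True,)), Selector("Temp", ">", (90,))]
```

For instance, `rule_matches(rule, {"Windy": True, "Temp": 95})` holds, while the same call with `"Temp": 70` does not.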


HCV Software

  • C++ implementation

  • Can work with noisy and real-valued domains as well as nominal and noise-free databases

  • Provides a set of deduction facilities for the user to test the accuracy of the produced rules on test examples



C4.5 Results vs. HCV

  • C4.5: The T class

    • X2 = b

    • X1 = 0 & X3 = a

    • X1 = 0 & X3 = b

    • X1 = 0 & X2 = a

  • HCV: The T class

    • X2 = b

    • X1 = 0 & X2 = a

    • X1 = 0 & X4 = 0

  • C4.5: The F class

    • X1 = 1 & X2 = a

    • X1 = 1 & X2 = c

    • X2 = c & X3 = c


Deduction of Induction Results

  • Induction generates knowledge from existing data

  • Deduction applies induction results to interpret new data.

  • With real-world data, induction cannot be assumed to be perfect

  • Three cases:

    1) no-match (measure of fit)

    2) single-match

    3) multiple-match (estimate of probability)


Discretization

  • Occurs during rule induction

  • Discretize numerical domains into intervals and treat the intervals like nominal values.

  • The challenge is to find the right borders for the intervals

  • Possible Methods:

    1) Simplest Class-Separating Method

    2) Information Gain Heuristic (implemented in HCV)


Simplest Class-Separating Method

  • Interval borders are placed between each adjacent pair of examples that have different classes.

  • If the attribute is very informative, the method is efficient and useful.

  • If the attribute is not informative, the method produces too many intervals.
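The method can be sketched in a few lines of Python. Two details are assumptions made for this illustration: each border is put at the midpoint of the two neighbouring values, and neighbours with identical values but different classes produce no border. The function name `class_separating_borders` is hypothetical.

```python
def class_separating_borders(values, labels):
    """Return a border between every adjacent pair of examples
    (sorted by value) whose classes differ."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    borders = []
    for (x0, c0), (x1, c1) in zip(pairs, pairs[1:]):
        if c0 != c1 and x0 != x1:
            borders.append((x0 + x1) / 2)  # midpoint: an assumption
    return borders


# Informative attribute: one border separates the classes.
informative = class_separating_borders(
    [1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])

# Uninformative attribute: a border between almost every pair.
uninformative = class_separating_borders(
    [1, 2, 3, 4], ["a", "b", "a", "b"])
```

Here `informative` contains the single border 6.5, whereas `uninformative` contains three borders for only four examples, which is the "too many intervals" problem.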


Information Gain Heuristic

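Use the information gain heuristic (IGH) to find more informative borders.

  • Candidate cut points are the midpoints $x = (x_i + x_{i+1})/2$ for $i = 1, \dots, n-1$, with the examples sorted by value.

  • $x$ is a possible cut point only if $x_i$ and $x_{i+1}$ are of different classes.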

  • Use IGH to find best x

  • Recursively split on left and right

  • To stop recursive splitting:

    1) stop if IGH is the same on all possible cut points.

    2) stop if # of examples to split is less than a predefined number

    3) limit the number of intervals
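The recursive procedure can be sketched as follows, assuming that "information gain" means the usual entropy reduction. The sketch implements stopping rules 2 and 3 only, and it approximates rule 3 with a recursion-depth limit; rule 1 is left out. The names `best_cut`, `discretize`, `min_examples`, and `max_depth` are hypothetical and do not come from HCV.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def best_cut(pairs):
    """pairs: (value, label) tuples sorted by value.
    Returns (gain, cut) for the best candidate, or None."""
    labels = [c for _, c in pairs]
    base = entropy(labels)
    best = None
    for i in range(len(pairs) - 1):
        (x0, c0), (x1, c1) = pairs[i], pairs[i + 1]
        if c0 == c1 or x0 == x1:
            continue  # only class-changing neighbours are candidates
        left, right = labels[:i + 1], labels[i + 1:]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(labels)
        gain = base - remainder
        if best is None or gain > best[0]:
            best = (gain, (x0 + x1) / 2)
    return best


def discretize(pairs, min_examples=4, max_depth=3):
    """Recursively split on the best cut; returns sorted borders."""
    if len(pairs) < min_examples or max_depth == 0:
        return []  # stopping rules 2 and 3 (depth as a proxy)
    found = best_cut(pairs)
    if found is None:
        return []  # no class change left to separate
    _, cut = found
    left = [p for p in pairs if p[0] < cut]
    right = [p for p in pairs if p[0] > cut]
    return (discretize(left, min_examples, max_depth - 1) + [cut]
            + discretize(right, min_examples, max_depth - 1))
```

A depth limit of `max_depth` bounds the result at `2 ** max_depth` intervals, which is a coarser control than counting intervals directly.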


Fuzzy Borders

  • Discretizing a continuous domain with sharp borders does not always allow an accurate interpretation.

  • Instead of sharp borders, use a membership function that measures the degree to which a value belongs to an interval.

  • A value can be classified into a few different intervals at the same time (e.g. single to multiple match)


Fuzzy Borders (2)

  • Fuzzy matching - deduction with fuzzy borders of discretized intervals.

  • Take the interval with the greatest degree as the value’s discrete value.

  • 3 functions to fuzzify borders:

    1) linear

    2) polynomial

    3) arctan

  • Definitions

    s = spread parameter

    l = length of the original interval

    xleft, xright = left/right sharp borders

[Figure: the original interval of length l between the sharp borders xleft and xright]


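Linear Membership Function

[Figure: linear membership function, showing the interval length l, the spread sl, and the sharp borders xleft and xright]

$$k = \frac{1}{2sl}, \qquad a = -k\,x_{\mathrm{left}} + \tfrac{1}{2}, \qquad b = k\,x_{\mathrm{right}} + \tfrac{1}{2}$$

$$\mathrm{lin}_{\mathrm{left}}(x) = kx + a, \qquad \mathrm{lin}_{\mathrm{right}}(x) = -kx + b$$

$$\mathrm{lin}(x) = \max\bigl(0,\ \min(1,\ \mathrm{lin}_{\mathrm{left}}(x),\ \mathrm{lin}_{\mathrm{right}}(x))\bigr)$$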
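The linear function translates directly into code. The sketch below assumes that l is the distance between the two sharp borders and reads the slope as k = 1/(2sl); the function name `linear_membership` is hypothetical.

```python
def linear_membership(x, x_left, x_right, s):
    """Degree to which x belongs to the interval [x_left, x_right]
    when both borders are fuzzified linearly with spread s."""
    length = x_right - x_left      # l (assumed: distance between borders)
    k = 1.0 / (2.0 * s * length)   # slope of the ramps
    a = -k * x_left + 0.5
    b = k * x_right + 0.5
    lin_left = k * x + a           # rising ramp at the left border
    lin_right = -k * x + b         # falling ramp at the right border
    return max(0.0, min(1.0, lin_left, lin_right))
```

With the interval [10, 20] and s = 0.1, the degree is 0.5 exactly on a sharp border, reaches 1 one unit inside the interval, and falls to 0 one unit outside it.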



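Polynomial Membership Function

$$\mathrm{poly}_{\mathrm{side}}(x) = a_{\mathrm{side}}x^3 + b_{\mathrm{side}}x^2 + c_{\mathrm{side}}x + d_{\mathrm{side}}, \qquad \mathrm{side} \in \{\mathrm{left}, \mathrm{right}\}$$

$$a_{\mathrm{side}} = \frac{1}{4(ls)^3}$$

$$b_{\mathrm{side}} = -3a_{\mathrm{side}}x_{\mathrm{side}}$$

$$c_{\mathrm{side}} = 3a_{\mathrm{side}}\bigl(x_{\mathrm{side}}^2 - (ls)^2\bigr)$$

$$d_{\mathrm{side}} = -a_{\mathrm{side}}\bigl(x_{\mathrm{side}}^3 - 3x_{\mathrm{side}}(ls)^2 + 2(ls)^3\bigr)$$

$$\mathrm{poly}(x) = \begin{cases} \mathrm{poly}_{\mathrm{left}}(x), & \text{if } x_{\mathrm{left}} - ls \le x \le x_{\mathrm{left}} + ls \\ \mathrm{poly}_{\mathrm{right}}(x), & \text{if } x_{\mathrm{right}} - ls \le x \le x_{\mathrm{right}} + ls \\ 1, & \text{if } x_{\mathrm{left}} + ls \le x \le x_{\mathrm{right}} - ls \\ 0, & \text{otherwise} \end{cases}$$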
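The piecewise structure can be illustrated with a short sketch. Note the assumption: instead of transcribing the coefficients above, the cubic used here is derived from boundary conditions (degree 0 at a distance ls outside a border, 1/2 on the border, 1 at a distance ls inside, with zero slope at both ends), so it may differ from the original in sign conventions. It also assumes s < 1/2, so that the two border zones do not overlap. Both function names are hypothetical.

```python
def _cubic_ramp(u, w):
    """Cubic rising from 0 at u = -w to 1 at u = +w,
    equal to 1/2 at u = 0, with zero slope at both ends."""
    return 0.5 + 3.0 * u / (4.0 * w) - u ** 3 / (4.0 * w ** 3)


def polynomial_membership(x, x_left, x_right, s):
    """Piecewise membership with smooth cubic borders (assumes s < 1/2)."""
    w = s * (x_right - x_left)               # ls
    if x_left - w <= x <= x_left + w:        # left border zone
        return _cubic_ramp(x - x_left, w)
    if x_right - w <= x <= x_right + w:      # right border zone
        return 1.0 - _cubic_ramp(x - x_right, w)
    if x_left + w <= x <= x_right - w:       # core of the interval
        return 1.0
    return 0.0
```

Compared with the linear function, the degree changes slowly near the ends of each border zone and fastest on the sharp border itself.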


Match Degree

  • Selector method - take the max membership degree of the value in all the intervals involved. If 2 adjacent intervals have the same class, values close to the border will have low membership.

  • Conjunction method - combines degrees with the fuzzy plus operator:

    $a \oplus b = a + b - ab$


No-Match Resolution

Largest Class

  • Assign all no match examples to the largest class, the default class.

  • Works well, if the number of classes in a training set is small and one class is clearly larger.

  • Deteriorates if there is a larger number of classes and the examples are evenly distributed


No-Match Resolution

Measure of Fit

Calculate the measure of fit for each class:

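1) Calculate MF for each selector (sel):

$$\mathrm{MF}(sel, e) = \begin{cases} 1, & \text{if } sel \text{ is satisfied by } e \\ n/|x|, & \text{otherwise} \end{cases}$$

2) Calculate MF for each conjunctive rule (conj):

$$\mathrm{MF}(conj, e) = \prod_{sel} \mathrm{MF}(sel, e) \times \frac{n(conj)}{N}$$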


No-Match Resolution

Measure of Fit (2)

3) calculate MF for each class c

MF(c, e) = MF(conj1, e) + MF(conj2, e) - MF(conj1,e)MF(conj2,e)

* For more than two rules, apply formula recursively.

* Find maximum MF - determines which class is closest to the example
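The three steps can be sketched as below, assuming that the selector values within a rule are combined by a product. The slides do not define n and |x|, so the sketch takes them as caller-supplied inputs; all function and parameter names are hypothetical.

```python
from functools import reduce


def mf_selector(satisfied, n, domain_size):
    """Step 1: 1 if the selector is satisfied, else n / |x|.
    n and domain_size (|x|) are supplied by the caller."""
    return 1.0 if satisfied else n / domain_size


def mf_conjunction(selector_mfs, n_conj, n_total):
    """Step 2: product of the selector values (assumed),
    weighted by n(conj) / N."""
    product = 1.0
    for mf in selector_mfs:
        product *= mf
    return product * n_conj / n_total


def mf_class(conjunction_mfs):
    """Step 3: fold a + b - ab over all rules of the class."""
    return reduce(lambda a, b: a + b - a * b, conjunction_mfs, 0.0)


def closest_class(mf_by_class):
    """Pick the class with the maximum measure of fit."""
    return max(mf_by_class, key=mf_by_class.get)
```

Folding from 0 makes the recursive formula work for any number of rules, because 0 + b - 0·b = b.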


Multiple-Match

  • Caused by over-generalization of the training examples at induction time

  • Example

    • (X1 = a, X2 = 1)

      • All PE cover X1 = a

      • All NE cover X2 = 1

      • Multiple Match


Multiple-Match Resolution

First Hit

  • Use first rule which classifies the example

  • Produces reasonable results if the rules from induction have been ordered according to a measure of reliability

  • Advantages - straightforward, efficient

  • Disadvantages - have to sort rules at induction time


Multiple-Match Resolution

Largest Rule

  • Similar to largest class method from no-match resolution

  • Choose conjunctive rule that covers the most examples in the training set.


Multiple-Match Resolution

Estimation of Probability

  • Assign EP value to each class based on the size of the satisfied conjunctive rules.

    1) Find EP for each conjunctive rule (conj):

    $$\mathrm{EP}(conj, e) = \begin{cases} n(conj)/N, & \text{if } conj \text{ is satisfied by } e \\ 0, & \text{otherwise} \end{cases}$$

    where n(conj) = number of examples covered by conj and N = total number of examples


Multiple-Match Resolution

Estimation of Probability (2)

2) Find EP value for each class:

EP(c, e) = EP(conj1, e) + EP(conj2, e) - EP(conj1,e)EP(conj2,e).

* For more rules, apply formula recursively

* Choose class with highest EP value
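A small sketch of the two EP steps follows. The rule layout, a list of `(class_label, satisfied, n_conj)` tuples, is a hypothetical convenience for this example and not HCV's own representation.

```python
def ep_conjunction(satisfied, n_conj, n_total):
    """Step 1: n(conj) / N if the rule is satisfied, else 0."""
    return n_conj / n_total if satisfied else 0.0


def ep_class(conjunction_eps):
    """Step 2: apply a + b - ab recursively over the class's rules."""
    total = 0.0
    for ep in conjunction_eps:
        total = total + ep - total * ep
    return total


def resolve_multiple_match(rules, n_total):
    """rules: (class_label, satisfied, n_conj) tuples (hypothetical).
    Returns the class with the highest EP and all class scores."""
    eps_by_class = {}
    for label, satisfied, n_conj in rules:
        eps_by_class.setdefault(label, []).append(
            ep_conjunction(satisfied, n_conj, n_total))
    scores = {label: ep_class(eps) for label, eps in eps_by_class.items()}
    return max(scores, key=scores.get), scores


# Two satisfied T rules outweigh one larger satisfied F rule.
example_rules = [("T", True, 30), ("T", True, 20),
                 ("F", True, 40), ("F", False, 10)]
winner, scores = resolve_multiple_match(example_rules, n_total=100)
```

In this made-up case class T scores 0.3 + 0.2 - 0.06 = 0.44 against 0.4 for class F, so T is chosen even though the single largest satisfied rule belongs to F.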


Hybrid Interpretation

  • A hybrid is used because fuzzy borders on their own only add conflicts: they do not reduce the number of rules that are applicable.

  • HCV - use sharp borders during induction and use fuzzy borders only during deduction

  • Algorithm:

    * Single match - use class indicated by rules

    * Multiple match - use estimation of probability (EP) with sharp borders

    * No match - use fuzzy borders with polynomial membership function to find closest rule
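The three-way dispatch can be sketched as follows. This is an illustration only: the two resolvers are passed in as callables, rules are hypothetical `(class_label, matches)` pairs, and "single match" is interpreted here as all matching rules agreeing on one class, which is an assumption.

```python
def hybrid_classify(example, rules, ep_resolver, fuzzy_resolver):
    """rules: (class_label, matches) pairs, where matches(example)
    tests the rule with sharp borders.
    ep_resolver and fuzzy_resolver: callables (example, rules) -> class."""
    matched = {label for label, matches in rules if matches(example)}
    if len(matched) == 1:
        return matched.pop()                  # single match
    if len(matched) > 1:
        return ep_resolver(example, rules)    # multiple match: EP, sharp borders
    return fuzzy_resolver(example, rules)     # no match: fuzzy borders
```

Keeping the resolvers as parameters mirrors the slide's point that fuzzy borders are consulted only when no rule matches, so sharp-border behaviour is unchanged in the other two cases.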


The Data

  • Used 17 databases from the Machine Learning Database Repository, U. of California, Irvine.

  • Databases selected because:

    1) All include numerical data

    2) All lead to situations where no rules clearly apply.



Results (cont.)

  • The results shown for C4.5 and NewID are the pruned ones

    • These were usually better than the unpruned ones in this experiment

  • Parameters were not fine-tuned for HCV, because doing so would reduce the generality and applicability of the conclusions


Accuracy Results

  • HCV (hybrid) - 9 databases

  • C4.5 (R 8) - 7 databases

  • C4.5 (R 5) - 6 databases

  • HCV (large) - 3 databases

  • HCV (fuzzy) - 2 databases


HCV Comparison

  • HCV (fuzzy) generally performs better than the simple largest class method

  • HCV’s performance improves significantly when the fuzzy borders (for no match) are combined with probability estimation (for multiple match) in HCV (hybrid)


Conclusions

  • Fuzzy borders are constructed and used at deduction time only when a no match case occurs.

  • This hybrid method performs more accurately than several other current deduction programs.

  • Fuzziness is strongly domain dependent, so HCV allows users to specify their own intervals and fuzzy functions.

