slide1

Rough Sets Theory

Logical Analysis of Data.

Monday, November 26, 2007

Johanna GOLD

slide2

Introduction

  • Comparison of two theories for rule induction.
  • Different methodologies
  • Same results?
slide3

Generalities

  • Set of objects described by attributes.
  • Each object belongs to a class.
  • We want to induce decision rules.
slide4

Approaches

  • There are two approaches:
    • Rough Sets Theory (RST)
    • Logical Analysis of Data (LAD)
  • Goal: compare them
slide5

Contents

Rough Sets Theory

Logical Analysis Of data

Comparison

Inconsistencies

slide6

Inconsistencies

  • Two examples having exactly the same values on all attributes, but belonging to two different classes.
  • Example: two sick people have the same symptoms but different diseases.
slide7

Covered by RST

  • RST doesn’t correct or aggregate inconsistencies.
  • For each class: determination of lower and upper approximations.
slide8

Approximations

  • Lower: objects that certainly belong to the class.
  • Upper: objects that may belong to the class.
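
To make the two approximations concrete, here is a minimal Python sketch (not from the original slides; the data layout and names are illustrative). It groups objects into indiscernibility classes and derives both approximations of a concept:

```python
from collections import defaultdict

def approximations(objects, attributes, concept):
    """Lower and upper approximation of `concept` (a set of object ids).

    `objects` maps each object id to a dict of attribute values;
    `attributes` lists the condition attributes. Illustrative layout.
    """
    # Indiscernibility classes: objects with identical values on all
    # condition attributes fall into the same class.
    classes = defaultdict(set)
    for oid, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(oid)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= concept:   # entirely inside the concept: certain
            lower |= block
        if block & concept:    # intersects the concept: possible
            upper |= block
    return lower, upper
```
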
slide9

Impact on rules

  • Lower approximation → certain rules
  • Upper approximation → possible rules
slide10

Pretreatment

  • Rule induction directly on numerical data → poor rules, too many rules.
  • Need for pretreatment.
slide11

Discretization

  • Goal: convert numerical data into discrete data.
  • Principle: determination of cut points in order to divide domains into successive intervals.
slide12

Algorithms

  • First algorithm: LEM2
  • Improved algorithms:
    • Include the pretreatment
    • MLEM2, MODLEM, …
slide13

LEM2

  • Induction of certain rules from the lower approximation.
  • Induction of possible rules from the upper approximation.
  • Same procedure in both cases
slide14

Definitions (1)

  • For an attribute x and its value v, the block [(x,v)] of the attribute-value pair (x,v) is the set of all cases in which attribute x takes value v.
  • Ex: [(Age,21)] = [Martha]

[(Age,22)] = [David; Audrey]
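
A small sketch of how such blocks can be computed; only the [(x,v)] idea comes from the slide, the data layout (a dict of cases) is an assumption:

```python
def blocks(objects, attributes):
    """Map each attribute-value pair (x, v) to its block: the set of
    cases in which attribute x takes value v."""
    result = {}
    for oid, values in objects.items():
        for a in attributes:
            result.setdefault((a, values[a]), set()).add(oid)
    return result

# The slide's example:
people = {"Martha": {"Age": 21}, "David": {"Age": 22}, "Audrey": {"Age": 22}}
print(blocks(people, ["Age"])[("Age", 22)])   # {'David', 'Audrey'}
```
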

slide15

Definitions (2)

  • Let B be a non-empty lower or upper approximation of a concept represented by a decision-value pair (d,w).
  • Ex: (level, middle) → B = [obj1; obj5; obj7]
slide16

Definitions (3)

  • Let T be a set of attribute-value pairs (a,v).
  • Set B depends on set T if and only if ∅ ≠ [T] = ⋂ {[(a,v)] : (a,v) ∈ T} ⊆ B.
slide17

Definitions (4)

  • A set T is a minimal complex of B if and only if B depends on T and no proper subset T’ of T is such that B depends on T’.
slide18

Definitions (5)

  • Let 𝒯 be a non-empty collection of non-empty sets of attribute-value pairs.
    • 𝒯 is a set of sets T.
    • Each T is a set of pairs (a,v).
slide19

Definitions (6)

  • 𝒯 is a local cover of B if and only if:
    • Each member T of 𝒯 is a minimal complex of B.
    • ⋃ {[T] : T ∈ 𝒯} = B.
    • 𝒯 is minimal (it has the smallest possible number of members).
slide20

Algorithm principle

  • LEM2’s output is a local cover for each approximation of the decision table concept.
  • It then converts the local covers into decision rules.
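
The search for a local cover can be sketched in Python as follows. This is a compact restatement of the published LEM2 procedure (using the heuristics of the next slide), not the reference implementation; names and data layout are mine:

```python
def lem2(blocks, B, universe):
    """Find a local cover of B (a lower or upper approximation).

    `blocks` maps (attribute, value) -> set of case ids.
    """
    def extent(T):
        # [T]: the cases matched by every pair of T.
        s = set(universe)
        for pair in T:
            s &= blocks[pair]
        return s

    cover, G = [], set(B)
    while G:
        T = set()
        while not T or not extent(T) <= B:
            # Candidates: pairs whose block still intersects the goal G.
            candidates = [p for p in blocks if blocks[p] & G and p not in T]
            # Heuristics: highest intersection with G, then smallest block.
            best = max(candidates,
                       key=lambda p: (len(blocks[p] & G), -len(blocks[p])))
            T.add(best)
            G = blocks[best] & G
        # Drop redundant pairs so that T becomes a minimal complex.
        for p in list(T):
            if len(T) > 1 and extent(T - {p}) <= B:
                T.remove(p)
        cover.append(T)
        # New goal: the part of B not yet covered.
        G = set(B) - set().union(*(extent(t) for t in cover))
    # Drop redundant complexes so that the cover is minimal.
    for T in list(cover):
        rest = [t for t in cover if t is not T]
        if rest and set().union(*(extent(t) for t in rest)) == set(B):
            cover.remove(T)
    return cover
```

On the Attraction example worked out below, this returns the two minimal complexes {(Hair, Black)} and {(Hair, Blond), (Height, 160..165)} found by hand.
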
slide22

Heuristics details

Among the candidate blocks, we choose the one:

  • With the highest priority
  • With the highest intersection with the concept
  • With the smallest cardinality
slide23

Heuristics details

  • As long as T is not a minimal complex, pairs are added to it.
  • As long as 𝒯 is not a local cover, minimal complexes are added to it.
slide24

Illustration

  • Illustration through an example.
  • We consider that the pretreatment has already been done.
slide26

Cut points

  • For the attribute Height, we have the values 160, 170 and 180.
  • The pretreatment gives us two cut points: 165 and 175.
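
A sketch of this step, assuming the cut points are midpoints between consecutive observed values (the slides do not spell the rule out; real MLEM2/MODLEM-style methods keep only class-separating cut points):

```python
def cut_points(values):
    """Midpoints between consecutive distinct values (assumed rule)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

def interval_blocks(objects, attr, cuts):
    """For each cut point c, the two interval blocks used on the next
    slide: values below c and values at or above c."""
    vals = [o[attr] for o in objects.values()]
    lo, hi = min(vals), max(vals)
    out = {}
    for c in cuts:
        out[(attr, f"{lo}..{c:g}")] = {i for i, o in objects.items()
                                       if o[attr] < c}
        out[(attr, f"{c:g}..{hi}")] = {i for i, o in objects.items()
                                       if o[attr] >= c}
    return out

print(cut_points([160, 170, 180]))   # [165.0, 175.0]
```
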
slide27

Blocks [(a,v)]

  • [(Height, 160..165)]={1,3,5}
  • [(Height, 165..180)]={2,4}
  • [(Height, 160..175)]={1,2,3,5}
  • [(Height, 175..180)]={4}
  • [(Hair, Blond)]={1,2}
  • [(Hair, Red)]={3}
  • [(Hair, Black)]={4,5,6}
slide28

First concept

  • G = B = [(Attraction,-)] = {1,4,5,6}
  • Here there are no inconsistencies. If there were any, this is the point where we would have to choose between the lower and the upper approximation.
slide29

Eligible pairs

  • Pairs (a,v) such that [(a,v)] ∩ [(Attraction,-)] ≠ Ø
    • (Height,160..165)
    • (Height,165..180)
    • (Height,160..175)
    • (Height,175..180)
    • (Hair,Blond)
    • (Hair,Black)
slide30

Choice of a pair

  • We choose the most appropriate pair, that is, the (a,v) for which

| [(a,v)] ∩ [(Attraction,-)] |

is the highest.

  • Here: (Hair, Black)
slide31

Minimal complex

  • The pair (Hair, Black) is a minimal complex because ∅ ≠ [(Hair, Black)] = {4,5,6} ⊆ [(Attraction,-)] = {1,4,5,6}.
slide32

New concept

  • B = [(Attraction,-)] – [(Hair,Black)] = {1,4,5,6} – {4,5,6} = {1}

slide33

Choice of a pair (1)

  • Among the pairs (Height,160..165), (Height,160..175) and (Hair, Blond).
  • The intersections having the same cardinality, we choose the pair whose block has the smallest cardinality:

(Hair, Blond)

slide34

Choice of a pair (2)

  • Problem: (Hair, Blond) is not a minimal complex.
  • We therefore add the following pair:

(Height, 160..165)

slide35

Minimal Complex

  • {(Hair, Blond),(Height,160..165)} is a second minimal complex.
slide36

End of the concept

  • {{(Hair, Black)}, {(Hair, Blond), (Height, 160..165)}}

is a local cover of [(Attraction,-)].

slide37

Rules

  • (Hair, Red) → (Attraction, +)
  • (Hair, Blond) & (Height, 165..180) → (Attraction, +)
  • (Hair, Black) → (Attraction, -)
  • (Hair, Blond) & (Height, 160..165) → (Attraction, -)
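
These rules can be checked mechanically; a small sketch (the rule encoding is mine):

```python
# The four rules induced above, as (conditions, decision) pairs.
rules = [
    ({"Hair": "Red"}, ("Attraction", "+")),
    ({"Hair": "Blond", "Height": "165..180"}, ("Attraction", "+")),
    ({"Hair": "Black"}, ("Attraction", "-")),
    ({"Hair": "Blond", "Height": "160..165"}, ("Attraction", "-")),
]

def classify(case, rules):
    """Return the decision of the first rule whose conditions all hold."""
    for conditions, decision in rules:
        if all(case.get(a) == v for a, v in conditions.items()):
            return decision
    return None

print(classify({"Hair": "Blond", "Height": "160..165"}, rules))
# ('Attraction', '-')
```
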
slide38

Contents

Rough Sets Theory

Logical Analysis Of data

Comparison

Inconsistencies

slide39

Principle

  • Works on binary data.
  • Extension of the boolean approach to the non-binary case.
slide40

Definitions (1)

  • Let S be the set of all observations.
  • Each observation is described by n attributes.
  • Each observation belongs to a class.
slide41

Definitions (2)

  • The classification can be considered as a partition of S into two sets: the positive observations S⁺ and the negative observations S⁻.
  • An archive is represented by a (partially defined) boolean function Φ, with Φ(p) = 1 for p ∈ S⁺ and Φ(p) = 0 for p ∈ S⁻.
slide42

Definitions (3)

  • A literal is a boolean variable or its negation: x_i or x̄_i.
  • A term is a conjunction of literals.
  • The degree of a term is its number of literals.
slide43

Definitions (4)

  • A term T covers a point p ∈ {0,1}ⁿ if T(p) = 1.
  • The characteristic term of a point p is the unique term of degree n covering p.
  • Ex: the characteristic term of p = (1,0,1) is x₁ ∧ x̄₂ ∧ x₃.
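
In code, with a term represented as a map from variable index to required value (an assumed encoding):

```python
def covers(term, point):
    """T covers p iff T(p) = 1; a literal x_i requires 1, its negation 0."""
    return all(point[i] == v for i, v in term.items())

def characteristic_term(point):
    """The unique term of degree n covering `point`."""
    return dict(enumerate(point))

p = (1, 0, 1)
print(characteristic_term(p))   # {0: 1, 1: 0, 2: 1}, i.e. x1 ∧ ¬x2 ∧ x3
print(covers({1: 0}, p))        # True: the single literal ¬x2 covers p
```
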
slide44

Definitions (5)

  • A term T is an implicant of a boolean function f if T(p) ≤ f(p) for all points p.
  • An implicant is called prime if it is minimal with respect to its degree (no literal can be removed).
slide45

Definitions (6)

  • A positive pattern is a term covering at least one positive example and no negative example.
  • A negative pattern is a term covering at least one negative example and no positive example.
slide47

Example

  • The term is a positive pattern:
    • There is no negative example it covers.
    • There is one positive example it covers: the 3rd line.
  • It is a positive prime pattern:
    • Dropping the first literal yields a term covering one negative example: the 4th line.
    • Dropping the second literal yields a term covering one negative example: the 5th line.
slide48

Pattern generation

  • Symmetry between positive and negative patterns.
  • Two approaches :
    • Top-down
    • Bottom-up
slide49

Top-down

  • We associate each positive example with its characteristic term → it is a pattern.
  • We take out the literals one by one until we obtain a prime pattern.
slide50

Bottom-up

  • We begin with terms of degree one:
    • If a term does not cover a negative example, it is a pattern.
    • If not, we add literals until we obtain a pattern.
slide51

Objectives

  • We prefer short patterns → simplicity principle.
  • We also want to cover the maximum number of examples with a single model → globality principle.
  • Hence a hybrid bottom-up / top-down approach.
slide52

Hybrid approach

  • We fix a degree D.
  • We start with a bottom-up approach to generate the patterns of degree lower than or equal to D.
  • For all the points not covered by the first phase, we proceed with the top-down approach.
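
A sketch of the hybrid scheme under simplifying assumptions: exhaustive bottom-up enumeration, a contradiction-free archive, points as 0/1 tuples, and terms encoded as in the earlier sketch:

```python
from itertools import combinations, product

def is_pattern(term, pos, neg):
    """Positive pattern: covers >= 1 positive point and no negative one."""
    cov = lambda p: all(p[i] == v for i, v in term.items())
    return any(cov(p) for p in pos) and not any(cov(p) for p in neg)

def bottom_up(pos, neg, D):
    """Phase 1: enumerate all positive patterns of degree <= D."""
    n = len(pos[0])
    found = []
    for d in range(1, D + 1):
        for idx in combinations(range(n), d):
            for vals in product((0, 1), repeat=d):
                term = dict(zip(idx, vals))
                if is_pattern(term, pos, neg):
                    found.append(term)
    return found

def top_down(point, pos, neg):
    """Phase 2: shrink the characteristic term of `point` to a prime pattern."""
    term = dict(enumerate(point))
    for i in list(term):
        shorter = {j: v for j, v in term.items() if j != i}
        if shorter and is_pattern(shorter, pos, neg):
            term = shorter
    return term

def hybrid(pos, neg, D=2):
    patterns = bottom_up(pos, neg, D)
    covered = {p for p in pos
               if any(all(p[i] == v for i, v in t.items()) for t in patterns)}
    # Top-down pass for positive points the short patterns missed.
    patterns += [top_down(p, pos, neg) for p in pos if p not in covered]
    return patterns
```
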
slide53

Extension to the non-binary case

  • Extension from the binary case: binarization.
  • Two types of data:
    • quantitative: age, height, …
    • qualitative: color, shape, …
slide54

Qualitative data

  • For each value v that a qualitative attribute x can take, we associate a boolean variable b(x,v):
    • b(x,v) = 1 if x = v
    • b(x,v) = 0 otherwise
slide55

Quantitative data

  • There are two types of associated variables:
    • Level variables
    • Interval variables
slide56

Level variables

  • For each attribute x and each cut point t, we introduce a boolean variable b(x,t) :
    • b(x,t) = 1 if x ≥ t
    • b(x,t) = 0 if x < t
slide57

Interval variables

  • For each attribute x and each pair of cut points t’, t’’ (t’<t’’), we introduce a boolean variable b(x,t’,t’’) :
    • b(x,t’,t’’) = 1 if t’ ≤ x < t’’
    • b(x,t’,t’’) = 0 otherwise
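
The three binarization schemes side by side, as a sketch (function names and label strings are mine):

```python
from itertools import combinations

def qualitative_vars(x, values):
    """b(x,v) = 1 iff x = v: one boolean variable per possible value."""
    return {f"b(x={v})": int(x == v) for v in values}

def level_vars(x, cuts):
    """b(x,t) = 1 iff x >= t: one boolean variable per cut point."""
    return {f"b(x>={t})": int(x >= t) for t in cuts}

def interval_vars(x, cuts):
    """b(x,t',t'') = 1 iff t' <= x < t'': one variable per pair of cuts."""
    return {f"b({a}<=x<{b})": int(a <= x < b)
            for a, b in combinations(sorted(cuts), 2)}

# A height of 170 with the cut points 165 and 175 from the RST example:
print(level_vars(170, [165, 175]))     # {'b(x>=165)': 1, 'b(x>=175)': 0}
print(interval_vars(170, [165, 175]))  # {'b(165<=x<175)': 1}
```
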
slide66

Supporting set

  • A set of binary attributes is called a supporting set if the archive obtained by eliminating all the other attributes remains “contradiction-free”.
  • A supporting set is irredundant if no proper subset of it is a supporting set.
slide67

Variables

  • We associate to each binary attribute a 0–1 variable y, with y = 1 if and only if the attribute belongs to the supporting set.
  • Application: elements a and e differ on attributes 1, 2, 4, 6, 9, 11, 12 and 13, giving the constraint y₁ + y₂ + y₄ + y₆ + y₉ + y₁₁ + y₁₂ + y₁₃ ≥ 1.
slide68

Linear program

  • We write the same constraint for every pair of true and false observations.
  • Exponential number of solutions: we choose the smallest set, i.e. we minimize the sum of the y variables.
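
This is a set-covering program: minimize Σ y subject to one constraint per (positive, negative) pair. A greedy sketch (the slides use an exact program; greedy is my substitution, and a contradiction-free archive is assumed):

```python
def supporting_set(pos, neg):
    """Greedy small supporting set for 0/1 tuples `pos` and `neg`."""
    n = len(pos[0])
    # One 'constraint' per pair: the attributes on which the pair differs.
    pairs = [{j for j in range(n) if p[j] != q[j]} for p in pos for q in neg]
    chosen, uncovered = set(), list(pairs)
    while uncovered:
        # Pick the attribute separating the most still-uncovered pairs.
        best = max(range(n), key=lambda j: sum(j in d for d in uncovered))
        chosen.add(best)
        uncovered = [d for d in uncovered if best not in d]
    # Remove attributes made redundant by later choices (irredundancy).
    for j in sorted(chosen):
        if all(d & (chosen - {j}) for d in pairs):
            chosen.discard(j)
    return chosen
```
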
slide69

Solution of our example

  • Positive patterns:
  • Negative patterns:
slide70

Contents

Rough Sets Theory

Logical Analysis Of data

Comparison

Inconsistencies

slide71

Basic idea

  • LAD is more flexible than RST.
  • Linear program → modification of parameters.
slide72

Comparison: blocks / variables

  • RST: pairs (attribute, value)
  • LAD: binary variables
  • Correspondence?
slide73

Qualitative data

  • For an attribute a taking the values:
slide74

Quantitative data

  • Discretization: convert numerical data into discrete data.
  • Principle: determination of cut points in order to divide domains into successive intervals:
slide75

Quantitative data

  • RST: for each cut point, we have two blocks:
slide76

Quantitative data

  • LAD: for each cut point, we have a level variable:
    • ...
slide77

Quantitative data

  • LAD: for each pair of cut points, we have an interval variable:
    • ...
slide78

Quantitative data

  • Correspondence:
    • Level variable:
slide79

Quantitative data

  • Correspondence:
    • Interval variable:
slide80

Variation of LP parameters

  • Three parameters can change:
    • The right-hand side of the constraints
    • The coefficients of the objective function
    • The coefficients of the left-hand side of the constraints
slide81

Heuristics adaptation

  • We try to adapt the three heuristics:
    • The highest priority
    • The highest intersection with the concept
    • The smallest cardinality
slide82

The highest priority

  • Priority on blocks → priority on attributes
  • Introduced as weights in the objective function
  • Minimization: pairs with the highest priorities are chosen first
slide83

The highest intersection

  • Problem: in LAD there is no notion of concept; everything is done symmetrically, at the same time.
slide84

The highest intersection

  • Modification of the heuristic: difference between the intersection with one concept and the intersection with the other.
  • The higher, the better.
slide85

The highest intersection

  • Goal of RST: find minimal complexes:
    • Find blocks covering the most examples of the concept: the highest possible intersection with the concept
    • Find blocks covering the fewest examples of the other concept: the difference of intersections
slide86

The highest intersection

  • For LAD: difference between the number of times a variable takes the value 1 in the positive set and in the negative set.
  • Introduced as weights in the constraints: we choose first the variable with the highest difference.
slide87

The smallest cardinality

  • Simple: the number of times a variable takes the value 1.
  • Introduced as a weight in the constraints.
slide88

Weight of the constraints

  • Two quantities to be introduced:
    • The highest difference
    • The smallest cardinality
  • Difference of the two quantities
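
The two per-variable quantities can be computed as below; how they are combined into a single weight is only hinted at on the slide, so this sketch just returns both:

```python
def heuristic_quantities(pos, neg):
    """For each binary variable j, the two quantities named above:
    diff[j] = (#times j is 1 in pos) - (#times j is 1 in neg),
    card[j] = total #times j is 1. Their combination into one LP
    weight is left open in the transcript."""
    n = len(pos[0])
    diff, card = [], []
    for j in range(n):
        ones_pos = sum(p[j] for p in pos)
        ones_neg = sum(q[j] for q in neg)
        diff.append(ones_pos - ones_neg)
        card.append(ones_pos + ones_neg)
    return diff, card
```
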
slide89

Right hand side of the constraints

  • Before: everything is 1.
  • Problem: modifying only the weights of the left-hand side has no meaning.
slide90

Ideas of modification

  • Average of the weights compared to the number of attributes.
  • Average of the weights in each constraint.
  • Drawback: no real meaning.
slide91

Ideas of modification

  • Do not touch the weights in the constraints: introduce everything into the coefficients of the objective function:
slide92

Contents

Rough Sets Theory

Logical Analysis Of data

Comparison

Inconsistencies

slide93

For RST

  • Use of the two approximations: lower and upper.
  • Rule generation: certain and possible.
slide94

For LAD

  • Classification mistakes: a positive point classified as negative, or the other way around.
  • Two different cases.
slide95

Pos. point classified as neg.

  • All other points are well classified: our point will not be covered.
  • If the number of non-covered points is high: generation of longer patterns.
  • If this number is small: erroneous classification, and we ignore these points in what follows.
slide96

Neg. point classified as pos.

  • Terms covering a lot of positive points also cover some negative points.
  • These are probably wrongly classified: they are not taken into account for the evaluation of candidate terms.
slide97

Ratio

  • We introduce a ratio.
  • A term remains a candidate if the ratio between the negative and positive points it covers is smaller than:
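
Sketch of the filter; the transcript's threshold formula is lost, so it stays a free parameter here:

```python
def still_candidate(term, pos, neg, threshold):
    """Keep a term if (#negatives covered) / (#positives covered) is
    smaller than `threshold` (the original threshold formula is not
    recovered; it is an assumed parameter)."""
    cov = lambda p: all(p[i] == v for i, v in term.items())
    covered_pos = sum(map(cov, pos))
    covered_neg = sum(map(cov, neg))
    return covered_pos > 0 and covered_neg / covered_pos < threshold
```
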
slide98

Inconsistencies and mistakes

  • An inconsistency can be considered as a classification mistake.
  • Inconsistency: two “identical” objects classified differently.
  • One of them is wrongly classified (approximations).
slide99

Equivalence?

  • Let us consider an inconsistency in LAD:
    • two points:
    • two classes:
  • There are two possibilities:
    • the point is not covered by small-degree patterns
    • the point is covered by the patterns of the other class
slide100

1st case

  • We have only one inconsistency.
  • The covered point is isolated; it is not taken into account.
  • The patterns of the class will be generated without the inconsistency point

→ lower approximation

slide101

2nd case

  • A point covered by the other concept’s patterns is wrongly classified.
  • It is not taken into account for the candidate terms.
  • It is not taken into account for the pattern generation of its own class

→ lower approximation

slide102

2nd case

  • It is not taken into account for one class, but that is not a problem for the other.
  • For the other class: upper approximation.
slide103

Equivalence?

  • According to a ratio, LAD decides whether a point is well classified or not.
  • For an inconsistency, this is the same as considering:
    • The upper approximation of one class
    • The lower approximation of the other
  • With more than one inconsistency: we re-classify the points.
slide104

Conclusion

  • Complete data: we can try to match LAD and RST.
  • Inconsistencies: LAD’s classification mistakes can correspond to approximations.
  • Missing data: different management.
slide105

Sources (1)

  • Jerzy W. Grzymala-Busse, MLEM2 – Discretization During Rule Induction, Proceedings of IIPWM'2003, International Conference on Intelligent Information Processing and Web Mining Systems, Zakopane, Poland, June 2-5, 2003, 499-508, Springer-Verlag.
  • Jerzy W. Grzymala-Busse, Jerzy Stefanowski, Three Discretization Methods for Rule Induction, International Journal of Intelligent Systems, 2001.
  • Endre Boros, Peter L. Hammer, Toshihide Ibaraki, Alexander Kogan, Eddy Mayoraz, Ilya Muchnik, An Implementation of Logical Analysis of Data, RUTCOR Research Report 22-96, 1996.
slide106

Sources (2)

  • Endre Boros, Peter L. Hammer, Toshihide Ibaraki, Alexander Kogan, Logical Analysis of Numerical Data, RUTCOR Research Report 04-97, 1997.
  • Jerzy W. Grzymala-Busse, Rough Set Strategies to Data with Missing Attribute Values, Proceedings of the Workshop on Foundations and New Directions of Data Mining, Melbourne, FL, USA, 2003.
  • Jerzy W. Grzymala-Busse, Sachin Siddhaye, Rough Set Approaches to Rule Induction from Incomplete Data, Proceedings of IPMU'2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, Italy, July 4, 2004, 2: 923-930.
slide107

Sources (3)

  • Jerzy Stefanowski, Daniel Vanderpooten, Induction of Decision Rules in Classification and Discovery-Oriented Perspectives, International Journal of Intelligent Systems, 16(1), 2001, 13-28.
  • Jerzy Stefanowski, The Rough Set Based Rule Induction Technique for Classification Problems, Proceedings of the 6th European Conference on Intelligent Techniques and Soft Computing EUFIT'98, Aachen, 7-10 September 1998, 109-113.
  • Roman Slowinski, Jerzy Stefanowski, Salvatore Greco, Benedetto Matarazzo, Rough Sets Processing of Inconsistent Information in Decision Analysis, Control and Cybernetics, 29, 2000, 379-404.