## PowerPoint Slideshow about 'Johanna GOLD' - kaori


Presentation Transcript

- Comparison of two theories for rule induction.
- Different methodologies
- Same results?

- Set of objects described by attributes.
- Each object belongs to a class.
- We want decision rules.

- There are two approaches:
- Rough Sets Theory (RST)
- Logical Analysis of Data (LAD)
- Goal : compare them

- Two examples having the exact same values in all attributes, but belonging to two different classes.
- Example: two sick people have the same symptoms but different diseases.

- RST doesn’t correct or aggregate inconsistencies.
- For each class : determination of lower and upper approximations.

- Lower: objects that certainly belong to the class.
- Upper: objects that may belong to the class.

- Lower approximation → certain rules
- Upper approximation → possible rules
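The two approximations can be sketched in a few lines. This is a minimal illustration, not the presentation's own code: the patient table, the attribute values and the function name `approximations` are all hypothetical, chosen to mirror the "same symptoms, different diseases" example.

```python
from collections import defaultdict

def approximations(cases, target):
    """cases: {name: (attribute_tuple, decision)}; returns (lower, upper) for `target`."""
    # Group cases that are indiscernible, i.e. share every attribute value.
    groups = defaultdict(set)
    for name, (attrs, _) in cases.items():
        groups[attrs].add(name)
    concept = {name for name, (_, decision) in cases.items() if decision == target}
    lower, upper = set(), set()
    for members in groups.values():
        if members <= concept:      # whole group inside the class: certain
            lower |= members
        if members & concept:       # group touches the class: possible
            upper |= members
    return lower, upper

# Hypothetical data: p1 and p2 are inconsistent (same symptoms, different disease).
cases = {
    "p1": (("fever", "cough"), "flu"),
    "p2": (("fever", "cough"), "cold"),
    "p3": (("fever", "no cough"), "flu"),
    "p4": (("no fever", "cough"), "cold"),
}
lower, upper = approximations(cases, "flu")
print(sorted(lower))  # ['p3']
print(sorted(upper))  # ['p1', 'p2', 'p3']
```

Certain rules for "flu" would then be induced from `lower` and possible rules from `upper`.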

- Rule induction on numerical data → poor rules → too many rules.
- Need for pretreatment.

- Goal : convert numerical data into discrete data.
- Principle : determination of cut points in order to divide domains into successive intervals.

- First algorithm: LEM2
- Improved algorithms:
- Include the pretreatment
- MLEM2, MODLEM, …

- Induction of certain rules from the lower approximation.
- Induction of possible rules from the upper approximation.
- Same procedure

- For an attribute x and its value v, the block [(x,v)] of the attribute-value pair (x,v) is the set of all cases where attribute x has value v.
- Ex: [(Age,21)] = {Martha}, [(Age,22)] = {David, Audrey}
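As a small sketch, blocks can be computed by inverting a decision table. The table layout (a dict of rows) and the function name `attribute_value_blocks` are assumptions made for illustration; only the Age values come from the slide.

```python
from collections import defaultdict

def attribute_value_blocks(table):
    """Map each (attribute, value) pair to the set of cases that have it."""
    blocks = defaultdict(set)
    for case, row in table.items():
        for attribute, value in row.items():
            blocks[(attribute, value)].add(case)
    return dict(blocks)

# Ages taken from the slide's example.
table = {
    "Martha": {"Age": 21},
    "David": {"Age": 22},
    "Audrey": {"Age": 22},
}
blocks = attribute_value_blocks(table)
print(sorted(blocks[("Age", 21)]))  # ['Martha']
print(sorted(blocks[("Age", 22)]))  # ['Audrey', 'David']
```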

- Let B be a non-empty lower or upper approximation of a concept represented by a decision-value pair (d,w).
- Ex : (level,middle)→B=[obj1 ; obj5 ; obj7]

- Let T be a set of attribute-value pairs (a,v).
- Set B depends on set T if and only if:

$$\emptyset \neq [T] = \bigcap_{(a,v) \in T} [(a,v)] \subseteq B$$

- A set T is a minimal complex of B if and only if B depends on T and there is no proper subset T’ of T such that B depends on T’.

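- Let $\mathcal{T}$ be a non-empty collection of non-empty sets of attribute-value pairs.
- $\mathcal{T}$ is a set of sets T.
- Each T is a set of pairs (a,v).

- $\mathcal{T}$ is a local cover of B if and only if:
- Each member T of $\mathcal{T}$ is a minimal complex of B.
- The blocks of its members together cover B: $\bigcup_{T \in \mathcal{T}} [T] = B$.
- $\mathcal{T}$ is minimal, i.e. it has the smallest possible number of members.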

principle

- LEM2’s output is a local cover for each approximation of the decision table concept.
- It then converts them into decision rules.

Among the possible blocks, we choose the one:

- With the highest priority
- With the highest intersection
- With the smallest cardinality

- As long as it is not a minimal complex, pairs are added.
- As long as there is not a local cover, minimal complexes are added.

- Illustration through an example.
- We consider that the pretreatment has already been done.

- For the attribute Height, we have the values 160, 170 and 180.
- The pretreatment gives us two cut points: 165 and 175.
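One common way to obtain such cut points, assumed here because the slide does not say how they were computed, is to take the midpoints between consecutive distinct values. Each cut point then splits the attribute's domain into two intervals. The function names are illustrative only.

```python
def cut_points(values):
    """Midpoints between consecutive distinct values (an assumed strategy)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

def intervals_per_cut(values):
    """For each cut point c, the two intervals (min..c) and (c..max)."""
    low, high = min(values), max(values)
    return {c: ((low, c), (c, high)) for c in cut_points(values)}

heights = [160, 170, 180]
print(cut_points(heights))         # [165.0, 175.0]
print(intervals_per_cut(heights))
```

With these values the intervals are 160..165 and 165..180 for the first cut point, and 160..175 and 175..180 for the second.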

- [(Height, 160..165)]={1,3,5}
- [(Height, 165..180)]={2,4}
- [(Height, 160..175)]={1,2,3,5}
- [(Height, 175..180)]={4}
- [(Hair, Blond)]={1,2}
- [(Hair, Red)]={3}
- [(Hair, Black)]={4,5,6}

- G = B = [(Attraction,-)] = {1,4,5,6}
- Here there are no inconsistencies. If there were any, this is the point at which we would have to choose between the lower and the upper approximation.

- Pairs (a,v) such that [(a,v)]∩[(Attraction,-)]≠Ø
- (Height,160..165)
- (Height,165..180)
- (Height,160..175)
- (Height,175..180)
- (Hair,Blond)
- (Hair,Black)

- We choose the most appropriate pair, that is, the pair (a,v) for which |[(a,v)] ∩ [(Attraction,-)]| is the highest.

- Here : (Hair, Black)

- The pair (Hair, Black) is a minimal complex because [(Hair, Black)] = {4,5,6} ⊆ [(Attraction,-)] = {1,4,5,6}.

- Case 1 is still not covered, so we go through the pairs whose blocks contain it: (Height,160..165), (Height,160..175) and (Hair, Blond).
- The intersections have the same cardinality, so we choose the pair whose block has the smallest cardinality:

(Hair, Blond)

- Problem:
- (Hair, Blond) is not a minimal complex: its block {1,2} is not contained in [(Attraction,-)].
- We add the next pair:

(Height,160..165).

- {(Hair, Blond),(Height,160..165)} is a second minimal complex.

- {{(Hair, Black)}, {(Hair, Blond), (Height, 160..165)}}

is a local cover of [(Attraction,-)].

- (Hair, Red) → (Attraction,+)
- (Hair, Blond) & (Height,165..180 ) → (Attraction,+)
- (Hair, Black) → (Attraction,-)
- (Hair, Blond) & (Height,160..165 ) → (Attraction,-)
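The walk-through above can be reproduced with a compact sketch of LEM2. It is a simplified reading of the algorithm, not a reference implementation: it uses only two of the three selection criteria (highest intersection, then smallest block; priorities are ignored), breaks remaining ties by name order, and assumes B is consistent with the blocks. The block sets are copied exactly as listed on the slide.

```python
BLOCKS = {
    ("Height", "160..165"): {1, 3, 5},
    ("Height", "165..180"): {2, 4},
    ("Height", "160..175"): {1, 2, 3, 5},
    ("Height", "175..180"): {4},
    ("Hair", "Blond"): {1, 2},
    ("Hair", "Red"): {3},
    ("Hair", "Black"): {4, 5, 6},
}

def block_of(pairs, blocks):
    """[T]: intersection of the blocks of all pairs in T."""
    return set.intersection(*(blocks[p] for p in pairs))

def lem2(concept, blocks):
    """Return a local cover of `concept` as a list of sets of pairs."""
    B = set(concept)
    G = set(B)                          # cases still to be covered
    cover = []
    while G:
        T = []
        candidates = {p for p in blocks if blocks[p] & G}
        while not T or not block_of(T, blocks) <= B:
            # Highest intersection with G, then smallest block, then name order.
            best = min(candidates,
                       key=lambda p: (-len(blocks[p] & G), len(blocks[p]), p))
            T.append(best)
            G = blocks[best] & G
            candidates = {p for p in blocks if blocks[p] & G} - set(T)
        # Drop pairs that are not needed, so that T is a minimal complex.
        for p in list(T):
            rest = [q for q in T if q != p]
            if rest and block_of(rest, blocks) <= B:
                T = rest
        cover.append(set(T))
        covered = set().union(*(block_of(t, blocks) for t in cover))
        G = B - covered
    # Drop minimal complexes that are not needed, so that the cover is minimal.
    for t in list(cover):
        rest = [u for u in cover if u is not t]
        if rest and set().union(*(block_of(u, blocks) for u in rest)) == B:
            cover = rest
    return cover

negative_cover = lem2({1, 4, 5, 6}, BLOCKS)   # concept (Attraction,-)
positive_cover = lem2({2, 3}, BLOCKS)         # concept (Attraction,+)
print(negative_cover)
print(positive_cover)
```

Each set in a cover becomes the condition part of one rule, which gives the four rules listed above.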

- Works on binary data.
- Extension of the Boolean approach to the non-binary case.

- Let S be the set of all observations.
- Each observation is described by n attributes.
- Each observation belongs to a class.

- The classification can be considered as a partition into two sets
- An archive is represented by a boolean function Φ :

- A literal is a boolean variable or its negation:
- A term is a conjunction of literals :
- The degree of a term is the number of literals.

- A term T covers a point p if T(p) = 1.

- A characteristic term of a point p is the unique term of degree n covering p.
- Ex :

- A term T is an implicant of a boolean function f if T(p) ≤ f(p) for all points p.
- An implicant is called prime if it is minimal with respect to its degree: no literal can be removed from it while it remains an implicant.

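- A positive pattern is a term covering at least one positive example and no negative example; it is prime if no literal can be removed without it covering a negative example.
- A negative pattern is a term covering at least one negative example and no positive example; it is prime under the symmetric condition.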
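These definitions can be checked mechanically. The sketch below represents a term as a dict from variable index to required value, and takes a prime pattern to be a pattern from which no literal can be removed without covering an example of the other class. The observations and all function names are hypothetical.

```python
def covers(term, point):
    """term: {variable index: required 0/1 value}; point: tuple of 0/1 values."""
    return all(point[i] == v for i, v in term.items())

def is_pattern(term, own, other):
    """Covers at least one example of its own class and none of the other."""
    return (any(covers(term, p) for p in own)
            and not any(covers(term, q) for q in other))

def is_prime_pattern(term, own, other):
    """A pattern in which dropping any single literal covers the other class."""
    if not is_pattern(term, own, other):
        return False
    return all(
        any(covers({j: v for j, v in term.items() if j != i}, q) for q in other)
        for i in term
    )

# Hypothetical archive over three boolean variables x0, x1, x2.
positives = [(1, 1, 0), (1, 0, 0)]
negatives = [(0, 1, 0), (1, 1, 1)]

short_term = {0: 1, 2: 0}        # x0 AND NOT x2
long_term = {0: 1, 1: 0, 2: 0}   # x0 AND NOT x1 AND NOT x2
print(is_prime_pattern(short_term, positives, negatives))  # True
print(is_pattern(long_term, positives, negatives))         # True
print(is_prime_pattern(long_term, positives, negatives))   # False
```

Swapping the two lists gives the same checks for negative patterns.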

- is a positive pattern :
- There is no negative example such as
- There is one positive example : the 3rd line.
- It's a positive prime pattern :
- covers one negative example : 4th line.
- covers one negative example : 5th line.

- symmetry between positive and negative patterns.
- Two approaches :
- Top-down
- Bottom-up

- Top-down: we associate each positive example with its characteristic term, which is a pattern.
- We remove literals one by one until we obtain a prime pattern.
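A minimal sketch of this reduction, assuming the archive is contradiction-free so that the characteristic term covers no negative example. Literals are tried in index order, which is an arbitrary choice; the data and function names are hypothetical.

```python
def covers(term, point):
    """term: {variable index: required 0/1 value}; point: tuple of 0/1 values."""
    return all(point[i] == v for i, v in term.items())

def top_down(point, negatives):
    """Shrink the characteristic term of `point` into a prime pattern."""
    term = dict(enumerate(point))            # characteristic term, degree n
    for i in list(term):
        shorter = {j: v for j, v in term.items() if j != i}
        # Drop literal i only if the shorter term still avoids every negative.
        if not any(covers(shorter, q) for q in negatives):
            term = shorter
    return term

negatives = [(0, 1, 0), (1, 1, 1)]           # hypothetical negative examples
pattern = top_down((1, 1, 0), negatives)
print(pattern)  # {0: 1, 2: 0}, i.e. x0 AND NOT x2
```

A single pass is enough: a literal that had to be kept once still has to be kept after further literals are removed, because the term only covers more points.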

- Bottom-up: we begin with terms of degree one:
- If a term does not cover any negative example, it is a pattern.
- If not, we add literals until we obtain a pattern.
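The enumeration can be sketched as a brute-force search by increasing degree. This is an illustrative simplification: it lists every term up to a chosen degree rather than extending terms incrementally, and it skips terms that merely extend a shorter pattern already found. The data, the `max_degree` parameter and the function names are assumptions.

```python
from itertools import combinations, product

def covers(term, point):
    """term: {variable index: required 0/1 value}; point: tuple of 0/1 values."""
    return all(point[i] == v for i, v in term.items())

def bottom_up(positives, negatives, max_degree):
    """Positive patterns of degree <= max_degree, shortest first."""
    n = len(positives[0])
    patterns = []
    for degree in range(1, max_degree + 1):
        for indices in combinations(range(n), degree):
            for values in product((0, 1), repeat=degree):
                term = dict(zip(indices, values))
                if any(covers(term, q) for q in negatives):
                    continue        # covers a negative example: needs more literals
                if not any(covers(term, p) for p in positives):
                    continue        # covers no positive example: not a pattern
                if any(found.items() <= term.items() for found in patterns):
                    continue        # extends a shorter pattern: not prime
                patterns.append(term)
    return patterns

positives = [(1, 1, 0), (1, 0, 0)]   # hypothetical archive
negatives = [(0, 1, 0), (1, 1, 1)]
found = bottom_up(positives, negatives, max_degree=2)
print(found)  # [{1: 0}, {0: 1, 2: 0}]
```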

- We prefer short patterns → simplicity principle.
- We also want to cover as many examples as possible with a single pattern → globality principle.
- Hybrid bottom-up / top-down approach.

- We fix a degree D.
- We start with a bottom-up approach to generate the patterns of degree at most D.
- For all the points which are not covered by the 1st phase, we proceed to the top-down approach.

Extension to the non binary case

- Extension from the binary case: binarization.
- Two types of data :
- quantitative : age, height, …
- qualitative : color, shape, …

- For each value v that a qualitative attribute x can take, we associate a boolean variable b(x,v) :
- b(x,v) = 1 if x = v
- b(x,v) = 0 otherwise
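In code this is a one-hot encoding of the attribute. The sketch below is illustrative; the function name is hypothetical and the Hair values are borrowed from the RST example earlier in the slides.

```python
def binarize_qualitative(attribute, domain, observed):
    """One boolean variable b(x,v) per possible value v of attribute x."""
    return {(attribute, v): int(observed == v) for v in domain}

hair_values = ["Blond", "Red", "Black"]
encoded = binarize_qualitative("Hair", hair_values, "Red")
print(encoded)
# {('Hair', 'Blond'): 0, ('Hair', 'Red'): 1, ('Hair', 'Black'): 0}
```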

- For quantitative attributes, there are two types of associated variables:
- Level variables
- Interval variables

- For each attribute x and each cut point t, we introduce a boolean variable b(x,t) :
- b(x,t) = 1 if x ≥ t
- b(x,t) = 0 if x < t

- For each attribute x and each pair of cut points t’, t’’ (t’<t’’), we introduce a boolean variable b(x,t’,t’’) :
- b(x,t’,t’’) = 1 if t’ ≤ x < t’’
- b(x,t’,t’’) = 0 otherwise
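Both kinds of variables follow directly from the definitions. The sketch below is illustrative: the function names are hypothetical, and the Height value and cut points are reused from the RST example.

```python
from itertools import combinations

def level_variables(x, cuts):
    """b(x,t) = 1 if x >= t, one variable per cut point t."""
    return {t: int(x >= t) for t in cuts}

def interval_variables(x, cuts):
    """b(x,t',t'') = 1 if t' <= x < t'', one variable per pair t' < t''."""
    return {(lo, hi): int(lo <= x < hi)
            for lo, hi in combinations(sorted(cuts), 2)}

cuts = [165, 175]
print(level_variables(170, cuts))     # {165: 1, 175: 0}
print(interval_variables(170, cuts))  # {(165, 175): 1}
```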

- A set of binary attributes is called a supporting set if the archive obtained by eliminating all the other attributes remains "contradiction-free".
- A supporting set is irredundant if there is no subset of it which is a supporting set.

- We associate with each binary attribute a 0/1 variable that equals 1 if the attribute belongs to the supporting set.

- Application: elements a and e differ on attributes 1, 2, 4, 6, 9, 11, 12 and 13, so at least one of these attributes must belong to the supporting set:

- We do the same for all pairs of true and false observations :
- Exponential number of solutions : we choose the smallest set :
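Finding the smallest supporting set is a set-covering problem: every pair of one true and one false observation yields the set of attributes on which the two differ, and the chosen attributes must hit every such set. The sketch below solves it by brute force on a tiny hypothetical archive. It only illustrates the constraint structure; it is not the linear-programming formulation the slides refer to.

```python
from itertools import combinations

def minimum_supporting_set(positives, negatives):
    """Smallest set of attribute indices separating every positive/negative pair."""
    n = len(positives[0])
    # One constraint per pair: the attributes on which the two points differ.
    constraints = [
        {i for i in range(n) if p[i] != q[i]}
        for p in positives for q in negatives
    ]
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if all(chosen & c for c in constraints):
                return chosen
    return None  # some positive and negative points are identical

positives = [(1, 1, 0), (1, 0, 0)]   # hypothetical archive
negatives = [(0, 1, 0), (1, 1, 1)]
support = minimum_supporting_set(positives, negatives)
print(support)  # {0, 2}
```

Because subsets are tried by increasing size, the first hit is a smallest supporting set and is therefore irredundant.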

- LAD more flexible than RST
- Linear program -> modification of parameters

- RST : couples (attribute, value)
- LAD : binary variables
- Correspondence?

- For an attribute a taking the values :

- Discretization : convert numerical data into discrete data.
- Principle : determination of cut points in order to divide domains into successive intervals :

- RST : for each cut point, we have two blocks :

- LAD : for each cut point, we have a level variable :
- ...

- LAD : for each pair of cut points, we have an interval variable :
- ...

- Correspondence :
- Level variable :

- Correspondence :
- Interval variable :

- Three parameters can change :
- Right hand side of constraints:
- coefficients of the objective function:
- coefficients of the left hand side of the constraints:

- We try to adapt the three heuristics :
- The highest priority
- The highest intersection with the concept
- The smallest cardinality

- Priority on blocks -> priority on attributes
- Introduction as weights in the objective function
- Minimization : choice of pairs with first priorities

- Problem: LAD has no notion of concept; both classes are handled symmetrically, at the same time.

- Modification of the heuristic : difference between the intersection with a concept and the intersection with the other.
- The higher, the better.

- Goal of RST : find minimal complexes:
- Find blocks covering the most examples of the concept : highest possible intersection with the concept
- Find blocks covering the fewest examples of the other concept : difference of intersections

- For LAD: difference between the number of times a variable takes the value 1 among the positive observations and among the negative observations.

- Introduction as weights in the constraints : we choose first the variable with the highest difference.

- Simple : number of times a variable takes the value 1.
- Introduction as weight in the constraints.

- Two calculations to be introduced :
- The highest difference
- The smallest cardinality
- Difference of the two calculations

Right hand side of the constraints

- Before : everything is 1.
- Problem: modifying the weights of the left-hand side then has no meaning.

- Average of compared to the number of attributes.
- Average of in each constraint
- Drawback: no real meaning

- Do not touch the weights in the constraints: put everything into the coefficients of the objective function:

- Use of two approximations : lower and upper.
- Rules generation: sure and possible.

- Classification mistakes: a positive point classified as negative, or the other way around.
- Two different cases.

classified as neg.

- All other points are correctly classified: our point will not be covered.
- If the number of non-covered points is high: generation of longer patterns.
- If this number is small: erroneous classification, and we ignore these points in what follows.

classified as pos.

- Terms covering a lot of positive points : also some negative points.
- Probably wrongly classified: not taken into account for the evaluation of candidate terms.

- We introduce a ratio.
- A term is still a candidate if the ratio between negative and positive points is smaller than:

- An inconsistency can be considered as a classification mistake.
- Inconsistency: two « identical » objects classified differently.
- One of them is wrongly classified (approximations)

- Let us consider an inconsistency in LAD :
- two points :
- two classes :
- There are two possibilities :
- is not covered by small degree patterns
- is covered by patterns of

- We have only one inconsistency.
- The covered point is isolated; it is not taken into account.
- Patterns of will be generated without the inconsistent point

-> lower approximation

- A point covered by the other concept patterns is wrongly classified.
- It’s not taken into account for the candidate terms.
- It’s not taken into account for the pattern generation of

-> lower approximation

- Not taken into account for but not a problem for
- For : upper approximation

- According to a ratio, LAD decides whether a point is correctly classified or not.
- For an inconsistency, this is the same as considering:
- The upper approximation of a class
- The lower approximation of the other
- With more than one inconsistency: we re-classify the points.

- Complete data : we can try to match LAD and RST.
- Inconsistencies : classification mistakes of LAD can correspond to approximations.
- Missing data : different management

- Jerzy W. Grzymala-Busse, MLEM2 - Discretization During Rule Induction, Proceedings of the IIPWM'2003, International Conference on Intelligent Information Processing and WEB Mining Systems, Zakopane, Poland, June 2-5, 2003, 499-508. Springer-Verlag.
- Jerzy W. Grzymala-Busse, Jerzy Stefanowski, Three Discretization Methods for Rule Induction, International Journal of Intelligent Systems, 2001.
- Endre Boros, Peter L. Hammer, Toshihide Ibaraki, Alexander Kogan, Eddy Mayoraz, Ilya Muchnik, An Implementation of Logical Analysis of Data, Rutcor Research Report 22-96, 1996.

- Endre Boros, Peter L. Hammer, Toshihide Ibaraki, Alexander Kogan, Logical Analysis of Numerical Data, Rutcor Research Report 04-97, 1997.
- Jerzy W. Grzymala-Busse, Rough Set Strategies to Data with Missing Attribute Values, Proceedings of the Workshop on Foundation and New Directions in Data Mining, Melbourne, FL, USA, 2003.
- Jerzy W. Grzymala-Busse, Sachin Siddhaye, Rough Set Approaches to Rule Induction from Incomplete Data, Proceedings of the IPMU'2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, Italy, July 4, 2004, 2: 923-930.

- Jerzy Stefanowski, Daniel Vanderpooten, Induction of Decision Rules in Classification and Discovery-Oriented Perspectives, International Journal of Intelligent Systems, 16 (1), 2001, 13-28.
- Jerzy Stefanowski, The Rough Set based Rule Induction Technique for Classification Problems, Proceedings of 6th European Conference on Intelligent Techniques and Soft Computing EUFIT 98, Aachen 7-10 Sept., (1998) 109-113.
- Roman Slowinski, Jerzy Stefanowski, Salvatore Greco, Benedetto Matarazzo, Rough Sets Processing of Inconsistent Information in Decision Analysis, Control and Cybernetics 29, 379-404, 2000.
