slide1

Mining Association Rules with

Rough Sets and Large Itemsets

- A Comparative Study

Daniel Delic, Hans-J. Lenz, Mattis Neiling

Free University of Berlin

Institute of Applied Computer Science

Garystr. 21, D-14195 Berlin, Germany

slide2

Task

  • Comparison of two different methods for the extraction of association rules:
    • Large itemset method (e.g. Apriori)
    • Rough set method

Interesting Questions

  • Are there any differences/similarities between the extracted rules?
  • If so:
    • Which method leads to better rules?
    • Could a combination of both procedures improve the quality of the derived rules?

1 INTRODUCTION

slide3

Introduction

  • Large Itemset Method
  • Rough Set Method
  • Comparison of the Procedures
  • Hybrid Procedure Apriori+
  • Summary
  • Outlook
  • References
slide4

LARGE ITEMSET METHOD

2 LARGE ITEMSET METHOD

slide5

Type of analyzable data

  • "Market basket data" → attributes with boolean domains
  • Stored in a table → each row represents a market basket

2 LARGE ITEMSET METHOD

slide6

Step 1

  • Large k-itemset generation with Apriori
  • Minimum support: 40%

  • Candidate 1-Itemsets
  • Spaghetti → support = 3 = 60%
  • Tomato Sauce → support = 3 = 60%
  • Bread → support = 3 = 60%
  • Butter → support = 1 = 20%

2 LARGE ITEMSET METHOD

slide7

Step 2

  • Large 1-Itemsets
  • Spaghetti
  • Tomato Sauce
  • Bread
  • Candidate 2-Itemsets
  • {Spaghetti, Tomato Sauce} → support = 2 = 40%
  • {Spaghetti, Bread} → support = 2 = 40%
  • {Tomato Sauce, Bread} → support = 2 = 40%

2 LARGE ITEMSET METHOD

slide8

Step 3

  • Large 2-Itemsets
  • {Spaghetti, Tomato Sauce}
  • {Spaghetti, Bread}
  • {Tomato Sauce, Bread}
  • Candidate 3-Itemsets
  • {Spaghetti, Tomato Sauce, Bread} → support = 1 = 20%

  • Large 3-Itemsets
  • { }

2 LARGE ITEMSET METHOD
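The candidate generation and pruning of Steps 1–3 can be sketched as follows. The five baskets are hypothetical, chosen only so that the support counts match the ones on the slides (spaghetti, tomato sauce and bread occur three times each, butter once):

```python
# Hypothetical baskets reproducing the slide's support counts.
baskets = [
    {"spaghetti", "tomato sauce", "bread"},
    {"spaghetti", "tomato sauce"},
    {"spaghetti", "bread"},
    {"tomato sauce", "bread"},
    {"butter"},
]

def apriori(baskets, minsup):
    """Return every large itemset together with its relative support."""
    n = len(baskets)
    # Candidate 1-itemsets: every item that occurs in any basket
    candidates = list({frozenset([i]) for b in baskets for i in b})
    large = {}
    while candidates:
        # Count how many baskets contain each candidate
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        survivors = [c for c, cnt in counts.items() if cnt / n >= minsup]
        large.update((c, counts[c] / n) for c in survivors)
        # Join step: (k+1)-candidates are unions of two surviving k-itemsets
        k = len(candidates[0]) + 1
        candidates = list({a | b for a in survivors for b in survivors
                           if len(a | b) == k})
    return large

large = apriori(baskets, minsup=0.40)
# The only 3-candidate, {spaghetti, tomato sauce, bread}, has support
# 20% < 40%, so the set of large 3-itemsets is empty, as on the slides.
```

Butter is pruned in the first round, so no candidate containing it is ever generated again; this pruning is what makes Apriori practical on large tables.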

slide9

Step 4

  • Association Rules

Scheme: If subset then large k-itemset
with support s and confidence c

  • s = (support of large k-itemset) / (total count of tuples)
  • c = (support of large k-itemset) / (support of subset)

  • Example
  • Total count of tuples = 5

Large 2-itemset = {Spaghetti, Tomato Sauce}

  • Support (Spaghetti, Tomato Sauce) = 2

Subsets = { {Spaghetti}, {Tomato Sauce} }

  • Support (Spaghetti) = 3
  • Support (Tomato Sauce) = 3

Scheme: If {Spaghetti} then {Spaghetti, Tomato Sauce}

Rule: If Spaghetti then Tomato Sauce

Support: s = 2 / 5 = 0.4 = 40%

Confidence: c = 2 / 3 ≈ 0.66 = 66%

2 LARGE ITEMSET METHOD
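Step 4's two measures follow directly from the itemset supports. A minimal sketch, over hypothetical baskets chosen to match the counts on the slides:

```python
# Hypothetical baskets matching the slide's support counts.
baskets = [
    {"spaghetti", "tomato sauce", "bread"},
    {"spaghetti", "tomato sauce"},
    {"spaghetti", "bread"},
    {"tomato sauce", "bread"},
    {"butter"},
]

def rule_measures(antecedent, consequent, baskets):
    """Support and confidence of the rule: antecedent -> consequent."""
    n = len(baskets)
    itemset = antecedent | consequent
    sup_itemset = sum(1 for b in baskets if itemset <= b)
    sup_ante = sum(1 for b in baskets if antecedent <= b)
    return sup_itemset / n, sup_itemset / sup_ante

s, c = rule_measures({"spaghetti"}, {"tomato sauce"}, baskets)
# s = 2/5 = 0.4 and c = 2/3, matching the slide's worked example
```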

slide11

Type of analyzable data

  • Attributes which can have more than two values
  • Predefined set of condition attributes and decision attribute(s)
  • Stored in a table → each row contains the values of the predefined attributes

3 ROUGH SET METHOD

slide12

Deriving association rules with rough sets

Step 1

Creating partitions over U

Partition: U divided into subsets (equivalence classes), induced by equivalence relations

3 ROUGH SET METHOD

slide13

Examples of Equivalence relations:

R1 = {(u, v)|u and v have the same temperature}

R2 = {(u, v)|u and v have the same blood pressure}

R3 = {(u, v)|u and v have the same temperature and blood pressure}

R4 = {(u, v)|u and v have the same heart problem}

3 ROUGH SET METHOD

slide14


Partition R3*

Induced by equivalence relation R3 (based on condition attributes)

R3 = {(u, v)|u and v have the same temperature and blood pressure}

R3 → R3* = {X1, X2, X3} with

X1 = {Adams, Brown}, X2 = {Ford}, X3 = {Gill, Bellows}

3 ROUGH SET METHOD

slide15


Partition R4*

Induced by equivalence relation R4 (based on decision attribute(s))

R4 = {(u, v)|u and v have the same heart problem}

R4 → R4* = {Y1, Y2} with

Y1 = {Adams, Brown, Gill}, Y2 = {Ford, Bellows}

3 ROUGH SET METHOD
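Both partitions can be computed by grouping objects that agree on the chosen attributes. The patient table below is hypothetical, reconstructed only to be consistent with the equivalence classes named on the slides (X1 = {Adams, Brown}, X2 = {Ford}, X3 = {Gill, Bellows}; Y1 = {Adams, Brown, Gill}, Y2 = {Ford, Bellows}):

```python
from collections import defaultdict

# Hypothetical patient data consistent with the slides' classes.
patients = {
    "Adams":   {"temperature": "normal", "blood pressure": "low",  "heart problem": "no"},
    "Brown":   {"temperature": "normal", "blood pressure": "low",  "heart problem": "no"},
    "Ford":    {"temperature": "normal", "blood pressure": "high", "heart problem": "yes"},
    "Gill":    {"temperature": "high",   "blood pressure": "high", "heart problem": "no"},
    "Bellows": {"temperature": "high",   "blood pressure": "high", "heart problem": "yes"},
}

def partition(table, attributes):
    """One equivalence class per combination of values of the given attributes."""
    classes = defaultdict(set)
    for name, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(name)
    return list(classes.values())

r3_star = partition(patients, ["temperature", "blood pressure"])  # condition attrs
r4_star = partition(patients, ["heart problem"])                  # decision attr
```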

slide16


Step 2

  • Defining the approximation space
  • Overlapping the partitions created by the equivalence relations
  • Result: 3 distinct regions in the approximation space
    • Positive region: POS_S(Yj) = ∪ { Xi | Xi ⊆ Yj } = X1
    • Boundary region: BND_S(Yj) = ∪ { Xi | Xi ∩ Yj ≠ ∅ and Xi ⊄ Yj } = X3
    • Negative region: NEG_S(Yj) = ∪ { Xi | Xi ∩ Yj = ∅ } = X2

3 ROUGH SET METHOD
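The three regions fall out of simple set comparisons between the condition classes Xi and a decision class Yj. A minimal sketch, using the equivalence classes named on the slides:

```python
# Equivalence classes from the slides (hypothetical patient data).
x_classes = [{"Adams", "Brown"}, {"Ford"}, {"Gill", "Bellows"}]
y1 = {"Adams", "Brown", "Gill"}  # decision class Y1: no heart problem

def regions(x_classes, y):
    """Positive, boundary and negative regions of y w.r.t. the classes Xi."""
    pos = set().union(*(x for x in x_classes if x <= y))
    bnd = set().union(*(x for x in x_classes if (x & y) and not (x <= y)))
    neg = set().union(*(x for x in x_classes if not (x & y)))
    return pos, bnd, neg

pos, bnd, neg = regions(x_classes, y1)
# pos == {'Adams', 'Brown'}, bnd == {'Gill', 'Bellows'}, neg == {'Ford'}
```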

slide17


  • Rules from the positive region POS_S(Yj)
  • Example for POS_S(Y1)
  • X1 = {Adams, Brown} ⊆ Y1 = {Adams, Brown, Gill}
  • → Clear rule (confidence 100%, support 40%):
    If temperature normal and blood pressure low then heart problem no

3 ROUGH SET METHOD

slide18


  • Rules from the boundary region BND_S(Yj)
  • Example for BND_S(Y1)
  • X3 = {Gill, Bellows}, X3 ∩ Y1 ≠ ∅ for Y1 = {Adams, Brown, Gill}
  • → Possible rule (support 20%):
    If temperature high and blood pressure high then heart problem no
  • → Confidence: c = |Xi ∩ Yj| / |Xi| = |X3 ∩ Y1| / |X3| = 1 / 2 = 0.5 = 50%

3 ROUGH SET METHOD

slide19


  • Negative region NEG_S(Yj)
  • Example for NEG_S(Y1)
  • X2 = {Ford}, Y1 = {Adams, Brown, Gill}
  • → Since X2 ∩ Y1 = ∅, no rule is derivable from the negative region

3 ROUGH SET METHOD

slide20

Reducts → simplification of rules by removal of unnecessary attributes

Original rule:

If temperature normal and blood pressure low then heart problem no

Simplified (more precise) rule:

If blood pressure low then heart problem no

3 ROUGH SET METHOD
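One simple way to see whether an attribute is dispensable for a rule is to drop it from the antecedent and check that the rule's confidence is unchanged. A sketch over the hypothetical patient table used earlier (rows correspond to Adams, Brown, Ford, Gill, Bellows):

```python
# Hypothetical patient table consistent with the slides.
patients = [
    {"temperature": "normal", "blood pressure": "low",  "heart problem": "no"},
    {"temperature": "normal", "blood pressure": "low",  "heart problem": "no"},
    {"temperature": "normal", "blood pressure": "high", "heart problem": "yes"},
    {"temperature": "high",   "blood pressure": "high", "heart problem": "no"},
    {"temperature": "high",   "blood pressure": "high", "heart problem": "yes"},
]

def confidence(table, conditions, decision):
    """Confidence of: IF all condition values match THEN the decision value matches."""
    attr, value = decision
    matching = [r for r in table
                if all(r[a] == v for a, v in conditions.items())]
    if not matching:
        return 0.0
    return sum(r[attr] == value for r in matching) / len(matching)

full = confidence(patients,
                  {"temperature": "normal", "blood pressure": "low"},
                  ("heart problem", "no"))
reduced = confidence(patients,
                     {"blood pressure": "low"},
                     ("heart problem", "no"))
# Both are 1.0: temperature can be removed without weakening this rule.
```

The same function reproduces the 50% confidence of the boundary-region rule when called with temperature high and blood pressure high as conditions.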

slide22

Large Itemsets (market basket data):

  TID | Attributes
  ----+-------------------------
  1   | spaghetti, tomato sauce
  2   | spaghetti, bread

Rough Sets (universe: persons; cond. attributes: blood pressure, ...; dec. attribute(s): heart problem):

  Person | blood pressure | ...
  -------+----------------+-----
  Adams  | low            | ...
  Brown  | medium         | ...
  Ford   | high           | ...

Transformed into bitmaps:

  TID | spaghetti | tomato sauce | bread
  ----+-----------+--------------+------
  1   |     1     |      1       |   0
  2   |     1     |      0       |   1

  TID | bp_low | bp_med | bp_high | ...
  ----+--------+--------+---------+-----
  1   |   1    |   0    |    0    | ...
  2   |   0    |   1    |    0    | ...
  3   |   0    |   0    |    1    | ...

  • Prerequisites for comparison of both methods
  • Modification of the rough set method (RS-Rules)
    → no fixed decision attribute required (RS-Rules+)
  • Compatible data structure → bitmaps

4 DATA TRANSFORMATION
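The bitmap transformation amounts to one-hot encoding each multi-valued attribute into boolean columns. A minimal sketch; the column-name prefix and attribute names are illustrative:

```python
# Rows with a categorical attribute, as in the rough set table above.
rows = [
    {"TID": 1, "blood pressure": "low"},
    {"TID": 2, "blood pressure": "medium"},
    {"TID": 3, "blood pressure": "high"},
]

def to_bitmap(rows, attribute, prefix):
    """One boolean column per observed value of the attribute."""
    values = sorted({r[attribute] for r in rows})
    out = []
    for r in rows:
        bits = {f"{prefix}_{v}": int(r[attribute] == v) for v in values}
        out.append({"TID": r["TID"], **bits})
    return out

bitmap = to_bitmap(rows, "blood pressure", "bp")
# bitmap[0] == {"TID": 1, "bp_high": 0, "bp_low": 1, "bp_medium": 0}
```

After this step every attribute is boolean, so both the large itemset method and the modified rough set method can run on the same table.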

slide23

Computing times²

  Database        | Car Evaluation | Mushroom   | Adult
  Minconfidence   | 10%            | 35%        | 17%
  Minsupport      | 75%            | 90%        | 94%
  Method          | RS+  | Apr     | RS+ | Apr  | RS+ | Apr
  CPU time [min]  | 3.15 | 1.10    | 15  | 2    | 233 | 44

  • Benchmark data sets¹
  • Car Evaluation Database: 1728 tuples, 25 bitmap attributes
  • Mushroom Database: 8416 tuples, 12 original attributes selected, 68 bitmap attributes
  • Adult: 32561 tuples, 12 original attributes selected, 61 bitmap attributes

  • Results
  • Nearly identical results for all examined tables
  • Exceptions: reducts
    → quality of the rough set rules is better (more precise rules)

¹ UCI Repository of Machine Learning Databases and Domain Theories (URL: ftp.ics.uci.edu/pub/machine-learning-databases)
² Algorithms written in Visual Basic 6.0, executed on a Win98 PC with an AMD K6-2/400 processor

5 COMPARISON OF THE PROCEDURES

slide24

HYBRID PROCEDURE

Apriori+

6 HYBRID PROCEDURE Apriori+

slide25

[Chart: computing times in minutes]

  • Hybrid Method Apriori+
  • based on Apriori
  • capable of extracting reducts
  • capable of deriving rules based on predefined decision attribute
  • Comparison Results (Apriori+ compared to RS-Rules+)
  • identical rules

6 HYBRID PROCEDURE Apriori+

slide27

  • Creation of a compatible data type for both methods
  • Comparison of both methods
    • RS-Rules+ derived rules that were more precise (due to reducts) than those derived by Apriori
    • Apriori+ derived the same rules as RS-Rules+
    • Computing times in favor of the large itemset method

Conclusion: a combination of both original methods is the best solution

7 CONCLUSION

slide29

More Interesting Capabilities of Rough Sets

  • Analysing dependencies between rules
  • Analysing the impact of one particular condition attribute on the decision attribute(s)

Idea

Enhancing the data mining capabilities of Apriori+ with those further rough set features

→ Result: a powerful and efficient data mining application (?)

8 OUTLOOK

References

Agrawal, R. and Srikant, R. (1994). Fast Algorithms for Mining Association Rules in Large Databases. In: VLDB'94, 487–499. Morgan Kaufmann.

Düntsch, I. and Gediga G. (1999). Rough set data analysis.

Munakata, T. (1998). Rough Sets. In: Fundamentals of the New Artificial Intelligence, 140–182. New York: Springer-Verlag.