
Association Rules

Association Rules. Hawaii International Conference on System Sciences (HICSS-40), January 2007. David L. Olson, Yanhong Li. Fuzzy Association Rules. Association rules mining provides information to assess significant correlations in large databases: IF X THEN Y. Initial data mining analysis.


Presentation Transcript


  1. Association Rules • Hawaii International Conference on System Sciences (HICSS-40), January 2007 • David L. Olson, Yanhong Li

  2. Fuzzy Association Rules • Association rules mining provides information to assess significant correlations in large databases • IF X THEN Y • Initial data mining analysis • Not predictive • SUPPORT: degree to which relationship appears in data • CONFIDENCE: probability that if X, then Y
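
As an illustration of the two measures named on this slide, here is a minimal Python sketch for ordinary (non-fuzzy) transactions. It assumes the usual definitions: support is the fraction of transactions containing an itemset, and confidence of IF X THEN Y is support(X and Y) divided by support(X). The function names and the grocery data are hypothetical and are not from the presentation.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Estimated probability of Y given X: support(X and Y) / support(X)."""
    sx = support(transactions, x)
    if sx == 0.0:
        return 0.0
    return support(transactions, x | y) / sx


# Hypothetical toy data, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

bread = frozenset({"bread"})
milk = frozenset({"milk"})
print("support(bread, milk):", support(transactions, bread | milk))
print("confidence(bread -> milk):", confidence(transactions, bread, milk))
```

Here two of the four transactions contain both items, and three contain bread, so the rule IF bread THEN milk has support 2/4 and confidence 2/3.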

  3. Association Rule Algorithms • Apriori • Agrawal et al., 1993; Agrawal & Srikant, 1994 • Find correlations among transactions, binary values • Weighted association rules • Cai et al., 1998; Lu et al., 2001 • Cardinal data • Srikant & Agrawal, 1996 • Partitions attribute domain, combines adjacent partitions until binary

  4. Fuzzy Analysis • Deal with vagueness & uncertainty • Fuzzy Set Theory • Zadeh [1965] • Probability Theory • Pearl [1988] • Rough Set Theory • Pawlak [1982] • Set Pair Theory • Zhao [2000]

  5. Fuzzy Association Rules • Most based on Apriori algorithm • Treat all attributes as uniform • Can increase number of rules by decreasing minimum support, decreasing minimum confidence • Generates many uninteresting rules • Software takes a lot longer

  6. Gyenesei (2000) • Studied weighted quantitative association rules in fuzzy domain • With & without normalization • NONNORMALIZED • Used product operator to define combined weight and fuzzy value • If weight small, support level small, tends to have data overflow • NORMALIZED • Used geometric mean of item weights as combined weight • Support then very small
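
The contrast between the two variants can be sketched in Python. This is one possible reading of the slide, not a verified transcription of Gyenesei's formulas: the non-normalized value multiplies weight times membership across the items of an itemset, and the normalized value replaces the product of weights with their geometric mean. The function names and the numbers are hypothetical.

```python
from math import prod
from typing import Sequence


def nonnormalized_value(weights: Sequence[float], memberships: Sequence[float]) -> float:
    """Product operator over items: each factor is weight * fuzzy membership (assumed reading)."""
    return prod(w * m for w, m in zip(weights, memberships))


def normalized_value(weights: Sequence[float], memberships: Sequence[float]) -> float:
    """Geometric mean of the item weights as combined weight, times the product of memberships
    (assumed reading)."""
    k = len(weights)
    combined_weight = prod(weights) ** (1.0 / k)
    return combined_weight * prod(memberships)


# Hypothetical two-item itemset in one record.
weights = [0.5, 0.8]
memberships = [0.9, 0.6]
print("non-normalized:", nonnormalized_value(weights, memberships))
print("normalized:    ", normalized_value(weights, memberships))
```

Averaging either value over all records would give the corresponding support. With weights below 1, the plain product shrinks quickly as items are added, which matches the slide's remark that small weights lead to small support.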

  7. Algorithm • Get membership functions, minimum support, minimum confidence • Assign weight to each fuzzy membership for each attribute (categorical) • Calculate support for each fuzzy region • If support > minimum, OK • If confidence > minimum, OK • If both OK, generate rules
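
The steps above can be sketched as a level-wise (Apriori-style) loop. This is a minimal sketch under stated assumptions, not the authors' implementation: each case is already fuzzified into a dictionary of region memberships, memberships within a case are combined with the min operator, weights are omitted, and rules are generated in every direction rather than only toward an outcome. All names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Case = Dict[str, float]  # fuzzy region name -> membership value in [0, 1]
Rule = Tuple[Tuple[str, ...], str, float, float]  # (antecedent, consequent, support, confidence)


def fuzzy_support(cases: List[Case], regions: FrozenSet[str]) -> float:
    """Average over cases of the smallest membership among `regions` (min operator, an assumption)."""
    if not cases:
        return 0.0
    return sum(min(c.get(r, 0.0) for r in regions) for c in cases) / len(cases)


def mine_rules(cases: List[Case], minsup: float, minconf: float) -> List[Rule]:
    regions = sorted({r for c in cases for r in c})
    frequent: Dict[FrozenSet[str], float] = {}
    level = [frozenset([r]) for r in regions]
    while level:
        size = len(level[0]) + 1
        kept = []
        for itemset in level:
            sup = fuzzy_support(cases, itemset)
            if sup > minsup:  # "If support > minimum, OK"
                frequent[itemset] = sup
                kept.append(itemset)
        # Join surviving itemsets that differ in exactly one region.
        level = sorted({a | b for a, b in combinations(kept, 2) if len(a | b) == size},
                       key=sorted)
    rules: List[Rule] = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for consequent in sorted(itemset):
            antecedent = itemset - {consequent}
            conf = sup / frequent[antecedent]
            if conf > minconf:  # "If confidence > minimum, OK"
                rules.append((tuple(sorted(antecedent)), consequent, sup, conf))
    return rules


# Hypothetical fuzzified cases.
cases = [
    {"Credit.Good": 0.9, "Outcome.OnTime": 1.0},
    {"Credit.Good": 0.8, "Outcome.OnTime": 1.0},
    {"Credit.Good": 0.1, "Outcome.OnTime": 0.0},
    {"Credit.Good": 0.7, "Outcome.OnTime": 1.0},
]
for rule in mine_rules(cases, minsup=0.25, minconf=0.9):
    print(rule)
```

The loop stops as soon as no itemset of the current size survives the support threshold. Because the min operator can only lower support as regions are added, every antecedent of a surviving itemset is itself in `frequent`, so the confidence lookup is safe.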

  8. Demo Model: Loan App

  9. Fuzzified Age • Figure 2: The membership functions of attribute Age (membership value plotted against Age, with breakpoints at ages 0, 25, 35, 40, 50 and 100, for the categories Young, Middle and Old)
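
A membership function set of this kind might be coded as follows. Only the age breakpoints (0, 25, 35, 40, 50, 100) and the category names come from the figure; the trapezoidal shapes, and which breakpoints bound which category, are assumptions made for illustration.

```python
from typing import Dict


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """0 outside [a, d], rising on [a, b], 1 on [b, c], falling on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzify_age(age: float) -> Dict[str, float]:
    """Assumed shapes: Young is full up to 25 and gone by 35, Middle peaks on 35 to 40,
    Old starts at 40 and is full from 50."""
    return {
        "Young": trapezoid(age, 0, 0, 25, 35),
        "Middle": trapezoid(age, 25, 35, 40, 50),
        "Old": trapezoid(age, 40, 50, 100, 100),
    }


for age in (20, 30, 38, 45, 70):
    print(age, fuzzify_age(age))
```

Under these assumed shapes, an applicant aged 30 is half Young and half Middle, which is the kind of partial membership the fuzzy approach is meant to capture.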

  10. Fuzzify Age

  11. Calculate Support for Each Pair of Fuzzy Categories • Membership value • Identify weights for each attribute • Identify highest fuzzy membership category for each case • Membership value = minimum weight associated with highest fuzzy membership category • Support • Average membership value for all cases
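
The slide is terse, so the following Python sketch encodes one literal reading of it, stated as an assumption: a case counts toward a pair of fuzzy categories only when each category is that case's highest-membership category for its attribute; the case's membership value is then the smaller of the two category weights, and otherwise 0; support is the average over all cases. The data, weights and names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Case = Dict[str, Dict[str, float]]        # attribute -> {category: membership}
Region = Tuple[str, str]                  # (attribute, category)
Weights = Dict[Region, float]             # weight assigned to each fuzzy region


def top_category(memberships: Dict[str, float]) -> str:
    """Category with the highest membership (ties go to the first one listed)."""
    return max(memberships, key=memberships.get)


def case_value(case: Case, itemset: Sequence[Region], weights: Weights) -> float:
    """Minimum weight over the itemset if every region is the case's top category, else 0."""
    for attribute, category in itemset:
        if top_category(case[attribute]) != category:
            return 0.0
    return min(weights[region] for region in itemset)


def weighted_support(cases: List[Case], itemset: Sequence[Region], weights: Weights) -> float:
    """Average membership value over all cases."""
    if not cases:
        return 0.0
    return sum(case_value(c, itemset, weights) for c in cases) / len(cases)


# Hypothetical cases and weights.
cases = [
    {"Income": {"Low": 0.2, "Middle": 0.8}, "Outcome": {"OnTime": 1.0, "Default": 0.0}},
    {"Income": {"Low": 0.7, "Middle": 0.3}, "Outcome": {"OnTime": 1.0, "Default": 0.0}},
    {"Income": {"Low": 0.1, "Middle": 0.9}, "Outcome": {"OnTime": 0.0, "Default": 1.0}},
    {"Income": {"Low": 0.4, "Middle": 0.6}, "Outcome": {"OnTime": 1.0, "Default": 0.0}},
]
weights = {("Income", "Middle"): 0.8, ("Outcome", "OnTime"): 0.9}
pair = [("Income", "Middle"), ("Outcome", "OnTime")]
print("support:", weighted_support(cases, pair, weights))
```

Two of the four hypothetical cases match both categories, each contributing the smaller weight 0.8, so the pair's support is 1.6 / 4 = 0.4.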

  12. Support by Single Item

  13. Support • If support for pair of categories is above minimum support, retain • Identifies all pairs of fuzzy categories with sufficiently strong relationship • For outcomes, R51(On Time) strong, R52(Default) not

  14. Support by Pair: minsup 0.25

  15. Support by Triplet: minsup 0.25

  16. Quartets • None qualify, so algorithm stops

  17. Confidence • Identify direction • For those training set cases involving the pair of attributes, what proportion came out as predicted?

  18. Confidence Values: Pairs • Minimum confidence 0.9

  19. 4 Rules • IF Income is Middle THEN Outcome is On-Time • R22→R51 support 0.490 confidence 0.916 • IF Credit is Good THEN Outcome is On-Time • R41→R51 support 0.576 confidence 0.972 • IF Income is Middle AND Credit is Good THEN Outcome is On-Time • R22R41→R51 support 0.419 confidence 0.995 • IF Risk is High AND Credit is Good THEN Outcome is On-Time • R31R41→R51 support 0.266 confidence 0.993
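
As a quick check, the four reported rules can be compared against the thresholds used in the demo (minsup 0.25, minimum confidence 0.9). The short Python sketch below does that and also back-computes the antecedent support each rule would imply if confidence is taken to be support(X and Y) / support(X). That ratio definition is an assumption here; the presentation does not spell out its formula, and the helper names are hypothetical.

```python
MINSUP = 0.25   # from the "Support by Pair" and "Support by Triplet" slides
MINCONF = 0.9   # from the "Confidence Values" slide

# (antecedent regions, consequent region, support, confidence) as reported on the slide.
rules = [
    (("R22",), "R51", 0.490, 0.916),
    (("R41",), "R51", 0.576, 0.972),
    (("R22", "R41"), "R51", 0.419, 0.995),
    (("R31", "R41"), "R51", 0.266, 0.993),
]


def passes(support: float, confidence: float,
           minsup: float = MINSUP, minconf: float = MINCONF) -> bool:
    """True when a rule clears both thresholds."""
    return support > minsup and confidence > minconf


def implied_antecedent_support(support: float, confidence: float) -> float:
    """Antecedent support implied by confidence = support(X and Y) / support(X) (assumed)."""
    return support / confidence


for antecedent, consequent, sup, conf in rules:
    print(" AND ".join(antecedent), "->", consequent,
          "| passes:", passes(sup, conf),
          "| implied antecedent support:", round(implied_antecedent_support(sup, conf), 3))
```

All four rules clear both thresholds. The last one does so narrowly on support (0.266 against 0.25), so it would be the first to drop out if minsup were raised.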

  20. Rules vs. Support

  21. Rules vs. Confidence

  22. Higher order combinations • Try triplets • If ambitious, sets of 4, and beyond • Here, none • Problems: • Computational complexity explodes • Doesn’t guarantee total coverage • That also would explode complexity • Can control by lowering minsup, minconf
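
To give a feel for how quickly the search space grows, this small Python sketch counts the itemsets of each size that could be formed from a given number of fuzzy regions before any support pruning. The figure of 12 regions is hypothetical, and the count is an upper bound, since regions of the same attribute would not normally be combined.

```python
from math import comb
from typing import Dict


def candidate_counts(n_regions: int, max_size: int) -> Dict[int, int]:
    """Number of possible itemsets of each size 1..max_size, ignoring pruning."""
    return {k: comb(n_regions, k) for k in range(1, max_size + 1)}


# Hypothetical: 12 fuzzy regions across all attributes.
for size, count in candidate_counts(12, 4).items():
    print(f"itemsets of size {size}: {count}")
```

Support pruning is what keeps the search tractable in practice: in the demo no quartet qualified, so the search ended after triplets.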

  23. Simulation Testing • Selected 550 cases • Held out 100 • Randomly assigned weights to each fuzzy region of each attribute • minsup {0.35, 0.45, 0.55, 0.65} • minconf {0.7, 0.8, 0.9}
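
A setup along these lines could be sketched in Python as follows. Only the case counts and the threshold grids come from the slide. The sketch assumes that the 100 held-out cases are drawn from the 550 selected cases, which the slide does not state, and the region names, the uniform weight distribution and the seeds are hypothetical.

```python
import random
from itertools import product
from typing import Dict, List, Sequence, Tuple


def split_cases(case_ids: Sequence[int], n_holdout: int,
                seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the case ids and hold out the first `n_holdout` for testing."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def random_weights(regions: Sequence[str], seed: int = 0) -> Dict[str, float]:
    """Assign a random weight in [0, 1) to each fuzzy region (distribution is an assumption)."""
    rng = random.Random(seed)
    return {region: rng.random() for region in regions}


MINSUPS = [0.35, 0.45, 0.55, 0.65]   # from the slide
MINCONFS = [0.7, 0.8, 0.9]           # from the slide
grid = list(product(MINSUPS, MINCONFS))

train, holdout = split_cases(range(550), n_holdout=100)
weights = random_weights(["Age.Young", "Age.Middle", "Age.Old"])  # hypothetical region names

print(len(train), "training cases,", len(holdout), "held out,", len(grid), "threshold settings")
```

Each of the 12 threshold settings would then be run on the training cases and the resulting rules checked against the held-out cases.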

  24. Simulation Results
