Hash-Based Algorithm for Mining Association Rules

Presentation Transcript


  1. Hash-Based Algorithm for Mining Association Rules

  2. Data Mining • Mining Association Rules

  3. Mining Association Rules • Support: obtain large itemsets • Confidence: generate association rules
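As a rough illustration of the two measures named on this slide, the sketch below uses the standard definitions: the support of an itemset is the number of transactions containing it, and the confidence of a rule X -> Y is support(X ∪ Y) / support(X). The database is the four-transaction one from the TID Encoding example later in the deck; the function names `support` and `confidence` are hypothetical helpers, not taken from the slides.

```python
# Database from the TID Encoding example (TID -> set of items).
db = {100: set("ACD"), 200: set("BCE"), 300: set("BCDE"), 400: set("BE")}


def support(itemset):
    """Count the transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for items in db.values() if wanted <= items)


def confidence(antecedent, consequent):
    """Standard rule confidence: support(X u Y) / support(X)."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


print("support(BE) =", support("BE"))
print("confidence(B -> E) =", confidence("B", "E"))
print("confidence(C -> E) =", confidence("C", "E"))
```

With a minimum support of 2, BE counts as a large itemset here, and only rules built from large itemsets would then be checked against a confidence threshold.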

  4. Apriori-based approach • In the Apriori-based approach, we first search the given set of structures for frequent substructures of small size. Then, at each step, a new substructure is created by adding one node to a frequent substructure. • Only nodes that were identified as frequent in the first step are used to extend a frequent substructure. • After a new substructure is created, the set of structures is scanned to determine whether the new substructure is frequent.

  5. Apriori (Sup=2) • Figure: database D is scanned three times, producing the candidate sets C1, C2, C3 and the large itemsets L1, L2, L3 in turn
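The level-wise loop in the figure (generate Ck, scan D, keep Lk) can be sketched as follows. This is a minimal textbook-style Apriori, not the deck's own code. The database D is assumed to be the one from the TID Encoding example later in the deck, since the figure's contents did not survive, and the function name `apriori` is hypothetical.

```python
from itertools import combinations

# Assumed database (TID -> set of items), minimum support 2.
db = {100: set("ACD"), 200: set("BCE"), 300: set("BCDE"), 400: set("BE")}


def apriori(db, min_sup):
    """Return {itemset: support count} for every large itemset."""
    items = sorted({i for t in db.values() for i in t})
    candidates = [frozenset([i]) for i in items]  # C1
    large = {}
    k = 1
    while candidates:
        # Scan D: count how many transactions contain each candidate.
        counts = {c: sum(1 for t in db.values() if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_sup}  # Lk
        large.update(level)
        k += 1
        # Join step: unions of two large itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return large


large = apriori(db, min_sup=2)
for itemset, count in sorted(large.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

Each pass costs one full scan of D, which is the inefficiency the following slides set out to reduce.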

  6. Apriori Cont. • Disadvantages • Inefficient • Produces many useless candidates

  7. DHP (Direct Hashing with Efficient Pruning for Fast Data Mining) • Prunes useless candidates in advance • Reduces database size at each iteration

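  8. DHP example (Min sup=2) • Hash function: H({x, y}) = ((order of x) * 10 + (order of y)) mod 7 • Figure: database D and hash table H2, showing for each hash address the number of itemsets hashed to that bucket and the resulting bit vector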
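A small sketch of how the hash table H2 could be filled with the hash function above. Two things are assumptions: the database (borrowed from the TID Encoding example later in the deck, because the figure's D is not recoverable) and the item order (A = 1, B = 2, and so on). The names `order`, `h2`, `buckets` and `bit_vector` are illustrative only.

```python
from itertools import combinations

# Assumed database (TID -> items) and parameters.
db = {100: "ACD", 200: "BCE", 300: "BCDE", 400: "BE"}
MIN_SUP = 2
N_BUCKETS = 7


def order(item):
    """Assumed item order: A -> 1, B -> 2, C -> 3, ..."""
    return ord(item) - ord("A") + 1


def h2(x, y):
    """Hash function from the slide: ((order of x) * 10 + (order of y)) mod 7."""
    return (order(x) * 10 + order(y)) % N_BUCKETS


# While scanning D, hash every 2-subset of every transaction into a bucket.
buckets = [0] * N_BUCKETS
for items in db.values():
    for x, y in combinations(sorted(items), 2):
        buckets[h2(x, y)] += 1

# A bucket's bit is set when its count reaches the minimum support.
bit_vector = [1 if count >= MIN_SUP else 0 for count in buckets]

print("bucket counts:", buckets)
print("bit vector:   ", bit_vector)
```

In this assumed database the pairs BD and DE both hash to bucket 3, so that bucket reaches a count of 2 even though each pair occurs only once. A 2-itemset is kept as a candidate only if its bucket's bit is set, so a collision like this one lets useless candidates through.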

  9. Perfect Hashing Schemes (PHS) for Mining Association Rules

  10. Motivation • Apriori and DHP generate Ci from Li-1, which may be the bottleneck • Collisions occur in DHP • Designing a perfect hashing function for every transaction database is a thorny problem

  11. Definition • A Join operation joins two different (k-1)-itemsets, p1p2…pk-1 and q1q2…qk-1, to produce a k-itemset, where p2=q1, p3=q2, …, pk-2=qk-3, pk-1=qk-2. • Example: ABC and BCD • Among the 3-itemsets of ABCD (ABC, ABD, ACD, BCD), only one pair satisfies the join definition
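The join condition can be checked mechanically. In the sketch below, itemsets are written as strings of sorted single-letter items, and the hypothetical helper `join` returns the k-itemset when the last k-2 items of p equal the first k-2 items of q, and `None` otherwise.

```python
from itertools import permutations


def join(p, q):
    """Join two different (k-1)-itemsets when p2..pk-1 == q1..qk-2."""
    if p != q and p[1:] == q[:-1]:
        return p + q[-1]  # p1 p2 ... pk-1 followed by the last item of q
    return None


# All 3-itemsets contained in ABCD, tried in both orders.
subsets = ["ABC", "ABD", "ACD", "BCD"]
joinable = [(p, q) for p, q in permutations(subsets, 2) if join(p, q)]

print(join("ABC", "BCD"))
print(joinable)
```

Because exactly one pair produces ABCD, each k-itemset is generated once per transaction, which keeps the counting free of duplicates.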

  12. Algorithm • PHS (Perfect Hashing and Data Shrinking)

  13. TID Encoding • Example 1 (sup=2)
      TID   Original   Itemsets
      100   ACD        (CD)
      200   BCE        (BC)(BE)(CE)
      300   BCDE       (BC)(BD)(BE)(CD)(CE)(DE)
      400   BE         (BE)
      Itemsets   (BC) (BD) (BE) (CD) (CE) (DE)
      Support     2    1    3    2    2    1
      Large itemsets and their encoded items: (BC) = A, (BE) = B, (CD) = C, (CE) = D

  14. TID Encoding • Example 2 (sup=2)
      TID   Itemsets
      100   Null
      200   (AD)
      300   (AC)(AD)
      400   Null
      Itemsets   (AB) (AC) (AD) (BC) (BD) (CD)
      Support     0    1    2    0    0    0
      Decode -> (AD) = (BC)(CE) = BCE
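The two TID Encoding examples can be reproduced with a short sketch. It makes one simplification, stated as an assumption: instead of relabeling large itemsets as new items (A, B, C, D) and decoding afterwards, it keeps the decoded itemsets as strings, so (AD) appears directly as BCE. The names `join` and `next_pass` are hypothetical, and the perfect hash table itself is not modeled here, only the join and data-shrinking steps.

```python
from collections import Counter
from itertools import combinations


def join(p, q):
    """Join two different (k-1)-itemsets when p2..pk-1 == q1..qk-2."""
    return p + q[-1] if p != q and p[1:] == q[:-1] else None


def next_pass(transactions, min_sup):
    """Join itemsets inside each transaction, count them, keep the large ones."""
    joined = {}
    for tid, itemsets in transactions.items():
        results = []
        for p, q in combinations(sorted(itemsets), 2):
            r = join(p, q)
            if r is not None:
                results.append(r)
        joined[tid] = results
    support = Counter(r for results in joined.values() for r in results)
    # Data shrinking: drop itemsets that did not reach the minimum support.
    shrunk = {
        tid: [r for r in results if support[r] >= min_sup]
        for tid, results in joined.items()
    }
    return support, shrunk


db = {100: "ACD", 200: "BCE", 300: "BCDE", 400: "BE"}
MIN_SUP = 2

# Pass 1: keep only large single items in each transaction.
item_counts = Counter(i for items in db.values() for i in items)
level1 = {tid: [i for i in items if item_counts[i] >= MIN_SUP] for tid, items in db.items()}

sup2, level2 = next_pass(level1, MIN_SUP)  # Example 1: 2-itemsets
sup3, level3 = next_pass(level2, MIN_SUP)  # Example 2: 3-itemsets

print(dict(sup2))
print(level2)
print(dict(sup3))
print(level3)
```

Transactions 100 and 400 end up empty after the second pass, matching the Null rows in Example 2.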

  15. Problem on Hash Table • Consider a database containing p transactions, each made up of unique items and of equal length N, with a minimum support of 1. • Loading density :

  16. How to Improve the Loading Density • Two-level perfect hash scheme (partial hash)
      Itemsets   (AB) (AC) (AD) (BC) (BD) (CD)
      Support     0    1    2    0    0    0
      Hash table: first level A, B, C; A points to second-level entries C (count 1) and D (count 2); B and C are Null
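One way to picture the two-level scheme is a first-level table indexed by the first item of a pair, with a second-level table allocated only when some pair starting with that item actually occurs. The sketch below is an interpretation of the slide's figure, not the authors' implementation. The loading density it prints for the flat table uses the usual meaning of the term (occupied slots divided by allocated slots), which is an assumption because the deck's own formula is not preserved.

```python
from itertools import combinations

items = ["A", "B", "C", "D"]
# Supports of the 2-itemsets that occur in the encoded transactions.
observed = {("A", "C"): 1, ("A", "D"): 2}

# Flat perfect hash table: one slot for every possible pair of items.
flat = {pair: 0 for pair in combinations(items, 2)}
for pair, count in observed.items():
    flat[pair] = count
flat_occupied = sum(1 for count in flat.values() if count > 0)
flat_density = flat_occupied / len(flat)

# Two-level table: first level keyed by the first item of a pair;
# a second-level table is created only on demand (None stands for Null).
two_level = {first: None for first in items[:-1]}
for (first, second), count in observed.items():
    if two_level[first] is None:
        two_level[first] = {}
    two_level[first][second] = count
second_level_slots = sum(len(t) for t in two_level.values() if t is not None)

print("flat slots:", len(flat), "occupied:", flat_occupied, "density:", flat_density)
print("two-level table:", two_level)
print("second-level slots allocated:", second_level_slots)
```

Here the flat table allocates six slots for two occupied entries, while the two-level layout allocates second-level slots only under A.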

  17. Experiments

  18. Experiments

  19. Experiments

  20. Conclusions • We examined in this paper the issue of mining association rules among items in a large database of sales transactions. The problem of discovering large itemsets was solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement.