Block Interaction: A Generative Summarization Scheme for Frequent Patterns


Presentation Transcript


  1. Block Interaction: A Generative Summarization Scheme for Frequent Patterns Ruoming Jin, Kent State University. Joint work with Yang Xiang (OSU), Hui Hong (KSU) and Kun Huang (OSU)

  2. Frequent Pattern Mining • Summarizing the underlying datasets, providing key insights • Key building block for the data mining toolbox • Association rule mining • Classification • Clustering • Change detection • etc. • Application domains • Business, biology, chemistry, WWW, computer/networking security, software engineering, …
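To make the setting concrete, here is a minimal brute-force sketch of frequent itemset mining on a toy transaction database; the data and the support threshold are illustrative only, not taken from the talk.

```python
# Minimal sketch of frequent itemset mining on a toy transaction database.
# The dataset and the min_support threshold are illustrative placeholders.
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def frequent_itemsets(db, min_support):
    """Enumerate all itemsets whose support (fraction of transactions
    containing them) is at least min_support."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            support = sum(set(candidate) <= t for t in db) / len(db)
            if support >= min_support:
                result[frozenset(candidate)] = support
                found_any = True
        if not found_any:   # anti-monotonicity: no larger itemset can be frequent
            break
    return result

print(frequent_itemsets(transactions, min_support=0.6))
```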

  3. The Problem • The number of patterns is too large • Attempts • Maximal Frequent Itemsets • Closed Frequent Itemsets • Non-Derivable Itemsets • Compressed or Top-k Patterns • … • Tradeoff • Significant information loss • Large size
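The trade-off named on this slide can be seen on the same toy data: closed itemsets keep exact support information but there may be many of them, while maximal itemsets are few but lose the supports of their subsets. The helper below reuses frequent_itemsets() and the toy transactions from the previous sketch.

```python
# A small sketch contrasting closed and maximal frequent itemsets,
# reusing frequent_itemsets() and transactions from the previous sketch.
def closed_and_maximal(freq):
    """freq maps frozenset -> support. An itemset is closed if no frequent
    proper superset has the same support; maximal if no proper superset
    is frequent at all."""
    closed, maximal = [], []
    for itemset, sup in freq.items():
        frequent_supersets = [s for s in freq if itemset < s]
        if all(freq[s] < sup for s in frequent_supersets):
            closed.append(itemset)
        if not frequent_supersets:
            maximal.append(itemset)
    return closed, maximal

freq = frequent_itemsets(transactions, min_support=0.6)
closed, maximal = closed_and_maximal(freq)
print(len(freq), "frequent,", len(closed), "closed,", len(maximal), "maximal")
```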

  4. Pattern Summarization • Using a small number of itemsets to best represent the entire collection of frequent itemsets • The Spanning Set Approach [Afrati-Gionis-Mannila, KDD04] • Exact Description = Maximal Frequent Itemsets • Our problem: • Can we find a concise representation which can allow both exact and approximate summarization of a collection of frequent itemsets?

  5. Block-Interaction Model • Summarize a collection of frequent itemsets and provide accurate support information using only a small number of frequent itemsets. • Two operators: Vertical Union, Horizontal Union.

  6. Vertical Operator

  7. Horizontal Operator
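Slides 6-7 only name the two operators, so the following is a hedged sketch of one plausible reading: a block is a pair (transaction ids, items) in which every listed transaction contains every listed item, the vertical union grows the itemset while intersecting the transaction sets, and the horizontal union grows the transaction set while intersecting the itemsets. The Block type and both functions are illustrative assumptions, not the paper's definitions.

```python
# Assumed reading of a block and of the two union operators (not verbatim
# from the slides): vertical union = intersect transactions, union items;
# horizontal union = union transactions, intersect items.
from typing import FrozenSet, NamedTuple

class Block(NamedTuple):
    transactions: FrozenSet[int]   # ids of supporting transactions
    items: FrozenSet[str]          # items contained in every one of them

def vertical_union(b1: Block, b2: Block) -> Block:
    """Grow the itemset; only transactions supporting both blocks survive."""
    return Block(b1.transactions & b2.transactions, b1.items | b2.items)

def horizontal_union(b1: Block, b2: Block) -> Block:
    """Grow the transaction set; only items common to both blocks survive."""
    return Block(b1.transactions | b2.transactions, b1.items & b2.items)

# Toy usage with hypothetical blocks:
b1 = Block(frozenset({1, 2, 3}), frozenset({"a", "b"}))
b2 = Block(frozenset({2, 3, 4}), frozenset({"b", "c"}))
print(vertical_union(b1, b2))    # transactions {2, 3}, items {'a', 'b', 'c'}
print(horizontal_union(b1, b2))  # transactions {1, 2, 3, 4}, items {'b'}
```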

  8. (2×2) Block-Interaction Model

  9. Minimal 2×2 Block Model Problem • Given the (2×2) block interaction model, our goal is to provide a generative view of an entire collection of itemsets Fα using only a small set of core blocks B.
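Under the same assumed Block operators, a (2×2) generative check might look like the sketch below: an itemset counts as expressible if the horizontal union of two blocks, each itself the vertical union of at most two core blocks, contains the itemset and reproduces its (absolute) support within a tolerance ϵ. Both the containment test and the ϵ-based support test are assumptions for illustration, not details given on the slide.

```python
# Hedged sketch of a (2x2) expressibility check, building on the Block,
# vertical_union and horizontal_union helpers assumed above.
from itertools import combinations_with_replacement

def expressible_2x2(itemset, support, core_blocks, eps):
    """itemset: frozenset of items; support: absolute support count;
    core_blocks: list of Block; eps: relative support tolerance."""
    # Vertical unions of at most two core blocks; the pair (b, b) yields b
    # itself, so single core blocks are covered as well.
    columns = [vertical_union(a, b)
               for a, b in combinations_with_replacement(core_blocks, 2)]
    for c1, c2 in combinations_with_replacement(columns, 2):
        candidate = horizontal_union(c1, c2)
        if itemset <= candidate.items:
            approx = len(candidate.transactions)
            if abs(approx - support) <= eps * support:
                return True
    return False
```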

  10. NP-Hardness

  11. NP-Hardness

  12. Two Stage Approach

  13. Two Stage Approach

  14. Algorithm • Stage 1: Block Vertical Union • Stage 2: Block Horizontal Union
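Slide 23 attributes the construction to set-cover and max-k-cover methods, so a generic greedy max-k-cover routine is sketched here as a stand-in for how either stage might select blocks; it is not the paper's actual procedure, and greedy_max_k_cover with its covers callback are hypothetical names.

```python
# Generic greedy max-k-cover sketch: at each step pick the candidate block
# that newly covers the most still-uncovered frequent itemsets.
def greedy_max_k_cover(candidates, covers, k):
    """candidates: list of blocks; covers(block) -> set of itemsets it covers;
    k: maximum number of blocks to choose."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(candidates, key=lambda b: len(covers(b) - covered), default=None)
        if best is None or not (covers(best) - covered):
            break   # nothing left to gain
        chosen.append(best)
        covered |= covers(best)
    return chosen, covered
```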

  15. Experiment • 1. How does our block interaction model (B.I.) compare with the state-of-the-art summarization schemes, including Maximal Frequent Itemsets [19] (MFI), Closed Frequent Itemsets [21] (CFI), Non-Derivable Frequent Itemsets [9] (NDI), and Representative Patterns [29] (δ-cluster)? • 2. How do different parameters, including α, ϵ, and ϵ1, affect the conciseness of the block modeling, i.e., the number of core blocks?

  16. Experiment Setup • Group 1: In the first group of experiments, we vary the support level α for each dataset with a fixed user-preferred accuracy level ϵ (either 5% or 10%) and fix ϵ1 = ϵ/2. • Group 2: In the second group of experiments, we study how the user-preferred accuracy level ϵ affects the model conciseness (the number of core blocks). Here, we vary ϵ generally in the range from 0.1 to 0.2 with a fixed support level α and ϵ1 = ϵ/2. • Group 3: In the third group of experiments, we study how the distribution of the accuracy level ϵ1 across the two stages affects the model conciseness. We vary ϵ1 between 0.1ϵ and 0.9ϵ with a fixed support level α and overall accuracy level ϵ.
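A small sketch of how the three sweep groups could be enumerated; the concrete grid points for ϵ and ϵ1, and the per-dataset support levels α, are placeholders, since the slide only gives ranges.

```python
# Hypothetical enumeration of the three experiment groups described above.
# The specific grid values are placeholders; the slide only states ranges.
def experiment_groups(alphas, base_alpha, base_eps):
    # Group 1: vary alpha, fix eps in {5%, 10%}, eps1 = eps/2.
    group1 = [dict(alpha=a, eps=e, eps1=e / 2)
              for a in alphas for e in (0.05, 0.10)]
    # Group 2: vary eps in [0.1, 0.2], fix alpha, eps1 = eps/2.
    group2 = [dict(alpha=base_alpha, eps=e, eps1=e / 2)
              for e in (0.10, 0.125, 0.15, 0.175, 0.20)]
    # Group 3: vary eps1 between 0.1*eps and 0.9*eps, fix alpha and eps.
    group3 = [dict(alpha=base_alpha, eps=base_eps, eps1=f * base_eps)
              for f in (0.1, 0.3, 0.5, 0.7, 0.9)]
    return group1, group2, group3
```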

  17. Data Description

  18. Group1 Results

  19. Group2 Results

  20. Group3 Results

  21. Case Study

  22. Thanks! Questions?

  23. Related Work • K-itemset approximation: [Afrati04]. • Differences: • Their work is a special case of ours; • Their work is expensive for exact description; • Our work uses set-cover and max-k-cover methods. • Restoring the frequency of frequent itemsets: [Yan05, Wang06, Jin08]. • Hyperrectangle covering problem: [Xiang08].
