Mining Frequent Patterns Without Candidate Generation. Jiawei Han, Jian Pei and Yiwen Yin, School of Computer Science, Simon Fraser University. Afsoon Yousefi, CS:332, March 24th, 2014. Inspired by Song Wang slides.
Outline • Problem of Mining Frequent Patterns • Review of Apriori • Frequent Pattern Tree • An Example • Design and Construction • Properties • Mining Frequent Patterns Using FP-Tree • An Example • Design and Construction • Properties • Algorithm Efficiency Properties • Performance Study • Future Works • Conclusion • Selected Questions
Problem of Mining Frequent Patterns • Frequent pattern mining plays an essential role in mining associations. • Most previous studies adopt an Apriori-like approach. • This achieves good performance but suffers from costly candidate set generation and testing.
Review of Apriori • Given the minimum support threshold • Use the frequent (k-1)-itemsets to generate candidate frequent k-itemsets • Scan the database and count each pattern in the candidate set • Keep the candidates that meet the threshold to get the frequent k-itemsets
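The level-wise loop above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the performance study. The transactions are the frequent items per transaction implied by the FP-tree shown later in these slides, and the threshold of 3 is an assumption taken from the running example in Han, Pei and Yin's paper. The names `apriori` and `TRANSACTIONS` are made up for this sketch.

```python
from itertools import combinations

# Frequent items of each transaction, as implied by the FP-tree in the slides.
TRANSACTIONS = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]


def apriori(transactions, min_support):
    """Level-wise Apriori. Returns {frozenset(itemset): support count}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items with one database scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # One more database scan to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


patterns = apriori(TRANSACTIONS, min_support=3)  # threshold 3 is an assumption
```

Note that every level costs one generation pass plus one full scan of the database, which is the bottleneck the next slide refers to.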
Review of Apriori • The bottleneck of the Apriori-like method lies in • Candidate set generation • Candidate testing • How to avoid generating a huge set of candidates? • A novel compact data structure, called the FP-tree • An FP-tree based pattern fragment growth mining method • A divide-and-conquer search method instead of generating combinations of frequent itemsets
Frequent Pattern Tree: An Example • A minimum support threshold is given • One scan of the DB identifies the set of frequent items • Frequent items are sorted in frequency-descending order • For convenience, the frequent items of each transaction are listed in this order
Frequent Pattern Tree: An Example • One scan of the DB to identify the set of frequent items • Store the set of frequent items of each transaction in a tree • Create a “null” root • Scan the DB a second time • Add each transaction's ordered frequent items as a path • Share the existing path until a different item comes up • Then branch and create a sub-path • [Figure: FP-tree with paths <f:4, c:3, a:3, m:2, p:2>, <f:4, c:3, a:3, b:1, m:1>, <f:4, b:1>, <c:1, b:1, p:1>]
Frequent Pattern Tree: An Example • One scan of the DB to identify the set of frequent items • Store the set of frequent items of each transaction in a tree • To facilitate tree traversal, build an item header table • Nodes with the same item-name are linked through node-links • [Figure: the same FP-tree with its item header table and node-links]
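The two-scan construction with a header table might look like the sketch below. It is an illustration under stated assumptions, not the authors' code: the threshold of 3 and the tie order f, c, a, b, m, p come from the running example in the original paper, and `FPNode`, `build_fp_tree`, `paths`, `link_total` and the `tie_break` parameter are hypothetical names introduced here.

```python
from collections import Counter

# Frequent items of each transaction, as implied by the tree in the slides.
TRANSACTIONS = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]


class FPNode:
    def __init__(self, item, parent):
        self.item = item      # item-name (None for the "null" root)
        self.count = 0        # transactions sharing the path down to this node
        self.parent = parent
        self.children = {}    # item-name -> child FPNode
        self.link = None      # node-link to the next node with the same item


def build_fp_tree(transactions, min_support, tie_break=None):
    # Scan 1: count items and keep the frequent ones.
    counts = Counter(i for t in transactions for i in set(t))
    frequent = [i for i in counts if counts[i] >= min_support]
    hint = {i: r for r, i in enumerate(tie_break or sorted(frequent))}
    order = sorted(frequent, key=lambda i: (-counts[i], hint.get(i, len(hint))))
    rank = {i: r for r, i in enumerate(order)}

    root = FPNode(None, None)
    header = {i: None for i in order}  # item-name -> head of node-link chain
    tails = {}                         # item-name -> last node in that chain
    # Scan 2: insert each transaction's ordered frequent items as a path.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:          # branch: create a new sub-path
                child = FPNode(item, node)
                node.children[item] = child
                if header[item] is None:
                    header[item] = child
                else:
                    tails[item].link = child
                tails[item] = child
            child.count += 1           # shared prefix: just bump the count
            node = child
    return root, header


def paths(node, prefix=()):
    """Yield every root-to-leaf path as a tuple of 'item:count' labels."""
    for child in node.children.values():
        label = prefix + (f"{child.item}:{child.count}",)
        if child.children:
            yield from paths(child, label)
        else:
            yield label


def link_total(header, item):
    """Sum the counts along an item's node-link chain."""
    total, node = 0, header[item]
    while node is not None:
        total += node.count
        node = node.link
    return total


root, header = build_fp_tree(TRANSACTIONS, 3, tie_break="fcabmp")
```

Following the node-links of an item from the header table visits every node carrying that item, so `link_total(header, "p")` recovers the support of p without touching the database again.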
Frequent Pattern Tree: Design and Construction • The tree consists of • One root • A set of item prefix subtrees as the children of the root • A frequent-item header table • Each node in the tree has three fields • Item-name • Count • Node-link • Each entry in the frequent-item header table consists of • Item-name • Head of node-link
Frequent Pattern Tree: Properties • Constructing the FP-tree • Needs exactly two scans of the DB • The first collects the set of frequent items • The second constructs the FP-tree • The cost of inserting a transaction Trans is O(|Trans|) • |Trans| is the number of frequent items in Trans • Completeness • The FP-tree contains all the information related to mining frequent patterns • Given the minimum support threshold • Compactness • The size of the tree is bounded by the total occurrences of frequent items • The height of the tree is bounded by the maximum number of frequent items in a transaction
Frequent Pattern Tree: Properties • The frequent items of each transaction are listed in frequency-descending order • An example for unordered itemsets • [Figure: two trees built from the same transactions, one with frequency-ordered items and one with unordered items]
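To see why the item order matters, the sketch below counts the nodes of the prefix tree built under two different orders. The transactions are the frequent items implied by the example tree. The second order (the reverse of the first) is a hypothetical choice for illustration and is not necessarily the order used in the slide's figure. `count_nodes` is a made-up helper.

```python
# Frequent items of each transaction, as implied by the example tree.
TRANSACTIONS = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]


def count_nodes(transactions, order):
    """Number of non-root nodes in the prefix tree built under `order`."""
    rank = {item: r for r, item in enumerate(order)}
    root = {}   # nested dicts stand in for tree nodes
    nodes = 0
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node:
                node[item] = {}
                nodes += 1
            node = node[item]
    return nodes


descending = count_nodes(TRANSACTIONS, "fcabmp")  # frequency-descending order
reversed_order = count_nodes(TRANSACTIONS, "pmbacf")  # hypothetical bad order
print(descending, reversed_order)  # prints: 11 14
```

Putting the most frequent items near the root gives them the best chance of being shared between transactions. On this data the descending order needs 11 nodes against 14 for the reversed order.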
Mining Frequent Patterns Using FP-tree • Examine the mining process by starting from the bottom of the header table • For each item, collect all the patterns that its nodes participate in • Start from the item's head in the header table and follow the item's node-links
Mining Frequent Patterns Using FP-tree: An Example • Node p (p:3) • FP-tree paths <f:4, c:3, a:3, m:2, p:2>, <c:1, b:1, p:1> • Conditional pattern base {(f:2, c:2, a:2, m:2), (c:1, b:1)} • Construct an FP-tree on these • Keep only the frequent items • [Figure: the FP-tree from the construction example]
Mining Frequent Patterns Using FP-tree: An Example • Node m (m:3) • FP-tree paths <f:4, c:3, a:3, m:2>, <f:4, c:3, a:3, b:1, m:1> • Conditional pattern base {(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)} • Construct an FP-tree on these • Keep only the frequent items • Create the tree • [Figure: the FP-tree from the construction example]
Mining Frequent Patterns Using FP-tree: An Example • Node b (b:3) • FP-tree paths <f:4, c:3, a:3, b:1>, <f:4, b:1>, <c:1, b:1> • Conditional pattern base {(f:1, c:1, a:1), (f:1), (c:1)} • Construct an FP-tree on these • Keep only the frequent items • Create the tree • [Figure: the FP-tree from the construction example]
Mining Frequent Patterns Using FP-tree: An Example • Node a (a:3) • FP-tree paths <f:4, c:3, a:3> • Conditional pattern base {(f:3, c:3)} • Construct an FP-tree on these • Keep only the frequent items • Create the tree • [Figure: the FP-tree from the construction example]
Mining Frequent Patterns Using FP-tree: An Example • Node c (c:4) • FP-tree paths <f:4, c:3>, <c:1> • Conditional pattern base {(f:3)} • Construct an FP-tree on these • Keep only the frequent items • Create the tree • [Figure: the FP-tree from the construction example]
Mining Frequent Patterns Using FP-tree: An Example • Node f (f:4) • FP-tree paths <f:4> • Conditional pattern base {()} • Construct an FP-tree on these • Keep only the frequent items • Create the tree • [Figure: the FP-tree from the construction example]
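The conditional pattern bases listed in the six slides above can be reproduced with a short sketch. As a simplification, it reads the prefixes straight from the ordered transactions instead of following node-links in the tree; for this example both routes give the same prefixes and counts. The threshold of 3 is an assumption from the original paper's running example, and the function names are hypothetical. Empty prefixes are skipped, so item f gets an empty base.

```python
from collections import Counter, defaultdict

ORDER = "fcabmp"  # frequency-descending item order used in the slides
# Frequent items of each transaction, as implied by the example tree.
TRANSACTIONS = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]


def conditional_pattern_bases(transactions, order):
    """Map each item to a Counter of {prefix tuple: count}."""
    rank = {i: r for r, i in enumerate(order)}
    bases = defaultdict(Counter)
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        for pos, item in enumerate(items):
            if pos:  # skip empty prefixes
                bases[item][tuple(items[:pos])] += 1
    return bases


def conditional_frequent_items(base, min_support):
    """Items that stay frequent inside one conditional pattern base."""
    totals = Counter()
    for prefix, count in base.items():
        for item in prefix:
            totals[item] += count
    return {i: c for i, c in totals.items() if c >= min_support}


bases = conditional_pattern_bases(TRANSACTIONS, ORDER)
# With the assumed threshold of 3, only c survives in p's base.
p_items = conditional_frequent_items(bases["p"], min_support=3)
```

"Keep only the frequent items" then means filtering each base with `conditional_frequent_items` before building the conditional FP-tree. Under the assumed threshold, p's conditional tree keeps only c, m's keeps f, c and a, and b's is empty.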
Mining Frequent Patterns Using FP-tree: Design and construction
Mining Frequent Patterns Using FP-tree: Properties • To calculate the frequent patterns containing a node a_i in a path P • Only the prefix sub-path of node a_i in P needs to be considered • The frequency count of every node in that sub-path is taken to be the same as that of node a_i • Suppose an FP-tree has a single path P • The complete set of the frequent patterns of the FP-tree can be generated by • Enumerating all the combinations of the sub-paths of P • The support of each combination is equal to the minimum support of the items contained in that sub-path
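The single-path property can be illustrated with a few lines of Python. This is a sketch only; `single_path_patterns` is a hypothetical helper, and the example path <f:3, c:3, a:3> is what item m's conditional pattern base reduces to if the threshold is 3, which is an assumption from the original paper's running example.

```python
from itertools import combinations


def single_path_patterns(path, min_support=1):
    """Enumerate the patterns of a single-path FP-tree.

    `path` is a list of (item, count) pairs from the root downwards.
    Each combination of nodes is a pattern whose support is the
    minimum count among the chosen nodes.
    """
    patterns = {}
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            support = min(count for _, count in combo)
            if support >= min_support:
                patterns[frozenset(item for item, _ in combo)] = support
    return patterns


# Assumed conditional FP-tree of item m: the single path <f:3, c:3, a:3>.
m_patterns = single_path_patterns([("f", 3), ("c", 3), ("a", 3)], min_support=3)
```

A path of n nodes yields 2^n - 1 combinations, so no recursion is needed once a conditional FP-tree collapses to a single path. Appending m to each of the 7 patterns here gives the frequent patterns that contain m alongside other items.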
Algorithm Efficiency Properties • The FP-tree is usually much smaller than the DB. • FP-trees constructed during FP-growth are never bigger than the sub-paths • Mining operations consist mainly of • Prefix count adjustment • Counting • Pattern fragment concatenation • This is much less costly than • Generating a very large number of candidate patterns • Testing each of them
Performance Study • Comparison of FP-growth with Apriori • Performed on • A 450 MHz Pentium PC • 128 MB main memory • Microsoft Windows NT • Programs written in Microsoft Visual C++ 6.0 • Run time is the total time interval between input and output • Two datasets
Future Works • Construction of FP-trees for projected databases • When the database is large • The FP-tree cannot be constructed in main memory • Partition the database into a set of projected databases • Construct an FP-tree for each projected database • Mine it within that projected database
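The projected-database idea can be sketched as a recursive pattern-growth routine. To stay short, this sketch mines each projected database by projecting again rather than by building an FP-tree for it, so it illustrates the partitioning and not the tree-based FP-growth itself. The threshold of 3 is an assumption from the original paper's running example, ties in frequency are broken alphabetically (also an assumption), and `pattern_growth` is a hypothetical name.

```python
from collections import Counter

# Frequent items of each transaction, as implied by the example tree.
TRANSACTIONS = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]


def pattern_growth(db, min_support, suffix=frozenset()):
    """Mine `db`, a list of (items, count) pairs.

    Returns {frozenset(pattern): support} for all patterns that extend
    `suffix` with frequent items of `db`.
    """
    counts = Counter()
    for items, n in db:
        for item in set(items):
            counts[item] += n
    frequent = [i for i, c in counts.items() if c >= min_support]
    # Frequency-descending order; alphabetical tie-break is an assumption.
    order = sorted(frequent, key=lambda i: (-counts[i], i))
    rank = {i: r for r, i in enumerate(order)}
    ordered = [
        (sorted((i for i in set(items) if i in rank), key=rank.get), n)
        for items, n in db
    ]
    results = {}
    for item in reversed(order):  # start from the least frequent item
        pattern = suffix | {item}
        results[pattern] = counts[item]
        # The item-projected database: prefixes of the transactions holding item.
        projected = [(t[:t.index(item)], n) for t, n in ordered if item in t]
        projected = [(t, n) for t, n in projected if t]
        results.update(pattern_growth(projected, min_support, pattern))
    return results


patterns = pattern_growth([(t, 1) for t in TRANSACTIONS], min_support=3)
```

Each projected database holds only the prefixes relevant to one item, so it can be loaded and mined on its own even when the whole database does not fit in memory.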
Future Works • Construction of a disk-resident FP-tree • Use a B+-tree structure to index the FP-tree • Split the tree based on the common prefix paths • Materialization of an FP-tree • Constructing an FP-tree needs two scans of the database • Materialize an FP-tree for frequent pattern mining • How to select a good minimum support threshold • Use a low threshold?
Conclusion • Constructs a highly compact FP-tree • Usually substantially smaller than the original database • Applies a pattern growth method • Avoids costly candidate generation and tests • Applies a partitioning-based divide-and-conquer method • Dramatically reduces the size of the subsequent conditional FP-trees • Mines both short and long patterns efficiently in large databases
Selected Questions • What are the components of an FP-tree? • One root • A set of item prefix subtrees as the children of the root • A frequent-item header table • How to calculate the frequent patterns containing a node in a path? • Only consider the prefix sub-path of the node in the path • The frequency count of every node in that sub-path is the same as that of the node • Find all the combinations • Compare the efficiency of mining operations in FP-growth with Apriori • Mining operations consist mainly of • Prefix count adjustment • Counting • Pattern fragment concatenation • This is much less costly than • Generating a very large number of candidate patterns • Testing each of them