A distributed method for mining association rules

Pham Nguyen Anh Huy*, Department of Information Technology, Vietnam National University of Ho Chi Minh City. Presented by Ho Tu Bao, School of Knowledge Science, Japan Advanced Institute of Science and Technology.


Presentation Transcript


  1. A distributed method for mining association rules
     Pham Nguyen Anh Huy*, Department of Information Technology, Vietnam National University of Ho Chi Minh City
     Presented by Ho Tu Bao, School of Knowledge Science, Japan Advanced Institute of Science and Technology
     (*Work done during the author's three-month JSPS fellowship at JAIST)

  2. Outline
     • Introduction
     • Background
     • A distributed Apriori algorithm using mobile agents
     • Experimental evaluation
     • Conclusion

  3. Introduction
     • Association analysis is a new and attractive research area in data mining
     • The Apriori algorithm (R. Agrawal, IBM 1993) is a key technique for association analysis
     • Although the Apriori principle considerably reduces the search space, the technique still requires a huge amount of computation, particularly for large databases
     • This research proposes a distributed version of the Apriori algorithm using mobile agents. The experiments show that computation time can be reduced by using several computers in a distributed computing environment.

  4. Outline
     • Introduction
     • Background
       • Association rules and Apriori algorithm
       • Mobile agents and Aglets
     • A distributed Apriori algorithm using mobile agents
     • Experimental evaluation
     • Conclusion

  5. Association rules: Market basket analysis
     • Analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets” (rules of the form X → Y, where X and Y are sets of items)
     • I = {I1 = beer, I2 = cake, I3 = onigiri}
     • Transactional database:
       TID1: {I1, I2, I3}
       TID2: {I1, I2}
       TID3: {I2, I3}
       TID4: {I2}
       TID5: {I1, I2}
     • An association rule: {I1} → {I3}. How often do people buy beer and onigiri together?

  6. Rule measures: Support and Confidence
     • Association rule X → Y
     • Support s = probability that a transaction contains both X and Y
     • Confidence c = conditional probability that a transaction containing X also contains Y
     • A → C (s = 50%, c = 66.6%)
     • C → A (s = 50%, c = 100%)
     (Venn diagram: customers who buy beer, customers who buy onigiri, and the overlap of customers who buy both)
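
The two measures can be checked by hand on the five-transaction basket database from slide 5. The following is a minimal sketch, not code from the presentation; the function names `support` and `confidence` are illustrative choices.

```python
# Transactions from the market-basket slide (I1 = beer, I2 = cake, I3 = onigiri).
transactions = [
    {"I1", "I2", "I3"},  # TID1
    {"I1", "I2"},        # TID2
    {"I2", "I3"},        # TID3
    {"I2"},              # TID4
    {"I1", "I2"},        # TID5
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(x, y, db):
    """Estimate of P(Y | X): support(X union Y) / support(X)."""
    return support(set(x) | set(y), db) / support(x, db)

s = support({"I1", "I3"}, transactions)       # 1 of the 5 transactions
c = confidence({"I1"}, {"I3"}, transactions)  # 1 of the 3 beer transactions
print(f"{{I1}} -> {{I3}}: support={s:.0%}, confidence={c:.1%}")
```

On this database the rule {I1} → {I3} has 20% support and 33.3% confidence, while the reverse rule {I3} → {I1} has the same support but 50% confidence, which shows that confidence is not symmetric.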

  7. Association mining: Apriori algorithm
     It is composed of two steps:
     • Find all frequent itemsets: by definition, each of these itemsets occurs at least as frequently as a pre-determined minimum support count
     • Generate strong association rules from the frequent itemsets: by definition, these rules must satisfy minimum support and minimum confidence
     (Agrawal, R., 1993)

  8. Association mining: Apriori principle
     • Min. support 50%, min. confidence 50%
     • For rule A → C:
       support = support({A, C}) = 50%
       confidence = support({A, C}) / support({A}) = 66.6%
     • The Apriori principle: any subset of a frequent itemset must be frequent (if an itemset is not frequent, none of its supersets is frequent)

  9. The Apriori algorithm: Finding frequent itemsets using candidate generation
     • Find the frequent itemsets: the sets of items whose support is at least the minimum support
     • A subset of a frequent itemset must also be a frequent itemset, i.e., if {AB} is a frequent itemset, both {A} and {B} must be frequent itemsets
     • Iteratively find the frequent itemsets Lk with cardinality from 1 to k (k-itemsets) from the candidate itemsets Ck (Lk ⊆ Ck)
     • Use the frequent itemsets to generate association rules
     C1 → … → Li-1 → Ci → Li → Ci+1 → … → Lk
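
The last step, generating rules from frequent itemsets, can be sketched as follows. This is an illustrative sketch rather than the presenters' code: the function name `rules_from` is hypothetical, the support counts are copied from the worked example on slides 10 and 11, and the 50% confidence threshold is the one quoted on slide 8.

```python
from itertools import combinations

# Support counts copied from the worked example on slides 10 and 11
# (nine transactions, min_sup_count = 2); only the itemsets needed here.
support_count = {
    ("I1",): 6, ("I2",): 7, ("I5",): 2,
    ("I1", "I2"): 4, ("I1", "I5"): 2, ("I2", "I5"): 2,
    ("I1", "I2", "I5"): 2,
}

def rules_from(itemset, support_count, min_conf):
    """Yield (antecedent, consequent, confidence) for every rule X -> Y with
    X union Y = itemset and count(itemset) / count(X) >= min_conf."""
    for r in range(1, len(itemset)):
        for x in combinations(itemset, r):
            y = tuple(i for i in itemset if i not in x)
            conf = support_count[itemset] / support_count[x]
            if conf >= min_conf:
                yield x, y, conf

rules = list(rules_from(("I1", "I2", "I5"), support_count, min_conf=0.5))
for x, y, conf in rules:
    print(f"{{{', '.join(x)}}} -> {{{', '.join(y)}}}  confidence={conf:.0%}")
```

Every non-empty proper subset of the frequent itemset is tried as an antecedent; because all of those subsets are themselves frequent, their counts are already known and no further database scan is needed.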

  10. Example (min_sup_count = 2)
      Transactional data D:
        TID    List of item IDs
        T100   I1, I2, I5
        T200   I2, I4
        T300   I2, I3
        T400   I1, I2, I4
        T500   I1, I3
        T600   I2, I3
        T700   I1, I3
        T800   I1, I2, I3, I5
        T900   I1, I2, I3
      Scan D for the count of each candidate to obtain C1:
        Itemset   Sup. count
        {I1}      6
        {I2}      7
        {I3}      6
        {I4}      2
        {I5}      2
      Compare each candidate's support count with the minimum support count to obtain L1:
        Itemset   Sup. count
        {I1}      6
        {I2}      7
        {I3}      6
        {I4}      2
        {I5}      2

  11. Example (min_sup_count = 2)
      Generate candidates C2 from L1 using the Apriori principle, then scan D for the count of each candidate:
        Itemset     Sup. count
        {I1, I2}    4
        {I1, I3}    4
        {I1, I4}    1
        {I1, I5}    2
        {I2, I3}    4
        {I2, I4}    2
        {I2, I5}    2
        {I3, I4}    0
        {I3, I5}    1
        {I4, I5}    0
      Compare each candidate's support count with the minimum support count to obtain L2:
        Itemset     Sup. count
        {I1, I2}    4
        {I1, I3}    4
        {I1, I5}    2
        {I2, I3}    4
        {I2, I4}    2
        {I2, I5}    2
      Generate candidates C3 from L2 using the Apriori principle, then scan D for the count of each candidate:
        Itemset         Sup. count
        {I1, I2, I3}    2
        {I1, I2, I5}    2
      Compare each candidate's support count with the minimum support count to obtain L3:
        Itemset         Sup. count
        {I1, I2, I3}    2
        {I1, I2, I5}    2
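
The example can be reproduced with a short, centralized Apriori sketch. This is not the presenters' implementation; the function names (`count_supports`, `apriori_gen`, `apriori`) and the tuple representation of itemsets are assumptions made for illustration.

```python
from itertools import combinations

# Transactions T100..T900 from the example slides.
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def count_supports(candidates, db):
    """Scan db once and count the transactions containing each candidate."""
    return {c: sum(set(c) <= t for t in db) for c in candidates}

def apriori_gen(prev_frequent):
    """Join (k-1)-itemsets that share their first k-2 items, then prune any
    candidate with an infrequent (k-1)-subset (the Apriori principle)."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:
            cand = a + (b[-1],)
            if all(s in prev_set for s in combinations(cand, len(a))):
                candidates.append(cand)
    return candidates

def apriori(db, min_sup_count):
    """Return [L1, L2, ...], each a dict mapping itemset -> support count."""
    candidates = sorted({(item,) for t in db for item in t})
    levels = []
    while candidates:
        counts = count_supports(candidates, db)
        frequent = {c: n for c, n in counts.items() if n >= min_sup_count}
        if not frequent:
            break
        levels.append(frequent)
        candidates = apriori_gen(frequent)
    return levels

levels = apriori(D, min_sup_count=2)
for k, level in enumerate(levels, start=1):
    print(f"L{k}:", level)
```

The run stops after L3: the only 4-itemset that the join step produces, {I1, I2, I3, I5}, is pruned because its subset {I1, I3, I5} is not in L3.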

  12. Agents and Mobile agents
      Mobile network agents are programs that:
      • can migrate from system to system within a network environment
      • perform some processing at each host
      • decide when and where to move next
      How does an agent move?
      • Save state
      • Transport the saved state to the next system
      • Resume execution from the saved state
      An agent is a computational entity that:
      • acts on behalf of other entities in an autonomous fashion
      • performs its actions with some level of pro-activity and reactivity
      • exhibits some level of the key attributes of co-operation
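
The save, transport, resume cycle can be imitated in a single process. The sketch below is only an analogy under stated assumptions: the `CountingAgent` class, the host names, and the data are all made up, the "network" is a byte string handed from one loop iteration to the next, and only the agent's data is serialized, whereas a real mobile agent platform also moves the program code.

```python
import json

class CountingAgent:
    """Hypothetical agent that counts one item in each host's local data."""

    def __init__(self, item, itinerary, total=0):
        self.item = item
        self.itinerary = list(itinerary)  # hosts still to visit
        self.total = total                # result carried from host to host

    def save_state(self):
        """Serialize the agent's data so it can be shipped to another host."""
        return json.dumps(self.__dict__).encode("utf-8")

    @classmethod
    def resume(cls, wire_bytes):
        """Rebuild the agent on the receiving host from the saved state."""
        return cls(**json.loads(wire_bytes.decode("utf-8")))

    def run(self, local_data):
        """Perform this host's share of the processing."""
        self.total += sum(self.item in t for t in local_data)

# Simulated hosts: host name -> local transactions (made-up data).
hosts = {
    "hostA": [{"beer", "cake"}, {"cake"}],
    "hostB": [{"beer"}, {"beer", "onigiri"}],
}

agent = CountingAgent("beer", ["hostA", "hostB"])
while agent.itinerary:
    destination = agent.itinerary.pop(0)      # the agent picks its next host
    wire_bytes = agent.save_state()           # 1. save state
    # 2. transport: a real system would send wire_bytes over the network
    agent = CountingAgent.resume(wire_bytes)  # 3. resume on the destination
    agent.run(hosts[destination])

print(agent.total)  # beer appears in 3 of the 4 simulated transactions
```

The point of the pattern is that the partial result travels with the agent, so each host only ever sees its own local data.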

  13. Distributed Computing using Mobile Programs

  14. Mobile agent tools

  15. What are Aglets?
      • Aglets (Agile Applets) are Java objects that can move from one host on the Internet to another and perform arbitrary operations within the security limits
      • When an Aglet moves, it takes along its program code as well as its data
      • The Aglets framework is implemented by the Aglets Software Development Kit (ASDK) from IBM, an environment for programming mobile Internet agents in Java

  16. Aglets at Runtime
      • Currently, aglets use the Agent Transfer Protocol (ATP) as the default implementation of the communication layer (ATP is modeled after HTTP)
      • Used on the Tahiti aglet server
      • Use the Aglets Server Interface to write applications capable of hosting, receiving, and dispatching aglets

  17. Outline
      • Introduction
      • Background
      • A distributed Apriori algorithm using mobile agents
      • Experimental evaluation
      • Conclusion

  18. A distributed Apriori algorithm
      Step 1 (setup):
        (1) spawn n slave processes;
        (2) divide the database into partitions;
        (3) distribute the partitions to the slave processes.
      Master process:
        • (step 2) send the global candidate (k-1)-itemsets Ck-1 to each slave process
        • wait for and receive the local supports, then count the global supports for the global candidate (k-1)-itemsets Ck-1
        • compute the frequent (k-1)-itemsets Lk-1, and send clusters of frequent (k-1)-itemsets Lk-1 to the slave processes
        • (step 8) wait for and receive the local candidate k-itemsets from the slave processes
        • (step 9) take the union of the local candidate k-itemsets and prune it to form the global candidate k-itemsets
      Slave processes:
        • receive the global candidate (k-1)-itemsets Ck-1
        • count the local supports for the global candidate (k-1)-itemsets Ck-1, and send the local supports to the master process
        • receive the frequent (k-1)-itemsets Lk-1 from the master process
        • generate the local candidate k-itemsets and send them to the master process

  19. A distributed Apriori algorithm
      (Diagram: the database DB is divided into partitions DB1, DB2, …, DBn, one per slave; control alternates between the master and the slaves.)
      • Master: SEND global candidate (k-1)-itemsets Ck-1
      • Slaves: COUNT and SEND local supports for global candidate (k-1)-itemsets (counting-support Aglets)
      • Master: COUNT global supports for global candidate (k-1)-itemsets Ck-1
      • Master: FIND and SEND frequent (k-1)-itemsets Lk-1
      • Slaves: JOIN and SEND local candidate k-itemsets (Aprio_gen Aglet)
      • Master: UNIONIZE local candidate k-itemsets and PRUNE to form global candidate k-itemsets Ck, e.g., {AB}

  20. Global support count & global candidate itemsets
      • X is a candidate itemset; the global support count of X is the sum of its local support counts over all database partitions
      • The set of global candidate k-itemsets GCk is formed from the local candidate k-itemsets
      • GLk is formed by Apriori-gen with ID segment (p, q) of GLk-1
      • GLk = {X ∈ GCk | X.G-Supp ≥ G-Min-Supp}
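
The master/slave rounds described on slides 18 to 20 can be simulated in one process. The sketch below makes several assumptions that the slides do not confirm: slaves are plain function calls rather than Aglets, the database is partitioned round-robin, and "ID segment (p, q)" is read as an index range of the sorted frequent list assigned to one slave. All function names are hypothetical, and the data is the nine-transaction example from slide 10.

```python
from itertools import combinations

D = [  # the nine example transactions, T100..T900
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def slave_count(partition, candidates):
    """Slave: count local supports of the global candidates."""
    return {c: sum(set(c) <= t for t in partition) for c in candidates}

def slave_join(frequent, p, q):
    """Slave: join itemsets frequent[p:q] with every later itemset that
    shares the same first k-2 items (assumed meaning of 'ID segment')."""
    out = set()
    for i in range(p, q):
        for j in range(i + 1, len(frequent)):
            a, b = frequent[i], frequent[j]
            if a[:-1] == b[:-1]:
                out.add(a + (b[-1],))
    return out

def master(db, n_slaves, min_sup_count):
    """Master: run the rounds and return {frequent itemset: global count}."""
    partitions = [db[i::n_slaves] for i in range(n_slaves)]  # setup step
    candidates = sorted({(item,) for t in db for item in t})
    result = {}
    while candidates:
        local = [slave_count(p, candidates) for p in partitions]
        # Global support count = sum of the local support counts.
        global_count = {c: sum(l[c] for l in local) for c in candidates}
        frequent = sorted(c for c in candidates
                          if global_count[c] >= min_sup_count)
        result.update((c, global_count[c]) for c in frequent)
        # One index segment of the frequent list per slave.
        bounds = [len(frequent) * s // n_slaves for s in range(n_slaves + 1)]
        local_cands = [slave_join(frequent, bounds[s], bounds[s + 1])
                       for s in range(n_slaves)]
        # Union the local candidates, then prune with the Apriori principle.
        known = set(frequent)
        candidates = sorted(
            c for c in set().union(*local_cands)
            if all(s in known for s in combinations(c, len(c) - 1)))
    return result

result = master(D, n_slaves=3, min_sup_count=2)
print(len(result), "frequent itemsets;", result[("I1", "I2", "I5")])
```

Because every global count is a sum over disjoint partitions, the result does not depend on the number of slaves; only the amount of work done per slave changes.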

  21. Outline
      • Introduction
      • Background
      • A distributed Apriori algorithm using mobile agents
      • Experimental evaluation
      • Conclusion

  22. Experiments: Synthetic datasets
      • Using synthetic datasets of varying sizes:
        Name          |D|    |T|   Size (MB)
        D100k.T30     100K   30    3
        D100k.T100    100K   100   10
        D320k.T150    320K   150   48
      • |D|: number of transactions
      • |T|: average number of items per transaction

  23. Experiment environment
      • Software
        • Database: Oracle server
        • Language: Java (Sun JDK 1.3)
        • Mobile agents: Aglets (IBM)
        • Protocol: ATP (Agent Transfer Protocol)
        • Platform: Windows
      • Hardware
        • PC: Pentium 3, 300 MHz, 128 MB RAM
        • 15 machines (at the Knowledge Science Center, JAIST)

  24. Execution time (sec.) with different minimum support thresholds: 35%, 40%, 50%

  25. Execution time with min_sup 35%

  26. Execution time with min_sup 40%

  27. Execution time with min_sup 50%

  28. Rate of execution time
      • The execution time decreases nearly linearly as the number of slaves increases

  29. Conclusion
      • Proposed a distributed Apriori algorithm for mining association rules
      • Experimental evaluation shows that the execution time decreases nearly linearly as the number of slaves increases
      • Future work:
        • Segment both the master and GLk for support counts
        • Develop incremental algorithms for association analysis using mobile agent (MA) technology
