『Data Mining』 By Jung, hae-sun
Contents • 1. Introduction • 2. Definition • 3. Data Mining Applications • 4. Data Mining Tasks • 5. Overview of the System • 6. Data Mining Analysis • 7. Application • 8. References
1. Introduction • Data mining is related to - Data warehousing - Online analytical processing (OLAP) - Data visualization • Data mining needs a data warehouse for effective mining. The aims of OLAP and data mining are similar, but only data mining involves looking for unknown patterns. Finally, data mining requires data visualization for the presentation of results.
2. Definition • A technique using software tools geared toward users who typically do not know exactly what they are searching for, but are looking for particular patterns or trends. Data mining is the process of sifting through large amounts of data to uncover relationships in the data content. This is also known as data surfing.
3. Data Mining Applications • Applications in financial, telecom, insurance and retail companies for - market segmentation - fraud detection - better marketing - trend analysis - market basket analysis - customer churn
4. Data Mining Tasks • Class description • Association • Sequential Patterns • Time-Series analysis • Prediction • Classification • Clustering
5. Overview of the System - Recommender System [Flow diagram; recoverable labels: Product Database • Customer Purchase Database • Normalized Customer vectors • Data Mining Clustering • Cluster assignments • Cluster-specific Product lists • Products eligible for recommendation • Vector for Target customer • Products List for target customer's cluster • Data Mining Associations • Product affinities • Matching Algorithm • Personalized Recommendation List • Target Customer. Annotations: Grouping between customer & product; Grouping between products]
6. Data Mining Analysis (1) ▶Clustering • Neural Clustering Algorithm • Demographic Clustering Algorithm ▶Association Rule • Apriori Algorithm • AprioriAll Algorithm • AprioriTid Algorithm • DynamicSome Algorithm • FP-Growth ▶Matching Algorithm (key points in this paper)
6. Data Mining Analysis (2) ▶Association Rule - Concept • Search for interesting relationships among items in a given data set. ▶Association Rule - Procedure • Step 1: Find all frequent itemsets. Each of these itemsets must occur at least as frequently as a predetermined minimum support. • Step 2: Generate strong association rules from the frequent itemsets. These rules must satisfy both minimum support and minimum confidence.
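The two-step procedure can be sketched as a level-wise (Apriori-style) search followed by rule generation. This is a minimal illustration, not the implementation used in the paper: the function names `frequent_itemsets` and `strong_rules`, the grocery baskets, and the thresholds are all assumptions chosen for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search returning {itemset: support}."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in sets) / n

    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([i]) for t in sets for i in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that reach the minimum support.
        level = {}
        for c in current:
            s = support(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-item subset.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def strong_rules(frequent, min_confidence):
    """Step 2: emit (lhs, rhs, support, confidence) for every strong rule."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so the
                # lookup frequent[lhs] always succeeds.
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules

# Hypothetical baskets, used only to exercise the two functions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
freq = frequent_itemsets(baskets, min_support=0.5)
rules = strong_rules(freq, min_confidence=0.9)
for lhs, rhs, sup, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), sup, conf)
```

On these baskets the only rule that clears both thresholds is butter ⇒ bread, because every basket containing butter also contains bread.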
6. Data Mining Analysis (3) ▶Association Rule - Measure • $\mathrm{Support}(A \Rightarrow B) = \dfrac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}} = P(A \cap B)$ • $\mathrm{Confidence}(A \Rightarrow B) = \dfrac{\text{number of transactions containing both } A \text{ and } B}{\text{number of transactions containing } A} = \dfrac{P(A \cap B)}{P(A)} = P(B \mid A)$
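As a rough sketch, the two measures translate directly into counting code. The function names and the sample baskets below are assumptions for illustration; returning 0.0 when no transaction contains A is also a convention chosen here, since the ratio is undefined in that case.

```python
def support(transactions, a, b):
    """Fraction of all transactions that contain both A and B."""
    both = set(a) | set(b)
    return sum(both <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, a, b):
    """Among transactions containing A, the fraction that also contain B."""
    with_a = [set(t) for t in transactions if set(a) <= set(t)]
    if not with_a:
        return 0.0  # assumption: treat the undefined case as 0.0
    return sum(set(b) <= t for t in with_a) / len(with_a)

# Hypothetical baskets, not taken from the slides.
baskets = [
    {"tea", "milk"},
    {"tea"},
    {"milk", "sugar"},
    {"tea", "milk", "sugar"},
]
sup = support(baskets, {"tea"}, {"milk"})      # 2 of 4 baskets
conf = confidence(baskets, {"tea"}, {"milk"})  # 2 of the 3 tea baskets
print(sup, conf)
```

Support is symmetric in A and B, while confidence is not: swapping the two sides changes the denominator.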
6. Data Mining Analysis (4) ▶Association Rule - Example Step 1: Find all frequent itemsets. Minimum support = 60% • Support of A & D = 3/5 = 0.6 (frequent) • Support of A & F = 4/5 = 0.8 (frequent) • Support of A & E = 1/5 = 0.2 (not frequent)
6. Data Mining Analysis (5) Step 2: Generate strong association rules from the frequent itemsets. Minimum confidence = 90% • A ⇒ D: Confidence = 60%/100% = 0.6 • D ⇒ F: Confidence = 60%/60% = 1 Strong association rule: D ⇒ F, etc.
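The figures in this example can be checked with a few lines of code. The transaction table itself is not present in the text, so the five transactions below are a hypothetical reconstruction chosen only to match the stated supports and confidences; they are not the original data. The sketch considers single-item rules only.

```python
from itertools import permutations

# Hypothetical transactions (an assumption): picked so that A appears in
# 5/5, D in 3/5, F in 4/5 and E in 1/5, matching the stated figures.
transactions = [
    {"A", "D", "F"},
    {"A", "D", "F"},
    {"A", "D", "F"},
    {"A", "F"},
    {"A", "E"},
]

MIN_SUPPORT = 0.6
MIN_CONFIDENCE = 0.9

def support(items):
    """Fraction of transactions containing every listed item."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """support(lhs and rhs together) divided by support(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)

# Step 1: supports of the item pairs named in the example.
print(support(["A", "D"]), support(["A", "F"]), support(["A", "E"]))

# Step 2: confidences of the two rules named in the example.
print(confidence(["A"], ["D"]), confidence(["D"], ["F"]))

# All single-item rules X => Y that meet both thresholds.
items = sorted(set().union(*transactions))
strong = [
    (x, y)
    for x, y in permutations(items, 2)
    if support([x, y]) >= MIN_SUPPORT and confidence([x], [y]) >= MIN_CONFIDENCE
]
print(strong)
```

With this reconstruction, A ⇒ D fails the 90% confidence threshold while D ⇒ F passes it. Any further strong rules the code finds depend on the assumed transactions.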
7. Application (1) - Safeway Stores ▶Data Collection • Duration: 7 months • Number of customers: 200 • Recommended products per customer: 10~20
7. Application (2) - Safeway Stores ▶Safeway product taxonomy • Product classes (99): Petfoods, Tea, Soft Drinks • Product subclasses (2302): Dried Cat Food, Dried Dog Food, Canned Cat Food, Canned Dog Food • Products (~30000): Friskies Liver (250g) ▶Problem • Multilevel Products (Data Mining Issue) • Seasonal Products
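One common way to handle multilevel products is to map each basket up the taxonomy before mining, so that patterns can be searched at the product, subclass, or class level. The sketch below illustrates that idea only; it is not claimed to be the approach taken in the Safeway study. The parent links are assumptions (the slide lists names per level without showing which child belongs to which parent), and "Example Dog Biscuits (1kg)" is a made-up product.

```python
# Assumed parent links, for illustration only.
PRODUCT_TO_SUBCLASS = {
    "Friskies Liver (250g)": "Canned Cat Food",      # assumed placement
    "Example Dog Biscuits (1kg)": "Dried Dog Food",  # hypothetical product
}
SUBCLASS_TO_CLASS = {
    "Dried Cat Food": "Petfoods",
    "Dried Dog Food": "Petfoods",
    "Canned Cat Food": "Petfoods",
    "Canned Dog Food": "Petfoods",
}

def generalize(basket, level):
    """Map a basket of product names to 'product', 'subclass' or 'class'."""
    if level == "product":
        return set(basket)
    subclasses = {PRODUCT_TO_SUBCLASS[p] for p in basket}
    if level == "subclass":
        return subclasses
    if level == "class":
        return {SUBCLASS_TO_CLASS[s] for s in subclasses}
    raise ValueError(f"unknown taxonomy level: {level}")

basket = {"Friskies Liver (250g)", "Example Dog Biscuits (1kg)"}
by_subclass = generalize(basket, "subclass")
by_class = generalize(basket, "class")
print(sorted(by_subclass), sorted(by_class))
```

Mining at a higher level raises support, since many products collapse into one item, at the cost of less specific rules.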
7. Application (3) - Safeway Stores ▶Results • 1957 products were recommended. Of these, 120 (6.1%) were chosen. • (It is important to recall that the recommendation list contains no products previously purchased by this customer.) ▶Conclusion • This system can be used as a reasonable tool for recommending new products in a supermarket.
8. References • Agrawal, R. and Srikant, R., "Fast Algorithms for Mining Association Rules", In Proc. of the VLDB Conf., 1994 • "Two Crows, Data Mining Glossary", http://www.twocrows.com/glossary.htm • "Data Mining", http://www.mis.postech.ac.kr/topic/dm_e.html • http://wwwmaths.anu.edu.au/~steve/pdcn.pdf