
Data Mining



Presentation Transcript


  1. Data Mining Chase Repp

  2. What is Data Mining? Also known as knowledge discovery: searching, analyzing, and sifting through large data sets to find new patterns, trends, and relationships contained within them.

  3. What is Data Mining? Data mining differs from database querying in the following manner: database querying asks “what company purchased $100,000 worth of widgets last year?” while data mining asks “what company is likely to purchase over $100,000 of widgets next year, and why?”

  4. History of Data Mining. The term was coined in the 1960s. Data mining was first used to find basic information from collections of data, such as total revenue over the last three years. It draws on classic statistics, artificial intelligence, and machine learning.

  5. Knowledge Discovery Process

  6. Categories • Predictive Data Mining • Target value • Future trends • Descriptive Data Mining • No target value • Focuses on relations

  7. Predictive • focuses on discovering relationships among independent variables and between dependent and independent variables • used to forecast specific things

  8. Descriptive: describes a data set in a brief but comprehensive way and gives interesting characteristics of the data without having any predefined target. Focuses on relations.

  9. Association: patterns are discovered based on the relationship of a specific item with other items in the same transaction. Descriptive. Example: groceries.

  10. Classification: classifies each item in a data set into one of a predefined set of classes or groups. Often used with machine learning. Predictive. Example: cat or dog person?

  11. Clustering: unlike classification, the clustering technique also defines the classes itself and then puts objects into them. Descriptive. Example: a library.

  12. Regression: used to predict numbers from data sets that have known target values. Predictive. Examples: sales, distance, temperature, value, etc.
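As a rough illustration of regression as described on this slide, the sketch below fits a straight line to known target values by ordinary least squares and uses it to predict the next number. The monthly sales figures and the names `fit_line` and `predict` are made up for the example; they are not from the presentation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: month number -> sales (the known target values)
months = [1, 2, 3, 4]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(months, sales)

def predict(x):
    """Predict the target value for an unseen input."""
    return slope * x + intercept

print(predict(5))  # 11.0
```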

  13. Sequential Patterns: discovers frequent sequences or subsequences as patterns in a sequence database. Descriptive. Derived from association mining.

  14. Sequential Pattern Mining • There are three categories that the main sequential pattern mining techniques fall into. • Apriori-based • Pattern-growth • Early-pruning

  15. Apriori-based: these techniques follow the Apriori property: all nonempty subsets of a frequent itemset must also be frequent. If {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets. Algorithms: AprioriAll, GSP, PSP, and SPAM.

  16. Transaction data:
      t1: Beef, Chicken, Milk
      t2: Beef, Cheese
      t3: Cheese, Boots
      t4: Beef, Chicken, Cheese
      t5: Beef, Chicken, Clothes, Cheese, Milk
      t6: Chicken, Clothes, Milk
      t7: Chicken, Milk, Clothes
      Assume: minsup = 30%, minconf = 80%
      An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7, about 43%]
      Association rules from the itemset:
      Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
      …
      Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
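The support and confidence figures on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the transactions are copied from the slide, while the function names `support` and `confidence` are chosen here for illustration. Exact fractions are used so the results match the slide's 3/7 and 3/3.

```python
from fractions import Fraction

# Transactions t1..t7 as listed on the slide
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"Chicken", "Clothes", "Milk"}, transactions))       # 3/7
print(confidence({"Clothes"}, {"Milk", "Chicken"}, transactions))  # 1
```

With minsup = 30% and minconf = 80%, both values clear the thresholds, which is why the slide lists the itemset as frequent and the rule as valid.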

  17. Apriori Algorithm • Two steps: • Find all itemsets that have minimum support (frequent itemsets). • Use frequent itemsets to generate rules. • E.g., a frequent itemset {Chicken, Clothes, Milk} [sup = 3/7] and one rule from the frequent itemset: Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
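The second step, generating rules from a frequent itemset, can be sketched as follows. The code takes the slide's frequent itemset {Chicken, Clothes, Milk}, tries every nonempty proper subset as the left-hand side, and keeps the rules that reach minconf = 80%. The helper names `count` and `rules_from_itemset` are illustrative assumptions, and the transactions are the seven from the previous slide.

```python
from itertools import combinations

# Transactions t1..t7 from the transaction-data slide
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def rules_from_itemset(itemset, minconf):
    """Return (antecedent, consequent, confidence) for rules meeting minconf."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            conf = count(itemset) / count(lhs)
            if conf >= minconf:
                rules.append((sorted(lhs), sorted(itemset - lhs), conf))
    return rules

for lhs, rhs, conf in rules_from_itemset({"Chicken", "Clothes", "Milk"}, 0.8):
    print(", ".join(lhs), "->", ", ".join(rhs), f"[conf = {conf:.2f}]")
```

On this data three rules survive, including the slide's Clothes → Milk, Chicken; rules with only Chicken or only Milk on the left fall below 80% because those items also appear in transactions without Clothes.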

  18. Finding frequent itemsets. Dataset T, minsup = 50%, notation itemset:count
      1. scan T → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
                → F1: {1}:2, {2}:3, {3}:3, {5}:3
                → C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
      2. scan T → C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
                → F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
                → C3: {2,3,5}
      3. scan T → C3: {2,3,5}:2
                → F3: {2,3,5}
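The level-wise search on this slide can be sketched in Python. Note that dataset T itself does not appear in the transcript; the four transactions used here are an assumption, chosen only because they reproduce every count shown on the slide. The function name `apriori` and its structure are likewise illustrative, not the presenter's code.

```python
from itertools import combinations

# ASSUMPTION: dataset T is not in the transcript. These four transactions
# were picked because they yield exactly the counts shown on the slide.
T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def apriori(transactions, minsup):
    """Return {frequent itemset: count} using a level-wise search."""
    min_count = minsup * len(transactions)
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]  # C1
    frequent = {}
    k = 1
    while candidates:
        # Scan the data once to count every candidate of size k
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}  # Fk
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent itemsets into size-k candidates
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent
        candidates = [c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))]
    return frequent

result = apriori(T, 0.5)
for itemset, n in sorted(result.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The prune step is where the Apriori property pays off: {1,2,3} and {1,3,5} are generated by the join but discarded without a scan because {1,2} and {1,5} are not frequent, leaving {2,3,5} as the only size-3 candidate.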

  19. Pattern-growth: uses a divide-and-conquer strategy to focus the search on a restricted portion of the initial database and to generate as few candidate sequences as possible. Algorithms: FreeSpan, PrefixSpan, WAP-mine, and FS-Miner.
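To give a feel for the pattern-growth idea, here is a simplified sketch in the spirit of PrefixSpan. It assumes each sequence element is a single symbol (real PrefixSpan also handles itemsets as elements), and the function name `prefixspan` and its signature are chosen here for illustration. The four sequences are the ones from the example slide later in this deck; the minimum count of 4 is an assumption meaning "present in every sequence".

```python
def prefixspan(sequences, min_count):
    """Return {pattern: support count} for single-symbol sequences."""
    results = {}

    def mine(prefix, projected):
        # Count each symbol once per projected suffix
        counts = {}
        for suffix in projected:
            for symbol in set(suffix):
                counts[symbol] = counts.get(symbol, 0) + 1
        for symbol, n in sorted(counts.items()):
            if n < min_count:
                continue
            pattern = prefix + symbol
            results[pattern] = n
            # Divide and conquer: keep only what follows the first
            # occurrence of the symbol, then recurse on that smaller database
            smaller = [s[s.index(symbol) + 1:] for s in projected if symbol in s]
            mine(pattern, smaller)

    mine("", list(sequences))
    return results

sequences = ["abdac", "eaebcac", "babfaec", "abfac"]
patterns = prefixspan(sequences, min_count=4)
print(max(patterns, key=len), patterns["abac"])  # abac 4
```

Each recursive call works on a projected database that is no larger than its parent, and patterns are grown one symbol at a time from symbols actually seen in the projection, so no candidate is generated that does not occur in the data.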

  20. Early-pruning: utilizes a form of position induction to prune candidate sequences very early in the mining process and to avoid support counting as much as possible. Algorithms: LAPIN, HVSM, and DISC-all.

  21. Web Mining • searching for patterns in data through • content mining • Search engines • structure mining • Hyperlinks (HITS / PageRank) • usage mining • User’s browser data and forms submitted

  22. Web Mining One use is for finding user navigational patterns on the World Wide Web by extracting knowledge from web logs

  23. Example: applying sequential pattern mining. S = {a, b, c, d, e, f}. Sequences: [P1, <abdac>], [P2, <eaebcac>], [P3, <babfaec>], [P4, <abfac>]. Frequent pattern: abac.
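That abac is frequent here can be checked directly: a pattern is supported by a sequence when its symbols appear in the same order, not necessarily adjacent. The sketch below counts how many of the four sequences support a pattern; the names `is_subsequence` and `support_count` are illustrative.

```python
def is_subsequence(pattern, sequence):
    """True if pattern's symbols occur in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # 'symbol in remaining' consumes the iterator up to the first match,
    # so later symbols can only match later positions
    return all(symbol in remaining for symbol in pattern)

# Sequence database from the slide
database = {"P1": "abdac", "P2": "eaebcac", "P3": "babfaec", "P4": "abfac"}

def support_count(pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database.values())

print(support_count("abac"))  # 4
print(support_count("abd"))   # 1
```

The pattern abac is supported by all four sequences, while abd appears only in P1, so it would not be frequent under any threshold above one sequence.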

  24. Visual Data Mining • combines traditional mining methods and information visualization techniques • user is directly involved • VDMS - simplicity, reliability, reusability, availability, and security

  25. Visual Data Mining http://www.youtube.com/user/quiterian http://www.youtube.com/watch?v=MtJ4Xa4-J8g http://www.youtube.com/watch?v=_8HzwQCFFfw

  26. Questions?
