
Data Science course

ExcelR is a Data Science training institute in Pune. The Data Science course is delivered by highly experienced, certified trainers who are considered among the best in the industry.

shital11

Presentation Transcript


  1. Market Basket Analysis • Association Rules • Relationship Mining • Affinity Analysis © 2013 ExcelR Solutions. All Rights Reserved

  2. Market Basket Analysis • Large number of transaction records, collected using bar-code scanners • Each record = all items purchased in a single purchase transaction

  3. Association Rules • What item goes with what? • Are certain groups of items consistently purchased together? • What business strategies will you devise with this knowledge?

  4. Association Rules • Product shelf placement – a specific product beside another • Selling of prominent shelves – Slotting Fees • Stocking – Supply Chain Management • Price Bundling – Combo offers. How? • Source: http://www.economist.com/news/business/21654601-supplier-rebates-are-heart-some-supermarket-chains-woes-buying-up-shelves • https://en.wikipedia.org/wiki/Association_rule_learning

  5. Association Rules – Cell phone faceplates • A store that sells accessories for cellular phones runs a promotion on faceplates • OFFER! Buy multiple faceplates from a choice of 6 different colors & get a discount • How would you help store managers devise a strategy to become more profitable?

  6. Association Rules – Cell phone faceplates • List Format • Binary Matrix Format • Association rules are probabilistic “if-then” statements • 2 main ideas: • Examine all possible “if-then” rule formats • Select the rules that indicate true dependence
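
The two data layouts named on this slide can be illustrated with a small sketch. The transactions below are hypothetical (the slide's own tables did not survive in the transcript), and `to_binary_matrix` is a helper name chosen only for this example.

```python
# Hypothetical transactions in list format (not the deck's actual data).
transactions = [
    ["red", "white", "green"],
    ["white", "orange"],
    ["red", "blue"],
]

def to_binary_matrix(transactions):
    """Return (sorted item names, rows of 0/1 flags), one row per transaction."""
    items = sorted({item for basket in transactions for item in basket})
    matrix = [[1 if item in basket else 0 for item in items]
              for basket in transactions]
    return items, matrix

items, matrix = to_binary_matrix(transactions)
print(items)          # column headers of the binary matrix
for row in matrix:
    print(row)        # one 0/1 row per transaction
```

The list format is compact when baskets are small; the binary matrix makes it easy to count how often a set of columns is 1 together.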

  7. Association Rules – Cell phone faceplates • Rules for {Red, White, Green} • Problem: many rules are possible • How to select the TRUE/GOOD rules from all generated rules?
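
To see why many rules are possible, one can enumerate every way of splitting an item set into a non-empty "if" part and a non-empty "then" part. This is a minimal sketch; `candidate_rules` is a name invented for the example. A set of k items yields 2^k − 2 such rules, so {Red, White, Green} yields 6.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split with both sides non-empty."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent

rules = list(candidate_rules({"Red", "White", "Green"}))
for antecedent, consequent in rules:
    print(f"If {', '.join(antecedent)} then {', '.join(consequent)}")
print(len(rules), "candidate rules")
```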

  8. Association Rules – Terminology • “IF” part = Antecedent = A • “THEN” part = Consequent = C • If {Red, White} then {Green} • If Red & White phone faceplates are purchased, then a Green faceplate is purchased • Antecedent: Red & White • Consequent: Green

  9. Association Rules – Performance Measures • 1. Support • 2. Confidence • 3. Lift

  10. Association Rules – Support • Consider only combinations that occur with higher frequency in the database • Support is the criterion based on frequency • Percentage / number of transactions in which the IF/Antecedent & THEN/Consequent appear in the data • Mathematically: Support = (# transactions in which A & C appear together) / (total no. of transactions)
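
The support formula can be sketched in a few lines. The ten transactions below are an assumed toy dataset, not the deck's faceplate table (which is missing from the transcript), and `support_count` and `support` are names chosen for this example.

```python
# Hypothetical faceplate transactions; each basket is a set of colors.
TRANSACTIONS = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"white", "orange"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for basket in transactions if set(itemset) <= basket)

def support(itemset, transactions):
    """Support as a fraction of all transactions."""
    return support_count(itemset, transactions) / len(transactions)

# Support depends only on the items involved, not on rule direction:
# "if White then Blue" and "if Blue then White" share one support value.
print(support_count({"white", "blue"}, TRANSACTIONS))
print(support({"white", "blue"}, TRANSACTIONS))
```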

  11. Support – Calculation • What is the support for “if White then Blue”? 1. 4 2. 40% 3. 2 4. 90% • What is the support for “if Blue then White”? 1. 4 2. 40% 3. 2 4. 90%

  12. Support – Problem • Generating all possible rules is exponential in the number of distinct items • Solution: frequent item sets using the Apriori Algorithm

  13. Apriori Algorithm • For k products: • 1. Set the minimum support criterion • 2. Generate the list of one-item sets that meet the support criterion • 3. Use the list of one-item sets to generate the list of two-item sets that meet the support criterion • 4. Use the list of two-item sets to generate the list of three-item sets that meet the support criterion • 5. Continue up through k-item sets
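
The steps above can be sketched as a level-wise search. This is a minimal illustration under stated assumptions, not a tuned implementation: the transactions are a hypothetical toy dataset, the support criterion is expressed as a minimum count, and the function name `apriori` is local to this example.

```python
from itertools import combinations

# Hypothetical faceplate transactions (not the deck's actual data).
TRANSACTIONS = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"white", "orange"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent item sets."""
    def count(candidate):
        return sum(1 for basket in transactions if candidate <= basket)

    # Step 2: one-item sets that meet the support criterion.
    items = {item for basket in transactions for item in basket}
    current = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = count(itemset)
        k += 1
        # Steps 3-5: join frequent (k-1)-item sets into k-item candidates ...
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # ... and drop candidates that have an infrequent (k-1)-item subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if count(c) >= min_count}
    return frequent

frequent = apriori(TRANSACTIONS, min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The pruning step is what keeps the search from being exponential in practice: an item set can only be frequent if all of its subsets are frequent, so most large candidates are never counted.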

  14. Support – Criterion = 2 • Create rules from frequent item sets only

  15. Support Criterion Example • Rules for {Red, White, Green}

  16. Association Rules – Confidence • Percentage of If/Antecedent transactions that also have the Then/Consequent item set • Mathematically: P(Consequent | Antecedent) = P(C & A) / P(A) • Confidence = (# transactions in which A & C appear together) / (# transactions with A)
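
A short sketch of the confidence formula follows. The transactions are an assumed toy dataset rather than the deck's table, and `count` and `confidence` are names chosen for this example. Unlike support, confidence changes when the rule is reversed.

```python
# Hypothetical faceplate transactions (not the deck's actual data).
TRANSACTIONS = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"white", "orange"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for basket in transactions if set(itemset) <= basket)

def confidence(antecedent, consequent, transactions):
    """(# transactions with A and C together) / (# transactions with A)."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

print(confidence({"white"}, {"blue"}, TRANSACTIONS))  # if White then Blue
print(confidence({"blue"}, {"white"}, TRANSACTIONS))  # if Blue then White
```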

  17. Confidence – Calculation • What is the confidence for “if White then Blue”? 1. 4/5 2. 5/8 3. 5/4 4. 4/8 • What is the confidence for “if Blue then White”? 1. 4/5 2. 5/8 3. 5/4 4. 4/8

  18. Confidence – Weakness • If the antecedent and the consequent both have high support => high / biased confidence

  19. Association Rules – Lift Ratio • Lift Ratio = Confidence / Benchmark confidence • Benchmark assumes independence between antecedent & consequent: P(C|A) = P(C & A) / P(A) = P(C) × P(A) / P(A) = P(C) • Benchmark confidence = (# transactions with consequent item sets) / (# transactions in database)
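
The lift ratio can be sketched by dividing confidence by the benchmark confidence P(C). The data is again a hypothetical toy dataset, the function names are local to this example, and the code assumes the antecedent and consequent each occur at least once (otherwise the divisions fail).

```python
# Hypothetical faceplate transactions (not the deck's actual data).
TRANSACTIONS = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"white", "orange"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for basket in transactions if set(itemset) <= basket)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the benchmark confidence P(consequent)."""
    both = set(antecedent) | set(consequent)
    confidence = count(both, transactions) / count(antecedent, transactions)
    benchmark = count(consequent, transactions) / len(transactions)
    return confidence / benchmark

# A lift near 1 means the rule does no better than the benchmark;
# a lift well above 1 suggests the antecedent helps find the consequent.
print(lift({"white"}, {"blue"}, TRANSACTIONS))
print(lift({"red", "white"}, {"green"}, TRANSACTIONS))
```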

  20. Interpreting Lift • Lift > 1 indicates a rule that is useful in finding consequent item sets • Such a rule is much better than selecting random transactions

  21. Lift – Calculation • What is the lift for “if White then Blue”? 1. 4/8 2. 5/10 3. 4/5 4. 1

  22. Rules selection process • Generate all rules that meet the specified Support & Confidence • Find frequent item sets by applying the specified minimum support cutoff • From these item sets, generate rules, then filter them to keep only those with high Confidence
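
The two-stage selection process can be sketched end to end. For brevity this example finds frequent item sets by brute force rather than with Apriori, which is only reasonable for a handful of items. The data, thresholds, and the name `select_rules` are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical faceplate transactions (not the deck's actual data).
TRANSACTIONS = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"white", "orange"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def select_rules(transactions, min_count, min_confidence):
    """Return (antecedent, consequent, support count, confidence) tuples."""
    def count(itemset):
        return sum(1 for basket in transactions if itemset <= basket)

    items = sorted({item for basket in transactions for item in basket})
    rules = []
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            n_both = count(itemset)
            if n_both < min_count:          # stage 1: minimum support cutoff
                continue
            for size in range(1, k):
                for ante in combinations(combo, size):
                    antecedent = frozenset(ante)
                    conf = n_both / count(antecedent)
                    if conf >= min_confidence:  # stage 2: confidence filter
                        rules.append((antecedent, itemset - antecedent, n_both, conf))
    return rules

rules = select_rules(TRANSACTIONS, min_count=2, min_confidence=0.7)
for antecedent, consequent, n_both, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), n_both, round(conf, 2))
```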

  23. Rules

  24. Alarming! • Random data can generate apparently interesting association rules • The more rules you produce, the greater the danger • Rules based on large numbers of records are less subject to this danger

  25. Profusion of rules

  26. Applications • What if Product & Stores are selected as a tuple for analysis? • What if crimes in different geographies for each week are known? (Narcotics, Public Peace Violation, Battery, Assault, Narcotics, Robbery)

  27. Recap with an example • How can you use the information if you know about the purchase history of customers in a specific geography? • Supermarket database has 100,000 POS transactions • 2000 transactions include both Strepsils & Orange Juice • 800 of the above 2000 include Soup purchases

  28. Recap with an example • What is the support for the rule “IF (Orange Juice & Strepsils) are purchased THEN (Soup) is purchased on the same trip”? 1. 0.8% 2. 2% 3. 40% • What is the confidence for the rule “IF (Orange Juice & Strepsils) are purchased THEN (Soup) is purchased on the same trip”? 1. 0.8% 2. 2% 3. 40%
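
Both measures follow directly from the counts given in the recap (100,000 transactions, 2,000 with Strepsils & Orange Juice, 800 of those with Soup). A quick check, with variable names chosen for this example:

```python
total_transactions = 100_000
antecedent_count = 2_000   # baskets with both Strepsils and Orange Juice
rule_count = 800           # of those, baskets that also include Soup

support = rule_count / total_transactions     # A & C together / all transactions
confidence = rule_count / antecedent_count    # A & C together / transactions with A

print(f"support = {support:.1%}")         # support = 0.8%
print(f"confidence = {confidence:.0%}")   # confidence = 40%
```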

  29. Recap with an example • What is the lift ratio for the rule “IF (Orange Juice & Strepsils) are purchased THEN (Soup) is purchased on the same trip”?
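
Computing the lift ratio also needs the number of transactions that include Soup at all, which does not appear in this transcript. The sketch below therefore takes that count as a parameter; the value 5,000 used in the call is purely hypothetical and stands in for the missing figure.

```python
def lift_ratio(n_total, n_antecedent, n_both, n_consequent):
    """Lift = confidence / benchmark confidence."""
    confidence = n_both / n_antecedent   # P(C | A)
    benchmark = n_consequent / n_total   # P(C), assuming independence
    return confidence / benchmark

# n_consequent=5_000 is an assumed Soup count, NOT a figure from the slides.
example_lift = lift_ratio(n_total=100_000, n_antecedent=2_000,
                          n_both=800, n_consequent=5_000)
print(round(example_lift, 2))
```

With a different Soup count the lift changes accordingly; the rarer Soup is overall, the higher the lift for the same confidence.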

  30. Sequential Pattern Mining • IT IS: If person X has taken “Data Mining Unsupervised” training in the 1st Quarter, person X has also taken “Data Mining Supervised” training in the 2nd Quarter • Based on the statement above, recommend “Data Mining Supervised” training to those who have enrolled for “Data Mining Unsupervised” • IT IS NOT: Purchases / events that occur at the same time

  31. Association Rules vs. Sequential Pattern Mining • Look for temporal patterns • Order/sequence of a & b matters for a rule “b follows a” • However, what happens in between a & b doesn’t matter • In the phone faceplates dataset: • Associations among items that were bought within the same week were discovered • How about finding what they would buy next week or the week after, if they had bought ‘x’ this week?
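
The “b follows a” idea can be sketched by treating each customer's history as an ordered list of weekly baskets. The histories below are hypothetical and `follows_support` is a name invented for this example. The check counts a customer when b appears in any later week than a, regardless of what was bought in between, and does not count same-week purchases.

```python
# Hypothetical histories: one list of weekly baskets per customer, in time order.
CUSTOMERS = [
    [{"red"}, {"orange"}, {"blue"}],   # red, then blue two weeks later
    [{"blue"}, {"red"}],               # blue came first
    [{"red", "blue"}],                 # same week only
    [{"white"}, {"red"}, {"blue"}],    # red, then blue the next week
]

def follows_support(sequences, a, b):
    """Fraction of sequences in which b is bought in a later week than a."""
    hits = 0
    for weeks in sequences:
        # Index of the first week containing a, or None if a never appears.
        first_a = next((i for i, basket in enumerate(weeks) if a in basket), None)
        if first_a is not None and any(b in basket for basket in weeks[first_a + 1:]):
            hits += 1
    return hits / len(sequences)

print(follows_support(CUSTOMERS, "red", "blue"))   # blue follows red
print(follows_support(CUSTOMERS, "blue", "red"))   # red follows blue
```

An ordinary association rule would treat the third customer as evidence for {red, blue}; the sequential version ignores it because no ordering is implied.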

  32. Applications • Identify the appropriate basket • Identify popular taxi routes • Sequential pattern from GPS tracks; spatiotemporal records of taxi trajectories • First cluster collocated customers

  33. CONTACT US • www.excelr.com • surya@excelr.com • +91 9880913504 • ExcelR - Data Science, Data Analytics Course Training in Pune • Address: 102, 1st Floor, Phase II, Prachi Residency, Opposite to Kapil Malhar, Baner Rd, Baner, Pune, Maharashtra 411046 • Hours: Mon-Sat 07AM – 11PM • Established in Year: 2013 • THANK YOU
