
Data Mining

Data Mining. Process, Key Success Factors, Illustrations. Data Mining in the BI Context. Data Extraction. Collecting / Transforming. Data Storage. Storing / Aggregating / Historising. Business Intelligence. Visualization. Reporting / EIS / MIS. Exploration. OLAP. Data Analysis.

ouidat


Presentation Transcript


  1. Data Mining Process, Key Success Factors, Illustrations

  2. Data Mining in the BI Context • Data Extraction: Collecting / Transforming • Data Storage: Storing / Aggregating / Historising • Business Intelligence • Visualization: Reporting / EIS / MIS • Exploration: OLAP • Data Analysis • Discovery: Data Mining

  3. What Is Data Mining? Business Definition • Deployment of business processes, supported by adequate analytical techniques, to: • Take further advantage of data • Discover relevant knowledge • Act on the results

  4. CRISP-DM
     Business Understanding
     • Determine Business Objectives: Background; Business Objectives; Business Success Criteria
     • Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
     • Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
     • Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques
     Data Understanding
     • Collect Initial Data: Initial Data Collection Report
     • Describe Data: Data Description Report
     • Explore Data: Data Exploration Report
     • Verify Data Quality: Data Quality Report
     Data Preparation
     • Phase outputs: Data Set; Data Set Description
     • Select Data: Rationale for Inclusion / Exclusion
     • Clean Data: Data Cleaning Report
     • Construct Data: Derived Attributes; Generated Records
     • Integrate Data: Merged Data
     • Format Data: Reformatted Data
     Modeling
     • Select Modeling Technique: Modeling Technique; Modeling Assumptions
     • Generate Test Design: Test Design
     • Build Model: Parameter Settings; Models; Model Description
     • Assess Model: Model Assessment; Revised Parameter Settings
     Evaluation
     • Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
     • Review Process: Review of Process
     • Determine Next Steps: List of Possible Actions; Decision
     Deployment
     • Plan Deployment: Deployment Plan
     • Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
     • Produce Final Report: Final Report; Final Presentation
     • Review Project: Experience Documentation
     DOCUMENT EVERYTHING!

  5. Data Mining Tasks • Summarization • Classification / Prediction • Classification, Concept learning, Regression • Clustering • Dependency modeling • Anomaly detection • Link Analysis

  6. Human Resources

  7. Survey and Online Game

  8. Do They Know Us?

  9. Who Plays?

  10. How Well Do They Do?
      Score          Rating         Count
      0-13136        Poor           21
      13136-19453    Fair           91
      19453-25769    Good           90
      25769-32086    Excellent      39
      32086+         Outstanding    15
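The score bands on slide 10 amount to a simple binning rule. Below is a minimal Python sketch of that rule, assuming each band includes its lower bound and excludes its upper bound (the slide does not say which band a boundary score belongs to). The names `BOUNDS`, `LABELS` and `rating` are illustrative and do not come from the presentation.

```python
from bisect import bisect_right

# Upper edges of the first four bands, as listed on the slide.
BOUNDS = [13136, 19453, 25769, 32086]
LABELS = ["Poor", "Fair", "Good", "Excellent", "Outstanding"]


def rating(score):
    """Map a game score to its rating band.

    Assumption: a score equal to a boundary falls in the higher band.
    """
    if score < 0:
        raise ValueError("score must be non-negative")
    # bisect_right counts how many boundaries are <= score,
    # which is exactly the index of the band.
    return LABELS[bisect_right(BOUNDS, score)]


if __name__ == "__main__":
    for s in (5000, 13136, 20000, 40000):
        print(s, rating(s))
```

Using a sorted list of boundaries keeps the rule in one place, so changing a threshold does not require touching any branching logic.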

  11. Subscription Retail

  12. Situation & Goal • Poor understanding of customers and behaviors • Short audit: • Nice DWH, only 2 years old, not fully populated • Limited data on purchases and subscriptions • Potential goals: • Associations of products that sell together • Segmentation of customers

  13. Summarization / Aggregation • Revenue distribution • 80% generated by 41.5% of subscribers • 60% generated by 18.3% of subscribers • 42.9% generated by top 5 products • Simple customer classes • Over 65 years old most profitable • Under 16 years old least profitable • Birthdate filled-in for only about 10% of subscribers!
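Figures such as "80% of revenue generated by 41.5% of subscribers" come from a cumulative-share calculation. Below is a minimal Python sketch of one way to compute it, assuming a plain list with one revenue total per subscriber. The function name and the sample numbers are made up for illustration and are not the data behind the slide.

```python
def share_of_subscribers_for_revenue(revenues, target_share):
    """Return the smallest fraction of subscribers, taken from the highest
    spender down, whose combined revenue reaches target_share of the total."""
    if not revenues:
        raise ValueError("revenues must not be empty")
    total = sum(revenues)
    if total <= 0:
        raise ValueError("total revenue must be positive")

    running = 0.0
    # Walk subscribers from most to least profitable, accumulating revenue.
    for count, amount in enumerate(sorted(revenues, reverse=True), start=1):
        running += amount
        if running >= target_share * total:
            return count / len(revenues)
    return 1.0


if __name__ == "__main__":
    # Hypothetical revenue per subscriber.
    sample = [500, 300, 100, 50, 30, 20]
    for target in (0.6, 0.85):
        fraction = share_of_subscribers_for_revenue(sample, target)
        print(f"{target:.0%} of revenue comes from {fraction:.1%} of subscribers")
```

The same loop run over revenue per product, instead of per subscriber, would give a figure like the "top 5 products" share.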

  14. Product Association • About 21% of subscribers buy P4, P7 and P9 • P4 is most profitable product • P7 is ranked 6th • P9 is ranked 15th with only 2% of revenue • Several possible actions • Make a bundle offering of these products • Cross-sell from P9 to P4 • Temptation to remove P9 should be resisted
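A statement like "about 21% of subscribers buy P4, P7 and P9" is the support of an itemset, and a cross-sell from P9 to P4 is usually judged by the confidence of the rule P9 → P4. Below is a minimal Python sketch of both measures, assuming each subscriber is represented as a set of product codes. The baskets are invented for illustration, and the slides do not say which association algorithm was actually used.

```python
def support(baskets, items):
    """Fraction of baskets that contain every product in items."""
    if not baskets:
        raise ValueError("baskets must not be empty")
    wanted = set(items)
    hits = sum(1 for basket in baskets if wanted <= set(basket))
    return hits / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Among baskets containing antecedent, the fraction that also
    contain consequent."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    both = support(baskets, set(antecedent) | set(consequent))
    return both / base


if __name__ == "__main__":
    # Hypothetical subscribers and the products they bought.
    baskets = [
        {"P4", "P7", "P9"},
        {"P4", "P7", "P9", "P1"},
        {"P4"},
        {"P9", "P2"},
        {"P1", "P2"},
    ]
    print("support(P4, P7, P9):", support(baskets, {"P4", "P7", "P9"}))
    print("confidence(P9 -> P4):", confidence(baskets, {"P9"}, {"P4"}))
```

A high confidence for P9 → P4 is the kind of evidence that would argue for keeping a low-revenue product such as P9 in the catalogue.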

  15. Clustering • A cluster of 30% of customers who buy a single yearly product!

  16. Summary of Findings • Data Mining found: • A small percentage of the customers is responsible for a large share of the sales • Several groups of « strongly-connected » articles • A sizeable group of subscribers who buy a single article • Lessons learned: • First 2 findings: « we knew that! » (BUT: scientific confirmation of business observation) • 3rd finding: « we could target these customers with a special offer! » • Lack of relevant data: the structure is in place but not being used systematically

  17. Campaign Management

  18. Situation & Goal

  19. Lift • Lift(c) = CR(c) / c • Example: Lift(25%) = CR(25%) / 25% = 62% / 25% ≈ 2.5 • If we send to 25% of our prospects using the model, they are 2.5 times as likely to respond as if we were to select them randomly.
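The lift formula can be checked on scored data. Below is a minimal Python sketch, reading CR(c) as the cumulative response: the share of all responders found in the top fraction c of prospects when ranked by model score. That reading is an assumption based on the 62% / 25% example. The function names and the sample scores are illustrative only.

```python
def cumulative_response(scores, responded, c):
    """Share of all responders found in the top fraction c of prospects,
    ranked from highest to lowest model score."""
    if not 0 < c <= 1:
        raise ValueError("c must be in (0, 1]")
    total = sum(responded)
    if total == 0:
        raise ValueError("no responders in the data")
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(c * len(ranked)))  # number of prospects contacted
    captured = sum(flag for _, flag in ranked[:cutoff])
    return captured / total


def lift(scores, responded, c):
    """Lift(c) = CR(c) / c, as defined on the slide."""
    return cumulative_response(scores, responded, c) / c


if __name__ == "__main__":
    # Hypothetical model scores and observed responses (1 = responded).
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    responded = [1, 1, 0, 1, 0, 0, 0, 0]
    print("lift at 25%:", lift(scores, responded, 0.25))
    print("lift at 50%:", lift(scores, responded, 0.50))
    # The slide's own example: CR(25%) = 62%.
    print("slide example:", 0.62 / 0.25)
```

A lift of 1 means the model does no better than random selection, and lift always falls back to 1 at c = 100% because every responder is then included.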

  20. Expected ROI • Assume: • 200 seminars per year • €0.41 stamp • €200 per seminar • Send half as many mailings for the same number of responses (response rate goes from 0.1% to 0.2%)
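The postage side of this estimate can be worked through in a few lines. Below is a minimal Python sketch under stated assumptions. The slide does not give the number of mailings per seminar, so `mailings_per_seminar` is a hypothetical parameter. The €200 per seminar figure is left out because the slide does not say what it measures.

```python
def required_response_rate(base_rate, fraction_sent):
    """Response rate needed on the reduced mailing to keep the same
    number of responders as the full mailing."""
    if not 0 < fraction_sent <= 1:
        raise ValueError("fraction_sent must be in (0, 1]")
    return base_rate / fraction_sent


def postage_saved_per_year(seminars_per_year, mailings_per_seminar,
                           stamp_cost, fraction_sent):
    """Yearly postage saved by mailing only fraction_sent of the prospects."""
    if not 0 < fraction_sent <= 1:
        raise ValueError("fraction_sent must be in (0, 1]")
    full_cost = seminars_per_year * mailings_per_seminar * stamp_cost
    return full_cost * (1 - fraction_sent)


if __name__ == "__main__":
    # From the slide: 200 seminars per year, EUR 0.41 per stamp, half the mailings.
    # Hypothetical: 1,000 mailings per seminar (not stated on the slide).
    rate = required_response_rate(0.001, 0.5)
    saved = postage_saved_per_year(200, 1000, 0.41, 0.5)
    print(f"required response rate: {rate:.2%}")
    print(f"postage saved per year: EUR {saved:,.2f}")
```

Halving the mailing while keeping the same number of responders is what doubles the response rate from 0.1% to 0.2%. The saving itself scales linearly with the assumed mailing volume.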

  21. Approach & Cost • Fixed price: €5,000 • Decision: No!?!

  22. Laws of Data Mining

  23. Eight Laws (I) • Business/domain objectives are the origin of every data mining solution • Business/domain knowledge is central to every step of the data mining process • Data preparation is more than half of every data mining process • The right model for a given application can only be discovered by experiment

  24. Eight Laws (II) • There are always patterns • Data mining amplifies perception in the domain • The value of data mining results is not determined by the accuracy or stability of predictive models • All patterns are subject to change

  25. The Right Expectation • Data Mining is unlikely to produce surprising results that will utterly transform a business. Rather: • Early results: insights about data and scientific confirmation of human experience/intuition • Beyond: steady improvement to an already successful organization • Occasionally: discovery of one rare/highly valuable piece of knowledge

  26. The Right Organization • Data Mining is not sophisticated enough to be substituted for domain knowledge or for experience in analysis and model building. • Rather: • Data Mining is a joint venture • “… put teams together that have a variety of skills (e.g., statistics, business and IT skills), are creative and are close to the business thinking.”

  27. Key Success Factors • Have a clearly articulated business problem that needs to be solved and for which Data Mining is the adequate technology • Ensure that the problem being pursued is supported by the right type of data of sufficient quality and in sufficient quantity • Recognize that Data Mining is a process with many components and dependencies • Plan to learn from the Data Mining process whatever the outcome

  28. Essential Tips

  29. Tips (I) • Don’t wait to get started – the competition is only a mouse click away • Begin with the end in mind • It’s the decision maker, stupid! • Unless there’s a method, there’s madness • Better data means better results

  30. Tips (II) • Twyman’s law: any statistic that appears interesting is almost certainly a mistake (double-check all findings) • Avoid the OLAP trap • Deployment is the key to data mining ROI • Champions train so they can win the race

  31. Crawl, Walk, Run • Exploratory Workshop / Brainstorm • Identify potential profitable applications • Data Audit • Assess data quality and relevance • Identify shortcomings • Suggest ways to enrich data (internal and external) • Domain-relevant Case Studies (start small) • Refine list of applications to produce well-defined, actionable, domain-relevant case studies • Select 1 or more case studies as « pilots » • Scale-up
