Data Mining : Commercial Applications

趙民德, Institute of Statistical Science, Academia Sinica. 2002/10/28.

Presentation Transcript


  1. Data Mining: Commercial Applications • 趙民德, Institute of Statistical Science, Academia Sinica • 2002/10/28

  2. DM  good data analysis • KDD  DM with commercial objective in mind

  3. Data mining for maximum value is difficult unless a structured plan is followed. • Knowledge Discovery process to get the most out of data mining.

  4. Knowledge Discovery and Data Mining: • The Expectation of Magic (Dorian Pyle, PC AI magazine, Sept/Oct 1998)

  5. Business managers seem to expect magic from applying data mining tools to their data.

  6. The key to appropriate use of data mining lies in a structured methodology to • Find problems, • Define solutions, • Set expectations, and • Deliver results

  7. This process is called Knowledge Discovery.

  8. 10 guiding principles • Select clearly defined problems that will yield tangible benefits • Specify the solution required • Specify how the solution delivered will be used

  9. Understand as much as possible about the problem and the data set (the domain) • Let the problem drive the modeling (i.e., tool selection, data preparation, etc.) • Stipulate assumptions • Iteratively refine the model • Make the model as simple as possible, but no simpler

  10. Define instability in the model: areas where a small change in input produces a drastically different output • Define uncertainty in the model: critical areas and ranges in the data set where the model produces low-confidence predictions or insights.
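A minimal sketch of how instability might be located in practice, assuming a model with a single numeric input. The function `find_unstable_points`, the `toy_model`, and the `step` and `max_slope` thresholds are all hypothetical and are not taken from the presentation.

```python
from typing import Callable, List, Tuple


def find_unstable_points(
    model: Callable[[float], float],
    inputs: List[float],
    step: float = 0.01,
    max_slope: float = 10.0,
) -> List[Tuple[float, float]]:
    """Return (input, slope) pairs where a small input change moves the output sharply."""
    unstable = []
    for x in inputs:
        # Finite-difference estimate of how fast the output changes near x.
        slope = abs(model(x + step) - model(x)) / step
        if slope > max_slope:
            unstable.append((x, slope))
    return unstable


def toy_model(x: float) -> float:
    """Hypothetical model with an abrupt jump just above x = 0.5."""
    return 1.0 if x >= 0.505 else 0.0


# Probe the input range 0.00 .. 0.99 and report where the model is unstable.
grid = [i / 100 for i in range(100)]
flagged = find_unstable_points(toy_model, grid)
print(flagged)
```

Only the grid point next to the jump is flagged. A real model with many inputs would need one such probe per input, or a sampled perturbation of all inputs at once.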

  11. Mining the Data (three parts) • Preparing the data • Surveying the data • Modeling the data.

  12. Briefly, problem exploration involves the discovery of appropriate problems using interviewing and problem elicitation techniques. • Decision support tools, including pair-wise rankings and ambiguity resolution, help build a problem matrix. • The problems are ranked for the benefit each will return based on various factors of importance to the problem owner.
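One way the pair-wise ranking step could be sketched: each pair of candidate problems is compared once, and problems are ordered by the number of comparisons they win. The function name, the example problems, and the `benefit` scores are hypothetical; in a real project the preference would come from interviews with the problem owner rather than from a lookup table.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, List, Tuple


def rank_by_pairwise_wins(
    problems: List[str],
    prefer: Callable[[str, str], str],
) -> List[Tuple[str, int]]:
    """Rank problems by how many head-to-head comparisons each one wins."""
    wins = Counter({p: 0 for p in problems})
    for a, b in combinations(problems, 2):
        # `prefer` returns whichever of the two problems the owner values more.
        wins[prefer(a, b)] += 1
    # Most wins first; ties are broken alphabetically so the order is stable.
    return sorted(wins.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical benefit scores standing in for the problem owner's judgment.
benefit = {"reduce scrap rate": 8, "cut inventory cost": 5, "shorten test cycle": 3}

ranking = rank_by_pairwise_wins(
    list(benefit),
    lambda a, b: a if benefit[a] >= benefit[b] else b,
)
print(ranking)
```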

  13. Solution exploration finds the most effective solutions for each problem: • ranking alternatives if necessary.

  14. Implementation addresses such issues as: what is to be delivered, who will use the solution, how it will be used, what training is required to use it, how long it will remain effective, how to monitor continued effectiveness.

  15. Data preparation takes at least 60% of the project’s time. • Implementation specification is key to the project’s success

  16. Projects that were very successful technically can fail because the results were never implemented in practice. • Without the will, resources, and commitment to put the solution in place, Knowledge Discovery will yield no return at all!

  17. “Let me give you my data, tell me what you find.” Familiar words? • This is the expectation of magic.

  18. The outcome of a data mining project consists of a model which does one of two things: The model will be • Explanatory, or • Predictive.

  19. Explanatory (inferential) models explain the relationships that exist in data. They may • indicate the driving factors for stock market movements, or • show failure factors in printed circuit board production. Regardless of purpose, these models help explain relationships.

  20. Predictive models may or may not explain relationships. Primarily, they make predictions of output conditions given a set of input conditions

  21. In many direct mail solicitation campaigns, the marketing manager did not ask what factor motivated people to respond to the solicitation. • Instead, the focus of the model was simply to increase response. • If it worked reliably and robustly, fine. If not, it was of no value.

  22. Whether explanatory or predictive, the data mining model must provide actionable information. This is critical to the project’s success.

  23. The purpose of the project is to provide information that will allow better decision-making. • Therefore, data mining is a tool in the decision support arsenal, a formidably potent tool when properly used.

  24. Knowledge Discovery, as a process, makes sure the goals of mining data align with the user’s needs. The results will directly and unambiguously bear on the domain of the decision to be made.

  25. Knowledge Discovery aligns the objectives of the modeler with the problem domain to search for optimal return for the effort invested.

  26. Instead of “let me give you my data”, Knowledge Discovery leads to “let’s discuss the problem and see what can be done”. No magic here. This is a structured search of alternatives and options.

  27. Each stage requires a commitment from separate groups of people inside a business or organization. At each stage, the various parties work through the issues, make choices at each point, and come to fully understand the issues and expectations.

  28. Example 1 • A Fortune 500 pharmaceutical and bio-chemical company heard of data mining and wanted to explore what it could do for them.

  29. Some of the managers had read of the wonderful things that data mining could do by just looking at their data. • Instead of accepting copious amounts of data, the first step was to explain the benefits of Knowledge Discovery.

  30. The initial exploration, which included Problem and Solution Exploration, took two weeks. • When completed, more than 250 problems were clearly defined for areas including personnel, manufacturing, inventory control, and testing.

  31. Managers in each department worked through defining appropriate problems and defining solutions. • Their involvement was crucial to finding appropriate problems, the solutions to which would yield real business value.

  32. Senior managers and bio-chemists were presented with the results, an analysis that defined where the resources were located and which projects were to proceed. This led to the Implementation phase.

  33. Note the level of involvement of the key actors. Because they worked through the problem, understood realistically what might be done with each problem, and evaluated the issue of implementing the solution, this project was a success.

  34. Example 2 • A major telecommunications company’s marketing department wanted data mining to solve their churn problem.

  35. When presented with the Knowledge Discovery approach, they dismissed it as irrelevant in this case. Their problem was churn: well defined, well understood. • Build a model to predict churn, and all would be well.

  36. The data was dirty and polluted, but with the help of advanced data preparation techniques, a reliable and robust model was constructed which was 83% accurate at predicting churn customers.

  37. The best previous techniques had achieved about 59% accuracy. The model provided a roughly 40% relative improvement in predictive power.
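The 40% figure is a relative gain, not a gain in percentage points. A short check of the arithmetic, using only the two accuracy figures quoted in the slides (the variable names are illustrative):

```python
previous_accuracy = 0.59  # best previous techniques (slide 37)
new_accuracy = 0.83       # data mining model (slide 36)

# Absolute gain is measured in percentage points.
absolute_gain = new_accuracy - previous_accuracy
# Relative gain compares the improvement against the previous baseline.
relative_gain = absolute_gain / previous_accuracy

print(f"absolute gain: {absolute_gain * 100:.0f} percentage points")  # 24
print(f"relative gain: {relative_gain * 100:.1f}%")                   # 40.7%
```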

  38. Marketing then spent a six-figure sum attempting to avert churn, to no avail!

  39. Predicting churn was not the problem. The problem with churn, perhaps, would have been better addressed by building a demographic or sociographic model of the causes of churn, and addressing those causes. • That, however, did not occur.

  40. They were persuaded to try again using the Knowledge Discovery process. It turned out that for this company the most valuable feature was “Customer Lifetime Value”. Identifying and focusing on the factors that drive this feature yielded significant benefit.

  41. Solving the right problem is more important than simply building a good model. • The Knowledge Discovery process does exactly that.

  42. Three Components of DM • Data Preparation • Data Surveying • The Data Model

  43. Data Preparation is the most important part of mining. • Sometimes the data is available in a data warehouse. This is helpful, but not sufficient. • Data preparation for data mining is a different activity than preparing data for warehousing. • CRISP

  44. Data mining requires fixing the problems of missing and empty variables, monotonic variables, categorical ordering, and many other problems not dealt with in data warehousing.
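A minimal sketch of how some of these problems might be flagged before modeling, assuming each variable is held as a list with `None` for missing values. The function `profile_column`, the example table, and the reading of "monotonic" as "strictly increasing in record order" (as with a record counter) are assumptions made for illustration.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def profile_column(values: Column) -> Dict[str, object]:
    """Flag basic preparation problems in one numeric column."""
    present = [v for v in values if v is not None]
    return {
        # Share of records with no value for this variable.
        "missing_rate": 1 - len(present) / len(values) if values else 1.0,
        # Empty variable: no record has a value at all.
        "is_empty": len(present) == 0,
        # Constant variable: carries no information for a model.
        "is_constant": len(set(present)) == 1,
        # Monotonic variable: strictly increasing in record order.
        "is_monotonic": len(present) > 2
        and all(a < b for a, b in zip(present, present[1:])),
    }


# Hypothetical extract from a warehouse table.
table: Dict[str, Column] = {
    "record_id": [1001, 1002, 1003, 1004],
    "monthly_spend": [42.0, None, 17.5, 42.0],
    "fax_number": [None, None, None, None],
    "region_code": [7, 7, 7, 7],
}

report = {name: profile_column(col) for name, col in table.items()}
for name, flags in report.items():
    print(name, flags)
```

Flagging is only the first step. What to do with a flagged variable (drop it, impute it, transform it) depends on the problem the model is meant to solve.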

  45. In one extreme example, data from a warehouse not prepared for mining was modeled and produced a model that was 6% effective at predicting the required feature. This data had many problems, but after suitable preparation a reliable and robust model that was nearly 60% effective was produced.

  46. Data Surveying involves a look at the shape of the whole data set, by building a map of the territory before expending the time and effort required to create models. The survey addresses the question “Is the answer in here anyway?”
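One possible way to put a number on “Is the answer in here anyway?” is to measure how much information a candidate input carries about the target. The sketch below uses mutual information between two categorical variables. This is an illustrative choice, not necessarily the survey technique the presentation has in mind, and the variable names and sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy of a label sequence, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def mutual_information(inputs: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Bits of information about `target` carried by `inputs`."""
    joint = list(zip(inputs, target))
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return entropy(inputs) + entropy(target) - entropy(joint)


# Hypothetical sample: does either variable say anything about churn?
churned = [1, 1, 0, 0]
plan = ["basic", "basic", "premium", "premium"]  # lines up with churn
coin = ["h", "t", "h", "t"]                      # unrelated to churn

print(mutual_information(plan, churned))
print(mutual_information(coin, churned))
```

An input that scores near zero against the target, as `coin` does here, suggests the answer may not be in that part of the data. With only four records the estimate means little, so a real survey would need far more data.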

  47. The Data Model is the small-scale map of some very particular part of the territory. The nature of the data and the purpose of the model will determine which tools are appropriate. • Building the model is the piece that is typically thought of as data mining: the application of automated tools to data.

  48. While important, building the model is just a piece of the whole Knowledge Discovery process.

  49. Data mining, the practice of applying automated pattern detection software tools to data, is not carried out in isolation from the rest of the world. • A commercial data mining project will not be successful if it is not driven by business needs. Discovering appropriate business problems, defining solutions to those problems, using appropriate data, and building useful models requires an integrated process.

  50. The Knowledge Discovery process provides the necessary framework to ensure a successful outcome, if one is possible.
