Knowledge discovery process

Chapter 1. Juha Vesanto, Juha.Vesanto@hut.fi

Presentation Transcript

  1. Knowledge discovery process. Chapter 1. Juha Vesanto, Juha.Vesanto@hut.fi

  2. Starting point! Data exploration starts with data. ?

  3. The real starting point! Not "Data exploration starts with data" (?), but "Data exploration starts with identifying a need" (!)

  4. Customer • Participation: problem owners, problem holders • Motivation: useful, profitable

  5. The process (CRISP-DM)

  6. The process (Pyle) • Exploring the problem, exploring the solution, implementation specification: 20% work, 80% importance • Preparation, survey, data modeling: 80% work, 20% importance

  7. The problem • Identify the right problem • Define solvable problem(s) • Transfer the problem understanding to the miner

  8. Example “I really need a model of the Monday and Friday failure rates so we can stop them!” • What is a failure? • How is it detected/measured? • Is it a quality problem or just fluctuation of error rates? • Which problem components need to be looked at? • ...

  9. The solution • What does the solution look like? - a program used by an expert - a data set to be referred to - a model to be used for prediction - a presentation / report - ... • How (and by whom) is the solution implemented?

  10. Data mining • Prepare: both the data and the miner • Survey: understand the data; is the data adequate? • Model: refining the details; depends on the nature of the data and the solution goal

  11. Why preparation? • GIGO (garbage in, garbage out): fix the data • Get a data set which - is of maximum use - preserves the information - is enhanced for the problem & model

  12. PIE: Prepared Information Environment 1. prepare the training/testing data 2. transform prepared values to original 3. apply the same preparation to new data • Diagram: data / new data → PIE-in → model → PIE-out → report
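
The three PIE steps can be sketched in a few lines of Python. This is only an illustration, not the slide author's implementation: the class name `MinMaxPIE`, its methods `pie_in` and `pie_out`, and the choice of min-max scaling as the preparation are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MinMaxPIE:
    """Hypothetical PIE for one numeric variable.

    Min-max scaling is an assumed stand-in for whatever preparation is
    actually used; the slides do not name a specific method.
    """
    low: float
    high: float

    @classmethod
    def fit(cls, training_values: Sequence[float]) -> "MinMaxPIE":
        # Step 1: learn the preparation from the training/testing data only.
        return cls(low=min(training_values), high=max(training_values))

    def pie_in(self, raw: float) -> float:
        # Step 3: apply the same preparation to any value, including new data.
        span = self.high - self.low
        if span == 0:
            return 0.0  # constant variable: nothing to scale
        return (raw - self.low) / span

    def pie_out(self, prepared: float) -> float:
        # Step 2: transform a prepared value back to the original scale.
        return prepared * (self.high - self.low) + self.low


# Made-up values, for illustration only.
training = [10.0, 20.0, 30.0, 50.0]
pie = MinMaxPIE.fit(training)
prepared_training = [pie.pie_in(v) for v in training]

new_value = 40.0
prepared_new = pie.pie_in(new_value)   # prepared with the training-time parameters
restored = pie.pie_out(prepared_new)   # back on the original scale
```

The point of the sketch is that the parameters are learned once from the training data and then reused unchanged for new data, so the model always sees values prepared the same way.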

  13. Why survey? Get a broad idea of the data: • what is covered • what is not covered, or is covered poorly Dangerous areas: • bias in data • sparse data (in a dynamic area) Is the data adequate?
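
One simple way to see what is covered, and what is covered poorly, is to count records per interval of a variable's range. The sketch below assumes equal-width bins and a minimum-count threshold; the function names `coverage_by_bin` and `poorly_covered` and the sample values are hypothetical.

```python
from typing import List, Sequence


def coverage_by_bin(values: Sequence[float], low: float, high: float,
                    bins: int) -> List[int]:
    """Count how many values fall into each of `bins` equal-width
    intervals of [low, high]. Values outside the range are ignored."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # The upper edge `high` is assigned to the last bin.
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts


def poorly_covered(counts: Sequence[int], min_count: int) -> List[int]:
    """Return the indices of bins holding fewer than `min_count` values."""
    return [i for i, c in enumerate(counts) if c < min_count]


# Made-up ages, for illustration only.
ages = [21, 22, 23, 25, 26, 28, 29, 31, 33, 58]
counts = coverage_by_bin(ages, low=20, high=60, bins=4)
sparse = poorly_covered(counts, min_count=2)
```

Here the range 40 to 50 is not covered at all and 50 to 60 holds a single record, so a model's behavior in those areas would rest on little or no data.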

  14. Modeling hype • Universal approximator ⇒ can be applied to any data • Data-driven ⇒ no theoretical knowledge required

  15. Modeling definition Model: “a representation … to show the construction or serve as a copy of something” = makes information understandable or usable

  16. Modeling in data mining Modeling is iterative: 1. Define problem 2. Select tool 3. Collect data 4. Make model 5. Apply 6. Evaluate Traditional statistical methods: first model, then data

  17. Model types • Active or passive • Explanatory or predictive • Static or continuously learning

  18. Ten golden rules 1. Select clear problem with tangible benefit 2. Specify required solution 3. Define how solution is implemented 4. Understand the domain 5. Let the problem drive the modeling 6. Stipulate assumptions 7. Refine the model iteratively 8. Make the model as simple as possible (but no simpler) 9. Find areas of instability 10. Find areas of uncertainty
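
Rules 7 and 8 can be read together as a selection procedure: refine the model over several iterations, then keep the simplest candidate whose error is close enough to the best one seen. The sketch below is one possible reading, not a method given in the slides; the function `simplest_adequate`, the tolerance, and the (complexity, holdout error) pairs are all made up for illustration.

```python
from typing import List, Tuple

# Each candidate is (complexity, holdout_error); both are assumed to be
# measured elsewhere, one pair per refinement iteration.
Candidate = Tuple[int, float]


def simplest_adequate(candidates: List[Candidate],
                      tolerance: float) -> Candidate:
    """Return the least complex candidate whose holdout error is within
    `tolerance` of the best holdout error among all candidates."""
    best_error = min(error for _, error in candidates)
    adequate = [c for c in candidates if c[1] <= best_error + tolerance]
    return min(adequate, key=lambda c: c[0])


# Hypothetical results of four refinement iterations.
iterations = [(1, 0.30), (2, 0.12), (3, 0.11), (4, 0.11)]
chosen = simplest_adequate(iterations, tolerance=0.02)
```

With these numbers the complexity-1 model is too simple (its error is far from the best), while complexity 3 and 4 add detail for almost no gain, so complexity 2 is chosen.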

  19. Critique • Model evaluation is missing • Iteration of planning stage • Domain expert as data miner
