
Some working definitions….



  1. Some working definitions…
     • ‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably
     • Data mining = the discovery of interesting, meaningful and actionable patterns hidden in large amounts of data
     • A multidisciplinary field originating from artificial intelligence, pattern recognition, statistics, machine learning, econometrics, …

  2. Data mining is a process…
     • Business objectives
     • Model development
        • Model objective
        • Data collection & preparation
        • Model construction
        • Model evaluation
     • Combining models with business knowledge into decision logic
     • Model / decision logic deployment
     • Model / decision logic monitoring

  3. Data mining is a process… a marketing example
     • Business objectives
        • Cross-sell an MMS bundle to lapsed users and non-users
     • Model development
        • Model objective
           • For consumers with no MMS bundle in the past 6 months, predict MMS bundle ownership (yes/no) in the next three months
        • Data collection & preparation
           • All fields for all active customers as of end APR05; remove all customers with an MMS bundle in NOV04-APR05; left join the MMS Bundle field from MAY05, JUNE05 and JULY05
        • Model construction
           • Build various models to predict MMS Bundle MAY or JUNE or JULY = ‘N’ on 70% of the data
        • Model evaluation
           • Evaluate predictive power on the 70% used for model development and on the 30% test set
     • Combining models with business knowledge into decision logic
        • Target the top 30% and randomly test two propositions (50 MMS for 5 Euro; 100 MMS for 7.50 Euro) across two channels (direct mail and SMS)
     • Model / decision logic deployment
        • Run the campaign
     • Model / decision logic monitoring
        • Compare predictions against actual responses to evaluate model quality and robustness
        • Determine which propositions / channels work best
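The preparation, 70/30 split and targeting steps of this example can be sketched in Python. This is a minimal illustration under stated assumptions, not the original campaign code: customers are plain dicts with hypothetical fields `id`, `mms_history` (six monthly bundle flags for NOV04-APR05) and `mms_future` (flags for MAY05, JUNE05, JULY05), the target follows the model objective (ownership yes/no), and a stand-in score replaces the trained model.

```python
import random

def prepare(customers):
    """Keep customers with no MMS bundle in the past 6 months and attach the
    target: bundle ownership in any of the next three months."""
    rows = []
    for c in customers:
        if any(c["mms_history"]):  # had a bundle in NOV04-APR05: exclude
            continue
        # Equivalent of left-joining the MAY05-JULY05 bundle fields.
        rows.append({**c, "target": any(c["mms_future"])})
    return rows

def split_70_30(rows, seed=0):
    """Random split: 70% for model development, 30% held out as test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def assign_test_cells(rows, score, top_fraction=0.3, seed=0):
    """Rank by model score, keep the top 30% and randomly assign each targeted
    customer one proposition and one channel (a 2 x 2 test design)."""
    ranked = sorted(rows, key=score, reverse=True)
    targets = ranked[: int(top_fraction * len(ranked))]
    propositions = ["50 MMS for 5 Euro", "100 MMS for 7.50 Euro"]
    channels = ["direct mail", "SMS"]
    rng = random.Random(seed)
    return {c["id"]: (rng.choice(propositions), rng.choice(channels)) for c in targets}

# Hypothetical data: 20 customers; customer 0 had a bundle in APR05.
customers = [
    {"id": i, "mms_history": [False] * 6, "mms_future": [i % 4 == 0, False, False]}
    for i in range(20)
]
customers[0]["mms_history"][-1] = True

rows = prepare(customers)          # 19 eligible customers
train, test = split_70_30(rows)    # 13 for development, 6 for testing
# Stand-in score (hypothetical): a real campaign would use the trained model here.
cells = assign_test_cells(rows, score=lambda c: c["id"])
```

Randomising both proposition and channel within the targeted group is what later allows the monitoring step to compare which combinations work best.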

  4. Data mining tasks
     • Undirected, explorative, descriptive, ‘unsupervised’ data mining
        • Matching & search
        • Profile & rule extraction
        • Clustering & segmentation; dimension reduction
     • Directed, predictive, ‘supervised’ data mining
        • Predictive modeling

  5. Data mining task example: Clustering & segmentation

  6. Data mining task example: Clustering & segmentation
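As an illustration of the clustering & segmentation task, here is a minimal k-means sketch in plain Python. The slides do not say which clustering algorithm the example uses; k-means is swapped in here as one common technique, and the customer features (age, monthly spend) are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: repeatedly assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
    return centroids, labels

# Hypothetical customers described by (age, monthly spend).
points = [(22, 15), (25, 18), (23, 12), (61, 40), (58, 45), (65, 42)]
centroids, labels = kmeans(points, k=2)
```

The resulting labels define two segments; in practice the features would usually be rescaled first so that one attribute does not dominate the distance.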

  7. Looking Glass: start (source: Sentient Information Systems, www.sentient.nl)

  8. Looking Glass: intermediate result (source: Sentient Information Systems, www.sentient.nl)

  9. Looking Glass: result (source: Sentient Information Systems, www.sentient.nl)

  10. Looking Glass: result (source: Sentient Information Systems, www.sentient.nl)

  11. Data mining task example: predictive modeling
     [Figure: a model is built from past experience (behaviour data of cases labelled good or bad) and scores new cases on a scale from 10 (better business) down to 1 (worse business); Case A receives a score of 7, Case B a score of 4.]

  12. Data mining task example: predictive modeling
     [Figure: collected data]

  13. Data mining task example: predictive modeling
     • score = (0 × Income) + (-1 × Age) + (25 × Children)
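The scoring formula can be applied directly. A short sketch with two hypothetical customers shows what the weights mean: income (weight 0) has no effect, each year of age lowers the score by 1, and each child raises it by 25.

```python
def score(income, age, children):
    """Linear score from the slide: weights 0 (income), -1 (age), 25 (children)."""
    return 0 * income + (-1) * age + 25 * children

# Hypothetical customers: (income, age, children)
younger_with_children = score(40000, 35, 2)  # 0 - 35 + 50 = 15
older_without_children = score(90000, 60, 0)  # 0 - 60 + 0 = -60
```

Despite the higher income, the second customer scores lower, because the model gives income no weight.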

  14. Data mining techniques for predictive modeling
     • Linear and logistic regression
     • Decision trees
     • Neural networks
     • Nearest neighbor
     • Genetic algorithms
     • …

  15. Linear Regression Models
     • score = (0 × Income) + (-1 × Age) + (25 × Children)
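The slides do not show how the weights are obtained; ordinary least squares is the standard way to fit a linear regression. The sketch below solves the normal equations with Gauss-Jordan elimination in plain Python. The data are hypothetical (income in thousands) and the targets are generated from the slide's formula, so a correct fit should recover an intercept of 0 and weights 0, -1 and 25.

```python
def fit_linear(X, y):
    """Ordinary least squares: solve (X^T X) w = X^T y.
    A constant 1.0 is prepended to each row so w[0] is the intercept."""
    rows = [[1.0] + [float(v) for v in x] for x in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Hypothetical customers: (income in thousands, age, children)
X = [(30, 25, 0), (45, 40, 2), (60, 35, 1), (80, 50, 3), (52, 29, 1), (38, 61, 0)]
y = [0 * inc - age + 25 * ch for inc, age, ch in X]  # targets from the slide's formula
w = fit_linear(X, y)  # [intercept, w_income, w_age, w_children]
```

Solving the normal equations is fine for a small, well-conditioned example like this; for real data a library least-squares solver (for example NumPy's `numpy.linalg.lstsq`) is the more robust choice.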

  16. Regression in pattern space
     • Only a single line is available in pattern space to separate the classes
     [Figure: classes ‘square’ and ‘circle’ plotted against income and age]

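  17. Decision Trees
     [Figure: example tree. Root: 20000 customers, response 1%. First question: Income > 150000? (yes: 1200 customers; no: 18800 customers). The 1200 customers are split on balance > 50000? into groups of 400 and 800 customers; the 18800 customers are split on Purchases > 10?, and so on. Leaf response rates shown include 0.1% and 1.8%.]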
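A question such as "Income > 150000?" has to be chosen by the learning algorithm. The slides do not say which criterion is used; the sketch below assumes Gini impurity, a common choice, and searches one attribute for the threshold whose yes/no groups are purest. Incomes (in thousands) and responses are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try a threshold between each pair of consecutive distinct values and return
    (threshold, weighted impurity) of the best 'value > threshold?' question."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # no split at all
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

incomes = [20, 35, 50, 160, 180, 200]
responded = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(incomes, responded)  # "Income > 105?" separates perfectly
```

Because every question compares a single attribute with a threshold, each split corresponds to a line perpendicular to that attribute's axis in pattern space. A full tree applies this search to every attribute and then recurses into each resulting group.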

  18. Decision Trees in Pattern Space
     • Decision boundaries are line pieces perpendicular to the axes
     • Each line is a split in the tree: the two answers to one question
     [Figure axes: income, age]

  19. Decision Trees in Pattern Space
     • The goal of the classifier is to separate the classes (circle, square) on the basis of the attributes age and income
     • Each line corresponds to a split in the tree
     • Decision areas are ‘tiles’ in pattern space
     [Figure axes: weight, age]

  20. Nearest Neighbor
     • The data itself is the classification model, so there is no abstraction such as a tree
     • For a given instance x, search for the k instances that are most similar to x
     • Classify x as the most frequent class among those k most similar instances
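The nearest-neighbor rule can be sketched in a few lines of Python. This sketch assumes Euclidean distance as the similarity measure (the slide does not name one), and the (age, weight) training instances are hypothetical. Note that nothing is "trained": the stored data is the model.

```python
import math
from collections import Counter

def knn_classify(training, x, k=3):
    """training: list of (features, label) pairs.
    Return the most frequent label among the k instances closest to x."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical instances: (age, weight) -> class
training = [
    ((25, 60), "circle"), ((30, 65), "circle"), ((28, 70), "circle"),
    ((60, 95), "square"), ((65, 90), "square"), ((58, 100), "square"),
]
young_light = knn_classify(training, (27, 66))
older_heavy = knn_classify(training, (62, 92))
```

Choosing an odd k avoids ties between two classes. Because distances mix attributes directly, attributes are usually rescaled to comparable ranges before use.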

  21. Nearest Neighbor in Pattern Space
     • Classification of a new instance
     • Any decision area is possible
     • Condition: enough data must be available
     [Figure axes: e.g. weight, e.g. age]

  22. Nearest Neighbor in Pattern Space
     • Prediction
     • Any decision area is possible
     • Condition: enough data must be available
     [Figure axes: e.g. weight, e.g. age]

  23. Example classification algorithm 3: Neural Networks
     • Inspired by neuronal computation in the brain (McCulloch & Pitts, 1943 (!))
     • Input (attributes) is coded as activation of the input-layer neurons; activation feeds forward through a network of weighted links between neurons and causes activations on the output neurons (for instance, diabetic yes/no)
     • The algorithm learns to find the optimal weights using the training instances and a general learning rule

  24. Neural Networks
     • Example: simple network (2 layers)
     • Probability of being diabetic = f(age × weight_age + body_mass_index × weight_body_mass_index)
     [Figure: input neurons age and body_mass_index, linked by weights weight_age and weight_body_mass_index to one output neuron giving the probability of being diabetic]
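The two-layer network above, together with one step of a learning rule, can be sketched as follows. Assumptions: the slide leaves f unspecified, so the logistic sigmoid is used here; the learning rule is one gradient step on the cross-entropy loss, for which the gradient with respect to each weight is (prediction minus target) times the input. Starting weights, learning rate and the training instance are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, assumed here as the activation f."""
    return 1.0 / (1.0 + math.exp(-z))

def p_diabetic(age, bmi, w_age, w_bmi):
    """Output neuron: f(age * w_age + bmi * w_bmi), no bias term, as on the slide."""
    return sigmoid(age * w_age + bmi * w_bmi)

def update(age, bmi, diabetic, w_age, w_bmi, lr=0.001):
    """One gradient step for a single training instance:
    each weight moves by -lr * (prediction - target) * its input."""
    error = p_diabetic(age, bmi, w_age, w_bmi) - (1.0 if diabetic else 0.0)
    return w_age - lr * error * age, w_bmi - lr * error * bmi

# Hypothetical training instance: age 55, BMI 31, diabetic.
w_age, w_bmi = 0.0, 0.0
before = p_diabetic(55, 31, w_age, w_bmi)  # sigmoid(0) = 0.5
w_age, w_bmi = update(55, 31, True, w_age, w_bmi)
after = p_diabetic(55, 31, w_age, w_bmi)   # moves towards 1 after the update
```

Training repeats this update over all instances for many passes; in practice the inputs are usually rescaled and a bias term is added.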

  25. Neural Networks in Pattern Space
     • Classification
     • Simple network: only a line is available (why?) to separate the classes
     • Multilayer network: any classification boundary is possible
     [Figure axes: e.g. weight, e.g. age]
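One answer to the "(why?)": a single output neuron computes f(w1·x1 + w2·x2 + b), and its decision boundary w1·x1 + w2·x2 + b = 0 is a straight line. The classic XOR pattern (class 1 exactly when one input is on) cannot be separated by any line, but a network with a hidden layer can separate it. The sketch below uses threshold neurons with weights chosen by hand (not learned) and also checks a grid of single-neuron weights to illustrate that none of them works.

```python
import itertools

def step(z):
    """Threshold activation: fire (1) when the weighted input exceeds 0."""
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    """Hand-wired multilayer network: hidden neuron h1 fires for 'x1 OR x2',
    h2 for 'x1 AND x2'; the output fires for 'h1 AND NOT h2'."""
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    return step(h1 - h2 - 0.5)

inputs = list(itertools.product([0, 1], repeat=2))
outputs = [xor_network(a, b) for a, b in inputs]  # [0, 1, 1, 0]

# Try every single neuron (one line) with weights and bias on a grid from -3 to 3.
grid = [v / 2 for v in range(-6, 7)]
single_line_found = any(
    all(step(w1 * a + w2 * b + c) == (a ^ b) for a, b in inputs)
    for w1, w2, c in itertools.product(grid, repeat=3)
)
```

The grid search only illustrates the point; that no line at all separates XOR follows from the geometry, since (0,1) and (1,0) lie on one side of any separating line and (0,0) and (1,1) on the other, which is impossible for a straight line.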

  26. Dilbert’s Perspective on Data Mining
