
DATA MINING






Presentation Transcript


  1. DATA MINING
     Team #1: Kristen Durst, Mark Gillespie, Banan Mandura
     University of Dayton, MBA 664
     13 APR 09

  2. Data Mining: Outline
     • Introduction
     • Applications / Issues
     • Products
     • Process
     • Techniques
     • Example
     MBA 664, Team #1

  3. Introduction
     • Data mining definition
        • Analysis of large amounts of digital data
        • Identifies unknown patterns and relationships
        • Draws conclusions and predicts future outcomes
     • Drivers of data mining growth
        • Increasing computer processing speed
        • Decreasing cost of data storage

  4. Introduction
     • High-level process
        • Summarize the data
        • Generate a predictive model
        • Verify the model
     • The analyst must understand
        • The business
        • The data and its origins
        • The analysis methods and results
        • The value provided

  5. Applications / Issues
     • Applications
        • Telecommunications: cell phone contract turnover (churn)
        • Credit cards: fraud identification
        • Finance: corporate performance
        • Retail: targeting products to customers
     • Legal and ethical issues
        • Aggregating data to track individual behavior

  6. Data Mining Products
     • Angoss Software (www.angoss.com)
        • Knowledge Seeker/Studio
        • Strategy Builder
     • Infor Global Solutions (www.infor.com)
        • Infor CRM Epiphany
     • Portrait Software (www.portraitsoftware.com)
     • SAS Institute (www.sas.com)
        • SAS Enterprise Miner
        • SAS Analytics
     • SPSS Inc. (www.spss.com)
        • Clementine

  7. Angoss Knowledge Studio

  8. SAS Institute

  9. SPSS Inc.

  10. Data Mining Process
     • No uniformly accepted practice
     • 2002 www.KDnuggets.com survey
     • SPSS: CRISP-DM
     • SAS: SEMMA

  11. Data Mining Process
     • SPSS CRISP-DM
        • CRoss-Industry Standard Process for Data Mining
        • Consortium: DaimlerChrysler, SPSS, NCR
        • Hierarchical process; cyclical and iterative

  12. Data Mining Process
     • CRISP-DM

  13. Data Mining Process
     • SAS SEMMA
        • Focus is model development
        • User defines the problem and conditions the data outside SEMMA
        • Sample: take a statistically representative portion of the data
        • Explore: view, plot, and subgroup the data
        • Modify: select, transform, and update the data
        • Model: fit the data using any technique
        • Assess: evaluate the model for usefulness

  14. Data Mining Process
     • Common steps in any DM process
        1. Problem definition
        2. Data collection
        3. Data review
        4. Data conditioning
        5. Model building
        6. Model evaluation
        7. Documentation / deployment

  15. Data Mining Techniques
     • Statistical methods (sample statistics, linear regression)
     • Nearest neighbor prediction
     • Neural networks
     • Clustering / segmenting
     • Decision trees

  16. Statistical Methods
     • Sample statistics
        • Quick look at the data
        • Examples: minimum, maximum, mean, median, variance
     • Linear regression
        • Easy to apply and works well for simple problems
        • More complex problems may need a model built with a different method
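The sample statistics listed above take only a few lines to compute. This is a minimal sketch using Python's standard `statistics` module. The function name `quick_look` and the purchase amounts are hypothetical and are for illustration only.

```python
import statistics


def quick_look(values):
    """Summarize one numeric column with the sample statistics named on the slide."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # statistics.variance is the sample variance (divides by n - 1).
        "variance": statistics.variance(values),
    }


# Hypothetical purchase amounts, for illustration only.
purchases = [120.0, 85.5, 240.0, 60.0, 150.0]
summary = quick_look(purchases)
print(summary)
```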

  17. Example: Linear Regression
     [Figure: relationship between Customer Income and Total Purchase Amount]
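A regression like the one in this example can be fitted with ordinary least squares. The sketch below assumes a single predictor (customer income) and a straight-line model. The income and purchase figures are hypothetical, since the slide's underlying data is not available.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: customer income (thousands) and total purchase amount.
income = [30, 45, 60, 75, 90]
purchase = [210, 290, 380, 450, 540]
intercept, slope = fit_line(income, purchase)
print(f"purchase ~ {intercept:.2f} + {slope:.3f} * income")
prediction = intercept + slope * 100  # predicted purchase for an income of 100
```

If the points curve away from the fitted line, that is the situation the previous slide describes, where a more complex model built with a different method may be needed.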

  18. Nearest Neighbor Prediction
     • Easy to understand
     • Used for prediction
     • Works best with few predictor variables
     • Based on the idea that a record will behave the same way as the records “near” it
     • Can also show the level of confidence in a prediction

  19. Example: Nearest Neighbor
     [Figure: Product Sales by Population of City and Distance from Competitor. Points are labeled by sales class (A: > 200 units; B: 100–200 units; C: < 100 units); one point, U, is unlabeled.]
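Nearest neighbor prediction of the kind pictured here can be sketched as a k-nearest-neighbor vote. This sketch reuses the slide's A/B/C sales classes, but the point coordinates are hypothetical. It also assumes:

- both predictors are pre-scaled to 0-1, because Euclidean distance is sensitive to units;
- confidence is reported as the share of neighbors that agree, which is one common choice rather than the presenters' stated method.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a class for `query` by majority vote of its k nearest neighbors.

    `train` is a list of ((feature1, feature2, ...), label) pairs.
    Returns (label, confidence), where confidence is the share of the
    k neighbors that voted for the winning label.
    """
    # Sort training records by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k


# Hypothetical stores: (population of city, distance from competitor),
# both pre-scaled to 0-1, labeled with the slide's sales classes A/B/C.
stores = [
    ((0.90, 0.80), "A"), ((0.80, 0.90), "A"), ((0.85, 0.70), "A"),
    ((0.50, 0.50), "B"), ((0.55, 0.40), "B"),
    ((0.20, 0.10), "C"), ((0.10, 0.20), "C"),
]
print(knn_predict(stores, (0.80, 0.75)))  # all 3 neighbors are A
print(knn_predict(stores, (0.60, 0.55)))  # 2 of 3 neighbors are B
```

A lower vote share (such as 2 of 3) signals a less certain prediction.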

  20. Neural Network
     • Contains input, hidden, and output layers
     • Used when there are many predictor variables
     • Once confirmed successful, the model can be reused again and again
     • Can be hard to interpret
     • Formatting the data can be extremely time-consuming

  21. Example: Neural Network
     [Figure: inputs Population of City (weight W1 = .36) and Distance from Competitor (weight W2 = .64) feed the output node Product Sales Prediction = 0.736]
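The figure's output node combines weighted inputs. The sketch below shows that weighted sum using the slide's W1 and W2, then adds one hidden layer of sigmoid units, as the previous slide describes. The slide does not show its input values, so the inputs here are hypothetical and do not reproduce 0.736. The hidden-layer and output weights are also hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation, squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_sum(inputs, weights):
    """A single node with no activation: sum of weight * input."""
    return sum(w * x for w, x in zip(weights, inputs))


def forward(inputs, hidden_weights, output_weights):
    """One forward pass: inputs -> sigmoid hidden layer -> weighted-sum output node."""
    hidden = [sigmoid(weighted_sum(inputs, row)) for row in hidden_weights]
    return weighted_sum(hidden, output_weights)


# Slide weights: W1 = .36 (Population of City), W2 = .64 (Distance from Competitor).
SLIDE_WEIGHTS = [0.36, 0.64]
# Hypothetical inputs, each scaled to 0-1.
print(weighted_sum([0.5, 0.9], SLIDE_WEIGHTS))

# Hypothetical network: 2 hidden units (one weight per input) and 2 output weights.
hidden_weights = [[1.5, -0.5], [-1.0, 2.0]]
output_weights = [0.7, 0.3]
print(forward([0.5, 0.9], hidden_weights, output_weights))
```

In practice the weights are learned from training data (for example by backpropagation) rather than set by hand. That training is part of why the slide calls neural networks hard to interpret.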

  22. Clustering/Segmenting
     • Not used for prediction
     • Forms groups whose members are very similar to each other and very different from other groups
     • Gives an overall view of the data
     • Can also help identify potential problems, such as outliers

  23. Example: Clustering/Segmenting
     [Figure: records plotted on Dimension A and Dimension B, grouped by age (< 40 years, >= 40 years); red = female, blue = male]
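The slides do not name a clustering algorithm. The sketch below uses k-means (Lloyd's algorithm), one widely used choice, purely as an illustration. The two-dimensional points stand in for "Dimension A" and "Dimension B" and are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group points into k clusters with Lloyd's k-means algorithm.

    Returns (centroids, clusters). clusters holds the points from the final
    assignment step; they match centroids once the algorithm has converged.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical records: (dimension A, dimension B).
records = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, clusters = kmeans(records, k=2)
print(sorted(centroids))
```

A point far from every centroid is a candidate outlier, which matches the slide's note about spotting potential problems.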

  24. Decision Trees
     • Use categorical variables
     • Determine which variable produces the greatest “split” in the data
     • Easy to interpret
     • Require little data formatting
     • Can be used in many different situations

  25. Example: Decision Trees
     [Figure: decision tree for change from original score. Splits on Baseline (< 3.75 vs. >= 3.75), sex (F/M), and body type (large/small); each node shows a change value and its group size n.]
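"Greatest split" needs a concrete measure. One common choice, used by CART-style trees, is the drop in Gini impurity. This sketch assumes that criterion, categorical predictors, and a hypothetical yes/no outcome; the slide's example instead splits on a numeric change score. The records are invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when every label is the same, larger when labels are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(rows, feature, target):
    """Drop in weighted Gini impurity from splitting rows on each value of `feature`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    parent = gini([row[target] for row in rows])
    children = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - children


def best_split(rows, features, target):
    """Pick the categorical feature that produces the greatest split."""
    return max(features, key=lambda f: split_gain(rows, f, target))


# Hypothetical records; "improved" is a yes/no outcome invented for illustration.
records = [
    {"sex": "F", "body_type": "large", "improved": "no"},
    {"sex": "M", "body_type": "large", "improved": "no"},
    {"sex": "F", "body_type": "large", "improved": "no"},
    {"sex": "M", "body_type": "small", "improved": "yes"},
    {"sex": "F", "body_type": "small", "improved": "yes"},
    {"sex": "M", "body_type": "small", "improved": "yes"},
]
print(best_split(records, ["sex", "body_type"], "improved"))  # body_type
```

A full tree repeats this choice inside each resulting group until the groups are small or pure.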

  26. Data Mining Example
     1. Problem Definition
        • Improve on-time delivery of new products

  27. Data Mining Example
     2. Collect Data
        [Figures: Brainstorm Variation Sources; Data Collection Plan]

  28. Data Mining Example
     3. Data Review
        • Data segments: TOTAL LEAD TIME by Part Type (p < .05)

          Level       N     Mean    StDev
          BRACKET    520    x6.76   x3.14
          DUCT       138    x6.70   x0.40
          MANIFOLD    44    x9.95   x4.68
          TUBE        47    x3.60   x2.79

          Pooled StDev = 68.47
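The table above has the shape of a one-way ANOVA summary: per-level N, mean, and standard deviation, plus a pooled StDev. The sketch below shows how the F statistic and pooled standard deviation follow from those summaries. The slide's values are partly masked, so the numbers here are hypothetical. The "Pooled StDev" line is assumed to be the square root of the within-group mean square, as in common statistics packages.

```python
import math


def anova_from_summary(groups):
    """One-way ANOVA from per-group summaries.

    `groups` is a list of (n, mean, stdev) tuples, one per level.
    Returns (F statistic, pooled stdev, df_between, df_within).
    """
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group variation: how far each level's mean sits from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group variation, rebuilt from each level's sample stdev.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, total_n - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, math.sqrt(ms_within), df_between, df_within


# Hypothetical lead-time summaries: (n, mean, stdev) for two part types.
f_stat, pooled_sd, df_b, df_w = anova_from_summary([(3, 10.0, 2.0), (3, 14.0, 2.0)])
print(f_stat, pooled_sd, df_b, df_w)
```

The p-value comes from the F distribution with (df_between, df_within) degrees of freedom. The standard library has no F distribution, but `scipy.stats.f.sf(f_stat, df_b, df_w)` computes it if SciPy is available.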

  29. Data Mining Example
     5. Build Model

  30. Data Mining Example
     5. Build Model
        • Combined model: two separate regressions (Design and Manufacturing) combined through a common term
        • SHIP-DUE = 7.97 + 0.269*(MODEL_CR-DUE) + 0.173*(CR-ISS) + 0.704*(MAN_BOMC) + 0.748*(SCH_ST-MAN) + 0.862*(MOS_MOFIN)
        • R^2A = 4.4%; R^2A(1) = 76.5%, R^2A(2) = 68.0%
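The combined model is a linear equation, so a prediction is the intercept plus each coefficient times its predictor value. This sketch copies the slide's coefficients and keeps its predictor names, whose meanings and units the slide does not define. The predictor values in the usage line are hypothetical. It also assumes that "R^2A" denotes adjusted R², which penalizes R² for the number of predictors p.

```python
# Coefficients copied from the slide's combined model. The slide does not
# define the predictors or their units; names are kept exactly as shown.
INTERCEPT = 7.97
COEFFICIENTS = {
    "MODEL_CR-DUE": 0.269,
    "CR-ISS": 0.173,
    "MAN_BOMC": 0.704,
    "SCH_ST-MAN": 0.748,
    "MOS_MOFIN": 0.862,
}


def predict_ship_due(predictors):
    """Evaluate SHIP-DUE = intercept + sum(coefficient * predictor value)."""
    return INTERCEPT + sum(coef * predictors[name] for name, coef in COEFFICIENTS.items())


def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical predictor values, for illustration only.
example = {name: 1.0 for name in COEFFICIENTS}
print(round(predict_ship_due(example), 3))
```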

  31. Data Mining Example
     6. Model Evaluation
        • Model accurately reflects the delivery distribution

  32. Data Mining Example
     7. Document / Deploy
        • Design release required for on-time delivery due date

  33. Data Mining Example
     7. Document / Deploy
        • Update planning and automate tracking
        [Figure: Requirements, Plan, Actual]

  34. Data Mining
     • Questions?
