
Chapter 2: Basics of Business Analytics


semah

Presentation Transcript


  1. 5 Chapter 2: Basics of Business Analytics

  2. 5 Chapter 2: Basics of Business Analytics

  3. Objectives • Name two major types of data mining analyses. • List techniques for supervised and unsupervised analyses.

  4. Analytical Methodology • A methodology clarifies the purpose and implementation of analytics. • Steps shown in the diagram: Define/refine business objective • Select data • Explore input data • Prepare and repair data • Transform input data • Apply analysis • Deploy models • Assess results

  5. Business Analytics and Data Mining • Data mining is a key part of effective business analytics. • Components of data mining: • data management • customer segmentation • predictive modeling • forecasting • standard and nonstandard statistical modeling practices

  6. What Is Data Mining? • Information Technology • Complicated database queries • Machine Learning • Inductive learning from examples • Statistics • What we were taught not to do

  7. Translation for This Course • Predictive Modeling • Supervised classification • Linear regression • Logistic regression • Decision trees • Other techniques • Segmentation • Unsupervised classification • Cluster Analysis • Association Rules • Other techniques

  8. Customer Segmentation • Segmentation is a vague term with many meanings. • Segments can be based on the following: • A Priori Judgment • Alike based on business rules, not based on data analysis • Unsupervised Classification • Alike with respect to several attributes • Supervised Classification • Alike with respect to a target, defined by a set of inputs

  9. Segmentation: Unsupervised Classification • Training data before clustering: case 1: inputs, ?; case 2: inputs, ?; case 3: inputs, ?; case 4: inputs, ?; case 5: inputs, ? • Training data after clustering: case 1: inputs, cluster 1; case 2: inputs, cluster 3; case 3: inputs, cluster 2; case 4: inputs, cluster 1; case 5: inputs, cluster 2 • new case
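
The before/after picture on this slide can be sketched with a small k-means routine. This is only an illustration, not the course's implementation: the five input rows are made up, and the centroids are seeded with the first k cases (a simplification chosen to keep the run deterministic).

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, k, iterations=20):
    """Plain k-means on a list of numeric tuples; returns (centroids, labels)."""
    centroids = list(points[:k])  # assumption: seed with the first k cases
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each case joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

# Five unlabeled training cases (hypothetical inputs, two attributes each).
training = [(1.0, 1.0), (9.0, 9.0), (5.0, 1.0), (1.2, 0.8), (5.1, 1.3)]
centroids, labels = kmeans(training, k=3)

# A new case is scored by assigning it to the nearest existing cluster.
new_case_cluster = nearest((8.5, 9.2), centroids)
```

Cluster numbers are arbitrary labels; only the grouping carries meaning. In practice the result can depend on how the centroids are seeded, so several starts are usually compared.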

  10. Segmentation: A Selection of Methods • k-means clustering • Association rules (market basket analysis); example item pairs: Barbie and Candy, Beer and Diapers, Peanut butter and Meat
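
Association rules are usually judged by support, confidence, and lift. The sketch below computes all three for a beer-and-diapers rule. The baskets are invented for illustration and are not taken from the course data.

```python
# Hypothetical market baskets, invented for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs, baskets):
    """Estimated P(rhs in basket | lhs in basket)."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)

def lift(lhs, rhs, baskets):
    """Confidence relative to the base rate of rhs; 1.0 means no association."""
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)

rule_support = support({"beer", "diapers"}, baskets)
rule_confidence = confidence({"beer"}, {"diapers"}, baskets)
rule_lift = lift({"beer"}, {"diapers"}, baskets)
```

A lift above 1.0 suggests the two items appear together more often than their individual frequencies would predict.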

  11. Predictive Modeling: Supervised Classification • Training data: case 1: inputs, prob, class; case 2: inputs, prob, class; case 3: inputs, prob, class; case 4: inputs, prob, class; case 5: inputs, prob, class • new case

  12. Predictive Modeling: Supervised Classification [Figure: data matrix with Cases as rows, Inputs as columns, and a Target column]

  13. Types of Targets • Logistic Regression • event/no event (binary target) • class label (multiclass problem) • Regression • continuous outcome • Survival Analysis • time-to-event (possibly censored)

  14. Discrete Targets • Healthcare • Target = favorable/unfavorable outcome • Credit Scoring • Target = defaulted/did not default on a loan • Marketing • Target = purchased product A, B, C, or none

  15. Continuous Targets • Healthcare Outcomes • Target = hospital length of stay, hospital cost • Liquidity Management • Target = amount of money at an ATM machine or in a branch vault • Merchandise Returns • Target = time between purchase and return (censored)

  16. Application: Target Marketing • Cases = customers, prospects, suspects, households • Inputs = geo/demographics, psychometrics, RFM variables • Target = response to a past or test solicitation • Action = target high-responding segments of customers in future campaigns

  17. Application: Attrition Prediction/Defection Detection • Cases = existing customers • Inputs = payment history, product/service usage, demographics • Target = churn, brand-switching, cancellation, defection • Action = customer loyalty promotion

  18. Application: Fraud Detection • Cases = past transactions or claims • Inputs = particulars and circumstances • Target = fraud, abuse, deception • Action = impede or investigate suspicious cases

  19. Application: Credit Scoring • Cases = past applicants • Inputs = application information, credit bureau reports • Target = default, charge-off, serious delinquency, repossession, foreclosure • Action = accept or reject future applicants for credit

  20. The Fallacy of Univariate Thinking • What is the most important cause of churn? [Figure: Prob(churn) plotted against Daytime Usage and International Usage]

  21. A Selection of Modeling Methods • Linear Regression, Logistic Regression • Decision Trees

  22. Hard Target Search Transactions ...

  23. Hard Target Search Transactions Fraud

  24. Undercoverage Accepted Bad Accepted Good Rejected No Follow-up ...

  25. Next Generation Undercoverage Accepted Bad Accepted Good Rejected No Follow-up

  26. 5 Chapter 2: Basics of Business Analytics

  27. Objectives • Identify several of the challenges of data mining and present ways to address these challenges.

  28. Initial Challenges in Data Mining ... • What do I want to predict? • What level of granularity is needed to obtain data about the customer?

  29. Initial Challenges in Data Mining ... • What do I want to predict? • a transaction • an individual • a household • a store • a sales team • What level of granularity is needed to obtain data about the customer?

  30. Initial Challenges in Data Mining • What do I want to predict? • a transaction • an individual • a household • a store • a sales team • What level of granularity is needed to obtain data about the customer? • transactional • regional • daily • monthly • other

  31. Typical Data Mining Time Line [Figure: how the Allotted Time is split between Data Preparation and Data Analysis in four scenarios: Projected, Actual, Dreaded (Data Acquisition), Needed]

  32. Data Challenges What identifies a unit?

  33. Cracking the Code

          ID1   ID2  DATE    JOB  SEX  FIN  PRO3  CR_T  ERA
          2612  624  941106  06   8    DEC  .     .     .
          2613  625  940506  04   5    ETS  .     .     .
          2614  626  940809  11   5    PBB  .     .     .
          2615  627  941010  16   1    RVC  .     .     .
          2616  628  940507  04   2    ETT  .     .     .
          2617  629  940812  09   1    OFS  .     .     .
          2618  630  950906  09   2    RFN  71    612   12
          2618  631  951107  13   2    PBB  0     623   23
          2619  632  950112  10   5    SLP  0     504   04
          2620  633  950802  11   1    STL  34    611   11
          2620  634  950908  06   0    DES  0     675   75
          2620  635  950511  01   1    DLF  0     608   08

      What identifies a unit?

  34. Data Challenges What should the data look like to perform an analysis?

  35. Data Arrangement

      Long-Narrow

          Acct  type
          2133  MTG
          2133  SVG
          2133  CK
          2653  CK
          2653  SVG
          3544  MTG
          3544  CK
          3544  MMF
          3544  CD
          3544  LOC

      Short-Wide

          Acct  CK  SVG  MMF  CD  LOC  MTG
          2133  1   1    0    0   0    1
          2653  1   1    0    0   0    0
          3544  1   0    1    1   1    1

      What should the data look like to perform an analysis?
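
Going from the long-narrow layout to the short-wide one is a pivot. The sketch below uses the rows from this slide and plain dictionaries; the function name `to_short_wide` is hypothetical, and a data-frame library would normally do the same job.

```python
# Long-narrow rows copied from the slide: one row per (account, product type).
long_rows = [
    (2133, "MTG"), (2133, "SVG"), (2133, "CK"),
    (2653, "CK"), (2653, "SVG"),
    (3544, "MTG"), (3544, "CK"), (3544, "MMF"), (3544, "CD"), (3544, "LOC"),
]
columns = ["CK", "SVG", "MMF", "CD", "LOC", "MTG"]

def to_short_wide(rows, columns):
    """Return {account: {product: 0 or 1}} with one entry per account."""
    wide = {}
    for acct, product in rows:
        # Start every account with all indicator flags set to 0.
        flags = wide.setdefault(acct, {c: 0 for c in columns})
        flags[product] = 1
    return wide

wide = to_short_wide(long_rows, columns)
for acct, flags in wide.items():
    print(acct, *(flags[c] for c in columns))
```

Most modeling tools expect the short-wide form, with one row per case and one column per input.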

  36. Data Challenges What variables do I need?

  37. Derived Inputs

          Claim Date  Accident Date/Time  Delay  Season  Dark
          11nov96     102396/12:38        19     fall    0
          22dec95     012395/01:42        333    winter  1
          26apr95     042395/03:05        3      spring  1
          02jul94     070294/06:25        0      summer  0
          08mar96     123095/18:33        69     winter  0
          15dec96     061296/18:12        186    summer  0
          09nov94     110594/22:14        4      fall    1

      What variables do I need?
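
The three derived inputs can be computed from the two raw fields. The sketch below is one possible reading, not the course's definition. The slide does not state the Season or Dark rules, so the month-based seasons and the night-time cutoff (before 06:00 or from 22:00) are assumptions that happen to reproduce its values. Two-digit years are assumed to mean 19yy.

```python
from datetime import date, datetime

MONTHS = {m: i for i, m in enumerate(
    "jan feb mar apr may jun jul aug sep oct nov dec".split(), start=1)}

def parse_claim(text):
    """Parse a claim date such as '11nov96' (ddmonyy, assumed 19yy)."""
    return date(1900 + int(text[5:7]), MONTHS[text[2:5].lower()], int(text[:2]))

def parse_accident(text):
    """Parse an accident stamp such as '102396/12:38' (mmddyy/hh:mm, assumed 19yy)."""
    d, t = text.split("/")
    hour, minute = map(int, t.split(":"))
    return datetime(1900 + int(d[4:6]), int(d[:2]), int(d[2:4]), hour, minute)

def season(month):
    """Month-based seasons (an assumption): Dec-Feb winter, Mar-May spring, etc."""
    return ["winter", "spring", "summer", "fall"][(month % 12) // 3]

def derive(claim_text, accident_text):
    """Return (Delay in days, Season of accident, Dark flag)."""
    claim, accident = parse_claim(claim_text), parse_accident(accident_text)
    delay = (claim - accident.date()).days
    dark = int(accident.hour < 6 or accident.hour >= 22)  # assumed cutoff
    return delay, season(accident.month), dark

raw = [
    ("11nov96", "102396/12:38"), ("22dec95", "012395/01:42"),
    ("26apr95", "042395/03:05"), ("02jul94", "070294/06:25"),
    ("08mar96", "123095/18:33"), ("15dec96", "061296/18:12"),
    ("09nov94", "110594/22:14"),
]
derived = [derive(claim, accident) for claim, accident in raw]
```

Derived inputs like these often carry more signal than the raw dates, which a model cannot use directly.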

  38. Data Challenges How do I convert my data to the proper level of granularity?

  39. Roll-Up

      Account level

          HH    Acct  Sales
          4461  2133  160
          4461  2244  42
          4461  2773  212
          4461  2653  250
          4461  2801  122
          4911  3544  786
          5630  2496  458
          5630  2635  328
          6225  4244  27
          6225  4165  759

      Household level

          HH    Acct  Sales
          4461  2133  ?
          4911  3544  ?
          5630  2496  ?
          6225  4244  ?

      How do I convert my data to the proper level of granularity?
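
One way to fill in the question marks is to compute several household-level summaries rather than a single one. The sketch below uses the rows from this slide; the choice of summaries (count, total, largest, mean) is an illustrative assumption.

```python
from collections import defaultdict

# (household, account, sales) rows copied from the slide.
accounts = [
    (4461, 2133, 160), (4461, 2244, 42), (4461, 2773, 212),
    (4461, 2653, 250), (4461, 2801, 122),
    (4911, 3544, 786),
    (5630, 2496, 458), (5630, 2635, 328),
    (6225, 4244, 27), (6225, 4165, 759),
]

def roll_up(rows):
    """Collapse account rows to one summary row per household."""
    by_household = defaultdict(list)
    for household, _acct, sales in rows:
        by_household[household].append(sales)
    return {
        hh: {
            "n_accts": len(s),
            "total": sum(s),
            "largest": max(s),
            "mean": sum(s) / len(s),
        }
        for hh, s in by_household.items()
    }

households = roll_up(accounts)
```

In this data every household total comes to 786, so the sum alone would make the four households look identical. The count and the largest account still tell them apart.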

  40. Rolling Up Longitudinal Data

          Flier  Month  Frequent Flying Mileage  VIP Member
          10621  Jan    650                      No
          10621  Feb    0                        No
          10621  Mar    0                        No
          10621  Apr    250                      No
          33855  Jan    350                      No
          33855  Feb    300                      No
          33855  Mar    1200                     Yes
          33855  Apr    850                      Yes

      How do I convert my data to the proper level of granularity?
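
Longitudinal rows can be rolled up to one row per flier by summarizing level, inactivity, and change over time. The sketch below uses the rows from this slide; the particular summaries and their names are assumptions chosen for illustration.

```python
# (flier, month, mileage, vip) rows copied from the slide, in calendar order.
history = [
    (10621, "Jan", 650, "No"), (10621, "Feb", 0, "No"),
    (10621, "Mar", 0, "No"), (10621, "Apr", 250, "No"),
    (33855, "Jan", 350, "No"), (33855, "Feb", 300, "No"),
    (33855, "Mar", 1200, "Yes"), (33855, "Apr", 850, "Yes"),
]

def summarize(rows):
    """Return one summary record per flier from month-level rows."""
    out = {}
    for flier, _month, miles, vip in rows:
        s = out.setdefault(flier, {
            "total": 0, "idle_months": 0,
            "first": miles, "last": miles, "ever_vip": False,
        })
        s["total"] += miles
        s["idle_months"] += int(miles == 0)   # months with no flying
        s["last"] = miles                     # rows arrive in calendar order
        s["ever_vip"] = s["ever_vip"] or vip == "Yes"
    for s in out.values():
        s["change"] = s["last"] - s["first"]  # crude first-to-last trend
    return out

fliers = summarize(history)
```

The total alone hides the pattern. The idle-month count and the first-to-last change keep some of the time dimension in a single row.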

  41. Data Challenges What sorts of raw data quality problems can I expect?

  42. Errors, Outliers, and Missings

          cking  #cking  ADB       NSF  dirdep  SVG  bal
          Y      1       468.11    1    1876    Y    1208
          Y      1       68.75     0    0       Y    0
          Y      1       212.04    0    6            0
                 .       .         0    0       Y    4301
          y      2       585.05    0    7218    Y    234
          Y      1       -47.69    2    1256         238
          Y      1       4687.7    0    0            0
                 .       .         1    0       Y    1208
          Y      .       .         .    1598         0
                 1       0.00      0    0            0
          Y      3       89981.12  0    0       Y    45662
          Y      2       585.05    0    7218    Y    234

      What sorts of raw data quality problems can I expect?

  43. Missing Value Imputation [Figure: Cases-by-Inputs data matrix with missing cells marked ?] • What sorts of raw data quality problems can I expect?
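
A common way to handle the missing cells is to fill each one with a typical value and record that it was missing. The sketch below uses median imputation with a missing-value indicator. This is one option among several and not necessarily the method the course uses, and the column values are hypothetical.

```python
from statistics import median

def impute_median(column):
    """Replace None with the median of the observed values.

    Returns (filled_column, missing_flags), where a flag of 1 marks an imputed cell.
    """
    observed = [v for v in column if v is not None]
    fill = median(observed)
    flags = [int(v is None) for v in column]
    filled = [fill if v is None else v for v in column]
    return filled, flags

# Hypothetical input column; None stands for a missing value.
balances = [1208, 0, None, 4301, 234, None, 238]
filled, was_missing = impute_median(balances)
```

Keeping the indicator column lets a model learn from the fact that a value was missing, which is sometimes informative in itself.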

  44. Data Challenges • Can I (more importantly, should I) analyze all the data that I have? • All the observations? • All the variables?

  45. Massive Data

          Unit      Bytes  Paper
          Kilobyte  2^10   ½ sheet
          Megabyte  2^20   1 ream
          Gigabyte  2^30   167 feet
          Terabyte  2^40   32 miles
          Petabyte  2^50   32,000 miles

      Can I (more importantly, should I) analyze all the data that I have?

  46. Sampling • Can I (more importantly, should I) analyze all the data that I have?

  47. Oversampling [Figure: cases labeled OK and Fraud] • Can I (more importantly, should I) analyze all the data that I have?
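
When the event of interest is rare, one option is to keep every event and sample only part of the common class. The sketch below shows that idea under stated assumptions. The function name, the 25% target share, and the transaction records are all hypothetical.

```python
import random

def oversample_rare(cases, is_event, target_event_share=0.5, seed=1):
    """Keep every rare event and draw just enough non-events to hit the target share."""
    rng = random.Random(seed)
    events = [c for c in cases if is_event(c)]
    others = [c for c in cases if not is_event(c)]
    # Number of non-events needed so that events make up the target share.
    n_others = round(len(events) * (1 - target_event_share) / target_event_share)
    return events + rng.sample(others, min(n_others, len(others)))

# Hypothetical transactions: 20 fraud cases among 1,000.
transactions = [{"id": i, "fraud": i < 20} for i in range(1000)]
sample = oversample_rare(transactions, lambda t: t["fraud"], target_event_share=0.25)
```

The event rate in such a sample no longer matches the population, so predicted probabilities generally need to be adjusted back to the true rate before they are used.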

  48. The Curse of Dimensionality [Figure: panels labeled 1–D, 2–D, and 3–D] • Can I (more importantly, should I) analyze all the data that I have?
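
Two small calculations give a feel for why adding dimensions makes data sparse. This is a sketch with illustrative numbers (1,000 cases, 10 bins per axis, a 10% neighborhood) that do not come from the slides.

```python
def edge_for_fraction(fraction, dims):
    """Side length of a sub-cube that holds `fraction` of a unit cube's volume."""
    return fraction ** (1 / dims)

def cases_per_cell(n_cases, bins_per_axis, dims):
    """Average cases per grid cell when every axis is cut into equal bins."""
    return n_cases / bins_per_axis ** dims

for d in (1, 2, 3, 10):
    print(d, round(edge_for_fraction(0.10, d), 3), cases_per_cell(1000, 10, d))
```

To capture 10% of uniformly spread cases, a neighborhood must span about 0.10 of each axis in one dimension, about 0.32 in two, and about 0.46 in three. Over the same steps, the average number of cases per grid cell drops from 100 to 10 to 1.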

  49. Dimension Reduction • Redundancy • Irrelevancy [Figure: two panels with axis labels E(Target), Input1, Input2, Input3] • Can I (more importantly, should I) analyze all the data that I have?

  50. Catalog Case Study • Analysis goal: A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers who are most likely to purchase in the future. • Data set: CATALOG • Number of rows: 48,356 • Number of columns: 98 • Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales • Targets: RESPOND (binary), ORDERSIZE (continuous)
