
Overview of Methods


salasm


Presentation Transcript


  1. Overview of Methods • Data mining techniques • What techniques do, examples, advantages & disadvantages

  2. History • Statistics • AI: • genetic algorithms, neural networks • analogies with biology • memory-based reasoning • link analysis from graph theory

  3. Techniques • Statistical: Market-Basket Analysis - find groups of items; Memory-Based Reasoning - case based; Cluster Detection - undirected (quantitative MBA) • Artificial Intelligence: Link Analysis - MCI’s Friends & Family; Decision Trees, Rule Induction - production rule; Neural Networks - automatic pattern detection; Genetic Algorithms - keep best parameters
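
As a rough illustration of what "find groups of items" can mean in market-basket analysis, the sketch below counts how often each pair of items appears in the same basket and reports that share as the pair's support. The baskets, item names, and the function name `pair_support` are invented for the example and are not from the slides.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair has one canonical key per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical transactions, made up for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

support = pair_support(baskets)
for pair, share in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, share)
```

Pairs with high support are candidate item groups; a fuller analysis would also look at larger item sets and rule confidence.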

  4. Models • Regression: Y = a + bX • Classification: assign new record to class • Predictive: assign value to new record • Clustering: groups for data • Time-series: assign future value • Links: patterns in data
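
The regression model Y = a + bX can be fitted by ordinary least squares. The sketch below is one minimal way to estimate a and b in plain Python; the data points and the function name `fit_line` are assumptions made for the example, not part of the presentation.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the model Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Made-up data that lies exactly on Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = fit_line(xs, ys)
prediction = a + b * 6.0  # predictive use: assign a value to a new record
print(a, b, prediction)
```

The last line of the example also shows the "Predictive" bullet in miniature: once a and b are known, a new record with X = 6 is assigned a value of Y.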

  5. Fitting • Underfitting: not enough detail; leave out important variables • Overfitting: too much detail; memorizes training set, but doesn’t help with new data; data set too small; redundancy in data
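
The contrast between underfitting and overfitting can be sketched with three toy classifiers. Everything here is invented for illustration: the data (a true rule of "class 1 when x > 5", with two mislabeled training points), the hand-picked threshold, and the function names. The constant rule ignores x entirely (it leaves out the important variable), while the memorizer copies the label of the nearest training point, noise included.

```python
# Training data: true rule is "1 if x > 5", but (3, 1) and (8, 0) are noisy labels.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
# New data, labeled by the true rule.
test = [(1.5, 0), (2.9, 0), (3.2, 0), (6.5, 1), (7.9, 1), (8.3, 1)]


def constant_rule(x):
    """Underfit: ignores x and always predicts class 0."""
    return 0


def threshold_rule(x):
    """Reasonable fit: a single hand-picked cut-off on x."""
    return 1 if x > 5 else 0


def memorizer(x):
    """Overfit: returns the label of the closest training point."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def accuracy(model, data):
    """Fraction of records in data that the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


for model in (constant_rule, threshold_rule, memorizer):
    print(model.__name__, accuracy(model, train), accuracy(model, test))
```

On this toy data the memorizer is perfect on the training set but does worse than the simple threshold on the new records, which is the pattern the slide describes.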

  6. Comparison of Features

  7. Data Mining Functions • Classification: identify categories in data • Prediction: formula to predict future observations • Association: rules using relationships among entities • Detection: anomalies & irregularities (fraud detection)

  8. Financial Applications

  9. Telecom Applications

  10. Marketing Applications

  11. Web Applications

  12. Other Applications

  13. Data Sets • Loan Applications: classification • Job Applications: classification • Insurance Fraud: detection • Expenditure Data: prediction

  14. Loan Data • 650 observations • Outcomes (binary): On-time (cost of error: $300); Late/default (cost of error: $2,000) • Variables: Age, Income, Assets, Debts, Want, Credit • Credit is ordinal • Transform: Assets, Debts, & Want → Risk
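
Unequal error costs shift the decision rule away from "predict the more likely class". The sketch below works through the arithmetic under one reading of the slide, which is an assumption: misclassifying an applicant who would pay on time costs $300, and misclassifying one who would pay late (default) costs $2,000. The function names and the example probabilities are hypothetical.

```python
COST_MISS_ON_TIME = 300   # assumed: an on-time applicant is predicted late
COST_MISS_LATE = 2000     # assumed: a late (default) applicant is predicted on-time


def expected_cost(p_late, prediction):
    """Expected cost of a prediction, given the estimated probability of late payment."""
    if prediction == "on_time":
        # The prediction is wrong only if the applicant turns out to be late.
        return p_late * COST_MISS_LATE
    # Predicting "late" is wrong only if the applicant would have paid on time.
    return (1 - p_late) * COST_MISS_ON_TIME


def best_prediction(p_late):
    """Choose the prediction with the lower expected cost."""
    return min(("on_time", "late"), key=lambda p: expected_cost(p_late, p))


# The two expected costs are equal where p * 2000 = (1 - p) * 300.
break_even = COST_MISS_ON_TIME / (COST_MISS_ON_TIME + COST_MISS_LATE)

print(round(break_even, 4))
print(best_prediction(0.10), best_prediction(0.20))
```

Under these assumed costs the break-even probability is 300 / 2,300, roughly 0.13, so an applicant with even a 20% estimated chance of late payment is cheaper to classify as late.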

  15. Job Application Data • 500 observations • Outcomes (ordinal): Unacceptable, Minimal, Acceptable, Excellent • Variables: Age, State, Degree, Major, Experience • State is nominal; Degree & Major are ordinal • State is superfluous

  16. Insurance Claim Data • 5,000 observations • Outcomes (binary): OK (cost of error: $500); Fraudulent (cost of error: $2,500) • Variables: Age, Gender, Claim, Tickets, Prior claims, Attorney • Gender & Attorney are nominal; Tickets & Prior claims are categorical

  17. Expenditure Data • 10,000 observations • Outcomes: could predict response in a number of categories; others • Variables: Age, Gender, Marital, Dependents, Income, Job years, Town years, Education years, Driver’s license, Own home, Number of credit cards • Churn, proportion of income spent on seven categories
