
Business Intelligence and Decision Modeling


Presentation Transcript


  1. Business Intelligence and Decision Modeling — Week 9: Customer Profiling with Decision Trees (Part 2): CHAID and CRT

  2. CHAID or CART
  • CHAID: Chi-Square Automatic Interaction Detector
  • Based on the chi-square test
  • All variables discretized
  • Dependent variable: nominal
  • CART: Classification and Regression Tree
  • Variables can be discrete or continuous
  • Based on the Gini index or the F-test
  • Dependent variable: nominal or continuous

  3. Use of Decision Trees
  • Classify observations from a target binary or nominal variable → Segmentation
  • Predictive response analysis from a target numerical variable → Behaviour
  • Decision support rules → Processing

  4. Decision Tree

  5. Example: dmdata.sav. Underlying theory: the chi-square (χ²) test

  6. CHAID Algorithm: Selecting Variables
  • Example: Region (4 categories), Gender (3, including Missing), Age (6, including Missing)
  • For each variable, collapse categories so as to maximize the chi-square test of independence. Ex: Region (N, S, E, W, *) → (WSE, N*)
  • Select the most significant variable
  • Go to the next branch … and the next level
  • Stop growing if the estimated χ² is below the theoretical (critical) χ²
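The category-collapsing step rests on the chi-square test of independence between a predictor's (possibly merged) categories and the target. A minimal sketch of the Pearson statistic in plain Python — the contingency table values here are made up purely for illustration:

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table
    (rows = predictor categories, columns = target classes)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table: merged Region (WSE vs N*) x response (yes / no)
table = [[30, 70],
         [60, 40]]
print(round(chi_square_statistic(table), 2))  # larger => stronger association
```

CHAID would compare this statistic (via its p-value, with the appropriate degrees of freedom) across candidate merges and variables, keeping the most significant split.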

  7. CART (Nominal Target)
  • Nominal targets: Gini impurity reduction (entropy is an alternative criterion)
  • Gini is one minus the squared probabilities of node membership: Gini = 1 − ∑ pᵢ²
  • Gini = 0 when targets are perfectly classified (the node holds a single class)
  • Example: Prob: Bus = 0.4, Car = 0.3, Train = 0.3
  • Gini = 1 − (0.4² + 0.3² + 0.3²) = 0.66
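The Gini formula above is easy to verify directly; a minimal sketch using the slide's own numbers:

```python
def gini_index(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

# Slide example: Bus = 0.4, Car = 0.3, Train = 0.3
print(round(gini_index([0.4, 0.3, 0.3]), 2))  # 0.66

# A pure node (one class only) has zero impurity
print(gini_index([1.0, 0.0, 0.0]))  # 0.0
```

CART evaluates each candidate split by the weighted drop in Gini from the parent node to its children, and picks the split with the largest reduction.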

  8. CART (Metric Target)
  • Continuous targets: variance reduction (F-test)
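For a continuous target, CART scores a split by how much it reduces the target's variance from the parent node to the size-weighted children (the F-test compares between-group to within-group variance in the same spirit). A minimal sketch with made-up numbers:

```python
def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the children."""
    n = len(parent)
    weighted = (len(left) * variance(left) + len(right) * variance(right)) / n
    return variance(parent) - weighted

# Hypothetical continuous target split into two tight groups
parent = [1, 2, 3, 10, 11, 12]
print(round(variance_reduction(parent, [1, 2, 3], [10, 11, 12]), 2))  # 20.25
```

A split that separates the target into homogeneous groups, as here, yields a large reduction; a useless split yields one near zero.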

  9. Comparative Advantages (from Wikipedia)
  • Simple to understand and interpret
  • Requires little data preparation
  • Able to handle both numerical and categorical data
  • Uses a white-box model, easily explained by Boolean logic
  • Possible to validate a model using statistical tests
  • Robust

  10. Where to get help? http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp
