
Mining Rules from Surveys and Questionnaires

Scott Burton and Richard Morris, CS 676 Presentation, 12 April 2011. Surveys and questionnaires are frequently used but pose problems for data mining: rarity, related and dependent questions, and ordinal/Likert-scale responses. The presentation applies association rule mining to such data.


Presentation Transcript


  1. Mining Rules from Surveys and Questionnaires Scott Burton and Richard Morris CS 676 Presentation 12 April 2011

  2. Surveys and Questionnaires • Frequently Used • Problems for data mining • Rarity • Related and dependent questions • Ordinal / Likert scale

  3. Association Rule Mining • Market basket analysis • Cookies -> Milk
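The market-basket example on this slide can be made concrete with the two standard association-rule measures, support and confidence. This is a minimal sketch on made-up transaction data (the item names and numbers are illustrative, not from the talk):

```python
# Toy market-basket data illustrating the "Cookies -> Milk" example.
transactions = [
    {"cookies", "milk"},
    {"cookies", "milk", "bread"},
    {"milk", "bread"},
    {"cookies", "milk"},
    {"bread"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"cookies", "milk"}))       # 0.6
print(confidence({"cookies"}, {"milk"}))  # 1.0
```

Apriori and MS-Apriori (discussed on the next slides) search the itemset lattice for all rules whose support and confidence exceed chosen thresholds.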

  4. Our Goal: Improve Precision Standard Algorithms/Approaches • Apriori, MS-Apriori • Too many rules • Rules are not “interesting” or actionable • Finding the needle in the haystack Our goal • Improve Precision • How do you measure “interestingness?”

  5. Interestingness Measures • Mostly based on Support or Confidence • Considered about 40 different metrics • All seemed to favor the wrong types of rules
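Lift is one widely used example of the support/confidence-based measures the slide refers to (the talk does not say which of the ~40 metrics it examined, so this is only illustrative). It also shows the failure mode alluded to: a high-confidence rule whose consequent is simply frequent scores a lift of about 1, i.e. no real association.

```python
# Lift: how much more often antecedent and consequent co-occur than
# expected under independence. Values near 1 mean the rule is
# uninteresting even when its confidence looks high.
def lift(sup_ab, sup_a, sup_b):
    return sup_ab / (sup_a * sup_b)

# High confidence (0.45/0.5 = 0.9) but lift exactly 1: the consequent
# occurs in 90% of records regardless of the antecedent.
print(lift(sup_ab=0.45, sup_a=0.5, sup_b=0.9))  # 1.0
```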

  6. Our Datasets • Smoking habits of middle school students in Mexico • Global Youth Tobacco Survey for the Pan American Health Organization (GYTSPAHO) • ~65 Questions and 13,000 responses • HINTS (Health Information National Trends Survey) • hints.cancer.gov • 2007 response data had ~475 Questions and 8,000 responses • We focused on a subset of ~100 questions

  7. Apriori vs. MS-Apriori Apriori (Figure 1) MS-Apriori (Figure 2)

  8. Related and Dependent Questions True but worthless rules • Do you smoke=no -> Did you smoke last week=no Our approach • Cluster similar questions • Remove any intra-cluster rules (cluster diagram)
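The intra-cluster pruning step described above can be sketched as follows. Question IDs and cluster assignments here are made up for illustration; a rule is dropped when every question it mentions falls in a single cluster, since such rules are "true but worthless":

```python
# Hypothetical question -> cluster assignments (e.g. from the clustering
# step on the next slide).
cluster_of = {"smoke_now": 0, "smoked_last_week": 0, "parent_smokes": 1}

def is_intra_cluster(rule):
    """A rule is intra-cluster if all its questions share one cluster."""
    antecedent, consequent = rule
    clusters = {cluster_of[q] for q in antecedent | consequent}
    return len(clusters) == 1

rules = [
    ({"smoke_now"}, {"smoked_last_week"}),  # both in cluster 0 -> pruned
    ({"parent_smokes"}, {"smoke_now"}),     # spans clusters -> kept
]
kept = [r for r in rules if not is_intra_cluster(r)]
print(kept)  # [({'parent_smokes'}, {'smoke_now'})]
```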

  9. Creating Clusters • Distance Metrics • Bi-conditional prediction • Attribute vs. Attribute-Value pair • Involving the subject matter expert
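The slide does not define its "bi-conditional prediction" distance, so the following is only one plausible reading, sketched on fabricated answer vectors: measure how accurately each question's answers predict the other's via the majority mapping, and take one minus the worse direction, so mutually predictive questions end up close and get clustered together.

```python
from collections import Counter, defaultdict

def predict_accuracy(xs, ys):
    """Accuracy of predicting ys from xs via the majority mapping."""
    by_answer = defaultdict(Counter)
    for x, y in zip(xs, ys):
        by_answer[x][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_answer.values())
    return correct / len(xs)

def question_distance(xs, ys):
    """1 - the weaker of the two prediction directions (illustrative)."""
    return 1.0 - min(predict_accuracy(xs, ys), predict_accuracy(ys, xs))

# Hypothetical answers from five respondents to two related questions.
smoke_now   = ["no", "no", "yes", "no", "yes"]
smoked_week = ["no", "no", "yes", "no", "no"]
print(question_distance(smoke_now, smoked_week))  # ~0.2, i.e. very close
```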

  10. A Sample Clustering of Questions (see handout)

  11. Effects of Cluster Pruning MS-Apriori (Figure 2) After cluster pruning (Figure 3)

  12. Similar Rules Abstract Viewpoint: • A B -> C D • A -> C D • A B -> C • A B Z -> C D
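The slide lists variants of A B -> C D but not the exact pruning criterion, so this sketch makes one defensible choice: drop a rule when a strictly simpler rule exists with a smaller-or-equal antecedent and a larger-or-equal consequent, since the simpler rule conveys at least as much.

```python
def subsumes(general, specific):
    """True if `general` has antecedent <= and consequent >= `specific`."""
    ga, gc = general
    sa, sc = specific
    return ga <= sa and gc >= sc and general != specific

rules = [
    (frozenset("AB"),  frozenset("CD")),  # subsumed by A -> C D
    (frozenset("A"),   frozenset("CD")),  # most general: kept
    (frozenset("AB"),  frozenset("C")),   # weaker consequent: pruned
    (frozenset("ABZ"), frozenset("CD")),  # extra antecedent term: pruned
]
kept = [r for r in rules if not any(subsumes(g, r) for g in rules)]
print(kept)  # only (A -> C D) survives
```

This is quadratic in the number of rules; the final slide's note on "other record matching techniques" suggests the authors considered cheaper ways to find near-duplicate rules.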

  13. Similar Rule Pruning

  14. Effects of Similar Rule Pruning After cluster pruning (Figure 3) After Similar Rule Pruning (Figure 4)

  15. Ordinal and Likert Data Two Approaches • Pre-process • Post-process (Likert and ordinal scale examples)
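The pre-process approach (the "pre-binning" evaluated on the next slide) can be sketched as collapsing the five-point Likert scale into coarser bins before mining, so rules are not fragmented across near-identical answer values. The three-bin scheme here is an illustrative choice, not necessarily the one used in the talk:

```python
# Illustrative pre-binning: map a 5-point Likert scale to 3 bins
# before running the rule miner.
LIKERT_BINS = {
    "strongly disagree": "disagree",
    "disagree":          "disagree",
    "neutral":           "neutral",
    "agree":             "agree",
    "strongly agree":    "agree",
}

responses = ["strongly agree", "agree", "neutral", "disagree"]
binned = [LIKERT_BINS[r] for r in responses]
print(binned)  # ['agree', 'agree', 'neutral', 'disagree']
```

The post-process alternative would instead mine on the raw scale and merge adjacent-value rules afterwards.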

  16. Effects of Pre-Binning (Figure 5)

  17. Other Examples • HINTS Data (see handout, Figures 6-10)

  18. Conclusions and Future Work Conclusions • Increased precision of “interesting” rules • More work to be done Future work • Tuning of existing processes • Handle numerical data • Handle questions not asked to everyone • Handle questions with multiple responses • Try other record matching techniques for similar rule pruning
