
Detecting Group Differences: Mining Contrast Sets

Detecting Group Differences: Mining Contrast Sets. Author: Stephen D. Bay. Advisor: Dr. Hsu. Graduate: Yan-Cheng Lin.





Presentation Transcript


  1. Detecting Group Differences: Mining Contrast Sets Author: Stephen D. Bay Advisor: Dr. Hsu Graduate: Yan-Cheng Lin

  2. Outline • Motivation • Objective • Research Review • Search for Contrast Sets • Filtering for Summarizing Contrast Set • Evaluation • Conclusion

  3. Motivation • Learning group differences is a central problem in many domains • Contrasting groups is especially important in social science research

  4. Objective • Automatically detect differences between contrasting groups from observational multivariate data

  5. Research Review • time series research • multiple observations • traditional statistical methods • rule learners and decision trees • can miss group differences • association rule mining • multiple groups and different search criteria

  6. Problem Definition • the itemset concept extends to the contrast set • Definition 1: Let $A_1, A_2, \ldots, A_k$ be a set of $k$ variables called attributes. Each $A_i$ can take on values from the set $\{V_{i1}, V_{i2}, \ldots, V_{im}\}$. A contrast set is a conjunction of attribute-value pairs defined on groups $G_1, G_2, \ldots, G_n$, with no $A_i$ occurring more than once.
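Definition 1 can be made concrete with a small sketch. It assumes (hypothetically) that an example is a mapping from attribute name to value and that a contrast set is a dictionary, so the rule "no $A_i$ occurring more than once" is enforced when the set is built. The names `make_contrast_set` and `is_true_for` are illustrative, not from the paper.

```python
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical representation: attribute -> value. A dict cannot hold the same
# key twice, which mirrors "no A_i occurring more than once".
ContrastSet = Dict[str, str]


def make_contrast_set(pairs: Iterable[Tuple[str, str]]) -> ContrastSet:
    """Build a contrast set from (attribute, value) pairs, rejecting repeats."""
    cset: ContrastSet = {}
    for attribute, value in pairs:
        if attribute in cset:
            raise ValueError(f"attribute {attribute!r} occurs more than once")
        cset[attribute] = value
    return cset


def is_true_for(cset: Mapping[str, str], example: Mapping[str, str]) -> bool:
    """A conjunction holds for an example when every attribute-value pair matches."""
    return all(example.get(attr) == value for attr, value in cset.items())
```

For instance, `make_contrast_set([("sex", "male"), ("occupation", "sales")])` describes the conjunction sex = male ∧ occupation = sales.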

  7. Support of a Contrast Set • Definition 2: • The support of a contrast set with respect to a group $G$ is the percentage of examples in $G$ where the contrast set is true. • minimum support difference $\delta$: a user-defined threshold on how much a contrast set's support must differ between groups
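Definition 2 and the threshold $\delta$ can be sketched as follows. Support is computed as a fraction in [0, 1] rather than a percentage, and the test "the largest difference between any two groups is at least $\delta$" is my reading of how Bay and Pazzani use the threshold; treat it as an assumption. Groups are passed as a hypothetical dict from group name to a list of examples.

```python
from typing import Dict, List, Mapping

Example = Mapping[str, str]        # one record: attribute -> value
ContrastSet = Mapping[str, str]    # conjunction of attribute = value pairs


def support(cset: ContrastSet, group: List[Example]) -> float:
    """Fraction (0..1) of examples in `group` for which every pair in `cset` holds."""
    if not group:
        return 0.0
    hits = sum(all(ex.get(a) == v for a, v in cset.items()) for ex in group)
    return hits / len(group)


def support_difference(cset: ContrastSet, groups: Dict[str, List[Example]]) -> float:
    """Largest support difference over all pairs of groups (max minus min)."""
    sups = [support(cset, g) for g in groups.values()]
    return max(sups) - min(sups) if sups else 0.0


def meets_delta(cset: ContrastSet, groups: Dict[str, List[Example]], delta: float) -> bool:
    """Assumed criterion: some pair of groups differs in support by at least delta."""
    return support_difference(cset, groups) >= delta
```

Taking max minus min is equivalent to maximizing |support(G_i) − support(G_j)| over all pairs, but needs only one pass over the groups.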

  8. Search for Contrast Sets • find contrast sets that meet our criteria through search • explore all possible contrast sets and return only those that meet our criteria • STUCCO (Search and Testing for Understandable Consistent Contrasts): a breadth-first search that incorporates several efficient mining techniques

  9. Framework • use set-enumeration trees • use breadth-first search • in the counting phase, organize nodes into candidate groups
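A breadth-first walk over a set-enumeration tree might look like the sketch below. Items (attribute-value pairs) get a fixed order; each node's children append one item that comes later in that order, so every conjunction is generated exactly once. The `keep` callback stands in for whatever pruning tests apply, and `max_depth` is a hypothetical cap. The counting phase with candidate groups is not modeled here.

```python
from collections import deque
from typing import Callable, Iterator, List, Tuple

Item = Tuple[str, str]          # (attribute, value)
Node = Tuple[Item, ...]         # a contrast set as an ordered tuple of items


def bfs_set_enumeration(items: List[Item], keep: Callable[[Node], bool],
                        max_depth: int = 3) -> Iterator[Node]:
    """Yield set-enumeration-tree nodes level by level (breadth-first).

    Nodes for which keep(node) is False are dropped together with their
    whole subtree, which is where pruning saves work.
    """
    order = sorted(items)
    queue = deque([((), 0)])  # (node, index where its tail of candidate items starts)
    while queue:
        head, start = queue.popleft()
        used_attrs = {attr for attr, _ in head}
        for idx in range(start, len(order)):
            attr, value = order[idx]
            if attr in used_attrs:
                continue  # an attribute may appear only once in a contrast set
            child = head + ((attr, value),)
            if not keep(child):
                continue  # prune this node and everything below it
            yield child
            if len(child) < max_depth:
                queue.append((child, idx + 1))
```

With items a=1, a=2, b=1 and no pruning, the walk yields the three single items first, then a=1 ∧ b=1 and a=2 ∧ b=1.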

  10. Finding Significant Contrast Sets • test the null hypothesis that the contrast set's support is equal across all groups • build contingency tables from the support counts
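The test above can be sketched as a Pearson chi-square on a 2 × n table: one row counts examples where the contrast set is true, the other where it is false, one column per group. The closed-form upper-tail probability used here is exact for integer degrees of freedom. This is an illustrative pure-Python version; if SciPy is available, `scipy.stats.chi2_contingency` computes the same statistic, provided `correction=False` is passed for 2 × 2 tables (otherwise it applies Yates' correction when the degrees of freedom equal 1).

```python
import math
from typing import List, Sequence, Tuple


def contingency_table(true_counts: Sequence[int], group_sizes: Sequence[int]) -> List[List[int]]:
    """Row 0: contrast set true; row 1: false; one column per group."""
    return [list(true_counts), [n - t for t, n in zip(true_counts, group_sizes)]]


def pearson_chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, (len(row_sums) - 1) * (len(col_sums) - 1)


def chi2_sf(x: float, dof: int) -> float:
    """P(X >= x) for a chi-square variable with integer dof (closed forms)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        term, acc = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            acc += term
        return math.exp(-half) * acc
    acc = math.erfc(math.sqrt(half))
    for i in range(1, (dof - 1) // 2 + 1):
        acc += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return acc
```

Usage: `pearson_chi_square(contingency_table([60, 40], [100, 100]))` compares a contrast set that holds for 60 of 100 examples in one group and 40 of 100 in the other.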

  11. Controlling Search Error • data mining tests many hypotheses • the family of tests must control the Type I error rate • Bonferroni inequality: given any set of events $e_1, e_2, \ldots, e_n$, the probability of their union is less than or equal to the sum of the individual probabilities, $P(e_1 \cup e_2 \cup \cdots \cup e_n) \le \sum_{i=1}^{n} P(e_i)$
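Applying the Bonferroni inequality gives per-test significance levels. The first function is plain Bonferroni: with $m$ tests at level $\alpha/m$, the chance of any false positive is at most $\alpha$. The second is a level-wise schedule, $\alpha_l = \min(\alpha / 2^l / |C_l|, \alpha_{l-1})$ with $|C_l|$ candidates at tree depth $l$, which is my recollection of the STUCCO paper and should be treated as an assumption. Because $\sum_l |C_l| \alpha_l \le \sum_l \alpha / 2^l < \alpha$, the family-wise error stays below $\alpha$ without knowing the total number of tests in advance.

```python
from typing import List


def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test level so the family-wise Type I error is at most alpha."""
    return alpha / num_tests


def levelwise_alphas(alpha: float, candidates_per_level: List[int]) -> List[float]:
    """Assumed STUCCO-style schedule: alpha_l = min(alpha / 2**l / |C_l|, alpha_{l-1})."""
    alphas = []
    previous = alpha
    for level, n_candidates in enumerate(candidates_per_level, start=1):
        # Guard against empty levels; max(..., 1) avoids dividing by zero.
        current = min(alpha / (2 ** level) / max(n_candidates, 1), previous)
        alphas.append(current)
        previous = current
    return alphas
```

Halving the budget at each level and never letting $\alpha_l$ increase keeps deeper, more numerous tests from consuming the error budget reserved for shallower ones.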

  12. Pruning • prune when contrast sets fail to meet the effect size or statistical significance criteria • prune when nodes lead only to uninteresting contrast sets • Effect Size Pruning • prune a node when the upper bound on the maximum support difference between groups falls below $\delta$ • Statistical Significance Pruning • prune when there are too few data or the maximum possible value of $\chi^2$ is too small
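A simple version of effect size pruning follows from one observation: adding conditions to a contrast set can only shrink its support in every group. A specialization's support in group $i$ is therefore at most the node's current support there, and its support difference is at most the largest current group support (reached if some other group drops to 0). The sketch below prunes on that bound; the bound used in STUCCO may be tighter, so treat this as an illustrative assumption.

```python
from typing import Sequence


def max_difference_bound(group_supports: Sequence[float]) -> float:
    """Upper bound on the support difference of this node or any specialization."""
    # Specializations can push any group's support down to 0 but never above
    # its current value, so the best case is max(support) - 0.
    return max(group_supports, default=0.0)


def can_prune_effect_size(group_supports: Sequence[float], delta: float) -> bool:
    """True if no node in this subtree can reach a support difference of delta."""
    return max_difference_bound(group_supports) < delta
```

Note that a node whose groups currently differ by only 0.01 is not pruned if it has support 0.3 somewhere: a specialization might still separate the groups.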

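  13. Interest Based Pruning • contrast sets are not interesting when they have identical support or when the relation between groups is fixed • Specializations with Identical Support • e.g., marital-status = husband versus marital-status = husband ∧ sex = male: every husband is male, so the specialization covers exactly the same examples and adds nothing new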
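The identical-support check can be sketched by counting covered examples per group. Because a specialization covers a subset of its parent's examples, equal counts in every group mean it covers exactly the same examples. The data below are hypothetical and reuse the husband/male example.

```python
from typing import Dict, List, Mapping

Example = Mapping[str, str]


def covered_counts(cset: Mapping[str, str], groups: Dict[str, List[Example]]) -> List[int]:
    """Number of examples covered by the conjunction in each group."""
    return [sum(all(ex.get(a) == v for a, v in cset.items()) for ex in group)
            for group in groups.values()]


def identical_support(parent: Mapping[str, str], child: Mapping[str, str],
                      groups: Dict[str, List[Example]]) -> bool:
    """Assumes `child` specializes `parent`, so it can only cover fewer examples;
    equal counts in every group therefore mean the very same examples."""
    return covered_counts(parent, groups) == covered_counts(child, groups)


# Hypothetical records: every husband is also male.
groups = {
    "G1": [{"marital-status": "husband", "sex": "male"},
           {"marital-status": "never-married", "sex": "female"}],
    "G2": [{"marital-status": "husband", "sex": "male"},
           {"marital-status": "never-married", "sex": "male"}],
}
```

Comparing counts rather than the covered example sets keeps the check cheap, since the counts are already gathered for the support and chi-square computations.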

  14. Fixed Relations • prune a node when the relation between groups is fixed, since its specializations do not add new information

  15. Relation to Itemset Mining • the minimum support difference criterion implies constraints on support levels in the individual groups • eliminate large portions of the search space based on: • subset infrequency pruning • effect size pruning • superset frequency pruning • interest based pruning

  16. Filtering for Summarizing Contrast Sets • past approaches • limit the rules shown by constraining the variables or items • compare discovered rules and show only unexpected results • new methods • an expectation-based statistical approach • identify and select contrast sets with linear trends

  17. Statistical Surprise • show the most general contrast sets first, and more complicated conjunctions only if they are surprising given the previously shown sets • IPF (Iterative Proportional Fitting) finds maximum likelihood estimates
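Iterative Proportional Fitting can be sketched on a three-way table, for example attribute A × attribute B × group. Starting from a table of ones, it repeatedly rescales the fit so each two-way margin matches the observed one; the result is the maximum likelihood fit of the log-linear model with no three-way interaction. Comparing an observed cell with its fitted value then shows whether the conjunction A ∧ B in a group is surprising given its sub-conjunctions. This is the textbook IPF construction; how exactly Bay's filter scores surprise is not given in the slides, so that part is an assumption.

```python
from itertools import product
from typing import List

Table3 = List[List[List[float]]]  # counts indexed [i][j][k]


def ipf_pairwise(observed: Table3, max_iter: int = 1000, tol: float = 1e-10) -> Table3:
    """Expected counts that reproduce every two-way margin of `observed`."""
    I, J, K = len(observed), len(observed[0]), len(observed[0][0])
    fit = [[[1.0] * K for _ in range(J)] for _ in range(I)]

    def rescale(cells) -> float:
        # Scale the fitted cells so their sum equals the observed sum over the
        # same cells; return how far the scaling factor was from 1.
        current = sum(fit[i][j][k] for i, j, k in cells)
        target = sum(observed[i][j][k] for i, j, k in cells)
        factor = target / current if current > 0 else 0.0
        for i, j, k in cells:
            fit[i][j][k] *= factor
        return abs(factor - 1.0) if current > 0 else 0.0

    # Each margin is a list of cell groups, one group per margin entry.
    margins = (
        [[(i, j, k) for k in range(K)] for i, j in product(range(I), range(J))],
        [[(i, j, k) for j in range(J)] for i, k in product(range(I), range(K))],
        [[(i, j, k) for i in range(I)] for j, k in product(range(J), range(K))],
    )
    for _ in range(max_iter):
        largest_change = 0.0
        for margin in margins:
            for cells in margin:
                largest_change = max(largest_change, rescale(cells))
        if largest_change < tol:
            break  # every margin already matched, so the fit has converged
    return fit
```

The residual `observed[i][j][k] - fit[i][j][k]` is what a surprise filter would inspect: a large gap means the three-way conjunction is not explained by its two-way parts.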

  18. Detecting Linear Trends • when the groups are ordered (e.g., by time), this is identical to finding change over time • detect significant contrast sets using the chi-square test • use regression techniques to find the portion of $\chi^2$ due to a linear trend
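One standard way to find "the portion of $\chi^2$" due to a linear trend is Armitage's partition of the 2 × n Pearson chi-square. A weighted regression of each group's support on an ordering score yields a linear component, and the remainder measures departure from linearity. This is a swapped-in classical technique and not necessarily the exact regression used in the paper. The group scores default to 0, 1, ..., n−1, which is an assumption suited to equally spaced periods.

```python
from typing import Optional, Sequence, Tuple


def trend_partition(true_counts: Sequence[int], group_sizes: Sequence[int],
                    scores: Optional[Sequence[float]] = None) -> Tuple[float, float, float]:
    """Split the 2 x n Pearson chi-square into (total, linear, departure)."""
    if scores is None:
        scores = list(range(len(group_sizes)))  # assumed equally spaced ordering
    N = sum(group_sizes)
    p_bar = sum(true_counts) / N                # pooled support
    variance = p_bar * (1.0 - p_bar)
    if variance == 0:
        return 0.0, 0.0, 0.0                    # support is 0 or 1 everywhere
    # Total Pearson chi-square for a 2 x n table, written via group supports.
    total = sum(n * (r / n - p_bar) ** 2
                for r, n in zip(true_counts, group_sizes)) / variance
    x_bar = sum(n * x for n, x in zip(group_sizes, scores)) / N
    s_xx = sum(n * (x - x_bar) ** 2 for n, x in zip(group_sizes, scores))
    if s_xx == 0:
        return total, 0.0, total                # a single score: no trend to fit
    # Weighted covariance between support and score.
    s_xy = sum((r - n * p_bar) * (x - x_bar)
               for r, n, x in zip(true_counts, group_sizes, scores))
    linear = s_xy ** 2 / (variance * s_xx)      # 1 degree of freedom
    return total, linear, total - linear        # departure has n - 2 dof
```

A support of 0.2, 0.4, 0.6 over three periods puts all of $\chi^2$ into the linear part, while 0.2, 0.6, 0.2 is significant overall but has no linear component.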

  19. Evaluation • three research points: • low support difference • few attribute-value pairs have high support, so lower bounds cannot be exploited • pruning rules • as $\delta \to 0$, statistical significance pruning becomes more important • filtering rules

  20. Conclusion • the STUCCO algorithm combines statistical hypothesis testing with search for mining contrast sets • STUCCO has • pruning rules for efficient mining at low support differences • guaranteed control over false positives • linear trend detection • compact summarization of results
