This work by Ying Hu at the University of British Columbia focuses on Treatment Learning, a novel approach integrating learning algorithms and data mining techniques. It reviews the TAR2 and TAR3 algorithms and their applications, emphasizing their efficiency in decision tree learning and feature selection. Through pilot studies and case analyses, the research demonstrates the effectiveness of Treatment Learning in diverse domains, aiming to enhance classification accuracy in machine learning tasks. Key findings underscore the importance of attribute selection and its impact on model performance.
Treatment Learning: Implementation and Application
Ying Hu
Electrical & Computer Engineering, University of British Columbia
Outline
• An example
• Background review
• TAR2 treatment learner (TARZAN: Tim Menzies; TAR2: Ying Hu & Tim Menzies)
• TAR3: improved TAR2 (Ying Hu)
• Evaluation of treatment learning
• Applications of treatment learning
• Conclusion
First Impression
• Boston Housing dataset (506 examples, 4 classes)
• C4.5's decision tree: [decision-tree figure]
• Treatment learner:
  • high: 6.7 <= rooms < 9.8 and 12.6 <= parent-teacher ratio < 15.9
  • low: 0.6 <= nitric oxide < 1.9 and 17.16 <= living standard < 39
Review: Background
• What is KDD?
  • KDD = Knowledge Discovery in Databases [fayyad96]
  • Data mining: one step in the KDD process
  • Machine learning: the learning algorithms
• Common data mining tasks
  • Classification
    • Decision tree induction (C4.5) [quinlan86]
    • Nearest neighbors [cover67]
    • Neural networks [rosenblatt62]
    • Naive Bayes classifier [duda73]
  • Association rule mining
    • APRIORI algorithm [agrawal93]
    • Variants of APRIORI
Treatment Learning: Definition
• Input: classified dataset
• Assume: classes are ordered
• Output: Rx = conjunction of attribute-value pairs
  • Size of Rx = # of pairs in the Rx
• confidence(Rx w.r.t. Class) = P(Class | Rx)
• Goal: find Rx that have different levels of confidence across classes
• Evaluate Rx: lift (see the sketch after this list)
• Visualization form of output
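A minimal sketch of these definitions in Python, not TAR2's actual code: the function names are mine, and the lift formula here, which compares the weighted class mix under Rx against the baseline mix, is one plausible reading of the class-ordering assumption.

```python
# Illustrative sketch only: names and the exact lift formula are
# assumptions, not TAR2's implementation.

def matches(rx, example):
    """An Rx is a conjunction of attribute-value pairs."""
    return all(example.get(attr) == val for attr, val in rx.items())

def confidence(rx, cls, data):
    """confidence(Rx w.r.t. Class) = P(Class | Rx)."""
    covered = [ex for ex in data if matches(rx, ex)]
    if not covered:
        return 0.0
    return sum(ex["class"] == cls for ex in covered) / len(covered)

def lift(rx, data, weights):
    """Compare the weighted class mix under Rx to the baseline mix;
    `weights` encodes the class ordering (higher = more desirable)."""
    covered = [ex for ex in data if matches(rx, ex)]
    if not covered:
        return 0.0
    baseline = sum(weights[ex["class"]] for ex in data) / len(data)
    treated = sum(weights[ex["class"]] for ex in covered) / len(covered)
    return treated / baseline
```

On the Boston housing example, the "high" treatment would be an rx whose lift over the weighted class distribution exceeds the reporting threshold.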
Motivation: Narrow Funnel Effect
• When is enough learning enough?
  • Using < 50% of attributes, accuracy decreases only 3-5% [shavlik91]
  • A 1-level decision tree is comparable to C4 [holte93]
  • Data engineering: ignoring 81% of features resulted in a 2% increase in accuracy [kohavi97]
  • Scheduling: random sampling outperforms complete (depth-first) search [crawford94]
• Narrow funnel effect
  • Control variables vs. derived variables
• Treatment learning: finding the funnel variables
TAR2: The Algorithm
• Search + attribute utility estimation
  • Estimation heuristic: confidence1
  • Search: depth-first search
  • Search space: confidence1 > threshold
• Discretization: equal-width interval binning (see the sketch below)
• Reporting Rx
  • lift(Rx) > threshold
• Software package and online distribution
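The discretization step is standard equal-width binning; a minimal sketch (the function name and the constant-column guard are mine):

```python
def equal_width_bins(values, n_bins):
    """Equal-width interval binning: split [min, max] into n_bins
    intervals of identical width; return a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```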
The Pilot Case Study
• Requirements optimization
• Goal: an optimal set of mitigations in a cost-effective manner
[Diagram relating Requirements, Risks, Mitigations, Cost, and Benefit]
• Iterative learning cycle
The Pilot Study (continued)
• Cost-benefit distribution (30/99 mitigations)
• Compared to simulated annealing
Problem of TAR2
• Runtime vs. Rx size
  • To generate Rx of size r
  • To generate Rx of sizes [1..N]
  (formulas reconstructed below)
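The runtime formulas on this slide did not survive extraction. Assuming M attribute-value pairs pass the confidence1 threshold, the usual combinatorial count is a plausible reconstruction and explains why runtime explodes with Rx size:

$$\#\{\,Rx \text{ of size } r\,\} = \binom{M}{r}, \qquad \sum_{r=1}^{N} \binom{M}{r} \;\le\; 2^{M}.$$

This exponential blow-up in the depth-first search space is what TAR3's random sampling, described next, avoids.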
TAR3: The Improvement
• Random sampling
• Key idea:
  • Treat the confidence1 distribution as a probability distribution
  • Sample Rx from the confidence1 distribution
• Steps (see the sketch below):
  • Place items (ai) in increasing order of confidence1 value
  • Compute the CDF of each ai
  • Sample a uniform value u in [0..1]
  • The sample is the least ai whose CDF > u
  • Repeat until we get an Rx of the given size
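The steps above translate almost directly into code. A minimal sketch, with names, the size guard, and duplicate handling via a set as my assumptions rather than TAR3's internals:

```python
import random
from bisect import bisect_right

def sample_rx(items, conf1, size):
    """Sketch of TAR3's sampling step: draw attribute-value pairs
    with probability proportional to their confidence1 scores.
    items: list of (attribute, value) pairs
    conf1: dict mapping each item to its confidence1 value"""
    size = min(size, len(items))  # cannot exceed the item pool
    ordered = sorted(items, key=lambda it: conf1[it])  # increasing confidence1
    total = sum(conf1[it] for it in ordered)
    cdf, acc = [], 0.0
    for it in ordered:
        acc += conf1[it] / total
        cdf.append(acc)
    rx = set()
    while len(rx) < size:
        u = random.random()  # uniform value in [0, 1)
        idx = min(bisect_right(cdf, u), len(ordered) - 1)
        rx.add(ordered[idx])  # least item whose CDF exceeds u
    return rx
```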
Comparison of Efficiency
• Runtime vs. data size
• Runtime vs. Rx size
• Runtime vs. TAR2
Comparison of Results
• 10 UCI domains: identical best Rx
• pilot2 dataset (58 × 30k)
  • Mean and STD in each round
  • Final Rx: TAR2 = 19, TAR3 = 20
External Evaluation
• FSS framework (10 UCI datasets), sketched below:
[Diagram: all attributes → feature subset selector (TAR2less) → some attributes → learning (C4.5, Naive Bayes) → compare accuracy]
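A sketch of this evaluation loop, using scikit-learn's GaussianNB and DecisionTreeClassifier as stand-ins for the Naive Bayes and C4.5 learners, and a hypothetical `tar2less_select` hook standing in for the treatment-learner-based selector:

```python
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier  # stand-in for C4.5

def compare_fss(X, y, tar2less_select):
    """Score each learner on all attributes vs. the subset picked by
    the treatment learner. X is assumed to be a NumPy array;
    `tar2less_select` is a hypothetical hook returning the column
    indices of attributes that appear in the best treatments."""
    cols = tar2less_select(X, y)
    for name, clf in [("Naive Bayes", GaussianNB()),
                      ("C4.5-like tree", DecisionTreeClassifier())]:
        all_acc = cross_val_score(clf, X, y, cv=10).mean()
        sub_acc = cross_val_score(clf, X[:, cols], y, cv=10).mean()
        print(f"{name}: all attributes {all_acc:.3f}, subset {sub_acc:.3f}")
```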
The Results
• Number of attributes selected
• Accuracy using Naive Bayes (avg increase = 0.8%)
• Accuracy using C4.5 (avg decrease = 0.9%)
Comparison to Other FSS Methods
• # of attributes selected (Naive Bayes)
• # of attributes selected (C4.5)
• In 17/20 cases, the fewest attributes selected
• Further evidence for funnels
Applications of Treatment Learning
• Download site: http://www.ece.ubc.ca/~yingh/
• Collaborators: JPL, WV, Portland, Miami
• Application examples:
  • pair programming vs. conventional programming
  • identifying software metrics that are superior error indicators
  • identifying attributes that make FSMs easy to test
  • finding the best software inspection policy for a particular software development organization
• Other applications: 1 journal, 4 conference, and 6 workshop papers
Main Contributions
• A new learning approach
• A novel mining algorithm
• Algorithm optimization
• Complete package and online distribution
• The narrow funnel effect
• Treatment learner as FSS
• Applications in various research domains
======================
Some notes follow
Rx Definition Example
• Input example: classified dataset
• Output example: Rx = conjunction of attribute-value pairs
• confidence(Rx w.r.t. C) = P(C | Rx)
TAR2 in Practice
• Works in domains containing narrow funnels:
  • A tail in the confidence1 distribution
  • A small number of variables with disproportionately large confidence1 values
  • A satisfactory Rx of small size (< 6)
Background: Classification
• 2-step procedure
  • The learning phase
  • The testing phase
• Strategies employed
  • Eager learning
    • Decision tree induction (e.g., C4.5)
    • Neural networks (e.g., backpropagation)
  • Lazy learning
    • Nearest neighbor classifiers (e.g., k-nearest neighbor)
Background: Association Rules
• Possible rule: B => C,E [support = 2%, confidence = 80%]
  • support(X -> Y) = P(X ∧ Y), the fraction of transactions containing both X and Y
  • confidence(X -> Y) = P(Y | X)
  (see the toy example below)
• Representative algorithms
  • APRIORI
    • Apriori property of large itemsets
  • Max-Miner
    • More concise representation of the discovered rules
    • Different pruning strategies
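Both measures are easy to compute directly; a toy sketch with transactions represented as Python sets (function names mine):

```python
def support(itemset, transactions):
    """support(X) = fraction of transactions containing all of X."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_stats(lhs, rhs, transactions):
    """Return (support, confidence) for the rule lhs -> rhs:
    support = P(lhs and rhs), confidence = P(rhs | lhs)."""
    joint = support(lhs | rhs, transactions)
    return joint, joint / support(lhs, transactions)

# Toy check of a rule like B => C,E on four transactions:
txns = [{"B", "C", "E"}, {"B", "C"}, {"A", "D"}, {"B", "C", "E"}]
print(rule_stats({"B"}, {"C", "E"}, txns))  # (0.5, 0.666...)
```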
Background: Extensions
• CBA classifier
  • CBA = Classification Based on Association
  • X => Y, where Y is a class label
  • More accurate than C4.5 (16/26 datasets)
• JEP classifier
  • JEP = Jumping Emerging Patterns
  • support(X w.r.t. D1) = 0, support(X w.r.t. D2) > 0 (membership test sketched below)
  • Model: a collection of JEPs
  • Classify: maximum collective impact
  • More accurate than both C4.5 and CBA (15/25 datasets)
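The JEP condition above reduces to a one-line membership test; a minimal sketch over datasets represented as lists of item sets:

```python
def is_jep(itemset, d1, d2):
    """Jumping Emerging Pattern per the definition above:
    support is zero in D1 but strictly positive in D2."""
    return (not any(itemset <= t for t in d1)) and any(itemset <= t for t in d2)
```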
Background: Standard FSS Methods
• Information gain attribute ranking
• Relief
• Principal Component Analysis (PCA)
• Correlation-based feature selection
• Consistency-based subset evaluation
• Wrapper subset evaluation
Comparison
• Relation to classification
  • Class boundary / class density
  • Class weighting
• Relation to association rule mining
  • Multiple classes / no class
  • Confidence-based pruning
• Relation to change-detection algorithms
  • support: |P(X | y=c1) - P(X | y=c2)|
  • confidence: |P(y=c1 | X) - P(y=c2 | X)|
  • Bayes' rule connects the two (see below)
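The Bayes'-rule bullet presumably ties the two contrast measures together: since

$$P(y=c \mid X) \;=\; \frac{P(X \mid y=c)\,P(y=c)}{P(X)},$$

the confidence-based contrast is the support-based contrast re-weighted by the class priors and normalized by P(X).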
Confidence Property
• Universal-existential upward closure
  • R1: Age.young -> Salary.low
  • R2: Age.young, Gender.m -> Salary.low
  • R3: Age.young, Gender.f -> Salary.low
• Long rules tend to have high confidence
• Large Rx tend to have high lift values
TAR3: Usability
• More user-friendly
• Intuitive default settings