
Generation of Attribute Value Taxonomies from Data for Data-Driven Construction of Accurate and Compact Classifiers


Presentation Transcript


  1. Generation of Attribute Value Taxonomies from Data for Data-Driven Construction of Accurate and Compact Classifiers Dae-Ki Kang, Adrian Silvescu, Jun Zhang and Vasant Honavar Artificial Intelligence Research Laboratory Iowa State University, USA This research is sponsored in part by grants from the National Science Foundation (IIS 0219699) and the National Institutes of Health (GM066387)

  2. Paper Highlights • AVT-Learner, an algorithm for automated construction of attribute value taxonomies from data • Evaluation of the AVTs generated by AVT-Learner using an AVT-aware learning algorithm on benchmark data sets

  3. Overview • Background and Motivation • AVT-Learner Algorithm • Experimental Results • Summary

  4. Attribute Value Taxonomies (AVTs): ISA hierarchies [Figure: a human-supplied AVT for student status, illustrating the ISA relationship between abstract values and primitive values, and a cut through the hierarchy]

  5. Motivations for learning AVTs from data • Learning from AVTs and data has several advantages (Zhang & Honavar, 2003, 2004) • Preference for simple, comprehensible, yet accurate and robust classifiers • When data are limited, statistics estimated from abstract values are often more reliable than statistics estimated from primitive values • However, in most domains AVTs are unavailable, and manual AVT generation is tedious • Need to generate AVTs that are useful for classification tasks

  6. Learning Scenario with AVT-Learner

  7. Overview • Background and Motivation • AVT-Learner Algorithm • Experimental Results • Summary

  8. AVT-Learner • An algorithm for automated construction of AVTs from a data set of instances wherein each instance is described by an ordered tuple of N nominal attribute values and a class label • Hierarchical agglomerative clustering (HAC) of the attribute values according to the distribution of classes that co-occur with them • Using the pairwise divergence between the distributions of class labels associated with the corresponding attribute values as a measure of the dissimilarity between the attribute values.

  9. Problem Definitions • A = {A1, A2, …, An} – nominal attributes • Vi – the set of primitive values of attribute Ai • C = {C1, C2, …, Ck} – mutually disjoint class labels • Data D ⊆ V1 × V2 × … × Vn × C • T = {T1, T2, …, Tn} – a set of AVTs s.t. Ti is the AVT associated with attribute Ai • Learning AVTs from data – given a data set D and a pairwise divergence measure DM(P(x)||Q(x)), output a set of AVTs T = {T1, T2, …, Tn} s.t. each Ti corresponds to a hierarchical grouping of the values in Vi based on the specified measure

  10. Major steps of AVT-Learner (a runnable sketch follows below) • Initialize the cut L = {vi1, …, vij, …, vil} with the primitive values of Ai • Compare and choose • For each value lj in L, estimate the class-conditional probability distribution P(C|lj) • Find (x, y) = argmin x≠y DM(P(C|lx) || P(C|ly)) • Merge and update • lxy ← lx ∪ ly • L ← (L \ {lx, ly}) ∪ {lxy} • Repeat until |L| = 1
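A minimal, runnable sketch of this loop for a single attribute, assuming the data arrive as (value, class label) pairs. The names (`build_avt`, `js_divergence`, tuple-encoded clusters) and the frequency-weighted mixture used when merging distributions are illustrative choices, not the authors' implementation.

```python
from collections import Counter, defaultdict
from math import log2

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete class distributions
    given as {class_label: probability} dicts."""
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in set(p) | set(q)}
    def kl(a, b):
        return sum(a[c] * log2(a[c] / b[c]) for c in a if a[c] > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def class_distributions(pairs):
    """Estimate P(C | v) for each primitive attribute value v by counting."""
    counts = defaultdict(Counter)
    for value, label in pairs:
        counts[value][label] += 1
    return {v: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for v, cnt in counts.items()}

def build_avt(pairs):
    """Hierarchical agglomerative clustering of attribute values: repeatedly
    merge the two clusters in the cut whose class distributions are closest."""
    dist = class_distributions(pairs)
    cut = {(v,): dist[v] for v in dist}        # cut: cluster -> P(C | cluster)
    size = Counter()
    for v, _ in pairs:
        size[(v,)] += 1
    parent = {}                                # child cluster -> parent cluster
    while len(cut) > 1:
        # find the pair of clusters with minimum pairwise JS divergence
        x, y = min(((a, b) for a in cut for b in cut if a < b),
                   key=lambda ab: js_divergence(cut[ab[0]], cut[ab[1]]))
        merged = tuple(sorted(x + y))
        nx, ny = size[x], size[y]
        # the merged cluster's distribution is a frequency-weighted mixture
        cut[merged] = {c: (nx * cut[x].get(c, 0.0) + ny * cut[y].get(c, 0.0))
                          / (nx + ny)
                       for c in set(cut[x]) | set(cut[y])}
        size[merged] = nx + ny
        parent[x], parent[y] = merged, merged
        del cut[x], cut[y]
    return parent                              # the merge tree is the AVT
```

Running `build_avt` once per attribute yields the set of taxonomies T = {T1, …, Tn}; each recorded merge corresponds to one internal node of the binary AVT.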

  11. AVT Construction for the Odor attribute [Figure: dendrogram built by AVT-Learner for the mushroom Odor attribute: the primitive values {m}, {y}, {s}, {f}, {c}, {p}, {a}, {l}, {n} are successively merged, most similar pair first ({s,y}, {c,p}, {a,l}, then {s,y,f}, {a,l,n}, {s,y,f,c,p}, {m,s,y,f,c,p}), until a single root remains]

  12. More about AVT-Learner • Similarity measure – pairwise Jensen-Shannon divergence • For continuous-valued attributes, intervals are defined based on the observed values of the attribute in the data set
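For reference, the pairwise Jensen-Shannon divergence between the class distributions P = P(C|lx) and Q = P(C|ly) is the symmetrized, smoothed relative entropy; the equal-weight form below is the standard definition (weighted variants exist as well):

```latex
\mathrm{JS}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q),
\qquad \mathrm{KL}(P \,\|\, M) = \sum_{c \in C} P(c) \log \frac{P(c)}{M(c)}
```

Unlike the KL divergence alone, JS is symmetric and always finite, which makes it a well-behaved dissimilarity measure for the merge step.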

  13. Evaluation of AVTs • We use the Attribute Value Taxonomy-guided Naïve Bayes Learner, AVT-NBL (Zhang & Honavar, 2004) • Why AVT-NBL? • AVT-NBL offers an effective approach to learning compact (hence more comprehensible) and accurate classifiers from AVTs and data

  14. AVT-NBL algorithm (see the sketch below) • Finds the most accurate Naïve Bayes classifier using the most abstract attribute values possible • Makes the same assumption as NBL: each attribute is independent of the other attributes given the class • Starts with the NB classifier based on the most abstract value of each attribute and successively refines the classifier (hypothesis) • Uses a tradeoff criterion between the accuracy and the complexity of the resulting classifier
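A minimal sketch of that refinement loop, assuming a generic `score(cut, data)` function that rewards accuracy and penalizes classifier complexity (the concrete tradeoff criterion is the one defined in Zhang & Honavar, 2004); all names here are hypothetical:

```python
def refine_cut(avt_root, children, score, data):
    """Greedy top-down refinement: start from the most abstract cut (the AVT
    root) and replace a node in the cut by its children whenever doing so
    improves the accuracy/complexity tradeoff.
    children(node) -> list of child nodes (empty for primitive values);
    score(cut, data) -> float, higher is better."""
    cut = [avt_root]
    improved = True
    while improved:
        improved = False
        for node in cut:
            kids = children(node)
            if not kids:
                continue                      # primitive value: cannot refine
            candidate = [n for n in cut if n != node] + kids
            if score(candidate, data) > score(cut, data):
                cut = candidate               # keep the refinement
                improved = True
                break                         # rescan the new cut from scratch
    return cut   # the abstraction level at which NB parameters are estimated
```

The returned cut determines which (abstract) values the Naïve Bayes conditional probability tables are estimated over.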

  15. Overview • Background and Motivation • AVT-Learner Algorithm • Experimental Results • Summary

  16. Experimental Settings • Settings • AVT-NBL with AVTs generated by AVT-Learner • AVT-NBL with human-supplied taxonomies • Naïve Bayes Learner (NBL) • Data sets: • 37 benchmark datasets from the UCI Machine Learning Repository • Simulated missing attribute values • Stratified 10-fold cross-validation

  17. Performance comparisons • AVT-Learner-generated AVTs vs. human-supplied AVTs vs. no AVTs (standard NBL) • AVT-Learner-generated AVTs vs. no AVTs • Binary AVTs vs. k-ary AVTs

  18. Comparison with human-supplied AVTs • Compare performance between human-supplied AVTs and AVT-Learner-generated AVTs on the Mushroom and Nursery datasets from the UCI Repository • Explore performance on datasets with different percentages (0%–50%) of simulated missing attribute values • Assume the missing values are uniformly distributed over the nominal attributes (see the sketch below)
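One way to simulate such uniformly distributed missing values (a sketch, not the authors' experimental script): for a target rate p, each nominal attribute entry is independently blanked with probability p.

```python
import random

def inject_missing(rows, nominal_cols, rate, seed=0):
    """Replace each nominal attribute value with None (missing) with
    probability `rate`, uniformly across instances and attributes."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    out = []
    for row in rows:
        row = list(row)
        for j in nominal_cols:                # class label column is excluded
            if rng.random() < rate:
                row[j] = None
        out.append(row)
    return out
```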

  19. Figure 1(a). Error rate comparison of classifiers generated by NBL, AVT-NBL JS (Jensen-Shannon divergence, AVT-Learner generated), and AVT-NBL HT (human-supplied AVTs) on Mushroom data

  20. Figure 1(b). Size comparison of classifiers generated by NBL, AVT-NBL JS (Jensen-Shannon divergence, AVT-Learner generated), and AVT-NBL HT (human-supplied AVTs) on Mushroom data

  21. Result Shown from Figure 1(a) & 1(b) • In terms of the error rates and the size of the resulting classifiers, AVTs generated by AVT-Learner are competitive with human-supplied AVTs when used by AVT-NBL

  22. Further experiments • For most data sets, no human-supplied AVTs are available • Compare performance between standard NBL and AVT-NBL with AVT-Learner-generated AVTs on 37 data sets from the UCI repository

  23. Table 1. Comparison of accuracy and size of classifiers generated by standard NBL and AVT-NBL with AVT-Learner

  24. Results Shown from Table 1 • AVT-Learner can generate useful AVTs when no human-supplied AVTs are available (which is common in most application domains) • AVTs generated by AVT-Learner, when used by AVT-NBL, yield substantially more compact Naive Bayes Classifiers than those produced by NBL

  25. Binary vs. k-ary • The AVTs generated by AVT-Learner are binary trees • Do k-ary AVTs yield better results when used with AVT-NBL? • k-ary clustering is obtained by merging internal nodes (parent-child pairs) of the AVT produced by binary clustering (see the sketch below)
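A sketch of one way to perform such a merge, assuming the binary AVT is stored as a `children` dict mapping each node to the list of its children (names and encoding are illustrative):

```python
def collapse_once(children, dissimilarity):
    """Merge one internal parent-child pair: remove the child node and attach
    its children directly to the parent, raising the parent's arity by one.
    `dissimilarity(a, b)` compares the class distributions of two nodes
    (e.g., Jensen-Shannon divergence)."""
    internal = {n for n in children if children[n]}
    pairs = [(p, c) for p in internal for c in children[p] if c in internal]
    if not pairs:
        return False                          # no internal pair left to merge
    parent, child = min(pairs, key=lambda pc: dissimilarity(*pc))
    i = children[parent].index(child)
    children[parent][i:i + 1] = children.pop(child)   # splice child's children
    return True
```

Calling `collapse_once` repeatedly turns the binary tree into 3-ary, then 4-ary AVTs, matching the construction illustrated on the next slide.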

  26. Merging internal nodes for 4-ary clustering [Figure: the binary Odor AVT from slide 11, where the most similar parent-child pair of internal nodes (here {m,s,y,f,c,p} and its child {s,y,f,c,p}) is collapsed so that the child's children are attached directly to the parent]

  27. Table 2. Accuracy comparison of classifiers generated by AVT-NBL used with (a) 2-ary AVTs, (b) 3-ary AVTs, and (c) 4-ary AVTs

  28. Results Shown from Table 2 • AVT-NBL generally works best when binary AVTs are used • Merging internal nodes reduces the number of internal nodes in the AVT, which shrinks the space of cuts that AVT-NBL can search and thus leads to less compact classifiers

  29. Summary • Human-supplied AVTs are unavailable in many application domains • AVT-Learner is a simple algorithm for automated construction of AVTs from data • The AVTs generated by AVT-Learner are competitive with human-supplied AVTs in terms of both the error rate and the size of the resulting classifiers • AVT-Learner is effective in generating AVTs that, when used by AVT-NBL, result in classifiers that are substantially more compact (and often more accurate) than those obtained by the standard Naïve Bayes Learner (which does not use AVTs) in domains where human-supplied AVTs are not available

  30. Future Work • Extending AVT-Learner to learn AVTs that correspond to tangled hierarchies (which can be represented by directed acyclic graphs) • Learning AVTs from data for a broad range of real-world applications • Developing algorithms for learning hierarchical ontologies based on part-whole and other relations, as opposed to the ISA relations captured by an AVT

  31. Thank You! Questions?
