
### Generation of Attribute Value Taxonomies from Data for Data-Driven Construction of Accurate and Compact Classifiers

Dae-Ki Kang, Adrian Silvescu, Jun Zhang and Vasant Honavar

Artificial Intelligence Research Laboratory

Iowa State University, USA

This research is sponsored in part by grants from the National Science Foundation (IIS 0219699) and National Institutes of Health (GM066387)

Paper Highlights

- AVT-Learner, an algorithm for automated construction of attribute value taxonomies from data
- Evaluation of the AVTs generated by AVT-Learner using an AVT-aware learning algorithm on benchmark data sets

Overview

- Background and Motivation
- AVT-Learner Algorithm
- Experimental Results
- Summary

Attribute Value Taxonomies (AVT): ISA hierarchies

[Figure: human-supplied Attribute Value Taxonomy (AVT) for student status, with labels marking the ISA relationship, the abstract values, the primitive values, and a cut.]

Motivations for learning AVTs from data

- Learning from AVTs and data has several advantages (Zhang & Honavar, 2003, 2004)
  - Preference for simple, comprehensible, yet accurate and robust classifiers
  - When data are limited, statistics estimated from abstract values are often more reliable than statistics estimated from primitive values
- However, in most domains AVTs are unavailable, and manual AVT generation is tedious
  - Need to generate AVTs that are useful for classification tasks

Learning Scenario with AVT-Learner

Overview

- Background and Motivation
- AVT-Learner Algorithm
- Experimental Results
- Summary

AVT-Learner

- An algorithm for automated construction of AVTs from a data set of instances wherein each instance is described by an ordered tuple of N nominal attribute values and a class label
- Hierarchical agglomerative clustering (HAC) of the attribute values according to the distribution of classes that co-occur with them
- Using the pairwise divergence between the distributions of class labels associated with the corresponding attribute values as a measure of the dissimilarity between the attribute values.
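The class distributions that the clustering compares can be estimated by simple counting. The following is a minimal sketch, not the authors' code: the function name `class_distributions` and the toy data are hypothetical, and plain relative frequencies are assumed (no smoothing).

```python
from collections import Counter, defaultdict

def class_distributions(values, labels):
    """Estimate P(C | v) for every value v of one nominal attribute."""
    counts = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    classes = sorted(set(labels))
    dists = {}
    for v, ctr in counts.items():
        total = sum(ctr.values())
        # Relative frequency of each class among instances with value v.
        dists[v] = [ctr[c] / total for c in classes]
    return classes, dists

# Hypothetical toy data: one attribute column and the class column.
odor = ["a", "a", "f", "f", "n", "n", "n", "f"]
label = ["e", "e", "p", "p", "e", "e", "p", "p"]
classes, dists = class_distributions(odor, label)
```

Values whose distributions are close (here none are identical, but `a` and `n` both lean towards class `e`) are the candidates for early merging.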

Problem Definitions

- A = {A1, A2, …, An} – nominal attributes
- Vi – the set of primitive values of attribute Ai
- C = {C1, C2, …, Ck} – mutually disjoint class labels
- Data D ⊆ V1 × V2 × … × Vn × C
- T = {T1, T2, …, Tn} – a set of AVTs such that Ti is the AVT associated with attribute Ai
- Learning AVTs from data – given a data set D and a divergence measure DM(P(x)||Q(x)), output a set of AVTs T = {T1, T2, …, Tn} such that each Ti corresponds to a hierarchical grouping of the values in Vi based on the specified measure

Major steps of AVT-Learner

- Initialize the cut L = {vi1, …, vij, …, vil}
- Compare and choose
  - For each value lj in L and class label ck, calculate the class-conditional probability distribution P(C|lj)
  - Find (x, y) = argmin DM(P(C|lx) || P(C|ly)), x ≠ y
- Merge and update
  - lxy ← lx ∪ ly
  - L ← (L \ {lx, ly}) ∪ {lxy}
- Repeat while |L| > 1
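The steps above can be sketched for a single attribute as follows. This is an illustrative reading of the slide, not the authors' implementation: the names `js_divergence` and `avt_learner` are hypothetical, the equal-weight Jensen-Shannon divergence is assumed as DM, and the distribution of a merged node is assumed to come from the pooled counts of its members.

```python
import math
from collections import Counter
from itertools import combinations

def js_divergence(p, q):
    """Equal-weight Jensen-Shannon divergence (base 2) of two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(u, v):
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def avt_learner(values, labels):
    """Build a binary AVT for one attribute by agglomerative clustering."""
    classes = sorted(set(labels))
    counts = {}
    for v, c in zip(values, labels):
        counts.setdefault(frozenset([v]), Counter())[c] += 1

    def dist(node):
        total = sum(counts[node].values())
        return [counts[node][c] / total for c in classes]

    cut = set(counts)        # initial cut: the primitive values
    children = {}            # merged node -> the two nodes it joined
    while len(cut) > 1:
        # Compare and choose: the pair with the smallest divergence.
        x, y = min(combinations(sorted(cut, key=sorted), 2),
                   key=lambda pair: js_divergence(dist(pair[0]), dist(pair[1])))
        # Merge and update the cut.
        merged = x | y
        counts[merged] = counts[x] + counts[y]
        children[merged] = (x, y)
        cut = (cut - {x, y}) | {merged}
    return next(iter(cut)), children

# Hypothetical toy data, not the Mushroom data set.
values = ["a", "a", "l", "l", "f", "f", "n", "n", "n", "n"]
labels = ["e", "e", "e", "e", "p", "p", "e", "e", "e", "p"]
root, children = avt_learner(values, labels)
```

On this toy data `a` and `l` have identical class distributions and merge first; `n` then joins them, and `f` is attached last at the root.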

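AVT Construction for Odor attribute

[Figure: binary AVT for the Odor attribute, built by repeatedly merging the most similar pair of values until a single root remains. The hierarchy below is reconstructed from the node labels on the slide.]

- Odor
  - {m,s,y,f,c,p}
    - {m}
    - {s,y,f,c,p}
      - {s,y,f}
        - {s,y}
          - {s}
          - {y}
        - {f}
      - {c,p}
        - {c}
        - {p}
  - {a,l,n}
    - {a,l}
      - {a}
      - {l}
    - {n}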

More about AVT-Learner

- Dissimilarity measure – pairwise Jensen-Shannon divergence
- For continuous-valued attributes, define intervals based on observed values for the attribute in the data set
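For reference, the standard equal-weight form of the Jensen-Shannon divergence between two class distributions P and Q is given below. The slides do not say which weighting or logarithm base is used, so the equal weights and base 2 here are assumptions.

$$
\mathrm{JS}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q)
$$

$$
\mathrm{KL}(P \,\|\, M) = \sum_{c \in C} P(c)\,\log_2 \frac{P(c)}{M(c)}
$$

In this form the divergence is symmetric in P and Q, is zero exactly when P = Q, and is at most 1, which makes it usable as a pairwise dissimilarity even when some class probabilities are zero.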

Evaluation of AVTs

- We use Attribute Value Taxonomies guided Naïve Bayes Learner (AVT-NBL) – (Zhang & Honavar, 2004)
- Why AVT-NBL?
- AVT-NBL offers an effective approach to learning compact (hence more comprehensible) accurate classifiers from AVTs and data

AVT-NBL algorithm

- Find the most accurate Naïve Bayes classifier using the most abstract attribute values
- Same assumption as NBL that each attribute is independent of the other attributes given the class
- Starting with the NBL that is based on the most abstract value of each attribute and successively refining the classifier (hypothesis)
- Using a tradeoff criterion between the accuracy and complexity of the resulting classifier

Overview

- Background and Motivation
- AVT-Learner Algorithm
- Experimental Results
- Summary

Experimental Settings

- Settings
- AVT-NBL with AVT generated by AVT-Learner
- AVT-NBL with human-supplied taxonomy
- Naïve Bayes Learner (NBL)

- Data sets:
- 37 benchmark datasets from UCI Machine Learning Repository
- Simulated missing attribute values

- Use stratified 10-fold cross validation
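A stratified split keeps the class proportions roughly equal across folds. The sketch below shows one simple way to produce such folds; the function name `stratified_folds`, the round-robin assignment, and the toy labels are assumptions for illustration and are not taken from the experiments.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split instance indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for c in sorted(by_class):
        idx = by_class[c]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin over the folds.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds

# Hypothetical label column: 60 instances of one class, 40 of another.
labels = ["e"] * 60 + ["p"] * 40
folds = stratified_folds(labels)
```

Each fold serves once as the test set while the remaining nine are used for training, and the reported figures are averaged over the ten runs.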

Performance comparisons

- AVT-Learner generated AVTs vs. human-supplied AVTs vs. no AVTs (standard NBL)
- AVT-Learner generated AVTs vs. no AVTs

- Binary AVTs vs. k-ary AVTs

Comparison with human-supplied AVTs

- Compare performance between human-supplied AVTs and AVT-Learner generated AVTs on Mushroom and Nursery datasets from UCI Repository
- Explore the performance on datasets with different percentage (0%~50%) of simulated missing attribute values
- Assume the missing values are uniformly distributed on the nominal attributes
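One way to simulate such missing values is to blank out a fixed fraction of attribute cells chosen uniformly at random. This is a sketch under that reading of the slide; the function name `mask_missing`, the `"?"` marker, and the toy rows are hypothetical, and the class labels are assumed to be stored separately so that they are never masked.

```python
import random

def mask_missing(rows, fraction, missing="?", seed=0):
    """Return a copy of rows with a fraction of attribute cells masked."""
    rng = random.Random(seed)
    cells = [(r, a) for r in range(len(rows)) for a in range(len(rows[r]))]
    # Every cell is equally likely to be chosen, with no cell chosen twice.
    chosen = rng.sample(cells, round(fraction * len(cells)))
    masked = [list(row) for row in rows]
    for r, a in chosen:
        masked[r][a] = missing
    return masked

# Hypothetical attribute rows (class labels kept elsewhere).
rows = [["a", "x"], ["l", "y"], ["n", "x"], ["f", "y"], ["c", "x"]]
masked = mask_missing(rows, 0.3)
```

Running the same experiment for fractions from 0.0 to 0.5 would give the range of missing-value rates described above.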

Figure 1(a). Error rate comparison of classifiers generated by NBL, AVT-NBL JS (Jensen-Shannon divergence, AVT-Learner generated), and AVT-NBL HT (human-supplied AVTs) on Mushroom data

Figure 1(b). Size comparison of classifiers generated by NBL, AVT-NBL JS (Jensen-Shannon divergence, AVT-Learner generated), and AVT-NBL HT (human-supplied AVTs) on Mushroom data

Results shown in Figures 1(a) and 1(b)

- In terms of the error rates and the size of the resulting classifiers, AVTs generated by AVT-Learner are competitive with human-supplied AVTs when used by AVT-NBL

Further experiments

- For most data sets, there are no human-supplied AVTs available
- Compare performance between standard NBL and AVT-NBL with AVT-Learner generated AVTs on 37 data sets from UCI

Table 1. Comparison of accuracy and size of classifiers generated by standard NBL and AVT-NBL with AVT-Learner

Results shown in Table 1

- AVT-Learner can generate useful AVTs when no human-supplied AVTs are available (which is common in most application domains)
- AVTs generated by AVT-Learner, when used by AVT-NBL, yield substantially more compact Naïve Bayes classifiers than those produced by NBL

Binary vs. k-ary

- The AVTs generated by AVT-Learner are binary trees
- Do k-ary AVTs yield better results when used with AVT-NBL?
- K-ary clustering by merging two internal nodes (a parent-child pair) of the AVT generated by binary clustering
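Merging a parent-child pair amounts to removing the child node and attaching its children directly to the parent. The sketch below illustrates this on the Odor tree from the earlier slide; the function name `merge_into_parent` and the string node names are hypothetical, and the choice of which pair to merge (the slides pick the most similar pair) is left to the caller.

```python
def merge_into_parent(children, parent, child):
    """Collapse an internal child node into its parent.

    children maps each internal node to the list of its child nodes;
    leaves do not appear as keys. The input mapping is not modified.
    """
    merged = dict(children)
    kids = []
    for node in children[parent]:
        # Replace the collapsed child by its own children, in place.
        kids.extend(children[child] if node == child else [node])
    merged[parent] = kids
    del merged[child]
    return merged

# Binary AVT for Odor, with node names formed from the values they cover.
binary_avt = {
    "Odor": ["msyfcp", "aln"],
    "msyfcp": ["m", "syfcp"],
    "syfcp": ["syf", "cp"],
    "syf": ["sy", "f"],
    "sy": ["s", "y"],
    "cp": ["c", "p"],
    "aln": ["al", "n"],
    "al": ["a", "l"],
}
ternary_step = merge_into_parent(binary_avt, "msyfcp", "syfcp")
```

After this one merge the node `msyfcp` has three children (`m`, `syf`, `cp`); repeating the operation on further pairs raises the branching factor towards the desired k.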

Merging internal nodes for 4-ary clustering

[Figure: the binary AVT for the Odor attribute (root Odor; internal nodes {m,s,y,f,c,p}, {s,y,f,c,p}, {s,y,f}, {a,l,n}, {s,y}, {c,p}, {a,l}; leaves m, y, s, f, c, p, a, l, n), with the most similar parent-child pair of internal nodes marked for merging.]

Table 2. Accuracy comparison of classifiers generated by AVT-NBL used with (a) 2-ary AVTs, (b) 3-ary AVTs, and (c) 4-ary AVTs

Results shown in Table 2

- AVT-NBL mostly works best when binary AVTs are used
- Reducing internal nodes in AVTs will eventually reduce the search space of cuts in AVT-NBL, which leads to generating a less compact classifier

Summary

- Human-supplied AVTs are unavailable in many application domains
- AVT-Learner is a simple algorithm for automated construction of AVTs from data
- The AVTs generated by AVT-Learner are competitive with human-supplied AVTs in terms of both the error rate and the size of the resulting classifiers.
- In domains where human-supplied AVTs are not available, the AVTs generated by AVT-Learner, when used by AVT-NBL, result in classifiers that are substantially more compact (and often more accurate) than those obtained by the standard Naïve Bayes Learner, which does not use AVTs.

Future Work

- Extending AVT-Learner to learn AVTs that correspond to tangled hierarchies (which can be represented by directed acyclic graphs)
- Learning AVTs from data for a broad range of real-world applications
- Developing algorithms for learning hierarchical ontologies based on part-whole and other relations as opposed to ISA relations captured by an AVT

Thank You! Questions?
