
Classifying Categorical Data


Presentation Transcript


  1. Classifying Categorical Data Risi Thonangi M.S. Thesis Presentation Advisor: Dr. Vikram Pudi

  2. Presentation Outline • Introduction • Related Work • Contributions • ACME Classifier • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work

  3. Presentation Outline • Introduction • The classification problem • Preliminaries • Related Work • Contributions • ACME Classifier • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work

  4. The Classification Problem: classes C = { c1 , c2 }, attributes I = { i1 , i2 , i3 , i4 }

  5. The Classification Problem: classes C = { c1 , c2 }, attributes I = { i1 , i2 , i3 , i4 }

  6. The Classification Problem: classes C = { c1 , c2 }, attributes I = { i1 , i2 , i3 , i4 }, and a query to classify

  7. Formal Problem Statement • Given a dataset D • Learn from this dataset to classify a potentially unseen record `q' (the query) to its correct class. • Each record ri is described using boolean attributes I = { i1 , i2 , …, i|I| } and is labeled with one of the classes C = { c1 , c2 , …, c|C| } • I = { i1 , i2 , …, i|I| } can also be viewed as a set of items.

  8. Preliminaries • itemset: A set of items, e.g. { i1 , i2 , i3 } • P(.): Probability distribution • frq-itemset: An itemset whose frequency is above a given threshold σ • σ: Support threshold • τ: Confidence threshold • { i1 , i2 } → { i3 }: An Association Rule ( AR ) • { i1 , i2 } → c1: A Classification Association Rule ( CAR )
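Note: since support and confidence drive everything that follows, here is a minimal sketch of how they are computed. The toy dataset, item names and class labels below are hypothetical, not taken from the thesis.

```python
# Minimal sketch: computing support and confidence over a toy dataset.
# Each record is the set of items it contains plus its class label.
dataset = [
    ({"i1", "i2", "i3"}, "c1"),
    ({"i1", "i2"},       "c1"),
    ({"i2", "i3"},       "c2"),
    ({"i1", "i3"},       "c2"),
    ({"i1", "i2", "i3"}, "c1"),
]

def support(itemset, records):
    """Fraction of records that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= items for items, _ in records) / len(records)

def confidence_ar(antecedent, consequent, records):
    """Confidence of the association rule antecedent -> consequent."""
    return support(set(antecedent) | set(consequent), records) / support(antecedent, records)

def confidence_car(antecedent, label, records):
    """Confidence of the classification association rule antecedent -> label."""
    covered = [(items, lab) for items, lab in records if set(antecedent) <= items]
    return sum(lab == label for _, lab in covered) / len(covered)

print(support({"i1", "i2"}, dataset))                 # 0.6
print(confidence_ar({"i1", "i2"}, {"i3"}, dataset))   # ~0.67
print(confidence_car({"i1", "i2"}, "c1", dataset))    # 1.0
```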

  9. Presentation Outline • Introduction • Related Work • Classification based on Associations (CBA) • Classification based on Multiple Association Rules (CMAR) • Large Bayes (LB) Classifier • Contributions • ACME Classifier • Handling Non-Closed Itemsets • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work

  10. Classification based on Associations (CBA) • [Bing Liu – KDD98] • First Classifier that used the paradigm of Association Rules • Steps in CBA: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Classify using the rule that satisfies the query and has the highest confidence

  11. Classification based on Associations (CBA) • [Bing Liu – KDD98] • First Classifier that used the paradigm of Association Rules • Steps in CBA: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Classify using the rule that satisfies the query and has the highest confidence • Disadvantages: • Single rule based classification – Not Robust • Cannot handle Fully Confident Associations

  12. Classification based on Associations (CBA) • [Bing Liu – KDD98] • First Classifier that used the paradigm of Association Rules • Steps in CBA: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Classify using the rule that satisfies the query and has the highest confidence • Disadvantages: • Single rule based classification – Not Robust • Cannot handle Fully Confident Associations

  13. Disadvantages with CBA: Single Rule based classification • Let the classifier have 3 rules: • i1 → c1 support: 0.3, confidence: 0.8 • i2 , i3 → c2 support: 0.7, confidence: 0.7 • i2 , i4 → c2 support: 0.8, confidence: 0.7 • The query { i1 , i2 , i3 , i4 } will be classified to class c1 by CBA, which might be incorrect. • CBA, being a single-rule classifier, cannot combine the evidence of multiple rules.
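Note: a minimal sketch of CBA's classification step (rule mining and default-class handling omitted), using the three rules from this slide; it returns c1 for the query { i1 , i2 , i3 , i4 } even though two rules favour c2.

```python
# Minimal sketch of CBA's classification step: rank the mined CARs by
# confidence (ties broken by support) and answer with the first rule whose
# antecedent is contained in the query.
cars = [  # (antecedent, class, support, confidence): the rules from the slide
    ({"i1"},       "c1", 0.3, 0.8),
    ({"i2", "i3"}, "c2", 0.7, 0.7),
    ({"i2", "i4"}, "c2", 0.8, 0.7),
]

def cba_classify(query, rules):
    ranked = sorted(rules, key=lambda r: (r[3], r[2]), reverse=True)
    for antecedent, label, _sup, _conf in ranked:
        if antecedent <= query:      # the single best matching rule decides
            return label
    return None                      # full CBA falls back to a default class here

print(cba_classify({"i1", "i2", "i3", "i4"}, cars))   # -> 'c1'
```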

  14. Classification based on Associations (CBA) • [Bing Liu – KDD98] • First Classifier that used the paradigm of Association Rules • Steps in CBA: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Classify using the rule that satisfies the query and has the highest confidence • Disadvantages: • Single rule based classification – Not Robust • Cannot handle Fully Confident Associations

  15. Fully Confident Associations • An Association { i1 , i2 } → { i3 } is fully confident if its confidence is 100%. This means P( ~i3 , i1 , i2 ) = 0. • If CBA includes the CAR { i1 , i2 , i3 } → c1 it will also include { i1 , i2 } → c1 • If the query { i1 , i2 , ~i3 } arrives for classification, it is classified to c1 using { i1 , i2 } → c1 • But P( ~i3 , i1 , i2 ) = 0 • CBA does not check for all statistical relationships.
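Note: a small numeric illustration of the fully confident case on a hypothetical dataset; it only checks the probabilities the slide refers to.

```python
# Every record containing {i1, i2} also contains i3, so P(~i3, i1, i2) = 0,
# yet a rule with antecedent {i1, i2} alone would still fire on {i1, i2, ~i3}.
records = [
    ({"i1", "i2", "i3"}, "c1"),
    ({"i1", "i2", "i3"}, "c1"),
    ({"i2", "i3"},       "c2"),
    ({"i1", "i3"},       "c1"),
]

def support(itemset, recs):
    return sum(set(itemset) <= items for items, _ in recs) / len(recs)

# {i1, i2} -> {i3} is fully confident: its confidence is 100%
conf = support({"i1", "i2", "i3"}, records) / support({"i1", "i2"}, records)
print(conf)        # 1.0

# so the joint event (i1, i2, no i3) never occurs in the data
p_no_i3 = support({"i1", "i2"}, records) - support({"i1", "i2", "i3"}, records)
print(p_no_i3)     # 0.0, yet CBA would classify {i1, i2, ~i3} via {i1, i2} -> c1
```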

  16. Classification based on Multiple ARs (CMAR) • [Wenmin Li – ICDM01] • Uses multiple CARs in the classification step • Steps in CMAR: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Find all CARs which satisfy the given query • Group them based on their class label • Classify the query to the class whose group of CARs has the maximum weight

  17. Classification based on Multiple ARs (CMAR) • [Wenmin Li – ICDM01] • Uses multiple CARs in the classification step • Steps in CMAR: • Mine for CARs satisfying support and confidence thresholds • Sort all CARs based on confidence • Find all CARs which satisfy the given query • Group them based on their class label • Classify the query to the class whose group of CARs has the maximum weight
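Note: a minimal sketch of CMAR-style classification. CMAR's actual group weight is a weighted chi-square measure; the weight below (sum of support times confidence over the group) is a simplified stand-in, and the rules are hypothetical.

```python
from collections import defaultdict

# Collect every CAR satisfied by the query, group by class, and answer with
# the heaviest group. The group weight here is a placeholder, not CMAR's
# weighted chi-square formula.
cars = [  # (antecedent, class, support, confidence): hypothetical rules
    ({"i1"},       "c1", 0.3, 0.8),
    ({"i2", "i3"}, "c2", 0.7, 0.7),
    ({"i2", "i4"}, "c2", 0.8, 0.7),
]

def cmar_classify(query, rules):
    groups = defaultdict(float)
    for antecedent, label, sup, conf in rules:
        if antecedent <= query:              # rule satisfies the query
            groups[label] += sup * conf      # accumulate the group's weight
    return max(groups, key=groups.get) if groups else None

print(cmar_classify({"i1", "i2", "i3", "i4"}, cars))  # -> 'c2' under this weighting
```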

  18. CMAR contd. [Diagram: R = the set of rules satisfying query `q'; the class with the highest weight is output]

  19. CMAR Disadvantages • No proper statistical explanation is given for the mathematical formulae that are employed • Cannot handle Fully Confident Associations

  20. Large Bayes (LB) Classifier • [Meretakis-KDD99] • Build P( q | ci ) using frequent itemsets in ci which are subsets of `q' • Steps in LB: • Mine for frequent itemsets • Prune the frequent itemsets • Calculate P( q ) using a product approximation • Classify to the class with the highest probability

  21. Large Bayes (LB) Classifier • [Meretakis-KDD99] • Build P( q | ci ) using frequent itemsets in ci which are subsets of `q' • Steps in LB: • Mine for frequent itemsets • Prune the frequent itemsets • Calculate P( q ) using a product approximation • Classify to the class with the highest probability

  22. LB: Pruning FRQ-Itemsets • An immediate subset of `s', obtained by dropping the item `i', is denoted `s\i' • Ex: s = { i1 , i2 }, then s\i1 denotes the set { i2 } • The symbol `I' stands for interestingness • Pj,k ( s ) denotes the estimate of the frequency of `s' calculated from the frequencies of `s\j' and `s\k'

  23. LB: Pruning FRQ-Itemsets • An immediate subset of `s', obtained by dropping the item `i', is denoted `s\i' • Ex: s = { i1 , i2 }, then s\i1 denotes the set { i2 } • The symbol `I' stands for interestingness • Pj,k ( s ) denotes the estimate of the frequency of `s' calculated from the frequencies of `s\j' and `s\k'

  24. LB Pruner: Disadvantages • Assumes an itemset's interestingness does not depend on items not occurring in it. • Ex: For s = { i1 , i2 }, I( s ) is only dependent on { i1 } and { i2 } but not on { i3 } • Assumes an itemset's interestingness can be calculated from pairs of immediate subsets • Uses a global information measure for all classes. • Itemsets can be informative in one class but not in another.

  25. Large Bayes (LB) Classifier • [Meretakis-KDD99] • Build P( q | ci ) using frequent itemsets in ci which are subsets of `q' • Steps in LB: • Mine for frequent itemsets • Prune the frequent itemsets • Calculate P( q ) using a product approximation • Classify to the class with the highest probability

  26. LB: Calculating P(q) • Approximately calculates P( q ) using frequencies of frequent itemsets. • Ex: P( i1 , i2 , i3 , i4 , i5 ) = P( i2 , i5 ) · P( i3 | i5 ) · P( i1 , i4 | i2 ) • The following itemsets should be available: { i2 , i5 }, { i3 , i5 }, { i5 }, { i1 , i4 , i2 }, { i2 }

  27. LB: Calculating P(q) • Approximately calculates P( q ) using frequencies of frequent itemsets. • Ex: P( i1 , i2 , i3 , i4 , i5 ) = P( i2 , i5 ) · P( i3 | i5 ) · P( i1 , i4 | i2 ) • The following itemsets should be available: { i2 , i5 }, { i3 , i5 }, { i5 }, { i1 , i4 , i2 }, { i2 } • Iteratively select itemsets until all the items in `q' are covered. • There could be many product approximations.

  28. LB: Calculating P(q) • Approximately calculates P( q ) using frequencies of frequent itemsets. • Ex: P( i1 , i2 , i3 , i4 , i5 ) = P( i2 , i5 ) · P( i3 | i5 ) · P( i1 , i4 | i2 ) • The following itemsets should be available: { i2 , i5 }, { i3 , i5 }, { i5 }, { i1 , i4 , i2 }, { i2 } • Iteratively select itemsets until all the items in `q' are covered. • There could be many product approximations. • Heuristic: Iteratively select the itemset `s' such that • the number of new items in `s' is the smallest • if there are ties, pick the `s' with the highest I( s )
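Note: a minimal sketch of the greedy product-approximation heuristic above. The itemset frequencies are hypothetical and the interestingness tie-break is replaced by plain frequency, so the decomposition chosen need not match the one shown on slide 26.

```python
# Greedy product approximation: repeatedly pick an itemset that adds the
# fewest new items of the query (ties broken here by highest frequency, as a
# stand-in for I(s)), and multiply in P(new items | already-covered overlap).
freq = {  # hypothetical frequencies of the mined frequent itemsets
    frozenset({"i2", "i5"}): 0.30,
    frozenset({"i3", "i5"}): 0.25,
    frozenset({"i5"}):       0.50,
    frozenset({"i1", "i2", "i4"}): 0.20,
    frozenset({"i2"}):       0.45,
}

def product_approximation(query, freq):
    covered, prob = set(), 1.0
    while covered < query:
        # candidates: subsets of the query that add new items and whose overlap
        # with the covered part (if any) also has a known frequency
        cands = [s for s in freq
                 if s <= query and (s - covered)
                 and (not (s & covered) or frozenset(s & covered) in freq)]
        s = min(cands, key=lambda s: (len(s - covered), -freq[s]))
        old = frozenset(s & covered)
        prob *= (freq[s] / freq[old]) if old else freq[s]   # P(new | old)
        covered |= s
    return prob

q = {"i1", "i2", "i3", "i4", "i5"}
print(product_approximation(q, freq))   # one admissible approximation of P(q)
```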

  29. Estimating P(q): Disadvantages • LB calculates the probability of q as if it were an itemset. • Uses an approximation of P( q ) and hence assumes independences between items • There could be a better product approximation. • Cannot handle Fully Confident Associations • if there exists a rule { i1 , i2 } → i3 , i.e. P( i1 , i2 , i3 ) = P( i1 , i2 ) • then for q = { i1 , i2 , i4 } the product approximation is built as P( i1 , i2 ) · P( i4 | i1 , i2 ), which cannot distinguish q from { i1 , i2 , i3 , i4 } since every record containing { i1 , i2 } also contains i3

  30. Presentation Outline • Introduction • Related Work • Contributions • ACME Classifier • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work

  31. Contributions • Frequent Itemsets + Maximum Entropy = very accurate, robust and theoretically appealing classifier • Fixed the existing Maximum Entropy model to work with frequent itemsets • Made the approach scalable to large databases • Proved that Naïve Bayes is a specialization of ACME

  32. Presentation Outline • Introduction • Related Work • Contributions • ACME Classifier • Philosophy • Steps involved • Handling Non-Closed Itemsets • Improving the Execution time • ACME and Naïve Bayes • Experimental Results • Conclusions and Future Work

  33. ACME Classifier • Main philosophy of ACME: Use Data Mining principles to mine informative patterns from a dataset and build a Statistical Model using these patterns.

  34. ACME Classifier • Main philosophy of ACME: Use Data Mining principles to mine informative patterns from a dataset and build a Statistical Model using these patterns. • ACME does not explicitly assume any independence that is not represented in the dataset

  35. ACME Classifier • Main philosophy of ACME: Use Data Mining principles to mine informative patterns from a dataset and build a Statistical Model using these patterns. • ACME does not explicitly assume any independence that is not represented in the dataset • ACME has three steps: • Mining step • Learning step • Classifying step

  36. ACME Classifier • Main philosophy of ACME: Use Data Mining principles to mine informative patterns from a dataset and build a Statistical Model using these patterns. • ACME does not explicitly assume any independence that is not represented in the dataset • ACME has three steps: • Mining step • Learning step • Classifying step

  37. Mining Informative Patterns [Diagram: the dataset D is split by class label into D1 (records labeled c1) and D2 (records labeled c2)]

  38. Mining Informative Patterns [Diagram: Di → Apriori → constraints of class ci → confidence-based pruner → non-redundant constraints of class ci]

  39. Mining constraints • Let Si denote the set of itemsets which are frequent in ci, i.e. Si = { s ⊆ I : freq( s , Di ) ≥ σ } • Let S denote the set of itemsets which are frequent in at least one class, i.e. S = S1 ∪ S2 ∪ … ∪ S|C| • The constraints of ci, denoted by Ci, pair each itemset in S with its frequency in ci: Ci = { ( s , P( s | ci ) ) : s ∈ S }
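Note: a minimal sketch of the mining step under these definitions. Brute-force enumeration stands in for Apriori, and the dataset, items and threshold are hypothetical.

```python
from itertools import combinations

# Compute S_i (itemsets frequent in class c_i), S (frequent in at least one
# class) and the per-class constraints C_i over a toy dataset.
ITEMS = ["i1", "i2", "i3"]
D = {  # records of each class
    "c1": [{"i1", "i2"}, {"i1"}, {"i2", "i3"}, {"i1", "i2", "i3"}],
    "c2": [{"i3"}, {"i2", "i3"}, {"i3"}, {"i1", "i3"}],
}
SIGMA = 0.5  # support threshold

def all_itemsets(items):
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]

def freq(s, records):
    return sum(s <= rec for rec in records) / len(records)

# S_i: itemsets frequent in class c_i;  S: frequent in at least one class
S_per_class = {ci: {s for s in all_itemsets(ITEMS) if freq(s, recs) >= SIGMA}
               for ci, recs in D.items()}
S = set().union(*S_per_class.values())

# C_i: constraints of class c_i = frequencies in c_i of every itemset in S
C = {ci: {s: freq(s, D[ci]) for s in S} for ci in D}
print(C["c1"])
```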

  40. Pruning constraints • Constraints are pruned based on how well they differentiate between classes. • Ex: s = { i1 , i2 }, an itemset in S, is pruned if its frequency is (nearly) the same in every class; it is not pruned when its frequencies differ noticeably across classes.
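Note: the exact pruning condition from this slide did not survive the transcript, so the sketch below encodes only one plausible reading (drop itemsets whose frequency is nearly the same in every class), with a hypothetical tolerance TAU.

```python
# Confidence-based pruning of constraints (one plausible reading of the slide,
# not the thesis's exact condition). TAU is a hypothetical tolerance.
TAU = 0.1

def prune(constraints_per_class):
    """constraints_per_class: {class_label: {itemset: frequency}}"""
    classes = list(constraints_per_class)
    itemsets = set(constraints_per_class[classes[0]])
    kept = set()
    for s in itemsets:
        freqs = [constraints_per_class[c][s] for c in classes]
        if max(freqs) - min(freqs) > TAU:      # s differentiates between classes
            kept.add(s)
    return {c: {s: constraints_per_class[c][s] for s in kept} for c in classes}
```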

  41. ACME Classifier • Main philosophy of ACME: Use Data Mining principles to mine informative patterns from a dataset and build a Statistical Model using these patterns. • ACME does not explicitly assume any independence that is not represented in the dataset • ACME has three steps: • Mining step • Learning step • Classifying step

  42. ACME Classifier • Main philosophy of ACME: Use Data Mining principles to mine informative patterns from a dataset and build a Statistical Model using these patterns. • ACME does not explicitly assume any independence that is not represented in the dataset • ACME has three steps: • Mining step • Learning step • Classifying step

  43. Learning from constraints [Diagram: constraints of class ci ( Ci ), available from the mining step → statistical distribution of class ci ( Pi ), which will be used in the classification step]

  44. Learning from constraints [Diagram: constraints of class ci ( Ci ), available from the mining step → how? → statistical distribution of class ci ( Pi ), which will be used in the classification step]

  45. Learning from constraints [Diagram: constraints of class ci ( Ci ) → statistical distribution of class ci ( Pi )] Characteristic of Pi : it should satisfy every constraint in Ci

  46. An Example Output of Mining Step for c1 C1 = { ( i1 , 0.5 ) , ( i2 , 0.5 ) }

  47. An Example Output of Mining Step for c1 C1 = { ( i1 , 0.5 ) , ( i2 , 0.5 ) }

  48. An Example Output of the Mining Step for c1: C1 = { ( i1 , 0.5 ) , ( i2 , 0.5 ) } Among the distributions that satisfy C1, choose the distribution with the highest entropy.
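Note: a quick numeric check of this example. Any distribution over the four outcomes (i1 present or not, i2 present or not) that satisfies C1 = { ( i1 , 0.5 ) , ( i2 , 0.5 ) } can be written with a single free parameter t = P( i1 , i2 ); scanning t shows that the entropy peaks at t = 0.25, i.e. the distribution under which i1 and i2 are independent.

```python
import math

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

best_t, best_h = None, -1.0
for k in range(1, 500):
    t = k / 1000.0                      # t must lie in (0, 0.5)
    dist = [t, 0.5 - t, 0.5 - t, t]     # P(i1,i2), P(i1,~i2), P(~i1,i2), P(~i1,~i2)
    h = entropy(dist)
    if h > best_h:
        best_t, best_h = t, h

print(best_t)   # ~0.25: the maximum-entropy distribution consistent with C1
```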

  49. Learning step • The final solution distribution P for class ci produced by the Learning Step should: • Satisfy the constraints of the class • Have the highest entropy possible

  50. Learning step • The final solution distribution P for class ci produced by the Learning Step should: • Satisfy the constraints of the class • Have the highest entropy possible • [Good-AMS63] • P can be modeled as the following log-linear model.
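Note: the log-linear model itself did not survive the transcript. The block below is a reconstruction of the standard maximum-entropy / log-linear form for itemset-frequency constraints (the notation is mine, not necessarily the slide's): one parameter μj per constraint ( sj , · ) in Ci, plus a normalizer μ0.

```latex
% Reconstructed form of the maximum-entropy solution (notation is a reconstruction):
P(x \mid c_i) \;=\; \mu_0 \prod_{j=1}^{|C_i|} \mu_j^{\,f_j(x)},
\qquad
f_j(x) =
\begin{cases}
  1 & \text{if } s_j \subseteq x, \\
  0 & \text{otherwise.}
\end{cases}
```

Fitting the μ parameters (for example by iterative scaling) so that every constraint in Ci holds yields the distribution of this form with maximum entropy.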
