Classification Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009
Classification—A Two-Step Process • Step 1 (learning): construct a model from the training set, using the values (class labels) of a classifying attribute • Step 2 (classification): use the model to assign categorical class labels (discrete or nominal) to new data
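As a minimal sketch of the two steps, the code below uses scikit-learn's DecisionTreeClassifier; the library choice, the tiny dataset, and its integer encoding are illustrative assumptions, not part of the slides. fit performs the learning step and predict the classification step.

```python
# A minimal sketch of the two-step process, assuming scikit-learn is
# installed; the tiny dataset and its encoding are hypothetical.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: each row encodes (age_group, student, credit);
# the class label is buys_computer (0 = no, 1 = yes).
X_train = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [2, 1, 1]]
y_train = [0, 1, 1, 0]

# Step 1 (learning): construct the model from the training set.
model = DecisionTreeClassifier(criterion="entropy")
model.fit(X_train, y_train)

# Step 2 (classification): use the model to label new data.
print(model.predict([[1, 1, 0]]))  # predicted class label for a new tuple
```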
Classification • Typical applications • Credit approval • Target marketing • Medical diagnosis • Fraud detection • And much more
Decision Tree • Decision tree induction is the learning of decision trees from class-labeled training tuples • A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute • Each branch represents an outcome of the test • Each leaf node holds a class label
Decision Tree Algorithm • Basic algorithm (a greedy algorithm) • Tree is constructed in a top-down recursive divide-and-conquer manner • At start, all the training examples are at the root • Attributes are categorical (if continuous-valued, they are discretized in advance) • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
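To make the top-down recursive divide-and-conquer idea concrete, here is a hedged sketch of the basic induction loop in plain Python. The Node class, the dict-based tuple representation, and the choose_best_attribute placeholder are illustrative assumptions; the selection measure itself (e.g., information gain) is defined on the next slides.

```python
# Sketch of greedy top-down decision tree induction. Each training tuple
# is a dict mapping attribute names to categorical values, plus a "class"
# key holding its label. choose_best_attribute stands in for an attribute
# selection measure such as information gain.
from collections import Counter

class Node:
    def __init__(self, label=None, attribute=None):
        self.label = label          # class label (leaf nodes only)
        self.attribute = attribute  # attribute tested (internal nodes only)
        self.children = {}          # outcome value -> child Node

def majority_class(rows):
    return Counter(row["class"] for row in rows).most_common(1)[0][0]

def build_tree(rows, attributes, choose_best_attribute):
    labels = {row["class"] for row in rows}
    if len(labels) == 1:                 # all tuples in one class: leaf
        return Node(label=labels.pop())
    if not attributes:                   # no attributes left: majority vote
        return Node(label=majority_class(rows))
    best = choose_best_attribute(rows, attributes)
    node = Node(attribute=best)
    remaining = [a for a in attributes if a != best]
    # Divide and conquer: grow one branch per outcome of the test on `best`.
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        node.children[value] = build_tree(subset, remaining,
                                          choose_best_attribute)
    return node
```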
Attribute Selection Measure: Information Gain (ID3/C4.5) • Select the attribute with the highest information gain • Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D| / |D| • Expected information (entropy) needed to classify a tuple in D: Info(D) = -Σ pi log2(pi), summed over the m classes i = 1..m
Attribute Selection Measure: Information Gain (ID3/C4.5) • Information needed (after using A to split D into v partitions) to classify D: InfoA(D) = Σ (|Dj| / |D|) × Info(Dj), summed over the v partitions j = 1..v • Information gained by branching on attribute A: Gain(A) = Info(D) - InfoA(D)
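The three formulas translate directly into code. A minimal sketch, assuming the same hypothetical dict-based tuple representation as the induction sketch above:

```python
# Entropy-based attribute selection, following the slide formulas.
from math import log2
from collections import Counter

def info(rows):
    """Info(D) = -sum(pi * log2(pi)): expected bits to classify a tuple."""
    total = len(rows)
    counts = Counter(row["class"] for row in rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_a(rows, attribute):
    """InfoA(D): weighted entropy after splitting D on `attribute`."""
    total = len(rows)
    result = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        result += len(subset) / total * info(subset)
    return result

def gain(rows, attribute):
    """Gain(A) = Info(D) - InfoA(D)."""
    return info(rows) - info_a(rows, attribute)
```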
Decision Tree • In the 14-sample training set, the term I(2,3) in the computation of Infoage(D) means that "age <= 30" covers 5 of the 14 samples, with 2 yes's and 3 no's • I(2,3) = -2/5 * log2(2/5) - 3/5 * log2(3/5) ≈ 0.971
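As a quick arithmetic check of this term (base-2 logarithm, consistent with the entropy definition above):

```python
# Verify I(2,3) for the "age <= 30" branch: 2 yes's and 3 no's out of 5.
from math import log2

i_2_3 = -(2/5) * log2(2/5) - (3/5) * log2(3/5)
print(round(i_2_3, 3))  # 0.971
```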
Decision Tree • Similarly, we can compute • Gain(income) = 0.029 • Gain(student) = 0.151 • Gain(credit_rating) = 0.048 • Since "age" obtains the highest information gain (Gain(age) = 0.246, from the preceding computation), we partition the tree on age
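To illustrate the selection step, a hedged sketch that picks the attribute with the highest gain and partitions the tuples on it; the gain values are the ones from this example, and the argmax-over-a-dict formulation is just one way to express the choice:

```python
# Pick the splitting attribute with the highest information gain,
# then partition the training tuples by its values.
gains = {"age": 0.246, "income": 0.029,
         "student": 0.151, "credit_rating": 0.048}
best = max(gains, key=gains.get)
print(best)  # age

# Hypothetical partition step: group rows (dicts) by the chosen attribute.
def partition(rows, attribute):
    parts = {}
    for row in rows:
        parts.setdefault(row[attribute], []).append(row)
    return parts  # one subtree is then grown per partition
```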