
Classification

Dr. Bernard Chen Ph.D.

University of Central Arkansas

Fall 2009

Classification—A Two-Step Process
  • Classification
    • constructs a model from the training set and the values (class labels) of a classifying attribute, and uses that model to classify new data (see the sketch after this list)
    • predicts categorical class labels (discrete or nominal)
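To make the two steps concrete, here is a minimal sketch assuming scikit-learn is available; the encoded attribute values, feature names, and class labels are hypothetical and not taken from the slides.

    # A minimal sketch of the two-step process (assumes scikit-learn is installed).
    # Step 1: construct a model from class-labeled training tuples.
    # Step 2: use that model to assign class labels to new tuples.
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical training set: each row encodes (age_group, student, credit_rating)
    # as integers; the class label "buys_computer" is categorical.
    X_train = [
        [0, 0, 0],   # age <= 30, not a student, fair credit
        [0, 1, 0],   # age <= 30, student, fair credit
        [1, 0, 1],   # 31..40, not a student, excellent credit
        [2, 1, 0],   # age > 40, student, fair credit
    ]
    y_train = ["no", "yes", "yes", "yes"]

    model = DecisionTreeClassifier(criterion="entropy")  # entropy-based splitting
    model.fit(X_train, y_train)                          # step 1: build the model

    X_new = [[0, 1, 1]]                                  # a previously unseen tuple
    print(model.predict(X_new))                          # step 2: classify new data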
Classification
  • Typical applications
    • Credit approval
    • Target marketing
    • Medical diagnosis
    • Fraud detection
    • And much more
Decision Tree
  • Decision tree induction is the learning of decision trees from class-labeled training tuples
  • A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute
  • Each branch represents an outcome of the test
  • Each leaf node holds a class label (see the sketch below)
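A small illustrative sketch, not taken from the slides, of this flowchart-like structure in Python: an internal node is a dict naming the attribute it tests, each branch is keyed by an outcome of that test, and a leaf is simply a class label. The particular tree shown is only an example.

    # Illustrative decision tree: internal nodes test an attribute, branches are
    # outcomes of the test, and leaf nodes hold class labels.
    tree = {
        "attribute": "age",
        "branches": {
            "<=30":   {"attribute": "student",
                       "branches": {"no": "no", "yes": "yes"}},
            "31..40": "yes",                      # leaf node: class label
            ">40":    {"attribute": "credit_rating",
                       "branches": {"fair": "yes", "excellent": "no"}},
        },
    }

    def classify(node, tuple_):
        """Follow branches by attribute-test outcomes until a leaf is reached."""
        while isinstance(node, dict):             # internal node: test an attribute
            outcome = tuple_[node["attribute"]]
            node = node["branches"][outcome]      # follow the matching branch
        return node                               # leaf node: the class label

    print(classify(tree, {"age": "<=30", "student": "yes", "credit_rating": "fair"}))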
Decision Tree Algorithm
  • Basic algorithm (a greedy algorithm)
    • Tree is constructed in a top-down recursive divide-and-conquer manner
    • At start, all the training examples are at the root
    • Attributes are categorical (if continuous-valued, they are discretized in advance)
    • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain); a sketch of the procedure follows below
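A rough, hypothetical sketch of this greedy top-down procedure. The function and data names are assumptions for illustration only, and the attribute-selection heuristic is passed in as a callback so that a measure such as information gain (next slides) can be plugged in.

    from collections import Counter

    def induce_tree(rows, attributes, select_attribute):
        """Greedy top-down induction: all training examples start at the root, the
        best test attribute is chosen, and the data are divided recursively, one
        subtree per outcome of the chosen attribute."""
        labels = [label for _, label in rows]
        if len(set(labels)) == 1 or not attributes:        # pure node, or no tests left
            return Counter(labels).most_common(1)[0][0]    # leaf: majority class label
        best = select_attribute(rows, attributes)          # heuristic choice (e.g. gain)
        node = {"attribute": best, "branches": {}}
        for value in {t[best] for t, _ in rows}:           # one branch per outcome
            subset = [(t, l) for t, l in rows if t[best] == value]
            remaining = [a for a in attributes if a != best]
            node["branches"][value] = induce_tree(subset, remaining, select_attribute)
        return node

    # Tiny illustrative run with a trivial selector (always pick the first attribute):
    rows = [({"student": "yes", "credit": "fair"}, "yes"),
            ({"student": "no",  "credit": "fair"}, "no")]
    print(induce_tree(rows, ["student", "credit"], lambda r, a: a[0]))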
Attribute Selection Measure: Information Gain (ID3/C4.5)
  • Select the attribute with the highest information gain
  • Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_i,D| / |D|
  • Expected information (entropy) needed to classify a tuple in D:

    Info(D) = - Σ_{i=1..m} p_i * log2(p_i)
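A minimal sketch of this entropy computation in Python, assuming the class labels of D are given as a plain list. The 9 yes / 5 no example is an assumption taken from the standard 14-sample training set that the later slides refer to.

    from math import log2
    from collections import Counter

    def info(labels):
        """Expected information (entropy) needed to classify a tuple in D:
        Info(D) = -sum_i p_i * log2(p_i), where p_i = |C_i,D| / |D|."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    # Assumed example: 9 "yes" and 5 "no" tuples in the full training set D.
    print(round(info(["yes"] * 9 + ["no"] * 5), 3))   # ≈ 0.940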
Attribute Selection Measure: Information Gain (ID3/C4.5)
  • Information needed (after using A to split D into v partitions) to classify D:

    Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) * Info(D_j)

  • Information gained by branching on attribute A:

    Gain(A) = Info(D) - Info_A(D)
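Continuing the sketch above (and reusing its info() function), these two helpers mirror the formulas; a split is represented as a list of label lists, one per partition D_j.

    def info_a(partitions):
        """Information still needed after using A to split D into v partitions:
        Info_A(D) = sum_j (|D_j| / |D|) * Info(D_j)."""
        total = sum(len(p) for p in partitions)
        return sum(len(p) / total * info(p) for p in partitions)

    def gain(labels, partitions):
        """Information gained by branching on A: Gain(A) = Info(D) - Info_A(D)."""
        return info(labels) - info_a(partitions)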
Decision Tree
  • The branch “age <= 30” covers 5 of the 14 samples, with 2 yes’s and 3 no’s
  • I(2,3) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.971
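As a quick numeric check with the info() sketch above:

    # I(2,3): the "age <= 30" branch holds 2 yes and 3 no tuples.
    print(round(info(["yes"] * 2 + ["no"] * 3), 3))   # 0.971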
Decision Tree
  • Completing the calculation for age gives Gain(age) = 0.246
  • Similarly, we can compute
    • Gain(income) = 0.029
    • Gain(student) = 0.151
    • Gain(credit_rating) = 0.048
  • Since “age” yields the highest information gain, it is selected as the splitting attribute and the tree is partitioned on age (a numeric check follows below)
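The original slide shows the resulting partition as a figure. As a rough check with the sketches above, and assuming the remaining age partitions follow the same 14-sample example (31..40: 4 yes / 0 no; >40: 3 yes / 2 no), Gain(age) can be recomputed and the splitting attribute picked by a simple argmax:

    # Class labels of the full training set D (assumed counts: 9 yes, 5 no).
    D = ["yes"] * 9 + ["no"] * 5

    # Label lists per age partition (assumed counts, see above).
    age_partitions = [["yes"] * 2 + ["no"] * 3,    # age <= 30
                      ["yes"] * 4,                 # 31..40
                      ["yes"] * 3 + ["no"] * 2]    # age > 40
    print(round(gain(D, age_partitions), 3))       # ≈ 0.247 (0.246 with intermediate rounding)

    # Selecting the splitting attribute is then an argmax over the gains:
    gains = {"age": 0.246, "income": 0.029, "student": 0.151, "credit_rating": 0.048}
    print(max(gains, key=gains.get))               # age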