

Tutorial #3: Classification. A tree classification algorithm is used to compute a decision tree. Decision trees are easy to understand and modify, and the resulting model can be expressed as a set of decision rules.



Presentation Transcript

  1. Tutorial #3

  2. Classification • A tree classification algorithm is used to compute a decision tree. Decision trees are easy to understand and modify, and the resulting model can be expressed as a set of decision rules.

  3. Classification • Training on larger data sets generally improves the accuracy of the classification model. In classification, the input is a set of example records, called a training set, where each record consists of several fields, or attributes. Attributes are either numerical (drawn from an ordered domain) or categorical (drawn from an unordered domain). One attribute, called the class label field (or target field), indicates the class to which each example belongs.

  4. Classification • A decision tree model contains rules that predict the target variable. • The tree classification algorithm used here is ID3.

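  5. ID3 Algorithm • First: calculate the entropy of the whole data set S: Entropy(S) = -Σ p_i log2(p_i), where p_i is the fraction of records in class i. • Second: try every attribute and calculate the Gain for each one, that is, the entropy of S minus the size-weighted average entropy of the subsets produced by the split. • Third: build the tree, splitting first on the attribute with the maximum Gain, and repeat on each branch.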
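
The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the presentation: the function names `entropy`, `information_gain`, and `best_split`, and the four toy records with attributes `a` and `b`, are all hypothetical and chosen only to make the example self-contained.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)  # only classes that actually occur, so no log2(0)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(records, attribute, target):
    """Entropy of the whole set minus the size-weighted entropy of each subset."""
    parent = entropy([r[target] for r in records])
    partitions = {}
    for r in records:
        # Group the class labels by the value of the attribute being tested.
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(
        len(subset) / len(records) * entropy(subset)
        for subset in partitions.values()
    )
    return parent - remainder


def best_split(records, attributes, target):
    """Return the attribute with the maximum information gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical toy data: attribute "a" separates the classes, "b" does not.
records = [
    {"a": "x", "b": "p", "label": "yes"},
    {"a": "x", "b": "q", "label": "yes"},
    {"a": "y", "b": "p", "label": "no"},
    {"a": "y", "b": "q", "label": "no"},
]

print(best_split(records, ["a", "b"], "label"))
```

A full ID3 implementation would call `best_split` recursively on each subset until every subset is pure or no attributes remain; that recursion is omitted here to keep the sketch short.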

  6. Example

  7. Attributes of each person: Hair Length, Weight, Age (class label: Male or Female)

  8. Let us try splitting on Hair Length (9 Persons)
     • Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
     • Hair Length < 4? yes: 3 Males; no: 4 Females, 2 Males
     • Entropy(0F,3M) = -(0/3)log2(0/3) - (3/3)log2(3/3) = 0 (taking 0 * log2(0) as 0)
     • Entropy(4F,2M) = -(4/6)log2(4/6) - (2/6)log2(2/6) = 0.9183
     • Gain(Hair Length < 4) = 0.9911 - (3/9 * 0 + 6/9 * 0.9183) = 0.3789

  9. Let us try splitting on Weight (9 Persons)
     • Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
     • Weight < 170? yes: 4 Females, 1 Male; no: 4 Males
     • Entropy(4F,1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
     • Entropy(0F,4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0
     • Gain(Weight < 170) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900

  10. Let us try splitting on Age (9 Persons)
      • Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
      • Age <= 40? yes: 3 Females, 3 Males; no: 1 Female, 2 Males
      • Entropy(3F,3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
      • Entropy(1F,2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183
      • Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183
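
The three Gain calculations can be checked with a short script. The sketch below uses only the (females, males) counts given in the example. The helper names `entropy_from_counts` and `gain` are assumptions made for illustration, and the assignment of counts to the yes/no branches does not affect the result.

```python
import math


def entropy_from_counts(*counts):
    """Entropy (base 2) from class counts; a zero count contributes 0 by convention."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gain(parent, branches):
    """Information gain of a split, given (females, males) counts per branch."""
    n = sum(parent)
    remainder = sum(sum(b) / n * entropy_from_counts(*b) for b in branches)
    return entropy_from_counts(*parent) - remainder


parent = (4, 5)  # 4 females, 5 males

# Branch counts as (females, males), taken from the example.
splits = {
    "Hair Length < 4": [(0, 3), (4, 2)],
    "Weight < 170": [(4, 1), (0, 4)],
    "Age <= 40": [(3, 3), (1, 2)],
}

gains = {name: gain(parent, branches) for name, branches in splits.items()}
best = max(gains, key=gains.get)

for name, value in gains.items():
    print(f"{name}: {value:.4f}")
print("Best split:", best)
```

Rounded to four decimals, the script reproduces the gains 0.3789, 0.5900, and 0.0183, so Weight < 170 is chosen as the root split.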

  11. Decision Tree (9 Persons)
      • Weight < 170? no: 4 Males; yes: 1 Male, 4 Females (split again)
      • On the "yes" branch, Hair Length < 4? yes: 1 Male; no: 4 Females

  12. Convert Decision Trees to Rules
      • Tree: Weight < 170? no: Male; yes: Hair Length < 4? (yes: Male; no: Female)
      • Rules to classify Males/Females: If Weight is greater than or equal to 170, classify as Male. Else if Hair Length is less than 4, classify as Male. Else classify as Female.
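
The rule set above maps directly onto a chain of conditionals. A minimal sketch follows. The function name `classify` and its parameter names are hypothetical, and the slides do not state units for weight or hair length, so the thresholds are used exactly as given.

```python
def classify(weight, hair_length):
    """Apply the decision rules derived from the tree, in order."""
    if weight >= 170:       # root test: Weight < 170? -> "no" branch
        return "Male"
    if hair_length < 4:     # second test: Hair Length < 4? -> "yes" branch
        return "Male"
    return "Female"         # Weight < 170 and Hair Length >= 4


# Illustrative inputs only; these are not records from the example data set.
print(classify(weight=180, hair_length=10))
print(classify(weight=150, hair_length=2))
print(classify(weight=150, hair_length=8))
```

Because the rules are checked in order, the hair-length test is reached only for people whose weight is below 170, which mirrors the structure of the tree.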

  13. Try the Weka Program • Load the same data as in the example (file test.csv) into Weka and check that it produces the same tree.

  14. References
      • Quinlan, J.R. 1986, Machine Learning, 1, 81
      • http://dms.irb.hr/tutorial/tut_dtrees.php
      • http://www.dcs.napier.ac.uk/~peter/vldb/dm/node11.html
      • http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/4_dtrees2.html
      • Professor Sin-Min Lee, SJSU. http://cs.sjsu.edu/~lee/cs157b/cs157b.html
