
Demo: Classification Programs C4.5 CBA

Demo: Classification Programs C4.5 CBA. Minqing Hu CS594 Fall 2003 UIC. C4.5. Classification using decision tree. Where to find the program? C4.5 Release 8: by Ross Quinlan http://www.cse.unsw.edu.au/~quinlan/ Running under Unix


Presentation Transcript


  1. Demo: Classification Programs (C4.5, CBA). Minqing Hu, CS594, Fall 2003, UIC

  2. C4.5
     • Classification using a decision tree.
     • Where to find the program?
       • C4.5 Release 8, by Ross Quinlan
       • http://www.cse.unsw.edu.au/~quinlan/
       • Runs under Unix
     • Reference book: "C4.5: Programs for Machine Learning", J. Ross Quinlan

  3. C4.5 Files
     • Names file (filestem.names)
       • Provides names for classes, attributes, and attribute values.
       • Consists of a series of entries, each starting on a new line and ending with a period.
       • The first entry gives the class names, separated by commas.
       • The rest of the file consists of a single entry for each attribute.
       • Each attribute entry begins with the attribute name followed by a colon, then a specification of the values that the attribute can take.
       • Four specifications are possible:
         • ignore: causes the value of the attribute to be disregarded
         • continuous: the attribute has numeric values
         • discrete N: N is a positive integer; specifies that the attribute has no more than N discrete values
         • a list of names separated by commas: the attribute takes only the listed values

  4. Example: Golf.names

         Play, Don't Play.        | class labels
         outlook: sunny, overcast, rain.
         temperature: continuous.
         humidity: continuous.
         windy: true, false.
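The names-file layout described above can be read with a few lines of code. The following is only a minimal sketch, not part of the C4.5 distribution: it assumes one entry per line, treats everything after "|" as a comment (as in the Golf.names example), and the function name parse_names is hypothetical.

```python
# Minimal sketch of a reader for a C4.5-style .names file.
# Assumptions (not from the C4.5 sources): one entry per line,
# "|" starts a comment, and parse_names is a made-up helper name.

GOLF_NAMES = """\
Play, Don't Play.        | class labels
outlook: sunny, overcast, rain.
temperature: continuous.
humidity: continuous.
windy: true, false.
"""


def parse_names(text):
    """Return (class_names, attributes) parsed from names-file text."""
    entries = []
    for line in text.splitlines():
        line = line.split("|", 1)[0].strip()      # drop trailing comment
        if line:
            entries.append(line.rstrip(".").strip())  # drop closing period

    # First entry: class names separated by commas.
    classes = [c.strip() for c in entries[0].split(",")]

    # Remaining entries: "attribute: specification".
    attributes = {}
    for entry in entries[1:]:
        name, spec = entry.split(":", 1)
        spec = spec.strip()
        if spec in ("ignore", "continuous") or spec.startswith("discrete"):
            attributes[name.strip()] = spec       # keyword specification
        else:
            attributes[name.strip()] = [v.strip() for v in spec.split(",")]
    return classes, attributes


classes, attributes = parse_names(GOLF_NAMES)
print(classes)      # ['Play', "Don't Play"]
print(attributes)
```

Keyword specifications are kept as strings and explicit value lists become Python lists, so a caller can tell the two cases apart with a simple type check.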

  5. C4.5 Files (cont)
     • Data file (filestem.data)
       • Describes the training cases used to generate the decision tree and/or rules.
       • Each line describes one case, providing values for all the attributes and then the case's class, separated by commas and terminated by a period.
       • Attribute values must appear in the same order in which the attributes were given in the names file.
       • Use ? to mark missing or unknown values.
     • Test file (filestem.test)
       • Used to evaluate the classifier you have produced.
       • Has exactly the same format as the data file.

  6. Example: Golf.data

         | outlook, temperature, humidity, windy, class label
         sunny, 85, 85, false, Don't Play
         sunny, 80, 90, true, Don't Play
         overcast, 83, 78, false, Play
         rain, 70, 96, ?, Play
         rain, 68, ?, false, Play
         rain, 65, 70, true, Don't Play
         overcast, 64, 65, true, Play
         sunny, 72, 95, false, Don't Play
         sunny, 69, 70, false, Play
         overcast, 72, 90, true, Play
         overcast, 81, 75, false, Play
         rain, 71, 80, true, Don't Play
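As an illustration of the data-file format, the sketch below turns a few of the Golf.data lines into Python values. It is an assumption-laden example rather than C4.5's own reader: the attribute order is hard-coded from the Golf.names example, "?" is mapped to None, and the helper name parse_case is hypothetical.

```python
# Minimal sketch of reading C4.5-style data lines (not the C4.5 reader).
# Assumptions: attribute order copied from the Golf.names example,
# "?" means missing, and parse_case is a made-up helper name.

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
CONTINUOUS = {"temperature", "humidity"}

GOLF_DATA = """\
| outlook, temperature, humidity, windy, class label
sunny, 85, 85, false, Don't Play
rain, 70, 96, ?, Play
rain, 68, ?, false, Play
"""


def parse_case(line):
    """Split one data line into (attribute dict, class label)."""
    fields = [f.strip() for f in line.strip().rstrip(".").split(",")]
    *values, label = fields                 # the class label comes last
    case = {}
    for name, raw in zip(ATTRIBUTES, values):
        if raw == "?":
            case[name] = None               # missing or unknown value
        elif name in CONTINUOUS:
            case[name] = float(raw)         # numeric attribute
        else:
            case[name] = raw                # discrete attribute
    return case, label


cases = [
    parse_case(line)
    for line in GOLF_DATA.splitlines()
    if line.strip() and not line.lstrip().startswith("|")  # skip comments
]
for case, label in cases:
    print(label, case)
```

The same function would apply unchanged to a .test file, since the slides state that both files share one format.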

  7. Running the programs
     • C4.5: decision tree generation

         c4.5 -f filestem [-u]

       • -f filestem (default: DF): specifies the filestem of the task
       • -u (default: no test set): use this option when a test file has been prepared
     • Examples:
       • Training only: c4.5 -f ../Data/vote
       • Training and testing: c4.5 -f ../Data/vote -u
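When the program is driven from a script, the two invocations above can be assembled programmatically. The sketch below only builds and prints the command lines; it uses just the -f and -u options named on the slide, and the helper name c45_command is hypothetical. Actually launching the program (for instance with subprocess.run) would additionally assume that a c4.5 binary is installed and on the PATH.

```python
# Minimal sketch: build the c4.5 command lines shown on the slide.
# c45_command is a hypothetical helper; only -f and -u are used.
import shlex


def c45_command(filestem, with_test=False, program="c4.5"):
    """Return the argument list for training, optionally with testing."""
    argv = [program, "-f", filestem]
    if with_test:
        argv.append("-u")       # evaluate on filestem.test as well
    return argv


train_only = c45_command("../Data/vote")
train_and_test = c45_command("../Data/vote", with_test=True)

print(shlex.join(train_only))       # c4.5 -f ../Data/vote
print(shlex.join(train_and_test))   # c4.5 -f ../Data/vote -u

# To run it for real (assumes c4.5 is installed):
#   import subprocess
#   subprocess.run(train_and_test, check=True)
```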

  8. c4.5 output

         C4.5 [release 8] decision tree generator   Fri Sep 12 12:02:31 2003
         ----------------------------------------

             Options:
                 File stem <../Data/vote>

         Read 300 cases (16 attributes) from ../Data/vote.data

         Decision Tree:

         physician fee freeze = n:
         |   adoption of the budget resolution = y: democrat (151.0)
         |   adoption of the budget resolution = u: democrat (1.0)
         |   adoption of the budget resolution = n:
         |   |   education spending = n: democrat (6.0)
         |   |   education spending = y: democrat (9.0)
         |   |   education spending = u: republican (1.0)
         physician fee freeze = y:
         |   synfuels corporation cutback = n: republican (97.0/3.0)
         |   synfuels corporation cutback = u: republican (4.0)
         |   synfuels corporation cutback = y:
         |   |   duty free exports = y: democrat (2.0)
         |   |   duty free exports = u: republican (1.0)
         |   |   duty free exports = n:
         |   |   |   education spending = n: democrat (5.0/2.0)
         |   |   |   education spending = y: republican (13.0/2.0)
         |   |   |   education spending = u: democrat (1.0)
         physician fee freeze = u:
         |   water project cost sharing = n: democrat (0.0)
         |   water project cost sharing = y: democrat (4.0)
         |   water project cost sharing = u:
         |   |   mx missile = n: republican (0.0)
         |   |   mx missile = y: democrat (3.0/1.0)
         |   |   mx missile = u: republican (2.0)

     • The numbers at the leaves have the form (N) or (N/E):
       • N is the sum of the cases that reach the leaf.
       • E is the number of those cases that belong to classes other than the nominated class.
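The (N) and (N/E) annotations can be checked mechanically. The sketch below is an illustration only: the regular expression and the helper name leaf_counts are assumptions, not part of C4.5. It scans the tree printed on this slide and sums N and E over all leaves. The totals should match the 300 training cases read and the 8 training errors reported for the unpruned tree.

```python
# Minimal sketch: sum the (N) / (N/E) leaf annotations of the printed tree.
# The regex and leaf_counts are illustrative assumptions, not C4.5 code.
import re

TREE = """\
physician fee freeze = n:
|   adoption of the budget resolution = y: democrat (151.0)
|   adoption of the budget resolution = u: democrat (1.0)
|   adoption of the budget resolution = n:
|   |   education spending = n: democrat (6.0)
|   |   education spending = y: democrat (9.0)
|   |   education spending = u: republican (1.0)
physician fee freeze = y:
|   synfuels corporation cutback = n: republican (97.0/3.0)
|   synfuels corporation cutback = u: republican (4.0)
|   synfuels corporation cutback = y:
|   |   duty free exports = y: democrat (2.0)
|   |   duty free exports = u: republican (1.0)
|   |   duty free exports = n:
|   |   |   education spending = n: democrat (5.0/2.0)
|   |   |   education spending = y: republican (13.0/2.0)
|   |   |   education spending = u: democrat (1.0)
physician fee freeze = u:
|   water project cost sharing = n: democrat (0.0)
|   water project cost sharing = y: democrat (4.0)
|   water project cost sharing = u:
|   |   mx missile = n: republican (0.0)
|   |   mx missile = y: democrat (3.0/1.0)
|   |   mx missile = u: republican (2.0)
"""

# A leaf line ends with "class (N)" or "class (N/E)".
LEAF = re.compile(r":\s*(\w+)\s*\((\d+(?:\.\d+)?)(?:/(\d+(?:\.\d+)?))?\)\s*$")


def leaf_counts(tree_text):
    """Return a list of (class, N, E) tuples, with E = 0 when omitted."""
    leaves = []
    for line in tree_text.splitlines():
        m = LEAF.search(line)
        if m:
            leaves.append((m.group(1), float(m.group(2)), float(m.group(3) or 0)))
    return leaves


leaves = leaf_counts(TREE)
total_cases = sum(n for _, n, _ in leaves)
total_errors = sum(e for _, _, e in leaves)
print(len(leaves), total_cases, total_errors)   # 17 300.0 8.0
```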

  9. c4.5 output (cont)

         Simplified Decision Tree:

         physician fee freeze = n: democrat (168.0/2.6)
         physician fee freeze = y: republican (123.0/13.9)
         physician fee freeze = u:
         |   mx missile = n: democrat (3.0/1.1)
         |   mx missile = y: democrat (4.0/2.2)
         |   mx missile = u: republican (2.0/1.0)

  10. c4.5 output (cont)

         Evaluation on training data (300 items):

              Before Pruning           After Pruning
             ----------------   ---------------------------
             Size      Errors   Size      Errors   Estimate

               25    8( 2.7%)      7   13( 4.3%)    ( 6.9%)   <<

         Evaluation on test data (135 items):

              Before Pruning           After Pruning
             ----------------   ---------------------------
             Size      Errors   Size      Errors   Estimate

               25    7( 5.2%)      7    4( 3.0%)    ( 6.9%)   <<

               (a)  (b)    <-classified as
              ---- ----
                80    3    (a): class democrat
                 1   51    (b): class republican

  11. Running the programs (cont)
     • C4.5rules: rule induction
       • Should only be run after the decision tree program c4.5, since it reads the file containing the unpruned tree (filestem.unpruned).

         c4.5rules -f filestem [-u]

       • Example: c4.5rules -f ../Data/vote

  12. c4.5rules output

         C4.5 [release 8] rule generator   Fri Sep 12 12:07:10 2003
         -------------------------------

             Options:
                 File stem <../Data/vote>

         Read 300 cases (16 attributes) from ../Data/vote

         ------------------
         Processing tree 0

         Final rules from tree 0:

         Rule 2:
                 physician fee freeze = n
                 ->  class democrat  [98.4%]

         Rule 9:
                 synfuels corporation cutback = y
                 duty free exports = y
                 ->  class democrat  [97.5%]

         …

         Rule 13:
                 physician fee freeze = u
                 mx missile = u
                 ->  class republican  [50.0%]

         Default class: democrat
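To make the role of the rules and the default class concrete, here is a minimal sketch of how such a rule list might be applied to a case. It assumes a first-match policy (the first rule whose conditions all hold decides the class, otherwise the default class is used) and includes only the three rules visible on this slide; the elided rules are not reconstructed. The names RULES and classify are hypothetical.

```python
# Minimal sketch of applying an ordered rule list with a default class.
# Assumption: first matching rule wins. Only the three rules printed on
# the slide are included; the rules elided by "…" are deliberately absent.

RULES = [
    ("Rule 2", {"physician fee freeze": "n"}, "democrat"),
    ("Rule 9", {"synfuels corporation cutback": "y",
                "duty free exports": "y"}, "democrat"),
    ("Rule 13", {"physician fee freeze": "u",
                 "mx missile": "u"}, "republican"),
]
DEFAULT_CLASS = "democrat"


def classify(case):
    """Return (predicted class, name of the rule that fired)."""
    for name, conditions, label in RULES:
        if all(case.get(attr) == value for attr, value in conditions.items()):
            return label, name
    return DEFAULT_CLASS, "default"


print(classify({"physician fee freeze": "n"}))
print(classify({"physician fee freeze": "u", "mx missile": "u"}))
print(classify({"physician fee freeze": "y"}))   # no listed rule covers it
```

With only a subset of the rules, the third case falls through to the default class. With the full rule set it might be covered by one of the omitted rules.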

  13. c4.5rules output (cont)

         Evaluation on training data (300 items):

         Rule  Size  Error  Used  Wrong          Advantage
         ----  ----  -----  ----  -----          ---------
            2     1   1.6%   168  1 (0.6%)      -1 (0|1)    democrat
            9     2   2.5%     3  0 (0.0%)       0 (0|0)    democrat
           11     2  29.3%     3  0 (0.0%)       0 (0|0)    democrat
            5     2   5.2%    97  3 (3.1%)      21 (23|2)   republican
            7     3   6.0%    15  2 (13.3%)     11 (13|2)   republican
            3     2  18.0%     2  0 (0.0%)       2 (2|0)    republican
           13     2  50.0%     2  0 (0.0%)       2 (2|0)    republican

         Drop rule 2

         Rule  Size  Error  Used  Wrong          Advantage
         ----  ----  -----  ----  -----          ---------
            9     2   2.5%    54  0 (0.0%)       0 (0|0)    democrat
           11     2  29.3%     3  0 (0.0%)       0 (0|0)    democrat
            5     2   5.2%    97  3 (3.1%)      21 (23|2)   republican
            7     3   6.0%    15  2 (13.3%)     11 (13|2)   republican
            3     2  18.0%     3  0 (0.0%)       3 (3|0)    republican
           13     2  50.0%     2  0 (0.0%)       2 (2|0)    republican

         Tested 300, errors 9 (3.0%)   <<

               (a)  (b)    <-classified as
              ---- ----
               179    5    (a): class democrat
                 4  112    (b): class republican

         Evaluation on test data (135 items):

         Rule  Size  Error  Used  Wrong          Advantage
         ----  ----  -----  ----  -----          ---------
            9     2   2.5%    24  2 (8.3%)       0 (0|0)    democrat
           11     2  29.3%     1  0 (0.0%)       0 (0|0)    democrat
            5     2   5.2%    41  0 (0.0%)       6 (6|0)    republican
            7     3   6.0%     8  3 (37.5%)      2 (5|3)    republican
            3     2  18.0%     2  0 (0.0%)       2 (2|0)    republican

         Tested 135, errors 7 (5.2%)   <<

               (a)  (b)    <-classified as
              ---- ----
                80    3    (a): class democrat
                 4   48    (b): class republican

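  14. Confusion matrix & error rate
     • The error rate is the number of misclassified cases (the off-diagonal entries of the confusion matrix) divided by the total number of cases.
     • Error rate of this classifier on the test data: (4 + 3) / (83 + 52) = 7/135 ≈ 5.2%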
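The same arithmetic can be written as a short function. This sketch assumes the confusion matrix is given as a list of rows (true class) by columns (predicted class); the function name error_rate is hypothetical. It is applied to the two test-data matrices printed earlier in the slides, one for the rule set and one for the pruned tree.

```python
# Minimal sketch: error rate from a confusion matrix.
# Rows are true classes, columns are predicted classes;
# error_rate is a made-up helper name.

def error_rate(matrix):
    """Fraction of cases that fall off the main diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


rules_test = [[80, 3],    # true democrat:   80 correct, 3 wrong
              [4, 48]]    # true republican: 4 wrong, 48 correct
tree_test = [[80, 3],
             [1, 51]]

print(f"{error_rate(rules_test):.1%}")   # 5.2%
print(f"{error_rate(tree_test):.1%}")    # 3.0%
```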

  15. CBA
     • Classification Based on Association
     • Download at http://www.comp.nus.edu.sg/~dm2
     • Uses the same file types as c4.5, i.e., *.names, *.data, and *.test
     • Refer to the help topics
     • Discretization function: the discretization program is sometimes incompatible with some systems. If errors occur, try the DOS version of the discretizer, "discretize", under the CBA directory.

  16. Data repository online
     • UCI Machine Learning Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
