Classification by Machine Learning Approaches - Exercise Solution

Presentation Transcript


  1. Classification by Machine Learning Approaches - Exercise Solution
     Michael J. Kerner – kerner@cbs.dtu.dk
     Center for Biological Sequence Analysis
     Technical University of Denmark

  2. Exercise Solution: donors_trainset.arff - All features: trees.J48

     === Stratified cross-validation ===
     === Summary ===
     Correctly Classified Instances      4972      94.5967 %
     Incorrectly Classified Instances     284       5.4033 %
     Kappa statistic                     0.8381

     === Detailed Accuracy By Class ===
     TP Rate   FP Rate   Precision   Recall   F-Measure   Class
     0.87      0.034     0.875       0.87     0.872       true
     0.966     0.13      0.965       0.966    0.966       false

     === Confusion Matrix ===
        a      b   <-- classified as
      971    145 | a = true
      139   4001 | b = false
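
     As a side note (not part of the original slides), the same 10-fold stratified
     cross-validation can be reproduced with the Weka Java API. The sketch below is
     a minimal example assuming Weka is on the classpath; the class name RunJ48CV
     and the random seed are invented for illustration.

        import java.util.Random;
        import weka.classifiers.Evaluation;
        import weka.classifiers.trees.J48;
        import weka.core.Instances;
        import weka.core.converters.ConverterUtils.DataSource;

        // Illustrative sketch (class name invented): stratified 10-fold CV with J48.
        public class RunJ48CV {
            public static void main(String[] args) throws Exception {
                // Load the training set and mark the last attribute as the class.
                Instances data = new DataSource("donors_trainset.arff").getDataSet();
                data.setClassIndex(data.numAttributes() - 1);

                // 10-fold stratified cross-validation with default J48 settings.
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(new J48(), data, 10, new Random(1));

                // Print the same summary, per-class and confusion-matrix blocks
                // shown on the slide above.
                System.out.println(eval.toSummaryString("=== Summary ===", false));
                System.out.println(eval.toClassDetailsString());
                System.out.println(eval.toMatrixString());
            }
        }

     From the command line, java weka.classifiers.trees.J48 -t donors_trainset.arff
     performs an equivalent 10-fold cross-validation, since no separate test file is given.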

  3. Exercise Solution: donors_trainset.arff - All features: bayes.NaiveBayes

     === Stratified cross-validation ===
     === Summary ===
     Correctly Classified Instances      4910      93.417  %
     Incorrectly Classified Instances     346       6.583  %
     Kappa statistic                     0.8056

     === Detailed Accuracy By Class ===
     TP Rate   FP Rate   Precision   Recall   F-Measure   Class
     0.862     0.046     0.834       0.862    0.848       true
     0.954     0.138     0.962       0.954    0.958       false

     === Confusion Matrix ===
        a      b   <-- classified as
      962    154 | a = true
      192   3948 | b = false
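
     For reference (not on the original slide), the per-class figures follow directly
     from the confusion matrix: for the class "true", precision = 962 / (962 + 192) ≈ 0.834,
     recall (= TP rate) = 962 / (962 + 154) ≈ 0.862, and
     F-measure = 2 · 0.834 · 0.862 / (0.834 + 0.862) ≈ 0.848.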

  4. Exercise Solution: donors_trainset.arff - All features: functions.SMO

     === Stratified cross-validation ===
     === Summary ===
     Correctly Classified Instances      4986      94.863  %
     Incorrectly Classified Instances     270       5.137  %
     Kappa statistic                     0.8455

     === Detailed Accuracy By Class ===
     TP Rate   FP Rate   Precision   Recall   F-Measure   Class
     0.871     0.03      0.885       0.871    0.878       true
     0.97      0.129     0.965       0.97     0.967       false

     === Confusion Matrix ===
        a      b   <-- classified as
      972    144 | a = true
      126   4014 | b = false
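
     The kappa statistic can likewise be recomputed from the confusion matrix. The
     sketch below (class name invented for illustration) reproduces the 0.8455
     reported for SMO above:

        // Illustrative sketch: Cohen's kappa from the SMO confusion matrix (slide 4).
        public class KappaFromConfusionMatrix {
            public static void main(String[] args) {
                double tp = 972, fn = 144;   // actual true  -> classified true / false
                double fp = 126, tn = 4014;  // actual false -> classified true / false
                double n = tp + fn + fp + tn;

                // Observed agreement (= accuracy).
                double po = (tp + tn) / n;

                // Chance agreement from the row and column marginals.
                double pe = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n);

                // Cohen's kappa.
                double kappa = (po - pe) / (1 - pe);
                System.out.printf("accuracy = %.4f, kappa = %.4f%n", po, kappa);
                // Prints approximately: accuracy = 0.9486, kappa = 0.8455
            }
        }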

  5. Exercise Solution: comparison of the two feature encodings

     donors_trainset.arff - binary feature encoding:

       @RELATION donors.train
       @ATTRIBUTE -7_A {0,1}
       @ATTRIBUTE -7_T {0,1}
       @ATTRIBUTE -7_C {0,1}
       [...]
       @ATTRIBUTE 6_A {0,1}
       @ATTRIBUTE 6_T {0,1}
       @ATTRIBUTE 6_C {0,1}
       @ATTRIBUTE 6_G {0,1}
       @ATTRIBUTE class {true,false}
       @DATA
       0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,true
       0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,true
       [...]

     donors_trainset_diffencod.arff - fewer features, four (nominal) values per feature:

       @RELATION donors.train
       @ATTRIBUTE -7 {A,C,G,T}
       @ATTRIBUTE -6 {A,C,G,T}
       @ATTRIBUTE -5 {A,C,G,T}
       @ATTRIBUTE -4 {A,C,G,T}
       [...]
       @ATTRIBUTE +3 {A,C,G,T}
       @ATTRIBUTE +4 {A,C,G,T}
       @ATTRIBUTE +5 {A,C,G,T}
       @ATTRIBUTE +6 {A,C,G,T}
       @ATTRIBUTE splicesite {true,false}
       @DATA
       C,T,C,C,G,A,A,A,G,G,A,T,T,true
       T,C,A,G,A,A,G,G,A,G,G,G,C,true
       T,T,G,G,A,A,G,T,C,G,C,A,G,true
       [...]
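
     The two files encode the same sequences. As an aside (not part of the exercise),
     Weka's NominalToBinary filter can expand the compact nominal encoding into per-base
     indicator attributes, which is essentially the representation used in
     donors_trainset.arff (the generated indicators are numeric 0/1 rather than nominal
     {0,1}). A minimal sketch, with an invented class name:

        import weka.core.Instances;
        import weka.core.converters.ConverterUtils.DataSource;
        import weka.filters.Filter;
        import weka.filters.unsupervised.attribute.NominalToBinary;

        // Illustrative sketch (class name invented): expand the nominal encoding.
        public class ExpandNominalEncoding {
            public static void main(String[] args) throws Exception {
                // Load the compact encoding: one {A,C,G,T} attribute per position.
                Instances nominal =
                        new DataSource("donors_trainset_diffencod.arff").getDataSet();
                nominal.setClassIndex(nominal.numAttributes() - 1);

                // Each 4-valued nominal attribute becomes four 0/1 indicator attributes;
                // the class attribute, being set, should be left unconverted.
                NominalToBinary toBinary = new NominalToBinary();
                toBinary.setInputFormat(nominal);
                Instances binary = Filter.useFilter(nominal, toBinary);

                System.out.println("attributes before: " + nominal.numAttributes());
                System.out.println("attributes after:  " + binary.numAttributes());
            }
        }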

  6. Exercise Solution: donors_trainset_diffencod.arff - All features: trees.J48

     === Stratified cross-validation ===
     === Summary ===
     Correctly Classified Instances      4948      94.14   %
     Incorrectly Classified Instances     308       5.86   %
     Kappa statistic                     0.8248

     === Detailed Accuracy By Class ===
     TP Rate   FP Rate   Precision   Recall   F-Measure   Class
     0.862     0.037     0.862       0.862    0.862       true
     0.963     0.138     0.963       0.963    0.963       false

     === Confusion Matrix ===
        a      b   <-- classified as
      962    154 | a = true
      154   3986 | b = false

  7. Exercise Solution: donors_trainset_diffencod.arff - All features: bayes.NaiveBayes

     === Stratified cross-validation ===
     === Summary ===
     Correctly Classified Instances      4922      93.6454 %
     Incorrectly Classified Instances     334       6.3546 %
     Kappa statistic                     0.8078

     === Detailed Accuracy By Class ===
     TP Rate   FP Rate   Precision   Recall   F-Measure   Class
     0.834     0.036     0.862       0.834    0.848       true
     0.964     0.166     0.956       0.964    0.96        false

     === Confusion Matrix ===
        a      b   <-- classified as
      931    185 | a = true
      149   3991 | b = false

  8. Exercise Solution: donors_trainset_diffencod.arff - All features: functions.SMO

     === Stratified cross-validation ===
     === Summary ===
     Correctly Classified Instances      4986      94.863  %
     Incorrectly Classified Instances     270       5.137  %
     Kappa statistic                     0.8456

     === Detailed Accuracy By Class ===
     TP Rate   FP Rate   Precision   Recall   F-Measure   Class
     0.872     0.031     0.885       0.872    0.878       true
     0.969     0.128     0.966       0.969    0.967       false

     === Confusion Matrix ===
        a      b   <-- classified as
      973    143 | a = true
      127   4013 | b = false

  9. Exercise Solution: feature selection

     CfsSubsetEval, BestFirst:
       Selected features: -2_A, -1_G, 1_A, 2_A, 3_G
       Correlation coefficients:
         J48:                  0.7981
         NaiveBayes:           0.7762
         SMO:                  0.7388
         MultilayerPerceptron: 0.8053

     ClassifierSubsetEval (with NaiveBayes), BestFirst:
       Selected features: -7_A, -7_C, -6_G, -4_A, -1_G, 1_A, 1_T, 1_C, 2_A, 3_G, 4_T, 5_A
       Correlation coefficients:
         J48:                  0.7935
         NaiveBayes:           0.8033
         SMO:                  0.7597
         MultilayerPerceptron: 0.7765
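
     For reference (not on the original slide), the first selection above can be run
     programmatically with Weka's attribute-selection API; the class name below is
     invented and default evaluator/search options are assumed.

        import weka.attributeSelection.AttributeSelection;
        import weka.attributeSelection.BestFirst;
        import weka.attributeSelection.CfsSubsetEval;
        import weka.core.Instances;
        import weka.core.converters.ConverterUtils.DataSource;

        // Illustrative sketch (class name invented): CfsSubsetEval + BestFirst.
        public class SelectDonorFeatures {
            public static void main(String[] args) throws Exception {
                Instances data = new DataSource("donors_trainset.arff").getDataSet();
                data.setClassIndex(data.numAttributes() - 1);

                // Correlation-based feature subset selection with best-first search.
                AttributeSelection selector = new AttributeSelection();
                selector.setEvaluator(new CfsSubsetEval());
                selector.setSearch(new BestFirst());
                selector.SelectAttributes(data);

                // Lists the selected attributes and the search/evaluation details.
                System.out.println(selector.toResultsString());
            }
        }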

  10. Summary
      • Generally, there is no single ‘best’ method for all problems.
      • Feature representation can influence classification results.
      • Feature selection often improves classification performance, but not always.
      • Feature selection significantly speeds up classification, which also makes computationally very demanding classifiers feasible.

      Always try to test multiple methods!
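
     In the spirit of the last point, a small sketch (not part of the original slides;
     class name invented, default classifier settings, arbitrary random seed) that
     evaluates several Weka classifiers on identical stratified cross-validation folds:

        import java.util.Random;
        import weka.classifiers.Classifier;
        import weka.classifiers.Evaluation;
        import weka.classifiers.bayes.NaiveBayes;
        import weka.classifiers.functions.SMO;
        import weka.classifiers.trees.J48;
        import weka.core.Instances;
        import weka.core.converters.ConverterUtils.DataSource;

        // Illustrative sketch (class name invented): compare several classifiers.
        public class CompareClassifiers {
            public static void main(String[] args) throws Exception {
                Instances data = new DataSource("donors_trainset.arff").getDataSet();
                data.setClassIndex(data.numAttributes() - 1);

                // Same seed for every classifier, so all see identical CV folds.
                Classifier[] classifiers = { new J48(), new NaiveBayes(), new SMO() };
                for (Classifier c : classifiers) {
                    Evaluation eval = new Evaluation(data);
                    eval.crossValidateModel(c, data, 10, new Random(1));
                    System.out.printf("%-12s accuracy = %.2f %%, kappa = %.4f%n",
                            c.getClass().getSimpleName(), eval.pctCorrect(), eval.kappa());
                }
            }
        }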
