
W E K A



Presentation Transcript


  1. WEKA: Waikato Environment for Knowledge Analysis

  2. Goals of the workshop
     • Acquisition of functional knowledge about the WEKA platform
     • Ability to process (own) data in WEKA
     Diagram (data mining workflow): identifying a problem, transform into data, choose appropriate DM technique, apply to data, evaluate & interpret the results, write seminar work

  3. What is WEKA?
     Some basic facts about WEKA:
     • WEKA(1) = a flightless bird with an inquisitive nature (found only on the islands of New Zealand)
     • WEKA(2) = a software ‘workbench’ incorporating several standard ML/DM techniques
     • Authors = Ian H. Witten, Eibe Frank (et al.)
     • Programming language = Java
     • Origin = The University of Waikato, New Zealand
     • Literature = Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999
     • Homepage = http://www.cs.waikato.ac.nz/~ml/weka

  4. Objectives of WEKA
     • make ML/DM techniques generally available
     • apply them to practical problems (in agriculture)
     • develop new ML/DM algorithms
     • contribute to the theoretical framework of the field (ML/DM)

  5. Versions of WEKA
     There are several versions of WEKA:
     • WEKA 3.0: “book version”, compatible with the description in the data mining book
     • WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)
     • WEKA 3.4: “development version” with lots of improvements
     This workshop is based on WEKA 3.4(.3)

  6. The input to WEKA
     ARFF format (“flat” files):
     • example: Play-tennis domain

         % this is an example of a knowledge
         % domain in ARFF format
         @relation weather
         @attribute outlook {sunny, overcast, rainy}
         @attribute temperature real
         @attribute humidity real
         @attribute windy {TRUE, FALSE}
         @attribute play {yes, no}
         @data
         sunny,85,85,FALSE,no
         sunny,80,90,TRUE,no
         overcast,83,86,FALSE,yes
         rainy,70,96,FALSE,yes
         rainy,68,80,FALSE,yes
         rainy,65,70,TRUE,no
         overcast,64,65,TRUE,yes
         sunny,72,95,FALSE,no
         sunny,69,70,FALSE,yes
         rainy,75,80,FALSE,yes
         sunny,75,70,TRUE,yes
         overcast,72,90,TRUE,yes
         overcast,81,75,FALSE,yes
         . . .

     Conversion to the ARFF format?
     • Example: converting from MS-EXCEL to ARFF
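One possible route from a spreadsheet to ARFF is to export the sheet as CSV and generate the header from the column contents. The following is a minimal Python sketch of that idea, not the conversion procedure shown in the workshop: the function names `csv_to_arff` and `_is_number` and the sample data `SAMPLE_CSV` are hypothetical. It assumes a header row, no missing values, and values without commas or spaces.

```python
import csv
import io


def _is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row first) into ARFF text.

    A column is declared 'real' if every value parses as a number;
    otherwise it becomes a nominal attribute listing its distinct
    values in order of first appearance.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data]
        if all(_is_number(v) for v in values):
            lines.append("@attribute %s real" % name)
        else:
            distinct = list(dict.fromkeys(values))  # keeps first-seen order
            lines.append("@attribute %s {%s}" % (name, ", ".join(distinct)))

    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical CSV export of the first rows of the play-tennis table.
SAMPLE_CSV = """outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
"""

if __name__ == "__main__":
    print(csv_to_arff(SAMPLE_CSV, "weather"))
```

Nominal values are listed in the order they first occur in the data, so the declared order can differ from a hand-written header (for example `{FALSE, TRUE}` instead of `{TRUE, FALSE}`). Check the generated header before loading the file into WEKA.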

  7. Starting WEKA – the GUI

  8. A quick tour of the “explorer”
     • Preprocess panel
     Screenshot labels: Filters panel, Domain info. panel, Attribute info. panel, Attributes panel, Attribute visualization panel, Status bar, Log file

  9. A quick tour of the “explorer”
     • Classify panel
     Screenshot labels: Output panel, Classifier panel, Test options panel, Class attribute, Result panel

  10. A quick tour of the “explorer” • Visualize panel

  11. The command line

      C:\Temp>java weka.classifiers.trees.J48

      Weka exception: No training file and no object input file given.

      General options:

      -t <name of training file>
          Sets training file.
      -T <name of test file>
          Sets test file. If missing, a cross-validation will be performed on the training data.
      -c <class index>
          Sets index of class attribute (default: last).
      -x <number of folds>
          Sets number of folds for cross-validation (default: 10).
      -s <random number seed>
          Sets random number seed for cross-validation (default: 1).
      -m <name of file with cost matrix>
          Sets file with cost matrix.
      -l <name of input file>
          Sets model input file.
      -d <name of output file>
          Sets model output file.
      -v
          Outputs no statistics for training data.
      -o
          Outputs statistics only, not the classifier.
      -i
          Outputs detailed information-retrieval statistics for each class.
      -k
          Outputs information-theoretic statistics.
      -p
          Only outputs predictions for test instances.
      -r
          Only outputs cumulative margin distribution.
      -z <class name>
          Only outputs the source representation of the classifier, giving it the supplied name.
      -g
          Only outputs the graph representation of the classifier.

      Options specific to weka.classifiers.trees.J48:

      -U
          Use unpruned tree.
      -C <pruning confidence>
          Set confidence threshold for pruning. (default 0.25)
      -M <minimum number of instances>
          Set minimum number of instances per leaf. (default 2)
      -R
          Use reduced error pruning.
      -N <number of folds>
          Set number of folds for reduced error pruning. One fold is used as pruning set. (default 3)
      -B
          Use binary splits only.
      -S
          Don't perform subtree raising.
      -L
          Do not clean up after the tree has been built.

      • example:
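The slide's own example is not reproduced in the transcript. To illustrate what the `-x` (number of folds) and `-s` (random seed) options control when no `-T` test file is given, here is a minimal Python sketch of k-fold cross-validation splitting. It is an illustration of the general technique, not WEKA's implementation: the function name `cross_validation_folds` is hypothetical, the split is not stratified by class, and Python's random generator will not reproduce the folds WEKA produces for the same seed.

```python
import random


def cross_validation_folds(num_instances, num_folds=10, seed=1):
    """Yield (train_indices, test_indices) for each fold.

    Defaults mirror the -x (10 folds) and -s (seed 1) defaults listed
    above. Plain, unstratified k-fold: an illustrative assumption.
    """
    if not 2 <= num_folds <= num_instances:
        raise ValueError("need 2 <= num_folds <= num_instances")
    indices = list(range(num_instances))
    random.Random(seed).shuffle(indices)  # seed makes the split repeatable
    for fold in range(num_folds):
        test = indices[fold::num_folds]  # every num_folds-th shuffled index
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


if __name__ == "__main__":
    # 14 instances, as in the play-tennis data; 7 folds of 2 instances each.
    for train, test in cross_validation_folds(14, num_folds=7):
        print("train on", len(train), "instances, test on", sorted(test))
```

Each instance is used for testing exactly once and for training in all other folds, which is why the evaluation statistics can be computed on the training file alone.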

  12. GUI vs. command line
      Command line (+):
      • full functionality (‘saving the model’)
      • batch processing
      Command line (-):
      • only textual visualisation of models
      • awkward to use
      GUI (+):
      • visualisation of data and (some) models
      GUI (-):
      • not all the parameters can be set (reduced functionality)
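The batch-processing advantage of the command line can be sketched as a small script that generates one J48 invocation per parameter combination. This is a hedged illustration only: the helper names `j48_command` and `batch_commands` and the file name `weather.arff` are hypothetical, and the options used (`-t`, `-x`, `-C`, `-M`) are the ones listed in the J48 usage message. Running the commands assumes `java` is installed and the WEKA classes are on the classpath.

```python
import itertools


def j48_command(train_file, confidence=0.25, min_instances=2, folds=10):
    """Return the argument list for one J48 cross-validation run."""
    return [
        "java", "weka.classifiers.trees.J48",
        "-t", train_file,           # training file
        "-x", str(folds),           # folds for cross-validation
        "-C", str(confidence),      # pruning confidence threshold
        "-M", str(min_instances),   # minimum instances per leaf
    ]


def batch_commands(train_file, confidences, min_instance_counts):
    """Return one command per (confidence, min_instances) combination."""
    return [
        j48_command(train_file, confidence=c, min_instances=m)
        for c, m in itertools.product(confidences, min_instance_counts)
    ]


if __name__ == "__main__":
    for cmd in batch_commands("weather.arff", [0.1, 0.25], [2, 5]):
        print(" ".join(cmd))
        # To execute instead of print (assumes java and WEKA are set up):
        # import subprocess; subprocess.run(cmd, check=True)
```

Building each command as an argument list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.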

  13. PROs & CONs of WEKA
      PROs:
      • open source (GNU licence)
      • platform-independent (Java)
      • easy to use
      • (relatively) easy to modify
      CONs:
      • relatively slow (Java)
      • ‘incomplete’ documentation (some GUI features could be explained better)
      • some features available only from command line

  14. Let’s go to work
