
Weka

Presentation Transcript


  1. Weka: Free and Open Source ML Suite. Ian Witten & Eibe Frank, University of Waikato

  2. Overview • Classifiers, regressors, and clusterers • Multiple evaluation schemes • Bagging and boosting • Feature selection • Experimenter • Visualizer • The accompanying text is not up to date • The authors welcome additions

  3. Learning Tasks • Classification: given examples labelled from a finite domain, generate a procedure for labelling unseen examples. • Regression: given examples labelled with a real value, generate a procedure for labelling unseen examples. • Clustering: given a set of examples, partition them into “interesting” groups; this is what scientists want.
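
     A minimal sketch of the clustering task through Weka's Java API, using SimpleKMeans on the iris data. The file path data/iris.arff is an assumption (point it at your own copy), and the class label is removed first so the clusterer only sees the four measurements.

     import weka.clusterers.SimpleKMeans;
     import weka.core.Instances;
     import weka.core.converters.ConverterUtils.DataSource;
     import weka.filters.Filter;
     import weka.filters.unsupervised.attribute.Remove;

     public class ClusterIris {
         public static void main(String[] args) throws Exception {
             // Path is an assumption; adjust to wherever iris.arff lives
             Instances data = new DataSource("data/iris.arff").getDataSet();

             // Drop the class label so the clusterer only sees the measurements
             Remove remove = new Remove();
             remove.setAttributeIndices("last");
             remove.setInputFormat(data);
             Instances unlabeled = Filter.useFilter(data, remove);

             SimpleKMeans kmeans = new SimpleKMeans();
             kmeans.setNumClusters(3);        // one cluster per iris species
             kmeans.buildClusterer(unlabeled);
             System.out.println(kmeans);      // centroids and cluster sizes
         }
     }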

  4. Data Format: IRIS

     @RELATION iris
     @ATTRIBUTE sepallength REAL
     @ATTRIBUTE sepalwidth REAL
     @ATTRIBUTE petallength REAL
     @ATTRIBUTE petalwidth REAL
     @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
     @DATA
     5.1,3.5,1.4,0.2,Iris-setosa
     4.9,3.0,1.4,0.2,Iris-setosa
     4.7,3.2,1.3,0.2,Iris-setosa
     Etc.

     General form: @ATTRIBUTE attribute-name REAL, or a list of nominal values.
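
     This is Weka's ARFF format. A small sketch of loading such a file from the Java API; the path data/iris.arff is an assumption, and the class is taken to be the last attribute, as in the listing above.

     import weka.core.Instances;
     import weka.core.converters.ConverterUtils.DataSource;

     public class LoadIris {
         public static void main(String[] args) throws Exception {
             // Path is an assumption; adjust to your copy of iris.arff
             Instances data = new DataSource("data/iris.arff").getDataSet();
             // The class attribute is the last one in the listing above
             data.setClassIndex(data.numAttributes() - 1);
             System.out.println("Loaded " + data.numInstances() + " instances with "
                     + data.numAttributes() + " attributes.");
         }
     }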

  5. J48 = Decision Tree

     petalwidth <= 0.6: Iris-setosa (50.0)          # number under node
     petalwidth > 0.6                               # ../number wrong
     |   petalwidth <= 1.7
     |   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
     |   |   petallength > 4.9
     |   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
     |   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
     |   petalwidth > 1.7: Iris-virginica (46.0/1.0)
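
     A sketch of producing a tree like this from code rather than the GUI, assuming the iris data is loaded as on the previous slide; J48 is Weka's C4.5-style decision tree learner.

     import weka.classifiers.trees.J48;
     import weka.core.Instances;
     import weka.core.converters.ConverterUtils.DataSource;

     public class TrainJ48 {
         public static void main(String[] args) throws Exception {
             Instances data = new DataSource("data/iris.arff").getDataSet();
             data.setClassIndex(data.numAttributes() - 1);

             J48 tree = new J48();          // C4.5-style decision tree learner
             tree.buildClassifier(data);    // learn from all 150 instances
             System.out.println(tree);      // prints a text tree like the one above
         }
     }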

  6. Cross-validation • Correctly Classified Instances 143 (95.33 %) • Incorrectly Classified Instances 7 (4.67 %) • Default is 10-fold cross-validation, i.e. • Split the data into 10 equal-sized pieces • Train on 9 pieces and test on the remainder • Do this for all 10 possibilities and average
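
     A sketch of the same 10-fold cross-validation run from the Java API instead of the Explorer; the file path is again an assumption, and the seed is fixed only so the run is repeatable.

     import java.util.Random;
     import weka.classifiers.Evaluation;
     import weka.classifiers.trees.J48;
     import weka.core.Instances;
     import weka.core.converters.ConverterUtils.DataSource;

     public class CrossValidateJ48 {
         public static void main(String[] args) throws Exception {
             Instances data = new DataSource("data/iris.arff").getDataSet();
             data.setClassIndex(data.numAttributes() - 1);

             Evaluation eval = new Evaluation(data);
             // 10-fold cross-validation with a fixed random seed
             eval.crossValidateModel(new J48(), data, 10, new Random(1));

             System.out.println(eval.toSummaryString());  // correctly/incorrectly classified counts
             System.out.println(eval.toMatrixString());   // confusion matrix (next slide)
         }
     }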

  7. J48 Confusion Matrix

     Old data set from statistics: 50 of each class

       a  b  c   <-- classified as
      49  1  0 |  a = Iris-setosa
       0 47  3 |  b = Iris-versicolor
       0  3 47 |  c = Iris-virginica

  8. Other Evaluation Schemes • Leave-one-out cross-validation • Cross-validation where n = the number of training instances • Specific train and test sets • Allows for exact replication • OK if the train and test sets are large, e.g. in the 10,000 range
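
     A sketch of the "specific train and test set" scheme via the Java API; the two ARFF paths are placeholders for any compatible train/test pair with the same attributes.

     import weka.classifiers.Evaluation;
     import weka.classifiers.trees.J48;
     import weka.core.Instances;
     import weka.core.converters.ConverterUtils.DataSource;

     public class TrainTestSplit {
         public static void main(String[] args) throws Exception {
             // Separate train/test files are assumptions; any compatible ARFF pair works
             Instances train = new DataSource("data/train.arff").getDataSet();
             Instances test  = new DataSource("data/test.arff").getDataSet();
             train.setClassIndex(train.numAttributes() - 1);
             test.setClassIndex(test.numAttributes() - 1);

             J48 tree = new J48();
             tree.buildClassifier(train);             // train on the fixed training set

             Evaluation eval = new Evaluation(train);
             eval.evaluateModel(tree, test);          // exact, replicable evaluation
             System.out.println(eval.toSummaryString());
         }
     }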

  9. Bootstrap sampling • Randomly select n instances with replacement from the n available • Expect about 2/3 to be chosen for training • Prob. of not being chosen = (1 - 1/n)^n ≈ 1/e ≈ 0.37 • Test on the remainder • Repeat about 30 times and average • Avoids partition bias
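
     A sketch, in plain Java on top of Weka's Instances class, of a single bootstrap round as described above: draw n instances with replacement for training and test on the out-of-bag remainder (the roughly 37% never drawn). The file path and seed are assumptions; repeat with different seeds and average to follow the slide.

     import java.util.HashSet;
     import java.util.Random;
     import java.util.Set;
     import weka.core.Instances;
     import weka.core.converters.ConverterUtils.DataSource;

     public class BootstrapSample {
         public static void main(String[] args) throws Exception {
             Instances data = new DataSource("data/iris.arff").getDataSet();
             data.setClassIndex(data.numAttributes() - 1);
             int n = data.numInstances();

             Random rnd = new Random(1);
             Instances train = new Instances(data, n);   // empty copy with the same header
             Set<Integer> chosen = new HashSet<>();
             for (int i = 0; i < n; i++) {               // draw n instances with replacement
                 int idx = rnd.nextInt(n);
                 train.add(data.instance(idx));
                 chosen.add(idx);
             }

             Instances test = new Instances(data, n);    // out-of-bag instances for testing
             for (int i = 0; i < n; i++) {
                 if (!chosen.contains(i)) {
                     test.add(data.instance(i));
                 }
             }
             // Roughly 63% of distinct instances land in train, about 37% in test
             System.out.println("train (with duplicates): " + train.numInstances()
                     + ", out-of-bag test: " + test.numInstances());
         }
     }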
