
Data Mining CSCI 307, Spring 2019 Lecture 7

  1. Data Mining, CSCI 307, Spring 2019, Lecture 7. Output: Trees; WEKA intro

  2. Can Use Trees for Numeric Prediction Too • Regression: the process of computing an expression that predicts a numeric quantity • Regression tree: a "decision tree" in which each leaf predicts a numeric quantity • The predicted value is the average value of the training instances that reach the leaf • Model tree: a "regression tree" with linear regression models at the leaf nodes • The linear patches approximate a continuous function
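As the slide notes, a regression-tree leaf predicts the average target value of the training instances that reach it. A minimal sketch in plain Python (not WEKA code; the leaf's training targets below are made up for illustration):

```python
# Toy illustration: a regression-tree leaf predicts the mean target
# value of the training instances that fell into that leaf.

def leaf_prediction(target_values):
    """Average of the training targets collected at this leaf."""
    return sum(target_values) / len(target_values)

# Hypothetical leaf holding three training instances' target values:
print(leaf_prediction([24.0, 30.0, 36.0]))  # 30.0
```

A model tree replaces this constant per leaf with a small linear regression model, which is why its piecewise-linear surface fits a continuous target more closely.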

  3. Linear Regression for the CPU Data PRP = -56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH - 0.270 CHMIN + 1.46 CHMAX
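The formula above can be applied directly to a machine's attribute values; a quick sketch in Python (the attribute values in the example call are illustrative, not quoted from the slides):

```python
# Sketch: evaluate the linear regression model for the CPU data
# (coefficients copied from the slide above).

def predict_prp(MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX):
    return (-56.1 + 0.049 * MYCT + 0.015 * MMIN + 0.006 * MMAX
            + 0.630 * CACH - 0.270 * CHMIN + 1.46 * CHMAX)

# Example attribute values (for illustration only):
print(predict_prp(MYCT=125, MMIN=256, MMAX=6000, CACH=16, CHMIN=4, CHMAX=16))
```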

  4. Regression Tree for the CPU Data

  5. Model Tree for the CPU Data LM1: PRP = 8.29 + 0.004 MMAX + 2.77 CHMIN; LM2: PRP = 20.3 + 0.004 MMIN - 3.99 CHMIN + 0.946 CHMAX; LM3: PRP = 38.1 + 0.012 MMIN; LM4: PRP = 19.5 + 0.002 MMAX + 0.698 CACH + 0.969 CHMAX; LM5: PRP = 285 - 1.46 MYCT + 1.02 CACH - 9.39 CHMIN; LM6: PRP = -65.8 + 0.003 MMIN - 2.94 CHMIN + 4.98 CHMAX
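A model tree predicts by routing an instance down the tree to a leaf and evaluating that leaf's linear model. A sketch using the LM4 coefficients from the slide above; the instance values and the choice of leaf are hypothetical, since the tree's split conditions (shown as a figure in the original slides) decide which LM applies:

```python
# Sketch: each model-tree leaf stores a linear model; prediction
# evaluates the model at the leaf the instance reaches.
# Coefficients below are LM4 from the slide; instance values are made up.

LM4 = {"intercept": 19.5, "MMAX": 0.002, "CACH": 0.698, "CHMAX": 0.969}

def predict_with_leaf_model(model, instance):
    prp = model["intercept"]
    for attr, coef in model.items():
        if attr != "intercept":
            prp += coef * instance[attr]
    return prp

instance = {"MMAX": 32000, "CACH": 32, "CHMIN": 8, "CHMAX": 32}
print(predict_with_leaf_model(LM4, instance))
```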

  6. WEKA: Waikato Environment for Knowledge Analysis
     On Radius, do this once (make a WEKA folder, copy all the .arff files, copy the weka jar file):
       cd
       mkdir WEKAfiles
       cd WEKAfiles
       cp /usr/local/weka-3-8-1/data/* .
       cp /usr/local/weka-3-8-1/weka.jar weka.jar
     To run the WEKA application (cd WEKAfiles, if not there already):
       java -Xmx1000M -jar weka.jar
     To download onto a Windows or Mac computer, visit: https://www.cs.waikato.ac.nz/ml/weka/

  7. WEKA Introduction • An open-source collection of many data mining and machine learning algorithms, including • data pre-processing • classification • clustering • association rule extraction • Created by researchers at the University of Waikato in New Zealand. • Java-based (also open source).

  8. WEKA Main Features • ∼ 49 data preprocessing tools • ∼ 76 classification/regression algorithms • ∼ 8 clustering algorithms • ∼ 15 attribute/subset evaluators + 10 search algorithms for feature selection • ∼ 3 algorithms for finding association rules • 3 graphical user interfaces • "The Explorer" (exploratory data analysis) • "The Experimenter" (experimental environment) • "The Knowledge Flow" (a newer, process-model-inspired interface)

  9. WEKA

  10. WEKA Application Interface • Explorer • preprocessing, attribute selection, learning, visualization • Experimenter • testing and evaluating machine learning algorithms • Knowledge Flow • visual design of the KDD (Knowledge Discovery in Databases) process • Simple Command-line • a simple interface for typing commands

  11. WEKA Functions and Tools • Preprocessing Filters • Attribute selection • Classification/Regression • Clustering • Association discovery • Visualization

  12. WEKA: Pros and Cons • Pros • Open source and free • Extensible • Can be integrated into other Java packages • GUIs (graphical user interfaces) • Relatively easy to use • Features • Run individual experiments, or • Build KDD phases • Cons • Lack of proper and adequate documentation • Systems are updated constantly ("Kitchen Sink Syndrome")
