
Data Mining CSCI 307, Spring 2019 Lecture 7

  1. Data Mining, CSCI 307, Spring 2019, Lecture 7. Output: Trees; WEKA intro

  2. Can Use Trees for Numeric Prediction Too • Regression: the process of computing an expression that predicts a numeric quantity • Regression tree: a "decision tree" in which each leaf predicts a numeric quantity • The predicted value is the average value of the training instances that reach the leaf • Model tree: a "regression tree" with linear regression models at the leaf nodes • The linear patches approximate a continuous function
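As the slide notes, a regression-tree leaf predicts the average target value of the training instances that reach it. A minimal sketch in plain Python (not WEKA code; the leaf's training targets below are made up for illustration):

```python
# Toy illustration: a regression-tree leaf predicts the mean target
# value of the training instances that fell into that leaf.

def leaf_prediction(target_values):
    """Average of the training targets collected at this leaf."""
    return sum(target_values) / len(target_values)

# Hypothetical leaf holding three training instances' target values:
print(leaf_prediction([24.0, 30.0, 36.0]))  # 30.0
```

A model tree replaces this constant per leaf with a small linear regression model, which is why its piecewise-linear surface fits a continuous target more closely.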

  3. Linear Regression for the CPU Data PRP = -56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH - 0.270 CHMIN + 1.46 CHMAX
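The formula above can be applied directly to a machine's attribute values; a quick sketch in Python (the attribute values in the example call are illustrative, not quoted from the slides):

```python
# Sketch: evaluate the linear regression model for the CPU data
# (coefficients copied from the slide above).

def predict_prp(MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX):
    return (-56.1 + 0.049 * MYCT + 0.015 * MMIN + 0.006 * MMAX
            + 0.630 * CACH - 0.270 * CHMIN + 1.46 * CHMAX)

# Example attribute values (for illustration only):
print(predict_prp(MYCT=125, MMIN=256, MMAX=6000, CACH=16, CHMIN=4, CHMAX=16))
```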

  4. Regression Tree for the CPU Data

  5. Model Tree for the CPU Data LM1: PRP = 8.29 + 0.004 MMAX + 2.77 CHMIN; LM2: PRP = 20.3 + 0.004 MMIN - 3.99 CHMIN + 0.946 CHMAX; LM3: PRP = 38.1 + 0.012 MMIN; LM4: PRP = 19.5 + 0.002 MMAX + 0.698 CACH + 0.969 CHMAX; LM5: PRP = 285 - 1.46 MYCT + 1.02 CACH - 9.39 CHMIN; LM6: PRP = -65.8 + 0.003 MMIN - 2.94 CHMIN + 4.98 CHMAX
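A model tree predicts by routing an instance down the tree to a leaf and evaluating that leaf's linear model. A sketch using the LM4 coefficients from the slide above; the instance values and the choice of leaf are hypothetical, since the tree's split conditions (shown as a figure in the original slides) decide which LM applies:

```python
# Sketch: each model-tree leaf stores a linear model; prediction
# evaluates the model at the leaf the instance reaches.
# Coefficients below are LM4 from the slide; instance values are made up.

LM4 = {"intercept": 19.5, "MMAX": 0.002, "CACH": 0.698, "CHMAX": 0.969}

def predict_with_leaf_model(model, instance):
    prp = model["intercept"]
    for attr, coef in model.items():
        if attr != "intercept":
            prp += coef * instance[attr]
    return prp

instance = {"MMAX": 32000, "CACH": 32, "CHMIN": 8, "CHMAX": 32}
print(predict_with_leaf_model(LM4, instance))
```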

  6. WEKA: Waikato Environment for Knowledge Analysis
     On Radius, do this once (make a WEKA folder, copy all the .arff files, copy the weka jar file):
       cd
       mkdir WEKAfiles
       cd WEKAfiles
       cp /usr/local/weka-3-8-1/data/* .
       cp /usr/local/weka-3-8-1/weka.jar weka.jar
     To run the WEKA application (cd WEKAfiles, if not there already):
       java -Xmx1000M -jar weka.jar
     To download onto a Windows or Mac computer, visit: https://www.cs.waikato.ac.nz/ml/weka/

  7. WEKA Introduction • An open-source collection of many data mining and machine learning algorithms, including • data pre-processing • classification • clustering • association rule extraction • Created by researchers at the University of Waikato in New Zealand. • Java-based (also open source).

  8. WEKA Main Features • ∼ 49 data preprocessing tools • ∼ 76 classification/regression algorithms • ∼ 8 clustering algorithms • ∼ 15 attribute/subset evaluators + 10 search algorithms for feature selection • ∼ 3 algorithms for finding association rules • 3 graphical user interfaces • "The Explorer" (exploratory data analysis) • "The Experimenter" (experimental environment) • "The Knowledge Flow" (a newer, process-model-inspired interface)

  9. WEKA

  10. WEKA Application Interface • Explorer • preprocessing, attribute selection, learning, visualization • Experimenter • testing and evaluating machine learning algorithms • Knowledge Flow • visual design of the KDD (Knowledge Discovery in Databases) process • Simple Command-line • a simple interface for typing commands

  11. WEKA Functions and Tools • Preprocessing Filters • Attribute selection • Classification/Regression • Clustering • Association discovery • Visualization

  12. WEKA: Pros and Cons • Pros • Open source and free • Extensible • Can be integrated into other Java packages • GUIs (graphical user interfaces) • Relatively easy to use • Features • Run individual experiments, or • Build KDD phases • Cons • Lack of proper and adequate documentation • Systems are updated constantly ("Kitchen Sink Syndrome")
