
Artificial Intelligence Project #3: Analysis of Decision Tree Learning Using WEKA

May 23, 2006



Introduction

  • Decision tree learning is a method for approximating discrete-valued target functions

  • The learned function is represented by a decision tree

  • Learned trees can also be re-represented as sets of if-then rules to improve human readability
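As an illustration of the if-then re-representation (a stand-alone sketch, not WEKA code: the nested-dict tree layout and the `to_rules` helper are hypothetical), each root-to-leaf path of a tree becomes one rule:

```python
# A minimal sketch: a decision tree as a nested dict, plus a helper that
# re-represents it as if-then rules. The tree is the standard PlayTennis example.

tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def to_rules(node, conditions=()):
    """Walk the tree; each root-to-leaf path becomes one if-then rule."""
    if not isinstance(node, dict):          # leaf: emit a finished rule
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {node}"]
    (attr, branches), = node.items()        # internal node: one attribute test
    rules = []
    for value, child in branches.items():
        rules += to_rules(child, conditions + (f"{attr}={value}",))
    return rules

for rule in to_rules(tree):
    print(rule)
```

Each printed rule is the conjunction of attribute tests along one path, e.g. `IF Outlook=Sunny AND Humidity=High THEN No`.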



An Example of Decision Tree



Decision Tree Representation (1/2)

  • Decision trees classify instances by sorting them down the tree from the root to some leaf node

  • Node

    • Specifies a test of some attribute

  • Branch

    • Corresponds to one of the possible values for this attribute


Decision Tree Representation (2/2)

Each path from the root to a leaf corresponds to a conjunction of attribute tests

The instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong) is sorted down the path satisfying (Outlook=Sunny ∧ Humidity=High), so it is classified as No

Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances

(Outlook=Sunny ∧Humidity=normal)

∨(Outlook=Overcast)

∨(Outlook=Rain ∧Wind=Weak)


[Tree diagram: Outlook at the root]

  • Outlook=Sunny → test Humidity: High → No, Normal → Yes

  • Outlook=Overcast → Yes

  • Outlook=Rain → test Wind: Strong → No, Weak → Yes

  • What is the merit of tree representation?
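The sorting-down process can be sketched as follows (an illustrative stand-alone example, not WEKA code; representing each internal node as a hypothetical `(attribute, branches)` tuple):

```python
# A decision tree classifies an instance by sorting it down the tree from the
# root, following the branch matching each attribute test, until a leaf.
# Tree and instance follow the slide's PlayTennis example.

tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain":     ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(node, instance):
    while isinstance(node, tuple):          # internal node: (attribute, branches)
        attribute, branches = node
        node = branches[instance[attribute]]
    return node                             # leaf: the class label

instance = {"Outlook": "Sunny", "Temperature": "Hot",
            "Humidity": "High", "Wind": "Strong"}
print(classify(tree, instance))             # path Sunny -> High ends in "No"
```

Note that Temperature is never tested on this path: the tree only consults the attributes that appear along the route from root to leaf.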



Appropriate Problems for Decision Tree Learning

  • Instances are represented by attribute-value pairs

  • The target function has discrete output values

  • Disjunctive descriptions may be required

  • The training data may contain errors

    • Both errors in classification of the training examples and errors in the attribute values

  • The training data may contain missing attribute values

  • Suitable for classification


Study

  • Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, M. H. Cheok et al., Nature Genetics 35, 2003.

  • Workflow: 60 leukemia patients → bone marrow samples → Affymetrix GeneChip arrays → gene expression data



Gene Expression Data

  • # of data examples

    • 120 (60: before treatment, 60: after treatment)

  • # of genes measured

    • 12600 (Affymetrix HG-U95A array)

  • Task

    • Classification between “before treatment” and “after treatment” based on gene expression pattern



Affymetrix GeneChip Arrays

  • Uses short oligonucleotides (oligos) to detect gene expression levels.

  • Each gene is probed by a set of short oligos.

  • Each gene's expression level is summarized by

    • Signal: a numerical value describing the abundance of mRNA

    • A/P call: an Absent/Present flag denoting the statistical significance of the signal



Preprocessing

  • Remove the genes having more than 60 ‘A’ (absent) calls

    • # of genes: 12600 → 3190

  • Discretization of gene expression level

    • Criterion: median gene expression value of each sample

    • 0 (low) and 1 (high)
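The two preprocessing steps can be sketched as below. This is a hedged illustration, not the authors' code: the `preprocess` helper and the `expression[gene][sample]` / `calls[gene][sample]` layout are assumptions, and the per-sample median is taken over the genes that survive filtering.

```python
from statistics import median

def preprocess(expression, calls, max_absent=60):
    """expression[g][s]: signal of gene g in sample s; calls[g][s]: 'A' or 'P'."""
    # 1) Gene filtering: keep genes with at most max_absent 'A' (absent) calls.
    keep = [g for g in range(len(expression))
            if sum(c == "A" for c in calls[g]) <= max_absent]
    filtered = [expression[g] for g in keep]
    # 2) Discretization: threshold each sample at its own median expression
    #    value, giving 0 (low) and 1 (high).
    n_samples = len(filtered[0])
    thresholds = [median(filtered[g][s] for g in range(len(filtered)))
                  for s in range(n_samples)]
    binary = [[int(filtered[g][s] > thresholds[s]) for s in range(n_samples)]
              for g in range(len(filtered))]
    return keep, binary
```

For example, with 4 genes, 2 samples, and `max_absent=1`, a gene with two 'A' calls is dropped before the per-sample medians are computed.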



Gene Filtering

  • Using mutual information between each gene and the class

    • Empirical (estimated) probabilities were used.

    • # of genes: 3190 → 1000

  • Final dataset

    • # of attributes: 1001 (1000 genes plus one class attribute)

      • Class: 0 (after treatment), 1 (before treatment)

    • # of data examples: 120
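A sketch of this filtering step (illustrative only; `mutual_information` and `top_genes` are hypothetical names, and both genes and labels are assumed already binarized to 0/1):

```python
from math import log2

def mutual_information(xs, ys):
    """I(X;Y) in bits, from empirical probabilities of two binary sequences."""
    n = len(xs)
    mi = 0.0
    for x in (0, 1):
        px = sum(v == x for v in xs) / n
        for y in (0, 1):
            py = sum(v == y for v in ys) / n
            pxy = sum(a == x and b == y for a, b in zip(xs, ys)) / n
            if pxy > 0:                     # 0 * log 0 contributes nothing
                mi += pxy * log2(pxy / (px * py))
    return mi

def top_genes(genes, labels, k=1000):
    """Indices of the k genes with highest mutual information with the class."""
    scored = sorted(range(len(genes)),
                    key=lambda g: mutual_information(genes[g], labels),
                    reverse=True)
    return scored[:k]
```

A gene whose low/high pattern matches the class labels exactly scores 1 bit; a gene independent of the class scores 0, so ranking by this score and keeping the top 1000 implements the 3190 → 1000 reduction.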


Final Dataset

  • Data matrix: 120 examples (rows) × 1000 gene-expression attributes (columns)



Materials for the Project

  • Given

    • Preprocessed microarray data file: data2.txt

  • Downloadable

    • WEKA (http://www.cs.waikato.ac.nz/ml/weka/)



Analysis of Decision Tree Learning





Submission

  • Due date: June 15 (Thu.), 12:00 (noon)

  • Report: hard copy (301-419) & e-mail

    • Run ID3, J48, and one other decision tree algorithm that has a learning parameter.

    • Show the experimental results of each algorithm. For the algorithms other than ID3, try to obtain better performance by varying the learning parameters.

    • Analyze what makes the difference between the selected algorithms.

    • E-mail: [email protected]

