Artificial Intelligence Project #3: Analysis of Decision Tree Learning Using WEKA

May 23, 2006


Introduction

  • Decision tree learning is a method for approximating discrete-valued target functions

  • The learned function is represented by a decision tree

  • A decision tree can also be re-represented as a set of if-then rules to improve human readability



Decision Tree Representation (1/2)

  • Decision trees classify instances by sorting them down the tree from the root to some leaf node

  • Node

    • Specifies a test of some attribute

  • Branch

    • Corresponds to one of the possible values for this attribute


Decision Tree Representation (2/2)

  • Each path corresponds to a conjunction of attribute tests

    • The instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong) is sorted down the path (Outlook=Sunny ∧ Humidity=High) and classified No

  • Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances

    • (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)

[Figure: decision tree — root Outlook with branches Sunny, Overcast, Rain; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes)]

  • What is the merit of tree representation?
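As an illustrative sketch (using the PlayTennis attributes from the tree above; this code is not part of the project), the path-following procedure and the equivalent disjunction of conjunctions can be checked against each other:

```python
import itertools

def tree_yes(outlook, humidity, wind):
    """Sort an instance down the tree above; True means Yes."""
    if outlook == "Sunny":
        return humidity == "Normal"   # High humidity -> No
    if outlook == "Overcast":
        return True
    return wind == "Weak"             # Outlook == "Rain"; Strong wind -> No

def dnf_yes(outlook, humidity, wind):
    """The same tree written as a disjunction of conjunctions."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

# The two representations agree on every possible instance.
for o, h, w in itertools.product(("Sunny", "Overcast", "Rain"),
                                 ("High", "Normal"),
                                 ("Strong", "Weak")):
    assert tree_yes(o, h, w) == dnf_yes(o, h, w)

print(tree_yes("Sunny", "High", "Strong"))  # False, as in the example above
```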


Appropriate Problems for Decision Tree Learning

  • Instances are represented by attribute-value pairs

  • The target function has discrete output values

  • Disjunctive descriptions may be required

  • The training data may contain errors

    • Both errors in classification of the training examples and errors in the attribute values

  • The training data may contain missing attribute values

  • Suitable for classification problems


Study

  • 60 leukemia patients → bone marrow samples → Affymetrix GeneChip arrays → gene expression data

  • Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, MH Cheok et al., Nature Genetics 35, 2003


Gene Expression Data

  • # of data examples

    • 120 (60: before treatment, 60: after treatment)

  • # of genes measured

    • 12600 (Affymetrix HG-U95A array)

  • Task

    • Classification between “before treatment” and “after treatment” based on gene expression pattern


Affymetrix GeneChip Arrays

  • Uses short oligonucleotides (oligos) to detect gene expression levels

  • Each gene is probed by a set of short oligos

  • Each gene expression level is summarized by

    • Signal: a numerical value describing the abundance of mRNA

    • A/P (Absent/Present) call: denotes the statistical significance of the signal


Preprocessing

  • Remove genes having more than 60 ‘A’ (Absent) calls

    • # of genes: 12600 → 3190

  • Discretize gene expression levels

    • Criterion: the median gene expression value of each sample

    • 0 (low) and 1 (high)
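The two preprocessing steps can be sketched as follows — a minimal illustration assuming the data are held as per-gene call lists and per-sample expression vectors (the function and variable names are invented, not from the project files):

```python
from statistics import median

def filter_absent(calls_per_gene, max_absent=60):
    """Keep the indices of genes with at most `max_absent` 'A' (Absent)
    calls across the samples (12600 -> 3190 in the project data)."""
    return [g for g, calls in enumerate(calls_per_gene)
            if calls.count("A") <= max_absent]

def discretize_by_median(sample):
    """Binarize one sample's expression values against that sample's
    own median: 1 = high, 0 = low."""
    m = median(sample)
    return [1 if v > m else 0 for v in sample]

print(discretize_by_median([2.0, 5.0, 9.0, 1.0]))  # [0, 1, 1, 0]
```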


Gene Filtering

  • Using mutual information between each gene and the class

    • Estimated (empirical) probabilities were used

    • # of genes: 3190 → 1000

  • Final dataset

    • # of attributes: 1001 (one for the class)

      • Class: 0 (after treatment), 1 (before treatment)

    • # of data examples: 120
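The ranking criterion can be sketched as below for the binarized data; probabilities are estimated from empirical counts. This is a hedged illustration of mutual information for discrete sequences, not the authors' actual code:

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """I(X;Y) in bits for two discrete sequences of equal length,
    using empirical (maximum-likelihood) probability estimates."""
    n = len(x)
    cx = Counter(x)                  # marginal counts of X
    cy = Counter(y)                  # marginal counts of Y
    cxy = Counter(zip(x, y))         # joint counts of (X, Y)
    mi = 0.0
    for (a, b), c in cxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), with p = count / n
        mi += (c / n) * log2(c * n / (cx[a] * cy[b]))
    return mi

cls = [0, 0, 1, 1]
print(mutual_information([0, 0, 1, 1], cls))  # 1.0: gene identical to class
print(mutual_information([0, 1, 0, 1], cls))  # 0.0: gene independent of class
```

Ranking all genes by this score and keeping the top 1000 yields the final attribute set.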


Final Dataset

[Figure: the data matrix — 120 examples × 1000 binarized gene attributes]


Materials for the Project

  • Given

    • Preprocessed microarray data file: data2.txt

  • Downloadable

    • WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
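WEKA's native input format is ARFF; if data2.txt needs converting, the following is a minimal sketch of writing a binarized matrix as an ARFF file (the relation name, attribute names, and file name are illustrative assumptions, not the project's actual format):

```python
def to_arff(rows, n_genes, path):
    """Write binarized examples (each row's last element is the class)
    as a minimal ARFF file that WEKA can load.
    All names here are made up for illustration."""
    with open(path, "w") as f:
        f.write("@relation leukemia\n\n")
        for i in range(n_genes):
            f.write(f"@attribute gene{i} {{0,1}}\n")
        f.write("@attribute class {0,1}\n\n@data\n")
        for row in rows:
            f.write(",".join(str(v) for v in row) + "\n")

# Two toy examples with two gene attributes each.
to_arff([[0, 1, 1], [1, 0, 0]], n_genes=2, path="demo.arff")
```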




Submission

  • Due date: June 15 (Thu.), 12:00 (noon)

  • Report: hard copy (room 301-419) & e-mail

    • Use ID3, J48, and one other decision tree algorithm with a learning parameter.

    • Show the experimental results of each algorithm. Except for ID3, try to find better performance by varying the learning parameters.

    • Analyze what makes the difference between the selected algorithms.

    • E-mail: [email protected]

