Artificial Intelligence Project #3 : Analysis of Decision Tree Learning Using WEKA

May 23, 2006
Introduction
  • Decision tree learning is a method for approximating discrete-valued target functions
  • The learned function is represented by a decision tree
  • A decision tree can also be re-represented as a set of if-then rules to improve human readability
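As a sketch of this re-representation idea, the classic PlayTennis tree can be written both as nested conditionals (the tree) and as one if-then rule per root-to-leaf path. The example below is purely illustrative; it is not part of the project data, and the function names are assumptions.

```python
# A tiny decision tree for the classic PlayTennis example, written two ways:
# as nested conditionals (the tree) and as if-then rules (the re-representation).

def classify_tree(outlook, humidity, wind):
    """Sort an instance down the tree from root to leaf."""
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    elif outlook == "Overcast":
        return "Yes"
    else:  # Rain
        return "Yes" if wind == "Weak" else "No"

def classify_rules(outlook, humidity, wind):
    """Equivalent if-then rules: one rule per root-to-leaf path."""
    rules = [
        (lambda: outlook == "Sunny" and humidity == "High", "No"),
        (lambda: outlook == "Sunny" and humidity == "Normal", "Yes"),
        (lambda: outlook == "Overcast", "Yes"),
        (lambda: outlook == "Rain" and wind == "Weak", "Yes"),
        (lambda: outlook == "Rain" and wind == "Strong", "No"),
    ]
    for condition, label in rules:
        if condition():
            return label

print(classify_tree("Sunny", "High", "Weak"))   # No
print(classify_rules("Sunny", "High", "Weak"))  # No
```

The rule list makes each classification path readable on its own, which is the human-readability benefit the slide refers to.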
Decision Tree Representation (1/2)
  • A decision tree classifies instances by sorting them down the tree from the root to some leaf node
  • Node
    • Specifies a test of some attribute
  • Branch
    • Corresponds to one of the possible values for this attribute
Decision Tree Representation (2/2)

  • Each path from the root to a leaf corresponds to a conjunction of attribute tests
    • The instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong) is sorted down the path (Outlook=Sunny ∧ Humidity=High) and is therefore classified No
  • Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances
    • (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)

[Figure: the PlayTennis decision tree. Root node Outlook with branches Sunny, Overcast, and Rain; Sunny leads to a Humidity node (High: No, Normal: Yes), Overcast leads directly to Yes, and Rain leads to a Wind node (Strong: No, Weak: Yes)]

  • What is the merit of tree representation?
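One merit is that the tree's meaning can be read off directly as a boolean formula. A minimal sketch, using the PlayTennis attribute names from the slide (the function name is an assumption):

```python
def plays_tennis(outlook, humidity, wind):
    """Evaluate the tree's disjunction of conjunctions directly."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

# The instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong)
# satisfies none of the three conjuncts, so the classification is No.
print(plays_tennis("Sunny", "High", "Strong"))  # False
```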
Appropriate Problems for Decision Tree Learning
  • Instances are represented by attribute-value pairs
  • The target function has discrete output values
  • Disjunctive descriptions may be required
  • The training data may contain errors
    • Both errors in classification of the training examples and errors in the attribute values
  • The training data may contain missing attribute values
  • Suitable for classification
Study
  • 60 leukemia patients
  • Bone marrow samples
  • Affymetrix GeneChip arrays
  • Gene expression data
  • Reference: Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, M. H. Cheok et al., Nature Genetics 35, 2003.
Gene Expression Data
  • # of data examples
    • 120 (60: before treatment, 60: after treatment)
  • # of genes measured
    • 12600 (Affymetrix HG-U95A array)
  • Task
    • Classification between “before treatment” and “after treatment” based on gene expression pattern
Affymetrix GeneChip Arrays
  • Use short oligos to detect gene expression level.
  • Each gene is probed by a set of short oligos.
  • Each gene expression level is summarized by
    • Signal: a numerical value describing the abundance of mRNA
    • A/P (Absent/Present) call: denotes the statistical significance of the signal
Preprocessing
  • Remove the genes having more than 60 ‘A’ calls
    • # of genes: 12600 → 3190
  • Discretization of gene expression level
    • Criterion: median gene expression value of each sample
    • 0 (low) and 1 (high)
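The median-based discretization can be sketched as follows. The helper name and the convention that values at or above the median map to 1 are assumptions, since the slides do not say how ties are handled:

```python
# Sketch of the per-sample median discretization described above: each gene's
# signal is mapped to 0 (low) or 1 (high) relative to that sample's median.
# The tiny vector below is made-up illustration data, not the real dataset.

def discretize_sample(signals):
    """Binarize one sample's expression values against its own median."""
    ordered = sorted(signals)
    n = len(ordered)
    median = (ordered[n // 2] if n % 2 == 1
              else (ordered[n // 2 - 1] + ordered[n // 2]) / 2)
    # Assumed tie convention: values equal to the median count as "high".
    return [1 if s >= median else 0 for s in signals]

sample = [120.0, 35.5, 980.2, 410.0, 15.3]
print(discretize_sample(sample))  # [1, 0, 1, 1, 0]
```

Applying this per sample (rather than per gene) matches the criterion stated above: each sample is compared against its own median expression value.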
Gene Filtering
  • Using mutual information
    • Estimated probabilities were used.
    • # of genes: 3190 → 1000
  • Final dataset
    • # of attributes: 1001 (one for the class)
      • Class: 0 (after treatment), 1 (before treatment)
    • # of data examples: 120
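The mutual-information filter can be sketched as follows. With the class and each discretized gene both binary, I(gene; class) can be estimated from the 2×2 joint counts; the function name is illustrative, and the exact estimator the project used is not specified in the slides:

```python
# Hedged sketch of mutual-information gene scoring on binary (0/1) data.
from collections import Counter
from math import log2

def mutual_information(gene, labels):
    """Estimate I(X;Y) in bits from empirical probabilities of two
    equal-length 0/1 sequences."""
    n = len(gene)
    joint = Counter(zip(gene, labels))
    px = Counter(gene)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

# Toy check: a gene perfectly correlated with the class carries 1 bit;
# an independent gene carries 0 bits.
labels = [0, 0, 1, 1]
print(round(mutual_information([0, 0, 1, 1], labels), 3))  # 1.0
print(round(mutual_information([0, 1, 0, 1], labels), 3))  # 0.0
```

Scoring every gene this way and keeping the 1000 highest-scoring ones would correspond to the filtering step described above.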
Materials for the Project
  • Given
    • Preprocessed microarray data file: data2.txt
  • Downloadable
    • WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
Submission
  • Due date: June 15 (Thu.), 12:00 (noon)
  • Report: hard copy (301-419) & e-mail
    • ID3, J48, and one other decision tree algorithm with learning parameters
    • Show the experimental results of each algorithm. Except for ID3, try to obtain better performance by changing the learning parameters.
    • Analyze what makes the difference between the selected algorithms.
    • E-mail : [email protected]