Artificial Intelligence Project #3 : Analysis of Decision Tree Learning Using WEKA


### Artificial Intelligence Project #3: Analysis of Decision Tree Learning Using WEKA

May 23, 2006

Introduction
• Decision tree learning is a method for approximating discrete-valued target functions
• The learned function is represented by a decision tree
• A decision tree can also be re-represented as a set of if-then rules to improve human readability
Decision Tree Representation (1/2)
• Decision trees classify instances by sorting them down the tree from the root to some leaf node
• Node: specifies a test of some attribute
• Branch: corresponds to one of the possible values of this attribute
• Each path from the root to a leaf corresponds to a conjunction of attribute tests

Example: the instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong) is sorted down the path (Outlook=Sunny ∧ Humidity=High), so it is classified No.

Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances:

(Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)
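The disjunction of conjunctions above can be written directly as a boolean predicate. The following is a minimal sketch (the function name `play_tennis` is illustrative, not from the slides):

```python
def play_tennis(outlook, humidity, wind):
    # Disjunction of conjunctions from the slide: the tree predicts "Yes"
    # exactly when one of the three root-to-"Yes"-leaf paths is satisfied.
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Sunny", "High", "Strong"))  # False, matching the example instance
```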

Decision Tree Representation (2/2)

[Decision tree for the PlayTennis example]

Outlook
├─ Sunny → Humidity
│           ├─ High → No
│           └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
            ├─ Strong → No
            └─ Weak → Yes
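The tree above can be sketched as nested dictionaries with a small recursive classifier; this is an illustrative encoding, not code from the project:

```python
# PlayTennis tree from the slide, as nested dicts: an inner node maps an
# attribute name to {attribute value: subtree}; a leaf is just "Yes"/"No".
TREE = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def classify(tree, instance):
    # Sort the instance down from the root until a leaf (a string) is reached.
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]
    return tree

inst = {"Outlook": "Sunny", "Temperature": "Hot",
        "Humidity": "High", "Wind": "Strong"}
print(classify(TREE, inst))  # "No": the path tests Outlook, then Humidity
```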

• What is the merit of the tree representation?
Appropriate Problems for Decision Tree Learning
• Instances are represented by attribute-value pairs
• The target function has discrete output values
• Disjunctive descriptions may be required
• The training data may contain errors
• Both errors in classification of the training examples and errors in the attribute values
• The training data may contain missing attribute values
• Suitable for classification
Study
• "Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells," M. H. Cheok et al., Nature Genetics 35, 2003.
• Data pipeline: 60 leukemia patients → bone marrow samples → Affymetrix GeneChip arrays → gene expression data
Gene Expression Data
• # of data examples
• 120 (60: before treatment, 60: after treatment)
• # of genes measured
• 12600 (Affymetrix HG-U95A array)
• Classification between “before treatment” and “after treatment” based on gene expression pattern
Affymetrix GeneChip Arrays
• Use short oligos to detect gene expression levels
• Each gene is probed by a set of short oligos
• Each gene's expression level is summarized by
• Signal: a numerical value describing the abundance of mRNA
• A/P call: denotes the statistical significance of the signal
Preprocessing
• Remove the genes having more than 60 ‘A’ calls
• # of genes: 12600 → 3190
• Discretization of gene expression levels
• Criterion: median gene expression value of each sample
• 0 (low) and 1 (high)
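The median-based discretization step can be sketched as follows. This is a minimal illustration assuming one list of signal values per sample; the actual layout of data2.txt may differ:

```python
import statistics

def discretize_by_sample_median(expression):
    """expression: list of samples, each a list of per-gene signal values.
    Each sample is thresholded at its own median: 0 = low, 1 = high,
    as described in the preprocessing step above."""
    binary = []
    for sample in expression:
        med = statistics.median(sample)
        binary.append([1 if v > med else 0 for v in sample])
    return binary

print(discretize_by_sample_median([[5.0, 1.0, 3.0, 9.0]]))  # [[1, 0, 0, 1]]
```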
Gene Filtering
• Using mutual information between each gene and the class label
• Empirically estimated probabilities were used
• # of genes: 3190 → 1000
• Final dataset
• # of attributes: 1001 (1000 genes + one for the class)
• Class: 0 (after treatment), 1 (before treatment)
• # of data examples: 120
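The mutual-information score used for filtering can be estimated from the discretized data as sketched below, assuming empirical (maximum-likelihood) probability estimates as stated above:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired discrete samples,
    using empirical counts as probability estimates."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with the n's cancelled out
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# toy example: a gene whose discretized level tracks the class perfectly
gene = [0, 0, 1, 1, 0, 1]
cls  = [0, 0, 1, 1, 0, 1]
print(mutual_information(gene, cls))  # 1.0 bit: maximally informative gene
```

Genes would then be ranked by this score and the top 1000 retained.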
Materials for the Project
• Given
• Preprocessed microarray data file: data2.txt
• WEKA (http://www.cs.waikato.ac.nz/ml/weka/)
Submission
• Due date: June 15 (Thu.), 12:00 (noon)
• Report: hard copy (301-419) & e-mail
• Run ID3, J48, and one other decision tree algorithm that has a learning parameter
• Show the experimental results of each algorithm. Except for ID3, try to find better performance by changing the learning parameters.
• Analyze what makes the difference between the selected algorithms
• E-mail: [email protected]