
Artificial Intelligence Project #3: Analysis of Decision Tree Learning Using WEKA

May 23, 2006

• Decision tree learning is a method for approximating discrete-valued target functions

• The learned function is represented by a decision tree

• A decision tree can also be re-represented as a set of if-then rules to improve human readability

• Decision trees classify instances by sorting them down the tree from the root to some leaf node

• Node

• Specifies a test of some attribute

• Branch

• Corresponds to one of the possible values for this attribute

• Example: the instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong) is sorted down the tree to the leaf matching (Outlook=Sunny ∧ Humidity=High), so it is classified as No

Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances

(Outlook=Sunny ∧Humidity=normal)

∨(Outlook=Overcast)

∨(Outlook=Rain ∧Wind=Weak)
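This disjunction-of-conjunctions form can be written directly as a boolean predicate. A minimal sketch in plain Python (the function name and dict-based instance encoding are illustrative, not from the slides):

```python
# The PlayTennis tree expressed as a disjunction of conjunctions
# over attribute values; instances are encoded as plain dicts.
def play_tennis(x):
    return ((x["Outlook"] == "Sunny" and x["Humidity"] == "Normal")
            or x["Outlook"] == "Overcast"
            or (x["Outlook"] == "Rain" and x["Wind"] == "Weak"))

# The instance (Sunny, Hot, High, Strong) satisfies none of the
# three conjunctions, so it is classified as No.
example = {"Outlook": "Sunny", "Temperature": "Hot",
           "Humidity": "High", "Wind": "Strong"}
print(play_tennis(example))  # False, i.e. No
```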

Decision Tree Representation (2/2)

Outlook
├── Sunny → Humidity
│     ├── High → No
│     └── Normal → Yes
├── Overcast → Yes
└── Rain → Wind
      ├── Strong → No
      └── Weak → Yes
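Sorting an instance down this tree is a simple loop over (attribute, branches) nodes. A hedged sketch in plain Python (the nested-tuple encoding is my own, not from the slides):

```python
# The PlayTennis tree: internal nodes are (attribute, {value: subtree})
# pairs, leaves are class labels.
tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain":     ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(node, instance):
    # Descend from the root, following the branch that matches the
    # instance's value for the tested attribute, until a leaf label.
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[instance[attribute]]
    return node

print(classify(tree, {"Outlook": "Sunny", "Humidity": "High"}))  # No
```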

• What are the merits of the tree representation?

• Instances are represented by attribute-value pairs

• The target function has discrete output values

• Disjunctive descriptions may be required

• The training data may contain errors

• Both errors in the classification of the training examples and errors in the attribute values

• The training data may contain missing attribute values

• Suitable for classification

Bone marrow samples

Affymetrix GeneChip arrays

Gene expression data

Study

• Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells, M. H. Cheok et al., Nature Genetics 35, 2003.

• # of data examples

• 120 (60: before treatment, 60: after treatment)

• # of genes measured

• 12600 (Affymetrix HG-U95A array)

• Classification between “before treatment” and “after treatment” based on gene expression pattern

• Uses short oligos to detect gene expression levels.

• Each gene is probed by a set of short oligos.

• Each gene's expression level is summarized by:

• Signal: a numerical value describing the abundance of mRNA

• A/P call: a Present/Absent flag denoting the statistical significance of the signal

• Remove the genes having more than 60 ‘A’ calls

• # of genes: 12600 → 3190
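The filtering rule above can be sketched as follows; a toy call matrix stands in for the real 12600 × 120 one, while the threshold of 60 'A' calls is the one from the slides:

```python
import numpy as np

# Keep a gene only if it has at most 60 'A' (absent) calls across the
# 120 samples. Three toy genes stand in for the 12600 real ones.
calls = np.array([
    ["A"] * 61 + ["P"] * 59,   # 61 'A' calls -> removed
    ["A"] * 60 + ["P"] * 60,   # 60 'A' calls -> kept
    ["P"] * 120,               # all present  -> kept
])

a_counts = (calls == "A").sum(axis=1)   # number of 'A' calls per gene
keep = a_counts <= 60                   # boolean mask over genes
print(keep)  # [False  True  True]
```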

• Discretization of gene expression level

• Criterion: median gene expression value of each sample

• 0 (low) and 1 (high)
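The per-sample median split can be sketched as below (toy numbers, not the real data; coding values exactly equal to the median as low is my assumption, since the slides leave ties unspecified):

```python
import numpy as np

# Binarize expression per sample: 1 (high) if a gene's signal exceeds
# the sample's own median, else 0 (low). Rows are samples, columns genes.
# Ties at the median are coded low here (an assumption).
expr = np.array([
    [2.0, 9.0, 5.0, 7.0],   # median 6.0
    [1.0, 3.0, 8.0, 2.0],   # median 2.5
])

medians = np.median(expr, axis=1, keepdims=True)  # per-sample median
binary = (expr > medians).astype(int)
print(binary)  # [[0 1 0 1]
               #  [0 1 1 0]]
```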

• Gene selection using mutual information between each gene and the class

• Probabilities estimated from the data were used.

• # of genes: 3190 → 1000
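One way to score and rank genes, consistent with the mutual-information criterion above (the estimator and the toy data are illustrative; the slides do not give the exact procedure):

```python
import numpy as np

def mutual_information(x, y):
    """I(X;Y) in bits for two binary 0/1 vectors, with probabilities
    estimated empirically from the data."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

# Toy binarized data: 6 samples x 3 genes, plus class labels
# (0 = after treatment, 1 = before treatment).
genes = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [0, 0, 0],
                  [1, 1, 0],
                  [1, 0, 1],
                  [1, 1, 1]])
labels = np.array([0, 0, 0, 1, 1, 1])

scores = [mutual_information(genes[:, j], labels) for j in range(3)]
top = np.argsort(scores)[::-1][:2]   # keep the 2 highest-scoring genes
print(top.tolist())  # [0, 1]: gene 0 tracks the class perfectly
```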

• Final dataset

• # of attributes: 1001 (1000 genes plus one class attribute)

• Class: 0 (after treatment), 1 (before treatment)

• # of data examples: 120

[Figure: 120 × 1000 binary data matrix (samples × genes)]

• Given

• Preprocessed microarray data file: data2.txt

• WEKA (http://www.cs.waikato.ac.nz/ml/weka/)

• Due date: June 15 (Thu.), 12:00 (noon)

• Report: hard copy (301-419) & e-mail

• Run ID3, J48, and one other decision tree algorithm that takes a learning parameter.

• Show the experimental results of each algorithm. For the algorithms other than ID3, try to find better performance by varying the learning parameters.

• Analyze what makes the difference between the selected algorithms.

• E-mail : [email protected]