Computational analysis of genome-wide expression data
Paul Pavlidis, Columbia Genome Center, pp175@columbia.edu
Lecture overview:
Microarray technology and applications
How the data is collected and what you get
“High level” analysis methods: applied to the study of human sarcoma
[Figure: panels labeled "before" and "after"; samples A, B, C, D, E and ratios to a reference R (A/R, B/R, C/R, D/R, E/R, etc.)]
Collaboration with Memorial Sloan Kettering Cancer Center
Are these types distinguishable at the level of RNA expression?
[Figure: Training set (genes × experiments, with class membership) → Learner → Model; Test set (“unknowns”; genes × experiments) → Predictor → Predicted class]
Locate a plane that separates positive from negative examples.
Focus on the examples closest to the boundary.
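The two captions above (find a separating plane, then attend to the examples nearest it) can be illustrated with a minimal linear SVM sketch. This is not the lecture's implementation: it assumes a plain sub-gradient descent on the regularized hinge loss, and the data, the function names, and the values of `lam`, `eta` and `epochs` are all made up for illustration.

```python
import math

def decision(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=1000):
    """Sub-gradient descent on the L2-regularized hinge loss.

    lam, eta and epochs are illustrative values, not tuned settings.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # An example is "violated" if it is misclassified or inside the margin.
            violated = yi * decision(w, b, xi) < 1
            w = [wj * (1 - eta * lam) for wj in w]  # regularization shrinks w
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def closest_per_class(w, b, X, y):
    """For each class, the example with the smallest distance to the plane."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    best = {}
    for xi, yi in zip(X, y):
        dist = abs(decision(w, b, xi)) / norm
        if yi not in best or dist < best[yi][0]:
            best[yi] = (dist, xi)
    return {label: xi for label, (dist, xi) in best.items()}

# Toy, made-up two-gene expression profiles (+1 and -1 are the two classes).
X = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
predictions = [1 if decision(w, b, xi) > 0 else -1 for xi in X]
closest = closest_per_class(w, b, X, y)
```

Only examples that violate the margin change `w` and `b`, which is the sense in which the fit is driven by the examples closest to the boundary. In practice a library SVM would be used instead of this loop.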
μ_i = mean expression value in class i
n_i = number of examples in class i
v = pooled variance across both classes
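The formula these symbols belong to did not survive in the text. The three quantities are exactly the ingredients of Student's two-sample t statistic with pooled variance, so the sketch below assumes that statistic; the function name and the sample values are hypothetical.

```python
import math

def pooled_t_statistic(class1, class2):
    """Student's t for one gene: (mu1 - mu2) / sqrt(v * (1/n1 + 1/n2))."""
    n1, n2 = len(class1), len(class2)
    mu1 = sum(class1) / n1
    mu2 = sum(class2) / n2
    # Pooled variance: summed squared deviations over n1 + n2 - 2 degrees of freedom.
    ss1 = sum((x - mu1) ** 2 for x in class1)
    ss2 = sum((x - mu2) ** 2 for x in class2)
    v = (ss1 + ss2) / (n1 + n2 - 2)
    return (mu1 - mu2) / math.sqrt(v * (1 / n1 + 1 / n2))

# Made-up expression values for a single gene in two classes.
t_value = pooled_t_statistic([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
```

Ranking genes by the absolute value of this statistic is one simple way to choose features before training a classifier.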
Other methods exist:
Analysis of variance, t-test variants, nonparametric methods, etc.
Welch’s t-test
Student’s t-test
Fisher’s discriminant
[Figure: leave-one-out procedure over samples 1...52]
1: hold out one sample
2: select features (22, 4,..,13) and apply the same selection to the test data
3: train SVM
4: classify held-out sample
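The four steps above can be sketched as a loop in which feature selection is repeated inside every fold, so the held-out sample never influences which genes are chosen. To stay dependency-free, a nearest-centroid classifier stands in for the SVM here; that swap, the scoring function, and the toy data are assumptions made for illustration.

```python
import math

def t_score(a, b):
    """Absolute pooled-variance t statistic for one gene (classes a and b)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    v = ss / (len(a) + len(b) - 2)
    # The small constant avoids division by zero for constant genes.
    return abs(ma - mb) / math.sqrt(v * (1 / len(a) + 1 / len(b)) + 1e-12)

def select_features(X, y, k):
    """Indices of the k genes with the highest score on the given data."""
    scores = []
    for g in range(len(X[0])):
        pos = [x[g] for x, label in zip(X, y) if label == 1]
        neg = [x[g] for x, label in zip(X, y) if label == -1]
        scores.append((t_score(pos, neg), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def nearest_centroid_predict(X, y, x_new):
    """Stand-in for the SVM: pick the class whose mean profile is closer."""
    def centroid(label):
        rows = [x for x, l in zip(X, y) if l == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x_new))
    return 1 if sq_dist(centroid(1)) < sq_dist(centroid(-1)) else -1

def leave_one_out(X, y, k):
    """Fraction of held-out samples classified correctly."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]  # 1: hold out
        genes = select_features(train_X, train_y, k)             # 2: select
        reduced = [[x[g] for g in genes] for x in train_X]
        held_out = [X[i][g] for g in genes]                      # same genes
        pred = nearest_centroid_predict(reduced, train_y, held_out)  # 3 and 4
        correct += pred == y[i]
    return correct / len(X)

# Toy data: 6 samples x 3 genes; only gene 0 separates the classes.
X = [[5.0, 1.0, 3.0], [6.0, 2.0, 1.0], [5.5, 3.0, 2.0],
     [1.0, 2.0, 3.0], [0.5, 1.0, 2.0], [1.5, 3.0, 1.0]]
y = [1, 1, 1, -1, -1, -1]
accuracy = leave_one_out(X, y, k=1)
```

Selecting features once on the full data set and then cross-validating would let the held-out sample leak into the model, which is why the selection step sits inside the loop.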
[Figure: axes: log2(number of features); number of occurrences. Legend: true positive; false positive; true positive (“refined” classes); false positive (“refined” classes)]
Are there any functional commonalities among the genes which were affected?
Given: Expression data and functional annotations (class labels) for the genes.
Task: Find the interesting gene classes.
Solution: Give each class a score.
(Browser at http://www.godatabase.org/cgibin/go.cgi)
What makes a gene class “interesting”?
[Figure: correlation-based class score. Data → class data → n*(n-1)/2 pairwise correlations → average → score → p-value]
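A correlation-based class score of this kind can be sketched as the mean of all n*(n-1)/2 pairwise correlations among the expression profiles of the genes in a class. The sketch assumes Pearson correlation, which the slide text does not specify, and the profiles are made up.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def correlation_score(profiles):
    """Average correlation over all n*(n-1)/2 gene pairs in the class."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Three made-up genes whose profiles rise together across four experiments.
coexpressed = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.0, 3.0, 5.0, 7.0]]
score = correlation_score(coexpressed)
```

A class whose genes are tightly co-expressed scores close to 1, while a class of unrelated genes scores near 0.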
[Figure: p-value-based class score. Data → test each gene → gene p-values → stats for class → average → score → p-value]
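The second scoring scheme can be sketched as follows: each gene is first tested on its own, and the class score is the average of the negative log p-values of its member genes. The base-10 logarithm, the function name, and the gene identifiers and p-values are assumptions for illustration.

```python
import math

def class_score(gene_pvalues, class_genes):
    """Mean of -log10(p) over the genes annotated to one class."""
    logs = [-math.log10(gene_pvalues[g]) for g in class_genes]
    return sum(logs) / len(logs)

# Hypothetical per-gene p-values from a test such as a t-test or ANOVA.
gene_pvalues = {"g1": 1e-5, "g2": 1e-3, "g3": 0.5, "g4": 0.9}
strong_class = class_score(gene_pvalues, ["g1", "g2"])
weak_class = class_score(gene_pvalues, ["g3", "g4"])
```

A class dominated by genes with small p-values gets a high score, and a class of unaffected genes scores near zero.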
T-cell receptor: average -log(p-value) = 4.6, p < 10^-5
Transferases: average -log(p-value) ~ 1.5, p ~ 1
[Figure: sample classes ALL B-cell, ALL T-cell, AML]
How likely are we to get a class with a given score by chance?
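One common way to answer this question is by resampling: draw many random gene sets of the same size as the class, score each one, and count how often a random set scores at least as high as the real class. The sketch below assumes that approach, which the slide text does not spell out; the function names, the number of trials, and the per-gene values are hypothetical.

```python
import random

def empirical_pvalue(score_fn, all_genes, class_size, observed,
                     trials=1000, seed=0):
    """Fraction of random same-size gene sets scoring >= the observed score."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(trials):
        sample = rng.sample(all_genes, class_size)
        if score_fn(sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

# Made-up per-gene values; the class score is their mean.
gene_values = {f"g{i}": float(i) for i in range(20)}

def mean_score(genes):
    return sum(gene_values[g] for g in genes) / len(genes)

genes = list(gene_values)
p_unreachable = empirical_pvalue(mean_score, genes, 3, observed=100.0)
p_trivial = empirical_pvalue(mean_score, genes, 3, observed=-1.0)
```

The smallest p-value this procedure can report is 1 / (trials + 1), so resolving very small p-values needs correspondingly more trials.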
Mouse brain region data (Sandberg et al.)
see Shoemaker, et al., Nature 2001 (genome issue)