Large-scale mining of gene expression patterns

Paul Pavlidis
VanBUG, September 2007

Students: Leon French, Meeta Mistry, Vaneet Lotay
Postdoc: Jesse Gillis
Undergraduates: Raymond Lim, Suzanne Lane
Programmers: Kelsey Hamer, Luke McCarthy

Genome. Synapse.











Signal transduction

Synaptic modulation


  • Connectivity database and analysis

  • Gene expression data re-use system

  • Scaling up gene coexpression analysis

  • Applications and ongoing work




With JJ Mann, V Arango, E Sibille et al.




Data from

Goals for a system

  • Researchers should be able to put their new expression data in a wider context of previous studies without extraordinary effort.

  • Move the analysis of multiple microarray data sets from a niche activity to the mainstream.

  • Integrate other data types and domain-specific information.

Public data sources


Differential expression

Challenges to comparing data sets

  • Need to match genes/transcripts across platforms

  • Data from third parties not always easy to handle

  • Varying scales, normalization, etc.

  • Varying data quality

  • Varying levels of “raw data” available

  • Selecting appropriate data to compare
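The first challenge above, matching genes across platforms, can be illustrated with a minimal sketch. It assumes each platform ships an annotation table mapping probe IDs to gene symbols, and it averages probes that map to the same gene; both the averaging rule and all probe IDs and values below are hypothetical, not taken from the talk.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_genes(probe_values, probe_to_gene):
    """Average probe-level values per gene; unannotated probes are dropped."""
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            by_gene[gene].append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


def shared_genes(first, *others):
    """Return the sorted gene symbols present in every gene-level table."""
    common = set(first)
    for table in others:
        common &= set(table)
    return sorted(common)


# Hypothetical annotations and expression values for two array designs.
platform_a_annotation = {"p1": "GRIN1", "p2": "GRIN1", "p3": "LYAR"}
platform_a_values = {"p1": 2.0, "p2": 4.0, "p3": 1.0, "p9": 5.0}  # p9 has no gene

platform_b_annotation = {"q1": "GRIN1", "q2": "ACTB"}
platform_b_values = {"q1": 6.0, "q2": 7.0}

genes_a = collapse_to_genes(platform_a_values, platform_a_annotation)
genes_b = collapse_to_genes(platform_b_values, platform_b_annotation)
comparable = shared_genes(genes_a, genes_b)
```

Only genes in `comparable` can be compared directly between the two platforms; the rest of the challenges on this slide (scale, normalization, quality) still apply to those genes.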

Which data sets are reasonable to compare?

Too general, but lots of power

All mouse data sets

Mouse brain data sets

Mouse neocortex data sets

Mouse neocortex data sets examining stress

Mouse neocortex data sets examining hypoxic stress

Mouse neocortex data sets examining hypoxic stress after 3 hours of hypoxia

Very specific, low power

Array Designs: 178

Assays (i.e., chips): 20837

Coexpression links (probe-level): >100 million

Scaling up analysis of gene coexpression

  • Genes that are coexpressed tend to have related function

    • Needed at the same place at the same time

    • “Guilt by association”

  • Reasonable to compare across studies
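As a rough illustration of what "coexpressed" means in practice, the sketch below computes a Pearson correlation between two expression profiles measured across the same samples. Pearson correlation is assumed here as the similarity measure, and the gene names and values are made up for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


# Hypothetical log-scale expression values across six samples.
profiles = {
    "RP_GENE_1": [8.1, 9.0, 7.5, 9.4, 8.8, 7.9],
    "RP_GENE_2": [7.9, 8.8, 7.2, 9.1, 8.7, 7.6],
    "UNRELATED": [5.0, 4.1, 5.2, 4.9, 4.0, 5.1],
}

r_related = pearson(profiles["RP_GENE_1"], profiles["RP_GENE_2"])
r_unrelated = pearson(profiles["RP_GENE_1"], profiles["UNRELATED"])
```

Two genes whose profiles rise and fall together across samples give a correlation close to 1; a high value within one data set is the "link" that later slides count across data sets.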

Eisen et al., 1998 PNAS

Two ribosomal protein genes.



Biological noise

  • Induced gene expression effects are often small.

  • Gene expression varies between “replicates” in biologically meaningful ways.

  • Allows us to repurpose data

Sample type

Functional coexpression should be (somewhat) generalized

  • If two genes are coexpressed under one condition, they will probably be coexpressed under at least some other conditions (or data sets).

  • Coexpression seen “only once” needs special care in interpretation.

  • We shouldn’t expect coexpression to be perfectly reproducible (for biological and technical reasons).



Genome Research, June 2004

A simple approach:

Count recurring patterns
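Counting recurring patterns can be sketched as follows, assuming each data set has already been reduced to a list of coexpression links (gene pairs that passed some per-data-set criterion). The function names, the `min_support` cutoff, and the example pairs are illustrative assumptions rather than the system's actual implementation.

```python
from collections import Counter


def count_support(links_per_dataset):
    """Count, for each gene pair, how many data sets contain it as a link."""
    support = Counter()
    for links in links_per_dataset:
        # Treat (A, B) and (B, A) as the same link and count it once per data set.
        unique_pairs = {tuple(sorted(pair)) for pair in links}
        support.update(unique_pairs)
    return support


def recurring_links(links_per_dataset, min_support=2):
    """Keep only the links observed in at least `min_support` data sets."""
    support = count_support(links_per_dataset)
    return {pair: n for pair, n in support.items() if n >= min_support}


# Hypothetical links from three data sets.
datasets = [
    [("A", "B"), ("B", "C")],
    [("B", "A")],
    [("A", "B"), ("C", "D"), ("D", "C")],
]

support = count_support(datasets)
recurring = recurring_links(datasets, min_support=2)
```

Here only the A/B link recurs; the others are seen in a single data set and would need the special care in interpretation mentioned earlier.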

Proof of concept analysis

  • 60 human data sets, 15700 RefSeq genes.

  • 70% cancer data

  • 11 million “links”

  • About 9.7 million different links

GRIN1



Allen Brain Institute

Application: analysis of imprinted genes

Laurent Journot, INSERM – Universités Montpellier

LYAR interacting proteins

Correlation p-value


Ewing et al., 2007 Molecular Systems Biology

Vote counting limitations

  • Weak evidence distributed across data sets will not be picked up.

  • This example meets strict “vote counting” criteria in only 2/23 data sets
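To make the limitation concrete, the sketch below contrasts a vote count with a pooled correlation. The pooling uses the standard fixed-effect Fisher z-transform approach, which is swapped in here purely for illustration and is not stated to be the method used in the talk. The correlations, sample sizes, and threshold are hypothetical; they are chosen so that 2 of 23 data sets pass the threshold.

```python
from math import atanh, sqrt, tanh


def vote_count(correlations, threshold):
    """Number of data sets in which the correlation meets the threshold."""
    return sum(1 for r in correlations if r >= threshold)


def pooled_correlation(correlations, sample_sizes):
    """Fixed-effect pooling of correlations via the Fisher z-transform.

    Returns (pooled_r, z_statistic). Each data set is weighted by n - 3,
    the inverse variance of its z-transformed correlation. Requires
    |r| < 1 and n > 3 for every data set.
    """
    weights = [n - 3 for n in sample_sizes]
    if any(w <= 0 for w in weights):
        raise ValueError("every data set needs more than 3 samples")
    z_values = [atanh(r) for r in correlations]
    total_weight = sum(weights)
    mean_z = sum(w * z for w, z in zip(weights, z_values)) / total_weight
    z_statistic = mean_z * sqrt(total_weight)
    return tanh(mean_z), z_statistic


# Hypothetical gene pair: strong in 2 data sets, modest in the other 21.
correlations = [0.8, 0.8] + [0.3] * 21
sample_sizes = [20] * 23

votes = vote_count(correlations, threshold=0.7)
pooled_r, z_statistic = pooled_correlation(correlations, sample_sizes)
```

With these made-up numbers the pair collects only two votes, yet the pooled statistic is large because the modest correlations all point the same way. That is the kind of distributed evidence a strict vote count discards.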


Correlation (Global)

Support (# of datasets)

Datasets

Gene pairs

Related work: Zhou XJ et al., Nat. Biotech. 2005

Summary

  • Reuse of public data: ‘adding value’

  • Meta-analysis of coexpression

  • Some applications

    • Functional prediction

    • Candidate identification

    • Platform evaluation

Ongoing and future work

  • Applications and analyses

    • Protein interactions and hubs

    • Prediction of gene function at the synapse

    • Differential expression analysis

      • Regionalization

      • Mouse models of brain injury

      • Mouse models of psychosis

  • Expanding our public database and software

    • Web-based tools for biologists; web services coming soon

  • Integration with other information sources

Thanks

  • And to:

  • NCBI GEO team

  • Groups who made data available

  • Collaborators who provided data prior to publication

    • Conrad Gilliam

    • Abraham Palmer

    • Andreas Kottmann

    • Etienne Sibille


Xiang Wan

Kelsey Hamer

Luke McCarthy

Kiran Keshav

Suzanne Lane

Meeta Mistry

Jesse Gillis

Joseph Santos

Gozde Cozen

David Quigley

Anshu Sinha

Spiro Pantazatos

Wei-Keat Lim


Homin Lee

Amy Hsu

Jon Sajdak

Jie Qin

Tzu-Lin Hsaio


Barclay Morrison

Joseph Gogos

Michael Hayden

Blair Leavitt

Tony Blau

Panos Papapanou

Answers to FAQs

  • No, they don’t have to be time course experiments.

  • Yes, we’re using cDNA as well as Affymetrix etc.

  • Yes, we see reproducible negative correlations.

  • Yes, we’re interested in finding differences as well as similarities between data sets.

  • No, we aren’t necessarily inferring regulatory relationships.

  • Yes, we know that RNA is just one way of measuring cell state.

  • No, we don’t have {worm,fly,yeast…} data, but we’d like to.