Large-scale mining of gene expression patterns


Paul Pavlidis
VanBUG, September 2007

Presentation Transcript

Students

Leon French

Meeta Mistry

Vaneet Lotay

Postdoc

Jesse Gillis

Undergraduates

Raymond Lim

Suzanne Lane

Programmers

Kelsey Hamer

Luke McCarthy


Genome

Synapse

Injury

Stress

Disease

Aging

Development

Signal transduction

Synaptic modulation


Topics

  • Connectivity database and analysis

  • Gene expression data re-use system

  • Scaling up gene coexpression analysis

  • Applications and ongoing work




[Figure: axes labeled Age, Genes, Samples. With JJ Mann, V Arango, E Sibille et al.]

[Figure: axes labeled Age, Genes, Samples. Data from http://national_databank.mclean.harvard.edu/]



Goals for a system

  • Researchers should be able to put their new expression data in a wider context of previous studies without extraordinary effort.

  • Move analyzing multiple microarray data sets from a niche activity to the mainstream

  • Integration of other data types, domain specific information.


Public data sources

Coexpression

Differential expression


Challenges to comparing data sets

  • Need to match genes/transcripts across platforms

  • Data from third parties not always easy to handle

  • Varying scales, normalization, etc.

  • Varying data quality

  • Varying levels of “raw data” available

  • Selecting appropriate data to compare
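One way to picture the first challenge (matching genes across platforms) is to collapse each platform's probes onto shared gene identifiers before comparing data sets. The sketch below is only an illustration: the probe IDs, gene IDs, and the choice to average probes that map to the same gene are assumptions, not the procedure described in the talk.

```python
from collections import defaultdict


def collapse_probes_to_genes(probe_values, probe_to_gene):
    """Average the expression profiles of all probes that map to the same gene.

    probe_values: dict of probe ID -> list of expression values (one per sample)
    probe_to_gene: dict of probe ID -> gene ID (hypothetical annotation table)
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probes without a gene annotation are skipped
        grouped[gene].append(values)
    # Average sample by sample across the probes of each gene
    return {
        gene: [sum(column) / len(column) for column in zip(*profiles)]
        for gene, profiles in grouped.items()
    }


# Hypothetical probes and values, for illustration only
probe_values = {
    "p1": [1.0, 3.0],
    "p2": [3.0, 5.0],
    "p3": [2.0, 2.0],
    "p4": [9.0, 9.0],  # has no annotation below, so it is dropped
}
probe_to_gene = {"p1": "G1", "p2": "G1", "p3": "G2"}
gene_values = collapse_probes_to_genes(probe_values, probe_to_gene)
```

Once every data set is keyed by the same gene identifiers, profiles from different array designs can at least be lined up, although scale and normalization differences still need separate handling.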




Which data sets are reasonable to compare?

Too general, but lots of power

All mouse data sets

Mouse brain data sets

Mouse neocortex data sets

Mouse neocortex data sets examining stress

Mouse neocortex data sets examining hypoxic stress

Mouse neocortex data sets examining hypoxic stress after 3 hours of hypoxia

Very specific, low power


Array Designs: 178

Assays (i.e., chips): 20837

Coexpression links (probe-level): >100 million


Scaling up analysis of gene coexpression

  • Genes that are coexpressed tend to have related function

    • Needed at the same place at the same time

    • “Guilt by association”

  • Reasonable to compare across studies
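Coexpression between two genes is commonly quantified as the correlation of their expression profiles across the samples of one data set. The following is a minimal sketch using Pearson correlation; the expression values are invented for illustration, and the talk does not state which correlation measure was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


# Hypothetical expression values for two genes across six samples
gene_a = [2.1, 3.4, 1.8, 4.0, 3.1, 2.5]
gene_b = [2.0, 3.6, 1.5, 4.2, 3.0, 2.7]
r = pearson(gene_a, gene_b)
```

A value of r near 1 means the two genes rise and fall together across samples; a value near -1 means they move in opposite directions.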

Eisen et al., 1998 PNAS

[Figure: two ribosomal protein genes; axes labeled Expression and Samples.]


Biological noise

  • Induced gene expression effects are often small.

  • Gene expression varies between “replicates” in biologically-meaningful ways.

  • Allows us to repurpose data

Sample type


Functional coexpression should be (somewhat) generalized

  • If two genes are coexpressed under one condition, they will probably be coexpressed under at least some other conditions (or data sets).

  • Coexpression seen “only once” needs special care in interpretation.

  • We shouldn’t expect coexpression to be perfectly reproducible (for biological and technical reasons)

[Figure: axes labeled Correlation.]


Genome Research, June 2004

A simple approach:

Count recurring patterns
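Counting recurring patterns can be sketched as a vote count: each data set contributes the gene pairs ("links") that pass its own coexpression threshold, and a link is kept if enough data sets support it. The gene names, the links, and the `min_support` parameter below are hypothetical, and the thresholds used in the published analysis are not reproduced here.

```python
from collections import Counter


def count_link_support(datasets):
    """Count how many data sets contain each gene pair as a link."""
    support = Counter()
    for links in datasets:
        # Sort each pair so (A, B) and (B, A) count as the same link,
        # and use a set so a data set votes at most once per link.
        unique_links = {tuple(sorted(pair)) for pair in links}
        support.update(unique_links)
    return support


def recurring_links(datasets, min_support=2):
    """Return the links supported by at least min_support data sets."""
    support = count_link_support(datasets)
    return {pair: votes for pair, votes in support.items() if votes >= min_support}


# Hypothetical links from three data sets (gene names are placeholders)
datasets = [
    [("geneA", "geneB"), ("geneA", "geneC")],
    [("geneB", "geneA"), ("geneC", "geneB")],
    [("geneA", "geneB")],
]
result = recurring_links(datasets, min_support=2)
```

Here only the geneA/geneB link recurs; the other two links are each seen in a single data set and are dropped.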



Proof of concept analysis

  • 60 human data sets, 15700 RefSeq genes.

  • 70% cancer data

  • 11 million “links”

  • About 9.7 million different links





GRIN1

ATP6V0A1

PLD3

Allen Brain Institute


Application: analysis of imprinted genes

Laurent Journot, INSERM – Universités Montpellier


LYAR interacting proteins

Correlation p-value

LYAR-interactors

Ewing et al, 2007 Molecular Systems Biology


Vote counting limitations

  • Weak evidence distributed across data sets will not be picked up.

  • This example meets strict “vote counting” criteria in only 2/23 data sets
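To see why weak but consistent evidence escapes vote counting, compare a strict per-data-set threshold with a pooled estimate. The sketch below pools correlations with the Fisher z-transform, a standard meta-analysis technique that is swapped in here for illustration; it is not necessarily the method used in the talk, and the correlations, sample sizes, and threshold are made up.

```python
from math import atanh, sqrt, tanh


def vote_count(results, threshold):
    """Number of data sets whose correlation meets a strict threshold."""
    return sum(1 for r, _n in results if r >= threshold)


def pooled_correlation(results):
    """Pool (r, n) pairs with the Fisher z-transform.

    Each correlation r from n samples is transformed to atanh(r), which has
    approximate variance 1 / (n - 3), so each data set is weighted by n - 3.
    Returns the pooled correlation and a combined z statistic.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in results:
        if n <= 3:
            raise ValueError("each data set needs more than 3 samples")
        if not -1.0 < r < 1.0:
            raise ValueError("correlations must lie strictly between -1 and 1")
        weight = n - 3
        weighted_sum += weight * atanh(r)
        total_weight += weight
    pooled_r = tanh(weighted_sum / total_weight)
    z_statistic = weighted_sum / sqrt(total_weight)
    return pooled_r, z_statistic


# Hypothetical: a modest correlation of 0.4 seen in 20 data sets of 13 samples
results = [(0.4, 13)] * 20
votes = vote_count(results, threshold=0.8)
pooled_r, z_statistic = pooled_correlation(results)
```

With these made-up numbers, no single data set passes the strict threshold, yet the pooled z statistic is large because the same modest correlation recurs in every data set.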

Correlation


Correlation (Global)

Support (# of datasets)


Datasets

Gene pairs

Related work: Zhou XJ et al., Nat.Biotech 2005


Summary

  • Reuse of public data: ‘adding value’

  • Meta-analysis of coexpression

  • Some applications

    • Functional prediction

    • Candidate identification

    • Platform evaluation


Ongoing and future work

  • Applications and analyses

    • Protein interactions and hubs

    • Prediction of gene function at the synapse

    • Differential expression analysis

      • Regionalization

      • Mouse models of brain injury

      • Mouse models of psychosis

  • Expanding our public database and software

    http://www.bioinformatics.ubc.ca/Gemma

    Web-based tools for biologists; web services coming soon

  • Integration with other information sources


Thanks

  • And to:

  • NCBI GEO team

  • Groups who made data available

  • Collaborators who provided data prior to publication

    • Conrad Gilliam

    • Abraham Palmer

    • Andreas Kottmann

    • Etienne Sibille

Gemma

Xiang Wan

Kelsey Hamer

Luke McCarthy

Kiran Keshav

Suzanne Lane

Meeta Mistry

Jesse Gillis

Joseph Santos

Gozde Cozen

David Quigley

Anshu Sinha

Spiro Pantazatos

Wei-Keat Lim

Tmm

Homin Lee

Amy Hsu

Jon Sajdak

Jie Qin

Tzu-Lin Hsaio

Collaborators

Barclay Morrison

Joseph Gogos

Michael Hayden

Blair Leavitt

Tony Blau

Panos Papapanou


Answers to FAQs

  • No, they don’t have to be time course experiments.

  • Yes, we’re using cDNA as well as Affymetrix etc.

  • Yes, we see reproducible negative correlations.

  • Yes, we’re interested in finding differences as well as similarities between data sets.

  • No, we aren’t necessarily inferring regulatory relationships.

  • Yes, we know that RNA is just one way of measuring cell state.

  • No, we don’t have {worm,fly,yeast…} data, but we’d like to.

