
Large-scale mining of gene expression patterns

Large-scale mining of gene expression patterns. Paul Pavlidis paul@bioinformatics.ubc.ca. VanBUG September 2007. Students Leon French Meeta Mistry Vaneet Lotay Postdoc Jesse Gillis Undergraduates Raymond Lim Suzanne Lane Programmers Kelsey Hamer Luke McCarthy. Genome. Synapse.




Presentation Transcript


  1. Large-scale mining of gene expression patterns Paul Pavlidis paul@bioinformatics.ubc.ca VanBUG September 2007

  2. Students: Leon French, Meeta Mistry, Vaneet Lotay. Postdoc: Jesse Gillis. Undergraduates: Raymond Lim, Suzanne Lane. Programmers: Kelsey Hamer, Luke McCarthy.

  3. Genome, Synapse, Injury, Stress, Disease, Aging, Development, Signal transduction, Synaptic modulation

  4. Topics • Connectivity database and analysis • Gene expression data re-use system • Scaling up gene coexpression analysis • Applications and ongoing work

  5. Another ‘ome

  6. Leon French, Suzanne Lane

  7. Figure (labels: Age, Genes, Samples). With JJ Mann, V Arango, E Sibille et al.

  8. Figure (labels: Age, Genes, Samples). Data from http://national_databank.mclean.harvard.edu/

  9. GEO

  10. Goals for a system • Researchers should be able to put their new expression data in the wider context of previous studies without extraordinary effort. • Move the analysis of multiple microarray data sets from a niche activity to the mainstream. • Integrate other data types and domain-specific information.

  11. Public data sources Coexpression Differential expression

  12. Challenges to comparing data sets • Need to match genes/transcripts across platforms • Data from third parties not always easy to handle • Varying scales, normalization, etc. • Varying data quality • Varying levels of “raw data” available • Selecting appropriate data to compare
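The first challenge on this slide, matching genes across platforms, can be illustrated with a minimal sketch. It assumes each platform comes with a probe-to-gene annotation table and that probes mapping to the same gene are simply averaged. The probe IDs, expression values, and the averaging rule are all hypothetical choices for illustration, not something the slides specify.

```python
from collections import defaultdict

def collapse_to_genes(probe_values, probe_to_gene):
    """Average the values of all probes annotated to the same gene.

    Probes without an annotation are dropped.
    """
    by_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            by_gene[gene].append(value)
    return {gene: sum(vals) / len(vals) for gene, vals in by_gene.items()}

# Hypothetical annotations for two array platforms
platform_a_map = {"pA_1": "GRIN1", "pA_2": "GRIN1", "pA_3": "PLD3"}
platform_b_map = {"pB_1": "GRIN1", "pB_2": "ATP6V0A1"}

# Hypothetical expression values for one sample on each platform
sample_a = {"pA_1": 7.5, "pA_2": 8.5, "pA_3": 6.0, "pA_4": 9.1}  # pA_4 is unannotated
sample_b = {"pB_1": 8.2, "pB_2": 5.4}

genes_a = collapse_to_genes(sample_a, platform_a_map)
genes_b = collapse_to_genes(sample_b, platform_b_map)

# Only genes measured on both platforms can be compared directly
shared = sorted(set(genes_a) & set(genes_b))
print(shared)
```

Averaging is only one option; given the probe specificity concerns raised later in the talk, a real system might instead filter or weight probes before collapsing them.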

  13. With Cincinnati Children’s Hospital (D. Glass, M. Barnes et al.)

  14. Probe specificity (or lack thereof)

  15. Which data sets are reasonable to compare? A spectrum from too general (but lots of power) to very specific (low power): All mouse data sets → Mouse brain data sets → Mouse neocortex data sets → Mouse neocortex data sets examining stress → Mouse neocortex data sets examining hypoxic stress → Mouse neocortex data sets examining hypoxic stress after 3 hours of hypoxia

  16. Array designs: 178 • Assays (i.e., chips): 20837 • Coexpression links (probe-level): >100 million

  17. Scaling up analysis of gene coexpression • Genes that are coexpressed tend to have related function • Needed at the same place at the same time • “Guilt by association” • Reasonable to compare across studies • Figure: two ribosomal protein genes (axes: expression, samples); Eisen et al., 1998, PNAS
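As a concrete reading of "coexpressed", here is a minimal sketch that scores two genes by the Pearson correlation of their expression profiles across samples. The slides do not say which similarity measure is used, so Pearson correlation is an assumption here, and the expression values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical log-expression values for two genes over six samples
gene_a = [7.1, 8.3, 6.9, 9.0, 8.1, 7.4]
gene_b = [6.8, 8.0, 6.5, 8.9, 7.7, 7.2]

r = pearson(gene_a, gene_b)
print(f"r = {r:.3f}")
```

Two genes whose profiles rise and fall together across samples, as in the figure on this slide, give a value of r close to 1.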

  18. Biological noise • Induced gene expression effects are often small. • Gene expression varies between “replicates” in biologically meaningful ways. • Allows us to repurpose data. • Figure (axis: sample type)

  19. Functional coexpression should be (somewhat) generalized • If two genes are coexpressed under one condition, they will probably be coexpressed under at least some other conditions (or data sets). • Coexpression seen “only once” needs special care in interpretation. • We shouldn’t expect coexpression to be perfectly reproducible (for biological and technical reasons). • Figure (axes: correlation vs. correlation)

  20. A simple approach: count recurring patterns (Genome Research, June 2004)

  21. Pipeline for one dataset

  22. Proof of concept analysis • 60 human data sets, 15700 RefSeq genes. • 70% cancer data • 11 million “links” • About 9.7 million different links
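The difference between the total number of links and the number of different links can be made concrete with a small vote-counting sketch: each data set contributes the gene pairs whose correlation passes a threshold, and a link's support is the number of data sets in which it appears. The threshold, the toy data, and the use of Pearson correlation are assumptions for illustration; the slides do not give the actual selection criteria.

```python
import math
from collections import Counter
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile (a simplification)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def links_in_dataset(expr, threshold):
    """Gene pairs with |r| >= threshold in one data set, keyed by (gene1, gene2, sign)."""
    links = set()
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            links.add((g1, g2, "+" if r > 0 else "-"))
    return links

def count_support(datasets, threshold=0.9):
    """Vote counting: number of data sets in which each link is observed."""
    support = Counter()
    for expr in datasets:
        support.update(links_in_dataset(expr, threshold))
    return support

# Three hypothetical data sets measuring the same genes A, B and C
datasets = [
    {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 1, 3, 2]},
    {"A": [5, 3, 4, 1], "B": [10, 6, 8, 2], "C": [1, 2, 1, 2]},
    {"A": [1, 2, 3, 4], "B": [4, 1, 3, 2], "C": [1, 3, 2, 4]},
]

support = count_support(datasets)
total_links = sum(support.values())   # every observation of a link counts
distinct_links = len(support)         # each link counted once
print(total_links, distinct_links, dict(support))
```

Total links exceed distinct links exactly when some links recur in more than one data set, which is the gap between the two counts on this slide.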

  23. Many links are replicated across studies

  24. Evaluation on biological grounds

  25. Cluster involving NMDAR1 (GRIN1)

  26. GRIN1 ATP6V0A1 PLD3 Allen Brain Institute

  27. Application: analysis of imprinted genes Laurent Journot, INSERM – Universités Montpellier

  28. LYAR interacting proteins • Figure (correlation p-value for LYAR interactors) • Ewing et al., 2007, Molecular Systems Biology

  29. Vote counting limitations • Weak evidence distributed across data sets will not be picked up. • This example meets strict “vote counting” criteria in only 2/23 data sets. • Figure (axis: correlation)
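To see why weak but consistent evidence is lost by vote counting, here is a minimal sketch of one standard alternative: combining per-data-set correlations with Fisher's z-transform, weighting each data set by its sample size minus 3 (a fixed-effect combination). This is a textbook technique swapped in for illustration. The slides do not say it is the method used, and the correlations and sample sizes below are hypothetical.

```python
import math

def combine_correlations(rs, ns):
    """Fixed-effect combination of Pearson correlations via Fisher's z-transform.

    rs: per-data-set correlations (each strictly between -1 and 1)
    ns: per-data-set sample sizes (each greater than 3)
    Returns (combined r, z-score, two-sided normal p-value).
    """
    if not rs or len(rs) != len(ns):
        raise ValueError("rs and ns must be non-empty and of equal length")
    weights = [n - 3 for n in ns]   # inverse variance of Fisher z is n - 3
    if any(w <= 0 for w in weights):
        raise ValueError("each data set needs more than 3 samples")
    zs = [math.atanh(r) for r in rs]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    z_score = z_bar * math.sqrt(sum(weights))
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    return math.tanh(z_bar), z_score, p_value

# Hypothetical case: a modest correlation of 0.3 in each of 23 data sets
# of 20 samples, which a strict per-data-set threshold would reject every time
rs = [0.3] * 23
ns = [20] * 23
combined_r, z_score, p_value = combine_correlations(rs, ns)
print(f"combined r = {combined_r:.2f}, z = {z_score:.2f}")
```

In this toy case no single data set would pass a stringent per-study cutoff, yet the pooled evidence is strong, which is the situation the slide describes.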

  30. Figure (labels: correlation (global); support (# of datasets))

  31. Figure (labels: datasets, gene pairs). Related work: Zhou XJ et al., Nat. Biotech. 2005

  32. Summary • Reuse of public data: ‘adding value’ • Meta-analysis of coexpression • Some applications • Functional prediction • Candidate identification • Platform evaluation

  33. Ongoing and future work • Applications and analyses • Protein interactions and hubs • Prediction of gene function at the synapse • Differential expression analysis • Regionalization • Mouse models of brain injury • Mouse models of psychosis • Expanding our public database and software: http://www.bioinformatics.ubc.ca/Gemma (web-based tools for biologists; web services coming soon) • Integration with other information sources

  34. Thanks • And to: • NCBI GEO team • Groups who made data available • Collaborators who provided data prior to publication • Conrad Gilliam • Abraham Palmer • Andreas Kottmann • Etienne Sibille • Gemma: Xiang Wan, Kelsey Hamer, Luke McCarthy, Kiran Keshav, Suzanne Lane, Meeta Mistry, Jesse Gillis, Joseph Santos, Gozde Cozen, David Quigley, Anshu Sinha, Spiro Pantazatos, Wei-Keat Lim • Tmm: Homin Lee, Amy Hsu, Jon Sajdak, Jie Qin, Tzu-Lin Hsaio • Collaborators: Barclay Morrison, Joseph Gogos, Michael Hayden, Blair Leavitt, Tony Blau, Panos Papapanou

  35. Answers to FAQs • No, they don’t have to be time course experiments. • Yes, we’re using cDNA as well as Affymetrix etc. • Yes, we see reproducible negative correlations. • Yes, we’re interested in finding differences as well as similarities between data sets. • No, we aren’t necessarily inferring regulatory relationships. • Yes, we know that RNA is just one way of measuring cell state. • No, we don’t have {worm,fly,yeast…} data, but we’d like to.
