
Answering biological questions using large genomic data collections

Curtis Huttenhower

10-05-09

Harvard School of Public Health

Department of Biostatistics

A Definition of Computational Functional Genomics

[Figure: prior knowledge and genomic data are combined to infer four kinds of relationships: gene → function, gene → gene, data → function, and function → function.]

MEFIT: A Framework for Functional Genomics

Related gene pairs:

  • BRCA1, BRCA2: 0.9
  • BRCA1, RAD51: 0.8
  • RAD51, TP53: 0.85

Unrelated gene pairs:

  • BRCA2, SOX2: 0.1
  • RAD51, FOXP2: 0.2
  • ACTR1, H6PD: 0.15

[Figure: MEFIT panel showing frequency against correlation, from low to high, for the two groups of gene pairs.]
MEFIT: A Framework for Functional Genomics

[Figure: network diagram with the nodes Biological Context (functional area, tissue, disease), Functional Relationship, and the datasets Golub 1999, Butte 2000, Whitfield 2002, and Hansen 1998.]
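
One common way to combine per-dataset evidence like this is a naive Bayes classifier. The sketch below assumes that formulation and is not MEFIT's actual code: the function name, the two-bin discretization ("low"/"high" correlation), the prior, and every probability are made up for illustration.

```python
from math import prod

def posterior_related(observations, likelihoods, prior=0.05):
    """Naive Bayes posterior that a gene pair is functionally related.

    observations: {dataset: correlation bin}, e.g. {"Golub 1999": "high"}
    likelihoods:  {dataset: {bin: (P(bin | related), P(bin | unrelated))}}
    prior:        assumed probability that a random gene pair is related
    """
    p_related = prior * prod(
        likelihoods[d][b][0] for d, b in observations.items()
    )
    p_unrelated = (1 - prior) * prod(
        likelihoods[d][b][1] for d, b in observations.items()
    )
    return p_related / (p_related + p_unrelated)

# Hypothetical per-dataset likelihoods, as would be estimated from the
# correlation histograms of known related and unrelated gene pairs.
likelihoods = {
    "Golub 1999": {"low": (0.2, 0.7), "high": (0.8, 0.3)},
    "Butte 2000": {"low": (0.4, 0.6), "high": (0.6, 0.4)},
}

both_high = posterior_related(
    {"Golub 1999": "high", "Butte 2000": "high"}, likelihoods
)
both_low = posterior_related(
    {"Golub 1999": "low", "Butte 2000": "low"}, likelihoods
)
```

A pair that is highly correlated in both datasets ends up with a higher posterior than one with low correlation in both, and a pair with no observations falls back to the prior.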

Functional Interaction Networks

Currently have data from 30,000 human experimental results, 15,000 expression conditions + 15,000 diverse others, analyzed for 200 biological functions and 150 diseases.

[Figure: MEFIT, with a global interaction network and process-specific networks for vacuolar transport, autophagy, and translation.]

Predicting Gene Function

[Figure: predicted relationships between genes, drawn as edges ranging from low to high confidence, around a set of cell cycle genes.]

These edges provide a measure of how likely a gene is to specifically participate in the process of interest.
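
A minimal sketch of how edges to a process could be turned into a per-gene score, assuming the score is simply the mean edge confidence between a candidate gene and the genes already annotated to the process. The slide does not state the formula actually used; `process_score`, the candidate names YFG1 and YFG2, and all weights are hypothetical.

```python
def process_score(gene, process_genes, weights):
    """Mean edge weight between `gene` and the other genes in a process.

    weights maps frozenset({gene_a, gene_b}) to a confidence in [0, 1];
    missing edges count as 0.
    """
    partners = [g for g in process_genes if g != gene]
    if not partners:
        return 0.0
    total = sum(weights.get(frozenset((gene, g)), 0.0) for g in partners)
    return total / len(partners)

# Illustrative process gene set and made-up edge confidences.
cell_cycle = ["CDC28", "CLN2", "CLB2"]
weights = {
    frozenset(("YFG1", "CDC28")): 0.9,
    frozenset(("YFG1", "CLN2")): 0.6,
    frozenset(("YFG1", "CLB2")): 0.3,
    frozenset(("YFG2", "CDC28")): 0.1,
}

score_yfg1 = process_score("YFG1", cell_cycle, weights)
score_yfg2 = process_score("YFG2", cell_cycle, weights)
```

Under these made-up weights YFG1 scores higher than YFG2, so it would be the stronger candidate for the cell cycle.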

Functional Associations Between Contexts

[Figure: predicted relationships between genes, with edges ranging from low to high confidence, within the cell cycle genes and between the cell cycle genes and the DNA replication genes.]

The average strength of the relationships within a process indicates how cohesive that process is.

The average strength of the relationships between two processes indicates how associated the two processes are.
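
The two averages described here can be sketched as follows, assuming plain arithmetic means over edge confidences with missing edges counted as 0. The gene names G1 to G5, the function names, and the weights are placeholders, not values from the talk.

```python
from itertools import combinations, product

def _mean_weight(pairs, weights):
    """Average confidence over gene pairs; missing edges count as 0."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(weights.get(frozenset(p), 0.0) for p in pairs) / len(pairs)

def cohesiveness(genes, weights):
    """Mean edge weight among all pairs of genes within one process."""
    return _mean_weight(combinations(genes, 2), weights)

def association(genes_a, genes_b, weights):
    """Mean edge weight over all pairs spanning two processes."""
    spanning = ((a, b) for a, b in product(genes_a, genes_b) if a != b)
    return _mean_weight(spanning, weights)

# Placeholder gene sets and made-up edge confidences.
cell_cycle = ["G1", "G2", "G3"]
dna_replication = ["G4", "G5"]
weights = {
    frozenset(("G1", "G2")): 0.8,
    frozenset(("G1", "G3")): 0.6,
    frozenset(("G2", "G3")): 0.7,
    frozenset(("G4", "G5")): 0.9,
    frozenset(("G1", "G4")): 0.5,
    frozenset(("G2", "G5")): 0.3,
}

cohesion_cc = cohesiveness(cell_cycle, weights)
assoc_cc_dr = association(cell_cycle, dna_replication, weights)
```

Comparing either number against the same statistic for random gene sets would give the genomic-background baseline that the process map on the next slide refers to.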

Functional Associations Between Processes

[Figure: network of biological processes, with two groups of example genes shown (AHP1, DOT5, GRX1, GRX2 and APE3, LAP4, PAI3, PEP4).]

Processes shown:

  • Hydrogen Transport
  • Electron Transport
  • Cellular Respiration
  • Cell Redox Homeostasis
  • Aldehyde Metabolism
  • Protein Processing
  • Peptide Metabolism
  • Vacuolar Protein Catabolism
  • Negative Regulation of Protein Metabolism
  • Energy Reserve Metabolism
  • Protein Depolymerization
  • Organelle Fusion
  • Organelle Inheritance

Legend:

  • Edges: associations between processes, from moderately strong to very strong
  • Nodes: cohesiveness of processes, from below baseline through baseline (genomic background) to very cohesive
  • Borders: data coverage of processes, from sparsely covered to well covered

Validating Human Predictions

With Erin Haley, Hilary Coller

Autophagy: 5½ of 7 predictions currently confirmed

[Figure: predicted novel autophagy proteins LAMP2 and RAB11A, alongside luciferase (negative control) and ATG5 (positive control), each shown not starved and starved (autophagic).]

Comprehensive Validation of Computational Predictions

With David Hess, Amy Caudy

[Figure: validation loop. Genomic data and prior knowledge feed computational predictions of gene function from SPELL (Hibbs et al 2007), bioPIXIE (Myers et al 2005), and MEFIT. Genes predicted to function in mitochondrion organization and biogenesis are tested in laboratory experiments (growth curves, petite frequency, confocal microscopy), and new known functions for correctly predicted genes are used for retraining.]

Evaluating the Performance of Computational Predictions

Genes involved in mitochondrion organization and biogenesis:

  • 106 original GO annotations
  • 135 under-annotations
  • 82 novel confirmations, first iteration
  • 17 novel confirmations, second iteration

340 total: >3x previously known genes in ~5 person-months
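
As a quick check of the arithmetic on this slide, the sketch below sums the four categories and compares the total with the original annotation count. It assumes that "previously known genes" refers to the 106 original GO annotations; the variable names are illustrative.

```python
# Gene counts as listed on this slide.
counts = {
    "Original GO annotations": 106,
    "Under-annotations": 135,
    "Novel confirmations, first iteration": 82,
    "Novel confirmations, second iteration": 17,
}

total = sum(counts.values())
# Assumption: "previously known" means the original GO annotations.
fold_increase = total / counts["Original GO annotations"]

print(total, round(fold_increase, 2))  # prints: 340 3.21
```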

Evaluating the Performance of Computational Predictions

Genes involved in mitochondrion organization and biogenesis:

  • 106 original GO annotations
  • 95 under-annotations
  • 40 confirmed under-annotations
  • 80 novel confirmations, first iteration
  • 17 novel confirmations, second iteration

340 total: >3x previously known genes in ~5 person-months

Computational predictions from large collections of genomic data can be accurate despite incomplete or misleading gold standards, and they continue to improve as additional data are incorporated.

Functional Maps: Focused Data Summarization

Data integration summarizes an impossibly huge amount of experimental data into an impossibly huge number of predictions; what next?

How can a researcher take advantage of all this data to study his/her favorite gene/pathway/disease without losing information?

  • Functional mapping
  • Very large collections of genomic data
  • Specific predicted molecular interactions
  • Pathway, process, or disease associations
  • Underlying experimental results and functional activities in data
Thanks!

Hilary Coller

Erin Haley

Tsheko Mutungu

Olga Troyanskaya

Matt Hibbs

Chad Myers

David Hess

Edo Airoldi

Florian Markowetz

Shuji Ogino

Charlie Fuchs

Interested? I’m accepting students and postdocs!

http://www.huttenhower.org

http://function.princeton.edu/hefalmp

NIGMS

Next Steps: Microbial Communities
  • Data integration is off to a great start in humans
    • Complex communities of distinct cell types
    • Very sparse prior knowledge
      • Concentrated in a few specific areas
    • Variation across populations
    • Critical to understand mechanisms of disease
Next Steps: Microbial Communities
  • What about microbial communities?
    • Complex communities of distinct species/strains
    • Very sparse prior knowledge
      • Concentrated in a few specific species/strains
    • Variation across populations
    • Critical to understand mechanisms of disease
Next Steps: Microbial Communities

~120 available expression datasets, ~70 species

  • Data integration works just as well in microbes as it does in humans
  • We know an awful lot about some microorganisms and almost nothing about others
  • Purely sequence-based and purely network-based tools for function transfer both fall short
  • We need data integration to take advantage of both and mine out useful biology!

[Figure: network diagrams with the gene labels DLD, LPD1, LLC1.3, ARG1, ARG2, CAR1, AGA, T21F4.1, PDPK1, PKH1, PKH2, PKH3, pdk-1, W04B5.5, and R04B3.2. Sources cited: Weskamp et al 2004, Kanehisa et al 2008, Flannick et al 2006, Tatusov et al 1997.]

Next Steps: Functional Metagenomics
  • Metagenomics: data analysis from environmental samples
    • Microflora: environment includes us!
  • Another data integration problem
    • Must include datasets from multiple organisms
  • Another context-specificity problem
    • Now “context” can also mean “species”
  • What questions can we answer?
    • How do human microflora interact with diabetes, obesity, oral health, antibiotics, aging, …?
    • What’s shared within community X? What’s different? What’s unique?
    • What’s perturbed in disease state Y? One organism, or many? Host interactions?
    • Current methods annotate ~50% of synthetic data, <5% of environmental data

[Figure: network diagram with the gene labels DLD, LPD1, LLC1.3, ARG1, ARG2, CAR1, AGA, T21F4.1, PDPK1, PKH1, PKH2, PKH3, pdk-1, W04B5.5, and R04B3.2.]