Making sense of large amounts of molecular data

Jason E. McDermott, PhD

Research Scientist

Computational Biology and Bioinformatics Group

Pacific Northwest National Laboratory

How do components of biological systems interact to produce behavior?

Nucleic Acids

Proteins

Macromolecular Complex

Molecular pathways

mTOR pathway

EGFR pathway

http://biocarta.com

Scientific Method Overview

[Figure: scientific method diagram. Labels: Hypothesis, Experimental design, Data generation, Analysis/modeling, Predictions, Interpretation.]

Circumstantial Evidence

Traditional experimental approach

Cigarette butt on street

Neighbor was eyewitness to crime

Missing jewelry from the house

Fingerprints on doorknob

High-throughput experimental approach

Cigarette sales in city

Testimony from everyone on the block

All diamonds sold over last year in 10 mile radius

Fingerprints on every surface in the house

Problem

New methods generating mountains of data

Very complex systems

Traditional methods fail in some cases

Progress will be made through better use of this data

Objectives

Formulate hypotheses for further investigation

Identify gene/protein ‘targets’

Identify pathways that drive disease

Develop systems-level biological understanding

What is a ‘target’?

‘Critical nodes’

Regulators of important processes

Outcome of modeling (a prediction) that can be used to formulate a hypothesis

What are targets used for?

Mechanistic understanding of disease processes

Potential biomarkers of disease

Potential therapeutic treatments: drug development

Examples I’ll be talking about

Bacterial virulence (Salmonella Typhimurium)

Viral pathogenesis (avian flu and SARS)

Ovarian cancer

Approaches I’ll be talking about

Machine learning

Biological networks

Data integration

Salmonella Typhimurium

[Figure: Salmonella Typhimurium infection diagram with pathogen and host sides. Processes: bacterial detection, invasion, virulence activation, bacterial survival, environmental modulation, environmental response, host defense. Detection and host-response labels: LPS, TLR4, MEK, ERK, Egr-1, iNOS, NRAMP, ROS/RNS. Pathogen labels: SPI1+, SPI2-T3S, SCV, effectors (e.g. SifA, SlrP, SseJ, SspH2), ssrA/B, ompR/envZ, phoP/Q, ydgT. Environmental signals: pH, Mg2+, Fe2+. Interactions are marked as pathogen directed or host directed.]

Type-III secretion system secreted effectors

  • SlrP, SspH2, SseI, SseJ, SifA, SifB, SpvB, SseK-1, SopD-1, InvJ, SipC
  • +25 other known effectors
  • +??? other unknown effectors

Karou Geddes

http://en.wikipedia.org/

SVM-based Discrimination

[Figure: positive and negative examples plotted on two dimensions, D1 and D2.]
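A minimal sketch of SVM-based discrimination between positive and negative examples, for readers who want to see the idea in code. It is not the SIEVE implementation: the two features (standing in for D1 and D2), the data points, and the training parameters are all made up for illustration, and the model is a plain linear SVM trained by subgradient descent on the hinge loss.

```python
# Toy linear SVM trained by batch subgradient descent on the hinge loss.
# All data and parameter values are hypothetical, chosen for illustration.

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Return (w, b) that approximately minimize lam/2*|w|^2 + mean hinge loss."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:  # point is misclassified or inside the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify x as +1 (positive) or -1 (negative)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical two-feature examples (D1, D2).
positives = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.0), (3.0, 2.0)]
negatives = [(-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0), (-2.5, -2.5)]
points = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

w, b = train_linear_svm(points, labels)
print(predict(w, b, (4.0, 4.0)), predict(w, b, (-4.0, -4.0)))
```

In practice the features would be derived from protein sequences or other measurements, and a tested library implementation would be preferable to this hand-rolled loop.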

SIEVE Validation Using CyaA Fusions

McDermott, et al. 2011. Infection and Immunity. 79(1):23-32

Niemann, et al. 2011. Infection and Immunity. 79(1):33-43

Biological Networks
  • McDermott JE, et al. 2010. Disease Markers, 28(4):253-66.
  • Types of networks
    • Regulatory networks
    • Protein-protein interaction networks
    • Biochemical reaction networks
    • Association networks
  • Network
    • Node = gene/protein or other component
    • Edge = inferred relationship between components
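The node/edge definition above maps directly onto an adjacency structure. The sketch below is one simple way to hold such a network in code; the gene names and edges are hypothetical placeholders, and the network is treated as undirected, which is an assumption that fits association networks better than regulatory ones.

```python
# A network as an adjacency map: node -> set of neighboring nodes.
# Gene names and edges are hypothetical.

def build_network(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


edges = [("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")]
network = build_network(edges)

# Degree = number of inferred relationships a component takes part in.
degree = {node: len(neighbors) for node, neighbors in network.items()}
print(degree)
```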
[Figure: data integration diagram. Data types: genome, SNVs, CNVs, methylation, mRNA, miRNA, protein, phosphorylation, metabolome. Analyses: comparison, pathway enrichment, LEAP, network analysis.]

Merging disparate observations of a system to produce a single, more informative view

Network inference method

Can we infer a relationship between two genes or proteins based on their expression profiles over a large number of different conditions?

[Figure: expression profiles of genes A, B, and C across conditions.]

Faith, J., et al. “Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles.” 2007. PLoS Biology 5:e8
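A rough sketch of the question posed on this slide: score every pair of genes by how similar their expression profiles are across conditions, and keep the pairs that pass a cutoff as inferred edges. Note that Faith et al. use the context likelihood of relatedness (CLR) method, which is built on mutual information; the sketch below swaps in plain Pearson correlation as a simpler stand-in. The profiles and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def infer_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for pairs whose |r| meets the threshold."""
    inferred = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            inferred.append((g1, g2, r))
    return inferred


# Hypothetical expression values for three genes over five conditions.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises and falls with A
    "C": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to A and B
}

inferred_edges = infer_edges(profiles)
for g1, g2, r in inferred_edges:
    print(f"{g1} - {g2}: r = {r:.3f}")
```

An edge found this way is an association, not evidence of direct regulation, which is why the threshold and the choice of similarity measure matter.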

What are networks useful for?
  • Networks can be used for:
    • Pretty figures
    • Hypothesis generation
    • Functional modules and their organization
    • Topological identification of target critical nodes
    • Predicting future states of the network
  • Networks are NOT useful for:
    • Final mechanistic insight
    • Fine distinction of types of interactions between components
    • Causality
  • Hubs
    • High centrality, highly connected
    • Exert regulatory influences
    • Vulnerable
  • Bottlenecks
    • High betweenness
    • Regulate information flow within network
    • Removal could partition network

Yu H et al. PLoS Comp Biol 2007, 3(4):e59
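The distinction between hubs (high degree) and bottlenecks (high betweenness) can be made concrete on a toy network. The sketch below computes degree and betweenness centrality, using Brandes' algorithm for unweighted graphs, on a hypothetical network of two tightly connected groups joined through a single node "x". The network and node names are invented for illustration.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes, unweighted, undirected)."""
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        order = []                       # nodes in order of distance from source
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        n_paths = {v: 0 for v in adj}    # number of shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each undirected pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


# Two fully connected groups of four nodes, linked only through "x".
toy_edges = list(combinations("abcd", 2)) + list(combinations("efgh", 2))
toy_edges += [("d", "x"), ("x", "e")]
adj = build_adjacency(toy_edges)

degree = {v: len(neighbors) for v, neighbors in adj.items()}
bc = betweenness(adj)
bottleneck = max(bc, key=bc.get)
print("degree:", degree)
print("betweenness:", bc)
print("top bottleneck:", bottleneck)
```

Here "x" has the lowest degree in the network but the highest betweenness, because every shortest path between the two groups passes through it; removing it partitions the network.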

Bottlenecks in Salmonella are essential for virulence
  • McDermott J, et al. 2009. J. Comp. Bio. 16(2):169-180
Discovery of a novel class of effectors by integrating transcriptomic and proteomic networks
Respiratory virus pathogenesis
  • What are the causes of pathogenesis in respiratory viruses?
  • Goal: Identify and prioritize potential mediators of pathogenesis that are common and unique to influenza and SARS
  • Goal: Identify and prioritize potential mediators of high-pathogenicity viral infection
  • Approach:
    • Mouse models of infection
    • Transcriptomics
    • Network-based approach
    • Topological network analysis to define targets
    • Validation studies
Hypotheses for Validation

[Figure: KO mouse infection diagram. Labels: KO Mouse, Infection, Survival, Death; rows "Phenotype:" and "Network:" with entries Altered or Negative.]

Predicted targets abrogate influenza pathogenesis

H5N1 infection

SARS infection

  • Tnfrsf1b (a.k.a. Tnfr2)
    • Predicted common regulator for influenza and SARS pathogenesis
    • Tnfa binding
    • Negatively regulates TNFR1 signaling, which is proinflammatory
    • Promotes endothelial cell activation/migration
    • Activation and proliferation of immune cells
Biological Drivers in Ovarian Cancer
  • What genomic characteristics of ovarian cancer are executed at the protein level?
    • Can protein expression be used to identify the most important genomic changes?
  • How can we improve the survival of women with ovarian cancer?
    • Can proteomics provide insight into the biological processes associated with poor survival?
    • Can we use a pathway-based approach to suggest novel therapeutic targets?
Proteomics
  • Chemoresistance in ovarian and breast cancer
  • Tumor samples from The Cancer Genome Atlas
    • Depth of genomic characterization
    • Many tumors
  • Proteomics and phosphoproteomics characterization of these tumors
  • Pathway/network analysis to reveal patterns and biomarkers
  • Integrate data into single view of the system
Clustering of Proteins and Phosphoproteins

[Figure panels: Phosphoproteins; Proteins. Annotations: iTRAQ batch, proteomic subtypes, transcriptomic subtype. Scale: log2 abundance relative to universal reference pool.]

A Subset of Proteins and Phosphopeptides Correlate with Patient Survival

[Figure panels: phosphorylation (normalized to abundance); protein abundance.]

Linear regression of abundance versus days-to-death suggests possible correlations with patient survival
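A minimal sketch of the kind of regression described above, fitting days-to-death against abundance by ordinary least squares. The abundance values and survival times are hypothetical, and which variable is treated as the predictor is an assumption here. A simple fit like this also ignores censored patients, so it is a screening step rather than a full survival analysis.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least squares fit of y = slope * x + intercept; also returns r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical log2 abundance of one protein in five tumors,
# with the matching days-to-death for each patient.
abundance = [-1.0, -0.5, 0.0, 0.5, 1.0]
days_to_death = [1150.0, 1050.0, 800.0, 650.0, 350.0]

slope, intercept, r = linear_fit(abundance, days_to_death)
print(f"slope = {slope:.1f} days per log2 unit, r = {r:.3f}")
```

A negative slope, as in this toy data, would mark the protein as correlated with short survival; a positive slope would mark it as correlated with long survival.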

PDGFRB Pathway

[Figure legend: correlated with short survival; weak correlation; correlated with long survival; weak correlation; not observed. Data layers: mRNA abundance, phosphorylation, protein abundance.]

Integrated Co-abundance Network for Ovarian Cancer

[Figure: co-abundance network with Module 1 (short survival) and Module 2 (long survival). Legend: correlated with short survival; correlated with long survival. Node types: protein, phosphorylated protein, mRNA.]

Survival Analysis from Network Targets

Kaplan-Meier plots from integrated CNV, mRNA expression, and mutations

[Figure: two Kaplan-Meier plots, % survival versus months survival, with P-values 0.005 and 0.007. Gene labels: ATF3, DUSP1, FOSB, ZFP36; IGKV1-5, LAX1, AMPD1, IGHM, SLAMF7.]
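For readers unfamiliar with Kaplan-Meier plots, the sketch below shows how the survival curve itself is estimated from follow-up times and censoring flags. The patient data are hypothetical, and the sketch does not reproduce the comparison between patient groups or the P-values reported on the slide.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival curve.

    times:  follow-up time for each patient (e.g. months)
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up for six patients (months; 0 = censored).
months = [6, 12, 12, 20, 30, 42]
observed = [1, 1, 0, 1, 0, 1]

km_curve = kaplan_meier(months, observed)
for t, s in km_curve:
    print(f"month {t}: {100 * s:.1f}% survival")
```

Censored patients do not cause a drop in the curve, but they shrink the risk set, which makes each later death count for a larger step.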

Conclusions
  • Several effective approaches to making sense of big data
    • Machine learning approaches
    • Biological network representation
    • Data integration
  • Understanding of disease requires system-level views
  • Relatively simple approaches can yield novel insight
  • Combining different views of system can improve insight
  • Data analysis and modeling are a starting point, not an end point
Acknowledgements
  • SysBEP (http://www.sysbep.org)
    • NIAID/NIH Y1-AI-8401
    • PI: Josh Adkins, PNNL
  • Systems Virology (http://www.systemsvirology.org)
    • NIAID/NIH HHSN272200800060C
    • PI: Michael Katze, UW
  • Clinical Proteomic Tumor Analysis Consortium
    • NCI/NIH 1U24CA160019
    • PIs: Richard Smith, PNNL; Karin Rodland, PNNL
  • Many, many people in these and other projects who helped with this work and made it possible
About Me
  • Email: Jason.McDermott@pnnl.gov
  • About: http://www.jasonya.com/wp/about/
  • Twitter: @BioDataGanache
  • Blog: The Mad Scientist’s Confectioner’s Club
    • http://www.jasonya.com/wp/