Functional interpretation of large scale omics data through pathway and network analysis

Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis. Bio-Trac 40 (Protein Bioinformatics) October 9, 2008 Zhang-Zhi Hu, M.D. Research Associate Professor Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology


Functional Interpretation of Large-scale Omics Data through Pathway and Network Analysis

Bio-Trac 40 (Protein Bioinformatics)

October 9, 2008

Zhang-Zhi Hu, M.D.

Research Associate Professor

Protein Information Resource, Department of

Biochemistry and Molecular & Cellular Biology

Georgetown University Medical Center


Overview

  • Introduction

    • What are large-scale omics data?

    • What do they tell you? How to interpret?

  • Approaches

    • Omics data integration

    • Resources: databases and tools

  • Case studies

  • Systems biology

    • Top-down, bottom-up

    • Pathway, network modeling


Genomics, Proteomics

Bioinformatics focus is changing…

  • Individual molecules

    • DNA, RNA, proteins

    • Sequence, structure, function

    • Evolutionary analysis

  • Population of molecules

    • Genome, proteome and other “-omes”

    • Interactions, complexes

    • Pathways, processes

    • High level organizations


From One Gene: multiple genetic variants, multiple transcripts, multiple protein products, and PTMs…


To Global Knowledge: The “-ome” and “-omics”

Genome

Transcriptome

Proteome

Metabolome

  • Other “-omes”:

    • ORFeome

    • Promoterome

    • Interactome

    • Receptome

    • Phenome

    • more…


Corresponding to ECM cluster (Chen et al., 2003; Qiu et al., 2007)

Gastric Cancer

ECM cluster

Global analysis

Genes

Potential Gene Markers

SPARC

COL3A1

SULF1

YARS

ABCA5

THY1

SIDT2


Identification of novel MAP kinase pathway signaling targets

(PMA/TPA → K562 cells → MAPK pathway → targets)

Digest of U-24

~3500 spots

~91 spot changes reproducible

Twenty-five targets of this signaling pathway were identified, of which only five were previously characterized as MKK/ERK effectors. The remaining targets suggest novel roles for this signaling cascade in cellular processes of nuclear transport, nucleotide excision repair, nucleosome assembly, membrane trafficking, and cytoskeletal regulation. -- Mol Cell. 6:1343-54, 2000


Drosophila Embryo Interaction Map

Using yeast two-hybrid (Y2H) technology with 102 bait proteins homologous to human cancer genes, 2300 interactions were detected, of which 710 were high confidence.

The proteins in the map that bear an RA (Ras Association) or RBD (Raf-like Ras-binding) domain define a discrete subnetwork around Ras-like GTPases (colored in yellow).

The exploration of the present map leads to numerous biological hypotheses and expands our knowledge of regulatory protein networks important in human cancer, as shown by the biological analysis of a particularly interesting network surrounding the Ras oncogene.

Genome Res. 15:376-84, 2005.


Strategy for Functional Analyses of Omics Data

  • Omics data: microarray, 2D, IP, MS, etc.

  • Bioinformatics databases: gene, protein, PPI, pathway, PTM, etc.

  • Literature (MEDLINE)

  • Steps: protein mapping, data integration, functional annotation, text mining, functional analysis

  • GO profiling (~50% GO annotations): molecular function, biological process, cellular component

  • Biological pathways (<10% pathway annotations), e.g. KEGG, Reactome, PID, BioCarta

  • Molecular networks, e.g. interaction, association

  • Biological insights: pathway, network, biomarker discovery


Methods for Functional Analysis

  • Omics data integration

  • Functional profiling

  • Pathway analysis

  • Resources/knowledgebases

    • Molecular databases

    • Omics data repositories

  • Bioinformatics tools

    • Open source: DAVID, FatiGO, iProXpress

    • Commercial: Ingenuity, GeneGO

  • Literature

    • Text mining
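Functional profiling tools of the kind listed above commonly score whether an annotation (a GO term or a pathway) is over-represented in an experimental protein list. The slides do not state which statistic any particular tool uses; the sketch below assumes a one-sided hypergeometric test, and all numbers are illustrative rather than taken from the presentation.

```python
from math import comb

def hypergeom_pvalue(hits: int, list_size: int, annotated: int, background: int) -> float:
    """P(X >= hits) when list_size proteins are drawn without replacement
    from a background in which `annotated` proteins carry the annotation."""
    upper = min(list_size, annotated)
    total = comb(background, list_size)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Illustrative numbers only (assumption, not from the slides):
# 1000 background proteins, 50 annotated to a pathway,
# 20 proteins in the experimental list, 5 of them annotated.
p = hypergeom_pvalue(hits=5, list_size=20, annotated=50, background=1000)
print(f"over-representation p-value: {p:.4g}")
```

With about one annotated protein expected by chance, five hits give a small p-value; in practice such values would also need multiple-testing correction across all annotations tested.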


Principles of multi-omics data integration for Systems Biology

Protein-Centric -Omics Analysis (iProXpress)

  • Genomics: dbSNP/HapMap (NS-SNP)

  • Epigenomics: DNA methylation profiling (coding genes)

  • Transcriptomics: mRNA microarray; dbEST coding EST; splicing forms

  • Proteomics: protein; protein precursor; function sites

  • Peptidomics: peptide; natural peptides; protease/peptidase

  • Metabolomics: metabolites (HMDB); Enzyme1, Enzyme2

  • Functional profiling and analysis: signaling pathways, metabolic pathways, biological processes


Functional profiling

ID Mapping

Batch gene/protein retrieval and profiling

Enter ID, gi #

Information matrix

http://pir.georgetown.edu/pirwww/search/idmapping.shtml
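Batch ID mapping can be pictured as a table lookup that separates identifiers that map from those that do not. This is a minimal sketch, not the PIR service itself: the mapping table, the identifiers and the function name `map_ids` are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping table: submitted identifier -> protein accession.
# A real table would come from an ID mapping resource.
MAPPING_TABLE: Dict[str, str] = {
    "ID_001": "ACC_A",
    "ID_002": "ACC_B",
    "ID_003": "ACC_A",  # several source IDs may map to one protein
}

def map_ids(ids: Iterable[str], table: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (mapped, unmapped) for a batch of submitted identifiers."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for raw in ids:
        key = raw.strip()
        if not key:
            continue  # skip blank lines in the pasted batch
        accession = table.get(key)
        if accession is None:
            unmapped.append(key)
        else:
            mapped[key] = accession
    return mapped, unmapped

mapped, unmapped = map_ids(["ID_001", " ID_003 ", "", "ID_999"], MAPPING_TABLE)
print(mapped)
print(unmapped)
```

Reporting the unmapped identifiers explicitly matters because proteins lost at this step silently drop out of every downstream profile.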


Protein annotations

Comments (CC line)

Features (FT line)

References (RX line)

21 years!

Cross References (DR line)

Well annotated entry: human p53 (P53_HUMAN)

GO


what molecular function?

what biological process?

what cellular component?


Biological Pathways and Networks

Signaling pathways

Metabolic pathways

Organelle biogenesis

Molecular networks


Pathways

Human metabolic maps

Global gene expression in skeletal muscle from gastric bypass patients before surgery and 1 year afterward.

General trend after surgery: up-regulated anaerobic metabolism; down-regulated oxidative phosphorylation

green, down-regulated genes; red, up-regulated genes; white, no data available

Proc Natl Acad Sci U S A. 2007 Feb 6;104(6):1777-82

http://www.pnas.org/cgi/data/0610772104/DC1/30


Databases of Protein Functions

  • Metabolic Pathways

    • KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways

    • EcoCyc: Encyclopedia of E. coli Genes and Metabolism

    • MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)

  • Inter-Molecular Interactions and Regulatory Pathways

    • IntAct: Protein interaction data from literature and user submission

    • BIND: Descriptions of interactions, molecular complexes and pathways

    • DIP: Catalogs experimentally determined interactions between proteins

    • Reactome - A curated knowledgebase of biological pathways

    • Pathway Interaction Database (PID)

    • BioCarta: Biological pathways of human and mouse

    • Pathway Commons

  • GO and GO annotation projects


Gene Ontology (GO)

  • Molecular Function

  • Biological Process

  • Cellular Component

(http://www.geneontology.org/)


GO Slim

http://www.geneontology.org/GO.slims.shtml
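A GO slim is a reduced set of high-level terms, and fine-grained annotations are summarized by walking up the ontology to the slim terms they fall under. The sketch below assumes a toy ontology: the term names, parent links and protein IDs are placeholders, not real GO identifiers, and it collects every slim ancestor rather than only the most specific one.

```python
from collections import Counter
from typing import Dict, List, Set

# Toy ontology (child -> parents); placeholder names, not real GO IDs.
PARENTS: Dict[str, List[str]] = {
    "term:caspase_activation": ["term:apoptosis"],
    "term:apoptosis": ["term:cell_death"],
    "term:cell_death": ["term:biological_process"],
    "term:dna_repair": ["term:dna_metabolism"],
    "term:dna_metabolism": ["term:metabolism"],
    "term:metabolism": ["term:biological_process"],
    "term:biological_process": [],
}
SLIM: Set[str] = {"term:cell_death", "term:metabolism"}

def to_slim(term: str, parents: Dict[str, List[str]], slim: Set[str]) -> Set[str]:
    """Return the slim terms found among a term and all of its ancestors."""
    seen: Set[str] = set()
    hits: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            hits.add(current)
        stack.extend(parents.get(current, []))
    return hits

def slim_profile(annotations: Dict[str, List[str]]) -> Counter:
    """Count proteins per slim term (each protein counted once per term)."""
    counts: Counter = Counter()
    for terms in annotations.values():
        slim_terms: Set[str] = set()
        for term in terms:
            slim_terms |= to_slim(term, PARENTS, SLIM)
        counts.update(slim_terms)
    return counts

ANNOTATIONS = {  # hypothetical protein -> annotated terms
    "prot_01": ["term:caspase_activation"],
    "prot_02": ["term:dna_repair"],
    "prot_03": ["term:apoptosis", "term:dna_repair"],
}
print(slim_profile(ANNOTATIONS))
```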


Biological Pathway Resource Collection

http://www.pathguide.org/

  • Protein-protein interactions

  • Metabolic pathways

  • Signaling pathways

  • Pathway diagrams

  • Transcription factors / gene regulatory networks

  • Protein-compound interactions

  • Genetic interaction networks


Pathway Commons: http://www.pathwaycommons.org/pc/home.do


KEGG Metabolic & Regulatory Pathways

  • KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions.

(http://www.genome.ad.jp/kegg/pathway.html)


BioCarta Cellular Pathways

(http://www.biocarta.com/index.asp)

Transforming Growth Factor (TGF) beta signaling [Homo sapiens]


Transforming Growth Factor (TGF) beta signaling [Homo sapiens]

(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&)

Reactome: events and objects (including modified forms and complexes)

Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]

Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]

Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]

Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]

Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus

Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm] ……
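The alternating event/object listing above can be held in a small data structure, which makes it easy to check the alternation and to pair each event with the object it produces. This is only an illustrative sketch: the `Node` class and helper names are hypothetical and do not reflect Reactome's actual data model or API; the identifiers and names are copied from the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Node:
    kind: str        # "event" or "object"
    stable_id: str   # identifier as shown on the slide
    name: str

CHAIN: List[Node] = [
    Node("event", "REACT_6879.1", "Activated type I receptor phosphorylates R-SMAD directly"),
    Node("object", "REACT_7364.1", "Phospho-R-SMAD [cytosol]"),
    Node("event", "REACT_6760.1", "Phospho-R-SMAD forms a complex with CO-SMAD"),
    Node("object", "REACT_7344.1", "Phospho-R-SMAD:CO-SMAD complex [cytosol]"),
    Node("event", "REACT_6726.1", "The phospho-R-SMAD:CO-SMAD transfers to the nucleus"),
    Node("object", "REACT_7382.2", "Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]"),
]

def alternates(chain: List[Node]) -> bool:
    """True if events and objects strictly alternate along the chain."""
    return all(a.kind != b.kind for a, b in zip(chain, chain[1:]))

def event_outputs(chain: List[Node]) -> List[Tuple[Node, Node]]:
    """Pair each event with the object listed immediately after it."""
    return [(a, b) for a, b in zip(chain, chain[1:])
            if a.kind == "event" and b.kind == "object"]

for event, produced in event_outputs(CHAIN):
    print(f"{event.stable_id} -> {produced.stable_id}: {produced.name}")
```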


PID

Transforming Growth Factor beta signaling


Transforming Growth Factor (TGF) beta signaling

Reactome

PID

~26 proteins in PID are not defined in Reactome, while only 2 proteins in Reactome are not defined in PID


TGF-beta signaling: comparison between PID and Reactome

[Figure: pathway diagram. Labelled components include LAP, TGF-b, TGF-beta receptor (I and II), Furin, STRAP, Smad 2, Smad 4, Smad 7, Shc, TAK1, MEKK1, ERK1/2, XIAP, CaM, Ca2+, Ski, MAPKKK, the P38 MAPK pathway and the JNK cascade, with growth signals and stress signals as inputs. Compartments: cytoplasm, nucleus. Outcomes shown: degradation; DNA binding and transcription regulation. PRO identifiers shown: PRO:000000616, PRO:000000523, PRO:000000410, PRO:000000650.]

Legend: Phosphorylation (P) at Serine (S), Threonine (T) and Tyrosine (Y); Ubiquitination (U) at Lysine (K). Components are marked as common in both Reactome & PID or, with an X, as only reported in Reactome. All others are in PID. Not all components in the pathway from both databases are listed.


  • GEO: a gene expression/molecular abundance repository (http://www.ncbi.nlm.nih.gov/geo/)

  • PRIDE: centralized, standards-compliant, public data repository for proteomics data (http://www.ebi.ac.uk/pride/)

  • IntAct: open source database system and analysis tools for protein interaction data


Analysis Tools

  • iProXpress

    • http://pir.georgetown.edu/iproxpress/

  • DAVID

    • http://david.abcc.ncifcrf.gov/

  • Babelomics - FatiGO

    • http://babelomics.bioinfo.cipf.es/

  • Commercial:

    • Ingenuity: http://www.ingenuity.com/

    • GeneGO: http://www.genego.com/

  • Visual tools:

    • Cytoscape: http://www.cytoscape.org/

    • CellDesigner: http://www.celldesigner.org/


iProXpress: Integrative analysis of proteomic and gene expression data

http://pir.georgetown.edu/iproxpress/

  • Data: MS spectrum, peptide identification, protein identification

  • Information: function, pathway, family

  • Analysis: categorize, statistics, association

  • Knowledge


iProXpress: Pathway Profiling

  • Organelle proteome data sets: ER, Mit

  • Protein information matrix: extensive annotations including protein name, family classification, function, protein-protein interaction, pathway…

  • Functional profiling: iterative categorization, sorting, cross-dataset comparison, coupled with manual examination.

[Figure: KEGG pathway profiles for the Mit and ER data sets]
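The categorization and cross-dataset comparison described above can be sketched as counting, per pathway, how many proteins each data set contributes. This is a toy illustration and not iProXpress code: the protein IDs and their pathway assignments are placeholders, and `pathway_profile` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List

# Toy protein information matrix; IDs and pathway assignments are placeholders.
MATRIX: List[Dict[str, str]] = [
    {"protein": "prot_01", "dataset": "ER",  "pathway": "N-Glycan biosynthesis"},
    {"protein": "prot_02", "dataset": "ER",  "pathway": "N-Glycan biosynthesis"},
    {"protein": "prot_03", "dataset": "ER",  "pathway": "Oxidative phosphorylation"},
    {"protein": "prot_04", "dataset": "Mit", "pathway": "Oxidative phosphorylation"},
    {"protein": "prot_05", "dataset": "Mit", "pathway": "Oxidative phosphorylation"},
    {"protein": "prot_06", "dataset": "Mit", "pathway": "Citrate cycle"},
]

def pathway_profile(matrix: List[Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count proteins per pathway, split by data set."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in matrix:
        counts[row["pathway"]][row["dataset"]] += 1
    return {pathway: dict(by_set) for pathway, by_set in counts.items()}

PROFILE = pathway_profile(MATRIX)
# Sort pathways by total protein count, largest first, for side-by-side reading.
for pathway in sorted(PROFILE, key=lambda name: -sum(PROFILE[name].values())):
    er = PROFILE[pathway].get("ER", 0)
    mit = PROFILE[pathway].get("Mit", 0)
    print(f"{pathway:28s} ER={er} Mit={mit}")
```

Pathways populated mostly by one data set stand out in such a table and are natural candidates for the manual examination the slide mentions.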


iProXpress Analysis Interface

[Screenshot with numbered callouts 1 to 8]

Cross-data groups comparative profiling


DAVID: http://david.abcc.ncifcrf.gov/


A Literature-Derived Network for Yeast

  • All MEDLINE abstracts processed using statistical co-occurrence and NLP methods:

    • Functional association (co-occurrence): grey shades

    • Physical interaction: green

    • Regulation of expression: red

    • Phosphorylation: dark blue

    • Dephosphorylation: light blue

  • Inference: Ssn3 -> Hsp104 (b) and Ume6 -> Ino2 & Erg9 (c) expressions

Jensen et al., 2006
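The co-occurrence part of such a literature-derived network can be sketched as counting how often two protein names appear in the same abstract. This is a simplified illustration, not the method of Jensen et al.: the abstracts are placeholder strings, matching is by exact token, and no statistical weighting or NLP is applied.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Set

NAMES: Set[str] = {"Ssn3", "Hsp104", "Ume6", "Ino2", "Erg9"}

# Placeholder abstracts (assumption); real input would be MEDLINE text.
ABSTRACTS = [
    "Placeholder text mentioning Ssn3 and Hsp104 together.",
    "Placeholder text mentioning Ume6, Ino2 and Erg9.",
    "Placeholder text mentioning Ume6 and Ino2 again.",
    "Placeholder text mentioning Hsp104 alone.",
]

def cooccurrence(abstracts: Iterable[str], names: Set[str]) -> Counter:
    """Count abstracts in which each pair of names appears together."""
    counts: Counter = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))
        found = sorted(tokens & names)          # sorted so pairs are canonical
        counts.update(combinations(found, 2))   # every unordered pair once
    return counts

PAIR_COUNTS = cooccurrence(ABSTRACTS, NAMES)
for (a, b), n in PAIR_COUNTS.most_common():
    print(f"{a} - {b}: {n}")
```

Pairs with higher counts would be drawn as stronger functional associations; typed edges such as phosphorylation require sentence-level analysis that this sketch does not attempt.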


Case Studies

Pathway studies: analysis of proteomics and gene expression data from cancer research

I. Estrogen Signaling Pathways (estrogen-induced apoptosis)

Breast cancer cells (+E2) → IP (AIB1, pY) → 1D-gel → MS/MS

II. Purine Metabolic Pathways (radiation-induced DNA repair)

Human fibroblast (AT patient) + irradiation → 2D-gel → MS

→ DNA microarray

III. Melanosome Biogenesis (comparative organelle proteomic profiling)

Melanoma cell → isolation of stage-specific melanosomes → MS


I. Estrogen Signaling Pathways (estrogen-induced apoptosis)

Mimicking clinical condition: 2nd phase anti-estrogen drug resistance

[Figure: experimental design. Labels: breast cancer cells; MCF-7; MCF-7/5C; estrogen deprived condition; E2; 200 nM for 2 h; AIB1; growth; apoptosis; signaling pathway: early events?; pY-IP; AIB1-IP; MS proteomics; integrated bioinformatics (expression profiling, pathway/network mapping).]

Hu ZZ, et al. (2008) US HUPO


Proteins only in E2 treated MCF-7/5C cells from both pY-IP and AIB1-IP

GO profiling (biological process)

Transcription

Cell communication

Chromosome remodeling & co-repression, cell cycle inhibition, apoptosis


Pathway Mapping:

G(o) alpha-2 subunit (pY/5C +E2)

RAP1GAP (AIB1/5C+E2)


Hypothesized E2-induced Apoptosis Pathways

[Figure: pathway diagram. Labelled components: GPR30, E2, ERa, Gas, GNAO2, CDK1, AIB1, Rap1GAP, Rap1a, MEK, ERK, BAD, TLE3, RUNX3, Sirt3, CIP29, with pY marks and two links marked "?". Compartments: cytoplasm, nucleus. Outcomes: apoptosis, cell growth.]

Proteins from pY-IP and AIB1-IP and their functions:

  • GNAO2: G(o) alpha-2, GPCR signaling

  • Rap1GAP: growth inhibition/apoptosis

  • CDK1: BAD-mediated apoptosis

  • Sirt3: histone modification, apoptosis

  • TLE3: co-repression, apoptosis

  • CIP29: cell cycle arrest/apoptosis


Text mining for protein-protein interaction (PPI) information


II. Purine Metabolic Pathways (radiation-induced DNA repair)

  • Cell lines: AT5BIVA (AT patient fibroblast, ATM-mutated, sensitive to IR damage); ATCL8 (ATM introduced, ATM-wild type, resistant to IR damage)

  • Treatment: ionizing radiation

  • 2D-gel/MS: proteins differentially expressed (1093)

  • DNA microarray: mRNAs differentially expressed (231)

  • Intersections: 13 proteins/genes

  • Integrated bioinformatics: expression profiling, pathway/network mapping

Hu ZZ, et al. (2008) J Prot. Bioinfo.
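The intersection step on this slide amounts to a set intersection once the protein list and the mRNA list share a common identifier. The sketch below assumes both lists have already been mapped to gene symbols; apart from RRM2, which the following slides highlight, the entries are placeholders.

```python
from typing import Iterable, List

def shared_genes(protein_hits: Iterable[str], mrna_hits: Iterable[str]) -> List[str]:
    """Gene symbols present in both lists, compared case-insensitively."""
    def normalize(symbols: Iterable[str]) -> set:
        return {s.strip().upper() for s in symbols if s.strip()}
    return sorted(normalize(protein_hits) & normalize(mrna_hits))

# Placeholder lists (assumption); real lists would hold 1093 and 231 entries.
proteins_changed = ["RRM2", "gene_a ", "GENE_C"]
mrnas_changed = ["rrm2", "GENE_B", "gene_c"]

overlap = shared_genes(proteins_changed, mrnas_changed)
print(overlap)
```

The normalization matters in practice: mismatched case or stray whitespace between platforms would otherwise shrink the overlap for purely technical reasons.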


KEGG pathway profiles


(RRM2)


Purine metabolic pathway

Ribonucleoside diphosphate reductase subunit M2 (RRM2), 1.17.4.1

[Figure: section of the purine metabolic pathway showing GDP → dGDP and ADP → dADP (1.17.4.1), with GTP, dGTP, ATP and dATP (X marks), leading to DNA synthesis and DNA repair.]


Functional Association Networks

RRM2 is connected to other major DNA repair and cell cycle proteins, such as p53, BRCA1 and HDAC1.


RRM2 in radiation-induced ATM-p53-mediated DNA repair pathway

[Figure: pathway diagram linking ATM, p53, BRCA1, HDAC1, RRM1 and RRM2 (RR complex) to DNA repair.]


III. Organelle Proteomes

Comparative organelle proteome profiling makes it possible to propose key proteins potentially involved in the regulation of organelle biogenesis

Schematic drawing of melanosome biogenesis pathway and key proteins involved in each stage.

Chi A, et al. (2006) J. Prot. Res.


Towards Systems Biology

  • Genomics

  • Transcriptomics

  • Proteomics

  • Metabolomics

  • Other “-omics”

  • Bibliomics (literature mining)

  • Bioinformatics

(Nature 422:193, 2003)

Integrated knowledge and tools are needed for systems biology research


What is Systems Biology?

Systems Biology, 2004, 1(1):19-27.

‘Systems biology defines and analyses the interrelationships of all of the elements in a functioning system in order to understand how the system works.’-- Leroy Hood

  • How an organism works from an overall perspective.

  • Interactions of parts of biological systems

    • how molecules work together to serve a regulatory function in cells or between cells.

    • how cells work to make organs, how organs work to make a person.

  • Systems biology is the converse of reductionist biology.


Reductionist vs. Systems Biology

The driving force for 21st century biology will be integration:

Integrating the activity of genes and regulators into regulatory networks

Integrating the interactions of amino acids into protein folding predictions

Integrating the interactions of metabolites into metabolic networks

Integrating the interactions of cells into organisms

Integrating the interactions of individuals into ecosystems

The driving force in 20th century biology has been reductionism:

From the population to the individual

From the individual to the cell

From the cell to the biomolecule

From the biomolecule to the genome

From the genome to the genome sequence

With the publication of genome sequences, reductionist biology has reached its endpoint


Universal Organizing Principles

  • Level 4: Large-scale organization

  • Level 3: Functional modules

  • Level 2: Regulatory motif, pathway

  • Level 1: Omics data, information

Although the individual components are unique to a given organism, the topologic properties of cellular networks share surprising similarities with those of natural and social networks
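One topological property commonly compared across networks is the degree distribution, that is, how many nodes have each number of connections. The sketch below computes it for a toy undirected network; the edge list is invented for illustration and does not represent any real cellular network.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Toy undirected edge list (assumption); node "A" acts as a hub.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"), ("E", "F")]

def degrees(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Number of edges touching each node."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_distribution(deg: Counter) -> Dict[int, int]:
    """Map each degree k to the number of nodes with that degree."""
    return dict(sorted(Counter(deg.values()).items()))

DEG = degrees(EDGES)
DIST = degree_distribution(DEG)
hub, hub_degree = DEG.most_common(1)[0]
print(DIST)
print(f"hub: {hub} (degree {hub_degree})")
```

A few highly connected hubs alongside many low-degree nodes is the kind of shared pattern the statement above refers to; establishing it for a real network requires far more data than this toy example.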


Three types of models

Approaches: top-down or bottom-up

  • Top-down: driven by system-wide data; used to discover or refine pre-existing models that describe the measured data (more on regulatory models). It has emerged as the dominant method because of “-omics” data.

  • Bottom-up: starts from molecular properties to construct models that predict systemic properties, followed by validation and model refinement (more on kinetic models) (Silicon Cell program: http://www.siliconcell.net/)

Bruggeman FJ, Westerhoff HV. Trends Microbiol. 2007 15:45-50.


Top-down

Yeast two-hybrid

Combination of techniques (Y2H, protein arrays)

Integration of other types of information (expression, localization or genetic studies)

dynamic biologically relevant interaction subnetworks

Curr Opin Chem Biol. 2006 Dec;10(6):551-8.


EGFR-GAB1-ERK/Akt network

EGFR signaling network model is constructed based on the reaction stoichiometry and kinetic constants

Bottom-up

J Biol Chem. 2006 281:19925-38

  • The model allows predictions of temporal patterns of cellular responses to EGF under diverse perturbations (e.g., EGF doses):

    • The dynamics of GAB1 tyr-phosphorylation is controlled by positive GAB1-PI3K and negative MAPK-GAB1 feedbacks.

    • The essential function of GAB1 is to enhance PI3K/Akt activation and extend the duration of Ras/MAPK signaling.

    • GAB1 plays a critical role in cell proliferation and tumorigenesis by amplifying positive interactions between survival and mitogenic pathways
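A bottom-up kinetic model built from reaction stoichiometry and kinetic constants can be written as dx/dt = N·v(x), where N is the stoichiometry matrix and v the vector of reaction rates. The sketch below applies this to a toy reversible binding reaction (L + R ⇌ LR) with mass-action kinetics and simple Euler integration. It is not the published EGFR-GAB1 model: the species, rate constants and step size are assumptions chosen for illustration.

```python
from typing import List

SPECIES = ["L", "R", "LR"]

# Stoichiometry matrix: rows are species, columns are reactions.
# Column 0: binding (L + R -> LR); column 1: unbinding (LR -> L + R).
N = [
    [-1, +1],  # L
    [-1, +1],  # R
    [+1, -1],  # LR
]

K_ON, K_OFF = 1.0, 0.5  # hypothetical rate constants

def rates(x: List[float]) -> List[float]:
    """Mass-action rates for the two reactions."""
    ligand, receptor, complex_ = x
    return [K_ON * ligand * receptor, K_OFF * complex_]

def euler_step(x: List[float], dt: float) -> List[float]:
    """One explicit Euler step of dx/dt = N * v(x)."""
    v = rates(x)
    return [
        xi + dt * sum(N[i][j] * v[j] for j in range(len(v)))
        for i, xi in enumerate(x)
    ]

def simulate(x0: List[float], dt: float = 0.001, steps: int = 20000) -> List[float]:
    x = list(x0)
    for _ in range(steps):
        x = euler_step(x, dt)
    return x

final = simulate([1.0, 1.0, 0.0])
for name, value in zip(SPECIES, final):
    print(f"{name}: {value:.4f}")
```

With these constants the complex concentration settles near 0.5, the value at which binding and unbinding rates balance, and R + LR stays constant throughout, which is a quick sanity check on the stoichiometry matrix. Models with feedback loops of the kind described above add more species and reactions but follow the same N·v structure.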


Gene regulatory networks (GRNs)

WIRED Systems biology looks at the connections between components in cells.

Essential elements of the role of Dorsal in establishing dorsoventral polarity in Drosophila embryonic development

Reprod Toxicol. 19:281-90, 2005


Modeling of the main modules of cell-cycle progression

  • Three functional units:

    • Start function: onset of S-phase

    • Cyclin cascades (C1, C2, C3)

    • End function: onset of mitosis to cell division

Chembiochem 5:1322-33, 2004


Challenges to Systems Biology

  • A complete characterization of an organism (molecular constituents → interactions → cell function)

  • Spatial-temporal molecular characterization of a cell

  • A thorough systems analysis of the “molecular response” of a cell to external/internal perturbations

  • Information must be integrated into mathematical models to enable knowledge testing by formulating hypotheses and the discovery of new biological mechanisms…


Cellular Maps?

signaling, metabolism, gene regulation …

