Initial Steps Toward Computational Discovery of Genetic Regulatory Networks in Pancreatic Islet Development

Georg Gerber, PhD

Gifford Laboratory, MIT CSAIL

April 9, 2009


Outline

  • Goals

  • Expression data overview

  • TF-TF interaction networks

    • pair-wise mutual information

    • Bayesian networks

  • Gene expression programs

  • ChIP-seq data

  • Directions for future work


Biological goals of building a transcriptional regulatory network of pancreatic specification

  • Knowledge of distinct signaling/transcriptional steps involved in pancreatic specification

    • Optimize ES differentiation by determining signaling event(s) directly inducing each sequential TF

  • What is the network structure? Linear or cross-regulatory, parallel or all interrelated?

    • Direct reprogramming using TFs would benefit from knowing hierarchy of each network

    • Are TFs that play a role in specification of the pancreas necessary for its later function, or are they merely required to properly induce other necessary TFs?

  • Can knowledge of the pancreatic specification network teach us about lineage diversification within the pancreas (endocrine, exocrine, duct)?


Immediate computational goals

  • Determine set of transcription factors active at different developmental stages

  • Discover network “wiring”

  • Determine how network changes/evolves throughout development

  • Compare in vivo and ESC networks



Expression data overview

  • E8.25: embryonic ectoderm/notochord, embryonic mesoderm, definitive endoderm (E7.75 and E8.75 as well)

  • E11.5: stomach, intestinal, pancreatic (E10.5 as well), lung, liver, and esophageal endoderm


Experimental scheme (figure labels): ES; 50 ng/mL ActA, 6 days; Sox17 GFP+; DMSO / 2 uM RA, 6h/24h; FACS sort Sox17GFP+Dpp4- definitive endoderm and perform microarray. Marker labels: Tcf2, Foxa2.

  • Implant bead coated with DMSO/RA into foregut of E8.25 (4-6 somite) embryo

  • Explant embryo anterior to 1st somite

  • Culture for 6/24 hours

  • Dissociate, sort for EpCAM+ endoderm

  • Amplify RNA and profile on Illumina Mouse Ref8 v2 chips


Expression data overview (cont.)

  • 120 Illumina arrays (18118 genes/array)

  • 72 distinct experiments (41 in mESCs)

  • Standardized mESC/in vivo experiments separately

  • 2758 genes w/ ≥ 2-fold change in ≥ 5 experiments

  • 154 TFs w/ ≥ 2-fold change in ≥ 5 experiments (out of 946 “definite” or “candidate” TFs from TFCat, Fulton et al, Genome Biology 2009)
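
The fold-change filter above can be sketched as follows. This is a minimal illustration, not the analysis code behind the slides: it assumes each gene has one log2 expression ratio per experiment relative to some reference, and that up- and down-regulation both count as a change. The function name and the toy values are hypothetical.

```python
import math

def passes_fold_change_filter(log2_ratios, min_fold=2.0, min_experiments=5):
    """Return True if the gene changes by at least `min_fold` (up or down)
    in at least `min_experiments` experiments.

    Assumption: `log2_ratios` holds one log2 ratio per experiment."""
    threshold = math.log2(min_fold)
    n_changed = sum(1 for x in log2_ratios if abs(x) >= threshold)
    return n_changed >= min_experiments

# Toy data (hypothetical values), one row of log2 ratios per gene.
toy = {
    "geneA": [1.5, 1.2, -1.1, 2.0, 1.3, 0.1],     # >= 2-fold in 5 experiments
    "geneB": [-2.0, -2.0, -2.0, -2.0, 0.0, 0.0],  # >= 2-fold in only 4
    "geneC": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],      # never reaches 2-fold
}
kept = [gene for gene, row in toy.items() if passes_fold_change_filter(row)]
print(kept)
```

Applying the same function to a TF-only subset of the rows would give the TF list; the separate standardization of mESC and in vivo experiments is not shown here.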


Limitations of expression data for genetic network reconstruction

  • Need hundreds of varied experiments to find relevant/significant networks

  • Association ≠ causation

  • High false positive rates (high dimensional, noisy, dependent data)

  • High false negative rates (low TF transcript abundance, post-transcriptional regulation, etc.)



Pair-wise mutual information networks (CLR)

  • Context Likelihood of Relatedness method: Faith et al., PLoS Biology 2007

  • Computes mutual information (MI) between all pairs of genes

  • Innovation: considers MI distribution for both target and source to compute p-values/estimate FDR
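
A rough sketch of the idea, under stated assumptions: MI is estimated here with simple equal-width histogram binning (chosen for brevity; the estimator used for the talk may differ), and each pair is scored by combining two z-scores, one against each gene's own background MI distribution, following the CLR description as understood here. Function names are hypothetical, and the conversion of scores to p-values/FDR is not shown.

```python
import math
from collections import Counter

def discretize(values, n_bins):
    """Equal-width binning (an assumption made for brevity)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=2):
    """Plug-in MI estimate (in nats) from binned joint counts."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) )
        mi += (count / n) * math.log(count * n / (px[a] * py[b]))
    return mi

def clr_scores(mi):
    """Score each pair against the background MI of both genes.

    `mi` is a symmetric matrix (list of lists); the diagonal is ignored."""
    n = len(mi)
    means, stds = [], []
    for i in range(n):
        row = [mi[i][j] for j in range(n) if j != i]
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        means.append(mean)
        stds.append(math.sqrt(var))

    def z(i, j):
        # z-score of MI(i, j) within gene i's background, clipped at zero
        if stds[i] == 0.0:
            return 0.0
        return max(0.0, (mi[i][j] - means[i]) / stds[i])

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scores[i][j] = math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return scores

# Toy MI matrix (hypothetical values): genes 0 and 1 stand out from background.
toy_mi = [
    [0.0,   0.875, 0.125, 0.125],
    [0.875, 0.0,   0.125, 0.125],
    [0.125, 0.125, 0.0,   0.125],
    [0.125, 0.125, 0.125, 0.0],
]
toy_scores = clr_scores(toy_mi)
```

In this toy matrix only the pair (0, 1) receives a non-zero score, because its MI is high relative to the background of both genes.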


CLR (cont.)


TF-TF network (MI)

Network figures, one slide per condition:

  • E8.25 4-6s definitive endoderm

  • E8.75 13-15s definitive endoderm

  • E9.5 definitive endoderm

  • E10.5 pancreatic endoderm

  • E11.5 pancreatic endoderm

  • E11.5 intestinal endoderm

  • 6h 83 uM RA bead; mES 2 uM RA 6h

  • 24h 83 uM RA bead; mES 2 uM RA 24h


Bayesian networks

  • Directed networks, allow for multiple parents

  • Encode conditional independence

  • Penalize complexity automatically

  • Software: Banjo (Alexander Hartemink, Duke University)
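
To illustrate how a Bayesian score can penalize complexity without an explicit penalty term, here is a minimal sketch that scores one discrete node against candidate parent sets using the log marginal likelihood under a uniform Dirichlet prior (the K2 metric). The slides do not state which scoring metric was used with Banjo, so the metric, the function name, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def k2_log_score(child, parents, n_states=2):
    """Log marginal likelihood of a discrete child given its parents,
    with a uniform Dirichlet prior on each conditional distribution.

    `child` is a list of states in range(n_states); `parents` is a list
    of equally long state lists (empty list means no parents)."""
    counts = Counter()  # (parent configuration, child state) -> count
    totals = Counter()  # parent configuration -> count
    for t in range(len(child)):
        config = tuple(p[t] for p in parents)
        counts[(config, child[t])] += 1
        totals[config] += 1
    score = 0.0
    for config, total in totals.items():
        score += math.lgamma(n_states) - math.lgamma(total + n_states)
        for state in range(n_states):
            score += math.lgamma(counts[(config, state)] + 1)
    return score

# Toy data (hypothetical): the child copies parent `a` 80% of the time,
# while `b` is exactly balanced against both and carries no information.
a, b, child = [], [], []
for a_val in (0, 1):
    for b_val in (0, 1):
        for rep in range(20):
            a.append(a_val)
            b.append(b_val)
            child.append(a_val if rep < 16 else 1 - a_val)

score_none = k2_log_score(child, [])
score_a = k2_log_score(child, [a])
score_ab = k2_log_score(child, [a, b])
```

Here the single informative parent scores best: adding the uninformative parent `b` lowers the score even though it cannot reduce the fit, which is the automatic complexity penalty at work.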


TF-TF network (Bayes Net)

Network figures, one slide per condition:

  • E8.25 4-6s definitive endoderm

  • E8.75 13-15s definitive endoderm

  • E9.5 definitive endoderm

  • E10.5 pancreatic endoderm

  • E11.5 pancreatic endoderm

  • mES 2 uM RA 6h; 6h 83 uM RA bead

  • mES 2 uM RA 24h; 24h 83 uM RA bead



Advantages to methods that discover groups of genes

  • Infer more robust relationships by considering many genes jointly

  • Allow for enrichment analysis

    • Functional categories

    • Signaling pathways

    • TF DNA binding sequence motifs


GeneProgram

  • Gerber et al, PLoS Comp Bio 2007

  • Discovers sets of genes co-expressed across subsets of conditions

  • Innovations:

    • Simultaneously models probabilistic structure of experiments (tissues) and genes

    • Uses Hierarchical Dirichlet Processes, a fully Bayesian method for automatically determining the number of expression programs and tissue groups

    • Outperforms state-of-the-art biclustering methods


Methods shown (figure panels): hierarchical clustering, Singular Value Decomposition (SVD), Non-negative Matrix Factorization (NMF), GeneProgram w/o tissue groups, full GeneProgram model


GeneProgram produced a map of 12 tissue groups and 62 expression programs

Elements labeled in the map figure:

  • tissue groups

  • tissue

  • expression programs (sorted by generality score)

  • expression program use by tissue


Expression program enrichment analysis

  • GO categories

    • FDR controlled to 5%

  • TRANSFAC motifs

    • Software: SAMBA

    • Scans +3000 to -200 bp for each motif

    • Uses PWM to score region, background to calculate p-value (Bonferroni corrected)
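
As a sketch of the category-enrichment step, the code below tests one expression program against one annotation category with a hypergeometric tail probability and then controls the FDR across categories. The slides give only the 5% FDR level; the hypergeometric test and the Benjamini-Hochberg procedure are assumptions chosen for illustration, and the function names are hypothetical. The PWM-based motif scan is not shown.

```python
import math

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when `draws` genes are sampled without replacement from
    `population` genes, of which `successes` carry the annotation."""
    upper = min(successes, draws)
    total = math.comb(population, draws)
    tail = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total

def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans, True where the category is called enriched."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # keep the largest rank whose p-value is under its step-up threshold
        if pvalues[idx] <= fdr * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Toy example (hypothetical numbers): a 10-gene universe with 5 annotated
# genes, and a 2-gene program in which both genes carry the annotation.
p_example = hypergeom_sf(2, population=10, successes=5, draws=2)
calls = benjamini_hochberg([0.01, 0.02, 0.03, 0.9])
```

With real data, `population` would be the number of genes on the array, `draws` the size of the expression program, and one p-value would be computed per category before the FDR step.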


Expression programs (GO and motif enrichment)

Figures, one slide per condition:

  • E8.25 4-6s definitive endoderm

  • E8.75 13-15s definitive endoderm

  • E9.5 definitive endoderm

  • E10.5 pancreatic endoderm


Expression programs showing TFs in programs and motif enrichment

Figures, one slide per condition:

  • E8.25 4-6s definitive endoderm

  • E8.75 13-15s definitive endoderm

  • E9.5 definitive endoderm

  • E10.5 pancreatic endoderm

  • E11.5 pancreatic endoderm



Retinoic acid receptor ChIP-seq data

  • Generated in the Wichterle lab at Columbia (unpublished data, Motor Neuron Development Project)

  • mESCs grown to embryoid body stage, profiled after 8h of RA exposure


ChIP-seq RAR binding: Cyp26a1


ChIP-seq RAR binding: Rarb


Overlap of Melton lab expression data and RAR binding data

Binding events determined with a modified MACS method (Zhang et al., Genome Biology 2008); a gene is called bound if a significant peak is found within 50 kb of its start site
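
The 50 kb rule can be sketched as a simple distance check between peak positions and gene start sites, followed by a set intersection with the differentially expressed genes. This is an illustrative sketch only: peak calling itself is not shown, and the function name, gene names, and coordinates are hypothetical.

```python
def genes_with_binding(peaks, gene_starts, window=50_000):
    """Return the set of genes with at least one peak within `window` bp
    of the gene start site on the same chromosome.

    `peaks` is a list of (chrom, position); `gene_starts` maps a gene
    name to (chrom, start position)."""
    bound = set()
    for gene, (chrom, start) in gene_starts.items():
        for peak_chrom, position in peaks:
            if peak_chrom == chrom and abs(position - start) <= window:
                bound.add(gene)
                break
    return bound

# Hypothetical peaks and gene start sites, for illustration only.
toy_peaks = [("chr1", 1_040_000), ("chr2", 500_000)]
toy_gene_starts = {
    "geneA": ("chr1", 1_000_000),  # peak 40 kb away: bound
    "geneB": ("chr1", 2_000_000),  # nearest peak far outside the window
    "geneC": ("chr3", 500_000),    # same position but different chromosome
}
bound_genes = genes_with_binding(toy_peaks, toy_gene_starts)

# Overlap with a (hypothetical) list of differentially expressed genes.
expressed_genes = {"geneA", "geneB"}
overlap = bound_genes & expressed_genes
```

For genome-scale data, sorting peaks per chromosome and using binary search would avoid the nested loop; the quadratic version is kept here for readability.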


Future computational directions

  • Add publicly available ES expression data

  • Apply more sophisticated TF binding motif methods (phylogeny, spatial arrangements, co-regulation)

  • Extend GeneProgram framework to additional data types (TF expression, binding motifs, ChIP-seq, knockdown/overexpression, possibly protein-protein interactions, etc.) → causal/predictive models

  • Infer dynamic rewiring networks over inferred developmental tree

  • Develop novel probabilistic methods for ChIP-seq data


Acknowledgements

  • Rich Sherwood (Melton lab) - all the expression data!

  • Arvind Jammalamadaka (Gifford lab) - initial data analysis/normalization methods

  • Shaun Mahony (Gifford lab) - RA ChIP-seq data analysis

  • Esteban Mazzoni (Wichterle lab) - RA ChIP-seq data


Backup slides


TF-TF network (MI)

Network figures, one slide per condition:

  • E11.5 stomach endoderm

  • E12.5 esophagus endoderm

  • E11.5 liver endoderm

  • E11.5 lung endoderm


Additional figure slides (untitled), by condition:

  • E8.25 anterior endoderm

  • E8.25 4-6s ectoderm

  • E8.25 4-6s mesoderm

  • 6h 83 uM RA bead

  • d1 83 uM RA bead

  • mES 2 uM RA 6h

  • mES 2 uM RA 24h

  • mES differentiated 7d


GeneProgram outperformed popular biclustering algorithms in discovery of biologically meaningful gene sets from real microarray data

N = Novartis Tissue Atlas v2 (141 mouse and human tissues)

S = Shyamsundar et al. (115 human tissues)

