Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 http://biochem218.stanford.edu/


Presentation Transcript


  1. Computational Molecular Biology. Biochem 218 – BioMedical Informatics 231. http://biochem218.stanford.edu/ Discovering Transcription Factor Binding Sites in Co-Regulated Genes. Doug Brutlag, Professor Emeritus, Biochemistry & Medicine (by courtesy)

  2. Motivation • MicroArray analysis of whole-genome gene expression • Clustering of genes based on their expression pattern • Searching for conserved sequence motifs regulating the expression

  3. Megacluster of Yeast Gene Expression

  4. Human Gene Expression Signatures (figure labels: T Cells Signaling DNA Damage Fibroblast Stimulation B Cells Signaling CMV Infection Anoxia Polio Infection Monocytes Signaling IL4 Hormone)

  5. Finding Transcription Factor Binding Sites
     Figure labels: Upstream Regions; Co-expressed Genes (Pho 5, Pho 8, Pho 81, Pho 84, Pho …); Transcription Start
     GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
     CACATCGCATCACGTGACCAGT...GACATGGACGGC
     GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
     TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
     CGCTAGCCCACGTGGATCTTGA...AGAATGACTGGC

  6. Finding Transcription Factor Binding Sites
     Figure labels: Upstream Regions; Co-expressed Genes
     GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
     CACATCGCATCACGTGACCAGT...GACATGGACGGC
     GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
     TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
     CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT

  7. Finding Transcription Factor Binding Sites
     Figure labels: Upstream Regions; Co-expressed Genes; Pho4 binding
     ATGGCTGCACCACGTTTATGC...ACGATGTCTCGC
     CACATCGCATCACGTGACCAGT...GACATGGACGGC
     GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
     TTAGGACCATCACGTGA...ACAATGAGAGCG
     CGCTAGCCCACGTTGATCTTGT...AGAATGGCCTAT

  8. Three Algorithms • BioProspector: presented in 2000; extends Gibbs sampling (a stochastic method); works for any cluster of sequences • MDscan: deterministic, enumerative approach; very fast; for sequences with some ranking information • MotifCut and MotifScan: graph-based; do not use PSSMs; novel and sensitive

  9. Representing Ambiguous DNA Motifs • Sequence patterns (regular expressions) • IUPAC nomenclature for DNA ambiguities. Consensus motif: CACAAAA. Degenerate motif: CRCAAAW (R = A/G, W = A/T)
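A degenerate IUPAC motif such as CRCAAAW can be turned into a regular expression mechanically. The sketch below is illustrative only: the helper names `iupac_to_regex` and `find_sites` and the test sequence are hypothetical, and the table is the standard IUPAC nucleotide ambiguity code.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif):
    """Translate an IUPAC motif (e.g. CRCAAAW) into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_sites(motif, sequence):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

# Hypothetical test sequence containing one copy of each motif variant.
print(iupac_to_regex("CRCAAAW"))
print(find_sites("CRCAAAW", "TTCACAAAAGGCGCAAATCC"))
```

A pattern of this kind matches or rejects a site outright, which is why the following slides move on to weight matrices that score partial matches.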

  10. Weight Matrix for Transcription Factor Binding Sites
      A DNA motif as a position-specific frequency weight matrix (figure panels: Sites, Alignment Matrix, Frequency Weight Matrix)
      Sites:
      ATGGCATG
      AGGGTGCG
      ATCGCATG
      TTGCCACG
      ATGGTATT
      ATTGCACG
      AGGGCGTT
      ATGACATG
      ATGGCATG
      ACTGGATG
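The ten sites on slide 10 are enough to rebuild the alignment (count) matrix and a frequency matrix by hand. The sketch below does that; the pseudocount of 0.5, the uniform background of 0.25, and the log2 odds scoring are assumptions made for illustration, not values taken from the slides.

```python
import math

BASES = "ACGT"

# The ten aligned sites listed on slide 10.
SITES = [
    "ATGGCATG", "AGGGTGCG", "ATCGCATG", "TTGCCACG", "ATGGTATT",
    "ATTGCACG", "AGGGCGTT", "ATGACATG", "ATGGCATG", "ACTGGATG",
]

def alignment_matrix(sites):
    """Count each base at each position of the aligned sites."""
    width = len(sites[0])
    return [{b: sum(site[pos] == b for site in sites) for b in BASES}
            for pos in range(width)]

def frequency_matrix(counts, pseudocount=0.5):
    """Convert counts to frequencies; the pseudocount (an assumption) avoids zeros."""
    freqs = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        freqs.append({b: (column[b] + pseudocount) / total for b in BASES})
    return freqs

def log_odds_matrix(freqs, background=0.25):
    """Log2 odds against a uniform background (an assumption)."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in freqs]

def consensus(counts):
    """Most frequent base in each column."""
    return "".join(max(BASES, key=col.get) for col in counts)

def score(segment, weights):
    """Sum the per-position weights of a segment of the motif's width."""
    return sum(weights[pos][base] for pos, base in enumerate(segment))

counts = alignment_matrix(SITES)
weights = log_odds_matrix(frequency_matrix(counts))
print(consensus(counts))
print(round(score("ATGGCATG", weights), 2))
```

Scoring a candidate segment is then a table lookup per position, which is the operation the BioProspector update slides repeat for every window of a sequence.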

  11. Weight Matrix with Consensus Sequence & Logotype with Degenerate Consensus. Weight Matrix or Position Specific Scoring Matrix. Degenerate consensus: TTWHYCGGHY

  12. BioProspector Initialization: Gather together upstream regulatory regions

  13. BioProspector Initialization: Actual location of regulatory motifs (a1, a2, a3, a4, …, ak) is unknown

  14. BioProspector Initialization: Randomly initialize the beginning motif (initial motif built from segments a1', a2', a3', a4', …, ak')

  15. BioProspector Iterative Update: Take out one sequence at a time with its segment (here a1' is removed, leaving the motif without the a1' segment: a2', a3', a4', …, ak')

  16. BioProspector Iterative Update: Score each segment with the current motif (motif without the a1' segment). Sequence 1, segment (1-6): 1.5

  17. BioProspector Iterative Update: Score each segment with the current motif (motif without the a1' segment). Sequence 1, segment (2-7): 3

  18. BioProspector Iterative Update: Score each segment with the current motif (motif without the a1' segment). Sequence 1, segment (3-8): 2.7

  19. BioProspector Iterative Update: Score each segment with the current motif (motif without the a1' segment). Sequence 1, segment (4-9): 9.0

  20. BioProspector Iterative Update: Score each segment with the current motif (motif without the a1' segment). Sequence 1, segment (5-10): 3.2

  21. BioProspector Iterative Update: Score each segment with the current motif (motif without the a1' segment). Sequence 1, segment (6-11): 27.1

  22. BioProspector Iterative Update: Score each segment with the current motif (motif without the a1' segment). Sequence 1, segment (7-12): 11.2

  23. BioProspector Iterative Update: Score each segment with the current motif (motif without the a1' segment). Sequence 1, segment (8-13): 2.9

  24. BioProspector Iterative Update: Score each segment with the current motif (motif without the a1' segment). Sequence 1, segment (9-14): 9.1

  25. BioProspector Iterative Update: Score sequence 1 in all possible alignments. Candidate motif: the new segment a1" joins the motif without the a1' segment (a2', a3', a4', …, ak')

  26. BioProspector Iterative Update: Repeat the process until convergence. Next, the a2' segment is taken out (motif without the a2' segment), with a1" now in place of a1'
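The initialization and iterative update slides describe a Gibbs-style site sampler. The sketch below is a deliberately simplified version and not BioProspector itself: it assumes exactly one site per sequence, a uniform background of 0.25, a pseudocount of 1, and a fixed number of sweeps instead of a convergence test. All function names are hypothetical.

```python
import random

BASES = "ACGT"

def build_pwm(segments, pseudocount=1.0):
    """Per-position base frequencies from the current segments (pseudocount is an assumption)."""
    width = len(segments[0])
    pwm = []
    for pos in range(width):
        counts = {b: pseudocount for b in BASES}
        for seg in segments:
            counts[seg[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def segment_score(segment, pwm, background=0.25):
    """Likelihood ratio of a segment under the motif versus a uniform background."""
    score = 1.0
    for pos, base in enumerate(segment):
        score *= pwm[pos][base] / background
    return score

def gibbs_motif_search(sequences, width, iterations=200, seed=0):
    """Return one sampled motif start per sequence."""
    rng = random.Random(seed)
    # Randomly initialize the beginning motif (slide 14).
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        for i, seq in enumerate(sequences):
            # Take out sequence i with its segment (slide 15).
            others = [s[a:a + width]
                      for j, (s, a) in enumerate(zip(sequences, starts)) if j != i]
            pwm = build_pwm(others)
            # Score every segment of sequence i with the current motif (slides 16-24).
            weights = [segment_score(seq[a:a + width], pwm)
                       for a in range(len(seq) - width + 1)]
            # Sample a new segment in proportion to its score (slide 25).
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Upstream fragments taken from slide 5 (the parts before the "...").
upstream = [
    "GATGGCTGCACCACGTGTATGC",
    "CACATCGCATCACGTGACCAGT",
    "GCCTCGCACGTGGTGGTACAGT",
    "TCTCGTTAGGACCATCACGTGA",
    "CGCTAGCCCACGTGGATCTTGA",
]
starts = gibbs_motif_search(upstream, width=6)
print([seq[a:a + 6] for seq, a in zip(upstream, starts)])
```

Because the new segment is sampled rather than chosen greedily, the result depends on the random seed; running several seeds and keeping the best-scoring motif is a common way to use a sampler like this.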

  27. Challenges for BioProspector (http://bioprospector.stanford.edu/) • Variable (0-n) motif sites per sequence • Motif enriched only in upstream sequences, not in the whole genome • Some motifs could have two conserved blocks separated by a variable-length gap • Motifs are not highly conserved (~50%) • Some motifs show a palindromic symmetry • Assign motifs a measure of statistical significance

  28. Thresholds Allow for Variable Motif Copies • Sequences that do not have the motif • Sequences with multiple copies of motif. Score scale in the figure: Keep (above TH), Sample (between TL and TH), Discard (below TL)
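One way to read the threshold slide is as a three-way rule applied to every scored segment: keep it above TH, discard it below TL, and sample it in between. The sketch below follows that reading. The acceptance probability used between the thresholds (linear in the score) and the threshold values in the example are assumptions for illustration; the slide does not state them.

```python
import random

def select_sites(scored_segments, t_low, t_high, rng=None):
    """Pick zero, one, or several sites from (start, score) pairs using two thresholds."""
    if t_high <= t_low:
        raise ValueError("t_high must be greater than t_low")
    rng = rng or random.Random()
    chosen = []
    for start, score in scored_segments:
        if score >= t_high:
            # Above TH: always keep.
            chosen.append(start)
        elif score >= t_low:
            # Between TL and TH: sample. The linear acceptance rule is an assumption.
            if rng.random() < (score - t_low) / (t_high - t_low):
                chosen.append(start)
        # Below TL: discard.
    return chosen

# Segment scores for sequence 1 as shown on slides 16-24 (start position, score).
scores = [(1, 1.5), (2, 3.0), (3, 2.7), (4, 9.0), (5, 3.2),
          (6, 27.1), (7, 11.2), (8, 2.9), (9, 9.1)]
# Hypothetical thresholds chosen only to exercise all three branches.
print(select_sites(scores, t_low=5.0, t_high=20.0, rng=random.Random(0)))
```

With this rule a sequence whose segments all score below TL contributes no site, and a sequence with several high-scoring segments can contribute more than one, which covers both cases listed on the slide.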

  29. BioProspector Finds Motif With Two Blocks
      Two-block motifs:
      GACACATTACCTATGCTGGCCCTACGACCTCTCGC
      CACAATTACCACCA TGGCGTGATCTCAGACACGGACGGC
      GCCTCGATTACCGTGGTA TGGCTAGTTCTCAAACCTGACTAAA
      TCTCGTTAGATTACCACCCATGGCCGTATCGAGAGCG
      CGCTAGCCATTACCGAT TGGCGTTCTCGAGAATTGCCTAT

  30. BioProspector Finds Motifs With Two Blocks. Two-block motifs (figure labels: Sequence, blk1, block2, Min Gap, Max Gap, Sample Block2 start; scores shown: 97.9, 22.4, 26.5, 30.1, 18.9)

  31. BioProspector Finds Motif With Inverse Complementary Blocks. Two-block motifs; palindrome motifs: GCGTAA, AATGCG

  32. BioProspector Results: B. subtilis two-block promoter • B. subtilis transcription best studied • 136 σA-dependent promoter sequences [-100, 15] • Look for w1 = w2 = 5, gap[15, 20] two-block motif • Correctly identified motif [TTGACA, TATAAT] and 70% of all the sites • Occasionally predicted two promoters (figure labels: “Correct” site, Second site)
      Example promoter blocks:
      abrB TTGACG TACAAT
      veg TTGACA TATAAT
      f105 TTTACA TACAAT
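The two-block model on this slide (two conserved blocks separated by a gap of 15 to 20 bases) can be illustrated by scanning a sequence for a known pair of blocks. This is a scan for a given consensus, not motif discovery as BioProspector performs it; the function names, the mismatch tolerance, and the synthetic test promoter are hypothetical.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_two_block_sites(seq, block1, block2, min_gap, max_gap, max_mismatch=1):
    """Return (start, gap, found_block1, found_block2) for every two-block hit."""
    hits = []
    w1, w2 = len(block1), len(block2)
    for i in range(len(seq) - w1 + 1):
        first = seq[i:i + w1]
        if mismatches(first, block1) > max_mismatch:
            continue
        # Try every allowed gap length between the two blocks.
        for gap in range(min_gap, max_gap + 1):
            j = i + w1 + gap
            if j + w2 > len(seq):
                break
            second = seq[j:j + w2]
            if mismatches(second, block2) <= max_mismatch:
                hits.append((i, gap, first, second))
    return hits

# Synthetic promoter: the two consensus blocks from the slide with a 17-base spacer.
promoter = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"
print(find_two_block_sites(promoter, "TTGACA", "TATAAT", 15, 20, max_mismatch=0))
```

Allowing one mismatch per block is enough to accept variants such as TTGACG and TACAAT from the abrB example, at the cost of more spurious hits in longer sequences.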

  33. BioProspector Web Server: http://bioprospector.stanford.edu/

  34. BioProspector Web Server: http://bioprospector.stanford.edu/

  35. Compare Prospector (http://compareprospector.stanford.edu/). Liu et al, 2004, Genome Res 14(3): 451-458.

  36. Compare Prospector (http://compareprospector.stanford.edu/). Motif regions conserved between two species (figure scale: 1 kb). Liu et al, 2004, Genome Res 14(3): 451-458.

  37. Compare Prospector (http://compareprospector.stanford.edu/). Biased sampling: initial iterations use Tch; later iterations use Tcl (figure: Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, …, Gene n with thresholds Tch and Tcl). Liu et al, 2004, Genome Res 14(3): 451-458.

  38. Compare Prospector (http://compareprospector.stanford.edu/). Liu Y et al, Nucleic Acids Res 32:W204-7.

  39. Compare Prospector (http://compareprospector.stanford.edu/). Liu Y et al, Nucleic Acids Res 32:W204-7.

  40. Yeast Rap1 Sequences • Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment

  41. Yeast Rap1 Sequences • Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment. Step shown: Cross-link protein-DNA interaction

  42. Yeast Rap1 Sequences • Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment. Steps shown: Cross-link protein-DNA interaction; Shear DNA

  43. Yeast Rap1 Sequences • Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment. Step shown: Immunoprecipitation

  44. Yeast Rap1 Sequences • Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment. Step shown: PCR amplify and label DNA

  45. Yeast Rap1 Sequences • Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment. Step shown: Hybridize with microarray and measure reading

  46. Yeast Rap1 Sequences • Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment. Steps shown: Cross-link protein-DNA interaction; Shear DNA; Immunoprecipitation; Purify DNA; PCR amplify and label DNA; Hybridize with microarray and measure reading

  47. Chromatin Immune Precipitation

  48. Yeast Rap1 Sequences • Chromatin immunoprecipitation + microarray (ChIP-on-chip, ChIP-array, IP) experiment • Rap1 IP enriched 727 DNA fragments • 45% are intergenic • Average length 1-2 kb • Some are false positives • Some have multiple Rap1 sites

  49. Motif Discovery Scan (MDscan): Useful Insights • In ChIP-array experiments, highly enriched sequences are usually the real targets • Transcription factor binding sites occur more abundantly in these real targets • Search for TF sites in the high-confidence sequences first, before examining the remaining sequences?

  50. MDscan Algorithm: Define m-matches
      For a given w-mer and any other random w-mer, count the positions at which the two agree. m-matches for the 8-mer TGTAACGT:
      TGTAACGT matched 8
      AGTAACGT matched 7
      TGCAACAT matched 6
      TGACACGG matched 5
      AATAACAG matched 4
      Pick a reasonable m, e.g. in yeast
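The match counts listed on slide 50 are position-by-position identities between two w-mers of the same width. The sketch below reproduces them and shows how an m-match search over a set of sequences could look; the function names and the choice m = 6 in the example are hypothetical, since the slide's own value of m is cut off.

```python
def count_matches(a, b):
    """Number of positions at which two w-mers of equal width carry the same base."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same width")
    return sum(x == y for x, y in zip(a, b))

def m_matches(seed, sequences, m):
    """Collect every w-mer in the sequences that agrees with the seed at m or more positions."""
    w = len(seed)
    hits = []
    for seq in sequences:
        for i in range(len(seq) - w + 1):
            candidate = seq[i:i + w]
            if count_matches(seed, candidate) >= m:
                hits.append(candidate)
    return hits

seed = "TGTAACGT"
for other in ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]:
    print(other, "matched", count_matches(seed, other))

# Hypothetical threshold m = 6: only the first three 8-mers qualify.
print(m_matches(seed, ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"], 6))
```

Because this is pure enumeration with no random sampling, the same input always yields the same set of m-matches, in line with the deterministic character attributed to MDscan on slide 8.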
