
Transcriptional and post-transcriptional regulation of gene expression


sitara

Presentation Transcript


  1. Transcriptional and post-transcriptional regulation of gene expression. [Diagram labels: DNA (activation, repression), Pol II, mRNA with 3’UTR (translation, localization, stability), protein]

  2. Where does each transcription factor bind in the genome, in each cell type, at a given time? Near which genes? • What is the cis-regulatory code of each factor? Do they require any co-factors? [Diagram labels: DNA, activation, repression]

  3. ChIP-seq Transcription factor of interest Antibody Genome Analyzer II (Solexa)

  4. Control: input DNA Genome Analyzer II (Solexa)

  5. ACCAATAACCGAGGCTCATGCTAAGGCGTTAGCCACAGATGGAAGTCCGACGGCTTGATCCAGAATGGTGTGTGGATTGCCTTGGAACTGATTAGTGAATTC TGGTTATTGGCTCCGAGTACGATTCCGCAATCGGTGTCTACCTTCAGGCTGCCGAACTAGGTCTTACCACACACCTAACGGAACCTTGACTAATCACTTAAG Average length ~ 250bp

  6. 25-40bp ACCAATAACCGAGGCTCATGCTAAGGCGTTAGCCACAGATGGAAGTCCGACGGCTTGATCCAGAATGGTGTGTGGATTGCCTTGGAACTGATTAGTGAATTC TGGTTATTGGCTCCGAGTACGATTCCGCAATCGGTGTCTACCTTCAGGCTGCCGAACTAGGTCTTACCACACACCTAACGGAACCTTGACTAATCACTTAAG Average length ~ 250bp

  7. 25-40bp ACCAATAACCGAGGCTCATGCTAAGGCGTTAGCCACAGATGGAAGTCCGACGGCTTGATCCAGAATGGTGTGTGGATTGCCTTGGAACTGATTAGTGAATTC TGGTTATTGGCTCCGAGTACGATTCCGCAATCGGTGTCTACCTTCAGGCTGCCGAACTAGGTCTTACCACACACCTAACGGAACCTTGACTAATCACTTAAG Average length ~ 250bp

  8. BCL6 ChIP-seq (Melnick lab at WCMC)
     • Lymphoma cell line (OCI-Ly1)
     • Solexa/Illumina
     • 6 lanes for ChIP, 1 for input DNA, 1 for QC
     • 36nt long sequences
     • 32 million reads
     • Aligned/mapped to hg18 with Eland

  9. Read mapping with Eland Solexa Read AAAATACGCGTATTCTCCCAAAACAATATC AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG Reference Human Genome (hg18)

  10. Read mapping with Eland Solexa Read AAAATACGCCTATTCTCCCAAAACAATATC AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG Reference Human Genome (hg18)

  11. Read mapping with Eland Solexa Read AAAATACGCCTATTCTCCCATAACAATATC AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG Reference Human Genome (hg18)
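Slides 9 to 11 show the same read placed on the reference with zero, one and two mismatches. The idea can be sketched in Python as below. This is a brute-force illustration only, not Eland's actual algorithm; the function names `hamming` and `best_hits` are made up for this example, and only the forward strand is searched.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_hits(read, reference, max_mismatches=2):
    """Return (mismatches, offsets) for the best placement(s) of `read`.

    Every offset in `reference` is tried (forward strand only). Placements
    with more than `max_mismatches` differences are ignored. If nothing
    qualifies, the result is (None, []).
    """
    best, offsets = None, []
    for i in range(len(reference) - len(read) + 1):
        d = hamming(read, reference[i:i + len(read)])
        if d > max_mismatches:
            continue
        if best is None or d < best:
            best, offsets = d, [i]
        elif d == best:
            offsets.append(i)
    return best, offsets


# Reference fragment and reads copied from slides 9-11.
reference = ("AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATC"
             "TTACAAGATGTAAATATACCCAAGATG")
reads = [
    "AAAATACGCGTATTCTCCCAAAACAATATC",  # slide 9
    "AAAATACGCCTATTCTCCCAAAACAATATC",  # slide 10
    "AAAATACGCCTATTCTCCCATAACAATATC",  # slide 11
]

for read in reads:
    mismatches, offsets = best_hits(read, reference)
    print(read, "mismatches:", mismatches, "offsets:", offsets)
```

A read with several equally good placements would come back with more than one offset, which corresponds to the multi-mapping case on slide 12.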

  12. Reads can map to multiple locations/chromosomes Solexa Read 2 Solexa Read 1 Reference Human Genome (hg18)

  13. Reads map to one strand or the other Solexa Read 2 Solexa Read 1 hg18

  14. Eland output, one read per line:
      >HWI-EAS83_30UCEAAXX:1:2:915:1011 AGGTCACAAAACAAGTCCTAACAAATTTAAGAGTAT U0 1 13 62 chr8.fa 59699745 R DD
      >HWI-EAS83_30UCEAAXX:1:2:826:1245 GTCAGAAAAATCCTTTTTATTATATAAACAATACAT U2 0 0 1 chr5.fa 121195098 F DD 15G 20G
      >HWI-EAS83_30UCEAAXX:1:2:900:945 GTCATCAAACTCCAAGGATTCTGTTTTCAACATACT U0 1 1 0 chr18.fa 8914049 R DD
      >HWI-EAS83_30UCEAAXX:1:2:1037:1118 GAAAGTGATTAGCAGATTGTCATTTAATAATTGTCT U2 0 0 1 chr1.fa 97496963 F DD 18G 28G
      >HWI-EAS83_30UCEAAXX:1:2:898:874 GATAAATTTTTTCCTACAATCTTAAATTATTACACA U1 0 1 0 chr3.fa 95643444 R DD 10C
      >HWI-EAS83_30UCEAAXX:1:2:918:928 AAAAATTAAACAATTCTAAAAATATTTTTATCTTAA U2 0 0 1 chr2.fa 177727639 R DD 18C 31G
      >HWI-EAS83_30UCEAAXX:1:2:1324:4 GCACATGTCATACTCTTTCTAGCTCTCTTATTTTTC U0 1 0 0 chr8.fa 79132719 R DD
      >HWI-EAS83_30UCEAAXX:1:2:899:1015 AAATTAATGTAAAAAATAGGATACTGAATTGTGATA U1 0 1 0 chr10.fa 69774166 F DD 30G
      >HWI-EAS83_30UCEAAXX:1:2:909:926 GTAGTTAACAATAATTTATTTTATACTTCAAAATTC U1 0 1 17 chrX.fa 26496842 R DD 7A
      >HWI-EAS83_30UCEAAXX:1:2:701:1702 GTCAGAATTAATTAATCAAAACACCAAATGTACTTC U0 1 0 0 chr12.fa 72700465 F DD
      >HWI-EAS83_30UCEAAXX:1:2:996:1003 ATTTTGACTTTATTATTTTTTCTTCAATGTTTTTAA NM 0 0 0
      >HWI-EAS83_30UCEAAXX:1:2:884:1090 GAAAGTACATCAAATACATATTATATACTTTACATA R2 0 0 2
      >HWI-EAS83_30UCEAAXX:1:2:911:937 AATCCATATACATTTCTTTTTAATCATTTCCTCTTT U1 0 1 0 chr11.fa 94204222 F DD 20G
      >HWI-EAS83_30UCEAAXX:1:2:1517:330 GTGAGTTTCTTAATCCTGAGTTCTAATTTTATTTCA R0 29 255 255
      >HWI-EAS83_30UCEAAXX:1:2:904:1031 ACATTTTATAAATTTTTAATTTCATTTTAATTTATA NM 0 0 0
      >HWI-EAS83_30UCEAAXX:1:2:1291:1469 GTTTTTAAAATCAACACTTTTATTATAGAAGTAGCA U0 1 0 1 chr12.fa 62166701 R DD
      >HWI-EAS83_30UCEAAXX:1:2:1697:828 GTACTGATGTAAACTTGGTAAAAACATTGACATAAA U0 1 0 0 chr14.fa 65160857 F DD
      >HWI-EAS83_30UCEAAXX:1:2:1415:583 GAAGAAAATGACTATGTCAAAATATTATCTCTCAAT U0 1 0 0 chr5.fa 97782464 F DD
      >HWI-EAS83_30UCEAAXX:1:2:1561:1653 GTTTTACTGATTTTCTTACTTACTAAACTACCTGTT U0 1 0 0 chr7.fa 133200265 F DD
      >HWI-EAS83_30UCEAAXX:1:2:1579:943 AATGATACGGCGACCACCGACAGGTTCAGAGTTCTA NM 0 0 0
      >HWI-EAS83_30UCEAAXX:1:2:1705:268 GAGAATTATTCAGAAGTCAAATCTGTGCTTAGTTTA U2 0 0 1 chr5.fa 162472124 R DD 3G 7C
      >HWI-EAS83_30UCEAAXX:1:2:1489:318 GTATGTATCATATATATTTATGTATCATATATATTT R1 0 3 2
      >HWI-EAS83_30UCEAAXX:1:2:1003:1113 GATTGCTCCATTATTTGTTAAAAACATAGTAAAATA NM 0 0 0
      >HWI-EAS83_30UCEAAXX:1:2:895:1072 ATGAGATCAGTACTTCAAAGAGATATCTGCACTCCC U0 1 1 9 chr12.fa 33830898 R DD
      >HWI-EAS83_30UCEAAXX:1:2:853:1178 GTTAGTCCCAATATTCCATTAATCCCAATAAATATA U2 0 0 1 chr6.fa 110722427 F DD 15G 19G
      >HWI-EAS83_30UCEAAXX:1:2:1432:972 GAGATAATAATAGCAGTTATGGCATCGAGATAATTT U0 1 0 0 chr2.fa 47305609 R DD
      >HWI-EAS83_30UCEAAXX:1:2:1718:341 GTAGAGGGCACACATCACAAACAAGTTTCTGAGAAT R2 0 0 3
      >HWI-EAS83_30UCEAAXX:1:2:1171:302 GAATATCCACTTGCAGACTTTACAAACAAATTTTTT R2 0 0 4
      >HWI-EAS83_30UCEAAXX:1:2:1055:1126 GGCAGATGAAACTTCTATACACTATATTTTAGCCAG U0 1 0 0 chr13.fa 90021137 F DD
      >HWI-EAS83_30UCEAAXX:1:2:971:1371 GAAAGAAAAACTATTGAAAAAATAGTTACTTTCCAA U0 1 0 0 chr1.fa 74303257 R DD
      >HWI-EAS83_30UCEAAXX:1:2:1774:614 GTGTAGATGATATCGAGGGCATTAGAAGTAAATAGC U0 1 0 0 chr5.fa 16031200 F DD
      >HWI-EAS83_30UCEAAXX:1:2:1207:808 GAGAGGAAATAATAAAGATAAAAGTAGAAAAAGTGA U0 1 0 0 chr1.fa 187326417 F DD
      >HWI-EAS83_30UCEAAXX:1:2:1680:815 GATAATTATGTTGTTGTAATTATTGTTTGTTTTTTT U0 1 0 0 chr15.fa 46739015 R DD
      >HWI-EAS83_30UCEAAXX:1:2:1688:260 GTTGACAATCCAGCTGTCATAGAAACTGACTATTTT U0 1 0 0 chr12.fa 38910133 R DD
      >HWI-EAS83_30UCEAAXX:1:2:1051:916 AAAAATTCTCCCAAAACAACAAGATGTAAATATACC U0 1 0 0 chr3.fa 101625712 R DD
      >HWI-EAS83_30UCEAAXX:1:2:1771:308 GTTCTTACACTGATATGAAGAAATACCTGAGACTGG U0 1 2 67 chr2.fa 214128537 R DD
      >HWI-EAS83_30UCEAAXX:1:2:911:917 GAGAAACACACATATTTTTGTAAGTGCCATCACATC U1 0 1 0 chr7.fa 13668652 R DD 18C
      >HWI-EAS83_30UCEAAXX:1:2:1105:348 GTATTATCTAACACACAAGATGATGTTTGTTTTTAT NM 0 0 0
      >HWI-EAS83_30UCEAAXX:1:2:1048:857 GAGTGTAGAAAATTTTCTGCCCTAAAATATTTGTTA U1 0 1 0 chr6.fa 74625385 F DD 13G
      >HWI-EAS83_30UCEAAXX:1:2:743:1729 GTATCCTAAAGTGTATCTTATGTTTTTTCATCTTCT U1 0 1 0 chr12.fa 7400023 R DD 9C
      >HWI-EAS83_30UCEAAXX:1:2:1287:64 AATAAAACAAATTCCAATGGCTTAGATTCTACTTAA U2 0 0 1 chr10.fa 98020799 R DD 15C 20C
      >HWI-EAS83_30UCEAAXX:1:2:940:1059 AAATGGTCATACTTCCCAAAGCGATCTACAGATTCA U1 0 1 29 chr3.fa 50834510 R DD 19C
      >HWI-EAS83_30UCEAAXX:1:2:898:1061 ACATTTCCACATTTCTGTGGAAGCCTCACAATCATT R2 0 0 2
      >HWI-EAS83_30UCEAAXX:1:2:913:932 ATTAATCAACAGCAACATTAATCAACTGAATCAACA U0 1 0 0 chr2.fa 46078825 R DD
      >HWI-EAS83_30UCEAAXX:1:2:43:1647 GAATAAATAATCAAAACATATAATACATTTTTTTAT U1 0 1 0 chr5.fa 41496935 F DD 32G
      >HWI-EAS83_30UCEAAXX:1:2:1412:731 ATATACACATATATATACATATATATATACACATAT R0 47 255 255
      >HWI-EAS83_30UCEAAXX:1:2:1389:1196 GAGAAGGAAATGTGTTTTCTAAGTTTCTTTATCTTC U1 0 1 0 chr4.fa 188020201 F DD 32G
      >HWI-EAS83_30UCEAAXX:1:2:1264:1479 GTGTAGGAAAGAAAAAAGGAGGTTGTGTAGAAAAGA U0 1 0 0 chr2.fa 192227804 F DD
      >HWI-EAS83_30UCEAAXX:1:2:38:890 TTTATTTAAATCTTTTAAAAANTTTTTTCCAACAAA NM 0 0 0
      >HWI-EAS83_30UCEAAXX:1:2:1341:1065 GATACATATACACAAAGTAAAACTATTCAGCCTCTA U0 1 0 0 chr17.fa 51416321 F DD
      >HWI-EAS83_30UCEAAXX:1:2:1132:929 GAGTTGTATTAATCTTAAATTGATAATTTACCATAT U1 0 1 0 chr10.fa 2376138 F DD 24G
      >HWI-EAS83_30UCEAAXX:1:2:1758:275 GCATTTTAACAAAATCACCATATCTGGGTAACCATT U1 0 1 0 chr21.fa 27648337 R DD 18C
      >HWI-EAS83_30UCEAAXX:1:2:914:1000 GAAAGCACTTTATAATAAAACAACATTGGAGCACCT U1 0 1 0 chr8.fa 67496303 F DD 16G

  15. Number of reads per Eland type
      Type   Reads      %
      U0     21019702   65%
      U1     3280059    10%
      U2     1007173    3%
      R0     3661054    11%
      R1     815275     2%
      R2     406002     1%
      NM     2050499    6%
      QC     306352     1%
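A per-type tally like the one on slide 15 could be produced from Eland output with a few lines of Python. The sketch below is illustrative: the field layout (ID, sequence, match type, three hit counts, then chromosome file, position and strand for uniquely placed reads) is inferred from the records on slide 14, and the function name `parse_eland_line` is hypothetical.

```python
from collections import Counter


def parse_eland_line(line):
    """Parse one whitespace-separated Eland record into a dict.

    Assumed layout, inferred from slide 14:
    ID, sequence, match type, 3 hit counts, [chrom file, position, strand, ...]
    """
    fields = line.split()
    record = {"id": fields[0].lstrip(">"), "seq": fields[1], "type": fields[2]}
    # Only uniquely placed reads (U0/U1/U2) carry a genomic location.
    if record["type"] in ("U0", "U1", "U2") and len(fields) >= 9:
        chrom = fields[6]
        if chrom.endswith(".fa"):
            chrom = chrom[:-3]
        record["chrom"] = chrom
        record["pos"] = int(fields[7])
        record["strand"] = "+" if fields[8] == "F" else "-"
    return record


# A few records copied from slide 14.
lines = [
    ">HWI-EAS83_30UCEAAXX:1:2:915:1011 AGGTCACAAAACAAGTCCTAACAAATTTAAGAGTAT U0 1 13 62 chr8.fa 59699745 R DD",
    ">HWI-EAS83_30UCEAAXX:1:2:826:1245 GTCAGAAAAATCCTTTTTATTATATAAACAATACAT U2 0 0 1 chr5.fa 121195098 F DD 15G 20G",
    ">HWI-EAS83_30UCEAAXX:1:2:898:874 GATAAATTTTTTCCTACAATCTTAAATTATTACACA U1 0 1 0 chr3.fa 95643444 R DD 10C",
    ">HWI-EAS83_30UCEAAXX:1:2:996:1003 ATTTTGACTTTATTATTTTTTCTTCAATGTTTTTAA NM 0 0 0",
    ">HWI-EAS83_30UCEAAXX:1:2:1517:330 GTGAGTTTCTTAATCCTGAGTTCTAATTTTATTTCA R0 29 255 255",
]

records = [parse_eland_line(line) for line in lines]
type_counts = Counter(r["type"] for r in records)
total = sum(type_counts.values())
for match_type, n in type_counts.most_common():
    print(f"{match_type}\t{n}\t{100 * n / total:.0f}%")
```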

  16. Peak detection • Calculate read count at each position (bp) in genome • Determine if read count is greater than expected

  17. Peak detection • We need to correct for input DNA reads (control) - non-uniformly distributed (they form peaks too) - vastly different numbers of reads between ChIP and input

  18. Peak detection using ChIPseeqer

  19. [Figure: read count and expected read count along the genome] Expected read count = total number of reads * extended fragment length / chr length

  20. Is the observed read count at a given genomic position greater than expected? The Poisson distribution: x = observed read count, λ = expected read count. [Plot: frequency vs. read count]

  21. Is the observed read count at a given genomic position greater than expected? The Poisson distribution: x = 10 reads (observed), λ = 0.5 reads (expected). P(X>=10) = 1.7 x 10^-10; log10 P(X>=10) = -9.77; -log10 P(X>=10) = 9.77
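The numbers on slide 21 can be checked with a short Python sketch that sums the upper tail of the Poisson distribution directly. The function name `poisson_upper_tail` is made up for this example, and the code assumes a positive, moderately sized λ.

```python
import math


def poisson_upper_tail(x, lam):
    """P(X >= x) for X ~ Poisson(lam), summed term by term from k = x upward.

    Summing the tail directly avoids the loss of precision that
    1 - P(X < x) suffers when the tail probability is tiny.
    """
    if x <= 0:
        return 1.0
    # First term P(X = x), computed in log space to avoid overflow.
    term = math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))
    total = 0.0
    k = x
    while term > 0.0 and term > total * 1e-17:
        total += term
        k += 1
        term *= lam / k  # P(X = k) from P(X = k - 1)
    return min(total, 1.0)


p = poisson_upper_tail(10, 0.5)
score = -math.log10(p)
print(f"P(X>=10) = {p:.2e}, -log10 P = {score:.2f}")
```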

  22. Read count Expected read count -Log(p) Expected read count = total number of reads * extended frag len / chr len

  23. Read count Expected read count -Log(p) Expected read count = total number of reads * extended frag len / chr len Input reads

  24. [Figure: two panels, ChIP and INPUT, each showing read count and expected read count against genome position (bp), with -Log(Pc) for ChIP and -Log(Pi) for input; a threshold is applied to Log(Pc) - Log(Pi)]

  25. Normalized peak score (at each bp): R = -log10 [ P(X_ChIP) / P(X_input) ] • Will detect peaks with high read counts in ChIP, low in input • Works when there is no input DNA!
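As a worked example of the normalized score on slide 25, the sketch below computes R from two p-values. Treating the no-input case as P(X_input) = 1 is an assumption made here to illustrate why the score still works without input DNA; the slides do not spell out how ChIPseeqer handles it. The function name is hypothetical.

```python
import math


def normalized_peak_score(p_chip, p_input=1.0):
    """R = -log10(P(X_ChIP) / P(X_input)), computed as a difference of logs.

    p_input defaults to 1.0 (assumption: no input DNA available), in which
    case R reduces to -log10 P(X_ChIP).
    """
    return -(math.log10(p_chip) - math.log10(p_input))


# Strong ChIP signal, weak input signal: high score.
print(normalized_peak_score(1e-20, 1e-3))
# Strong signal in both ChIP and input: the score is pulled down.
print(normalized_peak_score(1e-20, 1e-12))
# No input DNA: plain -log10 of the ChIP p-value.
print(normalized_peak_score(1e-20))
```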

  26. Non-mappable fraction of the genome
      We enumerated all 30-mers, counted # occurrences, calculated non-unique fraction of genome
      Chr     Non-unique / chr length    Fraction
      chr18   9369067/76117153           0.123087459668913 (=12%)
      chr2    33849240/242951149         0.139325292921335
      chr3    27854877/199501827         0.139622164963933
      chr4    27090014/191273063         0.141630052737745
      chr6    24330283/170899992         0.142365618132972
      chr8    20932821/146274826         0.143106107677065
      chr5    26029902/180857866         0.143924633059643
      chr12   19382853/132349534         0.14645199279659
      chr11   20039443/134452384         0.149044906485258
      chr20   10017788/62435964          0.160449000194824
      chr7    26182588/158821424         0.164855517225434
      chr10   22968951/135374737         0.169669404417753
      chr17   14496284/78774742          0.184021980040252
      chrX    31269270/154913754         0.201849540099583
      chr1    55186693/247249719         0.223202247602959
      chr13   28668063/114142980         0.251159230291692
      chr16   23552340/88827254          0.265147676410215
      chr14   29689825/106368585         0.279122120502026
      chrM    4628/16571                 0.279283084907368
      chr9    43125838/140273252         0.307441635415995
      chr19   20251255/63811651          0.317359834491667
      chr15   31877970/100338915         0.317702957023205
      chr21   16867677/46944323          0.359312392256674
      chr22   21176578/49691432          0.426161556382597
      chrY    43209644/57772954          0.747921665906161 (=74%)
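The k-mer enumeration described on slide 26 can be illustrated on a toy sequence. This sketch counts every k-mer in memory and ignores the reverse strand, so it shows the idea rather than a method that would scale to a whole genome; the exact definition used for the slide's numbers is not given, and `non_unique_fraction` is a name chosen for this example.

```python
from collections import Counter


def non_unique_fraction(seq, k=30):
    """Fraction of k-mer start positions whose k-mer occurs more than once."""
    n_positions = len(seq) - k + 1
    if n_positions <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n_positions))
    # Each repeated k-mer contributes all of its occurrences.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / n_positions


# Toy example with k = 4: ACGT occurs twice among 7 k-mer positions.
toy = "ACGTACGTTT"
print(non_unique_fraction(toy, k=4))
```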

  27. Peak detection • Determine all genomic regions with R>=15 • Merge peaks separated by less than 100bp • Output all peaks with length >= 100bp • Process 23M reads in <7mins
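The three steps on slide 27 (threshold, merge, length filter) can be sketched on a per-bp score vector. This is an illustrative version under the assumption that scores for one chromosome fit in a Python list; `call_peaks` is a hypothetical name and the defaults simply mirror the values on the slide.

```python
def call_peaks(scores, threshold=15.0, max_gap=100, min_len=100):
    """Return peaks as (start, end) half-open intervals.

    1. Find contiguous runs with score >= threshold.
    2. Merge runs separated by fewer than `max_gap` bp.
    3. Keep merged regions at least `min_len` bp long.
    """
    runs, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(scores)))

    merged = []
    for run_start, run_end in runs:
        if merged and run_start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], run_end)
        else:
            merged.append((run_start, run_end))

    return [(s, e) for s, e in merged if e - s >= min_len]


scores = [0.0] * 1000
for lo, hi in [(100, 160), (200, 270), (600, 650)]:
    for i in range(lo, hi):
        scores[i] = 20.0

# The first two runs are 40 bp apart and merge into one 170-bp peak;
# the third run is only 50 bp long and is dropped.
peaks = call_peaks(scores)
print(peaks)
```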

  28. BCL6: 18,814 peaks. 80% are within 20kb of a known gene. [Genome browser view: ChIP reads, input reads, detected peaks]

  29. Where does each transcription factor bind in the genome, in each cell type, at a given time? Near which genes? • What is the cis-regulatory code of each factor? Do they require any co-factors? [Diagram labels: DNA, activation, repression]

  30. Regulatory Sequence Discovery using FIRE

  31. Discovering regulatory sequences associated with peak regions. Each sequence has two labels: whether it is a true TF binding peak (target regions: Yes; random regions: No) and whether the motif is present or absent in it. The correlation between the two labels is quantified using the mutual information.
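The mutual information between "motif present" and "true peak" can be computed from the two label vectors. The sketch below is a generic plug-in estimate in bits; it is not FIRE's code, it checks only the forward strand with an exact substring match, and the toy sequences are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) )
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def motif_information(kmer, target_seqs, random_seqs):
    """MI between k-mer presence and the peak/random label."""
    seqs = list(target_seqs) + list(random_seqs)
    is_peak = [True] * len(target_seqs) + [False] * len(random_seqs)
    has_motif = [kmer in s for s in seqs]
    return mutual_information(has_motif, is_peak)


# Toy data (made up): the k-mer CTCATCG is in every target, in no random region.
targets = ["AACTCATCGTT", "GGCTCATCGAA"]
randoms = ["AAAAAAAAAA", "GGGGTTTTGG"]
print(motif_information("CTCATCG", targets, randoms))  # perfectly informative
print(motif_information("AAAATTT", targets, randoms))  # absent everywhere
```

A k-mer that appears equally often in peaks and random regions scores near zero, which matches the "not informative" end of the ranked list on slide 32.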

  32. Motif Search Algorithm: k-mers ranked by mutual information (MI)
      k-mer     MI
      Highly informative:
      CTCATCG   0.0618
      TCATCGC   0.0485
      AAAATTT   0.0438
      GATGAGC   0.0434
      AAAAATT   0.0383
      ATGAGCT   0.0334
      TTGCCAC   0.0322
      TGCCACC   0.0298
      ATCTCAT   0.0265
      ...       ...
      Not informative:
      ACGCGCG   0.0018
      CGACGCG   0.0012
      TACGCTA   0.0011
      ACCCCCT   0.0010
      CCACGGC   0.0009
      TTCAAAA   0.0005
      AGACGCG   0.0004
      CGAGAGC   0.0003
      CTTATTA   0.0002
      [Motif logos annotated MI=0.081, MI=0.045, MI=0.040, ...]

  33. Optimizing k-mers into more informative degenerate motifs. Starting k-mer: ATCCGTACA; candidate degenerate motif: ATCC[C/G]TACA. Candidate characters: A/G, C/G/T, T/G, A/T/G, C/G, A/C/G. Which character increases the mutual information by the largest amount? [Figure: motif presence across target regions (true TF binding peak: Yes) and random regions (No)]

  34. Optimizing k-mers into more informative degenerate motifs. Current motif: ATCC[C/G]TACA. Candidate characters: A/C, C/G/T, T/C, A/T/C, C/G, A/C/G. [Figure: motif presence across target regions (true TF binding peak: Yes) and random regions (No)]

  35. Mutual information change Similarity to ChIP-chip RAP1 motif Motif Conservation with S. bayanus

  36. Motifs optimized so far k-mer MI CTCATCG 0.0618 TCATCGC 0.0485 AAAATTT 0.0438 GCTCATC 0.0434 AAAAATT 0.0383 ATGAGCT 0.0334 TTGCCAC 0.0322 TGCCACC 0.0298 ATCTCAT 0.0265 ... MI=0.081 Highly informative k-mers MI=0.045 optimize ? Only optimize k-mer if I(k-mer;expression | motif) is large enough (for all motifs optimized so far) Conditional mutual information I(X;Y|Z)

  37. Motif co-occurrence analysis. [Figure: discovered motifs, with enrichment and depletion] FIRE automatically compares discovered motifs to known motifs in TRANSFAC and JASPAR

  38. ChIPseeqer: an integrated framework for ChIP-seq data analysis • ChIPseeqer (peak detection) • ChIPseeqer2Track (for Genome Browser) • ChIPseeqer2FIRE (+ motif analysis) • ChIPseeqer2iPAGE (+ pathway analysis) • ChIPseeqer2cons (conservation analysis)

  39. Installing and setting up programs
      Install ChIPseeqer and FIRE:
        http://physiology.med.cornell.edu/faculty/elemento/lab/chipseq.shtml
        http://tavazoielab.princeton.edu/FIRE/
      Execute the following commands:
        export FIREDIR=/Applications/FIRE-1.1
        export PATH=$PATH:$FIREDIR
        export CHIPSEEQERDIR=/Applications/ChIPseeqer-1.0
        export PATH=$PATH:$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS
        chmod +x $CHIPSEEQERDIR/ChIP*
        chmod +x $CHIPSEEQERDIR/SCRIPTS/*.pl

  40. Peak Detection
      - Input file: CTCF.bed
          cd ~/Desktop/elemento
        Or download from: http://physiology.med.cornell.edu/faculty/elemento/lab/files/chipseq/
      - 2947043 U0 reads in BED format (check by typing wc -l CTCF.bed; view by typing more CTCF.bed, and q to exit)
      - No input DNA for this experiment

  41. Peak Detection
      Step 1: Split big read file into one file per chromosome
        split_bed_or_mit_files.pl CTCF.bed
      Expected output:
        Opening CTCF.bed
        Current directory = .
        Creating ./reads.chr1
        …

  42. Peak Detection
      Step 2: Detect peaks
        ChIPseeqer --chipdir=. --t=15 --fraglen=250 --format=bed -outfile=CTCF_peaks_t15.txt
      Expected output:
        Processing reads in chrY ... done.
        Processing reads in chrX ... done.
        Processing reads in chr9 ... done.
        Processing reads in chr8 ... done.
      Step 3: Count how many peaks were found
        wc -l CTCF_peaks_t15.txt

  43. Making a Genome Browser track
      Command lines:
        cd JuliaChild
        wc -l CTCF_peaks_t15.txt
        ChIPseeqer2track --targets=CTCF_peaks_t15.txt --trackname="CTCF peaks"
      Expected output:
        CTCF_peaks_t15.txt.wgl.gz created.
      To check that the file was created:
        ls

  44. Making a Genome Browser track http://genome.ucsc.edu/cgi-bin/hgGateway

  45. Making FIRE input files
      Command line (type instructions below as one single line):
        ChIPseeqer2FIRE --targets=CTCF_peaks_t15.txt --genome=wg.fa --suffix=CTCF_peaks_t15_FIRE
      wg.fa is also available from: http://physiology.med.cornell.edu/faculty/elemento/lab/files/chipseq/ (decompress with gunzip wg.fa.gz)
      Expected output:
        Extracting sequences ... Done.
        Extracting randomly selected sequences ... Done.
        CTCF_peaks_t15_FIRE.txt and CTCF_peaks_t15_FIRE.seq have been generated.
        …

  46. FIRE analysis
      Command line (type instructions below as one single line):
        fire.pl --expfile=CTCF_peaks_t15_FIRE.txt --fastafile_dna=CTCF_peaks_t15_FIRE.seq --nodups=1 --minr=2 --species=human --dorna=0 --dodnarna=0
      Expected output:
        Extracting sequences ... Done.
        Extracting randomly selected sequences ... Done.
        CTCF_peaks_t15_FIRE.txt and CTCF_peaks_t15_FIRE.seq have been generated.
        …

  47. FIRE main output file
        open CTCF_peaks_t15_FIRE.txt_FIRE/DNA/CTCF_peaks_t15_FIRE.txt.summary.pdf
      [Figure: summary with two columns, randomly selected sequences and peak sequences]
