
ChIP-seq

ChIP-seq. Xiaole Shirley Liu STAT115, STAT215. Outline. ChIP-chip on yeast Technology and data analysis : MDscan motif finding, regulatory network ChIP-X on human Tiling microarrays and peak finding High throughput sequencing and peak finding Data analysis and examples


Presentation Transcript


  1. ChIP-seq Xiaole Shirley Liu STAT115, STAT215

  2. Outline • ChIP-chip on yeast • Technology and data analysis: MDscan motif finding, regulatory network • ChIP-X on human • Tiling microarrays and peak finding • High throughput sequencing and peak finding • Data analysis and examples • Analysis: peak finding, gene expression analysis, sequence motif finding, regulatory network • Holistic picture of gene regulation

  3. Motivation • Motif finding works well in bacteria, OK in yeast, marginal in worm/fly, and almost never in mammals • Cistrome: Genome-wide in vivo binding sites of DNA-binding proteins • ChIP-chip and ChIP-seq give cistrome results

  4. ChIP-chip Technology • Chromatin ImmunoPrecipitation + microarray • ChIP-on-chip or ChIP-chip • Also known as Genome Scale Location Analysis • Detect genome-wide in vivo location of TF and other DNA-binding proteins • Find all the DNA sequences bound by TF-X? • Cook all the dishes with cinnamon • Can learn the regulatory mechanism of a transcription factor or DNA-binding protein much better and faster

  5. Chromatin ImmunoPrecipitation (ChIP)

  6. TF/DNA Crosslinking in vivo

  7. Sonication (~500bp)

  8. TF-specific Antibody

  9. Immunoprecipitation

  10. Reverse Crosslink and DNA Purification

  11. Promoter Array Hybridization [Figure labels: Genes, Intergenic, ChIP]

  12. ChIP-DNA chip Detection • Started in yeast, using promoter cDNA microarrays • ~6000 spots, each 800-1000 bp • Two-color assay • Control: no antibody, or chromatin (a little bit of everything) • Need triplicates to cancel noise • Applied to all yeast TFs • TF modified to contain a tag • Tag can be precipitated with immunoglobulin

  13. ChIP-chip Motif Finding • ChIP-chip gives 10-5000 binding regions ~600-1000bp long. Precise binding motif? • Raw data is like perfect clustering, plus enrichment values • MDscan • High ChIP ranking => true targets, contain more sites • Search TF motif from highest ranking targets first (high signal / background ratio) • Refine candidate motifs with all targets • Used successfully in ChIP-chip motif finding

  14. Similarity Defined by m-match • For a given w-mer and any other random w-mer, count the matched positions • m-matches for the 8-mer TGTAACGT:
        TGTAACGT   matched 8
        AGTAACGT   matched 7
        TGCAACAT   matched 6
        TGACACGG   matched 5
        AATAACAG   matched 4
      • Pick a reasonable m to call two w-mers similar
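The m-match count on this slide can be sketched in a few lines of Python. This is only an illustration of the definition: the function names `m_match` and `is_similar` are hypothetical and are not taken from MDscan itself.

```python
def m_match(w1, w2):
    """Count the positions at which two equal-length w-mers agree."""
    if len(w1) != len(w2):
        raise ValueError("w-mers must have the same width")
    return sum(a == b for a, b in zip(w1, w2))


def is_similar(w1, w2, m):
    """Call two w-mers similar when they match at m or more positions."""
    return m_match(w1, w2) >= m


seed = "TGTAACGT"
for other in ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]:
    print(other, "matched", m_match(seed, other))
```

With m = 6, for example, the first three 8-mers in the list would be called similar to the seed and the last two would not.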

  15. MDscan Seeds • Each 9-mer (e.g. ATTGCAAAT) in the higher-enrichment ChIP-chip selected upstream sequences defines a seed motif pattern [Figure: seed w-mer groups shown on the slide: ATTGCAAAT TTTGCGAAT TTTGCAAAT; GCCACCGT ACCACCGT ACCACGGT GCCACGGC …; GCAAATCCA GCAAATTCG GCAAATCCA GGAAATCCA GGAAATCCT; TTGCAAATC TTGCGAATA TTGCAAATT TTGCCCATC; TTTGCAAAT; CAAATCCAA CAAATCCAA GAAATCCAC; TGCAAATCC TGCAAATTC]

  16. Update Motifs With Remaining Seqs [Figure labels: Seed1, m-matches, Extreme High Rank, All ChIP-selected targets]

  17. Refine the Motifs [Figure labels: Seed1, m-matches, Extreme High Rank, All ChIP-selected targets]

  18. Motif Regressor • EM Conlon, XS Liu • Look for candidate motifs (MDscan) • Refine motifs • Regress between upstream motif match score and downstream expression [Figure labels: Expression log ratio, Genes]

  19. Motif Regressor Rationale • For each TF:
        Gene     Upstream seq motif match   Downstream gene expression
        Gene1    3.2                        1.8
        Gene2    2.8                        0.3
        Gene3…
      • Upstream sequence X motif matching score measures: • Number of sites • Strength of matching

  20. Motif Regressor Strategy • Rank genes by log2 (expression fold change) • Try MDscan (width 5-17) on induced and repressed genes separately • Find 50 candidate motifs from top 100 genes • Refine candidate motifs with top 500 genes • Report <= 30 distinct motifs • Score each upstream sequence with each motif • Linear regression to eliminate insignificant motifs
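The regression step can be illustrated with a single-motif least-squares fit of expression on motif match score. This is a minimal sketch, not the Motif Regressor implementation: `fit_line` is a hypothetical helper, the input values are invented for illustration, and the actual method judges motifs by statistical significance rather than by the r² value returned here.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r^2 is the fraction of expression variance explained by the motif score.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: one motif match score and one log2 expression
# fold change per gene (values invented for illustration only).
motif_scores = [3.2, 2.8, 0.5, 0.1, 1.9]
log2_expression = [1.8, 0.3, -0.2, 0.1, 0.9]
slope, intercept, r_squared = fit_line(motif_scores, log2_expression)
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r_squared:.3f}")
```

A motif whose score explains little of the expression change would get a slope near zero, which is the kind of motif the final regression step is meant to eliminate.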

  21. Linear Regression Example
        Person  IQ   Age  Education  Height  Eye color  Spend/week  # of CD
        A       120  30   High       171     blue       $4000       30
        B       250  41   PhD        155     brown      $1500       18
        C       150  8    Grade10    115     black      $100        90
        D       180  16   Grade12    140     gray       $200        15
        E       90   4    Preschool  88      green      $500        26
        F       130  17   High       178     black      $80         500
        G       110  21   College    182     blue       $800        220
        …
      Gene expression regressed on motif scores:
                            Mtf1  Mtf2  Mtf3  Mtf4  Mtf5  Mtf6
        Single regression   X     X     X     --    --    --

  22. Yeast TF Regulatory Network [Figure labels: Protein, Transcribe, Regulate, Gene]

  23. ChIP-chip Better Explains Expression [Figure panels: Sum1 regulated genes; Ndt80 regulated genes; Ndt80 & Sum1 regulated genes]

  24. Genome Tiling Microarrays • Promoter arrays don't work for human ChIP-chip • Binding can occur in distal intergenic sequences, introns, exons, or downstream sequences [Figure labels: Tiling Probes, Genomic DNA on the chromosome]

  25. DNA Purification

  26. ChIP-chip on Tiling Microarray [Figure labels: ChIP, Ctrl, Chromosome, ChIP-DNA, Noise]

  27. ChIP-chip • Detect genome-wide location of transcription and epigenetic factors • Affymetrix genome tiling arrays are cheaper • $2000 7 arrays * 6 million probes * (3 ChIP + 3 Ctrl) • But data are noisier and less informative • Two peaks? How about ChIP alone? Over 42M probes? [Figure: ChIP and Ctrl log probe intensity vs. chromosome coordinates]

  28. Affymetrix Tiling Array Peak Finding • Challenges: • Massive data, probe values noisy • Only 1/3 of researchers get it to work the first time • Previous algorithms only work by comparing 3 ChIP with 3 Ctrl • Model-based Analysis of Tiling arrays (MAT) • Works with single ChIP (no rep, no ctrl) • Finds individual failed samples • More sensitive, specific, and quantitative with 3 ChIP & 3 Ctrl • MAT: Johnson et al, PNAS 2006

  29. MAT • Most of the probes in ChIP-chip measure non-specific hybridization and background noise • Estimate probe behavior by checking other probes with similar sequence on the same array • Probe sequence plays a big role in signal value

  30. Model Sequence-Specific Probe Effect • First detailed model of the effect of probe sequence on probe signal • Example 25-mer probe: AATGC ACTGT GCACA GATCG GCCAT (7 A, 7 C, 6 G, 5 T; maps to 2 places in the genome) • Use all the probes on the array to estimate the parameters [Equation: probe signal modeled from a # of T's term (intercept), position-specific A, C, G effects, A, C, G, T counts squared, and 25-mer copy number]
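The terms named on this slide can be turned into a per-probe feature vector, which is what a regression over all probes on the array would consume. The sketch below is an assumption-laden illustration rather than MAT's code: `probe_features` is a hypothetical name, the feature ordering is arbitrary, and taking the logarithm of the copy number is an assumption not stated on the slide.

```python
import math

NUCLEOTIDES = "ACGT"


def probe_features(seq, copy_number):
    """Build sequence features for one probe.

    Returns (counts, features) where features holds, in order:
    the number of T's, one A/C/G indicator triple per position,
    the squared A, C, G, T counts, and log(copy number).
    """
    seq = seq.upper()
    if any(base not in NUCLEOTIDES for base in seq):
        raise ValueError("probe must contain only A, C, G, T")
    counts = {n: seq.count(n) for n in NUCLEOTIDES}
    features = [float(counts["T"])]
    for base in seq:
        # T is the reference level, so only A, C, G get indicators.
        features.extend(1.0 if base == n else 0.0 for n in "ACG")
    features.extend(float(counts[n] ** 2) for n in NUCLEOTIDES)
    # Assumption: copy number enters on a log scale.
    features.append(math.log(copy_number))
    return counts, features


# The 25-mer from the slide, which maps to 2 places in the genome.
counts, features = probe_features("AATGCACTGTGCACAGATCGGCCAT", copy_number=2)
print(counts, len(features))
```

For a 25-mer this gives 1 + 75 + 4 + 1 = 81 features per probe, few enough that millions of probes on one array can estimate them well.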

  31. Probe Standardization • Fit the probe model array by array [Figure: 6M probes grouped into 2K bins; observed probe intensity vs. model-predicted probe intensity; observed probe variance within each bin]

  32. [Figure panels: raw probe values at two spike-in regions with concentration 2X (ChIP, Ctrl); sequence-based probe behavior standardization (ChIP standardized, Ctrl standardized); window-based neighboring probe combination for ChIP-region detection (ChIP window; ChIP - Ctrl; 3 ChIP - 3 Ctrl)]

  33. Is a ChIP experiment working? • MAT window scores ~ normal with long tails • Estimate p-value under a normal null fitted to the left half of the data • FDR = A / B (Ctrl/ChIP peaks are all false positives) • Spike-in shows MAT FDR estimate is accurate • Can find individual failed replicates [Figure labels: A, B]
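One way to read this slide is: fit a normal null to the left half of the window scores, where real binding should not contribute, then compare the two tails. The following is a rough sketch under that reading and is not MAT's actual procedure. All function names are hypothetical, the null center is taken to be the median, and A and B are assumed to be the counts of windows beyond a symmetric cutoff on the left and right.

```python
import math
import statistics


def null_from_left_half(scores):
    """Estimate a normal null (center, sd) from scores at or below the median."""
    center = statistics.median(scores)
    left = [s for s in scores if s <= center]
    # Treat the left half as one side of a symmetric null distribution.
    sd = math.sqrt(sum((s - center) ** 2 for s in left) / len(left))
    return center, sd


def upper_tail_p(score, center, sd):
    """P(X >= score) for X ~ Normal(center, sd)."""
    return 0.5 * math.erfc((score - center) / (sd * math.sqrt(2)))


def tail_fdr(scores, center, cutoff):
    """FDR = A / B: left-tail windows (A) over right-tail windows (B)."""
    a = sum(1 for s in scores if s < center - cutoff)
    b = sum(1 for s in scores if s > center + cutoff)
    return a / b if b else None


# Hypothetical window scores: a symmetric bulk plus two strong windows.
scores = [-2, -1, 0, 0, 0, 1, 2, 7, 9]
center, sd = null_from_left_half(scores)
print(center, sd, upper_tail_p(7, center, sd), tail_fdr(scores, center, 1.5))
```

A failed replicate would show up as a right tail no heavier than the left one, so the estimated FDR stays near 1 at every cutoff.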

  34. ChIP-Seq • Sequence millions of 30-mer ends of fragments • Map 30-mers back to the genome [Figure labels: ChIP-DNA, Noise]

  35. MACS: Model-based Analysis of ChIP-Seq • Use confident peaks to model shift size [Figure label: Binding]

  36. Peak Calls • Tag distribution along the genome ~ Poisson distribution (λBG = total tag count / genome size) • ChIP-Seq shows local biases in the genome • Chromatin and sequencing bias

  37. Peak Calls • Tag distribution along the genome ~ Poisson distribution (λBG = total tag count / genome size) • ChIP-Seq shows local biases in the genome • Chromatin and sequencing bias • 200-300bp control windows have too few tags • But can look further: dynamic λlocal = max(λBG, [λctrl, λ1k,] λ5k, λ10k) [Figure: ChIP and Control tracks with 300bp, 1kb, 5kb, and 10kb windows] • http://liulab.dfci.harvard.edu/MACS/ • Zhang et al, Genome Bio, 2008
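The dynamic λlocal idea can be sketched as follows: each wider control window gives a tag count that is rescaled to the width of the candidate peak, and the largest resulting rate is used in a Poisson test. This is a simplified illustration, not the MACS source. The function names and all numbers are hypothetical, and details such as scaling between ChIP and control library sizes are left out.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by summing the lower tail."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def background_lambda(total_tags, genome_size, peak_width):
    """Expected tags in a peak-sized window under a uniform genome-wide rate."""
    return total_tags * peak_width / genome_size


def dynamic_lambda(lambda_bg, ctrl_counts, peak_width):
    """Take the max of lambda_bg and the rescaled local control rates.

    ctrl_counts maps a control window size in bp to the control tag
    count observed in that window around the candidate peak.
    """
    candidates = [lambda_bg]
    for window, count in ctrl_counts.items():
        candidates.append(count * peak_width / window)
    return max(candidates)


# Hypothetical numbers for one 300bp candidate peak.
lam_bg = background_lambda(total_tags=1_000_000, genome_size=100_000_000, peak_width=300)
lam_local = dynamic_lambda(lam_bg, {1000: 20, 5000: 50, 10000: 80}, peak_width=300)
print(lam_bg, lam_local, poisson_upper_tail(25, lam_local))
```

Taking the maximum makes the test conservative: a region where the control is locally enriched must clear a higher bar than the genome-wide background alone would set.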

  38. Cis-tr-ome: integrated analysis pipeline and data collection Liu et al, Genome Biol 2011

  39. http://cistrome.org/ap/ • Works for hg19, mm9, ce6, and dm4 • ChIP-chip / seq peak calling • Checking correlation and overlap • Visualize signal across different elements • Annotate nearby genes • Motif analysis • Conservation analysis, lift over from one genome to another • Heatmap and clustering of many factors • Gene expression profiling analysis • Liu et al, Genome Biol 2011

  40. Estrogen Receptor • Carroll et al, Cell 2005 • Overactive in > 70% of breast cancers • Where does it go in the genome? • ChIP-chip on chr21/22, motif and expression analysis found its partner FoxA1 [Figure labels: ER, TF??]

  41. Estrogen Receptor (ER) Cistrome in Breast Cancer • Carroll et al, Nat Genet 2006 • ER may function far away (100-200 kb) from genes • Only 20% of ER sites have PhastCons > 0.2 • ER has different effects depending on its collaborators [Figure labels: ER, AP1, NRIP]

  42. Estrogen Receptor (ER) Cistrome in Breast Cancer • Carroll et al, Nat Genet 2006 • ER may function far away (100-200 kb) from genes • Only 20% of ER sites have PhastCons > 0.2 • ER has different effects depending on its collaborators [Figure labels: ER, NRIP, AP1]
