
Transcription factor binding motifs (part II)

Transcription factor binding motifs (part II). 10/22/07. Information from negative control. Motivation: combine information from TF binding and non-binding sequences to identify discriminative information. Methods: REDUCE (Bussemaker et al. 2001), Motif Regressor (Conlon et al. 2003).


Presentation Transcript


  1. Transcription factor binding motifs (part II) 10/22/07

  2. Information from negative control • Motivation: combine information from TF binding and non-binding sequences to identify discriminative information. • Methods: • REDUCE (Bussemaker et al. 2001) • Motif Regressor (Conlon et al. 2003)

  3. Motif Regressor Algorithm • Rank all genes by expression and obtain their upstream sequences • Use MDscan to find motifs from most induced and most repressed genes • Score each upstream sequence for matches to each MDscan reported motif • Perform simple linear regression between motif-matching score and gene expression to remove insignificant motifs • Perform stepwise regression on the significant motifs to find groups acting together to affect expression

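  4. Motif matching score • Extract upstream sequence $X_{mg}$ (e.g. 800 bp) from each gene $g$. • Define $S_{mg} = \log_2 \sum_{x \in X_{mg}} \frac{\Pr(x \mid \theta_m)}{\Pr(x \mid \theta_0)}$, which measures the overall enrichment of motif $m$. The sum runs over all sliding windows $x$ of the motif's width in the upstream sequence; $\theta_m$ is the motif model and $\theta_0$ the background model (notation following Conlon et al. 2003).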
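
A minimal Python sketch of a sliding-window motif-match score of this kind. It assumes the score is the log2 of the motif-versus-background likelihood ratios summed over all windows, that the motif is given as a position weight matrix, and that the background is uniform (a simplification; Motif Regressor itself uses a richer background model). The names `window_ratio`, `motif_match_score` and `toy_pwm` are hypothetical.

```python
import math

# Assumed uniform background; a real analysis would estimate it from the genome.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def window_ratio(window, pwm, background=BACKGROUND):
    """Pr(window | motif) / Pr(window | background) for one w-mer."""
    ratio = 1.0
    for base, column in zip(window, pwm):
        ratio *= column[base] / background[base]
    return ratio

def motif_match_score(upstream, pwm, background=BACKGROUND):
    """log2 of the likelihood ratios summed over all sliding windows."""
    w = len(pwm)
    total = sum(
        window_ratio(upstream[i:i + w], pwm, background)
        for i in range(len(upstream) - w + 1)
    )
    return math.log2(total)

# Toy 3-bp motif that strongly prefers "CGG" (illustrative numbers only).
toy_pwm = [
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]
with_site = motif_match_score("ATATCGGATAT", toy_pwm)
without_site = motif_match_score("ATATATATATA", toy_pwm)
```

The sequence containing a CGG site scores higher than the one without. Sequences are assumed to contain only A, C, G and T.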

  5. Motif Regressor Approach • Look at one expression experiment (genes ranked by expression log ratio) • Look for candidate motifs and refine them (MDscan) • Regress between upstream motif-match score and downstream expression

  6. Motif Regressor Linear Regression

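  7. Motif Regressor Linear Regression • Multiple regression model: expression explained as the sum of motifs’ effects: $Y_g = \alpha + \sum_m \beta_m S_{mg} + \epsilon_g$, where $Y_g$ is the expression of gene $g$, $\alpha$ the baseline expression, $\beta_m$ the regression coefficient of motif $m$, $S_{mg}$ the upstream motif-match score, and $\epsilon_g$ the error term.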
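
As an illustration of the multiple regression model, the sketch below fits baseline expression and one coefficient per motif by ordinary least squares, using the normal equations and plain Gaussian elimination. The function names and the toy data are hypothetical, and this is not the Motif Regressor code itself.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def fit_expression_model(scores, expression):
    """Least-squares fit of expression_g = alpha + sum_m beta_m * score_mg.

    scores[g] is the list of motif-match scores for gene g.
    Returns [alpha, beta_1, ..., beta_M].
    """
    design = [[1.0] + list(row) for row in scores]  # intercept column first
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, expression)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Noise-free toy data generated from alpha = 0.5, beta = (2.0, -1.0).
toy_scores = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [1.0, 2.0]]
toy_expression = [0.5 + 2.0 * s1 - 1.0 * s2 for s1, s2 in toy_scores]
alpha, beta1, beta2 = fit_expression_model(toy_scores, toy_expression)
```

On this toy input the fit recovers the generating coefficients. With real data the motif scores should not be collinear, otherwise the normal equations become singular.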

  8. Further motif selection by stepwise regression • Stepwise regression further selects significant motifs. • Step 1: Include only the intercept. • Step 2: Sequentially add the new motif that gives the largest reduction in error. • Step 3: Sequentially remove the motif whose removal gives the smallest increase in error. • Repeat Steps 2 and 3 until convergence.
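
The three steps can be sketched as follows. The slide does not state the error measure or the stopping rule, so this sketch assumes residual sum of squares as the error and a fixed threshold `min_gain` (a hypothetical parameter) for both adding and removing motifs; practical implementations usually rely on an F-test or an information criterion instead.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a x = b."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def rss(scores, expression, motifs):
    """Residual sum of squares of a least-squares fit on the chosen motif columns."""
    design = [[1.0] + [row[m] for m in motifs] for row in scores]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, expression)) for i in range(p)]
    coef = solve(xtx, xty)
    return sum(
        (y - sum(c * v for c, v in zip(coef, r))) ** 2
        for r, y in zip(design, expression)
    )

def stepwise_select(scores, expression, min_gain=1e-6):
    selected = []                                   # Step 1: intercept only
    current = rss(scores, expression, selected)
    changed = True
    while changed:                                  # repeat until nothing changes
        changed = False
        # Step 2: add the motif giving the largest reduction in error.
        candidates = [m for m in range(len(scores[0])) if m not in selected]
        if candidates:
            best = min(candidates, key=lambda m: rss(scores, expression, selected + [m]))
            best_rss = rss(scores, expression, selected + [best])
            if current - best_rss > min_gain:
                selected.append(best)
                current, changed = best_rss, True
        # Step 3: remove the motif whose removal increases error the least.
        if len(selected) > 1:
            def without(m):
                return [s for s in selected if s != m]
            worst = min(selected, key=lambda m: rss(scores, expression, without(m)))
            worst_rss = rss(scores, expression, without(worst))
            if worst_rss - current < min_gain:
                selected.remove(worst)
                current, changed = worst_rss, True
    return sorted(selected)

# Toy data: expression depends on motifs 0 and 2 only (noise-free).
toy_scores = [[1.0, 0.0, 2.0], [2.0, 1.0, 0.0], [0.0, 2.0, 1.0],
              [3.0, 1.0, 1.0], [1.0, 3.0, 0.0], [2.0, 0.0, 3.0]]
toy_expression = [1.0 + 2.0 * r[0] - 3.0 * r[2] for r in toy_scores]
chosen = stepwise_select(toy_scores, toy_expression)
```

Using one threshold for both directions guarantees termination, because every accepted addition or removal lowers the penalized quantity "error + `min_gain` × number of motifs".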

  9. Application • Yeast cells are grown under amino acid starvation. • Gene expression (~6000 genes) was measured at 30 minutes after amino acid starvation. • Motif Regressor was applied to identify sequence motifs.

  10. Comparative genomics • Evolutionary tree • Darwin’s principle of evolution • Cross-species sequence alignment • Conservation of genes • Conservation of regulatory sequence • Quantifying sequence conservation • Methods • MCS score (Kellis) • PhyloCon • Results • Yeast (Kellis) • Advantage: no requirement for prior functional information • Drawback: species-specific motifs may not be learned (Fraenkel)

  11. Non-uniform conservation rates • Genes are typically conserved • Intergenic regions are typically not conserved • Why?

  12. Motif finding by using multiple genomes • Basic assumption: functional sequences evolve more slowly than non-functional sequences, as they are subject to selection pressure. • Basic approach: • Identify conserved regions by sequence alignment algorithms • Restrict motif finding to conserved regions.

  13. Gal4 motif is highly conserved Motif: Gal4 – CGGNNNNNNNNNNNCCG

  14. Methods • Wasserman et al. 2000 • MCS (Kellis et al. 2003; Xie et al. 2005) • PhyloCon (Wang and Stormo 2003) • EMnEM (Moses et al. 2004) • OrthoMEME (Prakash et al. 2004) • PhyME (Sinha et al. 2004) • CompareProspector (Liu et al. 2004) • PhyloGibbs (Siddharthan et al. 2005) • Ortholog Sampler (Li and Wong 2005) • MultiModule (Zhou and Wong 2005)

  15. Methods • Wasserman et al. 2000 • MCS (Kellis et al. 2003; Xie et al. 2005) • PhyloCon (Wang and Stormo 2003) • EMnEM (Moses et al. 2004) • OrthoMEME (Prakash et al. 2004) • PhyME (Sinha et al. 2004) • CompareProspector (Liu et al. 2004) • PhyloGibbs (Siddharthan et al. 2005) • Ortholog Sampler (Li and Wong 2005) • MultiModule (Zhou and Wong 2005)

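  16. MCS Basic Idea • Select highly conserved motifs, i.e. those whose observed conservation rate far exceeds the expected rate: $p_{obs} \gg p_0$ (Xie et al. 2005). [Figure: frequency versus conservation rate, with $p_0$ and $p_{obs}$ marked]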

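  17. MCS • Definition of MCS: $\mathrm{MCS} = \dfrac{p_{obs} - p_0}{\sqrt{p_0 (1 - p_0) / N}}$, where $p_{obs}$ is the observed frequency, $N$ the total number of occurrences, and $p_0$ the expected frequency. • $p_0$ is estimated by random sampling. • Choose cutoff at MCS = 6 (Xie et al. 2005). [Figure: frequency versus conservation rate, with $p_0$ and $p_{obs}$ marked]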
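
A small worked example of a conservation score of this kind. It assumes the MCS is a binomial z-score, that is, the number of standard deviations by which the observed conservation rate exceeds the expected rate p0. That reading is inferred from the slide's labels (observed frequency, total number of occurrences, expected frequency), and the counts below are made up for illustration.

```python
import math

def motif_conservation_score(n_conserved, n_total, p0):
    """Binomial z-score of the observed conservation rate against p0."""
    p_obs = n_conserved / n_total
    return (p_obs - p0) / math.sqrt(p0 * (1.0 - p0) / n_total)

# Illustrative counts only: 100 of 400 occurrences conserved, expected rate 10%.
score = motif_conservation_score(100, 400, 0.10)
passes_cutoff = score > 6  # cutoff quoted on the slide
```

Here the observed rate is 0.25 against an expected 0.10 with a standard deviation of 0.015, giving a score of 10, which clears the cutoff of 6.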

  18. Application to human regulatory motifs

  19. Results

  20. Tissue specificity of detected motifs

  21. PhyloCon Basic Idea: (Wang and Stormo 2003) • Both sequence conservation and gene co-regulation information are used for motif finding. • Orthologous regions are viewed as sequence profiles. • Align sequence profiles instead of sequences. [Figure: orthologous sequences from species 1 to 4 combined into a profile]

  22. PhyloCon

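  23. Profile Comparison • Compare two columns first. • $f_b = \{f_A, f_C, f_G, f_T\}$: base frequencies in a column of the profile • $p_b = \{p_A, p_C, p_G, p_T\}$: background base frequencies • $n_b = \{n_A, n_C, n_G, n_T\}$: observed counts at the specified position • Likelihood ratio: $\prod_b (f_b / p_b)^{n_b}$ • Log-likelihood ratio: $\mathrm{LLR} = \sum_b n_b \ln (f_b / p_b)$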

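  24. Profile Comparison • Compare two columns first. • ALLR (average log-likelihood ratio) measures the similarity between two columns $i$ and $j$: $\mathrm{ALLR} = \dfrac{\sum_b n_{bj} \ln (f_{bi} / p_b) + \sum_b n_{bi} \ln (f_{bj} / p_b)}{\sum_b (n_{bi} + n_{bj})}$, where $f_{bi}$, $f_{bj}$ are the column frequencies, $p_b$ is the background, and the denominator is the total counts. • Sum the ALLR over all positions to get a score comparing two profiles.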
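
A minimal sketch of the column comparison, assuming the ALLR takes the form given by Wang and Stormo (2003): each column's counts are scored against the other column's frequencies relative to background, and the sum is divided by the total counts. The pseudocount used to avoid taking the log of zero is an assumption of this sketch, as are the function names and toy columns.

```python
import math

BASES = "ACGT"

def column_frequencies(counts, pseudocount=0.5):
    """Base frequencies of one profile column, smoothed with a pseudocount."""
    total = sum(counts[b] for b in BASES) + pseudocount * len(BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}

def allr(counts_i, counts_j, background):
    """Average log-likelihood ratio between two profile columns."""
    f_i = column_frequencies(counts_i)
    f_j = column_frequencies(counts_j)
    numerator = sum(
        counts_j[b] * math.log(f_i[b] / background[b])
        + counts_i[b] * math.log(f_j[b] / background[b])
        for b in BASES
    )
    total_counts = sum(counts_i[b] + counts_j[b] for b in BASES)
    return numerator / total_counts

def profile_score(profile_a, profile_b, background):
    """Sum of ALLR over aligned positions of two equal-length profiles."""
    return sum(allr(ci, cj, background) for ci, cj in zip(profile_a, profile_b))

uniform = {b: 0.25 for b in BASES}
col_a = {"A": 9, "C": 0, "G": 1, "T": 0}
col_b = {"A": 8, "C": 1, "G": 0, "T": 1}    # similar to col_a
col_c = {"A": 0, "C": 0, "G": 0, "T": 10}   # very different from col_a
similar = allr(col_a, col_b, uniform)
dissimilar = allr(col_a, col_c, uniform)
```

Two A-rich columns give a positive ALLR, while an A-rich column against a T-only column gives a negative one, which is the behavior that lets high-scoring profile pairs be merged.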

  25. Profile merging • Iteratively merge non-orthologous groups that have high ALLR scores.

  26. Sampling motifs on phylogenetic trees • Motivation: the alignment-based method does not work well if the species are distant. • Basic idea • Avoid aligning multiple species to gather orthologous gene information. • Directly model the evolution of the genomic sequences. • Assume that motifs evolve more slowly than background sequences.

  27. An evolution model

  28. Evolution model Probability of a nucleotide change

  29. Main Algorithm • Step 1: Building an evolution model. • Motif evolution is modeled by decreasing branch length by a fixed rate, say 50%. • Step 2: Infer model parameters by using a Gibbs sampler.

  30. Limitation of comparative genomics approach • Species-specific motifs cannot be learned from this approach.

  31. Divergence of TF binding Borneman et al. 2007

  32. Divergence of TF binding • Divergent binding can be caused by: • divergence of TF motifs (e.g. Ste12) • or some unknown mechanism (e.g. Tec1) Borneman et al. 2007

  33. Other directions • Combine multiple motif-finding algorithms (e.g. Harbison et al. 2004, Jensen and Liu 2005). • Directly identify TF binding sites through experiments (ChIP-chip), then apply motif-finding algorithms (e.g. MDscan) to the experimental binding data.

  34. Challenge of Specificity • A 7-mer is expected to occur every 16,384 (4^7) base pairs by chance • In human, this means 3 × 10^9 / 16,384 ≈ 180,000 sites in total • Total number of genes ~ 25,000 • Most predicted binding sites are false positives! • Need other restrictive information to reduce false positives.
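
The arithmetic on this slide can be checked with a few lines of Python. The sketch assumes independent, equally likely bases and counts exact matches to one fixed 7-mer on a single strand, which is the simplification the slide's numbers imply; the function name is hypothetical.

```python
def expected_chance_sites(motif_length, genome_size):
    """Expected number of exact matches to one fixed k-mer on one strand,
    assuming independent, equally likely bases."""
    spacing = 4 ** motif_length  # one match expected every 4^k bp
    return genome_size / spacing

human_genome_bp = 3_000_000_000  # about 3 x 10^9 bp, as on the slide
sites = expected_chance_sites(7, human_genome_bp)
sites_per_gene = sites / 25_000
```

This gives roughly 183,000 chance sites, or about 7 per gene, far more than the handful of functional sites a typical TF would be expected to have near any one gene.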

  35. Some Biological Notes • TF binding does not mean it is functional. • Some TFs always bind to DNA, but they are functional only if they are phosphorylated. • Motif sites contain a large number of false positives. • Motifs are short DNA elements (~10 bp). Higher eukaryotes have large genome size, and these short elements may occur frequently by chance. • Epigenetic factors also play an important role in regulation of TF binding. • Chromatin structure, histone modifications, DNA methylation, etc.

  36. Reading list • Conlon et al. 2003 • Proposed Motif Regressor. Filters out motifs that are not associated with gene expression changes. • Xie et al. 2005 • MCS. Uses a comparative approach to identify human regulatory motifs. Highly biological. • Wang and Stormo 2003 • PhyloCon. An elegant “multi-gene, multi-species” approach for motif finding.

  37. Acknowledgements • X.S.Liu
