
Combining the results of different motif discovery programs for de novo prediction of TFBS: A critical approach

Speaker: Thomas Engleitner. Question: Can we trust the results of tools for de novo motif (TFBS) detection? If not, how can we improve the results?


Presentation Transcript


  1. Combining the results of different motif discovery programs for de novo prediction of TFBS: A critical approach • Speaker: Thomas Engleitner

  2. Question: Can we trust the results of tools for de novo motif (TFBS) detection? If not, how can we improve the results?

  3. Introduction • Why de novo motif discovery? • Finding TFBS in the lab costs a lot of time and money • Prediction tools not only identify TFBS in the input sequences but also provide a PSSM (position-specific scoring matrix) that can be used to search genome-wide for binding sites of a given TF
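To make the PSSM point concrete, here is a minimal sketch of how a matrix derived from predicted sites could be used to scan a longer sequence. It assumes a log-odds matrix with a pseudocount and a uniform background; the example sites, the test sequence, and the function names `build_pssm` and `scan` are illustrative and do not come from the talk.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one {base: score} dict per column) from aligned sites."""
    width = len(sites[0])
    pssm = []
    for col in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm

def scan(sequence, pssm):
    """Score every window of the sequence; return (best_score, start) of the best one."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col[base] for col, base in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Hypothetical aligned sites resembling a CRE-like motif (not data from the talk).
example_sites = ["TGACGTCA", "TGACGTAA", "TGACGCAA", "TGACGTCA"]
pssm = build_pssm(example_sites)
best_score, best_start = scan("AAATGACGTCAGG", pssm)
```

The sketch only handles the characters A, C, G, and T on one strand; a real genome-wide scan would also need to handle ambiguous bases, the reverse strand, and a score threshold.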

  4. Introduction • Many different computational approaches exist for identifying motifs in biological sequences • HMMs, hexamer counts, EM algorithms • Correct prediction of eukaryotic TFBS is still a hard problem in computational biology

  5. Introduction • The detection rate of every tool on its own is poor • Tompa et al. suggest combining different tools to improve the results of motif discovery • Hypothesis: TFBS reported by more than one tool are more reliable • The best-ranked tools (according to Tompa et al.) are MEME, MotifSampler, and Weeder • Reference: Tompa et al., Assessing computational tools for the discovery of transcription factor binding sites, Nature Biotechnology 23(1), 137-144

  6. Preliminary Considerations

  7. Sequence data constraints in this study • Validation / knowledge: the motif has to be validated experimentally • Appearance: the motif must appear one or more times in every sequence of the dataset • Motif length: the motif must be sufficiently long

  8. Sequence data constraints in this study • One motif that satisfies our constraints is the cAMP response element (CRE)

  9. Sequence data constraints in this study • Test dataset: • 7 human DNA sequences, each containing the CRE • For each sequence, the binding position of CREB and the sequence of the binding site are known

  10. Consensus-based approach • Next step: use the dataset as input for MEME, MotifSampler, and Weeder • Motifs that are reported by all tools and show a user-defined overlap were taken and compared to the known CRE

  11. For those hits, it is checked whether they overlap with the known binding site of CREB
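The consensus-based approach on the last two slides can be sketched as follows. This is one possible reading, not the procedure actually used in the talk: a hit from the first tool is kept if every other tool reports a hit on the same sequence that shares at least `min_overlap` bases with it. The `Hit` type, the coordinates, and the function names are hypothetical.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    seq_id: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

def overlap_len(a, b):
    """Number of bases shared by two hits (0 if they lie on different sequences)."""
    if a.seq_id != b.seq_id:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))

def consensus_hits(hits_by_tool, min_overlap):
    """Keep hits of the first tool that every other tool overlaps by >= min_overlap bases."""
    tools = list(hits_by_tool)
    first, others = tools[0], tools[1:]
    kept = []
    for hit in hits_by_tool[first]:
        if all(
            any(overlap_len(hit, other) >= min_overlap for other in hits_by_tool[tool])
            for tool in others
        ):
            kept.append(hit)
    return kept

# Hypothetical tool output; the coordinates are made up for illustration.
hits_by_tool = {
    "meme": [Hit("s1", 10, 18), Hit("s2", 40, 48)],
    "motifsampler": [Hit("s1", 12, 20)],
    "weeder": [Hit("s1", 14, 22), Hit("s2", 41, 49)],
}
shared = consensus_hits(hits_by_tool, min_overlap=4)

# Check the surviving hits against a (hypothetical) known binding site.
known_site = Hit("s1", 30, 38)
on_target = [h for h in shared if overlap_len(h, known_site) > 0]
```

In this toy input the three tools agree on one region of `s1`, yet that region does not touch the known site, so `on_target` is empty. Agreement between tools and agreement with the annotation are separate checks.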

  12. First result: None of the overlapping hits overlaps with the known CRE • Possible solution: parameter tuning

  13. All programs have a wide variety of parameters that can be changed by the user • Idea: tune the parameters of each program such that the TP (true positive) rate is maximized • But what is a TP hit for each program alone?

  14. TP/ FP Example
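The transcript keeps only the title of this slide, so the definition used in the talk is not recoverable. A minimal sketch under one plausible assumption: a predicted site counts as a TP if it shares at least `min_overlap` bases with the known binding site on the same sequence, and as an FP otherwise; the TP rate is then taken as the fraction of reported hits that are TPs. All names and coordinates are hypothetical.

```python
def classify_hits(hits, known_sites, min_overlap=1):
    """Split hits into (tp, fp).

    hits: list of (seq_id, start, end), end exclusive.
    known_sites: dict mapping seq_id -> (start, end) of the annotated site.
    """
    tp, fp = [], []
    for seq_id, start, end in hits:
        known = known_sites.get(seq_id)
        shared = 0
        if known is not None:
            shared = max(0, min(end, known[1]) - max(start, known[0]))
        (tp if shared >= min_overlap else fp).append((seq_id, start, end))
    return tp, fp

def tp_rate(hits, known_sites, min_overlap=1):
    """Fraction of reported hits that overlap a known site (assumed definition)."""
    tp, fp = classify_hits(hits, known_sites, min_overlap)
    total = len(tp) + len(fp)
    return len(tp) / total if total else 0.0

# Made-up example: one of three reported hits overlaps an annotated site.
example_hits = [("seqA", 100, 108), ("seqA", 300, 308), ("seqB", 50, 58)]
example_known = {"seqA": (102, 110), "seqB": (200, 208)}
tp, fp = classify_hits(example_hits, example_known)
rate = tp_rate(example_hits, example_known)
```

Raising `min_overlap` makes the criterion stricter; with `min_overlap=8` the first hit above would no longer count as a TP because it shares only 6 bases with the annotated site.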

  15. Results • MEME • Tested parameters: number of motifs, motif width

  16. Results • MotifSampler • Tested parameters: prior probability, motif width, number of motifs

  17. Results • Weeder • Tested parameters: motif width, number of mutations

  18. Results • We have seen that the initial parameter settings have a great influence on the results • The runs that showed the best TP rate were selected, and their TP hits were assigned to the corresponding sequences
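Selecting the best run per program amounts to a sweep over a parameter grid. The sketch below shows the general shape of such a sweep; it does not call MEME, MotifSampler, or Weeder. `fake_tool` is a stand-in that merely returns made-up windows, and the parameter names, the grid values, and the known site are assumptions chosen for illustration.

```python
from itertools import product

def sweep(run_tool, grid, score):
    """Run every parameter combination; return (best_score, list of best settings)."""
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score(run_tool(**params)), params))
    best_score = max(s for s, _ in results)
    return best_score, [p for s, p in results if s == best_score]

def fake_tool(motif_width, n_motifs):
    """Stand-in for a real program run: reports n_motifs evenly spaced windows."""
    return [(20 * i, 20 * i + motif_width) for i in range(n_motifs)]

KNOWN_SITE = (24, 32)  # hypothetical annotated site, end exclusive

def score(hits):
    """Fraction of hits that overlap the known site (assumed TP rate)."""
    if not hits:
        return 0.0
    tp = sum(1 for s, e in hits if min(e, KNOWN_SITE[1]) > max(s, KNOWN_SITE[0]))
    return tp / len(hits)

grid = {"motif_width": [6, 8, 12], "n_motifs": [1, 2, 3]}
best_score, best_params = sweep(fake_tool, grid, score)
```

Note that the score depends on the known site, which is exactly the information a user of a de novo tool does not have; the sweep is only possible here because the test dataset is annotated.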

  19. Results • MotifSampler: sequence X65568 • Weeder: sequence X00274 • MEME: sequence X65568 • The second- and third-best hits also do not report the same sequences • Conclusion: even with tuned parameters for each program, the result is even worse!

  20. Discussion • Combining the output of three different programs does not lead to better motif prediction • To address this, the parameters of each program were varied systematically • We found that the parameter choice has a great influence on the overall result

  21. Discussion • Even if the run is done with the best parameter settings, the CRE motif is identified in only one sequence of the dataset by 2 programs • Remember: normally the user does not know much about the motif length, its distribution within the dataset, etc. • De novo prediction of TFBS without any prior knowledge is nearly impossible

  22. Discussion • Even when masked sequences were used, the result was no better (result not shown) • The same is true for another dataset of sequences containing the hormone response element (result not shown)

  23. Take-home message: • Results of tools for de novo prediction of TFBS are very sensitive to the initial parameters • Do not trust the motifs that are reported

  24. Thank you for your attention. Any questions?
