
HMM Sampling and Applications to Gene Finding and Alignment

European Conference on Computational Biology 2003. Simon Cawley* and Lior Pachter+, and thanks to Eli Rusman. * Affymetrix; + UC Berkeley Mathematics Dept.


Presentation Transcript


  1. HMM Sampling and Applications to Gene Finding and Alignment. European Conference on Computational Biology 2003. Simon Cawley* and Lior Pachter+, and thanks to Eli Rusman. * Affymetrix; + UC Berkeley Mathematics Dept.

  2. Conservation of alternative splicing between human and mouse
     • Modrek and Lee: 40-60% of human genes have alternative splice forms. Nature Genetics 2002.
     • Nurtdinov et al.: 75% of human alternative splice forms are conserved in mouse. Human Molecular Genetics 2003.
     Can we develop ab initio methods for detecting conserved alternative splice sites?

  3. Sequence Alignment [figure letters: A C A T T A G A A A G A T T A C C A C A]

  4. Finding the optimal alignment: max [figure letters: A C A T T A G A A A G A T T A C C A C A]

  5. Sampling to find alternative alignments
     $a_{i,j} = w\,a_{i-1,j} + w\,a_{i,j-1} + s_{i,j}\,a_{i-1,j-1}$
     • $a_{i,j}$: alignment forward variables for positions [1,i] and [1,j] in each sequence
     • $s_{i,j}$: match/mismatch probabilities for positions i,j in each sequence
     • $w$: gap probabilities
     [figure letters: A C A T T A G A A A G A T T A C C A C A]
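The forward recurrence on this slide can be turned into a sampler by filling the forward table and then tracing back from the last cell, choosing each predecessor with probability proportional to its term in the recurrence. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the boundary condition a[0][0] = 1, the two-valued match/mismatch score, the numeric weights, and the example sequences are all assumptions made for illustration, and the weights are not normalized probabilities.

```python
import random


def score(x, y, i, j, match, mismatch):
    """s_{i,j}: weight for aligning x[i-1] with y[j-1] (1-based positions)."""
    return match if x[i - 1] == y[j - 1] else mismatch


def forward_matrix(x, y, w, match, mismatch):
    """Fill a[i][j] = w*a[i-1][j] + w*a[i][j-1] + s_{i,j}*a[i-1][j-1].

    Assumed boundary condition: a[0][0] = 1, missing neighbours count as 0.
    """
    T, U = len(x), len(y)
    a = [[0.0] * (U + 1) for _ in range(T + 1)]
    a[0][0] = 1.0
    for i in range(T + 1):
        for j in range(U + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0:
                total += w * a[i - 1][j]          # gap in y
            if j > 0:
                total += w * a[i][j - 1]          # gap in x
            if i > 0 and j > 0:
                total += score(x, y, i, j, match, mismatch) * a[i - 1][j - 1]
            a[i][j] = total
    return a


def sample_alignment(x, y, a, w, match, mismatch, rng):
    """Trace back from (T, U), picking each step in proportion to its term."""
    i, j = len(x), len(y)
    columns = []
    while i > 0 or j > 0:
        moves, weights = [], []
        if i > 0:
            moves.append((i - 1, j, (x[i - 1], "-")))
            weights.append(w * a[i - 1][j])
        if j > 0:
            moves.append((i, j - 1, ("-", y[j - 1])))
            weights.append(w * a[i][j - 1])
        if i > 0 and j > 0:
            moves.append((i - 1, j - 1, (x[i - 1], y[j - 1])))
            weights.append(score(x, y, i, j, match, mismatch) * a[i - 1][j - 1])
        i, j, column = rng.choices(moves, weights=weights, k=1)[0]
        columns.append(column)
    columns.reverse()
    return "".join(c[0] for c in columns), "".join(c[1] for c in columns)


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = "ACATTAG", "GATTACA"  # illustrative sequences
    a = forward_matrix(x, y, w=0.1, match=0.5, mismatch=0.05)
    for _ in range(3):
        top, bottom = sample_alignment(x, y, a, 0.1, 0.5, 0.05, rng)
        print(top)
        print(bottom)
        print()
```

Because each traceback step is weighted by its share of the forward value, an alignment is drawn with probability proportional to the product of its step weights, so repeated calls return different alternative alignments rather than only the single best one.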

  6. Linear Space Sampling
     Sequences of length T, U. To obtain k samples:
     • Time complexity: O(TU + k(T+U))
     • Memory requirements: O(T+U)
     Hirschberg's divide and conquer algorithm:
     • Time complexity: O(TU)
     • Memory requirements: O(T+U)
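One ingredient of the linear memory bound is easy to illustrate: the forward value for the full pair of sequences depends only on the previous row of the table, so it can be computed while keeping two rows at a time. The sketch below shows only that part. It does not implement the linear-space sampling procedure itself, which the slide does not spell out. The function name, the boundary condition a[0][0] = 1, and the match/mismatch scoring are assumptions made for illustration.

```python
def forward_total(x, y, w, match, mismatch):
    """Return a[T][U] for the recurrence
    a[i][j] = w*a[i-1][j] + w*a[i][j-1] + s_{i,j}*a[i-1][j-1]
    while storing only two rows of the table.
    """
    # The recurrence is symmetric in the two sequences (same gap weight w,
    # symmetric score), so put the shorter sequence along the row.
    if len(y) > len(x):
        x, y = y, x
    # Row 0: only gaps are possible, so a[0][j] = w**j.
    prev = [w ** j for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [w * prev[0]] + [0.0] * len(y)
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            curr[j] = w * prev[j] + w * curr[j - 1] + s * prev[j - 1]
        prev = curr
    return prev[len(y)]


if __name__ == "__main__":
    print(forward_total("ACATTAG", "GATTACA", w=0.1, match=0.5, mismatch=0.05))
```

The memory used here is proportional to the shorter sequence length. Drawing samples in linear space additionally requires recovering earlier rows during traceback, for example by divide and conquer in the style of Hirschberg's algorithm, which this sketch leaves out.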

  7. Alternative Splicing in Mammalian Genomes [diagram: pre-mRNA, splicing, translation, Protein I; pre-mRNA, alternative splicing, translation, Protein II]

  8. Cross-species simultaneous gene finding and alignment. M. Alexandersson, S. Cawley, L. Pachter. SLAM: Cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Research 13 (2003), pp. 496-502.

  9. Modeling gene features [diagram labels: 5', Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, 3'; three CNS regions; tracks for [human] and [mouse]]

  10. The SLAM hidden Markov model

  11. SLAM components
      • Splice site detector
        • VLMM
      • Intron and intergenic regions
        • 2nd order Markov chain
        • independent geometric lengths
      • Coding sequence
        • PHMM on protein level
        • generalized length distribution
      • Conserved non-coding sequence
        • PHMM on DNA level

  12. SLAM input and output
      • Input:
        • Pair of homologous sequences.
      • Output:
        • CDS and CNS predictions in both sequences.
        • Protein predictions.
        • Protein and CNS alignment.

  13. http://bio.math.berkeley.edu/slam/

  14. Input:

  15. Output:

  16. Methodology for identifying alternative splice sites
      • Compiled SLAM gene predictions for the human, mouse and rat genomes.
      • Identified a set of 3400 human/mouse/rat gene triples with consistent predictions from hs/mm and hs/rn analyses.
      • For each triple, sampled sub-optimal parses from hs/mm and hs/rn runs.
      • Collected alternative exons (non-Viterbi exons) that appeared in both the hs/mm and hs/rn runs.
      • Examined overlap with RefSeq genes, mRNAs and ESTs.
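The collection step in this methodology amounts to a set intersection: gather the exons that appear in sampled parses but not in the Viterbi parse for each species pair, then keep those found in both runs. The sketch below is one way to express that, under stated assumptions. Representing an exon as a (start, end) tuple in human coordinates, matching exons by exact coordinates, and all function names and toy numbers are hypothetical, not taken from the slides.

```python
def alternative_exons(sampled_parses, viterbi_parse):
    """Exons seen in at least one sampled parse but absent from the Viterbi parse.

    Each parse is an iterable of exons; an exon is assumed to be a
    (start, end) tuple in human coordinates.
    """
    viterbi = set(viterbi_parse)
    found = set()
    for parse in sampled_parses:
        found.update(exon for exon in parse if exon not in viterbi)
    return found


def conserved_alternative_exons(mm_samples, mm_viterbi, rn_samples, rn_viterbi):
    """Non-Viterbi exons that appear in both the hs/mm and the hs/rn runs."""
    return (alternative_exons(mm_samples, mm_viterbi)
            & alternative_exons(rn_samples, rn_viterbi))


if __name__ == "__main__":
    # Toy data for one gene triple (made-up coordinates).
    mm_viterbi = [(100, 200), (300, 400)]
    mm_samples = [
        [(100, 200), (300, 400)],
        [(100, 200), (250, 280), (300, 400)],
        [(100, 200), (300, 420)],
    ]
    rn_viterbi = [(100, 200), (300, 400)]
    rn_samples = [
        [(100, 200), (250, 280), (300, 400)],
        [(100, 200), (500, 600)],
    ]
    print(conserved_alternative_exons(mm_samples, mm_viterbi,
                                      rn_samples, rn_viterbi))
```

Exact coordinate matching is the strictest choice. A looser criterion, such as matching on shared splice sites, would change which exons count as appearing in both runs.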

  17. SLAM whole genome predictions
      • Built a whole genome homology map (Colin Dewey): http://baboon.math.berkeley.edu/~cdewey/homologyMaps/
      • Pre-aligned the homologous blocks to reduce the SLAM search space (Nicolas Bray using AVID): http://baboon.math.berkeley.edu/mavid/ http://hanuman.math.berkeley.edu/kbrowser/
      • Ran SLAM on the resulting blocks: http://bio.math.berkeley.edu/slam/mouse/ http://bio.math.berkeley.edu/slam/rat/

  18. [human] [mouse] [rat]

  19. Comparing predicted alternative exons to ESTs and mRNAs

  20. Conclusions
      • Sampling is memory efficient, fast, and should be used routinely for alignment applications.
      • Conserved alternative splice forms can be detected ab initio.
      • The extent of alternative splicing conservation is currently unclear. Sampling provides an alternative approach for investigating this problem, one that is not sensitive to biases in EST data.
      • Problem: design effective and scalable validation strategies for alternative splice sites.
