BIOINFORMATICS

BIOINFORMATICS. Gene Finding With A Hidden Markov Model Of Genomic Structure and Evolution. Jakob Skou Pedersen and Jotun Hein. Deepak Verghese, CS 6890.

Presentation Transcript


  1. BIOINFORMATICS: Gene Finding With A Hidden Markov Model Of Genomic Structure and Evolution. Jakob Skou Pedersen and Jotun Hein. Deepak Verghese, CS 6890

  2. A number of models have incorporated evolutionary information: • GPHMM • Conserved exon method • The two-step GLASS/ROSETTA approach • TWINSCAN, which extends GENSCAN • etc.

  3. Limitations of these models • They do not exploit all the information in the evolutionary pattern • They are not easily extended to multiple genome sequences

  4. EVOLUTIONARY HIDDEN MARKOV MODEL (EHMM) • A probabilistic model of both genome structure and evolution • Composed of: • a hidden Markov model (HMM) • a phylogenetic tree

  5. ADVANTAGES • Can handle any number of sequences in an alignment • Can have properties of higher-order HMMs • Can handle variability in the sequences along the alignment • State-of-the-art evolutionary models can be incorporated later • Evolutionary events in different genomes are not treated independently

  6. MODEL • SCOPE • The aim is not to compete with existing gene-finding methods on performance, but to illustrate the power of this approach • The model relies on a pre-produced alignment

  7. MARKOV CHAINS • A set of states • The transitions from one state to all other states, including itself, are governed by a probability distribution • First-order Markov chain: the probabilities depend solely on the current state • n-th order Markov chain: the probabilities depend on the n previous states
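The sketch below shows a first-order Markov chain in Python. It assumes a hypothetical two-state chain: "I" stands for intergenic and "E" for exon, and every probability is made up for illustration. The code scores a state sequence with the chain rule and samples a new sequence. An n-th order chain would instead index `TRANSITIONS` by the tuple of the last n states.

```python
import random

# Hypothetical two-state first-order Markov chain; states and numbers are illustrative.
INITIAL = {"I": 0.9, "E": 0.1}
TRANSITIONS = {
    "I": {"I": 0.95, "E": 0.05},  # each row is a probability distribution
    "E": {"I": 0.10, "E": 0.90},  # over all states, including the state itself
}


def sequence_probability(states):
    """P(s_1, ..., s_n) = P(s_1) * prod_i P(s_i | s_{i-1}) for a first-order chain."""
    prob = INITIAL[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= TRANSITIONS[prev][cur]  # depends only on the current state
    return prob


def sample_chain(length, rng):
    """Draw a state sequence by repeatedly sampling the next state from the current row."""
    state = rng.choices(list(INITIAL), weights=list(INITIAL.values()))[0]
    chain = [state]
    for _ in range(length - 1):
        row = TRANSITIONS[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
        chain.append(state)
    return chain


print(sequence_probability(["I", "I", "E"]))  # 0.9 * 0.95 * 0.05
print("".join(sample_chain(20, random.Random(0))))
```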

  8. HIDDEN MARKOV MODEL: 5 components • A set of states • A matrix of transition probabilities (A) • An alphabet of symbols (C) • A set of emission distributions (e) • An initial state distribution (B)
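The five components map directly onto a small data structure. This sketch is illustrative only: the `HMM` class, the state names "I" and "E", and all probabilities are hypothetical. `sample` returns both the state path and the emitted symbols. In practice only the symbols are observed, which is why the states are called hidden.

```python
import random
from dataclasses import dataclass


@dataclass
class HMM:
    states: list    # the set of states
    alphabet: list  # C: symbols that can be emitted
    A: dict         # transition probabilities, A[k][j]
    e: dict         # emission distributions, e[k][symbol]
    B: dict         # initial state distribution, B[k]

    def validate(self, tol=1e-9):
        """True if B, every row of A and every e[k] sums to 1."""
        dists = [self.B]
        dists += [self.A[k] for k in self.states]
        dists += [self.e[k] for k in self.states]
        return all(abs(sum(d.values()) - 1.0) < tol for d in dists)

    def sample(self, length, rng):
        """Return (hidden state path, emitted symbols); only the symbols are observed."""
        def draw(dist):
            return rng.choices(list(dist), weights=list(dist.values()))[0]

        path = [draw(self.B)]
        for _ in range(length - 1):
            path.append(draw(self.A[path[-1]]))
        symbols = [draw(self.e[k]) for k in path]
        return path, symbols


# Hypothetical toy model: an AT-rich state "I" and a GC-rich state "E".
toy = HMM(
    states=["I", "E"],
    alphabet=list("ACGT"),
    A={"I": {"I": 0.9, "E": 0.1}, "E": {"I": 0.2, "E": 0.8}},
    e={"I": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
       "E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}},
    B={"I": 0.5, "E": 0.5},
)
path, symbols = toy.sample(15, random.Random(0))
print("".join(symbols))  # observed
print("".join(path))     # hidden
```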

  9. [Figure: example of a hidden Markov model] • There is NO 1:1 correspondence between states and symbols • Why the name "hidden"? The sequence of states that produced the symbols cannot be read off from the symbols themselves.

  10. Components • A state k emits symbols (observables) from C • PROBABILISTIC MODEL: emission distribution e, initial state distribution B, transition probabilities A

  11. Path π • Different paths are possible for the same sequence
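Different paths can produce the same sequence. To show this, the sketch below uses a hypothetical toy model with the slides' notation A, B and e. It computes the joint probability $P(x, \pi)$ for each of the $2^3$ state paths that could emit "GCA".

```python
from itertools import product

# Hypothetical toy HMM in the slides' notation (numbers are illustrative only).
STATES = ["I", "E"]
B = {"I": 0.5, "E": 0.5}                                    # initial state distribution
A = {"I": {"I": 0.9, "E": 0.1}, "E": {"I": 0.2, "E": 0.8}}  # transition probabilities
e = {"I": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},         # emission distributions
     "E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}}


def joint_probability(x, path):
    """P(x, pi) = B[pi_1] e[pi_1](x_1) * prod_{i>1} A[pi_{i-1}][pi_i] e[pi_i](x_i)."""
    prob = B[path[0]] * e[path[0]][x[0]]
    for i in range(1, len(x)):
        prob *= A[path[i - 1]][path[i]] * e[path[i]][x[i]]
    return prob


x = "GCA"
# All 2**3 state paths can emit "GCA", each with its own probability.
scores = {"".join(p): joint_probability(x, p) for p in product(STATES, repeat=len(x))}
for path, prob in sorted(scores.items(), key=lambda item: -item[1]):
    print(path, round(prob, 5))
```

Every path has non-zero probability. The most probable one ("EEE" for these numbers) is the path Viterbi decoding would return. Summing all eight values gives the likelihood $P(x)$.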

  12. In an EHMM, the emission distribution e of each state k is specified by • an evolutionary model $E_k$ • a phylogenetic tree T

  13. PHYLOGENETIC TREES

  14. In phylogenetic trees • Leaves represent present-day species • Character states of inner nodes are missing data • Interior nodes represent hypothesized ancestors • The branch lengths of a tree represent the evolutionary difference • Motivation: the problem of explaining the evolutionary history of today's species

  15. Evolution is often modeled by continuous-time Markov chains • Here, evolution along the branches of the phylogenetic tree is modeled by $E_k$ • For a branch of length $t$, the transition probabilities are $P_k(t) = \exp(t Q_k)$, where $Q_k$ is the rate matrix of $E_k$ • Increasing the number of sequences increases the amount of evolutionary information • THE ALIGNMENT COLUMN CORRESPONDS TO THE STATE OF EVOLUTION AT THE LEAVES OF THE PHYLOGENETIC TREE
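The sketch below computes $P_k(t) = \exp(t Q_k)$ in pure Python. It assumes, hypothetically, that $E_k$ is the Jukes-Cantor (JC69) nucleotide model with rate α between any two nucleotides; the paper's codon and nucleotide models are richer. The matrix exponential is computed by scaling and squaring with a truncated Taylor series. Production code would normally call a library routine such as `scipy.linalg.expm` instead.

```python
import math


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(M, terms=20, squarings=10):
    """exp(M) via scaling and squaring: exp(M) = exp(M / 2**s) ** (2**s)."""
    n = len(M)
    scaled = [[v / 2 ** squarings for v in row] for row in M]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled**k / k!, the next Taylor series term
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[r + s for r, s in zip(r_row, s_row)] for r_row, s_row in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def jukes_cantor_Q(alpha):
    """Hypothetical E_k: JC69 rate matrix, rate alpha between any two distinct nucleotides."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def transition_matrix(Q, t):
    """P(t) = exp(t Q) for a branch of length t."""
    return mat_exp([[t * q for q in row] for row in Q])


P = transition_matrix(jukes_cantor_Q(alpha=0.1), t=0.5)
print([round(p, 4) for p in P[0]])
```

For JC69 the result can be checked against the closed form $P_{ii}(t) = \tfrac14 + \tfrac34 e^{-4\alpha t}$ and $P_{ij}(t) = \tfrac14 - \tfrac14 e^{-4\alpha t}$. Every row of $P(t)$ sums to 1 because every row of $Q$ sums to 0.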

  16. THE PROBABILITY OF GENERATING AN ALIGNMENT COLUMN IN STATE k EQUALS THE PROBABILITY OF OBSERVING THE GIVEN CHARACTER PATTERN ON THE LEAVES OF T, GIVEN $E_k$ • [Figure: phylogenetic tree of the entries of the 3 alignment columns]
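A standard way to compute this probability is Felsenstein's pruning algorithm, which sums over the unobserved characters at the inner nodes. The sketch assumes a hypothetical three-species tree with made-up branch lengths and a JC69 model with a uniform equilibrium distribution. Gaps are treated as missing data, as the present model does.

```python
import math

NUCS = "ACGT"
# Hypothetical tree ((human:0.1, mouse:0.1):0.05, rat:0.15); inner node = list of (child, branch length).
TREE = [([("human", 0.1), ("mouse", 0.1)], 0.05), ("rat", 0.15)]


def jc_prob(a, b, t, alpha=1.0):
    """JC69 P_ab(t): closed form of exp(tQ) for the Jukes-Cantor rate matrix."""
    decay = math.exp(-4 * alpha * t)
    return 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay


def partial_likelihood(node, column):
    """lik[a] = P(characters at the leaves below node | node has nucleotide a)."""
    if isinstance(node, str):                 # leaf: observed character
        if column[node] == "-":               # gap: missing data, all states allowed
            return {a: 1.0 for a in NUCS}
        return {a: 1.0 if column[node] == a else 0.0 for a in NUCS}
    lik = {a: 1.0 for a in NUCS}
    for child, t in node:
        child_lik = partial_likelihood(child, column)
        for a in NUCS:                        # sum out the child's unobserved state
            lik[a] *= sum(jc_prob(a, b, t) * child_lik[b] for b in NUCS)
    return lik


def column_probability(column):
    """Emission probability of one alignment column; root weighted by the equilibrium 1/4."""
    root = partial_likelihood(TREE, column)
    return sum(0.25 * root[a] for a in NUCS)


print(column_probability({"human": "A", "mouse": "A", "rat": "G"}))
```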

  17. A codon-based evolutionary model is used to calculate the emission probability of the columns labeled A • A nucleotide-based evolutionary model is used to calculate the emission probability of column B • The emission probability of column C is obtained from the equilibrium distribution of the relevant evolutionary model

  18. Parameter Estimation • The parameters of the EHMM are estimated by a combination of Baum-Welch and Powell's method • Each evolutionary model E is divided into $E_{equ}$ (its equilibrium distribution) and $E_{evo}$

  19. • The initial state distribution B can be estimated by Baum-Welch, but it is generally set to 0.00001 for all states except the intergenic state • The expectation step of Baum-Welch estimates: • the expected number of nucleotides emitted from each state • the expected number of state transitions • the expected number of times each state is used • Powell's method, another optimization method, estimates $E_{evo}$ and the phylogenetic tree T • The Baum-Welch method is used to estimate $E_{equ}$ and A
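The expectation step can be sketched with the forward-backward recursions. This is a hypothetical single-sequence toy with two states, four symbols and made-up numbers. It is not the EHMM itself, whose emissions are whole alignment columns. The code returns the three expected counts listed on the slide. It then shows how the maximization step would re-estimate the transition matrix A from them.

```python
# Hypothetical toy HMM (illustrative numbers only).
STATES = ["I", "E"]
B = {"I": 0.5, "E": 0.5}
A = {"I": {"I": 0.9, "E": 0.1}, "E": {"I": 0.2, "E": 0.8}}
e = {"I": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
     "E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}}


def forward_table(x):
    """f[i][k] = P(x_1..x_i, pi_i = k)."""
    f = [{k: B[k] * e[k][x[0]] for k in STATES}]
    for symbol in x[1:]:
        prev = f[-1]
        f.append({j: e[j][symbol] * sum(prev[k] * A[k][j] for k in STATES) for j in STATES})
    return f


def backward_table(x):
    """b[i][k] = P(x_{i+1}..x_L | pi_i = k)."""
    b = [{k: 1.0 for k in STATES}]
    for symbol in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(A[k][j] * e[j][symbol] * nxt[j] for j in STATES) for k in STATES})
    return b


def expected_counts(x):
    """E-step: expected state usage, emissions and transitions under the current parameters."""
    f, b = forward_table(x), backward_table(x)
    px = sum(f[-1].values())
    n = len(x)
    usage = {k: sum(f[i][k] * b[i][k] for i in range(n)) / px for k in STATES}
    emitted = {k: {c: sum(f[i][k] * b[i][k] for i in range(n) if x[i] == c) / px
                   for c in "ACGT"} for k in STATES}
    moves = {k: {j: sum(f[i][k] * A[k][j] * e[j][x[i + 1]] * b[i + 1][j]
                        for i in range(n - 1)) / px for j in STATES} for k in STATES}
    return usage, emitted, moves


def reestimate_A(moves):
    """M-step for A: normalise the expected transition counts row by row."""
    return {k: {j: moves[k][j] / sum(moves[k].values()) for j in STATES} for k in STATES}


usage, emitted, moves = expected_counts("GCATTGCAAG")
print(usage)
print(reestimate_A(moves))
```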

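  20. Therefore, the likelihood of an alignment $x$ given a parameterization $\theta$ of the EHMM is found by summing over all possible paths $\pi$: $P(x \mid \theta) = \sum_{\pi} P(x, \pi \mid \theta) = \sum_{\pi} B_{\pi_1} e_{\pi_1}(x_1) \prod_{i=2}^{L} A_{\pi_{i-1}\pi_i}\, e_{\pi_i}(x_i)$, where $x_i$ is the $i$-th of the $L$ alignment columns • This sum can be computed in linear time by dynamic programming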
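The dynamic programming used here is the forward algorithm. Below is a minimal sketch on a hypothetical toy model, not the EHMM's parameters. For L columns and K states it costs O(L·K²), which is linear in the alignment length. The result is checked against explicit summation over all paths.

```python
from itertools import product

# Hypothetical toy HMM (illustrative numbers only).
STATES = ["I", "E"]
B = {"I": 0.5, "E": 0.5}
A = {"I": {"I": 0.9, "E": 0.1}, "E": {"I": 0.2, "E": 0.8}}
e = {"I": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
     "E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}}


def forward(x):
    """Forward algorithm: P(x) = sum over all paths of P(x, pi), in O(len(x) * K**2) time."""
    f = {k: B[k] * e[k][x[0]] for k in STATES}  # f_1(k) = B_k e_k(x_1)
    for symbol in x[1:]:
        # f_i(j) = e_j(x_i) * sum_k f_{i-1}(k) A_kj
        f = {j: e[j][symbol] * sum(f[k] * A[k][j] for k in STATES) for j in STATES}
    return sum(f.values())


def brute_force(x):
    """Sum P(x, pi) explicitly over all K**len(x) paths; only feasible for tiny inputs."""
    total = 0.0
    for path in product(STATES, repeat=len(x)):
        p = B[path[0]] * e[path[0]][x[0]]
        for i in range(1, len(x)):
            p *= A[path[i - 1]][path[i]] * e[path[i]][x[i]]
        total += p
    return total


print(forward("GCA"), brute_force("GCA"))
```

For genome-length inputs these products underflow, so practical implementations work in log space or rescale the forward values at each column.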

  21. EUKARYOTIC GENOME MODEL • The model can be used to generate alignments • A reduced model produces only inner exons • The EHMM is fully probabilistic and can be used both to simulate data and to find genes • [Figure: eukaryotic EHMM]
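Because the model is fully probabilistic, data can be simulated by running it forwards. This sketch covers only the evolutionary part for a single state. It draws a root nucleotide from the equilibrium distribution and mutates it down a hypothetical three-species tree under JC69. A full EHMM simulation would first sample the hidden state path, then emit one column per state in this way.

```python
import math
import random

NUCS = "ACGT"
# Hypothetical tree ((human:0.1, mouse:0.1):0.05, rat:0.15); inner node = list of (child, branch length).
TREE = [([("human", 0.1), ("mouse", 0.1)], 0.05), ("rat", 0.15)]


def jc_step(parent, t, rng, alpha=1.0):
    """Draw the child's nucleotide after a branch of length t under JC69."""
    decay = math.exp(-4 * alpha * t)
    weights = [0.25 + 0.75 * decay if b == parent else 0.25 - 0.25 * decay for b in NUCS]
    return rng.choices(NUCS, weights=weights)[0]


def evolve(node, state, rng, column):
    """Pass a character down the tree, recording the characters that reach the leaves."""
    if isinstance(node, str):
        column[node] = state
        return
    for child, t in node:
        evolve(child, jc_step(state, t, rng), rng, column)


def simulate_column(rng):
    """One alignment column: root drawn from the (uniform) JC69 equilibrium, then evolved."""
    column = {}
    evolve(TREE, rng.choice(NUCS), rng, column)
    return column


def simulate_alignment(n_columns, rng):
    """Stack simulated columns into one aligned row per species."""
    columns = [simulate_column(rng) for _ in range(n_columns)]
    return {sp: "".join(col[sp] for col in columns) for sp in columns[0]}


for species, row in simulate_alignment(30, random.Random(7)).items():
    print(f"{species:>5} {row}")
```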

  22. Results • The benefits of modeling evolution with an EHMM were studied using a data set of orthologous mouse/human gene pairs • The benefit will depend on the divergence between the sequences compared • The key parameter for modeling the difference between exons and introns is the dN/dS ratio (the ratio of nonsynonymous to synonymous substitution rates)

  23. Moreover, the evolutionary model shows a distinct difference between the intergenic/intron state and the codon state

  24. Evaluations were performed on both single and aligned sequences

  25. Graphical Representation

  26. The simple model used here is not comparable to state-of-the-art methods • Any number of aligned sequences can be handled

  27. Extensions of the model • GENSCAN could be extended into an EHMM • Splice site finders • Models of ribosome binding sites and promoter regions • Non-geometric length distributions of exons • Pseudo-higher-order EHMMs can be constructed • Extending the idea of pair HMMs to multiple sequences

  28. Disadvantages of the present model • The existing framework does not model gaps but treats them as missing data • The optimal data for an EHMM is a multiple alignment of full-length genomes • The challenge in constructing the alignment is to reduce the noise-to-signal ratio. BUT …