
Progressive MSA

Progressive MSA: do pair-wise alignment, develop an evolutionary tree, then align the most closely related sequences first and add the more distant ones afterwards. Genetic distance: the number of mismatched positions divided by the total number of aligned positions compared (gaps not considered).

zubin

Presentation Transcript


  1. Progressive MSA • Do pair-wise alignment. • Develop an evolutionary tree. • The most closely related sequences are aligned first, then the more distant ones are added. • Genetic distance: the number of mismatched positions divided by the total number of aligned positions compared (gaps not considered).
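The distance calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not ClustalW's actual code: it assumes the sequences have already been pair-wise aligned to equal length, and it takes the denominator to be the aligned columns in which neither sequence has a gap (an interpretation of the slide's wording). The function names and the toy sequences are made up for the example.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Fraction of mismatches among columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatched = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # gaps are not considered
        compared += 1
        if x != y:
            mismatched += 1
    return mismatched / compared if compared else 0.0


def closest_pair(aligned):
    """Names of the two sequences with the smallest distance (aligned first)."""
    return min(
        combinations(sorted(aligned), 2),
        key=lambda pair: p_distance(aligned[pair[0]], aligned[pair[1]]),
    )


# Hypothetical toy data, already aligned to the same length.
toy = {"A": "MSGL-KT", "B": "MTGL-KT", "C": "MTNLAKS"}
first_to_align = closest_pair(toy)  # the pair a progressive method would join first
```

In a real progressive aligner the full distance matrix feeds a tree-building step; picking only the closest pair, as here, shows just the first join.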

  2. Example • Domain: a segment of a protein that can fold to a 3D structure independent of other segments of the protein. • Card Domain • Caspase recruitment domains (CARDs) are modules of 90 - 100 amino acids involved in apoptosis signaling pathways. • http://www.mshri.on.ca/pawson/card.html

  3. These are equivalent trees [Figure: the same tree over the leaves A, B and C, drawn in several equivalent orientations]

  4. The previous tree was rooted. These are unrooted trees.

  5. Gaps • ClustalW attempts to place gaps between conserved domains. • In known sequences, gaps are preferentially found between secondary structure elements (alpha helices, beta strands).

  6. Problem with Progressive Alignment: Errors made in early alignments are propagated throughout the MSA

  7. Profiles & Gaps • From an MSA, a conserved region is identified and a scoring matrix (profile) is constructed for that region. • Each position has a score associated with each amino acid substitution or gap. • Blocks are also extracted from an MSA, but no gaps are permitted.
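As a rough illustration of how a profile can be built from an ungapped block, the sketch below turns each column into log-odds scores against a background distribution. The uniform background, the pseudocount value, and the function names are assumptions made for the example; real profile and Blocks tools use more careful weighting schemes.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a uniform background; real tools use observed frequencies.
UNIFORM_BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def build_profile(block, background=UNIFORM_BACKGROUND, pseudocount=1.0):
    """One dict of log2-odds scores per column of an ungapped block."""
    profile = []
    for column in zip(*block):
        counts = Counter(column)
        total = len(column) + pseudocount * len(background)
        scores = {}
        for aa, bg in background.items():
            freq = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(freq / bg)
        profile.append(scores)
    return profile


def score_segment(profile, segment):
    """Sum of per-position scores for a segment as long as the profile."""
    if len(segment) != len(profile):
        raise ValueError("segment length must equal profile length")
    return sum(col[aa] for col, aa in zip(profile, segment))


# The two toy sequences used later in the slides serve as a tiny block.
toy_profile = build_profile(["MSGL", "MTNL"])
```

A segment that resembles the block (for example `MSGL`) scores higher than an unrelated one, which is the property a profile search relies on.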

  8. Block Server • Results • TLE short form • TLEl Long form

  9. Hidden Markov Models • Probabilistic model of a multiple sequence alignment. • No indel penalties are needed. • Experimentally derived information can be incorporated. • Parameters are adjusted to represent observed variation. • Requires at least 20 sequences.

  10. [Figure: profile HMM with begin state B, main states M1 to M6, insert states I0 to I6, delete states D1 to D6, and end state E] • The bottom line of states are the main states (M). • These model the columns of the alignment. • The second row of diamond-shaped states are called the insert states (I). • These are used to model the highly variable regions in the alignment. • The top row of circles are delete states (D). • These are silent or null states: they do not match any residues, they simply allow main states to be skipped.

  11. The Evolution of a Sequence • Over long periods of time a sequence will acquire random mutations. • These mutations may result in a new amino acid at a given position, the deletion of an amino acid, or the introduction of a new one. • Over VERY long periods of time two sequences may diverge so much that their relationship cannot be seen through the direct comparison of their sequences.

  12. Hidden Markov Models • Pair-wise methods rely on direct comparisons between two sequences. • In order to overcome the differences between the sequences, a third sequence is introduced, which serves as an intermediate. • A high-scoring hit between the first and third sequences, together with a high-scoring hit between the second and third sequences, implies a relationship between the first and second sequences (a transitive relationship).

  13. Introducing the HMM • The intermediate sequence is kind of like a missing link. • The intermediate sequence does not have to be a real sequence. • The intermediate sequence becomes the HMM.

  14. Introducing the HMM • The HMM is a mix of all the sequences that went into its making. • The score of a sequence against the HMM shows how well the HMM serves as an intermediate for that sequence. • In other words, it shows how likely the sequence is to be related to all the other sequences that the HMM represents.

  15. Match states with no indels [Figure: B → M1 → M2 → M3 → M4 → E, built from the sequences MSGL and MTNL] • Each arrow indicates a transition probability, in this case 1 for each step.

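  16. Match states with no indels [Figure: B → M1 → M2 → M3 → M4 → E, built from the sequences MSGL and MTNL] • Each state also has a probability for each residue at its position: M=1 at the first position; S=0.5 and T=0.5 at the second.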

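  17. [Figure: B → M1 → M2 → M3 → M4 → E, built from the sequences MSGL and MTNL, with M=1 at the first position and S=0.5, T=0.5 at the second] • Typically want to incorporate a small probability for all other amino acids.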
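The match-only model of slides 15 to 17 is small enough to write out. The sketch below estimates the residue probabilities at each match state from the two sequences MSGL and MTNL and scores a sequence against the model; since every transition has probability 1, only the residue probabilities contribute. The `pseudocount` parameter is one simple way to give "a small probability for all other amino acids"; its value and the function names are assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def emission_probs(column, pseudocount=0.0):
    """Residue probabilities for one match state from one alignment column."""
    counts = Counter(column)
    total = len(column) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def match_only_model(alignment, pseudocount=0.0):
    """One emission table per column; no insert or delete states."""
    return [emission_probs(col, pseudocount) for col in zip(*alignment)]


def log_prob(model, seq):
    """Natural-log probability of seq under the model (transitions are all 1)."""
    if len(seq) != len(model):
        raise ValueError("a match-only model needs a sequence of the same length")
    total = 0.0
    for probs, aa in zip(model, seq):
        p = probs[aa]
        if p == 0.0:
            return float("-inf")  # residue never seen and no pseudocount
        total += math.log(p)
    return total


strict = match_only_model(["MSGL", "MTNL"])                     # slide 16
smoothed = match_only_model(["MSGL", "MTNL"], pseudocount=0.1)  # slide 17
```

With the strict model, a sequence such as `MAGL` has probability zero because A was never observed at the second position; the smoothed model gives it a small but finite score.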

  18. Permit insertion states [Figure: B → M1 → M2 → M3 → M4 → E with insert states I0 to I4; alignment: MS.GL, MT.NL, MSANI] • Transition probabilities may no longer be 1.

  19. Permit insertion states [Figure: B → M1 → M2 → M3 → M4 → E with insert states I0 to I4; alignment: MS..GL, MT..NL, MSA.NI, MTARNL]

  20. Delete states permit incorporation of the last two sites of seq1 [Figure: profile HMM with begin state B, main states M1 to M6, insert states I0 to I6, delete states D1 to D6, and end state E; alignment: MS..GL--, MT..NLAG, MSA.NIAG, MTARNLAG]

  21. [Figure: profile HMM with begin state B, main states M1 to M6, insert states I0 to I6, delete states D1 to D6, and end state E] • The bottom line of states are the main states (M). • These model the columns of the alignment. • The second row of diamond-shaped states are called the insert states (I). • These are used to model the highly variable regions in the alignment. • The top row of circles are delete states (D). • These are silent or null states: they do not match any residues, they simply allow main states to be skipped.
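To make the three kinds of state concrete, the sketch below reads the alignment from slide 20 and writes out the path each sequence takes through the model. It assumes a convention that matches the slide's example but is not stated there: a column containing `.` in any sequence is treated as an insert column, every other column is a main (match) column, and `-` in a main column means the sequence passes through the delete state instead. The function name is made up for the example.

```python
def state_paths(alignment):
    """State path (B, M/I/D states, E) for each row of an alignment.

    Assumed convention: '.' marks insert columns, '-' marks a deletion.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows must have the same length")
    insert_col = [any(row[c] == "." for row in alignment) for c in range(n_cols)]

    paths = []
    for row in alignment:
        path = ["B"]
        m = 0  # index of the most recent main-state position
        for c, ch in enumerate(row):
            if insert_col[c]:
                if ch not in ".-":
                    path.append(f"I{m}")  # residue emitted by an insert state
            else:
                m += 1
                path.append(f"D{m}" if ch == "-" else f"M{m}")
        path.append("E")
        paths.append(path)
    return paths


slide_20 = ["MS..GL--", "MT..NLAG", "MSA.NIAG", "MTARNLAG"]
for row, path in zip(slide_20, state_paths(slide_20)):
    print(row, " ".join(path))
```

Under this convention the first sequence ends in D5 D6, which is how the delete states let it join an alignment whose other members have two extra sites, and the residues A and R of the last sequence are both emitted by insert state I2.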

  22. Dirichlet Mixtures • Additional information used to expand the set of potential amino acids at individual sites. • Based on the observed frequency of amino acids seen in certain chemical environments: • aromatic • acidic • basic • neutral • polar
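A Dirichlet mixture can be read as a set of pseudocount vectors, one per chemical environment, that are blended according to how well each explains the residues observed in a column. The sketch below shows that calculation under loud assumptions: the three components, their parameter values (5.0 for favoured residues, 0.1 otherwise), and the equal mixture weights are invented for illustration and are not taken from any published mixture.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_component(favoured, high=5.0, low=0.1):
    """Hypothetical Dirichlet parameters favouring one group of residues."""
    return {aa: (high if aa in favoured else low) for aa in AMINO_ACIDS}


def log_marginal(counts, alpha):
    """log P(counts | alpha), omitting a term shared by all components."""
    n = sum(counts.values())
    a = sum(alpha.values())
    result = math.lgamma(a) - math.lgamma(n + a)
    for aa in AMINO_ACIDS:
        result += math.lgamma(counts.get(aa, 0) + alpha[aa]) - math.lgamma(alpha[aa])
    return result


def mixture_estimate(counts, components, weights):
    """Posterior-mean residue probabilities and the component posteriors."""
    logs = [math.log(w) + log_marginal(counts, c)
            for w, c in zip(weights, components)]
    top = max(logs)
    post = [math.exp(v - top) for v in logs]  # subtract max for stability
    norm = sum(post)
    post = [p / norm for p in post]

    n = sum(counts.values())
    estimate = {}
    for aa in AMINO_ACIDS:
        estimate[aa] = sum(
            p * (counts.get(aa, 0) + c[aa]) / (n + sum(c.values()))
            for p, c in zip(post, components)
        )
    return estimate, post


components = [make_component("DE"), make_component("KRH"), make_component("FWY")]
weights = [1 / 3, 1 / 3, 1 / 3]
# A column in which aspartate (D) was observed three times.
estimate, posterior = mixture_estimate({"D": 3}, components, weights)
```

With only D observed, the acidic component dominates the posterior, so glutamate (E) receives a much higher estimated probability than lysine (K) even though neither was seen in the column.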
