Optimization of a New Score Function for the Detection of Remote Homologs

Presentation Transcript


  1. Optimization of a New Score Function for the Detection of Remote Homologs. Kann et al.

  2. Introduction • New method to calculate a score function, aiming to optimize the ability to discriminate between homologs and non-homologs • Existing software uses the following to compute an alignment score:

  3. Terms of the alignment score • Number of times AA i is aligned with AA j, weighted by the score function / substitution matrix (contribution to score for AA match/mismatch) • Number of gaps in the alignment, weighted by the contribution to score for gap initialization • Number of residues in each gap beyond one, weighted by the contribution to score for gap extension
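
The terms on this slide suggest a score that is linear in three kinds of counts: aligned residue pairs, gaps, and gap residues beyond the first. The following is a minimal sketch of that reading, not the authors' code. The function names, the dictionary-based toy matrix, and the numeric penalties are all assumptions made for illustration.

```python
from collections import Counter


def alignment_counts(a, b, gap="-"):
    """Count aligned residue pairs, gaps, and gap extensions in an alignment.

    `a` and `b` are two aligned strings of equal length (hypothetical input
    format). A run of gap characters in one sequence counts as one gap plus
    (run length - 1) extensions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = Counter()
    n_gaps = 0
    n_ext = 0
    prev_gap_in = None  # sequence that held the gap in the previous column
    for x, y in zip(a, b):
        if x == gap and y == gap:
            raise ValueError("column with a gap in both sequences")
        if x == gap or y == gap:
            gap_in = "a" if x == gap else "b"
            if gap_in == prev_gap_in:
                n_ext += 1   # residue in the gap beyond the first
            else:
                n_gaps += 1  # a new gap starts here
            prev_gap_in = gap_in
        else:
            pairs[tuple(sorted((x, y)))] += 1
            prev_gap_in = None
    return pairs, n_gaps, n_ext


def alignment_score(a, b, matrix, gap_open, gap_extend):
    """Sum of count-times-contribution terms for one alignment."""
    pairs, n_gaps, n_ext = alignment_counts(a, b)
    match_part = sum(n * matrix[pair] for pair, n in pairs.items())
    return match_part + n_gaps * gap_open + n_ext * gap_extend


# Toy values, chosen only to make the arithmetic easy to follow.
toy_matrix = {("A", "A"): 2, ("C", "C"): 2, ("E", "E"): 2}
score = alignment_score("ACDGE", "AC--E", toy_matrix, gap_open=-5, gap_extend=-1)
```

In this toy alignment there are three matches, one gap, and one extension, so the score is 3 × 2 − 5 − 1.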

  4. Current Methods to Calculate Homology • p(Sr > x): probability that a random pair of proteins of the same length would score higher than x, the score of the given pair • E: expected number of random proteins in the database that would have at least that score • P: probability that at least one random pair has a higher score • As p(Sr > x), E, and P increase, the likelihood that the given pair is homologous decreases
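
The slide does not say how the three quantities relate to each other. A commonly used set of relations, assumed here rather than taken from the presentation, treats random hits as independent and Poisson-distributed, giving E = N · p for a database of N sequences and P = 1 − exp(−E). The function names below are hypothetical.

```python
import math


def expected_random_hits(p_single, db_size):
    """E: expected number of random database sequences scoring at least x.

    Assumes every database sequence has the same per-comparison
    probability `p_single` of reaching the score by chance.
    """
    return db_size * p_single


def prob_at_least_one_hit(e_value):
    """P: probability of at least one random hit, assuming Poisson statistics."""
    return -math.expm1(-e_value)  # 1 - exp(-E), accurate for small E


e = expected_random_hits(1e-3, 1000)
p = prob_at_least_one_hit(e)
```

For small E the two values nearly coincide (P ≈ E), which is why either can serve as a significance measure for strong hits.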

  5. Current Score Matrices • PAM (point accepted mutation) – Dayhoff • GCB, JTT – the same approach applied to larger sequence datasets • BLOSUM62 – Henikoff & Henikoff, constructed using a dataset of aligned sequence blocks • STR – protein sequences aligned based on their observed structures

  6. Limitations of Current Score Functions • Current score functions assume that each position evolves independently, overlooking correlations • Score functions are derived from a database of properly aligned proteins, not from alignments between random sequences • Gap penalties are chosen a priori

  7. Theory • Z score for alignment: characterizes the significance of an alignment score through the likelihood that this score or higher would be obtained by a random match • Accounts for variations in E with the length of the proteins
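
The Z-score formula itself is not preserved in the transcript. A common definition, assumed here, is Z = (S − mean(Sr)) / std(Sr), where Sr are the scores of random sequence pairs of the same lengths as the pair being tested. Using the same lengths is one way to account for length effects. The sketch below takes the random scores as given and uses the population standard deviation, which is also an assumption.

```python
import statistics


def z_score(score, random_scores):
    """Z score of an alignment score against a sample of random-pair scores.

    `random_scores` is assumed to come from aligning random (for example,
    shuffled) sequences with the same lengths as the real pair.
    """
    mean_r = statistics.fmean(random_scores)
    std_r = statistics.pstdev(random_scores)
    if std_r == 0:
        raise ValueError("random scores have zero spread")
    return (score - mean_r) / std_r


# Toy background sample with mean 5 and population standard deviation 2.
background = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score(10, background)
```

A larger Z means the observed score sits further above what random matches of that length typically produce.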

  8. Theory • Score function optimized by maximizing the confidence <C> over the training set • Avoids dependence on extreme E values (easily detected or overly distant homologies) • Eliminates contribution of falsely identified homologies (overly distant)

  9. Database Preparation • Use a set of known homologs whose homology cannot be reliably determined by standard pairwise comparison, in order to optimize the score function for detection of distant homologs • Training set: 900 pairs of proteins in the same COG with < 25% sequence identity
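
The 25% identity cutoff can be illustrated with a small filter. The slide does not say how identity was computed, so the definition below (identical residues divided by columns where neither sequence has a gap) is an assumption, as are the function names and the aligned-string input format.

```python
def percent_identity(a, b, gap="-"):
    """Percent of identical residues over columns without a gap.

    `a` and `b` are aligned strings of equal length. Other definitions
    divide by the shorter sequence length or the full alignment length.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def select_remote_pairs(aligned_pairs, max_identity=25.0):
    """Keep only pairs whose identity falls below the cutoff."""
    return [
        (a, b) for a, b in aligned_pairs
        if percent_identity(a, b) < max_identity
    ]


# Toy pairs: the first is 50% identical, the second 12.5%.
candidates = [("ACDE", "ACFG"), ("ACDEFGHI", "AKLMNPQR")]
remote = select_remote_pairs(candidates)
```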

  10. Optimization of Score Function • Align using the BLOSUM62 matrix • Calculate Z and C for each pair of homologs, then average over the pairs in the training set to yield <C> • Generate initial alignments using the gap penalties that yielded the highest C values • ~10 cycles of optimization and realignment until the score function converged
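
The cycle of realignment and optimization on this slide can be sketched as a generic alternating loop. This shows only the control flow. The `realign` and `improve` callables, the convergence tolerance, and the toy objective are hypothetical stand-ins, since the transcript does not describe how alignments or the confidence C are actually computed.

```python
def optimize_until_converged(params, realign, improve, max_cycles=10, tol=1e-6):
    """Alternate realignment and parameter updates until <C> stops changing.

    realign(params) -> alignments produced with the current score function
    improve(params, alignments) -> (new_params, mean_confidence)
    Returns the final parameters and the <C> value recorded at each cycle.
    """
    history = []
    for _ in range(max_cycles):
        alignments = realign(params)
        params, mean_c = improve(params, alignments)
        converged = bool(history) and abs(mean_c - history[-1]) < tol
        history.append(mean_c)
        if converged:
            break
    return params, history


# Toy stand-ins: one scalar parameter whose best value is 1.0.
def toy_realign(p):
    return p  # a real implementation would realign the training pairs


def toy_improve(p, alignments):
    new_p = p + 0.5 * (1.0 - p)          # move halfway toward the optimum
    return new_p, 1.0 - (new_p - 1.0) ** 2  # toy "mean confidence"


final_params, c_history = optimize_until_converged(0.0, toy_realign, toy_improve)
```

Realigning inside the loop matters because the best alignment of each pair depends on the score function being optimized, so the two steps have to be repeated until they agree.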

  11. Results • Small changes in gap penalties: most of the improvement comes from refinements of the substitution matrix • OPTIMA: the resulting score function • Has a significantly improved average confidence <C> compared with other score matrices • <p(Sr > x)> and <P> significantly decreased

  12. Summary • Aim: optimize score matrix to discriminate between homologs and non-homologs • OPTIMA score function: more successful at discriminating between homologs and non-homologs compared with standard score matrices • Gap penalties treated as additional parameters to be optimized