PEAKS: De Novo Sequencing using MS/MS spectra

PEAKS: De Novo Sequencing using MS/MS spectra. Bin Ma, U. Western Ontario, Canada Kaizhong Zhang, U. Western Ontario, Canada Chengzhi Liang, Bioinformatics Solutions Inc. Canada. Outline. Background Tandem Mass Spectrometry De novo sequencing Problem Definition and Algorithm.


Presentation Transcript


  1. PEAKS: De Novo Sequencing using MS/MS spectra Bin Ma, U. Western Ontario, Canada Kaizhong Zhang, U. Western Ontario, Canada Chengzhi Liang, Bioinformatics Solutions Inc. Canada

  2. Outline • Background • Tandem Mass Spectrometry • De novo sequencing • Problem Definition and Algorithm. • Software implementation – PEAKS • Future work

  3. Background • Humans have about 100,000 different proteins. Because of post-translational modifications, each protein can exist in many different versions. • Diseases are closely related to abnormal proteins or to abnormal protein expression levels. • Given a tissue, identifying the proteins (and their modified versions) in it is a fundamental problem for drug design.

  4. Proteins and Peptides • A protein is a sequence of 20 different types of amino acids, i.e. a string over an alphabet of size 20. • A peptide is a substring of the protein. • The 20 amino acids have 19 distinct masses. • I and L have the same mass and are difficult to distinguish by MS/MS, so they are regarded as the same letter.
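The "20 letters, 19 distinct masses" remark can be checked directly. The sketch below is not from the slides: it uses standard monoisotopic residue masses (rounded to five decimals), and the names `RESIDUE_MASS` and `canonical` are hypothetical helpers introduced only for illustration.

```python
# Monoisotopic residue masses in daltons (standard reference values, rounded).
# The table and helper names are illustrative assumptions, not part of PEAKS.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def canonical(peptide: str) -> str:
    """Regard I and L as the same letter by mapping every I to L."""
    return peptide.replace("I", "L")


# 20 letters, but I and L share one mass, so only 19 distinct values remain.
distinct_masses = set(RESIDUE_MASS.values())
```

Note that K and Q differ by only about 0.036 Da, so they count as distinct here but may still be hard to separate on a low-resolution instrument.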

  5. Tandem Mass Spectrometry • [Figure labels: tissue, protein, gel fraction, peptide; example sequence …VITK | GTDIMNEMR | SMW…] • MS/MS is the only reliable way for protein identification.

  6. [Figure labels: peptide sequence LGSSEVEQVQLVVDGVK; tandem mass spectrometer; MS/MS spectrum; database; de novo sequencing: LGSSEVEQVQLVVDGVK]

  7. How Does a Peptide Fragment? • For a peptide A1A2A3A4: • b-ions: m(b1) = 1 + m(A1); m(b2) = 1 + m(A1) + m(A2); m(b3) = 1 + m(A1) + m(A2) + m(A3) • y-ions: m(y1) = 19 + m(A4); m(y2) = 19 + m(A4) + m(A3); m(y3) = 19 + m(A4) + m(A3) + m(A2)
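The b- and y-ion formulas generalize to a peptide of any length. The following is a minimal sketch assuming nominal (integer) residue masses and the slide's offsets of 1 and 19; the table `NOMINAL_MASS` and the function names are hypothetical and cover only a few residues.

```python
# Nominal (integer) residue masses for a few amino acids; an illustrative subset.
NOMINAL_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
                "L": 113, "K": 128, "E": 129, "R": 156}


def b_ions(peptide: str) -> list[int]:
    """Masses of b1..b(n-1): 1 plus the mass of each proper prefix."""
    masses, total = [], 0
    for aa in peptide[:-1]:
        total += NOMINAL_MASS[aa]
        masses.append(1 + total)
    return masses


def y_ions(peptide: str) -> list[int]:
    """Masses of y1..y(n-1): 19 plus the mass of each proper suffix."""
    masses, total = [], 0
    for aa in reversed(peptide[1:]):
        total += NOMINAL_MASS[aa]
        masses.append(19 + total)
    return masses
```

With these offsets, every complementary pair satisfies m(b_i) + m(y_(n-i)) = m(P) + 20, which is why a b-ion and a y-ion can be confused with each other when only the total mass is known.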

  8. Matching Sequence with Spectrum

  9. De Novo Sequencing • For any peptide P = a1…an, m(P) = Σi m(ai). • De novo sequencing: given a spectrum and a mass value m, compute a sequence P such that m(P) = m and the matching score score(P) is maximized.
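To make the problem statement concrete, here is a brute-force reference solver. It is not the PEAKS algorithm: it enumerates every sequence of the target mass, so it is exponential and only usable on toy inputs. The scoring function (sum of intensities of peaks hit by some b- or y-ion, each peak counted once), the tiny alphabet, and all names are assumptions made for illustration.

```python
# Tiny alphabet with nominal masses; an illustrative assumption.
NOMINAL_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def ion_masses(peptide: str) -> set[int]:
    """All b- and y-ion masses, using the offsets 1 (b) and 19 (y)."""
    residues = [NOMINAL_MASS[aa] for aa in peptide]
    ions = set()
    for i in range(1, len(residues)):
        ions.add(1 + sum(residues[:i]))   # b-ion of the prefix of length i
        ions.add(19 + sum(residues[i:]))  # y-ion of the complementary suffix
    return ions


def score(peptide: str, spectrum: dict[int, float]) -> float:
    """Assumed score: total intensity of matched peaks, each peak counted once."""
    return sum(spectrum.get(mass, 0.0) for mass in ion_masses(peptide))


def peptides_with_mass(m: int):
    """Yield every string over the alphabet whose residue masses sum to m."""
    if m == 0:
        yield ""
        return
    for aa, mass in NOMINAL_MASS.items():
        if mass <= m:
            for rest in peptides_with_mass(m - mass):
                yield aa + rest


def de_novo_brute_force(spectrum: dict[int, float], m: int):
    """Return a peptide P with m(P) = m maximizing score(P), or None."""
    return max(peptides_with_mass(m), key=lambda p: score(p, spectrum), default=None)
```

A solver like this is mainly useful as a correctness oracle when testing a faster dynamic-programming implementation on small masses.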

  10. A Simpler Case – Only Y-ions

  11. Y-ions Determined By a Suffix • [Figure labels: 19, y1, y2, y3] • score(Q) can be defined for a suffix Q.
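When only y-ions are counted, the score depends on suffix masses alone, so a one-dimensional dynamic program suffices. The sketch below is an illustration under stated assumptions, not the slides' exact formulation: masses are nominal integers, the reward for a suffix of mass v is the intensity of the peak at 19 + v, and the names `NOMINAL_MASS` and `best_peptide_y_only` are hypothetical.

```python
# Tiny alphabet with nominal masses; an illustrative assumption.
NOMINAL_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def best_peptide_y_only(spectrum: dict[int, float], m: int):
    """Best peptide of mass m when only y-ions are scored, or None if none exists."""
    neg_inf = float("-inf")
    # dp[v]: best score of any suffix Q with m(Q) = v.
    # choice[v]: leftmost letter of that best suffix.
    dp = [neg_inf] * (m + 1)
    choice = [None] * (m + 1)
    dp[0] = 0.0
    for v in range(1, m + 1):
        # The y-ion of a suffix of mass v sits at 19 + v.
        # The whole peptide (v == m) is not a fragment, so it earns no reward.
        reward = spectrum.get(19 + v, 0.0) if v < m else 0.0
        for aa, mass in NOMINAL_MASS.items():
            if mass <= v and dp[v - mass] != neg_inf:
                candidate = dp[v - mass] + reward
                if candidate > dp[v]:
                    dp[v], choice[v] = candidate, aa
    if dp[m] == neg_inf:
        return None
    # Backtrack from the full mass, reading letters from left to right.
    letters, v = [], m
    while v > 0:
        letters.append(choice[v])
        v -= NOMINAL_MASS[choice[v]]
    return "".join(letters)
```

The table has m + 1 entries and each is filled by trying every letter, so the running time is proportional to m times the alphabet size.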

  12. Counting Both y and b ions

  13. Strategies • Consider a prefix R and a suffix Q simultaneously, as a pair. • Consider only those pairs (R,Q) that satisfy a nice property, which we call “chummy”. • Chummy pairs have two useful properties: • The score of a chummy pair can be computed recursively from a smaller chummy pair. • There is a series of chummy pairs that grows to the optimal solution.

  14. Dynamic Programming • Combining Lemmas A and B, we can compute DP(u,v) recursively [recurrence not preserved in transcript]. • Suppose (R,Q) is the pair maximizing DP(u,v) under the condition m(R) + m(Q) + m(a) = m. Then RaQ is the optimal peptide.

  15. PEAKS – The Software

  16. Comparison of PEAKS and Lutefisk • [Comparison table not preserved in transcript; legend: red = correct]

  17. Users

  18. Implementation Particulars • More accurate scoring: sum of the logarithmic intensities; many other ion types; coexisting ions, e.g., x2, y2, z2 • Deconvolution: converting multiply-charged peaks to singly-charged ones • Recalibration: compressing/stretching the spectrum to correct for calibration error • Noise reduction
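The deconvolution step can be illustrated with the standard charge-state conversion. This is a minimal sketch of the arithmetic only, not the PEAKS implementation; it assumes the charge of each peak is already known, and `PROTON_MASS` and `to_singly_charged` are hypothetical names.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons (rounded)


def to_singly_charged(mz: float, charge: int) -> float:
    """Convert an observed m/z at the given charge to the equivalent 1+ m/z.

    An ion of neutral mass M carrying z protons is observed at (M + z*p) / z.
    The singly-charged equivalent is M + p = z * (m/z) - (z - 1) * p.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - (charge - 1) * PROTON_MASS
```

For example, a doubly-charged peak observed at m/z 500.5 maps to roughly 999.9927 in the singly-charged spectrum, while a peak that is already singly charged is left unchanged.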

  19. Acknowledgement • Bin Ma and Kaizhong Zhang were supported by NSERC. • Chengzhi Liang was supported by BSI. • Thanks to the development team at BSI for the software development.

  20. Tandem Mass Spectrometer • [Figure labels: ions, mass analyzer, precursor ions, fragment, fragment ions, mass analyzer, detector, de novo sequencing; example peptides MPSER, PAK, SG…; example fragments of PAK: P, AK, PA, K]

  21. Algorithm Sandwich • DP(0,0) = 0; DP(u,v) = -infinity for (u,v) != (0,0); • for u from 1 to m/2 do: for v from u - max(a) to u + max(a) do: for a in Σ do: if u < v then [update not preserved in transcript] else [update not preserved in transcript] • find u, v, a such that u + v + m(a) = m and DP(u,v) is maximized; • backtracking;

  22. Dynamic Programming • for u from 0 to m • backtracking

  23. Dynamic Programming • We hope DP(u,v) for u+v=m gives the optimal prefix and suffix. • The optimal solution can be obtained by concatenation of the prefix and suffix.

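  24. Chummy Pairs • Two strings Ra and bQ are called a chummy pair iff either of the following two conditions is true: (C1) [condition not preserved in transcript]; (C2) [condition not preserved in transcript] • Examples: (LGE, LVR): (C2); (LGE, VR): (C1); (LGE, R): (C1); (LG, VR) is not chummy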

  25. Chummy Pairs • Lemma A: Suppose Ra and bQ are a chummy pair, with u = m(Ra) and v = m(bQ). If (C1) is true, [formula not preserved in transcript]; if (C2) is true, [formula not preserved in transcript].

  26. Chummy Pairs • Lemma B: Let P be the optimal solution. Then there is a chummy pair (R,Q) and a letter a such that P = RaQ. Also, there is a chummy pair series such that [statement not preserved in transcript].
