
PEAKS: De Novo Sequencing using Tandem Mass Spectrometry

PEAKS: De Novo Sequencing using Tandem Mass Spectrometry. Bin Ma, Dept. of Computer Science, University of Western Ontario. Outline: background; sandwich algorithm for de novo sequencing; software implementation (PEAKS). Background: diseases are closely related to abnormal proteins.


Presentation Transcript


  1. PEAKS: De Novo Sequencing using Tandem Mass Spectrometry • Bin Ma, Dept. of Computer Science, University of Western Ontario

  2. Outline • Background • Sandwich algorithm for de novo sequencing • Software implementation – PEAKS

  3. Background • Diseases are closely related to abnormal proteins. • Given a tissue, identifying the proteins in it (and their post-translational modifications) is a fundamental problem in proteomics. • MS/MS is the most common method for protein identification.

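  4. Sample Preparation • [Figure: tissue is separated on a gel into fractions; trypsin is added to digest the proteins into peptides (e.g. GTDIMR, PAK, MPSER, ……); the peptides pass through HPLC and on to MS/MS.]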

  5. Tandem Mass Spectrometer • [Figure: QTOF instrument. ESI ionizes the peptides (e.g. MPSER, PAK); the quadrupole mass analyzer selects parent ions (PAK); collision breaks them into fragment ions (P, PA, AK, K); the TOF mass analyzer and detector record the fragment ions for peptide sequencing.]

  6. [Figure: tandem mass spectrometry turns a peptide sequence (LGSSEVEQVQLVVDGVK) into an MS/MS spectrum; a database search or de novo sequencing recovers the peptide sequence (LGSSEVEQVQLVVDGVK) from the spectrum.]

  7. How Does a Peptide Fragment? • For a peptide A1A2A3A4, the b-ions are: m(b1)=1+m(A1); m(b2)=1+m(A1)+m(A2); m(b3)=1+m(A1)+m(A2)+m(A3) • The y-ions are: m(y1)=19+m(A4); m(y2)=19+m(A4)+m(A3); m(y3)=19+m(A4)+m(A3)+m(A2)
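The b- and y-ion formulas on this slide can be sketched in a few lines of Python. This is only an illustration: the table of nominal (integer) residue masses and the function names `b_ions` and `y_ions` are assumptions made for the example, not part of the presentation or of PEAKS.

```python
# Nominal (integer) residue masses in daltons. An illustrative simplification;
# real software works with more precise masses.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def b_ions(peptide):
    """Masses of b1..b(n-1): 1 + mass of each proper prefix."""
    masses, total = [], 1
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def y_ions(peptide):
    """Masses of y1..y(n-1): 19 + mass of each proper suffix."""
    masses, total = [], 19
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


print(b_ions("PAK"))  # b1 = 1 + m(P), b2 = 1 + m(P) + m(A)
print(y_ions("PAK"))  # y1 = 19 + m(K), y2 = 19 + m(K) + m(A)
```

With nominal masses, K and Q (and I and L) are indistinguishable, which is one reason an integer table like this is only suitable for illustration.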

  8. Matching Sequence with Spectrum

  9. De Novo Sequencing • De Novo Sequencing (Dancik et al., JCB 6:327-342.) • Given a spectrum and a mass value M, compute a sequence P such that m(P)=M and the matching score is maximized. • We take the matching score of P to be the sum of the scores of the matched peaks. • To illustrate PEAKS’ algorithm, we use the intensity of a peak as its score.
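A minimal sketch of this simplified matching score, assuming a spectrum is a list of (mass, intensity) pairs and that a peak is "matched" when it lies within a tolerance of some b- or y-ion of the candidate peptide. The nominal mass table, the tolerance value, and the names `fragment_masses` and `matching_score` are assumptions for illustration; PEAKS' real score function is more elaborate.

```python
# Nominal (integer) residue masses in daltons; an illustrative simplification.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def fragment_masses(peptide):
    """Set of b- and y-ion masses, using the offsets 1 (b) and 19 (y)."""
    ions = set()
    prefix, suffix = 1, 19
    for i in range(len(peptide) - 1):
        prefix += RESIDUE_MASS[peptide[i]]        # extend the prefix by one residue
        suffix += RESIDUE_MASS[peptide[-1 - i]]   # extend the suffix by one residue
        ions.update((prefix, suffix))
    return ions


def matching_score(peptide, spectrum, tolerance=0.5):
    """Sum the intensities of peaks lying within `tolerance` of a fragment ion.

    Iterating over peaks (not ions) means each peak contributes at most once.
    """
    ions = fragment_masses(peptide)
    return sum(
        intensity
        for mass, intensity in spectrum
        if any(abs(mass - ion) <= tolerance for ion in ions)
    )


# Hypothetical spectrum: three peaks explained by PAK and one unexplained peak.
spectrum = [(98.1, 4.0), (147.0, 10.0), (218.2, 5.0), (300.0, 7.0)]
print(matching_score("PAK", spectrum))
print(matching_score("KAP", spectrum))
```

De novo sequencing then asks for the sequence with m(P)=M that maximizes this score, without enumerating all candidate sequences.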

  10. Spectrum Graph Approach • Convert the peak list to a graph. A peptide sequence corresponds to a path in the graph. • Bartels (1990), Biomed. Environ. Mass Spectrom 19:363-368. • Taylor and Johnson (1997). Rapid Comm. Mass Spec. 11:1067-1075. (Lutefisk) • Dancik et al. (1999), JCB 6:327-342. • Chen et al. (2001), JCB 8:325-337. • ……
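The general idea of a spectrum graph can be sketched as follows. This is a simplified illustration, not the construction used by any of the cited tools: it assumes every peak is a y-ion, uses nominal integer masses, and the name `spectrum_graph` is hypothetical.

```python
# Nominal (integer) residue masses in daltons; an illustrative simplification.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def spectrum_graph(peak_masses, total_mass, y_offset=19):
    """Build a graph whose nodes are candidate suffix masses.

    Each peak is read as a y-ion, so it contributes the node (peak - y_offset).
    An edge u -> v exists when v - u equals a residue mass, and is labelled
    with the residues that have that mass.
    """
    candidates = {m - y_offset for m in peak_masses}
    nodes = sorted({0, total_mass} | {c for c in candidates if 0 < c < total_mass})
    node_set = set(nodes)

    residues_by_mass = {}
    for residue, mass in RESIDUE_MASS.items():
        residues_by_mass.setdefault(mass, []).append(residue)

    edges = {}
    for u in nodes:
        for mass, residues in residues_by_mass.items():
            if u + mass in node_set:
                edges[(u, u + mass)] = residues
    return nodes, edges


# Hypothetical y-ion peaks of PAK (total residue mass 296).
nodes, edges = spectrum_graph([147, 218], 296)
print(nodes)
print(edges)
```

A path from node 0 to node M spells a candidate peptide from its C-terminus backwards, since each edge prepends one residue to a suffix. Missing peaks remove nodes and break paths, which is one practical difficulty with this representation.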

  11. Warm up – Counting Only Y-ions

  12. The Score of a Suffix • [Figure: a suffix Q and the y-ions y1, y2, y3 it determines, each offset by 19] • Let Q be a suffix of the peptide. It determines some y-ions. • score(Q) is the sum of the scores of those y-ions of Q.

  13. Recursive Computation of DP(m) • [Figure: a suffix Q = aQ’ with first letter a; y-ion offset 19] • Suppose Q is such that DP(m)=score(Q). • Then score(Q’)=DP(m(Q’)). • Do not know a?

  14. Dynamic Programming • for m from 0 to M • backtracking
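The y-ion-only warm-up can be sketched as a one-dimensional dynamic program. The recurrence used here, DP(m) = max over letters a of DP(m - m(a)) plus the intensity of the peak at mass m + 19, is an assumed reading of the slides (unknown a is handled by trying every letter), and the nominal masses, integer peak positions, and function name `de_novo_y_only` are simplifications made for illustration.

```python
# Nominal (integer) residue masses in daltons; an illustrative simplification.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def de_novo_y_only(peak_intensity, total_mass, y_offset=19):
    """Best-scoring sequence of residue mass `total_mass`, counting y-ions only.

    peak_intensity maps an integer peak mass to its intensity.
    dp[m] is the best score of any suffix with mass m; first[m] remembers the
    first letter of such a suffix so the sequence can be recovered.
    """
    neg_inf = float("-inf")
    dp = [neg_inf] * (total_mass + 1)
    first = [None] * (total_mass + 1)
    dp[0] = 0.0
    for m in range(1, total_mass + 1):
        # The whole peptide (m == total_mass) is not a fragment ion.
        gain = peak_intensity.get(m + y_offset, 0.0) if m < total_mass else 0.0
        for residue, mass in RESIDUE_MASS.items():
            if mass <= m and dp[m - mass] != neg_inf:
                candidate = dp[m - mass] + gain
                if candidate > dp[m]:
                    dp[m], first[m] = candidate, residue
    if dp[total_mass] == neg_inf:
        return None, neg_inf

    # Backtracking: peel off the first letter of the current suffix each step.
    letters, m = [], total_mass
    while m > 0:
        letters.append(first[m])
        m -= RESIDUE_MASS[first[m]]
    return "".join(letters), dp[total_mass]


# Hypothetical y-ion peaks of PAK: y1 = 147, y2 = 218; residue mass 296.
sequence, score = de_novo_y_only({147: 10.0, 218: 5.0}, 296)
print(sequence, score)
```

Several sequences can tie: with these two peaks, any suffix of mass 128 (K, Q, GA or AG) explains the y1 peak equally well, so the sequence returned need not be PAK even though its score is optimal.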

  15. Counting Both y and b Ions

  16. Good News • [Figure: the y-ions y1, y2, y3 shown together with the b-ions bn-3, bn-2, bn-1]

  17. Bad News

  18. Ions Determined By a Pair • Example: P=LGEY, Q=LLVR • score(P,Q) is the sum of matched peak intensities. • A peak can only count once.
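The "count once" rule can be illustrated with a small sketch. It assumes P is a prefix and Q a suffix of a peptide of known total residue mass M, so that a prefix of mass u implies a b-ion at u+1 and the complementary y-ion at M-u+19, and a suffix of mass v implies a y-ion at v+19 and the complementary b-ion at M-v+1. That reading, the nominal masses, and the names `ions_of_pair` and `pair_score` are assumptions for illustration.

```python
# Nominal (integer) residue masses in daltons; an illustrative simplification.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def ions_of_pair(prefix, suffix, total_mass):
    """Set of ion masses implied by a prefix and a suffix of the peptide."""
    ions = set()
    u = 0
    for residue in prefix:
        u += RESIDUE_MASS[residue]
        ions.add(u + 1)                  # b-ion of this prefix
        ions.add(total_mass - u + 19)    # y-ion of the complementary suffix
    v = 0
    for residue in reversed(suffix):
        v += RESIDUE_MASS[residue]
        ions.add(v + 19)                 # y-ion of this suffix
        ions.add(total_mass - v + 1)     # b-ion of the complementary prefix
    return ions


def pair_score(prefix, suffix, total_mass, peak_intensity):
    """Sum of matched peak intensities; the set ensures each peak counts once."""
    matched = ions_of_pair(prefix, suffix, total_mass)
    return sum(peak_intensity.get(mass, 0.0) for mass in matched)


# Hypothetical peptide DAP (residue mass 283): the b-ion of prefix D and the
# y-ion of suffix P both fall at mass 116, so that peak is counted only once.
print(pair_score("D", "P", 283, {116: 8.0}))
```

Because ions from the prefix side and the suffix side can land on the same peak, the scores of P and Q cannot simply be added, which is the difficulty the following slides address.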

  19. Chummy Pairs • Two strings P and Q are called a chummy pair iff either of the following two conditions is true: (C1) (C2)

  20. Recursive Computation of score(P,Q) P=LGEY Q=LLVR u=m(P), v=m(Q)

  21. Chummy pairs • Lemma 1 – Suppose P and Q are a chummy pair. u=m(P), v=m(Q). If (C1) is true, If (C2) is true,

  22. Chummy Pairs • Lemma 2 – Let (P,Q) be a chummy pair, a be a letter. • (C1) (P,aQ) is a chummy pair but (Pa,Q) is not. • (C2) (Pa,Q) is a chummy pair but (P,aQ) is not. • Lemma 3 – Let S be the optimal solution. Then there is a chummy pair (P,Q) and a letter a such that S=PaQ. Also, there is a chummy pair series such that

  23. Dynamic Programming • Combining Lemma 1, 2, 3, we can compute • Suppose (P,Q) is the pair maximizing DP(u,v) under the condition m(P)+m(Q)+m(a)=M. Then PaQ is the optimal peptide.

  24. Algorithm Sandwich • DP(0,0) = 0;DP(u,v) = -infinity for (u,v)!=(0,0); • for u from 1 to M/2 step d do for v from u-m(W) to u+m(W) step d do for a in Σ do if u<v then else • find u,v,a, s.t. u+v+m(a)=M and DP(u,v) maximized; • backtracking; Time:

  25. PEAKS – The Software

  26. Comparison • LCQ data (ion trap instrument): generously provided by Dr. Richard Johnson. 144 spectra. • Micromass Q-Tof data: measured in UWO’s Protein ID lab. 61 spectra. • Sciex Q-Star data: provided by U. Victoria’s Genome BC Proteomics Centre. 13 good/okay spectra.

  27. PEAKS vs. Lutefisk • completely correct sequences: 38/144 vs. 15/144 • correct amino acids: 1067/1702 vs. 767/1702 • partially correct sequences with 5 or more contiguous correct amino acids: 94/144 vs. 64/144

  28. PEAKS vs. Micromass PLGS • completely correct sequences: 13/61 vs. 7/61 • correct amino acids: 456/764 vs. 232/764 • partially correct sequences with 5 or more contiguous correct amino acids: 38/61 vs. 24/61

  29. PEAKS vs. Sciex BioAnalyst • completely correct sequences: 7/13 vs. 1/13 • correct amino acids: 115/150 vs. 86/150 • partially correct sequences with 5 or more contiguous correct amino acids: 12/61 vs. 7/61

  30. Users The company logos have been deleted from the original presentation. Please visit http://www.bioinformaticssolutions.com for a list of users.

  31. Other Techniques Used by PEAKS • Preprocessing of the MS/MS spectra: deconvolution, noise reduction, and signal enhancement. It does a better job than the spectrometer vendors’ software. • Recalibration: compress or stretch the spectrum to correct for calibration error. • Positional confidence: estimate the confidence level of individual amino acids.

  32. Sophisticated Ion Matching Score • Score of one peak matching b ion

  33. PEAKS 2.x’s Additional Feature • Identify the proteins by matching the de novo (partial) sequences. • Then further match the spectra with the peptides of the proteins.

  34. Collaborators and References • Sandwich algorithm: B. Ma, K. Zhang, C. Liang, CPM’03. • PEAKS: B. Ma, K. Zhang, C. Hendrie, C. Liang, M. Li, A. Doherty-Kirby, G. Lajoie, Rapid Comm. Mass Spec. (software features, score function, experiments) • Acknowledgement: PEAKS development team (Bioinformatics Solutions Inc.).
