
Algorithmic Problems in Peptide Sequencing


amaro

Presentation Transcript


  1. Algorithmic Problems in Peptide Sequencing

  2. De Novo Sequencing for Peptide Identification
     Outline
     • Basics of Proteomics
     • Roles and Anatomy of Proteins
     • Tandem Mass Spectrometry
     • Algorithms for Peptide Identification
     • De Novo Sequencing
     • An Algorithm for Perfect Spectra
     • Peptide Identification in the Real World
     • Discussions

  3. Briefings
     • We mainly focus on the following result:
       Ting Chen, Ming-Yang Kao, Matthew Tepel, John Rush, and George Church. A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry. Journal of Computational Biology, 8(3): 325-337, 2001.
     • A preliminary version also appears in the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2000), pages 389-398, 2000.
     • It is one of the most cited algorithm articles in the computational proteomics community.

  4. Outline
     • Basics of Proteomics
     • Roles and Anatomy of Proteins
     • Tandem Mass Spectrometry
     • Algorithms for Peptide Identification
     • De Novo Sequencing
     • An Algorithm for Perfect Spectra
     • An Improved Version
     • Peptide Identification in the Real World
     • Discussions

  5. Anatomy of Protein Molecules
     [Figure: an amino acid with side chain Rx (the basic building block, in its stable state in nature) and a residue (of the peptides), the unit it forms inside a neutral peptide.]

  6. Proteins and Peptides
     [Figure: trypsin (+ H2O) cleaves a protein chain of residues R1, R2, R3, ... on the C-terminal side of arginine (R) or lysine (K), yielding peptides with free H2N- and -COOH termini. Rectangles stand for amino acid residues.]
     Amino acid      Average mass (Da)   Residue mass (Da)
     K (lysine)      146.19              128.17
     R (arginine)    174.20              156.19
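The cleavage rule in the figure can be sketched in a few lines of Python. This is a minimal illustration and not part of the presented method. It assumes trypsin cuts after every K or R, ignoring the usual proline exception and missed cleavages. It uses standard monoisotopic residue masses instead of the average masses in the table. The function names and the toy protein "LGERAGKMW" are hypothetical.

```python
# Monoisotopic residue masses in daltons (an amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O


def tryptic_digest(protein: str) -> list[str]:
    """Split a protein sequence after every K or R (simplified trypsin rule)."""
    peptides, current = [], []
    for residue in protein:
        current.append(residue)
        if residue in "KR":  # cut on the C-terminal side of lysine/arginine
            peptides.append("".join(current))
            current = []
    if current:  # trailing peptide that does not end in K or R
        peptides.append("".join(current))
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral peptide mass: sum of residue masses plus one water for the termini."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


for pep in tryptic_digest("LGERAGKMW"):
    print(pep, round(peptide_mass(pep), 4))
```

The water term reflects the table above: each free amino acid is about 18 Da heavier than its residue, and a whole peptide regains exactly one water at its termini.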

  7. Amino Acid Molecules
     • Please visit http://www.ionsource.com/ for more information.

  8. Outline (same as slide 2)

  9. Tandem Mass Spectrometry
     [Figure: Sample -> Ionizer (+/-) -> Mass Analyzer -> Detector]
     • Mass spectrometers measure the mass-to-charge ratio (m/z) of charged ions.
     • A mass spectrometer has three major components: an ionizer, a mass analyzer, and a detector.
     Adapted from Nathan Edwards’ slides.

  10. Proteomics via Mass Spectrometers
      [Figure: enzymatic digest and fractionation -> first-stage MS -> precursor selection and dissociation -> MS/MS]
      Adapted from Nathan Edwards’ slides.

  11. Outline (same as slide 2)

  12. Peptide Identification
      • Given:
        • an MS/MS spectrum (the m/z and intensity of each peak, possibly with its retention time)
        • the precursor mass
      • Output:
        • the amino-acid sequence of the peptide
      • Analogy: imagine a deck of cards that you can cut many times. Each cut reveals only the sum of the upper half or of the lower half, and from these sums you must recover the individual cards.

  13. Peptide Fragmentation Mechanism
      [Figure: a peptide (N-terminus to C-terminus) breaks at a peptide bond into an N-terminal fragment (b-ion) and a C-terminal fragment (y-ion). The resulting b-ion and y-ion series are shown on an m/z axis.]
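The b- and y-ion series can be computed directly from residue masses. Below is a minimal Python sketch that assumes singly charged ions and monoisotopic masses (proton 1.00728 Da, water 18.01056 Da). The helper name `fragment_ions` is hypothetical, and only the four residues of the running example LGER are included.

```python
# Monoisotopic residue masses (Da) for the residues of the example peptide only.
RESIDUE_MASS = {"G": 57.02146, "L": 113.08406, "E": 129.04259, "R": 156.10111}
PROTON = 1.00728  # added to every singly charged fragment
WATER = 18.01056  # y-ions also carry the peptide's terminal H and OH (one water)


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Return ([b_1 .. b_(n-1)], [y_1 .. y_(n-1)]) as singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]              # prefixes
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]  # suffixes
    return b, y


b_ions, y_ions = fragment_ions("LGER")
print("b:", [round(m, 3) for m in b_ions])
print("y:", [round(m, 3) for m in y_ions])
```

For every cut, b_i + y_(n-i) equals the neutral peptide mass plus two protons, roughly PrecursorMass + 2. Slide 26 relies on this identity.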

  14. Peaks in a Spectrum
      • Peptide: L – G – E – R

  15. Manual De Novo Sequencing

  16. Outline (same as slide 2)

  17. De Novo Sequencing
      • De novo: Latin for "from the beginning" (anew).
      • In contrast, database search tools match spectra against known peptides.
      • Problem definition:
        • given a spectrum (a set of real intervals) and a mass value M,
        • compute a sequence P (an ordered list of real numbers, the residue masses)
        • such that m(P) = M and the matching score is maximized,
        • where m(P) is the sum of the residue masses.

  18. De Novo Sequencing: An Ideal Case
      • An ideal tandem mass spectrum is noise-free and contains only b- and y-ions, and every mass peak has the same height.
      • The task is to find paths connecting two endpoints in a directed acyclic graph.
      • The question is: how do we construct the ion ladder?

  19. Ion Ladders in an Ideal Case
      [Figure: b- and y-ion ladders (y1, y2, y3) of the peptide on an m/z axis.]
      • Given an ideal ion ladder, we can read off the sequence from the mass differences between consecutive prefixes (or suffixes).
      • However, we cannot determine the ion type of a peak before identifying it. For example, given only the ions L+, ER+, LGE+, and R+, the masses alone do not tell us which are b-ions and which are y-ions.
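Once a complete, noise-free b-ion ladder is known, reading the sequence only requires matching consecutive mass differences to residue masses. The sketch below follows the ideal-case assumptions of slide 18 and uses singly charged b-ions and monoisotopic masses. It assumes a tolerance of 0.02 Da and a reduced residue table. The name `read_b_ladder` is illustrative.

```python
# Reduced monoisotopic residue table (Da); a real tool would use all residues.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "L": 113.08406,
                "E": 129.04259, "R": 156.10111}
PROTON, WATER = 1.00728, 18.01056


def read_b_ladder(b_ions: list[float], precursor_mass: float, tol: float = 0.02) -> str:
    """Read a peptide from a complete, noise-free ladder of singly charged b-ions."""
    # Prefix residue masses: 0, b_1 - H+, ..., b_(n-1) - H+, and the total M - H2O.
    points = [0.0] + [b - PROTON for b in sorted(b_ions)] + [precursor_mass - WATER]
    sequence = []
    for left, right in zip(points, points[1:]):
        gap = right - left  # in a complete ladder each gap is exactly one residue
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        if not hits:
            raise ValueError(f"no residue matches mass gap {gap:.4f}")
        sequence.append(hits[0])
    return "".join(sequence)


print(read_b_ladder([114.09134, 171.11280, 300.15539], 473.25978))
```

The hard part, as the slide notes, is that a real peak list does not say which peaks belong to the b-ladder.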

  20. NC-Spectrum Model
      • We generate a superset of the ion ladder.
      • A trick: even if we cannot determine the ion types, we know that each ion is either a b-ion or a y-ion.
      • Assume that we want to generate the b-ion ladder:
        • if a peak is a b-ion, add the peak value to the list;
        • if a peak is a y-ion, add the complementary b-ion value to the list.
      • Since we do not know which case holds, we add both values, so this phase doubles the number of peaks.

  21. NC-Spectrum Model
      [Figure: observed peaks P1-P4 and artificial peaks Q1-Q4 on a line from 0 to m with midpoint m/2, labeled with the fragments L, LG, LGE, R, ER, GER.]
      • For the peptide LGER, we construct all possible b-ions with respect to the current spectrum.
      • {P1, Q3, P4} and {P2, P3, Q1} are both complete ladders.
      • Pi: observed peaks; Qi: artificial peaks.

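  22. NC-Spectrum Model
      • Given a peak list {P1, P2, P3, ..., Pk}, each peak Pi contributes two points on the line:
        • Pi - 1 (if Pi is a b-ion: remove the extra proton, about 1 Da)
        • Qi = M - Pi + 1 (if Pi is a y-ion)
      • Why Qi = M - Pi + 1? A y-ion carries its residues plus one water and one proton (18 + 1 = 19 Da), so its residues weigh Pi - 19. The complementary prefix therefore weighs (M - 18) - (Pi - 19) = M - Pi + 1.
      • We still have to add two endpoints:
        • 0
        • M - 18 (the total residue mass: the precursor mass minus one water)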
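The two vertex formulas translate directly into code. The sketch below assumes integer nominal masses, which keeps the arithmetic exact. `nc_vertices` is a hypothetical name. Duplicate coordinates are merged, so the vertex count is at most 2k + 2 rather than exactly 2k + 2.

```python
def nc_vertices(peaks: list[int], precursor_mass: int) -> list[int]:
    """NC-spectrum coordinates from nominal peak masses and neutral precursor mass M."""
    points = {0, precursor_mass - 18}          # the two endpoints
    for p in peaks:
        points.add(p - 1)                      # peak read as a b-ion
        points.add(precursor_mass - p + 1)     # peak read as a y-ion: complementary prefix
    return sorted(points)                      # a set, so duplicates are merged


print(nc_vertices([114, 304, 300, 175], 473))
```

Take LGER with nominal residue masses L = 113, G = 57, E = 129, R = 156, so M = 473. Its four ions from slide 19 are L+ = 114, ER+ = 304, LGE+ = 300, and R+ = 175. From these the sketch produces the vertices 0, 113, 170, 174, 299, 303, 360, 455. The true prefix masses 0, 113, 170, 299, 455 are among them.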

  23. NC-Spectrum Model: A Summary
      • We are given k peaks.
      • The graph therefore has at most 2k + 2 vertices.
      • Two vertices are adjacent if their coordinates differ by the mass of some amino acid. Edges point from the smaller to the larger coordinate.
      • The spectrum graph can be constructed in O(n^2) time for n vertices. (Why? Each of the O(n^2) vertex pairs is compared against a fixed table of 20 amino acid masses.)
      • De novo sequencing then searches for a good path (or paths) from coordinate 0 to M - 18.
      • Such a path is not necessarily an ion ladder, though. For example, it may use both points derived from the same peak.
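The O(n^2) construction compares every ordered vertex pair against the residue table. The sketch below again assumes integer nominal masses and uses a hypothetical function name. Residues with equal nominal mass (L/I, Q/K) share one edge label.

```python
# Nominal residue masses (Da); L/I and Q/K coincide at this resolution.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
           "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def spectrum_graph(vertices: list[int]) -> dict[int, list[tuple[int, str]]]:
    """Directed edges u -> v (u < v) labeled by the residue(s) of mass v - u."""
    by_mass: dict[int, list[str]] = {}
    for aa, m in NOMINAL.items():
        by_mass.setdefault(m, []).append(aa)
    vs = sorted(vertices)
    edges: dict[int, list[tuple[int, str]]] = {v: [] for v in vs}
    for i, u in enumerate(vs):              # O(n^2) ordered vertex pairs ...
        for v in vs[i + 1:]:
            residues = by_mass.get(v - u)   # ... each checked by one dict lookup
            if residues:
                edges[u].append((v, "/".join(residues)))
    return edges


graph = spectrum_graph([0, 113, 170, 174, 299, 303, 360, 455])
for u, out in graph.items():
    print(u, out)
```

On the LGER vertices this yields the ladder edges 0 -> 113 (L/I), 113 -> 170 (G), 170 -> 299 (E), and 299 -> 455 (R). It also yields 113 -> 299 (W, whose nominal mass 186 equals G + E), so more than one path from 0 to 455 exists.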

  24. Dynamic Programming Strategy
      [Figure: points P1-P4 and Q1-Q4 on a line from 0 to m with midpoint m/2.]
      • Dynamic programming can solve this problem efficiently.
      • Uni-directional (forward) DP does not work, since it could produce a solution that contains both candidates (Pi - 1 and Qi) of the same peak.

  25. Dynamic Programming Strategy (Cont’d)
      [Figure: points P1-P4 and Q1-Q4 on a line from 0 to m with midpoint m/2.]
      • Dynamic programming can solve this problem efficiently using a different encoding scheme.
      • We approach the middle from both ends at once.

  26. Dynamic Programming Strategy (Cont’d)
      • Mass(b-ion) + Mass(y-ion) = PrecursorMass + 2, for the b- and y-ion produced by the same cleavage.
      • Hence the two b-ion candidates derived from each peak form nested pairs in the spectrum graph.
      [Figure: nested pairs on a line from 0 to m with midpoint m/2.]
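The identity follows from the ion compositions. Below is a short derivation, assuming singly charged ions and nominal masses (proton 1 Da, water 18 Da). Here r_1, ..., r_n are the residue masses of an n-residue peptide and M is its neutral precursor mass. The last line applies the same bookkeeping to the two vertices P - 1 and M - P + 1 generated from one peak.

```latex
\begin{aligned}
b_i &= \sum_{t=1}^{i} r_t + 1, \qquad
y_{n-i} = \sum_{t=i+1}^{n} r_t + 19, \qquad
M = \sum_{t=1}^{n} r_t + 18, \\
b_i + y_{n-i} &= \sum_{t=1}^{n} r_t + 20 = M + 2, \\
(P - 1) + (M - P + 1) &= M.
\end{aligned}
```

Every pair sums to M, so one member lies below M/2 and the other above it. Sorting the smaller members in increasing order therefore sorts the larger ones in decreasing order, which produces the nesting.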

  27. Relabeling the Vertices
      • To encode the spectrum graph by the nested pairs, we relabel the vertices in increasing order of mass:
        {0 = x0, x1, x2, …, xk, yk, …, y2, y1, y0 = m}
      • xi and yi are both generated from the same peak.
      • Each iteration goes one level (one pair) further toward the middle.
      [Figure: x0 … xk and yk … y0 on a line from 0 to m with midpoint m/2.]
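The relabeling can be sketched as below. The sketch assumes nominal integer masses and the vertex formulas of slide 22. It also assumes that a peak and its complementary ion, which yield the same pair, are merged into one pair. `relabel` is a hypothetical name.

```python
def relabel(peaks: list[int], precursor_mass: int) -> tuple[list[int], list[int]]:
    """Return (xs, ys) with x0 = 0, y0 = M - 18 and (x_i, y_i) taken from one peak."""
    pairs = set()
    for p in peaks:
        a, b = p - 1, precursor_mass - p + 1   # b-ion and y-ion readings of the peak
        pairs.add((min(a, b), max(a, b)))      # complementary ions give the same pair
    ordered = sorted(pairs)                    # x ascending, hence y descending
    xs = [0] + [x for x, _ in ordered]
    ys = [precursor_mass - 18] + [y for _, y in ordered]
    return xs, ys


xs, ys = relabel([114, 304, 300, 175], 473)
print(xs + ys[::-1])  # x0, x1, ..., xk, yk, ..., y1, y0
```

For the LGER peaks, the peaks LGE+ = 300 and R+ = 175 collapse into the single pair (174, 299). This leaves k = 3 pairs.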

  28. How Dynamic Programming Works
      • We use a |V|×|V| matrix M to represent partial path candidates.
      • M(i, j) = 1 iff the partial paths [x0, xi] and [yj, y0] can occur simultaneously in a legal path,
        i.e., for every 1 ≤ s ≤ max(i, j), exactly one of xs and ys occurs in the two partial paths.
      [Figure: partial paths from x0 to xi and from yj to y0; the part between xi and yj is still undetermined.]

  29. How Dynamic Programming Works (Cont’d)
      [Figure: vertices x0, x1, x2, x3, x4, y4, y3, y2, y1, y0 on a line from 0 to m with midpoint m/2.]
      • M(0,0) = 1: x0 and y0
      • M(0,1) = 1: x0 and y1 -> y0
      • M(1,0) = 1: x0 -> x1 and y0

  30. How Dynamic Programming Works (Cont’d)
      [Figure: the same vertex line, with the known entries M(0,1) = 1 (x0 and y1 -> y0) and M(1,0) = 1 (x0 -> x1 and y0).]
      • M(2,0) = 0 (x0 -> x1 -> x2 and y0): M(1,0) = 1, but x2 is adjacent to neither x0 nor x1.
      • M(2,1) = 1 (x0 -> x2 and y1 -> y0): M(0,1) = 1, and x2 is adjacent to x0.

  31. How Dynamic Programming Works (Cont’d)
      [Figure: the same vertex line, with the known entries M(0,1) = 1 (x0 and y1 -> y0) and M(1,0) = 1 (x0 -> x1 and y0).]
      • M(0,2) = 0 (x0 and y2 -> y1 -> y0): M(0,1) = 1, but y2 is adjacent to neither y0 nor y1.
      • M(1,2) = 1 (x0 -> x1 and y2 -> y0): M(1,0) = 1, and y2 is adjacent to y0.

  32. Dynamic Programming: Preview
      • In the i-th iteration, we determine and record all possible partial paths in [0, xi] and [yi, m].
      [Figure: partial paths x0 … xi-1 and yt … y0 (t < i-1); the next vertex is either xi or yi.]

  33. Dynamic Programming: Preview (Cont’d): Path Extension
      • How can we reach yi?
      • To calculate M(xj, yi) for all j < i:
        • for every j < i, check whether yi is adjacent to yt and M(xj, yt) = 1 for some t < i;
        • if so, M(xj, yi) = 1; otherwise, it is 0.
      [Figure: partial paths x0 … xj and yt … y0, with the right path extended to yi.]

  34. Dynamic Programming: Preview (Cont’d): Path Extension
      • Similarly, how can we reach xi?
      • To calculate M(xi, yj) for all j < i:
        • for every j < i, check whether xi is adjacent to xt and M(xt, yj) = 1 for some t < i;
        • if so, M(xi, yj) = 1; otherwise, it is 0.
      [Figure: partial paths x0 … xt and yj … y0, with the left path extended to xi.]

  35. Dynamic Programming
      [Figure: vertices x0, x1, x2, x3, x4, y4, y3, y2, y1, y0 on a line from 0 to m with midpoint m/2.]

  36. Dynamic Programming: Initialization
      [Figure: vertices x0, x1, x2, x3, x4, y4, y3, y2, y1, y0 on a line from 0 to m with midpoint m/2.]

  37. Dynamic Programming: 1st Iteration
      • We then compute M(1,0) and M(0,1) by checking the arcs (x0, x1) and (y1, y0).
      [Figure: vertices x0, x1, x2, x3, x4, y4, y3, y2, y1, y0 on a line from 0 to m with midpoint m/2.]

  38. Dynamic Programming: Recursion (a)
      For j = 2 to k
        For i = 0 to j-2
          (a) If M(i, j-1) = 1 and edge(xi, xj) = 1, then M(j, j-1) = 1.
      • Can we extend the left partial path to xj?
      [Figure: vertices x0, …, x4, y4, …, y0 on a line from 0 to m with midpoint m/2.]

  39. Dynamic Programming: Recursion (b)
      For j = 2 to k
        For i = 0 to j-2
          (b) If M(i, j-1) = 1 and edge(yj, yj-1) = 1, then M(i, j) = 1.
      • Can we extend the right partial path to yj?
      [Figure: vertices x0, …, x4, y4, …, y0 on a line from 0 to m with midpoint m/2.]

  40. Dynamic Programming: Recursion (c)
      For j = 2 to k
        For i = 0 to j-2
          (c) If M(j-1, i) = 1 and edge(xj-1, xj) = 1, then M(j, i) = 1.
      • Can we extend the left partial path to xj?
      [Figure: vertices x0, …, x4, y4, …, y0 on a line from 0 to m with midpoint m/2.]

  41. Dynamic Programming: Recursion (d)
      For j = 2 to k
        For i = 0 to j-2
          (d) If M(j-1, i) = 1 and edge(yj, yi) = 1, then M(j-1, j) = 1.
      • Can we extend the right partial path to yj?
      [Figure: vertices x0, …, x4, y4, …, y0 on a line from 0 to m with midpoint m/2.]

  42. Dynamic Programming (Cont’d): Now for j = 3
      [Figure: vertices x0, …, x4, y4, …, y0 on a line from 0 to m with midpoint m/2.]

  43. Dynamic Programming (Cont’d): Now for j = 4
      [Figure: vertices x0, …, x4, y4, …, y0 on a line from 0 to m with midpoint m/2.]

  44. Dynamic Programming: Constructing the Answer
      • A legal path exists if some entry M(i, j) with max(i, j) = k equals 1 and xi is adjacent to yj.
      • Start the search from the last row/column of M (index k) and backtrack pair by pair:
        [x4, y4] -> [x3, y3] -> [x2, y2] -> [x1, y1]
      • Backtracking through M recovers each edge of the feasible solution.
      [Figure: vertices x0, …, x4, y4, …, y0 on a line from 0 to m with midpoint m/2.]
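The following Python sketch puts slides 27 to 44 together. It fills M with recursions (a) to (d) and then backtracks through stored predecessor states. It is an illustration under stated assumptions, not Chen et al.'s code:
- masses are integer nominal values;
- peaks that map to the same vertex pair (an ion and its complementary ion) are merged;
- edges come from a 20-residue table;
- the helper names and the parent-pointer bookkeeping are hypothetical choices.

Following the ideal-case definition of M, every pair must contribute exactly one vertex, so a single spurious peak makes the search fail.

```python
from typing import Optional

# Nominal (integer) residue masses; an assumption that keeps comparisons exact.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
           "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}
RESIDUE_MASSES = set(NOMINAL.values())


def edge(u: int, v: int) -> bool:
    """True if a single residue bridges coordinate u to the larger coordinate v."""
    return (v - u) in RESIDUE_MASSES


def relabel(peaks: list[int], precursor_mass: int) -> tuple[list[int], list[int]]:
    """x0 = 0, y0 = M - 18; (x_i, y_i) is the sorted pair {P - 1, M - P + 1} of one peak."""
    pairs = sorted({tuple(sorted((p - 1, precursor_mass - p + 1))) for p in peaks})
    return [0] + [x for x, _ in pairs], [precursor_mass - 18] + [y for _, y in pairs]


def de_novo_ideal(peaks: list[int], precursor_mass: int) -> Optional[list[int]]:
    """Return the prefix-mass path 0 -> ... -> M - 18, or None if no legal path exists."""
    xs, ys = relabel(peaks, precursor_mass)
    k = len(xs) - 1
    # A key (i, j) in `parent` means M(i, j) = 1; its value is the state it came from.
    parent: dict = {(0, 0): None}
    if k >= 1 and edge(xs[0], xs[1]):
        parent[(1, 0)] = (0, 0)
    if k >= 1 and edge(ys[1], ys[0]):
        parent[(0, 1)] = (0, 0)
    for j in range(2, k + 1):
        for i in range(j - 1):              # i = 0 .. j-2
            if (i, j - 1) in parent:
                if edge(xs[i], xs[j]):      # (a) left path jumps to x_j
                    parent.setdefault((j, j - 1), (i, j - 1))
                if edge(ys[j], ys[j - 1]):  # (b) right path extends to y_j
                    parent.setdefault((i, j), (i, j - 1))
            if (j - 1, i) in parent:
                if edge(xs[j - 1], xs[j]):  # (c) left path extends to x_j
                    parent.setdefault((j, i), (j - 1, i))
                if edge(ys[j], ys[i]):      # (d) right path jumps to y_j
                    parent.setdefault((j - 1, j), (j - 1, i))
    # All pairs are used once when max(i, j) = k; close the gap x_i -> y_j.
    end = next((s for s in parent if max(s) == k and edge(xs[s[0]], ys[s[1]])), None)
    if end is None:
        return None
    left, right = [], []
    state = end
    while parent[state] is not None:
        prev = parent[state]
        if state[0] != prev[0]:
            left.append(xs[state[0]])       # this step added x_i to the left path
        else:
            right.append(ys[state[1]])      # this step added y_j to the right path
        state = prev
    return [xs[0]] + left[::-1] + right + [ys[0]]


def to_sequence(path: list[int]) -> list[str]:
    """Translate consecutive mass differences into residue labels (e.g. 'L/I')."""
    by_mass: dict = {}
    for aa, m in NOMINAL.items():
        by_mass.setdefault(m, []).append(aa)
    return ["/".join(by_mass[v - u]) for u, v in zip(path, path[1:])]


path = de_novo_ideal([114, 304, 300, 175], 473)
print(path, to_sequence(path))
```

On the LGER example (peaks 114, 304, 300, 175; M = 473), the sketch returns the path [0, 113, 170, 299, 455], i.e., L/I, G, E, R. The alternative 0 -> 113 -> 299 -> 455 (L, W, R) is rejected because it skips the pair (170, 303) entirely.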

  45. Dynamic Programming: Review
      • Chen et al. create a new NC-spectrum graph G = (V, E), where |V| = 2k + 2 and k is the number of mass peaks (ions).
      • Given the NC-spectrum graph, we solve the ideal de novo peptide sequencing problem in two steps:
        • constructing M: O(|V|^2) time;
        • constructing a feasible solution: O(|V|) time.
      • Therefore we find a feasible solution in O(|V|^2) time and O(|V|^2) space.

  46. Outline (same as slide 2)

  47. Noise in Real Spectra
      • The de novo strategy is too fragile to handle frequent errors.
      • False negative peaks:
        • Missing ions break the path. The algorithms may then find wrong paths by concatenating two partial paths.
      • False positive peaks:
        • the main critique of the de novo strategy.
      • Peak values are not ion masses:
        • Peak values represent the mass-to-charge ratio (m/z) of ions.
        • Interpreting them relies on the instrument vendor (e.g., Applied Biosystems).
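When the charge state z of an ion is known, its m/z value can be converted back to a mass before the spectrum graph is built. Below is a minimal sketch assuming positive-mode, protonated ions [M + zH]^z+ and a proton mass of 1.00728 Da. The function names are hypothetical, and in practice the charge state itself must first be determined.

```python
PROTON = 1.00728  # Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of a protonated ion [M + zH]^z+ observed at the given m/z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON


def singly_charged_mz(mz: float, charge: int) -> float:
    """The same ion expressed as its singly charged equivalent [M + H]^+."""
    return neutral_mass(mz, charge) + PROTON


print(round(neutral_mass(237.63717, 2), 5))
```

A doubly charged LGER precursor would appear near m/z 237.64. Without the charge state, that peak could be mistaken for a much lighter singly charged ion.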

  48. False Positives in Real Spectra
      • Different types of ions
        • a/x, b/y, and c/z ion pairs
        • internal fragments and immonium ions
      • Neutral losses
        • loss of water (~18 Da)
        • loss of ammonia (~17 Da)
      • Post-translational modifications (PTMs), which act like new letters in the alphabet
        • phosphorylation, glycopeptides
      • Isotopes
      • Unpurified samples

  49. Database Search Tools
      • MASCOT: http://www.matrixscience.com/
        • the de facto standard identification tool

  50. Database Search Tools (Cont’d)
      • Brian Searle of Proteome Software informs us:
