1 / 115

Week 4: Functional Genomics via Mass Spectrometry

This article explores the use of mass spectrometry for functional genomics, specifically identifying and quantifying proteins in the proteome.

scala
Download Presentation

Week 4: Functional Genomics via Mass Spectrometry

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Week 4: Functional Genomics via Mass Spectrometry Bafna

  2. Peptide MS • Instrument software usually detects peaks, and computes features (peak, area, m/z…) m/z Bafna

  3. Nobel Citation 2002 Bafna

  4. Nobel Citation, 2002 Bafna

  5. MS based proteomics • Identification • Identify all the proteins in the proteome, specific organelles, specific pathways, complexes… • Quantitation • Is a protein differentially-expressed in certain conditions? • Others • Protein 3D structure, protein protein interactions,… We will consider an informatics-centered perspective Bafna

  6. MS versus Micro-array sample sample cDNA Protein/Peptide? • Unlike micro-array, peptide id is not trivial at the end of the MS experiment! • Identification is an important part of pre-processing Bafna

  7. Proteomics via MS Enzymatic Digestion (Trypsin) + Fractionation Q: Sufficient to identify peptides? Bafna

  8. Protein Identification • Is identifying peptides sufficient? • Rough probability for co-occurrence of a 15-aa peptide? With higher accuracy instruments, it may be possible to do intact proteins as well. Bafna

  9. The peptide backbone • The peptide backbone breaks to form • fragments with characteristic masses. • Amino-acid mass? • Residue mass? (aa mass – water) H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH Ri-1 Ri Ri+1 C-terminus N-terminus AA residuei-1 AA residuei+1 AA residuei Bafna

  10. Ionization H+ H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH Ri-1 Ri Ri+1 C-terminus N-terminus AA residuei-1 AA residuei+1 AA residuei The peptide backbone breaks to form fragments with characteristic masses. Ionized parent peptide Bafna

  11. Mass Spectrum Mass Spectrometry m/z Bafna

  12. Tandem MS of peptides Secondary Fragmentation Ionized parent peptide Bafna

  13. Tandem MS: Ionization H+ H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH Ri-1 Ri Ri+1 C-terminus N-terminus AA residuei-1 AA residuei+1 AA residuei The peptide backbone breaks to form fragments with characteristic masses. Ionized parent peptide Bafna

  14. Fragment ion generation The peptide backbone breaks to form fragments with characteristic masses. H+ H...-HN-CH-CONH-CH-CO-NH-CH-CO-…OH Ri Ri+1 Ri-1 C-terminus N-terminus AA residuei-1 AA residuei AA residuei+1 Ionized peptide fragment Bafna

  15. Ionization basics • Residue: amino-acid minus water • Prefix residue mass at a break (PRM) = sum of residue masses • Charged ions can be represented as offsets from PRMs Bafna

  16. Tandem MS for Peptide ID 88 145 292 405 534 663 778 907 1020 1166 b ions S G F L E E D E L K 1166 1080 1022 875 762 633 504 389 260 147 y ions 100 % Intensity [M+2H]2+ 0 250 500 750 1000 Bafna m/z

  17. Peak Assignment 88 145 292 405 534 663 778 907 1020 1166 b ions S G F L E E D E L K 1166 1080 1022 875 762 633 504 389 260 147 y ions y6 100 Peak assignment implies Sequence (Residue tag) Reconstruction! y7 % Intensity [M+2H]2+ y5 b3 b4 y2 y3 b5 y4 y8 b8 b9 b6 b7 y9 0 250 500 750 1000 Bafna m/z

  18. Database Searching for peptide ID • For every peptide from a database • Reject if it has the wrong mass, else: • Generate a hypothetical spectrum • Compute a correlation between observed and experimental spectra • Choose the best • Database searching is very powerful and is the de facto standard for MS. • Sequest, Mascot, Inspect, and many others …SARLSQETFSDLWKLLPENNVLSPLP…. Bafna

  19. Modules for Peptide Id D S V I/F • Interpretation (D) • Input Spectrum • Output: all that can be extracted from the spectrum (peptides/tags/parent mass/charge) • Indexing/Filtering • Input: Db (set of peptides) • Output: pre-processing of the database, peptide subset. • Scoring • Input; peptide set, spectrum • Output: ranked list of scores • Validation • Significance of the top hit. Db

  20. De novo interpretation of mass spectra D S V I/F • The so called de novo algorithms focus exclusively on the D module. • There is no database (I/F). • Limited scoring and validation • Important when no database exists! • Also important for db search

  21. De Novo Interpretation: Example 100 200 300 400 500 0 88 145 274 402 b-ions S G E K 420 333 276 147 0 y-ions y 2 y 1 b 1 b 2 M/Z

  22. The simplest case • Suppose only (and all) the prefix ions were visible. Would identification be easy? • We have two problems: • There is a mix of b and y ions. Separating them is critical! • Other ions besides b,y, including neutral losses, noise and so on. We need to account for them. 0 88 145 274 402 b-ions S G E K 420 333 276 147 0 y-ions S 88 G 145 E 274 K 402

  23. Separating b-, and y-ions is solved using a combinatorial formulation (forbidden pairs) • Separating b,y from all others is solved using a statistical approach. • Together, they form the basis for a de novo sequencer.

  24. De Novo Interpretation: Example 100 200 300 400 500 0 88 145 274 402 b-ions S G E K 420 333 276 147 0 y-ions Ion Offsets b=P+1 y=S+19=M-P+19 y 2 y 1 b 1 b 2 M/Z

  25. Computing possible prefixes • We know the parent mass M=401. • Consider a mass value 88 • Assume that it is a b-ion, or a y-ion • If b-ion, it corresponds to a prefix of the peptide with residue mass 88-1 = 87. • If y-ion, y=M-P+19. • Therefore the prefix has mass • P=M-y+19= 401-88+19=332 • Compute all possible Prefix Residue Masses (PRM) for all ions.

  26. Putative Prefix Masses • Only a subset of the prefix masses are correct. • The correct mass values form a ladder of amino-acid residues Prefix Mass M=401by 88 87 332 145 144 275 147 146 273 276 275 144 S G E K 0 87 144 273 401

  27. Spectral Graph • Each prefix residue mass (PRM) corresponds to a node. • Two nodes are connected by an edge if the mass difference is a residue mass. 87 G 144

  28. Spectral Graph 0 273 332 401 87 144 146 275 100 200 300 S G E K • Each peak, when assigned to a prefix/suffix ion type generates a unique prefix residue mass. • Spectral graph: • Each node u defines a putative prefix residue M(u). • (u,v) in E if M(v)-M(u) is the residue mass of an a.a. (tag) or 0. • Paths in the spectral graph correspond to a interpretation

  29. Re-defining de novo interpretation 0 273 332 401 87 144 146 275 100 200 300 S G E K • Find a subset of nodes in spectral graph s.t. • 0, M are included • Each peak contributes at most one node (interpretation)(*) • Each adjacent pair (when sorted by mass) is connected by an edge (valid residue mass) • An appropriate objective function (ex: the number of peaks interpreted) is maximized 87 G 144

  30. Two problems 0 273 332 401 87 144 146 275 100 200 300 S G E K • Too many nodes. • A. Only a small fraction correspond to b/y ions (leading to true PRMs). • B. Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem). • In general, the forbidden pairs problem is NP-hard

  31. However,.. • The b,y ions have a special non-interleaving property • Consider pairs (b1,y1), (b2,y2) • Note that b1+y1 = b2+y2 • If (b1 < b2), then y1 > y2

  32. Non-Intersecting Forbidden pairs 100 0 400 200 • If we consider only b,y ions, ‘forbidden’ node pairs are non-intersecting, • The de novo problem can be solved efficiently using a dynamic programming technique. 332 300 87 S G E K

  33. The forbidden pairs method • There may be many paths that avoid forbidden pairs. • We choose a path that maximizes an objective function, • EX: the number of peaks interpreted • Here we assume a function , which gives a score to a PRM. The score captures the likelihood that the PRM is correct.

  34. The forbidden pairs method 332 100 300 0 400 200 87 • Sort the PRMs according to increasing mass values. • For each node u, f(u) represents the forbidden pair • Let m(u) denote the mass value of the PRM. f(u) u

  35. D.P. for forbidden pairs • Consider all pairs u,v • m[u] <= M/2, m[v] >M/2 • Define S(u,v) as the best score of a forbidden pair path from 0->u, v->M • Is it sufficient to compute S(u,v) for all u,v? 332 100 300 0 400 200 87 u v

  36. D.P. for forbidden pairs • Note that the best interpretation is given by 332 100 300 0 400 200 87 u v

  37. D.P. for forbidden pairs • Denote the forbidden pair of node v by f(v). • What is f(f(v))? • Note that we have one of two cases. • Either u < f(v) (and f(u) > v) • Or, u > f(v) (and f(u) < v) • Case 1. • Extend v, do not touch f(u) 100 300 0 f(u) 400 200 u v w

  38. The complete algorithm for all u /*increasing mass values from 0 to M/2 */ for all v /*decreasing mass values from M to M/2 */ if (u > f[v]) else if (u < f[v]) If (u,v)E /*maxI is the score of the best interpretation*/ maxI = max {maxI,S[u,v]}

  39. De Novo: Second issue • Given only b,y ions, a forbidden pairs path will solve the problem. • However, recall that there are MANY other ion types. • Typical length of peptide: 15 • Typical # peaks? 50-150? • #b/y ions? • Most ions are “Other” • a ions, neutral losses, isotopic peaks….

  40. De novo: Weighting nodes in Spectrum Graph • Factors determining if the ion is b or y • Intensity • Support ions • b- and y-ions are the most likely ions to lose water/ammonia • Isotopic peaks

  41. Offset frequency function • b, and y-ions show offsets due to neutral losses

  42. A simple example of a Bayesian scoring model • Classify all peaks as absent (X), low intensity, or high-intensity. • Suppose we see the following supporting peaks for a mass value • Low b • High y • Absent a • Low y-18 • We are interested in Pr(m| supporting peaks) • Through a Bayesian inversion, we say that • Pr(supporting peaks|m) αPr(m| supporting peaks) Pr(m) Bafna/Ideker

  43. A simple example of a Bayesian scoring model b a y y-H2O Bafna/Ideker

  44. A simple example of a Bayesian scoring model b a y y-H2O Bafna/Ideker

  45. BN scoring • Create tables of joint occurrences of ions using previously annotated spectra • Use these to score the BN Bafna/Ideker

  46. Weighting nodes • A probabilistic network to model support ions (Pepnovo)

  47. De Novo Interpretation Summary • The main challenge is to separate b/y ions from everything else (weighting nodes), and separating the prefix ions from the suffix ions (Forbidden Pairs). • As always, the abstract idea must be supplemented with many details. • Noise peaks, incomplete fragmentation • A PRM is first scored on its likelihood of being correct, and the forbidden pair method is applied subsequently.

  48. A quick survey of other modules D S V I/F • The D module is the only module used in de novo algorithms • In database search, most of the attention is devoted to scoring and validation. • A variant of sequence alignment can be used for scoring, along with statistical inference. Db

  49. A brief overview of scoring with PTMs • Each node corresponds to the match of a peak with a sequence prefix, and is associated with a score. • The score of each node is defined through a statistical learning of spectral peaks. • The score of a peptide is the best scoring path, and can be computed using DP. • Shifts correspond to post-translational modifications • Score features: • Peak intensity • Residue composition • Support from neutral losses Bafna

  50. The I/F module D S V I/F • De novo tags can be used to index the database. Only the peptides that match the tags are selected for scoring.

More Related