
Protein Sequencing and Identification by Mass Spectrometry



  1. Protein Sequencing and Identification by Mass Spectrometry

  2. Outline • Tandem Mass Spectrometry • De Novo Peptide Sequencing • Spectrum Graph • Protein Identification via Database Search • Identifying Post Translationally Modified Peptides • Spectral Convolution • Spectral Alignment

  3. Different Amino Acids Have Different Masses [Figure: peptide backbone H...-HN-CH(Ri-1)-CO-NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH, running from the N-terminus to the C-terminus, with side chains Ri-1, Ri, Ri+1 on amino acid residues i-1, i, i+1]

  4. Peptide Fragmentation: Collision Induced Dissociation [Figure: protonated (H+) peptide backbone H...-HN-CH(Ri-1)-CO . . . NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH, broken into a prefix fragment and a suffix fragment] • Peptides tend to fragment along the backbone. • A mass spectrometer is a sophisticated (and rather expensive!) scale that measures the masses of these fragments.

  5. Breaking Protein into Peptides and Peptides into Fragment Ions • Most mass spectrometers can only measure masses of short peptides (e.g., 20 amino acids) rather than masses of entire proteins (usually hundreds of amino acids). That’s why: • Proteases, e.g. trypsin, break proteins into short peptides. • A tandem mass spectrometer further breaks the peptides down into fragment ions and measures the mass of each piece. • The mass spectrometer accelerates the fragment ions; heavier ions accelerate more slowly than lighter ones. • The mass spectrometer measures the mass/charge ratio of an ion.

  6. N- and C-terminal Peptides [Figure: a five-residue peptide built from G, P, F, N, and A is cut at each backbone position into an N-terminal peptide and the complementary C-terminal peptide]

  7. Terminal peptides and ion types • Peptide G P F N (H2O): Mass (Da) 57 + 97 + 147 + 114 = 415

  8. Masses of fragment ions • Peptide G P F N (H2O): Mass (Da) 57 + 97 + 147 + 114 = 415 • Peptide G P F N without H2O: Mass (Da) 57 + 97 + 147 + 114 – 18 = 397

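  9. N- and C-terminal Peptides [Figure: the full peptide (mass 486) and its N-terminal and C-terminal peptides, labeled with their masses] • Complementary pairs of terminal peptides, each summing to 486: 71 and 415, 301 and 185, 332 and 154, 429 and 57

  10. N- and C-terminal Peptides [Figure: the same terminal peptides shown by mass only] • 486; 71 and 415; 301 and 185; 332 and 154; 429 and 57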

  11. Theoretical Spectrum • Reconstruct peptide from the set of masses of fragment ions (mass-spectrum): 57 71 154 185 301 332 415 429 486
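The nine masses on this slide can be reproduced with a short sketch. It assumes the integer residue masses used on the slides and assumes the peptide reads G-P-F-N-A from the N-terminus (the transcript lists the residues but does not fix their order); the function name `theoretical_spectrum` is illustrative, not from the lecture.

```python
# Integer residue masses (Da), as used on the slides.
RESIDUE_MASS = {"G": 57, "A": 71, "P": 97, "N": 114, "F": 147}


def theoretical_spectrum(peptide):
    """Return the sorted masses of all N-terminal and C-terminal peptides."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    spectrum = set()
    prefix = 0
    for m in masses:
        prefix += m
        spectrum.add(prefix)              # N-terminal (prefix) peptide
        if prefix < total:
            spectrum.add(total - prefix)  # complementary C-terminal (suffix) peptide
    return sorted(spectrum)


# Assumed residue order: G at the N-terminus, A at the C-terminus.
print(theoretical_spectrum("GPFNA"))
# [57, 71, 154, 185, 301, 332, 415, 429, 486]
```

Reading the peptide in the opposite direction (ANFPG) swaps prefixes and suffixes but yields the same set of masses, which is one reason reconstruction from masses alone is ambiguous.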

  12. Reconstructing Peptides Reconstruct peptide from the set of masses of fragment ions (mass-spectrum) 57 71 154 185 301 332 415 429 486

  13. Reconstructing Peptides • Reconstruct peptide from the set of masses of fragment ions • (mass-spectrum) • 57 71 81 100 112 131 154 160 172 177 185 201 221 235 301 312 325 332 370 387 409 415 423 429 460 472 486

  14. Reconstructing Peptides • Reconstruct peptide from the set of masses of fragment ions • (mass-spectrum) • 57 71 81 100 112 131 160 172 177 185 201 221 235 301 312 325 370 387 409 415 423 429 460 472 486

  15. Peptide Fragmentation [Figure: backbone of a four-residue peptide (side chains R1 to R4, NH3+ at one end and COOH at the other) annotated with N-terminal fragment ions a2, b2, a3, b3, C-terminal fragment ions y1, y2, y3, and neutral-loss ions b2-H2O, b3-NH3, y2-NH3, y3-H2O]

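  16. Mass Spectra [Figure: spectrum of the peptide G V D L K on a mass axis starting at 0; peak spacings identify residues, e.g. 57 Da = ‘G’, 99 Da = ‘V’; one peak is shifted by H2O] • The peaks in the mass spectrum: • Prefix and Suffix Fragments. • Fragments with neutral losses (-H2O, -NH3) • Noise and missing peaks.

  17. Protein Identification with MS/MS [Figure: MS/MS spectrum (intensity vs. mass, starting at mass 0) of the peptide G V D L K] • Peptide Identification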

  18. Tandem Mass Spectrum • Tandem Mass Spectrometry mainly generates N- and C-terminal fragment ions • Chemical noise often complicates the spectrum. • Represented in 2-D: mass/charge axis vs. intensity axis

  19. Tandem Mass-Spectrometry

  20. Breaking Proteins into Peptides [Figure: the protein MPSERGTDIMRPAKID...... is broken into the peptides MPSER, GTDIMR, PAKID, ……, which pass through HPLC to MS/MS]

  21. Mass Spectrometry Matrix-Assisted Laser Desorption/Ionization (MALDI) From lectures by Vineet Bafna (UCSD)

  22. Tandem Mass Spectrometry [Figure: LC, ion source, MS-1, collision cell, MS-2; an MS scan (Scan 1707) is followed by an MS/MS scan (Scan 1708)]

  23. Protein Identification by Tandem Mass Spectrometry (MS/MS) [Figure: an MS/MS instrument produces a spectrum that is interpreted as a sequence] • Database search: Sequest, Mascot, etc. • De novo interpretation: Lutefisk, Peaks, etc.

  24. De Novo vs. Database Search [Figure: the same spectrum (peaks labeled W R V A L T G E P L K C W D T) is interpreted by both approaches; each reports a mass and a score and arrives at AVGELTK] • Database Search: database of known peptides MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN.. • De Novo: database of all peptides = 20^n: AAAAAAAA, AAAAAAAC, AAAAAAAD, AAAAAAAE, AAAAAAAG, AAAAAAAF, AAAAAAAH, AAAAAAI, AVGELTI, AVGELTK, AVGELTL, AVGELTM, YYYYYYYS, YYYYYYYT, YYYYYYYV, YYYYYYYY

  25. De Novo vs. Database Search: A Paradox • The database of all peptides is huge ≈ 20^n peptides of length n • The database of all known peptides is much smaller ≈ 10^8 peptides • However, de novo algorithms can be much faster, even though their search space is much larger! • A database search scans all peptides in the database of all known peptides to find the best one. • De novo eliminates the need to scan the database of all peptides by modeling the problem as a graph search.
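To get a feel for the two search-space sizes on this slide, the following sketch finds the peptide length at which 20^n first exceeds 10^8. The function name is illustrative, and the 10^8 figure is the slide's rough estimate rather than a measured database size.

```python
def first_length_exceeding(db_size, alphabet_size=20):
    """Smallest peptide length n with alphabet_size**n > db_size."""
    n = 1
    while alphabet_size ** n <= db_size:
        n += 1
    return n


KNOWN_PEPTIDES = 10 ** 8  # order-of-magnitude estimate taken from the slide
n = first_length_exceeding(KNOWN_PEPTIDES)
print(n, 20 ** n)  # 7 1280000000
```

Already at length 7 the space of all peptides is larger than the database of known peptides, and it grows by a factor of 20 with every additional residue.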

  26. Three Algorithmic Problems • Searching for a million words in a text. Suppose it takes 1 sec to find a word in a text. How much time would it take to find 1 million words in the text? • Searching for a word without even looking at 99.999% of the text. Suppose you search for a word in a text. Would it be possible to ignore 99.999% of the text, scan only the remaining part and guarantee that the word you are looking for will be found? • Finding spelling errors in a book written in an unknown language. Given a book (in an unknown language) and a misspelled word (with insertions, deletions, and substitutions of letters) correct spelling errors in the word.

  27. Three Algorithmic Problems • Searching for a million words in a text. Suppose it takes 1 sec to find a word in a text. How much time would it take to find 1 million words in the text? 1 million seconds? • Searching for a word without even looking at 99.999% of the text. Suppose you search for a word in a text. Would it be possible to ignore 99.999% of the text, scan only the remaining part and guarantee that the word you are looking for will be found? • Finding spelling errors in a book written in an unknown language. Given a book (in an unknown language) and a misspelled word (with insertions, deletions, and substitutions of letters) correct spelling errors in the word.

  28. Genomics: Problems Solved. • Searching for a million words in a text. The Aho-Corasick algorithm takes roughly the same time with a million words as it takes with a single word. • Searching for a word without even looking at 99.999% of the text. Filtration algorithms (like FASTA or BLAST) ignore 99.999% of the text. • Finding spelling errors. Sequence alignment algorithms (like Smith-Waterman) do it in quadratic time.
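As an illustration of the first bullet, here is a compact sketch of Aho-Corasick multi-pattern matching. It is a textbook-style version written for this transcript, not code from the lecture; the names `build_automaton` and `find_all` are hypothetical. All patterns are compiled into one trie with failure links, so the text is scanned once regardless of how many patterns there are.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie (goto), failure links (fail), and output lists (out)."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first traversal: shallower states are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


print(sorted(find_all("ushers", ["he", "she", "his", "hers"])))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The scan time depends on the text length and the number of matches reported, not on the number of patterns, which is the sense in which a million words cost roughly the same as one.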

  29. Proteomics: Three Problems • Comparing a million spectra against a database. Suppose it takes 1 sec to interpret a spectrum. How much time would it take to interpret 1 million spectra? • Mass-spectrometry database search without even looking at 99.999% of the database. Suppose you compare a spectrum against a database. Would it be possible to ignore 99.999% of the database, scan only the remaining part and guarantee that you still can identify a peptide of interest? • Blind PTM search and discovery of new PTM types. Given a spectrum of a peptide with unknown PTM types, find this peptide in the database. Discover new PTM types by data mining of large MS/MS datasets.

  30. Three Solutions • Comparing a million spectra against a database. InsPecT (Tanner et al., Anal. Chem, 2005) • MS/MS database search without even looking at 99.999% of the database. PepNovoTag+InsPecT (Tanner et al., Anal. Chem, 2005) • Blind PTM search and discovery of new PTM types. Given a spectrum of a peptide with unknown PTM types, find this peptide in the database. Discover new PTM types by data mining of large MS/MS datasets. MS-Alignment (Tsur et al., Nature Biotech., 2005)

  31. Filtration: Combining De Novo Sequencing and Database Search in Mass-Spectrometry • So far de novo and database search were presented as two separate techniques • Database search is rather slow: many labs generate more than 100,000 spectra per day. SEQUEST takes approximately 1 minute to compare a single spectrum against SWISS-PROT (54Mb) on a desktop. • It will take SEQUEST more than 2 months to analyze the MS/MS data produced in a single day. • Can slow database search be combined with fast de novo analysis?
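The "more than 2 months" figure follows directly from the numbers on this slide. The sketch below redoes the arithmetic, assuming exactly 100,000 spectra and 1 minute per spectrum on a single desktop; the function name is illustrative.

```python
def search_days(spectra=100_000, minutes_per_spectrum=1.0):
    """Days of sequential search time for one day's worth of spectra."""
    total_minutes = spectra * minutes_per_spectrum
    return total_minutes / (60 * 24)  # minutes per day


print(round(search_days(), 1))  # 69.4 days, i.e. a little over two months
```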

  32. De novo Peptide Sequencing Sequence

  33. Building Spectrum Graph • How to create vertices (from masses) • How to create edges (from mass differences) • How to score vertices • How to score paths • How to find the best path

  34. S E Q U E N C E b-ions (prefix or N-terminal ions) Mass/Charge (M/Z)

  35. a-ions = b-ions - CO = b-ions - 28 S E Q U E N C E Mass/Charge (M/Z)

  36. Shifting Peaks: a-ions = b-ions - CO = b-ions - 28 S E Q U E N C E Mass/Charge (M/Z)

  37. y-ions (suffix or C-terminal ions) E C N E U Q E S Mass/Charge (M/Z)

  38. Intensity Mass/Charge (M/Z)

  39. Intensity Mass/Charge (M/Z)

  40. noise Mass/Charge (M/Z)

  41. MS/MS Spectrum Intensity Mass/Charge (M/z)

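  42. Some Mass Differences between Peaks Correspond to Amino Acids [Figure: pairs of peaks whose mass difference equals an amino acid mass, labeled with letters of the example peptide SEQUENCE]

  43. Some Mass Differences between Peaks Correspond to Amino Acids [Figure: pairs of peaks whose mass difference equals an amino acid mass, labeled with letters of the example peptide SEQUENCE]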
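The idea on these two slides can be sketched as a scan over all pairs of peaks, keeping the pairs whose difference equals a residue mass. The sketch assumes integer residue masses and exact matching (real spectra need a mass tolerance), and it uses the G, P, F, N, A masses from the earlier slides rather than the SEQUENCE example, whose peak masses are not given in the transcript.

```python
# Integer residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

# Invert the table; residues with equal integer mass share one entry.
MASS_TO_RESIDUES = {}
for aa, mass in RESIDUE_MASS.items():
    MASS_TO_RESIDUES.setdefault(mass, []).append(aa)


def amino_acid_gaps(peaks):
    """Return (low, high, label) for peak pairs that differ by a residue mass."""
    peaks = sorted(peaks)
    gaps = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            residues = MASS_TO_RESIDUES.get(high - low)
            if residues:
                gaps.append((low, high, "/".join(residues)))
    return gaps


# Prefix masses from the earlier G, P, F, N, A example.
print(amino_acid_gaps([57, 154, 301, 415, 486]))
# [(57, 154, 'P'), (154, 301, 'F'), (301, 415, 'N'), (415, 486, 'A')]
```

Each returned triple is a candidate edge of the spectrum graph introduced later in the deck.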

  44. Ion Types • Some masses correspond to fragment ions, others are just random noise • Knowing ion types Δ = {δ1, δ2, …, δk} lets us distinguish fragment ions from noise • We can learn ion types δi and their probabilities qi by analyzing a large test sample of annotated spectra.

  45. Example of Ion Type • Δ={δ1, δ2,…, δk} • Ion types {b, b-NH3, b-H2O, b-CO} correspond to Δ={0, 17, 18, 28} *Note: In reality the δ value of ion type b is -1 but we will “hide” it for the sake of simplicity
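A small sketch of how these offsets are used: an N-terminal peptide of mass m can produce an ion at m - δ for each ion type δ. The dictionary and function names are illustrative, and, as on the slide, the -1 offset of the b ion is ignored.

```python
# Ion types and their offsets from the slide (b offset simplified to 0).
ION_OFFSETS = {"b": 0, "b-NH3": 17, "b-H2O": 18, "b-CO": 28}


def ions_for_prefix(prefix_mass):
    """Masses at which an N-terminal peptide of this mass may leave a peak."""
    return {name: prefix_mass - delta for name, delta in ION_OFFSETS.items()}


# The prefix of mass 154 from the earlier example.
print(ions_for_prefix(154))
# {'b': 154, 'b-NH3': 137, 'b-H2O': 136, 'b-CO': 126}
```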

  46. Match between Spectra and the Shared Peak Count • The match between two spectra is the number of masses (peaks) they share (Shared Peak Count or SPC) • In practice mass-spectrometrists use the weighted SPC that reflects intensities of the peaks • Match between experimental and theoretical spectra is defined similarly
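A minimal sketch of the unweighted shared peak count, assuming integer masses and exact matching; the weighted variant mentioned on the slide and any mass tolerance are left out. The experimental list is the one from slide 14 and the theoretical list is the one from slide 11.

```python
def shared_peak_count(spectrum_a, spectrum_b):
    """Number of masses that occur in both spectra (exact match)."""
    return len(set(spectrum_a) & set(spectrum_b))


experimental = [57, 71, 81, 100, 112, 131, 160, 172, 177, 185, 201, 221, 235,
                301, 312, 325, 370, 387, 409, 415, 423, 429, 460, 472, 486]
theoretical = [57, 71, 154, 185, 301, 332, 415, 429, 486]

print(shared_peak_count(experimental, theoretical))  # 7
```

Seven of the nine theoretical masses are found; 154 and 332 are the missing peaks, and the remaining experimental peaks count as noise.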

  47. Peptide Sequencing Problem Goal: Find a peptide with maximal match between an experimental and theoretical spectrum. Input: • S: experimental spectrum • Δ: set of possible ion types Output: • A peptide whose theoretical spectrum matches the experimental spectrum the best

  48. Shifting Peaks: a-ions = b-ions - CO = b-ions - 28 S E Q U E N C E Mass/Charge (M/Z)

  49. Reverse Shifts Shift in H2O Shift in H2O+NH3

  50. Vertices of the Spectrum Graph • Masses of potential N-terminal peptides • Vertices are generated by reverse shifts corresponding to ion types Δ = {δ1, δ2, …, δk} • Every N-terminal peptide of mass m can generate up to k ions m-δ1, m-δ2, …, m-δk • Every mass s in an MS/MS spectrum generates k vertices V(s) = {s+δ1, s+δ2, …, s+δk} corresponding to potential N-terminal peptides • Vertices of the spectrum graph: {initial vertex} ∪ V(s1) ∪ V(s2) ∪ ... ∪ V(sm) ∪ {terminal vertex}
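The vertex construction on this slide can be sketched as follows. The function name is illustrative, the initial and terminal vertices are represented by mass 0 and the parent mass, and discarding shifted masses outside that range is an added assumption, not something stated on the slide.

```python
def spectrum_graph_vertices(peaks, offsets, parent_mass):
    """Union of V(s) = {s + delta} over all peaks, plus the two end vertices."""
    vertices = {0, parent_mass}  # initial and terminal vertex
    for s in peaks:
        for delta in offsets:
            v = s + delta  # reverse shift: candidate N-terminal peptide mass
            if 0 < v < parent_mass:  # assumption: keep only in-range masses
                vertices.add(v)
    return sorted(vertices)


# Two peaks and two ion types (b and b-H2O); 136 is the b-H2O ion of prefix 154.
print(spectrum_graph_vertices([57, 136], offsets=[0, 18], parent_mass=486))
# [0, 57, 75, 136, 154, 486]
```

The reverse shift recovers the true prefix mass 154 from the water-loss peak at 136, at the cost of also creating spurious vertices such as 75 that the scoring step has to discount.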
