Proteomics Informatics David Fenyő

  1. Proteomics Informatics. David Fenyő

  2. Course Information: http://fenyolab.org/pi2018

  3. Protein Identification and Quantitation. [Figure: samples are turned into peptides and measured by mass spectrometry; in the spectrum, intensity relates to quantity and m/z to identity.]

  4. Central Dogma of Molecular Biology. [Figure: replication, transcription, translation, and modification (P).]

  5. Central Dogma of Molecular Biology. [Figure: replication, transcription, translation, and modification (P), with slow degradation and fast degradation (X) added.]

  6. Motivating Example: Protein Regulation. [Figure: ERBB2, GRB7, and ERBB4 in breast cancer.]

  7. Motivating Example: Protein Complexes (Alber et al., Nature 2007)

  8. Motivating Example: Signaling (Choudhary & Mann, Nature Reviews Molecular Cell Biology 2010)

  9. Mass Spectrometry Based Proteomics. Sample workflow: lysis, fractionation, digestion, mass spectrometry (MS). Data processing: peak finding, charge determination, de-isotoping, integrating peaks, searching. Result: identified and quantified proteins.

  10. Mass Spectrometry. Components: ion source, mass analyzer, detector. Output: intensity vs. mass/charge.

  11. Mass Spectrometry. Tandem setup: ion source, mass analyzer 1, fragmentation, mass analyzer 2, detector. Fragment ions are labeled b and y.

  12. Example data – ESI-LC-MS/MS. [Figure: LC-MS/MS data over time, with an MS/MS spectrum plotted as % relative abundance vs. m/z (0 to 1000). Labeled peaks (m/z): 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080, and the [M+2H]2+ precursor.]

  13. Information Content in a Single Mass Measurement. [Figure: two panels, Human and S. cerevisiae, showing the average number of matching peptides (1 to 10) against tryptic peptide mass (1000 to 3000 Da).]

  14. Protein Identification by Mass Spectrometry. Samples are digested to peptides and measured by MS/MS. The MS/MS data are compared with a protein DB, scored, and tested for significance, giving identified peptides and proteins.

  15. Tandem MS – Database Search. Experimental side: lysis, fractionation, digestion, LC-MS, MS/MS. Database side: from the sequence DB, pick a protein, pick a peptide, and compute all fragment masses (MS/MS); repeat for all peptides and for all proteins. Then compare, score, and test significance.
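
The database side of slide 15 can be sketched in a few lines of Python. This is a minimal illustration, not the course's software: the two-protein database, the function names, and the 0.02 Da tolerance are hypothetical, and the residue masses are standard monoisotopic values that do not appear on the slides. It digests each protein with the usual trypsin rule (cleave after K or R unless followed by P) and keeps the peptides whose mass matches the measured precursor mass.

```python
# Monoisotopic residue masses in Da (standard values, not taken from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidate_peptides(database, precursor_mass, tolerance=0.02):
    """Return (protein name, peptide) pairs whose mass is within tolerance."""
    hits = []
    for name, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            if abs(peptide_mass(peptide) - precursor_mass) <= tolerance:
                hits.append((name, peptide))
    return hits


# Hypothetical two-protein database, made up for illustration.
toy_db = {
    "protA": "AKRPGKSGFLEEDELK",
    "protB": "MSTNPKPQRSGFIEEDELK",
}
observed_mass = peptide_mass("SGFLEEDELK")
candidates = candidate_peptides(toy_db, observed_mass)
```

Both toy proteins yield a candidate here, because leucine and isoleucine have the same mass. That is the point of slide 13: a single mass measurement rarely identifies a peptide on its own, so the fragment masses have to be compared as well.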

  16. Search Results

  17. Search Results Most proteins show very reproducible peptide patterns

  18. Search Results

  19. Spectrum Library Search. Experimental side: lysis, fractionation, digestion, LC-MS/MS. Library side: pick a spectrum from the spectrum library; repeat for all spectra. Compare the MS/MS spectra, score, and test significance, giving identified proteins.
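
The "compare, score" step of a spectrum library search could look roughly like the sketch below. The slide does not say which score is used; cosine similarity on binned peaks is assumed here because it is a common choice for comparing spectra. The library entries, intensities, and function names are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities into m/z bins; peaks is a list of (m/z, intensity)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0 = unrelated, 1 = same shape)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query, library):
    """Score the query against every library spectrum and return the best (name, score)."""
    scored = [(name, cosine_similarity(query, spectrum))
              for name, spectrum in library.items()]
    return max(scored, key=lambda item: item[1])


# Hypothetical library; the intensities are made up for illustration.
library = {
    "SGFLEEDELK": [(260.2, 25.0), (633.3, 55.0), (762.4, 100.0), (875.4, 90.0)],
    "other peptide": [(175.1, 100.0), (304.2, 40.0), (762.4, 10.0)],
}
query = [(260.2, 30.0), (633.3, 60.0), (762.4, 100.0), (875.4, 80.0)]
best_name, best_score = best_library_match(query, library)
```

A real search would also restrict the comparison to library spectra with a compatible precursor m/z and would test the best score for significance, as the slide indicates.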

  20. Interpretation of Mass Spectra. Peptide: SGFLEEDELK. [Figure: MS/MS spectrum, % relative abundance vs. m/z (0 to 1000).]

  21. Interpretation of Mass Spectra. b ions of SGFLEEDELK:

      residue   S     G     F     L     E     E     D     E     L     K
      b ions    88    145   292   405   534   663   778   907   1020  1166

  22. Interpretation of Mass Spectra. b and y ions of SGFLEEDELK:

      residue   S     G     F     L     E     E     D     E     L     K
      b ions    88    145   292   405   534   663   778   907   1020  1166
      y ions    1166  1080  1022  875   762   633   504   389   260   147

  23–29. Interpretation of Mass Spectra. [Figure, repeated across these slides: the b and y ion table from slide 22 above the MS/MS spectrum (% relative abundance vs. m/z, 0 to 1000). Labeled peaks (m/z): 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080, and the [M+2H]2+ precursor. Slide 25 marks a spacing of 113 between peaks and slide 26 a spacing of 129.]
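
The b and y ion values on these slides can be reproduced with a short calculation. The sketch below is illustrative only; the function names are hypothetical and the residue, water, and proton masses are standard monoisotopic values that are not given on the slides. A singly charged b ion is the sum of the N-terminal residue masses plus a proton; a singly charged y ion is the sum of the C-terminal residue masses plus water and a proton.

```python
# Monoisotopic residue masses in Da (standard values, not taken from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_ions(peptide):
    """Singly charged b1..b(n-1): N-terminal prefixes plus one proton."""
    masses, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def y_ions(peptide):
    """Singly charged y1..y(n-1): C-terminal suffixes plus water and one proton."""
    masses, total = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def precursor_mz(peptide, charge):
    """m/z of the intact peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


peptide = "SGFLEEDELK"
b_series = b_ions(peptide)
y_series = y_ions(peptide)
mz_1plus = precursor_mz(peptide, 1)   # the 1166 entry at the end of both series
mz_2plus = precursor_mz(peptide, 2)   # the [M+2H]2+ peak
```

Each computed value agrees with the integer on the slides to within 1 Da. The final entry of both series on the slides, 1166, is the singly protonated intact peptide rather than a fragment, and the [M+2H]2+ precursor comes out at about m/z 583.8.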

  30. De Novo Sequencing. Amino acid masses and the mass differences between peaks give the sequences consistent with the spectrum. [Figure: MS/MS spectrum, % relative abundance vs. m/z (0 to 1000), with the same labeled peaks as slide 12.]
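
As a rough illustration of the mass-difference idea, the sketch below reads residues off a clean b ion ladder. It makes several simplifying assumptions that are not from the slides: the peaks are already known to be consecutive b ions (the integer values listed on the b ion slides), nominal integer residue masses are used, and there are no missing or noise peaks. Real de novo sequencing has to handle all of these.

```python
from itertools import product

# Nominal (integer) residue masses in Da; standard values, not from the slides.
NOMINAL_RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

# Invert the table: one mass can correspond to several residues (L/I, Q/K).
MASS_TO_RESIDUES = {}
for residue, mass in NOMINAL_RESIDUE_MASS.items():
    MASS_TO_RESIDUES.setdefault(mass, []).append(residue)


def read_ladder(peaks):
    """For each pair of neighboring peaks, list the residues matching the gap."""
    ordered = sorted(peaks)
    return [MASS_TO_RESIDUES.get(high - low, [])
            for low, high in zip(ordered, ordered[1:])]


def consistent_sequences(steps):
    """Expand the per-gap residue options into every consistent sequence."""
    if any(not options for options in steps):
        return []  # at least one gap matches no single residue
    return ["".join(choice) for choice in product(*steps)]


# b1 to b9 as listed on the slides for SGFLEEDELK.
b_ladder = [88, 145, 292, 405, 534, 663, 778, 907, 1020]
steps = read_ladder(b_ladder)
sequences = consistent_sequences(steps)
```

The gaps spell out G, F, L/I, E, E, D, E, L/I, which is the interior of SGFLEEDELK. Because leucine and isoleucine share a mass, four sequences are consistent with this ladder, which is why the slide speaks of sequences, plural.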

  31. Significance Testing. False protein identifications are caused by random matching. An objective criterion for testing the significance of protein identification results is necessary. The significance of protein identifications can be tested once the distribution of scores for false results is known.
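
One simple way to use a known distribution of false-match scores is an empirical p-value: the fraction of false scores that are at least as high as the observed score. The sketch below assumes the false scores are available as a plain list (for example from searching reversed or shuffled sequences, which the slide does not specify); the function name and all numbers are hypothetical.

```python
def empirical_p_value(score, false_scores):
    """Estimate P(false match scores >= score), with a +1 correction
    so the estimate is never exactly zero."""
    at_least_as_high = sum(1 for s in false_scores if s >= score)
    return (at_least_as_high + 1) / (len(false_scores) + 1)


# Hypothetical scores of matches known to be false; the numbers are made up.
false_scores = [12, 15, 9, 20, 14, 11, 18, 13, 16]

p_strong = empirical_p_value(35, false_scores)  # no false score reaches 35
p_weak = empirical_p_value(15, false_scores)    # four false scores are >= 15
```

With only nine false scores the smallest reachable p-value is 0.1, so a usable test needs a much larger sample of false scores or a fitted model of their distribution.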

  32. Protein Quantitation by Mass Spectrometry. Workflow: lysis, fractionation, digestion, LC-MS, MS. Notation: C_ij for protein j in sample i; I_ik for the MS intensity of peptide k in sample i.

  33. Protein Quantitation by Mass Spectrometry

  34. Protein Quantitation by Mass Spectrometry

  35. Protein Quantitation by Mass Spectrometry

  36. Protein Quantitation by Mass Spectrometry. Light (L) and heavy (H) isotope labeling. Workflow: lysis, fractionation, digestion, LC-MS, MS. Notation: sample i, protein j, peptide k. Assumption: all losses after mixing are identical for the heavy and light isotopes and [formula not preserved in this transcript]. References: Oda et al., PNAS 96 (1999) 6591; Ong et al., MCP 1 (2002) 376.
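
If losses after mixing affect the heavy and light forms equally, the heavy/light intensity ratio of each peptide should be unaffected by those losses. A minimal sketch of turning peptide pairs into a protein-level ratio follows. Summarizing by the median is an assumption made here for robustness to outliers, not something stated on the slide, and the intensities and function names are hypothetical.

```python
from statistics import median


def peptide_ratios(pairs):
    """Heavy/light ratio for each (light, heavy) intensity pair.
    Pairs with a zero light intensity are skipped."""
    return [heavy / light for light, heavy in pairs if light > 0]


def protein_ratio(pairs):
    """Summarize a protein by the median of its peptide H/L ratios."""
    ratios = peptide_ratios(pairs)
    if not ratios:
        raise ValueError("no usable peptide pairs")
    return median(ratios)


# Hypothetical (light, heavy) intensities for three peptides of one protein.
pairs = [(100.0, 200.0), (50.0, 110.0), (80.0, 150.0)]
ratio = protein_ratio(pairs)  # peptide ratios are 2.0, 2.2, and 1.875
```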

  37. Protein Quantitation. Common workflow: lysis, fractionation, digestion, LC-MS.
      Shotgun proteomics, Data Dependent Acquisition (DDA):
        1. Records m/z (MS)
        2. Selects peptides based on abundance and fragments (MS/MS)
        3. Protein database search for peptide identification
      Targeted MS, uses a predefined set of peptides:
        1. Select precursor ion (MS)
        2. Precursor fragmentation (MS/MS)
        3. Use precursor-fragment pairs for identification
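
Step 2 of the shotgun (DDA) column, selecting peptides based on abundance, is often implemented as a "top N" rule. The sketch below assumes that rule and adds a simple exclusion list for precursors that were already fragmented. Both are common instrument behaviors but are not described on the slide, and the function name, parameters, and peak values are hypothetical.

```python
def select_precursors(ms1_peaks, top_n=3, excluded_mz=(), tolerance=0.01):
    """Pick the top_n most intense MS1 peaks for fragmentation.

    ms1_peaks: list of (m/z, intensity) tuples from one MS scan.
    excluded_mz: m/z values already fragmented recently; peaks within
    `tolerance` of any of them are skipped.
    """
    eligible = [
        (mz, intensity)
        for mz, intensity in ms1_peaks
        if all(abs(mz - ex) > tolerance for ex in excluded_mz)
    ]
    return sorted(eligible, key=lambda peak: peak[1], reverse=True)[:top_n]


# Hypothetical MS1 scan; the values are made up for illustration.
scan = [(400.2, 1e4), (583.8, 9e5), (650.3, 3e5), (720.9, 5e4), (810.4, 7e5)]

first_pick = select_precursors(scan, top_n=2)
second_pick = select_precursors(scan, top_n=2, excluded_mz=[583.8])
```

Targeted MS skips this step: the precursors come from the predefined peptide list rather than from the intensities observed in the scan.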

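  38. Proteogenomics. Samples are digested to peptides and measured by MS/MS. The MS/MS data are compared with a protein DB, scored, and tested for significance, giving identified peptides and proteins.

  39. Proteogenomics. Next-generation sequencing of the genome and transcriptome provides a sample-specific protein DB. Samples are digested to peptides and measured by MS/MS; the MS/MS data are compared with the sample-specific protein DB, scored, and tested for significance, giving identified peptides and proteins.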

  40. Proteogenomics. Non-tumor sample: genome sequencing to identify germline variants. Tumor sample: genome sequencing and RNA-Seq to identify alternative splicing, somatic variants, and novel expression. The tumor-specific protein DB draws on the reference human database (Ensembl) together with variants, alternative splicing, novel expression, and fusion genes. [Figure: exon diagrams for alternative splicing and for a Gene X / Gene Y fusion; aligned reads TCGAGAGCTG with one variant read TCGATAGCTG.]

  41. Proteogenomics. [Figure: ERBB2; panels labeled Breast Cancer and Breast.]

  42. Proteogenomics. [Figure: ERBB2; panels labeled Breast Cancer, Breast, and Ovarian Cancer.]

  43. Posttranslational Modifications. A peptide with two possible modification sites and a matching MS/MS spectrum (intensity vs. m/z). Which assignment does the data support: site 1, site 1 or 2, or sites 1 and 2?

  44. Protein Interactions. [Figure: interacting proteins labeled A to F.] Workflow: digestion, mass spectrometry, identification.

  45. Data Analysis - Normalization. [Figure: raw data compared with normalized data (mean = 0, std = 1).]
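
Normalizing to mean 0 and standard deviation 1 is a z-score transformation. A minimal sketch follows. It uses the population standard deviation; whether the slide's figure used the population or the sample standard deviation is not stated, and the input values are hypothetical.

```python
from statistics import mean, pstdev


def z_score(values):
    """Shift and scale values so that their mean is 0 and their
    (population) standard deviation is 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot normalize constant data")
    return [(v - mu) / sigma for v in values]


# Hypothetical raw intensities for one sample.
raw = [10.0, 12.0, 14.0, 16.0, 18.0]
normalized = z_score(raw)
```

Applying this per sample puts replicates on a common scale, which is what makes comparisons between runs meaningful.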

  46. Data Analysis - Normalization. [Figure: normalized data for 3 replicates, and for the same 3 replicates plus one more replicate acquired a few months later.]

  47. Data Analysis

  48. Molecular Markers. A molecular signature is a computational or mathematical model that links high-dimensional molecular information to a phenotype or another response variable of interest. The FDA calls them "in vitro diagnostic multivariate assays".
