Proteogenomics

mandell

Presentation Transcript


  1. Proteogenomics

  2. Protein Identification by Mass Spectrometry: Samples → Peptides → MS/MS. The MS/MS spectra are searched against a protein DB (compare, score, test significance) → Identified peptides and proteins

  3. Tumor Specific Databases: Next-generation sequencing of the genome and transcriptome → Sample-specific Protein DB. Samples → Peptides → MS/MS. The MS/MS spectra are searched against the sample-specific protein DB (compare, score, test significance) → Identified peptides and proteins

  4. Sequencing by Synthesis (Illumina)

  5. Genomics Data Analysis: Images → Intensities → Reads → Alignments

  6. RNA-Seq Data Analysis: Paired-end short reads → De novo assembly or alignment to genome (figure labels: Transcript X; Reference genome; Exon 1; Exon 2)

  7. Example: RNA-Seq Data chr4

  8. Example: RNA-Seq Data

  9. Bias

  10. RNA-Seq coverage of genes: 5’ or 3’ Bias

  11. Mutations

  12. Alternative splicing

  13. Novel Expression

  14. Tumor Specific Databases Ruggles KV et al., MCP 2015

  15. Example of variant peptide Protein: NP_001138550 zinc finger protein 805 isoform 2 [Homo sapiens] Genome location: chr19:57764586+ 1485 0 DNA Variant: G183A Protein Variant: V62I MQGERLRPGLDSQKEKLPGKMSPKHDGLGTADSVCSRIIQDRVSLGDDVHDCDSHGSGKNPVIQEEENIFKCNECEKVFNKKRLLARHERIHSGVKPYECTECGKTFSKSTYLLQHHMVHTGEKPYKCMECGKAFNRKSHLTQHQRIHSGEKPYKCSECGKAFTHRSTFVLHNRSHTGEKPFVCKECGKAFRDRPGFIRHYIIHSGENPYECFECGKVFKHRSYLMWHQQTHTGEKPYECSECGKAFCESAALIHHYVIHTGEKPFECLECGKAFNHRSYLKRHQRIHTGEKPYVCSECGKAFTHCSTFILHKRAHTGEKPFECKECGKAFSNRADLIRHFSIHTGEKPYECMECGKAFNRRSGLTRHQRIHSGEKPYECIECGKTFCWSTNLIRHSIIHTGEKPYECSECGKAFSRSSSLTQHQRMHTGRNPISVTDVGRPFTSGQTSVNIQELLLGKNFLNVTTEENLLQEEASYMASDRTYQRETPQVSSL

  16. Example of variant peptide Protein: NP_001138550 zinc finger protein 805 isoform 2 [Homo sapiens] Genome location: chr19:57764586+ 1485 0 DNA Variant: G183A Protein Variant: V62I MQGERLRPGLDSQKEKLPGKMSPKHDGLGTADSVCSRIIQDRVSLGDDVHDCDSHGSGKNPVIQEEENIFKCNECEKVFNKKRLLARHERIHSGVKPYECTECGKTFSKSTYLLQHHMVHTGEKPYKCMECGKAFNRKSHLTQHQRIHSGEKPYKCSECGKAFTHRSTFVLHNRSHTGEKPFVCKECGKAFRDRPGFIRHYIIHSGENPYECFECGKVFKHRSYLMWHQQTHTGEKPYECSECGKAFCESAALIHHYVIHTGEKPFECLECGKAFNHRSYLKRHQRIHTGEKPYVCSECGKAFTHCSTFILHKRAHTGEKPFECKECGKAFSNRADLIRHFSIHTGEKPYECMECGKAFNRRSGLTRHQRIHSGEKPYECIECGKTFCWSTNLIRHSIIHTGEKPYECSECGKAFSRSSSLTQHQRMHTGRNPISVTDVGRPFTSGQTSVNIQELLLGKNFLNVTTEENLLQEEASYMASDRTYQRETPQVSSL Variant peptide: NPIIQEEENIFK
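
The variant peptide on this slide can be reproduced with a small in-silico digest. The following is a minimal sketch, assuming trypsin specificity (cleave after K or R unless the next residue is P) and no missed cleavages; the slides do not state the digestion rules, and the function names are hypothetical. Only the N-terminal part of the NP_001138550 sequence from the slide is used, to keep the example short.

```python
import re

# N-terminal portion of NP_001138550 as shown on the slide (truncated for brevity).
ZNF805_PREFIX = (
    "MQGERLRPGLDSQKEKLPGKMSPKHDGLGTADSVCSRIIQDRVSLGDDVHDCDSHGSGK"
    "NPVIQEEENIFKCNECEK"
)


def apply_substitution(protein: str, variant: str) -> str:
    """Apply a substitution such as 'V62I' (1-based residue number)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unsupported variant notation: {variant}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if protein[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {protein[pos - 1]}")
    return protein[: pos - 1] + alt + protein[pos:]


def tryptic_peptides(protein: str):
    """Yield (0-based start, peptide); assumed rule: cut after K/R, not before P."""
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (residue in "KR" and protein[i + 1] != "P"):
            yield start, protein[start : i + 1]
            start = i + 1


def variant_peptide(protein: str, variant: str) -> str:
    """Return the tryptic peptide of the mutated protein that covers the variant site."""
    pos = int(variant[1:-1])  # 1-based residue number
    mutated = apply_substitution(protein, variant)  # digest the mutated sequence
    for start, peptide in tryptic_peptides(mutated):
        if start < pos <= start + len(peptide):
            return peptide
    raise ValueError("variant position outside the protein")


print(variant_peptide(ZNF805_PREFIX, "V62I"))  # NPIIQEEENIFK
```

Digesting the mutated sequence rather than patching a reference peptide matters when a substitution creates or removes a K/R cleavage site, because the peptide boundaries themselves then change.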

  17. Example of introduced stop codon Protein: NP_003499 frizzled-9 precursor [Homo sapiens] Genome location: chr7:72848337+ 1776 0 DNA Variant: C155A Protein Variant: Y52* MAVAPLRGALLLWQLLAAGGAALEIGRFDPERGRGAAPCQAVEIPMCRGIGYNLTRMPNLLGHTSQGEAAAELAEFAPLVQYGCHSHLRFFLCSLYAPMCTDQVSTPIPACRPMCEQARLRCAPIMEQFNFGWPDSLDCARLPTRNDPHALCMEAPENATAGPAEPHKGLGMLPVAPRPARPPGDLGPGAGGSGTCENPEKFQYVEKSRSCAPRCGPGVEVFWSRRDKDFALVWMAVWSALCFFSTAFTVLTFLLEPHRFQYPERPIIFLSMCYNVYSLAFLIRAVAGAQSVACDQEAGALYVIQEGLENTGCTLVFLLLYYFGMASSLWWVVLTLTWFLAAGKKWGHEAIEAHGSYFHMAAWGLPALKTIVILTLRKVAGDELTGLCYVASTDAAALTGFVLVPLSGYLVLGSSFLLTGFVALFHIRKIMKTGGTNTEKLEKLMVKIGVFSILYTVPATCVIVCYVYERLNMDFWRLRATEQPCAAAAGPGGRRDCSLPGGSVPTVAVFMLKIFMSLVVGITSGVWVWSSKTFQTWQSLCYRKIAAGRARAKACRAPGSYGRGTHCHYKAPTVVLHMTKTDPSLENPTHL

  18. Example of introduced stop codon Protein: NP_003499 frizzled-9 precursor [Homo sapiens] Genome location: chr7:72848337+ 1776 0 DNA Variant: C155A Protein Variant: Y52* MAVAPLRGALLLWQLLAAGGAALEIGRFDPERGRGAAPCQAVEIPMCRGIGYNLTRMPNLLGHTSQGEAAAELAEFAPLVQYGCHSHLRFFLCSLYAPMCTDQVSTPIPACRPMCEQARLRCAPIMEQFNFGWPDSLDCARLPTRNDPHALCMEAPENATAGPAEPHKGLGMLPVAPRPARPPGDLGPGAGGSGTCENPEKFQYVEKSRSCAPRCGPGVEVFWSRRDKDFALVWMAVWSALCFFSTAFTVLTFLLEPHRFQYPERPIIFLSMCYNVYSLAFLIRAVAGAQSVACDQEAGALYVIQEGLENTGCTLVFLLLYYFGMASSLWWVVLTLTWFLAAGKKWGHEAIEAHGSYFHMAAWGLPALKTIVILTLRKVAGDELTGLCYVASTDAAALTGFVLVPLSGYLVLGSSFLLTGFVALFHIRKIMKTGGTNTEKLEKLMVKIGVFSILYTVPATCVIVCYVYERLNMDFWRLRATEQPCAAAAGPGGRRDCSLPGGSVPTVAVFMLKIFMSLVVGITSGVWVWSSKTFQTWQSLCYRKIAAGRARAKACRAPGSYGRGTHCHYKAPTVVLHMTKTDPSLENPTHL (underlined on the slide: the residues from position 52 onward, which the Y52* stop removes)

  19. Example of removed stop codon Protein: NP_899231 serine protease 48 precursor [Homo sapiens]. Genome location: chr4:152198324+ 52,163,266,170,390 0,2623,4975,5944,13945 DNA Variant: T984G Protein Variant: *329E MGPAGCAFTLLLLLGISVCGQPVYSSRVVGGQDAAAGRWPWQVSLHFDHNFIYGGSLVSERLILTAAHCIQPTWTTFSYTVWLGSITVGDSRKRVKYYVSKIVIHPKYQDTTADVALLKLSSQVTFTSAILPICLPSVTKQLAIPPFCWVTGWGKVKESSDRDYHSALQEAEVPIIDRQACEQLYNPIGIFLPALEPVIKEDKICAGDTQNMKDSCKGDSGGPLSCHIDGVWIQTGVVSWGLECGKSLPGVYTNVIYYQKWINATISRANNLDFSDFLFPIVLLSLALLCPSCAFGPNTIHRVGTVAEAVACIQGWEENAWRFSPRGRELTGEPLLTLGDFIYNLK Protein: NP_899231 serine protease 48 precursor [Homo sapiens]. Genome location: chr4:152198324+ 52,163,266,170,390 0,2623,4975,5944,13945 DNA Variant: T984G Protein Variant: *329E MGPAGCAFTLLLLLGISVCGQPVYSSRVVGGQDAAAGRWPWQVSLHFDHNFIYGGSLVSERLILTAAHCIQPTWTTFSYTVWLGSITVGDSRKRVKYYVSKIVIHPKYQDTTADVALLKLSSQVTFTSAILPICLPSVTKQLAIPPFCWVTGWGKVKESSDRDYHSALQEAEVPIIDRQACEQLYNPIGIFLPALEPVIKEDKICAGDTQNMKDSCKGDSGGPLSCHIDGVWIQTGVVSWGLECGKSLPGVYTNVIYYQKWINATISRANNLDFSDFLFPIVLLSLALLCPSCAFGPNTIHRVGTVAEAVACIQGWEENAWRFSPRGR
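
The DNA and protein coordinates in the three examples above (G183A / V62I, C155A / Y52*, T984G / *329E) are related by simple codon arithmetic. The sketch below assumes that the DNA position is a 0-based index into the coding sequence and the protein position is a 1-based residue number. That convention is inferred from the three examples, not stated on the slides, and the function name is hypothetical.

```python
def cds_to_protein(cds_pos0: int) -> tuple[int, int]:
    """Map a 0-based CDS nucleotide index to
    (1-based residue number, 0-based offset within the codon)."""
    codon_index, offset = divmod(cds_pos0, 3)
    return codon_index + 1, offset


# The three examples from the slides: (DNA variant position, protein variant).
examples = [(183, "V62I"), (155, "Y52*"), (984, "*329E")]

for dna_pos, protein_variant in examples:
    residue, offset = cds_to_protein(dna_pos)
    print(f"CDS index {dna_pos} -> residue {residue}, codon offset {offset} ({protein_variant})")
```

Under this assumption the offsets agree with the genetic code: a G→A change at the first base of a valine codon (GTN) gives ATN, which can encode isoleucine; a C→A change at the third base of the tyrosine codon TAC gives the stop codon TAA; and a T→G change at the first base of the stop codons TAA or TAG gives GAA or GAG, both glutamate.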

  20. Examples of novel peptides

  21. Examples of novel peptides

  22. Examples of novel peptides

  23. Examples of novel peptides

  24. Tumor Specific Databases Ruggles KV et al., MCP 2015

  25. Predicted and observed SNV peptides in two breast PDXs (Ruggles KV et al., Mol Cell Proteomics 15 (2016) 1060-71)

  26. Predicted and observed junction peptides in two breast PDXs (Ruggles KV et al., Mol Cell Proteomics 15 (2016) 1060-71)

  27. Variant peptides in 105 Breast Tumors • TP53: Tumor suppressor • 273 Arg→Cys (rs121913343) • AAs 273-280 involved in DNA interaction • Somatic in 3 tumors • KRAS: Cell proliferation regulating GTPase • 12 Gly→Val • Variant shown to cause constitutive activation • Somatic in 2 tumors • MYO1C: Unconventional myosin IC • 826 Gln→Arg (rs9905106) • Somatic in 1 tumor, germline in 83 (Mertins P et al., Nature 2016)

  28. Variant peptides in 105 Breast Tumors Mertins P et al., Nature 2016

  29. Quality of variant identifications

  30. Quality of variant identifications

  31. Quality of variant identifications

  32. Effects of Sequence Variation on the Proteome • Protein sequence changes • A modification site is changed • Protein sequence does not change but the protein level increases or decreases

  33. Effects of Sequence Variation on the Proteome • Protein sequence changes • A modification site is changed • Protein sequence does not change but the protein level increases or decreases • How do we utilize cancer-specific variants?

  34. Antibodies • VDJ recombination: gene segments V1, V2, ... Vn; D1 ... Dn; J1, J2, ... Jn are combined to form the variable heavy-chain domain, which contains CDR1, CDR2, and CDR3 (fingerprint) • Somatic hypermutation

  35. HIV Antibodies J.F. Scheid et al, “Sequence and structural convergence of broad and potent HIV antibodies that mimic CD4 binding”, Science, 333 (2011) 1633-1637

  36. Antibodies: A Functional IgG Requires Paired Light and Heavy Chains. Standard IgG: light chain (VL, CL) paired with heavy chain (VH, CH1, CH2, CH3)

  37. Single-Chain Llama Antibodies

  38. Single-Chain Llama Antibodies • Atypical single-chain IgG antibody produced in the camelid family (e.g. llama) • Retains high affinity for antigen without a light chain • The antigen-binding domain (VHH) can be cloned and expressed to make “Nanobodies”, which are extremely cheap and available in unlimited amounts; tiny (~15 kDa), fold well, and are stable in solution; and are easily engineered for special needs (figure labels: VHH; Nanobody; CH2; CH3; Single-chain IgG; Standard IgG)

  39. New MS-based Nanobody Discovery

  40. New MS-based Nanobody Discovery

  41. DNA Library Construction: Read 1: 301 bp; Read 2: 301 bp; Overlap: ~200 bp; both reads are trimmed (plots: Read 1 quality and Read 2 quality by position in read)

  42. DNA Library Construction: Read 1: 301 bp; Read 2: 301 bp; Overlap: ~200 bp; reads are trimmed, then merged (plots: merged read length; merged read quality by position in read)
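
With two 301 bp reads overlapping by about 200 bp, a merged read is roughly 301 + 301 - 200 = 402 bp long. The sketch below illustrates the merging step under simplifying assumptions: it ignores base qualities, accepts the longest overlap whose mismatch fraction is within a tolerance, and keeps the Read 1 base at any mismatch. The slides do not name the merging tool or its parameters, so the function names and thresholds here are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1):
    """Merge a read pair; return the merged sequence, or None if no overlap is found.

    Read 2 is reverse-complemented so both reads are on the same strand,
    then the longest acceptable suffix/prefix overlap is used.
    """
    r2 = reverse_complement(read2)
    for overlap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], r2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            # Simplification: on mismatch the Read 1 base is kept.
            return read1 + r2[overlap:]
    return None


# Toy 40 bp fragment sequenced with 30 bp reads from each end (20 bp overlap).
fragment = "ACGTTGCAAGCTTACGGATCCGATATCGGCTAGCTTGACA"
read1 = fragment[:30]
read2 = reverse_complement(fragment[-30:])
merged = merge_pair(read1, read2)
print(merged == fragment)  # True
```

A production merger would typically use the per-base quality scores shown in the plots to resolve mismatches in the overlap, which this sketch deliberately leaves out.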

  43. Identifying Peptides

  44. Identifying full-length sequences from peptides. Workflow: Nanobody primary sequences with CDR regions annotated + Identified peptides → Mapping → Annotated nanobody sequences with MS coverage → Ranking → Ranked nanobody lists → Grouping → Ranked nanobody groups • CDR regions are identified based on approximate position in the sequence and the presence of specific leading and trailing amino acids. • Nanobody sequences are ranked based on: MS coverage and length of individual CDR regions, with CDR3 carrying the highest weight; overall coverage including the scaffold region; HT-Seq counts. • Nanobody sequences are grouped by CDR3. A sequence is assigned to a group when its Hamming distance to an existing member is 1.
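
The CDR3 grouping rule can be sketched as a greedy single pass. This is an illustration under stated assumptions, not the authors' implementation: identical CDR3s (distance 0) are also placed in the same group, sequences of different length never match, the result depends on input order, and groups are not merged if a later sequence bridges two of them. The CDR3 strings are made up for the example.

```python
def hamming(a: str, b: str):
    """Hamming distance for equal-length strings; None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def group_by_cdr3(cdr3_list, max_distance: int = 1):
    """Assign each CDR3 to the first group containing a member within
    max_distance; otherwise start a new group."""
    groups = []
    for cdr3 in cdr3_list:
        for group in groups:
            distances = (hamming(cdr3, member) for member in group)
            if any(d is not None and d <= max_distance for d in distances):
                group.append(cdr3)
                break
        else:
            groups.append([cdr3])
    return groups


# Hypothetical CDR3 sequences, for illustration only.
cdr3s = ["TVRHGTWF", "TVRHGSWF", "AAKDGYWF", "TVRHGSWY", "AAKDGYW"]
for group in group_by_cdr3(cdr3s):
    print(group)
```

Here `TVRHGSWY` joins the first group through `TVRHGSWF` even though it is two substitutions away from `TVRHGTWF`, which shows how membership is decided against any existing member rather than a single representative.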

  45. Identifying full-length sequences from peptides

  46. Nanobody Production Scheme: Sequence of discovered nanobody candidates (e.g. MAQVQLVESGGGLVQAGGSLRLSCVASGRTFSGYAMGWFRQTPGREREAVAAITWSAHSTYYSDSVKDRFTISIDNTRNTGYLQMNSLKPEDTAVYYCTVRHGTWFTTSRYWTDWGQGTQVTVS) → Gene synthesis & codon optimization → Expression vector cloning → Transformation → E. coli expression → One-step purification. Figures quoted on the slide: ~$100 / sequence; ~2 mg / 1 L

  47. Using Anti-GFP Nanobodies GFP Homemade Nanobody

  48. Creating Super-high-affinity Reagent Against GFP GFP: Clone A Clone B KD = 0.7 nM Overlay KD = 16 nM GFP Nano Nano Super-high-affinity KD = 0.03 nM

  49. Central Dogma of Molecular Biology Transcription Replication Translation Modification P

  50. Central Dogma of Molecular Biology Transcription Replication Translation Modification Functional Gene Products P
