Bioinformatics for Proteomics Shu-Hui Chen ( 陳淑慧 ) Department of Chemistry


Presentation Transcript


  1. Bioinformatics for Proteomics Shu-Hui Chen (陳淑慧) Department of Chemistry National Cheng Kung University

  2. Bioinformatics I: DNA (5’ to 3’) → Transcription → Splicing → mRNA → Translation → Polypeptide → Folding → Protein → • Transport / Localization • Oligomerization • PTM (Post-Translational Modification) → Function. How do we find protein coding regions, introns and exons in genomic DNA sequences?

  3. What is Proteomics? Systematic analysis of: • All protein sequences • All protein expression patterns • All protein interactions. This involves: • Protein isolation • Protein separation • Protein identification • Functional characterization of all proteins

  4. The tools of Proteomics • Traditional protein chemistry assay methods struggle to establish identity. • Identity requires: • Specificity of measurement (precision): mass spectrometry, MS-based data acquisition algorithm • A reference for comparison: protein sequence databases, search algorithms

  5. MS-based Proteomics and Bioinformatics • MS instruments are not yet sensitive enough to resolve the proteins in a biological system from the measured signals alone. • MS can, however, acquire enough data to map a protein to a database entry when new computer algorithms are used to analyze the data. • This is the field of bioinformatics.

  6. Instrumentation • Sample inlet • Vacuum • Ion source • Mass analyzer • Data acquisition

  7. “Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.

  8. MS-based Protein Identification: → Mass Mapping; Peptide Sequencing

  9. Conventional Methodology- Expression Proteomics

  10. Trypsin Digestion: -NH-CH(R1)-CO-NH-CH(R2)-CO- → (trypsin) → -NH-CH(R1)-COOH + H2N-CH(R2)-CO-. We know that trypsin cleaves polypeptides C-terminal to the basic amino acids lysine and arginine. [Spectrum: ion intensity vs. m/z]
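The cleavage rule on this slide can be sketched as an in-silico digest. This is a minimal illustration, not the search engine's own code: the function name `trypsin_digest` and the example sequence are hypothetical, and the "no cleavage before proline" exception is a commonly used convention that the slide itself does not state.

```python
def trypsin_digest(sequence: str) -> list[str]:
    """Split a protein sequence C-terminal to K or R.

    Assumption (not from the slide): cleavage is skipped when the
    next residue is proline, a convention many tools apply.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep any C-terminal remainder that does not end in K or R.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical 13-residue sequence used only for illustration.
print(trypsin_digest("MKTAYIAKQRPLK"))
```

Missed cleavages, which real digests produce, are deliberately left out to keep the sketch short.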

  11. Mass Spectrometry Protein identified by database mapping

  12. Automated Database Search • Number 1 match: tumor necrosis factor type 1 receptor associated protein TRAP-1 • Mr: 76030.27 • Total coverage: 33.4%
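A coverage figure like the one reported here can be reproduced in principle by matching observed peptide masses against the theoretical tryptic peptides of a candidate protein and counting the residues they cover. The sketch below assumes monoisotopic neutral masses and a fixed absolute tolerance. The protein, peptides, observed masses, and the function names are all hypothetical and are not taken from the TRAP-1 result.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[residue] for residue in peptide) + WATER


def match_peptides(theoretical, observed, tol=0.2):
    """Return theoretical peptides with an observed mass within tol Da."""
    return [
        pep for pep in theoretical
        if any(abs(peptide_mass(pep) - mass) <= tol for mass in observed)
    ]


def sequence_coverage(protein: str, matched) -> float:
    """Percentage of protein residues covered by the matched peptides."""
    covered = [False] * len(protein)
    for pep in matched:
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical example data, invented for illustration.
protein = "MKTAYIAKQRPLK"
theoretical = ["MK", "TAYIAK", "QRPLK"]
observed = [665.37, 1200.00]

matched = match_peptides(theoretical, observed)
print(matched, round(sequence_coverage(protein, matched), 1))
```

Real search engines also score how likely such matches are to occur by chance, which this sketch does not attempt.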

  13. Bioinformatics I Minimal content of a « protein sequence » db • Sequences !! • Accession number (AC) • Taxonomic data • References • ANNOTATION/CURATION • Keywords • Cross-references • Documentation

  14. Bioinformatics I SWISS-PROT/TrEMBL • Collaboration between the SIB (CH) and EMBL/EBI (UK) • SWISS-PROT: a fully (manually) annotated, non-redundant, cross-referenced, documented protein sequence database. • TrEMBL: automatically generated from annotated EMBL coding sequences (CDS) and annotated using software tools. http://www.expasy.org/sprot/

  15. ExPASy Web Server ExPASy = Expert Protein Analysis System

  16. History of MS Searching • MOWSE, 1993, by Pappin and Bleasby • SEQUEST, 1994, by Yates and Eng • MOWSE II, 1996 • Molecular Weight Search, 1997 • MOWSE III, 1998 • MASCOT, by Matrix Science

  17. Scoring algorithm • Final score = -10 × log10(P), where P is the absolute probability that the observed match is a random event. • E value (expected value): the number of hits one can expect to see by chance when searching a database of a particular size. A value of zero indicates that no matches would be expected by chance. • Significant hits at the 95% confidence level (p < 0.05): there is less than a 1 in 20 chance that the observed match is a random event. [Figure: increasing mass tolerance]
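The score formula on this slide is easy to check numerically. The sketch below converts between probability and score. The `expected_hits` helper is a simplifying assumption that is not in the slide: it treats every database entry as an independent trial with the same chance probability. The function names and the database size are hypothetical.

```python
import math


def probability_score(p: float) -> float:
    """Score = -10 * log10(P), as given on the slide."""
    return -10.0 * math.log10(p)


def probability_from_score(score: float) -> float:
    """Inverse of probability_score."""
    return 10.0 ** (-score / 10.0)


def expected_hits(p: float, database_size: int) -> float:
    """Rough expected number of chance hits.

    Assumption: each of database_size entries matches independently
    with probability p. Real search engines use their own formulas.
    """
    return p * database_size


# The 95% confidence threshold (p = 0.05) corresponds to a score near 13.
print(round(probability_score(0.05), 2))
# A hypothetical database of 500,000 entries searched at p = 1e-6.
print(expected_hits(1e-6, 500_000))
```

Because the score is logarithmic, every additional 10 points corresponds to a tenfold smaller probability that the match is random.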

  18. MS-based Protein Identification: Mass Mapping; → Peptide Sequencing

  19. Tandem Mass Spectrometry (MS/MS): MS/MS acquisition is controlled by software settings

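  20. Protein Identification • Peptide Sequencing using MS/MS. CID breaks the precursor ion (peptide ABCDEF) into complementary fragment pairs: A + BCDEF, AB + CDEF, ABC + DEF, ABCD + EF, ABCDE + F. [Spectrum, m/z axis: fragment peaks A, AB, ABC, ABCD, ABCDE; residue labels A B C D E]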
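The fragment ladder on this slide can be sketched numerically. The code below lists every prefix/suffix pair of a peptide and computes singly charged m/z values, assuming the prefixes are observed as b ions and the suffixes as y ions (the usual low-energy CID convention, which is an assumption here). The function name and the example peptide are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide: str):
    """Return (prefix, b_mz, suffix, y_mz) for each backbone cleavage.

    Assumes singly charged, unmodified b and y ions.
    """
    ions = []
    for i in range(1, len(peptide)):
        prefix, suffix = peptide[:i], peptide[i:]
        b_mz = sum(MONO[r] for r in prefix) + PROTON
        y_mz = sum(MONO[r] for r in suffix) + WATER + PROTON
        ions.append((prefix, b_mz, suffix, y_mz))
    return ions


# Hypothetical peptide used only for illustration.
for prefix, b_mz, suffix, y_mz in fragment_ions("TAYIAK"):
    print(f"{prefix:<6} b={b_mz:9.4f}   {suffix:<6} y={y_mz:9.4f}")
```

The difference between two successive b ions (or y ions) equals one residue mass, which is what lets a sequence be read off the spectrum.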

  21. Nomenclature used for CID peptide fragmentation- Low Energy (eV)- Q, TOF, FT “Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.

  22. Protein Identification by Database Search

  23. Trypsin Digestion: -NH-CH(R1)-CO-NH-CH(R2)-CO- → (trypsin) → -NH-CH(R1)-COOH + H2N-CH(R2)-CO-. We know that trypsin cleaves polypeptides C-terminal to the basic amino acids lysine and arginine. [Spectrum: ion intensity vs. m/z]

  24. Sequence Tag Approach for Peptide Sequencing “Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.

  25. The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

  26. Bioinformatics I NCBI BLAST http://www.ncbi.nlm.nih.gov/blast/ BLAST: Basic Local Alignment Search Tool

  27. Bioinformatics I: Sequence alignments and comparison
      Sequences:
        1: MYTAILORISRICH
        2: MONTAILLEURESTRICHE
      Global alignment:
        1: MY-TAIL--ORIS-RICH-
           ¦x ¦¦¦¦  x¦x¦ ¦¦¦¦
        2: MONTAILLEURESTRICHE
      Two local alignments:
        1: TAILO   RICH
           ¦¦¦¦x   ¦¦¦¦
        2: TAILL   RICHE
      Legend: ¦ = Identity, x = Mismatch, - = Insertion / Deletion
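A local alignment of the two example sequences can be computed with the Smith-Waterman dynamic-programming algorithm. The sketch below is a textbook version with a simple linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, not the parameters behind the slide, and BLAST itself uses a faster heuristic rather than this exhaustive search.

```python
def smith_waterman(a: str, b: str, match=1, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; cell values never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = smith_waterman("MYTAILORISRICH", "MONTAILLEURESTRICHE")
print(score)
print(top)
print(bottom)
```

Only one best-scoring region is reported. Finding the second local alignment would require masking the first and searching again, which is omitted here.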

  28. Bioinformatics I: Multiple Sequence Alignment (MSA)
      HBA_CHICK   VL-SAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHF-DL 48
      HBAD_CHICK  ML-TAEDKKLIQQAWEKAASHQEEFGAEALTRMFTTYPQTKTYFPHF-DL 48
      HBPI_CHICK  AL-TQAEKAAVTTIWAKVATQIESIGLESLERLFASYPQTKTYFPHF-DV 48
      HBB_CHICK   VHWTAEEKQLITGLWGKV--NVAECGAEALARLLIVYPWTQRFFASFGNL 48
      HBE_CHICK   VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFASFGNL 48
      HBRH_CHICK  VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFDNFGNL 48
      MYG_CHICK   GL-SDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGL 49
      .... . ..* . .. * * * *.. .* * * * ..

      HBA_CHICK   SH-----GSAQIKGHGKKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRV 93
      HBAD_CHICK  SP-----GSDQVRGHGKKVLGALGNAVKNVDNLSQAMAELSNLHAYNLRV 93
      HBPI_CHICK  SQ-----GSVQLRGHGSKVLNAIGEAVKNIDDIRGALAKLSELHAYILRV 93
      HBB_CHICK   SSPTAILGNPMVRAHGKKVLTSFGDAVKNLDNIKNTFSQLSELHCDKLHV 98
      HBE_CHICK   SSPTAIMGNPRVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCDKLHV 98
      HBRH_CHICK  SSPTAIIGNPKVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCEKLHV 98
      MYG_CHICK   KTPDQMKGSEDLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKI 99
      . *. .. ** .*.. . . .. .. . *.. * ..

      HBA_CHICK   DPVNFKLLGQCFLVVVAIHHPAALTPEVHASLDKFLCAVGTVLTAKYR-- 141
      HBAD_CHICK  DPVNFKLLSQCIQVVLAVHMGKDYTPEVHAAFDKFLSAVSAVLAEKYR-- 141
      HBPI_CHICK  DPVNFKLLSHCILCSVAARYPSDFTPEVHAEWDKFLSSISSVLTEKYR-- 141
      HBB_CHICK   DPENFRLLGDILIIVLAAHFSKDFTPECQAAWQKLVRVVAHALARKYH-- 146
      HBE_CHICK   DPENFRLLGDILIIVLASHFARDFTPACQFAWQKLVNVVAHALARKYH-- 146
      HBRH_CHICK  DPENFRLLGNILIIVLAAHFTKDFTPTCQAVWQKLVSVVAHALAYKYH-- 146
      MYG_CHICK   PVKYLEFISEVIIKVIAEKHAADFGADSQAAMKKALELFRNDMASKYKEF 149
      . .... . .* . . ... . .* . .. **.

      HBA_CHICK   ---- 141
      HBAD_CHICK  ---- 141
      HBPI_CHICK  ---- 141
      HBB_CHICK   ---- 146
      HBE_CHICK   ---- 146
      HBRH_CHICK  ---- 146
      MYG_CHICK   GFQG 153

      Consensus length: 154; Identity: 19 (12.3%); Similarity: 51 (33.1%)
      '*' marks a position in the alignment that is perfectly conserved; '.' marks a position that is well conserved.
      Programs: • CLUSTALW • T_COFFEE • MULTALIGN
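The identity count reported for this alignment is the number of columns in which every sequence carries the same residue. A minimal sketch of that calculation follows. The function name is hypothetical, and the toy input is just the first eight columns of three of the sequences above, so its result is not the slide's 19 of 154.

```python
def identical_columns(alignment: list[str]) -> list[int]:
    """Return indices of columns where all rows share the same non-gap residue."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    return [
        col for col in range(length)
        if alignment[0][col] != "-"
        and all(row[col] == alignment[0][col] for row in alignment)
    ]


# First eight alignment columns of HBA_CHICK, HBAD_CHICK and HBPI_CHICK.
toy = ["VL-SAADK", "ML-TAEDK", "AL-TQAEK"]
columns = identical_columns(toy)
print(columns, f"{100 * len(columns) / len(toy[0]):.1f}%")

# The percentages on the slide follow from the counts it reports.
print(round(100 * 19 / 154, 1), round(100 * 51 / 154, 1))
```

The "well conserved" ('.') positions depend on a residue similarity scheme that the slide does not specify, so they are not computed here.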

  29. Searching databases with multiple alignments. PSI-BLAST: Position-Specific Iterative BLAST (Altschul et al., 1997) • Starting with a single sequence, PSI-BLAST searches a database using BLAST and builds a multiple sequence alignment and a profile. • The profile is then used to search the protein database again. • Running the program several times can further refine the profile and increase search sensitivity.
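The idea of a profile can be illustrated with a toy position-specific scoring matrix built from an alignment. This is a heavily simplified sketch, not PSI-BLAST's actual procedure: it assumes a uniform background frequency and a flat pseudocount, and it skips sequence weighting and the iterative database search. All names and the example alignment are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment: list[str], pseudocount: float = 1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    profile = []
    for col in range(len(alignment[0])):
        residues = [row[col] for row in alignment if row[col] != "-"]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (residues.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        profile.append(scores)
    return profile


def score_sequence(profile, sequence: str) -> float:
    """Sum the column scores for an ungapped sequence of profile length."""
    return sum(column[aa] for column, aa in zip(profile, sequence))


# Hypothetical three-column alignment.
profile = build_profile(["ACD", "ACE", "ACD"])
for candidate in ("ACD", "ACE", "WWW"):
    print(candidate, round(score_sequence(profile, candidate), 2))
```

Sequences that resemble the aligned family score higher than unrelated ones, which is what allows a profile search to pick up distant relatives that a single-sequence query might miss.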
