
Protein Analysis Tools 2nd April, 2012



Presentation Transcript


  1. Protein Analysis Tools, 2nd April, 2012. Ansuman Chattopadhyay, PhD, Head, Molecular Biology Information Service, Health Sciences Library System, University of Pittsburgh. ansuman@pitt.edu http://www.hsls.pitt.edu/guides/genetics

  2. What we’ll do: • Brief overview of CLC Main Workbench • Find the genomic context of a protein sequence • Search for the presence of conserved domains • Create a multiple sequence alignment plot

  3. What we’ll do: • Analyze primary structure: hydrophobicity, hydrophilicity, antigenicity, repeat sequence detection, etc. • Predict secondary structure • Predict post-translational modifications such as phosphorylation and glycosylation • Search for interacting partners • Predict domain-driven protein-protein interactions

  4. Workshop Resources http://www.hsls.pitt.edu/molbio/tutorials

  5. HSLS MolBio Videos

  6. Sequence Analysis Software Suites • Wisconsin GCG • Vector NTI • DNASTAR Lasergene • Geneious • CLC Main

  7. Why CLC Main? • Windows • Mac • Linux • DNA, RNA, Protein • Microarray Data Analysis • Regular Updates • HSLS Licensed

  8. CLC Main Access • HSLS CLC Main Registration • Link: http://www.hsls.pitt.edu/molbio/clcmain • Access via Pitt - Network Connect • Instruction video: http://goo.gl/JNjMt

  9. CLC Main Workbench Overview • Graphical User Interface • Protein Sequence Import • Sequence Navigation

  10. CLC Main Graphical User Interface (GUI)

  11. CLC Main

  12. Navigate a protein sequence

  13. Videos • CLC Main – getting started (basic navigation steps): http://media.hsls.pitt.edu/media/molbiovideos/clc-navigation-ac0312.swf • CLC Main Workbench Walkthrough (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part1-ac0112.swf • CLC Main Workbench Walkthrough (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part2-ac0112.swf

  14. Import a Protein Sequence

  15. Protein Sequence • Human PLCg1 • RefSeq no.: NP_002651 • UniProt Accession Number: P19174 • FASTA file • Raw sequence • CLC features: Search, Import, Create new sequence
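A FASTA file, as mentioned on this slide, is a header line starting with ">" followed by one or more lines of sequence. The sketch below is a minimal reader for that layout, not the importer used by CLC Main; the function name `parse_fasta` and the demo record are hypothetical. It also strips whitespace inside sequence lines, which is useful when a sequence has been copied from a slide and picked up stray spaces.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Remove internal whitespace so wrapped lines join cleanly.
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical demo record, not a real database entry.
demo = ">demo peptide\nMAHAG RTGY\nDNREIV\n"
print(parse_fasta(demo))
```

A raw sequence (no header line) is ignored by this reader; prepend a ">" line of your own before parsing it.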

  16. Videos • Import a DNA/Protein sequence into CLC Main (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part1-ac0112.swf • Import a DNA/Protein sequence into CLC Main (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part2-ac0112.swf

  17. CLC protein sequence

  18. Protein sequence manipulation • Create a new protein with PLCg1 SH2-SH2-SH3 domains

  19. Sequence Alignment • Pair-wise Alignment • Global • Local • Multiple Sequence Alignment

  20. Sequence Alignment

  21. Pair-wise Sequence Alignment
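To make the global/local distinction concrete, here is a minimal dynamic-programming sketch that returns only the best alignment score. It assumes a simple identity scoring scheme (match +1, mismatch -1) and a linear gap penalty; these values and the function name `alignment_score` are illustrative assumptions. Real tools, including CLC Main, typically use substitution matrices and affine gap costs.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # Local alignment: a cell never drops below zero.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


# Two toy sequences that share only the core "ACD".
print(alignment_score("WWACDWW", "KKACDKK"))              # global score
print(alignment_score("WWACDWW", "KKACDKK", local=True))  # local score
```

The global score is pulled down by the mismatched flanks, while the local score reflects only the shared core. That is why local alignment suits domain-level comparisons.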

  22. Multiple Sequence Alignment

  23. Multiple Sequence Alignment • Tools: ClustalW and T-coffee

  24. PLCg1 Orthologous Sequences • PLCg1: • Mouse: NP_067255 • Rat: NP_037319 • Cow: NP_776850 • Dog: XP_542998 • Zebrafish: NP_919388 • Human: NP_002651 • NP_067255,NP_037319,NP_776850,XP_542998,NP_919388,NP_002651
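Outside CLC Main, the same accession list can be retrieved in FASTA format through the NCBI E-utilities `efetch` endpoint. The sketch below only builds the request URL and does not contact NCBI; the helper name `efetch_fasta_url` is hypothetical, and NCBI's usage guidelines (rate limits, API keys) are not handled here.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accessions from the slide above.
PLCG1_ORTHOLOGS = [
    "NP_067255",  # mouse
    "NP_037319",  # rat
    "NP_776850",  # cow
    "XP_542998",  # dog
    "NP_919388",  # zebrafish
    "NP_002651",  # human
]


def efetch_fasta_url(accessions):
    """Build an efetch URL that returns protein records as FASTA text."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


print(efetch_fasta_url(PLCG1_ORTHOLOGS))
```

Opening the printed URL in a browser should return the sequences, provided the accessions are still current in the database.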

  25. Videos • Create a multiple sequence alignment plot using CLC (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212-part1.swf • Create a multiple sequence alignment plot using CLC (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212-part2.swf • Create a multiple sequence alignment plot: http://media.hsls.pitt.edu/media/clres2705/msa.swf • Compare two peptide sequences: http://media.hsls.pitt.edu/media/clres2705/blast2.swf

  26. Starting with a short peptide sequence, find: • the whole protein sequence • orthologs in other species (nematode) • Tools: UCSC BLAT, NCBI BLAST against SwissProt

  27. Peptide to whole protein • Peptide seq: SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR

  28. Videos • Place an mRNA or peptide sequence into the human genome (BLAT): http://www.hsls.pitt.edu/molbio/videos/play?v=12e • Find homologous sequences: http://media.hsls.pitt.edu/media/clres2705/blast.swf

  29. Find homologous sequence SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR

  30. Sequence Manipulation & Format Conversion • Sequence Manipulation Suite • http://bioinformatics.org/sms2/ • Readseq • http://thr.cit.nih.gov/molbio/readseq/ • Example: GenPept to FASTA

  31. Hands-On • Retrieve the amino acid sequence between positions 25 and 45 in Sequence A (MS Word Doc) • Identify the rat gene that encodes this peptide fragment and retrieve its whole protein sequence • Find the fruit fly homolog of this protein • What % identity does the fruit fly protein share with its rat homolog? • Predict potential MAPK phosphorylation sites present in the fruit fly protein
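Two steps of this exercise come down to simple arithmetic: extracting residues by 1-based position, and computing percent identity from a pairwise alignment. The sketch below illustrates both. The function names are hypothetical, and the identity calculation assumes the denominator is the number of aligned columns without a gap; alignment tools differ on this choice, so reported percentages may not match exactly.

```python
def subsequence(seq, start, end):
    """Return residues start..end, counted from 1 and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions must satisfy 1 <= start <= end <= len(seq)")
    return seq[start - 1:end]


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy inputs, not the workshop's Sequence A.
toy = "ACDEFGHIKLMNPQRSTVWY" * 3
print(subsequence(toy, 25, 45))          # 21 residues: positions 25 through 45
print(percent_identity("ACDE", "ACDF"))  # three of four columns match
```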

  32. Protein Domain Search: InterProScan • InterPro is a database of protein families, domains, regions, repeats and sites in which identifiable features found in known proteins can be applied to new protein sequences. • Query: >gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

  33. Videos • Find protein domains, PTMs, secondary structure, etc.: http://media.hsls.pitt.edu/media/clres2705/uniprot.swf • Start with a protein pattern and find which proteins possess that domain: http://media.hsls.pitt.edu/media/clres2705/scanprosite.swf • Search for protein domains, repeats and sites: http://media.hsls.pitt.edu/media/clres2705/interpro.swf

  34. Protein Domain Search: ScanProsite >gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

  35. Pattern Search • [AC]-x-V-x(4)-{ED}: • This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp} • F-[GSTV]-P-R-L-[G>]
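The PROSITE pattern syntax on this slide maps closely onto regular expressions, which can help when reading a pattern. The converter below is a minimal sketch covering only the elements shown here plus repetition ranges and terminus anchors; the function name `prosite_to_regex` is hypothetical and this is not ScanProsite's own implementation.

```python
import re

# One pattern element: optional "<", a core, optional (n) or (n,m), optional ">".
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z>]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        start, core, low, high, end = m.groups()
        if core == "x":
            core = "."                        # x: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # {ED}: anything but E or D
        elif core.startswith("[") and ">" in core:
            # [G>]: either G or the C-terminal end of the sequence.
            letters = core[1:-1].replace(">", "")
            core = "(?:[" + letters + "]|$)"
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


print(prosite_to_regex("[AC]-x-V-x(4)-{ED}"))
print(prosite_to_regex("F-[GSTV]-P-R-L-[G>]"))
```

The resulting string can be passed to `re.search` or `re.finditer` to scan a protein sequence for matches.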

  36. Pattern Search

  37. Protein Primary Structure Analysis • Tool: ExPASy from SIB • Calculated molecular weight • Theoretical pI • Extinction coefficients • Estimated half-life • Hydropathicity plot: Kyte & Doolittle • Hydrophilicity plot: Hopp & Woods
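A Kyte & Doolittle hydropathicity plot is a sliding-window average of per-residue hydropathy values. The sketch below computes that profile and the overall average (often called GRAVY) using the published Kyte-Doolittle scale. The window size of 9 is an assumed default and the function names are hypothetical; this is not the ExPASy implementation, and it averages windows uniformly without edge weighting.

```python
# Kyte-Doolittle hydropathy values per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[residue] for residue in seq.upper()]
    if window > len(values):
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def gravy(seq):
    """Grand average of hydropathy over the whole sequence."""
    return sum(KD[residue] for residue in seq.upper()) / len(seq)


# First residues of the BCL2 sequence used later in the workshop.
fragment = "MAHAGRTGYDNREIVMKYIHYKLSQRGYEW"
print(hydropathy_profile(fragment))
print(gravy(fragment))
```

Positive stretches in the profile indicate hydrophobic regions; long positive runs are what transmembrane predictors look for, although dedicated tools use more elaborate models.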

  38. Antigenic Site Prediction • Tool: EMBOSS antigenic >gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

  39. EMBOSS antigenic • Antigenic predicts potentially antigenic regions of a protein sequence, using the method of Kolaskar and Tongaonkar. Analysis of data from experimentally determined antigenic sites on proteins has revealed that the hydrophobic residues Cys, Leu and Val, if they occur on the surface of a protein, are more likely to be part of antigenic sites. Kolaskar and Tongaonkar developed a semi-empirical method to predict antigenic determinants on proteins; it uses the physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes. Application of this method to a large number of proteins has shown that it can predict antigenic determinants with about 75% accuracy, which is better than most of the known methods. The method is based on a single parameter and is thus very simple to use.

  40. Transmembrane Region Prediction

  41. Transmembrane Site Prediction • Tool: TMHMM Server >gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

  42. Protein Secondary Structure >gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

  43. Protein-Protein Interaction Prediction • Tool: STRING >gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

  44. Hands-on • Take the human BCL2 protein sequence and: • Find its domain architecture • Predict the topology of its transmembrane region • Identify a suitable antigenic site for antibody generation • What are its calculated molecular weight and extinction coefficient? • Predict its secondary structure • What % of this protein possesses alpha helical structure? • Predict its potential interacting partners
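The question about the % of alpha helix can be answered by counting states in the output of a secondary structure predictor. The sketch below assumes the prediction is available as a per-residue string in the common three-state notation (H = helix, E = strand, C = coil); the function name and the example string are hypothetical and are not a real prediction for BCL2.

```python
from collections import Counter


def secondary_structure_composition(states):
    """Percentage of residues in each predicted state."""
    states = states.upper()
    if not states:
        return {}
    counts = Counter(states)
    return {state: 100.0 * n / len(states) for state, n in counts.items()}


# Made-up ten-residue prediction string for illustration only.
composition = secondary_structure_composition("HHHHEECCCC")
print(composition.get("H", 0.0))  # percentage of residues predicted as helix
```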

  45. Hands-on • Predict potential phosphorylation sites present in a protein sequence. • Sequence: human BCL2 • >gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

  46. Phosphorylation Site Prediction • Tool: NetPhos >gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK

  47. Phosphorylation Site Prediction • Tool: GPS >gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
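Before running a predictor, it can be useful to list where the phosphorylatable residues are. The sketch below is a naive scan, not the method used by NetPhos or GPS: it reports every Ser, Thr and Tyr, and separately flags Ser/Thr residues followed by Pro, the minimal consensus often cited for proline-directed kinases such as MAPKs. Function names are hypothetical, and a motif match alone does not show that a site is phosphorylated.

```python
import re


def candidate_phosphosites(seq):
    """1-based positions of all Ser, Thr and Tyr residues."""
    return [(i + 1, r) for i, r in enumerate(seq.upper()) if r in "STY"]


def proline_directed_sites(seq):
    """1-based positions of Ser/Thr immediately followed by Pro."""
    return [m.start() + 1 for m in re.finditer(r"[ST](?=P)", seq.upper())]


# Short fragment taken from the BCL2 sequence on the slides above.
fragment = "SRDPVARTSPLQTPAAPG"
print(candidate_phosphosites(fragment))
print(proline_directed_sites(fragment))
```

Comparing this list with the scored output of NetPhos or GPS shows how many candidate residues the predictors actually retain.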

  48. Thank you! Any questions? • Carrie Iwema: iwema@pitt.edu, 412-383-6887 • Ansuman Chattopadhyay: ansuman@pitt.edu, 412-648-1297 • http://www.hsls.pitt.edu/guides/genetics
