
Sequence Based Analysis Tutorial

Sequence Based Analysis Tutorial. NIH Proteomics Workshop. Lai-Su Yeh, Ph.D., Protein Information Resource at Georgetown University Medical Center. Retrieval, Sequence Search & Classification Methods: retrieve protein info by text / UID; sequence similarity search.


Presentation Transcript


  1. Sequence Based Analysis Tutorial • NIH Proteomics Workshop • Lai-Su Yeh, Ph.D., Protein Information Resource at Georgetown University Medical Center

  2. Retrieval, Sequence Search & Classification Methods • Retrieve protein info by text / UID • Sequence Similarity Search: BLAST, FASTA, Dynamic Programming • Family Classification: Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks • Integrated Search and Classification System

  3. Sequence Similarity Search (I) • Based on Pair-Wise Comparisons • Dynamic Programming Algorithms: Global Similarity (Needleman-Wunsch); Local Similarity (Smith-Waterman) • Heuristic Algorithms: FASTA (Based on K-Tuples, 2 Amino Acids); BLAST (Triples of Conserved Amino Acids); Gapped-BLAST (Allows Gaps in Segment Pairs); PHI-BLAST (Pattern-Hit Initiated Search); PSI-BLAST (Position-Specific Iterated Search)
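
The two dynamic programming algorithms named on this slide can be sketched as follows. This is a minimal illustration, not the workshop's implementation: it returns only the alignment score, and the match/mismatch/gap values (2, -1, -2) and the linear gap penalty are assumptions chosen for readability rather than a real substitution matrix such as PAM250 or BLOSUM62.

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score: both sequences are aligned end to end."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    # Leading gaps are penalized in a global alignment.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,     # gap in b
                          F[i][j - 1] + gap)     # gap in a
    return F[-1][-1]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score: best-scoring pair of subsequences."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The floor at 0 lets an alignment restart anywhere.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The only structural differences are the boundary initialization and the floor at zero: the global version pays for every unaligned residue, while the local version ignores poorly matching flanks.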

  4. Sequence Similarity Search (II) • Similarity Search Parameters • Scoring Matrices (Based on Conserved Amino Acid Substitution): Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity); Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM62 • Gap Penalty • Search Time Comparisons: Smith-Waterman, 10 Min; FASTA, 2 Min; BLAST, 20 Sec

  5. Feature Representation • Features of Amino Acids: Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features • Alternative Amino Acids: Classification of Amino Acids To Capture Different Features of Amino Acid Residues

  6. Substitution Matrix • Likelihood of One Amino Acid Mutating into Another Over Evolutionary Time • Negative Score: Unlikely Substitution (e.g., Gly/Trp, -7) • Positive Score: Conservative Substitution (e.g., Lys/Arg, +3) • Highest Scores for Identical Matches of Rare Amino Acids (e.g., Trp, Cys)

  7. Secondary Structure Features • α Helix: Patterns of Hydrophobic Residue Conservation Showing an i, i+3, i+4, i+7 Pattern Are Highly Indicative of an α Helix (Amphipathic) • β Strands: Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions i, i+2, i+4, i+6

  8. BLAST (Basic Local Alignment Search Tool) • Extremely fast • Robust • Most frequently used • It finds very short segment pairs ("seeds") between the query and the database sequence. • These seeds are then extended in both directions until the maximum possible score for extensions of this particular seed is reached.
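
The seed-and-extend idea described on this slide can be sketched as below. This is a simplified illustration under stated assumptions, not BLAST itself: seeds here are exact 3-letter word matches (real BLAST also accepts similar "neighborhood" words scored with a substitution matrix), extension is ungapped, and the function names, the +1/-1 scoring, and the `x_drop` cutoff value are all hypothetical choices for this example.

```python
def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) pairs where a w-letter word matches exactly."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions.

    Returns (score, query_start, query_end) of the best extension found.
    Extension in a direction stops once the running score falls more than
    x_drop below the best score seen in that direction.
    """
    def walk(qi, sj, step):
        score = best = 0
        best_len = steps = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            steps += 1
            if score > best:
                best, best_len = score, steps
            elif best - score > x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right, right_len = walk(i + w, j + w, 1)
    left, left_len = walk(i - 1, j - 1, -1)
    return w * match + left + right, i - left_len, i + w + right_len
```

For example, `find_seeds("ABCD", "XXABCDXX")` returns `[(0, 2), (1, 3)]`, and extending either seed recovers the full four-letter match.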

  9. BLAST Search • From BLAST Search Interface • Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment • Link to NCBI taxonomy • Link to PIRSF report • Click to see alignment • Links to iProClass and UniProtKB reports • Click to see SSEARCH alignment

  10. BLAST Result & Pairwise Alignment • BLAST Alignment

  11. How do you build a tree? • Pick sequences to align • Align them • Verify the alignment • Keep the parts that are aligned correctly • Build and evaluate a phylogenetic tree • Integrated Analysis

  12. Multiple Sequence Alignment: CLUSTALW • Pairwise alignment: calculate distance matrix (mean number of differences per residue) • Unrooted Neighbor-Joining tree: branch lengths drawn to scale • Rooted NJ tree (guide tree): root placed at a position where the means of the branch lengths on either side of the root are equal • Progressive alignment guided by the tree: alignment starts from the tips of the tree towards the root • Thompson et al., NAR 22, 4675 (1994).
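
The distance-matrix step (mean number of differences per residue) can be sketched as follows. This is a simplified illustration, not CLUSTALW's code: it assumes each pair of sequences has already been aligned to equal length with `-` as the gap character, and it skips gapped positions, which is one common convention and an assumption here.

```python
def p_distance(a, b):
    """Fraction of differing residues over positions where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Symmetric matrix of pairwise distances for a list of aligned sequences."""
    n = len(aligned)
    return [[p_distance(aligned[i], aligned[j]) for j in range(n)]
            for i in range(n)]
```

A matrix like this is the input to neighbor joining, which produces the guide tree that orders the progressive alignment.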

  13. PIR Multiple Alignment and Tree • From Text/Sequence Search Result or CLUSTAL W Alignment Interface

  14. PIR Pattern Search • Signature Patterns for Functional Motifs • From Text/Sequence Search Result or Pattern Search Interface • A: Alignment of a region involved in catalytic activity; create pattern and search in database: P-[IV]-[WY]-x(3)-H-[MR]-V-x(3,4)-Q-x(1,2)-D-x(4,5)-G-A-N • B: Test sequence against PROSITE database (O05689)
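
A PROSITE-style pattern such as the one on this slide maps directly onto a regular expression. The sketch below is a minimal converter, assuming only the common syntax elements (`x` wildcards, `[...]` allowed sets, `{...}` excluded sets, `(n)` and `(n,m)` repeats, `<` and `>` anchors); the function names are hypothetical, and this is not the PIR pattern search implementation.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.groups()
        if core in ("x", "X"):               # any residue
            core = "."
        elif core.startswith("{"):           # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        regex.append(prefix + core + suffix)
    return "".join(regex)


def find_pattern(pattern, sequence):
    """Return (start, end, matched_text) for each non-overlapping match."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.end(), m.group()) for m in compiled.finditer(sequence)]
```

For the slide's pattern, `prosite_to_regex` yields `P[IV][WY].{3}H[MR]V.{3,4}Q.{1,2}D.{4,5}GAN`, which can then be run over each database sequence.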

  15. Pattern Search Result (I) • One Query Pattern Against UniProtKB or UniRef100 DBs • Display the query pattern • Indicate pattern sequence region(s) • Links to iProClass and UniProtKB reports • Link to NCBI taxonomy • Link to PIRSF report

  16. Pattern Search Result (II) • One Query Sequence Against PROSITE Pattern Database

  17. Profile Method • Profile: A Table of Scores to Express Family Consensus Derived from Multiple Sequence Alignments • Number of Rows = Number of Aligned Positions • Each row contains a score for the alignment with each possible residue. • Profile Searching: Summation of Scores for Each Amino Acid Residue along Query Sequence; Higher Match Values at Conserved Positions
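
The profile idea above (one row of per-residue scores for each aligned position, summed along the query) can be sketched as follows. The scoring scheme is an assumption made for illustration: log-odds against a uniform background with a pseudocount of 1, which is simpler than the weighting used by PROSITE profiles. The toy alignment in the usage note is made up.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build one row of log-odds scores per aligned position (gaps are skipped)."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        row = {aa: math.log2((column.count(aa) + pseudocount) / total / background)
               for aa in AMINO_ACIDS}
        profile.append(row)
    return profile


def score_window(profile, segment):
    """Sum the profile scores of each residue in a segment of profile length."""
    return sum(row.get(aa, 0.0) for row, aa in zip(profile, segment))


def best_match(profile, query):
    """Slide the profile along the query; return (best_score, start_position).

    Assumes the query is at least as long as the profile.
    """
    n = len(profile)
    return max((score_window(profile, query[i:i + n]), i)
               for i in range(len(query) - n + 1))
```

With the toy alignment `["CAHC", "CGHC", "CSHC"]`, the fully conserved columns (C, H, C) receive higher match scores than the variable second column, so `best_match` on `"MKCAHCLL"` picks the window starting at position 2.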

  18. Prosite PS50157 profile for Zinc finger C2H2

  19. PIRSF Scan • Search One Query Protein Against All the Full-length and Domain HMMs for the Fully Curated PIRSFs by HMMER • The matched regions and statistics will be displayed. • Shows the PIRSF that the query belongs to • Statistical data for all domains • Statistical data per domain • Alignment with consensus sequence

  20. Lab Section

  21. Rat eye lens phosphoproteomics in normal and cataract (Kamei et al., Biol. Pharm. Bull., 2005) • [Figure: 2D gels, Normal and Cataract panels; horizontal axis pI from (-) to (+), vertical axis Mw] • More phosphorylated spots in cataract sample. • Digestion and MS from Spot 16 gave these peptides: MDVTIQHPWFKR, ALGPFYPSR, CSLSADGMLTFSG, YRLPSNVDQSALS • We want to identify the protein(s) that contain these peptides: use Peptide Search
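
Conceptually, a peptide search is an exact substring match of each peptide against every sequence in the database. The sketch below illustrates that idea only; the function name is hypothetical, the real PIR Peptide Search runs against UniProtKB with its own indexing, and the protein sequences in the usage note are made-up toy strings, not database entries.

```python
def peptide_search(peptides, proteins):
    """Map each protein ID to the (peptide, start) hits found by exact matching.

    `proteins` is a dict of {identifier: sequence}. Proteins with no hit are omitted.
    """
    hits = {}
    for protein_id, sequence in proteins.items():
        found = []
        for peptide in peptides:
            start = sequence.find(peptide)
            while start != -1:                       # report every occurrence
                found.append((peptide, start))
                start = sequence.find(peptide, start + 1)
        if found:
            hits[protein_id] = found
    return hits
```

Usage with the Spot 16 peptides and two toy sequences:

```python
peptides = ["MDVTIQHPWFKR", "ALGPFYPSR", "CSLSADGMLTFSG", "YRLPSNVDQSALS"]
toy_db = {"toy1": "MDVTIQHPWFKRALGPFYPSRLF", "toy2": "MKKLLAAA"}  # made-up sequences
print(peptide_search(peptides, toy_db))
# {'toy1': [('MDVTIQHPWFKR', 0), ('ALGPFYPSR', 12)]}
```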

  22. Peptide Search

  23. Peptide Search & Results • Species restricted search • Sorting arrows • Search in UniProtKB, 23 proteins • Links to iProClass and UniProtKB reports • Link to NCBI taxonomy • Link to PIRSF report • Matching peptide highlighted in the sequence

  24. Retrieve more sequences: Batch Retrieval Results (I) • Retrieve multiple proteins from iProClass using a specific identifier or a combination of identifiers • Provides a means to easily retrieve and analyze proteins when the identifiers come from different databases

  25. ID Mapping

  26. BLAST Similarity Search • What proteins are related to rat CRYAA? • Perform sequence similarity search with >P24623 at http://pir.georgetown.edu/pirwww/search/blast.shtml

  27. Pairwise Alignment

  28. PIR Text Search (http://pir.georgetown.edu/search/textsearch.shtml) • UniProtKB database and unique UniParc sequences • PIR protein family classification database • Let’s search for human crystallins

  29. Let’s look for crystallins which have a 3D structure • Refine your search or start over • Display PDB ID

  30. Domain Display allows simultaneous comparison of the Pfam domains present in multiple proteins • Share same domain architecture • Let’s perform a multiple alignment on the sequences containing PF00030

  31. Multiple Alignment

  32. Interactive Phylogenetic Tree and Alignment • Beta B1 and gamma crystallins share the same domains and SCOP fold and show significant sequence similarity, suggesting that they are related

  33. Pattern Search (I) • Select P07320 and perform a pattern search • Search for proteins containing this pattern (PS00225) in rat

  34. Pattern Search Result • Beta and gamma crystallins have multiple copies of this pattern
