
Sequence Based Analysis Tutorial

Learn about sequence retrieval, similarity search, classification methods, and integrated bioinformatics systems for protein analysis.


Presentation Transcript


  1. Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at Georgetown University Medical Center

  2. Retrieval, Sequence Search & Classification Methods • Retrieve protein info by text / UID • Sequence Similarity Search • BLAST, FASTA, Dynamic Programming • Family Classification • Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks • Integrated Search and Classification System

  3. Sequence Similarity Search • Based on Pair-Wise Comparisons • Dynamic Programming Algorithms • Global Similarity: Needleman-Wunsch • Local Similarity: Smith-Waterman • Heuristic Algorithms • FASTA: Based on K-Tuples (2-Amino Acid) • BLAST: Triples of Conserved Amino Acids • Gapped-BLAST: Allow Gaps in Segment Pairs • PHI-BLAST: Pattern-Hit Initiated Search • PSI-BLAST: Position-Specific Iterated Search
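
The two dynamic programming approaches named on this slide can be sketched as follows. This is a minimal illustration, not a production aligner: it assumes a flat match/mismatch score and a linear gap penalty (the `match`, `mismatch`, and `gap` values are made up for the example), where a real search would use a substitution matrix such as BLOSUM 62. The only difference between the two functions is that the local version floors every cell at zero and reports the best cell anywhere in the table.

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best global alignment score: both sequences are aligned end to end."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    # Leading gaps are charged in a global alignment.
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H[-1][-1]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score: the best-matching pair of substrings."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment restart anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

For two sequences that share only a short core, such as `"ACDEFG"` and `"XXCDEYY"`, the local score reflects just the shared `CDE` segment, while the global score is also charged for the unmatched flanks.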

  4. Sequence Similarity Search • Similarity Search Parameters • Scoring Matrices – Based on Conserved Amino Acid Substitution • Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity) • Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM 62 • Gap Penalty • Search Time Comparisons • Smith-Waterman: 10 Min • FASTA: 2 Min • BLAST: 20 Sec

  5. Feature Representation • Features: Residue Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features • Alternative Alphabets: Classification of Amino Acids To Capture Different Features of Amino Acid Residues
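
An alternative alphabet can be illustrated by recoding each residue as the class it belongs to. The grouping below is an assumption made for this sketch (one of many possible schemes, and not necessarily the one used in the talk); the class letters and the `recode` helper are hypothetical names.

```python
# Assumption: an illustrative grouping by side-chain chemistry.
GROUPS = {
    "h": "AVLIMC",  # hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KR",      # positively charged
    "-": "DE",      # negatively charged
    "s": "GP",      # special conformational behaviour
}

# Invert the table so each residue maps to its class letter.
RESIDUE_TO_CLASS = {
    residue: label for label, residues in GROUPS.items() for residue in residues
}


def recode(sequence):
    """Rewrite a protein sequence in the reduced alphabet ('?' if unknown)."""
    return "".join(RESIDUE_TO_CLASS.get(r, "?") for r in sequence.upper())
```

Two sequences that differ only by substitutions within a class become identical after recoding, which is one way a reduced alphabet exposes shared physicochemical features.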

  6. Substitution Matrix • Likelihood of One Amino Acid Mutated into Another Over Evolutionary Time • Negative Score: Unlikely to Happen (e.g., Gly/Trp, -7) • Positive Score: Conservative Substitution (e.g., Lys/Arg, +3) • High Score for Identical Matches: Rare Amino Acids (e.g., Trp, Cys)
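
Matrix entries of this kind are commonly expressed as log-odds scores: the observed frequency of an aligned residue pair is compared with the frequency expected by chance. The sketch below assumes that formulation; the `scale` factor and any frequencies passed in are illustrative, not values taken from PAM or BLOSUM.

```python
import math


def log_odds_score(pair_freq, freq_a, freq_b, scale=2.0):
    """Rounded log-odds score for one aligned residue pair.

    pair_freq: observed frequency of the pair in trusted alignments
    freq_a, freq_b: background frequencies of the two residues
    scale: assumption, a multiplier applied before rounding to an integer
    """
    expected = freq_a * freq_b  # chance of seeing the pair if residues were independent
    return round(scale * math.log2(pair_freq / expected))
```

A pair seen more often than chance predicts scores positive, and a pair seen less often scores negative. Under this formulation a rare residue has a small expected self-pair frequency, which is consistent with the high identity scores noted for Trp and Cys.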

  7. BLAST (Basic Local Alignment Search Tool) • Searches a sequence against the database • Extremely fast • Robust • Most widely used • It finds very short segment pairs between the query and sequences in the database. These segments are then extended in both directions until the maximum possible score for that segment is reached.
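
The seed-and-extend idea described on this slide can be sketched as below. Several simplifications are assumed: seeds are exact word matches (BLAST itself also accepts similar words that score above a threshold under a substitution matrix), scoring is a flat match/mismatch, extension is ungapped, and extension stops once the running score falls `xdrop` below the best seen. The function names and parameter values are hypothetical.

```python
from collections import defaultdict


def extend_seed(query, subject, qi, si, w, match=1, mismatch=-1, xdrop=3):
    """Extend an exact word hit without gaps in both directions.

    Returns (score, query_start, subject_start, length) of the best segment.
    """
    score = best = w * match
    best_end = qi + w
    i, j = qi + w, si + w
    # Extend to the right until the score falls xdrop below the best seen.
    while i < len(query) and j < len(subject) and best - score < xdrop:
        score += match if query[i] == subject[j] else mismatch
        i += 1
        j += 1
        if score > best:
            best, best_end = score, i
    # Extend to the left, starting again from the best score so far.
    score = best
    best_start = qi
    i, j = qi - 1, si - 1
    while i >= 0 and j >= 0 and best - score < xdrop:
        score += match if query[i] == subject[j] else mismatch
        if score > best:
            best, best_start = score, i
        i -= 1
        j -= 1
    return best, best_start, si - (qi - best_start), best_end - best_start


def seed_and_extend(query, subject, w=3):
    """Return the best-scoring ungapped segment pair, or None if no seed hits."""
    # Index every word of length w in the subject by its start position.
    index = defaultdict(list)
    for s in range(len(subject) - w + 1):
        index[subject[s:s + w]].append(s)
    hits = [
        extend_seed(query, subject, q, s, w)
        for q in range(len(query) - w + 1)
        for s in index.get(query[q:q + w], [])
    ]
    return max(hits) if hits else None
```

Because only positions that share a word are ever examined, most of the dynamic programming table is never visited, which is where the speed advantage over Smith-Waterman comes from.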

  8. BLAST Search • From BLAST Search Interface • Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment

  9. SSEARCH Alignment BLAST Alignment BLAST/SSEARCH Results

  10. Family Classification Methods • Based on Family Information • ClustalW Multiple Sequence Alignment • ProSite Pattern Search • Profile Search • Hidden Markov Models (HMMs) • Neural Networks • Integrated Analysis

  11. Multiple Sequence Alignment • ClustalW • Progressive Pairwise Approach • Based on Exhaustive Pairwise Alignments • Neighbor Joining • Joining Order Corresponding to a Tree • Alignment Varies • Dependent on Joining Order
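
How a joining order falls out of the pairwise comparisons can be sketched with the pair-selection step of neighbor joining. This is only the first step of the procedure, and it assumes equal-length toy sequences so that a simple count of differing positions can stand in for the distances that ClustalW derives from real pairwise alignments. The function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Toy distance: number of differing positions (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("toy distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def first_pair_to_join(seqs):
    """Pick the pair that neighbor joining would merge first.

    seqs: dict mapping a sequence name to its residues.
    """
    names = list(seqs)
    n = len(names)
    # Exhaustive pairwise distances.
    d = {(i, j): hamming(seqs[i], seqs[j]) for i in names for j in names}
    total = {i: sum(d[i, k] for k in names) for i in names}
    # Neighbor-joining criterion: a small distance to each other,
    # corrected for how far each sequence is from everything else.
    q = {
        (i, j): (n - 2) * d[i, j] - total[i] - total[j]
        for i, j in combinations(names, 2)
    }
    return min(q, key=q.get)
```

Repeating this step on the reduced distance matrix yields the full joining order, and a progressive aligner then merges sequences and sub-alignments in that order, which is why the final alignment depends on it.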

  12. How do you build a tree? • Pick sequences to align • Align them • Verify the alignment • Keep the parts that are aligned correctly • Build and evaluate a phylogenetic tree

  13. Multiple Alignment and Tree • From Text/Sequence Search Result or ClustalW Alignment Interface

  14. Motif Patterns (Regular Expressions) • Signature Patterns for Functional Motifs ProClass Motif Alignments

  15. PIR Pattern Search • From Text/Sequence Search Result or Pattern Search Interface • One Query Sequence Against PROSITE Pattern Database • One Query Pattern (PROSITE or User-Defined) Against Sequence DB
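
Since PROSITE patterns are essentially regular expressions in a different notation, a pattern search can be sketched by translating the pattern and scanning a sequence. The translation below covers the common elements of the syntax (`x`, `[..]`, `{..}`, repeat counts, and the `<` and `>` anchors) and is a simplified sketch rather than a complete parser. The helper names and the example sequence are made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {P} means "any residue except P"; do this before handling (n).
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x matches any residue.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- or C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]
```

For example, the N-glycosylation site pattern `N-{P}-[ST]-{P}.` translates to `N[^P][ST][^P]`. Scanning one sequence against many patterns, or one pattern against many sequences, corresponds to the two search modes listed on the slide.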

  16. Pattern Search Result (I) • One Query Sequence Against PROSITE Pattern Database

  17. Pattern Search Result (II) • One Query Pattern Against Sequence Database

  18. Profile Method • Profile: A Table of Scores to Express Family Consensus Derived from Multiple Sequence Alignments • Num of Rows = Num of Aligned Positions • Each row contains a score for the alignment with each possible residue. • Profile Searching • Summation of Scores for Each Amino Acid Residue along Query Sequence • Higher Match Values at Conserved Positions
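
A profile of this shape, with one row per aligned position and one score per residue, can be sketched as follows. The scoring is an assumption: log-odds against a uniform background with a fixed pseudocount, and an ungapped sliding-window search. Real profile methods typically use sequence weighting, substitution-matrix-based scores, and position-specific gap penalties.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one row per aligned column; each row maps residue -> score."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in ALPHABET)  # gaps are skipped
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        row = {
            r: math.log2(((counts[r] + pseudocount) / total) / background)
            for r in ALPHABET
        }
        profile.append(row)
    return profile


def best_window(profile, query):
    """Slide the profile along the query; return (score, start) of the best fit."""
    width = len(profile)
    scores = [
        (sum(row.get(res, 0.0) for row, res in zip(profile, query[s:s + width])), s)
        for s in range(len(query) - width + 1)
    ]
    return max(scores) if scores else None
```

Columns where the family agrees on one residue give that residue a high score and the others a low one, so conserved positions dominate the sum, as the last bullet on the slide notes.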

  19. PIR HMM Domain/Motif Search • From Text/Sequence Search Result or HMM Search Interface • HMMER Model Building & Sequence Search • Search One Query Protein Against All HMMs • Search One HMM Against Sequence DB

  20. HMM Search Result (I) • One Query Protein Against All Pfam HMMs

  21. HMM Search Result (II) • Search User-Built HMM Against Protein Sequence DB • Input Sequences (Optional Residue Ranges) -> Multiple Sequence Alignment -> Model Building -> HMM Search

  22. Secondary Structure Features • α Helix: Patterns of Hydrophobic Residue Conservation Showing I, I+3, I+4, I+7 Pattern Are Highly Indicative of an α Helix (Amphipathic) • β Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions I, I+2, I+4, I+6
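
The two spacing patterns on this slide can be checked mechanically. The sketch below scans a single sequence for positions where every residue at the given offsets is hydrophobic. It assumes one particular hydrophobic set, and it looks at one sequence only, whereas the slide refers to conservation across an alignment, so treat it as an illustration of the spacing and not as a structure predictor.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumption: one common grouping
HELIX_OFFSETS = (0, 3, 4, 7)   # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4, 6)  # i, i+2, i+4, i+6


def pattern_starts(sequence, offsets, hydrophobic=HYDROPHOBIC):
    """Return every start index where all offset positions are hydrophobic."""
    span = max(offsets)
    return [
        i
        for i in range(len(sequence) - span)
        if all(sequence[i + o] in hydrophobic for o in offsets)
    ]
```

A peptide such as `LKKLLKKL` matches the helix spacing at its first position but not the strand spacing, while an alternating peptide such as `VKVKVKV` does the opposite.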

  23. Integrated Bioinformatics System for Function and Pathway Discovery • Data Integration • Associative Analysis

  24. PIR-NREF iProClass Query Sequence Family Classification & Functional Analysis BLAST Search HMM Domain Search Top-Matched Superfamilies/Domains HMM Motif Search Pattern Search SignalP/TMHMM Predicted Superfamilies/Domains/Motifs/Sites/SignalPeptides/TMHs CLUSTALW SSEARCH Superfamily/Domain/Motif Alignments Family Relationships & Functional Features Analytical Pipeline

  25. Integrated Bioinformatics System • Global Bioinformatics Analysis of 1000’s of Genes and Proteins • Pathway Discovery, Target Identification

  26. Lab Section

  27. Peptide Search & Results

  28. BLAST Similarity Search

  29. BLAST Search Results

  30. Pair-Wise Alignment

  31. Multiple Sequence Alignment

  32. Pattern Search Results

  33. HMM Domain Search Result

  34. Building HMM Profile

  35. Using HMM Profile for Searching

  36. Rabbit Alpha Crystallin A Chain An iProClass View of the entry
