
Analyzing Families of Sequences

MARC: Developing Bioinformatics Programs, July 13, 2009. Alex Ropelewski (ropelews@psc.edu), Hugh Nicholas (nicholas@psc.edu), Ricardo Gonzalez Mendez (ricardo.gonzalez7@upr.edu).


Presentation Transcript


  1. Analyzing Families of Sequences MARC: Developing Bioinformatics Programs July 13, 2009 Alex Ropelewski ropelews@psc.edu Hugh Nicholas nicholas@psc.edu Ricardo Gonzalez Mendez ricardo.gonzalez7@upr.edu

  2. Bioinformatics: The interdisciplinary science of using computational approaches to analyze, classify, collect, represent, and store biological data, with the goal of accelerating and enhancing the understanding of DNA, RNA, and protein sequences.

  3. Sequence Analysis: The process of applying computational methods to a biological molecule represented as a character string. The goal is to infer information about the structure, function, or evolutionary history of the sequence.

  4. What is a Sequence? A sequence is a way to represent a protein, DNA, or RNA molecule as a character string. Example: Phospholipase A2 - Bos taurus (Bovine): MRLLVLAALLTVGAGQAGLNSRALWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPVDDLDRCCQTHDNCYKQAKKLDSCKVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNCDRNAAICFSKVPYNKEHKNLDKKNC
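
As a minimal sketch of what "a molecule as a character string" means in practice, the bovine phospholipase A2 sequence from this slide can be held in an ordinary Python string and inspected with standard string tools. The names `PLA2_BOVIN` and `composition` are illustrative choices, not part of the presentation.

```python
from collections import Counter

# Bovine phospholipase A2 from the slide, joined into one string.
PLA2_BOVIN = (
    "MRLLVLAALLTVGAGQAGLNSRALWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPV"
    "DDLDRCCQTHDNCYKQAKKLDSCKVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNC"
    "DRNAAICFSKVPYNKEHKNLDKKNC"
)


def composition(seq: str) -> Counter:
    """Count how often each one-letter residue code occurs in seq."""
    return Counter(seq)


if __name__ == "__main__":
    counts = composition(PLA2_BOVIN)
    print("length:", len(PLA2_BOVIN))
    # Show the five most frequent residues in this sequence.
    for residue, n in counts.most_common(5):
        print(residue, n)
```

Because the sequence is just a string, slicing, searching, and counting all work without any special library; BioPython (listed in the toolkit) adds biology-aware types on top of the same idea.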

  5. Representing Proteins: One-letter amino acid codes: A - Alanine; R - Arginine; N - Asparagine; D - Aspartic acid; C - Cysteine; E - Glutamic acid; Q - Glutamine; G - Glycine; H - Histidine; I - Isoleucine; L - Leucine; K - Lysine; M - Methionine; F - Phenylalanine; P - Proline; S - Serine; T - Threonine; W - Tryptophan; Y - Tyrosine; V - Valine; B - Asparagine or aspartic acid; Z - Glutamine or glutamic acid; J - Leucine or Isoleucine; X - Any Amino Acid; U - Selenocysteine; O - Pyrrolysine. Figure: oxytocin structure with residues labeled N Q P G I C L C Y. Image from Wikipedia Commons: http://en.wikipedia.org/wiki/File:Oxytocin.jpg
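
The code table on this slide maps directly onto a lookup dictionary. The sketch below is one possible way to spell out a one-letter sequence; the dictionary name `AMINO_ACIDS` and the function `spell_out` are hypothetical, and the oxytocin residue order `CYIQNCPLG` is taken from general knowledge of the peptide rather than from the slide, which only shows the letters as image labels.

```python
# One-letter codes from the slide, including the ambiguity codes B, Z, J, X
# and the rare residues U and O.
AMINO_ACIDS = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine",
    "D": "Aspartic acid", "C": "Cysteine", "E": "Glutamic acid",
    "Q": "Glutamine", "G": "Glycine", "H": "Histidine",
    "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline",
    "S": "Serine", "T": "Threonine", "W": "Tryptophan",
    "Y": "Tyrosine", "V": "Valine",
    "B": "Asparagine or aspartic acid",
    "Z": "Glutamine or glutamic acid",
    "J": "Leucine or Isoleucine",
    "X": "Any Amino Acid",
    "U": "Selenocysteine", "O": "Pyrrolysine",
}


def spell_out(seq: str) -> list[str]:
    """Translate a one-letter protein string into full residue names.

    Raises KeyError for characters that are not in the table.
    """
    return [AMINO_ACIDS[code] for code in seq.upper()]


if __name__ == "__main__":
    # Oxytocin, written N-terminus to C-terminus (assumed order).
    for name in spell_out("CYIQNCPLG"):
        print(name)
```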

  6. Why study families of sequences? Families share a common function and structure, and are related through evolution. Example shown: Aldehyde Dehydrogenase family members.

  7. The Goal CURATED FAMILY: • All related sequences sharing a common function (Homologous Sequences) • All substantial motifs • Evolutionary history • Structural information • Experimental information

  8. The Process (flowchart box labels): Structural Libraries; Evolutionary Analysis; Hidden Markov Model; Classification Libraries; Multiple Sequence Alignment; Initial Query; Profile & PSSM; Sequence Libraries; Local Patterns; Homology Modeling; CURATED DATASET.

  9. The Toolkit: GenBank, Blast, Clustalw, Meme, EMBL, Fasta, T-Coffee, Mast, UniProt, Smith-Waterman, MSA, hmmer, Pfam, Needleman-Wunsch, Probcons, Profile-ss, PDB, Figtree, Phylip, Notung, Python, BioPython, Genedoc.

  10. The Project: Part I: Submit three candidate families for your course project. Part II: Collect an initial set of sequences. Part III: Generate a multiple sequence alignment, identify patterns and motifs and use them to improve the quality of your alignment, and identify additional distantly related family members. Part IV: Integrate the sequence analysis results with the structure, function, and evolution of the family. Part V: Write a draft paper or research grant, and develop an oral presentation for a conference.

  11. Part I (process flowchart box labels): Structural Libraries; Evolutionary Analysis; Hidden Markov Model; Classification Libraries; Multiple Sequence Alignment; Initial Query; Profile & PSSM; Sequence Libraries; Local Patterns; Homology Modeling; CURATED DATASET.

  12. Part I – Selecting a Query. Learning Objectives: Teach students the ability to select an appropriate subject for experimentation. Teach students how to use PubMed to find reviews, background information, and prior work, in order to understand what is known about the subject. Teach students how to concisely summarize and properly cite prior research works.

  13. PubMed: URL: http://www.pubmed.gov/ National Library of Medicine’s database of articles published in biomedical journals. Currently contains over 18 million citations, dating from 1948. About 90% of records are English-language sources or have English abstracts. About 80% of the citations include the published abstract. About 5,200 journals. Some links to full-text articles at participating publishers’ web sites.

  14. Data in PubMed: Title of the journal article; names of the authors; abstract published with the article; MeSH (Medical Subject Headings) tags; journal source; first author affiliation; language of the article; publication type (review, letter, etc.).

  15. Simple PubMed Search (screenshot callouts): Enter Search Term; Click Go; Search Results.

  16. Basic PubMed Search • PubMed Feature Tabs: • Limits: Limit search to certain dates, languages, etc. • Preview: Allows viewing and selecting of search fields • History: Log of recent searches • Clipboard: Allows items to be temporarily saved • Details: Shows how PubMed ran the search. Screenshot callouts: Search Database Selection; Go to advanced search page; Display Format; Click on tab for all articles; Click on tab for review articles; Select to sort results; Select to save or email results; Page through results.

  17. PubMed Boolean Logic: Salmonella and Eggs; Salmonella or Eggs; Salmonella not Eggs; Salmonella and Eggs and Hamburger; Salmonella and Eggs or Hamburger; Salmonella and (Eggs or Hamburger).

  18. PubMed Advanced Search
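
The difference between the last two queries on this slide can be illustrated with Python sets, treating each search term as the set of articles that mention it. This is only a sketch: the article IDs are invented, and it assumes that operators are applied left to right unless parentheses group them, which is how PubMed's help describes its Boolean processing.

```python
# Invented article IDs for each search term (toy data, not real PubMed results).
salmonella = {1, 2, 3, 4}
eggs = {2, 3, 5}
hamburger = {3, 4, 6}

# "and" is set intersection, "or" is union, "not" is difference.
both = salmonella & eggs                  # Salmonella and Eggs
either = salmonella | eggs                # Salmonella or Eggs
without_eggs = salmonella - eggs          # Salmonella not Eggs
all_three = salmonella & eggs & hamburger # Salmonella and Eggs and Hamburger

# Salmonella and Eggs or Hamburger, evaluated left to right (assumption):
left_to_right = (salmonella & eggs) | hamburger

# Salmonella and (Eggs or Hamburger), with explicit grouping:
grouped = salmonella & (eggs | hamburger)

if __name__ == "__main__":
    print("left to right:", sorted(left_to_right))
    print("grouped:      ", sorted(grouped))
```

In the ungrouped query, every Hamburger article is returned whether or not it mentions Salmonella; the parentheses restrict the results to Salmonella articles.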

  19. PubMed Limits

  20. MeSH tags: Medical Subject Headings. A controlled vocabulary/keyword system used to help locate appropriate articles. Articles in PubMed usually have between 5 and 15 MeSH tags associated with them. MeSH Tutorial at: http://www.nlm.nih.gov/bsd/disted/mesh

  21. MeSH Search 1) Select MeSH 2) Enter Search Term 3) Click Go 4) Select MeSH Term 5) Select Search Box

  22. MeSH Search (screenshot callouts): Search Box; Click Search PubMed.

  23. MeSH Search (screenshot callouts): Click tab to see review articles; Click tab to see all articles.

  24. MeSH Search: Multiple MeSH Terms

  25. Part II (process flowchart box labels): Evolutionary Analysis; Hidden Markov Model; Multiple Sequence Alignment; Initial Query; Profile & PSSM; Local Patterns; Structural Libraries; Homology Modeling; CURATED DATASET; Classification Libraries; Sequence Libraries.

  26. Part II - Libraries. Learning Objectives: Be able to search major libraries of biomolecules to collect sequences of interest. Understand the information contained in the major sequence, structure, and classification libraries. Understand searching methods and their limitations. Understand the effect of search parameters. Be able to select appropriate methods and parameters for a variety of sequences.

  27. Searching Sequence Libraries – Results

  28. Sequence Libraries – Results

  29. Part III (process flowchart box labels): Structural Libraries; Evolutionary Analysis; Hidden Markov Model; Classification Libraries; Initial Query; Homology Modeling; CURATED DATASET; Multiple Sequence Alignment; Sequence Libraries; Local Patterns; Profile & PSSM.

  30. Part III – Multiple Alignment. Learning Objectives: Be able to construct a biologically correct alignment for a family of sequences. Understand what makes an alignment biologically correct. Be able to construct and refine multiple sequence alignments. Be able to create abstract representations of multiple alignments and search databases with them. Be able to tie the local patterns (motifs) found back to the biology of the sequences. Understand the methods used to abstract an alignment and the advantages and disadvantages of commonly used methods. Understand the effect of search parameters. Be able to select appropriate methods and parameters for a variety of sequences.
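
One common "abstract representation" of an alignment is a position-specific scoring matrix (the Profile & PSSM box in the flowchart). The sketch below shows the general idea on a tiny invented alignment; it is not the method used by any particular tool in the toolkit. The uniform background frequency of 0.05, the pseudocount of 1, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_counts(alignment: list[str]) -> list[Counter]:
    """Count residues in each alignment column, ignoring gap characters."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    return [
        Counter(row[i] for row in alignment if row[i] != "-")
        for i in range(width)
    ]


def consensus(alignment: list[str]) -> str:
    """Most frequent residue per column ('-' if a column is all gaps)."""
    return "".join(
        counts.most_common(1)[0][0] if counts else "-"
        for counts in column_counts(alignment)
    )


def pssm(alignment: list[str], background: float = 0.05,
         pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds score of each residue at each column versus background."""
    matrix = []
    for counts in column_counts(alignment):
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        matrix.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total)
                          / background)
            for aa in ALPHABET
        })
    return matrix


if __name__ == "__main__":
    toy = ["GCYCG", "GCHCG", "GCYCA", "G-YCG"]  # invented alignment
    print(consensus(toy))
    print(round(pssm(toy)[0]["G"], 2))
```

Positive scores mark residues that occur in a column more often than the background would predict; scoring a database sequence against such a matrix is one way to search with an alignment rather than with a single query.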

  31. Multiple Sequence Alignment

  32. Aliphatic Amino Acids (I,V,L)

  33. Similarity of Amino Acids: Valine – Val – V; Leucine – Leu – L; Isoleucine – Ile – I.
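
The aliphatic group on these slides suggests a simple way to judge an alignment column: residues that differ but belong to the same similarity class can still count as conserved. The sketch below encodes only the I, V, L group named in the slides; the function names are hypothetical, and any further classes would have to be supplied by the reader.

```python
# The only similarity class named in the slides: aliphatic residues.
ALIPHATIC = frozenset("IVL")


def same_class(a: str, b: str, groups=(ALIPHATIC,)) -> bool:
    """True if two residues are identical or share a similarity class."""
    if a == b:
        return True
    return any(a in group and b in group for group in groups)


def column_is_conserved(column: str, groups=(ALIPHATIC,)) -> bool:
    """True if every non-gap residue in the column matches the first one
    by identity or by shared similarity class."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return False
    return all(same_class(residues[0], r, groups) for r in residues)


if __name__ == "__main__":
    print(column_is_conserved("IVLL"))  # all aliphatic
    print(column_is_conserved("IVDL"))  # D breaks the pattern
```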

  34. Similarity of Amino Acids

  35. Understanding Motifs: Functional Residues

  36. Substrate Binding; NAD Binding

  37. Part IV (process flowchart box labels): Hidden Markov Model; Classification Libraries; Initial Query; Profile & PSSM; Sequence Libraries; Local Patterns; Structural Libraries; Evolutionary Analysis; Homology Modeling; CURATED DATASET; Multiple Sequence Alignment.

  38. Part IV – Structure and Phylogeny. Learning Objectives: Understand and integrate the sequence analysis results with the structure and function of the protein family. Understand the evolutionary patterns of genes and species. Integrate evolutionary information with structural information to understand how the function has evolved within the protein family. Predict or design experiments to be carried out in-vitro: design drugs; mutate the proteins; mutate regulatory areas within the genome to change expression.

  39. Integrating Alignment, Motifs & Structure: Active Site

  40. Integrating Alignment, Motifs & Structure: Conserved Asn Binds Substrate

  41. Integrating Alignment, Motifs & Structure: Catalytic Thiol (Cys)

  42. Evolutionary Relationships

  43. Part V – Prepare Work for Publication. Learning Objectives: Teach students the ability to concisely summarize and properly cite relevant prior research works. Teach students the ability to concisely summarize their own research work. Teach students to revise papers based on reviewers' comments. Teach students how to write a research grant. Teach students how to prepare and give an oral research presentation.

  44. Workshop Projects, Biologists: You will be working through the same five-step project that your students will work through during your class. By the time you leave here, you should have a good start on a research publication or grant, or have ideas for in-vitro experiments.

  45. Workshop Projects, Computer Scientists: Take your favorite string matching algorithm and apply it to biological sequence data. Compare your algorithm's performance with some of the algorithms discussed in this workshop in terms of speed, selectivity, or sensitivity. Feel free to use a parallel algorithm.
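
As a possible baseline for such a comparison, here is a compact sketch of Smith-Waterman local alignment scoring, one of the algorithms listed in the toolkit. It uses a simple match/mismatch/linear-gap scheme chosen for illustration; real protein searches normally use a substitution matrix and affine gap penalties, so the parameter values here are assumptions, not recommendations.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between strings a and b.

    Uses the standard recurrence with a floor of zero, keeping only two
    rows of the dynamic programming table (score only, no traceback).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the 0 option.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared core "ACGT" is found despite unrelated flanking characters.
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

The running time grows with the product of the two sequence lengths, which is the cost a faster heuristic or a parallel implementation would be measured against.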
