Biological databases

Presentation Transcript


  1. Biological databases

  2. International genome sequencing and protein structure determination; Protein Data Bank (PDB)

  3. Sequence data = strings of letters: nucleotides (bases) Adenine (A), Cytosine (C), Guanine (G), Thymine (T) → triplet codons → genetic code → 20 amino acids (A, L, V, S, etc.)
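
The step from triplet codons to amino acids can be sketched in a few lines of Python. This is a minimal illustration that assumes two things: the standard genetic code, and an input that starts at the first base of a reading frame. The names `CODON_TABLE` and `translate` are hypothetical and do not belong to any database toolkit. Bases other than A, C, G and T raise a `KeyError`.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (first base varies slowest).
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA reading frame into one-letter amino acids, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):  # step through complete triplets only
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCTTTGGTAAGCTAA"))  # MALVS
```

The 64 codons collapse onto the 20 amino acids plus the stop signal, which is why several codons share one letter.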

  4. Three-dimensional protein structure = atomic coordinates in 3D space; conversion into metric
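
One way to read "conversion into metric" is turning raw atomic coordinates into pairwise distances. That is an interpretation; the slide does not spell it out. The sketch below computes a Euclidean distance matrix from (x, y, z) tuples and lists residue pairs within a cutoff, as in a simple contact map. The function names and the 8.0 default cutoff are illustrative assumptions.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances, in the same units as the input (e.g. angstroms)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contacts(coords, cutoff=8.0):
    """Index pairs (i, j) with i < j whose distance is at or below the cutoff."""
    d = distance_matrix(coords)
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if d[i][j] <= cutoff]


atoms = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
print(distance_matrix(atoms)[0])  # distances from the first atom: 0, 5 and 13
print(contacts(atoms))            # [(0, 1)]
```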

  5. Protein folding

  6. Data types
     • Primary data: sequence (DNA, amino acid) → primary database, e.g., DMPVERILEALAVE…
     • Secondary data: secondary protein structure, e.g., alpha-helices, beta-strands → secondary database of "motifs": regular expressions, blocks, profiles, fingerprints
     • Tertiary data: tertiary protein structure, e.g., domains, folding units → tertiary database of atomic co-ordinates
     • Interaction data: pathways and functional networks → interaction database of binary protein-protein interactions/networks
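
Motif databases often store patterns that are essentially regular expressions. The sketch below converts a pattern written in PROSITE-style syntax into a Python regex. The syntax elements are:

- `x` for any residue
- `[..]` for allowed residues
- `{..}` for forbidden residues
- `(n)` or `(n,m)` for repeats
- `<` and `>` for the sequence ends
- a trailing `.`

`prosite_to_regex` is a hypothetical, simplified helper. It does not cover every PROSITE feature; for example, `>` inside brackets is not supported. The example pattern is written in the style of the C2H2 zinc-finger signature.

```python
import re

# One pattern element: optional '<', a residue spec, optional repeat count, optional '>'
ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    out = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, residue, low, high, c_term = m.groups()
        if residue == "x":
            residue = "."                            # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"     # any residue except these
        if low:
            residue += "{" + low + ("," + high if high else "") + "}"
        out.append(("^" if n_term else "") + residue + ("$" if c_term else ""))
    return "".join(out)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
print(re.search(regex, "MKCAACAAALAAAAAAAAHAAAHK") is not None)  # True
```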

  7. Primary biological databases
     • Nucleic acid: EMBL, GenBank, DDBJ (DNA Data Bank of Japan)
     • Protein: PIR, MIPS, SWISS-PROT, TrEMBL, NRL-3D

  8. International nucleotide data banks
     [Diagram: EMBL (Europe), GenBank (USA) and DDBJ (Japan), linked through an International Advisory Meeting and a Collaborative Meeting; other labels: NLM, NCBI, EBI, NIG, CIB, TrEMBL, NRDB]

  9. GenBank file format

  10. GenBank file format
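
A GenBank flat file is line-oriented:

- Top-level keywords such as LOCUS, DEFINITION and ACCESSION start in column 1, and their values start in column 13.
- Continuation lines are indented.
- The nucleotide sequence follows ORIGIN and runs until the `//` terminator.

The reader below is a hypothetical sketch of that layout only. It skips the FEATURES table and does not separate the sub-keyword blocks under SOURCE or REFERENCE. For real records, a maintained parser such as Biopython's `SeqIO` is the safer choice.

```python
def parse_genbank(text: str) -> dict:
    """Minimal GenBank flat-file reader: top-level keywords plus the ORIGIN sequence."""
    record, sequence = {}, []
    key, in_origin = None, False
    for line in text.splitlines():
        if line.startswith("//"):               # end-of-record marker
            break
        if in_origin:                           # e.g. "        1 atggctttgg taagctaaaa"
            sequence.extend(line.split()[1:])   # drop the leading position number
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line[:1].isalpha():                # keyword in column 1, value from column 13
            key = line[:12].strip()
            record[key] = line[12:].strip()
        elif key and key != "FEATURES" and line.startswith(" " * 12):
            record[key] += " " + line.strip()   # continuation of the previous keyword
    record["SEQUENCE"] = "".join(sequence).upper()
    return record
```

The returned dictionary maps each top-level keyword to its text, plus a `SEQUENCE` entry holding the bases in upper case.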

  11. Swiss-Prot

  12. SWISS-PROT file format

  13. SWISS-PROT file format

  14. SWISS-PROT file format

  15. SWISS-PROT file format
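
The SWISS-PROT flat-file layout works as follows:

- Each line starts with a two-letter line code (ID, AC, DE, OS, SQ, …), and its data begins in column 6.
- The sequence follows the SQ line in blocks of ten residues.
- `//` closes the entry.

The hypothetical reader below simply groups lines by their code and joins the sequence. The entry used in the usage example is synthetic, and its accession is invented.

```python
from collections import defaultdict


def parse_swissprot(text: str):
    """Group a SWISS-PROT-style entry by two-letter line code; return (fields, sequence)."""
    fields = defaultdict(list)
    sequence = []
    in_sequence = False
    for line in text.splitlines():
        code = line[:2]
        if code == "//":                      # end of entry
            break
        if in_sequence:                       # e.g. "     MKTAYIAKQR QISFVKSHFS"
            sequence.append(line.replace(" ", ""))
        else:
            fields[code].append(line[5:].strip())  # data starts in column 6
            if code == "SQ":                  # sequence lines follow the SQ header
                in_sequence = True
    return dict(fields), "".join(sequence)


entry = "\n".join([
    "ID   TEST_HUMAN              Reviewed;          20 AA.",
    "AC   P00000;",
    "DE   RecName: Full=Hypothetical test protein;",
    "SQ   SEQUENCE   20 AA;",
    "     MKTAYIAKQR QISFVKSHFS",
    "//",
])
fields, seq = parse_swissprot(entry)
print(fields["AC"], seq)  # ['P00000;'] MKTAYIAKQRQISFVKSHFS
```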

  16. Other primary protein databases
     • TrEMBL (translated EMBL)
       • in SWISS-PROT format
       • rapid access to sequence data from genome projects
       • computer-annotated supplement to SWISS-PROT
       • translations of all coding sequences (CDS) in EMBL
     • SP-TrEMBL
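
TrEMBL entries are translations of the CDS features in EMBL entries. Before translation, the coding sequence has to be cut out of the nucleotide record using its feature location, and that step can be sketched as follows.

`extract_cds` is a hypothetical helper with a deliberately narrow scope:

- It understands only simple locations such as `12..78`, `join(12..78,134..202)` and a `complement(...)` wrapped around the whole location.
- It ignores partial markers `<` and `>`.
- Single-base locations and nested forms such as `join(complement(...),...)` are out of scope.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_cds(sequence: str, location: str) -> str:
    """Return the coding nucleotides for a simple EMBL/GenBank feature location."""
    # Spans like "12..78"; optional '<' / '>' mark partial ends and are ignored here
    spans = [(int(a), int(b)) for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]
    # Locations are 1-based and inclusive, hence the start - 1 in Python slicing
    cds = "".join(sequence[start - 1:end] for start, end in spans)
    if location.startswith("complement("):
        cds = cds.translate(COMPLEMENT)[::-1]  # reverse complement for the minus strand
    return cds


print(extract_cds("ccATGAAAtttTAGcc", "join(3..8,12..14)"))  # ATGAAATAG
print(extract_cds("CTAAAACAT", "complement(1..9)"))          # ATGTTTTAG
```

The extracted string can then be passed codon by codon through a genetic-code table to obtain the protein sequence.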

  17. Other primary protein databases
     The Protein Information Resource (PIR)
     • integrated system of protein sequence databases and derived related databases, e.g., alignment databases
     • rapid searching, comparison, and pattern matching of protein sequences
     • retrieval of descriptive, bibliographic, feature, and concurrent cross-reference information
     • aims to be comprehensive and consistently annotated
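
Comparing two protein sequences usually starts from an alignment score. As an illustration, the sketch below uses the textbook Needleman-Wunsch dynamic programme for a global alignment score. This is a swapped-in technique, not PIR's own search method. The scores (match +1, mismatch -1, gap -1) are arbitrary assumptions; real protein comparisons use substitution matrices such as BLOSUM or PAM.

```python
def global_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j] (previous DP row)
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(global_alignment_score("AC", "AGC"))  # 1: two matches and one gap
```

Only two rows of the table are kept, so memory grows with the shorter dimension rather than with the product of both lengths.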

  18. PIR: related databases
     NRL-3D Sequence-Structure Database
     • produced by PIR from sequence and annotation information extracted from three-dimensional structures in the Protein Data Bank (PDB)
     • allows keyword and similarity searches

  19. Two other useful sites
     • INFOBIOGEN: The Public Catalog of Databases, http://www.infobiogen.fr/services/dbcat/
     • KEGG: Kyoto Encyclopedia of Genes and Genomes, http://www.genome.ad.jp/kegg/
     Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to computerize current knowledge of molecular and cellular biology in terms of information pathways that consist of interacting molecules or genes, and to provide links from the gene catalogs produced by genome sequencing projects.

  20. Sequence Retrieval System (SRS)
     • A database browser that allows users to retrieve, link and access entries from all interconnected resources.
     • Users can formulate queries across a range of different database types.
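
The retrieve / link / query idea behind SRS can be illustrated with a toy in-memory index. In the index, entries are keyed by (database, identifier), and cross-references are stored in both directions. This is a hypothetical sketch and is not how SRS is implemented. The class name, method names and all identifiers in the usage example are invented.

```python
from collections import defaultdict


class LinkedIndex:
    """Toy cross-database index: retrieve, link and query entries across databases."""

    def __init__(self):
        self.entries = {}              # (database, entry_id) -> entry text
        self.links = defaultdict(set)  # (database, entry_id) -> linked keys

    def add(self, db, entry_id, text, xrefs=()):
        key = (db, entry_id)
        self.entries[key] = text
        for target in xrefs:
            # store each cross-reference in both directions
            self.links[key].add(target)
            self.links[target].add(key)

    def retrieve(self, db, entry_id):
        return self.entries.get((db, entry_id))

    def link(self, db, entry_id, target_db):
        """Keys in target_db that are cross-referenced with (db, entry_id)."""
        return sorted(k for k in self.links.get((db, entry_id), ()) if k[0] == target_db)

    def query(self, keyword, databases=None):
        """Case-insensitive keyword search over one, several, or all databases."""
        keyword = keyword.lower()
        return sorted(
            key for key, text in self.entries.items()
            if (databases is None or key[0] in databases) and keyword in text.lower()
        )


idx = LinkedIndex()
idx.add("SWISSPROT", "PROT1", "Hypothetical kinase", xrefs=[("EMBL", "NUC1"), ("PDB", "STRUCT1")])
idx.add("EMBL", "NUC1", "Hypothetical kinase mRNA")
print(idx.link("EMBL", "NUC1", "SWISSPROT"))  # [('SWISSPROT', 'PROT1')]
```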

  21. Guide to Protein Databases: http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture1/index.html http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture2/index.html With thanks to Dr Roman Laskowski.

  22. Interaction databases

  23. Biomolecule-ligand interactions
     • SRS: enzymes, reactions and metabolic pathway databases
     • Receptor-ligand database searches: relibase.ebi.ac.uk/

  24. Interaction databases
     Yeast model
     • YPD: http://www.incyte.com/sequence/proteome
       • proteome database of a model organism
       • 6142 proteins: 3430 known, 804 by similarity, 1908 unknown
       • data on protein interaction maps, derived from literature and experiment
     • Curagen: http://curatools.curagen.com
       • yeast two-hybrid screen data
       • 957 putative interactions among 1004 yeast proteins
       • Uetz et al., 2000, Nature 403, pp. 623-627
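
Binary interaction data such as yeast two-hybrid pairs are naturally stored as an undirected network. In that network, proteins are nodes and each distinct pair is one edge, which is how counts like "957 interactions among 1004 proteins" arise. The sketch below builds such a network, counts distinct interactions, and finds connected components by breadth-first search. The function names and the protein labels in the usage example are hypothetical.

```python
from collections import defaultdict, deque


def build_network(pairs):
    """Undirected interaction graph from binary (bait, prey) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def count_interactions(pairs):
    """Distinct interactions, treating (A, B) and (B, A) as the same pair."""
    return len({frozenset(pair) for pair in pairs})


def components(graph):
    """Connected components found by breadth-first search, each as a sorted list."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result.append(sorted(component))
    return result


pairs = [("A", "B"), ("B", "C"), ("D", "E"), ("B", "A")]
net = build_network(pairs)
print(len(net), count_interactions(pairs), components(net))
```

Here the repeated pair ("B", "A") adds no new edge, so five proteins share three interactions split into two components.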

  25. Protein-Protein Interaction Databases http://www.hgmp.mrc.ac.uk/GenomeWeb/prot-interaction.html

  26. Protein-Protein Interactions: DIP, Biocarta, KEGG

  27. KEGG http://www.genome.ad.jp/kegg/
     • Search database for metabolic and regulatory pathways
     • Compute KEGG: generate possible reaction pathways between two compounds, http://www.genome.ad.jp/
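
Generating possible reaction pathways between two compounds can be pictured as path enumeration in a directed graph. In that graph, each reaction is an edge from substrate to product. The sketch below is a depth-first enumeration of simple paths up to a step limit. It rests on assumptions that are not KEGG's actual algorithm:

- all reactions are treated as irreversible;
- no compound is revisited;
- the compounds "A" to "D" are hypothetical.

```python
from collections import defaultdict


def possible_pathways(reactions, source, target, max_steps=5):
    """Enumerate simple reaction paths from source to target, shortest first."""
    graph = defaultdict(list)
    for substrate, product in reactions:
        graph[substrate].append(product)
    paths = []

    def walk(node, path):
        if node == target:
            paths.append(path)
            return
        if len(path) - 1 >= max_steps:      # path already uses max_steps reactions
            return
        for nxt in graph[node]:
            if nxt not in path:             # simple paths only: no revisited compounds
                walk(nxt, path + [nxt])

    walk(source, [source])
    return sorted(paths, key=len)


reactions = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")]
for path in possible_pathways(reactions, "A", "D"):
    print(" -> ".join(path))
```

Sorting by length puts the shortest routes first; the step limit keeps enumeration bounded on large, highly connected metabolic networks.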

  28. Metabolic pathways; signal transduction pathways (species-specific, Homo sapiens shown)

  29. Biocarta pathway database http://www.biocarta.com