Archives and information retrieval

PowerPoint Slideshow about ' Archives and Information Retrieval' - saddam



Archives and Information Retrieval

CSC 487/687 Computing for Bioinformatics


Introduction

  • Learning objectives:

    • Understand how biological data are arranged in the public databases.

    • Learn the information retrieval skills needed to make effective use of the databases.

    • Become familiar with basic database operations.

    • Learn how to retrieve information on a particular subject in the literature.


Primary public domain bioinformatics servers

  • Public domain bioinformatics facilities:

    • Genome Net (KEGG & DDBJ), Japan: databases and analysis tools

    • European Bioinformatics Institute (EBI), United Kingdom: databases and analysis tools

    • National Center for Biotechnology Information (NCBI), United States: databases and analysis tools


The Archives

  • The archives hold massive amounts of biological experimental data

  • These biological information databases can be classified into two types

    • First-level databases

      • Built from raw data obtained directly from experiments, with only simple organization

    • Second-level databases

      • Reorganized further in order to serve specific goals


The Archives

  • Some examples:

    • The first level databases

      • Nucleic acid sequence databases: GenBank, EMBL Data Library, DNA Data Bank of Japan (DDBJ)

      • Protein sequence database: SWISS-PROT, PIR

      • Protein structure database: PDB

    • The second level databases

      • GDB

      • TRANSFAC

      • SCOP


Nucleic acid sequence databases

  • International Nucleotide Sequence Database Collaboration

    • NCBI (GenBank) – USA (1982)

    • EMBL (Data Library) – Europe (1982)

    • DDBJ (DNA Data Bank of Japan) – Japan (1988)


NCBI

  • Established in the USA in 1988 as a national resource for molecular biology information

  • Creates public databases

  • Conducts research in computational biology

  • Develops software tools for analyzing genome data

  • Disseminates biomedical information


Nucleic acid sequence databases

  • GenBank

    • Stores nucleic acid sequences and protein sequences

    • Each entry carries literature references

    • Each entry carries biological annotation

    • A new release is made every two months

    • Entries are searched through the GenBank information retrieval system
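
To make the combination of sequence, literature, and annotation in one entry more concrete, the sketch below reads a GenBank-style flat-file record. It assumes the usual flat-file layout (top-level keywords starting in column 1, continuation lines indented 12 spaces, sequence after `ORIGIN`, record ended by `//`). The record `TOY0001` is made up for illustration and `parse_genbank_record` is a hypothetical helper, not part of any NCBI tool.

```python
def parse_genbank_record(text):
    """Split one GenBank-style flat-file record into top-level fields and sequence."""
    fields = {}
    sequence_parts = []
    current_key = None
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):              # record terminator
            break
        if in_sequence:
            # Sequence lines start with a position number, then blocks of bases.
            sequence_parts.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_sequence = True
        elif line and not line[0].isspace():
            # Top-level keyword in columns 1-12, value from column 13 onward.
            current_key = line[:12].strip()
            fields[current_key] = line[12:].strip()
        elif current_key and line.startswith(" " * 12):
            # Continuation of the previous top-level field.
            fields[current_key] += " " + line.strip()
        else:
            # Sub-keyword or blank line: this sketch skips it and its continuations.
            current_key = None
    return fields, "".join(sequence_parts)


# Made-up record, for illustration only (not a real GenBank entry).
SAMPLE_RECORD = """\
LOCUS       TOY0001                   20 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Made-up record used only to illustrate the flat-file
            layout.
ACCESSION   TOY0001
ORIGIN
        1 atgaccatga ttacgccaag
//
"""

fields, sequence = parse_genbank_record(SAMPLE_RECORD)
print(fields["ACCESSION"], len(sequence))
```

Real records also carry REFERENCE blocks and a FEATURES table; for actual work an established parser would be a safer choice than this sketch.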


NCBI ENTREZ

  • A platform that provides access and links to databases with biological information

  • Databases linked through ENTREZ: PubMed, MedLine, GenBank, Protein databases, Genomes, PopSet, Taxonomy, OMIM
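
Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below only builds an `esearch` URL for a literature search and does not contact the server. The helper name `esearch_url` and the example query are assumptions made for illustration; `db`, `term`, `retmax`, and `retmode` are standard `esearch` parameters.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an E-utilities esearch URL (hypothetical helper, no network access)."""
    params = {
        "db": db,            # Entrez database to search, e.g. "pubmed"
        "term": term,        # query, optionally with field tags such as [Title]
        "retmax": retmax,    # maximum number of record IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query (an assumption): PubMed records with this phrase in the title.
url = esearch_url("pubmed", "sequence databases[Title]", retmax=5)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching PubMed IDs; NCBI asks heavy users to rate-limit requests and identify themselves.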


NCBI ENTREZ

  • MedLine: literature database

  • OMIM: database of human genes and genetic disorders

  • GenBank: database of all publicly available DNA sequences

  • Protein databases: database of amino acid sequences from SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq

  • Genomes: database of genomes from organisms and viruses

  • PopSet: database of DNA sequences that have been collected to analyze the evolutionary relatedness of a population

  • Taxonomy: database of names of organisms with sequences in GenBank or Prot
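
Once a record identifier is known, a record can be pulled from one of the sequence databases with the E-utilities `efetch` endpoint. The sketch below builds such a URL without sending it. The label-to-`db` mapping and the helper `efetch_url` are assumptions for illustration and cover only the two sequence databases; `U49845` is the accession used in NCBI's own sample GenBank record.

```python
from urllib.parse import urlencode

# Assumed mapping from the database labels used above to E-utilities db names.
ENTREZ_SEQUENCE_DB = {
    "GenBank": "nuccore",
    "Protein": "protein",
}


def efetch_url(label, record_id, rettype="fasta"):
    """Build an efetch URL for a sequence record (hypothetical helper)."""
    params = {
        "db": ENTREZ_SEQUENCE_DB[label],
        "id": record_id,
        "rettype": rettype,   # "fasta" for FASTA, "gb" for the GenBank flat file
        "retmode": "text",
    }
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?" + urlencode(params)


print(efetch_url("GenBank", "U49845", rettype="gb"))
```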


PubMed Central (PMC)

  • The U.S. National Library of Medicine's digital archive of life sciences journal literature

  • Access to the full text of articles in PMC is free, except where a journal requires a subscription for access to recent articles


OMIM: Online Mendelian Inheritance in Man

  • A catalog of human genes and the diseases linked to them

  • Begun by Victor A. McKusick at Johns Hopkins University

  • A good place to start when you want to research a certain disease or biological molecule

  • This database is cross-referenced to PubMed and other NCBI-based databases


How to submit sequence data to GenBank

  • BankIt, a web-based submission interface

    • http://www.ncbi.nlm.nih.gov/BankIt

  • Sequin, a stand-alone submission program

    • http://www.ncbi.nlm.nih.gov/Sequin



Protein databases

  • The Protein Information Resource (PIR) was established in 1984 by the National Biomedical Research Foundation (NBRF).

  • The PIR Protein Sequence Database evolved from the original NBRF Protein Sequence Database, which was developed over 20 years.

  • PIR-International is a collaboration between NBRF, the Munich Information Center for Protein Sequences (MIPS), and the Japan International Protein Information Database (JIPID).

  • Together, the partners collect and publish what is now the oldest and largest database of biomolecular sequence, source, literature, and feature information.


PIR

  • PIR-International Protein Sequence Database: an annotated, non-redundant and cross-referenced database of protein sequences.

  • PIR Alignment Database, PIR-ALN: contains sequence alignments of superfamilies, families and homology domains produced from information in the Protein Sequence Database.

  • FAMBASE Family Database: a searchable database containing a single representative sequence from each protein family.

  • RESID Database of Amino Acid Modifications: based on feature information in the Protein Sequence Database.


PIR

  • http://www-nbrf.georgetown.edu/pir/


SWISS-PROT

  • http://www.ebi.ac.uk/swissprot/

  • A well-annotated protein sequence database established in 1986.

  • It is maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).

  • A curated protein sequence database that provides a high level of annotation, a minimal level of redundancy, and a high level of integration with other databases.

  • Note: UniProtKB/TrEMBL and UniProtKB/Swiss-Prot have been incorporated into UniProt (the Universal Protein Resource), a one-stop shop allowing easy access to all publicly available information about protein sequences.


PROSITE

  • http://ca.expasy.org/prosite/

  • A method for determining the function of uncharacterized proteins translated from genomic or cDNA sequences.

    • A database of biologically significant sites

    • Its patterns are formulated so that, with appropriate computational tools, a new sequence can be rapidly and reliably assigned to the known protein family (if any) to which it belongs.
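
As an illustration of how such a pattern can be applied computationally, the sketch below translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. It handles only the basic syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` as sequence-end anchors). The function name and the test sequence are made up for illustration; the pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regex string (sketch)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional (n) or (n,m) repeat.
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":                    # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]" # any residue except those listed
        # "[ST]" and single letters are already valid regex as written.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
toy_sequence = "MKNGSAANPTQ"               # made-up sequence for illustration
hits = [m.start() for m in re.finditer(regex, toy_sequence)]
print(regex, hits)
```

Note that `re.finditer` reports only non-overlapping matches; a scanner that needs overlapping sites would wrap the expression in a lookahead.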


PDB

  • http://www.rcsb.org/pdb/

  • The single international repository for public data on the 3-dimensional structures of biological macromolecules

  • Established at Brookhaven National Laboratory in the United States

  • The contents are primarily experimental data derived from X-ray crystallography and NMR experiments

  • RasMol can display the 3D structure of a biological macromolecule from a PDB file
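
A viewer such as RasMol works from the atomic coordinates stored in the PDB file. The sketch below shows how those coordinates can be read from the fixed-width `ATOM`/`HETATM` records of the legacy PDB text format. The two atom lines and their coordinates are made up for illustration, and the helper names are hypothetical.

```python
def parse_atom_line(line):
    """Extract selected fields from one fixed-width ATOM/HETATM record."""
    return {
        "serial": int(line[6:11]),        # columns 7-11: atom serial number
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],                # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        "x": float(line[30:38]),          # columns 31-38: x coordinate (angstroms)
        "y": float(line[38:46]),          # columns 39-46: y coordinate
        "z": float(line[46:54]),          # columns 47-54: z coordinate
    }


def read_atoms(pdb_text):
    """Return parsed atoms, ignoring every other record type."""
    return [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


# Made-up coordinates, for illustration only (not taken from a real PDB entry).
SAMPLE_PDB = """\
HEADER    MADE-UP EXAMPLE, NOT A REAL PDB ENTRY
ATOM      1  N   MET A   1      27.340  24.430   2.614
ATOM      2  CA  MET A   1      26.266  25.413   2.842
END
"""

for atom in read_atoms(SAMPLE_PDB):
    print(atom["name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in real files can run together without a separating space.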

