Biological databases
1 / 41

Biological Databases - PowerPoint PPT Presentation

  • Updated On :

Biological Databases. Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado . What can be discovered about a gene by a database search?. A little or a lot, depending on the gene

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Biological Databases' - Gabriel

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Biological databases l.jpg
Biological Databases

Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado

What can be discovered about a gene by a database search l.jpg
What can be discovered about a gene by a database search?

  • A little or a lot, depending on the gene

    • Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.

    • Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.

    • Structural information: associated protein structures, fold types, structural domains

    • Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.

    • Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases

Using a database l.jpg
Using a database

  • How to get information out of a database:

    • Browsing: no targeted information to retrieve

    • Search: looking for particular information

  • Searching a database:

    • Must have a key that identifies the element(s) of the database that are of interest.

      • Name of gene

      • Sequence of gene

      • Other information

    • Helps to have particular informational goals

Searching for information about genes and their products l.jpg
Searching for informationabout genes and their products

  • Gene and gene product databases are often organized by sequence

    • Genomic sequence encodes all traits of an organism.

    • Gene products are uniquely described by their sequences.

    • Similar sequences among biomolecules indicates both similar function and an evolutionary relationship

  • Macromolecular sequences provide biologically meaningful keys for searching databases

Searching sequence databases l.jpg
Searching sequence databases

  • Start from sequence, find information about it

  • Many kinds of input sequences

    • Could be amino acid or nucleotide sequence

    • Genomic or mRNA/cDNA or protein sequence

    • Complete or fragmentary sequences

  • Exact matches are rare (even uninteresting in many cases), so often goal is to retrieve a set of similar sequences.

    • Both small (mutations) and large (required for function) differences within “similar” can be interesting.

What might we want to know about a sequence l.jpg
What might we want to know about a sequence?

  • Is this sequence similar to any known genes? How close is the best match? Significance?

  • What do we know about that gene?

    • Genomic (chromosomal location, allelic information, regulatory regions, etc.)

    • Structural (known structure? structural domains? etc.)

    • Functional (molecular, cellular & disease)

  • Evolutionary information:

    • Is this gene found in other organisms?

    • What is its taxonomic tree?

Ncbi and entrez8 l.jpg
NCBI and Entrez

  • One of the most useful and comprehensive sources of databases is the NCBI, part of the National Library of Medicine.

  • NCBI provides interesting summaries, browsers for genome data, and search tools

  • Entrez is their database search interface

  • Can search on gene names, sequences, chromosomal location, diseases, keywords, ...

Blast searching with a sequence l.jpg
BLAST: Searching with a sequence

  • Goals is to find other sequences that are more similar to the query than would be expected by chance (and therefore are homologous).

  • Can start with nucleotide or amino acid sequence, and search for either (or both)

  • Many options

    • E.g. ignore low information (repetitive) sequence, set significance critical value

    • Defaults are not always appropriate: READ THE NCBI EDUCATION PAGES!

Slide12 l.jpg

Distant hit human sorbitol dehydrogenase l.jpg
Distant hit:Human sorbitol dehydrogenase

Taxonomy report l.jpg
Taxonomy report

(link from “Results of BLAST” page)

What did we just do l.jpg
What did we just do?

  • Identify loci (genes) associated with the sequence. Input was Alcohol Dehydrogenase

  • For each particular “hit”, we can look at that sequence and its alignment in more detail.

  • See similar sequences, and the organisms in which they are found.

  • But there’s much more that can be found on these genes, even just inside NCBI…

Ncbi is not all there is l.jpg
NCBI is not all there is...

  • Links to non-NCBI databases

    • Reactome & KEGG for pathways

    • HGNC for nomenclature

    • UCSC Human Genome Browser

  • Other important gene/protein resources not linked to:

    • UniProt (most carefully annotated)

    • PDB (main macromolecular structure repository)

  • Other key biological data sources

    • Gene Ontology/Open Biological Ontologies

    • Enzyme

  • Scientific society:

  • Journals, Conferences…

Gene names harder than you think l.jpg
Gene Names: Harder than you think…

Take home messages l.jpg
Take home messages

  • There are a lot of molecular biology databases, containing a lot of valuable information

  • Not even the best databases have everything (or the best of everything)

  • These databases are moderately well cross-linked, and there are “linker” databases

  • Sequence is a good identifier, maybe even better than gene name!