Biological databases l.jpg
This presentation is the property of its rightful owner.
Sponsored Links
1 / 41

Biological Databases PowerPoint PPT Presentation


  • 340 Views
  • Updated On :
  • Presentation posted in: General

Biological Databases. Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado . What can be discovered about a gene by a database search?. A little or a lot, depending on the gene

Download Presentation

Biological Databases

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Biological databases l.jpg

Biological Databases

Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado


What can be discovered about a gene by a database search l.jpg

What can be discovered about a gene by a database search?

  • A little or a lot, depending on the gene

    • Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.

    • Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.

    • Structural information: associated protein structures, fold types, structural domains

    • Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.

    • Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases


Using a database l.jpg

Using a database

  • How to get information out of a database:

    • Browsing: no targeted information to retrieve

    • Search: looking for particular information

  • Searching a database:

    • Must have a key that identifies the element(s) of the database that are of interest.

      • Name of gene

      • Sequence of gene

      • Other information

    • Helps to have particular informational goals


Searching for information about genes and their products l.jpg

Searching for informationabout genes and their products

  • Gene and gene product databases are often organized by sequence

    • Genomic sequence encodes all traits of an organism.

    • Gene products are uniquely described by their sequences.

    • Similar sequences among biomolecules indicates both similar function and an evolutionary relationship

  • Macromolecular sequences provide biologically meaningful keys for searching databases


Searching sequence databases l.jpg

Searching sequence databases

  • Start from sequence, find information about it

  • Many kinds of input sequences

    • Could be amino acid or nucleotide sequence

    • Genomic or mRNA/cDNA or protein sequence

    • Complete or fragmentary sequences

  • Exact matches are rare (even uninteresting in many cases), so often goal is to retrieve a set of similar sequences.

    • Both small (mutations) and large (required for function) differences within “similar” can be interesting.


What might we want to know about a sequence l.jpg

What might we want to know about a sequence?

  • Is this sequence similar to any known genes? How close is the best match? Significance?

  • What do we know about that gene?

    • Genomic (chromosomal location, allelic information, regulatory regions, etc.)

    • Structural (known structure? structural domains? etc.)

    • Functional (molecular, cellular & disease)

  • Evolutionary information:

    • Is this gene found in other organisms?

    • What is its taxonomic tree?


Ncbi and entrez l.jpg

NCBI and Entrez


Ncbi and entrez8 l.jpg

NCBI and Entrez

  • One of the most useful and comprehensive sources of databases is the NCBI, part of the National Library of Medicine.

  • NCBI provides interesting summaries, browsers for genome data, and search tools

  • Entrez is their database search interfacehttp://www.ncbi.nlm.nih.gov/Entrez

  • Can search on gene names, sequences, chromosomal location, diseases, keywords, ...


Blast searching with a sequence l.jpg

BLAST: Searching with a sequence

  • Goals is to find other sequences that are more similar to the query than would be expected by chance (and therefore are homologous).

  • Can start with nucleotide or amino acid sequence, and search for either (or both)

  • Many options

    • E.g. ignore low information (repetitive) sequence, set significance critical value

    • Defaults are not always appropriate: READ THE NCBI EDUCATION PAGES!


Slide12 l.jpg

  • Major choices:

    • Translation

    • Database

    • Filters

    • Restrictions

    • Matrix


Close hit rat adh alpha l.jpg

Close hit: Rat ADH alpha


Distant hit human sorbitol dehydrogenase l.jpg

Distant hit:Human sorbitol dehydrogenase


Parameters at bottom l.jpg

Parameters (at bottom!)


Slide18 l.jpg

Click on:


Taxonomy report l.jpg

Taxonomy report

(link from “Results of BLAST” page)


What did we just do l.jpg

What did we just do?

  • Identify loci (genes) associated with the sequence. Input was Alcohol Dehydrogenase

  • For each particular “hit”, we can look at that sequence and its alignment in more detail.

  • See similar sequences, and the organisms in which they are found.

  • But there’s much more that can be found on these genes, even just inside NCBI…


More from entrez gene l.jpg

More from Entrez Gene


And more l.jpg

And more…


Pubmed l.jpg

PubMed


Gene expression l.jpg

Gene Expression


Detailed expression information l.jpg

Detailed expression information


Ncbi is not all there is l.jpg

NCBI is not all there is...

  • Links to non-NCBI databases

    • Reactome & KEGG for pathways

    • HGNC for nomenclature

    • UCSC Human Genome Browser

  • Other important gene/protein resources not linked to:

    • UniProt (most carefully annotated)

    • PDB (main macromolecular structure repository)

  • Other key biological data sources

    • Gene Ontology/Open Biological Ontologies

    • Enzyme

  • Scientific society: iscb.org

  • Journals, Conferences…


Gene names harder than you think l.jpg

Gene Names: Harder than you think…


Take home messages l.jpg

Take home messages

  • There are a lot of molecular biology databases, containing a lot of valuable information

  • Not even the best databases have everything (or the best of everything)

  • These databases are moderately well cross-linked, and there are “linker” databases

  • Sequence is a good identifier, maybe even better than gene name!


  • Login