Biological Databases

Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado

What can be discovered about a gene by a database search?
  • A little or a lot, depending on the gene
    • Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.
    • Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.
    • Structural information: associated protein structures, fold types, structural domains
    • Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.
    • Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases
Using a database
  • How to get information out of a database:
    • Browsing: no targeted information to retrieve
    • Search: looking for particular information
  • Searching a database:
    • Must have a key that identifies the element(s) of the database that are of interest.
      • Name of gene
      • Sequence of gene
      • Other information
    • Helps to have particular informational goals
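The difference between browsing and searching with a key can be sketched with a toy in-memory database. This is only an illustration: the record layout, the gene names (`geneA`, `geneB`, `geneC`) and the sequences are hypothetical placeholders, not real database entries.

```python
# Toy "database": every record and field name here is a made-up placeholder.
RECORDS = [
    {"name": "geneA", "organism": "species 1", "sequence": "ATGGCCATTGTA"},
    {"name": "geneB", "organism": "species 2", "sequence": "ATGAAACCCGGG"},
    {"name": "geneC", "organism": "species 1", "sequence": "ATGTTTGGGCCC"},
]


def browse(records):
    """Browsing: walk through every record; no key is required."""
    return [record["name"] for record in records]


def search(records, field, key):
    """Searching: return only the records whose `field` equals `key`."""
    return [record for record in records if record[field] == key]


all_names = browse(RECORDS)
hits_by_name = search(RECORDS, "name", "geneB")
hits_by_sequence = search(RECORDS, "sequence", "ATGTTTGGGCCC")
hits_by_other = search(RECORDS, "organism", "species 1")
```

The three searches correspond to the three kinds of key listed above: a gene name, a gene sequence, and other information (here, the organism).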
Searching for information about genes and their products
  • Gene and gene product databases are often organized by sequence
    • Genomic sequence encodes all traits of an organism.
    • Gene products are uniquely described by their sequences.
    • Similar sequences among biomolecules indicate both similar function and an evolutionary relationship
  • Macromolecular sequences provide biologically meaningful keys for searching databases
Searching sequence databases
  • Start from sequence, find information about it
  • Many kinds of input sequences
    • Could be amino acid or nucleotide sequence
    • Genomic or mRNA/cDNA or protein sequence
    • Complete or fragmentary sequences
  • Exact matches are rare (and in many cases uninteresting), so the goal is often to retrieve a set of similar sequences.
    • Both small differences (mutations) and large ones (regions required for function) among “similar” sequences can be interesting.
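Retrieving a ranked set of similar sequences, rather than exact matches, can be sketched as follows. This uses shared k-mers (Jaccard similarity) as a crude stand-in for a real alignment-based search such as BLAST; the function names, the threshold and the short sequences are all assumptions made for illustration.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b, k=3):
    """Jaccard similarity of two k-mer sets: shared k-mers / all k-mers."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def rank_similar(query, database, k=3, min_score=0.2):
    """Return (score, name) pairs at or above min_score, best match first."""
    scored = [(kmer_similarity(query, seq, k), name)
              for name, seq in database.items()]
    return sorted((pair for pair in scored if pair[0] >= min_score),
                  reverse=True)


# Made-up sequences: an exact copy, a one-base variant, and an unrelated one.
database = {
    "exact": "ATGGCCATTG",
    "one_mismatch": "ATGGCCTTTG",
    "unrelated": "CCCCCCCCCC",
}
ranked = rank_similar("ATGGCCATTG", database)
```

Because the score is computed from k-mers, the same function works on complete or fragmentary input, although real tools score local alignments instead.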
What might we want to know about a sequence?
  • Is this sequence similar to any known genes? How close is the best match? Significance?
  • What do we know about that gene?
    • Genomic (chromosomal location, allelic information, regulatory regions, etc.)
    • Structural (known structure? structural domains? etc.)
    • Functional (molecular, cellular & disease)
  • Evolutionary information:
    • Is this gene found in other organisms?
    • What is its taxonomic tree?
NCBI and Entrez
  • One of the most useful and comprehensive sources of databases is the NCBI, part of the National Library of Medicine.
  • NCBI provides interesting summaries, browsers for genome data, and search tools
  • Entrez is their database search interface: http://www.ncbi.nlm.nih.gov/Entrez
  • Can search on gene names, sequences, chromosomal location, diseases, keywords, ...
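Entrez searches can also be issued programmatically. The sketch below only builds a search URL for NCBI's E-utilities `esearch` endpoint, which is not mentioned in the slides and is assumed here as the programmatic counterpart of the web interface. The helper name `esearch_url` and the example query are illustrative; no network request is made.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here as the programmatic
# counterpart of the Entrez web interface named in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=5):
    """Build an esearch URL for an Entrez database and a query term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Search the Gene database by gene name, restricted to one organism.
url = esearch_url("gene", "ADH1B[Gene Name] AND Homo sapiens[Organism]")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) should return a list of matching record identifiers; NCBI's usage guidelines on request rates apply.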
BLAST: Searching with a sequence
  • The goal is to find other sequences that are more similar to the query than would be expected by chance (and are therefore likely homologous).
  • Can start with nucleotide or amino acid sequence, and search for either (or both)
  • Many options
    • E.g., ignore low-information (repetitive) sequence, or set the critical value for significance
    • Defaults are not always appropriate: READ THE NCBI EDUCATION PAGES!
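The idea of "more similar than would be expected by chance" can be illustrated with a shuffling test. This is not how BLAST computes significance (BLAST reports E-values derived from Karlin-Altschul statistics); it is a minimal sketch that assumes an ungapped, position-by-position identity score and two made-up peptide sequences.

```python
import random


def identity_score(a, b):
    """Count matching positions in an ungapped, position-by-position comparison."""
    return sum(x == y for x, y in zip(a, b))


def empirical_p_value(query, target, trials=1000, seed=0):
    """Estimate how often a shuffled target scores as well as the real one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = identity_score(query, target)
    letters = list(target)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(letters)  # same composition, random order
        if identity_score(query, letters) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (trials + 1)


# Made-up sequences: the target differs from the query at two positions.
query = "MKTAYIAKQRQISFVKSHFSRQ"
target = "MKTAFIAKQRQISFVKSNFSRQ"
observed, p_value = empirical_p_value(query, target)
```

A small estimated p-value means that shuffled versions of the target almost never match the query as well as the real target does, which is the sense in which the similarity is unlikely to be due to chance.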
Major choices:
    • Translation
    • Database
    • Filters
    • Restrictions
    • Matrix
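The "Translation" choice largely determines which BLAST program is used. The mapping below follows the standard program names (blastn, blastp, blastx, tblastn, tblastx); the function itself and its argument names are hypothetical and are not part of any BLAST tool.

```python
def choose_blast_program(query_type, database_type, translated=False):
    """Pick a BLAST program from the query and database sequence types.

    `translated` only matters for nucleotide-vs-nucleotide searches, where
    it selects a comparison of six-frame translations (tblastx).
    """
    pair = (query_type, database_type)
    if pair == ("protein", "protein"):
        return "blastp"
    if pair == ("nucleotide", "protein"):
        return "blastx"   # query is translated, then compared as protein
    if pair == ("protein", "nucleotide"):
        return "tblastn"  # database is translated, then compared as protein
    if pair == ("nucleotide", "nucleotide"):
        return "tblastx" if translated else "blastn"
    raise ValueError(f"unknown sequence types: {pair}")


program = choose_blast_program("nucleotide", "protein")
```

The remaining choices (database, filters, restrictions, scoring matrix) are settings passed to whichever program is selected.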
Taxonomy report

(link from “Results of BLAST” page)

What did we just do?
  • Identify loci (genes) associated with the sequence (the input was alcohol dehydrogenase).
  • For each particular “hit”, we can look at that sequence and its alignment in more detail.
  • See similar sequences, and the organisms in which they are found.
  • But there’s much more that can be found on these genes, even just inside NCBI…
NCBI is not all there is...
  • Links to non-NCBI databases
    • Reactome & KEGG for pathways
    • HGNC for nomenclature
    • UCSC Human Genome Browser
  • Other important gene/protein resources not linked to:
    • UniProt (most carefully annotated)
    • PDB (main macromolecular structure repository)
  • Other key biological data sources
    • Gene Ontology/Open Biological Ontologies
    • ENZYME
  • Scientific society: iscb.org
  • Journals, Conferences…
Take home messages
  • There are a lot of molecular biology databases, containing a lot of valuable information
  • Not even the best databases have everything (or the best of everything)
  • These databases are moderately well cross-linked, and there are “linker” databases
  • Sequence is a good identifier, maybe even better than gene name!
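One way to see why a sequence can serve as an identifier: a checksum of the normalized sequence is the same wherever the sequence is stored, while names vary between databases. The sketch below uses a truncated SHA-256 digest purely for illustration; the function name and the example sequences are assumptions, and real resources use their own checksum schemes.

```python
import hashlib


def sequence_key(seq):
    """Derive a stable key from a sequence, ignoring case and whitespace."""
    normalized = "".join(seq.split()).upper()
    digest = hashlib.sha256(normalized.encode("ascii")).hexdigest()
    return digest[:16]  # shortened for readability; collisions are possible


# The same made-up sequence as it might appear in two differently
# formatted records, plus a variant with a single base change.
key_a = sequence_key("atggcc attgta\nccgtaa")
key_b = sequence_key("ATGGCCATTGTACCGTAA")
key_c = sequence_key("ATGGCCATTGTACCGTAG")
```

Here `key_a` and `key_b` agree even though the records are formatted differently, whereas the single-base variant gets a different key. That sensitivity also shows the limit of exact keys: alleles of the same gene would not share one.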