

DNA Databanks

Speaker: Yu-Chung Chang 張猷忠

Institute of Biochemistry

National Yang-Ming University

Biological Databases
  • DNA databanks
    • GenBank, DDBJ, EMBL, …
  • Protein databases
    • PIR, Swiss-Prot, PRF, GenPept, TrEMBL, PDB, …
  • EST databases
    • dbEST, DOTS, UniGene, GIs, STACK, …
  • Structure databases
    • MMDB, PDB, Swiss-3DIMAGE, …
  • Pathway databases
    • KEGG, BRITE, TRANSPATH, …
  • Integrated databases
    • SRS
  • Motif or cis-element databases
    • Prosite, Pfam, BLOCKS, TransFac, PRINTS, URLs, …
  • Gene, protein & disease databases
    • GeneCards, OMIM, OMIA, …
  • Taxonomy databases
  • Literature databases
    • PubMed, Medline, …
  • Patent databases
    • Apipa, CA-STN, IPN, USPTO, EPO, Beilstein, …
  • Others…
    • RNA databases, …
DNA Databanks
  • cDNA resources
    • GenBank (NCBI), Nucleotide Sequence Database (EMBL), DDBJ, MGC, …
  • Genomic DNA resources
    • HTG, dbGSS, GOLD, ERGO,…
  • EST resources
    • dbEST, UniGene, GIs, STACK, DOTS, …
  • Others
    • dbSTS, UniSTS, dbSNP, TransFac, ISIS, Repbase, ...
GenBank at National Center for Biotechnology Information (NCBI)
  • GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
  • There are approximately 11,720,000,000 bases in 10,897,000 sequence records as of February 2001.
  • GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.
  • http://www.ncbi.nlm.nih.gov/Entrez/
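
The February 2001 figures above imply an average record length, which gives a feel for how short a typical entry was. A minimal Python sketch of that arithmetic; the variable names are illustrative only.

```python
# Figures quoted above for GenBank as of February 2001.
total_bases = 11_720_000_000
total_records = 10_897_000

# Average number of bases per sequence record.
average_length = total_bases / total_records
print(f"about {average_length:.0f} bases per record")
```

The average comes out at a little over 1,000 bases per record.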
European Molecular Biology Laboratory (EMBL)
  • The EMBL Nucleotide Sequence Database constitutes Europe's primary nucleotide sequence resource. Main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications.
  • http://www.ebi.ac.uk/Databases/index.html
DNA Data Bank of Japan (DDBJ) http://www.ddbj.nig.ac.jp/
  • Database Search
    • Getentry, SFgate & WAIS, SRS, Homology Search, TXSearch, SQmatch
  • Data Analysis
    • malign, CLUSTAL W
  • Genome Analysis
    • GTOP
  • Protein Structure
    • PDB Retriever, SSThread, LIBRA I
Genome Projects
  • Whole genome sequences
  • EST projects
  • MGC projects
  • SNP projects
  • GSS projects
  • STS projects
GenBank Sequence Submission Policy

At this time the following types of submissions are NOT acceptable.

  • sequences of less than 50 bp in length.
  • computer generated or otherwise predicted sequences (e.g. EST assembled sequences).
  • third party sequences downloaded from a sequence database or journal.
  • one genomic sequence with multiple exons joined together without the sequence of the intervening introns.
  • primer only sequences.
GenBank Sequence Submission Policy (cont.)

At this time the following types of submissions are NOT acceptable.

  • protein only sequences.
  • non-biologically contiguous sequences containing internal unsequenced spacers.
  • sequences containing a mix of genomic and mRNA sequence represented as a single sequence.
  • EST submissions should be submitted through the dbEST system.
  • as of 1 January 2000, Genome Survey Sequences (GSSs) should not be submitted through BankIt; use the dbGSS system.
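
Two of the rules above (the 50 bp minimum and the ban on protein-only sequences) are simple enough to check before submitting. The following is a rough sketch only: the function name `precheck` and the use of the IUPAC nucleotide alphabet are assumptions for illustration, not part of any GenBank tool, and the other rules (third-party data, joined exons, unsequenced spacers) cannot be checked this way.

```python
# IUPAC nucleotide codes (assumed here as the accepted alphabet).
NUCLEOTIDE_CODES = set("ACGTUNRYSWKMBDHV")

def precheck(sequence):
    """Return a list of obvious policy problems for a candidate submission."""
    problems = []
    seq = sequence.upper()
    if len(seq) < 50:
        problems.append("shorter than 50 bp")
    if set(seq) - NUCLEOTIDE_CODES:
        # Letters outside the nucleotide alphabet suggest a protein sequence.
        problems.append("contains non-nucleotide characters")
    return problems

print(precheck("ACGT" * 5))             # a 20 bp fragment
print(precheck("MKVLAAGIVGLLLAQ" * 4))  # a protein-like string
```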
Data Submission
  • WWW
    • BankIt, WebIn, Sakura, Sequin
  • E-mail
    • Sequin
  • Diskette
    • Sequin
DNA Databases at NCBI
  • Nucleotides, dbEST, UniGene, dbGSS, dbSTS, UniSTS, RefSeq, MGC, dbSNP, HTGs, UniVec
dbEST http://www.ncbi.nlm.nih.gov/dbEST/index.html
  • dbEST is a database of expressed sequence tags (ESTs): short, single-pass reads of cDNA (mRNA) sequences. It also includes cDNA sequences from differential display and RACE experiments.
dbGSS http://www.ncbi.nlm.nih.gov/dbGSS/index.html
  • Database of genome survey sequences.
  • Short, single pass read genomic sequences.
  • Exon trapped sequences.
  • Cosmid/BAC/YAC ends.
  • Alu PCR sequences.
  • GSS sequences are available from two sources: dbGSS and the GSS division of GenBank. The sequences and accession numbers in both sources are the same but the record formats differ.
dbSTS http://www.ncbi.nlm.nih.gov/dbSTS/index.html
  • Database of sequence tagged sites.
  • Short sequences that are operationally unique in the genome, used to generate mapping reagents.
  • STS sequences are available from two sources: dbSTS and the STS division of GenBank. The sequences and accession numbers in both sources are the same but the record formats differ.
HTGs http://www.ncbi.nlm.nih.gov/HTGS/
  • High throughput genome sequences from large scale genome sequencing centers.
  • Unfinished (phase 0, 1, 2) and finished (phase 3) sequences.
  • Sequence data in this division are available for BLAST homology searches against either the "htgs" database or the "month" database, which includes all new submissions for the prior month.
dbSNP http://www.ncbi.nlm.nih.gov/SNP/
  • Database of single nucleotide polymorphisms.
  • Small-scale insertions/deletions.
  • Polymorphic repetitive elements.
  • Microsatellite variation.
New HTC (High Throughput cDNA) division
  • At the May 2000 collaborative meeting, DDBJ/EMBL/GenBank agreed to create a new database division, HTC, to represent unfinished High Throughput cDNA sequences. HTC sequences may include 5'UTR and 3'UTR regions and all or part of a coding region. Once finished, these sequences will be moved to the corresponding taxonomic division. HTC sequence entries will include the keyword 'HTC', which will be removed once the entry has been included in the taxonomic division.
Mammalian Gene Collection (MGC) http://www.ncbi.nlm.nih.gov/MGC/
  • The Mammalian Gene Collection (MGC) project is a new effort by the NIH to generate full-length complementary DNA (cDNA) resources.
Entrez Searching
  • Subject searching
  • Phrase searching
  • Searching for authors
  • Searching for unique identifiers
  • Searching by molecular weight
  • Range searching
  • Truncating searching (Wildcard searching)
  • Combining sets
Entrez - Subject searching
  • Text searching
    • hiv-1
  • Subject terms are automatically combined
    • hiv-1 protease is searched as hiv-1 AND protease
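
The automatic combination described above can be pictured as joining the typed terms with AND. A minimal sketch, assuming terms are separated by whitespace; `combine_subject_terms` is a hypothetical helper, not an Entrez function.

```python
def combine_subject_terms(query):
    """Join whitespace-separated subject terms with AND."""
    return " AND ".join(query.split())

print(combine_subject_terms("hiv-1 protease"))
```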

Entrez - Phrase searching
  • “hiv-1 protease”
  • Using quotes forces Entrez to check a phrase list against which the search terms are matched.
  • It is not adjacency searching.
  • If the search phrase is not in the phrase list, Entrez treats it as a subject search.
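
The lookup-then-fallback behaviour described above might be sketched as follows. The phrase list here is a made-up stand-in with one entry (the real list is internal to Entrez), and `interpret_quoted` is a hypothetical name.

```python
# Hypothetical stand-in for Entrez's internal phrase list.
PHRASE_LIST = {"hiv-1 protease"}

def interpret_quoted(phrase, phrase_list=PHRASE_LIST):
    """Return ('phrase', text) on a phrase-list hit, else a subject search."""
    normalized = phrase.strip('"“”').lower()
    if normalized in phrase_list:
        return ("phrase", normalized)
    # Not in the phrase list: fall back to AND-combined subject terms.
    return ("subject", " AND ".join(normalized.split()))

print(interpret_quoted('"hiv-1 protease"'))
print(interpret_quoted('"bacterial protease"'))
```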
Entrez - Searching for authors
  • Chang YC
    • Search only the author field
  • Chang
    • Search all fields
    • Subject searching
  • Do not use punctuation.
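
A small helper can produce the punctuation-free "surname initials" form shown above from a name as it might appear in a reference list. This is only a sketch; `author_term` is a hypothetical name, and it handles the simple case of a surname plus initials.

```python
def author_term(surname, initials):
    """Build an author search term such as 'Chang YC' with no punctuation."""
    letters = "".join(ch for ch in initials if ch.isalpha())
    return f"{surname.strip(' ,.')} {letters.upper()}"

print(author_term("Chang,", "Y.-C."))
```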
Entrez - Searching for unique identifiers
  • Accession numbers
    • GenBank/EMBL/DDBJ: U12345, AF123456
    • GenPept: AAA12345
    • SwissProt & PIR: P12345
    • RefSeq: NM_123456, NT_123456, NP_123456, NC_123456, XM_123456, XP_123456
  • Sequence identification numbers
    • GI numbers: 6995995
    • Version numbers: AF123456.3
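
The identifier shapes listed above can be told apart with regular expressions. The patterns below are inferred only from the examples on this slide, so they are an assumption and will not cover every real accession format. Note that a one-letter, five-digit identifier fits both the GenBank/EMBL/DDBJ and the SwissProt/PIR example, so the sketch returns every type that matches.

```python
import re

# Patterns inferred from the example identifiers on the slide only.
PATTERNS = [
    ("RefSeq accession", re.compile(r"^(NM|NT|NP|NC|XM|XP)_\d{6}$")),
    ("version number", re.compile(r"^[A-Z]{1,2}\d{5,6}\.\d+$")),
    ("GenBank/EMBL/DDBJ accession", re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6})$")),
    ("GenPept accession", re.compile(r"^[A-Z]{3}\d{5}$")),
    ("SwissProt/PIR accession", re.compile(r"^[A-Z]\d{5}$")),
    ("GI number", re.compile(r"^\d+$")),
]

def possible_types(identifier):
    """Return the names of all identifier types whose pattern matches."""
    return [name for name, pattern in PATTERNS if pattern.match(identifier)]

for example in ["AF123456", "NM_123456", "AAA12345", "6995995", "AF123456.3", "P12345"]:
    print(example, possible_types(example))
```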
Entrez - Searching by molecular weight
  • 010600[Molecular Weight]
  • 012345[MOLWT]
  • 010000:050000[MOLWT]
  • 002000:010000[MOLWT] AND human[Organism]
  • [field name]  feature table
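
Every molecular weight in the examples above is written as six digits with leading zeros. Assuming that padding is required, a query term can be formatted as in this sketch; `molwt_term` is a hypothetical helper.

```python
def molwt_term(low, high=None):
    """Format a molecular-weight query with six-digit, zero-padded values."""
    if high is None:
        return f"{low:06d}[MOLWT]"
    return f"{low:06d}:{high:06d}[MOLWT]"

print(molwt_term(12345))
print(molwt_term(2000, 10000) + " AND human[Organism]")
```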
Entrez - Range searching
  • Accession numbers [ACCN], sequence length [SLEN], and molecular weight [MOLWT]
  • AF114696:AF114714[ACCN]
    • Not for GI and Version numbers
  • 3000:4000[SLEN]
  • 002002:002100[MOLWT]
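
The range syntax above is simply start, colon, end, then the field tag. A minimal sketch that also rejects fields the slide does not list as range-searchable; `range_term` is a hypothetical helper, and values are passed through unchanged (so molecular weights should already be zero-padded strings).

```python
# Fields the slide lists as supporting range searches.
RANGE_FIELDS = {"ACCN", "SLEN", "MOLWT"}

def range_term(start, end, field):
    """Build a range query such as '3000:4000[SLEN]'."""
    if field not in RANGE_FIELDS:
        raise ValueError(f"range searching is not described for [{field}]")
    return f"{start}:{end}[{field}]"

print(range_term("AF114696", "AF114714", "ACCN"))
print(range_term(3000, 4000, "SLEN"))
```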
Entrez - Truncating searching
  • Wildcard searching
  • Root word plus *
    • bacte*, retroviru*
  • Only the first 150 variations of a truncated term are retrieved
  • Left-hand truncation is not possible
    • *ology (not allowed)
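
The truncation rules above can be mimicked locally. In this sketch the vocabulary is a made-up word list, and taking the first 150 matches in sorted order is an assumption, since the slide does not say how Entrez picks them.

```python
MAX_VARIATIONS = 150  # limit stated on the slide

def is_valid_truncation(term):
    """True for right-hand truncation only, e.g. 'bacte*' but not '*ology'."""
    return len(term) > 1 and term.endswith("*") and term.count("*") == 1

def expand(term, vocabulary):
    """Return up to 150 vocabulary words that start with the root of 'term'."""
    if not is_valid_truncation(term):
        raise ValueError(f"unsupported truncation: {term}")
    root = term[:-1]
    matches = sorted(word for word in vocabulary if word.startswith(root))
    return matches[:MAX_VARIATIONS]

print(expand("bacte*", ["bacteria", "bacterial", "virus"]))
```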
Entrez - Combining sets
  • Use your search History to combine documents
    • #1 AND #4

Entrez - Boolean operators
  • AND, OR, NOT
  • bacteria AND virus NOT phage
    • (bacteria AND virus) NOT phage
  • hiv-1 OR bacterial protease
    • hiv-1 OR (bacterial AND protease)
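
The two examples above suggest that explicit operators are applied left to right, while adjacent terms without an operator are first combined with AND. A sketch that makes this grouping visible by adding parentheses; `show_grouping` is a hypothetical helper and covers only flat queries without existing parentheses.

```python
import re

OPERATOR = re.compile(r"\s+(AND|OR|NOT)\s+")

def show_grouping(query):
    """Add parentheses showing left-to-right evaluation of a flat query."""
    # Splitting on a pattern with one group yields operand, operator, operand, ...
    parts = OPERATOR.split(query.strip())
    operands, operators = parts[0::2], parts[1::2]

    def wrap(operand):
        # Adjacent terms with no operator are combined with AND.
        words = operand.split()
        return operand if len(words) == 1 else "(" + " AND ".join(words) + ")"

    grouped = wrap(operands[0])
    for index, (operator, operand) in enumerate(zip(operators, operands[1:])):
        if index > 0:
            grouped = f"({grouped})"
        grouped = f"{grouped} {operator} {wrap(operand)}"
    return grouped

print(show_grouping("bacteria AND virus NOT phage"))
print(show_grouping("hiv-1 OR bacterial protease"))
```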

Entrez - Limit a search to a particular database field
  • You are only interested in nucleotide sequences from the mouse.
    • Select the Nucleotide database from the black menu bar or the Search pull-down menu.
    • Select Limits.
    • In the "Limited To:" section, select Organism from the Search Field pull-down menu.
    • Type "mouse" without quotes in the query box and select Go.
Entrez - Limit a search to a particular database field (cont.)
  • You are only interested in protein sequences that are less than 50 amino acids in length.
    • Select the Protein database from the black menu bar or the Search pull-down menu.
    • Select Limits.
    • In the "Limited To:" section, select Sequence Length from the Search Field pull-down menu.
    • Type "0:50" without quotes in the query box and select Go.
Entrez - Exclude certain kinds of sequences
  • You are interested in mitochondrial carriers but you do not want the EST sequences.
    • Select the Nucleotide database from the black menu bar or the Search pull-down menu.
    • Type "mitochondrial carrier" without quotes in the query box.
    • Select Limits.
    • In the "Limited To:" section, check the box next to "Exclude ESTs" and select Go.
Entrez - Limit the search to a particular molecule type
  • You are only interested in Cryptosporidium ribosomal RNA sequences.
    • Select the Nucleotide database from the black menu bar or the Search pull-down menu.
    • Type "cryptosporidium" without quotes in the query box.
    • Select Limits.
    • In the "Limited To:" section, select the "Molecule" pull-down menu, choose rRNA, and select Go.
Entrez - Limit the search to a particular gene location
  • You are interested in the genes in the chloroplast of flowering plants.
    • Select the Nucleotide database from the black menu bar or the Search pull-down menu.
    • Type "flowering plants" without quotes in the query box.
    • Select Limits.
    • In the "Limited To:" section, select the "Gene Location" pull-down menu, choose chloroplast, and select Go.
Entrez - Limit the search to records from a particular sequence database
  • You are interested only in cysteine phosphatase protein sequences submitted directly to PIR.
    • Select the Protein database from the black menu bar or the Search pull-down menu.
    • Type "cysteine phosphatase" without quotes in the query box.
    • Select Limits.
    • In the "Limited To:" section, select the "Only From" pull-down menu and choose PIR and select Go.
Entrez - Limit the search by date
  • You want to see any nucleotide sequences from pigs added to the database (or updated) in the last 30 days.
    • Select the Nucleotide database from the black menu bar or the Search pull-down menu.
    • Type "pigs" without quotes in the query box.
    • Select Limits.
    • In the "Limited To:" section, select Organism from the Search Field pull-down menu.
    • Also in the "Limited To:" section, select the "Modification Date" pull-down menu, choose 30 days, and select Go.
Entrez - Limit the search by date (cont.)
  • You want to retrieve all mouse or human nucleotide sequences added to the database (or updated) during 1997.
    • Select the Nucleotide database from the black menu bar or the Search pull-down menu.
    • Type "mouse OR human" without quotes in the query box.
    • Select Limits.
    • In the "Limited To:" section, select Organism from the Search Field pull-down menu.
    • Also in the "Limited To:" section, open the "Modification Date" pull-down menu and choose Modification Date. In the date boxes, type the start and end dates in the format YYYY/MM/DD (here, 1997/01/01 and 1997/12/31). You can tab from box to box in the date fields. Select Go.
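
The YYYY/MM/DD dates used in the two date-limit examples can be generated rather than typed. The sketch below writes them as a range on the [MDAT] field; it assumes that the colon range syntax shown for other fields also applies to modification dates, and that "last 30 days" means today minus 30 days through today. Both function names are hypothetical.

```python
from datetime import date, timedelta

def mdat_range(start, end):
    """Format a modification-date range in YYYY/MM/DD form."""
    return f"{start:%Y/%m/%d}:{end:%Y/%m/%d}[MDAT]"

def last_n_days(n, today=None):
    """Range covering the n days up to and including 'today'."""
    today = today or date.today()
    return mdat_range(today - timedelta(days=n), today)

print(mdat_range(date(1997, 1, 1), date(1997, 12, 31)))
print(last_n_days(30))
```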
Entrez - Using more than one limit at a time
  • You are interested in the protein translations of human GenBank nucleotide sequences added to the protein database (or updated) in the last 30 days. You do not want patent records.
    • Select the Protein database from the black menu bar or the Search pull-down menu.
    • Type "human" without quotes in the query box.
    • Select Limits.
    • In the "Limited To:" section, select Organism from the Search Field pull-down menu.
    • On the same screen, select the exclude patents check box, select GenBank from the Only From pull-down menu, and finally select 30 days from the Modification Date pull-down menu and select Go.
Entrez - Writing advanced search statements
  • Find all human nucleotide sequences with LTR annotations.
    • In the Nucleotide database, use the following expression:

LTR[FKEY] AND human[ORGN]

  • Find Drosophila population studies published in the Journal of Molecular Evolution.
    • In the PopSet database, use the following expression:

j mol evol[JOUR] AND drosophila[ORGN]

Entrez - Writing advanced search statements (cont.)
  • Find all human protein sequences with lengths between 50 and 60 amino acids that were entered into the database during 1999.
    • In the Protein database, use the following expression:

human[ORGN] AND 50[SLEN]:60[SLEN] AND 1999[MDAT]
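
The three statements above share one pattern: each condition is a value followed by a field tag in square brackets, and the conditions are joined with AND. A minimal sketch that rebuilds them; `field_term` and `all_of` are hypothetical helpers, and only field tags that appear on these slides are used.

```python
def field_term(value, tag):
    """Attach a field tag to a value, e.g. 'human[ORGN]'."""
    return f"{value}[{tag}]"

def all_of(*terms):
    """Require every term by joining them with AND."""
    return " AND ".join(terms)

ltr_query = all_of(field_term("LTR", "FKEY"), field_term("human", "ORGN"))
popset_query = all_of(field_term("j mol evol", "JOUR"), field_term("drosophila", "ORGN"))
protein_query = all_of(
    field_term("human", "ORGN"),
    field_term(50, "SLEN") + ":" + field_term(60, "SLEN"),
    field_term(1999, "MDAT"),
)

for query in (ltr_query, popset_query, protein_query):
    print(query)
```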

Feature Key Name (partial list)
  • allele, attenuator, CAAT_signal, CDS, enhancer, exon, gene, GC_signal, iDNA, intron, J_region, LTR, misc_binding, misc_feature, mRNA, polyA_signal, polyA_site, STS, 3'UTR, 5'clip
  • ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt


Feature Qualifiers (partial list)
  • /anticodon, /bound_moiety, /citation, /codon, /codon_start, /cons_splice, /db_xref, /direction, /EC_number, /evidence, /function, /gene, /map, /note, /organism, /phenotype, /rpt_family, /translation
GOLD: Genomes OnLine Database http://wit.integratedgenomics.com/GOLD/
  • The Genomes OnLine Database (GOLD) is a World Wide Web resource for comprehensive access to information regarding complete and ongoing genome projects around the world.
Deambulum http://www.infobiogen.fr/services/deambulum/english/
Deambulum: READSEQ http://www.infobiogen.fr/services/deambulum/english/
NCGR: National Center for Genome Resources
GSDB: Genome Sequence Database http://www.ncgr.org/research/sequence/data_retrieval.html
ExInt: An Exon-Intron Database of Eukaryotic Organism http://intron.bic.nus.edu.sg/exint/exint.html
MethDB http://www.methdb.de./
  • The purpose of this database is to provide the scientific community with a resource to
    • store DNA methylation data
    • search for methylation patterns and profiles
    • correlate methylation and expression data of genes
Small RNA Database http://mbcr.bcm.tmc.edu/smallRNA/smallrna.html
UTRdb & UTRsite http://bigarea.area.ba.cnr.it:8000/EmbIT/UTRHome/
TARD http://wwwicg.bionet.nsc.ru/SRCG/Translation/
  • Gene expression is often regulated at the level of mRNA translation. The structural characteristics of mRNA correlate with translation efficiency and specificity. Determining "active elements" could be very useful for predicting the gene expression pattern under both normal and stress conditions, because not all mRNAs can be translated under stress. Such predictions might be useful for biotechnology and cDNA analysis.
RAD: RNA Abundance Database http://www.cbil.upenn.edu/RAD2/
  • RAD (RNA Abundance Database) is a public gene expression database designed to hold data from array-based (microarrays, high-density oligo arrays, macroarrays) and nonarray-based (SAGE) experiments.
  • The ultimate goal is to allow comparative analysis of experiments performed by different laboratories using different platforms and investigating different biological systems.
Cook your food by yourself
  • Farms: sequencing centers
  • Markets: nucleotide databases
  • Restaurants: value-added databases
  • Cooking skills: bioinformatics
Exercise
  • Please try to write a search statement for finding all mouse nucleotide sequences with CDS annotations.