BioInformatics Consultation

Practice 1

Gábor Pauler, Ph.D.

Tax.reg.no: 63673852-3-22

Bank account: 50400113-11065546

Location: 1st Széchenyi str., 7666 Pogány, Hungary

Tel: +36-309-015-488

E-mail: pauler@t-online.hu


Content of the Practice

  • Sequence Databases

    • Basic terms and types

    • File formats

      • Nucleotide/amino acid code tables

      • FASTA

      • EMBL

        • Content

        • Feature table

        • Sequence description

      • Relational databases

    • Text based search

    • Sequence searches

  • Home Assignment 1: Sequence search

  • References


Sequence Databases: Basic terms and types 1

WARNING: We assume that the basic terms of molecular genetics are known. If they are not, please check BioinfoNotes.

  • Definition:

    • Sequence databases give a complex description of nucleotide/protein sequences, with all auxiliary information attached

  • Data sources:

    • Primary:

      • RNA/DNA/protein sequence samples sequenced and submitted by researchers

      • They contain 1-2 genes on average (1.5-3 kilobases)

    • Genome projects (human, mouse, etc.):

      • They prepare a complete gene map using the chromosome walking method

      • They try to identify the following types of "navigation signals" (landmarks) in the genome:

        • Expressed Sequence Tags (EST): short sequenced fragments of cDNA, marking genes that are actually expressed

        • Genome Survey Sequences (GSS): short sequenced fragments of genomic DNA (the genomic counterparts of ESTs)

    • Environmental DNA samples:

      • DNA fragments collected from ocean water: they allow researchers to study the genetic material of microorganisms that cannot be cultured in the laboratory (e.g. sulphur-consuming bacteria at deep-sea volcanic vents)

    • Secondary:

      • Protein sequences translated from cDNA

      • Nucleotide sequences back-translated from the amino acid sequence of a protein, using the most frequent codon of each amino acid in the given organism (codon usage)

  • Distribution of data sources by organism:


Sequence Databases: Basic terms and types 2

  • Sequence databases by content checking:

    • Redundant

      • It can contain repeated sequences, as submissions are not checked for duplicates

      • Therefore, it can grow faster

    • Non-Redundant (NR)

      • A sequence-matching test at submission ensures that each sequence is stored only once, but therefore it grows more slowly

  • Types of sequence databases and their history:

    • Major nucleotide databases:

      • 1980 EBI Heidelberg – Hinxton, UK: EMBL (http://www.ebi.ac.uk/embl/)

      • 1980 NCBI Bethesda, USA: GenBank (http://www.ncbi.nlm.nih.gov/genbank/)

      • 1980s CIB Japan: DDBJ (http://www.ddbj.nig.ac.jp/searches-e.html)

        • They are cross-synchronized

        • Their size doubles approximately every two years (December 2009: 173M records containing 271 Gbases)

        • Their primary key is the Accession Number (AC): a unique, mandatory ID by which a sequence can be referenced

    • Major protein databases:

      • 1986 SIB Switzerland: Swiss-PROT (http://www.ebi.ac.uk/uniprot/)

        • This is strictly supervised by curators; sequence submission is not automatic

        • So the original database stayed relatively small (600K basic sequences) but highly reliable

        • It contains primary sequences as well as sequences translated from cDNA

        • It usually also contains the following auxiliary data:

          • Function of the protein

          • Secondary and tertiary structure

          • Notable protein domains

          • Homologous parts: regions of common evolutionary origin, which often have the same function

      • TrEMBL (http://www.ebi.ac.uk/uniprot/): proteins automatically translated from the coding sequences stored in EMBL. It is not supervised by curators, so it is less reliable and often lacks auxiliary explanation, but it grows faster than Swiss-PROT

      • UniProt (http://www.ebi.ac.uk/uniprot/): Unification of Swiss-PROT and TrEMBL
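The submission-time check of a non-redundant (NR) database can be illustrated with a toy in-memory store. This is a minimal sketch, assuming that "sequence matching" means exact identity after ignoring case and whitespace; the class and method names are hypothetical, and real NR databases apply their own, more elaborate matching and merging rules.

```python
import hashlib


class NonRedundantStore:
    """Toy NR store: each distinct sequence is kept only once.

    Assumption: two submissions are 'the same' if they are identical
    after removing whitespace and ignoring case.
    """

    def __init__(self):
        self._sequences = {}   # digest -> normalised sequence (stored once)
        self._accessions = {}  # digest -> accessions sharing this sequence

    @staticmethod
    def _normalise(seq):
        clean = "".join(seq.split()).upper()
        return hashlib.sha256(clean.encode("ascii")).hexdigest(), clean

    def submit(self, accession, seq):
        """Store a submission; return True if the sequence was new."""
        digest, clean = self._normalise(seq)
        is_new = digest not in self._sequences
        if is_new:
            self._sequences[digest] = clean
        # A duplicate is not stored again, only linked to the existing entry
        self._accessions.setdefault(digest, []).append(accession)
        return is_new

    def accessions_for(self, seq):
        digest, _ = self._normalise(seq)
        return list(self._accessions.get(digest, []))

    def __len__(self):
        return len(self._sequences)


if __name__ == "__main__":
    store = NonRedundantStore()
    print(store.submit("AC1", "ggatccagtg"))   # True: new sequence
    print(store.submit("AC2", "GGATCC AGTG"))  # False: already stored
    print(len(store), store.accessions_for("ggatccagtg"))
```

Hashing keeps the check to one dictionary lookup per submission instead of a comparison against every stored sequence, which is one reason an exact-match NR test need not slow submission down much; finding near-identical sequences would require a similarity search instead.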


Sequence Databases: File formats: Nucleotide/amino acid code tables

  • In all formats, nucleotide sequences are described by a series of 1-char codes:

    • There are 2 standards, GCG (more common) and Sanger (almost extinct): they denote the 4 basic nucleotides identically (A, G, C, T/U) but differ in how they denote uncertainly sequenced nucleotide positions

  • Amino acid sequences can be described by a series of 1- or 3-char codes:

    • 1-char codes are used more frequently, as they are shorter

    • 3-char codes are longer, but a protein written with them has the same length as its coding nucleotide sequence, as a triplet of 3 nucleotides, called a codon, codes 1 amino acid

    • Codons can also be described in masked format, using GCG-coded wildcards
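The masked (wildcard) codon notation and the 1-char/3-char amino acid codes can be illustrated with a small sketch. It assumes the standard IUPAC nucleotide ambiguity codes, taken here to correspond to the GCG wildcard table on the slide; the function names are hypothetical. For example, all four alanine codons can be written as the single masked codon GCN.

```python
import re
from itertools import product

# IUPAC nucleotide ambiguity codes (assumed to match the GCG wildcard table);
# U is treated as T so that both RNA and DNA input work
WILDCARDS = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# 1-char -> 3-char codes of the 20 standard amino acids
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def expand(masked):
    """List every concrete sequence matched by a masked sequence such as 'GCN'."""
    options = [WILDCARDS[code] for code in masked.upper()]
    return ["".join(choice) for choice in product(*options)]


def masked_regex(masked):
    """Compile a masked sequence into a regex, e.g. 'GCN' -> 'GC[ACGT]'."""
    parts = []
    for code in masked.upper():
        bases = WILDCARDS[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return re.compile("".join(parts))


def to_three_letter(protein):
    """Convert 1-char amino acid codes to 3-char codes."""
    return "".join(THREE_LETTER[aa] for aa in protein.upper())


if __name__ == "__main__":
    print(expand("GCN"))                               # the four alanine codons
    print(bool(masked_regex("GCN").fullmatch("GCT")))  # True
    # A 3-char protein string is as long as its coding sequence (stop codon excluded)
    print(to_three_letter("MAS"), len(to_three_letter("MAS")))  # MetAlaSer 9
```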


Sequence Databases: File formats: FASTA, EMBL, GenBank, Relational

[Figure: ERD of a relational sequence database. Tables and fields: Sequence (SeqID, Accession, Name, Author, Title, Sequence, Length, Submitted, Organism); Gene (GeneID, Name, Position, Length, Discovered, Organism); GeneLocation (LocationID, Comment, PositionFrom, PositionTo, SeqID, GeneID); IntrExon (IntrExonID, PositionFrom, PositionTo, LocationID)]

  • FASTA format: a simple text file format describing nucleotide/protein sequences:

    • It starts with a ">" character followed by the description of the sequence

    • From the next line, the sequence follows as 1-char codes, not interrupted by any other character (spaces, tabs, numbers, etc.); line breaks are allowed (many programs wrap the sequence at 60-80 characters), and the next record starts at the next ">" line.

    • It does not contain any auxiliary information

    • It is a common interchange format between more complex formats, and is used for the I/O of bioinformatics software

  • Formats of complex sequence databases (EMBL, GenBank):

  • Historically, these stored each record as a multi-line text block, where:

    • The number of lines is not fixed, and line length varies (at most 80 characters)

    • The function of each line is denoted by a 2-character prefix, e.g. ID: internal ID, AC: accession number

    • They give a complex description, not just the sequence, with much auxiliary information

    • They can be read and edited with any text editor

    • However, in their original format they reflect the technical level of mid-1980s mainframe systems and DID NOT FORM A DATABASE in modern terms:

      • As a record can have a variable number of lines of variable length, records can be searched only sequentially: you have to read through the whole lengthy record to capture any detail. Damn slow!!!

      • Description of auxiliary data was sometimes not fully standardized

  • Modern databases: therefore, modern sequence database servers use these text file records ONLY FOR I/O, and store the data in a relational database:

    • The variable-length data structure of a record (e.g. one sequence can have many genes, which can have many introns/exons: internal 1:many relationships) is broken up into several database tables, each consisting of a fixed number of data fields of given types (Text, Number, Date, etc.) and an unlimited number of records, which can be read, written and searched incredibly fast (0.5-1M records/sec)

    • Tables are connected by 1:many relations: foreign key fields reference the primary key field of another table, describing the original variable-length data structure.
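As a sketch of this idea, the example below stores the exon/intron structure of record X51799 (taken from its feature table, shown later in this practice) in the four tables of the ERD at the top of this slide, using Python's built-in sqlite3 module. Only some columns are kept, and the `Kind` column of IntrExon is a hypothetical addition used to tell exons from introns.

```python
import sqlite3

# Tables follow the ERD loosely; IntrExon.Kind is a hypothetical extra column
SCHEMA = """
CREATE TABLE Sequence (
    SeqID     INTEGER PRIMARY KEY,
    Accession TEXT NOT NULL UNIQUE,
    Organism  TEXT,
    Length    INTEGER
);
CREATE TABLE Gene (
    GeneID INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL
);
CREATE TABLE GeneLocation (
    LocationID   INTEGER PRIMARY KEY,
    SeqID        INTEGER NOT NULL REFERENCES Sequence(SeqID),
    GeneID       INTEGER NOT NULL REFERENCES Gene(GeneID),
    PositionFrom INTEGER,
    PositionTo   INTEGER
);
CREATE TABLE IntrExon (
    IntrExonID   INTEGER PRIMARY KEY,
    LocationID   INTEGER NOT NULL REFERENCES GeneLocation(LocationID),
    Kind         TEXT CHECK (Kind IN ('exon', 'intron')),
    PositionFrom INTEGER,
    PositionTo   INTEGER
);
"""

EXON_QUERY = """
SELECT s.Accession, g.Name, ie.PositionFrom, ie.PositionTo
FROM Sequence AS s
JOIN GeneLocation AS gl ON gl.SeqID = s.SeqID
JOIN Gene AS g          ON g.GeneID = gl.GeneID
JOIN IntrExon AS ie     ON ie.LocationID = gl.LocationID
WHERE s.Accession = ? AND ie.Kind = 'exon'
ORDER BY ie.PositionFrom
"""


def build_demo() -> sqlite3.Connection:
    """Load one sequence, one gene location and its exons/introns."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO Sequence VALUES (1, 'X51799', 'Arabidopsis thaliana', 6801)")
    con.execute("INSERT INTO Gene VALUES (1, 'cs/ch-42')")
    # Transcribed region of the gene (precursor_RNA 2770..4382 in the feature table)
    con.execute("INSERT INTO GeneLocation VALUES (1, 1, 1, 2770, 4382)")
    # Variable number of child rows, one per exon/intron: the 1:many relation
    parts = [("exon", 2770, 2899), ("intron", 2900, 2980), ("exon", 2981, 3095),
             ("intron", 3096, 3204), ("exon", 3205, 4382)]
    con.executemany(
        "INSERT INTO IntrExon (LocationID, Kind, PositionFrom, PositionTo) "
        "VALUES (1, ?, ?, ?)", parts)
    con.commit()
    return con


if __name__ == "__main__":
    for row in build_demo().execute(EXON_QUERY, ("X51799",)):
        print(row)
```

Each table has a fixed set of columns, yet a gene can own any number of exon/intron rows; the join in `EXON_QUERY` reassembles the variable-length structure on demand.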

Example FASTA record:

>SequenceName,SeqID,CommentText

ggatccagtg…

Example EMBL record (abridged):

ID ATCSCH42 standard;

AC X51799;

FT Exon 2770..2899;

FT Intron 2900..2980;

SQ Sequence 6801 BP;

ggatccagtg…
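A minimal FASTA reader and writer can be sketched in Python under the usual conventions described above: a record starts at a ">" line, and its sequence may be wrapped over several lines. The function names and the default line width of 60 are choices made for this example, not part of any standard.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append("".join(line.split()))  # drop any internal whitespace
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one record, wrapping the sequence at `width` characters per line."""
    rows = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + rows) + "\n"


if __name__ == "__main__":
    text = ">X000328\nATTGCGCTAT\nCGTATAGCAT\n>X000329\nATTGCGCTATCGTATAGCAT\n"
    for name, seq in read_fasta(text.splitlines()):
        print(name, len(seq))
    print(write_fasta("demo", "ggatccagtggtagcttttc", width=10), end="")
```

Because the reader joins all lines up to the next ">", it accepts both single-line and wrapped sequences, and several concatenated records in one file.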


Sequence Databases: File formats: EMBL content 1

ID ATCSCH42 standard; DNA; PLN; 6801 BP.

XX

AC X51799;

XX

SV X51799.1

XX

DT 16-MAR-1990 (Rel. 23, Created)

DT 11-MAR-1999 (Rel. 59, Last updated, Version 3)

XX

DE Arabidopsis thaliana cs/ch-42 gene for a chloroplast

XX

KW chlorata locus; chloroplast protein; unidentified reading

XX

OS Arabidopsis thaliana (thale cress)

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta;

OC euphyllophytes; Spermatophyta; Magnoliophyta;

OC core eudicots; Rosidae; eurosids II; Brassicales;

OC Arabidopsis.

XX

RN [1]

RP 1-6801

RA Mayerhofer R.;

RT ;

RL Submitted (06-FEB-1990) to the EMBL/GenBank

RL Mayerhofer R., MPI fuer Zuechtungsforschung,

RL Koeln 30, F R G.

XX

RN [2]

RP 1-6801

RA Koncz C., Mayerhofer R., Koncz-Kalman Z., Nawrath C.,.

RT Isolation of a gene encoding a novel chloroplast

RT in Arabidopsis thaliana;

RL EMBO J. 9:1337-1347(1990).

XX

DR AGIS; X51799; 17-SEP-1999.

DR MENDEL; 12580; Arath;1780;12580.

DR SWISS-PROT; P16127; CHLI_ARATH.

DR SWISS-PROT; P16128; YCCH_ARATH.

First, let's look at the text file record of EMBL (GenBank differs slightly but follows the same logic):

  • Annotations part: record identification info:

    • ID: primary key in the given database

    • AC: accession number, an ID synchronized across several databases (EMBL, GenBank, DDBJ); one sequence can have several ACs

    • SV: sequence version

    • DT: dates of first submission and of last modification

    • DE: short description of the sequence

    • KW: keywords; there can be several

    • OS: species of the organism the sequence comes from

    • OC: taxonomic classification of the source organism

    • OG: organelle the sequence comes from

  • References part: describes the publications in which the sequence was published (can also contain hyperlinks):

    • RN: reference number,

    • RC: reference comment,

    • RP: reference positions, i.e. the part of the sequence the reference covers (e.g. 1-6801),

    • RX: reference cross-reference,

    • RA: reference authors,

    • RT: reference title,

    • RL: reference location (journal, or submission details),

    • DR: database cross-reference (the same entry in other databases, e.g. SWISS-PROT),

    • CC: general comments.

  • XX: empty spacer line
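Because every line begins with its two-letter line code, the annotation part of an EMBL record can be grouped by code in a few lines of Python. This is a simplified sketch: it splits each line after its first two characters instead of relying on exact column positions, skips XX spacer lines, stops at the "//" end-of-entry line, and does not interpret the feature table or the sequence. The function names are hypothetical, and the sample record is abridged from the one above.

```python
from typing import Dict, List


def parse_embl_lines(text: str) -> Dict[str, List[str]]:
    """Group the lines of one EMBL record by their 2-letter line code."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line.startswith("//"):
            break                      # end of the entry
        code, value = line[:2], line[2:].strip()
        if len(code) < 2 or not code.strip() or code == "XX":
            continue                   # skip blank and spacer lines
        fields.setdefault(code, []).append(value)
    return fields


def split_semicolons(values: List[str]) -> List[str]:
    """Join continuation lines and split a ';'-separated list (AC, KW, OC...)."""
    joined = " ".join(values)
    return [item.strip().rstrip(".") for item in joined.split(";") if item.strip()]


if __name__ == "__main__":
    record = """ID   ATCSCH42 standard; DNA; PLN; 6801 BP.
XX
AC   X51799;
XX
OS   Arabidopsis thaliana (thale cress)
OC   Eukaryota; Viridiplantae; Streptophyta; Embryophyta;
OC   Arabidopsis.
//"""
    fields = parse_embl_lines(record)
    print(split_semicolons(fields["AC"]))
    print(split_semicolons(fields["OC"]))
```

Multi-line fields such as OC simply accumulate several values under the same code, which mirrors how the flat file continues a field on the next line with the same prefix.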


Sequence Databases: File formats: EMBL content: Feature table

FH Key Location/Qualifiers

FH

FT source 1..6801

FT /chromosome=4

FT /db_xref=taxon:3702

FT /organism=Arabidopsis thaliana

FT /strain=columbia

FT /map=39.4

FT CDS complement(<1..872)

FT /db_xref=MENDEL:12580

FT /db_xref=SWISS-PROT:P16128

FT /note=ORF (291 AA)

FT /protein_id=CAA36096.1

FT /translation=MLCFSASRLDDFDLGSSPPKK

FT DLDFGLDLPITRQVPSKANTDVQAKASAEK

FT FEAVESPQGSRKKASQTHTMCVQPQSVD

FT SEIAHIAVNRETSPDIHELCRSGTKEDCPID

FT HLCSDKIEHQQEEMGTDTQAEIQDNTKGA

FT DLSEKLPLDP

FT precursor_RNA 2770..4382

FT /note=primary transcript

FT mRNA join(2770..2899,2981..3095,3205..4382)

FT /note=exon 1

FT CDS join(2796..2899,2981..3095,3205..4260)

FT /db_xref=SWISS-PROT:P16127

FT /note=chloroplast protein

FT /protein_id=CAB38561.1

FT /translation=MASLLGTSSSAIWASPSLSSPS

FT IQIRPKKNRSRYHVSVMNVATEINSTEQVV

FT LNVIDPKIGGVMIMGDRGTGKSTTVRSLVD

FT RVEKGEQVPVIATKINMVDLPLGATEDRVC

FT YVDEVNLLDDHLVDVLLDSAASGWNTVER

FT RFGMHAQVGTVRDADLRVKIVEERARFDS

FT QIDRELKVKISRVCSELNVDGLRGDIVTNRA

FT RLRKDPLESIDSGVLVSEKFAEIFS

FT exon 2770..2899

FT /number=1

FT intron 2900..2980

FT /number=1

FT exon 2981..3095

FT /number=2

FT intron 3096..3204

FT /number=2

FT exon 3205..4382

FT /number=3

FT polyA_signal 4378..4382

  • The feature table contains the annotation of special features in the sequence:

  • FH: feature table header line

  • FT: feature table data line

    • FT source rows: describe the base coordinates of a feature in the hierarchy organism > chromosome > structural part, in (StartCoord1..EndCoord1, StartCoord2..EndCoord2) format

    • FT CDS rows (master): describe the protein product translated from the feature:

      • the protein database record where it can be found

      • the 1-char amino acid sequence of the protein, from the N-terminal to the COOH-terminal end (this corresponds to the 5'-3' direction of the coding DNA strand)

    • FT precursor_RNA: start/end coordinates of the immature mRNA

    • FT mRNA: start/end coordinates of the mature mRNA after splicing

    • FT CDS rows (additional): describe the protein products of alternative splicing, in the same format as above

    • FT exon: the exon's sequential number in splicing, and its start/stop coordinates

    • FT intron: the intron's sequential number in splicing, and its start/stop coordinates

    • FT FeatureName: start/stop coordinates of any other special feature, e.g.:

      • Promoter: start/stop coordinates of the promoter parts of genes

      • Domain: domains in protein products

      • Mutation: altered sequences of mutant versions
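The location syntax of the feature table can be parsed with a short sketch. It covers only the forms that appear in the example record (ranges, join(), complement() and the "<" partial-end marker); the full EMBL/GenBank location grammar has more (single bases, order(), references to other entries, etc.). Introns are derived as the gaps between consecutive joined exon ranges, using 1-based inclusive coordinates. All function names are hypothetical.

```python
import re
from typing import List, Tuple

Range = Tuple[int, int]

# A simple range such as "2770..2899" or "<1..872" (partial-end markers allowed)
RANGE = re.compile(r"^[<>]?(\d+)\.\.[<>]?(\d+)$")


def parse_location(location: str) -> Tuple[List[Range], str]:
    """Parse a simplified feature location into (ranges, strand).

    Supported: a..b, join(...), complement(...), complement(join(...))
    and the < / > partial-end markers. Anything else raises ValueError.
    """
    loc = location.replace(" ", "")
    strand = "+"
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = "-"                      # feature lies on the other strand
        loc = loc[len("complement("):-1]
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    ranges = []
    for part in loc.split(","):
        match = RANGE.match(part)
        if match is None:
            raise ValueError(f"unsupported location: {part!r}")
        ranges.append((int(match.group(1)), int(match.group(2))))
    return ranges, strand


def gaps(ranges: List[Range]) -> List[Range]:
    """Ranges between consecutive joined exons, i.e. the spliced-out introns."""
    return [(end + 1, next_start - 1)
            for (_, end), (next_start, _) in zip(ranges, ranges[1:])]


def span_length(r: Range) -> int:
    """Length of a 1-based, inclusive range."""
    return r[1] - r[0] + 1


if __name__ == "__main__":
    exons, strand = parse_location("join(2770..2899,2981..3095,3205..4382)")
    print(strand, exons)
    print(gaps(exons))   # should agree with the FT intron rows of the record
```

Deriving introns from the mRNA join and comparing them with the explicit FT intron rows is a simple consistency check on a downloaded record.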


Sequence Databases: File formats: EMBL content: Sequence description

SQ Sequence 6801 BP; 2093 A; 1242 C; 1374 G; 2092 T; 0 other;

ggatccagtg gtagcttttc actcaaatct tgtaccttgg cagtttggct tgtacgagtg        60

cctggtgata ttttgcctga gagggttgtt agagaatgtc cagcatctga gttatacagt       120

gctcctttag tgttatcctg tatttctgcc tgagtgtctg tacccatttc ttcctgttga       180

tgttctatct tgtctgaaca taaatgagat gagatgcttg gtgaagtctg

  • SQ: sequence header:

    • sequence length and the number of each base (A, C, G, T, other) in the sequence

  • Sequence rows (ending in 60, 120, 180…):

    • 60 bases/amino acids per row, in 6 blocks of 10 separated by spaces; the number at the end of a nucleotide row is the position of its last base

    • Variations in storing different types of sequences:

      • DNA: chromosomal DNA is described by its upper strand in 5'-3' direction with 1-char GCG codes; genes coded on the other strand are marked with complement() in the feature table

      • mRNA: messenger RNA: in the molecule, thymine (T) is replaced by uracil (U), although the EMBL/GenBank flat files still write RNA sequences with t

      • cDNA: complementary DNA, reverse-transcribed from mature mRNA; as it is DNA, it is written with (A, C, G, T) codes

      • tRNA: transfer RNA: stored as the non-modified sequence, colinear with the DNA

      • Protein: described with 1-char amino acid codes in N-terminal to COOH-terminal direction.

  • Don't forget that the meaning of codons in translation varies slightly between organisms, and even in their mitochondrial DNA:

[Figure: codons differing from the standard codon table]

  • Normally, it is enough for a biologist to know the content of the text file records; they will never see how the data is actually stored

  • However, anybody who seriously works with bioinformatics software should know:

    • how the software stores data in a modern relational database behind the scenes,

    • how the structure of a relational database can be described,

    • how to import/export data directly to/from a relational database if necessary.
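The layout of the sequence part (an SQ header with base counts, then rows of 60 bases with a trailing position number) can be checked programmatically. A minimal sketch, assuming nucleotide rows formatted as in the example above; it verifies each trailing position number and compares the header counts with counts recomputed from the sequence. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

COUNT = re.compile(r"(\d+)\s+(A|C|G|T|other);")


def parse_sq_header(line: str) -> Dict[str, int]:
    """Read length and base counts from an 'SQ   Sequence ... BP; ...' line."""
    match = re.search(r"Sequence\s+(\d+)\s+BP;", line)
    if match is None:
        raise ValueError("not a nucleotide SQ header")
    counts = {base: int(number) for number, base in COUNT.findall(line)}
    counts["length"] = int(match.group(1))
    return counts


def parse_sq_rows(rows: List[str]) -> str:
    """Join sequence rows, checking each trailing position number."""
    parts: List[str] = []
    length = 0
    for row in rows:
        tokens = row.split()
        has_number = bool(tokens) and tokens[-1].isdigit()
        blocks = tokens[:-1] if has_number else tokens  # last row may lack a number
        parts.extend(blocks)
        length += sum(len(b) for b in blocks)
        if has_number and length != int(tokens[-1]):
            raise ValueError(f"position number does not match: {row!r}")
    return "".join(parts)


def matches_header(header: Dict[str, int], seq: str) -> bool:
    """Compare SQ header counts with counts recomputed from the sequence."""
    counts = Counter(seq.upper())
    acgt = sum(counts[b] for b in "ACGT")
    return (header["length"] == len(seq)
            and all(header.get(b, 0) == counts[b] for b in "ACGT")
            and header.get("other", 0) == len(seq) - acgt)


if __name__ == "__main__":
    header = parse_sq_header(
        "SQ   Sequence 6801 BP; 2093 A; 1242 C; 1374 G; 2092 T; 0 other;")
    print(header)
    rows = [
        "ggatccagtg gtagcttttc actcaaatct tgtaccttgg cagtttggct tgtacgagtg        60",
        "cctggtgata ttttgcctga gagggttgtt agagaatgtc cagcatctga gttatacagt       120",
    ]
    seq = parse_sq_rows(rows)
    print(len(seq), Counter(seq))
```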


Sequence databases: Relational databases: Entity Relationship Diagrams

[Figure: ERD legend and a simple invoicing database. Entities and their fields:]

  • EntityName (legend): EntityNameID; one field of each data type (Text, Integer, Fraction, Binary, Date, Time, Image, Sound, Movie); ReqForeignKey, OptForeignKey; Modifier, Modified, Status

  • MasterEntity: MasterID, MasterName

  • Invoice: InvoiceID, InvoiceNum, ItemCount, NetTotalVal, GrossTotal, VATTotal, Paid, IssueDate, IssueTime, SellerID, BuyerID, SalesPersID; Modifier, Modified, Status

  • Item: ItemID, Quantity, NetVal, GrossVal, InvoiceID, BarCode; Modifier, Modified, Status

  • Product: BarCode, Description, UnitPrice, VATCode, ITJCode, MeasUnit

  • PersProdSales: PersProdSalID, SumOfSales, SalesPersID, BarCode

  • Seller: SellerID, SellerName, LegalFormat, SellerTaxReg, CellPhone, E-mail, URL, AddressID

  • Buyer: BuyerID, FirstName, LastName, CellPhone, E-mail, AddressID

  • SalesPers: SalesPersID, FirstName, LastName, CellPhone, E-mail, AddressID

  • Address: AddressID, Door, Floor, Building, HouseNum, Street, StreetType, LinePhone, Fax, Zip; Modifier, Modified, Status

  • StreetType: StreetType, TypeName

  • Zip: Zip, City, Country

  • Country: Country, CntName

  • VAT: VATCode, VATPercent

  • ITJ: ITJCode, Description, VATCode

  • MeasUnit: MeasUnit, UnitName

  • LegalFormat: LegalFormat, FormatName

Entity relationship diagram (ERD): used to represent the structure of a relational database system (RDS)

  • Tables are rounded-corner boxes with the entity name at the top. A blue background denotes code-table/master entities, whose data change little over time; yellow denotes relational/transaction entities, with rapid, irrevocable data changes over time

  • Fields are listed with their data type icons and names: italic means an optional, normal a required, and bold an auto-filled attribute

  • Data fields are purple, primary keys are orange and foreign keys are olive (each marked by its own icon), and auto-filled system logging attributes are black

  • 1:many relations are denoted by lines connecting primary and foreign keys:

    • The independent side of a relation is drawn with a dashed line, the dependent side with a solid line

    • Separate symbols on the line denote its referential integrity check, its unswitchability, and disabled cascade delete

  • Let's see a simple example of invoicing in the figure above.
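The referential integrity check and the cascade-delete setting can be tried out on the Invoice/Item 1:many relation of the invoicing example. A minimal sketch using Python's built-in sqlite3 module: only a few columns of the figure's tables are kept, and the `make_db` helper with its `cascade` switch is hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def make_db(cascade: bool) -> sqlite3.Connection:
    """Create Invoice (master) and Item (dependent) tables linked 1:many."""
    action = "CASCADE" if cascade else "RESTRICT"  # RESTRICT = cascade delete disabled
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    con.executescript(f"""
        CREATE TABLE Invoice (
            InvoiceID  INTEGER PRIMARY KEY,
            InvoiceNum TEXT NOT NULL UNIQUE,
            IssueDate  TEXT NOT NULL
        );
        CREATE TABLE Item (
            ItemID    INTEGER PRIMARY KEY,
            InvoiceID INTEGER NOT NULL
                      REFERENCES Invoice(InvoiceID) ON DELETE {action},
            BarCode   TEXT NOT NULL,
            Quantity  INTEGER NOT NULL,
            NetVal    REAL NOT NULL
        );
    """)
    con.execute("INSERT INTO Invoice VALUES (1, 'INV-0001', '2010-01-15')")
    con.executemany(
        "INSERT INTO Item (InvoiceID, BarCode, Quantity, NetVal) VALUES (1, ?, ?, ?)",
        [("5990000000011", 2, 100.0), ("5990000000028", 1, 50.0)],
    )
    con.commit()
    return con


def item_count(con: sqlite3.Connection) -> int:
    return con.execute("SELECT COUNT(*) FROM Item").fetchone()[0]


if __name__ == "__main__":
    con = make_db(cascade=True)
    try:  # referential integrity: an Item must point to an existing Invoice
        con.execute("INSERT INTO Item (InvoiceID, BarCode, Quantity, NetVal) "
                    "VALUES (99, 'x', 1, 1.0)")
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
    con.execute("DELETE FROM Invoice WHERE InvoiceID = 1")
    print(item_count(con))  # 0: the items were deleted together with their invoice

    con = make_db(cascade=False)
    try:
        con.execute("DELETE FROM Invoice WHERE InvoiceID = 1")
    except sqlite3.IntegrityError as err:
        print("delete blocked:", err)
    print(item_count(con))  # 2
```

With cascade delete enabled, removing a master row silently removes its dependent rows; with it disabled, the database refuses the delete while dependent rows still exist.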


Sequence databases: Relational databases: Examples

  • There are alternative ERD notations as well (e.g. where a symbol marks the many:1 relationship from foreign key to primary key), but it is easy to switch between them once the relational logic is understood

  • Here are some examples of ERD database designs of current bioinformatics software, to show their complexity (see References for further details)


Text-based search

We use this if we do not know the accession number (AC) of the sequence, but we know some auxiliary information (organism, author, journal, etc.):

  • Entrez (http://www.ncbi.nlm.nih.gov/sites/gquery): the multi-database search engine of NCBI (National Center for Biotechnology Information). On the main screen:

    • We can launch a general search (slower), or

    • Select the specific database to search (faster):

      • PubMed: search among publication abstracts and references. Its user interface, after clicking Advanced search:

        • Search box: keyword search with AND, OR, NOT logic operators and * wildcards

        • Search builder: an easy-to-use graphical interface to build more complex search terms by pressing the Add to search button:

          • Searches can be more focused if we give the database field (ID..CC) where the keywords should be searched

          • We can do the same in the search box by putting the field name in brackets, e.g. [Title]

        • Search history: a list of our recent searches. Items can be combined with logic operators in the format #1 AND #2 to build even more complex searches

      • PubMed Central: here we can search free full-text publications through a similar user interface

  • Other databases recommended for text search:

  • UniProt: for proteins: http://www.ebi.ac.uk/uniprot/

  • PIR: mainly for proteins: http://pir.georgetown.edu/

  • SRS: EBI multi-search interface: http://srs.ebi.ac.uk

  • DB-GET: Japanese multi-database search: http://www.genome.jp/dbget/
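The same kind of field-restricted Boolean query can also be composed in a script. The sketch below only builds a request URL, assuming NCBI's E-utilities `esearch` endpoint with its `db`, `term` and `retmax` parameters; it does not send the request, and the helper names are hypothetical. The query syntax itself ([Field] tags, AND/OR/NOT, the * wildcard) is the one described above for the PubMed search box; the example terms are taken from the X51799 record's references.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities esearch
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def in_field(term: str, field: str = "") -> str:
    """Restrict a term to a field, e.g. in_field('yeast', 'Title') -> 'yeast[Title]'."""
    return f"{term}[{field}]" if field else term


def combine(operator: str, *terms: str) -> str:
    """Join terms with AND / OR / NOT, like '#1 AND #2' in the search history."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError(f"unknown operator: {operator}")
    return "(" + f" {operator} ".join(terms) + ")"


def esearch_url(db: str, query: str, retmax: int = 20) -> str:
    """Build (but do not send) an esearch request URL."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


if __name__ == "__main__":
    query = combine("AND", in_field("Koncz C", "Author"), in_field("chloroplast", "Title"))
    print(query)  # (Koncz C[Author] AND chloroplast[Title])
    print(esearch_url("pubmed", query))
```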



Sequence searches

We use this if we know the accession number (AC):

  • Single: if we are looking for a single sequence

    • Entrez (http://www.ncbi.nlm.nih.gov/sites/gquery):

      • Main screen:

        • Nucleotide search: we get a search builder similar to the one in text-based search

        • Results are retrieved in FASTA, EMBL or GenBank format

  • Other sites for sequence search:

    • GenBank: http://www.ncbi.nlm.nih.gov/genbank/

    • UniProt: http://www.ebi.ac.uk/uniprot/

    • EMBL: http://www.ebi.ac.uk/embl/

    • OMIM: http://www.ncbi.nlm.nih.gov/omim

    • RefSeq: http://www.ncbi.nlm.nih.gov/refseq/

    • PDB: http://www.pdb.org/pdb/home/home.do

    • Pfam: http://pfam.sanger.ac.uk/

    • SCF: http://www.sciencechatforum.com

    • ClustalW: http://www.ebi.ac.uk/Tools/clustalw2/index.html

    • BLAST: http://blast.ncbi.nlm.nih.gov/Blast.cgi

  • Batch: if we are looking for multiple sequences:

    • Entrez (http://www.ncbi.nlm.nih.gov/sites/batchentrez)

      • Database: select the sequence database

      • File (Browse): name and path of a text file containing the ACs of the requested sequences (only 1 AC per line!)

      • Retrieve: shows the results in FASTA format, concatenated one after the other

Example input file (one AC per line):

X000328

X000329

X000330

Result, concatenated in FASTA format:

>X000328

ATTGCGCTATCGTATAGCAT

>X000329

ATTGCGCTATCGTATAGCAT

>X000330

ATTGCGCTATCGTATAGCAT
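Preparing the batch input file, and an equivalent scripted request, can be sketched as follows. It assumes NCBI's E-utilities `efetch` endpoint (parameters `db`, `id` with comma-separated identifiers, `rettype=fasta`, `retmode=text`), whose FASTA output is concatenated like the result of batch Entrez; the helper names are hypothetical and no request is actually sent.

```python
from typing import Iterable
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities efetch
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batch_file_text(accessions: Iterable[str]) -> str:
    """Text of a batch input file: one AC per line, blanks and repeats removed."""
    seen, unique = set(), []
    for ac in accessions:
        ac = ac.strip()
        if ac and ac not in seen:
            seen.add(ac)
            unique.append(ac)
    return "".join(ac + "\n" for ac in unique)


def efetch_url(accessions: Iterable[str], db: str = "nuccore") -> str:
    """Build (but do not send) one request for all ACs, returning FASTA text."""
    ids = ",".join(batch_file_text(accessions).split())
    return EFETCH + "?" + urlencode(
        {"db": db, "id": ids, "rettype": "fasta", "retmode": "text"})


if __name__ == "__main__":
    acs = ["X000328", "X000329", " X000330 ", "X000328"]
    print(batch_file_text(acs), end="")
    print(efetch_url(acs))
```

Cleaning the list first enforces the "only 1 AC per line" rule of the batch form and avoids requesting the same sequence twice.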


Home Assignment 1: Sequence search

  • A) Search for publications and sequences related to yeast mitochondrial DNA! (2.5 pts)

  • B) Download the following sequence in EMBL format: X51799. What is the total length of its introns? (2.5 pts)


References

  • Text search:

    • NCBI Entrez: http://www.ncbi.nlm.nih.gov/sites/gquery

    • UniProt: http://www.ebi.ac.uk/uniprot/

    • PIR: http://pir.georgetown.edu/

    • SRS: http://srs.ebi.ac.uk

    • DB-GET: http://www.genome.jp/dbget/

  • Sequence search:

    • NCBI Entrez: http://www.ncbi.nlm.nih.gov/sites/gquery

    • NCBI Entrez batch: http://www.ncbi.nlm.nih.gov/sites/batchentrez

    • GenBank: http://www.ncbi.nlm.nih.gov/genbank/

    • UniProt: http://www.ebi.ac.uk/uniprot/

    • EMBL: http://www.ebi.ac.uk/embl/

    • OMIM: http://www.ncbi.nlm.nih.gov/omim

    • RefSeq: http://www.ncbi.nlm.nih.gov/refseq/

    • PDB: http://www.pdb.org/pdb/home/home.do

    • Pfam: http://pfam.sanger.ac.uk/

    • SCF: http://www.sciencechatforum.com

    • ClustalW: http://www.ebi.ac.uk/Tools/clustalw2/index.html

    • BLAST: http://blast.ncbi.nlm.nih.gov/Blast.cgi

  • Examples of Entity Relationship Diagrams of bioinfo databases:

    • NCBI Lipid Ontology: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1669719/

    • DNA Alignment editor: http://www.biomedcentral.com/1471-2105/9/154

    • Biomed Data Warehouse: http://www.biomedcentral.com/1471-2105/7/170

