Computer Storage of Sequences
(Chapter 2 of Bioinformatics: Sequence and Genome Analysis, by David W. Mount)
CSE730: Seminar on “Information Retrieval of Biomedical Text and Data”
Outline
Storing DNA/protein sequences in computer files or databases.
A sequence is stored as ASCII text (i.e., a string of the letters A, G, C, T, …) along with its annotations.
Different sequence-analysis programs recognize different sequence formats.
A sequence format includes accessory information such as gene names, source organism, investigator name, and references, plus the actual sequence.
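A minimal sketch of the idea above, assuming the widely used FASTA layout (a ">" header line followed by sequence lines). The `SequenceRecord` class, its field names, and the example data are illustrative assumptions, not taken from the chapter.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """One stored sequence plus its annotations (field names are illustrative)."""
    identifier: str
    description: str = ""
    organism: str = ""
    references: list = field(default_factory=list)
    sequence: str = ""


def parse_fasta(text):
    """Parse FASTA-formatted ASCII text into a list of SequenceRecord objects."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: first token is the identifier, the rest is free text.
            parts = line[1:].split(None, 1)
            current = SequenceRecord(
                identifier=parts[0] if parts else "",
                description=parts[1] if len(parts) > 1 else "",
            )
            records.append(current)
        elif current is not None:
            # Sequence lines are concatenated and normalized to upper case.
            current.sequence += line.upper()
    return records


# Made-up example data, not a real database entry.
example = """>seq1 hypothetical example gene
AGCTTGCA
ggctaacg
"""
records = parse_fasta(example)
print(records[0].identifier, len(records[0].sequence))
```

Richer formats carry more annotation fields than FASTA does, which is why the record type leaves room for organism and references even though this parser does not fill them.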
GenBank at the National Center for Biotechnology Information (NCBI), National Library of Medicine, Bethesda, Maryland
Protein Information Resource (PIR) database at the National Biomedical Research Foundation in Washington, DC
The SwissProt protein sequence database at ISREC, Swiss Institute for Experimental Cancer Research.
European Molecular Biology Laboratory (EMBL) Outstation at Hinxton, England
DNA DataBank of Japan (DDBJ) at Mishima, Japan
Each DNA sequence entry in GenBank is formatted into distinct attributes (fields).
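As a rough illustration, the sketch below reads one GenBank-style flat-file entry, assuming the usual layout: top-level keywords such as LOCUS, DEFINITION, and ACCESSION start in column 1, the sequence follows the ORIGIN line, and "//" ends the entry. The function name `parse_genbank` and the sample entry (including accession XX000000) are made up for this example.

```python
def parse_genbank(text):
    """Collect top-level keyword fields and the sequence from one GenBank-style entry."""
    fields = {}
    sequence_parts = []
    keyword = None
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-entry marker
        if in_origin:
            # Sequence lines: a position number followed by blocks of bases.
            sequence_parts.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line[:1].strip():
            # A top-level keyword starts in column 1.
            parts = line.split(None, 1)
            keyword = parts[0]
            fields[keyword] = parts[1].strip() if len(parts) > 1 else ""
        elif keyword is not None and line.strip():
            # Indented line: continuation or sub-keyword, appended to the current field.
            fields[keyword] += " " + line.strip()
    return fields, "".join(sequence_parts).upper()


# Made-up entry for illustration only; not a real GenBank record.
entry = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   XX000000
SOURCE      synthetic construct
  ORGANISM  synthetic construct
ORIGIN
        1 agcttgcagg ctaacgttag catc
//
"""
fields, sequence = parse_genbank(entry)
print(fields["ACCESSION"], len(sequence))
```

This sketch folds sub-keywords (such as ORGANISM) into their parent field and does not interpret the feature table; a production parser would handle both separately.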
Tools for Data Retrieval and Submission
BLAST: Basic Local Alignment Search Tool
Q-BLAST: A queuing system for BLAST that lets users retrieve and format their results at their convenience.
Access to BLAST service
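The queued, two-step pattern (submit a search, then fetch results later by request ID) can be sketched as below. This only builds request URLs and makes no network call. The parameter names (CMD, PROGRAM, DATABASE, QUERY, RID, FORMAT_TYPE) and the base URL follow the NCBI BLAST URL API as best recalled here and should be checked against the current NCBI documentation; the helper names and the request ID are hypothetical.

```python
from urllib.parse import urlencode

# Assumed base address of the NCBI BLAST URL API; verify before use.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submit_url(query, program="blastn", database="nt"):
    """Step 1: queue a search. The service replies with a request ID (RID)."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    return BLAST_URL + "?" + urlencode(params)


def build_retrieve_url(request_id, format_type="Text"):
    """Step 2: fetch (and format) the results of a queued search by its RID."""
    params = {"CMD": "Get", "RID": request_id, "FORMAT_TYPE": format_type}
    return BLAST_URL + "?" + urlencode(params)


# Hypothetical query and request ID, for illustration only.
submit_url = build_submit_url("AGCTTGCAGGCTAACG")
retrieve_url = build_retrieve_url("EXAMPLERID")
print(submit_url)
print(retrieve_url)
```

In practice the submission is usually sent as a POST, and the retrieval step is repeated after a polite delay until the search has finished.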
Protein Information Resource
3 Major Databases:
PIR-NREF (Non-redundant REFerence protein database)
The current version (July 2002) consists of more than 809,000 non-redundant PIR-PSD, SwissProt and TrEMBL proteins organized into more than 36,200 PIR superfamilies and 145,340 families, with links to over 50 molecular biology databases.
Swiss-Prot Sequence Entry Example
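A Swiss-Prot entry is a flat text record in which each line begins with a two-letter line code (for example ID, AC, DE, OS, SQ) and the entry ends with "//". The sketch below groups lines by code and collects the sequence; the entry shown is invented and abbreviated for illustration, and `parse_swissprot` is a hypothetical helper name.

```python
def parse_swissprot(text):
    """Group the lines of one Swiss-Prot-style entry by line code; return (fields, sequence)."""
    fields = {}
    residues = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            break  # end-of-entry marker
        # Line code sits in columns 1-2; data starts at column 6.
        code, data = line[:2], line[5:].strip()
        if in_sequence:
            # Sequence lines are indented and split into space-separated blocks.
            residues.append(data.replace(" ", ""))
        elif code == "SQ":
            in_sequence = True
        elif code.strip():
            fields.setdefault(code, []).append(data)
    return fields, "".join(residues)


# Invented, abbreviated entry for illustration only; not a real Swiss-Prot record.
entry = """\
ID   EXAMPLE_HUMAN  STANDARD;      PRT;    20 AA.
AC   P00000;
DE   Hypothetical example protein.
OS   Homo sapiens (Human).
SQ   SEQUENCE   20 AA;
     MKTAYIAKQR QISFVKSHFS
//
"""
fields, sequence = parse_swissprot(entry)
print(fields["AC"], len(sequence))
```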
Sequence Format Conversion program.
Can convert to/from:
Presented by: Hemal Patel & Jeetal Shah