Protein sequence retrieval AND other database information

Databases covered: protein sequence, primary (SWISS-PROT, PIR-International); protein sequence, composite (OWL, NRDB); protein sequence, secondary (PROSITE, PRINTS, Pfam); macromolecular structures (Protein Data Bank (PDB)).

PowerPoint Slideshow about ' Protein sequence retrieval AND other database information ' - wyanet


Presentation Transcript
Databases
  • Protein sequence(primary)
    • SWISS-PROT
    • PIR-International
  • Protein sequence (composite)
    • OWL
    • NRDB
Macromolecular structures
  • Protein Data Bank (PDB)
  • Nucleic Acids Database (NDB)
  • HIV Protease Database
  • ReLiBase
  • PDBsum
  • CATH
  • SCOP
  • FSSP

  • Nucleotide sequences
    • GenBank
    • EMBL
    • DDBJ
  • Genome sequences
    • Entrez genomes
    • GeneCensus
    • COGs

  • Integrated databases
    • InterPro
    • Sequence retrieval system (SRS)
    • Entrez
Protein Sequence Alignment and Database Searching
  • Alignment of Two Sequences (Pair-wise Alignment)
    • The Scoring Schemes or Weight Matrices
    • Techniques of Alignments
    • DOTPLOT
  • Multiple Sequence Alignment (Alignment of > 2 Sequences)
    • Extending Dynamic Programming to more sequences
    • Progressive Alignment (Tree or Hierarchical Methods)
    • Iterative Techniques
      • Stochastic Algorithms (SA, GA, HMM)
      • Non Stochastic Algorithms
  • Database Scanning
    • FASTA, BLAST, PSIBLAST, ISS
  • Alignment of Whole Genomes
    • MUMmer (Maximal Unique Match)

An Overview of BLAST
  • Input query: amino acid sequence
    • blastp: compares against a protein sequence database
    • tblastn: compares against a translated nucleotide sequence database
  • Input query: DNA sequence
    • blastn: compares against a nucleotide sequence database
    • blastx: compares against a protein sequence database
    • tblastx: compares against a translated nucleotide sequence database
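
The pairing of query type and database type in the BLAST overview can be written as a small lookup. This is only an illustrative sketch: the function name `choose_blast_program` and the `translate_both` flag are hypothetical and are not part of BLAST itself; only the five program names come from the slide.

```python
# Hypothetical helper (not part of any BLAST distribution): picks a BLAST
# program name from the type of the query and the type of the database.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",         # protein query vs protein database
    ("protein", "nucleotide"): "tblastn",     # protein query vs translated nucleotide database
    ("nucleotide", "nucleotide"): "blastn",   # nucleotide query vs nucleotide database
    ("nucleotide", "protein"): "blastx",      # nucleotide query vs protein database
}


def choose_blast_program(query_type, db_type, translate_both=False):
    """Return the BLAST program name for a query/database combination.

    translate_both=True selects tblastx, which compares against a translated
    nucleotide database starting from a DNA query.
    """
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query_type!r}, {db_type!r}")
```

For example, `choose_blast_program("protein", "nucleotide")` gives `"tblastn"`, matching the second entry under the amino acid query.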

Comparison of Whole Genomes
  • MUMmer (Salzberg group, 1999, 2002)
    • Pair-wise sequence alignment of genomes
    • Assumes that the sequences are closely related
    • Detects repeats, inverted repeats, and SNPs
    • Detects inserted or deleted domains
    • Identifies the exact matches
  • How it works
    • Identify the maximal unique matches (MUMs) between the two genomes
    • Because the two genomes are similar, long MUMs are expected
    • Sort the MUMs and extract the longest set of matches that occur in the same order in both genomes (ordered MUMs)
    • A suffix tree is used to identify the MUMs
    • Close the gaps between ordered MUMs, which may contain SNPs or large inserts
    • Align the regions between MUMs with Smith-Waterman
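
The first two steps (finding MUMs, then keeping the longest set that occurs in the same order) can be sketched on toy sequences. This is not MUMmer's implementation: it swaps the suffix tree for a naive quadratic scan, which is only practical for short strings, and it omits gap closing and Smith-Waterman alignment. The function names are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len=4):
    """Return sorted (pos_in_a, pos_in_b, length) tuples for each maximal unique match."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip matches that could still be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right for as long as the characters agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            match = a[i:i + length]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return sorted(mums)


def longest_ordered_chain(mums):
    """Longest subset of MUMs whose positions increase in both sequences.

    Simplification: maximizes the number of MUMs (a plain longest increasing
    subsequence), not the total matched length.
    """
    mums = sorted(mums)  # order by position in the first sequence
    if not mums:
        return []
    best = [1] * len(mums)   # best[k]: longest chain ending at MUM k
    prev = [-1] * len(mums)  # back-pointers for rebuilding the chain
    for k in range(len(mums)):
        for p in range(k):
            if mums[p][1] < mums[k][1] and best[p] + 1 > best[k]:
                best[k] = best[p] + 1
                prev[k] = p
    k = max(range(len(mums)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(mums[k])
        k = prev[k]
    return chain[::-1]
```

With the toy inputs `"ACGTGTTCAAG"` and `"ACGTGATCAAG"`, which differ at a single position, `find_mums` returns two MUMs of length 5 on either side of the mismatch; the single-base gap between them is the kind of region the later steps would classify as a SNP.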
Secondary protein databases
  • SWISS-PROT (1986)
    • Best annotated, least redundant
  • PIR (Protein Information Resource)
    • More automated annotation
    • Collaborations with MIPS and JIPID
  • UniProt (2003)
    • UniProt (Universal Protein Resource) is a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR.
Databases
  • Primary (archival)
    • GenBank/EMBL/DDBJ
    • UniProt
    • PDB
    • Medline (PubMed)
    • BIND
  • Secondary (curated)
    • RefSeq
    • Taxon
    • UniProt
    • OMIM
    • SGD
Organismal Divisions

Code  Division                  Used in which database?
BCT   Bacterial                 DDBJ - GenBank
FUN   Fungal                    EMBL
HUM   Homo sapiens              DDBJ - EMBL
INV   Invertebrate              all
MAM   Other mammalian           all
ORG   Organelle                 EMBL
PHG   Phage                     all
PLN   Plant                     all
PRI   Primate (also see HUM)    all (not same data in all)
PRO   Prokaryotic               EMBL
ROD   Rodent                    all
SYN   Synthetic and chimeric    all
VRL   Viral                     all
VRT   Other vertebrate          all

Functional Divisions

PAT   Patent
EST   Expressed Sequence Tags
STS   Sequence Tagged Site
GSS   Genome Survey Sequence
HTG   High Throughput Genome (unfinished)
HTC   High throughput cDNA (unfinished)
CON   Contig assembly instructions

Organismal divisions: BCT FUN INV MAM PHG PLN PRI ROD SYN VRL VRT
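
As a rough sketch, the division codes above can be looked up from a GenBank flat-file record, where the three-letter division code is the field just before the date on the LOCUS line. The helper names and the sample LOCUS line below are made up for illustration; the code tables are copied from the two division lists.

```python
# Division codes taken from the organismal and functional lists above.
ORGANISMAL = {
    "BCT": "Bacterial", "FUN": "Fungal", "HUM": "Homo sapiens",
    "INV": "Invertebrate", "MAM": "Other mammalian", "ORG": "Organelle",
    "PHG": "Phage", "PLN": "Plant", "PRI": "Primate", "PRO": "Prokaryotic",
    "ROD": "Rodent", "SYN": "Synthetic and chimeric", "VRL": "Viral",
    "VRT": "Other vertebrate",
}
FUNCTIONAL = {
    "PAT": "Patent", "EST": "Expressed Sequence Tags",
    "STS": "Sequence Tagged Site", "GSS": "Genome Survey Sequence",
    "HTG": "High Throughput Genome (unfinished)",
    "HTC": "High throughput cDNA (unfinished)",
    "CON": "Contig assembly instructions",
}


def division_from_locus(locus_line):
    """Return the division code from a GenBank LOCUS line.

    Assumes the usual layout in which the division code is the
    second-to-last whitespace-separated field, just before the date.
    """
    tokens = locus_line.split()
    if len(tokens) < 3 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    return tokens[-2]


def describe_division(code):
    """Return (kind, description) for a division code."""
    if code in ORGANISMAL:
        return ("organismal", ORGANISMAL[code])
    if code in FUNCTIONAL:
        return ("functional", FUNCTIONAL[code])
    raise KeyError(f"unknown division code: {code}")


# Hypothetical LOCUS line, written only to show the field layout.
example = "LOCUS       EXAMPLE01    1200 bp    DNA     linear   PLN 01-JAN-2000"
```

Here `division_from_locus(example)` yields `"PLN"`, which `describe_division` reports as an organismal division (Plant).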

EST: Expressed Sequence Tag

Expressed Sequence Tags are short (300-500 bp) single reads from mRNA (cDNA) that are produced in large numbers. They represent a snapshot of what is expressed in a given tissue at a given developmental stage.

Also see: http://www.ncbi.nlm.nih.gov/dbEST/
http://www.ncbi.nlm.nih.gov/UniGene/

STS

A Sequence Tagged Site is an operationally unique sequence that identifies the combination of primer pairs used in a PCR assay. The assay generates a mapping reagent that maps to a single position within the genome.

Also see: http://www.ncbi.nlm.nih.gov/dbSTS/
http://www.ncbi.nlm.nih.gov/genemap/

GSS: Genome Survey Sequences
  • Genome Survey Sequences are similar in nature to ESTs, except that the sequences are genomic in origin rather than cDNA (mRNA).
  • The GSS division contains:
    • random "single pass read" genome survey sequences
    • single pass reads from cosmid/BAC/YAC ends (these could be chromosome specific, but need not be)
    • exon trapped genomic sequences
    • Alu PCR sequences

Also see: http://www.ncbi.nlm.nih.gov/dbGSS/

HTG: High Throughput Genome

High Throughput Genome Sequences are records from unfinished genome sequencing efforts. Unfinished records have gaps in the nucleotide sequence, low accuracy, and no annotations.

Also see: http://www.ncbi.nlm.nih.gov/HTGS/

Ouellette and Boguski (1997) Genome Res.7:952-955

Which tool?

[Figure: chart for choosing a sequence submission tool. Sequence types shown: mRNA, Genomic, EST, STS/GSS, HTGS, Other. Destinations and tools shown: dbEST, dbSTS, dbGSS, customized software or tbl2asn, BankIt (WWW) for simple submissions, and Sequin or tbl2asn for better control of annotations, pop/phylo sets, and segmented sets. Delivery is by e-mail or FTP.]