Accessing information on molecular sequences

Accessing information on molecular sequences. Bio 224, Dr. Tom Peavy, Sept 1, 2010. What is an accession number? An accession number is a label that is used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.


Presentation Transcript


  1. Accessing information on molecular sequences
     Bio 224, Dr. Tom Peavy, Sept 1, 2010

  2. What is an accession number?
     An accession number is a label used to identify a sequence: a string of letters and/or numbers that corresponds to a molecular sequence.
     Examples (all for retinol-binding protein, RBP4):
     • X02775: GenBank genomic DNA sequence
     • NT_030059: genomic contig
     • rs7079946: dbSNP (single nucleotide polymorphism)
     • N91759.1: an expressed sequence tag (1 of 170)
     • NM_006744: RefSeq DNA sequence (from a transcript)
     • NP_007635: RefSeq protein
     • AAC02945: GenBank protein
     • Q28369: SwissProt protein
     • 1KT7: Protein Data Bank structure record
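Some identifiers in slide 2 carry a version suffix: in N91759.1 the number after the dot is the sequence version, which is incremented whenever the sequence itself is revised. A minimal Python sketch for splitting such identifiers follows; the function name `split_accession_version` is hypothetical, and it assumes a version is always a purely numeric suffix after the last dot.

```python
from typing import Optional, Tuple


def split_accession_version(identifier: str) -> Tuple[str, Optional[int]]:
    """Split an identifier such as 'N91759.1' into ('N91759', 1).

    Identifiers without a numeric suffix after a dot (e.g. 'X02775' or '1KT7')
    are returned unchanged with a version of None.
    """
    base, sep, version = identifier.rpartition(".")
    if sep and base and version.isdigit():
        return base, int(version)
    return identifier, None


for example in ["N91759.1", "NM_006744", "1KT7"]:
    print(example, "->", split_accession_version(example))
```

Keeping the version separate makes it easy to tell whether two records refer to the same accession but different revisions of its sequence.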

  3. NCBI’s RefSeq project: accessions for genomic, mRNA, and protein sequences
     Accession         Molecule   Method            Note
     AC_123456         Genomic    Mixed             Alternate complete genomic
     AP_123456         Protein    Mixed             Protein products; alternate
     NC_123456         Genomic    Mixed             Complete genomic molecules
     NG_123456         Genomic    Mixed             Incomplete genomic regions
     NM_123456         mRNA       Mixed             Transcript products; mRNA
     NM_123456789      mRNA       Mixed             Transcript products; 9-digit
     NP_123456         Protein    Mixed             Protein products
     NP_123456789      Protein    Curation          Protein products; 9-digit
     NR_123456         RNA        Mixed             Non-coding transcripts
     NT_123456         Genomic    Automated         Genomic assemblies
     NW_123456         Genomic    Automated         Genomic assemblies
     NZ_ABCD12345678   Genomic    Automated         Whole genome shotgun data
     XM_123456         mRNA       Automated         Transcript products
     XP_123456         Protein    Automated         Protein products
     XR_123456         RNA        Automated         Transcript products
     YP_123456         Protein    Auto. & Curated   Protein products
     ZP_12345678       Protein    Automated         Protein products
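Because the two-letter prefix of a RefSeq accession encodes the molecule type in slide 3's table, a lookup can be sketched as a dictionary keyed on that prefix. The mapping below is transcribed from the slide only, `refseq_molecule` is a hypothetical helper, and NCBI's current list of prefixes may differ from this 2010 snapshot.

```python
from typing import Optional

# Prefix -> molecule type, copied from the RefSeq table in slide 3 (not an official list).
REFSEQ_MOLECULE_BY_PREFIX = {
    "AC": "Genomic", "AP": "Protein", "NC": "Genomic", "NG": "Genomic",
    "NM": "mRNA", "NP": "Protein", "NR": "RNA", "NT": "Genomic",
    "NW": "Genomic", "NZ": "Genomic", "XM": "mRNA", "XP": "Protein",
    "XR": "RNA", "YP": "Protein", "ZP": "Protein",
}


def refseq_molecule(accession: str) -> Optional[str]:
    """Return the molecule type for a RefSeq-style accession, or None if unrecognized."""
    prefix, sep, rest = accession.partition("_")
    # RefSeq accessions have the form PREFIX_IDENTIFIER; anything else is treated as non-RefSeq.
    if not sep or not rest:
        return None
    return REFSEQ_MOLECULE_BY_PREFIX.get(prefix.upper())


for acc in ["NM_006744", "NP_007635", "NT_030059", "X02775"]:
    print(acc, refseq_molecule(acc))
```

A GenBank accession such as X02775 has no underscore, so the sketch returns None for it rather than guessing.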

  4. Six ways to access DNA and protein sequences:
     1) Entrez Gene with the RefSeq database (NCBI)
     2) UniGene
     3) Nucleotide or Protein databases (NCBI)
     4) European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI)
     5) ExPASy Sequence Retrieval System (separate from NCBI)
     6) UCSC Genome Browser
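Slide 4 lists interactive routes; the NCBI Nucleotide and Protein databases (item 3) can also be queried programmatically through NCBI's E-utilities. The sketch below only builds an `efetch` request URL asking for a record in FASTA format; `efetch_fasta_url` is a hypothetical helper, and it assumes the database name `nuccore` for nucleotide records and `protein` for proteins.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """Build an efetch URL that requests one record as plain-text FASTA."""
    if db not in ("nuccore", "protein"):
        raise ValueError("this sketch only handles the 'nuccore' and 'protein' databases")
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH_BASE + "?" + urlencode(params)


print(efetch_fasta_url("NM_006744"))
print(efetch_fasta_url("NP_007635", db="protein"))
```

Opening the resulting URL (for example with `urllib.request.urlopen`) downloads the record; NCBI asks users of E-utilities to follow its usage guidelines, such as limiting the request rate.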

  5. What is an EST?
     • Expressed sequence tag
     • “A short strand of DNA that is part of a cDNA molecule and can act as an identifier of a gene.”
     • In essence, a single-pass DNA sequencing reaction for a particular cDNA

  6. UniGene: unique genes via ESTs
     • UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene
     • UniGene clusters contain many ESTs, which are DNA sequences (typically 500 base pairs in length) corresponding to the mRNA from an expressed gene. ESTs are sequenced from a complementary DNA (cDNA) library.
     • UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene, you get information on its abundance and its regional distribution.
     Pages 20-21

  7. Cluster sizes in UniGene
     This is a gene with 1 associated EST; the cluster size is 1.

  8. Cluster sizes in UniGene
     This is a gene (or 1 cluster) with 10 associated ESTs; the cluster size is 10.
     Note: HTC = high-throughput cDNAs
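The cluster sizes in slides 7 and 8 are counts of how many ESTs were assigned to each cluster. Assuming the EST-to-cluster assignments are already known (UniGene's actual clustering procedure is not modeled here), the sizes can be tallied as below; `cluster_sizes` is a hypothetical helper and the EST and cluster identifiers are made up for illustration.

```python
from collections import Counter
from typing import Dict


def cluster_sizes(est_to_cluster: Dict[str, str]) -> Counter:
    """Count how many ESTs belong to each cluster ID."""
    return Counter(est_to_cluster.values())


# Hypothetical assignments: cluster "A" has 1 EST, cluster "B" has 10 ESTs.
assignments = {"est_00": "A"}
assignments.update({f"est_{i:02d}": "B" for i in range(1, 11)})

sizes = cluster_sizes(assignments)
print(sizes["A"], sizes["B"])  # 1 10
```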

  9. FASTA format
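In FASTA format, each record begins with a single header line starting with `>` (typically an identifier followed by an optional description), and the sequence follows on one or more lines. A minimal reader under those assumptions is sketched below; `read_fasta` is a hypothetical name, the example records are invented, and it does not handle every variant of the format (such as `;` comment lines).

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # multi-line sequences are concatenated
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 invented DNA example\nACGTACGT\nTTGA\n>seq2 invented protein example\nMKWV\n")
for name, seq in read_fasta(example):
    print(name, len(seq))
```

Yielding records one at a time keeps memory use low when a file holds many sequences.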

  10. HomoloGene: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene
      Orthologous genes for various model species can be identified easily using this curated database.
