
National Center for Biotechnology Information (NCBI) ncbi.nlm.nih



Presentation Transcript


  1. National Center for Biotechnology Information (NCBI) www.ncbi.nlm.nih.gov (Note: put this in the databases section, and put lecture 5 at the end.) Page 24

  2. Fig. 2.5 Page 25 www.ncbi.nlm.nih.gov

  3. Fig. 2.5 Page 25

  4. PubMed is… • National Library of Medicine's search service • 16 million citations in MEDLINE • links to participating online journals • PubMed tutorial (via “Education” on side bar) Page 24

  5. Entrez integrates… • the scientific literature; • DNA and protein sequence databases; • 3D protein structure data; • population study data sets; • assemblies of complete genomes Page 24

  6. Entrez is a search and retrieval system that integrates NCBI databases Page 24
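
The idea of one search system spanning many databases can be illustrated with a short sketch. It assumes NCBI's E-utilities, the programmatic interface to Entrez, which the slides themselves do not cover; the function name `esearch_url` and the choice of databases are illustrative only. The sketch builds query URLs and makes no network request.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities (programmatic access to Entrez).
# Using E-utilities here is an assumption; the slides only show the web interface.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that searches one Entrez database for a term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # The same query term can be sent to several Entrez databases.
    for db in ("pubmed", "gene", "protein"):
        print(esearch_url(db, "rbp4"))
```

Only the `db` parameter changes between databases, which mirrors how Entrez presents a single query box across the literature, sequence, and structure resources.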

  7. BLAST is… • Basic Local Alignment Search Tool • NCBI's sequence similarity search tool • supports analysis of DNA and protein databases • 100,000 searches per day Page 25

  8. OMIM is… • Online Mendelian Inheritance in Man • catalog of human genes and genetic disorders • edited by Dr. Victor McKusick, others at JHU Page 25

  9. Cancer Chromosomes: integrates cytogenetic, clinical, and reference information from the NCI Mitelman Database of Chromosome Aberrations in Cancer, the NCI Recurrent Aberrations in Cancer database, and the NCI/NCBI SKY/M-FISH & CGH Database.

  10. CDD Conserved Domain Database, a collection of sequence alignments and profiles representing protein domains conserved in molecular evolution. Select 'Domains' from the Entrez pull down menu.

  11. http://www.ncbi.nlm.nih.gov/About/tools/restable_mol.html

  12. Books is… • searchable resource of on-line books Page 26

  13. TaxBrowser is… • browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses) • taxonomy information such as genetic codes • molecular data on extinct organisms Page 26

  14. Structure site includes… • Molecular Modeling Database (MMDB) • biopolymer structures obtained from the Protein Data Bank (PDB) • Cn3D (a 3D-structure viewer) • Vector Alignment Search Tool (VAST) Page 26

  15. Accessing information on molecular sequences Page 26

  16. Accession numbers are labels for sequences NCBI includes databases (such as GenBank) that contain information on DNA, RNA, or protein sequences. You may want to acquire information beginning with a query such as the name of a protein of interest, or the raw nucleotides comprising a DNA sequence of interest. DNA sequences and other molecular data are tagged with accession numbers that are used to identify a sequence or other record relevant to molecular data. Page 26

  17. What is an accession number? An accession number is a label used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence. Examples (all for retinol-binding protein, RBP4):
      X02775      GenBank genomic DNA sequence
      NT_030059   Genomic contig
      rs7079946   dbSNP (single nucleotide polymorphism)
      N91759.1    An expressed sequence tag (1 of 170)
      NM_006744   RefSeq DNA sequence (from a transcript)
      NP_006735   RefSeq protein
      AAC02945    GenBank protein
      Q28369      SwissProt protein
      1KT7        Protein Data Bank structure record
      Labels on the slide: DNA, RNA, protein. Page 27
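
The accession examples above follow recognizable shapes, which a short sketch can make concrete. The patterns below are rough illustrations based only on the examples on this slide, not an official specification; the function names `split_version` and `guess_source` are hypothetical.

```python
import re

# Rough, illustrative patterns only; real accession formats have more variants.
PATTERNS = [
    ("RefSeq", re.compile(r"^[A-Z]{2}_\d+$")),          # e.g. NM_006744
    ("dbSNP", re.compile(r"^rs\d+$", re.IGNORECASE)),   # e.g. rs7079946
    ("PDB", re.compile(r"^\d[A-Za-z0-9]{3}$")),         # e.g. 1KT7
]


def split_version(identifier):
    """Split 'N91759.1' into ('N91759', 1); the version is None if absent."""
    base, sep, version = identifier.partition(".")
    return base, int(version) if sep and version.isdigit() else None


def guess_source(identifier):
    """Guess which kind of database an identifier comes from."""
    base, _ = split_version(identifier)
    for name, pattern in PATTERNS:
        if pattern.match(base):
            return name
    return "other (e.g. GenBank or protein database accession)"


if __name__ == "__main__":
    for acc in ("X02775", "NT_030059", "rs7079946", "N91759.1", "NM_006744", "1KT7"):
        print(acc, "->", guess_source(acc), split_version(acc))
```

Identifiers that match none of the patterns fall into a catch-all group, because GenBank and protein database accessions cannot be told apart reliably from their shape alone.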

  18. Four ways to access DNA and protein sequences [1] Entrez Gene with RefSeq [2] UniGene [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI) [4] ExPASy Sequence Retrieval System (separate from NCBI) Note: LocusLink at NCBI was recently retired. The third printing of the book has updated these sections (pages 27-31). Page 27

  19. Four ways to access protein and DNA sequences [1] Entrez Gene with RefSeq. Entrez Gene is a great starting point: it collects key information on each gene/protein from major databases. It covers all major organisms. RefSeq provides a curated, optimal accession number for each DNA (NM_006744) or protein (NP_006735). Page 27

  20. From the NCBI home page, type “rbp4” and hit “Go” revised Fig. 2.7 Page 29

  21. revised Fig. 2.7 Page 29

  22. By applying limits, there are now just two entries

  23. Entrez Gene (top of page) Note that links to many other RBP4 database entries are available revised Fig. 2.8 Page 30

  24. Entrez Gene (middle of page)

  25. Entrez Gene (bottom of page)

  26. Fig. 2.9 Page 32

  27. Fig. 2.9 Page 32

  28. Fig. 2.9 Page 32

  29. FASTA format Fig. 2.10 Page 32
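
The slide only points to a figure, so here is a minimal sketch of reading FASTA text, in which each record is a header line starting with ">" followed by one or more lines of sequence. The records in `EXAMPLE` are made up for illustration and are not real database entries; `parse_fasta` is a hypothetical helper name.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (without the leading '>') to its joined sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may wrap; concatenate them under the current header.
            records[header] += line
    return records


# Made-up records for illustration only.
EXAMPLE = """>example_1 hypothetical record
ACGTACGT
ACGT
>example_2 another hypothetical record
MKWV
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(EXAMPLE.splitlines()).items():
        print(name, len(seq))
```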

  30. What is an accession number? An accession number is a label used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence. Examples (all for retinol-binding protein, RBP4):
      X02775      GenBank genomic DNA sequence
      NT_030059   Genomic contig
      rs7079946   dbSNP (single nucleotide polymorphism)
      N91759.1    An expressed sequence tag (1 of 170)
      NM_006744   RefSeq DNA sequence (from a transcript)
      NP_006735   RefSeq protein
      AAC02945    GenBank protein
      Q28369      SwissProt protein
      1KT7        Protein Data Bank structure record
      Labels on the slide: DNA, RNA, protein. Page 27

  31. NCBI’s important RefSeq project: best representative sequences. RefSeq (accessible via the main page of NCBI) provides an expertly curated accession number that corresponds to the most stable, agreed-upon “reference” version of a sequence. RefSeq identifiers include the following formats:
      Complete genome      NC_######
      Complete chromosome  NC_######
      Genomic contig       NT_######
      mRNA (DNA format)    NM_######   e.g. NM_006744
      Protein              NP_######   e.g. NP_006735
      Page 29-30

  32. NCBI’s RefSeq project: accession for genomic, mRNA, protein sequences
      Accession        Molecule  Method           Note
      AC_123456        Genomic   Mixed            Alternate complete genomic
      AP_123456        Protein   Mixed            Protein products; alternate
      NC_123456        Genomic   Mixed            Complete genomic molecules
      NG_123456        Genomic   Mixed            Incomplete genomic regions
      NM_123456        mRNA      Mixed            Transcript products; mRNA
      NM_123456789     mRNA      Mixed            Transcript products; 9-digit
      NP_123456        Protein   Mixed            Protein products
      NP_123456789     Protein   Curation         Protein products; 9-digit
      NR_123456        RNA       Mixed            Non-coding transcripts
      NT_123456        Genomic   Automated        Genomic assemblies
      NW_123456        Genomic   Automated        Genomic assemblies
      NZ_ABCD12345678  Genomic   Automated        Whole genome shotgun data
      XM_123456        mRNA      Automated        Transcript products
      XP_123456        Protein   Automated        Protein products
      XR_123456        RNA       Automated        Transcript products
      YP_123456        Protein   Auto. & Curated  Protein products
      ZP_12345678      Protein   Automated        Protein products
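
As a sketch of how the two-letter prefix in this table can be used, the lookup below maps a RefSeq accession to the molecule type and annotation method listed on the slide. The dictionary copies the slide's values as given (for NP_ it uses the 6-digit row, "Mixed"); the function name `describe_refseq` is hypothetical, and the table may not reflect current RefSeq conventions.

```python
# Prefix -> (molecule, method), transcribed from the slide's table.
REFSEQ_PREFIXES = {
    "AC": ("Genomic", "Mixed"),
    "AP": ("Protein", "Mixed"),
    "NC": ("Genomic", "Mixed"),
    "NG": ("Genomic", "Mixed"),
    "NM": ("mRNA", "Mixed"),
    "NP": ("Protein", "Mixed"),  # the slide lists 9-digit NP_ as "Curation"
    "NR": ("RNA", "Mixed"),
    "NT": ("Genomic", "Automated"),
    "NW": ("Genomic", "Automated"),
    "NZ": ("Genomic", "Automated"),
    "XM": ("mRNA", "Automated"),
    "XP": ("Protein", "Automated"),
    "XR": ("RNA", "Automated"),
    "YP": ("Protein", "Auto. & Curated"),
    "ZP": ("Protein", "Automated"),
}


def describe_refseq(accession):
    """Return (molecule, method) for a RefSeq-style accession such as NM_006744."""
    prefix, sep, _ = accession.partition("_")
    if not sep or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"not a recognized RefSeq prefix: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


if __name__ == "__main__":
    for acc in ("NM_006744", "NP_006735", "NT_030059"):
        print(acc, describe_refseq(acc))
```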

  33. Four ways to access DNA and protein sequences [1] Entrez Gene with RefSeq [2] UniGene [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI) [4] ExPASy Sequence Retrieval System (separate from NCBI) Page 31

  34. [Figure: DNA, RNA, protein, complementary DNA (cDNA), and UniGene] Fig. 2.3 Page 23

  35. UniGene: unique genes via ESTs • Find UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene • UniGene clusters contain many expressed sequence tags (ESTs), which are DNA sequences (typically 500 base pairs in length) corresponding to the mRNA from an expressed gene. ESTs are sequenced from a complementary DNA (cDNA) library. • UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene you get information on its abundance and its regional distribution. Pages 20-21

  36. Cluster sizes in UniGene This is a gene with 1 EST associated; the cluster size is 1 Fig. 2.3 Page 23

  37. Cluster sizes in UniGene This is a gene with 10 ESTs associated; the cluster size is 10

  38. Cluster sizes in UniGene (human)
      Cluster size (ESTs)  Number of clusters
      1                    42,800
      2                    6,500
      3-4                  6,500
      5-8                  5,400
      9-16                 4,100
      17-32                3,300
      500-1000             2,128
      2000-4000            233
      8000-16,000          21
      16,000-30,000        8
      UniGene build 194, 8/06
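
A quick calculation makes the skew in these numbers concrete. The sketch below uses only the rows listed on the slide; several size ranges (for example 33 to 499 ESTs) are not shown, so the total and the share are for the listed rows only, not for all of UniGene.

```python
# (cluster size range, number of clusters), copied from the slide.
CLUSTERS = [
    ("1", 42800),
    ("2", 6500),
    ("3-4", 6500),
    ("5-8", 5400),
    ("9-16", 4100),
    ("17-32", 3300),
    ("500-1000", 2128),
    ("2000-4000", 233),
    ("8000-16,000", 21),
    ("16,000-30,000", 8),
]

# Total over the listed rows only; unlisted size ranges are not included.
total_listed = sum(count for _, count in CLUSTERS)

# Share of listed clusters that contain a single EST.
singleton_share = CLUSTERS[0][1] / total_listed

if __name__ == "__main__":
    print(f"clusters in listed rows: {total_listed}")
    print(f"share with exactly one EST: {singleton_share:.1%}")
```

Among the listed rows, clusters with a single EST are by far the largest group, while only a handful of clusters contain thousands of ESTs.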

  39. UniGene: unique genes via ESTs Conclusion: UniGene is a useful tool to look up information about expressed genes. UniGene displays information about the abundance of a transcript (expressed gene), as well as its regional distribution of expression (e.g. brain vs. liver). We will discuss UniGene further later (gene expression). Page 31

  40. Four ways to access DNA and protein sequences [1] Entrez Gene with RefSeq [2] UniGene [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI) [4] ExPASy Sequence Retrieval System (separate from NCBI) Page 31

  41. Ensembl to access protein and DNA sequences Try Ensembl at www.ensembl.org for a premier human genome web browser. We will encounter Ensembl as we study the human genome, BLAST, and other topics.

  42. click human

  43. enter RBP4

  44. Four ways to access DNA and protein sequences [1] Entrez Gene with RefSeq [2] UniGene [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI) [4] ExPASy Sequence Retrieval System (separate from NCBI) Page 33
