
Introduction to Entrez Genome Projects


Presentation Transcript


  1. Introduction to Entrez Genome Projects

  2. Data scope of genome resources at NCBI
     • Environmental samples?
     • Organisms
       • Nematoda: C. elegans, C. briggsae
       • Microbes
       • Viruses
       • Fungi/small eukaryotes
       • Plants: A. thaliana, barley, corn, oat, rice, soybean, tomato, wheat
       • Fishes
       • Insects: D. melanogaster, A. gambiae, D. pseudoobscura, honey bee
       • Chicken
       • Dog
       • Mouse/rat
       • Pig, cow
       • Chimpanzee
       • Human

  3. Data scope of genome resources at NCBI
     Sequences
     • Nucleotide
     • EST, cDNA, mRNA, STS
     • Patents
     • GSS
     • Traces
     • Genomic: complete genome, whole genome shotgun assembly (different assembly methods)
     • BAC clone based sequencing
     • Resequencing
     • Annotation

  4. Entrez Genome Project
     • NOT Entrez Genome?
       • Entrez Genomes is a collection of COMPLETE chromosomes, plasmids, organelles, and viruses.
       • Created in 1995.
       • Doesn't have a way of linking all the data for a given organism other than by taxid.
       • Problems
         • How to define a COMPLETE genome
         • Same organism sequenced by different groups:
           • Agrobacterium tumefaciens str. C58 (Cereon and U. Washington)
           • Corynebacterium glutamicum ATCC 13032 (Japan and Germany)
           • Bacillus licheniformis DSM 13 (USA and Germany)
         • A genome project is more than chromosomes and proteins
     • Not Entrez Taxonomy?
       • Designed as a taxonomic hierarchy, not organized by genomes
       • Collects all Entrez links associated with the organism
       • Problems
         • Same organism sequenced by different groups
         • Sequence links are lumped together, for example, Oryza sativa

  5. Entrez Genome Project
     Complete and incomplete large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms
     • A project is defined by
       • Organism
       • Project type (and/or sequencing method)
       • Sequencing center
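The three defining fields on slide 5 can be read as a composite key. The following is a minimal Python sketch of that idea, not NCBI's actual schema: the class name `ProjectKey`, its field names, and the project type string are hypothetical. It uses the Agrobacterium tumefaciens str. C58 example from slide 4 to show that keying by organism alone merges the two groups' projects, while the full triple keeps them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectKey:
    """Hypothetical composite key built from the three fields on slide 5."""
    organism: str
    project_type: str        # and/or sequencing method
    sequencing_center: str


# Two groups sequenced the same strain (example taken from slide 4).
# The project type string is an illustrative placeholder.
strain = "Agrobacterium tumefaciens str. C58"
cereon = ProjectKey(strain, "genome sequencing", "Cereon")
uwash = ProjectKey(strain, "genome sequencing", "U. Washington")

# Keyed by organism alone, the two projects collapse into one entry.
by_organism = {key.organism for key in (cereon, uwash)}

# Keyed by the full triple, they stay distinct.
by_project = {cereon, uwash}

print(len(by_organism), len(by_project))
```

A frozen dataclass is used only because it is hashable and so can serve directly as a set member or dictionary key.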

  6. Schematic diagram of a generic eukaryotic genome project
     Numbered projects in the diagram:
     • 1: Genomic sequencing (WGS) and assembly and annotation (complete), Center B
     • 2: Genomic sequencing (WGS) (complete), Center A
     • 3: Assembly and annotation (incomplete), Center E
     • 4: BAC-ends sequencing (incomplete), Center F
     • 5: Large-scale EST sequencing (complete), Center D
     • 6: Large-scale cDNA sequencing (incomplete), Center B
     Other boxes in the diagram: Organism-specific overview; Links to third-party sites; Genomic data at NCBI (RefSeq); Nucleotide data at NCBI (GenBank); Nucleotide data at NCBI (dbEST)
     Legend: project overview, external data, NCBI data
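The schematic on slide 6 describes one organism-specific overview that groups several numbered projects. A minimal Python sketch of that grouping follows. The class names `Overview` and `Project` and the method `by_status` are hypothetical; only the six project descriptions, their status, and their center labels come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    number: int
    description: str
    status: str      # "complete" or "incomplete", as labelled on the slide
    center: str


@dataclass
class Overview:
    """Hypothetical parent record: one organism overview, many projects."""
    organism: str
    projects: list = field(default_factory=list)

    def by_status(self, status):
        # Return the slide numbers of the projects with the given status.
        return [p.number for p in self.projects if p.status == status]


overview = Overview("generic eukaryote", [
    Project(1, "Genomic sequencing (WGS) and assembly and annotation", "complete", "Center B"),
    Project(2, "Genomic sequencing (WGS)", "complete", "Center A"),
    Project(3, "Assembly and annotation", "incomplete", "Center E"),
    Project(4, "BAC-ends sequencing", "incomplete", "Center F"),
    Project(5, "Large-scale EST sequencing", "complete", "Center D"),
    Project(6, "Large-scale cDNA sequencing", "incomplete", "Center B"),
])

print(overview.by_status("complete"))    # [1, 2, 5]
print(overview.by_status("incomplete"))  # [3, 4, 6]
```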

  7. Entrez Genome Project: Is it implemented
     • Hierarchical structure
     • Flexible project types
     • Related projects
     • Entrez links
     • Relational database
     • Manually curated: organism descriptions
     • Related resources/links
     • Sequencing centers
     • Submission form

  8. Entrez Genome Project: Is it presented
     Genome Project > Overview > Project
     • Brief description (Docsum defline)
     • Project data
     • Lineage
     • Image
     • Chromosome info
     • Map Viewer search
     • Related Projects
     • Publications
     • Organism description
     • Resource links: NCBI Resources (Tools), Organism data in GenBank, Sequencing Centers, Sequencing Projects, Related Resources
     • Organism groups
       • Eukaryotes: Animals, Plants, Fungi, Protists
       • Prokaryotes: Archaea, Bacteria
     • Entrez search
     • Reports
     • Statistics
     • Sequencing Centers
     • Eukaryotic projects
     • Prokaryotic projects
     • Sequence links

  9. Eukaryotic Projects List

  10. Organism name • Short summary • Taxonomic groups • Sequencing status • Estimated size

  11. Prokaryotic Genomic Data
      • Amount of data
        • genomes (nucleotides, proteins, RNAs)
        • expression analysis (microarrays, etc.)
        • microbial community sequencing (Sargasso Sea, etc.)
      • Organization of data
        • currently by type of data
        • taxonomically

  12. Deluge of Data
      Growth of complete microbial genomes in the last ten years. September 1, 2005: 254 complete genomes.

  13. Anatomy of a Prokaryotic Project

  14. Anatomy of a Prokaryotic Project
      • External data and sites
      • Genome information
      • Organism and strain description
      • Prokaryotic genome attributes

  15. Prokaryotic Projects List

  16. Microbial Projects List
      Filters
      - complete sequences
      - draft assembly sequences
      - no sequences
      - organism groups
      Sorting
      - attributes (gram stain, habitat...)
      - data (genome size, GC content...)
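The filter and sort options on slide 16 amount to selecting rows by sequencing status or organism group and then ordering them by a chosen column. The sketch below illustrates that behaviour in Python under stated assumptions: the rows, organism names, and numeric values are made-up placeholders rather than NCBI data, and the function names `filter_projects` and `sort_projects` are hypothetical.

```python
# Placeholder rows; none of these values come from the Genome Project database.
projects = [
    {"organism": "Organism A", "status": "complete", "group": "Bacteria",
     "genome_size_mb": 4.6, "gc_content": 50.8},
    {"organism": "Organism B", "status": "draft", "group": "Bacteria",
     "genome_size_mb": 5.2, "gc_content": 66.1},
    {"organism": "Organism C", "status": "complete", "group": "Archaea",
     "genome_size_mb": 1.7, "gc_content": 31.4},
    {"organism": "Organism D", "status": "none", "group": "Bacteria",
     "genome_size_mb": 2.0, "gc_content": 40.0},
]


def filter_projects(rows, status=None, group=None):
    """Keep rows matching the given status and/or organism group."""
    return [
        r for r in rows
        if (status is None or r["status"] == status)
        and (group is None or r["group"] == group)
    ]


def sort_projects(rows, column, descending=False):
    """Order rows by one attribute or data column."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)


complete = filter_projects(projects, status="complete")
largest_first = sort_projects(complete, "genome_size_mb", descending=True)
print([r["organism"] for r in largest_first])  # ['Organism A', 'Organism C']
```

The status values "complete", "draft", and "none" stand in for the slide's three sequence filters.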

  17. Microbial Projects List: Complete Genomes
      Columns: Organism, Kingdom, Genome Size, GC Content, Accessions, Release Date, Center, NCBI Links

  18. Microbial Projects List: Genomes in Progress
      Columns: Organism, Kingdom, Contigs, Genome Size, GC, Accessions, BLAST, Center

  19. Microbial Projects List: Organism Info
      Columns: Organism, Kingdom, Genome, GC, Gram, Shape, Arrangement, Spores, Motility, Salinity, Oxygen, Habitat, Temp., Host, Disease

  20. Microbial Projects List

  21. Organism/Genome Attributes

  22. Project types

  23. Environmental samples

  24. Comparative genomics

  25. Future Directions
      - linking other data (microarrays)
      - comparative genomics projects (ex. Bacillus)
      - environmental microbial community sequencing projects
      - links to granting agencies
      - International Nucleotide Sequence Databases meta-genomic data provided by scientific communities

  26. Submission of Projects
      Diagram labels: Sequencing Centers, Funding Agencies, Genome Curators, Existing Complete and In Progress Genomes, Project Database
      • create project from existing data
      • create project from announced sequencing projects
      • direct submission from outside users

  27. Submission of Projects http://www.ncbi.nlm.nih.gov/genomes/mpfsubmission.cgi

  28. Entrez Genome Project
      • Curators
        • Prokaryotes: William Klimke, Stacy Ciufo, Leigh Riley, Gert Roosen, Rich McVeigh, Nikolai Daraselia, Emir Khatipov
        • Eukaryotes: Ethan Carver, Melissa Landrum, Anjana Raina, Barbara Ruef, Patti Sherman, Janet Weber, Lynn Schriml
      • Software developers: Andrei Kochergin, Sergei Resenchuk
      • Graphics: Svetlana Iazvovskaia
      • Usability: Mark Johnson
      • Project coordinators: Tatiana Tatusova, Kim Pruitt

  29. Entrez Genome Project
      http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=genomeprj
      • 1391 projects indexed and searchable in Entrez
      • 1706 in the works
      • 1040 organism-specific overview projects with manual descriptions

      Genome sequencing projects

      Organism      Complete   In progress   Total
      Prokaryotes        254           421     675
      Eukaryotes          19           185     204
      Total              273           606     879

      Comments and suggestions are welcome. Mail to: genomeprj@ncbi.nlm.nih.gov, genomes@ncbi.nlm.nih.gov
