
Lecture 1: Introduction to Entrez

NCBI PowerScripting. Lecture 1: Introduction to Entrez. October 16-19, 2007. The Entrez Query System at NCBI. Entrez Help Document. Entrez Functions. Search one or all of 31 databases. Generate brief “document summaries” for a list of records.

bettyashley

Presentation Transcript


  1. NCBI PowerScripting Lecture 1: Introduction to Entrez October 16-19, 2007

  2. The Entrez Query System at NCBI

  3. The Entrez Query System at NCBI

  4. Entrez Help Document

  5. Entrez Functions • Search one or all of 31 databases. • Generate brief “document summaries” for a list of records. • Link from one list of records to another. • Perform boolean operations on lists of records. • Format records for display and download.

  6. Links Between and Within Nodes [diagram: nodes for PubMed abstracts, Taxonomy, Genomes, Nucleotide sequences, Protein sequences, and 3-D Structures, joined by computational links labeled Word weight, Phylogeny, BLAST, and VAST]

  7. Entrez Transactions • Each record in an Entrez database is assigned an integer called a UID, or “unique identifier”. • Entrez transactions are performed on lists of UIDs. • Transactions include boolean operations and the tracking of links within and between database records.
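
As a rough illustration of the "transactions on lists of UIDs" idea, the boolean operations can be pictured as set operations on integer lists. This is only a sketch of the concept, not how Entrez implements it; the UID values and the helper names `uid_and`, `uid_or`, and `uid_not` are invented for the example.

```python
# Toy UID lists; the integers are made up for illustration only.
hits_a = [101, 102, 103, 104]
hits_b = [103, 104, 105]


def uid_and(a, b):
    """UIDs present in both lists (boolean AND), sorted ascending."""
    return sorted(set(a) & set(b))


def uid_or(a, b):
    """UIDs present in either list (boolean OR), sorted ascending."""
    return sorted(set(a) | set(b))


def uid_not(a, b):
    """UIDs in the first list but not the second (boolean NOT), sorted ascending."""
    return sorted(set(a) - set(b))


print(uid_and(hits_a, hits_b))  # [103, 104]
print(uid_or(hits_a, hits_b))   # [101, 102, 103, 104, 105]
print(uid_not(hits_a, hits_b))  # [101, 102]
```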

  8. Entrez Database Queries • Entrez supports text searches with field restrictions, boolean operators (sometimes implicit), and term grouping • Field restrictions vary among the databases • Untagged terms are term-mapped, that is, translated into database-specific fielded queries • Explicitly fielded searches are not term-mapped • Quoted phrases are searched as a unit
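
The query features listed above (field restrictions, boolean operators, grouping, quoted phrases) can be sketched as plain string building. The helpers `fielded` and `combine` are hypothetical names introduced here, the gene symbols are arbitrary example terms, and the base URL is the one shown later in this deck; nothing is sent over the network.

```python
from urllib.parse import urlencode

# Base URL as it appears on slide 28 of this deck.
ESEARCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded(term, field, phrase=False):
    """Attach a field restriction; optionally quote the term so it is searched as a unit."""
    if phrase:
        term = f'"{term}"'
    return f"{term}[{field}]"


def combine(op, *terms):
    """Group terms in parentheses, joined by an explicit boolean operator."""
    return "(" + f" {op} ".join(terms) + ")"


# Example terms are illustrative only.
query = combine(
    "AND",
    fielded("Mus musculus", "organism", phrase=True),
    combine("OR", fielded("p53", "gene"), fielded("tp53", "gene")),
)
print(query)  # ("Mus musculus"[organism] AND (p53[gene] OR tp53[gene]))

# urlencode percent-encodes quotes, brackets, and spaces for use in a URL.
url = ESEARCH + "?" + urlencode({"db": "nucleotide", "term": query})
print(url)
```

Because every term here carries an explicit field tag, a query built this way would bypass term mapping, per the slide above.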

  9. Term Mapping (PubMed) • Untagged terms that are entered in the search box are matched (in this order) against: - a MeSH (Medical Subject Headings) translation table - a Journals translation table - the Full Author translation table - an Author index - the Full Investigator (Collaborator) translation table - an Investigator (Collaborator) index

  10. Term: cold • PubMed: "chronic obstructive pulmonary disease"[Text Word] OR "pulmonary disease, chronic obstructive"[MeSH Terms] OR ("common cold"[TIAB] NOT Medline[SB]) OR "common cold"[MeSH Terms] OR "cold"[MeSH Terms] OR cold[Text Word] • PMC: "pulmonary disease, chronic obstructive"[MeSH Terms] OR "common cold"[MeSH Terms] OR "cold"[MeSH Terms] OR cold[Text Word] • Nucleotide: cold[All Fields] • Taxonomy: cold[All Names]

  11. Term: mouse • PubMed: ("mice"[TIAB] NOT Medline[SB]) OR "mice"[MeSH Terms] OR mouse[Text Word] • PMC: "mice"[MeSH Terms] OR mouse[Text Word] • Nucleotide: "Mus musculus"[Organism] OR mouse[All Fields] • Taxonomy: mouse[All Names] • Genome: "Mus musculus"[Organism] OR mouse[All Fields]

  12. Entrez Help Document

  13. Entrez Help Document

  14. Viewing Indexed Terms on the Web: Preview-Index Tab

  15. Patterns are Recognized • PubMed, PMC, Nucleotide, Protein, Structure and others: - miller baker: miller[All Fields] AND baker[All Fields] - miller j baker m: miller j[Author] AND baker m[Author] • All Databases: - AF123456, P12243, 555: direct retrieval of record

  16. Search History • Separate search history is maintained for each database. • Previous searches can be recalled and combined using a query key and a cookie, called a “WebEnv”. • Available on the Web under the 'History Tab'
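
A minimal sketch of how a script might pick up the query key and WebEnv from an esearch result and reuse them in a later call. The XML below is a hand-written sample in the shape of an esearch response, with a placeholder WebEnv value; the helper name `history_handle` is an assumption, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL as it appears on slide 28 of this deck.
EUTILS = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Hand-written sample response; all values are illustrative placeholders.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>161815</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
</eSearchResult>"""


def history_handle(xml_text):
    """Return (query_key, webenv, count) parsed from esearch XML."""
    root = ET.fromstring(xml_text)
    return (
        root.findtext("QueryKey"),
        root.findtext("WebEnv"),
        int(root.findtext("Count")),
    )


query_key, webenv, count = history_handle(SAMPLE_ESEARCH_XML)

# Recall the stored search in a follow-up esummary request.
summary_url = EUTILS + "esummary.fcgi?" + urlencode(
    {"db": "gene", "query_key": query_key, "WebEnv": webenv}
)
print(count, summary_url)
```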

  17. DocSums • Brief summaries of database records are generated quickly on frontend servers. • Full records are retrieved from backend machines.

  18. Overview of Key Entrez Databases

  19. The Entrez Bubblegram: via einfo.fcgi

  20. PubMed: 17,454,100 Records; biomedical literature citations and abstracts • Key Field Restrictions • [author] • [title] • [pdat] – publication date • [mesh] – Medical Subject Headings • [journal] • [volume]

  21. CoreNucleotide: 41,888,768 Records; sequence database (GenBank) • Key Field Restrictions • [organism] • [accession] • [author] • [title] • [sequence length] • [properties] • [gene]

  22. Protein: 18,192,257 Records; protein sequence records • Key Field Restrictions • [organism] • [title] • [author] • [molecular weight] • [sequence length] • [gene] • [ecno] – Enzyme Commission number

  23. Gene: 3,723,441 Records; gene database with locus-centered records • Key Field Restrictions • [organism] • [gene] – official symbol of gene locus • [chromosome] • [title] • [accession]

  24. Eutilities • A set of eight server-side programs. • Support a uniform URL syntax. • Translate a standard set of URL-encoded input parameters for the array of programs comprising the Entrez system.

  25. Entrez Functions and EUtils • Searches: esearch.fcgi • DocSums: esummary.fcgi • Links: elink.fcgi • Uploads: epost.fcgi • Downloads: efetch.fcgi • Global Query: egquery.fcgi • Spelling: espell.fcgi • Information: einfo.fcgi
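
The function-to-program mapping on this slide, together with the uniform URL syntax mentioned on the previous one, could be captured in a small lookup table. The dictionary keys and the helper `eutils_url` are names chosen for this sketch; the program names and base URL are the ones given in the deck.

```python
from urllib.parse import urlencode

# Base URL as it appears on slide 28 of this deck.
EUTILS = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Entrez function -> EUtils program, as listed on the slide.
EUTILS_PROGRAMS = {
    "search": "esearch.fcgi",
    "docsum": "esummary.fcgi",
    "link": "elink.fcgi",
    "upload": "epost.fcgi",
    "download": "efetch.fcgi",
    "global_query": "egquery.fcgi",
    "spelling": "espell.fcgi",
    "information": "einfo.fcgi",
}


def eutils_url(function, **params):
    """Build a request URL for one of the eight programs from keyword parameters."""
    return EUTILS + EUTILS_PROGRAMS[function] + "?" + urlencode(params)


print(eutils_url("information", db="pubmed"))
# http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed
```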

  26. A Docsum via esummary.fcgi and via the Web

  27. A Simple Eutilities Pipeline

  28. An Esearch Followed by Multiple Rounds of Efetch
      http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=gene&term=mammalia[orgn]
      Elapsed time: 0 seconds 0%, 0 records of 161815 retrieved.
      Tue Jan 25 20:46:32 EST 2005
      http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=0&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
      Elapsed time: 40 seconds 0.3%, 500 records of 161815 retrieved.
      Tue Jan 25 20:47:09 EST 2005
      http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
      Elapsed time: 79 seconds 0.61%, 1000 records of 161815 retrieved.
      Tue Jan 25 20:47:48 EST 2005
      http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=1000&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
      Elapsed time: 118 seconds 0.92%, 1500 records of 161815 retrieved.
      Tue Jan 25 20:48:27 EST 2005
      http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=1500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
      Elapsed time: 158 seconds 1.23%, 2000 records of 161815 retrieved.
      Tue Jan 25 20:49:07 EST 2005
      http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=2000&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
      Elapsed time: 204 seconds 1.54%, 2500 records of 161815 retrieved.
      Tue Jan 25 20:49:53 EST 2005
      http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=2500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
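
The pattern in this log, one esearch with `usehistory=y` followed by efetch calls that advance `retstart` in steps of `retmax`, can be sketched as a loop that only builds the URLs. The function name `efetch_batches` and the placeholder WebEnv are assumptions made for the example; the parameter names and values are taken from the log, and a real script would use the WebEnv returned by its own esearch call.

```python
from urllib.parse import urlencode

# Base URL as it appears in the log above.
EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_batches(total, webenv, query_key, db="gene", retmax=500):
    """Return (retstart, url) pairs that page through `total` records in batches of `retmax`."""
    batches = []
    for retstart in range(0, total, retmax):
        params = {
            "tool": "datahog",      # tool name used in the log
            "db": db,
            "retmax": retmax,
            "retstart": retstart,
            "rettype": "native",
            "WebEnv": webenv,
            "query_key": query_key,
            "retmode": "xml",
        }
        batches.append((retstart, EFETCH + "?" + urlencode(params)))
    return batches


# Placeholder WebEnv; 161815 is the record count reported in the log.
batches = efetch_batches(161815, "EXAMPLE_WEBENV", 1)
print(len(batches))    # 324 calls are needed at 500 records each
print(batches[1][1])   # second call, retstart=500
```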

  29. A Download of 161815 Mammalian Entrez Gene Records [chart; axes: Seconds, Efetch calls]
