Retrieving Information: Using Entrez

Presentation Transcript
Retrieving information: how it works:
  • Servers have the records you want
  • You need to understand the data they have, and how it is organized
  • There are often many ways to get to an answer.
  • The route there is not always obvious, so think about alternatives and watch for traps.
  • Use some query language – each system has its own.
  • Retrieve data in a specified format.
  • Save it in a way that will be useful to you.
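The query / retrieve / save steps above can be sketched against the NCBI E-utilities, the programmatic interface to Entrez. This is a minimal sketch that only builds the request URLs and does not contact the server; the helper names `esearch_url` and `efetch_url` are hypothetical, and the search term and accession are simply borrowed from later slides.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term, retmax=20):
    """Build an ESearch URL: ask which record IDs match a query (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"

def efetch_url(db, ids, rettype, retmode="text"):
    """Build an EFetch URL: retrieve the listed records in a chosen format (hypothetical helper)."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"

# Step 1: express the question in the system's query language.
search = esearch_url("pubmed", "boguski m[au]")
# Step 2: ask for the record in a specified format (FASTA here).
fetch = efetch_url("nucleotide", ["U00001"], rettype="fasta")

print(search)
print(fetch)
```

To complete the last step, the response from the second URL could be written to a local file, for example a `.fasta` file, so it can be reused later.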
What you may be looking for:
  • You ran a BLAST search and need more information about some of the proteins it found similarities to.
  • You heard about a disease gene that was recently discovered, and you want to know more about it.
  • You want to build a dataset for local BLAST searches.
  • A colleague wants you to do an alignment of all sequences from a given protein family.
What you are looking for:
  • PubMed paper from author X
  • Sequence from gene X in organism Y
  • All information about organelle W in model organism Y
  • All information about disease X in human
  • Orthologs of that disease gene in other model organisms
Central Dogma: NCBI version



Write a paper about it


Entrez: Pathway to Discovery

[Figure labels: MEDLINE abstracts; Protein sequences; Nucleotide sequences; Term frequency statistics; Literature citations in sequence databases; Nucleotide sequence similarity; Amino acid sequence similarity; Coding region features; Related Articles]

Type in your last name and find a paper from one of your teammates.

From Fig. 1 of "Entrez search and retrieval system", Jim Ostell, Chapter 14, The NCBI Handbook.


A query
  • A single word <free text>: too many hits
    • Add more words (the Boolean ‘AND’ is the default)
    • Limit the query to a specified field
    • Limit the query in time
    • Combine earlier queries with Boolean operators
      • #1 AND #2
      • #3 NOT #5
      • #7 OR #8
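The narrowing steps above can be sketched as plain string building. This is an illustrative sketch, not an official client: the helper names are hypothetical, `[au]` is the author tag used elsewhere in these slides, `[dp]` is assumed here as the PubMed publication-date tag, and the year range is made up for the example.

```python
def field(term, tag):
    """Limit a term to one field, e.g. an author search with the 'au' tag."""
    return f"{term}[{tag}]"

def date_range(start_year, end_year):
    """Limit a query in time with a publication-date range (assumes the 'dp' tag)."""
    return f"{start_year}:{end_year}[dp]"

def combine(op, *parts):
    """Join sub-queries with AND, OR, or NOT and parenthesize the result."""
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {op}")
    return "(" + f" {op} ".join(parts) + ")"

author = field("boguski m", "au")
years = date_range(1995, 2000)   # hypothetical range, for illustration only
query = combine("AND", author, years)
print(query)  # (boguski m[au] AND 1995:2000[dp])
```

Combining numbered history entries such as `#1 AND #2` works the same way on the search page; the sketch only shows how the equivalent text query is assembled.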

[Figure labels: No abstract; With abstract; Full Text on-line; Full Text in PubMed Central]


Example author searches, with hit counts:
  • boguski m [au]: 99
  • boguski ms [au]: 80

Other types of links in Entrez
  • The next slides explore other kinds of things linked to Entrez records.
  • RefSeq is the NCBI-curated set of “reference sequences” for all ‘worked’ genomes.
  • Historically, these were referred to as “GenBank-Gold”.
  • RefSeq records are genomic, mRNA, or protein sequences.
  • Not all sequences are in RefSeq.
  • All RefSeq sequences are assembled or taken from records in GenBank.
Some of the features of the RefSeq:
  • non-redundancy
  • explicitly linked nucleotide and protein sequences
  • updates to reflect current knowledge of sequence data and biology
  • data validation and format consistency
  • distinct accession series
  • ongoing curation by NCBI staff and collaborators, with review status indicated on each record
Accession number space
  • GenBank:
    • 1+5: one letter + five digits (L12345, U00001)
    • 2+6: two letters + six digits (AF000001, AC000003)
    • 4+2+6: four letters + two digits + six digits (WGS)
      • All have accession.version
  • Protein:
    • 1+5 (SwissProt/UniProt)
    • 3+5 (GenPept)
      • All have accession.version
  • RefSeq:
    • N*_12345
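The letter+digit shapes listed above can be told apart with a few regular expressions. This is a rough sketch under stated assumptions: the patterns follow only the counts given on the slide, they do not check NCBI's real prefix lists, the RefSeq pattern assumes two letters before the underscore, and `classify` is a hypothetical helper name.

```python
import re

# Patterns mirror the shapes on the slide; they are deliberately loose.
PATTERNS = [
    ("RefSeq", re.compile(r"^[A-Z]{2}_\d+$")),
    ("WGS (4+2+6)", re.compile(r"^[A-Z]{4}\d{2}\d{6}$")),
    ("GenPept protein (3+5)", re.compile(r"^[A-Z]{3}\d{5}$")),
    ("GenBank nucleotide (2+6)", re.compile(r"^[A-Z]{2}\d{6}$")),
    # The 1+5 shape is shared by GenBank nucleotide and SwissProt/UniProt,
    # so the shape alone cannot say which database it belongs to.
    ("GenBank nucleotide or SwissProt/UniProt (1+5)", re.compile(r"^[A-Z]\d{5}$")),
]

def classify(accession):
    """Return (label, version) for an accession or accession.version string."""
    base, _, version = accession.strip().upper().partition(".")
    for label, pattern in PATTERNS:
        if pattern.match(base):
            return label, (int(version) if version.isdigit() else None)
    return "unknown", None

for acc in ["U00001", "AF000001.1", "NM_12345"]:
    print(acc, "->", classify(acc))
```

Note that the shape alone is ambiguous for the 1+5 case, which is one of the traps the earlier slides warn about: the database being searched still matters.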

Download all the data

Entrez and RefSeq