Getting Started: PCB3063 Term Project and NCBI’s OMIM, PubMed and Sequence Resources



Michele R. Tennant, Ph.D., M.L.I.S.

Health Science Center Libraries/

U.F. Genetics Institute

PCB3063, General Genetics

[email protected]

Today’s Session
  • Your term project
  • Resources to help you with your project …
    • HSCL Website, Catalog, etc.
    • NCBI Resources:
      • OMIM – “review articles”
      • PubMed – journal articles
      • Nucleotides/RefSeq – gene sequences
  • Receive your term project topic
Your Term Project
  • Scientific poster on an assigned genetic disorder
  • Should cover all aspects of genetics –
    • Mode of inheritance
    • What gene normally does
    • What protein is encoded by gene
    • Map location and gene structure
    • Types of mutations and what they do to protein
    • Potential for gene therapy
    • Etc. (more info next time)
Your Term Project
  • Four assignments for your project:
    • Part A:
      • Identify disorder/gene in OMIM and MeSH
      • E-learning assessment by start of class Feb. 11
    • Part B:
      • Literature and sequence searches
      • E-learning assessment by start of class Feb. 25 AND paper form and search print-outs in class Feb. 25
    • Part C:
      • Structure, SNP, map and clinical db searches
      • E-learning assessment by start of class Mar. 30 AND paper form and search print-outs in class Mar. 30
    • Poster Presentations – Apr. 15
  • Note – keep Parts A, B, and C (and the corresponding search print-outs) once they are returned to you; you may need to resubmit them with your poster.
NCBI
  • National Center for Biotechnology Information
  • Located on the Bethesda National Institutes of Health campus
  • Part of the National Library of Medicine (NLM), which is part of the NIH
  • Created by Congress in 1988
  • Home of GenBank since 1992
NCBI Mandates
  • Develop automated systems for the storage, retrieval, and analysis of molecular, genetic and biochemical information
  • Develop software for the study of molecular structure and function
NCBI Mandates
  • Facilitate the use of molecular databases and programs by both researchers and clinicians
  • Coordinate international cooperation in gathering molecular, genetic and biochemical data
Effective Searchers ...
  • know the content of the database
    • subjects, type of data, years of coverage, curated vs. non-curated
  • understand the structure of the database
    • record structure, searchable fields, controlled vs non-controlled vocabularies
  • understand searching options and tools
    • thesaurus, limits, AND/OR, etc.
Entrez
  • Search tool on the NCBI website
  • Contains a variety of databases:
    • Nucleotide sequence; Protein sequence; Molecular structure; SNPs; Expression data; Journal literature
    • Each “database” contains “records”
    • Each “record” in database contains “fields”
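To make the record/field idea concrete, here is a minimal Python sketch of a tiny, made-up "database": each record is a dictionary of fields, and a search either scans every field or only the one you name. The records, field names and the `search` function are hypothetical illustrations, not part of Entrez; they only show why restricting a term to one field narrows the results.

```python
# Hypothetical records; each record is a dict of field name -> text.
records = [
    {"uid": "1", "title": "Human presenilin 1 mRNA", "organism": "Homo sapiens"},
    {"uid": "2", "title": "Rat presenilin 1, similar to human PSEN1",
     "organism": "Rattus norvegicus"},
    {"uid": "3", "title": "Mouse Psen1 gene", "organism": "Mus musculus"},
]

def search(records, term, field=None):
    """Return the uids of records whose text contains `term` (case-insensitive).

    If `field` is given, only that field is searched; otherwise all fields are.
    """
    term = term.lower()
    hits = []
    for record in records:
        fields = [field] if field else list(record)
        if any(term in record.get(f, "").lower() for f in fields):
            hits.append(record["uid"])
    return hits
```

Searching "human" in every field also returns the rat record (its title mentions the human gene), while searching "homo sapiens" only in the organism field returns just the human record.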
Entrez Search Options
  • Similar among the various databases
    • Entrez conventions: AND, OR, NOT, *
    • Three ways to search:
      • Basic: just enter your search terms
      • Advanced: more controlled search - uses limits, preview/index, history
      • Complex Boolean: command language with qualifiers in brackets;
        • syntax= term [field] AND term [field] etc.
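The complex Boolean syntax is plain text, so it can be assembled programmatically. The sketch below, using hypothetical helper names `field_term` and `build_query`, joins term/field pairs with a single operator in the "term [field] AND term [field]" pattern from the slide; it does not check whether a field name is valid for a particular database.

```python
def field_term(term, field):
    """Format one term with its field qualifier, e.g. 'psen1 [gene]'."""
    return f"{term} [{field}]"

def build_query(pairs, operator="AND"):
    """Join (term, field) pairs with one Boolean operator: AND, OR or NOT."""
    operator = operator.upper()
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(field_term(t, f) for t, f in pairs)
```

For example, `build_query([("psen1", "gene"), ("human", "organism")])` produces the gene/organism search used later for presenilin 1.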
Entrez Differences
  • Differences among the various databases
    • Different search fields available
    • Different limits available
    • Some controlled, some non-controlled
    • Some archival, some curated
Two Ways to Get to NCBI
  • Directly at -
  • Through HSC Library’s webpage:
    • Click on “Databases” icon
    • Click on “NCBI” icon

Click on “Databases” from HSCL Website

OMIM - Online Mendelian Inheritance in Man
  • Catalog of human genes and genetic disorders
  • 19,854 records (as of 1/27/10)
  • Records are basically “review articles”
  • Records link to PubMed, sequences, structures, etc.
  • Built on Entrez architecture
  • Search tip – look for your disease or gene in “title” field on “Limits” page
We will search for information on “Sipple Syndrome”, but first we limit so that we search only in the title field


Limit so that your terms reside only in the “title”

Type in Sipple Syndrome, then click “Go”

Link to discussion of Sipple Syndrome

Link to OMIM Gene Map

Table of Contents for Sipple Syndrome record

Record was retrieved via these words in title

Link to record for the RET Oncogene

PubMed
  • Journal literature database
  • Pre-clinical and clinical information – best literature database to use for Dr. Miyamoto’s project
  • Approximately 5,200 journals covered; currently over 18,000,000 records
  • Most citations include abstract
  • Can search via keyword, but has been built to take advantage of controlled vocabulary search
Controlled Vocabulary
  • Controlled terms act as “umbrella” to pick up all synonyms, spelling differences (hemoglobin/haemoglobin), singular vs plural, etc.
  • In PubMed, use MeSH Database to find and search controlled MeSH terms (Medical Subject Headings)
  • Once in MeSH Database, can use additional options to enhance search (major heading, subheadings, etc.)
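The "umbrella" effect can be pictured as a lookup from the many words people type to a single heading. The mapping below is a small, hand-made, hypothetical excerpt (the real vocabulary is far larger and should be checked in the MeSH Database); it only illustrates how several spellings and synonyms land on one controlled term.

```python
# Hypothetical excerpt: typed terms mapped to the controlled heading covering them.
ENTRY_TERMS = {
    "breast cancer": "Breast Neoplasms",
    "breast tumor": "Breast Neoplasms",
    "breast tumors": "Breast Neoplasms",
    "hemoglobin": "Hemoglobins",
    "haemoglobin": "Hemoglobins",
}

def to_heading(user_term):
    """Return the controlled heading for a typed term, or None if it is unknown."""
    return ENTRY_TERMS.get(user_term.strip().lower())
```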
MeSH Example
  • Find journal articles on the “immunological aspects of breast cancer and vaccines”, but only those papers in which “immunological aspects of breast cancer” is the main point.
  • Search PubMed
Enter PubMed through our direct link (rather than through NCBI) and you will be able to directly see if the HSCL owns the journal articles you find
The “ufhsclib” indicates that you have entered PubMed correctly, and that the journals the library owns will be apparent

Use the MeSH Database as a dictionary to find the appropriate MeSH term, and then to refine your search

Note that we have left PubMed and are in the MeSH “dictionary”


You typed “breast cancer” into MeSH database

Use “breast neoplasms” rather than breast cancer

Click on the link to refine the search

Topical subheadings help focus search to one or more aspects of the subject

Check here and your topics will be the main point of the articles you find – you won’t get peripheral citations. Not recommended the first time you search a topic – if there are few papers in existence for your topic, you may be left with no articles at all

Send your search to the search box

MeSH automatically builds the search for you – in this example, you are looking for papers in which the immunological aspects of breast cancer are the main point of all the articles you retrieve

Click “Search PubMed”

Once you have sent the search to the search box, and clicked on “search PubMed”, you leave the MeSH Database, and the search is performed in PubMed

Note that this is the search the MeSH Database built for you – it used the MeSH term “breast neoplasms”, attached “immunology” directly to it with a slash, and picked up all the different types of breast neoplasms. MeSH also retrieved only the papers in which these topics were the main points of the articles. You did not need to do any of this yourself – MeSH did it for you once you found the proper MeSH term and clicked on the subheading.
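The search string that MeSH builds follows a regular pattern: the heading, an optional slash and subheading, and a tag that searches the heading either normally ([Mesh]) or only as a major topic ([Majr]). The hypothetical `mesh_term` function below sketches that pattern, assuming these PubMed tags; compare its output with what the MeSH Database actually sends to the search box.

```python
def mesh_term(heading, subheading=None, major=False):
    """Build a PubMed MeSH term such as '"Breast Neoplasms/immunology"[Majr]'.

    The subheading is attached with a slash; major=True uses the [Majr]
    (major topic) tag, otherwise the [Mesh] tag is used.
    """
    term = heading if subheading is None else f"{heading}/{subheading}"
    tag = "Majr" if major else "Mesh"
    return f'"{term}"[{tag}]'
```

For example, `mesh_term("Cancer Vaccines")` gives `"Cancer Vaccines"[Mesh]`, and two such terms can be joined with AND, just as the two history searches are combined in Advanced Search.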

Now we need to complete the second half of the search – vaccines. Pull down the drop-down so you are in MeSH again, and search for the MeSH term. Look through the list to see if there is one that is most appropriate. Since we are looking for vaccines related to breast cancer, perhaps “cancer vaccines” would be useful. Read the “scope note” to be sure.

Scope Note

As in the breast cancer search, you can choose a subheading and limit to articles where this topic is the main point; I’ve chosen not to do so here (if you don’t choose subheadings or main point, remember to click on the check box next to “cancer vaccines”). Send to search box; click “Search PubMed”

You’ve now found articles on cancer vaccines, but you need to combine the breast cancer and cancer vaccines concepts

Boolean Operators
  • Search statements may be combined using AND, OR, NOT
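AND, OR and NOT behave like set operations on the records each search returns. The sketch below uses made-up PubMed IDs (not real search results) to show which records survive each combination.

```python
# Hypothetical PubMed IDs returned by two searches (illustrative values only).
search_1 = {101, 102, 103, 104}   # e.g. breast neoplasms/immunology
search_2 = {103, 104, 105}        # e.g. cancer vaccines

both = search_1 & search_2        # AND: records found by both searches
either = search_1 | search_2      # OR: records found by at least one search
only_first = search_1 - search_2  # NOT: records in search 1 but not in search 2
```

AND always narrows (the result is never larger than either search), OR always broadens, and NOT removes records, which is why NOT should be used with care.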




To combine searches, choose “Advanced Search”

The Advanced Search screen displays your PubMed history; from here you can combine your two searches using the appropriate Boolean operator

For Part B, print the PubMed history, which shows your searches.

You have now found papers in which the immunology of breast cancer is the main point of the article, and those papers are also about cancer vaccines
MeSH etc.
  • MeSH Database:
    • Found appropriate search terms
    • Automatically exploded “breast neoplasms”, so narrower terms (“breast neoplasms, male”, “carcinoma, ductal, breast”, etc.) were ORed together
    • Allowed the addition of subheadings (immunology) to narrow to a particular aspect
    • Allowed narrowing to “main point”
  • Use History to combine (AND)
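Exploding a heading means gathering the heading and everything beneath it in the MeSH tree, then ORing them together. The tiny tree below is a hypothetical fragment containing only the two narrower terms named above; `explode` collects the terms, and `exploded_query` writes them out with the [mh:noexp] tag so each term is searched on its own. PubMed performs this expansion for you when you search a MeSH term, so the sketch only spells out what happens behind the scenes.

```python
# Hypothetical fragment of the MeSH tree: heading -> narrower headings.
NARROWER = {
    "Breast Neoplasms": ["Breast Neoplasms, Male", "Carcinoma, Ductal, Breast"],
    "Breast Neoplasms, Male": [],
    "Carcinoma, Ductal, Breast": [],
}

def explode(heading, tree=NARROWER):
    """Return the heading followed by every heading below it (depth-first)."""
    terms = [heading]
    for child in tree.get(heading, []):
        terms.extend(explode(child, tree))
    return terms

def exploded_query(heading, tree=NARROWER):
    """OR the exploded headings together, each searched without further explosion."""
    return " OR ".join(f'"{t}"[mh:noexp]' for t in explode(heading, tree))
```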
MeSH Caveats
  • Performing a MeSH search is usually more precise and exhaustive than a keyword search; however:
    • The most recent papers have not yet been assigned MeSH terms, so a MeSH search misses them – therefore you should also complete a keyword search of the “in process” records
    • Very new concepts/scientific terms may not yet be represented by MeSH
    • Very specific or rare concepts may never be represented by MeSH
  • So sometimes you will need to do a keyword search as well
In Process
  • In our “breast cancer, immunology, cancer vaccine” example, perform the following keyword search, only in the newest records (in process)
    • ((vaccin*) AND (breast cancer* OR breast neoplasm* OR breast tumor*)) AND in process [sb]
    • Try as many synonyms as possible
    • [sb] must be included to tell PubMed to search only the “in process” subset of the database
    • * truncates to word root
  • This search picks up the current articles that do not yet have MeSH terms
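Truncation with * matches any word that begins with the root you typed. The hypothetical `matches_truncated` function below imitates that rule for single words only; PubMed's own truncation may apply additional restrictions that are not modeled here.

```python
def matches_truncated(pattern, word):
    """True if `word` matches `pattern`; a trailing * means 'any ending'.

    Without a trailing *, the word must match the pattern exactly.
    Matching is case-insensitive.
    """
    pattern, word = pattern.lower(), word.lower()
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern
```

With `vaccin*`, the words vaccine, vaccines and vaccination all match, which is why one truncated term can stand in for several word forms.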
Link Out to E-journals
  • Remember, if you entered PubMed directly from the HSCL’s icon, you can see if the HSCL owns the journal articles you found
  • Choose the “abstract” or “citation” displays from the pulldown menu
    • Brown and blue icons tell you whether the HSCL owns that journal issue electronically or in print
  • Will NOT tell you what is available at Marston Science Library
What if PubMed does not indicate the article is owned at UF?
  • Use the “Catalog” to see if the paper is available in print at the HSCL, Marston Science Library or elsewhere on campus
  • The catalog may also be used to help locate books, government documents, videotapes, etc – items that are not indexed in PubMed

Click on “Catalog” from HSCL Website

Entrez Nucleotides (GenBank)
  • Database of nucleotide sequences (ATGC)
  • Actually contains data from several databases - GenBank, EMBL, DDBJ, RefSeq
  • Hard to search because many submitting scientists send in redundant and poorly annotated information
Nucleotide Data Domain
  • As of December 15, 2009
    • Over 110,118,557,163 bases
    • Over 112,910,950 sequence records
    • Over 200,000 species represented
    • Some complete genomes and chromosomes
Organisms Represented
  • Homo sapiens
  • Many model organisms, including:
    • Mus musculus
    • Caenorhabditis elegans
    • Oryza sativa
    • Drosophila melanogaster
    • Arabidopsis thaliana
  • Non-model organisms as well (trout, etc.)
International Collaboration
  • Contributors:
    • GenBank
    • European Molecular Biology Laboratory (EMBL)
    • DNA Databank of Japan (DDBJ)
  • Daily exchange of data among these groups
GenBank Sample Record
  • Before searching, we will look at the GenBank sample record
  • Retrieve the sample record from the main page – click on “DNA & RNA”, then “GenBank”, then choose the “record” link.
  • Note that the “Features” field provides useful biological information, and may be searched
Click any link in sample record to access definition of field and search tips

“Definition” field acts as record title – search [titl]

Unique identifier; assigned by NCBI; required by journals/grants

Link to PubMed citation/abstract

The “Features” field provides the most biological information; search as [fkey]

Numbers indicate location on the nucleotide sequence
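Feature locations such as 10..21 count bases from 1 and include both end points, whereas Python string slices count from 0 and exclude the end. The sketch below, with a hypothetical `feature_sequence` helper and a made-up sequence, shows the conversion.

```python
def feature_sequence(sequence, start, end):
    """Return bases start..end of `sequence`, counting from 1 and inclusive.

    This follows the 1-based, inclusive convention of feature locations
    such as 10..21; Python slices are 0-based and exclude the end index.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("location lies outside the sequence")
    return sequence[start - 1:end]
```

For the made-up sequence `GGATGAAATAGCC`, the location 3..11 corresponds to the slice `[2:11]` and yields the nine bases `ATGAAATAG`.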

Searching “Nucleotides”
  • Database is difficult to search:
    • Redundant records
    • Archival - poor or missing annotation
  • Best searches are done using commands; learning them all requires a separate class
  • Perform a basic search in Entrez Nucleotides for “human presenilin 1”
  • Why did we get so many non-human records?
  • Check “Details” to see how search was parsed
Choose “nucleotide” from dropdown, then click “search”

Search for HUMAN presenilin 1

But end up with rat, mouse, etc.

Specify Gene and Organism
  • One trick – search your topic as a “gene” (psen1) and choose your taxon as “organism” (human)
    • Note – you may miss relevant sequences, but should not pick up irrelevant sequences
  • Easiest way to perform this search:
    • searching with commands: psen1 [gene] AND human [organism]
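The same command-style query can also be sent to NCBI outside the web page, through the E-utilities ESearch service. The sketch below only builds the request URL (it does not send it); the `esearch_url` helper is hypothetical, while `db`, `term` and `retmax` are ESearch parameters. Check NCBI's E-utilities documentation and usage guidelines before sending requests in bulk.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(db, term, retmax=20):
    """Build (but do not send) an ESearch URL for an Entrez query on `db`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"

url = esearch_url("nucleotide", "psen1 [gene] AND human [organism]")
```

`urlencode` takes care of escaping the spaces and square brackets. Appending ` AND srcdb_refseq [prop]` to the term, as described for RefSeq records, would restrict the same search to RefSeq sequences.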
Best Sequences
  • The subset “RefSeq” contains the “best” sequence:
    • Non-redundant
    • Well understood and annotated
    • Checked for sequencing error
  • Not all genes or organisms have RefSeq records available
RefSeq Records
  • Easy to distinguish –
    • Two letters, underscore, then numbers (NM_123456)
  • Easy to search for –
    • Use “only from” on “limits” page
    • Complex Boolean – AND srcdb_refseq [prop]
    • Click the RefSeq tab (easiest way)
  • Example - Find only RefSeq records for human presenilin 1
  • Compare information in RefSeq record to that found in previous nucleotide records
Click on the RefSeq tab to retrieve only the “best” sequences (highly annotated, complete, nonredundant)

The typical RefSeq accession number format: 2 letters, an underscore, and then numbers
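Because RefSeq accessions follow a fixed pattern, a short regular expression can flag them. The pattern below encodes only the format described on this slide (two capital letters, an underscore, digits), plus an optional version suffix such as .1; it is a quick heuristic, not a complete validator, and some RefSeq accession types may not fit it.

```python
import re

# Two capital letters, an underscore, digits, and an optional ".version".
REFSEQ_ACCESSION = re.compile(r"[A-Z]{2}_\d+(?:\.\d+)?")

def looks_like_refseq(accession):
    """True if `accession` has the RefSeq-style XX_123456 format."""
    return REFSEQ_ACCESSION.fullmatch(accession.strip()) is not None
```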

Viewing Formats
  • The “Default” view is the standard GenBank record
  • Researchers often use the “FASTA” format for analysis
  • Change the record format at the “Display” pull-down menu
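FASTA is a simple text format: a header line starting with >, followed by the sequence itself, usually wrapped over several lines. The hypothetical helpers below write and read that format; the 70-character line width is an assumed default, not a requirement of the format.

```python
def to_fasta(header, sequence, width=70):
    """Format one sequence as FASTA: a '>' header line, then the sequence
    wrapped to `width` characters per line."""
    lines = [f">{header}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)

def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Round-tripping a sequence through `to_fasta` and `parse_fasta` returns the original header and the unwrapped sequence.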