
Getting Started: PCB3063 Term Project and NCBI’s OMIM, PubMed and Sequence Resources


Michele R. Tennant, Ph.D., M.L.I.S.

Health Science Center Libraries/

U.F. Genetics Institute

PCB3063, General Genetics

[email protected]


Today’s Session

  • Your term project

  • Resources to help you with your project …

    • HSCL Website, Catalog, etc.

    • NCBI Resources:

      • OMIM – “review articles”

      • PubMed – journal articles

      • Nucleotides/RefSeq – gene sequences

  • Receive your term project topic


Your Term Project

  • Scientific poster on an assigned genetic disorder

  • Should cover all aspects of genetics –

    • Mode of inheritance

    • What gene normally does

    • What protein is encoded by gene

    • Map location and gene structure

    • Types of mutations and what they do to protein

    • Potential for gene therapy

    • Etc. (more info next time)


Your Term Project

  • Four assignments for your project:

    • Part A:

      • Identify disorder/gene in OMIM and MeSH

      • E-learning assessment by start of class Feb. 11

    • Part B:

      • Literature and sequence searches

      • E-learning assessment by start of class Feb. 25 AND paper form and search print-outs in class Feb. 25

    • Part C:

      • Structure, SNP, map and clinical db searches

      • E-learning assessment by start of class Mar. 30 AND paper form and search print-outs in class Mar. 30

    • Poster Presentations – Apr. 15

  • Note – keep Parts A, B, and C (and the corresponding search print-outs) once they are returned to you; you may need to resubmit them with your poster.


NCBI

  • National Center for Biotechnology Information

  • Located on the National Institutes of Health (NIH) campus in Bethesda, Maryland

  • Part of the National Library of Medicine (NLM), which is part of the NIH

  • Created by Congress in 1988

  • Home of GenBank since 1992


NCBI Mandates

  • Develop automated systems for the storage, retrieval, and analysis of molecular, genetic and biochemical information

  • Develop software for the study of molecular structure and function


NCBI Mandates

  • Facilitate the use of molecular databases and programs by both researchers and clinicians

  • Coordinate international cooperation in gathering molecular, genetic and biochemical data


Effective Searchers ...

  • know the content of the database

    • subjects, type of data, years of coverage, curated vs. non-curated

  • understand the structure of the database

    • record structure, searchable fields, controlled vs non-controlled vocabularies

  • understand searching options and tools

    • thesaurus, limits, AND/OR, etc.


Entrez

  • Search tool on the NCBI website

  • Contains a variety of databases:

    • Nucleotide sequence; Protein sequence; Molecular structure; SNPs; Expression data; Journal literature

    • Each “database” contains “records”

    • Each “record” in database contains “fields”
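
The database / record / field vocabulary can be pictured with a small Python sketch. Everything below is a toy model: the accessions, titles and field names are invented placeholders, not real Entrez data, and the search function is only an illustration of what "searching a field" means.

```python
# Toy model of the Entrez vocabulary: a database holds records,
# and each record holds named fields. All values are invented.
toy_database = {
    "name": "toy_nucleotide",
    "records": [
        {"accession": "TOY0001", "title": "Example gene 1 mRNA", "organism": "Homo sapiens"},
        {"accession": "TOY0002", "title": "Example gene 2 mRNA", "organism": "Mus musculus"},
    ],
}


def search_field(database, field, text):
    """Return records whose given field contains the text (case-insensitive)."""
    text = text.lower()
    return [r for r in database["records"] if text in r.get(field, "").lower()]


human_records = search_field(toy_database, "organism", "homo sapiens")
print([r["accession"] for r in human_records])
```

Restricting a search to one field is what makes it more precise than matching the words anywhere in the record.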


Entrez Search Options

  • Similar among the various databases

    • Entrez conventions: AND, OR, NOT, *

    • Three ways to search:

      • Basic: just enter your search terms

      • Advanced: more controlled search - uses limits, preview/index, history

      • Complex Boolean: command language with qualifiers in brackets;

        • syntax= term [field] AND term [field] etc.
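
The "term [field] AND term [field]" pattern can be assembled mechanically. The helper names below (fielded, combine) are hypothetical and only format text; they do not contact NCBI, and the placeholder terms and fields are not real qualifiers.

```python
def fielded(term, field):
    """Format one search term with a field qualifier in square brackets."""
    return f"{term} [{field}]"


def combine(operator, *parts):
    """Join already-formatted parts with a Boolean operator."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR, or NOT")
    return f" {operator} ".join(parts)


# Placeholder terms and fields, mirroring the syntax line on the slide.
query = combine("AND", fielded("term1", "field1"), fielded("term2", "field2"))
print(query)
```

The resulting string is what you would type into the search box when using the complex Boolean option.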


Entrez Differences

  • Differences among the various databases

    • Different search fields available

    • Different limits available

    • Some controlled, some non-controlled

    • Some archival, some curated


Two Ways to Get to NCBI

  • Directly at - http://www.ncbi.nlm.nih.gov/

  • Through HSC Library’s webpage:

    • http://www.library.health.ufl.edu/

    • Click on “Databases” icon

    • Click on “NCBI” icon


www.library.health.ufl.edu

Click on “Databases” from HSCL Website


OMIM - Online Mendelian Inheritance in Man

  • Catalog of human genes and genetic disorders

  • 19,854 records (as of 1/27/10)

  • Records are basically “review articles”

  • Records link to PubMed, sequences, structures, etc.

  • Built on Entrez architecture

  • Search tip – look for your disease or gene in “title” field on “Limits” page


Choose OMIM from the dropdown and then click on “search” to reach the OMIM page


We will search for information on “Sipple Syndrome”, but first we limit so that we search only in the title field

Limit so that your terms reside only in the “title”


Type in Sipple Syndrome, then click “Go”

Link to discussion of Sipple Syndrome

Link to OMIM Gene Map


Table of Contents for Sipple Syndrome record

Record was retrieved via these words in title

Link to record for the RET Oncogene


Table of Contents for RET gene record


PubMed

  • Journal literature database

  • Pre-clinical and clinical information – best literature database to use for Dr. Miyamoto’s project

  • Approximately 5,200 journals covered; currently over 18,000,000 records

  • Most citations include abstract

  • Can search via keyword, but has been built to take advantage of controlled vocabulary search


Controlled vs Non-controlled Vocabularies

  • “Old People” Example


Controlled Vocabulary

  • Controlled terms act as “umbrella” to pick up all synonyms, spelling differences (hemoglobin/haemoglobin), singular vs plural, etc.

  • In PubMed, use MeSH Database to find and search controlled MeSH terms (Medical Subject Headings)

  • Once in MeSH Database, can use additional options to enhance search (major heading, subheadings, etc.)
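
The "umbrella" idea can be sketched as a lookup from many entry words to one controlled term. The thesaurus below is hand-made for illustration and is not an extract of MeSH; the function name controlled_term is hypothetical.

```python
# Toy thesaurus: one controlled term covers several spellings and plurals.
# This mapping is a hand-made illustration, not real MeSH data.
THESAURUS = {
    "hemoglobins": {"hemoglobin", "hemoglobins", "haemoglobin", "haemoglobins"},
}


def controlled_term(word):
    """Return the controlled term covering this word, or None if there is none."""
    word = word.lower()
    for term, entry_words in THESAURUS.items():
        if word in entry_words:
            return term
    return None


print(controlled_term("Haemoglobin"))
print(controlled_term("hemoglobin"))
```

Both spellings resolve to the same controlled term, so one search on that term retrieves papers written with either spelling.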


MeSH Example

  • Find journal articles on the “immunological aspects of breast cancer and vaccines”, but only those in which “immunological aspects of breast cancer” is the main point of the article.

  • Search PubMed


Enter PubMed through our direct link (rather than through NCBI) and you will be able to directly see if the HSCL owns the journal articles you find


The “ufhsclib” indicates that you have entered PubMed correctly, and that the journals the library owns will be apparent

Use the MeSH Database as a dictionary to find the appropriate MeSH term, and then to refine your search


Note that we have left PubMed and are in the MeSH “dictionary”

AIDS

You typed “breast cancer” into MeSH database

Use “breast neoplasms” rather than breast cancer

Click on the link to refine the search


Topical subheadings help focus search to one or more aspects of the subject

Check here and your topics will be the main point of the articles you find – you won’t get peripheral citations. Not recommended the first time you search a topic – if there are few papers in existence for your topic, you may be left with no articles at all


Note that the term “Breast Neoplasms” will pick up all the more specific types of breast cancer


Send your search to the search box

MeSH automatically builds the search for you – in this example, you are looking for papers in which the immunological aspects of breast cancer are the main point of all the articles you retrieve

Click “Search PubMed”


Once you have sent the search to the search box, and clicked on “search PubMed”, you leave the MeSH Database, and the search is performed in PubMed

Note that this is the search the MeSH Database built for you. It used the MeSH term “breast neoplasms”, attached “immunology” directly to that term with a slash, and picked up all the different types of breast neoplasms. It also retrieved only the papers in which these topics are the main point of the article. You did not need to do any of this yourself: MeSH did it for you once you found the proper MeSH term and clicked on the subheading.


Now we need to complete the second half of the search – vaccines. Pull down the drop-down so you are in MeSH again, and search for the MeSH term. Look through the list to see if there is one that is most appropriate. Since we are looking for vaccines related to breast cancer, perhaps “cancer vaccines” would be useful. Read the “scope note” to be sure.

Scope Note


As in the breast cancer search, you can choose a subheading and limit to articles where this topic is the main point; I’ve chosen not to do so here. (If you don’t choose subheadings or main point, remember to click on the check box next to “cancer vaccines”.) Send to search box; click “search PubMed”

You’ve now found articles on cancer vaccines, but you need to combine the breast cancer and cancer vaccines concepts


Boolean Operators

  • Search statements may be combined using AND, OR, NOT

AND

OR

NOT
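
The three operators behave like set operations on the records each search retrieves. In this sketch the record IDs are invented; Python's set operators stand in for AND, OR and NOT.

```python
# Hypothetical record IDs retrieved by two separate searches.
breast_cancer = {1, 2, 3, 4}
vaccines = {3, 4, 5}

both = breast_cancer & vaccines          # AND: records found by both searches
either = breast_cancer | vaccines        # OR: records found by at least one search
only_first = breast_cancer - vaccines    # NOT: in the first search but not the second

print(sorted(both), sorted(either), sorted(only_first))
```

AND narrows a search, OR broadens it, and NOT excludes records, which is why NOT can silently discard relevant papers.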


To combine searches, choose “Advanced Search”

The Advanced Search screen displays your PubMed history; from here you can combine your two searches using the appropriate Boolean operator

For Part B, print the PubMed history, which shows your searches.


You have now found papers in which the immunology of breast cancer is the main point of the article, and those papers are also about cancer vaccines


MeSH etc.

  • MeSH Database:

    • Found appropriate search terms

    • Automatically exploded “breast neoplasms”, so narrower terms (“breast neoplasms, male”, “carcinoma, ductal, breast”, etc) were ORed together

    • Allowed the addition of subheadings (immunology) to narrow to a particular aspect

    • Allowed narrowing to “main point”

  • Use History to combine (AND)
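
The pieces summarized above can be written out as a typed PubMed query. This sketch assumes the standard PubMed field tags [mesh] (MeSH term, exploded by default) and [majr] (MeSH major topic); the helper mesh_query is hypothetical, and the string it builds may not match character for character what the MeSH Database displays on screen.

```python
def mesh_query(heading, subheading=None, major=False):
    """Format a MeSH heading, optionally with a subheading and major-topic limit."""
    term = heading if subheading is None else f"{heading}/{subheading}"
    tag = "majr" if major else "mesh"
    return f'"{term}"[{tag}]'


# First concept: immunology of breast neoplasms as the main point of the paper.
breast = mesh_query("breast neoplasms", subheading="immunology", major=True)
# Second concept: cancer vaccines, with no subheading and no main-point limit.
vaccines = mesh_query("cancer vaccines")
# Combine the two concepts with AND, as done from the search history.
combined = f"{breast} AND {vaccines}"
print(combined)
```

Building the two halves separately mirrors the workflow in the slides: one MeSH search per concept, then a single AND to combine them.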


MeSH Caveats

  • Performing a MeSH search is usually more precise and exhaustive than a keyword search, however:

    • The most recent papers have not yet been assigned MeSH terms, so a MeSH search misses them; also run a keyword search limited to the “in process” records

    • Very new concepts/scientific terms may not yet be represented by MeSH

    • Very specific or rare concepts may never be represented by MeSH

  • So sometimes you will need to do a keyword search as well


In Process

  • In our “breast cancer, immunology, cancer vaccine” example, perform the following keyword search, only in the newest records (in process)

    • ((vaccin*) AND (breast cancer* OR breast neoplasm* OR breast tumor*)) AND in process [sb]

    • Try as many synonyms as possible

    • [sb] must be included to tell computer to just search the “in process” part of the database

    • * truncates to word root

  • This search picks up the current articles that do not yet have MeSH terms
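
The keyword search above follows a regular pattern: truncate each word root with *, OR the synonyms together, AND the concepts, then add the in-process limit. The helper in_process_query is a hypothetical name; it only assembles the text of the query.

```python
def in_process_query(required, synonyms):
    """AND a truncated keyword with an OR-group of truncated synonyms,
    then limit the whole search to the "in process" subset."""
    or_group = " OR ".join(f"{s}*" for s in synonyms)
    return f"(({required}*) AND ({or_group})) AND in process [sb]"


query = in_process_query(
    "vaccin",
    ["breast cancer", "breast neoplasm", "breast tumor"],
)
print(query)
```

Adding another synonym to the list extends the OR-group without changing the rest of the query.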


Link Out to E-journals

  • Remember, if you entered PubMed directly from the HSCL’s icon, you can see if the HSCL owns the journal articles you found

  • Choose the “abstract” or “citation” displays from the pulldown menu

  • Brown and blue icons tell you whether the HSCL owns that journal issue electronically or in print

  • Will NOT tell you what is available at Marston Science Library


What if PubMed does not indicate the article is owned at UF?

  • Use the “Catalog” to see if the paper is available in print at the HSCL, Marston Science Library or elsewhere on campus

  • The catalog may also be used to help locate books, government documents, videotapes, etc – items that are not indexed in PubMed


www.library.health.ufl.edu

Click on “Catalog” from HSCL Website


Entrez Nucleotides (GenBank)

  • Database of nucleotide sequences (ATGC)

  • Actually contains data from several databases - GenBank, EMBL, DDBJ, RefSeq

  • Hard to search because many submitting scientists send in redundant information and poorly annotated information


Nucleotide Data Domain

  • As of December 15, 2009

    • Over 110,118,557,163 bases

    • Over 112,910,950 sequence records

    • Over 200,000 species represented

    • Some complete genomes and chromosomes


Organisms Represented

  • Homo sapiens

  • Many model organisms, including:

    • Mus musculus

    • Caenorhabditis elegans

    • Oryza sativa

    • Drosophila melanogaster

    • Arabidopsis thaliana

  • Non-model organisms as well (trout, etc.)


International Collaboration

  • Contributors:

    • GenBank

    • European Molecular Biology Laboratory (EMBL)

    • DNA Databank of Japan (DDBJ)

  • Daily exchange of data among these groups


GenBank Sample Record

  • Before searching, we will look at the GenBank sample record

  • Retrieve the sample record from the main page – click on “DNA & RNA”, then “GenBank”, then choose the “record” link.

  • Note that the “Features” field provides useful biological information, and may be searched


Click any link in sample record to access definition of field and search tips

“Definition” field acts as record title – search [titl]

Unique identifier; assigned by NCBI; required by journals/grants

Link to PubMed citation/abstract


The “Features” field provides the most biological information; search as [fkey]

Numbers indicate location on the nucleotide sequence
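
Feature locations in a GenBank record are written as start..end, counted from 1 and including both ends. A minimal sketch of pulling out such a span follows; the sequence and coordinates are invented, and feature_slice is a hypothetical helper.

```python
def feature_slice(sequence, start, end):
    """Return bases start..end, where positions are 1-based and inclusive."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and exclude the end, hence start - 1.
    return sequence[start - 1:end]


toy_sequence = "GGATGCCCTAAGG"  # invented 13-base sequence
cds = feature_slice(toy_sequence, 3, 11)
print(cds)
```

The off-by-one adjustment is the usual source of errors when moving between record coordinates and program indexes.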


…3158


Searching “Nucleotides”

  • Database is difficult to search:

    • Redundant records

    • Archival - poor or missing annotation

  • The best searches are done using commands; learning all of them takes a separate class


Example

  • Perform a basic search in Entrez Nucleotides for “human presenilin 1”

  • Why did we get so many non-human records?

  • Check “Details” to see how search was parsed


Choose “nucleotide” from dropdown, then click “search”

Search for HUMAN presenilin 1

But end up with rat, mouse, etc.


Specify Gene and Organism

  • One trick – search your topic as a “gene” (psen1) and choose your taxon as “organism” (human)

    • Note – you may miss relevant sequences, but should not pick up irrelevant sequences

  • Easiest way to perform this search:

    • searching with commands: psen1 [gene] AND human [organism]
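
Why the basic search returns rat and mouse records, and why the fielded search does not, can be shown with a toy model. The records below are invented, and the matching rules are a simplification for illustration; they are not how Entrez actually parses a query.

```python
# Invented records; titles and field values are placeholders, not GenBank data.
records = [
    {"title": "Human presenilin 1 (PSEN1) mRNA",
     "gene": "PSEN1", "organism": "Homo sapiens"},
    {"title": "Rattus norvegicus mRNA similar to human presenilin 1",
     "gene": "Psen1", "organism": "Rattus norvegicus"},
    {"title": "Mus musculus presenilin 1 (homolog of human PSEN1)",
     "gene": "Psen1", "organism": "Mus musculus"},
]

# Toy lookup from a common name to a scientific name (an assumption of this sketch).
COMMON_NAMES = {"human": "homo sapiens"}


def all_fields_search(words):
    """Keep records where every word appears somewhere in any field."""
    hits = []
    for rec in records:
        text = " ".join(rec.values()).lower()
        if all(w.lower() in text for w in words):
            hits.append(rec)
    return hits


def fielded_search(gene, organism):
    """Keep records whose gene and organism fields both match exactly."""
    organism = COMMON_NAMES.get(organism.lower(), organism.lower())
    return [r for r in records
            if r["gene"].lower() == gene.lower()
            and r["organism"].lower() == organism]


loose = all_fields_search(["human", "presenilin", "1"])   # like the basic search
strict = fielded_search("psen1", "human")                 # like psen1 [gene] AND human [organism]
print(len(loose), len(strict))
```

The loose search matches any record that merely mentions "human" in its title, while the fielded search requires the organism field itself to be human.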


Retrieve sequence through link


Best Sequences

  • The subset “RefSeq” contains the “best” sequence:

    • Non-redundant

    • Well understood and annotated

    • Checked for sequencing error

  • Not all genes or organisms have RefSeq records available


RefSeq Records

  • Easy to distinguish –

    • Two letters, underscore, then numbers (NM_123456)

  • Easy to search for –

    • Use “only from” on “limits” page

    • Complex Boolean – AND srcdb_refseq [prop]

    • Click the RefSeq tab (easiest way)
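
The "two letters, underscore, then numbers" rule can be checked with a regular expression. This is a sketch of the rule as stated on the slide only: the optional ".version" suffix is an addition of this sketch, real RefSeq accessions may have forms this pattern does not cover, and the test strings are made up.

```python
import re

# Two capital letters, an underscore, digits, and an optional ".version" suffix.
REFSEQ_LIKE = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")


def looks_like_refseq(accession):
    """Return True if the accession fits the slide's RefSeq-style pattern."""
    return bool(REFSEQ_LIKE.match(accession))


print(looks_like_refseq("NM_123456"))    # pattern from the slide
print(looks_like_refseq("NM_123456.2"))  # same, with a version suffix
print(looks_like_refseq("AB123456"))     # made-up accession with no underscore
```

A quick check like this helps when sorting a mixed list of accessions into RefSeq and non-RefSeq records.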


RefSeq

  • Example - Find only RefSeq records for human presenilin 1

  • Compare information in RefSeq record to that found in previous nucleotide records


Click on the RefSeq tab to retrieve only the “best” sequences (highly annotated, complete, nonredundant)

The typical RefSeq accession number format: 2 letters, an underscore, and then numbers


Viewing Formats

  • The “Default” view is the standard GenBank record

  • Researchers often use the “FASTA” format for analysis

  • Change the record format at the “Display” pull-down menu
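
FASTA is popular for analysis because it is simple to read by program: a header line starting with ">" followed by one or more lines of sequence. A minimal parser sketch follows; the example record is invented, and parse_fasta is a hypothetical helper rather than part of any NCBI tool.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    entries = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


example = """>toy_record_1 invented example sequence
ATGCCC
TAA
"""
parsed = parse_fasta(example)
print(parsed)
```

The sequence lines are joined into one string, so line wrapping in the downloaded file does not affect later analysis.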

