
Getting Started: PCB3063 Term Project and NCBI’s OMIM, PubMed and Sequence Resources

Michele R. Tennant, Ph.D., M.L.I.S.

Health Science Center Libraries/

U.F. Genetics Institute

PCB3063, General Genetics

[email protected]


Today’s Session

  • Your term project

  • Resources to help you with your project …

    • HSCL Website, Catalog, etc.

    • NCBI Resources:

      • OMIM – “review articles”

      • PubMed – journal articles

      • Nucleotides/RefSeq – gene sequences

  • Receive your term project topic


Your Term Project

  • Scientific poster on an assigned genetic disorder

  • Should cover all aspects of genetics –

    • Mode of inheritance

    • What gene normally does

    • What protein is encoded by gene

    • Map location and gene structure

    • Types of mutations and what they do to protein

    • Potential for gene therapy

    • Etc. (more info next time)


Your Term Project

  • Four assignments for your project:

    • Part A:

      • Identify disorder/gene in OMIM and MeSH

      • E-learning assessment by start of class Feb. 11

    • Part B:

      • Literature and sequence searches

      • E-learning assessment by start of class Feb. 25 AND paper form and search print-outs in class Feb. 25

    • Part C:

      • Structure, SNP, map and clinical db searches

      • E-learning assessment by start of class Mar. 30 AND paper form and search print-outs in class Mar. 30

    • Poster Presentations – Apr. 15

  • Note – keep Parts A, B, and C (and the corresponding search print-outs) once they are returned to you; you may need to resubmit them with your poster.


NCBI

  • National Center for Biotechnology Information

  • Located on the National Institutes of Health (NIH) campus in Bethesda, Maryland

  • Part of the National Library of Medicine (NLM), which is part of the NIH

  • Created by Congress in 1988

  • Home of GenBank since 1992


NCBI Mandates

  • Develop automated systems for the storage, retrieval, and analysis of molecular, genetic and biochemical information

  • Develop software for the study of molecular structure and function


NCBI Mandates

  • Facilitate the use of molecular databases and programs by both researchers and clinicians

  • Coordinate international cooperation in gathering molecular, genetic and biochemical data


Effective Searchers ...

  • know the content of the database

    • subjects, type of data, years of coverage, curated vs. non-curated

  • understand the structure of the database

    • record structure, searchable fields, controlled vs non-controlled vocabularies

  • understand searching options and tools

    • thesaurus, limits, AND/OR, etc.


Entrez

  • Search tool on the NCBI website

  • Contains a variety of databases:

    • Nucleotide sequence; Protein sequence; Molecular structure; SNPs; Expression data; Journal literature

    • Each “database” contains “records”

    • Each “record” in database contains “fields”


Entrez Search Options

  • Similar among the various databases

    • Entrez conventions: AND, OR, NOT, *

    • Three ways to search:

      • Basic: just enter your search terms

      • Advanced: more controlled search - uses limits, preview/index, history

      • Complex Boolean: command language with qualifiers in brackets;

        • syntax= term [field] AND term [field] etc.
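
As a rough illustration of the complex Boolean syntax, the sketch below assembles a `term [field] AND term [field]` string in Python. The helper names `qualify` and `combine` are hypothetical and are not part of any NCBI tool; the `gene` and `organism` qualifiers are the ones used in the nucleotide example later in this presentation.

```python
def qualify(term, field):
    """Attach an Entrez field qualifier in square brackets to a search term."""
    return f"{term} [{field}]"


def combine(operator, *clauses):
    """Join already-qualified clauses with one Boolean operator."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR, or NOT")
    return f" {operator} ".join(clauses)


query = combine("AND", qualify("psen1", "gene"), qualify("human", "organism"))
print(query)  # psen1 [gene] AND human [organism]
```

The resulting string is what you would type into the Entrez search box; the code only formats text and does not contact NCBI.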


Entrez Differences

  • Differences among the various databases

    • Different search fields available

    • Different limits available

    • Some controlled, some non-controlled

    • Some archival, some curated


Two Ways to Get to NCBI

  • Directly at - http://www.ncbi.nlm.nih.gov/

  • Through HSC Library’s webpage:

    • http://www.library.health.ufl.edu/

    • Click on “Databases” icon

    • Click on “NCBI” icon


www.library.health.ufl.edu

Click on “Databases” from HSCL Website


OMIM - Online Mendelian Inheritance in Man

  • Catalog of human genes and genetic disorders

  • 19,854 records (as of 1/27/10)

  • Records are basically “review articles”

  • Records link to PubMed, sequences, structures, etc.

  • Built on Entrez architecture

  • Search tip – look for your disease or gene in “title” field on “Limits” page



We will search for information on “Sipple Syndrome”, but first we limit so that we search only in the title field


Limit so that your terms reside only in the “title”


Type in Sipple Syndrome, then click “Go”

Link to discussion of Sipple Syndrome

Link to OMIM Gene Map


Table of Contents for Sipple Syndrome record

Record was retrieved via these words in title

Link to record for the RET Oncogene


Table of Contents for RET gene record


PubMed

  • Journal literature database

  • Pre-clinical and clinical information – best literature database to use for Dr. Miyamoto’s project

  • Approximately 5,200 journals covered; currently over 18,000,000 records

  • Most citations include abstract

  • Can search via keyword, but has been built to take advantage of controlled vocabulary search


Controlled vs Non-controlled Vocabularies

  • “Old People” Example


Controlled Vocabulary

  • Controlled terms act as “umbrella” to pick up all synonyms, spelling differences (hemoglobin/haemoglobin), singular vs plural, etc.

  • In PubMed, use MeSH Database to find and search controlled MeSH terms (Medical Subject Headings)

  • Once in MeSH Database, can use additional options to enhance search (major heading, subheadings, etc.)
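
The “umbrella” idea can be pictured as a lookup from free-text variants to a single controlled heading. The sketch below is only an illustration: the `SYNONYMS` table is a tiny hand-written assumption, not the real MeSH vocabulary, and `to_controlled` is a hypothetical helper. In practice the MeSH Database does this mapping for you.

```python
# Hypothetical lookup table: free-text variants -> one controlled heading.
SYNONYMS = {
    "hemoglobin": "Hemoglobins",
    "haemoglobin": "Hemoglobins",
    "hemoglobins": "Hemoglobins",
    "breast cancer": "Breast Neoplasms",
    "breast tumor": "Breast Neoplasms",
}


def to_controlled(term):
    """Return the controlled heading for a variant, or None if it is unknown."""
    return SYNONYMS.get(term.strip().lower())


print(to_controlled("Haemoglobin"))    # Hemoglobins
print(to_controlled("breast cancer"))  # Breast Neoplasms
```

Spelling variants and synonyms all land on the same heading, which is why one controlled term can stand in for many keyword searches.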


MeSH Example

  • Find journal articles on the “immunological aspects of breast cancer and vaccines”; but only those papers where “immunological aspects of breast cancer” is the main point of the articles you find.

  • Search PubMed


Enter PubMed through our direct link (rather than through NCBI) and you will be able to directly see if the HSCL owns the journal articles you find


The “ufhsclib” indicates that you have entered PubMed correctly, and that the journals the library owns will be apparent

Use the MeSH Database as a dictionary to find the appropriate MeSH term, and then to refine your search


Note that we have left PubMed and are in the MeSH “dictionary”


You typed “breast cancer” into MeSH database

Use “breast neoplasms” rather than breast cancer

Click on the link to refine the search


Topical subheadings help focus search to one or more aspects of the subject

Check here and your topics will be the main point of the articles you find – you won’t get peripheral citations. Not recommended the first time you search a topic – if there are few papers in existence for your topic, you may be left with no articles at all


Note that the term “Breast Neoplasms” will pick up all the more specific types of breast cancer


Send your search to the search box

MeSH automatically builds the search for you – in this example, you are looking for papers in which the immunological aspects of breast cancer are the main point of all the articles you retrieve

Click “Search PubMed”


Once you have sent the search to the search box, and clicked on “search PubMed”, you leave the MeSH Database, and the search is performed in PubMed

Note that this is the search the MeSH Database built for you – it used the MeSH term “breast neoplasms”, glued “immunology” directly to the search by using the slash, and picked up all the different types of breast neoplasms. MeSH also retrieved only the papers where these topics were the main points of the articles. You did not need to do any of this yourself – MeSH did it for you once you found the proper MeSH term and clicked on the subheading.


Now we need to complete the second half of the search – vaccines. Pull down the drop-down so you are in MeSH again, and search for the MeSH term. Look through the list to see if there is one that is most appropriate. Since we are looking for vaccines related to breast cancer, perhaps “cancer vaccines” would be useful. Read the “scope note” to be sure.

Scope Note


As in the breast cancer search, you can choose a subheading and limit to articles where this topic is the main point; I’ve chosen not to do so here (if you don’t choose subheadings or main point, remember to click on the check box next to “cancer vaccines”). Send to search box; click “search PubMed”.

You’ve now found articles on cancer vaccines, but you need to combine the breast cancer and cancer vaccines concepts


Boolean Operators

  • Search statements may be combined using AND, OR, NOT

AND

OR

NOT
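
One way to picture the three operators is as set operations on the records each search retrieves. The sketch below uses Python sets; the record IDs are made up for illustration and do not correspond to real PubMed records.

```python
# Hypothetical record IDs returned by two separate searches.
breast_cancer = {101, 102, 103, 104}
vaccines = {103, 104, 105}

both = breast_cancer & vaccines        # AND: records found by both searches
either = breast_cancer | vaccines      # OR: records found by at least one search
only_first = breast_cancer - vaccines  # NOT: first search minus the second

print(sorted(both))        # [103, 104]
print(sorted(either))      # [101, 102, 103, 104, 105]
print(sorted(only_first))  # [101, 102]
```

AND narrows a search, OR broadens it, and NOT removes records, which can also discard relevant ones.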


To combine searches, choose “Advanced Search”

The Advanced Search screen displays your PubMed history; from here you can combine your two searches using the appropriate Boolean operator

For Part B, print the PubMed history, which shows your searches.


You have now found papers in which the immunology of breast cancer is the main point of the article, and those papers are also about cancer vaccines
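
The finished search can also be written out as a single command string. The sketch below is a minimal illustration that assumes PubMed's `[Majr]` (MeSH major topic) and `[Mesh]` (MeSH terms) field tags; the exact string PubMed displays for your search may look different, and `mesh_clause` is a hypothetical helper.

```python
def mesh_clause(heading, subheading=None, major_topic=False):
    """Format one MeSH clause; the slash glues a subheading to its heading."""
    label = f"{heading}/{subheading}" if subheading else heading
    tag = "Majr" if major_topic else "Mesh"
    return f'"{label}"[{tag}]'


first = mesh_clause("Breast Neoplasms", "immunology", major_topic=True)
second = mesh_clause("Cancer Vaccines")
combined = f"{first} AND {second}"
print(combined)
# "Breast Neoplasms/immunology"[Majr] AND "Cancer Vaccines"[Mesh]
```

Combining the two searches from your PubMed history with AND should be equivalent to running a string of this shape directly.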


MeSH etc.

  • MeSH Database:

    • Found appropriate search terms

    • Automatically exploded “breast neoplasms”, so narrower terms (“breast neoplasms, male”, “carcinoma, ductal, breast”, etc) were ORed together

    • Allowed the addition of subheadings (immunology) to narrow to a particular aspect

    • Allowed narrowing to “main point”

  • Use History to combine (AND)
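
“Exploding” a term can be sketched as walking down a tree of narrower terms and ORing everything found. The tree below is a hypothetical, heavily trimmed slice holding only the two narrower terms named above; the real MeSH tree is much larger, and `explode` and `ored` are illustrative helper names.

```python
# Hypothetical, heavily trimmed slice of the tree under "breast neoplasms".
NARROWER = {
    "breast neoplasms": ["breast neoplasms, male", "carcinoma, ductal, breast"],
    "breast neoplasms, male": [],
    "carcinoma, ductal, breast": [],
}


def explode(term):
    """Return the term followed by every narrower term beneath it."""
    terms = [term]
    for child in NARROWER.get(term, []):
        terms.extend(explode(child))
    return terms


def ored(term):
    """OR together the exploded terms, quoting each one."""
    return " OR ".join(f'"{t}"' for t in explode(term))


print(ored("breast neoplasms"))
# "breast neoplasms" OR "breast neoplasms, male" OR "carcinoma, ductal, breast"
```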


MeSH Caveats

  • Performing a MeSH search is usually more precise and exhaustive than a keyword search, however:

    • The most recent papers are not searched, so you should also run a keyword search of the “in process” records

    • Very new concepts/scientific terms may not yet be represented by MeSH

    • Very specific or rare concepts may never be represented by MeSH

  • So sometimes you will need to do a keyword search as well


In Process

  • In our “breast cancer, immunology, cancer vaccine” example, perform the following keyword search, only in the newest records (in process)

    • ((vaccin*) AND (breast cancer* OR breast neoplasm* OR breast tumor*)) AND in process [sb]

    • Try as many synonyms as possible

    • [sb] must be included to tell computer to just search the “in process” part of the database

    • * truncates to word root

  • This search picks up the current articles that do not yet have MeSH terms
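
The keyword search above follows a regular pattern: OR the synonyms within each concept, AND the concepts together, then restrict to the “in process” subset. A minimal sketch of that pattern follows; `in_process_query` is a hypothetical helper that only builds the text of the search.

```python
def in_process_query(concept_groups):
    """AND together groups of OR-ed synonyms, limited to in-process records."""
    groups = ["(" + " OR ".join(synonyms) + ")" for synonyms in concept_groups]
    return "(" + " AND ".join(groups) + ") AND in process [sb]"


query = in_process_query([
    ["vaccin*"],
    ["breast cancer*", "breast neoplasm*", "breast tumor*"],
])
print(query)
# ((vaccin*) AND (breast cancer* OR breast neoplasm* OR breast tumor*)) AND in process [sb]
```

Adding another synonym to a group widens that concept without changing the rest of the search.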


Link Out to E-journals

  • Remember, if you entered PubMed directly from the HSCL’s icon, you can see if the HSCL owns the journal articles you found

  • Choose the “abstract” or “citation” displays from the pulldown menu

  • Brown and blue icons tell if the HSCL owns that journal issue electronically or in print

  • Will NOT tell you what is available at Marston Science Library


What if PubMed does not indicate the article is owned at UF?

  • Use the “Catalog” to see if the paper is available in print at the HSCL, Marston Science Library or elsewhere on campus

  • The catalog may also be used to help locate books, government documents, videotapes, etc – items that are not indexed in PubMed


www.library.health.ufl.edu

Click on “Catalog” from HSCL Website


Entrez Nucleotides (GenBank)

  • Database of nucleotide sequences (ATGC)

  • Actually contains data from several databases - GenBank, EMBL, DDBJ, RefSeq

  • Hard to search because many submitting scientists send in redundant or poorly annotated information


Nucleotide Data Domain

  • As of December 15, 2009

    • Over 110,118,557,163 bases

    • Over 112,910,950 sequence records

    • Over 200,000 species represented

    • Some complete genomes and chromosomes


Organisms Represented

  • Homo sapiens

  • Many model organisms, including:

    • Mus musculus

    • Caenorhabditis elegans

    • Oryza sativa

    • Drosophila melanogaster

    • Arabidopsis thaliana

  • Non-model organisms as well (trout, etc.)


International Collaboration

  • Contributors:

    • GenBank

    • European Molecular Biology Laboratory (EMBL)

    • DNA Databank of Japan (DDBJ)

  • Daily exchange of data among these groups


GenBank Sample Record

  • Before searching, we will look at the GenBank sample record

  • Retrieve the sample record from the main page – click on “DNA & RNA”, then “GenBank”, then choose the “record” link.

  • Note that the “Features” field provides useful biological information, and may be searched


Click any link in sample record to access definition of field and search tips

“Definition” field acts as record title – search [titl]

Unique identifier; assigned by NCBI; required by journals/grants

Link to PubMed citation/abstract


The “Features” field provides the most biological information; search as [fkey]

Numbers indicate location on the nucleotide sequence




Searching “Nucleotides”

  • Database is difficult to search:

    • Redundant records

    • Archival - poor or missing annotation

  • The best searches use commands; it takes a class to learn them all


Example

  • Perform a basic Search in Entrez Nucleotides for “human presenilin 1”

  • Why did we get so many non-human records?

  • Check “Details” to see how search was parsed


Choose “nucleotide” from dropdown, then click “search”

Search for HUMAN presenilin 1

But end up with rat, mouse, etc.


Specify Gene and Organism

  • One trick – search your topic as a “gene” (psen1) and choose your taxon as “organism” (human)

    • Note – you may miss relevant sequences, but should not pick up irrelevant sequences

  • Easiest way to perform this search:

    • searching with commands: psen1 [gene] AND human [organism]
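
The same command can in principle be sent programmatically. The sketch below is an addition that goes beyond the slides: it assumes NCBI's E-utilities `esearch` endpoint and the `nuccore` database name for Entrez Nucleotide, and it only builds the URL without contacting NCBI. `esearch_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint: NCBI E-utilities esearch (not covered in this presentation).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="nuccore"):
    """Build an esearch URL with the command-style term percent-encoded."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


url = esearch_url("psen1 [gene] AND human [organism]")
print(url)

# Round-trip check: decoding the URL recovers the original command.
decoded_term = parse_qs(urlsplit(url).query)["term"][0]
print(decoded_term)  # psen1 [gene] AND human [organism]
```

For the term project, typing the command into the Entrez search box is all that is required; the URL form matters only if you later automate searches.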



Best Sequences

  • The subset “RefSeq” contains the “best” sequence:

    • Non-redundant

    • Well understood and annotated

    • Checked for sequencing error

  • Not all genes or organisms have RefSeq records available


RefSeq Records

  • Easy to distinguish –

    • Two letters, underscore, then numbers (NM_123456)

  • Easy to search for –

    • Use “only from” on “limits” page

    • Complex boolean – AND srcdb_refseq [prop]

    • Click the RefSeq tab (easiest way)
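
Both points can be sketched in a few lines of Python: a pattern check for the “two letters, underscore, numbers” accession format, and a helper that appends the RefSeq restriction to an existing command. The function names are hypothetical, and accepting an optional `.N` version suffix is an assumption beyond the pattern described here.

```python
import re

# Two uppercase letters, an underscore, then digits. The optional ".N"
# version suffix is an assumption, not part of the format stated above.
REFSEQ_PATTERN = re.compile(r"[A-Z]{2}_\d+(\.\d+)?")


def looks_like_refseq(accession):
    """Return True if the accession matches the RefSeq-style pattern."""
    return REFSEQ_PATTERN.fullmatch(accession) is not None


def refseq_only(query):
    """Restrict an existing command-style query to the RefSeq subset."""
    return f"({query}) AND srcdb_refseq [prop]"


print(looks_like_refseq("NM_123456"))  # True
print(looks_like_refseq("AB123456"))   # False
print(refseq_only("psen1 [gene] AND human [organism]"))
# (psen1 [gene] AND human [organism]) AND srcdb_refseq [prop]
```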


RefSeq

  • Example - Find only RefSeq records for human presenilin 1

  • Compare information in RefSeq record to that found in previous nucleotide records


Click on the RefSeq tab to retrieve only the “best” sequences (highly annotated, complete, nonredundant)

The typical RefSeq accession number format: 2 letters, an underscore, and then numbers


Viewing Formats

  • The “Default” view is the standard GenBank record

  • Researchers often use the “FASTA” format for analysis

  • Change the record format at the “Display” pull-down menu
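
FASTA is popular for analysis partly because it is so simple: a header line beginning with `>` followed by one or more lines of sequence. The sketch below reads that layout; `parse_fasta` is a hypothetical helper, and the example record is made up rather than taken from GenBank.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only; not a real sequence.
example = """>example_1 hypothetical sequence
ATGCATGC
GGCC
"""
print(parse_fasta(example))
# [('example_1 hypothetical sequence', 'ATGCATGCGGCC')]
```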

