ontologies and biomedicine l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Ontologies and Biomedicine PowerPoint Presentation
Download Presentation
Ontologies and Biomedicine

Loading in 2 Seconds...

play fullscreen
1 / 38

Ontologies and Biomedicine - PowerPoint PPT Presentation


  • 257 Views
  • Uploaded on

Ontologies and Biomedicine. What is the "right" amount of semantics ?. Ontologies and Biomedicine. The “right” amount of semantics depends on what you want to do with it. Ontologies and Biomedicine. Research is based on inference from what is known, and therefore it demands rigor.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Ontologies and Biomedicine' - RexAlvis


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
ontologies and biomedicine

Ontologies and Biomedicine

What is the "right" amount of semantics?

ontologies and biomedicine2

Ontologies and Biomedicine

The “right” amount of semantics depends on what you want to do with it

ontologies and biomedicine3

Ontologies and Biomedicine

Research is based on inference from what is known, and therefore it demands rigor

ontologies and biomedicine4

Ontologies and Biomedicine

Without rigor, we won’t—know what we know, or where to find it, or what to infer from it.

semantic spectrum
Semantic Spectrum

Ambiguous

Logical and precise

Natural Language

Computable Ontology

Highly expressive

Less expressive

ad hoc tagging approach
Ad hoc tagging approach
  • Let the users defined words and phrases
    • Foregoes the use of an expertly curated vocabulary or ontology.
  • Fast and distributed approach yields a vast amount of content
    • No recruitment and training of people to maintain the ontology is required.
    • No recruitment and training of annotators to interpret the material is required.
ad hoc tagging approach8
Ad hoc tagging approach
  • Tagging approach places the burden of interpretation and classification on every end user
    • Overall this is more costly and wasteful
    • Is inappropriate in the scientific domain
  • The problem is not about people communicating. It is about computers and HCI.
build apply and use
Build, apply, and use
  • Ontology captures current scientific theory that seeks to explain all of the existing evidence and is used to draw inferences and make predictions
    • Acts like a review
    • Requires curators who are experts in both the science and logic
  • Ontology application is the real bottleneck
    • But overall is less costly and wasteful
slide10
Univocity:

Terms should have the same meanings on every occasion of use

  • Positivity:

Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine classes.

  • Objectivity:

Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.

  • Single Inheritance:

No class in a classification hierarchy should have more than one is_a parent on the immediate higher level

  • Intelligible Definitions:

The terms used in a definition should be simpler (more intelligible) than the term to be defined

  • Reality Based:

When building or maintaining an ontology, always think carefully at how classes relate to instances in reality

  • Distinguish Classes and Instances:

What is necessarily true for instances is not necessarily true for classes

annotation bottleneck
Annotation bottleneck
  • An active lab can easily generate 10-100GB of data per month, and it is very difficult to manage data on this scale.
  • Even the best analytic schemes will be for naught if we cannot find our data.
  • And the data is complex
  • Yet, the annotation effort required will be utterly wasted if it cannot be reliably computed upon.
implies numerous light ontologies
3-dimensions

Protein function

Cell type

Tissue

Stage

Cellular component

Organism

And more…

Implies numerous “light” ontologies
or it implies a single complex one
3-dimensions

Protein function

Cell type

Tissue

Stage

Cellular anatomy

Organism

And more…

Plus all of the relations between these elements

Or it implies a single complex one
practicalities
Practicalities
  • The ontology should be robust or the annotator’s time is wasted
  • Research won’t wait, data must be annotated at the rate at which it is generated
  • Complex ontologies are much more difficult to get right than lighter ones
  • Light ontologies are easier to build and maintain
  • Complex ontologies can be built from lighter ones
the aims of go
The aims of GO
  • To develop comprehensive shared vocabularies of terms describing aspects of molecular biology.
  • To describe the gene products held in each contributing model organism database.
  • To provide a scientific resource for access to the vocabularies, the annotations, and associated data.
  • To provide a software resource to assist in curation of GO term assignments to biological objects.
the primary strength of the go
The primary strength of the GO
  • The GO covers three domains of biology
    • Molecular Function
    • Biological Process
    • Cellular Component
  • These are “precisely defined” axes of classification
the breakdown of work
The breakdown of work
  • Task 1
    • Building the ontology: a computable description of the biological world
  • Task 2
    • Describing your gene product—annotation
      • Biological process
      • Molecular function
      • Cellular localization
the early key decisions
The early key decisions
  • The vocabulary itself requires a serious and ongoing effort.
  • Carefully define every concept
  • Initially keep things as simple as possible and only use a minimally sufficient data representation.
  • Focus initially on molecular aspects that are shared between many organisms.
go databases distributed and centralized
GO databases: distributed and centralized
  • Support cross-database queries
    • By having a mutual understanding of the definition and meaning of any word used to describe a gene product
  • Provide database access to a common repository of annotations
    • By submitting a summary of gene products that have been annotated
slide22

GO data

FTP

GO CVS

Scripts

HTTPD

Anonymous

CVS

slide23

Many Scripts

GO CVS

AmiGO

GO Database

godatabase org
GODatabase.org
  • Hits = 77,012
  • Visits = 14,063
  • Sites = 6,638
  • Averages per week
slide26

Number of links to a site: as reported by Google

www.geneontology.org 7,240

www.godatabase.org 33

obo.sourceforge.net 10

song.sourceforge.net 6

genome.ucsc.edu 3,670

www.ncbi.nih.gov 12,000

www.ebi.ac.uk 14,900

sciencemag.org 14,900

www.ncbi.nlm.nih.gov 34,500

slide27

Most Common GOIDs accessed via AmiGO

72020 GO:0006810 transport

56862 GO:0005524 ATP binding

53622 GO:0019012 virion

47773 GO:0006955 immune response

46943 GO:0003677 DNA binding

41474 GO:0006508 proteolysis and peptidolysis

41126 GO:0006355 regulation of transcription, DNA-dependent

40427 GO:0004872 receptor activity

34943 GO:0005215 transporter activity

30890 GO:0007186 G-protein coupled receptor protein signaling pathway

30001 GO:0003700 transcription factor activity

28127 GO:0006118 electron transport

26636 GO:0005509 calcium ion binding

24007 GO:0006968 cellular defense response

21250 GO:0016486 peptide hormone processing

20440 GO:0008152 metabolism

19742 GO:0005515 protein binding

19316 GO:0007155 cell adhesion

18254 GO:0005198 structural molecule activity

slide28

Taxon covered by the GO (some)

Arabidopsis: TAIR, taxon:3702

Caenorhabditis: WormBase, taxon:6239

Candida albicans: CGD, taxon:5476

Danio: ZFIN, taxon:7955

Dictyostelium: DictyBase, taxon:5782

Drosophila: FlyBase, taxon:7227

Mus: MGI, taxon:10090

Oryza sativa: Gramene, taxon:39947 = Oryza sativa (japonica cultivar-group);

Rattus: RGD, taxon:10116

Saccharomyces: SGD, taxon:4932

Leishmania major: GeneDB, taxon:5664

Plasmodium falciparum: GeneDB, taxon:5833

Schizosaccharomyces pombe: GeneDB, taxon:4896

Trypanosoma brucei: GeneDB, taxon:185431

Bacillus anthracis: TIGR, taxon:198094

Coxiella burnetii: TIGR, taxon:227377

Geobacter sulfurreducens: TIGR, taxon:243231

Listeria monocytogenes: TIGR, taxon:265669

Methylococcus capsulatus: TIGR, taxon:243233

Pseudomonas syringae: TIGR, taxon:223283

Shewanella oneidensis: TIGR, taxon:211586

Vibrio cholerae: TIGR, taxon:686

nih funded experimental research that uses the go
National Institute on Aging (NIA)

National Institute of Allergy and Infectious Diseases (NIAID)

National Cancer Institute (NCI)

National Institute on Drug Abuse (NIDA)

National Institute on Deafness and Other Communication Disorders (NIDCD)

National Institute of Dental & Craniofacial Research (NIDCR)

National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)

National Institute of Biomedical Imaging and Bioengineering (NIBIB)

National Institute of Environmental Health Sciences (NIEHS)

National Eye Institute (NEI)

National Institute of General Medical Sciences (NIGMS)

National Institute of Child Health and Human Development (NICHD)

National Human Genome Research Institute (NHGRI)

National Heart, Lung and Blood Institute (NHLBI)

National Library of Medicine (NLM)

National Institute of Neurological Disorders and Stroke (NINDS)

National Center for Research Resources (NCRR)

NIH-funded experimental research that uses the GO
other funded experimental projects that use the go
Other funded experimental projects that use the GO
  • Public Heath Service
  • Walter Reed Army Medical Center
  • United States Department of Agriculture
  • Department of Defense
  • USAID
  • National Science Foundation
a successful case study31

A “successful” case study

There are still challenges to meet

building upon sharing light axiomatic ontologies eliminates
Building upon (sharing) light, axiomatic ontologies eliminates:
  • Spelling mistakes or differences
    • oesinophil vs. eosinophil
  • Differences in synonyms, names or naming conventions
    • Spermatazoon, sperm cell, spermatozoid, sperm
  • Differences in definitions
    • pericardial cell develops_frommesodermal cell vs.Nothingdevelops_from pericardial cell
  • Inconsistent structure
inconsistent structure
Inconsistent structure

CL

GO

hemocyte

hemocyte differentiation

(sensu Arthropoda)

plasmocyte

plasmatocyte differentiation

lamellocyte differentiation

lamellocyte

finer granularity in the go
GO

immune cell

activation, migration, chemotaxis…

erythrocyte differentiation is_a myeloid blood cell differentiation”

CL

no such term: “immune cell”

no such term: “myeloid blood cell”

Finer granularity in the GO
courser granularity in the go
GO

neuroblast proliferation is_a cell proliferation

CL

neuroblast is_a neuronal stem cell is_a stem cell is_a cell

Courser granularity in the GO
even a light ontology like the go is difficult enough
Even a “light” ontology like the GO is difficult enough
  • A methodology that enforces clear, coherent definitions:
  • Promotes quality assurance
    • intent is not hard-coded into software
    • Meaning of relationships is defined, not inferred
  • Guarantees automatic reasoning across ontologies and across data at different granularities
  • Consequences of inconsistencies
    • Hard to synchronize manually
    • Inconsistent user-search results
meeting the goal drawing inferences
Meeting the goal: Drawing inferences

PMID:5555

PMID:4444

Direct evidence

Direct evidence

?

SP:1234

SP:8723

SP:19345

A

B

C

D

Human

human

Indirect evidence

SP:48392

PMID:8976

B

Xenopus

toad

Indirect evidence

SP:48291

SP:38921

B

C

PMID:3924

Drosophila

PMID:9550

yeast

chris mungall
Chris Mungall

Sima Misra

Thank you

NCBO

Reactome

GO

SO