Introduction to anatomy ontology building David Osumi-Sutherland FlyBase (www.flybase.org) Virtual Fly Brain (www.virtualflybrain.org)
Take home messages • An ontology is a classification • There are lots of useful ways to classify stuff • Maintaining multiple classification schemes by hand is impractical • So you should automate it. • Everybody makes mistakes • So you should get the computer to find errors for you • Re-use other people’s work where possible • import class hierarchies • use common patterns • Cautionary note – formal languages have limitations. Don’t expect to be able to express everything!
What is an ontology? • A set of defined, inter-related terms to use in annotation/metadata/knowledge bases. • A classification • A query-able store of (scientific) knowledge that uses logical inference. [Diagram: the three roles are linked by “depends on” arrows]
What (use) is an ontology? • A set of defined, inter-related terms to use in annotation. • Annotation of • papers; specimens; gene expression; phenotype… • Use of common annotation terms across multiple databases allows easy shared integration. • Relations between terms allow annotations to be grouped in scientifically meaningful ways • requires an ontology to be an accurate and scientifically meaningful classification and store of scientific knowledge.
What is an ontology? • A classification • There are lots of scientifically useful ways to classify a bit of anatomy. • its parts and their arrangement • its relation to other structures • what is it part of, connected to, adjacent to or overlapping? • its shape • its function • its developmental origins • its species or clade • its evolutionary history?
What is an ontology? • The scientific knowledge an ontology contains can make the reasons for classification explicit. • e.g. • Any sense organ that functions in the detection of smell is an olfactory sense organ • All large basiconic sensilla of the antenna function in detection of smell • Therefore all large basiconic sensilla of the antenna are olfactory sense organs
Why ontology development is like software or database development • Ideal case – future editors can build on your work • maintainable (by multiple editors) • basic maintenance (e.g. correcting simple errors) is easy • scalable (by multiple editors) • grow your project as large as you need without breaking • extensible (by multiple editors) • easy to add new functionality without breaking existing functionality • integrate-able • can integrate easily with the work of others – so you don’t have to solve all problems yourself
How not to build ontologies – the trap • A small, simple ontology or program with one developer can get away with practices that a large one cannot • given • shallow, single inheritance classification (each class has 0-1 superclasses) • very few relationship types • < 1000 terms. • it is feasible to: • have little annotation/documentation • have no automated error checking • have no automated classification • keep redundancy to a minimum by hand
How not to build ontologies – the trap • Small, simple ontologies and programs have a habit of growing large and complicated. • Users demand lots more terms for annotation • Users demand multiple axes of classification • No scientific reason to favor one over another • Users demand/editors favor multiple relationship types to record information they believe scientifically important. • Editors/coders move on • someone else has to continue their work. Is the documentation mainly in the old developer’s head?
How not to build ontologies – the trap • Worst case scenario – the tangled pit of misery: • Difficult, perhaps impossible to maintain or extend • Tangled, convoluted, redundant structure with little or no documentation or annotation. • Editing tends to inadvertently break previous functionality. • Little or no error checking means you don't even notice when you break stuff. Users find out later. • Even you can't easily edit what you built 6 months ago without getting confused and making a mess.
Avoiding tangled pits of misery • There are no perfect answers, but these might help: • good annotation and documentation; • good, consistent style; • avoidance of redundancy; • let the computer keep track of things for you • modularity; • automate • a consistent set of tests of existing functionality (JUnit-style tests / consistency checks); • constant testing during development; • design patterns.
Good Practice 1: Good annotation and documentation • Clear textual definitions with references • ensure accurate manual annotation • make assertions of scientific fact traceable • serve as documentation for future ontology developers • Also useful to record – for users and future developers: • Experimental evidence for assertions of scientific fact • Notes on confusing or conflicting usage of terms • Reasons for design choices/compromises
Options for formalization • OWL • W3C standard • Decidable • Big open source community of tool developers • multiple fast reasoners – getting better all the time • Easy to read syntax – OWL Manchester syntax (OWL MS) • OBO • Best thought of as a subset of OWL, with which it is increasingly integrated • Limited community of tool developers • Easy(ish) to read syntax • Common Logic • Very powerful. But easy to come up with solutions that can’t be usefully reasoned with.
Relationships are the formalized part of a definition. • The criteria for class membership are recorded using textual definitions, at least some elements of which are formalized as relationships. • name: insect wing • def: “A membranous dorsal appendage of the meso- or metathorax that functions in flight.” [Snodgrass, 1935] • is_a: appendage • relationship: part_of thoracic segment • relationship: has_function_in flight
Classification is transitive • If A SubClassOf* B and B SubClassOf C then A SubClassOf C • All members of class A are members of class C. So, the definition of class C must apply to class A. * OWL (MS) SubClassOf ≅ OBO is_a
Classification is transitive • ‘material anatomical entity’ <- is_a ‘sense organ’ <- is_a sensillum <- is_a ‘olfactory sensillum’ <- is_a ‘antennal basiconic sensillum’ • ‘material anatomical entity’: “… has mass.” • ‘sense organ’: “… functions in the detection of a stimulus involved in sensory perception.” • sensillum: “A sense organ consisting of a small cluster of cells of various types.” • ‘olfactory sensillum’: “… functions in the detection of smell” * OWL (MS) SubClassOf ≅ OBO is_a
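The inheritance of definitions along an is_a chain can be sketched in a few lines of Python. This is only an illustration: the dictionaries `IS_A` and `DEFINITIONS` and both function names are hypothetical, the definition fragments are the ones quoted on the slide, and each class is given a single parent to keep the walk simple.

```python
# Hypothetical is_a edges taken from the chain on the slide (child -> parent).
IS_A = {
    "antennal basiconic sensillum": "olfactory sensillum",
    "olfactory sensillum": "sensillum",
    "sensillum": "sense organ",
    "sense organ": "material anatomical entity",
}

# Definition fragments quoted from the slide.
DEFINITIONS = {
    "material anatomical entity": "has mass",
    "sense organ": "functions in the detection of a stimulus "
                   "involved in sensory perception",
    "sensillum": "a sense organ consisting of a small cluster "
                 "of cells of various types",
    "olfactory sensillum": "functions in the detection of smell",
}


def ancestors(term):
    """Return every superclass of `term`, nearest first, by following is_a."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def inherited_definitions(term):
    """Definitions that must apply to `term` because they apply to its ancestors."""
    return [DEFINITIONS[a] for a in ancestors(term) if a in DEFINITIONS]


if __name__ == "__main__":
    for text in inherited_definitions("antennal basiconic sensillum"):
        print("must satisfy:", text)
```

Every definition printed for ‘antennal basiconic sensillum’ is one a curator would have to check by hand if the chain were not followed automatically.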
class – class relationships are quantified • Class:Class relationships are many to many • Does the relation apply to all or just some of the class? • We specify this with quantifiers: • ∀: for all, all, only, every • ∃: there exists, some • Cautionary note – • Modeling knowledge as class hierarchies defined with quantified logic is extremely useful but limited. • Don’t expect to be able to use it for everything you know! • Expressivity of OWL is more limited still.
relationships specify necessary conditions for class membership • Being part of an insect thorax is a necessary condition of being in the class ‘insect leg’. • English: • All insect legs are part of some (type of) insect thorax • OBO (quantifiers hidden) • name: insect leg • relationship: part_of thorax • OWL (MS): • ‘insect leg’ SubClassOf part_of some thorax • PL: • ∀x (leg(x) → ∃y (thorax(y) ∧ part_of(x, y)))* * ignoring time argument from OBO RO 2005
Classification is transitive • If A SubClassOf* B and B SubClassOf C then A SubClassOf C • All members of class A are members of class C. So, the definition of class C must apply to class A. • (all) leg part_of some thorax • ‘front leg’ SubClassOf leg • therefore (all) ‘front leg’ part_of some thorax * OWL (MS) SubClassOf ≅ OBO is_a
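The ‘front leg’ inference can be reproduced with a small sketch that collects relationships from all superclasses. The data structures and function names below are hypothetical; the sketch allows several parents per class but does nothing beyond copying existential (“some”) relationships down the hierarchy.

```python
# Hypothetical asserted hierarchy (child -> list of parents) and relationships.
IS_A = {"front leg": ["leg"]}
RELATIONSHIPS = {"leg": [("part_of", "thorax")]}


def superclasses(term):
    """All direct and indirect superclasses of `term`."""
    seen = []
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def all_relationships(term):
    """Relationships asserted on `term` plus those inherited via is_a."""
    result = list(RELATIONSHIPS.get(term, []))
    for parent in superclasses(term):
        for rel in RELATIONSHIPS.get(parent, []):
            if rel not in result:
                result.append(rel)
    return result


if __name__ == "__main__":
    for relation, filler in all_relationships("front leg"):
        print(f"(all) front leg {relation} some {filler}")
```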
Directionality and quantifiers • True: all ‘insect wing’ part_of some ‘insect thorax’ • False: all ‘insect thorax’ has_part some ‘insect wing’ • True: all ‘claw’ connected_to some ‘tarsal segment’ • False: all ‘tarsal segment’ connected_to some claw
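Why the all-some statement holds in one direction only is easiest to see with individuals. The sketch below uses a made-up data set (two wings on one thorax, plus the thorax of a wingless flea); the names `INDIVIDUALS`, `PART_OF`, `members` and `all_some` are hypothetical.

```python
# Hypothetical individuals and the class each belongs to.
INDIVIDUALS = {
    "wing1": "insect wing",
    "wing2": "insect wing",
    "thorax_of_fly": "insect thorax",
    "thorax_of_flea": "insect thorax",  # fleas are wingless
}

# Instance-level part_of links (part, whole).
PART_OF = {("wing1", "thorax_of_fly"), ("wing2", "thorax_of_fly")}

# has_part is the inverse of part_of.
HAS_PART = {(whole, part) for part, whole in PART_OF}


def members(cls):
    """Individuals asserted to belong to class `cls`."""
    return [name for name, c in INDIVIDUALS.items() if c == cls]


def all_some(subject_cls, pairs, object_cls):
    """True if every member of subject_cls is linked to some member of object_cls."""
    targets = set(members(object_cls))
    return all(
        any((s, t) in pairs for t in targets)
        for s in members(subject_cls)
    )


if __name__ == "__main__":
    print(all_some("insect wing", PART_OF, "insect thorax"))   # wing -> thorax
    print(all_some("insect thorax", HAS_PART, "insect wing"))  # thorax -> wing
```

The second check fails because a single wingless thorax is enough to falsify “all ‘insect thorax’ has_part some ‘insect wing’”, even though every wing in the data set has a thorax.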
Manually maintaining an ontology with multiple classification schemes is impractical • It is difficult to keep track of multiple classification chains to: • ensure completeness; • avoid redundancy; • avoid introducing error due to inheritance of classification criteria from a distant ancestor
Automating multiple classification. • The scientific knowledge an ontology contains can make the reasons for classification explicit. • e.g. • Any sense organ that functions in the detection of smell is an olfactory sense organ • All large basiconic sensilla of the antenna function in detection of smell • Therefore all large basiconic sensilla of the antenna are olfactory sense organs
Automating multiple classification. • We can specify that some set of necessary conditions for class membership are sufficient to determine class membership • English • Any sense organ that functions in the detection of smell is an olfactory sense organ • OWL (MS): • ‘olfactory sense organ’ EquivalentTo: ‘sense organ’ that has_function_in some ‘detection of chemical stimulus involved in sensory perception of smell’ • OBO • name: olfactory sense organ • intersection_of: sense organ • intersection_of: has_function_in ‘detection of chemical stimulus involved in sensory perception of smell’
Automating multiple classification. • ‘olfactory sense organ’ EquivalentTo: ‘sense organ’ that has_function_in some ‘detection of chemical stimulus involved in sensory perception of smell’ • ‘large basiconic sensillum of antenna’ SubClassOf: ‘sense organ’; SubClassOf has_function_in some ‘detection of chemical stimulus involved in sensory perception of smell’ • Reasoner concludes: ‘large basiconic sensillum of antenna’ SubClassOf ‘olfactory sense organ’ Keene & Waddell, 2007
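The reasoner's conclusion can be imitated with a toy classifier. This is a minimal sketch and not a description of how real OWL reasoners work: it only matches named superclasses and existential restrictions exactly, and the `OntClass` type and all function names are hypothetical.

```python
from dataclasses import dataclass, field

SMELL = "detection of chemical stimulus involved in sensory perception of smell"


@dataclass
class OntClass:
    name: str
    superclasses: set = field(default_factory=set)  # named superclasses (is_a)
    some: set = field(default_factory=set)          # (relation, filler) restrictions
    defined: bool = False  # True: the conditions are necessary AND sufficient


def ancestors(name, onto):
    """All named superclasses of `name`, direct or indirect."""
    seen, stack = set(), [name]
    while stack:
        for parent in onto[stack.pop()].superclasses:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inherited_restrictions(name, onto):
    """Restrictions asserted on `name` or on any of its ancestors."""
    restrictions = set(onto[name].some)
    for parent in ancestors(name, onto):
        restrictions |= onto[parent].some
    return restrictions


def infer_subclasses(onto):
    """Return (class, defined class) pairs where the class meets the definition."""
    inferred = []
    for d in onto.values():
        if not d.defined:
            continue
        for c in onto.values():
            if c.name == d.name:
                continue
            kinds = ancestors(c.name, onto) | {c.name}
            if d.superclasses <= kinds and d.some <= inherited_restrictions(c.name, onto):
                inferred.append((c.name, d.name))
    return inferred


ONTOLOGY = {c.name: c for c in [
    OntClass("sense organ"),
    OntClass("olfactory sense organ", {"sense organ"},
             {("has_function_in", SMELL)}, defined=True),
    OntClass("large basiconic sensillum of antenna", {"sense organ"},
             {("has_function_in", SMELL)}),
]}

if __name__ == "__main__":
    for sub, sup in infer_subclasses(ONTOLOGY):
        print(f"'{sub}' SubClassOf '{sup}'")
```

Only the defined class is used to pull other classes under it; ‘sense organ’ itself is not classified as olfactory because it carries no smell-detection restriction.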
Use other people’s work to build your classification • Gene Ontology classification of sensory processes:
Some extra OWL expressivity • In OWL we can also specify number (cardinality): • (all) insect: SubClassOf has_component exactly 6 leg
Error checking is essential – everybody makes mistakes • Some classes don’t have instances in common. Nothing can be an oak tree and a fruit fly; an anatomical structure and a biological process. • We say that such classes are disjoint • Declaring classes to be disjoint allows reasoners to find contradictions. This is especially powerful when combined with domain and range constraints. • This is your main means of error checking. Use it extensively. It also speeds up some reasoners.
Error checking - domain and range constraints • ‘cortisol secretion’ SubClassOf ‘endocrine hormone secretion’ SubClassOf process • ‘adrenal gland’ SubClassOf ‘endocrine gland’ SubClassOf structure • structure DisjointWith process (nothing can be both a structure, e.g. adrenal gland, and a process, e.g. cortisol secretion) • has_function_in • domain: structure* • range: process* • if x has_function_in y then x must be a structure and y must be a process. • Now if I mistakenly add: cortisol secretion has_function_in some adrenal gland. • Inconsistency: cortisol secretion SubClassOf structure and process * more strictly, domain = continuant; range = occurrent
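The cortisol example can be turned into a small check. The sketch below is a hand-rolled approximation of what a reasoner does with domain, range and disjointness declarations; the dictionaries and the function `check_axiom` are hypothetical names, and only the single relation from the slide is covered.

```python
# Asserted hierarchy from the slide (child -> parent).
IS_A = {
    "cortisol secretion": "endocrine hormone secretion",
    "endocrine hormone secretion": "process",
    "adrenal gland": "endocrine gland",
    "endocrine gland": "structure",
}
DISJOINT = [("structure", "process")]
DOMAIN = {"has_function_in": "structure"}
RANGE = {"has_function_in": "process"}


def superclasses(term):
    """`term` together with everything above it in the is_a chain."""
    out = {term}
    while term in IS_A:
        term = IS_A[term]
        out.add(term)
    return out


def check_axiom(subject, relation, filler):
    """Errors raised by asserting: subject SubClassOf relation some filler."""
    errors = []
    implied = [
        ("subject", subject, superclasses(subject) | {DOMAIN[relation]}),
        ("filler", filler, superclasses(filler) | {RANGE[relation]}),
    ]
    for role, name, types in implied:
        for a, b in DISJOINT:
            if a in types and b in types:
                errors.append(f"{role} '{name}' would have to be both {a} and {b}")
    return errors


if __name__ == "__main__":
    print(check_axiom("adrenal gland", "has_function_in", "cortisol secretion"))
    for problem in check_axiom("cortisol secretion", "has_function_in", "adrenal gland"):
        print("ERROR:", problem)
```

The correct assertion passes silently, while the reversed one is flagged on both sides: the subject is forced to be a structure and the filler is forced to be a process.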
Reasoner assisted error checking by eye • Keep an eye on classification inferred by the reasoner. • Protégé shows inferred classification and inherited relationships – keep an eye on these
Reasoner assisted error checking by eye • Run some test queries – do they give the answers you expect?
Mereology • part_of is transitive: if A part_of B part_of C part_of D, then A part_of D • overlaps is not transitive: if A overlaps B overlaps C, then A may or may not overlap C [Figure: diagrams with regions labelled A, B, C and D]
Transitivity of part_of • Given • (All) ‘insect coxa’ part_of some ‘insect leg’ • (All) ‘insect leg’ part_of some ‘insect thoracic segment’ • (All) ‘insect thoracic segment’ part_of some ‘insect thorax’ • Then • (All) ‘insect coxa’ part_of some ‘insect thorax’
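The inference above is a transitive closure over the asserted part_of statements. A minimal Python sketch, with hypothetical names and a deliberately naive fixed-point loop that is fine for small examples:

```python
# Asserted (part, whole) statements from the slide.
PART_OF = {
    ("insect coxa", "insect leg"),
    ("insect leg", "insect thoracic segment"),
    ("insect thoracic segment", "insect thorax"),
}


def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


if __name__ == "__main__":
    for part, whole in sorted(transitive_closure(PART_OF) - PART_OF):
        print(f"inferred: (all) '{part}' part_of some '{whole}'")
```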
Automating partonomy • As for classification – maintaining multiple overlapping part hierarchies by hand is hard. • Some scope for auto-populating partonomies – e.g.: • English • Any anatomical structure that functions in endocrine hormone secretion is part of some endocrine system • OWL • (‘anatomical structure’ that has_function_in some ‘endocrine hormone secretion’) SubClassOf (part_of some ‘endocrine system’) • OBO • name: endocrine system component • intersection_of: anatomical structure • intersection_of: has_function_in ‘endocrine hormone secretion’ • relationship: part_of endocrine system
Declaring spatial disjointness provides error checking for partonomy • In OWL: (part_of some X) DisjointWith (part_of some Y)
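A rough idea of how such a declaration catches mistakes, as a Python sketch. The example classes, the deliberately wrong ‘mystery bristle’ entry and all names are hypothetical; a real reasoner would report the offending class as unsatisfiable rather than return a list.

```python
# Hypothetical asserted partonomy (part -> list of wholes).
PART_OF = {
    "insect leg": ["insect thorax"],
    "insect antenna": ["insect head"],
    "mystery bristle": ["insect leg", "insect antenna"],  # deliberate error
}

# Pairs X, Y for which (part_of some X) DisjointWith (part_of some Y) is declared.
SPATIALLY_DISJOINT = [("insect head", "insect thorax")]


def wholes(term):
    """Everything `term` is part of, directly or by transitivity."""
    seen, stack = set(), [term]
    while stack:
        for whole in PART_OF.get(stack.pop(), []):
            if whole not in seen:
                seen.add(whole)
                stack.append(whole)
    return seen


def spatial_conflicts():
    """Classes that end up as part of two spatially disjoint structures."""
    conflicts = []
    for term in PART_OF:
        containing = wholes(term)
        for x, y in SPATIALLY_DISJOINT:
            if x in containing and y in containing:
                conflicts.append((term, x, y))
    return conflicts


if __name__ == "__main__":
    for term, x, y in spatial_conflicts():
        print(f"ERROR: '{term}' is part of both '{x}' and '{y}'")
```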
Reasoning with overlap • A overlaps B if and only if there exists some X such that X part_of A and X part_of B • Rules: • If X part_of A then X overlaps A • If A has_part X then A overlaps X • Relation hierarchy: part_of * overlaps; has_part * overlaps • In OWL (MS) * = SubPropertyOf; in OBO * = is_a [Figure: diagrams with regions labelled A, B and X]
Reasoning with overlap • More rules: • If A has_part X and X part_of B then A overlaps B • If C has_part A and A overlaps B then C overlaps B • If B overlaps A and A part_of C then B overlaps C • In OWL (MS): has_part o part_of -> overlaps • In OBO: • name: overlaps • holds_over_chain: has_part part_of [Figure: diagrams with regions labelled A, B, C and X]
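The sub-property rules and the has_part o part_of chain can be tried out on instance-level data. The sketch below is an assumption-laden illustration: `derive_overlaps` is a hypothetical function, it treats the supplied part_of pairs as complete, it skips reflexive pairs to keep the output short, and it does not implement the two further chain rules listed above.

```python
def derive_overlaps(part_of):
    """Derive overlaps pairs from (part, whole) pairs.

    Rules applied:
      * part_of and has_part are sub-properties of overlaps
      * chain: A has_part X and X part_of B  ->  A overlaps B
    """
    has_part = {(whole, part) for part, whole in part_of}
    overlaps = set(part_of) | has_part
    for a, x in has_part:
        for x2, b in part_of:
            if x == x2 and a != b:  # reflexive pairs skipped for brevity
                overlaps.add((a, b))
    return overlaps


# X is a shared part of A and B; Y is a shared part of B and C.
PART_OF = {("X", "A"), ("X", "B"), ("Y", "B"), ("Y", "C")}

if __name__ == "__main__":
    result = derive_overlaps(PART_OF)
    print(("A", "B") in result)  # A and B share X
    print(("B", "C") in result)  # B and C share Y
    print(("A", "C") in result)  # A and C share nothing
```

The last check also illustrates that overlaps is not transitive: A overlaps B and B overlaps C, yet nothing in the data makes A overlap C.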
[Image credits: Greg Jefferis; Keene & Waddell, 2007]
Shortcut relations • In OWL, we can write compound class expressions: • ‘antennal lobe projection neuron’ has_part some (soma that part_of some ‘antennal lobe cortex’) • But these can quickly get long and verbose: • ‘DL1 adPN’ has_part some (postsynaptic membrane (GO) that part_of some (synapse (GO) that part_of some ‘DL1 glomerulus’))
Shortcut relations • Shortcut relations stand in for compound class expressions. • ‘DL1 adPN’ has_part some (postsynaptic membrane (GO) that part_of some (synapse (GO) that part_of some ‘DL1 glomerulus’)) • becomes: • ‘DL1 adPN’ has_postsynaptic_terminal_in some ‘DL1 glomerulus’ • Can be expanded if detail needed. • Provides rigorous documentation of meaning.
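Expanding a shortcut is essentially template substitution. A minimal sketch, assuming a hypothetical `SHORTCUTS` table that stores one expansion template per shortcut relation; the template text follows the slide's example and the output is a plain string rather than a parsed OWL expression.

```python
# Hypothetical table: shortcut relation -> expansion template.
SHORTCUTS = {
    "has_postsynaptic_terminal_in":
        "has_part some ('postsynaptic membrane' that part_of some "
        "(synapse that part_of some {filler}))",
}


def expand(subject, relation, filler):
    """Rewrite `subject relation some filler`, expanding known shortcuts."""
    template = SHORTCUTS.get(relation)
    if template is None:
        # Not a shortcut: leave the statement as it is.
        return f"{subject} {relation} some {filler}"
    return f"{subject} {template.format(filler=filler)}"


if __name__ == "__main__":
    print(expand("'DL1 adPN'", "has_postsynaptic_terminal_in", "'DL1 glomerulus'"))
    print(expand("'insect leg'", "part_of", "'insect thorax'"))
```

Keeping the template next to the relation name is one way to make the shortcut's meaning checkable rather than leaving it in a free-text comment.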
Where to start? • Make a flat list of the terms you need and list the types of classification you want to use to link them together. • Has someone already formalized this type of classification? • If so, use their pattern. If not – draft some formalizations yourself: • Are any simplifications justifiable – or likely to be too misleading? • DON’T FORMALIZE FOR THE SAKE OF IT! Some classifications are hard to formalize well – or may be best left to human judgment. • Import upper classifications and relations • Import classifications to root for all foreign terms used. • Work with ontologists to formally define relations where possible • But don’t let this become a road block!
Technical issues • Imports: • Importing whole ontologies is easy in both OBO and OWL • But importing large ontologies is impractical in both • Generating simple slices of OBO ontologies is easy (have Perl scripts, happy to share) • Generating slices of OWL ontologies – some tools (Ontofox), but still need work.
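To give a feel for what a simple OBO slice involves, here is a Python sketch that extracts one term together with its is_a ancestors up to the root. It is not the Perl tooling mentioned on the slide: the sample stanzas, the `EX:` identifiers and both function names are made up for illustration, and the parser handles only `[Term]` stanzas with `id`, `name` and `is_a` tags.

```python
SAMPLE_OBO = """\
[Term]
id: EX:0000001
name: material anatomical entity

[Term]
id: EX:0000002
name: sense organ
is_a: EX:0000001 ! material anatomical entity

[Term]
id: EX:0000003
name: sensillum
is_a: EX:0000002 ! sense organ

[Term]
id: EX:0000004
name: insect wing
is_a: EX:0000001 ! material anatomical entity
"""


def parse_terms(text):
    """Parse [Term] stanzas into {id: {"id", "name", "is_a": [...]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {"is_a": []}
        elif line.startswith("["):       # some other stanza type: ignore it
            current = None
        elif not line:
            continue
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ")[0].strip()  # drop trailing comment
            if tag == "id":
                current["id"] = value
                terms[value] = current
            elif tag == "is_a":
                current["is_a"].append(value)
            else:
                current[tag] = value
    return terms


def slice_to_root(terms, start_id):
    """IDs of `start_id` and all of its is_a ancestors."""
    keep, stack = set(), [start_id]
    while stack:
        term_id = stack.pop()
        if term_id in keep or term_id not in terms:
            continue
        keep.add(term_id)
        stack.extend(terms[term_id]["is_a"])
    return keep


if __name__ == "__main__":
    parsed = parse_terms(SAMPLE_OBO)
    for term_id in sorted(slice_to_root(parsed, "EX:0000003")):
        print(term_id, parsed[term_id]["name"])
```

The slice for ‘sensillum’ leaves out ‘insect wing’, which is the point of slicing: only the classification path to the root is imported.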
Developing nested ontologies [Figure: modularized ontology; labels: CARO, VAO, TAO, Present]
Resources • CARO – upper ontology • new version being prepared; out soon. • Some standard patterns using qualities • FUNCARO • provides standard patterns for representing function using CARO + GO • ro.owl • new home for OBO relations – particularly shortcut relations. Imports fundamental relations from BFO (Basic Formal Ontology)
Multiple classification • There are lots of scientifically useful ways to classify a bit of anatomy: • its parts and their arrangement • its relation to other structures • what is it part of, connected to, adjacent to or overlapping? • its shape • its function • its developmental origins • its species or clade • its evolutionary history?