A Real-World Knowledge Engineering Application: The NeuroScholar Project. Gully APC Burns, K. M. Research Group, University of Southern California
Structure of the presentation • Ideas & Concepts • Design • Implementation • Demonstration
I. Ideas & Concepts In which we are reminded of what most people think knowledge is, how it is currently used (and misused) and how we might improve matters.
What does the word ‘Knowledge’ mean? • Main Entry: knowl·edge • Pronunciation: 'nä-lij • Function: noun • Etymology: Middle English knowlege, from knowlechen to acknowledge, irregular from knowen • Date: 14th century • 1 obsolete: COGNIZANCE • 2 a (1): the fact or condition of knowing something with familiarity gained through experience or association; (2): acquaintance with or understanding of a science, art, or technique. b (1): the fact or condition of being aware of something; (2): the range of one's information or understanding <answered to the best of my knowledge>. c: the circumstance or condition of apprehending truth or fact through reasoning: COGNITION. d: the fact or condition of having information or of being learned <a man of unusual knowledge>. • 3 archaic: SEXUAL INTERCOURSE • 4 a: the sum of what is known: the body of truth, information, and principles acquired by mankind. b archaic: a branch of learning. [from http://www.m-w.com/]
The published literature … is the end-product of research and as such forms the basis for human understanding of the subject … is very valuable. … is structured. … is interpretable. Image taken from U.S. Geological Survey Energy Resource Surveys Program
The published literature … is large and unwieldy. … has varying reliability. … is inconsistent. … is based on natural language. … is difficult to process automatically. … is terse. … is qualitative. … is 2-D.
The published literature … is a valid target for attack with informatics-based methods. This permits: (a) increased clarity through formalization; (b) large-scale data-handling capability; (c) analysis of existing data to examine how it is organized.
A semantic continuum [Mike Uschold, Boeing Corp] (diagram): Implicit (shared human consensus) → Informal, explicit (text descriptions) → Formal, for humans (semantics hardwired; used at runtime) → Formal, for machines (semantics processed and used at runtime). The diagram marks three points along this continuum: “The current status of ‘theory’ in Neuroscience”, “How we would like neuroscientists to think” and “Where we would like to work”. • Further to the right means: • Less ambiguity • More likely to have correct functionality • Better inter-operation (hopefully) • Less hardwiring • More robust to change • More difficult
What’s wrong with this picture? …from a neuroscientist’s point of view… • Number of structures = 500 x 2 • Number of cell groups per structure = 10 • Number of possible connections between cell groups = 10,000 x 10,000 = $10^{8}$ • Estimated number of connections between cell groups = 250,000 • From Swanson (1998), “Brain Maps, Structure of the Rat Brain”, 2nd edition, Elsevier, Amsterdam.
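The slide's figures can be checked with a little arithmetic. The Java sketch below is illustrative only (the class and method names are ours), and the density it computes is derived from the slide's numbers rather than quoted from Swanson (1998).

```java
/** Back-of-envelope check of the connectivity figures on this slide (illustrative names). */
final class ConnectivityEstimate {
    static final long STRUCTURES = 500L * 2;            // "500 x 2" structures
    static final long CELL_GROUPS_PER_STRUCTURE = 10;   // cell groups per structure
    static final long ESTIMATED_CONNECTIONS = 250_000;  // estimated connections between cell groups

    /** 1,000 structures x 10 cell groups = 10,000 cell groups. */
    static long cellGroups() {
        return STRUCTURES * CELL_GROUPS_PER_STRUCTURE;
    }

    /** Every ordered pair of cell groups (self-pairs included): 10,000 x 10,000 = 10^8. */
    static long possibleConnections() {
        long n = cellGroups();
        return n * n;
    }

    /** Fraction of the possible connections that are estimated to exist. */
    static double connectionDensity() {
        return (double) ESTIMATED_CONNECTIONS / possibleConnections();
    }
}
```

Here connectionDensity() evaluates to 0.0025: roughly one in every 400 possible cell-group pairs is estimated to be connected, so the network is sparse but still contains a quarter of a million connections.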
… it’s even worse than that … • Neuroscience is extremely multidisciplinary. • Spatial scales of measurement: $10^{1}$ m to $10^{-9}$ m • Temporal scales of measurement: 70 yrs ($2.21 \times 10^{9}$ s) to $10^{-3}$ s (not even including evolutionary time!) • Study occurs in a heterogeneous theoretical framework involving: • Anatomy, Physiology, Psychology, Ethology, Biochemistry (Molecular Biology, Genetics, Bioinformatics), Biophysics, Behavioural Ecology, Biology … to name a few… • All of these subjects are specialized, which makes it hard to link work between disciplines and across levels.
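To make the two ranges comparable, the sketch below converts 70 years to seconds and counts the orders of magnitude each range spans. It is illustrative Java with names of our own choosing, and it assumes a 365.25-day (Julian) year.

```java
/** Converts the slide's measurement ranges into comparable figures (illustrative names). */
final class MeasurementScales {
    // Julian year: 365.25 days x 86,400 seconds per day (an assumption for this sketch).
    static final double SECONDS_PER_YEAR = 365.25 * 86_400;

    static double yearsToSeconds(double years) {
        return years * SECONDS_PER_YEAR;
    }

    /** Decimal orders of magnitude spanned between two positive quantities in the same unit. */
    static double ordersOfMagnitude(double largest, double smallest) {
        if (largest <= 0 || smallest <= 0) {
            throw new IllegalArgumentException("quantities must be positive");
        }
        return Math.log10(largest / smallest);
    }
}
```

yearsToSeconds(70) gives 2,209,032,000 s, the slide's $2.21 \times 10^{9}$ s; ordersOfMagnitude(1e1, 1e-9) is 10 for the spatial range, and ordersOfMagnitude(yearsToSeconds(70), 1e-3) is about 12.3 for the temporal range.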
… & it’s even worse than that!!! • Neuroanatomical nomenclature is the closest thing that neuroscience has to a standardized framework… • In any given paper, the same name may be used for different structures, or different names may be used for the same structure. • e.g., ‘Globus Pallidus, pars medialis (GPm)’ is also called the ‘Entopeduncular Nucleus’ by others. • See the index of Swanson (1998), “Brain Maps, Structure of the Rat Brain”, 2nd edition, Elsevier, Amsterdam, for a list of synonyms according to one source.
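Different names for the same structure can be handled with a lookup table that maps every known name to one preferred name. The sketch below is hypothetical: the class and method names are ours, and treating ‘Globus Pallidus, pars medialis’ as the preferred name for GPm is an arbitrary choice. The opposite problem, one name used for different structures, cannot be solved by lookup, so the table rejects it instead of silently overwriting.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical synonym table: every registered name resolves to one preferred structure name. */
final class NomenclatureTable {
    private final Map<String, String> preferredByName = new HashMap<>();

    /** Registers a structure's preferred name together with the synonyms other authors use. */
    void register(String preferredName, String... synonyms) {
        bind(preferredName, preferredName);
        for (String synonym : synonyms) {
            bind(synonym, preferredName);
        }
    }

    /** Looks up the preferred name for any registered name, ignoring case and outer whitespace. */
    Optional<String> resolve(String name) {
        return Optional.ofNullable(preferredByName.get(normalize(name)));
    }

    private void bind(String name, String preferredName) {
        String existing = preferredByName.putIfAbsent(normalize(name), preferredName);
        if (existing != null && !existing.equals(preferredName)) {
            // One name for two structures cannot be resolved by lookup alone, so refuse it.
            throw new IllegalArgumentException("'" + name + "' already refers to '" + existing + "'");
        }
    }

    private static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }
}
```

For example, after register("Globus Pallidus, pars medialis", "GPm", "Entopeduncular Nucleus"), both "GPm" and "entopeduncular nucleus" resolve to the same preferred name.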
We restrict the problem space with a specific strategy that makes it soluble • Describe a given phenomenon (e.g., the stress response). • Identify which populations of neurons are involved in the phenomenon (i.e., any neurons that turn on, turn off, change their firing, affect the phenomenon if messed with, etc.). • Represent how these populations of neurons are interconnected. • Represent the dynamic processes of these neurons that underlie the phenomenon.
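The first three steps of this strategy map naturally onto a small graph: a phenomenon, a set of neuron populations, and directed connections between them. The sketch below is hypothetical (class, method and population names are ours) and leaves out the fourth step, the dynamics, which would need a behavioural model rather than a graph.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of steps 1-3: a phenomenon, its neuron populations and their connections. */
final class PhenomenonModel {
    /** A directed connection from one population to another. */
    record Connection(String from, String to) {}

    private final String phenomenon;
    private final Set<String> populations = new LinkedHashSet<>();
    private final Set<Connection> connections = new LinkedHashSet<>();

    /** Step 1: name the phenomenon being described. */
    PhenomenonModel(String phenomenon) {
        this.phenomenon = phenomenon;
    }

    String phenomenon() {
        return phenomenon;
    }

    /** Step 2: record a population involved in the phenomenon. */
    void addPopulation(String population) {
        populations.add(population);
    }

    /** Step 3: connect two populations that have already been identified. */
    void connect(String from, String to) {
        if (!populations.contains(from) || !populations.contains(to)) {
            throw new IllegalArgumentException("unknown population in " + from + " -> " + to);
        }
        connections.add(new Connection(from, to)); // duplicates are ignored by the set
    }

    /** Populations that receive a direct connection from the given population. */
    Set<String> targetsOf(String from) {
        Set<String> targets = new LinkedHashSet<>();
        for (Connection c : connections) {
            if (c.from().equals(from)) {
                targets.add(c.to());
            }
        }
        return targets;
    }
}
```

Requiring both populations to be identified before they can be connected mirrors the order of the strategy: step 3 builds on step 2.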
A Construct: ‘A Knowledge Model’ • = A personalized representation of an individual’s knowledge. • e.g., a review article is an example of a non-computational knowledge model.
Another Construct: ‘Knowledge Landscape’ • = A map of Knowledge Models (where each KM is timestamped) • e.g., a list of the best reviews of a given subject over time is an example of a non-computational knowledge landscape.
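A knowledge landscape can be sketched as knowledge models indexed by timestamp. In this hypothetical Java sketch a model is reduced to an owner, a timestamp and a title (the owner field reflects the idea that a knowledge model is personal); snapshotAt answers the question "what was each person's latest model at a given date?".

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch: a knowledge landscape as a time-ordered collection of knowledge models. */
final class KnowledgeLandscape {
    /** A knowledge model reduced to its owner, timestamp and title. */
    record KnowledgeModel(String owner, Instant timestamp, String title) {}

    private final NavigableMap<Instant, List<KnowledgeModel>> byTime = new TreeMap<>();

    void add(KnowledgeModel model) {
        byTime.computeIfAbsent(model.timestamp(), t -> new ArrayList<>()).add(model);
    }

    /** All models, oldest first. */
    List<KnowledgeModel> chronological() {
        List<KnowledgeModel> all = new ArrayList<>();
        byTime.values().forEach(all::addAll);
        return all;
    }

    /** The latest model from each owner with a timestamp at or before {@code when}. */
    Collection<KnowledgeModel> snapshotAt(Instant when) {
        Map<String, KnowledgeModel> latestByOwner = new LinkedHashMap<>();
        for (List<KnowledgeModel> models : byTime.headMap(when, true).values()) {
            for (KnowledgeModel model : models) {
                latestByOwner.put(model.owner(), model); // later timestamps overwrite earlier ones
            }
        }
        return latestByOwner.values();
    }
}
```

A sorted map keeps the landscape in chronological order regardless of the order in which models are added.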
II. Design In which all of these high-falutin’ ideas are put into a logical design and it becomes clear that the design criteria of the NeuroScholar project distinguish it from pure research in computer science
Some design requirements • In order of importance • Powerful & enabling to neuroscientists in their everyday work • Easy to use! (i.e., free, multi-platform, one-click installation) • Knowledge acquisition / data collation is the rate limiting step • Open-source for future development as an academic project.
Knowledge Landscapes: NeuroScholar screenshot (dummy data)
Knowledge Landscapes: NeuroScholar screenshot (dummy data), annotated with its components: ‘Knowledge Landscape’, ‘Data Collection’, ‘Fragments’, ‘Knowledge Model’, ‘Entities’, ‘Properties’, ‘Relations’, ‘Annotations’
Knowledge Models & examples: ‘Data Collection’ • A set of data fragments • e.g., a publication: Allen GV & DF Cechetto (1993) J Comp Neurol 330:421-438.
‘Fragments’ • Individual pieces of the literature • e.g., descriptions of experimental results: “… Moderate to light terminal labeling was present in the parvocellular portions of the paraventricular nucleus, anterior-hypothalamic nucleus, anterior portion of the lateral hypothalamic area (Figs. 2D, 3B), and in the central nucleus of the amygdala (Fig. 2D)….” From Allen & Cechetto (1993)
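A data collection and its fragments can be represented as a citation plus a set of verbatim text pieces. This is a hypothetical sketch: the class names, the fragment ids and the free-text location field are assumptions, not NeuroScholar's schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: a data collection (e.g. one publication) and the fragments taken from it. */
final class DataCollection {
    /** A verbatim piece of the source, with a free-text note on where it was found. */
    record Fragment(String id, String text, String location) {}

    private final String citation;
    private final Map<String, Fragment> fragments = new LinkedHashMap<>();

    DataCollection(String citation) {
        this.citation = citation;
    }

    String citation() {
        return citation;
    }

    /** Adds a fragment; ids must be unique within the collection. */
    Fragment addFragment(String id, String text, String location) {
        Fragment fragment = new Fragment(id, text, location);
        if (fragments.putIfAbsent(id, fragment) != null) {
            throw new IllegalArgumentException("duplicate fragment id: " + id);
        }
        return fragment;
    }

    Optional<Fragment> fragment(String id) {
        return Optional.ofNullable(fragments.get(id));
    }

    int size() {
        return fragments.size();
    }
}
```

Keeping the fragment text verbatim preserves the link back to the published wording, which later interpretation layers can cite.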
‘Entities’ • Abstract data structures that capture the meaning of a set of fragments within the framework of the NeuroScholar system • e.g., a neuronPopulation object (knowledge type = description; domain type = tract-tracing experiment), shown in the diagram with the labels brainVolumes, experimentalMethod, injectionSite and labeling
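An entity is an interpretation layered over fragments. The hypothetical sketch below keeps the two type attributes shown on the slide, records which fragments the entity was built from, and treats the diagram labels (brainVolumes, experimentalMethod, injectionSite, labeling) as named slots; that slot reading, and all class and method names, are our assumptions.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of an entity: a typed object built from fragments, with named slots. */
final class Entity {
    private final String className;     // e.g. "neuronPopulation"
    private final String knowledgeType; // e.g. "description"
    private final String domainType;    // e.g. "tract-tracing experiment"
    private final Set<String> sourceFragmentIds = new LinkedHashSet<>();
    private final Map<String, String> slots = new LinkedHashMap<>();

    Entity(String className, String knowledgeType, String domainType) {
        this.className = className;
        this.knowledgeType = knowledgeType;
        this.domainType = domainType;
    }

    String className() { return className; }
    String knowledgeType() { return knowledgeType; }
    String domainType() { return domainType; }

    /** Links the entity back to a literature fragment whose meaning it captures. */
    void addSourceFragment(String fragmentId) {
        sourceFragmentIds.add(fragmentId);
    }

    Set<String> sourceFragmentIds() {
        return Collections.unmodifiableSet(sourceFragmentIds);
    }

    /** Sets a named slot, e.g. "injectionSite", to the id of another object. */
    void setSlot(String name, String value) {
        slots.put(name, value);
    }

    Optional<String> slot(String name) {
        return Optional.ofNullable(slots.get(name));
    }
}
```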
‘Relations’ • Rules that link two objects together • e.g., a relation linking ZI and LHA (diagram)
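Relations can be sketched as typed, directed links between object identifiers. The class, the query method and the relation type name used below ("linkedTo") are hypothetical; the slide shows only that ZI and LHA are linked, not what kind of link it is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch: typed, directed relations between objects, queried by their endpoints. */
final class RelationStore {
    /** A typed link from a source object to a target object (type names are illustrative). */
    record Relation(String type, String source, String target) {
        Relation {
            Objects.requireNonNull(type, "type");
            Objects.requireNonNull(source, "source");
            Objects.requireNonNull(target, "target");
        }
    }

    private final List<Relation> relations = new ArrayList<>();

    void add(Relation relation) {
        relations.add(relation);
    }

    /** All relations linking the two objects, in either direction. */
    List<Relation> between(String a, String b) {
        List<Relation> found = new ArrayList<>();
        for (Relation r : relations) {
            boolean forward = r.source().equals(a) && r.target().equals(b);
            boolean backward = r.source().equals(b) && r.target().equals(a);
            if (forward || backward) {
                found.add(r);
            }
        }
        return found;
    }
}
```

After store.add(new RelationStore.Relation("linkedTo", "ZI", "LHA")), the query store.between("LHA", "ZI") returns that one relation, because between ignores direction while the stored relation keeps it.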
‘Summaries’ • Sets of objects and relations, explicitly selected and prioritized within the system • e.g., neuronPopulation1 and neuronPopulation2 (diagram)
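A summary is described as an explicit selection with priorities. One simple reading, sketched hypothetically below, is a map from object id to an integer priority with a deterministic ranking for display; the integer scale and the alphabetical tie-break are our assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a summary as an explicitly selected, prioritized set of object ids. */
final class Summary {
    private final Map<String, Integer> priorityById = new HashMap<>();

    /** Selects an object (entity or relation) with a priority; higher numbers rank first. */
    void select(String objectId, int priority) {
        priorityById.put(objectId, priority);
    }

    void deselect(String objectId) {
        priorityById.remove(objectId);
    }

    /** Selected ids, highest priority first; ties are broken alphabetically for a stable order. */
    List<String> ranked() {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(priorityById.entrySet());
        entries.sort((x, y) -> {
            int byPriority = Integer.compare(y.getValue(), x.getValue());
            return byPriority != 0 ? byPriority : x.getKey().compareTo(y.getKey());
        });
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            ids.add(e.getKey());
        }
        return ids;
    }
}
```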
‘Annotations’ • Human-interpretable text that makes the contents of the knowledge base understandable
Architecture diagrams (built up over four slides): Distributed Online Sources of Information; ‘Fragments’; Local Implementation; Users’ Spaces & Models; Centralized Published Knowledge Repository; ‘Pending Review’; P2P sharing; Knowledge Model Comparison
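The diagrams suggest that knowledge models move from a user's space, through a ‘Pending Review’ stage, into the centralized published repository. The sketch below models that as a tiny state machine; the state names, the class names and the option of returning a model to its user are assumptions, not details given on the slides.

```java
/** Assumed states of a knowledge model on its way to the central repository. */
enum SubmissionState { LOCAL, PENDING_REVIEW, PUBLISHED }

/** Hypothetical sketch of the review workflow implied by the 'Pending Review' stage. */
final class Submission {
    private SubmissionState state = SubmissionState.LOCAL;

    SubmissionState state() {
        return state;
    }

    /** The user sends a locally held model for review. */
    void submitForReview() {
        require(SubmissionState.LOCAL);
        state = SubmissionState.PENDING_REVIEW;
    }

    /** A reviewer accepts the model into the centralized published repository. */
    void approve() {
        require(SubmissionState.PENDING_REVIEW);
        state = SubmissionState.PUBLISHED;
    }

    /** A reviewer sends the model back to the user's space for revision (assumed option). */
    void returnToUser() {
        require(SubmissionState.PENDING_REVIEW);
        state = SubmissionState.LOCAL;
    }

    private void require(SubmissionState expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }
}
```

Rejecting out-of-order transitions keeps unreviewed material out of the published repository.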
Knowledge Model Comparison • Given two users A & B, with Knowledge Models $K_A$ & $K_B$ being shared under the P2P model. • We want A to be able to run a program that automatically compares $K_B$ to $K_A$ so that the discrepancies and contradictions between the two models can be understood and reconciled.
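One way to frame the comparison is to reduce each model to a set of atomic claims and compare the sets. In the hypothetical sketch below, a claim is a subject-relation-object triple that is either asserted or denied; a discrepancy is a claim held by only one model, and a contradiction is a claim in $K_A$ whose negation is in $K_B$. The slides do not describe NeuroScholar's comparison at this level of detail, so all names here are ours.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: comparing two knowledge models reduced to sets of atomic claims. */
final class KnowledgeModelComparison {
    /** A claim that a relation does (asserted = true) or does not (false) hold between two objects. */
    record Claim(String subject, String relation, String object, boolean asserted) {
        Claim negated() {
            return new Claim(subject, relation, object, !asserted);
        }
    }

    /** Claims unique to each model, plus claims in K_A whose negation appears in K_B. */
    record Report(Set<Claim> onlyInA, Set<Claim> onlyInB, Set<Claim> contradictions) {}

    static Report compare(Set<Claim> kA, Set<Claim> kB) {
        Set<Claim> onlyInA = new LinkedHashSet<>(kA);
        onlyInA.removeAll(kB);
        Set<Claim> onlyInB = new LinkedHashSet<>(kB);
        onlyInB.removeAll(kA);
        Set<Claim> contradictions = new LinkedHashSet<>();
        for (Claim claim : kA) {
            if (kB.contains(claim.negated())) {
                contradictions.add(claim);
            }
        }
        return new Report(onlyInA, onlyInB, contradictions);
    }
}
```

In this sketch a contradiction also shows up as a discrepancy, because the asserted and denied versions of a claim are different set elements; a fuller tool would probably report the two separately.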
What’s wrong with this picture? …from a computer scientist’s point of view… • Where is the formal logic? It’s o.k. if we only export knowledge models to a formal logic-based representation rather than basing our entire approach on it. Knowledge Acquisition is the rate-limiting step!
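Exporting to a formal representation, rather than building on one, can be as simple as serializing relations. RDF is one of the export targets listed under the implementation choices; the sketch below writes N-Triples, RDF's line-based syntax. The example.org namespace, the class names and the restriction on local names are assumptions made to keep the example short.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: exporting relations as RDF N-Triples under a placeholder namespace. */
final class TripleExport {
    // example.org is reserved for examples; a real export would use a project namespace.
    private static final String BASE = "http://example.org/neuroscholar/";
    // Restricting local names keeps the IRIs valid without any escaping.
    private static final Pattern LOCAL_NAME = Pattern.compile("[A-Za-z0-9_-]+");

    record Triple(String subject, String predicate, String object) {}

    static String iri(String localName) {
        if (!LOCAL_NAME.matcher(localName).matches()) {
            throw new IllegalArgumentException("unsupported local name: " + localName);
        }
        return "<" + BASE + localName + ">";
    }

    /** One line per triple: subject, predicate and object IRIs followed by " ." */
    static String toNTriples(List<Triple> triples) {
        StringBuilder out = new StringBuilder();
        for (Triple t : triples) {
            out.append(iri(t.subject())).append(' ')
               .append(iri(t.predicate())).append(' ')
               .append(iri(t.object())).append(" .\n");
        }
        return out.toString();
    }
}
```

Because the export is a one-way serialization, the object-oriented knowledge model stays the primary store and knowledge acquisition is not slowed down by formal-logic constraints.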
Knowledge Representation • Knowledge representation is a multidisciplinary subject that applies theories and techniques from three other fields: • Logic provides the formal structure and rules of inference. • Ontology defines the kinds of things that exist in the application domain. • Computation supports the applications that distinguish knowledge representation from pure philosophy… • Sowa (2000), Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks Cole Publishing Co., Pacific Grove, CA.
Knowledge Representation • … Without logic, a knowledge representation is vague, with no criteria for determining whether statements are redundant or contradictory. Without ontology, the terms and symbols are ill-defined, confused, and confusing. And without computable models, the logic and ontology cannot be implemented in computer programs. Knowledge representation is the application of logic and ontology to the task of constructing computable models for some domain. • Sowa (2000), Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks Cole Publishing Co., Pacific Grove, CA.
III. Implementation In which the design issues give way to more pressing concerns, like: ‘how are we actually going to build this thing?’
Some implementation choices • Built under a UML-based software engineering paradigm • The View-Primitive-Data-Model framework (‘VPDMf’) • Object-Oriented Design • Unified Modeling Language (UML) • PerlOO • Java • Relational Databases • MySQL • Informix • Exporting Ontologies (via the VPDMf) • XML, RDF, F-logic • Exporting Logic • Embedded within typed Relation objects in the OO knowledge model • Simple method overloading in Java is used to run the Knowledge Model Comparison
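The last bullet says method overloading drives the Knowledge Model Comparison. The sketch below shows the general idea with hypothetical record types: one compare overload per object type, each deciding which fields matter. Because Java selects overloads at compile time from static types, objects retrieved as Object need an explicit dispatch step, done here with instanceof patterns (Java 16 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch: type-specific comparison via Java method overloading. */
final class OverloadedComparison {
    // Illustrative object types; NeuroScholar's real classes are not shown on the slide.
    record BrainVolume(String name) {}
    record NeuronPopulation(String name, String brainVolume) {}
    record Connection(String from, String to, String strength) {}

    static List<String> compare(BrainVolume a, BrainVolume b) {
        List<String> diffs = new ArrayList<>();
        diff(diffs, "name", a.name(), b.name());
        return diffs;
    }

    static List<String> compare(NeuronPopulation a, NeuronPopulation b) {
        List<String> diffs = new ArrayList<>();
        diff(diffs, "name", a.name(), b.name());
        diff(diffs, "brainVolume", a.brainVolume(), b.brainVolume());
        return diffs;
    }

    static List<String> compare(Connection a, Connection b) {
        List<String> diffs = new ArrayList<>();
        diff(diffs, "from", a.from(), b.from());
        diff(diffs, "to", a.to(), b.to());
        diff(diffs, "strength", a.strength(), b.strength());
        return diffs;
    }

    /** Overloads are chosen at compile time, so objects held as Object need explicit dispatch. */
    static List<String> compareAny(Object a, Object b) {
        Objects.requireNonNull(a, "a");
        Objects.requireNonNull(b, "b");
        if (a instanceof BrainVolume x && b instanceof BrainVolume y) return compare(x, y);
        if (a instanceof NeuronPopulation x && b instanceof NeuronPopulation y) return compare(x, y);
        if (a instanceof Connection x && b instanceof Connection y) return compare(x, y);
        return List.of("type: " + a.getClass().getSimpleName() + " vs " + b.getClass().getSimpleName());
    }

    /** Records a human-readable difference when two field values are not equal. */
    private static void diff(List<String> out, String field, Object x, Object y) {
        if (!Objects.equals(x, y)) {
            out.add(field + ": " + x + " vs " + y);
        }
    }
}
```

Adding a new object type then means adding one overload and one dispatch line, without touching the existing comparisons.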
VPDMf System Builder (diagram): UML-based documentation; VPDMf specs (Data Model file & VPDMf XML files); Forward Engineering; DBMS; Reverse Engineering; Final Working System; User Interface Component
Implementation Plan (diagram, built up over three slides): Client and Server sides; VPDMf Client App; VPDMf Admin App; Plugins; Local Apps; VPDMf System Builder; Local Database, Review Database and Main Database; the final build also carries a ‘Demonstration’ label.
Large-scale organization of NeuroScholar’s schema (diagram): data management of publication data; general knowledge management structures; annotations, justifications, judgements; experimental data, general histological data; neuroanatomical tract-tracing data; final output of the system: the knowledge model; components of the knowledge model specific to neuronal data; general data constructs used throughout the system
Diagram: ViewLink, ViewDefinitionArticle, ViewDefinitionFragment
Additional Functionality: Specialized Form Controls & Plugins • The Article Robot Form Control: uses PubMed to retrieve citation information easily • The Fragmenter Plugin: allows fragments to be delineated on PDF files • The AtlasMapper Plugin: allows regions to be delineated on brain maps
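The slide does not say how the Article Robot queries PubMed. One plausible route is NCBI's public E-utilities service; the hypothetical sketch below only builds an ESearch URL from PubMed field tags ([au] author, [dp] publication date, [ta] journal title abbreviation) and makes no network call. The class and method names are ours.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: building an NCBI E-utilities ESearch URL to look up a citation in PubMed. */
final class PubMedQuery {
    private static final String ESEARCH =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    /**
     * Combines author, year and journal into one PubMed query term. The PMIDs returned
     * by ESearch could then be passed to ESummary or EFetch for the full citation.
     */
    static String esearchUrl(String author, int year, String journalAbbreviation) {
        String term = author + "[au] AND " + year + "[dp] AND " + journalAbbreviation + "[ta]";
        return ESEARCH + "?db=pubmed&term=" + URLEncoder.encode(term, StandardCharsets.UTF_8);
    }
}
```

For the Allen & Cechetto (1993) example, esearchUrl("Allen GV", 1993, "J Comp Neurol") encodes the spaces as "+" and the brackets as "%5B" and "%5D".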
IV. Demonstration In which the truth is finally revealed