A proposal for extending metadata in service meta-model including goals, use cases, and model alignment. Discusses browsing ontologies, detecting changes, and ensuring trust. Reviewing existing standards and alignment considerations.
Terminology Metadata Extension of the Service Meta Model SWG Proposal January 2008
Agenda • Background (5 min) • Review Proposed Model (15 min) • Discussion (5 min) • Vote • Next Steps (5 min)
Team Members • Tom Johnson (Mayo) • Frank Hartel (NCI) • George Komatsoulis (NCI) • Sal Mungal (Duke) • Hua Min (Fox Chase) • Scott Oster (OSU) • Mike Riben (MD Anderson) • Brian Davis (3rd Millennium)
Background - Goals • Goals: • Identify metadata queryable at the index service level • Narrow focus for first revision … • Initial model defined to satisfy discovery use cases • Support development of enhanced grid discovery client • Resolve runtime services for terminologies of interest • Additional metadata available through runtime services • Allow/anticipate future expansion
Background – Use Cases • Use Case Collection & Classification of Attributes • Identification • Internationalization • Intended/Allowed Usage • Provenance • Administration
Background - Use Cases • Samples • Browse Existing Ontologies • Viewing Differences • Detecting Recently Added Ontologies • Web of Trust
Background - Use Cases (1) Browse Existing Ontologies • An ontology developer is interested in creating an ontology for a domain (e.g., radiographic anatomy). • Determines whether similar ontologies already exist in that domain. • Evaluates assigned categories for registered ontologies. • Discovers match for “anatomy” • Views available titles and descriptions • Finds listings for “human” and “mouse” anatomy, but not “radiology” • Looks at the human anatomy ontology to see if it fits the need Attributes: category, title, description
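A minimal sketch of the browse step, assuming the registered metadata is available as simple in-memory records. The `TerminologyEntry` class, the `browse_by_category` helper, and the sample entries are hypothetical and only illustrate filtering on category and showing title and description.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TerminologyEntry:
    # Hypothetical record holding only the attributes named in this use case.
    title: str
    description: str = ""
    category: List[str] = field(default_factory=list)


def browse_by_category(entries: List[TerminologyEntry], wanted: str) -> List[Tuple[str, str]]:
    """Return (title, description) pairs for entries tagged with the category."""
    wanted = wanted.lower()
    return [
        (e.title, e.description)
        for e in entries
        if any(c.lower() == wanted for c in e.category)
    ]


# Made-up registry contents for illustration.
registry = [
    TerminologyEntry("Human Anatomy", "Hypothetical human anatomy ontology", ["anatomy"]),
    TerminologyEntry("Mouse Anatomy", "Hypothetical mouse anatomy ontology", ["anatomy"]),
    TerminologyEntry("Gene Functions", "Hypothetical genomic ontology", ["genomic"]),
]

anatomy_hits = browse_by_category(registry, "anatomy")
radiology_hits = browse_by_category(registry, "radiology")
```

Here the "anatomy" query returns the human and mouse entries, while "radiology" returns nothing, mirroring the scenario on the slide.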
Background - Use Cases (2) Viewing Differences • An ontology developer wants to view what has changed between two versions of an ontology. • Retrieves listing of registered terminology services • Sorts by URI, then version • Selects and resolves grid services for differing versions • Invokes runtime services to resolve and compare content Attributes: uri, version
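The sort-and-select steps could be sketched as follows, assuming the index service returns one record per registered terminology service. The record layout, URIs, and endpoints are hypothetical, and versions are compared as plain strings, which is an assumption that only holds for simple version labels.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical index-service listing; URIs, versions, and endpoints are made up.
services = [
    {"uri": "urn:example:mouse-anatomy", "version": "2007", "endpoint": "https://example.org/mouse/2007"},
    {"uri": "urn:example:human-anatomy", "version": "2007", "endpoint": "https://example.org/human/2007"},
    {"uri": "urn:example:human-anatomy", "version": "2006", "endpoint": "https://example.org/human/2006"},
]

# Sort by URI, then version, as described in the use case.
ordered = sorted(services, key=itemgetter("uri", "version"))

# Group the sorted listing so each URI maps to its registered versions.
versions_by_uri = {
    uri: [s["version"] for s in group]
    for uri, group in groupby(ordered, key=itemgetter("uri"))
}

# Only terminologies with more than one registered version can be compared.
comparable = {uri: v for uri, v in versions_by_uri.items() if len(v) > 1}
```

The actual content comparison would then be delegated to the runtime services behind the selected endpoints, which this sketch does not attempt to model.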
Background - Use Cases (3) Detecting Recently Added Ontologies • A user wants to contact the providers for new ontologies registered within the last quarter. • Query registered ontologies by registration date • Pull point of contact information (source, curator, registration authority) from listed items Attributes: registration date, registration authority, source, curator
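A sketch of the date query, assuming "the last quarter" is read as the previous 90 days from a reference date (one possible interpretation). The record layout and the `recently_registered` and `contacts` helpers are hypothetical; only the attribute names come from the use case.

```python
from datetime import date, timedelta


def recently_registered(entries, today, days=90):
    """Return entries whose registrationDate falls within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [e for e in entries if cutoff <= e["registrationDate"] <= today]


def contacts(entry):
    """Pull the point-of-contact attributes listed for this use case."""
    return {k: entry.get(k) for k in ("source", "curator", "registrationAuthority")}


# Made-up registrations for illustration.
entries = [
    {"title": "Ontology A", "registrationDate": date(2007, 12, 1),
     "source": "Source A", "curator": "Curator A", "registrationAuthority": "Authority A"},
    {"title": "Ontology B", "registrationDate": date(2007, 9, 30),
     "source": "Source B", "curator": "Curator B", "registrationAuthority": "Authority B"},
]

recent = recently_registered(entries, today=date(2008, 1, 15))
recent_contacts = [contacts(e) for e in recent]
```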
Background - Use Cases (4) Web of Trust • Quality of ontologies: • User is aware that there are several anatomy ontologies, and is unclear which to use. • Trusts certain ontology sources (anatomists) more than others • Views ontology source to determine content origin • Views intended and example use to consider alignment with application • Considers caBIG certification level Attributes: source, intended use, example use, certification level
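One way a discovery client might support this decision is to filter on trusted sources and then order by certification level. The sketch below assumes the ordering bronze < silver < gold and uses hypothetical records and a hypothetical `rank_candidates` helper; reviewing intended and example use is left to the user.

```python
# Assumed ordering of caBIG certification levels, lowest to highest.
CERT_RANK = {"bronze": 1, "silver": 2, "gold": 3}


def rank_candidates(entries, trusted_sources):
    """Keep entries from trusted sources, highest certification first."""
    trusted = [e for e in entries if e["source"] in trusted_sources]
    return sorted(
        trusted,
        key=lambda e: CERT_RANK.get(e.get("certification"), 0),
        reverse=True,
    )


# Made-up anatomy ontologies for illustration.
candidates = [
    {"title": "Anatomy A", "source": "Anatomy Society", "certification": "silver"},
    {"title": "Anatomy B", "source": "Unknown Vendor", "certification": "gold"},
    {"title": "Anatomy C", "source": "Anatomy Society", "certification": "gold"},
]

ranked = rank_candidates(candidates, trusted_sources={"Anatomy Society"})
```

Entries without a certification value sort last rather than being dropped, since certification is optional in the model.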
Background – Model • Focus of work on … • Model alignment • External … Incorporate feedback from review and alignment with relevant specifications and standards. • Internal … Take better advantage of previously registered models and classes. • Incorporating specific feedback on model classes and attributes.
Background - Alignment • Specifications/standards considered … • Dublin Core • ISO 11179-2/3/6: classification, registries, admin • LexGrid/LexBIG model • National Center for Biomedical Ontology (NCBO) BioPortal • Public Health Information Network (CDC/PHIN) • Simple Knowledge Organization System (SKOS core) • UMLS Rich Release Format (RRF) • CTS/CTS2
Background – Model Alignment • Findings … • No silver bullet • General alignment for defined items • All SWG items and definitions represented conceptually in one or more specifications • Adequate, but not perfect, alignment of semantics • Some name changes • Some new attributes identified • Supplement existing use case • Generally not found to be required unless we add use cases
Model – Core Identification & Description • uri (1) • Unique persistent identifier. • urn:oid:2.16.840.1.113883.6.2 • title (1) • Formal or published name for display. • International Classification of Diseases, 9th… • localName (1..n) • Name used to refer to the terminology within a localized context; often a mnemonic. • ICD-9-CM, ICD-9 • description (0..1) • Human-readable explanation or narrative. • The International Classification of … • category (0..n) • Applicable domains or scientific fields. • e.g. anatomy, genomic, proteomic, phenotype…
Model – Core Identification & Description • type (0..1) • Nature of content relative to the category. • application – describes domain in an application-dependent manner • core – describes domain in an application-independent manner • domain – describes the most important concepts in a domain • task – describes generic types of tasks or activities (e.g. selling, selecting) • upperLevel – describes general, domain-independent concepts (e.g. space, time) • structure (1) • Indicates complexity of maintained relationships • flat – no hierarchy • simple – supports a single-inheritance, mono-hierarchical structure • complex – supports multiple relationships and/or relationship types
Model – Core Identification & Description • defaultLanguage (1) • Language for text unless otherwise specified • eng • supportedLanguage (1..n) • Languages supported for text-based content • eng, spa, … • supportedContentType (1..n) • Supported type of text or embedded multimedia • e.g. MIME type (text/plain, image) • keyword (0..n) • Words or phrases of special significance. • patient record, nursing protocol, …
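The core identification and description attributes, with their cardinalities, could be captured roughly as below. This is an illustrative sketch only: the `CoreDescription` class, its `problems` check, and the sample values are hypothetical and are not the registered model. The field `type_` carries a trailing underscore only to avoid shadowing the Python builtin.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Enumerated values listed on the slides for type and structure.
TYPE_VALUES = {"application", "core", "domain", "task", "upperLevel"}
STRUCTURE_VALUES = {"flat", "simple", "complex"}


@dataclass
class CoreDescription:
    uri: str                          # (1)
    title: str                        # (1)
    localName: List[str]              # (1..n)
    structure: str                    # (1)
    defaultLanguage: str              # (1)
    supportedLanguage: List[str]      # (1..n)
    supportedContentType: List[str]   # (1..n)
    description: Optional[str] = None                    # (0..1)
    category: List[str] = field(default_factory=list)    # (0..n)
    type_: Optional[str] = None                          # (0..1), "type" in the model
    keyword: List[str] = field(default_factory=list)     # (0..n)

    def problems(self) -> List[str]:
        """Return a list of cardinality or enumeration violations."""
        issues = []
        for name in ("uri", "title", "structure", "defaultLanguage"):
            if not getattr(self, name):
                issues.append(f"{name} is required (1)")
        for name in ("localName", "supportedLanguage", "supportedContentType"):
            if len(getattr(self, name)) < 1:
                issues.append(f"{name} needs at least one value (1..n)")
        if self.structure and self.structure not in STRUCTURE_VALUES:
            issues.append(f"structure has unknown value {self.structure!r}")
        if self.type_ is not None and self.type_ not in TYPE_VALUES:
            issues.append(f"type has unknown value {self.type_!r}")
        return issues


# Hypothetical entries: one complete, one missing required values.
complete = CoreDescription(
    uri="urn:example:terminology", title="Example Terminology",
    localName=["EX"], structure="simple", defaultLanguage="eng",
    supportedLanguage=["eng", "spa"], supportedContentType=["text/plain"],
    category=["anatomy"], type_="domain",
)
incomplete = CoreDescription(
    uri="", title="Untitled", localName=[], structure="tree",
    defaultLanguage="eng", supportedLanguage=["eng"], supportedContentType=["text/plain"],
)
```

Calling `complete.problems()` yields an empty list, while `incomplete.problems()` reports the missing uri, the empty localName, and the unrecognized structure value.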
Model - Usage • intendedUse (0..n) • Human-readable description of intended use. • data integration • exampleUse (0..n) • Human-readable example of use. • Integration of protein data. • isRestricted (1) • Indication of intellectual property boundaries. • true • rights (0..n) • Human-readable description of IP rights. • NCI Thesaurus terms of use … • rightsHolder (point of contact) (0..1) • Contact point for intellectual property rights. • National Cancer Institute
Model - Provenance • source (0..1) • Origin or provider of content • National Center for Health Statistics (NCHS) • curator (0..1) • Maintains the content in the release format (e.g. OWL, OBO, RRF) • National Library of Medicine • releaseDate (0..1) • Date of availability in released format. • 2007-08-30 • releaseFormat (0..1) • Format as released by the curator. • e.g. OWL, OBO, RRF • releaseLocation (0..1) • Location of resource in the releaseFormat. • ftp://ftp1.nci.nih.gov/pub/cacore/EVS/NCI_Thesaurus/Thesaurus_07.12a.OWL.zip
Model - Provenance • releasePackage (0..1) • Name of the composite ontology or meta distribution containing the terminology as released. • e.g. UMLS, NCI_MetaThesaurus, BiomedGT • releaseVersion (0..1) • Represented version identifier. • 2007
Model - Administration • registrationAuthority (1) • Responsible for maintaining content on the grid • National Cancer Institute • registrationDate (1) • Date of grid availability or last change of registration status. • 2007-09-30 • registrationStatus (1) • Designation of terminology status in life cycle. • Possible values from 11179-3 registration life cycle status category. • registrationTag (0..1) • Supports lookup by version-agnostic designation • development, test, production • certification (0..1) • caBIG level of compliance. • bronze, silver, gold
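To illustrate the version-agnostic lookup that registrationTag is meant to support, the sketch below resolves a (uri, tag) pair to a registered version. The record layout and the `resolve_by_tag` helper are hypothetical; choosing the most recent registrationDate when several registrations share a tag is an assumption, not something the model specifies.

```python
def resolve_by_tag(registrations, uri, tag):
    """Return the registration for `uri` carrying `tag`, or None if absent.

    If several registrations share the tag, the one with the latest
    registrationDate wins (assumed tie-breaking rule). Dates are ISO 8601
    strings, so string comparison orders them chronologically.
    """
    matches = [
        r for r in registrations
        if r["uri"] == uri and r.get("registrationTag") == tag
    ]
    if not matches:
        return None
    return max(matches, key=lambda r: r["registrationDate"])


# Made-up registrations for a single hypothetical terminology.
registrations = [
    {"uri": "urn:example:terminology", "releaseVersion": "2006",
     "registrationTag": "production", "registrationDate": "2006-09-30"},
    {"uri": "urn:example:terminology", "releaseVersion": "2007",
     "registrationTag": "production", "registrationDate": "2007-09-30"},
    {"uri": "urn:example:terminology", "releaseVersion": "2008",
     "registrationTag": "development", "registrationDate": "2008-01-10"},
]

production = resolve_by_tag(registrations, "urn:example:terminology", "production")
testing = resolve_by_tag(registrations, "urn:example:terminology", "test")
```

A client can then ask for "production" without knowing which release version currently holds that designation.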
Model – Anticipated Alignment against available classes Superclasses Based on 11179
Vote • Vote will be for … • Approval of the identified criteria • Acknowledgement that model will be aligned with existing (e.g. 11179-based) superclasses, with model and attribute details to be addressed as required.
Next Steps • Model harmonization w/ recommended superclasses • Change caGRID tooling to capture additional metadata when registering terminology • Create custom discovery client for terminology services, to take advantage of additional metadata in support of identified use cases