 
                
Tools for Next Generation of CMS: XML, RDF, & GRDDL • Chimezie Ogbuji (chee-meh) Cleveland Clinic Foundation Cardiothoracic Surgery Research ogbujic@ccf.org / chimezie@gmail.com
Background (CT Research Roadmap) • A large, relational registry for Cardiothoracic procedures • Relatively small research department with very little software engineering experience • Traditional CMS and DBMS were insufficient • Initiated a large effort to convert to a metadata-driven XML / RDF repository (SemanticDB) • Need to replace a productive, integrated research pipeline • Data entry, clinical Q&A, patient follow-up, concurrent study management,... • 100+ research papers per year
Background (Institute of Medicine Proposal) • The Computer-Based Patient Record: An Essential Technology for Health Care • ISBN: 0309055326 • An old but still very relevant set of requirements from the IOM (still unfulfilled) • A comprehensive attempt to address all the requirements: technological, clinical, procedural, etc. • Can be (completely) addressed with Semantic Web architecture, document processing, and “Web 2.0” architecture
CPR: Functional Requirements • Uniform, extensible record content • (Standard) record formats • System performance • Linkages • Intelligence • Reporting capabilities • Security • Multi-views • Accessibility
Definitions: KR / CMS • What is Knowledge Representation (KR)? • What is a Knowledge Base (KB)? • A database system which facilitates deductive reasoning over a KR • Commonly called Rule-based Systems • What are Expert Systems? • What is a Content Management System (CMS)?
Knowledge Representation • Older ideas at corners, newer ideas along sides (Credit: Conrad Barski, M.D.)
Content Management System: The What • The terms CMS and Content Repository are essentially interchangeable • Modern content repositories are best characterized by JSR 170 / 283 • “.. a high-level information management system that is a superset of traditional data repositories” • Integrated support for the XPath data model is the most prominent feature (native document management)
Content Repository Feature Set • Modern CMS standards cover document management effectively • Read/write access • Versioning • Event monitoring • Document-level access control • Concurrent access • Cross-linking • Profiles and Document Types
Anatomy of a JSR 170 Implementation • Jackrabbit • Component-based • Content Applications • Content Repository API • Implementation
Knowledge Bases and CMS • What of the requirements that Expert Systems meet? • Document management and knowledge management systems are historically isolated from each other • XML & RDF are contemporary manifestations of these methodologies • They have remained as isolated as their predecessors • They typically only coincide with regards to syntax
XML & RDF: Eating and Having your Cake • Classic example of where the document-oriented approach falls short: • Modern EHR cannot facilitate dynamic research • Unified infrastructure for document and knowledge management is needed • One of the earliest examples: • 4Suite Server version 0.10.0 (December 2000) • Current state of the art (GRDDL): • Gleaning Resource Descriptions from Dialects of Language
GRDDL: The Elevator Pitch • Provides a way to normalize RDF concrete syntaxes • The problem: • Many RDF concrete syntaxes (RDF/XML, TriX, RDFa, ...) • The authoritative concrete syntax is not without issues • The solution: • Define mappings from XML dialects to RDF graphs • Use Turing-complete XML pipelines • English as a second language analogy
GRDDL: The Components • Faithful Rendition • “By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.” • Various mechanisms for nominating transformations: • Specific XML attribute, XML Namespaces, HTML Profiles, and XHTML links • GRDDL-aware agents compute GRDDL results (RDF graphs)
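A minimal sketch of the first nomination mechanism (the XML attribute), assuming a GRDDL-aware agent that only needs to discover which transformations a document names. The namespace URI is the one defined by the GRDDL specification; the sample document, its element names, and the stylesheet URIs are hypothetical, and actually running the nominated XSLT is left out.

```python
import xml.etree.ElementTree as ET

# Namespace of the GRDDL data-view vocabulary (from the GRDDL specification).
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def nominated_transformations(xml_text):
    """Return the transformation URIs named on the root element, in order."""
    root = ET.fromstring(xml_text)
    # The attribute holds a whitespace-separated list of URIs.
    value = root.get("{%s}transformation" % GRDDL_NS, "")
    return value.split()


# Hypothetical source document; element names and URIs are illustrative only.
SAMPLE = """<record xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation="http://example.org/xslt/patient2rdf.xsl
                              http://example.org/xslt/provenance.xsl">
  <procedure>CABG</procedure>
</record>"""

if __name__ == "__main__":
    for uri in nominated_transformations(SAMPLE):
        print(uri)
```

A full agent would dereference each URI, apply the transformation to the source document, and merge the resulting RDF graphs into the GRDDL result.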
The CMS Alternative: “Dual Representation” • Persist XML in synchrony with its faithful rendition • Changes to the XML trigger calculation and storage of corresponding RDF • “Dual Representation” • Implemented by 4Suite Server Document Definitions • The basis of how we capture patient records with maximum syntactic and semantic expressivity
Document Definition • The document definition is the mapping • Usually an XSLT document
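Dual representation with a document definition as the mapping can be sketched roughly as follows. This is not the 4Suite API: `DualStore`, `patient_definition`, and the `EX` vocabulary are hypothetical names, and a plain Python function stands in for the XSLT document definition because the standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/ns/patient#"  # hypothetical vocabulary


def patient_definition(doc_uri, root):
    """Stand-in for an XSLT document definition: XML tree -> set of triples."""
    triples = set()
    for proc in root.findall("procedure"):
        triples.add((doc_uri, EX + "procedure", (proc.text or "").strip()))
    return triples


class DualStore:
    """Keeps each XML document and its RDF rendition under the same URI."""

    def __init__(self):
        self.documents = {}    # URI -> XML source text
        self.graphs = {}       # URI -> set of (subject, predicate, object)
        self.definitions = {}  # URI -> mapping function (membership metadata)

    def put(self, uri, xml_text, definition=None):
        """Create or update a document, then recompute its graph."""
        if definition is not None:
            self.definitions[uri] = definition
        self.documents[uri] = xml_text
        mapping = self.definitions.get(uri)
        root = ET.fromstring(xml_text)
        self.graphs[uri] = mapping(uri, root) if mapping else set()

    def delete(self, uri):
        """Remove the document together with its graph and membership."""
        for table in (self.documents, self.graphs, self.definitions):
            table.pop(uri, None)


if __name__ == "__main__":
    store = DualStore()
    uri = "http://example.org/records/1"
    store.put(uri, "<record><procedure>CABG</procedure></record>",
              patient_definition)
    print(store.graphs[uri])
```

Every write goes through `put`, so the graph cannot drift from its source document; that is the "synchrony" the slides describe, at the cost of storing the content twice.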
Dual Representation: Advantages • Maximum expressiveness and versatility of content • Unified naming convention and access control (more on this later) • Uniform, concrete RDF syntaxes • For systems which speak XML fluently (XForms, POX over HTTP, WS-*, etc.) • Cheap support for XML & RDF content negotiation • Use of RDF as a semantic index for XML
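Because both representations live under one URI, content negotiation reduces to choosing which stored form to return. The sketch below is a simplified illustration under that assumption: `negotiate` and `parse_accept` are hypothetical helpers, and the parsing ignores the specificity precedence rules of real HTTP `Accept` handling.

```python
SUPPORTED = ("application/xml", "application/rdf+xml")


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs."""
    pairs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        pairs.append((parts[0].lower(), q))
    return pairs


def negotiate(header, supported=SUPPORTED):
    """Return the supported media type the client prefers, or None."""
    best, best_q = None, 0.0
    for media_type, q in parse_accept(header):
        for candidate in supported:
            wildcard = candidate.split("/")[0] + "/*"
            matches = media_type in (candidate, wildcard, "*/*")
            if matches and q > best_q:
                best, best_q = candidate, q
    return best


if __name__ == "__main__":
    print(negotiate("application/xml;q=0.5, application/rdf+xml;q=0.9"))
```

A server would then return the XML source for `application/xml` and a serialization of the stored graph for `application/rdf+xml`.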
Document Definition: Similarities • GRDDL • RDDL • Resource Directory Description Language • Human-readable descriptive material about a target • A directory of individual resources related to a target • Nature and Purpose • Schema, stylesheet, etc. • Lives at a namespace URI • WXS's targetNamespace • Common theme is a set of definitions for a document or a class of documents
Registering a Document to a Class • Namespace registration works well for the web (preferred approach of W3C TAG) • What if you don't control the content served from the namespace of an existing vocabulary? • Atom, Docbook, etc. • A CMS is better suited for a 'closed' / 'controlled' approach • Persist membership metadata in the CMS
Document and Graph Granularity • Tying documents to graphs normalizes the content granularity • Documents and their RDF graphs can be treated uniformly: • Naming convention • Targeted querying • Access control management
Controlled Naming Convention: Continued • RDF Dataset (from SPARQL): • A collection of named graphs • The RDF is stored in a graph with the same URI as the XML source document • When RDF is used as the primary cross-document 'index' you can: • SELECT ?graph WHERE { GRAPH ?graph { ... } } • document($graph)/.. XPath .. • The space compromise (of dual representation) can be further mitigated by only extracting a minimal RDF graph
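The two-step pattern on this slide (find graph names with a query, then address the XML behind each name) can be sketched without a SPARQL engine. Everything here is an assumption for illustration: the in-memory `DATASET` and `DOCUMENTS`, the `EX` vocabulary, and the helper names. Simple triple-pattern matching stands in for `SELECT ?graph WHERE { GRAPH ?graph { ... } }`, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/ns/patient#"  # hypothetical vocabulary
DOC1 = "http://example.org/records/1"
DOC2 = "http://example.org/records/2"

# Hypothetical repository content: each graph is named by its document's URI.
DOCUMENTS = {
    DOC1: "<record><procedure>CABG</procedure><surgeon>Dr. A</surgeon></record>",
    DOC2: "<record><procedure>Valve repair</procedure><surgeon>Dr. B</surgeon></record>",
}
DATASET = {
    DOC1: {(DOC1, EX + "procedure", "CABG")},
    DOC2: {(DOC2, EX + "procedure", "Valve repair")},
}


def graphs_matching(dataset, s=None, p=None, o=None):
    """Names of graphs holding a triple that matches the pattern.

    None acts as a wildcard, playing the role of an unbound variable.
    """
    def match(triple):
        return all(want is None or want == got
                   for want, got in zip((s, p, o), triple))
    return sorted(name for name, triples in dataset.items()
                  if any(match(t) for t in triples))


def follow_up(dataset, documents, path, **pattern):
    """Evaluate `path` in every document whose graph matches `pattern`."""
    results = {}
    for name in graphs_matching(dataset, **pattern):
        root = ET.fromstring(documents[name])  # plays the role of document($graph)
        results[name] = [el.text for el in root.findall(path)]
    return results


if __name__ == "__main__":
    print(follow_up(DATASET, DOCUMENTS, "surgeon",
                    p=EX + "procedure", o="CABG"))
```

The graph only needs enough triples to locate the right documents, which is why extracting a minimal RDF graph keeps the storage overhead small.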
Uniform Access Control for XML/RDF CMS • Traditionally, Access Control Lists are associated with an object • Example: a file or directory in a filesystem • Assign document / graph ACLs to a single URI • Certain users / groups can query the RDF but cannot read the XML • De-identification of EHR: HIPAA • The 4Suite repository supports unified XML/RDF ACL
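One way to picture a single ACL per URI that covers both representations is sketched below. This is not the 4Suite ACL model: the class, the permission names `read-xml` and `query-rdf`, and the principals are all hypothetical.

```python
class AccessDenied(Exception):
    """Raised when a principal lacks the required permission."""


class AclRepository:
    """One ACL per URI, guarding both the XML and its RDF graph."""

    def __init__(self):
        self.documents = {}  # URI -> XML source text
        self.graphs = {}     # URI -> set of triples
        self.acls = {}       # URI -> {permission name: set of principals}

    def add(self, uri, xml_text, triples, acl):
        self.documents[uri] = xml_text
        self.graphs[uri] = set(triples)
        self.acls[uri] = {perm: set(who) for perm, who in acl.items()}

    def _check(self, uri, principal, permission):
        allowed = self.acls.get(uri, {}).get(permission, set())
        if principal not in allowed:
            raise AccessDenied("%s lacks %s on %s" % (principal, permission, uri))

    def read_xml(self, uri, principal):
        """Return the full (identified) XML record."""
        self._check(uri, principal, "read-xml")
        return self.documents[uri]

    def query_graph(self, uri, principal):
        """Return a copy of the (de-identified) RDF graph."""
        self._check(uri, principal, "query-rdf")
        return set(self.graphs[uri])


if __name__ == "__main__":
    repo = AclRepository()
    uri = "http://example.org/records/1"
    repo.add(uri,
             "<record><name>Jane Doe</name><procedure>CABG</procedure></record>",
             {(uri, "http://example.org/ns/patient#procedure", "CABG")},
             {"read-xml": {"clinician"}, "query-rdf": {"clinician", "researcher"}})
    print(repo.query_graph(uri, "researcher"))
```

Here the researcher can query the extracted graph, which omits the patient name, while only the clinician can read the source XML.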
Going Forward • The SPARQL RDF dataset needs to be generalized • There is a long list of representation problems solved by a formal named graph specification • RDF graphs need to be first-class objects in CMS • Build a common Content Repository API for XML / RDF on the JSR 170 / 283 foundation • Where do the 4Suite Repository API and JSR 170 / 283 overlap? • How do we generalize Document Definitions?
Primary Takeaways • We need to stop thinking of XML & RDF as mutually exclusive solutions to similar problems • CMS standards are needed for the next generation of semantic / rich web applications • These standards can preemptively level the landscape of toolkits in this space
References • D. Nuescheler et al., JSR 170: Content Repository for Java • http://jcp.org/en/jsr/detail?id=170 • D. Connolly, Gleaning Resource Descriptions from Dialects of Language • http://www.w3.org/TR/grddl/ • J. Borden, T. Bray, Resource Directory Description Language • http://www.rddl.org/ • E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF • http://www.w3.org/TR/rdf-sparql-query/ • Fourthought Inc., 4Suite • http://4Suite.org