
The Ocelot Frame Knowledge Representation System

The Ocelot Frame Knowledge Representation System. Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International pkarp@ai.sri.com. Frame Knowledge Representation Systems. Long history of development in the AI knowledge representation community


Presentation Transcript


  1. The Ocelot Frame Knowledge Representation System Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International pkarp@ai.sri.com

  2. Frame Knowledge Representation Systems • Long history of development in the AI knowledge representation community • Distant cousin of object-oriented databases (convergent evolution) • Background reading on frame systems • P. Karp, “The design space of frame knowledge representation systems” • http://www.ai.sri.com/pubs/files/236.pdf • P. Karp, “Distinguishing Knowledge Bases and Data Bases: Who's on First and What's on Second” • http://www.ai.sri.com/pubs/files/1397.pdf

  3. Ocelot Information • P.D. Karp et al., “A collaborative environment for authoring large knowledge bases,” J Intelligent Information Systems 13:155-94, 1999. http://www.ai.sri.com/pkarp/pubs/99jiis.pdf • “Ocelot User’s Guide” http://www.ai.sri.com/pkarp/ocelot/

  4. Ocelot Data Model • Ocelot database • Also known as a DB, knowledge base (KB), or Pathway/Genome Database (PGDB) • An Ocelot database is a collection of frames and slots

  5. Ocelot Frames • Two kinds of frames: • Classes: Genes, Pathways, Biosynthetic Pathways • Instances (objects): trpA, TCA cycle • A symbolic frame name (id, key) uniquely identifies each frame • Examples: EG10223, TRP, Proteins • Classes have Superclass(es), Subclass(es), Instance(s) • Instances have one or more parent classes

  6. Slots • Encode attributes and properties of a frame • Molecular weight, gene coordinates, comments • Represent relationships between frames • The value of a slot is the identifier of another frame

  7. Slots • Number of values • Single valued • Multivalued: sets or lists • Slot values • Integer, real, string, symbol (frame name) • Every slot is described by a “slot frame” (slotunit) in a KB that defines meta information about that slot • Datatype, classes it pertains to, constraints • Enumerations • Two slots are inverses if they encode opposite relationships • Slot Product in class Genes • Slot Gene in class Polypeptides
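
The frame and slot model described on the slides above can be illustrated with a minimal Python sketch. It is not Ocelot's implementation (Ocelot is written in Common Lisp); the `Frame` and `KB` classes, their method names, and the polypeptide id `EXAMPLE-MONOMER` are hypothetical names chosen for illustration. The sketch shows uniquely named frames, multivalued slots, and a pair of inverse slots (`Product` on genes, `Gene` on polypeptides) kept in sync.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame: a uniquely named class or instance with slot values."""
    name: str
    is_class: bool = False
    parents: list = field(default_factory=list)  # names of parent classes
    slots: dict = field(default_factory=dict)    # slot name -> list of values


class KB:
    """A toy knowledge base: a collection of frames plus inverse-slot pairs."""

    def __init__(self):
        self.frames = {}
        self.inverses = {}  # slot name -> name of its inverse slot

    def create_frame(self, name, is_class=False, parents=()):
        # Frame names act as keys, so they must be unique within the KB.
        if name in self.frames:
            raise ValueError(f"frame name {name!r} is already in use")
        self.frames[name] = Frame(name, is_class, list(parents))
        return self.frames[name]

    def declare_inverse(self, slot, inverse):
        self.inverses[slot] = inverse
        self.inverses[inverse] = slot

    def add_value(self, frame, slot, value):
        values = self.frames[frame].slots.setdefault(slot, [])
        if value not in values:
            values.append(value)
        inverse = self.inverses.get(slot)
        # Maintain the inverse link when the value names another frame.
        if inverse and isinstance(value, str) and value in self.frames:
            back = self.frames[value].slots.setdefault(inverse, [])
            if frame not in back:
                back.append(frame)


kb = KB()
kb.create_frame("Genes", is_class=True)
kb.create_frame("Polypeptides", is_class=True)
kb.create_frame("EG10223", parents=["Genes"])
kb.create_frame("EXAMPLE-MONOMER", parents=["Polypeptides"])  # hypothetical id
kb.declare_inverse("Product", "Gene")
kb.add_value("EG10223", "Product", "EXAMPLE-MONOMER")
print(kb.frames["EXAMPLE-MONOMER"].slots["Gene"])
```

Setting `Product` on the gene frame also fills in `Gene` on the polypeptide frame, which is the behavior the inverse-slot bullet describes. A real system would consult the slot frame's declared datatype rather than guessing from the value whether it names a frame.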

  8. Ocelot Schema • Schema is stored within the DB • Schema is self documenting • Slot frames define metadata about slots • Schema evolution facilitated by • Easy addition/removal of slots, or alteration of slot datatypes • Flexible data formats that do not require dumping/reloading of data • New versions of Pathway Tools include a schema upgrade function • Updates schema to match that of new MetaCyc version • Transforms data into new schema

  9. [Figure: multiple users connecting to a single MySQL server]

  10. Ocelot Storage Subsystem • RDBMS KBs • RDBMS schema is independent of application schema • DBMS is submerged within Ocelot, invisible to users • Frames transferred from DBMS to Ocelot • On demand • By background prefetcher • Memory cache • Persistent disk cache speeds up access over the Internet

  11. Ocelot Frame Faulting • When a frame is referenced by Pathway Tools • Look in Ocelot virtual memory • Look in disk cache • Look in RDBMS
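
The lookup order on the frame-faulting slide can be sketched as a three-tier read-through cache. This is an illustrative Python sketch, not Ocelot code: `FrameStore` and `get_frame` are hypothetical names, plain dictionaries stand in for the disk cache and the RDBMS, and copying a frame into the disk cache after an RDBMS fetch is an assumption the slides do not state.

```python
class FrameStore:
    """Three-tier lookup: memory first, then disk cache, then the RDBMS."""

    def __init__(self, disk_cache, rdbms):
        self.memory = {}
        self.disk_cache = disk_cache  # stand-in for the persistent disk cache
        self.rdbms = rdbms            # stand-in for the relational database
        self.log = []                 # records which tier served each lookup

    def get_frame(self, name):
        if name in self.memory:
            self.log.append((name, "memory"))
            return self.memory[name]
        if name in self.disk_cache:
            frame = self.disk_cache[name]
            self.log.append((name, "disk"))
        else:
            frame = self.rdbms[name]       # raises KeyError for an unknown frame
            self.log.append((name, "rdbms"))
            self.disk_cache[name] = frame  # assumption: fill the disk cache on a miss
        self.memory[name] = frame          # later references are served from memory
        return frame


store = FrameStore(
    disk_cache={"TRP": {"name": "TRP"}},
    rdbms={"TRP": {"name": "TRP"}, "EG10223": {"name": "EG10223"}},
)
store.get_frame("EG10223")  # fetched from the RDBMS
store.get_frame("EG10223")  # now served from memory
store.get_frame("TRP")      # served from the disk cache
print(store.log)
```

The background prefetcher mentioned on the storage slide would populate `memory` ahead of demand; it is omitted here.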

  12. Ocelot RDBMS Transaction History • RDBMS KBs store complete transaction history • Stored as sequences of Generic Frame Protocol (GFP) operations executed by the user or by Pathway Tools • Right click -> Show -> Changes in pop-up window • Used to compute gene last-curated date • Can be used to open a PGDB in an earlier state
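
A stored operation log supports both uses named on the transaction-history slide: replaying a prefix of the log reconstructs an earlier state, and scanning it yields the date a frame was last changed. The Python sketch below assumes a simplified log format; the operation names, users, and dates are invented for illustration and are not actual GFP operations or real curation data.

```python
from datetime import date

# Each transaction: (date, user, [operations]), in chronological order.
# Operation names are illustrative, not actual GFP operation names.
history = [
    (date(2001, 3, 1), "alice",
     [("create-frame", "P"), ("put-slot-value", "P", "MW", 3)]),
    (date(2002, 6, 15), "bob", [("put-slot-value", "P", "MW", 4)]),
    (date(2003, 1, 9), "alice", [("create-frame", "Q")]),
]


def apply_op(kb, op):
    """Apply one logged operation to a KB held as {frame: {slot: value}}."""
    kind = op[0]
    if kind == "create-frame":
        kb[op[1]] = {}
    elif kind == "delete-frame":
        kb.pop(op[1], None)
    elif kind == "put-slot-value":
        _, frame, slot, value = op
        kb[frame][slot] = value
    else:
        raise ValueError(f"unknown operation {kind!r}")


def state_as_of(history, cutoff):
    """Rebuild the KB by replaying every transaction dated on or before cutoff."""
    kb = {}
    for when, _user, ops in history:
        if when > cutoff:
            break
        for op in ops:
            apply_op(kb, op)
    return kb


def last_modified(history, frame):
    """Date of the most recent transaction that touched the frame, or None."""
    dates = [when for when, _user, ops in history
             if any(op[1] == frame for op in ops)]
    return max(dates) if dates else None


print(state_as_of(history, date(2001, 12, 31)))
print(last_modified(history, "P"))
```

Replaying from an empty KB is the simplest approach; a production system would more plausibly start from a saved snapshot and replay only the operations after it.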

  13. Ocelot RDBMS Concurrency Control • When user A saves updates: • Ocelot queries all transactions that occurred since A last saved or since the start of A’s session • Ocelot compares the operations in those transactions with the updates made by A • If conflicts are found, save does not occur and conflicts are reported to the user • If no conflicts, save proceeds • Other users’ transactions are applied to A’s session • “Refresh”

  14. Ocelot Update Conflicts • Example conflicting updates: • User A deletes frame F ; User B modifies a slot value of F • User A changes MW of protein P from 3 to 4 ; User B changes MW of protein P from 3 to 5 • Example of updates that don’t conflict: • User A updates frame E ; User B updates frame F • User A updates the value of P.MW ; User B updates the value of P.pI • Users A and B both delete all values of P.MW
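
The conflict examples above suggest a simple pairwise rule, sketched here in Python. This rule is inferred from the five examples on the slide, not taken from Ocelot's source; the operation tuples and the function names `ops_conflict` and `find_conflicts` are hypothetical. The assumed rule: identical operations never conflict, operations on different frames never conflict, deleting a frame conflicts with any other change to it, and otherwise two operations conflict only when they touch the same slot.

```python
def ops_conflict(a, b):
    """Decide whether two logged operations conflict.

    Operations are tuples: ("delete-frame", frame),
    ("put-slot-value", frame, slot, value), or ("clear-slot", frame, slot).
    """
    if a == b:
        return False  # identical updates give the same result
    if a[1] != b[1]:
        return False  # updates to different frames are independent
    if a[0] == "delete-frame" or b[0] == "delete-frame":
        return True   # one user deleted a frame the other user changed
    return a[2] == b[2]  # same frame: conflict only on the same slot


def find_conflicts(my_ops, other_ops):
    """Pair each of my operations with every conflicting operation by others."""
    return [(a, b) for a in my_ops for b in other_ops if ops_conflict(a, b)]


# The slide's examples, with S as a hypothetical slot name.
print(ops_conflict(("delete-frame", "F"), ("put-slot-value", "F", "S", 1)))
print(ops_conflict(("put-slot-value", "P", "MW", 4),
                   ("put-slot-value", "P", "MW", 5)))
print(ops_conflict(("put-slot-value", "P", "MW", 4),
                   ("put-slot-value", "P", "pI", 7)))
print(ops_conflict(("clear-slot", "P", "MW"), ("clear-slot", "P", "MW")))
```

On save, user A's operations would be checked with `find_conflicts` against every operation committed by other users since A's last save; an empty result lets the save proceed, after which the other users' operations are applied to A's session.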

  15. Revert KB Operation • Undoes all changes in current session

  16. Pathway Tools / BioCyc Software/Database Bundles • Each downloadable Pathway Tools configuration contains a combination of PGDBs • Those PGDBs are loaded into Lisp virtual memory • Build process: • Start Common Lisp • Load in all Pathway Tools compiled Lisp code into virtual memory • Load in all PGDBs for that configuration into virtual memory • Save virtual memory image as binary executable file

  17. “Full BioCyc” or Tier 1+2+3 Configuration • 507 PGDBs loaded into virtual memory

  18. BioCyc at 10,000 Genomes • Scalability of current approach is limited • New approach: For full BioCyc, store PGDBs not in virtual memory but in Franz AllegroCache • AllegroCache is a Common Lisp object-oriented database • Implementation now in hand for Ocelot • We have done extensive performance testing • Performance looks good to 10,000 PGDBs
