AQUA: AQUAINT Question Answering System

AQUA: AQUAINT Question Answering System. SAIC, San Diego; KSL, Stanford

Agenda • System Architecture • SAIC – Current Status and goals • KSL – Current Status and goals • Future plans

SAIC-KSL-NMSU Collaborative System – Long-term goal • [Architecture diagram] QUESTION → NMSU Query Processor → SAIC Interlingua → KIF Translator → KSL Java Theorem Prover → SAIC KIF → Interlingua Translator → NMSU NL Generator → ANSWER • Data passed between components: NL Query, Interlingua Query, KIF Query, KIF Answer, Interlingua Answer, NL Answer

SAIC TASKS

Aligning Ontologies: the AQUA OntologyMapper Tool • Dynamic semantic alignment tool developed to assist the user in aligning ontologies • Automates mapping of the Ontosem (source) ontology to the Ontolingua (target) ontology • Maps classes and instances • Semantic alignment algorithm • Single-word entries • Multi-word entries • Alignment proved to be more difficult than expected.

Single-word matching algorithm • Exact Matches • Same Stem • Synonyms (Exact Matches, Stem Matches) • 1-Generation Child Match • No Match: Display Ontolingua Ontology Tree • Parent Match → Child Match • Parent Match → 3 levels if no Child Match • Multiple Matches: Display and Allow User To Select
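The first steps of this cascade (exact match, same stem, then synonyms tried as exact and stem matches) can be sketched roughly as follows. This is a minimal illustration, not the OntologyMapper code: `naive_stem`, `match_single_word`, and the synonym table are hypothetical, the stemmer only strips a plural "s", and the child-match, tree-display, and user-selection steps are not modeled.

```python
from typing import Dict, List, Set, Tuple


def naive_stem(word: str) -> str:
    """Crude stand-in stemmer (assumption): lowercase and drop a plural 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def match_single_word(
    source: str, targets: List[str], synonyms: Dict[str, Set[str]]
) -> Tuple[str, List[str]]:
    """Try each matching step in order and return (step name, matched targets)."""
    by_lower = {t.lower(): t for t in targets}
    key = source.lower()

    # Step 1: exact match (case-insensitive).
    if key in by_lower:
        return "exact", [by_lower[key]]

    # Step 2: same stem.
    stem_hits = [t for t in targets if naive_stem(t) == naive_stem(source)]
    if stem_hits:
        return "same-stem", stem_hits

    # Step 3: synonyms of the source term, tried as exact matches, then as stems.
    syns = sorted(synonyms.get(key, set()))
    syn_exact = [by_lower[s] for s in syns if s in by_lower]
    if syn_exact:
        return "synonym-exact", syn_exact
    syn_stems = {naive_stem(s) for s in syns}
    syn_stem_hits = [t for t in targets if naive_stem(t) in syn_stems]
    if syn_stem_hits:
        return "synonym-stem", syn_stem_hits

    # Several hits at any step correspond to the "Multiple Matches" case,
    # where the tool lets the user choose; here the list is simply returned.
    return "no-match", []
```

With targets `["Conveyance", "Building", "Person"]` and a hypothetical synonym entry `{"vehicle": {"conveyance"}}`, the source term "Vehicle" falls through to the synonym step and is mapped to "Conveyance".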

Multi-word matching algorithm • Exact Stem Match (A-B-C) • Exact Permutation (B-C-A, B-A-C, …) • Ontolingua Term Contains Subset of Ontosem Words (Constrained to Parent Match Subtree) • Ontosem Term Contains Subset of Ontolingua Term (Constrained to Parent Match Subtree) • 1-Generation Child Match • No Match: Display Ontolingua Ontology Tree • Parent Match → Child Match • Parent Match → 3 levels if no Child Match • Multiple Matches: Display and Allow User To Select
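A rough sketch of the word-level steps for multi-word terms is given below. It assumes terms are hyphen-separated, reads "Ontolingua term contains subset of Ontosem words" as "every word of the target occurs in the source" and the reverse case as "every word of the source occurs in the target", and ignores the parent-subtree constraint. All names are hypothetical.

```python
from typing import List, Tuple


def naive_stem(word: str) -> str:
    """Crude stand-in stemmer (assumption): lowercase and drop a plural 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def stems(term: str) -> List[str]:
    """Split a hyphenated term into stemmed words, keeping their order."""
    return [naive_stem(w) for w in term.split("-") if w]


def match_multi_word(source: str, targets: List[str]) -> Tuple[str, List[str]]:
    """Try each matching step in order and return (step name, matched targets)."""
    src = stems(source)

    # Step 1: same stems in the same order (A-B-C).
    hits = [t for t in targets if stems(t) == src]
    if hits:
        return "exact-stem", hits

    # Step 2: same stems in any order (B-C-A, B-A-C, ...).
    hits = [t for t in targets if sorted(stems(t)) == sorted(src)]
    if hits:
        return "permutation", hits

    # Step 3: the target's words are a proper subset of the source's words.
    hits = [t for t in targets if set(stems(t)) < set(src)]
    if hits:
        return "target-subset", hits

    # Step 4: the source's words are a proper subset of the target's words.
    hits = [t for t in targets if set(src) < set(stems(t))]
    if hits:
        return "source-subset", hits

    return "no-match", []
```

In the tool, the subset steps are restricted to the subtree under the matched parent; adding that restriction here would mean passing only that subtree's terms as `targets`.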

AQUA OntologyMapper Initial Screen

AQUA OntologyMapper No Match – Parent Subtree Displayed

AQUA OntologyMapper Multiple Matches Found

AQUA OntologyMapper

Current Approach to Semantic Alignment • Two levels of analysis in SAIC semantic alignment software: • Linguistic analysis: software matches terms by finding common word roots and combinations of words; also searches for synonyms • Structural analysis: software matches terms by finding common organization hierarchy in meaning representations

Potential Performance-Boosting Techniques • Additional sources of linguistic analysis: • Documentation of concepts (often contains linguistic clues) • Ontolingua: “Conveyance -- vehicle for transporting people” • Ontosem: “Vehicle -- artifacts used for transporting people and cargo” • Additional sources of structural information • Slots/constraints • Additional structure mapping techniques • Such as Similarity Flooding* • Federation of Matchers • Each individual alignment technique matches 15-50% of the terms, with little overlap between techniques • Individual techniques will be combined to find combined matches • * “Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching,” Sergey Melnik, Hector Garcia-Molina, Erhard Rahm
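One simple way to picture a federation of matchers is to run every matcher on a term and record which matchers support each proposed target. The sketch below is only an illustration of that idea; the `federate` function, the matcher names, and the lookup tables are hypothetical, and the slides do not say how the combination will actually be done.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# A matcher maps a source term to the set of target terms it proposes.
Matcher = Callable[[str], Set[str]]


def federate(term: str, matchers: Dict[str, Matcher]) -> Dict[str, List[str]]:
    """Map each proposed target to the names of the matchers that proposed it."""
    support: Dict[str, List[str]] = defaultdict(list)
    for name, matcher in matchers.items():
        for target in matcher(term):
            support[target].append(name)
    return dict(support)


# Hypothetical stand-ins for a linguistic and a structural matcher.
LINGUISTIC = {"Vehicle": {"Conveyance"}}
STRUCTURAL = {"Vehicle": {"Conveyance", "Artifact"}}

matchers = {
    "linguistic": lambda t: LINGUISTIC.get(t, set()),
    "structural": lambda t: STRUCTURAL.get(t, set()),
}
combined = federate("Vehicle", matchers)
```

Because the individual techniques overlap little, the union of their proposals covers more terms than any single technique, while targets backed by several matchers can be ranked higher.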

Translation from TMR to KIF: the AQUA Ontolingua Translator • Once alignment has been performed, all TMR output is translated to KIF. • Some initial work has been done on this task: several articles have been extracted from text to TMR and translated from TMR to KIF. • A functioning demonstration illustrates extracted TMR, KIF translation, and question answering using JTP over the extracted information

Document and TMR

Generated KIF

Queries

Query results 1

Query Results 2

Future plans • Translation: • Focus on improving the translation of TMR to KIF • Extraction: • Focus on extraction of temporal relations • Maximum utilization of KSL temporal reasoner development • Work with Onyx to acquire as many extracted TMR articles as possible in order to focus on TMR->KIF translation • Work with AQUAINT-specified data sets for extracted articles. • Leverage Genoa II work • Leverage ontologies/KBs developed for Genoa II • Leverage Stanford KSL and IBM in the NIMD contract • Allow a comparison between two extraction techniques in the same question answering environment.

KSL Tasks

Backups

KSL/IBM NIMD contract award • Stanford University has been awarded a NIMD contract involving text extraction into the Ontolingua system • This is a natural fit with the current AQUA system. • The design for the AQUA OntologyMapper is not ontology specific and is extensible to multiple ontologies • Leverage this design characteristic to incorporate an additional extraction system

Knowledge Systems Laboratory, Stanford University • Faculty: Richard Fikes (Director), Edward Feigenbaum (Emeritus) • Senior Scientists: Deborah McGuinness (Associate Director), Sheila McIlraith • Research Staff and Students: Jessica Jenkins, Rob McCool, Paulo Pinheiro da Silva, Gleb Frank, … • Technology for effectively representing and using knowledge in computer systems • “In the knowledge is the power.” • 12/4/02

Recent Developments • JTP – A hybrid reasoner for query answering • Developed reasoners for time-dependent knowledge • Expanded functionality of DAML+OIL reasoner • DQL – Agent language and protocol for deductive query answering • Inference Web – Providing understandable explanations for derived query answers

JTP – A Hybrid Reasoner for Query Answering • An architecture for hybrid reasoning • First-order logic model elimination theorem prover • Suite of special purpose reasoners • Dispatchers and APIs for reasoners • Developing special purpose query-answering reasoners • Using time-dependent knowledge • Using Semantic Web knowledge expressed in DAML+OIL • Available from the Web as a Java program: www.ksl.stanford.edu/software/JTP

Representing Time-Dependent Knowledge • A time ontology provides representation vocabulary • Objects • Primitive objects: time line, points, intervals, durations, … • Time units: second, minute, hour, … • Calendar objects: Monday, January, 2001, … • Relations • For points: location-of, before, after, equal-point, the-point, … • For intervals: precedes, meets, overlaps, co-starts, during, … • Abstractly specified time point locations E.g., “He was born in 1916” “She arrived in January 2002.”

Allen Relations on Time Intervals (interval 1 relative to interval 2) • Precedes: End-1 < Start-2 • Meets: End-1 = Start-2 • Overlaps: Start-1 < Start-2 < End-1 • Costarts: Start-1 = Start-2 • During: Start-2 < Start-1 and End-1 < End-2 • Cofinishes: End-1 = End-2 • Equal: Start-1 = Start-2 and End-1 = End-2
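The endpoint conditions above translate directly into comparisons. The following is a minimal sketch, assuming intervals are given as numeric start and end points with start < end; `Interval` and `allen_relation` are hypothetical names. For "overlaps" the code also requires End-1 < End-2, which is the standard Allen definition and slightly stricter than the condition listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float
    end: float


def allen_relation(a: Interval, b: Interval) -> str:
    """Classify interval a relative to interval b using endpoint comparisons."""
    if a.start == b.start and a.end == b.end:
        return "equal"
    if a.end < b.start:
        return "precedes"
    if a.end == b.start:
        return "meets"
    if a.start == b.start:
        return "costarts"
    if a.end == b.end:
        return "cofinishes"
    if b.start < a.start and a.end < b.end:
        return "during"
    # Standard Allen "overlaps" adds End-1 < End-2 to the slide's condition.
    if a.start < b.start < a.end < b.end:
        return "overlaps"
    # Inverse relations (e.g. a contains b, a follows b) are not listed above.
    return "other"
```

For example, `allen_relation(Interval(1, 2), Interval(0, 5))` yields "during", matching Start-2 < Start-1 and End-1 < End-2.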

Reasoning With Time-Dependent Knowledge • The reasoner maintains a directed graph of time points • Based on the relations “before”, “after”, and “equal-point” • Includes intervals using their starting and ending points • The reasoner operationalizes definitions of relations • Evaluates instances E.g., (Before A-Point-In-1942 A-Point-In-January-1968) • Infers instances of goals E.g., find intervals ?int such that (During ?int 1942) • Responds to assertions with additional inferred assertions E.g., (=> (and (starting-point ?s1 ?i1) (starting-point ?s2 ?i2) (during ?i1 ?i2)) (before ?s2 ?s1))
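The directed graph of time points can be illustrated with a small sketch: an edge from p to q records (before p q), equal points are merged, and a "before" query becomes a reachability search. This is an assumption-laden toy, not the JTP temporal reasoner; the class and method names are hypothetical, and `assert_during` applies the starting-point rule quoted above together with the matching ending-point condition from the Allen relations.

```python
from collections import defaultdict, deque


class TimePointGraph:
    """Directed graph of time points; an edge p -> q records (before p q)."""

    def __init__(self):
        self._later = defaultdict(set)  # point -> points asserted to be later
        self._alias = {}                # merged point -> its representative

    def _find(self, p):
        # Follow merges made by assert_equal to the representative point.
        while p in self._alias:
            p = self._alias[p]
        return p

    def assert_equal(self, p, q):
        rp, rq = self._find(p), self._find(q)
        if rp != rq:
            self._alias[rq] = rp
            self._later[rp] |= self._later.pop(rq, set())

    def assert_before(self, p, q):
        self._later[self._find(p)].add(self._find(q))

    def add_interval(self, name):
        # An interval is represented by its starting and ending points.
        self.assert_before(("start", name), ("end", name))

    def assert_during(self, inner, outer):
        # (during i1 i2) implies (before s2 s1); the ends are ordered likewise.
        self.assert_before(("start", outer), ("start", inner))
        self.assert_before(("end", inner), ("end", outer))

    def before(self, p, q):
        """True if a chain of 'before' edges leads from p to q."""
        start, goal = self._find(p), self._find(q)
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self._later.get(node, ()):
                nxt = self._find(nxt)
                if nxt == goal:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False
```

After `add_interval("1942")`, `add_interval("battle")`, and `assert_during("battle", "1942")`, the query `before(("start", "1942"), ("end", "battle"))` succeeds by chaining two edges; "battle" is a made-up interval name used only for illustration.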

Representing When Events Occur • On 8 August 1998 a Taliban military offensive in northern Afghanistan concludes with the occupation of Mazar-e-Sharif. • (equal-point (starting-point August-8-1998) (the-point (year 1998) (month 7) (day 8) (hour 0) (minute 0) (second 0))) • (equal-point (ending-point August-8-1998) (the-point (year 1998) (month 7) (day 8) (hour 24) (minute 60) (second 60))) • (overlaps Taliban-military-offensive-in-northern-Afghanistan August-8-1998) • (meets Taliban-military-offensive-in-northern-Afghanistan Occupation-of-Mazar-e-Sharif) • [Timeline diagram: the offensive overlaps 8/8/98 and meets the occupation]

Begin Time Queries • On 9 August Iran accuses the Taliban of taking 9 diplomats and 35 truck drivers hostage in Mazar-e-Sharif. The crisis began with that accusation. • (during Iran-accuses-Taliban-of-taking-hostages August-9-1998) • (costarts Iran-accuses-Taliban-of-taking-hostages Iranian-Taliban-Crisis) • [Timeline diagram: the accusation lies within 8/9/98; the crisis starts together with the accusation] • “When did the Iranian-Taliban crisis begin?” “August 9, 1998.” • Query: (location-of (starting-point Iranian-Taliban-crisis) ?lower-bound ?upper-bound) • Answer: ?lower-bound = Starting-Point-Of-August-9-1998, ?upper-bound = Ending-Point-Of-August-9-1998

Duration Queries • On 2 November Iran concludes the Zolfaghar-2 military exercise peacefully, ending the crisis between the two sides. • (ends-during Zolfaghar-2 November-2-1998) • (cofinishes Zolfaghar-2 Iranian-Taliban-Crisis) • [Timeline diagram: Zolfaghar-2 and the crisis finish together, within 11/2/98] • “How many days did the Iranian-Taliban crisis last?” “84 to 86.” • Query: (duration-in-units Iranian-Taliban-crisis day ?lower-bound ?upper-bound) • Answer: ?lower-bound = 84, ?upper-bound = 86 • “How many weeks did the Iranian-Taliban crisis last?” “12.” • Query: (duration-in-units Iranian-Taliban-crisis week ?lower-bound ?upper-bound) • Answer: ?lower-bound = 12, ?upper-bound = 12.29
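The bounds in these answers follow from simple arithmetic on the day boundaries: the crisis starts at some point within 9 August 1998 and ends at some point within 2 November 1998. A minimal sketch using Python's standard `datetime` module (the function names are hypothetical and this is not how JTP computes the answer):

```python
from datetime import datetime, timedelta
from typing import Tuple

Bounds = Tuple[datetime, datetime]


def day_bounds(year: int, month: int, day: int) -> Bounds:
    """Starting and ending points of a calendar day."""
    start = datetime(year, month, day)
    return start, start + timedelta(days=1)


def duration_bounds(start: Bounds, end: Bounds, unit: timedelta) -> Tuple[float, float]:
    """Bounds on a duration whose start and end are each known only to a range."""
    # Shortest case: latest possible start to earliest possible end.
    lower = (end[0] - start[1]) / unit
    # Longest case: earliest possible start to latest possible end.
    upper = (end[1] - start[0]) / unit
    return max(lower, 0.0), upper


crisis_start = day_bounds(1998, 8, 9)   # crisis starts during 9 August 1998
crisis_end = day_bounds(1998, 11, 2)    # crisis ends during 2 November 1998

days = duration_bounds(crisis_start, crisis_end, timedelta(days=1))
weeks = duration_bounds(crisis_start, crisis_end, timedelta(weeks=1))
```

Here `days` is (84.0, 86.0), and `weeks` is 12.0 up to 86/7, which rounds to 12.29, in line with the answers on the slide.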

During Queries • On 5 September Iran states that it has the right under international law to strike the Taliban after Iranian media sources report that the Taliban have killed 5 Iranian diplomats. • (during Iran-declares-right-to-strike-Taliban September-5-1998) • (precedes Iranian-media-reports-diplomats-killed-by-Taliban Iran-declares-right-to-strike-Taliban) • [Timeline diagram: the report precedes the declaration; the declaration lies within 9/5/98] • “During what events did Iran declare the right to strike the Taliban under international law?” “The Iranian-Taliban crisis.” • Query: (during Iran-declares-right-to-strike-Taliban ?evt) (type ?evt Event) • Answer: ?evt = iranian-taliban-crisis

DQL (DAML Query Language) • Language and protocol for agent-to-agent query-answering • From knowledge represented in DAML+OIL (or OWL) • Supports a query-answering dialogue between a client and a server • Supports derivation of answers using automated reasoning • Knowledge may be in multiple distributed knowledge bases • Knowledge bases need not be specified by the client • Design Issues • The formal properties of queries and answers • How are queries, answers, and knowledge bases related? • Inferring answers may be expensive • Impractical to always try to compute all answers • Answers may only be known to exist • There may be an infinite number of answers • What are justifications and when should they be computed?

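DQL Query-Answering Dialogue • [Sequence diagram between Client and Server] • Client sends a Query • Server returns an Answer Bundle (including a process handle) • Client sends a Server Continuation • Server returns an Answer Bundle (including a process handle) … or an Answer Bundle (including termination token(s)) • Client may instead send a Server Termination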
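The shape of this dialogue can be mimicked with a toy in-memory server: each answer bundle carries either a process handle (more answers may follow) or a termination token (the dialogue is over). The sketch below is purely illustrative and does not use the DQL XML syntax or the KSL implementation; `AnswerBundle`, `ToyServer`, `ask_all`, and the token value "end" are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AnswerBundle:
    answers: List[dict]
    process_handle: Optional[int] = None     # set when more answers may follow
    termination_token: Optional[str] = None  # set when the dialogue is over


class ToyServer:
    """Serves a fixed answer list in bundles (stand-in for a reasoner)."""

    def __init__(self, answers: List[dict], bundle_size: int = 2):
        self._answers = answers
        self._bundle_size = bundle_size
        self._sessions: Dict[int, int] = {}  # process handle -> next answer index
        self._handles = itertools.count(1)

    def query(self, pattern: str) -> AnswerBundle:
        # The pattern is ignored here; a real server would hand it to a reasoner.
        handle = next(self._handles)
        self._sessions[handle] = 0
        return self.continuation(handle)

    def continuation(self, handle: int) -> AnswerBundle:
        start = self._sessions[handle]
        chunk = self._answers[start:start + self._bundle_size]
        nxt = start + len(chunk)
        if nxt >= len(self._answers):
            del self._sessions[handle]
            return AnswerBundle(chunk, termination_token="end")
        self._sessions[handle] = nxt
        return AnswerBundle(chunk, process_handle=handle)

    def termination(self, handle: int) -> None:
        # The client ends the dialogue early; the server drops its state.
        self._sessions.pop(handle, None)


def ask_all(server: ToyServer, pattern: str) -> List[dict]:
    """Client loop: keep sending continuations until a termination token arrives."""
    bundle = server.query(pattern)
    results = list(bundle.answers)
    while bundle.process_handle is not None:
        bundle = server.continuation(bundle.process_handle)
        results.extend(bundle.answers)
    return results
```

Returning answers in bundles lets the client stop early by sending a termination, which matters when computing all answers would be expensive or when the answer set is unbounded.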

DQL Implementations by KSL • XML syntax for DQL query-answering dialogues • Server for answering queries powered by JTP • Client for asking queries from a Web browser • Enables humans to query a DQL server

Trusting Query Answers • Trusting an agent’s answers means that we trust: • The inputs (and their sources) to the agent • The recency of the inputs • The inference rules in the agent’s reasoner(s) • Automated reasoners provide little support for explaining query answers • Justifications are typically: • Unsharable • Monolithic • Difficult to combine • Reasoner-specific • Difficult to visualize • In inappropriate notations • Difficult to refine

Inference Web • Framework for explaining reasoning results • Objective: Enable proofs and proof fragments provided by reasoners to be stored, exchanged, combined, annotated, filtered, segmented, compared, and rendered • The Inference Web is currently composed of: • Proof interlingua DAML+OIL/OWL specification of proofs that provides • Proof browser for displaying Inference Web proofs • Possibly from multiple reasoners • Proof parser to support segmentation and follow-up question support • Register for reasoners and inference rules • Prototype implementation using the JTP Reasoner
