
QUIRK: QUestion Answering = Information Retrieval + Knowledge



Presentation Transcript


  1. QUIRK: QUestion Answering = Information Retrieval + Knowledge • Cycorp, IBM • Presenter: Stefano Bertolo (Cycorp)

  2. Project Goals • Break answer-by-retrieval bottleneck • Deep (semantic) understanding of queries and answers • Integration of heterogeneous sources • Formalized knowledge to integrate state-of-the-art IR components with state-of-the-art knowledge bases

  3. Answer by retrieval • Q: Who was the first president of Zambia? • “…Kenneth Kaunda, the first president, kept Zambia within the Commonwealth of Nations…”

  4. Answer by reasoning • Q: Who sponsored Kai’s attack against Pamina? • “…On February 13, Kai detonated the truck in front of Pamina’s HQ…” • “…On January 25, Kai bought a truckload of fertilizer, drawing against account 9999 at MegaBank…” • “…On January 15, Vitas Bayo deposited $50,000 into account 9999 at MegaBank…”

  5. QUIRK strategy • Use Formalized knowledge for: • Semantic understanding of queries; • Justification of answers; • Use Formalized knowledge as: • Format for data normalization • ‘Glue’ for data integration of: • information extracted from unstructured data • SQL queries against structured DBs • Cyc’s knowledge

  6. [Architecture diagram] Components shown: DB1, DB2, … DB-N; Cyc KB; Inference Agent; Blackboard; Query Manager; Answer Manager; IRAgent; GuruQA (IBM); Unstructured Documents; Preemptive annotations

  7. [Component diagram] Components shown: Query Interpreter; Query Refiner; GuruQA Assistant; GuruQA (IBM); Cyc English Generator; Cyc Inference Engine; Answer Manager; Blackboard with entries Q-Eng, Q-CycL, Q-Guru, A-Eng, A-CycL, A-Guru

  8. Blackboard architecture • Add/remove agents without disrupting existing architecture • Test performance/speed with several combinations of agents • Operate asynchronously.
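The add/remove property of a blackboard can be illustrated with a minimal sketch. Everything here is hypothetical: the class and function names, the entry layout, and the placeholder CycL string are assumptions for illustration, not QUIRK's actual interfaces. The sketch is also synchronous for brevity, whereas the slide describes asynchronous operation.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout; not QUIRK's actual interfaces.
Entry = Dict[str, str]            # e.g. {"kind": "Q-Eng", "text": "..."}
Agent = Callable[[Entry], List[Entry]]


class Blackboard:
    """Shared store: agents react to entries and may post new ones."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.agents: Dict[str, Agent] = {}

    def add_agent(self, name: str, agent: Agent) -> None:
        self.agents[name] = agent

    def remove_agent(self, name: str) -> None:
        self.agents.pop(name, None)

    def post(self, entry: Entry) -> None:
        """Record an entry, then let every registered agent react to it."""
        self.entries.append(entry)
        for agent in list(self.agents.values()):
            for produced in agent(entry):
                self.post(produced)


def query_interpreter(entry: Entry) -> List[Entry]:
    # Stand-in for English-to-CycL translation (placeholder output).
    if entry["kind"] == "Q-Eng":
        return [{"kind": "Q-CycL", "text": "(opposes ?WHO WTO)"}]
    return []


def guru_assistant(entry: Entry) -> List[Entry]:
    # Stand-in for CycL-to-IR-query translation (placeholder output).
    if entry["kind"] == "Q-CycL":
        return [{"kind": "Q-Guru", "text": "PERSON$ oppose the WTO"}]
    return []


board = Blackboard()
board.add_agent("interpreter", query_interpreter)
board.add_agent("guru", guru_assistant)
board.post({"kind": "Q-Eng", "text": "Who opposes the WTO?"})
kinds = [e["kind"] for e in board.entries]
```

Because agents only read and write blackboard entries, dropping `guru_assistant` leaves the interpreter working unchanged, which is the property the slide emphasizes.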

  9. Query Interpreter • Q: “Who opposes the WTO?” (and (isa ?WHO Person) (thereExists ?EVENT (and (isa ?EVENT ActOfDissent) (performedBy ?EVENT ?WHO) (maleficiary ?EVENT WorldTradeOrganization))))

  10. GuruQA Assistant • CycL query => • PERSON$ oppose(s/d) the WTO • denounce(s/d) the World Trade Organization • attack(s/ed) …

  11. Cyc Inference Engine • CycL Query => [(PersonNamedFn “Kai”) JUSTIFICATION-1] [(PersonNamedFn “Dr. Chen”) JUSTIFICATION-2] … [(PersonNamedFn “Kai”) JUSTIFICATION-N] …

  12. Cyc Justifications • A? • A from [B and C] (source 6743) • B from source 67430 • C from source 78539
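One way to picture a justification is as a small tree in which each conclusion records its own source and the premises it was derived from. The sketch below is an assumption about shape only: the `Justification` class and `leaf_sources` helper are hypothetical names, and only the labels A, B, C and the source numbers come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Justification:
    """A conclusion, the source that supports it, and its premises."""
    conclusion: str
    source: str
    premises: List["Justification"] = field(default_factory=list)


def leaf_sources(j: Justification) -> List[str]:
    """Collect the sources of the premises that have no premises of their own."""
    if not j.premises:
        return [j.source]
    collected: List[str] = []
    for premise in j.premises:
        collected.extend(leaf_sources(premise))
    return collected


# The example from the slide: A follows from B and C.
b = Justification("B", "source 67430")
c = Justification("C", "source 78539")
a = Justification("A", "source 6743", [b, c])
```

Calling `leaf_sources(a)` walks down to the premises B and C and returns their two sources, which is the evidence a user would ultimately be pointed to.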

  13. Sources for Cyc Inference • 1.4M+ CycL assertions already in Cyc’s Knowledge Base • Virtual Assertions in DataBases • Unsupervised Textract / CycL annotation of unstructured documents

  14. Data Source Integration • Data Normalization • Data Fusion

  15. Data Normalization • Interpretation: cat, chat, Katze, gato, gatto, “felis felis” • Search: cat OR chat OR Katze OR gato OR gatto OR “felis felis”
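The two directions on this slide, interpreting a word as a concept and expanding a concept into a search, can be sketched with a lookup table. The `LEXICON` table, the concept name `Cat`, and both function names are hypothetical; only the word list comes from the slide.

```python
from typing import Dict, List

# Hypothetical concept table; the concept name and layout are assumptions.
LEXICON: Dict[str, List[str]] = {
    "Cat": ["cat", "chat", "Katze", "gato", "gatto", "felis felis"],
}


def interpret(word: str) -> str:
    """Map a surface word to the concept it denotes (case-insensitive)."""
    for concept, words in LEXICON.items():
        if word.lower() in (w.lower() for w in words):
            return concept
    raise KeyError(word)


def search_query(concept: str) -> str:
    """Build a boolean OR query; multi-word terms are quoted."""
    terms = [f'"{w}"' if " " in w else w for w in LEXICON[concept]]
    return " OR ".join(terms)
```

With this table, `search_query(interpret("Katze"))` yields the same disjunction as the slide, regardless of which language the query word came from.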

  16. Data Normalization …Zhang Mei Li, was born on January 1, 1927… (birthDate (PersonNamedFn “Zhang Mei Li”) (DayFn 01 (MonthFn January (YearFn 1927))))
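As a rough illustration of this normalization step, a single regular expression can turn the example sentence into the CycL assertion shown. This is a toy sketch under strong assumptions: the pattern `BORN` and the function `birth_assertion` are hypothetical, they only handle sentences shaped like the slide's example, and the actual extraction pipeline is not described in the slides.

```python
import re
from typing import Optional

# Pattern for text shaped like "<Name>, was born on <Month> <day>, <year>".
BORN = re.compile(
    r"(?P<name>[A-Z][\w.]*(?: [A-Z][\w.]*)*),? was born on "
    r"(?P<month>[A-Z][a-z]+) (?P<day>\d{1,2}), (?P<year>\d{4})"
)


def birth_assertion(text: str) -> Optional[str]:
    """Return a birthDate assertion in CycL syntax, or None if no match."""
    m = BORN.search(text)
    if m is None:
        return None
    return (
        f'(birthDate (PersonNamedFn "{m["name"]}") '
        f'(DayFn {int(m["day"]):02d} (MonthFn {m["month"]} (YearFn {m["year"]}))))'
    )
```

Applied to “…Zhang Mei Li, was born on January 1, 1927…”, the function reproduces the assertion on the slide (with straight rather than typographic quotes).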

  17. Data Normalization • language independent representation of - entities - concepts - relationships CycL contains 100K+ primitives, can compositionally define infinitely many non-atomic terms.

  18. Data Fusion • Dr. Chen lives in Fresno • Zhang Mei Li lives in Oakland • Kai lives in Los Angeles • California is in the Pacific Time Zone • Therefore any two of Dr. Chen, Zhang Mei Li, and Kai live in the same time zone
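The inference on this slide chains facts from different sources through background knowledge. A minimal sketch, assuming plain dictionaries stand in for the knowledge base: the variable and function names are hypothetical, and the city-to-state links are added here because the slide leaves them implicit.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Facts as they might arrive from separate sources.
lives_in: Dict[str, str] = {
    "Dr. Chen": "Fresno",
    "Zhang Mei Li": "Oakland",
    "Kai": "Los Angeles",
}
# Background knowledge; the city-to-state links are assumed here
# because the slide leaves them implicit.
city_in_state: Dict[str, str] = {
    "Fresno": "California",
    "Oakland": "California",
    "Los Angeles": "California",
}
state_time_zone: Dict[str, str] = {"California": "Pacific Time Zone"}


def time_zone(person: str) -> str:
    """Chain person -> city -> state -> time zone."""
    return state_time_zone[city_in_state[lives_in[person]]]


def same_time_zone_pairs() -> List[Tuple[str, str]]:
    """List every pair of known people whose time zones coincide."""
    return [
        (a, b)
        for a, b in combinations(sorted(lives_in), 2)
        if time_zone(a) == time_zone(b)
    ]
```

None of the three input sentences mentions a time zone; the conclusion only becomes available once the geographic background knowledge links them.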

  19. Heterogeneous Sources Q: How old is Dr. Chen’s mother? …Zhang Mei Li, mother of Pamina’s Dr. Chen…

  20. Data Fusion • Requires language independent connections/inferential links among • Entities • Concepts • Propositions (Facts, Rules) • Cyc’s Ontology • Cyc’s Knowledge Base

  21. Consensus Reality • Formalized Knowledge about `Consensus Reality’ = inferentially enabled `glue’ for Data Fusion • E.g. “Was Kai implicated in the Munich 1972 attack (when he was a toddler of 2)?”
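The example question can be read as a plausibility check that relies on common-sense knowledge about age. The sketch below is an assumption about how such a check might look: the function names and the minimum-age threshold are illustrative choices, not values from the slides, and Kai's birth year is derived only from the phrase “a toddler of 2” in 1972.

```python
# Assumed threshold: an illustrative choice, not a value from the slides.
MIN_AGE_FOR_DELIBERATE_ACTION = 10


def age_at(birth_year: int, event_year: int) -> int:
    """Approximate age in whole years at the time of an event."""
    return event_year - birth_year


def could_be_implicated(
    birth_year: int,
    event_year: int,
    min_age: int = MIN_AGE_FOR_DELIBERATE_ACTION,
) -> bool:
    """Reject candidates who were too young to have taken part."""
    return age_at(birth_year, event_year) >= min_age


kai_birth_year = 1972 - 2  # "a toddler of 2" in 1972
```

Here `could_be_implicated(kai_birth_year, 1972)` is false, so a retrieved passage linking Kai to the 1972 event would be flagged as implausible rather than returned as an answer.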

  22. DBs as `virtual assertions’ stores • (birthDate (PersonNamedFn “Zhang Mei Li”) ?WHEN) • SELECT: DOB FROM: PERSONAL_DATA WHERE: NAME = “Zhang Mei Li”
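Treating a database as a store of virtual assertions amounts to translating a CycL query pattern into SQL on demand. A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical `PREDICATE_MAP` and `ask` function; the table and column names are the ones on the slide, and the stored date is sample data chosen to match the earlier birth-date example.

```python
import sqlite3
from typing import List

# Hypothetical mapping from a CycL predicate to a table and columns,
# using the table and column names shown on the slide.
PREDICATE_MAP = {"birthDate": ("PERSONAL_DATA", "NAME", "DOB")}


def ask(conn: sqlite3.Connection, predicate: str, person: str) -> List[str]:
    """Answer (predicate (PersonNamedFn person) ?WHEN) from the database."""
    table, key_col, value_col = PREDICATE_MAP[predicate]
    # Identifiers come from the fixed map above; the value is a bound parameter.
    sql = f"SELECT {value_col} FROM {table} WHERE {key_col} = ?"
    return [row[0] for row in conn.execute(sql, (person,))]


# Sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSONAL_DATA (NAME TEXT, DOB TEXT)")
conn.execute(
    "INSERT INTO PERSONAL_DATA VALUES (?, ?)", ("Zhang Mei Li", "1927-01-01")
)
```

The rows are never copied into the knowledge base; `ask(conn, "birthDate", "Zhang Mei Li")` fetches the binding for `?WHEN` only when the inference engine needs it.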

  23. Unsupervised Textract / CycL Annotations • IBM Textract relations: [Cycorp, Inc. : located-in : Austin, TX] • mapped to CycL Assertions: (objectFoundInLocation Cycorp CityOfAustinTX)
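The mapping from a Textract relation to a CycL assertion can be sketched as two table lookups. The tables and function names below are hypothetical and contain only the entries needed for the slide's example; how the real system resolves entities and relations is not described here.

```python
from typing import Dict, Tuple

# Hypothetical lookup tables; only the entries needed for the slide's example.
RELATION_TO_PREDICATE: Dict[str, str] = {"located-in": "objectFoundInLocation"}
ENTITY_TO_TERM: Dict[str, str] = {
    "Cycorp, Inc.": "Cycorp",
    "Austin, TX": "CityOfAustinTX",
}


def parse_relation(text: str) -> Tuple[str, str, str]:
    """Split '[subject : relation : object]' into its three fields."""
    subject, relation, obj = (
        part.strip() for part in text.strip("[] ").split(" : ")
    )
    return subject, relation, obj


def to_cycl(text: str) -> str:
    """Render the relation as a CycL assertion using the lookup tables."""
    subject, relation, obj = parse_relation(text)
    predicate = RELATION_TO_PREDICATE[relation]
    return f"({predicate} {ENTITY_TO_TERM[subject]} {ENTITY_TO_TERM[obj]})"
```

For the input `[Cycorp, Inc. : located-in : Austin, TX]`, `to_cycl` produces the assertion shown on the slide.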

  24. Augmenting Textract Annotations • Concept Annotation: “Boston” → { CityOfBostonMA, BostonTheBand, … } • Word Sense Disambiguation: “I went to Boston” → CityOfBostonMA • Analysis of nominal compounds: “leather jacket” → (SubcollectionOfWithRelationToTypeFn Jacket mainConstituent Leather)

  25. Unsupervised CycL Annotations • IBM’s Nominator and Parsers to extract Named Entities and basic syntactic dependencies (SUBJ-VERB, VERB-OBJ) • Map dependencies to CycL event structures.

  26. Cyc-to-English generator • (PersonNamedFn “Dr. Chen”) • JUSTIFICATION-N • “Dr. Chen opposes the WTO, because people who demonstrate against organizations oppose them (Cyc KB, assertion 99999) and Dr. Chen demonstrated against the WTO in Seattle (document 12345).”

  27. Year 1 Tasks • Get the entire system to run robustly with integration of all the IBM and Cycorp components described • Improve question understanding and refinement • Broaden coverage of the English-to-CycL mapping to enable annotation of large collections of documents

  28. Year 2 Tasks • Add new agents to the blackboard to represent the user and session context • Improve integration of answers obtained from GuruQA and Cyc • Improve integrated IBM and Cycorp modules for unstructured document annotation
