Tianxiang Lu, Guillaume Jacquet, Aaron Kaplan, Erica Melis - PowerPoint PPT Presentation

Presentation Transcript

  1. ParSemKB: Integrating Text Mining Results from Different Modules Using Extended OWL-DL • Tianxiang Lu, Guillaume Jacquet, Aaron Kaplan, Erica Melis • Xerox Research Center Europe, 6 chemin de Maupertuis, 38240 Meylan, France

  2. Motivation • To merge the outputs of different linguistic analysis modules • To provide a standard way of working with other public information on the web • To answer conjunctive queries

  3. Example of Problems • Given: • New York Times, 20 August 2007 • Mr. Bush is now the president of the United States. Tomorrow, he will meet Nicolas Sarkozy.

  4. Basic Information • (s1) Subject(is, Mr. Bush) • (s1) Person(Mr. Bush) • (s1) Attribute(Mr. Bush, president of US) • (s2) Subject(will meet, he) • (s2) Object(will meet, Nicolas Sarkozy) • (s2) Person(Nicolas Sarkozy)

  5. Coreference Engine • (s1) Profession(president of US) • (co) Coref(he, Mr. Bush)

  6. XTM (temporal expressions) • (s1) Temp(is, now) • (s1) Refer(now, 20/08/2007) • (s2) Temp(meet, tomorrow) • (s2) Refer(tomorrow, 21/08/2007) • (od) Before(is, meet)
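
The Refer and Before facts on this slide can be illustrated with a minimal sketch. It assumes that relative expressions are resolved against the publication date of the document; `TemporalResolver` and its methods are hypothetical names for illustration and are not part of XTM.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical helper (not the XTM API): resolves a relative temporal
// expression against the publication date of the document.
final class TemporalResolver {

    // Refer(expression, date): maps "now" or "tomorrow" to a calendar date.
    static LocalDate refer(String expression, LocalDate documentDate) {
        switch (expression.toLowerCase(Locale.ROOT)) {
            case "now":
            case "today":
                return documentDate;
            case "tomorrow":
                return documentDate.plusDays(1);
            case "yesterday":
                return documentDate.minusDays(1);
            default:
                throw new IllegalArgumentException("Unknown expression: " + expression);
        }
    }

    // Before(e1, e2): holds when the date of the first event precedes the second.
    static boolean before(LocalDate first, LocalDate second) {
        return first.isBefore(second);
    }
}
```

With the document date 20 August 2007, "now" resolves to the same day and "tomorrow" to 21 August 2007, so the event anchored at "now" comes before the one anchored at "tomorrow".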

  7. Eventspotter Analyzer • (s1) Factual-Event( "New York Times", -, is, Mr. Bush, president of US, - , now ) • (s2) Factual-Event( "New York Times", -, meet, he, Nicolas Sarkozy, - , tomorrow )

  8. Problems • How to merge the results together? • What happens if we want to use another approach (e.g. a statistical approach, Text Entailment, etc.)? • How to integrate information from the Web? • Query: What did Nicolas Sarkozy and the president of the US do together in 2007?
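
To make the merging problem concrete, here is a minimal sketch that pools facts from several modules into one store, uses Coref facts to identify mentions, and answers a conjunctive query of the form Subject(e, x) ∧ Attribute(x, role) ∧ Object(e, other). The classes `Fact` and `MergedFacts` are hypothetical and are not the ParSemKB API; the temporal constraint ("in 2007") is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A predicate with arguments, e.g. Subject(will meet, he). Hypothetical type.
final class Fact {
    final String predicate;
    final List<String> args;

    Fact(String predicate, String... args) {
        this.predicate = predicate;
        this.args = Arrays.asList(args);
    }
}

// One store for the output of all modules. Hypothetical type.
final class MergedFacts {
    private final List<Fact> facts = new ArrayList<>();
    private final Map<String, String> coref = new HashMap<>();

    void add(Fact f) {
        // Coref(mention, antecedent) is kept as a lookup table.
        if (f.predicate.equals("Coref")) {
            coref.put(f.args.get(0), f.args.get(1));
        } else {
            facts.add(f);
        }
    }

    private String resolve(String mention) {
        return coref.getOrDefault(mention, mention);
    }

    // True if a stored fact matches after coreference resolution.
    boolean holds(String predicate, String... args) {
        for (Fact f : facts) {
            if (!f.predicate.equals(predicate) || f.args.size() != args.length) continue;
            boolean match = true;
            for (int i = 0; i < args.length; i++) {
                if (!resolve(f.args.get(i)).equals(resolve(args[i]))) match = false;
            }
            if (match) return true;
        }
        return false;
    }

    // Events whose subject has the given role and whose object is `other`.
    List<String> eventsBetween(String role, String other) {
        List<String> events = new ArrayList<>();
        for (Fact subj : facts) {
            if (!subj.predicate.equals("Subject")) continue;
            String event = subj.args.get(0);
            String actor = subj.args.get(1);
            if (holds("Attribute", actor, role) && holds("Object", event, other)) {
                events.add(event);
            }
        }
        return events;
    }
}
```

Loading the facts from slides 4 and 5 and calling `eventsBetween("president of US", "Nicolas Sarkozy")` finds the "will meet" event only because Coref(he, Mr. Bush) links the subject of sentence 2 to the attribute stated in sentence 1.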

  9. Solution: Knowledge-based Approach • The ParSemKB as a Blackbox • Internal Architecture of the Framework • Evaluation of OWL-DL as a Knowledge Representation language for ParSemKB • Documentation

  10. ParSemKB as a Blackbox • Global Architecture • ParSemKB: Input and Output • ParSemKB: Specifications • Framework: Interfaces • Framework: Workflow and Interaction

  11. Global Architecture • Preprocessing part • Factspotter analysis tools • Eventspotter analysis tools • XTM • Text Entailment system • Web sources • Storage part • Occurrence Base + Knowledge Base • APIs between them • Adapters on the preprocessing side for translation • Input API and Application API on the storage side

  12. ParSemKB: Specifications • /F10/ Understand the input format • /F20/ Analyze the input corpus by using the background knowledge • /F30/ Store the knowledge in an appropriate way • /F40/ Provide an efficient way to retrieve the knowledge by querying the knowledge base

  13. ParSemKB: Input and Output • Input • /D10/ Background knowledge • /D20/ Target corpus of data • Output • /D30/ The storage of knowledge • /D40/ The test results

  14. Design - Interfaces

  15. Design - Activities(Workflow)

  16. Implementation • 1. Describe a domain using a specific but standard Knowledge Representation language • 2. Find or implement the first framework supporting the language and build up the system • 3. Enhance the built-up system by defining and inputting rules • 4. Test the system by trying out specific queries at the API level • 5. Try to build other systems by swapping out the parts from steps 1 to 4 • 6. Evaluate their efficiency

  17. Internal Architecture of Framework • Design Pattern “Factory method” for Knowledge Base Managers • JenaKBManager configurations • Test Templates

  18. Factory for KBManager

  19. JenaKBManager • JENA PELLET PELLET MEM • JENA OWL PELLET DB • JENA PELLET PELLET DB • JENA OWL KAON2
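
The "Factory method" pattern named on slide 17 can be sketched as follows. This is an illustration only: apart from the names `KBManager` and `JenaKBManager`, which appear on the slides, every class, method, and field here is hypothetical, and reading a configuration such as "JENA PELLET PELLET MEM" as framework, model specification, reasoner, and storage is an assumption.

```java
// Common interface for knowledge base managers (methods are hypothetical).
interface KBManager {
    String describe();
}

// A Jena-backed manager parameterised by one configuration from slide 19.
// The meaning assigned to the three fields is an assumption.
final class JenaKBManager implements KBManager {
    private final String modelSpec;
    private final String reasoner;
    private final String storage;

    JenaKBManager(String modelSpec, String reasoner, String storage) {
        this.modelSpec = modelSpec;
        this.reasoner = reasoner;
        this.storage = storage;
    }

    @Override
    public String describe() {
        return "JENA " + modelSpec + " " + reasoner + " " + storage;
    }
}

// Creator: client code depends on this class and on KBManager only.
abstract class KBManagerFactory {
    // The factory method: subclasses decide which manager to build.
    abstract KBManager createManager();

    String open() {
        return "opened " + createManager().describe();
    }
}

final class JenaPelletMemFactory extends KBManagerFactory {
    @Override
    KBManager createManager() {
        return new JenaKBManager("PELLET", "PELLET", "MEM");
    }
}

final class JenaOwlPelletDbFactory extends KBManagerFactory {
    @Override
    KBManager createManager() {
        return new JenaKBManager("OWL", "PELLET", "DB");
    }
}
```

Swapping one factory subclass for another changes the backing configuration without touching code written against `KBManager`, which is what makes it practical to run the same tests against different frameworks.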

  20. Test Templates

  21. Evaluation of OWL-DL as a KR for ParSemKB • Why OWL-DL • Reification • DL-Safe Rules • Negative Information • Uncertainty

  22. Why OWL-DL • W3C Recommendation since 2004 and widely used • Based on XML as the basic raw data exchange format • Based on RDF as the reference data exchange format • Different levels of expressive power: Lite, DL, Full • DL: description logic (introduces negation, cardinality and complex restrictions)

  23. OWL – abstract syntax with DL-extension • axiom ::= 'Class(' classID modality {annotation} {description} ')' • axiom ::= 'EnumeratedClass(' classID {annotation} {individualID} ')' • description ::= classID • | restriction • | 'unionOf(' {description} ')' • | 'intersectionOf(' {description} ')' • | 'complementOf(' description ')' • | 'oneOf(' {individualID} ')' • ObjectRestrictionComponent ::= 'allValuesFrom(' description ')' • | 'someValuesFrom(' description ')' • | 'value(' individualID ')' • | cardinality (unrestricted) • axiom ::= 'EquivalentClasses(' description {description} ')' • | 'DisjointClasses(' description description {description} ')' • | 'SubClassOf(' description description ')'

  24. Reification • Example: In the New York Times, Tom pointed out that J.W. Bush was President in 2005. • Reified statements: • (statement1: "J.W. Bush" has_function President) • (statement2: President has_Time 2005) • (reified statement3: statement1 hasSource Tom) • (reified statement4: reified statement3 hasSource New York Times)
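
The nesting in this example can be sketched with a statement type whose subject may itself be a statement. This is a minimal illustration, not the representation used in ParSemKB; `Statement` and `Provenance` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// A triple whose subject is either a plain name or another Statement,
// which is what makes it "reified". Hypothetical type.
final class Statement {
    final Object subject;
    final String predicate;
    final Object object;

    Statement(Object subject, String predicate, Object object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class Provenance {
    // Follows the chain of hasSource statements that starts at `base`
    // and returns the sources from innermost to outermost.
    static List<String> sourcesOf(Statement base, List<Statement> all) {
        List<String> sources = new ArrayList<>();
        Statement current = base;
        boolean found = true;
        while (found) {
            found = false;
            for (Statement s : all) {
                // Identity comparison: the subject is the statement object itself.
                if (s.subject == current && s.predicate.equals("hasSource")) {
                    sources.add(String.valueOf(s.object));
                    current = s;
                    found = true;
                    break;
                }
            }
        }
        return sources;
    }
}
```

For the four statements above, the chain starting at statement1 yields Tom first and then the New York Times, because statement4 is about statement3 rather than about statement1 directly.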

  25. A1-A2 representation

  26. DL-Safe Rules • 1. Entity(?x) ^ Entity(?y) • ^ swrlx:hasProperty(?x, end-offset, ?eo) • ^ swrlx:hasProperty(?y, end-offset, ?eo) • ^ abox:hasClass(?y, ?C) ^ abox:hasClass(?x, ?C) • -> sameAs(?x, ?y)
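
The effect of this rule can be sketched in plain Java, without a rule engine. DL-safe rules bind variables only to named individuals, so iterating over an explicit list of entities mirrors that restriction. `Entity` and `SameOffsetRule` are hypothetical names; unlike the rule, the sketch reports each unordered pair once and skips the trivial sameAs(x, x).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A named individual with the two properties the rule inspects. Hypothetical type.
final class Entity {
    final String id;
    final String owlClass;
    final int endOffset;

    Entity(String id, String owlClass, int endOffset) {
        this.id = id;
        this.owlClass = owlClass;
        this.endOffset = endOffset;
    }
}

final class SameOffsetRule {
    // Returns [x, y] pairs for which the rule body holds:
    // same end-offset and same class imply sameAs(x, y).
    static List<List<String>> apply(List<Entity> entities) {
        List<List<String>> sameAs = new ArrayList<>();
        for (int i = 0; i < entities.size(); i++) {
            for (int j = i + 1; j < entities.size(); j++) {
                Entity x = entities.get(i);
                Entity y = entities.get(j);
                if (x.endOffset == y.endOffset && x.owlClass.equals(y.owlClass)) {
                    sameAs.add(Arrays.asList(x.id, y.id));
                }
            }
        }
        return sameAs;
    }
}
```

Two entities reported by different modules for the same text span and class are merged, while an entity of another class ending at the same offset is left alone.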

  27. Negation and Uncertainty • Introduce a notation for negation • Factual • Counter-factual • Possible • Markov Logic Networks / Bayesian Logic for handling uncertain or unknown objects (not implemented)

  28. Implementation – Input documents (see reference documents) • Domain definition: abstract version and personXeroxFOL.owl • Example inputs for individuals: • correferenceExample.rdf • schwarzneggerExample.rdf • documentExampleBush.rdf • Out.owl as large scale data

  29. Implementation – Java API using Jena Framework • Import the description of the domain using "com.hp.hpl.jena.ontology.OntModel" • Bind different reasoners • Default reasoner for OWL-DL (instance-based reasoning) • External reasoners • Choice of reasoners: Pellet, KAON2 • Implementation of a test set for different frameworks • Implementation of JenaKB and a simple trial run

  30. Implementation – Packages • source (src) • Basics • Interfaces • KnowledgeManagers • Tests • executable binaries (bin) • inputs (inputs) • configuration (conf) • library (lib) • logs (logs) • outputs (outputs)

  31. Documentation • Scientific Report • User Guide • Developer Guide • External Materials • White Paper • Documents from Guillaume, Salah • Javadoc • Where to find what

  32. References • Jena with OWL • http://jena.sourceforge.net/javadoc/index.html • http://jena.sourceforge.net/ontology/index.html • W3C relevant • http://www.w3.org/TR/2004/REC-owl-guide-20040210 • http://www.w3.org/Submission/SWRL/ • Protege • http://protege.stanford.edu/overview/protege-owl.html • http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf • http://protege.stanford.edu/plugins/owl/publications/DL2004-protege-owl.pdf • Other papers • Blog • Query-Answering for OWL-DL with Rules • Supporting Rule System Interoperability on the Semantic Web with SWRL • Mapping XML to existent OWL ontologies