Deployment and Evaluation Issues in Ontology-Based Information Extraction

hachi

Presentation Transcript


  1. Deployment and Evaluation Issues in Ontology-Based Information Extraction Ex – progress and Perspectives

  2. Agenda • WIE – motivation and Use Cases • Extraction ontologies structure and content • IE workflow • Authoring extraction ontologies • by hand • with use of Domain Ontologies • with use of other Business Metamodels • Results so far

  3. (Web) Information Extraction • purpose: • extract objects from documents, or • semantically annotate existing documents (annotation labels shown in the figure: name, place)

  4. IE Use Cases • Extraction of objects of known, well-defined class(es) • Annotating document collections of any size • Sources: structured, semi-structured, free-text • Extraction should improve if: • documents contain some formatting (e.g. HTML) • this formatting is similar within or across document(s) • some examples are provided

  5. Domains considered so far • Online products • Contact information • Seminars, events • Weather forecasts • Free-text business statements • Football

  6. Sources of Knowledge for IE • ontology • the only mandatory source • may include hand-crafted patterns for typical attribute values • sample instances • possibly coupled with the documents that refer to them • used to get typical content and context of extractable items • common formatting structure • of instances presented • in a single document, or • among documents from the same source

  7. The (Simple) Extraction Process

  8. Extraction Ontologies • contain: • semantic structure of extracted data • additional IE “hooks”, i.e. learned or ad-hoc patterns for typical content of extracted values and their context • their semantic structure describes the presentation of objects instead of real-world objects • we speak about presentation ontologies

  9. Presentation Ontologies • contain concepts that are to be populated with many instances • => can be viewed as information ontologies • a class's attributes can be represented as a set of variables • => can be used as data structures • can contain additional higher-level restrictions • => can be looked upon as knowledge ontologies

  10. Nature of Presentation Ontologies • presentation ontologies are of a slightly different nature than other models • they usually contain: • a single core class • its attributes • additional constraints
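
The structure described on slides 9 and 10 (one core class, its attributes, additional constraints) could be held in a data structure along the following lines. This is only a minimal sketch: the names `Attribute`, `PresentationOntology` and `is_valid`, the cardinality fields, and the end-after-start constraint are assumptions made for illustration and are not taken from the Ex system. The attribute names are borrowed from the seminar results later in the deck.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Attribute:
    name: str
    datatype: type = str
    min_card: int = 0  # minimum number of values per instance (assumed field)
    max_card: int = 1  # maximum number of values per instance (assumed field)


@dataclass
class PresentationOntology:
    core_class: str
    attributes: List[Attribute] = field(default_factory=list)
    # Each constraint receives a candidate instance (attribute name -> values)
    # and returns True when the instance is acceptable.
    constraints: List[Callable[[Dict[str, list]], bool]] = field(default_factory=list)

    def is_valid(self, instance: Dict[str, list]) -> bool:
        # Check attribute cardinalities first, then the higher-level constraints.
        for attr in self.attributes:
            n = len(instance.get(attr.name, []))
            if not attr.min_card <= n <= attr.max_card:
                return False
        return all(check(instance) for check in self.constraints)


seminar = PresentationOntology(
    core_class="Seminar",
    attributes=[
        Attribute("speaker", min_card=1),
        Attribute("location"),
        Attribute("stime", min_card=1),
        Attribute("etime"),
    ],
    # Hypothetical restriction: an end time, if present, must follow the start time.
    constraints=[
        lambda inst: not inst.get("etime") or inst["etime"][0] > inst["stime"][0]
    ],
)
```

With this sketch, `seminar.is_valid({"speaker": ["X"], "stime": ["10:00"], "etime": ["11:00"]})` accepts the candidate, while a candidate without a speaker is rejected by the cardinality check.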

  11. Example Presentation Ontology

  12. Source of Presentation Ontologies • typically designed by a human for a specific extraction task • single-purpose hand-crafting from scratch is tedious and can introduce inconsistencies • it should be possible to craft extraction models by reusing existing metamodels so that: • semantics of the extracted data are consistent with existing knowledge models • the need for initial domain analysis and for expert knowledge lessens (and so do the costs)

  13. High-level scheme of EO-based IE

  14. Hand-crafted Extraction Ontologies • Axioms: • class level • attribute level • Patterns: • class content • attribute value • attribute context • class context • Value constraints: • word length • numeric value
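
To make the pattern and value-constraint categories on slide 14 concrete, here is a small sketch for a start-time attribute. The regular expressions, the scores 1.0 and 0.5, and the function names are all hypothetical; the slides do not show how Ex encodes or weighs its patterns.

```python
import re

# Hypothetical attribute value pattern: a clock time such as "14:30".
VALUE_PATTERN = re.compile(r"\b(\d{1,2}):(\d{2})\b")
# Hypothetical attribute context pattern: words that often precede a start time.
CONTEXT_PATTERN = re.compile(r"\b(?:at|from|starts?)\s*$", re.IGNORECASE)


def time_in_range(match):
    """Numeric value constraint: hours 0-23, minutes 0-59."""
    hour, minute = int(match.group(1)), int(match.group(2))
    return 0 <= hour <= 23 and 0 <= minute <= 59


def extract_times(text):
    """Return (value, score) pairs; supporting left context raises the score."""
    results = []
    for m in VALUE_PATTERN.finditer(text):
        if not time_in_range(m):
            continue  # value pattern matched but the value constraint failed
        left_context = text[:m.start()]
        score = 1.0 if CONTEXT_PATTERN.search(left_context) else 0.5
        results.append((m.group(0), score))
    return results


candidates = extract_times("The talk starts at 14:30 in room 5; ratio 99:99.")
```

Here the value pattern proposes both `14:30` and `99:99`, the value constraint discards the second, and the context pattern raises the score of the first.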

  15. Building upon Existing Models • crafting a presentation ontology from a preexisting knowledge model requires a transformation process • the transformation differs from one source model to another, but there are some general steps: • choose / find the core class C • create its attributes in the presentation ontology • formulate ontological constraints over attributes • create additional “WIE hooks” to form a complete extraction ontology • as the expressiveness of the source models is usually very high, the transformation cannot be performed deterministically

  16. Reuse of Domain Ontologies • transformation of a domain ontology will mainly amount to the general steps mentioned before • so far, we were able to formalize a few general heuristic rules that can help an expert • a) to choose the core class of an incipient presentation ontology • b) to populate it with attributes

  17. Transformation Rule a1) • A class C that has individuals directly asserted in the domain ontology should probably not become the core class in the presentation ontology. (diagram: Weather, Location, London, Prague, Innsbruck)

  18. Transformation Rule a2) • If some property does not have an inverse property explicitly declared, a class C in the domain of this property is more likely to become the core class than any class that figures in its range. (diagram: Weather, hasLocation, Location)

  19. Transformation Rule a3) • If a class C has a minimum cardinality restriction on property D whose range is class C1, such that C1 does not have any restrictions on the inverse property of D, then C1 should not become the core class. (diagram: Weather, hasLocation > 1, Location)

  20. Transformation Rule a4) • If there is a chain of object properties, then the classes at the ends of such a chain are more likely to form the core class. If a class C is at the end of several such chains, it is even more suitable for becoming the core class. (diagram: Weather, Weather Condition, Wind Direction)
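
Rules a1) to a3) can be read as a simple vote over candidate core classes. The sketch below is one possible reading, not the authors' procedure: the unit weights, the class and field names, and the simplification that "no declared inverse" also means "no restriction on the inverse" are all assumptions. Rule a4) is left out because it depends on how property chains are enumerated, which the slides do not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ObjectProperty:
    name: str
    domain_cls: str
    range_cls: str
    inverse: Optional[str] = None  # explicitly declared inverse, if any
    min_card: int = 0              # minimum cardinality restriction on the domain class


@dataclass
class DomainOntology:
    classes: List[str]
    properties: List[ObjectProperty]
    individuals: Dict[str, List[str]] = field(default_factory=dict)


def score_core_candidates(onto):
    """Higher score = more plausible core class (hypothetical unit weights)."""
    score = {c: 0 for c in onto.classes}
    # a1: classes with directly asserted individuals are unlikely core classes.
    for cls, inds in onto.individuals.items():
        if inds:
            score[cls] -= 1
    for p in onto.properties:
        if p.inverse is None:
            # a2: without a declared inverse, prefer the domain over the range.
            score[p.domain_cls] += 1
            # a3 (simplified): a minimum cardinality towards the range and no
            # inverse at all counts against the range class.
            if p.min_card >= 1:
                score[p.range_cls] -= 1
    return score


weather = DomainOntology(
    classes=["Weather", "Location"],
    properties=[ObjectProperty("hasLocation", "Weather", "Location", min_card=1)],
    individuals={"Location": ["London", "Prague", "Innsbruck"]},
)
scores = score_core_candidates(weather)
best = max(scores, key=scores.get)
```

On the weather example from the slides, `Weather` ends up with the highest score, which matches the intent of the three rules. Since the transformation is not deterministic, such a score would at most rank suggestions for the expert.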

  21. Transformation Rule b1) • A datatype property may directly yield an attribute. Furthermore a datatype property, together with a chain of object properties (typically part-of properties) may yield an attribute too. (diagram: Weather, Weather Condition, Wind Direction)

  22. Transformation Rule b2) • A set of mutually disjoint subclasses may yield an attribute even without a property counterpart in the source ontology. (diagram: Weather, Precipitation, Rain, Hail, Snow)

  23. Transformation Rule b3) • A set of mutually disjoint subclasses of some class together with a chain of object properties may yield an attribute. (diagram: Weather Condition, Precipitation, Rain, Hail, Snow)

  24. Transformation Rule b4) • An object property of a class C whose object has some individuals asserted in the ontology may yield an attribute. (diagram: Weather, Location, London, Prague, Innsbruck)
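
Rules b1) to b4) can likewise be sketched as a function that proposes attributes for a chosen core class. The input encoding, the function name, and the property names `temperature` and `hasCondition` are hypothetical, and chains of object properties are limited to a single hop to keep the example short.

```python
def candidate_attributes(core, datatype_props, object_props,
                         disjoint_subclasses, individuals):
    """Propose (attribute, rule) pairs for the core class.

    datatype_props:      list of (class, property) pairs
    object_props:        list of (domain, property, range) triples
    disjoint_subclasses: parent class -> list of mutually disjoint subclasses
    individuals:         class -> list of asserted individuals
    """
    attrs = []
    # Classes reachable from the core class through one object property.
    neighbours = [(prop, rng) for dom, prop, rng in object_props if dom == core]
    reachable = {rng for _, rng in neighbours}

    # b1: datatype properties of the core class, or of a class one hop away.
    for cls, prop in datatype_props:
        if cls == core or cls in reachable:
            attrs.append((prop, "b1"))
    # b2 / b3: sets of mutually disjoint subclasses.
    for parent in disjoint_subclasses:
        if parent == core:
            attrs.append((parent, "b2"))
        elif parent in reachable:
            attrs.append((parent, "b3"))
    # b4: object properties whose range class has asserted individuals.
    for prop, rng in neighbours:
        if individuals.get(rng):
            attrs.append((prop, "b4"))
    return attrs


proposed = candidate_attributes(
    core="Weather",
    datatype_props=[("Weather", "temperature")],
    object_props=[("Weather", "hasLocation", "Location"),
                  ("Weather", "hasCondition", "Precipitation")],
    disjoint_subclasses={"Precipitation": ["Rain", "Hail", "Snow"]},
    individuals={"Location": ["London", "Prague", "Innsbruck"]},
)
```

For this input the function proposes `temperature` via b1), `Precipitation` via b3) and `hasLocation` via b4). The rules say "may yield", so the output is a list of suggestions for the expert rather than a final attribute set.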

  25. Test Results • numbers in the table only show the rules whose results were actually chosen • these tests verify that the transformation is possible

  26. Test Results there is a correlation between the use of rules b4) and a1), because both are based on the presence of instances

  27. Test Results if an ontology is in the form of a taxonomy, it can still be useful via rule b3)

  28. Using other Business Metamodels • What Metamodels? • theoretically, any • in practice it will be the most common ones: • UML • BPM • relational database models • ...

  29. UML • industry standard • integrates models used in software engineering • contains several diagram groups: • structural diagrams • behavioral diagrams • other

  30. UML – Structural Diagrams • describe static structural constructs • the concept of a class is very similar to that in ontologies • class and object diagrams can be used similarly

  31. Other UML Uses • Structure diagrams may help significantly, mainly with populating the ontology with attributes and with identifying part-of relations • Behavioral diagrams can yield attributes and general relations but the use is rare • UML supplements can only provide some technical details, like attribute datatypes or sample values • Other UML diagrams can be used only very vaguely

  32. Relational Model • based on predicate logic and set theory, it has many things in common with other means of specifying a domain • an entity (i.e. a table) can directly yield a class and its fields • references can be mapped to the class's properties • explicit specification of primary and secondary keys makes it easy to recognize an inverse property • supporting tables (for m:n relations) should not yield a core class
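
The relational mapping on slide 32 can be illustrated with a short sketch. The schema encoding, the table names, and the test for a supporting table (at least two foreign keys and no other columns) are assumptions chosen for the example, not a description of an existing tool.

```python
def map_schema(tables):
    """Map tables to classes and list the tables that may become a core class.

    tables: name -> {"columns": [...], "foreign_keys": {column: referenced_table}}
    """
    classes = {}
    core_candidates = []
    for name, spec in tables.items():
        fks = spec.get("foreign_keys", {})
        classes[name] = {
            # Plain columns become attributes of the class.
            "attributes": [c for c in spec["columns"] if c not in fks],
            # References become properties pointing at other classes.
            "properties": dict(fks),
        }
        # Assumed heuristic: a supporting (m:n) table holds only references.
        is_supporting = len(fks) >= 2 and all(c in fks for c in spec["columns"])
        if not is_supporting:
            core_candidates.append(name)
    return classes, core_candidates


schema = {
    "seminar": {"columns": ["id", "title", "room"]},
    "speaker": {"columns": ["id", "name"]},
    "seminar_speaker": {
        "columns": ["seminar_id", "speaker_id"],
        "foreign_keys": {"seminar_id": "seminar", "speaker_id": "speaker"},
    },
}
classes, core_candidates = map_schema(schema)
```

In this example `seminar` and `speaker` remain core-class candidates, while `seminar_speaker` is mapped to properties only and is excluded from the candidates.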

  33. Business Process Model • describes a collection of activities needed to produce a specific output • every process depicts a change of a state of an entity and should yield a possible value of an attribute, or the attribute itself. • the event element and the choice element express a relation to some other entity • a set of processes should describe an entity, which can then possibly lead to a class

  34. Results - Seminars

                           P      R      F  GOLD  AUTO   AMAT   GMAT
      etime-strict     98.46  88.89  93.43   216   195    192    192
      etime-loose      99.49  89.12  94.02   216   195      2    0.5
      location-strict  58.78  74.15  65.58   325   410    241    241
      location-loose   79.72  85.61  82.56   325   410  85.84  37.23
      speaker-strict   71.11  68.73  69.90   371   360    256    255
      speaker-loose    76.58  73.83  75.18   371   360  19.69   18.9
      stime-strict     95.75  88.07  91.75   486   447    428    428
      stime-loose      95.75  88.07  91.75   486   447      0      0
      avg-strict       79.11  79.83  79.47  1398  1412   1117   1116
      avg-loose        86.72  83.88  85.28  1398  1412  107.5   56.6
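
The slides do not define the count columns, but the seminar figures are consistent with P = AMAT / AUTO, R = GMAT / GOLD and F as the harmonic mean of P and R, with the loose rows adding their partial-match counts to the strict ones. The sketch below checks this reading on the `etime` rows; the interpretation of GOLD, AUTO, AMAT and GMAT as gold-standard, automatic and matched counts is an assumption inferred from the numbers.

```python
def prf(gold, auto, amat, gmat):
    """Precision, recall and F-measure in percent, rounded to two decimals.

    Assumed meaning of the columns: gold = annotations in the gold standard,
    auto = annotations produced automatically, amat / gmat = matched counts
    on the automatic and gold side respectively.
    """
    precision = amat / auto
    recall = gmat / gold
    f_measure = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * x, 2) for x in (precision, recall, f_measure))


# etime-strict row: exact matches only.
strict = prf(gold=216, auto=195, amat=192, gmat=192)        # (98.46, 88.89, 93.43)
# etime-loose row: strict matches plus the partial credit listed in that row.
loose = prf(gold=216, auto=195, amat=192 + 2, gmat=192 + 0.5)  # (99.49, 89.12, 94.02)
```

Both results reproduce the P, R and F values of the corresponding table rows.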

  35. Results - Weather

                              P      R      F
      temperature-strict  97.12  82.16  89.32
      temperature-loose   99.71  91.17  95.34
      location-strict     92.56  81.12  86.42
      location-loose      94.34  86.12  90.09
      condition-strict    81.63  71.81  75.52
      condition-loose     86.25  80.15  83.79
      time-strict         93.14  87.05  90.84
      time-loose          96.72  91.15  93.51

  36. Results - Football

  37. Thank you for your time
