Deployment and Evaluation Issues in Ontology-Based Information Extraction. Ex – Progress and Perspectives
Agenda • WIE – motivation and use cases • Extraction ontologies: structure and content • IE workflow • Authoring extraction ontologies: by hand; with use of domain ontologies; with use of other business metamodels • Results so far
(Web) Information Extraction • purpose: extract objects from documents, or semantically annotate existing documents • [Figure labels: name, place]
IE Use Cases • Extraction of objects of known, well-defined class(es) • Annotating document collections of any size • Sources: structured, semi-structured, free-text • Extraction should improve if: • documents contain some formatting (e.g. HTML) • this formatting is similar within or across document(s) • some examples are provided
Domains considered so far • Online products • Contact information • Seminars, events • Weather forecasts • Free-text business statements • Football
Sources of Knowledge for IE • ontology: the only mandatory source; may include hand-crafted patterns for typical attribute values • sample instances: possibly coupled with referring documents; used to get typical content and context of extractable items • common formatting structure of instances presented in a single document, or among documents from the same source
Extraction Ontologies • contain: • semantic structure of extracted data • additional IE "hooks", i.e. learned or ad-hoc patterns for typical content of extracted values and their context • their semantic structure describes the presentation of objects instead of real-world objects • we speak about presentation ontologies
Presentation Ontologies • contain concepts that are to be populated with many instances • => can be viewed as information ontologies • a class's attributes can be represented as a set of variables • => can be used as data structures • can contain additional higher-level restrictions • => can be looked upon as knowledge ontologies
Nature of Presentation Ontologies • presentation ontologies are of a slightly different nature than other models • they usually contain: • a single core class • its attributes • additional constraints
Source of Presentation Ontologies • typically designed by a human for a specific extraction task • single-purpose hand-crafting from scratch is tedious and can introduce inconsistencies • it should be possible to craft extraction models by reusing existing metamodels so that: • semantics of the extracted data are consistent with existing knowledge models • the need for initial domain analysis and for expert knowledge lessens (and so do the costs)
Hand-crafted Extraction Ontologies • Axioms: class level; attribute level • Patterns: class content; attribute value; attribute context; class context • Value constraints: word length; numeric value
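The ingredients listed above (a class, its attributes, value and context patterns, value constraints) can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the names `Attribute`, `ExtractionClass` and `candidate_values`, the regular expressions and the sample sentence are all hypothetical and do not reproduce the format used by the Ex system. The numeric value constraint is omitted for brevity.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Attribute:
    name: str
    # Regexes describing typical attribute values (hypothetical examples).
    value_patterns: List[str] = field(default_factory=list)
    # Regexes describing typical surrounding text; stored only, not used below.
    context_patterns: List[str] = field(default_factory=list)
    # Value constraint: minimum and maximum number of words in a value.
    word_length: Tuple[int, int] = (1, 10)


@dataclass
class ExtractionClass:
    name: str
    attributes: List[Attribute]
    # Class-level axioms kept as plain strings in this sketch.
    axioms: List[str] = field(default_factory=list)


def candidate_values(attr: Attribute, text: str) -> List[str]:
    """Return pattern matches that also satisfy the word-length constraint."""
    lo, hi = attr.word_length
    found = []
    for pattern in attr.value_patterns:
        for match in re.finditer(pattern, text):
            value = match.group(0)
            if lo <= len(value.split()) <= hi:
                found.append(value)
    return found


stime = Attribute(
    name="stime",
    value_patterns=[r"\b\d{1,2}:\d{2}\s?(?:am|pm)\b"],
    context_patterns=[r"starts at"],
    word_length=(1, 2),
)
seminar = ExtractionClass(name="Seminar", attributes=[stime])
text = "The seminar starts at 3:30 pm in Room 101."
values = candidate_values(stime, text)
```

Here `values` holds the candidate start times found in the sample sentence; a real extraction ontology would combine such candidates with context patterns and class-level axioms.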
Building upon Existing Models • crafting a presentation ontology from a preexisting knowledge model requires a transformation process • the transformation differs between source models, but there are some general steps: • choose / find the core class C • create its attributes in the presentation ontology • formulate ontological constraints over attributes • create additional "WIE hooks" to form a complete extraction ontology • as the expressiveness of the source models is usually very high, the transformation cannot be performed deterministically
Reuse of Domain Ontologies • transformation of a domain ontology mainly amounts to the general steps mentioned before • so far, we have been able to formalize a few general heuristic rules that can help an expert • a) to choose the core class of an incipient presentation ontology • b) to populate it with attributes
Transformation Rule a1) • A class C that has individuals directly asserted in the domain ontology should probably not become the core class in the presentation ontology. • [Diagram labels: Weather, Location, London, Prague, Innsbruck]
Transformation Rule a2) • If some property does not have an inverse property explicitly declared, a class C in the domain of this property is more likely to become the core class than any class that figures in its range. • [Diagram labels: Weather, hasLocation, Location]
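Rules a1) and a2) can be read as simple scoring hints over candidate classes. The sketch below is an illustration only: the `ObjectProperty` type, the function name and the +1 / -1 weights are assumptions made for this example, not part of the rules as stated.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ObjectProperty:
    name: str
    domain: str
    range_class: str
    inverse: Optional[str] = None  # explicitly declared inverse, if any


def core_class_scores(
    classes: List[str],
    individuals: Dict[str, List[str]],
    properties: List[ObjectProperty],
) -> Dict[str, int]:
    """Score classes as core-class candidates using rules a1) and a2)."""
    scores = {c: 0 for c in classes}
    for c in classes:
        # a1) classes with directly asserted individuals are penalized.
        if individuals.get(c):
            scores[c] -= 1
    for p in properties:
        # a2) no declared inverse: the domain class is favored over the range.
        if p.inverse is None:
            scores[p.domain] += 1
    return scores


scores = core_class_scores(
    classes=["Weather", "Location"],
    individuals={"Location": ["London", "Prague", "Innsbruck"]},
    properties=[ObjectProperty("hasLocation", "Weather", "Location")],
)
```

With this toy input, `Weather` ends up with a higher score than `Location`. The scores are only hints; the final choice of core class stays with the expert.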
Transformation Rule a3) • If a class C has a minimum cardinality restriction on property D whose range is class C1, such that C1 does not have any restrictions on the inverse property of D, then C1 should not become the core class. • [Diagram labels: Weather, hasLocation > 1, Location]
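Rule a3) can be sketched as a filter that collects classes which should not become the core class. The representation is hypothetical: `MinCardinality`, `excluded_by_a3` and the `inverse_of` mapping (property name to declared inverse name, in both directions) are assumptions for illustration, as is the property name `isLocationOf`.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class MinCardinality:
    on_class: str     # class C carrying the restriction
    prop: str         # property D
    range_class: str  # class C1, the range of D
    minimum: int


def excluded_by_a3(
    restrictions: List[MinCardinality], inverse_of: Dict[str, str]
) -> Set[str]:
    """Return range classes C1 that rule a3) advises against as core class."""
    restricted = {(r.on_class, r.prop) for r in restrictions}
    excluded = set()
    for r in restrictions:
        inverse = inverse_of.get(r.prop)
        # C1 is excluded when it has no restriction on the inverse of D
        # (or when no inverse is declared at all).
        if inverse is None or (r.range_class, inverse) not in restricted:
            excluded.add(r.range_class)
    return excluded


one_sided = excluded_by_a3(
    [MinCardinality("Weather", "hasLocation", "Location", 1)], {}
)
two_sided = excluded_by_a3(
    [
        MinCardinality("Weather", "hasLocation", "Location", 1),
        MinCardinality("Location", "isLocationOf", "Weather", 1),
    ],
    {"hasLocation": "isLocationOf", "isLocationOf": "hasLocation"},
)
```

In the one-sided case `Location` is excluded; when both directions carry a restriction, the rule excludes neither class.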
Transformation Rule a4) • If there is a chain of object properties then the classes at the ends of such a chain are more likely to form the core class. If a class C is at the end of more such chains, it is even more suitable for becoming the core class. • [Diagram labels: Weather, Weather Condition, Wind Direction]
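Rule a4) can be illustrated by enumerating maximal chains of object properties and counting how often each class sits at an end of a chain. This is a sketch under assumptions: object properties are reduced to (domain, range) pairs, the graph is assumed to be acyclic, both the first and the last class of a chain count as "ends", and the class names in the example are hypothetical.

```python
from collections import Counter, defaultdict
from typing import List, Tuple


def maximal_chains(edges: List[Tuple[str, str]]) -> List[List[str]]:
    """List all paths that start at a class with no incoming property
    and stop at a class with no outgoing property."""
    successors = defaultdict(list)
    has_incoming = set()
    nodes = set()
    for domain, range_class in edges:
        successors[domain].append(range_class)
        has_incoming.add(range_class)
        nodes.update((domain, range_class))

    chains = []

    def walk(path: List[str]) -> None:
        next_classes = successors.get(path[-1], [])
        if not next_classes:
            chains.append(path)
            return
        for nxt in next_classes:
            if nxt not in path:  # guard against accidental cycles
                walk(path + [nxt])

    for start in sorted(nodes - has_incoming):
        walk([start])
    return chains


def chain_end_counts(edges: List[Tuple[str, str]]) -> Counter:
    """Count, per class, the number of chains it begins or ends."""
    counts = Counter()
    for chain in maximal_chains(edges):
        counts[chain[0]] += 1
        counts[chain[-1]] += 1
    return counts


edges = [
    ("Weather", "WeatherCondition"),
    ("WeatherCondition", "WindDirection"),
    ("Weather", "Location"),
]
counts = chain_end_counts(edges)
```

In this toy graph `Weather` is at the end of two chains and is therefore the strongest candidate under a4), while the intermediate class `WeatherCondition` is at the end of none.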
Transformation Rule b1) • A datatype property may directly yield an attribute. Furthermore, a datatype property together with a chain of object properties (typically part-of properties) may yield an attribute too. • [Diagram labels: Weather, Weather Condition, Wind Direction]
Transformation Rule b2) • A set of mutually disjoint subclasses may yield an attribute even without a property counterpart in the source ontology. • [Diagram labels: Weather, Precipitation, Rain, Hail, Snow]
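Rule b2) can be sketched as a check that all subclasses of a class are pairwise disjoint, in which case the parent class is turned into an attribute whose possible values are the subclass names. The function name, the lower-casing of names and the dictionary layout of the result are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Tuple


def attribute_from_disjoint_subclasses(
    parent: str,
    subclasses: List[str],
    disjoint_pairs: Iterable[Tuple[str, str]],
) -> Optional[Dict[str, object]]:
    """Return an enumerated attribute if all subclasses are mutually disjoint."""
    declared = {frozenset(pair) for pair in disjoint_pairs}
    if len(subclasses) < 2:
        return None
    # Every unordered pair of subclasses must be declared disjoint.
    if all(frozenset(pair) in declared for pair in combinations(subclasses, 2)):
        return {
            "attribute": parent.lower(),
            "values": [s.lower() for s in subclasses],
        }
    return None


precipitation = attribute_from_disjoint_subclasses(
    "Precipitation",
    ["Rain", "Hail", "Snow"],
    [("Rain", "Hail"), ("Rain", "Snow"), ("Hail", "Snow")],
)
incomplete = attribute_from_disjoint_subclasses(
    "Precipitation", ["Rain", "Hail", "Snow"], [("Rain", "Hail")]
)
```

The first call yields a `precipitation` attribute with three possible values; the second returns `None` because not every pair is declared disjoint.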
Transformation Rule b3) • A set of mutually disjoint subclasses of some class together with a chain of object properties may yield an attribute. • [Diagram labels: Weather Condition, Precipitation, Rain, Hail, Snow]
Transformation Rule b4) • An object property of a class C whose object has some individuals asserted in the ontology may yield an attribute. • [Diagram labels: Weather, Location, London, Prague, Innsbruck]
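Rule b4) can be sketched as follows: every object property of the core class whose range class has asserted individuals becomes an attribute, with the individuals serving as sample values. The tuple layout `(property, domain, range)` and the function name are assumptions for this example.

```python
from typing import Dict, List, Tuple


def attributes_from_b4(
    core_class: str,
    properties: List[Tuple[str, str, str]],  # (property name, domain, range)
    individuals: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Map property names to sample values taken from asserted individuals."""
    attributes = {}
    for name, domain, range_class in properties:
        if domain == core_class and individuals.get(range_class):
            attributes[name] = list(individuals[range_class])
    return attributes


b4_attributes = attributes_from_b4(
    "Weather",
    [("hasLocation", "Weather", "Location"), ("hasSource", "Weather", "Source")],
    {"Location": ["London", "Prague", "Innsbruck"]},
)
```

Only `hasLocation` yields an attribute here, because the hypothetical `Source` class has no asserted individuals.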
Test Results • the numbers in the table count only the rules whose results were actually chosen • these tests verify that the transformation is possible
Test Results • there is a correlation between the use of rules b4) and a1), because both are based on the presence of instances
Test Results • if an ontology is in the form of a taxonomy, it can still be useful via rule b3)
Using other Business Metamodels • What metamodels? • theoretically, any • in practice, the most common ones: • UML • BPM • relational database models • ...
UML • industry standard • integrates models used in software engineering • comprises several groups of diagrams: • structural diagrams • behavioral diagrams • other
UML – Structural Diagrams • describe static structural constructs • the concept of a class is very similar to that in ontologies • class and object diagrams can be used similarly
Other UML Uses • Structure diagrams may help significantly, mainly with populating the ontology with attributes and with identifying part-of relations • Behavioral diagrams can yield attributes and general relations, but such use is rare • UML supplements can only provide some technical details, like attribute datatypes or sample values • Other UML diagrams can be used only very loosely
Relational Model • based on predicate logic and set theory, it has much in common with other means of specifying a domain • an entity (i.e. a table) can directly yield a class and its fields • references can be mapped to the class's properties • explicit specification of primary and secondary keys makes it easy to recognize an inverse property • supporting tables (for m:n relations) should not yield a core class
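The mapping from a relational schema can be sketched as follows: each table yields a class, its plain columns yield attributes, references yield properties, and supporting tables are skipped. Everything here is hypothetical: the `Table` type, the sample schema, and in particular the test used to detect a supporting table (at least two references and no other columns), which is only one possible heuristic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    name: str
    columns: List[str]
    # column name -> referenced table
    foreign_keys: Dict[str, str] = field(default_factory=dict)


def is_supporting_table(table: Table) -> bool:
    """Assumed heuristic: an m:n link table consists only of references."""
    return len(table.foreign_keys) >= 2 and set(table.columns) == set(
        table.foreign_keys
    )


def table_to_class(table: Table) -> Dict[str, object]:
    """Turn a table into a class description with attributes and properties."""
    return {
        "class": table.name,
        "attributes": [c for c in table.columns if c not in table.foreign_keys],
        "properties": dict(table.foreign_keys),
    }


schema = [
    Table("seminar", ["id", "title", "stime", "room_id"], {"room_id": "room"}),
    Table("room", ["id", "name"]),
    Table(
        "seminar_speaker",
        ["seminar_id", "speaker_id"],
        {"seminar_id": "seminar", "speaker_id": "speaker"},
    ),
]
candidate_classes = [table_to_class(t) for t in schema if not is_supporting_table(t)]
```

The link table `seminar_speaker` is filtered out, while `seminar` and `room` become candidate classes; choosing the core class among them still requires the expert.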
Business Process Model • describes a collection of activities needed to produce a specific output • every process depicts a change of state of an entity and should yield a possible value of an attribute, or the attribute itself • the event element and the choice element express a relation to some other entity • a set of processes should describe an entity, which can then possibly lead to a class
Results - Seminars
                  P      R      F      GOLD  AUTO  AMAT    GMAT
etime-strict      98.46  88.89  93.43  216   195   192     192
etime-loose       99.49  89.12  94.02  216   195   2       0.5
location-strict   58.78  74.15  65.58  325   410   241     241
location-loose    79.72  85.61  82.56  325   410   85.84   37.23
speaker-strict    71.11  68.73  69.90  371   360   256     255
speaker-loose     76.58  73.83  75.18  371   360   19.69   18.9
stime-strict      95.75  88.07  91.75  486   447   428     428
stime-loose       95.75  88.07  91.75  486   447   0       0
avg-strict        79.11  79.83  79.47  1398  1412  1117    1116
avg-loose         86.72  83.88  85.28  1398  1412  107.5   56.6
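The slides do not define the columns, so the following reading is an assumption: the etime figures are consistent with P = AMAT / AUTO, R = GMAT / GOLD and F being the harmonic mean of P and R, with the loose rows listing partial-match credit that is added on top of the strict matches. The function name `prf` is hypothetical.

```python
from typing import Tuple


def prf(
    gold: float, auto: float, auto_matched: float, gold_matched: float
) -> Tuple[float, float, float]:
    """Precision, recall and F-measure in percent, rounded to two decimals."""
    precision = auto_matched / auto if auto else 0.0
    recall = gold_matched / gold if gold else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return (
        round(100 * precision, 2),
        round(100 * recall, 2),
        round(100 * f_measure, 2),
    )


# etime, strict: only exact matches count.
etime_strict = prf(gold=216, auto=195, auto_matched=192, gold_matched=192)

# etime, loose: strict matches plus the partial-match credit from the loose row
# (an assumed interpretation of the AMAT and GMAT columns).
etime_loose = prf(gold=216, auto=195, auto_matched=192 + 2, gold_matched=192 + 0.5)
```

Under this reading both calls reproduce the P, R and F values reported for etime in the table.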
Results - Weather
                     P      R      F
temperature-strict   97.12  82.16  89.32
temperature-loose    99.71  91.17  95.34
location-strict      92.56  81.12  86.42
location-loose       94.34  86.12  90.09
condition-strict     81.63  71.81  75.52
condition-loose      86.25  80.15  83.79
time-strict          93.14  87.05  90.84
time-loose           96.72  91.15  93.51