1 / 9

Merging Terminology and Disambiguation

Merging Terminology and Disambiguation. Tadej Štajner, Mārcis Pinnis MLW-LT WG Meeting, 24.1.2013, Prague. Summary. 1) We can combine Terminology and Disambiguation under one umbrella, but at the same time do not loose backwards compatibility.

irish
Download Presentation

Merging Terminology and Disambiguation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Merging Terminology and Disambiguation TadejŠtajner, MārcisPinnis MLW-LT WG Meeting, 24.1.2013, Prague

  2. Summary • 1) We can combine Terminology and Disambiguation under one umbrella, but at the same time do not loose backwards compatibility. • 2) There are compromises, however (in what can and what cannot be annotated)!

  3. Current state • term, termInfo*, • disambigIdent* • disambigClass* • disambigGranularity (entity|lexicalConcept|ontologyConcept) • disambigConfidence • Can’t support simultaneous annotation on multiple granularity levels • Can there be more granularity levels?

  4. Common ground • Both data categories model things in the form of <fragment, relationship, URI> • The current relationships that we model in both DCs are term, entity, lexical concept, ontology concept, class type • With some exceptions: • its:term=“yes” • its:disambigSource, its:disambigIdent

  5. Scenario A • Add ‘term’ as another disambigGranularity level, keep its:term as a flag to allow for the simple use case • Fits in same framework • Precludes use of multiple levels on the same node, so it fails on this constraint

  6. Suggestions • Divide Disambiguation in smaller parts dividing the "granularities" (or levels) in separate parts to accomodate multiple simultaneous levels • each of the levels serves a different purpose for different target audiences and scenarios • Treat Terminology as equally important level • This is where we end up under one umbrella • Implement this in the spec as a set of attributes per level that have a similar pattern for every level • This will allow independency of the different levels • Keep the cardinality to 1 value for every attribute: • users should use one tool for one text analysis level - if they use multiple, they have to create a wrapper that is able to combine the outputs. Otherwise you will end up having conflicting annotations anyway – we shouldn‘t encourage that. • Keep the inheritance of annotations as-is - no hierarchical entities

  7. Scenario B • Keep Terminology as-is, drop disambigGranularity, and encode the levels in the attributes themselves: • entityIdent*, lexicalConcept*, ontologyConcept* - many new attributes, following the same pattern • Allows annotating multiple levels on same node • Every “level” gets its own set of attributes: • *ref • *refPointer • * + *source • *confidence Same pattern appears everywhere – this would make adoption easier compared to declaring granularity.

  8. Hierarchical named entities? • Difficult to interpret on the individual node level: • <b disambigClassRef=“nerd:Person”>Mayor of <b disambigClassRef=“nerd:City”>London</b></b> • “London” would have classes Person and City. Can we really say that London is a Person? (Answer: not without looking at the surrounding nodes) Painful to implement!

  9. Other possible levels • Proposal: add a loosely-defined “keyword” level for annotations which don’t fit in others

More Related