
Comparing approaches to XML-based discourse modeling: Secondary Information Structuring

Felix Sasaki, University of Bielefeld, Research Group "Text-technology", Project "Sekimo". www.text-technology.de


Presentation Transcript


  1. Comparing approaches to XML-based discourse modeling: Secondary Information Structuring. Felix Sasaki, University of Bielefeld, Research Group "Text-technology", Project "Sekimo", www.text-technology.de

  2. Overview
     • Primary Information Structuring and its shortcomings for discourse modeling
     • A solution: Secondary Information Structuring
     • Its realization within our project:
     • Annotation format and analysis
     • A conceptual level
     • Mapping between data and conceptual level
     • Representation of the conceptual level
     • Resources developed within the framework
     • Related approaches

  3. Primary Information Structuring
     Textual data and annotations:
     <corpus> <bunsetsu> <VN>shucchou</VN> <PGen>no</PGen> </bunsetsu> <bunsetsu> <NF>ken</NF> </bunsetsu> </corpus>
     Document grammar constructs:
     <!ELEMENT bunsetsu (VN|PGen|NF)+> <!ELEMENT PGen (#PCDATA)> ...
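
The link between an annotation and its document grammar can be illustrated with a small check. The sketch below parses the sample from this slide with Python's standard `xml.etree.ElementTree` and tests each `bunsetsu` against the content model `(VN|PGen|NF)+`. ElementTree does not validate against DTDs, so the helper `bunsetsu_is_valid` is a hand-written stand-in for a validating parser and is not part of the original slides.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<corpus>
  <bunsetsu><VN>shucchou</VN><PGen>no</PGen></bunsetsu>
  <bunsetsu><NF>ken</NF></bunsetsu>
</corpus>
"""

# Child element names allowed by <!ELEMENT bunsetsu (VN|PGen|NF)+>
ALLOWED_CHILDREN = {"VN", "PGen", "NF"}


def bunsetsu_is_valid(element):
    """Hand-written check for the content model (VN|PGen|NF)+."""
    children = list(element)
    # "+" requires at least one child; every child must be one of the alternatives.
    return len(children) > 0 and all(c.tag in ALLOWED_CHILDREN for c in children)


corpus = ET.fromstring(SAMPLE)
results = [bunsetsu_is_valid(b) for b in corpus.iter("bunsetsu")]
print(results)
```

The check only says that the markup is well-formed with respect to one grammar; it says nothing about how `bunsetsu` relates to categories from other grammars, which is the gap the following slides address.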

  4. Its shortcomings for discourse modeling
     Element for Japanese dependency grammar:
     <corpus> <bunsetsu> <VN>shucchou</VN> <PGen>no</PGen> </bunsetsu> <bunsetsu> <NF>ken</NF> </bunsetsu> </corpus>
     <!ELEMENT bunsetsu (VN|PGen|NF)+> <!ELEMENT PGen (#PCDATA)> ...
     Element for HPSG grammar:
     <corpus> <NP> <COMP><NP.NO><COMP> <VN>shucchou</VN></COMP> <HD><PGen>no</PGen></HD> </NP.NO></COMP><HD> <NF>ken</NF></HD> </NP> </corpus>
     <!ELEMENT COMP (NP.NO|VN)+>
     Open question: what is the relation between theory-, language- or domain-specific document grammars and annotations?
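
The problem on this slide can be made concrete with a short sketch, again using only Python's standard library. It compares the bunsetsu annotation and the HPSG annotation of the same phrase: both carry identical primary data, yet the element names specific to each grammar are disjoint. The function names `primary_data` and `element_names` are illustrative and do not come from the slides.

```python
import xml.etree.ElementTree as ET

BUNSETSU_LAYER = (
    "<corpus><bunsetsu><VN>shucchou</VN><PGen>no</PGen></bunsetsu>"
    "<bunsetsu><NF>ken</NF></bunsetsu></corpus>"
)
HPSG_LAYER = (
    "<corpus><NP><COMP><NP.NO><COMP><VN>shucchou</VN></COMP>"
    "<HD><PGen>no</PGen></HD></NP.NO></COMP><HD><NF>ken</NF></HD></NP></corpus>"
)


def primary_data(xml_text):
    """Concatenate all character data, i.e. the text without markup."""
    return "".join(ET.fromstring(xml_text).itertext())


def element_names(xml_text):
    """Collect the element names used in one annotation layer."""
    return {el.tag for el in ET.fromstring(xml_text).iter()}


same_text = primary_data(BUNSETSU_LAYER) == primary_data(HPSG_LAYER)
shared = element_names(BUNSETSU_LAYER) & element_names(HPSG_LAYER)
only_bunsetsu = element_names(BUNSETSU_LAYER) - shared
only_hpsg = element_names(HPSG_LAYER) - shared
print(same_text, sorted(only_bunsetsu), sorted(only_hpsg))
```

Nothing in either document grammar states how `bunsetsu` relates to `COMP` or `HD`; that relation has to be expressed somewhere outside the two grammars.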

  5. Overview
     • Primary Information Structuring and its shortcomings
     • A solution: Secondary Information Structuring:
     • Annotation format and analysis
     • A conceptual level
     • Mapping between data and conceptual level
     • Representation of the conceptual level
     • Resources developed within the framework
     • Related approaches

  6. Secondary Information Structuring
     [Diagram] Primary Information Structuring: a pool of unrelated document grammar constructs (bunsetsu, NP, COMP, HD). Secondary Information Structuring: the conceptual level, with interrelated document grammar constructs. Annotation format and annotation analysis: multiple annotations of the same primary data (Annotation 1, Annotation 2, ..., Annotation n).

  7. Annotation format and annotation analysis
     • Multiple annotations of the same primary data
     • Analysis 1: multilayer relations, for relations between annotations on separate layers:
     • identity
     • endpoint_is_startingpoint
     • ...
     • Analysis 2: caterpillar expressions, for relations within the tree structure of a single layer if there are no meta-relations for annotation units:
     • Analysis 3: sub-classification of annotation units according to Analysis 1 and 2
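
Analysis 1 can be sketched by computing character offsets for every element on two layers and comparing them. The relation names `identity` and `endpoint_is_startingpoint` come from the slide; their exact definitions below (equal spans, and end offset of one unit equal to start offset of the other) are an assumption, as are the helper names.

```python
import xml.etree.ElementTree as ET

BUNSETSU_LAYER = (
    "<corpus><bunsetsu><VN>shucchou</VN><PGen>no</PGen></bunsetsu>"
    "<bunsetsu><NF>ken</NF></bunsetsu></corpus>"
)
HPSG_LAYER = (
    "<corpus><NP><COMP><NP.NO><COMP><VN>shucchou</VN></COMP>"
    "<HD><PGen>no</PGen></HD></NP.NO></COMP><HD><NF>ken</NF></HD></NP></corpus>"
)


def element_spans(xml_text):
    """Return (tag, start, end) character offsets for every element."""
    spans = []

    def walk(element, offset):
        start = offset
        offset += len(element.text or "")
        for child in element:
            offset = walk(child, offset)
            offset += len(child.tail or "")
        spans.append((element.tag, start, offset))
        return offset

    walk(ET.fromstring(xml_text), 0)
    return spans


def identity(a, b):
    # Assumed definition: both units cover exactly the same primary data.
    return a[1:] == b[1:]


def endpoint_is_startingpoint(a, b):
    # Assumed definition: unit b starts where unit a ends.
    return a[2] == b[1]


def related(layer_a, layer_b, relation):
    return [(a, b) for a in layer_a for b in layer_b if relation(a, b)]


bunsetsu_units = [s for s in element_spans(BUNSETSU_LAYER) if s[0] == "bunsetsu"]
hpsg_units = [s for s in element_spans(HPSG_LAYER) if s[0] in {"COMP", "HD"}]

identical = related(bunsetsu_units, hpsg_units, identity)
adjacent = related(bunsetsu_units, hpsg_units, endpoint_is_startingpoint)
print(identical)
print(adjacent)
```

On this sample, the first `bunsetsu` coincides with the outer `COMP` and the second with the final `HD`, which is the kind of cross-layer configuration that Analysis 3 could use to sub-classify annotation units.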

  8. Secondary Information Structuring
     [Diagram as on slide 6, highlighting the conceptual level]

  9. Conceptual level: Basic structure
     [Diagram] Model: HPSG. Model-specific concepts: Head-general, Comp-general, Head-sub1, Head-sub2.

  10. Conceptual level: Relations
      [Diagram] Model: HPSG. Model-specific concepts: Head-general, Comp-general, Head-sub1, Head-sub2. Relation partOf (Head-general, HPSG); relation subClassOf (Head-sub1, Head-general); interconceptual properties. Further labels in the diagram: H1, B2.

  11. Interconceptual properties
      [Diagram] HPSG with Head-general, Comp-general, Head-sub1 and Head-sub2, linked by caterpillarToComp (right ‘Comp-general’) and "starting point is end point".
      Triple notation:
      Head-general caterpillarToComp right ‘Comp-general’
      Head-general starting point is end point Comp-general
      Head-sub1 starting point is end point Head-sub2
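
The triple notation maps directly onto a simple data structure. The sketch below stores the three triples from this slide as Python named tuples and looks up the objects of a given subject and predicate; the `Triple` type and the `objects_of` helper are illustrative names, not part of the project's software.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# The three interconceptual properties listed on the slide.
TRIPLES = [
    Triple("Head-general", "caterpillarToComp", "right 'Comp-general'"),
    Triple("Head-general", "starting point is end point", "Comp-general"),
    Triple("Head-sub1", "starting point is end point", "Head-sub2"),
]


def objects_of(subject, predicate):
    """Return all objects linked to a subject by a predicate."""
    return [
        t.object
        for t in TRIPLES
        if t.subject == subject and t.predicate == predicate
    ]


print(objects_of("Head-general", "starting point is end point"))
print(objects_of("Head-general", "caterpillarToComp"))
```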

  12. Secondary Information Structuring
      [Diagram as on slide 6, highlighting the mapping between data and conceptual level]

  13. Mapping between data and conceptual level
      • Key concept: interconceptual properties correspond to configurations between annotations on different annotation layers, or to caterpillar expressions!
      • Document grammar constructs, which are the basis for the annotation layers, are mapped manually to superordinate concepts
      • For all subordinate concepts, the mapping can be inferred automatically
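
One possible reading of the last two points is sketched below: only the superordinate concept carries a manual mapping, and a subordinate concept obtains its document grammar constructs by following subClassOf links upward. The element names `hd` and `Comp` appear on the mapping visualization slide; pairing them with Head-General and Comp-General, and the inference procedure itself, are assumptions made for illustration.

```python
# Assumed subClassOf links between model-specific concepts.
SUBCLASS_OF = {"Head-Sub1": "Head-General", "Head-Sub2": "Head-General"}

# Assumed manual mapping: document grammar construct -> superordinate concept.
MANUAL_MAPPING = {"hd": "Head-General", "Comp": "Comp-General"}


def superordinate(concept):
    """Follow subClassOf links up to the topmost concept."""
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
    return concept


def inferred_constructs(concept):
    """Constructs for a concept, inherited from its superordinate concept."""
    top = superordinate(concept)
    return sorted(c for c, target in MANUAL_MAPPING.items() if target == top)


print(inferred_constructs("Head-Sub1"))
print(inferred_constructs("Comp-General"))
```

Under this reading, adding a new subordinate concept requires no new manual mapping, only a new subClassOf link.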

  14. Visualization of the mapping
      [Diagram] Models (HPSG: Head-General, Comp-General, Head-Sub1, Head-Sub2, with caterpillarToComp right ‘Comp-General’ and "starting point is end point"): intensional, declarative description of axioms for document grammar constructs.
      Pool of document grammar constructs: <xsd:element name="hd"/> <xsd:element name="bunsetsu"/> <xsd:element name="Comp"/> ...
      Extension: document grammar constructs and annotated documents.
      Links between the two levels: manual mapping; automatically inferred.

  15. Operations between data and conceptual level
      [Diagram] Between Primary Information Structuring (language- and theory-specific document grammars with elements and attributes; Annotation 1, Annotation 2, ..., Annotation n) and Secondary Information Structuring: theory-driven interrelation; data-based interrelation; validation of hypotheses and transformation of data (not yet implemented).

  16. Conceptual level: Representation as RDFS
      • RDF Schema: “Resource Description Framework, Vocabulary Description Language”
      • Offers the constructs which are necessary for the representation of the models
        → Integration of many other, abstract resources, e.g. lexical knowledge (WordNet)
      • More expressive languages, e.g. OWL, build on RDFS
        → Ontological knowledge (e.g. SUMO) or linguistic ontologies (e.g. GOLD) which use these languages can be related to the conceptual level
      “Abstract” language resources can be combined with annotated data

  17. Visualization of the RDFS representation
      [Diagram] Legend: rdfs:subClassOf, rdf:property. Classes (rdfs:Class): A with A-1 (A-1-1, A-1-2) and A-2; B with B-1 (B-1-1, B-1-2) and B-2.
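
A class hierarchy of this shape can be written out as RDFS statements with a few lines of Python. The sketch below emits N-Triples for the A branch and derives inherited superclasses, relying on the fact that `rdfs:subClassOf` is transitive. The `rdf:` and `rdfs:` namespace URIs are the standard W3C ones; the `EX` namespace is hypothetical, and the hierarchy itself is read off the class names on the slide, which is an assumption.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/conceptual-level#"  # hypothetical namespace

# Assumed hierarchy, inferred from the class names in the diagram.
SUBCLASS_OF = {"A-1": "A", "A-2": "A", "A-1-1": "A-1", "A-1-2": "A-1"}


def n_triples(subclass_of):
    """Serialize class declarations and subClassOf links as N-Triples lines."""
    classes = sorted(set(subclass_of) | set(subclass_of.values()))
    lines = [f"<{EX}{c}> <{RDF}type> <{RDFS}Class> ." for c in classes]
    lines += [
        f"<{EX}{sub}> <{RDFS}subClassOf> <{EX}{sup}> ."
        for sub, sup in sorted(subclass_of.items())
    ]
    return lines


def superclasses(name, subclass_of):
    """Direct and inherited superclasses, nearest first."""
    found = []
    while name in subclass_of:
        name = subclass_of[name]
        found.append(name)
    return found


document = "\n".join(n_triples(SUBCLASS_OF))
print(document)
print(superclasses("A-1-1", SUBCLASS_OF))
```

Because the output is plain RDF, it could in principle be loaded next to other RDFS or OWL resources, which is the kind of integration the preceding slide describes.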

  18. Resources: Primary Information Structuring
      [Diagram as on slide 15] Resources on the level of Primary Information Structuring:
      • HPSG annotation of the VERBMOBIL treebank
      • bunsetsu annotation of the VERBMOBIL treebank
      • annotation of tinkertoy dialogues

  19. Resources: Secondary Information Structuring
      [Diagram as on slide 15] Resources on the level of Secondary Information Structuring:
      • Japanese functional pragmatics (JadEx project)
      • bunsetsu categories
      • HPSG-related categories
      • General Japanese linguistic categories

  20. Overview
      • Primary Information Structuring and its shortcomings
      • A solution: Secondary Information Structuring:
      • Annotation format and analysis
      • A conceptual level
      • Mapping between data and conceptual level
      • Representation of the conceptual level
      • Resources developed within the framework
      • Related approaches

  21. Related methodology
      • ISO initiative on language resource standards (TC 37/SC 4)
      • The creation of general and specific annotation vocabularies:
      • VAML (Virtual Annotation Markup Language)
      • CAML (Concrete Annotation Markup Language)
      • Applied within the latest version of the Corpus Encoding Standard
      • Difference from our methodology: relations between VAMLs and CAMLs are primarily based upon tree-structured data

  22. XML-based discourse modeling
      • Discourse modeling
      • A modeling framework (ontologically empty), not a specific model
      • Document grammars supply annotation categories, without a specific interpretation
      • The expressive power of trees is enhanced
      • Applicable mainly to textual data, not to multimodal domains
      • XML-based? Yes!
      • XML as an enhanced data model
      • Document grammars are sufficient and useful for Primary Information Structuring

