Overview of the ISO 16642 TMF: A Framework for Computerized Terminologies
This document provides an in-depth overview of the ISO 16642 Terminological Markup Framework (TMF), developed by Laurent Romary at Laboratoire Loria. It outlines general principles for representing computerized terminologies and maintaining interoperability across different formats. Key components include definitions of underlying structures, the Terminological Markup Language (TML), and examples of terminology entries. The framework emphasizes the importance of using standardized data categories for effective terminology management and offers guidance on achieving interoperability through functional tools.
ISO 16642 TMF - Terminological Markup Framework Laurent Romary - Laboratoire Loria
General principles • Expressing constraints on the representation of computerized terminologies • What is the underlying structure of computerized terminologies? • Which data categories are used, and under which conditions? • Maintaining interoperability between representations • Providing a conceptual tool to compare two given formats
Definitions • TMF: Terminological Mark-up Framework • Definition of the underlying structures and mechanisms needed for the computer representation of terminological data • Independence with regard to any specific format • TML: Terminological Mark-up Language • One specific representation format generated within TMF • e.g. DXLT is one possible TML
A family of formats [Figure: TMF at the top, from which several TMLs (TML1, TML2, TML3, …) are derived, among them Geneter and DXLT]
Meta-model Representing the underlying structure of terminological data
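[Figure: UML diagram of the metamodel. A Terminological Data Collection contains Global Information (0:1), Complementary Information (*) and Terminological Entries (*); each Terminological Entry contains Language Sections (*); each Language Section contains Term Sections (*); each Term Section contains Term Component Sections (*); Terminology-related Information (*) can be attached to these levels]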
The structural skeleton
Terminological Data Collection (TDC)
   Global Information (GI)
   Complementary Information (CI)
   * Terminological Entry (TE)
      * Language Section (LS)
         * Term Level (TL)
            * Term Component Level (TCL)
(* = zero or more occurrences)
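A minimal Python sketch of the skeleton as nested data classes, assuming each level simply holds its data-category/value pairs in a dict (so one value per data category per level, which is a simplification). The class and field names are hypothetical, not defined by ISO 16642; the trailing comments give the cardinalities from the metamodel, and the usage builds the ID67 entry from the walkthrough.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical rendering of the TMF levels; each level keeps its
# data-category/value pairs in a dict (one value per category, a simplification).


@dataclass
class TermComponentSection:
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class TermSection:
    features: Dict[str, str] = field(default_factory=dict)
    components: List[TermComponentSection] = field(default_factory=list)  # *


@dataclass
class LanguageSection:
    features: Dict[str, str] = field(default_factory=dict)
    terms: List[TermSection] = field(default_factory=list)  # *


@dataclass
class TerminologicalEntry:
    features: Dict[str, str] = field(default_factory=dict)
    languages: List[LanguageSection] = field(default_factory=list)  # *


@dataclass
class TerminologicalDataCollection:
    global_info: Optional[Dict[str, str]] = None  # 0:1
    complementary_info: List[Dict[str, str]] = field(default_factory=list)  # *
    entries: List[TerminologicalEntry] = field(default_factory=list)  # *


# The ID67 entry from the walkthrough, expressed against this skeleton.
entry = TerminologicalEntry(
    features={"id": "ID67", "subjectField": "manufacturing"},
    languages=[
        LanguageSection({"lang": "en"}, [
            TermSection({"term": "alpha smoothing factor", "termType": "fullForm"}),
        ]),
        LanguageSection({"lang": "hu"}, [TermSection({"term": "Alfa ..."})]),
    ],
)
tdc = TerminologicalDataCollection(entries=[entry])
```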
How does this work? Walking through an example…
DXLT example
<termEntry id='ID67'>
  <descrip type='subjectField'>manufacturing</descrip>
  <descrip type='definition'>A value between 0 and 1 used in ...</descrip>
  <langSet lang='en'>
    <tig>
      <term>alpha smoothing factor</term>
      <termNote type='termType'>fullForm</termNote>
    </tig>
  </langSet>
  <langSet lang='hu'>
    <tig>
      <term>Alfa ...</term>
    </tig>
  </langSet>
</termEntry>
Identifying the structural skeleton
TE: id='ID67' [attribute]; subjectField='manufacturing' [typedElement]; definition='A value…' [typedElement]
   LS: lang='en' [attribute]
      TS: term='alpha smoothing factor' [element]; termType='fullForm' [typedElement]
   LS: lang='hu' [attribute]
      TS: term='…' [element]
(TE: Terminological Entry; LS: Language Section; TS: Term Section)
TMF information model
TE: id='ID67'; subjectField='manufacturing'; definition='A value…'
   LS: lang='en'
      TS: term='alpha smoothing factor'; termType='fullForm'
   LS: lang='hu'
      TS: term='…'
GMT representation
<struct type="TE">
  <feat type="id">ID67</feat>
  <feat type="subjectField">manufacturing</feat>
  <feat type="definition">A value between 0 and 1 used in ...</feat>
  <struct type="LS">
    <feat type="lang">en</feat>
    <struct type="TS">
      <feat type="term">alpha smoothing factor</feat>
      <feat type="termType">fullForm</feat>
    </struct>
  </struct>
  <struct type="LS">
    <feat type="lang">hu</feat>
    <struct type="TS">
      <feat type="term">Alfa ...</feat>
    </struct>
  </struct>
</struct>
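The step from the DXLT entry to its GMT form can be sketched with Python's standard xml.etree.ElementTree. The three tables (LEVELS, TYPED, PLAIN) are hypothetical and cover only the elements used in the ID67 example; in practice a TML specification would supply this information. Elements not listed in any table are silently skipped in this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping for a small DXLT subset.
LEVELS = {"termEntry": "TE", "langSet": "LS", "tig": "TS"}  # elements that open a level
TYPED = {"descrip", "termNote"}  # TypedElement style: datcat name is in @type
PLAIN = {"term"}                 # Element style: datcat name is the tag itself


def to_gmt(node: ET.Element) -> ET.Element:
    """Convert one DXLT level element (and its sub-levels) into a GMT <struct>."""
    struct = ET.Element("struct", type=LEVELS[node.tag])
    # Attributes on a level element are Attribute-style data categories.
    for name, value in node.attrib.items():
        ET.SubElement(struct, "feat", type=name).text = value
    for child in node:
        if child.tag in LEVELS:
            struct.append(to_gmt(child))  # recurse into the nested level
        elif child.tag in TYPED:
            ET.SubElement(struct, "feat", type=child.get("type")).text = child.text
        elif child.tag in PLAIN:
            ET.SubElement(struct, "feat", type=child.tag).text = child.text
        # anything else is ignored in this sketch
    return struct


dxlt = """<termEntry id='ID67'>
  <descrip type='subjectField'>manufacturing</descrip>
  <descrip type='definition'>A value between 0 and 1 used in ...</descrip>
  <langSet lang='en'>
    <tig>
      <term>alpha smoothing factor</term>
      <termNote type='termType'>fullForm</termNote>
    </tig>
  </langSet>
  <langSet lang='hu'><tig><term>Alfa ...</term></tig></langSet>
</termEntry>"""

gmt = to_gmt(ET.fromstring(dxlt))
print(ET.tostring(gmt, encoding="unicode"))
```

Going the other way (GMT to another TML) would use the same tables read in reverse, which is why GMT can serve as the pivot between two formats.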
TML à la mode ISO • Ingredients • A structural skeleton (take the TMF metamodel) • A reference Data Category Registry (ISO 12620 is a good place to find one) • Recipe • Choose some data categories from the registry • You can even constrain the values of your datcats • Associate a style and a vocabulary with each datcat • You can draw inspiration from others (DXLT) • Serve it hot to your software guy with a piece of SALT software
GMT: Generic Mapping Tool
Background • Interoperability principle • If two TMLs have exactly the same DCS (data category selection), they are equivalent, even if they differ radically in style and vocabulary. • Consequence • When two TMLs are interoperable, it is always possible to define a filter from one to the other • GMT is the intermediate representation used to do so
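The interoperability principle reduces to comparing data category selections. In this hedged sketch, a TML specification is modelled as a hypothetical dict from data category name to its style and vocabulary; only the keys are compared, since style and vocabulary do not affect equivalence. The two sample specifications are illustrative, not real TMLs.

```python
from typing import Dict, Set, Tuple

# Hypothetical TML specification: data category -> (style, element/attribute names).
TmlSpec = Dict[str, Tuple[str, ...]]

dxlt_like: TmlSpec = {
    "definition": ("TypedElement", "descrip", "definition"),
    "term": ("Element", "term"),
    "termType": ("TypedElement", "termNote", "termType"),
}
# Same data categories, radically different style and vocabulary.
other_tml: TmlSpec = {
    "definition": ("Element", "def"),
    "term": ("Attribute", "t"),
    "termType": ("ValuedElement", "kind"),
}


def dcs(spec: TmlSpec) -> Set[str]:
    """The data category selection (DCS) of a TML: just the category names."""
    return set(spec)


def interoperable(a: TmlSpec, b: TmlSpec) -> bool:
    """Equivalent TMLs share exactly the same DCS; style and vocabulary are ignored."""
    return dcs(a) == dcs(b)


def missing_categories(a: TmlSpec, b: TmlSpec) -> Set[str]:
    """Data categories of `a` that `b` cannot express (what a filter a -> b would lose)."""
    return dcs(a) - dcs(b)


print(interoperable(dxlt_like, other_tml))
```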
From one TML to another • GMT: Generic Mapping Tool • An abstract XML representation • Identification of levels: <struct type="LS">…</struct>, a recursive element • Representation of data categories: <feat type="definition">…</feat>
GMT description (cont.) • Bracketing features: <brack> <feat type="classificationCode">xxx</feat> <feat type="classificationSystem">Lenoc</feat> </brack>
GMT description (cont.) • Annotating information: <feat type="definition">pencil whose <annot type="characteristic">casing</annot> is fixed around a central graphite medium which is used for writing or making marks</feat>
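Reading bracketed and annotated features back out of GMT can be sketched with xml.etree.ElementTree. The enclosing <struct type="TE"> is an assumption added only to make the two fragments a single well-formed document. Each <brack> becomes one dict of grouped features, and itertext() rebuilds the definition's plain text with the annotation markup stripped.

```python
import xml.etree.ElementTree as ET

# The two fragments from the slides, wrapped in a hypothetical TE-level struct.
gmt = """<struct type="TE">
  <brack>
    <feat type="classificationCode">xxx</feat>
    <feat type="classificationSystem">Lenoc</feat>
  </brack>
  <feat type="definition">pencil whose <annot type="characteristic">casing</annot> is fixed around a central graphite medium which is used for writing or making marks</feat>
</struct>"""

root = ET.fromstring(gmt)

# A <brack> groups features that only make sense together (code + system).
brackets = [
    {f.get("type"): f.text.strip() for f in b.findall("feat")}
    for b in root.findall("brack")
]

# An <annot> marks a span inside a feature value without breaking that value.
definition = root.find("feat[@type='definition']")
plain_text = "".join(definition.itertext())
annotations = [(a.get("type"), a.text) for a in definition.iter("annot")]

print(brackets, annotations)
```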
Data Categories A Formal Description
Data Category Registry [Figure: RDF graph of the registry, with the labels DCRegistry, rdf:about, Description, VersionNumber, dcsd:DataCategory and Data Category]
Data Category description [Figure: a Data Category resource with the properties dcsd:DCIdentifier, dcsd:DCParent, dcsd:DCName, dcsd:DCDefinition, dcsd:DCType (values S, C), dcsd:DCExample, dcsd:DCAdmin, dcsd:DCComment, dcsd:Content and dcsd:Level (locus)] SALT 2000-11-08/SEW
Levels and content [Figure: dcsd:Content has a dcsd:DataType and a dcsd:TargetType; Level/Loci and TargetType are each an rdf:Alt whose rdf:li items are lists of references to other data categories]
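The data category description can be mirrored as a plain Python record, which makes the level constraint (dcsd:Level) checkable by a tool. The field names are hypothetical renderings of the dcsd properties; reading DCType "S"/"C" as simple/complex is an assumption, and the sample values for /definition/ are illustrative only, not taken from ISO 12620.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataCategory:
    identifier: str                                      # dcsd:DCIdentifier
    name: str                                            # dcsd:DCName
    definition: str                                      # dcsd:DCDefinition
    dc_type: str                                         # dcsd:DCType: "S" or "C"
    parent: Optional[str] = None                         # dcsd:DCParent
    example: Optional[str] = None                        # dcsd:DCExample
    levels: List[str] = field(default_factory=list)      # dcsd:Level (rdf:Alt of loci)
    data_type: Optional[str] = None                      # dcsd:Content / dcsd:DataType
    target_types: List[str] = field(default_factory=list)  # dcsd:Content / dcsd:TargetType

    def __post_init__(self) -> None:
        if self.dc_type not in ("S", "C"):
            raise ValueError(f"DCType must be 'S' or 'C', got {self.dc_type!r}")

    def allowed_at(self, level: str) -> bool:
        """True if the data category may be used at the given TMF level (e.g. "LS")."""
        return level in self.levels


# Illustrative values only: hypothetical identifier, type and levels.
definition_dc = DataCategory(
    identifier="DC-definition",
    name="definition",
    definition="Statement that describes a concept",
    dc_type="S",
    levels=["TE", "LS"],
    data_type="string",
)
print(definition_dc.allowed_at("LS"), definition_dc.allowed_at("TCL"))
```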
Actualizing a DatCat: TMF-specific properties
Styling properties [Figure: a Data Category carries dcsd:Anchor and dcsd:Style; the Style has a dcsd:StyleName (Simple, Element, Attribute, TypedElement, ValuedElement or TVElement) plus, depending on the style, dcsd:ElementName, dcsd:AttributeName, dcsd:TypeValue, and dcsd:Value (for Simple)]
Attribute style description • dcsd:StyleName="Attribute" • Conditions of use: • Not valid for annotations • Required properties: • dcsd:AttributeName • Example: • dcsd:AttributeName="id" • <anchorElement id="xx54893">…</anchorElement>
Element style description • dcsd:StyleName="Element" • Required properties: • dcsd:ElementName • Example: • dcsd:ElementName="definition" • <definition>…</definition>
TypedElement style description • dcsd:StyleName="TypedElement" • Required properties: • dcsd:ElementName, dcsd:TypeValue • Example: • dcsd:ElementName="termNote" • dcsd:TypeValue="partOfSpeech" • <termNote type="partOfSpeech">N</termNote>
ValuedElement style description • dcsd:StyleName="ValuedElement" • Conditions of use: • Not valid for annotations • Required properties: • dcsd:ElementName • Example: • dcsd:ElementName="pos" • <pos value="noun"/>
TVElement style description • dcsd:StyleName="TVElement" • Conditions of use: • Not valid for annotations • Required properties: • dcsd:ElementName, dcsd:TypeValue • Example: • dcsd:ElementName="free" • dcsd:TypeValue="pos" • <free type="pos" value="noun"/>
Simple style description • dcsd:StyleName="Simple" • Conditions of use: • Expresses the value of a simple data category • Required properties: • dcsd:Value • Example: • dcsd:Value="Nom" • <pos>Nom</pos>
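The six styles can be sketched as one Python function that attaches a value to an anchor element, following the required properties listed in the style descriptions. The function name, the dict keys (the dcsd property names without their prefix), and the labels parameter used to apply a Simple-style dcsd:Value are hypothetical; the for_annotation check encodes the "not valid for annotations" conditions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Styles that the descriptions mark as "not valid for annotations".
NO_ANNOTATION = {"Attribute", "ValuedElement", "TVElement"}


def render(anchor: ET.Element, style: Dict[str, str], value: str,
           labels: Optional[Dict[str, str]] = None,
           for_annotation: bool = False) -> None:
    """Attach one data-category value to `anchor` according to `style`.

    `style` mirrors the dcsd properties, e.g. {"StyleName": "TypedElement",
    "ElementName": "termNote", "TypeValue": "partOfSpeech"}.
    `labels` maps a value to the dcsd:Value of its Simple style (e.g. "noun" -> "Nom").
    """
    name = style["StyleName"]
    if for_annotation and name in NO_ANNOTATION:
        raise ValueError(f"{name} style is not valid for annotations")
    text = labels.get(value, value) if labels else value
    if name == "Attribute":
        anchor.set(style["AttributeName"], text)
    elif name == "Element":
        ET.SubElement(anchor, style["ElementName"]).text = text
    elif name == "TypedElement":
        ET.SubElement(anchor, style["ElementName"], type=style["TypeValue"]).text = text
    elif name == "ValuedElement":
        ET.SubElement(anchor, style["ElementName"], value=text)
    elif name == "TVElement":
        ET.SubElement(anchor, style["ElementName"],
                      type=style["TypeValue"], value=text)
    else:
        raise ValueError(f"unknown style: {name}")


# Usage, reproducing the examples from the style descriptions.
entry = ET.Element("anchorElement")
render(entry, {"StyleName": "Attribute", "AttributeName": "id"}, "xx54893")
render(entry, {"StyleName": "TypedElement", "ElementName": "termNote",
               "TypeValue": "partOfSpeech"}, "N")
render(entry, {"StyleName": "TVElement", "ElementName": "free",
               "TypeValue": "pos"}, "noun")
render(entry, {"StyleName": "Element", "ElementName": "pos"}, "noun",
       labels={"noun": "Nom"})
print(ET.tostring(entry, encoding="unicode"))
```

Keeping the style description as data (rather than hard-coding tag names) is what lets one generic routine serve any TML built from the same data category selection.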
Two types of languages • Working language • The language used at a given place in a document, inherited along the XML hierarchy • Representation: xml:lang • Object language • The language being described at a given place in the terminological entry (e.g. the language of a Language Section) • Representation: as a data category "language", with a narrow scope
Example: DXLT
<langSet lang='en' xml:lang='fr'>
  <descrip type='definition'>Une valeur entre 0 et 1 utilisée…</descrip>
  <tig>
    <term xml:lang='en'>alpha smoothing factor</term>
    <termNote type='termType'>fullForm</termNote>
  </tig>
</langSet>
Example: GMT
<struct type="LS" xml:lang="fr">
  <feat type="language">en</feat>
  <feat type="definition">Une valeur entre 0 et 1 utilisée…</feat>
  <struct type="TL">
    <feat type="term" xml:lang="en">alpha smoothing factor</feat>
    <feat type="termType">fullForm</feat>
  </struct>
</struct>
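Resolving the working language amounts to finding the nearest xml:lang on the path from the root, while the object language is read from the "language" data category. A minimal sketch with xml.etree.ElementTree on the GMT example; working_languages is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# ElementTree exposes xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

gmt = """<struct type="LS" xml:lang="fr">
  <feat type="language">en</feat>
  <feat type="definition">Une valeur entre 0 et 1 utilisée…</feat>
  <struct type="TL">
    <feat type="term" xml:lang="en">alpha smoothing factor</feat>
    <feat type="termType">fullForm</feat>
  </struct>
</struct>"""


def working_languages(root: ET.Element) -> List[Tuple[str, Optional[str]]]:
    """Pair each <feat> with its working language (nearest xml:lang on or above it)."""
    result: List[Tuple[str, Optional[str]]] = []

    def walk(node: ET.Element, inherited: Optional[str]) -> None:
        lang = node.get(XML_LANG, inherited)  # an own xml:lang overrides the inherited one
        if node.tag == "feat":
            result.append((node.get("type"), lang))
        for child in node:
            walk(child, lang)

    walk(root, None)
    return result


root = ET.fromstring(gmt)
# Object language: the value of a data category, not an XML attribute.
object_language = root.find("feat[@type='language']").text
print(working_languages(root), object_language)
```

Note that the "language" feature itself is written in French (its working language) while stating that the section describes English (its object language), which is exactly the distinction the two representations keep apart.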
Conclusion • A general model for analysing and representing terminological data collections • An underlying formalism expressed in XML and RDF • Associated tools (SALT project) • DCSEditor • DCSBrowser • Automatic generation of XSLT filters and XML schemas from a given TML specification
Useful pointers • SALT project • http://www.loria.fr/projets/SALT • http://www.ttt.org/ • The TMF site • http://www.loria.fr/projets/TMF