Leveraging XLT: (Web-Enabled) Validation of Terminology Collections

Lee Gillam, University of Surrey. SALT Workshop, Antwerp, 31 January 2001.


Presentation Transcript


  1. Leveraging XLT: (Web-Enabled) Validation of Terminology Collections • Lee Gillam, University of Surrey • SALT Workshop, Antwerp, 31 January 2001

  2. Surrey-EU project history • Terminology Extraction and Management Projects: TWB, TWBII • Management of Text Collections: TRANSTERM • Term Resources: POINTER • Terminology Validation: INTERVAL • Convergence in SALT?

  3. XLT ‘opportunities’ • Complete terminology collections available in XML – enhancement/reuse of other collections • Large number of (multilingual) terms – difficult for humans to appraise • Terminology relates to usage – document collections highly relevant • Quantity of terms – no guarantee of quality

  4. (Web-Enabled) Validation • Relevant documents on the web – contextual information • Relevant documents on the ‘corporate internet’ – contextual information • Term usage in other organisations (glossaries)/as understood by Joe E.C. Taxpayer • Resource enrichment

  5. System Description • For a given (D)XLT collection of terminology: • Partition collection by specific criteria • Collect documents relevant to criteria • Analyse documents against the partitioned collection • Report results

  6. System Description • Partition collection by specific criteria: • Use of XPath • “Give me all terms in English” • //dxlt/text/body/termEntry/langSet[@lang='en']/ntig/termGrp/term/text() • Alternative example: “Give me all subjectFields” • //dxlt/text/body/termEntry/descrip[@type='subjectField']/text() [check!]
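
A minimal sketch of the partitioning step in Python, using only the standard library. The sample document is a hypothetical miniature modelled on the paths in the slide, and the function names `terms_in_language` and `subject_fields` are illustrative, not part of the prototype. `xml.etree.ElementTree` supports only a subset of XPath (no `text()` step), so the sketch selects the elements and reads their `.text` instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature collection; element names follow the slide's XPaths.
SAMPLE = """
<dxlt>
  <text>
    <body>
      <termEntry id="HR-7">
        <descrip type="subjectField">200</descrip>
        <langSet lang="en">
          <ntig><termGrp><term>aberration of light</term></termGrp></ntig>
        </langSet>
        <langSet lang="fr">
          <ntig><termGrp><term>exemple de terme</term></termGrp></ntig>
        </langSet>
      </termEntry>
    </body>
  </text>
</dxlt>
"""


def terms_in_language(root, lang):
    """'Give me all terms in <lang>': select term elements, return their text."""
    path = f"./text/body/termEntry/langSet[@lang='{lang}']/ntig/termGrp/term"
    return [term.text for term in root.findall(path)]


def subject_fields(root):
    """'Give me all subjectFields': select matching descrip elements."""
    path = "./text/body/termEntry/descrip[@type='subjectField']"
    return [descrip.text for descrip in root.findall(path)]


root = ET.fromstring(SAMPLE)  # root is the <dxlt> element
print(terms_in_language(root, "en"))
print(subject_fields(root))
```

A full XPath engine would accept the slide's expressions verbatim, including the trailing `text()` step; the subset used here keeps the sketch dependency-free.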

  7. System Description • Collect documents relevant to criteria • For terms, try internet/intranet searching • For subject field classifications, classification documents will be relevant • For definitions, comparisons with other glossaries may provide useful validating information • …..

  8. System Description • Analyse documents against the partitioned collection • Are the terms contained in the documents? • Are the terms in the documents now used as parts of compounds? • What are the contexts in which the terms are used? • Are there a number of potential other definitions for a particular term? • Does this fit in with a specific classification? • ….
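
The first three questions on this slide (presence, compounds, contexts) can be sketched as a simple text scan. This is an illustrative approximation, not the prototype's method: the function name `analyse_term`, the stop-word list, the one-word compound heuristic, and the sample documents are all assumptions.

```python
import re

# Assumed stop words: a preceding word from this set is not treated as a modifier.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "is", "to", "for"}

# Hypothetical collected documents.
DOCS = [
    "Circuit switching reserves a path. Packet switching does not.",
    "The switching fabric is fast.",
]


def analyse_term(term, documents, window=30):
    """Count occurrences of a term, list candidate compounds, collect contexts."""
    term_re = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    # Candidate compound: one word immediately preceding the term.
    compound_re = re.compile(
        r"\b([A-Za-z][\w-]*)\s+" + re.escape(term) + r"\b", re.IGNORECASE
    )
    frequency = 0
    contexts = []
    compounds = set()
    for text in documents:
        for match in term_re.finditer(text):
            frequency += 1
            # Keep up to `window` characters either side as the context.
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            contexts.append(text[start:end].strip())
        for match in compound_re.finditer(text):
            modifier = match.group(1).lower()
            if modifier not in STOPWORDS:
                compounds.add(f"{modifier} {term.lower()}")
    return {
        "frequency": frequency,
        "compounds": sorted(compounds),
        "contexts": contexts,
    }


result = analyse_term("switching", DOCS)
print(result["frequency"], result["compounds"])
```

Questions about competing definitions and classification fit would need glossary and classification data rather than plain text, so they are left out of this sketch.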

  9. System Description • Report Results • Term frequency – Zero? • Potential compounds • Contexts • Definitions • Correctly classified • …..
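
A small sketch of how such results might be rendered per term. The non-zero wording mirrors the CIRCUIT SWITCHING message on the prototype slide later in the deck; the function name `report_line` and the zero-frequency wording are assumptions.

```python
def report_line(term, frequency, related=()):
    """Build a one-line report for a term from its frequency and related entries."""
    parts = [term.upper()]
    if frequency == 0:
        # Assumed wording: a term never seen in the texts is flagged for review.
        parts.append("Not found in collected texts. Review this entry?")
    else:
        times = "time" if frequency == 1 else "times"
        parts.append(f"Found in collected texts {frequency} {times}. Valid term?")
    for other in related:
        parts.append(f"{other.upper()} also exists in this resource.")
    return " ".join(parts)


print(report_line("circuit switching", 43, ["packet switching"]))
print(report_line("multiplexing", 0))
```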

  10. Prototype [screenshot; callouts: ‘Results Area’, XML attributes, Indicative Actions, ‘XML’]

  11. Prototype [screenshot]

  12. Prototype [screenshot; callout: Indicative XPaths]

  13. Prototype [screenshot]

  14. Prototype [screenshot]

  15. Prototype [screenshot; callout: Recall this term…]

  16. Prototype [screenshot] • CIRCUIT SWITCHING • Found in collected texts 43 times. Valid term? • PACKET SWITCHING also exists in this resource.

  17. DHydro Sample
      <termEntry id="HR-7">
        + <transacGrp>
        <descrip type="subjectField">200</descrip>
        + <langSet lang="fr">
        <langSet lang="en">
          <descripGrp>
            <descrip type="definition">The apparent displacement in position of a heavenly body caused by the combination of the velocity of light and that of an observer on the surface of the earth. Aberration of light due to the rotation of the earth on its axis is termed diurnal aberration. That due to the revolution of the earth around the sun is termed annual aberration.</descrip>
          </descripGrp>
          <ntig>
            <termGrp>
              <term id="HR-7-en-1">aberration of light</term>
              <termNote type="termType">main entry</termNote>
              <termNote type="partOfSpeech">n</termNote>
            </termGrp>
          </ntig>
        </langSet>
        + <langSet lang="es">
      </termEntry>

  18. Lenoch (GMT)
      <struct type="classification">
        <feat type="name">AD2</feat>
        <feat type="documentation">public and private organisations</feat>
        <feat type="subclass-of">AD</feat>
      </struct>
      <struct type="classification">
        <feat type="name">AD3</feat>
        <feat type="documentation">publications and documentary search</feat>
        <feat type="subclass-of">AD</feat>
      </struct>
      <struct type="classification">
        <feat type="name">AD31</feat>
        <feat type="documentation">documentation and information systems</feat>
        <feat type="subclass-of">AD3</feat>
      </struct>

  19. Lenoch (XOL)
      <class>
        <name>AD2</name>
        <documentation>public and private organisations</documentation>
        <subclass-of>AD</subclass-of>
      </class>
      <class>
        <name>AD3</name>
        <documentation>publications and documentary search</documentation>
        <subclass-of>AD</subclass-of>
      </class>
      <class>
        <name>AD31</name>
        <documentation>documentation and information systems</documentation>
        <subclass-of>AD3</subclass-of>
      </class>
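
The two Lenoch encodings carry the same information: each GMT `feat` with a `type` attribute corresponds to an XOL element of that name. A minimal sketch of that mapping in Python follows. The wrapper elements `<gmt>` and `<classes>` and the function name `gmt_to_xol` are hypothetical, added only so the fragments parse as single documents; the slides show no enclosing element.

```python
import xml.etree.ElementTree as ET

# Two of the GMT entries from the slide, inside a hypothetical <gmt> wrapper.
GMT = """
<gmt>
  <struct type="classification">
    <feat type="name">AD2</feat>
    <feat type="documentation">public and private organisations</feat>
    <feat type="subclass-of">AD</feat>
  </struct>
  <struct type="classification">
    <feat type="name">AD3</feat>
    <feat type="documentation">publications and documentary search</feat>
    <feat type="subclass-of">AD</feat>
  </struct>
</gmt>
"""


def gmt_to_xol(gmt_root):
    """Turn each classification struct into a <class> with one child per feat."""
    out = ET.Element("classes")  # hypothetical wrapper element
    for struct in gmt_root.findall(".//struct[@type='classification']"):
        cls = ET.SubElement(out, "class")
        for feat in struct.findall("feat"):
            # The feat's type attribute becomes the XOL element name.
            child = ET.SubElement(cls, feat.get("type"))
            child.text = feat.text
    return out


xol = gmt_to_xol(ET.fromstring(GMT))
print(ET.tostring(xol, encoding="unicode"))
```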

  20. Outlook • Initial results show promise for validation of terminological resources • Significant development work is still required • XPath generation needs tailoring to specific formats (DXLT), but provides useful power • Development to merge ‘Web glossaries’ – pre-terminological validation stage • Provide a powerful prototype of the capabilities for the (Web-Enabled) Validation of Terminology Collections – with DXLT-related formats • DXLT as the de facto standard format for Terminology Validation?
