
Using OLIF, The Open Lexicon Interchange Format

Using OLIF, The Open Lexicon Interchange Format. Susan McCormick, OLIF2 Consortium, October 1, 2004. OLIF, the Open Lexicon Interchange Format, is an XML-compliant standard that supports the exchange of lexical and terminological data for language technology applications.


Presentation Transcript


  1. Using OLIF, The Open Lexicon Interchange Format • Susan McCormick, OLIF2 Consortium • October 1, 2004

  2. The OLIF Format • The Open Lexicon Interchange Format • XML-compliant standard • Supports exchange of lexical and terminological data for language technology applications • Handles basic exchange as well as more complex applications such as MT lexicons

  3. The OLIF2 Consortium • OLIF v.2 was developed by the OLIF2 Consortium, a group of language technology companies and organizations interested in issues of MT data/term data exchange • Led by SAP • Members include Xerox, Microsoft, Trados, IBM, Systran, IAI, DFKI and Comprendium

  4. Developing OLIF v.2 • Based on OLIF prototype • Developed in EC-funded OTELO project – proposing standards for users of disparate language tools • Original purpose of OLIF was to facilitate terminology exchange for industrial users of MT

  5. Developing OLIF v.2 • Version 2 adapted from OLIF prototype using input from • Developers/users of 3+ MT systems • Developers/users of terminology management systems • Other language standards projects: • EAGLES • SALT • ISLE • MARTIF, TBX

  6. OLIF Version 2 • Released as open standard in 2002 • XML-compliant • Covers 6 European languages: English, German, French, Spanish, Danish, Portuguese • Includes options for modeling administrative, morphological, syntactic and semantic data

  7. Available to Users • XML implementation of OLIF specification in a DTD • Available from OLIF2 Consortium web site: www.olif.net

  8. The OLIF File • Follows Terminology Markup Framework (TMF) structure: • Header • Body • Shared resources

  9. The OLIF Entry • Collection of monolingual data on a specified sense of a word or phrase • Optional links for cross-reference and transfer • Transfer is bilingual and unidirectional • Multiple transfers in multiple languages possible for single word sense

  10. Key Data Categories • The OLIF entry is uniquely identified by 5 key data categories: • Canonical form • Language • Part of speech • Subject field • Semantic reading
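
Because the five key data categories together identify an entry, they can serve as a lookup key when entries are processed. The following is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`; the function name `entry_key` and the constant `KEY_FIELDS` are illustrative assumptions and not part of OLIF or any OLIF tool. The element names are the ones used in the sample entries of this presentation.

```python
import xml.etree.ElementTree as ET

# Element names of the five key data categories, as used in the sample entries.
KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

SAMPLE = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </mono>
</entry>
"""

def entry_key(entry):
    """Return the five key data category values of an <entry> as a tuple.

    Hypothetical helper: raises ValueError if a key data category is absent.
    """
    key_dc = entry.find("./mono/keyDC")
    if key_dc is None:
        raise ValueError("entry has no mono/keyDC element")
    values = []
    for name in KEY_FIELDS:
        text = key_dc.findtext(name)
        if text is None:
            raise ValueError(f"missing key data category: {name}")
        values.append(text.strip())
    return tuple(values)

key = entry_key(ET.fromstring(SAMPLE))
print(key)
```

A tuple is used so that the key is hashable and can index a dictionary of entries.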

  11. Basic Well-Formed OLIF Entry

          <entry>
            <mono>
              <keyDC>
                <canForm>table</canForm>
                <language>en</language>
                <ptOfSpeech>noun</ptOfSpeech>
                <subjField>general</subjField>
                <semReading>86</semReading>
              </keyDC>
            </mono>
          </entry>

  12. Entry with a <monoDC> element:

          <entry>
            <mono>
              <keyDC>
                <canForm>table</canForm>
                <language>en</language>
                <ptOfSpeech>noun</ptOfSpeech>
                <subjField>general</subjField>
                <semReading>86</semReading>
              </keyDC>
              <monoDC>
              </monoDC>
            </mono>
          </entry>

      Data groups shown alongside the entry on the slide:

          <monoAdmin>
            <originator>Weber</originator>
            <adminStatus>ver</adminStatus>
          </monoAdmin>
          <monoMorph>
            <inflection>like book,books</inflection>
          </monoMorph>
          <monoSyn>
            <synType>cnt</synType>
            <synFrame>[gencomp-opt]</synFrame>
          </monoSyn>
          <monoSem>
            <semType>inform</semType>
          </monoSem>
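
The administrative, morphological, syntactic and semantic groups on this slide can be read into a nested dictionary. The sketch below assumes that the four groups are children of `<monoDC>`, which is how the slide layout reads but is not stated explicitly in the transcript. The function name `mono_data` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the four data groups are nested inside <monoDC>.
SAMPLE = """
<entry>
  <mono>
    <keyDC>
      <canForm>table</canForm>
      <language>en</language>
      <ptOfSpeech>noun</ptOfSpeech>
      <subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
    <monoDC>
      <monoAdmin>
        <originator>Weber</originator>
        <adminStatus>ver</adminStatus>
      </monoAdmin>
      <monoMorph>
        <inflection>like book,books</inflection>
      </monoMorph>
      <monoSyn>
        <synType>cnt</synType>
        <synFrame>[gencomp-opt]</synFrame>
      </monoSyn>
      <monoSem>
        <semType>inform</semType>
      </monoSem>
    </monoDC>
  </mono>
</entry>
"""

def mono_data(entry):
    """Map each group under <monoDC> to a dict of its child elements' text."""
    result = {}
    mono_dc = entry.find("./mono/monoDC")
    if mono_dc is None:
        return result  # entry carries only key data categories
    for group in mono_dc:
        result[group.tag] = {
            child.tag: (child.text or "").strip() for child in group
        }
    return result

data = mono_data(ET.fromstring(SAMPLE))
for group_name, values in data.items():
    print(group_name, values)
```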

  13. OLIF Entry with Cross-Reference

          <entry>
            <mono>
              <keyDC>
                <canForm>table</canForm>
                <language>en</language>
                <ptOfSpeech>noun</ptOfSpeech>
                <subjField>general</subjField>
                <semReading>86</semReading>
              </keyDC>
            </mono>
          </entry>

      Cross-reference block shown alongside the entry on the slide:

          <crossRefer>
            <keyDC>
              <canForm>row</canForm>
              <language>en</language>
              <ptOfSpeech>noun</ptOfSpeech>
              <subjField>general</subjField>
              <semReading>69</semReading>
            </keyDC>
            <crLinkType>has-meronym</crLinkType>
          </crossRefer>
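
A cross-reference names its target by the target's key data categories, so resolving it amounts to a lookup by key. The sketch below assumes that `<crossRefer>` is a child of `<entry>` and wraps two entries in a hypothetical `<entries>` element purely to have a single parseable root; `key_of` and `cross_references` are illustrative names, and the second entry for "row" is made up for the example.

```python
import xml.etree.ElementTree as ET

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# <entries> is a hypothetical wrapper; <crossRefer> placement is an assumption.
LEXICON = """
<entries>
  <entry>
    <mono><keyDC>
      <canForm>table</canForm><language>en</language>
      <ptOfSpeech>noun</ptOfSpeech><subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC></mono>
    <crossRefer>
      <keyDC>
        <canForm>row</canForm><language>en</language>
        <ptOfSpeech>noun</ptOfSpeech><subjField>general</subjField>
        <semReading>69</semReading>
      </keyDC>
      <crLinkType>has-meronym</crLinkType>
    </crossRefer>
  </entry>
  <entry>
    <mono><keyDC>
      <canForm>row</canForm><language>en</language>
      <ptOfSpeech>noun</ptOfSpeech><subjField>general</subjField>
      <semReading>69</semReading>
    </keyDC></mono>
  </entry>
</entries>
"""

def key_of(key_dc):
    """Return the five key values of a <keyDC> element as a tuple."""
    return tuple((key_dc.findtext(name) or "").strip() for name in KEY_FIELDS)

def cross_references(entry):
    """Return a list of (link type, target key) pairs for one entry."""
    links = []
    for cross_ref in entry.findall("crossRefer"):
        target = cross_ref.find("keyDC")
        if target is None:
            continue  # skip cross-references without a target key
        link_type = (cross_ref.findtext("crLinkType") or "").strip()
        links.append((link_type, key_of(target)))
    return links

root = ET.fromstring(LEXICON)
# Index every entry by its own key so that targets can be looked up.
index = {key_of(e.find("mono/keyDC")): e for e in root.findall("entry")}

table = index[("table", "en", "noun", "general", "86")]
for link_type, target_key in cross_references(table):
    status = "found" if target_key in index else "missing"
    print(link_type, target_key[0], status)
```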

  14. OLIF Entry with Transfer

          <entry>
            <mono>
              <keyDC>
                <canForm>table</canForm>
                <language>en</language>
                <ptOfSpeech>noun</ptOfSpeech>
                <subjField>general</subjField>
                <semReading>86</semReading>
              </keyDC>
            </mono>
          </entry>

      Transfer block shown alongside the entry on the slide:

          <transfer>
            <keyDC>
              <canForm>Tabelle</canForm>
              <language>de</language>
              <ptOfSpeech>noun</ptOfSpeech>
              <subjField>general</subjField>
              <semReading>86</semReading>
            </keyDC>
          </transfer>
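
Since a transfer is bilingual and unidirectional, it can be modeled as a directed pair from the source entry's key to the target key. The sketch below assumes that `<transfer>` is a child of `<entry>`; the names `key_of` and `transfers` are hypothetical and do not come from OLIF tooling.

```python
import xml.etree.ElementTree as ET

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# Assumption: the <transfer> block sits inside the <entry> it belongs to.
SAMPLE = """
<entry>
  <mono><keyDC>
    <canForm>table</canForm><language>en</language>
    <ptOfSpeech>noun</ptOfSpeech><subjField>general</subjField>
    <semReading>86</semReading>
  </keyDC></mono>
  <transfer>
    <keyDC>
      <canForm>Tabelle</canForm><language>de</language>
      <ptOfSpeech>noun</ptOfSpeech><subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC>
  </transfer>
</entry>
"""

def key_of(key_dc):
    """Return the five key values of a <keyDC> element as a tuple."""
    return tuple((key_dc.findtext(name) or "").strip() for name in KEY_FIELDS)

def transfers(entry):
    """Return directed (source key, target key) pairs for one entry."""
    source = key_of(entry.find("mono/keyDC"))
    pairs = []
    for transfer in entry.findall("transfer"):
        target = transfer.find("keyDC")
        if target is not None:
            pairs.append((source, key_of(target)))
    return pairs

# Directed map: only the source side is a key, the reverse is not implied.
transfer_map = {}
for source, target in transfers(ET.fromstring(SAMPLE)):
    transfer_map.setdefault(source, []).append(target)

english = ("table", "en", "noun", "general", "86")
german = ("Tabelle", "de", "noun", "general", "86")
print(transfer_map[english])
print(german in transfer_map)
```

In this sketch the German key is absent from `transfer_map`, which mirrors the one-way nature of a transfer: a German-to-English link would need its own transfer on the German entry.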

  15. Data Category Values • Allowed values specified by OLIF • Administrative, terminological, linguistic values based on • General industry standards • E.g., allowed values for date derived from recommendations from ISO 8601:1988 • MT/Terminology standards • E.g., suggested values for subject field adapted from EC • Widely-recognized linguistic standards • E.g., allowed values for gender based on longstanding gender description for European languages

  16. User Extensions: The OLIF Data Category Registry • Users may declare and use their own values for certain data categories: • Subject field • Semantic reading • Morphological structure • Part of speech • Inflection • Aspect • Syntactic type • Syntactic frame • Semantic type • Concept hierarchy

  17. Organizing Based on Concept • Users may link monolingual entries via a concept identifier • These IDs can be used to organize entries as equivalent word senses associated with the same concepts rather than source word senses associated with transfers.

  18. Entries Linked by Concept

          <entry ConceptUserId="0731F16CCCD2D3119B4D">
            <mono>
              <keyDC>
                <canForm>table</canForm>
                <language>en</language>
                <ptOfSpeech>noun</ptOfSpeech>
                <subjField>general</subjField>
                <semReading>86</semReading>
              </keyDC>
            </mono>
          </entry>

          <entry ConceptUserId="0731F16CCCD2D3119B4D">
            <mono>
              <keyDC>
                <canForm>Tabelle</canForm>
                <language>de</language>
                <ptOfSpeech>noun</ptOfSpeech>
                <subjField>general</subjField>
                <semReading>86</semReading>
              </keyDC>
            </mono>
          </entry>
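
Entries that share a concept identifier can be grouped without any transfer links. The following sketch groups entries by the `ConceptUserId` attribute shown on this slide; the `<entries>` wrapper element and the function name `group_by_concept` are assumptions made only for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# <entries> is a hypothetical wrapper so the two entries share one root.
LEXICON = """
<entries>
  <entry ConceptUserId="0731F16CCCD2D3119B4D">
    <mono><keyDC>
      <canForm>table</canForm><language>en</language>
      <ptOfSpeech>noun</ptOfSpeech><subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC></mono>
  </entry>
  <entry ConceptUserId="0731F16CCCD2D3119B4D">
    <mono><keyDC>
      <canForm>Tabelle</canForm><language>de</language>
      <ptOfSpeech>noun</ptOfSpeech><subjField>general</subjField>
      <semReading>86</semReading>
    </keyDC></mono>
  </entry>
</entries>
"""

def group_by_concept(root):
    """Map each concept ID to the (canonical form, language) of its entries."""
    groups = defaultdict(list)
    for entry in root.findall("entry"):
        concept_id = entry.get("ConceptUserId")
        if concept_id is None:
            continue  # entry is not linked to a concept
        can_form = (entry.findtext("mono/keyDC/canForm") or "").strip()
        language = (entry.findtext("mono/keyDC/language") or "").strip()
        groups[concept_id].append((can_form, language))
    return dict(groups)

groups = group_by_concept(ET.fromstring(LEXICON))
for concept_id, members in groups.items():
    print(concept_id, members)
```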

  19. What’s Available to the OLIF User? • On www.olif.net • Complete XML DTD for download • Hyperlinked DTD for viewing • Graphical view of structure of DTD • Current specification for OLIF v.2 • Formalization of OLIF data categories • Alphabetic list of XML elements and attributes • Fixed and recommended values for elements and attributes • Guidelines for formulating canonical forms • Sample OLIF entries

  20. Using OLIF • Some applications: • SAP has implemented an OLIF converter to exchange terminological data from its central termbase SAPterm • MT developers in OLIF2 Consortium currently developing OLIF converters (Comprendium, Systran) • OLIF User Forum = 60+ members

  21. What’s New: XML Schema • OLIF XSD offers: • 40+ built-in data types • Creation of user-defined data types • Support for inheritance

  22. What’s New: The OLIF API • Based on OLIF XSD, Java classes created • Supports: • Converting .csv files to OLIF • Converting from XML format to OLIF • Creating OLIF documents from scratch • Modifying OLIF documents
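
The OLIF API itself consists of Java classes and its interface is not described in this presentation. The Python sketch below is not that API; it only illustrates the general idea of a .csv-to-OLIF conversion for the five key data categories. The CSV column names, the function name `csv_to_entries`, and the absence of a header or body wrapper are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

KEY_FIELDS = ("canForm", "language", "ptOfSpeech", "subjField", "semReading")

# Hypothetical CSV layout: one column per key data category.
CSV_TEXT = (
    "canForm,language,ptOfSpeech,subjField,semReading\n"
    "table,en,noun,general,86\n"
    "Tabelle,de,noun,general,86\n"
)

def csv_to_entries(csv_text):
    """Build one <entry> element per CSV row (key data categories only)."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.Element("entry")
        mono = ET.SubElement(entry, "mono")
        key_dc = ET.SubElement(mono, "keyDC")
        for name in KEY_FIELDS:
            ET.SubElement(key_dc, name).text = row[name].strip()
        entries.append(entry)
    return entries

entries = csv_to_entries(CSV_TEXT)
for entry in entries:
    print(ET.tostring(entry, encoding="unicode"))
```

A real conversion would also need to produce the file header and validate values against the OLIF specification, which this sketch leaves out.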

  23. What to Expect this Year from OLIF • OLIF XSD and API are available to the user from www.olif.net • OLIF web site upgraded, updated • Requirements for modeling Japanese entries integrated

  24. OLIF User Forum • Users of OLIF can access and post questions, messages and sample data from the OLIF group site: http://groups.yahoo.com/group/olifConsortium/
