xml:tm


Presentation Transcript


  1. XML Based Text Memory Using XML technology to reduce the cost of translating XML documents 27 June 2005 xml:tm

  2. Automating Translation • Machine Translation • Translation Memory • Hybrid Linguistic Inference Engines • Terminology

  3. Automating Translation • Machine translation • 40-year history • Rigorous control of grammar and terminology can produce good results • Lots of interesting new developments with hybrid statistical/transfer-based systems • Translation of free-format text is theoretically impossible with current technology

  4. Translation Memory • Align source and target text • Look up new text against memory • Relatively primitive technology • Not much innovation over the past 30 years • Need for proofing • Proprietary translation memory formats

  5. Translating XML Documents • XML inherently easier to translate • Separation of form and content • Support for Unicode and other international encoding formats • Allows multiple output formats - PDF, XHTML, WAP

  6. XML Translation Standards • LISA - Localization Industry Standards Association: http://www.lisa.org • OASIS - Organization for the Advancement of Structured Information Standards: http://www.oasis-open.org • W3C - World Wide Web Consortium: http://www.w3c.org • OLIF Consortium: http://www.olif.net

  7. LISA Standards • TMX - Translation Memory Exchange format: http://www.lisa.org/tmx • TBX - Termbase Exchange format: http://www.lisa.org/tbx • SRX - Segmentation Rules Exchange format: http://www.lisa.org/srx • GMX - GILT Metrics Exchange format: http://www.lisa.org/gmx

  8. OASIS L10N Standards • XLIFF - XML Localization Interchange File Format: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff • TransWS - Translation Web Services: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws • DITA - Darwin Information Typing Architecture: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita

  9. W3C and OLIF • W3C ITS - Internationalization Tag Set: http://www.w3.org/International/ and http://www.w3.org/International/its • OLIF - Open Lexicon Interchange Format: http://www.olif.net

  10. XML namespace • Major feature of XML • Allows the mapping of different ontological entities onto the same representation • Allows different ways to look at the same data • Namespaces can be made transparent

  11. xml:tm • XML based text memory • Revolutionary approach to translating XML documents • First significant advance in translation memory technology • Uses XML namespace to transparently embed contextual information

  12. xml:tm namespace • Text Memory namespace • Can be mapped onto any XML document • Vertical view of document in terms of ‘text segments’ • Can be totally transparent
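The claim that the tm namespace "can be totally transparent" can be illustrated with a minimal Python sketch that unwraps every tm element and returns the original document view. The function names are hypothetical, the namespace URI is copied from the example in the slides, and the sketch is not taken from the xml:tm specification.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # URI as written in the slide example
TM_PREFIX = "{" + TM_NS + "}"


def _append_text(parent, text):
    """Append text at the current end of parent's content."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy_into(src, dst):
    """Copy src's content into dst, splicing tm elements out of the tree."""
    _append_text(dst, src.text)
    for child in src:
        if child.tag.startswith(TM_PREFIX):
            _copy_into(child, dst)  # keep the content, drop the wrapper
        else:
            new_child = ET.SubElement(dst, child.tag, child.attrib)
            _copy_into(child, new_child)
        _append_text(dst, child.tail)


def strip_tm(root):
    """Return a copy of the document without any tm-namespace elements."""
    new_root = ET.Element(root.tag, root.attrib)
    _copy_into(root, new_root)
    return new_root


seeded = ET.fromstring(
    '<para xmlns:tm="urn:xml-Intl-tm"><tm:te id="e1">'
    '<tm:tu id="u1.1">Namespace is very simple.</tm:tu> '
    '<tm:tu id="u1.2">It is easy to use.</tm:tu></tm:te></para>'
)
original_view = strip_tm(seeded)
```

Here `original_view` holds a plain `para` element whose text is the two sentences joined by a space, which is the sense in which the extra markup is transparent to a consumer that ignores it.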

  13. xml:tm namespace Example of the use of tm namespace in an XML document: <document xmlns:tm="urn:xml-Intl-tm"> <tm:tm> <section> <para> <tm:te> <tm:tu>Namespace is very flexible.</tm:tu> <tm:tu>It is very easy to use.</tm:tu> </tm:te> </para> </section> </tm:tm> </document>

  14. xml:tm namespace [Diagram: two tree views of the same document • Original document view: doc, title, section, para, text • tm namespace view: tm, te, tu, sentence; each text node is held in a te that contains one or more tu sentences]

  15. xml:tm namespace • Original document view (text): <para>Namespace is very simple. It is easy to use.</para> • tm namespace view (te, tu, sentence): <para> <tm:te id="e1"> <tm:tu id="u1.1">Namespace is very simple.</tm:tu> <tm:tu id="u1.2">It is easy to use.</tm:tu> </tm:te> </para>
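A minimal Python sketch of how a paragraph might be seeded with the tm namespace, going from the original view to the tm view shown on the slide. The regular-expression sentence splitter is only a stand-in for proper SRX segmentation rules, the function name `seed_para` is hypothetical, and the `e1` / `u1.1` id scheme simply mirrors the slide.

```python
import re
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # URI as written in the slide example
ET.register_namespace("tm", TM_NS)


def seed_para(para, te_index):
    """Wrap the text of a text-only element in one tm:te with a tm:tu per sentence."""
    text = (para.text or "").strip()
    # Naive segmentation: split after '.', '!' or '?' followed by whitespace.
    # A real implementation would apply SRX rules instead (assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    para.text = None
    te = ET.SubElement(para, f"{{{TM_NS}}}te", {"id": f"e{te_index}"})
    for n, sentence in enumerate(sentences, start=1):
        tu = ET.SubElement(te, f"{{{TM_NS}}}tu", {"id": f"u{te_index}.{n}"})
        tu.text = sentence
        # Preserve the inter-sentence space outside the text units.
        tu.tail = " " if n < len(sentences) else None
    return para


para = ET.fromstring("<para>Namespace is very simple. It is easy to use.</para>")
seed_para(para, 1)
text_units = [(tu.get("id"), tu.text) for tu in para.iter(f"{{{TM_NS}}}tu")]
```

The sketch only handles elements that contain plain text; inline markup inside a sentence would need extra handling that is left out here.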

  16. xml:tm Text Memory • Author memory Maintain memory of source text Authoring statistics Authoring tool input • Translation memory Automatic alignment Maintain perfect link of source and target text Reduce translation costs

  17. xml:tm DOM differencing [Diagram: Source Document compared with Updated Source Document] • tu id="1" -> tu id="1" • tu id="2" -> deleted • tu id="3" -> tu id="3" • tu id="4" -> tu id="4" • tu id="5" -> tu id="7" origid="5" (modified) • tu id="6" -> tu id="6" • tu id="8" (new)

  18. xml:tm Author Memory • Namespace aware DOM differencing • Identify changes from the previous version • Unique text unit identifiers are maintained • Modification history • Text units can be loaded into a database • Authoring environment integration
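The differencing step can be sketched in Python as follows. This is an illustration that works on flat lists of text units with `difflib`, not on a DOM, and it is not the algorithm from the xml:tm specification; the function name, the dictionary layout, and the numeric id allocation are assumptions chosen to reproduce the ids in the DOM differencing diagram.

```python
from difflib import SequenceMatcher


def diff_text_units(old_units, new_texts):
    """Compare old (id, text) pairs with the new sentence list.

    Unchanged sentences keep their id, changed ones get a fresh id plus an
    origid pointing at the unit they replace, and added ones get a fresh id.
    Returns (updated_units, deleted_ids).
    """
    old_texts = [text for _, text in old_units]
    next_id = max((int(uid) for uid, _ in old_units), default=0) + 1
    updated, deleted = [], []
    matcher = SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for uid, text in old_units[i1:i2]:
                updated.append({"id": uid, "text": text, "status": "unchanged"})
            continue
        olds, news = old_units[i1:i2], new_texts[j1:j2]
        for k, text in enumerate(news):
            unit = {"id": str(next_id), "text": text, "status": "new"}
            if k < len(olds):  # pair replaced sentences positionally
                unit["origid"] = olds[k][0]
                unit["status"] = "modified"
            updated.append(unit)
            next_id += 1
        # Old units with no counterpart in the new version are deleted.
        deleted.extend(uid for uid, _ in olds[len(news):])
    return updated, deleted


old = [("1", "A."), ("2", "B."), ("3", "C."), ("4", "D."), ("5", "E."), ("6", "F.")]
new = ["A.", "C.", "D.", "E, reworded.", "F.", "G."]
updated, deleted = diff_text_units(old, new)
```

With this input, unit 2 is reported as deleted, the reworded sentence receives id 7 with origid 5, and the added sentence receives id 8, matching the diagram.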

  19. xml:tm Translation Memory • The tm namespace can be used to create XLIFF files • Automatic alignment of source and target languages • Allows for more focused translation matching • Exact matching • Leveraged matching from document - identical text • Leveraged matching from database • Modified text unit matching • Non translatable text unit identification

  20. DITA Strengths • Topic-centric level of granularity • Very well thought out and flexible architecture for content creation and publishing • Substantial reuse of existing assets • Specialization at the topic and domain levels • Automated processing based on meta data property • Translate topic only once, reuse many times

  21. DITA and xml:tm • Both complement each other • xml:tm encourages text reuse at the sentence level • Automates translation matching and extraction • Automatic alignment of source and target documents at the text unit (sentence) level • Introduces the concept of exact matching for translation as well as focused matching • Fully integrated with existing standards such as SRX, GMX, TMX and XLIFF

  22. xml:tm translation via XLIFF [Diagram: each tu id="1" to tu id="6" in the Source Document maps to trans-unit id="1" to trans-unit id="6" in the XLIFF Document, and from there to tu id="1" to tu id="6" in the Translated Document]
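The round trip through XLIFF can be sketched in Python. This is a simplified illustration under several assumptions: the XLIFF 1.2 namespace is used (the slides do not state a version), every tm:tu carries an id and contains plain text only, and the function names `to_xliff` and `merge_translations` are hypothetical.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # URI as written in the slide example
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"  # assumed XLIFF version


def to_xliff(doc_root, original, source_lang):
    """Extract every tm:tu into a trans-unit that carries the same id."""
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(
        xliff,
        f"{{{XLIFF_NS}}}file",
        {"original": original, "source-language": source_lang, "datatype": "xml"},
    )
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for tu in doc_root.iter(f"{{{TM_NS}}}tu"):
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": tu.get("id")})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = "".join(tu.itertext())
    return xliff


def merge_translations(doc_root, xliff):
    """Write each <target> back into the tm:tu with the matching id."""
    targets = {
        unit.get("id"): unit.findtext(f"{{{XLIFF_NS}}}target")
        for unit in xliff.iter(f"{{{XLIFF_NS}}}trans-unit")
    }
    for tu in doc_root.iter(f"{{{TM_NS}}}tu"):
        translated = targets.get(tu.get("id"))
        if translated is not None:
            tu.text = translated  # assumes the tu holds plain text only
    return doc_root


doc = ET.fromstring(
    '<para xmlns:tm="urn:xml-Intl-tm"><tm:te id="e1">'
    '<tm:tu id="1">First sentence.</tm:tu> <tm:tu id="2">Second sentence.</tm:tu>'
    "</tm:te></para>"
)
xliff = to_xliff(doc, "example.xml", "en")
# Stand-in for the translator's work: fill in a placeholder target per unit.
for unit in xliff.iter(f"{{{XLIFF_NS}}}trans-unit"):
    target = ET.SubElement(unit, f"{{{XLIFF_NS}}}target")
    target.text = "[translated] " + unit.findtext(f"{{{XLIFF_NS}}}source")
merge_translations(doc, xliff)
```

Because the ids travel with the text in both directions, the merged document stays aligned with the source unit by unit without a separate alignment pass.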

  23. xml:tm translated document [Diagram: the translated document shown as the same two tree views • Translated document view: doc, title, section, para, tekst (text) • Translated tm namespace view: tm, te, tu, zdanie (sentence)]

  24. xml:tm perfect alignment [Diagram: exact alignment; tu id="1" to tu id="6" in the Source Document each pair with the tu of the same id in the Translated Document]

  25. xml:tm perfect matching [Diagram: Updated Source Document matched against the Matched Target Document] • tu id="1" -> tu id="1": perfect match • tu id="2": deleted • tu id="3" -> tu id="3": perfect match • tu id="4" -> tu id="4": perfect match • tu id="7" (modified): requires translation • tu id="6" -> tu id="6": perfect match • tu id="8" (new): requires translation

  26. xml:tm leveraged DB memory [Diagram: perfect alignment; tu id="1" to tu id="6" in the Source Document pair with the same ids in the Translated Document; further labels: DB, TMX]

  27. xml:tm in-document leveraged matching [Diagram: Updated Source Document matched against the Matched Target Document] • tu id="1" -> tu id="1": perfect match • tu id="2": deleted • tu id="3" -> tu id="3": perfect match • tu id="4" -> tu id="4": perfect match • tu id="7" (modified): requires translation • tu id="6" -> tu id="6": perfect match • tu id="8" (new, same text as tu id="3"): leveraged match, requires proofing

  28. xml:tm in-document fuzzy matching [Diagram: Updated Source Document matched against the Matched Target Document] • tu id="1" -> tu id="1": perfect match • tu id="2": deleted • tu id="3" -> tu id="3": perfect match • tu id="4" -> tu id="4": perfect match • tu id="7" (modified, origid="5"): fuzzy match, requires translation • tu id="6" -> tu id="6": perfect match • tu id="8" (new, same text as an existing unit): leveraged match, requires proofing

  29. xml:tm db leveraged matching [Diagram: Updated Source Document matched against the Matched Target Document and a database (DB)] • tu id="1" -> tu id="1": perfect match • tu id="2": deleted • tu id="3" -> tu id="3": perfect match • tu id="4" -> tu id="4": perfect match • tu id="7" (modified, origid="5"): fuzzy match, requires translation • tu id="6" -> tu id="6": perfect match • tu id="8" (new, same text as an existing unit): doc leveraged match, requires proofing • tu id="9": DB leveraged match, requires proofing

  30. xml:tm non-translatable text [Diagram: Updated Source Document matched against the Matched Target Document and a database (DB)] • tu id="1" -> tu id="1": exact match • tu id="2" (non translatable): requires no translation • tu id="3" -> tu id="3": exact match • tu id="4" -> tu id="4": exact match • tu id="7": fuzzy match, requires translation • tu id="6" -> tu id="6": exact match • tu id="8" (new, same text as an existing unit): doc leveraged match, requires proofing • tu id="9": DB leveraged match, requires proofing
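The match categories walked through on the preceding slides can be summarised as a small decision procedure. The Python sketch below is an illustration only: the function name, the data layout, the order of the checks, and the "no alphabetic characters" test for non-translatable text are all assumptions rather than rules from the xml:tm specification.

```python
def classify(unit, previous, database):
    """Return (match type, required action) for one text unit.

    unit:     dict with "id", "text" and optionally "origid"
    previous: dict mapping tu id -> (source text, target text) from the
              previously translated version of the document
    database: dict mapping source text -> target text (translation memory DB)
    """
    text = unit["text"]
    # Assumed heuristic: text without letters (numbers, punctuation) is non-translatable.
    if not any(ch.isalpha() for ch in text):
        return "non-translatable", "requires no translation"
    prev = previous.get(unit["id"])
    if prev is not None and prev[0] == text:
        return "exact match", "requires no translation"
    if any(source == text for source, _ in previous.values()):
        return "in-document leveraged match", "requires proofing"
    if text in database:
        return "DB leveraged match", "requires proofing"
    if unit.get("origid") in previous:
        return "fuzzy match", "requires translation"
    return "new", "requires translation"


previous = {
    "1": ("Hello.", "target-1"),
    "3": ("Save the file.", "target-3"),
    "5": ("Open the menu.", "target-5"),
}
database = {"Click OK.": "target-db"}
units = [
    {"id": "1", "text": "Hello."},
    {"id": "2", "text": "12.5 %"},
    {"id": "7", "text": "Open the main menu.", "origid": "5"},
    {"id": "8", "text": "Save the file."},
    {"id": "9", "text": "Click OK."},
    {"id": "10", "text": "A brand new sentence."},
]
results = {u["id"]: classify(u, previous, database) for u in units}
```

Only the units classified as "requires translation" need a translator's full attention; the leveraged matches go to proofing and the rest pass through untouched, which is where the cost reduction claimed in the deck would come from.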

  31. Traditional Translation Scenario [Workflow diagram; labels: source text, extract, Extracted text, tm process, Prepared text, Translate, Translated text, QA, merge, target text, Translation, Publishing]

  32. xml:tm Translation Scenario [Workflow diagram; labels: xml:tm source text, extract, Extracted text, tm process, perfect matching, leveraged matching, XLIFF file, Web service/interface, Web, Translator, Translate, QA, merge, xml:tm target text, Publishing, Automatic Process]

  33. xml:tm benefits • Open Standard donated by XML INTL to LISA • Complements DITA • Enterprise level scalability • Totally integrated within the XML framework • Source text is automatically extracted and matched • Word counts are controlled by the customer • Text can be presented for translation via the web • Data is merged automatically at end of translation cycle • All memory operations are totally automated • Can be used transparently for relay translations • More accurate – better matching

  34. xml:tm • Full specification: http://www.xml-intl.com/docs/specification/xml-tm.html • Maintained by xml-intl.com • http://www.xml-intl.com/dtd/tm.dtd • http://www.xml-intl.com/dtd/tm.xsd • Detailed article on xml:tm at www.xml.com • Donated by XML INTL to LISA OSCAR

  35. Any questions?

  36. XML INTL Contact Details • Postal address: PO Box 2167 Gerrards Cross Bucks SL9 8XF United Kingdom • Phone: +44 1753 480 467 • Fax: +44 1753 480 465 • Bob Willans - bwillans@xml-intl.com • Andrzej Zydroń – azydron@xml-intl.com • Bartek Bogacki – bbogacki@xml-intl.com
