xml:tm


xml:tm - PowerPoint PPT Presentation



XML Based Text Memory Using XML technology to reduce the cost of translating XML documents 27 June 2005. xml:tm. Automating Translation. Machine Translation Translation Memory Hybrid Linguistic Inference Engines Terminology. Automating Translation. Machine translation 40 year history


PowerPoint Slideshow about ' xml:tm' - faith-joseph


Presentation Transcript
XML Based Text Memory

Using XML technology to reduce the cost of translating XML documents

27 June 2005

xml:tm
Automating Translation
  • Machine Translation
  • Translation Memory
  • Hybrid Linguistic Inference Engines
  • Terminology
Automating Translation
  • Machine translation
  • 40-year history
  • Rigorous control of grammar and terminology can produce good results
  • Lots of interesting new developments with hybrid statistical/transfer based systems
  • Translation of free format text is theoretically impossible with current technology.
Translation Memory
  • Align source and target text
  • Look up new text against memory
  • Relatively primitive technology
  • Not much innovation over the past 30 years
  • Need for proofing
  • Proprietary translation memory formats
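
The "look up new text against memory" step can be sketched as follows. This is a minimal illustration, not any vendor's implementation: the `lookup` function, the 0.75 threshold, and the placeholder target strings are all assumptions made for the example, and the similarity measure is Python's general-purpose `difflib` ratio rather than a dedicated translation memory algorithm.

```python
from difflib import SequenceMatcher

def lookup(segment, memory, threshold=0.75):
    """Return (match_type, matched_source, target) for a new segment.

    memory maps previously translated source segments to their targets.
    The threshold is an arbitrary value chosen for this sketch.
    """
    if segment in memory:
        return ("exact", segment, memory[segment])
    best_ratio, best_source = 0.0, None
    for source in memory:
        ratio = SequenceMatcher(None, segment, source).ratio()
        if ratio > best_ratio:
            best_ratio, best_source = ratio, source
    if best_source is not None and best_ratio >= threshold:
        # A fuzzy match still needs a translator to review the target.
        return ("fuzzy", best_source, memory[best_source])
    return ("none", None, None)

# Placeholder targets stand in for real translations.
memory = {
    "Namespace is very flexible.": "[target] Namespace is very flexible.",
    "It is very easy to use.": "[target] It is very easy to use.",
}

print(lookup("It is very easy to use.", memory)[0])
print(lookup("It is easy to use.", memory)[0])
```

Because the fuzzy result is only a suggestion, it still goes to a translator, which is the "need for proofing" noted above.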
Translating XML Documents
  • XML inherently easier to translate
  • Separation of form and content
  • Support for Unicode and other international encoding formats.
  • Allows multiple output formats - PDF, XHTML, WAP
XML Translation Standards
  • LISA - Localization Industry Standards Association: http://www.lisa.org
  • OASIS - Organization for the Advancement of Structured Information Standards: http://www.oasis-open.org
  • W3C - World Wide Web Consortium: http://www.w3c.org
  • OLIF Consortium: http://www.olif.net
LISA Standards
  • TMX - Translation Memory Exchange format: http://www.lisa.org/tmx
  • TBX - Termbase Exchange format: http://www.lisa.org/tbx
  • SRX - Segmentation Rules Exchange format: http://www.lisa.org/srx
  • GMX - GILT Metrics Exchange format: http://www.lisa.org/gmx
OASIS L10N Standards
  • XLIFF - XML Localization Interchange File Format: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff
  • TransWS - Translation Web Services: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=trans-ws
  • DITA - Darwin Information Typing Architecture: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita
W3C and OLIF
  • W3C ITS - Internationalization Tag Set: http://www.w3.org/International/ and http://www.w3.org/International/its
  • OLIF - Open Lexicon Interchange Format: http://www.olif.net
XML namespace
  • Major feature of XML
  • Allows the mapping of different ontological entities onto the same representation
  • Allows different ways to look at the same data
  • Namespaces can be made transparent
xml:tm
  • XML based text memory
  • Revolutionary approach to translating XML documents
  • First significant advance in translation memory technology
  • Uses XML namespace to transparently embed contextual information
xml:tm namespace
  • Text Memory namespace
  • Can be mapped onto any XML document
  • Vertical view of document in terms of ‘text segments’
  • Can be totally transparent
xml:tm namespace

Example of the use of tm namespace in an XML document:

<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
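
As a rough illustration of how a namespace-aware tool can pick out the text units without disturbing the host document, the sketch below reads the example with Python's standard `xml.etree.ElementTree`. The closing tags in `DOC` were added so that the snippet parses, and the `text_units` helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map; the URI is the one used in the slide's example.
TM_NS = {"tm": "urn:xml-Intl-tm"}

DOC = """<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>"""

def text_units(xml_text):
    """Return the text of every tm:tu element, in document order."""
    root = ET.fromstring(xml_text)
    return [(tu.text or "").strip() for tu in root.findall(".//tm:tu", TM_NS)]

print(text_units(DOC))
```

A tool that does not know about the tm namespace can ignore these elements, which is the sense in which the namespace is transparent.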

xml:tm namespace

[Figure: two views of the same document tree. The original document view contains doc, title, section, para and text nodes. The tm namespace view overlays a tm root on the same tree: each text node corresponds to a te element holding one tu per sentence.]

xml:tm namespace

original document view:

<para>Namespace is very simple. It is easy to use.</para>

tm namespace view:

<para>
  <tm:te id="e1">
    <tm:tu id="u1.1">Namespace is very simple.</tm:tu>
    <tm:tu id="u1.2">It is easy to use.</tm:tu>
  </tm:te>
</para>
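
The mapping from the original view to the tm namespace view can be sketched in a few lines. This is only an illustration under simplifying assumptions: the element contains plain text with no inline children, sentences are split by a naive regular expression rather than by SRX rules, and the function names and id scheme (`e1`, `u1.1`) simply mimic the slide.

```python
import re
import xml.etree.ElementTree as ET

TM_URI = "urn:xml-Intl-tm"
ET.register_namespace("tm", TM_URI)  # serialise with the tm: prefix

def segment(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def add_text_memory(elem, te_id):
    """Wrap the plain text of elem in one tm:te with one tm:tu per sentence."""
    sentences = segment(elem.text or "")
    elem.text = None
    te = ET.SubElement(elem, f"{{{TM_URI}}}te", {"id": f"e{te_id}"})
    for n, sentence in enumerate(sentences, start=1):
        tu = ET.SubElement(te, f"{{{TM_URI}}}tu", {"id": f"u{te_id}.{n}"})
        tu.text = sentence
    return elem

para = ET.fromstring("<para>Namespace is very simple. It is easy to use.</para>")
add_text_memory(para, 1)
print(ET.tostring(para, encoding="unicode"))
```

A production segmenter would need to handle abbreviations, inline markup and language-specific rules, which is what the SRX standard listed earlier is for.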

xml:tm Text Memory
  • Author memory
    • Maintain memory of source text
    • Authoring statistics
    • Authoring tool input
  • Translation memory
    • Automatic alignment
    • Maintain perfect link of source and target text
    • Reduce translation costs

xml:tm DOM differencing

Source Document → Updated Source Document
  • tu id="1" → tu id="1"
  • tu id="2" → deleted
  • tu id="3" → tu id="3"
  • tu id="4" → tu id="4"
  • tu id="5" → tu id="7" origid="5" (modified)
  • tu id="6" → tu id="6"
  • (no counterpart) → tu id="8" (new)

xml:tm Author Memory
  • Namespace aware DOM differencing
  • Identify changes from the previous version
  • Unique text unit identifiers are maintained
  • Modification history
  • Text units can be loaded into a database
  • Authoring environment integration
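
The differencing idea can be illustrated on a flat list of text units. The sketch below is not the xml:tm algorithm itself, which works on the DOM; it is a simplified stand-in that compares the sequence of sentences with Python's `difflib` and reproduces the id behaviour from the DOM differencing slide: unchanged units keep their id, a modified unit gets a new id plus an `origid`, and new units get fresh ids. The function name and the dictionary layout are assumptions.

```python
from difflib import SequenceMatcher

def diff_text_units(old_units, new_texts):
    """Compare two versions of a document at the text-unit level.

    old_units: list of (id, text) from the previous version.
    new_texts: list of sentence strings from the updated version.
    Returns (units, deleted_ids) for the updated version.
    """
    old_texts = [text for _, text in old_units]
    next_id = max((uid for uid, _ in old_units), default=0) + 1
    units, deleted = [], []
    matcher = SequenceMatcher(None, old_texts, new_texts, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for uid, text in old_units[i1:i2]:
                units.append({"id": uid, "text": text, "status": "unchanged"})
            continue
        old_slice, new_slice = old_units[i1:i2], new_texts[j1:j2]
        paired = min(len(old_slice), len(new_slice))
        for k, text in enumerate(new_slice):
            unit = {"id": next_id, "text": text, "status": "new"}
            if k < paired:
                # Replaced in place: remember which unit it came from.
                unit.update(status="modified", origid=old_slice[k][0])
            units.append(unit)
            next_id += 1
        deleted.extend(uid for uid, _ in old_slice[paired:])
    return units, deleted

old = [(1, "One."), (2, "Two."), (3, "Three."),
       (4, "Four."), (5, "Five."), (6, "Six.")]
new = ["One.", "Three.", "Four.", "Five, revised.", "Six.", "Eight."]
units, deleted = diff_text_units(old, new)
print([u["id"] for u in units], deleted)
```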
xml:tm Translation Memory
  • The tm namespace can be used to create XLIFF files
  • Automatic alignment of source and target languages
  • Allows for more focused translation matching
    • Exact matching
    • Leveraged matching from document - identical text
    • Leveraged matching from database
    • Modified text unit matching
    • Non-translatable text unit identification
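
One way to picture the matching types listed above is as a cascade of checks applied to each text unit of an updated document. The sketch below is an assumption-laden illustration: the `classify` function, its argument layout, the order of the checks, and the "no letters means non-translatable" rule are all invented for this example and are not taken from the xml:tm specification.

```python
def classify(unit, previous_source, previous_target, database):
    """Return (match_type, suggested_target) for one text unit.

    unit: dict with "id", "text" and optionally "origid".
    previous_source / previous_target: id -> text for the previous
        source version and its aligned translation.
    database: source text -> target text from earlier projects.
    """
    text = unit["text"]
    if not any(ch.isalpha() for ch in text):
        # Numbers or punctuation only: nothing to translate.
        return "non-translatable", text
    uid = unit["id"]
    if previous_source.get(uid) == text and uid in previous_target:
        return "exact", previous_target[uid]
    for other_id, other_text in previous_source.items():
        if other_text == text and other_id in previous_target:
            return "leveraged-document", previous_target[other_id]
    if text in database:
        return "leveraged-database", database[text]
    origid = unit.get("origid")
    if origid in previous_target:
        # Modified unit: offer the old translation as a starting point.
        return "fuzzy", previous_target[origid]
    return "new", None

previous_source = {1: "One.", 2: "42", 3: "Three.", 5: "Five."}
previous_target = {1: "T-One.", 2: "42", 3: "T-Three.", 5: "T-Five."}
database = {"Nine.": "T-Nine."}

for unit in ({"id": 1, "text": "One."},
             {"id": 8, "text": "Three."},
             {"id": 9, "text": "Nine."},
             {"id": 7, "text": "Five, revised.", "origid": 5}):
    print(unit["id"], classify(unit, previous_source, previous_target, database))
```

The point of the cascade is that a match on the unit's own id is the most trustworthy, while matches found elsewhere in the document or in a database lack that context and are flagged for proofing in the slides that follow.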
DITA Strengths
  • Topic-centric level of granularity
  • Very well thought out and flexible architecture for content creation and publishing
  • Substantial reuse of existing assets
  • Specialization at the topic and domain levels
  • Automated processing based on metadata properties
  • Translate topic only once, reuse many times
DITA and xml:tm
  • DITA and xml:tm complement each other
  • xml:tm encourages text reuse at the sentence level
  • Automates translation matching and extraction
  • Automatic alignment of source and target documents at the text unit (sentence) level
  • Introduces the concept of exact matching for translation as well as focused matching
  • Fully integrated with existing standards such as SRX, GMX, TMX and XLIFF
xml:tm translation via XLIFF

[Figure: three aligned columns. Each tu in the Source Document (id="1" to id="6") corresponds to a trans-unit with the same id in the XLIFF Document, which in turn corresponds to the tu with the same id in the Translated Document.]
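
The id-for-id correspondence between text units and XLIFF trans-units can be sketched as below. The XLIFF 1.2 namespace and the `original`, `source-language` and `datatype` attributes are taken from the OASIS XLIFF 1.2 format; using version 1.2 at all, the `build_xliff` name, and the file name and language codes in the usage lines are assumptions for the example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialise as the default namespace

def build_xliff(units, original, source_lang, target_lang):
    """Create an XLIFF tree with one trans-unit per (id, text) pair."""
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for uid, text in units:
        # Reuse the tu id so the translation can be merged back by id.
        trans_unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit",
                                   {"id": str(uid)})
        ET.SubElement(trans_unit, f"{{{XLIFF_NS}}}source").text = text
    return xliff

xliff = build_xliff([(1, "Namespace is very simple."), (2, "It is easy to use.")],
                    original="doc.xml", source_lang="en", target_lang="pl")
print(ET.tostring(xliff, encoding="unicode"))
```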

xml:tm translated document

[Figure: the same two tree views for the translated document. The translated document view keeps the doc, title, section and para structure, with translated text ("tekst"). The translated tm namespace view keeps the same tm, te and tu structure, with translated sentences ("zdanie").]

xml:tm perfect alignment

[Figure: Exact alignment. Each tu in the Source Document (id="1" to id="6") is paired with the tu carrying the same id in the Translated Document.]
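
Because source and translated documents carry the same tu ids, alignment reduces to a lookup by id. The sketch below shows this with two tiny hypothetical documents; the `align` function is an invented name, and "Zdanie" is used as placeholder target text in the same way the slides do.

```python
import xml.etree.ElementTree as ET

TU_TAG = "{urn:xml-Intl-tm}tu"

def align(source_xml, target_xml):
    """Pair source and target text units that share the same tm:tu id."""
    def units(xml_text):
        root = ET.fromstring(xml_text)
        return {tu.get("id"): (tu.text or "").strip() for tu in root.iter(TU_TAG)}
    source, target = units(source_xml), units(target_xml)
    return [(uid, text, target[uid]) for uid, text in source.items() if uid in target]

SOURCE = """<para xmlns:tm="urn:xml-Intl-tm"><tm:te id="e1">
<tm:tu id="1">Sentence 1.</tm:tu><tm:tu id="2">Sentence 2.</tm:tu>
</tm:te></para>"""

TARGET = """<para xmlns:tm="urn:xml-Intl-tm"><tm:te id="e1">
<tm:tu id="1">Zdanie 1.</tm:tu><tm:tu id="2">Zdanie 2.</tm:tu>
</tm:te></para>"""

for uid, src, tgt in align(SOURCE, TARGET):
    print(uid, src, "->", tgt)
```

No statistical or positional alignment is needed here, since the pairing is carried by the ids in the markup.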

xml:tm perfect matching

Perfect Matching: Updated Source Document against Matched Target Document
  • tu id="1": present in both documents
  • tu id="2": deleted
  • tu id="3": present in both documents
  • tu id="4": present in both documents
  • tu id="7": modified, requires translation
  • tu id="6": present in both documents
  • tu id="8": new, requires translation

xml:tm leveraged DB memory

[Figure: Perfect alignment. Each tu in the Source Document (id="1" to id="6") is paired with the tu carrying the same id in the Translated Document. The diagram also shows a database (DB) and TMX alongside the aligned documents.]
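
Aligned pairs can be written out as TMX for loading into a database or another tool. The sketch below builds a minimal TMX 1.4 tree; the header attributes are the ones TMX 1.4 requires, but their values (tool name, version, `o-tmf`) are placeholders, and `build_tmx` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, source_lang, target_lang):
    """Create a TMX tree with one tu per (source_text, target_text) pair."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "xmltm-sketch",      # placeholder value
        "creationtoolversion": "0.1",        # placeholder value
        "segtype": "sentence",
        "o-tmf": "xml:tm",                   # placeholder value
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for source_text, target_text in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, source_text), (target_lang, target_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx

tmx = build_tmx([("Sentence 1.", "Zdanie 1."), ("Sentence 2.", "Zdanie 2.")], "en", "pl")
print(ET.tostring(tmx, encoding="unicode"))
```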

xml:tm in-document leveraged matching

Perfect Matching: Updated Source Document against Matched Target Document
  • tu id="1": present in both documents
  • tu id="2": deleted
  • tu id="3": present in both documents
  • tu id="4": present in both documents
  • tu id="7": modified, requires translation
  • tu id="6": present in both documents
  • tu id="8": new:same id="3", leveraged match, requires proofing

xml:tm in-document fuzzy matching

Perfect Matching: Updated Source Document against Matched Target Document
  • tu id="1": present in both documents
  • tu id="2": deleted
  • tu id="3": present in both documents
  • tu id="4": present in both documents
  • tu id="7": mod:origid="5", fuzzy match, requires translation
  • tu id="6": present in both documents
  • tu id="8": new:same, leveraged match, requires proofing

xml:tm db leveraged matching

Perfect Matching: Updated Source Document against Matched Target Document
  • tu id="1": present in both documents
  • tu id="2": deleted
  • tu id="3": present in both documents
  • tu id="4": present in both documents
  • tu id="7": mod:origid="5", fuzzy match, requires translation
  • tu id="6": present in both documents
  • tu id="8": new:same, doc leveraged match, requires proofing
  • tu id="9": DB leveraged match (from DB), requires proofing

xml:tm non-translatable text

Exact Matching: Updated Source Document against Matched Target Document
  • tu id="1": present in both documents
  • tu id="2": non-translatable, requires no translation
  • tu id="3": present in both documents
  • tu id="4": present in both documents
  • tu id="7": fuzzy match, requires translation
  • tu id="6": present in both documents
  • tu id="8": new:same, doc leveraged match, requires proofing
  • tu id="9": DB leveraged match (from DB), requires proofing

Traditional Translation Scenario

[Figure: workflow split between Publishing and Translation. Source text is extracted, the extracted text goes through a tm process to give prepared text, the prepared text is translated and passes QA, and the translated text is merged back to produce the target text.]

xml:tm Translation Scenario

[Figure: Publishing workflow. An automatic process extracts text from the xml:tm source text, with perfect matching and leveraged matching applied by the tm process, and produces an XLIFF file. The translator translates and QA is carried out through a web service/interface. A second automatic process merges the result into the xml:tm target text.]

xml:tm benefits
  • Open Standard donated by XML INTL to LISA
  • Complements DITA
  • Enterprise level scalability
  • Totally integrated within the XML framework
  • Source text is automatically extracted and matched
  • Word counts are controlled by the customer
  • Text can be presented for translation via the web
  • Data is merged automatically at end of translation cycle
  • All memory operations are totally automated
  • Can be used transparently for relay translations
  • More accurate – better matching
xml:tm
  • Full specification:
    • http://www.xml-intl.com/docs/specification/xml-tm.html
  • Maintained by xml-intl.com
    • http://www.xml-intl.com/dtd/tm.dtd
    • http://www.xml-intl.com/dtd/tm.xsd
  • Detailed article on xml:tm at www.xml.com
  • Donated by XML INTL to LISA OSCAR
XML INTL Contact Details
  • Postal address: PO Box 2167, Gerrards Cross, Bucks SL9 8XF, United Kingdom
