
Inline Markup in XLIFF 2.0

Inline Markup in XLIFF 2.0. Fredrik Estreen - Lionbridge, Yves Savourel - ENLASO. Disclaimer: We believe the information presented here is pretty stable, but it only reflects the general consensus of the sub-committee working on the inline markup.


Presentation Transcript


  1. Inline Markup in XLIFF 2.0 Fredrik Estreen - Lionbridge Yves Savourel - ENLASO

  2. Disclaimer We believe the information presented here is pretty stable, but it only reflects the general consensus of the sub-committee working on the inline markup. Things may change during the formal approval by the sub-committee and later, when it goes through review and approval by the main XLIFF TC.

  3. Agenda • Principles and Background • Inline Markup • Characters that are invalid in XML • Native Codes • Annotations • Extensions • Processing requirements • XLIFF Toolkit

  4. Some Principles Some of the guidelines we are trying to follow during the work: • Try to have only one way to do one thing • Provide processing requirements • Try to re-use existing standards when possible • Try to keep things simple

  5. Containing Structure The structural part of XLIFF changes in 2.0, and the inline markup should be easy to handle in the new model. • Static structure • <file> -> <group>* -> <unit> • Contents of the concatenated <source> elements remain static during processing • Dynamic structure inside <unit> • <segment>, <ignorable> -> <source>, <target> • A processor may merge or split the contents of segments or ignorables.

  6. What's the Inline Markup? The inline markup is what's inside the <source> and <target> elements • Characters that are invalid in XML • Original inline codes • Annotations

  7. Inline codes and segmentation • Inline codes belong to the <unit> and not to the <segment>(s) • ID uniqueness within the <unit> • Allows simple re-segmentation of the content of <unit> • No need to clone codes that span multiple segments
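The uniqueness rule on this slide can be sketched as a small check. This is only an illustration under a hypothetical model, not part of the toolkit: a unit is reduced to a list of segments, and each segment to the list of inline-code ids it declares. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class UnitIdCheck {
    // Hypothetical model: one inner list per segment, holding the ids of the
    // inline codes declared in that segment. Returns every id used more than
    // once anywhere in the unit, regardless of which segment it appears in.
    static Set<String> duplicateIds(List<List<String>> segments) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (List<String> segment : segments) {
            for (String id : segment) {
                if (!seen.add(id)) {
                    duplicates.add(id);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<List<String>> unit = new ArrayList<>();
        unit.add(Arrays.asList("1", "2")); // first segment
        unit.add(Arrays.asList("3", "1")); // second segment reuses id "1"
        System.out.println(duplicateIds(unit)); // prints [1]
    }
}
```

Because the check runs over the whole unit rather than per segment, merging or splitting segments does not change its result, which is the property the slide relies on for simple re-segmentation.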

  8. Characters that are Invalid in XML For example, control characters are not allowed in XML content, so they cannot be stored as-is in XLIFF. <cp hex="0007"/> represents U+0007 (the "bell" character) - Same as Unicode LDML format - Only characters invalid in XML must use this notation.
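As a rough sketch of how a writer might apply the <cp> notation, the following replaces only the code points that XML 1.0 cannot carry. The class name is invented, the validity test follows the XML 1.0 Char production, and the four-digit uppercase hex format is an assumption taken from the slide's example. Ordinary XML escaping of & and < is deliberately left out.

```java
class CpEncoder {
    // True if the code point is allowed in XML 1.0 content (the Char production).
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces each code point that is invalid in XML with a <cp hex="..."/>
    // element. Valid characters are copied through unchanged.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format("<cp hex=\"%04X\"/>", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The bell character U+0007 is not allowed in XML content.
        System.out.println(encode("Ring" + (char) 7 + "!")); // prints Ring<cp hex="0007"/>!
    }
}
```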

  9. Inline Codes • Support any type of native markup • Standalone: <ph/> • Spanning: <pc> and <sc/> + <ec/>

  10. Inline Codes - Use Cases All possible cases:
      • Standalone code: <ph id='1'/>
      • Well-formed spanning code: <pc id='1'>text</pc>
      • Start marker of spanning code: <sc id='1'/>
      • End marker of spanning code: <ec rid='1'/>
      • Orphan start marker of spanning code: <sc id='1' isolated='yes'/>
      • Orphan end marker of spanning code: <ec id='1' isolated='yes'/>

  11. Inline Codes - Storage of Original
      • No storage:
        <source>A<ph id="1"/>B</source>
      • Store, but only outside the segment:
        <source>A<ph id="1" nid="d1"/>B</source>
        <originalData> <data id="d1">&lt;BR></data> </originalData>
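The nid lookup shown on this slide can be sketched as follows. This is a simplified illustration, not the toolkit's API: the Ph class and the list-of-parts model are hypothetical stand-ins for parsed content, and the map stands in for the <originalData> element.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class OriginalDataDemo {
    // Hypothetical stand-in for a <ph> element: its id and optional nid reference.
    static final class Ph {
        final String id;
        final String nid;
        Ph(String id, String nid) {
            this.id = id;
            this.nid = nid;
        }
    }

    // Rebuilds native-format text: plain strings are copied, and each Ph is
    // replaced by the <data> entry its nid points to, or by nothing when no
    // original code was stored for it.
    static String toNative(List<Object> content, Map<String, String> originalData) {
        StringBuilder sb = new StringBuilder();
        for (Object part : content) {
            if (part instanceof Ph) {
                String nid = ((Ph) part).nid;
                String data = (nid == null) ? null : originalData.get(nid);
                if (data != null) {
                    sb.append(data);
                }
            } else {
                sb.append(part);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> originalData = Collections.singletonMap("d1", "<BR>");
        List<Object> source = Arrays.<Object>asList("A", new Ph("1", "d1"), "B");
        System.out.println(toNative(source, originalData)); // prints A<BR>B
    }
}
```

Keeping the native code in a side table rather than inside the segment means the translatable content stays free of format-specific markup, at the cost of one lookup when the original is needed.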

  12. Annotations Use <mrk> for well-formed constructs, <sm/> + <em/> otherwise. Attributes: • id (required) • type (default=generic) • translate (yes or no, default=yes) • ref (optional type-specific URI) • value (optional type-specific text/data)

  13. Annotation Types • Translate annotations • Term annotations • Comment annotations • Custom annotations The IDs link the same annotation in source and target if needed.

  14. Translate Annotation • To protect (or not) a span of content: <mrk id="1" translate="no">content</mrk> Note that translate can also be used with other types of annotations.

  15. Term Annotation • To denote a "term": <mrk id="1" type="term" value="simple definition" ref="reference to more info">content</mrk> The id links source and target if needed

  16. Comment Annotation
      • Simple:
        <source><mrk id="1" type="comment" value="The text of the comment">content</mrk></source>
      • With associated note:
        <source><mrk id="1" type="comment" ref="#n1">content</mrk></source>
        <notes> <note id="n1">Text of the note</note></notes>

  17. Custom Annotation • User-defined annotation: - The type attribute = <prefix>:<userType> - The meanings of the value and ref attributes are defined by the user. <mrk id="1" type="myPrefix:isbn" value="978-0-14-44919-8">The Epic of Gilgamesh</mrk>
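A tool reading annotations needs to tell custom types from the predefined ones. The sketch below assumes only what the slide states, namely that a custom type has the form <prefix>:<userType>. The class and method names are invented, and any further lexical rules for the prefix are not covered here.

```java
class AnnotationType {
    // True when the type has the form prefix:userType, with both parts non-empty.
    static boolean isCustom(String type) {
        int colon = type.indexOf(':');
        return colon > 0 && colon < type.length() - 1;
    }

    // Returns the part before the first colon, or null for a non-custom type.
    static String prefix(String type) {
        return isCustom(type) ? type.substring(0, type.indexOf(':')) : null;
    }

    // Returns the part after the first colon, or null for a non-custom type.
    static String userType(String type) {
        return isCustom(type) ? type.substring(type.indexOf(':') + 1) : null;
    }

    public static void main(String[] args) {
        System.out.println(isCustom("myPrefix:isbn")); // prints true
        System.out.println(prefix("myPrefix:isbn"));   // prints myPrefix
        System.out.println(userType("myPrefix:isbn")); // prints isbn
        System.out.println(isCustom("term"));          // prints false
    }
}
```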

  18. Extensions • A few attributes can take user-defined values: e.g. mrk@type, ph@type, pc@type • No additional attributes are allowed in any of the inline elements • No additional elements are allowed inside <source>, <target> or <data> Custom annotations are essentially the only way to extend markup inside the inline content.

  19. Processing Requirements • Allowed markup transforms and related attribute mapping (between <pc> and an <sc/>, <ec/> pair) • Define requirements for creation and editing of target text • Rules on cloning markup with and without reference to native data • Stricter rules on attributes and ID references • How to handle segmentation changes
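To make the <pc> to <sc/>, <ec/> transform concrete, here is a minimal string-level sketch. It assumes the attribute mapping shown on slide 10 (id on <sc/>, rid on <ec/>), handles only a single-quoted id attribute, and uses a regular expression where a real processor would use an XML parser. The class name is invented, and the actual processing requirements may map more attributes than this.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PcToScEc {
    // Matches either an opening <pc id='...'> (group 1 = id) or a closing </pc>.
    private static final Pattern TAG = Pattern.compile("<pc id='([^']*)'>|</pc>");

    // Rewrites each <pc> element as an <sc/>, <ec/> pair. A stack pairs every
    // closing tag with the id of the most recently opened <pc>, so nested
    // spans are handled. Throws NoSuchElementException on an unmatched </pc>.
    static String convert(String content) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(content);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                open.push(m.group(1));
                replacement = "<sc id='" + m.group(1) + "'/>";
            } else {
                replacement = "<ec rid='" + open.pop() + "'/>";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("a<pc id='1'>b</pc>c")); // prints a<sc id='1'/>b<ec rid='1'/>c
    }
}
```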

  20. XLIFF Toolkit - A Library and More • Java-based and open source (LGPL) • http://code.google.com/p/okapi-xliff-toolkit/ • Stream-based rather than DOM to handle very large documents • Reader is event-driven • Unit available as single object • Writer also available

  21. Library - Reading a Document
      XLIFFReader reader = new XLIFFReader();
      reader.open(new File("myInput.xlf"));
      // Pull events one at a time; the document is never fully loaded
      while ( reader.hasNext() ) {
          XLIFFEvent event = reader.next();
          if ( event.getType() == XLIFFEventType.TEXT_UNIT ) {
              Unit unit = event.getUnit();
              // Do something with the unit
          }
      }
      reader.close();

  22. Library - Updating a Document
      XLIFFReader reader = new XLIFFReader();
      XLIFFWriter writer = new XLIFFWriter();
      reader.open(new File("myInput.xlf"));
      writer.create(new File("myOutput.xlf"));
      while ( reader.hasNext() ) {
          XLIFFEvent event = reader.next();
          if ( event.getType() == XLIFFEventType.TEXT_UNIT ) {
              Unit unit = event.getUnit();
              // Do something with the unit
          }
          // Every event is written, so unmodified parts pass through
          writer.write(event);
      }
      reader.close();
      writer.close();

  23. Q & A Useful links • Read the latest Editor's Draft: https://wiki.oasis-open.org/xliff/ • Comment or ask questions in the mailing lists: https://lists.oasis-open.org/archives/xliff-comment/ and https://lists.oasis-open.org/archives/xliff-users/ • Try out the toolkit: http://code.google.com/p/okapi-xliff-toolkit/
