
Global User-Generated Content: The Final Localization Frontier


Presentation Transcript


  1. Global User-Generated Content: The Final Localization Frontier Merle Tenney

  2. Agenda • Dimensions of Content • Dimensions of Translation • Global Language Tools • UGC Translation Practices • Best Current Practices • Global UGC Desiderata • Call to Action

  3. Dimensions of Content

  4. UGC Pre–Web 2.0 • 1965 Mainframe-based email, instant messaging • 1969 ARPANET • 1978–79 Bulletin board systems, discussion forums (Usenet) • 1983 Internet (TCP/IP) • 1991 World Wide Web (HTTP) • 1993–2003 Blogging, social network services, user classifieds, user auctions, wikis, social bookmarking, photo sharing

  5. UGC Post–Web 2.0 • 2004 Tim O’Reilly, John Battelle, Dale Dougherty define Web 2.0 • Leaders: Yahoo! Groups (1998), MySpace (2003), LinkedIn (2003) • 2004 Facebook • 2005 YouTube • 2006 Twitter • 2009 Foursquare

  6. Content Types • Managed content (MC) • Semi-managed content (SMC) • User-generated content (UGC) • Individual content • Community content • Computer-mediated communication (CMC)

  7. Managed Content • Authors: professional communicators (information developers) • Examples: user interfaces, user assistance, technical documentation; marcom materials, newsletters; web pages, institutional blogs • Requirements: institutional voice, subject matter expertise, polished writing • Tools: content management systems, publishing systems, office applications, blogging software

  8. Semi-Managed Content • Authors: information workers • Examples: technical reports, design documents; technotes, knowledge base articles, technical blogs, industry discussion lists • Requirements: technical expertise, effectual writing • Tools: content management systems, office applications, social network services, blogging software

  9. User-Generated Content • Authors: users and communities • Examples: user profiles, blogs, discussion lists, wikis, reviews, ratings, tags, classifieds, auction listings, user documents, user multimedia • Requirements: informed opinion, interesting content, effectual writing • Tools: office applications, wiki software, social network services, blogging software, classified ad systems, customer feedback forums

  10. Computer-Mediated Communication • Authors: everyone • Examples: emails, microblogs (tweets), direct messages, status updates, SMS messages (texting), instant messages, chat sessions • Requirements: interesting message, succinct, comprehensible writing • Tools: email, blogging, microblogging, instant messaging, chat rooms, social network services, e-commerce, virtual worlds, online games

  11. Content Pyramid

  12. Content Structure • Structured content • Semi-structured content • Unstructured content

  13. Structured Content • Description: content taken from a closed set of values specified by developers, such as list values, numbers, and related data types • Examples: numerical data, structured keywords, taxonomies, values, lists (ratings, dates, gender, marital status, language, country, etc.) • Translation: no translation per se; language-neutral data, multilingual textual expressions of underlying data handled by UI localization or locale-based data formatting
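
The point of slide 13, that structured values are stored once in a language-neutral form and only rendered per locale, can be sketched as follows. This is a minimal illustration, not part of the presentation: the label tables, date patterns, and function names are hypothetical, and a production system would draw them from UI localization resources or a locale data library.

```python
from datetime import date

# Hypothetical per-language resources (assumed for illustration only).
MARITAL_STATUS_LABELS = {
    "en": {"single": "Single", "married": "Married"},
    "de": {"single": "Ledig", "married": "Verheiratet"},
}
DATE_PATTERNS = {
    "en": "{m}/{d}/{y}",   # US-style month/day/year
    "de": "{d}.{m}.{y}",   # German-style day.month.year
}

def render_status(code: str, lang: str) -> str:
    """Render a stored, language-neutral status code as a localized label."""
    return MARITAL_STATUS_LABELS[lang][code]

def render_date(value: date, lang: str) -> str:
    """Format a stored date according to the viewer's language."""
    return DATE_PATTERNS[lang].format(d=value.day, m=value.month, y=value.year)

stored_profile = {"status": "married", "joined": date(2008, 10, 15)}
for lang in ("en", "de"):
    print(lang, render_status(stored_profile["status"], lang),
          render_date(stored_profile["joined"], lang))
```

Nothing in the stored profile is translated; only the display layer changes with the viewer's language.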

  14. Semi-Structured Content • Description: content taken from a constrained and self-organizing but not closed set of values developed by users • Examples: user classifications, common search terms, user keywords, tag clouds, folksonomies • Translation: specialized bilingual terminology, with fallback to machine translation as needed
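
The translation strategy on slide 14 (bilingual terminology first, machine translation only as a fallback) might look like the sketch below. The function name, the tiny English-to-German term list, and the `fake_mt` stub are assumptions made for illustration; no real MT service is called.

```python
from typing import Callable, Dict, Tuple

def translate_tag(
    tag: str,
    terminology: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (translation, source), preferring the curated terminology."""
    key = tag.strip().lower()
    if key in terminology:
        return terminology[key], "terminology"
    # Tag not covered by the terminology: fall back to MT.
    return machine_translate(tag), "mt"

# Hypothetical English-to-German tag terminology.
TAG_TERMS = {"photo sharing": "Foto-Sharing", "wiki": "Wiki"}

def fake_mt(text: str) -> str:
    """Stand-in for a real MT call; it only marks the text as machine output."""
    return f"[MT] {text}"

print(translate_tag("Photo Sharing", TAG_TERMS, fake_mt))
print(translate_tag("street food", TAG_TERMS, fake_mt))
```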

  15. Unstructured Content • Description: open, unconstrained user text • Examples: wikis, articles, blogs, discussions, reviews, chats, instant messages, emails • Translation: machine translation in pull contexts, including cross-language search; computer-aided translation in push contexts
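
One way to picture the pull/push split on slide 15 is a small router: pull requests are answered immediately with machine translation, while push requests are queued for computer-aided translation by people. The class, its method names, and the caching of pull results are assumptions added for this sketch, not something the slides specify.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Tuple

class TranslationRouter:
    """Toy router: MT on demand for pull, a work queue for push."""

    def __init__(self, machine_translate: Callable[[str, str], str]) -> None:
        self.machine_translate = machine_translate
        self.cat_queue: Deque[Tuple[str, str]] = deque()  # jobs awaiting translators
        self.cache: Dict[Tuple[str, str], str] = {}       # assumed: reuse earlier MT output

    def pull(self, text: str, target_lang: str) -> str:
        """A reader asks for a translation now: answer with (cached) MT."""
        key = (text, target_lang)
        if key not in self.cache:
            self.cache[key] = self.machine_translate(text, target_lang)
        return self.cache[key]

    def push(self, text: str, target_langs: Iterable[str]) -> int:
        """The publisher schedules translation ahead of demand; returns queue size."""
        for lang in target_langs:
            self.cat_queue.append((text, lang))
        return len(self.cat_queue)

router = TranslationRouter(lambda text, lang: f"[{lang}] {text}")
print(router.pull("Great seller, fast shipping", "fr"))
print(router.push("Community guidelines", ["de", "ja"]))
```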

  16. Content Forms • Text • Graphics • Audio • Video • Virtual reality • Location-based services

  17. Nontextual Content Forms • Integrated text: titles, legends, labels, callouts, subtitles, transcriptions, text layers, text tracks • Associated text: metadata, tags, comments • Accessibility text: alt, longdesc attributes
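
As a rough illustration of how the accessibility text on slide 17 could be gathered for translation, the sketch below collects `alt` and `longdesc` attributes from `img` elements using Python's standard `html.parser` module. The class name is hypothetical. Note that `longdesc` holds a URL pointing to a longer description, so in practice it is the linked description, not the attribute value itself, that would need translating.

```python
from html.parser import HTMLParser
from typing import List, Tuple

class AccessibilityTextCollector(HTMLParser):
    """Collect (attribute, value) pairs for alt and longdesc on <img> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.texts: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            # Skip missing or empty values (for example alt="" on decorative images).
            if name in ("alt", "longdesc") and value:
                self.texts.append((name, value))

collector = AccessibilityTextCollector()
collector.feed('<p><img src="chart.png" alt="Sales by region"><img src="rule.png" alt=""></p>')
print(collector.texts)
```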

  18. Dimensions of Translation

  19. Global Content Creation • Zero translation (ZT) • Machine translation (MT) • Human translation (HT) • Transcreation (TC) • Original content (OC)

  20. Translation Modes • Machine translation: unedited MT, translation wiki • Human translation: volunteer translators (users & friends, community); paid translators (semi-professional, professional)

  21. Translation Cost

  22. Individual UGC Translation Modes

  23. Community UGC Translation Modes

  24. Composite Content Translation Modes

  25. Push and Pull Translation Frameworks • Differences in applications and translation requirements • Push mode content translation: proactive, for anticipated demand • Pull mode content translation: reactive, for attested demand

  26. Push & Pull Translation Comparison (10/15/2008, Web 2.0 Globalization, Merle Tenney)

  27. Push & Pull Translation Comparison

  28. Global Language Tools

  29. Global Content Creation and Translation • Authoring and editing • Automatic translation • Computer-aided translation

  30. Authoring and Editing • Spelling checkers • Style and grammar checkers • Language compliance checkers • Intelligent content reuse/authoring memory • Electronic references • Explanatory dictionaries • Thesauri • Bilingual dictionaries • Style guides

  31. Automatic Translation (AT) • AT > MT (machine translation) • AT ≥ MTM (machine translation + translation memory) • Translation pre-editing tools (language compliance checker + authoring memory) • Automatic text categorization (for selection of terminologies and translation memories) • Translation memory (TM) • Machine translation (MT)
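
The "MTM" idea on slide 31 (machine translation combined with translation memory) can be sketched as a lookup that reuses a stored human translation when the source segment is close enough to a remembered one, and otherwise falls back to MT. The function name, the 0.85 similarity threshold, and the `fake_mt` stub are assumptions for illustration. The similarity score comes from Python's standard `difflib.SequenceMatcher`, which stands in for whatever fuzzy matching a real TM system uses.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Tuple

def translate_segment(
    source: str,
    memory: Dict[str, str],
    machine_translate: Callable[[str], str],
    threshold: float = 0.85,
) -> Tuple[str, str]:
    """Return (translation, origin), where origin is "tm" or "mt"."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in memory.items():
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    if best_target is not None and best_score >= threshold:
        return best_target, "tm"           # reuse the remembered translation
    return machine_translate(source), "mt"  # nothing close enough: use MT

def fake_mt(text: str) -> str:
    """Stand-in for a real MT engine."""
    return f"[MT] {text}"

TM = {"Add to cart": "In den Warenkorb"}
print(translate_segment("Add to cart", TM, fake_mt))
print(translate_segment("Where is my parcel?", TM, fake_mt))
```

A real system would typically present fuzzy matches to a translator for post-editing instead of reusing them unchanged; the sketch only shows the order of preference.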

  32. Computer-Aided Translation (CAT) • SL & TL text fields • Translation tools: machine translation, translation memory, translation search, terminology access • TL authoring and editing tools: general authoring and editing tools, translation QA and translation post-editing tools • Translation leveraging updates: terminology updates, translation memory updates

  33. UGC Translation Challenges

  34. Problems with UGC — Low Quality • Terse, ungrammatical constructions • nonstandard CAPITALIZATION • Missing, creative punctuation • Accidental, intentional misspellings • Nonstandard diction—colloquial abbreviations & acronyms, leetspeak, emoticons

  35. Problems with UGC — Intrinsic Characteristics • Cryptic, clipped style (chats, IMs, tweets) • Conversational style • Diverse term variants • Wide range of lexicon • Frequent neologisms

  36. Solutions for Problematic UGC — Low Quality • Better writing, self-editing • Editing by others (designated content agents) • Authoring and editing tools • Translation pre-editing tools • Dialect translation tools
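
A translation pre-editing tool of the kind listed on slide 36 could start as simply as the sketch below, which addresses three of the problems from slide 34: repeated punctuation, shouted capitals, and colloquial abbreviations. The abbreviation table and the function name are hypothetical, and a real tool would need far larger, language-specific resources.

```python
import re

# Hypothetical abbreviation table (assumed for illustration only).
ABBREVIATIONS = {"u": "you", "r": "are", "gr8": "great", "thx": "thanks", "pls": "please"}

def pre_edit(text: str) -> str:
    """Normalize noisy user text before it is sent to machine translation."""
    # Collapse runs of the same punctuation mark ("!!!" becomes "!").
    text = re.sub(r"([!?.])\1+", r"\1", text)
    words = []
    for token in text.split():
        match = re.match(r"^(\w+)(\W*)$", token)
        if match and match.group(1).lower() in ABBREVIATIONS:
            # Expand a known abbreviation, keeping trailing punctuation.
            token = ABBREVIATIONS[match.group(1).lower()] + match.group(2)
        elif token.isupper() and len(token) > 1:
            # Lowercase shouted words; single letters such as "I" are left alone.
            token = token.lower()
        words.append(token)
    result = " ".join(words)
    return result[:1].upper() + result[1:]

print(pre_edit("thx u r GR8!!!"))
```

The lowercasing rule is deliberately crude: it would also flatten legitimate acronyms, which is one reason the slides pair such tools with human editing.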

  37. Solutions for Problematic UGC — Intrinsic Characteristics • MT based on leveraged resources produced as by-product of CAT translations • Terminologies and translation memories based on automatic text categorization • Continued improvement in pull (MT) translation environments dependent on quantity and quality of effort in related push (CAT) translation environments • Ergo, need to support push translation environments and aggregated, quality-controlled user, community, and professional translation resources

  38. Best Current Practices

  39. UGC Translation • Push translation implementations: Google Translator Toolkit • Pull translation implementations: Outlook email translation (PROMT), Mojofiti blog translation (Google), eBay listing translation (SYSTRAN) • Translation viewers: unedited MT (Microsoft), translation wiki (Microsoft)

  40. Google Translator Toolkit

  41. Outlook Email Translation

  42. Mojofiti Blog Translation

  43. eBay Listing Translation

  44. Translation Viewers • Bilingual text display in web browser or document editor • Translation views (different strokes for different folks) • Single-language view (SL or TL): original or translated content, with rollover display of the corresponding sentence from the translated or original content • Dual-language view (SL and TL): side-by-side or over-and-under display of original and translated content, with synchronized scrolling and sentence highlighting
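
The two dual-language layouts on slide 44 can be sketched for plain text as follows. The function names are hypothetical, and the sketch assumes the source and target sentences are already aligned one to one, which a real viewer would have to establish first.

```python
from typing import List

def side_by_side(source: List[str], target: List[str], width: int = 30) -> str:
    """Show each source sentence next to its translation, one pair per row."""
    return "\n".join(f"{sl:<{width}} | {tl}" for sl, tl in zip(source, target))

def over_and_under(source: List[str], target: List[str]) -> str:
    """Show each source sentence with its translation on the line below."""
    return "\n\n".join(f"{sl}\n{tl}" for sl, tl in zip(source, target))

sl_sentences = ["Great phone.", "Battery lasts two days."]
tl_sentences = ["Tolles Handy.", "Der Akku hält zwei Tage."]
print(side_by_side(sl_sentences, tl_sentences))
print()
print(over_and_under(sl_sentences, tl_sentences))
```

Because both views are built from the same sentence pairs, rollover display and synchronized highlighting in a browser would only need the pair index attached to each rendered sentence.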

  45. Global UGC Infrastructure Bing Translator Source Text Rollover Mode

  46. Global UGC Infrastructure Bing Translator Target Text Rollover Mode

  47. Global UGC Infrastructure Bing Translator Side-by-Side Mode

  48. Global UGC Infrastructure Bing Translator Over-and-Under Mode

  49. Bing Translation Wiki

  50. Global UGC Desiderata
