Language Tags and Locale Identifiers

A Status Report. Presenter: Addison Phillips, Internationalization Architect, Yahoo!; Co-Editor, Language Tag Registry Update (LTRU) Working Group (RFC 3066bis, draft-matching). Topics: language tags and locale identifiers.

Presentation Transcript


  1. Language Tags and Locale Identifiers: A Status Report

  2. Presenter and Agenda • Addison Phillips • Internationalization Architect, Yahoo! • Co-Editor, Language Tag Registry Update (LTRU) Working Group (RFC 3066bis, draft-matching) • Language tags • Locale identifiers

  3. Languages? Locales? What’s a language tag? What the #@&%$ is a locale? Why do identifiers matter?

  4. Language Tags • Enable presentation, selection, and negotiation of content • Defined by BCP 47 • Widely used! XML, HTML, RSS, MIME, SOAP, SMTP, LDAP, CSS, XSL, CCXML, Java, C#, ASP, Perl… • Well understood (?)

  5. Locale Identifiers • Different ideas: • Accept-Locale vs. Accept-Language • URIs/URNs, etc. • CLDR/LDML • And Requirements: • Operating environments and harmonization • App Servers • Web Services • New Solution? Cost of Adoption: • UTF-8 to the browser: 8 long years

  6. In the Beginning Received Wisdom from the Dark Ages • Locales: • japanese, french, german, C • ENU, FRA, JPN • ja_JP.PCK • AMERICAN_AMERICA.WE8ISO8859P1 • Languages… … looked a lot like locales (and vice versa)

  7. Locales and Language Tags Meet • Conversations in Prague… • Language tags are being used as locale identifiers anyway… • Not going to need a big new thing… • Just a few things to fix… … we can do this really fast

  8. BCP 47 Basic Structure • Alphanumeric (ASCII only) subtags • Up to eight characters long • Separated by hyphens • Case not important (i.e. zh = ZH = zH = Zh) • Grammar: 1*8alphanum *[ "-" 1*8alphanum ]
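The structure on this slide can be checked mechanically. The following is a minimal sketch, assuming the slide's grammar is taken literally; the names `BASIC_TAG`, `is_basic_tag`, and `same_tag` are made up for illustration and do not come from any specification or library. It checks shape only, not whether the subtags are registered.

```python
import re

# Slide grammar: 1*8alphanum *[ "-" 1*8alphanum ], ASCII only.
BASIC_TAG = re.compile(r"[A-Za-z0-9]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_basic_tag(tag: str) -> bool:
    """Return True if the tag fits the generic subtag structure."""
    return BASIC_TAG.fullmatch(tag) is not None


def same_tag(a: str, b: str) -> bool:
    """Compare two tags ignoring case, since case is not significant."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    for candidate in ["zh-TW", "i-klingon", "en_US", "averylongsubtag"]:
        print(candidate, is_basic_tag(candidate))
    print(same_tag("zh", "ZH"))
```

Underscore-separated locale names such as en_US fail this check, which is one reason older locale identifiers cannot be passed through as language tags unchanged.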

  9. RFC 1766 • zh-TW: ISO 639-1 (alpha2) language code plus ISO 3166 (alpha2) country code • i-klingon: registered value

  10. RFC 3066 • sco-GB: ISO 639-2 (alpha 3) codes are allowed • But use alpha 2 codes when they exist: eng-GB is wrong (use en-GB)

  11. Problems • Script Variation: • zh-Hant/zh-Hans • (sr-Cyrl/sr-Latn, az-Arab/az-Latn/az-Cyrl, etc.) • Obsolescence of registrations: • art-lojban (now jbo), i-klingon (now tlh) • Instability in underlying standards: • sr-CS (CS used to be Czechoslovakia…)

  12. And More Problems • Lack of scripts • Little support for registered values in software • Reassignment of values by ISO 3166 • Lack of consistent tag formation (Chinese dialects?) • Standards not readily available, bad references • Bad implementation assumptions • The grammar is 1*8alphanum *[ "-" 1*8alphanum ] • Implementations often assume 2*3ALPHA [ "-" 2ALPHA ] • Many registrations to cover small variations • 8 German registrations to cover two variations
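The gap between the two grammars on this slide is easy to demonstrate. This is an illustrative sketch only: `NARROW`, `GENERIC`, and `accepted_by` are hypothetical names, and the sample tags are taken from elsewhere in this deck.

```python
import re

# What implementations often assume: 2*3ALPHA [ "-" 2ALPHA ]
NARROW = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z]{2})?")
# What the grammar allows: 1*8alphanum *[ "-" 1*8alphanum ]
GENERIC = re.compile(r"[A-Za-z0-9]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def accepted_by(pattern: re.Pattern, tags: list[str]) -> list[str]:
    """Return the tags that the given pattern accepts in full."""
    return [t for t in tags if pattern.fullmatch(t)]


tags = ["en-US", "sco-GB", "i-klingon", "art-lojban", "de-CH-1996"]

if __name__ == "__main__":
    print("narrow: ", accepted_by(NARROW, tags))
    print("generic:", accepted_by(GENERIC, tags))
```

Under the narrow assumption, the registered values and the variant-bearing tag are rejected even though they are legitimate tags.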

  13. LTRU and “draft-registry” • Defines a generative syntax • machine readable • future proof, extensible • Defines a single source • Stable subtags, no conflicts • Machine readable • Defines when to use subtags • (sometimes)

  14. RFC 3066bis and LTRU • Example: sl-Latn-IT-rozaj-x-mine • sl: ISO 639-1/2 language code (alpha2/3) • Latn: ISO 15924 script code (alpha 4) • IT: ISO 3166 (alpha2) or UN M.49 region code • rozaj: registered variant (any number allowed) • x-mine: private use and extension
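Because each kind of subtag has a distinctive length and content, a tag can be split into its parts without consulting the registry. The sketch below is a simplified illustration under stated assumptions: `parse_tag` and `_is_variant` are hypothetical names, the length rules follow the generative syntax as understood from RFC 3066bis, and grandfathered tags such as i-klingon, extended language subtags, and registry validation are deliberately not handled.

```python
def _is_variant(subtag: str) -> bool:
    """Variants: 5 to 8 alphanumerics, or 4 characters starting with a digit."""
    if not subtag.isalnum():
        return False
    return 5 <= len(subtag) <= 8 or (len(subtag) == 4 and subtag[0].isdigit())


def parse_tag(tag: str) -> dict:
    """Split a well-formed tag into parts by subtag length and content."""
    if not tag.isascii():
        raise ValueError("subtags are ASCII only")
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None,
              "variants": [], "extensions": {}, "private_use": []}
    i = 1
    # Script: exactly four letters (ISO 15924).
    if i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha():
        result["script"] = parts[i].title()
        i += 1
    # Region: two letters (ISO 3166) or three digits (UN M.49).
    if i < len(parts) and ((len(parts[i]) == 2 and parts[i].isalpha())
                           or (len(parts[i]) == 3 and parts[i].isdigit())):
        result["region"] = parts[i].upper()
        i += 1
    # Variants: any number of them.
    while i < len(parts) and _is_variant(parts[i]):
        result["variants"].append(parts[i].lower())
        i += 1
    # Singletons introduce extensions; "x" introduces private use.
    while i < len(parts) and len(parts[i]) == 1:
        singleton = parts[i].lower()
        i += 1
        if singleton == "x":
            result["private_use"] = [p.lower() for p in parts[i:]]
            i = len(parts)
        else:
            start = i
            while i < len(parts) and len(parts[i]) > 1:
                i += 1
            result["extensions"][singleton] = [p.lower() for p in parts[start:i]]
    if i != len(parts):
        raise ValueError(f"unexpected subtag in {tag!r}")
    return result


if __name__ == "__main__":
    print(parse_tag("sl-Latn-IT-rozaj-x-mine"))
```

The canonical casing applied here (title case for scripts, upper case for regions) is a convention only; case carries no meaning in a tag.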

  15. More Examples • es-419 (Spanish for Americas) • en-US (English for USA) • de-CH-1996 (Old tags are all valid) • sl-rozaj-nedis (Multiple variants) • zh-t-wadegile (Extensions)

  16. Benefits • Subtag registry in one place: one source. • Subtags identified by length/content • Extensible • Compatible with RFC 3066 tags • Stable: subtags are forever

  17. Problems • Matching • Does “en-US” match “en-Latn-US”? • Tag Choices • Users have more to choose from. • Implementations • More to do, more to think about • (easier to parse, process, support the good stuff)

  18. Tag Matching • Uses “Language Ranges” in a “Language Priority List” to select sets of content according to the language tag • Four Schemes • Basic Filtering • Extended Filtering • Scored Filtering • Lookup

  19. Filtering • Ranges specify the least specific item • “en” matches “en”, “en-US”, “en-Brai”, “en-boont” • Basic matching uses plain prefixes • Extended matching can match “inside bits” • “en-*-US”
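The two filtering schemes can be sketched in a few lines. This is an illustration, not the working group's reference code: the function names are invented, and the extended algorithm follows the matching rules as later published in RFC 4647, which may differ in detail from the draft discussed in this talk.

```python
def basic_filter(lang_range: str, tag: str) -> bool:
    """Basic filtering: the range must equal the tag or be a prefix of it
    that ends on a subtag boundary. The range "*" matches everything."""
    r, t = lang_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def extended_filter(lang_range: str, tag: str) -> bool:
    """Extended filtering: range subtags must appear in order in the tag,
    but the tag may contain extra subtags in between."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must match unless the range starts with a wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1                 # wildcard: skip it
        elif ti >= len(t):
            return False            # range wants more than the tag has
        elif r[ri] == t[ti]:
            ri += 1
            ti += 1
        elif len(t[ti]) == 1:
            return False            # never skip past a singleton such as "x"
        else:
            ti += 1                 # skip an "inside bit" of the tag
    return True


if __name__ == "__main__":
    print(basic_filter("en-US", "en-Latn-US"))
    print(extended_filter("en-US", "en-Latn-US"))
```

Under these rules the range en-US does not match en-Latn-US with basic filtering but does with extended filtering, which is the matching question raised on the earlier "Problems" slide.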

  20. Scored Filtering • Assigns a “weight” or “score” to each match • Result set is ordered by match quality • Postulated by John Cowan

  21. Lookup • Range specifies the most specific tag in a match. • “en-US” matches “en” and “en-US” but not “en-US-boont” • Mirrors the locale fallback mechanism and many language negotiation schemes.
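Lookup can be sketched as progressive truncation of each range in the priority list. As with the other sketches, the name `lookup` and its parameters are assumptions made for illustration; the truncation rule (drop the last subtag, and drop a single-character subtag left at the end along with it) follows the algorithm as later published in RFC 4647.

```python
def lookup(priority_list, available, default=None):
    """Return the single best available tag for a language priority list.

    Each range is tried from most specific to least specific by removing
    subtags from the end until an available tag matches exactly.
    """
    by_lower = {tag.lower(): tag for tag in available}
    for lang_range in priority_list:
        if lang_range == "*":
            continue  # a bare wildcard selects nothing in lookup
        parts = lang_range.lower().split("-")
        while parts:
            candidate = "-".join(parts)
            if candidate in by_lower:
                return by_lower[candidate]
            parts.pop()
            # A singleton cannot end a tag, so remove it as well.
            while parts and len(parts[-1]) == 1:
                parts.pop()
    return default


if __name__ == "__main__":
    print(lookup(["en-US"], ["en-US-boont", "en"]))
    print(lookup(["fr-CA", "en-US"], ["en", "fr"]))
```

Unlike filtering, lookup returns exactly one result, so callers normally supply a default for the case where nothing in the priority list matches.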

  22. What Do I Do (Content Author)? • Not much. • Existing tags are all still valid: tagging is mostly unchanged. • Resist temptation to (ab)use the private use subtags. • Unless your language has script variations: • Tag content with the appropriate script subtag(s) • Script subtags only apply to a small number of languages: “zh”, “sr”, “uz”, “az”, “mn”, and a very small number of others.

  23. What Do I Do (Programmer)? • Check code for compliance with 3066bis • Decide on well-formed or validating • Implement suppress-script • Change to using the registry • Bother infrastructure folks (Java, MS, Mozilla, etc) to implement the standard
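As one example of what implementing suppress-script might involve, the sketch below removes a script subtag when the registry marks it as the assumed script for that language. The `SUPPRESS_SCRIPT` table is a tiny hand-copied excerpt used as an assumption for illustration; a real implementation would load these values from the registry itself, and `strip_suppressed_script` is a hypothetical name.

```python
# Hand-copied excerpt of Suppress-Script values (an assumption for this
# sketch); real code should read them from the subtag registry.
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn", "ru": "Cyrl", "ja": "Jpan"}


def strip_suppressed_script(tag: str) -> str:
    """Drop the script subtag if it is the suppressed script for the language."""
    parts = tag.split("-")
    # A script subtag, when present, is the four-letter subtag in second place.
    if len(parts) >= 2 and len(parts[1]) == 4 and parts[1].isalpha():
        suppressed = SUPPRESS_SCRIPT.get(parts[0].lower())
        if suppressed and suppressed.lower() == parts[1].lower():
            del parts[1]
    return "-".join(parts)


if __name__ == "__main__":
    print(strip_suppressed_script("en-Latn-US"))
    print(strip_suppressed_script("sr-Latn"))
```

Languages written in more than one script, such as "sr", have no entry in the table, so their script subtags are left in place.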

  24. What Do I Do (End-User)? • Check and update your language ranges. • Tag content wisely.

  25. LTRU Milestone Dates • (Done) RFC 3066bis • Registry went live in December 2005 • Produce “Matching” RFC • Draft-11 available (WG Last Call started… Monday) • (Anticipated) Produce RFC 3066ter • This includes ISO 639-3 support, extended language subtags, and possibly ISO 639-6

  26. Things to Read • Registry Draft http://www.inter-locale.com http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-12.txt • Matching Draft http://www.inter-locale.com • LTRU Mailing List https://www1.ietf.org/mailman/listinfo/ltru

  27. Things to Do (languages) • Get involved in LTRU • Get involved in W3C I18N Core WG! • Write implementations • Work on adoption of 3066bis: understand the impact • Then get involved with Locale identifiers…

  28. Back to Locales… • IUC 20 Round Table • Suzanne Topping’s Multilingual Article • Tex Texin and the Locales list…

  29. Locale Identifiers and Web Services

  30. W3C and Unicode • W3C • Identifiers and cross-over with language tags • Web services • XML, HTML • Unicode Consortium • LDML • CLDR • Standards for content

  31. “Language Tags and Locale Identifiers” SPEC • First Working Draft coming soon • URIs? • Simple tags?

  32. WS-I18N SPEC • First Working Draft now available: • http://www.w3.org/TR/ws-i18n

  33. Ideas?
