CLDR: The Common Locale Data Repository
Locales for the World

Lisa Moore, George Rhoten, Mark Davis, Steven Loomis



Agenda

  • Why CLDR?

  • CLDR data

  • Tools and vetting

  • Today and the future

LRC – XI The Localisation Factory



Locales – does anything stay the same?

"Theatre Center News: The date of the last version of this document was 2003年3月20日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."


Locales – the many differences

  • Locales specify user preferences

  • Linguistic and cultural differences

    • Languages, scripts, writing systems, ordering, directionality, formatting, numbers, sizes

  • Even within the same locale, data can differ across platforms, causing interoperability issues

  • Global economics has increased the need for greater globalization support in computer systems

  • Everyone expects more!


Add the Universal Character Encoding

  • Unicode: a unique code for every character, covering the writing systems of all languages


The Need for Common Locale Data

  • Computing environments often contain a variety of operating systems and software.

  • Historically, research into locale-sensitive data has been done separately by individuals and companies.

  • Because of political changes, it is easy for locale data to become out of date.

  • It is difficult to get complete agreement on correctness.


Common Locale Data Project

  • Began as the Common XML Locale Repository (CXLR), developed by OpenI18N in 2003

  • CLDR project began in 2004

  • Hosted by Unicode Consortium

    • http://www.unicode.org/cldr/

  • Goals:

    • Common, necessary software locale data for all the world's languages

    • Collect and maintain locale data

    • XML format for effective interchange

    • Freely available


CLDR in use (partial list)

  • Libraries and Environments

    • ICU – International Components for Unicode

    • JDK – Java Development Kit

  • Operating Systems

    • Solaris

    • AIX

    • MacOS X

  • Applications

    • OpenOffice.org

    • Acrobat

    • ModernBill


Agenda

  • Why CLDR?

  • CLDR data

  • Tools and vetting

  • Today and the future


What is a Locale?

  • A locale is an identifier referring to linguistic and cultural preferences

    • en_US, en_GB, ja_JP

  • These preferences can change over time due to cultural and political reasons

    • Introduction of new currencies, like the Euro

    • Changes to standard Spanish sorting (for example, "ch" and "ll" are no longer sorted as separate letters)

  • Many of these preferences have varying degrees of standardization

    • Both 12-hour and 24-hour time formats are used in the United States

  • This is a very broad topic
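Identifiers such as en_US have a simple shape: a language code, optionally a script code, optionally a territory code. The sketch below splits such an ID into those parts. It assumes only the underscore form used in these slides, with any leftover subtags treated as variants. `LocaleId` and `parse_locale_id` are hypothetical names. The LDML specification defines the full identifier syntax, which this sketch does not implement.

```python
import re
from typing import NamedTuple, Optional, Tuple


class LocaleId(NamedTuple):
    language: str
    script: Optional[str]
    territory: Optional[str]
    variants: Tuple[str, ...]


def parse_locale_id(locale_id: str) -> LocaleId:
    """Split an ID such as 'en_US' or 'sr_Latn_RS' into its subtags."""
    # Accept '-' as well as the '_' separator used in these slides.
    parts = locale_id.replace("-", "_").split("_")
    language, rest = parts[0].lower(), parts[1:]
    script: Optional[str] = None
    territory: Optional[str] = None
    # A four-letter subtag right after the language is a script code.
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        script = rest.pop(0).title()
    # A two-letter or three-digit subtag is a territory code.
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        territory = rest.pop(0).upper()
    # Anything left over is treated as a variant.
    return LocaleId(language, script, territory, tuple(v.upper() for v in rest))


for example in ("en_US", "en_GB", "ja_JP"):
    print(parse_locale_id(example))
```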


Types of Locale Data

  • Dates/time/calendar formats

  • Number/currency formats

  • Measurement system

  • Collation specification

    • Sorting

    • Searching

    • Matching

  • Translated names for languages, territories, scripts, time zones, currencies, …

  • Script and characters used by a language


Locale Data Markup Language

  • Locale data is described using XML

  • CLDR data uses the Locale Data Markup Language (LDML)

  • The structure of CLDR data is defined by the LDML specification: http://unicode.org/reports/tr35


LDML Data Categories

<ldml>

<identity>

<localeDisplayNames>

<layout>

<characters>

<delimiters>

<measurement>

<dates>

<numbers>

<posix>

<collations>


Names

<localeDisplayNames>

  • Provides translated display names for languages, territories, scripts, variants and keywords used in CLDR.

  • Most of this information is stored at the language level, since it typically varies by language but not by territory.

  • An example: ICU Locale Explorer
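Because names are mostly stored at the language level, a request for ga_IE data can fall back to ga and finally to root. The sketch below models that lookup with a hypothetical in-memory table (`LOCALE_DATA`). The table holds two entries from the ga.xml fragment on the next slide. This is a simplified illustration of locale inheritance, not CLDR's full resolution algorithm.

```python
from typing import Dict, Iterator, Optional

# Hypothetical in-memory stand-in for per-locale XML files. The "ga" entries
# come from the ga.xml fragment; "ga_IE" deliberately adds no names.
LOCALE_DATA: Dict[str, Dict[str, str]] = {
    "root": {},
    "ga": {"AD": "Andóra", "AE": "Aontas na nÉimíríochtaí Arabacha"},
    "ga_IE": {},
}


def fallback_chain(locale_id: str) -> Iterator[str]:
    """Yield the locale, then each shorter parent, then root."""
    parts = locale_id.split("_")
    while parts:
        yield "_".join(parts)
        parts.pop()  # drop the last subtag: ga_IE -> ga
    yield "root"


def territory_name(locale_id: str, code: str) -> Optional[str]:
    """Return the first translation found along the fallback chain."""
    for candidate in fallback_chain(locale_id):
        names = LOCALE_DATA.get(candidate, {})
        if code in names:
            return names[code]
    return None


print(list(fallback_chain("ga_IE")))  # prints ['ga_IE', 'ga', 'root']
print(territory_name("ga_IE", "AD"))  # prints Andóra
```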


Names Examples

From ga.xml (Irish):

<localeDisplayNames>

<languages>

<language type="aa">Afar</language>

<language type="ab">Abcáisis</language>…

<scripts>

<script type="Arab">Araibis</script>…

<territories>

<territory type="AD">Andóra </territory>

<territory type="AE">Aontas na nÉimíríochtaí Arabacha

</territory>…
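Because LDML is plain XML, any XML parser can read a fragment like the one above. The sketch below uses Python's standard `xml.etree.ElementTree` on a well-formed copy of the visible entries, leaving out the slide's elisions, and builds a code-to-name dictionary. `display_names` is a hypothetical helper, not part of any CLDR tool.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the visible ga.xml entries (the slide elides the rest).
GA_NAMES = """
<ldml>
  <localeDisplayNames>
    <languages>
      <language type="aa">Afar</language>
      <language type="ab">Abcáisis</language>
    </languages>
    <scripts>
      <script type="Arab">Araibis</script>
    </scripts>
    <territories>
      <territory type="AD">Andóra</territory>
      <territory type="AE">Aontas na nÉimíríochtaí Arabacha</territory>
    </territories>
  </localeDisplayNames>
</ldml>
"""


def display_names(xml_text: str, element: str) -> dict:
    """Map each type code to its translated name for one element kind."""
    root = ET.fromstring(xml_text)
    return {
        node.get("type"): (node.text or "").strip()
        for node in root.iter(element)
    }


territories = display_names(GA_NAMES, "territory")
print(territories["AE"])  # prints Aontas na nÉimíríochtaí Arabacha
```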


Characters

<characters>

  • Defines exemplar character sets. An exemplar set specifies the characters that must be available to render the language properly.

  • The auxiliary exemplar set defines additional characters that may appear in foreign words or phrases.

  • Sets contain lowercase characters only


Date Formats

<dates>

  • Defines representation of calendars using various calendaring systems (Gregorian, Buddhist, Islamic, Japanese, etc.)

  • Defines formatting for dates, times, eras and time zones

    • Names in wide, abbreviated, or narrow forms

    • Date and time formats use patterns of letters to define proper formatting

  • Week information

  • Relative day/time translations (for example, yesterday and tomorrow)

  • An example: ICU Locale Explorer
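A date pattern is a string of pattern letters: y for year, M for month, d for day. Other characters are copied literally, and text in single quotes is also literal. The sketch below interprets a small subset of these letters. It also separates M (format context) from L (stand-alone context), the distinction LDML uses for grammatical forms such as the Russian month names on the "Causes of Conflicting Data" slide.

The month tables are hypothetical and contain only March. Widths 3 and 4 both yield the full name here. Hours, eras and time zones are not handled.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical month-name tables: language -> context -> month number.
# Only March is filled in; the Russian forms are the ones quoted later in the talk.
MONTHS: Dict[str, Dict[str, Dict[int, str]]] = {
    "en": {"format": {3: "March"}, "stand-alone": {3: "March"}},
    "ru": {"format": {3: "марта"}, "stand-alone": {3: "март"}},
}


def tokenize(pattern: str) -> List[Tuple[str, str]]:
    """Split a pattern into ("field", "MMMM") and ("literal", text) tokens."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "'":
            if pattern.startswith("''", i):
                # Two quotes in a row stand for one literal quote.
                tokens.append(("literal", "'"))
                i += 2
                continue
            # Copy quoted text verbatim; doubled quotes inside it are not handled.
            end = pattern.find("'", i + 1)
            end = len(pattern) if end == -1 else end
            tokens.append(("literal", pattern[i + 1:end]))
            i = end + 1
        elif ch.isascii() and ch.isalpha():
            # A run of the same ASCII letter is one field; its length is the width.
            j = i
            while j < len(pattern) and pattern[j] == ch:
                j += 1
            tokens.append(("field", pattern[i:j]))
            i = j
        else:
            tokens.append(("literal", ch))
            i += 1
    return tokens


def format_date(d: date, pattern: str, lang: str) -> str:
    out: List[str] = []
    for kind, text in tokenize(pattern):
        if kind == "literal":
            out.append(text)
            continue
        letter, width = text[0], len(text)
        if letter == "y":
            # "yy" is a two-digit year; other widths zero-pad the full year.
            out.append(f"{d.year % 100:02d}" if width == 2 else f"{d.year:0{width}d}")
        elif letter in "ML":
            if width <= 2:
                out.append(f"{d.month:0{width}d}")
            else:
                # M uses the format-context name, L the stand-alone name.
                context = "format" if letter == "M" else "stand-alone"
                out.append(MONTHS[lang][context][d.month])
        elif letter == "d":
            out.append(f"{d.day:0{width}d}")
        else:
            raise ValueError(f"field {text!r} is not supported in this sketch")
    return "".join(out)


print(format_date(date(2003, 3, 20), "y年M月d日", "ja"))  # prints 2003年3月20日
print(format_date(date(2004, 3, 3), "d MMMM", "ru"))     # prints 3 марта
print(format_date(date(2004, 3, 3), "LLLL y", "ru"))     # prints март 2004
```

The first call reproduces the Japanese-style date from the opening slide. It needs no month table because M at width 1 is numeric.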


Characters / Dates Examples

From ga.xml (Irish):

<characters>

<exemplarCharacters> [a á b-e é f-i í j-o ó p-u ú v-z]

</exemplarCharacters>

<exemplarCharacters type="auxiliary"> [ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ]</exemplarCharacters>

</characters>…

<dayContext type="format">

<dayWidth type="abbreviated">

<day type="sun">Domh</day>

<day type="mon">Luan</day>…
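Exemplar sets use a bracketed notation with x-y ranges. The sketch below expands the two Irish sets from the fragment above into Python sets. It then reports which letters of a word fall outside the main set. Because the sets are lowercase-only, the word is lowercased before the check.

The sketch handles only space-separated characters and simple ranges, not the full UnicodeSet syntax that exemplar sets actually use. `parse_exemplars` and `missing_characters` are hypothetical helpers.

```python
import unicodedata


def parse_exemplars(spec: str) -> set:
    """Expand a simple bracketed set such as '[a á b-e]' into single characters."""
    body = spec.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed set")
    chars = set()
    for token in unicodedata.normalize("NFC", body[1:-1]).split():
        if len(token) == 3 and token[1] == "-":
            # An x-y range covers every code point from x to y inclusive.
            chars.update(chr(c) for c in range(ord(token[0]), ord(token[2]) + 1))
        else:
            chars.update(token)
    return chars


def missing_characters(word: str, exemplars: set) -> set:
    """Letters of word (lowercased, NFC) that the exemplar set does not contain."""
    letters = {ch for ch in unicodedata.normalize("NFC", word.lower()) if ch.isalpha()}
    return letters - exemplars


GA_MAIN = parse_exemplars("[a á b-e é f-i í j-o ó p-u ú v-z]")
GA_AUX = parse_exemplars("[ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ]")

print(len(GA_MAIN))                               # prints 31
print(missing_characters("Éire", GA_MAIN))        # prints set()
print(missing_characters("ċ", GA_MAIN) <= GA_AUX)  # prints True
```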


Time Zone Names

<timeZoneNames>

  • Based on the Olson time zone database

  • Localized display names for standard, daylight, and generic representations of time zones.

  • Short and long display names.


Numbers

<numbers>

  • Specifies proper localized formatting of numeric quantities

    • Decimal

    • Scientific

    • Currency

    • Percentages

  • Includes localized decimal and thousands separators, currency symbols, and more
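The separators are what turn one quantity into "1,234.57" in one locale and "1.234,57" in another, as in the opening slide. The sketch below assumes a fixed group size of three and two hard-coded symbol tables (`SYMBOLS`). Their values are illustrative rather than read from CLDR.

Real CLDR number patterns can specify other grouping (locales for India, for example, group by two after the first three digits) as well as currency placement. This sketch ignores both.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from typing import NamedTuple, Union


class NumberSymbols(NamedTuple):
    decimal: str
    group: str


# Illustrative symbol tables, not values read from CLDR.
SYMBOLS = {
    "en": NumberSymbols(decimal=".", group=","),
    "de": NumberSymbols(decimal=",", group="."),
}


def format_decimal(value: Union[int, float, str, Decimal], lang: str,
                   fraction_digits: int = 2) -> str:
    """Round to a fixed number of fraction digits and group by threes."""
    sym = SYMBOLS[lang]
    quantum = Decimal(1).scaleb(-fraction_digits)  # 2 fraction digits -> 0.01
    amount = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)
    sign = "-" if amount < 0 else ""
    integer, _, fraction = f"{abs(amount):f}".partition(".")
    groups = []
    while len(integer) > 3:
        groups.insert(0, integer[-3:])
        integer = integer[:-3]
    groups.insert(0, integer)
    result = sym.group.join(groups)
    if fraction:
        result += sym.decimal + fraction
    return sign + result


print(format_decimal(1234.567, "en"))  # prints 1,234.57
print(format_decimal(1234.567, "de"))  # prints 1.234,57
```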


Time Zones / Currencies

From ga.xml (Irish) and root.xml:

<timeZoneNames>

<zone type="Europe/Dublin">

<long>

<standard>Meán-Am Greenwich</standard>

<daylight>Am Samhraidh na hÉireann</daylight>

</long>…

<numbers>

<currencies>

<currency type="EUR">

<displayName>Euro</displayName>

<symbol>€</symbol>…


Delimiters

<delimiters>

  • Specifies primary and secondary (alternate) delimiter characters used to bracket quotations in text


Delimiters Example

From fr.xml (French):

<delimiters>

<quotationStart>«</quotationStart>

<quotationEnd>»</quotationEnd>

<alternateQuotationStart>“</alternateQuotationStart>

<alternateQuotationEnd>”</alternateQuotationEnd>

</delimiters>
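The primary pair brackets a quotation, and the alternate pair brackets a quotation nested inside it. The sketch below applies the fr.xml values above. The `Delimiters` type and `quote` function are hypothetical. Alternating back to the primary pair at the third level is this sketch's own assumption; the fragment does not specify it.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    quotation_start: str
    quotation_end: str
    alternate_start: str
    alternate_end: str


# Values copied from the fr.xml fragment above.
FR = Delimiters("«", "»", "“", "”")


def quote(text: str, d: Delimiters, depth: int = 0) -> str:
    """Use the primary pair at even nesting depths and the alternate pair at odd ones."""
    if depth % 2 == 0:
        return f"{d.quotation_start}{text}{d.quotation_end}"
    return f"{d.alternate_start}{text}{d.alternate_end}"


inner = quote("Bonjour", FR, depth=1)
print(quote(f"Il a dit {inner}.", FR))  # prints «Il a dit “Bonjour”.»
```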


Collation

<collations>

  • Stored in the collation directory, not in main

  • An XML version of the Java/ICU collation rule syntax

  • Based on the Unicode Collation Algorithm (UCA): http://unicode.org/reports/tr10

  • Allows tailoring of the UCA on a per locale basis.


Collation Example

From collations/root.xml:

<collations validSubLocales="ga ga_IE id id_ID ms ms_BN ms_MY nl nl_BE nl_NL pt pt_BR pt_PT">

<collation type="standard">

<rules>

...

<s>ā</s>

<t>Ā</t>

<s>á</s>

<t>Á</t>

<s>ǎ</s>

<t>Ǎ</t>

<s>à</s>

<t>À</t>…
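In these rules, <s> marks a secondary (accent) difference and <t> a tertiary (case) difference relative to the preceding item. The toy below compares strings level by level: all base letters first, then accents, then case.

It makes two assumptions: the elided reset before the rules is plain "a", and every untailored character takes its lowercase code point as its primary weight. It is not the UCA itself, which takes its default weights from the DUCET table.

```python
import unicodedata
from typing import Dict, Tuple

Weights = Tuple[int, int, int]  # (primary, secondary, tertiary)

# Hypothetical tailoring mirroring the <s>/<t> rules above, assuming the
# elided reset is the plain letter "a".
TAILORED: Dict[str, Weights] = {}
PAIRS = [("a", "A"), ("ā", "Ā"), ("á", "Á"), ("ǎ", "Ǎ"), ("à", "À")]
for rank, (lower, upper) in enumerate(PAIRS):
    TAILORED[lower] = (ord("a"), rank, 0)  # <s>: same base letter, next accent weight
    TAILORED[upper] = (ord("a"), rank, 1)  # <t>: same accent, uppercase sorts after


def weights(ch: str) -> Weights:
    if ch in TAILORED:
        return TAILORED[ch]
    # Untailored characters: primary from the lowercase code point, case as tertiary.
    return (ord(ch.lower()[0]), 0, 1 if ch.isupper() else 0)


def sort_key(text: str) -> Tuple[Tuple[int, ...], Tuple[int, ...], Tuple[int, ...]]:
    ws = [weights(ch) for ch in unicodedata.normalize("NFC", text)]
    # Compare every primary weight first, then secondaries, then tertiaries.
    return (
        tuple(w[0] for w in ws),
        tuple(w[1] for w in ws),
        tuple(w[2] for w in ws),
    )


words = ["Àb", "ac", "ab", "áb", "Ab", "āb"]
print(sorted(words))                # prints ['Ab', 'ab', 'ac', 'Àb', 'áb', 'āb']
print(sorted(words, key=sort_key))  # prints ['ab', 'Ab', 'āb', 'áb', 'Àb', 'ac']
```

Code-point order scatters the accented forms after "ac". The level-by-level key keeps them with "ab", because the difference in the second letter is primary and outranks any accent or case difference.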


Agenda

  • Why CLDR?

  • CLDR data

  • Tools and vetting

  • Today and the future


CLDR Tools

  • Export

    • ICU resource bundle generation

    • POSIX locale generator

    • OpenOffice.org format export

  • Survey tool

    • http://www.unicode.org/cgi-bin/cldr-survey
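An export step such as ICU resource bundle generation rewrites LDML data in another format. As a rough sketch only, the function below writes nested Python dicts in the brace-and-quote layout of ICU's text resource bundles. ICU's locale data keeps territory names in a Countries table. The function name and input shape are hypothetical, and this is not the converter CLDR actually ships.

```python
def to_icu_text(locale_id: str, tables: dict, indent: str = "    ") -> str:
    """Render nested dicts of strings in an ICU-style .txt resource-bundle layout."""
    lines = [f"{locale_id}{{"]

    def emit(table: dict, depth: int) -> None:
        pad = indent * depth
        for key, value in table.items():
            if isinstance(value, dict):
                # Nested dicts become nested tables: key{ ... }
                lines.append(f"{pad}{key}{{")
                emit(value, depth + 1)
                lines.append(f"{pad}}}")
            else:
                # Strings become quoted values, escaping backslashes and quotes.
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                lines.append(f'{pad}{key}{{"{escaped}"}}')

    emit(tables, 1)
    lines.append("}")
    return "\n".join(lines)


print(to_icu_text("ga", {"Countries": {"AD": "Andóra"}}))
```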


Vetting Process for Data

  • Collect new or revised data from different platforms, experts, and submissions

    • References to external sources are strongly encouraged

    • Submissions must arrive before the release's freeze date

    • The Survey Tool is used to collect data


Causes of Conflicting Data

  • Typographical errors

    • Canda instead of Canada

  • Regional differences

    • German spelling differs between countries

  • Parts of speech

    • “март 2004” versus “3 марта”: the Russian word for March takes a different grammatical form inside a full date

  • Context of usage

    • Normal German sorting versus German phonebook sorting

  • Standards versus common use

    • “Republic of Laos” versus “Laos”

  • Individual preferences

    • 24-hour versus 12-hour time format


Agenda

  • Why CLDR?

  • CLDR data

  • Tools and vetting

  • Today and the future


Latest Release: CLDR 1.4

  • Released: July 17, 2006

  • 360 locales:

    • 121 languages

    • 142 territories

  • 25% more data

  • 17,000 new or modified data items

  • Over 100 different contributors


Challenges

  • Complex Formats

  • Finding experts knowledgeable in both technology and a specific language, for areas such as:

    • Collation

    • Exemplar characters

    • Etc…

  • Requires close interaction between CLDR experts and language experts


Getting Involved

  • Simplest – anyone!

    • Use CLDR

    • Bug report / feature request

  • More Involved

    • Vetting, Assessment, Tools, Policies, Decisions, …

    • Any Unicode member, including country liaison members, is eligible to name representatives


Example Country Process (Finland)

  • The Finnish Ministry of Education made CLDR data a major goal for 2004-06

    • The Research Institute for the Languages of Finland (“RILF”, also known as “Kotus”) was the designated agency

    • Coverage: two official languages (Finnish and Swedish) and four regional or minority languages (three Sámi languages and Romani as spoken in Finland)

    • Over 30 different parties represented: commercial, non-commercial, individuals

    • Results expected to lead to new/revised national standards


For More Information

  • Unicode

    • http://www.unicode.org/

  • CLDR

    • http://www.unicode.org/cldr/

  • LDML specification

    • http://unicode.org/reports/tr35

