
Presented By: Prof. Manikrao L. Dhore Mr. Abhishek K. Dhote Department of Computer Engineering

LRC-XI: 11th Annual Internationalisation and Localisation Conference. A Paper on Automating the HTML Localisation Process: An Implementation Using a Java Internationalisation Approach. Presented By: Prof. Manikrao L. Dhore, Mr. Abhishek K. Dhote, Department of Computer Engineering


Presentation Transcript


  1. LRC-XI: 11th Annual Internationalisation and Localisation Conference. A Paper on Automating the HTML Localisation Process: An Implementation Using a Java Internationalisation Approach. Presented By: Prof. Manikrao L. Dhore, Mr. Abhishek K. Dhote, Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India. Organised By: Localisation Research Centre (LRC), Department of Computer Science and Information Systems (CSIS), University of Limerick, Limerick, Ireland.

  2. Agenda • Introduction • Why Web Page Localisation? • Borderless Integration • Why Multilingual Web Sites? • What is Locale and multi-locale Operation? • Internationalisation and Key Challenges • I18n Standard: Important Issues and Business Context • Variance : Regional and Cultural Issues • System Design • Web Localisation and Rural India • Localization Approaches • Architecture of Servers • System Implementation and Test Results • Configuration of Server • Localisation Test Results • Alternative Approach • Conclusion • References

  3. Why Web Page Localisation? • Increased sales leads • Advantage of global growth • Reduced marketing costs • [Diagram labels: Service Sector; Online Business; Banking Sector; International Market and Customers; Web Localisation; Internet; Information Repository; Closed Linguistic Barriers; Open Linguistic Barriers; Objective Information Convenience]

  4. Borderless Integration Model • [Diagram labels, in extraction order: Business Process; Local Business Entities; Customer Integration Logic; Resource Mapping; Global; Global Integration; Deployment; Business Logic; Market Research; Analyse; Optimize Process; Internet Framework]

  5. Why Multilingual Websites? • Over 100 million people access the Internet in a language other than English. • Over 50% of web users speak a native language other than English. • According to Forrester Research, 50% of all online sales are expected to occur outside the USA. • Web users are four times more likely to purchase from a site that communicates in the customer’s native language. “Your website is your window to the world…”

  6. Basic Terminology • Locale • Set of features that can be varied depending on the language and culture of the user or the data • Internationalisation • The process of designing software so that it can be easily adapted to different locales • Localisation • The process of adapting software to a locale

  7. What is Locale? • A locale is an abstraction: a data processing structure that identifies a collection of culturally and linguistically affected preferences. • Java locales are associated with upwards of 300 pieces of data, for example: time zone names, collation sequences, the infinity symbol, number formats, days of the week. • Locales generally do not contain this data themselves. They represent a way of obtaining “localized behavior” in the system. • Locales are generally part of the programming context or environment.
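As a minimal sketch of the point that a locale is an identifier rather than a container of data, the following uses the standard `java.util.Locale` class. The class name `LocaleDemo`, the helper `describe`, and the sample language tags are illustrative assumptions, not taken from the presentation.

```java
import java.util.Locale;

final class LocaleDemo {
    // Builds a locale from a BCP 47 language tag and summarises its identifying parts.
    // The Locale object carries only identifiers; display names are looked up on demand.
    static String describe(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return locale.getLanguage() + "/" + locale.getCountry()
                + " (" + locale.getDisplayName(Locale.ENGLISH) + ")";
    }

    public static void main(String[] args) {
        // Sample tags chosen for illustration only.
        for (String tag : new String[] {"en-US", "fr-FR", "mr-IN"}) {
            System.out.println(tag + " -> " + describe(tag));
        }
    }
}
```

The locale-sensitive classes (formatters, collators, resource bundles) take such a `Locale` as a parameter and resolve the actual data themselves.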

  8. Multi-Locale Operation • APIs provide late-binding localisation • [Diagram labels: Client Locale (x2); Message Passing (x2); Logic Execution (x2); Server Processes; System Context; Context Separation; Design Policy]

  9. Internationalisation • "I18n" is an abbreviation for the word "Internationalisation". The term is derived from the spelling of the word: the letter "i", plus 18 letters, plus the letter "n". I+n1t2e3r4n5a6t7i8o9n10a11l12i13s14a15t16i17o18+n • The extension of this naming convention to the terms Localisation (l10n), Europeanisation (e13n), Japanisation (j10n) and Globalisation (g11n) seems to have come somewhat after the invention of "i18n". • An internationalised system can potentially handle the world's many languages and customs: • Displaying and inputting characters of the users' native languages. • Handling popular encodings for the users' native languages. • Native characters in file names and other items. • Character classification and sorting. • Typesetting and hyphenation rules.
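The naming convention described on this slide can be checked with a few lines of Java. This is only an illustrative sketch; the class and method names are hypothetical.

```java
final class Numeronym {
    // Abbreviates a word as: first letter + number of letters in between + last letter.
    static String abbreviate(String word) {
        if (word.length() < 3) {
            return word; // nothing to count between the first and last letter
        }
        int between = word.length() - 2;
        return word.charAt(0) + String.valueOf(between) + word.charAt(word.length() - 1);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"internationalisation", "localisation",
                                      "europeanisation", "japanisation", "globalisation"}) {
            System.out.println(w + " -> " + abbreviate(w));
        }
    }
}
```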

  10. Key Challenges • Standards • Encoding and Character Set: Unicode support and implementation; use of language-specific encoding; configuring encoding • Locale and Parameterisation: availability, performance; continuity of i18n features; translation • Data Correspondence: UI design; handling collation; migration of existing data • Presentation, Processing • Reference Information

  11. Important Issues in I18n • Character encodings • Date/Time • Culture context • Language rules • UI preferences • Currency • Content management • Localization • Business impact

  12. Business Context of I18n • To improve the effectiveness of globally distributed business users by providing language/culture-specific application/product/service interfaces. • To reach out to a global customer base by providing language/culture-specific interfaces and allowing for international preferences. • To consolidate applications/services with the same functionality that were developed and maintained separately for separate languages/regions. • To support region-specific functionality (due to legal aspects, financial practice etc.). • To provide region-specific value-added services (like UI, look and feel, sorting/searching). • [Diagram labels: Internationalisation; New Application; New Service; New Product; Mergers / Acquisitions; Old Product; Old Application; Existing Service]

  13. Regional and Cultural Differences • Software solutions should be designed to fit into the cultural context of the user • Examples • Naming of the product • Differences in the meaning of jargon • Confusing graphical symbols • National rules, conventions • Religious beliefs and assumptions • Basic cultural values and customs • No appropriate translations available for phrases and slogans • Favorite sports and slang • Cultural anachronisms • Reading direction: left-to-right, top-to-bottom etc…

  14. Language and Character Encoding • Language peculiarities: hyphenation, collation, spelling, transliteration • Alphabet order differs by language: English: ABC...RSTUVWXYZ; German: AÄB...NOÖ...SßTUÜV…YZ; Swedish/Finnish: AB...STUVWXYZÅÄÖ; Norwegian: AB…VWXYÜZÆØÅ • There are various encoding “standards”, and they vary by language • ISO and Windows encodings: ISO-8859-1 to ISO-8859-7, Windows-1252 • Chinese encodings: Big5, Big5-HKSCS, GB18030, GB2312 • Japanese and Korean: EUC-JP, EUC-KR, ISO-2022-JP, ISO-2022-KR
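The collation differences listed above are what `java.text.Collator` is meant to handle. The sketch below sorts the same words under different locales; the class name, helper, and word list are illustrative assumptions. The exact ordering depends on the collation data shipped with the Java runtime, so no output is claimed here.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LocaleSort {
    // Returns a sorted copy of the words using the collation rules of the given locale.
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale)); // Collator is a Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Zebra", "Äpple", "apple", "Öl");
        System.out.println("German : " + sorted(words, Locale.GERMAN));
        System.out.println("Swedish: " + sorted(words, Locale.forLanguageTag("sv")));
    }
}
```

Unlike a plain `String.compareTo` sort, which compares UTF-16 code units and therefore puts every upper-case letter before every lower-case one, a collator compares letters first and case later.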

  15. Unicode Character Standard • Developed by the Unicode Consortium • Covers all major living scripts • Version 4.0 has 96,000+ characters • Capacity for 1 million+ characters • Unicode Character Set = ISO 10646 • Unicode adds character properties and algorithms • ISO and Unicode work together to synchronize • ISO support enhances international acceptance
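The "capacity for 1 million+ characters" corresponds to the Unicode code space U+0000 to U+10FFFF. Because Java strings are sequences of UTF-16 code units, characters above U+FFFF occupy two `char` values. The following sketch (hypothetical class name, sample character chosen for illustration) shows the difference between counting `char`s and counting code points.

```java
final class CodePoints {
    // Counts Unicode code points rather than UTF-16 code units.
    static int count(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1D11E (musical G clef), written as a surrogate pair.
        String text = "a\uD834\uDD1E";
        System.out.println("UTF-16 code units: " + text.length());
        System.out.println("Code points      : " + count(text));
        System.out.println("Size of code space: " + (Character.MAX_CODE_POINT + 1));
    }
}
```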

  16. Date / Time Formats Variance • Hour/minute separators, AM/PM, time zone • India : 4:00 P.M. • U.S.A. : 4:00 p.m. • France : 16.00 • Japan : 1600 • Japan : 4:00
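A minimal sketch of how such variance can be delegated to the platform, using `java.time` formatters. The class and method names are hypothetical. The strings produced depend on the locale data bundled with the Java runtime and may differ from the samples on the slide, so no output is asserted here.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

final class LocalTimeDemo {
    // Formats a time of day in the short style conventional for the given locale.
    static String shortTime(LocalTime time, Locale locale) {
        return DateTimeFormatter.ofLocalizedTime(FormatStyle.SHORT)
                .withLocale(locale)
                .format(time);
    }

    public static void main(String[] args) {
        LocalTime fourPm = LocalTime.of(16, 0);
        Locale[] locales = {
            Locale.forLanguageTag("en-IN"), Locale.US, Locale.FRANCE, Locale.JAPAN
        };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": " + shortTime(fourPm, locale));
        }
    }
}
```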

  17. Numbers / Currency Variance • Varieties in group and fractional separators • India : 12,34,567.89 • England : 12,345.67 • Germany : 12.345,67 • Switzerland: 12’345,67 • Swiss money: 12’345.67 • France : 12 345,67 • Varieties in symbol placement, symbol length, precision, number width, rounding rules • India : Rs. 12,34,567.89 ; Re. 1 • U.S.A : US $1,234,567.89 • France : 12.345,67 € • Portuguese : 12$34ESC • Portuguese : 12$34€
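The separator and symbol-placement differences above are handled in Java by `java.text.NumberFormat`. The sketch below is illustrative only (hypothetical class and method names). Results for some locales, such as the Indian grouping pattern, depend on the locale data of the Java runtime in use.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class NumberDemo {
    // Formats a plain number with the locale's grouping and decimal separators.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Formats an amount with the locale's currency symbol, placement and precision.
    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.forLanguageTag("en-IN"), Locale.UK, Locale.GERMANY,
            Locale.forLanguageTag("de-CH"), Locale.FRANCE, Locale.US
        };
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": "
                    + number(1234567.89, locale) + " | " + currency(1234567.89, locale));
        }
    }
}
```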

  18. System Design

  19. Indian Languages Profile

  20. Indian Languages Usage Index (Data Source: 2001 Census of India)

      Language           Number         Percentage
      Hindi              337,272,114    40.22%
      Bengali             69,595,738     8.30%
      Telugu              66,017,615     7.87%
      Marathi             62,481,681     7.45%
      Tamil               53,006,368     6.32%
      Urdu                43,406,932     5.18%
      Gujarati            40,673,814     4.85%
      Kannada             32,753,676     3.91%
      Malayalam           30,377,176     3.62%
      Oriya               28,061,313     3.35%
      Punjabi             23,378,744     2.79%
      Assamese            13,079,696     1.56%
      Sindhi               2,122,848     0.25%
      Nepali               2,076,645     0.25%
      Konkani              1,760,607     0.21%
      Manipuri             1,270,216     0.15%
      Kashmiri                56,693     0.01%
      Sanskrit                49,736     0.01%
      Other Languages     31,142,376     3.71%
      Total              838,583,988   100.00%

  21. Indian Currency Example • Indian Currency (Value Rs. 10) • Language Panel: 15 major Indian Languages • Population residing in villages of India: 70% • Total number of Languages in India: 40 • Official Languages: 22 • Overall Literacy Rate: 64.20% • English Language Literacy: 17.75%

  22. Information Channelisation • Internationalisation: prepare material for localisation (account for text expansion, avoid embedded text, etc.) • Text Extraction: extract text from source files (graphics, PDFs etc.) • Translation: translate content from the extracted materials • Localisation: replace graphics, change colors, redesign layout to accommodate the target culture

  23. Localisation Process • Site Acceptance Factors: Color, Image, Representation • Translation Errors • Text Placement in Separate File • Web page is “dynamically” converted into target language • Language selection • Static web page is selected and displayed • Mapping Techniques • Late Binding Localisation • Translation

  24. Server Architecture • [Diagram labels: Client Browser_1; Client Browser_2; Client Browser_3; Client Browser_n; Socket API; Parse Request Module; HTML Server; Response; Localised Content; Default; Alternative Language; Property File]

  25. Implementation: Parse Request Module • Definition • To parse the request header • Responsibilities • To parse the request header • To analyze and forward the request • Provide log to the administrator • Compositions • Main server loop • Threads • Interfaces/Ports • Socket APIs
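One way the "parse the request header, analyze and forward" responsibility could look is sketched below. It assumes the client's preferred language arrives in the standard HTTP `Accept-Language` header; the slides do not say which header the authors' module reads, so this is an assumption. The class name, the list of supported locales, and the default locale are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LanguageNegotiation {
    // Hypothetical set of locales for which translations exist on the server.
    static final List<Locale> SUPPORTED =
            Arrays.asList(Locale.ENGLISH, Locale.FRENCH, Locale.forLanguageTag("mr"));
    static final Locale DEFAULT = Locale.ENGLISH;

    // Picks the best supported locale for an Accept-Language header value,
    // falling back to DEFAULT when the header is absent, malformed, or unmatched.
    static Locale choose(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return DEFAULT;
        }
        try {
            // parse() orders the ranges by their q-weights, highest first.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            Locale match = Locale.lookup(ranges, SUPPORTED);
            return match != null ? match : DEFAULT;
        } catch (IllegalArgumentException malformedHeader) {
            return DEFAULT;
        }
    }

    public static void main(String[] args) {
        System.out.println(choose("fr-CH, fr;q=0.9, en;q=0.8"));
        System.out.println(choose("ja"));
    }
}
```

The chosen locale would then be forwarded with the request so that the HTML server can select the matching property file.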

  26. Parse Request Module Architecture • [Diagram labels: Main Server Loop; Thread 1; Thread 2; Thread 3; Thread 4; Thread 5; Thread n]

  27. HTML Server • Definition • Default implementation of HTTP protocol • Processes static HTML requests • Responsibilities • Process static HTML request • Process dynamic Internationalisation request • Compositions • Server Processes • Interfaces/Ports • Socket APIs

  28. HTML Server Architecture • [Diagram labels: Parse Protocol GET/POST; GET Request Processor; POST Request Processor; Static Response (x2); Default Language (x2); Alternative Language (x2); .properties]
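The slides do not show how the server merges a `.properties` file into the HTML response. Purely as an illustration of the idea of keeping text in a separate file, the sketch below assumes a hypothetical `${key}` placeholder syntax in the HTML template and substitutes values from a property file for the selected language. All names here are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TemplateLocaliser {
    // Hypothetical placeholder syntax: ${some.key}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    // Parses property-file text (key=value lines) into a Properties object.
    static Properties parse(String propertyText) {
        Properties messages = new Properties();
        try {
            messages.load(new StringReader(propertyText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return messages;
    }

    // Replaces each placeholder with its translation; unknown keys are left untouched.
    static String localise(String template, Properties messages) {
        return PLACEHOLDER.matcher(template).replaceAll(
                m -> Matcher.quoteReplacement(messages.getProperty(m.group(1), m.group())));
    }

    public static void main(String[] args) {
        String template = "<h1>${title}</h1><p>${body}</p>";
        Properties spanish = parse("title=Bienvenido\nbody=Gracias por su visita\n");
        System.out.println(localise(template, spanish));
    }
}
```

Leaving unknown placeholders in place, rather than emitting an empty string, makes missing translations visible during testing.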

  29. System Implementation and Test Results

  30. Java Support for Internationalisation • The Locale class lets applications identify locales, allowing for truly multilingual applications. • The ResourceBundle class provides the foundation for localisation, including localisation for multiple locales in a single application container. • The Date, Calendar, and TimeZone classes provide the basis for time handling around the globe. • The String and Character classes as well as the java.text package contain rich functionality for text processing, formatting, and parsing. • Text stream input and output classes support converting text between Unicode and other character encodings.
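The sketch below illustrates how `ResourceBundle` resolves a message for a locale and falls back to the base bundle when a key or a language is missing. To keep the example self-contained, the bundle contents are held in memory and served through a custom `ResourceBundle.Control`; that is an assumption made for this sketch, whereas a real deployment would normally ship `Messages.properties`, `Messages_fr.properties`, and so on. The bundle name, keys, and texts are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class InMemoryBundles extends ResourceBundle.Control {
    // Stand-ins for Messages.properties and Messages_fr.properties (illustrative content).
    private static final Map<String, String> FILES = Map.of(
            "Messages", "greeting=Welcome\nfarewell=Goodbye\n",
            "Messages_fr", "greeting=Bienvenue\n");

    @Override
    public List<String> getFormats(String baseName) {
        return FORMAT_PROPERTIES; // only properties-style bundles are looked up
    }

    @Override
    public Locale getFallbackLocale(String baseName, Locale locale) {
        return null; // skip the JVM default locale so results are deterministic
    }

    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                    ClassLoader loader, boolean reload) throws IOException {
        String text = FILES.get(toBundleName(baseName, locale));
        return text == null ? null : new PropertyResourceBundle(new StringReader(text));
    }

    // Looks up a message; keys missing from a language bundle resolve via the base bundle.
    static String message(String key, Locale locale) {
        return ResourceBundle.getBundle("Messages", locale, new InMemoryBundles()).getString(key);
    }

    public static void main(String[] args) {
        System.out.println(message("greeting", Locale.FRANCE)); // found in Messages_fr
        System.out.println(message("farewell", Locale.FRANCE)); // inherited from Messages
        System.out.println(message("greeting", Locale.GERMAN)); // no German bundle: base bundle
    }
}
```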

  31. Conversion Process • Character conversion is a fairly straightforward process as long as there is a one-to-one mapping between sequences of Unicode characters on one side and sequences of bytes in another encoding on the other side, and the input consists only of characters or bytes that have mappings. • The reality is: • A single character in a non-Unicode encoding may have multiple equivalent representations in Unicode (say, a precomposed character, or a base character followed by a combining mark). • A character in one encoding may not have an equivalent in the other encoding. • An invalid sequence of bytes or characters may show up in the input.
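Each of the three complications listed above can be detected with standard Java classes. The following is a minimal sketch with hypothetical class and method names: a strict encoder reports characters that have no equivalent in the target encoding, a strict decoder reports invalid byte sequences, and Unicode normalisation (NFC) reconciles equivalent representations.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

final class ConversionChecks {
    // True if every character of the text has a mapping in the target encoding.
    static boolean canEncode(String text, Charset target) {
        try {
            target.newEncoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException unmappableOrMalformed) {
            return false;
        }
    }

    // True if the bytes form a valid sequence in the source encoding.
    static boolean isValid(byte[] bytes, Charset source) {
        try {
            source.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException invalidInput) {
            return false;
        }
    }

    // True if two strings are canonically equivalent (compared after NFC normalisation).
    static boolean sameText(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        System.out.println(canEncode("caf\u00e9", StandardCharsets.ISO_8859_1)); // Latin letter
        System.out.println(canEncode("\u092e", StandardCharsets.ISO_8859_1));    // Devanagari letter
        System.out.println(isValid(new byte[] {(byte) 0xC3, 0x28}, StandardCharsets.UTF_8));
        System.out.println(sameText("\u00e9", "e\u0301"));
    }
}
```

Note that `String.getBytes` and `new String(bytes, charset)` silently substitute a replacement character instead of reporting such problems, which is why the sketch uses explicit encoders and decoders.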

  32. Process: Configure Server

  33. Process: Register

  34. Process: Log

  35. Process: Localise Servlet

  36. Web Page in English with IE

  37. Web Page in Spanish with IE

  38. Web Page in Dutch with IE

  39. Web Page in French with IE

  40. Web Page in Italian with IE

  41. Web Page in Portuguese with IE

  42. Web Page in German with IE

  43. Web Page in English with IE

  44. Web Page in Marathi with IE

  45. Conclusion • The Java Localisation APIs come in handy to dynamically localise a web page into alternative languages • The rich set of Java class libraries such as java.util.ResourceBundle and java.util.Locale provides an efficient approach to working with locale-specific information • More manageable workspace for users in their native language • Regional settings, colour, and image representation are not disturbed • Improves the effectiveness of globally distributed business users by providing language/culture-specific application/product/service interfaces • Supports region-specific functionality (due to legal aspects, financial practice etc.). • Provides region-specific value-added services (like UI, look and feel, sorting/searching). • Consolidates applications/services with the same functionality that were developed and maintained separately for separate languages/regions.

  46. References
      [1] Fernandez, N. C. (2000). Web Site Localisation and Internationalisation: A Case Study. Published, City University.
      [2] Khachane, J. (2005). Web Page Localisation. Published, Pune University.
      [3] DePalma, D. A. (1999). Strategies for Global Sites. Forrester Research Inc., May 1998, and The eBusiness Report. In: eMarketer.
      [4] Roche, M. (2000). Managing Multilingual Web Applications. 16th International Unicode Conference, Amsterdam.
      [5] Nielsen, J. (1999). Designing Web Usability. Indianapolis: New Riders Publishing.
      [6] Deitsch, Loukides, M. Java Internationalisation.
