
Web Apps I18n Testing and Test Data




  1. Web Apps I18n Testing and Test Data Katsuhiko Momoi Sr Test Engineer, Google Inc.

  2. An Outline • Web Apps Internationalization/Localization • Current scenes and practices • I18n Testing and Test Data • Good test data -- Essential for good i18n test coverage • Types of I18n Data & their uses • Delivery mechanisms • Efficient I18n Testing • Tools/APIs • Summary and Conclusion

  3. Web Apps I18n and L10n • Increasing number of web apps • 30 or more at MSN, Yahoo, Google, etc. • Typically info oriented apps, e.g. search, maps, jobs, locals, etc. • Also some desktop-like apps – chat, mail, document, spreadsheet, presentation, photos, etc. • Support for a large number of languages • Yahoo, MSN, Google – Home pages in over 40 languages • Broader revenue base? • Core code base in Unicode • Display pages in UTF-8 • Major web sites have switched to Unicode

  4. Web Apps Development and I18n • Frequent updates and releases • Features more important than stability? • Short development cycles • 1-4 week release cycles • Pressures on testing teams • At any given time … • Multiple test servers and multiple versions to be tested • Multiple languages released at the same time • One binary, many localized languages • Last minute translation updates are frequent • New testing strategies needed

  5. General Testing Strategies: Test Code! • Devise a testing plan at product design stage • Push testing upstream into code writing • Refactor code as needed to make it more testable • Make code units testable • Tools like Testability Explorer are useful • http://code.google.com/p/testability-explorer/ • Test engineers and coding engineers work side by side • Minimize the release cycle • Testing is everyone’s business!

  6. General Testing Strategies: Use humans wisely! • Automated acceptance tests for continuous builds • Bug, latency, and performance issue tracking • Automate as many tasks as possible • Use/create tools to help automate testing • Use open source tools/test frameworks • Minimize UI-level testing wherever possible • Human testers for exploratory testing and analysis • Complex scenario testing

  7. I18n Testing: An overview • Begin I18n feature/code discussion at a product design stage • I18n testing plan early in the development • Feature requirement • also includes local market requirements • Libraries • 3rd Party components • Check I18n readiness • Review localization plan and schedule • Start exploratory testing as soon as code is ready • Include I18n test cases in automated smoke test • A small number of critical tests for continuous builds

  8. I18n Testing: Before Localization Begins • Is the code Unicode compliant? • Run unit tests • Look for ways to test locale-dependent features before L10n • Test product code against I18n libraries • Time/Date/Currency formats, Collation, etc. • Identify special locale dependent features • Run language specific feature tests against an English or Pseudo-localized build (e.g. CJK IME testing) • Sniff out potential local product requirement issues

  9. I18n Testing: Localizability • Use pseudo-localization to identify • Unextracted/hard-coded strings • String concatenation issues • Identify potential text expansion problems A sample: Original: Use default text encoding for outgoing messages Pseudo: [Ûšé ðéƒåûļţ ţéxţ éñçöðîñĝ ƒör öûţĝöîñĝ méššåĝéš one two three four five] • Establish a localizability checkpoint before L10n • Work with the translation team to establish guidelines • A set of check items that need to be passed before serious localization can begin • For example, the set may include all of the testing items mentioned so far • Example of a syntax problem: (You bought) [merchandise] (on) [date]
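As a rough illustration, the transformation above can be sketched in a few lines of Python; the accent mapping and the ~30% padding rule are illustrative assumptions, not the actual scheme used in any product.

```python
# Minimal pseudo-localization sketch: map ASCII letters to accented
# look-alikes and pad the string to simulate ~30% text expansion.
# Hard-coded (unextracted) strings stay plain ASCII, so they stand out.
ACCENT_MAP = str.maketrans(
    "AEIOUaeioucngt",
    "ÅÉÎÖÛåéîöûçñĝţ",
)

def pseudo_localize(s: str, expansion: float = 0.3) -> str:
    """Return a pseudo-localized version of an English UI string."""
    padding = " one two three four five"[: int(len(s) * expansion)]
    return "[" + s.translate(ACCENT_MAP) + padding + "]"

print(pseudo_localize("Use default text encoding for outgoing messages"))
```

The brackets make truncation visible in the UI: if the closing `]` is cut off, the control is too small for expanded translations.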

  10. I18n Testing: When Localization begins • Almost all major I18n issues should have been caught/fixed long before this • Run functional tests against localized builds • Things to look out for: • Functional breakage due to literal dependency • Bidi UI breakage • UI breakage/text expansion issues • Untranslated strings • Translation appropriateness • Usually a task for linguistic reviewers • Get a product evaluation done by users in target countries or by target language users

  11. I18n Test Areas: A summary • Unicode Compliance • Input/Output correctness for character data • Functional correctness with non-ASCII data • Locale Dependent Code Behavior: General • Date/Time/Time Zone, Collation/Sorting, Search/Filter, Language switch, Address, Currency • Locale Dependent Code Behavior: Product Specific • Webmail: outgoing mail encoding selection • Special features offered via geo-location setting, e.g. weather info • Early discovery of Localizability issues (before localization begins) • Use pseudo-localization as a tool • Functional testing under localized UI • Watch out for Bidi/RTL UI issues

  12. Data: Unicode Compliance Test with data based on the latest Unicode standard (5.x) • Unformatted string data: random characters • Build test data with UnicodeSet/ICU • Use patterns or build programmatically • Can also build UnicodeSet from script(s) associated with a locale • CLDR exemplar characters (but probably not auxiliary set) • Unformatted string data: natural language data • May build a tool/API to generate real language string • Useful to create language data from a small subset of major script types • e.g. CJK, Latin1, RU, HE/AR, TH • Not tied to any specific locale • Hodge-podge data set from major languages • Good for manual testing
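ICU's UnicodeSet is the tool named above; as a stdlib-only sketch of the same idea, one can collect assigned letters from a Unicode block and draw random strings from them (the block ranges below are just examples):

```python
import random
import unicodedata

# Stdlib approximation of ICU's UnicodeSet: collect assigned letters
# from a code-point range, then build random test strings from them.
def letters_in_range(start: int, end: int) -> list[str]:
    return [chr(cp) for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)).startswith("L")]

def random_string(alphabet: list[str], length: int, seed: int = 0) -> str:
    rng = random.Random(seed)  # seeded, so failures are reproducible
    return "".join(rng.choice(alphabet) for _ in range(length))

cyrillic = letters_in_range(0x0400, 0x04FF)   # Cyrillic block
thai = letters_in_range(0x0E00, 0x0E7F)       # Thai block
print(random_string(cyrillic, 10), random_string(thai, 10))
```

Seeding the generator matters for test data: a failure found with random input must be reproducible in the next run.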

  13. Data: Quick Sanity Check for Unicode Compliance • If you want a quick sanity check on Unicode compliance … • Hand-crafted short strings may suffice: • Îñţérñåţîöñåļîžåţîöñ (UTF-8 two byte characters) • (UTF-8 two to four byte characters) • Random vs. real language strings • Random strings • Unit/code testing • Real language strings • Manual testing • Real language data is needed for detecting languages • Quick sanity checks for Unicode compliance – easily recognizable strings
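The round-trip property such hand-crafted strings are meant to exercise can be asserted directly; this sketch uses the sample string from the slide:

```python
# Quick Unicode-compliance sanity check: the hand-crafted string
# (mostly two-byte UTF-8 characters) must survive a round trip
# through the encoder without loss.
sample = "Îñţérñåţîöñåļîžåţîöñ"
encoded = sample.encode("utf-8")
assert len(encoded) > len(sample)          # multi-byte characters present
assert encoded.decode("utf-8") == sample   # lossless round trip
print(sample, "->", len(encoded), "bytes")
```

Because the string is still readable as "Internationalization", a human tester can spot corruption at a glance, which is the point of the hand-crafted approach.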

  14. Bad Data: Invalid/malformed/broken data • Test your code with bad data • Learn how your product deals with the unexpected • What happens when the product has to process invalid Unicode data? • Malformed UTF-8, UTF-16 bytes • Incorrect BOM • Text data file uploaded with incorrect information about encoding • EUC-JP file uploaded as Shift_JIS file for Japanese • File uploads with incorrectly encoded data • Data not defined under a protocol • Plain text data with HTML entities
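A minimal sketch of such bad-data tests, using Python's UTF-8 decoder as the system under test (the specific byte sequences are illustrative):

```python
# Feed deliberately broken byte sequences to the decoder and check that
# the expected policy (reject, or replace with U+FFFD) is applied -
# the one thing that must never happen is an unhandled crash.
bad_inputs = [
    b"\xc3",          # truncated 2-byte UTF-8 sequence
    b"\x80\x80",      # continuation bytes with no lead byte
    b"\xed\xa0\x80",  # UTF-8-encoded surrogate (invalid)
]
for raw in bad_inputs:
    try:
        raw.decode("utf-8")              # strict mode: must raise
        broken = False
    except UnicodeDecodeError:
        broken = True
    assert broken, f"decoder accepted invalid input {raw!r}"
    # lenient path: replacement characters instead of an exception
    assert "\ufffd" in raw.decode("utf-8", errors="replace")
```

The same pattern applies to any decoder in the product stack: enumerate the invalid shapes the spec defines and assert one of the two sanctioned outcomes.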

  15. Data: Local Encoding Data • Most current web apps use UTF-8 • Ex. All major web mail services and search home pages are in UTF-8 • Local encoding data are needed only for: • Data upload and download from your web apps • Address books, Spreadsheet data in csv format • Apps accepting user uploads of text files • Your web pages/apps are embedded in another web site using a local encoding • Search/Ads display syndicated to another web site
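The EUC-JP/Shift_JIS mislabeling case above can be simulated directly; the sample text is an arbitrary Japanese phrase chosen for illustration:

```python
# Simulate a mislabeled upload: Japanese text saved as EUC-JP but
# declared as Shift_JIS. The wrong label must not silently round-trip.
original = "テスト データ"           # "test data" in Japanese
raw = original.encode("euc_jp")

mislabeled = raw.decode("shift_jis", errors="replace")
assert mislabeled != original              # mojibake, not the original
assert raw.decode("euc_jp") == original    # correct label recovers the text
print("mislabeled decode:", mislabeled)
```

A test suite for upload features would carry one such byte fixture per supported legacy encoding and assert that the product either detects the mismatch or surfaces a clear error.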

  16. Lang/Locale Specific Formats: Date/Time • Difficult to test for accuracy • A number of formats are available on all platforms: Slovenian examples • Full: EEEE, dd. MMMM yyyy (sreda, 30. julij 2008) • Long: dd. MMMM yyyy (30. julij 2008) • Medium: d.M.yyyy (30.7.2008) • Short: d.M.yy (30.7.08) • Shortened forms are typical for web apps • Date/Time display in webmail Inbox view • Gmail: 30. jul • Some locale formats get old and newer formats may not be available from platforms • Updates help but are sometimes not fast enough • Differences of opinion among native speakers
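A hand-rolled check of the Slovenian medium and short patterns quoted above; a real suite would pull both the patterns and the expected values from ICU/CLDR rather than hard-coding them:

```python
from datetime import date

# Hand-rolled formatters for the Slovenian patterns from the slide:
# medium = d.M.yyyy, short = d.M.yy. Used here only to show the
# shape of a format-validation test.
def fmt_medium(d: date) -> str:
    return f"{d.day}.{d.month}.{d.year}"

def fmt_short(d: date) -> str:
    return f"{d.day}.{d.month}.{d.year % 100:02d}"

d = date(2008, 7, 30)
assert fmt_medium(d) == "30.7.2008"
assert fmt_short(d) == "30.7.08"
```

The value of this style is the golden pair (input date, expected native rendering); the expected strings should be validated once by a native speaker and then frozen into the test.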

  17. Lang/Locale Specific Formats: Number/Currencies • Platform libraries provide number and currency formats • Testing issues: • Number formats vary considerably from locale to locale • Placement of currency symbols relative to numbers • The Euro is often recommended to be placed after the number • 3.50€ • But is often written as €3,50 or even 3€50 (DE or FR locale) • Native currency conventions carried over to the Euro • Native speaker validation is important • But such testers may not be available for all languages
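A toy placement check along these lines; the locale table below is an assumption for illustration, and production code should rely on ICU's number formatters instead:

```python
# Toy Euro formatter exercising the placement variations mentioned
# above. The per-locale rules here are illustrative assumptions.
def format_eur(amount: float, locale: str) -> str:
    whole, cents = divmod(round(amount * 100), 100)
    if locale in ("de_DE", "fr_FR"):
        # assumed convention: symbol after the number, comma decimal
        return f"{whole},{cents:02d} €"
    # fallback convention: symbol before the number, dot decimal
    return f"€{whole}.{cents:02d}"

assert format_eur(3.5, "de_DE") == "3,50 €"
assert format_eur(3.5, "en_IE") == "€3.50"
```

Even a toy like this makes the test intent concrete: one amount, several locales, each with a native-validated expected string.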

  18. Lang/Locale Specific Formats: Addresses/Phone Numbers • Names, Addresses, Phone Numbers • These formats are usually not supplied by platform libraries • Custom address widgets may be necessary • Different countries/regions may need: • different order of address fields (larger to smaller or smaller to larger domains) • more street address lines (3 for example) • 'extra field' in their address • 'static text' between address items • no State (or province) in address • three levels for administrative divisions on the address (Province, second level City, third level city or county), e.g. China • different label name for their address item • A single Display Name field is easier than separate given name, middle name, family name fields

  19. Lang/Locale Specific Data: Collation, Search/Filter • Data for collation/sorting and search/filter don’t have specific formats per language/locale • Best to use authentic language data • Collation samples available from ICU • Optional settings make the validation task complex • Punctuation, upper/lower case distinctions • Search/Filter: • CJK/Thai segmentation (ICU support available) • Query language detection needed • Real language strings are a must
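Options such as accent sensitivity can be exercised with a small folding helper; this is a stand-in for real collation settings, not ICU's algorithm:

```python
import unicodedata

# Accent-insensitive filter sketch for search testing: decompose,
# then drop combining marks before matching. A stand-in for ICU
# collation strength options, not an implementation of them.
def fold(s: str) -> str:
    return "".join(c for c in unicodedata.normalize("NFD", s.lower())
                   if unicodedata.category(c) != "Mn")

names = ["Müller", "Mueller", "Muller", "Møller"]
hits = [n for n in names if fold(n).startswith(fold("Mul"))]
print(hits)
```

Note the limits of naive folding: "Møller" survives (ø has no decomposition) and "Mueller" is missed, which is exactly why authentic language data and real collation libraries are needed.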

  20. Language Specific Data: Mail, Calendar, etc. Some functions are best tested with real language data • Mail data for outgoing message encoding tests • Need to be language specific since the best encoding selection is language bound • Examples: • Cyrillic characters: • KOI8-R for Russian but KOI8-U for Ukrainian • CJK characters: • ISO-2022-JP for ja, GB2312 for zh-Hans, Big5 for zh-Hant, EUC-KR for ko • Google Calendar • Event creation via text input: e.g. “6pm Party at John’s house” • Starting time: 6pm (on the day selected) • Place: John’s house • Spellchecker language • Auto-detect the language and offer the matching spell checker
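The outgoing-encoding selection described above can be sketched as a language-to-encoding table with a UTF-8 fallback (the fallback rule is an assumption, not a documented product behavior):

```python
# Language-bound outgoing-encoding selection; the table mirrors the
# examples given on the slide. If the text contains characters the
# legacy encoding cannot represent, fall back to UTF-8 (assumption).
LEGACY_ENCODING = {
    "ru": "koi8_r", "uk": "koi8_u",
    "ja": "iso2022_jp", "zh-Hans": "gb2312",
    "zh-Hant": "big5", "ko": "euc_kr",
}

def pick_encoding(lang: str, text: str) -> str:
    enc = LEGACY_ENCODING.get(lang, "utf-8")
    try:
        text.encode(enc)
        return enc
    except UnicodeEncodeError:
        return "utf-8"

assert pick_encoding("ru", "привет") == "koi8_r"
assert pick_encoding("ru", "привет 😀") == "utf-8"   # emoji forces UTF-8
```

A test suite for this feature needs one real-language fixture per row of the table, plus mixed-script fixtures to exercise the fallback path.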

  21. What do we need? • Fast Development Cycles • Support for a large number of languages • Many items to cover for I18n testing • Even automated tests take too long if they need to be run against 50+ languages … … • We need Efficient I18n Testing!

  22. Efficient I18n Testing: Code Coverage • Key concept is test coverage for the written code • Is there sufficient testing written for every class? • Is the code easily testable? • Evangelize testability to development teams • Ask for a unit test with every check-in • Run Unicode compliance tests • Share testing with development team members • Automate testing wherever possible • Identify important coverage areas for I18n • Provide test APIs • Test locale-dependent behavior at the code level • Doable for data generated by I18n libraries • Doable before localization begins

  23. Efficient I18n Testing: UI driven automation • For larger UI driven tests (e.g. user scenarios) • Use automation frameworks: Selenium, Eggplant • Minimize such test cases – often unreliable for web apps under development • Use mocks to speed up server interactions • Subset core test cases to suit I18n needs • Features that do not handle character data are unlikely to break under different UI languages • Run tests for a set of representative UI languages – not all of them • Choose different script types: Japanese, Chinese, Korean, Cyrillic, Latin1/2, Greek, Arabic, Hebrew, Devanagari, Turkish, Thai • Use data sets from these representative languages: core test data • Minimum effective data and language set for i18n coverage • Provides good representative coverage for various language scripts • Shorten the time to run localized language UI test cases
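A core test data set along these lines might look as follows; the sample strings are illustrative, not a vetted corpus:

```python
# Core test data: one authentic sample per representative script type,
# so one pass covers the major script families instead of 50+ locales.
CORE_SAMPLES = {
    "ja": "日本語のテスト",        # CJK
    "ru": "Проверка текста",      # Cyrillic
    "he": "בדיקת טקסט",           # RTL
    "th": "ทดสอบข้อความ",          # Thai (no word spaces)
    "de": "Größenänderung",       # Latin 1/2 with diacritics
}

def roundtrip_ok(s: str) -> bool:
    """Stand-in for a real end-to-end check on one data sample."""
    return s.encode("utf-8").decode("utf-8") == s

assert all(roundtrip_ok(s) for s in CORE_SAMPLES.values())
```

In practice `roundtrip_ok` would be replaced by the product-level operation (send a mail, save a document) parameterized over the same data table.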

  24. Efficient I18n Testing: Use humans wisely! • Possible areas left for manual/exploratory i18n testing • Product/UI correctness for local markets • Complex functional testing scenarios • UI areas too new for automation • Language specific matters: CJK IME, Bidi UI/Layout, etc. • I18n Compatibility testing with other apps • Locale specific apps list • Not I18n but • Appropriateness for local markets • Local product testers • Translation quality • Linguistic reviewers/editors

  25. Tools: what do we need? • Tools and APIs that would help i18n testing … and globalization processes • Some exist (company internally) and others should be coming • Hopefully good ones would be offered as open-source software

  26. Tools: Data validation via I18n libraries Unit test your code: (before localization!) • Locale format validation tests • Any formatted data generated by a library should be unit testable with validating test classes • Open source libraries should come with such test suites for all format generating classes • Do they exist for ICU? If not, we should organize such efforts • Date/Time, Currency, Number/Decimal • Locale dependent function validation tests • Sorting validation • Time zones per locale • Calendar correctness: start date of the week, Calendar types • Segmentation • Transforms validation

  27. Tools: Random Data Generation • ICU • UnicodeSet: • Pattern generation • Build data programmatically: e.g. Character properties • Random string generator with ExemplarSet characters for specified locales • Dangerous character generator • Define what they are • Generate them on demand
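A "dangerous character" generator could start from a hand-picked list like the one below; the category choices are assumptions about what typically breaks naive string handling:

```python
# Hand-picked "dangerous characters" for on-demand generation.
# The selection is an illustrative assumption, not a standard list.
DANGEROUS = [
    "\u0000",        # NUL - truncates C strings
    "\u200f",        # RIGHT-TO-LEFT MARK - flips Bidi layout
    "\ufeff",        # BOM / zero-width no-break space
    "\u0301",        # combining acute accent with no base character
    "'\"<>&",        # quoting / markup metacharacters
    "\U0001f600",    # non-BMP character (surrogate pair in UTF-16)
]

def on_demand(kinds: int) -> str:
    """Generate a test string containing the first `kinds` categories."""
    return "".join(DANGEROUS[:kinds])

print(repr(on_demand(3)))
```

Defining the list once and generating on demand, as the slide suggests, keeps every team testing against the same agreed set of hazards.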

  28. Tools: Real Language Data Generation • Provide real/validated language data (not random character strings) • For a large number of languages • Updatable database • Provide real language sentences – not just words • Representative Unicode compliance data (hand-crafted) • Varied-length data strings • Provide custom locale-dependent format strings • e.g. possible to customize a date format string from ICU • An API callable from within a test case/data • Where do we collect data? Most search companies have data such as: • Search keywords • Language/encoding detection data • Translation training data • etc.

  29. Tools: UI String Translation Tool • Most test cases should be language independent • But some test cases may need translated UI strings for validation • (ex) If a product has a language switch setting, one can validate the switching by matching a few UI strings in the target language • A translation database usually exists for projects • A tool that generates equivalent strings for different locales from the translation database • Write test cases with placeholders for UI strings – load the string values from locale data files

  30. Tools: Find language related problems • A tool that auto-detects the language of target pages • Helps with misplaced language pages: e.g. you expect content in one language but get another by mistake • A tool that crawls links and finds dead ones or ones in the wrong language • A tool that finds untranslated strings at server build time • Exploit your translation infrastructure to come up with such a tool
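A naive script-based detector gives the flavor of such a tool; real language detection uses statistical models, and this sketch only identifies the dominant script:

```python
import unicodedata

# Naive script detector: count letters per script using the fact that
# Unicode character names begin with the script name. Good enough to
# flag a Cyrillic page served where a Latin-script one was expected.
def dominant_script(text: str) -> str:
    counts: dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "UNKNOWN"

assert dominant_script("привет мир") == "CYRILLIC"
assert dominant_script("hello world") == "LATIN"
```

This cannot distinguish languages sharing a script (Russian vs. Ukrainian, or any two Latin-script languages), which is where the real statistical detectors come in.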

  31. Tools/Methods: Pseudo-localization schemes • A pseudo-localization scheme that catches • the usual things: unextracted strings, string concatenation, string expansion • Additionally finds: • undesirable syntax and constructions in the original • unhealthy dependencies among adjectives, nouns, relational particles (e.g. on/at/from …), and numbers • A scheme that works well for catching and debugging Bidi/mixed text issues • Make the pseudo-locale a permanent part of your test environment! • Others?

  32. Concluding summary • Large scale, fast development processes for web apps require efficient testing strategies • Test code directly wherever you can • I18n input/output testing is particularly suited for this approach • Share testing work with development teams • Testing is everyone’s business! • Use tools like Testability Explorer to measure test coverage • Use libraries for validation tests • Create tools for data generation and validation • Exploit your translation process and infrastructure to create tools that will help your testing and shipment

  33. Concluding summary (continued) • Repetitive tasks should be automated • Consider subsetting your automated test cases to those directly affecting i18n • UI-driven automated tests can take too long if a large number of languages need to be supported • Core language set idea: • Establish a core set of languages for your product • Use data from this core set of languages • For example: CJK, RU, a Latin 1 language, HE/AR • Run these core language automated tests frequently; run against the full set of languages less frequently • Leave humans for testing that requires analytic skills and thinking

  34. Thank You! Q&A
