An Experience of the Language Observatory Project

Presentation Transcript


  1. An Experience of the Language Observatory Project. Yoshiki Mikami, Leader, Language Observatory Project, Japan Science & Technology Agency. Workshop on “Recent Experiences on Measuring Languages on the Cyberspace”, UNESCO, Paris, February 22, 2007

  2. Outline • Global Digital Divide • Language Observatory: How Does It Function? • Major Findings (3.1 Survey Snapshots, Asia and Africa; 3.2 Technical Aspect of the Divide; 3.3 Social Aspect of the Divide; 3.4 Several Non-linguistic Aspects) • Future Agenda (Regarding Measurement; From Measurement to Empowerment)

  3. 1. Global Digital Divide: Income, Telephony [charts, 1999 and 2004]. Source: ITU Statistics

  4. The Degree of Inequality: Telephony < Income < Internet. Gini coefficient: Telephony 0.51 < GDP 0.73 < Internet 0.91
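A Gini coefficient summarizes inequality on a 0 to 1 scale: 0 when every country has the same level, approaching 1 when a few countries hold almost all of it. The slide does not say how its three figures were computed (for example, whether countries were weighted by population). The sketch below assumes the standard mean-absolute-difference definition with optional weights; the function name `gini` and the sample numbers are illustrative, not the ITU data.

```python
from typing import Optional, Sequence


def gini(values: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Gini coefficient via the mean absolute difference.

    G = sum_i sum_j w_i * w_j * |x_i - x_j| / (2 * W**2 * mu),
    where W = sum_i w_i and mu = sum_i w_i * x_i / W is the weighted mean.
    Values are assumed non-negative; with all weights equal to 1 this is
    the ordinary (unweighted) Gini coefficient.
    """
    if weights is None:
        weights = [1.0] * len(values)
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total_weight
    if mean == 0:
        return 0.0  # every value is zero: treat as perfectly equal
    # O(n^2) pairwise sum of weighted absolute differences.
    abs_diff = sum(
        wi * wj * abs(xi - xj)
        for xi, wi in zip(values, weights)
        for xj, wj in zip(values, weights)
    )
    return abs_diff / (2 * total_weight ** 2 * mean)


# Illustrative numbers only, not the data behind the slide.
print(gini([1, 1, 1, 1]))  # → 0.0  (everyone equal)
print(gini([0, 0, 0, 1]))  # → 0.75 (one unit holds everything)
```

With population weights, each country counts in proportion to its population, so `gini([1, 3], [2, 2])` equals the unweighted `gini([1, 1, 3, 3])`. The pairwise sum is O(n²), which is negligible for roughly 200 countries.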

  5. UNESCO Recommendation: “Recommendation concerning the Promotion and Use of Multilingualism and Universal Access to Cyberspace”, October 2003. [PREAMBLE] • Noting that linguistic diversity in the global information networks and universal access to information in cyberspace are at the core of contemporary debates and can be a determining factor in the development of a knowledge-based society,

  6. Linguistic activities moving onto the Web

  7. 2. Language Observatory: How Does It Function? [Diagram] Internet → Crawler [UbiCrawler] → pages → Language Identifier [LI] → Tag Analysis, Content Analysis → Analysis on Digital Language Divide; Language Resources. Sample URL: http://gii.nagaokaut.ac.jp/gii/papers.php. Sample page:
      <HTML><HEAD>
      <TITLE>Language Observatory</TITLE>
      <META http-equiv=Content-Type content="text/html; charset=UTF-8">
      </HEAD>
      <BODY>
      <A href="http://www.language-observatory.org"><IMG height=137 alt="logo" src="LO.files/logo.gif" width=155></A>
      <H2>About us</H2>
      <P>Astronomical observatory catches the light from stars, likewise.................
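The sample page on this slide declares its encoding in a META tag (`charset=UTF-8`). A tag-analysis step might begin by reading such self-declarations. Because declarations can be missing or wrong, they would complement rather than replace identification from the text itself. The deck does not describe its tag analysis in detail; the sketch below, built on Python's standard `html.parser`, is an assumption, and the names `DeclarationParser` and `declared_charset_and_lang` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class DeclarationParser(HTMLParser):
    """Collects the charset and language a page declares in its markup."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "html" and attrs.get("lang"):
            self.lang = attrs["lang"].strip().lower()
        elif tag == "meta" and self.charset is None:
            if attrs.get("charset"):  # HTML5 form: <meta charset="...">
                self.charset = attrs["charset"].strip().lower()
            elif attrs.get("http-equiv", "").lower() == "content-type":
                # HTML4 form: content="text/html; charset=UTF-8"
                for part in attrs.get("content", "").split(";"):
                    key, _, value = part.strip().partition("=")
                    if key.lower() == "charset" and value:
                        self.charset = value.strip().strip("\"'").lower()


def declared_charset_and_lang(html: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (declared charset, declared language), either may be None."""
    parser = DeclarationParser()
    parser.feed(html)
    parser.close()
    return parser.charset, parser.lang


sample = ('<HTML><HEAD><TITLE>Language Observatory</TITLE>'
          '<META http-equiv=Content-Type content="text/html; charset=UTF-8">'
          '</HEAD><BODY></BODY></HTML>')
print(declared_charset_and_lang(sample))  # → ('utf-8', None)
```

The sample page declares an encoding but no language, which is typical of pages from that period and one reason a content-based Language Identifier is still needed.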

  8. Unit of Identification = LSE (Language + Script + Encoding): difference of language, difference of encoding, difference of script
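Treating language, script, and encoding as one unit means that the same language in two encodings (or two scripts) counts as two LSEs. One way to see why this suits a crawler-based survey is that byte-level n-gram profiles already differ across encodings. The sketch below is a generic byte-trigram classifier using cosine similarity, a common language-identification technique. The deck does not say that the Language Observatory's LI works this way, and the class and function names, labels, and training strings are hypothetical toy data.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LSE:
    """Unit of identification: language + script + encoding."""
    language: str
    script: str
    encoding: str


def byte_ngrams(data: bytes, n: int = 3) -> Counter:
    # Work on raw bytes, so the encoding itself shapes the profile.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ByteNgramIdentifier:
    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.profiles: Dict[LSE, Counter] = {}

    def train(self, label: LSE, sample: bytes) -> None:
        self.profiles.setdefault(label, Counter()).update(byte_ngrams(sample, self.n))

    def identify(self, data: bytes) -> LSE:
        grams = byte_ngrams(data, self.n)
        # Highest cosine similarity wins; ties go to the first-trained label.
        return max(self.profiles, key=lambda label: cosine(grams, self.profiles[label]))


# Toy training data (illustrative only): one Japanese text in two encodings, plus English.
JA_UTF8 = LSE("Japanese", "Kanji+Kana", "UTF-8")
JA_SJIS = LSE("Japanese", "Kanji+Kana", "Shift_JIS")
EN_ASCII = LSE("English", "Latin", "US-ASCII")

ja_text = "これは日本語の文章です。日本語です。"
identifier = ByteNgramIdentifier()
identifier.train(JA_UTF8, ja_text.encode("utf-8"))
identifier.train(JA_SJIS, ja_text.encode("shift_jis"))
identifier.train(EN_ASCII, b"This is a short English sample text.")

print(identifier.identify("日本語です".encode("shift_jis")).encoding)  # → Shift_JIS
```

Because the profiles are built from bytes rather than decoded characters, the same Japanese sentence in UTF-8 and in Shift_JIS shares few or no byte trigrams, which is exactly the distinction the LSE unit records. A real identifier would need far more training text per LSE and a rule for rejecting input that matches no profile (here an all-zero score silently returns the first label).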

  9. The First Workshop on the IMLD, 2004: UNESCO reported the launch of the project. http://portal.unesco.org/ci/en/ev.php-URL_ID=14480&URL_DO=DO_TOPIC&URL_SECTION=201.html

  10. Milestones, 2003 to 2007

  11. Expert Collaboration: Case of the African Survey, June 26-28, 2006, Bamako, Mali. ACALAN; Mali, Algeria, Burkina Faso, Ethiopia, Kenya, Malawi, Nigeria, Tunisia; CNRS, France

  12. Researchers Network: Over 35 Countries. Experts’ contributions are essential for collecting text in local encodings and seed URLs, and for verifying LI results.

  13. 3.1 Survey Snapshot: Languages on the Net, Asia, as of June 2006

  14. 3.1 Survey Snapshot (cont.): Languages on the Net, Africa, as of October 2006

  15. 3.2 Technical Aspect: Localization Problem. “Language localization” has been a key obstacle to the use of new information technologies since the age of type printing.

  16. A Jesuit Friar’s Letter, 1608: Six Hundred versus 24. "Before I end this letter I wish to bring before Your Paternity's mind the fact that for many years I very strongly desired to see in this Province some books printed in the language and alphabet of the land, as there are in Malabar with great benefit for that Christian community. And this could not be achieved for two reasons; the first because it looked impossible to cast so many moulds amounting to six hundred, whilst as our twenty-four in Europe." [Image: Doctrina Christam in Tamil, 1578] Source: Priolkar, The Printing Press in India, Bombay, 1958.

  17. Doctrina in Tagalog, 1593: The Script Was Finally Lost. [Image: Philippine postal stamp issued in 1995] “Doctrina Christiana”, bilingual version, printed in Tagalog in Tagalog script, in Tagalog in Latin script, and in Spanish in Latin script.

  18. Encoding Chaos Leads to Delay of Localization [table, as of June 2006]. Note: local proprietary encodings are shown in this table by the names of their font (families).

  19. Unavailability of Search Engines: Another Problem [table: Google, as of June 2006]

  20. Technical Aspect of the Digital Language Divide [causal diagram]. Actors: global IT firms, local IT firms, local media, government, users, international standard bodies, overseas communities. Factors: differentiation strategy to enclose customers; encoding chaos; delay in localization; non-availability of search engines (SEs); lack of leadership in standardization; lack of a standard typewriter keyboard; less attention from IT vendors; difficulty in access to the standardization process; various localizations by overseas communities.

  21. 3.3 Social Aspect: Languages in Multilingual Societies. Based on the Council of Europe’s “Common European Framework of Reference for Languages” (2004)

  22. Language Plays Different Roles in a Multilingual Society. Second-level domains mapped to socio-economic domains: ac.xx → educational; com.xx → occupational; gov.xx → public; others → personal. Language categories: global languages, regional languages, official language(s), minority language(s).
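The mapping on this slide (ac.xx to educational, com.xx to occupational, gov.xx to public, everything else to personal) can be applied mechanically to crawled hostnames and then cross-tabulated against the identified language category. The sketch below assumes only the slide's three second-level labels; some ccTLDs use other labels (such as edu or co) that would need extra table entries. The function names and sample hostnames are hypothetical, and `xx` stands for a two-letter ccTLD as on the slide.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Mapping from the slide: second-level label under a two-letter ccTLD -> domain.
SECOND_LEVEL_TO_DOMAIN = {
    "ac": "educational",
    "com": "occupational",
    "gov": "public",
}


def socio_economic_domain(hostname: str) -> str:
    """Classify e.g. 'www.example.ac.xx' by its second-level label."""
    labels = hostname.lower().rstrip(".").split(".")
    # Needs at least a name, a second-level label, and a two-letter ccTLD.
    if len(labels) >= 3 and len(labels[-1]) == 2:
        return SECOND_LEVEL_TO_DOMAIN.get(labels[-2], "personal")
    return "personal"  # "others" on the slide


def language_by_domain(pages: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """pages: (hostname, language category) pairs -> counts per domain."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for hostname, language in pages:
        table[socio_economic_domain(hostname)][language] += 1
    return table


# Hypothetical pages, for illustration only.
pages = [
    ("www.example.ac.xx", "global"),
    ("lib.example.ac.xx", "official"),
    ("shop.example.com.xx", "global"),
    ("portal.example.gov.xx", "official"),
    ("blog.example.xx", "minority"),
]
for domain, counts in sorted(language_by_domain(pages).items()):
    print(domain, dict(counts))
```

Comparing the rows of such a table across countries is one way to see which language dominates which socio-economic domain.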

  23. Specialization of Language: Secondary-Level Domain Analysis [charts: Cyprus, Turkey, Kazakhstan, Iran]

  24. Social Aspect of the Digital Language Divide [causal diagram]. Actors: users, government, local business, e-business, global IT firms, local media, press, overseas community. Factors: primary and secondary education; higher education; absence of the mother language; low literacy; non-availability of SEs; restricted social activities.

  25. 3.4 Non-linguistic Aspects: a. Network and Server [map legend: ○ rw: Rwanda; △ ml: Mali; □ mz: Mozambique; white: servers installed in the country; colored: servers installed overseas]. 80% of servers under African domains are located outside the country. 60% of servers in Asian domains are also “offshore”, as of December 2005.
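The offshore percentages compare each server's location with the country of the ccTLD it serves. The sketch below assumes each crawled host has already been geolocated to a two-letter country code (for example from its IP address, a step the deck does not describe) and that the ccTLD matches the ISO 3166 code, which is not always so (.uk versus GB). The function names and observations are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def offshore_shares(servers: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Per ccTLD, the fraction of servers located outside that ccTLD's country.

    servers: (ccTLD, server country) pairs, both two-letter codes.
    """
    total: Dict[str, int] = defaultdict(int)
    offshore: Dict[str, int] = defaultdict(int)
    for cctld, location in servers:
        cctld = cctld.lower()
        total[cctld] += 1
        if location.lower() != cctld:
            offshore[cctld] += 1
    return {cc: offshore[cc] / total[cc] for cc in total}


def pooled_offshore_share(servers: List[Tuple[str, str]]) -> float:
    """Fraction of all servers, across every ccTLD given, that are offshore."""
    if not servers:
        return 0.0
    return sum(1 for cc, loc in servers if cc.lower() != loc.lower()) / len(servers)


# Hypothetical observations, not the survey data on the slide.
servers = [("rw", "rw"), ("rw", "us"), ("ml", "fr"), ("ml", "ml"), ("mz", "za"), ("mz", "za")]
print(offshore_shares(servers))  # → {'rw': 0.5, 'ml': 0.5, 'mz': 1.0}
print(pooled_offshore_share(servers))
```

A regional figure such as the slide's 80% corresponds to the pooled share over all ccTLDs in the region; per-ccTLD shares show which countries drive it.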

  26. Complaint against Access: A Letter from Namibia. “I am the web master of the XXXXXXX Database. We are being severely hit by your Language Observatory’s web crawler - already 37000 page hits this month. In December 2005 you hit us 34000 times. We are on limited bandwidth, and this puts unacceptable strain on our server. I notice that you consider one HTTP request every 5 seconds 'polite' and 'modest'. This may be true in Japan, but not in Africa - our connections are very slow and very narrow. I would appreciate it if you could prevent your crawlers from visiting our URL again. In return, I will be happy to provide you directly with whatever statistics about our site you need for your research. Sincerely” We carefully control data collection speed using a set of parameters, such as revisiting interval, depth, maximum pages per server, and a prohibited-URL list.
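The reply lists four crawl-control parameters. The letter also shows why a per-request delay alone is not enough. At one request every 5 seconds, a single server can receive 86,400 / 5 = 17,280 requests per day, so 37,000 hits in a month is well within what that delay permits. A per-server page cap and a prohibited-URL list do the remaining work. The sketch below shows one way such parameters might gate each fetch. It interprets "revisiting interval" as the minimum delay between two requests to the same server (5 seconds, the figure quoted in the letter) and treats prohibited URLs as prefixes. The class name, parameter names, and the other defaults are hypothetical; UbiCrawler's actual configuration is not described in the deck.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple
from urllib.parse import urlsplit


@dataclass
class PolitenessPolicy:
    revisit_interval_s: float = 5.0            # min seconds between requests to one server
    max_depth: int = 3                         # link depth from a seed URL (hypothetical default)
    max_pages_per_server: int = 1000           # hypothetical default
    prohibited_prefixes: Tuple[str, ...] = ()  # e.g. sites that asked not to be crawled
    _last_request: Dict[str, float] = field(default_factory=dict, init=False, repr=False)
    _pages_fetched: Dict[str, int] = field(default_factory=dict, init=False, repr=False)

    def may_fetch(self, url: str, depth: int, now: float) -> bool:
        """Return True (and record the request) if fetching url at time now is allowed."""
        if depth > self.max_depth:
            return False
        if any(url.startswith(prefix) for prefix in self.prohibited_prefixes):
            return False
        host = urlsplit(url).hostname or ""
        if self._pages_fetched.get(host, 0) >= self.max_pages_per_server:
            return False
        last = self._last_request.get(host)
        if last is not None and now - last < self.revisit_interval_s:
            return False  # too soon: the caller should requeue the URL
        self._last_request[host] = now
        self._pages_fetched[host] = self._pages_fetched.get(host, 0) + 1
        return True


# Usage with small, hypothetical limits; the caller supplies the clock in seconds.
policy = PolitenessPolicy(max_depth=2, max_pages_per_server=2,
                          prohibited_prefixes=("http://db.example.org/",))
print(policy.may_fetch("http://www.example.org/a", depth=0, now=0.0))  # True
print(policy.may_fetch("http://www.example.org/b", depth=0, now=3.0))  # False: under 5 s later
print(policy.may_fetch("http://db.example.org/x", depth=0, now=9.0))   # False: prohibited
```

Passing the clock in, instead of reading it inside `may_fetch`, keeps the policy deterministic and easy to test. Honoring a request like the one in the letter then amounts to adding the site's URL prefix to `prohibited_prefixes`.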

  27. b. Domain Governance: Management of small islands’ domains is often re-delegated to overseas web-hosting operators, who tend to admit spam, porn, etc. [chart: pages vs. population (1,000), as of December 2005]

  28. c. Access Regulations by the Government: Countries where only state-controlled TV stations are available show a higher percentage of links going to global news sites abroad.
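This finding compares countries by the percentage of outgoing links that point to global news sites abroad. Measuring that requires a list of such sites and a count over crawled links. The sketch below assumes a hand-maintained watch list of domains and counts only absolute links. The list, the function names, and the sample links are hypothetical; the deck does not describe how its link analysis was done.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_under(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def share_of_links_to(links: Iterable[str], watched_domains: Iterable[str]) -> float:
    """Fraction of absolute links whose host falls under a watched domain.

    Relative or host-less links (e.g. '/about', 'mailto:...') are skipped.
    """
    watched = [d.lower().rstrip(".") for d in watched_domains]
    hosts = [urlsplit(url).hostname for url in links]  # hostname is lower-cased
    hosts = [h for h in hosts if h]
    if not hosts:
        return 0.0
    hits = sum(1 for h in hosts if any(is_under(h, d) for d in watched))
    return hits / len(hosts)


# Hypothetical outgoing links from one crawled page and a hypothetical watch list.
links = [
    "http://news.example.com/world",
    "https://example.com/",
    "http://paper.example.xx/local",
    "/about",
]
print(share_of_links_to(links, ["example.com"]))
```

Matching on a leading dot in `is_under` keeps a host such as `notexample.com` from being counted under `example.com`. Aggregating the share over all pages under a ccTLD would give a per-country figure comparable across countries.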

  29. 4. Future Agenda • Regarding Measurement: improvement of accuracy and coverage; multi-stakeholder collaboration; a Global Observatories Network • From Measurement to Empowerment: a Goals/Targets/Indicators system that helps and guides stakeholders in empowering languages

  30. World Network for Linguistic Diversity

  31. “Language Empowerment” [diagram]. Actors: language community, OSS developers, IT firms, media, press, government, users. Elements: mother language for creation; localization of application software based on standards; local-language search engines; language portal; mother-language information processing technology (OCR, TTS, translation); promotion of NLP (OCR, TTS, MT, e-dictionary, etc.); mother-language use in higher education; creation of local content; rich mother-language content; electronic delivery of public services; literacy.

  32. Millennium Development Goals: Structure

  33. Thanks for your attention. [Photo: Jehan Rictus Square, Paris, courtesy of Wunna Ko Ko, June 2005]
