An Experience of the Language Observatory Project
Yoshiki Mikami
Leader, Language Observatory Project
Japan Science & Technology Agency
Workshop on “Recent Experiences on Measuring Languages on the Cyberspace”
UNESCO, Paris, February 22, 2007
Outline
Global Digital Divide
3.1 Survey Snapshots, Asia and Africa
3.2 Technical aspect of the Divide
3.3 Social aspect of the Divide
3.4 Several non-linguistic aspects
From Measurement to Empowerment
Source: ITU Statistics
Gini-coefficient: Telephony 0.51 < GDP 0.73 < Internet 0.91
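The Gini coefficients quoted above summarize how unevenly a resource is spread across countries: 0 means perfectly equal, and values near 1 mean almost everything is held by a few. A minimal sketch of the calculation follows, assuming an unweighted list of per-country values; the function name `gini` is hypothetical, and the slide's own figures may have been computed with population weighting, which is not shown here.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values.

    0 means every entry is equal; the maximum for n entries is (n - 1) / n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_i are sorted ascending and i runs from 1 to n.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

For example, four countries with identical values give 0, while four countries where one holds everything give 0.75.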
Recommendation concerning the Promotion and Use of Multilingualism and Universal Access to Cyberspace, October 2003
<META http-equiv=Content-Type content="text/html; charset=UTF-8">
<A href="http://www.language-observatory.org"><IMG height=137 alt="logo" src="LO.files/logo.gif" width=155></A>
<P>Astronomical observatory catches the light from stars, likewise.................
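The META tag in the sample page is one place where a page declares its own character encoding. As an illustration only, the sketch below reads that declaration with Python's standard `html.parser`; the names `CharsetSniffer` and `declared_charset` are hypothetical, and this is not claimed to be how the project's tools handle encodings. A declared charset can also be wrong or missing, so it is at best one hint among several.

```python
from html.parser import HTMLParser


class CharsetSniffer(HTMLParser):
    """Records the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.charset is not None:
            return
        a = {k: (v or "") for k, v in attrs}
        if "charset" in a:
            # HTML5 style: <meta charset="...">
            self.charset = a["charset"].strip().lower() or None
        elif a.get("http-equiv", "").lower() == "content-type":
            # Older style: content="text/html; charset=..."
            for part in a.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.strip().lower() == "charset" and value.strip():
                    self.charset = value.strip().strip("\"'").lower()


def declared_charset(html_text):
    """Return the declared charset in lowercase, or None if absent."""
    sniffer = CharsetSniffer()
    sniffer.feed(html_text)
    return sniffer.charset
```

Applied to the META line shown above, `declared_charset` returns `"utf-8"`.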
[ UbiCrawler ]
Identifier [ LI ]
Difference of language
Difference of Encoding
Difference of Script
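Language, encoding, and script are separate properties of a text, and each needs its own check. As a small illustration of the script dimension only, the sketch below guesses the dominant script of already-decoded text from Unicode character names. This is a simple stand-in technique, not the identification method used by the LI module; the function name `dominant_script` is hypothetical.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Guess the most frequent script among letters and marks in text.

    Uses the first word of each character's Unicode name
    (e.g. "TAMIL", "LATIN", "MYANMAR") as a rough script label.
    """
    counts = Counter()
    for ch in text:
        # Count only letters (L*) and combining marks (M*).
        if unicodedata.category(ch)[0] not in "LM":
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

The same script can carry many languages (Latin for Tagalog and Spanish, for instance), so a script guess narrows the candidates but does not identify the language.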
UNESCO reported the launch of the project
June 26-28, 2006 at Bamako, Mali
Experts’ contributions are essential for collecting locally
encoded text and seed URLs, and for verifying LI results
as of June 2006
as of October 2006
“Language Localization” has been the key obstacle to the use of new information technologies since the age of movable-type printing.
"Before I end this letter I wish to bring before Your Paternity's mind the fact that for many years I very strongly desired to see in this Province some books printed in the language and alphabet of the land, as there are in Malabar with great benefit for that Christian community. And this could not be achieved for two reasons; the first because it looked impossible to cast so many moulds amounting to six hundred, whilst as our twenty-four in Europe."
Doctrina Christam in Tamil, 1578
source: Priolkar, The Printing Press in India, Bombay, 1958
Philippines postal stamp issued in 1995
“Doctrina Christiana”, bilingual version, printed in Tagalog in Tagalog script / in Tagalog in Latin script / in Spanish in Latin script.
Note: Local proprietary encodings are identified in this table by the names of their fonts (font families). As of June 2006.
As of June 2006
delay in localization
non-availability of search
lack of leadership in standardization
Technical Aspect of the Digital Language Divide
lack of standard
in typewriter keyboard
less attention from IT vendors
difficulty in access to standardization process
various localization by
Based on EU’s “Common European Framework of Reference for Languages” (2004)
restricted social activities
absence of mother language
80% of servers under African domains are located outside their own countries. 60% of servers under Asian domains are also “offshore”.
as of December 2005
I am the web master of the XXXXXXX Database. We are being severely hit by your Language Observatory's web crawler - already 37000 page hits this month. In December 2005 you hit us 34000 times. We are on limited bandwidth, and this puts unacceptable strain on our server. I notice that you consider one HTTP request every 5 seconds 'polite' and 'modest'. This may be true in Japan, but not in Africa - our connections are very slow and very narrow.
I would appreciate it if you could prevent your crawlers from visiting our URL again. In return, I will be happy to provide you directly with whatever statistics about our site you need for your research.
We carefully control data collection speed using a
set of parameters, such as revisit interval, crawl depth,
maximum pages per server, and a list of prohibited URLs.
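The crawl-control parameters named above can be pictured as a small policy object that is consulted before each fetch. The sketch below is a hypothetical illustration: the class `CrawlPolicy`, its field names, and the default values are assumptions made for this example and do not describe the actual configuration of the project's crawler.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class CrawlPolicy:
    # All defaults are illustrative assumptions, not the project's settings.
    revisit_interval_days: int = 180      # minimum gap between two visits
    max_depth: int = 5                    # link distance from a seed URL
    max_pages_per_server: int = 1000      # per-host page quota
    prohibited_prefixes: tuple = ()       # URL prefixes never to fetch
    pages_seen: dict = field(default_factory=dict)

    def due_for_revisit(self, days_since_last_visit):
        """True once the revisit interval has elapsed for a site."""
        return days_since_last_visit >= self.revisit_interval_days

    def allows(self, url, depth):
        """Decide whether url may be fetched; counts it against the quota."""
        if any(url.startswith(p) for p in self.prohibited_prefixes):
            return False
        if depth > self.max_depth:
            return False
        host = urlsplit(url).netloc.lower()
        if self.pages_seen.get(host, 0) >= self.max_pages_per_server:
            return False
        self.pages_seen[host] = self.pages_seen.get(host, 0) + 1
        return True
```

A site whose operator asks not to be crawled, as in the letter above, would simply be added to `prohibited_prefixes`, while the per-server quota limits the load on hosts with narrow bandwidth.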
Management of small islands’ domains is often re-delegated to overseas web-hosting operators, who tend to admit spam, porn, etc.
as of December 2005
Countries where only state-controlled TV stations are available show a higher percentage of links going to global news sites abroad.
A Goals/Targets/Indicators system that helps and guides stakeholders in empowering languages
localization of application SW
based on standard
local language search engines
mother-language information processing technologies: OCR, TTS, translation
promotion of NLP
OCR, TTS, MT, e-dictionary, etc.
creation of local contents
electronic delivery of public services
mother language use in higher education
Jehan Rictus Square, Paris
photo: courtesy of Wunna Ko Ko, June 2005