
GSK: Development and Distribution of Resources

Licensing and Distribution of Resources and Applications. Hitoshi ISAHARA, GSK: Gengo Shigen Kyokai (Language Resource Association); National Institute of Information and Communications Technology (NICT).


Presentation Transcript


  1. GSK: Development and Distribution of Resources. Licensing and Distribution of Resources and Applications. Hitoshi ISAHARA, GSK: Gengo Shigen Kyokai (Language Resource Association); National Institute of Information and Communications Technology (NICT).

  2. Organizing Creation & Utilization of Language Corpora. Creating language corpora incurs a cost, and using them requires a system for distributing them. Some activities started in the early 1990s: LDC in the U.S.A. (1992) and ELRA in Europe (1995). Regional Conference on Localized ICT Development and Dissemination across Asia, Jan. 15, Vientiane, Laos.

  3. Japanese Activities. GSK: Gengo Shigen Kyokai (Language Resource Association). Launched in 1999; reformed as an NPO in 2003; project accepted in 2005 for 3 years. Text corpora are its main concern at present. NII-SRC distributes speech corpora.

  4. GSK and NII-SRC. Language Resource Association (GSK): a nonprofit organization collecting and distributing text and speech corpora. http://www.gsk.or.jp/ NII-Speech Resources Consortium (NII-SRC): collects and distributes most major speech corpora. http://research.nii.ac.jp/src/eng/ These two organizations aim to play central roles in collecting and distributing speech and language corpora in Japan.

  5. [Organization chart; labels only:] JEITA (Japan Electronics and Information Technology Industries Association); GSK; NII-SRC; Knowledge Information Processing Technologies Committee; NII: National Institute of Informatics; NICT: National Institute of Information and Communications Technology; Language Resource Sub-committee; TCL; Natural Language Processing Portal Site; SHACHI: Language Resource Metadata DB.

  6. Purpose of GSK: collection, distribution, investigation, research, and standardization of electronic data and software tools necessary for the promotion of science, technology, education and industry concerning natural language.

  7. GSK Organization: a president, two vice presidents, 11 board members, and 25 steering committee members. All are volunteers.

  8. No-fee Distribution. [Diagram: Provider, GSK, User; labels: corpus, distribution permission, payment, agreement.] As a rule, the cost of handling corpora falls on the user, though the corpus itself is free of charge.

  9. Agency. [Diagram: Provider, GSK, User; labels: agency request, form, commission, payment, agreement.] The providers of the corpora entrust GSK with handling requests received from users. GSK mediates between users and providers.

  10. Advertising. [Diagram: Provider, GSK, User; labels: ad request, publicity, ad rate, payment, agreement.] Corpus providers entrust GSK with advertising useful information about their data or corpora.

  11. Some Examples of GSK Corpora: JEITA Multimodal Corpus; Japanese Web N-gram Version 1; CICC Multilingual Dictionary; IPAL Lexicon of Basic Japanese.

  12. JEITA Multimodal Corpus. A corpus of person-to-person task-oriented dialogues. It includes 80 min. of video for 9 conversations on the topics of “faces” and “travel”. The speech data are transcribed and annotated with morphemes, dialogue structure, and prosody. Contained in 1 DVD-R (800 MB).

  13. Japanese Web N-gram Version 1. N-grams extracted from publicly available Japanese web pages crawled by Google. Pages that require special permission to browse or are marked noarchive/noindex are not included. N-grams (1-7) with frequency greater than 20 were extracted from approximately 20 billion sentences. Contained in 6 DVD-Rs (26 GB after gzip compression).
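
The extraction described in slide 13 (count n-grams of length 1 to 7, keep those above a frequency threshold) can be sketched roughly as follows. This is a minimal illustration only, not the pipeline actually used to build the corpus: it assumes the sentences are already split into tokens (Japanese text would first need a word segmenter), and the function names `count_ngrams` and `filter_by_frequency` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

NGram = Tuple[str, ...]


def count_ngrams(sentences: Iterable[Sequence[str]], max_n: int = 7) -> Counter:
    """Count every n-gram of length 1..max_n in pre-tokenized sentences."""
    counts: Counter = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            # Slide a window of width n across the sentence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def filter_by_frequency(counts: Counter, min_count: int) -> Dict[NGram, int]:
    """Keep only n-grams whose frequency is strictly greater than min_count."""
    return {ngram: c for ngram, c in counts.items() if c > min_count}


# Toy, already-tokenized corpus (an assumption made for illustration).
toy_corpus = [["a", "b", "c"], ["a", "b"], ["a", "b", "c"]]
toy_counts = count_ngrams(toy_corpus, max_n=3)
# The slide's threshold is 20; a threshold of 2 is used here to fit the toy data.
frequent = filter_by_frequency(toy_counts, min_count=2)
```

In-memory counting like this would not scale to billions of sentences; a corpus of the size described would need a distributed or disk-based counting scheme, which the slide does not describe.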

  14. CICC Multilingual Dictionary. A collection of Malay, Indonesian, Chinese, and Thai dictionaries containing 50,000 basic words with POS tags; some contain English translations. A technical term dictionary for each language is also available. Contained in 1 CD-ROM per language. CICC: Center of the International Cooperation for Computerization.

  15. IPAL Lexicon of Basic Japanese. Contains 861 verbs, 136 adjectives, and 1,081 nouns, plus a glossary. English translations are also provided for the nouns in the glossary. Contained in 1 CD-ROM.

  16. Summary. 1. There are several distributors of language resources in Japan. 2. GSK is the only language resource consortium qualified as an NPO in Japan. 3. GSK plans to collaborate with the Language Grid Project.
