
GSK: Development and Distribution of Resources

Licensing and Distribution of Resources and Applications

Hitoshi ISAHARA

GSK: Gengo Shigen Kyokai (Language Resource Association)

National Institute of Information and Communications Technology (NICT)



Organizing Creation & Utilization of Language Corpora

Creating language corpora incurs costs.

Using them requires a system for distributing corpora.

Some activities started in the early 1990s.

1992 LDC in U.S.A.

1995 ELRA in Europe

Regional Conference on Localized ICT Development and Dissemination across Asia

Jan. 15, Vientiane, Laos



Japanese Activities

GSK: Gengo Shigen Kyokai

(Language Resource Association)

Launched in 1999.

Reformed as an NPO in 2003.

Project accepted in 2005 for 3 years.

Text corpora are its main concern at present.

NII-SRC distributes speech corpora.


GSK and NII-SRC

Language Resource Association (GSK)

A nonprofit organization collecting and distributing text and speech corpora.

http://www.gsk.or.jp/

NII-Speech Resources Consortium (NII-SRC)

Collects and distributes most major speech corpora.

http://research.nii.ac.jp/src/eng/

These two organizations aim to play central roles in collecting and distributing speech and language corpora in Japan.



Diagram labels (untitled slide):

JEITA (Japan Electronics and Information Technology Industries Association)

GSK

NII-SRC

Knowledge Information Processing Technologies Committee

NII: National Institute of Informatics

NICT: National Institute of Information and Communications Technology

Language Resource Sub-committee

TCL

Natural Language Processing Portal Site

SHACHI: Language Resource Metadata DB


Purpose of GSK

Collection, distribution, investigation, research, and standardization of electronic data and software tools necessary for the promotion of science, technology, education and industry concerning natural language.


GSK Organization

President

Two vice presidents

11 board members

25 steering committee members

All are voluntary workers.


No-fee Distribution

Diagram labels: Corpus, Provider, Distribution permission, User, GSK, Payment, Agreement

As a rule, the cost of handling corpora falls on the user, though the corpus itself is free of charge.


Agency

Diagram labels: Agency, Request, GSK, Provider, User, Form, Commission, Payment, Agreement

Corpus providers entrust GSK with handling the requests received from users. GSK mediates between users and providers.


Advertising

Diagram labels: Provider, User, Ad request, GSK, Publicity, Ad rate, Payment, Agreement

Corpus providers entrust GSK with advertising useful information about their data or corpora.


Some Examples of GSK Corpora

JEITA Multimodal Corpus

Japanese Web N-gram Version 1

CICC Multilingual Dictionary

IPAL Lexicon of Basic Japanese


JEITA Multimodal Corpus

A corpus of person-to-person, task-oriented dialogues. It includes 80 minutes of video covering 9 conversations on the topics “faces” and “travel”. The speech data are transcribed and annotated for morphemes, dialogue structure, and prosody. Distributed on 1 DVD-R (800 MB).


Japanese Web N-gram Version 1

N-grams extracted from publicly available Japanese web pages crawled by Google. Pages that require special permission to browse, or that are marked noarchive/noindex, are not included. N-grams of length 1 to 7 with frequency greater than 20 were extracted from approximately 20 billion sentences.

Contained in 6 DVD-Rs (26 GB after gzip compression).
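
The extraction rule described on this slide (N-grams of length 1 to 7, kept only when their frequency exceeds 20) can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the corpus: the function name `count_ngrams`, its parameters, and the toy data are hypothetical, and the input is assumed to be already split into tokens (Japanese text would first need a word segmenter, which is not shown).

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Sequence


def count_ngrams(
    sentences: Iterable[Sequence[str]],
    max_n: int = 7,
    threshold: int = 20,
) -> dict[tuple[str, ...], int]:
    """Count all 1..max_n-grams and keep those whose count is > threshold.

    `sentences` is assumed to be pre-tokenized (hypothetical input format).
    """
    counts: Counter[tuple[str, ...]] = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            # Slide a window of width n over the sentence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    # Apply the frequency cutoff ("greater than", as stated on the slide).
    return {gram: c for gram, c in counts.items() if c > threshold}


# Toy data with a small threshold so the cutoff is visible.
toy_sentences = [["a", "b", "c"], ["a", "b"], ["a"]]
kept = count_ngrams(toy_sentences, max_n=2, threshold=1)
print(kept)
```

With the toy input, only the N-grams seen more than once survive the cutoff. At the scale described on the slide, the counts would not fit in a single in-memory dictionary, so a real pipeline would need some form of distributed or on-disk counting.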


CICC Multilingual Dictionary

A collection of Malay, Indonesian, Chinese, and Thai dictionaries containing 50,000 basic words with POS tags; some contain English translations. A Technical Term Dictionary for each language is also available.

Contained in 1 CD-ROM for each language.

CICC: Center of the International Cooperation for Computerization


IPAL Lexicon of Basic Japanese

Contains 861 verbs, 136 adjectives, and 1,081 nouns, plus a glossary. English translations are also provided for the nouns in the glossary.

Contained in 1 CD-ROM.


Summary

1. There are several distributors of language resources in Japan.

2. GSK is the only language resource consortium qualified as an NPO in Japan.

3. GSK plans to collaborate with the Language Grid Project.
