
Language Service Management with the Language Grid

Language Service Management with the Language Grid. NICT Language Grid Project Yohei Murakami E-mail: yohei@nict.go.jp Web: http://langrid.nict.go.jp/. Existing frameworks to combine language resources (data and tools) are constructed for NLP professionals


Presentation Transcript


  1. Language Service Management with the Language Grid. NICT Language Grid Project. Yohei Murakami. E-mail: yohei@nict.go.jp. Web: http://langrid.nict.go.jp/

  2. Background. Existing frameworks for combining language resources (data and tools) are built for NLP professionals. End users have difficulty combining existing language resources and using them in the field, for two reasons: limited knowledge of language resources, and complex contracts and intellectual property rights. The Language Grid is a trial of service-oriented collective intelligence for sharing language resources worldwide. Users can combine existing language resources (machine translators, morphological analyzers, dictionaries, etc.) to create customized language services. Users can also create their own language resources and use them to customize the language services further. Public Language Grid: 120 groups from 18 countries share more than 60 language services.

  3. The Language Grid. Sharing language resources such as dictionaries and machine translators around the world, and providing language support for multicultural societies. Application areas: education, medical care, disaster management, and more. Example uses: sharing multilingual information, translation services at hospital receptions, Universal Playground. Organizations shown: German Research Center for Artificial Intelligence, Kookmin University, Stuttgart University, National Institute of Informatics, Princeton University, National Research Council (Italy), Google Inc., Chinese Academy of Sciences, NICT, NTT Research Labs, Asian Disaster Reduction Center, NECTEC, Univ. of Indonesia.

  4. Service-Oriented Approach • Enactment of a wide variety of policies and licenses: policies and licenses depend on providers. This differs from a content-based CI framework, which relies on a common license (e.g., Wikipedia). • Protecting intellectual property rights of the resources: access control based on policies. • Combining services freely: any combination of services is available. Diagram labels: users; request; access controller + coordination engine; service interface + policy; resources + license A; resources + license B; resources + license C; resource providers.

  5. Service Layers of the Language Grid (top to bottom)
     • Intercultural Collaboration Tools (application system; customized multilingual environment): multilingual communication is supported using various language services.
     • Composite Language Services (back translations, domain-specific translations, ….): multiple atomic language services are composed using workflows.
     • Atomic Language Services (machine translations, morphological analyzers, dictionaries, parallel texts…): language resources are made usable as Web services with standardized interfaces.
     • P2P Service Grid (P2P grid infrastructure; cloud services): allows users to connect to Language Grid servers on the Internet.

  6. P2P Service Grid. Language Grid Core Node: language service management, search and composition, and access control. Language Grid Service Node: provides language resources as Web services. Diagram: nodes share information and invoke services (steps ① to ⑥). Example services shown: Japanese Morphological Analyzer; En to Fr Translator; Korean Morphological Analyzer; Life Science Dictionary (ja, en); Ja to En Translator; Ja to Ko Translator; Multi-language Glossary on Natural Disasters (Ja, En, Ko, Zh, Es, Fr).

  7. Atomic Service • Wrap language resources as Web services equipped with a standard interface. • A language service ontology is required to standardize the interfaces of wrapped language resources such as machine translators or dictionaries. Diagram: each language resource (machine translation, morphological analyzer, dictionary, parallel texts) is exposed through a wrapper as a Web service of the corresponding type.

  8. Standard Interfaces
     • Translation Service. Input: translate(sourceLang, targetLang, source). Output: String.
     • Morphological Analysis Service. Input: analyze(language, text). Output: Morpheme[], where Morpheme = {word, lemma, partOfSpeech}.
     • Bilingual Dictionary. Input: search(headLang, targetLang, headWord, matchingMethod). Output: Translation[], where Translation = {headWord, targetWords[]}.
     • Parallel Text. Input: search(sourceLang, targetLang, source, matchingMethod). Output: ParallelText[], where ParallelText = {source, target}.
     • Pictogram, Paraphrase, …
     Wrapper libraries to ease the implementation of wrappers will be provided as open source (http://langrid.nict.go.jp/langrid-developers-wiki/).
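The standard interfaces on this slide can be sketched as Python types. This is only an illustration, not the Language Grid wrapper library: the snake_case names, the `ToyDictionary` class, and the matching-method values `"COMPLETE"` and `"PREFIX"` are assumptions made for the example. Only the operation names and the field lists come from the slide.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Morpheme:
    word: str
    lemma: str
    part_of_speech: str


@dataclass(frozen=True)
class Translation:
    head_word: str
    target_words: list[str]


class TranslationService(Protocol):
    def translate(self, source_lang: str, target_lang: str, source: str) -> str: ...


class MorphologicalAnalysisService(Protocol):
    def analyze(self, language: str, text: str) -> list[Morpheme]: ...


class BilingualDictionary(Protocol):
    def search(self, head_lang: str, target_lang: str,
               head_word: str, matching_method: str) -> list[Translation]: ...


class ToyDictionary:
    """Hypothetical in-memory wrapper that satisfies BilingualDictionary."""

    def __init__(self, entries: dict[tuple[str, str], dict[str, list[str]]]):
        # entries maps (head_lang, target_lang) -> {head_word: [target words]}
        self._entries = entries

    def search(self, head_lang, target_lang, head_word, matching_method):
        table = self._entries.get((head_lang, target_lang), {})
        if matching_method == "COMPLETE":      # assumed value: exact match
            hits = [w for w in table if w == head_word]
        elif matching_method == "PREFIX":      # assumed value: prefix match
            hits = [w for w in table if w.startswith(head_word)]
        else:
            raise ValueError(f"unsupported matching method: {matching_method}")
        return [Translation(w, table[w]) for w in hits]


dictionary = ToyDictionary({("ja", "en"): {"hinanbasho": ["disaster shelter"]}})
results = dictionary.search("ja", "en", "hinanbasho", "COMPLETE")
```

Because every dictionary wrapper exposes the same `search` signature, a caller can swap one dictionary for another without changing its own code, which is the point of standardizing the interfaces.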

  9. Composite Service. To create a new language service, describe an abstract workflow and register it with the Language Grid Core Node. When invoking the service, assign a concrete atomic service to each task in the abstract workflow, and put the binding information into the SOAP header. Example: an abstract workflow for two-hop translation (translation ja->en, then translation en->de). Diagram labels: Change Service ……; translation services: Web Transer, JServer.
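The separation between an abstract workflow and its binding at invocation time can be sketched as follows. This is a simplified illustration under stated assumptions: a plain dictionary stands in for the binding information that the slide says is carried in the SOAP header, and the task names, service names, and toy translators are hypothetical.

```python
from typing import Callable

# A translator takes (source_lang, target_lang, text) and returns translated text.
Translator = Callable[[str, str, str], str]

# Abstract two-hop workflow: each task names a language pair but no concrete service.
ABSTRACT_TWO_HOP = [
    ("first_hop", "ja", "en"),
    ("second_hop", "en", "de"),
]


def invoke(workflow, bindings: dict[str, str],
           registry: dict[str, Translator], text: str) -> str:
    """Run each task with the concrete service named in `bindings`."""
    for task, source_lang, target_lang in workflow:
        service = registry[bindings[task]]
        text = service(source_lang, target_lang, text)
    return text


# Hypothetical stand-ins for real translation services; they only tag the text.
def toy_service_a(source_lang, target_lang, text):
    return f"A[{source_lang}->{target_lang}]({text})"


def toy_service_b(source_lang, target_lang, text):
    return f"B[{source_lang}->{target_lang}]({text})"


registry = {"service_a": toy_service_a, "service_b": toy_service_b}

# The same abstract workflow, bound to different services per invocation.
result = invoke(ABSTRACT_TWO_HOP,
                {"first_hop": "service_a", "second_hop": "service_b"},
                registry, "konnichiwa")
```

Changing only the `bindings` argument swaps the service used for a hop, which mirrors the "Change Service" step on the slide without touching the registered workflow.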

  10. Workflow can be Complex! Two composite services built from atomic services are shown.
     • Multilingual Backtranslation (ja->zh->ja, ja->de->ja, ja->en->ja): translation ja->zh, ja->de, and ja->en, followed by translation zh->ja, de->ja, and en->ja. Output: 3 translation results and 3 back translation results.
     • Japanese-German Domain Specific Translation. Diagram labels: MeCab Japanese Morphological Analysis (by NTT CS); technical term extraction; "remaining terms?" (yes/no); Pangaea's Community Dictionary (by NPO Pangaea); technical term multilingual dictionary; intermediate code table; intermediate code insertion; translation ja->de, composed of translation ja->en with JServer (by Kodensha) and translation en->de with Web Transer (by Cross Language); term replacement. Output: Japanese-German translation.
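The exact control flow of the domain-specific translation workflow is not fully recoverable from the slide, so the following is only one plausible reading of the "intermediate code insertion" and "term replacement" steps: technical terms are replaced with placeholder codes before machine translation, and the codes are replaced with dictionary translations afterwards. The code format, function name, and toy translator are assumptions.

```python
from typing import Callable


def translate_with_term_codes(text: str,
                              term_dictionary: dict[str, str],
                              translate: Callable[[str], str]) -> str:
    """Protect dictionary terms with codes, translate, then restore target terms."""
    code_table: dict[str, str] = {}
    for index, (term, target_term) in enumerate(term_dictionary.items()):
        if term in text:
            code = f"XTERM{index}X"          # assumed intermediate code format
            text = text.replace(term, code)  # intermediate code insertion
            code_table[code] = target_term
    translated = translate(text)             # the translator must leave codes intact
    for code, target_term in code_table.items():
        translated = translated.replace(code, target_term)  # term replacement
    return translated


# Hypothetical translator that only wraps the text, so the flow is visible.
def toy_translate(text: str) -> str:
    return f"<de>{text}</de>"


output = translate_with_term_codes(
    "hinanbasho wa tooi",
    {"hinanbasho": "Notunterkunft"},
    toy_translate,
)
```

A real workflow would also need to handle overlapping terms (for example by matching longer terms first) and translators that alter the codes; both are left out of this sketch.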

  11. Language Service Management Architecture. Diagram labels: language resource providers; language service users; application system; Service Manager (policy, WSDL, create, get, monitor, service registration, access constraint, endpoint URL, access log); Service Invoker (access logging, load balancer, access controller); virtual endpoint; atomic service engine; language service wrapper; language resources; composite service engine (ActiveBPEL, Java, JavaScript, etc.); Language Grid Service Node; Language Grid Core Node. Calls 1 to 4 use SOAP.

  12. Service Manager: a Web-based tool to manage Language Grid users, language resources, and language services on the Language Grid (http://langrid.org/operation/service_manager/).

  13. Monitoring & Control of Language Services. To monitor and control the language services: • Monitor the access date, IP address, and data transfer size of each request • Set access rights for each user • Control the number of accesses per day/month/year and the data transfer size

  14. Case Studies (1) • NICT • It is hard to provide the EDR (concept/bilingual dictionary) services for free because NICT sells EDR. • Set limits of 1000 accesses/month and 15 KB/access for the bilingual dictionary • Set limits of 2000 accesses/month and 35 KB/access for the concept dictionary (these policies are configured so that downloading the whole data would take almost one year) • Allow only members to access the EDR services without any restrictions • Kodensha Co., Ltd. • It is hard to provide the J-Server service for free to users who are Kodensha's business targets • Prohibit those users from accessing the free J-Server service • Allow only members to access the latest, high-quality J-Server service operated by Kodensha
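A policy such as "1000 accesses/month and 15 KB/access, with members unrestricted" can be sketched as a small access controller. This is an illustration of the idea only, not the Language Grid implementation: the class names, the in-memory counter, and the reading of 1 KB as 1024 bytes are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    max_accesses_per_month: int
    max_bytes_per_access: int


class AccessController:
    """Hypothetical per-user quota check; members bypass all limits."""

    def __init__(self, policy: AccessPolicy, members=()):
        self.policy = policy
        self.members = set(members)
        self._counts = defaultdict(int)  # (user, "YYYY-MM") -> accepted accesses

    def allow(self, user: str, month: str, response_bytes: int) -> bool:
        if user in self.members:
            return True
        if response_bytes > self.policy.max_bytes_per_access:
            return False
        key = (user, month)
        if self._counts[key] >= self.policy.max_accesses_per_month:
            return False
        self._counts[key] += 1
        return True


# Limits taken from the bilingual dictionary case; 1 KB = 1024 bytes is assumed.
bilingual_policy = AccessPolicy(max_accesses_per_month=1000,
                                max_bytes_per_access=15 * 1024)
controller = AccessController(bilingual_policy, members={"member_user"})

first_request_ok = controller.allow("guest_user", "2010-05", 4096)
oversized_request_ok = controller.allow("guest_user", "2010-05", 20 * 1024)
```

Rejected requests do not consume quota in this sketch; whether a real deployment counts them is a policy decision the slides do not specify.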

  15. Case Studies (2) • Kyoto University • Has a responsibility to prevent illegal usage because Kyoto U. provides services based on resources it purchased from companies • Monitor whether the services are abused • Detect excessive access from a specific IP address • GSK • Promotes language resource distribution on behalf of language resource providers • Deploy the language resources on GSK's server and allow users who purchase the resources to access them (this hosting model can reduce the burden on language resource providers of selling and operating the resources)
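Detecting excessive access from a specific IP address can be sketched as a count over access log entries. The record fields follow the monitored items named earlier in the deck (access date, IP address, data transfer size); the class name, function name, and the per-day threshold are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessLogEntry:
    accessed_at: datetime
    ip_address: str
    transfer_bytes: int


def excessive_ips(entries, max_requests_per_day: int) -> list[str]:
    """Return IP addresses that exceeded the daily request threshold on any day."""
    per_day = Counter((e.ip_address, e.accessed_at.date()) for e in entries)
    flagged = {ip for (ip, _day), count in per_day.items()
               if count > max_requests_per_day}
    return sorted(flagged)


# Made-up log data: one address sends five requests in a day, another sends one.
log = [AccessLogEntry(datetime(2010, 5, 1, 9, minute), "192.0.2.10", 2048)
       for minute in range(5)]
log.append(AccessLogEntry(datetime(2010, 5, 1, 10, 0), "192.0.2.20", 512))

flagged = excessive_ips(log, max_requests_per_day=3)
```

The same grouping could sum `transfer_bytes` instead of counting requests if the concern is bulk downloading rather than request volume.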

  16. Service-Oriented Approach: Pros and Cons. Pros • From Having to Using: the service-oriented approach can relax complex issues of intellectual property rights of language resources. • Cloud Services: the service-oriented approach allows resource providers to scale up the usage of language resources. • Service Federation: the service-oriented approach allows language services to be easily combined with other services, e.g., e-learning services, ambient intelligence services, etc. Cons • Maintenance Cost: language services should be maintained and provided continuously by secure providers. • Market Pull: language services should be designed based on market demand, which is hard for academic communities to control.

  17. Summary • Proposed a service-oriented collective intelligence platform to manage language services: it enables language resource providers to provide their services while retaining ownership of their resources. • Developed a language service management architecture: monitoring of language services, access control of language services, and Service Manager, a Web-based GUI. • Collected experience from the first operation of a service-oriented platform: several language service policies, and the pros and cons of the service-oriented approach.

  18. Role of the Language Grid. Difficulties often arise when trying to share and combine existing language resources and use them in the field: complex contracts and intellectual property rights, and non-standardized application interfaces. The Language Grid improves the accessibility and usability of those language resources and encourages users to create new language services that suit their needs by combining several language resources: • Standardize interfaces of language resources with wrappers • Publish language resources not as source programs but as Web services • Combine language resources with Web service workflows • Manage the profiles of those services • Control access to those resources

  19. Language Grid Core Node and Service Node. Diagram labels: language resource providers; language service users; Language Service Management (WSDL, create, get, monitor, service registration, access constraint, endpoint URL, access log); virtual endpoint; Service Invoker; atomic service engine; IC tools; composite service engine (ActiveBPEL, UIMA, HoG, etc.); language resources; Language Grid Service Node; Language Grid Core Node. Calls 1 to 4 use SOAP; call 5 uses HTTP, function calls, etc.

  20. Participants / Language Services • Participants (17 countries, 118 groups) • University / Research Institute • Kyoto Univ. (Japan), Shanghai Jiaotong Univ. (China), Univ. of Stuttgart (Germany), IT Univ. of Copenhagen (Denmark), Princeton Univ. (U.S.), DFKI (Germany), CNR (Italy), Chinese Academy of Sciences (China), NECTEC (Thailand), and more. • NPO/NGO/Public Sector • NGOs for disaster reduction, public junior high schools, city boards of education, and more. • Corporate (CSR activities / language resource providers) • NTT, Toshiba, Oki, Google, Kodensha, Translution, and more. • Language Services (more than 60) • Machine Translator • J-Server, Web-Transer, Toshiba, Parsit, Google Translate, and more. • Dictionary, Parallel Text • EDR, WordNet, Life Science Dictionary, Multi-language Glossary on Natural Disasters, and more. • Morphological Analyzer • Dependency Parser • Composite Services

  21. Atomic Service: examples of wrapped resources on the Language Grid
     • Dictionary Service (wraps a dictionary; searches for a translated word). Input: "Hinanbasho" (disaster shelter). Output: "disaster shelter".
     • Parallel Text Service (wraps parallel texts; searches for similar translated text). Input: "Hinanbasho ha ie kara tooi desu" (The disaster shelter is far from my house). Output: "The disaster shelter is []."
     • Machine Translation Service (wraps an MT system; translates by machine). Input: "Hinanbasho ha ie kara tooi desu". Output: "Disaster shelter is school close from a house."
     • Human Translation Service (wraps a human translator; translates with high quality). Input: "Hinanbasho ha ie kara tooi desu". Output: "Your disaster shelter is the school closest to your house."

  22. Fourth Layer: Intercultural Collaboration Tools. Language Grid Toolbox (developed by NICT). Multilingual BBS and text translation (input, translation result, back translation result): • Submit messages in users' mother languages • Improve the translation result by manual post-editing • Estimate the translation accuracy using back translation. Language resource creation (multilingual dictionary, multilingual corpus; dictionary data, parallel texts): • Create multilingual dictionaries specific to users' communities. The Toolbox runs on XOOPS (open source software) and uses language services on the Language Grid. The Toolbox was released as OSS: http://langrid-tools.nict.go.jp/toolbox/

  23. Operation of the Language Grid. Diverse stakeholders: language service users, language resource providers, computation resource providers, and the Language Grid operator. Language resource providers and computation resource providers control their own resources. The Language Grid for non-profit use has been operated by Kyoto Univ. since December 2007. The Letter of Agreement on the Language Grid is available (http://langrid.org/operation/); 118 organizations from 17 countries have signed the agreement.

  24. Language Grid Association (http://langrid.org/associaiton/)

  25. Intercultural Collaboration Tools: Language Grid Playground (developed by Kyoto Univ.) http://langrid.org/playground

  26. M3 (developed by Wakayama Univ.), with one interface for medical staff and one for foreign patients. http://www.langrid.org/association/m3support/indexe.html

  27. Pangaea Community Site (NPO Pangaea) • Pangaea is an NPO that aims to support communication between children in various countries • The Pangaea Community Site allows participants and staff to • Communicate in their own language using translation services • Japanese, Korean, English, German • Revise machine translation results for other people

  28. Pangaea as a Language Resource Provider: pictograms and community dictionary • Pangaea is also a provider of language resources • Pictograms designed for communication between children in different countries • Pangaea Community Dictionary, which contains 500 terms for Pangaea's activities • e.g., Pangaean (participant in activities), Koetsuna (ice-breaking activity for children). Both resources are provided as Web services and combined with other services on the Language Grid: the Pangaea Community Site combines Korean, Japanese, English, and German morphological analyzers, the community dictionary, and 2 machine translators.

  29. Multilingual Communication System (Kyoto University, Ritsumeikan University). Fujimi Junior High School: 584 students in total, including Filipino 4, Chinese 6, Korean 2, Peruvian 1. Diagram: a Japanese user and a Chinese user communicate using autocomplete, translation, and back translation.
