
Human Language Technologies & the European Research Area Joseph Mariani



Presentation Transcript


  1. Human Language Technologies & the European Research Area • Joseph Mariani, Former Director, ICT Department, French Ministry of Research & LIMSI-CNRS

  2. LT for a Multilingual Europe • Language as a specific issue for Europe • Economic, cultural and political challenge with 2 dimensions: • Preserve the cultures of the EU Member States • Preference for native language (Web sites in German (75%)...) • Allow for communication across Member States • 50% of European citizens speak only one language (97% in Japan) • 1650 translators at the EC - 1.4 Mpages translated per year • 30% European Parliament budget (300 M€) – 500 translators • EU: 25 countries, 20 official languages / 380 language pairs • Enormous cost for the EU, yet mandatory • Need for the assistance of Language Technologies • Huge effort (# LT * # languages), too large for the EC alone • Effort should be shared with EU Member States • Would meet the needs of the European Union, but would also put Europe, and the European industry, in a strong position for providing tools for handling multilingualism worldwide Multilingualism & Language Technology : a challenge for Europe
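The figure of 380 language pairs on slide 2 is consistent with counting translation directions among 20 official languages, i.e. 20 × 19 ordered (source, target) pairs. The short Python sketch below reproduces that arithmetic and the "# LT * # languages" effort estimate. The function names and the count of 12 technologies are illustrative assumptions, not figures from the presentation.

```python
def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


def unordered_pairs(n_languages: int) -> int:
    """Number of language pairs when direction is ignored."""
    return n_languages * (n_languages - 1) // 2


def effort_cells(n_technologies: int, n_languages: int) -> int:
    """Size of the technology-by-language grid (# LT * # languages)."""
    return n_technologies * n_languages


if __name__ == "__main__":
    n = 20  # official EU languages, as stated on the slide
    print(directed_pairs(n))    # translation directions
    print(unordered_pairs(n))   # pairs ignoring direction
    # 12 is a hypothetical number of language technologies, for illustration only
    print(effort_cells(12, n))
```

Counting directions rather than unordered pairs matters for translation, since translating from German into French is a different task from translating from French into German.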

  3. Building the European Research Area • European Research Area (ERA) • Need to coordinate EC (< 15%) and MS (> 85%) research efforts • The ERA instruments • ERA-Net in FP6 (CA & SSA) to coordinate MS national / regional programs (specific action in DG-Research) : EC only funds coordination activities, not R&D • ERA-Net+ in FP7 (CSA) to also coordinate with EC programs (thematic action) : EC may also fund R&D activities (?) • Article 169 to coordinate EC+MS+industrial efforts • Needs a joint European Council and Parliament decision • Only one experience to date, in infectious diseases (200 M€ * 3 = 600 M€) • Topics mentioned for FP7: SMEs, research in the Baltic Sea, Metrology… • ESFRI (European Strategy Forum on Research Infrastructures) • LT fits well and naturally with the ERA • Coordinate the national / regional efforts, mostly devoted to national / regional languages, with the EC effort, mostly addressing the multilingual dimension and the general coordination • Shows a major added value of the EU, in full agreement with the subsidiarity principle

  4. Support to LT in France : Techno-langue • Report to the Prime Minister (November 2000) • Need to develop Language Technologies for the French language • Techno-langue Action launched in 2002 • Basic Technological Research (RTB) • Articulate with related existing programs (RRIT) • Funded by 3 ministries : • Research, Industry, Culture • Call for Proposals • Up to 3-year projects (2003-2006) • Set up an infrastructure to conduct research in LT for French • Language Resources (Data / Tools) • Evaluation (Technology / Applications) • Standards • Technological survey

  5. Funded projects • Budget: 20 M€ effort - 7.5 M€ public funding (over 3 years) • 94 participants (industry, research, public agencies, foreign) • 21 funded projects: • 10 on Language Resources (data and tools) • 2 on Standards (Spoken / Written) • 1 on Technological survey (Portal) : http://www.technolangue.net • 8 on Technology Evaluation (campaigns) • Written language processing (5) • EASY: Syntactic parsing • ARCADE 2: Text alignment • CESART: Terminology extraction • EQUER: Information query • CESTA: Machine translation • Spoken Language processing (3) • EVASY: Speech synthesis • MEDIA: Spoken dialog • ESTER: Speech transcription / automatic indexing

  6. Sharing efforts on LT in Europe • LT fits well and naturally with the ERA • The EC would primarily support : • the coordination: management, standards, technology evaluation, communication... • Each Member State would primarily support the cost of covering its language(s): • Language Resources (essential) : (annotated) corpora (spoken / written), lexicons (including pronunciations), dictionaries… • Language-specific technology development / adaptation • EC and MS would support the cost of: • Developing core Language Technologies: • Speech recognition, synthesis, understanding, spoken dialog, language tagging, parsing, analysis, generation, text retrieval, document understanding, machine translation, spoken translation, etc. • Developing innovative applications using HLT

  7. Lang-Net proposal • Build up an ERA-Net proposal of an infrastructural nature • Language Resources, LT evaluation, Standards, Survey • Sharing of information • Strategic activities and Best Practices • Implementation of joint activities • Transnational research activities • Partnership of EU countries or regions having LT programs • 11 countries / regions in partnership : Germany, France, Italy, Trento region, Czech Republic, Denmark, Norway, The Netherlands / Belgium-Flanders (Dutch Language Union), Spain, Basque region, Sweden • Austria, Catalonia, Finland, Greece, Iceland, Portugal, Switzerland, UK (contacts) • Extendable to other partners • NMS (Slovenia, Cyprus, Poland, Hungary, Malta, Baltic countries…) • AS (Romania, Bulgaria…) • USA, Japan, South Africa, Israel, Canada… (contacts)

  8. Situation at the EC • DG Research (ERA-Net program) • Lang-Net proposal submitted in March 2005, not selected • Looking forward to the thematic ERA-Net+ in FP7 • DG INFSO + Media • « Science & Technology Forum on Multilingualism » • June 2005 and February 2006 in Luxembourg • Visit of a French delegation to H. Forster & B. Smith (September 2005) • DG Education, training, culture and multilingualism • « A new framework strategy for multilingualism » (Nov. 2005) • http://europa.eu.int/languages/ : Web site in the 20 EU languages • EC will set up a High Level Group on Multilingualism • An EU ministerial conference will be held • A further communication will be presented by the EC to Parliament and Council • Committee of EU regions (official use of regional Spanish languages) • TC-Star report : Introduction signed by V. Reding & J. Figel

  9. Situation at the EC • Memorandum for a Digital Europe (submitted by France to the Finnish presidency) • Includes « LT for a Multilingual Europe » as a specific research topic • European Digital Library • Stresses the multilingual (crosslingual ?) dimension and need for tools • ENISA (European Network and Information Security Agency) • Create a European multilingual information sharing and alert system • CLARIN : Common Language Resources & Technology Infrastructure • Labelled within ESFRI • Easy access to Language Resources and Technology for the Humanities community • Well in agreement with the objective of coordinating activities and of setting up the necessary infrastructure • But addresses only part of the needs: • Considers only the Humanities scientific area, neither the ICT nor the industrial ones • A network of research centers specialized in Humanities, not of national programs

  10. Situation in FP7 • FP7 ICT program (2007-2013) • Technology pillar : Simulation, Visualization, Interaction, mixed realities • Tools for innovative design and creativity in products, services and digital media, and for natural, language-enabled and context-rich interaction and communication • Work program WP1 (2007-2008) • Challenge 2 « Cognitive systems, interaction, robotics » • Objective 2.1 « Cognitive systems, interaction, robotics » • Essentially oriented towards Cognitive robotics • Challenge 4 « Digital libraries » • Multilingual (crosslingual ?) content, summarization… • Strong MS reaction in favor of HLT at ISTC meeting (September 20, 2006) : France asked to add a second objective in Challenge 2 on interaction / LT • Similar content in the V3.0 draft WP (November 17, 2006)

  11. HLT in WP1 (main) • CHALLENGE 2 : COGNITIVE SYSTEMS, INTERACTION, ROBOTICS • Objective 3.2.1.1: Cognitive Systems, Interaction, Robotics • Intuitive multimodal interfaces and interpersonal communication systems providing personalized interactivity in real-world and virtual environments, based on improved human interaction modelling and understanding of contextually-referred communication, for example, by signs and signals in all modes (such as sound, vision, touch) and modalities (such as natural language, both spoken and written), through autonomous adaptation and by addressing user needs, intentions and emotions. • New markets such as novel functionalities for embedded systems and assistive systems for interpersonal communications, such as support of dynamic translation, and effective medical diagnostics and therapeutics. • Explore and validate the use of new ways of combining statistical, knowledge driven and cognitive approaches to language understanding, generation, and translation, by machines. • A principled approach to structuring research in relevant areas, addressing in particular learning in artificial systems, the requirements for cognitive capacities of robotic, interactive and language support systems, and including the development of experimental scenarios, the development or construction of resources for experimentation, and the development of performance metrics and definitions of autonomy levels for artificial systems. • Co-ordination with related national or regional research programmes or initiatives. • Indicative budget distribution • 193 M€ (Call 1 [96 M€], Call 3 [97 M€]) • CP 173 M€, NoE 16 M€, CSA 4 M€

  12. HLT in WP1 (international cooperation) • Development-related ICT research exploitation and cooperation roadmaps (3 sub-themes) • Sub-theme 1: « Language and speech technologies with particular focus on Arabic-speaking regions / countries (including Mediterranean Partner Countries and ACP countries). The overall objective is to reduce language barriers and broaden access, usage and interaction between ICT services and applications. This preparatory action will focus on requirements and options for cost-effective natural language systems (written or spoken) in domains such as automated translation, information retrieval and indexing. It will also aim to reinforce collaboration with Arabic research communities on natural language processing (NLP) methods and benchmarking, including for language resources such as corpora and knowledge bases. » • Indicative budget distribution • 2 M€ (for the 3 sub-themes, one action per sub-theme) (CSA)

  13. Conclusions • Language Technologies are needed for a Multilingual Europe • Effort too large for the EC alone • Programs exist in several EU Member States, at the EC and in various countries worldwide • Perhaps the most suitable topic for the EC/MS cooperation scheme promoted in the construction of the European Research Area • Need to address permanent infrastructural issues and to set up an experimental framework : Language Resources, Evaluation, Standards and Survey • A great opportunity & a grand challenge for Europe • Which is insufficiently present in WP1 of FP7 !!!
