CLARIN and FLaReNet: new European Initiatives for Language Resources and Language Technologies


Nicoletta Calzolari

Istituto di Linguistica Computazionale del CNR, Pisa, Italy

glottolo@ilc.cnr.it

Boulder, March 2008


Today, many vitality & success signs… for LRs

  • In Spoken, Written, Multimodal areas … … in new emerging areas
  • Statistical approaches…
  • Different dimensions & layers: Content (Ontologies), Emotion, Time, …
  • For Evaluation
  • For Training
      • LREC (> 900 submissions); many LRs at COLING and even at ACL!!
      • ELRA (self-sustaining) & LDC
      • LRE (new Journal: N. Ide & NC)
      • ISO-TC37-SC4/WG4 (International Standards for LRs)
      • AFNLP…
      • ESFRI - CLARIN (also political & strategic role)
      • New calls or initiatives in EU, US, ASIA, on LRs, interoperability, cooperation, …

BUT … an important point:

In the ’90s

  • There was a global vision of the field & its main components:
    • Standards
    • Creation of LRs
    • Distribution

Then:

    • Automatic acquisition

… towards the Infrastructure of LRs & LT

ELRA

LDC

While today:

    • There is an ever increasing set of initiatives for new LRs, basic robust technologies, models??, algorithms,
  • We have a LR community culture
  • BUT sort of scattered, opportunistic, not much coherence

Today …

The wealth of data & of basic technologies is such that:

  • We should reflect again on the field as a whole & ask if
    • Standards
    • Creation of LRs
    • Automatic acquisition
    • Distribution

are still “the” important components,

or how they have changed/must change

  • Content interoperability
  • Collaborative creation & Management
  • Dynamic LRs
  • Sharing

could be at the basis of a

new Paradigm for LRs & LT

& of a new Infrastructure ??

… Which new challenges towards a

new & more mature infrastructure of LRs & LTs??

ISO LMF – Lexical Markup Framework

Builds also on

EAGLES/ISLE

Structural skeleton, with the basic hierarchy of information in a lexical entry

+ various extensions;

LMF specs comply with UML modeling principles; an XML DTD allows implementation
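To make the "structural skeleton" idea concrete, here is a minimal Python sketch that builds one lexical entry and serialises it to XML. The element names (LexicalResource, Lexicon, LexicalEntry, Lemma, Sense, feat) loosely follow LMF's core vocabulary, but this is only an illustration under that assumption: it is not validated against the official DTD, and the helper functions and sample entry are hypothetical.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an attribute-value pair as a <feat> child element."""
    return ET.SubElement(parent, "feat", att=att, val=val)


def build_entry(lexicon, entry_id, lemma, pos, senses):
    """Add one LexicalEntry (a Lemma plus its Senses) to a Lexicon element."""
    entry = ET.SubElement(lexicon, "LexicalEntry", id=entry_id)
    feat(entry, "partOfSpeech", pos)
    lemma_el = ET.SubElement(entry, "Lemma")
    feat(lemma_el, "writtenForm", lemma)
    for sense_id, gloss in senses:
        sense = ET.SubElement(entry, "Sense", id=sense_id)
        feat(sense, "gloss", gloss)
    return entry


# Hypothetical sample data: one Italian noun with a single sense.
resource = ET.Element("LexicalResource")
lexicon = ET.SubElement(resource, "Lexicon")
feat(lexicon, "language", "it")
build_entry(lexicon, "le_strada", "strada", "noun", [("s1", "road, route")])

xml_text = ET.tostring(resource, encoding="unicode")
print(xml_text)
```

The fixed hierarchy (resource, lexicon, entry, lemma/sense) plays the role of the skeleton, while the open-ended feat pairs are where extensions would plug in.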

[Figure labels: NEDO Asian Lang.; NICT Language-Grid Service Ontology]

The field is mature

from Monica Monachini

XML-based Abstract Lexicon Interchange Format: Mapping exercise

Major best practices:

  • OLIF
  • PAROLE/SIMPLE
  • LC-Star
  • WordNet - EuroWordNet
  • FrameNet
  • BDef: formal database of lexicographic definitions derived from the Explanatory Dictionary of Contemporary French
  • …others on the way…

Entries from existing lexicons have been mapped to LMF to prove that the model is able to represent many best practices and achieve unification
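A mapping exercise of this kind can be sketched in a few lines of Python. The example below converts two differently shaped toy records, one flat dictionary-style entry and one synset-style entry, into a single common structure and then merges them. All field names, the CommonEntry class and the sample data are hypothetical; they do not reproduce the actual OLIF, WordNet or LMF schemas.

```python
from dataclasses import dataclass, field


@dataclass
class CommonEntry:
    """Shared target structure: one lemma, one part of speech, many senses."""
    lemma: str
    pos: str
    senses: list = field(default_factory=list)


def from_flat_record(rec):
    """Map a flat, one-word-per-record source (hypothetical keys)."""
    return [CommonEntry(rec["headword"], rec["category"], [rec["definition"]])]


def from_synset(syn):
    """Map a synset-style source: one entry per member word."""
    return [CommonEntry(w, syn["pos"], [syn["gloss"]]) for w in syn["words"]]


def unify(entries):
    """Merge entries sharing lemma and part of speech, keeping distinct senses."""
    merged = {}
    for e in entries:
        target = merged.setdefault((e.lemma, e.pos), CommonEntry(e.lemma, e.pos))
        for s in e.senses:
            if s not in target.senses:
                target.senses.append(s)
    return list(merged.values())


flat = {"headword": "road", "category": "noun",
        "definition": "an open way for travel"}
synset = {"words": ["road", "route"], "pos": "noun",
          "gloss": "a way between two places"}

unified = unify(from_flat_record(flat) + from_synset(synset))
for entry in unified:
    print(entry)
```

Once both sources are expressed in the common structure, unification reduces to a merge on shared keys, which is the practical payoff of agreeing on one model.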

from Monica Monachini

Lexical WEB & Content Interoperability  ‘Standards’
  • As a critical step for semantic mark-up in the SemWeb

[Figure labels: NomLex, ComLex, WordNets, SIMPLE, FrameNet, Lex_x, Lex_y, LMF; "with intelligent agents"]

Standards for Interoperability

Enough??

Need of tools to make this vision operational & concrete

New prototype “LeXFlow”:

(http://xmlgroup.iit.cnr.it:98/MILE/lexflow/demo.xhtml)

  • web-based collaborative environment for semi-automatic management/integration of lexical resources
  • enabling interoperability of distributed lexical resources
  • accessed by different types of agents
  • From Language Resources
          • To Language Services

Architecture for cooperative integration of lexicons

[Figure labels: Agent Role1, Agent Role2, Agent Role3, Agent Role4; Coordination; Web service Interface (twice); Simple-Wordnet Relation Calculator; MultiWordnet Relation Calculator; Application; ILI Mapper; Relation Mapper; Data: Italian Simple, Italian Wordnet, Chinese Wordnet]


[Figure: Italian synsets, ILI records and Chinese synsets]

Italian synsets:

  • parte, tratto (N#12348)
  • passaggio, strada, via (N#1290)
  • curvatura, svolta, curva (N#20944)
  • carreggiata (N#21225)

Italian relation labels: iperonimia/HYP, meronimy/MPT, iponimia/HPO

ILI records (ILI 1.5 → ILI 1.6):

  • road, route: ILI1.5-3001757-n → ILI1.6-3243979-n
  • stretch: ILI1.5-5691718-n → ILI1.6-???
  • passage: ILI1.5-2857000-n → ILI1.6-3092396-n
  • roadway: ILI1.5-3002522-n → ILI1.6-3245327-n
  • bend, crook, turn: ILI1.5-8488101-n → ILI1.6-9992072-n

Chinese synsets:

  • tong_dao (通道) (N#03092396)
  • che_dao (車道) (N#3245327)
  • dao_lu, dao, lu (道路, 道, 路) (N#03243979)
  • wan (彎) (N#9992072)

Chinese relation labels: 上位(泛稱)詞_為/HYP, 下位(特指)詞_為/HPO, 部件_部份詞_為/MPT

Annotations: A new proposed mero relation; Synonym; Derived; Synonym; Reinforcement & validity

LexFlow
  • Architecture for making distributed wordnets interoperable
  • It lends itself to different applications in LR processing:
    • Enrichment of existing lexical resources
    • Creation of new resources
    • Validation of existing resources
  • Can provide a platform for cooperative & collective creation & management of LRs, by providing a web-based environment for the collaboration & interaction of distributed agents and resources
  • Prototype of a web application supporting the GlobalWordNet Grid initiative, i.e. a shared multi-lingual knowledge base for cross-lingual processing based on distributed resources over the Grid
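The enrichment and validation uses listed above can be illustrated with a small Python sketch: relations known in one wordnet are projected onto another through a shared interlingual index (ILI), and those missing in the target become proposals. This is not LexFlow's actual algorithm. The function name, identifiers and relation triples are hypothetical toy data (the words echo the road example, but the specific links are assumptions made for illustration).

```python
def project_relations(source_rels, source_to_ili, target_to_ili, target_rels):
    """Propose target-side relations implied by source relations via the ILI.

    A relation (a, rel, b) is projected only if both a and b map to ILI
    records that some target synset also maps to. Relations already present
    in the target are treated as confirmed and are not proposed again.
    """
    ili_to_target = {ili: syn for syn, ili in target_to_ili.items()}
    proposals = set()
    for a, rel, b in source_rels:
        ta = ili_to_target.get(source_to_ili.get(a))
        tb = ili_to_target.get(source_to_ili.get(b))
        if ta and tb and (ta, rel, tb) not in target_rels:
            proposals.add((ta, rel, tb))
    return sorted(proposals)


# Toy Italian wordnet (hypothetical relations; "it:tratto" has no ILI link).
it_rels = {
    ("it:curva", "MPT", "it:strada"),
    ("it:carreggiata", "HPO", "it:strada"),
    ("it:tratto", "MPT", "it:strada"),
}
it_to_ili = {
    "it:curva": "ili:bend",
    "it:strada": "ili:road",
    "it:carreggiata": "ili:roadway",
}

# Toy Chinese wordnet, which already holds one of the relations.
zh_to_ili = {
    "zh:wan": "ili:bend",
    "zh:dao_lu": "ili:road",
    "zh:che_dao": "ili:roadway",
}
zh_rels = {("zh:che_dao", "HPO", "zh:dao_lu")}

proposals = project_relations(it_rels, it_to_ili, zh_to_ili, zh_rels)
print(proposals)
```

In a distributed setting each dictionary would sit behind its own web service; here they are in-memory only to keep the sketch self-contained.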

New project: KYOTO

Some steps for a “new generation” of LRs
  • From huge efforts in building static, large-scale, general-purpose LRs
  • To non-static LRs rapidly built on demand, tailored to specific user needs
  • From closed, locally developed and centralized resources
  • To LRs residing over distributed places, accessible on the web, choreographed by agents acting over them
  • From Language Resources
  • To Language Services

UIMA at ILC
  • Create an infrastructure to allow:
    • Distributed access to resources
    • Creation of shared resources
    • Use of methods to access NLP technologies
  • Integrate available software via Web Services
  • Standardise resources to be accessed from other research centers

Distributed Language Services

A long-term scenario implying

    • content interoperability standards,
    • supra-national cooperation and
    • development of architectures enabling accessibility
  • Create new resources on the basis of existing
  • Exchange and integrate information across repositories
  • Compose new services on demand
  • Collaborative & collective/social development and validation, cross-resource integration and exchange of information
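"Composing new services on demand" can be pictured as chaining small, independent services into a pipeline. The Python sketch below does this with plain functions standing in for remote services. The function names, the tiny lexicon and the composition helper are all hypothetical, and a real deployment would presumably wrap web-service calls rather than local callables.

```python
from functools import reduce


def compose(*services):
    """Chain services left to right: each output feeds the next service."""
    def pipeline(data):
        return reduce(lambda acc, service: service(acc), services, data)
    return pipeline


def tokenize(text):
    """Stand-in tokenisation service: lowercase and split on whitespace."""
    return text.lower().split()


def make_lookup(lexicon):
    """Build a stand-in lexicon service from a word-to-gloss mapping."""
    def lookup(tokens):
        return [(t, lexicon.get(t, "?")) for t in tokens]
    return lookup


# A new "glossing" service assembled on demand from two existing ones.
glosser = compose(tokenize, make_lookup({"strada": "road", "curva": "bend"}))
result = glosser("Strada curva")
print(result)
```

Because each step only depends on the shape of its input and output, agreed interchange formats are what make this kind of recombination possible.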

Language Grid

Wiki


Many dimensions around the notion of language

finally

  • We need to put together
  • technical,
  • organisational,
  • strategic,
  • economic,
  • political issues of LRs

Two new European Infrastructural & Networking Initiatives

Multilingualism

Political issues

e.g. a commonly agreed list of minimal requirements for “national” LRs: BLARK

Need of bodies for

a broad research agenda & strategic actions for LT&LRs (W/S /MM)

based on all the dimensions

Interdisciplinarity &

Multidisciplinarity

  • Cultural issues
    • Language … and cultural identity
    • Language … and the Humanities
  • Economic,
  • social issues
    • Applications
    • Services

Technical issues

Which Communities?

Technologies exist, but the infrastructure that puts them together and sustains them is still missing

for

  • Humanities
  • Social Sciences
  • Digital Libraries
  • Cultural Heritage
  • Language Resources
  • Language Technologies
  • Standardisation

core

Enabling infrastructure

CLARIN: ResInfra

FLaReNet: Network

Multilinguality

on

  • Grid
  • Semantic Web
  • Ontologists
  • ICT

Focus on cooperation

  • Many application domains

(eculture, egovernment, ehealth, …)

for


ESFRI Research Infrastructures

CLARIN

Common Language Resources and Technologies Infrastructure

for the Humanities & Social Sciences

Large-scale pan-European collaborative effort (31+ countries)

  • Make LRs & LTs available & readily usable to scholars of humanities & social sciences (& all disciplines)
  • Need to overcome the present fragmented situation by harmonising structural and terminological differences
  • Basis is a Grid-type infrastructure and Semantic Web technology
  • The benefits of computer enhanced language processing become available only when a critical mass of coordinated effort is invested in building an enabling infrastructure, which can provide services in the form of provision of tools & resources as well as training & counseling across a wide span of domains
  • The infrastructure will be based on a number of resource, service and expertise centres


CLARIN Mission

  • Create a comprehensive and free-to-use distributed archive of LRs & LTs covering not only the languages of all member states, but also other languages studied and used in Europe
  • Because the tools & resources will be interoperable across languages & domains, contribute to preserving and supporting the multilingual & multicultural European heritage
  • An operational open infrastructure of web services will introduce a new paradigm of distributed collaborative development
  • Allow many contributors to add all kinds of new services based on existing ones, thus ensuring reusability and allowing scaling up to suit individual needs

How can we tackle these challenges?
  • J. Taylor: “eScience is about global collaboration in key areas of science and the next generation of infrastructures that will enable it”
  • Need to build new types of platforms
    • to allow researchers to easily combine existing resources into new ones to tackle the big challenges
    • to increase the productivity of all interested researchers, since too much time is currently wasted on preparatory work

from P. Wittenburg


CLARIN establishes such a new generation of extended infrastructure

  • Thus CLARIN is not about creating and building new language resources and technology, but
  • making them available and accessible
  • as services
  • in a stable and persistent infrastructure

to allow tackling the great challenges

  • CLARIN: http://www.clarin.eu
  • Grid Project: http://www.mpi.nl/dam-lr
  • ISO TC37/SC4: http://www.tc37sc4.org
  • Standards Project: http://lirics.loria.fr/

eScience Vision

from P. Wittenburg

We still have a long way to go …

& also a “new project”

in an e-Contentplus Call for a:

  • “Thematic Network on Language Resources”:

FLaReNet

    • To provide common recommendations (to the EC) for future actions
    • To give priorities
    • Need of ‘visions’

In a global context, in cooperation with CLARIN

& also with non-EU members

Which Communities?

LRs & LTs exist, but a global vision, policy and strategy

is still missing

for

  • Humanities
  • Social Sciences
  • Digital Libraries
  • Cultural Heritage
  • Language Resources
  • Language Technologies
  • Standardisation
  • Ontologists
  • Content

core

CLARIN: ResInf

EU Forum

FLaReNet: Network

Multilinguality

Focus on cooperation

for

  • EC
  • Funding agencies
  • Many application domains

(eculture, egovernment, ehealth, intelligence, domotics, content industry, …)

for

FLaReNet: Fostering Language Resources Network

A European forum

  • to facilitate interaction among LR stakeholders

The Network structure considers that LRs present various dimensions and must be approached from many perspectives:

  • technical, but also
  • organisational
  • economic
  • legal
  • political

Addresses also

  • multicultural and multilingual aspects, essential when facing access and use of digital content in today’s Europe

Organised in Thematic Working Groups

A layered structure, with leading experts & groups (national and European institutions, SMEs, large companies) for all relevant LR areas (about 40 partners)

    • in collaboration with CLARIN
    • to ensure coherence of LR-related efforts in Europe

FLaReNet will

  • consolidate existing knowledge, presenting it analytically and visibly
  • contribute to structuring the area of LRs of the future by discussing new strategies to:
    • convert existing and experimental technologies related to LRs into useful economic and societal benefits
    • integrate so far partial solutions into broader infrastructures
    • consolidate areas mature enough for recommendation of best practices
    • anticipate the needs of new types of LRs

Thematic Areas
  • The Chart for the area of LRs in its different dimensions
  • Methods and models for LR building, reuse, interlinking and maintenance
  • Harmonisation of formats and standards
  • Definition of evaluation protocols and evaluation procedures
  • Methods for the automatic construction and processing of LRs

To build together:

  • Evolving RoadMap
  • Blueprint of actions and infrastructures

Objectives & expected results

The largest Network of LR and HLT players, with diverse approaches, efforts and technologies

  • Enable progress toward community consensus
  • Give an extended picture of LRs & recast its definition in the light of recent scientific, methodological, technological, social developments
  • Consolidate methods & approaches, common practices, frameworks and architectures
  • A “roadmap” identifying areas where consensus has been achieved or is emerging vs. areas where additional discussion and testing is required, together with an indication of priorities
  • Recommendations in the form of a plan of coherent actions for the EU and national organizations
  • A European model for the LRs of the next years

Ambitious!

Outcomes of FLaReNet

The outcomes will be of a directive nature

  • to help the EC and national funding agencies identify priority areas of LRs that are of major public interest and need public funding to develop or improve

A blueprint of actions will constitute input to policy development both at EU and national level

  • for identifying new language policies that support linguistic diversity in Europe
  • in combination with strengthening the language product market, e.g. for new products & innovative services, especially for less technologically advanced languages

These Initiatives, … together
  • Call for international cooperation also outside Europe

and will be relevant for

  • setting up a global worldwide Forum of Language Resources and Language Technologies
