CLARIN: A Pan-European Research Infrastructure for Language Resources and Technologies
Martin Wynne
OUCS, OeRC & Linguistics Faculty
University of Oxford
CLARIN is concerned with language resources and technologies, e.g.:
Linguistic corpora (principled collections of texts sampled to be representative of a particular language variety for the purposes of empirical linguistic research)
Audio and video corpora
Lexical resources (wordlists, dictionaries, morphological tables, semantic resources, ontologies)
Language documentation (e.g. field notes about endangered languages)
Language processing tools (for annotation, analysis, linking, editing, speech recognition and synthesis, translation, summarisation, text mining, internet search, etc.)
Processing environments and workflow management tools
Other language resources...
These resources are of use not only in linguistics, but across the Humanities and in many areas of the Social Sciences.
Basic language resource toolkits (BLARKs) are essential; the existence of a BLARK is the pre-condition for building natural language-aware tools and services, so there are numerous potential applications beyond academic research which require these datasets and tools.
Many archives are known only to certain communities
Archives are mostly unconnected, and data difficult to find
Every archive has its own standards for storage and access
There are not sufficient incentives to share resources
Resources are in different formats, follow different standards, are described in differing ways
Basic resources do not exist for all languages
Tools are hard for non-specialists to use
Tools and data are not available for online processing (only simple retrieval of files is possible)
Many researchers are not aware of the potential benefits of using language and speech technology tools
Many researchers are not aware of leading edge computational infrastructures
A researcher in Zagreb can, from his desktop computer:
sign on once using local authentication (single sign-on)
search for, find and obtain authorization to use data in Oxford, Warsaw and Bergen
select the precise (composite) dataset to work on, and save that selection
run semantic analysis tools from Budapest and statistical tools from Tübingen over the dataset
use computational power from the local or national computing centre where necessary
save the workflow and results of the analysis, and share those results with collaborators in Paris, Vienna and Helsinki
discuss, iteratively adapt and re-run the analyses with collaborators
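The scenario above can be pictured as a small data model: a search across several archives, a saved selection, and a record of the tools run over it. The following is only a minimal sketch under stated assumptions. The `Resource` and `Workflow` classes, the `search` function, the catalogue entries and the `pid:` identifiers are all hypothetical illustrations and do not correspond to any actual CLARIN interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    archive: str     # hosting centre, e.g. "Oxford"
    identifier: str  # hypothetical persistent identifier
    language: str    # language code of the resource


@dataclass
class Workflow:
    selection: list                            # the saved (composite) dataset
    steps: list = field(default_factory=list)  # (tool, site) pairs, in order

    def add_step(self, tool, site):
        # Record that a tool hosted at a given site is run over the selection.
        self.steps.append((tool, site))
        return self


# Hypothetical catalogue standing in for a federation of connected archives.
CATALOGUE = [
    Resource("Oxford", "pid:ox-001", "en"),
    Resource("Warsaw", "pid:wa-001", "pl"),
    Resource("Bergen", "pid:be-001", "no"),
    Resource("Oxford", "pid:ox-002", "pl"),
]


def search(catalogue, languages):
    # One query over all archives, rather than one query per archive.
    return [r for r in catalogue if r.language in languages]


selection = search(CATALOGUE, {"pl", "no"})
workflow = (
    Workflow(selection)
    .add_step("semantic-analysis", "Budapest")
    .add_step("statistics", "Tübingen")
)

print([r.identifier for r in workflow.selection])
print(workflow.steps)
```

Because the selection and the ordered steps are stored together, the same object could in principle be saved, shared with collaborators and re-run, which is the point of the last two items in the scenario.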
create a research infrastructure that makes language resources and technologies (LRT) available to scholars of all disciplines, especially humanities and social sciences
unite existing digital archives into a federation of connected archives with unified web access
provide language and speech technology tools as web services operating on (language) data in archives
This represents the first coordinated and comprehensive attempt to address the technical, legal, administrative and financial barriers to the effective use of LRTs in academic research.
32 partners from 22 EU and associated countries
140-odd members in 32 countries
leading partners include:
Utrecht University (Steven Krauwer, coordinator)
Max Planck Institute Nijmegen (Peter Wittenburg)
Hungarian Academy of Sciences (Tamás Váradi)
Oxford University (Martin Wynne)
Tübingen University (Erhard Hinrichs)
Helsinki University (Kimmo Koskenniemi)
University of Copenhagen (Bente Maegaard)
plus many more
Promoting collaboration and interoperability between European language resource repositories, particularly in relation to:
Long-term Preservation and Access
Standards and best practices
Concept registry services
See the CLARIN Short Guides on these topics at http://www.clarin.eu/
CLARIN aims to enable e-Humanities
CLARIN is currently an early adopter of infrastructure services in Europe (e.g. implementing access via Shibboleth, PIDs, metadata mappings)
CLARIN aims to be a gateway to language resource collections and technology services in other institutions (e.g. digital libraries, commercial collections)
Potentially providing language technologies and tools for other applications and services (e.g. for information extraction)
Much current work involves collaboration with other initiatives involving Grid, research infrastructure, standards, Digital Humanities, etc., to help create a coherent and coordinated infrastructure.
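As an illustration of what a metadata mapping can look like in practice, the sketch below renames archive-specific field names to a shared vocabulary so that records from different archives become comparable. It assumes a simple dictionary-based approach. The archive names, field names and the `normalise` function are hypothetical and are not taken from CLARIN's actual metadata work.

```python
# Hypothetical per-archive field names mapped onto a shared vocabulary.
FIELD_MAPPINGS = {
    "archive_a": {"titel": "title", "sprache": "language", "urheber": "creator"},
    "archive_b": {"name": "title", "lang": "language", "author": "creator"},
}


def normalise(record, archive):
    # Rename known fields; fields without a mapping are kept unchanged.
    mapping = FIELD_MAPPINGS[archive]
    return {mapping.get(key, key): value for key, value in record.items()}


a = normalise({"titel": "Korpus X", "sprache": "de"}, "archive_a")
b = normalise({"name": "Corpus Y", "lang": "en", "size": 1000}, "archive_b")

print(a)
print(b)
```

Keeping unmapped fields instead of dropping them is a design choice made here so that no archive-specific information is lost during normalisation.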
CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n° 212230