CLARIN: A Pan-European Research Infrastructure for Language Resources and Technologies

Martin Wynne

OUCS, OeRC & Linguistics Faculty

University of Oxford

martin.wynne@oucs.ox.ac.uk

Language Resources and Technologies

CLARIN is concerned with language resources and technologies, e.g.:

Linguistic corpora (principled collections of texts, sampled to be representative of a particular language variety, for the purposes of empirical linguistic research)

Audio and video corpora

Lexical resources (wordlists, dictionaries, morphological tables, semantic resources, ontologies)

Language documentation (e.g. field notes about endangered languages)

Language processing tools (for annotation, analysis, linking, editing, speech recognition and synthesis, translation, summarisation, text mining, internet search, etc.)

Processing environments and workflow management tools

Other language resources...

Language Resources and Technologies

These resources are of use not only in linguistics, but across the Humanities and in many areas of the Social Sciences.

Basic Language Resource Kits (BLARKs) are essential: a BLARK for a given language is a precondition for building language-aware tools and services, so these datasets and tools also have numerous potential applications beyond academic research.

The problems

Many archives are known only to certain communities

Archives are mostly unconnected, and data difficult to find

Every archive has its own standards for storage and access

There are not sufficient incentives to share resources

Resources are in different formats, follow different standards, are described in differing ways

Basic resources do not exist for all languages

Tools are hard to use for non-specialists

Tools and data are not available for online processing (only simple retrieval of files is possible)

Many researchers are not aware of the potential benefits of using language and speech technology tools

Many researchers are not aware of leading edge computational infrastructures

The CLARIN Vision

A researcher in Zagreb can, from his desktop computer:

sign on once (single sign-on) using local authentication

search for, find and obtain authorization to use data in Oxford, Warsaw and Bergen

select the precise (composite) dataset to work on, and save that selection

run semantic analysis tools from Budapest and statistical tools from Tübingen over the dataset

use computational power from the local or national computing centre where necessary

save the workflow and results of the analysis, and share those results with collaborators in Paris, Vienna and Helsinki

discuss, iteratively adapt and re-run the analyses with collaborators
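The scenario above amounts to a saved, shareable pipeline: a selection of data plus an ordered list of tools that collaborators can re-run. A minimal sketch follows, assuming that tools can be looked up by name and that a workflow is stored as plain JSON. The tool names (`tokenize`, `frequency`), the `TOOLS` registry and the inline texts are hypothetical local stand-ins for remote services and archived data, not CLARIN interfaces.

```python
import json
from collections import Counter


# Hypothetical stand-ins for remote analysis tools (e.g. the statistical
# tools from Tübingen in the scenario); here they are plain local functions.
def tokenize(texts):
    """Split each text on whitespace and lower-case the tokens."""
    return [tok.lower() for text in texts for tok in text.split()]


def frequency(tokens):
    """Count how often each token occurs."""
    return dict(Counter(tokens))


# Hypothetical registry mapping tool names to functions, so that a workflow
# can be stored as plain data (a list of names) rather than as code.
TOOLS = {"tokenize": tokenize, "frequency": frequency}


def run_workflow(steps, data):
    """Apply each named tool in order, feeding each output into the next step."""
    for name in steps:
        data = TOOLS[name](data)
    return data


# A saved dataset selection and workflow, serialised to JSON so that it can
# be shared with collaborators and re-run later. The texts are made up.
saved = json.dumps({
    "dataset": ["The cat sat", "the dog sat"],
    "steps": ["tokenize", "frequency"],
})

loaded = json.loads(saved)
result = run_workflow(loaded["steps"], loaded["dataset"])
print(result)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

Keeping the workflow as data rather than code is what makes it easy to share and re-run; a real system would also need to record tool versions and stable identifiers for the selected data.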

The CLARIN Mission

what?

create a research infrastructure that makes language resources and technologies (LRT) available to scholars of all disciplines, especially humanities and social sciences

how?

unite existing digital archives into a federation of connected archives with unified web access

provide language and speech technology tools as web services operating on (language) data in archives

This represents the first coordinated and comprehensive attempt to address the technical, legal, administrative and financial barriers to the effective use of LRTs in academic research.
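The first "how?" point, a federation of archives behind one access point, can be illustrated in miniature: one query fans out to several archives and the hits come back as a single merged list, each tagged with its source. This is a hypothetical in-memory sketch; the `Archive` class, the record fields and the identifiers are invented for illustration, and a real federation would query remote archives over a network protocol and enforce access rights.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """A hypothetical archive holding metadata records as simple dicts."""
    name: str
    records: list = field(default_factory=list)

    def search(self, **criteria):
        """Return the records whose fields match every given criterion."""
        return [r for r in self.records
                if all(r.get(k) == v for k, v in criteria.items())]


def federated_search(archives, **criteria):
    """Send the same query to every archive and merge the hits,
    tagging each hit with the name of the archive it came from."""
    hits = []
    for archive in archives:
        for record in archive.search(**criteria):
            hits.append({**record, "archive": archive.name})
    return hits


# Made-up archives and records, for illustration only.
archives = [
    Archive("Oxford", [{"id": "ox-1", "language": "eng", "type": "corpus"}]),
    Archive("Warsaw", [{"id": "wa-7", "language": "pol", "type": "corpus"},
                       {"id": "wa-9", "language": "eng", "type": "lexicon"}]),
]

for hit in federated_search(archives, language="eng"):
    print(hit["archive"], hit["id"], hit["type"])
# Oxford ox-1 corpus
# Warsaw wa-9 lexicon
```

The user sees one result list regardless of how many archives answered, which is the point of unified access; the per-hit `archive` tag keeps provenance visible.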

Who are we?

CLARIN consortium

32 partners from 22 EU and associated countries

CLARIN community

140-odd members in 32 countries

leading partners include:

Utrecht University (Steven Krauwer, coordinator)

Max Planck Institute Nijmegen (Peter Wittenburg)

Hungarian Academy of Sciences (Tamás Váradi)

Oxford University (Martin Wynne)

Tübingen University (Erhard Hinrichs)

Helsinki University (Kimmo Koskenniemi)

University of Copenhagen (Bente Maegaard)

plus many more

CLARIN technical work

Promoting collaboration and interoperability between European language resource repositories, particularly in relation to:

Persistent identifiers

Component metadata

Trust domains

Long-term preservation and access

Service centres

Virtual collections

Standards and best practices

Concept registry services

See the CLARIN Short Guides on these topics at http://www.clarin.eu/
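Two of the topics above, persistent identifiers and virtual collections, fit together: a collection that stores stable identifiers instead of URLs keeps working when a resource moves, as long as the resolver is updated. The toy resolver below is a sketch under that assumption only; it is not the Handle System or any CLARIN API, and the `hdl:00000/...` identifiers and `example.org` URLs are made up.

```python
class Resolver:
    """A toy persistent-identifier resolver (hypothetical): it maps stable
    PIDs to the current location of each resource."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        """Assign a PID to a resource at the given URL."""
        self._locations[pid] = url

    def move(self, pid, new_url):
        """Record that a resource has moved; the PID itself does not change."""
        if pid not in self._locations:
            raise KeyError(f"unknown PID: {pid}")
        self._locations[pid] = new_url

    def resolve(self, pid):
        """Return the current URL for a PID."""
        return self._locations[pid]


resolver = Resolver()
# Hypothetical PIDs and URLs, for illustration only.
resolver.register("hdl:00000/corpus-a", "https://archive-a.example.org/corpus-a")
resolver.register("hdl:00000/lexicon-b", "https://archive-b.example.org/lex-b")

# A virtual collection: a named selection of resources, stored as PIDs rather
# than URLs so that it survives reorganisation of the underlying archives.
collection = {"name": "my-study",
              "members": ["hdl:00000/corpus-a", "hdl:00000/lexicon-b"]}

# The corpus moves to a new host; only the resolver entry is updated.
resolver.move("hdl:00000/corpus-a", "https://new-host.example.org/corpus-a")
print([resolver.resolve(pid) for pid in collection["members"]])
# ['https://new-host.example.org/corpus-a', 'https://archive-b.example.org/lex-b']
```

The collection itself never changes when the corpus moves; that indirection is what makes saved selections and citations durable.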

CLARIN and the GRID

CLARIN aims to enable e-Humanities

CLARIN is currently an early adopter of infrastructure services in Europe (e.g. implementing Shibboleth-based access, persistent identifiers (PIDs) and metadata mappings)

CLARIN aims to be a gateway to language resource collections and technology services in other institutions (e.g. digital libraries, commercial collections)

CLARIN could potentially provide language technologies and tools for other applications and services (e.g. for information extraction)

Much current work involves collaboration with other initiatives concerned with the Grid, research infrastructures, standards, Digital Humanities, etc., to help create a coherent and coordinated infrastructure.


Thank you for your attention

CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n° 212230