ENABLER, BLARK, what’s next?

  1. ENABLER, BLARK, what’s next? Steven Krauwer Utrecht University / ELSNET

  2. Overview • ENABLER • BLARK • BLARK Results • Recent developments • CLARIN • Some reflections • MyBLARK • Concluding remarks

  3. ENABLER • EU Project, FP5, under Information Society Technologies (see www.enabler-network.org) • bringing together national language resources projects in many EU countries • aimed at providing a framework to foster cooperation and interoperability • with a strong industrial drive • led by Pisa, and ended (as an EU project) in 2004 • … but still existing as a community, in close collaboration with ELSNET (www.elsnet.org)

  4. BLARK (1) • Basic Language Resource Kit • Idea (first launched in 1998): definition of the minimal set that is needed to do any (precompetitive) R&D and education at all • Definition should be in principle language independent (although specific languages may require specific adaptations)

  5. BLARK (2) • Definition should include both data collections (corpora, lexicons) and modules (taggers, parsers, synthesizers, annotation tools) • It should include both qualitative aspects (e.g. standards) and quantitative aspects (e.g. size)

  6. BLARK (3) • Once the definition is available it can be used as a common reference point that makes it possible to • assess the resources situation of a language (how much of the BLARK is available, and what is still missing) • make priority plans for bringing the resources situation up to date
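The assessment step on this slide can be illustrated with a minimal sketch. It assumes a BLARK definition is simply a list of named components; the component names, the `Component` type, and the `assess` function are hypothetical and not part of any actual BLARK definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # "data" (corpora, lexicons) or "module" (taggers, parsers, ...)


# Hypothetical, heavily simplified BLARK definition (an assumption for illustration).
BLARK_DEFINITION = [
    Component("corpus", "data"),
    Component("lexicon", "data"),
    Component("tagger", "module"),
    Component("parser", "module"),
]


def assess(definition, available_names):
    """Return (fraction of the definition that is available, missing components)."""
    missing = [c for c in definition if c.name not in available_names]
    covered = len(definition) - len(missing)
    coverage = covered / len(definition) if definition else 0.0
    return coverage, missing


# Example: a language for which only a corpus and a tagger exist.
coverage, missing = assess(BLARK_DEFINITION, {"corpus", "tagger"})
print(f"coverage: {coverage:.0%}; missing: {[c.name for c in missing]}")
```

The list of missing components is the natural input for the priority plan mentioned in the last bullet; a real assessment would also have to weigh the qualitative and quantitative criteria from BLARK (2), which this sketch ignores.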

  7. BLARK (4) • Note that the BLARK is necessarily dynamic, as new technological developments will come with new requirements • Note that the BLARK for a language will only work if there is a body that takes responsibility for its implementation and for the maintenance and distribution of the resources created

  8. BLARK Results • First adopted by the Dutch Language Union, resulting in a first 12 million euro implementation programme launched at the end of 2004 • Explored and developed for Arabic in the NEMLAR project (CST, ELDA, ELSNET, and others; see www.nemlar.org and presentation O27-G at this conference on Thursday) • BLARK concept included in a number of proposals, but without tangible results • Suggestions for a more advanced variant (ELARK) have been put forward by ELDA and others

  9. Recent developments • CLARIN: Common Language Resources and Technology Infrastructure (see LREC 2006 workshop on May 22, or otherwise www.mpi.nl/clarin) • NOT a project proposal, but rather a proposal for a Research Infrastructure to be included in the European Roadmap for Research Infrastructures

  10. CLARIN (1) • Creation of open European Language Resources Network with strong service centers and repositories, providing the humanities community at large (i.e. not just the language and speech technology community) with • knowledge about which language resources and tools exist and how to use them • access to existing language resources • coordinated creation of new resources • access to advanced services for access and adaptation • bundling of expertise in specific problem areas • training centers

  11. CLARIN (2) Three important observations: • CLARIN has no industrial drive • CLARIN aims at addressing all languages in the EU (and associated countries) • One of CLARIN’s objectives is the definition and the coordinated creation of BLARKs for all languages of the EU

  12. Some reflections • Whatever progress has been made (DLU, NEMLAR, ELARK) was mostly inspired by industrial needs • Industrial considerations do not favour smaller languages • Progress of the BLARK since 1998 has been slow • No new funding opportunities in FP6 to get anything done • CLARIN may offer exciting opportunities (if successful), but this will take a lot of time

  13. More reflections • The present (embryonic) BLARK definition may be one or more steps too far for under-resourced languages • So, why not add to the concept the BLARKette, which should represent a very basic entry level variant of the BLARK, targeting exclusively the research and (especially) education community • Small and simple, should fit on a CDROM

  14. And yet more reflections • Nothing funded will happen before well into 2007 • Why wait until then, e.g. if and when CLARIN is in place and some formal process has been put into motion to define the BLARK (and the BLARKette)? • Why not start an action to consult the language communities and to arrive at a first proposal for a BLARK and BLARKette definition?

  15. MyBLARK, the proposal • We initiate MyBLARK, aiming at collecting (for each language in the EU) • a description of the essential components of the BLARK • and of the BLARKette • We try to distill from this a broadly supported proposal for the definition of both concepts • We offer this as an input to the CLARIN project if it ever happens, or otherwise use it to launch other initiatives

  16. MyBLARK, the process • ELSNET (possibly in collaboration with COCOSDA/WRITE) will send out a simple questionnaire to all known language resources centers, asking for descriptions of BLARK and BLARKette components • ELSNET (maybe with COCOSDA/WRITE) will set up a committee to synthesize the results in the form of recommendations

  17. MyBLARK participants • Language resources centers for languages of EU and associated countries known to us • Language resources centers in the EU (+associated countries) that send me a message that they are willing to participate (steven.krauwer@elsnet.org)

  18. MyBLARK Questionnaire • Language • Type of resource • Usage • Size • Annotation required • Brief description • Available for your language? If so: pointer to it. If not: pointer to similar resource for another language • References • Comments
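As a rough illustration, one questionnaire response could be captured as a record with one attribute per field on this slide. This is only a sketch: the attribute names, the choice of which fields count as required, and the `unanswered` helper are assumptions, not part of the actual questionnaire.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionnaireEntry:
    language: str
    resource_type: str
    usage: str
    size: str
    annotation_required: str
    description: str
    available: bool
    # Pointer to the resource if available, otherwise to a similar
    # resource for another language.
    pointer: Optional[str] = None
    references: str = ""
    comments: str = ""


# Assumption: these fields must be filled in; the remaining ones are optional.
REQUIRED = ("language", "resource_type", "usage", "size",
            "annotation_required", "description")


def unanswered(entry):
    """Return the names of required fields that were left blank."""
    return [name for name in REQUIRED if not getattr(entry, name).strip()]


# Placeholder values only; not real survey data.
entry = QuestionnaireEntry(
    language="(language name)",
    resource_type="corpus",
    usage="",
    size="(size)",
    annotation_required="(annotation)",
    description="(brief description)",
    available=False,
)
print(unanswered(entry))
```

A flat record like this would make it straightforward for the committee to collate responses per language and per resource type before drafting definitions.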

  19. MyBLARK Schedule • June – August 2006: collection of contacts • September 2006: questionnaires sent out • October 2006: questionnaires in, first analysis and draft definition proposals • November 2006: proposals sent out for feedback • December 2006 – January 2007: collecting feedback • February 2007: final report

  20. Concluding remarks • I have proposed the introduction of a slightly weaker variant of the BLARK, the BLARKette, for under-resourced languages • I have proposed an action entitled MyBLARK to arrive at an initial definition of both the BLARK and the BLARKette • I hope that this will (a) speed up the process, and (b) provide an intermediate coverage level for under-resourced languages