Radboud University Nijmegen

Results of R&D: BLaRK for Dutch. Helmer Strik, Dept. of Linguistics, Centre for Language and Speech Technology (CLST), Radboud University Nijmegen, the Netherlands.

Presentation Transcript


  1. Results of R&D: BLaRK for Dutch • Helmer Strik • Dept. of Linguistics • Centre for Language and Speech Technology (CLST) • Radboud University Nijmegen, the Netherlands

  2. Introduction • Terminology: • BLaRK: Basic Language Resources Kit • BaTaVo: Basis Taal-Voorzieningen • Platform-BC: see this presentation • Period • 2000 – : plans • – 2002 : results, future Cape Town, 24-11-2008

  3. NTU & Dutch HLT Platform • NTU: Nederlandse Taalunie (Dutch Language Union) • Mission: Strengthening the position of the Dutch language • Dutch HLT Platform • Aim: To contribute to the further development of an adequate language and speech technology infrastructure for Dutch

  4. HLT platform Participants • Flanders: • Ministry of the Flemish Community • IWT (Flemish Institute for the Promotion of Scientific-technological Research in Industry) • FWO (Fund for Scientific Research - Flanders) • Netherlands: • Dutch Ministry of Education, Culture and Sciences • Dutch Ministry of Economic Affairs • Senter (agency of Dutch Ministry of Economic Affairs) • NWO (Netherlands Organisation for Scientific Research)

  5. Objectives • Strengthening the position of Dutch in HLT • Establishing the proper conditions for a successful management and maintenance of basic HLT resources developed through governmental funding • Stimulating co-operation between academia and industry in the field of HLT • Contributing to the realisation of European co-operation in HLT-relevant areas • Establishing a network that brings together supply and demand for knowledge, products, and services

  6. Action plan • The ‘Action plan for Dutch in language and speech technology’ was defined to achieve the objectives • Activities are organised in four action lines (A, B, C, and D)

  7. Dutch HLT Platform: Four action lines • A: Performing a market place function • B: Strengthening the HLT infrastructure • C: Working out standards and evaluation criteria • D: Developing a management, maintenance, and distribution plan

  8. Action line A: “Performing a market place function” • Encourage co-operation between industry, academia and policy institutions • Raise awareness and give publicity to the results of HLT research

  9. Action line B: “Strengthening the digital language infrastructure” • Defining the BLaRK (Basic Language Resources Kit) for Dutch • Carrying out a field survey to determine what is needed to complete the BLaRK • Drawing up a priority list with cost estimates, serving as policy guidelines

  10. Action line C: “Working out standards and evaluation criteria” • Drawing up standards and criteria for evaluation of basic materials in BLaRK and for assessment of project results

  11. Action line D: “Developing a management, maintenance, and distribution plan” • Defining a Blueprint for management (including intellectual property rights), maintenance, and distribution of HLT resources

  12. Actions carried out • Conducted mailings to contacts (about 1000) • Contacted and visited companies with HLT-related needs, to: • Demonstrate the benefits of HLT • Get a clear picture of each company’s knowledge status and future plans • Provide information on cross-linking services • Organised seminars and workshops

  13. Platform BC • A: Performing a market place function • B: Strengthening the HLT infrastructure • C: Working out standards and evaluation criteria • D: Developing a management, maintenance, and distribution plan • B + C → Platform BC

  14. Platform BC: Who? • Steering committee: • 8 HLT experts • NTU • NWO (funding body) • Field survey, 4 researchers: • 2 language technology • 2 speech technology

  15. Platform BC: Who? • Steering committee: 8 HLT experts

  16. Platform BC: How? • Three stages: • 1. Defining the BLaRK for Dutch • 2. Making an inventory of HLT resources • 3. Establishing a priority list

  17. BLaRK: Basic Language Resources Kit • Components: • Data: sets of language data and descriptions in machine-readable form • Modules (or semi-products): the basic software components of HLT applications • Applications: classes of applications rather than specific applications or products • 2 matrices: • Modules x Data • Applications x Modules • → BLaRK

  18. [Figure: the BLaRK matrices relating Data, Modules, and Applications, shown for Language Technology and for Speech Technology. Entries are quantified as 0, 1, or 2 (+’s), based on the field survey and expert opinions.]
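A minimal Python sketch of how the two matrices (Modules x Data and Applications x Modules) with 0, 1, or 2 scores could be held in code. All scores and the application class names are invented for illustration, and summing the columns is only one assumed way of reading such a matrix, not the method used in the report.

```python
# Illustrative only: the scores and the application names are hypothetical,
# not taken from the BLaRK report. Scores follow the 0/1/2 ("+") scale.
MODULES_X_DATA = {
    "morphological analysis": {"monolingual lexicon": 2, "annotated corpus": 1},
    "syntactic analysis": {"monolingual lexicon": 1, "annotated corpus": 2},
}
APPLICATIONS_X_MODULES = {
    "information retrieval": {"morphological analysis": 2, "syntactic analysis": 1},
    "dictation": {"morphological analysis": 1, "syntactic analysis": 0},
}


def column_totals(matrix):
    """Sum the scores per column; rejects anything outside the 0-2 scale."""
    totals = {}
    for row in matrix.values():
        for column, score in row.items():
            if score not in (0, 1, 2):
                raise ValueError(f"score out of range: {score}")
            totals[column] = totals.get(column, 0) + score
    return totals


# Assumed reading: a higher total means more rows depend on that column.
data_importance = column_totals(MODULES_X_DATA)
module_importance = column_totals(APPLICATIONS_X_MODULES)
print(data_importance)
print(module_importance)
```

Keeping the matrices as nested dictionaries keeps the row and column labels readable; a real inventory would likely use a spreadsheet or a table library instead.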

  19. BLaRK: Language technology • Modules • Robust modular text preprocessing • Morphological analysis and morphosyntactic disambiguation • Robust syntactic analysis • Aspects of semantic analysis (word meaning and reference) • Data • Monolingual lexicon • Annotated corpus of written Dutch • Benchmarks for evaluation

  20. BLaRK: Speech technology • Modules • Automatic speech recognition • Speech synthesis system • Tools for annotation of speech corpora • Confidence measures and utterance verification • Identification (speaker, language, dialect) • Data • Monolingual speech corpora for specific applications • Multilingual speech corpora • Multimodal/medial speech corpora • Benchmarks for evaluation

  21. From BLaRK to priority lists • 1. BLaRK: Basic Language Resources Kit • 2. Inventory & Evaluation • 3. Priority lists

  22. Stage 2: Inventory & Evaluation • Inventory: • Which components in BLaRK are available? • Bought • Freely obtainable • Reusable • Of sufficient quality • Evaluation: • And of sufficient quality? • Checklist approach (vs. formal evaluation)

  23. [Figure: availability of Modules and Data. Availability is quantified on a scale of 1-10, based on the field survey and expert opinions.]

  24. Stage 3: Priority lists • The prioritisation was based on the following requirements: • The components should currently be unavailable, inaccessible, or of insufficient quality. • The components should be relevant for a large number of applications. • Developing the components should be possible in the short term.
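The three prioritisation requirements can be read as a filter followed by a ranking. The Python sketch below shows one such reading. The threshold, the component values, and the idea of ranking by the number of dependent application classes are all assumptions made for illustration; the report itself combined survey data with expert opinion.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    availability: int   # 1-10 scale, as on the availability slide (values invented)
    applications: int   # number of application classes relying on it (invented)
    short_term: bool    # whether development is feasible in the short term (invented)


# Assumption: anything scoring below this counts as insufficiently available.
AVAILABILITY_THRESHOLD = 5


def prioritise(components, threshold=AVAILABILITY_THRESHOLD):
    """Keep components that are lacking and feasible, then rank by relevance."""
    candidates = [
        c for c in components
        if c.availability < threshold and c.short_term
    ]
    return sorted(candidates, key=lambda c: c.applications, reverse=True)


components = [
    Component("annotated corpus", availability=2, applications=8, short_term=True),
    Component("monolingual lexicon", availability=7, applications=9, short_term=True),
    Component("syntactic analysis", availability=3, applications=6, short_term=True),
    Component("semantic analysis", availability=2, applications=5, short_term=False),
]

ranked = [c.name for c in prioritise(components)]
print(ranked)
```

With these invented values the lexicon drops out because it is already sufficiently available, and semantic analysis drops out because it is not feasible in the short term.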

  25. Consensus, broad support • Report version 1 • Feedback from academia & industry • Sent to the Dutch-Flemish HLT field (1000 sites) • Workshop 15/11/2001 • → Report version 2, final version

  26. From BLaRK to priority lists • Report 1: BLaRK, Inventory & Evaluation, Priority lists • Feedback: HLT field, Workshop • Report 2: BLaRK, Inventory & Evaluation, Priority lists

  27. Report • Version 1: • Version 2, final version: • W. Daelemans & H. Strik (eds.) (2002). Het Nederlands in taal- en spraaktechnologie: prioriteiten voor basisvoorzieningen [Dutch in language and speech technology: priorities for basic resources]

  28. Recommendations (1) • Regarding the BLaRK (BaTaVo): • Collect existing components • Complete the kit (stimulation, funding) • Management & maintenance (action line D) • Make available under an ‘open’ licence • Evaluation: test corpora & methodology

  29. Recommendations (2) • General: • More language and speech technologists (education, training, projects) • More co-operation • In addition to funding for application-oriented research, also funding for fundamental research

  30. Priority list: Language technology • 1. Annotated corpus of written Dutch • 2. Syntactic analysis • 3. Robust text pre-processing • 4. Semantic annotations for the treebank in item 1 • 5. Translation equivalents • 6. Benchmarks for evaluation

  31. Priority list: Speech technology • 1. Automatic speech recognition • 2. Speech corpora • 3. Multi-media speech corpora • 4. Tools for (semi-)automatic transcription of speech data • 5. Speech synthesis • 6. Benchmarks for evaluation

  32. Future prospects [2002] • Action line A: • Stimulate HLT in the Netherlands and Flanders • Co-operation: industry, academia, etc. • Action lines B & C: • Collect existing resources • Ensure priorities are realised • Action line D: • Implementation of recommendations in the Blueprint

  33. When BLaRK is established... • Intellectual property rights held by NTU • Actual management and maintenance of resources by an HLT agency, to be founded • Maintenance of expertise by Dutch-Flemish steering committees and an HLT management committee, both to be founded

  34. General conclusions [2002] • Goals have been achieved, so the proper prior conditions for development of materials in BLaRK have been created • This work, carried out in the Dutch-speaking area, can be profitable for others when starting similar activities • Part of the report is translated into English • Presentations & publications • Other domains • Other countries • http://lands.let.kun.nl/~strik/BLaRK.html

  35. Questions? THE END
