
Dutch HLT Resources: from BLARK to Priority Lists

Helmer Strik, Diana Binnenpoorte, Janienke Sturm, Folkert de Vriend, and Catia Cucchiarini* (A2RT, Dept. of Language and Speech, Nijmegen; * NTU, Dutch Language Union, The Hague); Walter Daelemans (Dept. of CNTS Language Technology, Antwerp).

Presentation Transcript


  1. Dutch HLT Resources: from BLARK to Priority Lists • Helmer Strik, Diana Binnenpoorte, Janienke Sturm, Folkert de Vriend, and Catia Cucchiarini* • A2RT, Dept. of Language and Speech, Nijmegen • * NTU, Dutch Language Union, The Hague • Walter Daelemans • Dept. of CNTS Language Technology, Antwerp

  2. Dutch HLT Platform: NTU • NTU: Nederlandse Taalunie (Dutch Language Union) • Mission: strengthening the position of the Dutch language • Dutch HLT Platform • Aim: to contribute to the further development of an adequate language and speech technology infrastructure for Dutch

  3. Dutch HLT Platform: Other participants • Ministry of the Flemish Community • Flemish Institute for the Promotion of Scientific-technological Research in Industry • Fund for Scientific Research - Flanders • Dutch Ministry of Education, Culture and Sciences • Dutch Ministry of Economic Affairs • Netherlands Organisation for Scientific Research • Senter (an agency of the Dutch Ministry of Economic Affairs)

  4. Dutch HLT Platform: Four action lines • Performing a market place function • Strengthening the HLT infrastructure • Working out standards and evaluation criteria • Developing a management, maintenance, and distribution plan

  5. This presentation: Platform BC • (A) - • (B) Strengthening the HLT infrastructure • (C) Working out standards and evaluation criteria • (D) - • B+C => Platform BC • Focus on method (many details skipped) • More details: see publications, web sites

  6. Platform BC: What? • BLARK: Basic LAnguage Resources Kit • Inventory & Evaluation • Priority lists

  7. Platform BC: Who? • Steering committee: • 8 HLT experts • NTU • NWO (funding body) • 4 field researchers

  8. Platform BC: How? • Report 1: BLARK, Inventory & Eval., Priority lists • Feedback: Dutch HLT Field, Workshop 15/11/2001 • Report 2: BLARK, Inventory & Eval., Priority lists

  9. 1. BLARK: Basic LAnguage Resources Kit • Components: • Applications: classes of applications rather than specific applications or products. • Modules (or semi-products): the basic software components of HLT applications. • Data: sets of language data and descriptions in machine-readable form.

  10. BLARK: Basic LAnguage Resources Kit • 2 matrices: • Modules x Data • Modules x Applications • => BLARK

  11. [Figure: matrices with Modules set against Data and Applications]
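
The two matrices on slide 10 can be pictured as simple lookup tables. The sketch below is only an illustration: the module and data names come from slides 12 and 13, but the application classes (`app_A`, `app_B`, `app_C`), the cell contents, and the selection rule in the hypothetical `select_kit` function are assumptions, not the procedure the platform actually followed.

```python
from typing import Dict, List, Set, Tuple

# Modules x Applications: which application classes each module matters for.
# ASSUMPTION: the application names and the cell contents are invented.
MODULES_X_APPLICATIONS: Dict[str, Set[str]] = {
    "Automatic speech recognition": {"app_A", "app_B", "app_C"},
    "Robust syntactic analysis": {"app_A", "app_B"},
    "Speech synthesis system": {"app_C"},
}

# Modules x Data: which data sets each module relies on.
# ASSUMPTION: the pairings are invented for illustration.
MODULES_X_DATA: Dict[str, Set[str]] = {
    "Automatic speech recognition": {
        "Monolingual speech corpora",
        "Benchmarks for evaluation",
    },
    "Robust syntactic analysis": {
        "Annotated corpus of written Dutch",
        "Monolingual lexicon",
    },
    "Speech synthesis system": {"Monolingual speech corpora"},
}


def select_kit(
    mod_app: Dict[str, Set[str]],
    mod_data: Dict[str, Set[str]],
    min_applications: int = 2,
) -> Tuple[List[str], List[str]]:
    """Keep modules used by at least `min_applications` application classes,
    then collect the data sets those modules rely on (hypothetical rule)."""
    modules = sorted(
        m for m, apps in mod_app.items() if len(apps) >= min_applications
    )
    data = sorted({d for m in modules for d in mod_data.get(m, set())})
    return modules, data


modules, data = select_kit(MODULES_X_APPLICATIONS, MODULES_X_DATA)
print(modules)
print(data)
```

With these invented cells, the speech synthesis module drops out because it serves only one application class; changing `min_applications` changes what counts as "basic".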

  12. BLARK: Language technology • Modules • Robust modular text preprocessing • Morphological analysis and morphosyntactic disambiguation / unknown words • Robust syntactic analysis • Aspects of semantic analysis (word meaning and reference) • Data • Monolingual lexicon • Annotated corpus of written Dutch • Benchmarks for evaluation

  13. BLARK: Speech technology • Modules • Automatic speech recognition • Speech synthesis system • Tools for annotation of speech corpora • Confidence measures and utterance verification • Identification (speaker, language, dialect) • Data • Monolingual speech corpora for specific applications • Multilingual speech corpora • Multimodal/medial speech corpora • Benchmarks for evaluation

  14. 2. Inventory & Evaluation • B. Inventory: • Which components in BLARK are available? • C. Evaluation: • Are they of sufficient quality? • Checklist approach • => B&C together: Platform BC • See matrix 3 - Availability

  15. [Figure: matrix 3, Modules x Availability]

  16. 3. Priority lists • BLARK => Inventory => Priority lists

  17. Priority lists • The prioritisation was based on the following requirements: • The components should currently be unavailable, inaccessible, or of insufficient quality. • The components should be relevant for a large number of applications. • Developing the components should be possible in the short term.
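
The three requirements on slide 17 can be read as a filter followed by a ranking. The sketch below is a minimal illustration under stated assumptions: the `Component` record, the `prioritise` function, the `min_applications` threshold, the ranking by number of applications, and all component names and values are hypothetical. The slides list the requirements but do not say how they were weighed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    adequately_available: bool  # available, accessible, and of sufficient quality
    n_applications: int         # application classes the component is relevant for
    short_term_feasible: bool   # can be developed in the short term


def prioritise(components: List[Component], min_applications: int = 2) -> List[str]:
    """Apply the three requirements, then rank by breadth of use.

    ASSUMPTION: the threshold and the ranking key are illustrative choices.
    """
    candidates = [
        c
        for c in components
        if not c.adequately_available              # requirement 1
        and c.n_applications >= min_applications   # requirement 2
        and c.short_term_feasible                  # requirement 3
    ]
    ranked = sorted(candidates, key=lambda c: c.n_applications, reverse=True)
    return [c.name for c in ranked]


# Invented example values; they say nothing about real Dutch HLT resources.
inventory = [
    Component("component_a", False, 5, True),
    Component("component_b", True, 6, True),    # already adequate: skipped
    Component("component_c", False, 3, True),
    Component("component_d", False, 4, False),  # not feasible short term: skipped
    Component("component_e", False, 1, True),   # too narrow: skipped
]

priority_list = prioritise(inventory)
print(priority_list)
```

A component has to pass all three checks to appear at all; the ordering among the survivors is where a steering committee's judgment would replace the simple sort used here.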

  18. Priority list: Language technology • 1. Annotated corpus of written Dutch • 2. Syntactic analysis • 3. Robust text pre-processing • 4. Semantic annotations for the treebank in item 1 • 5. Translation equivalents • 6. Benchmarks for evaluation

  19. Priority list: Speech technology • 1. Automatic speech recognition • 2. Speech corpora • 3. Multi-media speech corpora • 4. Tools for (semi-)automatic transcription of speech data • 5. Speech synthesis • 6. Benchmarks for evaluation

  20. Feedback • Report 1 • Feedback • Sent to the Dutch-Flemish HLT field (2000) • Workshop 15/11/2001 • => Report 2

  21. Platform BC: How? • Report 1: BLARK, Inventory & Eval., Priority lists • Feedback: Dutch HLT Field, Workshop 15/11/2001 • Report 2: BLARK, Inventory & Eval., Priority lists

  22. When BLARK is established... • Intellectual rights: held by NTU • Actual management and maintenance of resources: by an HLT agency, to be founded • Maintenance of expertise: by Dutch-Flemish steering committees and an HLT management committee, both to be founded

  23. General conclusions • The goals have been achieved: the preconditions for developing the materials in BLARK are now in place • This work, carried out in the Dutch-speaking area, can be useful to other countries starting similar activities: • Presentations & publications • Part of the report has been translated into English

  24. Web sites • http://www.taaluniversum.org/tst/ • http://www.hltcentral.org/htmlengine.shtml?id=996 • http://lands.let.kun.nl/TSpublic/strik/platform-BC.html

  25. That’s it

  27. Objectives • strengthening the position of Dutch in HLT • establishing the proper conditions for a successful management and maintenance of basic HLT resources developed through governmental funding • stimulating co-operation between academia and industry in the field of HLT • contributing to the realisation of European co-operation in HLT-relevant areas • establishing a network that brings together supply and demand for knowledge, products, and services

  28. Platform BC: Who? • Steering committee: 8 HLT experts
