A Field Survey for Establishing Priorities in the Development of HLT Resources for Dutch
D. Binnenpoorte, F. de Vriend, J. Sturm, W. Daelemans, H. Strik, C. Cucchiarini

Feedback
Feedback from the HLT field (academia and industry) was collected at a workshop with about 100 participants.

1: BLARK (Basic Language Resource Kit)
In defining the BLARK a distinction was made between:
• Applications
• Modules
• Data
A matrix (fragment in Figure 1) was drawn up describing:
• which modules are required for which applications;
• which data are required for which modules;
• what the relative importance of the modules and data is.
Figure 1: fragment of the requirements matrix (“+” = important, “++” = very important).
Based on the full matrix a BLARK was defined.

2: Inventory of available resources
An inventory was made to establish which of the components (modules and data) that make up the BLARK are:
• available, i.e. can be bought or are freely obtainable, e.g. as open source;
• (re-)usable.
The inventory was based on:
• expert knowledge;
• information found on the internet and in the literature;
• personal communication with actors in the field.
Components can only be considered usable if they are of sufficient quality, which calls for quality evaluation. This evaluation was limited to a descriptive level: modules and data were checked against a list of evaluation criteria.
A second matrix (fragment in Figure 2) describes the availability of the components in the BLARK.
Figure 2: fragment of the availability matrix (scores range from 1 = ‘module or data set is unavailable’ to 10 = ‘module or data set is easily obtainable’).

BLARK for language technology
Modules:
• Robust modular text pre-processing
• Morphological analysis and morpho-syntactic disambiguation
• Syntactic analysis
• Semantic analysis
Data:
• Monolingual lexicon
• Annotated corpus of text (a treebank)
• Benchmarks for evaluation

BLARK for speech technology
Modules:
• Automatic speech recognition
• Speech synthesis
• Tools for calculating confidence measures
• Tools for identification
• Tools for (semi-)automatic annotation of speech corpora
Data:
• Speech corpora for specific applications
• Multi-modal speech corpora
• Multi-media speech corpora
• Multi-lingual speech corpora
• Benchmarks for evaluation

Comparison
The BLARK was compared with the inventory of available resources to establish which components are missing, inaccessible, or of insufficient quality.

3: Priority list
Requirements for prioritization:
• the components should be relevant for a large number of applications;
• the components should currently be either unavailable, inaccessible, or of insufficient quality;
• developing the components should be feasible in the short term.
A minimal illustrative sketch of how these criteria could be combined with the two matrices to rank components is given at the end of this section.

Language technology:
1. Annotated corpus of written Dutch
2. Syntactic analysis: robust recognition of sentence structure
3. Robust text pre-processing: tokenization and named entity recognition
4. Semantic annotations for the treebank mentioned above
5. Translation equivalents
6. Benchmarks for evaluation

Speech technology:
1. Automatic speech recognition (including non-native ASR, robust ASR, adaptation, and prosody recognition)
2. Speech corpora for specific applications (e.g. directory assistance, CALL)
3. Multi-media speech corpora (speech corpora that also contain information from other media such as newspapers, the WWW, etc.)
4. Tools for (semi-)automatic transcription of speech data
5. Speech synthesis (including tools for unit selection)
6. Benchmarks for evaluation
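The survey presents the requirements matrix, the availability scores, and the prioritization criteria only in prose and figures; it does not give an algorithm. The Python sketch below is one possible, non-authoritative way of combining them into a ranked list. Every component name, application, rating, availability score, and the scoring formula itself are invented for illustration and are not taken from the survey.

# Illustrative sketch only: all names, ratings, scores, and weights below
# are hypothetical examples, not data from the survey.

IMPORTANCE = {"+": 1, "++": 2}  # ratings used in the requirements matrix

# Fragment of a hypothetical requirements matrix: component -> {application: rating}.
requirements = {
    "syntactic analysis":  {"information extraction": "++", "machine translation": "++"},
    "monolingual lexicon": {"spelling correction": "+", "machine translation": "++"},
    "speech synthesis":    {"spoken dialogue systems": "++"},
}

# Hypothetical availability scores on the survey's scale:
# 1 = "module or data set is unavailable" ... 10 = "easily obtainable".
availability = {
    "syntactic analysis": 3,
    "monolingual lexicon": 8,
    "speech synthesis": 5,
}

# Components assumed (for this example) to be feasible to develop in the short term.
feasible_short_term = {"syntactic analysis", "monolingual lexicon", "speech synthesis"}

def priority_score(component: str) -> int:
    """Higher score = higher priority: relevant for many applications,
    poorly available, and feasible to develop in the short term."""
    if component not in feasible_short_term:
        return 0
    relevance = sum(IMPORTANCE[r] for r in requirements[component].values())
    gap = 10 - availability[component]  # 0 (easily obtainable) .. 9 (unavailable)
    return relevance * gap

if __name__ == "__main__":
    for comp in sorted(requirements, key=priority_score, reverse=True):
        print(f"{comp}: priority score {priority_score(comp)}")

With the example numbers above, syntactic analysis ranks first because it is rated very important for two applications and scored as poorly available; this loosely mirrors its high position on the language technology priority list.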