Learn about two projects supported under WP3 by University of Bergen/Uni Digital, focusing on corpus, lexicon, and text encoding. DATIST involves mining diagnostic reports by speech therapists, while DID works on the Danish Insular Dictionary. Challenges include preparing unstructured texts for statistical tools and transitioning to new tools. Evaluation highlights the need for international cooperation and overcoming digital obsolescence.
CLARIN Hum. project support
Two projects supported under WP3 by University of Bergen / Uni Digital:
• DATIST (U. of Nancy 2)
• DID (Copenhagen U.)
Expertise at Uni Digital: corpus, lexicon and text encoding technologies
Coordination at University of Bergen: Koenraad De Smedt
DATIST
PhD project by Frédérique Brin-Henry at U. of Nancy 2
Mining diagnostic reports by speech therapists, looking at the use of specific terms
Challenges for advice from CLARIN:
• Prepare the unstructured texts and metadata
• Apply statistical tools to text analysis
DATIST
Activities:
• Encoding and structuring of texts and metadata (writing of conversion programs and manual work)
• Guidance in statistical analysis of data (plots, diagrams, correspondence analysis)
DATIST
Evaluation of cooperation:
• Communication via email, phone and Skype was satisfactory
• The cooperation is a process of personal supervision and training, which takes time and negotiation
• Illustrates the need for international cooperation on PhD training (such as CLARA) and co-supervision
DID
Danish Insular Dictionary (Ømålsordbogen; contact: Henrik Hovmark)
Long-term project financed by Copenhagen University
Two challenges for advice from CLARIN:
• Replacement for an obsolete special editor with custom phonetic fonts
• Replacement for the WordCruncher corpus tool
DID
Activities:
• A written statement was produced on a test of a new editor
• Work on selecting a new corpus tool is ongoing
DID
Evaluation of cooperation:
• Communication through meetings and email
• Understanding of requirements takes time
• Project illustrates difficulties that arise when:
  • Chosen tools and data formats do not support open standards
  • Migration to new tools is considered long after digital obsolescence has occurred