digiTAAL Some exciting examples Ineke Schuurman coordinator CLARIN-Vlaanderen



Presentation Transcript


  1. digiTAAL: Some exciting examples. Ineke Schuurman, coordinator CLARIN-Vlaanderen

  2. Digital Humanities • Language as object of research • Language as means for research • Modern languages • Old languages • Written, audio, video (collections of) documents

  3. Treebanks • Available for most ‘modern’ languages • But also possible for ‘dead’ languages like Latin and Ancient Greek: http://nlp.perseus.tufts.edu/syntax/treebank/getinvolved.html • Index Thomisticus Treebank, Milano: http://itreebank.marginalia.it/ • Full query language needed

  4. More treebanks • Medieval Portuguese treebank • Under construction • In the near future: INPOLDER (CLARIN-NL). A parser, not yet a corpus, but raw older Dutch text can be entered through a web interface, and the parsed (syntactically analysed) text is returned • Uncorrected, but manual correction is possible

  5. Visualization • Gabmap: doing dialect analysis on the web • ADEPT project (CLARIN-NL) • Dialects (examples: Netherlands/Flanders and USA) • www.gabmap.nl, including tutorial, manual, video, FAQ, …

  6. Pronunciation distance • Gabmap: doing dialect analysis on the web

  7. Dendrogram • Gabmap: doing dialect analysis on the web

  8. Audio • CLARIN pilot (NL/FL): TTNWW, audio part • TAAL2SPRAAK (CLARIN-Vlaanderen) • Audio as a means to increase the accessibility of larger collections of data (tapes) • Transcription, even if not 100% correct, is very helpful in finding what you are looking for, especially if synchronized with time (useful for psychology, sociology, history)

  9. Audio and older texts • Digitization of old texts is still problematic (cf. DigiHIST) • Experiment: read a medieval text aloud and have it automatically transcribed (not trained, modern language model used)

  10. Audio: Leuvense Schepenbank • http://www.ccl.kuleuven.be/CLARIN/SAL8130_0093_inge_moris.hardsubs.mp4 • http://www.ccl.kuleuven.be/CLARIN/SAL8130_0093_inge_moris_4gr.pdf • Raw material!

  11. Written part TTNWW • Relate documents, make texts more accessible by making explicit data that are not expressed as such • “Paris formulated objections, London/John didn’t”: what is a name, and what kind of name is it? • Analysis of names in fiction • Sagalassos project (archaeology): temporal and geospatial analysis web service, end of 2012

  12. Some more examples • When is ‘now’? And where?

  13. Stylometry • Stylene (CLARIN-Vlaanderen) • UAntwerpen/Univ. College Gent • Is the text as a whole written by the same person? • Show the development in style of a specific author • Is a text clear? Is it really understandable by, say, children aged 10-12? → Web service (autumn 2012)

  14. ‘Stylometry’ as means • Is thesis X written by the student or by ‘Wikipedia’? • Reliability • Can text X have been written by a 10-year-old girl? → paedophilia

  15. Reusability of data • For the same kind of research • For a completely different kind of research • Both should be encouraged (time and money) • To be taken into account: IPR!

  16. Veterans project • Interviews with veterans of Dutch military actions (1940-2010) • 1000 interviews (2.5 h), semi-structured • Originally intended for social and military historians • Who else can use this archive? • First: reluctance

  17. Veterans 2 • People from diverse disciplines were invited to write a paper (theology, psychology, discourse analysis, anthropology, sociology, …) • Turned out to be a very valuable corpus! • Digital Humanities aspect: several tools were made available to facilitate research in different disciplines, including tools to give access to spoken content

  18. “Circulation of Knowledge” • “Geleerdenbrievenproject” (Letters of scientists) • 17th century: Grotius (Hugo de Groot), Constantijn Huygens, Christiaan Huygens, Descartes, … • 20,000 letters, mainly Dutch, French, Latin • Intended for “history of science”, but of course also relevant for other disciplines

  19. Polish example: Sejm • Polish parliament, 1918 – now • Texts, records, video • Goal: all kinds of linguistic research • But of course: a wealth of information for other disciplines as well

  20. Conclusions • Several ‘easy-to-use’ research possibilities are (or will soon be) available • Others are still more complex, but do offer possibilities for new kinds of projects (or easier ways of doing research) • Lots of material could be used by third parties as well: do not keep stuff “in your drawer” • Students and (young) researchers should be made aware of new possibilities

  21. Sound Registers (1739-1799)
