
From DOBES to CLARIN and beyond


  1.   From DOBES to CLARIN and beyond • Axel Horstmann (VolkswagenFoundation) • Peter Wittenburg (MPI for Psycholinguistics) • Erhard Hinrichs (University of Tübingen)

  2.   FACTS AND FIGURES • Non-profit-making foundation established under private law, based in Hanover • Not affiliated with the car manufacturer of the same name • Founded by the Governments of the Federal Republic of Germany and the State of Lower Saxony in 1961 • Objective: to support science and technology as well as the humanities and the social sciences in research and university teaching • Assets: about 2.45 billion euros • Funding p.a.: about 110 million euros • One of the most potent private research funding foundations in Europe

  3.   FOCUS ON HUMANITIES AND SOCIAL SCIENCES • Current funding initiatives (see KURZINFORMATION / BASIC INFORMATION): about 45 to 50 % of the funds given to H&SC • Initiatives focussing on infrastructural support of H&SC: Kulturwissenschaftliche Dokumentation (closed); Archive als Fundus der Forschung (closed); DOBES: Dokumentation bedrohter Sprachen (Documentation of Endangered Languages) • Projects including infrastructural support of H&SC: strategy building on digitization of endangered books; digitization of the so-called “Aschebücher” (“ash books”) of the HAAB Weimar (in preparation)

  4.   "E-HUMANITIES": POSSIBILITIES AND PERSPECTIVES • Strong interest in innovative approaches • Funds available for projects involving activities towards "E-Humanities" (e.g. digitization of data, collections, archival material) within current funding initiatives • Funding possibilities for meetings, workshops, conferences etc. focussing on "E-Humanities" (within the funding initiative Symposia and Summer Schools) • New perspectives on "E-Humanities" (possibly) opened up within a new funding initiative aiming at Research in Museums (currently in planning), including to a certain extent digitization activities • ... and not to forget the Flagship "DOBES" ...

  5.   Concrete steps or Babylonian Tower • we don’t know exactly what eHumanities means • we feel that mechanisms in research processes are changing rapidly, with technological innovation as the motor • but we can’t say: “we are now going to design eHumanities” • we probably can say: “let’s plan further concrete projects and actions and see” • many excellent projects around; let me just refer to the good sides of DOBES as one of these steps (Documentation of Endangered Languages, funded by VolkswagenFoundation)

  6.   What is DOBES? • 44 DOBES teams working fully distributed and self-organized, incl. linguists, anthropologists, musicologists, ethno-biologists, etc. • In addition, VWF installed a central archive • Start in 2000

  7.   What changed in DOBES? • handing over all data after a limited time to an archive was completely new and is an explicit step, although the results will not be ready • there is a push to make data accessible to others from the beginning - also new for many and not without conflicts • asking researchers to categorize and organize material according to agreed metadata was also new and still requires evangelization • including multimedia in the documentation and dealing with audio/video as the basis was kind of new and requires techno-knowledge

  8.   Which infrastructure by DOBES? • a stable, reliable and open repository/archiving system handling 30 TB • data storage not encapsulated and in open formats • introduction of persistent identifiers to secure investments in relating fragments • a network of 12 centers worldwide included in data distribution • of these, 6 copies are held in centers with a hardware migration strategy • a number of web-based applications offering various ways to access the data
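
A rough illustration of why persistent identifiers secure the investment in relating fragments: references store only the identifier, and a resolver maps it to the current location, so moving a file means updating one registry entry rather than every reference. The sketch below is a minimal in-memory model in Python. The `PidRegistry` class, the `hdl:example` prefix, the file names and the URLs are all hypothetical and chosen for illustration; this is not the DOBES archive's actual implementation.

```python
class PidRegistry:
    """Toy in-memory PID registry (hypothetical, for illustration only)."""

    def __init__(self, prefix="hdl:example"):
        self.prefix = prefix      # assumed identifier prefix, not a real one
        self._locations = {}      # PID -> current URL of the resource
        self._next_id = 1

    def register(self, url):
        """Mint a new PID for a resource and record its current location."""
        pid = f"{self.prefix}/{self._next_id:04d}"
        self._next_id += 1
        self._locations[pid] = url
        return pid

    def update(self, pid, new_url):
        """Record that the resource behind an existing PID has moved."""
        if pid not in self._locations:
            raise KeyError(f"unknown PID: {pid}")
        self._locations[pid] = new_url

    def resolve(self, pid):
        """Return the current location for a PID."""
        return self._locations[pid]


registry = PidRegistry()
pid = registry.register("https://archive-a.example.org/session42/audio.wav")

# An annotation refers to the audio by PID only, never by URL.
annotation = {"transcript": "session42.eaf", "audio": pid}

# The file migrates to another center; only the registry entry changes.
registry.update(pid, "https://archive-b.example.org/media/session42.wav")

print(registry.resolve(annotation["audio"]))
# prints https://archive-b.example.org/media/session42.wav
```

The annotation is untouched by the migration, which is the point: the relations between fragments survive hardware and location changes as long as the registry is maintained.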

  9.   CLARIN/D-SPIN Challenges • eResearch is about global collaboration in key areas of science and the next generation of infrastructure that will enable it (J. Taylor) • goal is an open research infrastructure to overcome the huge fragmentation of language resources and tools and to offer them to research communities - in particular to the humanities • help tackle the LARGE challenges (multilingual societies) • but also help the individual researcher • example: align a transcription and an audio signal; how many researchers know how to do this? • see CLARIN/D-SPIN as a huge virtual marketplace of resources and tools that can be combined due to integration and interoperability solutions • not to forget Henry Thompson (one of the XML fathers) • we don't have an agreed descriptive system in our domain

  10.   CLARIN/D-SPIN Research Infrastructure • the basis of a big supermarket is classification and convincing organization principles • based on 10 years of experience we know that only a flexible component model will be accepted • we seem to go towards a Federation of LRT producers that can make contracts with Identity Federations • just one signature necessary to get all researchers integrated with their home identity • have already set up a first small test federation (EC-DAM-LR) • researchers' dream: virtual collection building and creating workflows flexibly - not trivial due to import/export aspects • LREC showed that we already know a lot about the problem

  11.   CLARIN/D-SPIN Network of Service Centers • need a network of strong and persistent centers of a "new" type • researchers will only adapt if they can rely on new mechanisms • need to simplify the IPR/license situation

  12.   towards eHumanities • CLARIN has > 100 members from 32 countries • in Germany 9 well-known centers, and some more will join • it is an enormous challenge to make a real step ahead in CLARIN • can we all together extend to an eHumanities infrastructure or are we already close to collapse?

  13.   a few questions I • will there be a separate infrastructure for each H discipline? • NO • there will be several shared services such as a PID registration and resolution service • however: building a joint infrastructure has to do with community building, trust, common language etc • communities that are too big would not work • so let's move on in TextGrid, DARIAH, CLARIN etc • but let's keep in close and fair contact to find synergies • competition will become fierce and our competitors are the Googles of the world!

  14.   a few questions II • will there be a single market place for the humanities? • NO • acceptance of a market place is dependent on classification and organization principles - as already said • these are different in all disciplines • so we have to start from the disciplines in our solutions • already difficult enough • leave it to the Semantic Web guys to enable cross-walks

  15.   a few questions III • who will be the main players? • of course the big libraries, archives and museums • but what about the universities and big organizations such as MPG? • important: we see new requirement profiles emerging • a kind of job sharing can be predicted • of course: close collaboration with innovative libraries such as SUB etc is required • highly specialized groups: highly specialized MPI departments • content centers: a number of domain MPIs • curation centers: MPDL + few domain MPIs • computer centers: RZG, GWDG

  16.   a few questions IV • key bricks for interoperability? • we need open registries of all sorts and smart registry frameworks: schema registries; concept registries (ISOcat - a creation of ISO TC37/SC4); relation registries; etc • however: a very complex landscape seems to emerge • how to make it usable by laymen? • how to convince researchers to work with them? • no one knows yet - we need to try out - what else?

  17.   Summary • we need initiatives again and again to advance the borders step by step • it is now also time to transform existing knowledge into persistent infrastructures • this will need a lot of sensitivity and patience - RI building takes time • emerging landscapes will have an underlying complexity • need to offer discipline vocabulary • need to hide complexity to a certain extent • need to offer persistency • Project solutions are not per se useful as infrastructure solutions!

  18.   End • in Germany we already have a good mixture with TextGrid, DOBES, eAqua, DARIAH and CLARIN/D-SPIN • have to get together frequently • Thanks for the attention.
