
„Metadata“

„Metadata“: The DRIVER experience and the OpenAIRE direction. The metadata scope of this talk: metadata is a multifaceted thing and you can do many beautiful things with it… The focus is on DRIVER and OpenAIRE.


Presentation Transcript


  1. „Metadata“: The DRIVER experience and the OpenAIRE direction Metadata-Workshop | Nijmegen | 7/8-SEP-2010 | Wolfram Horstmann

  2. The metadata scope of this talk • Metadata is a multifaceted thing and you can do many beautiful things with it… • Focus on DRIVER and OpenAIRE • Metadata for Research Publications but also administrative, authority files, terminologies etc. • Format: Simple DC but also DIDL, OAI-ORE, RDF… • Protocol: OAI-PMH but also Feeds, Syncs… • Function: aggregation & search but also deploy, mine …
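The combination of OAI-PMH as protocol and Simple DC as format named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from DRIVER or OpenAIRE: the base URL, the sample response, and the helper names `list_records_url` and `parse_records` are assumptions made for the example. Only the `ListRecords` verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical response with one record; a real harvester would fetch this
# over HTTP and follow resumption tokens, which this sketch leaves out.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:123</identifier>
        <datestamp>2010-09-07</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:date>2010</dc:date>
          <dc:type>article</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per record: its identifier plus all Simple DC fields."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        for el in rec.iterfind("oai:metadata/oai_dc:dc/*", NS):
            name = el.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
            fields.setdefault(name, []).append((el.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

Every Simple DC element is repeatable, which is why each field maps to a list of values rather than a single string.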

  3. A coarse genealogy (timeline figure, 2007–2011: Portal, D-NET v1.0, D-NET v1.2)

  4. The Beginnings & Essentials • Since 2004, originally a service for researchers at Bielefeld University for finding documents in repositories distributed across the globe • In the meantime used world-wide • Indexing >25 million docs from >1500 sources • Simple, pragmatic, informal and independent; minimal effort but high reliability and value • Mostly OAI-PMH > Synergies with DRIVER • Now work on Thesauri, Mining, Syncing etc.

  5. Lessons learnt • OAI-PMH/Simple DC allows an effective search engine with immediate added value • Many years of operation show that even simple, distributed approaches require a lot of care and patience • Heterogeneity of distributed resources introduces ambiguity and requires service-side effort • Over 1000 profiles and processing pipelines for sources • Negative effects attenuated by display for humans • „users know what they see, when they see it“ • Main drawbacks • Local data quality • Missing sharing and re-use between service providers • „Repository Infrastructure“ needed
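The point about per-source profiles and processing pipelines can be illustrated with a small sketch. It assumes that a profile is simply an ordered list of normalisation steps applied to a harvested record; the slide does not describe how the actual profiles work. The `SourceProfile` class, the step functions, and the language alias table are all hypothetical names introduced for this example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, List[str]]      # Simple DC field name -> list of values
Step = Callable[[Record], Record]  # one normalisation step

# Hypothetical alias table; a real service would need a far larger one.
LANGUAGE_ALIASES = {"en": "eng", "english": "eng",
                    "de": "deu", "ger": "deu", "german": "deu"}


def drop_empty_values(record: Record) -> Record:
    """Trim whitespace and remove empty values and empty fields."""
    cleaned = {k: [v.strip() for v in vs if v and v.strip()]
               for k, vs in record.items()}
    return {k: vs for k, vs in cleaned.items() if vs}


def normalize_language(record: Record) -> Record:
    """Map free-text language values onto one code per language."""
    out = dict(record)
    if "language" in out:
        out["language"] = [LANGUAGE_ALIASES.get(v.lower(), v)
                           for v in out["language"]]
    return out


def reduce_date_to_year(record: Record) -> Record:
    """Keep only the four-digit year of each date value, if one is found."""
    out = dict(record)
    years = []
    for value in out.get("date", []):
        match = re.search(r"\b(\d{4})\b", value)
        if match:
            years.append(match.group(1))
    if years:
        out["date"] = years
    return out


@dataclass
class SourceProfile:
    """Ordered processing pipeline for one harvested source."""
    source_id: str
    steps: List[Step] = field(default_factory=list)

    def apply(self, record: Record) -> Record:
        for step in self.steps:
            record = step(record)
        return record


if __name__ == "__main__":
    profile = SourceProfile(
        "repo-a", [drop_empty_values, normalize_language, reduce_date_to_year])
    raw = {"title": [" A title "], "language": ["English"],
           "date": ["2010-09-07"], "subject": ["", "  "]}
    print(profile.apply(raw))
```

Keeping each step a plain function makes it cheap to give a source its own ordering or an extra fix-up step, which is one way to read the "over 1000 profiles" figure: the effort grows with the number of sources, not with the complexity of any single step.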

  6. A coarse genealogy (timeline figure, 2007–2011: Portal, D-NET v1.0, D-NET v1.2)

  7. The DRIVER initiative for networking repositories 2007-2009

  8. DRIVER Objectives: Infrastructure! • Organisational structures for repositories • e.g. the „Confederation“ • Improving quality and standards in local repositories • e.g. guidelines and validation procedures • Building a distributed infrastructure for metadata • e.g. service and function sharing • Target Groups • Repository Managers • Service Providers • Information System Executives

  9. What infrastructures are: DRIVER terms • Not an infrastructure • Single repository • Single application for search and retrieval (e.g. BASE) • Only local operation • Backwards causation on repositories is missing • Maybe an infrastructure • Distributed repository landscape as a whole • As a capacity for emergent properties, e.g. quality and quantity incentive for data population • Nurturing development of service providers • Definitely an infrastructure • Many service providers in one organisational and technical context (e.g. run-time environment) • Enabling re-use and remix of data and services

  10. The DRIVER approach was incremental • Start with publication metadata • Existing distributed system, somehow connected • Considerable homogeneity and formats: OAI-PMH • Extend geographical coverage • From 5 countries, to 10, to 27, to ??? • Extend towards other contents • From publication metadata to enhanced publications, i.e. representations of „texts + data“ • Learn about subject specificity • Data bring in disciplinary requirements

  11. The DRIVER Initiative • DRIVER-I 6/2006 – 11/2007 • Organisational Models and Technical Test-Bed • DRIVER-II 12/2007 – 11/2009 • Running Organisation and Production Infrastructure • DRIVER-Confederation and Technical Service 2010ff • Organisation and Technical Deployment

  12. Some Results: Studies

  13. Some Results: Guidelines • Build on knowledge from past & current IR projects (EU) • 26 actively involved contributors (experts and repository managers) from 8 countries. • Practical answers on how to: • Improve full-text access • Standardize metadata quality • Create a reliable infrastructure for permanent identification, resolution, traceability and storage • Resolve semantic and classification issues

  14. Some Results: A Portal

  15. Some Results: A Search

  16. Some Results: Repository Registration

  17. Some Results: Support structures

  18. Some Results: Repositories

  19. Some Results: Service-Oriented-Arch. • 9 hosting nodes • 25+ functionality typologies (services) • 36 service instances + other applications: Spain, Slovenia, EFG …

  20. Some Results: Runtime-System & Hosting (architecture figure; labels: National portals, Advanced User Interfaces, Project Applications, End users, Functionality Layer, EU Open Access Repositories, Data Layer, Administrators, Enabling Layer)

  21. Some Results: A software • Meant for large service providers only!

  22. Lessons learnt • Distributed data infrastructure requires links between organisational and technical concepts • Data specialists, computer scientists, service providers • Guidelines / content policies as a „glue“ • In distributed data provision, quality and access measures are the most ‚expensive‘ tasks • Infrastructure AND data focus very demanding • Distributed service operation (not data provision) can be solved but asks novel questions (SLAs) • Infrastructure is there, applications are next…

  23. Metadata aspects in DRIVER • OAI-PMH/SimpleDC corroborated • Necessity for other extensions shown • Administrative (CRIS): ‚project‘, ‚funder‘ • Subject-specific: NLM, PACS etc. • Authority files: institutions, journals, authors… • Enhanced Publications = Text + Data • Aggregation-Encoding: DIDL, OAI-ORE • Introduce preservation-challenges • Necessity for different Service-Typology

  24. A coarse genealogy (timeline figure, 2007–2011: Portal, D-NET v1.0, D-NET v1.2)

  25. Primer

  26. OpenAIRE Assignment • OpenAIRE: Open Access Infrastructure for Research in Europe • Objective: Support the Open Access Pilot of the EC & ERC (practical implementation of „clause 39“) • European Helpdesk: National Nodes • Repository Infrastructure: Deposit-Multiplexer • Research on Metadata, Impact & Disciplines

  27. OpenAIRE factsheet: Open Access Infrastructure for Research in Europe • Programme: FP7 – Research Infrastructures • Starting date: December 1, 2009 • Duration: 36 months • Budget: 4.1 Million • 38 partners covering all European member-states • To be reached at www.openaire.eu

  28. European Helpdesk • Promote FP7-pilot and ERC OA guidelines • National Open Access Liaison Offices (27 countries) • Provide OA “toolkits” for • Researchers • Institutions • Set up a 24/7 portal for deposit and search of OA publications • Liaison with • Other European OA initiatives • Publishers • CRIS systems

  29. Liaison Offices • Region 1 North (DTU): Denmark (Danish Technical University), Finland (University of Helsinki), Sweden (National Library of Sweden) • Region 2 South (UMINHO): Cyprus (University of Cyprus), Greece (National Documentation Center), Italy (CASPAR), Malta (Malta Council for Science & Technology), Portugal (University of Minho), Spain (Spanish Foundation for Science & Technology) • Region 3 East (eIFL): Bulgaria (Bulgarian Academy of Sciences), Czech Republic (Technical University of Ostrava), Estonia (University of Tartu), Hungary (HUNOR), Latvia (University of Latvia), Lithuania (Kaunas Technical University), Poland (ICM – University of Warsaw), Romania (Kosson), Slovakia (University Library of Bratislava), Slovenia (University of Ljubljana) • Region 4 West (UGENT): Austria (University of Wien), Belgium (University of Gent), France (Couperin), Germany (University of Konstanz), Ireland (Trinity College), Netherlands (Utrecht University), UK (SHERPA)

  30. Supporting Repository Infrastructure • OpenAIRE portal built on D-NET • Access to scientific publications • Search, browse • Visualization tools • Deposition of articles • Set up a repository for homeless researchers (INVENIO) • Multiplexer for OA publications in existing repositories • Provide monitoring tools for • Document/depositing statistics • Usage statistics from repository infrastructure • Interoperation with other infrastructures

  31. OpenAIRE system in a nutshell • OpenAIRE overall overview: functionalities and domains served

  32. Explorative activities JRA • Interoperability for usage statistics / metrics and administrative research information systems (CRIS/CERIF) • Explore the requirements, practices, incentives, workflows, data models, and technologies to deposit, access, and otherwise manipulate research datasets • Work with four (4) scientific communities • Health (Life Sciences) • Environment • Information & Communication Science • Socio-economic Sciences and Humanities

  33. Metadata directions foreseeable • Repository compliance even more important than in DRIVER • Interface to administrative systems essential • E.g. EC project database • Authority files for authors, journals etc. • Exchange with others: arXiv, PubMed etc. • Data extensions will introduce new worlds
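What a repository compliance check might look like can be sketched as follows. The rule set here (four required fields and an ISO-style date) is an assumption chosen for illustration; it is not the DRIVER or OpenAIRE guidelines, and the names `check_record` and `compliance_rate` are hypothetical.

```python
import re
from typing import Dict, List

Record = Dict[str, List[str]]  # Simple DC field name -> list of values

# Hypothetical rule set, for illustration only.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier")
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY, YYYY-MM, YYYY-MM-DD


def check_record(record: Record) -> List[str]:
    """Return a list of human-readable problems; empty means compliant."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not any(v.strip() for v in record.get(name, [])):
            problems.append(f"missing {name}")
    for value in record.get("date", []):
        if value.strip() and not DATE_PATTERN.match(value.strip()):
            problems.append(f"unparseable date: {value}")
    return problems


def compliance_rate(records: List[Record]) -> float:
    """Share of records without any problem (0.0 for an empty list)."""
    if not records:
        return 0.0
    ok = sum(1 for r in records if not check_record(r))
    return ok / len(records)


if __name__ == "__main__":
    good = {"title": ["An example article"], "creator": ["Doe, Jane"],
            "date": ["2010-09-07"], "identifier": ["oai:repository.example.org:123"]}
    bad = {"title": ["Untitled"], "date": ["Sept. 2010"]}
    print(check_record(bad))
    print(compliance_rate([good, bad]))
```

Reporting problems per record, rather than a single pass/fail flag, gives repository managers something concrete to fix at the source, which is where the earlier slides locate the data-quality problem.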

  34. A coarse genealogy (timeline figure, 2007–2011: Portal, D-NET v1.0, D-NET v1.2)

  35. Conclusions • Metadata allow and require serious international infrastructure in research • Even very simple approaches unfold complexity in distributed systems • „Division of labour“ necessary • Keep an eye on trade-offs between specialized expertise and organisational overhead • Suggested approach: Simple and integrative rather than complex and integrated
