
eSciDoc, VIRR and Digitization Lifecycle - insights into an infrastructure for management of digitized resources

Natasa Bulatovic

Max Planck Digital Library

Research and Development


The Max Planck Digital Library (MPDL) in a Nutshell

  • Max Planck Digital Library (MPDL) is a service unit within the Max Planck Society (MPG)

  • MPG consists of about 80 institutes in three scientific sections

    • the Chemistry, Physics and Technology Section

    • the Biology and Medicine Section

    • the Human Sciences Section

  • The core activities of the MPDL lie in building up service infrastructure and tools for publications and research data

  • MPDL develops software solutions in close cooperation with scientists, librarians and technicians

  • In the Human Sciences Section, several institutes have digitized cultural artefacts and want to make them openly accessible


eSciDoc SOA Landscape


Which data are managed?


How?

  • PubMan – Publication Management

  • VIRR – Textual digitized resources management

  • IMEJI – Image management


PubMan: Management of publications


VIRR is about

  • Collaboration of the MPDL with the Max Planck Institute for European Legal History

  • Motivation: The period of the Holy Roman Empire produced an enormous corpus of legislative sources. Until now, no complete collection of these works exists.


ViRR Key features

  • Web-based collaborative application

  • Editor (bibliographic metadata, table of contents and structural metadata)

  • Viewer (online representation)

  • Browser


ViRR Editor

  • Combines a set of tools

    • Paginator

    • Table of Contents Editor

    • Metadata Editor

  • One complex, but flexible workspace

  • No default order for the usage of the tools


ViRR Editor - Paginator

  • Assign the logical page numbers to the physical ones

  • Choose between different formats (Arabic, Latin, custom)

  • Paginate manually or automatically
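
The automatic pagination described above can be pictured as a mapping from physical scan numbers to logical page labels. The following is only a minimal sketch, not ViRR's implementation: the function names are hypothetical, and it assumes that the "Latin" format on the slide means Roman numerals.

```python
def to_roman(number: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        while number >= value:
            parts.append(symbol)
            number -= value
    return "".join(parts)


def paginate(scan_count, first_scan, start_number=1, style="arabic"):
    """Map a run of physical scans to consecutive logical page labels.

    style is "arabic" or "roman" (assumed meaning of "Latin" on the slide).
    """
    labels = {}
    for offset in range(scan_count):
        number = start_number + offset
        label = to_roman(number) if style == "roman" else str(number)
        labels[first_scan + offset] = label
    return labels


# Hypothetical work: front matter on scans 1-4 (i-iv), main text from scan 5.
pagination = {
    **paginate(4, first_scan=1, style="roman"),
    **paginate(3, first_scan=5),
}
```

Manual pagination would then amount to overwriting individual entries of the mapping, for example for an unnumbered plate.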


ViRR Editor - ToC Editor

  • Capture the logical structure of a work by breaking it down into structural elements

  • Arrange the hierarchical order of structural elements in the tree

  • Assign scans to structural elements

  • Choose from fine granular structural element types (over sixty)
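
As an illustration of the points above, a table of contents can be modelled as a tree of typed structural elements, each holding the scans assigned to it. This is a sketch under assumptions: the class, field and element type names are invented for the example and are not taken from ViRR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuralElement:
    """One node of the logical structure (all names here are hypothetical)."""
    element_type: str                      # e.g. "chapter"; ViRR offers over sixty types
    title: str
    scans: List[int] = field(default_factory=list)
    children: List["StructuralElement"] = field(default_factory=list)

    def add(self, child: "StructuralElement") -> "StructuralElement":
        """Attach a child element and return it, so calls can be nested."""
        self.children.append(child)
        return child

    def outline(self, depth: int = 0) -> List[str]:
        """Render the subtree as indented lines, one per element."""
        lines = ["  " * depth + f"{self.element_type}: {self.title} (scans {self.scans})"]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


work = StructuralElement("monograph", "Example work")
chapter = work.add(StructuralElement("chapter", "Chapter 1", scans=[5, 6, 7]))
chapter.add(StructuralElement("section", "Section 1.1", scans=[5, 6]))
toc_lines = work.outline()
```

Rearranging the hierarchy is then a matter of moving a node from one `children` list to another.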


ViRR Editor – Metadata Editor

Assign descriptive metadata to structural elements

  • Detailed description of every structural element

  • Systematic browsing

  • Dedicated search will be possible


ViRR Viewer

  • Browse by ToC

  • Navigate to page

  • View metadata of structural element

  • Browse by scan

  • Page (web resolution); on click: page (full resolution)


ViRR: Sharing and reuse

http://virr.mpdl.mpg.de


From ViRR to Digitization Lifecycle Project

  • Goal

    • Support the complete Digitization Lifecycle with guidelines, standards, tools and a publishing platform

  • Partners:

    • MPI for European Legal History, Frankfurt

    • Kunsthistorisches Institut, Florence (KHI)

    • Bibliotheca Hertziana, Rome

    • MPI for Human Development, Berlin

  • Related projects:

    • ViRR (see http://colab.mpdl.mpg.de/mediawiki/ViRR:_Virtueller_Raum_Reichsrecht)

    • XML-Workflow (see http://colab.mpdl.mpg.de/mediawiki/MPDL_Project_XML_Workflow)


Imeji: Management of image collections


Imeji: repository of Digital Images

Organized into

  • Collections

    Created and defined by the institution, project, working group

  • Albums

    Created and defined by the researcher


Imeji: what is so different about it?

Imeji is not Flickr, nor Facebook...

  • Freely definable metadata profiles at collection level

  • Controlled Vocabularies may be integrated

  • Smart search for dates, ranges (based on the metadata type)

    Helps gather metadata more effectively

    Focuses on collaboration and metadata quality

    Repository: data can be exported at any time


eSciDoc and other services


eSciDoc SOA Landscape


eSciDoc core infrastructure

[Diagram: the handlers of the core infrastructure, grouped under Statistics, Security, and Resources & Data]

  • Report Handler

  • Report Definition Handler

  • Aggregation Definition Handler

  • Statistics Data Handler

  • Scope Handler

  • Admin Handler

  • Set Handler (OAI-PMH)

  • Item Handler

  • Container Handler

  • Content Relation Handler

  • Context Handler

  • Organizational Unit Handler

  • Content Model Manager

  • User Account Handler

  • Role Handler

  • Group Handler


CoNE Service

  • Manages named entities

    • Journals

    • Persons

    • Dewey Decimal Classification (3 public levels)

    • Creative Commons Licenses (CC licenses)

    • ISO 639-3 Languages

    • MIME Types

    • PACS classification

    • Custom classifications

  • Reuse

    • Data delivered in multiple formats (JSON, HTML, RDF/XML, Options list)

  • Motivation

    • Metadata quality: autosuggest components in solutions during metadata editing

    • Disambiguation: each entity is a named graph

    • Data linking: CoNE identifiers in publication metadata

    • Technical facilitation: all lists in one place

    • Persons: Researcher Portfolio

  • Extensions

    • Refresh data from external sources


CoNE – Control of Named Entities

http://cone.mpdl.mpg.de/

Example resource: http://pubman.mpdl.mpg.de/cone/persons/resource/persons2450

Content negotiation supported
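
Content negotiation means that the same resource URL can return different representations depending on the HTTP `Accept` header. The sketch below only builds such requests and does not send them. The media types are assumptions based on the formats named on the CoNE slide (JSON, HTML, RDF/XML); the exact types the service accepts are not stated in the presentation, and the URL may no longer resolve.

```python
from urllib.request import Request

# Resource URL taken from the slide.
CONE_RESOURCE = "http://pubman.mpdl.mpg.de/cone/persons/resource/persons2450"

# Assumed media types for the formats listed on the CoNE slide.
MEDIA_TYPES = {
    "json": "application/json",
    "html": "text/html",
    "rdf": "application/rdf+xml",
}


def build_request(fmt: str) -> Request:
    """Build (but do not send) a request asking for one representation."""
    return Request(CONE_RESOURCE, headers={"Accept": MEDIA_TYPES[fmt]})


rdf_request = build_request("rdf")
html_request = build_request("html")
```

Passing one of these objects to `urllib.request.urlopen` would perform the actual call.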


Transformation Service

  • Transforms textual data formats

    • Metadata

    • Resources

    • Standard formats

    • Specific formats (e.g. EndNote custom fields)

  • Motivation

    • Migration of data from MPI

    • Exports and dissemination

    • Imports

    • Continuous interoperability enhancement

    • Implement once, use wherever needed
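
The "implement once, use wherever needed" idea can be sketched as a registry that maps a (source format, target format) pair to one conversion function, which imports, exports and migrations then share. Everything here is hypothetical: the registry, the `record` input shape and the function names are illustrations, not the eSciDoc Transformation Service API.

```python
# Registry of conversions, keyed by (source format, target format).
TRANSFORMERS = {}


def transformer(source, target):
    """Decorator that registers a conversion function in the registry."""
    def register(func):
        TRANSFORMERS[(source, target)] = func
        return func
    return register


@transformer("record", "bibtex")
def record_to_bibtex(record):
    """Render a simple dict-based record (hypothetical shape) as a BibTeX entry."""
    fields = ",\n".join(f"  {key} = {{{value}}}" for key, value in record["fields"].items())
    return f"@{record['type']}{{{record['key']},\n{fields}\n}}"


def transform(data, source, target):
    """Look up the registered conversion and apply it."""
    try:
        func = TRANSFORMERS[(source, target)]
    except KeyError:
        raise ValueError(f"no transformation from {source} to {target}") from None
    return func(data)


bibtex = transform(
    {"type": "article", "key": "example2011", "fields": {"title": "An example"}},
    "record",
    "bibtex",
)
```

A new format then needs only one registered function to become available to every caller of `transform`.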


Search & Export Service / Citation Style Manager

  • Searches and exports results

    • Citation styles (Citation style manager)

    • EndNote

    • BibTeX

  • Reuse

    • Data delivered in multiple formats (PDF, HTML, XML, ODT)

    • By external systems (content management, WordPress)

  • Motivation

    • Search results should be available in various outputs

    • One service – many presentations (e.g. WordPress plug-in)

    • One interface – easy inclusion of various export formats


Syndication Service

  • Provides the latest data updates as feeds

    • RSS

    • Atom

  • Reuse

    • Subscription to feeds and data reuse

    • By any external clients

  • Extensions

    • Media RSS

Feeds:

<feed>

<!--The title of the feed -->

<title>Recent releases in repository</title>

<!--Feed's description -->

<description>Recent releases in repository (item versions)</description>

</feed>

[Diagram: the Syndication Service handles a feed request in four numbered steps. Steps 1 and 4 are unlabelled in the transcript; step 2 is "Get feed definition"; step 3 is "Search/retrieve items" from the eSciDoc Repository.]
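
Steps 2 and 3 of the diagram can be sketched as follows: read the feed definition, retrieve items, and render an Atom-style document. This is a simplified illustration, not the Syndication Service code. The `retrieve_items` function is a stand-in for the repository search, its sample item is invented, and the output omits elements that a fully valid Atom feed requires at feed level (such as `id` and `updated`).

```python
import xml.etree.ElementTree as ET

# Feed definition as shown on the slide.
FEED_DEFINITION = """
<feed>
  <title>Recent releases in repository</title>
  <description>Recent releases in repository (item versions)</description>
</feed>
"""

ATOM_NS = "http://www.w3.org/2005/Atom"


def retrieve_items():
    """Stand-in for step 3 (search/retrieve items); returns invented sample data."""
    return [{"id": "urn:example:item-1", "title": "Example item",
             "updated": "2011-01-01T00:00:00Z"}]


def build_atom_feed(definition_xml, items):
    """Step 2: read the feed definition, then render the items as Atom entries."""
    definition = ET.fromstring(definition_xml)
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = definition.findtext("title")
    ET.SubElement(feed, f"{{{ATOM_NS}}}subtitle").text = definition.findtext("description")
    for item in items:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        for name in ("id", "title", "updated"):
            ET.SubElement(entry, f"{{{ATOM_NS}}}{name}").text = item[name]
    return ET.tostring(feed, encoding="unicode")


atom_xml = build_atom_feed(FEED_DEFINITION, retrieve_items())
```

An RSS rendering would reuse the same definition and items with a different serializer, which is one reason to keep the feed definition separate from the output format.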


Validation service

  • Semantic validation

  • Contextual validation

  • Validation rule editor (upcoming)


Data acquisition service

  • Fetches data from known sources via identifier (unAPI interface)

  • Transforms data to other format
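
In unAPI, a client asks one endpoint for the formats available for an identifier (`id` only) or for the record itself (`id` plus `format`). The sketch below builds such URLs. The endpoint, the identifier and the format name are placeholders, not values from the presentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real service URL is not given in the presentation.
UNAPI_ENDPOINT = "http://example.org/unapi"


def unapi_url(identifier=None, fmt=None):
    """Build an unAPI request URL.

    No parameters: list all formats the server supports.
    id only: list the formats available for that identifier.
    id and format: fetch the record in that format.
    """
    params = {}
    if identifier is not None:
        params["id"] = identifier
    if fmt is not None:
        params["format"] = fmt
    return UNAPI_ENDPOINT + ("?" + urlencode(params) if params else "")


formats_url = unapi_url("escidoc:1234")           # placeholder identifier
record_url = unapi_url("escidoc:1234", "bibtex")  # placeholder format name
```

The fetched record could then be converted to another format, which is the second task listed above.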


PubMan SWORD Server

  • Deposit of data packages (metadata and full texts)

  • Logic implements a PubMan-specific workflow


PID Cache manager

  • Fetches Handles from the GWDG Handle System (dummy resolution)

  • Assigns a pre-fetched handle to the resource

  • Synchronizes the assigned handle with the resolution to a resource in the Handle system

EPIC – European Persistent Identifier Consortium (GWDG Germany, SARA Netherlands, CSC Finland, http://www.pidconsortium.eu/ )
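
The three PID Cache manager steps above (pre-fetch handles with a dummy resolution, assign one to a resource, synchronize the real resolution later) can be sketched as a small in-memory cache. This is an illustration under assumptions: the class, the callables and the handle strings are hypothetical, and the calls to the GWDG Handle System are replaced by local stand-in functions.

```python
from collections import deque


class PidCache:
    """Keeps a pool of pre-fetched handles and defers updating their resolution."""

    def __init__(self, fetch_handle, update_handle, size=3):
        self._fetch_handle = fetch_handle      # callable returning a new handle
        self._update_handle = update_handle    # callable(handle, url) setting the target
        self._size = size
        self._free = deque()
        self._pending = deque()
        self.refill()

    def refill(self):
        """Step 1: pre-fetch handles until the pool is full."""
        while len(self._free) < self._size:
            self._free.append(self._fetch_handle())

    def assign(self, resource_url):
        """Step 2: hand out a cached handle immediately, queue its real target."""
        if not self._free:
            self.refill()
        handle = self._free.popleft()
        self._pending.append((handle, resource_url))
        return handle

    def synchronize(self):
        """Step 3: push queued resolutions to the handle system."""
        count = 0
        while self._pending:
            handle, url = self._pending.popleft()
            self._update_handle(handle, url)
            count += 1
        return count


# In-memory stand-ins for the remote handle system (hypothetical).
registry = {}
counter = iter(range(1, 1000))


def fake_fetch():
    handle = f"hdl:0000/test-{next(counter)}"
    registry[handle] = "dummy"
    return handle


def fake_update(handle, url):
    registry[handle] = url


cache = PidCache(fake_fetch, fake_update)
assigned = cache.assign("http://example.org/item/1")
synced = cache.synchronize()
```

The point of the design is that assigning an identifier does not wait on the remote handle system; only the later synchronization does.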


A note on the metadata profiles

  • DCAP based (Dublin Core Application Profile)

  • DC terms (identified by URIs)

  • eSciDoc solution specific terms (identified by URIs)

  • METS/MODS

  • Publicly available

    • Functional description http://colab.mpdl.mpg.de/mediawiki/ESciDoc_Application_Profiles

    • Schemas http://metadata.mpdl.mpg.de/escidoc/metadata/schemas/0.1/

  • Interoperability levels

    • Shared term definitions (done)

    • Semantic interoperability (done)

    • Description set syntactic interoperability (prepared)

    • Description set profile interoperability (prepared)


Premises

  • Applications

    • Web-based

    • Internationalized

    • Integrated Help system

    • Easy to use

    • Easy to install

  • Services and infrastructure

    • Reusable, interoperable, composed, technology-independent

    • Extensible, scalable and performant

  • Data

    • Persistently identified, versioned, discoverable, provenance and authenticity information, fine-grained authorization

    • Described with published metadata profiles

    • Interoperable and enabled for reuse and repurpose


Related projects and new developments

  • DARIAH

    Digital Research Infrastructure for Arts and Humanities (see http://dariah.eu)

    • Imeji

  • AWOB

    • Astronomers Workbench

  • Resource Registries

  • ECHO – European Cultural Heritage Online

    (see http://echo.mpiwg-berlin.mpg.de/home)


Thank you!

  • [email protected]

    http://colab.mpdl.mpg.de

    http://escidoc.org

