
Batch metadata update Draft

Batch metadata update (Draft). Natasa Bulatovic, 14.06.2010.

Presentation Transcript


  1. Batch metadata update (Draft). Natasa Bulatovic, 14.06.2010

  2. What we need
     • eSciDoc Repository
     • Very fast metadata updates
     • RDF metadata (preferred)
     • Searching, indexing
     • Versioning (not a high requirement for metadata)
     • AA (authentication and authorization)
     • Relations, linking, etc.

  3. How we can achieve it
     • eSciDoc batch metadata update is very slow
     • Metadata should be kept in a separate store
     • But splitting it completely from the eSciDoc repository would be a disadvantage, as metadata and content would then not be considered a single resource
     • Drawback: this proposal covers only item-level metadata (container- and component-level metadata are not covered)

  4. What we can use
     • eSciDoc handlers
     • Current services (AA, indexing, etc.)
     • eSciDoc component with external URL

  5. How - 1?
     [Diagram. Core services: Container Handler, Item Handler. Additional (or core) service: Metadata Handler. Stores: eSciDoc Repository, eSciDoc Metadata Store.]

  6. How - 2?
     • Some items (Cmodel based) would store their metadata in an eSciDoc metadata store (link to a graph node in the metadata store)
     • Current services (AA, indexing, etc.)
     • eSciDoc component with external URL
     • Metadata Handler (additional or core service):
       GET metadata/escidoc:25/metadata-records/md-rec-1
       POST/PUT metadata/escidoc:25/metadata-records/md-rec-1
     [Diagram. Core services: Container Handler, Item Handler, on the eSciDoc Repository. eSciDoc Metadata Store (RDF): graph nodes MD-Face-1 and MD-Face-2 with the values happiness, young, female/male. Item components: Component 1 (image/fulltext), internal-managed; Component 2 (image/fulltext), external-url, pointing to external content (e.g. supplementary material); Component 3 (metadata record), external-url.]
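
The two Metadata Handler calls on this slide can be sketched as follows. This is only an illustration under assumptions: the in-memory class stands in for the RDF store, and the predicate names (`emotion`, `age-group`, `gender`) are hypothetical, since the slide shows only the values. The path layout is the one given on the slide.

```python
from dataclasses import dataclass, field


def md_record_path(item_id: str, record_name: str) -> str:
    """Build the resource path shown on the slide for one metadata record."""
    return f"metadata/{item_id}/metadata-records/{record_name}"


@dataclass
class InMemoryMetadataStore:
    """Hypothetical stand-in for the RDF metadata store: path -> set of triples."""
    graphs: dict = field(default_factory=dict)

    def put(self, path: str, triples: set) -> None:
        # PUT replaces the whole record in place.
        self.graphs[path] = set(triples)

    def get(self, path: str) -> set:
        # GET returns a copy; unknown paths yield an empty set.
        return set(self.graphs.get(path, set()))


store = InMemoryMetadataStore()
path = md_record_path("escidoc:25", "md-rec-1")
store.put(path, {
    ("MD-Face-1", "emotion", "happiness"),   # predicate names are assumptions
    ("MD-Face-1", "age-group", "young"),
    ("MD-Face-1", "gender", "female"),
})
```

In the proposal, Component 3 of the item would hold this path as its external URL, so the item in the repository and the graph in the metadata store stay linked.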

  7. How - 3?
     • We would have to implement our own PDP (policy decision point) for metadata updates, but the AA rules/policies are already stored in eSciDoc AA
     • eSciDoc AA properties: context-id, created-by, last-date-of-modification, public-status
     • Not completely clear yet; we need to work on this, but it is sufficient for a start
     • Metadata Handler (additional or core service):
       GET metadata/escidoc:25/metadata-records/md-rec-1
       POST/PUT metadata/escidoc:25/metadata-records/md-rec-1
     [Diagram. Core services: Container Handler, Item Handler, on the eSciDoc Repository. eSciDoc Metadata Store (RDF): graph node MD-Face-1 (happiness, young, female), Last-modification-date-metadata, eSciDoc-AA-properties. Item components: Component 1 (image/fulltext), internal-managed; Component 2 (image/fulltext), external-url, pointing to external content (e.g. supplementary material); Component 3 (metadata record), external-url.]
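
A minimal sketch of what a PDP check for the metadata store might look like, using only the four AA properties listed on this slide. The decision rule itself (creator or editor of the context may update, withdrawn items may not) is an assumption made for illustration; the real rules and policies would come from eSciDoc AA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AAProperties:
    """The eSciDoc AA properties named on the slide, copied next to the metadata."""
    context_id: str
    created_by: str
    last_date_of_modification: str
    public_status: str


def may_update_metadata(user_id: str, editable_contexts: set, props: AAProperties) -> bool:
    """Hypothetical policy: decide whether user_id may update a metadata record."""
    # Withdrawn items cannot be modified any longer.
    if props.public_status == "withdrawn":
        return False
    # Assumed rule: the creator, or anyone allowed to edit the item's context.
    return props.created_by == user_id or props.context_id in editable_contexts


props = AAProperties("escidoc:ctx1", "user:alice", "2010-06-14", "released")
allowed = may_update_metadata("user:bob", {"escidoc:ctx1"}, props)
```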

  8. Possible workflows (ingest)

  9. Possible workflows (metadata batch update)
     • Item statuses in eSciDoc core are independent of updates in the metadata store
     • Only prerequisite: withdrawn items cannot be modified any longer (must be checked)
     • Modifying content via external URL does not version the resource
     • If needed, versioning can be implemented on the same principle (in that case all metadata versions are kept in the metadata store)
     • Filters and search restricted to the metadata store have to be implemented separately
     • Additionally, the eSciDoc search service works with content referenced by external URL (according to FIZ); we might have to adapt the full-text indexing a bit (to be checked with FIZ)
     • Submit/Release/Withdraw remain purely eSciDoc operations, as so far
     • Who can update? Everyone who can also update in eSciDoc; we have to implement the PDP for the MDStore
     • Bookmarking: as before (only difference: via eSciDoc, metadata are retrieved as content via the locator)
     • The metadata store must be as persistent as escidoc-core
     • See notes on locking on slide 13
     [Diagram steps: Start metadata updates, then Finish metadata updates]
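
The core of this workflow can be sketched in a few lines. The function and variable names are hypothetical, and plain dictionaries stand in for the metadata store and for the item statuses held in eSciDoc core. The sketch shows two points from the slide: withdrawn items are checked and skipped, and a record is overwritten in place without creating a new version.

```python
def batch_update(md_store: dict, item_status: dict, updates: dict):
    """Apply {item_id: triples} to md_store; return (updated, skipped) item ids."""
    updated, skipped = [], []
    for item_id, triples in updates.items():
        # Only prerequisite from the slide: withdrawn items must not be modified.
        if item_status.get(item_id) == "withdrawn":
            skipped.append(item_id)
            continue
        # Overwrite in place: the item in eSciDoc core is not versioned.
        md_store[item_id] = set(triples)
        updated.append(item_id)
    return updated, skipped


md_store = {"escidoc:25": {("MD-Face-1", "gender", "female")}}
item_status = {"escidoc:25": "released", "escidoc:26": "withdrawn"}
updates = {
    "escidoc:25": {("MD-Face-1", "gender", "male")},
    "escidoc:26": {("MD-Face-2", "gender", "female")},
}
updated, skipped = batch_update(md_store, item_status, updates)
```

The item statuses are only read here, never changed, which matches the statement that statuses in eSciDoc core are independent of updates in the metadata store.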

  10. Possible workflows (metadata batch update, option)
     • After items/containers are unlocked, they can be released again
     • During this release, metadata records can (if necessary) be stored as an additional component of the item
     • This would again require some time to finish all operations, and needs to be tested
     • See notes on locking on slide 13
     [Diagram steps: Start metadata updates; Finish metadata updates; Release items/containers (option): grab the referenced content and create another component as XML/RDF internal-managed content in the eSciDoc item]

  11. What is missing in this draft?
     • Batch metadata editing for containers/components
       • Why: because containers/components cannot have components!
       • Potential workaround: each container has an md-record that contains only a link to the metadata store (but this is quite cumbersome)
       • Stage 2 for an escidoc-core extension could be: allow external metadata storage
     • Integrity: in stage 1 the metadata store could be a separate storage, so integrity would be harder to achieve
       • To check: maybe only allow it for released items?
       • Otherwise: the MDStore must implement integrity checking against eSciDoc (e.g. if items in eSciDoc were deleted, the MDStore would still have the graph)
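
The integrity problem in the last bullet amounts to finding graphs whose item no longer exists in eSciDoc. A minimal sketch, assuming both sides can list their item identifiers (the function name and the way the lists are obtained are hypothetical):

```python
def find_orphaned_graphs(mdstore_item_ids, escidoc_item_ids):
    """Return ids that still have a graph in the MDStore but no item in eSciDoc."""
    return sorted(set(mdstore_item_ids) - set(escidoc_item_ids))


# escidoc:26 was deleted in eSciDoc, but its graph is still in the MDStore.
orphans = find_orphaned_graphs(
    ["escidoc:25", "escidoc:26", "escidoc:27"],
    ["escidoc:25", "escidoc:27"],
)
```

Such a check could run periodically or be triggered by delete operations; which of the two fits better is open in this draft.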

  12. Which metadata should be managed in the MD Store?
     • Context vs. content model (Cmodel) level settings
       • Recommended: Cmodel level settings
       • Future options:
         • Utility: temporarily put MD in a Temporary MD Store for update (on a selected context, independently of Cmodel)
         • Can be applied to any resource
         • Requires lock of resources
         • Requires time to finish the batch-update operations
         • If not in Cmodel (if metadata are taken out for quick modification): items with updated records have to be batch-updated (possibly released, submitted) in escidoc-core (will take some time, but is possible)
     • Whether to store metadata in the MDStore or not?
       • Depends on use cases, e.g. whether users often need to do batch updates (whether that is actually part of normal work)
       • ToDo: find the recommended upper limit for batch updates in eSciDoc (5000-6000 items)
       • However, these would depend on whether escidoc-core will take our model as a native service or not (more modifications might be needed in this case)
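
Until a recommended upper limit is known, a batch can be split into chunks of a configurable size. The helper below is a generic sketch; the limit value in the example is arbitrary and is not a tested recommendation.

```python
def chunked(item_ids, limit):
    """Yield successive lists of at most `limit` item ids."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(item_ids), limit):
        yield list(item_ids[start:start + limit])


item_ids = [f"escidoc:{n}" for n in range(1, 8)]
batches = list(chunked(item_ids, 3))  # limit of 3 chosen only for the example
```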

  13. On Locking
     • eSciDoc resources will be locked in eSciDoc
     • Only the user who locked them can unlock them
     • But in any case, only one user (e.g. the collection editor) can mark this operation as finished (see "Finish metadata updates")
     • Do we need it?
       • Depends; for stage 1 we may not need it
       • Purpose: to prevent updates via both the regular ItemHandler and the MDStore at the same time
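
The locking rule (only the user who locked a resource can unlock it) can be sketched with a small registry. The class is hypothetical and kept in memory; in the proposal the locks would be held by eSciDoc itself.

```python
class LockRegistry:
    """Hypothetical lock table: resource id -> id of the user holding the lock."""

    def __init__(self):
        self._owners = {}

    def lock(self, resource_id: str, user_id: str) -> bool:
        # The first caller becomes the owner; others are refused while it is held.
        owner = self._owners.setdefault(resource_id, user_id)
        return owner == user_id

    def unlock(self, resource_id: str, user_id: str) -> bool:
        # Only the user who locked the resource can unlock it.
        if self._owners.get(resource_id) != user_id:
            return False
        del self._owners[resource_id]
        return True

    def is_locked(self, resource_id: str) -> bool:
        return resource_id in self._owners


locks = LockRegistry()
locks.lock("escidoc:25", "user:editor")
refused = locks.unlock("escidoc:25", "user:other")  # not the lock owner
```

While `is_locked` returns true for an item, the regular ItemHandler would reject updates to it, which is the stated purpose of the lock.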

  14. What is the metadata store?
     • RDF/Jena based?
     • Run team to decide: check Willy’s tests with triple store updates

  15. Next steps
     • Test, test, test
     • Check with FIZ
       • Check indexing when storage is external-url
       • Check the possibility to use a separate stylesheet
     • Note: this proposal is not final for escidoc-core updates
       • To bring this into escidoc-core, a slightly different approach should be considered
       • External storage for MDRecords shall be allowed
       • More integrity-level operations shall be implemented
       • The metadata locator has to be moved from the component level to the item/container/component level
       • Metadata indexing, etc.
