
The current state of Metadata - as far as we understand it -



Presentation Transcript


  1. The current state of Metadata - as far as we understand it -
     Peter Wittenburg
     The Language Archive - Max Planck Institute
     CLARIN Research Infrastructure
     Nijmegen, The Netherlands

  2. Old Concept
     • of course "metadata" is an old concept
     • library cards were introduced to cope with mass and anonymity
     • not surprisingly, library people started thinking about this to describe all kinds of web-accessible resources
     • DC (Dublin Core) and qualified DC were the results
     • however, the research world is different: it is not just about search
     • therefore, solutions were developed in many domains
     • 2 years ago CLARIN revised its 15-year-old set and framework

  3. Big Ideas
     • of course managing increasing amounts of data
     • of course finding valuable data in the growing haystacks
     • but also
        • machine usage of metadata
           • automatic profile matching
           • research statistics - virtual sub-collection building
           • etc.
        • multilinguality in a multilingual European society
        • interdisciplinary research
           • biodiversity people should find information in linguistic archives
           • etc.
        • linking with contextual information
        • document lifecycle management (provenance)
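Two of the machine uses listed above, automatic profile matching and virtual sub-collection building, can be sketched as filtering metadata records against a description of what a service accepts. This is a minimal sketch under stated assumptions: the `Record` fields (`format`, `language`, `collection`) and the `ServiceProfile` structure are hypothetical illustrations, not CLARIN's actual component or service schemas.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical minimal metadata record; real profiles carry many more elements.
    identifier: str
    format: str      # MIME type of the resource
    language: str    # ISO 639-3 code
    collection: str


@dataclass
class ServiceProfile:
    # Hypothetical description of what a web service accepts as input.
    name: str
    accepted_formats: set[str] = field(default_factory=set)
    accepted_languages: set[str] = field(default_factory=set)


def matches(record: Record, profile: ServiceProfile) -> bool:
    """A record matches if both its format and its language are accepted."""
    return (record.format in profile.accepted_formats
            and record.language in profile.accepted_languages)


def virtual_collection(records: list[Record], profile: ServiceProfile) -> list[Record]:
    """Virtual sub-collection: all records a given service could process."""
    return [r for r in records if matches(r, profile)]


def statistics(records: list[Record], key: str) -> Counter:
    """Simple research statistics: number of records per value of one element."""
    return Counter(getattr(r, key) for r in records)


records = [
    Record("rec-1", "text/plain", "nld", "corpus-a"),
    Record("rec-2", "audio/x-wav", "nld", "corpus-a"),
    Record("rec-3", "text/plain", "deu", "corpus-b"),
]
# Hypothetical service that only tokenizes Dutch or English plain text.
tokenizer = ServiceProfile("hypothetical-tokenizer", {"text/plain"}, {"nld", "eng"})

print([r.identifier for r in virtual_collection(records, tokenizer)])  # ['rec-1']
print(statistics(records, "language"))  # Counter({'nld': 2, 'deu': 1})
```

The point is that none of this works unless the elements are filled in with controlled values; a free-text "Dutch" in one record and "nld" in another would silently split both the sub-collection and the statistics.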

  4. Big Change
     • until now, researchers informed each other
        • a culture of personal exchange
     • claim: this will only work partially in the future
     • we will have distributed centers storing lots of data
        • with national and discipline dimensions
     • depositors upload their data into these centers
     • we will have an anonymous landscape of data and tools
        • all offered as services
     • what do we have to find things?
        • proper metadata descriptions
        • social tagging by virtual organizations
        • content to operate on by "smart" data mining

  5. Big Question
     • are we ready to meet these wishes and changes?
     • probably not
     • some major issues:
        • quality
        • interoperability
        • registry and reference stability
        • functional
        • multilingual
        • scalability
        • IT principles

  6. Quality Issue
     • lack of quality in descriptions
        • not all elements filled in (researchers are lazy, lack of tool support)
        • often not schema-based (XLS), thus inconsistent
     • lack of agreed and standardized vocabularies
        • ISO 639-3: about 6000 language codes
        • what about subject classification schemes?
        • what about institution names?
        • thus many errors and inconsistencies
     • ontologies are expensive to maintain
     • misinterpretation/misuse of element semantics
     • etc.
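The two quality problems named first, unfilled elements and values outside an agreed vocabulary, are the ones a tool can check automatically. The sketch below assumes a hypothetical list of mandatory elements and uses only a tiny illustrative subset of ISO 639-3 codes; a real checker would load the full code table published by the ISO 639-3 registration authority.

```python
import re

# Hypothetical set of elements a profile marks as mandatory.
REQUIRED = ("title", "language", "creator", "subject")

# Tiny illustrative subset of ISO 639-3 codes, not the real table.
ISO_639_3_SAMPLE = {"nld", "deu", "eng", "fra", "spa"}

# ISO 639-3 identifiers are three lower-case letters.
ISO_639_3_PATTERN = re.compile(r"[a-z]{3}")


def quality_report(record: dict) -> list[str]:
    """Return human-readable problems found in one metadata record."""
    problems = []
    for element in REQUIRED:
        # Whitespace-only values count as "not filled in".
        if not str(record.get(element, "")).strip():
            problems.append(f"missing element: {element}")
    lang = record.get("language", "")
    if lang:
        if not ISO_639_3_PATTERN.fullmatch(lang):
            problems.append(f"malformed language code: {lang!r}")
        elif lang not in ISO_639_3_SAMPLE:
            problems.append(f"unknown language code: {lang!r}")
    return problems


record = {"title": "Interviews", "language": "Dutch", "creator": " ", "subject": "dialects"}
for problem in quality_report(record):
    print(problem)
# missing element: creator
# malformed language code: 'Dutch'
```

Such a check only detects problems; whether a record with a free-text language name is rejected or repaired is a policy decision for the archive.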

  7. Interoperability Issue
     • hampered by different approaches (closed DBs, no modularity, embedded ontologies)
     • structural difficulties, up to context dependency
     • difficult semantic mapping
        • different description dimensions
        • bad element definitions
        • bad vocabulary definitions
     • only little support for OAI-PMH
     • reliance on DC semantics, but these are useless for research etc.
     • often "hardwired" mappings
     • lack of a flexible framework to create/share/use relations
     • little is standardized: what about lifetime then?
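To make the OAI-PMH and DC points concrete, the sketch below builds a standard `ListRecords` request and extracts a few Dublin Core fields from an `oai_dc` response. The verb, the `metadataPrefix` and `resumptionToken` arguments and the namespace URIs come from the OAI-PMH 2.0 and Dublin Core specifications. The repository URL and the sample response are hypothetical, and the response is trimmed (no `responseDate` or `request` elements) to keep it short. Harvesting works, but what comes back is only flat DC, which illustrates the slide's complaint about DC semantics for research use.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 specification and by Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumption token is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_dc_records(xml_text: str) -> list[dict]:
    """Extract identifier, title and language from each oai_dc record."""
    root = ET.fromstring(xml_text)
    results = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        results.append({
            "identifier": rec.findtext("oai:header/oai:identifier", "", NS),
            "title": dc.findtext("dc:title", "", NS) if dc is not None else "",
            "language": dc.findtext("dc:language", "", NS) if dc is not None else "",
        })
    return results


# Hypothetical, heavily trimmed response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Dialect recordings</dc:title>
          <dc:language>nld</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai"))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
print(parse_dc_records(SAMPLE))
# [{'identifier': 'oai:example.org:1', 'title': 'Dialect recordings', 'language': 'nld'}]
```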

  8. Registry and Reference Stability Issue
     • flexibility only when we separate things
     • define and register all concepts in open registries
        • (we are using ISO 12620 - ISOcat)
     • define and register all components/profiles
        • (we are using the CLARIN registry)
     • register all mappings (nothing yet)
     • but if we do this, we need to refer to what is registered
        • are our references stable?
        • some are using Cool URIs: are they just URLs?
        • some are using explicit Handles: are they maintained?
        • who takes care?
        • (we are using EPIC - European PID Consortium)
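The separation argued for above can be sketched as follows. Each schema element points to a registered concept instead of being mapped directly onto another schema's element, so a crosswalk between two profiles is derived from shared concepts rather than hardwired. All concept identifiers, element names and the Handle `12345/abc` are made-up placeholders, not real ISOcat, CLARIN registry or EPIC entries; only the `https://hdl.handle.net/` proxy form is the standard way to resolve a Handle over HTTP.

```python
from typing import Optional

# Hypothetical concept registry: persistent concept IDs -> preferred names.
# Real registries such as ISOcat use their own identifier schemes.
CONCEPTS = {
    "concept:0001": "languageId",
    "concept:0002": "resourceTitle",
}

# Two hypothetical profiles. Each element points to a registered concept
# (or None if nobody registered one) instead of to the other schema.
PROFILE_A = {"Language": "concept:0001", "Title": "concept:0002"}
PROFILE_B = {"dc:language": "concept:0001", "dc:title": "concept:0002", "dc:date": None}


def unregistered_concepts(profile: dict[str, Optional[str]]) -> list[str]:
    """List concept references that the registry does not know."""
    return sorted({cid for cid in profile.values()
                   if cid is not None and cid not in CONCEPTS})


def derive_crosswalk(source: dict, target: dict) -> dict:
    """Map source elements to target elements that share the same concept."""
    by_concept = {cid: elem for elem, cid in target.items() if cid is not None}
    return {elem: by_concept[cid] for elem, cid in source.items()
            if cid is not None and cid in by_concept}


def handle_url(handle: str) -> str:
    """Turn a Handle of the form prefix/suffix into a URL via the global proxy."""
    prefix, sep, suffix = handle.partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"not a Handle: {handle!r}")
    return f"https://hdl.handle.net/{handle}"


print(derive_crosswalk(PROFILE_A, PROFILE_B))  # {'Language': 'dc:language', 'Title': 'dc:title'}
print(unregistered_concepts(PROFILE_B))        # []
print(handle_url("12345/abc"))                 # https://hdl.handle.net/12345/abc
```

The design choice is that adding a third profile only requires linking its elements to registered concepts; no new pairwise mapping has to be written. The catch is the one on the slide: the crosswalk is only as stable as the concept identifiers it relies on.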

  9. Functional Issue
     • do we address new functional requirements?
     • what about provenance information?
        • is it automatically generated?
     • what about versions: are they visible?
     • what about long-term preservation (LTP) information?
     • what about formal access information?
     • do we know what is needed for the web services scenario?
        • (profile matching, deployment information, etc.)
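Automatically generated provenance and visible versions can be as simple as recording an event, with a checksum, every time a resource changes. The sketch below is a minimal illustration: `VersionedResource` and its event fields are hypothetical names, and it does not follow any particular provenance standard such as W3C PROV.

```python
import hashlib
from datetime import datetime, timezone


class VersionedResource:
    """Hypothetical resource wrapper that records provenance on every change."""

    def __init__(self, identifier: str, content: bytes, agent: str):
        self.identifier = identifier
        self.versions: list[tuple[int, str, bytes]] = []  # (number, sha256, content)
        self.provenance: list[dict] = []                   # one event per change
        self._commit(content, agent, "created")

    def _commit(self, content: bytes, agent: str, action: str) -> None:
        # Every write goes through here, so no change escapes the log.
        number = len(self.versions) + 1
        checksum = hashlib.sha256(content).hexdigest()
        self.versions.append((number, checksum, content))
        self.provenance.append({
            "version": number,
            "action": action,
            "agent": agent,
            "sha256": checksum,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def update(self, content: bytes, agent: str) -> None:
        self._commit(content, agent, "modified")

    def latest(self) -> bytes:
        return self.versions[-1][2]


res = VersionedResource("res-1", b"first transcription", "annotator-a")
res.update(b"corrected transcription", "annotator-b")
for event in res.provenance:
    print(event["version"], event["action"], event["agent"])
# 1 created annotator-a
# 2 modified annotator-b
```

Because the log is written by the same code path that changes the content, nobody has to remember to fill in provenance by hand, which addresses the "researchers are lazy" problem from the quality slide.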

  10. Multilingual Issue
     • what does it really include?
        • localizing all software
        • multilingual definitions of all concepts: elements and vocabulary terms
        • (no translations of proper names, of course, or?)
     • or do we simply rely on some lingua franca?
     • the answer is probably discipline-dependent
     • how much is the public involved (and how much should it be)?
     • whatever we do, it is a lot of work
     • CLARIN: ISOcat covers almost all major EU languages
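The two options on the slide, multilingual definitions of concepts and reliance on a lingua franca, can be combined: store names and definitions per language and fall back to one common language when a translation is missing. The concept entry below is a hypothetical example, not an actual ISOcat data category, and English as the fallback is an assumption.

```python
# Hypothetical multilingual concept entry with names and definitions per language.
CONCEPT = {
    "id": "concept:0001",
    "names": {"en": "language", "nl": "taal", "de": "Sprache", "fr": "langue"},
    "definitions": {
        "en": "The language in which the resource is expressed.",
        "de": "Die Sprache, in der die Ressource verfasst ist.",
    },
}

# Assumed lingua franca used when a translation is missing.
LINGUA_FRANCA = "en"


def localized(concept: dict, part: str, lang: str) -> tuple[str, str]:
    """Return (text, language actually used), falling back to the lingua franca."""
    entries = concept[part]
    if lang in entries:
        return entries[lang], lang
    return entries[LINGUA_FRANCA], LINGUA_FRANCA


print(localized(CONCEPT, "names", "nl"))        # ('taal', 'nl')
print(localized(CONCEPT, "definitions", "nl"))  # ('The language in which the resource is expressed.', 'en')
```

Returning the language actually used makes the fallback visible to the interface, so a user can see that a definition has not yet been translated.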

  11. Scalability Issue
     • are our solutions scalable?
        • EUROPEANA has millions of metadata records
        • CLARIN has about 270,000
     • how to structure the offer?
     • how to present this to naive users?
     • do we share the same granularity?
        • (metadata at collection and/or resource level)
     • can we deal with aggregations in the same way?
     • can we apply semantic web technology?
        • automatic mapping
        • automatic quality improvement
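At its simplest, automatic quality improvement means normalizing free-text values into a controlled vocabulary in bulk and setting aside whatever cannot be resolved for manual curation. The sketch below uses a plain lookup table rather than semantic web technology. The table is a hypothetical and deliberately tiny excerpt, and `improve_language_field` is an illustrative name.

```python
import unicodedata

# Tiny illustrative lookup table (hypothetical, not exhaustive): free-text
# language names as depositors tend to type them -> ISO 639-3 codes.
NAME_TO_ISO = {
    "dutch": "nld", "nederlands": "nld",
    "german": "deu", "deutsch": "deu",
    "english": "eng",
    "french": "fra", "francais": "fra",
}
KNOWN_CODES = set(NAME_TO_ISO.values())


def normalize_name(text: str) -> str:
    """Lower-case, trim and strip accents, so 'Français ' becomes 'francais'."""
    decomposed = unicodedata.normalize("NFKD", text.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def improve_language_field(records: list[dict]) -> tuple[int, list[str]]:
    """Replace free-text language names by codes, in place.

    Returns the number of fixed records and the ids left for manual curation.
    """
    fixed, unresolved = 0, []
    for rec in records:
        value = rec.get("language", "")
        if value in KNOWN_CODES:
            continue  # already a code from the controlled vocabulary
        code = NAME_TO_ISO.get(normalize_name(value))
        if code is not None:
            rec["language"] = code
            fixed += 1
        else:
            unresolved.append(rec.get("id", "?"))
    return fixed, unresolved


records = [
    {"id": "r1", "language": "Dutch"},
    {"id": "r2", "language": "nld"},
    {"id": "r3", "language": "Français "},
    {"id": "r4", "language": "Frysk"},
]
fixed, unresolved = improve_language_field(records)
print(fixed, unresolved)  # 2 ['r4']
```

Across hundreds of thousands of records, the unresolved list is the part that still needs people, and its size is a rough measure of how far the vocabulary problem from the quality slide has been solved.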

  12. IT Principles
     • we need to disseminate the message of some basic IT principles
        • define and register your semantics
        • specify and register your syntax
        • use a stable reference scheme
        • in some areas, separate definitions and relations
        • get things standardized or use standards such as
           • XML, some schema language
           • ISO 12620, etc.
           • URIs, Handles

  13. What can we do?
     • listen to each other first
     • increase awareness about metadata and basic principles
     • see how we can create an interoperable landscape
        • harmonizing approaches
        • harmonizing along major issues
        • making things explicit and scalable
     • look for proper interdisciplinary solutions

  14. Moving towards an ideal e-Science domain
     • to avoid ending in a Babylonian scenario, we still have some time to improve our systems
     Thanks for your attention.
