The Role Of Metadata

Brian Kelly, UKOLN, University of Bath, Bath, BA2 7AY. Email: B.Kelly@ukoln.ac.uk. URL: http://www.ukoln.ac.uk/web-focus/presentations


Presentation Transcript


  1. UKOLN is supported by: The Role Of Metadata Brian Kelly UKOLN University of Bath Bath, BA2 7AY Email B.Kelly@ukoln.ac.uk URL http://www.ukoln.ac.uk/web-focus/presentations

  2. Contents Introduction • Introduction • Background To Metadata • Metadata Standards • Metadata Management • Metadata And Quality • Conclusions • The Brief • "I know from conversations … I have had with customers, that metadata poses some really difficult questions …" • The talk addresses the questions: • What is metadata and why is it important? What's this Dublin Core I've heard about (and why Dublin?) What benefits will I get if I use metadata? How should I do it? What will it cost me?

  3. About UKOLN / Web Focus Introduction • UKOLN: • A national centre of expertise in digital information management (including metadata) • Based at University of Bath • Funded by JISC and Resource to support the Higher & Further / cultural heritage sectors • UK Web Focus: • Provides advice and support on Web issues, especially standards and best practices • Provided by Brian Kelly • Funded by JISC from Nov 1996 - August 2003. Now jointly funded by JISC & Resource • QA Focus: • Developing QA methodology to support JISC digital library programmes

  4. About You Introduction • How many are: • Librarians • Software / systems developers (techies) • Commercial vendors • Others • What is the extent of your knowledge of metadata? Novice, Average, Expert, ??? • MARC, Dublin Core, …, RDF, OAI, CLD, …

  5. What is Metadata? Background • "This metadata you've been talking about …. isn't it just catalogue records?" • Question at metadata seminar, 1998 • Metadata can be regarded as: • Catalogue records for the Web • Data about data • Structured information suitable for automated processing • Metadata Demystified: "In current practice, the term has come to mean structured information that feeds into automated processes, and this is currently the most useful way to think about metadata" http://www.niso.org/standards/resources/Metadata_Demystified.pdf

  6. The Problem Background • Back in mid-1990s: • Size of Web growing exponentially • Web being used for both scholarly and non-scholarly (!) purposes • Need for better searching mechanisms • Search engines seemed promising, but concerns over abuse (e.g. porn index spammers) and difficulties in finding quality information • Various sectors came together to develop a core set of metadata attributes for resource discovery

  7. Dublin Core Dublin Core • In mid-1990s: • Meeting held in Dublin, Ohio in 1995 • Involvement from several sectors (libraries, museums, science, IT, …) • Agreement reached on a core set of metadata attributes for resource discovery • Given the name Dublin Core (DC) • DCMI organisation later formed • DC working parties established to coordinate development of DC • Regular annual conferences held See <http://dublincore.org/>

  8. Why So Complex? Dublin Core • Why is there a need for working groups, annual events, etc. for developing a standard for catalogue records? • It's not just documents: an Author record is inappropriate for a painting, a piece of music, etc. • It's not just for humans: the DC records will be processed by software, for which unambiguity is essential • It needs to be integrated: with a rapidly-developing Web architecture • It needs to be future-proofed: so we don't have to do it all again when a new technology emerges

  9. Using Dublin Core Dublin Core • Note that DCMI defined a core set of elements: • Title: A name given to the resource. • Creator: An entity primarily responsible for making the content of the resource. • Publisher: An entity responsible for making the resource available. • Date: A date of an event in the lifecycle of the resource. • … • How this format could be represented was not defined initially

  10. Representing Dublin Core Dublin Core • Initially many people thought that DC would be embedded in HTML pages: <META NAME="DC.Creator" CONTENT="Brian Kelly"> • but how are multiple authors represented: <META NAME="DC.Creator" CONTENT="Brian Kelly"> <META NAME="DC.Creator" CONTENT="John Smith"> • or <META NAME="DC.Creator" CONTENT="Brian Kelly, John Smith"> • It is not possible to describe the potential complexities of DC in the HTML language
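
The first option on the slide above (repeating the element once per author) could be generated along these lines. This is only a minimal sketch: the helper name `dc_meta_tags` is hypothetical and not part of any Dublin Core tool, and it does not settle which representation is the right one.

```python
from html import escape


def dc_meta_tags(element, values):
    """Return one <meta> tag per value for a Dublin Core element.

    Hypothetical helper: repeats the element instead of joining the
    values into a single comma-separated CONTENT string.
    """
    name = escape(f"DC.{element}")
    return [f'<meta name="{name}" content="{escape(value)}">' for value in values]


tags = dc_meta_tags("Creator", ["Brian Kelly", "John Smith"])
print("\n".join(tags))
```

Repeating the element keeps each author as a separate value. A joined string such as "Brian Kelly, John Smith" has to be split again by the consuming software, which becomes ambiguous as soon as names are written in the form "Kelly, B.".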

  11. Dublin Core Is Too Simple! Dublin Core • Dublin Core was designed as a core set of metadata elements for resource discovery. However: • The benefits of the standard became apparent and DC became used in many areas • There was a need to be able to represent richer metadata content and relationships, e.g. • Multiple authors and contact details • Alternative titles • Use of controlled vocabularies from particular schemes • A mechanism known as Qualified Dublin Core was developed to address this.

  12. Use In HTML Dublin Core • Dublin Core's potential was recognised and the W3C's release of HTML 4.0 included a mechanism for defining schemes in the <meta> element: <meta name = "DC.Subject" content = "heart attack"> <meta name = "DC.Subject" scheme = "MeSH" content = "Myocardial Infarction; Pericardial Effusion"> <meta name = "DC.Type" scheme = "DCMIType" content = "Dataset"> <meta name = "DC.Type" scheme = "DCMIType" content = "Event"> See <http://dublincore.org/documents/2001/04/12/usageguide/qualified-html.shtml>
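
To illustrate why the scheme attribute matters to software, here is a rough sketch of how a harvester might read such tags using Python's standard `html.parser` module. The class name `DCMetaParser` and the tuple layout are assumptions made for this example, not an established API.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect (name, scheme, content) from <meta> tags whose name starts with DC."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.upper().startswith("DC."):
            # scheme is None when the tag does not declare one
            self.records.append((name, attr.get("scheme"), attr.get("content")))


html_doc = """
<meta name="DC.Subject" content="heart attack">
<meta name="DC.Subject" scheme="MeSH" content="Myocardial Infarction; Pericardial Effusion">
<meta name="DC.Type" scheme="DCMIType" content="Dataset">
"""

parser = DCMetaParser()
parser.feed(html_doc)
for record in parser.records:
    print(record)
```

With the scheme available, software can tell a free-text subject ("heart attack") from a term taken from a controlled vocabulary (MeSH) and treat the two differently.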

  13. XML • XML (Extensible Markup Language): • Developed by W3C • A meta-language used to create other languages • Addresses HTML's lack of extensibility • A family of standards which form the foundations for a richer and more interoperable Web: • XML • XML Namespaces • XSLT • XML Schemas • … • A proven success W3C Developments Rather than slowly tweaking HTML to allow rich DC to be embedded, XML allows new metadata applications to be developed which can be integrated with existing Web services

  14. Beyond Use In HTML • In parallel to release of HTML 4.0 W3C working on: • A rich metadata framework which could be used for any metadata application: • Content filtering (this resource contains nudity) • Defining collections of related resources (Web site maps) • Digital signatures • … • Development of the Semantic Web - An ambitious attempt to allow data from distributed services to be integrated W3C Developments RDF (Resource Description Framework) was developed as W3C's solution to both problems

  15. RDF • RDF: • An XML application • Richer than conventional XML applications: a mathematical model which describes relationships is embedded in the RDF • This richness comes with a price - increased complexity W3C Developments RDF applications are being developed. However at present it may be advisable to leave RDF to the research community or well-funded pilot studies to prove its benefits before committing to use in a service environment (However note that metadata in PDF documents is stored as RDF)
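
For readers who have not seen RDF, the sketch below shows roughly what a simple Dublin Core description looks like when serialised as RDF/XML. It uses only Python's standard `xml.etree.ElementTree`; the function name `dc_rdf` is hypothetical, and the example describes this presentation's own URL purely for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_rdf(resource_url, properties):
    """Serialise (element, value) pairs as one rdf:Description of a resource."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_url}
    )
    for element, value in properties:
        ET.SubElement(description, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = dc_rdf(
    "http://www.ukoln.ac.uk/web-focus/presentations",
    [("title", "The Role Of Metadata"), ("creator", "Brian Kelly")],
)
print(rdf_xml)
```

Each child element states one property of the resource named in rdf:about. This covers only the simplest case and says nothing about the richer relationships that give RDF its additional complexity.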

  16. Beyond Resource Discovery • Metadata has a role to play beyond item-level resource discovery • Other metadata applications include: • Metadata for digitised objects: about the object and about the digitisation process • Management / administrative metadata: review this resource by xx; delete this resource on …; this resource is managed by the XYZ group; … • Metadata about collections (physical and online) • … Using Metadata

  17. Metadata Modelling (1) • You want to use Dublin Core metadata. How do you choose how to model your metadata? • Do you use simple Dublin Core (the basic 15 elements)? • Do you use qualified Dublin Core to enable richer metadata to be described? • If the latter, how do you decide which qualified DC metadata to use? Using Metadata These are key issues to address. In some cases answers may be provided for you. In other cases, you must answer these questions for yourself.

  18. Metadata Modelling (2) • Why do you wish to use metadata? • Because it is fashionable? • Because you're a librarian and librarians 'do' metadata? • Because you want your Web site to be no. 1 in Google? • Because you are developing an application which requires use of metadata? Using Metadata • Please remember: • Developing applications which make use of metadata can be expensive • Creating and managing metadata can be expensive • Search engines such as Google typically make little or no use of metadata

  19. Metadata Modelling (3) • Exploit Interactive case study: • EU-funded ejournal • Requirement to provide local searching better than simple free text searching: • Search by title, author and keywords • Search by funding stream • Search by issue and article type • The end-user interface is illustrated Using Metadata See <http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-01/>

  20. Metadata Modelling (4) • How did we manage and model the metadata? • Article metadata: doc_title = "The XHTML Interview" author="Kelly, B." title="WebWatching National Node Sites" description = "In this issue's Web Technologies column we ask Brian Kelly to tell us more about XHTML." article_type = "regular" • Issue metadata: issue_num = "6" pub_date="25 Oct 2002" • Site metadata: name = "Exploit Interactive" publisher="UKOLN" • Processed by server-side script: <meta name="DC.Title" content="The XHTML Interview"> <meta name="DC.Creator" content="Kelly, B."> <meta name="DC.Description" content="In this issue's Web Technologies …."> <meta name="DC.Relation.IsPartOf" content="http://www.exploit-lib.org/issue6/"> <meta name="DC.Type" content="text.article.regular" scheme="Exploit-categories">
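
The slide does not show the server-side script itself. The following is only a guess at the kind of mapping it performs, inferred from the input fields and output tags on the slide; the function name, the dictionary layout and the `SITE_URL` constant are all assumptions.

```python
from html import escape

# Assumption: base URL inferred from the DC.Relation.IsPartOf value on the slide.
SITE_URL = "http://www.exploit-lib.org"


def article_meta_tags(article, issue):
    """Map locally managed article/issue fields to Dublin Core <meta> tags."""
    rows = [
        ("DC.Title", article["doc_title"], None),
        ("DC.Creator", article["author"], None),
        ("DC.Description", article["description"], None),
        ("DC.Relation.IsPartOf", f"{SITE_URL}/issue{issue['issue_num']}/", None),
        ("DC.Type", f"text.article.{article['article_type']}", "Exploit-categories"),
    ]
    tags = []
    for name, content, scheme in rows:
        scheme_attr = f' scheme="{escape(scheme)}"' if scheme else ""
        tags.append(f'<meta name="{name}" content="{escape(content)}"{scheme_attr}>')
    return tags


article = {
    "doc_title": "The XHTML Interview",
    "author": "Kelly, B.",
    "description": "In this issue's Web Technologies column we ask "
                   "Brian Kelly to tell us more about XHTML.",
    "article_type": "regular",
}
issue = {"issue_num": "6", "pub_date": "25 Oct 2002"}

print("\n".join(article_meta_tags(article, issue)))
```

Keeping the fields in a neutral internal form and generating the tags on output means the same records could later be published in another format without re-cataloguing.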

  21. Storing DC Metadata • It is up to you how you store your metadata. Your choice will be affected by the use which will be made of your metadata and how it will be created and managed. Metadata Management You may wish to store your metadata in a database and make it available according to its use. • You may wish to: • Embed HTML metadata in HTML pages • Link to HTML metadata from HTML • Embed RDF • Store metadata in application (home-grown scripts, CMS, metadata repository, image management system, …) HTML RDF Metadata management tool

  22. A Simple DC Management Tool • DC-dot: • Simple Web-based DC creation and management tool • Output in range of formats (HTML, XHTML, RDF, …) • Provides validation • Useful for small-scale metadata creation • But: • Not ideal for large-scale usage • Doesn't provide rich management capabilities Metadata Management http://www.ukoln.ac.uk/metadata/dcdot/

  23. Management Tools • Many types of metadata tools: • Type the metadata by hand • Use File -> Properties menu in MS Office applications and export data • Home-grown database systems • Home-grown scripting solutions • Use of commercial systems: • Library management systems • Image management systems • … Metadata Management There is no single ideal solution. The solution you choose should reflect your needs, expertise, organisational culture, …

  24. Quality Assurance • The Need for QA: • Metadata is the 'glue' for integration of services • If the metadata quality is poor, services will not be able to interoperate • There is therefore a need for quality assurance procedures to ensure fitness for purpose • What Can Go Wrong? • Things that can go wrong include: • Metadata is out-of-date or incorrect • Metadata is used inconsistently within service • Metadata is used inconsistently across services • Metadata is not modelled correctly • Metadata not compliant with storage standard • … Quality Assurance

  25. Think About The Implementation • It is important that when you deploy metadata systems you can manage and maintain the metadata. For example: • Details of the person maintaining the data change (name change due to marriage, person leaves, …) • Organisational details change (mergers, takeovers, …) • Technology changes Quality Assurance Prepare for change! People change, organisations change, responsibilities change, technologies change, … Ensure that you can manage the metadata which reflects such changes

  26. Need For Cataloguing Rules • Your Cataloguing Rules • You will need cataloguing rules to support your metadata creation • You will need to provide necessary training and support (especially if you are dependent on cataloguing by non-professionals) • Interoperability • How will you interoperate with services which deploy different cataloguing rules: 04/07/03 – what date is this? LSC – what does this stand for? • Humans use context; software products don't • There is a need to define the standards you're applying (in a machine understandable way) Metadata Management
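
The date question on the slide above can be made concrete with a short sketch. The three candidate formats are assumptions about conventions a partner service might follow; the point is only that the same string yields different dates under different rules.

```python
from datetime import datetime

# Assumed candidate conventions; a real service would document its own.
CANDIDATE_FORMATS = {
    "day/month/year": "%d/%m/%y",
    "month/day/year": "%m/%d/%y",
    "year/month/day": "%y/%m/%d",
}


def interpret_date(raw):
    """Return every valid reading of an ambiguous date string as ISO 8601."""
    readings = {}
    for label, fmt in CANDIDATE_FORMATS.items():
        try:
            readings[label] = datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue  # not a valid date under this convention
    return readings


readings = interpret_date("04/07/03")
for label, iso_date in readings.items():
    print(f"{label}: {iso_date}")
# day/month/year: 2003-07-04
# month/day/year: 2003-04-07
# year/month/day: 2004-07-03
```

Storing dates in an unambiguous form such as ISO 8601 (YYYY-MM-DD), and saying so in the cataloguing rules, is one way to declare the standard being applied in a machine-understandable way.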

  27. Need For QA Procedures • So we have: • Tools for managing metadata • Cataloguing rules • But: • People make mistakes • Software may have bugs • Our rules may be ambiguous • The standards may be ambiguous • The metadata may be correct but confusing in other contexts • … Quality Assurance Although humans can adapt to errors and ambiguities, software typically can't. We therefore need quality assurance procedures to ensure that metadata applications will be interoperable.

  28. Approaches To QA • We may wish to consider: • Systematic checking at data creation • Systematic checking of output • Semi-automated checking (e.g. duplication, common misspellings, out-of-range checks, …) • Automated checking • … Quality Assurance Worst Case Scenario: Your service is fine, and quality metadata is provided. Your data is integrated with other services to provide an international portal to quality resources. However, the other service providers have poor-quality metadata. The poor quality of the final service brings your contribution into disrepute.
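
The semi-automated checks listed above (duplication, out-of-range values) might look something like the sketch below. The required fields, the date window and the record layout are all hypothetical local rules chosen for illustration, not part of any standard.

```python
from collections import Counter
from datetime import date

REQUIRED = ("title", "creator", "date")  # hypothetical local cataloguing rule


def check_records(records, earliest=date(1990, 1, 1), latest=date(2003, 12, 31)):
    """Return (record index, problem) pairs for a human to review."""
    problems = []
    title_counts = Counter(r.get("title", "").strip().lower() for r in records)
    for i, record in enumerate(records):
        for field in REQUIRED:
            if not record.get(field):
                problems.append((i, f"missing {field}"))
        title = record.get("title", "").strip().lower()
        if title and title_counts[title] > 1:
            problems.append((i, "duplicate title"))
        raw = record.get("date")
        if raw:
            try:
                parsed = date.fromisoformat(raw)
            except ValueError:
                problems.append((i, "date not ISO 8601"))
            else:
                if not earliest <= parsed <= latest:
                    problems.append((i, "date out of range"))
    return problems


records = [
    {"title": "The XHTML Interview", "creator": "Kelly, B.", "date": "2002-10-25"},
    {"title": "the xhtml interview", "creator": "Kelly, B.", "date": "04/07/03"},
    {"title": "Untitled", "creator": "", "date": "2030-01-01"},
]
for index, problem in check_records(records):
    print(index, problem)
```

Checks like these only flag candidates. Deciding whether two records with the same title are really duplicates is still a job for a person, which is why the slide calls this kind of checking semi-automated.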

  29. Pulling It Together

  30. Conclusions • To conclude: • Metadata can provide richer searching and other services within a service and the glue for integration across several services • There are several key standards: Dublin Core, HTML, XML, … • You will need to select the standards appropriate to your service requirements • You will need to choose the metadata according to your service requirements • You will need to choose the architectural framework and applications for managing your metadata according to your service requirements • You will need to ensure that you have appropriate quality assurance mechanisms in place – otherwise the above work will have been wasted! • It can be worth it!
