Metadata: why and how for social science

Louise Corti

Online Resources Day

15 November 2005, London


What Do Social Researchers Want?

  • Discover available datasets (globally, not just in their own country) and related research literature

  • Understand in detail the origin, methodology and structure of datasets (social sciences datasets are modest in size but big in complexity)

  • Compare and Link data from different sources

  • Model the social phenomena underlying the data

  • Publish their findings with all the supporting evidence (no ‘iceberg’ publishing) and Reproduce published results

  • Connect to other experts and Share informal comments and advice

  • Enforce confidentiality and intellectual property rights while maintaining accuracy and access to data sources

  • … and more


How?

  • through rich and systematic description – through a language that humans and computers can both understand

  • using commonly agreed or mappable vocabularies and standards

  • which must be flexible and adaptable

  • metadata


What are metadata?

Metadata are structured data which describe the characteristics of an object or resource. They have much in common with the cataloguing that takes place in libraries, museums and archives. The term "meta" derives from the Greek word denoting a nature of a higher order or more fundamental kind.

A metadata record typically consists of a number of pre-defined elements representing specific attributes of a resource, and each element can have one or more values.


Grasshopper


Metadata schema

Element name: Value

  • Title: Web UKDA Catalogue

  • Creator: Louise Corti

  • Publisher: UK Data Archive

  • Identifier: http://www.data-archive.ac.uk/

  • Format: Text/html

  • Relation: Data Archive Web site

    Each metadata schema will usually have the following characteristics:

    • a limited number of elements

    • the name of each element

    • the meaning of each element
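
A record of this kind can be sketched in Python as a mapping from element names to lists of values. This is only an illustration of the idea of "a limited number of named elements, each with one or more values"; the class name `MetadataRecord` and its methods are hypothetical and not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical record: pre-defined elements, each holding one or more values."""
    schema_elements: frozenset
    values: dict = field(default_factory=dict)

    def add(self, element, value):
        # Reject element names that the schema does not define.
        if element not in self.schema_elements:
            raise KeyError(f"{element!r} is not defined in this schema")
        # An element may carry several values, so values are kept in a list.
        self.values.setdefault(element, []).append(value)


# Element names and values copied from the example slide.
SLIDE_ELEMENTS = frozenset(
    ["Title", "Creator", "Publisher", "Identifier", "Format", "Relation"]
)

record = MetadataRecord(SLIDE_ELEMENTS)
record.add("Title", "Web UKDA Catalogue")
record.add("Creator", "Louise Corti")
record.add("Publisher", "UK Data Archive")
record.add("Identifier", "http://www.data-archive.ac.uk/")
record.add("Format", "Text/html")
record.add("Relation", "Data Archive Web site")

for element, element_values in record.values.items():
    print(f"{element}: {'; '.join(element_values)}")
```

Keeping the schema (the allowed element names) separate from the values is what lets two systems agree on what a record may contain before they exchange any.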


International standards for metadata schema

  • to ensure that every element of information pertaining to the lifecycle of an object (collection) can be captured:

    • creation, appraisal, accessioning, conservation, preservation, availability and access

  • must be dynamic and must be open to amendment

  • aim for consistent, appropriate and self-explanatory description

  • facilitate the retrieval and exchange of information

  • enable the sharing of authority data

  • enable the integration of descriptions from different locations into a unified information system


Common metadata schemas

Dublin Core

A minimal set of elements needed to facilitate the discovery of document-like objects in a networked environment (e.g. the Internet). Currently 15:

Content: Title, Subject, Description, Source, Language, Relation, Coverage

Intellectual Property: Author/Creator, Publisher, Contributor, Rights

Electronic/Physical Manifestation: Date, Type, Format, Identifier
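
The three groups above can be written down as a small lookup table, which makes it easy to see which of the 15 elements a given record leaves empty. This is a sketch only: the group names are taken from the slide, "Author/Creator" is shortened to "Creator", and the helper names (`group_of`, `missing_elements`) are hypothetical.

```python
# The 15 elements, grouped as on the slide.
DUBLIN_CORE_GROUPS = {
    "Content": ["Title", "Subject", "Description", "Source",
                "Language", "Relation", "Coverage"],
    "Intellectual Property": ["Creator", "Publisher", "Contributor", "Rights"],
    "Electronic/Physical Manifestation": ["Date", "Type", "Format", "Identifier"],
}

DUBLIN_CORE_ELEMENTS = [
    name for names in DUBLIN_CORE_GROUPS.values() for name in names
]


def group_of(element):
    """Return the group an element belongs to, or None if it is not one of the 15."""
    for group, names in DUBLIN_CORE_GROUPS.items():
        if element in names:
            return group
    return None


def missing_elements(record):
    """List the elements (in slide order) for which the record has no entry."""
    return [name for name in DUBLIN_CORE_ELEMENTS if name not in record]


# Example record reusing the values from the earlier "Metadata schema" slide.
example = {
    "Title": "Web UKDA Catalogue",
    "Creator": "Louise Corti",
    "Publisher": "UK Data Archive",
    "Identifier": "http://www.data-archive.ac.uk/",
    "Format": "Text/html",
    "Relation": "Data Archive Web site",
}

print(len(DUBLIN_CORE_ELEMENTS))
print(group_of("Rights"))
print(missing_elements(example))
```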

ISAD(G): General International Standard Archival Description

e-GIF: e-Government Interoperability Framework

OAIS: Reference Model for an Open Archival Information System

OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting


No shortage of statistical metadata standards

  • The Common Warehouse Metamodel (CWM) from OMG – data warehousing and business intelligence

  • ISO 11179 – data elements in a metadata repository

  • SDMX – multidimensional data and time-series

  • IQML, AskXML and Triple-S – questionnaire data

  • The Data Documentation Initiative (DDI) – a general metadata standard for statistical data (micro as well as aggregated)

  • And many other related standards. e-Social Science requires more than simple “data” metadata:

    • Thesauri, Classifications


Encoding schemes

  • HTML (HyperText Markup Language, as used in Web pages; version 3.2 or 4.0)

  • SGML (Standard Generalised Markup Language)

  • XML (eXtensible Markup Language)

  • RDF (Resource Description Framework)

  • MARC (MAchine Readable Cataloging)

  • MIME (Multipurpose Internet Mail Extensions)

  • Z39.50 (protocol for distributed information retrieval)

  • LDAP (Lightweight Directory Access Protocol)


Example of deploying metadata for a simple web resource

  • embedding the metadata in a Web page by the creator using META tags in the HTML coding of the page

  • as a separate document (e.g. XML) linked to the web resource it describes

  • in a database linked to the web resource. The records may either have been directly created within the database or extracted from another source, such as Web pages

  • but what about complex social science data?
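
The first deployment option listed above, META tags embedded in the page itself, can be sketched as follows. The `DC.` name prefix and the `schema.DC` link follow the common convention for expressing Dublin Core in HTML; the function name `dc_meta_tags` and the sample record are assumptions made for illustration.

```python
from html import escape


def dc_meta_tags(record):
    """Render a {element: [values]} mapping as HTML <meta> tags (one per value)."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        for value in values:
            # Escape quotes and ampersands so the value is safe inside an attribute.
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element.lower()}" content="{content}">')
    return "\n".join(lines)


# Sample values reused from the "Metadata schema" slide.
page_metadata = {
    "Title": ["Web UKDA Catalogue"],
    "Creator": ["Louise Corti"],
    "Publisher": ["UK Data Archive"],
    "Format": ["Text/html"],
}

print(dc_meta_tags(page_metadata))
```

The output would be pasted into the `<head>` of the page. The same mapping could just as well be serialised to a separate XML document or stored in a database, which are the other two options on the slide.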


Stepping back: The Standard Study Description

  • devised in the 1970s to describe academically created sociological/political science datasets

  • recommended key bibliographic elements

  • informally ‘adopted’ by CESSDA in the 1980s

  • often adapted to suit local needs


The Standard Study Description: recommended elements

  • subject category

  • title

  • depositor

  • principal investigator

  • abstract and main topics

  • kind of data

  • dimensions of dataset

  • universe sampled

  • sampling procedures

  • method of data collection

  • dates of coverage, fieldwork and deposit

  • availability and access conditions

  • references to reports and related datasets

Controlled vocabulary

  • adopted for some elements

    • e.g. sampling, kind of data

  • subject and geographical key words from broad social science Thesaurus (HASSET)


The first step towards interoperability

  • driven by the need to search across European Data Archive holdings

  • development of a core element set for the Integrated Data Catalogue (IDC)

  • catalogue records marked with standard tags for inclusion into WAIS indexes (Wide Area Information Servers)

  • enabled multi-site searching via WAIS protocol

  • simplistic, and excluded links to additional metadata, documentation, thesaurus help, and browsing


The Data Documentation Initiative (DDI)

  • the DDI is widely adopted by social sciences data archives all over the world that provide many of the datasets used by social scientists for secondary analysis

  • initiated and organised by the Inter-University Consortium for Political and Social Research (USA) in 1995 to create a metadata standard for the social science community

  • members coming from social science data archives and libraries in USA, Canada and Europe and from major producers of statistical data

  • first in SGML then in XML

  • DDI 1.0 published in 2000. Currently at version 2. Version 3 is being designed and is scheduled for 2006


The Structure of a DDI Codebook

  • Document Description

    • Description of the codebook document itself (author, sources, etc)

  • Study Description

    • Information about the entire study or data collection (content, collection methods, processing, sources, access conditions etc)

  • File Description

    • Description of each single file of the data collection (formats, dimensions, processing information, etc.).

  • Data Description

    • Description of each single variable in a datafile (format, variable and value labels, definitions, question texts, imputations etc.)

  • Other Study-related Materials

    • References to reports and publications and other machine readable documentation
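
The five sections can be sketched as the top-level skeleton of a codebook document. The element names used here (`codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `otherMat`) are given as recalled from the DDI 2.x codebook specification and should be checked against the official schema before use; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# (tag, section title from the slide); tag names assumed from DDI 2.x.
SECTIONS = [
    ("docDscr", "Document Description"),
    ("stdyDscr", "Study Description"),
    ("fileDscr", "File Description"),
    ("dataDscr", "Data Description"),
    ("otherMat", "Other Study-related Materials"),
]


def codebook_skeleton():
    """Build an empty codebook with one placeholder element per section."""
    root = ET.Element("codeBook")
    for tag, title in SECTIONS:
        root.append(ET.Comment(f" {title} "))  # human-readable section marker
        ET.SubElement(root, tag)
    return root


skeleton = codebook_skeleton()
print(ET.tostring(skeleton, encoding="unicode"))
```

A real codebook would fill each section with nested elements (for example citation details under the study description); the skeleton only shows how the five parts sit side by side in a single document.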


Data description - variables

000001 1 1 44 123 9 5 4 5

000002 1 3 47 003 1 3 3 3

000003 2 5 43 155 1 1 2 3

000004 1 3 36 012 2 5 5 5

000005 9 4 24 207 9 1 4 5

Variable labels shown on the slide: CaseNumber, Sex, Age, Country, Occupation, QuestionResponses


DDI in XML
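
The XML shown on this slide is not preserved in the transcript. As a rough sketch of what a variable-level description can look like, the fragment below describes one variable and is then used to decode a raw data row from the preceding data slide. The element names (`var`, `labl`, `catgry`, `catValu`) are given as recalled from DDI 2.x and should be verified against the schema; the category labels, the variable name `SEX`, and the assumption that the second field of each row holds Sex are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical variable description; labels are invented for illustration.
VAR_XML = """
<var name="SEX">
  <labl>Sex of respondent</labl>
  <catgry><catValu>1</catValu><labl>Male</labl></catgry>
  <catgry><catValu>2</catValu><labl>Female</labl></catgry>
  <catgry missing="Y"><catValu>9</catValu><labl>Not answered</labl></catgry>
</var>
"""


def value_labels(var_xml):
    """Map each category code to its label for one variable description."""
    var = ET.fromstring(var_xml)
    return {
        category.findtext("catValu"): category.findtext("labl")
        for category in var.findall("catgry")
    }


labels = value_labels(VAR_XML)

# Third data row from the slide, split on whitespace.
raw_row = "000003 2 5 43 155 1 1 2 3".split()
sex_code = raw_row[1]  # assumption: the second field is Sex

print(f"case {raw_row[0]}: sex code {sex_code} means {labels[sex_code]}")
```

Without the variable description the row is just a string of digits; with it, software can attach labels, flag missing values and generate setup files automatically.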


Understanding Statistical Metadata

Different approaches to understanding:

  • what is it for?

    • statistical metadata has no value in itself; it is just a means to an end. Its progress should be measured by the extent to which it facilitates social research

  • what is it like?

    • Is there anything familiar we can relate it to? A form of communication might be a good choice


Benefits

  • interoperability

    • homogeneous exchangeable documents

  • richer content

    • comprehensive set of elements providing the potential data analyst with broader knowledge

  • single document - multiple purposes

    • repurposed for different needs and applications – preservation, discovery, and dissemination

  • on-line subsetting and analysis

    • standard uniform structure and content for variables ensures easy import into on-line analysis systems

  • precision in searching

    • field-specific searches across documents are enabled

  • and more …

    • human-readable and computer-actionable

    • essential foundation for E-science and the Grid
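
"Precision in searching" can be illustrated with a field-specific search over structured records: because each value is tagged with its element, a query can be restricted to one field rather than matching anywhere in the text. The records, field names and the function `search_field` below are invented for illustration and do not come from any real catalogue.

```python
def search_field(records, element, term):
    """Return records whose values for one element contain the term (case-insensitive)."""
    needle = term.lower()
    return [
        record
        for record in records
        if any(needle in value.lower() for value in record.get(element, []))
    ]


# Hypothetical catalogue records: each element maps to one or more values.
catalogue = [
    {"Title": ["Survey of Health"], "Abstract": ["Includes questions on employment"]},
    {"Title": ["Employment Panel"], "Abstract": ["Longitudinal study of work histories"]},
    {"Title": ["Housing Study"], "Abstract": ["Tenure and household composition"]},
]

# Only the Title field is searched, so the first record (which mentions
# employment in its abstract) is not returned.
hits = search_field(catalogue, "Title", "employment")
print([record["Title"][0] for record in hits])
```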


EU MADIERA Portal

  • Search

  • Multilingual

  • Browsing

  • Meta(data) Browsing


Summary - the DDI

  • The DDI can serve as the foundation for content, distribution, use and preservation of data collections in the social and behavioural sciences, across institutions, countries, and disciplines

  • cooperation is needed from both data producers and statistical software manufacturers, so that the DDI specification can readily become the basis for the entire research process, from generation of a data collection instrument to production of research articles

  • serves the social science community well with a specification that produces quality metadata with multiple purposes. It fully documents the details of datasets, it is user friendly and accessible, it integrates into the infrastructure of the Web and it supports automatic generation of statistical software system files.

  • the widespread adoption of the DDI will vastly improve access to a range of varied datasets. Expanded use will greatly enhance comparative research; the ability to harmonize datasets over time and geography will lead to significant improvement in our understanding of societies


The future

Statistical metadata is here, and it is already changing the way people locate and make sense of data, but it does not yet support most use cases of interest to social scientists. What we will need to move forward is:

  • Grammar, a standard Semantic infrastructure (e.g. as provided by the Semantic Web):

    • semantic extendibility

    • ability to integrate (merge and override) descriptions from different sources

  • large Vocabulary, by integrating different flavours of metadata:

    • unique identifiers for data and research literature

    • statistical data metadata (full life cycle)

    • Ontologies, Thesauri and Classifications (and mappings among them)

    • statistical processing metadata

    • “Secondary metadata”: annotations, quality assessment, links to research literature

    • experts metadata (FOAF)


Not Even Half Way There ..

  • Future developments:

    • Progress in metadata and technical standardisation

    • Latent knowledge capture and extraction

Diagram labels on the slide: Annotations; Comparable variables; Unified Authentication; Integrated Data Catalogue; Nesstar – Data Web; Grid; Mappings; References; Extraction; Cooperative Markup; ELSST; DDI Standard; RDF; Semantic Web; USI; TEI for QD


Qualitative data and the DDI

  • in October 2001 ESDS Qualidata formally adopted the DDI to describe data

  • in 2000, began to explore standards for archiving, and web representation of qualitative data

  • expertise from the text processing/arts and humanities communities - TEI

  • ESDS Qualidata Online shows the basic potential of what can be achieved by a common standard

  • need to catch up with the statistical community!

  • working model that will be presented today