weaving the web of european social science n.
Skip this Video
Loading SlideShow in 5 Seconds..
Weaving the Web of European Social Science PowerPoint Presentation
Download Presentation
Weaving the Web of European Social Science

Loading in 2 Seconds...

play fullscreen
1 / 27

Weaving the Web of European Social Science - PowerPoint PPT Presentation

  • Uploaded on

Weaving the Web of European Social Science. Jostein Ryssevik Director of Technology and Development Nesstar. The last page of the Internet. Go there. Next. ). (. time spent on digesting and thinking. Maximize. time spent on finding and accessing. Tools for thought.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

Weaving the Web of European Social Science

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
weaving the web of european social science

Weaving the Web of European Social Science

Jostein Ryssevik

Director of Technology and Development


tools for thought



time spent on digesting and thinking


time spent on finding and accessing

Tools for thought

“Much more time went into finding or obtaining information than into digesting it.”

Dr. J.C.R. Licklider

as we may think
As We May Think
  • Vannever Bush and Memex (1945)
  • A “furniture” able to enter and hold large amount of scientific information on microfilm
  • ...all indexed
  • ...all hyperlinked (or linked together by trails)
  • ...even supporting the idea of user-supplied links as well as user-supplied annotations
  • ...allowing public information to be privatized/customized
  • ...as well as private information to become public
50 years later
50 years later
  • ...but even today, comparative social science research in Europe is hampered by the fragmentation of the scientific information space.
  • Data, information and knowledge are scattered in space and divided by technical, lingual and institutional barriers.
  • As a consequence too much of the research are based on data from a single nation, carried out by a single-nation team of researcher and communicated to a single-nation audience.
scenarios from the social science dream machine
Scenarios from “The Social Science Dream Machine”
  • …a user is looking for data on political trust from three different regions of Europe
  • He uses the geographical interface to a research location service to circle in the relevant regions on the map and enters the additional search criteria.
  • From the returned hit list he is able to browse layers of increasingly detailed metadata describing potential sources.
  • He is even allowed to perform simple statistical analysis and visualizations on-line to make sure that the data fulfills his requirements. Several datasets can be brought to the desktop at the same time to ease the comparison.
  • As soon as a decision has been made, the chosen datasets can be downloaded and automatically converted to the format of his favorite statistical package. All relevant metadata travels along with the data to assist the researchers in his analysis.
another scenario
Another scenario
  • …a user analysing a group of variables in dataset X would like to know if there are similar datasets from other countries that could be used for a comparative study.
  • ..by hitting the “get comparable dataset” button, a list of potential datasets is immediately returned.
  • …she would also like to have an overview of knowledge products (papers, articles etc.) based on this study
  • …as references and links to derived knowledge products are an integrated part of the metadata of each single dataset a sorted list can be displayed by a single keystroke. Some of these references are also including e-mail and website-addresses to the relevant researchers.
  • …finding a problem with one of the variables, she writes a note and appends it to the ”user experience-section” of the metadata to allert future users (she also submits a rating of the resource according to an established quality standard within here community)
  • ...and when the research paper is ready and published in an on-line journal, links to the dataset is added to allow future users to revisit her analysis
yet another scenario i promise this is the last one
Yet another scenario... (I promise – this is the last one)
  • …a user that is reading an article in an on-line journal finds a link that connects him to the data that was used by the author to underpin the argument. The link allows the user to rerun the analysis, and also to dig deeper into the same data-source.
  • ...through his ”digital research assistant” (an active agent), he is also also made aware of several other relevant data sources published after the article was written and he uses these to challenge the conclusion of the author
  • ...he also leaves the query with his ”digital reserach assistant” to make sure that he is alerted if a new dataset meeting his requirements is published somewhere around the world at a later stage (the agent is instructed to only include datasets from “trusted sources” and moreover to exclude any dataset below a certain sample size)
the web revolution a global memex
The Web revolution – a global Memex
  • The very first really global information system
  • The very first “many to many” communication technology
  • From many local to a single global hypertext-space
  • True multi-media: The Web has taken all existing media as its content
  • Decentralized
  • Scalable
local versus global information systems



Unlimited amount of information (unbounded)

Limited amount of information (bounded)

The complete information space is known, categorized and indexed

Only parts of the information space are known, categorized and indexed

A single unified system of concepts securing unambiguous categorization of recourses as well as semantic interoperability across resources (“complete understanding”)

Several partially competing systems of concept producing alternative categorization of resources as well as low levels of semantic interoperability across resources (“partial understanding”)

Deep standardization

Shallow standardization

Centralized: a single publisher and registration authority

Decentralized: millions of publishers – no central authority

Link integrity

Broken links (the famous 404)

Limited scalability

Unlimited scalability

Local versus global information systems
  • The original Web
  • a naming convention (the URL)
  • a simplistic hypertext format (HTML)
  • a simple transport protocol (HTTP)

“The design of the Web fundamentally differed from traditional hypertext systems in sacrificing link integrity for scalability.”

Tim Berners-Lee: “Web Architecture: Describing and Exchanging Data”,


towards the semantic web

The current Web

The Semantic Web

Towards the Semantic Web

“Human endeavor is caught in an eternal tension between the effectiveness of small groups acting independently and the need to mesh with the wider community.

A small group can innovate rapidly and efficiently, but this produces a subculture whose concepts are not understood by others. Coordinating actions across a large group, however, is painfully slow and takes an enormous amount of communication.

The world works across the spectrum between these extremes, with a tendency to start small—from the personal idea—and move toward a wider understanding over time.”

Tim Berners-Lee, James Hendler and Ora Lassila

The Semantic Web,

Scientific American, May 2001

“The Semantic Web is an extension of the current Web in which information is given well defined meaning, better enabling computers and people to work in cooperation.”

Tim Berners-Lee

  • From documents to data
  • From brainware to software
  • From machine readable to machine “understandable” information
  • Metadata – the glue of the Semantic Web
  • A framework for “knowledge representation” – RDF
  • The introduction of namespaces (allowing different system of terms and concepts to cohabitate in a single information system)
  • Partial understanding/agreement
  • The vision: the creation of a dynamic framework facilitating cooperation/interoperability across domains and communities - gradually expanding the “web of understanding”.
the semantic web at work


Kowledge products






The Semantic Web at work
hype or reality
Hype or reality.....?
  • Not many Semantic Web applications out there so far
  • RDF not really taken off
  • ...and the other layers of the semantic Web still on the planning stage
  • ....“it will never work”
  • ....”will be overshadowed and overtaken by the other buzz-word of today's Web community – “Web services””
  • ...on the other hand:
  • ...Berners-Lee and his crew has been right before
  • ...the amount of RDF-coded information is increasing
  • ...the number of tools and platforms is increasing
  • ...the “big guys” are gradually entering the scene, most noteworthy Adobe’s new “eXtensible Metadata Platform (XMP) using RDF to model and represent metadata within documents and other information objects.
metadata data about

Labeled stuff

Unlabeled stuff

The bean example is taken from: A Manager’s

Introduction to Adobe eXtensible Metadata Platform,


Metadata – data about.....
the functions of metadata


The functions of metadata




users of metadata

An example from the world of statistics

  • Human readable information about data quality:
  • information about sampling methods and procedures allowing a human reader to assess whether a statistical resource has been created according to a certain standard
  • Machine understandable information about data quality:
  • sample size
  • response rate
  • sampling methodology (according to a specific classification of methodologies – a controlled vocabulary identified by a namespace where the meaning of the different terms are well defined)
  • the number of publications produced on the basis of the resource
  • ratings (given by persons and organizations according to a well defined rating system – allowing a user to instruct a software agent to skip resources where the quality score given by a defined set of trusted organizations are below a certain limit)
Users of metadata
  • Metadata for human consumption
    • .... any human readable information that allow a user to find, assess and understand information objects in order to use them in a sound way
  • Metadata for machine consumption
    • ....any machine processable information that allow a piece of software to find, assess and “understand” information objects in order to manipulate them in a sound way on behalf of a human user
  • Metadata meant for humans versus machines are often just different representations of the same type of information.
metadata on the web

...all hyperlinked

Catalogue info


Data dictionary


Data quality



Sharing 1


Sharing 2

Metadata on the Web


an extended metadata concept
An extended metadata concept
  • Not only descriptions of data but also knowledge products derived from the use of data
  • A dynamic concept - metadata are constantly developing through the life-time of a dataset
  • A variety ofauthors - many who are not data producers
  • ...as opposed to what we might call the “Cathedral View” on metadata:
  • .... metadata seen as a coherent and centralised collection of information with clearly defined boundaries, provided by a single authority for a defined community of users.
Established in 1995 to create a universally supported metadata standard for the social science community
  • Initiated and organised by the the Inter-University Consortium for Political and Social Research (ICPSR), Michigan, USA
  • Members coming from social science data archives and libraries in USA, Canada and Europe and from major producers of statistical data
  • First version of the standard expressed as an SGML-DTD
  • Translated to XML in 1997
  • Extensive testing carried out spring-summer 1999
  • DDI 1.0 published spring 2000
  • DDI 1.1 with minor revisions and some additions published autumn 2001
  • The DDI 2.0 process???
  • Acceptance
    • fast take-up in the community of data archives and data libraries world-wide
  • Community building
    • revitalised the co-operation and sharing of know-how and technologies among the archives and libraries
  • Strengthening of the ties to the data producers
  • Software development
the ddi in action what do we know
The DDI in action – what do we know?
  • The costs of migrating a data archive to the DDI is high (much higher than the cost of any DDI-compliant archiving software)
  • The DDI is a “cathedral standard” (no data provider is buying the full package – they are all using the DDI building blocks to build their own more modest “parish churches”).
  • The DDI is a very loose structure loaded with alternatives and ambiguities (a single study described by two different archives will probably look quite different)
  • The DDI is only telling half the story (data providers will have to add their own local guidelines on top of the DDI (controlled vocabularies, mandatory elements etc) to secure internal standardization
  • The DDI is inflexible (there is no extension mechanism that allows a data provider to add local elements without breaking the standard)
  • A pure “bottom-up” approach: The DDI is used to describe concrete files or products coming out of the statistical process. It has no level of abstraction above or beyond a physical statistical product
  • Machine-understandable versus human-understandable: Using XML does not automatically create metadata that is complete and logical enough to drive software processes
nesstar vision
Nesstar - vision

To develop a truly distributed platform for electronic publishing of statistical data, building on object technology, open (metadata-) standards and lightweight Internet protocols.

....or simply

To bring the models, technologies and collective energy of the Web to the world of statistics.

nesstar an overview
An architecture for a totally distributed global data library

The ability to locate multiple data sources across national boundaries

The ability to browse detailed information about these data sources

..and to do data analysis and visualisation over the net

..or to download the appropriate subset of data in one of a number of formats

Supporting standard micro-data as well as aggregated tables/cubes

Allowing the user to bookmark/hyperlink resources in the data and metadata repositories



analysis (tables, models etc.)

..and to hyperlink these resources from external Web-objects (like texts)

Support for metadata indexing using a multilingual thesaurus

Powerful data preparation tools, including a system for remote publishing of data to NESSTAR servers

Nesstar – an overview
nesstar background
Nesstar - background
  • Originally funded by EC under the Electronic Publishing Program
    • NESSTAR (1998-1999)
    • FASTER (2000-2001)
  • Nesstar Limited established in June 2001
    • Owned jointly by The Norwegian Social Science data Services (NSD) and UK Data Archive
    • Offices in Colchester (UK) and Bergen (Norway)
nesstar a fully distributed data web





Nesstar - a fully distributed Data Web
madiera multilingual access to data infrastructures of the european research area
MADIERA (Multilingual Access to Data Infrastructures of the European Research Area)
  • An EC funded project to commence October 2002
  • Purpose: to develop and build a European-wide virtual social science data library based on Nesstar and Semantic Web technology
  • Building blocks:
    • A shared metadata standard (the DDI)
    • A shared technological platform (Nesstar)
    • A multilingual thesaurus (ELLST)
    • A classification system for social science data resources
    • A georeferencing system for social science data resources
    • A methodology for identification of comparable data
    • A feedback system for user supplied information
    • A system for creating links to between data and knowledge products