
Environmental e-Science

Challenges & Opportunities

Kerstin Kleese van Dam

Data Grids - Data Management Environments for e-Science

Kerstin Kleese van Dam et al.

CCLRC e-Science Centre

k.kleese@dl.ac.uk

http://www.e-science.clrc.ac.uk


Metadata

Data without further information is of only short-term and very limited use.

Varying degrees of metadata

Many standards and formats

Example: CLRC Scientific Metadata Schema http://www.e-science.clrc.ac.uk/Activity/ACTIVITY=DataPortal;SECTION=5;

used by ISIS and the e-Minerals and e-Materials projects

NERC DataGrid Metadata Model


CCLRC Scientific Metadata Model - Diversity: Users & Searches

[Figure: diversity of search types (Discovery, Excavation) and of users (Experimenter, Data curator, Specialist user, Wider science community, General community)]


CCLRC Scientific Metadata Model

A Metadata Object is made up of six components:

  • Topic: keywords providing an index to what the study is about.

  • Study Description: provenance information about what the study is, who did it and when.

  • Access Conditions: conditions of use, stating who can access the data and how.

  • Data Description: a detailed description of the organisation of the data into datasets and files.

  • Data Location: locations providing navigation to where the data from the study can be found.

  • Related Material: references into the literature and community providing context for the study.
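One way to picture the six components of a Metadata Object is as a single record type. The following is a minimal Python sketch under stated assumptions: the class name `MetadataObject`, the field names and types, and the helper `unlocated_files` are all hypothetical and do not reproduce the actual CCLRC schema, which is richer and XML-based.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataObject:
    """Hypothetical record mirroring the six components of the CCLRC
    Scientific Metadata Model; field names and types are illustrative only."""
    topic: list[str]                          # keywords indexing what the study is about
    study_description: str                    # provenance: what the study is, who did it, when
    access_conditions: str                    # who may access the data, and how
    data_description: dict[str, list[str]]    # dataset name -> files in that dataset
    data_location: dict[str, str]             # file name -> where it can be found
    related_material: list[str] = field(default_factory=list)  # literature references

    def files(self) -> list[str]:
        """Every file named in the data description, in dataset order."""
        return [f for files in self.data_description.values() for f in files]

    def unlocated_files(self) -> list[str]:
        """Files described in the metadata that have no recorded location."""
        return [f for f in self.files() if f not in self.data_location]


# Example values are invented purely for illustration.
record = MetadataObject(
    topic=["neutron scattering", "crystal structure"],
    study_description="Example study by a hypothetical experimenter",
    access_conditions="Restricted to the experiment team",
    data_description={"run_001": ["raw.nxs", "reduced.dat"]},
    data_location={"raw.nxs": "http://example.org/data/raw.nxs"},
)
print(record.unlocated_files())
```

Keeping the Data Description separate from the Data Location, as the model does, makes consistency checks like `unlocated_files` straightforward.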


NERC DataGrid Metadata and Data Model

  • Provides clear separation of function

    • Difference between data use and discovery etc.

    • “Tuning” of metadata to include relevant detail

  • Allows increased reuse of metadata model

    • Avoids tie-in to the details of a particular field's data formats

    • Can plug-in another data model
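The separation between discovery and data use, with a pluggable data model behind it, can be sketched as an abstract interface. This is an illustrative assumption, not the NDG implementation: `DataModel`, `DiscoveryRecord`, `GriddedModel`, `StationModel` and `find` are hypothetical names. The point shown is that discovery code only reads keywords and never depends on the plugged-in model.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class DataModel(ABC):
    """Hypothetical plug-in point for a field-specific data model."""

    @abstractmethod
    def variables(self) -> list[str]:
        """Names of the physical parameters this data model exposes."""


@dataclass
class DiscoveryRecord:
    """Discovery metadata: knows nothing about how the data is stored."""
    title: str
    keywords: list[str]
    data_model: DataModel  # used for data use, not for discovery


class GriddedModel(DataModel):
    def __init__(self, names: list[str]) -> None:
        self._names = names

    def variables(self) -> list[str]:
        return list(self._names)


class StationModel(DataModel):
    def __init__(self, stations: dict[str, list[str]]) -> None:
        self._stations = stations  # station id -> measured parameters

    def variables(self) -> list[str]:
        # Union of the parameters measured at any station, sorted for stable output
        return sorted({v for params in self._stations.values() for v in params})


def find(records: list[DiscoveryRecord], keyword: str) -> list[str]:
    """Discovery searches keywords only, whatever data model is plugged in."""
    return [r.title for r in records if keyword in r.keywords]


records = [
    DiscoveryRecord("Model run", ["climate"], GriddedModel(["air_temperature"])),
    DiscoveryRecord("Rain gauges", ["hydrology", "climate"],
                    StationModel({"s1": ["rainfall"], "s2": ["rainfall", "air_temperature"]})),
]
for title in find(records, "climate"):
    print(title)
print(records[1].data_model.variables())
```

Swapping `GriddedModel` for `StationModel` changes nothing in `find`, which is the kind of reuse the bullets above describe.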



Conceptual Overview



NDG Data Model

Dataset: named container for a number of variables

Variable: physical parameters within the dataset; controlled vocabularies, e.g. the BODC data dictionary, CF standard names

Array: multidimensional container for other arrays or numeric data

Coordinate: may be shared between multiple Arrays; ‘anonymous’ if not georeferenced; MappedCoordinate vs ProductCoordinate; defined with respect to a Coordinate Reference System (see ISO 19111, ISO 19115)

GranuleDescriptor: describes a data granule in terms of file storage; enables file aggregation; SQL/OGSA-DAI for RDBMS; physical or logical (e.g. SRB) files

“Profiles” of model defined for important data types
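The containment relationships in the NDG Data Model can be sketched with Python dataclasses. This is a simplified illustration, not the NDG schema: nested arrays, MappedCoordinate vs ProductCoordinate and profiles are omitted, the CRS is reduced to an identifier string (`EPSG:4326` is used only as an example), and `shared_coordinates` is a hypothetical helper showing that one Coordinate object can serve several Arrays.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Coordinate:
    name: str
    values: list[float]
    crs: Optional[str] = None  # coordinate reference system id; None means 'anonymous'

    @property
    def anonymous(self) -> bool:
        """A coordinate without a reference system is not georeferenced."""
        return self.crs is None


@dataclass
class Array:
    shape: tuple[int, ...]
    coordinates: list[Coordinate]  # one per dimension; objects may be shared

    def __post_init__(self) -> None:
        # Each dimension needs a coordinate whose length matches that dimension.
        if len(self.coordinates) != len(self.shape):
            raise ValueError("one coordinate per dimension is required")
        for size, coord in zip(self.shape, self.coordinates):
            if len(coord.values) != size:
                raise ValueError(f"{coord.name} has {len(coord.values)} values, expected {size}")


@dataclass
class Variable:
    standard_name: str  # from a controlled vocabulary, e.g. a CF standard name
    array: Array


@dataclass
class GranuleDescriptor:
    files: list[str]  # physical or logical files aggregated into one granule


@dataclass
class Dataset:
    name: str
    variables: list[Variable]
    granule: GranuleDescriptor

    def shared_coordinates(self) -> list[str]:
        """Names of Coordinate objects used by the arrays of more than one variable."""
        users = {}  # id(coordinate) -> (coordinate, number of variables using it)
        for var in self.variables:
            for coord in {id(c): c for c in var.array.coordinates}.values():
                _, count = users.get(id(coord), (coord, 0))
                users[id(coord)] = (coord, count + 1)
        return sorted(c.name for c, n in users.values() if n > 1)


lat = Coordinate("latitude", [50.0, 51.0], crs="EPSG:4326")
lon = Coordinate("longitude", [0.0, 1.0, 2.0], crs="EPSG:4326")
sample = Coordinate("sample", [0.0, 1.0, 2.0])  # anonymous: not georeferenced
temp = Variable("air_temperature", Array((2, 3), [lat, lon]))
pres = Variable("air_pressure", Array((2, 3), [lat, lon]))
ds = Dataset("example", [temp, pres], GranuleDescriptor(["part1.nc", "part2.nc"]))
print(ds.shared_coordinates())
print(sample.anonymous)
```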



Different Levels of Metadata supporting Discovery and Selection

A-Metadata: can be derived from the data itself

B-Metadata: a summary of all other types of metadata

C-Metadata: all related metadata, papers, pictures, related studies

D-Metadata: user-provided information on what, who and when
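The difference between levels can be illustrated with a small sketch: level A can be computed from the values alone, while level B condenses other levels into a short summary. The enum `MetadataLevel`, the functions `derive_a_metadata` and `summarise`, and the chosen fields (count, missing values, range) are assumptions for illustration, not a definition taken from the talk.

```python
import math
from enum import Enum


class MetadataLevel(Enum):
    A = "derived from the data itself"
    B = "summary of all other types of metadata"
    C = "related metadata, papers, pictures, related studies"
    D = "user-provided information"


def derive_a_metadata(values: list[float]) -> dict:
    """Level A: computed automatically, with no input from the user.
    Assumes at least one non-NaN value (min/max of an empty list would fail)."""
    present = [v for v in values if not math.isnan(v)]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
    }


def summarise(a_meta: dict, d_meta: dict) -> dict:
    """Level B: a short summary drawn from other levels (fields are illustrative)."""
    return {
        "title": d_meta["what"],
        "creator": d_meta["who"],
        "value_range": (a_meta["min"], a_meta["max"]),
    }


a = derive_a_metadata([12.5, float("nan"), 14.0, 11.0])
d = {"what": "Surface temperature series", "who": "Example observer", "when": "example date"}
print(MetadataLevel.A.value)
print(summarise(a, d))
```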



Data Discovery

Most data is currently ‘discovered’ by word of mouth from friends and colleagues or sheer luck.

In a grid environment it is necessary to automate these processes to enable humans and machines/processes alike to discover data.

Example: CCLRC DataPortal http://esc.dl.ac.uk:9000/index.html

The DataPortal software is also used in the e-Minerals Mini Grid.
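Automated discovery needs, at a minimum, an index that a program can query instead of relying on word of mouth. The sketch below is a generic inverted keyword index in Python, not the DataPortal's search implementation; the class `DiscoveryIndex` and the study identifiers are hypothetical.

```python
from collections import defaultdict


class DiscoveryIndex:
    """Minimal keyword index so that programs as well as people can find data."""

    def __init__(self) -> None:
        self._index = defaultdict(set)  # lower-cased keyword -> set of study ids

    def add(self, study_id: str, keywords: list[str]) -> None:
        for kw in keywords:
            self._index[kw.lower()].add(study_id)

    def search(self, *terms: str) -> list[str]:
        """Ids of studies matching every term (AND semantics), sorted."""
        if not terms:
            return []
        # .get() avoids inserting empty sets for unknown terms
        matches = [self._index.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*matches))


idx = DiscoveryIndex()
idx.add("isis-001", ["neutron", "crystal"])
idx.add("isis-007", ["neutron", "magnetism"])
idx.add("badc-042", ["climate", "ozone"])
print(idx.search("neutron"))
print(idx.search("Neutron", "crystal"))
```

Because the index is an ordinary API, the same call serves a web form and an automated workflow alike.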



CCLRC DataPortal

  • The CCLRC DataPortal currently allows access to selected metadata and data from four facilities, the first three of which are housed by CLRC:

    • The Synchrotron Radiation Department (SRD)

    • The Neutron Spallation Source (ISIS)

    • The British Atmospheric Data Centre (BADC)

    • Max-Planck Institute for Meteorology (MPIM)

You can access the available data via the basic search. A Grid-enabled version of the DataPortal can be found at:

http://esc.dl.ac.uk:9000/dataportal/index.html

You can also download the DataPortal code itself for your own project:

for Unix: http://esc.dl.ac.uk:9000/dist/dataportal/v3/dataportal-v3.tar.gz

for Windows: http://esc.dl.ac.uk:9000/dist/dataportal/v3/dataportal-v3.zip


General CLRC DataPortal Architecture

[Figure: a CLRC DataPortal Server, linked to other instances of the CLRC DataPortal Server, reaches each facility (Facility 1 to Facility N) through an XML wrapper over that facility's local metadata and local data]
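In the general architecture each facility is reached through an XML wrapper over its local metadata, and the DataPortal server combines the replies. The following Python sketch shows that fan-out-and-merge pattern using the standard `xml.etree.ElementTree` module. The class `FacilityWrapper`, the function `portal_query` and the XML element names `results` and `study` are hypothetical; they are not the DataPortal's actual interfaces or XML schema.

```python
import xml.etree.ElementTree as ET


class FacilityWrapper:
    """Hypothetical XML wrapper exposing one facility's local metadata."""

    def __init__(self, facility: str, local_metadata: dict[str, list[str]]) -> None:
        self.facility = facility
        self.local_metadata = local_metadata  # study id -> keywords (local format)

    def query(self, keyword: str) -> str:
        """Translate matching local records into a common XML reply."""
        root = ET.Element("results", facility=self.facility)
        for study_id, keywords in self.local_metadata.items():
            if keyword in keywords:
                ET.SubElement(root, "study", id=study_id)
        return ET.tostring(root, encoding="unicode")


def portal_query(wrappers: list[FacilityWrapper], keyword: str) -> list[tuple[str, str]]:
    """Server side: send the query to every wrapper and merge the XML replies."""
    hits = []
    for wrapper in wrappers:
        reply = ET.fromstring(wrapper.query(keyword))
        for study in reply.findall("study"):
            hits.append((reply.get("facility"), study.get("id")))
    return hits


wrappers = [
    FacilityWrapper("ISIS", {"rb1": ["crystal"]}),
    FacilityWrapper("SRD", {"sr3": ["crystal", "diffraction"]}),
    FacilityWrapper("BADC", {"ds9": ["ozone"]}),
]
print(portal_query(wrappers, "crystal"))
```

The design choice illustrated is that the server only ever sees the common XML form, so a new facility joins by supplying a wrapper rather than by changing its local metadata.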


DataPortal Architecture (2)

[Figure: DataPortal components: DataPortal Web Interface, Authentication & Authorisation, Certification Authority, Session Management, Service Look Up, Query & Reply, Shopping Cart, DataPortal Permanent Repository, Facilities Access Control, Facilities XML Wrappers, Facility Administration, Data Transfer and External Data File Store(s)]

The Shopping Cart allows registered users to permanently store and annotate pointers to the external data files and data sets.

Facility Administration allows external facilities to advertise their grid services to the DataPortal.

As well as interacting with the DataPortal via the Web Interface, users can run queries by calling the Query & Reply service directly, provided they are properly authenticated. Other services, such as the Shopping Cart, are also externally visible.
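The Shopping Cart stores annotated pointers to external data rather than the data itself, and only for registered users. A minimal in-memory Python sketch of that behaviour follows; `ShoppingCart`, `CartItem` and their methods are hypothetical names, and the permanent storage the DataPortal provides is assumed away here (a real service would persist the carts).

```python
from dataclasses import dataclass


@dataclass
class CartItem:
    pointer: str          # URL or logical name of an external data file or data set
    annotation: str = ""  # free-text note added by the user


class ShoppingCart:
    """Hypothetical per-user cart holding annotated pointers, never the data."""

    def __init__(self, registered_users: list[str]) -> None:
        self._registered = set(registered_users)
        self._carts = {}  # user -> list of CartItem (in memory only in this sketch)

    def add(self, user: str, pointer: str, annotation: str = "") -> None:
        if user not in self._registered:
            raise PermissionError(f"{user} is not a registered user")
        self._carts.setdefault(user, []).append(CartItem(pointer, annotation))

    def annotate(self, user: str, pointer: str, note: str) -> None:
        for item in self._carts.get(user, []):
            if item.pointer == pointer:
                item.annotation = note
                return
        raise KeyError(pointer)

    def items(self, user: str) -> list[CartItem]:
        return list(self._carts.get(user, []))


cart = ShoppingCart(["alice"])
cart.add("alice", "http://example.org/data/run1.dat")
cart.annotate("alice", "http://example.org/data/run1.dat", "calibration run")
print(cart.items("alice"))
```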



Metadata Capture

Metadata needs to be captured or harvested. Some metadata can only be obtained through interaction with the user; other metadata can be obtained automatically. User interaction should be reduced to the absolute minimum.
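The principle of harvesting automatically and asking the user only for what remains can be sketched as follows. The required-field list `REQUIRED_FIELDS` is a hypothetical schema, and `harvest` and `fields_to_ask` are illustrative helpers; the harvested values (size, modification time, SHA-256 checksum) come from standard-library calls.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical schema: the fields a record must have before it is accepted.
REQUIRED_FIELDS = ["title", "investigator", "size_bytes", "modified", "sha256"]


def harvest(path: str) -> dict:
    """Collect the metadata that can be obtained without asking the user."""
    st = os.stat(path)
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return {
        "size_bytes": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "sha256": digest,
    }


def fields_to_ask(metadata: dict) -> list[str]:
    """Only the fields harvesting could not fill are left for the user."""
    return [f for f in REQUIRED_FIELDS if f not in metadata]


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0.1 0.2 0.3\n")
meta = harvest(tmp.name)
print(fields_to_ask(meta))
os.remove(tmp.name)
```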



Automatic capture from Climate Simulation



Data Management

The Grid environment provides access to a multitude of storage systems, often hiding the type of system behind service interfaces.

Managing personal data in a Grid environment.

Two possible solutions to manage your data:

Globus data management tools, e.g. the Earth System Grid (ESG) http://www.earthsystemsgrid.org

Storage Resource Broker (SRB) from the San Diego Supercomputer Center

http://www.npaci.edu/DICE/SRB



Storage Resource Broker (1)

A professional data storage management system, initially developed in the mid-1990s by the San Diego Supercomputer Center (http://www.npaci.edu/DICE/SRB/). The current version supports many platforms and authentication methods and provides Web services interfaces.



Storage Resource Broker

Integrated access to data on PCs, UNIX and Linux systems, databases and tape stores: http://www.npaci.edu/dice/srb/mySRB/mySRB.html

SRB is currently used within CCLRC and at Southampton, is operated for the e-Minerals Mini Grid and Bristol PPD, and will be tested for the NERC DataGrid and e-Materials.



Functions include ingestion, movement and replication of data, and providing access to data for others.

[Figure labels: Version of Data; Type of Data; Replica or Original Data; Physical Data Location and Type of Resource]
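The attributes labelled on this slide (version, type of data, replica or original, physical location and type of resource) can be pictured as a catalogue that maps one logical name to several physical copies. The Python sketch below illustrates that idea only; it does not use or reproduce the SRB API, and `LogicalCatalog`, `ingest`, `replicate`, `locate`, the resource-type preference order and the example paths are all invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    location: str       # physical location, e.g. host and path
    resource_type: str  # e.g. "disk", "tape", "database"
    original: bool      # True for the original copy, False for a replica


class LogicalCatalog:
    """Hypothetical SRB-style catalogue: (logical name, version) -> replicas."""

    def __init__(self) -> None:
        self._entries = {}  # (logical name, version) -> list of Replica
        self._types = {}    # logical name -> type of data

    def ingest(self, name: str, data_type: str, location: str, resource_type: str) -> int:
        """Register a new version of a logical object with its original copy."""
        version = 1 + max((v for (n, v) in self._entries if n == name), default=0)
        self._entries[(name, version)] = [Replica(location, resource_type, True)]
        self._types[name] = data_type
        return version

    def replicate(self, name: str, version: int, location: str, resource_type: str) -> None:
        self._entries[(name, version)].append(Replica(location, resource_type, False))

    def locate(self, name: str, version=None, prefer=("disk", "database", "tape")) -> Replica:
        """Pick a copy of the latest (or given) version, preferring faster resources."""
        if version is None:
            version = max(v for (n, v) in self._entries if n == name)
        replicas = self._entries[(name, version)]
        return min(replicas, key=lambda r: prefer.index(r.resource_type)
                   if r.resource_type in prefer else len(prefer))


cat = LogicalCatalog()
v = cat.ingest("run42.dat", "neutron-raw", "tape01:/archive/run42.dat", "tape")
cat.replicate("run42.dat", v, "disk07:/cache/run42.dat", "disk")
print(cat.locate("run42.dat"))
```

Users address data by logical name and version; which physical copy answers the request is a catalogue decision, which is what lets a system of this kind move and replicate data behind the scenes.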



Current projects of the Data Management Group of the CCLRC e-Science Centre

CLRC DataPortal

Environment from the Molecular Level

NERC DataGrid

Automatic Collection of Climate Simulation Metadata

Storage Resource Broker

e-Science Database Service

Hydrology Data Grid (just funded)

e-Science Technologies for the Simulation of Complex Materials



More Information can be found under:

CLRC e-Science Centre Projects:

http://www.e-science.clrc.ac.uk/web/projects/

Kerstin Kleese

k.kleese@dl.ac.uk

http://www.e-science.clrc.ac.uk/web/staff/kerstin_kleese

