Support for the Full e-Experimentation Cycle in the Virtual Laboratory Infrastructure

Piotr Nowakowski (1), Eryk Ciepiela (1), Tomasz Gubała (1), Maciej Malawski (1, 2), Marian Bubak (1, 2)

(1) ACC Cyfronet AGH, ul. Nawojki 11, 30-950 Kraków, Poland

(2) Institute of Computer Science AGH, Mickiewicza 30, 30-059 Kraków, Poland


Zakopane, 18-19 March 2010


  • Motivation

  • Problem definition

  • Scientific challenges

  • Iterative experimentation support

  • Experiment pipelines and traces

  • Sharing experiment data through Data Nets

Motivation: e-Science Experiments, Data and Publications

  • Reproducible experiments, provenance in e-Science

  • Need to link publications with primary data (experimental data, algorithms, software, workflows, scripts)

  • Plethora of scientific software: jobs, workflows, services, components, scripts, experiment plans

  • Huge amounts of scientific data consumed and produced by e-Science

    • Earth and life sciences, high-energy physics (HEP), etc.

  • Large number of publications makes research difficult:

    • Computer Science: DBLP contains more than 2^20 = 1,048,576 publications,

    • PubMed stores ~17 million articles to date,

    • ACM Digital Library, ISI Web of Knowledge, Scopus, CiteSeer, arXiv, Google Scholar

  • Emergence of the Web 2.0-based Scientific Social Community (SSC) model

Open Science & Science 2.0

  • New means of scientific communication:

    • Wikis, blogs

    • Collaborative Web 2.0 technologies

  • New methods of conducting science:

    • e-Science,

    • in-silico experiments,

    • exploratory applications

  • Democratization of science

  • Increasing role of openness

Problem Definition

  • To construct a theoretical model facilitating open, collaborative e-experimentation, from experiment inception to publication of results, including primary scientific data

  • To develop a framework implementing the above model

  • To exploit the emerging solution in the context of existing HPC infrastructures and scientific collaboration

Scientific Challenges

  • Theoretical: A common method for referencing primary data (experimental data, algorithms, software, workflows, scripts) as part of publications should be developed and integrated with modern e-Science infrastructures

  • Technological: An integrated architecture for storing, annotating, publishing, referencing and reusing primary data sources. This architecture should span existing virtual laboratory and grid computing systems

Description of the Solution

  • Phase 1: Iterative experiment preparation

  • Phase 2: Experiment execution involving semantic storage of results and ensuring repeatability

Experimentation Pipeline

  • The process of developing an experiment begins with drafting its specification

  • This is followed by iteratively constructing an experiment plan

  • Each prototype is tested by a specific research community, using tools provided by the PL-Grid virtual laboratory

  • Upon completion of tests the experiment can be executed in production mode

  • Obtained results can be published along with the experiment plan (i.e. a set of operations which enable reenactment and validation of a given experiment)
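The pipeline above can be read as a small state machine. The following is a minimal sketch under that reading, not the PL-Grid implementation: the stage names, the `Experiment` class and the `advance` method are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names derived from the bullets above.
    SPECIFICATION = "specification"
    PROTOTYPE = "prototype"
    PRODUCTION = "production"
    PUBLISHED = "published"


# Allowed transitions. PROTOTYPE -> PROTOTYPE models one more iteration
# of the experiment plan after testing by the research community.
TRANSITIONS = {
    Stage.SPECIFICATION: {Stage.PROTOTYPE},
    Stage.PROTOTYPE: {Stage.PROTOTYPE, Stage.PRODUCTION},
    Stage.PRODUCTION: {Stage.PUBLISHED},
    Stage.PUBLISHED: set(),
}


class Experiment:
    def __init__(self, name):
        self.name = name
        self.stage = Stage.SPECIFICATION
        self.iterations = 0  # number of prototypes built so far

    def advance(self, target):
        """Move to the target stage, rejecting transitions the pipeline does not allow."""
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(
                f"cannot move from {self.stage.value} to {target.value}"
            )
        if target is Stage.PROTOTYPE:
            self.iterations += 1
        self.stage = target
        return self
```

In this sketch, publishing before a production run is rejected, which mirrors the ordering of the bullets: specification, iterated prototypes, production execution, publication.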


  • An experiment trace consists of the following:

    • any input data provided by the experiment enactor;

    • all steps performed in order to transform this data into publishable scientific results (chronologically arranged);

    • the documentation of the experiment plan, prepared by a domain scientist (in the form of annotations and comments).

  • The outcome of this process will be easy to manage and read, much like weblog entries

  • Our VL system will enable enrichment of individual data elements with provenance information, linking them to appropriate stages of the experiment
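One way to picture an experiment trace is as a record holding the three parts listed above, plus a lookup that links a data element back to the step that produced it. This is an illustrative sketch only; `Step`, `ExperimentTrace`, `record` and `provenance` are assumed names, not part of the VL system described here.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    order: int       # position in the chronological sequence (1-based)
    operation: str   # description of the operation performed
    outputs: list = field(default_factory=list)  # names of data elements produced


@dataclass
class ExperimentTrace:
    inputs: dict = field(default_factory=dict)       # input data from the enactor
    steps: list = field(default_factory=list)        # chronologically ordered steps
    annotations: list = field(default_factory=list)  # comments by the domain scientist

    def record(self, operation, outputs=()):
        """Append a step; its order reflects the time of recording."""
        step = Step(order=len(self.steps) + 1,
                    operation=operation,
                    outputs=list(outputs))
        self.steps.append(step)
        return step

    def provenance(self, data_element):
        """Return the step that produced data_element, or None if unknown."""
        for step in self.steps:
            if data_element in step.outputs:
                return step
        return None
```

For example, after recording an "align" step that outputs `alignment` and a "score" step that outputs `score_table`, calling `provenance("score_table")` returns the second step, so the result can be traced to its stage of the experiment.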

Sharing Primary Data: Data Nets

Data Net: unifying modern data storage mechanisms (relational databases, Grid-based file systems, Wiki pages, etc.)

A Data Net is a group of data entities linked by named relationships. Such relationships impose a structure upon the dataset and facilitate querying for entities
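The definition above, entities linked by named relationships that support querying, can be sketched as a small in-memory structure. This is a hypothetical illustration: the `DataNet` class and its methods are assumptions, and the storage backends mentioned in the slides (relational databases, Grid file systems, Wiki pages) are not modelled.

```python
from collections import defaultdict


class DataNet:
    def __init__(self):
        self.entities = {}               # entity id -> attribute dict
        self._links = defaultdict(set)   # (source id, relation) -> target ids

    def add_entity(self, entity_id, **attributes):
        self.entities[entity_id] = attributes

    def link(self, source_id, relation, target_id):
        """Connect two known entities with a named relationship."""
        for entity_id in (source_id, target_id):
            if entity_id not in self.entities:
                raise KeyError(f"unknown entity: {entity_id}")
        self._links[(source_id, relation)].add(target_id)

    def related(self, source_id, relation):
        """Query: ids of entities reachable from source_id via relation."""
        return sorted(self._links.get((source_id, relation), set()))
```

A query such as `related("exp1", "produced")` then returns every entity that `exp1` is linked to through the `produced` relationship, which is the sense in which the relationships structure the dataset.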

