Environments for eScience on Distributed Infrastructures
Marian Bubak

Department of Computer Science and Cyfronet

AGH University of Science and Technology

Krakow, Poland

http://dice.cyfronet.pl

Informatics Institute, System and Network Engineering

University of Amsterdam www.science.uva.nl/~gvlam/wsvlam/


Coauthors

  • Bartosz Balis

  • Tomasz Bartynski

  • Eryk Ciepiela

  • Wlodek Funika

  • Tomasz Gubala

  • Daniel Harezlak

  • Marek Kasztelnik

  • Maciej Malawski

  • Jan Meizner

  • Piotr Nowakowski

  • Katarzyna Rycerz

  • Bartosz Wilk

  • Adam Belloum

  • Mikolaj Baranowski

  • Reggie Cushing

  • Spiros Koulouzis

  • Michael Gerhards

  • Jakub Moscicki

www.science.uva.nl/~gvlam/wsvlam

dice.cyfronet.pl


Motivation and main goal

  • Recent trends

    • Scientific discovery is becoming increasingly collaborative and analysis-focused; in-silico experiments are increasingly complex

    • Available compute and data resources are distributed and heterogeneous

  • Main goal

    • Optimal usage of distributed resources (e-infrastructures, ubiquitous) for complex collaborative scientific applications


Collaborative eScience experiments

  • (1) Problem investigation:

    • Look for relevant problems

    • Browse available tools

    • Define the goal

    • Decompose into steps

  • (2) Experiment prototyping:

    • Design experiment workflows

    • Develop necessary components

  • (3) Experiment execution:

    • Execute experiment processes

    • Control the execution

    • Collect and analyse data

  • (4) Results publication:

    • Annotate data

    • Publish data

  • Shared repositories

A. Belloum, M.A. Inda, D. Vasunin, V. Korkhov, Z. Zhao, H. Rauwerda, T. M. Breit, M. Bubak, L.O. Hertzberger: Collaborative e-Science Experiments and Scientific Workflows, Internet Computing, July/August 2011 (Vol. 15, No. 4), pp. 39-47


Applications: stream-oriented applications, data-parallel applications, parameter sweep applications

Infrastructure: desktops, clusters, grids, clouds

Storage: federated cloud storage, HBase

Scaling: automatic task farming for grid jobs and web services, MapReduce

Provenance: Open Provenance Model, XML history tracing

[Figure: system under research (repository, provenance, workflow, cloud)]

www.science.uva.nl/~gvlam/wsvlam/


Research objectives

Investigating the applicability of distributed computing infrastructures (DCI; clusters, grids, clouds) for complex scientific applications

Optimization of resource allocation for applications on DCI

Resource management for services on heterogeneous resources

Urgent computing scenarios on distributed infrastructures

Billing and accounting models

Procedural and technical aspects of ensuring efficient yet secure data storage, transfer and processing

Methods for component dependency management, composition and deployment

Information representation model for DCI federation platforms, their components and operating procedures


Spatial and temporal dynamics in grids

  • Grids increase research capabilities for science

  • Large-scale federation of computing and storage resources

    • 300 sites, 60 countries, 200 Virtual Organizations

    • 10^5 CPUs, 20 PB data storage, 10^5 jobs daily

  • However, operational and runtime dynamics have a negative impact on reliability and efficiency

[Figure: asynchronous and frequent failures and hardware/software upgrades (~95% for 1 job vs. <10% for 100 jobs); long and unpredictable job waiting times (from seconds to 3 hours)]

J.T. Moscicki: Understanding and mastering dynamics in Computing Grids, UvA PhD thesis, promoter: M. Bubak, co-promoter: P. Sloot; 12.04.2011


User-level overlay with late binding scheduling

  • Improved job execution characteristics

  • HTC-HPC Interoperability

  • Heuristic resource selection

  • Application-aware task scheduling

[Figure: completion time with late binding (1.5 hours) vs. completion time with early binding (40 hours)]

J.T. Moscicki, M. Lamanna, M. Bubak, P.M.A. Sloot: Processing moldable tasks on the Grid: late job binding with lightweight user-level overlay, FGCS 27(6) pp. 725-736, 2011
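The effect of late binding can be illustrated with a toy scheduling model. The Ruby sketch below is a simplified simulation under stated assumptions, not the overlay from the cited paper: each hypothetical worker needs a fixed time per task, and a slow worker stands in for a poorly performing grid site. With early binding the tasks are split evenly in advance. With late binding each task is bound to whichever worker becomes free first.

```ruby
# Toy model (assumption): task_times[i] is the time worker i needs per task.

# Early binding: tasks are divided evenly up front, regardless of speed.
def early_binding_makespan(num_tasks, task_times)
  n = task_times.size
  task_times.each_with_index.map do |time, i|
    share = num_tasks / n + (i < num_tasks % n ? 1 : 0)
    share * time
  end.max
end

# Late binding: a task is bound to a worker only when that worker is free,
# i.e. the earliest-free worker "pulls" the next task from a shared queue.
def late_binding_makespan(num_tasks, task_times)
  free_at = Array.new(task_times.size, 0)
  num_tasks.times do
    i = free_at.each_with_index.min_by { |time, _| time }[1]
    free_at[i] += task_times[i]
  end
  free_at.max
end
```

With 30 tasks and per-task times of [1, 1, 10], early binding finishes after 100 time units because the slow worker still receives a third of the tasks, whereas late binding finishes after 20.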


Cloud performance evaluation

  • VM deployment times

  • Virtualization overhead

  • Evaluation of open source cloud stacks (Eucalyptus, OpenNebula, OpenStack)

  • Survey of European public cloud providers

  • Performance evaluation of top cloud providers (EC2, RackSpace, SoftLayer)

    • A grant from Amazon has been obtained

M. Bubak, M. Kasztelnik, M. Malawski, J. Meizner, P. Nowakowski and S. Varma: Evaluation of Cloud Providers for VPH Applications, poster at CCGrid2013 - 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013


Resource allocation management

The Atmosphere Cloud Platform is a one-stop management service for hybrid cloud resources, ensuring optimal deployment of application services on the underlying hardware.

[Figure: Atmosphere architecture. Components shown: admin, developer and scientist users; external application; Cloud Facade client; VPH-Share Master Int.; Cloud Manager; Development Mode; Generic Invoker; Workflow management; VPH-Share Core Services Host; Atmosphere Management Service (AMS); Cloud Facade (secure RESTful API); Atmosphere Internal Registry (AIR); cloud stack plugins (Fog); OpenStack/Nova computational cloud site with head node, worker nodes and image store (Glance); Amazon EC2; other cloud sites]

Customized applications may directly interface with Atmosphere via its RESTful API, called the Cloud Facade.

P. Nowakowski, T. Bartynski, T. Gubala, D. Harezlak, M. Kasztelnik, M. Malawski, J. Meizner, M. Bubak: Cloud Platform for Medical Applications, eScience 2012 (2012)


Cost optimization of applications on clouds

  • Infrastructure model

    • Multiple compute and storage clouds

    • Heterogeneous instance types

  • Application model

    • Bag of tasks

    • Layered workflows

  • Modeling with AMPL (A Modeling Language for Mathematical Programming)

  • Cost optimization under deadline constraints

  • Mixed integer programming

  • Bonmin and CPLEX solvers

M. Malawski, K. Figiela, J. Nabrzyski: Cost minimization for computational applications on hybrid cloud infrastructures, Future Generation Computer Systems, Volume 29, Issue 7, September 2013, Pages 1786-1794, ISSN 0167-739X, http://dx.doi.org/10.1016/j.future.2013.01.004
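The cited work formulates cost minimization as a mixed integer program in AMPL. As a much simpler stand-in for illustration only, the sketch below exhaustively enumerates instance types and counts for a bag of identical tasks under a deadline. The instance names, prices (in cents per started hour), throughputs and the billing of every started hour are all assumptions.

```ruby
# Hypothetical instance type: price in cents per started hour,
# throughput in tasks per hour per instance.
InstanceType = Struct.new(:name, :price_per_hour, :tasks_per_hour)

# Brute-force search (a stand-in for the MIP model): try every type and
# instance count, keep the cheapest plan that meets the deadline.
def cheapest_plan(num_tasks, deadline_hours, types, max_instances)
  best = nil
  types.each do |type|
    (1..max_instances).each do |count|
      capacity = count * type.tasks_per_hour          # tasks per hour for this plan
      hours = (num_tasks + capacity - 1) / capacity   # integer ceiling = billed hours
      next if hours > deadline_hours

      cost = hours * count * type.price_per_hour
      if best.nil? || cost < best[:cost]
        best = { type: type.name, count: count, hours: hours, cost: cost }
      end
    end
  end
  best # nil when no plan meets the deadline
end
```

For 200 tasks, a 2-hour deadline, at most 10 instances, and types `small` (10 cents, 10 tasks/hour) and `large` (35 cents, 40 tasks/hour), the search picks five large instances for one billed hour at a cost of 175. A real model must also cover data transfer, storage and multiple clouds, which is why a solver-based formulation is used.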


Workflow management systems in eScience

“are key technology to integrate computing and data analysis components, and to control the execution and logical sequences among them. By hiding the complexity in an underlying infrastructure, SWMSs allow scientists to design complex scientific experiments, access geographically distributed data files, and execute the experiments using computing resources at multiple organizations.”

Report of the NSF/Mellon Workshop on Scientific and Scholarly Workflow. Oct 3-5, 2007, Baltimore, MD


Auto-scaling workflows

  • Automatic scaling of workflow components based on:

    • Resource load

    • Application load

    • Provenance data

  • Scaling across various infrastructures:

    • Desktops

    • Grids

    • Clouds

R. Cushing, S. Koulouzis, A. S. Z. Belloum, M. Bubak: Dynamic Handling for Cooperating Scientific Web Services, 7th IEEE International Conference on e-Science, December 2011, Stockholm, Sweden


Auto-scaling workflows

[Figure: service load and number of running service instances]

R. Cushing, S. Koulouzis, A. S. Z. Belloum, M. Bubak: Dynamic Handling for Cooperating Scientific Web Services, 7th IEEE International Conference on e-Science, December 2011, Stockholm, Sweden


Auto-scaling workflows

R. Cushing, S. Koulouzis, A. S. Z. Belloum, M. Bubak: Prediction-based Auto-scaling of Scientific Workflows, Proceedings of the 9th International Workshop on Middleware for Grids, Clouds and e-Science, ACM/IFIP/USENIX December 12th, 2011, Lisbon, Portugal
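A minimal sketch of load-based scaling, assuming that load is measured as the number of requests waiting per sampling interval and that each service instance can handle a fixed number of them. The moving-average predictor, the capacity value and the min/max bounds are illustrative assumptions; the cited work uses its own prediction model.

```ruby
# Number of service instances to run next, from recent load samples.
# A moving average serves as a deliberately simple load predictor.
def desired_instances(recent_loads, per_instance_capacity, min: 1, max: 10)
  return min if recent_loads.empty?

  predicted = recent_loads.sum.to_f / recent_loads.size
  needed = (predicted / per_instance_capacity).ceil
  needed.clamp(min, max) # never scale below min or above max instances
end
```

For example, loads of 30, 50 and 70 with a capacity of 20 per instance predict a load of 50 and therefore request 3 instances. The bounds keep a noisy prediction from shutting the service down or exhausting the infrastructure.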


Workflow as a Service

  • Once a workflow is started on the resources, it stays alive and processes data/jobs continuously

  • This reduces the scheduling overhead

R. Cushing, Adam S. Z. Belloum, V. Korkhov, D. Vasyunin, M.T. Bubak, C. Leguy: Workflow as a Service: An Approach to Workflow Farming, ECMLS’12, June 18, 2012, Delft, The Netherlands


Provenance in Practice: Blast Application

[Department of Clinical Epidemiology, Biostatistics and Bioinformatics (KEBB), AMC]

The aim of the application is the alignment of DNA sequence data with a given reference database.

A workflow approach is used to run this application on distributed computing resources.

  • For each workflow run, the provenance data is collected and stored using the XML-tracing system

  • The user interface allows users to reproduce events that occurred at runtime (replay mode)

  • The user interface can be customized (users can select the events to track)

  • The user interface shows resource usage

Ongoing work: UvA, AMC, FH Aachen


Semantic workflow composition

  • GWorkflowDL language (with A. Hoheisel)

  • Dynamic, ad-hoc refinement of workflows based on semantic description in ontologies

  • Novelty

    • Abstract, functional blocks translated automatically into computation unit candidates (services)

    • Expansion of a single block into a subworkflow with proper concurrency and parallelism constructs (based on Petri Nets)

    • Runtime refinement: unknown or failed branches are re-constructed with different computation unit candidates

T. Gubala, D. Harezlak, M. Bubak, M. Malawski: Semantic Composition of Scientific Workflows Based on the Petri Nets Formalism. In: "The 2nd IEEE International Conference on e-Science and Grid Computing", IEEE Computer Society Press, http://doi.ieeecomputersociety.org/10.1109/E-SCIENCE.2006.127, 2006


Semantic integration for science domains

  • Concept of describing scientific domains for in-silico experimentation and collaboration within laboratories

  • Based on separating the domain model, which contains concepts of the subject of experimentation, from the integration model, which concerns the method of (virtual) experimentation (tools, processes, computations)

  • Facets defined in the integration model are automatically mixed into concepts from the domain model, so any piece of data may show any desired behavior

  • Proposed, designed and deployed the method for 3 domains of science:

  • Computational chemistry inside InSilicoLab chemistry portal

  • Sensor processing for early warning and crisis simulation in UrbanFlood EWS

  • Processing of results of massive bioinformatic computations for protein folding method comparison

  • Composition and execution of multiscale simulations

  • Setup and management of VPH applications

T. Gubala, K. Prymula, P. Nowakowski, M. Bubak: Semantic Integration for Model-based Life Science Applications. In: SIMULTECH 2013 Proceedings of the 3rd International Conference on Simulation and Modeling Methodologies, Technologies and Applications, Reykjavik, Iceland 29 - 31 July, 2013, pp. 74-81
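Ruby modules give a compact way to illustrate how facets from an integration model can be mixed into domain-model concepts at runtime. This is an assumption-laden illustration, not the published implementation: the `Molecule` concept, the `Computable` and `Publishable` facets and the `FACETS` rule table are hypothetical.

```ruby
# Domain model: a concept from the subject of experimentation (hypothetical).
class Molecule
  def initialize(formula)
    @formula = formula
  end

  def to_s
    @formula
  end
end

# Integration model: facets describing how data takes part in experiments.
# They rely only on #to_s, so they can be mixed into any domain concept.
module Computable
  def computation_input
    "input(#{self})"
  end
end

module Publishable
  def citation_label
    "#{self.class.name} #{self}"
  end
end

# Hypothetical rule table: which facets each domain concept receives.
FACETS = { Molecule => [Computable, Publishable] }.freeze

# Mix the configured facets into a single domain object at runtime.
def with_facets(object)
  FACETS.fetch(object.class, []).each { |facet| object.extend(facet) }
  object
end
```

Because `extend` affects only the given object, the domain class stays free of experiment-specific behavior, which mirrors the separation of the two models described above.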


Cooperative virtual laboratory for e-Science

  • Design of a laboratory for virologists, epidemiologists and clinicians investigating the HIV virus and the possibilities of treating HIV-positive patients

  • Based on the notion of in-silico experiments built and refined by cooperating teams of programmers, scientists and clinicians

  • Novelty

  • Employs the full concept-prototype-refinement-production cycle for virology tools

  • A set of dedicated yet interoperable tools binds together programmers and scientists working on a single task

  • Support for system-level science with the concept of result reuse between different experiments

T. Gubala, M. Bubak, P.M.A. Sloot: Semantic Integration of Collaborative Research Environments, chapter XXVI in “Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare”, Information Science Reference IGI Global 2009, ISBN: 978-1-60566-374-6, pages 514-530


GridSpace - platform for e-Science applications

  • Experiment: an e-science application composed of code fragments (snippets), expressed in either general-purpose scripting programming languages, domain-specific languages or purpose-specific notations. Each snippet is evaluated by a corresponding interpreter.

  • GridSpace2 Experiment Workbench: a web application - an entry point to GridSpace2. It facilitates exploratory development, execution and management of e-science experiments.

  • Embedded Experiment: a published experiment embedded in a web site.

  • GridSpace2 Core: a Java library providing an API for development, storage, management and execution of experiments. It records all available interpreters and their installations on the underlying computational resources.

  • Computational Resources: servers, clusters, grids, clouds and e-infrastructures where the experiments are computed.

E. Ciepiela, D. Harezlak, J. Kocot, T. Bartynski, M. Kasztelnik, P. Nowakowski, T. Gubała, M. Malawski, M. Bubak: Exploratory Programming in the Virtual Laboratory. In: Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 621-628, October 2010, Best Paper Award.


Collage - executable e-Science publications

Goal:

Extending the traditional scientific publishing model with computational access and interactivity mechanisms; enabling readers (including reviewers) to replicate and verify experimentation results and browse large-scale result spaces.

Challenges:

Scientific: A common description schema for primary data (experimental data, algorithms, software, workflows, scripts) as part of publications; deployment mechanisms for on-demand reenactment of experiments in e-Science.

Technological: An integrated architecture for storing, annotating, publishing, referencing and reusing primary data sources.

Organizational: Provisioning of executable paper services to a large community of users representing various branches of computational science; fostering further uptake through involvement of major players in the field of scientific publishing.

P. Nowakowski, E. Ciepiela, D. Harężlak, J. Kocot, M. Kasztelnik, T. Bartyński, J. Meizner, G. Dyk, M. Malawski: The Collage Authoring Environment. In: Proceedings of the International Conference on Computational Science, ICCS 2011 (2011), Winner of the Elsevier/ICCS Executable Paper Grand Challenge

E. Ciepiela, D. Harężlak, M. Kasztelnik, J. Meizner, G. Dyk, P. Nowakowski, M. Bubak: The Collage Authoring Environment: From Proof-of-Concept Prototype to Pilot Service. In: Procedia Computer Science, vol. 18, 2013


GridSpace2 / Collage - Executable e-Science Publications

[Figure: timeline (Jun 2011, Dec 2011, Jun 2012)]

  • Goal: Extend the traditional way of authoring and publishing scientific methods with computational access and interactivity mechanisms, thus bringing reproducibility to scientific computational workflows and publications

  • Scientific challenge: Conceive a model and methodology to embrace reproducibility in scientific workflows and publications

  • Technological challenge: Support these with modern Internet technologies and available computing infrastructures

  • Solution proposed:

    • GridSpace2: a web-oriented distributed computing platform

    • Collage: an authoring environment for executable publications


GridSpace2 / Collage - Executable e-Science Publications

  • Results:

  • GridSpace2/Collage won the Executable Paper Grand Challenge in 2011

  • Collage was integrated with the Elsevier ScienceDirect portal, so papers can be linked and presented with the corresponding computational experiments

  • A special issue of the Computers & Graphics journal featuring Collage-based executable papers was released in May 2013

  • GridSpace2/Collage has been applied to multiple computational workflows in the scope of the PL-Grid, PL-Grid Plus and MAPPER projects

E. Ciepiela, D. Harężlak, M. Kasztelnik, J. Meizner, G. Dyk, P. Nowakowski, M. Bubak: The Collage Authoring Environment: From Proof-of-Concept Prototype to Pilot Service. In: Procedia Computer Science, vol. 18, 2013

E. Ciepiela, P. Nowakowski, J. Kocot, D. Harężlak, T. Gubała, J. Meizner, M. Kasztelnik, T. Bartyński, M. Malawski, M. Bubak: Managing entire lifecycles of e-science applications in the GridSpace2 virtual laboratory–from motivation through idea to operable web-accessible environment built on top of PL-grid e-infrastructure. In: Building a National Distributed e-Infrastructure–PL-Grid, 2012

P. Nowakowski, E. Ciepiela, D. Harężlak, J. Kocot, M. Kasztelnik, T. Bartyński, J. Meizner, G. Dyk, M. Malawski: The Collage Authoring Environment. In: Procedia Computer Science, vol. 4, 2011


Cookery – framework for building DSLs

  • Workflows based on graph representations are widely used to develop scientific applications. However, they have certain drawbacks: they are not easy to share, to track changes in, or to test.

  • Applications developed using general-purpose programming languages do not have these issues: a wide range of software development tools supports code sharing and tracking changes (version control, code reviews).

  • We propose a solution based on the Ruby programming language that combines the advantages of both worlds: it is no more complex for the end user than solutions based on graphical representations, and it enables the wide range of software development tools

  • Applications can be written in a DSL that is close to English:

Read file /tmp/test_data.gzip.
Count words.
Print result.
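To illustrate how English-like sentences can drive Ruby code, the sketch below maps each sentence to an action using regular expressions. It is a hypothetical toy, not Cookery's implementation or API. The file reader is injected so the sketch runs without real files; a real reader for the `.gzip` file above would have to decompress it, e.g. with `Zlib::GzipReader`.

```ruby
# Hypothetical mini-interpreter for English-like steps (not Cookery's API).
class MiniCookery
  attr_reader :output

  # reader: callable path -> String, injected so the sketch runs without files
  def initialize(reader)
    @reader = reader
    @value = nil
    @output = []
  end

  # Split the program into sentences at a period followed by whitespace or end
  # of text, so dots inside paths such as "test_data.gzip" are kept.
  def run(program)
    program.split(/\.(?:\s+|\z)/).map(&:strip).reject(&:empty?).each { |s| step(s) }
    self
  end

  private

  # Each recognised sentence pattern maps to one action on the current value.
  def step(sentence)
    case sentence
    when /\ARead file (\S+)\z/ then @value = @reader.call(Regexp.last_match(1))
    when /\ACount words\z/ then @value = @value.split.size
    when /\APrint result\z/ then @output << @value.to_s
    else raise ArgumentError, "unknown step: #{sentence}"
    end
  end
end
```

Running the three-sentence program with a stub reader that returns "to be or not to be" collects "6" in `output`. Because the program is plain text interpreted by ordinary Ruby, it can be versioned, reviewed and tested with standard tools.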


Transforming scripts into workflows

  • Scientific workflows are considered to be a convenient high-level alternative to solutions based on programming languages

  • We investigate GridSpace, a collaborative and execution environment based on the Ruby language that enables access to the Grid infrastructure using APIs

  • We describe how to address the issues of analysing Ruby source code to build workflow representations

a = GObj.create
b = a.async_do_sth("")
c = b.get_result
d = a.async_do_sth(c)
e = d.get_result

M. Baranowski, A. Belloum, M. Bubak and M. Malawski: Constructing workflows from script applications, Scientific Programming, 2012, doi:10.3233/SPR-120358
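The core step in turning such a script into a workflow is finding which statements consume the results of earlier ones. The sketch below approximates this with regular expressions over simple `var = expression` lines. It is a hypothetical simplification, whereas the cited work analyses Ruby source code properly; for instance, it would misread a method whose name coincides with a variable.

```ruby
# Build data-dependency edges [producer, consumer] from "var = expression"
# lines. A regex scan stands in for real Ruby source analysis.
def dependency_edges(script)
  defined = []
  edges = []
  script.each_line do |raw|
    line = raw.chomp
    next unless (m = line.match(/\A\s*([a-z_]\w*)\s*=\s*(.+)\z/))

    target = m[1]
    expr = m[2].gsub(/"[^"]*"/, "") # ignore identifiers inside string literals
    expr.scan(/\b[a-z_]\w*\b/).uniq.each do |name|
      # Only names assigned earlier are data dependencies; method names are skipped.
      edges << [name, target] if defined.include?(name)
    end
    defined << target
  end
  edges
end
```

Applied to the five-line script above, it yields the edges a -> b, b -> c, a -> d, c -> d and d -> e. The edge c -> d captures that the second asynchronous call must wait for the first result, while b and d both depend only on the grid object a.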


HyperFlow: model & execution engine

HyperFlow model JSON serialization

{
  "name": "...",        // name of the app
  "processes": [...],   // processes of the app
  "functions": [...],   // functions used by processes
  "signals": [...],     // exchanged signals info
  "ins": [...],         // inputs of the app
  "outs": [...]         // outputs of the app
}

  • Supports a rich set of workflow patterns

  • Suitable for various application classes

  • Abstracts from other distributed app aspects (service model, data exchange model, communication protocols, etc.)

Simple yet expressive model for complex scientific apps

App = set of processes performing well-defined functions and exchanging signals
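The "processes exchanging signals" model can be sketched as a small dataflow interpreter. The sketch below reuses the top-level keys shown above, but the fields inside each process (`function`, `ins`, `outs`), the example app and the one-shot firing rule are assumptions of this sketch, not the HyperFlow engine. In a full engine a process would typically react to a stream of signals rather than fire once.

```ruby
require "json"

# Hypothetical app using the top-level keys shown above; the fields inside
# each process ("function", "ins", "outs") are assumptions of this sketch.
APP_JSON = <<~JSON
  {
    "name": "square-then-add",
    "processes": [
      { "name": "sq",  "function": "square", "ins": ["x"],       "outs": ["x2"] },
      { "name": "add", "function": "add",    "ins": ["x2", "y"], "outs": ["sum"] }
    ],
    "functions": [{ "name": "square" }, { "name": "add" }],
    "signals": [{ "name": "x" }, { "name": "y" }, { "name": "x2" }, { "name": "sum" }],
    "ins": ["x", "y"],
    "outs": ["sum"]
  }
JSON

# Function implementations: each takes input signal values, returns output values.
FUNCTIONS = {
  "square" => ->(values) { [values.first**2] },
  "add" => ->(values) { [values.sum] }
}.freeze

# Fire every process once, as soon as all of its input signals are available.
def run_app(app, inputs)
  signals = inputs.dup
  pending = app["processes"].dup
  until pending.empty?
    ready = pending.find { |p| p["ins"].all? { |s| signals.key?(s) } }
    raise "no process can fire: missing signals" unless ready

    results = FUNCTIONS.fetch(ready["function"]).call(ready["ins"].map { |s| signals[s] })
    ready["outs"].zip(results).each { |name, value| signals[name] = value }
    pending.delete(ready)
  end
  app["outs"].to_h { |s| [s, signals[s]] }
end
```

With inputs x = 3 and y = 4, `run_app(JSON.parse(APP_JSON), { "x" => 3, "y" => 4 })` returns `{ "sum" => 13 }`. The execution order is never written down explicitly; it follows from which signals each process needs.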


Scalable data access

  • Storage federation

  • In service orchestration, all data is passed to the workflow engine

  • Data transfers are made through SOAP, which is unfit for large data transfers

S. Koulouzis, R. Cushing, K. Karasavvas, A. Belloum, M. Bubak: Enabling web services to consume and produce large distributed datasets, to be published JAN/FEB, IEEE Internet Computing, 2012


Data reliability and integrity

DRI is a tool that keeps track of binary data stored in a cloud infrastructure, monitors data availability and facilitates optimal deployment of application services in a hybrid cloud (bringing computations to data or the other way around).

[Figure: DRI architecture. Components shown: LOBCDER; DRI Service; metadata extensions for DRI; validation policy; configurable validation runtime (registry-driven); runtime layer; extensible resource client layer; binary data registry; end-user features (browsing, querying, direct access to data, checksumming); VPH Master Int.; data management portlet (with DRI management extensions); distributed cloud storage (OpenStack Swift, Cumulus, Amazon S3). Operations shown: register files, get metadata, migrate LOBs, get usage stats (etc.), store and marshal data]

DRI Service: a standalone application service, capable of autonomous operation. It periodically verifies access to any datasets submitted for validation and is capable of issuing alerts to dataset owners and system administrators in case of irregularities.
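The periodic validation performed by the DRI Service can be sketched as a checksum comparison. In this hypothetical Ruby sketch the class name, the injected `fetcher` callable and the choice of SHA-256 are assumptions. A real deployment would read the data from cloud storage such as OpenStack Swift or Amazon S3 and notify dataset owners instead of returning a list.

```ruby
require "digest"

# Hypothetical DRI-style validator; names and structure are illustrative only.
class DatasetValidator
  Alert = Struct.new(:dataset, :reason)

  def initialize(fetcher)
    @fetcher = fetcher # callable: dataset id -> content String, or nil if unavailable
    @registry = {}     # dataset id -> expected SHA-256 hex digest
  end

  # Record the checksum of a dataset when it is submitted for validation.
  def register(id, content)
    @registry[id] = Digest::SHA256.hexdigest(content)
  end

  # One validation pass; a real service would run this periodically.
  def validate
    @registry.filter_map do |id, expected|
      content = @fetcher.call(id)
      if content.nil?
        Alert.new(id, :unavailable)
      elsif Digest::SHA256.hexdigest(content) != expected
        Alert.new(id, :checksum_mismatch)
      end
    end
  end
end
```

Each pass returns one alert per irregular dataset: missing data is reported as `:unavailable` and altered data as `:checksum_mismatch`, while intact datasets produce no alert.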


Data security in clouds

  • Data should be stored securely and deleted reliably when no longer needed

  • Clouds are not secure enough: data optimisations prevent ensuring that data has actually been deleted

  • A solution:

    • end-to-end encryption (the decryption key stays in a protected/private zone)

    • data dispersal (portions of the data are dispersed between nodes, so that recovering the whole message is non-trivial or impossible)

Jan Meizner, Marian Bubak, Maciej Malawski, and Piotr Nowakowski: Secure storage and processing of confidential data on public clouds. In: Proceedings of the International Conference on Parallel Processing and Applied Mathematics (PPAM) 2013, Springer LNCS

  • To ensure security of data in transit:

  • Modern applications use secure transport protocols (e.g. TLS)

  • For legacy unencrypted protocols (if absolutely needed), or as an additional security measure:

    • Site-to-site VPN, e.g. between cloud sites (outside of the instance)

    • Remote access: for individual users connecting, e.g. from their laptops
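Data dispersal can be illustrated with the simplest possible scheme, XOR-based n-of-n splitting. This is a swapped-in textbook technique, not necessarily the method of the cited paper: every share is needed to rebuild the data, so no single node holds a recoverable copy. It provides no redundancy, so losing one share loses the data.

```ruby
require "securerandom"

# XOR-based n-of-n splitting: all n shares are needed to rebuild the data,
# and any n-1 of them are indistinguishable from random bytes.
def split_data(data, n)
  raise ArgumentError, "n must be >= 1" if n < 1

  bytes = data.b.bytes
  random_shares = Array.new(n - 1) { SecureRandom.random_bytes(bytes.size).bytes }
  # Last share = data XOR all random shares, byte by byte.
  last = bytes.each_with_index.map { |byte, i| random_shares.reduce(byte) { |acc, s| acc ^ s[i] } }
  (random_shares << last).map { |s| s.pack("C*") }
end

# XOR all shares together, byte by byte, to recover the original bytes.
def combine_shares(shares)
  shares.map(&:bytes).transpose.map { |column| column.reduce(:^) }.pack("C*")
end
```

Each share can be stored at a different cloud site. Note that `combine_shares` returns a binary (ASCII-8BIT) string, so text should be re-tagged with its original encoding after recovery.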


Collaborative metadata management

Objectives

  • Provide means for ad-hoc metadata model creation and deployment of corresponding storage facilities

  • Create a research space for metadata model exchange and discovery with associated data repositories with access restrictions in place

  • Support different types of storage sites and data transfer protocols

  • Support the exploratory paradigm by making the models evolve together with data

Architecture

  • Web Interface is used by users to create, extend and discover metadata models

  • Model repositories are deployed in the PaaS Cloud layer for scalable and reliable access from computing nodes through REST interfaces

  • Data items from Storage Sites are linked from the model repositories


MapReduce-specific language

  • We provide a domain-specific language for defining MapReduce operations

  • It allows queries, once specified, to be executed on many MapReduce engines

  • Applications can switch data sources more easily

  • Applications can have separate environments for different stages of development (development, testing, production), which makes the code more robust
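The idea of specifying a query once and running it on interchangeable engines can be sketched in plain Ruby. The `Query` structure and both engines are hypothetical stand-ins, not the project's DSL or real MapReduce back ends: one engine runs sequentially (as in a development or testing environment), the other runs the map phase in threads.

```ruby
# A query defined once: mapper emits [key, value] pairs, reducer folds values.
Query = Struct.new(:mapper, :reducer)

WORD_COUNT = Query.new(
  ->(line) { line.split.map { |word| [word.downcase, 1] } },
  ->(_key, values) { values.sum }
)

# Shared shuffle + reduce phase: group pairs by key, then reduce each group.
module ShuffleReduce
  def shuffle_reduce(query, pairs)
    pairs.group_by(&:first).to_h { |key, kvs| [key, query.reducer.call(key, kvs.map(&:last))] }
  end
end

# Engine 1: sequential and in-memory.
class LocalEngine
  include ShuffleReduce

  def run(query, records)
    shuffle_reduce(query, records.flat_map { |r| query.mapper.call(r) })
  end
end

# Engine 2: one thread per record in the map phase.
class ThreadedEngine
  include ShuffleReduce

  def run(query, records)
    pairs = records.map { |r| Thread.new { query.mapper.call(r) } }.flat_map(&:value)
    shuffle_reduce(query, pairs)
  end
end
```

Both `LocalEngine.new.run(WORD_COUNT, records)` and `ThreadedEngine.new.run(WORD_COUNT, records)` return the same counts, so the query can move between environments without being rewritten.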


Separation of concerns

  • Scientific applications are constructed from three types of components

  • We strictly define their concerns:

    • Tasks are where we define computations

    • Resources are where we define the resources used

    • In Mapping we join resources with tasks

  • We limit interactions by defining relations:

    • Tasks use constructs determined by Resources (e.g. MapReduce constructs)

    • Mapping maps corresponding Tasks to Resources
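The three component types can be sketched with plain Ruby classes. The class names and the `parallel_map` construct are hypothetical: a Resource exposes the constructs that Tasks may use, a Task uses only those constructs, and the Mapping is the only place where both are known.

```ruby
# Resources: define how work runs and expose the constructs tasks may use.
class SequentialResource
  def parallel_map(items, &block)
    items.map(&block)
  end
end

class ThreadResource
  def parallel_map(items, &block)
    items.map { |i| Thread.new { block.call(i) } }.map(&:value)
  end
end

# Task: defines what is computed, using only constructs a resource provides.
class SquareTask
  def run(resource, numbers)
    resource.parallel_map(numbers) { |n| n * n }
  end
end

# Mapping: joins tasks with resources; the only place where both are known.
class Mapping
  def initialize(assignments)
    @assignments = assignments # task => resource
  end

  def execute(task, input)
    task.run(@assignments.fetch(task), input)
  end
end
```

Swapping `ThreadResource` for `SequentialResource` changes only the Mapping, e.g. `Mapping.new({ task => SequentialResource.new })`, while the task code stays untouched.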


Towards ecosystem of data and processes

Is it possible to create an ecosystem where scientific data and processes can be linked through semantics and used as an alternative to the current manual composition of eScience applications?

  • How to implement adaptive scheduling needed for workflow enactment across multiple domains?

  • How to achieve QoS for data centric application workflows that have special requirements on network connections?

  • How to achieve robustness and fault tolerance for workflow running across distributed resources?

  • How to increase re-usability of workflows, workflow components, and refine workflow execution?


Workflowless eScience


Self-organizing linked process ecosystem

A Networked Open Processes graph, built from an RDF store describing SADI services:

  • Vertices are operations described using BioMoby semantics

  • Edges show a semantic match between an output and an input
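Composition over such a graph can be sketched as path finding. In this hypothetical sketch each operation declares one input type and one output type, plain string equality stands in for matching BioMoby/SADI semantic annotations, and a breadth-first search finds a chain of operations from the available data type to the requested one. The operation and type names are illustrative.

```ruby
# Hypothetical operation annotated with one input and one output type.
Operation = Struct.new(:name, :input, :output)

# Edge A -> B whenever A's output type matches B's input type
# (string equality stands in for ontology-based semantic matching).
def semantic_edges(ops)
  ops.product(ops)
     .select { |a, b| !a.equal?(b) && a.output == b.input }
     .map { |a, b| [a.name, b.name] }
end

# Breadth-first search for a chain of operations turning `from` into `to`.
def compose(ops, from, to)
  queue = ops.select { |op| op.input == from }.map { |op| [op] }
  seen = {}
  until queue.empty?
    chain = queue.shift
    last = chain.last
    return chain.map(&:name) if last.output == to
    next if seen[last.name]

    seen[last.name] = true
    ops.select { |op| op.input == last.output }.each { |op| queue << chain + [op] }
  end
  nil # no chain of operations produces the requested type
end
```

Given operations `fetch` (GeneId to Sequence), `blast` (Sequence to Alignment) and `render` (Alignment to Report), asking for a Report from a GeneId yields the chain `["fetch", "blast", "render"]` without any hand-written workflow.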


Computing on browsers

R. Cushing, G. Putra, S. Koulouzis, A.S.Z. Belloum, M.T. Bubak, C. de Laat: Distributed computing on an Ensemble of Browsers, IEEE Internet Computing, PrePress 10.1109/MIC.2013.3, January 2013


Automata-based dynamic data processing

  • Data processing schema can be considered as a state transformation graph

  • The graph facilitates data processing in many ways

    • Data state can be easily tracked

    • Using the graph as a protocol header, a virtual data processing network layer is achieved

    • Data becomes self routable to processing nodes

    • Collaboration can be achieved by joining the virtual network

[Figure: state graph describing a filtering state machine for tweets, mapped to 11 VMs]

R. Cushing, A. Belloum, M. Bubak et al.: Automata-based Dynamic Data Processing for Clouds, BigDataClouds 2014
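The idea of using the state graph as a header can be sketched by letting each message carry its own graph. In this hypothetical sketch the message layout, the processor names and the tweet fields are assumptions, and the graph is assumed to be acyclic. Each hop looks up the current state in the header to decide which processor runs next, so no central workflow engine is consulted.

```ruby
# Hypothetical message layout: the state graph travels with the data as a header.
Message = Struct.new(:state, :graph, :payload)

# Processing functions available on (virtual) processing nodes, keyed by name.
PROCESSORS = {
  "english_only" => ->(tweets) { tweets.select { |t| t[:lang] == "en" } },
  "mentions_flood" => ->(tweets) { tweets.select { |t| t[:text].include?("flood") } }
}.freeze

# Route a message hop by hop: the header says which processor handles the
# current state and which state comes next. Assumes the graph is acyclic.
def route(message)
  trace = [message.state]
  while (edge = message.graph[message.state])
    payload = PROCESSORS.fetch(edge[:processor]).call(message.payload)
    message = Message.new(edge[:next], message.graph, payload)
    trace << message.state
  end
  [message, trace]
end
```

With a two-step graph (raw to english to matched), tweets are first filtered by language and then by keyword. The trace of visited states shows how far each piece of data has progressed, which is what makes its state easy to track.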


Building scientific software based on Feature Model

Research on Feature Modeling:

  • modelling the component hierarchy of eScience application families

  • modelling requirements

  • methods of mapping Feature Models to Software Product Line architectures

Research on adapting Software Product Line principles in scientific software projects:

  • automatic composition of distributed eScience applications based on Feature Model configuration

  • architectural design of a Software Product Line engine framework

B. Wilk, M. Bubak, M. Kasztelnik: Software for eScience: from feature modeling to automatic setup of environments, Advances in Software Development, Scientific Papers of the Polish Information Processing Society Scientific Council, 2013, pp. 83-96
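Checking whether a feature selection is valid is a natural first step before composing an application from it. The sketch below assumes a minimal feature model with parent links, mandatory features, and `requires`/`excludes` cross-tree constraints. The model layout and the feature names are hypothetical and not taken from the cited work.

```ruby
require "set"

# Hypothetical feature model: parent links, mandatory features, and
# cross-tree constraints (feature => list of required/excluded features).
FeatureModel = Struct.new(:parent, :mandatory, :requires, :excludes)

# Return a list of violations; an empty list means the configuration is valid.
def validate_configuration(model, selected)
  errors = []
  selected = selected.to_set
  selected.each do |f|
    parent = model.parent[f]
    errors << "#{f} selected without parent #{parent}" if parent && !selected.include?(parent)
    (model.requires[f] || []).each { |r| errors << "#{f} requires #{r}" unless selected.include?(r) }
    (model.excludes[f] || []).each { |x| errors << "#{f} excludes #{x}" if selected.include?(x) }
  end
  # A mandatory feature must be present whenever its parent (or the root) is.
  model.mandatory.each do |child|
    parent = model.parent[child]
    parent_active = parent.nil? || selected.include?(parent)
    errors << "mandatory #{child} missing" if parent_active && !selected.include?(child)
  end
  errors
end
```

A composition engine could refuse to set up an environment until this list is empty, and then map each selected feature to a concrete component.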


Common Information Space (CIS)

  • Facilitates creation, deployment and robust operation of Early Warning Systems in a virtualized cloud environment

  • Early Warning System (EWS): any system working according to four steps: monitoring, analysis, judgment, action (e.g. environmental monitoring)

  • Common Information Space:

    • connects distributed components into an EWS and deploys it on the cloud

    • optimizes resource usage, taking into account the EWS importance level

    • provides EWS and self-monitoring

    • is equipped with auto-healing

B. Balis, M. Kasztelnik, M. Bubak, T. Bartynski, T. Gubala, P. Nowakowski, J. Broekhuijsen: The UrbanFlood Common Information Space for Early Warning Systems. In: Elsevier Procedia Computer Science, vol 4, pp 96-105, ICCS 2011.


Multiscale programming and execution tools

  • A method and an environment for composing multiscale applications from single-scale models

  • Validation of the method against real applications structured using the tools

  • Extension of application composition techniques to multiscale simulations

  • Support for multisite execution of multiscale simulations

  • Proof-of-concept transformation of high-level formal descriptions into actual execution using e-infrastructures

[Figure: MaMe, MAD and GridSpace]

K. Rycerz, E. Ciepiela, G. Dyk, D. Groen, T. Gubala, D. Harezlak, M. Pawlik, J. Suter, S. Zasada, P. Coveney, M. Bubak: Support for Multiscale Simulations with Molecular Dynamics, Procedia Computer Science, Volume 18, 2013, pp. 1116-1125, ISSN 1877-0509

K. Rycerz, M. Bubak, E. Ciepiela, D. Harezlak, T. Gubala, J. Meizner, M. Pawlik, B. Wilk: Composing, Execution and Sharing of Multiscale Applications, submitted to Future Generation Computer Systems, after 1st review (2013)

K. Rycerz, M. Bubak, E. Ciepiela, M. Pawlik, O. Hoenen, D. Harezlak, B. Wilk, T. Gubala, J. Meizner, and D. Coster: Enabling Multiscale Fusion Simulations on Distributed Computing Resources, submitted to PLGrid PLUS book 2014

  • MAPPER Memory (MaMe) a semantics-aware persistence store to record metadata about models and scales

  • Multiscale Application Designer (MAD) visual composition tool transforming high level description into executable experiment

  • GridSpace Experiment Workbench (GridSpace) execution and result management of experiments


PL-Grid Project Results

  • First working NGI in Europe in the framework of EGI.eu (since March 31, 2010)

  • Number of users (March 2012): 900+

  • Number of jobs per month: 750,000 - 1,500,000

  • Resources available:

    • Computing power: ca. 230 TFlops

    • Storage: ca. 3600 TBytes

    • High level of availability and reliability of the resources

  • Facilitating effective use of these resources by providing:

    • innovative grid services and end-user tools like Efficient Resource Allocation, Experimental Workbench and Grid Middleware

    • Scientific Software Packages

    • User support: helpdesk system, broad training offer

  • Various well-performed dissemination activities, carried out at national and international levels, which contributed significantly to increasing awareness and knowledge of the Project and of grid technology in Poland.


PLGrid Plus Project Results

  • New domain-specific services for 13 identified scientific domains

  • Extension of the resources available in the PL-Grid Infrastructure by ca. 500 TFlops of computing power and ca. 4.4 PBytes of storage capacity

  • Design and start-up of support for new domain grids

  • Deployment of a Quality of Service system for users by introducing SLA agreements

  • Deployment of new infrastructure services

  • Deployment of Cloud infrastructure for users

  • Broad consultancy, training and dissemination offer


Summary

  • Modelling of complex collaborative scientific applications

    • domain-oriented semantic descriptions of modules, patterns, and data to automate composition of applications

  • Studying the dynamics of distributed resources

    • investigating temporal characteristics, dynamics, and performance variations to run applications with a given quality

  • Modelling and designing a software layer to access and orchestrate distributed resources

    • mechanisms for aggregating multi-format/multi-source data into a single coherent schema

    • semantic integration of compute/data resources

    • data aware mechanisms for resource orchestration

    • enabling reusability based on provenance data


Topics for collaboration

  • Optimization of service deployment on clouds

    • Constraint satisfaction and optimization of multiple criteria (cost, performance)

    • Static deployment planning and dynamic auto-scaling

  • Billing and accounting model

    • Adapted for the federated cloud infrastructure

    • Handle multiple billing models

  • Supporting system-level (e)Science

    • tools for effective scientific research and collaboration

    • advanced scientific analyses using HPC/HTC resources

  • Cloud security

    • security of data transfer

    • reliable storage and removal of the data

  • Cross-cloud service deployment based on container model

dice.cyfronet.pl

www.science.uva.nl/~gvlam/wsvlam

