A nation wide Experimental Grid

Grid’5000 * 1 of the 30+ projects of ACI Grid * A very brief overview A nation wide Experimental Grid Franck Cappello (with all project members) INRIA fci@lri.fr

Agenda Grid’5000 project Grid’5000 design Grid’5000 development Technical and Management difficulties Conclusion

Grid & P2P raise research issues but also methodological challenges • Grid & P2P are complex systems: • Large scale, Deep stack of complicated software • Grid & P2P raise a lot of research issues: • Security, Performance, Fault tolerance, Scalability, Load Balancing, Coordination, Message passing, Data storage, Programming, Algorithms, Communication protocols and architecture,Deployment, Accounting, etc. • How to test and compare? • Fault tolerance protocols • Security mechanisms • Networking protocols • etc.

Tools for Distributed System Studies To investigate Distributed System issues, we need: 1) Tools (model, simulators, emulators, experi. Platforms) Models: Sys, apps, Platforms, conditions Real systems Real applications “In-lab” platforms Synthetic conditions Key system mecas. Algo, app. kernels Virtual platforms Synthetic conditions Real systems Real applications Real platforms Real conditions Abstraction Realism emulation math simulation live systems validation 2) Strong interaction between these research tools

Existing Grid Research Tools France USA Australia • SimGRid and SimGrid2 • Discrete event simulation with trace injection • Originally dedicated to scheduling studies • Single user, multiple servers Japan • GridSim • Dedicated to scheduling (with deadline), DES (Java) • Multi-clients, Multi-brokers, Multi-servers • Titech Bricks • Discrete event simulation for scheduling and replication studies • GangSim • Scheduling inside and between VOs • MicroGrid, • Emulator, Dedicated to Globus, Virtualizes resources and time, Network (MaSSf) These tools do not scale well (many are limited to ~100 nodes) • They do not capture the dynamic and complexity of real life conditions

We need Grid experimental tools In the first ½ of 2003, the design and development of two Grid experimental platforms was decided: • Grid’5000 as a real life system • Data Grid eXplorer as a large scale emulator log(cost & complexity) This talk Grid’5000 DAS 2 TERAGrid PlanetLab Naregi Testbed Major challenge Data Grid eXplorer WANinLab Emulab Challenging AIST SuperCluster SimGrid MicroGrid Bricks NS, etc. Reasonable Model Protocol proof log(realism) emulation math simulation live systems

An Exampleon Fault tolerance Consider an non trivial application: (transport of a polluting composite across geological layers) Using a parallel iterative finite difference algorithm, Implemented in MPI, With two variants : asynchronous and synchronous If you are considering any Grid software stack, there are several layers involved in fault tolerance: -Network protocols, OS, Grid middleware, Application Runtime, Communication library, Application Which layer(s) should be involved in the actual management of nodes and network failures? How to coordinate layer decisions? What is the most efficient approach on the Grid: Synchronous or asynchronous iterative algorithm?

An Exampleon Application Validation XtremLab is an experimental platform running tests, experimental applications, measurement tools to study P2P systems BOINC (Berkeley) is the software behind SETI@home,Folding@home, Climate Prediction, HEP@home, etc. In comparison with PlanetLab, the experiments are run on participant PC NOT on dedicated machines (there are advantages and drawbacks) Main drawback: we can’t afford running a test that will a) crash some machines, b) slowdown significantly some machines, c) reduce significantly the network bandwidth of the participants. We need a large scale infrastructure to validate the tests, experimental application and measurement tools before running them on XtremLab, at a comparable scale.

Grid’5000 foundations:Collection of experiments to be done • Networking • End host communication layer (interference with local communications) • High performance long distance protocols (improved TCP) • High Speed Network Emulation • Middleware / OS • Scheduling / data distribution in Grid • Fault tolerance in Grid • Resource management • Grid SSI OS and Grid I/O • Desktop Grid/P2P systems • Programming • Component programming for the Grid (Java, Corba) • GRID-RPC • GRID-MPI • Code Coupling • Applications • Multi-parametric applications (Climate modeling/Functional Genomic) • Large scale experimentation of distributed applications (Electromagnetism, multi-material fluid mechanics, parallel optimization algorithms, CFD, astrophysics • Medical images, Collaborating tools in virtual 3D environment

Grid’5000 goal:Experimenting all layers of the Grid and P2P software stack Application Programming Environments Application Runtime Grid or P2P Middleware Operating System Networking A highly reconfigurable experimental platform

Grid’5000 foundations:Collection of properties to evaluate Quantitative metrics : • Performance • Execution time, Communication time, throughput, overhead • Scalability • Resource occupation (CPU, memory, disc, network) • Applications algorithms • Number of users • Fault-tolerance (dependability) • Tolerance to very frequent failures (volatility), tolerance to massive failures (a large fraction of the system disconnects) • Fault tolerance consistency across the layers of the software stack.

Experiment workflow Log into Grid’5000 Import data/codes yes Build an env. ? no Reserve nodes corresponding to the experiment Reserve 1 node Reboot node (existing env.) Reboot the nodes in the user experimental environment (optional) Adapt env. Transfer params + Run the experiment Reboot node Collect experiment results Env. OK ? Exit Grid’5000 yes

The Grid’5000 Project • Building a nation wide experimental platform for • Grid & P2P researches (like a particle accelerator for the computer scientists) • 9 geographically distributed sites • every site hosts a cluster (from 256 CPUs to 1K CPUs) • 2/3 of the machines should be homogeneous • All sites are connected by RENATER (French Res. and Edu. Net.) • RENATER hosts probes to trace network load conditions • Design and develop a system/middleware environment • for safely test and repeat experiments • 2) Use the platform for Grid experiments in real life conditions • Address critical issues of Grid system/middleware: • Programming, Scalability, Fault Tolerance, Scheduling • Address critical issues of Grid Networking • High performance transport protocols,Qos • Port and test applications • Investigate original mechanisms • P2P resources discovery, Desktop Grids

Grid’5000 map The largest Instrument to study Grid research problem 500 500 1000 500 RENATER 10Gb/s (fall 2005) 500 500 500 500 500

Today Planning June 2003 2004 2005 2006 2007 Discussions Prototypes 5000 Installations Clusters & Net Preparation Calibration ~3500 2500 Experiments International collaborations CoreGrid ~1500 1250 Current budget Processors

Routeur RENATER Router RENATER Router RENATER Switch Grid Security Architecture 2 fibers (1 dedicated to Grid’5000) Grid5000 site MPLS Router RENATER 8 VLANs per site G5k site LAb normal connection to Renater RENATER Cluster + env. Switch/router labo G5k site Controler (DNS, LDAP, NFS, /home, Reboot, DHCP, Boot server) Grid’5000 User access point LAB Lab. Clust Firewall/nat Local Front-end (logging by ssh) Grid5000 site Reconfigurable nodes Configuration for private addresses 9 x 8 VLANs in Grid’5000 (1 VLAN per tunnel)

User environment /home in every site for every user; manually triggered synchronization G5k site admin/ (ssh loggin + password) LDAP LDAP Creat user Controler Creat user /home and authentication /home/site1/user /site2/user /site… Creat user /home and authentication G5k site Creat user /home and authentication Controler LDAP rsync (directory) rsync (directory) G5k site LAB/Firewall Controler Router LDAP /home/site1/user /site2/user /site… Users/ (ssh loggin + password) Cluster Firewall/nat rsync (directory) /tmp/user User script for 2 level sync: -local sync -distant sync Cluster

Rennes Lyon Sophia Grenoble Bordeaux Toulouse Orsay

Grid’5000 Software Stackon Front-end nodes Co-reservation, allocation, scheduling (OAR-Grid) Local allocation and scheduling tool (OAR) Software provisioning system (Kadeploy) Users and resources authentication (ssh, LDAP,etc.) Operating System (Linux) Networking

Grid’5000

Grid’5000 prototype

Grid’5000

Grid’5000 prototype

First Difficulty: Developing a Very Fast Reconfiguration System G5k site Users/ admin (ssh loggin + password) G5k site Kadeploy Control Control Control commands -rsync (kernel,dist) -orders (boot, reset) G5k site LAB/Firewall Router Control Firewall/nat Cluster Controler: (Boot server + dhcp) Lab’s Network Site 3 System kernels and distributions are downloaded from a boot server. They are uploaded by the users as system images. 10 boot partitions on each node (cache) Cluster

Grid’2005 paper First Difficulty: Developing a Very Fast Reconfiguration System

Second Difficulty: Technical Issues • -Platform heterogeneity (non uniform hardware), • Due to the scale and funding, it’s very difficult to keep a given level of homogeneity • -Non functioning IPMI (Intelligent Platform Management Interface), • 10% of the machines are lost after every reconfiguration • -Performance disappointment (cluster networks), • With Ethernet, you never get the performance you expected, (almost impossible to reach 4 Gbits/s full duplex on a node) • -LDAP crashes (many at Grid level) •  scalability problem,

Third Difficulty: Management Issues • Steering committee meeting every 4 months • Technical committee meeting every month • Nevertheless: • -Abusive behaviors  • sharing policy establishment • -Distributed installation and management  • system inconsistency + strong system variations • -Distributed & multi sources funding  • heterogeneity increases • -System Complexity  • distributed knowledge and know-how • -Engineer short term contract (1 or 2 years)  • Lost of very valuable knowledge and know-how

Conclusion • Grid’5000 is a experimental platform (still under development) != production Grid • In addition to “classical” management and technical difficulties, We have encountered technical and management issues related to Grid’5000 specificities. + The technical collaboration between the 9 sites is very fruitful. We have started discussions to have international collaborations with DAS and NAREGI (Japan) And, this is a very exiting human adventure!

A nation wide Experimental Grid

A nation wide Experimental Grid

Presentation Transcript

A Unique Authority Nation to Nation

A NATION DIVIDED BECOMES A NATION UNITED

FutureGrid : an experimental, high-performance grid testbed

World Wide Consortium for the Grid

A Unique Authority Nation to Nation

Grid-wide Intrusion Detection

The Wide Area Grid CNES experiment

World wide LHC Computing Grid WLCG

The LHC Computing Grid A World-Wide Computer Centre

A Nation Wide Experimental Grid

Nation-wide E-Governance Projects

State-Wide Collaborative Grid Computing Course

Nation Wide Recycling

Toward a Campus-Wide Grid Computing System

Wide Area Grid – Technical Requirements

Grid-wide Intrusion Detection

Nation Wide Book Buyers Company Profile

A nation wide Experimental Grid

Meteo-GRID: World-wide Local Weather Forecasts by GRID Computing

Grid-wide Intrusion Detection