
ACI MD GDS




  1. The middleware for GDS http://graal.ens-lyon.fr/~diet ACI MD GDS

  2. Plan • Resource reservation in a hierarchical ASP • Automatic deployment • DIET over P2P • DIET vs NetSolve • VizDIET • Communications in DIET • An application for GDS

  3. Context • One long-term idea for Grid computing: renting computational power and memory capacity over the Internet • Very high potential • Need for Problem Solving Environments (PSEs) • applications need more and more memory capacity and computational power • some proprietary libraries or environments need to stay in place • some libraries or applications are difficult to install • some confidential data must not circulate over the net • Use of computational servers accessible through a simple interface • Need for schedulers • Moreover… • still difficult to use for non-specialists • almost no transparency • security and accounting issues usually not addressed • often application-dependent PSEs • lack of standards (CORBA, Java/JINI, sockets, …) to build the computational servers

  4. RPC and Grid computing → GridRPC • A simple idea • RPC programming model for the Grid • Use of distributed collections of heterogeneous platforms on the Internet • For applications that require memory capacity and/or computational power • Task-parallelism programming model (synchronous/asynchronous) + data parallelism on servers → mixed parallelism • Needed functionality • Load balancing • Resource discovery • Performance evaluation • Scheduling • Fault tolerance • Data redistribution • Security • Interoperability, …

  5. (diagram) GridRPC in action: the Client sends a Request to the AGENT(s), which answers "S2!"; the Client then calls Op(C, A, B) on server S2 (among S1–S4) and receives the Answer (C).

  6. GridRPC (con't) • 5 main components: • Client • submits problems to servers • provides the users' interfaces • Server • solves problems sent by clients • runs the software • Database • contains dynamic and static information about software and hardware resources • Scheduler • chooses an appropriate server depending on • the problem sent • the information contained in the database • Monitor • gets information about the status of the computational resources
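A minimal Python sketch of how these five components fit together (all class names and the load-based selection rule are illustrative, not DIET's actual API): the client submits a problem, the scheduler consults the database and picks a server, and the server solves the problem.

```python
class Server:
    """Solves problems; advertises which software it runs and its load."""
    def __init__(self, name, software, load):
        self.name, self.software, self.load = name, software, load

    def solve(self, problem, args):
        # Stand-in for running the requested software on the arguments.
        return sum(args)

class Database:
    """Static info (installed software) and dynamic info (load) per server."""
    def __init__(self, servers):
        self.servers = servers

    def candidates(self, problem):
        return [s for s in self.servers if problem in s.software]

class Scheduler:
    """Chooses an appropriate server for a submitted problem."""
    def __init__(self, database):
        self.database = database

    def choose(self, problem):
        # Pick the least-loaded server that offers the requested service.
        return min(self.database.candidates(problem), key=lambda s: s.load)

# Client side: submit a problem, get a server, call it.
db = Database([Server("S1", {"sum"}, load=0.9),
               Server("S2", {"sum"}, load=0.1)])
server = Scheduler(db).choose("sum")
result = server.solve("sum", [1, 2, 3])
```

A real GridRPC middleware adds the monitor (feeding the dynamic part of the database) and asynchronous call variants on top of this skeleton.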

  7. DIET - Distributed Interactive Engineering Toolbox - • Hierarchical architecture for improved scalability • Distributed information in the tree • Plug-in schedulers (diagram: Master Agents (MA) at the top, Local Agents (LA) below, a server front end, and a direct client–server connection)

  8. FAST - Fast Agent's System Timer - • NWS-based (Network Weather Service, UCSB) • Computational performance • load, memory capacity, and performance of batch queues (dynamic) • benchmarks and modeling of available libraries (static) • Communication performance • to be able to predict the data-redistribution cost between two servers (or between clients and servers) as a function of the network architecture and dynamic information • bandwidth and latency (hierarchical)
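FAST's communication side predicts redistribution cost from the measured latency and bandwidth of a link. A first-order sketch of such a prediction (latency + size/bandwidth is the usual first-order model; it is an assumption here, not necessarily FAST's exact formula):

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order predicted time to move size_bytes over one link."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Redistributing an 8 MB matrix over a 100 Mbit/s (12.5 MB/s) link
# with 0.5 ms latency:
t = transfer_time(8e6, 0.0005, 12.5e6)  # about 0.64 s
```

On a hierarchical network the same model is applied per hop and the per-link times are combined along the path.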

  9. PIF - Propagate Information Feedback - • Algorithm from distributed-systems research • PIF: Propagate Information Feedback • Two phases • First phase: broadcast • one message is broadcast through the tree • Second phase: feedback • when a leaf (a node with no descendants) receives the message, it sends a feedback message to its parent • when a parent has received the feedback messages from all its descendants, it sends a feedback message to its own parent, and so on
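The two phases can be sketched over a toy tree (the tree and function names are illustrative, not DIET's implementation):

```python
# PIF sketch: broadcast pushes the message from the root to every leaf;
# feedback lets each node report only after ALL of its children have reported.
tree = {"MA": ["LA1", "LA2"], "LA1": ["S1", "S2"], "LA2": ["S3"],
        "S1": [], "S2": [], "S3": []}

def pif(node, message, log):
    log.append(("broadcast", node, message))   # phase 1: push the message down
    for child in tree[node]:
        pif(child, message, log)               # each child completes its feedback
    log.append(("feedback", node))             # phase 2: report to the parent

log = []
pif("MA", "request", log)
# The root's feedback is necessarily the last event: it waits for all descendants.
```

In DIET the broadcast carries the client's request down to the servers, and the feedback aggregates the servers' answers back up to the MA.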

  10. PIF and DIET - broadcast phase - (diagram: MA at the root, agents A, LAs, and servers S1–S16 at the leaves) 1. Broadcast the client's request 2. Sequential FAST interrogation for each LA 3. Resource reservation

  11–13. PIF and DIET - feedback phase - (animation over the same tree: the candidate set is narrowed from {S4, S7, S12, S15} to {S4, S12} and finally to S12) 1. The MA chooses the identity of the most ``appropriate'' server (or a list of servers) 2. Unused resources are released

  14. Server failure and reactivity • Take server failures into account and increase DIET's reactivity • Time-out at the LA level: • Dead Line 1 = ß1 * Call_FAST_time + ß2 * nb_server (diagram: S2 misses DEAD LINE 1; the feedback proceeds with S7, S12, and S15)

  15. Hierarchical fault tolerance • No answer after Dead Line 1 • Dead Line 2 = ß3 * level_tree (diagram: an agent misses DEAD LINE 2; the feedback proceeds with S7, S12, and S15)
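The two deadlines can be written directly from the slides' formulas (the ß values below are arbitrary examples; in practice they must be tuned to the platform, which is exactly the open question raised in the conclusion):

```python
def dead_line_1(call_fast_time, nb_server, beta1=1.0, beta2=0.01):
    """LA-level time-out: one FAST call plus a term growing with the fan-out."""
    return beta1 * call_fast_time + beta2 * nb_server

def dead_line_2(level_tree, beta3=0.5):
    """Hierarchical time-out: grows with the depth of the agent tree."""
    return beta3 * level_tree

d1 = dead_line_1(call_fast_time=0.2, nb_server=4)  # 1.0*0.2 + 0.01*4
d2 = dead_line_2(level_tree=3)                     # 0.5*3
```

An LA that has not heard from a server by `d1` drops it from the feedback; an agent that has not heard from a child agent by `d2` does the same, so a branch failure never blocks the whole request.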

  16. Simulation: SimGrid2 • Real experiments or simulations are often used to test or compare heuristics • Designed for distributed heterogeneous platforms • Simulations enable reproducible scenarios • SimGrid: a distributed-application simulator for evaluating scheduling algorithms • Event-driven simulation • SimGrid resources (processors, network links) are characterized either by fixed values or by values taken from a trace • SimGrid2: a simulator built on top of SG; this layer implements realistic simulations based on the foundational SG layer and is more application-oriented • Simulations are built in terms of communicating agents
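A toy event-driven simulation in this spirit (this is not SimGrid's API, only an illustration of the event-loop principle: the virtual clock jumps from one completion event to the next):

```python
import heapq

def simulate(tasks, speeds):
    """tasks: work amounts; speeds: work/second of the processor each task runs on.
    Returns the (virtual time, task index) completion events in time order."""
    events, log = [], []
    for i, (work, speed) in enumerate(zip(tasks, speeds)):
        heapq.heappush(events, (work / speed, i))  # completion time of task i
    while events:
        clock, i = heapq.heappop(events)           # advance to the next event
        log.append((clock, i))
    return log

log = simulate(tasks=[10.0, 10.0], speeds=[1.0, 2.0])
# The task on the faster processor completes first, at t = 5.0.
```

Replacing the fixed `speeds` with values read from an NWS trace is what "according to a trace" means on the slide, and is listed as future work for the DIET simulator.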

  17. The DIET SimGRID2 simulator

  18. Evaluation of the PIF scheduler

  19. Conclusion and future work • Conclusion • Benefit from distributed-systems research • Fault tolerance in DIET • server failure • branch failure • Resource reservation provides good QoS for client requests • DIET SimGrid2 simulator • can be reused to validate other algorithms • Future work • Implementation of tools to guarantee the resource reservation • Integrate NWS traces into the simulator • How to set the deadlines on a given heterogeneous platform?

  20. Plan • Resource reservation in a hierarchical ASP • Automatic deployment • DIET over P2P • DIET vs NetSolve • VizDIET • Communications in DIET • An application for GDS

  21. Automatic deployment • Problem: choose the right number of components (resources) and place them in the right way, so as to increase the overall performance of the platform • Motivation: "how to deploy DIET on the Grid?" • Foundation: idea from the article "Scheduling strategies for master-slave tasking on heterogeneous processor grids" by C. Banino, O. Beaumont, A. Legrand and Y. Robert

  22. Introduction • Solution: • generate a new structure by arranging the resources according to the graph that gives the best throughput • for a homogeneous platform, the resources should be arranged in a binary-tree structure • for a heterogeneous platform, more resources should be added by finding the bottleneck in the structure

  23. Deployment • Architectural model -

  24. Deployment • wi (MFlop/s): computing power of node Pi • bij: capacity of the link between Pi and Pj (links are symmetric and bidirectional) • Sin_i: size of an incoming request from a client • Sout_i: size of an outgoing request (response) • alpha_in_i: fraction of time spent processing incoming requests • alpha_out_i: fraction of time spent processing outgoing requests

  25. Operations in steady state (diagram: node Pi linked to node Pj by a link of capacity bij) • Calculation of the throughput of a node

  26. Calculation of the throughput of a graph

  27. Example: calculation of the throughput of a graph. The throughput is computed bottom-up, each node contributing min(its own capacity, the sum of its children's throughputs): min(9, 3+4) = 7 and min(7, 6+5) = 7 at the lower level, then min(12, 7+7) = 12, and finally min(20, 12) = 12 at the root.
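The bottom-up computation of slide 27 can be sketched as follows (node names are invented; the capacities are the slide's numbers):

```python
# Throughput of a tree: each node contributes min(its own capacity,
# sum of its children's throughputs); leaves contribute their capacity.
capacity = {"root": 20, "n1": 12, "n2": 9, "n3": 7,
            "l1": 3, "l2": 4, "l3": 6, "l4": 5}
children = {"root": ["n1"], "n1": ["n2", "n3"], "n2": ["l1", "l2"],
            "n3": ["l3", "l4"], "l1": [], "l2": [], "l3": [], "l4": []}

def throughput(node):
    kids = children[node]
    if not kids:
        return capacity[node]
    return min(capacity[node], sum(throughput(k) for k in kids))

r = throughput("root")  # min(20, min(12, min(9, 3+4) + min(7, 6+5))) = 12
```

The node whose own capacity caps the result (here n1, with capacity 12) is the bottleneck; the heterogeneous slides later improve R precisely by adding LAs at such nodes.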

  28. Homogeneous structures (all nodes have the same computing power and link bandwidth): • star graph • 2-depth star graph • binary tree • 2-chain graph • chain graph

  29. Homogeneous Structures • Simulation results (with 8 nodes)

  30. Homogeneous Structures • Simulation results (with 32 nodes)

  31. Homogeneous Structures • Simulation results (for the binary tree)

  32. Heterogeneous Networks

  33. Throughput of network (figure: a heterogeneous tree with node computing powers ranging from 25 to 1200 and various link capacities; the resulting throughput is R = 2)

  34. Throughput of network by adding LAs (figure: adding LAs to the same tree raises the throughput from R = 2 to R = 2.2 and then to R = 2.65)

  35. Heterogeneous Network

  36. Experimental results • 1 client with n requests • no steady state (the MA performed well) • n clients with n requests • no steady state (the MA performed well) • Pipeline effect • not enough nodes (clients) • requests buffered at the MA • a new client implementation to create a steady-state effect • the MA failed with 960 requests (due to a memory problem)

  37. Experimental results

  38. Conclusion • Select the best structure • Improve the throughput of the network • Predict the performance of a structure • Can predict the effect on performance of changes to the structure's configuration • The bottleneck is not at the MA

  39. Conclusion • Homogeneous: • a binary-tree structure is best • the number of agent nodes should be proportional to the number of servers • a star-graph structure is best when there are few nodes and more than 60 servers • Heterogeneous: • find the bottleneck • improve the throughput • Modeling DIET

  40. Future work • Calculate the throughput of structures with multiple clients and multiple Master Agents • Dynamic updating using the GRAS package • Add a timer to the tool to get real values for the CORBA implementation of DIET • Check whether the LA and SeD are the cause of the bottleneck • Combine scheduling and deployment to increase performance • Validate the work by a real deployment

  41. Automatic deployment: first tool • An XML file describes the hierarchy to launch, and a Perl script parses it to generate one configuration file per agent.

launch.xml:

<?xml version="1.0" standalone="yes"?>
<launch>
  <masteragent name = "MA1">
    <IP>193.253.175.223</IP>
    <binary>/home/ckochhof/work/diet/src/dmat_manips/bin/ma1</binary>
    <localagent name = "LA1">
      <IP>193.253.175.224</IP>
      <binary>/home/ckochhof/work/diet/src/dmat_manips/bin/LA1</binary>
      <server name = "SeD1">
        <IP>193.253.175.226</IP>
        <binary>/home/ckochhof/work/diet/src/dmat_manips/bin/SeD1</binary>
        <service>all</service>
      </server>
      <server name = "SeD2">
        <IP>193.253.175.227</IP>
        <binary>/home/ckochhof/work/diet/src/dmat_manips/bin/SeD2</binary>
        <service>T:MatSUM</service>
      </server>
    </localagent>
    <localagent name = "LA2">
      <IP>193.253.175.225</IP>
      <binary>/home/ckochhof/work/diet/src/dmat_manips/bin/LA2</binary>
      <server name = "SeD3">
        <IP>193.253.175.228</IP>
        <binary>/home/ckochhof/work/diet/src/dmat_manips/bin/SeD3</binary>
        <service>T:MatSUM</service>
      </server>
    </localagent>
  </masteragent>
</launch>

Parser (excerpt; the LocalAgent and Server packages, built the same way, are omitted):

#!/usr/bin/perl -w
use strict;
use XML::DOM;

my $parser = XML::DOM::Parser->new();
my $file   = 'launch.xml';
my $doc    = $parser->parsefile($file);

package MasterAgent;

sub new {
    my $class = shift;
    my $self  = { type => undef, name => undef, IP => undef,
                  localAgents => {}, servers => {} };
    return bless $self, $class;
}

sub function {
    my ($self, $agt) = @_;
    $self->{name} = $agt->getAttribute('name');
    $self->{type} = 'MasterAgent';
    foreach my $node ($agt->getChildNodes) {
        my $tag = $node->getNodeName;
        if ($tag eq 'IP') {
            # The IP is the text content of the <IP> element.
            $self->{IP} = $node->getFirstChild->getData;
        } elsif ($tag eq 'localagent') {
            my $name = $node->getAttribute('name');
            $self->{localAgents}->{$name} = LocalAgent->new()->function($node, $self);
        } elsif ($tag eq 'server') {
            my $name = $node->getAttribute('name');
            $self->{servers}->{$name} = Server->new()->function($node, $self);
        }
    }
    return $self;
}

Generated configuration file (LA1.cfg):

traceLevel = 1
agentType = DIET_LOCAL_AGENT
name = LA1
Parent name = MA1
fastUse = 1
ldapUse = 0
nwsUse = 1

Parsed hierarchy, as printed by the tool:

Master Agent: Name = MA1 . Binary = /home/ckochhof/work/diet/src/dmat_manips/bin/ma1 . IP = 193.253.175.223 . Local agents = LA1 LA2
Local Agent: Name = LA1 . Parent agent = MA1 . Binary = /home/ckochhof/work/diet/src/dmat_manips/bin/LA1 . IP = 193.253.175.224 . Servers = SeD1 SeD2
Server: Name = SeD1 . Parent agent = LA1 . Binary = /home/ckochhof/work/diet/src/dmat_manips/bin/SeD1 . IP = 193.253.175.226 . Services = all
Server: Name = SeD2 . Parent agent = LA1 . Binary = /home/ckochhof/work/diet/src/dmat_manips/bin/SeD2 . IP = 193.253.175.227 . Services = T:MatSUM
Local Agent: Name = LA2 . Parent agent = MA1 . Binary = /home/ckochhof/work/diet/src/dmat_manips/bin/LA2 . IP = 193.253.175.225 . Servers = SeD3
Server: Name = SeD3 . Parent agent = LA2 . Binary = /home/ckochhof/work/diet/src/dmat_manips/bin/SeD3 . IP = 193.253.175.228 . Services = T:MatSUM

To Do • Script to launch the agents • Simulations to check the parser • Practical implementation

  42. Plan • Resource reservation in a hierarchical ASP • Automatic deployment • DIET over P2P • DIET vs NetSolve • VizDIET • Communications in DIET • An application for GDS

  43. DIET over P2P • Current status • Multi-MA available, with the MAs connected through JXTA • Documentation available • Archive available: diet-0.7_beta-dev-jxta.tgz • TODO list • Evaluate the performance • Check compliance with the coding standards • Integration into the DIET CVS • Break the constraint of one JXTA component per DIET component • "Smart" algorithms for traversing the MAs? (diagram: MAs linked by JXTA connections, each above an A/LA hierarchy)

  44. Plan • Resource reservation in a hierarchical ASP • Automatic deployment • DIET over P2P • DIET vs NetSolve • VizDIET • Communications in DIET • An application for GDS

  45. DIET vs NetSolve • Deployment scripts • CVS is used to keep the configuration files up to date (diagram: clients on paraski and sunlabs, agents + servers behind gateway_vthd)

  46. DIET vs NetSolve

  47. DIET vs NetSolve • TODO list • Tests with the asynchronous API • Multithread the client • Improve the statistics (dispersion index) • Improve the deployment scripts (DIET and omniORB configuration files) • Explain the NetSolve results • Explain DIET's 40-client problem • Tests on the SPARC machines • Tests on icluster2?

  48. Plan • Resource reservation in a hierarchical ASP • Automatic deployment • DIET over P2P • DIET vs NetSolve • VizDIET • Communications in DIET • An application for GDS

  49. VizDIET: a visualization tool written in Java. Each LogManager collects the information of its agent and sends it to the LogCentral, which sits outside the DIET hierarchy. (diagram: interaction with the platform)

  50. VizDIET 1.0 • Integration of LogService (LogManager/LogCentral) into the DIET agents • Messages are forwarded from the agent through its LogManager • no storage on disk • vizPerf vs VizDIET study • Conclusion: vizPerf is too far removed from the DIET structure
