
Practical Mechanisms for Managing Parallel and Interactive Jobs on Grid Environments

Enol Fernández, UAB


Presentation Transcript


  1. Enol Fernández UAB Practical Mechanisms for Managing Parallel and Interactive Jobs on Grid Environments

  2. INGRID 2008, 9th April 2008. Outline: Introduction, CrossBroker, Glide In, Parallel Job Support, Interactive Job Support, Conclusions

  3. Batch execution on Grids [Diagram labels: Job; F1, F2; O1, O2; Services; Middleware; Internet; two remote sites]

  4. Parallel & Interactive Job Execution • Use of resources from different sites • Resource-sets search • Co-allocation & synchronization • Fast start-up • Execution in high-occupancy situations [Diagram labels: Job; F1, F2; I/O forwarding; Services; Middleware; Internet; two remote sites; MPI]

  5. CrossBroker • CrossBroker does automatic scheduling in Grid environments: resource discovery, resource selection, job execution • Jobs not treated by gLite: • Parallel jobs (MPI): run on more than one resource, in a coordinated fashion • Interactive jobs: the user interacts with the application during its execution

  6. CrossBroker • Outdated information • Dynamic changes • LRMS (PBS, LSF, Condor): limited external control • Non-cooperative LRMS • Local user jobs [Architecture diagram labels: Migrating Desktop; Information Index; Replica Manager; CrossBroker with Scheduling Agent, Resource Searcher, Application Launcher, Condor-G and DAGMan; two EGEE/Globus CEs, each with an LRMS and WNs]

  7. Glide In • The idea • Each batch job is encapsulated in an agent that takes control of the WN independently of its LRMS • Lightweight virtual machines • Each Worker Node is divided into 2 VMs • Each VM can execute jobs independently (e.g. batch and interactive) • Fast startup of jobs (no need to go through Globus + LRMS) • NOT a full virtual machine (Xen, VMware, …) • NO need for special privileges on the WN
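
The two-slot idea on this slide can be pictured with a small model. This is only a sketch under stated assumptions: the class name `GlideInAgent`, its methods, and the node name `wn01` are hypothetical and are not part of CrossBroker or Condor; the model only tracks which slot a job occupies.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GlideInAgent:
    """Hypothetical model of an agent that splits one worker node into two slots."""

    node: str
    # Slot name -> name of the job running there, or None when the slot is free.
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"VM1": None, "VM2": None}
    )

    def start(self, job: str) -> Optional[str]:
        """Place the job in the first free slot; return the slot name, or None if full."""
        for name, running in self.slots.items():
            if running is None:
                self.slots[name] = job
                return name
        return None

    def finish(self, job: str) -> None:
        """Free the slot occupied by the given job, if any."""
        for name, running in self.slots.items():
            if running == job:
                self.slots[name] = None


agent = GlideInAgent("wn01")
batch_slot = agent.start("batch-job")              # goes to the first free slot
interactive_slot = agent.start("interactive-job")  # goes to the remaining slot
```

In this model a second job starts without another pass through Globus and the LRMS, because the agent already holds the node.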

  8. Glide In [Diagram labels: Grid Resource; CrossBroker; LRMS; Batch Job; Scheduling Agent; Application Launcher; Condor-G]

  9. Glide In [Diagram labels: Grid Resource; CrossBroker; LRMS; Batch Job; Scheduling Agent; Agent; Application Launcher; VM1; VM2; Condor-G]

  10. Glide In [Diagram labels: Grid Resource; CrossBroker; LRMS; Batch Job; Scheduling Agent; Agent; Application Launcher; VM1; VM2; Condor-G]

  11. Glide In [Diagram labels: Grid Resource; CrossBroker; LRMS; Batch Job; Scheduling Agent; Agent; Application Launcher; VM1; VM2; Condor-G; "Available for other jobs"]

  12. Parallel Job Support • Support for parallel jobs: Open MPI, PACX-MPI, MPICH-P4, MPICH-G2, Plain (just the machines) • Takes into account site capabilities • Low-level details of MPI implementations and sites are handled by starter scripts • mpi-start is configured automatically and used by default

  13. Parallel Job Support: changes in JDL • JOBTYPE: Normal (sequential jobs, just one CPU) or Parallel (more than one CPU) • SUBJOBTYPE: openmpi, pacx-mpi, mpich, mpich-g2, Plain • Plain allows easy extension for supporting new parallel job types

  14. Parallel Job Support
        Type = "Job";
        VirtualOrganisation = "imain";
        JobType = "Parallel";
        SubJobType = "pacx-mpi";
        NodeNumber = 5;
        Executable = "test-app";
        Arguments = "-v";
        InputSandbox = {"test-app", "inputfile"};
        OutputSandbox = {"std.out", "std.err"};
        StdErr = "std.err";
        StdOutput = "std.out";
        Rank = other.GlueHostBenchmarkSI00;
        Requirements = other.GlueCEStateStatus == "Production";
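
A job description like the one on this slide can also be generated from a script. The following is a minimal sketch, not a CrossBroker tool: the helper names `Expr`, `fmt` and `to_jdl` are hypothetical, and the only assumption about the syntax is the `Name = value;` form visible on the slide.

```python
class Expr(str):
    """Marks a value that must be written unquoted, e.g. a Rank expression."""


def fmt(value):
    """Format one Python value the way it appears on the slide."""
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(fmt(v) for v in value) + "}"
    return '"' + str(value) + '"'


def to_jdl(attrs):
    """Render a dict as one 'Name = value;' line per attribute."""
    return "\n".join(f"{name} = {fmt(value)};" for name, value in attrs.items())


job = {
    "Type": "Job",
    "JobType": "Parallel",
    "SubJobType": "pacx-mpi",
    "NodeNumber": 5,
    "Executable": "test-app",
    "InputSandbox": ["test-app", "inputfile"],
    "Rank": Expr("other.GlueHostBenchmarkSI00"),
}
print(to_jdl(job))
```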

  15. Parallel Job Support: resource-set search example
        CE   Host                 FreeCPUs  Disk  AverageSI
        CE1  zeus.cyf-kr.edu.pl   2         100   2000
        CE2  aocegrid.uab.es      10        100   4000
        CE3  bee001.ific.uv.es    3         100   1000
        CE4  xgrid.icm.edu.pl     6         100   1000
        CE5  lngrid02.lip.pt      2         100   1000
      The diagram labels the CEs as "MPI enabled CE" or "Non-MPI enabled CE". Resulting resource sets:
        [Groups with 1 CEs]
        [Rank=2000] aocegrid.uab.es:2119/jobmanager-pbs-workq freeCPUs = 10
        [Groups with 2 CEs]
        [Rank=1500] zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq freeCPUs = 2
                    bee001.ific.uv.es:2119/jobmanager-pbs-workq freeCPUs = 3
        [Rank=1000] bee001.ific.uv.es:2119/jobmanager-pbs-workq freeCPUs = 3
                    lngrid02.lip.pt:2129/jobmanager-pbs-workq freeCPUs = 2
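
The search illustrated on this slide can be sketched as follows. It is an illustration, not CrossBroker's actual algorithm: the function name `resource_sets`, the cap of two CEs per group, the rule of skipping groups that already contain a sufficient smaller group, and the choice of xgrid.icm.edu.pl as the non-MPI CE are all assumptions, chosen so that a 5-node job yields the three groups listed on the slide. Ranking is left out.

```python
from itertools import combinations

# (host, free CPUs, MPI enabled?) from the slide; the MPI flags are assumed.
CES = [
    ("zeus.cyf-kr.edu.pl", 2, True),
    ("aocegrid.uab.es", 10, True),
    ("bee001.ific.uv.es", 3, True),
    ("xgrid.icm.edu.pl", 6, False),  # assumption: the "Non-MPI enabled CE"
    ("lngrid02.lip.pt", 2, True),
]


def resource_sets(ces, nodes, max_group=2):
    """Return minimal groups of MPI-enabled CEs with at least `nodes` free CPUs."""
    usable = [ce for ce in ces if ce[2]]
    found = []  # list of frozensets of host names, smallest groups first
    for size in range(1, max_group + 1):
        for group in combinations(usable, size):
            if sum(free for _, free, _ in group) < nodes:
                continue
            names = frozenset(host for host, _, _ in group)
            # Skip groups that contain an already sufficient smaller group.
            if any(prev <= names for prev in found):
                continue
            found.append(names)
    return [tuple(sorted(names)) for names in found]


groups = resource_sets(CES, nodes=5)
for group in groups:
    print(len(group), "CE(s):", ", ".join(group))
```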

  16. Parallel Job Support: PACX-MPI start-up on CE3 (bee001.ific.uv.es, FreeCPUs = 3, Disk = 100, AverageSI = 1000) and CE5 (lngrid02.lip.pt, FreeCPUs = 2, Disk = 100, AverageSI = 1000)
        1. Launch a PACX Startup Server
        2. Submit MPI subtasks
        3. mpi-start starts each of the subtasks
        4. Subtasks notify the startup server and start running
        5. CrossBroker monitors the application
      [Diagram labels: CrossBroker; Startup server; MPI SubTask (x2)]

  17. Parallel Job Support • CrossBroker searches for and selects sets of resources for the jobs • There is no guarantee that all tasks of the same job will start at the same time • 1st choice: select only sites with free resources, so the job runs immediately. Unfortunately, free resources are not always available • 2nd choice: allocate a resource temporarily and wait until all other tasks show up. Time-share the resource with a backfilling policy to avoid resource idleness
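
The "2nd choice" can be made concrete with a little arithmetic. The sketch below is an assumption-laden illustration, not CrossBroker code: the function name `backfill_windows`, the site names and the times are invented. Given the time at which each site's agent obtains its node, the parallel job can start only when the last one arrives, and every earlier agent has a waiting window that backfilled jobs could use.

```python
def backfill_windows(ready_at):
    """Map each site to the time its agent waits for the slowest site.

    ready_at: dict of site name -> time (seconds) at which the agent gets its node.
    Returns (start time of the parallel job, dict of site -> waiting window).
    """
    start = max(ready_at.values())
    windows = {site: start - t for site, t in ready_at.items()}
    return start, windows


# Hypothetical arrival times for a job split over two sites.
start, windows = backfill_windows({"siteA": 0, "siteB": 120})
print("parallel job starts at", start)
print("time available for backfilling:", windows)
```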

  18. Glide In for co-allocation [Diagram labels: Grid Resource; CrossBroker; LRMS; MPI JOB; Scheduling Agent; Condor-G]

  19. Glide In for co-allocation [Diagram labels: Grid Resource; CrossBroker; LRMS; MPI JOB; Scheduling Agent; Agent; Application Launcher; VM1; VM2; Condor-G; MPI Task, "Waiting for the rest of tasks"]

  20. Glide In for co-allocation [Diagram labels: Grid Resource; CrossBroker; JOB; LRMS; Scheduling Agent; Agent; Application Launcher; VM1; VM2; Condor-G; MPI TASK; "BackFilling while the MPI waits"]

  21. Glide In for co-allocation [Diagram labels: Grid Resource; CrossBroker; LRMS; Scheduling Agent; Agent; Application Launcher; VM1; VM2; Condor-G; MPI TASK; JOB; "All tasks ready!"]

  22. Interactive Job Support • Fast startup: • Cache of resources: fast matchmaking • Scheduling priority: use free resources or glideins • Fast notification of events • CrossBroker injects interactive agents that enable communication between user and job • Transparent to the user • Condor Bypass & glogin agents

  23. Interactive Job Support: changes in JDL • INTERACTIVE: true/false. Indicates that the job is interactive and that the broker should treat it with higher priority • INTERACTIVEAGENT, INTERACTIVEAGENTARGUMENTS: these attributes specify the command (and its arguments) used to communicate with the user

  24. Interactive MPI application
        Type = "Job";
        VirtualOrganisation = "imain";
        JobType = "Parallel";
        SubJobType = "openmpi";
        NodeNumber = 4;
        Interactive = TRUE;
        InteractiveAgent = "glogin";
        InteractiveAgentArguments = "-r -p 195.168.105.65:23433";
        Executable = "test-app";
        InputSandbox = {"test-app", "inputfile"};
        OutputSandbox = {"std.out", "std.err"};
        StdErr = "std.err";
        StdOutput = "std.out";
        Rank = other.GlueHostBenchmarkSI00;
        Requirements = other.GlueCEStateStatus == "Production";

  25. Interactive MPI application [Diagram labels: User's Machine; Remote Resource; "Started by the CrossBroker"; Master; glogin; Video Stream; MPI; Worker (x3); "Started with mpi-start"]

  26. Glide In for interactive jobs [Diagram labels: Grid Resource; CrossBroker; INT. JOB; LRMS; Scheduling Agent; Agent; Application Launcher; VM1; VM2; Condor-G; BATCH]

  27. Glide In for interactive jobs • Priority adjustment • Startup-time reduction • Only one layer involved [Diagram labels: Grid Resource; CrossBroker; INT. JOB; LRMS; Scheduling Agent; Agent; Application Launcher; VM1; VM2; Condor-G; BATCH (x2)]

  28. Conclusions & Future work • CrossBroker gives support to Parallel and Interactive jobs • Automatically • Interoperable with EGEE • Glide In • Fast startup of jobs • Co-allocation without reservation or wasting resources • Future work: • Explore more complex multiprogramming (e.g. 3 or more VMs) • Decentralization of the services

  29. Enol Fernández UAB Practical Mechanisms for Managing Parallel and Interactive Jobs on Grid Environments
