
Decentralized Resource Management for Multi-core Desktop Grids


Presentation Transcript


  1. Decentralized Resource Management for Multi-core Desktop Grids Jaehwan Lee, Pete Keleher, Alan Sussman Department of Computer Science University of Maryland

  2. Multi-core is not enough • Multi-core CPUs are the current trend in desktop computing • It is not easy to exploit multiple cores in a single machine for high-throughput computing • “Multicore Is Bad News for Supercomputers”, S. Moore, IEEE Spectrum, 2008 • No decentralized solution exists for multi-core Grids

  3. Challenges in Multi-core P2P Grids • Features of structured P2P Grids • For effective matchmaking, a structured P2P platform based on a Distributed Hash Table (DHT) is needed • A structured DHT is susceptible to frequent dynamic updates of node status • How to represent a multi-core node in a P2P structure? • If a distinct logical peer represents each core: • Cannot support multi-threaded jobs • Cannot accommodate jobs requiring large shared resources • If a logical peer represents a whole multi-core machine: • Contention for shared resources among the cores • Can waste some cores due to misled matchmaking • Needs to advertise dynamic status for residual resources • Contention for shared resources • No simple model exists for a P2P grid

  4. Our Contributions • Decentralized Resource Management Schemes for Multi-core Grids • Two logical nodes for a physical machine • Dual-CAN & Balloon Model for p2p structure • New Matchmaking & Load Balancing Scheme • Simple Analytic Model for a Multi-core Node • Contention for shared resources

  5. Outline • Background • Decentralized Resource Management for Multi-core Grids • Simulation Model • Experimental Results • Related work • Conclusion & Future Work

  6. P2P Desktop Grid System [Figure: a job J with requirements CPU ≥ 2.0 GHz, Memory ≥ 500 MB, Disk ≥ 1 GB is submitted to a P2P network of desktop nodes with heterogeneous CPU, memory, and disk resources] • How to do decentralized matchmaking and load balancing? • Platform: Content-Addressable Network (CAN)

  7. Overall System Architecture • P2P grids [Figure: a client submits Job J to an injection node, which assigns a GUID to the job and routes it through the peer-to-peer network (DHT-CAN); the owner node initiates matchmaking using heartbeat information, finds a run node, and inserts Job J into that node's FIFO job queue]

  8. Matchmaking Mechanism in CAN [Figure: a two-dimensional CAN with CPU and Memory dimensions divided into zones A–I; a client's job J with requirements CPU >= CJ and Memory >= MJ is inserted at its owner zone and pushed, guided by heartbeat information, toward a run node whose FIFO queue receives the job]
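To make the matchmaking predicate on this slide concrete, here is a minimal Python sketch (not the authors' implementation; the class and function names are hypothetical) of checking a node's capability coordinates in the CAN against a job's requirements CPU >= CJ and Memory >= MJ:

```python
# Minimal sketch of the CAN matchmaking predicate (illustrative names only).
# A node owns the zone containing its capability coordinates; a job J is
# routed to the zone at (CJ, MJ) and then pushed toward nodes whose
# coordinates dominate the job's requirements.

from dataclasses import dataclass

@dataclass
class Capability:
    cpu_ghz: float    # CPU clock speed dimension of the CAN
    mem_gb: float     # memory dimension of the CAN

@dataclass
class Job:
    cpu_req: float    # CJ: minimum CPU speed required
    mem_req: float    # MJ: minimum memory required

def can_run(node: Capability, job: Job) -> bool:
    """A node satisfies job J iff CPU >= CJ and Memory >= MJ."""
    return node.cpu_ghz >= job.cpu_req and node.mem_gb >= job.mem_req

# Example: a node in one CAN zone, and a job inserted at its owner zone.
node_e = Capability(cpu_ghz=2.0, mem_gb=4.0)
job_j = Job(cpu_req=1.5, mem_req=0.5)
assert can_run(node_e, job_j)   # J can be pushed to this node
```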

  9. Outline • Background • Decentralized Resource Management for Multi-core Grids • Simulation Model • Experimental Results • Related work • Conclusion & Future Work

  10. Two logical nodes [Figure: a quad-core node (CPU 2 GHz, Memory 4 GB, 4 cores) receives Jobs 1–4 with varying CPU and memory requirements; as jobs are placed, its free resources shrink from (2 GHz, 4 GB, 4 free cores) to (3.3 GB, 3 free cores), (2.1 GB, 2 free cores), and (0.6 GB, 1 free core); the Max-node stays fixed while the Residue-node tracks what remains] • Max-node: the maximum value of each resource • A static point in the CAN (as in the single-core case) • Residue-node: the currently available resources • Dynamic usage status of the node • Always a free node • If a node is free (has no job in its queue) or totally busy (all cores are running jobs), the Residue-node does not exist, so there are far fewer Residue-nodes than Max-nodes
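A minimal sketch of the two-logical-node idea, assuming each running job is summarized by a (CPU, memory) requirement pair; the function names and bookkeeping are illustrative, not taken from the paper:

```python
# Hedged sketch of the Max-node / Residue-node representation.
# The Max-node is static: the machine's total capability.
# The Residue-node tracks what is currently available; it exists only
# when the machine is neither completely free nor completely busy.

from dataclasses import dataclass
from typing import Optional

@dataclass
class LogicalNode:
    cpu_ghz: float
    mem_gb: float
    free_cores: int

def max_node(cpu_ghz: float, mem_gb: float, cores: int) -> LogicalNode:
    # Static point in the (primary) CAN, same as the single-core case.
    return LogicalNode(cpu_ghz, mem_gb, cores)

def residue_node(cpu_ghz: float, mem_gb: float, cores: int,
                 running_jobs: list) -> Optional[LogicalNode]:
    used_mem = sum(m for _, m in running_jobs)
    used_cores = len(running_jobs)
    if used_cores == 0 or used_cores == cores:
        return None                    # free or totally busy: no Residue-node
    return LogicalNode(cpu_ghz, mem_gb - used_mem, cores - used_cores)

# Quad-core node (2 GHz, 4 GB) from the slide, after three jobs are placed:
jobs = [(1.2, 0.7), (1.5, 1.2), (2.0, 1.5)]    # (CPU, memory) per job
print(residue_node(2.0, 4.0, 4, jobs))         # ≈ (2 GHz, 0.6 GB, 1 free core)
```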

  11. Dual-CAN • Primary CAN: composed of Max-nodes • The same as the original CAN in the single-core case (static) • Secondary CAN: composed of Residue-nodes • Fewer nodes in the Secondary CAN (dynamic) • Example: a single-core node A (1.5 GHz, 2 GB) and a dual-core node B (2 GHz, 3 GB) [Figure: the Primary CAN holds Max-nodes A and B; once Job 1 (2 GHz, 2 GB) is running on B (Qlen=1), B's free residue B' (2 GHz, 1 GB) appears as a Residue-node in the Secondary CAN]

  12. Balloon Model • Balloons: a light-weight structure for Residue-nodes • Keeps only the coordinates (currently available resources) and load information • Attached to a zone in the (Primary) CAN • No CAN join & leave or exchange of updates is necessary • Example: a single-core node A (1.5 GHz, 2 GB) and a dual-core node B (2 GHz, 3 GB) [Figure: in the single CAN, Max-nodes A and B stay fixed; while B runs Job 1 (2 GHz, 2 GB, Qlen=1), its free residue B' (2 GHz, 1 GB) is attached as a balloon to the zone covering that point]
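The following sketch illustrates how light-weight a balloon can be compared with a full CAN peer; the field names and the Zone class are assumptions for illustration only, not the paper's data structures:

```python
# Illustrative sketch of a Balloon: a lightweight record carrying only the
# residue-node coordinates and load, attached to whichever primary-CAN zone
# covers those coordinates -- no CAN join/leave or neighbor updates needed.

from dataclasses import dataclass

@dataclass
class Balloon:
    owner_id: str          # physical machine this balloon describes
    cpu_ghz: float         # currently available CPU (residue coordinates)
    mem_gb: float          # currently available memory
    free_cores: int        # load information
    queue_len: int

class Zone:
    """One zone of the primary CAN; it simply keeps a list of balloons."""
    def __init__(self, name: str):
        self.name = name
        self.balloons = []

    def attach(self, b: Balloon):
        self.balloons.append(b)

    def detach(self, owner_id: str):
        self.balloons = [b for b in self.balloons if b.owner_id != owner_id]

# Dual-core node B (2 GHz, 3 GB) running Job 1 (2 GHz, 2 GB): its residue
# B' = (2 GHz, 1 GB, 1 free core) is attached to the zone containing that point.
zone = Zone("zone covering (2 GHz, 1 GB)")
zone.attach(Balloon("B", cpu_ghz=2.0, mem_gb=1.0, free_cores=1, queue_len=1))
```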

  13. Computing Aggregated Load [Figure: a CAN with CPU and Memory dimensions containing nodes A–E; load information (number of nodes, balloons & cores, and the sum of used cores) is aggregated along each dimension]
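A possible representation of the load summary being aggregated, assuming the counters named on the slide are simply summed as the summary travels along a dimension; the merge rule and field names are assumptions for illustration:

```python
# Sketch of the per-dimension load summary: node/balloon counts, total cores,
# and used cores; the average core utilization derived from it feeds the
# pushing decision on the next slide.

from dataclasses import dataclass

@dataclass
class LoadSummary:
    nodes: int = 0
    balloons: int = 0
    cores: int = 0
    used_cores: int = 0

    def merge(self, other: "LoadSummary") -> "LoadSummary":
        # Summaries from neighboring zones along a dimension are combined
        # by addition as they are passed back toward the aggregating node.
        return LoadSummary(self.nodes + other.nodes,
                           self.balloons + other.balloons,
                           self.cores + other.cores,
                           self.used_cores + other.used_cores)

    @property
    def avg_core_utilization(self) -> float:
        return self.used_cores / self.cores if self.cores else 0.0

# e.g. nodes A-B summarized in one direction, C-E in another:
a_b = LoadSummary(nodes=2, balloons=1, cores=6, used_cores=3)
c_d_e = LoadSummary(nodes=3, balloons=0, cores=8, used_cores=8)
print(a_b.merge(c_d_e).avg_core_utilization)   # 11/14 ≈ 0.79
```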

  14. Decision Algorithms - Pushing • Target node (Where?) • The direction with smaller aggregated average core utilization and a larger number of available cores • Stopping criteria (When?) • Found a free node • Probabilistic stopping • Criteria for the best run node (Which?) • Among the free nodes: the node with the fastest CPU • If there are no free nodes: the fastest balloon, or a node in the Secondary CAN • Using a score function: prefer lower core utilization and a faster CPU
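A hedged sketch of the three pushing decisions (Where / When / Which). The actual objective, stopping, and score functions appear on the backup slide and are not legible in this transcript, so the formulas below are stand-ins that only preserve the stated preferences:

```python
# Where: prefer the direction that is least utilized and has the most free
# cores. When: stop at a free node, or stop probabilistically as the job
# travels. Which: prefer lower core utilization, then a faster CPU.

import random

def choose_target(aggregates: dict) -> str:
    """aggregates maps a direction name -> (total_cores, used_cores)."""
    def key(direction):
        total, used = aggregates[direction]
        utilization = used / total if total else 1.0
        return (utilization, -(total - used))   # less utilized, more free cores
    return min(aggregates, key=key)

def should_stop(found_free_node: bool, hops: int, stop_base: float = 0.2) -> bool:
    if found_free_node:
        return True
    return random.random() < min(1.0, stop_base * hops)   # probabilistic stop

def score(free_cores: int, total_cores: int, cpu_ghz: float) -> float:
    """Lower is better: low core utilization first, then fast CPU."""
    utilization = 1.0 - free_cores / total_cores
    return utilization - 0.01 * cpu_ghz

print(choose_target({"left": (8, 6), "right": (10, 3)}))   # -> "right"
```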

  15. Pushing: Dual-CAN [Figure: a client submits Job J (CPU >= CJ, Memory >= MJ) to its owner zone O in the Primary CAN; load information is aggregated along each dimension and the job is pushed across zones (A, B, C, D, ...) and, if necessary, into the Secondary CAN, where pushing stops at run node S]

  16. Pushing: Balloon Model [Figure: a client submits Job J (CPU >= CJ, Memory >= MJ) to its owner zone O; load information is aggregated along each dimension and the job is pushed across zones A–D until it stops at a run node, in this case the balloon S]

  17. Model comparison

  18. Outline • Background • Decentralized Resource Management for Multi-core Grids • Simulation Model • Experimental Results • Related work • Conclusion & Future Work

  19. Contention for shared resources (the worst case) • Contention for shared resources (memory, I/O) can degrade overall performance on a multi-core CPU • If the jobs are extremely memory-intensive, performance can drop drastically • What is STREAM? • A benchmark that measures memory bandwidth • Generates extremely memory-intensive jobs (copy, scale, add, and triad) • Experiments • (1) Run one memory-intensive job (STREAM) on a dual-core CPU (leaving the other core idle) • (2) Run two memory-intensive jobs on a dual-core CPU simultaneously • Compare the running times of (1) & (2) • On average, the running time of (2) is 2.09 times that of (1)

  20. Effect of contention with general scientific computing jobs • Alam et al.'s experiment • Run several scientific computing benchmarks (NAS, AMBER, LAMMPS, POP) on a dual-core machine • Compare running one task on a dual-core machine with running two tasks on it simultaneously • The running time for two tasks is higher by 3.8% to 27% (average: 10.97%) • SPEC CPU2006 experiment • The same experiment with the SPEC CPU2006 benchmark on a dual-core machine • The running-time increase is 6% (with the gcc compiler) and 10% (with the icc compiler) on average

  21. Our Simulation Model • Assumption • A job requiring more memory is likely to be more memory-intensive • For the worst case • Running time can increase by a factor of n (n: the number of cores) • For the general case • Running time can increase by p% (p = 10, from the previous experiments) [Figure: the running time ratio α plotted against the contention penalty Ω, rising from 1 + p toward the worst case of n] • α: running time increment • n: the number of cores • p: contention penalty from the experimental results • Ri: amount of resource i in the node • Ci: sum of the job requirements for resource i • Ω: contention penalty (computed from Ci and Ri)
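One way to read this contention model, since the formula itself is garbled in the transcript: the running-time multiplier α grows from the general-case penalty 1 + p toward the worst case n, driven by a contention penalty Ω derived from the demand-to-capacity ratio Ci/Ri. The interpolation used below is an assumption, not the paper's exact formula:

```python
# Hedged sketch of the contention model: alpha is the factor by which a
# job's running time is stretched when it shares the node with other jobs.

def running_time_ratio(C_i: float, R_i: float, n: int, p: float = 0.10) -> float:
    """
    C_i : summed requirement of the co-scheduled jobs for resource i
    R_i : amount of resource i on the node
    n   : number of cores
    p   : contention penalty measured experimentally (~10%)
    """
    omega = C_i / R_i                       # contention penalty (assumed form)
    alpha = 1.0 + p * max(1.0, omega)       # assumed interpolation from 1 + p
    return min(alpha, float(n))             # never worse than an n-fold slowdown

# Memory-hungry jobs saturating a dual-core node's memory bandwidth approach
# the measured worst case (~2x on 2 cores); light jobs stay near 1 + p.
print(running_time_ratio(C_i=20.0, R_i=1.0, n=2))   # capped at 2.0
print(running_time_ratio(C_i=0.5, R_i=1.0, n=2))    # ~1.10 (general case)
```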

  22. Outline • Background • Decentralized Resource Management for Multi-core Grids • Simulation Model • Experimental Results • Related work • Conclusion & Future Work

  23. Experimental Setup • Event-driven simulations • A set of nodes and events • 1000 initial nodes and 5000 job submissions • Jobs are submitted following a Poisson distribution • A node has 1, 2, 4, or 8 cores • Job run time follows a uniform distribution (30–90 minutes) • Node capabilities (and job requirements) • CPU, memory, disk, and the number of cores • Steady-state experiments
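A small sketch of how such a workload could be generated, assuming exponential inter-arrival times for the Poisson submissions; the arrival rate and the node capability ranges are illustrative, not the paper's parameters:

```python
# Workload generator matching the stated setup: 1000 nodes with 1/2/4/8
# cores, 5000 jobs with Poisson arrivals and uniform 30-90 minute run times.

import random

def make_nodes(count: int = 1000):
    return [{"cores": random.choice([1, 2, 4, 8]),
             "cpu_ghz": random.uniform(1.0, 3.0),      # illustrative ranges
             "mem_gb": random.choice([1, 2, 4, 8])}
            for _ in range(count)]

def make_jobs(count: int = 5000, mean_interarrival_s: float = 60.0):
    t, jobs = 0.0, []
    for _ in range(count):
        t += random.expovariate(1.0 / mean_interarrival_s)  # Poisson arrivals
        jobs.append({"submit_time": t,
                     "run_time_s": random.uniform(30 * 60, 90 * 60)})
    return jobs

nodes, jobs = make_nodes(), make_jobs()
```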

  24. Experimental Setup • Performance metrics [Figure: job lifecycle timeline: injected into the system, arrives at the owner, arrives at the run node (matchmaking cost), starts execution (wait time), finishes execution (running time); the whole span is the job turn-around time] • Matchmaking frameworks • CAN (Dual-CAN & Balloon Model) • Multiple Peers (MP) and Centralized Matchmaker (CENT)
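A tiny sketch of how the metrics decompose along that timeline, assuming the simulator records a timestamp at each listed event (the owner-arrival timestamp is folded into the matchmaking cost):

```python
# Decompose a job's turn-around time into the three components on the slide.

def job_metrics(injected: float, at_run_node: float,
                start: float, finish: float):
    matchmaking_cost = at_run_node - injected   # injection -> owner -> run node
    wait_time = start - at_run_node             # queued at the run node
    running_time = finish - start
    turnaround = finish - injected              # sum of the three components
    return matchmaking_cost, wait_time, running_time, turnaround

print(job_metrics(0, 12, 40, 100))   # -> (12, 28, 60, 100)
```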

  25. Comparison Models • Centralized Matchmaker (CENT) • An online, global scheduling mechanism • Not feasible in a fully decentralized P2P implementation • Multiple Peers (MP) • An individual peer for each core, with the shared resources divided equally • Condor's current strategy [Figure: a job J (CPU >= 2.0 GHz, Mem >= 500 MB, Disk >= 1 GB) submitted to a centralized matchmaker]

  26. Result (1) – Completeness • Single-threaded jobs • Dual-CAN & Balloon Model can run all jobs • MP: 80% completeness • For a fair comparison • Submit only the jobs that can run on MP to Balloon or Dual-CAN (Balloon-L, Dual-CAN-L) • Balloon-L, Dual-CAN-L, and MP show similar performance • MP cannot achieve completeness

  27. Cost (1) - Overheads • Cost: number of messages, volume of messages • MP's cost is higher than that of the two CAN-based schemes • Cost is proportional to the number of peers • The Balloon Model is cheaper than Dual-CAN

  28. Result (2) – Load Balance • Multi-threaded jobs • Load balancing performance • Dual-CAN > CENT > Balloon Model • Why is CENT worse? • CENT is based on a greedy algorithm (over-provisioning)

  29. Cost (2) - Overhead • Vanilla: the baseline cost, without the additional costs incurred by Balloon or Dual-CAN • Costs • Dual-CAN > Balloon Model == Vanilla

  30. Evaluation Summary • Performance • Completeness: Dual-CAN and Balloon achieve it (MP cannot) • Load balance: Dual-CAN >= Balloon == CENT (competitive load balance) • Overheads • MP >> Dual-CAN >= Balloon (due to the number of peers) • Dual-CAN >= Balloon == Vanilla (low overhead)

  31. Related Work • Time-to-Live (TTL) based mechanisms • Caromel et al. (Parallel Computing, 2007), Mastroianni et al. (EGC, 2005) • Lack completeness • Encoding resource information using a DHT • Cheema et al. (Grid, 2005), CompuP2P (TPDS, 2006) • Lack load balance and parsimony • Grids for multi-core desktops • Condor: static partitioning that handles a multi-core node as a set of independent entities

  32. Conclusion and Future Work • New decentralized resource management for multi-core P2P Grids • Two logical nodes for the static & dynamic features • Dual-CAN and Balloon Model • Simple analytic model for multi-core simulation that considers resource contention • Evaluation via simulation • Completeness (better than Multiple Peers) • Load balance (competitive with the Centralized Matchmaker) • Low overhead • Future work • Real experiments (in cooperation with the Astronomy Dept.) • Resource management for heterogeneous multi-processors

  33. Decentralized Resource Management for Multi-core Desktop Grids Jaehwan Lee, Pete Keleher, Alan Sussman Department of Computer Science University of Maryland

  34. Decision Functions (backup) • Target node (Where?): an objective function to minimize, computed from the aggregated load information along dimension d, selects the target dimension • Stopping criteria (When?): found a free node OR stop probabilistically, using a stopping factor that gives the probability to stop pushing from node N • Criteria for the best run node (Which?): among the free nodes OR, otherwise, a score function for a candidate run node C [equations omitted in this transcript]
