Memory restriction, limits and heterogeneous grids. A case study.

  1. Memory restriction, limits and heterogeneous grids. A case study. • Txema Heredia • Or an example of how to adapt your policies to your needs

  2. DISCLAIMER What I am going to present is neither a panacea nor guaranteed to fit or immediately solve your cluster issues. It is just a brief description of the problems we faced and how we used different SGE options to handle them. Also, no animal was harmed in the making of this powerpoint.

  3. Our story

  4. “hey, let’s buy a cluster” - my boss

  5. What did we need?

  6. What did we need? • Users: • biologists, not programmers • Processes: • user-made scripts • single core biological software

  7. What did we NOT need? • Nopes: • threads / parallel programming (mostly) • GPUs • Ayes: • thousands of single-core jobs

  8. And thus, our baby was born

  9. Our cluster • 8 computing nodes • 8 cores each • 8 GB RAM each • 1 front-end

  10. Our cluster • NFS • Rocks cluster (CentOS) • SGE

  11. First steps with SGE

  12. First steps with SGE • 1st try: • One queue to rule them all

  13. First steps with SGE • 1st try: • all.q queue • free for all

  14. First steps with SGE • 1st try - conclusions: • chaos reigned • constant conflicts between users (especially time-related) • FIFO queuing • swapping

  15. 2nd try • round-robin-like scheduling • share tree / functional tickets • split cluster by time usage: • 3 queues: fast / medium / slow

  16. 2nd try • fast: • 2 hours / 2 nodes • medium: • 48 hours / 3 nodes • slow: • ∞ hours / 3 nodes
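
In SGE terms, a split like this maps to one cluster queue per time class, each with its own wallclock limit (h_rt) and host list. A minimal sketch of the idea; the host group names (@fastnodes, etc.) are placeholders, not the actual setup:

    qconf -aq fast      # hostlist @fastnodes   (2 nodes), h_rt  2:00:00
    qconf -aq medium    # hostlist @mediumnodes (3 nodes), h_rt 48:00:00
    qconf -aq slow      # hostlist @slownodes   (3 nodes), h_rt INFINITY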

  17. 2nd try • Conclusions: • ↓ chaos • ↓ user conflicts • Still swapping • High undersubscription of the cluster

  18. 2nd try • 3 types of jobs • Don’t need to coexist at the same time • 1 user → 1 type of job • User knowledge • Saturation of the unlimited queue

  19. 2nd try • Queue tinkering: • wallclock time • number of hosts • Better results, but not good enough: • Waiting jobs & idle nodes

  20. 2nd try • There are 2 wars here: • memory / swap • splitting leads to undersubscription

  21. The memory war

  22. Memory • Buy more memory • from 8 × 8 GB • to 4 × 32 GB, 3 × 16 GB and 1 × 8 GB • This reduces our problem, but doesn’t fix it

  23. Swap • Swapping in a cluster is the root of all evil

  24. Swap • Complex attribute “h_vmem”

  25. SGE resource limits: • h_core • h_fsize • h_rss • h_stack • h_rt ≠ h_cpu • h_data = h_vmem

  26. h_vmem • h_vmem → job killed with SIGKILL when the limit is exceeded • s_vmem → job sent SIGXCPU when the limit is exceeded • You can combine both
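
Combining both means requesting a soft limit slightly below the hard one, so the job first receives SIGXCPU and can shut down cleanly before SGE kills it. A hypothetical example (script name and values made up):

    qsub -l s_vmem=3.8G,h_vmem=4G myjob.sh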

  27. h_vmem • Requestable by default • We want them to be consumable • qmon / qconf -mc

  28. h_vmem

  29. h_vmem • requestable = YES • consumable = YES / JOB • default = whatever you want
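
After the change, the h_vmem line in the complex configuration (qconf -mc) looks roughly like this; the 6G default is the value the later slides settle on:

    #name   shortcut  type    relop  requestable  consumable  default  urgency
    h_vmem  h_vmem    MEMORY  <=     YES          YES         6G       0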

  30. h_vmem • Only relevant for parallel environment jobs: • consumable = YES → sge_shepherd memory = h_vmem * slots • consumable = JOB → sge_shepherd memory = h_vmem
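
For example, a 4-slot parallel job requesting 2G per slot (the PE name "smp" is an assumption, not taken from the slides):

    qsub -pe smp 4 -l h_vmem=2G myjob.sh
    # consumable = YES -> the job as a whole is limited to 4 x 2G = 8G
    # consumable = JOB -> the job as a whole is limited to 2G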

  31. h_vmem • The default applies to every job submitted without an explicit h_vmem request: • default = 100M → “everything” dies • default = 6G → “everything” works

  32. h_vmem • Now we can limit the memory • But we can still have swapping

  33. h_vmem • Define h_vmem in each host • qmon / qconf -me hostname
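
The per-host capacity goes into the complex_values field of the execution host configuration. A sketch for a 32 GB node, assuming a Rocks-style hostname such as compute-0-0:

    qconf -me compute-0-0
    ...
    complex_values        h_vmem=32G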

  34. h_vmem • Set the exact physical memory: more secure • Set a bigger value: more margin

  35. Memory • From now on, any job submission must contain a memory request: • qsub ... -l h_vmem=3G...
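
A full submission then looks something like this (script name made up); as described on slide 26, the job is killed with SIGKILL if it grows past its request:

    qsub -l h_vmem=3G myscript.sh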

  36. No more swapping!!

  37. Undersubscription

  38. Undersubscription • Dual restriction: • 8 jobs/slots per node • 32 / 16 / 8 GB mem per node • The minimum of both will apply

  39. [Diagram: an empty 8 GB node next to an empty 32 GB node]

  40. Stupid scheduling: the single 8 GB job lands on the 8 GB node (7 slots free, 0 GB free) and the eight 1 GB jobs fill the 32 GB node (0 slots free, 24 GB free). Both nodes are blocked, one by memory and one by slots.

  41. Smart scheduling: the eight 1 GB jobs fill the 8 GB node (0 slots free, 0 GB free) and the 8 GB job runs on the 32 GB node, which keeps 7 slots and 24 GB free for further jobs.

  42. Smart scheduling • We want each job to go to the node where it fits best.

  43. (another) DISCLAIMER This is strictly for our case and needs. It may appeal to you, or some of its ideas may inspire you, but it is not intended to be a step-by-step solution for everyone. It is just an example of “things that can be done”.

  44. Smart scheduling • Create 3 hostgroups: • @32G, @16G and @8G • Group nodes by memory
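
A sketch of one of the host groups; the node names are assumed Rocks-style names, not the real ones:

    qconf -ahgrp @32G
    group_name  @32G
    hostlist    compute-0-0 compute-0-1 compute-0-2 compute-0-3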

  45. Smart scheduling • Maximize the memory/core ratio: • job < 1 GB → 8 GB nodes • 1 GB < job < 2 GB → 16 GB nodes • 2 GB < job → 32 GB nodes

  46. Smart scheduling • 3 different queues: • all-32 • all-16 • all-8 • assign the corresponding hostgroup
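
A sketch of one of the three queues (the other two are identical except for the host group), showing only the relevant fields:

    qconf -aq all-32
    qname      all-32
    hostlist   @32G
    slots      8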
