Region scheduling

Region scheduling: A cache-aware scheduler for CMP environments

Abstract
• The last-level cache is becoming more performance-critical
• HW approach
  • Intel's smart cache
  • NUCA (non-uniform cache architecture)
• SW approach
  • Need to manage the cache and schedule better
  • Hypervisor’s ability to control memory access (Region)

Abstract
• Working set
  • How big is it? What does it consist of? How much is shared?
• A ‘Region’ is used to capture these working sets
[Figure: current working sets of tasks A, B, C, and D under two placements. Bad: one side is over-utilized and the other is less utilized. Good: both sides are fully utilized.]

Region
• Physical memory is partitioned into regions
• Each region is allocated to a cache
• A core can access only its allowed regions
  • Page table enforcement
• Private/shared regions
  • We focus on private regions
[Figure: physical pages P1 to P19 grouped into regions labeled with their sizes: R1(2), R2(2), R3(2), R4(2), R5(5), R6(4), R7(2), ... Legend: private region (size), shared region (size), physical page.]
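The partitioning above can be sketched in a few lines of Python. This is only an illustration, not the Xen implementation: the region sizes come from the figure, the PCPU-to-cache topology comes from the examples later in the deck, and the region-to-cache allocation (`REGION_TO_CACHE`) and the helper names are hypothetical.

```python
# Region sizes in pages, as labeled in the slide figure: R1(2) ... R7(2).
REGION_SIZES = {"R1": 2, "R2": 2, "R3": 2, "R4": 2, "R5": 5, "R6": 4, "R7": 2}

# Hypothetical allocation of regions to caches (not given on the slide).
REGION_TO_CACHE = {"R1": 0, "R2": 1, "R3": 0, "R4": 1, "R5": 0, "R6": 1, "R7": 0}

# Core-to-cache topology used in the deck's examples:
# PCPU0 and PCPU2 share Cache0, PCPU1 and PCPU3 share Cache1.
PCPU_TO_CACHE = {0: 0, 2: 0, 1: 1, 3: 1}


def partition_pages(region_sizes, first_page=1):
    """Assign consecutive physical page numbers to regions, in order."""
    page_to_region = {}
    next_page = first_page
    for region, size in region_sizes.items():
        for page in range(next_page, next_page + size):
            page_to_region[page] = region
        next_page += size
    return page_to_region


def can_access(pcpu, page, page_to_region):
    """A core may access a page only if the page's region is in its cache."""
    region = page_to_region[page]
    return REGION_TO_CACHE[region] == PCPU_TO_CACHE[pcpu]


PAGE_TO_REGION = partition_pages(REGION_SIZES)
```

With this layout, pages P9 to P13 fall into R5 and pages P18 and P19 into R7, matching the sizes in the figure. In the real system the access check is enforced by the page tables rather than by a lookup like `can_access`.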

Region
• Regions are implemented in Xen
• Transparent to the guest
• The guest’s memory accesses are controlled
  • Page table enforcement
• Caches are effectively managed by Xen
• Regions can change dynamically with the guest’s behavior
  • Application phase changes

Region
• Page tables are managed to provide ‘page touch’ events
• A page touch is generated when a VCPU attempts a disallowed access to a non-allocated region
• A page touch invokes microscheduling
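A minimal sketch of the page-touch path, assuming the PCPU-to-cache topology from the examples in this deck. The `PageTouch` exception stands in for the fault raised through the page tables, and the policy of moving to the lowest-numbered allowed core is an assumption for illustration; all names here are hypothetical, not Xen APIs.

```python
from dataclasses import dataclass

# PCPU0 and PCPU2 share Cache0, PCPU1 and PCPU3 share Cache1.
PCPU_TO_CACHE = {0: 0, 2: 0, 1: 1, 3: 1}


@dataclass
class Vcpu:
    vcpu_id: int
    pcpu: int  # physical core the VCPU currently runs on


class PageTouch(Exception):
    """Stands in for the fault raised on a disallowed region access."""

    def __init__(self, vcpu, region):
        super().__init__(f"VCPU{vcpu.vcpu_id} touched region {region:#x}")
        self.vcpu = vcpu
        self.region = region


def access(vcpu, region, region_to_cache):
    """Raise a page touch if the region is not in the current core's cache."""
    if region_to_cache[region] != PCPU_TO_CACHE[vcpu.pcpu]:
        raise PageTouch(vcpu, region)


def microschedule(vcpu, region, region_to_cache):
    """Move the VCPU to a core whose cache holds the region.

    Assumed policy: pick the lowest-numbered allowed core.
    """
    target_cache = region_to_cache[region]
    candidates = sorted(p for p, c in PCPU_TO_CACHE.items() if c == target_cache)
    vcpu.pcpu = candidates[0]
    return vcpu.pcpu


def access_with_microscheduling(vcpu, region, region_to_cache):
    """Return True if the access needed a microscheduling step."""
    try:
        access(vcpu, region, region_to_cache)
        return False
    except PageTouch as touch:
        microschedule(touch.vcpu, touch.region, region_to_cache)
        access(vcpu, region, region_to_cache)  # retry on the new core
        return True
```

A real scheduler would also have to weigh the load on the target cores before moving the VCPU; this sketch ignores that.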

Region
• Regions are allocated to caches
• A VCPU may need to run on another core to access a region
  • Microscheduling

Example 1 (single VCPU)
• Region ID 0x1884f is allocated to Cache0
  • Can be accessed only from PCPU0 and PCPU2
• Region ID 0x12b44 is allocated to Cache1
  • Can be accessed only from PCPU1 and PCPU3
• When the VCPU wants to access a certain region, it may need to run on another core
[Figure: microscheduling trace, labeled VCPU : Region ID.]

Example 2 (multiple VCPUs)
• Cache0 (PCPU0, 2) has 0x13d83, 0xd7d4, 0xf3a6, and so on
• Cache1 (PCPU1, 3) has 0xb638, 0xcd4b, and so on
[Figure: multi-VCPU trace; axis: Time.]

Example 2 (multiple VCPUs)
• Each region is allocated to a cache
• E.g. VCPU0 is microscheduled to PCPU1 or PCPU3 to access region 0xb638
[Figure: regions per cache and per VCPU.
  Cache0 (PCPU0 & 2): 0x13d83, 0xf3a6, 0x3d92e, 0x1909b, 0xd7d4, 0x18d38
  Cache1 (PCPU1 & 3): 0x1225e, 0xb638, 0xfcad, 0xcd4b
  VCPU0: 0xf3a6, 0xb638
  VCPU1: 0xfcad, 0xcd4b
  VCPU2: 0x13d83, 0x1225e
  VCPU3: 0xb638, 0xf3a6, 0x3d92e, 0x1909b, 0xd7d4]
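As a worked version of this example, the sketch below replays a VCPU's region accesses and records where it would run. The region lists are as read from the slide figure; the starting PCPU of each VCPU and the rule "move to the first allowed core" are assumptions made for illustration, and `replay` is a hypothetical helper.

```python
# Region IDs per cache, as read from the slide figure.
CACHE_REGIONS = {
    0: [0x13d83, 0xf3a6, 0x3d92e, 0x1909b, 0xd7d4, 0x18d38],
    1: [0x1225e, 0xb638, 0xfcad, 0xcd4b],
}

# Cache0 is shared by PCPU0 and PCPU2, Cache1 by PCPU1 and PCPU3.
CACHE_PCPUS = {0: (0, 2), 1: (1, 3)}

REGION_TO_CACHE = {
    region: cache for cache, regions in CACHE_REGIONS.items() for region in regions
}


def replay(trace, start_pcpu):
    """Replay region accesses; return (PCPU per access, number of moves).

    Assumed policy: stay put if the current core may access the region,
    otherwise move to the first core attached to the region's cache.
    """
    pcpu = start_pcpu
    placements = []
    moves = 0
    for region in trace:
        allowed = CACHE_PCPUS[REGION_TO_CACHE[region]]
        if pcpu not in allowed:
            pcpu = allowed[0]
            moves += 1
        placements.append(pcpu)
    return placements, moves


# VCPU0 starts on PCPU0 (assumed) and accesses 0xf3a6, then 0xb638.
vcpu0_placements, vcpu0_moves = replay([0xf3a6, 0xb638], start_pcpu=0)
```

Under these assumptions VCPU0 runs on PCPU0 for 0xf3a6 and is moved once, to PCPU1, for 0xb638. A VCPU whose regions all sit in one cache, such as VCPU1 with 0xfcad and 0xcd4b, never moves if it starts on PCPU1 or PCPU3.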

Initial result
• Simple experiments show good results
  • Ran multiple copies of SPEC2006 libquantum
  • Over 40% reduction in CPU time
• But more experiments are needed...
  • Data is on the way

Conclusion
• SW approach to managing caches
• Transparent to the guests
• Memory accesses are controlled by the hypervisor
