
Virtual Hierarchies to Support Server Consolidation


Presentation Transcript


  1. Virtual Hierarchies to Support Server Consolidation Mike Marty Mark Hill University of Wisconsin-Madison ISCA 2007

  2. Motivation: Server Consolidation
  [diagram: 64-core tiled CMP; each tile contains a core, a private L1, and an L2 cache bank. Consolidated workloads: www server, database server #1, database server #2, middleware server #1, middleware server #2]

  3. Motivation: Server Consolidation
  [diagram: the same 64-core CMP with the www, database, and middleware server VMs assigned to groups of tiles]

  4. Motivation: Server Consolidation. Goal: Optimize Performance
  [diagram: 64-core CMP with the consolidated server VMs; "data" blocks shown cached within the tiles of the VM that uses them]

  5. Motivation: Server Consolidation. Goal: Isolate Performance
  [diagram: 64-core CMP with the consolidated server VMs isolated from one another]

  6. Motivation: Server Consolidation. Goal: Dynamic Partitioning
  [diagram: 64-core CMP with VM-to-tile assignments changing at runtime]

  7. Motivation: Server Consolidation. Goal: Inter-VM Sharing
  [diagram: 64-core CMP with "data" shared between VMs]
  • VMware's Content-based Page Sharing: up to 60% reduced memory

  8. Executive Summary
  • Motivation: Server Consolidation
    • Many-core CMPs increase opportunities
  • Goals of Memory System:
    • Performance
    • Performance Isolation between VMs
    • Dynamic Partitioning (VM Reassignment)
    • Support Inter-VM Sharing
    • Hypervisor/OS Simplicity
  • Proposed Solution: Virtual Hierarchy
    • Overlay 2-level hierarchy on physically-flat CMP
    • Harmonize with VM Assignment

  9. Outline
  • Motivation
    • Server consolidation
    • Memory system goals
    • Non-hierarchical approaches
  • Virtual Hierarchies
  • Evaluation
  • Conclusion

  10. TAG-DIRECTORY
  [diagram: a tile issues Read A; the request consults the central duplicate-tag directory]

  11. TAG-DIRECTORY
  [diagram: (1) a tile issues getM A to the central duplicate-tag directory, (2) the directory forwards the request to the tile caching A, (3) that tile sends the data to the requester]

  12. STATIC-BANK-DIRECTORY
  [diagram: (1) a tile issues getM A to block A's statically assigned home tile/directory bank, (2) the home tile forwards the request to the tile caching A, (3) that tile sends the data to the requester]

  13. STATIC-BANK-DIRECTORY with hypervisor-managed cache
  [diagram: the same (1) getM A, (2) forward, (3) data sequence, but the hypervisor manages cache allocation so that A's home tile lies within the requesting VM's own tiles]
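
To make the home-tile selection in slides 12 and 13 concrete, here is a minimal C sketch. The function names, the 64-byte block size, and the tile-list representation are illustrative assumptions; the slides only specify that STATIC-BANK-DIRECTORY interleaves home tiles across the whole chip, while the hypervisor-managed variant keeps them within the VM's tiles.

```c
#include <stdint.h>

#define NUM_TILES 64          /* 64-core tiled CMP from the slides */
#define BLOCK_OFFSET_BITS 6   /* assume 64-byte cache blocks */

/* STATIC-BANK-DIRECTORY: the home tile is fixed by interleaving the
 * block address across all 64 tiles on the chip.                    */
static inline int static_home_tile(uint64_t paddr) {
    return (int)((paddr >> BLOCK_OFFSET_BITS) % NUM_TILES);
}

/* STATIC-BANK-DIRECTORY with hypervisor-managed cache: the hypervisor
 * constrains allocation so the home tile is one of the tiles assigned
 * to the requesting VM (vm_tiles, vm_tile_count).                    */
static inline int hypervisor_home_tile(uint64_t paddr,
                                       const int *vm_tiles,
                                       int vm_tile_count) {
    uint64_t index = (paddr >> BLOCK_OFFSET_BITS) % (uint64_t)vm_tile_count;
    return vm_tiles[index];
}
```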

  14. Goals vs. the non-hierarchical approaches:
  Goals                        TAG-DIRECTORY   STATIC-BANK-DIRECTORY   STATIC-BANK-DIRECTORY w/ hypervisor-managed cache
  Optimize Performance         No              No                      Yes
  Isolate Performance          No              No                      Yes
  Allow Dynamic Partitioning   Yes             Yes                     ?
  Support Inter-VM Sharing     Yes             Yes                     Yes
  Hypervisor/OS Simplicity     Yes             Yes                     No

  15. Outline
  • Motivation
  • Virtual Hierarchies
  • Evaluation
  • Conclusion

  16. Virtual Hierarchies
  • Key Idea: Overlay 2-level Coherence Hierarchy on CMP
    • First level harmonizes with VM/Workload
    • Second level allows inter-VM sharing, migration, reconfig

  17. VH: First-Level Protocol
  • Intra-VM Directory Protocol w/ interleaved directories
  • Questions:
    • How to name directories?
    • How to name sharers?
  • Dynamic home tile selected by VM Config Table
    • Hardware VM Config Table at each tile
    • Set by hypervisor during scheduling
  • Full bit-vector to track any possible sharer
    • Intra-VM broadcast also possible
  [diagram: getM and INV messages exchanged among the tiles of one VM]

  18. VH: First-Level Protocol
  • Example: per-tile VM Config Table selects the dynamic home tile (see sketch below)
  [diagram: each tile contains a core, L1, and L2 cache bank; the address's index bits (...000101, i.e. entry 5) index the per-tile VM Config Table, whose entries map 0→p12, 1→p13, 2→p14, 3→p12, 4→p13, 5→p14, ..., 63→p12; the lookup selects Home Tile: p14]
  • Hypervisor/OS can freely change VM Config Table
    • No cache flushes
    • No atomic updates
    • No explicit movement of directory state
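
A minimal C sketch of the per-tile VM Config Table lookup from slide 18. The names (vm_config_table_t, dynamic_home_tile) and the 64-byte block assumption are illustrative; the point is that the hypervisor can rewrite the table to change a block's dynamic home tile without flushes or directory-state movement.

```c
#include <stdint.h>

#define CONFIG_TABLE_ENTRIES 64   /* one entry per possible home-tile index */
#define BLOCK_OFFSET_BITS 6       /* assume 64-byte cache blocks */

/* Per-tile VM Config Table: maps an address index to a dynamic home
 * tile. The hypervisor rewrites this table when it (re)schedules the
 * VM; no flushes or directory-state movement are needed because a new
 * home tile simply starts out with no directory state for the block. */
typedef struct {
    uint8_t home_tile[CONFIG_TABLE_ENTRIES];
} vm_config_table_t;

/* Select the dynamic home tile for a physical address, e.g. index
 * bits ...000101 -> entry 5 -> tile p14, as in the slide's example. */
static inline int dynamic_home_tile(const vm_config_table_t *tbl,
                                    uint64_t paddr) {
    uint64_t index = (paddr >> BLOCK_OFFSET_BITS) % CONFIG_TABLE_ENTRIES;
    return tbl->home_tile[index];
}
```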

  19. Virtual Hierarchies
  • Two Solutions for Global Coherence: VHA and VHB
  [diagram: the 64-core tiled CMP with its memory controller(s)]

  20. Protocol VHA
  • Directory as Second-level Protocol
    • Any tile can act as first-level directory
    • How to track and name first-level directories?
      • Full bit-vector of sharers to name any tile (see sketch below)
    • State stored in DRAM
      • Possibly cache on-chip
  + Maximum flexibility
  - DRAM state
  - Complexity
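
A minimal C sketch of what a second-level VHA directory entry might hold, per slide 20: a full 64-bit vector naming the tiles acting as first-level directories, stored in DRAM and possibly cached on-chip. The struct and helper names are illustrative assumptions, not the paper's hardware interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Second-level (global) directory entry for protocol VHA, kept in
 * DRAM. Because any of the 64 tiles may be acting as a block's
 * first-level (intra-VM) directory, a full 64-bit vector is needed
 * to name which tiles hold first-level state for the block.        */
typedef struct {
    uint64_t l1_dir_tiles;   /* bit i set => tile i has first-level directory state */
    uint8_t  state;          /* global-level state, e.g. invalid / shared / modified */
} vha_l2_dir_entry_t;

static inline void vha_add_l1_directory(vha_l2_dir_entry_t *e, int tile) {
    e->l1_dir_tiles |= (uint64_t)1 << tile;     /* record a new first-level directory */
}

static inline bool vha_tile_is_l1_directory(const vha_l2_dir_entry_t *e, int tile) {
    return (e->l1_dir_tiles >> tile) & 1;       /* query the sharer bit-vector */
}
```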

  21. VHA Example
  [diagram: a getM A that misses within the requesting VM is resolved in six steps: getM A to the VM's first-level (dynamic home) directory tile, getM A to the second-level directory at the directory/memory controller, forwards (Fwd A) toward the tile holding A in another VM, and data returned through the first-level directory to the requester]

  22. Protocol VHB
  • Broadcast as Second-level Protocol
  • Attach token count to each block [token coherence] (see sketch below)
    • T tokens for each block. One token to read, all to write
    • Allows 1 bit at memory per block
    • Eliminates system-wide ACKs
  + Minimal DRAM state
  + Enables easier optimizations
  - Global coherence requires more activity
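
A minimal C sketch of the token rule VHB borrows from token coherence: holding at least one token permits reading, holding all T tokens permits writing. The struct and the choice of T = 64 are illustrative assumptions; the slide only requires T tokens per block and, per its bullets, one bit per block at memory and no system-wide ACKs.

```c
#include <stdbool.h>
#include <stdint.h>

#define T_TOKENS 64   /* illustrative: one token per tile */

/* Token-coherence rule used by VHB's second level: a cache may read a
 * block if it holds at least one of its T tokens, and may write only
 * if it holds all T tokens. Per the slide, this lets memory keep just
 * one bit per block and removes system-wide acknowledgements.        */
typedef struct {
    uint32_t tokens;   /* tokens this cache currently holds for the block */
    bool     owner;    /* holder of the owner token supplies data         */
} vhb_block_state_t;

static inline bool can_read(const vhb_block_state_t *b)  { return b->tokens >= 1; }
static inline bool can_write(const vhb_block_state_t *b) { return b->tokens == T_TOKENS; }
```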

  23. VHB Example
  [diagram: (1) getM A to the VM's first-level (dynamic home) directory tile, (2) getM A forwarded to the memory controller, (3) a global getM A broadcast across the chip, (4) Data+tokens returned to the requester]

  24. Memory System Goals vs. VHA and VHB:
  Memory System Goals          VHA and VHB
  Optimize Performance         Yes
  Isolate Performance          Yes
  Allow Dynamic Partitioning   Yes
  Support Inter-VM Sharing     Yes
  Hypervisor/OS Simplicity     Yes

  25. Outline
  • Motivation
  • Virtual Hierarchies
  • Evaluation
  • Conclusion

  26. Evaluation: Methods
  • Wisconsin GEMS
    • Full-system, execution-driven simulation
    • Based on Virtutech Simics
    • http://www.cs.wisc.edu/gems
  • 64-core tiled CMP (parameters summarized below)
    • In-order SPARC cores
    • 512 KB, 16-way L2 cache per tile
    • 2D mesh interconnect, 16-byte links, 5-cycle link latency
    • Four on-chip memory controllers, 275-cycle DRAM latency
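
The simulated system parameters above, gathered in one place as C constants. This is only a restatement of the slide's numbers, not a GEMS or Simics configuration file.

```c
/* Simulated 64-core tiled CMP parameters from the evaluation slide. */
enum {
    NUM_CORES            = 64,    /* in-order SPARC cores           */
    L2_SIZE_KB_PER_TILE  = 512,   /* L2 cache bank per tile         */
    L2_ASSOC             = 16,    /* 16-way                         */
    MESH_LINK_BYTES      = 16,    /* 2D mesh interconnect           */
    MESH_LINK_LATENCY    = 5,     /* cycles per link                */
    NUM_MEM_CONTROLLERS  = 4,     /* on-chip                        */
    DRAM_LATENCY_CYCLES  = 275
};
```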

  27. Evaluation: Methods
  • Workloads: OLTP, SpecJBB, Apache, Zeus
  • Separate instance of Solaris for each VM
  • Approximating Virtualization:
    • Multiple Simics checkpoints interleaved onto CMP
    • Assume workloads map to adjacent cores
    • Bottom line: no hypervisor simulated

  28. Evaluation: Protocols
  • TAG-DIRECTORY
    • 3-cycle central tag directory (1024 ways!)
  • STATIC-BANK-DIRECTORY
    • Home tiles interleaved by frame address
  • VHA
    • All data allocates in L2 bank of dynamic home tile
  • VHB
    • Unshared data always allocates in local L2
  • All protocols: one L2 copy of a block on the CMP

  29. Result: Runtime
  Eight VMs x Eight Cores Each = 64 Cores (e.g. eight instances of Apache)
  [bar chart: normalized runtime of TAG-DIR, STATIC-BANK-DIR, VHA, and VHB for OLTP, Apache, SpecJBB, and Zeus]

  30. Result: Memory Stall Cycles
  Eight VMs x Eight Cores Each = 64 Cores (e.g. eight instances of Apache)
  [bar chart: memory stall cycles of TAG-DIR, STATIC-BANK-DIR, VHA, and VHB for OLTP, Apache, SpecJBB, and Zeus]

  31. Executive Summary
  • Server Consolidation an Emerging Workload
  • Goals of Memory System:
    • Performance
    • Performance Isolation between VMs
    • Dynamic Partitioning (VM Reassignment)
    • Support Inter-VM Sharing
    • Hypervisor/OS Simplicity
  • Proposed Solution: Virtual Hierarchy
    • Overlay 2-level hierarchy on physically-flat CMP
    • Harmonize with Workload Assignment

  32. Backup Slides

  33. Omitting 2nd-Level Coherence: Protocol VH0
  • Impacts:
    • Dynamic Partitioning
    • Inter-VM Sharing (VMware's Content-based Page Sharing)
    • Hypervisor/OS complexity
  • Example: Steps for VM Migration from Tiles {M} to {N} (a sketch follows below)
    1. Stop all threads on {M}
    2. Flush {M} caches
    3. Update {N} VM Config Tables
    4. Start threads on {N}
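
A sketch of those four migration steps as hypervisor-style C pseudocode. All helper names (stop_vcpus, flush_tile_caches, write_vm_config_table, start_vcpus) and the tile-set representation are hypothetical; the sketch only illustrates why omitting second-level coherence forces the cache flush in step 2.

```c
#include <stdint.h>

typedef uint64_t tile_set_t;   /* bit i set => tile i is in the set */

/* Hypothetical hypervisor hooks, assumed for illustration only. */
void stop_vcpus(tile_set_t tiles);
void flush_tile_caches(tile_set_t tiles);            /* needed: no 2nd-level coherence */
void write_vm_config_table(int tile, const uint8_t new_table[64]);
void start_vcpus(tile_set_t tiles);

/* VH0 migration of a VM from tiles {M} to tiles {N}. */
void vh0_migrate_vm(tile_set_t m, tile_set_t n, const uint8_t new_table[64]) {
    stop_vcpus(m);                          /* 1. stop all threads on {M}            */
    flush_tile_caches(m);                   /* 2. flush {M} caches: without global
                                                  coherence, stale data on {M} could
                                                  never be found again                */
    for (int tile = 0; tile < 64; tile++)   /* 3. update VM Config Tables on {N}     */
        if ((n >> tile) & 1)
            write_vm_config_table(tile, new_table);
    start_vcpus(n);                         /* 4. start threads on {N}               */
}
```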

  34. Omitting 2nd-Level Coherence: Protocol VH0
  • Example: Inter-VM Content-based Page Sharing
    • Up to 60% reduced memory demand
    • Is read-only sharing possible with VH0?
  • VMware's Implementation (sketch below):
    • Global hash table to store hashes of pages
    • Guest pages scanned by VMM, hashes computed
    • Full comparison of pages on hash match
  • Potential VH0 Implementation:
    • How does the hypervisor scan guest pages? Are they modified in cache?
    • Even read-only pages must initially be written at some point
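
A sketch of the content-based page-sharing flow described above, with hypothetical helpers (page_hash, lookup_shared, insert_shared, map_copy_on_write). It illustrates the scan, hash, and full-compare steps, not VMware's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical helpers, assumed for illustration. */
uint64_t page_hash(const void *page);                       /* hash the page contents     */
void    *lookup_shared(uint64_t hash);                      /* global hash-table lookup   */
void     insert_shared(uint64_t hash, void *page);
void     map_copy_on_write(void *guest_page, void *shared); /* remap guest page read-only */

/* Scan one guest page: if an identical page already exists, share the
 * single copy copy-on-write; otherwise register this page's hash so
 * future scans can match against it.                                  */
void scan_guest_page(void *guest_page) {
    uint64_t h = page_hash(guest_page);
    void *candidate = lookup_shared(h);
    if (candidate && memcmp(candidate, guest_page, PAGE_SIZE) == 0) {
        /* full comparison on hash match, then share */
        map_copy_on_write(guest_page, candidate);
    } else {
        insert_shared(h, guest_page);
    }
}
```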

  35. Physical Hierarchy / Clusters
  [diagram: processors with private L1 caches grouped into fixed clusters, each cluster sharing an L2 cache]

  36. Physical Hierarchy / Clusters
  [diagram: the same clustered layout with consolidated workloads (www server, database server #1, middleware server #1) mapped onto the fixed physical clusters]
