Towards Multi-Tenant Performance SLOs

Willis Lang*, Srinath Shankar+, Jignesh M. Patel*, Ajay Kalhan^ • *University of Wisconsin-Madison • +Microsoft Gray Systems Lab • ^Microsoft Corp. • To appear in ICDE 2012

Overall Operating Costs of Providing Cloud Services are High • Dominating costs are server and power costs: 57% and 31% respectively (88% combined) • Servers: $1,852,778 (57%) • Power: $1,007,651 (31%) • Networking: $260,039 (8%) • Infrastructure: $130,019 (4%)

Performance Service Level Objectives and Managing Cloud Costs • Tenants can get their own server and high performance • Tenants have performance objectives • Consolidate tenants onto the fewest number of servers (maximize the degree of multi-tenancy) while maintaining perf objectives • [Figure: performance per tenant vs. data center costs]

An Optimization Problem • Given: groups of tenants with different performance objectives (high perf, low perf) and a number of server configurations • Find: (1) tenant scheduling policies and (2) hardware provisioning policies • Such that costs are minimized and performance is delivered
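To make the shape of this problem concrete, here is a minimal sketch that treats it as a covering problem over a handful of allowed per-server packings and solves a toy instance by exhaustive dynamic programming. It is not the optimization method from the paper. The names `PACKINGS` and `min_cost` are hypothetical, the packings are illustrative values echoing figures quoted elsewhere in this deck, and restricting the search to just these three packings is an assumption.

```python
from functools import lru_cache

# Hypothetical per-server packings (an assumption for illustration):
# (label, monthly cost in $, H tenants hosted, L tenants hosted).
PACKINGS = (
    ("ssdC mixed", 125, 20, 38),
    ("ssdC L-only", 125, 0, 100),
    ("diskC mixed", 111, 20, 25),
)

def min_cost(num_h, num_l, packings=PACKINGS):
    """Return (total monthly cost, tuple of packing labels) that hosts
    num_h H tenants and num_l L tenants, or None if that is impossible."""

    @lru_cache(maxsize=None)
    def solve(h, l):
        if h == 0 and l == 0:
            return 0, ()
        best = None
        for label, cost, cap_h, cap_l in packings:
            rest_h, rest_l = max(0, h - cap_h), max(0, l - cap_l)
            if (rest_h, rest_l) == (h, l):
                continue  # this packing hosts none of the remaining tenants
            sub = solve(rest_h, rest_l)
            if sub is None:
                continue  # the remainder cannot be hosted at all
            total = cost + sub[0]
            if best is None or total < best[0]:
                best = (total, (label,) + sub[1])
        return best

    return solve(num_h, num_l)

# Toy instance: 40 H tenants and 160 L tenants.
cost, plan = min_cost(40, 160)
print(cost, sorted(plan))
```

The exhaustive search is only practical for toy instances; it is meant to show what "minimize cost subject to every tenant being placed on a server that meets its objective" looks like, not how to solve it at data center scale.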

Multi-Tenant Scheduling • Perf objective: TPC-C throughput • H tenants: 100tps • L tenants: 10tps • Want to maximize degree of multi-tenancy without breaking SLO • What if we also have different server types available? • [Figure: average H and L tenant throughput (tps each) for different numbers of H and L tenants per server]

Hardware Setup • 2 x Intel Nehalem L5630 • 32GB DDR3 • RAID battery-backed cache • 1 x 10k RPM SAS – OS/software + • “diskC” - $4000 ($111 per month) • Data: 2 x 10k RPM SAS 300GB • Log: 1 x 10k RPM SAS 300GB • “ssdC” - $4500 ($125 per month) • Data: 2 x Crucial C300 256GB • Log: 1 x Crucial C300 256GB
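The monthly figures above are consistent with spreading each purchase price evenly over 36 months. The slide does not state the amortization period, so the 36-month value below is an assumption, and `monthly_cost` is a hypothetical helper used only to show the arithmetic.

```python
# Assumption: straight-line amortization over 36 months (not stated on the slide).
AMORTIZATION_MONTHS = 36

def monthly_cost(purchase_price, months=AMORTIZATION_MONTHS):
    """Spread a one-time purchase price evenly over the amortization period."""
    return purchase_price / months

for name, price in (("diskC", 4000), ("ssdC", 4500)):
    print(f"{name}: ${monthly_cost(price):.2f} per month")
```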

Software Setup • SQL Server 2012 • All tenants of the ‘H’ performance class get an individual database within a SQL Server instance • Databases in SQL Server have their own physical files for data and log • All tenants of the ‘L’ performance class get an individual database within a different SQL Server instance • SQL Server instance memory provisioning to control performance (not VM)

Heterogeneous SLO Characterization • Benchmark server to find max degree multi-tenancy for perf objectives • Systematically reduce ‘H’ tenants, steadily increase ‘L’ tenant scheduling until a perf objective fails • Server characterizing function: [Figure: characterizing functions for diskC and ssdC; legend: both perf objectives met / some perf objective fails]
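The sweep described above can be sketched as a loop that steps the number of H tenants down and, for each step, pushes the number of L tenants up until an objective fails. This is a minimal sketch, not the authors' benchmarking harness: `meets_slo` is a hypothetical stand-in for an actual benchmark run, `toy_meets_slo` is a made-up capacity model, and the code assumes that removing H tenants never reduces how many L tenants fit.

```python
def characterize(max_h, meets_slo, l_limit=1000):
    """Map each H-tenant count to the largest L-tenant count that still
    meets both objectives, according to the meets_slo(h, l) callback."""
    frontier = {}
    l = 0
    for h in range(max_h, -1, -1):       # systematically reduce H tenants
        while l < l_limit and meets_slo(h, l + 1):
            l += 1                       # steadily add L tenants
        if meets_slo(h, l):
            frontier[h] = l              # last point before an objective fails
    return frontier

def toy_meets_slo(h, l):
    """Made-up model: the server sustains 2000 tps in total."""
    return 100 * h + 10 * l <= 2000

frontier = characterize(20, toy_meets_slo)
print(frontier[20], frontier[19], frontier[0])
```

Because `l` is never reset between steps, each benchmark point is visited at most once, which matters when every call to `meets_slo` stands for a full benchmark run.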

Applying Our Optimization Framework • Scenario: 10,000 tenants (2,000 x 100tps and 8,000 x 10tps) • Optimal solution: • 94 ssdC servers, each with 38 10tps tenants and 20 100tps tenants • + 5 diskC servers, each with 25 10tps tenants and 20 100tps tenants • + 43 ssdC servers, each with 100 10tps tenants
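As a rough illustration of what this solution costs, the sketch below tallies the servers by type and prices them with the per-month figures from the Hardware Setup slide ($111 for diskC, $125 for ssdC). Counting only amortized server cost is an assumption, and `tally` is a hypothetical helper.

```python
# Monthly server costs from the Hardware Setup slide.
MONTHLY_COST = {"diskC": 111, "ssdC": 125}

# Server groups in the stated solution: (server type, number of servers).
SOLUTION = [("ssdC", 94), ("diskC", 5), ("ssdC", 43)]

def tally(solution, monthly_cost=MONTHLY_COST):
    """Return (servers per type, total monthly cost in $)."""
    servers = {}
    for server_type, count in solution:
        servers[server_type] = servers.get(server_type, 0) + count
    total = sum(monthly_cost[t] * n for t, n in servers.items())
    return servers, total

servers, total = tally(SOLUTION)
print(servers, total)
```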

Applying Our Optimization Framework • [Figure; legend: ssdC – 100tps tenants, diskC – 10tps tenants]

Summary • We have presented an optimization framework that tells a Database-as-a-Service provider how to provide performance Service Level Objectives while minimizing cluster infrastructure costs

Thesis Research • An optimization framework to determine the optimal tenant scheduling and server provisioning in light of tenant performance goals [ICDE 2012] • Complex parallel analytic workloads cause non-linear speedup and force low-power server clusters to be much larger and more expensive than traditional clusters [DaMoN 2010 Best Paper] • Parallel data processing bottlenecks such as network bandwidth and algorithmic choices are a cause of energy inefficiency [Under Submission] • Computational complexity of MR jobs affects the ability to save energy by using smaller clusters [VLDB 2010] • Existing replication schemes expose an elegant relationship between load balancing and energy efficiency that can be exploited [SIGMOD Record 2009] • Demonstrated that it is possible to decrease energy and performance in a controlled way using hardware mechanisms (e.g., CPU frequency/voltage and memory parking) and algorithmic choices [CIDR 2009, IEEE DEB 2011]

Acknowledgements • Special thanks to David DeWitt, Jeff Naughton, Alan Halverson, Eric Robinson, Rimma Nehme, Dimitris Tsirogiannis, Nikhil Teletia, Chris Ré • Funded by a grant from Microsoft Gray Systems Lab

Memory-based resource governor • E.g., 2 performance goals, 100tps and 10tps • 20 tenants pay for 100tps and 30 tenants pay for 10tps • The aggregate memory for all 100tps tenants: • Similarly, for 10tps tenants:

Simplicity vs Cost • None of these heuristic methods consistently provides solutions close to those of the optimal method. • [Figure panels: diskC cost -10% vs. ssdC; diskC cost -30% vs. ssdC]

Log Disk Bottlenecks
