
Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor's Expense)

(Venkat)anathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Department of Computer Sciences.



  1. Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor's Expense). (Venkat)anathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift. Department of Computer Sciences

  2. Public Clouds (EC2, Azure, Rackspace, …) • Multi-tenancy: different customers' virtual machines (VMs) share the same server • Why multi-tenancy? Improved resource utilization

  3. Implications of Multi-tenancy • VMs share many resources: CPU, cache, memory, disk, network, etc. • Virtual Machine Managers (VMMs) • Goal: provide isolation • Deployed VMMs don't perfectly isolate VMs • Side-channels [Ristenpart et al. '09, Zhang et al. '12] • Today: performance degraded by other customers

  4. Contention in Xen • 3x-6x performance loss → higher cost • Work-conserving scheduling • Non-work-conserving CPU scheduling

  5. What can a tenant do? • Ask provider for better isolation … requires overhaul of the cloud • Pack up VM and move (see our SOCC 2012 paper) … but not all workloads are cheap to move • This work: a greedy customer can recover performance by interfering with other tenants, using a Resource-Freeing Attack

  6. Resource-freeing attacks (RFAs) • What is an RFA? • RFA case studies • Two highly loaded web server VMs • Last Level Cache (LLC) bound VM and highly loaded webserver VM • Demonstration on Amazon EC2

  7. The Setting • Victim: one or more VMs with a public interface (e.g., HTTP) • Beneficiary: VM whose performance we want to improve • Helper: mounts the attack • Beneficiary and victim are fighting over a target resource

  8. Example: Network Contention • Beneficiary & Victim: Apache webservers hosting static and dynamic (CGI) web pages • Target resource: network bandwidth • Work-conserving scheduling of network bandwidth • What can you do? (Figure: clients, beneficiary, and victim sharing the network on a local Xen test bed.)

  9. Ways to Reduce Contention? Break into victim VM and disable it • The good: frees up resources used by victim • But: requires knowledge of a vulnerability, is drastic, and is easy to detect

  10. Ways to Reduce Contention? Do a simple DoS attack? • This may NOT free up target resources • Backfires: may increase the contention (Figure: the helper sends a SYN flood to the victim on the local Xen test bed.)

  11. Recipe for a Successful RFA • Shift resource usage away from the target resource towards the bottleneck resource • Shift resource usage via the public interface • Push towards the CPU bottleneck (CPU-intensive dynamic pages) • Reduce target resource usage (static pages) (Figure: proportion of CPU usage vs. proportion of network usage, with limits, for static and dynamic pages.)
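The recipe can be illustrated with a toy model. This is a minimal sketch, not the authors' implementation: it assumes a single link split max-min fairly between two tenants (one simple form of work-conserving sharing), a victim limited only by CPU, and made-up per-request costs. Every function name and number below is hypothetical, and the bandwidth used by CGI responses is ignored.

```python
def victim_bandwidth_demand(cpu_capacity, cgi_rate, cgi_cpu_cost,
                            static_cpu_cost, static_size):
    """Bandwidth the victim can still generate from static pages once
    helper-issued CGI requests have consumed part of its CPU."""
    cpu_for_cgi = min(cpu_capacity, cgi_rate * cgi_cpu_cost)
    static_rate = (cpu_capacity - cpu_for_cgi) / static_cpu_cost
    return static_rate * static_size


def work_conserving_split(link_capacity, beneficiary_demand, victim_demand):
    """Max-min fair split of one link between two tenants.
    Bandwidth a tenant does not use is handed to the other tenant."""
    fair = link_capacity / 2
    if beneficiary_demand + victim_demand <= link_capacity:
        return beneficiary_demand, victim_demand
    if victim_demand <= fair:
        return link_capacity - victim_demand, victim_demand
    if beneficiary_demand <= fair:
        return beneficiary_demand, link_capacity - beneficiary_demand
    return fair, fair


LINK = 1000.0                # hypothetical link capacity (units/sec)
BENEFICIARY_DEMAND = 2000.0  # hypothetical: beneficiary always wants more


def beneficiary_share(cgi_rate):
    """Fraction of the link the beneficiary gets for a given RFA intensity."""
    victim_demand = victim_bandwidth_demand(
        cpu_capacity=1.0,        # CPU-seconds available per second
        cgi_rate=cgi_rate,       # helper's CGI requests per second
        cgi_cpu_cost=0.09,       # hypothetical CPU-seconds per CGI request
        static_cpu_cost=0.0005,  # hypothetical CPU-seconds per static page
        static_size=1.0)         # bandwidth units per static page
    got, _ = work_conserving_split(LINK, BENEFICIARY_DEMAND, victim_demand)
    return got / LINK


if __name__ == "__main__":
    print("share without RFA:", beneficiary_share(0))
    print("share with RFA:   ", beneficiary_share(10))
```

In this model the helper never touches the network share directly; it only pushes the victim towards its CPU bottleneck, and the work-conserving link hands the freed bandwidth to the beneficiary.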

  12. An RFA in Our Example • Result in our testbed: increases beneficiary's share of bandwidth (50% → 85% share of bandwidth) • No RFA: 1800 page requests/sec • W/ RFA: 3026 page requests/sec (Figure: the helper sends CGI requests to the victim, raising its CPU utilization.)
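A quick arithmetic check ties the two reported results together. The sketch below assumes that the 50% and 85% figures are the beneficiary's bandwidth share without and with the RFA; the variable names are illustrative only.

```python
# Page-request rates reported for the beneficiary on the slide.
no_rfa_rate = 1800   # page requests/sec without the RFA
rfa_rate = 3026      # page requests/sec with the RFA

# Assumed reading of the slide: bandwidth share goes from 50% to 85%.
share_before = 0.50
share_after = 0.85

throughput_ratio = rfa_rate / no_rfa_rate   # roughly 1.68
share_ratio = share_after / share_before    # 1.7

if __name__ == "__main__":
    print(f"throughput grows by a factor of {throughput_ratio:.2f}")
    print(f"bandwidth share grows by a factor of {share_ratio:.2f}")
```

The two ratios nearly coincide, which is what one would expect if the beneficiary's throughput is limited by its share of the network.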

  13. Resource-freeing attacks: 1) Send targeted requests to victim 2) Shift resource use from target to a bottleneck. Shared CPU cache: • Ubiquitous: almost all workloads need cache • Hardware controlled: not easily isolated via software • Performance sensitive: high performance cost! Can we mount RFAs when the target resource is the CPU cache?

  14. Cache Contention RFA Goal

  15. Case Study: Cache vs. Network • Victim: Apache webserver hosting static and dynamic (CGI) web pages • Beneficiary: synthetic cache-bound workload (LLCProbe) • Target resource: cache • No cache isolation: ~3x slower when sharing cache with webserver (Figure: local Xen test bed with clients, beneficiary, victim, two cores, shared cache, and network.)

  16. Cache vs. Network • Victim webserver frequently interrupts the beneficiary and pollutes the cache • Reason: Xen gives higher priority to the VM consuming less CPU time (Figure: cache-state timeline for a heavily loaded webserver: the webserver receives a request, the beneficiary starts to run, cache efficiency is decreased.)

  17. Cache vs. Network w/ RFA • RFA helps in two ways: the webserver loses its priority, and the capacity of the webserver is reduced (Figure: cache-state timeline for heavily loaded webserver requests under RFA: the helper sends CGI requests, the webserver receives a request, the beneficiary starts to run.)

  18. RFA: Performance Improvement • 60% performance improvement • Slowdown reduced from 196% to 86% • RFA intensities: time in ms per second
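The slide's percentages can be related with a short calculation. This sketch assumes that "slowdown" is measured relative to running alone, that 196% is the slowdown without the RFA, and that 86% is the slowdown with it; the helper names are hypothetical.

```python
def runtime_factor(slowdown_percent):
    """Runtime relative to running alone, e.g. 196% slowdown -> 2.96x."""
    return 1.0 + slowdown_percent / 100.0


def improvement(slowdown_before, slowdown_after):
    """Relative speedup when the slowdown drops from one value to another."""
    return runtime_factor(slowdown_before) / runtime_factor(slowdown_after) - 1.0


if __name__ == "__main__":
    gain = improvement(196, 86)
    print(f"performance improvement: {gain:.0%}")
```

Under these assumptions the runtime drops from 2.96x to 1.86x of the isolated baseline, a speedup of about 59%, in line with the roughly 60% improvement quoted on the slide.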

  19. RFA Effect on Interruptions • Beneficiary: LLCProbe (Figure annotations: 40%, 85%.)

  20. RFA Effect on Victim’s capacity Decreases with increasing RFA intensity

  21. Experiments on Amazon EC2 • Multiple accounts • Co-resident VMs from our accounts: stand-ins for victim and beneficiary • Separate instances for helper and web clients • No direct interaction with any other customers • Indirect interaction akin to normal usage cases

  22. LLCProbe Synthetic Benchmark • Average performance improvement: 6% • Highest performance improvement: 13%, recovering 33% of performance lost • RFA improved performance of LLCProbe on all experimental EC2 instances!

  23. mcf from SPEC-CPU • 3% performance improvement = 35% reduction in performance loss • On average, RFA improved performance across all SPEC workloads! (Figure annotations: 10% slowdown, 6% slowdown.)

  24. Discussion: Practical Aspects • RFA case studies used CPU-intensive CGI requests • Alternative: DoS vulnerabilities (e.g., hash-collision attacks) • Identifying co-resident victims: easy on most clouds (co-resident VMs have predictable internal IP addresses) • No public interface? The paper discusses possibilities for RFAs

  25. Conclusion • Resource-Freeing Attacks: interfere with victim to shift resource use • Proof-of-concept of efficacy in public clouds • Open questions: other RFAs? Countermeasures: detection, stricter isolation, smarter scheduling?

  26. References • [MMSys10] Sean K. Barker and Prashant Shenoy. “Empirical evaluation of latency-sensitive application performance in the cloud.” In MMSys, 2010. • [Security10] Thomas Moscibroda and Onur Mutlu. “Memory performance attacks: Denial of memory service in multi-core systems.” In USENIX Security Symposium, 2007. • [CCS09] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. “Hey, you, get off my cloud: exploring information leakage in third party compute clouds.” In CCS, 2009.

  27. Backup Slides

  28. Discussion: Countermeasures • Detection? May be hard to differentiate RFA from legitimate requests • Stricter isolation? Works but expensive • Contention-aware scheduling: not yet used in public IaaS

  29. Discussion: Economies • Cost of RFA: helper instance and RFA traffic • Co-resident helper: an efficient implementation of the helper can run inside the attacker's VM • Current helper implementation consumes 15 Kbps of network bandwidth and 0.7% of CPU • Multiplex a single helper instance for many beneficiaries • Note: currently, internal EC2 network traffic is free of cost

  30. Identifying Co-resident VMs • Identifying the public interface: • Predictable numerical distance between internal IP addresses in public clouds. • Identifying port used by the victim application (standard ports like http(s), etc.).
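The IP-distance heuristic can be sketched in a few lines. This is only an illustration using Python's standard `ipaddress` module: the function names and the `max_distance` threshold are hypothetical, and a small distance is a hint of co-residence rather than proof.

```python
import ipaddress


def ip_distance(addr_a, addr_b):
    """Numerical distance between two IPv4 addresses."""
    a = int(ipaddress.ip_address(addr_a))
    b = int(ipaddress.ip_address(addr_b))
    return abs(a - b)


def possibly_coresident(addr_a, addr_b, max_distance=8):
    """Heuristic only: treat numerically close internal addresses as
    candidates for co-residence. The threshold is a made-up default."""
    return ip_distance(addr_a, addr_b) <= max_distance


if __name__ == "__main__":
    # Example internal addresses, chosen arbitrarily for illustration.
    print(ip_distance("10.0.0.5", "10.0.0.9"))
    print(possibly_coresident("10.0.0.5", "10.0.0.9"))
    print(possibly_coresident("10.0.0.5", "10.0.7.9"))
```

Candidates flagged this way would still need confirmation, for instance with round-trip times or resource contention experiments.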

  31. Experiment: Measuring Resource Contention • Synthetic workloads

  32. Other RFAs • RFAs are not limited to the presented case studies. • LLC vs. Disk • Sending spurious, random disk requests asynchronously to create a bottleneck for the shared disk resource. • Memory vs. Disk • Similar to the above RFA

  33. Discussion: More on Practical Aspects • Work-conserving vs. non-work-conserving schedulers • Public cloud environments are expected to manage resources in a non-work-conserving fashion. • E.g., a Net vs. Net RFA won't work on Amazon EC2. • Simulated client workload • What is the effect of an RFA in the presence of multiple independent requests originating from numerous clients?

  34. Xen Internals • Domain-0: privileged domain with direct access to I/O devices • All I/O requests go through Dom-0 • Xen scheduler internals: boost priority for interactive workloads (Figure: incoming request, VMs, and Dom0 above the hypervisor, cores, network, cache, memory, and disk.)

  35. Experiment: Measuring Resource Contention • On a local Xen test bed • Some have huge performance degradation • Not all resources conflict (Figure: VMs on a local Xen test bed sharing cores, network, LLCs, memory, and disk.)

  36. Boost Priority and Interruptions • Victim: webserver • Beneficiary: LLCProbe • Fewer interruptions → higher cache efficiency (Figure annotations: 40%, 95%, 85%, < 30%.)

  37. Demonstration on EC2 • Problem #1: Achieving Co-residence • Launching multiple instances simultaneously from two or more accounts. • Problem #2: Verifying Co-residency • Numerical distance between internal IP addresses [CCS09]. • Faster packet round-trip times. • Using resource contention experiments.

  38. Normalized Performance on EC2 • Aggregate performance degradation is within 5 performance points • On average, all SPEC workloads benefited from RFA (Figure: performance normalized to baseline; higher is better; annotation: 6%.)
