
vTurbo: Accelerating Virtual Machine I/O Processing Using Designated Turbo-Sliced Core


Presentation Transcript


  1. vTurbo: Accelerating Virtual Machine I/O Processing Using Designated Turbo-Sliced Core Cong Xu, Sahan Gamage, Hui Lu, Ramana Kompella, Dongyan Xu 2013 USENIX Annual Technical Conference Embedded Lab. Kim Sewoog

2. Motivation
• Pay-as-you-go: server consolidation
• Saves cost in running applications and operational expenditure
• Multiple VMs sharing the same core → CPU access latency
[Diagram: VM1–VM4 multiplexed on one core by the hypervisor (VMM), resulting in low I/O throughput]

3. I/O Processing
• Two basic stages:
• Device interrupts are processed synchronously in the kernel
• The application asynchronously copies the data from the kernel buffer
[Diagrams: the I/O processing workflow (IRQ processing → kernel buffer → application) and the effect of CPU sharing on I/O processing (VM1–VM3 time-share the CPU, delaying IRQ processing)]
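The two-stage path above can be sketched as a small model (illustrative only, not the paper's code): an IRQ handler synchronously fills a bounded kernel buffer, and the application drains it only when its vCPU is scheduled. If the VM waits too long for CPU, the buffer fills and packets are dropped. All names and buffer sizes here are invented for illustration.

```python
from collections import deque

kernel_buffer = deque()
BUFFER_CAP = 4      # assumed bounded kernel buffer, like a socket receive buffer
dropped = 0

def irq_handler(packet):
    """Stage 1: runs in interrupt context, synchronously in the kernel."""
    global dropped
    if len(kernel_buffer) >= BUFFER_CAP:
        dropped += 1                 # buffer full while the VM waits for CPU
    else:
        kernel_buffer.append(packet)

def app_read():
    """Stage 2: runs asynchronously, only when the VM's vCPU is scheduled."""
    return kernel_buffer.popleft() if kernel_buffer else None

# 6 packets arrive before the application gets any CPU time
for p in range(6):
    irq_handler(p)

received = []
while (p := app_read()) is not None:
    received.append(p)

print(received, dropped)   # → [0, 1, 2, 3] 2
```

The model shows why IRQ processing delay, not link speed, caps throughput when the VM's scheduling turn comes late.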

4. Effect of CPU Sharing on TCP Receive
[Diagram: a TCP client sends DATA through the hypervisor's shared buffer; while VM2 and VM3 are scheduled, VM1's IRQ processing is delayed, so its ACKs are delayed]

5. Effect of CPU Sharing on UDP Receive
[Diagram: a UDP client sends DATA to the hypervisor's shared buffer; while VM1 waits to be scheduled, its kernel and application buffers fill up and further packets are dropped]

6. Effect of CPU Sharing on Disk Write
[Diagram: the application writes DATA into kernel memory; flushing to the disk drive stalls because the completion IRQ processing of VM3 is delayed while VM1 and VM2 are scheduled]

7. Intuitive Solution
• Reduce the time slice of each VM
• But this causes significant context-switch overhead
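A back-of-envelope calculation shows the trade-off (the numbers below are assumed for illustration, not measurements from the paper): each time slice pays a fixed context-switch cost, so shrinking every VM's slice to the sub-millisecond range needed for prompt IRQ handling wastes a noticeable fraction of the CPU.

```python
switch_cost_us = 5.0    # assumed cost of one VM context switch, in microseconds

def overhead(slice_us):
    """Fraction of CPU time lost to context switching at a given slice length."""
    return switch_cost_us / (slice_us + switch_cost_us)

coarse = overhead(30_000)   # a typical 30 ms slice: overhead is negligible
fine = overhead(100)        # a 0.1 ms slice for *all* vCPUs: ~4.8% of CPU lost
print(f"{coarse:.4%} vs {fine:.4%}")
```

This is why vTurbo micro-slices only the designated turbo core rather than every core.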

  8. Our Solution: vTurbo

9. Our Solution: vTurbo
• IRQ processing is offloaded to a dedicated turbo core
• Turbo core: any physical core scheduled with micro time slices (e.g., 0.1 ms)
• Expose the turbo core to the VM as a special vCPU
• The turbo vCPU runs on a turbo core; regular vCPUs run on regular cores
• Pin the IRQ context of the guest OS to the turbo vCPU
• Benefits:
• Improved I/O throughput (TCP/UDP, disk)
• Self-adaptive system
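The guest-side pinning step can be sketched as follows. In Linux, steering a device IRQ to a particular CPU is done by writing a CPU bitmask to `/proc/irq/<n>/smp_affinity`; the mask computation below is standard, but the turbo-vCPU index and IRQ number are made-up examples, and the function only builds the command rather than touching `/proc`.

```python
def affinity_mask(cpu: int) -> str:
    """Hex bitmask selecting a single CPU, in the format smp_affinity expects."""
    return format(1 << cpu, "x")

TURBO_VCPU = 3   # assumption for illustration: the turbo vCPU is vCPU 3

def pin_irq(irq: int, cpu: int) -> str:
    # In a real guest this would be written to the procfs file, e.g.:
    #   open(f"/proc/irq/{irq}/smp_affinity", "w").write(affinity_mask(cpu))
    return f"echo {affinity_mask(cpu)} > /proc/irq/{irq}/smp_affinity"

print(pin_irq(29, TURBO_VCPU))   # → echo 8 > /proc/irq/29/smp_affinity
```

With the IRQ context confined to the turbo vCPU, interrupts are serviced on the micro-sliced core while the application keeps its full regular-core slices.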

  10. vTurbo Design

11. vTurbo Design
[Diagram: VM1–VM3 applications and their buffers run on a regular core, while their IRQ processing runs on the micro-sliced turbo core, so data reaches the kernel buffers without waiting for the regular-core schedule]

12. vTurbo’s Impact on Disk Write
[Diagram: with vTurbo, write-completion IRQs are handled on the turbo core, so kernel memory drains to the disk drive without waiting for the VM’s turn on the regular core]

13. vTurbo’s Impact on UDP Receive
[Diagram: IRQs are processed on the turbo core, so packets move from the hypervisor’s shared buffer into the VM’s kernel buffer immediately and are no longer dropped; the application copies them into its buffer when its regular vCPU runs]

14. vTurbo’s Impact on TCP Receive
[Diagram: the turbo core processes IRQs and sends ACKs promptly; when the receive queue is locked by the application, incoming packets go to the backlog queue instead of waiting for the regular-core schedule]

15. VM Scheduling Policy for Fairness
• Turbo cores are not free
• Maintain CPU fair share among VMs
• Calculate credits over both the regular and turbo cores
• Guarantee each VM’s CPU allocation on the turbo cores
• Deduct I/O-intensive VMs’ credits on the regular cores
• Allocate the deduction to non-I/O-intensive VMs
[Equations shown on the slide: total capacity across the regular and turbo cores, each VM’s turbo-core fair share, each VM’s actual turbo-core usage, and each VM’s fair share of CPU]
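The credit accounting above can be sketched as a toy calculation (a simplification with invented numbers, not the paper's exact formulas): credits an I/O-intensive VM consumes on the turbo core are deducted from its regular-core allocation, and the deduction is handed to the VMs that did not use the turbo core, so every VM's total CPU share stays fair.

```python
def allocate(regular_capacity, turbo_usage):
    """Regular-core credit shares after charging for turbo-core usage.

    turbo_usage maps each VM to the credits it consumed on the turbo core.
    """
    vms = list(turbo_usage)
    fair = regular_capacity / len(vms)            # each VM's fair regular share
    regular = {vm: fair - used for vm, used in turbo_usage.items()}
    surplus = sum(turbo_usage.values())           # total deducted credits
    others = [vm for vm, used in turbo_usage.items() if used == 0]
    for vm in others:                             # redistribute the deduction
        regular[vm] += surplus / len(others)
    return regular

# 3 VMs share 270 regular-core credits; VM1 burned 30 credits of turbo time.
shares = allocate(270, {"VM1": 30, "VM2": 0, "VM3": 0})
print(shares)   # → {'VM1': 60.0, 'VM2': 105.0, 'VM3': 105.0}
```

Regular-core credits still sum to 270, so an I/O-intensive VM cannot use the turbo core to exceed its overall fair share.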

16. Evaluation
• VM hosts: 3.2 GHz quad-core Intel Xeon CPU, 16 GB RAM
• An independent core assigned to the driver domain (dom0)
• Xen 4.1.2
• Linux 3.2
• One core chosen as the turbo core
• Gigabit Ethernet switch (10 Gbps for two experiments)

17. File Read/Write Throughput: Micro-Benchmark
[Graphs comparing file read/write throughput with a regular core vs. the turbo core]

18. TCP/UDP Throughput: Micro-Benchmark

19. NFS/SCP Throughput: Application Benchmark

20. Apache Olio: Application Benchmark
• Three components:
• A web server to process user requests
• A MySQL database server to store user profiles and event information
• An NFS server to store images and documents specific to events

21. Conclusions
• Problem: CPU sharing degrades I/O throughput
• Solution: vTurbo, which offloads IRQ processing to a dedicated turbo-sliced core
• Results:
• UDP throughput improved by up to 4x
• TCP throughput improved by up to 3x
• Disk write throughput improved by up to 2x
• NFS throughput improved by up to 3x
• Olio throughput improved by up to 38.7%

22. References
• Cheng, L., and Wang, C.-L. “vBalance: Using interrupt load balance to improve I/O performance for SMP virtual machines.” In ACM SoCC (2012).
• Dong, Y., Yu, Z., and Rose, G. “SR-IOV networking in Xen: Architecture, design and implementation.” In WIOV (2008).
• Gordon, A., Amit, N., Har’El, N., Ben-Yehuda, M., Landau, A., Schuster, A., and Tsafrir, D. “ELI: Bare-metal performance for I/O virtualization.” In ACM ASPLOS (2012).

23. Thank you!
