System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O

Presentation Transcript


  1. System Architecture for Web-Scale Applications Using Lightweight CPUs and Virtualized I/O Kshitij Sudan*, Saisanthosh Balakrishnan§, Sean Lie§, Min Xu§, Dhiraj Mallick§, Gary Lauterbach§, Rajeev Balasubramonian*

  2. Exec Summary • Focus on web-scale applications • Contribution 1: use of simple cores • This amplifies the power/cost contribution of the I/O subsystem • Contribution 2: virtualize I/O, e.g., a single disk shared by many cores • Contribution 3: software-stack optimizations • Contribution 4: evaluation on a production-quality, real design HPCA-2013

  5. Web-Scale Applications • Targeting datacenter platforms • Focus on power and cost (OpEx and CapEx) • Web-scale applications have large datasets, high concurrency, high communication, and high I/O – e.g., MapReduce • Typically, performance increases as cluster size grows, but so do power and cost HPCA-2013

  6. Energy-Efficient CPUs • For embarrassingly parallel workloads, energy per instruction (EPI) is important • For a given power/energy budget, many low-EPI cores can yield higher throughput than a few high-EPI cores • Hence, use many lightweight, energy-efficient CPUs (Atom CPU at 8.5 W) HPCA-2013
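
The throughput argument can be made concrete with a back-of-the-envelope calculation. Below is a minimal sketch: the 8.5 W Atom figure comes from the slide above, while the server-class wattage and both EPI values are illustrative assumptions, not measurements from the paper.

```python
# Throughput achievable under a fixed power budget for two core types.
# Assumed values: the 8.5 W figure is from the slide; the server-class
# wattage and both EPI numbers are illustrative placeholders.
POWER_BUDGET_W = 400.0

core_types = {
    # name: (watts per core, energy per instruction in nJ)
    "lightweight (Atom-class)": (8.5, 2.0),
    "server-class": (95.0, 10.0),
}

for name, (watts, epi_nj) in core_types.items():
    n_cores = int(POWER_BUDGET_W // watts)
    rate_per_core = watts / (epi_nj * 1e-9)  # instructions/s = W / (J/instr)
    total_gips = n_cores * rate_per_core / 1e9
    print(f"{name}: {n_cores} cores, ~{total_gips:.0f} G instr/s")
```

With these placeholder numbers, the low-EPI cluster delivers roughly 5x the aggregate instruction rate in the same power envelope, which is exactly the effect the slide describes.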

  7. Contribution of the I/O Subsystem • With lightweight cores, the energy and cost contributions of “other” components grow • Intel Atom CPU + chipset = 11 W • Typical disk or Ethernet card = 5-25 W • Fans, power supplies, etc. • The application uses only 20-60 MB/s of disk bandwidth, while the disk has a peak read bandwidth of 120 MB/s HPCA-2013
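
The over-provisioning gap is easy to quantify from the slide's numbers. The sketch below assumes each node sustains the stated demand; that assumption is mine, not the paper's workload model.

```python
# Disk utilization and sharing potential, using the slide's numbers:
# per-node demand of 20-60 MB/s against a 120 MB/s peak read bandwidth.
PEAK_DISK_BW_MBPS = 120.0

for demand_mbps in (20.0, 40.0, 60.0):
    utilization = demand_mbps / PEAK_DISK_BW_MBPS
    nodes_per_disk = PEAK_DISK_BW_MBPS // demand_mbps
    print(f"demand {demand_mbps:.0f} MB/s -> dedicated disk {utilization:.0%} "
          f"utilized; up to {nodes_per_disk:.0f} nodes could share one disk")
```

A dedicated disk therefore sits at roughly 17-50% utilization, which motivates the virtualized, shared disks described in the following slides.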

  8. Wasting energy on over-provisioned resources HPCA-2013

  9. Cluster-in-a-Box with Virtualized I/O • Use energy-efficient CPUs • ~10x more CPUs in the same power budget than with typical server-class CPUs • Virtualize I/O devices – disk and Ethernet • Balanced resource provisioning and lower cost/power • Amortize fixed server overheads by sharing components • Fans, power supplies, etc. HPCA-2013

  10. Compute Cards Compute card – 6 CPUs share 4 ASICs (PCIe connection); the ASIC implements the fabric; 4 GB of DDR2 memory per CPU on the back HPCA-2013

  12. Logical Organization – Compute cards (CPU + chipset + ASIC); S-cards (storage FPGA, up to 8 per system, each with 8x SATA HDD/SSD); E-cards (Ethernet FPGA, up to 8 per system, each with 8x1 GbE or 2x10 GbE); 3D-torus interconnect formed by the ASICs HPCA-2013
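
For readers unfamiliar with the topology, the sketch below shows how neighbors are addressed in a 3D torus. The 8x8x8 dimensions are an assumption for illustration only; the transcript does not state the fabric's actual dimensions.

```python
# Neighbor addressing in a 3D torus: node (x, y, z) links to the nodes one
# step away along each axis, with wraparound at the edges. The 8x8x8 shape
# is an illustrative assumption, not the system's actual fabric dimensions.
DIMS = (8, 8, 8)

def torus_neighbors(node, dims=DIMS):
    """Return the six neighbors of `node` in a 3D torus."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % dims[axis]  # wraparound
            neighbors.append(tuple(coords))
    return neighbors

# z neighbors of (0, 0, 7) are (0, 0, 6) and, wrapping around, (0, 0, 0)
print(torus_neighbors((0, 0, 7)))
```

The wraparound links are what keep worst-case hop counts low: no route needs more than half the ring length in any dimension.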

  13. Physical Organization – Midplane interconnect connecting E-cards, S-cards, compute cards, and HDDs/SSDs HPCA-2013

  14. Cluster-in-a-Box Summary • 768 CPU cores interconnected by a high-bandwidth fabric in a 3D-torus topology • Low-latency distributed fabric architecture based on low-power ASICs • FPGAs implement the disk and Ethernet controllers • The fabric and FPGAs implement I/O virtualization • Up to 64 disks shared by 384 server nodes • Server nodes don’t require a top-of-rack switch to communicate • All internal cluster communication goes over the fabric • The entire cluster consumes < 3.5 kW under full load HPCA-2013

  15. System Software Improvements • Implement large SATA packet sizes to reduce disk-seek overheads • Other OS/Ethernet configuration knobs: avoid journaling in the filesystem, jumbo TCP/IP frames, interrupt coalescing • MapReduce configuration: designate the few nodes near the S-cards as DataNodes HPCA-2013
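
The OS/Ethernet knobs on this slide correspond to standard Linux settings. The commands below are a hedged illustration of that style of tuning: the device names and values are placeholders, and the exact mechanisms used in the paper are not stated in the transcript.

```python
# Illustrative Linux tuning matching the slide's knobs. Device names
# (eth0, /dev/sdb1) and values are placeholders; run as root on a test box.
import subprocess

def run(cmd):
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

# Jumbo TCP/IP frames: raise the NIC MTU to 9000 bytes.
run("ip link set dev eth0 mtu 9000")

# Interrupt coalescing: batch receive interrupts to cut per-packet overhead.
run("ethtool -C eth0 rx-usecs 100")

# Avoid filesystem journaling: format the data disk without a journal.
run("mkfs.ext4 -O ^has_journal /dev/sdb1")
```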

  16. Methodology • Compare two cluster designs with the same power envelope to evaluate TCO and power for cluster architectures • 17-node Core i7-based cluster (baseline) and a 384-node Atom cluster-in-a-box • 4 kW Core i7 cluster; 3.5 kW Atom cluster-in-a-box • Four Apache Hadoop benchmarks • TCO calculations based on Hamilton’s model HPCA-2013
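
Hamilton's model amortizes capital expenses and adds power-driven operating expenses. The sketch below is a heavily simplified rendering of that idea; every dollar figure, the PUE, and the amortization period are illustrative placeholders, not the paper's inputs.

```python
# A much-simplified Hamilton-style monthly TCO estimate: amortized server
# CapEx plus facility power OpEx scaled by PUE. All inputs are placeholders.
def monthly_tco_usd(server_cost_usd, power_kw, pue=1.5,
                    usd_per_kwh=0.07, amortization_months=36):
    capex = server_cost_usd / amortization_months
    opex = power_kw * pue * 720 * usd_per_kwh  # ~720 hours per month
    return capex + opex

baseline = monthly_tco_usd(server_cost_usd=17 * 1500, power_kw=4.0)
proposed = monthly_tco_usd(server_cost_usd=60000, power_kw=3.5)
print(f"baseline:         ${baseline:,.0f}/month")
print(f"cluster-in-a-box: ${proposed:,.0f}/month")
```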

  17. HPCA-2013

  18. Improvement in EDP HPCA-2013
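
EDP here is the energy-delay product: energy consumed times runtime, so it rewards designs that are both frugal and fast. A minimal sketch, with inputs fabricated purely to illustrate a ~6x ratio rather than taken from the paper's measurements:

```python
# Energy-delay product: EDP = (avg power x runtime) x runtime, in J*s.
# The inputs below are fabricated to illustrate a ~6x ratio; they are
# not measured values from the paper.
def edp(avg_power_w, runtime_s):
    energy_j = avg_power_w * runtime_s
    return energy_j * runtime_s

baseline = edp(avg_power_w=4000, runtime_s=1000)
proposed = edp(avg_power_w=3500, runtime_s=436)
print(f"EDP improvement: {baseline / proposed:.1f}x")
```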

  19. HPCA-2013

  20. Performance/TCO vs. Number of Disks and Number of Cores HPCA-2013

  21. Conclusions • Datacenter power and cost are limiting factors when scaling web-scale apps • Build clusters using lightweight, low-power CPUs • Balanced resource provisioning can improve utilization, cost, and power • Virtualize I/O (disk and Ethernet) • Amortize the overheads of fans, power supplies, etc. • The cluster-in-a-box system yields up to a 6x improvement in EDP relative to a traditional cluster HPCA-2013

  22. Questions? Thank You

  23. CPU and Disk Utilization – 768 CPUs, 64 disks vs. 64 CPUs, 32 disks HPCA-2013
