
“Single-chip Cloud Computer” An experimental many-core processor from Intel Labs

Xiaocheng Zhou, Intel Labs China

Presentation Transcript


  1. “Single-chip Cloud Computer”: An experimental many-core processor from Intel Labs. Xiaocheng Zhou, Intel Labs China

  2. What is Tera-scale? TIPS of compute power operating on terabytes of data. [Figure: performance (KIPS, MIPS, GIPS, TIPS) plotted against dataset size (kilobytes, megabytes, gigabytes, terabytes). Labels on the chart: Text, Single-core, Multi-core, Multimedia, 3D & Video, Tera-scale, RMS, Entertainment, Learning & Travel, Personal Media Creation and Management, Health. Source: Electronic Visualization Lab, University of Illinois] http://techresearch.intel.com/articles/Tera-Scale/1421.htm

  3. Performance Scaling Challenges • Energy Efficiency • Design Complexity • Programming Strategy • Emerging Applications

  4. Cloud Computing Today Cloud datacenters: • 1000s of networked computers • Millions of threads & petabytes of data Opportunity: • Lower power, higher density via integration • Greater efficiency and better programmability • Example: Intel’s Open Cirrus testbed • Intel Labs Pittsburgh Future: Many-core Processor?

  5. Single-chip Cloud Computer (SCC) • Experimental many-core CPU on 45 nm Hi-K metal-gate silicon • 48 IA-compatible cores • Network of 2-core nodes mimics cloud computing at chip level • Fine-grained power management scales from 25-125W • Supports proven, highly parallel “scale-out” programming models

  6. Inside the SCC • 24 tiles, 24 routers, 48 IA cores • Dual-core SCDC Tile: Core 1 and Core 2, each with its own L2 cache, plus a router and a Message Buffer • 2D mesh network • 4 integrated DDR3 memory controllers (64GB addressable) [Diagram: one tile in detail, and the mesh of routers (R) with four memory controllers (MC)]

  7. On-die Interconnect • Architecture: 6x4 2D mesh NoC; 16B-wide data links + 2B sideband; 8 virtual channels in 2 classes; fixed (X-Y) routing • Performance: target frequency 2GHz @ 1.1V; link bandwidth 64GB/s; 4-cycle latency • Power Management: independent frequency & voltage control; sleep mode, clock gating, low-power RF
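The fixed (X-Y) routing named on this slide can be illustrated with a minimal sketch. It assumes routers are addressed by hypothetical (x, y) coordinates with x in 0..5 and y in 0..3, which the slide does not specify; a packet travels along X until the column matches, then along Y.

```python
MESH_COLS, MESH_ROWS = 6, 4  # 6x4 mesh, from the slide


def xy_route(src, dst):
    """Return the routers visited from src to dst, moving in X first, then Y.

    src and dst are (x, y) tuples; the coordinate scheme is an assumption
    made for illustration, not the chip's documented numbering.
    """
    for cx, cy in (src, dst):
        if not (0 <= cx < MESH_COLS and 0 <= cy < MESH_ROWS):
            raise ValueError(f"router ({cx}, {cy}) is outside the mesh")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # then Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (5, 3))
    print(route)
    print("hops:", len(route) - 1)
```

Because the order of dimensions is fixed, the route between two routers is always the same, and the longest route in a 6x4 mesh is corner to corner.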

  8. Memory Architecture • Memory: up to 64GB DDR3 via 4 memory controllers @ 21.3GB/s; 16KB SRAM in each tile as Message Passing Buffer (MPB) • Caching: 32KB L1 per core (16KB instruction, 16KB data); 12MB L2 cache in total (256KB per core); no HW cache-coherent shared memory • Addressing: core physical addresses map to system physical addresses in 16MB sections; memory-mapped configuration & control registers

  9. Address Translation: From Core Address to System Address [Diagram: each core's physical address space passes through a Look Up Table (LUT) that performs a physical-to-physical mapping into the system physical address space]
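A rough sketch of the LUT translation follows. The 16MB section size and the 64GB system address range come from the slides; the 32-bit core address width (hence 256 LUT entries) and the idea that each entry stores only a system section number are simplifying assumptions made for illustration.

```python
SECTION_BITS = 24                      # 16 MB sections (2**24 bytes), from the slides
SECTION_SIZE = 1 << SECTION_BITS
CORE_ADDR_BITS = 32                    # assumption: 32-bit core physical addresses
LUT_ENTRIES = 1 << (CORE_ADDR_BITS - SECTION_BITS)   # 256 under that assumption
SYSTEM_ADDR_LIMIT = 64 << 30           # 64 GB addressable, from the slides


def translate(lut, core_addr):
    """Map a core physical address to a system physical address via the LUT.

    lut is a list of LUT_ENTRIES items; each item is a system section
    number or None for an unmapped section (a simplified entry format).
    """
    if not 0 <= core_addr < (1 << CORE_ADDR_BITS):
        raise ValueError("core address out of range")
    index = core_addr >> SECTION_BITS            # which 16 MB section
    offset = core_addr & (SECTION_SIZE - 1)      # position inside the section
    section = lut[index]
    if section is None:
        raise LookupError(f"LUT entry {index} is not mapped")
    system_addr = (section << SECTION_BITS) | offset
    if system_addr >= SYSTEM_ADDR_LIMIT:
        raise ValueError("system address beyond 64 GB")
    return system_addr


if __name__ == "__main__":
    lut = [None] * LUT_ENTRIES
    lut[0] = 0x300                               # hypothetical mapping
    print(hex(translate(lut, 0x0000_1234)))
```

Two cores whose LUTs point different entries at the same system section would see the same memory, which is one way to read "regions of memory mapped to multiple cores".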

  10. Message Passing on SCC • Regions of memory mapped to multiple cores: Message Passing Buffer (MPB) for small, fast messages; larger buffers in off-die memory • Message Passing Data Type (MPDT): reads/writes bypass the L2 cache and are tagged in L1 as MPDT; new instruction to selectively invalidate MPDT lines • Read/write to another core’s MPB on-die • Synchronize through special atomic register bits • Core-to-core asynchronous interrupts • High-level API for applications, “RCCE”: one-sided (Get, Put) and two-sided (Send, Recv) communication; MPB allocation, synchronization
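The idea of moving a large message through a small on-die buffer can be sketched as a single-threaded simulation. This is not RCCE code: the class, the `ready` flag standing in for a synchronization bit, and the 8KB per-core share of the 16KB tile buffer are all assumptions made for illustration.

```python
MPB_BYTES_PER_CORE = 8 * 1024  # assumption: each core gets half of its tile's 16 KB


class MessagePassingBuffer:
    """Toy model of one core's slice of the message passing buffer."""

    def __init__(self, size=MPB_BYTES_PER_CORE):
        self.data = bytearray(size)
        self.length = 0
        self.ready = False  # stands in for a synchronization flag

    def put(self, chunk):
        """Sender side: copy a chunk into the buffer and raise the flag."""
        if self.ready:
            raise RuntimeError("previous chunk has not been consumed")
        if len(chunk) > len(self.data):
            raise ValueError("chunk larger than the buffer")
        self.data[:len(chunk)] = chunk
        self.length = len(chunk)
        self.ready = True

    def get(self):
        """Receiver side: copy the chunk out and clear the flag."""
        if not self.ready:
            raise RuntimeError("no chunk available")
        chunk = bytes(self.data[:self.length])
        self.ready = False
        return chunk


def transfer(message, mpb):
    """Move a message of any size through the buffer, one chunk at a time."""
    received = bytearray()
    size = len(mpb.data)
    for start in range(0, len(message), size):
        mpb.put(message[start:start + size])   # sender writes a chunk
        received += mpb.get()                  # receiver drains it
    return bytes(received)


if __name__ == "__main__":
    payload = bytes(range(256)) * 100          # 25,600 bytes, larger than the buffer
    assert transfer(payload, MessagePassingBuffer()) == payload
    print("transferred", len(payload), "bytes")
```

On real hardware the sender and receiver run on different cores and poll the flag; the simulation interleaves them in one loop only to keep the example runnable anywhere.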

  11. Improving Energy Efficiency: Fine-grain, software-controlled power management • 8 voltage and 28 frequency islands • Each tile can run at a different frequency • 6 banks of four tiles can run at different voltages • Also independent V&F control for I/O network & MCs [Diagram: the 24 tiles and their routers (R) grouped into voltage islands V1 to V6, with frequency labels (Fn) and four memory controllers]

  12. Package and Test Board

  13. SCC Platform Board Overview

  14. SCC “Chipset” • System Interface FPGA: connects to the SCC mesh interconnect; I/O capabilities such as PCIe, Ethernet & SATA; bitstream loaded by the BMC • Board Management Controller (BMC): JTAG interface for clocking, power etc.; USB stick to hold the FPGA bitstream; network interface for user interaction via Telnet; status monitoring

  15. Software Environment • SCC Software: customized Linux; bare metal; RCCE communication & power management • Tools: selected Intel tools (e.g., icc, ifort, ...); Microsoft Research release of SCC extensions to Visual Studio • Management Console PC Software: PCIe driver with integrated TCP/IP driver; programming API for communication with the SCC platform; GUI for interaction with the SCC platform; command-line tools for interaction with the SCC platform

  16. RCCE Communication API • A compact, lightweight communication environment • SCC and RCCE were designed side by side: a true HW/SW co-design project • A research vehicle to understand how message passing APIs map onto many-core chips • For experienced parallel programmers willing to work close to the hardware • Static SPMD execution model: identical UEs (units of execution) are created together when a program starts, a standard approach familiar to message passing programmers
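The static SPMD model can be illustrated without any SCC software. The sketch below uses Python threads as stand-ins for UEs: every UE is created at startup, runs the same function, and differs only in its ID. `run_spmd` and `partial_sum` are hypothetical names, not part of RCCE.

```python
import threading


def run_spmd(num_ues, program):
    """Start num_ues identical workers together and collect their results.

    Each worker calls program(ue_id, num_ues); threads stand in for the
    chip's cores purely for illustration.
    """
    results = [None] * num_ues

    def launch(ue_id):
        results[ue_id] = program(ue_id, num_ues)

    threads = [threading.Thread(target=launch, args=(i,)) for i in range(num_ues)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def partial_sum(ue_id, num_ues, n=1000):
    """Same code on every UE; the ID selects a strided slice of 0..n-1."""
    return sum(range(ue_id, n, num_ues))


if __name__ == "__main__":
    parts = run_spmd(4, partial_sum)
    print(parts, "total:", sum(parts))
```

The number of UEs is fixed for the life of the program, which is what "static" means here: no worker is spawned or retired after startup.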

  17. RCCE Power Management API • RCCE power management emphasizes safe control: V/GHz changed together within each 4-tile (8-core) power domain • A master core sets V + GHz for all cores in the domain • RCCE_istep_power(): steps V + GHz up or down, where GHz is the maximum for the selected voltage • RCCE_wait_power(): returns when the power change is done • RCCE_step_frequency(): steps only GHz up or down • Power management latencies: V changes have very high latency, O(million) cycles; GHz changes have low latency, O(few) cycles
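To make the "4-tile (8-core) power domain" concrete, here is a small sketch that groups 48 cores into 6 domains. The numbering is assumed, not taken from the slides: cores 2t and 2t+1 sit on tile t, tiles are numbered row by row across the 6x4 mesh, each domain is a 2x2 block of tiles, and the lowest-numbered core acts as the domain master.

```python
NUM_CORES = 48
CORES_PER_TILE = 2
MESH_COLS, MESH_ROWS = 6, 4
DOMAIN_COLS = MESH_COLS // 2   # 2x2 tile blocks give 3 domains per row of blocks


def tile_of(core):
    """Tile index of a core (assumed numbering: two consecutive cores per tile)."""
    return core // CORES_PER_TILE


def domain_of(core):
    """Power domain of a core, assuming row-major tiles and 2x2 tile blocks."""
    tile = tile_of(core)
    x, y = tile % MESH_COLS, tile // MESH_COLS
    return (y // 2) * DOMAIN_COLS + (x // 2)


def cores_in_domain(domain):
    """All cores whose voltage changes together with this domain."""
    return [c for c in range(NUM_CORES) if domain_of(c) == domain]


def master_of(domain):
    """Hypothetical convention: the lowest-numbered core is the master."""
    return min(cores_in_domain(domain))


if __name__ == "__main__":
    for d in range(6):
        print(d, "master", master_of(d), "cores", cores_in_domain(d))
```

Whatever the real layout, the consequence is the same: a voltage step requested by the master affects all eight cores in its domain, so software has to coordinate within the domain before changing power state.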

  18. sccGui for debugging • Modify config registers • Read system memory

  19. sccBoot & sccReset • sccBoot: a command-line tool that boots Linux on selected cores and reports status (“which cores are currently booted”) • sccReset: a command-line tool that resets selected SCC cores

  20. sccKonsole • Regular konsole, with automatic login to selected cores. • Enables broadcasting amongst shells.

  21. MARC - Many-core Application Research Community • Worldwide research partnership program with academia & industry • Providing access to SCC for many-core programming research • Overwhelming interest - ~200 research proposals received • SCC datacenter is online - Community website up and running • http://communities.intel.com/community/marc
