
IEEE TRANSACTIONS ON COMPUTERS, VOL. 57, NO. 9, SEPTEMBER 2008. Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors. Assaf Shacham, Member, IEEE, Keren Bergman, Senior Member, IEEE, and Luca P. Carloni, Member, IEEE. A. Shacham is with Aprius Inc., 440 N. Wolfe Rd., Sunnyvale, CA 94085. E-mail: assaf@ee.columbia.edu. K. Bergman is with the Department of Electrical Engineering, Columbia University, 500 W. 120th St., 1300 Mudd, New York, NY 10027. E-mail: bergman@ee.columbia.edu. L.P. Carloni is with the Department of Computer Science, Columbia University, 466 Computer Science Building, 1214 Amsterdam Avenue, Mail Code: 0401, New York, NY 10027-7003. E-mail: luca@cs.columbia.edu. 2011. 06. 08. Kim Yeo-myung, RFAD LAB, YONSEI University

CONTENTS • I. INTRODUCTION • II. RELATED WORK • III. HYBRID NOC MICROARCHITECTURE • IV. NETWORK DESIGN • V. DESIGN ANALYSIS AND OPTIMIZATION • VI. COMPARATIVE POWER ANALYSIS • VII. CONCLUSION

INTRODUCTION • Parallel computational cores • New commercial releases use them to drive performance • The interconnect and the associated global communication infrastructure are becoming central to chip performance • Issues of the Network-on-Chip (NoC) • Large bandwidth and stringent latency requirements • An electronic NoC can provide enough performance but requires large power consumption → photonic NoC • Photonic NoCs can deliver a dramatic reduction in the power expended on intrachip global communications while satisfying the high bandwidth requirements of CMPs • Hybrid NoC architecture: photonic + electronic

RELATED WORK • Relative performance of optical and electrical on-chip interconnects <Collet et al.> • On-chip optical interconnects can be envisioned for lengths larger than 1,000 times the wavelength, where they can have lower power and latency than electronic interconnects • Multicore processor architecture in which remote memory accesses are implemented as transactions on a global on-chip optical bus <Kirman et al.> • A latency reduction as high as 50 percent for some applications and a power reduction of about 30 percent over a baseline electrical bus

RELATED WORK • An optical NoC based on a wavelength-routed crossbar <Briere et al.> • The crossbar is composed of passive resonator devices, and routing between an input-output pair is achieved by selecting the appropriate wavelength • Problem: requires either widely tunable laser sources or large arrays of fixed-wavelength sources with fast wavelength-selection switches • Benefits of optical intrachip interconnects <Intel> • While optical clock distribution networks are not especially attractive, wavelength division multiplexing (WDM) does offer interesting advantages for intrachip optical interconnects over copper in deep-submicron processes

HYBRID NOC MICROARCHITECTURE • Meaning of hybrid • Optical + electronic • Circuit-switched network (bulk messages) + packet-switched network (short messages) • Why hybrid? • Why not photonic packet switching? Two functions necessary for packet switching, namely buffering and header processing, are very difficult to implement with optical devices • Why not a purely electronic NoC? Electronic NoCs have many advantages in flexibility and abundant functionality, but tend to consume high power, which scales up with the transmitted bandwidth

HYBRID NOC MICROARCHITECTURE • Operation of optical circuit switching • An electronic control packet is transmitted → it is routed in the electronic network and sets up a photonic path • Buffering takes place only for the electronic packets, during the path-setup phase • The established paths are optical circuits between processing cores → enabling low power, low latency, and high bandwidth • Advantages of the photonic path • Bit-rate transparency: the ability of a device to process an optical signal regardless of its transmission rate (bit rate) → In electronics, dynamic power dissipation (switching power) scales with the bit rate, but photonic switches switch on and off once per message, so their energy dissipation does not depend on the bit rate • Low loss in optical waveguides

HYBRID NOC MICROARCHITECTURE • Exploiting Photonics in NoC Design • [Figure: the construction of the photonic NoC in a single layer, above the metal. Labels: Off-Chip Laser; Waveguide & Fiber Coupling lens; Modulator; WDM Optical Switch (Microring-resonator structure); Torus Networks; Optical Clock Distribution Network]

HYBRID NOC MICROARCHITECTURE • Life of a Message in the Photonic NoC • A write operation starts from a processing unit in one core to a memory located in another core. • As soon as the write address is known, a path-setup packet is sent on the electronic control network. • The control packet is routed in the electronic network, reserving the photonic switches along the path for the photonic message that will follow it. • When the path-setup packet reaches the destination port, the photonic path is reserved and is ready to route the message. • A short light pulse can then be transmitted onto the waveguide in the opposite direction (from the destination to the source), signaling to the source that the path is open. • After the message transmission is completed, a path-teardown packet is sent to free the path resources for use by other messages.
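
The message lifecycle on this slide can be sketched as a short sequence of steps. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `PhotonicPath` and `send_message`, and the modeling of switches as a set of reserved IDs, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhotonicPath:
    """Hypothetical model of one optical circuit through the NoC."""
    hops: list                                   # IDs of photonic switches on the route
    reserved: set = field(default_factory=set)   # switches currently reserved
    log: list = field(default_factory=list)      # ordered record of protocol steps

    def setup(self):
        # The path-setup packet travels hop by hop on the electronic
        # control network and reserves each photonic switch it passes.
        for switch in self.hops:
            self.reserved.add(switch)
        self.log.append("path-setup")

    def acknowledge(self):
        # A short light pulse travels from destination to source on the
        # reserved waveguide to signal that the path is open.
        if self.reserved != set(self.hops):
            raise RuntimeError("path is not fully reserved")
        self.log.append("ack-pulse")

    def transmit(self, payload: bytes) -> int:
        # The message crosses the optical circuit end to end.
        self.log.append("transmit")
        return len(payload)

    def teardown(self):
        # The path-teardown packet frees the switches for other messages.
        self.reserved.clear()
        self.log.append("teardown")


def send_message(hops, payload):
    """Run the four phases in order and return the path and bytes sent."""
    path = PhotonicPath(hops=list(hops))
    path.setup()
    path.acknowledge()
    sent = path.transmit(payload)
    path.teardown()
    return path, sent
```

Calling `send_message(["s0", "s1", "s2"], b"x" * 2048)` walks through setup, acknowledgment, transmission, and teardown in that order. Blocking and contention are deliberately left out of this sketch.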

NETWORK DESIGN (Building Blocks) • Photonic Switching Element (PSE) • Microring-resonator structure (a similar device: optically pumped) • OFF state: the resonant frequency of the rings differs from the wavelength of the optical signal • ON state: the switch is turned on by injecting electrical current into the p-n contacts surrounding the rings • Switching time: 30 ps • Their merit lies mainly in their extremely small footprint, with ring diameters of approximately 12 µm, and their low power

NETWORK DESIGN (Building Blocks) • Photonic Switching Element (PSE) • 4 X 4 switches (controlled by an electronic circuit termed an electronic router, ER) • Control packets are received in the ER, processed, and sent to their next hop, while the PSEs are switched ON and OFF accordingly • Blocking relationships exist inside the switch (nonblocking switches would offer improved performance and simplify network management and routing)

NETWORK DESIGN (Topology) • 4 X 4 folded torus network • The communication requirements of a CMP are best served by a 2D regular topology such as a mesh or a torus • A regular 2D topology requires 5 X 5 switches, which are overly complex to implement using photonic technology • Therefore, a folded-torus topology is used as a base and augmented with access points for the gateways

NETWORK DESIGN (Topology) • 4 X 4 folded torus network • The access points for the gateways are designed with two goals in mind: 1) to facilitate injection and ejection without interfering with the through traffic on the torus and 2) to avoid blocking between injected and ejected traffic, which may be caused by the switches' internal blocking.

NETWORK DESIGN(Topology) • 4 X 4 folded torus network

NETWORK DESIGN (Flow Control) • XY dimension-order routing on the torus network • Path-setup time is required: the setup packet travels through a number of ERs, undergoes some processing at each hop, and may be blocked (on the order of nanoseconds) • The transmission latency of the optical data is very short and depends only on the group velocity of light in a silicon waveguide: about 300 ps over 2 cm
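
XY dimension-order routing on a torus can be sketched as follows. This is an illustrative sketch, not the routing logic of the paper: the function name `xy_route`, the choice of the shorter wraparound direction in each dimension, and the tie-breaking rule are assumptions. The slide states only that XY dimension-order routing is used.

```python
def xy_route(src, dst, size):
    """Return the (x, y) nodes visited from src to dst on a size x size torus.

    The X dimension is fully resolved first, then the Y dimension.
    In each dimension the shorter direction around the ring is taken;
    ties go to the increasing direction (an assumption of this sketch).
    """
    def step(a, b):
        # Signed unit step from a toward b along the shorter ring direction.
        forward = (b - a) % size
        if forward == 0:
            return 0
        return 1 if forward <= size - forward else -1

    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x = (x + step(x, dst[0])) % size
        path.append((x, y))
    while y != dst[1]:
        y = (y + step(y, dst[1])) % size
        path.append((x, y))
    return path
```

For example, on a 4 x 4 torus the route from `(0, 0)` to `(3, 1)` uses the wraparound link in X, giving `[(0, 0), (3, 0), (3, 1)]`.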

DESIGN ANALYSIS AND OPTIMIZATION • Simulation Setup • Developed POINTS (Photonic On-chip Interconnection Network Traffic Simulator) • 36-core CMP, 6 X 6 planar layout, 22 nm CMOS technology • The chip is assumed to be 20 mm along its edge, so each core is 3.3 X 3.3 mm in size • The network is a 6 X 6 folded-torus network augmented with 36 gateway access points, so it uses a matrix of 12 X 12 switches • A propagation delay of 15.4 ps/mm is assumed for the optical signals in a silicon waveguide • The inter-PSE delay and interrouter delay are, therefore, 13 and 220 ps, respectively • The PSE setup time is assumed to be 1 ns and the router processing latency is 600 ps
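
The figures on this slide can be cross-checked with a few lines of arithmetic. The constants below are taken from the slides. The helper `control_path_latency_ps` is a hypothetical simplification that charges one router processing latency and one interrouter delay per hop, ignoring blocking and queuing.

```python
CHIP_EDGE_MM = 20.0               # chip edge length from the slide
CORES_PER_SIDE = 6                # 6 x 6 planar layout
WAVEGUIDE_DELAY_PS_PER_MM = 15.4  # optical propagation figure from the slide
ROUTER_LATENCY_PS = 600           # router processing latency
INTERROUTER_DELAY_PS = 220        # electronic link delay between routers

# Edge length of one core: 20 mm / 6, about 3.3 mm.
core_edge_mm = CHIP_EDGE_MM / CORES_PER_SIDE

# Optical flight time across 2 cm of waveguide: about 308 ps,
# consistent with the roughly 300 ps quoted for flow control.
optical_delay_2cm_ps = 20.0 * WAVEGUIDE_DELAY_PS_PER_MM


def control_path_latency_ps(hops):
    """Unblocked path-setup latency over `hops` router hops (simplified)."""
    return hops * (ROUTER_LATENCY_PS + INTERROUTER_DELAY_PS)
```

Under this simplification, a setup packet crossing ten routers takes 8.2 ns, which is consistent with the nanosecond-order path-setup time and far longer than the optical flight time.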

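DESIGN ANALYSIS AND OPTIMIZATION • Dealing with Deadlock • Deadlock: • Program 1 requests resource A and is granted it. • Program 2 requests resource B and is granted it. • Program 1 additionally requests resource B, but because resource B is in use by another program, it waits in the queue until the resource becomes available. • Program 2 additionally requests resource A, but because resource A is in use by another program, it waits in the queue until the resource becomes available.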
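
The two-program scenario on this slide is a cycle in a wait-for relation, which the following sketch detects. It is a generic illustration and not the deadlock-handling mechanism of the paper; the function name `find_deadlock` and the dictionary-based data model are assumptions. Reading the "programs" as path-setup packets and the "resources" as the switches they reserve is likewise an interpretive assumption.

```python
def find_deadlock(holds, waits_for):
    """Return a list of programs forming a wait cycle, or None.

    holds:     maps each resource to the program currently holding it
    waits_for: maps each waiting program to the resource it is waiting for
    """
    for start in waits_for:
        chain = [start]
        current = start
        while current in waits_for:
            owner = holds.get(waits_for[current])
            if owner is None:
                break  # the requested resource is free, so no one is blocked
            if owner in chain:
                # The chain of waits has closed on itself: deadlock.
                return chain[chain.index(owner):]
            chain.append(owner)
            current = owner
    return None


# The scenario from the slide: P1 holds A and wants B, P2 holds B and wants A.
holds = {"A": "P1", "B": "P2"}
waits_for = {"P1": "B", "P2": "A"}
cycle = find_deadlock(holds, waits_for)
```

Here `cycle` contains both programs, since each waits for a resource the other holds. If only P1 were waiting, the function would return `None`.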

DESIGN ANALYSIS AND OPTIMIZATION • Optimizing Message Size • Large messages → Link utilization is compromised and serialization latency is increased. • Small messages → The relative overhead of the path-setup latency becomes too large and efficiency is degraded.

DESIGN ANALYSIS AND OPTIMIZATION • Optimizing Message Size • The optimal DMA block size for transactions over the photonic NoC ranges between 4 and 16 Kbytes
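
The small-message side of this trade-off can be illustrated with a simple overhead model. This is a rough sketch and not the simulation behind the 4 to 16 Kbyte result: the function `transfer_efficiency` and the 10 ns setup latency used in the example are hypothetical, and the model does not capture the penalty of large messages (blocking of other traffic and longer serialization).

```python
def transfer_efficiency(message_bytes, setup_ns, bandwidth_gbps):
    """Fraction of the total transfer time spent moving payload.

    message_bytes:  payload size in bytes
    setup_ns:       path-setup latency in nanoseconds (hypothetical input)
    bandwidth_gbps: optical path bandwidth in Gbit/s
    """
    # bits / (Gbit/s) gives nanoseconds
    transmit_ns = message_bytes * 8 / bandwidth_gbps
    return transmit_ns / (setup_ns + transmit_ns)


# Illustrative sweep with an assumed 10 ns setup latency and a 960 Gbps path.
sizes = [256, 1024, 4096, 16384]
efficiencies = [transfer_efficiency(s, 10.0, 960.0) for s in sizes]
```

Under these assumed inputs the efficiency rises steadily with message size, which shows why very small messages are dominated by path-setup overhead.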

DESIGN ANALYSIS AND OPTIMIZATION • Increasing Path Multiplicity

DESIGN ANALYSIS AND OPTIMIZATION • Evaluating Path-setup Procedures • Reductions in path-setup latency translate to improved efficiency of the network interfaces and to higher average bandwidth. • tq is a major contributor to the overall setup latency • Techniques to reduce tq are discussed, for example immediately dropping any path-setup packet that is blocked instead of buffering it

COMPARATIVE POWER ANALYSIS • Power Analysis → Power is the main motivation for the design of a photonic NoC • To evaluate the power benefit, a comparative high-level power analysis is performed • Conditions of the power analysis • Same bandwidth and same number of processing cores • Assume: 22 nm CMOS technology, hosting 36 processing cores, each requiring a peak bandwidth of 800 Gbps and an average bandwidth of 512 Gbps • Assume: uniform traffic model, mesh topology, XY dimension-order routing

COMPARATIVE POWER ANALYSIS • Reference Electronic NoC • Energy-consuming operations at each hop: • Reading from a buffer (high bandwidth requires wide parallel links), • Traversing the router's internal crossbar, • Transmission across the interrouter link, • Writing to a buffer in the subsequent router, and • Triggering an arbitration decision.

COMPARATIVE POWER ANALYSIS • Proposed Photonic NoC • 1. The photonic data-transfer network (6 X 6 CMP) • Path multiplicity factor: 2 → 12 X 12 photonic mesh (576 PSEs) • Power of a PSE: ON state → 10 mW, OFF state → no dissipation • Total power consumption (statistical estimate) • 2. The electronic control network (6 X 6 CMP) • Each photonic message is accompanied by two 32-bit control packets, and the typical size of a message is 2 Kbytes.
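
The counts on this slide can be reproduced with simple arithmetic. The value of four PSEs per switch is inferred from 576 PSEs over 144 switches, and 1 Kbyte is taken as 1,024 bytes; both are assumptions of this sketch. The all-ON power figure is only an upper bound, since the slide states that the actual total is a statistical estimate.

```python
CORES_PER_SIDE = 6
PATH_MULTIPLICITY = 2
PSE_PER_SWITCH = 4        # inferred: 576 PSEs / 144 switches (assumption)
PSE_ON_POWER_MW = 10      # ON-state power of one PSE, from the slide

# 6 x 6 cores with path multiplicity 2 give a 12 x 12 switch matrix.
switches_per_side = CORES_PER_SIDE * PATH_MULTIPLICITY
num_switches = switches_per_side ** 2
num_pses = num_switches * PSE_PER_SWITCH

# Upper bound only: every PSE switched ON at once, which does not
# happen in practice because most PSEs are OFF at any given time.
max_pse_power_w = num_pses * PSE_ON_POWER_MW / 1000

# Control overhead: two 32-bit packets per 2-Kbyte photonic message.
control_bits = 2 * 32
message_bits = 2 * 1024 * 8
control_overhead = control_bits / message_bits
```

The control traffic amounts to 64 bits per 16,384-bit message, or about 0.4 percent of the payload, which is why the electronic control network can remain lightweight.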

COMPARATIVE POWER ANALYSIS • Proposed Photonic NoC • 3. The electro-optical interfaces (modulators and receivers) • 960 Gbps bandwidth → 40 Gbps X 24 wavelengths → 24 modulators and receivers are required • For silicon ring-resonator modulators and SiGe photodetectors, the energy is estimated to decrease to about 0.2 pJ/bit in the next 8-10 years • Supplementary circuits that are usually required to implement optical receivers (CDR, serializer, etc.) are not needed in an ultrashort link in which the modulation rate is equal to the chip clock rate • The off-chip laser sources consume an estimated power of 10 mW per wavelength. Although a large number of lasers are required to exploit the bandwidth potential of the optical NoC, their power is dissipated off-chip and does not contribute to the chip power density
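
The interface figures can be combined into a back-of-the-envelope power estimate. This is a sketch under stated assumptions and not the paper's own calculation: it assumes the 0.2 pJ/bit figure covers the whole modulator-plus-receiver link and that every core runs at the 512 Gbps average bandwidth quoted earlier in the deck.

```python
WAVELENGTHS = 24
RATE_PER_WAVELENGTH_GBPS = 40
ENERGY_PJ_PER_BIT = 0.2        # projected energy per bit, from the slide
LASER_MW_PER_WAVELENGTH = 10   # off-chip laser power, from the slide
CORES = 36
AVERAGE_BW_GBPS = 512          # average per-core bandwidth from the deck

# Aggregate bandwidth of one gateway: 24 wavelengths at 40 Gbps each.
gateway_bw_gbps = WAVELENGTHS * RATE_PER_WAVELENGTH_GBPS

# Gbit/s multiplied by pJ/bit yields milliwatts.
interface_power_mw = AVERAGE_BW_GBPS * ENERGY_PJ_PER_BIT
total_interface_power_w = CORES * interface_power_mw / 1000

# Laser power for one set of 24 wavelengths, dissipated off-chip.
laser_power_mw = WAVELENGTHS * LASER_MW_PER_WAVELENGTH
```

Under these assumptions each interface dissipates roughly 0.1 W and all 36 together roughly 3.7 W on-chip, while the 240 mW per wavelength set for the lasers stays off-chip.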

CONCLUSION • The motivation behind our work • 1. Multicore processors are stepping into an era where high-bandwidth communication between large numbers of cores is a key driver of computing performance. • 2. Power dissipation has clearly become the limiting factor in the design of high-performance microprocessors. • 3. Recent breakthroughs in the field of silicon photonics suggest that the integration of optical elements with CMOS electronics is likely to become viable in the near future. • This paper aims at laying the groundwork for future research progress by providing a complete discussion of the fundamental issues that need to be addressed to design a photonic NoC for CMPs.
