Déjà Vu Switching for Multiplane NoCs

Ahmed Abousamra, Rami Melhem, Alex Jones. University of Pittsburgh.


Presentation Transcript


  1. Déjà Vu Switching for Multiplane NoCs • Ahmed Abousamra, Rami Melhem, Alex Jones • University of Pittsburgh

  2. Motivation • Power efficiency has become a primary concern in the design of CMPs. • The NoC of Intel’s TeraFLOPS processor consumes more than 28% of the chip’s power. • Network messages can be classified into critical and non-critical. • It may be possible to send non-critical messages on a slower plane without hurting performance.

  3. Setting • Baseline: single-plane NoC • Control plane: carries control and coherence messages (data requests, invalidates, acknowledgments, …); operates at the baseline’s voltage & frequency • Data plane: carries data messages; operates at a lower voltage & frequency to save power

  4. Outline • Motivation & Related work • Importance of data messages to performance • The Optimization Problem • Proposed Solution • Déjà Vu Switching • Analysis of the acceptable data plane speed • Evaluation • Summary

  5. Importance of data messages to performance • [Figure: a cache line having 8 data words (1 to 8), with word 5 as the critical word. The data message carries a header, then the critical word, then the non-critical words: 5, 1, 2, 3, 4, 6, 7, 8.] • Subsequent miss for another word in the line → delayed cache hit
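The word ordering on this slide can be illustrated with a minimal sketch. The function name `critical_word_first` and the use of a plain list for the cache line are assumptions made for illustration; the slides do not give an implementation.

```python
def critical_word_first(words, critical_index):
    """Return the words of a cache line with the critical word moved to the
    front; the remaining words keep their original relative order."""
    if not 0 <= critical_index < len(words):
        raise IndexError("critical_index out of range")
    return (
        [words[critical_index]]
        + words[:critical_index]
        + words[critical_index + 1:]
    )


# The 8-word line from the slide; word 5 sits at index 4.
line = [1, 2, 3, 4, 5, 6, 7, 8]
payload = critical_word_first(line, 4)
print(payload)  # [5, 1, 2, 3, 4, 6, 7, 8]
```

A later miss to any other word of the same line has to wait for the rest of this payload, which is the delayed cache hit the slide refers to.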

  6. Importance of data messages to performance • If delayed cache hits are overly delayed, performance can suffer. • [Chart: percentage of L1 misses that are delayed cache hits]

  7. Outline • Motivation & Related work • Importance of data messages to performance • The Optimization Problem • Proposed Solution • Déjà Vu Switching • Analysis of the acceptable data plane speed • Evaluation • Summary

  8. The Optimization Problem • How slow can we operate the data plane to maximize energy savings while not impacting performance?

  9. Proposed Solution • Split the NoC physically into 2 planes: control + data • On the data plane: • Use circuit switching to speed up communication. • Reduce voltage & frequency. • Send a control message to establish the circuit once a cache hit is detected. • Do not block the circuit establishment message: Déjà Vu Switching • Analyze the acceptable slowdown of the data plane to minimize energy while maintaining performance.

  10. Déjà Vu Switching – Circuit Switching • [Figure: control plane and data plane; legend: data packet, request packet, reservation packet] • 1) Upon a cache miss, a data request is sent.

  11. Déjà Vu Switching – Circuit Switching • [Figure: control plane and data plane; legend: data packet, request packet, reservation packet] • 2) Upon a data hit in the next cache level, the circuit reservation packet is sent.

  12. Déjà Vu Switching – Circuit Switching • [Figure: control plane and data plane; legend: data packet, request packet, reservation packet] • 3) Finally, the data packet is sent to the requester on the reserved circuit.

  13. Déjà Vu Switching – Conflicting Reservations • [Figure: a router shown on both planes. Control plane: reservation packets 1, 2, 3. Connections: West-East, West-North, East-North. Data plane: reserved circuits 1, 2, 3 and the data packets.]

  14. Déjà Vu Switching – Reservation Scheme • Reserving the West-East connection: [Figure: the reservation queue of the West input port holds E; the reservation queue of the East output port holds W; the heads of the queues are marked] • Port names: North, West, South, East, Local

  15. Déjà Vu Switching – Realizing Connections • [Figure: reservation queues of the input ports and of the output ports (North, West, South, East, Local), with the heads of the queues marked] • Port names: North, West, South, East, Local

  16. Déjà Vu Switching – Realizing Connections • [Figure: reservation queues of the input ports and of the output ports (North, West, South, East, Local), with the heads of the queues marked] • Port names: North, West, South, East, Local

  17. Déjà Vu Switching – Realizing Connections • [Figure: reservation queues of the input ports and of the output ports (North, West, South, East, Local), with the heads of the queues marked] • Port names: North, West, South, East, Local
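The reservation queues on slides 14 to 17 can be sketched as follows. This is a rough model under stated assumptions, not the authors' router design: it assumes each input port queues the output ports it must connect to, each output port queues the input ports it must connect to, and a connection is realized only when the two queue heads name each other. The class `Router` and its methods are hypothetical names, and the order of the three reservations is assumed.

```python
from collections import deque

PORTS = ["N", "W", "S", "E", "L"]  # North, West, South, East, Local


class Router:
    """Hypothetical model of per-port reservation queues on the data plane."""

    def __init__(self):
        self.in_q = {p: deque() for p in PORTS}   # input port -> output ports
        self.out_q = {p: deque() for p in PORTS}  # output port -> input ports

    def reserve(self, in_port, out_port):
        # A reservation packet records the connection at both ends.
        self.in_q[in_port].append(out_port)
        self.out_q[out_port].append(in_port)

    def realizable(self):
        """Connections whose input-queue and output-queue heads match."""
        conns = []
        for i in PORTS:
            if self.in_q[i]:
                o = self.in_q[i][0]
                if self.out_q[o][0] == i:
                    conns.append((i, o))
        return conns

    def release(self, in_port, out_port):
        # Called once the data packet has used the connection.
        assert self.in_q[in_port][0] == out_port
        assert self.out_q[out_port][0] == in_port
        self.in_q[in_port].popleft()
        self.out_q[out_port].popleft()


r = Router()
for conn in [("W", "E"), ("W", "N"), ("E", "N")]:  # assumed arrival order
    r.reserve(*conn)

print(r.realizable())  # [('W', 'E')]
r.release("W", "E")
print(r.realizable())  # [('W', 'N')]
r.release("W", "N")
print(r.realizable())  # [('E', 'N')]
```

In this model a conflicting reservation is never rejected. It waits in the queues until the connections ahead of it have been used, which matches the summary's point that reservations can proceed when resources are not currently available.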

  18. Déjà Vu Switching – Ensuring Correct Routing • Each input and output port must independently track the reserved circuits it is part of. • Any two reservation packets that share part of their paths must traverse all the shared links in the same order. • Data packets must be injected onto the data plane in the same order in which their reservation packets are injected onto the control plane.
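The second condition on this slide can be expressed as a simple check. The sketch below is illustrative only: the function `consistent_order`, the link names, and the packet ids are hypothetical, and the input is assumed to be a record of the order in which reservation packets crossed each link.

```python
from itertools import combinations


def consistent_order(link_orders):
    """link_orders maps a link name to the list of reservation packet ids
    in the order they traversed that link. Returns True if every pair of
    packets keeps the same relative order on all links they share."""
    position = {
        link: {pkt: i for i, pkt in enumerate(order)}
        for link, order in link_orders.items()
    }
    packets = sorted({pkt for order in link_orders.values() for pkt in order})
    for a, b in combinations(packets, 2):
        # Relative order of a and b on each link that carries both.
        relations = {
            pos[a] < pos[b]
            for pos in position.values()
            if a in pos and b in pos
        }
        if len(relations) > 1:
            return False
    return True


print(consistent_order({"r0->r1": [1, 2], "r1->r2": [1, 2, 3]}))  # True
print(consistent_order({"r0->r1": [1, 2], "r1->r2": [2, 1]}))     # False
```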

  19. Outline • Motivation & Related work • Importance of data messages to performance • The Optimization Problem • Proposed Solution • Déjà Vu Switching • Analysis of the acceptable data plane speed • Evaluation • Summary

  20. Reservations should always be ahead of data packets • [Timeline figure: time axis in cycles, 1 to 13; reservation packet injected early at cycle 1; data packet injected at cycle 3]

  21. Reservations should always be ahead of data packets • [Timeline figure: time axis in cycles, 1 to 13; reservation packet injected early at cycle 1; data packet injected at cycle 3]
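The requirement that a reservation stays ahead of its data packet can be checked hop by hop. The sketch below is a simplification and not the authors' analysis: it assumes an unloaded network, a fixed per-hop latency on each plane, and all times measured in control-plane cycles. The function name and the per-hop values are hypothetical; only the injection cycles (1 and 3) come from the slide.

```python
def reservation_stays_ahead(res_inject, data_inject, hops,
                            res_cycles_per_hop, data_cycles_per_hop):
    """True if the reservation packet reaches every router on the path
    strictly before the data packet does (fixed per-hop latencies)."""
    for hop in range(hops + 1):
        res_arrival = res_inject + hop * res_cycles_per_hop
        data_arrival = data_inject + hop * data_cycles_per_hop
        if data_arrival <= res_arrival:
            return False
    return True


# Reservation injected at cycle 1, data at cycle 3, as on the slide.
# Assumed: the reservation needs 3 cycles per hop on the control plane.
print(reservation_stays_ahead(1, 3, 12, 3, 1))  # False: data catches up at hop 1
print(reservation_stays_ahead(1, 3, 12, 3, 4))  # True: data never catches up
```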

  22. Acceptable data plane slowdown - Message Latency

  23. Outline • Motivation & Related work • Importance of data messages to performance • The Optimization Problem • Proposed Solution • Déjà Vu Switching • Analysis of the acceptable data plane speed • Evaluation • Summary

  24. Evaluation Environment • We use the functional simulator Simics to simulate cache-coherent CMPs of 16 and 64 cores. • We use Orion2 to get power numbers for the interconnect routers. • We evaluate with: • Synthetic traces, which allow varying the network load • Execution-driven simulation of parallel benchmarks from the SPLASH-2 and PARSEC suites • Interconnect topology: mesh

  25. Evaluation Environment • Baseline NoC: • Single plane, 16-byte links, packet switched with a 3-cycle router pipeline, clocked at 4 GHz • Evaluated NoC: • Control plane: 6-byte links, packet switched, 4 GHz • Data plane: 10-byte links, circuit switched • Control and coherence packets: 1 flit • Data packets: • Baseline NoC: 5 flits • Data plane: 7 flits
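The flit counts on this slide are consistent with a simple calculation, sketched below. The 64-byte cache line and the split into one header flit on the baseline and none on the circuit-switched data plane are assumptions chosen to match the slide's numbers; the slides state neither. The function `flits` is a hypothetical helper.

```python
import math


def flits(payload_bytes, link_bytes, header_flits=0):
    """Number of flits needed to carry a payload over links of a given width."""
    return math.ceil(payload_bytes / link_bytes) + header_flits


LINE_BYTES = 64  # assumption: 8 data words of 8 bytes each

# Baseline: 16-byte links, assumed one header flit for packet switching.
print(flits(LINE_BYTES, 16, header_flits=1))  # 5
# Data plane: 10-byte links, assumed no separate header flit on a reserved circuit.
print(flits(LINE_BYTES, 10))  # 7
```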

  26. Evaluation – Synthetic Traces Normalized NoC energy and completion time of synthetic traces on a 64-core CMP *Each request receives a reply data packet

  27. Evaluation – Execution Driven Simulation Normalized execution time on a 16-core CMP

  28. Evaluation – Execution Driven Simulation Normalized NoC energy on a 16-core CMP

  29. Evaluation – Execution Driven Simulation Normalized NoC energy and execution time on a 64-core CMP

  30. Evaluation – Isolating Effect of Déjà Vu Switching Performance relative to CMP with a single plane NoC on a 16-core CMP

  31. Related Work • L. Cheng et al. (ISCA'06) and A. Flores et al. (IEEE Trans. Computers'10): heterogeneous NoC using wires of different latency and power characteristics to improve performance and reduce NoC energy. • Their proposal requires wide links (75 bytes); performance degrades with narrow links. • Our work differs in: • Tying the latency of messages to performance • Using Déjà Vu Switching to compensate for the slower data plane

  32. Summary • Problem: saving power in the NoC by reducing the data plane’s power consumption without impacting performance. • Delayed cache hits are important to performance. • Operating the data plane in circuit-switched mode allows it to operate at a reduced frequency. • Déjà Vu Switching allows reservations to proceed when resources are not currently available. • The constraints governing the speed of the data plane can be estimated analytically.

  33. Thank you!
