
High-Bandwidth Packet Switching on the Raw General-Purpose Architecture

Presentation Transcript


  1. High-Bandwidth Packet Switching on the Raw General-Purpose Architecture Gleb Chuvpilo Saman Amarasinghe MIT LCS Computer Architecture Group September 19, 2002

  2. Talk at a Glance • Motivation • Architecture of Internet Routers • Raw Processor Overview • Raw Router Architecture • Switch Fabric Design • Distributed Scheduling Algorithm • Results and Analysis • Future Work and Conclusion

  3. We are on… Motivation

  4. Motivation • Build a fast IP router on a general-purpose architecture. Why? • Flexibility → new protocols and services • Price → economies of scale

  5. We are on… Architecture of Internet Routers

  6. Architecture of Internet Routers [Figure: block diagram with a Network Processor, four Forwarding Engines, four Interfaces, and a Switch Fabric]

  7. Switch Fabric

  8. Click Modular Router

  9. We are on… Raw Processor Overview

  10. Raw Processor Overview • 16 MIPS-like tiles on a single die • 2 Megabytes of SRAM on-chip • Over a thousand signal I/O pins • Over 200 Gbps of external chip bandwidth • Scalable to thousands of tiles!

  11. Raw Layout

  12. Raw Communication Mechanisms • Two static networks • Two dynamic networks

  13. Raw Static Networks • Destinations known at compile time • Message size known at compile time • Cycle-by-cycle switch schedule • Three-cycle nearest neighbor send-to-use latency • No processing overhead
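A minimal Python sketch of what a cycle-by-cycle static schedule means in practice. It assumes a switch with ports named N, S, E, W and P (the local processor); the schedule format, SCHEDULE, and run_schedule are hypothetical illustrations, not Raw's switch instruction set. Because every route is fixed before the program runs, words carry no header and the switch makes no per-word routing decision.

```python
from collections import deque

# Hypothetical compile-time schedule: one entry per cycle, each entry a
# list of (input_port, output_port) routes the switch performs that cycle.
SCHEDULE = [
    [("P", "E")],              # cycle 0: processor -> east neighbor
    [("W", "P"), ("P", "E")],  # cycle 1: west -> processor, processor -> east
    [("W", "S")],              # cycle 2: west -> south
]

def run_schedule(schedule, inputs):
    """Replay a static schedule over per-port input queues.

    inputs maps a port name to a deque of words waiting on that port.
    Returns a dict mapping each output port to the words it sent, in order.
    No word carries a header: its destination is decided entirely by the
    cycle in which it is consumed.
    """
    outputs = {}
    for routes in schedule:
        for src, dst in routes:
            # A real switch would stall here until the word arrives;
            # this sketch assumes it is already queued.
            word = inputs[src].popleft()
            outputs.setdefault(dst, []).append(word)
    return outputs

inputs = {"P": deque([10, 11]), "W": deque([20, 21])}
print(run_schedule(SCHEDULE, inputs))  # {'E': [10, 11], 'P': [20], 'S': [21]}
```

Note that the P input queue and the P output are distinct: cycle 1 both delivers a word to the processor and sends one from it.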

  14. Static Network: Send

  15. Static Network: Receive

  16. Raw Dynamic Networks • Unpredictable events • External asynchronous interrupts • Cache misses • 15- to 30-cycle nearest neighbor send-to-use latency (message header processing overhead)
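For contrast, a dynamic-network message carries a header that is examined at every hop. A minimal sketch, assuming tiles are addressed by (x, y) coordinates on a mesh and that routers use dimension-ordered routing (correct x first, then y); next_hop and route are hypothetical names. Re-reading the header at each tile is the kind of per-message work the slide attributes the higher latency to.

```python
def next_hop(cur, dest):
    """Return the direction a header-routed message takes from tile cur.

    cur and dest are (x, y) tile coordinates, with y growing southward.
    The router corrects x first, then y (dimension-ordered routing);
    "P" means the message has arrived and goes to the local processor.
    """
    (cx, cy), (dx, dy) = cur, dest
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "S"
    if dy < cy:
        return "N"
    return "P"

def route(src, dest):
    """List every hop from src to dest, re-reading the header at each tile."""
    step = {"E": (1, 0), "W": (-1, 0), "S": (0, 1), "N": (0, -1)}
    hops, cur = [], src
    while (d := next_hop(cur, dest)) != "P":
        hops.append(d)
        cur = (cur[0] + step[d][0], cur[1] + step[d][1])
    return hops

print(route((0, 0), (2, 1)))  # ['E', 'E', 'S']
```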

  17. Raw is Good for Streaming

  18. We are on… Raw Router Architecture

  19. Given: Four Networks… [Figure: four networks, labeled 1 to 4]

  20. … and Sixteen Tiles:

  21. Problem: Mapping? [Figure: dynamic communication to be mapped onto the static interconnect]

  22. Solution: Rotating Crossbar [Figure: crossbar with input ports In 0 to In 3 and output ports Out 0 to Out 3]

  23. We are on… Switch Fabric Design

  24. Rotating Crossbar Highlights • The idea of a Token Ring network → absolute fairness • The algorithm uses the two static networks; the dynamic networks are idle • All deadlock-free configurations are scheduled at compile time • Four headers and the token location define a global configuration • The global configuration is computed in a distributed manner at run time
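The fairness that the token provides can be illustrated with a simplified arbiter. This is a minimal sketch, not the Raw Router's crossbar algorithm: it assumes each of the four inputs presents a header naming one output port (or None when idle), and that inputs are served in rotating priority starting from the token holder. It computes grants on the fly rather than looking up precompiled configurations, and the names grant and rounds are hypothetical.

```python
NUM_PORTS = 4

def grant(headers, token):
    """Decide which inputs may send this round.

    headers[i] is the output port requested by input i, or None if idle.
    Inputs are considered in the order token, token+1, ... (mod 4), so the
    token holder always wins any output it asks for. Returns a dict
    mapping each granted input to its output.
    """
    granted, busy = {}, set()
    for k in range(NUM_PORTS):
        i = (token + k) % NUM_PORTS
        out = headers[i]
        if out is not None and out not in busy:
            granted[i] = out
            busy.add(out)
    return granted

def rounds(headers, token, n):
    """Run n rounds with the same requests, passing the token each round."""
    for _ in range(n):
        yield token, grant(headers, token)
        token = (token + 1) % NUM_PORTS

# All four inputs want output 2: the winner rotates with the token.
for tok, g in rounds([2, 2, 2, 2], token=0, n=4):
    print(tok, g)  # prints 0 {0: 2}, then 1 {1: 2}, 2 {2: 2}, 3 {3: 2}
```

With identical, conflicting requests, each input wins the contested output exactly once every four rounds, so no input can be starved.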

  25. Rotating Crossbar Illustrated

  26. Rotating Crossbar Illustrated

  27. Phases of the Algorithm [Figure: phases divided between the TILE PROCESSOR and the SWITCH PROCESSOR: headers_request, headers, send_prev_config, choose_new_config, route_body, confirm, update_token]
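One way to read the phase list is as a fixed cycle that every crossbar tile repeats each round. The sketch below assumes the phases run in the order listed on the slide and that update_token hands the token to the next port; it does not model which phases run on the tile processor versus the switch processor, and Phase and run_rounds are hypothetical names.

```python
from enum import Enum

class Phase(Enum):
    """Phases in the order they appear on the slide (order assumed)."""
    HEADERS_REQUEST = 0
    HEADERS = 1
    SEND_PREV_CONFIG = 2
    CHOOSE_NEW_CONFIG = 3
    ROUTE_BODY = 4
    CONFIRM = 5
    UPDATE_TOKEN = 6

def run_rounds(n_rounds, token=0, num_ports=4):
    """Step through the phase cycle n_rounds times and return a trace.

    Each trace entry is (round, phase name, token after the phase).
    Only UPDATE_TOKEN changes state here: it passes the token to the
    next port, so priority rotates once per round.
    """
    trace = []
    for r in range(n_rounds):
        for phase in Phase:  # Enum iterates in definition order
            if phase is Phase.UPDATE_TOKEN:
                token = (token + 1) % num_ports
            trace.append((r, phase.name.lower(), token))
    return trace

for entry in run_rounds(1):
    print(entry)
```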

  28. We are on… Distributed Scheduling Algorithm

  29. Configuration Space • Let’s enumerate the number of configurations: $\mathrm{SPACE} = |Hdr_0| \times |Hdr_1| \times |Hdr_2| \times |Hdr_3| \times |Token|$, where $|Hdr_0| = \dots = |Hdr_3| = 5$ and $|Token| = 4$ → therefore $\mathrm{SPACE} = 5^4 \times 4 = 2{,}500$ distinct configurations
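The count can be checked by brute-force enumeration. A minimal sketch: it assumes each header takes one of five values, modeled here as the four output ports plus None for "no packet" (the slide gives only the count 5, not what the values are), and that the token is held by one of four ports. HEADER_VALUES and TOKEN_VALUES are hypothetical names.

```python
from itertools import product

# Assumption: a header names one of 4 output ports, or None when idle,
# giving the 5 values per header stated on the slide.
HEADER_VALUES = [0, 1, 2, 3, None]
TOKEN_VALUES = range(4)  # the token sits at one of the 4 ports

configs = list(product(HEADER_VALUES, HEADER_VALUES,
                       HEADER_VALUES, HEADER_VALUES, TOKEN_VALUES))
print(len(configs))  # 2500
```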

  30. So What? • Each tile has 8,192 words of instruction memory, and so does each switch → 8,192 / 2,500 ≈ 3.3 instructions per configuration → not enough! → need to use off-chip memory → slow! → need to minimize SPACE
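The budget arithmetic, as a quick check. It assumes, as the slide does, that each configuration needs its own code and that the instruction memory is shared evenly among them.

```python
IMEM_WORDS = 8192  # instruction memory words per tile (and per switch), from the slide
CONFIGS = 2500     # size of the unminimized configuration space

per_config = IMEM_WORDS / CONFIGS
print(f"{per_config:.1f} instructions per configuration")  # 3.3 instructions per configuration
```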

  31. Minimization [Figure: crossbar connections labeled in, out, cwprev, cwnext, ccwprev, ccwnext]

  32. Clients and Servers of a Crossbar Processor

  33. Outcome of Minimization • We cut the number of configurations by a factor of about 78 (from 2,500 to 32)! Now there are only 32 entries! → the program can fit in the local instruction memory!
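The reduction factor and the resulting per-entry budget follow directly from the figures on the slides. The per-entry budget assumes the 8,192-word instruction memory is divided evenly among the 32 entries; BEFORE, AFTER, and IMEM_WORDS are illustrative names.

```python
BEFORE, AFTER = 2500, 32  # configuration counts before and after minimization
IMEM_WORDS = 8192         # instruction memory words per tile, from slide 30

print(BEFORE / AFTER)       # 78.125, the "78 times" on the slide
print(IMEM_WORDS // AFTER)  # 256 instruction words available per entry
```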

  34. We are on… Results and Analysis

  35. Implementation • The Raw Router was tested in a cycle-accurate simulator of the Raw processor • The Raw prototype clock speed is assumed to be 250 MHz • The research focuses on the switch fabric, NOT on route lookup, etc.
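At the assumed 250 MHz clock, the network latencies quoted earlier (3 cycles for the static networks, 15 to 30 cycles for the dynamic networks) translate into wall-clock time as follows. This is only a unit conversion of figures already on the slides; cycles_to_ns is a hypothetical helper name.

```python
CLOCK_HZ = 250_000_000  # assumed Raw prototype clock speed, from the slide

def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1_000_000_000 / clock_hz

print(cycles_to_ns(3))                     # static network: 12.0
print(cycles_to_ns(15), cycles_to_ns(30))  # dynamic network: 60.0 120.0
```

Each cycle is 4 ns at this clock, so the static networks' nearest-neighbor latency is five to ten times lower than that of the dynamic networks.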

  36. Peak Throughput

  37. Average Throughput

  38. We are on… Future Work and Conclusion

  39. Future Work • Take advantage of dynamic networks • Implement IP route lookup • Add computation on data (encryption) • Add support for multicast traffic • Implement Quality of Service • Add virtual output queueing • Explore larger router configurations

  40. Conclusion • Implemented a gigabit switch on Raw • Mapped dynamic communication to a static interconnect • Can intermix switch fabric with computation • High-bandwidth I/O allows performance comparable to that of custom ASIC processors

  41. Questions?
