Switch EECS 252 – Spring 2006 RAMP Blue Project

Presentation Transcript


  1. Switch EECS 252 – Spring 2006 RAMP Blue Project Jue Sun and Gary Voronel Electrical Engineering and Computer Sciences University of California, Berkeley May 1, 2006

  2. Outline • Goal of switch • Implementation • Performance • Future implementation • Current state of project • Project experience CS252-s06, Project Presentation

  3. One Piece of the Puzzle • Main goal of RAMP Blue is to build a large-scale system • To do useful work, processors must be able to communicate • Therefore, we need an interconnection network CS252-s06, Project Presentation

  4. Implementation Goals • Support communication between all processors in the system • Flexible hardware allowing parameterization of global system constants, especially the number of Microblaze cores per FPGA • Minimal resource utilization • High throughput • Low latency • Simple, homogeneous hardware • Simple software interface CS252-s06, Project Presentation

  5. Hardware Design Constraints • RAMP Blue will be implemented on the BEE2 • 4 user FPGAs per BEE2 board • 2 LVCMOS links for FPGA-to-FPGA communication • Relatively low latency (2 or 3 cycles) • Throughput: more than 64 bits • 16 MGT links per board (4 per FPGA) for board-to-board communication • Relatively high latency (20 or more cycles) • Throughput: 32 or 64 bits • To achieve the lowest latency possible, we limit packet routes to at most 1 MGT link • 16 Microblaze cores per FPGA (64 per board) • Depending on resource utilization, the number of cores per FPGA may need to be reduced CS252-s06, Project Presentation

  6. Physical Topology • Topology is fixed and homogeneous throughout the system • Each FPGA is directly connected to 2 other FPGAs on the same board and to 4 other boards • The number of cores per FPGA is the same on every FPGA • Each board has a direct connection to every other board in the system (maximum of 17 boards) • BOARD n connects to BOARD 16 through MGT n • With 16 cores per FPGA, 17 boards support 1088 processors! CS252-s06, Project Presentation

  7. Board Level Connectivity CS252-s06, Project Presentation

  8. FPGA Level Connectivity For clarity, configuration shown is with 4 Microblaze cores per FPGA CS252-s06, Project Presentation

  9. Switch Fabric Specifications • Crossbar switch with maximal connectivity • Every Microblaze can access every other Microblaze on the same FPGA directly • Every Microblaze can access both LVCMOS links • Every Microblaze can access all FPGA-local MGT links • Buffering on inputs and outputs • Store-and-forward buffers for Microblazes to decrease complexity and simplify the software interface • Cut-through buffers for LVCMOS links • MGT links are wrapped XAUI cores that already have internal buffers CS252-s06, Project Presentation

  10. Microblaze Level Connectivity For clarity, configuration shown is with 4 Microblaze cores per FPGA CS252-s06, Project Presentation

  11. Switch Overall CS252-s06, Project Presentation

  12. Scheduler • If two ports want to send to the same port at the same time, the requesting port with the lowest number is allowed to send first • Other control logic (not shown here) implements the protocol between the switch and the buffers CS252-s06, Project Presentation
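
A minimal C model of this fixed-priority arbitration is sketched below. The real scheduler is hardware; the port count, types, and function name here are assumptions used only for illustration.

    #include <stdint.h>

    #define NUM_PORTS 22  /* e.g. 16 MB + 4 MGT + 2 LVCMOS ports on one FPGA */

    /* request[i] is nonzero when input port i wants the contested output port */
    int arbitrate(const uint8_t request[NUM_PORTS])
    {
        for (int i = 0; i < NUM_PORTS; i++) {
            if (request[i])
                return i;        /* lowest-numbered requester is granted first */
        }
        return -1;               /* no requests this cycle */
    }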

  21. Source Routing • Fixed topology allows for straightforward source routing implementation • Destination routing would be more robust, but would require significantly more resources and greater complexity • Packet header is extremely simple: just a concatenated sequence of hops • Minimal hardware required to determine next hop and adjust the header at every hop (zero LUTs used – can’t get better than that!) • The next hop is encoded in the lowest bits of the header • To adjust the header, the hardware must simply shift out the lowest bits CS252-s06, Project Presentation

  22. Source Routing – Hop Encoding • Need 5 bits to represent each hop • Must be able to encode 16 cores per FPGA + 4 MGT links + 2 LVCMOS links = 22 total encodings (+ 1 for a FIN code) • If 8 or fewer cores per FPGA are used, then each hop can be represented using only 4 bits (the hardware supports parameterization of the hop encoding width) • Maximum of 6 hops based on the physical topology • MGT links are constrained to 1 hop per route • Therefore, the worst-case route is: LVCMOS → LVCMOS → MGT → LVCMOS → LVCMOS → MB • Hop encoding allows the header to fit into 1 word • 6 hops x 5 bits/hop = 30 bits CS252-s06, Project Presentation
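
The next-hop extraction and header adjustment described above can be sketched in C as follows. This is a software model only: HOP_BITS follows the 5-bit encoding on this slide, and the function names are hypothetical.

    #include <stdint.h>

    #define HOP_BITS 5                      /* 22 port codes + FIN fit in 5 bits */
    #define HOP_MASK ((1u << HOP_BITS) - 1)

    static inline unsigned next_hop(uint32_t header)
    {
        return header & HOP_MASK;           /* port to forward the packet to */
    }

    static inline uint32_t consume_hop(uint32_t header)
    {
        return header >> HOP_BITS;          /* shift the completed hop out */
    }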

  24. Source Routing – Global Naming • Processors are globally named • Necessary to reach the goal of a simple software interface • If there are 16 cores per FPGA with 4 FPGAs per board and 17 total boards, then the processors are numbered 0 – 1087 • The naming scheme scales down with fewer cores • Necessary to support parameterization of global system constants (especially the number of cores per FPGA) • If there are 4 cores per FPGA with 4 FPGAs per board and 17 total boards, then the processors are numbered 0 – 271 • An invalid processor number triggers an error at the software level • Again, this supports a simple software interface • Ensures that only packets with valid headers enter the network CS252-s06, Project Presentation
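
A sketch of how a global processor number could be decomposed into (board, FPGA, core) is shown below. The slides fix only the totals (e.g. 0 – 1087 for 16 cores per FPGA, 4 FPGAs per board, 17 boards), so the exact ordering used here is an assumption.

    /* Illustrative decomposition of a global processor number.  Core-major
     * ordering within an FPGA and FPGA-major ordering within a board are
     * assumptions for this sketch. */
    #define CORES_PER_FPGA  16   /* parameterizable global constant */
    #define FPGAS_PER_BOARD  4
    #define NUM_BOARDS      17

    struct proc_id { int board, fpga, core; };

    static int decode_proc(int global, struct proc_id *out)
    {
        int per_board = CORES_PER_FPGA * FPGAS_PER_BOARD;
        if (global < 0 || global >= per_board * NUM_BOARDS)
            return -1;                      /* invalid number: error at software level */
        out->board = global / per_board;
        out->fpga  = (global % per_board) / CORES_PER_FPGA;
        out->core  = global % CORES_PER_FPGA;
        return 0;
    }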

  25. Source Routing Example • For simplicity, let’s assume there are 4 cores per FPGA • Let’s send from processor #10 to processor #24 (representative of the worst-case path) CS252-s06, Project Presentation

  27. Source Routing Example • Destination core is on a different board, so packet must first be routed from the source FPGA (FPGA 2) to the FPGA that is connected to the destination board (which is FPGA 0) • This requires 2 hops over the LEFT LVCMOS link CS252-s06, Project Presentation

  30. Source Routing Example • Once at the proper FPGA, packet can be sent across the MGT link to an FPGA on the destination board CS252-s06, Project Presentation

  32. Source Routing Example • Then, the packet must be routed to the destination FPGA, which requires 2 more LVCMOS hops CS252-s06, Project Presentation

  35. Source Routing Example • Finally, the packet must be forwarded to the destination Microblaze core CS252-s06, Project Presentation

  37. Source Routing Example • Each arrowhead represents a hop – it takes 5 hops to reach the destination FPGA • One more hop sends the packet to the destination Microblaze core, totaling 6 hops in the worst case CS252-s06, Project Presentation
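
The route from this example can be written out as a small worked sketch. The hop codes below are hypothetical (the real encodings are a hardware parameter); with 4 cores per FPGA each hop needs only 4 bits.

    #define EX_HOP_BITS 4   /* 4 cores per FPGA, so 4 bits per hop suffice */

    /* hypothetical port codes on an FPGA-local switch */
    enum { HOP_MB0 = 0, HOP_LVCMOS_LEFT = 4, HOP_MGT0 = 6 };

    /* pack hops so the first hop ends up in the lowest bits of the header */
    static unsigned build_route(const unsigned *hops, int n)
    {
        unsigned header = 0;
        for (int i = n - 1; i >= 0; i--)
            header = (header << EX_HOP_BITS) | hops[i];
        return header;
    }

    /* route from the example: LVCMOS, LVCMOS, MGT, LVCMOS, LVCMOS, MB
     * 6 hops x 4 bits/hop = 24 bits, comfortably inside one 32-bit word */
    static const unsigned example_route[6] = {
        HOP_LVCMOS_LEFT, HOP_LVCMOS_LEFT, HOP_MGT0,
        HOP_LVCMOS_LEFT, HOP_LVCMOS_LEFT, HOP_MB0
    };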

  38. Source Routing – 17th Board • To support the 17th board, each board communicates with the 17th board through the MGT link matching its own board number CS252-s06, Project Presentation

  40. Source Routing – 17th Board • For example, for BOARD 0 to send to BOARD 16, it sends over MGT 0 CS252-s06, Project Presentation

  41. Microblaze Interface • Store and forward • Connected to the FSL bus for now • Essentially double buffered • MB FSL read/write speed is extremely slow compared to the switch delay – even at the highest optimization level with the most efficient code, it takes 48 cycles to write one value to the FSL bus! • Example: send from an MB to an LVCMOS link, loop back over the LVCMOS link, and return to the MB CS252-s06, Project Presentation
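
A minimal sketch of pushing one fixed-length packet to the switch over FSL is shown below, assuming the standard Xilinx MicroBlaze putfsl macro from mb_interface.h, FSL channel 0, and an assumed packet length. Each putfsl is a blocking word write, which is where the 48-cycles-per-word figure above applies.

    #include <mb_interface.h>

    #define PACKET_WORDS 16   /* fixed packet length in 32-bit words (assumed) */

    void send_packet_fsl(const unsigned int *pkt)
    {
        int i;
        for (i = 0; i < PACKET_WORDS; i++)
            putfsl(pkt[i], 0);   /* blocking 32-bit write to FSL channel 0 */
    }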

  42. LVCMOS Interface • 2 cycles of latency • Two buses connecting 2 FPGAs, usable for anything • The control bus and data bus are wired straight through the LVCMOS pins, except that the data_full/free signal is asserted 2 cycles before the buffer is actually full CS252-s06, Project Presentation

  43. XAUI Interface • Much simplified because the XAUI core has an internal buffer • Essentially just some control signals • The interface has recently changed, so this is still in progress CS252-s06, Project Presentation

  44. Software Interface • Simple interface to send and receive data • int send(int src, int dest, byte *buf, int len) • Copies len bytes of buf into the local outgoing Buffer Unit • Constructs the source route from the src MB core to the dest MB core • Blocks until all data is copied • Returns the number of bytes sent or -1 on error • Receive is invoked from an interrupt • int recv(byte *buf, int len) • Copies len bytes into buf from the local incoming Buffer Unit • Blocks until all data is received • Returns the number of bytes received or -1 on error CS252-s06, Project Presentation
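
A usage sketch of this interface is given below; the processor numbers, message size, and wrapper function names are hypothetical.

    typedef unsigned char byte;

    extern int send(int src, int dest, byte *buf, int len);
    extern int recv(byte *buf, int len);

    #define MY_ID 10          /* this core's global processor number (assumed) */

    static byte msg[64];

    void example_send(void)
    {
        if (send(MY_ID, 24, msg, (int)sizeof msg) < 0) {
            /* handle error: invalid destination or failed transfer */
        }
    }

    /* invoked from the receive interrupt handler */
    void example_recv(void)
    {
        static byte in[64];
        if (recv(in, (int)sizeof in) < 0) {
            /* handle error */
        }
    }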

  45. Simplifications • A fixed packet length simplifies the control hardware • A packet fits completely into every buffer in the system, so the entire packet can be transferred from hop to hop • Once transmission starts from an MB output buffer, it is not interrupted until the packet reaches the destination MB input buffer • Store-and-forward implementation of the MB buffers CS252-s06, Project Presentation

  46. Performance • Latency1 ≈ 48 × packet length cycles to write the packet into the FSL bus • Latency2 ≈ 2 × packet length cycles waiting for the MB buffer to fill • Latency3 ≈ 2 cycles for the switch transmission • Latency4 ≈ 48 × packet length cycles to read the packet from the FSL bus • Bandwidth = 32 bits/cycle or 64 bits/cycle (the current FSL does not support 64 bits) CS252-s06, Project Presentation

  47. Utilization on BEE2 • Note: Measured with a switch connecting 8 ports: 2 MB and 2 LVCMOS links, but no XAUI. All buffers are 32 bits wide and 16 words deep. CS252-s06, Project Presentation

  48. Future Implementation • Switch topology change • Allow variable packet length – using the control bit in the FSL • DMA • 4 MBs share a DMA engine CS252-s06, Project Presentation

  49. “Associated Switch” CS252-s06, Project Presentation

  50. Clustered Organization • Microblaze cores are organized into clusters • Since there are 4 DIMMs on the BEE2, split into 4 clusters • A NIC coordinates data transfers for all MBs in a cluster • Faster transfers for MBs in the same cluster because they use DMA • Faster overall transfers because data copying is done in hardware • Only 4 bits per hop are needed now, but an extra hop is required CS252-s06, Project Presentation
