High-Bandwidth Packet Switching on the Raw General-Purpose Architecture

Gleb Chuvpilo

Saman Amarasinghe

MIT LCS Computer Architecture Group

September 19, 2002


Talk at a Glance

  • Motivation

  • Architecture of Internet Routers

  • Raw Processor Overview

  • Raw Router Architecture

  • Switch Fabric Design

  • Distributed Scheduling Algorithm

  • Results and Analysis

  • Future Work and Conclusion


Motivation

  • Build a fast IP router on a general-purpose architecture

    Why?

    • Flexibility → new protocols and services

    • Price → economies of scale



Architecture of Internet Routers

[Figure: block diagram with the labels Network Processor, Forwarding Engine (×4), Interface (×4), and Switch Fabric]




Raw Processor Overview

  • 16 MIPS-like tiles on a single die

  • 2 Megabytes of SRAM on-chip

  • Over a thousand signal I/O pins

  • Over 200 Gbps of external chip bandwidth

  • Scalable to thousands of tiles!



Raw Communication Mechanisms

  • Two static networks

  • Two dynamic networks


Raw Static Networks

  • Destinations known at compile time

  • Message size known at compile time

  • Cycle-by-cycle switch schedule

  • Three-cycle nearest neighbor send-to-use latency

  • No processing overhead




Raw Dynamic Networks

  • Unpredictable events

    • External asynchronous interrupts

    • Cache misses

  • 15- to 30-cycle nearest neighbor send-to-use latency (message header processing overhead)
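To put the static and dynamic latencies side by side, the cycle counts can be converted to time. This is a minimal sketch that assumes the 250 MHz prototype clock mentioned later in the talk; the helper name `cycles_to_ns` is made up for illustration.

```python
# Nearest-neighbor send-to-use latencies quoted on the slides, in cycles.
STATIC_CYCLES = 3
DYNAMIC_CYCLES = (15, 30)

# Assumption: the 250 MHz prototype clock stated in the Implementation slide.
CLOCK_MHZ = 250


def cycles_to_ns(cycles, clock_mhz=CLOCK_MHZ):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1000 / clock_mhz


static_ns = cycles_to_ns(STATIC_CYCLES)
dynamic_ns = tuple(cycles_to_ns(c) for c in DYNAMIC_CYCLES)
slowdown = tuple(c / STATIC_CYCLES for c in DYNAMIC_CYCLES)

print(f"static:  {static_ns} ns")
print(f"dynamic: {dynamic_ns[0]} to {dynamic_ns[1]} ns")
print(f"dynamic is {slowdown[0]}x to {slowdown[1]}x slower")
```

Under that clock assumption the static network delivers a word in 12 ns, against 60 to 120 ns on the dynamic network.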




Given: Four Networks…

[Figure: four networks, labeled 1 to 4]



Problem: Mapping?

[Figure: mapping (?) between Dynamic Communication and the Static Interconnect]


Solution: Rotating Crossbar

[Figure: rotating crossbar with inputs In 0 to In 3 and outputs Out 0 to Out 3]


Rotating Crossbar Highlights

  • The idea of a Token Ring network → absolute fairness

  • Algorithm uses two static networks; dynamic networks are idle

  • All deadlock-free configurations are scheduled at compile time

  • Four headers and token location define a global configuration

  • Global configuration is computed in a distributed manner at run time
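The token idea can be illustrated with a small arbitration sketch. This is not the authors' algorithm: it is a simplified model, under the assumption that each of the four inputs requests at most one output, that the token holder gets first pick, and that the token advances by one port per round. The names `arbitrate` and `next_token` are hypothetical.

```python
NUM_PORTS = 4  # four inputs and four outputs, as in the crossbar figure


def arbitrate(requests, token):
    """Grant outputs to inputs in ring order, starting at the token holder.

    requests[i] is the output wanted by input i, or None if input i is idle.
    Returns a dict mapping each granted input to its output.
    """
    grants = {}
    taken = set()
    for offset in range(NUM_PORTS):
        src = (token + offset) % NUM_PORTS
        dst = requests[src]
        if dst is not None and dst not in taken:
            grants[src] = dst
            taken.add(dst)
    return grants


def next_token(token):
    """Pass the token to the next input on the ring."""
    return (token + 1) % NUM_PORTS


# Worst case for fairness: every input wants output 0 in every round.
requests = [0, 0, 0, 0]
token = 0
winners = []
for _ in range(NUM_PORTS):
    grants = arbitrate(requests, token)
    winners.append(next(iter(grants)))  # the single input that won output 0
    token = next_token(token)

print(winners)
```

In this model each input wins the contended output exactly once per full rotation of the token, which is the fairness property the token provides.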




Phases of the Algorithm

[Figure: phases of the algorithm across the TILE PROCESSOR and the SWITCH PROCESSOR: headers_request, headers, send_prev_config, choose_new_config, route_body, confirm, update_token]


Configuration Space

  • Let’s enumerate the number of configurations:

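    $\mathrm{SPACE} = |\mathrm{Hdr}_0| \times \dots \times |\mathrm{Hdr}_3| \times |\mathrm{Token}|$,

    where $|\mathrm{Hdr}_0| = \dots = |\mathrm{Hdr}_3| = 5$,

    and $|\mathrm{Token}| = 4$,

    therefore

    $\mathrm{SPACE} = 5^4 \times 4 = 2{,}500$ distinct configurations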
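The count can be checked by brute-force enumeration. In this sketch the header and token values are plain integers; the slide does not say what the five header values stand for (one plausible reading, an assumption here, is four output ports plus "no packet").

```python
from itertools import product

NUM_PORTS = 4         # headers Hdr0..Hdr3, one per input port
HEADER_VALUES = 5     # |Hdr_i| = 5, as stated on the slide
TOKEN_POSITIONS = 4   # |Token| = 4, as stated on the slide


def enumerate_configurations():
    """Yield every (hdr0, hdr1, hdr2, hdr3, token) tuple."""
    header_choices = [range(HEADER_VALUES)] * NUM_PORTS
    for headers in product(*header_choices):
        for token in range(TOKEN_POSITIONS):
            yield (*headers, token)


space = sum(1 for _ in enumerate_configurations())
print(space)
```

The enumeration visits 5 × 5 × 5 × 5 header combinations for each of the 4 token positions, matching the closed-form product.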


So What?...

  • Each tile has 8,192 words of instruction memory; the switch has the same

    → 8,192/2,500 ≈ 3.3 instructions per configuration → not enough! → need to use off-chip memory → slow!

    → need to minimize SPACE
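The budget argument can be spelled out as a fit check. This is a rough sketch that assumes every configuration needs the same number of instruction words and that nothing else occupies the memory; `fits_on_chip` is an illustrative helper, not something from the talk.

```python
INSTRUCTION_WORDS = 8_192   # instruction memory per tile, from the slide
CONFIGURATIONS = 2_500      # unminimized configuration count, from the slide


def fits_on_chip(instructions_per_config, entries=CONFIGURATIONS,
                 memory_words=INSTRUCTION_WORDS):
    """True if every configuration can get that many instruction words."""
    return instructions_per_config * entries <= memory_words


average_budget = INSTRUCTION_WORDS / CONFIGURATIONS   # fractional budget
max_whole = INSTRUCTION_WORDS // CONFIGURATIONS       # whole instructions

print(f"{average_budget:.1f} words per configuration")
print(fits_on_chip(3), fits_on_chip(4))
```

Three instructions per configuration fit, four do not, so any routine longer than three instructions per configuration would spill out of local memory.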


Minimization

[Figure: diagram with the labels in, out, cwprev, cwnext, ccwprev, and ccwnext]



Outcome of Minimization

  • We cut the number of configurations by a factor of 78! Now there are only 32 entries!

    → the program can fit in the local instruction memory!
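The effect of minimization follows from the numbers quoted in the talk. This short sketch recomputes the reduction factor and the per-entry instruction budget, assuming the 8,192-word instruction memory is divided evenly among the 32 entries (an idealization).

```python
INSTRUCTION_WORDS = 8_192   # instruction memory per tile, from the talk
FULL_SPACE = 2_500          # configurations before minimization
MINIMIZED_ENTRIES = 32      # entries after minimization

reduction_factor = FULL_SPACE / MINIMIZED_ENTRIES
words_per_entry = INSTRUCTION_WORDS // MINIMIZED_ENTRIES

print(f"reduction: {reduction_factor:.1f}x")
print(f"budget: {words_per_entry} words per entry")
```

The budget grows from about 3 words per configuration to 256 words per entry, which is why the program fits in local instruction memory.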


Implementation

  • Raw Router was tested in a cycle-accurate simulator of the Raw processor

  • Raw prototype clock speed is assumed to be 250 MHz

  • The focus of research is on switch fabric, NOT on route lookup, etc.




Future Work

  • Take advantage of dynamic networks

  • Implement IP route lookup

  • Add computation on data (encryption)

  • Add support of multicast traffic

  • Implement Quality of Service

  • Add virtual output queueing

  • Explore larger router configurations


Conclusion

  • Implemented a gigabit switch on Raw

  • Mapped dynamic communication to static interconnect

  • Can intermix switch fabric with computation

  • High-bandwidth I/O allows performance comparable to that of custom ASIC processors


