High-Bandwidth Packet Switching on the Raw General-Purpose Architecture




Gleb Chuvpilo

Saman Amarasinghe

MIT LCS Computer Architecture Group

September 19, 2002

Talk at a Glance
  • Motivation
  • Architecture of Internet Routers
  • Raw Processor Overview
  • Raw Router Architecture
  • Switch Fabric Design
  • Distributed Scheduling Algorithm
  • Results and Analysis
  • Future Work and Conclusion
Motivation
  • Build a fast IP router on a general-purpose architecture

Why?

    • Flexibility → new protocols and services
    • Price → economies of scale

Architecture of Internet Routers

[Figure: block diagram with the labels Network Processor, Forwarding Engine (×4), Interface (×4), and Switch Fabric]
Raw Processor Overview
  • 16 MIPS-like tiles on a single die
  • 2 Megabytes of SRAM on-chip
  • Over a thousand signal I/O pins
  • Over 200 Gbps of external chip bandwidth
  • Scalable to thousands of tiles!
Raw Communication Mechanisms
  • Two static networks
  • Two dynamic networks
Raw Static Networks
  • Destinations known at compile time
  • Message size known at compile time
  • Cycle-by-cycle switch schedule
  • Three-cycle nearest neighbor send-to-use latency
  • No processing overhead
Raw Dynamic Networks
  • Unpredictable events
    • External asynchronous interrupts
    • Cache misses
  • 15- to 30-cycle nearest neighbor send-to-use latency (message header processing overhead)
Problem: Mapping?

[Figure: Dynamic Communication → ? → Static Interconnect]

Solution: Rotating Crossbar

[Figure: rotating crossbar with inputs In 0 to In 3 and outputs Out 0 to Out 3]

Rotating Crossbar Highlights
  • The idea of a Token Ring network → absolute fairness
  • The algorithm uses the two static networks; the dynamic networks are idle
  • All deadlock-free configurations are scheduled at compile time
  • Four headers and token location define a global configuration
  • Global configuration is computed in a distributed manner at run time
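The token idea in the bullets above can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a four-port router, models each header as just a requested output port (or `None` for an empty input), and resolves only output-port contention by giving priority in ring order starting at the token holder. The names `arbitrate` and `next_token` are hypothetical, and the compile-time scheduling of deadlock-free routes on the static networks is not modeled.

```python
from typing import List, Optional

NUM_PORTS = 4  # the slides show a four-input, four-output crossbar


def arbitrate(headers: List[Optional[int]], token: int) -> List[Optional[int]]:
    """Return grants[i]: the output granted to input i, or None.

    Assumption (not from the slides): priority runs around the ring,
    starting at the input that currently holds the token.
    """
    grants: List[Optional[int]] = [None] * NUM_PORTS
    taken = set()  # outputs already claimed in this round
    for offset in range(NUM_PORTS):
        port = (token + offset) % NUM_PORTS
        want = headers[port]
        if want is not None and want not in taken:
            grants[port] = want
            taken.add(want)
    return grants


def next_token(token: int) -> int:
    """Pass the token to the next input on the ring."""
    return (token + 1) % NUM_PORTS


# Inputs 0 and 1 both want output 2; input 3 wants output 0.
headers = [2, 2, None, 0]
print(arbitrate(headers, token=0))  # [2, None, None, 0]
print(arbitrate(headers, token=1))  # [None, 2, None, 0]
```

Because the token moves by one position after every round, each input periodically becomes the highest-priority requester, which is the sense in which a token ring is fair in this simplified model.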
Phases of the Algorithm

[Figure: phases shown across the TILE PROCESSOR and the SWITCH PROCESSOR: headers_request, headers, send_prev_config, choose_new_config, route_body, confirm, update_token]

Configuration Space
  • Let’s enumerate the number of configurations:

SPACE = |Hdr0| × … × |Hdr3| × |Token|,

where |Hdr0| = … = |Hdr3| = 5 and |Token| = 4, therefore

SPACE = 5^4 × 4 = 2,500 distinct configurations
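The count can be checked by brute-force enumeration. The sketch below assumes that the five header values are the four output ports plus an "empty" marker; the slides give only the count of 5, so that interpretation and the names `HEADER_VALUES` and `TOKEN_VALUES` are illustrative.

```python
from itertools import product

# Assumption: a header names one of four outputs or is empty (5 values).
HEADER_VALUES = ["empty", 0, 1, 2, 3]
# The token can sit at any of the four ports.
TOKEN_VALUES = [0, 1, 2, 3]

# One global configuration = four headers plus the token position.
configs = [
    (hdrs, token)
    for hdrs in product(HEADER_VALUES, repeat=4)
    for token in TOKEN_VALUES
]

space = len(configs)
print(space)  # 2500
```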

So What?...
  • Each tile has 8,192 words of instruction memory; the same holds for each switch

→ 8,192 / 2,500 ≈ 3.3 instructions per configuration → not enough! → need to use off-chip memory → slow!

→ need to minimize SPACE

Minimization

[Figure: a single crossbar tile with the labels in, out, cwprev, cwnext, ccwprev, ccwnext]

Outcome of Minimization
  • We cut down the number of configurations by 78 times! Now there are only 32 entries! 

 the program can fit in the local instruction memory!
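The arithmetic behind the instruction-memory argument can be reproduced in a few lines. The figures 8,192, 2,500, and 32 come from the slides; the helper name `words_per_config` is hypothetical, and the budget is only an upper bound because it ignores any other code that shares the same memory.

```python
IMEM_WORDS = 8192      # instruction-memory words per tile (from the slides)
FULL_SPACE = 5**4 * 4  # 2,500 configurations before minimization
MIN_SPACE = 32         # entries after minimization (from the slides)


def words_per_config(imem_words: int, num_configs: int) -> float:
    """Upper bound on instruction words available per configuration."""
    return imem_words / num_configs


before = words_per_config(IMEM_WORDS, FULL_SPACE)
after = words_per_config(IMEM_WORDS, MIN_SPACE)
reduction = FULL_SPACE / MIN_SPACE

print(f"{before:.1f} words/config before, {after:.0f} after, "
      f"{reduction:.0f}x fewer entries")
# prints: 3.3 words/config before, 256 after, 78x fewer entries
```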

Implementation
  • Raw Router was tested in a cycle-accurate simulator of the Raw processor
  • Raw prototype clock speed is assumed to be 250 MHz
  • The focus of research is on switch fabric, NOT on route lookup, etc.
Future Work
  • Take advantage of dynamic networks
  • Implement IP route lookup
  • Add computation on data (encryption)
  • Add support of multicast traffic
  • Implement Quality of Service
  • Add virtual output queueing
  • Explore larger router configurations
Conclusion
  • Implemented a gigabit switch on Raw
  • Mapped dynamic communication to static interconnect
  • Can intermix switch fabric with computation
  • High-bandwidth I/O allows the performance of custom ASICs