

CRS-1 overview
TAU – Mar 07

Rami Zemach


Agenda

  • Cisco’s high end router CRS-1

  • CRS-1’s NP Metro (SPP)

  • CRS-1’s Fabric

  • CRS-1’s Line Card

  • Future directions



What drove the CRS?

A sample taxonomy

  • OC768

  • Multi chassis

  • Improved BW/Watt & BW/Space

  • New OS (IOS-XR)

  • Scalable control plane



Multiple router flavours

A sample taxonomy

  • Core

    • OC-12 (622Mbps) and up (to OC-768 ~= 40Gbps)

    • Big, fat, fast, expensive

    • E.g. Cisco HFR, Juniper T-640

      • HFR: 1.2Tbps each, interconnect up to 72 giving 92Tbps, start at $450k

  • Transit/Peering-facing

    • OC-3 and up, good GigE density

    • ACLs, full-on BGP, uRPF, accounting

  • Customer-facing

    • FR/ATM/…

    • Feature set as above, plus fancy queues, etc

  • Broadband aggregator

    • High scalability: sessions, ports, reconnections

    • Feature set as above

  • Customer-premises (CPE)

    • 100Mbps

    • NAT, DHCP, firewall, wireless, VoIP, …

    • Low cost, low-end, perhaps just software on a PC



Routers are pushed to the edge

A sample taxonomy

  • Over time routers are pushed to the edge as:

    • BW requirements grow

    • # of interfaces scales

  • Different routers have different offerings:

    • Interface types (core is mostly Ethernet)

    • Features. Sometimes the same feature is implemented differently

    • User interface

    • Redundancy models

    • Operating system

  • Customers look for:

    • Investment protection

    • Stable network topology

    • Feature parity

      Transparent scale



What does Scaling mean …

A sample taxonomy

  • Interfaces (BW, number, variance)

  • BW

  • Packet rate

  • Features (e.g. Support link BW in a flexible manner)

  • More Routes

  • Wider ecosystem

  • Effective Management (e.g. capability to support more BGP peers and more events)

  • Fast Control (e.g. distribute routing information)

  • Availability

  • Serviceability

  • Scaling is both up and down (logical routers)


Low BW feature rich – centralized

[Diagram: a single CPU with off-chip buffer memory and the route table, connected over a shared bus to several line interfaces (MAC plus local memory). Typically <0.5Gb/s aggregate capacity.]


High BW – distributed

[Diagram: multiple line cards, each with its own forwarding table, local buffer memory and MACs, attached to a switched backplane ("crossbar"); a CPU card holds the routing table and pushes forwarding tables to the line cards. Typically <50Gb/s aggregate capacity.]



Distributed architecture challenges (examples)

  • HW wise

    • Switching fabric

    • High BW switching

    • QOS

    • Traffic loss

    • Speedup

  • Data plane (SW)

    • High BW / packet rate

    • Limited resources (CPU, memory)

  • Control plane (SW)

    • High event rate

    • Routing information distribution (e.g. forwarding tables)


CRS-1 System View

[Diagram: full system view. Each line card shelf and fabric shelf has its own shelf controller, with system controllers overseeing the system.]

  • Line Card Shelves: contain Route Processors, Line Cards, System Controllers

  • Fabric Shelves: contain Fabric Cards, System Controllers

  • Shelves interconnected over distances of up to 100m

  • NMS

  • Out-of-band GE control bus to all shelf controllers


CRS-1 System Architecture

[Diagram: line cards (Interface Module plus Modular Service Card with Cisco SPP and 8K queues) and Route Processors attach through the mid-plane to the Fabric Chassis, whose switch elements form stages S1, S2 and S3.]

FORWARDING PLANE

  • Up to 1152x40G

  • 40G throughput per LC

MULTISTAGE SWITCH FABRIC

  • 1296x1296 non-blocking buffered fabric

  • Roots of Fabric architecture from Jon Turner’s early work

DISTRIBUTED CONTROL PLANE

  • Control SW distributed across multiple control processors



Switch Fabric challenges

  • Scale - many ports

  • Fast

  • Distributed arbitration

  • Minimum disruption with QOS model

  • Minimum blocking

  • Balancing

  • Redundancy



Previous solution: GSR – Cell-based XBAR w/ centralized scheduling

  • Each LC has variable width links to and from the XBAR, depending on its bandwidth requirement

  • Central scheduling, iSLIP-based (see the sketch after this list)

    • Two request-grant-accept rounds

    • Each arbitration round lasts one cell time

  • Per destination LC virtual output queues

  • Supports

    • H/L priority

    • Unicast/multicast
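A hedged sketch of a single iSLIP-style request-grant-accept iteration, to make the arbitration rounds above concrete. This is a textbook-style model, not the GSR scheduler itself; the data structures and pointer-update policy are assumptions.

```python
# One iSLIP-style request-grant-accept iteration over an n x n crossbar.
# voq[i][j] = number of cells input i has queued for output j (the VOQs).
# grant_ptr / accept_ptr are length-n lists holding the round-robin pointers.
def islip_iteration(voq, grant_ptr, accept_ptr, matched_in, matched_out):
    n = len(voq)
    # Request: every still-unmatched input requests every unmatched output
    # for which it holds at least one cell.
    requests = {j: [i for i in range(n)
                    if i not in matched_in and voq[i][j] > 0]
                for j in range(n) if j not in matched_out}
    # Grant: each output grants the requesting input nearest its pointer.
    grants = {}
    for j, reqs in requests.items():
        if reqs:
            grants[j] = min(reqs, key=lambda i: (i - grant_ptr[j]) % n)
    # Accept: each input accepts the granting output nearest its pointer.
    offers = {}
    for j, i in grants.items():
        offers.setdefault(i, []).append(j)
    accepted = {}
    for i, outs in offers.items():
        j = min(outs, key=lambda k: (k - accept_ptr[i]) % n)
        accepted[i] = j
        matched_in.add(i); matched_out.add(j)
        # Pointers advance only past accepted grants (and, in real iSLIP,
        # only during the first iteration of a cell time).
        grant_ptr[j] = (i + 1) % n
        accept_ptr[i] = (j + 1) % n
    return accepted
```

Running two such iterations per cell time, as the slide describes, lets outputs left unmatched by the first round be claimed in the second.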



CRS Cell-based Multi-Stage Benes

  • Multiple paths to a destination

  • For a given input to output port, the no. of paths is equal to the no. of center stage elements

  • Distribution between S1 and S2 stages. Routing at S2 and S3

  • Cell routing (sketched below)
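A minimal, illustrative model of the distribute-then-route behavior described above: S1 may spray a cell to any middle-stage (S2) element, while S2 and S3 route purely by the cell's destination. The element counts and the random spray policy are assumptions, not the CRS-1's actual configuration.

```python
# Cell routing through a 3-stage Benes-like fabric (illustrative sizes).
import random

NUM_S2 = 8          # middle-stage elements (hypothetical)
PORTS_PER_S3 = 16   # fabric ports behind each S3 element (hypothetical)

def route_cell(dest_port):
    s2 = random.randrange(NUM_S2)      # S1: any S2 will do, so spread the load
    s3 = dest_port // PORTS_PER_S3     # S2: choose the S3 that owns the port
    return (s2, s3, dest_port)         # S3: deliver to the destination port
```

Because S1 is free to pick any of the NUM_S2 middle elements, a given input/output pair has exactly NUM_S2 distinct paths, which is the slide's point about the path count equaling the number of center-stage elements.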



Fabric speedup

  • Q-fabric tries to approximate an output buffered switch

    • to minimize sub-port blocking

    • Buffering at output allows better scheduling

  • In single stage fabrics a 2X speedup very closely approximates an output buffered fabric *

  • For multi-stage fabrics the speedup factor needed to approximate output-buffered behavior is not known

    • CRS-1 fabric has ~5X speedup

    • constrained by available technology

      • * Balaji Prabhakar and Nick McKeown, Computer Systems Technical Report CSL-TR-97-738, November 1997.



Fabric Flow Control Overview

  • Discard - time constant in the 10s of ms range

    • Originates from ‘from fab’ and is directed at ‘to fab’.

    • Very fine granularity: discard down to the level of individual destination raw queues.

  • Back Pressure - time constant in the 10s of ms range.

    • Originates from the Fabric and is directed at ‘to fab’.

    • Operates per priority at increasingly coarse granularity (see the sketch after this list):

      • Fabric Destination (one of 4608)

      • Fabric Group (one of 48 in phase one and 96 in phase two)

      • Fabric (stop all traffic into the fabric per priority)
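A hedged sketch of the hierarchical back-pressure check implied by this list: the sending ('to fab') side may send a cell of a given priority only if none of the three levels has asserted back-pressure for it. The data structures are invented; the group size assumes the phase-one numbers (4608 destinations in 48 groups).

```python
# Hierarchical fabric back-pressure: per priority, at destination, group,
# or whole-fabric granularity. Purely illustrative bookkeeping.
DESTS_PER_GROUP = 4608 // 48   # = 96, using the phase-one figures above

class FabricBackpressure:
    def __init__(self):
        self.stopped_dests = set()    # (priority, fabric_destination)
        self.stopped_groups = set()   # (priority, fabric_group)
        self.stopped_fabric = set()   # priorities stopped for the whole fabric

    def may_send(self, priority, dest):
        group = dest // DESTS_PER_GROUP
        return (priority not in self.stopped_fabric
                and (priority, group) not in self.stopped_groups
                and (priority, dest) not in self.stopped_dests)
```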



Reassembly Window

  • Cells transiting the Fabric take different paths between Sprayer and Sponge.

  • Cells for the same packet will arrive out of order.

  • The Reassembly Window for a given Source is defined as the worst-case differential delay two cells from a packet encounter as they traverse the Fabric.

  • The Fabric limits the Reassembly Window (see the sketch below).
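A hedged sketch of what a bounded reassembly window buys the receiving side: because two cells of the same packet can never arrive more than the window apart, per-packet state can be aged out safely. The cell fields, tick-based timing and discard policy are assumptions, not the CRS-1 mechanism.

```python
# Per-source packet reassembly behind the fabric, with a bounded window.
from collections import defaultdict

REASSEMBLY_WINDOW = 1000   # worst-case differential delay, in arbitrary ticks

class Reassembler:
    def __init__(self):
        self.cells = defaultdict(dict)   # (source, pkt_id) -> {cell_index: payload}
        self.first_seen = {}             # (source, pkt_id) -> arrival tick of first cell
        self.cell_count = {}             # (source, pkt_id) -> total cells, once known

    def receive(self, now, source, pkt_id, index, is_last, payload):
        key = (source, pkt_id)
        self.first_seen.setdefault(key, now)
        self.cells[key][index] = payload
        if is_last:
            self.cell_count[key] = index + 1
        if key in self.cell_count and len(self.cells[key]) == self.cell_count[key]:
            parts = self.cells.pop(key)
            self.first_seen.pop(key); self.cell_count.pop(key)
            return b"".join(parts[i] for i in range(len(parts)))   # complete packet
        if now - self.first_seen[key] > REASSEMBLY_WINDOW:
            # The fabric's bound makes this impossible unless a cell was lost;
            # give up on the partial packet so state stays bounded.
            self.cells.pop(key); self.first_seen.pop(key); self.cell_count.pop(key, None)
        return None
```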



Linecard challenges

  • Power

  • COGS

  • Multiple interfaces

  • Intermediate buffering

  • Speed up

  • CPU subsystem


Cisco CRS-1 Line Card

[Diagram: the line card spans the midplane, with a PLIM (OC192 framers and optics plus the Interface Module ASIC) on one side and the Modular Services Card (RX/TX Metro packet processors, ingress/egress queuing, SquidGW, CPU and the From Fabric ASIC) on the other. Numbered steps 1-8 trace the ingress and egress packet flow through these blocks.]


[The following three slides repeat the line card diagram, successively highlighting the Power Regulators, Fabric Serdes, Line Card CPU, Egress Queuing, Egress Metro, Ingress Queuing and Ingress Metro blocks.]



Metro Subsystem



Metro Subsystem

  • What is it?

    • Massively Parallel NP

    • Codename Metro

    • Marketing name SPP (Silicon Packet Processor)

  • What were the Goals?

    • Programmability

    • Scalability

  • Who designed & programmed it?

    • Cisco internal (Israel/San Jose)

    • IBM and Tensilica partners


Metro Subsystem

  • Metro

    • 2500 balls

    • 250 MHz, 35 W

  • QDR2 SRAM

    • 250 MHz DDR

    • 5 channels

    • Policing state, classification results, queue-length state

  • TCAM

    • 125 MSPS

    • 128k x 144-bit entries

    • 2 channels

  • FCRAM

    • 166 MHz DDR

    • 9 channels

    • Lookups and table memory


Metro Top Level

  • Packet In: 96 Gb/s BW

  • Packet Out: 96 Gb/s BW

  • Control Processor Interface: proprietary, 2 Gb/s

  • 18mm x 18mm die, IBM .13um process

  • 18M gates

  • 8Mbit SRAM and RAs



Gee-whiz numbers

  • 188 32-bit embedded RISC cores

  • ~50 Bips

  • 175 Gb/s Memory BW

  • 78 MPPS peak performance
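A rough sanity check on these figures, assuming the 188 cores run at the 250 MHz quoted on the Metro Subsystem slide and average about one instruction per cycle: 188 x 250 MHz ≈ 47 x 10^9 instructions per second, consistent with the ~50 Bips quoted above.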


Why Programmability? Simple forwarding – not so simple

[Diagram: even plain IPv4 forwarding walks a chain of structures: a lookup over millions of routes, hundreds of load-balancing entries per leaf, 100k+ adjacencies carrying L2/L3 info and pointers to statistics counters, plus a policy-based-routing TCAM with a 1:1 associative data table, spread across TCAM and SRAM/DRAM. There is increasing pressure to add 1-2 further levels of indirection for High Availability and higher update rates.]

Example FEATURES:

  • IPv4 Unicast

lookup algorithm

  • MPLS–3 Labels

  • Link Bundling (v4)

  • Load Balancing L3 (v4)

  • 1 Policer Check

  • Marking

  • TE/FRR

  • Sampled Netflow

  • WRED

  • ACL

  • IPv4 Multicast

  • IPv6 Unicast

  • Per prefix accounting

  • GRE/L2TPv3 Tunneling

  • RPF check (loose/strict) v4

  • Load Balancing V3 (v6)

  • Link Bundling (v6)

  • Congestion Control

Programmability also means:

  • Ability to juggle feature ordering

  • Support for heterogeneous mixes of feature chains (see the sketch below)

  • Rapid introduction of new features (Feature Velocity)
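To make the feature-chain idea concrete, here is a hedged sketch of a run-to-completion data path that composes per-interface feature chains. The feature set, names and return convention are invented for illustration; they are not IOS-XR or Metro microcode.

```python
# Per-packet feature chains: each interface is bound to an ordered list of
# feature functions and the data path simply runs them in order.
def acl(pkt, ctx):
    return pkt["src"] not in ctx["acl_deny"]            # False => drop

def police(pkt, ctx):
    ctx["tokens"] -= len(pkt["payload"])                 # toy single policer
    return ctx["tokens"] >= 0

def ipv4_lookup(pkt, ctx):
    ctx["next_hop"] = ctx["fib"].get(pkt["dst"])
    return ctx["next_hop"] is not None

FEATURE_CHAINS = {                 # ordering and mix can differ per interface
    "peering":  [acl, ipv4_lookup],
    "customer": [acl, police, ipv4_lookup],
}

def forward(interface, pkt, ctx):
    for feature in FEATURE_CHAINS[interface]:
        if not feature(pkt, ctx):
            return None            # dropped by a feature
    return ctx["next_hop"]

ctx = {"acl_deny": set(), "tokens": 10_000, "fib": {"203.0.113.9": "nh-1"}}
pkt = {"src": "198.51.100.7", "dst": "203.0.113.9", "payload": b"x" * 64}
assert forward("customer", pkt, ctx) == "nh-1"
```

Reordering a chain, or giving two interfaces different chains, is a table change rather than a hardware change, which is the feature-velocity argument the slide is making.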


Metro Architecture Basics – Packet Distribution

[Diagram: 96G packet paths feed the packet-distribution block; packet heads go to the 188 PPEs over the resource fabric while packet tails are stored in the on-chip packet buffer.]

  • ~100 bytes of packet context sent to PPEs; packet tails stored on-chip

  • Run-to-completion (RTC)

    • simple SW model

    • efficient heterogeneous feature processing

  • RTC and non-flow-based packet distribution means a scalable architecture (see the sketch below)

  • Costs

    • High instruction BW supply

    • Need RMW and flow ordering solutions
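A hedged sketch of non-flow-based distribution under the assumptions above: any idle PPE gets the next packet's first ~100 bytes, the tail never leaves the packet buffer, and a sequence number records arrival order so the gather stage can restore it later. All names are hypothetical.

```python
# Run-to-completion, non-flow-based packet distribution (illustrative model).
from collections import deque

CONTEXT_BYTES = 100      # ~100 bytes of packet context go to the PPE

class Distributor:
    def __init__(self, num_ppes=188):
        self.idle_ppes = deque(range(num_ppes))
        self.tails = {}                      # stands in for the on-chip packet buffer
        self.seq = 0

    def dispatch(self, packet):
        head, tail = packet[:CONTEXT_BYTES], packet[CONTEXT_BYTES:]
        seq, self.seq = self.seq, self.seq + 1
        self.tails[seq] = tail               # tail stays on-chip, never visits a PPE
        ppe = self.idle_ppes.popleft()       # any idle PPE: no per-flow pinning
        return {"seq": seq, "ppe": ppe, "head": head}
```

Because work is not pinned to flows, adding PPEs adds capacity directly; the price, as the slide notes, is that ordering and read-modify-write races have to be solved elsewhere.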


Metro Architecture Basics – Packet Gather

[Diagram: the same 188-PPE, resource-fabric and on-chip packet buffer structure, now showing the 96G gather path.]

  • Gather of packets involves:

    • Assembly of final packets (at 100Gb/s)

    • Packet ordering after variable-length processing (see the sketch below)

    • Gathering without new packet distribution
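A hedged sketch of the ordering part, assuming sequence numbers were assigned at distribution time as in the previous sketch: PPEs finish in arbitrary order because processing time varies per packet, but packets are released strictly in arrival order.

```python
# Gather/reorder stage: assemble head + tail and emit packets in sequence order.
class Gather:
    def __init__(self, tails):
        self.tails = tails        # the distributor's on-chip tail store
        self.done = {}            # seq -> finished packet, waiting for its turn
        self.next_seq = 0

    def complete(self, seq, processed_head):
        self.done[seq] = processed_head + self.tails.pop(seq)   # final assembly
        released = []
        while self.next_seq in self.done:                        # in-order release
            released.append(self.done.pop(self.next_seq))
            self.next_seq += 1
        return released
```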


Metro Architecture Basics – Resources

[Diagram: the 188 PPEs reach the resources, including the on-chip packet buffer, through the resource fabric.]

  • Packet buffer accessible as a resource

  • Resource fabric is parallel wide multi-drop busses

  • Resources consist of:

    • Memories

    • Read-modify-write operations

    • Performance-heavy mechanisms


Metro Resources

[Diagram: resources hanging off the resource fabric include Statistics (512k), Interface Tables, Policing (100k+), Queue Depth State, the TCAM, Table DRAM (10s of MB) and the Lookup Engine (2M prefixes).]

  • Lookup Engine uses the Tree Bitmap algorithm (a simplified trie sketch follows below)

    • FCRAM and on-chip memory

    • High update rates

    • Configurable performance vs. density

    • Will Eatherton et al., “Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates”, CCR April 2004 (vol. 34, no. 2), pp. 97-123.
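Tree Bitmap packs an internal-prefix bitmap and a child bitmap into each trie node so that a node fits in one memory access; what follows is a much simpler multibit-trie longest-prefix match that only shares the stride-at-a-time walk, offered as a hedged illustration rather than the Tree Bitmap structure itself. It assumes prefixes have been expanded to multiples of the stride.

```python
# Simplified multibit-trie longest-prefix match (stride of 4 bits).
STRIDE = 4
MASK = (1 << STRIDE) - 1

class Node:
    def __init__(self):
        self.children = {}    # 4-bit value -> child Node
        self.result = None    # forwarding result if an (expanded) prefix ends here

def lookup(root, addr, addr_bits=32):
    node, best = root, None
    for shift in range(addr_bits - STRIDE, -1, -STRIDE):
        if node.result is not None:
            best = node.result                     # remember longest match so far
        node = node.children.get((addr >> shift) & MASK)
        if node is None:
            return best                            # fall back to the best match
    return node.result if node.result is not None else best
```

The real algorithm keeps the per-node memory footprint small and update-friendly, which is what lets it live in FCRAM plus on-chip memory with the high update rates listed above.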



Packet Processing Element (PPE)

  • 16 PPE clusters

  • Each cluster contains 12 PPEs

  • 0.5 sq mm per PPE


Packet Processing Element (PPE)

  • Tensilica Xtensa core with Cisco enhancements

    • 32-bit, 5-stage pipeline

    • Code density: 16/24-bit instructions

  • Small instruction cache and data memory

  • Cisco DMA engine – allows 3 outstanding Descriptor DMAs

  • 10’s Kbytes fast instruction memory

[Diagram: each PPE couples the 32-bit RISC processor core (ICACHE, memory-mapped registers, scratch pad, and data memory holding the distribution header and packet header) to the per-cluster and global instruction memories over the instruction bus, and to packet distribution, packet gather and the resources through a data mux unit and the Cisco DMA engine; the cluster-level paths are shared by 12 PPEs.]



Programming Model and Efficiency

Metro Programming Model

  • Run to completion programming model

  • Queued descriptor interface to resources (see the sketch after this list)

  • Industry leveraged tool flow

    Efficiency Data Points

  • 1 ucoder for 6 months: IPv4 with common features (ACL, PBR, QoS, etc..)

  • CRS-1 initial shipping datapath code was done by ~3 people
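A hedged sketch of what a queued descriptor interface to a resource can look like from the PPE code's point of view: the program posts a descriptor naming the operation and address, may keep a few DMAs outstanding (the PPE slide mentions 3), and collects completions when it needs the results. Descriptor fields and the completion model are assumptions.

```python
# Queued descriptor interface to a resource (illustrative model).
from collections import deque

MAX_OUTSTANDING = 3    # per the PPE slide: 3 outstanding descriptor DMAs

class ResourceQueue:
    def __init__(self, resource):
        self.resource = resource        # callable standing in for a HW resource
        self.pending = deque()

    def post(self, op, addr, data=None):
        if len(self.pending) == MAX_OUTSTANDING:
            self.wait_one()             # stall until a completion frees a slot
        self.pending.append((op, addr, data))

    def wait_one(self):
        op, addr, data = self.pending.popleft()
        return self.resource(op, addr, data)

    def wait_all(self):
        return [self.wait_one() for _ in range(len(self.pending))]
```

Hiding resource latency behind a small number of outstanding descriptors is what lets a simple run-to-completion program keep the memory system busy without an explicit threading model.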



Challenges

  • Constant power battle

    • Memory and IO

  • Die Size Allocation

    • PPEs vs. HW acceleration

  • Scalability

    • On-chip BW vs off-chip capacity

    • Procket NPU 100MPPS - limited scaling

  • Performance


Future directions

  • POP convergence

  • Edge and core differences blur

  • Smartness in the network

  • More integrated services into the routing platforms

  • Feature sets needing acceleration expanding

  • Must leverage feature code across platforms/markets

  • Scalability (# of processors, amount of memory, BW)


Summary

  • Router business is diverse

  • Network growth pushes routers to the edge

  • Customers expect scale on the one hand … and a smart network on the other

  • Routers are becoming massively parallel processing machines



Questions?

Thank You



CRS-1 Positioning

  • Core router (overall BW, interface types)

    • 1.2 Tbps, OC-768c Interface

  • Distributed architecture

  • Scalability/Performance

    • Scalable control plane

  • High Availability

  • Logical Routers

  • Multi-Chassis Support



Network planes

  • Networks are considered to have three planes / operating timescales

    • Data: packet forwarding [μs, ns]

    • Control: flows/connections [ ms, secs]

    • Management: aggregates, networks [ secs, hours ]

  • Coupling between the planes is in descending order (control-data tighter, management-control looser)


Exact Matches in Ethernet Switches – Trees and Tries

[Diagram: a balanced binary search tree of depth log2(N) over N entries, alongside a binary search trie that branches on one address bit per level and stores the example keys 010 and 111.]

  • Binary search trie: lookup time bounded (one bit per step) and independent of table size; storage is O(NW)

  • Binary search tree: lookup time depends on table size (log2 N comparisons) but is independent of address length; storage is O(N)

A minimal exact-match binary trie, matching the slide’s 010/111 example, is sketched below.
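This sketch is a plain illustration of the trie side of the comparison, not any particular switch implementation: branch on one key bit per level, so lookup cost is the key width W regardless of how many entries are stored, and storage grows as O(NW).

```python
# Binary-trie exact match over bit-string keys (slide example: 010 and 111).
class TrieNode:
    def __init__(self):
        self.child = [None, None]
        self.value = None

def insert(root, key_bits, value):
    node = root
    for b in key_bits:                       # one bit per level
        i = int(b)
        if node.child[i] is None:
            node.child[i] = TrieNode()
        node = node.child[i]
    node.value = value

def lookup(root, key_bits):
    node = root
    for b in key_bits:
        node = node.child[int(b)]
        if node is None:
            return None                      # no exact match
    return node.value

root = TrieNode()
insert(root, "010", "port 1")
insert(root, "111", "port 2")
assert lookup(root, "111") == "port 2" and lookup(root, "011") is None
```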


Exact Matches in Ethernet Switches – Multiway Tries

[Diagram: a 16-ary search trie storing the keys 000011110000 and 111111111111, consuming 4 bits per level; Ptr = 0 means no children.]

Q: Why can’t we just make it a 2^48-ary trie? (See the sketch below.)
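A hedged sketch of the 16-ary version for 48-bit MAC addresses, with the usual answer to the question above: a single 2^48-ary node would need 2^48 child slots (on the order of 10^14 pointers), so in practice the key is consumed a few bits at a time.

```python
# 16-ary trie over 48-bit MAC addresses: 4 bits (one nibble) per level,
# so an exact-match lookup always takes 48/4 = 12 steps.
STRIDE_BITS = 4
KEY_BITS = 48

def new_node():
    return [(None, None)] * 16    # 16 slots of (value, child); child None ~ Ptr=0

def insert(root, mac, value):
    node = root
    for shift in range(KEY_BITS - STRIDE_BITS, -1, -STRIDE_BITS):
        nibble = (mac >> shift) & 0xF
        v, child = node[nibble]
        if shift == 0:
            node[nibble] = (value, child)      # last nibble: store the result
        else:
            if child is None:
                child = new_node()
                node[nibble] = (v, child)
            node = child

def lookup(root, mac):
    node = root
    for shift in range(KEY_BITS - STRIDE_BITS, -1, -STRIDE_BITS):
        value, child = node[(mac >> shift) & 0xF]
        if shift == 0:
            return value
        if child is None:                      # Ptr = 0 in the slide
            return None
        node = child

root = new_node()
insert(root, 0x0000_1111_2222, "port 7")
assert lookup(root, 0x0000_1111_2222) == "port 7"
assert lookup(root, 0xFFFF_FFFF_FFFF) is None
```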

