
Handout # 4: Scaling Switches – Parallelism


Presentation Transcript


  1. Handout # 4: Scaling Switches – Parallelism CSC 2203 – Packet Switch and Network Architectures Professor Yashar Ganjali Department of Computer Science University of Toronto yganjali@cs.toronto.edu http://www.cs.toronto.edu/~yganjali

  2. Announcements • Draft proposal: Friday Oct. 5th; feedback by Oct. 9th. • Final proposal: Friday Oct. 12th. • Paper [5] for this week.

  3. Outline – Scaling Packet Switches • Load balancing • Parallelism • External parallelism: • Parallel Packet Switches: multiple packet switches in parallel. • Internal parallelism: • Distributed shared memory routers, • Load-balanced two-stage switches, • Parallel packet buffers. • Problems we’ll encounter: • Mis-sequencing packets, • Resource conflicts.

  4. Basic Idea of Parallelism [Figure: a “serial system” with one packet processing function (e.g., header processor, address lookup, packet buffer, or a complete packet switch) running at line rate R, versus a “parallel system” of k copies, each running at rate R/k.] Q: If packets are load-balanced randomly across the k copies, what can we say about the performance of the resulting system?

  5. Problem 1: Example of Parallelism Causing Mis-sequencing [Figure: packets 1, 2, 3 arrive at rate R and are spread over two paths of rate R/2.] If packet “1” takes longer to process/store than packet “2”, then their departure order could be reversed.
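A toy model of this effect (an illustrative sketch, not from the handout; the function and values are made up). Packets are sprayed alternately over two rate-R/2 paths, and a packet with a longer service time is overtaken by its successor:

```python
def parallel_departures(arrivals, service_times):
    """Spray packets over two rate-R/2 paths and return departure order.

    arrivals: list of (arrival_time, packet_id), in arrival order.
    service_times: per-packet service time on the slow paths.
    """
    path_free = [0.0, 0.0]            # time each path becomes idle
    done = []
    for (t, pid), s in zip(arrivals, service_times):
        p = pid % 2                    # naive spraying: alternate paths
        start = max(t, path_free[p])
        path_free[p] = start + s
        done.append((path_free[p], pid))
    return [pid for _, pid in sorted(done)]

# Packet 1 needs 3 time units to store, packets 2 and 3 need 1 each:
print(parallel_departures([(0, 1), (0, 2), (0, 3)], [3, 1, 1]))
# -> [2, 1, 3]: packet 2 overtakes packet 1 (mis-sequencing)
```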

  6. Problem 2: Example of Parallelism Causing Resource Conflicts [Figure: packets 1, 2, 3 stored across two memories of rate R/2; the output requests “read packet 1, then 3, then 2”.] If packet “1” and packet “3” are in the same memory, they can’t be read contiguously.
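A similar toy check of the read conflict (illustrative; the names are invented). Each memory runs at R/2, so a read occupies it for two time slots, and the schedule “1, then 3, then 2” stalls when cells 1 and 3 share a memory:

```python
def read_conflicts(order, placement, slots_per_read=2):
    """Slots at which the output's read schedule hits a busy memory."""
    busy_until = {}
    bad = []
    for slot, cell in enumerate(order):
        mem = placement[cell]
        if busy_until.get(mem, 0) > slot:
            bad.append((slot, cell, mem))
        busy_until[mem] = slot + slots_per_read
    return bad

placement = {1: "M0", 2: "M1", 3: "M0"}    # cells 1 and 3 share memory M0
print(read_conflicts([1, 3, 2], placement))
# -> [(1, 3, 'M0')]: cell 3 cannot be read right after cell 1
```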

  7. Outline – Scaling Packet Switches • Load balancing • Parallelism • External parallelism: • Parallel Packet Switches: multiple packet switches in parallel. • Internal parallelism: • Distributed shared memory routers, • Load-balanced two-stage switches, • Parallel packet buffers. • Problems we’ll encounter: • Mis-sequencing packets, • Resource conflicts.

  8. Parallel Packet Switches (PPS) • Goals: • Build a big packet switch out of lots of little packet switches, • Each memory to run slower than line rate, • No packet mis-sequencing, • Emulation of a FCFS OQ switch.

  9. Architecture of a PPS • A PPS is comprised of multiple identical lower speed packet-switches operating independently and in parallel. • An incoming stream of packets is spread, packet-by-packet, by a demultiplexer, across the slower packet-switches. • Outputs from each switch are then recombined by a multiplexer.
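A minimal sketch of the spreading step, assuming a plain round-robin demultiplexer and sequence-numbered cells (the real PPS chooses layers using the AIL/AOL sets defined later in this handout):

```python
from itertools import cycle

def demultiplex(cells, k):
    """Spread an incoming cell stream across k slower layers, cell by cell."""
    layers = [[] for _ in range(k)]
    pick = cycle(range(k))
    for cell in cells:
        layers[next(pick)].append(cell)
    return layers

def multiplex(layers):
    """Recombine layer outputs; cells carry their arrival sequence number,
    so a sorted merge restores the original order."""
    return sorted(cell for layer in layers for cell in layer)

layers = demultiplex(list(range(1, 10)), k=3)
print(layers)            # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(multiplex(layers)) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```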

  10. Architecture of a PPS [Figure: a PPS with N = 4 external ports at rate R and k = 3 parallel OQ switches; each demultiplexer spreads cells over internal links at rate R/k, and a multiplexer at each output recombines them back to rate R.]

  11. Emulation of an OQ Switch [Figure: an OQ switch and a PPS fed identical input streams at rate R; the question is whether the PPS’s departures match the OQ switch’s (“Yes”) or not (“No”).]

  12. Emulation Scenario [Figure: cells 1–5 arriving to the PPS (N = 4 ports, k = 3 layers); the demultiplexers spread them across the three layers via internal links at rate R/3.]

  13. No Choice at the Input? [Figure: the emulation scenario repeated with a burst of cells destined to the same output j (N = 4, k = 3).]

  14. Result of No Choice [Figure: the layer contents that result from the no-choice scenario above (N = 4, k = 3).]

  15. Increasing Choice Using Speedup [Figure: the same PPS with internal links sped up to 2R/3 (a speedup of S = 2), giving the demultiplexers and multiplexers more layers to choose from.]

  16. Effect of Speedup on Choice [Figure: the choice available with a speedup of S = 2 and k = 10 links; internal links run at 2R/k, so only some of the layers are busy at any instant.]

  17. Definition: Available Input Link Set (AIL). AIL(i, n) is the set of layers to which external input port i can start writing a cell at time slot n.

  18. Definition: Departure Time of a Cell (n’). The departure time of a cell, n’, is the time at which it would have departed from an equivalent FIFO OQ switch.

  19. Definition: Available Output Link Set (AOL). AOL(j, n’) is the set of layers from which output j can start reading a cell at time slot n’.

  20. Observation • Inputs can only send to layers in the AIL set. • Outputs can only read from layers in the AOL set. [Figure: the PPS (N = 4, k = 3, internal links at 2R/3); a cell from input i to output j must use a layer available to both.]
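A small sketch of this bookkeeping (illustrative, with made-up busy times): each link is modeled by the slot at which it becomes free, and the demultiplexer may pick any layer in AIL ∩ AOL:

```python
def available(busy_until, now):
    """Layers whose link is idle at time `now` (AIL for an input's links,
    AOL for an output's links)."""
    return {layer for layer, t in busy_until.items() if t <= now}

# Example state for one input i and one output j (times are illustrative):
ail_busy = {0: 2, 1: 0, 2: 0}   # input link to layer 0 busy until slot 2
aol_busy = {0: 0, 1: 3, 2: 0}   # output link from layer 1 busy until slot 3

n = n_prime = 1                  # arrival slot and shadow-OQ departure slot
AIL = available(ail_busy, n)         # {1, 2}
AOL = available(aol_busy, n_prime)   # {0, 2}
print(AIL & AOL)                     # {2}: the cell goes to layer 2
```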

  21. Lower Bounds on Choice Sets • Minimum size of AIL and AOL: |AIL|, |AOL| ≥ (number of links) – (max number of busy links). Since a cell occupies an internal link for k/S time slots, at most k/S – 1 links can be busy mid-transfer, giving |AIL|, |AOL| ≥ k – k/S + 1.

  22. Overcoming Resource Conflicts • A cell must be sent to a link that belongs to both the AIL and the AOL sets: AIL ∩ AOL ≠ ∅. • This is guaranteed whenever |AIL| + |AOL| > k. • Substituting the lower bounds: (k – k/S + 1) + (k – k/S + 1) > k • Solving for S: S > 2k/(k+2).
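A quick numeric check of this bound (not from the slides): the required speedup 2k/(k+2) approaches 2 from below as k grows, so S = 2 always yields overlapping choice sets:

```python
import math

for k in [2, 3, 10, 100]:
    s_min = 2 * k / (k + 2)
    ail = k - math.ceil(k / 2) + 1   # |AIL| lower bound with S = 2
    print(f"k={k:4d}  need S > {s_min:.3f}  "
          f"with S=2: |AIL|+|AOL| = {2*ail} > k? {2*ail > k}")
```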

  23. Parallel Packet Switch – Results • If S > 2k/(k+2) ≈ 2, then each cell is guaranteed to find a layer that belongs to both the AIL and AOL sets. • If S > 2k/(k+2) ≈ 2, then a PPS can precisely emulate a FIFO output-queued switch for all traffic patterns.

  24. Outline – Scaling Packet Switches • Load balancing • Parallelism • External parallelism: • Parallel Packet Switches: multiple packet switches in parallel. • Internal parallelism: • Distributed shared memory routers, • Load-balanced two-stage switches, • Parallel packet buffers. • Problems we’ll encounter: • Mis-sequencing packets, • Resource conflicts.

  25. Parallel Output-Queued Router [Figure: a router with N = 3 ports (A, B, C) and k = 3 memories, shown over time slots 1–3; arriving cells (A5, B5, C5, then A6, B6, C6, ...) are written into memories and read out at departure.] • Constant-size packets. • At most two memory operations per time slot: 1 write and 1 read.

  26. Work-Conserving Router Theorem 1. A parallel output-queued router is work-conserving with 3N – 1 memories that can perform at most one memory operation per time slot.

  27. Restating the Problem • There are K pigeon holes. • Each can contain an infinite number of pigeons. • Assume that time is slotted, and in any one time slot: • at most N pigeons can arrive and at most N can depart, • at most 1 pigeon can enter or leave via any one pigeon hole, • the time slot at which each arriving pigeon will depart is known. • For any router: what is the minimum K such that all N pigeons can be placed in a pigeon hole immediately when they arrive, and can depart at the right time?

  28. Intuition for Theorem 1 [Figure: packets with departure times DT = t and DT = t + X arriving to a bank of memories at time t.] • Only one packet can enter a memory at time t. • Only one packet can enter or leave a memory at time t. • Only one packet can enter or leave a memory at any given time.

  29. Proof of Theorem 1 • When a packet arrives in a time slot it must choose a memory not chosen by: • the N – 1 other packets that arrive in that time slot, • the N packets that depart in that time slot, • the N – 1 other packets that will depart at the same (future) time as this packet. • Proof: these rule out at most 3N – 2 memories, so by the pigeon-hole principle, 3N – 1 memories that can perform at most one memory operation per time slot are sufficient for the router to be work-conserving.
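The following simulation sketch (illustrative, under the slide's assumptions: at most N arrivals and N departures per slot, and departure times known on arrival) greedily assigns each arriving packet to a conflict-free memory and checks that K = 3N – 1 always suffices:

```python
import random
from collections import Counter

def simulate(N=4, slots=200, seed=0):
    K = 3 * N - 1                      # Theorem 1's bound
    rng = random.Random(seed)
    stored = [set() for _ in range(K)] # departure times held in each memory
    per_dt = Counter()                 # cells scheduled per departure slot
    for t in range(slots):
        # Reads: every cell due now leaves; one read per memory suffices
        # because no memory holds two cells with the same departure time.
        reads = {m for m in range(K) if t in stored[m]}
        for m in reads:
            stored[m].discard(t)
        writes = set()
        for _ in range(N):             # up to N arrivals in this slot
            dt = t + rng.randint(1, 50)
            while per_dt[dt] >= N:     # an OQ schedule departs <= N per slot
                dt = t + rng.randint(1, 50)
            free = [m for m in range(K)
                    if m not in reads and m not in writes
                    and dt not in stored[m]]
            assert free, "3N - 1 memories were not enough!"
            stored[free[0]].add(dt)
            writes.add(free[0])
            per_dt[dt] += 1
    print(f"N={N}: no conflicts over {slots} slots with K={K} memories")

simulate()
```

The greedy choice never gets stuck because at most (N – 1) + N + (N – 1) = 3N – 2 memories are ever ruled out.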

  30. The Parallel Shared Memory Router [Figure: N = 3 ports (A, B, C) and K = 8 shared memories; arriving packets (A5, B1, C3, ...) are written to memories and read at their departure times.] • At most one operation per memory per time slot: a write or a read. • From Theorem 1, k = 7 memories don’t suffice, but 8 = 3N – 1 memories do.

  31. Distributed Shared Memory Router [Figure: a switch fabric connecting line cards 1 through N, each holding memories, with line rate R per port.] • The central memories are moved to the distributed line cards and shared. • Memory and line cards can be added incrementally. • From Theorem 1, 3N – 1 memories that can each perform one operation per time slot, i.e., a total memory bandwidth of 3NR, suffice for the router to be work-conserving.

  32. Corollary 1 • Problem: what switch bandwidth is needed for a work-conserving DSM router? • Corollary 1 (sufficiency): a switch bandwidth of 4NR is sufficient for a distributed shared memory router to be work-conserving. • Proof: each line card handles a maximum of 3 memory accesses and 1 port access per time slot, i.e., 4R of fabric bandwidth.

  33. Corollary 2 • Problem: what switching algorithm is needed for a work-conserving DSM router? • Bus: no algorithm needed, but impractical. • Crossbar: an algorithm is needed, because only permutations are allowed. • Corollary 2 (existence): an edge-coloring algorithm can switch packets for a work-conserving distributed shared memory router. • Proof: any bipartite graph with maximum degree D has an edge coloring (a decomposition into permutations) with D colors; for a DSM router, D = 4 (3 memory accesses and 1 port access). A sketch follows.
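A sketch of such a coloring, assuming the classic alternating-path construction behind König's theorem (the slide only asserts existence, so this is one possible implementation; the vertex names are made up):

```python
from collections import defaultdict

def edge_color(edges, D):
    """Color a bipartite (multi)graph of maximum degree D with D colors,
    using the alternating-path argument behind Konig's theorem."""
    at = defaultdict(dict)        # vertex -> {color: incident edge index}
    col = {}                      # edge index -> color
    for i, (u, v) in enumerate(edges):
        a = next(c for c in range(D) if c not in at[u])  # color free at u
        b = next(c for c in range(D) if c not in at[v])  # color free at v
        if a != b:
            # Walk the maximal a/b-alternating path from v; in a bipartite
            # graph it never reaches u, so swapping a <-> b frees a at v.
            x, c, path = v, a, []
            while c in at[x]:
                e = at[x][c]
                path.append(e)
                p, q = edges[e]
                x = q if p == x else p
                c = b if c == a else a
            for e in path:                     # remove the old colors...
                p, q = edges[e]
                del at[p][col[e]], at[q][col[e]]
            for e in path:                     # ...then re-insert, swapped
                col[e] = b if col[e] == a else a
                p, q = edges[e]
                at[p][col[e]] = at[q][col[e]] = e
        col[i] = a
        at[u][a] = at[v][a] = i
    return col

# A DSM-style transfer graph with maximum degree D = 4
# (3 memory accesses + 1 port access per line card); names are made up.
edges = [("in0", "out0"), ("in0", "out1"), ("in0", "out2"), ("in0", "out3"),
         ("in1", "out0"), ("in1", "out1"), ("in2", "out0"), ("in3", "out0")]
print(edge_color(edges, D=4))    # each color = one crossbar configuration
```

Each color class is a matching, i.e., one crossbar configuration, so D = 4 means at most four fabric configurations per time slot.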

  34. Summary – Routers with 100% Throughput

  Switch               Fabric    # Mem.  Mem. BW     Total Mem. BW  Switch BW  Algorithm
  Output-Queued        Bus       N       (N+1)R      N(N+1)R        NR         None
  Shared Memory        Bus       1       2NR         2NR            2NR        None
  Input-Queued         Crossbar  N       2R          2NR            NR         MWM
  CIOQ (Cisco)         Crossbar  2N      3R          6NR            2NR        Maximal
  CIOQ                 Crossbar  2N      3R          6NR            3NR        Time Reserve*
  PSM                  Bus       k       3NR/k       3NR            3NR        C. Sets
  DSM (Juniper)        Xbar      N       3R          3NR            4NR        Edge Color
  DSM                  Xbar      N       3R          3NR            6NR        C. Sets
  DSM                  Xbar      N       4R          4NR            4NR        C. Sets
  PPS – OQ             Clos      Nk      2R(N+1)/k   2N(N+1)R       4NR        C. Sets
  PPS – OQ             Clos      Nk      4NR/k       4NR            4NR        C. Sets
  PPS – Shared Memory  Clos      Nk      2NR/k       2NR            2NR        None
