

  1. EECS 470 Busses in the real world Lecture 22 – Fall 2013

  2. Today’s lecture • I want to talk about interconnects. • There are tons of them on a computer • Some to memory • Some to I/O • There have been lots of implementations • I’m going to talk mostly about two older ones: • PCI and P6 • But I’m also going to talk a bit about newer ones • QuickPath and (very little) PCI Express • Let’s first look at the big picture.

  3. Various buses

  4. Basic bus issues • What are the basic wires for specifying the transaction and moving the data? • What are the types of transactions? How are they specified? • How is length of data transfer specified? • Who can delay (insert wait states)? • How is arbitration done? • Out-of-order transfers allowed? • Any restrictions? • Error reporting? • Weirdness? • Alignment, for example.

  5. Transaction types • Usually read/write with a length • But in a given domain, other info might be important. • Data vs. Code access. • I/O vs. memory access • Hints to target device • Length might be arbitrary.

  6. Delaying • Who can delay and how • Usually a target (slave) can delay • Sometimes initiator (master) can delay • Sometimes initiator can drop the transaction • Sometimes the target has options on how to delay.

  7. Arbitration • Fairness • Even sharing, priority sharing, weighted sharing • Mechanism • Centralized arbiter • Distributed arbiter • Combination • Duration • Until done • Until someone else requests • Until certain time passes. • Combination
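
To make the “mechanism” and “fairness” bullets a bit more concrete, here is a minimal sketch (in Python, with a hypothetical four-device setup) of the simplest centralized scheme, a fixed-priority arbiter. It also shows why fairness has to be designed in: with this scheme device 0 can starve everyone else.

```python
def fixed_priority_grant(requests):
    """Centralized fixed-priority arbiter: the lowest-numbered requester wins.

    requests -- list of booleans; requests[i] is True if device i wants the bus.
    Simple, but not fair: device 0 can starve everyone else.
    """
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device
    return None  # nobody is requesting

# Devices 1 and 2 both request; device 1 wins every time under fixed priority.
print(fixed_priority_grant([False, True, True, False]))  # -> 1
```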

  8. Out-of-order • Does the bus allow transactions to complete out-of-order? • If so, can increase bandwidth (why?) • If so, might have to worry about ordering issues • Memory consistency models not a topic for this class (take EECS 570!) but basics are pretty easy to grasp

  9. [Diagram: two processors, each with an L2 cache on a backside bus (BSB), share the P6 bus to the chipset, which connects to memory and to PCI] • PCI stands for “Peripheral Component Interconnect” • Many cards you plug into a computer are PCI (most network cards, older graphics cards, etc.) • Normal configurations have PCI as a 33MHz bus with 32 shared address/data lines. • This is based on version 2.1 of the PCI spec. • Changes with 3.0 and 2.3 are fairly minor from our viewpoint.

  10. Speeds • Conventional PCI is at version 2.3 • Basic version is 32-bits at 33MHz and 5 volts • Version 2.1 allowed 5V or 3.3V and up to 64bit 66MHz • PCI-X • Backwards compatible (but not 5 volts apparently) • Up to 533MHz with only 1 load

  11. [Diagram: required signals of a PCI master device — AD[31:0] and C/BE#[3:0] (address/data and command), PAR; REQ# and GNT# (arbitration); FRAME#, TRDY#, IRDY#, STOP#, DEVSEL# (interface control); PERR#, SERR# (error reporting); CLK, RST# (system)]

  12. Basics • AD[31:0] bus is for the address and the data • The C/BE#[3:0] is the Command in the address phase and the Byte Enable in the data phase • FRAME#, TRDY#, IRDY# are the main control signals. • Other signals: • PAR is even parity over the AD and C/BE# buses. • PERR# and SERR# are Parity and System error reporting • CLK is clock • RST# is a request to reset all devices.
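
Since PAR is defined as even parity over the AD and C/BE# buses, here is a small sketch of how a device could compute it (Python; the function name and example values are just for illustration):

```python
def compute_par(ad, cbe):
    """Even parity bit over AD[31:0] and C/BE#[3:0].

    PAR is driven so that the total number of 1s across AD[31:0],
    C/BE#[3:0], and PAR itself is even.
    """
    ones = bin(ad & 0xFFFFFFFF).count("1") + bin(cbe & 0xF).count("1")
    return ones & 1  # 1 means an extra 1 is needed to make the count even

# Example: an address phase carrying address 0x1000 with command 0x6.
print(compute_par(0x1000, 0x6))  # -> 1
```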

  13. AD and C/BE • AD • First phase is address • Everything afterwards is data • C/BE# • First phase is command • Rest is byte enable.

  14. Control • FRAME# is asserted during the first phase of the transaction and until the last data phase. • TRDY# indicates that the target has valid data on the bus (READ) or is able to accept data (WRITE) • IRDY# is the same as TRDY# but for the initiator.
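
Putting those three signals together: data moves only on clock edges where both IRDY# and TRDY# are asserted, and either side can insert wait states by deasserting its ready line. A tiny cycle-by-cycle sketch (Python; signals are modeled as plain booleans meaning “asserted,” ignoring that the real pins are active-low):

```python
# Hypothetical per-clock ready signals for one data burst.
irdy = [True,  True,  True,  False, True]   # initiator ready each clock
trdy = [False, True,  True,  True,  True]   # target ready each clock
data = iter(["D1", "D2", "D3"])             # words the target will supply

for clk, (i_rdy, t_rdy) in enumerate(zip(irdy, trdy), start=1):
    if i_rdy and t_rdy:
        print(f"clk {clk}: transfer {next(data)}")   # both ready: a word moves
    else:
        print(f"clk {clk}: wait state")              # someone is not ready
```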

  15. Simple PCI read transaction [Waveform: CLK; AD[31:0] carries the address, then data D1, D2, D3; C/BE# carries the command, then byte enables; FRAME#, IRDY#, TRDY# frame the transfer]

  16. Simple PCI read transaction (cont.) [Waveform: as on the previous slide, but D2 is held for an extra clock, i.e. a wait state is inserted]

  17. Deep thoughts with Mark • Notice that the length of the transaction is not specified explicitly • Starts at the given address. Keeps giving next data until done. • But this makes things hard for the target. How much data should be fetched? • So the various read commands give hints about how much data to move • Read is for less than a cache line • Read line is for a cache line or so • Read multiple is for more than 1 or 2 cache lines
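
A sketch of how an initiator might turn the intended transfer length into one of those hint commands (Python; the 64-byte line size and the exact thresholds are assumptions for illustration, not something the PCI spec fixes):

```python
CACHE_LINE = 64  # bytes; assumed line size for this example

def pick_read_command(nbytes):
    """Choose a PCI read command as a hint about how much data will move."""
    if nbytes < CACHE_LINE:
        return "Memory Read"            # less than a cache line
    elif nbytes <= 2 * CACHE_LINE:
        return "Memory Read Line"       # a cache line or so
    else:
        return "Memory Read Multiple"   # more than 1 or 2 cache lines

for n in (16, 64, 512):
    print(n, "bytes ->", pick_read_command(n))
```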

  18. More deep thoughts • It turns out many NIC cards did things in a really wacky way. • They would read (and/or write) 4KB pages by reading 4 bytes, going away, reading 4 more bytes • This caused significant problems on high-end (web) server performance. • But was okay on most workstations/desktops. • Moral: There is a cost vs. performance trade-off on almost everything you do. Be sure to consider the ramifications of solving the problem for only one domain.

  19. Try to draw the write… [Blank waveform to fill in: CLK, AD[31:0], FRAME#, C/BE#, IRDY#, TRDY#]

  20. One solution [Waveform: CLK; AD[31:0] carries the address, then D1–D4; C/BE# carries the command, then byte enables; FRAME#, IRDY#, TRDY# as in the read]

  21. PCI write types • Memory Write • Just says it’s going to write • Memory Write and Invalidate • Writing • Promises (100%!) that the write will start and end on cache line boundaries. • Why is this useful?
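
On the “why is this useful” question: a common answer is that if the initiator promises to overwrite whole, aligned cache lines, a snooping cache that holds a line Modified can simply invalidate it rather than writing it back first. A sketch of how an initiator might choose between the two commands (Python; the 64-byte line size is an assumption):

```python
CACHE_LINE = 64  # bytes; assumed line size for this example

def pick_write_command(addr, nbytes):
    """Memory Write and Invalidate is only legal for whole, aligned cache lines."""
    whole_lines = (addr % CACHE_LINE == 0) and (nbytes > 0) and (nbytes % CACHE_LINE == 0)
    return "Memory Write and Invalidate" if whole_lines else "Memory Write"

print(pick_write_command(0x1000, 128))  # aligned, two full lines -> Write and Invalidate
print(pick_write_command(0x1004, 128))  # misaligned start        -> plain Memory Write
```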

  22. PCI Arbitration • Hidden phase • That is, done in parallel with transfers • Centralized arbiter • Arbitration algorithm unspecified, but must be “fair” • Fair isn’t all that fair… • REQ#, GNT# • Get bus when GNT# asserted, and FRAME#, TRDY#, IRDY# not asserted. • Must give up when GNT# de-asserted, within some reasonable time. • Notice, arbiter has separate grant and request lines for each PCI master…
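
From the master’s side, the acquisition rule on this slide boils down to a simple check: you have the grant and the bus is idle. A sketch (Python; booleans mean “asserted,” ignoring the active-low pins):

```python
def may_start_transaction(gnt, frame, irdy, trdy):
    """Per the rule above: take the bus when GNT# is asserted and
    FRAME#, IRDY#, and TRDY# are all deasserted (bus idle)."""
    bus_idle = not (frame or irdy or trdy)
    return gnt and bus_idle

print(may_start_transaction(gnt=True, frame=False, irdy=False, trdy=False))  # True: go
print(may_start_transaction(gnt=True, frame=True,  irdy=False, trdy=False))  # False: busy
```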

  23. Ordering • PCI target can say “go away” • STOP# signal • Initiator is obligated to come back to finish the request. • (Notice that with FRAME# the target can tell whether the transaction was done anyway) • Any ordering restrictions are not PCI’s problem.

  24. Basic bus issues: PCI? • What are the basic wires for specifying the transaction and moving the data? • What are the types of transactions? How are they specified? • How is length of data transfer specified? • Who can delay (insert wait states)? • How is arbitration done? • Out-of-order transfers allowed? • Any restrictions? • Error reporting? • Weirdness? • Alignment, for example.

  25. Basics of the P6 bus [Diagram: two processors with L2 caches on backside buses (BSB), the P6 bus, the chipset, memory, and PCI] • The goal of the P6 bus is to allow communication among the processors and the chipset • Transactions are directed toward the chipset. • All of the processors “snoop” the bus. • It uses about 170 pins total

  26. Basics of the P6 bus (cont.) • There are generally 6 phases of a transaction. • Arbitration - ask to use the bus • Request - Send transaction details (R/W, size) • Error - parity error on request, mainly • Snoop - let other processors get involved • Response - The “Ack” • Data transfer - Actual movement of data • In general, the same phase of two consecutive transactions is separated by 3 clocks.

  27. Why bother? • The goal of this part of the presentation is to expose you to a more complex bus. • The bus is a true “split-transaction” bus • That is, it is pipelined. • Increased bandwidth due to overlapping of accesses • No real impact on latency (why?) • It is the most complex bus I’m aware of. • Newer versions of the bus (P3, P4, Itanium) have some changes, but basics are the same.

  28. Bus protocol • Each device on the bus has to be fairly sophisticated. • Arbitration is handled without a centralized arbiter. • Each device must keep track of the order of the transactions and which transaction is in what stage. • This ordering is called the “In order queue” or IOQ. • In addition there are “Out of Order” transactions. • These are used for transactions which are likely to take a while. (So they don’t interfere with the others)

  29. Timing between phases [Diagram: transactions 1 and 2 pipelined through the ARB, REQ (2 clocks), ERROR, SNOOP, RESP, and DATA phases over 16 clocks; corresponding phases are separated by one or more clock ticks]

  30. Phase 1: Arbitration • The arbitration phase mainly involves 5 pins • BREQ#[0:3] - Symmetric agent request • BPRI# - Priority agent request • Each processor keeps track of a rotating ID • The rotating ID is the last device to perform a bus transaction • Each device is only allowed to perform one transaction at a time if other devices also want to use the bus • If more than one device wants to use the bus the winner is the device which is “next” • So if the current ID is 2 the priorities are 3, 0, 1, 2 • If it is 0 the order is 1,2,3,0.
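
A sketch of that rotating-priority rule: the winner is the requesting agent that comes “next” after the current rotating ID (Python; four symmetric agents, as on the slide):

```python
def symmetric_winner(breq, rotating_id):
    """Pick the next bus owner among the four symmetric agents.

    breq        -- list of 4 booleans; breq[i] is True if agent i asserts BREQ#[i]
    rotating_id -- the agent that most recently owned the bus
    Priority order is rotating_id+1, rotating_id+2, ... wrapping around.
    """
    for offset in range(1, 5):
        agent = (rotating_id + offset) % 4
        if breq[agent]:
            return agent
    return None  # nobody is requesting

# From the slide: if the current ID is 2, the priority order is 3, 0, 1, 2.
print(symmetric_winner([True, False, False, True], rotating_id=2))  # -> 3
print(symmetric_winner([False, True, True, False], rotating_id=0))  # -> 1
```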

  31. The arbitration rules • The device must continue to hold BREQ# asserted until the clock before it asserts ADS# (starts the transaction) • Once a device starts its transaction it must deassert its BREQ# line if any other BREQ# line is asserted. • On the clock its BREQ# is deasserted, all devices re-compute which device will be allowed to go next. • Each agent updates its rotating ID after it deasserts BREQ# • If the bus is idle then it can assert ADS# two clocks after winning arbitration. • ADS# assertions must be at least 3 cycles apart.

  32. Symmetric arbitration example (with bus parking…) [Waveform over 16 clocks: BREQ0–BREQ3, ADS# (transactions 0a, 1a, 2a, 0b, 0c), the rotating ID advancing 3 → 0 → 1 → 2 → 0, and an Active? row]

  33. Exercise: agent 0 has a request at times 2 and 9, agent 1 has a request at time 1, agent 2 has a request at time 7 [Blank waveform over 16 clocks to fill in: CLK, BREQ0–BREQ3, ADS#, rotating ID (starting at 3), Active?]

  34. Request • There are about 40 pins involved in this phase • The phase lasts 2 clocks. • The total of 80 signals (2 x 40) includes: • 36-bit address • Type of transaction • Byte enables • Size of transaction • Code/Data info • The ADS# signal “qualifies” the request signals • It is low during the first clock of the request phase.

  35. Error • Fairly trivial. • Parity on the request is checked. • If the parity check fails, AERR# is asserted. • All transactions in the IOQ are canceled and everything starts over. • I believe current implementations may crash with a parity error at this point.

  36. Snoop • There are only 3 signals in the snoop phase • HIT#, HITM#, and DEFER# • Nonetheless the snoop phase is the most complex part of the whole P6 protocol. • HIT# • If a processor has the data in its cache in the Shared or Exclusive state it asserts HIT# • HITM# • Is asserted by a processor if it has a “dirty” or Modified version of the data in its cache

  37. Snoop (cont.) • DEFER# • Is only asserted by the chipset (or perhaps by some other priority agent). • It says that the chipset wants to pull this transaction out of the IOQ because it could take a while to respond. • DEFER# can also result in a “retry” request • If both HIT# and HITM# are asserted • It is a snoop stall (i.e., an agent on the bus could not respond to the request in time) • Snoop results are re-checked in 2 clocks

  38. Snoop (cont.) • If HITM# and DEFER# are asserted • DEFER# is ignored. • If HITM# is asserted • The processor asserting HITM# is responsible for supplying the data • The chipset is expected to “snarf” the data (i.e., copy it into DRAM) as it passes by. • Once the snoop phase has happened and DEFER# has not been asserted, the transaction must complete.
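
Pulling slides 36–38 together, here is a sketch of how the snoop-phase result could be resolved from the three wires (Python; the return values are descriptive strings, not real bus encodings, and the “miss/clean hit” behavior is a simplification):

```python
def resolve_snoop(hit, hitm, defer):
    """Resolve the P6 snoop phase from HIT#, HITM#, DEFER# (True = asserted)."""
    if hit and hitm:
        # Both asserted: an agent could not produce its snoop result in time.
        return "snoop stall: re-check snoop results in 2 clocks"
    if hitm:
        # HITM# wins even if DEFER# is also asserted: the owning processor
        # supplies the data and the chipset snarfs it into DRAM.
        return "implicit writeback: modified line supplied by the snooping CPU"
    if defer:
        return "deferred (or retried): chipset pulls the transaction out of the IOQ"
    if hit:
        return "clean hit: line is Shared/Exclusive elsewhere; memory supplies the data"
    return "miss: memory supplies the data"

print(resolve_snoop(hit=False, hitm=True, defer=True))  # DEFER# is ignored here
```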

  39. Response Phase • This phase is mostly concerned with 3 signals, called RS[0:2]. • The 8 possible encodings of these signals cover the following possibilities: • Hard Failure -- Something went VERY wrong • Implicit Writeback -- HITM# was asserted • Deferred -- Transaction deferred • Retry -- Only if DEFER# was asserted • Normal Data -- Standard response • No Data -- Transaction requires no data
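
The slide names the six response types but not the actual RS[0:2] bit patterns, so rather than invent encodings, here is a hedged sketch that just maps a transaction’s state onto one of the named responses (Python; the selection logic is a simplification):

```python
def response_type(hard_failure, hitm, deferring, retrying, has_data):
    """Pick one of the six P6 response types named on the slide (simplified)."""
    if hard_failure:
        return "Hard Failure"        # something went VERY wrong
    if hitm:
        return "Implicit Writeback"  # a snooper owns the modified line
    if retrying:
        return "Retry"               # only possible if DEFER# was asserted
    if deferring:
        return "Deferred"            # chipset will complete the transaction later
    return "Normal Data" if has_data else "No Data"

print(response_type(False, True, False, False, True))    # -> Implicit Writeback
print(response_type(False, False, False, False, False))  # -> No Data
```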

  40. Data Phase • This phase consists of • 64 bits of data, D[0:63]# • DRDY# and TRDY# (pretty much the same as IRDY# and TRDY# on PCI) • All transactions are one of: • 0 bytes -- Invalidate • 8 bytes or less -- Write-through mode and uncacheable addresses will do this • 32 bytes -- moving a whole cache line • Which one it will be was determined during the request and snoop phases • What does the snoop phase have to do with it?

  41. P6 Review [Diagram repeated from slide 29: transactions 1 and 2 pipelined through the ARB, REQ, ERROR, SNOOP, RESP, and DATA phases; corresponding phases separated by one or more clock ticks]

  42. [Diagram: two processors with L2 caches on the front-side bus (processor bus) to the memory & I/O chipset; memory attaches to the chipset, and several I/O devices sit on the PCI bus (I/O bus)]

  43. QuickPath Interconnect • Here things are all point-to-point. • No shared bus • Can be as simple as a single processor talking to the chipset • Can be as complex as the picture shown. • Memory and I/O interfaces are different Largely taken from: https://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect and http://www.intel.ie/content/dam/doc/white-paper/quick-path-interconnect-introduction-paper.pdf

  44. Details • QPI has two 20-bit links, one in each direction • Each direction also has a clock • So 42 signals. • Each signal is a differential pair • Thus 84 pins. • 80-bit “flit” is the packet size. • Transferred in two clock cycles (four 20-bit transfers, two per clock). • The 80-bit “flit” has 8 bits for error detection, 8 bits for the “link-layer header,” and 64 bits for “data.” • Thus 8 bytes of usable information per 2 clocks in each direction.
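
A sketch of that flit layout — 8 bits of error detection, 8 bits of link-layer header, 64 bits of payload — and of how an 80-bit flit would be carried as four 20-bit transfers (Python; the field order and positions are assumptions for illustration):

```python
def pack_flit(crc8, header8, data64):
    """Pack an 80-bit flit; the field order (CRC | header | data) is assumed."""
    assert crc8 < (1 << 8) and header8 < (1 << 8) and data64 < (1 << 64)
    return (crc8 << 72) | (header8 << 64) | data64

def as_phits(flit80):
    """Split an 80-bit flit into four 20-bit transfers (two per clock)."""
    return [(flit80 >> (20 * i)) & 0xFFFFF for i in range(3, -1, -1)]

flit = pack_flit(crc8=0xA5, header8=0x01, data64=0x1122334455667788)
print([hex(p) for p in as_phits(flit)])  # the four 20-bit chunks, MSB first
```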

  45. Higher-level protocols • These 64-bit data payloads can be used for anything • Addresses • Data • Routing information • etc. • Wires are no longer dedicated to one thing (address, data, etc.) • More complex for hardware, but so what?

  46. Point-to-point? What about snooping? • We’re back to a network. • So we need a directory-based solution. • Uses a variation of MESI, called MESIF • F state is like Shared, but is allowed to supply clean data • Why? • Has two schemes for doing snooping. • Home snoop • Source snoop
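
On the “why?” for the F state: with plain MESI, a line held only in Shared state has no designated supplier, so reads of widely shared data keep going back to memory; MESIF designates exactly one clean copy (the Forwarder) to answer. A sketch of the idea (Python; simplified, and the handling of E is omitted):

```python
def who_supplies(line_states):
    """Given each cache's MESIF state for a line, decide who sends the data.

    At most one cache may hold the line in F (Forward); it supplies clean data
    so the request does not have to go all the way to memory.
    """
    if "M" in line_states:
        return "cache holding the Modified copy"
    if "F" in line_states:
        return "cache holding the Forward copy (clean data, no memory access)"
    return "memory"  # only S/I copies left: no one is designated to forward

print(who_supplies(["S", "F", "I", "S"]))  # -> the Forward copy responds
print(who_supplies(["S", "S", "I", "I"]))  # -> memory responds
```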

  47. Home snoop (1/2)

  48. Home snoop (2/2)

  49. Source Snoop (1/2)
