1 / 78

Buses

Buses. A. Jantsch / Z. Lu / I. Sander . Dally: Chapter 22, 18. Some History….

keelty
Download Presentation

Buses

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Buses A. Jantsch / Z. Lu / I. Sander Dally: Chapter 22, 18

  2. Some History…

  3. ENIAC, short for Electronic Numerical Integrator and Computer, was the first large-scale, electronic, digital computer capable of being reprogrammed to solve a full range of computing problems, although earlier computers had been built with some of these properties. ENIAC60 years ago… Source: http:://www.embedded.com SoC Architecture

  4. ENIAC was designed and built to calculate artillery firing tables for the U.S. Army's Ballistics Research Laboratory. The first problems run on the ENIAC however, were related to the design of the hydrogen bomb. ENIAC60 years ago… Source: http:://www.embedded.com SoC Architecture

  5. ENIAC contained 17,468 vacuum tubes, 7,200 crystal diodes, 1,500 relays, 70,000 resistors, 10,000 capacitors and around 5 million hand-soldered joints It weighed 27 tons, was roughly 2.4 m by 0.9 m by 30 m, took up 167 m² ENIAC60 years ago… Source: http:://www.embedded.com SoC Architecture

  6. ENIAC consumed 150 kW of power or about the same as 2000 Pentium 4 chips. That's 500 million times the power used by some of TI's MSP430 processors when operating at 1 MHz (200 times the ENIAC's clock rate of 5kHz). It took up to 29 milliseconds to do a division. The Pentium 4 is a million times faster and twice as precise. ENIAC60 years ago… Source: http:://www.embedded.com SoC Architecture

  7. The machine cost $500k in 1946 dollars, equivalent to $5,134,000 today Now some variants of Microchip's PIC10 go for $0.39 ENIAC60 years ago… Source: http:://www.embedded.com SoC Architecture

  8. Classic Bus

  9. Introduction • Buses are the simplest and most widely used interconnection networks • A number of modules is connected via a single shared channel Digital Signal Processor Input/ Output Device Micro- controller Memory Bus SoC Architecture

  10. 2 1 Bus Properties • Serialization • Only one component can send a message at any given time • There is a total order of messages Digital Signal Processor Input/ Output Device Micro- controller Module 1 Module 2 Module 3 Module 4 Memory Bus Bus SoC Architecture

  11. Bus Properties • Broadcast • A module can send a message to several other components without an extra cost Module 1 Module 2 Module 3 Module 4 Bus SoC Architecture

  12. Module 1 Module 2 Module 3 Reg Reg Reg ER ER ER ET ET ET Bus Hardware • Principle for hardware to access the bus • Bus Transmit: ET active • Bus Receive: ER active Bus SoC Architecture

  13. Bus Transmitter Interfaces ET Bus T Bus ET ET T T Dotted emitter driver Tri-state driver Open-drain driver SoC Architecture

  14. Cycles, Messages and Transactions • Buses operate in units of cycles, messages and transactions. • Message: Logical unit of information (a read message contains an address and control signals for read) • Cycles: A message requires a number of cycles to be sent from sender to receiver over the bus • Transaction: A transaction consists of a sequence of messages which together form a transaction (a memory read requires a memory read message and a reply with the requested data) SoC Architecture

  15. Includes a clock in the control lines A fixed protocol for communication that is relative to the clock Advantage: involves very little logic and can run very fast Disadvantages: Every device on the bus must run at the same clock rate To avoid clock skew, they cannot be long if they are fast Synchronous Bus CLK READ ADR DATA SoC Architecture

  16. It is not clocked It can accommodate a wide range of devices It can be lengthened without worrying about clock skew It requires a handshaking protocol Asynchronous Bus READ ADR DATA ACK Master puts address on bus and asserts READ when address is stable Memory puts data on bus and asserts ACK when data is stable Master deasserts READ when data is read Memory deasserts ACK SoC Architecture

  17. Bus Arbitration

  18. Bus Arbitration • Since only one bus master can use the bus at a given time bus arbitration is used • An arbiter collects the requests of all bus masters and gives only one module the right to access the bus (bus grant) Arbiter Req Req Req Grant Grant Grant Module 1 Module 2 Module 3 Bus SoC Architecture

  19. Importance of Arbiters • Arbiters are not only used in bus-system, but everywhere where several devices request shared resources • In network-on-chips arbitration is for instance needed, if two or more packets want to enter the same channel SoC Architecture

  20. This arbiter interface can be used to give a bus grant for a fixed number of cycles (a): 1 cycle (b): 4 cycles Arbiter Interfaces SoC Architecture

  21. Arbiter Interfaces • This arbiter allows for variable length grants • The grant is hold as long as the “hold”-line (controlled by client) is asserted • In cycle 2 requester 0 gets the bus for 3 cycles • In cycle 5 requester 1 gets the bus for 2 cycles • In cycle 7 requester 1 gets the bus for one cycle SoC Architecture

  22. Fairness • Fairness is a key property of an arbiter • Some definitions: • Weak fairness: Every request is eventually served • Strong fairness: Requests will be served equally often • Weighted “strong” fairness: The number of times requester i is served is equal to its weight wi • FIFO fairness: Requests are served in the order the requests have been made SoC Architecture

  23. Local Fairness vs. Global Fairness • Even if an arbiter is locally fair, a system with several arbiters employing that arbiter may not be fair. • Though each arbiter Ai allocate 50% of their bandwidth to its two inputs, r0only gets 12.5% of the total bandwidth, while r3 gets 50%. SoC Architecture

  24. Fixed-Priority Arbiter • A fixed-priority arbiter can be constructed as an iterative circuit • Each cell receives a request input ri and a carry input ci and generates a grant output gi and a carry output ci+1 • The resulting arbiter is not fair, since a continuously asserted request r0 means that none of the other requests will ever be served! SoC Architecture

  25. Fair Arbiters • A fair arbiter can be generated by changing the priority from cycle to cycle • Depending on the priority generation, different arbitration schemes and degrees of fairness can be achieved • Only one input pi has the value 1. All other inputs pj have the value 0. SoC Architecture

  26. Fair Arbiters • Oblivious Arbiters • If pi is generated without knowledge of ri and gi, the result is an oblivious (unconscious) arbiter • Examples are: • Randomly generated pi • Rotating priorities(by shiftregister) • Weak fairness, but not strong fairness SoC Architecture

  27. 1 0 1 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0 1 Oblivious Arbiters • Oblivious arbiters provide • weak fairness • but not strong fairness (i.e. if r0 and r1 are constantly asserted) • Request r1 wins the arbitration only when p1 is true, in all other cases r0 gets the grant SoC Architecture

  28. 0 0 0 0 1 0 0 0 0 1 1 0 0 Round-Robin Arbiter • A round-robin arbiter achieves strong fairness • A request that was just served gets the lowest priority SoC Architecture

  29. Weighted Round-Robin Arbiter • A weighted round-robin arbiter allows to give requesters a larger number of grants than other requesters in a controlled fashion • If three devices have the weight 1,2,3 they get 1/6, 1/3 and 1/2 of the grants • The preset line is activated periodically after N (here 6 cycles) to load the counter with its weight • If some modules do not issue any requests during that interval, the shared resource will remain idle until the next preset cycle SoC Architecture

  30. gi gi wij ij gj gi wij gj Matrix Arbiter • A matrix arbiter implements a least recently served priority scheme by maintaining a triangular array of state bits wijfor all i < j • If wijis true, then request i takes priority over request j • Each state bit is set on column grant and reset on row grant= a gi results in lowest priority for stage i in next cycle • Only the upper triangular portion needs to be maintained • The matrix arbiter has to be proper initialized • The Matrix arbiter is very good suited for a small number of inputs, since it is fast, easy to implement and provides strong fairness! • (Exercise Dally 18-3) SoC Architecture

  31. Grand-Hold Circuit • Allows for uninterrupted access to a resource for several cycles • Extends the duration of a grant • As long as hold is asserted further arbitration is disabled SoC Architecture

  32. Queuing Arbiter • A queuing arbiter provides FIFO fairness • It assigns each request a time stamp when it is asserted • The request with the earliest time stamp receives the grant • Cost is determined by size of the time stamp wi = log2 (Δt / ta) Δt = 2nTmax wi… number of bits for time stamp Δt… time stamp range ta … arrival interval n … number of inputs to arbiter Tmax … maximum service time SoC Architecture

  33. Bus Bridge

  34. Bus Bridges • Bus bridges are used to separate high-performance devices from low-performance devices • All communication from high-performance bus with the low performance device goes via the bridge SoC Architecture

  35. AHB to ISA Bus Bridge SoC Architecture

  36. AHB Basic Transfer SoC Architecture

  37. AHB and ISA Timing SoC Architecture

  38. Bridge Implementation SoC Architecture

  39. Bus Protocols

  40. Low Performance Bus Protocol • Without a special bus protocol the bus is not efficiently used • In the example module 2 requests the bus in cycle 2, but must wait until cycle 6 to receive the grant SoC Architecture

  41. RPLY ARB ACK ARB RQ RQ AG AG AR AR P Bus Pipelining • A memory access consists of several cycles (including arbitration) • Since the bus is not used in all cycles, pipelining can be used to increase the performance Write Access Read Access Arb request Arb request Arbiter Arbiter Arb grant Arb grant Bus Bus • Only one transaction can • Receive the grant during a given cycle • Use the bus during a given cycle SoC Architecture

  42. RPLY RPLY RPLY ARB ARB ARB ARB ARB ARB ACK ACK Stall Stall Stall Stall Stall Stall Stall Stall Stall Stall Stall Stall Stall Stall RQ RQ RQ RQ RQ RQ AG AG AG AG AG AG AR AR AR AR AR AR P P P 1. Read Bus busy Bus Pipelining • Pipelining leads to an efficient use of the bus • Stalls are inserted since only one instance can use the bus • Sometimes (cycle 12) two transactions can overlap • However this cannot be done in cycle 5 (2. Write) since otherwise RPLY and ACK would overlap in cycle 6! 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 2. Write 3. Write 4. Read 5. Read 6. Read SoC Architecture

  43. Split-Transaction Bus • In a split-transaction bus a transaction is splitted into a two transactions • ”request”-transaction • ”reply”-transaction • Both transactions have to compete for the bus by arbitration SoC Architecture

  44. RPLY RPLY RPLY RPLY ARB ARB ARB ARB ARB ARB ARB ARB ARB ARB ARB ARB Stall Stall Stall Stall Stall Stall Stall Stall RQ RQ RQ RQ RQ RQ AG AG AG AG AG AG AG AG AG AG AG AR AR AR AR AR AR AR AR AR AR AR AR Split-Transaction Buses 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 1. Read 2. Write 3. Write 4. Read 1. Reply 5. Read 2. Reply 6. Read 3. Reply 4. Reply 5. Reply 6. Reply 1 2 Bus busy 3 4 1 2 3 4 5 6 SoC Architecture

  45. RQ C RQ C RQ B RQ B RQ A RQ A RP C RP C RP A RP B RP A RP B Split-Transaction Buses • The advantages of the split-transaction bus are evident, if there is a variable delay for requests. Pipelined Bus 2 3 4 5 6 7 8 9 10 11 12 13 14 1 1. Trans 2. Trans 3. Trans Split-Transaction Bus 1. Trans 2. Trans 3. Trans SoC Architecture

  46. Cmd Cmd Cmd Cmd Data Data Data Data ARB ARB ARB ARB Adr Adr Adr Adr Burst Messages • There is a considerable amount of overhead in a bus transaction • Arbitration • Addressing • Acknowledgement Request Efficiency = Transmitted Words / Message Size = 1/3 SoC Architecture

  47. Cmd Cmd Cmd Cmd Cmd Data Data Data Data Data Data Data Data ARB ARB ARB ARB ARB Adr Adr Adr Adr Adr Burst Messages • The overhead can be reduced, if messages are sent as blocks (bursts) Request Burst Request Efficiency = Transmitted Words / Message Size = 2/3 SoC Architecture

  48. Res A Gnt A Gnt B Cmd Cmd Data Data Data Data Data Data Data Data Data Adr Adr Adr Burst Messages • The longer the burst, the better the efficiency • BUT • Other bus masters have to wait, which may be unacceptable in many systems (Real-Time) • Possible solution: • Maximum length for a burst • Interrupt of long messages • Restart or Resume Arbitration Message A Message B SoC Architecture

  49. Modern SoC buses

  50. Embedded busses • Current system-on-chips are advanced enough to need a hierarchy of busses • A new set of bus standards have been defined to be used in SoCs, e.g. • ARM Amba • Altera Avalon • OCP – Open Communication Protocol • These busses allow for higher performance than traditional Tri-State busses SoC Architecture

More Related