1 / 21

Lecture 2: System Metrics and Pipelining

Lecture 2: System Metrics and Pipelining. Today’s topics: (Sections 1.6, 1.7, 1.9, A.1) Quantitative principles of computer design Measuring cost and dependability Introduction to pipelining. Amdahl’s Law. Architecture design is very bottleneck-driven – make the

Download Presentation

Lecture 2: System Metrics and Pipelining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 2: System Metrics and Pipelining • Today’s topics: (Sections 1.6, 1.7, 1.9, A.1) • Quantitative principles of computer design • Measuring cost and dependability • Introduction to pipelining

  2. Amdahl’s Law • Architecture design is very bottleneck-driven – make the • common case fast, do not waste resources on a component • that has little impact on overall performance/power • Amdahl’s Law: performance improvements through an • enhancement is limited by the fraction of time the • enhancement comes into play • Example: a web server spends 40% of time in the CPU • and 60% of time doing I/O – a new processor that is ten • times faster results in a 36% reduction in execution time • (speedup of 1.56) – Amdahl’s Law states that maximum • execution time reduction is 40% (max speedup of 1.66)

  3. Principle of Locality • Most programs are predictable in terms of instructions • executed and data accessed • The 90-10 Rule: a program spends 90% of its execution • time in only 10% of the code • Temporal locality: a program will shortly re-visit X • Spatial locality: a program will shortly visit X+1

  4. Exploit Parallelism • Most operations do not depend on each other – hence, • execute them in parallel • At the circuit level, simultaneously access multiple ways • of a set-associative cache • At the organization level, execute multiple instructions at • the same time • At the system level, execute a different program while one • is waiting on I/O

  5. Factors Determining Cost • Cost: amount spent by manufacturer to produce a finished • good • High volume  faster learning curve, increased • manufacturing efficiency (10% lower cost if volume doubles), • lower R&D cost per produced item • Commodities: identical products sold by many vendors in • large volumes (keyboards, DRAMs) – low cost because of • high volume and competition among suppliers

  6. Wafers and Dies An entire wafer is produced and chopped into dies that undergo testing and packaging

  7. Integrated Circuit Cost • Cost of an integrated circuit = • (cost of die + cost of packaging and testing) / final test yield • Cost of die = cost of wafer / (dies per wafer x die yield) • Dies/wafer = wafer area / die area - p wafer diam / die diag • Die yield = wafer yield x (1 + (defect rate x die area) / a) -a • Thus, die yield depends on die area and complexity • arising from multiple manufacturing steps (a ~ 4.0)

  8. Integrated Circuit Cost Examples • A 30 cm diameter wafer cost $5-6K in 2001 • Such a wafer yields about 366 good 1 cm2 dies and 1014 • good 0.49 cm2 dies (note the effect of area and yield) • Die sizes: Alpha 21264 1.15 cm2 , Itanium 3.0 cm2 , • embedded processors are between 0.1 – 0.25 cm2

  9. Contribution of IC Costs to Total System Cost

  10. Defining Fault, Error, and Failure • A fault produces a latent error; it becomes effective when • activated; it leads to failure when the observed actual • behavior deviates from the ideal specified behavior • Example I : a programming mistake is a fault; the buggy • code is the latent error; when the code runs, it is effective; • if the buggy code influences program output/behavior, a • failure occurs • Example II : an alpha particle strikes DRAM (fault); if it • changes the memory bit, it produces a latent error; when • the value is read, the error becomes effective; if program • output deviates, failure occurs

  11. Defining Reliability and Availability • A system toggles between • Service accomplishment: service matches specifications • Service interruption: services deviates from specs • The toggle is caused by failures and restorations • Reliability measures continuous service accomplishment • and is usually expressed as mean time to failure (MTTF) • Availability measures fraction of time that service matches • specifications, expressed as MTTF / (MTTF + MTTR)

  12. The Assembly Line Unpipelined Start and finish a job before moving to the next Jobs Time A B C Break the job into smaller stages A B C A B C A B C Pipelined

  13. Performance Improvements? • Does it take longer to finish each individual job? • Does it take shorter to finish a series of jobs? • What assumptions were made while answering these • questions? • Is a 10-stage pipeline better than a 5-stage pipeline?

  14. Quantitative Effects • As a result of pipelining: • Time in ns per instruction goes up • Number of cycles per instruction goes up (note the increase in clock speed) • Total execution time goes down, resulting in lower time per instruction • Average cycles per instruction increases slightly • Under ideal conditions, speedup = ratio of elapsed times between successive instruction completions = number of pipeline stages = increase in clock speed

  15. A 5-Stage Pipeline

  16. A 5-Stage Pipeline Use the PC to access the I-cache and increment PC by 4

  17. A 5-Stage Pipeline Read registers, compare registers, compute branch target; for now, assume branches take 2 cyc (there is enough work that branches can easily take more)

  18. A 5-Stage Pipeline ALU computation, effective address computation for load/store

  19. A 5-Stage Pipeline Memory access to/from data cache, stores finish in 4 cycles

  20. A 5-Stage Pipeline Write result of ALU computation or load into register file

  21. Title • Bullet

More Related