
It’s all about latency

It’s all about latency. Henk Neefs, Dept. of Electronics and Information Systems (ELIS), University of Gent. Overview: introduction of processor model; show importance of latency; techniques to handle latency; quantify memory latency effect; why consider optical interconnects?


Presentation Transcript


  1. It’s all about latency. Henk Neefs, Dept. of Electronics and Information Systems (ELIS), University of Gent

  2. Overview • Introduction of processor model • Show importance of latency • Techniques to handle latency • Quantify memory latency effect • Why consider optical interconnects? • Latency of an optical interconnect • Conclusions

  3. Out-of-order processor pipeline [Diagram: I-cache, fetch, decode, rename, instruction window, execution units (LD, ST, INT), ‘future’ register file, in-order retirement, architectural register file]

  4. Branch latency [Diagram: the pipeline of the previous slide with a BR unit added to the execution units (LD, ST, INT, BR); the instruction stream BR, ST, XOR, LD, OR, ADD plotted against time, with the branch latency marked]

  5. Eliminate branch latency • By prediction: predict outcome of branch => eliminate dependency (with a high probability) • By predication: convert control dependency to data dependency => eliminate control dependency
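To make "with a high probability" concrete, here is a minimal sketch of one common prediction scheme, a 2-bit saturating counter. The slides do not say which predictor is assumed, so the class name `TwoBitPredictor`, the helper `accuracy`, and the loop-branch pattern below are illustrative assumptions, not the author's model.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self) -> None:
        self.counter = 2  # assumed initial state: weakly taken

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes: list) -> float:
    """Fraction of branch outcomes that the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then falls through, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(accuracy(loop_branch))
```

Only the loop exit is mispredicted in each round, so the branch dependency is removed for the remaining iterations.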

  6. Load latency • Source: while (pointer!=0) pointer = pointer.next; • Assembly: Loop: LD R1, R1(32); BNE R1, Loop • load latency = 2 cycles • branch latency = 1 cycle • CPI = 2 cycles/2 instructions = 1 cycle/instruction [Diagram: successive LD/BNE pairs on the execution units, plotted against time in cycles]

  7. When load latency is longer • When the L1 cache misses and the L2 cache hits: load latency = 2+6 cycles, branch latency = 1 cycle, CPI = 8 cycles/2 instructions = 4 cycles/instruction • When the L2 cache misses and main memory hits: load latency = 2+6+60 cycles, CPI = 34 cycles/instruction [Diagram: successive LD/BNE pairs on the execution units, plotted against time in cycles]
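The CPI figures on the two slides above follow from one observation: each LD depends on the previous LD, so one loop iteration (LD plus BNE) takes one load latency. A minimal sketch of that arithmetic, assuming the branch is fully overlapped with the load as the slide's numbers imply; the function name `pointer_chase_cpi` is made up for illustration.

```python
def pointer_chase_cpi(load_latency_cycles: int, instructions_per_iteration: int = 2) -> float:
    """CPI of the pointer-chasing loop when each iteration is bound by the load latency."""
    return load_latency_cycles / instructions_per_iteration


# Latencies taken from the slides (in cycles).
L1_HIT = 2
L2_HIT = 2 + 6
MEMORY_HIT = 2 + 6 + 60

for name, latency in [("L1 hit", L1_HIT), ("L2 hit", L2_HIT), ("main memory hit", MEMORY_HIT)]:
    print(f"{name}: load latency = {latency} cycles, CPI = {pointer_chase_cpi(latency)}")
```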

  8. Memory hierarchy [Diagram: execution units, register file, L1 cache, L2 cache, main memory, hard drive; storage capacity and latency increase with distance from the execution units]

  9. L1 cache latency [Chart] IPC = instructions per clock cycle; 1 GHz processor; SPEC95 programs

  10. Main memory latency [Chart] IPC = instructions per clock cycle; 1 GHz processor; SPEC95 programs

  11. Performance and latency performance change = sensitivity * load latency change
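A minimal sketch of how the linear relation above could be applied, assuming sensitivity is the slope of IPC versus load latency between two measurements. The slide does not say whether the changes are absolute or relative, and all numbers below are hypothetical, not results from the presentation.

```python
def latency_sensitivity(ipc_a: float, latency_a: float, ipc_b: float, latency_b: float) -> float:
    """Slope of IPC with respect to load latency between two measured points."""
    return (ipc_b - ipc_a) / (latency_b - latency_a)


def predicted_ipc(ipc_base: float, sensitivity: float, latency_change: float) -> float:
    """performance change = sensitivity * load latency change, added to the baseline."""
    return ipc_base + sensitivity * latency_change


# Hypothetical measurements: IPC 2.0 at 2 cycles, IPC 1.5 at 4 cycles.
s = latency_sensitivity(2.0, 2, 1.5, 4)
print(s)                         # IPC lost per extra cycle of load latency
print(predicted_ipc(2.0, s, 1))  # estimated IPC at 3 cycles
```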

  12. Increase performance by • eliminating/reducing load latency: • By prefetching: predict the next miss and fetch the data to e.g. L1-cache • By address prediction: address known earlier => load executed earlier => data early in register file • or reducing sensitivity to load latency: • by fine-grain multithreading

  13. Some prefetch techniques • Stride prefetching: search for pattern with constant stride, e.g. walking through a matrix (row- or column-order). Example: 20 31 42 53 64, stride: 11 • Markov prefetching: recurring patterns of misses [Table: miss history and prediction: 10 110 15 12 100 … ...]
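The two techniques above can be sketched in a few lines. This is an illustrative model only, not the prefetchers evaluated in the presentation: the names `predict_next_stride` and `MarkovPrefetcher` are made up, the stride detector assumes two equal consecutive strides are enough to predict, and the Markov table is assumed to keep only first-order successor counts.

```python
from collections import Counter, defaultdict
from typing import List, Optional


def predict_next_stride(miss_addresses: List[int]) -> Optional[int]:
    """Predict the next address if the last two strides are equal and non-zero."""
    if len(miss_addresses) < 3:
        return None
    last_stride = miss_addresses[-1] - miss_addresses[-2]
    previous_stride = miss_addresses[-2] - miss_addresses[-3]
    if last_stride == previous_stride and last_stride != 0:
        return miss_addresses[-1] + last_stride
    return None


class MarkovPrefetcher:
    """Remembers which miss address followed each miss address most often."""

    def __init__(self) -> None:
        self.successors = defaultdict(Counter)
        self.last_miss: Optional[int] = None

    def observe(self, miss_address: int) -> None:
        # Record the transition from the previous miss to this one.
        if self.last_miss is not None:
            self.successors[self.last_miss][miss_address] += 1
        self.last_miss = miss_address

    def predict(self, miss_address: int) -> Optional[int]:
        counts = self.successors.get(miss_address)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


# Stride example from the slide: 20, 31, 42, 53, 64 has stride 11.
print(predict_next_stride([20, 31, 42, 53, 64]))

# Markov example with a recurring (hypothetical) miss sequence.
markov = MarkovPrefetcher()
for address in [10, 110, 15, 12, 100, 10, 110]:
    markov.observe(address)
print(markov.predict(10))
```

The stride detector handles regular walks such as matrix traversal, while the Markov table can capture irregular but recurring sequences, at the cost of storing history per miss address.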

  14. Stride prefetching [Chart] IPC = instructions per clock cycle; 1 GHz processor; program: compress

  15. Prefetching and sensitivity Factors by which “performance sensitivity to latency” increases with stride prefetching:

  16. Latency is important: generalization to other processor architectures Consider schedule of program [Diagram: program schedule against time]. Present in every program execution: • Latency of instruction execution • Latency of communication => latency is important whatever the processor architecture

  17. Optical interconnects (OI) • Mature components: • Vertical-Cavity Surface Emitting Lasers (VCSELs) • Light Emitting Diodes (LEDs) • Very high bandwidths • Are replacing electronic interconnects in telecom and networks • Useful for short inter-chip and even intra-chip interconnects?

  18. OI in processor context • At levels close to the processor core, latency is very important => latency of OI determines how far OI penetrates into the memory hierarchy • What is the latency of an optical interconnect?

  19. An optical link [Diagram: buffer/modulation/bias, LED/VCSEL, fiber or light conductor, receiver diode, transimpedance amplifier] Total latency = buffer latency + VCSEL/LED latency + time of flight + receiver latency
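The sum on the slide above can be written out directly. A minimal sketch: the function names are made up, time of flight is assumed to be link length divided by the speed of light in the medium, and the refractive index of 1.5 and the component latencies in the example are hypothetical placeholders, not measured values from the presentation.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def time_of_flight_s(length_m: float, refractive_index: float = 1.5) -> float:
    """Propagation delay through a light conductor (refractive index is an assumption)."""
    return length_m * refractive_index / SPEED_OF_LIGHT_M_PER_S


def total_link_latency_s(buffer_s: float, emitter_s: float, flight_s: float, receiver_s: float) -> float:
    """Total latency = buffer latency + VCSEL/LED latency + time of flight + receiver latency."""
    return buffer_s + emitter_s + flight_s + receiver_s


# Hypothetical 10 cm link with placeholder component latencies.
tof = time_of_flight_s(0.10)
total = total_link_latency_s(buffer_s=0.2e-9, emitter_s=0.3e-9, flight_s=tof, receiver_s=0.4e-9)
print(f"time of flight = {tof * 1e9:.3f} ns, total = {total * 1e9:.3f} ns")
```

At short inter-chip distances the time of flight is a fraction of a nanosecond, so the driver, emitter and receiver terms decide how far optical links can move toward the processor core.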

  20. VCSEL characteristics • A small semiconductor laser • Carrier density should be high enough for lasing action

  21. Total VCSEL link latency consists of • Buffer latency • Parasitic capacitances and series resistances of VCSEL and pads • Threshold carrier density build up • From low optical output to final optical output (intrinsic latency) • Time of flight (TOF) • Receiver latency

  22. Total optical link latency @ 1 mW [Chart] CMOS: 0.6 µm, 0.25 µm

  23. Latency as function of power

  24. Conclusions • When combining performance sensitivity and optical latency we conclude: • optical interconnects are feasible to main memory and for multiprocessors • for interconnects close to the processor core, optical interconnects have too high a latency with present (telecom) devices, drivers and receivers => but evolution to lower latency devices, drivers and receivers is now taking place... For more information on the presented results: Henk Neefs, Latentiebeheersing in processors, PhD, Universiteit Gent, January 2000, www.elis.rug.ac.be/~neefs
