Chapter Six Pipelining

ifowler

Presentation Transcript


  1. Chapter Six Pipelining

  2. Pipelining: an analogy with a production line • Time to build one car: C+M+C+P+A • Production rate (cars/h): measured at the output of the line • Throughput: depends on the number of stages and on how well they are balanced • Goal of pipelining: • distribute and balance the time spent in each stage • make better use of each stage's hardware (occupancy rate) • in short: increase speed and reduce hardware
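To make the throughput and balancing points concrete, here is a small sketch of the car line. The slide gives no times for the five stations, so the minute values below are invented for illustration; only the relationships (latency is the sum of the stations, pipelined throughput is set by the slowest station) carry over.

```python
# Hypothetical station times in minutes. The slide only labels the five
# stations C, M, C, P, A, so these numbers are invented for illustration.
stage_minutes = [30, 60, 45, 30, 15]


def car_latency(stages):
    """Time for one car to pass through the whole line (sum of all stations)."""
    return sum(stages)


def cars_per_hour(stages, pipelined):
    """Production rate measured at the output of the line."""
    if pipelined:
        # Every station works on a different car at the same time, so a car
        # leaves the line each time the slowest station finishes.
        return 60 / max(stages)
    # Without overlap, the next car starts only when the previous one is done.
    return 60 / sum(stages)


print(car_latency(stage_minutes))           # 180 minutes per car either way
print(cars_per_hour(stage_minutes, False))  # one car every 3 hours
print(cars_per_hour(stage_minutes, True))   # one car per hour, set by the 60-minute station

# The same 180 minutes of work spread evenly over five 36-minute stations.
balanced = [36] * 5
print(cars_per_hour(balanced, True) / cars_per_hour(balanced, False))  # about 5
```

The unbalanced line gains only 3x even with five stations, while the balanced one approaches 5x, which is why the slide stresses balancing the stages.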

  3. [Figure: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) executed without pipelining (a new instruction starts every 8 ns) and with pipelining (a new instruction starts every 2 ns). Stages: Instruction fetch, Reg, ALU, Data access, Reg. Axes: time (ns) vs. program execution order (in instructions).] Pipelining • Improve performance by increasing instruction throughput • execution time of a single instruction: 8 ns without pipelining and 9 ns with it • throughput: 1 instruction / 8 ns (without pipelining) or 1 instruction / 2 ns (with pipelining) • number of stages = 5; speedup of 4:1; how could we get 5:1?
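The throughput numbers above can be checked with a few lines of arithmetic. This sketch uses the slide's figures (8 ns per non-pipelined instruction, 5 stages of 2 ns) and, as a simplifying assumption, counts the final register write as a full 2 ns cycle, so one lw takes 10 ns here instead of the 9 ns quoted on the slide (where the write occupies only the first half of the cycle).

```python
STAGES = 5            # instruction fetch, Reg, ALU, data access, Reg
CYCLE_NS = 2          # pipelined clock period: the slowest stage takes 2 ns
SINGLE_CYCLE_NS = 8   # non-pipelined: one 8 ns period per instruction


def nonpipelined_ns(n):
    """Each instruction starts only after the previous one finishes."""
    return n * SINGLE_CYCLE_NS


def pipelined_ns(n):
    """The first instruction passes through all 5 stages; each later one
    completes one clock cycle after its predecessor."""
    return (STAGES + n - 1) * CYCLE_NS


def speedup(n):
    return nonpipelined_ns(n) / pipelined_ns(n)


print(nonpipelined_ns(3), pipelined_ns(3))  # 24 14
print(speedup(3))                           # well below 4 for only 3 instructions
print(speedup(1_000_000))                   # approaches 8 / 2 = 4
```

For long programs the speedup tends to 8 / 2 = 4, not 5, because the 2 ns clock is set by the slowest stage. With perfectly balanced stages of 8 / 5 = 1.6 ns, the same formula would tend to 5.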

  4. Pipelining • What makes it easy • all instructions are the same length • just a few instruction formats • memory operands appear only in loads and stores • What makes it hard? • structural hazards: suppose we had only one memory • control hazards: need to worry about branch instructions • data hazards: an instruction depends on a previous instruction • We’ll build a simple pipeline and look at these issues • We’ll talk about modern processors and what really makes it hard: • exception handling • trying to improve performance with out-of-order execution, etc.

  5. [Figure: the single-cycle datapath divided into five stages. IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back. Components: PC, instruction memory, registers, sign extend (16 to 32 bits), shift left 2, adders, ALU, data memory, multiplexers.] Basic Idea • What do we need to add to actually split the datapath into stages?

  6. [Figure: multiple-clock-cycle diagram of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) over clock cycles CC1 to CC7, each instruction passing through IM, Reg, ALU, DM, Reg.] A representation for the pipeline • Shading represents “activity” • Alternative representation: imagine that each instruction has its own datapath • Another representation: a snapshot in time
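The multiple-clock-cycle diagram can be generated mechanically, assuming no stalls: instruction i enters IM in cycle i + 1 and moves one stage per cycle. This is a minimal sketch using the stage labels from the figure; the function name is hypothetical.

```python
STAGE_LABELS = ["IM", "Reg", "ALU", "DM", "Reg"]  # labels used in the slide's figure


def pipeline_table(n_instructions):
    """table[i][c] is the stage instruction i uses in clock cycle c + 1 ('' if idle).
    Assumes no stalls: instruction i enters IM in cycle i + 1."""
    n_cycles = n_instructions + len(STAGE_LABELS) - 1
    table = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, label in enumerate(STAGE_LABELS):
            row[i + s] = label
        table.append(row)
    return table


program = ["lw $1, 100($0)", "lw $2, 200($0)", "lw $3, 300($0)"]
table = pipeline_table(len(program))
print(" " * 16 + "".join(f"CC{c + 1:<4}" for c in range(len(table[0]))))
for text, row in zip(program, table):
    print(f"{text:<16}" + "".join(f"{cell:<6}" for cell in row))
```

Reading a column of this table gives the "snapshot in time" view: which instruction occupies each stage during one cycle.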

  7. [Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the stages.] Pipelined Datapath • What is produced in each stage is stored in a pipeline register for use by the next stage • Is there a problem with the write-register address? Does information flow to the left?

  8. [Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers; the write-register number now travels through the pipeline registers to the WB stage.] Corrected Datapath
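The write-register problem can be shown with a toy model of the four pipeline registers. In this sketch (hypothetical function and instruction list; only lw and sub appear in the slides), WB either uses the destination number carried along with the instruction, as in the corrected datapath, or takes it from IF/ID, which by then holds an instruction three slots later.

```python
def run(program, carry_dest=True):
    """Clock a 5-stage pipeline and record (instruction, register written) at WB.

    program: list of dicts with 'text' and 'dest' (destination register number).
    carry_dest=True: the destination number travels with the instruction
    through ID/EX, EX/MEM, and MEM/WB (the corrected datapath).
    carry_dest=False: WB takes the number from IF/ID, which by then holds
    a different, later instruction (the flaw).
    """
    if_id = id_ex = ex_mem = mem_wb = None
    writes = []
    for cycle in range(len(program) + 4):
        # WB works from the MEM/WB pipeline register.
        if mem_wb is not None:
            if carry_dest:
                dest = mem_wb["dest"]
            else:
                dest = if_id["dest"] if if_id is not None else None
            writes.append((mem_wb["text"], dest))
        # Clock edge: each pipeline register latches the stage before it.
        mem_wb, ex_mem, id_ex = ex_mem, id_ex, if_id
        if_id = program[cycle] if cycle < len(program) else None
    return writes


program = [  # hypothetical example sequence
    {"text": "lw $10, 20($1)", "dest": 10},
    {"text": "sub $11, $2, $3", "dest": 11},
    {"text": "and $12, $4, $5", "dest": 12},
    {"text": "or $13, $6, $7", "dest": 13},
]
print(run(program, carry_dest=True))   # each instruction writes its own register
print(run(program, carry_dest=False))  # lw would write $13, the 'or' destination
```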

  9. Example with sub and lw (1)

  10. Example with sub and lw (2)

  11. Example with sub and lw (3)

  12. [Figure: multiple-clock-cycle diagram of lw $10, 20($1) and sub $11, $2, $3 over clock cycles CC1 to CC6, each passing through IM, Reg, ALU, DM, Reg.] Graphically Representing Pipelines • Can help with answering questions like: • how many cycles does it take to execute this code? • what is the ALU doing during cycle 4? • use this representation to help understand datapaths
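Both questions on the slide can be answered from the diagram's regular shape. This sketch assumes a 5-stage pipeline with no stalls, so instruction i is fetched in cycle i + 1; the helper names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(n_instructions):
    """Cycles to finish n instructions in a 5-stage pipeline with no stalls."""
    return len(STAGES) + n_instructions - 1


def occupant(program, stage, cycle):
    """Instruction in `stage` during clock cycle `cycle` (1-based), or None.
    Assumes no stalls, so instruction i is fetched in cycle i + 1."""
    i = cycle - 1 - STAGES.index(stage)
    return program[i] if 0 <= i < len(program) else None


program = ["lw $10, 20($1)", "sub $11, $2, $3"]
print(total_cycles(len(program)))   # 6, i.e. CC1 to CC6
print(occupant(program, "EX", 4))   # sub $11, $2, $3 is using the ALU
print(occupant(program, "MEM", 4))  # lw $10, 20($1) is accessing data memory
```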

  13. Pipeline Control

  14. Pipeline control • We have 5 stages. What needs to be controlled in each stage? • Instruction Fetch and PC Increment • Instruction Decode / Register Fetch • Execution • Memory Stage • Write Back • How would control be handled in an automobile plant? • a fancy control center telling everyone what to do? • should we use a finite state machine?

  15. Pipeline Control • Pass control signals along just like the data
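"Pass control signals along just like the data" means the control bits are generated once, in ID, and each pipeline register keeps only the groups still needed downstream. The signal names and values below follow the Patterson & Hennessy MIPS datapath; the slide does not list them, so treat the table as an assumption of this sketch.

```python
# Control values produced once, in ID, grouped by the stage that consumes them.
# Names and values follow the Patterson & Hennessy MIPS datapath (assumption:
# the slide itself does not list them).
CONTROL = {
    "R-type": {
        "EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
}


def control_in_pipeline_registers(opcode):
    """Control fields each pipeline register carries for one instruction."""
    groups = CONTROL[opcode]
    id_ex = {g: groups[g] for g in ("EX", "MEM", "WB")}
    ex_mem = {g: id_ex[g] for g in ("MEM", "WB")}  # EX has used its group
    mem_wb = {g: ex_mem[g] for g in ("WB",)}       # MEM has used its group
    return {"ID/EX": id_ex, "EX/MEM": ex_mem, "MEM/WB": mem_wb}


for latch, fields in control_in_pipeline_registers("lw").items():
    print(latch, sorted(fields))
```

No central controller is needed after ID: each stage simply reads its own group from the pipeline register in front of it.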

  16. Datapath with Control

  17. Dependencies • Problem with starting next instruction before first is finished • dependencies that “go backward in time” are data hazards

  18. Software Solution • Have compiler guarantee no hazards • Where do we insert the “nops”?
      sub $2, $1, $3
      and $12, $2, $5
      or $13, $6, $2
      add $14, $2, $2
      sw $15, 100($2)
    • Problem: this really slows us down!
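A compiler pass that answers "where do we insert the nops?" can be sketched as follows. It assumes no forwarding and a register file written in the first half of a cycle and read in the second half, so a reader must sit at least 3 instruction slots after its writer. Instructions are given pre-decoded as (text, destination, sources) tuples instead of being parsed.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Return the program text with nops so that no instruction reads a register
    fewer than `min_distance` slots after the instruction that writes it.

    program: list of (text, dest, sources) tuples; dest is None if the
    instruction writes no register.
    """
    out = []
    for text, dest, sources in program:
        nops = 0
        # Check the writers that are still too close to this reader.
        for distance in range(1, min_distance):
            if distance <= len(out):
                prev_dest = out[-distance][1]
                if prev_dest is not None and prev_dest in sources:
                    nops = max(nops, min_distance - distance)
        out.extend([NOP] * nops)
        out.append((text, dest, sources))
    return [text for text, _, _ in out]


program = [
    ("sub $2, $1, $3", "$2", ("$1", "$3")),
    ("and $12, $2, $5", "$12", ("$2", "$5")),
    ("or $13, $6, $2", "$13", ("$6", "$2")),
    ("add $14, $2, $2", "$14", ("$2", "$2")),
    ("sw $15, 100($2)", None, ("$15", "$2")),
]
for line in insert_nops(program):
    print(line)
```

Under these assumptions two nops go between sub and and; once they are there, the later readers of $2 are already far enough away.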

  19. Forwarding • Use temporary results, don’t wait for them to be written • register file forwarding to handle read/write to same register • ALU forwarding • (Figure annotation: what if this $2 was $13?)

  20. Forwarding
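ALU forwarding comes down to a comparison of register numbers in the pipeline registers. This sketch follows the usual textbook forwarding-unit conditions (assumption: the slide shows only the figure); it returns a readable label where hardware would drive a 2-bit mux select, and it is applied once per ALU source (rs and rt).

```python
def forward_select(ex_mem, mem_wb, source_reg):
    """Select for the mux in front of one ALU input.

    ex_mem, mem_wb: dicts with 'RegWrite' (bool) and 'Rd' (destination register
    number) for the instructions currently in those pipeline registers.
    Returns 'EX/MEM', 'MEM/WB', or 'REG' (value read from the register file).
    """
    # EX/MEM is checked first: it holds the most recent value of the register.
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == source_reg:
        return "EX/MEM"
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == source_reg:
        return "MEM/WB"
    return "REG"  # register $0 and non-writing instructions are never forwarded


# sub $2, $1, $3 / and $12, $2, $5 / or $13, $6, $2
# While 'and' is in EX, sub is in EX/MEM:
print(forward_select({"RegWrite": True, "Rd": 2}, {"RegWrite": False, "Rd": 0}, 2))
# One cycle later 'or' is in EX, 'and' is in EX/MEM, sub is in MEM/WB:
print(forward_select({"RegWrite": True, "Rd": 12}, {"RegWrite": True, "Rd": 2}, 2))
```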

  21. Can't always forward • Load word can still cause a hazard: • an instruction tries to read a register immediately after a load instruction that writes to the same register. • Thus, we need a hazard detection unit to “stall” the instruction that follows the load
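The load-use condition can be written as a single test on the instruction in EX (the load) and the one in ID. This sketch uses the common textbook check (MemRead in ID/EX and its rt matching rs or rt in IF/ID); the example sequence and field values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str
    rs: int               # first source register field
    rt: int               # destination of lw, second source otherwise
    is_load: bool = False


def load_use_stall(id_ex, if_id):
    """True if the load in EX writes a register that the instruction in ID reads."""
    return id_ex.is_load and id_ex.rt in (if_id.rs, if_id.rt)


def count_stalls(program):
    """One bubble per load immediately followed by a user of its result;
    forwarding handles every other case in this 5-stage pipeline."""
    return sum(load_use_stall(a, b) for a, b in zip(program, program[1:]))


program = [  # hypothetical example
    Instr("lw $2, 20($1)", rs=1, rt=2, is_load=True),
    Instr("and $4, $2, $5", rs=2, rt=5),
    Instr("or $8, $2, $6", rs=2, rt=6),
]
print(count_stalls(program))  # 1: only 'and' directly follows the load
```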

  22. Stalling • We can stall the pipeline by keeping an instruction in the same stage

  23. Hazard Detection Unit • Stall by letting an instruction that won’t write anything go forward
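A stall can be modeled cycle by cycle: IF and ID keep their instructions, and a bubble (an instruction whose control signals are all zero, so it writes nothing) goes forward into EX while the load continues. This is a minimal sketch, assuming the load-use pairs are already known; the function name is hypothetical.

```python
def timeline(n, load_use_pairs):
    """Stage occupancy (IF, ID, EX, MEM, WB) in each cycle for n >= 1 instructions.

    load_use_pairs: indices i where instruction i is a load whose result is
    used by instruction i + 1. Entries are instruction indices, "bubble",
    or None (empty stage).
    """
    slots = [0, None, None, None, None]  # cycle 1: instruction 0 in IF
    pc = 1
    rows = []
    while any(s is not None for s in slots):
        rows.append(slots)
        ex, id_ = slots[2], slots[1]
        if isinstance(ex, int) and ex in load_use_pairs and id_ == ex + 1:
            # Stall: IF and ID hold their instructions (PC not updated) and a
            # bubble enters EX while the load moves on to MEM.
            slots = [slots[0], slots[1], "bubble", slots[2], slots[3]]
        else:
            slots = [pc if pc < n else None, slots[0], slots[1], slots[2], slots[3]]
            pc += 1
    return rows


# lw $2, 20($1) / and $4, $2, $5 / or $8, $2, $6 (instruction 0 feeds 1)
for cycle, row in enumerate(timeline(3, {0}), start=1):
    print(f"CC{cycle}", row)
```

The three instructions finish in 8 cycles instead of 7: exactly one bubble passes down the pipeline.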

  24. Branch Hazards • When we decide to branch, other instructions are in the pipeline! • We are predicting “branch not taken” • need to add hardware for flushing instructions if we are wrong
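The cost of predicting "branch not taken" is paid only when the branch is actually taken, because the wrongly fetched instructions must be flushed. In this sketch the flush penalty is a parameter: assume 3 cycles if the branch outcome is known in MEM, or 1 if extra hardware resolves it in ID. Other hazards are ignored.

```python
def cycles_predict_not_taken(n_instructions, taken_branches, flush_penalty):
    """Total cycles when the pipeline always fetches the fall-through path.

    Every taken branch is a misprediction, so the instructions fetched after
    it are flushed, costing `flush_penalty` cycles each time.
    """
    ideal = 5 + n_instructions - 1  # 5-stage pipeline, no stalls
    return ideal + taken_branches * flush_penalty


# Assumed penalties: 3 if the branch is decided in MEM, 1 if decided in ID.
print(cycles_predict_not_taken(100, taken_branches=10, flush_penalty=3))  # 134
print(cycles_predict_not_taken(100, taken_branches=10, flush_penalty=1))  # 114
```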

  25. Flushing Instructions

  26. Improving Performance • Try and avoid stalls! E.g., reorder these instructions:
      lw $t0, 0($t1)
      lw $t2, 4($t1)
      sw $t2, 0($t1)
      sw $t0, 4($t1)
    • Add a “branch delay slot” • the next instruction after a branch is always executed • rely on compiler to “fill” the slot with something useful • Superscalar: start more than one instruction in the same cycle
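One way to reorder the four instructions is to swap the two stores: they write different words, so the memory result is unchanged, but lw $t2 is no longer immediately followed by a reader of $t2. The sketch counts load-use stalls with the conservative rs/rt check used by a typical hazard detection unit (assumption: a pipeline that could forward store data into MEM might not stall here at all).

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str
    rs: str
    rt: str
    is_load: bool = False


def count_load_use_stalls(program):
    """Count loads immediately followed by an instruction whose rs or rt field
    names the loaded register."""
    return sum(
        a.is_load and a.rt in (b.rs, b.rt) for a, b in zip(program, program[1:])
    )


original = [
    Instr("lw $t0, 0($t1)", rs="$t1", rt="$t0", is_load=True),
    Instr("lw $t2, 4($t1)", rs="$t1", rt="$t2", is_load=True),
    Instr("sw $t2, 0($t1)", rs="$t1", rt="$t2"),
    Instr("sw $t0, 4($t1)", rs="$t1", rt="$t0"),
]
# Swap the two stores: same memory effect, but lw $t2 is no longer
# immediately followed by a use of $t2.
reordered = [original[0], original[1], original[3], original[2]]

print(count_load_use_stalls(original))   # 1
print(count_load_use_stalls(reordered))  # 0
```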

  27. Dynamic Scheduling • The hardware performs the “scheduling” • hardware tries to find instructions to execute • out-of-order execution is possible • speculative execution and dynamic branch prediction • All modern processors are very complicated • DEC Alpha 21264: 9-stage pipeline, 6-instruction issue • PowerPC and Pentium: branch history table • Compiler technology is important • This class has given you the background you need to learn more • Video: An Overview of Intel’s Pentium Processor (available from University Video Communications)
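A branch history table can be sketched as an array of 2-bit saturating counters indexed by low-order PC bits. This is a common textbook scheme, shown only to illustrate dynamic branch prediction; it is not a description of the actual PowerPC or Pentium tables, and the table size, initial state, and branch address are arbitrary choices.

```python
class BranchHistoryTable:
    """Branch history table of 2-bit saturating counters, indexed by PC bits.

    Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    Table size and initial value are arbitrary choices for this sketch.
    """

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # start in the weakly not-taken state

    def _index(self, pc):
        # MIPS instructions are word-aligned, so drop the two low PC bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


# A loop-closing branch at a hypothetical address: taken 9 times, then
# falls through; the whole loop runs twice.
bht = BranchHistoryTable()
outcomes = ([True] * 9 + [False]) * 2
mispredictions = 0
for taken in outcomes:
    if bht.predict(0x40) != taken:
        mispredictions += 1
    bht.update(0x40, taken)
print(mispredictions)  # 3
```

Because a single not-taken outcome only moves the counter from 3 to 2, the second run of the loop is predicted correctly from its first iteration.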
