
Presented by Şahin DELİPINAR. Simon Moore, Peter Robinson, Steve Wilcox

Rotary Pipeline Processors. Presented by Şahin DELİPINAR. Simon Moore, Peter Robinson, Steve Wilcox, Computer Laboratory, University of Cambridge, December 15, 1995. Outline: Abstract, Introduction, Rotary Pipeline Concept, Implementation Issues, Simulation, Relation to other approaches


Presentation Transcript


  1. Rotary Pipeline Processors. Presented by Şahin DELİPINAR. Simon Moore, Peter Robinson, Steve Wilcox, Computer Laboratory, University of Cambridge, December 15, 1995

  2. OUTLINE • Abstract • Introduction • Rotary Pipeline Concept • Implementation Issues • Simulation • Relation to other approaches • Conclusions

  3. ABSTRACT • The Rotary Pipeline Processor is a new architecture for superscalar computing • Register values flow around the pipeline • Performance is limited only by data rates • Operation proceeds at the pace of self-timed logic

  4. INTRODUCTION • Most current designs use parallel pipelines to execute multiple instructions... • Synchronization problems reduce pipeline performance • In a rotary pipeline, instructions are dispatched to the ALUs from the center of the pipeline. Data circulates clockwise and is processed by the ALUs and memory access units

  5. ROTARY PIPELINE CONCEPT • Overview: - A rotary pipeline rotates the registers past the function units arranged around the ring. When a register reaches a function unit that needs it, it is used and the result is reloaded - Unused registers are not locked and continue to rotate - ALU operations occur in parallel
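The overview on slide 5 can be sketched as a toy model of one revolution of the ring. This is a minimal sketch under assumptions, not the authors' design: the names `Op` and `one_revolution` are hypothetical, every function unit is reduced to a two-operand operation, and timing and parallelism are ignored so that only the data flow is visible.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Op:
    """Hypothetical operation held by one function unit: dest = fn(src_a, src_b)."""
    dest: str
    src_a: str
    src_b: str
    fn: Callable[[int, int], int]


def one_revolution(registers: Dict[str, int], ring: List[Optional[Op]]) -> Dict[str, int]:
    """Carry the register values once around the ring of function units."""
    regs = dict(registers)  # copy so the caller's values are untouched
    for op in ring:
        if op is None:
            continue  # idle unit: every register passes straight through
        # The unit uses the registers it needs and reloads the result;
        # all other registers simply keep rotating.
        regs[op.dest] = op.fn(regs[op.src_a], regs[op.src_b])
    return regs


ring = [
    Op("r2", "r0", "r1", operator.add),  # first ALU: r2 = r0 + r1
    None,                                # a unit with nothing to do
    Op("r0", "r2", "r1", operator.mul),  # later ALU: r0 = r2 * r1
]
print(one_revolution({"r0": 1, "r1": 2, "r2": 0}, ring))
```

Register `r1` is never written, so it travels around the ring unchanged, which is the "unused registers are not locked" behaviour in miniature.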

  6. ROTARY PIPELINE CONCEPT (Cont’d) • Basic Pipeline Construction: A set of flip-flops selects which registers will be used and which will be left to continue rotating.

  7. ROTARY PIPELINE CONCEPT (Cont’d) • Adding a Register File: If the rotary pipeline is large and there are many registers, a multiported register file is used to hold the registers that are waiting (Figure 3)

  8. ROTARY PIPELINE CONCEPT (Cont’d) • Rotary Bus Allocation: Registers are dispatched to buses on a first-come, first-served basis. If instructions are independent, their registers continue to travel. When a register is used by only one unit, the number of buses increases (Figure 4)

  9. ROTARY PIPELINE CONCEPT (Cont’d) • Instruction Issue: - Sequential instructions are sent in the same direction, so overlapping and register dependencies are resolved - If an instruction cannot be processed by a function unit, a NOP is simply issued, which reduces performance - Dynamic instruction reordering - Assume a Load followed by an Add, where the first unit is an ALU... - Only a 3% performance gain is obtained - A mispredicted branch reduces performance
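The Load-then-Add case on slide 9 can be illustrated with a small issue model. This is only a sketch under assumptions: `issue_in_order` is a hypothetical helper, the ring is reduced to a list of unit kinds visited in rotation order, and an instruction that does not match the current unit costs one NOP slot. It says nothing about the measured 3% figure.

```python
from typing import List, Tuple


def issue_in_order(instrs: List[Tuple[str, str]], ring: List[str]) -> List[str]:
    """Fill issue slots around the ring, in program order.

    instrs: (name, kind of unit required), in program order.
    ring:   unit kinds in the order the slots are filled.
    """
    for name, kind in instrs:
        if kind not in ring:
            raise ValueError(f"no {kind} unit on the ring for {name}")
    slots: List[str] = []
    pending = list(instrs)
    pos = 0
    while pending:
        unit = ring[pos % len(ring)]
        name, kind = pending[0]
        if kind == unit:
            slots.append(name)   # the unit can process the next instruction
            pending.pop(0)
        else:
            slots.append("NOP")  # wrong kind of unit: issue a NOP instead
        pos += 1
    return slots


ring = ["ALU", "MEM"]
program = [("LOAD", "MEM"), ("ADD", "ALU")]
reordered = [("ADD", "ALU"), ("LOAD", "MEM")]  # valid only if ADD does not need the loaded value

print(issue_in_order(program, ring))    # program order wastes the first ALU slot
print(issue_in_order(reordered, ring))  # reordering fills both slots
```

The reordered stream is legal only when the two instructions are independent; a real reordering unit would have to check that before swapping them.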

  10. ROTARY PIPELINE CONCEPT (Cont’d) Because of the data-driven nature of the rotary pipeline, instruction ordering is not so important. Instructions complete out of order. Figure 4...

  11. ROTARY PIPELINE CONCEPT (Cont’d)

  12. ROTARY PIPELINE CONCEPT (Cont’d) • CONDITIONAL EXECUTION: Conditional execution of arithmetic and logical instructions can be handled with extra control logic at each ALU. This logic controls whether results are written to the rotary pipeline by controlling the output switch network.
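A minimal sketch of the idea on slide 12, assuming a simple flag dictionary and a predicate per instruction (both hypothetical, as is the name `conditional_alu`): the ALU always computes its result, and the control logic decides whether the output switch forwards the new value or lets the old register value pass through.

```python
from typing import Callable, Dict


def conditional_alu(
    regs: Dict[str, int],
    dest: str,
    src_a: str,
    src_b: str,
    fn: Callable[[int, int], int],
    condition: Callable[[Dict[str, bool]], bool],
    flags: Dict[str, bool],
) -> Dict[str, int]:
    """Model one ALU whose output switch is gated by a condition."""
    result = fn(regs[src_a], regs[src_b])  # the ALU evaluates unconditionally
    out = dict(regs)
    if condition(flags):
        out[dest] = result  # switch selects the ALU result
    # otherwise the switch passes the incoming value of dest unchanged
    return out


def if_zero(flags: Dict[str, bool]) -> bool:
    """Example predicate: execute only when the (hypothetical) Z flag is set."""
    return flags["Z"]


regs = {"r0": 4, "r1": 5, "r2": 0}
print(conditional_alu(regs, "r2", "r0", "r1", lambda a, b: a + b, if_zero, {"Z": True}))
print(conditional_alu(regs, "r2", "r0", "r1", lambda a, b: a + b, if_zero, {"Z": False}))
```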

  13. ROTARY PIPELINE CONCEPT (Cont’d) • BRANCHES: Branches always have an adverse effect on pipeline performance. Unconditional branches are easy to handle and can be predicted before the operation begins, but conditional branches depend on the outcome of the execution stage and are difficult to handle. This can be addressed with speculative execution.

  14. ROTARY PIPELINE CONCEPT (Cont’d) • SPECULATIVE EXECUTION: - If an instruction is marked as speculative, it can be revoked. - If the register file is used… (results not written to registers) - If a larger register file is used… (temporary register files) - If a larger rotary pipeline is used… (flip-flops)

  15. IMPLEMENTATION • Data encoding and completion detection: - Determining when a logic block has completed evaluation: 1. Embedding the completion signal within the data 2. Localised timing using matched delays

  16. IMPLEMENTATION (Cont’d) • The completion signal is embedded within the data using 1-of-4 encoding, following the coding scheme shown in Figure 5. Bundled data, by contrast, uses binary encoding • The matched-delay method is subject to variation from thermal effects and manufacturing tolerances Figure 5
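To make the 1-of-4 idea on slide 16 concrete, here is a minimal sketch. The mapping of each 2-bit value to a wire is an assumption (the actual coding scheme of Figure 5 is not reproduced in the transcript), and the helper names are hypothetical. Each pair of bits drives exactly one of four wires, all wires low means "cleared", and completion is detected when every group has exactly one wire high.

```python
from typing import List, Tuple

Group = Tuple[int, int, int, int]
CLEARED: Group = (0, 0, 0, 0)  # no wire raised: no data present yet


def encode_1of4(value: int, n_bits: int) -> List[Group]:
    """Encode value as n_bits/2 groups of four wires, least significant pair first."""
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even: each group carries two bits")
    groups: List[Group] = []
    for shift in range(0, n_bits, 2):
        pair = (value >> shift) & 0b11
        wires = [0, 0, 0, 0]
        wires[pair] = 1  # assumed mapping: wire index equals the 2-bit value
        groups.append((wires[0], wires[1], wires[2], wires[3]))
    return groups


def is_complete(groups: List[Group]) -> bool:
    """Data is valid once every group has exactly one wire high."""
    return all(sum(g) == 1 for g in groups)


def is_cleared(groups: List[Group]) -> bool:
    """True when every group has returned to the all-low state."""
    return all(g == CLEARED for g in groups)


def decode_1of4(groups: List[Group]) -> int:
    """Recover the binary value; only meaningful when is_complete(groups)."""
    value = 0
    for i, g in enumerate(groups):
        value |= g.index(1) << (2 * i)
    return value


word = encode_1of4(0b1001, 4)
print(word, is_complete(word), decode_1of4(word))
```

Because validity is visible in the data wires themselves, no separate timing assumption is needed, which is the contrast with the matched-delay approach.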

  17. IMPLEMENTATION (Cont’d) • Using Dynamic Logic: - Dynamic logic and inverted 1-of-4 encoded data dovetail nicely, because precharging the logic depends on clearing the 1-of-4 encoded data before evaluation. - Completion detection can be simplified by using AND gates instead of C-elements in the circuit. Figure 6
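The difference between the two gate types named on slide 17 can be shown behaviourally. A Muller C-element raises its output only when all inputs are high, lowers it only when all are low, and otherwise holds its previous value; an AND gate has no memory. The sketch below models both (class and function names are hypothetical). One plausible reading of the slide, stated here as an assumption, is that precharge already returns every signal to the cleared state before evaluation, so the hold behaviour is not needed to detect completion.

```python
class CElement:
    """Behavioural model of a Muller C-element with any number of inputs."""

    def __init__(self) -> None:
        self.out = 0

    def update(self, *inputs: int) -> int:
        if all(inputs):
            self.out = 1      # every input high: output rises
        elif not any(inputs):
            self.out = 0      # every input low: output falls
        # mixed inputs: hold the previous output
        return self.out


def and_gate(*inputs: int) -> int:
    """Stateless AND: high only while every input is high."""
    return int(all(inputs))


c = CElement()
for pattern in [(1, 0), (1, 1), (1, 0), (0, 0)]:
    print(pattern, "C-element:", c.update(*pattern), "AND:", and_gate(*pattern))
```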

  18. IMPLEMENTATION (Cont’d) • Outline of a Stage in the Pipeline: Banks of transistors are used to download/upload data to registers Figure 7

  19. IMPLEMENTATION (Cont’d) • Controlling the Pipeline: Each stage of the pipeline passes through the following states: - Empty: the ALU is precharged and the flip-flops are reset - Waiting for data: precharge and reset are released - Latching data: the SR flip-flops store the results - Precharge: after the data is latched, ALU precharge commences - Reset: once the next stage signals completion, the latches of this stage may be reset - Empty: the cycle is complete
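The control cycle listed on slide 19 can be written down as a small state machine. This is a sketch under assumptions: the enum and the `step` function are hypothetical names, and the two handshake inputs (`data_arrived`, `next_stage_complete`) are a simplification of whatever signals the real control circuit uses.

```python
from enum import Enum, auto


class Stage(Enum):
    EMPTY = auto()             # ALU precharged, flip-flops reset
    WAITING_FOR_DATA = auto()  # precharge and reset released
    LATCHING_DATA = auto()     # SR flip-flops store the results
    PRECHARGE = auto()         # ALU precharge commences
    RESET = auto()             # latches of this stage are reset


def step(state: Stage, data_arrived: bool = False, next_stage_complete: bool = False) -> Stage:
    """Advance one pipeline stage by one control step."""
    if state is Stage.EMPTY:
        return Stage.WAITING_FOR_DATA
    if state is Stage.WAITING_FOR_DATA:
        # stay here until the inputs carry valid data
        return Stage.LATCHING_DATA if data_arrived else Stage.WAITING_FOR_DATA
    if state is Stage.LATCHING_DATA:
        return Stage.PRECHARGE
    if state is Stage.PRECHARGE:
        # the latches may only be reset once the next stage signals completion
        return Stage.RESET if next_stage_complete else Stage.PRECHARGE
    return Stage.EMPTY  # RESET completes the cycle


state = Stage.EMPTY
for inputs in [{}, {"data_arrived": True}, {}, {"next_stage_complete": True}, {}]:
    state = step(state, **inputs)
    print(state.name)
```

Only two transitions wait on a neighbour: data arriving from the previous stage and completion from the next one. Everything else is local to the stage.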

  20. IMPLEMENTATION (Cont’d) Figure 8

  21. SIMULATION • Instruction Set Choice: ARM instructions are used for convenient comparison with existing clocked designs. Characteristics of the instruction set: 1. Conditionals: every instruction can be conditionally executed 2. PC: the program counter is one of the general-purpose registers and may be written to, thereby causing a branch 3. Load and store multiple: a single instruction can load or store several registers

  22. SIMULATION (Cont’d) • Initial Results: The ARM instruction set and only the store and compress benchmarks are used to test performance - First, an ALU, a memory access unit and a branch unit are taken as the starting configuration - Then a number of ALUs are added.. - Dynamic instruction reordering increased performance by 3% - Branch prediction and a larger register file increased performance (Figure 9) - But memory accesses soon limit performance

  23. Figure 9

  24. RELATION TO OTHER APPROACHES • Data transfer capability within the stages: in the RP, data is passed through latches between pipeline stages. The rotary pipeline is better than clocked implementations, where data is only available after each clock period • Amulet is a single processor in which the latches are transparent to data while the pipeline is refilling • CFPP: as data traverses the pipeline, register values filter down, and at the end of the cycle operands are gathered at the very beginning of the pipeline • The RP differs from other superscalar processors by avoiding global communication

  25. CONCLUSIONS • Rotary pipelines are self-timed structures that allow multiple instructions to be executed at the same time. Variations: 1. Passing complete registers.. 2. Passing only active registers… • In rotary pipelines, the design emphasizes performance rather than size and low power • RPs have fewer buses compared to other superscalar processors • Suitable for self-timed circuits but not for clocked implementations

  26. Questions?...
