
Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions

Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions. Ramkumar Jayaseelan, Haibin Liu, Tulika Mitra. School of Computing, National University of Singapore. {ramkumar, liuhb, tulika}@comp.nus.edu.sg. Presented by Alex Oumantsev.


Presentation Transcript


  1. Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions • Ramkumar Jayaseelan, Haibin Liu, Tulika Mitra • School of Computing, National University of Singapore • {ramkumar, liuhb, tulika}@comp.nus.edu.sg • Presented by Alex Oumantsev

  2. Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions • Introduce the material • Related Work • Proposed Architecture • Compilation Toolchain • Experimental Evaluation • Conclusion

  3. Application-Specific instruction-set extensions (Custom Instructions) • Extend the instruction-set architecture • Balance performance and time-to-market • Frequently used computation patterns • Custom Functional Units • Parallelization and chaining of operations • Processor Support – RISC-style • Altera Nios-II • Tensilica Xtensa

  4. Base Processor – Custom Instruction mismatch • RISC-style • Fixed-length instructions • Two input operations per instruction • Custom Instructions • Complex • Multiple inputs per operation

  5. Number of Inputs per Custom Instruction

  6. Data Forwarding • Present on a typical RISC processor • Register Bypassing • Supplies data to a Functional Unit from buffer • Resolves Data hazards between instructions • Input operands for Custom Instruction • Use existing Logic

  7. Related Work • Design Space Exploration • Data Bandwidth • Nios-II Internal Register Files • Extra cycles wasted on explicit MOV • Xilinx MicroBlaze: Fast Simplex Link • put and get instructions • Relaxing register file port constraints • Fixed-length instruction problem

  8. Proposed Architecture • MIPS-like 5 stage pipeline

  9. Data Forwarding • CUST instruction draws 2 inputs from Forwarding • Able to take up to 4 inputs • Modification – Do not read from Register in ID if Forwarding
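
The operand selection on this slide can be illustrated with a small model. This is a minimal sketch, not the authors' hardware: the `Latch` class, the function name `fetch_operands`, and the assumption that the forwarding path consists of the EX/MEM and MEM/WB latches of a MIPS-like pipeline are all hypothetical. It shows a CUST instruction taking up to four inputs, at most two through the register-file read ports and the rest from forwarding, with the register read skipped for forwarded operands.

```python
from dataclasses import dataclass


@dataclass
class Latch:
    """Result held in a pipeline latch: destination register and its value."""
    dest: int
    value: int


def fetch_operands(srcs, from_fwd, regfile, ex_mem, mem_wb):
    """Gather up to four CUST operands (hypothetical model).

    srcs     -- source register numbers (at most 4)
    from_fwd -- parallel list of flags; True means "take from forwarding"
    regfile  -- mapping from register number to value
    """
    if len(srcs) > 4:
        raise ValueError("a custom instruction takes at most 4 inputs")
    if sum(1 for f in from_fwd if not f) > 2:
        raise ValueError("only 2 register-file read ports are available")
    values = []
    for reg, fwd in zip(srcs, from_fwd):
        if fwd:
            # No register read in ID; look in the newest latch first.
            for latch in (ex_mem, mem_wb):
                if latch.dest == reg:
                    values.append(latch.value)
                    break
            else:
                raise LookupError(f"r{reg} is not in the forwarding path")
        else:
            values.append(regfile[reg])
    return values
```

For example, with `r3` and `r4` still in flight, `fetch_operands([1, 2, 3, 4], [False, False, True, True], ...)` reads `r1` and `r2` from the register file and the other two values from the latches.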

  10. Instruction Encoding • Transparent to regular instructions • Minimize number of bits for operands • Nios-II Example • Use 11 bits of OPX field • OPD defines operands from forwarding • COP specifies the custom instruction
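
A sketch of how the OPD and COP sub-fields might share the 11 OPX bits. The slide gives only the total width; the 4-bit/7-bit split, the "one flag per operand" meaning of OPD, and the function names below are assumptions made for illustration, not the encoding from the paper.

```python
OPX_BITS = 11                    # from the slide: 11 bits of the OPX field
OPD_BITS = 4                     # ASSUMPTION: one forwarding flag per operand
COP_BITS = OPX_BITS - OPD_BITS   # ASSUMPTION: remaining bits pick the instruction


def pack_opx(cop, opd):
    """Pack a custom-instruction id and a forwarding mask into one OPX value."""
    if not 0 <= cop < (1 << COP_BITS):
        raise ValueError("COP does not fit in its field")
    if not 0 <= opd < (1 << OPD_BITS):
        raise ValueError("OPD does not fit in its field")
    return (opd << COP_BITS) | cop


def unpack_opx(opx):
    """Recover (cop, opd) from a packed OPX value."""
    return opx & ((1 << COP_BITS) - 1), opx >> COP_BITS
```

Because only the OPX bits are reinterpreted, the rest of the instruction word is untouched, which is one way to read "transparent to regular instructions".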

  11. Predictable Forwarding • Two prior instructions can be used • Problems with Multicycle and Cache Miss • Create bubbles in the pipeline • Can’t rely on forwarding • Modify to send Stall signal to all stages • Pauses the pipeline till ready • No need for NOP instruction
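
The difference between a bubble and a global stall can be seen in a toy pipeline model. This is a simplified sketch under assumed behavior (a five-entry list standing for IF, ID, EX, MEM, WB; instruction names are placeholders), not a description of the actual control logic.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def step_with_bubble(pipeline, stalled_idx):
    """One cycle of a conventional stall (toy model).

    Stages up to and including stalled_idx hold their instruction; later
    stages keep draining, and a bubble (None) enters behind them.
    """
    held = pipeline[: stalled_idx + 1]
    drained = pipeline[stalled_idx + 1 : -1]
    return held + [None] + drained


def step_global_stall(pipeline):
    """One cycle with the stall signal sent to all stages: nothing moves."""
    return list(pipeline)
```

With `["i5", "i4", "CUST", "i2", "i1"]` and a cache miss in MEM, the conventional stall lets `i1` leave WB, so a value the compiler expected to find in the forwarding path is gone. The global stall keeps every instruction in place, so the producer-to-consumer distance the compiler scheduled for is preserved without inserting NOPs.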

  12. Multicycle Delays

  13. Cache Miss Delays

  14. Compilation Toolchain • Compiler cooperation needed • Determine if operand can be forwarded • Encode custom instruction correctly • Schedule to maximize forwarding

  15. Compilation Toolchain • IR Scheduling • Pattern Identification • Identify all possible patterns for custom instructions • Pattern Selection • Heuristic: pattern priority = speedup × frequency • Instruction Scheduling • Find optimal scheduling with forwarding • Forwarding Check and MOV Insertion • Insert a MOV from a register to the same register if needed
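
Two of these steps can be sketched in a few lines. The code below is a greedy illustration, not the scheduler from the paper: the tuple layout `(op, dest, srcs)`, the dictionary keys, and the function names are hypothetical, and it assumes that only the results of the two immediately preceding instructions can be forwarded and that two register read ports are available.

```python
def rank_patterns(patterns):
    """Order candidate patterns by priority = speedup * frequency."""
    return sorted(patterns, key=lambda p: p["speedup"] * p["frequency"], reverse=True)


def insert_movs(schedule, max_reg_reads=2):
    """Forwarding check and MOV insertion for a linear schedule (greedy sketch).

    Each instruction is (op, dest, srcs). For every CUST, operands produced by
    the two preceding instructions are forwarded; if more than max_reg_reads
    operands remain, a MOV r, r is placed before the CUST to push one of them
    into the forwarding path.
    """
    out = []
    for op, dest, srcs in schedule:
        if op == "CUST":
            while True:
                window = {d for _, d, _ in out[-2:]}
                reg_reads = [s for s in srcs if s not in window]
                if len(reg_reads) <= max_reg_reads:
                    break
                r = reg_reads[0]
                out.append(("MOV", r, (r,)))  # register moved to itself
        out.append((op, dest, srcs))
    return out
```

With four distinct inputs the loop adds at most two MOVs per CUST, since two consecutive MOVs fill the forwarding window and leave at most two operands for the register ports. Each MOV costs a cycle, which is why the scheduling step tries to place producers directly before the custom instruction instead.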

  16. Experimental Evaluation • SimpleScalar tool set used • Constraint of max 4 inputs and one output • Selected benchmarks

  17. Speedup • Speedup = (CycleOrigin / CycleEx - 1) * 100 • Ideal – 4 Read Ports from Registers • Forwarding – Discussed solution (may have MOV) • MOV – Nios-II implemented solution (forces MOV)
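
The speedup metric is a percentage improvement in cycle count. A minimal sketch, where the function and parameter names are hypothetical and the cycle counts in the usage note are made-up numbers, not results from the evaluation:

```python
def speedup_percent(cycles_original, cycles_extended):
    """Speedup = (CycleOrigin / CycleEx - 1) * 100, as defined on the slide."""
    if cycles_extended <= 0:
        raise ValueError("cycle count must be positive")
    return (cycles_original / cycles_extended - 1) * 100
```

For instance, a program that takes 150 cycles on the base processor and 100 cycles with custom instructions gives `speedup_percent(150, 100) == 50.0`, and equal cycle counts give 0.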

  18. Energy Consumption • Energy used by Registers • Ideal – 4 Read Ports from Registers • Forwarding – Discussed solution (may have MOV) • MOV – Nios-II implemented solution (forces MOV)

  19. Conclusion • Compiler modification • Minor pipeline modification • Data Forwarding used for MISO custom instructions • Overcome limited register ports • Compatible instruction encoding • Near-ideal speedup
