Fully Pipelined FPU for OR1200


Presentation Transcript


  1. Fully Pipelined FPU for OR1200 Eric Zhang Electrical & Computer Engineering

  2. Introduction & Motivation • Floating Point Unit: • Performs floating-point operations such as add/sub, multiplication, division, sine, cosine, and FMA (fused multiply-add) • Wide dynamic range and high precision • Required by many algorithms and applications • e.g., Hotspot, SRAD, etc. • High performance and low power consumption

  3. FPU in OR1200 • Arithmetic, Conversion, Comparison

  4. FPU in OR1200 • Serial implementation with long stalls • Figure labels: 10 cycles total, 38 cycles total, 37 cycles total

  5. Goals and Objectives • Pipeline the current version of floating point multiplication and division • Reduce number of clock cycles • Eliminate the stalls due to serial implementation • Synthesize and obtain the physical layout of the pipelined FPU using Synopsys Top-Down design flow
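The benefit of removing serial stalls can be estimated with a simple cycle-count model. The sketch below is only illustrative: it assumes a fully pipelined unit accepts one new operation per cycle and keeps the same latency as the serial unit, neither of which is stated on the slides. The latency of 38 cycles is one of the totals quoted on slide 4; the function names are hypothetical.

```python
def serial_cycles(num_ops: int, latency: int) -> int:
    """Serial unit: each operation must finish before the next one starts."""
    return num_ops * latency


def pipelined_cycles(num_ops: int, latency: int) -> int:
    """Fully pipelined unit (assumed initiation interval of 1 cycle):
    the first result appears after `latency` cycles, then one per cycle."""
    if num_ops == 0:
        return 0
    return latency + (num_ops - 1)


if __name__ == "__main__":
    latency = 38  # one of the cycle totals quoted on slide 4
    for n in (1, 10, 100):
        print(n, serial_cycles(n, latency), pipelined_cycles(n, latency))
```

Under these assumptions a single operation costs the same either way; the gain shows up only for streams of independent operations.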

  6. Methodology • Analyze the existing floating-point implementation • Identify serial implementations that are candidates for pipelining • Pipeline the FPU multiplier and divider using the Synopsys Register Retiming design flow • DC for synthesis, VCS for functional simulation and verification, IC Compiler for physical layout and for power and area measurement

  7. Register Retiming

  8. Register Retiming 1. Library setup 2. Constraint setup 3. 4. Compile 5. New constraint 6. Retiming
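The idea behind retiming can be illustrated on a toy model: registers placed at the end of a long combinational path are moved into the logic so that the slowest stage becomes as short as possible. The sketch below is not the Synopsys algorithm, which works on the full netlist; it assumes a simple linear chain of blocks with known delays and finds the best placement of register boundaries by dynamic programming. All names and delay values are hypothetical.

```python
from functools import lru_cache
from typing import Sequence


def min_clock_period(delays: Sequence[float], stages: int) -> float:
    """Smallest achievable worst-stage delay when a chain of combinational
    blocks is cut into `stages` contiguous pipeline stages."""
    n = len(delays)
    if not 1 <= stages <= n:
        raise ValueError("stages must be between 1 and the number of blocks")

    # prefix[i] is the total delay of the first i blocks.
    prefix = [0.0]
    for d in delays:
        prefix.append(prefix[-1] + d)

    @lru_cache(maxsize=None)
    def best(i: int, k: int) -> float:
        # Minimal worst-stage delay for the first i blocks split into k stages.
        if k == 1:
            return prefix[i]
        return min(
            max(best(j, k - 1), prefix[i] - prefix[j])
            for j in range(k - 1, i)
        )

    return best(n, stages)


if __name__ == "__main__":
    chain = [4, 2, 2]  # hypothetical block delays in ns
    print(min_clock_period(chain, 1))  # all registers at the end of the chain
    print(min_clock_period(chain, 2))  # one register moved into the logic
```

In this model, going from one stage to two halves the critical path only when the blocks can be split evenly, which is why a retiming pass needs the updated constraint as its target.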

  9. Register Retiming Flow

  10. Register Retiming Timing Report

  11. Schematic Before Retiming

  12. Schematic After Retiming

  13. VCS Functional Simulation 1.6 * 4.0 = 6.4
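A software reference model is a convenient way to check a simulated product bit for bit. The sketch below assumes the operands are IEEE 754 single-precision values and handles only normal numbers with round-to-nearest-even; zeros, subnormals, infinities, NaNs, and exponent overflow are deliberately left out. The function names are hypothetical and are not taken from the OR1200 sources.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_f32(b: int) -> float:
    """Value of a 32-bit single-precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def fp32_mul(a_bits: int, b_bits: int) -> int:
    """Multiply two normal single-precision numbers given as bit patterns."""
    sign = ((a_bits ^ b_bits) >> 31) & 1
    # Add the biased exponents and remove one bias.
    exp = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - 127
    # Restore the hidden leading 1 to get 24-bit significands.
    man_a = (a_bits & 0x7FFFFF) | 0x800000
    man_b = (b_bits & 0x7FFFFF) | 0x800000
    prod = man_a * man_b  # 48-bit product, leading 1 at bit 46 or 47

    # Normalize: keep the top 24 bits, remember the discarded bits.
    if prod & (1 << 47):
        exp += 1
        shift = 24
    else:
        shift = 23
    man = prod >> shift
    rem = prod & ((1 << shift) - 1)

    # Round to nearest, ties to even.
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (man & 1)):
        man += 1
        if man == 1 << 24:  # rounding carried out of the significand
            man >>= 1
            exp += 1

    return (sign << 31) | (exp << 23) | (man & 0x7FFFFF)


if __name__ == "__main__":
    result = fp32_mul(f32_bits(1.6), f32_bits(4.0))
    print(hex(result), bits_f32(result))
```

For the operands on this slide the model returns the same bit pattern as 6.4 rounded to single precision, so it can be compared directly against the value read from the waveform.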

  14. VCS Functional Simulation 1.6 / 4.0 = 0.0625

  15. Physical Layout

  16. Specification Results

  17. DesignWare IP • Technology-independent • Microarchitecture-level library • Synthesizable for ASIC, SoC, and FPGA design • IPs include: • Arithmetic components: multiplier, divider, adder, etc. • DW01_add, DW02_mult, DW_fp_mult • DSP, AMBA bus, memory controller • DW_fir • etc.

  18. DesignWare IP • To use DesignWare IP: • set synthetic_library dw_foundation.sldb • set link_library "$target_library $synthetic_library" • License: DesignWare • Instantiation in the Verilog file: • DW02_mult #(8, 8) U1 (A, B, TC, PRODUCT); • Synthesize using the normal flow

  19. DesignWare IP • Benefits of using DesignWare IP • Increased productivity: parameterized, pre-verified • Better quality of results (QoR): optimized by Synopsys • Design reusability

  20. Improved Scripts for design flow • Automatically set up all necessary folders and scripts • Automatically set up scratch storage for synthesis results • Scripts common to different projects are created as symbolic links • e.g., setup.tcl
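The setup steps above can be sketched as a small scaffolding function. This is not the author's script: apart from setup.tcl, which is named on the slide, every folder name here (common, src, scripts, reports, work) is a hypothetical layout chosen for illustration, and linking the scratch area into the project as work is likewise an assumption.

```python
from pathlib import Path

SHARED_SCRIPTS = ("setup.tcl",)               # named on the slide
PROJECT_DIRS = ("src", "scripts", "reports")  # hypothetical layout


def create_project(top: Path, name: str, scratch_root: Path) -> Path:
    """Create a project folder, its scratch area, and links to shared scripts."""
    project = top / name
    for sub in PROJECT_DIRS:
        (project / sub).mkdir(parents=True, exist_ok=True)

    # Synthesis results go to scratch storage, reachable from the project.
    scratch = scratch_root / name
    scratch.mkdir(parents=True, exist_ok=True)
    work = project / "work"
    if not work.is_symlink():
        work.symlink_to(scratch.resolve(), target_is_directory=True)

    # Scripts common to all projects are linked, not copied, so a fix in
    # the shared copy reaches every project.
    for script in SHARED_SCRIPTS:
        link = project / "scripts" / script
        if not link.is_symlink():
            link.symlink_to((top / "common" / script).resolve())

    return project
```

Calling create_project(Path("."), "test", Path("/tmp/scratch")) would then produce a project named test; the function is safe to run again because existing folders and links are left in place.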

  21. Improved Scripts for design flow Top level folder without any projects: Create a project called “test”:

  22. Improved Scripts for design flow • Top-level folder after creating “test”: • Folder layout of project “test”: • Other useful scripts: • timing_closure.sh: binary search for minimum delay • project_init.tcl: project-specific information (top-level design name, language, etc.)
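The binary search performed by timing_closure.sh can be sketched as follows. This is a guess at the general shape, not the script itself: the meets_timing callback stands in for a full synthesis run at a given clock period and is hypothetical, and the search assumes that any period longer than a passing one also passes, which real synthesis results only approximate.

```python
from typing import Callable


def min_feasible_period(
    meets_timing: Callable[[float], bool],
    lo: float,
    hi: float,
    tol: float = 0.01,
) -> float:
    """Binary search for the shortest clock period that meets timing.

    `lo` is a period assumed too short, `hi` one that must pass.
    The result is within `tol` of the true minimum.
    """
    if not meets_timing(hi):
        raise ValueError("upper bound does not meet timing")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if meets_timing(mid):
            hi = mid   # mid passes: try a tighter constraint
        else:
            lo = mid   # mid fails: relax the constraint
    return hi


if __name__ == "__main__":
    # Stand-in for a synthesis run: pretend the design closes at 3.37 ns.
    print(min_feasible_period(lambda period: period >= 3.37, 0.0, 10.0))
```

Each probe corresponds to one synthesis run, so the tolerance directly controls how many runs are needed: halving it adds one run.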

  23. Thank you!
