A Synthesizable Datapath-Oriented Programmable Logic Core

Steven J.E. Wilton, Chun Hok Ho, Philip Leong, Wayne Luk, Brad Quinton. University of British Columbia and Imperial College. Embedded Programmable Logic Cores: embed a small amount of programmable logic onto an ASIC.

Presentation Transcript


  1. A Synthesizable Datapath-Oriented Programmable Logic Core • Steven J.E. Wilton, Chun Hok Ho, Philip Leong, Wayne Luk, Brad Quinton • University of British Columbia and Imperial College

  2. Embedded Programmable Logic Cores • Embed a small amount of programmable logic onto an ASIC • Postpone some decisions until late in the design cycle • Fast upgrade path for products • Embedded Debug:

  3. Soft Programmable Logic Cores

  4. Soft Programmable Logic Cores • Advantages • Easy to integrate, reduces design time • Very flexible, can create the exact required core • Easy to migrate to smaller technologies • Disadvantages • Inefficient compared to hard cores • Our thought • Makes sense if you only want a small core (a few hundred gates)

  5. This talk: • A new architecture for a synthesizable programmable logic core that supports datapath (bus-based) circuits

  6. Previous Synthesizable PLCs • Kim Bozman and Noha Kafafi: • LUT-Based • Unique Directional Routing Fabric

  7. Synthesizable Cores • Observation 1: To make the core truly synthesizable, we must avoid combinational loops in the unprogrammed fabric • Observation 2: Each tile need not be identical
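
Observation 1 can be checked mechanically: if the combinational elements of the unprogrammed fabric are treated as a directed graph, a combinational loop is simply a cycle. Because the fabric is not yet programmed, every connection a routing multiplexer could select counts as an edge. The sketch below is standard depth-first cycle detection over a hypothetical netlist dictionary (element name mapped to the elements its output drives, with registers already removed); it illustrates the check and is not the authors' tool flow.

```python
from typing import Dict, List


def has_combinational_loop(netlist: Dict[str, List[str]]) -> bool:
    """Return True if the combinational netlist graph contains a cycle.

    `netlist` maps each combinational element to the elements its output
    drives. Registers are assumed to have been removed, since a path that
    passes through a flip-flop is not a combinational loop.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    colour = {node: WHITE for node in netlist}
    for targets in netlist.values():
        for target in targets:
            colour.setdefault(target, WHITE)

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in netlist.get(node, []):
            if colour[nxt] == GREY:
                return True  # back edge: the signal can reach itself
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in list(colour))
```

A feed-forward chain such as `{"in": ["mux0"], "mux0": ["mux1"], "mux1": []}` passes the check, while a two-element ring `{"a": ["b"], "b": ["a"]}` is reported as a loop.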

  8. Previous Synthesizable PLCs • Andy Yan: • Product-term Based Logic Block • Unique Directional Routing Fabric • Supported Sequential Circuits

  9. Our Architecture • Use it when the PLC is connected to a bus • Observation: These connections are permanently tied to the bus signals, and we know this when the ASIC is designed

  10. Logic Architecture

  11. Logic Architecture • Key point: - All bitblocks within a wordblock share the same set of configuration bits - This means all bitblocks implement the same function
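
Sharing configuration bits means a wordblock applies one bitwise function across the whole bus. The sketch below assumes three input buses per wordblock (slide 17 implies ordinary wordblocks take three) and models each bitblock as a 3-input LUT; this is a hypothetical simplification, since the actual bitblock circuit is not described in these slides. `wordblock`, `AND_AB`, and `XOR_AB` are illustrative names.

```python
def wordblock(config: int, a: int, b: int, c: int, width: int) -> int:
    """Evaluate a hypothetical wordblock made of `width` identical bitblocks.

    Each bitblock is modelled as a 3-input LUT. All bitblocks share the same
    8-bit `config`, so output bit i is LUT[config](a_i, b_i, c_i).
    """
    if not 0 <= config < 256:
        raise ValueError("config must fit in 8 bits")
    out = 0
    for i in range(width):
        # Select the LUT entry addressed by bit i of each input bus.
        index = ((a >> i) & 1) | (((b >> i) & 1) << 1) | (((c >> i) & 1) << 2)
        out |= ((config >> index) & 1) << i
    return out


AND_AB = 0x88  # 1 only at LUT indices 3 and 7 (a=1, b=1, any c)
XOR_AB = 0x66  # 1 at LUT indices 1, 2, 5, 6 (a != b, any c)
```

For example, `wordblock(AND_AB, 0b1100, 0b1010, 0, width=4)` returns `0b1000`. Under this model a wordblock needs 8 configuration bits regardless of bus width, whereas independently configured bitblocks would need 8 × width.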

  12. Routing Architecture • Key point: Signals are routed as buses

  13. Routing Architecture • Key point: - Linear array of wordblocks • - Buses get wider as we go to the right

  15. Routing Architecture • Key point: - Linear array of wordblocks • - Number of buses goes up as we go to the right

  16. Datapath Architecture

  17. Multipliers • Two inputs instead of three • Two output buses (MSB, LSB)
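
Multiplying two N-bit operands produces up to 2N bits, which is presumably why each multiplier drives two N-bit output buses. A minimal sketch, assuming unsigned operands (the slides do not say whether the multipliers are signed); `multiply_to_buses` is a hypothetical name.

```python
from typing import Tuple


def multiply_to_buses(a: int, b: int, width: int) -> Tuple[int, int]:
    """Multiply two unsigned `width`-bit buses; return (msb_bus, lsb_bus).

    The full product needs up to 2*width bits, so it is split across two
    `width`-bit output buses.
    """
    mask = (1 << width) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must be unsigned width-bit values")
    product = a * b
    return product >> width, product & mask
```

For 8-bit buses, `multiply_to_buses(0xFF, 0xFF, 8)` returns `(0xFE, 0x01)`, since 255 × 255 = 65025 = 0xFE01.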

  18. Add a Control Block • The control block is based on the product-term (P-term) fine-grained synthesizable core
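
A product-term block evaluates sum-of-products logic: each term ANDs selected input bits (true or complemented), and the output ORs the terms. The sketch below is a behavioural model of one such output only; it does not model the control block's registers or routing, and the `(care_mask, required_value)` encoding and function name are assumptions for illustration.

```python
from typing import List, Tuple

# A product term is (care_mask, required_value): bits set in care_mask must
# equal the corresponding bits of required_value; other bits are ignored.
ProductTerm = Tuple[int, int]


def pterm_output(inputs: int, terms: List[ProductTerm]) -> int:
    """Evaluate one sum-of-products output: 1 if any product term matches."""
    return int(any((inputs & care) == (value & care) for care, value in terms))


# Example: f = (x0 AND NOT x1) OR x2, with input bit i holding x_i.
F_TERMS: List[ProductTerm] = [(0b011, 0b001), (0b100, 0b100)]
```

With these terms, `pterm_output(0b001, F_TERMS)` is 1 (first term matches) and `pterm_output(0b011, F_TERMS)` is 0.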

  19. Example Mapping • Monitor two buses: - Count the number of times each bus matches a mask (the mask may include don’t-care bits) - Count the number of times both buses match the mask at the same time
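
A behavioural sketch of what this example circuit computes, assuming both buses are compared against the same mask and that one sample of each bus is taken per clock cycle. Bits set in `care` must match `value`; zero bits in `care` are the don't-cares. In the actual mapping this logic is built from wordblocks and the control block; the function name and arguments here are hypothetical.

```python
from typing import Iterable, Tuple


def count_mask_matches(
    samples: Iterable[Tuple[int, int]], care: int, value: int
) -> Tuple[int, int, int]:
    """Count cycles in which bus A, bus B, and both buses match the mask.

    `samples` yields one (bus_a, bus_b) pair per clock cycle.
    Returns (count_a, count_b, count_both).
    """
    count_a = count_b = count_both = 0
    for a, b in samples:
        match_a = (a & care) == (value & care)
        match_b = (b & care) == (value & care)
        count_a += int(match_a)
        count_b += int(match_b)
        count_both += int(match_a and match_b)
    return count_a, count_b, count_both
```

For example, with `care=0xF0` and `value=0x10` (upper nibble must be 1, lower nibble is don't-care), the samples `[(0x12, 0x10), (0x1F, 0x2F), (0x10, 0x1A)]` give `(3, 2, 2)`.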

  20. Interesting Questions: 1. How do the various architectural parameters affect density? 2. How does this compare to a fine-grained architecture?

  21. Architectural Parameters • D Number of Wordblocks (incl. multipliers) • N Bit Width • M Number of Input Buses • R Number of Output Buses • F Number of Feedback Paths • C Number of Constant Registers • A Number of Multipliers • P Number of Product-Term Blocks

  22. Impact of the Number of Wordblocks and Bit Width • Key result: Both bit width and the number of wordblocks have a significant impact on area.

  23. Impact of the Number of Multipliers • Key result: Area increases, mainly because of the additional buses in the routing

  24. Impact of the Size of the Control Block • Key result: The control block can dominate if it becomes too big

  25. Area comparison by benchmark:
      Benchmark  Datapath (ours)  Fine-Grain (PTerm)    ASIC  Fine-Grain/Datapath  Datapath/ASIC
      fbly                68,190         132,339,335   9,300                 1940           7.33
      dotv3               34,119          65,534,780   6,575                 1921           5.19
      dscg                72,178         116,271,968   9,473                 1611           7.62
      fir4                76,213         130,971,120   9,843                 1718           7.74
      egcd             1,225,231          22,776,474  10,420                 18.6            117
      momul              294,135          11,448,589   7,097                 38.9             41
      median             142,172          10,733,962   4,420                 75.5             32
      debug1              87,265           1,302,928   3,484                 14.9             25

  27. Key result 1: Significantly more area-efficient than the fine-grained architecture • Key result 2: The area overhead versus an ASIC is roughly the same as the usual FPGA-versus-ASIC overhead
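
The two ratio columns in the slide 25 table are derived from the three area columns, so recomputing them is a quick consistency check. The sketch below copies the areas from that table; the recomputed ratios agree with the slide to within rounding (for example, fbly computes to about 1940.7 where the slide shows 1940).

```python
from typing import Tuple

# Areas from the slide 25 table: (datapath (ours), fine-grain (PTerm), ASIC).
AREAS = {
    "fbly": (68_190, 132_339_335, 9_300),
    "dotv3": (34_119, 65_534_780, 6_575),
    "dscg": (72_178, 116_271_968, 9_473),
    "fir4": (76_213, 130_971_120, 9_843),
    "egcd": (1_225_231, 22_776_474, 10_420),
    "momul": (294_135, 11_448_589, 7_097),
    "median": (142_172, 10_733_962, 4_420),
    "debug1": (87_265, 1_302_928, 3_484),
}


def ratios(datapath: int, fine_grain: int, asic: int) -> Tuple[float, float]:
    """Return the (fine-grain/datapath, datapath/ASIC) area ratios."""
    return fine_grain / datapath, datapath / asic


if __name__ == "__main__":
    for name, areas in AREAS.items():
        fg_dp, dp_asic = ratios(*areas)
        print(f"{name:8s} {fg_dp:9.1f} {dp_asic:8.2f}")
```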

  28. But these results aren’t fair: • For each benchmark, we found the optimum set of architectural parameters. • We need an architecture that works for a variety of circuits

  29. Architecture Construction • Our thought: - The number of inputs/outputs is fixed by the SoC - The designer has an idea of the size of the programmable logic (number of wordblocks) • Fix all other parameters as a function of the number of wordblocks - e.g., a fixed ratio between the number of multipliers and wordblocks, a fixed ratio between control logic and datapath logic, etc. • We chose the fixed ratios arbitrarily, based on our experience - A full architecture study is left as future work!
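
One way to picture this construction: the SoC fixes M and R (and, we assume, the bus width N), the designer picks D, and every other parameter from slide 21 is derived from D. The ratio constants below are placeholders invented for illustration; the slides do not give the values the authors chose. `PLCParams` and `construct` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PLCParams:
    D: int  # number of wordblocks (incl. multipliers)
    N: int  # bit width
    M: int  # number of input buses
    R: int  # number of output buses
    F: int  # number of feedback paths
    C: int  # number of constant registers
    A: int  # number of multipliers
    P: int  # number of product-term blocks


# Hypothetical ratios (per wordblock), for illustration only.
MULTIPLIERS_PER_WORDBLOCK = 0.125
FEEDBACK_PER_WORDBLOCK = 0.25
CONSTANTS_PER_WORDBLOCK = 0.25
PTERMS_PER_WORDBLOCK = 0.5


def construct(num_inputs: int, num_outputs: int, bit_width: int,
              num_wordblocks: int) -> PLCParams:
    """Derive all remaining parameters from D using fixed ratios."""
    def scaled(ratio: float) -> int:
        # Keep at least one of each resource.
        return max(1, round(ratio * num_wordblocks))

    return PLCParams(
        D=num_wordblocks, N=bit_width, M=num_inputs, R=num_outputs,
        F=scaled(FEEDBACK_PER_WORDBLOCK),
        C=scaled(CONSTANTS_PER_WORDBLOCK),
        A=scaled(MULTIPLIERS_PER_WORDBLOCK),
        P=scaled(PTERMS_PER_WORDBLOCK),
    )
```

With these placeholder ratios, `construct(2, 1, 16, 16)` yields 2 multipliers, 4 feedback paths, 4 constant registers, and 8 product-term blocks.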

  30. Area comparison by benchmark (fixed-ratio architecture):
      Benchmark  Datapath (ours)  Fine-Grain (PTerm)    ASIC  Fine-Grain/Datapath  Datapath/ASIC
      fbly               332,091         132,339,335   9,300                  399           35.7
      dotv3              225,518          65,534,780   6,575                  291           34.3
      dscg               325,029         116,271,968   9,473                  358           34.3
      fir4               307,154         130,971,120   9,843                  426           31.2
      egcd             3,778,611          22,776,474  10,420                 6.02            363
      momul              486,654          11,448,589   7,097                 23.5           68.5
      median             194,654          10,733,962   4,420                 55.1             44
      debug1             119,286           1,302,928   3,484                 10.9             34

  32. Key result 1: Significantly more area-efficient than the fine-grained architecture • Key result 2: The area overhead versus an ASIC is roughly the same as the usual FPGA-versus-ASIC overhead
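
The "6 to 426×" figure in the conclusions is the range of the Fine-Grain/Datapath column in the slide 30 table. This sketch recomputes that range from the two area columns (values copied from the table); it finds egcd at about 6.03× and fir4 at about 426×, matching the slide to within rounding.

```python
# Areas from the slide 30 table (fixed-ratio datapath architecture).
FIXED_DATAPATH = {
    "fbly": 332_091, "dotv3": 225_518, "dscg": 325_029, "fir4": 307_154,
    "egcd": 3_778_611, "momul": 486_654, "median": 194_654, "debug1": 119_286,
}
FINE_GRAIN = {
    "fbly": 132_339_335, "dotv3": 65_534_780, "dscg": 116_271_968,
    "fir4": 130_971_120, "egcd": 22_776_474, "momul": 11_448_589,
    "median": 10_733_962, "debug1": 1_302_928,
}

# Fine-grained area divided by datapath area, per benchmark.
fg_over_dp = {name: FINE_GRAIN[name] / area for name, area in FIXED_DATAPATH.items()}
smallest = min(fg_over_dp, key=fg_over_dp.get)
largest = max(fg_over_dp, key=fg_over_dp.get)
print(f"{fg_over_dp[smallest]:.2f}x ({smallest}) to {fg_over_dp[largest]:.0f}x ({largest})")
```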

  33. [Figure: layout, labeled 625mm × 625mm]

  34. Conclusions • Our architecture is 6 to 426× more area-efficient than the fine-grained architecture • But this holds only for datapath-oriented circuits. • However, this is OK: - In an SoC, we know when the chip is designed whether the inputs are buses or bits - If there are buses, use this architecture - If there are no buses, use Andy Yan’s PTerm architecture • Final thought: with this architecture, the overhead is similar to that of a normal FPGA, and people already accept that overhead!
