A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory

M. Borgatti, L. Calì, G. De Sandre, B. Forêt, D. Iezzi, F. Lertora, G. Muzzi, M. Pasotti, M. Poles, P.L. Rolandi. STMicroelectronics - Central R&D - Italy.


Presentation Transcript


  1. A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory
     M. Borgatti, L. Calì, G. De Sandre, B. Forêt, D. Iezzi, F. Lertora, G. Muzzi, M. Pasotti, M. Poles, P.L. Rolandi
     STMicroelectronics - Central R&D - Italy

  2. Outline of Presentation
     • Project motivation and background
     • System architecture
     • Reconfigurable core
     • Memory subsystem
     • System performance
     • Application example: embedded face recognition system
     • Energy efficiency, measurements
     • SoC integration and design flow
     • System 2 RTL and RTL 2 Layout
     • Summary

  3. Project motivation and background
     • Conflicting industry trends
     • Economics of system integration
     • Even more complex SoC
     • More integration
     • Cost effectiveness and performance (per unit)
     • Increasing design complexity and risks
     • Increasing NREs
     • Shorter time-to-market and product life
     • Strong need for:
       • Faster project turnaround
       • Lower risk
       • Usage of re-configurable silicon fabrics

  4. Project motivation and background
     • Pragmatic approach proposed:
       • Reconfigurable architecture
       • Joins a statically extensible processor with e-FPGA
       • Tight connection to Flash memory subsystem
       • Open architecture with flexible programmable I/O
       • Programmable platform approach
       • Simple model for programmers

  5. Programmable Platform Approach
     [Figure: layered platform diagram. Labels: System Applications Family; System Application; Application Compilation; Platform Compilation; Config. Proc + e-FPGA; Silicon process + Enabling technologies; Programmable platform]

  6. System Architecture
     [Figure: system block diagram. Labels: Extensible MPU; 8 KB I$; 8 KB D$; 48 kB SRAM; bus bridge; 64-bit AHB bus; M/S AHB I/F; DMA & FPGA Prog. I/F; e-FPGA; Instr. Ext. I/F; INTs; Flash Mem with FP, CP and DP ports; Buffer I/F; 1 kB Buffer; AHB/APB Bridge; 64 bit APB bus; GP I/O; General Purpose I/O Lines; I/O registers; I2C Master; I2C bus]

  7. e-FPGA Purposes
     • Processor ISA extensions
       • Simplest programmer’s model
       • Specific interface to the MPU datapath
       • Impact on processor performance
       • Impact on processor energy efficiency
       • Efficiency limited by instruction stream decoding
     • Bus-mapped co-processor
       • Maximum benefits in speed/power
     • Flexible I/O

  8. e-FPGA – Microprocessor interface
     [Figure: interface block diagram. Labels: e-FPGA Clock; Microprocessor clock; Clock Ctrl; Instruction; Decode; Pipe Control; Register File; Instruction extension; Result; Other FPGA Purposes]

  9. Flash Memory Architecture
     [Figure: Flash block diagram. Labels: four 2 Mb modules (#0 to #3), each with a 128-bit interface; DFT; PMA; Power Block; 128-bit Memory Sub-System Crossbar; µP I/F (8-bit µP); DP, CP and FP interfaces (64, 64 and 32 bits); Data Port; Code Port; FPGA Port]

  10. Flash Memory Subsystem
     • Modular approach
     • Customizable array of N independent 2 Mb modules
     • 3 content-specific ports (CP, DP, FP)
     • HW support for filesystem implementation (DP)
       • Defrag
       • Compression
       • Virtual erase
     • 2 Mb module features:
       • 128-bit I/O
       • 40 ns access time (400 MB/s peak throughput)
     • Power management and arbitration
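The 400 MB/s peak figure follows from the two module parameters on this slide: one 128-bit word delivered every 40 ns. A minimal sketch of that arithmetic, where the function name is hypothetical and 1 MB is taken as 10^6 bytes (an assumption; the slide does not say which convention it uses):

```python
def peak_throughput_mb_s(io_width_bits: int, access_time_ns: int) -> int:
    """Peak read throughput in MB/s, assuming 1 MB = 10**6 bytes."""
    bytes_per_access = io_width_bits // 8
    # Bytes per nanosecond is the same as GB/s; scale by 1000 to get MB/s.
    return bytes_per_access * 1000 // access_time_ns


# One 2 Mb module: 128-bit I/O, 40 ns access time.
print(peak_throughput_mb_s(128, 40))  # 400
```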

  11. System Memory Hierarchy
     • AHB peak throughput: 800 MB/s
     • e-FPGA: 400 MB/s (50 MB/s sustained)
     • Total aggregate peak: 1.2 GB/s
     [Figure: memory hierarchy diagram. Labels: 32-bit µP Register File; AHB Bridge; 64-bit AHB bus; 32-bit FPGA µP I/F; DMA; 64-bit CP I/F; 64-bit DP I/F; 64-bit Port CP; 64-bit Port DP; 32-bit Port FP; 512-B Buffer; 2 x 64-bit + 1 x 32-bit Memory Port I/Fs; 6x4 128-bit Crossbar; 4 x Flash Memory Controller Logic; 4 x 16384 x 128-bit Memory Module]
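The throughput and capacity figures on this slide are mutually consistent if each port moves one word per cycle at 100 MHz. That clock value is an assumption here: it is the frequency quoted in the title of slide 15, and the slide itself does not state the bus clock. A short sketch of the check, with hypothetical names:

```python
CLOCK_HZ = 100_000_000  # assumption: 100 MHz, as quoted in the title of slide 15


def bus_peak_mb_s(width_bits: int, clock_hz: int = CLOCK_HZ) -> int:
    """Peak MB/s for one transfer per clock cycle (1 MB = 10**6 bytes)."""
    return (width_bits // 8) * clock_hz // 1_000_000


ahb_mb_s = bus_peak_mb_s(64)          # 64-bit AHB bus
efpga_mb_s = bus_peak_mb_s(32)        # 32-bit e-FPGA port
aggregate_mb_s = ahb_mb_s + efpga_mb_s

# Flash array: 4 modules of 16384 words x 128 bits each.
flash_bits = 4 * 16384 * 128
flash_bytes = flash_bits // 8

print(ahb_mb_s, efpga_mb_s, aggregate_mb_s)  # 800 400 1200
print(flash_bytes)                           # 1048576
```

Under these assumptions the aggregate comes to 1200 MB/s (the 1.2 GB/s on the slide), and the four modules add up to 1 MiB, matching the 1 MB Flash shown on the chip layout slide.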

  12. Application Ex.: Face Recognition
     • Target application:
       • Recognize a face out of twenty
       • Low-resolution images from CMOS cameras
     • Potential applications:
       • Low cost smart toys
       • Advanced human-machine interfaces
       • Color CMOS camera processors
     • Image preprocessing: Bayer filter
     • Face location: based on Hough transform
     • Face recognition: line-based
       • Recognition rates over 90%
       • Scale-invariant
       • Tolerant to changes in illumination intensity

  13. Processor Extension (I)
     • 8-issue, 8-bit L2 distance
     • Complexity:
       • 23 8-bit OPS
       • 6 64-bit OPS
     • 1 GOPS peak throughput
     • Distance computation
     • 10k equiv. ASIC gates
     • Mapped to e-FPGA
     [Figure: extension datapath. Labels: Processor Load Unit; ‘8’; ‘16’; two 4-segm. adders; subtract, multiply and add stages; 64-bit register; Result]
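One plausible reading of the "8-issue, 8-bit L2 distance" extension is an instruction that treats two 64-bit registers as eight 8-bit lanes and accumulates the sum of squared lane differences. The sketch below models that reading in software. The function names, the lane order, and the choice to accumulate the squared distance (leaving the square root to a separate step) are assumptions, not details taken from the slides.

```python
MASK64 = (1 << 64) - 1


def unpack_u8x8(word: int) -> list[int]:
    """Split a 64-bit word into eight unsigned 8-bit lanes, lowest lane first."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(8)]


def l2_sq_accumulate(acc: int, a: int, b: int) -> int:
    """Add the squared L2 distance between the lanes of a and b to acc.

    Models one hypothetical extension instruction: eight subtractions,
    eight multiplications and an adder tree, wrapped to 64 bits.
    """
    partial = sum((x - y) ** 2 for x, y in zip(unpack_u8x8(a), unpack_u8x8(b)))
    return (acc + partial) & MASK64


# Lanes 1..8 against all zeros: 1 + 4 + 9 + ... + 64 = 204
print(l2_sq_accumulate(0, 0x0807060504030201, 0))  # 204
```

A longer feature vector would be processed by calling the function once per pair of 64-bit words and carrying `acc` from call to call.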

  14. Processor Extension (II)
     • Fixed-point square root kernel
     • Complexity:
       • 12 32-bit OPS
     • 2k equiv. ASIC gates
     • Mapped to e-FPGA
     [Figure: square root kernel datapath. Labels: root; Remaind.; Number; shift operators (>>30, <<2, >>2, >>1, <<1); +1; adder, subtractor and comparator; Result]
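The operators in the figure (a 2-bit shift of the number, a 30-bit right shift, a 1-bit shift of the root, "+1", a compare and a subtract) resemble the standard digit-by-digit (restoring) integer square root. The sketch below shows that textbook method for a 32-bit input. Treating it as the kernel mapped to the e-FPGA is an assumption, and the function name is hypothetical.

```python
def isqrt32(number: int) -> tuple[int, int]:
    """Digit-by-digit square root of an unsigned 32-bit integer.

    Returns (root, remainder) with root*root + remainder == number.
    Each iteration consumes the top two bits of the number and
    produces one bit of the root.
    """
    number &= 0xFFFFFFFF
    root = 0
    remainder = 0
    for _ in range(16):
        # Bring the next two most significant bits into the remainder.
        remainder = (remainder << 2) | (number >> 30)
        number = (number << 2) & 0xFFFFFFFF
        root <<= 1
        trial = (root << 1) + 1
        if remainder >= trial:
            remainder -= trial
            root += 1
    return root, remainder


print(isqrt32(16))  # (4, 0)
print(isqrt32(10))  # (3, 1)
```

For fixed-point data the same loop applies unchanged: the root of a value with 2f fractional bits has f fractional bits.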

  15. Performance: Processing Time @ 100 MHz

  16. Energy Efficiency vs. Flexibility
     [Figure: energy efficiency (MOPS/mW, axis ticks from 0.1 to 1000) plotted against flexibility (coverage). Regions: Embedded Processors; ASIPs, DSPs; uP + FPGA Instructions; FPGA-mapped CoProcessors; Dedicated HW. The "Energy-Flexibility Gap" is marked. From: Zhang et al., ISSCC 2000]

  17. Performance: Energy Efficiency

  18. [Figure: design flow diagram. Labels: Functional model (untimed); Partitioning / I/F Synthesis / Refinement; uP ISS; Cycle Accurate Simulation; Performance Analysis; Libraries; HW/SW; VHDL (e-FPGA) Inst. Ext.; Verilog HW (RTL) for uP, AHB/APB bus and peripherals; C; Soft Hardware (eFPGA); SW Apps; eFPGA mapping; eFPGA hard macro; SoC Integration]

  19. [Figure: design flow diagram. Labels: CPU core, IPs; Interface RTL code; Flash; RAM; eFPGA core; Inst. Ext.; Coproc.; I/O I/F; Synthesis; Floorplanning / P&R; Static Timing Analysis, Dynamic Verification; Con. Mapping (P&R); Netlist + Timing Database; FPGA Timing DB; Bit-stream; Static Timing Analysis (SoC + eFPGA); Silicon fab]

  20. Chip Layout
     [Figure: annotated chip layout. Labels: 1 MB Flash memory; DFT; Flash ports; buffers; Embedded FPGA; 32-bit uP + AHB & APB + 250k gates; 8+8 KB I$ + D$; TAGS; 48 KB SRAM buffer]

  21. Chip Performances and Power Consumption

  22. Summary
     • e-FPGAs allow architectural tradeoffs for reconfigurable embedded systems:
       • Processor ISA extensions
       • Bus-mapped co-processor
       • Flexible I/O
     • Modular, content-specific, multiport e-Flash
     • Performance figures:
       • Up to 10x speedup
       • Up to 9x energy reduction
       • Dynamic reconfiguration in 500 µs
     • Specific design-flow for system and RTL

  23. Acknowledgements
     The authors thank all the colleagues of the NVM-DP Dept., A. Maurelli, F. Piazza and L. Fumagalli.
