Reconfigurable Computing

Reconfigurable Computing. Nehir Sönmez, 25-11-2004. Standard definition: A reconfigurable computer is a device which computes by using post-fabrication spatial components of compute elements. [Dehon]

Presentation Transcript


  1. Reconfigurable Computing. Nehir Sönmez, 25-11-2004

  2. Reconfigurable Computing • Standard definition: A reconfigurable computer is a device which computes by using post-fabrication spatial components of compute elements. [Dehon] • An FPGA implementation of a processor core that runs a program is excluded: it is not a spatial mapping of the problem. • ASIC implementations are excluded: they are not post-fabrication programmable. • The definition restricts RC to mapping onto fine-grained devices (such as FPGAs). • General-purpose computers, by contrast, compute by making connections in time.

  3. What is Reconfigurable Computing? • Computation using hardware that can adapt at the logic level to solve specific problems. • Why is this interesting? – Some applications are poorly suited to microprocessors. – The VLSI “explosion” provides increasing resources. – Hardware/software. – Relatively new research area.

  4. Spatial Computation • Example: grade = 0.2 × mt1 + 0.2 × mt2 + 0.2 × mt3 + 0.4 × project; • A hardware resource (multiplier or adder) is allocated for each operator in the compute graph. • The abstract computation graph becomes the implementation template.

  5. Temporal Computation • A hardware resource is time-multiplexed to implement the actions of the operators in the compute graph. • Close to a sequential processor/software solution. • Many in-between cases exist.
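The contrast between slides 4 and 5 can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names `spatial_grade` and `temporal_grade` and the "one step per operator" cost model are illustrative and do not come from the slides. The spatial version counts one operator per node of the grade compute graph; the temporal version reuses a single ALU for all seven operations.

```python
# Toy cost model (an assumption for illustration): every operator takes one step.
WEIGHTS = (0.2, 0.2, 0.2, 0.4)


def spatial_grade(mt1, mt2, mt3, project):
    """Spatial: one dedicated multiplier or adder per node of the compute graph."""
    scores = (mt1, mt2, mt3, project)
    products = [w * s for w, s in zip(WEIGHTS, scores)]  # step 1: 4 multipliers in parallel
    left = products[0] + products[1]                      # step 2: 2 adders in parallel
    right = products[2] + products[3]
    grade = left + right                                  # step 3: final adder
    return grade, {"multipliers": 4, "adders": 3}, 3


def temporal_grade(mt1, mt2, mt3, project):
    """Temporal: a single ALU is reused for all seven operations, one per step."""
    scores = (mt1, mt2, mt3, project)
    steps = 0
    products = []
    for w, s in zip(WEIGHTS, scores):
        products.append(w * s)  # the one ALU acts as a multiplier
        steps += 1
    grade = products[0]
    for p in products[1:]:
        grade = grade + p       # the same ALU now acts as an adder
        steps += 1
    return grade, {"alus": 1}, steps


print(spatial_grade(80, 90, 70, 100))
print(temporal_grade(80, 90, 70, 100))
```

In this model the spatial mapping spends seven operators to finish in three steps, while the temporal mapping spends one ALU and needs seven steps; the in-between cases mentioned on slide 5 would share some operators but not all.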

  6. Why is Custom Logic Faster Than Software? • Spatial vs. temporal computation – Processors divide computation across time; dedicated logic divides it across space.

  7. Why is Custom Logic Faster Than Software? • Specialization – The instruction set may not provide the operations your program needs. – Processors provide hardware that may not be useful in every program or in every cycle of a given program (multipliers, dividers). • Instruction memory – Processors need lots of memory to hold the instructions that make up a program and to hold intermediate results. • Bit-width mismatches – In general, processors have a fixed bit width, and all computations are performed on that many bits. – Multimedia vector instructions (MMX) are a response to this.

  8. Microprocessor-based Systems [Figure labels: Data Storage (Register File), ALU, A, B, C, 64] • Generalized to perform many functions well. • Operates on fixed data sizes. • Inherently sequential.

  9. Reconfigurable Computing [Figure: functional unit with inputs A, B and outputs H, L] if (A > B) { H = A; L = B; } else { H = B; L = A; } • Create specialized hardware for each application. • Functional units optimized to perform a special task.
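A minimal Python sketch of the functional unit on slide 9, which routes the larger input to H and the smaller to L. The function `compare_swap` mirrors the slide's if/else; `sort4` is an added illustration (not from the slides) of how several copies of one specialized unit can be wired into a fixed network, here a standard five-comparator sorting network for four values.

```python
def compare_swap(a, b):
    """The slide's functional unit: return (H, L), the larger and smaller input."""
    if a > b:
        return a, b
    return b, a


def sort4(values):
    """Sort four values with a fixed wiring of five compare-swap units."""
    v = list(values)
    # Each pair is one unit; (0,1)/(2,3) and (0,2)/(1,3) could operate in parallel.
    for i, j in [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]:
        high, low = compare_swap(v[i], v[j])
        v[i], v[j] = low, high  # smaller value goes to the lower index
    return v


print(compare_swap(3, 7))
print(sort4([4, 3, 2, 1]))
```

The wiring is fixed before any data arrives, so no instruction fetch or branch is involved; that is the sense in which the unit is specialized to its task.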

  10. Dataflow • A superscalar processor must find the dataflow graph at run time. • RC constructs the dataflow graph at compile time: – no control logic overhead – no window size limitations
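To illustrate what "constructing the dataflow graph at compile time" can mean, the sketch below assigns each operation of a dependency graph to the earliest step at which its inputs are ready (an as-soon-as-possible schedule). The graph used is the grade expression from slide 4; the node names (`m1`, `a1`, ...) and the helper names `asap_levels` and `schedule` are assumptions made for this example.

```python
def asap_levels(deps):
    """Earliest step per operation; deps maps an operation to those it waits for."""
    levels = {}

    def level(op):
        if op not in levels:
            # An operation with no inputs from other operations starts at step 0.
            levels[op] = 1 + max((level(d) for d in deps[op]), default=-1)
        return levels[op]

    for op in deps:
        level(op)
    return levels


def schedule(deps):
    """Group operations by step; everything in one group can run in parallel."""
    steps = {}
    for op, lvl in asap_levels(deps).items():
        steps.setdefault(lvl, []).append(op)
    return [sorted(steps[lvl]) for lvl in sorted(steps)]


# grade = 0.2*mt1 + 0.2*mt2 + 0.2*mt3 + 0.4*project, as four multiplies and three adds
grade_graph = {
    "m1": [], "m2": [], "m3": [], "m4": [],
    "a1": ["m1", "m2"], "a2": ["m3", "m4"],
    "a3": ["a1", "a2"],
}
print(schedule(grade_graph))
```

Because the whole graph is known before execution, the schedule is not limited to a fixed-size instruction window the way a run-time scheduler would be.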

  11. Implementation Spectrum: Microprocessor, Reconfigurable Hardware, ASIC • An ASIC gives high performance at the cost of inflexibility. • A processor is very flexible but not tuned to the application. • Reconfigurable hardware is a nice compromise.

  12. Flexibility vs Data-Processing Rate

  13. Field-Programmable Gate Array [Figure: array of logic elements (LE) separated by routing tracks] • Each logic element outputs one data bit. • Programmable interconnect between elements. • Interconnect tracks are grouped into channels.

  14. FPGA Architecture Issues [Figure label: Logic Element] • Need to explore architectural issues. • How much functionality should go in a logic element? • How many routing tracks per channel? • Switch “population”?

  15. Real World Physical Issues • Modelling FPGA delay. • Improving performance through buffering/segmentation. • Technology dependent. • The cost of reconfigurability: wires have real cost.

  16. Translating a Design to an FPGA [Figure: C program (C = A+B), circuit (adder with inputs A, B and output C), array] • CAD to translate a circuit from a text description to a physical implementation is well understood. • CAD to translate from a C program to a circuit is not well understood. • It is very difficult for application designers to write high-performance applications successfully. • Need for design automation!

  17. High-level Compilers [Figure: C code for (i = 0; i < n; i++) { . . } and C = A+B mapped to an adder with inputs A, B and output C] • Difficult to estimate hardware resources. • Some parts of a program are more appropriate for the processor (hardware/software codesign). • The compiler must parallelize computation across many resources. • Engineers like to write in C rather than pushing little blocks around.

  18. Reconfigurable Hardware: Logic Element [Figure: logic element with inputs A, B, C, D and one output, out] • Each logic element operates on four one-bit inputs. • Output is one data bit. • Can perform any Boolean function of four inputs: 2^(2^4) = 2^16 = 64K functions!
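The 64K figure follows from the size of the truth table: four inputs give 2^4 = 16 rows, and each row can independently hold 0 or 1, so there are 2^16 possible tables. A small Python sketch, with hypothetical helper names `make_lut` and `lut_eval`, stores a four-input function as a 16-bit integer and evaluates it by indexing:

```python
def make_lut(func, n_inputs=4):
    """Build the truth table of func as an integer with 2**n_inputs one-bit entries."""
    table = 0
    for index in range(2 ** n_inputs):
        bits = [(index >> k) & 1 for k in range(n_inputs)]  # bits[0] is input A
        if func(*bits):
            table |= 1 << index
    return table


def lut_eval(table, *inputs):
    """Look up the output bit selected by the input bits (A is the lowest bit)."""
    index = sum(bit << k for k, bit in enumerate(inputs))
    return (table >> index) & 1


and4 = make_lut(lambda a, b, c, d: a & b & c & d)
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
NUM_FUNCTIONS = 2 ** (2 ** 4)

print(hex(and4), hex(xor4), NUM_FUNCTIONS)
```

Configuring the logic element amounts to loading a different 16-bit table; the lookup hardware itself never changes.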

  19. Basic Logic Block Architecture

  20. Xilinx - Spartan II Architecture • IOBs provide the interface between the package pins and the internal logic • CLBs provide the functional elements for constructing most logic • Dedicated block RAM memories of 4096 bits each • Clock DLLs for clock-distribution delay compensation and clock domain control • Versatile multi-level interconnect structure

  21. Spartan II Configurable Logic Block • Basic block is a logic cell (LC): – a 4-input function generator (LUT) – carry logic – a storage element • Each CLB contains: – four LCs, organized in two similar slices – logic that combines function generators to provide functions of five or six inputs • LUT capacity is completely determined by the number of inputs, not the complexity of the function
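One common way to combine two 4-input function generators into a 5-input function is Shannon expansion: one LUT holds the function with the fifth input fixed at 0, the other holds it with the fifth input fixed at 1, and a 2-to-1 multiplexer driven by the fifth input selects between them. The sketch below illustrates that idea in Python; it is not a model of the actual Spartan II circuitry, and the names `make_lut4`, `lut4`, `make_five_input`, and `majority5` are hypothetical.

```python
def make_lut4(func):
    """Truth table (16 entries) of a four-input, one-bit function."""
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1) & 1
            for i in range(16)]


def lut4(table, a, b, c, d):
    """Evaluate a 4-input LUT by indexing its table."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


def make_five_input(func5):
    """Split func5 on its fifth input e into two 4-input LUTs plus a mux."""
    low = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))   # used when e == 0
    high = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))  # used when e == 1

    def evaluate(a, b, c, d, e):
        return lut4(high, a, b, c, d) if e else lut4(low, a, b, c, d)

    return evaluate


def majority5(a, b, c, d, e):
    """1 when at least three of the five inputs are 1."""
    return int(a + b + c + d + e >= 3)


maj = make_five_input(majority5)
print(maj(1, 1, 0, 0, 1), maj(1, 0, 0, 0, 1))
```

Each of the two tables has 16 entries whatever function is stored in it, which is the point of the slide's last bullet: a LUT's capacity depends on its input count, not on how complicated the function looks when written as gates.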

  22. Spartan II CLB

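  23. Example: Two-Bit Adder Made of Full Adders [Figure: two full adders (FA), each with inputs A, B, Ci and outputs S, Co, mapped onto LUTs] • A + B = D • Logic synthesis tool reduces the circuit to SOP form (X' denotes the complement of X): S = A'B'Ci + A'BCi' + AB'Ci' + ABCi; Co = A'BCi + AB'Ci + ABCi' + ABCi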
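The sum-of-products form of a full adder can be checked exhaustively. The Python sketch below assumes the standard minterms (S is 1 when an odd number of inputs are 1, Co is 1 when at least two inputs are 1) and chains two full adders into the two-bit adder of the slide; the function names are illustrative.

```python
def full_adder(a, b, ci):
    """Full adder written as explicit sum-of-products terms."""
    na, nb, nci = 1 - a, 1 - b, 1 - ci  # complemented inputs
    s = (na & nb & ci) | (na & b & nci) | (a & nb & nci) | (a & b & ci)
    co = (na & b & ci) | (a & nb & ci) | (a & b & nci) | (a & b & ci)
    return s, co


def two_bit_add(a, b):
    """Add two 2-bit numbers with two chained full adders; the result has 3 bits."""
    s0, c0 = full_adder(a & 1, b & 1, 0)
    s1, c1 = full_adder((a >> 1) & 1, (b >> 1) & 1, c0)
    return s0 | (s1 << 1) | (c1 << 2)


print(two_bit_add(3, 2))
```

S and Co are each a function of three inputs, so each fits in a single 4-input LUT with one input unused.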

  24. Circuit Compilation • Technology mapping • Placement: assign a logical LUT to a physical location. • Routing: select wire segments and switches for interconnection.
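As a toy illustration of the placement step, the sketch below tries every assignment of four logical LUTs to the sites of a 2×2 array and keeps the one with the smallest total Manhattan wire length. Everything here is an assumption made for the example: the LUT names, the netlist, the cost function, and the exhaustive search itself, which is only feasible at toy sizes (practical tools rely on heuristics instead).

```python
from itertools import permutations


def wirelength(placement, nets):
    """Total Manhattan distance over all two-terminal nets."""
    total = 0
    for src, dst in nets:
        (x1, y1), (x2, y2) = placement[src], placement[dst]
        total += abs(x1 - x2) + abs(y1 - y2)
    return total


def place(luts, nets, sites):
    """Exhaustively search for the assignment of LUTs to sites with the lowest cost."""
    best = None
    for perm in permutations(sites, len(luts)):
        candidate = dict(zip(luts, perm))
        cost = wirelength(candidate, nets)
        if best is None or cost < best[0]:
            best = (cost, candidate)
    return best


luts = ["L0", "L1", "L2", "L3"]                   # hypothetical logical LUTs
nets = [("L0", "L1"), ("L1", "L2"), ("L2", "L3")]  # a simple chain of connections
sites = [(x, y) for x in range(2) for y in range(2)]

cost, placement = place(luts, nets, sites)
print(cost, placement)
```

Routing would then pick concrete wire segments and switches for each net; the wire-length estimate above only stands in for that cost.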

  25. Processor + FPGA: three possibilities • 1. FPGA serves as a coprocessor for data-intensive applications. [Figure: FPGA chip on a daughtercard, processor, backplane bus (e.g. PCI)] • 2. FPGA serves as an embedded computer for low-latency transfer. [Figure: FPGA chip and processor] “Reconfigurable Functional Unit”

  26. Processor + FPGA (cont.) • 3. Processor integration [Figure: processor containing register file (RF), ALU, and FPGA] • FPGA logic embedded inside the processor. • A number of problems with 2 and 3. • Process technology is an issue. • ALU is generally much faster than FPGA. • FPGA much faster than the entire processor.

  27. Multi-FPGA Systems [Figure: array of interconnected FPGAs (F)] • Most applications don’t fit on one device. • This creates a need for partitioning designs across many devices. • Effectively a “netlist computer”: each FPGA is a logic processor, interconnected in a given topology.

  28. Xilinx XC4000 Cell • Two 4-input look-up tables • One 3-input look-up table • Two D flip-flops

  29. Altera Flex10K

  30. Xilinx Virtex CLB

  31. Reconfiguration • Reconfiguration methodology: – Static – Partially static (= partial reconfiguration) – Dynamic

  32. The Design Process • Partition a program into sections to be implemented on hardware and software separately. • Synthesize the computations destined for reconfigurable hardware into a gate-level or circuit-level description. • Map the circuit onto reconfigurable blocks and connect them using reconfigurable routing. • After compilation, the circuit is ready for configuration onto the hardware at runtime.

  33. RC Objectives [Figure labels: Performance, Power consumption, Flexibility, Programming, Specialization] • RC objectives: specialization, performance, flexibility • Basic idea: “Programmable Hardware”

  34. Reconfigurable Devices • Routing strategies [Figure: continuous routing vs. structured routing among blocks A, B, C]

  35. Xilinx XC4000 Routing

  36. Reconfigurable Instruction Set Processors • By including reconfigurability we can increase flexibility while keeping high specialization. [Figure labels: Processor, PLD, Reconfigurable Processor]

  37. Reconfigurable Instruction Set Processors • Coprocessor based approach [Figure: Task 1 to Task K in software, Task K+1 to Task N in hardware] • ASIP based approach [Figure: Task 1, Task 2, ..., Task N, each split between software and hardware]

  38. Coprocessor based approach (I) • Typical example: CPU + PCI board – Altera ARC-PCI – Compaq Pamette • System on Chip (SoC) – Altera's Excalibur device – Chameleon Systems, Inc.

  39. Coprocessor based approach (II) • Altera ARC-PCI

  40. Coprocessor based approach (III) • Compaq Pamette

  41. Coprocessor based approach (IV) • Altera's Excalibur device • Embedded processor: ARM, MIPS or NIOS

  42. Coprocessor based approach (V) • Chameleon Systems, Inc.

  43. ASIP based approach (I) • Reconfigurable unit within CPU [Figure: Fetch, Decode, Issue stages; Integer Unit, FP Unit, Branch Unit, LD/ST Unit, Reconfigurable Unit]

  44. ASIP based approach (II) • Challenge: CAD tools [Figure: C code, compiler, assembly code, instruction description (configuration bits)]

  45. ASIP based approach (III) [Figure: compiler structure, from C code to assembly code and configuration bits: C parsing, optimizations, hardware estimator, instruction identification, instruction selection, hardware generation, configuration scheduling, code generation]

  46. ASIP based approach (II) • Example: Philips CinCISe Architecture [Figure: register file, ALU, MUX, and RFU on 32-bit data paths, with an encoded instruction word]

  47. Why Compute With FPGAs? • Huge performance gap between software and hand-designed hardware systems – Often 100-to-1 ratio of performance or performance/area • Hardware systems not so good for general computing – Big design and cost barriers to implementation – Not practical to buy a new machine every time you want to run a different program • Reconfigurable systems offer best-of-both-worlds – Run-time programmability – Hardware-level performance

  48. Good Applications for Reconfigurable Computing • Relatively small application graph – FPGAs have limited capacity – Simple control flow helps a lot • Data parallelism – Execute the same computations on many independent data elements – Pipeline computations through the hardware • Small and/or varying bit widths – Take advantage of the ability to customize the size of operators

  49. Reconfigurable Computing Successes • RSA decryption – Programmable-Active-Memory machine set a record for decryption of RSA-encrypted data • DNA sequence matching – Reconfigurable hardware has achieved 100x better performance than contemporary supercomputers • Signal processing – FPGA-based filters often get 10x better performance than DSP chips – Benefit from customization of hardware to the application • Emulation – Use reconfigurable logic to simulate new processors at high speeds • Cryptographic attacks – High-performance, low-cost implementations for breaking encryption algorithms
