
Reconfigurable Computing


delfina

Presentation Transcript


  1. Reconfigurable Computing: Introduction

  2. Introduction. Explosive growth in computing and communication. Speed-hungry applications: weather forecasting, real-time audio/video processing, … These demands conflict with power consumption and cost.

  3. Computing Paradigms. Processing architectures: general-purpose architectures (the Von Neumann computer); domain-specific processors for a class of applications (e.g. multimedia); application-specific processors for one application; reconfigurable processors.

  4. The Von Neumann Computer. Principle: in 1945, the mathematician Von Neumann (VN) proposed that a computer could have a simple structure capable of executing any kind of program, given a properly programmed control unit, without the need for hardware modification.

  5. The Von Neumann Computer. Structure: a memory for storing program and data; a control unit (control path) featuring a program counter for controlling program execution; an ALU for program execution. [Figure: block diagram of the processor (central processing unit) and the memory. Labels: datapath, registers, control path, address register, instruction register, PC; address, data, and instruction lines between processor and memory.]

  6. The Von Neumann Computer. Coding: a program is coded as a set of instructions to be executed sequentially. Program execution: Instruction Fetch (IF): the next instruction to be executed is fetched from the memory. Decode (D): the instruction is decoded to determine the operation. Read operand (R): the operands are read from the memory. Execute (EX): the required operation is executed on the ALU. Write result (W): the result of the operation is written back to the memory. Each instruction is executed in the cycle (IF, D, R, EX, W).
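The five-step cycle can be sketched as a toy interpreter. This is only an illustration: the opcodes `ADD` and `MUL`, the tuple instruction format, and the dictionary used as memory are assumptions invented here, not part of the slides.

```python
def run(program, memory):
    """Execute a toy program one instruction at a time, logging each step."""
    trace = []
    pc = 0  # program counter
    while pc < len(program):
        instruction = program[pc]            # IF: fetch the next instruction
        trace.append("IF")
        op, dst, src1, src2 = instruction    # D: decode operation and operand names
        trace.append("D")
        x, y = memory[src1], memory[src2]    # R: read the operands from memory
        trace.append("R")
        if op == "ADD":                      # EX: perform the operation on the "ALU"
            result = x + y
        elif op == "MUL":
            result = x * y
        else:
            raise ValueError(f"unknown opcode: {op}")
        trace.append("EX")
        memory[dst] = result                 # W: write the result back to memory
        trace.append("W")
        pc += 1                              # strictly sequential: move to the next one
    return memory, trace


# Hypothetical two-instruction program: a = a + (b * c)
memory = {"a": 2, "b": 3, "c": 4, "t": 0}
program = [("MUL", "t", "b", "c"),   # t = b * c
           ("ADD", "a", "a", "t")]   # a = a + t
final_memory, trace = run(program, memory)
print(final_memory["a"], len(trace))  # 14 10
```

Two instructions cost ten steps here because every instruction walks through all five phases in turn.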

  7. The Von Neumann Computer. Advantage: flexibility; any well-coded program can be executed. Drawbacks: Speed efficiency: not efficient, due to sequential program execution (temporal resource sharing); such processors cannot even drive their own display (they need a graphics accelerator). Resource efficiency: only part of the hardware resources is required for the execution of an instruction; the rest remains idle. Memory access: memories are about 10 times slower than the processor. These drawbacks are compensated for using high clock speed, pipelining, caches, instruction pre-fetching, etc.

  8. The Von Neumann Computer. Sequential execution: t_cycle = cycle execution time; one instruction needs t_instruction = 5*t_cycle, so 3 instructions take 15*t_cycle. Pipelining: one instruction still needs t_instruction = 5*t_cycle (no improvement in latency), but 3 instructions complete in 7*t_cycle in the ideal case → increased throughput. Even with pipelining and other improvements such as caches, execution remains sequential.
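The figures on this slide follow from a simple count, sketched below under the slide's ideal-case assumption (five stages, one cycle each, no stalls). The function names are chosen here for illustration.

```python
STAGES = 5  # IF, D, R, EX, W


def sequential_cycles(n_instructions, stages=STAGES):
    """Each instruction occupies the whole machine for all of its stages."""
    return n_instructions * stages


def pipelined_cycles(n_instructions, stages=STAGES):
    """Ideal pipeline: the first instruction fills the pipe, then one finishes per cycle."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


for n in (1, 3, 100):
    print(n, sequential_cycles(n), pipelined_cycles(n))
# 1 5 5
# 3 15 7
# 100 500 104
```

A single instruction gains nothing (5 cycles either way), while a long instruction stream approaches one completed instruction per cycle.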

  9. Domain-Specific Processors. Goal: overcome the drawbacks of the VN computer. Characteristics: optimized datapath for a given class of applications. Example: DSP. Applications are usually multiply-accumulate (MAC)-dominated: A ← A + (B * C). The datapath is optimized to execute one or many MACs in only one cycle. Instruction fetching and decoding overhead is removed. Memory access is limited by directly processing the input dataflow. Special support for efficient looping: a special loop or repeat instruction.

  10. MAC on a VN Machine • Fetch MUL • Decode MUL • Read operand • Multiply • Store result • Fetch ACC • Decode ACC • Read the stored result • Add with the accumulated value • Store

  11. Loops on a VN Machine • Instruction cycles (fetch, decode, …) for: • Updating loop counter • Testing loop counter • Jumping back to the top of the loop
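A rough cycle count makes the MAC and loop overheads concrete. The sketch below assumes a non-pipelined VN machine where every instruction takes five cycles, two instructions per MAC (multiply, then accumulate), three loop-control instructions per iteration, and an idealized DSP that issues MACs every cycle with zero loop overhead. All of these numbers are simplifying assumptions, not measurements.

```python
CYCLES_PER_INSTRUCTION = 5  # IF, D, R, EX, W without pipelining (assumption)


def vn_mac_loop_cycles(n_terms):
    """Cycles for n_terms multiply-accumulates on the assumed VN machine."""
    mac_instructions = 2    # MUL, then ACC
    loop_instructions = 3   # update counter, test counter, jump back
    per_iteration = (mac_instructions + loop_instructions) * CYCLES_PER_INSTRUCTION
    return n_terms * per_iteration


def dsp_mac_loop_cycles(n_terms, macs_per_cycle=1):
    """Cycles on the idealized DSP: hardware repeat, macs_per_cycle MACs per cycle."""
    return -(-n_terms // macs_per_cycle)  # ceiling division


n = 64
print(vn_mac_loop_cycles(n), dsp_mac_loop_cycles(n))  # 1600 64
```

Under these assumptions more than half of the VN machine's cycles per iteration go to loop control rather than arithmetic.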

  12. ASIP • Application-Specific Instruction Set Processors: • Can be classified as domain-specific • Xtensa from Tensilica • A processor core which lets the system designer: • select and size features for a given application • define new instructions.

  13. Application-Specific Processors. Optimize the complete circuit for a specific function. (DSPs, by contrast, have a VN architecture with a degree of application-specific features.) Example: ASIC (Application-Specific Integrated Circuit). Optimization is done by implementing the inherent parallel structure on a chip. The datapath is optimized for only one application. Instruction fetching and decoding overhead is removed. Memory access is limited by directly processing the input data flow. Parallel computation is exploited.

  14. Application-Specific Processors: ASIC. Example: if (a < b) then { d = a+b; c = a*b; } else { d = a+1; c = b-1; } • Implementation on a VN computer: at least 3 instructions, so run-time >= 3*t_instruction. • ASIC implementation: the complete execution is done in parallel in one clock cycle, so run-time = t_clock = delay of the longest path from input to output.
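One way to picture the ASIC version is as a dataflow in which the comparator, adders, multiplier, and subtractor all compute at once and multiplexers pick the outputs. The sketch below models that structure; Python still runs it line by line, so it illustrates the wiring, not the timing. The function and variable names are chosen here for illustration.

```python
def vn_style(a, b):
    """Sequential version: test first, then execute one branch."""
    if a < b:
        d = a + b
        c = a * b
    else:
        d = a + 1
        c = b - 1
    return d, c


def asic_style(a, b):
    """Dataflow version: every unit computes, multiplexers select the result."""
    less = a < b          # comparator
    sum_ab = a + b        # adder
    prod_ab = a * b       # multiplier
    a_plus_1 = a + 1      # incrementer
    b_minus_1 = b - 1     # decrementer
    d = sum_ab if less else a_plus_1     # 2-to-1 multiplexer driven by `less`
    c = prod_ab if less else b_minus_1   # 2-to-1 multiplexer driven by `less`
    return d, c


print(asic_style(2, 5), asic_style(5, 2))  # (7, 10) (6, 1)
```

In hardware the clock period is then set by the slowest of these paths, most likely the multiplier followed by its multiplexer.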

  15. ASIC as Accelerator • Accelerator design is difficult:

  16. ASIC as Accelerator • High manufacturing cost

  17. ASIC as Accelerator • Increasing design cost • Decreasing life cycle • → Decreasing number of wafer starts

  18. ASIC Starts vs. FPGA Starts

  19. More Life Cycle in FPGAs

  20. FPGA Replacing ASIC • rDPA: • Reconfigurable Data Path Array

  21. Conclusion. Von Neumann computer: general purpose, used for any kind of function; high degree of flexibility; speed problem; some restrictions on the program coding and execution scheme; programs have to adapt to the machine. DSPs: adapted for a class of applications; flexibility and efficiency only for a given class of applications. ASICs: tailored for one application; very efficient in speed and resources; hardware adapts to the application; cannot re-adapt to a new application; not flexible.

  22. Reconfigurable Computing. The ideal device should combine the flexibility of VN computers with the efficiency of ASICs. It should be able to optimally implement an application at a given time and re-adapt to allow the optimal implementation of a new application. We call such a device a reconfigurable device. Definition: reconfigurable computing can be defined as the study of computations involving reconfigurable devices. This includes architecture, algorithms, and applications.

  23. RCS Definitions • “Reconfigurable computing refers to any information processing system in which blocks of hardware can be reorganized or repurposed to adapt to changing dataflows or algorithms.” (Ron Wilson, EE Times) • “A reconfigurable computer is a device which computes by using post-fabrication spatial connections of computable elements.” (Andre DeHon) • “Reconfigurable devices contain an array of computational elements whose functionality is determined through multiple programmable configurations.” (Compton and Hauck) • “On-the-fly ASIC” (Kurdahi) • All are correct; the concept is better understood by example.

  24. Reconfigurable Computing • Configuration: • A device is configured when its functionality is set. • An ASIC is configured when the circuit is fabricated. • Gate arrays are configured when the routing is defined. • A microprocessor is configured when an instruction is read from memory. • → Microprocessors bind functionality at every cycle. • The amount of programmability: • ASIC: can perform exactly one task (not programmable) • GPP: can perform a wide variety of operations: • The ISA defines the number of operations that can be performed at a given time. • 64 bits • PLD: functionality is dictated by the bitstream: • 10-100 million bits (Virtex-II Pro XC2VP125: 43 million bits)

  25. Uncommitted Gate Array

  26. Committed Gate Array

  27. Reconfigurable Computing • Reconfigurability: • The ability to continually change the functionality of the device. • But microprocessors are not usually considered reconfigurable devices. • Reconfiguration: • The process of changing the structure of a reconfigurable device (at run-time or off-line).

  28. Flexibility vs Efficiency. [Figure: flexibility (vertical axis) versus performance (horizontal axis), locating Von Neumann (general-purpose computing), DSP (domain-specific computing), ASIC (application-specific computing), and reconfigurable systems (reconfigurable computing).]

  29. Cost-Volume Curve. [Figure: cost versus volume for PLD and ASIC, with the cross-over volume marked.] • Increasing manufacturing costs increase NRE costs. • → The crossover point shifts to the right with every new technology node. • $1 million for <150nm
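The crossover can be worked out with a linear cost model, total cost = NRE + unit cost × volume. The sketch below assumes that model; the dollar figures in the example are invented placeholders, not data from the slide.

```python
def total_cost(nre, unit_cost, volume):
    """Linear cost model: one-time NRE plus a fixed cost per unit."""
    return nre + unit_cost * volume


def crossover_volume(asic_nre, asic_unit, pld_nre, pld_unit):
    """Volume at which ASIC and PLD total costs are equal."""
    if pld_unit <= asic_unit:
        raise ValueError("no crossover: the PLD is not more expensive per unit")
    return (asic_nre - pld_nre) / (pld_unit - asic_unit)


# Hypothetical numbers, for illustration only.
v1 = crossover_volume(asic_nre=1_000_000, asic_unit=5, pld_nre=0, pld_unit=25)
v2 = crossover_volume(asic_nre=2_000_000, asic_unit=5, pld_nre=0, pld_unit=25)
print(v1, v2)  # 50000.0 100000.0
```

Doubling the ASIC NRE doubles the crossover volume in this model, which is the rightward shift the slide describes. Below the crossover the PLD is cheaper overall; above it the ASIC wins.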

  30. A Decade of Progress. [Figure: log-scale chart (1x to 1000x) of CLB capacity, speed, power per MHz, and price from '91 to '04 across the XC4000, Spartan, Spartan-2, Spartan-3, Virtex, Virtex-E, Virtex-II, Virtex-II Pro, and Virtex-4 families. Courtesy: Richard Sevcik, Xilinx.] • 200x more logic (plus memory, μP, etc.) • 40x faster • 50x lower power • 500x lower cost [Areiba08]

  31. Applications • Emulation: for debugging the circuit and making sure it operates correctly. • Speed is not very important (functionality testing). • Prototyping: building an initial prototype of the product. • Speed may be important. • Preproduction Use: used in the final product, but will be replaced by an ASIC in the future. • Production Use: used in the final product, with no plan to replace it. • Production volume is probably not very high. • Speed can be very important.

  32. Applications • The cases above apply only to circuits for which an ASIC would be capable of doing the job. • In some cases, however, the circuit must be flexible.

  33. Some Fields of Application: rapid prototyping; post-fabrication customization; multi-modal computing tasks; adaptive computing systems; fault tolerance; high-performance parallel computing.

  34. Rapid Prototyping. Verifying hardware in real conditions before fabrication. High NRE costs. Software simulation: relatively inexpensive; slow; accuracy? Hardware emulation: hardware testing under real operating conditions; fast; accurate; allows several iterations. Examples: APTIX System Explorer, ITALTEL FLEXBENCH.

  35. Post-Fabrication Customization. Time-to-market advantage: ship the first version of a product, then upgrade it remotely with new product versions. Remote repair. Example: Mars rover vehicle; some of its FPGAs can be modified from Earth. [Figure label: Manufacturer]

  36. Multi-Modal Computing Tasks. A group of devices can share a single unit (the CU) in a time-multiplexed fashion. Nobody needs to play mp3 songs while watching a video clip and taking a phone call at the same time. Whenever a service is needed, the CU is connected to the corresponding device at the correct location and reconfigured. E.g. a domestic mp3 player, a domestic DVD player, a car mp3 player, a car DVD player, and a mobile video player can all share the same electronic unit if they are always used by the same person.

  37. Multi-Modal Computing Tasks. One may remove the CU from the domestic devices and connect it to a car device when going to work. When going for a walk, one removes it and connects it to a mobile device. Back home, one uses it for watching video.

  38. Adaptive Computing Systems. Uncertainty and unpredictability of some systems → it is impossible, at compile time, to address all scenarios that can happen at run-time.

  39. Adaptive Computing Systems. Adaptive computing systems are computing systems that are able to adapt their behaviour and structure to changing operating and environmental conditions, time-varying optimization objectives, and physical constraints, e.g. changing protocols or new standards. Basic applications: dynamic adaptation to the environment; dynamic adaptation to threats; extended mission capabilities.

  40. High-Performance Parallel Computing. [Figure: an application task graph with nodes 1-7 mapped through a virtual topology onto a physical topology, comparing the traditional parallel implementation flow with a flow exploiting reconfigurable topology.]

  41. High-Performance Parallel Computing • Many reconfigurable machines have achieved 100x speedups, including per unit of silicon, compared to microprocessors.

  42. The Main Question. A reconfigurable device is a piece of hardware, and hardware can never change after fabrication. How, then, is a reconfigurable device made?

  43. Microprocessor-Based Systems (Temporal) • Generalized to perform many functions well. • Operates on fixed data sizes. • Inherently sequential. • Constrained even with multiple data paths. [Figure: data storage (register file) connected to an ALU; labels A, B, C, 64.]

  44. A Simple Reconfigurable Circuit • Functional unit optimized to perform a special task: if (A > B) { H = A; L = B; } else { H = B; L = A; } [Figure: functional unit with inputs A, B and outputs H, L.]

  45. Example: Bubblesort (Spatial) • Adapt interconnect to problem. • Take advantage of parallelism. [Figure: several functional units (inputs A, B; outputs H, L) wired together so that the lowest value and the highest value emerge at opposite ends.]
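A spatial sort built from the compare unit of slide 44 can be sketched as below. The exact wiring of the slide's figure is not recoverable, so this sketch assumes an odd-even transposition network, a parallel relative of bubblesort; within each stage the compare units touch disjoint pairs, so in hardware they could all fire at the same time.

```python
def compare_unit(a, b):
    """The functional unit from slide 44: returns (H, L), the higher and lower input."""
    if a > b:
        return a, b
    return b, a


def spatial_sort(values):
    """Sort ascending with an odd-even transposition network (assumed wiring)."""
    data = list(values)
    n = len(data)
    for stage in range(n):           # n stages are enough for n inputs
        start = stage % 2            # alternate between even and odd neighbour pairs
        for i in range(start, n - 1, 2):
            # Pairs within one stage are independent: one compare unit per pair.
            high, low = compare_unit(data[i], data[i + 1])
            data[i], data[i + 1] = low, high
    return data


print(spatial_sort([7, 3, 9, 1, 4]))  # [1, 3, 4, 7, 9]
```

The Python loop visits the pairs one after another, but the network depth is only the number of stages, which is what the hardware would exploit.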

  46. References • [Areiba08] Areiba, “Reconfigurable Computing Systems,” Lecture Slides. • [Hartenstein07] Hartenstein, “Basics of Reconfigurable Computing,” S. P. J. Henkel, Ed. New York: Springer-Verlag, 2007. • [Bobda07] Bobda, “Reconfigurable Computing Systems,” Lecture Slides.
