
Reiner Hartenstein University of Kaiserslautern

November 21, 2001, Tampere, Finland. Enabling Technologies for Reconfigurable Computing, part 1: Reconfigurable Computing (RC). Wednesday, November 21, 8.30 – 10.00 hrs.


Presentation Transcript


  1. November 21, 2001, Tampere, Finland. Enabling Technologies for Reconfigurable Computing, part 1: Reconfigurable Computing (RC). Wednesday, November 21, 8.30 – 10.00 hrs. Reiner Hartenstein, University of Kaiserslautern

  2. Schedule

  3. Reconfigurable: why? • Exploding design cost and shrinking product life cycles of ASICs create a demand for RA usage for product longevity. • Performance is only one part of the story. The time has come to fully exploit their flexibility to support turn-around times of minutes instead of months for real time in-system debugging, profiling, verification, tuning, field-maintenance, and field-upgrades. • A new “soft machine” paradigm and language framework is available for novel compilation techniques to cope with the new market structures transferring synthesis from vendor to customer.

  4. SOC Alternatives… not including C/C++ CAD Tools [Gordon Bell] • The blank sheet of paper: FPGA • Auto design of a basic system: Tensilica • Standardized, committee designed components*, cells, and custom IP • Standard components including more application specific processors*, IP add-ons and custom • One chip does it all: SMOP** *) Processors, Memory, Communication & Memory Links **) SMOP ??

  5. SoC Alternatives [Gordon Bell]

  6. A Decade of Research in Reconfigurable Computing • Due to the achievements of numerous Research Projects throughout the 1990s, the Breakthrough in Commercialization has started and a quite comprehensive Methodology is already available. • Dear Colleague, the RC Scene welcomes your contributions to improve it and to push for Inclusion in contemporary CS&E Curricula. • It is one of the Goals of this Talk to stimulate you with Highlights and by introducing some Key Issues.

  7. No longer a strange niche area • was “Hardware” design for a strange platform • CAD, but no Compilation • Emerging awareness: • New mind set • New curricular embedding • coming Dichotomy of CS • SW <-> CW • HW <-> FW • computing in time <-> computing in space

  8. [Figure: flexibility/universality vs. efficiency trade-off; labels: hardwired, application-specific, domain-specific, general purpose, FPGA, KressArray Xplorer]

  9. RAs are heading for Mainstream • CSoC, configurable SoC, is: • an industry standard µProcessor, • embedded reconfigurable array, • memory, dedicated system bus ... • ASPP, application-specific programmable product, is: • application-specific standard product and: • embedded programmable logic • ... become indispensable for SoC products? • Soap Chip: System on a programmable Chip [Figure: block diagram; labels: Flash / RAM memory banks; Logic; DRAM/Flash/SRAM; Microprocessor (ARM, MIPS, or ...); Programmable Logic; Analog Logic; Reconfigurable Accelerator Array]

  10. Reconfigurable Logic going Mainstream • Fine grain: FPGAs killing the ASIC market • Fastest growing segment of semiconductor market • Please, Lobby for New Curricula. • Substantially improved design flow and libraries • Coarse grain: several startups • Comprehensive Methodology • One of the goals of this talk: to motivate You by Key Issues and Visionary Highlights.

  11. Designer-oriented Innovation stalled? • EDA industry: about 7 billion $ • leverages > 200 billion $ semiconductor industry • FPGAs (7 billion $) fastest growing segment • EDA industry constantly redefining itself • “except logic synthesis, no really significant innovation in the past decade” • CAD developers can’t deliver their ideas effectively • CAD developers personally don’t appreciate the real problems facing designers

  12. EDA the main bottleneck

  13. Biggest Mistake of EDA: guess it!

  14. >> History • History • Paradigm Shift • Coarse Grain: why ? • Coarse Grain Architectures • Reconfiguration Architecture http://www.uni-kl.de

  15. Logic Gate Price Trend [Chart: price per logic element, normalized to Q1/1993, plotted from Q1 '93 to Q1 '00; 40% lower per year; labeled data points: 0.261, 0.086, 0.042, 0.029. Source: Altera]
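The "40% lower per year" figure on the price-trend slide compounds quickly. The following is a minimal sketch of that arithmetic, assuming a constant 40% decline applied once per year from Q1 '93; the function name and the year-by-year model are illustrative assumptions, not data taken from the Altera chart.

```python
def normalized_price(years_elapsed: int, annual_decline: float = 0.40) -> float:
    """Price relative to the starting quarter after compounding a fixed yearly decline.

    Assumption: the decline is constant and applied once per year.
    """
    return (1.0 - annual_decline) ** years_elapsed


# Print the modeled trend for Q1 '93 through Q1 '00 (seven yearly steps).
for year in range(8):
    label = f"Q1 '{(93 + year) % 100:02d}"
    print(f"{label}: {normalized_price(year):.3f}")
```

Under this assumption, seven years of compounding leave roughly 0.028 of the original price, which is in the same range as the smallest value labeled on the chart (0.029).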

  16. The History of Paradigm Shifts: Makimoto’s Wave (published in 1989) • “Mainstream Silicon Application is switching every 10 Years” • “The Programmable System-on-a-Chip is the next wave” • What’s coming next? [Figure: wave alternating between standard and custom, 1957 to 2007; labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable; 1st Design Crisis; 2nd Design Crisis]

  17. Makimoto’s 3rd Wave • Fine Grain Subsystems (FPGAs): • 1st half of 3rd wave • universal (but less efficient) • Coarse Grain Subsystems: • 2nd half of 3rd wave • domain-specific • much more flexible than 2nd half of 2nd wave

  18. How’s next Wave? [Figure: Makimoto’s wave (standard vs. custom, 1957 to 2007) overlaid with Hartenstein’s Curve and Tredennick’s Paradigm Shifts; labels: hardwired; procedural programming; structural programming; algorithm: fixed / variable; resources: fixed / variable; FPGAs; Coarse grain RAs?; 4th wave?; no further wave!]

  19. The Impact of Makimoto’s Paradigm Shifts (Dr. Makimoto: FPL 2000 keynote) • Personalization (CAD) before fabrication • Software Industry’s Secret of Success: procedural personalization via RAM-based Machine Paradigm • structural personalization: RAM-based, before run time • Repeat Success Story by new Machine Paradigm! [Figure: Makimoto’s wave, standard vs. custom, 1957 to 2007; labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable]

  20. >> Paradigm Shift • History • Paradigm Shift • Coarse Grain: why ? • Coarse Grain Architectures • Reconfiguration Architecture http://www.uni-kl.de

  21. Sequential vs. structural RAM [Figure: “von Neumann” side: Software (procedural), downloading, sequential RAM, instruction sequencer, data path, I/O; reconfigurable side: Logic Synthesis, Route and Place, downloading, structural RAM, reconf. accelerator(s), FPGA]

  22. Changing Models of Computing [Figure: contemporary “von Neumann” model: host (instruction sequencer, data path, RAM, I/O) with hardwired accelerator(s); Software (procedural) is downloaded, Hardware comes from CAD. Reconfigurable computing model: host with reconf. accelerator(s); Software (procedural) and Flexware (structural) are both downloaded into RAM. Annotations: “the tail wagging the dog”; “Configware occupies most silicon”]

  23. The Microprocessor is a Methuselah: 9 technology generations ... • 1st 4004 • 2nd 8008 • 3rd 8086 • 4th 80286 • 5th 80386 • 6th 80486 • 7th P5 (Pentium) • 8th P6 (Pentium Pro / Pentium II) • 9th Pentium III ... the steam engine of the silicon age

  24. Decline of Wintel Business Model [Chart, 1997 to 2002 …; axes and series: US market in billion US-$; million devices delivered in the U.S. (Consumer PC, Information Appliances); Consumer PC av. resale ($), scale 1000 $ to 1500 $; billion subscribers worldwide (cellular & PCS), scale 0.5 to 1 billion. Sources as labeled on the chart: Forrester, IDC]

  25. Basics of Binding Time • time of “Instruction Fetch”: • run time: microprocessor • loading time: parallel computer • compile time: Reconfigurable Computing

  26. Binding Time vs. Computing Domain • Binding time (set-up of communication channels): • at run time: microprocessor, array processor • at loading time: parallel computer • at compile time: Reconfigurable Computing • at a later fabrication step: ASICs • before fabrication: full custom ICs, systolic arrays • Programming domain: time domain (procedural), time & space (hybrid), space domain (structural) • The KressArray is a generalization of the systolic array

  27. Dataquest Predicts Programmability to be Predominant in SOC • Application-specific programmable products (ASPPs) will be the next best thing in semiconductor technology • With programmability as a standard feature, ASPPs will be predominant system-on-a-chip products in five years • Jordan Selburn, principal analyst, ASICs and system-level integration, Dataquest Inc.’s Semiconductors Group; Dataquest Semiconductors ‘98 conference; EETimes 10/21/98

  28. Applications • next generations’ wireless* • network processors* • many other areas* *) keynotes and papers at FPL 2000, The 10th International Conference on Field-programmable Logic and Applications: The Roadmap to Reconfigurable Systems, Villach, Austria, August 27 - 30, 2000, http://www.fpl.uni-kl.de/FPL/

  29. Applications (2) • Image Processing: • for smart car (collision avoidance, others ...), • smart traffic pilots, robotics, fast material inspection, • smart stub finders, motion detection (MPEG-4, ...) • Signal Processing, Speech Processing, Software Radio, • Correlation, Encryption, Comm. Switching / Protocols, • Innovative consumer electronics: • super smart cards, smart mobile phones, wearable, • portable, set-top, laptop, desktop, embedded, ... • many others, ...

  30. Applications • new cellular standard: up to 2 Mbit/sec; new CDMA standard: > 500 MIPS needed just for RF receiver part • wide variety of end-users’ devices: smart mobile phones, palm pilots, laptops, games, camcorder-likes, ... the internet car, many new types of devices to come ... • increasingly wide variety of services available from network provider: download just what a particular customer is subscribed to • expert group [Vissers]: > 20% of it will be accelerator code*

  31. Why coarse grain? [Chart, 1960 to 2010, logarithmic scale; curves: Algorithmic Complexity (Shannon’s Law) with wireless generations 1G to 4G; Transistors/chip; memory; microprocessor / DSP; normalized processor speed; computational efficiency; battery performance; mA/MIP; labeled parts: StrongARM, SH7752. Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]

  32. Shannon’s Law • In a number of application areas, throughput requirements are growing faster than Moore's law • Fundamental flaws in software processor solutions • 32 soft ARM cores fit onto a contemporary FPGA • Stream-based distributed processing is the way to go

  33. It’s a Paradigm Shift! • Using FPGAs (fine grain reconfigurable) is mainly just classical Logic Synthesis on a “strange hardware” platform • Coarse Grain Reconfigurable Arrays (Reconfigurable Computing), however, mean a really fundamental Paradigm Shift • This is still ignored by CS and EE Curricula and almost all R&D scenes

  34. >> Coarse Grain: why ? • History • Paradigm Shift • Coarse Grain: why ? • Coarse Grain Architectures • Reconfiguration Architecture http://www.uni-kl.de

  35. It’s a General Paradigm Shift! • Using FPGAs (fine grain reconfigurable): just Logic Synthesis on a strange platform • Coarse Grain Reconfigurable Arrays (Reconfigurable Computing): a fundamental Paradigm Shift • Replacing Concurrent Processes by much more efficient parallelism: Stream-based Computing Arrays: systolic array* [1980], KressArray** [1995], chip-on-a-day* [2000] • ignored by Curricula & most R&D scenes *) hardwired **) reconfigurable

  36. Fine-grained vs. coarse-grained • Fine-grained reconfiguration versus coarse-grained reconfiguration. • fine grain is general purpose • slow and area-inefficient, but high parallelism • coarse grain is application domain-specific • coarse grain is highly area-efficient • extremely high performance

  37. Reconfigurability Overhead [Figure: array of blocks marked L and S; labels: area used by application; resources needed for reconfigurability, partly for configuration code storage; “hidden RAM” not shown]

  38. Principle of a Typical FPGA [Figure; label: FF of hidden RAM]

  39. Routing Overhead in FPGAs • > 1000 transistors at each cross bar • > 40 transistors at each switching point • > 15 transistors at each tap • FF: part of the hidden RAM • Routing Congestion [DeHon]: often 50% or less of CLBs used • most FPGA vendors’ gate count: 1 flipflop of configuration RAM = 4 gates

  40. Why Coarse Grain instead of FPGA? • reduced reconfigurability overhead by up to ~ 1000 • much faster loading • drastically smaller configuration memory • a lot more benefits [Chart: transistors / chip, 1980 to 2010, logarithmic scale; curves: memory, microprocessor, FPGA physical, FPGA logical, FPGA routed, supersystolic; gaps marked ~ 10 and ~ 10 000. Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]

  41. • avoiding address computation overhead • avoiding instruction fetch and interpretation overhead • high parallelism, massively multiple deep pipelines • much less configuration memory • no routing areas to configure functions from CLBs >>> extremely high efficiency
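The efficiency argument on this slide (no per-item instruction fetch, deep pipelines fed by data streams) can be illustrated with a toy software model. The sketch below is an assumption-laden illustration, not KressArray or any vendor's architecture: `run_pipeline` and the three example stages are hypothetical names. The stage list is fixed once ("configuration"), and afterwards only data moves, one register step per clock tick.

```python
from typing import Callable, Iterable, List, Optional

Stage = Callable[[int], int]


def run_pipeline(stages: List[Stage], stream: Iterable[int]) -> List[int]:
    """Clock a linear pipeline: on every tick each stage register hands its value on."""
    regs: List[Optional[int]] = [None] * len(stages)  # one output register per stage
    out: List[int] = []
    # Extra empty ticks at the end drain the values still inside the pipe.
    inputs: List[Optional[int]] = list(stream) + [None] * len(stages)
    for x in inputs:
        if regs[-1] is not None:
            out.append(regs[-1])  # the last register leaves the pipeline
        # Update from back to front so each stage consumes last tick's value.
        for i in range(len(stages) - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = stages[i](prev) if prev is not None else None
        regs[0] = stages[0](x) if x is not None else None
    return out


# "Configuration": the datapath is set up once, before any data arrives.
configured = [lambda v: v * 3, lambda v: v + 1, lambda v: v >> 1]
print(run_pipeline(configured, [1, 2, 3]))
```

Once the pipe is full, one result leaves per tick while all stages work concurrently on different stream items; nothing in the loop decodes an instruction per data item.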

  42. Configurable Computing Systems • combine a programmable sequential processor with Flexware (structurally programmable “hard”ware): • capitalize on the strengths of both, flexware and software. • early 1960s: Estrin (UCLA): enabling technology not available • 1990s: significant increase of research activities (DARPA ...) • FPGAs: not the enabling technology: hardware skills needed • Verilog or VHDL based systems often result in poor performance

  43. Platforms available • Soft Data Path Arrays • KressArray • Xtreme (PACT) • ACM (Quicksilver Tech) • CHESS Array (Elixent) • others • Compilation techniques feasibility studies: • Partitioning Co-Compiler • Design Space Explorer • others

  44. Also as an autonomous Machine • New Machine Paradigm (Xputer) • is the counterpart of the so-called von Neumann paradigm • CONS: confuses customers (paradigm switch: the brain hurts) • PROS: strong guidance of EDA tool development • more effective hardware/software APIs • compilation techniques similar to traditional compilation • better Application Development Tools accepting C or Java • easy to teach: simple machine principles • scan patterns (data counter) similar to control flow (program counter) • general model of hardware / software co-design • fascination for freak effect: opening up a new R&D discipline
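The analogy "scan patterns (data counter) similar to control flow (program counter)" can be made concrete with a small sketch. The slides do not describe the Xputer's address generator, so everything below is a hypothetical illustration: `video_scan` plays the role of a data counter that emits a sequence of data addresses, and `accumulate` stands in for a fixed datapath that consumes whatever the scan delivers.

```python
from typing import Iterator, List, Tuple


def video_scan(width: int, height: int,
               step_x: int = 1, step_y: int = 1) -> Iterator[Tuple[int, int]]:
    """Hypothetical data counter: emits (x, y) data addresses row by row."""
    for y in range(0, height, step_y):
        for x in range(0, width, step_x):
            yield (x, y)


def accumulate(memory: List[List[int]], scan: Iterator[Tuple[int, int]]) -> int:
    """Fixed datapath (here: a running sum) driven purely by the scan pattern."""
    total = 0
    for x, y in scan:
        total += memory[y][x]
    return total


memory = [[1, 2, 3],
          [4, 5, 6]]
print(accumulate(memory, video_scan(3, 2)))            # visits every element
print(accumulate(memory, video_scan(3, 2, step_x=2)))  # visits every other column
```

Changing the scan parameters changes which data the same datapath sees, much as changing a program counter's sequence changes which instructions a processor executes.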

  45. >> Coarse Grain Architectures • History • Paradigm Shift • Coarse Grain: why ? • Coarse Grain Architectures • Reconfiguration Architecture http://www.uni-kl.de

  46. Some Players in Silicon Valley and …. (Company: Architecture; Business Model; Markets) • Adaptive Silicon: Not disclosed; Sell Cores; Embedded DSP • Chameleon Systems: 32 bit datapath array; Sell Chips; Networking • Malleable: Not disclosed; Sell Chips; Voice over IP • MorphICs: Not disclosed; Sell Cores; Wireless Commun. • Silicon Spice: Not disclosed; Sell Solutions; Networking • Systolix: Bit Serial Systolic Array; Sell Cores; Signal Conditioning • Triscend: System on Chip; Sell Chips; Embedded Systems • Network Processors: > 20 Players • Cisco: Xilinx’s largest Customer

  47. Commercial rDPAs • XPU family (IP cores): PACT Corp., Munich • CALISTO: Silicon Spice** • CS2000 family: Chameleon Systems • MECA family: Malleable** • flexible array: MorphICs • ACM: Quicksilver Tech* • CHESS array: Elixent • MorphoSys: Morpho Tech* • FIPSOC: SIDSA **) bought *) here at SoC [Figure label: XPU128]

  48. PACT Corp • Xtreme Processor Platform (XPP) family of IP cores: high-speed, data-stream-capable, scalable, reconfigurable clusters of arrays of 32-bit DPUs with embedded memories and high-speed I/O ports • Application development support software featuring a flow graph-style algorithm mapping language to minimize training requirements • XPP fabrics feature automatic DataFlow synchronization and a flagged Event Network to dynamically configure the execution flow • Supports dynamic RTR: hierarchical configuration managers free the designer from chip-level details and ensure that configurations are independently loaded in exactly the intended order • Automatic event-based task swapping along with data streams: released resources are automatically reconfigured immediately

  49. rDPA (Reconfigurable Datapath Array) [Figure: array of rDPUs with Reconfigurable Interconnect Fabric (RIF); labels: separate routing area; RIF laid out over rDPUs: rDPA wired by abutment]

  50. Generically defined Fabrics: KressArray Family • Some Application Areas, e.g. Wireless Communication, need extraordinarily powerful Communication Resources
