reconfigurable computing and the von neumann syndrome n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Reconfigurable Computing and the von Neumann Syndrome PowerPoint Presentation
Download Presentation
Reconfigurable Computing and the von Neumann Syndrome

Loading in 2 Seconds...

play fullscreen
1 / 172

Reconfigurable Computing and the von Neumann Syndrome - PowerPoint PPT Presentation


  • 173 Views
  • Uploaded on

Reconfigurable Computing and the von Neumann Syndrome. Reiner Hartenstein. Questions ?. familiar with FPGAs ? Programming easy? Who is familiar with systolic arrays ? Duality: data streams vs. instruction streams ? Programming a multicore microprocessor: will it be easy ?.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

Reconfigurable Computing and the von Neumann Syndrome


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
  1. Reconfigurable Computing and the von Neumann Syndrome Reiner Hartenstein

  2. Questions ? • familiar with FPGAs ? Programming easy? • Who is familiar with systolic arrays ? • Duality: data streams vs. instruction streams ? • Programming a multicore microprocessor: will it be easy ? 2

  3. pervas 3

  4. Outline • The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • We need a dual paradigm approach • Conclusions 4

  5. FPGAs found everywhere 5

  6. Pervasiveness of RC http://hartenstein.de/pervasiveness.html http://www.fpl.uni-kl.de/ RCeducation08/pervasiveness.html 6

  7. RCeducation 2008 The 3rd International Workshop on Reconfigurable Computing Education April 10, 2008, Montpellier, France http://www.fpl.uni-kl.de/RCeducation08/ 7

  8. Outline • The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • We need a dual paradigm approach • Conclusions the hardware / software chasm, the configware / software chasm the instruction stream tunnel the overhead-prone paradigm 8

  9. Outline • The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • We need a dual paradigm approach • Conclusions instruction-stream vs. data stream bridging the chasm: an old hat stubborn curriculum task forces 9

  10. Outline • The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • We need a dual paradigm approach • Conclusions 10

  11. Outline paradox 11

  12. RC education http://www.fpl.uni-kl.de/RCeducation/ http://www.fpl.uni-kl.de/ RCeducation08/pervasiveness.html 12

  13. Outline • The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • We need a dual paradigm approach • Conclusions platform FPGAs, coarse-grained arrays saving energy 13

  14. connect box reconfigurable interconnect fabrics switch box reconfigurable logic box FPGA with island architecture FPGA with island architecture FPGA with island architecture 14

  15. density: overhead: FPGA physical wiring overhead >> 10 000 FPGA logical FPGA routed Deficiencies of reconfigurable fabrics (FPGA)(fine-grained) transistors / microchip 109 reconfigurability overhead> (Gordon Moore curve) 106 routing congestion (microprocessor) immense area inefficiency 103 deficiency factor: >10,000 1st DeHon‘s Law [1996: Ph. D thesis, MIT] general purpose “simple” FPGA power guzzler 100 slow clock 1980 1990 2000 2010 15

  16. Reed-Solomon Decoding pattern recognition 730 2400 MAC SPIHT wavelet-based image compression 288 Smith-Waterman pattern matching real-time face detection 457 1000 6000 Viterbi Decoding 400 100 FFT crypto oil and gas video-rate stereo vision 17 1000 900 GRAPE 20 Astrophysics 52 BLAST 88 molecular dynamics simulation 1980 1990 2000 2010 protein identification 40 X 2/yr Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005] 106 Image processing, Pattern matching, Multimedia speedup factor DSP and wireless 103 Bioinformatics 100 16

  17. 17

  18. The RC paradox Reed-Solomon Decoding pattern recognition 730 2400 MAC SPIHT wavelet-based image compression 288 PISA Smith-Waterman pattern matching real-time face detection 457 1000 6000 Viterbi Decoding 400 100 FFT 3000 crypto oil and gas video-rate stereo vision 17 1000 deficiency factor: >10,000 speed-up factor:6,000 total discrepancy: >60,000,000 900 GRAPE 20 Astrophysics 52 BLAST 88 molecular dynamics simulation 1980 1990 2000 2010 protein identification 40 X 2/yr Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005] 106 Image processing, Pattern matching, Multimedia speedup factor DSP and wireless 103 Bioinformatics 100 18

  19. The RC paradox Reed-Solomon Decoding pattern recognition 730 2400 MAC SPIHT wavelet-based image compression 288 Smith-Waterman pattern matching real-time face detection 457 1000 6000 Viterbi Decoding 400 100 FFT 3000 crypto oil and gas video-rate stereo vision 17 1000 deficiency factor: >10,000 speed-up factor:6,000 total discrepancy: >60,000,000 900 GRAPE 20 Astrophysics 52 BLAST 88 molecular dynamics simulation 1980 1990 2000 2010 protein identification 40 X 2/yr Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005] 106 Image processing, Pattern matching, Multimedia speedup factor DSP and wireless 103 Bioinformatics 100 19

  20. The RC paradox Reed-Solomon Decoding pattern recognition 730 2400 MAC SPIHT wavelet-based image compression 288 PISA Smith-Waterman pattern matching real-time face detection 457 1000 6000 Viterbi Decoding 400 100 FFT crypto oil and gas video-rate stereo vision 17 1000 deficiency factor: >10,000 speed-up factor:6,000 total discrepancy: >60,000,000 900 GRAPE 20 Astrophysics 52 BLAST 88 molecular dynamics simulation 1980 1990 2000 2010 protein identification 40 X 2/yr Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005] 106 Image processing, Pattern matching, Multimedia speedup factor DSP and wireless 103 Bioinformatics 100 20

  21. Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005] These examples worked fine with on-chip memory There are other algorithms more difficult to accelerate … … where d-daching might be useful (ASM) 21

  22. Outline platform-FPGA 22

  23. How much on-chip embedded BRAM ? 256 – 1704 BGA 8 – 32 DPU: coarse-grained 56 – 424 fast on-chip block RAMs: BRAMs On-chip LatticeCS series 23

  24. coarse 24

  25. rDPU Coarse-grained Reconfigurable Array note: software perspective without instruction streams: pipelining question after the talk: „but you can‘t implement decisions!“ SNN filter on (supersystolic) KressArray (mainly a pipe network) rout thru only no CPU reconfigurable Data Path Unit, 32 bits wide array size: 10 x 16 rDPUs compiled by Nageldinger‘s KressArray Xplorer with Juergen Becker‘s CoDe-X inside not used backbus connect 25

  26. Simple KressArray Configuration Example 26

  27. CPU program counter DPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPU rDPA logical rDPA physical rDPU rDPU rDPU rDPU Much less deficiencies by coarse-grained transistors / microchip DPU 109 rDPU (Gordon Moore curve) 106 area efficiency very close to Moore‘s law 103 Hartenstein‘s Law[1996: ISIS, Austin, TX] very compact configuration code: very fast reconfiguration 100 1980 1990 2000 2010 27

  28. energy 28

  29. oil and gas 17 1980 1990 2000 2010 X 2/yr Software-to-Configware (FPGA) Migration: Oil and gas [2005] 106 side effect: slashing the electricity bill by more than an order of magnitude speedup factor 103 100 29

  30. What about higher speed-up factors ? More dramatic electricity savings? An accidentially discovered side effect Herb Riley, R. Associates $70in 2010? Saves > $10,000 in electricity bills per year (7¢ / kWh) - .... per 64-processor 19" rack • Software to FPGA migration of an oil and gas application: • Speed-up factor of 17 • Electricity bill down to <10% • Hardware cost down to <10% • All other publications reporting speed-up did not report energy consumption. - This will change. 30

  31. What’s Really Going On With Oil Prices? [BusinessWeek, January 29, 2007] $52 Price of delivery in February 2007 [New York Mercantile Exchange: Jan. 17] $200 Minimum oil price in 2010, in a bet by investment banker Matthew Simmons 31

  32. Energy as a strategic issue • Google‘s annual electricity bill: 50,000,000 $ • Amsterdam‘s electricity: 25% into server farms • NY city server farms: 1/4 km2 building floor area • Predicted f. USA in 2020: 30-50% of the entire national electricity consumption goes into cyber infrastructure [Mark P. Mills] • petaFlop supercomputer (by 2012 ?): extreme power consumption 32

  33. Energy: an im portant motivation *) feasible also on reconfigurable platforms 33

  34. Moore gap 34

  35. Outline • The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • We need a dual paradigm approach • Conclusions & the multicore crisis 35

  36. Moore’s law not applicable to all aspects of VLSI the law of Gates What is the reason of the paradox ? The Gordon Moore curve does not indicate performance The peak clock frequency does not indicate performance 36

  37. 200 DEC alpha [BWRC, UC Berkeley, 2004] 175 150 memory wall, caches, ... 125 100 SPECfp2000/MHz/Billion Transistors CPU 75 IBM 50 SUN 25 HP 0 1990 1995 2000 2005 stolen from Bob Colwell Rapid Decline of Computational Density primary design goal: avoiding a paradigm shift dramatic demo of the von Neumann Syndrome alpha: down by 100 in 6 yrs IBM: down by 20 in 6 yrs 37

  38. Crossbar weight: 220 t, 3000 km of thick cable, Monstrous Steam Engines of Computing power measured in tens of megawatts, floor space measured in tens of thousands of square feet 5120 Processors, 5000 pins each larger than a battleship ready 2003 38

  39. ACRI Alliant American Supercomputer Ametek Applied Dynamics Astronautics BBN CDC Convex Cray Computer Cray Research Culler-Harris Culler Scientific Cydrome Dana/Ardent/ Stellar/Stardent DAPP Denelcor Elexsi ETA Systems Evans and Sutherland Computer Floating Point Systems Galaxy YH-1 Goodyear Aerospace MPP Gould NPL Guiltech ICL Intel Scientific Computers International Parallel . Machines Kendall Square Research Key Computer Laboratories Dead Supercomputer Society Research 1985 – 1995[Gordon Bell, keynote ISCA 2000] • MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines • Saxpy • Scientific Computer • Systems (SCS) • Soviet Supercomputers • Supertek • Supercomputer Systems • Suprenum • Vitesse Electronics 39

  40. supercomputing crisis microprocessor crisis MPP parallelism does not scale going multi core We are in a Computing Crisis *) feasible also with rDPA 40

  41. Syndrome 41

  42. [Burks, Goldstein, von Neumann; 1946] • RAM (memory cells have adresses ….) The von Neumann Paradigm Trap CS education got stuck in this paradigm trap which stems from technology of the 1940s • Program counter (auto-increment, jump, goto, branch) • Datapath Unit with ALU etc., • I/O unit, …. CS education’s right eye is blind, and its left eye suffers from tunnel view We need a dual paradigm approach 42

  43. The Law of More: drastically declining programmer productivity What is the reason of the paradox ? the von Neumann Syndrome Result from decades of tunnel view in CS R&D and education basic mind set completely wrong “CPU: most flexible platform” ? >1000 CPUs running in parallel: the most inflexible platform However, FPGA & rDPA are very flexible 43

  44. multicore 44

  45. Understanding the Paradox ? Executive Summary doesn‘t help We must first understand the nature of the paradigm von Neumann chickens ? 45

  46. 46

  47. models 47

  48. RAM memory DPU DPU CPU program counter Von Neumann CPU (tunnel view with the left eye) Program Source: Software - World of Software -Engineering 48

  49. software instruction-stream-based data-stream-based RAM memory von Neumann bottleneck accelerator DPU hardware CPU co-processors program counter CPU von Neumann is not the common model microprocessor age: mainframe age: von Neumann instruction-stream-based machine 49

  50. software instruction-stream-based data-stream-based RAM memory von Neumann bottleneck accelerator DPU hardware CPU co-processors program counter CPU CPU reconfigurable hardwired accelerator accelerator Here is the contemporary common model microprocessor age: mainframe age: Now we are in the configware age: von Neumann instruction-stream-based machine 50