
Reconfigurable/FPGA computing, part 1

Reconfigurable / FPGA HPC computing in 2014

rinnocente

Presentation Transcript


  1. Reconfigurable Computing. Roberto Innocente, inno@sissa.it. Part 1 of 2. May 10, 2014.

  2. Flexibility, from least (-) to most (+) flexible. ASIC (Application Specific Integrated Circuit): very inflexible, designed to solve just one problem; energy, space and time efficient. GPP (General Purpose Processor): very flexible, can solve any problem; energy, space and time inefficient. Reconfigurable hardware (?): flexible, yet sufficiently energy, time and space efficient.

  3. History

  4. Gerald Estrin/1. Gerald Estrin is credited with having proposed, in the '60s, the first reconfigurable computer: the Fixed plus Variable (F+V) structure computer. Reference: Gerald Estrin. Organization of computer systems: the fixed plus variable structure computer. ACM, 1960.

  5. Gerald Estrin/2. He envisioned that important gains in performance could be achieved when many computations are executed on appropriate problem-oriented configurations.
     F+V is made of:
     - a high-speed general-purpose computer (the F part), initially an IBM 7090
     - high-speed special structures of various sizes (the V part), problem specific: trigonometric functions, logarithms, exponentials, n-th powers, complex arithmetic, …
     The V part is built on a motherboard with 36 module positions, which can undergo:
     - function reconfiguration: physically changing some modules
     - routing reconfiguration: changing part of the back wiring
     The Rammig machine (1977): investigation of a reconfigurable machine that needs no manual or mechanical intervention.

  6. Today's reconfigurable hardware was born out of the wish to replace assortments of discrete logic ICs (Integrated Circuits), and later to rapidly prototype large ASICs (Application Specific ICs) or implement SoCs (Systems on Chip). The following slides assume readers who work in scientific computing and are not electrical engineers.

  7. Basic digital circuits: AND, inverter, OR, MUX, D-type flip-flop (FF), shift register. Usually 0 = 0 V and 1 = some positive voltage.

  8. SSI 74xx IC

  9. PLD. Inconveniences of standard discrete logic circuits:
     - 14-pin packages, each holding 4 or 6 logic functions
     - often you had to traverse the PCB to find a free OR gate or inverter
     - even if you needed only a few gates, you still had to place an IC containing 4 or 6
     Hence the idea of the PLD (Programmable Logic Device):
     - SPLD (Simple PLD: PAL/PLA)
     - CPLD (Complex PLD)
     In these devices a simple interconnection network can be configured by blowing internal fuses (fuse technology) to implement combinational logic.

  10. Disjunctive normal form (aka sum of products). Every boolean function of some boolean variables can be represented as a sum of minterms (a minterm is a product in which every variable appears once, either plain or complemented). With 3 boolean variables $a, b, c$ there are $2^3 = 8$ minterms. Example: $f(a,b,c) = a\bar{b}c + \bar{a}b\bar{c}$, where $a\bar{b}c$ and $\bar{a}b\bar{c}$ are 2 of the 8 minterms.
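
The sum-of-minterms idea can be sketched in a few lines of Python. This is only an illustration: the helper name `minterms` is made up here, a complemented variable is written with a trailing apostrophe, and the example function assumes the slide's formula reads as a b' c + a' b c'.

```python
from itertools import product

def minterms(f, names):
    """Return, as strings, the minterms on which boolean function f is true."""
    terms = []
    for values in product([0, 1], repeat=len(names)):
        if f(*values):
            # A variable appears plain when it is 1 and complemented (') when it is 0.
            terms.append("".join(n if v else n + "'" for n, v in zip(names, values)))
    return terms

# Assumed reading of the slide's example: f(a,b,c) = a b' c + a' b c'
def f(a, b, c):
    return bool((a and not b and c) or (not a and b and not c))

all_rows = list(product([0, 1], repeat=3))   # the 2**3 = 8 input combinations
sop = minterms(f, "abc")
print(" + ".join(sop))
```

Each row of the truth table corresponds to exactly one minterm, so any function is the OR of the minterms of the rows where it is 1.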

  11. PLA (Programmable Logic Array). $f_1 = p_1 + p_2 + p_3 = x_1 x_2 + x_1 \bar{x}_3 + \bar{x}_1 \bar{x}_2 x_3 + x_1 x_3$
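
A PLA can be modelled as a programmable AND plane feeding a programmable OR plane. The sketch below is a simplified illustration, not the slide's circuit: it assumes that p1, p2, p3 are the first three products in the transcript and that the product x1 x3 is a fourth AND-plane row (called `p4` here) that is not connected to f1. All names (`AND_PLANE`, `OR_PLANE`, `pla`) are hypothetical.

```python
# AND plane: each product term lists the literals it needs,
# as {variable index: required value} (1 = plain, 0 = complemented).
AND_PLANE = {
    "p1": {1: 1, 2: 1},         # x1 x2
    "p2": {1: 1, 3: 0},         # x1 x3'
    "p3": {1: 0, 2: 0, 3: 1},   # x1' x2' x3
    "p4": {1: 1, 3: 1},         # x1 x3 (assumed to be unused by f1)
}

# OR plane: which product lines are connected to each output.
OR_PLANE = {"f1": ["p1", "p2", "p3"]}

def pla(x1, x2, x3):
    """Evaluate every output of the PLA for one input combination."""
    x = {1: x1, 2: x2, 3: x3}
    products = {
        name: all(x[i] == v for i, v in literals.items())
        for name, literals in AND_PLANE.items()
    }
    return {out: int(any(products[p] for p in terms)) for out, terms in OR_PLANE.items()}

print(pla(1, 1, 0))
```

Reprogramming the device amounts to editing the two dictionaries, which play the role of the fuse maps.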

  12. FPGA. CPLDs also showed their limits, so between 1985 and 1990 Xilinx introduced a more flexible design, the FPGA (Field Programmable Gate Array), in which the interconnection network is much more flexible and onto which sequential circuits can also be mapped easily.

  13. FPGA idea. 1985, Xilinx, Ross Freeman (inventor of the FPGA): “What if we could develop the equivalent of a circuit board full of standard logic parts (like TTL and PAL devices) on a single high density programmable logic chip?” - post-fabrication programmability by end users - fabless semiconductor company

  14. Today

  15. FPGA market. Dominated by 2 players: Altera and Xilinx. Their combined share has grown from 67% in 2010 to 90% of the market today (4.5 billion USD in revenues in 2012). Chart source: sourcetech411 (2010).

  16. An important question: are FPGAs green? Typical power consumption:
     - FPGA: Virtex-7 2000T (one of the top FPGAs): ~20 W. Xilinx showed 3600 copies of its 8-bit processor nanoblaze running on a Virtex-7, consuming 20 W.
     - CPU: ~100 W
       - Core i7-4770K Haswell (22 nm), 3.5 GHz, 4 cores: 84 W
       - Core i7-3930K Sandy Bridge-E (32 nm), 3.2 GHz, 6 cores: 130 W
       - Xeon E7458 Dunnington (45 nm), 2.4 GHz: 90 W
       - Xeon E7460 Dunnington (45 nm), 2.66 GHz: 130 W
     - GPU: ~220 W
       - Nvidia Tesla M2090: 225 W
       - Nvidia Tesla K20X: 235 W
     This is only a partial answer: we need to be able to estimate FPGA performance to give a more useful index.

  17. FPGA architecture. Sea of gates: logic blocks are like islands in a sea of interconnections. (Figure from RF and Wireless World.)

  18. Virtex family
     - 1998 Virtex: 250 nm, 100 MHz, 25k-60k cells
     - 2000 Virtex-E: 180 nm, 300 MHz, 1k-70k cells
     - 2000 Virtex-II: 150 nm, up to 168 multipliers, 420 MHz, up to 93k 4-LUTs
     - 2005 Virtex-4: 90 nm, 500 MHz, up to 200k cells
     - 2007 Virtex-5: 65 nm, 550 MHz, up to 330k cells
     - Virtex-6: 40 nm, 288-2k DSP, up to 500k 6-LUTs
     - 2010 Virtex-7: 28 nm, ~500 MHz, up to 2000k cells
     - 2014 Virtex-US: 20 nm, up to 4400k cells
     Transistor counts: FPGAs up to ~7 billion transistors; Intel 2014 15-core Xeon Ivy Bridge-EX ~4.3 billion transistors; Nvidia 2012 GK110 Kepler ~7 billion transistors. (Figure from L. Zhuo.)

  19. FPGA/CPU evolution

  20. Virtex-7 is not monolithic. 2.5D technology: 4 FPGA tiles on a silicon interposer that provides 10k interconnections between layers.

  21. Enabling technologies

  22. Programming technology/1. Antifuse: OTP (One Time Programmable). SRAM: pass transistor in switch block. (Figure note: disordered except at very low range.)

  23. Programming technology/2
     - Antifuse. Pros: cheap, small. Cons: requires special processing, one-time programming.
     - SRAM. Pros: can be fabricated with a standard semiconductor process, can be easily reprogrammed. Cons: large area required (6 transistors).

  24. Confware. The configuration of an FPGA (which is compiled to a stream of bits) is neither hardware nor software. Someone coined the neologism confware: the configuration of reconfigurable hardware.

  25. How do you configure an FPGA? The SRAM cells form a long shift register, loaded serially by clocking in the confware. Virtex-7 2000T = 440 Mbits of SRAM cells. (Simplified: large FPGAs can also load the confware in parallel.)
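
Serial configuration can be pictured as clocking bits into a chain of cells. The sketch below is a toy model with made-up names (`load_confware`, `HYPOTHETICAL_CLOCK_HZ`); the 100 MHz configuration clock is purely an assumption chosen to give a feel for the load time of 440 Mbits, not a figure from the slides.

```python
def load_confware(bitstream, n_cells):
    """Shift a bitstream into a chain of n_cells SRAM cells, one bit per clock."""
    chain = [0] * n_cells
    for bit in bitstream:
        # On every clock each cell takes its neighbour's value
        # and the new bit enters at cell 0.
        chain = [bit] + chain[:-1]
    return chain

bits = [1, 0, 1, 1]
chain = load_confware(bits, 4)
print(chain)  # the first bit clocked in ends up in the last cell

# Back-of-the-envelope serial load time for the figure quoted on the slide.
CONFIG_BITS = 440_000_000            # 440 Mbits (from the slide)
HYPOTHETICAL_CLOCK_HZ = 100_000_000  # assumed clock, for illustration only
seconds = CONFIG_BITS / HYPOTHETICAL_CLOCK_HZ
print(f"{seconds:.1f} s at the assumed clock")
```

At one bit per clock the load time grows linearly with the device size, which is one reason a parallel load path is attractive on large devices.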

  26. Logic Blocks/Logic Cells

  27. Fine/coarse grain logic blocks. From: - a single transistor (Crosspoint: went bankrupt) - a logic gate. To: - a complete processor (FPNA: field programmable node array). NB. FPNA also stands for field programmable neural array.

  28. CLB (Configurable Logic Blocks)
     - Homogeneous: logic cells only, each a 4-input LUT (LookUp Table) + flip-flop
     - Heterogeneous (modern development): logic cells, DSP (Digital Signal Processing) blocks, memory blocks, I/O blocks
     This differentiation is necessary to allow operations like multiplication/addition to be mapped efficiently. The heterogeneous architecture is prevalent now. The blocks are configured by SRAM bits, usually loaded through serial ports as already pointed out.

  29. Standard Logic Cell: a 4-input LUT (16 bits of configuration SRAM), a D-type flip-flop, and a 2:1 mux (1 bit of configuration SRAM).
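
The behaviour of such a cell can be sketched as a small Python class. This is an illustrative model only: the class and method names are made up, and it assumes the 2:1 mux selects between the direct LUT output and the flip-flop output.

```python
class LogicCell:
    """Toy model of a logic cell: 4-input LUT, D flip-flop, 2:1 output mux."""

    def __init__(self, lut_bits, use_ff):
        assert len(lut_bits) == 16
        self.lut_bits = list(lut_bits)  # 16 configuration bits: the LUT contents
        self.use_ff = use_ff            # 1 configuration bit: output mux select
        self.q = 0                      # flip-flop state

    def comb(self, x3, x2, x1, x0):
        """Combinational LUT output: the inputs form the memory address."""
        return self.lut_bits[(x3 << 3) | (x2 << 2) | (x1 << 1) | x0]

    def clock(self, x3, x2, x1, x0):
        """Clock edge: the flip-flop captures the current LUT output."""
        self.q = self.comb(x3, x2, x1, x0)

    def out(self, x3, x2, x1, x0):
        """Cell output: registered or combinational, chosen by the mux bit."""
        return self.q if self.use_ff else self.comb(x3, x2, x1, x0)

# Example configuration: 4-input parity (XOR of the inputs).
parity = [bin(i).count("1") & 1 for i in range(16)]
comb_cell = LogicCell(parity, use_ff=False)
reg_cell = LogicCell(parity, use_ff=True)
print(comb_cell.out(1, 0, 0, 0))
```

With 17 configuration bits in total, the same cell can act as pure combinational logic or as one stage of a sequential circuit.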

  30. Standard LUT (Look Up Table)
     - a 16 x 1 memory
     - implements any boolean function of 4 inputs
     Example (first 8 of the 16 rows; the inputs are bit 3 to bit 0):
     Dec  Bin   Out
     0    0000  0
     1    0001  1
     2    0010  0
     3    0011  0
     4    0100  1
     5    0101  0
     6    0110  1
     7    0111  1
     ..   ..    ..
     $f = \bar{x}_3 \bar{x}_2 \bar{x}_1 x_0 + \bar{x}_3 x_2 \bar{x}_1 \bar{x}_0 + \bar{x}_3 x_2 x_1 \bar{x}_0 + \bar{x}_3 x_2 x_1 x_0$
     NB. LUT rhymes with nut.
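
The table and the formula on this slide describe the same function, which can be checked with a short sketch. The names `ONES`, `LUT`, `lut4` and `f` are illustrative, and rows 8 to 15, which the slide elides, are assumed to be 0 because every product in the formula contains the complement of x3.

```python
# Rows of the truth table whose output is 1 (decimal addresses 1, 4, 6, 7).
ONES = {1, 4, 6, 7}
LUT = [1 if i in ONES else 0 for i in range(16)]  # rows 8..15 assumed to be 0

def lut4(x3, x2, x1, x0):
    """A 4-input LUT is a 16 x 1 memory addressed by its inputs."""
    return LUT[(x3 << 3) | (x2 << 2) | (x1 << 1) | x0]

def f(x3, x2, x1, x0):
    """The sum-of-products formula from the slide."""
    n = lambda v: 1 - v  # complement
    return (
        (n(x3) & n(x2) & n(x1) & x0)
        | (n(x3) & x2 & n(x1) & n(x0))
        | (n(x3) & x2 & x1 & n(x0))
        | (n(x3) & x2 & x1 & x0)
    )

inputs = [((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(16)]
agree = all(lut4(*x) == f(*x) for x in inputs)
print(agree)
```

Changing the function means rewriting the 16 stored bits; the circuit around the memory stays the same.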

  31. Uses of a logic cell: - a 2^4 = 16 x 1 bit memory - any boolean function of 4 inputs - a 4:1 multiplexer

  32. Virtex-7 Logic Block basics

  33. Virtex-7 logic slice. 4 x 32 = 128 bit shift register. (Figure from Xilinx.)

  34. Virtex-7 CLB slice. Each LUT can be used as: - one 6-input LUT - two 5-input LUTs sharing the same inputs - two arbitrary boolean functions of 3 and 2 inputs or fewer
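
How one 64-bit table can serve either as a 6-input LUT or as two 5-input LUTs with shared inputs can be sketched as follows. The split used here (one output reads the lower 32 bits, the other reads all 64, and the sixth input is tied high in dual mode) is modelled on the 7 series dual-output LUT as commonly described and should be treated as an assumption; the names `lut6_2`, `table`, `and5` and `xor5` are illustrative.

```python
def lut6_2(init, i5, i4, i3, i2, i1, i0):
    """Return (O6, O5) for a 64-bit table `init`."""
    addr5 = (i4 << 4) | (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
    addr6 = (i5 << 5) | addr5
    o6 = (init >> addr6) & 1   # full 64-entry lookup
    o5 = (init >> addr5) & 1   # lookup in the lower 32 entries only
    return o6, o5

def table(fn, n_inputs):
    """Pack a boolean function of an n-bit address into an integer table."""
    return sum(fn(a) << a for a in range(1 << n_inputs))

# Two different 5-input functions of the same inputs.
and5 = table(lambda a: int(a == 0b11111), 5)       # 5-input AND
xor5 = table(lambda a: bin(a).count("1") & 1, 5)   # 5-input parity
init = (xor5 << 32) | and5                         # upper half: XOR, lower half: AND

# Dual mode: tie i5 high so O6 reads the upper half while O5 reads the lower half.
print(lut6_2(init, 1, 1, 1, 1, 1, 1))
```

Used as a single 6-input LUT, `i5` is an ordinary input and only the first element of the returned pair is of interest.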

  35. Altera ALM

  36. Interconnection network

  37. Interconnection network. Types: hierarchical routing, island-type routing (predominant), nearest neighbours. The interconnection network can consume 80% of the area of an FPGA!

  38. Programmable switch

  39. SRAM routing: coarse/fine grain. (Figure labels: 5 bit SRAM, 1 bit SRAM.)

  40. Details of island type routing

  41. Disjoint/Wilton switch blocks. Disjoint: a wire can only exit on a wire with the same number, which creates routing domains. Wilton: a wire can change domain in at least one direction.
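
The effect of routing domains can be illustrated with a toy reachability model. Note the assumptions: a switch block is reduced to a function mapping an incoming track number to the track it may turn onto, the "shifted" mapping (next track number, modulo the channel width) is a simplified stand-in and not the actual Wilton connection pattern, and all names are hypothetical.

```python
def reachable(start, width, hops, turn_map):
    """Track numbers reachable from `start` after crossing up to `hops` switch blocks."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # At each switch block a signal may go straight (same track) or turn.
        frontier = {t for s in frontier for t in (s, turn_map(s, width))}
        seen |= frontier
    return seen

def disjoint(track, width):
    """Disjoint block: a turn stays on the same track number."""
    return track

def shifted(track, width):
    """Simplified domain-changing block: a turn moves to the next track number."""
    return (track + 1) % width

print(reachable(0, 4, 3, disjoint))  # stays inside one routing domain
print(reachable(0, 4, 3, shifted))   # can spread over all tracks
```

In the disjoint case a net that starts on track 0 can never use a free wire on another track, which is the routing-domain restriction the slide refers to.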

  42. Channel segments distribution

  43. Columnar architecture: 7 series Xilinx FPGA

  44. DSP blocks & floating point

  45. FPGA floating point in 1994. Fagin & Renard report that floating point operators can be implemented but are impractical: no FPGA in existence could contain a single multiplier circuit! Reference: B. Fagin and C. Renard. Field Programmable Gate Arrays and Floating Point Arithmetic. IEEE Transactions on VLSI Systems, 2(3), September 1994.

  46. FPGA fp in 1995. Shirazi et al., along the same lines as Fagin & Renard, propose 2 custom fp formats of 16 and 18 bits in total, and provide add, sub, mul and div operators for them. Reference: N. Shirazi, A. Walters, and P. Athanas. Quantitative Analysis of Floating Point Arithmetic on FPGA Based Custom Computing Machines. In Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, April 1995.

  47. FPGA fp in 2002. Belanovic & Leeser present a library of variable-width parameterized floating point operators (a superset of the IEEE formats). Reference: Pavle Belanovic and Miriam Leeser. A Library of Parameterized Floating-point Modules and Their Use. 2002.
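
What "parameterized" means for a floating point format can be illustrated by rounding a value to a format with a chosen number of exponent and fraction bits. This is a minimal sketch, not the library's code: the function name `quantize` is made up, only normal numbers are handled (no subnormals, infinities or NaNs), and the (6, 9) split shown for a 16-bit custom format is an illustrative assumption rather than a format taken from the papers.

```python
import math

def quantize(x, exp_bits, frac_bits):
    """Round x to the nearest normal value of a (1, exp_bits, frac_bits) format."""
    if x == 0.0:
        return 0.0
    bias = (1 << (exp_bits - 1)) - 1
    m, e = math.frexp(abs(x))              # abs(x) = m * 2**e with 0.5 <= m < 1
    m, e = m * 2, e - 1                    # renormalise to 1 <= m < 2
    scaled = round(m * (1 << frac_bits))   # integer significand, hidden bit included
    if scaled == (1 << (frac_bits + 1)):   # rounding carried into the next binade
        scaled >>= 1
        e += 1
    if not (1 - bias <= e <= bias):
        raise ValueError("exponent out of range for this format")
    return math.copysign(scaled / (1 << frac_bits) * 2.0 ** e, x)

print(quantize(0.1, 8, 23))   # IEEE single precision field widths
print(quantize(0.1, 5, 10))   # IEEE half precision field widths
print(quantize(0.1, 6, 9))    # a hypothetical 16-bit custom split
```

Narrower fields cost accuracy and range but shrink the hardware operators, which is the trade-off a parameterized library lets the designer tune per application.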

  48. What allowed the breakthrough? The addition by the major vendors, from 2000 on, of hardware multipliers (called DSP blocks) to their FPGAs: - first Xilinx on Virtex-II - soon after Altera on Stratix. Over the last decade this also sparked the interest of the HPC community: Cray XD1, Silicon Graphics RASC, Convey HC1. HPRC = High Performance Reconfigurable Computing.

  49. FPGA MAC operation
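
A MAC (multiply-accumulate) computes acc = acc + a * b in one step and is the building block of dot products and filters. The sketch below is a behavioural illustration only: the 48-bit two's complement accumulator width is an assumption suggested by the DSP48 name on the next slide, and the function names are made up.

```python
ACC_BITS = 48  # assumed accumulator width, for illustration

def wrap_signed(value, bits):
    """Wrap an integer to a two's complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b, wrapped to the accumulator width."""
    return wrap_signed(acc + a * b, ACC_BITS)

def dot(xs, ys):
    """Dot product computed as a chain of MAC operations."""
    acc = 0
    for a, b in zip(xs, ys):
        acc = mac(acc, a, b)
    return acc

print(dot([1, 2, 3], [4, 5, 6]))
```

A hardware block performs one such step per clock cycle, so a length-n dot product takes about n cycles on a single block, or fewer when several blocks work in parallel.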

  50. Virtex-7 DSP48 high level. (Figure from Xilinx; labels: 1 bit, 2 bit.)
