
Empowering efficient HPC with Dell

Empowering efficient HPC with Dell. Martin Hilgeman, HPC Consultant EMEA. Amdahl’s Law. Gene Amdahl (1967): "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities". AFIPS Conference Proceedings (30): 483–485.

Presentation Transcript


  1. Empowering efficient HPC with Dell. Martin Hilgeman, HPC Consultant EMEA

  2. Amdahl’s Law. Gene Amdahl (1967): "Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities". AFIPS Conference Proceedings (30): 483–485. “The effort expended on achieving high parallel processing rates is wasted unless it is accompanied by achievements in sequential processing rates of very nearly the same magnitude”. Symbols: a: speedup; n: number of processors; p: parallel fraction. CHPC conference 2013
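
The formula shown on this slide did not survive extraction. With the symbols listed there (a, n, p), Amdahl's Law is usually written in the following standard textbook form, given here as a reminder rather than as a copy of the slide:

```latex
a = \frac{1}{(1 - p) + \dfrac{p}{n}}
```

The serial fraction (1 - p) is unaffected by n, while the parallel fraction p is divided across the n processors.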

  3. Amdahl’s Law limits maximal speedup. a: speedup; n: number of processors; p: parallel fraction
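
The limit on speedup can be made concrete with a few lines of Python. This is a minimal sketch of the standard form of Amdahl's Law, a = 1 / ((1 - p) + p / n); the function names and the value p = 0.95 are assumptions chosen for illustration and do not come from the slides.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup a for parallel fraction p on n processors (Amdahl's Law)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def max_speedup(p: float) -> float:
    """Limit of the speedup as n grows without bound: 1 / (1 - p)."""
    if p >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, for illustration only
    for n in (1, 16, 256, 4096):
        print(f"n={n:5d}  speedup={amdahl_speedup(p, n):6.2f}")
    print(f"upper bound: {max_speedup(p):.2f}")
```

However many processors are added, the speedup stays below 1 / (1 - p), so the serial fraction alone determines the ceiling.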

  4. Amdahl’s Law and Efficiency. Diminishing returns: tension between the desire to use more processors and the associated “cost”
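
The diminishing returns can be quantified as parallel efficiency, commonly defined as speedup divided by processor count. The sketch below assumes that definition and the standard form of Amdahl's Law; the function names, the parallel fraction 0.95 and the 75% target are hypothetical values used only for illustration.

```python
import math


def parallel_efficiency(p: float, n: int) -> float:
    """Efficiency = speedup / n, with speedup taken from Amdahl's Law."""
    speedup = 1.0 / ((1.0 - p) + p / n)
    return speedup / n


def max_processors(p: float, target: float) -> int:
    """Largest n whose efficiency is still at least `target`.

    Solves 1 / (n * (1 - p) + p) >= target for n. Floating-point rounding
    can shift the result by one when the bound is hit exactly.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must lie in (0, 1]")
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    n = math.floor((1.0 / target - p) / (1.0 - p))
    return max(n, 1)


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction
    for n in (1, 8, 64, 512):
        print(f"n={n:4d}  efficiency={parallel_efficiency(p, n):.3f}")
    print("processors usable at 75% efficiency:", max_processors(p, 0.75))
```

Under these assumptions efficiency falls steadily as n grows, which is the "cost" side of the tension named on the slide.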

  5. The Real Moore’s Law • The clock speed plateau • The power ceiling • IPC limit

  6. Moore’s Law vs Amdahl’s Law: “Too Many Cooks in the Kitchen”. Industry is applying Moore’s Law by adding more cores. Meanwhile, Amdahl’s Law says that you cannot use them all efficiently.

  7. What levels do we have? • Challenge: Sustain performance trajectory without massive increases in cost, power, real estate, and unreliability • Solutions: No single answer, must intelligently turn “Architectural Knobs” [Figure: five numbered knobs (1 to 5), with the labels “Hardware performance” and “What you really get”]

  8. Turning the knobs 1 - 4 • Frequency is unlikely to change much: thermal/power/leakage challenges • Moore’s Law still holds: 130 -> 22 nm, LOTS of transistors • Number of sockets per system is the easiest knob: challenging for power/density/cooling/networking • IPC still grows: FMA3/4, AVX, FPGA implementations for algorithms; challenging for the user/developer

  9. Meanwhile… traditional IT is swimming in performance • Traditional IT server utilization rates remain low • New µServers are emerging, x86 and ARM • Further movement from 4->2->1 socket systems as their capabilities expand • What to do with all the capacity? • Software-defined everything…

  10. Scaling sockets, power and density • Extending design to facility • Modularized compute/storage optimization • 2000 nodes, 30 PB storage, 600 kW in 22 m² • Shared infrastructure evolving • Highest efficiency for power and cooling • ARM/ATOM: potential to disrupt perf/$$, perf/Watt model

  11. Which leaves knob 5: get your hands dirty!

      Original version:

          DO it=1,noprec
            DO itSub=1,subNoprec
              ix = ir(1,it,itSub)
              iy = ir(2,it,itSub)
              iz = ir(3,it,itSub)
              idx = idr(1,it,itSub)
              idy = idr(2,it,itSub)
              idz = idr(3,it,itSub)
              sum = 0.0
              testx = 0.0
              testy = 0.0
              testz = 0.0
              DO ilz=-lsz,lsz
                irez = iz + ilz
                IF (irez.ge.k0z.and.irez.le.klz) THEN
                  DO ily=-lsy,lsy
                    irey = iy + ily
                    IF (irey.ge.k0y.and.irey.le.kly) THEN
                      DO ilx=-lsx,lsx
                        irex = ix + ilx
                        IF (irex.ge.k0x.and.irex.le.klx) THEN
                          sum = sum + field(irex,irey,irez) &
                                    * diracsx(ilx,idx) &
                                    * diracsy(ily,idy) &
                                    * diracsz(ilz,idz) * (dx*dy*dz)
                          testx = testx + diracsx(ilx,idx)
                          testy = testy + diracsy(ily,idy)
                          testz = testz + diracsz(ilz,idz)
                        END IF
                      END DO
                    END IF
                  END DO
                END IF
              END DO
              rec(it,itSub) = sum
            END DO
          END DO

      Restructured version:

          ! it is now the inner loop, matching the column-major layout of rec(it,itSub)
          DO itSub=1,subNoprec
            DO it=1, noprec
              ix = ir(1,it,itSub)
              iy = ir(2,it,itSub)
              iz = ir(3,it,itSub)
              idx = idr(1,it,itSub)
              idy = idr(2,it,itSub)
              idz = idr(3,it,itSub)
              sum = 0.0
              ! Clamp the loop limits once instead of testing bounds in every iteration
              startz = MAX(iz-lsz,k0z)
              starty = MAX(iy-lsy,k0y)
              startx = MAX(ix-lsx,k0x)
              stopz = MIN(iz+lsz,klz)
              stopy = MIN(iy+lsy,kly)
              stopx = MIN(ix+lsx,klx)
              DO irez= startz, stopz
                ilz = irez - iz
                ! Skip planes whose z weight is zero
                IF (diracsz(ilz,idz) .EQ. 0.d0 ) THEN
                  CYCLE
                END IF
                ! Hoist loop-invariant products out of the inner loops
                dirac_tmp1 = diracsz(ilz,idz)*(dx*dy*dz)
                DO irey= starty, stopy
                  ily = irey - iy
                  dirac_tmp2 = diracsy(ily,idy) * dirac_tmp1
                  DO irex= startx, stopx
                    ilx = irex - ix
                    sum = sum + field(irex,irey,irez) &
                              * diracsx(ilx,idx) &
                              * dirac_tmp2
                  END DO
                END DO
              END DO
              rec(it,itSub)=sum
            END DO
          END DO

      Run times shown on the slide: 92 seconds and 17 seconds, presumably for the original and the restructured version respectively.
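
One of the transformations on this slide, replacing a bounds test inside the loop with loop limits clamped once via MAX and MIN, can be checked in isolation. The following is a one-dimensional Python sketch with hypothetical names (`field`, `weights`, `ls`, `k0`, `kl`) loosely modelled on the Fortran; it only demonstrates that both forms compute the same sum and says nothing about the run times quoted on the slide.

```python
def sum_with_branches(field, weights, i, ls, k0, kl):
    """Original style: visit every offset and test the bounds each time."""
    total = 0.0
    for il in range(-ls, ls + 1):
        ire = i + il
        if k0 <= ire <= kl:
            # weights is stored from index 0, so shift the offset by ls
            total += field[ire] * weights[il + ls]
    return total


def sum_with_clamped_bounds(field, weights, i, ls, k0, kl):
    """Restructured style: clamp the limits once, no test inside the loop."""
    start = max(i - ls, k0)
    stop = min(i + ls, kl)
    total = 0.0
    for ire in range(start, stop + 1):
        total += field[ire] * weights[ire - i + ls]
    return total


if __name__ == "__main__":
    field = [1.0, 2.0, 3.0, 4.0, 5.0]  # assumed sample data
    weights = [0.25, 0.5, 0.25]        # stencil for offsets -1, 0, +1
    for i in range(len(field)):
        a = sum_with_branches(field, weights, i, 1, 0, len(field) - 1)
        b = sum_with_clamped_bounds(field, weights, i, 1, 0, len(field) - 1)
        print(i, a, b)
```

Both functions visit the in-range elements in the same order, so the results match exactly; the clamped form simply never enters iterations that the branch would have rejected.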

  12. Efficiency optimization also applies across nodes

  13. CHPC conference 2012
