
TESLA

TESLA GPU Computing: Supercomputing at 1/10th the Cost. http://www.nvidia.com/tesla. PARALLEL COMPUTING. PERSONAL COMPUTING. VISUALIZATION. TESLA™. QUADRO™. GeForce™, TEGRA™. GPGPU Revolutionizes Computing: Latency Processor + Throughput Processor. CPU. GPU.

ozzie

Presentation Transcript


  1. TESLA GPU Computing • Supercomputing at 1/10th the Cost • http://www.nvidia.com/tesla

  2. PARALLEL COMPUTING • PERSONAL COMPUTING • VISUALIZATION • TESLA™ • QUADRO™ • GeForce™, TEGRA™

  3. GPGPU Revolutionizes Computing • Latency Processor + Throughput Processor • CPU • GPU

  4. Tesla Data Center & Workstation GPU Solutions • Tesla M-series GPUs: M2070, M2050, M1060 • Tesla S-series 1U Systems: S2050, S1070 • Tesla C-series GPUs: C2070, C2050, C1060 • Integrated CPU-GPU Servers & Blades • OEM CPU Server + Tesla S-series 1U • Workstations: 2 to 4 Tesla GPUs

  5. GPU Servers Go Mainstream • Tesla S870: Dec 2007 • Tesla S1070 / M1060: 2008-2009 • Tesla M2050 / M2070: 2010

  6. Public OEM Model Numbers (alphabetical by OEM)

  7. 4.5x Lower Power & Cooling Costs • 37 TeraFlop System: Top 150 System • 7x Less Space Required: 2 Racks of GPU+CPUs vs. 15 Racks of CPUs • 5x Lower Cost: $740 K vs. $3.8 M • 4.5x Power Savings every Year: $117 K vs. $524 K • As per November 2009 Top 500 List

  8. 8x Higher Linpack • CPU 1U Server: 2x Intel Xeon X5550 (Nehalem) 2.66 GHz, 48 GB memory, $7K, 0.55 kW • GPU-CPU 1U Server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1.0 kW

  9. The World’s Fastest Supercomputer • Tianhe-1A • 2.507 Petaflop • 7168 Tesla M2050 GPUs • National Supercomputing Center in Tianjin

  10. Dawning Nebulae • Second Fastest Supercomputer in the World • 1.27 Petaflop • 4640 Tesla GPUs • 2x Better Performance / Watt

  11. TSUBAME 2.0 Results from G80 and T10 GPUs on Tsubame 1.2 • Tsubame 2.0 Cluster • 1408 nodes with peak perf • 4224 GPUs = 2175 TFlops • 2816 CPUs = 216 TFlops • Memory = 80.55 TB • SSD = 173.88 TB • HP SL390 Server • 3x NVIDIA Tesla M2050 GPUs • 2x Intel Westmere-EP CPU • 52 GB DDR3 Memory • 2x 60 GB SSD • 2x QDR InfiniBand
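As a quick consistency check on the Tsubame 2.0 figures above, the node configuration and the cluster totals can be related as follows. This is a sketch using only the numbers on the slide; the per-GPU and per-CPU values are derived here and are not stated on the slide.

```latex
\[
1408~\text{nodes} \times 3~\text{GPUs/node} = 4224~\text{GPUs},
\qquad
1408~\text{nodes} \times 2~\text{CPUs/node} = 2816~\text{CPUs}
\]
\[
\frac{2175~\text{TFlops}}{4224~\text{GPUs}} \approx 0.515~\text{TFlops per GPU},
\qquad
\frac{216~\text{TFlops}}{2816~\text{CPUs}} \approx 0.077~\text{TFlops per CPU}
\]
\[
2175~\text{TFlops} + 216~\text{TFlops} = 2391~\text{TFlops} \approx 2.4~\text{PFlops peak}
\]
```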

  12. 1000+ GPU Clusters Around the World St. Petersburg University Norwegian Univ of S & T Nizhegorodsky University Copenhagen Aarhus Kazan Univ Daresbury Lab Groningen Max Planck Institute WestGrid PNNL 256 GPUs Oxford Institute of Physics Wisconsin VaTech Braunschweig Cambridge Utah Argonne Lab Fermi Lab Peking University OSC Maryland Osaka Riken 220 GPUs NERSC CEA Johns Hopkins NCSA 384 GPUs Tsinghua University Chinese Academy of Sciences 2000+ GPUs Harvard KISTI Berkeley Jefferson Labs TACC SNU Georgia Tech Stanford Tokyo Tech 680 GPUs Yonsei Univ of Science & Tech Delaware IIT Delhi UNC Indian Inst of Tropical Meteorology Oak Ridge NIT Calicut Nagasaki NCHC Indian Institute of Science National Taiwan Univ LRDE Dept of Space IIT Madras Anna Univ Curtin University CSIRO 256 GPUs Existing Deployment Prospective Deployment Swinburne University

  13. Increasing Number of Professional CUDA Applications • Available Now / Future • CUDA C/C++ • PGI Accelerators • Platform LSF Cluster Mgr • TauCUDA Perf Tools • Parallel Nsight Vis Studio IDE • TotalView Debugger • PGI CUDA x86 • [Tools] • MATLAB • PGI CUDA Fortran • CAPS HMPP • Bright Cluster Manager • Allinea DDT Debugger • ParaTools VampirTrace • AccelerEyes Jacket MATLAB • Wolfram Mathematica • NVIDIA NPP Perf Primitives • EMPhotonics CULAPACK • CUDA FFT • CUDA BLAS • Thrust C++ Template Lib • MAGMA (LAPACK) • NVIDIA Video Libraries • RNG & SPARSE CUDA Libraries • [Libraries] • Headwave Suite • OpenGeoSolutions OpenSEIS • GeoStar Seismic Suite • Acceleware RTM Solver • StoneRidge RTM • Paradigm RTM • Panorama Tech • [Oil & Gas] • ffA SVI Pro • VSG Open Inventor • Seismic City RTM • Tsunami RTM • Paradigm SKUA • AMBER • NAMD • HOOMD • TeraChem • BigDFT • ABINIT • Acellera ACEMD • DL-POLY • [Bio-Chemistry] • GROMACS • LAMMPS • VMD • GAMESS • CP2K • OpenEye ROCS • PIPER Docking • MUMmerGPU • CUDA-BLASTP • CUDA-MEME • [Bio-Informatics] • HEX Protein Docking • CUDA-EC • CUDA SW++ SmithWaterm • GPU-HMMR • [CAE] • ACUSIM AcuSolve 1.8 • Autodesk Moldflow • Prometech Particleworks • Remcom XFdtd 7.0 • ANSYS Mechanical • FluiDyna OpenFOAM • LSTC LS-DYNA 971 • Metacomp CFD++ • MSC.Software Marc 2010.2 • Legend: Announced, Available

  14. Increasing Number of Professional CUDA Applications • Available Now / Future • Adobe Premiere Pro CS5 • ARRI Various Apps • GenArts Sapphire • TDVision TDVCodec • Black Magic Da Vinci • The Foundry Kronos • [Video] • MainConcept CUDA Encoder • Fraunhofer JPEG2000 • Cinnafilm Pixel Strings • Assimilate SCRATCH • Elemental Video • Bunkspeed Shot (iray) • Refractive SW Octane • Random Control Arion • ILM Plume • Autodesk 3ds Max • Cebas finalRender • Works Zebra Zeany • [Rendering] • mental images iray (OEM) • NVIDIA OptiX (SDK) • Caustic Graphics • Weta Digital PantaRay • Lightworks Artisan • Chaos Group V-Ray GPU • NAG RNG • Numerix Risk • SciComp SciFinance • RMS Risk Mgt Solutions • [Finance] • Murex MACS • Aquimin AlphaVision • Hanweck Options Analy • Agilent EMPro 2010 • CST Microwave • Agilent ADS SPICE • Acceleware FDTD Solver • Rocketick VeritlogSim • [EDA] • Synopsys TCAD • SPEAG SEMCAD X • Gauda OPC • Acceleware EM Solution • MvTec Machine Vis • Siemens 4D Ultrasound • Digisens Medical • Schrodinger Core Hopping • Useful Progress Med • [Other] • MotionDSP Ikena Video • Manifold GIS • Dalsa Machine Vision • Digital Anarchy Photo • Legend: Announced, Available

  15. 20+ Oil & Gas Companies Porting to CUDA • Successful Customers • Oil & Gas ISVs

  16. Finance: 10+ Banks Porting to CUDA • Successful Customers 124x Several unannounced 77x • Finance ISVs UnRisk

  17. Defense / Federal Agencies • Software Available • Opportunities • Defense Contractors • Federal Agencies • Defense services • Speedups 10x-50x • GIS: Manifold, PCI Geomatics, DigitalGlobe • Signal Processing: GPU VSIPL • MATLAB: GPU Plugin available • UAV video analysis: MotionDSP Ikena • Virtual Prototyping: RealityServer • Surveillance, Cryptography

  18. Tesla Bio WorkBench: Bio-Chemistry & Bio-Informatics • Applications: TeraChem, Hex (Docking), LAMMPS, CUDA-MEME, CUDA-BLASTP, CUDA-EC, MUMmerGPU • Community: Download, Documentation, Technical papers, Discussion Forums, Benchmarks & Configurations • Platforms: Tesla GPU Clusters, Tesla Personal Supercomputer

  19. ANSYS Mechanical • > 125K Commercial Seats • Faster = Better Quality

  20. MATLAB • GPU Performance in High-Level Programming Tool • 1 million+ Users in 175+ Countries • 3,500+ Universities Worldwide • 1,500+ MATLAB/Simulink Books • Faster = Productivity

  21. NVIDIA Developer Eco-System • GPU Compilers: C, C++, Fortran, OpenCL, DirectCompute, Java, Python • Parallelizing Compilers: PGI Accelerator, CAPS HMPP, mCUDA, OpenMP • Numerical Packages: MATLAB, Mathematica, NI LabView, pyCUDA • Debuggers & Profilers: cuda-gdb, NV Visual Profiler, Parallel Nsight Visual Studio, Allinea, TotalView • Libraries: BLAS, FFT, LAPACK, NPP, Video, Imaging, GPULib • GPGPU Consultants & Training: ANEO, GPU Tech • OEM Solution Providers

  22. Doing GPU Computing Right • Combination of Hardware and Software • GPU Computing Applications • Java and Python Wrappers • DirectCompute • C++ • C • OpenCL™ • Fortran • NVIDIA GPU: CUDA Parallel Computing Architecture • OpenCL is a trademark of Apple Inc., used under license to the Khronos Group Inc.

  23. Parallel Nsight (Visual Studio) • Visual Profiler (for Linux) • cuda-gdb (for Linux)

  24. Compiling C for CUDA Applications • Source: void serial_function(… ) { ... } void other_function(int ... ) { ... } void saxpy_serial(float ... ) { for (int i = 0; i < n; ++i) y[i] = a*x[i] + y[i]; } void main( ) { float x; saxpy_serial(..); ... } • Modify into Parallel CUDA code • C CUDA Key Kernels → NVCC (Open64) → CUDA object files • Rest of C Application → CPU Compiler → CPU object files • Linker → CPU-GPU Executable

  25. C for CUDA: C with a few keywords • Standard C Code • Parallel C Code
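The transcript does not preserve the code from the two panels of this slide. The following is a minimal sketch of what a "Standard C" versus "Parallel C" SAXPY pair typically looks like, reusing the `saxpy_serial` name from the previous slide. The names `saxpy_parallel` and `run_saxpy_check`, the argument lists, and the launch configuration of 256 threads per block are assumptions made for illustration and are not taken from the deck.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Standard C: one loop iteration per element.
void saxpy_serial(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Parallel C: __global__ marks a GPU kernel; each thread handles one element.
__global__ void saxpy_parallel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard: the last block may have threads past the end
        y[i] = a * x[i] + y[i];
}

// Hypothetical helper: runs both versions on the same data and returns the
// number of elements that differ, or -1 if a CUDA call fails.
int run_saxpy_check(int n)
{
    const float a = 2.0f;
    size_t bytes = (size_t)n * sizeof(float);
    float *x = (float *)malloc(bytes);
    float *y_cpu = (float *)malloc(bytes);
    float *y_gpu = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y_cpu[i] = 2.0f;
        y_gpu[i] = 2.0f;
    }

    saxpy_serial(n, a, x, y_cpu);

    float *d_x = NULL, *d_y = NULL;
    int mismatches = -1;
    if (cudaMalloc((void **)&d_x, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_y, bytes) == cudaSuccess) {
        cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_y, y_gpu, bytes, cudaMemcpyHostToDevice);

        int threads = 256;                         // assumed block size
        int blocks = (n + threads - 1) / threads;  // round up to cover n
        saxpy_parallel<<<blocks, threads>>>(n, a, d_x, d_y);

        // The blocking copy back also waits for the kernel to finish.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(y_gpu, d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            mismatches = 0;
            for (int i = 0; i < n; ++i)
                if (y_cpu[i] != y_gpu[i])
                    ++mismatches;
        }
    }
    cudaFree(d_x);
    cudaFree(d_y);
    free(x);
    free(y_cpu);
    free(y_gpu);
    return mismatches;
}

int main(void)
{
    int mismatches = run_saxpy_check(1 << 20);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Assuming the file is saved as `saxpy.cu` (a hypothetical name), it can be built with `nvcc saxpy.cu -o saxpy`. The "few keywords" in this sketch are `__global__`, the built-in index variables `blockIdx`, `blockDim` and `threadIdx`, and the `<<<blocks, threads>>>` launch syntax; the loop body itself is unchanged from the serial version.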

  26. CUDA C/C++ Continuous Innovation • CUDA Toolkit 1.x • CUDA Toolkit 2.x • CUDA Toolkit 3.x • New in 3.2 • New cuSPARSE Library • New cuRAND Library (Sobol) • Support for 6GB Tesla & Quadro • Multi-GPU Debugging • Math Library Perf Improvements • Cluster Management Features • Integrated TCC Mode • Fermi arch support • C++ Class Templates • C++ Class Inheritance • Tools updates • cuda-memcheck • GPUDirect™ • 16-way concurrency • Function pointers & recursion • Double Precision • cuda-gdb • Visual Profiler • Compiler Optimizations • Vista 32/64 • Mac OSX • 3D Textures • HW Interpolation • C Compiler • C Extensions • Single Precision • BLAS • FFT • SDK w/ 40 samples • Win XP 64 • Atomics support • Multi-GPU support • DP FFT • Parallel Nsight (beta) • 16-32 Conversion intrinsics • Performance enhancements

  27. CUDA 4.0 for Broader Developer Adoption

  28. 4 in Japanese, 3 in English, 2 in Chinese, 1 in Russian)

  29. Performance Benchmarks

  30. Performance Summary Preliminary data

  31. Standard FFT Library: cuFFT 3.2 • cuFFT 3.2: NVIDIA Tesla C1060, Tesla C2050 (Fermi) • MKL 10.2.4.32: Quad-Core Intel Xeon 5550, 2.67 GHz

  32. Standard BLAS Library: cuBLAS 3.2 • cuBLAS 3.2: NVIDIA Tesla C1060, Tesla C2050 (Fermi) • MKL 10.2.4.32: Quad-Core Intel Xeon 5550, 2.67 GHz

  33. Matrix Size for Best cuBLAS 3.2 Performance • cuBLAS 3.2: NVIDIA Tesla C1060, Tesla C2050 (Fermi) • MKL 10.2.4.32: Quad-Core Intel Xeon 5550, 2.67 GHz

  34. CULA 1.3 LAPACK Library from EM Photonics • Double Precision Results • Data Courtesy: EM Photonics

  35. Sparse Matrix-Vector Multiplication (SpMV) • SpMV: CUDA 3.0, Tesla C1060 and Tesla C2050 • MKL 10.2: Intel Xeon 5550, 2.67 GHz • Preliminary data
