
Co-Processor Architectures Fermi vs. Knights Ferry

Roger Goff, Dell Senior Global CERN/LHC Technologist, +1.970.672.1252 | Roger_Goff@dell.com


Presentation Transcript


  1. Co-Processor Architectures Fermi vs. Knights Ferry
     Roger Goff, Dell Senior Global CERN/LHC Technologist
     +1.970.672.1252 | Roger_Goff@dell.com

  2. nVidia Fermi Architecture
     • Up to 512 cores
     • 16 streaming multiprocessors, each with 32 cores @ 1.3 GHz
     • Parallel DataCache
     • 64 KB shared memory/L1 cache per streaming multiprocessor
     • 768 KB unified L2 cache
     • Six 64-bit memory partitions
     • 384-bit memory interface
     • Up to 6 GB GDDR5 DRAM
     • Up to 16 concurrent kernels
     • IEEE floating point math
     • ECC memory
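As a quick sanity check on the figures in this slide, the sketch below multiplies them out in Python. The core count, clock, and memory-partition numbers come from the slide; the factor of 2 FLOPs per core per clock (one fused multiply-add) is an assumption that the slide does not state, and all function names are illustrative only.

```python
# Figures taken from the slide.
SM_COUNT = 16                 # streaming multiprocessors
CORES_PER_SM = 32
CLOCK_GHZ = 1.3
MEMORY_PARTITIONS = 6
PARTITION_WIDTH_BITS = 64
SHMEM_L1_KB_PER_SM = 64

# Assumption (not on the slide): each core completes one fused
# multiply-add per clock, counted as 2 floating-point operations.
FLOPS_PER_CORE_PER_CLOCK = 2


def total_cores():
    """Cores across all streaming multiprocessors."""
    return SM_COUNT * CORES_PER_SM


def memory_interface_bits():
    """Width of the memory interface built from the partitions."""
    return MEMORY_PARTITIONS * PARTITION_WIDTH_BITS


def total_shmem_l1_kb():
    """Aggregate shared memory/L1 capacity across the chip."""
    return SM_COUNT * SHMEM_L1_KB_PER_SM


def peak_sp_gflops():
    """Theoretical single-precision peak under the FMA assumption."""
    return total_cores() * CLOCK_GHZ * FLOPS_PER_CORE_PER_CLOCK


if __name__ == "__main__":
    print("cores:", total_cores())
    print("memory interface (bits):", memory_interface_bits())
    print("shared memory/L1 total (KB):", total_shmem_l1_kb())
    print("peak SP GFLOPS (assumed FMA):", peak_sp_gflops())
```

The first two results reproduce the slide's own totals (512 cores, 384-bit interface); the peak figure is an upper bound that real kernels rarely reach.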

  3. Fermi Streaming Multiprocessor Architecture
     • 32 cores
     • 32-bit integer ALU with 64-bit extensions
     • Full IEEE 754-2008 32-bit and 64-bit precision
     • 64 KB shared memory/L1 cache
       • Configurable as 16 KB shared memory/48 KB L1 cache or 48 KB shared memory/16 KB L1 cache
     • 16 load/store units
     • Dual warp scheduler (dual instruction issue)
     • Four Special Function Units (SFUs) for sine, cosine, reciprocal, and square root operations
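The configurable 64 KB split matters because the shared memory a kernel uses limits how many thread blocks can be resident on one multiprocessor at a time. The sketch below is a rough Python illustration that considers only the shared-memory limit and ignores register, thread, and block-count limits; the dictionary and function names are hypothetical. In CUDA the preference is requested per kernel with `cudaFuncSetCacheConfig`.

```python
# The two splits listed on the slide, in KB.
SPLITS_KB = {
    "prefer_shared": {"shared": 48, "l1": 16},
    "prefer_l1": {"shared": 16, "l1": 48},
}


def blocks_limited_by_shared_memory(shared_kb_per_block, split):
    """Thread blocks that fit on one multiprocessor, counting only
    shared memory (a simplification; other limits also apply)."""
    if shared_kb_per_block <= 0:
        raise ValueError("shared_kb_per_block must be positive")
    return SPLITS_KB[split]["shared"] // shared_kb_per_block


if __name__ == "__main__":
    # Hypothetical kernel that uses 12 KB of shared memory per block.
    for split in SPLITS_KB:
        print(split, blocks_limited_by_shared_memory(12, split))
```

A kernel that leans on shared memory would generally favor the 48 KB shared split, while one that mostly streams through global memory may benefit more from the larger L1.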

  4. Comparison to Previous nVidia GPGPUs

  5. Intel MIC Architecture
     • 32 cores @ 1.2 GHz
     • 4 threads/core, 128 total parallel threads
     • 32 KB i-cache, 32 KB d-cache per core
     • 256 KB coherent L2 cache per core (8 MB total)
     • 512-bit vector unit
       • 16 single-precision FLOPs/clock
       • 8 double-precision FLOPs/clock
     • Pronounced “Mike”
     • Many cores with many threads per core
     • Standard IA programming and memory model
     • Knights Ferry software development platform
       • 1-2 GB GDDR5, connected to host memory through PCI DMA operations with virtual addressing
       • Intel HPC developer tools
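The totals on this slide can be checked with a few multiplications. The Python sketch below uses only the slide's numbers; it assumes the FLOPs/clock figures are per core and that every vector unit issues a full-width operation each cycle, so the throughput results are theoretical ceilings. The function names are illustrative.

```python
# Figures taken from the slide.
CORES = 32
CLOCK_GHZ = 1.2
THREADS_PER_CORE = 4
L2_KB_PER_CORE = 256
SP_FLOPS_PER_CLOCK = 16   # assumed to be per core
DP_FLOPS_PER_CLOCK = 8    # assumed to be per core


def total_threads():
    """Hardware threads across all cores."""
    return CORES * THREADS_PER_CORE


def total_l2_kb():
    """Aggregate coherent L2 capacity in KB."""
    return CORES * L2_KB_PER_CORE


def peak_gflops(flops_per_clock):
    """Theoretical peak, assuming full vector issue every cycle."""
    return CORES * CLOCK_GHZ * flops_per_clock


if __name__ == "__main__":
    print("threads:", total_threads())
    print("L2 total (MB):", total_l2_kb() / 1024)
    print("peak SP GFLOPS:", peak_gflops(SP_FLOPS_PER_CLOCK))
    print("peak DP GFLOPS:", peak_gflops(DP_FLOPS_PER_CLOCK))
```

The thread and L2 totals match the slide (128 threads, 8 MB); the GFLOPS values depend entirely on the per-core assumption above.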

  6. MIC Programming Environment
     • Inherently supports OpenMP.
     • Virtual memory environment extends back to host memory.
     • Intel Parallel Studio and Cluster Studio support MIC.
     • Optimizing performance will take almost as much effort as for CUDA and OpenCL environments.

  7. Knights Corner: 1st Production MIC Co-processor
     • Second half 2012
     • Knowns:
       • 50+ cores
       • 22 nm manufacturing process
     • Unknowns:
       • Core frequency
       • Size of GDDR5 memory on board
       • ECC support

  8. Co-processor Comparison

  9. Co-processor Adoption
     • Commercial adoption:
       • Oil & Gas/seismic data processing
       • Financial services
       • Ray tracing
       • Molecular dynamics
       • Commercial applications: MATLAB, ANSYS
     • Barriers to adoption:
       • Lack of parallel programming skills
       • Immature software development environment & standards
         • CUDA vs. OpenCL vs. OpenMP
         • Waiting for the compiler or libraries to abstract the accelerator
       • Uncertainty of benefit vs. effort
         • Amdahl’s law is still the law! $\text{Maximum Speedup} = \frac{1}{(1 - P) + P/N}$, where $P$ is the parallelizable fraction of the run time and $N$ is the number of processors; as $N$ grows, the limit is $\frac{1}{1 - P}$.
       • Huge investment in current codes
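To make the Amdahl's law point concrete, the sketch below evaluates the standard form of the law, speedup = 1 / ((1 - P) + P/N), where P is the parallelizable fraction of the run time and N is the number of processors. The 95% parallel fraction and the function names are illustrative assumptions, not figures from the presentation.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's law for a given processor count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(parallel_fraction):
    """Upper bound on speedup as the processor count grows without limit."""
    if not 0.0 <= parallel_fraction < 1.0:
        raise ValueError("parallel_fraction must be in [0, 1)")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    p = 0.95  # hypothetical: 95% of the run time parallelizes
    for n in (32, 512):
        print(f"{n} cores: {amdahl_speedup(p, n):.2f}x")
    print(f"limit: {amdahl_limit(p):.2f}x")
```

Even with hundreds of cores, a code that is 95% parallel cannot exceed a 20x speedup, which is why the serial portion of an existing code weighs so heavily in the benefit vs. effort decision.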

  10. AMD “New Era of Processor Performance”

  11. Final Thoughts
     • Co-processors are here to stay, but their architectures will continue to evolve.
     • Programming tools will get easier to use and will further integrate co-processing technology.
     • Further abstraction of the underlying co-processor hardware is necessary to achieve broad adoption.
     • Processors from Intel and AMD will integrate co-processors before the end of the decade.
     • Preparing applications for extreme parallelism will enable users to get the most out of future systems.

  12. Thank you!
     Roger Goff, Dell Senior Global CERN/LHC Technologist
     +1.970.672.1252 | Roger_Goff@dell.com
