Peter Holvenstot
OpenCL

OpenCL
  • Designed as an API and language specification
  • Standards maintained by the Khronos group
    • Versions at the time of this presentation: 1.0, 1.1, and 1.2
  • Manufacturers release their own SDK and drivers
  • Major backers: Apple, AMD/ATI, Intel
OpenCL
  • Alternative to CUDA
  • Not limited to ATI GPUs
  • Designed for “heterogeneous computing”
  • Executable on many devices, including CPUs, GPUs, DSPs, and FPGAs
OpenCL
  • Program structure similar to CUDA: a host program launches kernels on devices
  • The set of compute devices a host program uses is grouped into a 'context'
  • Kernels are executed by 'processing elements'
  • Kernels can be compiled at run-time or build-time
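To make the host/kernel split concrete, here is a minimal sketch in plain C. The OpenCL C source for a hypothetical `vadd` kernel is kept as a string, which is the form a host program passes to `clCreateProgramWithSource` and then `clBuildProgram` when compiling at run time. So that the sketch runs without an OpenCL driver, the hypothetical host function `emulate_vadd` stands in for a kernel launch by looping over every work-item index serially. Both names are illustrative assumptions, not part of any API.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source for a hypothetical element-wise add kernel. A real host
 * program would hand this string to clCreateProgramWithSource() and then
 * clBuildProgram() to compile it at run time for the chosen device. */
const char *const vadd_src =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

/* Hypothetical serial stand-in for launching vadd over a 1-D NDRange of
 * global_size work-items: each loop iteration plays the role of one
 * work-item, and `gid` plays the role of get_global_id(0). */
void emulate_vadd(const float *a, const float *b, float *c,
                  size_t global_size)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < global_size; ++gid) {
        c[gid] = a[gid] + b[gid]; /* same body as the kernel */
    }
}
```

On a real device the iterations would run in parallel across processing elements; the serial loop only mirrors what each work-item computes.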
OpenCL
  • Task Parallelism – many kernels running at once
  • OpenCL 1.2: a device can be partitioned into sub-devices, down to a single compute unit
  • Built-in kernels for device-specific functionality
Advantages
  • Same code can be run on different devices
    • Can also be run on NVIDIA GPUs!
  • AMD/ATI is integrating GPU compute units with CPU cores on one chip (Accelerated Processing Units, APUs)
  • Limited library of portable math routines
    • Covers the most common BLAS and FFT routines
Disadvantages
  • No “official” implementation
  • Vendors may implement the spec as written or add their own restrictions
    • Apple adds restrictions on work-group size
  • Devices need appropriate settings to perform well
    • Different capabilities → different performance
    • Solution: Tuning/load balancing framework
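One small piece of such a tuning framework could be choosing a legal, efficient work-group size per device. The sketch below assumes the framework already has the device limit (for example from `CL_DEVICE_MAX_WORK_GROUP_SIZE` or `CL_KERNEL_WORK_GROUP_SIZE`) and a preferred multiple (for example `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE`, available since OpenCL 1.1). The helper `pick_local_size` and its selection rule are hypothetical; it relies on the OpenCL 1.x rule that the global size must be evenly divisible by the local size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning helper: choose a 1-D local work size.
 *   global_size        - total number of work-items in the NDRange
 *   max_local          - device/kernel limit on work-group size
 *   preferred_multiple - size the hardware schedules most efficiently
 * In OpenCL 1.x, global_size must be evenly divisible by the local size,
 * so only divisors of global_size are considered. */
size_t pick_local_size(size_t global_size, size_t max_local,
                       size_t preferred_multiple)
{
    assert(global_size > 0 && max_local > 0 && preferred_multiple > 0);
    size_t limit = max_local < global_size ? max_local : global_size;
    size_t largest_divisor = 0; /* fallback if no preferred multiple fits */

    for (size_t l = limit; l >= 1; --l) {
        if (global_size % l != 0)
            continue;            /* not a legal local size */
        if (l % preferred_multiple == 0)
            return l;            /* largest legal size that is a preferred multiple */
        if (largest_divisor == 0)
            largest_divisor = l; /* first divisor from the top is the largest */
    }
    return largest_divisor;      /* at least 1, since 1 divides global_size */
}
```

For example, with 1000 work-items, a limit of 256, and a preferred multiple of 64, no multiple of 64 divides 1000, so the helper falls back to 250. A device with a tighter limit simply gets a smaller result.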
Restrictions
  • No recursion, variadic functions, or function pointers
  • Kernels cannot dynamically allocate memory on the device
  • No variable-length arrays; double precision is optional
  • Some of these gaps can be worked around with extensions (e.g. cl_khr_fp64 for double precision)
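A common way to live within the "no recursion, no dynamic allocation" rules is to replace a recursive traversal with a loop over a fixed-size, compile-time stack. The sketch below shows that pattern in plain C on a tree stored as index arrays rather than pointers. The `IndexTree` layout, `STACK_CAPACITY`, and `tree_sum` are hypothetical names chosen for illustration; in an actual OpenCL C kernel the arrays would live in `__global` buffers.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAPACITY 64 /* fixed at compile time: no device-side malloc */

/* Binary tree stored as parallel index arrays instead of pointers:
 * node i holds value[i]; left[i] / right[i] are child indices, -1 = none. */
typedef struct {
    const int *value;
    const int *left;
    const int *right;
} IndexTree;

/* Sums every node reachable from `root` using an explicit fixed-size stack
 * instead of recursion. Sets *ok to 0 (and returns 0) if the stack would
 * overflow, so the caller can fall back to another strategy. */
long tree_sum(IndexTree t, int root, int *ok)
{
    int stack[STACK_CAPACITY];
    int top = 0;
    long sum = 0;

    assert(ok != NULL);
    *ok = 1;
    if (root < 0)
        return 0;                /* empty tree */
    stack[top++] = root;

    while (top > 0) {
        int node = stack[--top]; /* pop */
        sum += t.value[node];
        /* Push right first so the left subtree is visited first. */
        int children[2] = { t.right[node], t.left[node] };
        for (int k = 0; k < 2; ++k) {
            if (children[k] < 0)
                continue;        /* no child on this side */
            if (top == STACK_CAPACITY) {
                *ok = 0;         /* would overflow: give up */
                return 0;
            }
            stack[top++] = children[k];
        }
    }
    return sum;
}
```

The fixed capacity is the trade-off: it bounds memory per work-item, but deep or degenerate trees must be detected (here via `*ok`) rather than silently overrunning the stack.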
Terminology (OpenCL → CUDA)
  • Stream Core → Scalar Core
  • Compute Unit → Streaming Multiprocessor
  • Wavefront → Warp
  • Intermediate Language → PTX
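Because the hardware schedules work-items in fixed-width wavefronts (AMD) or warps (NVIDIA), a work-group that is not a multiple of that width leaves lanes idle. The sketch below assumes widths of 64 for a wavefront and 32 for a warp, which is typical of GPUs from that generation; check vendor documentation for a specific device. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of hardware scheduling units (wavefronts or warps) needed to run
 * one work-group of `local_size` work-items on units `unit_width` lanes wide. */
size_t units_per_group(size_t local_size, size_t unit_width)
{
    assert(unit_width > 0);
    return (local_size + unit_width - 1) / unit_width; /* ceiling division */
}

/* Lanes left idle in the last, partially filled unit. */
size_t idle_lanes(size_t local_size, size_t unit_width)
{
    return units_per_group(local_size, unit_width) * unit_width - local_size;
}
```

A work-group of 100 work-items needs 2 wavefronts of 64 or 4 warps of 32, wasting 28 lanes either way; a size of 128 fills both evenly. This is one reason the same kernel may want different settings on different devices.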
Terminology (OpenCL → CUDA)
  • Host Memory → Host Memory
  • Global Memory → Global/Device Memory
  • Global Memory → Local Memory (CUDA local memory is per-thread but resides in device memory)
  • Constant Memory → Constant Memory
  • Local Memory → Shared Memory
  • Private Memory → Registers
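The OpenCL "local" / CUDA "shared" memory level is what makes work-group reductions fast: each group copies its slice of global memory into a small scratch buffer and combines values there. The sketch below emulates that pattern serially in plain C. In real OpenCL C the scratch array would be declared `__local` and each halving step would be separated by `barrier(CLK_LOCAL_MEM_FENCE)` (`__shared__` and `__syncthreads()` in CUDA). The function name and the `MAX_LOCAL` cap are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCAL 256 /* hypothetical cap on the emulated work-group size */

/* Emulates a tree-reduction kernel: each work-group sums its slice of `in`
 * into out[group] using a scratch array that plays the role of __local
 * memory. Assumes global_size is a multiple of local_size and local_size
 * is a power of two. */
void emulate_group_reduce(const float *in, float *out,
                          size_t global_size, size_t local_size)
{
    assert(local_size > 0 && local_size <= MAX_LOCAL);
    assert((local_size & (local_size - 1)) == 0);
    assert(global_size % local_size == 0);

    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; ++group) {
        float scratch[MAX_LOCAL]; /* the group's "__local" buffer */

        /* Each work-item copies one element from global to local memory. */
        for (size_t lid = 0; lid < local_size; ++lid)
            scratch[lid] = in[group * local_size + lid];
        /* A real kernel needs a barrier here before reading neighbours. */

        for (size_t stride = local_size / 2; stride > 0; stride /= 2) {
            for (size_t lid = 0; lid < stride; ++lid)
                scratch[lid] += scratch[lid + stride];
            /* ...and another barrier after each halving step. */
        }
        out[group] = scratch[0]; /* work-item 0 writes the group's result */
    }
}
```

Only one value per work-group goes back to global memory; a second pass (or the host) then combines the per-group partial sums.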
Terminology (OpenCL → CUDA)
  • NDRange → Grid
  • Work-group → Block
  • Work-item → Thread
  • Global ID → Thread ID
  • Work-group ID → Block Index
  • Local ID → Thread Index
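The index terms are related by simple arithmetic. In OpenCL C, `get_global_id(0)` equals `get_group_id(0) * get_local_size(0) + get_local_id(0)` when no global work offset is used (OpenCL 1.1 added offsets, reported by `get_global_offset`). The CUDA equivalent is `blockIdx.x * blockDim.x + threadIdx.x`. The sketch below writes that relation and its inverse as plain C; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL: get_group_id(0) * get_local_size(0) + get_local_id(0)
 * CUDA:   blockIdx.x * blockDim.x + threadIdx.x
 * (assuming a zero global work offset) */
size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Inverse mapping: recover the work-group ID and local ID from a global ID. */
void split_global_id(size_t gid, size_t local_size,
                     size_t *group_id, size_t *local_id)
{
    assert(local_size > 0);
    *group_id = gid / local_size;
    *local_id = gid % local_size;
}
```

For instance, work-item 5 of work-group 2 with a work-group size of 64 has global ID 2 * 64 + 5 = 133.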
References
  • http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/CUDAvsOpenCL.pdf
  • https://wiki.aalto.fi/download/attachments/40025977/Cuda+and+OpenCL+API+comparison_presented.pdf
  • http://www.hpcwire.com/hpcwire/2012-02-28/opencl_gains_ground_on_cuda.html
  • http://www.netlib.org/utk/people/JackDongarra/PAPERS/parcocudaopencl.pdf
  • http://www.netlib.org/lapack/lawnspdf/lawn228.pdf