Peter Holvenstot

OpenCL - PowerPoint PPT Presentation




Presentation Transcript

OpenCL

  • Designed as an API and language specification

  • Standards maintained by the Khronos group

    • Versions released so far: 1.0, 1.1, and 1.2

  • Manufacturers release their own SDK and drivers

  • Major backers: Apple, AMD/ATI, Intel


OpenCL

  • Alternative to CUDA

  • Not limited to ATI GPUs

  • Designed for “heterogeneous computing”

  • Executable on many devices, including CPUs, GPUs, DSPs, and FPGAs


OpenCL

  • Similar structure to CUDA: host programs and kernels

  • Set of compute devices is called a 'context'

  • Kernels executed by 'processing elements'

  • Kernels can be compiled at run-time or build-time
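
The host/kernel split and run-time compilation can be sketched as follows. This is only an illustration: the kernel `vec_add` and the helpers `vec_add_work_item` and `emulate_vec_add` are hypothetical names, and the emulation is plain C standing in for what a device would do, not an OpenCL API call.

```c
#include <stddef.h>

/* Hypothetical OpenCL C kernel, kept as text so that a host program
 * could hand it to the OpenCL runtime for compilation at run time. */
static const char *vec_add_src =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

/* Host-side stand-in for the kernel body: one call handles one work-item. */
static void vec_add_work_item(size_t global_id,
                              const float *a, const float *b, float *c)
{
    c[global_id] = a[global_id] + b[global_id];
}

/* Serial emulation of launching the kernel over a 1-D range of n
 * work-items. A real device would run these invocations in parallel. */
static void emulate_vec_add(size_t n,
                            const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < n; ++i)
        vec_add_work_item(i, a, b, c);
}
```

In a real host program the source string would typically be passed to `clCreateProgramWithSource` and built with `clBuildProgram`; the loop above only mimics the "one kernel invocation per work-item" model.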


OpenCL

  • Task Parallelism – many kernels running at once

  • OpenCL 1.2 – device can be partitioned down to single Compute Unit

  • Built-in kernels for device-specific functionality


Advantages

  • Same code can be run on different devices

    • Can also be run on NVIDIA GPUs!

  • AMD/ATI attempting to integrate compute elements into other platforms (Accelerated Processing Units)

  • Limited library of portable math routines

    • Most common BLAS and FFT routines





Disadvantages

  • No “official” implementation

  • Vendors may meet specs or add restrictions

    • Apple adds restrictions on group size

  • Devices need appropriate settings to perform well

    • Different capabilities → different performance

    • Solution: Tuning/load balancing framework
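
As a small illustration of per-device settings, the sketch below picks a work-group size and pads the global size to match. It assumes OpenCL 1.x behaviour, where the global work size must be a multiple of the work-group size when one is specified. The helper names are hypothetical, and `device_max` stands for a limit the host would query from the device (for example via `clGetDeviceInfo` with `CL_DEVICE_MAX_WORK_GROUP_SIZE`).

```c
#include <stddef.h>

/* Clamp a preferred work-group size to what the device allows. */
static size_t pick_local_size(size_t preferred, size_t device_max)
{
    return preferred < device_max ? preferred : device_max;
}

/* Round the global size up to the next multiple of the work-group size,
 * so that it divides evenly into work-groups. */
static size_t round_up_global_size(size_t n, size_t local_size)
{
    size_t rem = n % local_size;
    return rem == 0 ? n : n + (local_size - rem);
}
```

When the global size is padded this way, the kernel needs a bounds check (for instance, returning early when the global ID is not below the real element count) so the extra work-items do not touch memory past the end of the buffers.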




Restrictions

  • No recursion, variadics, or function pointers

  • Cannot dynamically allocate memory from device

  • No native variable-length arrays or double-precision floating point

  • Some can be worked around by extensions
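
One common way to live with the "no recursion, no dynamic allocation" rules is to rewrite a recursive traversal as a loop over a fixed-size stack. The sketch below is plain C written in that style, not kernel code from the presentation; `tree_sum_iterative` and `STACK_CAPACITY` are hypothetical names chosen for the example.

```c
#include <stddef.h>

#define STACK_CAPACITY 64  /* fixed bound: nothing is allocated dynamically */

/* Sum all nodes of a binary tree stored in heap layout (the children of
 * node i sit at 2*i + 1 and 2*i + 2) without using recursion.
 * Returns 0 on success, -1 if the fixed-size stack would overflow. */
static int tree_sum_iterative(const int *nodes, size_t count, long *out_sum)
{
    size_t stack[STACK_CAPACITY];
    size_t top = 0;
    long sum = 0;

    if (count > 0)
        stack[top++] = 0;                 /* start at the root */

    while (top > 0) {
        size_t i = stack[--top];          /* visit the most recent node */
        size_t left = 2 * i + 1;
        size_t right = 2 * i + 2;

        sum += nodes[i];

        if (right < count) {
            if (top == STACK_CAPACITY)
                return -1;
            stack[top++] = right;
        }
        if (left < count) {
            if (top == STACK_CAPACITY)
                return -1;
            stack[top++] = left;
        }
    }

    *out_sum = sum;
    return 0;
}
```

The explicit stack replaces the call stack that recursion would use, and its fixed capacity replaces the dynamic allocation a general-purpose version might reach for.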


Terminology

OpenCL term → CUDA equivalent:

  • Stream Core → Scalar Core

  • Compute Unit → Streaming Multiprocessor

  • Wavefront → Warp

  • Intermediate Language → PTX


Terminology

OpenCL term → CUDA equivalent:

  • Host Memory → Host Memory

  • Global Memory → Global/Device Memory

  • Global Memory → Local Memory

  • Constant Memory → Constant Memory

  • Local Memory → Shared Memory

  • Private Memory → Registers
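
To show where these memory regions appear in kernel code, here is a hypothetical OpenCL C kernel stored as a C string, as a host program might hold it. The kernel name `scale` and its parameters are assumptions made for illustration; only the address-space qualifiers and built-in functions are standard OpenCL C.

```c
#include <stddef.h>

/* Hypothetical kernel source showing the OpenCL address-space qualifiers. */
static const char *scale_src =
    "__kernel void scale(__global const float *in,\n"
    "                    __global float *out,\n"
    "                    __constant float *factor,\n"
    "                    __local float *scratch)\n"
    "{\n"
    "    size_t gid = get_global_id(0);\n"
    "    size_t lid = get_local_id(0);\n"
    "    float x = in[gid];            /* x lives in private memory */\n"
    "    scratch[lid] = x * factor[0]; /* staged in local memory */\n"
    "    barrier(CLK_LOCAL_MEM_FENCE);\n"
    "    out[gid] = scratch[lid];      /* written back to global memory */\n"
    "}\n";
```

In CUDA terms, `scratch` would be shared memory and `x` would normally sit in a register.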


Terminology

OpenCL term → CUDA equivalent:

  • NDRange → Grid

  • Work group → Block

  • Work item → Thread

  • Global ID → Thread ID

  • Group ID → Block Index

  • Local ID → Thread Index
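
The relationship between these identifiers can be sketched in plain C for a 1-D NDRange, assuming a zero global offset. The struct and function names are hypothetical; in a kernel the values would come from `get_global_id(0)`, `get_group_id(0)`, `get_local_id(0)`, and `get_local_size(0)`.

```c
#include <stddef.h>

/* Identifiers of one work-item in a 1-D NDRange. */
struct work_item_ids {
    size_t global_id;  /* position within the whole NDRange */
    size_t group_id;   /* which work-group the item belongs to */
    size_t local_id;   /* position inside that work-group */
};

/* Split a global ID into work-group ID and local ID. */
static struct work_item_ids decompose_global_id(size_t global_id,
                                                size_t local_size)
{
    struct work_item_ids ids;
    ids.global_id = global_id;
    ids.group_id = global_id / local_size;
    ids.local_id = global_id % local_size;
    return ids;
}

/* Rebuild the global ID from its parts. */
static size_t compose_global_id(size_t group_id, size_t local_size,
                                size_t local_id)
{
    return group_id * local_size + local_id;
}
```

This mirrors the familiar CUDA expression `blockIdx.x * blockDim.x + threadIdx.x`, which OpenCL exposes directly as the global ID.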


References

  • http://blog.accelereyes.com/blog/wp-content/uploads/2012/02/CUDAvsOpenCL.pdf

  • https://wiki.aalto.fi/download/attachments/40025977/Cuda+and+OpenCL+API+comparison_presented.pdf

  • http://www.hpcwire.com/hpcwire/2012-02-28/opencl_gains_ground_on_cuda.html

  • http://www.netlib.org/utk/people/JackDongarra/PAPERS/parcocudaopencl.pdf

  • http://www.netlib.org/lapack/lawnspdf/lawn228.pdf

