
Graphics Processing Unit

Joshua Reynolds

Ted Gardner



GPUs - Background

Graphics rendering is one of the most obvious examples of an embarrassingly parallel computation

Graphics cards use their own computational unit – the GPU

GPUs have evolved to process graphics in a highly parallel way
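To make "embarrassingly parallel" concrete, here is a minimal C sketch (not taken from the slides) of a per-pixel operation: grayscale conversion of an interleaved RGB image. The function names and the integer luma weights are illustrative assumptions. Each output pixel depends only on its own input pixel, which is why this kind of work maps naturally onto many GPU threads; the loop below simply runs it serially on the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one RGB pixel to an 8-bit luma value. The result depends only on
   this pixel's inputs, so every call is independent of every other call. */
static uint8_t luma(uint8_t r, uint8_t g, uint8_t b) {
    /* Integer approximation of 0.299 R + 0.587 G + 0.114 B (weights sum to 1000). */
    return (uint8_t)((299u * r + 587u * g + 114u * b) / 1000u);
}

/* Apply luma() to every pixel of an interleaved RGB image. No iteration reads
   another iteration's output, so a GPU could run one thread per pixel. */
void to_grayscale(const uint8_t *rgb, uint8_t *gray, size_t num_pixels) {
    assert(rgb != NULL && gray != NULL);
    for (size_t p = 0; p < num_pixels; p++) {
        gray[p] = luma(rgb[3 * p], rgb[3 * p + 1], rgb[3 * p + 2]);
    }
}
```

For example, a pure red pixel (255, 0, 0) maps to 76, and white maps to 255.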



Shaders

  • Shader types

    • Pixel/Fragment, Vertex, and Geometry

    • Under the unified shader model, the same shader hardware can run any of the three shader types

  • Functions

    • Read and write data in buffers

    • Perform arithmetic operations

  • Run in parallel, and a GPU can contain a very large number of them

  • Example - Radeon HD 8xxx generation

    • Radeon HD 8350 has 80 unified shaders

    • Radeon HD 8970 has 2048 unified shaders



Example: NVIDIA Tesla

  • Up to 128 scalar processors

  • 12,000+ concurrent threads in flight

  • 470+ GFLOPS sustained performance

  • Speedups of 100x or better reported for some applications
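The first two figures are consistent with the G80-based Tesla configuration, assuming 16 multiprocessors with 8 scalar processors each and up to 768 resident threads per multiprocessor (values for NVIDIA's G80-generation parts, not stated on the slide):

```latex
\begin{aligned}
16 \times 8 &= 128 && \text{scalar processors (16 multiprocessors, 8 each)}\\
16 \times 768 &= 12{,}288 && \text{resident threads (up to 768 per multiprocessor)}
\end{aligned}
```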



General Purpose Computing on GPU

  • GPUs were originally designed for manipulation of graphics

  • Shaders are programmable and can also process non-graphical data

  • Each shader applies a kernel (a small program) to a set of data, or uses it to generate a set of data

  • Individual shaders are generally slower and more limited than CPU cores, but their parallel nature can give a dramatic speedup
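The idea of a kernel applied across a data set can be sketched in plain C by emulating the launch with a loop over global ids. This is a serial stand-in that assumes a hypothetical `kernel_fn` signature and a hypothetical `launch` helper; real OpenCL or CUDA code would instead hand the kernel to the runtime, which executes the per-id calls concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: one call handles one element, identified by
   its global id, mirroring get_global_id(0) in OpenCL. */
typedef void (*kernel_fn)(size_t global_id, const float *in, float *out, float a);

/* Example kernel: out[id] = a * in[id]. */
static void scale_kernel(size_t global_id, const float *in, float *out, float a) {
    out[global_id] = a * in[global_id];
}

/* Serial stand-in for a GPU launch: run the kernel once per global id.
   On a GPU these calls would execute concurrently across many shaders. */
void launch(kernel_fn k, size_t global_size, const float *in, float *out, float a) {
    assert(k != NULL);
    for (size_t id = 0; id < global_size; id++) {
        k(id, in, out, a);
    }
}
```

Because `scale_kernel` never touches any element other than its own, the order (or simultaneity) of the calls does not affect the result.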



Computational Uses

  • Conway's Game of Life

  • Video encoding/decoding

  • Fluid Simulation

  • N-Body Simulation

  • Fourier Transform

  • Computation of Voronoi Diagrams

  • Cracking UNIX password encryption (on the PixelFlow SIMD graphics computer)

  • Computation of artificial neural networks

  • Bitcoin mining (SHA-256)
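Conway's Game of Life shows why workloads like these suit GPUs: each cell's next state depends only on the previous generation. The following is a minimal C sketch, assuming a row-major grid of 0/1 cells with dead cells beyond the border; `life_step` is a hypothetical name, and the loops over cells stand in for one GPU work-item per cell.

```c
#include <assert.h>
#include <stddef.h>

/* One Game of Life generation on a w x h grid (cells are 0 or 1, row-major).
   Cells outside the grid count as dead. Each output cell depends only on the
   previous generation, so every cell could be a separate GPU work-item. */
void life_step(const unsigned char *cur, unsigned char *next, int w, int h) {
    assert(cur != next); /* must not update in place */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int live = 0;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    if (dx == 0 && dy == 0) continue;
                    int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && nx < w && ny >= 0 && ny < h)
                        live += cur[ny * w + nx];
                }
            }
            unsigned char alive = cur[y * w + x];
            /* Survive with 2 or 3 neighbours; be born with exactly 3. */
            next[y * w + x] = (unsigned char)(live == 3 || (alive && live == 2));
        }
    }
}
```

Reading from one buffer and writing to another (double buffering) is what keeps the cell updates independent; updating in place would let some cells see already-updated neighbours.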



Programming Languages

  • CUDA (C, C++ and Fortran)

    • Third-party wrappers for: Python, Perl, Java, Ruby, Lua, Haskell, MATLAB, IDL, Mathematica

  • OpenCL (C99)

    • Wrappers for: C++, C, Java, C#, Python, Ruby, Perl, Lisp, Haskell, Mathematica, R, MATLAB, Pascal



Vendors

  • CUDA

    • NVIDIA

  • OpenCL

    • NVIDIA

    • AMD

    • Apple

    • Intel

    • IBM

    • Portable OpenCL



Primary Scheduler



Voronoi diagram - Shops



Centroidal Voronoi Tessellation



GPU-Assisted Computation of Centroidal Voronoi Tessellation
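As a rough illustration of the Voronoi and centroidal Voronoi tessellation (CVT) slides, here is a C sketch of a brute-force discrete Voronoi labelling followed by one Lloyd iteration, which moves each site to the centroid of its region (uniform density). This is not the GPU method of the cited paper; the `point` type, the function names and the pixel-centre convention are assumptions. The labelling step is the per-pixel, GPU-friendly part.

```c
#include <assert.h>

/* Hypothetical type for illustration. */
typedef struct { double x, y; } point;

/* Label each pixel of a w x h grid with the index of its nearest site
   (brute force over all sites). Pixels are independent, so this step maps
   well to one GPU thread per pixel. */
void voronoi_labels(const point *sites, int k, int *label, int w, int h) {
    assert(k > 0);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int best = 0;
            double best_d = -1.0;
            for (int s = 0; s < k; s++) {
                double dx = (x + 0.5) - sites[s].x, dy = (y + 0.5) - sites[s].y;
                double d = dx * dx + dy * dy;
                if (best_d < 0.0 || d < best_d) { best_d = d; best = s; }
            }
            label[y * w + x] = best;
        }
}

/* One Lloyd iteration: move each site to the centroid of its pixels
   (uniform density). Sites with no pixels stay where they are. */
void lloyd_step(point *sites, int k, const int *label, int w, int h) {
    for (int s = 0; s < k; s++) {
        double sx = 0.0, sy = 0.0;
        int n = 0;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (label[y * w + x] == s) { sx += x + 0.5; sy += y + 0.5; n++; }
        if (n > 0) { sites[s].x = sx / n; sites[s].y = sy / n; }
    }
}
```

Alternating `voronoi_labels` and `lloyd_step` is Lloyd's method, which moves the sites toward a centroidal Voronoi tessellation.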



Performance Tuning - Optimization

  • Populate all of the multiprocessors with work

  • Keep the cores busy with multithreading, so that some threads can run while others wait on memory

  • Optimize device memory accesses for contiguous data, i.e., favor stride-1 access patterns

  • Use the software-managed data cache (on-chip shared memory) to store intermediate results or to reorganize data that would otherwise require non-stride-1 device memory accesses

  • Overlap CPU computation with kernel execution by taking advantage of asynchronous kernel launches
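The stride-1 advice can be illustrated by counting memory segments. The C sketch below is a simplified model, not a description of any specific GPU: it assumes a group of consecutive threads each issues one load, and that the hardware fetches whole aligned segments. With 32 threads, 4-byte elements and 128-byte segments (illustrative values), stride-1 access touches one segment while stride-32 access touches 32.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct aligned memory segments touched when `threads` consecutive
   threads each load one element of size elem_bytes from base[tid * stride].
   Fewer segments means fewer memory transactions. Segment size and thread-group
   size are parameters because they vary by hardware. */
size_t segments_touched(size_t threads, size_t stride, size_t elem_bytes,
                        size_t segment_bytes) {
    assert(threads > 0 && segment_bytes > 0);
    size_t count = 0;
    size_t last = (size_t)-1;
    for (size_t tid = 0; tid < threads; tid++) {
        size_t seg = (tid * stride * elem_bytes) / segment_bytes;
        /* Addresses grow with tid, so a new segment shows up as a change. */
        if (seg != last) { count++; last = seg; }
    }
    return count;
}
```

Under this model, `segments_touched(32, 2, 4, 128)` returns 2: doubling the stride doubles the traffic even though the useful data is the same.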


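Example Kernel - Prime Number Sieve (OpenCL)

__kernel void composite(int currentPrime, __global char* output){
    // Work-item k marks the k-th multiple of currentPrime, starting at currentPrime squared.
    // The host must choose the global size so that i stays within the output array.
    size_t i = (size_t)currentPrime*currentPrime + (size_t)currentPrime*get_global_id(0);
    output[i] = 'c';
}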

  • The CPU initializes the data as an array of 'P' characters

    • 'P' denotes prime

    • 'c' denotes composite

  • For each prime p, the CPU launches the composite kernel over the array

    • The kernel marks multiples of p as composite

    • get_global_id(0) is the "rank" of the work-item; it is mapped to the index p*p + p*get_global_id(0), so the kernel runs only on multiples of the prime, starting at p*p
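The host/kernel split described above can be simulated on the CPU. In this minimal C sketch, `composite` mirrors the OpenCL kernel with `gid` in place of `get_global_id(0)`, and `sieve` plays the host: it picks a global size of ceil((n - p*p) / p) so every marked index stays below n. Marking 0 and 1 as composite and checking each candidate before launching are choices made here, not details from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* CPU stand-in for the composite kernel: work-item `gid` marks index
   p*p + p*gid, i.e. the gid-th multiple of p starting at p squared. */
static void composite(size_t p, size_t gid, char *output) {
    output[p * p + p * gid] = 'c';
}

/* Host-side loop: mark every entry 'P', then for each prime p with p*p < n,
   "launch" the kernel with just enough work-items to stay inside the array. */
void sieve(char *output, size_t n) {
    assert(n >= 2);
    for (size_t i = 0; i < n; i++) output[i] = 'P';
    output[0] = output[1] = 'c';                      /* 0 and 1 are not prime */
    for (size_t p = 2; p * p < n; p++) {
        if (output[p] != 'P') continue;               /* only sieve with primes */
        size_t global_size = (n - p * p + p - 1) / p; /* ceil((n - p*p) / p) */
        for (size_t gid = 0; gid < global_size; gid++) composite(p, gid, output);
    }
}
```

In a real GPU version the host needs the marked array (or a copy of it) to decide which p values are still prime before each launch. Launching for a composite p would be harmless, since its multiples from p*p upward were already marked by its smallest prime factor, but it wastes work.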



Test - O(n^2)

Description

  • List of n integers, indexed 0 to n-1

  • For each element of the list, compute the sum of all n values and store it in that element's output slot

    • This is deliberately not a good parallel summation algorithm; the goal is only to simulate an O(n^2) workload

  • CPU has 4 cores

  • GPU has 480 unified shaders

  • The same OpenCL kernel runs on both the GPU and the CPU
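The cost of this test, and the best case the hardware could hope for, can be written out. This assumes each of P processing units performs one addition every t seconds and ignores memory traffic and launch overhead (both simplifications):

```latex
\begin{aligned}
W(n) &= \underbrace{n}_{\text{work-items}} \times \underbrace{n}_{\text{additions each}} = n^2,
\qquad T_P(n) \approx \frac{n^2}{P}\, t,\\
\frac{T_{\text{CPU}}}{T_{\text{GPU}}} &\approx \frac{n^2\, t_{\text{CPU}} / 4}{n^2\, t_{\text{GPU}} / 480}
= 120\, \frac{t_{\text{CPU}}}{t_{\text{GPU}}}
\end{aligned}
```

So under this model the 480 shaders beat the 4 CPU cores only if a single shader is less than 120 times slower per addition than a single CPU core; memory bandwidth and overhead shrink that margin in practice.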



Test - O(n^2)

OpenCL kernel

__kernel void sum(__global int* input, __global int* output){
    size_t i = get_global_id(0);
    int out = 0;
    // Every work-item adds up the entire input, so n work-items do n*n additions in total.
    for (size_t j = 0; j < get_global_size(0); j++){
        out += input[j];
    }
    output[i] = out;
}
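To check the kernel's arithmetic without a GPU, here is a serial C emulation (a sketch; `sum_all` is a hypothetical name). The outer loop stands in for the n work-items, n stands in for `get_global_size(0)`, and the returned counter makes the n * n additions explicit.

```c
#include <assert.h>
#include <stddef.h>

/* CPU emulation of the O(n^2) "sum" kernel. Returns the number of additions
   performed so the n*n cost is visible. */
size_t sum_all(const int *input, int *output, size_t n) {
    assert(input != NULL && output != NULL);
    size_t additions = 0;
    for (size_t i = 0; i < n; i++) {       /* one iteration per work-item */
        int out = 0;
        for (size_t j = 0; j < n; j++) {   /* each work-item reads all n inputs */
            out += input[j];
            additions++;
        }
        output[i] = out;
    }
    return additions;
}
```

Every output slot receives the same total, which is why the test is only useful as a synthetic load.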



Test - O(n^2)

Result



Video Example



CUDA vs. OpenCL

  • CUDA

    • More popular

    • Large and mature libraries

    • Slightly faster

    • NVIDIA only

  • OpenCL

    • More flexible synchronization

    • Can enqueue regular CPU function pointers in its command queues

    • Built-in run-time code generation (kernels can be compiled from source at run time)



Sources

http://techreport.com/review/17670/nvidia-fermi-gpu-architecture-revealed/2

http://people.maths.ox.ac.uk/~gilesm/hpc/NVIDIA/NVIDIA_CUDA_Tutorial_No_NDA_Apr08.pdf

Guodong Rong, Yang Liu, Wenping Wang, Xiaotian Yin, X. D. Gu, and Xiaohu Guo, "GPU-Assisted Computation of Centroidal Voronoi Tessellation," IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 3, pp. 345-356, March 2011

http://www.computer.org/csdl/trans/tg/2011/03/ttg2011030345-abs.html

http://www.math.psu.edu/qdu/Res/Pic/gallery3.html

