
CUDA ITK

Won-Ki Jeong

SCI Institute

University of Utah

NVIDIA G80
  • New architecture for computing on the GPU
    • GPU as massively parallel multithreaded machine
      • One step further from streaming model
    • New hardware features
      • Unified shaders (ALUs)
      • Flexible memory access (scatter)
      • Fast user-controllable on-chip memory
      • Integer, bitwise operations
NVIDIA CUDA
  • C-language extension for programming NVIDIA GPUs
    • No graphics API overhead
    • Easy to learn
    • Development tool support
  • Extensions / API
    • Function type : __global__, __device__, __host__
    • Variable type : __shared__, __constant__
    • cudaMalloc(), cudaFree(), cudaMemcpy(),…
    • __syncthreads(), atomicAdd(),…
  • Program types
    • Device program (kernel) : run on the GPU
    • Host program : run on the CPU to call device programs
CUDA ITK
  • ITK powered by CUDA
    • Many registration / image processing functions are still computationally expensive and parallelizable
    • Current ITK parallelization is bound by # of CPUs (cores)
  • Our approach
    • Implement several well-known ITK image filters using NVIDIA CUDA
    • Focus on 3D volume processing
      • CT / MRI datasets are mostly 3D volumes
CUDA ITK
  • CUDA code is integrated into ITK
    • Transparent to ITK users
    • No need to modify current code using ITK
  • Check environment variable ITK_CUDA
    • Entry point : GenerateData() or ThreadedGenerateData()
    • If ITK_CUDA == 0
      • Execute original ITK code
    • If ITK_CUDA == 1
      • Execute CUDA code
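The environment-variable dispatch above could look roughly like the following sketch. It is plain C++ and not taken from the presentation: `cudaRequested` and `generateData` are hypothetical names, and only the variable name `ITK_CUDA` comes from the slides. Treating an unset variable, or any value other than "1", as a request for the original CPU code is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <functional>

// Hypothetical helper: interprets the value of the ITK_CUDA variable.
// Assumption: only the exact string "1" selects the CUDA path; an unset
// variable (nullptr) or any other value falls back to the CPU code.
inline bool cudaRequested(const char* value) {
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical stand-in for a filter's GenerateData(): runs exactly one
// of the two code paths, chosen from the environment at call time.
inline void generateData(const std::function<void()>& cpuPath,
                         const std::function<void()>& cudaPath) {
    assert(cpuPath && cudaPath);
    if (cudaRequested(std::getenv("ITK_CUDA"))) {
        cudaPath();   // ITK_CUDA == 1: execute CUDA code
    } else {
        cpuPath();    // ITK_CUDA == 0 (or unset): execute original ITK code
    }
}
```

Because the check sits inside the filter's entry point, code that calls the filter does not change; the user only sets or clears the variable before running the program.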
ITK image space filters
  • Convolution filters
    • Mean filter
    • Gaussian filter
    • Derivative filter
    • Hessian of Gaussian filter
  • Statistical filter
    • Median filter
  • PDE-based filter
    • Anisotropic diffusion filter
Speed up using CUDA
  • Mean filter : ~ 140x
  • Median filter : ~ 25x
  • Gaussian filter : ~ 60x
  • Anisotropic diffusion : ~ 70x
Convolution filters
  • Separable filter
    • N-dimensional convolution = N successive 1D convolutions
    • For filter radius r,
  • Example
    • 2D Gaussian = 2 * 1D Gaussian
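The separability claim can be checked with a small CPU reference, sketched here in plain C++ rather than CUDA. All function names are illustrative, and the clamp-to-edge boundary rule is an assumption. For radius r, the direct 2D filter costs (2r+1)^2 multiply-adds per pixel, while the two 1D passes cost 2(2r+1).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Image = std::vector<std::vector<double>>;  // row-major: img[y][x]

// Normalized 1D Gaussian with 2r + 1 taps.
inline std::vector<double> gaussian1D(int r, double sigma) {
    assert(r >= 0 && sigma > 0.0);
    std::vector<double> k(2 * r + 1);
    double sum = 0.0;
    for (int i = -r; i <= r; ++i) {
        k[i + r] = std::exp(-0.5 * i * i / (sigma * sigma));
        sum += k[i + r];
    }
    for (double& v : k) v /= sum;
    return k;
}

// Clamp-to-edge pixel lookup (assumed boundary rule).
inline double at(const Image& img, int y, int x) {
    const int h = static_cast<int>(img.size());
    const int w = static_cast<int>(img[0].size());
    return img[std::min(std::max(y, 0), h - 1)][std::min(std::max(x, 0), w - 1)];
}

// One 1D pass along x (dx=1, dy=0) or along y (dx=0, dy=1).
inline Image convolve1D(const Image& in, const std::vector<double>& k, int dx, int dy) {
    const int r = static_cast<int>(k.size()) / 2;
    Image out(in.size(), std::vector<double>(in[0].size(), 0.0));
    for (int y = 0; y < static_cast<int>(in.size()); ++y)
        for (int x = 0; x < static_cast<int>(in[0].size()); ++x)
            for (int i = -r; i <= r; ++i)
                out[y][x] += k[i + r] * at(in, y + i * dy, x + i * dx);
    return out;
}

// Direct 2D convolution with the outer-product kernel k[j] * k[i].
inline Image convolve2D(const Image& in, const std::vector<double>& k) {
    const int r = static_cast<int>(k.size()) / 2;
    Image out(in.size(), std::vector<double>(in[0].size(), 0.0));
    for (int y = 0; y < static_cast<int>(in.size()); ++y)
        for (int x = 0; x < static_cast<int>(in[0].size()); ++x)
            for (int j = -r; j <= r; ++j)
                for (int i = -r; i <= r; ++i)
                    out[y][x] += k[j + r] * k[i + r] * at(in, y + j, x + i);
    return out;
}

// Separable version: a pass along x followed by a pass along y.
inline Image convolveSeparable(const Image& in, const std::vector<double>& k) {
    return convolve1D(convolve1D(in, k, 1, 0), k, 0, 1);
}
```

Calling `convolveSeparable(img, gaussian1D(2, 1.0))` and `convolve2D(img, gaussian1D(2, 1.0))` on the same image should agree up to floating-point rounding.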
GPU implementation
  • Apply 1D convolution along each axis
    • Minimize overlapping
  • [Figure: input (global memory), shared memory, convolution (*) with the kernel, output (global memory)]
Minimize overlapping
  • Usually the kernel width is large (> 20 for Gaussian)
    • Max block size ~ 8x8x8
    • Each pixel has 6 neighbors in 3D
  • Use long and thin blocks to minimize overlapping
  • [Figure: block layouts labelled "Multiple overlapping" and "No overlapping"]
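A back-of-the-envelope check of why long, thin blocks help, as a minimal C++ sketch. `BlockShape` and `loadsPerOutput` are hypothetical names, and the model is an assumption: a block convolving along x with radius r must also load r extra voxels on each side along x, and nothing else is counted.

```cpp
#include <cassert>

// Hypothetical description of a thread block's extent in voxels.
struct BlockShape { int x, y, z; };

// Voxels loaded per voxel written for one 1D pass along x.
// The block needs r halo voxels on each side along x only, so the
// ratio simplifies to (x + 2r) / x.
inline double loadsPerOutput(BlockShape b, int r) {
    assert(b.x > 0 && b.y > 0 && b.z > 0 && r >= 0);
    const double loaded  = static_cast<double>(b.x + 2 * r) * b.y * b.z;
    const double outputs = static_cast<double>(b.x) * b.y * b.z;
    return loaded / outputs;
}
```

With r = 10 (kernel width 21), an 8x8x8 block gives `loadsPerOutput({8, 8, 8}, 10)` = 3.5 loads per output, while a 512x1x1 block with the same number of threads gives about 1.04.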

Median filter
  • Viola et al. [VIS 03]
    • Finding median by bisection of histogram bins
    • log2(# bins) iterations (e.g., 8-bit pixel : 8 iterations)
  • [Figure: histogram over intensities 0 to 7, bisection steps 1 to 4]

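Pseudo code (GPU median filter)

  Copy current block from global to shared memory

  min = 0;
  max = 255;
  pivot = (min + max) / 2.0f;
  for (i = 0; i < 8; i++)                // 8 bisection steps for 8-bit pixels
  {
    count = 0;
    for (j = 0; j < kernelsize; j++)     // count neighbors above the pivot
    {
      if (kernel[j] > pivot) count++;
    }
    // real-valued division, so an odd kernelsize is split correctly
    if (count < kernelsize / 2.0f) max = floor(pivot);   // median is at or below pivot
    else min = ceil(pivot);                              // median is above pivot
    pivot = (min + max) / 2.0f;
  }
  return floor(pivot);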
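The pseudo code above can be turned into a CPU reference for testing, sketched here in plain C++. The function name `bisectionMedian` is illustrative, and an odd, non-empty neighborhood is assumed so that the median is a single element. The comparison `2 * count < size` is the integer-safe form of `count < kernelsize / 2` with real-valued division.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Median of 8-bit values by bisecting the intensity range [0, 255].
// Each pass halves the candidate range, so 8 passes isolate one value.
inline int bisectionMedian(const std::vector<std::uint8_t>& values) {
    assert(!values.empty() && values.size() % 2 == 1);
    float lo = 0.0f, hi = 255.0f;
    float pivot = (lo + hi) / 2.0f;
    for (int pass = 0; pass < 8; ++pass) {
        std::size_t count = 0;                 // neighbors strictly above the pivot
        for (std::uint8_t v : values) {
            if (v > pivot) ++count;
        }
        if (2 * count < values.size()) {
            hi = std::floor(pivot);            // median is at or below the pivot
        } else {
            lo = std::ceil(pivot);             // median is above the pivot
        }
        pivot = (lo + hi) / 2.0f;
    }
    return static_cast<int>(std::floor(pivot));
}
```

The loop body has no data-dependent memory access and a fixed trip count, which is what makes the scheme attractive for one thread per output pixel.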

Perona & Malik anisotropic PDE
  • Nonlinear diffusion
    • Fall-off function c (conductance) controls anisotropy
    • Less smoothing across high gradient
    • Contrast parameter k
  • Numerical solution
    • Euler explicit integration (iterative method)
    • Finite difference for derivative computation
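The slide's formulas did not survive in the transcript. For reference, the standard Perona & Malik formulation is sketched below. The exponential fall-off function shown is one of the two proposed in their paper and is an assumption here; the presentation may have used a different one. The symbols I (image), Δt (time step), and n (iteration index) are notation introduced for this sketch.

```latex
% Nonlinear diffusion with conductance c and contrast parameter k
\frac{\partial I}{\partial t} = \nabla \cdot \bigl( c(\lVert \nabla I \rVert)\, \nabla I \bigr),
\qquad
c(\lVert \nabla I \rVert) = \exp\!\left( -\left( \frac{\lVert \nabla I \rVert}{k} \right)^{2} \right)

% Explicit Euler integration, iterated for n = 0, 1, 2, \dots
I^{\,n+1} = I^{\,n} + \Delta t \; \nabla \cdot \bigl( c(\lVert \nabla I^{\,n} \rVert)\, \nabla I^{\,n} \bigr)
```

Where the gradient magnitude is much larger than k, c approaches 0 and diffusion across the edge is suppressed; where it is much smaller, c approaches 1 and the update behaves like ordinary linear smoothing.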
Gradient & Conductance map
  • Gradients and conductance at half-pixel positions in the x / y / z directions for each pixel
  • 2D example
    • For an n^2 block, 4(n+1)^2 + (n+2)^2 shared memory elements are required
  • [Figure: global memory and shared memory layouts, labelled n*n, (n+2)*(n+2), and (n+1)*(n+1) * 4 (grad x, grad y, cond x, cond y)]
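The shared-memory requirement above is easy to tabulate, as in this small C++ sketch. The function names are hypothetical, and 4-byte float elements are an assumption, since the slides do not state the storage type.

```cpp
#include <cassert>
#include <cstddef>

// Shared-memory elements for an n x n block in the 2D scheme:
// four (n+1)^2 maps (grad x, grad y, cond x, cond y) plus an (n+2)^2 tile.
inline std::size_t sharedElements2D(std::size_t n) {
    assert(n > 0);
    return 4 * (n + 1) * (n + 1) + (n + 2) * (n + 2);
}

// Same requirement in bytes; float storage is an assumption.
inline std::size_t sharedBytes2D(std::size_t n,
                                 std::size_t bytesPerElement = sizeof(float)) {
    return sharedElements2D(n) * bytesPerElement;
}
```

For n = 16 this gives 1480 elements, or 5920 bytes with 4-byte elements, which fits within the 16 KB of shared memory per multiprocessor on G80-class hardware.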

Euler integration
  • Use pre-computed gradients and conductance
    • Each gradient / conductance is used twice
    • Avoid redundant computation by using pre-computed gradient / conductance map
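The reuse of pre-computed values can be illustrated in 1D with a CPU sketch in plain C++. It is not the presentation's kernel: `diffuseStep` is a hypothetical name, the exponential fall-off function and the zero-flux boundaries are assumptions, and a CUDA version would keep `grad` and `cond` in shared memory instead of in vectors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One explicit Euler step of 1D Perona-Malik diffusion.
// The gradient and conductance at each half-point i + 1/2 are computed
// once and then used twice: by pixel i (right side) and pixel i + 1 (left side).
inline std::vector<double> diffuseStep(const std::vector<double>& u,
                                       double k, double dt) {
    assert(u.size() >= 2 && k > 0.0 && dt > 0.0);
    const std::size_t n = u.size();
    std::vector<double> grad(n - 1), cond(n - 1);
    for (std::size_t i = 0; i + 1 < n; ++i) {
        grad[i] = u[i + 1] - u[i];                            // gradient at i + 1/2
        cond[i] = std::exp(-(grad[i] / k) * (grad[i] / k));   // assumed fall-off function
    }
    std::vector<double> out(u);
    for (std::size_t i = 0; i < n; ++i) {
        // Flux through each side of pixel i; zero at the ends (assumed boundary).
        const double right = (i + 1 < n) ? cond[i] * grad[i] : 0.0;
        const double left  = (i > 0) ? cond[i - 1] * grad[i - 1] : 0.0;
        out[i] = u[i] + dt * (right - left);
    }
    return out;
}
```

Because every interior flux is added to one pixel and subtracted from its neighbor, the total intensity is preserved by each step, which makes a convenient sanity check.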
Experiments
  • Test environment
    • CPU : AMD Opteron Dual Core 1.8GHz
    • GPU : Tesla C870
  • Input volume is 128^3
Result
  • Mean filter
  • Gaussian filter
Result
  • Median filter
  • Anisotropic diffusion
Summary
  • ITK powered by CUDA
    • Image space filters using CUDA
    • Up to 140x speed up
  • Future work
    • GPU image class for ITK
      • Reduce CPU to GPU memory I/O
      • Pipelining support
    • Image registration
    • Numerical library (vnl)
    • Out-of-GPU-core processing
      • Seismic volumes (~10s to 100s GB)