1 / 62

GPU Acceleration in Registration

GPU Acceleration in Registration. Danny Ruijters 26 April 2007. Outline. The GPU Rigid 3D-3D Registration Elastic Registration Conclusions. The GPU. The graphics card. Raserization of primitives Texture mapping Colour interpolation. The GPU. Graphics Processing Unit

yeo-butler
Download Presentation

GPU Acceleration in Registration

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. GPU Acceleration in Registration Danny Ruijters 26 April 2007

  2. Outline • The GPU • Rigid 3D-3D Registration • Elastic Registration • Conclusions

  3. The GPU

  4. The graphics card • Raserization of primitives • Texture mapping • Colour interpolation

  5. The GPU • Graphics Processing Unit • Programmable processor in the graphics rendering pipeline • Parallel execution (SIMD like)

  6. video memory on-chip cache memory vertex shading (T&L) pre-TnL cache geometry system memory commands post-TnL cache CPU triangle setup rasterization texture cache fragment shading and raster operations textures frame buffer Graphics rendering pipeline

  7. video memory on-chip cache memory transfer limited vertex shading (T&L) pre-TnL cache transform limited geometry system memory commands post-TnL cache setup limited CPU triangle setup texture limited raster limited rasterization CPU limited fragment shader limited fragment shading and raster operations texture cache textures frame buffer frame buffer limited Bottlenecks

  8. 128 processing units Local cache Shared memory

  9. Performance

  10. Performance • Parallelism & pipelining (up to 16 parallel pipelines) • Vector processor • Moore’s Law: CPU: 2* performance per 18 months • GPU: 2* performance per 6 months

  11. Textures & buffers • 1D, 2D, 3D textures • 2D output buffers (frame buffer, accumulation buffer, stencil buffer, p-buffer) • 8, 10, 12, 16 bit integers, 16, 32 bit floating point • 1 (intensity), 2 (luminance-alpha), 3 (RGB), 4 (RGBA) components per pixel

  12. Historic overview GPU • RenderMan (1988, pre-history) • Intel MMX (SIMD, 1997, pre-history) • Register combiners (nVidia, 1999, bronze age) • Vender specific APIs (2001, iron age) • Generic assembly-like language (2002, middle-ages) • Different high-level languages (2003, industrial age) • CUDA: general purpose C-like language (2007, modern age)

  13. Register combiners (1999, bronse age) // Stage 0 // spare0.rgb = gradient dot ViewDir, spare1.rgb = -(gradient dot ViewDir) glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_A_NV,GL_TEXTURE0_ARB,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_B_NV,GL_CONSTANT_COLOR1_NV,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_C_NV,GL_TEXTURE0_ARB,GL_EXPAND_NEGATE_NV,GL_RGB); glCombinerInputNV(GL_COMBINER0_NV,GL_RGB,GL_VARIABLE_D_NV,GL_CONSTANT_COLOR1_NV,GL_EXPAND_NORMAL_NV,GL_RGB); glCombinerOutputNV(GL_COMBINER0_NV,GL_RGB,GL_SPARE0_NV,GL_SPARE1_NV,GL_DISCARD_NV,GL_NONE,GL_NONE,GL_TRUE,GL_TRUE,GL_FALSE);

  14. GL_ARB_fragment_program (2002) !!ARBfp1.0 ATTRIB coord = fragment.texcoord[0]; ATTRIB color = fragment.color; OUTPUT out = result.color; TEMP texel; TEMP lookup; TEX texel, coord, texture[0], 3D; TEX lookup, texel, texture[1], 1D; MUL out, lookup, color; END

  15. GLSlang (2003) uniform vec3 ViewDir; void main (void) { floatvalue; vec3 gradient; gradient = texture3(0, gl_TexCoord0) * 2.0 - 1.0; value = 1.0 - abs(dot(gradient, ViewDir)); value *= 1.3 * dot(gradient, gradient); value = clamp(value, 0.0, 1.0); gl_FragColor = vec4(value); }

  16. CUDA (2007) • Compute Unified Device Architecture • General purpose C-like language • nVidia only • Very recently released

  17. Rigid 3D-3D Registration

  18. 3DRA – MR registration

  19. 3DRA – XperCT Registration 1 Pre-operative

  20. 3DRA – XperCT Registration 2 Post-operative: verification of the embolization

  21. 3DRA Slice

  22. Mutual information F. Maes et al., "Multimodality Image Registration by Maximization of Mutual Information,“ IEEE Transactions on Medical Imaging 16(2), pp. 187-198, April 1997

  23. Joint histogram

  24. Resampling Joint histogram: increment(g,g)

  25. 3DRA – MR, before, after

  26. 3DRA – MR: CPU interpolation

  27. 3DRA – MR: GPU interpolation

  28. Elastic Registration

  29. Elastic deformation • Parameterized deformation: • B-spline deformation:

  30. Cubic B-spline

  31. GPU linear interpolation • Hardwired: linear interpolation is much faster than separate lookups

  32. = GPU Cubic Interpolation • Compose cubic interpolation from weighted sum of linear interpolations: C. Sigg, M. Hadwiger, “Fast Third-Order Texture Filtering”, GPU Gems 2

  33. = Outline of proof

  34. GPU Cubic Interpolation • 2D: 4 linear-interpolated lookups, instead of 16 direct lookups • 3D: 8 linear-interpolated lookups, instead of 64 direct lookups

  35. GPU Linear Interpolation Accuracy

  36. Linear deformation, linear interpolation

  37. Linear deformation, cubic interpolation

  38. Cubic deformation, linear interpolation

  39. Cubic deformation, cubic interpolation

  40. Optimization • Many parameters: huge parameter space • Solution: use derivatives like Jacobian, Hessian • Examples: Gradient Descent, Quasi-Newton, Levenberg-Marquardt

  41. GPU Elastic Registration Iteration • Generate deformed image on GPU & store to texture • Calculate Similarity Measure & First-Order Derivative on GPU • Texture with reference image • Texture with deformed image

  42. First-Order Derivative of Sim. Measure J. Kybic, M. Unser, “Fast Parametric Elastic Image Registration”

  43. Derivative of the Similarity Measure SSD:

  44. Derivative of the Deformed Image • Sobel operator to calculate gradients:

  45. Derivative of the Control Points • Constant • B-spline: separatable kernel of fixed size

  46. Original Fluoroscopy Sequence

  47. 2 * 2 Control Points

  48. 8 * 8 Control Points

  49. Deformation Field

  50. GPU Elastic Registration • 40 images: Quasi Newton: 16 seconds • Gradient Descent: 63 seconds • 8 * 8 Control Points: rest motion • Multi-resolution deformation field, with reduced parameters (discussed with Dirk Loeckx)

More Related