Implementing Memory & Run Time Efficient Image Texture Classification using NVIDIA GPU


Presentation Transcript


  1. Implementing Memory & Run Time Efficient Image Texture Classification using NVIDIA GPU (Shreyas Parnerkar)

  2. Motivation
     • Texture analysis is important in many applications of computer image analysis, where images are classified or segmented based on local spatial variations of intensity or color.
     • Applications include industrial and biomedical surface inspection (for example, for defects and disease), segmentation of satellite or aerial imagery, and segmentation of textured regions in document analysis.
     • Most texture classification methods derive features from the output of large filter banks (13- to 48-dimensional feature spaces).

  3. Motivation
     • Tuzel et al. use image intensities and the first- and second-order derivatives of intensity in both the x and y directions for texture classification, which results in a 5-dimensional feature space.
     • These features are used to calculate covariance matrices using integral images (P and Q).
     • Calculating the integral images is computationally intensive because of the highly nested loops.

  4. Algorithm: Integral Image Calculations
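
The slide's figure is not reproduced in this transcript. As a rough sequential reference, the following is a minimal sketch of how the P and Q integral images might be computed, assuming the usual definitions: P holds running sums of each feature and Q holds running sums of each pairwise feature product, both over the rectangle from the image origin to (x, y) inclusive. The names `integral_images` and `_prior` and the nested-list layout are illustrative choices, not taken from the presentation.

```python
def _prior(T, y, x, pick):
    """Sum already accumulated above and to the left of (x, y).

    Uses inclusion-exclusion on the three previously computed neighbours.
    """
    up = pick(T[y - 1][x]) if y > 0 else 0.0
    left = pick(T[y][x - 1]) if x > 0 else 0.0
    diag = pick(T[y - 1][x - 1]) if y > 0 and x > 0 else 0.0
    return up + left - diag


def integral_images(F):
    """Compute P and Q from a feature image F, where F[y][x] is a list of d values.

    P[y][x][i]    = sum of F[v][u][i]              for u <= x, v <= y
    Q[y][x][i][j] = sum of F[v][u][i] * F[v][u][j] for u <= x, v <= y
    """
    H, W, d = len(F), len(F[0]), len(F[0][0])
    P = [[[0.0] * d for _ in range(W)] for _ in range(H)]
    Q = [[[[0.0] * d for _ in range(d)] for _ in range(W)] for _ in range(H)]
    # Four nested loops (rows, columns, feature i, feature j): this is the
    # "highly nested" cost the motivation slides refer to.
    for y in range(H):
        for x in range(W):
            f = F[y][x]
            for i in range(d):
                P[y][x][i] = f[i] + _prior(P, y, x, lambda c: c[i])
                for j in range(d):
                    Q[y][x][i][j] = f[i] * f[j] + _prior(Q, y, x, lambda c: c[i][j])
    return P, Q
```

For a 5-dimensional feature space, each pixel updates 5 entries of P and 25 entries of Q (15 if the symmetry of Q is exploited, which this sketch does not do).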

  5. Dependence Graph [Figure: dependence graph; axes: rows, columns]

  6. GPU Utilization Concerns
     • Such scheduling allows at most W or H elements to be executed in parallel.
     • At all other steps, the number of parallel elements is below that maximum.
     • GPU utilization drops, causing a slowdown because many threads sit idle.
     • Such scheduling is therefore a poor fit for a GPU implementation.
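
The utilization argument above can be illustrated with a small calculation. This sketch assumes the dependence graph is the common one in which each element waits on its left and upper neighbours, so that all elements on one anti-diagonal can run in the same step; the figure itself is not available here, so that scheduling is an assumption. The function names are hypothetical.

```python
def wavefront_widths(W, H):
    """Number of elements that can run in parallel at each wavefront step.

    Step k processes every (x, y) with x + y == k inside a W x H image.
    """
    return [min(k, W - 1) - max(0, k - H + 1) + 1 for k in range(W + H - 1)]


def average_utilization(W, H):
    """Fraction of thread slots doing useful work when enough threads are
    launched to cover the widest step and kept for every step."""
    widths = wavefront_widths(W, H)
    return sum(widths) / (max(widths) * len(widths))


if __name__ == "__main__":
    print(wavefront_widths(4, 3))      # widths ramp up, plateau, then ramp down
    print(average_utilization(4, 3))
```

Under this assumption the widest step holds min(W, H) elements, and every step during the ramp-up and ramp-down leaves part of the launched threads idle.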

  7. Memory Concerns
     • Shared memory is limited to 4 kB, so the entire image cannot be placed in shared memory.
     • Global memory is slow compared to shared memory.
     • Uploading the entire image to global memory causes interference with the graphics display (unconfirmed).
     • Put just the required data in shared memory.
     • The required data can be the entire image.

  8. Updated Dependence Graph [Figure: two graphs over rows and columns, combined (+) into a single result (=)]
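
One plausible reading of the "rows + columns" figure is the standard two-pass formulation of an integral image: a prefix sum along every row, followed by a prefix sum along every column. This is an assumption about what the slide shows, not a confirmed description of the author's kernel. The sketch below is sequential Python with illustrative function names; the point is that within each pass, every row (or column) is independent of the others and could be handled by its own thread.

```python
def row_pass(img):
    """Running sum along each row. Rows do not depend on one another."""
    out = []
    for row in img:
        acc, sums = 0, []
        for v in row:
            acc += v
            sums.append(acc)
        out.append(sums)
    return out


def column_pass(img):
    """Running sum down each column. Columns do not depend on one another."""
    H, W = len(img), len(img[0])
    out = [row[:] for row in img]
    for x in range(W):
        for y in range(1, H):
            out[y][x] += out[y - 1][x]
    return out


def integral_two_pass(img):
    """Integral image as a row pass followed by a column pass."""
    return column_pass(row_pass(img))


if __name__ == "__main__":
    print(integral_two_pass([[1, 2, 3], [4, 5, 6]]))
```

With this split, H rows can be processed in parallel during the first pass and W columns during the second, instead of the varying wavefront width of the original schedule.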

  9. Results

  10. Results: CPU Overhead

  11. Results

  12. Yet to come… Scope to improve the speed-up

  13. In Conclusion…
     • Implement parallel reduction for further speed-up (in progress).
     • Use the calculated P and Q integral images to compute the covariance (can be done on the CPU).
     • Read data from actual images (currently, random sample data is generated).
     • Compare memory usage for the CPU vs. GPU implementations.
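
The covariance step listed above can be sketched on the CPU as follows. The sketch assumes P[y][x][i] and Q[y][x][i][j] hold inclusive running sums of the features and of their pairwise products, and it uses the standard sample-covariance identity C = (Q_R - p p^T / n) / (n - 1) over a rectangular region R with n pixels. The function name, argument order, and nested-list layout are hypothetical.

```python
def region_covariance(P, Q, x0, y0, x1, y1):
    """Covariance matrix of the features inside the rectangle
    [x0, x1] x [y0, y1] (inclusive bounds), using four lookups per entry.

    Requires a region of at least two pixels.
    """
    d = len(P[0][0])
    n = (x1 - x0 + 1) * (y1 - y0 + 1)

    def box(T, pick):
        # Inclusion-exclusion over the four corners of the region.
        total = pick(T[y1][x1])
        if y0 > 0:
            total -= pick(T[y0 - 1][x1])
        if x0 > 0:
            total -= pick(T[y1][x0 - 1])
        if x0 > 0 and y0 > 0:
            total += pick(T[y0 - 1][x0 - 1])
        return total

    # First-order sums of each feature over the region.
    p = [box(P, lambda c, i=i: c[i]) for i in range(d)]
    # Second-order sums, corrected by the mean term and normalised.
    return [
        [(box(Q, lambda c, i=i, j=j: c[i][j]) - p[i] * p[j] / n) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]
```

Because each entry needs only corner lookups, the cost per region is independent of the region's size once P and Q are available, which is why leaving this step on the CPU may be acceptable.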