Presentation Transcript

  1. Speeding up k-Means by GPUs • You Li • Supervisor: Dr. Chu Xiaowen • Co-supervisor: Prof. Liu Jiming • Thursday, March 11, 2010

  2. Outline • Introduction • Efficiency of data mining -> GPGPU -> k-means on GPU; • Related work • Method • Research Plan

  3. Efficiency of Data Mining • Data mining faces an efficiency challenge as the volume of data keeps increasing; • Parallel data mining is one response. (Fig. 1, Fig. 2)

  4. GPGPU • A general-purpose, high-performance parallel hardware platform; • Supplies another platform for parallelizing data mining algorithms. (Fig. 3: CPU vs. GPU layout of control logic, ALUs, cache and DRAM)

  5. k-Means on GPU • Programming on GPU • CUDA: integrated CPU+GPU programming in C • k-Means • Widely used in statistical data analysis, pattern recognition, etc.; • Easy to implement on a CPU and well suited to a GPU implementation.

  6. Outline • Introduction • Related work • UV_k-Means, GPUMiner and HP_k-Means; • Method • Research Plan

  7. Related work • Speed of k-Means on low-dimensional data, in seconds. • Hardware: NVIDIA GTX 280 GPU; Intel(R) Core(TM) i5 CPU.

  8. Outline • Introduction • Related work • Method and Results • k-Means(three steps)-> step 1 -> step 2 -> step 3; • Experiments; • Research Plan

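  9. k-Means algorithm • n data points; k centroids; d dimensions • Step 1, O(nkd): compute the distance between each data point n_i and each centroid k_j • Step 2, O(nk): find the closest centroid for each data point • Step 3, O(nd): compute the new centroids • If any centroid changed, repeat from Step 1; otherwise end.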
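
The three steps on this slide can be sketched on the CPU as follows. This is a minimal illustration in plain Python, not the presenters' CUDA code; the function names `squared_distance` and `kmeans`, the `max_iter` cap and the sample data are assumptions made for the example.

```python
# Minimal CPU sketch of the three k-means steps (illustrative names only).
# n points, k centroids, d dimensions.

def squared_distance(p, c):
    """Squared Euclidean distance between a point and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, c))

def kmeans(points, centroids, max_iter=100):
    k = len(centroids)
    d = len(points[0])
    labels = []
    for _ in range(max_iter):
        # Step 1, O(nkd): distance from every point to every centroid.
        dist = [[squared_distance(p, c) for c in centroids] for p in points]
        # Step 2, O(nk): index of the closest centroid for each point.
        labels = [min(range(k), key=row.__getitem__) for row in dist]
        # Step 3, O(nd): average the points assigned to each centroid.
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p, j in zip(points, labels):
            counts[j] += 1
            for t in range(d):
                sums[j][t] += p[t]
        new_centroids = [
            [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(k)
        ]
        # Stop when no centroid changed; otherwise repeat from Step 1.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical sample data: two well-separated groups in 2-D.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = kmeans(points, [[0.0, 0.0], [10.0, 10.0]])
```

Step 1 dominates the cost, which is why the GPU work in the following slides concentrates on how the distance computation accesses memory.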

  10. Memory Mechanism of GPU • Global memory: large size; long latency • Register: small size; short latency; not under user control • Shared memory: medium size; short latency; under user control

  11. k-Means on GPU • Key idea • Increase the number of computing operations performed per global memory access; • Adopt the methods used for matrix multiplication and reduction. • Dimension is a key parameter • For low dimension: use registers; • For high dimension: use shared memory.

  12. k-Means on GPU • For low dimension • Read each data point from global memory only once

  13. k-Means on GPU • For high dimension • Read each data point from global memory only once
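
To see why reading each data point only once matters, here is a toy count of global memory reads for the distance step. It is only a rough model under stated assumptions, not a measurement and not the presenters' kernel: one "read" is one coordinate fetched from global memory, each thread block stages all k centroids on-chip once, and the names `naive_reads`, `tiled_reads`, `ops_per_read` and the problem sizes are hypothetical.

```python
# Toy model of global-memory traffic for the distance step (Step 1).
# Assumption: one "read" = one coordinate fetched from global memory.

def naive_reads(n, k, d):
    # Every point-centroid distance re-fetches both vectors from global memory.
    return n * k * 2 * d

def tiled_reads(n, k, d, block_size):
    # Each point is fetched once and kept on-chip (registers or shared memory).
    point_reads = n * d
    # Assumption: each thread block stages all k centroids on-chip once.
    num_blocks = -(-n // block_size)  # ceiling division
    centroid_reads = num_blocks * k * d
    return point_reads + centroid_reads

def ops_per_read(n, k, d, reads):
    # Subtract, multiply and add per coordinate: 3 operations.
    return 3 * n * k * d / reads

# Hypothetical problem size and block size.
n, k, d, block = 1_000_000, 100, 8, 256
naive = naive_reads(n, k, d)
tiled = tiled_reads(n, k, d, block)
print("naive reads:", naive, "ops/read:", ops_per_read(n, k, d, naive))
print("tiled reads:", tiled, "ops/read:", ops_per_read(n, k, d, tiled))
```

In this model the on-chip variant performs the same arithmetic with far fewer global reads, which is the sense in which the number of operations per global memory access goes up.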

  14. Experiments • The experiments were conducted on a PC with an NVIDIA GTX 280 GPU and an Intel(R) Core(TM) i5 CPU. • The GTX 280 has 30 SIMD multiprocessors; each contains eight processors and runs at 1.29 GHz. The GPU has 1 GB of memory with a peak bandwidth of 141.7 GB/sec. • The CPU has four cores running at 2.67 GHz. The main memory is 8 GB with a peak bandwidth of 5.6 GB/sec. • All source code was written and compiled with Visual Studio 2008; the CUDA version is 2.3. • Running time is measured after file I/O has completed, so that the reported speedup reflects the computation itself.
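
The hardware figures above imply two simple ratios, worked out below as a back-of-the-envelope check. The variable names are illustrative, and the numbers are the peak values quoted on the slide, not measured throughput.

```python
# Back-of-the-envelope arithmetic from the quoted hardware specifications.

gpu_multiprocessors = 30
processors_per_multiprocessor = 8
gpu_processors = gpu_multiprocessors * processors_per_multiprocessor

gpu_peak_bandwidth = 141.7  # GB/sec, as quoted for the GTX 280
cpu_peak_bandwidth = 5.6    # GB/sec, as quoted for the host main memory
bandwidth_ratio = gpu_peak_bandwidth / cpu_peak_bandwidth

print("GPU processors:", gpu_processors)
print("GPU/CPU peak bandwidth ratio: %.1f" % bandwidth_ratio)
```

The GPU therefore offers 240 scalar processors and roughly 25 times the peak memory bandwidth of the host.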

  15. Experiments • On low-dimensional data • Compared with HP, UV and GPUMiner; the data is randomly generated. • Result: four to ten times faster than HP

  16. Experiments • On high-dimensional data • Compared with UV and GPUMiner; the data is from KDD 1999. • Result: four to eight times faster than UV

  17. Experiments • Compared with CPU • The results illustrate that our algorithm compares very favorably with other existing algorithms. • Result: forty to two hundred times faster than the CPU version

  18. Outline • Introduction • Related work • Method • Research Plan

  19. Research Plan • Detailed analysis of k-Means on GPU • GFLOPS • Handling even larger data sets • Other data mining algorithms on GPU • k-NN • SDP (widely used in protein identification)

  20. Q & A • Thanks very much