
Enhancing k-Means Algorithm Efficiency through GPU Acceleration Techniques

This research explores the acceleration of the k-Means clustering algorithm using General-Purpose Graphics Processing Units (GPGPU). It addresses the efficiency challenges faced in data mining as datasets grow larger and more complex. This study details methods for implementing k-Means parallelization on GPUs using CUDA programming, comparing its performance against traditional CPU implementations. Experimental results show that GPU-optimized k-Means can achieve processing times 40 to 200 times faster than CPU-based methods, significantly improving data analysis efficiency.



Presentation Transcript


  1. Speeding up k-Means by GPUs • You Li • Supervisor: Dr. Chu Xiaowen • Co-supervisor: Prof. Liu Jiming • Thursday, March 11, 2010

  2. Outline • Introduction • Efficiency of data mining -> GPGPU -> k-means on GPU; • Related work • Method • Research Plan

  3. Efficiency of Data Mining • Data mining faces an efficiency challenge as datasets keep growing; • Parallel data mining is one way to address it. (Fig. 1, Fig. 2)

  4. GPGPU • General-purpose, high-performance parallel hardware; • Provides another platform for parallelizing data mining algorithms. (Fig. 3: CPU vs. GPU — the CPU devotes chip area to control logic and cache, the GPU to many ALUs; each has its own DRAM.)

  5. k-Means on GPU • Programming on GPU • CUDA: integrated CPU+GPU programming in C; • k-Means • Widely used in statistical data analysis, pattern recognition, etc.; • Easy to implement on CPU, and well suited to implementation on GPU.
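The phrase "integrated CPU+GPU C program" refers to CUDA's single-source model: ordinary C runs on the host, while functions marked `__global__` run on the GPU and are launched with the `<<<blocks, threads>>>` syntax. The tiny program below is only a sketch of that model; the kernel, names and sizes are illustrative, not taken from the presentation.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Device code: each thread scales one element of the array.
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Host code: ordinary C that allocates GPU memory, copies data, and launches the kernel.
int main(void)
{
    const int n = 1024;
    float h[1024];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc(&d, n * sizeof(float));                        // global (device) memory
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);              // CPU launches GPU work
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    printf("h[10] = %.1f\n", h[10]);                          // prints 20.0
    return 0;
}
```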

  6. Outline • Introduction • Related work • UV_k-Means, GPUMiner and HP_k-Means; • Method • Research Plan

  7. Related work • Speed of k-Means on low-dimensional data, in seconds; • Hardware: NVIDIA GTX 280 GPU; Intel(R) Core(TM) i5 CPU.

  8. Outline • Introduction • Related work • Method and Results • k-Means (three steps) -> step 1 -> step 2 -> step 3; • Experiments; • Research Plan

  9. k-Means algorithm • Input: n data points, k centroids; • Step 1, O(nkd): compute the distance between each data point and each centroid; • Step 2, O(nk): find the closest centroid for each point; • Step 3, O(nd): compute the new centroids; • Repeat while any centroid changes, otherwise end. (The flowchart also references the GPU memory mechanism, covered on the next slide.)
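For reference, the three steps and the convergence test from the flowchart can be written out as a sequential C routine (compilable in a .cu file and usable as a correctness check for the GPU kernels). The row-major layouts `points[n*d]`, `centroids[k*d]` and the function name are assumptions of this sketch, not the authors' code.

```cuda
#include <float.h>
#include <string.h>
#include <stdbool.h>

// One k-Means iteration; returns true if any centroid changed.
// sums (k*d floats) and counts (k ints) are caller-provided scratch buffers.
static bool kmeans_iteration(const float *points, float *centroids, int *labels,
                             float *sums, int *counts, int n, int k, int d)
{
    // Steps 1 and 2 (O(nkd) + O(nk)): distance to every centroid, keep the closest.
    for (int i = 0; i < n; ++i) {
        float best_dist = FLT_MAX; int best = 0;
        for (int c = 0; c < k; ++c) {
            float dist = 0.0f;
            for (int j = 0; j < d; ++j) {
                float diff = points[i * d + j] - centroids[c * d + j];
                dist += diff * diff;
            }
            if (dist < best_dist) { best_dist = dist; best = c; }
        }
        labels[i] = best;
    }

    // Step 3 (O(nd)): recompute each centroid as the mean of its assigned points.
    memset(sums, 0, sizeof(float) * k * d);
    memset(counts, 0, sizeof(int) * k);
    for (int i = 0; i < n; ++i) {
        counts[labels[i]]++;
        for (int j = 0; j < d; ++j) sums[labels[i] * d + j] += points[i * d + j];
    }

    bool changed = false;                        // "If centroid change?" from the flowchart
    for (int c = 0; c < k; ++c) {
        if (counts[c] == 0) continue;            // leave empty clusters' centroids in place
        for (int j = 0; j < d; ++j) {
            float m = sums[c * d + j] / counts[c];
            if (m != centroids[c * d + j]) changed = true;
            centroids[c * d + j] = m;
        }
    }
    return changed;
}
```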

  10. Memory Mechanism of GPU • Global memory: large size, long latency; • Registers: small size, short latency, not controllable by the user; • Shared memory: medium size, short latency, controlled by the user.
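A small hypothetical kernel that touches all three memory spaces listed above (assuming a block size of 256 threads): the input and output arrays live in global memory, the scalar `x` lives in a register allocated by the compiler, and `tile` is shared memory managed explicitly by the programmer.

```cuda
__global__ void memory_spaces_demo(const float *in, float *out, int n)
{
    __shared__ float tile[256];        // shared memory: per-block, short latency, user-controlled
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    float x = (i < n) ? in[i] : 0.0f;  // global memory read; x itself sits in a register
    tile[threadIdx.x] = x;             // stage the value in shared memory
    __syncthreads();                   // make the tile visible to the whole block

    if (i < n)
        out[i] = tile[threadIdx.x];    // write the result back to global memory
}
```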

  11. k-Means on GPU • Key idea • Increase the number of computing operations per global memory access; • Adopt techniques from matrix multiplication and parallel reduction. • Dimensionality is a key parameter • For low dimensions: use registers; • For high dimensions: use shared memory.
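The reduction part of that idea can be sketched as the classic shared-memory tree reduction (useful, for example, when summing point coordinates per cluster in step 3): each input element is read from global memory once, and all further additions run in shared memory. This is an illustrative kernel, not the authors' implementation; it assumes a power-of-two block size and `blockDim.x * sizeof(float)` bytes of dynamic shared memory at launch.

```cuda
__global__ void block_sum(const float *in, float *partial, int n)
{
    extern __shared__ float s[];                 // blockDim.x floats, supplied at launch
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    s[threadIdx.x] = (i < n) ? in[i] : 0.0f;     // single global-memory read per element
    __syncthreads();

    // Tree reduction inside the block: log2(blockDim.x) rounds of shared-memory arithmetic.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            s[threadIdx.x] += s[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        partial[blockIdx.x] = s[0];              // one partial sum per block
}
```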

  12. k-Means on GPU • For low-dimensional data: read each data point from global memory only once, keeping its coordinates in registers.
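A hypothetical sketch of the low-dimensional case: with the dimensionality fixed at compile time, each thread can keep its own point in registers, so every coordinate is fetched from global memory exactly once. The `DIM` constant, the names and the row-major layout are assumptions of this sketch.

```cuda
#include <cfloat>

#define DIM 2    // low dimensionality, fixed at compile time (assumption)

__global__ void assign_low_dim(const float *points,     // n x DIM, row-major
                               const float *centroids,  // k x DIM, row-major
                               int *labels, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Load this thread's point into registers: one global read per coordinate.
    float p[DIM];
    for (int j = 0; j < DIM; ++j)
        p[j] = points[i * DIM + j];

    // Compare against every centroid using only register operands for the point.
    float best_dist = FLT_MAX;
    int best = 0;
    for (int c = 0; c < k; ++c) {
        float dist = 0.0f;
        for (int j = 0; j < DIM; ++j) {
            float diff = p[j] - centroids[c * DIM + j];
            dist += diff * diff;
        }
        if (dist < best_dist) { best_dist = dist; best = c; }
    }
    labels[i] = best;
}
```

Caching the (small) centroid array in constant or shared memory would be a further refinement along the same lines.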

  13. k-Means on GPU • For high-dimensional data: read each data point from global memory only once, staging centroid data in shared memory.
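A hypothetical sketch of the high-dimensional case: dimensions are processed in chunks, the corresponding slice of every centroid is staged in shared memory, and each point coordinate is still read from global memory only once. `CHUNK`, `MAX_K`, the names and the layouts are assumptions; the kernel is launched with `k * CHUNK * sizeof(float)` bytes of dynamic shared memory.

```cuda
#define CHUNK 16    // dimensions processed per iteration (assumption)
#define MAX_K 32    // compile-time bound on the number of centroids (assumption)

__global__ void assign_high_dim(const float *points,     // n x d, row-major
                                const float *centroids,  // k x d, row-major
                                int *labels, int n, int d, int k)
{
    extern __shared__ float s_cent[];          // k * CHUNK floats, supplied at launch
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    float dist[MAX_K];                         // partial squared distances (k <= MAX_K assumed)
    for (int c = 0; c < k; ++c) dist[c] = 0.0f;

    for (int d0 = 0; d0 < d; d0 += CHUNK) {
        int len = min(CHUNK, d - d0);

        // Cooperatively stage this dimension slice of all k centroids in shared memory.
        for (int t = threadIdx.x; t < k * len; t += blockDim.x)
            s_cent[t] = centroids[(t / len) * d + d0 + (t % len)];
        __syncthreads();

        if (i < n) {
            for (int j = 0; j < len; ++j) {
                float x = points[i * d + d0 + j];   // one global read per point coordinate
                for (int c = 0; c < k; ++c) {
                    float diff = x - s_cent[c * len + j];
                    dist[c] += diff * diff;
                }
            }
        }
        __syncthreads();                        // before the next slice overwrites s_cent
    }

    if (i < n) {
        int best = 0;
        for (int c = 1; c < k; ++c)
            if (dist[c] < dist[best]) best = c;
        labels[i] = best;
    }
}
```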

  14. Experiments • The experiments were conducted on a PC with an NVIDIA GTX 280 GPU and an Intel(R) Core(TM) i5 CPU. • The GTX 280 has 30 SIMD multiprocessors, each containing eight processors running at 1.29 GHz. The GPU has 1 GB of memory with a peak bandwidth of 141.7 GB/s. • The CPU has four cores running at 2.67 GHz. The main memory is 8 GB with a peak bandwidth of 5.6 GB/s. All source code was written and compiled with Visual Studio 2008; the CUDA version is 2.3. • We measure the running time of the application after file I/O, in order to show the speedup effect more clearly.
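The timing methodology in the last bullet (measuring after file I/O) can be realised with CUDA events, which capture only the GPU-side work (memory copies and kernels). A minimal sketch, where `run_kmeans` is a placeholder for the actual launch sequence rather than anything from the presentation:

```cuda
#include <cuda_runtime.h>

// Returns the elapsed GPU time in milliseconds for the work issued by run_kmeans().
float time_gpu_ms(void (*run_kmeans)(void))
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    run_kmeans();                            // memory copies + kernel launches go here
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);              // wait until all recorded GPU work is done

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // milliseconds between the two events
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```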

  15. Experiments • On low-dimensional data • Compared with HP, UV and GPUMiner on randomly generated data; • Our implementation is four to ten times faster than HP.

  16. Experiments • On high-dimensional data • Compared with UV and GPUMiner on data from KDD 1999; • Our implementation is four to eight times faster than UV.

  17. Experiments • Comparison with the CPU • The results show that our algorithm compares very favorably with existing implementations: it is forty to two hundred times faster than the CPU version.

  18. Outline • Introduction • Related work • Method • Research Plan

  19. Research Plan • Detailed analysis of k-Means on GPU (GFLOPS); • Dealing with even larger data sets; • Other data mining algorithms on GPU: k-NN, SDP (widely used in protein identification).

  20. Q & A • Thanks very much
