
Database Operations on GPU

Changchang Wu

4/18/2007

Outline
  • Database Operations on GPU
  • Point List Generation on GPU
  • Nearest Neighbor Searching on GPU
Design Issues
  • Low bandwidth between GPU and CPU
    • Avoid frame buffer readbacks
  • No arbitrary writes
    • Avoid data rearrangements
  • Programmable pipeline has poor branching
    • Evaluate branches using fixed function tests
Design Overview
  • Use the depth test functionality of GPUs to perform comparisons
    • Implements all possible comparisons: <, <=, >=, >, ==, !=, ALWAYS, NEVER
  • Use the stencil test for data validation and for storing the results of comparison operations
  • Use occlusion queries to count the number of elements that satisfy a condition
Basic Operations
  • Basic SQL query: SELECT A FROM T WHERE C
    • A = attributes or aggregations (SUM, COUNT, MAX, etc.)
    • T = relational table
    • C = Boolean combination of predicates (using operators AND, OR, NOT)

Basic Operations
  • Predicates – a_i op constant or a_i op a_j
    • op is one of <, >, <=, >=, !=, =, TRUE, FALSE
  • Boolean combinations – Conjunctive Normal Form (CNF) expression evaluation
  • Aggregations – COUNT, SUM, MAX, MEDIAN, AVG
Predicate Evaluation
  • a_i op constant (d)
    • Copy the attribute values a_i into the depth buffer
    • Define the comparison operation using the depth test: glDepthFunc(…)
    • Draw a screen-filling quad at depth d
    • Store the comparison results in the stencil buffer: glStencilOp(fail, zfail, zpass)
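A CPU-side simulation makes the depth-test trick concrete. This is a sketch under stated assumptions, not the original implementation: Python lists stand in for the depth and stencil buffers, the attribute values are assumed to be pre-scaled into [0, 1] (OpenGL clamps depth to that range), and the helper names `depth_func_for` and `evaluate_predicate` are hypothetical. One detail matters when choosing the glDepthFunc argument: the depth test compares the incoming fragment depth (the quad at d) against the stored value (a_i), so the predicate a_i < d corresponds to GL_GREATER rather than GL_LESS.

```python
import operator

# Comparison operators keyed by their usual symbols.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

# "a_i op d" rewritten as "d op' a_i": the depth test puts the incoming
# fragment depth (d) on the left and the stored depth (a_i) on the right.
SWAPPED = {"<": ">", "<=": ">=", ">": "<", ">=": "<=", "==": "==", "!=": "!="}

GL_DEPTH_FUNC = {"<": "GL_LESS", "<=": "GL_LEQUAL", ">": "GL_GREATER",
                 ">=": "GL_GEQUAL", "==": "GL_EQUAL", "!=": "GL_NOTEQUAL"}


def depth_func_for(pred_op):
    """Name of the glDepthFunc argument that evaluates 'a_i pred_op d'."""
    return GL_DEPTH_FUNC[SWAPPED[pred_op]]


def evaluate_predicate(depth_buffer, pred_op, d):
    """Simulate drawing a screen-filling quad at depth d.

    Returns a simulated stencil buffer: 1 where the depth test passed
    (zpass: replace with 1), 0 where it failed (zfail: keep 0).
    """
    depth_test = OPS[SWAPPED[pred_op]]
    stencil = [0] * len(depth_buffer)
    for i, stored in enumerate(depth_buffer):
        if depth_test(d, stored):  # incoming depth d vs stored depth a_i
            stencil[i] = 1
    return stencil


values = [0.1, 0.4, 0.7, 0.9]
print(depth_func_for("<"))                   # GL_GREATER
print(evaluate_predicate(values, "<", 0.5))  # [1, 1, 0, 0]
```

On real hardware the same effect would come from something like glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE) with a stencil reference of 1; the exact stencil setup used by the original authors is not shown on the slide.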

Predicate Evaluation
  • Comparing two attributes:
    • a_i op a_j is treated as (a_i – a_j) op 0
  • Semi-linear queries
    • Easy to compute with a fragment shader
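The same reduction to "compare against 0" can be sketched for attribute-to-attribute and semi-linear predicates. Assuming a semi-linear query has the form s_1·a_1 + … + s_n·a_n op b, a fragment program can output the left-hand side minus b as the fragment depth, and a single comparison with 0 then evaluates the predicate; a_i op a_j is the special case with weights +1 and -1 and b = 0. The function name `semilinear_mask` is hypothetical, and the per-record loop stands in for the fragment shader.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def semilinear_mask(records, weights, op, b):
    """Evaluate (sum_j weights[j] * record[j]) op b for every record.

    Rewritten as (sum_j weights[j] * record[j] - b) op 0, mirroring a
    fragment program that writes the left-hand side as depth, followed
    by a depth comparison against 0.
    """
    mask = []
    for rec in records:  # one fragment per record
        lhs = sum(w * a for w, a in zip(weights, rec)) - b
        mask.append(OPS[op](lhs, 0))
    return mask


# a_0 > a_1, treated as (a_0 - a_1) > 0
print(semilinear_mask([(3, 1), (2, 2), (1, 4)], (1, -1), ">", 0))  # [True, False, False]
```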
Boolean Combinations
  • Expression provided as a CNF
  • A CNF is of the form $A_1 \land A_2 \land \dots \land A_k$, where each $A_i = B_{i1} \lor B_{i2} \lor \dots \lor B_{i m_i}$
  • The CNF does not contain a NOT operator
    • If the expression has a NOT operator, invert the comparison operation to eliminate it, e.g. NOT (a_i < d) => (a_i >= d)
  • For example, compute a_i within [low, high]
    • Evaluated as ( a_i >= low ) AND ( a_i <= high )
Range Query
  • Compute a_i within [low, high]
    • Evaluated as ( a_i >= low ) AND ( a_i <= high )
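The CNF evaluation and range query described above can be simulated with a stencil value per record. This is one plausible scheme, sketched under assumptions rather than taken from the original implementation: a record's stencil value i means it has satisfied clauses 1 to i-1, each predicate is one simulated rendering pass that lets records still at value i advance to i + 1, and the names `evaluate_cnf` and `negate` are hypothetical. Records are tuples indexed by attribute number.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

# NOT elimination: invert the comparison, e.g. NOT (a < d) -> (a >= d).
NEGATE = {"<": ">=", "<=": ">", ">": "<=", ">=": "<", "==": "!=", "!=": "=="}


def negate(pred):
    attr, op, const = pred
    return (attr, NEGATE[op], const)


def evaluate_cnf(records, cnf):
    """cnf is a list of clauses; each clause is a list of (attr, op, const)
    predicates that are OR-ed together. Returns one boolean per record."""
    stencil = [1] * len(records)          # every record starts valid
    for i, clause in enumerate(cnf, start=1):
        for attr, op, const in clause:    # one simulated pass per predicate
            for r, rec in enumerate(records):
                # Only records still at i may advance, so each record
                # advances at most once per clause (OR semantics).
                if stencil[r] == i and OPS[op](rec[attr], const):
                    stencil[r] = i + 1
    final = len(cnf) + 1                  # satisfied every clause (AND semantics)
    return [s == final for s in stencil]


records = [(1,), (5,), (10,), (15,)]
in_range = [[(0, ">=", 3)], [(0, "<=", 12)]]          # a_0 within [3, 12]
print(evaluate_cnf(records, in_range))                 # [False, True, True, False]
outside = [[negate((0, ">=", 3)), negate((0, "<=", 12))]]
print(evaluate_cnf(records, outside))                  # [True, False, False, True]
```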
Aggregations
  • COUNT, MAX, MIN, SUM, AVG
  • No data rearrangements
COUNT
  • Use occlusion queries to get the pixel pass count
  • Procedure:
    • Begin occlusion query
    • Perform database operation
    • End occlusion query
    • Get the number of attribute values that passed the database operation
  • Involves no additional overhead!
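In OpenGL 1.5 and later, these steps correspond to glBeginQuery(GL_SAMPLES_PASSED, id), the draw call, glEndQuery(GL_SAMPLES_PASSED), and glGetQueryObjectuiv(id, GL_QUERY_RESULT, &count). The Python sketch below only simulates that bracket on the CPU: `OcclusionQuery` and `draw_quad` are hypothetical stand-ins, and the count is the number of simulated fragments whose depth test passed.

```python
import operator


class OcclusionQuery:
    """Hypothetical CPU stand-in for an occlusion query (samples-passed counter)."""

    def __init__(self):
        self.samples_passed = 0
        self._active = False

    def __enter__(self):                      # "Begin occlusion query"
        self.samples_passed = 0
        self._active = True
        return self

    def __exit__(self, exc_type, exc, tb):    # "End occlusion query"
        self._active = False
        return False

    def record(self, passed):
        if self._active and passed:
            self.samples_passed += 1


def draw_quad(depth_buffer, d, depth_test, query):
    """'Perform database operation': one fragment at depth d per stored value."""
    for stored in depth_buffer:
        query.record(depth_test(d, stored))


values = [0.2, 0.5, 0.9]
with OcclusionQuery() as q:
    # COUNT(a_i < 0.6): incoming depth 0.6 > stored a_i
    draw_quad(values, 0.6, operator.gt, q)
print(q.samples_passed)  # 2
```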
MAX, MIN, MEDIAN
  • We compute the k-th largest number
  • Traditional algorithms require data rearrangements
  • We perform no data rearrangements and no frame buffer readbacks
K-th Largest Number
  • By comparing and counting, determine every bit of the result in order from MSB to LSB
Example: Parallel Max
  • S = {10, 24, 37, 99, 192, 200, 200, 232}
  • Step 1: Draw quad at 128 (10000000)
    • Values that pass (>= 128): 192, 200, 200, 232, so bit 7 is 1
  • Step 2: Draw quad at 192 (11000000)
    • Values that pass: 192, 200, 200, 232, so bit 6 is 1
  • Step 3: Draw quad at 224 (11100000)
    • Values that pass: 232, so bit 5 is 1
  • Step 4: Draw quad at 240 (11110000)
    • No values pass, so bit 4 is 0
  • Step 5: Draw quad at 232 (11101000)
    • Values that pass: 232, so bit 3 is 1
  • Steps 6, 7, 8: Draw quads at 236, 234, 233
    • No values pass, so bits 2, 1 and 0 are 0, and the max is 232
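The bit-by-bit procedure in the example can be written as a short loop. In this sketch the values are assumed to be non-negative integers below 2**bits, `kth_largest` is a hypothetical name, and the generator expression that counts values >= candidate stands in for drawing a quad at depth `candidate` and reading back the occlusion-query count. Setting k = 1 gives MAX and k = len(values) gives MIN; the recorded trace reproduces the quad depths listed above.

```python
def kth_largest(values, k, bits=8, trace=None):
    """k-th largest of non-negative integers below 2**bits, found MSB to LSB."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    result = 0
    for b in range(bits - 1, -1, -1):
        candidate = result | (1 << b)     # depth at which the quad is drawn
        if trace is not None:
            trace.append(candidate)
        passed = sum(1 for v in values if v >= candidate)  # occlusion-query count
        if passed >= k:                   # at least k values >= candidate: keep the bit
            result = candidate
    return result


S = [10, 24, 37, 99, 192, 200, 200, 232]
quads = []
print(kth_largest(S, 1, trace=quads))  # 232 (the max)
print(quads)                           # [128, 192, 224, 240, 232, 236, 234, 233]
print(kth_largest(S, len(S)))          # 10 (the min)
```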
Accumulator, Mean
  • Accumulator - Use sorting algorithm and add all the values
  • Mean – Use accumulator and divide by n
  • Interval range arithmetic
  • Alternative algorithm
    • Use fragment programs – requires very few renderings
    • Use mipmaps [Harris et al. 02], fragment programs [Coombe et al. 03]
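Accumulator
  • Data representation is of the form
    $a = a_k 2^k + a_{k-1} 2^{k-1} + \dots + a_0$, where each bit $a_i$ is 0 or 1
  • Summing over all records, with $\mathrm{sum}(a_i)$ the sum of bit $i$ over the records:
    $\mathrm{Sum} = \mathrm{sum}(a_k)\, 2^k + \mathrm{sum}(a_{k-1})\, 2^{k-1} + \dots + \mathrm{sum}(a_0)$
  • Current GPUs support no bit-masking operations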

The Algorithm
  • Test bit i of a value a without bit masking: compute $\mathrm{frac}(a / 2^{i+1})$; a result >= 0.5 means the i-th bit is 1
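The accumulator can be sketched on the CPU as follows, assuming non-negative integer values below 2**bits. The bit test uses only division and a fractional part, matching the "no bit-masking" constraint, and the per-bit count stands in for one counting pass (occlusion query). The function names are hypothetical. The mean then follows by dividing the accumulated sum by n.

```python
import math


def bit_is_set(a, i):
    """Bit test without masking: frac(a / 2**(i+1)) >= 0.5 iff bit i of a is 1."""
    x = a / 2 ** (i + 1)
    return x - math.floor(x) >= 0.5


def accumulate(values, bits):
    """Sum = sum over i of (number of values with bit i set) * 2**i."""
    total = 0
    for i in range(bits):
        count = sum(1 for a in values if bit_is_set(a, i))  # one counting pass per bit
        total += count * 2 ** i
    return total


S = [10, 24, 37, 99, 192, 200, 200, 232]
print(accumulate(S, 8))           # 994
print(accumulate(S, 8) / len(S))  # 124.25 (mean = accumulator / n)
```

The test is exact for integers below 2**53, because dividing by a power of two introduces no rounding error in floating point.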

Implementation
  • Algorithm
    • CPU – Intel compiler 7.1 with hyper-threading, multi-threading, SIMD optimizations
    • GPU – NVIDIA Cg Compiler
  • Hardware
    • Dell Precision Workstation with Dual 2.8GHz Xeon Processor
    • NVIDIA GeForce FX 5900 Ultra GPU
    • 2GB RAM
Benchmarks
  • TCP/IP database with 1 million records and four attributes
  • Census database with 360K records
Analysis: Issues
  • Precision
  • Copy time
  • Integer arithmetic
  • Depth compare masking
  • Memory management
  • No Branching
  • No random writes
Analysis: Performance
  • Relative Performance Gain
    • High Performance – Predicate evaluation, multi-attribute queries, semi-linear queries, count
    • Medium Performance – Kth-largest number
    • Low Performance - Accumulator
High Performance
  • Parallel pixel processing engines
  • Pipelining
  • Early Z-cull
  • Eliminate branch mispredictions
Medium Performance
  • Parallelism
  • The FX 5900 has a 450 MHz clock and 8 pixel processing engines
  • Rendering a single 1000x1000 quad takes 0.278 ms
  • Rendering 19 such quads should therefore take 5.28 ms; the observed time is 6.6 ms
  • About 80% parallel efficiency (5.28 / 6.6 = 0.8)!
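The figures above are consistent with a simple throughput model. Assuming each of the 8 pixel engines processes one pixel per clock cycle (an assumption; the slide does not state the model), the arithmetic works out as:

```latex
\begin{aligned}
t_{\text{quad}} &\approx \frac{1000 \times 1000\ \text{pixels}}{8 \times 450 \times 10^{6}\ \text{pixels/s}} \approx 0.278\ \text{ms} \\
t_{19} &\approx 19 \times 0.278\ \text{ms} \approx 5.28\ \text{ms} \\
\text{efficiency} &\approx \frac{5.28\ \text{ms}}{6.6\ \text{ms}} = 0.80
\end{aligned}
```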
Low Performance
  • No gain over SIMD-based CPU implementation
  • Two main reasons:
    • Lack of integer arithmetic
    • Clock rate
Advantages
  • Algorithms progress at the GPU growth rate
  • Offload CPU work
  • Fast due to massive parallelism on GPUs
  • Algorithms could be generalized to any geometric shape
    • e.g., max value within a triangular region
  • Commodity hardware!
Timing
  • Reduces a highly sparse matrix with N elements to a list of its M active entries in O(N) + O(M log N) steps
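The bound above comes from histogram pyramids (the technique in the Ziegler et al. reference): an O(N) bottom-up reduction that counts active entries, followed by one top-down O(log N) traversal per output entry. The sketch below is a simplified 1D, binary version (the published method works on 2D textures with 2x2 reductions); N is assumed to be a power of two, and `build_histopyramid`, `extract` and `point_list` are hypothetical names.

```python
def build_histopyramid(active):
    """Levels from the base (0/1 per entry) up to a single cell holding M."""
    n = len(active)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels = [[1 if a else 0 for a in active]]
    while len(levels[-1]) > 1:                    # O(N) total work
        prev = levels[-1]
        levels.append([prev[2 * j] + prev[2 * j + 1] for j in range(len(prev) // 2)])
    return levels


def extract(levels, k):
    """Index of the k-th active entry, found top-down in O(log N) steps."""
    idx = 0
    for level in reversed(levels[:-1]):           # skip the single top cell
        left = level[2 * idx]
        if k < left:
            idx = 2 * idx                         # k-th entry lies in the left child
        else:
            k -= left                             # skip the left child's entries
            idx = 2 * idx + 1
    return idx


def point_list(active):
    levels = build_histopyramid(active)
    m = levels[-1][0]                             # number of active entries M
    return [extract(levels, k) for k in range(m)]  # M independent traversals


print(point_list([0, 1, 0, 0, 1, 1, 0, 1]))  # [1, 4, 5, 7]
```

Each traversal is independent, which is what lets a GPU run all M of them in parallel, one per output pixel.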

Applications
  • Image Analysis
    • Feature Detection
  • Volume Analysis
  • Sparse Matrix Generation
Searching
  • 1D Binary Search
  • Nearest Neighbor Search in high-dimensional spaces
  • K-NN Search
Binary Search
  • Find a specific element in an ordered list
  • Implement just like the CPU algorithm
    • Assuming the hardware supports long enough shaders
    • Finds the first element with a given value v
      • If v does not exist, finds the smallest element greater than v
  • The search algorithm is sequential, but many searches can be executed in parallel
    • The number of pixels drawn determines the number of searches executed in parallel
      • 1 pixel == 1 search
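A sketch of the per-pixel search, with a fixed step schedule chosen to reproduce the positions in the walkthrough slides: start at n/2, move by strides n/4, n/8, ..., 1, take one extra stride-1 step, then apply a cleanup step. The list length is assumed to be a power of two, `gpu_lower_bound` is a hypothetical name, and the final list comprehension stands in for drawing one pixel per query. Because there is no data-dependent early exit, every pixel executes the same instruction sequence.

```python
def gpu_lower_bound(sorted_vals, v, trace=None):
    """Index of the first element >= v (len(sorted_vals) if there is none)."""
    n = len(sorted_vals)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two >= 2")
    pos = n // 2                                  # Initialize: center of the array
    if trace is not None:
        trace.append(pos)
    for step in range(1, n.bit_length()):         # log2(n) steps
        stride = max(n >> (step + 1), 1)          # n/4, n/8, ..., 1, then 1 again
        if sorted_vals[pos] >= v:
            pos -= stride                         # answer is at or left of pos
        else:
            pos += stride                         # answer is right of pos
        if trace is not None:
            trace.append(pos)
    # Cleanup: pos is now on the answer or one element too far left.
    if pos < n and sorted_vals[pos] < v:
        pos += 1
    if trace is not None:
        trace.append(pos)
    return pos


data = [0, 0, 0, 2, 2, 2, 5, 5]                   # v0 v0 v0 v2 v2 v2 v5 v5
t0, t2 = [], []
print(gpu_lower_bound(data, 0, t0), t0)           # 0 [4, 2, 1, 0, 0]
print(gpu_lower_bound(data, 2, t2), t2)           # 3 [4, 2, 3, 2, 3]
print([gpu_lower_bound(data, v) for v in (0, 2, 5)])  # one "pixel" per query: [0, 3, 6]
```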
Binary Search
  • Search for v0
  • Sorted list (indices 0 to 7): v0 v0 v0 v2 v2 v2 v5 v5
  • Initialize: position 4
    • Search starts at the center of the sorted array
    • v2 >= v0, so search the left half of the sub-array

Binary Search
  • Search for v0
  • Sorted list (indices 0 to 7): v0 v0 v0 v2 v2 v2 v5 v5
  • Positions: Initialize 4, Step 1 2
    • v0 >= v0, so search the left half of the sub-array

Binary Search
  • Search for v0
  • Sorted list (indices 0 to 7): v0 v0 v0 v2 v2 v2 v5 v5
  • Positions: Initialize 4, Step 1 2, Step 2 1
    • v0 >= v0, so search the left half of the sub-array

Binary Search
  • Search for v0
  • Sorted list (indices 0 to 7): v0 v0 v0 v2 v2 v2 v5 v5
  • Positions: Initialize 4, Step 1 2, Step 2 1, Step 3 0
    • At this point, we have either found v0 or are 1 element too far left
    • One last step to resolve

Binary Search
  • Search for v0
  • Sorted list (indices 0 to 7): v0 v0 v0 v2 v2 v2 v5 v5
  • Positions: Initialize 4, Step 1 2, Step 2 1, Step 3 0, Step 4 0
    • Done!

Binary Search
  • Search for v0 and v2
  • Sorted list (indices 0 to 7): v0 v0 v0 v2 v2 v2 v5 v5
  • Positions (v0, v2): Initialize 4, 4
    • Search starts at the center of the sorted array
    • Both searches proceed to the left half of the array

Binary Search
  • Search for v0 and v2
  • Sorted list (indices 0 to 7): v0 v0 v0 v2 v2 v2 v5 v5
  • Positions (v0, v2): Initialize 4, 4; Step 1 2, 2
    • The search for v0 continues as before
    • The search for v2 overshot, so go back to the right

Binary Search
  • Search for v0 and v2
  • Sorted list (indices 0 to 7): v0 v0 v0 v2 v2 v2 v5 v5
  • Positions (v0, v2): Initialize 4, 4; Step 1 2, 2; Step 2 1, 3
    • We’ve found the proper v2, but are still looking for v0
    • Both searches continue

Binary Search
  • Search for v0 and v2
  • Sorted list (indices 0 to 7): v0 v0 v0 v2 v2 v2 v5 v5
  • Positions (v0, v2): Initialize 4, 4; Step 1 2, 2; Step 2 1, 3; Step 3 0, 2
    • Now we’ve found the proper v0, but overshot v2
    • The cleanup step takes care of this

Binary Search
  • Search for v0 and v2
  • Sorted list (indices 0 to 7): v0 v0 v0 v2 v2 v2 v5 v5
  • Positions (v0, v2): Initialize 4, 4; Step 1 2, 2; Step 2 1, 3; Step 3 0, 2; Step 4 0, 3
    • Done! Both v0 and v2 are located properly

Binary Search Summary
  • Single rendering pass
    • Each pixel drawn performs independent search
  • O(log n) steps
Nearest Neighbor Search
  • A fundamental step in similarity search for data mining, retrieval…
  • Curse of dimensionality
    • When dimensionality is very high, structures like k-d trees do not help
  • Use the GPU to speed up a linear scan
Distances
  • N-norm distance
  • Cosine distance: acos(dot(x, y)), for unit-length x and y
Data Representation
  • Use separate textures to store different dimensions.
Distance Computation
  • Accumulate the distance components of the different dimensions
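A CPU sketch of the linear scan, assuming one array ("texture") per dimension and an N-norm read as the Minkowski distance with exponent p (an interpretation, since the slide does not define it). Each outer iteration stands in for one rendering pass that adds one dimension's component to an accumulation buffer; `linear_scan_nn` is a hypothetical name.

```python
def linear_scan_nn(textures, query, p=2):
    """textures[d][r] is coordinate d of record r (one 'texture' per dimension).

    Returns (index of the nearest record, its p-norm distance to query).
    """
    n = len(textures[0])
    acc = [0.0] * n                        # accumulation buffer, one entry per record
    for d, tex in enumerate(textures):     # one simulated rendering pass per dimension
        for r in range(n):
            acc[r] += abs(tex[r] - query[d]) ** p
    best = min(range(n), key=acc.__getitem__)
    return best, acc[best] ** (1.0 / p)


# Records (0, 0), (3, 4), (1, 1) stored as one list per dimension.
textures = [[0.0, 3.0, 1.0], [0.0, 4.0, 1.0]]
print(linear_scan_nn(textures, (3.0, 3.0)))  # (1, 1.0)
```

Storing each dimension separately means a pass only touches the data it needs, and the accumulation buffer carries the partial distances between passes.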
K-Nearest Neighbor Search
  • Given a sample point p, find the k points nearest p within a data set
  • On the CPU, this is easily done with a heap or priority queue
    • Can add or reject neighbors as search progresses
    • Don’t know how to build one efficiently on GPU
  • kNN-grid
    • Can only add neighbors…
kNN-grid Algorithm
  • Want 4 neighbors
  • [Figure: grid showing the sample point, candidate neighbors, and neighbors found]

kNN-grid Algorithm
  • Want 4 neighbors
  • Candidate neighbors must be within the maximum search radius
  • Visit voxels in order of distance to the sample point

kNN-grid Algorithm
  • Want 4 neighbors; neighbors found: 1
  • If the current number of neighbors found is less than the number requested, grow the search radius

kNN-grid Algorithm
  • Want 4 neighbors; neighbors found: 2
  • If the current number of neighbors found is less than the number requested, grow the search radius

kNN-grid Algorithm
  • Want 4 neighbors; neighbors found: 2
  • Don’t add neighbors outside the maximum search radius
  • Don’t grow the search radius when a neighbor is outside the maximum radius

kNN-grid Algorithm
  • Want 4 neighbors; neighbors found: 3
  • Add neighbors within the search radius

kNN-grid Algorithm
  • Want 4 neighbors; neighbors found: 4
  • Add neighbors within the search radius

kNN-grid Algorithm
  • Want 4 neighbors; neighbors found: 4
  • Don’t expand the search radius if enough neighbors have already been found

kNN-grid Algorithm
  • Want 4 neighbors; neighbors found: 5
  • Add neighbors within the search radius

kNN-grid Algorithm
  • Want 4 neighbors; neighbors found: 6
  • Visit all other voxels accessible within the determined search radius
  • Add neighbors within the search radius

kNN-grid Summary
  • Want 4 neighbors; neighbors found: 6
  • Finds all neighbors within a sphere centered about the sample point
  • May locate more than the requested k nearest neighbors
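A 2D CPU sketch of the kNN-grid rules listed on the slides above. Assumptions: points are bucketed into a uniform grid of square voxels, voxels are visited in order of their closest distance to the sample point, and the names `build_grid`, `voxel_distance` and `knn_grid` are hypothetical. Neighbors are only ever added, never removed: while fewer than k have been found, each accepted point grows the radius; afterwards the radius is fixed and only points inside it are added. The result is every point within the final radius, which can be more than k.

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Bucket 2D points into square voxels of side `cell`."""
    grid = defaultdict(list)
    for p in points:
        grid[(math.floor(p[0] / cell), math.floor(p[1] / cell))].append(p)
    return grid


def voxel_distance(q, key, cell):
    """Distance from q to the closest point of voxel `key`."""
    dx = max(key[0] * cell - q[0], 0.0, q[0] - (key[0] + 1) * cell)
    dy = max(key[1] * cell - q[1], 0.0, q[1] - (key[1] + 1) * cell)
    return math.hypot(dx, dy)


def knn_grid(grid, cell, q, k, max_radius):
    """Return (neighbors, radius); may return more than k neighbors."""
    reach = math.ceil(max_radius / cell)
    cx, cy = math.floor(q[0] / cell), math.floor(q[1] / cell)
    voxels = [(cx + i, cy + j)
              for i in range(-reach, reach + 1) for j in range(-reach, reach + 1)]
    voxels.sort(key=lambda v: voxel_distance(q, v, cell))  # nearest voxels first

    radius, found = 0.0, []
    for v in voxels:
        vd = voxel_distance(q, v, cell)
        # Voxels are sorted, so once one is out of reach every later one is too.
        if vd > max_radius or (len(found) >= k and vd > radius):
            break
        for p in grid.get(v, ()):
            d = math.dist(p, q)
            if d > max_radius:
                continue                  # never add points beyond the max radius
            if len(found) < k:
                found.append(p)           # too few neighbors: add and grow the radius
                radius = max(radius, d)
            elif d <= radius:
                found.append(p)           # enough neighbors: add, radius stays fixed
    return found, radius


pts = [(0.95, 0.95), (0.6, 0.5), (0.5, 1.4), (1.7, 0.5)]
found, r = knn_grid(build_grid(pts, 1.0), 1.0, (0.5, 0.5), 1, 2.0)
print(found)  # [(0.95, 0.95), (0.6, 0.5)]: k = 1, but two points lie inside the radius
```

In the usage example the first point visited sets the radius, so a closer point found later in the same voxel is also added, which is the "may locate more than k" behavior noted in the summary.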

References
  • Naga Govindaraju, Brandon Lloyd, Wei Wang, Ming Lin, and Dinesh Manocha, Fast Computation of Database Operations using Graphics Processors, http://www.gpgpu.org/s2004/slides/govindaraju.DatabaseOperations.ppt
  • Benjamin Bustos, Oliver Deussen, Stefan Hiller, and Daniel Keim, A Graphic Hardware Accelerated Algorithm for Nearest Neighbor Search
  • Gernot Ziegler, Art Tevs, Christian Theobalt, and Hans-Peter Seidel, GPU Point List Generation through Histogram Pyramids, http://www.mpi-inf.mpg.de/~gziegler/gpu_pointlist/
  • Tim Purcell, Sorting and Searching, http://www.gpgpu.org/s2005/slides/purcell.SortingAndSearching.ppt