Database Operations on GPU - PowerPoint PPT Presentation

Albert_Lan
Presentation Transcript

  1. Database Operations on GPU Changchang Wu 4/18/2007

  2. Outline • Database Operations on GPU • Point List Generation on GPU • Nearest Neighbor Searching on GPU

  3. Database Operations on GPU

  4. Design Issues • Low bandwidth between GPU and CPU • Avoid frame buffer readbacks • No arbitrary writes • Avoid data rearrangements • Programmable pipeline has poor branching • Evaluate branches using fixed function tests

  5. Design Overview • Use depth test functionality of GPUs for performing comparisons • Implements all possible comparisons <, <=, >=, >, ==, !=, ALWAYS, NEVER • Use stencil test for data validation and storing results of comparison operations • Use occlusion query to count number of elements that satisfy some condition

  6. Basic Operations • Basic SQL query: SELECT A FROM T WHERE C • A = attributes or aggregations (SUM, COUNT, MAX, etc.) • T = relational table • C = Boolean combination of predicates (using operators AND, OR, NOT)

  7. Basic Operations • Predicates – ai op constant or ai op aj • Op is one of <,>,<=,>=,!=, =, TRUE, FALSE • Boolean combinations – Conjunctive Normal Form (CNF) expression evaluation • Aggregations – COUNT, SUM, MAX, MEDIAN, AVG

  8. Predicate Evaluation • ai op constant (d) • Copy the attribute values ai into the depth buffer • Define the comparison operation using the depth test • Draw a screen-filling quad at depth d • glDepthFunc(…); glStencilOp(fail, zfail, zpass);
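  The depth-test trick on this slide can be imitated on the CPU. The following is only a sketch: `DEPTH_FUNCS` and `evaluate_predicate` are hypothetical names, and raw integers are used where a real depth buffer would hold values normalized to [0, 1]. The comparison names mirror the OpenGL depth-function constants, and the test is applied as "incoming quad depth func stored value", which is how the OpenGL depth test is defined.

```python
import operator

# Comparison functions named after the OpenGL depth-function constants.
DEPTH_FUNCS = {
    "GL_LESS": operator.lt,
    "GL_LEQUAL": operator.le,
    "GL_GREATER": operator.gt,
    "GL_GEQUAL": operator.ge,
    "GL_EQUAL": operator.eq,
    "GL_NOTEQUAL": operator.ne,
    "GL_ALWAYS": lambda incoming, stored: True,
    "GL_NEVER": lambda incoming, stored: False,
}


def evaluate_predicate(depth_buffer, func, d):
    """Simulate drawing a screen-filling quad at depth d.

    depth_buffer holds the attribute values a_i. The result plays the role
    of the stencil buffer: 1 where the quad's fragment passes the depth
    test against the stored value, 0 elsewhere.
    """
    test = DEPTH_FUNCS[func]
    return [1 if test(d, stored) else 0 for stored in depth_buffer]


attribute = [10, 24, 37, 99]
# Selecting a_i > 30 means the quad depth must be LESS than the stored value.
mask = evaluate_predicate(attribute, "GL_LESS", 30)
count = sum(mask)  # stands in for the occlusion-query pass count
```

  Note the mirrored operator: because the incoming fragment is the constant and the stored value is the attribute, the predicate `a_i > d` corresponds to `GL_LESS`, not `GL_GREATER`.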

  9. Predicate Evaluation • Comparing two attributes: • ai op aj is treated as (ai – aj) op 0 • Semi-linear queries • Easy to compute with a fragment shader
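  As a rough illustration of the rewrite above, a semi-linear query can be seen as a per-record dot product followed by a comparison. The helper name `semilinear_mask` and its parameters are assumptions made for this sketch; on the GPU the dot product would be computed per pixel by the fragment shader.

```python
import operator


def semilinear_mask(rows, coeffs, op, b):
    """Evaluate (coeffs . row) op b for every record.

    rows   : list of tuples of attribute values
    coeffs : one coefficient per attribute
    op     : a comparison such as operator.gt
    b      : the constant on the right-hand side
    """
    return [op(sum(c * x for c, x in zip(coeffs, row)), b) for row in rows]


rows = [(5, 3), (2, 7), (4, 4)]
# a_i > a_j rewritten as (a_i - a_j) > 0, i.e. coefficients (1, -1) and b = 0.
mask = semilinear_mask(rows, (1, -1), operator.gt, 0)
```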

  10. Boolean Combinations • Expression provided as a CNF • CNF is of the form (A1 AND A2 AND … AND Ak), where Ai = (Bi1 OR Bi2 OR … OR Bimi) • CNF does not have a NOT operator • If the expression has a NOT operator, invert the comparison operation to eliminate it, e.g. NOT (ai < d) => (ai >= d) • For example, compute ai within [low, high] • Evaluated as (ai >= low) AND (ai <= high)
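  One way to picture CNF evaluation with a stencil-like counter is sketched below. This is a simplified CPU analogue, not necessarily the exact stencil procedure used in the presentation; `eval_cnf` and the dictionary-based rows are hypothetical. Each record's counter holds the number of clauses it has satisfied so far, and a predicate is only applied to records that passed every earlier clause.

```python
def eval_cnf(rows, clauses):
    """Evaluate (A1 AND ... AND Ak), with each Ai a list of OR-ed predicates.

    Returns one counter per row; a counter equal to len(clauses) means the
    row satisfies the whole expression.
    """
    stencil = [0] * len(rows)
    for i, clause in enumerate(clauses):
        for predicate in clause:  # one "rendering pass" per predicate
            for r, row in enumerate(rows):
                # Only rows that passed all earlier clauses and have not yet
                # satisfied this clause are tested.
                if stencil[r] == i and predicate(row):
                    stencil[r] = i + 1
    return stencil


rows = [{"a": 10}, {"a": 24}, {"a": 99}, {"a": 200}]
low, high = 20, 100
# Range query: (a >= low) AND (a <= high), two clauses of one predicate each.
clauses = [[lambda r: r["a"] >= low], [lambda r: r["a"] <= high]]
stencil = eval_cnf(rows, clauses)
selected = [s == len(clauses) for s in stencil]
```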

  11. Algorithm

  12. Range Query • Compute ai within [low, high] • Evaluated as ( ai >= low ) AND ( ai <= high )

  13. Aggregations • COUNT, MAX, MIN, SUM, AVG • No data rearrangements

  14. COUNT • Use occlusion queries to get the pixel pass count • Syntax: • Begin occlusion query • Perform database operation • End occlusion query • Get the number of attribute values that passed the database operation • Involves no additional overhead!

  15. MAX, MIN, MEDIAN • We compute Kth-largest number • Traditional algorithms require data rearrangements • We perform no data rearrangements, no frame buffer readbacks

  16. K-th Largest Number • By comparing and counting, determine every bit in order from MSB to LSB

  17. Example: Parallel Max • S = {10,24,37,99,192,200,200,232} • Step 1: Draw quad at 128 (10000000) • Values that pass: {192,200,200,232} • Step 2: Draw quad at 192 (11000000) • Values that pass: {192,200,200,232} • Step 3: Draw quad at 224 (11100000) • Values that pass: {232} • Step 4: Draw quad at 240 (11110000) • No values pass • Step 5: Draw quad at 232 (11101000) • Values that pass: {232} • Steps 6, 7, 8: Draw quads at 236, 234, 233 • No values pass, so Max is 232
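  The bit-by-bit search in the example can be written as a short CPU sketch. The function name `kth_largest` and the 8-bit default are assumptions for illustration; each loop iteration corresponds to one quad drawn at the candidate depth, with the count standing in for the occlusion-query result. No values are moved or sorted.

```python
def kth_largest(values, k, bits=8):
    """Find the k-th largest value using only compare-and-count passes.

    The answer is built from the most significant bit down: a bit is kept
    if at least k values are >= the candidate with that bit set.
    """
    x = 0
    for i in reversed(range(bits)):
        candidate = x | (1 << i)
        # One "rendering pass": count the values that pass the comparison.
        count = sum(1 for v in values if v >= candidate)
        if count >= k:
            x = candidate
    return x


S = [10, 24, 37, 99, 192, 200, 200, 232]
maximum = kth_largest(S, 1)        # 232, matching the example above
minimum = kth_largest(S, len(S))   # 10
```

  With k = 1 the candidates tried are 128, 192, 224, 240, 232, 236, 234 and 233, the same sequence of quads as in the example.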

  18. Accumulator, Mean • Accumulator - Use sorting algorithm and add all the values • Mean – Use accumulator and divide by n • Interval range arithmetic • Alternative algorithm • Use fragment programs – requires very few renderings • Use mipmaps [Harris et al. 02], fragment programs [Coombe et al. 03]

  19. Accumulator • Data representation is of the form $a_k 2^k + a_{k-1} 2^{k-1} + \dots + a_0$ • Sum = $\mathrm{sum}(a_k)\, 2^k + \mathrm{sum}(a_{k-1})\, 2^{k-1} + \dots + \mathrm{sum}(a_0)$ • Current GPUs support no bit-masking operations

  20. The Algorithm • >= 0.5 means the i-th bit is 1
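  The slide's figure is not in the transcript, so the following is a guess at the idea rather than the presented algorithm. It assumes that the ">= 0.5" test is applied to the fractional part of value / 2^(i+1), which is one way to read bit i without bit-masking. The names `bit_is_set` and `accumulate` are hypothetical. The sum is then assembled from per-bit counts, following the formula on the previous slide.

```python
import math


def bit_is_set(value, i):
    """Test bit i without bit-masking: frac(value / 2^(i+1)) >= 0.5."""
    fractional, _ = math.modf(value / 2 ** (i + 1))
    return fractional >= 0.5


def accumulate(values, bits=8):
    """Sum non-negative integers as sum over i of count_i * 2^i.

    count_i is the number of values whose i-th bit is 1; on the GPU it
    would come from one counting pass per bit.
    """
    total = 0
    for i in range(bits):
        count_i = sum(1 for v in values if bit_is_set(v, i))
        total += count_i * 2 ** i
    return total


S = [10, 24, 37, 99, 192, 200, 200, 232]
total = accumulate(S)
mean = total / len(S)
```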

  21. Implementation • Algorithm • CPU – Intel compiler 7.1 with hyper-threading, multi-threading, SIMD optimizations • GPU – NVIDIA Cg Compiler • Hardware • Dell Precision Workstation with Dual 2.8GHz Xeon Processor • NVIDIA GeForce FX 5900 Ultra GPU • 2GB RAM

  22. Benchmarks • TCP/IP database with 1 million records and four attributes • Census database with 360K records

  23. Copy Time

  24. Predicate Evaluation

  25. Range Query

  26. Multi-Attribute Query

  27. Semi-linear Query

  28. Kth-Largest

  29. Kth-Largest

  30. Kth-Largest conditional

  31. Accumulator

  32. Analysis: Issues • Precision • Copy time • Integer arithmetic • Depth compare masking • Memory management • No Branching • No random writes

  33. Analysis: Performance • Relative Performance Gain • High Performance – Predicate evaluation, multi-attribute queries, semi-linear queries, count • Medium Performance – Kth-largest number • Low Performance - Accumulator

  34. High Performance • Parallel pixel processing engines • Pipelining • Early Z-cull • Eliminate branch mispredictions

  35. Medium Performance • Parallelism • FX 5900 has a clock speed of 450 MHz and 8 pixel processing engines • Rendering a single 1000x1000 quad takes 0.278 ms • Rendering 19 such quads takes 5.28 ms; the observed time is 6.6 ms • 80% efficiency in parallelism!
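  The figures on this slide can be reproduced with a few lines of arithmetic. The sketch assumes that each pixel engine processes one pixel per clock cycle, which is what the slide's numbers imply; the variable names are illustrative.

```python
clock_hz = 450e6           # FX 5900 clock speed
pixel_engines = 8          # parallel pixel processing engines
pixels_per_quad = 1000 * 1000
observed_ms = 6.6          # measured time quoted on the slide

# Assumption: one pixel per engine per clock cycle.
quad_ms = pixels_per_quad / (clock_hz * pixel_engines) * 1e3
ideal_ms = 19 * quad_ms
efficiency = ideal_ms / observed_ms

print(f"one quad: {quad_ms:.3f} ms, 19 quads: {ideal_ms:.2f} ms, "
      f"efficiency: {efficiency:.0%}")
```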

  36. Low Performance • No gain over SIMD based CPU implementation • Two main reasons: • Lack of integer-arithmetic • Clock rate

  37. Advantages • Algorithms progress at GPU growth rate • Offload CPU work • Fast due to massive parallelism on GPUs • Algorithms could be generalized to any geometric shape • Eg. Max value within a triangular region • Commodity hardware!

  38. GPU Point List Generation • Data compaction

  39. Overall task

  40. 3D to 2D mapping

  41. Current Problem

  42. The solution

  43. Overview, Data Compaction

  44. Algorithm: Discriminator

  45. Algorithm: Histogram Builder

  46. Histogram Output

  47. Algorithm: PointList Builder

  48. PointList Output

  49. Timing • Reduces a highly sparse matrix with N elements to a list of its M active entries in O(N) + M log N steps
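  The slides for the histogram and point list builders survive only as titles, so the following is a one-dimensional sketch of how such a compaction could reach the stated cost, not the presented implementation. `build_pyramid` and `point_list` are hypothetical names, and the input length is assumed to be a power of two. Building the pyramid of partial counts touches O(N) cells, and each of the M outputs is located by one top-down walk of log N levels.

```python
def build_pyramid(active):
    """Build levels of pairwise sums from the 0/1 flags up to a single total.

    active must have a power-of-two length; levels[0] is the input and
    levels[-1] holds the number of active entries.
    """
    levels = [list(active)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[j] + prev[j + 1] for j in range(0, len(prev), 2)])
    return levels


def point_list(levels):
    """Return the input index of every active entry, in order."""
    total = levels[-1][0]
    out = []
    for key in range(total):
        k, idx = key, 0
        # Walk from just below the top down to the base level.
        for level in reversed(levels[:-1]):
            idx *= 2                  # left child of the current cell
            if k >= level[idx]:       # target lies in the right child
                k -= level[idx]
                idx += 1
        out.append(idx)
    return out


flags = [0, 1, 0, 0, 1, 1, 0, 1]
levels = build_pyramid(flags)
indices = point_list(levels)
```

  Each output index is computed independently of the others, which is what would let every entry of the point list be produced in parallel.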

  50. Applications • Image Analysis • Feature Detection • Volume Analysis • Sparse Matrix Generation