A hardware processing unit for point sets

A Hardware Processing Unit For Point Sets

S. Heinzle, G. Guennebaud, M. Botsch, M. Gross

Graphics Hardware 2008


Motivation

  • Point-based graphics established

  • Powerful algorithms

    • Representation

    • Processing

    • Manipulation

    • Rendering

  • Decomposition

    • Get neighborhood

    • Operate on neighbors

Motivation

  • GPUs not suited for getting neighborhood

    • SIMD

    • Incoherent branching

    • Dynamic data structures slow

    • Recursive calls not supported

  • CPUs

    • Small number of FPUs

    • Inflexible memory caches

[Images courtesy of NVIDIA and Intel]

Contributions

  • Hardware architecture for point sets

    • Neighbor search module

    • Novel advanced caching mechanism

    • Reconfigurable processing module

    • Programmability using FPGA compiler

  • FPGA prototype and measurements

  • Small & Lean

    • Integration into multi-core CPU/GPU possible

Outline

  • Related Work

  • Spatial Searching and Caching

  • Architecture and Prototype

  • Results

  • Conclusion

Related Work

  • Kd-Tree [Bentley 75]

  • kNN on GPUs [Ma and McCool 02]

  • Kd-Tree on GPUs [Popov et al. 07]

  • Kd-Tree Hardware [Woop et al. 05], [Woop et al. 06]

Related Work

  • Adaptive SPH Fluid Simulation [Adams et al. 07]

  • Algebraic Moving Least Squares [Guennebaud and Gross 07]

  • Linear Moving Least Squares [Adamson and Alexa 04]

Linear Moving Least Squares

  • Implicit surface defined by a set of points

[Figure: a query point x near sample points p_i and normals n_i; successive projections x', x'', x''' approach the surface.]

  • Iterative projections onto plane

  • Surface defined by points projecting onto themselves
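
The projection loop described above can be sketched as follows. This is a minimal illustration, not the authors' operator: the Gaussian weight, the bandwidth `h`, the stopping tolerance, and the function name `mls_project` are all assumptions, and the sketch weights every input point instead of only the neighbors returned by a kNN or eNN query.

```python
import math

def mls_project(x, points, normals, h=1.0, iterations=10, tol=1e-9):
    """Iteratively project x onto a weighted local plane (hypothetical helper)."""
    x = list(x)
    for _ in range(iterations):
        # Gaussian weights fall off with distance from the current estimate.
        w = [math.exp(-sum((xc - pc) ** 2 for xc, pc in zip(x, p)) / (h * h))
             for p in points]
        total = sum(w)
        if total == 0.0:
            break  # x is too far from every sample for this bandwidth
        # The weighted centroid a and weighted average normal n define the plane.
        a = [sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(3)]
        n = [sum(wi * nrm[d] for wi, nrm in zip(w, normals)) for d in range(3)]
        length = math.sqrt(sum(c * c for c in n))
        if length == 0.0:
            break
        n = [c / length for c in n]
        # Signed distance of x to the plane, then orthogonal projection.
        dist = sum((x[d] - a[d]) * n[d] for d in range(3))
        x = [x[d] - dist * n[d] for d in range(3)]
        if abs(dist) < tol:
            break  # x projects onto itself: it lies on the surface
    return x

# Samples of the plane z = 0 with upward normals.
plane_points = [(i * 0.5, j * 0.5, 0.0) for i in range(-2, 3) for j in range(-2, 3)]
plane_normals = [(0.0, 0.0, 1.0)] * len(plane_points)
projected = mls_project((0.2, 0.3, 0.5), plane_points, plane_normals)
```

The loop stops once the query point no longer moves, which matches the definition of the surface as the set of points projecting onto themselves.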

Spatial Search

  • Spatial search: kNN and eNN

    • Common in most point operations

    • Based on kd-tree

  • Example eNN:
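
Since the eNN example figure did not survive, here is a small software sketch of an eNN (fixed-radius) query on a kd-tree. The dictionary node layout, the `leaf_size` parameter, and the names `build_kdtree` and `enn_search` are illustrative assumptions, not the layout used by the hardware.

```python
def build_kdtree(points, indices=None, depth=0, leaf_size=4):
    """Build a kd-tree over point indices; leaves hold small index buckets."""
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return {
        "axis": axis,
        "split": points[indices[mid]][axis],
        "left": build_kdtree(points, indices[:mid], depth + 1, leaf_size),
        "right": build_kdtree(points, indices[mid:], depth + 1, leaf_size),
    }

def enn_search(node, points, query, radius, found=None):
    """Collect indices of all points within `radius` of `query`."""
    if found is None:
        found = []
    if "leaf" in node:
        for i in node["leaf"]:
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], query))
            if d2 <= radius * radius:
                found.append(i)
        return found
    delta = query[node["axis"]] - node["split"]
    if delta < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    enn_search(near, points, query, radius, found)
    # Visit the far child only if the query ball crosses the split plane.
    if abs(delta) <= radius:
        enn_search(far, points, query, radius, found)
    return found

grid = [(i * 0.1, j * 0.1) for i in range(10) for j in range(10)]
tree = build_kdtree(grid)
hits = enn_search(tree, grid, (0.43, 0.52), 0.15)
```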

Spatial Search

  • kNN search similar to eNN search:

    • Start with infinite radius

    • Sort leaf points into priority queue

    • Shrink radius with every point sorted
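
The three steps above can be sketched in software as follows. This is an illustration under assumptions: Python's `heapq` stands in for the priority queue, and the node layout and the names `build_kdtree`, `knn_search`, and `knn` are hypothetical. The search radius is infinite until k candidates are queued and afterwards equals the distance of the current k-th candidate.

```python
import heapq

def build_kdtree(points, indices=None, depth=0, leaf_size=4):
    """Build a kd-tree over point indices; leaves hold small index buckets."""
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return {
        "axis": axis,
        "split": points[indices[mid]][axis],
        "left": build_kdtree(points, indices[:mid], depth + 1, leaf_size),
        "right": build_kdtree(points, indices[mid:], depth + 1, leaf_size),
    }

def knn_search(node, points, query, k, heap=None):
    """heap holds (-squared_distance, index); its root is the k-th candidate."""
    if heap is None:
        heap = []
    if "leaf" in node:
        for i in node["leaf"]:
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], query))
            if len(heap) < k:
                heapq.heappush(heap, (-d2, i))
            elif d2 < -heap[0][0]:
                # Closer than the current k-th candidate: the radius shrinks.
                heapq.heapreplace(heap, (-d2, i))
        return heap
    delta = query[node["axis"]] - node["split"]
    if delta < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    knn_search(near, points, query, k, heap)
    # Squared search radius: infinite until k candidates have been queued.
    radius2 = float("inf") if len(heap) < k else -heap[0][0]
    if delta * delta <= radius2:
        knn_search(far, points, query, k, heap)
    return heap

def knn(tree, points, query, k):
    """Return the indices of the k nearest points, nearest first."""
    return [i for _, i in sorted(knn_search(tree, points, query, k), reverse=True)]

grid = [(i * 0.1, j * 0.1) for i in range(10) for j in range(10)]
tree = build_kdtree(grid)
nearest = knn(tree, grid, (0.43, 0.52), 5)
```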

Coherent Neighbor Cache (eNN)

  • Find neighbors in slightly bigger radius

  • Re-use result for spatially close query

Re-use if
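
The re-use condition on this slide was a formula that did not survive. As a hedged sketch, one condition that makes re-use safe follows from the triangle inequality: if the cached search around q_i used the enlarged radius (1 + alpha) * r, its result contains every neighbor of a new query q_j within r whenever the distance between q_i and q_j is at most alpha * r. The parameter `alpha`, the class name `RangeCache`, and the brute-force scan standing in for the kd-tree search are assumptions of this example.

```python
import math

class RangeCache:
    """Caches one eNN result computed with an enlarged radius (illustrative)."""

    def __init__(self, points, radius, alpha):
        self.points, self.radius, self.alpha = points, radius, alpha
        self.center = None      # query position of the cached search
        self.candidates = []    # indices within (1 + alpha) * radius of center
        self.misses = 0

    def query(self, q):
        reusable = (self.center is not None and
                    math.dist(q, self.center) <= self.alpha * self.radius)
        if not reusable:
            # Miss: full search with the enlarged radius. A brute-force scan
            # stands in for the kd-tree traversal here.
            big = (1.0 + self.alpha) * self.radius
            self.center = q
            self.candidates = [i for i, p in enumerate(self.points)
                               if math.dist(p, q) <= big]
            self.misses += 1
        # Filter the cached superset down to the requested radius.
        return [i for i in self.candidates
                if math.dist(self.points[i], q) <= self.radius]

line = [(x * 0.1, 0.0) for x in range(20)]
cache = RangeCache(line, radius=0.25, alpha=0.4)
first = cache.query((1.0, 0.0))    # miss: fills the cache
second = cache.query((1.04, 0.0))  # within alpha * r of the cached query
third = cache.query((1.5, 0.0))    # too far away: new search
```

This sketch filters the cached superset down to the exact radius on every query. Whether the hardware filters or passes the superset on to the processing stage is not stated on the slide.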

Coherent Neighbor Cache (kNN, exact)

  • Find (k+1) neighbors

  • Re-use result for spatially close query

Re-use if
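
The condition itself is missing from the transcript. A sufficient one can be derived from the triangle inequality, and this sketch uses it as an assumption: with d_k and d_(k+1) the distances of the k-th and (k+1)-th neighbors of the cached query q_i, the same k points remain the exact neighbors of q_j if twice the distance between q_i and q_j is smaller than d_(k+1) - d_k. The class `KnnCache` and the brute-force helper `knn_brute` are hypothetical.

```python
import math

def knn_brute(points, q, k):
    """Stand-in for the kd-tree kNN search: indices of the k nearest points."""
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], q))[:k]

class KnnCache:
    """Caches one exact kNN result; needs at least k + 1 points."""

    def __init__(self, points, k):
        self.points, self.k = points, k
        self.center = None
        self.neighbors = []
        self.slack = 0.0        # d_(k+1) - d_k of the cached search
        self.misses = 0

    def query(self, q):
        if self.center is not None and 2.0 * math.dist(q, self.center) < self.slack:
            return self.neighbors   # hit: the same k points are still nearest
        # Miss: search for k + 1 neighbors to learn the gap behind the k-th.
        found = knn_brute(self.points, q, self.k + 1)
        d_k = math.dist(self.points[found[self.k - 1]], q)
        d_k1 = math.dist(self.points[found[self.k]], q)
        self.center, self.neighbors = q, found[:self.k]
        self.slack = d_k1 - d_k
        self.misses += 1
        return self.neighbors

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
cache = KnnCache(pts, k=3)
a = set(cache.query((0.05, 0.05)))  # miss: fills the cache
b = set(cache.query((0.2, 0.1)))    # hit: gap to the 4th neighbor is large
misses_after_b = cache.misses
c = set(cache.query((5.2, 5.1)))    # miss: query moved to the other cluster
```

On a hit the cached points are returned in the order computed for the cached query, so only the set, not the ranking, is guaranteed for the new query.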

Coherent Neighbor Cache (kNN, approximation)

  • Approximation error e

    • Enlarge radius

Re-use if
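
The slide's condition is again missing. The derivation below shows one sufficient condition under a stated assumption: a neighbor set counts as approximate with error \varepsilon (the slide's e) if every returned point lies within (1 + \varepsilon) times the true k-th neighbor distance. The symbols \delta, d_k, d_{k+1}, and D_k are introduced here for illustration and may differ from the authors' formulation.

```latex
% q_i: cached query, q_j: new query, \delta = \lVert q_j - q_i \rVert
% d_k, d_{k+1}: distances from q_i to its k-th and (k+1)-th neighbors
\begin{align*}
  \lVert p - q_j \rVert  &\le d_k + \delta
      && \text{for every cached neighbor } p \\
  \lVert p' - q_j \rVert &\ge d_{k+1} - \delta
      && \text{for every other point } p' \\
  D_k &\ge d_{k+1} - \delta
      && \text{if the true $k$ nearest of } q_j \text{ include some } p' \\
  d_k + \delta &\le (1 + \varepsilon)\,(d_{k+1} - \delta)
      && \text{suffices for re-use} \\
  \delta &\le \frac{(1 + \varepsilon)\, d_{k+1} - d_k}{2 + \varepsilon}
\end{align*}
```

For \varepsilon = 0 this reduces to 2\delta \le d_{k+1} - d_k, the exact case. A larger \varepsilon enlarges the region in which a cached result can be re-used.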

The Architecture

[Figure: architecture block diagram; only the label "Host" survives]

Coherent Neighbor Cache

[Figure: cache diagram with entries labeled 0, 1, ..., n]

  • Eight cached neighborhoods

  • Problem: parallel queries in kd-tree module

  • Interleave spatially similar queries

Kd-Tree Traversal

NodeRecurse

  • Kd-tree structure on chip

  • 16 threads

  • Pipelining and multi-threading

Stacks

  • 16 stacks

  • Parallel read/write

  • Bounded in depth

  • 6 bytes per thread per recursion
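
A software analogue of these stacks is a kd-tree range query that replaces recursion with an explicit, depth-bounded stack. The sketch below is an assumption-laden illustration: the node layout, the `max_depth` value of 32, and the function names are hypothetical. Only the figures of 16 stacks and 6 bytes per entry come from the slide.

```python
NUM_THREADS = 16        # from the slide: 16 stacks
STACK_ENTRY_BYTES = 6   # from the slide: 6 bytes per thread per recursion

def stack_memory_bytes(max_depth):
    """Total stack storage for all threads at a given depth bound."""
    return NUM_THREADS * max_depth * STACK_ENTRY_BYTES

def range_query_iterative(tree, points, query, radius, max_depth=32):
    """eNN query using an explicit stack of pending far children."""
    stack = [tree]
    found = []
    while stack:
        node = stack.pop()
        while "leaf" not in node:
            delta = query[node["axis"]] - node["split"]
            if delta < 0:
                near, far = node["left"], node["right"]
            else:
                near, far = node["right"], node["left"]
            if abs(delta) <= radius:
                if len(stack) >= max_depth:
                    raise OverflowError("traversal stack exceeded its bound")
                stack.append(far)   # revisit the far side later
            node = near             # descend without recursion
        for i in node["leaf"]:
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], query))
            if d2 <= radius * radius:
                found.append(i)
    return found

# Hand-built one-dimensional tree over four points.
pts = [(0.0,), (1.0,), (2.0,), (3.0,)]
tree = {"axis": 0, "split": 2.0, "left": {"leaf": [0, 1]}, "right": {"leaf": [2, 3]}}
result = range_query_iterative(tree, pts, (1.6,), 0.5)
budget = stack_memory_bytes(32)
```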

Leaf

  • 16 parallel priority queues (1-cycle ops)

  • Queues store pointers and distances

  • Bandwidth bottleneck
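
The behavior of one such queue can be sketched as a fixed-capacity sorted list of (distance, pointer) pairs. This is an illustration only: the class name `BoundedQueue` is hypothetical, and the sequential scan below stands in for whatever parallel compare-and-shift logic gives the hardware its 1-cycle operations.

```python
class BoundedQueue:
    """Keeps the k smallest (distance, pointer) pairs in ascending order."""

    def __init__(self, k):
        self.k = k
        self.slots = []

    def radius(self):
        """Current search radius: infinite until the queue is full."""
        return self.slots[-1][0] if len(self.slots) == self.k else float("inf")

    def insert(self, distance, pointer):
        """Insert a candidate; return False if it is not among the k closest."""
        if distance >= self.radius():
            return False
        pos = 0
        while pos < len(self.slots) and self.slots[pos][0] <= distance:
            pos += 1
        self.slots.insert(pos, (distance, pointer))
        del self.slots[self.k:]     # drop the entry pushed past capacity
        return True

queue = BoundedQueue(3)
for dist, ptr in [(5.0, "a"), (2.0, "b"), (9.0, "c"), (1.0, "d")]:
    queue.insert(dist, ptr)
accepted = queue.insert(7.0, "e")
```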

Processing Module

  • Multithreaded quad-port bank of 16 registers

  • 128 threads

  • Programmability using FPGA technology

Further Data

  • Implemented on two FPGAs

    • 64-bit DDR DRAM

    • Interconnection: no overhead

  • Resource usage (registers and LUTs)

    • Virtex 2 Pro 100 (kNN): 26% registers, 38% LUTs

    • Virtex 2 Pro 70 (MLS): 47% registers, 52% LUTs

  • Clock frequency: 75 MHz

Applications

  • Tested on various applications

  • PCI interface of prototype slow

  • [Weyrich et al. 04]

  • [Adams et al. 07]

Results kNN

  • Small hardware footprint

  • FPGA slightly slower

  • Realistic clock frequency

    • Prototype faster than CPU/GPU

[Figure: kNN performance, number of queries versus number of neighbors. Clock labels: 75 MHz, 2200 MHz, 1200 MHz. Bar labels: FPGA x1; CPU x1.1, x1.4, x1.5; CUDA x1.6, x2.4, x4; CUDA w/o sort x3.1, x4.0; ASIC estimate (500 MHz) x6.6.]

Results MLS

  • FPGA faster than CPU

  • kNN bottleneck

    • FPGA

    • GPU

[Figure: MLS performance, number of queries versus number of neighbors. Clock labels: 75 MHz, 2200 MHz, 1200 MHz. Bar labels: FPGA x1; MLS CUDA x3.8; MLS CPU x0.4.]

Coherent Neighbor Cache

[Figure: number of queries versus level of coherence. Curves: CPU (e = 0.1), FPGA (e = 0.1), FPGA (exact).]

Results Approximation Error (MLS projection)

[Figure: MLS error versus approximation e, with the no-approximation case marked.]

Results Approximation Error (MLS projection)

[Figure: cache hits versus approximation e.]

Approximation Error (visual)

  • Coherent Neighbor Cache:

  • Not optimal for exact queries

  • Approximate queries

    • Can be tolerated in most cases

    • Greatly increases performance

    • Even for small approximations

Conclusion

  • Novel hardware architecture for

    • Nearest-neighbor searches

    • Generic meshless processing operators

  • Cache exploiting spatial coherence

  • Good performance considering resources

  • Possible GPU integration

Future Work

  • Programmable data structure

    • Support different data structures

    • Programmability in data structure

    • Construction on-chip

  • ‘Real’ programmability in point processing module
