### A Hardware Processing Unit For Point Sets

S. Heinzle, G. Guennebaud, M. Botsch, M. Gross

Graphics Hardware 2008

Motivation

- Point-based graphics established
- Powerful algorithms
- Representation
- Processing
- Manipulation
- Rendering

- Decomposition
- Get neighborhood
- Operate on neighbors


Motivation

- GPUs not suited for getting neighborhood
  - SIMD
  - Incoherent branching
  - Dynamic data structures slow
  - Recursive calls not supported

- CPUs
  - Small number of FPUs
  - Inflexible memory caches


Contributions

- Hardware architecture for point sets
- Neighbor search module
- Novel advanced caching mechanism
- Reconfigurable processing module
- Programmability using FPGA compiler

- FPGA prototype and measurements
- Small and lean: integration into a multi-core CPU/GPU is possible

Outline

- Related Work
- Spatial Searching and Caching
- Architecture and Prototype
- Results
- Conclusion


Related Work

- Kd-tree [Bentley 75]
- kNN on GPUs [Ma and McCool 02]
- Kd-tree on GPUs [Popov et al. 07]
- Kd-tree hardware [Woop et al. 05], [Woop et al. 06]
- Adaptive SPH fluid simulation [Adams et al. 07]
- Algebraic moving least squares [Guennebaud and Gross 07]
- Linear moving least squares [Adamson and Alexa 04]

Linear Moving Least Squares

- Implicit surface defined by a set of points
- The surface consists of the points x that project onto themselves
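As a rough illustration of "points projecting onto themselves", the sketch below iterates a simplified projection: it averages nearby points and normals with Gaussian weights, then moves x along the averaged normal until it stops moving. The name `mls_project`, the Gaussian weight, the width `h`, and the use of per-point normals are assumptions made for this example; the slides do not give the exact operator.

```python
import math

def mls_project(x, points, normals, h=1.0, iterations=10):
    """Move x onto the surface implied by oriented points (simplified sketch)."""
    x = list(x)
    for _ in range(iterations):
        wsum = 0.0
        a = [0.0, 0.0, 0.0]  # weighted average position
        n = [0.0, 0.0, 0.0]  # weighted average normal
        for p, pn in zip(points, normals):
            d2 = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            w = math.exp(-d2 / (h * h))  # assumed Gaussian weight
            wsum += w
            for k in range(3):
                a[k] += w * p[k]
                n[k] += w * pn[k]
        a = [ak / wsum for ak in a]
        length = math.sqrt(sum(nk * nk for nk in n))
        n = [nk / length for nk in n]
        # Signed distance of x to the local plane through a with normal n.
        f = sum(nk * (xk - ak) for nk, xk, ak in zip(n, x, a))
        x = [xk - f * nk for xk, nk in zip(x, n)]
    return x
```

A point that already lies on the surface has f = 0 and is returned unchanged, which is the fixed-point property stated on the slide. Every iteration touches the neighbors of x, which is why neighbor search dominates this kind of operator.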


Spatial Search

- Spatial search: kNN (the k nearest neighbors) and eNN (all neighbors within a given radius)
- Common in most point operations
- Based on kd-tree
- Example eNN: figure not reproduced in this transcript
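A minimal sketch of an eNN query on a kd-tree may help here. The dictionary-based node layout, the `leaf_size` parameter, and the function names `build` and `enn` are assumptions for illustration and are not taken from the slides.

```python
def build(points, depth=0, leaf_size=4):
    """Build a kd-tree as nested dicts; leaves hold small point lists."""
    if len(points) <= leaf_size:
        return {"leaf": list(points)}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis, "split": pts[mid][axis],
            "left": build(pts[:mid], depth + 1, leaf_size),
            "right": build(pts[mid:], depth + 1, leaf_size)}

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def enn(node, q, r, out=None):
    """Collect all points within radius r of q."""
    if out is None:
        out = []
    if "leaf" in node:
        out.extend(p for p in node["leaf"] if dist2(p, q) <= r * r)
        return out
    d = q[node["axis"]] - node["split"]
    near, far = ((node["left"], node["right"]) if d < 0
                 else (node["right"], node["left"]))
    enn(near, q, r, out)
    # The far side can only contain hits if the query ball crosses the split plane.
    if abs(d) <= r:
        enn(far, q, r, out)
    return out
```

Calling `enn(build(points), q, r)` returns the same set as a brute-force scan, but skips every subtree whose half-space the query ball does not reach.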

Spatial Search

- kNN search similar to eNN search:
  - Start with infinite radius
  - Sort leaf points into priority queue
  - Shrink radius with every point sorted
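The three steps can be sketched as follows, using Python's `heapq` as the priority queue. The node layout and the names `build`, `knn`, and `nearest` are assumptions for this example. In this sketch the search radius stays infinite until the queue holds k points and shrinks from then on.

```python
import heapq
import math

def build(points, depth=0, leaf_size=4):
    """Build a kd-tree as nested dicts; leaves hold small point lists."""
    if len(points) <= leaf_size:
        return {"leaf": list(points)}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis, "split": pts[mid][axis],
            "left": build(pts[:mid], depth + 1, leaf_size),
            "right": build(pts[mid:], depth + 1, leaf_size)}

def knn(node, q, k, heap=None):
    """Fill a max-heap (negated squared distances) with the k nearest points."""
    if heap is None:
        heap = []
    if "leaf" in node:
        for p in node["leaf"]:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if len(heap) < k:
                heapq.heappush(heap, (-d2, p))
            elif d2 < -heap[0][0]:
                heapq.heapreplace(heap, (-d2, p))  # radius shrinks here
        return heap
    d = q[node["axis"]] - node["split"]
    near, far = ((node["left"], node["right"]) if d < 0
                 else (node["right"], node["left"]))
    knn(near, q, k, heap)
    # Current squared search radius: infinite until k candidates are queued.
    radius2 = -heap[0][0] if len(heap) == k else math.inf
    if d * d <= radius2:
        knn(far, q, k, heap)
    return heap

def nearest(tree, q, k):
    """Return the k nearest points, closest first."""
    return [p for _, p in sorted(knn(tree, q, k), reverse=True)]
```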

Coherent Neighbor Cache (eNN)

- Find neighbors in slightly bigger radius
- Re-use result for spatially close query
- Re-use condition: formula not preserved in this transcript
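One way to make the re-use idea concrete is sketched below. The condition used here follows from the triangle inequality: a ball of radius r around a new query q lies inside the cached ball of radius R around the cached query c whenever |q - c| + r <= R. This is offered as a plausible reconstruction, not as the formula from the slide. The class name `EnnCache`, the enlargement factor `alpha`, and the `search` callback are assumptions for illustration.

```python
import math

class EnnCache:
    """Single-entry cache for radius queries (illustrative sketch)."""

    def __init__(self, search, alpha=0.25):
        self.search = search    # callable (q, radius) -> list of points
        self.alpha = alpha      # assumed relative enlargement of the radius
        self.center = None
        self.radius = 0.0
        self.neighbors = []
        self.hits = 0

    def query(self, q, r):
        reusable = (self.center is not None
                    and math.dist(q, self.center) + r <= self.radius)
        if reusable:
            self.hits += 1
        else:
            # Miss: search a slightly bigger ball so nearby queries can re-use it.
            self.center = q
            self.radius = (1.0 + self.alpha) * r
            self.neighbors = self.search(q, self.radius)
        # The cached set is a superset; filter it down to the exact answer.
        return [p for p in self.neighbors if math.dist(p, q) <= r]
```

A larger `alpha` raises the hit rate for coherent query sequences but makes each miss more expensive, because more candidates must be fetched and filtered.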

Coherent Neighbor Cache (kNN, exact)

- Find (k+1) neighbors
- Re-use result for spatially close query
- Re-use condition: formula not preserved in this transcript

Coherent Neighbor Cache (kNN, approximation)

- Approximation error e
- Enlarge radius
- Re-use condition: formula not preserved in this transcript
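A re-use test for cached kNN results can be derived from the triangle inequality; the version below is a reconstruction under that reasoning and may differ from the formula on the slide. Let s be the distance between the new query and the cached query, and d_k and d_k1 the cached distances to the k-th and (k+1)-th neighbor. Every cached neighbor is at most d_k + s from the new query, and every other point is at least d_k1 - s away. The cached set is therefore still exact if d_k + s <= d_k1 - s, and acceptable with tolerance `eps` if d_k + s <= (1 + eps)(d_k1 - s). The function name `can_reuse` and its parameters are hypothetical.

```python
import math

def can_reuse(q, cached_q, d_k, d_k1, eps=0.0):
    """Decide whether a cached k-neighborhood may answer query q.

    eps = 0 demands the exact k nearest neighbors; eps > 0 accepts a set in
    which no returned point is more than (1 + eps) times farther from q than
    any point that was left out.
    """
    s = math.dist(q, cached_q)
    # Rearranged form of: d_k + s <= (1 + eps) * (d_k1 - s)
    return (2.0 + eps) * s <= (1.0 + eps) * d_k1 - d_k
```

With d_k = 1.0 and d_k1 = 1.4, a query shifted by 0.25 fails the exact test, since 0.5 exceeds 0.4, but passes with `eps=0.5`, which is how a small tolerated error turns misses into hits.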


Coherent Neighbor Cache

- Eight cached neighborhoods
- Problem: parallel queries in kd-tree module
- Interleave spatially similar queries


Kd-Tree Traversal


NodeRecurse

- Kd-tree structure on chip
- 16 threads
- Pipelining and multi-threading


Stacks

- 16 stacks
- Parallel read/write
- Bounded in depth
- 6 bytes per thread per recursion
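To show what a depth-bounded stack replaces, here is a software sketch of a radius search that uses an explicit stack instead of recursive calls. The bound `MAX_DEPTH = 32` is a hypothetical value chosen for the example; the slides only say the stacks are bounded. With that assumed bound, 16 stacks at 6 bytes per entry would need 16 * 32 * 6 = 3072 bytes. The node layout and function names are also assumptions.

```python
MAX_DEPTH = 32                    # hypothetical bound, not from the slides
STACK_BYTES = 16 * MAX_DEPTH * 6  # 16 stacks, 6 bytes per entry -> 3072

def build(points, depth=0, leaf_size=2):
    """Build a kd-tree as nested dicts; leaves hold small point lists."""
    if len(points) <= leaf_size:
        return {"leaf": list(points)}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis, "split": pts[mid][axis],
            "left": build(pts[:mid], depth + 1, leaf_size),
            "right": build(pts[mid:], depth + 1, leaf_size)}

def enn_iterative(root, q, r):
    """Radius search with an explicit, bounded stack (no recursion)."""
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Descend to a leaf, remembering far children that may hold hits.
        while "leaf" not in node:
            d = q[node["axis"]] - node["split"]
            near, far = ((node["left"], node["right"]) if d < 0
                         else (node["right"], node["left"]))
            if abs(d) <= r:
                if len(stack) >= MAX_DEPTH:
                    raise OverflowError("traversal stack exceeded its bound")
                stack.append(far)
            node = near
        out.extend(p for p in node["leaf"]
                   if sum((a - b) ** 2 for a, b in zip(p, q)) <= r * r)
    return out
```

Each level of the descent pushes at most one entry, so the stack never grows beyond the height of the tree, which is what makes a fixed bound workable.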


Leaf

- 16 parallel priority queues (1-cycle ops)
- Queues store pointers and distances
- Bandwidth bottleneck
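A small software model of such a queue is sketched below. It keeps the k best (distance, pointer) pairs in sorted slots. The slides do not describe the circuit, so the sorted-slot design, the class name `BoundedQueue`, and its methods are assumptions; the loop here stands in for comparisons that hardware could perform on all slots at once.

```python
class BoundedQueue:
    """Keeps the k smallest (distance, pointer) pairs in ascending order."""

    def __init__(self, k):
        self.k = k
        self.slots = []

    def insert(self, dist, ptr):
        # Find the first slot whose distance exceeds the new one.
        pos = len(self.slots)
        while pos > 0 and self.slots[pos - 1][0] > dist:
            pos -= 1
        self.slots.insert(pos, (dist, ptr))  # later entries shift by one
        del self.slots[self.k:]              # the entry shifted out is dropped

    def radius(self):
        """Current search radius: infinite until the queue is full."""
        return self.slots[-1][0] if len(self.slots) == self.k else float("inf")
```

Storing a pointer (here an index) rather than the point itself keeps each slot small; the point data is fetched only when needed, which is consistent with memory bandwidth being the limiting factor.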


Processing Module

- Multithreaded quad-port bank of 16 registers
- 128 threads
- Programmability using FPGA-technology


Further Data

- Implemented on two FPGAs
  - 64-bit DDR DRAM
  - Interconnection: no overhead

- Resource usage (registers and LUTs)
  - Virtex 2 Pro 100 (kNN): 26% registers, 38% LUTs
  - Virtex 2 Pro 70 (MLS): 47% registers, 52% LUTs

- Clock frequency: 75 MHz


Applications

- Tested on various applications
- PCI interface of prototype slow

- [Weyrich et al. 04]

- [Adams et al. 07]


Results kNN

Figure (plot not reproduced): kNN performance, number of queries versus number of neighbors. Clock labels: 75 MHz, 2200 MHz, 1200 MHz. Annotated factors relative to FPGA (x1): CPU x1.1 to x1.5, CUDA x1.6 to x4, CUDA w/o sort x3.1 to x4.0, ASIC estimate at 500 MHz x6.6.

Results kNN

- Small hardware footprint
- FPGA slightly slower
- At a realistic clock frequency (ASIC estimate, 500 MHz), prototype faster than CPU/GPU

Results MLS

- FPGA faster than CPU
- kNN bottleneck
  - FPGA
  - GPU

Figure (plot not reproduced): MLS performance, number of queries versus number of neighbors. Clock labels: 75 MHz, 2200 MHz, 1200 MHz. Annotated factors relative to FPGA (x1): MLS CPU x0.4, MLS CUDA x3.8.

Coherent Neighbor Cache

Figure (plot not reproduced): number of queries versus level of coherence, with curves for CPU (e=0.1), FPGA (e=0.1), and FPGA (exact).

Results Approximation Error (MLS projection)

Figure (plot not reproduced): MLS error versus approximation e, compared with no approximation.

Figure (plot not reproduced): cache hits versus approximation e.

Approximation Error (visual)

- Coherent Neighbor Cache:
- Not optimal for exact queries
- Approximate queries
- Can be tolerated in most cases
- Greatly increases performance
- Even for small approximations


Conclusion

- Novel hardware architecture for
  - Nearest-neighbor searches
  - Generic meshless processing operators

- Cache exploiting spatial coherence
- Good performance considering resources
- Possible GPU integration


Future Work

- Programmable data structure
- Support different data structures
- Programmability in data structure
- Construction on-chip

- ‘Real’ programmability in point processing module
