# GPU Computational Geometry

##### Presentation Transcript

1. By Shawn Brown - April 3rd, 2007, CS790-058 GPU Computational Geometry

2. Introduction to Computational Geometry • 3 papers in the area Overview

3. Where am I? How do I get there? • mapping • Where is the closest post office? • Nearest neighbor search • Find all the movie theaters in a 10 mile square. • Range queries • Geometric Problems • Think of problem & solution in geometric terms • Data structures & algorithms follow from this approach Computational Geometry

4. Computer Graphics • Robotics (motion planning) • Geographic Information Systems (mapping) • CAD/CAM (design, manufacturing) • Molecular Modeling • Pattern Recognition • Databases (queries) • AI (Path finding) • Etc… CG Application Areas

5. Geometric Reasoning • Vertices, lines, polygons, half-planes, simplices, arrangements, connectedness, graph theory, etc. • Normal CS data structures & algorithms • Applied in geometric context • Backwards Analysis • Look at algorithm in reverse order to make proofs • At current step (final step), how did I get here? • Randomization techniques • Randomly pick next object to work on from set • Robustness & Degeneracies • Will algorithm work correctly under numerical accuracy constraints? • Will algorithm work correctly for coincident, collinear, coplanar, redundant data, etc.? Some broad themes

6. Convex hulls • Polygon Triangulation • Line segment intersection • Linear Programming • Minimum enclosing region (Disc, Sphere, box) • Range Searching • KD-Trees, Range Trees, Partition Trees, Simplex Trees, Cutting trees, etc. • Point Location • Trapezoidal Maps CG Data Structures & Algorithms

7. Voronoi Diagrams • Delaunay Triangulation (dual of Voronoi) • Arrangements and Duality • Windowing (Rectangle query) • Binary Space Partitions (BSPs) • Minkowski Sums (Motion Planning) • Quad Trees • Visibility Graphs (shortest path) More data structures & Algorithms

8. Fixed size memory • Upper bound on amount of data handled • Works best on standalone objects • Each object handled has very few dependencies on neighbors • Works best on memory-efficient data • Cache-coherent memory access • Coalesce memory accesses • Regular grids better than irregular meshes • Neighbor dependencies as predictable patterns • Works best on multiple objects in parallel • Data structures & algorithms need to support this • Works poorly on algorithms with dependencies on previous steps • Avoid comparisons between objects and levels • Works best on algorithms with high arithmetic intensity • High cost of I/O vs. compute power GPU Limitations

9. Data represented on regular grids (texture maps) • Data access patterns are regular and predictable • Data has few dependencies • Each object is independent of its neighbors • Any dependencies are read-only, predictable, cache coherent • Dependencies across multiple iterations are regular, predictable, and cache coherent • Low bandwidth I/O • Lots of compute operations per I/O operation GPU Solutions Data Structures & Algorithms

10. Good Fits for GPU • Voronoi Diagrams, Distance Fields • Poor Fits for GPU • Binary searches, tree searches (KD-Trees, etc.) • Can’t parallelize (next compare depends on the result of the previous compare) • Unpredictable, cache-incoherent access patterns across multiple data objects • Traditional Sorting • Bitonic sort is the exception • Reductions (from ‘n’ objects to single answer) GPU Vs. CPU
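
As an illustration of why bitonic sort is singled out above, here is a minimal CPU-side sketch in Python (not taken from the slides). The point it tries to show is that the compare-and-swap partner of each element depends only on the indices `i`, `j`, and `k`, never on the data, so every pass of the inner loop could in principle run as one parallel step. The function name `bitonic_sort` and the power-of-two length restriction are assumptions made for this sketch.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two using a bitonic network.

    Illustrative sketch: the inner 'for i' loop has no dependencies between
    iterations, which is what makes the scheme map well onto parallel hardware.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a non-zero power of two")
    k = 2
    while k <= n:                      # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:                  # distance between compared elements
            for i in range(n):         # each i is independent of the others
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

For example, `bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4])` performs the same sequence of comparisons as it would for any other eight-element input; only the swaps differ.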

11. “Generic Mesh Refinement on GPU”, by Tamy Boubekeur and Christophe Schlick, 2005 • “Dynamic LOD on GPU” by Junfeng Ji, Enhua Wu, Sheng Li, and Xuehui Liu, 2005 • “Isosurface Computation Made Simple: Hardware Acceleration, Adaptive Refinement and Tetrahedral Stripping” by Valerio Pascucci, Joint Eurographics - IEEE TVCG Symposium on Visualization (VisSym), 2004, p. 293-300. 3 Research Papers

12. “Generic Mesh Refinement on GPU” by Tamy Boubekeur and Christophe Schlick, Proceedings of SIGGRAPH/Eurographics Graphics Hardware, 2005, ACM Press 1st Paper

13. Geometry Mesh Refinement • Displacement Mapping • Subdivision Surfaces • Refinement Typically done on CPU • GPU Pipeline optimized for rendering millions of triangles from vertex lists • But lack of support for geometry generation on GPU • Goal: How to do Mesh Refinement on GPU Mesh Refinement - Intro

14. A texture (height map) is used to displace underlying geometry. • Displacement done in direction of local surface normal. • Re-tessellation of original polygons into micro-polygons • Example: Pixar’s REYES on Renderman Displacement mapping *from Wikipedia.com

15. The limit of an infinite refinement process • Start with an initial polyhedral mesh, G0=(V0, E0, F0) • Subdivide via a set of rules, Gn+1 = Subdivide( Gn ) • Repeat subdivision step until refined polyhedral mesh approximates desired smooth surface. • Algorithm (One Refinement step) • New Edge Vertices (by weighting rules) • Remesh each original face (new edges, new faces) • Perturb original vertices (by weighting rules) SUBDIVISION

16. Loop Subdivision: New Vertex Weighting Rules [figure: edge mask for an interior edge; edge mask for a border edge]

17. Loop Subdivision: Remesh [figure: create new edge vertices, then remesh with new edges and new faces]

18. Loop Subdivision: Perturb Original Vertex Rules [figure: vertex mask for ordinary valence; vertex mask for extraordinary valence]

19. Loop Subdivision: Refinement [figure: Gn = current mesh → create new edges and remesh → perturb original vertices → Gn+1 = subdivided mesh]
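
The masks on the slides above were figures and did not survive in the transcript. As a hedged reconstruction, the sketch below uses the commonly published Loop weights: 3/8 and 1/8 for an interior edge, 1/2 and 1/2 for a border edge, and Loop's original β formula for the vertex rule. These values are assumptions in the sense that the transcript does not state which variant the slides showed; the function names are hypothetical, and the border-vertex rule is omitted.

```python
import math

def loop_edge_point(a, b, c=None, d=None):
    """New vertex on edge (a, b).

    c and d are the vertices opposite the edge in its two adjacent triangles.
    Leave them out for a border edge, which only has one adjacent triangle.
    """
    if c is None or d is None:
        # Border edge mask: simple midpoint.
        return tuple(0.5 * (x + y) for x, y in zip(a, b))
    # Interior edge mask: 3/8 for the edge endpoints, 1/8 for the opposite vertices.
    return tuple(0.375 * (x + y) + 0.125 * (z + w)
                 for x, y, z, w in zip(a, b, c, d))

def loop_beta(valence):
    """Per-neighbor weight for an interior vertex (Loop's original formula)."""
    inner = 0.375 + 0.25 * math.cos(2.0 * math.pi / valence)
    return (0.625 - inner * inner) / valence

def loop_vertex_point(v, neighbors):
    """Perturbed position of an original interior vertex v given its one-ring."""
    n = len(neighbors)
    beta = loop_beta(n)
    return tuple((1.0 - n * beta) * vc + beta * sum(nb[i] for nb in neighbors)
                 for i, vc in enumerate(v))
```

For the ordinary case (valence 6) `loop_beta` evaluates to 1/16, so the vertex keeps 10/16 of its own position. Note that both rules need the neighbors of each vertex or edge, which is the adjacency information the next slide identifies as the obstacle on the GPU.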

20. Traditional subdivision schemes (Loop) require dynamic adjacency information to implement. • Adjacency information is cache coherent in at most one direction (vertical or horizontal) for both reads and writes • Works best on CPU • Works poorly on GPU • Lack of cache coherency • Hard to parallelize Previous Schemes

21. Entire mesh must fit in GPU memory • LOD rendering means n copies of different size meshes must be stored in memory • Dynamic Meshes must be updated on each frame by CPU • Conclusion: Use/update coarse meshes on CPU, generate refined meshes on GPU to desired LOD. GPU LIMITATIONS

22. Main Reason: Overcome Bandwidth Bottleneck • CPU approach: • Load coarse mesh on CPU (thousands of polygons) • Optionally load height map (for displacement mapping) • Generate refined mesh on CPU (millions of polygons) • Transfer refined mesh to GPU (high bandwidth) • Render refined mesh on GPU • GPU approach: • Load coarse mesh on CPU (thousands of polygons) • Transfer coarse mesh to GPU (low bandwidth) • Optionally transfer height map (for displacement mapping) • Generate refined mesh on GPU (millions of polygons) • Render refined mesh on GPU • Secondary Reason: Offload workload from CPU onto GPU JUSTIFICATION

23. Generic Refinement Pattern (RP - template): • Store RP as vertex buffer on GPU • Use coarse triangle T as input to vertex shader • Update and Draw virtual triangles of RP from attributes of input Triangle T Proposed SOLUTION
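
To make the idea of a refinement pattern concrete, here is a small Python sketch that builds one possible template: a uniform split of a triangle into n × n sub-triangles, with each vertex stored as a barycentric triple (w, u, v). The uniform layout, the function name `uniform_refinement_pattern`, and the parameter `n` are assumptions for illustration; the paper's patterns may be organized differently.

```python
def uniform_refinement_pattern(n):
    """Build a refinement pattern that splits a triangle into n*n sub-triangles.

    Returns (vertices, triangles): vertices are barycentric triples (w, u, v)
    with w = 1 - u - v, and triangles are index triples into the vertex list.
    """
    index = {}
    vertices = []
    for i in range(n + 1):              # i counts steps along u
        for j in range(n + 1 - i):      # j counts steps along v
            index[(i, j)] = len(vertices)
            u, v = i / n, j / n
            vertices.append((1.0 - u - v, u, v))

    triangles = []
    for i in range(n):
        for j in range(n - i):
            # Triangle pointing the same way as the coarse triangle.
            triangles.append((index[(i, j)], index[(i + 1, j)], index[(i, j + 1)]))
            # Inverted triangle, present everywhere except along the last row.
            if i + j < n - 1:
                triangles.append((index[(i + 1, j)],
                                  index[(i + 1, j + 1)],
                                  index[(i, j + 1)]))
    return vertices, triangles
```

Such a pattern would be built once and kept in a vertex buffer; since it contains no actual positions, the same buffer can be reused for every coarse triangle.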

24. Render(Mesh M) • For each coarse triangle T in M do • Place triangle attributes TA as inputs to vertex shader • Draw parameterized RP template instead of T Algorithm

25. Need to map virtual vertices of pattern onto actual attributes (<x,y,z>, <u,v>, etc.) of triangle T • Store virtual coordinates of pattern vertices V as barycentric triple (u,v,w). • Vwuv = {w,u,v} with w = 1-u-v • Given {P0, P1, P2} as actual positions of T • Vpos = V.w * P0 + V.u * P1 + V.v * P2 • Other triangle attributes (u,v, colors, etc.) can be generated in a similar manner from virtual vertices MORE Details
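
The mapping Vpos = V.w * P0 + V.u * P1 + V.v * P2 can be sketched on the CPU as follows. This is only an illustration in Python of what the vertex shader would compute per pattern vertex; the helper names `make_virtual_vertex` and `map_to_triangle` are hypothetical.

```python
def make_virtual_vertex(u, v):
    """Store a pattern vertex as the barycentric triple (w, u, v), w = 1 - u - v."""
    return (1.0 - u - v, u, v)

def map_to_triangle(virtual_vertex, a0, a1, a2):
    """Blend one per-corner attribute (position, texture coords, color, ...).

    a0, a1, a2 are the attribute values at the three corners of the coarse
    triangle T; they can have any number of components.
    """
    w, u, v = virtual_vertex
    return tuple(w * x + u * y + v * z for x, y, z in zip(a0, a1, a2))
```

The same function serves for positions and for any other attribute, which is what the last bullet above means by "generated in a similar manner".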

26. Given coarse triangle T with attributes TA • Position, texture coords, normals, etc. • <{P0,P1,P2}, {u0,u1,u2}, {v0,v1,v2}, {N0,N1,N2}> • For each vertex V in RP template • Interpolate position Pv = {x,y,z} from {P0,P1,P2} • Interpolate texture values Huv = {u,v} • Interpolate normal values Nv = {nx,ny,nz} • Use texture coords (Huv) to get value ‘h’ in height map • Compute displaced position • Dv = Pv + h*Nv GPU Displacement MAPPING
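
The per-vertex steps listed above can be sketched in Python as below. Two things are assumptions rather than statements from the slide: the interpolated normal is renormalized before use, and the height map is represented by a caller-supplied function `height_at(u, v)` instead of a real texture fetch. Texture coordinates are passed as one (u, v) pair per corner for brevity.

```python
import math

def interpolate(bary, a0, a1, a2):
    """Barycentric blend of one per-corner attribute; bary is (w, u, v)."""
    w, u, v = bary
    return tuple(w * x + u * y + v * z for x, y, z in zip(a0, a1, a2))

def normalize(n):
    """Scale a vector to unit length (assumed step, not stated on the slide)."""
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

def displaced_position(bary, positions, normals, uvs, height_at):
    """Compute Dv = Pv + h * Nv for one vertex of the refinement pattern."""
    p = interpolate(bary, *positions)            # Pv
    n = normalize(interpolate(bary, *normals))   # Nv
    u, v = interpolate(bary, *uvs)               # Huv
    h = height_at(u, v)                          # height map lookup
    return tuple(pc + h * nc for pc, nc in zip(p, n))
```

With a flat triangle in the z = 0 plane and all normals pointing along +z, the result is simply the interpolated position lifted by the looked-up height.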

27. Texture map access in the vertex shader can be slow (especially if accesses are not coherent). • Use a parameter-driven function instead, which can be computed quickly in the vertex shader • D = P + (a * sin(f * ||P||) * N) Procedural DISPLACEMENT Mapping
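
A direct Python transcription of the formula D = P + (a * sin(f * ||P||) * N) is given below as a sketch. Reading `a` as an amplitude and `f` as a frequency is an interpretation of the slide's symbols, and the function name is hypothetical.

```python
import math

def procedural_displacement(p, n, a, f):
    """Displace point p along normal n by a * sin(f * ||p||).

    No texture lookup is needed: the offset depends only on the distance of p
    from the origin and on the two scalar parameters a and f.
    """
    radius = math.sqrt(sum(c * c for c in p))
    offset = a * math.sin(f * radius)
    return tuple(pc + offset * nc for pc, nc in zip(p, n))
```

Because the offset is a pure function of the interpolated position, it avoids the incoherent texture accesses the slide warns about.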

28. Store a set of larger and larger refinement patterns on GPU = {RP0, RP1, …, RPn} • Use LOD techniques to pick appropriate LOD pattern for refinement and rendering LEVEL of DETAIL (LOD)

29. No true subdivision scheme support • No geometric continuity guarantees across shared edges of coarse triangles • LOD scheme is not adaptive and exhibits popping artifacts LIMITATIONS TO APPROACH

30. Purely local interpolating refinement scheme • Fast mesh smoothing • Provides visual smoothness • Despite lack of geometric continuity across edges • Generate triangle normals using linear or quadratic interpolation (enhanced triangle definition) • Offers results similar to Modified Butterfly subdivision scheme Curved PN Triangles

31. Environment: P4 3.0 GHz, Nvidia Quadro FX 4400 PCIe, MS Windows XP, running on OpenGL • Conclusion: Frame rates are equivalent, the number of vertices on the bus is greatly reduced, and the CPU is freed up to work on tasks other than refinement. PERFORMANCE

32. Simple Vertex Shader Method for low cost tessellation of meshes on GPU • At cost of linear interpolation of 3 original triangle attributes for each virtual triangle attribute in pattern • Generic and Economic PN-Triangle implementation on GPU • Reduced bandwidth on graphics bus • Low level constant amount transferred regardless of target refinement (use larger templates for more refined results) • CPU freed up • to work on other tasks than refinement CONCLUSIONS

33. “Dynamic LOD on GPU” by Junfeng Ji, Enhua Wu, Sheng Li, and Xuehui Liu, Proceedings of Computer Graphics International (CGI), 2005, IEEE Computer Society Press. 2nd Paper

34. Modern datasets are getting too large to visualize at interactive rates • Level of Detail (LOD) methods are used to greatly reduce the amount of geometry that needs to be visualized • Because of complexity, LOD methods are traditionally performed on the CPU • This paper proposes a GPU LOD technique using shaders Introduction

35. Irregular Meshes • Progressive Meshes, H. Hoppe, 1996 • Hierarchical Dynamic Simplification, D. Luebke, 1997 • Regular Meshes • Multi-resolution Analysis of Arbitrary Meshes, Eck et al., 1995 • Digital Elevation Models (DEMs) + LOD Quad Trees, Lindstrom 1996 & Pajarola 1998 • Geometry Image Meshes, Gu & Hoppe et al., 2002 • Extended to polycube maps by Tarini et al., 2004 • Point Techniques • QSplat, Rusinkiewicz, 2000 PRIOR WORK

36. Progressive Meshes [figure: mesh sequence M0 (150 faces), M1 (152 faces), M175 (500 faces), Mn (13,546 faces); edge collapse ecol(vs, vt, vs’) moves toward M0 and vertex split vspl(vs, vl, vr, vs’, vt’, …) moves toward Mn]

37. Entire object represented as single vertex tree • Start at base level • Collapse group of vertices into parent representative vertex (proxy) • Render at appropriate LOD by traversing to level of tree based on current viewing parameters Hierarchical Dynamic Simplification

38. Geometry Image Meshes [figure: pipeline of cut, parameterize, sample on a regular grid, and render; the geometry image stores RGB = XYZ]

39. GIMs have complex distorted parameterizations • Approximate geometry by polycube map • Project geometry onto polycube • Store each face of polycube in texture atlas Polycube Maps [figure: texture atlas]

40. Perform LOD geometry selection dynamically on GPU • GPU limitations push us towards a regular representation of geometry • For max efficiency, data structure must support parallel algorithms. GOAL – GPU LOD Geometry

41. Use Geometry Image Mesh (GIM) as underlying data structure. • Regular structure (texture map) works very well on GPU. • Use Polycube texture atlas for complex objects • Add LOD support via a modified Quad Tree data structure called P-QuadTree. Proposed Solution

42. Creation • LOD Atlas Texture • Rendering • Select appropriate LOD level • Render on GPU OVERVIEW of APPROACH

43. Generate GIM Atlas from 3D model • Generate LOD atlas from GIM • Generate additional texture maps • Normal Map • LOD metrics • Index map (parent lookup) CREATION

44. Generate polycube from geometry object using semi-automatic technique from Tarini et al. • Cut cube faces along edges to get individual textures • Pack face textures into square or rectangular texture • Sample texture atlas on regular grid • Create GIM from projected samples CREATE GIM ATLAS

45. For each chart, texture must be (2^m+1)×(2^m+1) • Pad texture with null samples • Construct QuadTree top down using GPU kernel • Each node represents 3x3 of vertices • Uses Restricted QuadTree triangulation • Stack all levels of LOD quadtree in LOD Atlas • Can be done in rectangle with ratio 1:1.5 CREATE LOD QUADTREE ATLAS
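
The 1:1.5 ratio can be checked with a small Python sketch. It rests on two assumptions that the slide does not spell out: a chart with 2^m + 1 vertices per side yields 2^(m-1) nodes per side at the finest level (each node covering 3x3 vertices and sharing its border with neighbors), and coarser levels are stacked in a column beside the finest level, mipmap-style. The function names are hypothetical and the paper's actual layout may differ.

```python
def quadtree_level_sizes(m):
    """Nodes per side at each quadtree level, finest first, for a chart with
    2**m + 1 vertices per side (assumes each node covers 3x3 vertices)."""
    return [2 ** k for k in range(m - 1, -1, -1)]

def pack_levels(m):
    """Place the finest level at the origin and stack coarser levels beside it.

    Returns (placements, width, height), where each placement is
    (x, y, side) in node units.
    """
    sizes = quadtree_level_sizes(m)
    finest = sizes[0]
    placements = [(0, 0, finest)]
    y = 0
    for side in sizes[1:]:
        placements.append((finest, y, side))   # column to the right
        y += side
    width = finest + (sizes[1] if len(sizes) > 1 else 0)
    height = finest
    return placements, width, height
```

For m = 4 the levels are 8, 4, 2, and 1 nodes per side and fit in a 12 x 8 rectangle, since the coarser levels together are never taller than the finest one.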

46. Avoid problems with cracks at T-intersections • Compute error at each node • Parent error always greater than children • Constrain difference in error between neighboring vertices to never be greater than one • Check 2 nephews as well (cost of 2 texture lookups) RESTRICTED QUADTREE TRIANGULATION

47. Each node represents 3x3 vertices and 8 triangles • Easily rendered as triangle fan • Bounding sphere around 9 vertices • Not much information in paper on how they compute normals or normal cone… LOD NODES
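
A minimal Python sketch of how a 3x3 node yields 8 triangles as a fan: start at the center vertex and walk the 8 surrounding grid positions, repeating the first one to close the fan. The grid coordinates, the `step` parameter (vertex spacing at the node's level), and the function names are assumptions for illustration.

```python
# Offsets of the 8 outer vertices, walked in one consistent direction.
RING = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]

def node_triangle_fan(cx, cy, step=1):
    """Return the fan vertex list for the node centered at grid cell (cx, cy).

    The first entry is the center; the ring follows, and its first vertex is
    repeated at the end so the fan closes.
    """
    fan = [(cx, cy)]
    fan.extend((cx + dx * step, cy + dy * step) for dx, dy in RING)
    fan.append(fan[1])
    return fan

def fan_to_triangles(fan):
    """Expand a fan into explicit triangles (center, ring[i], ring[i + 1])."""
    center = fan[0]
    return [(center, fan[i], fan[i + 1]) for i in range(1, len(fan) - 1)]
```

Ten fan vertices produce the 8 triangles mentioned on the slide, and every triangle shares the node's center vertex.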

48. Cutting and Packing [figure: cutting and packing shown for rectangular charts and for square charts]

49. GIM ATLAS & LOD ATLAS

50. Geometry Map (GIM) (x,y,z) on regular grid • Center position of node • LOD Parameter map • Error (used for LOD selection) • Normal cone (used for back face culling) • bounding sphere radius (used for backface culling) • Normal Map (N.x,N.y,N.z) • Normal at center position of node • Index Map • Parent node lookup 4 Texture maps required