## GPU Computational Geometry

By Shawn Brown - April 3rd, 2007, CS790-058

### Overview

- Introduction to Computational Geometry
- 3 Papers in the area

### Computational Geometry

- Where am I? How do I get there?
  - Mapping
- Where is the closest post office?
  - Nearest neighbor search
- Find all the movie theaters in a 10 mile square.
  - Range queries
- Geometric Problems
  - Think of problem & solution in geometric terms
  - Data structures & algorithms follow from this approach

### CG Application Areas

- Computer Graphics
- Robotics (motion planning)
- Geographic Information Systems (mapping)
- CAD/CAM (design, manufacturing)
- Molecular Modeling
- Pattern Recognition
- Databases (queries)
- AI (path finding)
- Etc.

### Some Broad Themes

- Geometric Reasoning
  - Vertices, lines, polygons, half-planes, simplices, arrangements, connectedness, graph theory, etc.
- Normal CS data structures & algorithms
  - Applied in geometric context
- Backwards Analysis
  - Look at algorithm in reverse order to make proofs
  - At current step (final step), how did I get here?
- Randomization techniques
  - Randomly pick next object to work on from set
- Robustness & Degeneracies
  - Will algorithm work correctly under numerical accuracy constraints?
  - Will algorithm work correctly for coincident, collinear, coplanar, redundant data, etc.?

### CG Data Structures & Algorithms

- Convex hulls
- Polygon Triangulation
- Line segment intersection
- Linear Programming
  - Minimum enclosing region (disc, sphere, box)
- Range Searching
  - KD-Trees, Range Trees, Partition Trees, Simplex Trees, Cutting Trees, etc.
- Point Location
  - Trapezoidal Maps

### More Data Structures & Algorithms

- Voronoi Diagrams
- Delaunay Triangulation (dual of Voronoi)
- Arrangements and Duality
- Windowing (rectangle query)
- Binary Space Partitions (BSPs)
- Minkowski Sums (motion planning)
- Quad Trees
- Visibility Graphs (shortest path)

### GPU Limitations

- Fixed size memory
  - Upper bound on amount of data handled
- Works best on stand-alone objects
  - Each object handled has very few dependencies on neighbors
- Works best on memory efficient data
  - Cache coherent memory access
  - Coalesce memory accesses
  - Regular grids better than irregular meshes
  - Neighbor dependencies as predictable patterns
- Works best on multiple objects in parallel
  - Data structures & algorithms need to support this
- Works poorly on algorithms with dependencies on previous steps
  - Avoid comparisons between objects and levels
- Works best on algorithms with high arithmetic intensity
  - High cost of I/O vs. compute power

### GPU Solutions: Data Structures & Algorithms

- Data represented on regular grids (texture maps)
  - Data access patterns are regular and predictable
- Data has few dependencies
  - Each object is independent of its neighbors
  - Any dependencies are read only, predictable, cache coherent
  - Dependencies across multiple iterations are regular, predictable, and cache coherent
- Low bandwidth I/O
  - Lots of compute operations per I/O operation

### GPU vs. CPU

- Good Fits for GPU
  - Voronoi Diagrams, Distance Fields
- Poor Fits for GPU
  - Binary searches, tree searches (KD-Trees, etc.)
    - Can't parallelize (next compare depends on the result of the previous compare)
    - Unpredictable, cache-incoherent access patterns across multiple data objects
  - Traditional sorting
    - Bitonic sort is the exception
  - Reductions (from n objects to a single answer)

### 3 Research Papers

- “Generic Mesh Refinement on GPU” by Tamy Boubekeur and Christophe Schlick, 2005
- “Dynamic LOD on GPU” by Junfeng Ji, Enhua Wu, Sheng Li, and Xuehui Liu, 2005
- “Isosurface Computation Made Simple: Hardware Acceleration, Adaptive Refinement and Tetrahedral Stripping” by Valerio Pascucci, Joint Eurographics - IEEE TVCG Symposium on Visualization (VisSym), 2004, p. 293-300.

### 1st Paper

“Generic Mesh Refinement on GPU” by Tamy Boubekeur and Christophe Schlick, Proceedings of SIGGRAPH/Eurographics Graphics Hardware, 2005, ACM Press

### Mesh Refinement - Intro

- Geometry Mesh Refinement
  - Displacement Mapping
  - Subdivision Surfaces
- Refinement typically done on CPU
- GPU pipeline optimized for rendering millions of triangles from vertex lists
  - But lack of support for geometry generation on GPU
- Goal: How to do mesh refinement on GPU

### Displacement Mapping

- A texture (height map) is used to displace underlying geometry.
- Displacement done in direction of local surface normal.
- Re-tessellation of original polygons into micro-polygons
- Example: Pixar’s REYES on RenderMan

*from Wikipedia.com

### Subdivision

- The limit of an infinite refinement process
- Start with an initial polyhedral mesh, $G_0 = (V_0, E_0, F_0)$
- Subdivide via a set of rules, $G_{n+1} = \mathrm{Subdivide}(G_n)$
- Repeat the subdivision step until the refined polyhedral mesh approximates the desired smooth surface.
- Algorithm (one refinement step)
  - New edge vertices (by weighting rules)
  - Remesh each original face (new edges, new faces)
  - Perturb original vertices (by weighting rules)

### Loop Subdivision: New Vertex Weighting Rules

[Figure: edge mask for an interior edge; edge mask for a border edge]

### Loop Subdivision: Remesh

[Figure: create new edge vertices; remesh with new edges and new faces]

### Loop Subdivision: Perturb Original Vertex Rules

[Figure: vertex mask for ordinary valence; vertex mask for extraordinary valence]

### Loop Subdivision: Refinement

[Figure: $G_n$ = current mesh; create new edges and remesh; perturb original vertices; $G_{n+1}$ = subdivided mesh]

### Previous Schemes

- Traditional subdivision schemes (Loop) require dynamic adjacency information to implement.
- Adjacency information is cache coherent in at most one direction (vertical or horizontal) for both reads and writes
- Works best on CPU
- Works poorly on GPU
  - Lack of cache coherency
  - Hard to parallelize

### GPU Limitations

- Entire mesh must fit in GPU memory
- LOD rendering means n copies of different size meshes must be stored in memory
- Dynamic meshes must be updated on each frame by CPU
- Conclusion: Use/update coarse meshes on CPU, generate refined meshes on GPU to desired LOD.

### Justification

- Main Reason: Overcome bandwidth bottleneck
  - CPU approach:
    - Load coarse mesh on CPU (thousands of polygons)
    - Optionally load height map (for displacement mapping)
    - Generate refined mesh on CPU (millions of polygons)
    - Transfer refined mesh to GPU (high bandwidth)
    - Render refined mesh on GPU
  - GPU approach:
    - Load coarse mesh on CPU (thousands of polygons)
    - Transfer coarse mesh to GPU (low bandwidth)
    - Optionally transfer height map (for displacement mapping)
    - Generate refined mesh on GPU (millions of polygons)
    - Render refined mesh on GPU
- Secondary Reason: Offload work from CPU onto GPU

### Proposed Solution

- Generic Refinement Pattern (RP - template):
  - Store RP as vertex buffer on GPU
  - Use coarse triangle T as input to vertex shader
  - Update and draw virtual triangles of RP from attributes of input triangle T

### Algorithm

Render(Mesh M)

- For each coarse triangle T in M do
  - Place triangle attributes TA as inputs to vertex shader
  - Draw parameterized RP template instead of T

### More Details

- Need to map virtual vertices of pattern onto actual attributes (`<x,y,z>`, `<u,v>`, etc.) of triangle T
- Store virtual coordinates of pattern vertices V as barycentric triple (u,v,w).
  - $V_{wuv} = \{w, u, v\}$ with $w = 1 - u - v$
- Given $\{P_0, P_1, P_2\}$ as actual positions of T
  - $V_{pos} = V.w \cdot P_0 + V.u \cdot P_1 + V.v \cdot P_2$
- Other triangle attributes (u,v, colors, etc.) can be generated in a similar manner from virtual vertices

### GPU Displacement Mapping

- Given coarse triangle T with attributes TA
  - Position, texture coords, normals, etc.
  - `<{P0,P1,P2}, {u0,u1,u2}, {v0,v1,v2}, {N0,N1,N2}>`
- For each vertex V in RP template
  - Interpolate position $P_v = \{x, y, z\}$ from $\{P_0, P_1, P_2\}$
  - Interpolate texture values $H_{uv} = \{u, v\}$
  - Interpolate normal values $N_v = \{n_x, n_y, n_z\}$
  - Use texture coords ($H_{uv}$) to get value $h$ in height map
  - Compute displaced position
    - $D_v = P_v + h \cdot N_v$

### Procedural Displacement Mapping

- Texture map access in the vertex shader can be slow (especially if accesses are not coherent).
- Use a parameter-driven function instead, which can be quickly computed in the vertex shader
- $D = P + (a \cdot \sin(f \cdot \lVert P \rVert)) \cdot N$

### Level of Detail (LOD)

- Store a set of larger and larger refinement patterns on GPU = {RP0, RP1, …, RPn}
- Use LOD techniques to pick appropriate LOD pattern for refinement and rendering

### Limitations to Approach

- No true subdivision scheme support
- No geometric continuity guarantees across shared edges of coarse triangles
- LOD scheme is not adaptive and exhibits popping artifacts

### Curved PN Triangles

- Purely local interpolating refinement scheme
- Fast mesh smoothing
- Provides visual smoothness
  - Despite lack of geometric continuity across edges
- Generate triangle normals using linear or quadratic interpolation (enhanced triangle definition)
- Offers results similar to Modified Butterfly subdivision scheme

### Performance

- Environment: P4 3.0 GHz, Nvidia Quadro FX 4400 PCIe, MS Windows XP, running on OpenGL
- Conclusion: Frame rates are equivalent, number of vertices on the bus greatly reduced, CPU freed up to work on tasks other than refinement.

### Conclusions

- Simple vertex shader method for low cost tessellation of meshes on GPU
  - At cost of linear interpolation of 3 original triangle attributes for each virtual triangle attribute in pattern
- Generic and economic PN-Triangle implementation on GPU
- Reduced bandwidth on graphics bus
  - Low, constant amount transferred regardless of target refinement (use larger templates for more refined results)
- CPU freed up
  - To work on tasks other than refinement

### 2nd Paper

“Dynamic LOD on GPU” by Junfeng Ji, Enhua Wu, Sheng Li, and Xuehui Liu, Proceedings of Computer Graphics International (CGI), 2005, IEEE Computer Society Press.

### Introduction

- Modern datasets are getting too large to visualize at interactive rates
- Level of Detail (LOD) methods are used to greatly reduce the amount of geometry that needs to be visualized
- Because of complexity, LOD methods are traditionally performed on the CPU
- This paper proposes a GPU LOD technique using shaders

### Prior Work

- Irregular Meshes
  - Progressive Meshes, H. Hoppe, 1996
  - Hierarchical Dynamic Simplification, D. Luebke, 1997
- Regular Meshes
  - Multi-resolution Analysis of Arbitrary Meshes, Eck et al., 1995
  - Digital Elevation Models (DEMs) + LOD Quad Trees, Lindstrom 1996 & Pajarola 1998
  - Geometry Image Meshes, Gu & Hoppe et al., 2002
    - Extended to polycube maps by Tarini et al., 2004.
- Point Techniques
  - QSplat, Rusinkiewicz, 2000

### Progressive Meshes

[Figure: progressive mesh sequence M0, M1, M175, Mn with 150, 152, 500, and 13,546 faces; edge collapse (ecol) coarsens the mesh and vertex split (vspl) refines it, acting on vertices vs, vt, vl, vr]

### Hierarchical Dynamic Simplification

- Entire object represented as single vertex tree
- Start at base level
- Collapse group of vertices into parent representative vertex (proxy)
- Render at appropriate LOD by traversing to level of tree based on current viewing parameters

### Geometry Image Meshes

[Figure: cut, parameterize, sample on regular grid, render; the geometry image stores RGB = XYZ]

### Polycube Maps

- GIMs have complex distorted parameterizations
- Approximate geometry by polycube map
  - Project geometry onto polycube
  - Store each face of polycube in texture atlas

[Figure: texture atlas]

### Goal: GPU LOD Geometry

- Perform LOD geometry selection dynamically on GPU
- GPU limitations push us towards a regular representation of geometry
- For max efficiency, data structure must support parallel algorithms.

### Proposed Solution

- Use Geometry Image Mesh (GIM) as underlying data structure.
  - Regular structure (texture map) works very well on GPU.
  - Use polycube texture atlas for complex objects
- Add LOD support via a modified quadtree data structure called P-QuadTree.

### Overview of Approach

- Creation
  - LOD atlas texture
- Rendering
  - Select appropriate LOD level
  - Render on GPU

### Creation

- Generate GIM atlas from 3D model
- Generate LOD atlas from GIM
- Generate additional texture maps
  - Normal map
  - LOD metrics
  - Index map (parent lookup)

### Create GIM Atlas

- Generate polycube from geometry object using semi-automatic technique from Tarini et al.
- Cut cube faces along edges to get individual textures
- Pack face textures into square or rectangular texture
- Sample texture atlas on regular grid
- Create GIM from projected samples

### Create LOD Quadtree Atlas

- For each chart, texture must be $(2^m+1) \times (2^m+1)$
  - Pad texture with null samples
- Construct quadtree top down using GPU kernel
  - Each node represents 3x3 vertices
  - Uses restricted quadtree triangulation
- Stack all levels of LOD quadtree in LOD atlas
  - Can be done in rectangle with ratio 1:1.5

### Restricted Quadtree Triangulation

- Avoid problems with cracks at T-intersections
- Compute error at each node
  - Parent error always greater than children
- Constrain difference in error between neighboring vertices to never be greater than one
  - Check 2 nephews as well (cost of 2 texture lookups)

### LOD Nodes

- Each node represents 3x3 vertices and 8 triangles
  - Easily rendered as triangle fan
- Bounding sphere around 9 vertices
- Not much information in paper on how they compute normals or normal cone…

### Cutting and Packing

[Figure: cutting and packing for rectangular charts and for square charts]

### 4 Texture Maps Required

- Geometry Map (GIM) (x,y,z) on regular grid
  - Center position of node
- LOD Parameter map
  - Error (used for LOD selection)
  - Normal cone (used for back face culling)
  - Bounding sphere radius (used for backface culling)
- Normal Map (N.x,N.y,N.z)
  - Normal at center position of node
- Index Map
  - Parent node lookup
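
The remark that bitonic sort is the exception among sorting algorithms on the GPU can be made concrete with a small CPU-side sketch. It is not taken from the slides; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for illustration. What it shows is that the sequence of compare-exchange pairs depends only on the indices, never on the data, so every pair in one pass could be processed in parallel.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Hypothetical helper for illustration. The pair (i, i ^ j) compared in
    each pass is fixed by the indices alone, which is what makes the
    network a good fit for data-parallel hardware.
    """
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:                      # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:                  # distance between compared elements
            for i in range(n):         # each i is independent within one pass
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

On a GPU the inner `for i` loop would become one parallel pass over a texture or buffer; the two outer loops stay sequential.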
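
The barycentric mapping from the first paper (Vpos = V.w * P0 + V.u * P1 + V.v * P2, followed by Dv = Pv + h * Nv) can be sketched on the CPU as below. This is a minimal illustration under stated assumptions, not the authors' shader: the names `pattern_vertices`, `interpolate`, and `displaced_vertex` are hypothetical, the refinement pattern is assumed to be a uniform tessellation, and the height map is stood in for by a plain Python function of (u, v).

```python
def pattern_vertices(depth):
    """Barycentric triples (w, u, v) for a uniformly tessellated triangle.

    `depth` is the number of segments per edge (an assumed pattern shape).
    """
    verts = []
    for i in range(depth + 1):
        for j in range(depth + 1 - i):
            u = i / depth
            v = j / depth
            verts.append((1.0 - u - v, u, v))
    return verts


def interpolate(bary, a0, a1, a2):
    """Blend one attribute of the coarse triangle: w*a0 + u*a1 + v*a2."""
    w, u, v = bary
    return tuple(w * x0 + u * x1 + v * x2 for x0, x1, x2 in zip(a0, a1, a2))


def displaced_vertex(bary, positions, normals, uvs, height):
    """Compute Dv = Pv + h * Nv for one virtual vertex of the pattern.

    `height` is a callable (u, v) -> h standing in for the height-map lookup.
    The interpolated normal is used as-is, as in the formula on the slide.
    """
    p = interpolate(bary, *positions)
    n = interpolate(bary, *normals)
    u, v = interpolate(bary, *uvs)
    h = height(u, v)
    return tuple(pc + h * nc for pc, nc in zip(p, n))


if __name__ == "__main__":
    P = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    N = ((0.0, 0.0, 1.0),) * 3
    UV = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
    for bary in pattern_vertices(2):
        print(displaced_vertex(bary, P, N, UV, lambda u, v: 0.5))
```

Only the three corner attributes of T change per draw call; the list returned by `pattern_vertices` plays the role of the vertex buffer that stays resident on the GPU.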
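
The procedural alternative, D = P + (a * sin(f * ||P||)) * N, replaces the height-map lookup with arithmetic. A short sketch follows; the function name `procedural_displace` and the parameter names `amplitude` and `frequency` (for a and f) are assumptions for illustration.

```python
import math


def procedural_displace(p, n, amplitude, frequency):
    """Return D = P + (a * sin(f * ||P||)) * N for one vertex.

    `p` and `n` are 3-tuples (position and normal); no texture access is needed.
    """
    radius = math.sqrt(sum(c * c for c in p))      # ||P||
    h = amplitude * math.sin(frequency * radius)   # procedural height
    return tuple(pc + h * nc for pc, nc in zip(p, n))


if __name__ == "__main__":
    print(procedural_displace((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.5, math.pi / 2))
```

Because the height depends only on the interpolated position, the cost per virtual vertex is a handful of arithmetic operations, which matches the high arithmetic intensity the GPU favors.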
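
The statement that each LOD node covers 3x3 vertices and 8 triangles, rendered as a triangle fan, can be checked with a small sketch. The grid indexing, the `step` parameter (spacing between node vertices at a given level), and the function names are assumptions; the paper's actual layout, and any vertices it drops to keep the restricted quadtree crack-free, are not reproduced here.

```python
# Offsets of the 8 perimeter vertices around the node center, walked in order.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def node_fan(center_row, center_col, step=1):
    """Grid coordinates of a 3x3 node in triangle-fan order.

    The first entry is the hub (node center); the ring is closed by
    repeating its first vertex at the end.
    """
    fan = [(center_row, center_col)]
    for dr, dc in RING + RING[:1]:
        fan.append((center_row + dr * step, center_col + dc * step))
    return fan


def fan_triangles(fan):
    """Expand a triangle fan into explicit (hub, a, b) triangles."""
    hub = fan[0]
    return [(hub, fan[i], fan[i + 1]) for i in range(1, len(fan) - 1)]


if __name__ == "__main__":
    fan = node_fan(1, 1)
    print(len(set(fan)), "vertices,", len(fan_triangles(fan)), "triangles")
```

A coarser level would use the same fan with a larger `step`, so one small index pattern can serve every node in the quadtree.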