Real-time Mesh Simplification Using the GPU


Christopher DeCoro

Natasha Tatarchuk

3D Application Research Group

Introduction
  • Implement Mesh Decimation in real-time
    • Utilizes new Geometry Shader stage of GPU
    • Achieves a 20x speedup over CPU
Project Motivation
  • Massive Increases in submitted geometry
    • Geometry rendered per shadow map (6x for cubemap!)
    • Not always needed at highest resolution
  • Geometry not always known at build-time
    • Dynamically-skinned objects only finalized at run-time
    • May be customized to the user's machine based on its capabilities, so it would need to be adapted at program load time
    • Could be dynamically generated per level, so it would need to be adapted at level load time
    • Simplification therefore needs to be fast (or even real-time)

Also, just as importantly…

  • We want applications that exercise & stress GS/GPU
    • Evaluate new capabilities of the GPU
    • Learn how to adapt previously CPU-bound algorithms
    • Develop GPU-centric methodologies
  • Identify future feature set for GS/GPU as a whole
    • Limitations still exist – which should be addressed?
Contributions
  • Mapping of Decimation to GPU
    • 20x speedup vs. CPU
    • Enables load-time or real-time usage
  • Detail Preservation by Non-linear Warping
    • Also applicable to CPU out-of-core decimation
  • General-purpose GPU Octree
    • Adaptive decimation w/ constant memory
    • Applications not limited to simplification: collision detection, frustum culling, etc.
Outline
  • Project Introduction and Motivation
  • Background
    • Decimation with Vertex Clustering
    • Geometry Shaders in Direct3D 10
  • Geometry Shader-based Vertex Clustering
  • Adaptive Simplification w/ Non-linear Warps
  • Probabilistic Octrees on the GPU
Vertex Clustering
  • Reduces mesh resolution
    • High-res mesh as input
    • Low-res as output
  • All implemented on the GPU
    • Ideal for processing streamed out data
    • Useful when rendering multiple times (e.g. shadows)
    • Can handle enormous models from scanned data
  • Based on “Out-of-Core Simplification of Large Polygonal Models,” P. Lindstrom, 2000

Figure from [Lindstrom 2000]

Previous Rendering Pipeline
  • Vertex Shaders and Pixel Shaders
  • Limited to 1 output per 1 input
    • No culling of triangles for decimation
  • Fixed destination for each stage
    • Result meshes cannot be (easily) saved and reused
DirectX10 Rendering Pipeline
  • Geometry Shader in between VS & PS
    • Called for each primitive (usually triangle)
  • Able to access all vertices of a primitive
    • Can compute per-face quantities
  • Breaks 1:1 input-output limitation
    • Allows triangles to be culled from pipeline
  • Allows stream-out of processed geometry
    • Decimated meshes can easily be saved and reused
Outline
  • Project Introduction and Motivation
  • Background
  • Geometry Shader-based Vertex Clustering
    • Overview
    • Quadric Generation
    • Optimal Position Computation
    • Final Clustering
  • Adaptive Simplification w/ Non-linear Warps
  • Probabilistic Octrees on the GPU
Algorithm Overview
  • Start with the input mesh
    • Shown divided into clusters
  • Pass 1: Compute the quadric map from mesh
    • Use GS to compute quadric
    • Accumulate in cluster map, a render target (RT) used as a large array
  • Pass 2: For each cluster, compute optimal position
    • Solves a linear system given by quadrics
  • Pass 3: Collapse each vertex to representative
    • 9x9x9 grid shown
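
All three passes depend on mapping a position to its grid cell in constant time. A minimal CPU-side sketch of that lookup in Python follows. It is not the shader code from the slides, and `cluster_id`, `bbox_min`, `bbox_max` and `grid` are names assumed here for illustration.

```python
def cluster_id(pos, bbox_min, bbox_max, grid):
    """Map a 3D position to a linear cell index in a uniform grid."""
    idx = []
    for p, lo, hi, n in zip(pos, bbox_min, bbox_max, grid):
        # Normalise to [0, 1], then scale to the number of cells on this axis.
        t = (p - lo) / (hi - lo)
        idx.append(min(int(t * n), n - 1))  # clamp points lying on the max face
    ix, iy, iz = idx
    return ix + grid[0] * (iy + grid[1] * iz)

# 9x9x9 grid over the unit cube, matching the grid size shown on the slide
grid = (9, 9, 9)
a = cluster_id((0.05, 0.05, 0.05), (0, 0, 0), (1, 1, 1), grid)
b = cluster_id((0.95, 0.95, 0.95), (0, 0, 0), (1, 1, 1), grid)
```

The sketch assumes every position lies inside the bounding box; points outside it would need clamping on the low side as well.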

Model Courtesy of Stanford Graphics Lab

Vertex Clustering Pipeline
  • Pass 1: Create Quadric Map
    • Input: Original Mesh
    • Computation:
      • Determine plane equation, face quadrics for triangle
      • Compute the cluster and address of each vertex
      • Pack quadric into RT at appropriate address
    • Output: Render Targets representing clusters with packed quadrics and average positions
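
The computation in Pass 1 can be illustrated on the CPU. The Python below is a sketch under stated assumptions rather than the authors' implementation: the cluster map is a plain dictionary, additive blending is emulated with `+=`, and the helper names (`face_quadric`, `accumulate`, `quadric_error`) are hypothetical. Triangles are assumed to be non-degenerate.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def face_quadric(p0, p1, p2):
    """Area-weighted quadric (A, b, c) for the plane n.x + d = 0 of a triangle."""
    n = cross(sub(p1, p0), sub(p2, p0))
    length = math.sqrt(dot(n, n))
    w = length / 6.0                      # a third of the triangle area per vertex
    n = tuple(x / length for x in n)
    d = -dot(n, p0)
    A = [[w * n[i] * n[j] for j in range(3)] for i in range(3)]
    b = [w * d * n[i] for i in range(3)]
    return A, b, w * d * d

def accumulate(cluster_map, cid, quadric):
    """Emulate additive blending of a quadric into cell `cid` of the cluster map."""
    A, b, c = quadric
    cell = cluster_map.setdefault(cid, {"A": [[0.0] * 3 for _ in range(3)],
                                        "b": [0.0] * 3, "c": 0.0})
    for i in range(3):
        cell["b"][i] += b[i]
        for j in range(3):
            cell["A"][i][j] += A[i][j]
    cell["c"] += c

def quadric_error(A, b, c, x):
    """Evaluate x^T A x + 2 b^T x + c, the weighted squared distance to the planes."""
    return dot(x, [dot(row, x) for row in A]) + 2.0 * dot(b, x) + c

cluster_map = {}
q = face_quadric((0, 0, 1), (1, 0, 1), (0, 1, 1))   # triangle in the plane z = 1
for _ in range(3):                                   # one write per vertex
    accumulate(cluster_map, 0, q)
```

Here all three vertices are written to cell 0 for brevity; in the pipeline each vertex writes to the cell that contains it.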
Quadric Map Implementation

//Map a point to its location in the cluster map array
float2 writeAddr( float3 vPos )
{
    uint iX = clusterId(vPos) / iClusterMapSize.x;
    uint iY = clusterId(vPos) % iClusterMapSize.y;
    return expand( float2(iX,iY)/float(iClusterMapSize.x) ) + 1.0/iClusterMapSize.x;
}

[maxvertexcount(3)]
void main( triangle ClipVertex input[3], inout PointStream<FragmentData> stream )
{
    //For the current triangle, compute the area and normal
    float3 vNormal = cross( input[1].vWorldPos - input[0].vWorldPos, input[2].vWorldPos - input[0].vWorldPos );
    //|cross| is twice the triangle area; dividing by 6 gives each vertex a third of the area
    float fArea = length(vNormal)/6;
    vNormal = normalize(vNormal);

    //Then compute the distance of plane to the origin along the normal
    float fDist = -dot(vNormal, input[0].vWorldPos);

    //Compute the components of the face quadrics using the plane coefficients
    float3x3 qA = fArea*outer(vNormal, vNormal);
    float3 qb = fArea*vNormal*fDist;
    float qc = fArea*fDist*fDist;

    //Loop over each vertex in input triangle primitive
    for(int i=0; i<3; i++)
    {
        //Assign the output position in the quadric map
        FragmentData output;
        output.vPos = float4(writeAddr(input[i].vPos),0,1);

        //Write the quadric to be accumulated in the quadric map
        packQuadric( qA, qb, qc, output );
        stream.Append( output );
    }
}

Vertex Clustering Pipeline
  • Pass 2: Find Optimal Positions
    • Input: Cluster Map Render Targets, Full-screen Quad
    • Computation:
      • Determine if we can solve for optimal position
      • If not, fall back to vertex average
    • Output: Render Targets representing clusters with optimal position of representative vertex
Optimal Positions

Original Mesh

  • For each cell, need representative
  • Naïve solution: Use averages
    • Looks very blocky
    • Does not consider the original faces, only vertices
  • Implemented solution: Use quadrics
    • Quadrics are a measure of surface
    • We can solve for optimal position

Simplified w/ Averages

Simplified w/ Quadrics
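
Solving for the optimal position amounts to minimising the accumulated quadric, which gives the 3x3 linear system A x = -b. The Python below is a hedged CPU sketch of that solve with the fallback to the vertex average. It uses Cramer's rule for brevity, and the function names and threshold default are assumptions chosen for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def optimal_position(A, b, pos_sum, count, threshold=1e-11):
    """Solve A x = -b by Cramer's rule, or fall back to the vertex average."""
    d = det3(A)
    if d > threshold:
        x = []
        for k in range(3):
            # Replace column k of A with the right-hand side -b.
            m = [[(-b[i] if j == k else A[i][j]) for j in range(3)]
                 for i in range(3)]
            x.append(det3(m) / d)
        return tuple(x)
    # A is (near) singular, e.g. all faces in the cell are coplanar.
    return tuple(s / count for s in pos_sum)

# Three unit-weight planes x = 1, y = 2, z = 3 meet at a single point.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
corner = optimal_position(identity, [-1.0, -2.0, -3.0], (0.0, 0.0, 0.0), 1)

# A single plane z = 1 leaves A singular, so the average is used instead.
flat = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
average = optimal_position(flat, [0.0, 0.0, -1.0], (3.0, 6.0, 9.0), 3)
```

The second call shows why the fallback is needed: a cell that sees only one plane has no unique minimiser, so the averaged position is the safe choice.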

Optimal Positions Implementation

float3 optimalPosition(float2 vTexcoord)
{
    float3 vPos = float3(0,0,0);
    float4 dataWorld, dataA0, dataB, dataA1;

    //Read the vertex average from the cluster map
    dataWorld = tClusterMap0.SampleLevel( sClusterMap0, vTexcoord, 0 );
    int iCount = dataWorld.w;

    //Only compute optimal position if there are vertices in this cluster
    if( iCount != 0 )
    {
        //Read all the data from the clustermap to reconstruct the quadric
        dataA0 = tClusterMap1.SampleLevel( sClusterMap1, vTexcoord, 0 );
        dataA1 = tClusterMap2.SampleLevel( sClusterMap2, vTexcoord, 0 );
        dataB = tClusterMap3.SampleLevel( sClusterMap3, vTexcoord, 0 );

        //Then reassemble the quadric (A is symmetric, so only 6 entries are stored)
        float3x3 qA = { dataA0.x, dataA0.y, dataA0.z,
                        dataA0.y, dataA0.w, dataA1.x,
                        dataA0.z, dataA1.x, dataA1.y };
        float3 qB = dataB.xyz;
        float qC = dataA1.z;

        //Determine if inverting A is stable, if so, compute optimal position
        //If not, default to using the average position
        const float SINGULAR_THRESHOLD = 1e-11;
        if( determinant(qA) > SINGULAR_THRESHOLD )
            vPos = -mul( inverse(qA), qB );
        else
            vPos = dataWorld.xyz / dataWorld.w;
    }
    return vPos;
}

Vertex Clustering Pipeline
  • Pass 3: Decimate Mesh
    • Input: Cluster Map Render Targets, Input Mesh
    • Computation:
      • Find clusters, Remap vertices to representative
      • Determine if triangle becomes degenerate
      • If not, stream output new triangle at new positions
    • Output: Low-resolution Mesh
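
Pass 3 is essentially a filter over triangles. The Python below sketches that logic on the CPU under simplifying assumptions: `cluster_of` is any function from a vertex to its cluster id, and `representative` is a dictionary standing in for the render target written in Pass 2. Both names are hypothetical.

```python
def decimate(triangles, cluster_of, representative):
    """Keep a triangle only if its vertices fall in three different clusters."""
    out = []
    for v0, v1, v2 in triangles:
        c0, c1, c2 = cluster_of(v0), cluster_of(v1), cluster_of(v2)
        # Two vertices in the same cluster collapse to one point,
        # so the triangle would be degenerate and is dropped.
        if c0 != c1 and c1 != c2 and c0 != c2:
            out.append((representative[c0], representative[c1], representative[c2]))
    return out

# Toy clustering: unit-wide cells along the x axis only.
cluster_of = lambda v: int(v[0])
representative = {0: (0.5, 0.0, 0.0), 1: (1.5, 0.0, 0.0), 2: (2.5, 1.0, 0.0)}
triangles = [
    ((0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (1.1, 0.0, 0.0)),  # two vertices share cell 0
    ((0.1, 0.0, 0.0), (1.2, 0.0, 0.0), (2.3, 1.0, 0.0)),  # three distinct cells
]
result = decimate(triangles, cluster_of, representative)
```

Only the second triangle survives, with each vertex moved to its cluster's representative.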
Final Clustering Implementation

[maxvertexcount(3)]
void main( triangle ClipVertex input[3], inout TriangleStream<StreamoutVertex> stream )
{
    //Only emit a triangle if all three vertices are in different clusters
    if( all_different(clusterId(input[0].vPos),
                      clusterId(input[1].vPos),
                      clusterId(input[2].vPos)) )
    {
        for(int i=0; i<3; i++)
        {
            //Lookup optimal position of vertex i's cluster in the RT computed in Pass 2
            StreamoutVertex output;   //assumes StreamoutVertex has a vPos field
            output.vPos = tClusterMap3.SampleLevel( sClusterMap3, readAddr(input[i].vPos), 0 );

            //Output vertex to stream out
            stream.Append( output );
        }
    }
    return;
}

Vertex Clustering Pipeline
  • Alternate Pass 2: Downsample RTs
    • Input and Output as before
    • Computation:
      • Collapse 8 adjacent cells by adding cluster quadrics
      • Compute optimal position for 2x larger cell
    • Create multiple lower levels of detail without repeatedly incurring Pass 1 overhead (~75%)
      • Pass 3 can use previous streamed-out mesh
      • Lower levels of detail almost free
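
Because quadrics are additive, the downsampling step reduces to summing the eight children of each coarse cell. A minimal Python sketch follows, assuming the cluster map is a dictionary keyed by integer cell coordinates and each quadric is stored as a flat list of numbers. That layout is chosen for illustration and is not the render-target packing from the slides.

```python
def downsample(cells):
    """Merge each 2x2x2 block of cells by summing their quadric components."""
    coarse = {}
    for (ix, iy, iz), q in cells.items():
        key = (ix // 2, iy // 2, iz // 2)          # parent cell, twice as large
        acc = coarse.setdefault(key, [0.0] * len(q))
        for i, value in enumerate(q):
            acc[i] += value
    return coarse

fine = {
    (0, 0, 0): [1.0, 2.0],
    (1, 1, 1): [3.0, 4.0],   # same 2x2x2 block as (0, 0, 0)
    (2, 0, 0): [5.0, 6.0],   # belongs to the neighbouring block
}
coarse = downsample(fine)
```

The optimal position for each coarse cell can then be solved from the summed quadric exactly as in Pass 2.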
Timing Results
  • Recorded Time Spent in Decimation
    • GPU: AMD/ATI XXX
    • CPU: 3 GHz Intel P4
  • Significant Improvement over CPU
    • Averages ~20x speedup on large models
    • Scales linearly
More Results
  • Models shown at varying resolutions

Buddha, 45x130x45 grid

Bunny, 90x90x90 grid

Dragon, 100x60x20 grid

Models Courtesy of Stanford Graphics Lab

More Results
  • Models shown at varying resolutions

Buddha, 20x70x20 grid

Bunny, 60x60x60 grid

Dragon, 50x25x10 grid

More Results
  • Models shown at varying resolutions

Buddha, 10x40x10 grid

Bunny, 20x20x20 grid

Dragon, 30x15x6 grid

Outline
  • Project Introduction and Motivation
  • Background
  • Geometry Shader-based Vertex Clustering
  • Adaptive Simplification w/ Non-linear Warps
    • View-dependent Simplification
    • Region-of-interest Simplification
  • Probabilistic Octrees on the GPU
View-dependent Simplification
  • Standard simplification does not consider view
    • Preserves uniform amount of detail all over
  • Simplify in post-projection space to take the view into account
    • Preserves more detail closer to viewer (left)

View Direction

Arbitrary Warping Functions
  • View transform is a special case of a nonlinear warp
    • Can use arbitrary warp for adaptive simplification
  • Regular grids allow data-independence, parallelism
    • Constant time mapping from position to grid cell
    • Maps well onto GPU render targets
    • Forces uniform resolution throughout output mesh
  • Irregular geometry grids allow non-uniform output
    • Cells can be larger/smaller in certain regions
    • Corresponds to lower/greater output triangle density
    • We lose constant-time mapping of position to cell
  • Solution: apply inverse warp to vertices
    • Equivalent to applying forward warp to grid cells
    • Clustering still performed in uniform grid
    • Flexibility of irregular geometry w/ speed of regular
    • One proposal: Gaussian weighting functions
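
The equivalence between warping the grid and inverse-warping the vertices can be shown in one dimension. In the Python sketch below, the warp `u * u` is a hypothetical choice made only because its inverse is trivial. It is not the Gaussian-based warp proposed on the slide.

```python
import math

def forward_warp(u):
    """Hypothetical warp applied to grid lines: squeezes cells toward 0."""
    return u * u

def inverse_warp(x):
    """Inverse of forward_warp, applied to vertex coordinates instead."""
    return math.sqrt(x)

def warped_cell(x, n):
    """Cell index of x in the irregular grid, found with a uniform lookup."""
    return min(int(inverse_warp(x) * n), n - 1)

n = 4
# Cell boundaries of the irregular grid, in the original coordinate space.
boundaries = [forward_warp(k / n) for k in range(n + 1)]
cells = [warped_cell(x, n) for x in (0.05, 0.1, 0.3, 0.9)]
```

The boundaries are packed more densely near 0, so that region keeps more detail, yet each lookup is still a constant-time uniform-grid computation.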
Region-of-Interest Specification
  • Importance specified w/ biased Gaussian
    • Highest preservation at mean
    • Width of region given by sigma
    • Bias prevents falloff to zero
  • Integrate to produce corresponding warp function

(Derivation given in paper)
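
As a rough one-dimensional illustration of integrating an importance function into a warp, the Python below assumes an importance of the form bias + (1 - bias) * exp(-(x - mu)^2 / (2 sigma^2)) and normalises its integral so that [0, 1] maps onto [0, 1]. This particular form and the default parameter values are assumptions made here. The paper's derivation may differ.

```python
import math

def importance(x, mu, sigma, bias):
    """Biased Gaussian: peaks at mu and never falls below `bias`."""
    return bias + (1.0 - bias) * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def _integral(x, mu, sigma, bias):
    """Closed-form integral of `importance` from 0 to x, via the error function."""
    s = sigma * math.sqrt(2.0)
    gauss = sigma * math.sqrt(math.pi / 2.0) * (
        math.erf((x - mu) / s) - math.erf(-mu / s))
    return bias * x + (1.0 - bias) * gauss

def warp(x, mu=0.5, sigma=0.1, bias=0.2):
    """Map [0, 1] onto [0, 1], stretching the region around mu."""
    return _integral(x, mu, sigma, bias) / _integral(1.0, mu, sigma, bias)

near_mean = warp(0.55) - warp(0.45)   # warped width of an interval at the mean
near_edge = warp(0.10) - warp(0.00)   # warped width of an equal interval at the edge
```

An interval around the mean occupies more of the warped space than an equal interval near the edge, so it receives more grid cells. The bias keeps the warp strictly increasing, which is what makes it invertible.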

Region-of-Interest Specification
  • Warping allows non-uniform/adaptive level of detail
  • Head has most semantic importance
  • Detail lost in uniform simplification
  • We can warp first to expand center
  • Equivalent to grid density increasing
  • Adaptive simplification preserves head detail
Outline
  • Project Introduction and Motivation
  • Background
  • Geometry Shader-based Vertex Clustering
  • Adaptive Simplification w/ Non-linear Warps
  • Probabilistic Octrees on the GPU
    • Motivation
    • Probabilistic Storage
    • Adaptive Simplification
    • Randomized Construction
    • Results
Octrees - Motivation
  • Basic grid
    • regular geometry, regular topology
    • Limitations as we discussed
  • Warped grid
    • irregular geometry, regular topology
    • Much improved; however, we can do better
    • May be difficult to know required detail a priori
  • CPU Solution: Multi-resolution grid (e.g. an octree)
    • Irregular topology (irregular geometry w/ warping)
    • Store grid at many levels of detail
    • Measure error at each level, use coarse as possible
    • Efficiency requires dynamic memory; storage is O(L^3)
    • Requires O(L) writes to produce correct tree
GPU Solution – Probabilistic Octrees
  • Proposal
    • Successful storage is not guaranteed; it succeeds with probability P <= 1
    • However, storage failure is detected on read
  • Assumptions allow much flexibility
    • We can have a tree of unlimited depth (but P tends to 0 as depth grows)
    • Sparse storage of data
  • Require conservative algorithms for task
    • Vertex clustering (conveniently!) is such an example
    • So is collision detection and frustum culling
  • Only studied briefly in this paper; we would like to analyze it further in future work
Implementation Details
  • Storage: Spatial Hashes
    • Map (position,level) to cell, cell hashed to index
    • Additive blending for quadric accumulation (app-specific)
    • Max blending of (key, -key) stored alongside the data, which recovers max_key and min_key
  • Retrieval:
    • Again map (position, level) to index
    • Retrieve key value from data, collision iff min_key != max_key
    • Use parent level, which will have higher storage probability
  • Usage for Adaptive Simplification
    • For each vertex, find the level with the largest error that is still below some threshold
    • Use that cell's vertex as the representative
    • Can perform binary search along path
    • Conservative, because we can maintain validity even when using parent of optimal node (just adds some error)
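
The storage and retrieval scheme can be mimicked on the CPU with a small hash table. The Python below is a toy sketch: the key packing, the modulo hash, and the class and method names are all assumptions made for illustration, and `max()` and `+=` stand in for max blending and additive blending.

```python
class ProbabilisticOctree:
    """Toy spatial hash emulating max-blended keys and additively blended data."""

    def __init__(self, table_size, hash_fn=None):
        self.hash_fn = hash_fn or (lambda key: key % table_size)
        self.slots = {}  # index -> [max(key), max(-key), summed data]

    @staticmethod
    def key(pos, level):
        """Pack (level, cell) into one integer; assumes level <= 10, pos in [0, 1]^3."""
        n = 1 << level
        ix, iy, iz = (min(int(p * n), n - 1) for p in pos)
        return (level << 30) | (ix << 20) | (iy << 10) | iz

    def store(self, pos, max_level, value):
        """Write the value into the cell containing pos at every level 0..max_level."""
        for level in range(max_level + 1):
            k = self.key(pos, level)
            slot = self.slots.setdefault(self.hash_fn(k), [k, -k, 0.0])
            slot[0] = max(slot[0], k)      # max blending of key
            slot[1] = max(slot[1], -k)     # max blending of -key, i.e. -min(key)
            slot[2] += value               # additive blending of the payload

    def load(self, pos, level):
        """Return (level, data) from the finest uncollided cell at or above `level`."""
        for lvl in range(level, -1, -1):
            k = self.key(pos, lvl)
            slot = self.slots.get(self.hash_fn(k))
            # Valid only if min_key == max_key == our key; otherwise try the parent.
            if slot and slot[0] == k and -slot[1] == k:
                return lvl, slot[2]
        return None

a, b = (0.1, 0.1, 0.1), (0.9, 0.9, 0.9)

# Large table: no collisions, so the finest level is returned.
tree = ProbabilisticOctree(table_size=1 << 40)
tree.store(a, 2, 1.0)
tree.store(b, 2, 2.0)

# Hash that forces every level-2 cell into one slot, to provoke a collision.
crowded = ProbabilisticOctree(1 << 40, hash_fn=lambda k: -1 if (k >> 30) == 2 else k)
crowded.store(a, 2, 1.0)
crowded.store(b, 2, 2.0)
```

In the crowded table the level-2 read detects that min_key and max_key differ and falls back to level 1, which stays valid but coarser. This is the conservative behaviour the slide describes.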
Probabilistic Octree Results
  • Adaptive simplification shown on bunny (~4K tris)
    • Preserves detail around leg, eyes and ears
    • Simplifies significantly on large, flat regions
  • Using 8% of the storage of the full tree, we have < 10% collisions
  • Only ~20% performance hit vs. standard grids
Conclusions
  • GS is a powerful tool for interactive graphics
  • Amplification and decimation are important applications of GS
Geometry Shaders and Other Feature Wish-List
  • Bring back the Point fill mode
    • Important for scatter in GPGPU applications
  • Data amplification improvements with indexed stream out
    • Avoiding triangle soups is very non-trivial
  • Efficient indexable temps
Thanks a lot!
  • Various people here…