Massive-Model Rendering Techniques Andreas Dietrich, Enrico Gobbetti, Sung-Eui Yoon IEEE CGA Nov/Dec 2007

Motivation • interactive visualization of massive 3D models – science, engineering, education, entertainment • ability to gather or generate massive 3D data exceeds ability to interactively render it • memory bandwidth is the limiting factor in CPU and GPU • aim for output-sensitive rendering algorithms • runtime and memory proportional to # of pixels (not model complexity) • need out-of-core data management • filter out data that doesn’t contribute to a particular image

Example 1A • Boeing 777 CAD, 350M triangles

Example 1B • turbulent fluids, 2K x 2K x 2K samples with 270 time steps = 1.5 TB

Example 1C • Michelangelo’s St. Matthew, 372M triangles (9.6 GB)

Example 1D • Puget Sound 20 GB terrain + tree models = 90 trillion triangles

Two main rendering techniques • rasterization vs ray-tracing (object-order vs image-order)

Rasterization (object-order) • pipeline implies processing any number of primitives in stream-like manner • important when scene size > memory size • limited to O(n) run-time complexity, n = # of primitives • to get logarithmic time complexity • spatial index structures (CPU) to cut down primitives sent to pipeline • gap between GPU performance and memory bandwidth requires careful working set management

Ray-tracing (image-order) • ray-casting and ray-tracing

Ray-tracing Example

Ray-tracing • basic r.t. simpler to implement than rasterization • entire geometry stage handled implicitly • must limit number of primitives tested for ray intersection to get logarithmic time complexity • spatial index structures • acceleration structures core part of modern r.t. renderers • global lighting (including indirect lighting) possible when combining Monte-Carlo integration techniques with r.t. • shaders can describe different surfaces independently and r.t. combines all effects in physically correct way

Ray-tracing Packets • packet – bundle of rays simultaneously traced through scene • packet tracing can use SIMD vector operations of modern CPUs • deferred shading – avoid switching between intersection and shading computation per ray • amortizing memory access, function calls, etc. • frustum traversal methods bound ray packets and cut down traversal and intersection calculations – object and scan-line coherence!

Comparison • rasterization • efficiently exploits scan-line coherence • best when ‘few’ triangles cover large screen space • ray tracing • performs better if visibility evaluated point-wise • hierarchical front-to-back rasterization + occlusion culling similar to beam or frustum tracing • current renderers are either r.t. or rast. • hybrids likely • hybrids encouraged by more general purpose, highly parallel stream processors (a.k.a. GPUs)

Complexity Reduction Techniques • Geometric simplification • level of detail • discrete LOD • progressive meshes • continuous LOD • Visibility culling • back-face, view-frustum and occlusion culling

Geometric Simplification • iteratively simplify input mesh by a sequence of vertex removals or edge contractions

Error Evaluation • approximation accuracy critical to simplification results • most common is quadric error metric by Garland and Heckbert • associates quadric matrix per vertex; less memory than tracking distances to all associated planes • most simplification algorithms use a greedy strategy • sort candidate vertices using metric • pick vertex & operation for minimal simplification error • streaming simplification – use finalization tags on max. LOD data • bulk of mesh kept out-of-core
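
The per-vertex quadric can be illustrated with a minimal sketch. It assumes each plane is given as (a, b, c, d) with a unit normal, and that a vertex's quadric is simply the sum of the quadrics of its adjacent planes; the function names are hypothetical and this is not code from the paper.

```python
def plane_quadric(a, b, c, d):
    """4x4 quadric p * p^T for the plane ax + by + cz + d = 0 (unit normal assumed)."""
    p = (a, b, c, d)
    return [[p[i] * p[j] for j in range(4)] for i in range(4)]

def add_quadrics(q1, q2):
    """Quadrics accumulate by plain matrix addition, so one matrix per vertex suffices."""
    return [[q1[i][j] + q2[i][j] for j in range(4)] for i in range(4)]

def quadric_error(q, v):
    """h^T Q h with h = (x, y, z, 1): the sum of squared distances to the planes."""
    h = (v[0], v[1], v[2], 1.0)
    return sum(h[i] * q[i][j] * h[j] for i in range(4) for j in range(4))

# Vertex lying on the edge where the planes z = 0 and x = 0 meet.
q = add_quadrics(plane_quadric(0, 0, 1, 0), plane_quadric(1, 0, 0, 0))
print(quadric_error(q, (0.0, 5.0, 0.0)))  # stays on both planes: 0.0
print(quadric_error(q, (3.0, 0.0, 4.0)))  # 4^2 + 3^2 = 25.0
```

A greedy simplifier would evaluate this error for each candidate operation and apply the cheapest one first.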

Level of detail • LOD – compact description of multiple representations of a single shape • discrete LOD (Clark76) • standard approach; used everywhere (even VRML, etc.) • sufficient only for small, isolated objects • progressive LOD • coarse shape + sequence of small modifications • sufficient only for uniformly accurate approximations • continuous LOD • progressive + selective refinement

Continuous LOD and multi-triangulation

Continuous LOD and multi-triangulation • E. Puppo and R. Scopigno, Simplification, LOD and Multiresolution: Principles and Applications, Eurographics '97 Tutorial Notes, 1997.

Granularity of continuous LOD • LOD decision per vertex/triangle • on general purpose CPU • LOD decision on blocks of triangles • less LOD decision computation • more efficient for modern GPUs

Visibility Culling • in massive (“real”) scenes most data can’t be seen from a given view point; occlusion • depth complexity • goal: reject large sections of scene before visible surface determination • this is visibility culling • LOD and v.c. needed for output-sensitive rendering • methods: • back-face culling • view-frustum culling • occlusion culling
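
The two simpler tests, back-face and view-frustum culling, can be sketched as follows. The sketch assumes inward-pointing unit-normal frustum planes stored as (nx, ny, nz, d) and bounding spheres for objects; the helper names and the two-plane "frustum" in the example are illustrative assumptions only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def is_back_facing(normal, point_on_tri, eye):
    """A triangle is back-facing when its normal points away from the eye."""
    view = tuple(p - e for p, e in zip(point_on_tri, eye))
    return dot(normal, view) >= 0.0

def sphere_outside_frustum(center, radius, planes):
    """Reject a bounding sphere that lies entirely behind any frustum plane."""
    return any(dot(p[:3], center) + p[3] < -radius for p in planes)

eye = (0.0, 0.0, 0.0)
# Only near (z >= 1) and far (z <= 100) planes, to keep the example short.
planes = [(0.0, 0.0, 1.0, -1.0), (0.0, 0.0, -1.0, 100.0)]

print(is_back_facing((0.0, 0.0, 1.0), (0.0, 0.0, 10.0), eye))    # True
print(is_back_facing((0.0, 0.0, -1.0), (0.0, 0.0, 10.0), eye))   # False
print(sphere_outside_frustum((0.0, 0.0, -5.0), 1.0, planes))     # True
print(sphere_outside_frustum((0.0, 0.0, 50.0), 1.0, planes))     # False
```

Occlusion culling is harder because it depends on all other geometry in the scene, not just on the object and the camera.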

Visibility Culling

Occlusion culling • global nature makes it hard • broad classifications • from-point visibility algorithms • from-region visibility algorithms • from-region • spatial subdivision of scene into fixed cells • preprocessing: compute potentially visible set (PVS) • mainly used in specialized cases: urban outdoors, interior of buildings • from-point • computed on-line, more general

Bounding volume hierarchies • visibility algorithms use a spatial index • bounding volume hierarchies or spatial partitioning • BVH • organize geometry bottom-up in tree structure • render top-down • spatial partitioning • subdivide scene top-down • hierarchical grids, octrees, kd-trees (axis-aligned BSP)

Kd-tree with ray-tracing or rasterization

Early traversal termination • to get sub-linear time also need early traversal termination • ray tracing – terminate when the ray reaches its hit point • rasterization – exploit z-buffer with occlusion queries • for all cells in spatial subdivision render them front to back • render a bounding box for current cell (disable framebuffer writes) • if any pixels would have been written render primitive set in leaf cell or recurse into non-leaf cell • occlusion queries need to avoid CPU stalls and GPU starvation
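
The rasterization loop above can be sketched on the CPU with a toy one-dimensional z-buffer. This is only an illustration under strong assumptions: `occlusion_query` stands in for a hardware query, each leaf's geometry is assumed to fill its bounding box at the box depth, and all class and function names are hypothetical.

```python
class Node:
    def __init__(self, name, pixels, depth, children=()):
        self.name = name          # label used to report what was drawn
        self.pixels = pixels      # screen pixels covered by the bounding box
        self.depth = depth        # nearest depth of the bounding box
        self.children = children  # empty tuple for a leaf cell

def occlusion_query(node, zbuf):
    """Would any bounding-box pixel pass the depth test? (no framebuffer writes)"""
    return any(node.depth < zbuf[p] for p in node.pixels)

def render(node, zbuf, drawn):
    if not occlusion_query(node, zbuf):
        return                    # whole subtree is hidden: terminate early
    if not node.children:
        for p in node.pixels:     # "rasterize" the leaf into the z-buffer
            zbuf[p] = min(zbuf[p], node.depth)
        drawn.append(node.name)
        return
    for child in sorted(node.children, key=lambda c: c.depth):  # front to back
        render(child, zbuf, drawn)

wall = Node("wall", [0, 1, 2, 3], 1.0)
engine = Node("engine", [1, 2], 5.0)      # fully behind the wall
wing = Node("wing", [3, 4, 5], 6.0)       # partly visible next to the wall
root = Node("root", [0, 1, 2, 3, 4, 5], 1.0, (engine, wing, wall))

zbuf, drawn = [float("inf")] * 6, []
render(root, zbuf, drawn)
print(drawn)  # ['wall', 'wing']
```

A real implementation would issue the queries asynchronously, since waiting for each result is what causes the CPU stalls and GPU starvation noted above.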

Discussion • few approaches integrate LODs and occlusion culling • off-line simplification basically unaware of visibility • with complex models and view-dependent LODs, resolving occlusion properly is necessary even for individual pixels • alternative rendering primitives to triangles • points • voxels • images

Complex Example

Point Primitives • point primitives – ignore mesh connectivity during preprocessing and rendering • Levoy and Whitted 1985 – points better than triangles for complex, organic shapes • current hardware lacks support for essential point filtering and blending • point-representations also used in generating LODs • classically used as surface elements • recently used as volumetric elements (“Far Voxels”) • voxel stores direction-dependent approximation of contents • approximation construction is visibility-aware and assumes distant viewer

Image-Based Rendering (IBR) • “geometry + color + lighting” vs. “infinite collection of images, one per view pose and time” • data size forces hybrid approaches • imposters – geometric representation for nearby objects and image representation for distant objects • portal textures – for environments with natural subdivision into cells with reduced mutual visibility • limitation of single-texture imposter yields artifacts during view motion; need to incorporate parallax • textured depth meshes – imposter is texture + depth info. per vertex (in imposter mesh) • layered depth images – each pixel stores all the intersections of view ray with scene • renewed interest in IBR (a decade old) due to programmable GPUs

Data Management • driven by gap between computation performance and bandwidth through memory hierarchy • 10^-8 s L1/L2 caches • 10^-7 s main memory • 10^-2 s disk • networking latency • options: • out-of-core techniques • layout techniques • compression techniques

Out-of-core • major part of model on disk • reduce disk accesses • 2 cache parameters (cache-aware techniques) • size of main memory • disk block size • manage working set • explicit data paging system • avoid I/O thrashing • use compact external representations to reduce I/O from cache misses
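
An explicit paging system can be sketched as a fixed-capacity page cache. The least-recently-used eviction policy, the `PageCache` class, and the `load_page` callback are all assumptions made for illustration; the slides do not specify a replacement policy.

```python
from collections import OrderedDict

class PageCache:
    """Keeps at most `capacity` disk blocks in memory, evicting the least recently used."""

    def __init__(self, capacity, load_page):
        self.capacity = capacity      # derived from main-memory size / disk block size
        self.load_page = load_page    # hypothetical callback that performs the disk read
        self.pages = OrderedDict()
        self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark as most recently used
            return self.pages[page_id]
        self.misses += 1                       # cache miss: fetch block from disk
        data = self.load_page(page_id)
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)     # evict least recently used page
        return data

cache = PageCache(capacity=2, load_page=lambda pid: f"block-{pid}")
for pid in [0, 1, 0, 2, 1]:
    cache.get(pid)
print(cache.misses)  # 4
```

Thrashing shows up in such a cache as a miss count close to the number of accesses, which is the situation the working-set management is meant to avoid.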

Layout techniques • 3D geometry of triangle mesh versus 1D linear representation on disk • need index mapping scheme

Coherency • geometric coherency – in rasterization or ray tracing triangle data tends to be accessed coherently • what about adjacency in 1D memory? • I/O architectures and memory hierarchy • lower level - larger in size and slower • data moved between levels in blocks • caches used between levels • data transfer (block-fetch) occurs on cache-miss • assume data accessed coherently • problem: spatially coherent access often yields non-coherent memory access

Cache-coherent (spatial) layouts • organize spatial data in 1D memory to minimize cache misses • cache-aware vs cache-oblivious layouts • example (cache-aware): optimize triangle list sequence for mesh to reduce GPU vertex cache misses • up to six times performance increase • uses size of vertex cache • cache-oblivious – doesn’t use cache size parameter • layout minimizes expected cache misses with various block sizes • can get benefits from all levels of memory hierarchy • can use standard OS paging instead of custom one • developed for meshes and BVHs for rendering and other geometric computations
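
As a simple illustration of mapping spatial data to 1D memory without a cache-size parameter, the sketch below orders 2D grid cells along a Z-order (Morton) curve. This is a swapped-in, well-known space-filling-curve layout, not the cache-oblivious mesh layout algorithm the slides refer to; the function names are hypothetical.

```python
def part1by1(n):
    """Spread the low 16 bits of n so that a zero bit separates each original bit."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton2d(x, y):
    """Interleave the bits of x and y into a single 1D index."""
    return part1by1(x) | (part1by1(y) << 1)

# Sort a 4x4 grid of cells by Morton index to get the on-disk order.
cells = [(x, y) for y in range(4) for x in range(4)]
layout = sorted(cells, key=lambda c: morton2d(*c))
print(layout[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Cells that are close in 2D tend to land close together in the 1D order at every scale, which is why such layouts help regardless of the block size.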

Compression Techniques • mesh compression – compute compact representations by reducing redundant info. • widely researched • commonly based on triangle strips (easy hardware decode) • triangle strips not useful for ray-tracing • alternate mesh compression algorithms needed for random access (e.g. ray-tracing) • decompose mesh into chunks which are (de)compressed separately • ray-strips – sequence of vertices which implicitly encodes triangles and BVH

Discussion • out-of-core • reduce disk access time; require memory and disk block sizes • better disk access than cache-oblivious layouts but require explicit paging system with non-trivial system-level implementation • cache-oblivious layouts • don’t require cache parameters • achieve reasonably high performance • compression can improve either of the above

Parallel-processing techniques • especially with advanced shading a single CPU/GPU can’t keep up • sort-first – subdivide screen space into disjoint regions rendered independently • sort-last – split scene data into several parts distributed among separate RAM+CPU+GPU combinations; rendering system composes parts into final image • rasterization – merge N framebuffers and z-buffers • ray-tracing – ray-trace scene parts and merge as above
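
The sort-last merge of N framebuffers and z-buffers amounts to a per-pixel depth comparison. A minimal sketch, assuming each rendering node returns a flat list of colors and a flat list of depths of equal length (the `composite` function and the sample data are illustrative):

```python
def composite(buffers):
    """Merge (color, depth) framebuffer pairs by keeping the nearest sample per pixel."""
    n = len(buffers[0][0])
    out_color = [None] * n
    out_depth = [float("inf")] * n
    for color, depth in buffers:
        for i in range(n):
            if depth[i] < out_depth[i]:
                out_color[i], out_depth[i] = color[i], depth[i]
    return out_color, out_depth

inf = float("inf")
node_a = (["A", "A", None], [1.0, 5.0, inf])   # node A rendered nothing at pixel 2
node_b = (["B", "B", "B"], [2.0, 3.0, 4.0])
print(composite([node_a, node_b]))  # (['A', 'B', 'B'], [1.0, 3.0, 4.0])
```

Because the merge is order-independent for opaque surfaces, each node can render its part of the scene without knowing about the others.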

General techniques • data parallel rendering • demand-driven rendering • distributed rendering

Data parallel rendering • defined as parallel rendering of distributed scene database • reduces complexity of visibility calculations • each chunk of massive scene can fit into subsystem’s memory (so parallel system can handle bigger scene) • advanced shading difficult since it often requires access to all parts of the scene • rasterization – typically use sort-last image composition • ray-tracing – usually sort-first on primary ray; but secondary rays often require lots of subsystem communication • pure data parallel rendering can’t handle load-imbalances from viewpoint changes

Demand-driven rendering • sort-first screen subdivision can use static assignment of screen regions to rendering subsystems • better: split screen into small regions (tiles) and dynamically assign computation subsystems to tiles • avoids leaving statically assigned rendering subsystems unutilized • rendering subsystems (clients) ask for next tile needing rendering • when tile is completed send results to master processor for composition • resulting load balancing yields almost linear scalability in the number of rendering clients
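
The tile-request loop can be sketched with a shared work queue. Threads stand in for the rendering clients here, and `render_tile` is a hypothetical placeholder for the actual renderer; a real system would use separate machines and network transfer.

```python
import queue
import threading

def render_tile(tile):
    """Hypothetical stand-in for real rendering: returns a label for the tile."""
    x, y = tile
    return f"pixels@{x},{y}"

def client(tiles, results):
    """A rendering client keeps asking for the next tile until none are left."""
    while True:
        try:
            tile = tiles.get_nowait()
        except queue.Empty:
            return
        results.put((tile, render_tile(tile)))  # send result back to the master

def render_frame(width, height, tile_size, num_clients):
    tiles, results = queue.Queue(), queue.Queue()
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            tiles.put((x, y))
    workers = [threading.Thread(target=client, args=(tiles, results))
               for _ in range(num_clients)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    frame = {}                                   # master composes the final image
    while not results.empty():
        tile, pixels = results.get()
        frame[tile] = pixels
    return frame

frame = render_frame(width=64, height=32, tile_size=16, num_clients=3)
print(len(frame))  # 8
```

Because a client only takes a new tile when it is free, expensive tiles do not hold up the others, which is the source of the load balancing.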

Distributed rendering • shared-memory versus distributed systems • master process distributes rendering workload to rendering clients and assembles results for final display • try to assign same tile to same client across frames (temporal coherence) • to hide latency asynchronously perform • rendering • network transfer • image display • updating scene • transfer image data of frame N while clients render frame N+1 and application updates frame N+2

Discussion • trend: multi-core CPUs and GPUs – faster rendering • but scenes keep growing • likely will still need to use distributed systems of multi-core CPUs/GPUs

System issues • rendering massive scenes requires • advanced algorithms and data structures • efficient combining of techniques • mixing and matching techniques to balance realism vs framerate takes significant effort • no single standard approach exists • some representative state-of-the-art systems • visibility-driven rasterization • real-time ray-tracing • LOD-based mesh rasterization • switching to alternative rendering primitives

Visibility-driven rasterization • high depth complexity – architectural walkthroughs and large CAD assemblies; occlusion culling most effective • Visibility Guided Rendering (VGR) • hierarchy of axis-aligned bounding boxes • internal node has splitting plane on a primary axis; used to traverse front-to-back • preprocessing: generate tree top-down • run occlusion queries in parallel to traversal • maintain queue of query requests • fill queue in breadth-first order • far nodes may be rendered unnecessarily since not all nearer ones are rendered yet; but this avoids GPU stalls

VGR

Track previously visible leaf nodes • keep list of leaf nodes visible in last frame • render these first in current frame • frame-to-frame coherence • fill z-buffer before first occlusion query takes place • visibility info. from leaves propagated up the tree to exclude subtrees from traversal and visibility testing (11B) • if node’s projected area is smaller than a pixel, switch to point rendering and optionally randomly skip points for distant nodes

iWalk • VGR: on-line visibility culling vs iWalk: extensive preprocessing • preprocess: construct out-of-core octree • rendering time: • compute visibility coefficient for each octree node • predict visibility events • use prediction to prefetch geometry likely to be needed in next frame (avoid stall in next frame)

Real-time ray-tracing: OpenRT • OpenRT – interactive ray-tracing on cluster of PCs • multi-level kd-tree • each object has kd-tree • bounding volume of object placed in global kd-tree • allows for some motion of objects and instancing • logarithmic time complexity allows huge in-core scenes (Fig. 1D) • tile-based demand-driven interactive rendering • out-of-core support with custom memory management • simplified in-core model used while data is loading • plug-and-play shaders support soft shadows, transparency, etc. (Fig. 12a and 12b)

OpenRT
