
Enhancing and Optimizing the Render Cache

Bruce Walter (Cornell Program of Computer Graphics), George Drettakis (REVES/INRIA Sophia-Antipolis), Donald P. Greenberg (Cornell Program of Computer Graphics)


Presentation Transcript


  1. Enhancing and Optimizing the Render Cache • Bruce Walter, Cornell Program of Computer Graphics • George Drettakis, REVES/INRIA Sophia-Antipolis • Donald P. Greenberg, Cornell Program of Computer Graphics

  2. Background • Render Cache • “Interactive Rendering using the Render Cache”, Rendering Workshop 1999 • Goal • Interactive Rendering • Exploit frame-to-frame coherence • Decouple renderer from display framerate • Reuse “expensive” rendering results

  3. Background • Goal: Interactive rendering • [Images: Ray tracing, Path tracing]

  4. Background • Modified Visual Feedback Loop • [Diagram labels: user, application, renderer, image, display; asynchronous interface]

  5. Background • Reproject rendered points • [Images: Original view, New view]
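Reprojection of a cached point into a new view can be sketched as follows. This is only an illustration under simplifying assumptions that are not in the slides: a pinhole camera that translates but never rotates and looks along +z, and a hypothetical `project` helper with a focal length given in pixels.

```python
import math

def project(point, cam_pos, focal, width, height):
    """Project a world-space point into pixel coordinates.

    Assumption for illustration: the camera only translates and looks
    along +z. Returns (px, py, depth), or None if the point is behind
    the camera or falls outside the image.
    """
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    if z <= 0:
        return None  # behind the camera
    px = math.floor(focal * x / z + width / 2)
    py = math.floor(focal * y / z + height / 2)
    if not (0 <= px < width and 0 <= py < height):
        return None  # outside the new view
    return (px, py, z)

# The same cached point lands on a different pixel once the camera moves.
original_view = project((0, 0, 5), (0, 0, 0), 256, 512, 512)
new_view = project((0, 0, 5), (1, 0, 0), 256, 512, 512)
```

The cached shading result is reused unchanged; only its pixel location is recomputed for the new camera.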

  6. Background • [Diagram: display process. Labels: renderer, Update Points, Project/Z-Buffer, Depth Cull, Interpolate, image, Sampling, renderer]

  7. Background • Results after each stage • [Images: Projection, Depth cull, Interpolation]

  8. Background • Sampling • [Images: Displayed image, Priority image, Requested pixels]

  9. Related Work • Faster ray engines • Optimize and parallelize • E.g., Wald et al. • Hardware-based display • Mesh-based • E.g., Tapestry, Holodeck, Tole et al. • Texture-based • E.g., Corrective textures

  10. Motivation • Render Cache works well • Can enable interactive use of higher quality ray-based renderers. • … but needs improvement • Images too small (256x256) • Gaps often visible during camera motion • Not fast enough in tracking shading changes

  11. Enhancements • Tiled Z-Buffer • Better scalability and memory coherence • Larger Interpolation Prefilter • Can fill larger gaps between points • Predictive Sampling • Improved quality during camera motion • Point Eviction • Faster update of shading changes

  12. Enhancements • Code Optimization • Use of SIMD (MMX/SSE/SSE2) • Data layout, branch conversions, etc. • Publicly Available • For evaluation, comparison, or use • Non-commercial binary release • URL is in the paper

  13. Memory Coherence • Change from R10K to Pentium 4 • Cache reduced from 4MB to 256K • Clock increased from 195MHz to 1.7GHz • Cache misses much more expensive • Change from 256x256 to 512x512 • Point data ~ 5MB, Image data ~ 3MB • Much bigger than cache • Projection and Z-Buffer problematic

  14. Projection and Z-Buffer • Random order memory access • Read/modify/write operation is memory latency limited • [Diagram: Point Cloud (5MB), Image (3MB)]

  15. Tiled Projection and Z-Buffer • Divide image into tiles • Tiles sized to fit in cache • [Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)]

  16. Tiled Projection and Z-Buffer • Project and bucket sort by tile • [Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)]

  17. Tiled Projection and Z-Buffer • Z-Buffer each tile separately • [Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)]

  18. Tiled Projection and Z-Buffer • Uses more memory and instructions • But it is faster (25ms instead of 42ms) • [Diagram: Point Cloud (5MB), Tile Buckets (4MB), Image (3MB)]
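The two-pass scheme on slides 15 to 18 might look roughly like the sketch below. It is not the authors' implementation: the function name `tiled_zbuffer`, the tuple layout of a projected point, and the tile size parameter are all assumptions made for illustration, and plain Python cannot show the cache behavior that motivates the design.

```python
def tiled_zbuffer(points, width, height, tile):
    """Bucket projected points by tile, then depth-test one tile at a time.

    points: iterable of (px, py, depth, point_id), already projected
    to integer pixel coordinates (hypothetical layout).
    Returns (depth_buffer, id_buffer) as flat row-major lists.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    buckets = [[] for _ in range(tiles_x * tiles_y)]

    # Pass 1: bucket sort by tile. Appending to a bucket is a sequential
    # write, unlike a read/modify/write at a random image location.
    for p in points:
        px, py = p[0], p[1]
        if 0 <= px < width and 0 <= py < height:
            buckets[(py // tile) * tiles_x + (px // tile)].append(p)

    depth = [float("inf")] * (width * height)
    ids = [-1] * (width * height)

    # Pass 2: Z-buffer each tile separately. All accesses in this loop
    # stay inside one tile's pixels, which is sized to fit in cache.
    for bucket in buckets:
        for px, py, z, pid in bucket:
            i = py * width + px
            if z < depth[i]:
                depth[i] = z
                ids[i] = pid
    return depth, ids
```

The extra bucket storage and the second pass are the "more memory and instructions" that slide 18 mentions; the payoff is that the depth test no longer touches the whole image in random order.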

  19. Interpolation Filters • Larger filters • Fill larger gaps in point data • Generally more expensive • Result in more blurring of the image • The previous Render Cache • Used a 3x3 weighted filter • Can only fill very small gaps • Introduces only a small amount of blurring

  20. Prefilter • Add a larger “backup” filter • Results used only when 3x3 filter fails • Uses a uniform 7x7 filter • Can be computed cheaply • Can fill in much larger gaps • Does not affect sampling priorities • Actually executed first then overwritten • Hence the name “prefilter”
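A minimal sketch of the "prefilter first, then overwrite" ordering follows. Several details are assumptions not stated in the slides: the image is a single-channel grid with `None` marking gaps, the 3x3 weights are 4 (center), 2 (edges), 1 (corners), and a filter "fails" when its window contains no samples at all.

```python
def window_average(img, x, y, radius, weight):
    """Weighted average of the non-gap samples in a square window."""
    h, w = len(img), len(img[0])
    total = wsum = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w and img[yy][xx] is not None:
                k = weight(dx, dy)
                total += k * img[yy][xx]
                wsum += k
    return total / wsum if wsum > 0 else None

def uniform(dx, dy):
    return 1.0

def small_weights(dx, dy):
    # Assumed weights: 4 at the center, 2 on edges, 1 on corners.
    return (2 - abs(dx)) * (2 - abs(dy))

def interpolate(img):
    h, w = len(img), len(img[0])
    # Step 1: the uniform 7x7 prefilter runs first, at every pixel.
    out = [[window_average(img, x, y, 3, uniform) for x in range(w)]
           for y in range(h)]
    # Step 2: the 3x3 filter overwrites the prefilter result wherever
    # it finds at least one sample; elsewhere the backup value remains.
    for y in range(h):
        for x in range(w):
            v = window_average(img, x, y, 1, small_weights)
            if v is not None:
                out[y][x] = v
    return out
```

With a single sample at the left end of a nine-pixel row, the 3x3 filter covers only the first two pixels, the prefilter extends coverage to the first four, and the rest stay empty.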

  21. Prefilter • [Images: 3x3 filter only, 7x7 prefilter only, Both filters]

  22. Predictive Sampling • Sampling is purely reactive • Helps to guide sparse sampling • Samples returned in later frame • Problem when large new regions become visible • Predict large gaps ahead of time • Project using a predicted camera • Request samples before they are needed

  23. Predictive Sampling • Projection is expensive • 47% of original render cache cost • Use simplified projection • No Z-Buffer • Only need to find regions with no points • Reduced resolution • 1/4 width and height (1/16 # of pixels) • Store only 1 byte per pixel • Occupancy image fits easily in cache
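The simplified projection on slide 23 can be sketched as an occupancy image. The function name `occupancy_gaps` and the input format (pixel coordinates already projected with the predicted camera) are assumptions for illustration. For a 512x512 frame, a quarter-resolution image at one byte per pixel is 128 x 128 = 16,384 bytes, which is consistent with the claim that it fits easily in a 256K cache.

```python
def occupancy_gaps(pixels, width, height, scale=4):
    """Find low-resolution cells that no projected point lands in.

    pixels: iterable of (px, py) integer coordinates, assumed to be
    projected with the predicted camera.
    Returns a list of empty (cell_x, cell_y) cells in row-major order.
    """
    ow, oh = width // scale, height // scale
    occupancy = bytearray(ow * oh)  # one byte per low-res pixel
    for px, py in pixels:
        if 0 <= px < width and 0 <= py < height:
            # No Z-buffer: any point at all marks the cell as covered.
            occupancy[(py // scale) * ow + (px // scale)] = 1
    return [(i % ow, i // ow) for i, v in enumerate(occupancy) if v == 0]

# Cells returned here are candidates for early sample requests.
gaps = occupancy_gaps([(0, 0), (7, 7), (100, 3)], 8, 8)
```

Because only coverage matters, the depth comparison and per-pixel point bookkeeping of the full projection can be skipped.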

  24. Predictive Sampling • Example during rapid camera rotation • [Images: No Prediction, With Prediction]

  25. Algorithm Overview • [Diagram labels: renderer, Update Points, Prediction, Project/Sort, Z-Buffer, Depth Cull, Prefilter, Interpolate, image, Sampling, renderer]

  26. Point Eviction • Stale data can be worse than no data • Points may live a long time at high ratios • Not enough new samples to overwrite old • Color change detection already exists • Enhances sampling in regions of change • Works by aging nearby points • Evict points beyond an age limit • Speeds image convergence
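One way to picture aging and eviction is the sketch below. The slides do not give the actual rule, so everything specific here is hypothetical: the per-frame age increment, the square neighborhood `radius`, the extra `penalty` applied near a detected color change, and the dictionary layout of a point.

```python
def age_and_evict(points, changed_at, radius, penalty, age_limit):
    """Age every point, age points near detected changes faster,
    and drop points whose age exceeds the limit.

    points: list of dicts with keys "x", "y", "age" (hypothetical).
    changed_at: list of (x, y) pixels where a color change was detected.
    """
    survivors = []
    for p in points:
        age = p["age"] + 1  # assumed: one unit of age per frame
        for cx, cy in changed_at:
            if abs(p["x"] - cx) <= radius and abs(p["y"] - cy) <= radius:
                age += penalty  # nearby change makes the point stale sooner
                break
        if age <= age_limit:
            survivors.append({**p, "age": age})
    return survivors
```

Evicted points leave gaps, and gaps are what attract new samples, so stale shading is replaced sooner than if it waited to be overwritten.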

  27. SIMD Optimizations • Utilize MMX/SSE/SSE2 instructions • Project four points at once • Process R, G, B channels simultaneously • Add memory prefetches • Automatic prefetch works well for linear access • Convert branches to data dependencies • Compares set masks of zeroes or ones • Use boolean operations instead of branches • Roughly a factor of two total speedup
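The branch-to-mask conversion can be illustrated on scalar 32-bit values. This is a sketch of the general technique in Python, not the SIMD code from the talk; the helper names `cmp_lt_mask` and `select` are made up for the example, and real SSE code would apply the same pattern to several lanes at once.

```python
MASK32 = 0xFFFFFFFF

def cmp_lt_mask(a, b):
    """Return all ones (0xFFFFFFFF) if a < b, else all zeroes."""
    # True negates to -1, whose low 32 bits are all ones.
    return -(a < b) & MASK32

def select(mask, if_set, if_clear):
    """Pick if_set where mask bits are 1 and if_clear where they are 0,
    using only boolean operations."""
    return (if_set & mask) | (if_clear & ~mask & MASK32)

def depth_test(new_z, stored_z):
    """Branch-free 'keep the nearer depth' on unsigned 32-bit values."""
    return select(cmp_lt_mask(new_z, stored_z), new_z, stored_z)
```

The comparison produces data rather than a jump, so the result no longer depends on branch prediction, which is the point of the conversion named on the slide.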

  28. Results • Single 1.7GHz processor, rotating camera • [Images: Ray trace only (1.8 fps), Render Cache (9 fps)]

  29. Results • Timing: 62.1 ms (up to 16 fps) • 512x512 image, render cache only • 1.7GHz Pentium 4 processor • [Chart labels: Sampling, Update Points, Filter/Smooth, Prediction, Prefilter, Depth Cull, Project, Z-Buffer]
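The headline numbers can be checked with a few lines of arithmetic. All inputs are figures quoted on the slides; the variable names are only for illustration.

```python
# Slide 29: 62.1 ms of render cache work per frame.
frame_ms = 62.1
max_fps = 1000.0 / frame_ms  # about 16.1, matching "up to 16 fps"

# Slide 28: Render Cache (9 fps) versus ray tracing only (1.8 fps).
speedup_over_ray_tracing = 9 / 1.8  # about 5x

# Slide 18: tiled projection and Z-buffer, 25 ms instead of 42 ms.
tiled_speedup = 42 / 25  # about 1.68x
```

The 16 fps figure is an upper bound because it counts only the render cache's own time, as the "render cache only" note indicates.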

  30. Scalability with Image Size • [Plot: Frame Size (Pixels), 0 to 1,600,000, against Frame Time (ms), 0 to 350; points labeled 512x512 and 1200x1200]

  31. Results • Try it for yourself • Download publicly available binary • Includes Render Cache and simple Ray Tracer • Requires a Pentium 4 and Java Web Start • Free for evaluation and internal use • http://www.graphics.cornell.edu/research/interactive/rendercache • Demo

  32. The End
