Real-Time Hair Simulation with Parallel Computing

Oriam De Gyves (465730)

- A method for real-time hair simulation using NVIDIA’s Compute Unified Device Architecture (CUDA) is presented.
- Parallel computing + a fast integration scheme
- 100,000 hairs (800,000 particles) running at 23 frames per second.

Challenging problems in computer graphics:

- Particle simulation
- Realistic human characters
- Hair
- Realism
- Better appearance

- The problem is…
- A person has around 100,000 hairs
- Hair is expensive to simulate

- But…
- Each hair can be viewed as a set of particles
- CUDA is a very good architecture for simulating particles because of its high degree of parallelism
- The Verlet integration scheme offers an attractive way to update particle positions faster than traditional methods such as Euler integration.

- Three important parts in the implementation:
- Parallel Computing (provided by CUDA)
- Fast integration scheme (provided by Verlet integration)
- Direct access to data in GPU memory (using VBOs)

- CUDA
- I’m sure you all know CUDA

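- Verlet Integration
- Instead of storing position and velocity, store the current position $x$ and the previous position $x'$ (the velocity is implicit in $x - x'$)
- $x_{new} = 2x - x' + a\,\Delta t^2$
- $x' = x$, then $x = x_{new}$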

- VBOs
- CUDA data transfers between GPU and CPU, in either direction, are very expensive
- Instead, access the data in GPU memory directly through a pointer
- Avoiding these copies improves overall performance

- Notes:
- N = number of particles per hair
- H = total number of hairs in the simulation

- So…
- Instead of working with H arrays of size N (100k arrays of size 8)
- Work with N arrays of size H (8 arrays of size 100k)
- Why?
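- CUDA parallelism: each array holds the same particle of every hair, so one kernel launch processes all 100k of those particles in parallel (one thread per hair)
- Consecutive threads read consecutive memory addresses, so memory access is more efficient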
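A minimal C++ sketch of the two memory layouts, assuming each coordinate is stored in a flat array of floats. The function names are hypothetical. In the Grid layout the H values for particle n are contiguous, so the thread handling hair h in a kernel over Grid n reads element n * H + h, right next to what the threads for hairs h - 1 and h + 1 read.

```cpp
#include <cstddef>
#include <vector>

// Hair-major layout: H arrays of size N, flattened as hair * N + particle.
std::size_t hairMajorIndex(std::size_t hair, std::size_t particle, std::size_t N) {
    return hair * N + particle;
}

// Grid layout: N arrays (Grids) of size H, flattened as particle * H + hair.
// Threads working on the same Grid touch consecutive addresses as hair increases.
std::size_t gridIndex(std::size_t hair, std::size_t particle, std::size_t H) {
    return particle * H + hair;
}

// Convert one coordinate buffer from the hair-major layout to the Grid layout.
std::vector<float> toGridLayout(const std::vector<float>& hairMajor,
                                std::size_t H, std::size_t N) {
    std::vector<float> grids(H * N);
    for (std::size_t h = 0; h < H; ++h)
        for (std::size_t n = 0; n < N; ++n)
            grids[gridIndex(h, n, H)] = hairMajor[hairMajorIndex(h, n, N)];
    return grids;
}
```

For example, with 3 hairs of 2 particles stored hair-major as {0, 1, 10, 11, 20, 21}, the Grid layout becomes {0, 10, 20, 1, 11, 21}: first particle 0 of every hair, then particle 1 of every hair.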

- In other words…
- Depending on the hardware device, a thread block can hold up to 512 or 1024 threads; the H particles of a Grid are split across many blocks, which the GPU schedules in parallel

CUDA works with:

- N Grids of size H
- Each Grid holds 1 particle of each hair (x, y, z)
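The number of thread blocks needed to cover one Grid follows from the per-block thread limit. The host-side arithmetic is sketched in C++ below; 256 threads per block is an assumed example value, and the function names are hypothetical. In CUDA the launch would then look like `integrateKernel<<<blocksPerGrid(H, 256), 256>>>(grid, H)` (a hypothetical kernel), with each thread computing its hair index as `blockIdx.x * blockDim.x + threadIdx.x`.

```cpp
// Number of thread blocks needed to cover one Grid of H hairs,
// rounding up so the last, partially filled block is included.
int blocksPerGrid(int H, int threadsPerBlock) {
    return (H + threadsPerBlock - 1) / threadsPerBlock;
}

// Mirrors the CUDA index computation blockIdx.x * blockDim.x + threadIdx.x.
// Threads whose index is >= H belong to the padded tail and must do no work.
bool threadHasWork(int blockIndex, int blockSize, int threadIndex, int H) {
    return blockIndex * blockSize + threadIndex < H;
}
```

For H = 100,000 and 256 threads per block this gives 391 blocks; only the first 160 threads of the last block have a hair to process, which is why the bounds check is needed.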

- Initialization of the particles’ positions:
- Using spherical coordinates
- Radius = the head radius for the first Grid
- Radius = the head radius + offset for the rest of the grids
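A sketch of the spherical-coordinate initialization in C++. The presentation only says the radius is the head radius plus an offset for the Grids after the first; this sketch assumes the offset grows linearly with the Grid index n (radius = headRadius + n * offset), that the head is centred at the origin, and that y points up. `initialParticle` and its parameters are hypothetical.

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// Position of particle n (i.e. its entry in Grid n) for a hair whose root lies at
// polar angle theta (measured from +y) and azimuth phi on a head centred at the origin.
// Assumption: Grid n uses radius headRadius + n * offset, so Grid 0 sits on the scalp.
Vec3 initialParticle(float theta, float phi, int n, float headRadius, float offset) {
    float r = headRadius + static_cast<float>(n) * offset;
    return {r * std::sin(theta) * std::cos(phi),
            r * std::cos(theta),
            r * std::sin(theta) * std::sin(phi)};
}
```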

- Constraints handled by relaxation
- Iterative process
- Downside: the number of iterations needed depends on the number of particles in each hair.

- Every iteration:
- Verlet integration
- Get forces (wind, gravity)
- Wind is modelled with sine and cosine functions
- Wind drag is calculated on the CPU only once for each Grid
- Get the new position for each particle
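The per-Grid update can be sketched in C++, with each loop iteration standing for one CUDA thread. The wind amplitude and frequency, the gravity constant, and treating the wind as a force divided by particle mass are all assumptions; the presentation does not give its exact drag model. `windForce` and `integrateGrid` are hypothetical names. As in the slide, the wind is evaluated once per Grid and shared by all H particles.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Hypothetical wind model built from a sine and a cosine of time.
Vec3 windForce(float t, float amplitude, float frequency) {
    return {amplitude * std::sin(frequency * t), 0.0f, amplitude * std::cos(frequency * t)};
}

// Verlet-integrate one Grid (element h is the same particle index of hair h).
// The acceleration is computed once, CPU-side, then applied to every particle.
void integrateGrid(std::vector<Vec3>& pos, std::vector<Vec3>& prev, float t, float dt,
                   float mass, float windAmplitude, float windFrequency) {
    const float gravity = 9.81f; // assumed units m/s^2, with +y pointing up
    Vec3 wind = windForce(t, windAmplitude, windFrequency);
    Vec3 acc = {wind.x / mass, wind.y / mass - gravity, wind.z / mass};
    float dt2 = dt * dt;
    for (std::size_t h = 0; h < pos.size(); ++h) { // one CUDA thread per hair
        Vec3 cur = pos[h];
        Vec3 old = prev[h];
        pos[h] = {2.0f * cur.x - old.x + acc.x * dt2,
                  2.0f * cur.y - old.y + acc.y * dt2,
                  2.0f * cur.z - old.z + acc.z * dt2};
        prev[h] = cur; // x' = x
    }
}
```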

- Handle Collisions
- Collisions handled only with the head
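Head collisions can be sketched by treating the head as a sphere and projecting any particle that ends up inside it back onto the surface. The sphere is an assumption here (it matches the head radius used during initialization), and `collideWithHead` is a hypothetical name.

```cpp
#include <cmath>

struct Vec3 {
    float x, y, z;
};

// If p lies inside the head sphere, move it radially back onto the surface.
// Returns true when the particle was moved.
bool collideWithHead(Vec3& p, Vec3 center, float radius) {
    float dx = p.x - center.x, dy = p.y - center.y, dz = p.z - center.z;
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (dist >= radius || dist == 0.0f) return false; // outside, or no defined direction
    float s = radius / dist;                          // scale factor onto the surface
    p = {center.x + dx * s, center.y + dy * s, center.z + dz * s};
    return true;
}
```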

- Satisfy Constraints
- Do CONSTRAINT_ITERATIONS times
- Two Grids are sent to CUDA
- Hairs are aligned in the Grids
- So Grid1[index] and Grid2[index] are particles of the same hair
- Compare the distance between these two particles and correct their positions if necessary
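A C++ sketch of the relaxation step, where each loop iteration over h stands for one CUDA thread. It assumes the two Grids sent to CUDA are neighbouring Grids n and n + 1, that all segments share one rest length, and that Grid 0 stays attached to the scalp; none of these details, nor the names `satisfyConstraint` and `relaxHairs`, come from the presentation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// One relaxation pass over a pair of Grids: gridA[h] and gridB[h] are
// neighbouring particles of hair h. If fixRoot is true, gridA is treated as
// attached to the scalp and only gridB moves.
void satisfyConstraint(std::vector<Vec3>& gridA, std::vector<Vec3>& gridB,
                       float restLength, bool fixRoot) {
    for (std::size_t h = 0; h < gridA.size(); ++h) { // one CUDA thread per hair
        Vec3& a = gridA[h];
        Vec3& b = gridB[h];
        float dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
        float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (dist == 0.0f) continue; // direction undefined, skip this hair
        float diff = (dist - restLength) / dist; // relative length error
        if (fixRoot) {
            b = {b.x - dx * diff, b.y - dy * diff, b.z - dz * diff};
        } else {
            float half = 0.5f * diff; // split the correction between both particles
            a = {a.x + dx * half, a.y + dy * half, a.z + dz * half};
            b = {b.x - dx * half, b.y - dy * half, b.z - dz * half};
        }
    }
}

// CONSTRAINT_ITERATIONS passes over every pair of neighbouring Grids.
void relaxHairs(std::vector<std::vector<Vec3>>& grids, float restLength, int iterations) {
    for (int it = 0; it < iterations; ++it)
        for (std::size_t n = 0; n + 1 < grids.size(); ++n)
            satisfyConstraint(grids[n], grids[n + 1], restLength, n == 0);
}
```

Fixing one pair moves a particle that the previous pair shares, so a single pass does not satisfy every constraint at once; the corrections only converge over several passes, which is why the number of iterations depends on the number of particles per hair.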

- Drawing
- Use a VBO to access the data in GPU memory and draw the particles.

- Verlet integration

- The performance of the simulation depends on the method used to obtain the data from the GPU, as shown in the next table:

- One of the most important things to address in the future is collision handling between hairs. Testing collisions between particles alone is not enough, since the line segments between particles can intersect even if the particles do not.
- Improved rendering techniques such as self-shadowing, alpha blending, and a geometry other than a line.
- Use only one Grid, so that using a single VBO improves performance instead of decreasing it.
- Level of detail: use a geometry shader so that the positions of only a fraction of the hairs are calculated while still drawing 100,000 primitives.
- Minor fix: variable wind force calculated on the CPU once for every Grid.