PARSEC FACESIM

PARSEC FACESIM: study and parallelization, by Dmitri Makarov and Dmitri Shtilman. facesim is a C++ application included in PARSEC. It uses iterative numerical methods that are data parallel (floating-point computation), and the problem size is ~370K tetrahedra.

Presentation Transcript


  1. PARSEC FACESIM: study and parallelization. Dmitri Makarov, Dmitri Shtilman

  2. facesim facts
     • iterative numerical methods: data parallel (floating-point computation)
     • size of the problem: ~370K tetrahedra
     • included in PARSEC. C++ application
     • parallelized: taskQ, thread pool with custom barrier implementation
     • does not scale beyond 16 threads (with 128 threads, speedup is only 16x on T2+)
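
As a quick reading of the last bullet, parallel efficiency is speedup divided by thread count. Using only the two numbers on the slide (the symbols $S$, $p$, and $E$ are introduced here for illustration and are not in the original):

$$E = \frac{S}{p} = \frac{16}{128} = 0.125$$

That is, at 128 threads each thread contributes about 12.5% of what ideal linear scaling would give.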

  3. facesim scaling on T2+

  4. facesim cputrack profiling

  5. facesim issues
     • in-house barrier:
        • pthread_cond_wait()
        • pthread_cond_signal()
     • not parallelized stages of the simulation
     • overhead of tasks, extra computations
     • resource contention: 128 threads
        • 32 FPUs
        • 32 LSs
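
The slides do not show the in-house barrier itself. As a rough illustration of the kind of barrier the first bullet refers to, here is a minimal sketch of a barrier built on a mutex and a condition variable. The class name `CondBarrier`, the generation counter, and the helper `count_barrier_violations` are assumptions made for this example; it also uses `pthread_cond_broadcast()` to release all waiters at once, whereas the slide mentions `pthread_cond_signal()`. Every waiter goes to sleep in the kernel and must be woken again, which is the cost a spin barrier avoids.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <thread>
#include <vector>

// Hypothetical barrier built on a pthread mutex and condition variable.
// Not the facesim implementation; an illustrative sketch only.
class CondBarrier {
public:
    explicit CondBarrier(int parties)
        : parties_(parties), waiting_(0), generation_(0) {
        pthread_mutex_init(&mutex_, nullptr);
        pthread_cond_init(&cond_, nullptr);
    }
    ~CondBarrier() {
        pthread_cond_destroy(&cond_);
        pthread_mutex_destroy(&mutex_);
    }
    CondBarrier(const CondBarrier&) = delete;
    CondBarrier& operator=(const CondBarrier&) = delete;

    void wait() {
        pthread_mutex_lock(&mutex_);
        const unsigned my_generation = generation_;
        if (++waiting_ == parties_) {
            // Last arrival: reset the count, open the next generation,
            // and wake every sleeping thread.
            waiting_ = 0;
            ++generation_;
            pthread_cond_broadcast(&cond_);
        } else {
            // Sleep until the generation changes; the loop guards against
            // spurious wakeups.
            while (generation_ == my_generation)
                pthread_cond_wait(&cond_, &mutex_);
        }
        pthread_mutex_unlock(&mutex_);
    }

private:
    pthread_mutex_t mutex_;
    pthread_cond_t cond_;
    int parties_;
    int waiting_;
    unsigned generation_;
};

// Runs `threads` threads through `phases` barrier phases and counts how
// often a thread got past the barrier before all arrivals were recorded.
int count_barrier_violations(int threads, int phases) {
    CondBarrier barrier(threads);
    std::atomic<int> arrivals(0);
    std::atomic<int> violations(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int p = 0; p < phases; ++p) {
                arrivals.fetch_add(1);
                barrier.wait();
                if (arrivals.load() < (p + 1) * threads)
                    violations.fetch_add(1);
            }
        });
    }
    for (auto& th : pool) th.join();
    return violations.load();
}
```

A correct barrier should make `count_barrier_violations(4, 100)` return 0.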

  6. improvements
     • reworked the thread pool implementation:
        • N-1 threads created before the simulation starts
        • each thread works on its partition of data
        • master sets a flag when work is ready
        • all wait at barrier for everyone else to finish task
        • no need to add tasks to queue (same entry point)
     • use spin barrier instead of pthread barrier
     • parallelized sequential stages
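
The reworked pool described above can be sketched roughly as follows. This is not the authors' code: the names `SpinBarrier`, `PartitionedPool`, and `increment_in_steps` are hypothetical, the "work ready" flag is modeled as an epoch counter, and the spin loops call `std::this_thread::yield()` so the example behaves on machines with fewer cores than threads (a pure busy-wait would omit it).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Sense-reversing spin barrier: waiters poll a flag instead of sleeping.
class SpinBarrier {
public:
    explicit SpinBarrier(int parties)
        : parties_(parties), remaining_(parties), sense_(false) {}
    void wait() {
        const bool my_sense = !sense_.load();
        if (remaining_.fetch_sub(1) == 1) {
            remaining_.store(parties_);  // reset for the next phase
            sense_.store(my_sense);      // release everyone
        } else {
            while (sense_.load() != my_sense) std::this_thread::yield();
        }
    }
private:
    const int parties_;
    std::atomic<int> remaining_;
    std::atomic<bool> sense_;
};

// N-1 workers are created once; the master acts as the Nth participant.
class PartitionedPool {
public:
    using Task = std::function<void(std::size_t, std::size_t)>;
    PartitionedPool(int threads, std::size_t items)
        : threads_(threads), items_(items), barrier_(threads),
          epoch_(0), stop_(false), task_(nullptr) {
        for (int id = 1; id < threads_; ++id)
            workers_.emplace_back([this, id] { worker_loop(id); });
    }
    ~PartitionedPool() {
        stop_.store(true);
        epoch_.fetch_add(1);  // wake workers so they observe stop_
        for (auto& w : workers_) w.join();
    }
    // Every thread enters the same task on its own partition; no queue.
    void run(const Task& task) {
        task_ = &task;
        epoch_.fetch_add(1);  // master signals that work is ready
        run_partition(0);
        barrier_.wait();      // wait until every partition is done
    }
private:
    void worker_loop(int id) {
        unsigned seen = 0;
        for (;;) {
            while (epoch_.load() == seen) std::this_thread::yield();
            ++seen;
            if (stop_.load()) return;
            run_partition(id);
            barrier_.wait();
        }
    }
    void run_partition(int id) {
        const std::size_t begin = items_ * id / threads_;
        const std::size_t end = items_ * (id + 1) / threads_;
        (*task_)(begin, end);
    }
    const int threads_;
    const std::size_t items_;
    SpinBarrier barrier_;
    std::atomic<unsigned> epoch_;
    std::atomic<bool> stop_;
    const Task* task_;
    std::vector<std::thread> workers_;
};

// Example driver: each step increments every element once, in parallel.
std::vector<int> increment_in_steps(int threads, std::size_t n, int steps) {
    std::vector<int> data(n, 0);
    PartitionedPool pool(threads, n);
    for (int s = 0; s < steps; ++s) {
        pool.run([&data](std::size_t begin, std::size_t end) {
            for (std::size_t i = begin; i < end; ++i) ++data[i];
        });
    }
    return data;
}
```

Because partitions are fixed and every thread enters the same function, there is no per-task queue traffic. The trade-off is that static partitions give up dynamic load balancing.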

  7. results

  8. other platforms

  9. observations
     • generic API may be portable but inefficient lib implementation could kill performance
     • C++ can introduce a lot of redundancy if not used carefully:
        • new T[N] calls T() sequentially N times
        • library should implement default constructors as efficiently as possible
     • flexibility has a cost:
        • load balancing overhead can outweigh its benefits
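
The `new T[N]` observation can be checked with a small experiment. The sketch below uses two hypothetical types that are not taken from PhysBAM: `CountedVec3`, whose default constructor zeroes its fields and counts calls, and `TrivialVec3`, which has a trivial default constructor so that `new TrivialVec3[n]` runs no per-element code.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical vector type with a user-provided default constructor.
struct CountedVec3 {
    static std::size_t constructions;
    double x, y, z;
    CountedVec3() : x(0.0), y(0.0), z(0.0) { ++constructions; }
};
std::size_t CountedVec3::constructions = 0;

// Same layout, but trivially default constructible: array allocation
// leaves the elements uninitialized and calls no constructor.
struct TrivialVec3 {
    double x, y, z;
};

static_assert(std::is_trivially_default_constructible<TrivialVec3>::value,
              "TrivialVec3 should need no constructor call");
static_assert(!std::is_trivially_default_constructible<CountedVec3>::value,
              "CountedVec3 runs code for every element");

// Returns how many times CountedVec3() ran while allocating n elements.
std::size_t count_default_constructions(std::size_t n) {
    CountedVec3::constructions = 0;
    CountedVec3* array = new CountedVec3[n];
    const std::size_t calls = CountedVec3::constructions;
    delete[] array;
    return calls;
}
```

For large arrays that are overwritten immediately afterwards, the per-element constructor calls are pure overhead and they run sequentially on the allocating thread. That is one reason to keep default constructors as cheap as possible.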

  10. future work
     • apply our observations to other PhysBAM-based applications
     • study and optimize cloth simulation
     • port facesim to CUDA, OpenCL (real-time rendering)
     • implement important parts of PhysBAM in Scala
