
UCSD Site Report to the IAB

UCSD Site Report to the IAB. Sheldon Brown, Site Director; Daniel Tracy, CHMPR Programmer. May 11, 2011, Baltimore, MD. Overview. Ongoing Projects: Multi-User Extensible Virtual Worlds; Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games. Extending the CHMPR


Presentation Transcript


  1. UCSD Site Report to the IAB • Sheldon Brown, Site Director • Daniel Tracy, CHMPR Programmer • May 11, 2011, Baltimore, MD

  2. Overview Ongoing Projects: • Multi-User Extensible Virtual Worlds • Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Extending the CHMPR: • Future Cinema has a revised focus on Augmented Reality • FRP and RapidMRI projects underway • REU: two new undergraduates involved in research • Complementary project with NSF EAGER grant: “Identifying and Integrating Creative Patterns of User Behavior and Experience in Virtual Worlds”

  3. Staff • Sheldon Brown, Site Director • Erik Hill, Programmer Analyst • Daniel Tracy, Programmer Analyst • Todd Margolis, Programmer Analyst • Kristen Kho, Programmer Analyst • Jeremy Douglass, Post-Doc Researcher • Vivek Ramavajjala, Graduate Student • Sam Kronik, Graduate Student • Robin Betz, Undergraduate Student • Bradley Ruoff, Undergraduate Student • Lourdes Guardiano-Durkin, Administrator

  4. Projects Ongoing projects: • Multi-user Extensible Virtual Worlds • Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Revised Project • Future Cinema as Augmented Reality Affiliated Projects • Identifying and Integrating Creative Patterns of User Behavior and Experience in Virtual Worlds

  5. Products and Activities Last six months: Virtual World Exhibitions: • CSU Sacramento • UCSD 50th Anniversary Innovation Expo Next Generation Cinema Presentations: • Presentation by Justin Rattner, Intel • Featured on French/German TV: Souvenirs from Earth • Ukraine: Video Art in a Global Context Exhibition • Mexico Moving Forward • College Art Association, New York • 3D movie featured at the Seoul, Korea Film Festival • Scalable City wins first prize in Sony Europe 3D movie competition Lectures: • Varieties of Virtual World Experience via Multicore Computing, at the Frontiers of Multicore Computing • Intel Labs Radio Show • Keynote talk for the NEA/NSF Summit at RPI EMPAC Publications: • Tracy D., Brown S. Combining Parallel & Incremental Techniques for Real-Time Physics in Large, Continuous Virtual Environments. Journal of Computing and Concurrency (pending publication). Website: http://chmpr.ucsd.edu

  6. Multi-user Extensible Virtual Worlds Status: Continuing • Project Description: In order for virtual worlds to realize their potential across a number of industries and research domains, and to serve as generally effective social forums, their expressive qualities need to be significantly improved. They require a considerable increase in the quantity and quality of entities and their interactions. This also entails a substantial increase in the size of virtual worlds, the number of users that can be supported, the variety of objects and behaviors, and the simultaneity of entity interactions. • Sponsors: IBM, Intel • Deliverables: • Prototype Multi-user Virtual World ongoing development

  7. Multi-user Extensible Virtual Worlds Status: Continuing • Sponsors: IBM, Intel • Deliverables: • Prototype Multi-user Virtual World ongoing development • Major results: • Optimizing client-server operations • Integrating compute accelerators

  8. Scalable City: Massive Scale Virtual Worlds • Massively multiplayer continuous world • Hundreds of thousands of interactive objects • Large aggregate bandwidth requirements • Challenges/Issues • Optimization, feature development, workable across heterogeneous clients

  9. Goals • Scalability: support large environments, massively multi-player • Hybrid, multi-platform server: z10, x86, CellBE, Tesla accelerators • Performance: clients need to perform well on a range of desktop computer configurations

  10. Increasing complexity of objects and interactions with increasing world size, users, numbers of objects and types of interactions • Multiple 10 Gb interfaces to compute accelerators, storage clusters and compute cloud • Compute accelerators for asset transformation, physics and behaviors • Server system keeps track of world state • Server services are distributed across cloud clusters, and redistributed across clients as performance or local work necessitates • Coherency with the overall system is pursued, managed by a centralized server • Virtual world components have dynamic tolerance levels for incoherency and latency

  11. Development Server Framework (5/2010) • 4 QS20 blades: 8 Cell CPUs • 2 QS22 blades: 4 Cell CPUs • 8 HS22 blades: 16 Xeons, 96 cores • 4-way Xeon server: 32 cores • 3 10 Gb interfaces to compute accelerators • 1 10 Gb interface to the internet • Many clients • IBM z10 mainframe computer at San Diego Supercomputer Center: 2 IFLs with 128mb RAM, z/VM virtual OS manager with Linux guests • 6 TB fast local storage (15K disks) • 4 SR and 2 LR 10 Gb Ethernet interfaces • nVidia Tesla accelerator: 4 GPUs on Linux host, external dual PCI connection

  12. How do you program a distributed heterogeneous system? • Server manages various virtual world processes • Use compute accelerators for compute-intensive, parallelizable subsystems such as physics Two-phase approach: • Different systems for different underlying architectures return compatible results • Xeon blades run the x86-optimized Scalable Physics Engine • GPUs or novel architectures run the Bullet engine • Distribute heavy computational stages: • Collision detection on broad phase pair output • Constraint solving/integration on contact groups Long-term approach: OpenCL plan • Develop physics system using algorithms well suited to OpenCL parallelization • Applicable to both object collisions and deformation • Same code base for different hardware: host or server-side accelerators • Parallelization occurs throughout the physics pipeline • Linearly scalable to availability of hardware resources • Similar approach for other aspects of asset computation
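The slide distributes constraint solving and integration over contact groups. A minimal sketch of forming such groups, assuming a contact group is a connected component of the contact graph (a common definition; the slides do not spell it out). It uses a union-find structure; `DisjointSet` and `buildContactGroups` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <unordered_map>
#include <utility>
#include <vector>

// Disjoint-set forest: merges bodies that touch into the same group.
class DisjointSet {
public:
    explicit DisjointSet(std::size_t n) : parent_(n), rank_(n, 0) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }
    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }
    void unite(std::size_t a, std::size_t b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (rank_[a] < rank_[b]) std::swap(a, b);  // union by rank
        parent_[b] = a;
        if (rank_[a] == rank_[b]) ++rank_[a];
    }

private:
    std::vector<std::size_t> parent_;
    std::vector<unsigned> rank_;
};

// Groups bodies [0, numBodies) by the contact pairs reported by collision
// detection. Groups share no bodies, so each one could be solved and
// integrated on a different accelerator.
std::vector<std::vector<std::size_t>> buildContactGroups(
    std::size_t numBodies,
    const std::vector<std::pair<std::size_t, std::size_t>>& contacts) {
    DisjointSet sets(numBodies);
    for (const auto& [a, b] : contacts) {
        assert(a < numBodies && b < numBodies);
        sets.unite(a, b);
    }
    std::unordered_map<std::size_t, std::size_t> groupOfRoot;  // root -> group index
    std::vector<std::vector<std::size_t>> groups;
    for (std::size_t body = 0; body < numBodies; ++body) {
        auto [it, inserted] = groupOfRoot.try_emplace(sets.find(body), groups.size());
        if (inserted) groups.emplace_back();
        groups[it->second].push_back(body);
    }
    return groups;
}
```

For example, five bodies with contacts (0,1) and (2,4) yield the groups {0,1}, {2,4} and {3}.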

  13. Multi-user Environment: Performance Challenges • Server goals: 10,000 players on 1,000 cities • Communication: 14.2 GB/sec to clients • Physics: 200,000 active objects • Rendering: 10x particle system complexity
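A quick check of what the communication target implies per client, assuming decimal units (1 GB = 10^9 bytes) and an even split of the aggregate across all 10,000 players:

```latex
\frac{14.2\ \text{GB/s}}{10\,000\ \text{players}} = 1.42\ \text{MB/s per player} \approx 11.4\ \text{Mbit/s per player}
```

Roughly 11 Mbit/s sustained per client is why reducing per-client traffic matters as much as raw server throughput.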

  14. Communication • Fast & non-redundant data marshalling/archiving • “Player data-sharing” optimizations • Generating assets deterministically on client: removes need to communicate resources • Reduced client/server synchronization frequency: client-side interpolation • Further tweaks to reduce bandwidth: messages consolidated, compressed • Adaptability to client hardware
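Generating assets deterministically on the client means the server sends only a seed. A minimal sketch, assuming a hypothetical house asset made of box blocks. It relies on `std::mt19937`, whose output sequence the C++ standard fixes, but avoids `std::uniform_int_distribution`, whose algorithm varies between standard libraries. `BlockSpec`, `mapToRange` and `generateHouse` are illustrative names, not Scalable City code.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical procedural asset: a house assembled from box-shaped blocks.
struct BlockSpec {
    std::uint32_t width, depth, height;  // in grid units
};

// Maps a raw engine output onto [lo, hi] with integer arithmetic only, so every
// platform computes the same value. Plain modulo adds a slight bias, which is
// acceptable for this illustration.
std::uint32_t mapToRange(std::uint32_t raw, std::uint32_t lo, std::uint32_t hi) {
    assert(lo <= hi);
    std::uint64_t span = static_cast<std::uint64_t>(hi) - lo + 1;
    return lo + static_cast<std::uint32_t>(raw % span);
}

// Server and client both call this with the same seed, so only the seed has
// to cross the network instead of the generated geometry.
std::vector<BlockSpec> generateHouse(std::uint32_t seed) {
    std::mt19937 engine(seed);  // output sequence is fixed by the C++ standard
    const std::uint32_t blockCount = mapToRange(static_cast<std::uint32_t>(engine()), 3, 8);
    std::vector<BlockSpec> blocks;
    blocks.reserve(blockCount);
    for (std::uint32_t i = 0; i < blockCount; ++i) {
        BlockSpec b{};
        // Separate statements keep the order of engine() calls explicit.
        b.width = mapToRange(static_cast<std::uint32_t>(engine()), 1, 4);
        b.depth = mapToRange(static_cast<std::uint32_t>(engine()), 1, 4);
        b.height = mapToRange(static_cast<std::uint32_t>(engine()), 1, 3);
        blocks.push_back(b);
    }
    return blocks;
}
```

Floating point is kept out of the generation path on purpose. Float results can differ with compiler flags and hardware, which would break client/server agreement.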

  15. Heterogeneous Client Support • Client machine profiling • Processing power (CPU, GPU, # of cores) • Rendering performance • Networking latency/bandwidth • Dynamic fidelity adjustment • Graphics effects • Shadows, volumetric rendering, particle systems… • Planned: Compute/synchronization trade-off

  16. Future Work: Client-side Predictive Physics • Interpolation smooths movement until the server stalls • If server lag increases, there is nothing to interpolate to! • Inject a copy of server functionality into the client • Performs the same work on a subset of data for prediction • Server state may differ from the prediction • Client interpolates what the user sees during correction • Allows us to decrease synchronization latency much further • Update frequency adjustable based on client processing power/network
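The stall problem can be made concrete with a snapshot interpolator. This minimal sketch assumes the client renders a fixed delay behind its estimate of server time; `SnapshotInterpolator` and `renderDelay` are hypothetical names. When no snapshot newer than the render time has arrived, `sample` returns `std::nullopt`, which is the point where the predictive physics proposed above would have to take over.

```cpp
#include <cassert>
#include <deque>
#include <optional>

struct Vec3 {
    double x, y, z;
};

struct Snapshot {
    double serverTime;  // seconds
    Vec3 position;
};

Vec3 lerpVec(const Vec3& a, const Vec3& b, double t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
}

// Buffers server snapshots and renders slightly in the past, so there is
// usually a later snapshot to interpolate toward.
class SnapshotInterpolator {
public:
    explicit SnapshotInterpolator(double renderDelay) : renderDelay_(renderDelay) {}

    void push(const Snapshot& s) {
        assert(snapshots_.empty() || s.serverTime > snapshots_.back().serverTime);
        snapshots_.push_back(s);
    }

    // Interpolated position at (clientServerTime - renderDelay), or nullopt
    // when the snapshots no longer bracket that time (server stalled).
    std::optional<Vec3> sample(double clientServerTime) {
        const double renderTime = clientServerTime - renderDelay_;
        // Drop snapshots that can no longer serve as the earlier endpoint.
        while (snapshots_.size() >= 2 && snapshots_[1].serverTime < renderTime) {
            snapshots_.pop_front();
        }
        if (snapshots_.size() < 2 || snapshots_[0].serverTime > renderTime) {
            return std::nullopt;
        }
        const Snapshot& a = snapshots_[0];
        const Snapshot& b = snapshots_[1];
        const double t = (renderTime - a.serverTime) / (b.serverTime - a.serverTime);
        return lerpVec(a.position, b.position, t);
    }

private:
    double renderDelay_;
    std::deque<Snapshot> snapshots_;
};
```

A longer `renderDelay` tolerates more jitter but adds visible latency. Replacing the `nullopt` branch with local prediction lets that delay shrink.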

  17. Tool of Interest: Growth Tracker

  18. Stability • Complex software is subject to glitches • Scalable City is designed to run continuously • Some bugs don’t manifest immediately • Scalable City’s virtual memory footprint kept growing • Confirmed no memory leaks! • Tools exist to detect memory leaks quickly

  19. Stability • Higher-level “memory leak” problems: • Aggregate structures that persist across processing cycles can grow unbounded • Not a detectable problem to the system! • Scalable City uses a massive number of aggregates: • STL & Boost structures, strings, etc. • Data Structure Growth Tracking Tool • Override implementation of all aggregates!

  20. Growth Tracker • Growth tracker is a “singleton class” • One instance in each program (client/server) • Every instance tracked by singleton class • Registers upon creation • Every instance sampled periodically • Exponentially-increasing sample time • Size of each instance tracked over time • Algorithm detects problem based upon history
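The tracker described above can be sketched as a process-wide singleton. Aggregates register a size callback, the tracker samples them at exponentially increasing intervals, and a heuristic inspects each size history. The detection rule here (size grew at every one of the last few samples) is an assumption, since the slides do not give the actual algorithm. `registerAggregate`, `tick` and `suspects` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical growth tracker: one instance per process.
class GrowthTracker {
public:
    using Id = std::size_t;

    static GrowthTracker& instance() {
        static GrowthTracker tracker;  // created on first use
        return tracker;
    }
    GrowthTracker(const GrowthTracker&) = delete;
    GrowthTracker& operator=(const GrowthTracker&) = delete;

    // Called by each tracked aggregate on creation; sizeFn reports its size.
    Id registerAggregate(std::string label, std::function<std::size_t()> sizeFn) {
        assert(static_cast<bool>(sizeFn));
        Id id = nextId_++;
        entries_[id] = Entry{std::move(label), std::move(sizeFn), {}};
        return id;
    }

    // Called by each tracked aggregate on destruction.
    void unregisterAggregate(Id id) { entries_.erase(id); }

    // Call regularly from the main loop. A sample is taken only once the
    // deadline has passed; the gap to the next sample then doubles.
    void tick(double nowSeconds) {
        if (nowSeconds < nextSampleTime_) return;
        for (auto& kv : entries_) kv.second.history.push_back(kv.second.sizeFn());
        interval_ *= 2.0;
        nextSampleTime_ = nowSeconds + interval_;
    }

    // Assumed heuristic: flag aggregates whose size increased at every one
    // of the last `window` samples.
    std::vector<std::string> suspects(std::size_t window) const {
        std::vector<std::string> out;
        for (const auto& kv : entries_) {
            const std::vector<std::size_t>& h = kv.second.history;
            if (window == 0 || h.size() < window + 1) continue;
            bool growing = true;
            for (std::size_t i = h.size() - window; i < h.size(); ++i) {
                if (h[i] <= h[i - 1]) { growing = false; break; }
            }
            if (growing) out.push_back(kv.second.label);
        }
        return out;
    }

private:
    struct Entry {
        std::string label;
        std::function<std::size_t()> sizeFn;
        std::vector<std::size_t> history;  // size at each sample
    };

    GrowthTracker() = default;

    std::map<Id, Entry> entries_;
    Id nextId_ = 0;
    double interval_ = 1.0;        // doubled at each sample: gaps of 2, 4, 8, ... s
    double nextSampleTime_ = 0.0;  // first sample on the first tick
};
```

Exponential spacing keeps sampling overhead negligible during the 1-2 day runs while still covering the whole run. The cost is that short growth bursts late in the run are observed coarsely.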

  21. Growth Tracker • Practical implementation in a large project • Cannot modify data structure usage code! • Must be a self-contained solution • Must override all aggregates in all files • Requires advanced C++ features • Complex, but small implementation • Little data extractable: address, complete type • Remains installed: a single command turns it on

  22. Growth Tracker • Useful software tool • Generalizes to any long-running program • Requires running application for 1-2 days • No general way to detect unintended growth • Provides more useful output than a crash upon allocation! • This kind of problem can be impractical to find in large software without such a tool

  23. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Status: Continuing • Project Description: Digital media environments are increasingly authored by users while they interact with them. This means that components such as their media assets and their behaviors are under real-time control, rather than authored in advance. This presents computational challenges in ensuring ongoing real-time performance, and it also creates challenges in tracking assets across multiple types of instantiations. • Sponsors: IBM, Intel • Deliverables: • Improve dynamics and asset computation across virtual worlds and digital cinema

  24. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games: Physically Based Simulation in the Massively Multi-player Scalable City Environment using OpenCL

  25. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Review of work to date

  26. Scalable City: Physics Engine Evolution • Open Dynamics Engine (ODE) • Open source, convenient, good reputation • Augmented/replaced subsystems over time • Broad phase collision detection designed for large VR environments • Pipeline redesign for resting objects • Multi-threaded subsystems for higher activity • Only the core constraint solver remains ODE

  27. Pipeline Redesign • Overhead proportional to level of activity, rather than environment scale • Novel broad phase and pipeline methods

  28. Multithreaded Stages • Thread-parallelism: limited scale • Traditional physics methods allow limited parallelism

  29. New Physics Engine • New physics engine from scratch in C++ • Designed for massive parallelism • SIMD & massively threaded (via OpenCL) • Distributed Computing (MPI) • Unique design for OpenCL physics • “Advanced Character Physics”, Thomas Jakobsen

  30. Massively Parallel Physics • Physics atoms are particles & constraints • Objects represented as set of these atoms • Rigid Body Dynamics behavior is “emergent” • Soft bodies can also be modeled integrally
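Jakobsen's “Advanced Character Physics” builds bodies from particles advanced by position Verlet integration and linked by stick (distance) constraints that are relaxed iteratively. A minimal sketch of those two atoms follows, in the sequential relaxation form used in that paper. Each stick sees the corrections of the sticks before it, so this is not the independent-constraint variant the slides use for OpenCL. Type and function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 {
    double x = 0, y = 0, z = 0;
};
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

struct Particle {
    Vec3 pos, prevPos, accel;  // velocity is implicit: pos - prevPos
};

// Keeps particles a and b at restLength from each other.
struct Stick {
    std::size_t a, b;
    double restLength;
};

// Position Verlet step: x_next = 2x - x_prev + a * dt^2.
void verletIntegrate(std::vector<Particle>& ps, double dt) {
    for (Particle& p : ps) {
        Vec3 next = p.pos * 2.0 - p.prevPos + p.accel * (dt * dt);
        p.prevPos = p.pos;
        p.pos = next;
    }
}

// Relaxation: each stick moves both endpoints by half the length error along
// the stick; repeated passes let corrections propagate through the structure.
void satisfySticks(std::vector<Particle>& ps, const std::vector<Stick>& sticks, int iterations) {
    for (int it = 0; it < iterations; ++it) {
        for (const Stick& s : sticks) {
            assert(s.a < ps.size() && s.b < ps.size());
            Vec3 delta = ps[s.b].pos - ps[s.a].pos;
            double len = length(delta);
            if (len == 0.0) continue;  // direction undefined
            double diff = (len - s.restLength) / len;
            ps[s.a].pos = ps[s.a].pos + delta * (0.5 * diff);
            ps[s.b].pos = ps[s.b].pos - delta * (0.5 * diff);
        }
    }
}
```

A rigid body is then a set of particles held by enough sticks to make the shape stiff. Its rigid-body motion emerges from these two steps without explicit angular velocity or inertia tensors.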

  31. Advantages • Massive, simple, evenly divided computations • Collision detection and constraints operate on particles • All constraints are solved independently • Eliminates most OpenCL buffer transfers • No contact graph generation stage • Broad phase collision detection integrated with collision constraint solving [Diagram: traditional pipeline N-Body Coll. Det. → Contact Graph → Integration; new pipeline N-Body Coll. Det. → Integration]

  32. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Implementation Progress

  33. Progress: Last Meeting • What we had finished: • Particle system with Verlet integration • Heightmap constraint w/ interpolation, friction, bounce • Stick constraints • Rigid body construction from particles + sticks • Multi-pass relaxation solver • Object transform extraction from particles • Dynamic object insertion & removal during simulation
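The heightmap constraint listed above (interpolation, friction, bounce) might look like the following minimal sketch. Assumptions: a regular grid of terrain heights with y up, bilinear interpolation, and a ground normal treated as +y for simplicity. `friction` and `bounce` are hypothetical coefficients in [0, 1]. Because Verlet velocity is implicit (pos - prevPos), the response is applied by rewriting `prevPos`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 {
    double x = 0, y = 0, z = 0;
};

// Regular grid of terrain heights, `cell` units apart in x and z.
struct Heightmap {
    std::size_t cols, rows;  // samples along x and z
    double cell;
    std::vector<double> h;   // row-major: h[z * cols + x]

    double at(std::size_t ix, std::size_t iz) const { return h[iz * cols + ix]; }

    // Bilinear interpolation between the four surrounding samples.
    double heightAt(double x, double z) const {
        assert(cols >= 2 && rows >= 2 && h.size() == cols * rows);
        double fx = std::clamp(x / cell, 0.0, double(cols - 1));
        double fz = std::clamp(z / cell, 0.0, double(rows - 1));
        std::size_t ix = std::min(std::size_t(fx), cols - 2);
        std::size_t iz = std::min(std::size_t(fz), rows - 2);
        double tx = fx - double(ix), tz = fz - double(iz);
        double h0 = at(ix, iz) * (1 - tx) + at(ix + 1, iz) * tx;
        double h1 = at(ix, iz + 1) * (1 - tx) + at(ix + 1, iz + 1) * tx;
        return h0 * (1 - tz) + h1 * tz;
    }
};

// Projects a particle that fell below the terrain back onto it, damps its
// tangential velocity (friction) and reflects its vertical velocity (bounce).
void heightmapConstraint(Vec3& pos, Vec3& prevPos, const Heightmap& map,
                         double friction, double bounce) {
    double ground = map.heightAt(pos.x, pos.z);
    if (pos.y >= ground) return;
    double vx = pos.x - prevPos.x, vy = pos.y - prevPos.y, vz = pos.z - prevPos.z;
    pos.y = ground;
    vx *= (1.0 - friction);
    vz *= (1.0 - friction);
    vy = -vy * bounce;
    prevPos = {pos.x - vx, pos.y - vy, pos.z - vz};  // encodes the new velocity
}
```

Using the true surface normal from the heightmap gradient would be more accurate on slopes. The +y shortcut is only to keep the sketch short.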

  34. Progress: Last Meeting • What we were lacking: • Support for multiple object topologies in OpenCL • Efficient OpenCL transfers for object migration • Parallel OpenCL broad phase collision detection • Integration into Scalable City incremental physics • MPI layer for distributed processing

  35. Progress: Current • Additional Progress: • Support for multiple object topologies in OpenCL • Efficient OpenCL transfers for object migration • Parallel OpenCL broad phase collision detection • Integration into Scalable City incremental physics • MPI layer for distributed processing

  36. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Object Multi-topology Support

  37. Object Multi-Topologies • Requires indirection in accessing object info • Single buffer per format (sticks, particles, forces) • Mapping from each object to its ranges in each buffer in CL • Hole-tracking on host to re-use regions • Supports dynamic object insertion-removal • Exact fit replacement • Efficient for small # of discrete topologies
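Host-side hole tracking with exact-fit replacement can be sketched as a size-keyed free list for one device buffer. Removed ranges are kept as holes and handed out again only to an object needing exactly that many elements. `HoleTracker` and its methods are hypothetical names; one instance per buffer (particles, sticks, StickAccum) is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical allocator for ranges inside one device buffer.
class HoleTracker {
public:
    // Returns the element offset of a range of `count` elements.
    std::size_t allocate(std::size_t count) {
        assert(count > 0);
        auto it = holes_.find(count);
        if (it != holes_.end() && !it->second.empty()) {  // exact-fit hole available
            std::size_t offset = it->second.back();
            it->second.pop_back();
            return offset;
        }
        std::size_t offset = end_;  // otherwise append after the last range
        end_ += count;
        return offset;
    }

    // Marks [offset, offset + count) as a hole for later exact-fit re-use.
    void release(std::size_t offset, std::size_t count) {
        assert(offset + count <= end_);
        holes_[count].push_back(offset);
    }

    // Number of elements the device buffer must hold.
    std::size_t highWaterMark() const { return end_; }

private:
    std::map<std::size_t, std::vector<std::size_t>> holes_;  // size -> free offsets
    std::size_t end_ = 0;
};
```

With only a few discrete topologies, almost every hole matches some future insertion. With many distinct sizes, unmatched holes would accumulate as fragmentation, which is why the approach is efficient only for a small number of topologies.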

  38. Object Multi-Topologies [Diagram: OpenCL memory holds Objects (track allocations, object identity), Particles (positions, forces, mappings), Sticks (rest length, mappings) and StickAccum (calculation results, particle × stick); OpenCL stages: Collision Detection (filter self collisions), Produce Constraints, Average Constraints. Host memory: hole tracking for Particles, Sticks, StickAccum.]
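The “Produce Constraints” and “Average Constraints” stages suggest a Jacobi-style relaxation. Every stick computes its endpoint corrections from the same input positions, so sticks are independent. Each particle then averages the corrections it received. This minimal sketch shows that idea serially. A GPU version could not accumulate into `accum` this way without races; it would need per-(particle, stick) result slots (which the StickAccum buffer appears to provide) or atomics. That mapping is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 {
    double x = 0, y = 0, z = 0;
};

struct Stick {
    std::size_t a, b;
    double restLength;
};

// One averaged (Jacobi-style) relaxation pass over all sticks.
void averagedStickPass(std::vector<Vec3>& pos, const std::vector<Stick>& sticks) {
    std::vector<Vec3> accum(pos.size());
    std::vector<int> count(pos.size(), 0);

    // "Produce constraints": per-stick work reads only the input positions.
    for (const Stick& s : sticks) {
        assert(s.a < pos.size() && s.b < pos.size());
        Vec3 d{pos[s.b].x - pos[s.a].x, pos[s.b].y - pos[s.a].y, pos[s.b].z - pos[s.a].z};
        double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
        if (len == 0.0) continue;
        double k = 0.5 * (len - s.restLength) / len;  // half the error per endpoint
        accum[s.a].x += d.x * k; accum[s.a].y += d.y * k; accum[s.a].z += d.z * k;
        accum[s.b].x -= d.x * k; accum[s.b].y -= d.y * k; accum[s.b].z -= d.z * k;
        ++count[s.a];
        ++count[s.b];
    }

    // "Average constraints": per-particle work applies the mean correction.
    for (std::size_t i = 0; i < pos.size(); ++i) {
        if (count[i] == 0) continue;
        pos[i].x += accum[i].x / count[i];
        pos[i].y += accum[i].y / count[i];
        pos[i].z += accum[i].z / count[i];
    }
}
```

Averaging converges more slowly per pass than sequential relaxation, but every stick and every particle is an independent unit of work. That trade suits massively threaded OpenCL devices.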

  39. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Efficient OpenCL Communication

  40. OpenCL Communication • Most communication has been eliminated • All operations performed in CL • Updating multiple buffers required for insertion • Supports distributed & incremental systems • Object insertion requires small blits to 9 buffers • Multiple insertions will be non-contiguous • Extremely slow when CL is mapped to GPU devices!

  41. Transfer Optimizations • Must consolidate multi-buffer writes • One buffer contains data & destination metadata • A single, contiguous transfer to CL device • Host-directed Transfers • Multiple asynchronous clEnqueueCopyBuffer() • Kernel-directed Transfers • Kernel execution performs all transfers on card
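Consolidating multi-buffer writes means packing every small update, together with its destination metadata, into one contiguous buffer. This minimal host-side sketch assumes a layout of 32-bit words `[dstBuffer, dstOffset, count, payload...]` repeated. `scatterUpdates` stands in for the kernel that would unpack the stream on the device. The packed words could then be uploaded with a single `clEnqueueWriteBuffer` call. `UpdatePacker` and the buffer numbering are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Packs many small, non-contiguous updates into ONE contiguous word stream.
class UpdatePacker {
public:
    void add(std::uint32_t dstBuffer, std::uint32_t dstOffset, const std::vector<float>& values) {
        words_.push_back(dstBuffer);
        words_.push_back(dstOffset);
        words_.push_back(static_cast<std::uint32_t>(values.size()));
        for (float v : values) {
            std::uint32_t bits;
            std::memcpy(&bits, &v, sizeof bits);  // keep the float's exact bit pattern
            words_.push_back(bits);
        }
    }
    const std::vector<std::uint32_t>& words() const { return words_; }

private:
    std::vector<std::uint32_t> words_;
};

// Host-side stand-in for the device kernel: walks the packed stream and
// scatters every payload into its destination buffer.
void scatterUpdates(const std::vector<std::uint32_t>& words,
                    std::vector<std::vector<float>>& buffers) {
    std::size_t i = 0;
    while (i < words.size()) {
        assert(i + 3 <= words.size());
        const std::uint32_t dst = words[i], offset = words[i + 1], count = words[i + 2];
        i += 3;
        assert(dst < buffers.size());
        assert(static_cast<std::size_t>(offset) + count <= buffers[dst].size());
        assert(i + count <= words.size());
        for (std::uint32_t k = 0; k < count; ++k) {
            float v;
            std::memcpy(&v, &words[i + k], sizeof v);
            buffers[dst][offset + k] = v;
        }
        i += count;
    }
}
```

This serial walk is only illustrative. A parallel kernel would also need an index of entry start positions so that work-items can locate their entries, which is where the load-balancing and metadata-overhead items on slide 44 come in.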

  42. Transfer Optimizations Both methods much preferable to naïve buffer updates!

  43. Transfer Optimizations Kernel-driven buffer updates 40% faster in test case

  44. OpenCL Communication • Future Kernel-Driven Optimizations • Better load balancing • Better control of transfer size for each entry • Lower space overhead • Reduce meta-data overhead • Reduces time to transfer single buffer of updates • Ordering entries by destination buffer is a good start

  45. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Parallel OpenCL Broad Phase Collision Detection

  46. OpenCL Broad Phase • Lack of an OpenCL broad phase (BP) is now the largest overhead • Communication, nonparallel execution • 30-50% of execution time • Learning from well-engineered examples • nVidia: OpenCL Particle Collision Simulation • Hash grid: high performance, feature poor
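The hash-grid approach used by GPU particle samples works in four steps: compute each object's cell, sort (cell, object) pairs by cell, record where each cell's run starts, and test each object only against the 27 surrounding cells. This minimal serial sketch uses a bounded, non-wrapping grid and assumes every object lies inside `[0, dim * cellSize)^3`. It also requires `cellSize` to be at least the largest object diameter. That requirement is the “feature poor” limitation raised for Scalable City's mixed object sizes. `gridBroadPhase` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Sphere {
    float x, y, z, r;
};

// Returns sorted pairs (i, j), i < j, whose bounding spheres overlap.
std::vector<std::pair<int, int>> gridBroadPhase(const std::vector<Sphere>& objs,
                                                float cellSize, int dim) {
    auto coord = [&](float v) {
        int c = static_cast<int>(std::floor(v / cellSize));
        assert(c >= 0 && c < dim);  // object must lie inside the grid
        return c;
    };
    auto cellId = [&](int cx, int cy, int cz) { return (cz * dim + cy) * dim + cx; };
    const int n = static_cast<int>(objs.size());

    // (cell id, object index), sorted so each cell's objects are contiguous.
    std::vector<std::pair<int, int>> keyed;
    keyed.reserve(objs.size());
    for (int i = 0; i < n; ++i)
        keyed.emplace_back(cellId(coord(objs[i].x), coord(objs[i].y), coord(objs[i].z)), i);
    std::sort(keyed.begin(), keyed.end());

    // cellStart[c] = first index into `keyed` for cell c, or -1 if empty.
    std::vector<int> cellStart(static_cast<std::size_t>(dim) * dim * dim, -1);
    for (int k = n - 1; k >= 0; --k) cellStart[keyed[k].first] = k;

    std::vector<std::pair<int, int>> pairs;
    for (int i = 0; i < n; ++i) {  // independent per object: one work-item each
        const Sphere& a = objs[i];
        const int cx = coord(a.x), cy = coord(a.y), cz = coord(a.z);
        for (int oz = -1; oz <= 1; ++oz)
            for (int oy = -1; oy <= 1; ++oy)
                for (int ox = -1; ox <= 1; ++ox) {
                    int nx = cx + ox, ny = cy + oy, nz = cz + oz;
                    if (nx < 0 || ny < 0 || nz < 0 || nx >= dim || ny >= dim || nz >= dim) continue;
                    int c = cellId(nx, ny, nz);
                    for (int k = cellStart[c]; k >= 0 && k < n && keyed[k].first == c; ++k) {
                        int j = keyed[k].second;
                        if (j <= i) continue;  // report each pair once
                        const Sphere& b = objs[j];
                        float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
                        float rr = a.r + b.r;
                        if (dx * dx + dy * dy + dz * dz <= rr * rr) pairs.emplace_back(i, j);
                    }
                }
    }
    std::sort(pairs.begin(), pairs.end());
    return pairs;
}
```

On a GPU, the sort would be a parallel radix sort and `cellStart` would be filled by comparing each sorted entry with its neighbor. The per-object neighbor search is unchanged.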

  47. OpenCL Broad Phase • Grid limitation: cell size based on object size • Scalable City uses vastly different object sizes • House pieces, cyclones, lot entities, gravity fields • Not an “incremental algorithm” • Almost impossible in OpenCL: give up • Some acceleration from temporal coherence? • Sorting strategy

  48. Parallel Sweep & Prune • Utilizes intervals for some size variation • Sort & implicitly subdivide along one axis • Sort partial buffer along another axis • Parallel second pass detects overlaps
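Sweep and prune handles size variation because it works on intervals rather than fixed cells. This minimal sketch shows the single-axis form: sort boxes by minimum x, then let each box scan forward only while later boxes can still overlap it in x. The slide's version additionally subdivides and sorts along a second axis, which is not shown. Each outer iteration reads shared sorted data and produces its own results, so it corresponds to the parallel second pass. Names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

struct AABB {
    float lo[3], hi[3];  // min and max corners
};

bool overlaps(const AABB& a, const AABB& b) {
    for (int k = 0; k < 3; ++k)
        if (a.hi[k] < b.lo[k] || b.hi[k] < a.lo[k]) return false;
    return true;
}

// Returns sorted index pairs (i, j), i < j, of overlapping boxes.
std::vector<std::pair<std::size_t, std::size_t>> sweepAndPrune(const std::vector<AABB>& boxes) {
    std::vector<std::size_t> order(boxes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return boxes[a].lo[0] < boxes[b].lo[0];
    });

    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < order.size(); ++i) {  // independent: one work-item each
        const AABB& a = boxes[order[i]];
        assert(a.lo[0] <= a.hi[0]);
        for (std::size_t j = i + 1; j < order.size(); ++j) {
            const AABB& b = boxes[order[j]];
            if (b.lo[0] > a.hi[0]) break;  // sorted by min x: nothing later can overlap
            if (overlaps(a, b))
                pairs.emplace_back(std::min(order[i], order[j]), std::max(order[i], order[j]));
        }
    }
    std::sort(pairs.begin(), pairs.end());
    return pairs;
}
```

The forward scan length depends on how many intervals overlap along the sort axis. Clustering along that axis, for example many objects at similar x, degrades it, which is one motivation for the second sort axis.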

  49. Space Filling Curve Sort • Modified Morton numbers provide spatial locality order in one sort pass • Conservative interval calculation eliminates false negatives
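The slides do not say how their Morton numbers are modified, so this sketch shows only the standard Morton (Z-order) code. Each coordinate is quantized to 10 bits and the bits are interleaved, so sorting by the resulting 30-bit key places spatially nearby objects near each other in one sort pass. It uses the widely used multiply-and-mask bit-spreading method. Normalizing positions to [0, 1] is assumed to happen beforehand, and the conservative interval calculation is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Spreads the low 10 bits of v so that bit i moves to bit 3*i.
std::uint32_t expandBits(std::uint32_t v) {
    assert(v < 1024u);
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton code for a point whose coordinates are normalized to [0, 1]
// (assumption: the caller divides by the world bounds).
std::uint32_t morton3D(float x, float y, float z) {
    auto quantize = [](float v) {
        float scaled = std::min(std::max(v * 1024.0f, 0.0f), 1023.0f);
        return static_cast<std::uint32_t>(scaled);
    };
    return (expandBits(quantize(x)) << 2) | (expandBits(quantize(y)) << 1) | expandBits(quantize(z));
}
```

Sorting objects by `morton3D` of their centers gives the locality order. Overlap tests then only need to consider ranges of neighboring keys.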

  50. Hash Grid with Queries • Majority of objects have similar size • House pieces: mapped to grid for n-body • Medium size objects queried against range of grid cells in separate kernel • Lots, cyclones collide with house pieces, but not with each other • Unit of parallelism improved to Object/Cell pair • Gravity fields done in different subsystem
