
Bandwidth Reduction and Latency Tolerance in Real Time Internet Applications


Presentation Transcript


  1. Multi-user Extensible Virtual Worlds Bandwidth Reduction and Latency Tolerance in Real Time Internet Applications

  2. Communication Optimizations • Fast archiving process • Archives cached and shared with multiple clients • Reduces overhead for many clients in the same city • Client-side transform interpolation • Allows reduced transmission frequency • Work offloaded to the client • Generating deterministic assets during city load • Performing mesh animation locally • Secure handshake authentication on connect

  3. Previous Attempts • Combining common-practice methods • Low-level optimizations • Caching and re-use • Interpolation • Prediction and smooth correction

  4. Client-side Interpolation • Allows server to send updates less frequently • Maintains a smooth experience • [Diagram: positions from server vs. positions drawn to screen]
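
A minimal sketch of the interpolation described above (struct and function names are hypothetical, not the project's code): the client blends between the last two server snapshots based on render time, clamping rather than extrapolating so a late update cannot overshoot.

```cpp
#include <array>
#include <cassert>

// One authoritative position sample received from the server.
struct Snapshot {
    double time;                 // server timestamp (seconds)
    std::array<float, 3> pos;    // object position
};

// Blend between two snapshots at render time t, clamped to [a, b].
std::array<float, 3> interpolate(const Snapshot& a, const Snapshot& b, double t) {
    double span = b.time - a.time;
    double alpha = span > 0.0 ? (t - a.time) / span : 1.0;
    if (alpha < 0.0) alpha = 0.0;
    if (alpha > 1.0) alpha = 1.0;   // never extrapolate past the newest snapshot
    std::array<float, 3> out;
    for (int i = 0; i < 3; ++i)
        out[i] = static_cast<float>(a.pos[i] + alpha * (b.pos[i] - a.pos[i]));
    return out;
}
```

The clamp is what makes the scheme stall-sensitive: once the render clock passes the newest snapshot there is nothing to interpolate toward, which motivates the prediction work in the later slides.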

  5. Primary Goals • Latency tolerance beyond state of the art • Reduce bandwidth beyond state of the art

  6. Previous Attempts • Not good enough for our scale • Massive activity imposes bandwidth and synchronization burden beyond the norm • Need server to be further “forward in time” for higher latency tolerance • Need to find a way to lower bandwidth further • Simply reduce rate of update frames further?

  7. Original Server Injection Model of Prediction • Interpolation smooths movement until server stalls • Client interpolates based upon expected arrival time • If server increases lag, there is nothing to interpolate to! • Inject copy of Server functionality into Client • Performs same work on subset of data for prediction • Server state may differ from prediction • Client interpolates what user sees during correction

  8. Failure of Simple Injection Model • Small differences produce large changes • Collision events, House construct selection • Some states have large prediction failure consequences • Corrections become as dramatic as having allowed video to “stutter” • Does not allow us to tolerate severe internet latency in practice

  9. Accomplished Work • Established persistent server on the IBM z10 enterprise server running 24/7 • Lowered power consumption during low utilization • Characterized latency across typical wired and wireless networks

  10. Latency Characteristics • Large latency due to software pipelines • Exacerbated by present interpolation system • Intermittent network latency can triple the shown value • Some data gathered from the Internet Weather Map project

  11. Accomplished Work • Tested server-in-client injection model for latency tolerance • Results indicate infeasible approach to increase time lag between server and client • Designed new synchronization architecture to accomplish both latency tolerance and bandwidth reduction • Presently being implemented

  12. New Synchronization Architecture • Synchronization is a common problem • All multiplayer systems • Application-specific issues dominate problem • Generalized solution for wider applicability • Can be applied to many client-server systems • Unified design addresses both bandwidth and latency problems in real-time distributed applications

  13. New Synchronization Architecture • States synchronized as separate streams • Server at a different virtual time in all streams! • Each stream can run at a different clock • Client performs prediction and correction only for streams the clock has overrun • Partial-knowledge prediction • Each stream can be abstracted differently • Values, predictor inputs for values, etc.
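
The per-stream idea above can be sketched as follows (a hypothetical illustration, not the actual architecture's code): each stream carries its own virtual time, and the client runs its predictor only when the local clock has overrun the stream's last authoritative update.

```cpp
#include <cassert>
#include <functional>

// One synchronized state stream with its own virtual time.
struct Stream {
    double serverTime = 0.0;   // virtual time of the last authoritative update
    double value = 0.0;        // last authoritative value
    // Stream-specific predictor: (last value, time since update) -> predicted value.
    std::function<double(double, double)> predict;
};

// Sample a stream at the client's virtual time. Prediction is only
// needed for streams the clock has overrun; otherwise the server
// value is still authoritative.
double sample(const Stream& s, double clientTime) {
    double dt = clientTime - s.serverTime;
    return dt > 0.0 ? s.predict(s.value, dt) : s.value;
}
```

Because each stream supplies its own `predict` function, different kinds of state (transforms, selections, player input) can be abstracted and predicted differently, as the slide suggests.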

  14. New Synchronization Architecture • Decouples problems of each kind of state • Each stream poses different trade-offs • Taxonomy of properties aids application • Exposes necessary model modifications

  15. New Synchronization Architecture • State properties • Predictability (level of determinism) • How can it be computed locally? • Computational dependencies • What other states are required to compute locally? • Error magnification effect • How do errors in my computational dependencies magnify errors in locally computed state?

  16. New Synchronization Architecture • Events given in terms of a virtual clock • Precise clock synchronization is impossible • Prevents errors from propagating forward • Virtual clock time is adjustable • Accuracy implies all clients view environment accurately at some (relatively close) point in time

  17. Example: Object Transforms • Moving objects affected by forces • Assume no other effects for now • Physical processes accurately predictable • Architecture & executable code may differ • Floating-point drift must still be compensated for • Intermittent object transform updates required • Vastly improves latency & bandwidth • Only a potential due to real dependencies

  18. Complication: Collisions • Also predictable, but errors magnify • Dissimilar collision computations can drastically affect future object transforms • Server notifies clients: collisions & misses • Clients are notified of “close calls” that miss! • Server exists forward in time from clients • Example of predictable computational dependency with error magnification

  19. Animation: Buildings select and grab objects • Objects “pulled into place” in structure • Current selection criteria based upon proximity to building • Coupled to transforms: circular dependency! • Initially developed for selection efficiency • Not imperative, can be modified • Errors have largest magnification yet! • Objects change subsystems, alter paths

  20. Animation: Buildings select and grab objects • Push selection further forward in time • Server notifies clients of selections • Reduces client need to predict selection • Selection criteria modified • Allows server to compute future more easily • Example of untenable error magnification • Simulation model altered to accommodate!

  21. Last Example: Player Location • Player cyclone applies forces to objects • Creates dependency with object transforms • Problem: Clients are not “forward in time” • Player location not well predictable • Solution: loosely couple wind forces to the player’s visual representation • Make physics manifestation more predictable

  22. Last Example: Player Location • Common position predictor function utilized by server and all clients • Deterministic function defines position of physical manifestation • Function uses intermittent player state as input • Periodic player velocity update from the client is enough to maintain it
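
A minimal sketch of such a shared predictor (names and the linear model are illustrative assumptions): because server and clients evaluate the same deterministic function of the last intermittent player update, they agree on the physical manifestation's position without per-frame traffic.

```cpp
#include <array>
#include <cassert>

// Intermittent player update: the only state that must be transmitted.
struct PlayerState {
    double time;                    // virtual time of the update
    std::array<double, 3> pos;      // position at that time
    std::array<double, 3> vel;      // velocity at that time
};

// Deterministic predictor shared by server and all clients:
// identical inputs yield identical positions on every machine.
std::array<double, 3> predictPosition(const PlayerState& s, double t) {
    double dt = t - s.time;
    std::array<double, 3> out;
    for (int i = 0; i < 3; ++i)
        out[i] = s.pos[i] + s.vel[i] * dt;
    return out;
}
```

When a fresh update arrives, everyone swaps in the new `PlayerState`; between updates, position traffic is reduced to the occasional velocity message, as the slide describes.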

  23. Last Example: Player Location • Still relatively low-latency: prediction failure results in compensation • Compensation should be soft (low error magnification) • Example of modified simulation with low error magnification • Reduces player position traffic by abstracting to velocity functions

  24. Synchronization Examples

  25. Where Are We On This? • Design complete but undergoing iteration • Implementation underway • No drift in tests using a Verlet integration kernel • Most work yet to be done • Generalizations being developed • Applies to real-time internet applications
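
For reference, a position-Verlet step in the style of the integration kernel mentioned above (structure and names are illustrative, not the project's kernel): velocity is implicit in the difference between the current and previous positions, which keeps the update deterministic given identical inputs.

```cpp
#include <array>
#include <cassert>

// A particle integrated with position Verlet: velocity is implicit
// as (pos - prevPos), so state is just two positions.
struct Particle {
    std::array<double, 3> pos;
    std::array<double, 3> prevPos;
};

// One Verlet step: x(t+dt) = 2x(t) - x(t-dt) + a*dt^2.
void verletStep(Particle& p, const std::array<double, 3>& accel, double dt) {
    for (int i = 0; i < 3; ++i) {
        double next = 2.0 * p.pos[i] - p.prevPos[i] + accel[i] * dt * dt;
        p.prevPos[i] = p.pos[i];
        p.pos[i] = next;
    }
}
```

Since the step is a fixed arithmetic expression over the stored state, two machines stepping the same particle with the same inputs stay bit-identical, which is the property the "no drift" test exercises.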

  26. Summary • Developing revolutionary methods for synchronization in internet applications to: • Reduce bandwidth requirements • Increase latency tolerance • Involves independent state streams that: • Are synchronized and predicted with different methods according to their properties • Expose necessary model modifications

  27. Multi-user Extensible Virtual Worlds End of Communication Talk Questions?

  28. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Combining Incremental and Parallel Methods for Large-scale Physics Simulation

  29. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Review of work to date

  30. ScalableEngine • Built to handle large VR environments efficiently (massive object count, low activity) • Only physics system capable of handling Scalable City in real time • Overhead proportional to level of activity rather than environment scale or object count • Novel broad phase¹ and physics pipeline² methods published • ¹ Efficient Large-Scale Sweep and Prune Methods with AABB Insertion and Removal. IEEE VR 2009 • ² Accelerating Physics in Large, Continuous Virtual Environments. Concurrency and Computation: Practice and Experience, 24(2):125–134, 2012
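
For orientation, here is a minimal 1D sweep-and-prune sketch (illustrative only, not the published ScalableEngine method, which adds incremental AABB insertion/removal): endpoints are sorted along one axis, and only intervals that overlap along that axis become broad-phase candidate pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// One object's extent projected onto a single axis.
struct Interval { int id; double lo, hi; };

// Sort by lower endpoint, then sweep: an interval can only overlap
// intervals whose lower endpoint precedes its upper endpoint.
std::vector<std::pair<int, int>> sweepAndPrune(std::vector<Interval> boxes) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Interval& a, const Interval& b) { return a.lo < b.lo; });
    std::vector<std::pair<int, int>> pairs;
    for (size_t i = 0; i < boxes.size(); ++i)
        for (size_t j = i + 1; j < boxes.size() && boxes[j].lo <= boxes[i].hi; ++j)
            pairs.emplace_back(boxes[i].id, boxes[j].id);
    return pairs;
}
```

The full-sort version shown here is what the incremental variants improve on: when most objects are inactive, the sorted order barely changes between frames, so overhead can track activity rather than total object count.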

  31. ScalableEngine: Broad Phase • [Chart: collision detection cost vs. level of activity] • Lower asymptotic complexity: order of magnitude performance improvement!

  32. ScalableEngine: Full Physics Pipeline • Note: only a constant number of bodies undergoing active physics computation • Excluding unnecessary work: again lower asymptotic complexity

  33. ScalableEngine: Multi-user System • Scalable City developed into massively multi-user client-server system • Player count increases activity level • For multi-user, other factors matter as well • Computational efficiency • Parallelism

  34. ScalableEngine • Best engine at handling large environments • Heavy computation similar to other software • As activity increases, advantage matters less • Regions with high activity see less benefit! • Parallelized ScalableEngine by multi-threading all aspects of computation • Improved performance, but not enough for massively multiplayer • Traditional physics does not parallelize well

  35. ScalableEngine: Multithreaded Physics • [Chart: limited parallelism in traditional physics methods]

  36. CLEngine • Developed new physics simulation system from scratch focused on massive parallelism • Based on the work of Thomas Jakobsen¹ • Design modified for parallel application • OpenCL utilized for portability to various compute devices (CPU, GPU, Accelerators)

  37. CLEngine: Core • Object representation broken down to particles and stick constraints • Rigid body volume behavior is emergent • All constraints independently solvable • Very fine-grained, highly parallel core
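
A minimal Jakobsen-style stick constraint in plain C++ for illustration (the actual CLEngine core runs as OpenCL kernels; names here are hypothetical): each constraint moves only its own two particles toward a rest length, which is why the core is so fine-grained and parallel.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// A point mass; rigid-body volume behavior emerges from networks of these.
struct P { std::array<double, 2> pos; };

// Enforce a stick constraint: push/pull the pair to restore restLength,
// splitting the correction evenly between the two particles.
void satisfyStick(P& a, P& b, double restLength) {
    double dx = b.pos[0] - a.pos[0];
    double dy = b.pos[1] - a.pos[1];
    double dist = std::sqrt(dx * dx + dy * dy);
    if (dist == 0.0) return;                        // degenerate pair: skip
    double corr = 0.5 * (dist - restLength) / dist; // signed half-correction
    a.pos[0] += dx * corr;  a.pos[1] += dy * corr;
    b.pos[0] -= dx * corr;  b.pos[1] -= dy * corr;
}
```

Because each constraint touches only its own pair of particles, batches of non-adjacent constraints can be solved independently, mapping naturally onto one work-item per constraint on a GPU.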

  38. CLEngine: Host Interface • OpenCL weakness: expensive communication on dedicated GPUs • Designed to reduce communication by • Keeping many contiguous stages on the card • Accelerating communication with a transport kernel • Reducing communication to state deltas • [Diagram: on-device pipeline of collision detection, contact graph, and integration stages]

  39. CLEngine: Performance • 3–6 times single-thread CPU performance • Higher parallelism acceleration curve • Many optimizations still not done! • GPU targetable for extreme performance • Optimizations are more critical • Communication, local memory, vector types

  40. CLEngine Prototype Limitations • Designed for an “all active” system • Not state-aware, no incremental processing • Does more total work than current CPU engine • We want both advantages simultaneously! • Multiple ways to achieve this • Challenges imposed by slow communication • Integrating a broad phase solution efficiently • Reporting results usefully and efficiently

  41. Work Finished, Pt 1 • Made CLEngine & Testing Framework portable to OpenCL v1.1 systems generally • Tested on IBM/PowerPC, Ubuntu, Windows, OS X Intel • Built high-level services on CLEngine core • Allows it to be interfaced with like a traditional physics engine • Ported Scalable City Server to VS2010 for OpenCL • Tools were 7 years old, and OpenCL vendors didn’t support them • Integrated CLEngine as run-time option for S.C. • Not operational due to incomplete interfacing options

  42. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games Broad Phase Integration: Where we’re going and why

  43. CLEngine Broad Phase: Options Previously Discussed • Hash Grid • Query stage for medium size (Lots, Cyclones) • Multi-sort Sweep & Prune • Single solution for small-medium • Better performance for object clustering? • Space-filling curves • Reduce S&P sorts from two to one! • All cases: Host must deal with large objects

  44. Work Finished, Pt 2 • Space-filling curves with S&P implemented • Morton: massive false positives • Hilbert: high false positives & false negatives • Space-filling curves generally too inaccurate • Multi-sort S&P has similar limitations to Grid • Parallel last pass inefficient without similar object sizes • Ideally precisely the same object size for symmetry • Traversal must stop based on largest object size • Load balancing also affected by clustering
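
For context on the Morton result above, here is the standard 2D Morton encoding by bit interleaving (a generic textbook construction, not the project's implementation): interleaving the bits of x and y gives nearby cells mostly nearby codes, so a single sort on the code can replace the two per-axis sorts of sweep-and-prune, at the cost of false positives where the curve jumps across space.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low 16 bits of x so they occupy the even bit positions.
uint32_t part1By1(uint32_t x) {
    x &= 0x0000FFFF;
    x = (x | (x << 8)) & 0x00FF00FF;
    x = (x | (x << 4)) & 0x0F0F0F0F;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    return x;
}

// 2D Morton code: x bits in even positions, y bits in odd positions.
uint32_t morton2D(uint32_t x, uint32_t y) {
    return part1By1(x) | (part1By1(y) << 1);
}
```

Cells that are adjacent in space but on opposite sides of a curve discontinuity get distant codes (and vice versa), which is the inaccuracy that produced the massive false positives reported above.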

  45. CLEngine Option 2: Broad Phase on Host • More flexible, generally performant • Handles all object sizes well • Thread parallel, incremental processing • CLEngine sees only relevant object subset • Active objects & objects overlapping them in B.P. • Maintained by communicating deltas • CLEngine core is simpler: process & report all • Focus optimization on communication

  46. Work Finished, Pt 3: Host Broad Phase Design • Designed Host Broad Phase System • Communication manager being implemented • BP and state system used to consolidate deltas • BP optimized for thread-parallel high activity • Doubled performance under these conditions • Free-threaded interfaces for all operations

  47. Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games End of Physics Talk Questions?

  48. Diagnosing Unbounded Heap Growth in C++ • Project Description: Improper memory management in software can cause performance to decline and eventual program failure. These problems can be difficult to detect during testing due to the unpredictable amount of time it can take to exhibit overt symptoms, and those symptoms may appear unrelated to memory management. The purpose of this research project is to identify causes of unbounded heap growth in C++ software beyond traditional memory leaks. • Major Accomplishments: • Heuristic perfected to yield low false positives/negatives with continuously improving accuracy over time • Identified memory problems in Google Chrome, WebKit, Ogre3D • Fixed growing data structures in Chrome and Ogre3D

  49. Diagnosing Unbounded Heap Growth in C++ Review from last meeting

  50. Diagnosing Unbounded Heap Growth in C++: Motivation • Scalable City needs to run continuously • Many months without intervention/access • Had slow memory growth • Leading to a crash after several weeks • Available analysis tools reported no leaks! • Software frees all memory correctly! • Different kind of undetected memory issue
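
An illustrative example of the phenomenon (hypothetical code, not from Scalable City): a container that is destroyed correctly at shutdown, so leak detectors report nothing, yet whose contents are never trimmed while the program runs, so the heap grows without bound.

```cpp
#include <cassert>
#include <vector>

// A log buffer that only ever grows. Its destructor frees everything,
// so tools that check for unfreed memory at exit see no leak, but a
// long-running server accumulates entries until it exhausts memory.
struct EventLog {
    std::vector<int> events;
    void record(int e) { events.push_back(e); }  // appended, never trimmed
};
```

Traditional leak detection asks "was every allocation freed?"; diagnosing this class of problem instead requires asking whether live heap usage is bounded over time, which is the gap the project's heuristic targets.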
