Async Workgroup Update


Presentation Transcript


  1. Async Workgroup Update Barthold Lichtenbelt

  2. Goals
     • Provide a synchronization framework for OpenGL
     • Provide base functionality as defined in NV_fence and GL2_async_core
     • Build a framework for future, more complex functionality, some of which is discussed in GL2_async_core
     • Initially support CPU <-> GPU synchronization
     • Support synchronization across multiple OpenGL contexts
     • Resulted in the GL_ARB_sync spec
       • Finished April 2006
       • Draft posted to opengl.org for feedback
       • Not quite an official ARB extension yet

  3. Functionality overview
     • ARB_sync provides synchronization primitives
       • Can be tested, set and waited upon
       • Specifically, a “Fence Synchronization Object” and corresponding Fence command
     • Fence completion allows for partial glFinish
       • All commands prior to the fence are forced to complete before control is returned to caller
     • Fence Sync Objects can be shared across contexts
       • Allows for synchronization of OpenGL command streams across contexts
     • New data type: GLtime represents intervals in nanoseconds
       • 64 bit integer, same encoding as UST counter in OpenML
       • Accuracy implementation dependent, precision in nanoseconds
     If you have used the Windows Event model, this will feel familiar.
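To make the GLtime description concrete, here is a minimal C sketch of a nanosecond interval held in a 64-bit integer. The names `gl_time_ns`, `NS_PER_MS` and `ms_to_ns` are hypothetical stand-ins and do not come from the draft spec, and the choice of a signed type is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for GLtime: an interval in nanoseconds held in a
 * 64-bit integer. Signedness is an assumption of this sketch. */
typedef int64_t gl_time_ns;

#define NS_PER_MS INT64_C(1000000)

/* Convert whole milliseconds to a nanosecond interval. */
static gl_time_ns ms_to_ns(int64_t ms)
{
    /* Reject negative intervals and values that would overflow 64 bits. */
    assert(ms >= 0 && ms <= INT64_MAX / NS_PER_MS);
    return ms * NS_PER_MS;
}
```

A signed 64-bit nanosecond count spans roughly 292 years, so overflow matters only for conversions from much coarser units, which is what the assertion guards against.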

  4. Synchronization model in ARB_sync 1/2
     • A “sync object” is a primitive used for synchronization between CPU and GPU, CPU, or ‘something else’
       • A sync object has state: type, condition, status
       • A sync object’s status can be signaled or non-signaled
       • When created, the status is signaled unless a flag is set, in which case it is non-signaled
     • A “fence sync object” is a specific type of sync object
       • Provides partial finish semantics
       • Only type of sync object currently defined
     • A “fence” is a token inserted in the GL command stream
       • A sync object is not inserted into the command stream
       • A fence has no state
     • A fence is associated with a fence sync object
       • Multiple fences can be associated with the same sync object
       • When a fence is inserted in the command stream, the status of its sync object is set to non-signaled
       • A fence, once completed, will set the status of its sync object to signaled

  5. Synchronization model in ARB_sync 2/2
     • A wait function waits on a sync object, not on a fence
     • A poll function polls a sync object, not a fence
     • A wait function called on a sync object in the non-signaled state will block. It unblocks when the sync object transitions to the signaled state.
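The state transitions described on these two slides can be sketched as a toy model in C. This is only an illustration under stated assumptions: `sync_object`, `command_stream`, `submit_work`, `insert_fence`, `gpu_step` and `poll_sync` are hypothetical names, the "GPU" is a loop that completes one queued command per step, and nothing here is the draft API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names come from the draft spec. */
typedef struct { bool signaled; } sync_object;

#define MAX_CMDS 16

/* A command-stream entry: ordinary work (fence_target == NULL) or a fence
 * token. The fence holds no state beyond its association with a sync object. */
typedef struct { sync_object *fence_target; } command;

typedef struct {
    command cmds[MAX_CMDS];
    size_t head;  /* next command the simulated GPU will complete */
    size_t tail;  /* next free slot */
} command_stream;

static void submit_work(command_stream *cs)
{
    assert(cs->tail < MAX_CMDS);
    cs->cmds[cs->tail++].fence_target = NULL;
}

/* Inserting a fence sets the status of its sync object to non-signaled. */
static void insert_fence(command_stream *cs, sync_object *so)
{
    assert(cs->tail < MAX_CMDS);
    so->signaled = false;
    cs->cmds[cs->tail++].fence_target = so;
}

/* Complete one command in order; a completed fence signals its sync object.
 * Returns false when the stream is empty. */
static bool gpu_step(command_stream *cs)
{
    if (cs->head == cs->tail)
        return false;
    sync_object *so = cs->cmds[cs->head++].fence_target;
    if (so != NULL)
        so->signaled = true;
    return true;
}

/* Polling inspects the sync object, never the fence. */
static bool poll_sync(const sync_object *so)
{
    return so->signaled;
}
```

Because commands complete in order, a signaled sync object implies that all work submitted before the fence has finished, which is the partial-finish behavior. With several fences on one sync object, this model signals as soon as the first of them completes.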

  6. Example – RTT with two contexts
     Context A
         sync_objectA = glCreateSync(attrib);
         <render to texture that context B needs>
         glFence(sync_objectA);
         glFlush(); // prevent deadlock
     Context B
         glClientWaitSync(sync_objectA, 0, GL_FOREVER);
         glBindTexture(….); // Just rendered
         <render using texture>

  7. OS specific functionality
     • Convert a sync object to the window system's native event primitive
       • Allows applications to synchronize all events in a system using one API
       • All operations on <sync> are reflected in the OS event and vice versa
       • Both <sync> and the OS event are valid to use in your code
     • On Windows, convert to an Event
           HANDLE wglConvertSyncToEvent(object sync);
       • Need to specify, when the sync object is created, that it can be converted to an OS event
       • Separate extension: WGL_ARB_sync_event
     • On Unix, convert to a file descriptor, X event or semaphore?
       • Still TBD

  8. Possible future functionality
     • Add a WaitForMultipleSync(uint *sync_objects, ….) command
       • Synchronize with multiple sync objects at once
     • Add a “payload” to a fence
       • For example, the time it completed
     • Allow one GPU stream to wait for another GPU stream
       • WaitSync(sync_object);
     • A sync object whose status will pulse with every vblank
     • A sync object that can signal when data binding has completed
       • As opposed to when rendering has completed using the data

  9. Example – Streaming video processing
     Loop
         Draw frame 1 // To an FBO, for example
         glFence(sync_object1); // Inserts a fence in the command stream
         Draw frame 2
         glFence(sync_object2);
         while (glClientWaitSync(sync_object1, 0, 0) != GL_ALREADY_SIGNALED)
             <Do some useful work> // App uses CPU cycles instead of blocking
         Read back data in frame 1
         while (glClientWaitSync(sync_object2, 0, 0) != GL_ALREADY_SIGNALED)
             <Do some useful work> // App uses CPU cycles instead of blocking
         Read back data in frame 2

  10. Variation with asynchronous read back
     Loop
         Draw frame 1 // To an FBO, for example
         Read back frame 1 into PBO 1 // Asynchronous readback
         glFence(sync_object1); // Inserts a fence in the command stream
         Draw frame 2
         Read back frame 2 into PBO 2
         glFence(sync_object2);
         while (glClientWaitSync(sync_object1, 0, 0) != GL_ALREADY_SIGNALED)
             <Do some useful work> // App uses CPU cycles instead of blocking
         glMapBuffer(…); // Access the data of frame 1 in PBO 1
         while (glClientWaitSync(sync_object2, 0, 0) != GL_ALREADY_SIGNALED)
             <Do some useful work> // App uses CPU cycles instead of blocking
         glMapBuffer(…); // Access the data of frame 2 in PBO 2
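The polling pattern in the two streaming examples can be exercised without a GPU by simulating it. The C sketch below assumes a made-up `sim_gpu` in which each frame's fence completes at a fixed tick and each unit of CPU work advances the clock by one tick. All names are hypothetical, and `frame_signaled` merely stands in for a zero-timeout wait on that frame's sync object.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated GPU: frame i is complete once ticks >= done_at[i]. */
typedef struct {
    int ticks;
    int done_at[2];
} sim_gpu;

/* Non-blocking status check, standing in for a zero-timeout wait. */
static bool frame_signaled(const sim_gpu *g, int frame)
{
    assert(frame >= 0 && frame < 2);
    return g->ticks >= g->done_at[frame];
}

/* One unit of CPU work; the simulated GPU advances one tick meanwhile. */
static void do_useful_work(sim_gpu *g, int *work_units)
{
    g->ticks++;
    (*work_units)++;
}

/* Poll each frame's own sync in submission order, doing CPU work instead of
 * blocking. Returns the work units completed before both frames were ready. */
static int process_two_frames(sim_gpu *g)
{
    int work_units = 0;
    for (int frame = 0; frame < 2; frame++) {
        while (!frame_signaled(g, frame))
            do_useful_work(g, &work_units);
        /* Frame `frame` is complete here; its read-back would happen now. */
    }
    return work_units;
}
```

Each loop must test the sync object that belongs to the frame about to be read. Polling the first frame's object twice would allow the second read-back to start before the second frame has finished.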

  11. Differences with GL_NV_fence
     • No separation of sync objects and fences in NV_fence
       • NV version only has fence objects
       • Fence object has state
     • Creation of sync object and inserting a fence in one command
       • SetFenceNV creates and inserts a fence (old object model)
     • NV fence objects are not shared across contexts

  12. API Overview 1/2
     • Create a sync attribute object
           object CreateSyncAttrib();
       • SYNC_TYPE has to be FENCE
       • SYNC_CONDITION has to be SYNC_PRIOR_COMMANDS_COMPLETE
       • SYNC_STATUS SIGNALED or UNSIGNALED
     • Create the sync object
           object CreateSync(object attrib);
     • Insert a fence, associated with a sync object, into command stream
           void Fence(object sync);

  13. API Overview 2/2
     • Wait or test the status of a fence sync object
           enum ClientWaitSync(object sync, uint flags, time timeout);
       • Blocks until sync is signaled or the timeout expires
       • If timeout == 0, does not block, returns the status of sync
       • If timeout == FOREVER, the call does not time out
       • Optionally will flush before blocking
       • Returns one of three values: ALREADY_SIGNALED, TIMEOUT_EXPIRED, CONDITION_SATISFIED
     • Signal or unsignal a sync object
           void SignalSync(object sync, enum mode);
       • If status transitions from unsignaled to signaled, ClientWaitSync will unblock
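One way to read the three ClientWaitSync return values is sketched below in C, using a simulated clock in place of real blocking. The slide does not spell out when each value is returned, so the mapping here is an assumption: ALREADY_SIGNALED if the object is signaled when the call is made, CONDITION_SATISFIED if it becomes signaled within the timeout, TIMEOUT_EXPIRED otherwise. `sim_sync`, `sim_client_wait` and the `WAIT_FOREVER` encoding are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    ALREADY_SIGNALED,
    TIMEOUT_EXPIRED,
    CONDITION_SATISFIED
} wait_result;

/* Hypothetical encoding of the FOREVER timeout. */
#define WAIT_FOREVER UINT64_MAX

/* In this model a sync object becomes signaled at a fixed simulated time. */
typedef struct { uint64_t signal_time_ns; } sim_sync;

/* Decide what a wait starting at now_ns with the given timeout would return. */
static wait_result sim_client_wait(const sim_sync *s, uint64_t now_ns,
                                   uint64_t timeout_ns)
{
    assert(s != NULL);
    if (s->signal_time_ns <= now_ns)
        return ALREADY_SIGNALED;       /* signaled before the call */
    if (timeout_ns == WAIT_FOREVER)
        return CONDITION_SATISFIED;    /* waits however long it takes */
    uint64_t remaining = s->signal_time_ns - now_ns;
    /* Assumption: signaling exactly at the deadline counts as satisfied. */
    return remaining <= timeout_ns ? CONDITION_SATISFIED : TIMEOUT_EXPIRED;
}
```

With a timeout of zero the function never reports CONDITION_SATISFIED, which matches the non-blocking status query used in polling loops that compare the result against ALREADY_SIGNALED.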

