
The power of C++ Project Austin app

Ale Contenti, Principal Dev Manager, Visual C++. Session 4-001. Diving deep into Project Austin: what Austin is, why we built it, and C++ at work. Austin is a digital note-taking app for Windows 8.


Presentation Transcript


  1. The power of C++: Project Austin app • Ale Contenti, Principal Dev Manager, Visual C++ • Session 4-001

  2. Diving deep into project Austin • What’s Austin • Why we built it • C++ at work • Go build amazing apps!

  3. Austin • Austin is a digital note-taking app for Windows 8 • You can add pages to your notebook, delete them, or move them around • You can use digital ink to write or draw things on those pages • You can add photos from your computer, from SkyDrive, or directly from your computer's camera • You can share the notes you create to other Windows 8 apps such as Email or SkyDrive • Beautiful and simple

  4. Austin: just a pen and a piece of paper

  5. Austin: why we built it • We used Visual C++ 2012 to build an amazing app: • Written in “modern C++” • DirectX, XAML for UI • C++/CX to interact with WinRT • Auto-vectorizer for faster ink smoothing • C++ AMP for faster page curling • …and it was fun (the code is available on CodePlex, too) • Showcase the power of Windows 8, the native platform, and C++

  6. Modern C++ • DirectX and XAML UI • C++/CX layer

  7. Modern C++ • We strived to write Austin in a “modern” way: • C++ Standard Library, augmented with PPL and Boost • Smart pointers instead of raw pointers • Pervasive RAII pattern • Handle errors using C++ exceptions • Coding conventions inspired by Boost • No bare pointers, no delete
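As a minimal sketch of the style listed on this slide (smart pointers, RAII, exceptions, no `delete`), and not code taken from Austin: `Page` and `Notebook` below are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical page type, used only for illustration.
struct Page {
    std::string title;
    explicit Page(std::string t) : title(std::move(t)) {}
};

// Hypothetical notebook that owns its pages through smart pointers.
class Notebook {
public:
    // The notebook holds unique_ptrs, so every page is released
    // automatically when the notebook goes out of scope (RAII).
    void add_page(std::string title) {
        if (title.empty())
            throw std::invalid_argument("page title must not be empty");
        pages_.push_back(std::unique_ptr<Page>(new Page(std::move(title))));
    }

    // Erasing the unique_ptr destroys the page; no explicit delete is needed.
    void remove_page(std::size_t index) {
        if (index >= pages_.size())
            throw std::out_of_range("no such page");
        pages_.erase(pages_.begin() + static_cast<std::ptrdiff_t>(index));
    }

    std::size_t page_count() const { return pages_.size(); }

private:
    std::vector<std::unique_ptr<Page>> pages_;
};
```

Errors are reported with exceptions rather than return codes, and because ownership lives in `std::unique_ptr`, an exception thrown halfway through an operation cannot leak a page.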

  8. DirectX and XAML • DirectX creates an immersive, fluid user interface that's built as a 3D scene with lights, shadows, and a camera • On the DirectX render target, we draw the notebook's pages, photos, ink strokes, and background • A 3D engine library abstracts some of the DirectX complexity • DirectX for a fast, fluid, true-to-life experience • XAML UI is used for the settings menu, the app bar, and the rest of the user interface • The SwapChainBackgroundPanel hosts the 3D scene inside the XAML UI page

  9. C++/CX • C++/CX is used at the “boundary” to interact with Windows (via the WinRT objects) and to leverage XAML UI • Used for loading and saving images, file picker, camera, storage files and folders (SkyDrive, etc.), implementing the “share” contract • Very useful for XAML UI: UI elements and event hook-ups • We were careful not to let C++/CX code “bleed” too much into our Standard C++ code (15 files out of 350) • Windows is the RunTime

  10. Ink smoothing and auto-vectorizer

  11. Ink smoothing: the problem • We have on the order of 5 ms or less to smooth the strokes • In real time, please…

  12. Ink smoothing: the code • The C++ compiler is obsessed with optimization: in this case, it will auto-vectorize the loop

         for (int j = 0; j < numPoints; j++)
         {
             float t = (float)j / (float)(numPoints - 1);
             smoothedPressure[j] = (1-t)*p2p + t*p3p;
             smoothedPoints_X[j] = (2*t*t*t - 3*t*t + 1) * p2x
                                 + (-2*t*t*t + 3*t*t) * p3x
                                 + (t*t*t - 2*t*t + t) * L*(p3x-p1x)
                                 + (t*t*t - t*t) * L*(p4x-p2x);
             smoothedPoints_Y[j] = (2*t*t*t - 3*t*t + 1) * p2y
                                 + (-2*t*t*t + 3*t*t) * p3y
                                 + (t*t*t - 2*t*t + t) * L*(p3y-p1y)
                                 + (t*t*t - t*t) * L*(p4y-p2y);
         }
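The loop on this slide can be wrapped into a self-contained function, roughly as follows. This is a sketch under assumptions, not Austin's actual code: `InkPoint`, `SmoothedSegment`, and `smooth_segment` are hypothetical names, and `L` is treated simply as the tangent scale factor that appears in the slide's formula.

```cpp
#include <cassert>
#include <vector>

// Hypothetical ink sample: position plus pen pressure.
struct InkPoint {
    float x, y, pressure;
};

// Hypothetical output container: one entry per interpolated sample.
struct SmoothedSegment {
    std::vector<float> x, y, pressure;
};

// Interpolates numPoints samples between p2 and p3 with the cubic from the
// slide. p1 and p4 are the neighboring samples; L scales the tangents.
SmoothedSegment smooth_segment(const InkPoint& p1, const InkPoint& p2,
                               const InkPoint& p3, const InkPoint& p4,
                               float L, int numPoints) {
    assert(numPoints >= 2);  // the loop divides by (numPoints - 1)
    SmoothedSegment out;
    out.x.resize(numPoints);
    out.y.resize(numPoints);
    out.pressure.resize(numPoints);

    // Raw output pointers, mirroring the array-style loop on the slide.
    float* xs = out.x.data();
    float* ys = out.y.data();
    float* ps = out.pressure.data();

    for (int j = 0; j < numPoints; j++) {
        const float t = (float)j / (float)(numPoints - 1);
        // The four cubic weights used on the slide.
        const float w2   = 2*t*t*t - 3*t*t + 1;  // weight of p2
        const float w3   = -2*t*t*t + 3*t*t;     // weight of p3
        const float wIn  = t*t*t - 2*t*t + t;    // weight of the tangent at p2
        const float wOut = t*t*t - t*t;          // weight of the tangent at p3

        ps[j] = (1 - t) * p2.pressure + t * p3.pressure;
        xs[j] = w2 * p2.x + w3 * p3.x
              + wIn * L * (p3.x - p1.x) + wOut * L * (p4.x - p2.x);
        ys[j] = w2 * p2.y + w3 * p3.y
              + wIn * L * (p3.y - p1.y) + wOut * L * (p4.y - p2.y);
    }
    return out;
}
```

Each iteration depends only on `j` and on loop-invariant inputs, which is the kind of loop shape an auto-vectorizer can work with. Whether a particular compiler actually vectorizes it is worth confirming with its vectorization report rather than assuming.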

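  13. Auto-vectorizer (super simplified view) • The scalar loop

         for (i = 0; i < 1000; i++) {
             C[i] = A[i] + B[i];
         }

     is compiled as if it processed four elements per iteration:

         for (i = 0; i < 1000; i += 4) {
             C[i:i+3] = A[i:i+3] + B[i:i+3];
         }

     Each iteration maps to a single SSE instruction, “addps xmm1, xmm0”, which computes xmm1 = xmm0 + xmm1 on four floats at once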

  14. Auto-vectorizer: info from the compiler • When does the auto-vectorizer kick in? • On the command line, /Qvec-report:1 will report the vectorized loops • /Qvec-report:2 will report both vectorized and non-vectorized loops, and the reason why some loops were not vectorized • Refer to the Vectorizer and Parallelizer Messages in MSDN • From the build output, with /Qvec-report:1: ink_renderer.cpp(1092) : info C5001: loop vectorized

  15. Auto-vectorizer: it’s not always easy

         #include <vector>
         void test1()
         {
             std::vector<int> a(100000), b(100000), c(100000);
             for (int i = 0; i < a.size(); ++i)
             {
                 a[i] = b[i] + c[i];
             }
         }

     info C5002: loop not vectorized due to reason ‘501’

  16. Auto-vectorizer: it’s not always easy

         #include <vector>
         void test1()
         {
             std::vector<int> a(100000), b(100000), c(100000);
             for (int i = 0; i < a.size(); ++i)
             {
                 a[i] = b[i] + c[i];
             }
         }

  17. Auto-vectorizer: it’s not always easy

         #include <vector>
         void test1()
         {
             std::vector<int> a(100000), b(100000), c(100000);
             for (int i = 0, iMax = static_cast<int>(a.size()); i < iMax; ++i)
             {
                 a[i] = b[i] + c[i];
             }
         }

     info C5001: loop vectorized
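The fix on these slides is to read the loop bound once, before the loop starts. Reason 501 is documented in the Vectorizer and Parallelizer Messages guide as an induction variable that is not local or an upper bound that is not loop-invariant, which fits a bound written as `a.size()`. A minimal sketch of the same idea as a reusable function, where `add_vectors` is a hypothetical name:

```cpp
#include <cassert>
#include <vector>

// Adds b and c element-wise into a. All three vectors must have the same size.
void add_vectors(std::vector<int>& a,
                 const std::vector<int>& b,
                 const std::vector<int>& c) {
    assert(a.size() == b.size() && a.size() == c.size());

    // Read the bound once, so the trip count is visibly loop-invariant.
    const int count = static_cast<int>(a.size());
    for (int i = 0; i < count; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

Whether this loop is vectorized still depends on the compiler and its settings, so it is worth checking the output of /Qvec-report:2 rather than assuming.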

  18. Auto-vectorizer at work in Austin • The compiler will analyze the loop and emit the right code • For the ink-smoothing algorithm, we got a 30% speed-up • For the first part of the page curling algorithm, we got a 175% speed-up • Auto-vectorizer can analyze very complex loops • Always measure with a profiler to understand which loops you need to speed up • Leverage the Vectorizer and Parallelizer Messages guide for help

  19. Page curling and C++ AMP

  20. Page curling: calculating normals • Lots of triangles: we have less than 15 ms to “turn a page” in real time, so we need to parallelize this algorithm

         // pseudo-code
         for each triangle
         {
             Position vertex1Pos = triangle.vertex1.position;
             Position vertex2Pos = triangle.vertex2.position;
             Position vertex3Pos = triangle.vertex3.position;

             Normal triangleNormal = cross(vertex2Pos - vertex1Pos, vertex3Pos - vertex1Pos);
             triangleNormal.normalize();

             triangle.vertex1.normal += triangleNormal;
             triangle.vertex2.normal += triangleNormal;
             triangle.vertex3.normal += triangleNormal;
         }

     C++ AMP is a good candidate, since the data size is pretty large

  21. Page curling: calculating normals

         // pseudo-code
         for each triangle   // we're looping over each triangle
         {
             // This set of operations is safe: it works on a single triangle at a time, so no races
             Position vertex1Pos = triangle.vertex1.position;
             Position vertex2Pos = triangle.vertex2.position;
             Position vertex3Pos = triangle.vertex3.position;

             Normal triangleNormal = cross(vertex2Pos - vertex1Pos, vertex3Pos - vertex1Pos);
             triangleNormal.normalize();

             // But here we're updating vertices that are shared between triangles -> race!
             triangle.vertex1.normal += triangleNormal;
             triangle.vertex2.normal += triangleNormal;
             triangle.vertex3.normal += triangleNormal;
         }

     This algorithm only works on a single thread

  22. Page curling: split the loop to make it parallelizable • Before: for each triangle { calculate triangle normals; calculate vertex normals } • After: for each triangle { calculate triangle normals; cache triangle normals }, then for each vertex { calculate vertex normals }
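The split can be sketched in standard C++ as two passes in which every iteration writes only to its own output slot. This is an illustration under assumptions, not Austin's implementation: `Vec3`, `Triangle`, and `vertex_normals` are hypothetical names, and a generic vertex-to-triangle adjacency list stands in for however the real code finds the triangles near a vertex.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vec3 normalized(Vec3 v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return len > 0.0f ? Vec3{v.x / len, v.y / len, v.z / len} : v;
}

// Hypothetical triangle: three indices into the vertex position array.
struct Triangle { std::size_t a, b, c; };

std::vector<Vec3> vertex_normals(const std::vector<Vec3>& positions,
                                 const std::vector<Triangle>& triangles) {
    // Pass 1: one normal per triangle. Iteration t writes only triNormals[t],
    // so the iterations are independent of each other.
    std::vector<Vec3> triNormals(triangles.size());
    for (std::size_t t = 0; t < triangles.size(); ++t) {
        const Triangle& tri = triangles[t];
        const Vec3 p1 = positions[tri.a];
        triNormals[t] = normalized(cross(sub(positions[tri.b], p1),
                                         sub(positions[tri.c], p1)));
    }

    // Topology lookup: which triangles touch each vertex (built sequentially).
    std::vector<std::vector<std::size_t>> touching(positions.size());
    for (std::size_t t = 0; t < triangles.size(); ++t) {
        touching[triangles[t].a].push_back(t);
        touching[triangles[t].b].push_back(t);
        touching[triangles[t].c].push_back(t);
    }

    // Pass 2: one normal per vertex. Iteration v reads the cached triangle
    // normals and writes only vertexNormals[v], so there is no shared write.
    std::vector<Vec3> vertexNormals(positions.size(), Vec3{0.0f, 0.0f, 0.0f});
    for (std::size_t v = 0; v < positions.size(); ++v) {
        Vec3 sum = {0.0f, 0.0f, 0.0f};
        for (std::size_t t : touching[v]) {
            sum.x += triNormals[t].x;
            sum.y += triNormals[t].y;
            sum.z += triNormals[t].z;
        }
        vertexNormals[v] = normalized(sum);
    }
    return vertexNormals;
}
```

The loops here run sequentially; the point is only that neither pass has two iterations writing to the same element, which is the property that makes each of them safe to hand to a parallel loop.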

  23. First, loop for each triangle… • We use C++ AMP

         c::array<b::float32, 2> tempTriangleNormals(3, (int)triangleCount());
         parallel_for_each(extent<1>(triangleCount()), [=, &tempTriangleNormals](index<1> idx) restrict(amp)
         {
             Position vertex1Pos = triangle.vertex1.position;
             Position vertex2Pos = triangle.vertex2.position;
             Position vertex3Pos = triangle.vertex3.position;

             Normal triangleNormal = cross(vertex2Pos - vertex1Pos, vertex3Pos - vertex1Pos);
             triangleNormal.normalize();

             tempTriangleNormals[idx] = triangleNormal;
         });

     Same as before, we calculate the normals for each triangle • We collect the normals into a temporary array, which stays in GPU memory

  24. …then, loop for each vertex

         parallel_for_each(extent<2>(vertexCountY, vertexCountX), [=](index<2> idx) restrict(amp)
         {
             Normal vertexNormal = vertexNormalView(idx);

             // go find the normals from nearby triangles
             vertexNormal += sumTriangleNormals(idx);
             vertexNormal.normalize();

             vertexNormalView(idx) = vertexNormal;
         });

     We go over each vertex, so no races • In sumTriangleNormals, we fetch the normals from tempTriangleNormals, i.e., the temporary array we kept in GPU memory

  25. Page curling: C++ AMP at work • Massive parallelism with GPU and WARP • Running this algorithm on the GPU yields between 3x and 7x speed-ups • CPU is now free to execute other code • Even when DirectX 11 capable GPU hardware is not present, C++ AMP will fall back to WARP, which leverages multi-core and SSE2

  26. Key takeaways

  27. Key takeaways • Use modern C++: RAII, r-value references, lambdas, const, Standard C++ Libraries, Boost, other 3rd party libraries, etc. • DirectX for fast and powerful graphics • XAML UI for standard UI elements • C++/CX to talk to Windows, to other components and to other languages (e.g., JS) • Auto-vectorizer and PPL to distribute work on the CPU • C++ AMP to leverage the GPU's massively parallel compute power • C++ Rocks! Go write great apps!!

  28. Resources

  29. Related Sessions • Tue/5:45/B92 Odyssey: “Connecting C++ Apps to the Cloud via Casablanca” • Wed/11:15/B92 Odyssey: “It’s all about performance: Using Visual C++ 2012 to make the best use of your hardware” • Wed/1:45/B92 Stinger: “DirectX Graphics Development with Visual Studio 2012”

  30. Related Sessions • Wed/5:15/B33 Cascade: “Diving deep into C++/CX and WinRT” • Thu/5:15/B92 Nexus/Normandy: “Building a Windows Store app using XAML and C++ - Photo app, the Hilo project” • Fri/12:45/B33 McKinley: “The Future of C++”

  31. Resources • vcblog • Project Austin Part 1 of 6: Introduction • Project Austin on CodePlex • Auto-Vectorizer in Visual Studio 2012 • C++ AMP in a nutshell • Parallel Patterns Library (PPL) • alecont@microsoft.com • Please submit session evals on the Build Windows 8 App or at http://aka.ms/BuildSessions

  32. Participate in Design Research • Experience development tools and features early in their design and development • Influence future design decisions • Microsoft Developer Division Design Research • Fill it online at http://bit.ly/x6dtHt • Enroll today!

  33. Appendix

  34. Ink smoothing: the math • The line must be continuous, as must its first and second derivatives • We approximate with the “cardinal” spline solution • With auto-vectorizer, we get a nice 30% speed-up
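Read off the loop on slide 12, the interpolation between two consecutive ink points can be written as a single cubic. The vector notation P_1 … P_4 for the four neighboring points is an assumption made here for compactness; L is the tangent scale factor from the slide's code.

```latex
\[
P(t) = (2t^3 - 3t^2 + 1)\,P_2 + (-2t^3 + 3t^2)\,P_3
     + (t^3 - 2t^2 + t)\,L\,(P_3 - P_1) + (t^3 - t^2)\,L\,(P_4 - P_2),
\qquad t \in [0, 1]
\]
\[
P(0) = P_2, \quad P(1) = P_3, \quad
P'(0) = L\,(P_3 - P_1), \quad P'(1) = L\,(P_4 - P_2)
\]
```

The end conditions show why consecutive segments join smoothly: the segment from P_2 to P_3 ends with tangent L(P_4 - P_2), which is exactly the tangent with which the next segment, from P_3 to P_4, begins.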

  35. Page curling: how do we turn the page • Brilliant paper by Hong et al., Turning Pages of 3D Electronic Books • Turning a page of a physical book can be simulated as deforming a page around a cone • Each “page” in Austin is made of a bunch of triangles • In C++, we apply the page turning algorithm to all triangles • The auto-vectorizer comes to the rescue again with a sweet 1.7x speed-up

  36. Page curling: vertex normals and shading • Vertex normals are typically calculated as the normalized average of the surface normals of all triangles containing the vertex • Using this approach, computing the vertex normals on the CPU simply involves iterating over all triangles depicting the page surface and accumulating the triangle normals in the normals of the respective vertices • To me, the above screams “massively parallel”
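In symbols, the definition on this slide reads as follows. The notation is an assumption introduced here: T(v) is the set of triangles containing vertex v, and p_1, p_2, p_3 are the corner positions of a triangle t.

```latex
\[
n_t = \frac{(p_2 - p_1) \times (p_3 - p_1)}{\lVert (p_2 - p_1) \times (p_3 - p_1) \rVert},
\qquad
n_v = \frac{\sum_{t \in T(v)} n_t}{\bigl\lVert \sum_{t \in T(v)} n_t \bigr\rVert}
\]
```

Normalizing the sum gives the same direction as normalizing the average, since the two differ only by the positive factor 1/|T(v)|.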

  37. Page curling: C++ AMP

         // first calculate the triangle normals
         c::array<b::float32, 2> triangleNormals(3, (int)triangleCount());
         c::parallel_for_each(c::extent<1>(triangleCount()), [=, &triangleNormals](c::index<1> idx) restrict(amp)
         {
             b::float32 v1PosX = vertexPositionArray(0, indexArray(2, idx[0])[0]);
             b::float32 v1PosY = vertexPositionArray(1, indexArray(2, idx[0])[0]);
             b::float32 v1PosZ = vertexPositionArray(2, indexArray(2, idx[0])[0]);

             b::float32 v2PosX = vertexPositionArray(0, indexArray(1, idx[0])[0]);
             b::float32 v2PosY = vertexPositionArray(1, indexArray(1, idx[0])[0]);
             b::float32 v2PosZ = vertexPositionArray(2, indexArray(1, idx[0])[0]);

             b::float32 v3PosX = vertexPositionArray(0, indexArray(0, idx[0])[0]);
             b::float32 v3PosY = vertexPositionArray(1, indexArray(0, idx[0])[0]);
             b::float32 v3PosZ = vertexPositionArray(2, indexArray(0, idx[0])[0]);

             b::float32 x1 = v2PosX - v1PosX;
             b::float32 y1 = v2PosY - v1PosY;
             b::float32 z1 = v2PosZ - v1PosZ;

             b::float32 x2 = v3PosX - v1PosX;
             b::float32 y2 = v3PosY - v1PosY;
             b::float32 z2 = v3PosZ - v1PosZ;

             // cross them
             b::float32 x3 = y1 * z2 - z1 * y2;
             b::float32 y3 = z1 * x2 - x1 * z2;
             b::float32 z3 = x1 * y2 - y1 * x2;

             NORMALIZE(x3, y3, z3);

             triangleNormals(0, idx[0]) = x3;
             triangleNormals(1, idx[0]) = y3;
             triangleNormals(2, idx[0]) = z3;
         });
