
Tips and Tricks: Visual C++ 2005 Optimization Best Practices

Tips and Tricks: Visual C++ 2005 Optimization Best Practices. Kang Su Gatlin TLNL04 Program Manager Visual C++ Microsoft Corporation. 6 Tips/Best Practices To Help Any C++ Dev Write Faster Code. Managed + Unmanaged Pick the right level of optimization Add instant parallelism. Unmanaged


Presentation Transcript


  1. Tips and Tricks: Visual C++ 2005 Optimization Best Practices (TLNL04) • Kang Su Gatlin, Program Manager, Visual C++, Microsoft Corporation

  2. 6 Tips/Best Practices To Help Any C++ Dev Write Faster Code • Managed + Unmanaged: pick the right level of optimization; add instant parallelism • Unmanaged: disambiguate memory; use intrinsics • Managed: avoid double thunks; speed app startup time

  3. 1. Pick the Right Level Of Optimization • Builds from the Lab • If at all possible use Profile-Guided Optimization • Only available unmanaged • More on this next slide • If not, use Whole Program Optimization (/GL) • Available managed and unmanaged • After that we recommend • /O2 (optimize for speed) for hot functions/files • /O1 (optimize for size) for the rest • Other switches to use for maximum speed • /Gy • /OPT:REF,ICF (good size win on 64-bit) • /fp:fast • /arch:SSE2 (will not work on downlevel architectures) • Debug Symbols Are NOT Only for Debug Builds • Executable size and codegen are NOT affected by this • It’s all in the PDB file • Always building debug symbols will make life easier • Make sure you use /OPT:REF,ICF, don’t use /ZI, and use /INCREMENTAL:NO
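When splitting hot and cold code into separate files is impractical, the speed/size split can be approximated per function. The sketch below uses the MSVC `#pragma optimize` directive ("t" favors fast code, "s" favors small code, and the empty string with `on` restores the command-line settings). It is an illustration only: the function names are hypothetical, the pragma is guarded so other compilers ignore it, and it is not an exact equivalent of compiling with /O2 or /O1.

```cpp
#include <cstddef>

// Hot path: ask MSVC to favor fast code for the functions that follow.
#ifdef _MSC_VER
#pragma optimize("t", on)
#endif

// Hypothetical hot function: sums an array in a tight loop.
long long sum_hot(const int* data, std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Cold path: reset to the command-line options, then favor small code.
#ifdef _MSC_VER
#pragma optimize("", on)
#pragma optimize("s", on)
#endif

// Hypothetical cold function: rarely called, so code size matters more.
int clamp_percent(int value) {
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}

// Restore whatever the /O switches on the command line requested.
#ifdef _MSC_VER
#pragma optimize("", on)
#endif
```

The pragma must appear outside a function body and applies to the functions defined after it, which is why each function is preceded by its own directive.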

  4. Next-Gen Optimizations Today: Profile Guided Optimization • The next level beyond Whole Program Optimization • Static compilers can’t answer everything • We get 20-50% improvement on large server applications that we ship • Current support is unmanaged only
     Questions a static compiler cannot answer on its own:
       if(a < b) foo(); else baz();         Should we inline foo()?
       for(i = 0; i < count; ++i) bar();    Should we unroll this loop?

  5. Profile Guided Optimization • Source → compile with /GL → object files • Object files → link with /LTCG:PGI → instrumented image + PGD file • Instrumented image + scenarios → profile data • Object files + profile data → link with /LTCG:PGO → optimized image • There is throughput impact

  6. What PGO Does And Does Not Do • PGO does • Optimizations galore • Speed/Size Determination • Switch expansion • Better inlining decisions • Function/basic block layout • Virtual call speculation • Partial inlining • Optimize within a single image • Merge and weight multiple scenarios • PGO does not • Probe assembly language (inline or otherwise) • Optimize across DLLs • Optimize data layout

  7. PGO Compilation in Visual C++ 2005

  8. 2. Add Instant Parallelism: Just add OpenMP Pragmas! • OpenMP is a popular API for multithreaded programs • Born from the HPC community • It consists of a set of simple #pragmas and runtime routines • Most valuable for parallelizing large loops with no loop-carried dependencies • Visual C++ 2005 implements the full OpenMP 2.0 standard • Full unmanaged and /clr managed support • See the PDC issue of MSDN magazine for an article on OpenMP

  9. OpenMP Parallelization
     Each iteration is independent; order of execution does not matter:
       void test(int first, int last) {
         #pragma omp parallel for
         for (int i = first; i <= last; ++i) {
           a[i] = b[i] * c[i];
         }
       }
     Assignments to ‘a’, ‘b’, and ‘c’ are independent. Serial version:
       if(x < 0) a = foo(x); else a = x + 5;
       b = bat(y);
       c = baz(x + y);
       j = a+b+c;
     The same code with parallel sections:
       #pragma omp parallel sections
       {
         #pragma omp section
         if(x < 0) a = foo(x); else a = x + 5;
         #pragma omp section
         b = bat(y);
         #pragma omp section
         c = baz(x + y);
       }
       j = a+b+c;
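The loop pattern on this slide can be sketched as a complete translation unit. The function names `multiply` and `dot` are hypothetical and not from the talk; the second function adds a `reduction` clause, which is the usual way to parallelize a loop whose only cross-iteration dependency is an accumulator. Both compile with or without OpenMP enabled (/openmp in Visual C++); without it the pragmas are ignored and the loops run serially.

```cpp
#include <cstddef>
#include <vector>

// Element-wise product. Each iteration writes only a[i], so iterations
// are independent and can run in any order on any thread.
std::vector<double> multiply(const std::vector<double>& b,
                             const std::vector<double>& c) {
    const int n = static_cast<int>(b.size() < c.size() ? b.size() : c.size());
    std::vector<double> a(static_cast<std::size_t>(n));
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {  // signed loop index, as OpenMP 2.0 requires
        a[i] = b[i] * c[i];
    }
    return a;
}

// Dot product. 'sum' is shared across iterations, so it is declared as a
// reduction: each thread keeps a private partial sum that is combined at the end.
double dot(const std::vector<double>& b, const std::vector<double>& c) {
    const int n = static_cast<int>(b.size() < c.size() ? b.size() : c.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < n; ++i) {
        sum += b[i] * c[i];
    }
    return sum;
}
```

Because a reduction may combine partial sums in a different order than the serial loop, floating-point results can differ in the last bits for general inputs.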

  10. OpenMP Case Study: Panorama Factory by Smoky City Design • Top-rated image stitching application • Added multithreading with OpenMP in Visual C++ 2005 Beta2 • Used 102 instances of #pragma omp * • Extremely impressive results… • Stitching together several large images • Dual processor, dual core x64 machine

  11. 3. Disambiguate Memory • Programmer knows a and b never overlap, but the compiler cannot assume it, so b[0] and b[1] are reloaded after every store
       void copy8(int * a, int * b) {
         a[0] = b[0]; a[1] = b[1];
         a[2] = b[0]; a[3] = b[1];
         a[4] = b[0]; a[5] = b[1];
         a[6] = b[0]; a[7] = b[1];
       }
     Generated code (ecx = a, eax = b):
       mov edx, DWORD PTR [eax]
       mov DWORD PTR [ecx], edx
       mov edx, DWORD PTR [eax+4]
       mov DWORD PTR [ecx+4], edx
       mov edx, DWORD PTR [eax]
       mov DWORD PTR [ecx+8], edx
       mov edx, DWORD PTR [eax+4]
       mov DWORD PTR [ecx+12], edx
       mov edx, DWORD PTR [eax]
       mov DWORD PTR [ecx+16], edx
       mov edx, DWORD PTR [eax+4]
       mov DWORD PTR [ecx+20], edx
       mov edx, DWORD PTR [eax]
       mov DWORD PTR [ecx+24], edx
       mov eax, DWORD PTR [eax+4]
       mov DWORD PTR [ecx+28], eax

  12. Aliasing And Memory Disambiguation • Aliasing is when one object can be used as an alias to another object • If compiler can NOT prove that an object does not alias then it MUST assume it can • How can we address some of these problems? • Avoid taking address of an object. • Avoid taking address of a function. • Avoid using global variables. Statics are preferable. • Use __restrict, __declspec(noalias), and __declspec(restrict) when possible.
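A small experiment shows why the compiler must be conservative. The sketch below reuses the `copy8` body from the talk and adds a hypothetical variant, `copy8_loads_hoisted`, that loads `b[0]` and `b[1]` once, which is what an optimizer would like to do. The two functions agree when the buffers are disjoint but diverge when they overlap, so hoisting the loads is only legal if overlap can be ruled out.

```cpp
// Same statements as the talk's copy8: b[0] and b[1] are re-read after
// every store through a, which is correct even if a and b overlap.
void copy8(int* a, int* b) {
    a[0] = b[0]; a[1] = b[1];
    a[2] = b[0]; a[3] = b[1];
    a[4] = b[0]; a[5] = b[1];
    a[6] = b[0]; a[7] = b[1];
}

// Hypothetical "optimized" variant: loads b[0] and b[1] once up front.
// Equivalent to copy8 only when a and b do not overlap.
void copy8_loads_hoisted(int* a, int* b) {
    const int b0 = b[0];
    const int b1 = b[1];
    a[0] = b0; a[1] = b1;
    a[2] = b0; a[3] = b1;
    a[4] = b0; a[5] = b1;
    a[6] = b0; a[7] = b1;
}
```

Calling both with `a = buf + 1` and `b = buf` on a buffer that starts `{1, 2, 0, ...}` makes the difference visible: the first store through `a` overwrites `b[1]`, so `copy8` writes 1 everywhere while the hoisted variant alternates 1 and 2.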

  13. __restrict – A compiler hint • Programmer knows a and b don’t overlap, and __restrict tells the compiler so: b[0] and b[1] are now loaded only once
       void copy8(int * __restrict a, int * b) {
         a[0] = b[0]; a[1] = b[1];
         a[2] = b[0]; a[3] = b[1];
         a[4] = b[0]; a[5] = b[1];
         a[6] = b[0]; a[7] = b[1];
       }
     Generated code (eax = a, edx = b):
       mov ecx, DWORD PTR [edx]
       mov edx, DWORD PTR [edx+4]
       mov DWORD PTR [eax], ecx
       mov DWORD PTR [eax+4], edx
       mov DWORD PTR [eax+8], ecx
       mov DWORD PTR [eax+12], edx
       mov DWORD PTR [eax+16], ecx
       mov DWORD PTR [eax+20], edx
       mov DWORD PTR [eax+24], ecx
       mov DWORD PTR [eax+28], edx
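The same hint applies to loops, where it matters most for vectorization. The following is a minimal sketch with a hypothetical function, `scale_add`; the `__restrict` keyword is accepted by Visual C++ and also by GCC and Clang. Whether a given compiler actually generates better code for this loop is not something the sketch demonstrates.

```cpp
#include <cstddef>

// Hypothetical example: dst[i] += k * src[i].
// __restrict promises that dst and src never refer to overlapping memory,
// so the compiler may keep loaded values in registers across stores.
// Passing overlapping buffers breaks that promise and is undefined behavior.
void scale_add(float* __restrict dst, const float* __restrict src,
               float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += k * src[i];
    }
}
```

Callers are responsible for the promise: the function cannot check at run time that `dst` and `src` are disjoint.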

  14. __declspec(restrict) • Tells the compiler that the function returns an unaliased pointer • Only applicable to functions • This is a promise the programmer makes to the compiler • If this promise is violated the compiler may generate bad code • The CRT uses this decoration, e.g., malloc, calloc, etc…
       __declspec(restrict) void *malloc(size_t size);

  15. __declspec(noalias) • Tells the compiler that the function is a semi-pure function • Only references locals, arguments, and first-level indirections of arguments • This is a promise the programmer makes to the compiler • If this promise is violated the compiler may generate bad code
       __declspec(noalias) void isElement(Tree *t, Element e);
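Both decorations are specific to Visual C++, so portable code usually hides them behind macros. The sketch below is one way to do that; the macro names `RETURNS_UNALIASED` and `NO_ALIAS` and the functions `alloc_ints` and `contains` are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// Expand to the Visual C++ decorations when available, to nothing elsewhere.
#ifdef _MSC_VER
#define RETURNS_UNALIASED __declspec(restrict)
#define NO_ALIAS __declspec(noalias)
#else
#define RETURNS_UNALIASED
#define NO_ALIAS
#endif

// Hypothetical allocator wrapper. The returned block is freshly allocated,
// so no other pointer in the program can alias it: the promise holds.
RETURNS_UNALIASED int* alloc_ints(std::size_t count) {
    return static_cast<int*>(std::calloc(count, sizeof(int)));
}

// Hypothetical semi-pure function: it touches only its locals, its arguments,
// and memory reached through one dereference of 'values'. No globals.
NO_ALIAS bool contains(const int* values, std::size_t count, int wanted) {
    for (std::size_t i = 0; i < count; ++i) {
        if (values[i] == wanted) return true;
    }
    return false;
}
```

A function that cached results in a global table, for example, would violate the `noalias` promise and should not carry the decoration.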

  16. 4. Use Intrinsics • Simply represented as functions to the programmer • _mm_load_pd(double const*); • Compilers understand these as primitives • Allows the user to get right at the hardware w/o using asm • Almost anything you can do in assembly • interlock, memory fences, cache control, SIMD • The key to things such as vectorization and lock-free programming • You can use intrinsics in a file compiled /clr, but the function(s) will be compiled as unmanaged • Intrinsics are consumed by PGO and our optimizer • Inline asm is not • Documentation for intrinsics is much better in Visual C++ 2005 • [Visual Studio 8]\VC\include\intrin.h

  17. Matrix Addition With Intrinsics
     Scalar version:
       void MatMatAdd(Matrix &a, Matrix &b, Matrix &c) {
         for(int i = 0; i < a.m_rows; ++i)
           for(int j = 0; j < a.m_cols; j++)
             c[i][j] = a[i][j] + b[i][j];
       }
     SSE version (assumes m_cols is a multiple of 4 and each row is 16-byte aligned, as _mm_load_ps requires):
       #include <intrin.h>
       void MatMatAddVect(Matrix &a, Matrix &b, Matrix &c) {
         __m128 aSIMD, bSIMD, cSIMD;
         for(int i = 0; i < a.m_rows; ++i)
           for(int j = 0; j < a.m_cols; j += 4) {
             aSIMD = _mm_load_ps(&a[i][j]);
             bSIMD = _mm_load_ps(&b[i][j]);
             cSIMD = _mm_add_ps(aSIMD, bSIMD);
             _mm_store_ps(&c[i][j], cSIMD);
           }
       }
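The slide's `Matrix` type is not shown, so the following self-contained sketch works on plain float arrays instead. It is a variation on the slide's code, not a copy: it uses the unaligned forms `_mm_loadu_ps` and `_mm_storeu_ps` so callers need no alignment guarantee, handles lengths that are not a multiple of 4 with a scalar tail loop, and falls back to scalar code when SSE is not available. The function name `add_floats` and the `HAVE_SSE` macro are hypothetical.

```cpp
#include <cstddef>

// Detect SSE support on GCC/Clang (__SSE__) and MSVC (_M_X64 / _M_IX86_FP).
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// c[i] = a[i] + b[i] for i in [0, n).
void add_floats(const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = 0;
#if HAVE_SSE
    // Process four floats per iteration with unaligned loads and stores.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail (or the whole array when SSE is unavailable).
    for (; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

The aligned forms used on the slide can be faster on older processors, but they fault on addresses that are not 16-byte aligned, which is why this sketch trades some speed for safety.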

  18. Spin-Lock With Intrinsics
       #include <intrin.h>
       #include <windows.h>
       void EnterSpinLock(volatile long &lock) {
         while(_InterlockedCompareExchange(&lock, 1, 0) != 0)
           Sleep(0);
       }
       void ExitSpinLock(volatile long &lock) {
         lock = 0;
       }
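For comparison, the same compare-and-swap pattern can be written portably with C++11 `std::atomic`, which did not exist when this talk was given. This is a swapped-in technique rather than the slide's intrinsic-based code: the class name `SpinLock` is hypothetical, `compare_exchange_strong` plays the role of `_InterlockedCompareExchange`, and `std::this_thread::yield()` stands in for `Sleep(0)`.

```cpp
#include <atomic>
#include <thread>

// Hypothetical portable analog of EnterSpinLock/ExitSpinLock.
class SpinLock {
public:
    SpinLock() : state_(0) {}

    // Attempts to change state_ from 0 (free) to 1 (held).
    // Returns true if this call acquired the lock.
    bool try_lock() {
        long expected = 0;
        return state_.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }

    // Spins until the lock is acquired, yielding the time slice between tries.
    void lock() {
        while (!try_lock()) {
            std::this_thread::yield();
        }
    }

    // Releases the lock; the release store publishes prior writes
    // to the next thread that acquires it.
    void unlock() {
        state_.store(0, std::memory_order_release);
    }

private:
    std::atomic<long> state_;
};
```

Because the class provides `lock`, `unlock`, and `try_lock`, it can be used with `std::lock_guard<SpinLock>` for scoped locking.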

  19. 5. Avoid Double-Thunks • Thunks are functions used to transition from managed to unmanaged (and vice-versa) • Managed code calls UnmanagedFunc(); → managed-to-unmanaged thunk → unmanaged code: UnmanagedFunc() { … } • Thunks are a part of life… but sometimes we can have double thunks…

  20. Double Thunking • From managed to managed only • Indirect calls • Function pointers and virtual functions • Is the callee a managed or unmanaged entry point? • __declspec(dllexport) • No current mechanism to export functions as managed entry points • Managed code calls ManagedFunc(); → managed-to-unmanaged thunk → unmanaged-to-managed thunk → managed code: ManagedFunc() { … }

  21. How To Fix Double Thunking • Indirect Functions (including Virtual Funcs) • Compile with /clr:pure • Use __clrcall • __declspec(dllexport) • Wrap functions in a managed class, and then #using the object file

  22. Using __clrcall To Improve Performance

  23. 6. Speed App Startup Time • No one likes to wait for an app to start-up • There is still some time associated with loading CLR • In some apps you may have non-CLR paths • Only load the CLR when you need to • Use DelayLoading technology in the linker • If the EXE is compiled /clr then we will always load the CLR

  24. Delay Loading The CLR

  25. Summary Of Best Practices • Large and ongoing investment in managed and unmanaged C++ code • Managed + Unmanaged • Use PGO for unmanaged and WPO for managed… • OpenMP can ease multithreaded development. • Unmanaged • Make it easier for the compiler to track pointers. • Intrinsics give the ability to get to the metal. • Managed • Know where your double thunks are and fix them. • Delay load the CLR to improve startup.

  26. Resources • Visual C++ Dev Center • http://msdn.microsoft.com/visualc • This is the place to go for all our news and whitepapers • Myself • kanggatl@microsoft.com • http://blogs.msdn.com/kangsu • Must See Talks • TLN309  C++: Future Directions in Language Innovation with Herb Sutter (Friday 10:30am)

  27. © 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
