
Visual C++ 2005 New Optimizations

Visual C++ 2005 New Optimizations. Ayman Shoukry, Program Manager, Visual C++, Microsoft Corporation. How can your application run faster? Maximize optimization for each file. Whole Program Optimization (WPO) goes beyond individual files.



Presentation Transcript


  1. Visual C++ 2005 New Optimizations • Ayman Shoukry • Program Manager, Visual C++ • Microsoft Corporation

  2. How can your application run faster? • Maximize optimization for each file. • Whole Program Optimization (WPO) goes beyond individual files. • Profile Guided Optimization (PGO) specializes optimizations specifically for your application. • New Floating Point Model. • OpenMP • 64bit Code Generation.

  3. Maximum Optimization for Each File • Compiler optimizes each source code file to get the best runtime performance • The only type of optimization available in Visual C++ 6 • Visual C++ 2005 has better optimization algorithms • Specialized support for newer processors such as Pentium 4 • Improved speed and better precision of floating point operations • New optimization techniques like loop unrolling

  4. Whole Program Optimization • Typically Visual C++ will optimize programs by generating code for object files separately • Introducing whole program optimization • First introduced with Visual C++ 2002 and has since improved • Compiler and linker set with new options (/GL and /LTCG) • Compiler has freedom to do additional optimizations • Cross-module inlining • Custom calling conventions • Visual C++ 2005 supports this on all platforms • Whole program optimization is widely used for Microsoft products.

  5. Profile Guided Optimization • Static analysis leaves many open optimization questions for the compiler, leading to conservative optimizations • Visual C++ programs can be tuned for expected user scenarios by collecting information from the running application • Introducing profile guided optimization • Optimizes code by exercising the program the way its customers use it • Runs optimizations at link time, like whole program optimization • Available in Visual Studio 2005 • Widely adopted at Microsoft • Example: is it common for p to be NULL? If it is not, the error code should be grouped with other infrequently used code:
      if (p != NULL) { /* Perform action with p */ } else { /* Error code */ }

  6. PGO: Instrumentation • We instrument with “probes” inserted into the code • Two main types of probes • Value probes • Used to construct histogram of values • Count (simple/entry) probes • Used to count number of times a path is taken • We try to insert the minimum number of probes to get full coverage • Minimizes the cost of instrumentation
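
The two probe types can be pictured with a hand-written sketch. This is only an illustration of what the probes record, not the instrumentation the toolchain actually emits; `CountProbe`, `ValueProbe`, `process`, and the global probe objects are hypothetical names introduced here.

```cpp
#include <cstddef>
#include <map>

// Hypothetical stand-ins for compiler-inserted probes (illustration only).
struct CountProbe {
    unsigned long hits = 0;
    void record() { ++hits; }  // counts how many times a path is taken
};

struct ValueProbe {
    std::map<int, unsigned long> histogram;
    void record(int value) { ++histogram[value]; }  // builds a histogram of values
};

CountProbe g_entry;          // entry probe: counts calls to process()
CountProbe g_null_path;      // count probe on the p == NULL path
ValueProbe g_switch_values;  // value probe on the switch operand

int process(const int* p) {
    g_entry.record();
    if (p == NULL) {
        g_null_path.record();
        return -1;
    }
    g_switch_values.record(*p);
    switch (*p) {
        case 1:  return 10;
        case 2:  return 20;
        default: return 0;
    }
}

// The non-NULL path needs no probe of its own: its count follows from
// the other two, which is the idea behind inserting a minimal probe set.
unsigned long non_null_hits() {
    return g_entry.hits - g_null_path.hits;
}
```

After a few calls to `process`, `g_switch_values.histogram` shows which switch values dominate and `non_null_hits()` shows how often the common path ran, which is the kind of data the later optimization slides rely on.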

  7. PGO Optimizations • Switch expansion • Better inlining decisions • Cold code separation • Virtual call speculation • Partial inlining

  8. Profile Guided Optimization • Source → compile with /GL and optimizations on (e.g. /O2) → object files • Object files → link with /LTCG:PGI → instrumented image • Instrumented image + scenarios → output profile data • Object files + profile data → link with /LTCG:PGO → optimized image
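
As a rough sketch of how the steps on this slide might be applied, the comments below list one command per step using only the switches the slide names. The file names `app.cpp` and `app.obj` are placeholders for whatever sources make up the program, and `checked_length` is a hypothetical function with a likely-hot and a likely-cold path for the profile to distinguish.

```cpp
#include <cstddef>
#include <cstring>

// Build steps, using only the switches named on the slide
// (app.cpp and app.obj are placeholder file names):
//   cl /c /GL /O2 app.cpp     -> object file
//   link /LTCG:PGI app.obj    -> instrumented image
//   run the instrumented image through representative scenarios
//   link /LTCG:PGO app.obj    -> optimized image

// Hypothetical function: the non-NULL branch is assumed to be the
// common case in the training scenarios, the NULL branch the rare one.
int checked_length(const char* p) {
    if (p != NULL) {
        return static_cast<int>(std::strlen(p));  // expected common path
    } else {
        return -1;                                // expected rare error path
    }
}
```

The quality of the result depends on how closely the training scenarios match real usage, since the profile data is the only evidence the optimizer has about which branch is hot.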

  9. PGO: Inlining Sample • Profile Guided uses call graph path profiling. • [Figure: call graph with nodes a, foo, bat, bar, and baz]

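  10. PGO: Inlining Sample (cont) • Profile Guided uses call graph path profiling. • [Figure: call graph split per call path. a → bar → baz with counts 10 and 75; foo → bar → baz with counts 20 and 50; bat → bar → baz with counts 100 and 15; one further edge labeled 15]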

  11. PGO – Inlining Sample (cont) • Inlining decisions are made at each call site. • [Figure: call graph with nodes a, foo, bat, bar, and baz and edge counts 10, 20, 125, 100, 15, and 15]

  12. PGO – Switch Expansion • Most frequent values are pulled out.
      Before:
          // 90% of the time i = 10;
          switch (i) { case 1: … case 2: … case 3: … default: … }
      After:
          if (i == 10) goto default;
          switch (i) { case 1: … case 2: … case 3: … default: … }
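
The slide's "after" form is pseudocode (`goto default` is not valid C++). A compilable sketch of the same idea, written by hand, might look like the following. The function names and the case bodies are placeholders, and the value 10 is assumed hot only because the slide says so.

```cpp
// Hypothetical dispatch function; the return values stand in for the
// elided case bodies on the slide.
int dispatch_original(int i) {
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return -1;
    }
}

// Hand-written equivalent of the expanded form: the value assumed to be
// most frequent (10) is tested first and handled exactly like the
// default case, so the common path skips the switch dispatch.
int dispatch_expanded(int i) {
    if (i == 10) {
        return -1;  // same result as the default case
    }
    switch (i) {
        case 1:  return 100;
        case 2:  return 200;
        case 3:  return 300;
        default: return -1;
    }
}
```

Both functions return the same result for every input; only the order in which the tests are made changes, which is why the transformation is safe once the profile shows which value dominates.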

  13. PGO – Code Separation • Basic blocks are ordered so that the most frequent path falls through. • [Figure: control-flow graph with edges A → B (100), A → C (10), B → D (100), C → D (10)] • Default layout: A, B, C, D • Optimized layout: A, B, D, C

  14. PGO – Virtual Call Speculation • According to the profiles, the type of object A in function Func was almost always Foo.
      class Base { … virtual void call(); }
      class Foo:Base { … void call(); }
      class Bar:Base { … void call(); }
      void Func(Base *A) { … while(true) { …
          if(type(A) == Foo:Base) {
              // inline of A->call();
          } else
              A->call();
      … } }
      void Bar(Base *A) { … while(true) { … A->call(); … } }
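
The speculated form can be approximated by hand in standard C++. This is a sketch only: the slide does not say how the compiler performs the type check, so `typeid` is used here as a stand-in, and `run_virtual` and `run_speculated` are hypothetical names. The `call` methods return an `int` so the two versions can be compared.

```cpp
#include <typeinfo>

struct Base {
    virtual ~Base() {}
    virtual int call() { return 0; }
};

struct Foo : Base {
    int call() { return 1; }
};

struct Bar : Base {
    int call() { return 2; }
};

// Plain virtual dispatch on every iteration.
int run_virtual(Base* a, int iterations) {
    int total = 0;
    for (int n = 0; n < iterations; ++n) {
        total += a->call();
    }
    return total;
}

// Hand-written speculation: if the dynamic type is the one assumed to be
// most common (Foo), make a direct, non-virtual call that the compiler
// is free to inline; otherwise fall back to the virtual call.
int run_speculated(Base* a, int iterations) {
    int total = 0;
    for (int n = 0; n < iterations; ++n) {
        if (typeid(*a) == typeid(Foo)) {
            total += static_cast<Foo*>(a)->Foo::call();  // qualified call bypasses the vtable
        } else {
            total += a->call();
        }
    }
    return total;
}
```

The fallback branch keeps the code correct for every type, so a wrong guess costs only the extra comparison.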

  15. PGO – Partial Inlining • [Figure: flow graph: Basic Block 1 → Cond → Hot Code or Cold Code → More Code]

  16. PGO – Partial Inlining (cont) • The hot path is inlined, but NOT the cold path. • [Figure: flow graph: Basic Block 1 → Cond → Hot Code or Cold Code → More Code]
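
A hand-applied version of partial inlining might look like the sketch below. All function names are hypothetical, and the condition `x >= 0` is simply assumed to be the hot case for the purpose of illustration.

```cpp
// Cold path, kept out of line. Stands in for rarely executed work.
int handle_rare_case(int x) {
    int result = 0;
    for (int k = 0; k < 4; ++k) {
        result += x * k;
    }
    return result;
}

// Original callee: a cheap hot path guarded by a condition, plus a cold path.
int callee(int x) {
    if (x >= 0) {                // Cond: assumed true in most profiled runs
        return x + 1;            // Hot code
    }
    return handle_rare_case(x);  // Cold code
}

// Caller before partial inlining: always pays for the call to callee().
int caller_before(int x) {
    int y = callee(x);
    return y * 2;                // More code
}

// Caller after hand-applied partial inlining: the condition and the hot
// code are copied into the caller, while the cold code remains a call.
int caller_after(int x) {
    int y;
    if (x >= 0) {
        y = x + 1;                // hot path, inlined
    } else {
        y = handle_rare_case(x);  // cold path, still out of line
    }
    return y * 2;                 // More code
}
```

The caller grows only by the size of the hot path, so the common case avoids call overhead without pulling the rarely used code into the caller.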

  17. Demo Optimizing applications with VC++ 2005

  18. New Floating Point Model • /Op made your code run slow • No intermediate switch • New Floating Point Model • /fp:fast • /fp:precise (default) • /fp:strict • /fp:except

  19. /fp:precise • The default floating point switch • Performance and Precision • IEEE Conformant • Round to the appropriate precision • At assignments, casts and function calls

  20. /fp:fast • When performance matters most • You know your application does simple floating point operations • What can /fp:fast do? • Association • Distribution • Factoring inverse • Scalar reduction • Copy propagation • And others…

  21. /fp:except • Reliable floating point exceptions • Thrown and not thrown when expected • Faults and traps, when reliable, should occur at the line that causes the exception • FWAITs on x86 might be added • Cannot be used with /fp:fast or in managed code

  22. /fp:strict • The strictest FP option • Turns off contractions • Assumes floating point control word can change or that the user will examine flags • /fp:except is implied • Low double digit percent slowdown versus /fp:fast

  23. What is the output?
      #include <stdio.h>
      int main() {
          double x, y, z;
          double sum;
          x = 1e20; y = -1e20; z = 10.0;
          sum = x + y + z;
          printf("sum=%f\n", sum);
      }
      • /fp:fast /O2: sum = 0.000 • /fp:strict /O2: sum = 10.0
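
The two results correspond to two ways of grouping the same additions. The sketch below spells out both groupings explicitly so the difference does not depend on compiler switches; it assumes IEEE 754 double-precision arithmetic, and the function names are hypothetical.

```cpp
// Left-to-right evaluation, as the expression x + y + z is written.
double sum_left_to_right(double x, double y, double z) {
    double t = x + y;  // 1e20 + (-1e20) is exactly 0
    return t + z;
}

// One reassociation that a compiler may be allowed to pick when
// association is permitted (listed on the /fp:fast slide).
double sum_reassociated(double x, double y, double z) {
    double t = y + z;  // -1e20 + 10 rounds back to -1e20 in double precision
    return x + t;
}
```

With x = 1e20, y = -1e20, and z = 10.0, the first function returns 10.0 and the second returns 0.0. Adjacent doubles near 1e20 are 16384 apart, so adding 10 to -1e20 is lost to rounding.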

  24. OpenMP • A specification for writing multithreaded programs • It consists of a set of simple #pragmas and runtime routines • Makes it very easy to parallelize loop-based code • Helps with load balancing, synchronization, etc… • In Visual Studio, only available in C++

  25. OpenMP Parallelization • Can parallelize loops and straight-line code • Includes synchronization constructs
      void test(int first, int last) {
          #pragma omp parallel for
          for (int i = first; i <= last; ++i) {
              a[i] = b[i] + c[i];
          }
      }
      [Figure: with first = 1 and last = 1000, the iterations are divided into the ranges 1 ≤ i ≤ 250, 251 ≤ i ≤ 500, 501 ≤ i ≤ 750, and 751 ≤ i ≤ 1000]
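
The slide's function relies on arrays a, b, and c declared elsewhere. A self-contained variant, with the arrays passed in as parameters, could be sketched as follows. The name `add_arrays` and the use of `std::vector` are choices made for this example, not part of the original slide.

```cpp
#include <vector>

// Element-wise sum over the inclusive index range [first, last].
// Each iteration writes a distinct a[i], so no synchronization is needed.
// If OpenMP is not enabled (/openmp in Visual C++), the pragma is ignored
// and the loop runs serially with the same result.
void add_arrays(std::vector<double>& a,
                const std::vector<double>& b,
                const std::vector<double>& c,
                int first, int last) {
    #pragma omp parallel for
    for (int i = first; i <= last; ++i) {
        a[i] = b[i] + c[i];
    }
}
```

The caller is responsible for making sure all three vectors hold at least last + 1 elements. How the iteration range is divided among threads is decided by the OpenMP runtime.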

  26. 64bit Compiler in VC2005 • 64bit Compiler Cross Tools • Compiler is 32bit but resulting image is 64bit • 64bit Compiler Native Tools • Compiler and resulting image are 64bit binaries. • All previous optimizations apply for 64bit as well.

  27. Resources • Visual C++ Dev Center • http://msdn.microsoft.com/visualc • This is the place to go for all our news and whitepapers • Also VC2005 specific forums at http://forums.microsoft.com • Myself • http://blogs.msdn.com/aymans
