
Tuning a Runtime for Both Productivity and Performance

This presentation discusses tuning a runtime for both productivity and performance, focusing on startup time and throughput. It also explores ahead-of-time code generation (CrossGen) and tiered compilation, and includes code examples.

kareng

Presentation Transcript


  1. Tuning a Runtime for Both Productivity and Performance • Mei-Chin Tsai • Sergiy Kuryata

  2. What is a runtime?

  3. MetaData/IL is translated to native code for each target platform: Windows/X86, Windows/X64, Windows/ARM, Windows/ARM64, Linux/X86, Linux/X64, Linux/ARM, Linux/ARM64

  4. • 1. Tuning startup and throughput • 2. Startup time case study • 3. Takeaways

  5. Services the runtime provides to execute code • Type system • Object layout and vtable layout • Type casting (correctness) • Just-in-Time compiler (JIT) • Converts IL to native code • Garbage Collector (GC) • Cleans up the managed heap when needed

      class MyBase
      {
          public int baseField;
          // …
      }

      class MyClass : MyBase
      {
          public int myField;
          // …
          public virtual int myFunc()
          {
              int result = myField + baseField;
              return result;
          }
      }
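To make the "type casting (correctness)" and "vtable layout" services concrete, here is a minimal sketch built on the slide's MyBase and MyClass. It shows the runtime checking a downcast against the object's actual type, and dispatching the virtual myFunc through the object's own type. The CastingDemo class and its DowncastIsRejected helper are hypothetical names added for illustration.

```csharp
using System;

class MyBase
{
    public int baseField;
}

class MyClass : MyBase
{
    public int myField;

    public virtual int myFunc()
    {
        int result = myField + baseField;
        return result;
    }
}

static class CastingDemo
{
    // Hypothetical helper: returns true if the runtime rejects the downcast.
    public static bool DowncastIsRejected(MyBase obj)
    {
        try
        {
            _ = (MyClass)obj;   // checked against the object's actual type at run time
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        MyBase asBase = new MyClass { baseField = 1, myField = 2 };

        // The object really is a MyClass, so the cast succeeds and the
        // virtual call goes through MyClass's vtable.
        Console.WriteLine(((MyClass)asBase).myFunc());   // 3

        MyBase plain = new MyBase { baseField = 1 };
        Console.WriteLine(plain is MyClass);             // False
        Console.WriteLine(DowncastIsRejected(plain));    // True
        Console.WriteLine(DowncastIsRejected(asBase));   // False
    }
}
```

The static type of the variable (MyBase) does not decide whether the cast succeeds. The runtime's type system inspects the object itself, which is why casting is listed as a correctness service.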

  6. Question: How many methods are JIT-compiled to run this HelloWorld Web API application?

      using System.Collections.Generic;
      using Microsoft.AspNetCore.Mvc;

      public class ValuesController
      {
          [HttpGet("/")]
          public string Hello() => "Hello World!";

          [HttpGet("/api/values")]
          public IEnumerable<string> Get()
          {
              return new string[] { "value1", "value2" };
          }

          [HttpGet("/api/values/{id}")]
          public string Get(int id)
          {
              return "Your value is " + id;
          }
      }

      https://github.com/dotnet/corert/tree/master/samples/WebApi

  7. Simple HelloWorld WebApi sample

  8. We have a problem here. Measure... Measure... Measure... • HelloWorld WebApi takes 1.38 s to run • We are asked to JIT over 4000 methods • [Diagram: JIT and Type System, connected through the JIT-EE interface]
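"Measure" starts with a number you can reproduce. Below is a minimal sketch of one way to get a wall-clock startup figure from inside the app. It computes the time between OS process creation (Process.StartTime) and a point you choose as "ready". The StartupTimer name and the choice of ready point are assumptions for illustration. The talk does not say how its 1.38 s figure was collected.

```csharp
using System;
using System.Diagnostics;

static class StartupTimer
{
    // Wall-clock time from OS process creation until this call. It includes
    // runtime initialization, assembly and type loading, and every method
    // JIT-compiled so far, all of which contribute to startup cost.
    public static TimeSpan SinceProcessStart()
    {
        using Process self = Process.GetCurrentProcess();
        return DateTime.Now - self.StartTime;
    }

    static void Main()
    {
        // Hypothetical "ready" point: in a web app, call this once the host
        // has started or the first request has been served.
        TimeSpan elapsed = SinceProcessStart();
        Console.WriteLine($"Time to ready: {elapsed.TotalMilliseconds:F0} ms");
    }
}
```

Process.StartTime has coarse resolution on some platforms. Treat a single run as noisy, and average several cold starts before comparing configurations.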

  9. Calling engineer in action

  10. Precompile on the target device • Execution and compilation environments always match • NGEN • Caches the JIT and type system results at deploy time • Removes most JIT work from program execution • Program execution just runs the compiled code

  11. Simple HelloWorld WebApi sample • Startup is now 0.48 seconds. Not bad! We are done!

  12. Fragility • JIT / type system output depends on • Layout of code in the application and framework • Data structures within the CLR • This is fragile and causes precompiled images to be invalidated • .NET Framework is serviced via Windows Update • Application dependencies get updated • Performance can change after deployment
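Why does a framework update invalidate precompiled code? Here is a toy model, assuming a simplified layout rule: base-class fields first, then the class's own fields, packed in declaration order with no padding. The real CLR layout algorithm is more involved. Code compiled against version 1 of MyBase bakes in the offset of MyClass.myField. A serviced MyBase that gains a field moves myField, so the baked-in offset is now wrong. FieldDef, ToyLayout, and addedField are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One instance field: name and size in bytes.
record FieldDef(string Name, int Size);

static class ToyLayout
{
    // Toy rule (not the CLR's real algorithm): base-class fields come first,
    // then the class's own fields, packed in declaration order with no padding.
    public static int OffsetOf(IEnumerable<FieldDef> baseFields,
                               IEnumerable<FieldDef> ownFields,
                               string name)
    {
        int offset = 0;
        foreach (FieldDef f in baseFields.Concat(ownFields))
        {
            if (f.Name == name) return offset;
            offset += f.Size;
        }
        throw new ArgumentException($"No field named '{name}'.", nameof(name));
    }

    static void Main()
    {
        var myClassFields = new[] { new FieldDef("myField", 4) };

        // Framework v1: MyBase has one int field.
        var baseV1 = new[] { new FieldDef("baseField", 4) };
        // Framework v2 (after servicing): a hypothetical long field is added.
        var baseV2 = new[] { new FieldDef("baseField", 4), new FieldDef("addedField", 8) };

        int compiled = OffsetOf(baseV1, myClassFields, "myField");
        int actual = OffsetOf(baseV2, myClassFields, "myField");

        // Code precompiled against v1 reads myField at offset 4,
        // but under v2 the field lives at offset 12.
        Console.WriteLine($"baked-in offset: {compiled}, actual offset: {actual}");
    }
}
```

This is the situation NGEN handles by regenerating images after servicing. It is also why CrossGen defers type layout until execution.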

  13. Engineer was happy for a while!

  14. The world changes on you… • Devices where battery life matters • Build once and deploy on millions of servers • Sorry… but we don’t trust you. • Security: executables on disk should be signed • No admin services allowed on servers, locked-down devices, or Linux

  15. Compile once in the build lab • Need to deal with mismatches between the compilation and execution environments • Scale back caching: don’t lay out types until execution • Scale back code optimizations such as inlining and devirtualization • CrossGen.exe • Generates lower-performing code • With version resilience, and copy-deployable

  16. C#:

      using System;
      using System.Diagnostics;

      public static void GenDoTest<T>(GenBaseClass<T> o, string exp)
      {
          Debugger.Break();
          string res = o.ToString();
          if (exp != res) throw new Exception();
      }

      CrossGen codegen:
          push    rdi
          push    rsi
          push    rbx
          sub     rsp,30h
          mov     qword ptr [rsp+28h],rcx
          mov     rsi,rcx
          mov     rdi,rdx
          mov     rbx,r8
          call    qword ptr [MyRepro_ni+0x1178 (00007ffd`36421178)]
          mov     rcx,rsi
          call    qword ptr [MyRepro_ni+0x1038 (00007ffd`36421038)]
          mov     rcx,rdi
          mov     r11,rax
          cmp     dword ptr [rcx],ecx
          call    qword ptr [rax]        => goes through VSD stub (5 more instructions before hitting target)
          mov     rdx,rax
          mov     rcx,rbx
          call    qword ptr [MyRepro_ni+0x1180 (00007ffd`36421180)]
          test    al,al
          jne     00007ffd`36424216
          add     rsp,30h
          pop     rbx
          pop     rsi
          pop     rdi
          ret

      NGEN codegen:
          push    rdi
          push    rsi
          sub     rsp,28h
          mov     rsi,rdx
          mov     rdi,r8
          call    CLRStub[ExternalMethodThunk]@7ffd363160c0 (00007ffd`363160c0)
          mov     rcx,rsi
          mov     rax,qword ptr [rsi]
          mov     rax,qword ptr [rax+40h]
          call    qword ptr [rax+20h]    => goes directly to target
          mov     rdx,rax
          mov     rcx,rdi
          call    CLRStub[ExternalMethodThunk]@7ffd363160c8 (00007ffd`363160c8)
          test    eax,eax
          je      00007ffd`3631ca60
          add     rsp,28h
          pop     rsi
          pop     rdi
          ret
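In the two listings above, the NGEN code loads the ToString target from fixed offsets in the type's method table ([rax+40h], then [rax+20h]). The CrossGen code calls through a stub that finds the target at run time. The toy model below contrasts the two call-site strategies. A "fixed slot" call bakes the vtable index in at compile time. A "resolving" call records only the method name, looks the slot up the first time it sees a type, and caches the result. It is a loose analogy for why the stub costs extra instructions but survives version changes. It is not how virtual stub dispatch is actually implemented. ToyType, FixedSlotCall, ResolvingCall, DispatchDemo, and AddedVirtual are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Toy type: an ordered vtable of (method name, implementation) slots.
sealed class ToyType
{
    public List<(string Name, Func<string> Impl)> VTable { get; } = new();
    public int SlotOf(string name) => VTable.FindIndex(e => e.Name == name);
}

// NGEN-style call site: the slot index is fixed when the code is compiled.
sealed class FixedSlotCall
{
    private readonly int _slot;
    public FixedSlotCall(int slot) => _slot = slot;
    public string Invoke(ToyType t) => t.VTable[_slot].Impl();
}

// Stand-in for a stub-based call site: only the method name is recorded at
// compile time; the slot is looked up when a new type is seen, then cached.
sealed class ResolvingCall
{
    private readonly string _name;
    private ToyType? _cachedType;
    private int _cachedSlot;
    public ResolvingCall(string name) => _name = name;

    public string Invoke(ToyType t)
    {
        if (!ReferenceEquals(t, _cachedType))
        {
            _cachedSlot = t.SlotOf(_name);   // the extra work done at run time
            _cachedType = t;
        }
        return t.VTable[_cachedSlot].Impl();
    }
}

static class DispatchDemo
{
    // Builds a toy type whose slot implementations return their own names.
    public static ToyType Build(params string[] methods)
    {
        var t = new ToyType();
        foreach (string m in methods)
        {
            Func<string> impl = () => m;
            t.VTable.Add((m, impl));
        }
        return t;
    }

    static void Main()
    {
        // Inherited slots come first, then MyClass's own virtual myFunc.
        ToyType v1 = Build("ToString", "Equals", "GetHashCode", "myFunc");
        var fixedCall = new FixedSlotCall(v1.SlotOf("myFunc"));   // slot 3 baked in
        var resolvingCall = new ResolvingCall("myFunc");

        // v2: a serviced base class adds a virtual, shifting myFunc's slot.
        ToyType v2 = Build("ToString", "Equals", "GetHashCode", "AddedVirtual", "myFunc");

        Console.WriteLine(fixedCall.Invoke(v2));       // AddedVirtual (wrong method)
        Console.WriteLine(resolvingCall.Invoke(v2));   // myFunc
    }
}
```

The fixed-slot call is one indexed load. The resolving call pays for a lookup whenever it meets a new type. That trade mirrors the listings: faster but version-fragile code versus slower but version-resilient code.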

  17. Simple HelloWorld WebApi sample https://github.com/dotnet/corert/tree/master/samples/WebApi

  18. ??? Performance team Engineer

  19. How about throughput? Numbers collected using .NET Core 2.1 on a machine with 8 cores (Xeon Core i7 2GHz) and 32GB of RAM. Oops! We pushed the problem elsewhere.

  20. Code generation technology choices • Ahead-of-time generation (CrossGen) • JIT • Minimum optimizations • Full optimizations • Interpreter (not a supported option)

  21. Tiered Compilation • Generate code multiple times for a single method • Method bodies have a versioning story • Generate with minimum optimizations at startup • Replace with more highly optimized code at steady state • Use CrossGen to avoid code generation entirely for most methods • [Diagram: IL code → Min Opt JIT → Full Opt JIT; IL code → CrossGen → Full Opt JIT]

  22. Tiering heuristics • Steady state vs. startup is a gray area • How to determine hot methods? • Hit count to trigger fully optimized JIT • Or use sample profiling to trigger fully optimized JIT • Other potential future heuristics • Presently using a hit count of 30
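The tiering idea from the last two slides can be modeled in a few lines. A method starts with cheap code (standing in for min-opt JIT or CrossGen output). Calls are counted. Once the hit count reaches 30, the method's entry point is replaced with the optimized body. This is a minimal single-threaded sketch. Delegates stand in for machine code, and TieredMethod and TieringDemo are hypothetical names. The real runtime performs the optimized JIT on a background thread and does not store code versions this way.

```csharp
using System;

// Toy model of tiered compilation for a single method. Delegates stand in
// for generated machine code; this is not how the CLR stores code versions.
sealed class TieredMethod
{
    private readonly Func<int, int> _optimized;
    private Func<int, int> _current;   // the "entry point" callers go through
    private int _callCount;

    public int Threshold { get; }
    public int Tier { get; private set; }   // 0 = startup code, 1 = optimized

    public TieredMethod(Func<int, int> tier0, Func<int, int> optimized, int threshold = 30)
    {
        _current = tier0;
        _optimized = optimized;
        Threshold = threshold;
    }

    public int Invoke(int arg)
    {
        int result = _current(arg);
        // Count calls only while at tier 0; once the hit count reaches the
        // threshold, swap in the optimized body for all later calls.
        if (Tier == 0 && ++_callCount == Threshold)
        {
            _current = _optimized;
            Tier = 1;
        }
        return result;
    }
}

static class TieringDemo
{
    static void Main()
    {
        var square = new TieredMethod(
            tier0: x => x * x,       // stands in for min-opt JIT or CrossGen code
            optimized: x => x * x);  // stands in for fully optimized JIT code

        for (int i = 1; i <= 31; i++)
        {
            int before = square.Tier;
            square.Invoke(i);
            if (square.Tier != before)
                Console.WriteLine($"Promoted to tier {square.Tier} after {i} calls");
        }
    }
}
```

Running it prints "Promoted to tier 1 after 30 calls". Both bodies must compute the same result; only their cost differs. That requirement is what makes it safe to swap method bodies at run time.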

  23. Measure again

  24. Recap on our codegen journey • [Diagram: Pure JIT, CrossGen, Tiered JIT, and NGEN placed along an axis labeled "More capability in runtime for code optimization"]

  25. Coming next… • Better support for Docker containers • Configurable runtime for constrained environments • Reduced size of the Framework libraries • Startup optimizations to support the new UI stack • Improved code quality of the precompiled code • Publicly available AOT compiler

  26. 2. Startup time case study

  27. Is AOT right for your application? • Main purpose: improve startup time • Precompile code to reduce the time spent JIT’ing • PerfView is your friend • https://github.com/Microsoft/perfview • Case study: NuGet Package Explorer • https://github.com/NuGetPackageExplorer/NuGetPackageExplorer • Kudos to Oren Novotny for porting it to .NET Core 3.0
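PerfView gives the full per-method JIT breakdown. For a quick in-process check, .NET 6 and later (newer than the .NET Core 3.0 used in this case study) expose process-wide JIT counters through System.Runtime.JitInfo. The sketch below takes a snapshot of those counters. If the JIT count and time are a large share of startup, precompilation is more likely to help. JitReport and Snapshot are hypothetical names.

```csharp
using System;
using System.Runtime;

static class JitReport
{
    // Process-wide JIT totals since startup (System.Runtime.JitInfo, .NET 6+).
    public static (long Methods, long ILBytes, TimeSpan JitTime) Snapshot() =>
        (JitInfo.GetCompiledMethodCount(),
         JitInfo.GetCompiledILBytes(),
         JitInfo.GetCompilationTime());

    static void Main()
    {
        var s = Snapshot();
        Console.WriteLine($"Methods JIT-compiled: {s.Methods}");
        Console.WriteLine($"IL bytes compiled:    {s.ILBytes}");
        Console.WriteLine($"Time spent in JIT:    {s.JitTime.TotalMilliseconds:F1} ms");
    }
}
```

Take the snapshot at the same "ready" point in a JIT-only build and in a precompiled build. The difference gives a rough idea of how much JIT work AOT removes. PerfView remains the tool for finding which methods are still being JIT-compiled.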

  28. Collecting Data with PerfView

  29. Analyzing JIT Data with PerfView

  30. JIT Data for AOT compiled application

  31. 3. Takeaways

  32. Performance • Performance is hard. It is ongoing work • No silver bullet • Be data driven • Design for performance • Tune for performance • It may require many small work items • Understand your requirements ahead of time • Monitor and revalidate

  33. Developer Division and the .NET Runtime team are hiring! • Talk to us if you enjoy systems programming, runtimes, and compilers! • We are a globally distributed team. 1/3 of Sergiy’s team works from Europe.

  34. Questions? • Mei-Chin Tsai: meichint@microsoft.com, Github.com/MeiChin-Tsai • Sergiy Kuryata: sergeyk@microsoft.com, Github.com/sergiy-k • GitHub.com/dotnet/CoreCLR • GitHub.com/dotnet/Roslyn • GitHub.com/dotnet/CoreRT

  35. Links • https://blogs.msdn.microsoft.com/dotnet/2019/01/29/announcing-net-core-3-preview-2/ • https://blogs.msdn.microsoft.com/dotnet/2018/08/20/bing-com-runs-on-net-core-2-1 • https://blogs.msdn.microsoft.com/dotnet/2018/04/18/performance-improvements-in-net-core-2-1/ • https://blogs.msdn.microsoft.com/dotnet/2018/08/02/tiered-compilation-preview-in-net-core-2-1/
