Performance by Design using the .NET Framework

Presentation Transcript


  1. Performance by Design using the .NET Framework Mark Friedman • Rico Mariani • Vance Morrison Developer Division, Microsoft Corporation

  2. Agenda • Teaching Performance Culture – RicoM • Managed Code – RicoM • CPU Optimization – VanceM • Memory – VanceM • Threading – VanceM • Web application scalability – MarkFr • Web application responsiveness – MarkFr

  3. Performance By Design  Rico Mariani Architect Microsoft Corporation

  4. Introduction • Part 1 – Teaching Performance Culture • Part 2 – General Topics about Managed Code

  5. Rule #1 • Measure • Just thinking about what to measure will help you do a good job • Performance will happen • If you don’t measure you can be sure it will be slow, big, or whatever else you don’t want • If you haven’t measured, your job’s not finished

  6. Rule #2 • Do your homework • Good engineering requires you to understand your raw materials • What are the key properties of your Framework? Your processor? Your target system?

  7. No more rules • Very few absolutes in the performance biz • Performance work is plagued with powerful secondary and tertiary effects that often dwarf what we think are the primary effects • Whenever considering advice you must remember Rule #1 • Don’t let nifty sounding quotes keep you from great performance

  8. Performance Culture • Budget • an exercise to assess the value of a new feature and the cost you’d be willing to pay • Plan • validate your design against the budget, this is a risk assessment • Verify • measure the final results, discard failures without remorse or penalty, don’t make us live with them

  9. Budget • Begin by thinking about how the customer thinks about performance • Responsiveness • Capacity • Throughput • Cost of Entry • Identify the resource the customer views as critical to this system • Choose the level of performance we want to deliver (do we need an “A+” or is a “D” good enough) • Convert this into what resource usage needs to be to succeed • Don’t think about the code, think about the customer

  10. Plan • You can’t plan without a budget, so get one • Use best practices to select candidate algorithms • Understand their costs in terms of the critical resource • Identify your dependencies and understand their costs • Compare these projected costs against the budgets • If you are close to budget you will need much greater detail in your plans • Identify verification steps and places to abort if it goes badly • Proceed when you are comfortable with the risk

  11. Verify • The budget and the plan drive verification steps • Performance that cannot be verified does not exist • Don’t be afraid to cancel features that are not meeting their budgets – we expect to lose some bets • Don’t inflict bad performance on the world

  12. What Goes Wrong (I) • Programs take dependencies that they fundamentally cannot afford • E.g. Hash and comparison functions that call string splitting functions • Solution • Understand the costs of your dependencies in terms of your critical resource • Use dependencies in the context they were intended
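A minimal sketch of the dependency cost described above (the `Endpoint` type and its method names are hypothetical, not from the talk): a hash function that calls `String.Split` allocates a fresh array and two new strings on every call, while hashing the stored string directly allocates nothing.

```csharp
using System;

class Endpoint
{
    public string HostPort;   // e.g. "server:8080"

    // Costly: every hash call allocates an array plus two substrings
    public int SlowHash()
    {
        string[] parts = HostPort.Split(':');
        return parts[0].GetHashCode() ^ parts[1].GetHashCode();
    }

    // Cheap: hash the stored string directly, no allocation
    public int FastHash() => HostPort.GetHashCode();
}
```

Both produce a usable hash; the difference is that the slow one pays an allocation and copying cost, on every dictionary lookup, that its author probably never measured.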

  13. What Goes Wrong (II) • Programs use an algorithm that is fundamentally unsuitable • E.g. mostly sorted data passed to a quicksort • Solution • Model your real algorithm with real data before you assume it's OK
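To see why sorted input hurts, here is an instrumented textbook quicksort (naive last-element pivot; the `QuickSortDemo` name and comparison counter are illustrative, not from the talk). On already-sorted data every partition is maximally lopsided, so comparisons grow as n² instead of n log n.

```csharp
using System;

static class QuickSortDemo
{
    public static long Comparisons;   // counts element comparisons

    public static void Sort(int[] a, int lo, int hi)
    {
        if (lo >= hi) return;
        int p = Partition(a, lo, hi);
        Sort(a, lo, p - 1);
        Sort(a, p + 1, hi);
    }

    // Naive Lomuto partition with the last element as pivot. On sorted
    // input the pivot is the maximum, so every element lands on the left
    // and each recursion level shrinks the range by only one.
    static int Partition(int[] a, int lo, int hi)
    {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++)
        {
            Comparisons++;
            if (a[j] < pivot) { (a[i], a[j]) = (a[j], a[i]); i++; }
        }
        (a[i], a[hi]) = (a[hi], a[i]);
        return i;
    }
}
```

For n = 1000, sorted input costs exactly n(n-1)/2 = 499,500 comparisons, versus roughly 15,000-25,000 for shuffled input: an order of magnitude more work for the "easier" data.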

  14. What Goes Wrong (III) • Programs do a lot of work that doesn’t constitute “forward progress” • E.g. converting from one format to another in multiple stages each of which re-copies the data • Solution • Score your algorithms relative to the minimum work required to get the job done • This is the number one reason code is slower than it could be
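A classic instance of "work that isn't forward progress" is string accumulation (the `JoinDemo` name is illustrative): each `+=` allocates a new string and recopies everything accumulated so far, so total bytes copied grow quadratically, while `StringBuilder` appends into a growing buffer and copies each byte only a constant number of times.

```csharp
using System;
using System.Text;

static class JoinDemo
{
    // Each += recopies the entire result so far: O(n^2) bytes copied
    public static string SlowJoin(string[] parts)
    {
        string result = "";
        foreach (string p in parts) result += p;
        return result;
    }

    // StringBuilder appends into a doubling buffer: O(n) bytes copied
    public static string FastJoin(string[] parts)
    {
        var sb = new StringBuilder();
        foreach (string p in parts) sb.Append(p);
        return sb.ToString();
    }
}
```

Scoring against the minimum work required, the output is n bytes, so anything that copies n² bytes is mostly recopying, not forward progress.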

  15. What Goes Wrong (IV) • Programs are designed to do more than they need to do • E.g. arbitrary extensibility even to the point where it is too complicated to be useful • Solution • Focus only on your customers' needs • Usable first, then re-usable

  16. Agenda • Teaching Performance Culture • Managed Code • CPU Optimization • Memory • Threading • Web application scalability • Web application responsiveness

  17. Selected Topics (I) : Object Lifetime • Good lifetime looks like single digit time in the collector • Bad lifetime looks like Mid-life Crisis in the Datacenter

  18. Selected Topics (II) : JIT, Ngen, GAC • Many consequences, no perfect answer • Jitting implies private pages • Why is that bad? • Why can it be good? • Ngen is designed to combat these effects • But it’s a mixed blessing too • Shareable code has its costs/benefits • GAC increases the usefulness, also at cost

  19. Selected Topics (III) : Measurement • Choose the right tool for the right problem • Identify sources of consumption (perfmon) • Consider • CPU profilers (like the one in Visual Studio Team System) • Memory profilers, like CLRProfiler • Resource trackers like filemon, regmon • .NET Stopwatch, and others… more on this later today • ETW gives you the best of everything, especially on Vista – but the tooling is still immature

  20. Selected Topics (IV): Collection Classes • The most glaring performance problems (e.g. enumeration of ArrayList) were addressed by the generic collections, a chance for a redo • Beware of using collections as your “flagship storage” – they are not the most frugal
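A small sketch of the `ArrayList` cost the slide alludes to: storing an `int` in an `ArrayList` boxes it into a heap object, and reading it back needs an unbox plus a runtime type check, whereas `List<int>` stores the values inline in its backing array.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class CollectionsDemo
{
    // Pre-generics: Add(42) boxes the int into a new heap object;
    // the read needs a cast (unbox + type check).
    public static int ViaArrayList()
    {
        var arrayList = new ArrayList();
        arrayList.Add(42);
        return (int)arrayList[0];
    }

    // Generic: the int is stored inline, no boxing, no per-element
    // heap allocation, no cast on read.
    public static int ViaList()
    {
        var list = new List<int>();
        list.Add(42);
        return list[0];
    }
}
```

The results are identical; the difference is one heap allocation and GC-tracked object per element in the `ArrayList` version.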

  21. Selected Topics (V): Exceptions • Managed code pays less for the presence of exceptions, but pays more for the throws • rich state capture, complex state examination • Use of exceptions for anything unexceptional can easily torpedo your performance • Exceptions access “cold” memory
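The "unexceptional exception" pattern is easy to illustrate with parsing (the helper names are mine): catching `FormatException` for expected bad input pays for the throw and rich state capture on every failure, while `TryParse` reports failure in its return value with no exception machinery on either path.

```csharp
using System;

static class ParseDemo
{
    // Unexceptional failure via exceptions: every bad input pays for a
    // throw, state capture, and a catch -- far more than the parse itself.
    public static int ParseOrZeroSlow(string s)
    {
        try { return int.Parse(s); }
        catch (FormatException) { return 0; }
    }

    // TryParse reports failure in the return value: no throw, none of
    // the cold-memory exception machinery on the common path.
    public static int ParseOrZeroFast(string s)
        => int.TryParse(s, out int v) ? v : 0;
}
```

Reserve exceptions for genuinely exceptional conditions; anything that fails routinely (user input, network data) deserves a Try-style API.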

  22. Final words • Understand your goals • Understand the costs of what you use • Be ruthless about measuring so that you’ve done the full job • Keep reading and experimenting so you can learn aspects of the system that are most relevant to you • Share your wisdom with your friends • Insist on performance culture in your group • Don’t forget Rule #1 and Rule #2 !!!

  23. References • Rico Mariani’s Performance Tidbits • http://blogs.msdn.com/ricom • Patterns and Practices Performance References • http://msdn.microsoft.com/en-us/library/aa338212.aspx • “Maoni’s Weblog” • http://blogs.msdn.com/maoni/ • “If broken it is, fix it you should” • http://blogs.msdn.com/tess/

  24. CPU Optimization for .NET Applications Vance Morrison Performance Architect Microsoft Corporation

  25. Overview: How to Measure • Along the way • A Little Theory • Pitfalls to avoid • Tricks of the trade • Low Tech: Stopwatch • Medium Tech: MeasureIt • Higher Tech: Sample Based Profiling (CPU) • Future: Instrumenting your code for perf

  26. Measure, Measure, Measure • This talk is about exactly how (Demos!) • It's all about TIME • We virtualize most other resources, so everything is in the currency of 'time' • Measuring Time • Low Tech: System.Diagnostics.Stopwatch • Medium Tech: MeasureIt (automates Stopwatch) • Medium Tech: Use ETW (Event Tracing for Windows) • Higher Tech: Sample Based Profiling • The key is to make it EASY so you will DO it

  27. Low Tech: System.Diagnostics.Stopwatch • This technique is surprisingly useful • Stopwatch is a high-resolution timer • Very straightforward:
      Stopwatch sw = Stopwatch.StartNew();
      // Something being measured
      sw.Stop();
      Console.WriteLine("Time = {0:f3} MSec", sw.Elapsed.TotalMilliseconds);
  • Pitfalls • Measuring very small times (< 1 usec) • Clock skew on multiprocessors (each CPU has a clock) • CPU throttling • Variance in measurements (noise) • Dubious extrapolations
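A minimal harness addressing the pitfalls above (the `Bench.MeasureMs` name and structure are my own sketch, not the MeasureIt tool): run the action once unmeasured so JIT compilation and cache warm-up are excluded, then time several runs and report the minimum, which is the estimate least contaminated by noise.

```csharp
using System;
using System.Diagnostics;

static class Bench
{
    // Sketch of a noise-aware Stopwatch measurement: one untimed
    // warm-up run (JIT + caches), then the minimum of several timed
    // runs. Make each run long enough to be well above 1 usec.
    public static double MeasureMs(Action action, int runs = 5)
    {
        action();                         // warm-up, not timed
        double best = double.MaxValue;
        var sw = new Stopwatch();
        for (int i = 0; i < runs; i++)
        {
            sw.Restart();
            action();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;                      // least-noisy estimate
    }
}
```

Reporting the minimum rather than the mean is a deliberate choice here: interference (throttling, other processes) only ever adds time, so the smallest observation is closest to the true cost.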

  28. For CPU, Run CPU at Max Clock • If you don't do this, the CPU is typically throttled to save power • Throttling causes less consistent measurements • 1) Go to the Control Panel's Power Options • 2) Set to High Performance

  29. demo What Runtime Primitives Cost (MeasureIt)  Vance Morrison

  30. Sample Based CPU Profiling • Next easiest technique • Visual Studio uses this by default • Theory • At periodic intervals (e.g. 1 msec) the CPU is halted • A complete stack trace is taken • Afterward, samples with the same stack are combined • If the sampling is statistically independent • Then the number of samples is proportional to the time spent at that location • Good attributes • Low overhead (typically < 10%) • No change to program needed • Less likely to perturb the perf behavior of the program • Does not measure 'blocked' time

  31. Pitfalls of Sampling • Samples measure CPU time, NOT real time • Samples are not taken when the process is not running • Thus only useful if CPU is the bottleneck! • The VS profiler does not sample while in the OS, thus system time is also excluded • You need enough samples to have high accuracy • In general, error is proportional to 1/SQRT(n) • 10 samples has 64% error • 100 samples has 20% error • 1000 samples has 6% error
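The error figures on the slide match a 2/sqrt(n) relative-error bound, which I read as the usual two-sigma (~95% confidence) rule for a count of n independent samples; a quick sketch of that arithmetic:

```csharp
using System;

static class SamplingError
{
    // Relative error (%) at ~95% confidence for a count of n
    // independent samples, using the 2/sqrt(n) two-sigma rule of thumb.
    public static double ErrorPercent(int n) => 200.0 / Math.Sqrt(n);
}
```

This reproduces the slide's numbers: ErrorPercent(10) is about 63%, ErrorPercent(100) is exactly 20%, and ErrorPercent(1000) is about 6.3%, so a hot spot needs on the order of a thousand samples before its measured share is trustworthy to within a few percent.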

  32. demo Sample Based ProfilingIn Visual Studio  Vance Morrison

  33. Future: Event Tracing For Windows • Windows already has fast, accurate logging • The XPERF tool will display the logged data • The Framework already exposes ETW logging • System.Diagnostics.Eventing.EventProvider • However it is not easy to use end-to-end • We are working on it • We will have more offerings in the next release • It is a complete talk just by itself • If you need logging NOW you CAN use EventProvider and xperf • If you can wait a year, it will be significantly nicer • If there is interest, we can have an 'Open Space' discussion

  34. Resources (Keywords for Web Search) • Measure Early, Measure Often for Performance (MeasureIt) • Visual Studio Profiler Team Blog • CLR Performance Team’s Blog • Instructions on investigating suspicious DLL Loads. • Improving .NET Application Performance and Scalability • Lutz Roeder .NET Reflector for inspecting code. • Xperf (Pigs can Fly) • Vance Morrison’s Blog • Rico Mariani’s Blog

  35. Related Sessions Related Labs

  36. Memory Optimization for .NET Applications Vance Morrison Performance Architect Microsoft Corporation

  37. Outline • When Should You Care about Memory? • Theory • Memory usage breakdown of a process • The garbage collected heap • Characteristics of the GC • Practice (Tools) • Process level (Task Manager / PerfMon) • Module level (VaDump) • Object level (CLRProfiler)

  38. When Memory Affects Time • If your computation is not local, memory matters! • Cold Startup: at first, all CODE comes from disk • Disk can transfer at best 50 Meg/sec, at worst 1 Meg/sec • However, the OS caches disk reads (thus only 'new' code is bad) • Sluggishness when App Switching • Start App 1 • User switches to App 2 • App 2 'steals' physical memory of App 1 (paged out) • User switches back to App 1, memory must be paged in • Servers are constantly 'app switching'

  39. Memory Breakdown of a .NET App • Memory Mapped Files (Mostly Shared) • Loaded DLLs • Code / Read-Only Data (Can be shared) • Read/Write data (Private to one process) • Other Mapped Files (Fonts, Registry) • Dynamic Data (Not Shared) • Unmanaged Heaps • Stack • System support (VM Page Entries, TEB) • Direct VirtualAlloc calls • Runtime’s Garbage Collected (GC) Heap

  40. Viewing an App's Memory Breakdown • Task Manager (built in) • Working Set and Private Working Set are the interesting columns • Commit Size is NOT interesting • Can't distinguish between shareable and shared • PerfMon (built in) • Can show the same information as Task Manager • Can display it as an ongoing graph • Can get details on the .NET GC heap • VaDump (free download) • Shows breakdown within a single process • Shows down to DLL granularity • Shows the Shareable vs Shared distinction

  41. Tools for viewing Memory Use • Task Manager • Win-R -> taskmgr, then Select Columns • Working Set tends to overestimate memory impact (it includes shared OS files) but does not account for read files • Private Working Set underestimates impact • Working Set: Small < 20 Meg, Med ~ 50 Meg, Large > 100 Meg • Private: Small < 5 Meg, Med ~ 20 Meg, Large > 50 Meg

  42. Getting Overview of GC Memory • PerfMon (Performance Monitor) • Win-R -> PerfMon

  43. Performance Monitor: Add Counters (adding new counters)

  44. Performance Monitor: GC Counters • 1) Open .NET CLR Memory • 2) Select counters • 3) Select process • 4) Add • 5) OK

  45. Performance Monitor: GC Info • Set Display to Report • GC Heap Size and % Time in GC: here, 7.3 Meg of the 8.6 Meg of private working set is the GC heap • % Time in GC is ideally less than 10% • Ratios of GC generations: Gen 0 = 10 x Gen 1, Gen 1 = 10 x Gen 2
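The same numbers PerfMon shows are also available in-process through the `System.GC` API, which is handy for logging heap size and collection counts from the app itself (the `GcStats` wrapper name is mine):

```csharp
using System;

static class GcStats
{
    public static string Snapshot()
    {
        // Heap size as the GC last observed it (no forced collection)
        long heapBytes = GC.GetTotalMemory(forceFullCollection: false);

        // Cumulative collections per generation. A healthy app shows
        // roughly gen0 : gen1 : gen2 = 100 : 10 : 1, matching the
        // 10x ratios on the slide.
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        return string.Format("heap={0:n0} bytes gen0={1} gen1={2} gen2={3}",
                             heapBytes, gen0, gen1, gen2);
    }
}
```

Because a gen 2 collection also collects gens 0 and 1, the counts are always non-increasing from gen 0 to gen 2; a gen 2 count close to the gen 0 count is the in-process symptom of the "Mid-life Crisis" pattern mentioned earlier.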

  46. Drilling In Further: VaDump • VaDump -sop ProcessID • Shows 1) total WS, 2) breakdown, 3) DLL breakdown • The GC heap appears in the breakdown • Only the DLL portions affect cold startup

  47. Typical Memory Breakdown Results • Memory Is Mostly from DLL Loads • Typically cold startup is very bad (since the data must come from disk) • Private + Shareable-but-not-Shared is the metric of interest • Eliminating unnecessary (unshared) DLL loads is the first attack • See the CLR Performance Team blog on tracking down DLL loads • Reducing the amount of code touched is next • Memory Is Mostly In the GC Heap • Does not affect cold startup much but can affect warm startup • If the GC heap is large (> 10s of Meg), it is probably degrading throughput • GC time is proportional to the number of pointers in surviving data • In Either Case, when the Working Set Is Large (> 10 Meg) • Throughput is lost due to cache misses • Server workloads are typically cache limited

  48. Fixing Memory Issues: Prevention! • Fixing memory issues is HARD • Usually a DESIGN problem: not pay-for-play • Using every new feature in your app • XML, LINQ, WPF, WCF, Serialization, Winforms, … • Initializing all subsystems at startup • GC memory is your data structures • They tend to be designed early • And are hard to change later • Thus it pays to think about memory early!

  49. Some Theory on the .NET GC Heap • Compacting garbage collector • The runtime traces all pointers into the GC heap • Fast: allocations under 85K just 'bump a pointer' • When the heap is full, a GC happens and memory is typically compacted • Objects allocated on the same thread end up together • The .NET GC is a generational collector (3 gens: 0, 1, 2) • Gen 0 • All memory is allocated out of gen 0 • Ideally the size of gen 0 < L2 cache size • GC of gen 0 takes little time (e.g. 0.2 msec) • Gen 1 • Memory that survived 1 GC. Ideally #Gen 0 GCs = 10 x #Gen 1 GCs • GC of gen 1 takes longer but is modest (e.g. 1 msec) • Gen 2 • All older objects (including large objects). Can be very large • GC of gen 2 can take a noticeable amount of time (e.g. 8 msec / Meg, so ~160 msec for a 20 Meg heap) • Time depends on • Amount of memory surviving • Number of GC pointers in surviving memory • Fragmentation of the heap
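The promotion path through the generations can be watched directly with `GC.GetGeneration`. A small sketch (promotion on each survived collection is typical observed behavior, not a documented contract, so the comments hedge accordingly):

```csharp
using System;

static class GenerationDemo
{
    // Returns the object's generation when new, after surviving one
    // forced collection, and after surviving a second one.
    public static int[] Trace()
    {
        object survivor = new object();
        var gens = new int[3];

        gens[0] = GC.GetGeneration(survivor); // 0: new objects start in gen 0
        GC.Collect();                         // survives one collection
        gens[1] = GC.GetGeneration(survivor); // typically promoted to 1
        GC.Collect();                         // survives another
        gens[2] = GC.GetGeneration(survivor); // typically 2, the last stop

        GC.KeepAlive(survivor);               // keep it rooted throughout
        return gens;
    }
}
```

An object that reaches gen 2 is only reclaimed by the expensive full collections described above, which is why temporary objects that survive "just long enough" to be promoted (the Mid-life Crisis pattern) are so costly.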

  50. More .NET GC Heap Theory • GC heap looks like a Sawtooth • Typical Gen 2 Peak / Trough Ratio ~ 1.6 • Ratio mostly independent of heap size • Keep in mind no other fragmentation
