Performance Investigations with Analysis Services 2012

Performance Investigations with Analysis Services 2012 Akshai Mirchandani AkshaiM@Microsoft.com Principal Software Design Engineer

Session Overview • Tabular versus Multidimensional: a very brief overview • Performance investigation tools and techniques • Formula engine performance • Frequently asked questions/issues • Common pitfalls for processing performance • Tuning Analysis Services for high end hardware • Q&A

Tabular versus Multidimensional • Common • Configuring overall server memory • Heap Type • Tuning query thread pools • Data source performance • Tuning MDX queries

Tabular versus Multidimensional • Multidimensional • Optimizing disk system • Processing performance • Parallel processing of partitions • Performance of building aggregations • Performance of calculated members, scope assignments, unary operators, and other advanced calculations • Tuning IO thread pools (SSAS 2012) • Multi-user settings (CoordinatorQueryBalancingFactorand CoordinatorQueryBoostPriorityLevel)

Tabular versus Multidimensional • Tabular • Memory, memory, memory… • DAX query performance • No parallel processing of partitions (yet) • Query performance versus Processing performance? • Adjusting compression levels lets you favor processing time versus query time and vice versa • Hardware: CPU and Memory performance • Much more noticeable for an in-memory system – just like disk performance is more of an issue for MOLAP • Often the smaller systems show significantly better performance • No NUMA issues, and the server-grade CPUs take longer to become available

Performance Investigation Tools • Profiler • XEvents • Lighter weight than Profiler • You can merge the events with a Windows ETW trace and look at CPU, I/O together was SSAS events • Currently the set of XEvents is the same as Profiler events – but in the future newer XEvents should show up • MDX • Use MDX queries and MDX calculation modifications to identify/isolate performance problems • DMVs • System DMVs like DISCOVER_PARTITION_STAT, DISCOVER_MEMORYUSAGE, DISCOVER_OBJECT_ACTIVITY. • ETW (Event Tracing for Windows) • This is a Windows capability – to efficiently gather fine-grained information about CPU, I/O, wait states, etc. • This can show which partition/aggregation files are “hot”, and perhaps you can tune them to be on fast disks

Demo • XEvents • Profiler • SSAS 2012 Calculation Events • Sample tool

Identifying Query Problems • Identify the queries that take a long time to run • Test each query in isolation • Determine if the issue is primarily in the Formula Engine (FE) or the Storage Engine (SE) • Test the parts of the query • Ensure to test cold/warm cache • Ensure to test after running clean cache • After identifying individual query performance, ensure you are also running concurrency query tests

Identifying the Long Duration Queries • Run a trace that saves to a database table • Select TOP X • Filter by the Query End event (EventClass ID = 10) • Order by Duration • Include the TextData field • Include the CurrentTime field To identify shorter running queries that run a lot and burn CPU • Order by Duration Descending • GROUP BY TextData field • SUM by Duration

Determining Storage Engine or Formula Engine • Run a trace while testing the query • Sum the duration field for the Subcube (EventClass = 11) • This is the amount of time the query spends in the Storage Engine • Subtract the sum of the duration field from the duration in the Query Cube End event (EventClass = 10) • This is the amount of time the query spends in the Formula Engine • Also look at the count of subcube events; many == FE issue • Do this both with a cold cache and a warm cache • Focus on Storage Engine or Formula Engine issues depending on: • Query duration is <30% SE means you focus on FE. • Query duration is >= 30% SE means you first need to rule out SE before focusing FE

Testing the Parts of the Query • Comment out all script commands in the MDX script • If it's slow with the full MDX script but fast with an empty MDX script, then use MDX Script Performance Analyzer to identify the problem MDX script assignment: http://mdxscriptperf.codeplex.com/ • It may be necessary to start from a stripped down cube, with all of the following removed: • The MDX script, except for the Calculate command • All unary operators • All measure expressions • All custom member formulas • All Semi-additive measures • All Custom Rollup properties • If on SQL Server 2008 R2, run Profiler because new trace events to track resource usage and locks • http://support.microsoft.com/kb/2458438 • If on SQL Server 2012, take a look at the new advanced calculation trace events • Note that they are expensive, so avoid turning them on in production! • Continued

Checking on Partitions and Aggregations • If the query is spending the majority of the time in the Storage Engine, examine the partitions and aggregations • Look in the trace for the partitions that the query is hitting to ensure that no unnecessary partitions are being queried • For example, if you are querying only 2011 data, the query should not also be hitting 2009 and 2010 partitions • In that case, is there something wrong with the MDX? • Or is there something wrong in the slice or data within the partition? • Look in the trace to see if any aggregations are being used by the query • Consider using Query Subcube Verbose to understand more details • If no aggregations are being used, consider adding appropriate aggregations for the query

Common Formula Engine slow downs • Block mode – won’t go into this • Lots of discussion in the performance guide • Stop by the SQL Clinic if you have questions • Rewriting crossjoins to put hierarchies from the same dimension together • Non Empty • Removing arbitrary sets • Not overwriting an empty cell with a non-null value

Putting Hierarchies From the Same Dimension Together • Sometimes you see: • [Product Category] * [Year] * [Product Subcategory] * [Product] • Why not do this instead? • [Product Category] * [Product Subcategory] * [Product] * [Year] • Advantages? • Auto-Exists can optimize the latter pattern to query each dimension independently • If the server has to do “cross dimensional auto-exists”, huge performance hit • Particularly if there is an arbitrary shape in the subselect • A different algorithm, which can be much more expensive • Performance optimizations in R2 and 2012 to improve this, but it still helps • On occasion it’s unavoidable • Usually it’s a matter of educating users of Excel to “first drag all the product attributes, and then drag Years” • Or better, try to hide the attribute hierarchies and only give them a user hierarchy

Non Empty • Related to auto exists • Two algorithms: • Naïve and Express • Trace events to show which algorithm • Issues: • Non Empty can be expensive • Not using Non Empty is not feasible • Watch out for query patterns: • Non Empty • Arbitrary shapes in subselect • Calculations • Causes Non Empty to first apply auto-exists, and then apply Non Empty • Applying auto-exists can be *very* expensive

Arbitrary Shapes • An arbitrary shape is any set that cannot be expressed as a crossjoin of members from the same level of an attribute hierarchy • Arbitrary shapes can only use the query cache if they are used in a subselect, in the WHERE clause, or in a calculated member • Arbitrary Set: • {(Food, USA), (Drink, Canada)} • {customer.geography.USA, customer.geography.[British Columbia]}. • Note that an arbitrary shape on the query axis does not limit the use of any cache

Processing Pitfalls (Dimensions) • SELECT DISTINCT … • Do you have appropriate indexes in the relational database? • Are you using the fastest provider? • Wherever possible, use a native provider (except…) • Have you optimized data access? • E.g. Network Packet Size? • Attribute relationships? • The “key” attribute is the most expensive part of dimension processing • Can you optimize it? • Consider/test processing with ByTable (versus ByAttribute) • Note that I don’t actually recommend it, but it can make a positive difference in some scenarios

Processing Pitfalls (Dimensions) • Why is ProcessUpdate so slow!!!?? • It has to fetch *all* the data in the dimension • Check which members already exist, update them (track if they changed) • If a new member shows up, add it • It has to invalidate indexes/aggregations in partitions that reference it • It has to rebuild indexes/aggregations • Optimizations? • Rigid attribute relationships help this, but hurt flexibility • Discussion…

Processing Pitfalls (Partitions) • Processing in parallel • MaxActiveConnections • Enough threads? • Speed of reading data • How fast is your data source? • Speed of looking up dimension keys – too many dimensions? • Building indexes and aggregations • Quota limitations • DataStorePageSize? • NUMA? • Reduce aggregations? Number and size… • Reduce indexes?

Tuning AS for High End Hardware (1) • What is your scenario? • Throughput? • Processing window? • Query response time? • All of the above? Sigh… • Lots of effort went into this area for SSAS 2012 • Fruits of it also show up in the 2008 R2 Operations/Performance guides • Windows Heap instead of AS Heap • For machines with lots of cores and active threads, Windows heap gives better perf • AS heap is faster for lower concurrency scenarios • Windows Heap is the new default in SSAS 2012 • DataStorePageSize/DataStoreHashPageSize • Pages allocated from the 8 KB page pool show contention • For large results (e.g. processing aggregations/indexes), this can hurt significantly • Using 64 KB avoids this contention • In SSAS 2012, this size is auto-selected – it picks 64 KB for storage engine caches

Tuning AS for High End Hardware (2) • NUMA (Non-Uniform Memory Access) • High end hardware with lots of cores is usually NUMA • Causes slowdowns in performance • “I upgraded from 16 cores to 32 cores and my performance is about the same (or worse)” • Some work was already done in 2005 • An attempt was made to return pages from “nearby memory” • Some more performance work has been done to tune this in SSAS 2012 • Windows File System Cache is affected by the same problem • If a partition file is first scanned on NUMA node 0, the cache pages are allocated from NUMA node 0 memory • Later accesses on any other NUMA node will pay a greater price to access this “distant memory” • SSAS 2012 added a new I/O Process thread pool • The I/O Process thread pool handles all segment scans • In fact, under the covers there are multiple I/O Process thread pools created – one per NUMA node (**) • A partition chooses one of those thread pools and will try to stick with it – thus attempting to avoid the problem ** Caveats exist – default is split the thread pools only if there are >= 4 NUMA nodes

Tuning AS for High End Hardware (3) • Other bottlenecks • Can turn off Resource Monitoring with a configuration setting • The results of some DMVs will become invalid (no error is raised) • Can configure Analysis Services to open files in RANDOM mode • This changes the way Windows handles the pages in the file system cache • It can generate some good performance gains on high end hardware • Risk: only do it if the machine has enough memory – the setting will cause pages to stay in memory longer and the system can start choking

Additional Resources Analysis Services Maestros training Analysis Services Performance Guide Analysis Services Operations Guide SQL CAT team http://blogs.msdn.com/b/psssql/archive/2012/01/31/analysis-services-thread-pool-changes-in-sql-server-2012.aspx

Performance Investigations with Analysis Services 2012