Capacity Planning Testing for SharePoint 2007 Dario Mratovich Consultant Microsoft OFC309
Session Objectives and Takeaways • Learn how to determine the real throughput requirements for a farm • Understand how to do capacity planning testing on your SharePoint farm • NOT a primer on how to use VSTT 2008 • Learn some of the different ways in which you can increase capacity in your farm
Agenda • Know the goal of your testing • Know what to measure • Determine the throughput requirements for your farm • Create the test environment • Create the tests and custom tools • Run tests and analyze results
Know the Goal of Your Testing Determine what you want the end result to be – what are you hoping to prove? While this seems like a simple concept, many people who are testing for the first time find it difficult to answer this question specifically.
Know the Goal of Your Testing • Measuring RPS • Most common • Measuring specific operations • Impact on or impact by normal ops • Measuring how much / how fast content can be indexed • Measuring performance of individual pages: • TTLB for pages with custom code • Clients over latent networks • Typically one of two categories: • Capacity planning verification – yes this will work! • Proof of concept – how much can I do?
Know the Goal of Your Testing: Example of not doing capacity planning • England and Wales 1901 Census website • Launched in January 2002 • 30 million hits per day • Servers and application couldn’t keep up with demand • Took 8 months to redevelop, redeploy and test
Know What to Measure Knowing the goal of your testing will make it easier for you to decide which metrics to capture and what thresholds you need to meet for each metric when running your tests.
Know What to Measure • RPS – most tests are based on how many requests you can service. Requests Per Second is the best number for a test like that. RPS can be used for measuring how many pages are delivered as well as things like how many searches are executed.
Know What to Measure • Page Time – also known as Time To Last Byte (TTLB), this tells you how long it takes to deliver a page back to the client. Often you will use this value in conjunction with RPS; for example, our farm needs to deliver 100 RPS and pages should load within 5 seconds.
Know What to Measure • Crawl Time – for measuring crawl performance, measure • Overall time the crawl takes, • Corpus size, and • Number of documents being indexed per second • Document indexing rate is a little more complicated: • Office Server Search Indexer Catalogs – Documents Filtered gives you the total number of documents that have been indexed per content source • Office Server Search Gatherer Projects – Document Add Rate gives you the number of items indexed per second per content source • Office Server Search Gatherer – Documents Filtered Rate gives you the overall number of items being indexed per second
Determine the Throughput Requirements for Your Farm The rest of this section will focus on testing for farm throughput, or RPS, since it is the most common scenario by a wide margin. Determining the RPS needed to support your farm can be a complicated process.
Determine the Throughput Requirements for Your Farm • Rule #1: Number of users means nothing! • Rule #2: Number of users means nothing!! • Rule #3: Number of users means nothing!!! • I can support 100,000 users on a single-server farm running on my laptop if they each average 1 request every 12 hours • 100k requests over 12 hours (43,200 seconds) is roughly 2.3 RPS
Determine the Throughput Requirements for Your Farm • First determine the number of RPS needed • Use historical data if possible (IIS logs and Log Parser, WebTrends, etc.) • Otherwise, start with the number of users • Divide users by usage profile • What percentage are light (20 requests per hour), typical (36), heavy (60), or extreme (120)? • Multiply the number of users by the number of requests per hour for each usage profile • Factor in peak concurrency
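Deriving RPS from historical IIS logs can be sketched in a few lines. The slide names Log Parser for this job; the Python below is only a stand-in to show the idea, assuming W3C extended log files in which the `date` and `time` fields are enabled. The function name `peak_rps` and the sample log lines are made up for illustration.

```python
from collections import Counter


def peak_rps(log_lines):
    """Return (peak requests in one second, average requests per busy second).

    Assumes W3C extended format: a '#Fields:' directive names the columns,
    and each request line is space-separated in that column order.
    """
    fields = []
    per_second = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # Column layout can change mid-file, so re-read it every time.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives (#Software, #Version, #Date)
        parts = line.split()
        if "date" not in fields or "time" not in fields:
            continue  # cannot bucket requests without a timestamp
        if len(parts) < len(fields):
            continue  # truncated line
        key = (parts[fields.index("date")], parts[fields.index("time")])
        per_second[key] += 1
    if not per_second:
        return 0, 0.0
    peak = max(per_second.values())
    # Average over seconds that saw at least one request, not wall-clock time.
    average = sum(per_second.values()) / len(per_second)
    return peak, average


# Hypothetical sample lines, for illustration only.
sample = [
    "#Software: Microsoft Internet Information Services 6.0",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2009-08-03 09:00:01 GET /default.aspx 200",
    "2009-08-03 09:00:01 GET /_layouts/images/logo.gif 200",
    "2009-08-03 09:00:02 GET /Pages/news.aspx 200",
]
print(peak_rps(sample))
```

On a multi-server farm, the logs from every web front end would need to be combined before counting, since each server only sees its share of the traffic.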
Determine the Throughput Requirements for Your Farm • Contoso has 80k employees; up to 40k may be at work during any 8 hour window • 80k users, 40k active, concurrency 5% to 10% at peak • 10% light, 70% typical, 15% heavy, 5% extreme • (10% light x 40k) x 20 RPH = 80,000 RPH • (70% typical x 40k) x 36 RPH = 1,008,000 RPH • (15% heavy x 40k) x 60 RPH = 360,000 RPH • (5% extreme x 40k) x 120 RPH = 240,000 RPH • 1,688,000 / 3600 (seconds per hour) = 469 RPS • 469 x 10% peak = 46.9 RPS required
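The Contoso arithmetic above can be reproduced with a short script, which makes it easy to try a different profile mix or concurrency figure. This is a minimal sketch: the function and variable names are illustrative, and percentages are kept as integers so the totals match the slide exactly.

```python
# Requests per hour (RPH) for each usage profile, as given in the slides.
REQUESTS_PER_HOUR = {"light": 20, "typical": 36, "heavy": 60, "extreme": 120}


def required_rps(active_users, mix_percent, peak_concurrency_percent):
    """Return (total RPH if every active user were working, RPS required at peak)."""
    total_rph = sum(
        active_users * percent * REQUESTS_PER_HOUR[profile]
        for profile, percent in mix_percent.items()
    ) / 100
    rps_all_active = total_rph / 3600  # 3600 seconds per hour
    return total_rph, rps_all_active * peak_concurrency_percent / 100


contoso_mix = {"light": 10, "typical": 70, "heavy": 15, "extreme": 5}
total_rph, rps = required_rps(40_000, contoso_mix, 10)
print(total_rph, round(rps, 1))
```

With the 10% peak concurrency this gives the 46.9 RPS shown above; passing 5, the lower end of the stated concurrency range, gives about 23.4 RPS.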
Determine the Throughput Requirements for Your Farm • Now you know RPS required, but what should those requests be doing? • This determines your test mix • Again, look to any historical info you can get if possible • Otherwise, as in most cases, start making educated guesses • Start with test mixes in Planning for Software Boundaries documents on TechNet
demo Introducing Visual Studio Team Test: Building a simple Web Test
Create the Test Environment Once you’ve decided what your test goals are, you need to design the environment in which your tests will run. This includes not only the SharePoint environment, but also the operational environment.
Create the Test Environment: Schematic [Figure: test tools, infrastructure, and SharePoint environment]
Create the Test Environment: Configuring Test Servers • VSTT Controller • VSTT Agent(s) • Use a separate SQL Server for VSTT results • Turn off anti-virus software on load test controller and agents • Make sure all network settings are configured correctly, names are resolvable, etc. • Watch out for bottlenecks on your agents and controller
Create the Test Environment: Plan your infrastructure • Active Directory • Forest and Domain • Users • Do you have enough DCs for the authentication load? • Monitor lsass.exe on the DCs • DNS • Load Balancing • Remember, names are only resolved once • A test agent will route all requests for a namespace to the same IP address once it’s initially resolved
Create the Test Environment: Configure SharePoint • Stop the WSS timer and Admin services, as well as profile imports and crawls (unless testing these) • Enable BLOB and/or output cache, if appropriate • All pages should be published; nothing checked-out • Make sure navigation is realistic • Make sure you test a wide sample of pages • Stop anti-virus software (unless you are measuring performance when using SharePoint integrated AV) • Write scenarios will change the content database • Restore from backup after a test run that includes writes
Create the Test Environment: Other SharePoint Tasks • How many users are you going to have, and in what roles? • How are you going to populate those SharePoint roles with AD users and groups? • Do you need • Audiences? • Profile imports? • Search content sources? • To import profiles and crawl content prior to running your tests? How long will it take?
Create the Test Environment: Preparing Test Data • Make sure you have adequate sample data • A very common stumbling block • Have enough content for a reasonable search corpus • Uploading the same document many times – sometimes hundreds or thousands of times – with different names each time CAN HURT YOU • You will probably need tools to populate sample data • http://www.codeplex.com/sptdatapop • You almost always end up writing additional tools for other data population tasks
Create the Tests and Custom Tools The next step is to create the actual tests that are going to be run. Tests are often data-driven, meaning test parameters are read from a database or CSV file rather than hardcoded, so that the entire test site is hit over the course of a test. That may mean developing additional custom tools to capture all of the unique sites, pages, lists, and items in the site so that they can be plugged into your tests.
Web Tests Best Practices • After recording a test, change hardcoded values (URLs, parameters, etc.) to pull from a data source • Always have validation rules (the error.aspx page returns HTTP 200, so a status check alone will not catch it) • Use multiple users and user roles • The system behaves differently per user role • Avoid using farm admin (it does not benefit from caching) • Model client apps such as RSS, Outlook, OneNote
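Two of these practices, data-driven parameters and validation rules, can be illustrated outside VSTT. The sketch below is not VSTT code and makes no network calls. It assumes test URLs and users live in CSV data, and that a response is only valid if the status is 200 and the final URL is not the error.aspx page. The column names, site paths, and user names are hypothetical.

```python
import csv
import io
from itertools import product


def load_rows(csv_text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def is_valid_response(status, final_url):
    """Validation rule: a 200 alone is not enough, because the error page
    is also served with status 200."""
    if status != 200:
        return False
    return "error.aspx" not in final_url.lower()


# Hypothetical data sources; in practice these come from files or a database.
pages = load_rows(
    "url\n/sites/hr/default.aspx\n/sites/it/Pages/news.aspx\n/sites/it/Lists/Tasks/AllItems.aspx\n"
)
users = load_rows("username,role\nreader01,Visitor\nauthor01,Member\n")

# Every user role visits every page, so no single role or URL dominates.
plan = [(user["username"], page["url"]) for user, page in product(users, pages)]
print(len(plan), plan[0])

print(is_valid_response(200, "/sites/hr/default.aspx"))
print(is_valid_response(200, "/_layouts/error.aspx"))
```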
Load Test Best Practices • Validate your test mix; a bad mix can make the results unreasonably fast or slow • Start in a well-known system state • Restore a backup before each run • Be sure to update stats and defrag indexes • Perform an iisreset • Define a warmup period for the test; the first request after iisreset does a lot of things • Think times – Yes or No? • Virtually impossible to model correctly • If you use them, you’ll need a higher user load to generate high RPS, which can stress the agents
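Why think times push the user load up can be estimated with Little's law: the number of concurrent virtual users is roughly the target RPS multiplied by the time each user spends per request (response time plus think time). The sketch below is an approximation that assumes each virtual user issues one request, waits for the response, and then pauses. The 0.5 second response time and 10 second think time are made-up inputs.

```python
import math


def users_needed(target_rps, avg_response_s, think_time_s=0.0):
    """Approximate virtual users needed to sustain target_rps (Little's law)."""
    return math.ceil(target_rps * (avg_response_s + think_time_s))


print(users_needed(100, 0.5))        # no think time
print(users_needed(100, 0.5, 10.0))  # with a 10 second think time
```

In this example, adding the think time raises the required user count from 50 to 1,050 for the same 100 RPS, which is the extra work the test agents have to carry.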
Sample Tests and Data Population • http://www.codeplex.com/sptdatapop • Published by the PG, based on an internal tool • Sample tests cover many basic SharePoint operations • Use as a reference to build new tests • Data population tool defines hierarchical data in XML format • Nest the tags in the same way as the product object model (OM) • SPSite->SPWeb->SPList->SPListItem • Sites->Webs->Lists->ListItems
Other Tools You May Need • Script to create users in Active Directory • CSV files or database containing data for webs, lists, libraries, list items and documents for use in webtests • Tool to upload sample corpus into the site • sptdatapop tools • May also be able to use your upload document webtest • Tool to create webs, lists, libraries, list items, etc. • sptdatapop tools • Tool to create My Sites • May be able to use the sptdatapop tools • Tool to create a list of users and passwords to run tests
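The last item, a tool that produces a list of users and passwords for the tests, might look like the following sketch. It only generates the CSV; creating the accounts in Active Directory is a separate step not shown here. The `loadtest` prefix, the account-name format, and the password length are assumptions, and generated passwords would still need to satisfy the domain's password policy.

```python
import csv
import io
import secrets


def build_test_users(count, prefix="loadtest", domain="CONTOSO"):
    """Generate DOMAIN\\user account names with a random password for each."""
    return [
        {
            "username": f"{domain}\\{prefix}{i:05d}",
            "password": secrets.token_urlsafe(12),
        }
        for i in range(1, count + 1)
    ]


def to_csv(rows):
    """Serialize the rows to CSV text that a data-driven test can bind to."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["username", "password"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_test_users(3)
print(to_csv(rows))
```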
demo Add a WebTest, Run a Load Test
Run Tests and Analyze Results Your tests have been created and you are now running them. What are some of the bottlenecks you can look for? What are ways you can scale your farm up further?
Questions to Ask Yourself • Where is the bottleneck? • There will ALWAYS be one • And is it where I expect it to be? • Can performance be improved? How? • Are there spikes in throughput? Are there any strange patterns? • Are there many errors? • Unexpected resources getting downloaded? • Any Load Balancing Issues?
Investigation Techniques • Ensure correct configuration (HW, SharePoint Settings, Load Balancer, etc.) • Divide and conquer • Run a simple litmus scenario (e.g. homepage) • Isolate workloads / operations / pages • Reduce farm topology • Read vs. Read-Write scenarios
Investigating with VSTT • Use Tables view to: • Find error rate of individual transactions • TTLB per operation • Look for any suspicious responses • Use Graph view to: • Look for RPS and TTLB Patterns • Analyze perf counters and Correlate behaviors
Analyzing Performance Counters • Which machine/s cause the bottleneck? • CPU / Memory / Disk IO / Network • Then you can zoom in according to machine role • Start with the VSTT default counter sets • Add SharePoint / Search / Excel specific ones
Patterns and Behaviours: What we expect to see • The SQL Private Bytes behaviour is normal • WFE memory is stable • We expect this level of spikiness in CPU
Patterns and Behaviours: Identifying problems • WFE CPU went down across all machines • SQL CPU went up • SQL Lock Wait Time went up • Hence the bottleneck is in SQL
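The reasoning on this slide can be written down as a simple comparison of counter averages from two time windows of a test run. This is only a sketch of that one pattern, not a general diagnostic. The dictionary keys are hypothetical labels, not real performance counter paths, and the sample numbers are invented.

```python
def diagnose(before, after):
    """Flag SQL as the likely bottleneck when WFE CPU falls while
    SQL CPU and SQL lock wait time both rise between two windows."""
    wfe_cpu_down = after["wfe_cpu"] < before["wfe_cpu"]
    sql_cpu_up = after["sql_cpu"] > before["sql_cpu"]
    lock_wait_up = after["sql_lock_wait_ms"] > before["sql_lock_wait_ms"]
    if wfe_cpu_down and sql_cpu_up and lock_wait_up:
        return "sql"
    return "undetermined"


# Invented averages for an early and a late window of the same run.
early = {"wfe_cpu": 80, "sql_cpu": 40, "sql_lock_wait_ms": 5}
late = {"wfe_cpu": 55, "sql_cpu": 90, "sql_lock_wait_ms": 400}
print(diagnose(early, late))
```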
demo Reviewing Test Results
Typical Data Related Problems • Large lists • Lots of Web parts importing non-cached data from various places • Cross-list queries and CBQ Web parts • Too deep site structures • Too many sites in a site collection • Too many site collections in a Content DB • Too many ACLs • Unrealistic org structure showing on MySite
Scaling Points: Once you understand your capacity requirements, consider: • Pages with custom code – should the code be rewritten? • Are my objects within recommended guidelines? • Number of web applications, site collections, site collections per content database, etc. • Number of items in lists and libraries, etc. • Database optimizations • Additional data files? • Additional SQL Servers? • Disks: RAID options, partition alignment • DAT304 – Considerations for Large-Scale SharePoint Deployments on Microsoft SQL Server
Scaling Points (continued): Once you understand your capacity requirements, consider: • Should load be split up by function into multiple farms? • Is the farm running 64-bit? • YES!!! • Should virtualization be used? • What does the host look like? • What does the guest look like? • Is caching used? • Output cache? • Blob cache? • Object cache? • Is the object cache being monitored and sized accordingly? • Is per-item security being overused?
Scaling Points (continued): Once you understand your capacity requirements, consider: • How big are pages? • Fiddler is your friend! • Is SSL being used? • If so, are sticky sessions used so that requests don’t re-negotiate on each page? • Is there enough network bandwidth to support the traffic? • Do you have two VLANs for your farm? • One for the page traffic and a second backend channel for your servers to talk to each other and SQL? • Is the communication actually occurring on the appropriate VLAN? • Do you have a dedicated WFE that is used for indexing?
Related Content • Breakout Sessions • DAT304 – Considerations for Large-Scale SharePoint Deployments on Microsoft SQL Server • OFC301 – 10 Steps to a Successful SharePoint Deployment • Whiteboard Sessions • WTB301 – SharePoint Architecture Panel Discussion with Joel, Eric and Zlatan
Resources Tech·Ed Africa 2009 sessions will be made available for download the week after the event from www.tech-ed.co.za • www.microsoft.com/teched – International Content & Community • www.microsoft.com/learning – Microsoft Certification & Training Resources • http://microsoft.com/technet – Resources for IT Professionals • http://microsoft.com/msdn – Resources for Developers
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.