Planning for Scale and Capacity • Kimmo Forss, Lead IW Architect, Microsoft Corporation • Simon Skaria, Senior Program Manager, Microsoft Corporation • Satish Matthew, Program Manager, Microsoft Corporation • James Petrosky, Sr. Consultant, Microsoft Corporation • Doron Bar-Caspi, Senior Program Manager, Microsoft Corporation
Capacity Planning: The process of evaluating a technology against the needs of an organization and making an educated decision about the procurement, design, and configuration of hardware and software to meet the demands specific to the system being installed.
Common Questions • How much data can we store? • How many users can our environment support? • How much hardware do we need? • How many sites can we run on our servers? • How do we validate our design? • What is the SharePoint Capacity Planning Tool and how can I use it? • What tools can we use to measure performance? • How do we plan and monitor our storage needs & performance?
SharePoint Planning Lifecycle (diagram): SharePoint Planning • Usage Scenarios • Solution Design • Physical Configuration • Adaptive Refinements
Session Objectives and Takeaways • Session Objectives • Discuss the SharePoint Planning process • Discuss the Components and Factors to Consider when Planning for Performance and Capacity in MOSS • Provide Recommendations and Best Practices • Describe the considerations when planning a SQL Server back end for a large SharePoint deployment • List best practices for disks and other hardware tuning to optimize SharePoint performance • Demonstrate the SharePoint Capacity Planning Tool • Describe the Process for Determining the Hardware and Topology Requirements • Takeaways • Leave with a Better Understanding of the Planning Process, the SharePoint Platform, and Our Recommendations
Agenda: Planning for Scale and Capacity • Usage Scenarios • Solution Design • Physical Design & Configuration • Plan for Software Boundaries • Estimate Performance and Capacity Requirements (Throughput Targets) • Plan Hardware and Storage Requirements • Monitoring & Adaptive Refinements • SharePoint Capacity Planning Tool • Performance and capacity planning: The process of mapping your solution design to a farm size and set of hardware that will support your business goals.
Usage Scenarios • Understanding the business needs, goals, and desired value added by the solution • Who are the end users? • How will they use the solution? • What are the different use-case scenarios? • What does the data look like? • How it’s used (e.g., “High Impact”, “Archived”) • Business types (e.g., “HR”, “Sales”) • File types (e.g., “.docx”, “.pptx”) • Average size for each file type • Where are the end users located?
SharePoint 2007 Usage Scenarios • Business Intelligence • Collaboration • Forms and Business Processes • Enterprise Content Management • Web Content Management • Search
Global Intranet Scenario • Centralized or Distributed • Global teams or local teams • Where to put the sites • End user experience • Bandwidth/Latency • Operational costs • Search • Offline • Network accelerators • Security • ACLs
Web Publishing Scenario • Characteristics • Fewer content creators (may be external agencies) • Large number of viewers (authenticated or anonymous) • Small number of central sites for content publishing • Approval workflow • Staging and deployment • Database Characteristics • Publishing databases are mainly read • Design considerations • Design for end-to-end performance • Page size matters! (HTML, .js, .css, etc.) • More memory-intensive than CPU-intensive (caching) • Navigation and searchability
Search Scenario • Characteristics • Multiple content sources • Security trimming • Indexing Characteristics • Network/CPU intensive • Querying Characteristics • CPU/memory/IO intensive • Database Characteristics • Property store in SQL Server • Design considerations • 50 million items per catalog (farm)
Solution Design: An Example • Central authoring farm • Content is staged, reviewed, and approved • Approved content is pushed to the globally distributed data centers • My Sites available to users locally in their data center • Business intelligence reports for finance available via KPI dashboard • Global search available to everyone
Physical Design Planning Continuum (axes: Functionality and Complexity; Planning and Expertise) • MOSS On-premise • Complete customizability • Facilitates rich, complex KM scenarios • SMEs plan, design, and implement • Core WSS • For small orgs • Simple on-premise • MOSS Online • Complete MOSS offering • Limited customization • Office Live • Hosted for grassroots orgs • No planning necessary
Physical Design Planning • Components • Software Boundaries • Throughput Targets • Data Capacity • Hardware • Planning Activities • Plan for Software Boundaries • Estimate Performance and Capacity Requirements • Plan Hardware and Storage Requirements • Test and Validate Your Design
Plan for Software Boundaries: Physical Design • Object Categories • Software Scalability vs. Hardware Scalability • Test Results, Findings, and Recommendations from the Product Group • Test Environment • Test Results • Recommendations • Other Considerations
Plan for Software Boundaries: Object Categories • Site Objects • Site Collections, Web sites, documents, document libraries, list items, document file size, etc. • People Objects • User profiles, security principals, etc. • Search Objects • Search indexes, indexed documents • Logical Architecture Objects • Shared Services Providers, Site Collections, Content Databases, Zones, etc. • Physical Objects • Servers: Index, WFE, Database, Application, etc.
Plan for Software Boundaries: Software Scalability vs. Hardware Scalability • Software scalability • Recommendations for acceptable performance based on software behavior and characteristics • Hardware scalability • Does not change software behavior or characteristics, but can increase the overall throughput of a server farm and might be necessary to achieve acceptable performance as the number of objects approaches recommended limits
Plan for Software Boundaries: Product Group's Test Environment • Hardware Specifications: • Network: Gigabit Ethernet (one billion bits/sec) • Farm Configurations Tested:
Plan for Software Boundaries: Test Results and Findings • Throughput vs. Number of Site Collections in One Content Database
Demo: Moving Site Collections Using stsadm -o mergecontentdbs • Kimmo Forss, Lead IW Architect, Microsoft Corporation
Plan for Software Boundaries: Test Results and Findings • Throughput differences between a flat document library and a document library with folders • See “Scaling to Extremely Large Lists and Performant Access Methods” at http://blogs.msdn.com/sharepoint/archive/2007/07/25/scaling-large-lists.aspx
Plan for Software Boundaries: Recommendations & Guidelines (subset) • For all recommendations, visit “Plan for software boundaries (Office SharePoint Server)” at http://technet2.microsoft.com/Office/en-us/library/6a13cd9f-4b44-40d6-85aa-c70a8e5c34fe1033.mspx
Information Architecture: Limit Content DB Size • Soft limit* for the size of a content DB: 100 GB. In most cases, exceeding 100 GB is discouraged. • If you must exceed 100 GB, make sure to: • Test your I/O subsystem for adequate performance. • Use a single site collection in this DB. • Test your backup solution at this size. To minimize downtime, we recommend appropriate tools, such as a differential backup solution. * Your experience may vary: depends on hardware and usage profile.
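To make the 100 GB soft limit concrete when laying out site collections, here is a minimal sketch (not a Microsoft tool) that groups site collections into content databases with first-fit decreasing packing. The site collection names, their sizes, and the `WSS_Content_NN` database names are hypothetical; the 100 GB figure is the soft limit above, not a hard rule.

```python
from typing import Dict, List

SOFT_LIMIT_GB = 100.0  # soft limit from the slide; your experience may vary


def assign_to_content_dbs(site_sizes_gb: Dict[str, float],
                          limit_gb: float = SOFT_LIMIT_GB) -> List[List[str]]:
    """Group site collections into content DBs with first-fit decreasing.

    The largest site collections are placed first, each into the first
    content DB that stays at or under the limit. A site collection that
    alone exceeds the limit gets a dedicated DB (one site collection per DB).
    """
    dbs: List[List[str]] = []
    used_gb: List[float] = []
    ordered = sorted(site_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > limit_gb:
            # Oversized: isolate it. Its used_gb already exceeds the limit,
            # so nothing else will ever be placed in this DB.
            dbs.append([name])
            used_gb.append(size)
            continue
        for i, total in enumerate(used_gb):
            if total + size <= limit_gb:
                dbs[i].append(name)
                used_gb[i] += size
                break
        else:
            # No existing DB has room: start a new one.
            dbs.append([name])
            used_gb.append(size)
    return dbs


if __name__ == "__main__":
    # Hypothetical site collection sizes in GB.
    sites = {"hr": 60, "sales": 45, "legal": 30, "archive": 130, "it": 20}
    for n, db in enumerate(assign_to_content_dbs(sites), start=1):
        print(f"WSS_Content_{n:02d}: {db}")  # hypothetical DB naming
```

With these inputs, the 130 GB archive ends up alone in its own database, and the remaining site collections pack into two databases under 100 GB. Backup and restore still need to be tested at the resulting sizes.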
Information Architecture: Manage Large Lists • SharePoint supports large lists, but you must carefully plan how users view them to prevent performance impacts. • For best performance, do not exceed 2,000 items at any one list level (for example, the root of the list or a single folder). • If you must create and browse large lists, define and use customized filtered views that are configured to return fewer than 5,000 items.
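The 2,000-items-per-level guideline suggests bucketing items into folders. A minimal sketch, assuming a single level of folders under the list root and a hypothetical `FolderNNNN` naming scheme (neither is a SharePoint convention):

```python
import math

MAX_ITEMS_PER_LEVEL = 2000  # per-level guidance from the slide above


def folders_needed(total_items: int,
                   max_per_level: int = MAX_ITEMS_PER_LEVEL) -> int:
    """Folders to create at the list root so no folder exceeds the limit.

    Returns 0 when all items fit at the root. Assumes one level of folders;
    if the folders themselves would exceed the limit at the root, a deeper
    hierarchy is required.
    """
    if total_items <= max_per_level:
        return 0
    folders = math.ceil(total_items / max_per_level)
    if folders > max_per_level:
        raise ValueError("too many items for one folder level; nest deeper")
    return folders


def folder_for(item_index: int, max_per_level: int = MAX_ITEMS_PER_LEVEL) -> str:
    """Hypothetical scheme: items 0-1999 -> Folder0000, 2000-3999 -> Folder0001."""
    return f"Folder{item_index // max_per_level:04d}"
```

For example, a 45,000-item library would need 23 folders at the root. Folders only cap the per-level count; filtered views are still needed for browsing across them.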
Plan for Software Boundaries: Other Considerations • Throughput vs. number of Web servers • Test findings showed throughput plateauing at a 5:1 ratio of Web servers to database servers (YMMV) • Perform tests in your environment • Other Recommendations • Carefully plan your site hierarchy and design • Minimize the number of Web applications and application pools • Limit the number of Shared Services Providers • Plan for database growth • Follow data and feature best practices and suggested limits.
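One way to apply the Web server plateau during planning is to size the WFE tier from a measured per-server throughput and flag when the count passes the 5:1 plateau. This is a minimal sketch, assuming `rps_per_wfe` comes from your own load tests and treating the 5:1 finding as a warning threshold rather than a hard limit:

```python
import math
from typing import Tuple

PLATEAU_WFE_PER_DB = 5  # test finding cited above (YMMV)


def plan_web_tier(target_rps: float, rps_per_wfe: float,
                  plateau: int = PLATEAU_WFE_PER_DB) -> Tuple[int, bool]:
    """Return (Web servers needed, whether that count passes the plateau).

    rps_per_wfe must come from your own load tests. The plateau flag warns
    that adding Web servers beyond this point is unlikely to raise
    throughput unless the database tier is scaled as well.
    """
    if rps_per_wfe <= 0:
        raise ValueError("rps_per_wfe must be positive")
    wfes = max(1, math.ceil(target_rps / rps_per_wfe))
    return wfes, wfes > plateau
```

For example, with a hypothetical 100 RPS per Web server, a 480 RPS target needs 5 Web servers (at the plateau), while 700 RPS needs 7 and would be flagged for a closer look at the SQL Server tier.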
Estimate Performance & Capacity: Usage Profiles • Determine the usage profile • Usage profile = the user community’s behavior • Distribution of requests across content • Operation types and frequency • Existing solution in place? Mine its IIS logs • Use the usage profiles from the configurations tested by the Product Group as a starting point
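Where an existing solution is in place, its IIS logs can seed the usage profile. The following is a minimal sketch, assuming the logs use the W3C extended format with the `date`, `time`, `cs-method`, and `cs-uri-stem` fields enabled. It is not a replacement for a log analysis tool such as Log Parser. IIS writes W3C timestamps in UTC, so shift them before identifying local peak hours.

```python
from collections import Counter
from typing import Dict, Iterable, List


def profile_iis_log(lines: Iterable[str]) -> Dict[str, object]:
    """Summarize a W3C extended-format IIS log into a rough usage profile."""
    fields: List[str] = []
    methods: Counter = Counter()
    content: Counter = Counter()
    per_second: Counter = Counter()
    per_hour: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives; #Fields names the columns of the data lines that follow.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        row = dict(zip(fields, line.split()))
        methods[row.get("cs-method", "-")] += 1
        # Request mix by file extension of the requested resource.
        leaf = row.get("cs-uri-stem", "").rsplit("/", 1)[-1]
        content[leaf.rsplit(".", 1)[-1].lower() if "." in leaf else "(none)"] += 1
        stamp = f"{row.get('date', '')} {row.get('time', '')}"
        per_second[stamp] += 1
        per_hour[stamp[:13]] += 1  # "YYYY-MM-DD HH"
    return {
        "requests": sum(methods.values()),
        "methods": dict(methods),
        "content_types": dict(content),
        "peak_rps": max(per_second.values(), default=0),
        "peak_hour_avg_rps": max(per_hour.values(), default=0) / 3600,
    }


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        print(profile_iis_log(log))
```

The peak one-second count is noisy; the busiest hour's average RPS is usually the better input for the throughput targets.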
Estimate Performance & Capacity: Sample Usage Profile (WSS Collaboration)
Estimate Performance & Capacity: Throughput Requirements • Estimating Throughput Targets • User response time, concurrency • Warning: Plan for Peak Concurrency • Throughput targets (in RPS) at various concurrency rates (recommended response time of 1–2 seconds)
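A common back-of-the-envelope way to turn a usage profile into an RPS target is users × peak concurrency × requests per user per hour ÷ 3600. This sketch assumes that formula and uses hypothetical inputs (10,000 users, 36 requests per active user per hour); validate the results against the 1–2 second response-time goal in load tests rather than treating them as final.

```python
def target_rps(total_users: int, peak_concurrency: float,
               requests_per_user_per_hour: float) -> float:
    """Requests per second the farm must sustain at peak.

    peak_concurrency is the fraction of all users active at the busiest
    time (plan for the peak, not the average); requests_per_user_per_hour
    should come from your usage profile.
    """
    concurrent_users = total_users * peak_concurrency
    return concurrent_users * requests_per_user_per_hour / 3600.0


if __name__ == "__main__":
    # Hypothetical inputs: 10,000 users, 36 requests per active user per hour.
    for rate in (0.01, 0.05, 0.10, 0.15):
        print(f"{rate:>4.0%} concurrency -> {target_rps(10_000, rate, 36):5.1f} RPS")
```

Printing a row per concurrency rate mirrors the idea of listing throughput targets at several concurrency levels, so you can see how sensitive the target is to the peak assumption.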
Estimate Performance & Capacity: Other Factors • Other configuration factors that can influence throughput targets • Indexing (schedule the indexing window off-hours) • Is caching enabled? • Output caching and cache profiles • Object caching • Disk-based caching for binary large objects (BLOBs) • To learn more about caching in MOSS, attend the next session in this room (AG306 Performance and Optimization Strategies for MOSS 2007) • Page customizations • Custom Web Parts
Plan Hardware and Storage: How SharePoint Scales • Designed to grow with organization needs • Server resources: x86 (32-bit), x64, CPU, RAM, HDD • 64-bit recommended for back-end services (SQL Server), which can use the additional addressable memory • SQL Server: HDD configuration is critical • Server Farm • Topology restrictions removed • WFE, Query, Index, Excel Calculation, Project, SQL Server • Adopted the WSS adage: content is limited only by hardware capability* • Sites: In WSS 3.0, portal sites are “just another WSS site”
Plan Hardware and Storage: 64-bit vs. 32-bit Hardware • WSS 3.0 and MOSS 2007 can run on both • 32-bit and 64-bit hardware can be mixed within a farm (and even within a server role*) • 64-bit hardware recommended • This is the last version to support 32-bit • A 32-bit process can directly address only a 2 GB memory address space • 64-bit supports up to 1,024 GB of memory (physical and/or addressable) • 32-bit may perform better with 2 GB of RAM or less, but we recommend 64-bit for future investments and scalability • 64-bit supports a larger number of processors • 64-bit hardware prioritization (highest first): SQL Server, Index, Excel, Search, WFE
Plan Hardware and Storage: Single Server Example • One server configured with the: • Web Front-End Server Role • Application Server Role • Database Server Role • Appropriate for limited use scenarios, including: • Installing Office SharePoint Server 2007 for evaluation purposes • Deploying only Microsoft Windows SharePoint Services 3.0 • Deploying a subset of the Office SharePoint Server 2007 features • Deploying Office SharePoint Server 2007 for a limited purpose (such as for a single department) or for a limited number of users
Plan Hardware and Storage: Multi-Server Farm Example (diagram: Web Server + Query Server, Application Server, Clustered SQL Server) • Optimizes performance of Web servers • Increases redundancy and reduces points of failure • Redundancy at the WFE, Query, and Database server roles • Determine the configuration based on your business needs and goals • Determine the configuration of other application roles (Excel Services, Index, Forms, etc.)
Plan Hardware and Storage: Storage Considerations • Primary Metric: Document Storage • Plan for 1.2–1.5 × file system size for SQL Server • Note: this metric is closely tied to the RAID level used on the SQL Server disks • Secondary Metric: Index Size • Index Server: (5–12% of the total size of all indexed content) × 3 • Query Server: Same as Index Server
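The two storage metrics can be turned into a quick estimator. This sketch applies only the multipliers on the slide and returns usable capacity; the raw disk to purchase also depends on the RAID level (for example, RAID 1 or RAID 10 needs twice the raw capacity). The input sizes in the usage note are hypothetical.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]


def estimate_storage_gb(file_system_gb: float,
                        indexed_content_gb: float) -> Dict[str, Range]:
    """Low/high usable-capacity estimates (GB) from the slide's multipliers."""
    # Document storage: 1.2-1.5 x the file system size, in SQL Server.
    sql = (1.2 * file_system_gb, 1.5 * file_system_gb)
    # Index size: 5-12% of all indexed content, times 3.
    index = (0.05 * indexed_content_gb * 3, 0.12 * indexed_content_gb * 3)
    return {
        "sql_server": sql,
        "index_server": index,
        "query_server": index,  # same as the index server
    }
```

For example, 500 GB of documents that will all be indexed gives roughly 600–750 GB for SQL Server and 75–180 GB each for the index and query servers, before RAID overhead and growth.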
Plan Hardware and Storage: Storage Considerations • Read and review the new whitepaper “Performance Recommendations for Storage Planning and Monitoring” at http://technet2.microsoft.com/Office/en-us/library/ca472046-7d4a-4f17-92b1-c88a743a5e3c1033.mspx?mfr=true • Discourages content DBs > 100 GB (if larger, try to limit each DB to a single site collection) • Stresses designing to respect published software boundaries • Manage lists and libraries with many items (>2,000) • Start with a dedicated server running SQL Server 2005 • Separate and prioritize your data among disks • Physical storage & RAID recommendations
Plan for Hardware and Storage: Planning for SQL Server Is a Must • Tests and deployment experience show that a healthy SQL Server is the basis for a healthy SharePoint farm. • A sub-optimal SQL Server degrades the other components in the farm. • Slow responses from SQL Server cause requests to build up in a queue on the WFEs, leading to unpredictable symptoms. • The I/O subsystem hardware plays a significant role.
Plan for Hardware & Storage: SQL Server Memory • “4 GB is the minimum required memory, 8 GB is recommended for medium size deployments, and 16 GB and above is recommended for large deployments.” • What influences the amount of RAM? • Number and size of content databases • Number of concurrent requests to SQL Server • Total user base • Size and width of commonly used lists • Remember: the minimum is where we start…
Plan for Hardware & Storage: DAS vs. SAN • Both types of storage can scale, perform, and serve a multi-TB farm. • So… where’s the difference? • Ease of management • Growth potential • Advanced capabilities (snapshots, remote replication) • The ability to share with other applications • Price…
Plan for Hardware & Storage: SQL Server Disks • When prioritizing data among faster disks, use the following ranking (fastest disks first): • TempDB data and transaction logs • Database transaction log files • Search database • Database data files • Note: In a heavily read-oriented portal, prioritize data over logs.
Plan for Hardware & Storage: SQL Server Files • Best Practices: • Allocate TempDB on RAID 1 (or RAID 1 variants, such as RAID 10) • Separate data and logs across disks • For TempDB, create multiple data files, up to the number of CPU cores • Pre-grow files (don’t rely on autogrow) • Allocate dedicated disks for Search
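The TempDB practices (one data file per core, pre-grown) are usually applied with `ALTER DATABASE` statements. This minimal sketch generates them; the `T:\TempDB` path, the file sizes, and the `tempdevN` names for the added files are hypothetical, and `tempdev` is SQL Server's default logical name for the primary TempDB data file. Review the output before running it: `SIZE` in `MODIFY FILE` must be larger than the file's current size.

```python
from typing import List


def tempdb_file_statements(cpu_cores: int, size_mb: int, growth_mb: int,
                           data_dir: str) -> List[str]:
    """Generate T-SQL that pre-sizes TempDB with one data file per core.

    The first statement resizes the existing primary data file ('tempdev');
    the rest add tempdev2..tempdevN of equal size. FILEGROWTH is kept only
    as a safety net; the files are pre-grown via SIZE.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    statements = [
        f"ALTER DATABASE tempdb MODIFY FILE (NAME = N'tempdev', "
        f"SIZE = {size_mb}MB, FILEGROWTH = {growth_mb}MB);"
    ]
    for n in range(2, cpu_cores + 1):
        statements.append(
            f"ALTER DATABASE tempdb ADD FILE (NAME = N'tempdev{n}', "
            f"FILENAME = N'{data_dir}\\tempdb{n}.ndf', "
            f"SIZE = {size_mb}MB, FILEGROWTH = {growth_mb}MB);"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical: 4 cores, 1 GB per file, TempDB on a dedicated RAID 1 volume T:.
    print("\n".join(tempdb_file_statements(4, 1024, 256, r"T:\TempDB")))
```

Keeping all data files the same size helps SQL Server spread allocations evenly across them.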
Plan for Hardware & Storage: SQL Server Disks • Disk array design is critical to good SQL Server performance • In general, more spindles (disks) = better performance • Calculate array performance: plan for 0.75 to 1 IOPS per GB of array capacity for content, and 1.5 to 2 IOPS per GB for TempDB, search, and logs • How to calculate array performance: Spindle IOPS capability × Number of spindles = Total IOPS • Common spindle capabilities: • U320 SCSI 10K = 100 IOPS, 15K = 130 IOPS • Fibre Channel 10K = 130 IOPS, 15K = 200 IOPS • SAS 10K = 165 IOPS, 15K = 260 IOPS • Example: FC 15K disk × 10 disks = 200 × 10 = 2,000 IOPS • RAID 1 vs. RAID 5: read speed is the same; write speed is roughly twice as slow on RAID 5
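The array arithmetic above can be scripted. This sketch uses the slide's per-spindle figures. The read/write mix, the 1.5 TB content size, and the RAID write-penalty values are assumptions to replace with your own measurements. The penalties (2 disk I/Os per write for RAID 1/10, 4 for RAID 5) are a widely used rule of thumb consistent with RAID 5 writes being twice as slow.

```python
from typing import Dict, Tuple

# Per-spindle IOPS from the slide, keyed by (interface, RPM in thousands).
SPINDLE_IOPS: Dict[Tuple[str, int], int] = {
    ("U320 SCSI", 10): 100, ("U320 SCSI", 15): 130,
    ("Fibre Channel", 10): 130, ("Fibre Channel", 15): 200,
    ("SAS", 10): 165, ("SAS", 15): 260,
}

# Disk I/Os per logical write (rule-of-thumb assumption, not from the slide).
RAID_WRITE_PENALTY: Dict[str, int] = {"RAID 1": 2, "RAID 10": 2, "RAID 5": 4}


def array_iops(interface: str, rpm_k: int, spindles: int) -> int:
    """Spindle IOPS capability x number of spindles = total (raw) IOPS."""
    return SPINDLE_IOPS[(interface, rpm_k)] * spindles


def required_iops(size_gb: float, iops_per_gb: float) -> float:
    """0.75-1 IOPS/GB for content; 1.5-2 IOPS/GB for TempDB, search, logs."""
    return size_gb * iops_per_gb


def effective_iops(raw_iops: float, read_fraction: float, raid: str) -> float:
    """Host-visible IOPS when each write costs RAID_WRITE_PENALTY disk I/Os."""
    write_fraction = 1.0 - read_fraction
    return raw_iops / (read_fraction + write_fraction * RAID_WRITE_PENALTY[raid])


if __name__ == "__main__":
    raw = array_iops("Fibre Channel", 15, 10)  # the slide's example
    need = required_iops(1500, 1.0)            # hypothetical 1.5 TB of content
    for raid in ("RAID 10", "RAID 5"):
        eff = effective_iops(raw, read_fraction=0.8, raid=raid)
        print(f"{raid}: {eff:.0f} IOPS available vs {need:.0f} required")
```

Under these assumptions, the same ten spindles meet the requirement on RAID 10 but fall short on RAID 5, which is why write-heavy files (logs, TempDB) belong on RAID 1 or RAID 10.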