Architecting for Scale in SharePoint 2010

Russ Houberg • Senior Technical Architect, MCM • KnowledgeLake, Inc.

Scaling SP2010 from the Ground Up • Storage Architecture • SQL Tuning Tidbits • Remote Blob Storage (Demo) • Performance and Control • Scalable Taxonomy Design (Demo) • Search… A Complete Story • The Big Picture: 10 million, 100 million, 1 BILLION Documents?

Storage Architecture • Storage Architecture can make or break SharePoint Performance • Poor storage performance can tank the whole SharePoint farm! • Can Be Tough to Estimate • Use an extendable storage platform if possible • Wider is Better • More spindles are always better than higher GB • Avoid using a small number of large disks for increasing storage capacity

Storage Architecture • TempDB, Search DBs, Content DBs • Multiple Data Files in Primary File Group • # Files = ½ to ¼ of CPU Cores | <= CPU Cores • Separate to unique spindle sets if possible • Pre-Allocate all Data Files, Including TempDB • Estimate Projected DB Size and Divide by # Files to get the pre-allocation size for each file • Leave “AutoGrow” enabled, but don’t rely on it • Pre-Allocate to prevent AutoGrow • Set AutoGrow to 10% or a logical MB/GB value based on projected database size
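The file-count and pre-allocation rules on this slide reduce to simple arithmetic. The following is a minimal Python sketch of that arithmetic; the function name, the `core_fraction` parameter, the choice of 10% of the per-file size as the AutoGrow step, and the sample figures are all assumptions made for illustration, not values from the deck.

```python
import math

def plan_data_files(cpu_cores, projected_db_gb, core_fraction=0.5):
    """Sketch of the slide's sizing rule (names and defaults are illustrative).

    core_fraction is the share of CPU cores used as the data file count.
    The slide suggests 1/4 to 1/2 of the cores and never more than the cores.
    """
    if not 0.25 <= core_fraction <= 1.0:
        raise ValueError("core_fraction should be between 0.25 and 1.0")
    # At least one file, at most one file per core.
    file_count = max(1, min(cpu_cores, math.ceil(cpu_cores * core_fraction)))
    # Pre-allocate each file to an equal share of the projected size.
    per_file_gb = projected_db_gb / file_count
    # AutoGrow stays enabled as a safety net; 10% of the file size is one option.
    autogrow_gb = per_file_gb / 10
    return {"files": file_count, "per_file_gb": per_file_gb, "autogrow_gb": autogrow_gb}

# Hypothetical server: 8 cores, 400 GB projected content database.
plan = plan_data_files(cpu_cores=8, projected_db_gb=400)
print(plan)
```

The result only gives target sizes; applying them to the actual data files is still a manual step in SQL Server.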

Storage Architecture • Data / Log File Spindle Priority

SQL Tuning Tidbits • SQL Instant Initialization • Run SQL As Domain User with either… • Local Admin • Grant “Perform Volume Maintenance Tasks” • TempDB Pre-Allocation to 10% of Largest DB • SAN vs DAS vs NAS (Don’t Overshare!) • Host Bus Adapter (HBA) Configuration • NTFS Allocation Unit Size: 64K • Enable Locked Pages in Memory (SQL Std.) • Don’t skimp on RAM!
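The TempDB guideline above (pre-allocate to 10% of the largest database) can be worked out as follows. This is a small sketch only; the function name and the database names and sizes are hypothetical.

```python
def tempdb_preallocation_gb(database_sizes_gb, percent=10):
    """Return a TempDB pre-allocation target as a percentage of the largest DB."""
    if not database_sizes_gb:
        raise ValueError("at least one database size is required")
    largest = max(database_sizes_gb.values())
    return largest * percent / 100

# Hypothetical farm: sizes in GB, database names invented for illustration.
sizes = {"Content_Archive": 900, "Content_Collab": 100, "Search_Crawl": 60}
print(tempdb_preallocation_gb(sizes))
```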

Remote BLOB Storage
SharePoint 2003 • What’s this ECM thing? • Interesting workarounds • API access was problematic
SharePoint 2007 • SP1 Brings us EBS Provider • BLOBs are orphaned during edit/save • Orphan cleanup is resource intensive • Externalization happens on the WFE (reduced RPS) • Future support of EBS API is not guaranteed
SharePoint 2010 • Long Live RBS • Transactional consistency supports “VETO” • Transactional consistency allows for UPDATE • Orphan cleanup uses SQL Indexes • Transparent to the SharePoint API • RBS is the best option for future support

Remote BLOB Storage (diagram: save flow across the SharePoint WFE, SharePoint Object Model, RBS Client Library, BLOB Store Provider Library, Blob Store, and SQL Server with Content DB and Config DB) • 1. Save Request • 2. Enforce Business Logic • 3. Save Blob • 4. Write Blob • 5. Return BLOB ID • 6. Save Metadata & BLOB ID • 7. Back to User
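The numbered save flow on this slide can be mimicked with a toy in-memory model. The sketch below is not the RBS API: the `BlobStore` and `ContentDb` classes, the `save_document` function, and the size check standing in for "business logic" are all hypothetical. It only illustrates the ordering, in which the BLOB is written externally first and the content database then stores metadata plus the returned BLOB ID.

```python
import uuid

class BlobStore:
    """Stand-in for the external BLOB store behind the provider library."""
    def __init__(self):
        self._blobs = {}

    def write(self, data):
        blob_id = uuid.uuid4().hex   # 4. Write Blob
        self._blobs[blob_id] = data
        return blob_id               # 5. Return BLOB ID

    def read(self, blob_id):
        return self._blobs[blob_id]

class ContentDb:
    """Stand-in for the content database: metadata and BLOB ID, no BLOB bytes."""
    def __init__(self):
        self.rows = {}

    def save(self, name, metadata, blob_id):
        self.rows[name] = {"metadata": metadata, "blob_id": blob_id}

def save_document(name, data, metadata, blob_store, content_db, max_bytes=1024):
    # 1. Save Request arrives. 2. Enforce Business Logic (here: a size check).
    if len(data) > max_bytes:
        raise ValueError("document exceeds the allowed size")
    # 3. Save Blob: hand the bytes to the store (steps 4 and 5 happen inside).
    blob_id = blob_store.write(data)
    # 6. Save Metadata & BLOB ID in the content database.
    content_db.save(name, metadata, blob_id)
    # 7. Back to User.
    return blob_id

store, db = BlobStore(), ContentDb()
blob_id = save_document("invoice.pdf", b"%PDF...", {"year": 2009}, store, db)
print(db.rows["invoice.pdf"]["blob_id"] == blob_id)
```

Because the content database row holds only the identifier, the database stays small while the BLOB bytes live in the external store.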

RBS Requirements • SQL Server 2008 R2 November CTP • Any Version, even SQL Express • FILESTREAM RBS Provider • Updated version dated November 1! • http://go.microsoft.com/fwlink/?LinkId=177388

Remote BLOB Storage demo…

Performance and Control
SharePoint 2003 • Column Indexes were not possible • Database Indexes were not supported
SharePoint 2007 • Column Indexes (10) could be configured via the UI • End users could impact performance with poorly performing list views
SharePoint 2010 • Database optimizations allow far more items in a list • Support for (20) Multi-Column Indexes • Resource intensive operations can be limited or disallowed during production hours • Large query thresholds • Blocking Operations • Can be overridden via the Object Model • Can configure an unblocked “window”
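The SharePoint 2010 throttling behavior described on this slide (a large query threshold, an override, and an unblocked "window") can be expressed as a small decision function. This is an illustrative sketch, not SharePoint code: the function name and the window times are hypothetical, and the default threshold of 5000 is borrowed from the view/query result size listed on the Targeted Limits slide.

```python
from datetime import time

def is_query_allowed(item_count, now, threshold=5000,
                     window_start=time(22, 0), window_end=time(6, 0),
                     override=False):
    """Decide whether a large list query may run (illustrative logic only)."""
    # Small queries and explicit overrides are never blocked.
    if override or item_count <= threshold:
        return True
    # The unblocked window may wrap past midnight, e.g. 22:00 to 06:00.
    if window_start <= window_end:
        return window_start <= now < window_end
    return now >= window_start or now < window_end

print(is_query_allowed(20000, time(12, 0)))   # large query at midday
print(is_query_allowed(20000, time(23, 30)))  # same query inside the window
```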

Scalable Taxonomy Design • Targeted Limits • Tens of Millions of Documents/Items in a List • 5000 Item View/Query Result Size • 100 Million Items per SP2010 Search Index • 1 BILLION Items in FAST For SharePoint Index • 150,000 Site Collections per Web Application • 50,000 Site Collections per Content DB • 100GB Content DB Size is a SOFT LIMIT! • Recommended for Collab or Fast Backup/Restore SLA • Some archival type Content DBs exist at near 1TB!
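The targeted limits above can be turned into a rough capacity estimate. The sketch below uses only the limits stated on the slide; the function names and sample inputs are assumptions, and the result is a lower bound, since real site collections cannot be split across content databases.

```python
import math

CONTENT_DB_SOFT_LIMIT_GB = 100            # soft limit from the slide
SITE_COLLECTIONS_PER_CONTENT_DB = 50_000  # limit from the slide
SITE_COLLECTIONS_PER_WEB_APP = 150_000    # limit from the slide

def content_dbs_needed(total_content_gb, site_collections):
    """Lower bound on content databases, by size and by site collection count."""
    by_size = math.ceil(total_content_gb / CONTENT_DB_SOFT_LIMIT_GB)
    by_count = math.ceil(site_collections / SITE_COLLECTIONS_PER_CONTENT_DB)
    return max(1, by_size, by_count)

def web_apps_needed(site_collections):
    """Lower bound on web applications for a given number of site collections."""
    return max(1, math.ceil(site_collections / SITE_COLLECTIONS_PER_WEB_APP))

# Hypothetical farm: 2.5 TB of content in 1,000 site collections.
print(content_dbs_needed(2500, 1000), web_apps_needed(1000))
```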

Scalable Taxonomy Design • Enabling 100 Million • Place large Collaboration Site Collections (20GB+) in their own content database • Break Up Archive/Records Site Collections by Year or, if necessary, Content Type and Year • AVOID Item Level ACLs!!! • Release to Metadata Based Folder Structures as a workaround • Use Content Type Syndication to facilitate multiple Site Collections of the same type • Use Content Organizer as a “Drop Zone”
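Breaking up archive site collections by year, or by content type and year, amounts to a routing rule applied to each document dropped into the "Drop Zone". The following sketch shows such a rule in plain Python. It is not the Content Organizer API; the function name, the metadata keys, and the `/sites/archive-…` URL scheme are invented for illustration.

```python
def route_document(metadata, split_by_content_type=False):
    """Return a hypothetical destination site collection URL for a document."""
    year = metadata["year"]
    if split_by_content_type:
        # Finer split: one site collection per content type per year.
        slug = metadata["content_type"].lower().replace(" ", "-")
        return f"/sites/archive-{slug}-{year}"
    # Default split: one site collection per year.
    return f"/sites/archive-{year}"

doc = {"content_type": "Purchase Order", "year": 2009}
print(route_document(doc))
print(route_document(doc, split_by_content_type=True))
```

Since each destination is its own site collection, each can sit in its own content database, which keeps individual databases closer to the size guidance on the Targeted Limits slide.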

Content Organization demo…

Search… A Complete Story
SharePoint 2003 • WSS CAML Only • SPS Shared Services yielded decent full text results
SharePoint 2007 • WSS 3.0 SiteDataQuery allowed search across lists/sites • MOSS Search added Managed Properties • FAST ESP for SharePoint was a late player
SharePoint 2010 • Microsoft SharePoint Foundation Search • Site Collection Scope | No Redundancy | 10 Million • Microsoft Search Server Express 2010 • Extended Features | No Redundancy | 10 Million • Microsoft SharePoint 2010 Search / Search Server • Extended Features | Scale Out | Redundancy | 100 Million • Microsoft FAST Search Server 2010 for SharePoint • Extreme Scale | Redundancy | Doc Processing Pipeline • BILLIONS of documents!
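The SharePoint 2010 product tiers on this slide can be read as a lookup from corpus size and redundancy needs to a search product. A minimal sketch, using only the item ceilings stated on the slide; the function name and the decision order are assumptions, and real product selection involves licensing and feature considerations not modeled here.

```python
ENTRY = "Microsoft SharePoint Foundation Search or Microsoft Search Server Express 2010"
STANDARD = "Microsoft SharePoint 2010 Search / Search Server"
FAST = "Microsoft FAST Search Server 2010 for SharePoint"

def pick_search_tier(item_count, needs_redundancy=False):
    """Map a corpus size to the smallest tier the slide lists as sufficient."""
    # Entry tiers: up to 10 million items, no redundancy.
    if item_count <= 10_000_000 and not needs_redundancy:
        return ENTRY
    # Scale-out tier: up to 100 million items, with redundancy.
    if item_count <= 100_000_000:
        return STANDARD
    # Beyond 100 million items, the slide points to FAST.
    return FAST

print(pick_search_tier(5_000_000))
print(pick_search_tier(5_000_000, needs_redundancy=True))
print(pick_search_tier(750_000_000))
```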

Search… A Complete Story • SharePoint Server 2010 / Search Server • Multiple Crawl Servers (Scale Out/Redundancy) • Crawl Servers comprised of stateless Crawlers • Multiple Crawlers improve crawl performance • Multiple Crawl DBs support more Crawlers • Crawl DB is separated from Property DB • Index is comprised of multiple Index Partitions that can be mirrored on different Query Servers • Multiple Index Partitions improve Query Performance

Search… A Complete Story Cool… What can it do?

Search… A Complete Story • FAST Search Server 2010 for SharePoint • Extreme Scale and Performance • Custom Relevancy and Navigation Tuning • Tune Performance for content volume, query volume, crawl pipeline performance and query speed • Uses SharePoint 2010 Query Servers • Bolts on FAST Servers for additional processing • Add server ROWS for query performance or COLUMNS for crawl performance • Can scale to support BILLIONS of items!
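The rows-and-columns idea can be sketched as a grid calculation: columns split the content, rows replicate it for query load. In the sketch below, the function name and both per-server capacity inputs (`items_per_column`, `qps_per_row`) are placeholders chosen for illustration, not vendor sizing figures.

```python
import math

def fast_topology(total_items, peak_qps, items_per_column, qps_per_row):
    """Estimate a rows x columns grid (capacity inputs are placeholders)."""
    # Columns partition the content, so they scale with item volume.
    columns = max(1, math.ceil(total_items / items_per_column))
    # Rows replicate every column, so they scale with query load.
    rows = max(1, math.ceil(peak_qps / qps_per_row))
    return {"columns": columns, "rows": rows, "servers": columns * rows}

# Hypothetical inputs: 1 billion items, 50 queries per second at peak.
print(fast_topology(1_000_000_000, 50, items_per_column=100_000_000, qps_per_row=25))
```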

10 million, 100 million, 1 Billion?

In Review… • Storage is the KEY to Performance • RBS reduces Content DB Size and facilitates large repositories • SharePoint governs end-user operations • Content Type Publishing and Content Organization help balance database loading • Search solutions now handle the entire range of corpus possibilities • 10 million is easy, 100 million can be done, 1 BILLION is theoretically possible!

More… http://www.houberg.net @rhouberg http://www.knowledgelake.com/whitepaper

Thank you sponsors!!

2 HP Netbooks • Also tons of books • 2 ThinkGeek gift cards for $100 • Telerik RadControls set • 2 licenses of Essential User Interface Studio • 1 webcast from Critical Path • Microsoft Zune
