Explore effective strategies to handle large content scenarios in SharePoint Server 2007, addressing architecture, challenges, and actionable solutions for database growth, manageability, storage, and availability. Learn how to optimize manageability, limit storage, plan for software boundaries, upgrade hardware/software, and ensure high availability. Discuss content archival strategies, database snapshots, external blob storage, and performance enhancements.
Dealing with Large Content Scenarios in SharePoint Server 2007: Architecture, Challenges, and Strategies • Abrar Chisti, Microsoft Corporation
Agenda • Overview • Manageability • Planning • Availability • Case Study • Takeaways
Content Database Growth • Use as Document Repository • Multiple versions of documents • 70-95% of size is File Stream • Storage of large Multi Media files • Lack of Governance/Site Quotas • One Large Site Collection • Lack of Planning
Is SharePoint the Right Solution? • SharePoint sites evolve organically. • Database Capacity planning is often overlooked • Limited or no Governance • One or more large content database(s) • Difficult for IT to maintain • IO Throughput and Latency are affected
Plan for Manageability • Limit Content Database Size to <= 100 GB • If Content DB Size is > 100 GB • Use Differential/Incremental Backups • SQL Server 2005/2008 • DPM 2007 • Test & Baseline IO Sub-System • Set DB Auto-growth to Fixed Value • Split Sites in Content DB to multiple Content DBs
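The 100 GB guideline on this slide lends itself to a simple automated check. The following is a minimal sketch in Python, assuming the database sizes have already been collected by some other means; the database names and sizes are made-up sample data, and `oversized_databases` is a hypothetical helper, not part of any SharePoint or SQL Server tooling.

```python
# Sketch: flag content databases that exceed the 100 GB guideline.
# The names and sizes below are invented sample data for illustration only.
SIZE_LIMIT_GB = 100  # guideline taken from the slide


def oversized_databases(sizes_gb, limit_gb=SIZE_LIMIT_GB):
    """Return the names of databases larger than limit_gb, largest first."""
    over = [(name, size) for name, size in sizes_gb.items() if size > limit_gb]
    over.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in over]


sample = {
    "WSS_Content_A": 42.0,
    "WSS_Content_B": 180.5,
    "WSS_Content_C": 100.0,  # exactly at the limit, so not flagged
    "WSS_Content_D": 310.0,
}
print(oversized_databases(sample))
```

Databases flagged this way would be the candidates for differential backups or for splitting into multiple content databases.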
How to Manage Content • Split Content Database • Move Site Collections between Databases • Move Sites into Site Collections (Re-Parent) • May need to promote sub sites to sites • May need to move site collections between web applications • Use OOB or 3rd Party Tools • stsadm -o export/import • stsadm -o backup/restore • stsadm -o mergecontentdbs • Content Deployment API (Selective)
How to Limit Storage • Document Libraries • Limit # of Versions. • Archive or Delete Old Sites • Archive or Delete Unused Sites • Impose Site Quotas • Different types of quotas – Small/Med/Large • Take into Consideration Recycle Bin • Manage Lists for Performance
Upgrade Hardware/Software • Ensure Latest SP/Patch • Use Dedicated SQL Server • Use 64 Bit Architectures and 64 Bit OS • Use MS Hardware Recommendations • Use SQL Server connection alias when you configure your farm • Increase Bus Bandwidth
Take Advantage of SQL Server 2008 Capabilities • Performance - Implement database backup compression. • Availability - Implement log stream compression. • Security – Implement Transparent Data Encryption (TDE). • Resource management – Use SQL Server 2008 Resource Governor • Be Aware of DB Migration Considerations
Content Archival/Reduction • Use Database Snapshots • Use Records Repository Implementation • Externalize (BLOB) storage
Database Snapshot • Provides “snapshot” of Content DB at given instant. • Requires Same DB Server Instance • Refers to the Original Database • Uses “Copy on write” mechanism • Need to create Separate Web App.
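The "copy on write" behavior can be illustrated with a toy model. The sketch below is only a conceptual illustration in Python, not how SQL Server implements snapshots internally; the `Database` and `Snapshot` classes and the page contents are hypothetical.

```python
# Toy copy-on-write model: a snapshot shares pages with the source database
# and keeps its own copy of a page only when that page is first modified.
class Snapshot:
    def __init__(self, source_pages):
        self._source = source_pages  # shared reference to the live pages
        self._saved = {}             # original contents of pages changed later

    def before_write(self, page_no):
        # Preserve the original page the first time it is about to change.
        if page_no not in self._saved:
            self._saved[page_no] = self._source[page_no]

    def read(self, page_no):
        # Changed pages come from the saved copy, unchanged ones from the source.
        if page_no in self._saved:
            return self._saved[page_no]
        return self._source[page_no]


class Database:
    def __init__(self, pages):
        self.pages = list(pages)
        self.snapshots = []

    def create_snapshot(self):
        snap = Snapshot(self.pages)
        self.snapshots.append(snap)
        return snap

    def write(self, page_no, data):
        for snap in self.snapshots:
            snap.before_write(page_no)
        self.pages[page_no] = data


db = Database(["a", "b", "c"])
snap = db.create_snapshot()
db.write(1, "B")
print(db.pages[1], snap.read(1), snap.read(0))
```

The model also shows why a snapshot depends on the original database: unchanged pages are read from the source, so the snapshot is not a standalone backup.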
Remote/External Blob Storage • Reduce Storage Costs • External Blob Storage API • Remote Blob Storage API • SQL Server 2008 has support for RBS • Can write BLOB directly using RBI • http://blogs.msdn.com/sqlrbs/
External Blob Based Solution • BLOB IO is moved to Web Front End • Supports Compression and Encryption Capability
Plan for Software Boundaries • Bottom Up Approach • Plan for SQL Storage • SharePoint Performance Recommendations • # of Site Collections/Content DB • 50,000 • # of Site Collections/Web Application • 150,000 Site Collections • 100 Content DBs Per Web Application • Use Multiple SQL Servers for Higher Scalability
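The boundary figures on this slide can be turned into a quick planning check. This is a rough sketch in Python that uses only the three numbers quoted above; `plan_content_dbs` and its `target_per_db` parameter are hypothetical names introduced for illustration.

```python
import math

# Boundary figures quoted on the slide.
MAX_SITE_COLLECTIONS_PER_DB = 50_000
MAX_SITE_COLLECTIONS_PER_WEB_APP = 150_000
MAX_CONTENT_DBS_PER_WEB_APP = 100


def plan_content_dbs(site_collections, target_per_db):
    """Return how many content DBs one web application needs.

    Raises ValueError if the plan would exceed a boundary from the slide.
    """
    if site_collections > MAX_SITE_COLLECTIONS_PER_WEB_APP:
        raise ValueError("too many site collections for one web application")
    if not 0 < target_per_db <= MAX_SITE_COLLECTIONS_PER_DB:
        raise ValueError("target per database is outside the supported range")
    dbs = math.ceil(site_collections / target_per_db)
    if dbs > MAX_CONTENT_DBS_PER_WEB_APP:
        raise ValueError("plan needs more than 100 content databases")
    return dbs


print(plan_content_dbs(120_000, 2_000))  # 60 content databases
```

A plan such as 150,000 site collections at 1,000 per database would need 150 databases and is rejected, which signals that the content should be spread over more than one web application.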
Storage Architecture • Use Appropriate Disk and SAN interface • SCSI vs IDE vs SATA vs SAS • Consideration: Hot Swap, Multiple IO, Speed, Capacity, Protocol • Use Appropriate Disks and RAID Arrays • Faster Disks/Arrays • Separate Disks for TempDB, ContentDB, and Trans Logs • Multiple Data Files for Large Content and Search DBs • Distribute files across Disks
Content Database Allocation • SharePoint Allocation of Content DBs • Pre-Allocate Pool of DBs • Round Robin Scheme between DBs • Based on Delta between Max sites and Current sites • Example • Site Collection Per Database • Create Database with 100 GB (using ALTER DATABASE Command) • Leverage Managed Paths
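The allocation rule on this slide (choose a database based on the delta between its maximum and current site counts) can be sketched as follows. This Python sketch is a simplified model, not SharePoint's actual code; the dictionary fields, the `pick_database` helper, and the tie-breaking rule (first database in the list wins) are assumptions made for illustration.

```python
def pick_database(databases):
    """Pick the database with the largest (max_sites - current_sites) delta.

    Returns None when every database is full. Ties go to the first
    database in the list, which is an assumption of this sketch.
    """
    open_dbs = [db for db in databases
                if db["max_sites"] - db["current_sites"] > 0]
    if not open_dbs:
        return None
    return max(open_dbs, key=lambda db: db["max_sites"] - db["current_sites"])


# A pre-allocated pool of three empty databases with equal limits.
pool = [
    {"name": "DB1", "max_sites": 3, "current_sites": 0},
    {"name": "DB2", "max_sites": 3, "current_sites": 0},
    {"name": "DB3", "max_sites": 3, "current_sites": 0},
]

order = []
for _ in range(6):
    chosen = pick_database(pool)
    chosen["current_sites"] += 1  # a new site collection lands here
    order.append(chosen["name"])
print(order)  # ['DB1', 'DB2', 'DB3', 'DB1', 'DB2', 'DB3']
```

With equal limits the rule behaves like a round robin across the pool. In this model, setting `max_sites` to 1 on each database gives one site collection per database, which is one way to read the slide's example.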
Clustering • SAN or Shared Disks • Use Windows/SQL Clustering for HA • Dedicated Disks or DAS • Use SQL Server Mirroring
Redundancy across Data Centers • Log Shipping • Synchronous Mirroring • Asynchronous Mirroring • SQL Server 2008 Log Compression
Monitoring • Processor: % Processor Time: _Total. On the computer that is running SQL Server, this counter should be kept between 50 percent and 75 percent. • System: Processor Queue Length: (N/A). Monitor this counter to ensure that it remains below 2 x the number of CPU cores. • Memory: Available MBytes: (N/A). Monitor this counter to ensure that at least 20 percent of the total physical RAM remains available. • Memory: Pages/sec: (N/A). Monitor this counter to ensure that it remains below 100.
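These thresholds can be applied to collected counter samples with a few comparisons. The sketch below is a hedged Python illustration: it assumes the counter values were exported from Perfmon by some other means, it reads the "2 x # of core CPUs" figure as an upper bound for the queue length, and it only flags processor time above 75 percent. The dictionary keys and the `check_counters` helper are hypothetical.

```python
def check_counters(sample, cores, total_ram_mb):
    """Return a list of warnings for counters outside the slide's thresholds."""
    warnings = []
    if sample["processor_time_pct"] > 75:
        warnings.append("% Processor Time above 75 percent")
    if sample["processor_queue_length"] >= 2 * cores:
        warnings.append("Processor Queue Length at or above 2 x cores")
    if sample["available_mbytes"] < 0.20 * total_ram_mb:
        warnings.append("Available MBytes below 20 percent of physical RAM")
    if sample["pages_per_sec"] >= 100:
        warnings.append("Pages/sec at or above 100")
    return warnings


# Invented sample values for a 4-core server with 32 GB of RAM.
busy = {
    "processor_time_pct": 82,
    "processor_queue_length": 9,
    "available_mbytes": 4096,
    "pages_per_sec": 40,
}
for line in check_counters(busy, cores=4, total_ram_mb=32768):
    print(line)
```

For the sample above, the check reports the processor time, queue length, and available memory counters, while Pages/sec stays within its limit.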
Disk Counters • Logical Disk: Disk Transfers/sec • Logical Disk: Disk Read Bytes/sec & Disk Write Bytes/sec • Logical Disk: Avg. Disk sec/Read (Read Latency) and Avg. Disk sec/Write (Write Latency) • Logical Disk: Avg. Disk Bytes/Read and Avg. Disk Bytes/Write • Physical Disk: % Disk Time • Logical Disk: Current Disk Queue Length • Logical Disk: Average Disk Reads/Sec and Logical Disk
Performance Monitoring • Perfmon • Analyze Logs using codeplex tools • Favorite Web Monitoring (3rd Party) solution. • System Center Operations Manager (SC-OM) • SharePoint Monitoring Toolkit • http://blogs.msdn.com/sharepoint/archive/2007/12/10/announcing-new-system-center-operations-manager-2007-packs-for-wss-3-0-and-moss-2007.aspx
Case Study Large Automotive Loan Origination Application
Large Storage Scenario (Phase I) • Ability to house 10.5 million content items (1+ TB). • System input with "normal" input load, defined as 27,000 documents per day (1 day = 10 hours). • Simulate user load to represent 200 users simultaneously accessing the system to: • Use search to find elements of document metadata. • View a document (scanned TIFF image). • Update elements of document metadata.
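The Phase I figures imply a few useful derived numbers. The following Python sketch works only from the values on the slide and treats "1+ TB" as a lower bound of 1 TB in decimal units, which is an assumption.

```python
# Figures from the Phase I slide.
DOCS_PER_DAY = 27_000
HOURS_PER_DAY = 10
ITEMS = 10_500_000
TOTAL_BYTES = 1 * 1000**4  # "1+ TB" taken as a 1 TB lower bound (assumption)

docs_per_hour = DOCS_PER_DAY / HOURS_PER_DAY   # 2,700 documents per hour
docs_per_second = docs_per_hour / 3600         # 0.75 documents per second
avg_item_kb = TOTAL_BYTES / ITEMS / 1000       # roughly 95 KB per item
days_to_load = ITEMS / DOCS_PER_DAY            # roughly 389 ten-hour days

print(f"{docs_per_hour:.0f} docs/hour, {docs_per_second:.2f} docs/sec")
print(f"average item size of at least {avg_item_kb:.1f} KB")
print(f"{days_to_load:.0f} days to reach {ITEMS:,} items at the normal load")
```

At the "normal" input rate, reaching 10.5 million items would take more than a year of ten-hour days, so a test corpus of that size would presumably need to be bulk loaded rather than fed at the normal rate.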
Phase II • Ability to house 50 million content items (5+TB). • 35 million TIFF images. • 15 million Microsoft Office documents • Determine the maximum number of users the solution could support. • Users perform the following tasks: • Use search to find elements of document content (full-text) and metadata. • View a document (scanned TIFF image or Microsoft Office document).
Architectural Overview Logical Architecture – Phase I
Takeaways • Optimize Performance • Planning & Monitoring • Plan for Scale • Plan for Availability • Plan for Manageability
References • SQL Server Database Optimization • http://technet.microsoft.com/en-us/library/cc263261.aspx • Plan for Software Boundaries • http://technet.microsoft.com/en-us/library/cc262787.aspx • Move Site Collections to new Content Database • http://technet.microsoft.com/en-us/library/cc825328.aspx • Enable SharePoint 2010 to Use Remote BLOB Storage • http://technet.microsoft.com/en-us/library/ee748641(office.14).aspx/ • Content Deployment API (PRIME) • http://msdn.microsoft.com/en-us/library/cc264073.aspx • Integration of SQL Server 2008 and SharePoint • http://msdn.microsoft.com/en-us/library/cc264073.aspx • Use Database Snapshots for Archiving Sites • http://technet.microsoft.com/en-us/library/cc706872.aspx • Configure Availability in SharePoint Farm • http://technet.microsoft.com/en-us/library/dd207311.aspx • Case Study for Large Content Scenario • http://technet.microsoft.com/en-us/library/cc262067.aspx • Scaling Storage Architecture • http://www.knowledgelake.com/whitepaper/Scaling%20SharePoint%202007%20-%20Storage%20Architecture.pdf
Tools Availability • SPUsedSpaceInfo • SPSiteInfo • Content Deployment Wizard • Migrate from other source systems. • Other tools in CodePlex • 3rd Party • Metalogix, Quest, Tzunami, AvePoint, StoragePoint, KnowledgeLake