
So you think you can crawl? Stretching the Boundaries of SharePoint 2013!

Petter Skodvin-Hvammen, AD-Gruppen, Norway (www.adgruppen.no). Solutions Architect, SharePoint Consultant, Search Enthusiast, Community Lead. @pettersh, psh@adgruppen.no.

Presentation Transcript


  1. So you think you can crawl? Stretching the Boundaries of SharePoint 2013! Petter Skodvin-Hvammen AD-Gruppen, Norway

  2. Who am I? Petter Skodvin-Hvammen (www.adgruppen.no) • Solutions Architect • SharePoint Consultant • Search Enthusiast • Community Lead • @pettersh, psh@adgruppen.no. Image caption: Oseberg ship, discovered 1904 in Tønsberg, Norway; buried by Vikings in 834 AD.

  3. Enterprise Search Challenges and Solutions • Index thousands of sources • Automate index management • Infrastructure sizing. Not included: code/scripts, user experience, relevancy, governance. www.sharepointeurope.com

  4. The Mission… Enterprise Search using SharePoint Server 2013 • 30,000 users • 85 locations in 30 countries • 15,000 daily searches • 100,000,000 documents (?) • 60 core systems, 2,000 applications

  5. What do we index? • 100,000,000 documents • 500 servers • 3,000 file shares

  6. Where is the data? • Datacenters • Time zones • Bandwidth

  7. How can we get it? • Limit bandwidth usage for specific server locations • Limit crawler impact within local business hours • Grant read access to crawler per file share • Avoid token bloat issues with more than 1,015* groups per account. * http://blogs.technet.com/b/shanecothran/archive/2010/07/16/maxtokensize-and-kerberos-token-bloat.aspx

  8. How do we operate it? • File shares are created, changed, and deleted every day using a custom self service solution • File shares are moved between servers every day by automation rules • Manage indexing and crawling of each file share with minimum manual effort

  9. What can SharePoint do?
     • Max 50 content sources per service application (max 500 with October 2013 CU installed)
     • Max 100 start addresses per content source (max 500 with October 2013 CU installed)
     • Max 20 concurrent crawls per service application (limitation has been removed)
     Source: http://technet.microsoft.com/en-us/library/cc262787(v=office.15).aspx#Search
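As a rough check of how these limits relate to the 3,000 file shares on slide 5, the sketch below computes the minimum number of content sources needed if every file share were its own start address. The function name and the labels "before CU" and "October 2013 CU" are illustrative; only the limit values come from the slide.

```python
import math

def content_sources_needed(start_addresses: int, max_per_source: int) -> int:
    """Minimum number of content sources needed to hold all start addresses."""
    if max_per_source <= 0:
        raise ValueError("max_per_source must be positive")
    return math.ceil(start_addresses / max_per_source)

FILE_SHARES = 3000  # from slide 5

# Limits from slide 9: (max content sources, max start addresses per source).
LIMITS = {
    "before CU": (50, 100),
    "October 2013 CU": (500, 500),
}

for name, (max_sources, max_addresses) in LIMITS.items():
    needed = content_sources_needed(FILE_SHARES, max_addresses)
    print(f"{name}: {needed} content sources needed, limit is {max_sources}")
```

Under these assumptions, 3,000 shares need 30 content sources at 100 start addresses each, or 6 at 500 each.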

  10. It’s complicated • More data than we have space for • It’s located all over the place • Everything changes all of the time • There are limitations in SharePoint • Someone’s gotta maintain this • It has to be secure and relevant

  11. What did we do? • Created logical groups of file shares • Used symbolic linking. Result: fewer content sources.
      Start address: \\file00\share
      Symbolic links:
      • \\file00\share\sym01 -> \\file01\share01
      • \\file00\share\sym02 -> \\file02\share03
      • \\file00\share\sym03 -> \\file03\share03
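The linking scheme on this slide can be sketched as follows. This is a minimal illustration, not the presenter's timer job: the function names and the `symNN` numbering are assumptions modeled on the slide's example paths. `create_symlinks` is defined but not called, because it needs Windows and an account allowed to create symbolic links.

```python
import os

def plan_symlinks(hub_share: str, targets: list[str]) -> list[tuple[str, str]]:
    """Pair each target share with a numbered link path under the hub share."""
    hub = hub_share.rstrip("\\")
    return [
        (hub + "\\" + f"sym{i:02d}", target)
        for i, target in enumerate(targets, start=1)
    ]

def create_symlinks(plan: list[tuple[str, str]]) -> None:
    """Create directory links for a plan; skips links that already exist."""
    for link, target in plan:
        if not os.path.lexists(link):
            os.symlink(target, link, target_is_directory=True)

# Example paths taken from the slide.
plan = plan_symlinks(
    r"\\file00\share",
    [r"\\file01\share01", r"\\file02\share03", r"\\file03\share03"],
)
for link, target in plan:
    print(link, "->", target)
```

With this layout the crawler only needs `\\file00\share` as a start address; the links underneath it fan out to the real shares.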

  12. What did we do? • Grouped file shares based on region • One content source per region • Incremental crawls every night. Result: crawling based on time zones.
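To illustrate crawling by time zone, the sketch below converts one local nightly start time into the UTC start for each regional content source. The region names, UTC offsets, and the 22:00 start are hypothetical; the slides do not state them. Fixed offsets are used for simplicity, so daylight saving time is ignored.

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical regions and fixed UTC offsets (hours); not taken from the slides.
REGION_UTC_OFFSET_HOURS = {"europe": 1, "americas": -5, "asia": 8}
LOCAL_CRAWL_START = time(22, 0)  # hypothetical local start of the nightly crawl

def crawl_start_utc(region: str, local_date: date) -> datetime:
    """Return the UTC instant at which the region's nightly crawl starts."""
    tz = timezone(timedelta(hours=REGION_UTC_OFFSET_HOURS[region]))
    local_start = datetime.combine(local_date, LOCAL_CRAWL_START, tzinfo=tz)
    return local_start.astimezone(timezone.utc)

for region in REGION_UTC_OFFSET_HOURS:
    print(region, crawl_start_utc(region, date(2014, 5, 5)).isoformat())
```

Each region's crawl starts at the same local hour, so the starts are spread across the UTC day instead of all landing at once.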

  13. What did we do? • Created a DNS alias per impact rule in etc/hosts on the crawl servers. Result: reduced crawler impact.
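The slide does not show the hosts file itself. As a sketch of the idea, the snippet below generates one hosts entry per impact-rule alias, all pointing at the same server, so that a separate crawler impact rule can be attached to each host name. The alias `lowbandwidth` and the IP address (from the documentation range 192.0.2.0/24) are hypothetical; `default` appears as a host name in the symlink on slide 17.

```python
def hosts_entries(server_ip: str, impact_aliases: list[str]) -> list[str]:
    """Return hosts-file lines that map every alias to the same server IP."""
    return [f"{server_ip}\t{alias}" for alias in impact_aliases]

# Hypothetical values for illustration only.
entries = hosts_entries("192.0.2.10", ["default", "lowbandwidth"])
print("\n".join(entries))
```

On Windows the hosts file normally lives at `%SystemRoot%\System32\drivers\etc\hosts`.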

  14. What did we do? • Granted file share access to the account included in the fewest groups • Monitored group memberships • Grouped file shares by crawl account • Crawl rules matched folder structure. Result: managed pool of crawl accounts.
      Crawl rules:
      • file://.*/spcrwl01/.* (Include), account SP\spcrwl01
      • file://.*/spcrwl02/.* (Include), account SP\spcrwl02
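A minimal sketch of the account selection and rule matching described here, assuming that granting access adds one group membership to the chosen account. The function names and group counts are hypothetical; the 1,015-group limit is the one cited on slide 7, and the rule pattern follows the slide's examples.

```python
import re

MAX_GROUPS = 1015  # token-bloat limit cited on slide 7

def pick_crawl_account(group_counts: dict[str, int]) -> str:
    """Pick the account with the fewest groups that can still take one more."""
    candidates = {a: n for a, n in group_counts.items() if n < MAX_GROUPS}
    if not candidates:
        raise RuntimeError("no crawl account has room for another group")
    return min(candidates, key=candidates.get)

def crawl_rule_pattern(account: str) -> str:
    """Build the include rule for an account, e.g. file://.*/spcrwl01/.*"""
    short_name = account.split("\\")[-1]
    return f"file://.*/{short_name}/.*"

# Hypothetical group counts.
counts = {"SP\\spcrwl01": 240, "SP\\spcrwl02": 180}
account = pick_crawl_account(counts)
pattern = crawl_rule_pattern(account)
url = "file://default/europe/default/spcrwl02/e5dc12a41d/report.docx"
print(account, pattern, bool(re.fullmatch(pattern, url)))
```

Because the account name is a folder in the path, one rule per account is enough to select the credentials for every share linked under it.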

  15. The bigger picture
      • Folder structure: <content source>/<crawler impact>/<crawl account>/<symbolic link>
      • Start addresses: file://<crawler impact>/<content source>/<crawler impact>
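The two templates can be read as simple string builders. The sketch below is one interpretation, in which the crawler-impact alias doubles as the host name; that reading is an assumption based on the symlink shown on slide 17, and the function names are illustrative.

```python
def folder_path(content_source: str, crawler_impact: str,
                crawl_account: str, symlink: str) -> str:
    """<content source>/<crawler impact>/<crawl account>/<symbolic link>"""
    return "/".join([content_source, crawler_impact, crawl_account, symlink])

def start_address(crawler_impact: str, content_source: str) -> str:
    """file://<crawler impact>/<content source>/<crawler impact>"""
    return f"file://{crawler_impact}/{content_source}/{crawler_impact}"

def symlink_unc(content_source: str, crawler_impact: str,
                crawl_account: str, symlink: str) -> str:
    """UNC form of the link, with the impact alias as host (assumption)."""
    path = folder_path(content_source, crawler_impact, crawl_account, symlink)
    return "\\\\" + crawler_impact + "\\" + path.replace("/", "\\")

print(start_address("default", "europe"))
print(symlink_unc("europe", "default", "spcrwl01", "e5dc12a41d"))
```

With the values from slide 17, the second call reproduces the symlink `\\default\europe\default\spcrwl01\e5dc12a41d`.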

  16. How did we manage this? AUTOMATION
      • Self service portal for enabling indexing of file shares
      • Custom web service integration in self service portal
      • Custom timer job to get list of file shares to crawl from self service portal
      • Custom timer job for creating and removing symbolic links
      • Custom solution for granting access to crawl accounts
      • Custom lists for mapping server to content source, schedule and impact; shares to crawl accounts and metadata; UNC to symlink
      • Content enrichment service for replacing symlinks in paths with actual file paths
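The last item, replacing symlinks in indexed paths with the actual file paths, comes down to a prefix rewrite driven by the UNC-to-symlink mapping. The sketch below shows only that rewrite, not the content enrichment service contract; the function name is hypothetical and the example mapping uses the values from slide 17.

```python
def resolve_path(path: str, symlink_map: dict[str, str]) -> str:
    """Replace a leading symlink path with the real UNC path it points to.

    Matching is case-insensitive, as Windows paths are. Assumes paths where
    lowercasing preserves length (true for ASCII).
    """
    lowered = path.lower()
    # Try the longest links first so the most specific mapping wins.
    for link in sorted(symlink_map, key=len, reverse=True):
        prefix = link.lower().rstrip("\\")
        if lowered == prefix or lowered.startswith(prefix + "\\"):
            return symlink_map[link].rstrip("\\") + path[len(prefix):]
    return path  # not under any known symlink; leave unchanged

# Mapping taken from the custom list example on slide 17.
SYMLINKS = {r"\\default\europe\default\spcrwl01\e5dc12a41d": r"\\file01\share01"}

crawled = r"\\default\europe\default\spcrwl01\e5dc12a41d\docs\plan.docx"
print(resolve_path(crawled, SYMLINKS))
```

Search results then show the path users know (`\\file01\share01\...`) rather than the crawler's internal link structure.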

  17. Example: Self Service Portal • Title: European SharePoint Conference • Owner: Petter Skodvin-Hvammen • Business Area: Consulting • Classification: Internal • Type: Project • UNC Path: Assigned automatically • Crawl Account: Assigned automatically • Buttons: Save, Cancel
      Example: Custom Lists • Title: European SharePoint Conference • Owner: Petter Skodvin-Hvammen • Business Area: Consulting • Classification: Internal • Type: Project • UNC Path: \\file01\share01 • Crawl Account: SP\spcrawl01 • Symlink: \\default\europe\default\spcrwl01\e5dc12a41d • Location: europe (server file01 is located in Oslo DC) • Bandwidth: 5 Mbps

  18. [Architecture diagram. Labels: 40 Million Documents, 10 Queries / Second. Components shown: WFE, Query, Admin, Caching, Index-0 to Index-3, Doc Proc, Enrichment, Crawling, Analytics, Central Admin. SQL Server: Admin DB, Analytics DB, Crawl DB, Link DB, other SP DBs.]

  19. Capacity testing
      Purpose: • Crawling of symbolic links • Scaling of virtual machines • Sizing of disk space • Verify Microsoft’s advice
      Approach: • 4 server farm with 2 partitions • 8 vCPU, 16 GB RAM, 850 GB • Crawl 10 file shares (3.7M files) • Replay top 300 queries • Apache JMeter

  20. Capacity testing – findings • Crawl rate declined 1% per million items indexed • Query latency increased exponentially from 12 million items indexed per partition • Database latency was insignificant during crawling • Successfully crawled file shares via symbolic directory links • Disk space usage was significantly… lower than expected • Reduced data volume from 850 GB to 450 GB • 40+ servers => huge cost savings
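One way to use the latency finding is as a ceiling on items per index partition. The sketch below computes how many partitions the 100,000,000 documents from slide 4 would need under such a ceiling. Treating 12 million as a hard per-partition cap is an assumption made here for illustration, not a sizing rule stated in the slides.

```python
import math

def partitions_needed(total_items: int, max_items_per_partition: int) -> int:
    """Minimum number of index partitions that keeps each one under the cap."""
    if max_items_per_partition <= 0:
        raise ValueError("max_items_per_partition must be positive")
    return math.ceil(total_items / max_items_per_partition)

TOTAL_DOCUMENTS = 100_000_000     # from slide 4 (marked "(?)" there)
ITEMS_PER_PARTITION = 12_000_000  # point where latency rose sharply (slide 20)

print(partitions_needed(TOTAL_DOCUMENTS, ITEMS_PER_PARTITION))
```

A lower ceiling raises the count quickly: at 10 million items per partition the same corpus needs 10 partitions.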

  21. Infrastructure – VM sizing
      Dedicated ESX cluster: • 14 x VM for SharePoint 2013 • 4 physical machines • 4 x 32 = 128 CPUs • 4 x 256 = 1024 GB memory
      HA max utilization = ¾: • 3 x 32 = 96 CPUs • 3 x 256 = 768 GB memory
      CPU and memory can be over-committed: • CPU over-committed by a factor of 1.34 (1.78 if one physical host fails) • VMs must wait for physical CPU; wait time for an 8-vCPU VM = 2 x that of a 4-vCPU VM • Mitigation: reduce allocated virtual CPUs, or increase physical CPUs • Memory factor 0.44 (0.59) • Reserved and locked memory prevents HA failover
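The over-commit factors follow from dividing what is allocated to the VMs by what the hosts physically provide. The slide does not state the allocated totals; the values below (171 vCPUs, 450 GB) are back-calculated assumptions chosen because they reproduce the slide's factors, and the function name is illustrative.

```python
def commit_factor(allocated: float, hosts: int, per_host: float) -> float:
    """Ratio of resources allocated to VMs over physical resources available."""
    return allocated / (hosts * per_host)

# Physical capacity per host, from the slide.
CPUS_PER_HOST = 32
MEMORY_GB_PER_HOST = 256

# Assumed allocations (not on the slide; chosen to match its factors).
ALLOCATED_VCPUS = 171
ALLOCATED_MEMORY_GB = 450

for hosts in (4, 3):  # all hosts up, then one host failed
    cpu = commit_factor(ALLOCATED_VCPUS, hosts, CPUS_PER_HOST)
    mem = commit_factor(ALLOCATED_MEMORY_GB, hosts, MEMORY_GB_PER_HOST)
    print(f"{hosts} hosts: CPU factor {cpu:.2f}, memory factor {mem:.2f}")
```

A factor above 1 means VMs compete for the resource; the memory factors stay below 1 even with one host down, while CPU is over-committed in both cases.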

  22. Infrastructure – VM tuning Peak and average CPU usage is calculated over 30 days

  23. Summary • Indexing thousands of content sources • Automation for rapidly changing index requirements • Sizing the infrastructure for performance and HA

  24. Questions? @pettersh petter.skodvin-hvammen@adgruppen.no http://linkedin.com/in/petterskodvin
