1 / 48

Search Topology and Optimization

April 12, 2013. Search Topology and Optimization. Mike Maadarani SharePoint Architect. Bio. Mike Maadarani App Dev and Architecture for over 18 years (15 Years Microsoft, 3 Years with the “Other Guys”) Business focused on Enterprise Content Management, Publishing Sites, & Search

pooky
Download Presentation

Search Topology and Optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. April 12, 2013 Search Topology and Optimization Mike Maadarani SharePoint Architect

  2. Bio.. • Mike Maadarani • App Dev and Architecture for over 18 years (15 Years Microsoft, 3 Years with the “Other Guys”) • Business focused on Enterprise Content Management, Publishing Sites, & Search • Technology focused on SharePoint, SQL Server and SharePoint Integration • Architect, trainer, and presenter • Blog: www.maadarani.com • mike@maadarani.com; @mikemaadarani

  3. Architecture and Resource Utilization Agenda Configuring SSA and PS Topology Scenarios Relevancy, Query Builder, & Optimization SharePoint 2013 Search Overview Hybrid… Say What? Closing and Q&A

  4. Search in 2010 SharePoint 2010 Search Service Application WFE Query Component Query Engine User Property Store (SQL) Crawl Indexing Engine Crawl Component Search Admin Content

  5. FAST Search for SharePoint 2010 FAST back-end components (managed separately) FAST Content SSA FAST Query SSA WFE Query Engine Query Pipeline • Extensibility: • Sandbox • Entity Extraction User Indexing Engine Content Pipeline Crawl Analysis Engine Search Admin Content

  6. … In SharePoint 2013 Entire index on local disk SharePoint 2013 Search Service Application • Extensibility: • Web callout • Entity Extraction Index Component WFE Query Engine Query Pipeline User Crawl Component Content Processing Component Query Processing Component ContentPipeline Indexing Engine Crawl Admin Component Analytics Processing Component Analysis Engine Search Admin Content Link/query analysis & recommendations Separate crawl and indexing Property Store (SQL)

  7. SharePoint 2013 Search Architecture Public API Search topology components Content Query ContentProcessing Index FAST Search Index QueryProcessing WFE Crawl API Content User HTTP File shares SharePoint User profiles Lotus Notes Documentum Exchange folders Custom - BCS SharePoint SP Apps Devices Non-SP UX Analytics Processing Crawl Link Search Admin Analytics Reporting SearchAdmin

  8. Why Search is so important? FAST I just uploaded a document. Make it searchable, quick!

  9. Why Search is so important? EASY

  10. Why Search is so important? EASY

  11. Why Search is so important? Search Driven Applications

  12. Why Search is so important? Search Everything I can find ALL of Rob Ford’s hidden videos!

  13. Where does Search live in the farm? Windows services SharePoint Search Host Controller service Runtime/lifecycle control of search components (except crawler)  hostcontrollerservice.exe SharePoint Server Search service Crawl Component mssearch.exe mssdmn.exe Processes Noderunner.exe Runtime environment for search components (except crawler) Admin entities Search Service Instance: Provisioning of the search service on each box Search Service Application: SharePoint Configuration entity Still there, but only Crawl Component hostcontrollerservice.exe msseearch.exe mssdmn.exe Search Runtime Environment Host Controller CrawlComponent SharePoint App Server noderunner.exe noderunner.exe noderunner.exe noderunner.exe noderunner.exe AdminComponent Query ProcessingComponent Content ProcessingComponent IndexComponent Analytics ProcessingComponent

  14. Where do I host my components?

  15. Query processing component (QPC) CPU load Driving factors QPS Query transformations Network load Driving factors Number of index partitions Size of queries and results Example: 20 index partitions @ 20 qps=> 200/100 Mbit/s in/outbound http://social.technet.microsoft.com/wiki/contents/articles/16002.sharepoint-2013-capacity-planning-sizing-and-high-availability-for-search-in-spc172.aspx

  16. Index component CPU load Driving factors QPS and item count Guidelines per index component @ 2 GHz CPU 1M items: 5 QPS per CPU core 5M items: 2 QPS per CPU core 10M items: 1 QPS per CPU core Disk load Driving factors QPS and item count New content invalidates caches Disk size: 500GB @ 10M items per index component

  17. Crawl component CPU load Driving factors Documents per second Link discovery Crawl management Network load Driving factors Downloading items from content sources Passing items on to CPC Disk load All documents are temporarily stored in data folder

  18. Content processing component (CPC) CPU load Driving factors Documents per second Document size and complexity Feature extraction Estimate: 5-10 DPS per CPU core Network load Driving factors Documents per second Document size

  19. Analytics processing component (APC) CPU load Driving factors Number of items Site activity Disk load Local disk used for temporary storage Bulk load, primacy concern is load isolation Network load Same as for CPU load PLUS: Network traffic increases when distributing APC across multiple machines

  20. Search administration component Low CPU and network load Load increase with more components in the search topology

  21. Create your SSA $SSADB = "SharePoint_Demo_Search" $SSAName = "Search Service Application SPS Boston" $SVCAcct = "mcm\sp_search" $SSI = get-spenterprisesearchserviceinstance -local #1. Start the search services for SSI Start-SPEnterpriseSearchServiceInstance -Identity $SSI #2. Create the Application Pool $AppPool = new-SPServiceApplicationPool -name $SSAName"-AppPool" -account $SVCAcct #3. Create the search application and set it to a variable $SearchApp = New-SPEnterpriseSearchServiceApplication -Name $SSAName -applicationpool $AppPool -databaseserver SQL2012 -databasename $SSADB #4. Create search service application proxy $SSAProxy = new-SPEnterpriseSearchServiceApplicationProxy -name $SSAName" Application Proxy" -Uri $SearchApp.Uri.AbsoluteURI #5. Provision Search Admin Component Set-SPEnterpriseSearchAdministrationComponent -searchapplication $SearchApp -searchserviceinstance $SSI #6. Create the topology $Topology = New-SPEnterpriseSearchTopology -SearchApplication $SearchApp #7. Assign server(s) to the topology $hostApp1 = Get-SPEnterpriseSearchServiceInstance -Identity "SPWFE" New-SPEnterpriseSearchAdminComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1 New-SPEnterpriseSearchCrawlComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1 New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1 New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1 New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1 New-SPEnterpriseSearchIndexComponent -SearchTopology $Topology -SearchServiceInstance $hostApp1 –IndexPartition 0 #8. Create the topology $Topology | Set-SPEnterpriseSearchTopology

  22. Small Search Topology

  23. Fault tolerant small search topology

  24. Small search farm (up to 10M items) Separate disk for index Sized independently Resources @ 10M items 8x CPU cores 24 GB RAM 800 GB disk

  25. Scaling from small to medium search topology Adm Adm CPC Index Index QPC Index Index Index Index Index Index CPC CPC CPC Crawl Crawl APC QPC APC

  26. Extend your SSA #2. Extend the Search Topology: $hostApp1 = Get-SPEnterpriseSearchServiceInstance -Identity "SPWFE" $hostApp2 = Get-SPEnterpriseSearchServiceInstance -Identity "SPSearch" Start-SPEnterpriseSearchServiceInstance -Identity $hostApp1 Start-SPEnterpriseSearchServiceInstance -Identity $hostApp2 #3. Keep running this command until the Status is Online:Get-SPEnterpriseSearchServiceInstance-Identity $hostApp1 Get-SPEnterpriseSearchServiceInstance-Identity $hostApp2 #4. Once the status is online, you can proceed with the following commands:$ssa = Get-SPEnterpriseSearchServiceApplication$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa–Active $newTopology= New-SPEnterpriseSearchTopology -SearchApplication $ssa New-SPEnterpriseSearchAdminComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1 New-SPEnterpriseSearchCrawlComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1 New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1 New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1 New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1 New-SPEnterpriseSearchIndexComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp1 –IndexPartition 0 New-SPEnterpriseSearchAdminComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2 New-SPEnterpriseSearchCrawlComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2 New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2 New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2 New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2 New-SPEnterpriseSearchIndexComponent -SearchTopology $newTopology -SearchServiceInstance $hostApp2 –IndexPartition 1 #5. Activate the topology: Set-SPEnterpriseSearchTopology -Identity $newTopology

  27. Medium Search Topology

  28. Hybrid Search

  29. Why Hybrid Search? Hybrid SharePoint environment Pieces of content distributed across multiple environments Complexity due to multiple locations Many top level domains requiring knowledge of where to go to locate the most relevant content No single Enterprise Search Center for finding content Lost user productivity and added frustration while trying to locate relevant content

  30. Benefits Provide integrated search results allowing for a single place to find content One Enterprise Search center to reduce User Interface complexity Query all of your SharePoint content at the same time Allow O365 and On-Premises solutions to coexist Provides a solution allowing customers to move to the cloud on their own terms Reduce operation cost Take advantage of newer SharePoint feature updates in O365 Hybrid search solves many problems as data is moving from on-premises to O365

  31. One-way outbound topology Microsoft data center On-premises SharePoint Server 2013 Farm Office365 tenant SharePoint Online Internet Outbound Hybrid search results Local search results only Inbound Site collection WFE SharePoint Server can querySharePoint Online SharePoint Online can NOT query SharePoint On-prem

  32. One-way inbound topology Microsoft data center On-premises DMZ Office365 tenant SharePoint Server 2013 Farm SharePoint Online Internet Reverse Proxy Outbound Local search results only Hybrid search results Inbound Site collection WFE SharePoint Server can NOT query SharePoint Online SharePoint Online can query SharePoint On-prem

  33. One-way inbound topology Microsoft data center On-premises DMZ Office365 tenant SharePoint Server 2013 Farm SharePoint Online Internet Reverse Proxy Outbound Local search results only Hybrid search results Inbound Site collection WFE SharePoint Server can query SharePoint Online SharePoint Online can query SharePoint On-prem

  34. Tweaking Your results

  35. Challenges: Intent Where is my talk Project Plan? Infrastructure Project Are Documents held at the same place? I wonder if there are references from previous projects? There is rarely a single right answer Different people have different intents Query Rules help you handle intents

  36. Authorities: SSA-level configuration Takes ~24hrs to propagate Sites that are important Sites with low intrinsic relevance

  37. Authorities: Connected

  38. Authorities: Connected Setting an authority affects all sites connected through hyperlinks 2 1 1 0 4 3 2 1 4 Sites are weighted by distance to the authority ∞

  39. Query Rules • Tune Search Results • Created at the SSA, Tenant, Site Collection or Site • SSA • Site Collection • Site

  40. Query Rules • Condition • When Do I apply the rule? • Action • What to do when the rule is matched? • Publishing • When should the rule be active?

  41. Query Rules Actions Conditions Show a promoted result Show a block of results Replace the core results with a different query Exact match, beginning or end Ad-hoc or term store dictionary Match a regex (advanced) Is this query more likely aimed at the following source…? Do people mostly click on result of the following type…?

  42. Query Builder Dynamically Ranking Change Part of the query Results Ranking

  43. Query Builder

  44. Configuration in the Conceptual Relevance Flow (WORDS HR, Human Resources) AND (WORDS employees, employed) AND (WORDS quarterly, quarterlies) AND (WORDS report, reports, reported) • Mixed Results for: • HR Employment best bet • HR Employment quarterly report • HR Employment ContentType=reports Query: HR Employment quarterly report Search Web Part Query Processing Engine Thesaurus: HR  Human Resources Best bets: HR Employment /HR/employment Document Collection Dynamic Reordering Rules: Quarterly Report  {prefer docs from http://reports} Query Rule: {Terms} Quarterly Report  {Terms} ContentType=“reports” For all queries: Authorities: Level 1:http://employment Ranking model: {incorporate user ratings}

  45. Create a Query Rule – Hybrid From Result Source drop-down list, select the specified result source Under Query is performed on these sources, if you select “One of these sources”, make sure to select the result source you created

  46. Hybrid Results Results from SharePoint Online Results from SharePoint Server

  47. Session Objective and Takeaways

  48. Thank You! www.maadarani.com, mike@maadarani.com , @mikemaadarani www.slideshare.net/maadarani

More Related