This document discusses building a scalable search architecture in SharePoint 2013. It begins with an overview of the speaker and agenda. It then addresses common misunderstandings around search architecture before explaining the logical components of search - crawl, content processing, analytics processing, index, administration, and query processing. It provides examples of how to design the architecture based on assessment of content size and user load. Finally, it offers guidance on implementing and verifying the search architecture using PowerShell.
Building a Scalable Search Architecture in SharePoint 2013
1. Building a scalable Search architecture in SharePoint 2013
Thuan Nguyen, SharePoint MVP
thuan@outlook.com
@nnthuan
Vietnam SharePoint User Group
2. About Me
SharePoint Practice Lead, Solution Architect – FPT Software
Microsoft SharePoint MVP (2011, 2012, 2013, 2014)
Former start-up enthusiast with two SharePoint-based products.
Now focused on building a SharePoint core standard and framework for the Singapore Government.
3. Agenda
Common Misunderstandings
Architecture & Topology
Practical Guide
Question & Answer
For those looking into running multiple Search servers that handle millions of documents.
4. Common Misunderstandings
For High Availability, create two Search Service Applications.
There is only one machine playing the Search role in your farm.
Scale out the Search architecture by adding more servers.
Starting the Search service is what makes search functionality work.
5. Architecture & Topology
Logical Architecture
Crawl
Content Processing
Analytics Processing
Index
Administration
Query Processing
Understanding each component helps you design a scalable and maintainable Search architecture for your organization.
6. Crawl Component
Responsible for crawling content from different sources:
SharePoint sites
Exchange
Lotus Notes
Documentum
HTTP Website
Delivers crawled items to the content processing component.
The Crawl database stores information about crawled items and crawl history.
Example table: dbo.MSSCrawlHistoryLocal
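To see which content sources the crawl component is working through, the sources can be listed from the SharePoint Management Shell. This is a minimal sketch, assuming the farm has a single Search service application; the choice of displayed columns is ours, not the speaker's.

```powershell
# Load the SharePoint snap-in when running outside the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: exactly one Search service application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# List each content source with its type and current crawl status.
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa |
    Format-Table Name, Type, CrawlStatus -AutoSize
```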
7. Content Processing
Processes crawled items and passes them to the index component.
Performs linguistic processing at index time (e.g. language detection and entity extraction).
Writes information about links and URLs to the Link database.
Example table: dbo.MSSQLogResultDocs
8. Analytics Processing
Analyzes crawled items and how users interact with search results.
When a user performs an action (e.g. views a page), the event is collected in usage files on the WFEs and regularly pushed to the event store, where it is kept until processed.
Results are then returned to the content processing component to be included in the search index.
Example table: dbo.SearchReportData
9. Index Component
Receives the processed items from the content processing component and writes them to the search index.
Handles incoming queries, retrieves information from the search index, and sends the result set back to the query processing component.
10. Index Architecture
An index partition is a logical portion of the entire search index.
Each partition is served by one or more index components (or “replicas”).
Each partition has exactly one primary (or “active”) replica, and only that replica writes data to the partition.
The other, secondary (or “passive”) replicas are there for fault tolerance and increased query throughput.
The index can scale both horizontally (partitions) and vertically (replicas).
Partitions can be added but NOT removed.
[Diagram: index partitions #1, #2 and #3, each served by three replicas, hosted on index servers 1-3, 4-6, 7-9 and 10-12.]
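The partition and replica layout described on this slide can be sketched as plain data before touching a farm. The helper below is hypothetical (the function name, the one-replica-per-server rule and the server names are assumptions for illustration). It numbers partitions from 0, which is how the -IndexPartition parameter of New-SPEnterpriseSearchIndexComponent counts them, whereas the diagram counts from 1.

```powershell
# Hypothetical planning helper: assigns one index replica per server.
function New-IndexLayoutPlan {
    param(
        [int]$PartitionCount,
        [int]$ReplicasPerPartition,
        [string[]]$Servers
    )
    $needed = $PartitionCount * $ReplicasPerPartition
    if ($Servers.Count -lt $needed) {
        throw "Need $needed servers, got $($Servers.Count)."
    }
    $plan = @()
    $next = 0
    for ($p = 0; $p -lt $PartitionCount; $p++) {
        for ($r = 1; $r -le $ReplicasPerPartition; $r++) {
            # Which replica becomes primary is decided by Search, not by this plan.
            $plan += [pscustomobject]@{
                Partition = $p
                Replica   = $r
                Server    = $Servers[$next]
            }
            $next++
        }
    }
    return $plan
}

# Example: 2 partitions x 3 replicas on six illustrative servers IDX-01..IDX-06.
$servers = 1..6 | ForEach-Object { "IDX-{0:d2}" -f $_ }
$plan = New-IndexLayoutPlan -PartitionCount 2 -ReplicasPerPartition 3 -Servers $servers
$plan | Format-Table -AutoSize
```

Adding a partition grows the index capacity; adding a replica to an existing partition adds fault tolerance and query throughput, which is the horizontal versus vertical distinction made above.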
11. Query Processing
Analyzes and processes search queries and results.
The processed query is then submitted to the index component, which returns a set of search results for the query.
12. Search Administration
Search Admin Component
Runs a number of system processes required for search.
Is responsible for search provisioning and topology changes.
Coordinates the other search components: Content Processing, Query Processing, Analytics, and Indexing.
Search Admin DB
Stores search configuration data:
Topology
Crawl rules
Query rules
Managed property mappings
Content sources
Crawl schedules
Stores Analytics settings.
Example table: dbo.MSSConfiguration
14. Practical Guide - Assessment
Don’t hastily touch your SharePoint. Leave it alone!
Think about your content:
What are your content sources (SharePoint document libraries, Exchange, file servers, ...)?
How much content do you want to search (e.g. 100,000 documents)?
Assess the number of concurrent users.
Size the Search databases.
15. Practical Guide - Assessment
Sizing factors:
Total Database Size
Total Index Size
Query Component Index Size
Disk Storage
Link Database
Search Admin Database
Total Crawl Database Size
Total Crawl Database Log Size
Analytics Database Size
=> Total database size for Search
Microsoft has already published formulas for each of the factors above.
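The published size formulas are not reproduced in these slides. As a complementary sketch, the number of index partitions and databases can be roughed out from the item count alone. The per-unit thresholds below (10 million items per index partition, 20 million per crawl database, 60 million per link database) are assumptions based on commonly cited SharePoint 2013 scaling guidance, exposed as parameters so they can be replaced with the figures from Microsoft's current documentation. The function name is hypothetical.

```powershell
# Hypothetical estimator; the default thresholds are assumptions to verify
# against Microsoft's published guidance before use.
function Get-SearchScaleEstimate {
    param(
        [Parameter(Mandatory = $true)][long]$ItemCount,
        [long]$ItemsPerIndexPartition = 10000000,
        [long]$ItemsPerCrawlDatabase  = 20000000,
        [long]$ItemsPerLinkDatabase   = 60000000
    )
    # Round up: a partly filled partition or database still has to exist.
    [pscustomobject]@{
        ItemCount       = $ItemCount
        IndexPartitions = [int][math]::Ceiling([double]$ItemCount / $ItemsPerIndexPartition)
        CrawlDatabases  = [int][math]::Ceiling([double]$ItemCount / $ItemsPerCrawlDatabase)
        LinkDatabases   = [int][math]::Ceiling([double]$ItemCount / $ItemsPerLinkDatabase)
    }
}

# Example: a farm expected to hold 25 million searchable items.
$estimate = Get-SearchScaleEstimate -ItemCount 25000000
$estimate | Format-List
```

This only counts units; the storage size of each database and of the index still has to come from the published formulas.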
16. Practical Guide - Assessment
What exactly is High Availability for Search?
Business language: Search is always available, so end users are never stopped from searching.
Technical language: All logical search components and Search databases must remain functional at all times.
Two or more Search service applications
Two or more Search servers
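One way to check the technical definition above is to confirm that every component type runs on at least two servers. The helper below is a sketch with a hypothetical name. It assumes each component object has a Name of the form type plus number (for example QueryProcessingComponent1) and a ServerName, which is how Get-SPEnterpriseSearchComponent reports them. It does not check index replicas per partition or database availability, both of which need their own checks.

```powershell
# Hypothetical helper: reports, per component type, how many distinct servers host it.
function Test-SearchComponentRedundancy {
    param([object[]]$Components)
    $Components |
        Group-Object { $_.Name -replace '\d+$', '' } |   # strip the trailing number
        ForEach-Object {
            $servers = @($_.Group | Select-Object -ExpandProperty ServerName -Unique)
            [pscustomobject]@{
                ComponentType = $_.Name
                ServerCount   = $servers.Count
                Redundant     = ($servers.Count -ge 2)
            }
        }
}

# Illustrative input. On a real farm, pass the output of
#   Get-SPEnterpriseSearchComponent -SearchTopology <active topology>
$sample = @(
    [pscustomobject]@{ Name = 'QueryProcessingComponent1'; ServerName = 'APP-Server-01' },
    [pscustomobject]@{ Name = 'QueryProcessingComponent2'; ServerName = 'APP-Server-02' },
    [pscustomobject]@{ Name = 'AdminComponent1';           ServerName = 'APP-Server-01' }
)
$report = Test-SearchComponentRedundancy -Components $sample
$report | Format-Table -AutoSize
```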
17. Practical Guide - Design
Don’t hastily touch your SharePoint. Leave it alone!
Start with one machine hosting all components.
18. Practical Guide - Design
Don’t hastily touch your SharePoint. Leave it alone!
Consider two machines for Search, each hosting a different set of components.
Keep a redundant set of (Query + Crawl) components: if one machine goes down, the Query component on the other machine keeps functioning.
19. Practical Guide - Design
Don’t hastily touch your SharePoint. Leave it alone!
Do you need three machines for Search?
Speed up the Query component?
Reduce crawling time?
Balance CPU utilization on each machine?
With three or more machines, start by assessing each component's usage of hardware resources.
20. Practical Guide - Design
Component CPU Network Disk RAM
Crawl Component MEDIUM HIGH MEDIUM MEDIUM
Content processing (CPC) HIGH MEDIUM HIGH
Analytics processing (APC) MEDIUM HIGH MEDIUM MEDIUM
Index Component HIGH MEDIUM HIGH HIGH
Query processing (QPC) MEDIUM MEDIUM MEDIUM
Search Admin Component LOW LOW LOW
Microsoft Ignite – BK3176
If the logical architecture requires scale-out, consider each component's resource utilization.
21. Practical Guide - Design
Volume of content    Sample Search Architecture
< 1 mil items        Single-server Search farm
1 mil – 5 mil        Two-server Search farm
5 mil – 10 mil       Small Search farm (3-4 servers)
10 mil – 40 mil      Medium Search farm (5-6 servers)
> 40 mil             Large Search farm
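The table above can be turned into a small lookup for use in assessment scripts. This is a sketch with a hypothetical function name. The table does not say which tier the exact boundary values (1, 5, 10 and 40 million) fall into, so placing them in the smaller tier, apart from 1 million itself, is an assumption.

```powershell
# Hypothetical lookup mirroring the volume table; boundary handling is an assumption.
function Get-SampleSearchArchitecture {
    param([Parameter(Mandatory = $true)][long]$ItemCount)

    if     ($ItemCount -lt 1000000)  { 'Single-server Search farm' }
    elseif ($ItemCount -le 5000000)  { 'Two-server Search farm' }
    elseif ($ItemCount -le 10000000) { 'Small Search farm (3-4 servers)' }
    elseif ($ItemCount -le 40000000) { 'Medium Search farm (5-6 servers)' }
    else                             { 'Large Search farm' }
}

# Example: look up the sample architecture for a few content volumes.
foreach ($count in 100000, 3000000, 50000000) {
    "{0,12:N0} items: {1}" -f $count, (Get-SampleSearchArchitecture -ItemCount $count)
}
```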
22. Sample Search Architecture
Handles a number of different content sources (with 20 custom applications).
Nearly 1 million items currently.
A full crawl takes 2 hours.
Serves nearly 20,000 users, with 500 concurrent users.
23. Sample Search Architecture
Optimizes search queries to serve hundreds of concurrent users.
Handles millions of documents (approx. 5 TB).
25. Practical Guide - Implementation
Central Administration doesn’t help much. PowerShell is your friend.
1. Create the Search Service Application
2. Clone the existing topology
3. Modify the Search components based on your designated architecture
4. Assign the Index component and its location
5. Activate the new Search topology
Build a Search farm with PowerShell: http://bit.ly/search_multi_server_PS
26. Practical Guide - Implementation
#Server names and Search service application settings
$app1 = "APP-Server-01"
$app2 = "APP-Server-02"
$SearchAppPoolName = "SharePoint_SearchApp"
$SearchAppPoolAccountName = "TestDomain\SPSearchPool"
$SearchServiceName = "SharePoint_Search_Service"
$SearchServiceProxyName = "SharePoint_Search_Proxy"
$DatabaseName = "SharePoint_Search_AdminDB"
#Create a Search Service Application Pool
$spAppPool = New-SPServiceApplicationPool -Name $SearchAppPoolName -Account $SearchAppPoolAccountName -Verbose
#Start Search Service Instance on all Application Servers
Start-SPEnterpriseSearchServiceInstance $app1 -ErrorAction SilentlyContinue
Start-SPEnterpriseSearchServiceInstance $app2 -ErrorAction SilentlyContinue
Start-SPEnterpriseSearchQueryAndSiteSettingsServiceInstance $app1 -ErrorAction SilentlyContinue
Start-SPEnterpriseSearchQueryAndSiteSettingsServiceInstance $app2 -ErrorAction SilentlyContinue
#Create Search Service Application
$ServiceApplication = New-SPEnterpriseSearchServiceApplication -Partitioned -Name $SearchServiceName -ApplicationPool $spAppPool.Name -DatabaseName $DatabaseName
#Create Search Service Proxy
New-SPEnterpriseSearchServiceApplicationProxy -Partitioned -Name $SearchServiceProxyName -SearchApplication $ServiceApplication
27. Practical Guide - Implementation
#Clone the active topology and get the Search service instance on each server
$clone = $ServiceApplication.ActiveTopology.Clone()
$App1SSI = Get-SPEnterpriseSearchServiceInstance -Identity $app1
$App2SSI = Get-SPEnterpriseSearchServiceInstance -Identity $app2
#We need only one admin component
New-SPEnterpriseSearchAdminComponent -SearchTopology $clone -SearchServiceInstance $App1SSI
#We need two content processing components for HA
New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $clone -SearchServiceInstance $App1SSI
New-SPEnterpriseSearchContentProcessingComponent -SearchTopology $clone -SearchServiceInstance $App2SSI
#We need two analytics processing components for HA
New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $clone -SearchServiceInstance $App1SSI
New-SPEnterpriseSearchAnalyticsProcessingComponent -SearchTopology $clone -SearchServiceInstance $App2SSI
#We need two crawl components for HA
New-SPEnterpriseSearchCrawlComponent -SearchTopology $clone -SearchServiceInstance $App1SSI
New-SPEnterpriseSearchCrawlComponent -SearchTopology $clone -SearchServiceInstance $App2SSI
#We need two query processing components for HA
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $clone -SearchServiceInstance $App1SSI
New-SPEnterpriseSearchQueryProcessingComponent -SearchTopology $clone -SearchServiceInstance $App2SSI
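The slides shown here stop before the last two steps of the implementation list: assigning the index component with its location, and activating the topology. A minimal sketch of those steps follows, assuming the $clone, $App1SSI and $App2SSI variables from the script above. The index path is a hypothetical example, and placing two replicas of partition 0 on the two servers is one possible layout, not necessarily the speaker's.

```powershell
# Assumes $clone, $App1SSI and $App2SSI were created as in the preceding script.
# Hypothetical index location: the folder must already exist and be empty on each server.
$IndexLocation = "D:\SearchIndex"

# Two replicas of index partition 0, one per server, for fault tolerance.
New-SPEnterpriseSearchIndexComponent -SearchTopology $clone -SearchServiceInstance $App1SSI -IndexPartition 0 -RootDirectory $IndexLocation
New-SPEnterpriseSearchIndexComponent -SearchTopology $clone -SearchServiceInstance $App2SSI -IndexPartition 0 -RootDirectory $IndexLocation

# Activate the new topology; the previously active topology becomes inactive.
Set-SPEnterpriseSearchTopology -Identity $clone
```

Activation can take a while on a farm that already holds an index, so it is usually run in a maintenance window.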
29. Practical Guide - Verification
Central Administration can help.
PowerShell:
Get-SPEnterpriseSearchStatus
Get-SPEnterpriseSearchTopology
Search PowerShell: http://bit.ly/PowerShell_SP2013_Search
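The two cmdlets named above can be combined into a short verification pass. This is a minimal sketch, assuming a single Search service application in the farm; the choice of output columns is ours.

```powershell
# Load the SharePoint snap-in when running outside the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: exactly one Search service application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Health of every component in the active topology, as plain text.
Get-SPEnterpriseSearchStatus -SearchApplication $ssa -Text

# Which component runs on which server in the active topology.
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
Get-SPEnterpriseSearchComponent -SearchTopology $active |
    Sort-Object ServerName |
    Format-Table Name, ServerName -AutoSize
```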
30. Helpful References
SharePoint 2013: SharePoint and Enterprise Search Survival Guide http://bit.ly/search_survival_guide
Plan enterprise search architecture in SharePoint Server 2013 http://bit.ly/plan_for_ent_search
Search Architecture for SharePoint 2013 http://zoom.it/Tsuy