1. What we have learned about SharePoint 2013 and Enterprise Search
Petter Skodvin-Hvammen, Tallak Hellebust
2. Agenda
• How to run a successful search project
• Architecture and infrastructure learnings
• User experience and search customizations
• How can you crawl thousands of file shares?
• Discover associations and enrich indexed content
• What about search relevancy?
4. Sprint 0 – goal
Best solution: the overlap of business goals, technology and user needs
5. Sprint 0 – process
Analysis
• User interviews
• Stakeholder interviews
• Search logs
• Existing work and documentation
Technology Assessment
• Sources
• Information model
• Technology components
• Architecture
• Scaling
Concept Development
• Problem solving
• Information modus
• Mockups
• Clickable concept demo
• Best practices
• Concept testing
Enterprise Strategy
• Information marketplace
• Achieving business goals
Final Report
• Presentations
• Recommendations
• Project plan
• Quick wins
7. How to run a successful search project
• Sprint 0
• Planning
• Development
• Testing
• Demo
• Deployment
8. One sprint ahead
• Let the UX work be one sprint ahead of the technical team
• Produce a clickable prototype each sprint
• The prototype is a visual presentation of the product backlog
• The technical team implements the prototype in the next sprint
UX (Sprint 0)
Sprint 1: UX (Sprint 2)
Sprint 2: UX (Sprint 3)
Sprint 3: UX (Sprint 4)
Sprint n: UX (Sprint n+1)
11. Infrastructure Investments
What | Spec | Count | Total
SharePoint Server | Virtual machine | 12 | 12 VMs
CPU | 8 cores | 12 | 96 cores
Memory | 16 GB | 12 | 192 GB
System disk | 150 GB | 12 | 1.8 TB
Data disk | 450 GB | 12 | 5.4 TB
Disk IO | 200 IOPS (indexer) | 10 | 2,000 IOPS
• Physical Servers
• Database Servers
• Load Balancer
• SAN or local disk arrays
• Domain Controller
• Other networking
• Licenses for
– SharePoint Server
– SQL Server
– Windows Server
– CALs/eCALs
– Visual Studio
– Comperio FRONT
• UAT Env
• QA/Test Env
• Dev Envs
12. We have learned that…
You will need
• Funding!
• Time
• Documentation
• Network
• To automate
Performance will get you
• Add more CPU
• Add more Memory
• Optimize Disk IO
• Balance load wisely
• Tune distributed cache
• Know your antivirus
13. Capacity Test Findings
• Crawl rate declines 1% per million items indexed
• Query latency increases exponentially from 12 million items indexed per partition
• Database latency is insignificant during crawling
• Successfully crawled file shares via symbolic directory links
• Disk space usage is significantly lower than expected
15. Disk Space Usage
Server | C: used (GB) | C: free (GB) | C: capacity (GB) | E: used (GB) | E: free (GB) | E: capacity (GB)
Admin, Crawler, Content Processing, Analytics Processing | 33.3 | 116 | 149 | 42 | 807 | 849
Query Processing, Index Partition 0 | 34.4 | 115 | 149 | 270 | 579 | 849
Query Processing, Index Partition 1 | 34.5 | 115 | 149 | 268 | 581 | 849
Crawler, Content Processing, Analytics Processing | 34.5 | 115 | 149 | 55 | 794 | 849
(C: is the system volume, E: is the data volume)

Disk volume | Total
Number of servers | 4
Data (MB) | 52
Index (MB) | 1 077 248
Logs (MB) | 24 576
Total (MB) | 1 101 876
Total (GB) | 1 076

We reduced the data volume from 850 GB to 450 GB.
Huge savings in storage costs!
The table above shows measured disk space usage for 31 million items indexed
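The totals in the disk volume table can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures from the slide; the variable names are illustrative and not part of the original material.

```python
# Measured disk usage in MB, copied from the disk volume table on this slide.
usage_mb = {"data": 52, "index": 1_077_248, "logs": 24_576}

total_mb = sum(usage_mb.values())
total_gb = total_mb / 1024  # 1 GB = 1024 MB

# Share of the total that is taken by the index itself.
index_share = usage_mb["index"] / total_mb

print(total_mb)       # 1101876
print(int(total_gb))  # 1076
print(f"index share of total: {index_share:.1%}")
```

Almost all of the measured space is index data, which is why the data volume size is driven by the number of items per index partition.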
16. Database Space Usage
Database | Capacity test
Number of searchable items (in millions) | 30
Search Service Application (MB) | 156
Analytics Reporting (MB) | 6
Crawl Store (MB) | 19 151
Links Store (MB) | 24 316
Total (MB) | 43 628
Total (GB) | 43
The table above shows measured database space usage for 31 million items indexed
23. FRONT Search
• Advanced query and result processing
• Highly customizable business logic represented through reusable tasks and flows
• Lightweight development environment
• Lightweight deployment
• Fully integrated with SharePoint result presentation and display templates
• Fully integrated with SharePoint security
25. FRONT Search in SP2013
• Front webpart
– Handles communication between Front and UI
• Front app
– Handles claims security
• Front webservice
– Flow engine
26. FRONT Search in SP2013
• JavaScript events
– QueryIssuingEvent
– ResultReadyEvent
• Search REST API
– Query, postquery and suggestions
– JSON and XML results
– Windows security / claims
– http://host/site/_api/search
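As an illustration of the Search REST API, the sketch below builds a GET URL for the `query` endpoint under `_api/search`. The `querytext` and `rowlimit` parameters and the `Accept` header are standard for SharePoint 2013 search; `http://host/site` is the placeholder from the slide, the helper name `build_search_url` is hypothetical, and the doubling of single quotes is an assumption based on OData string literal rules.

```python
from urllib.parse import quote


def build_search_url(site_url: str, query_text: str, row_limit: int = 10) -> str:
    """Build a GET URL for the SharePoint 2013 Search REST query endpoint."""
    # Assumption: single quotes inside the query text are doubled (OData style).
    escaped = query_text.replace("'", "''")
    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext='{quote(escaped)}'&rowlimit={row_limit}"
    )


# Ask for JSON through the Accept header; the default response format is XML.
JSON_HEADERS = {"Accept": "application/json;odata=verbose"}

url = build_search_url("http://host/site", "enterprise search")
print(url)
```

Because the endpoint is plain HTTP, the same URL can be tested from a browser or any HTTP client without SharePoint libraries on the calling side.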
27. FRONT Search <=> Query rules
FRONT Search
• Conditions
– Analyze query
– Analyze request
– Full flexibility
• Tasks (Actions)
– Change query model
– Perform parallel queries
– Full flexibility
• Publishing
– Special conditions case
• Result processing
– Analyze result from a query
– Perform new queries based on result
– Change order/grouping/content of result
Query rules
• Conditions
– Six types
• Actions
– Add promoted result
– Add blocked result
– Change query
• Publishing
– When is the rule active
28. FRONT Search <=> Result sources
FRONT Search
• Source system
– SP 2013
– SP 2010
– FAST ESP
– Lucene/Solr
– …
• Query transformation
– Full control of query model
Result sources
• Source system
– Local SP 2013 index
– Remote SP 2013 index
– OpenSearch
• Query transformation
– Subset of content
30. Search UX examples have been removed from the presentation to preserve client IP
Please contact Petter or Tallak if you would like to discuss search user experience
31. How do you index
millions of documents
in thousands of file shares
in hundreds of locations?
Bonus! Support governance and operations
32. Challenges
• Max 50 content sources per service application
• Max 100 start addresses per content source
• Max 20 concurrent crawls per service application
• Limit bandwidth usage for specific server locations
• Limit crawler impact within local business hours
• Grant read access to crawler per file share
• Avoid token bloat issues with more than 1000 groups per account
• Manage indexing and crawling of each file share with minimum manual effort
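The first two limits put a hard ceiling on how many file shares can be registered directly as start addresses. A minimal sketch of that arithmetic follows; the limits come from the slide, while the function names and the figure of 8,000 shares are hypothetical.

```python
import math

MAX_CONTENT_SOURCES = 50   # per service application (from the slide)
MAX_START_ADDRESSES = 100  # per content source (from the slide)


def content_sources_needed(start_addresses: int) -> int:
    """Content sources required if every share is its own start address."""
    return math.ceil(start_addresses / MAX_START_ADDRESSES)


def fits_limits(start_addresses: int) -> bool:
    """True if the shares fit within one search service application."""
    return content_sources_needed(start_addresses) <= MAX_CONTENT_SOURCES


# Ceiling when each file share is registered as a separate start address.
print(MAX_CONTENT_SOURCES * MAX_START_ADDRESSES)        # 5000
# Hypothetical estate of 8,000 shares: it does not fit.
print(content_sources_needed(8000), fits_limits(8000))  # 80 False
```

This is the reason a single start address has to cover many shares once the estate grows into the thousands.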
33. A Proven Approach
• Symbolic links in smart folder structure
\\impact\files\source\impact\account\symlink
• Content Sources per region with smart start addresses
file://impact/files/source/impact
• Content Enrichment to fix file paths in results
• Custom application for managing file shares and granting access to crawler
• Host aliases for crawler impact
• Custom timer job that syncs custom lists from custom app
• Custom timer job that creates/removes symbolic links
• Custom list: Locations
– Map server prefix to content source
– Map location to schedule and impact
• Custom list: File shares
– Map share to crawl account
– Map UNC to symlink
– Map share specific metadata
34. Example Solution
Files in Norway
• Incremental Crawl every 6 hours
• Start address: file://default/files/norway/default
Files in India
• Incremental Crawl every night at 21:00 IST
• Start address: file://reduced/files/india/reduced
Crawl Rules
• file://*/user1/* account=user1
• file://*/user2/* account=user2
Crawler Impact Rules
• Server name: default
• Server name: reduced, wait 60 secs
Folders
• files/norway/default/user1/symlink1
• files/norway/default/user1/symlink2
• files/norway/default/user2/symlink3
• files/india/reduced/user1/symlink4
• files/india/reduced/user1/symlink5
• files/india/reduced/user2/symlink6
Custom list: Locations
• Server Prefix: osl
• Content Source: norway
• Crawler Impact: default
Custom list: File Shares
• UNC Path: \\osl-file01\share1\hr
• Crawl Account: user2
• Symlink: files/norway/default/user2/symlink3
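The naming scheme in this example can be sketched as three small functions: one builds the symlink folder, one builds the start address, and one maps a crawled URL back to the original UNC path (the job done by content enrichment in the approach above). This is an illustrative sketch, not the authors' implementation. The function names are hypothetical, the UNC path assumes the slide reads `\\osl-file01\share1\hr`, and the file `policies/leave.docx` is invented for the example.

```python
def symlink_path(source: str, impact: str, account: str, name: str) -> str:
    """Folder of a symlink: files/<content source>/<impact>/<account>/<symlink>."""
    return f"files/{source}/{impact}/{account}/{name}"


def start_address(source: str, impact: str) -> str:
    """Start address; the host alias is the crawler impact name."""
    return f"file://{impact}/files/{source}/{impact}"


def fix_result_path(crawled_url: str, symlink_to_unc: dict[str, str]) -> str:
    """Map a crawled file:// URL under a symlink back to its UNC path."""
    host_and_path = crawled_url.removeprefix("file://")
    _host, _, path = host_and_path.partition("/")
    for link, unc in symlink_to_unc.items():
        if path == link or path.startswith(link + "/"):
            # Keep the part below the symlink and switch to backslashes.
            return unc + path[len(link):].replace("/", "\\")
    return crawled_url  # not under a known symlink: leave unchanged


link = symlink_path("norway", "default", "user2", "symlink3")
shares = {link: r"\\osl-file01\share1\hr"}  # assumed UNC path
crawled = f"file://default/{link}/policies/leave.docx"  # hypothetical file

print(start_address("norway", "default"))
print(fix_result_path(crawled, shares))
```

Keeping the account and impact level in the folder path is what lets crawl rules and crawler impact rules match with simple wildcards.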
35. Discover associations in your indexed data using custom entity extractors
Explore how your indexed data is associated with terms often used by your business
• Examples
– Organization
– Projects
– Customers
– Products
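The idea behind a dictionary-based entity extractor can be shown with a small sketch: each entity type has a list of known terms, and a document is tagged with every term it contains. This is not SharePoint's extractor; the function name, the dictionary entries and the sample text are all hypothetical.

```python
import re


def extract_entities(text: str, dictionary: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the dictionary terms found in text, grouped by entity type."""
    found: dict[str, list[str]] = {}
    for entity_type, terms in dictionary.items():
        # Whole-word, case-insensitive match for each known term.
        hits = [
            term
            for term in terms
            if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)
        ]
        if hits:
            found[entity_type] = hits
    return found


# Hypothetical dictionaries for two of the entity types listed on the slide.
dictionary = {"Projects": ["Apollo", "Gemini"], "Customers": ["Contoso"]}
text = "Status report for the Apollo project, delivered to Contoso."

print(extract_entities(text, dictionary))
# {'Projects': ['Apollo'], 'Customers': ['Contoso']}
```

The extracted terms can then be exposed as refiners, so users can see which projects or customers a set of results is associated with.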
36. Add metadata or clean up your indexed data using custom content enrichment
• Based on where the items are located, add info about
– Department
– Information owner
– Security classification
• Look up name based on user account
• Remove company name from title for all web pages
• Normalize names
• Normalize phone numbers
• Fix search result link
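Two of the clean-up steps above lend themselves to a short sketch: removing the company name from page titles and normalizing phone numbers. This only shows the transformation logic, not a content enrichment web service. The function names, the company name `Contoso` and the default country code `47` are assumptions made for illustration.

```python
import re


def strip_company_from_title(title: str, company: str) -> str:
    """Remove a leading 'Company - ' or trailing ' - Company' from a title."""
    name = re.escape(company)
    pattern = rf"^\s*{name}\s*[-|:]\s*|\s*[-|:]\s*{name}\s*$"
    return re.sub(pattern, "", title, flags=re.IGNORECASE).strip()


def normalize_phone(raw: str, default_country_code: str = "47") -> str:
    """Normalize a phone number to '+<country code><digits>'."""
    digits = re.sub(r"[^\d+]", "", raw)  # drop spaces, dashes, parentheses
    if digits.startswith("00"):
        digits = "+" + digits[2:]        # 0047... becomes +47...
    if not digits.startswith("+"):
        digits = f"+{default_country_code}{digits}"
    return digits


print(strip_company_from_title("Travel policy - Contoso", "Contoso"))
print(normalize_phone("0047 22 33 44 55"))
```

Normalizing at index time means that every spelling of the same number or title ends up as one searchable and refinable value.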
38. How fast can you find what you are searching for?
• What should be indexed?
• What should be searchable?
• What should be displayed?
Relevancy – Recall – Precision
• How to weight a managed property?
• How to change ranking model?
• How to tune ranking?
40. Change Ranking Model
• The default ranking model in SP 2013 did not fit us!
– PowerPoint files always won
– Complete matches in site titles and document titles were outranked by the number of partial matches in body
– Community sites were weighted lower than discussions and posts
We replaced the SP 2013 ranking model with the SP 2010 ranking model
41. Tune Ranking Model
Microsoft will soon release a tool for tuning ranking models!
1. Select ranking model to tune
2. Select result source to search
3. Add judgement sets
4. Add queries to judgement sets
5. Run queries and evaluate results
6. Add and tune features
7. Save and publish model
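The "run queries and evaluate results" step can be illustrated with a simple metric. The sketch below scores a ranking model by precision at k over a judgement set. The slides do not say which metric the Microsoft tool uses, so precision at k is an assumption, and the queries, document ids and `run_query` function are hypothetical.

```python
from typing import Callable


def precision_at_k(results: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top k results that are judged relevant."""
    top = results[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


def evaluate(
    judgement_set: dict[str, set[str]],
    run_query: Callable[[str], list[str]],
    k: int = 10,
) -> tuple[float, dict[str, float]]:
    """Return the mean precision at k and the score per query."""
    scores = {q: precision_at_k(run_query(q), rel, k) for q, rel in judgement_set.items()}
    return sum(scores.values()) / len(scores), scores


# Hypothetical judgement set: query -> ids of documents judged relevant.
judgement_set = {"travel policy": {"a", "b"}, "expense report": {"x"}}
# Stand-in for a real search call, returning canned rankings.
canned = {"travel policy": ["a", "c", "b", "d"], "expense report": ["y", "z"]}

mean_score, per_query = evaluate(judgement_set, lambda q: canned[q], k=2)
print(mean_score, per_query)
```

Running the same judgement set before and after a change to the ranking model gives a repeatable way to tell whether the change helped.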
Look at what users need and which challenges arise in their daily work
Which technical possibilities and limitations exist
What goals does the business have
Development Environment
OS: Windows Server 2008 R2 SP1
CPU: 4 cores
Memory: 8 GB -> 16 GB
Disk: Fast disks
Visual Studio 2012
SQL Server 2012 (Max server memory: 1500 MB)
Dedicated search farm for 40 million searchable items and 10 queries per second
Front end server to host your search UI
One index server per 10 million items
20 million items
30 million items
40 million items
Server to host crawling
Analytics processing
Central administration and other SharePoint application services
Query and results processing
Search administration
Document processing
Database server
Load balanced front end and redundant admin and query processing
Index replicas for redundancy and increased throughput
Extra crawl component per 20 M items and redundancy
Cluster or mirror the database server for fault tolerance
Multiple data centers for disaster scenarios
For advanced query and result processing, put Comperio Front between your search center and REST API
For advanced content enrichment, deploy your content enrichment web services
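The sizing rules of thumb in these notes (one index server per 10 million items, an extra crawl component per 20 million items) can be turned into a small calculation. This sketch reads the crawl rule as one crawl component per started 20 million items and ignores the replicas and spare components added for redundancy; the function name is hypothetical.

```python
import math


def size_search_farm(items_millions: float) -> dict[str, int]:
    """Rough component counts from the rules of thumb, without redundancy."""
    return {
        # One index server (partition) per 10 million items.
        "index_partitions": max(1, math.ceil(items_millions / 10)),
        # Assumed reading: one crawl component per started 20 million items.
        "crawl_components": max(1, math.ceil(items_millions / 20)),
    }


print(size_search_farm(40))  # {'index_partitions': 4, 'crawl_components': 2}
```

For redundancy each count would typically be doubled or more, which is how a 40 million item farm grows to the server counts shown on the infrastructure slide.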
7.2 TB
Funding
System requirements have increased
Infrastructure investments are massive
There must be a significant PAIN to solve
Time
To analyse requirements
To purchase and setup the infrastructure
To get to know all the new stuff
To build and deploy your customizations
Documentation
We were early adopters -> not much to find on Google, MSDN or TechNet
Network
Knowing someone who knows something…
Automation
You will need to re-install SharePoint
You will re-deploy your solutions
AutoSPInstaller, custom cmdlets and scripts
Performance
CPU
Increased from 4 to 8 cores on dev env
Memory
Increased from 8 GB to 16 GB on dev env (paging)
Increased from 16 GB per SQL Server to 16 GB per database instance
Disk IO
You need enough disk spindles to handle the IO
You need to configure your SAN correctly
Opt out of dynamic disk solution
Load balancer
Turn off sticky sessions and trust the distributed cache
Test and tune timeouts
Distributed cache
Configure enough memory
Antivirus
Turn it off
Exclude the index folder and more
The purpose of the search capacity test is to validate the documented and undocumented soft boundaries in Microsoft SharePoint Server 2013, with focus on:
maximum number of documents in a search partition
maximum number of documents in a crawl database
architecture for crawling a large number of file shares
getting an initial picture of search and crawl performance
Crawled 30 million documents from file shares via symbolic links on the crawler server. Tested 20,000 searches per day, using the 300 most used search queries from search statistics.
Four-server farm with 2 index partitions, 2 crawl components and 1 crawl database.
Slide shows actual numbers with 31 million items indexed
Display templates control which managed properties are shown in the search results, and how they appear in the Web Part. Each display template is made of two files: an HTML version of the display template that you can edit in your HTML editor, and a .js file that SharePoint uses.
Control templates determine the overall structure of how the results are presented. Includes lists, lists with paging, and slide shows.
Item templates determine how each result in the set is displayed. Includes images, text, video, and other items.
Group templates are specific to search results and are used for the HTML surrounding grouped items.
Hover templates are used for presenting more information about a search result. An item template and a hover template are connected.
How display templates are structured
Control
Group
Item
Hover
API
Simple interface for querying SharePoint without needing the SharePoint libraries
Easy to test and consume
Query rules conditions
Query matches string exactly
Query contains string
Query matches dictionary exactly
Query more common in source
Result type commonly clicked
Advanced query matching
What should be indexed?
Content sources and start addresses
Content types / file types
Crawl rules for exclusions
What parts of the indexed content should be searchable?
Full-text index
Fielded search
Refiners
What should be displayed?
In search suggestions
In search results
In search flyouts