SharePoint 2013 Enterprise Search (public)

What we have learned about SharePoint 2013 and Enterprise Search
Petter Skodvin-Hvammen, Tallak Hellebust

Agenda
• How to run a successful search project
• Architecture and infrastructure learnings
• User experience and search customizations
• How can you crawl thousands of file shares?
• Discover associations and enrich indexed content
• What about search relevancy?

HOW TO RUN A SUCCESSFUL PROJECT

Sprint 0 – goal
Best Solution: the overlap of Business Goals, Technology, and User Needs

Sprint 0 – process
Analysis
• User interviews
• Stakeholder interviews
• Search logs
• Existing work and documentation
Technology Assessment
• Sources
• Information model
• Technology components
• Architecture
• Scaling
Concept Development
• Problem solving
• Information modus
• Mockups
• Clickable concept demo
• Best practices
• Concept testing
Enterprise Strategy
• Information Marketplace
• Achieving business goals
Final Report
• Presentations
• Recommendations
• Project plan
• Quick wins

How to run a successful search project
• Sprint 0
• Planning
• Development
• Testing
• Demo
• Deployment

One sprint ahead
• Let the UX work stay one sprint ahead of the technical team
• Produce a clickable prototype each sprint
• The prototype is a visual presentation of the product backlog
• The technical team implements the prototype in the next sprint
UX (Sprint 0)
Sprint 1 / UX (Sprint 2)
Sprint 2 / UX (Sprint 3)
Sprint 3 / UX (Sprint 4)
Sprint n / UX (Sprint n+1)

Infrastructure Needs
Is Microsoft moving into the server hardware business?

[Diagram: server topology sized for 40 million documents and 10 queries/second]
Roles shown: WFE, Query, FRONT, Index-0 to Index-3 (each appearing twice), Doc Proc, Enrichment, Crawling, Analytics, Admin, Central Admin
SQL Server (shown twice)
• Admin DB
• Analytics DB
• Crawl DB
• Link DB
• Other SP DBs

Infrastructure Investments
What               Spec             Count  Total
SharePoint Server  Virtual Machine  12     12 VMs
CPU                8 cores          12     96 cores
Memory             16 GB            12     192 GB
System Disk        150 GB           12     1.8 TB
Data Disk          450 GB           12     5.4 TB
Disk IO            200 (Indexer)    10     2 000 IOPS
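The totals in the sizing table are each per-server spec multiplied by its count. A minimal Python sketch of that arithmetic follows; the names `ServerSpec` and `farm_totals` are illustrative only, and it assumes the TB figures use decimal units (1 TB = 1000 GB).

```python
from dataclasses import dataclass

GB_PER_TB = 1000  # assumption: the slide's TB figures use decimal units


@dataclass(frozen=True)
class ServerSpec:
    """Per-server spec taken from the sizing table."""
    cores: int
    memory_gb: int
    system_disk_gb: int
    data_disk_gb: int


def farm_totals(spec: ServerSpec, server_count: int,
                iops_per_unit: int, iops_unit_count: int) -> dict:
    """Multiply per-server specs by the counts from the sizing table."""
    return {
        "vms": server_count,
        "cores": spec.cores * server_count,
        "memory_gb": spec.memory_gb * server_count,
        "system_disk_tb": spec.system_disk_gb * server_count / GB_PER_TB,
        "data_disk_tb": spec.data_disk_gb * server_count / GB_PER_TB,
        "disk_iops": iops_per_unit * iops_unit_count,
    }


# Figures from the table: 12 servers, 200 IOPS counted 10 times.
totals = farm_totals(ServerSpec(8, 16, 150, 450), server_count=12,
                     iops_per_unit=200, iops_unit_count=10)
```

Changing `server_count` or the per-server spec gives a quick first estimate for a differently sized farm.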
• Physical Servers
• Database Servers
• Load Balancer
• SAN or local disk arrays
• Domain Controller
• Other networking
• Licenses for
– SharePoint Server
– SQL Server
– Windows Server
– CALs/eCALs
– Visual Studio
– Comperio FRONT
• UAT Env
• QA/Test Env
• Dev Envs

We have learned that…
You will need
• Funding!
• Time
• Documentation
• Network
• To automate
Performance will get you
• Add more CPU
• Add more Memory
• Optimize Disk IO
• Balance load wisely
• Tune the Distributed Cache
• Know your antivirus

Capacity Test Findings
• Crawl rate declines by 1% per million items indexed
• Query latency increases exponentially from 12 million items indexed per partition
• Database latency is insignificant during crawling
• Successfully crawled file shares via symbolic directory links
• Disk space usage is significantly lower than expected
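The first two findings can be turned into a rough planning aid. The sketch below rests on two assumptions that the slides do not state: that the 12 million figure is used as a hard cap per index partition, and that the crawl-rate decline is linear relative to the rate on an empty index. The function names are illustrative.

```python
import math

MAX_ITEMS_PER_PARTITION = 12_000_000   # latency threshold from the capacity test
CRAWL_DECLINE_PER_MILLION = 0.01       # 1% per million items indexed


def partitions_needed(total_items: int,
                      max_per_partition: int = MAX_ITEMS_PER_PARTITION) -> int:
    """Smallest number of index partitions that keeps each one under the cap."""
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    return max(1, math.ceil(total_items / max_per_partition))


def relative_crawl_rate(items_indexed_millions: float,
                        decline: float = CRAWL_DECLINE_PER_MILLION) -> float:
    """Crawl rate as a fraction of the empty-index rate (linear model, an assumption)."""
    return max(0.0, 1.0 - decline * items_indexed_millions)
```

Under these assumptions, 40 million documents call for four partitions, which is consistent with the Index-0 to Index-3 layout in the topology slide.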

Disk Space Usage
Server                                                    C: used  C: free  C: capacity  E: used  E: free  E: capacity
Admin, Crawler, Content Processing, Analytics Processing  33.3     116      149          42       807      849
Query Processing, Index Partition 0                       34.4     115      149          270      579      849
Query Processing, Index Partition 1                       34.5     115      149          268      581      849
Crawler, Content Processing, Analytics Processing         34.5     115      149          55       794      849
(C: is the system volume, E: is the data volume)

Disk volume        Total
Number of servers  4
Data               52
Index              1 077 248
Logs               24 576
MB                 1 101 876
GB                 1 076
The table above shows measured disk space usage for 31 million items indexed
We reduced data volume from 850 GB to 450 GB. Huge savings in storage costs!
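The MB and GB totals in the disk volume summary can be reproduced from the three component rows. This small sketch assumes 1 GB = 1024 MB, which is the conversion that matches the slide's figure of 1 076 GB; `gb_per_million_items` is an illustrative helper, not something from the presentation.

```python
MB_PER_GB = 1024  # assumption: binary conversion, which matches the slide's GB total

# Component rows from the disk volume summary (values in MB).
disk_usage_mb = {"Data": 52, "Index": 1_077_248, "Logs": 24_576}


def total_mb(usage: dict) -> int:
    """Sum the component rows."""
    return sum(usage.values())


def mb_to_gb(megabytes: float) -> float:
    """Convert MB to GB using the binary factor."""
    return megabytes / MB_PER_GB


def gb_per_million_items(total_gb: float, items_millions: float) -> float:
    """Average disk usage per million indexed items."""
    return total_gb / items_millions
```

Dividing the GB total by the 31 million items indexed gives a per-million figure that can serve as a starting point for sizing other farms, with the usual caveat that content mix changes the ratio.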

Database Space Usage
Database                                  Capacity Test
Number of searchable items (in millions)  30
Search Service Application                156
Analytics Reporting                       6
Crawl Store                               19 151
Links Store                               24 316
MB                                        43 628
GB                                        43
The table above shows measured database space usage for 31 million items indexed

USER EXPERIENCE & SEARCH CUSTOMIZATIONS

Display templates
• Content Search Web Part
– Control, item
• Refinement Web Part
– Control, item
• Search Results Web Part
– Control, group, item, hover

FRONT Search
• Advanced query and result processing
• Highly customizable business logic represented through
reusable tasks and flows
• Lightweight development environment
• Lightweight deployment
• Fully integrated with SharePoint result presentation and
display templates
• Fully integrated with SharePoint security

FRONT Search in SP2013
• FRONT web part
– Handles communication between FRONT and UI
• FRONT app
– Handles claims security
• FRONT web service
– Flow engine

FRONT Search in SP2013
• JavaScript events
– QueryIssuingEvent
– ResultReadyEvent
• Search REST API
– Query, postquery and suggestions
– JSON and XML results
– Windows security / claims
– http://host/site/_api/search
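As an illustration of the Search REST API endpoint listed above, the following Python sketch builds a GET URL for the `query` resource. `http://host/site` is the placeholder from the slide, the helper name is made up for this example, and authentication (Windows or claims) is deliberately left out.

```python
from urllib.parse import quote

# Header that asks SharePoint 2013 for a JSON response instead of XML.
ACCEPT_JSON = {"Accept": "application/json;odata=verbose"}


def build_search_query_url(site_url: str, query_text: str,
                           row_limit: int = 10,
                           select_properties: tuple = ()) -> str:
    """Build a GET URL for the search query endpoint of a site."""
    # String parameters are wrapped in single quotes; embedded quotes are doubled.
    escaped = query_text.replace("'", "''")
    params = [
        f"querytext='{quote(escaped, safe='')}'",
        f"rowlimit={int(row_limit)}",
    ]
    if select_properties:
        props = quote(",".join(select_properties), safe=",")
        params.append(f"selectproperties='{props}'")
    return f"{site_url.rstrip('/')}/_api/search/query?" + "&".join(params)
```

The resulting URL can be passed to any HTTP client together with `ACCEPT_JSON`; long or complex queries are usually sent to the `postquery` resource instead, which takes the same parameters in a POST body.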

FRONT Search <=> Query rules
FRONT Search
• Conditions
– Analyze query
– Analyze request
– Full flexibility
• Tasks (Actions)
– Change query model
– Perform parallel queries
– Full flexibility
• Publishing
– Special conditions case
• Result processing
– Analyze result from a query
– Perform new queries based on result
– Change order/grouping/content of result
Query rules
• Conditions
– Six types
• Actions
– Add promoted result
– Add blocked result
– Change query
• Publishing
– When is the rule active

FRONT Search <=> Result sources
FRONT Search
• Source system
– SP 2013
– SP 2010
– FAST ESP
– Lucene/Solr
– …
• Query transformation
– Full control of query model
Result sources
• Source system
– Local SP 2013 index
– Remote SP 2013 index
– OpenSearch
• Query transformation
– Subset of content

[Diagram: search architecture]
Databases: Crawl, Admin, Link, Analytics Reporting
Legend: Public API, Unit of scale/role boundary, Custom components
Content sources: HTTP, File shares, SharePoint, User profiles, Lotus Notes, Documentum, Exchange folders, Custom - BCS

Search UX examples have been removed from the presentation to preserve client IP.
Please contact Petter or Tallak if you would like to discuss search user experience.

How do you index
millions of documents
in thousands of file shares
in hundreds of locations?
Bonus! Support governance and operations

Challenges
• Max 50 content sources per service application
• Max 100 start addresses per content source
• Max 20 concurrent crawls per service application
• Limit bandwidth usage for specific server locations
• Limit crawler impact within local business hours
• Grant read access to crawler per file share
• Avoid token bloat issues with more than 1000 groups per account
• Manage indexing and crawling of each file share with minimum manual effort
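The first three limits put a hard ceiling on a naive "one start address per file share" design. The sketch below uses only the limits listed above; the function names are illustrative.

```python
import math

# Limits per search service application, as listed on the slide.
MAX_CONTENT_SOURCES = 50
MAX_START_ADDRESSES_PER_SOURCE = 100
MAX_CONCURRENT_CRAWLS = 20


def max_start_addresses() -> int:
    """Upper bound on start addresses in one service application."""
    return MAX_CONTENT_SOURCES * MAX_START_ADDRESSES_PER_SOURCE


def content_sources_needed(share_count: int) -> int:
    """Content sources required if every file share gets its own start address."""
    return math.ceil(share_count / MAX_START_ADDRESSES_PER_SOURCE)


def fits_one_address_per_share(share_count: int) -> bool:
    """True if the naive design stays within the content source limit."""
    return content_sources_needed(share_count) <= MAX_CONTENT_SOURCES
```

Even when the share count fits, all content sources would be consumed by file shares alone and could not be grouped by region or schedule, which is one reason to put many shares behind a single start address.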

A Proven Approach
• Symbolic links in smart folder structure
\\impact\files\source\impact\account\symlink
• Content Sources per region with smart start addresses
file://impact/files/source/impact
• Content Enrichment to fix file paths in results
• Custom application for managing file shares and granting access to crawler
• Host aliases for crawler impact
• Custom timer job that syncs custom lists from custom app
• Custom timer job that creates/removes symbolic links
• Custom list: Locations
– Map server prefix to content source
– Map location to schedule and impact
• Custom list: File shares
– Map share to crawl account
– Map UNC to symlink
– Map share specific metadata
Example Solution
Files in Norway
• Incremental Crawl every 6 hours
• Start address: file://default/files/norway/default
Files in India
• Incremental Crawl every night at 21:00 IST
• Start address: file://reduced/files/india/reduced
Crawl Rules
• file://*/user1/* account=user1
• file://*/user2/* account=user2
Crawler Impact Rules
• Server name: default
• Server name: reduced wait 60 secs
Folders
• files/norway/default/user1/symlink1
• files/india/reduced/user1/symlink4
Custom list: Locations
• Server Prefix: osl
• Content Source: norway
• Crawler Impact: default
Custom list: File Shares
• UNC Path: \\osl-file01\share1\hr
• Crawl Account: user2
• Symlink: files/norway/default/user2/symlink3
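The naming scheme in this example can be expressed as a few small functions. This is a sketch of the convention as it appears on the slide, not the authors' timer job code; the class and function names are hypothetical, and `mklink_command` only builds the Windows `mklink /D` command text without running it.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Location:
    """One row of the 'Locations' custom list."""
    server_prefix: str
    content_source: str
    crawler_impact: str   # host alias, e.g. 'default' or 'reduced'


@dataclass(frozen=True)
class FileShare:
    """One row of the 'File Shares' custom list."""
    unc_path: str
    crawl_account: str
    symlink_name: str


def symlink_folder(location: Location, share: FileShare) -> str:
    """Folder path of the symbolic link: files/<source>/<impact>/<account>/<symlink>."""
    return "/".join(["files", location.content_source, location.crawler_impact,
                     share.crawl_account, share.symlink_name])


def start_address(location: Location) -> str:
    """Start address of the content source: file://<impact>/files/<source>/<impact>."""
    return (f"file://{location.crawler_impact}/files/"
            f"{location.content_source}/{location.crawler_impact}")


def crawl_account_for(url: str, crawl_rules: list) -> Optional[str]:
    """Return the account of the first crawl rule whose wildcard pattern matches."""
    for pattern, account in crawl_rules:
        if fnmatchcase(url, pattern):
            return account
    return None


def mklink_command(link_path: str, unc_target: str) -> str:
    """Command text for creating a directory symbolic link on Windows."""
    return f'mklink /D "{link_path}" "{unc_target}"'
```

Because the crawl account is part of the folder path, two wildcard crawl rules are enough to pick the right credentials for any number of shares.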

Discover associations in your indexed data using custom entity extractors
Explore how your indexed data is associated with terms often used by your business
• Examples
– Organization
– Projects
– Customers
– Products
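Conceptually, a word-based entity extractor looks up dictionary terms in the item text and emits a normalized display form for each hit. The sketch below illustrates that idea in plain Python; it is not SharePoint's implementation, and the dictionary entries used with it would be your own business terms.

```python
import re
from typing import Callable, Dict, List


def build_extractor(dictionary: Dict[str, str]) -> Callable[[str], List[str]]:
    """Return a function that finds dictionary terms as whole words, ignoring case.

    dictionary maps a term to the display form stored on the item.
    """
    if not dictionary:
        return lambda text: []
    lookup = {term.lower(): display for term, display in dictionary.items()}
    # Longest terms first so multi-word terms win over their prefixes.
    alternatives = "|".join(re.escape(term)
                            for term in sorted(lookup, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def extract(text: str) -> List[str]:
        found: List[str] = []
        for match in pattern.finditer(text):
            display = lookup[match.group(1).lower()]
            if display not in found:   # keep first-seen order, no duplicates
                found.append(display)
        return found

    return extract
```

The extracted display forms would typically end up in a managed property that can be used as a refiner.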

Add metadata or clean up your indexed data using custom content enrichment
• Based on where the items are located, add info about
– Department
– Information owner
– Security classification
• Look up name based on user account
• Remove company name from title for all web pages
• Normalize names
• Normalize phone numbers
• Fix search result link
Synchronize Terms with Search Spelling and Synonyms Dictionaries
[Diagram: two custom timer jobs feed the SSA]
• «Custom Timer Job» Synchronize Spelling Inclusion
• «Custom Timer Job» Synchronize Thesaurus

How fast can you find what you are searching for?
• What should be indexed?
• What should be searchable?
• What should be displayed?
- Relevancy - Recall - Precision -
• How to weight a managed property?
• How to change ranking model?
• How to tune ranking?
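Recall and precision have simple set-based definitions, which the sketch below computes for a single query. It uses the standard textbook formulas; the function name and the idea of identifying documents by an id are choices made for the example.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str],
                     relevant: Iterable[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant, for one query."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

Indexing more content tends to raise recall, while restricting what is searchable tends to raise precision, which is why the three questions above matter for relevancy.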

Managed Property Weighting
These are not ordered by importance!

Change Ranking Model
• The default ranking model in SP 2013 did not fit us!
– PowerPoint files always won
– Complete matches in site titles and document titles were outranked by the number of partial matches in body
– Community sites were weighted lower than discussions and posts
We replaced the SP 2013 ranking model with the SP 2010 ranking model

Tune Ranking Model
Microsoft will soon release a tool for tuning ranking models!
1. Select ranking model to tune
2. Select result source to search
3. Add judgement sets
4. Add queries to judgement sets
5. Run queries and evaluate results
6. Add and tune features
7. Save and publish model
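Step 5 needs some way to score a ranked result list against the judgements. The slides do not say which measure the Microsoft tool uses; the sketch below uses NDCG, a common choice for graded judgements. `run_query` is a hypothetical stand-in for whatever executes the query against the selected result source.

```python
import math
from typing import Callable, Dict, List


def dcg(grades: List[int]) -> float:
    """Discounted cumulative gain of grades listed in ranked order."""
    return sum((2 ** grade - 1) / math.log2(position + 2)
               for position, grade in enumerate(grades))


def ndcg(ranked_grades: List[int], all_grades: List[int], k: int = 10) -> float:
    """DCG of the top k results divided by the best achievable DCG at k."""
    ideal = dcg(sorted(all_grades, reverse=True)[:k])
    return dcg(ranked_grades[:k]) / ideal if ideal > 0 else 0.0


def evaluate_judgement_set(judgements: Dict[str, Dict[str, int]],
                           run_query: Callable[[str], List[str]],
                           k: int = 10) -> Dict[str, float]:
    """Score every query in a judgement set; unjudged documents count as grade 0."""
    scores = {}
    for query, graded_docs in judgements.items():
        ranked_ids = run_query(query)[:k]
        ranked_grades = [graded_docs.get(doc_id, 0) for doc_id in ranked_ids]
        scores[query] = ndcg(ranked_grades, list(graded_docs.values()), k)
    return scores
```

Running the same judgement set before and after a change to the ranking model gives a per-query comparison instead of a gut feeling.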

THE END
Petter Skodvin-Hvammen
psh@adgruppen.no
@pettersh
Tallak Hellebust
tallak.hellebust@comperiosearch.com
@titakker
