4. Enterprise Search Product Portfolio
Two solution areas: Solutions for Internet Business and Solutions for Business Productivity
• Integrated with SharePoint
– Internet Business: FAST Search for SharePoint Internet Sites, on SharePoint Server for Internet Sites
– Business Productivity: FAST Search for SharePoint, on SharePoint Server
• Stand-alone
– Internet Business: FAST Search for Internet Business
– Business Productivity: FAST Search for Internal Applications
• Entry-Level Solutions
– Search Server
– Search Server Express
6. End-User UI
• Out-of-box refinement
– Refine over key results properties
– Metadata, taxonomy and social tags based results
refinement
– Easy to extend over custom properties
• One-stop Search Center
– Scopes, web parts, best bets, top answers,
advanced search
– Query federation brings together results from all
over - native support for OpenSearch
• Core search experience
– Improved “did you mean” suggestions
– New pre-query suggestions and post-query related-query suggestions
– “View in browser” link (for most office docs)
– Improved query syntax
8. New Query Syntax
• Support for Boolean operators for FreeText
queries and Property queries
– (“SharePoint Search” OR “Live Search”) AND
(title:“Keyword Syntax” OR title:“Query Syntax”)
• Prefix matching support for keywords and
properties
– Micro* author:bill*
• Improved operator support for property
restrictions
– =, >, <, <=, >=
– Can create range refinements
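The syntax rules on this slide can be sketched as a few string-building helpers in Python. This is only an illustration of how such query strings compose, not a SharePoint API: the helper names (`phrase`, `prop`, `any_of`, `all_of`) and the `size` property used in the range example are assumptions made for the sketch.

```python
# Hypothetical helpers that compose query strings in the syntax shown on the slide.
ALLOWED_OPERATORS = {":", "=", ">", "<", "<=", ">="}


def phrase(text: str) -> str:
    """Wrap free text in double quotes so it is treated as a phrase."""
    return f'"{text}"'


def prop(name: str, value: str, op: str = ":") -> str:
    """Build a property restriction such as title:"Query Syntax" or size>=1000."""
    if op not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    return f"{name}{op}{value}"


def any_of(*terms: str) -> str:
    """Join terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with AND."""
    return " AND ".join(terms)


# Boolean operators over free text and property queries (the slide's example).
boolean_query = all_of(
    any_of(phrase("SharePoint Search"), phrase("Live Search")),
    any_of(prop("title", phrase("Keyword Syntax")),
           prop("title", phrase("Query Syntax"))),
)

# Prefix matching on a keyword and on a property.
prefix_query = " ".join(["Micro*", prop("author", "bill*")])

# A range refinement built from two comparison operators ("size" is an assumed property).
range_query = all_of(prop("size", "1000", ">="), prop("size", "5000", "<"))

print(boolean_query)
print(prefix_query)
print(range_query)
```

Keeping the operator set in one place makes it easy to reject anything outside the operators the slide lists.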
9. Great Search Experience OOB
Get more relevant results
through a search center with hit highlighting, results summaries, related queries, and enhanced query syntax
Find information faster
with metadata-driven refinement, query suggestions, search scopes, and federated results which help pinpoint information
Search from anywhere
Including mobile and desktop integration; Office Web Apps speed access to results; enhancements for multi-lingual
Screenshot callouts: Win7 Connector, Related searches, Launch in Office Web Apps, Refinement panel, Federated results
10. Search is Social
• People finding experience
– Front door to the office social network
– Better expertise & interest search
• Email mining to bootstrap profiles with
interests and colleagues
– “Address book style” search
• Phonetic name matching
• Nickname matching
– Relevance models tuned specifically for
people search
– Metadata refinement, better hit
highlighting, recently authored content
11. Search is Social
• Social behavior drives search quality
– Search click through behavior drives
relevance ranking
– Query suggestions mined from search
logs
– Social tagging influences relevance
ranking
– Self search - to drive people to
participate content
– Social definitions extracted from
indexed content
12. Amplify the Impact of Knowledge & Expertise
Connect with expertise
using improved matching from mined Outlook mailbox data and SharePoint My Site profiles
Improve relevance with use
based on how people tag content in SharePoint and on click-through of search results
Find people
through nickname and phonetic matching, people specific refinement, tuned relevance models
Screenshot callouts: Refine by focus, expertise, and other attributes; Phonetic and nickname matching; Expertise identification; Recently authored content
13. Search Use in Social Data Delivery
• Search is used for data retrieval and trimming in
other SharePoint social features
Feature | Action | Query
My Site Host home page | What’s New web part | Retrieves up to 40 recent activities from colleagues
Profile Page (person.aspx) | Recent Activities web part | Retrieves up to 10 recent activities for user
Tags and Notes page | Activities for Month web part | Retrieves up to 40 tags or notes based on activities for the specified month for user
Outlook Social Connector | OSC synchs every hour for every user. The response sends updates for colleagues since the last time OSC synched | Retrieves all recent (since the last synch) activities from colleagues
14. Search Depends on Social
• Some of the functionality in Search also depends
on data from Social
• Only difference between SS (SharePoint Search) and FS (FAST Search) for social: FS
doesn’t index social tags
Feature SS FS
Core Results Page showing social tags (up to 5) for search results
Core Results Page Refinement by social tags
Core Results Page Refinement by Taxonomy data / Authoritative tags
All features on the people search tab - searching for people, searching
for expertise, refining by people properties etc.
16. Go Beyond the Search Box
Screenshot callouts:
• Sorting on any property
• Visual Best Bets
• Refinement with counts on any property
• Scrolling PowerPoint Previews
• Thumbnails
17. Go Beyond the Search Box
• Site admin/Search admin control
• Visual Best Bets
• Promote/Demote documents and sites
• UI extensibility (web parts, ..)
• Relevancy profiles and parameters
• User Context parameter & admin
• End User Control
• Sorting, Ranking, and Navigation
• Admin-enabled controls
• Linguistics and term control
• Keywords, phrases, synonyms, spellcheck
• Multilingual searching control
• Lists for metadata extraction
• Search similar (based on document vectors)
• Index based did you mean suggestions
18. User Context Matters
Renee Lo, Engineer: What should I know about implementing ERP?
Alan Brewer, Sales: What should I know about selling ERP consulting?
19. Go Beyond the Search Box
• Can search in any language
• 84 languages detected to allow language-specific handling
• Lemmatization improves recall
(‘better’ includes ‘good’)
• Phrase search includes stopwords
(“a room with a view”)
– Only nouns and adjectives are expanded (higher precision)
(‘book’ -> ‘books’, not ‘booked’)
Languages: Afrikaans, Albanian, Arabic, Armenian, Azerbaijani, Basque, Bengali/Bangla, Bosnian, Breton, Bulgarian, Catalan, Chinese-S, Chinese-T, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Greenlandic, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Kannada, Kazakh, Kirghiz, Korean, Kurdish, Latin, Latvian/Lettish, Letzeburgesch, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Norwegian, Norwegian-B, Norwegian-N, Pashto/Pushto, Persian, Polish, Portuguese, Punjabi, Rhaeto-Romance, Romanian, Russian, Sami (Northern), Serbian, Slovak, Slovenian, Sorbian, Spanish, Swahili, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Zulu
22. Architecture and Design
• Deployment and management
• Scale-Out architecture
– Introduction to concepts
– Scale-out features and options
• Other engine enhancements
23. Search Center - UI for users to issue queries and interact with results
Query Servers - Accept query requests from users and return results
Query Federation - Return results from non-SharePoint indexes
Indexing - Extract information from items to enable efficient matching
Index Partition - Subset of the overall index
Crawling - Traverse URL space to record items in search catalog
Connectors - Know how to process different content sources
Content Sources - Host the content we want to return in main results
Diagram labels: Query Object Model, OpenSearch Source, Query Servers, Index Partition, Indexer, Crawler, Content
24. MOSS 2007 search scale-out
[Diagram: two Query servers and one Indexer. Callouts: “The whole index”, “Bottleneck” (twice), “Single point of failure”]
25. SharePoint Search 2010 Scale-out
• Multiple Index Partitions
• Stateless Crawlers
• Crawl Distribution
• Admin Database + Admin Component
• Query Mirroring
• Query Components
• Multiple Property DBs
[Diagram: the MOSS 2007 layout with its callouts (“The whole index”, “Bottleneck”, “Single point of failure”) shown against multiple Query components, Crawlers, and Indexers]
26. Search First Migration
• Begin Migrating MOSS 2007 with SharePoint 2010
Search
– Good approach for most cases
• User’s content kept in MOSS but User search queries handled
by SharePoint 2010
• Can Be SharePoint Search or FAST Search Server 2010 for
SharePoint
– Flexible approach
• Can add other services later or as needed
• Can Migrate Content later or in Parallel
– Can be implemented easily
28. Indexing MOSS 2007 User Store
• Create a Content Source
– Content Source Type - SharePoint Sites
– Start Address: sps3://<MOSS 2007 Site>
– Search Results from that source - not all options will be
available
• No Add as a colleague
• No Browse in Organization Chart
29. User Profile Replication Engine
• UPRE ships in SPS2010 Admin Toolkit
– Sync between MOSS 2007 and SPS2010
• Co-existence
– Sync between SPS2010 and SPS2010
• User Profile SA can’t be used across the WAN
• Includes social data
31. High Availability / Fault Tolerance
A design that enables a system to continue
operation, possibly at a reduced level (also
known as graceful degradation), rather than
failing completely, when some part of the
system fails.
“Fault tolerant design”, Wikipedia
32. High Availability for Search
• Content side High Availability
– Full redundancy in the feeding chain
– Normally not critical for intranet applications
– Preferred by many clients
• Query side High Availability
– Full redundancy of all query components
– Critical for internet facing applications
– Preferred for intranet applications
• Backup/recovery alternatives not covered
33. SharePoint Search – Content Data Flow
[Diagram: crawl-side data flow. Labels: Request crawl, Poll request, Log request, Distribute request, Crawl DB, Doc. properties, Index fragments, Security descriptors (ACLs and ACEs)]
34. SharePoint Search – Content Side HA
[Diagram: Property DB and two Crawl DB instances, with these callouts]
• Automatic re-election of Master
• Crawlers are stateless, automatic failover
• Redundant instances will automatically fail over
• No redundancy support, but can be quickly relocated via PowerShell
37. The cost of overinvestment in hardware is
almost always far less than the cumulative
expenses related to troubleshooting
problems caused by undersizing.
TechNet, Capacity management
and sizing for SharePoint 2010
38. Search Sizing
• Scale up
(Add more hardware:
processors/memory)
• Scale out
(Add more
servers to a farm)
• Search is by far the service application in SP 2010
with the largest hardware utilization
40. Sizing approach
Crawl DB instances
Index partitions Property
DB instances
Crawler components / Indexers
42. SP Search – Pilot/Dev Deployment
SP2010 Farm
All roles
43. SP Search – Extra Small Deployment
SP2010 Farm SP2010 Farm
All roles
Web Front End
Query
SP Crawl
People Crawl
SQL Server
All DBs
SQL 2008 Cluster
Web Front End
Query
SP Crawl
People Crawl
SQL Server
44. SP Search – Small Deployment
SP2010 Farm
*
Web Front End Web Front End
Query Query
Index partition 1 Index partition 1
*
Central Admin SP Crawl
SP Crawl People Crawl
People Crawl
Search Admin DB
Crawl DB
Property DB
SharePoint DB
SQL 2008 Cluster
Note:
Servers marked with * are only
needed for high availability
45. SP Search – Medium Deployment
SP2010 Farm
• Web Front End (2 servers)
• Query servers (4), each hosting two index partitions
– Query server 1: Index partition 1, Index partition 4
– Query server 2: Index partition 1, Index partition 2
– Query server 3: Index partition 2, Index partition 3
– Query server 4: Index partition 3, Index partition 4
• Crawl servers (2)
– Central Admin, SP Crawl, People Crawl
– SP Crawl, People Crawl
• SQL 2008 Cluster: Search Admin DB, Crawl DB, Property DB, SharePoint DB
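The partition placement in the Medium diagram follows a ring pattern: each query server hosts one partition plus a second copy of the preceding partition, so every partition lives on two servers. A minimal Python sketch of that pattern follows; the function name `mirrored_layout` is hypothetical, and generalizing the pattern to other partition counts is an assumption drawn from the Medium and Large diagrams in this deck, not a documented placement rule.

```python
from collections import Counter


def mirrored_layout(partitions: int) -> list[list[int]]:
    """Return, per query server, the index partitions it hosts.

    Server i hosts partition i and a copy of partition i-1 (wrapping around),
    which reproduces the arrangement drawn on the Medium deployment slide.
    """
    if partitions < 2:
        raise ValueError("the ring pattern needs at least two partitions")
    layout = []
    for server in range(1, partitions + 1):
        previous = partitions if server == 1 else server - 1
        layout.append(sorted([server, previous]))
    return layout


def copies_per_partition(layout: list[list[int]]) -> Counter:
    """Count how many servers host each partition."""
    return Counter(p for hosted in layout for p in hosted)


medium = mirrored_layout(4)
for number, hosted in enumerate(medium, start=1):
    print(f"Query server {number}: partitions {hosted}")
print(dict(copies_per_partition(medium)))
```

With this arrangement, losing any single query server still leaves one copy of every partition available.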
46. SP Search – Large Deployment
SP2010 Farm
• Web Front End (2 servers)
• Query servers (10), each hosting two index partitions
– Query server 1: Index partition 1, Index partition 10
– Query server 2: Index partition 1, Index partition 2
– Query server 3: Index partition 2, Index partition 3
– Query server 4: Index partition 3, Index partition 4
– Query server 5: Index partition 4, Index partition 5
– Query server 6: Index partition 5, Index partition 6
– Query server 7: Index partition 6, Index partition 7
– Query server 8: Index partition 7, Index partition 8
– Query server 9: Index partition 8, Index partition 9
– Query server 10: Index partition 9, Index partition 10
• Crawl servers (3)
– Central Admin, SP Crawl, People Crawl
– SP Crawl, People Crawl
– SP Crawl, People Crawl
• SQL 2008 Cluster: Crawl DB (2), Property DB (2), SharePoint DB, Search Admin DB
47. Server Calculation Matrix
Name | Item count | WFEs | Query Comps | Crawl Comp | Prop DBs | Crawl DBs | Total | Content Side HA | Query Side HA
Single VM (Lab + min production) | 1 | (shared) | (shared) | 1 | (shared) | (shared) | 1 | (x) | (x)
Extra Small | 5 | (shared) | (shared) | 1 | 1 | (shared) | 2 | - | -
Small | 10 | 2 | (shared) | 1 | 1 | (shared) | 4 | - | x
Medium | 40 | 2 | 4 | 2 | 1 | 1 | 10 | x | x
Large | 100 | 2 | 10 | 3 | 2 | 2 | 19 | x | x
Disclaimer:
The numbers might not be representative for the customer environment and data. Please use
caution when using these numbers for sizing.
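As a rough illustration of how the matrix might be used, the Python sketch below encodes its rows and picks the smallest topology whose item count covers a given corpus. The class and function names are hypothetical, `None` stands for a "(shared)" cell, and reading "Item count" as millions of items is an assumption (it matches the unit used on the FAST disk calculation slide). The slide's disclaimer applies equally to this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Topology:
    name: str
    max_items_millions: int          # assumed unit: millions of items
    wfes: Optional[int]              # None = role shares a server ("(shared)")
    query_comps: Optional[int]
    crawl_comps: Optional[int]
    prop_dbs: Optional[int]
    crawl_dbs: Optional[int]
    total_servers: int

    def dedicated_servers(self) -> int:
        """Sum the roles that get their own servers; shared roles add nothing."""
        roles = (self.wfes, self.query_comps, self.crawl_comps,
                 self.prop_dbs, self.crawl_dbs)
        return sum(count for count in roles if count is not None)


# Rows copied from the Server Calculation Matrix, smallest first.
TOPOLOGIES = [
    Topology("Single VM", 1, None, None, 1, None, None, 1),
    Topology("Extra Small", 5, None, None, 1, 1, None, 2),
    Topology("Small", 10, 2, None, 1, 1, None, 4),
    Topology("Medium", 40, 2, 4, 2, 1, 1, 10),
    Topology("Large", 100, 2, 10, 3, 2, 2, 19),
]


def pick_topology(items_millions: float) -> Topology:
    """Return the smallest topology whose item count covers the corpus."""
    for topology in TOPOLOGIES:
        if items_millions <= topology.max_items_millions:
            return topology
    raise ValueError("corpus is larger than the largest row in the matrix")


choice = pick_topology(25)
print(choice.name, choice.total_servers)
```

The `dedicated_servers` check confirms that each row's Total equals the sum of its non-shared roles, which is a quick way to validate a reconstructed table.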
48. FAST Search for SharePoint 2010
Screenshot callouts:
• Sorting on any property
• Query completion
• Related searches & people
• Scrolling previews
• Document thumbnails
• Read in Office Web Apps
• Federated results
49. FAST Search – Content Data Flow (1/2)
[Diagram: crawl-side data flow]
Components: Admin component, Admin DB, Master Crawl comp., Crawl comp., Crawl DB, Property DB, Query component
Flow labels: Request crawl, Poll request, Log request, Distribute request, Doc. properties, Index fragments, Security descriptors (ACLs and ACEs)
Crawl DB: Crawl data, Crawl history, Crawl queue additions
50. FAST Search – Content Side HA (1/2)
[Diagram: redundant Property DB, Crawl DB, and Admin DB instances; multiple Crawl and Query components; one Master Crawl comp.; one Admin component]
• Automatic re-election of Master
• Crawlers are stateless, automatic failover
• Redundant instances will automatically fail over
• Admin component: No redundancy support, but can be quickly relocated via PowerShell
51. FAST Search – Content Data Flow (2/2)
Flow, from crawl to search:
Crawled batch -> Content Distributor -> (pass on batch) -> Item Processing -> (ready to index) -> Indexing Dispatcher -> (pass on batch) -> Indexing -> (distribute index) -> Search
Item Processing -> (detected links) -> Link Analysis (Web Analyzer)
52. FAST Search – Content Side HA (2/2)
• Search: search rows have automatic failover
• Indexing: backup indexer, manual failover
• Indexing Dispatcher: does not hold state, automatic failover
• Item Processing: does not hold state, automatic failover
• Content Distributor: does not hold state, automatic failover
• Additional callout: Must be set up for redundancy. Disk errors may require manual recovery.
• Crawl DB and Crawl Component requirements are as for SharePoint Search
Diagram also shows: Link Analysis (Web Analyzer)
54. FAST Search for SharePoint Search Service Applications
Summary of architectural elements FAST Search for SharePoint
Web Frontend
Site Collection Level Admin UI
- Keyword Management
- User Context Management
- Site Promotion/Demotion
PowerShell
- Schema configuration
- Admin configuration
- Deployment configuration
Central Administration UI
- Property mapping
- Entity extraction
- Spell-checking
Administration and Schema Object Model
SharePoint
Front-end Connectors:
Security Content
- SharePoint
Access Indexing - BDC
Query Object Model
Module - Exchange
Content
Processing
Content
And
Custom Linguistics
Query Web Service
front-end Connectors:
Query and
- Web Crawler
Result Search - JDBC Content
Federation Processing - Lotus Notes
Object Model
Monitoring Services
OpenSearch or
other Sources People Search Microsoft System Center Operations Manager
55. Content Processing Flow
OpenSearch
Source
Content
End Users
Federation
Query Content
Indexer Crawler
Processor Processor
Search Center Index
Partition Profiles
User Relevance Metadata Indexing
Context Control Connectivity
• Data moves from content source to end user queries
– It gets crawled, processed and refined, and an index is created
– User executes queries and retrieves data, metadata, and federated search
results
56. Content Pipeline Stages
Default
• Format Conversion
• Language detection and encoding
• Lemmatizer
– Linguistics normalization
• Tokenizer
– Word breaking
• Entity Extraction
– Persons, companies, locations, email,
date/time, URL, prices, file names
• DateTimeNormalizer
– Date normalization
• Vectorizer
– Create document vector for similarity
searching
• WebAnalyzer
– Anchor text and link cardinality analysis
• PropertiesMapper
– Map to crawled properties
• PropertiesReporter
– Report detected properties
Optional
• XML Properties mapper
• Offensive Content Filter
• Verbatim extractor
– Loads dictionary for custom extraction,
e.g. product names
• Field Collapsing
• …
57. FAST Search for SharePoint Scaleout
Scale-out in different
“dimensions”
Query Volume
Content Volume
Processing power
Indexing freshness
Redundancy options
Search
Indexing
Performance targets*
30 mDocs/node
50 QPS/node
35 docs/sec
* Dependent on document and HW characteristics
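A back-of-the-envelope calculation from these targets can be sketched in Python as below. The function names are hypothetical, and the sketch assumes that 30 mDocs/node bounds the content one node holds, 50 QPS/node bounds the query load one node serves, and 35 docs/sec is a sustained feeding rate. The slide does not spell these readings out, and its footnote about document and hardware characteristics still applies, so the tested sizing tables in this deck should take precedence.

```python
import math

# Performance targets quoted on the slide.
DOCS_PER_NODE = 30_000_000   # 30 mDocs/node
QPS_PER_NODE = 50            # 50 QPS/node
DOCS_PER_SECOND = 35         # 35 docs/sec


def nodes_for_content(total_docs: int) -> int:
    """Nodes needed so that no node holds more than the content target."""
    return max(1, math.ceil(total_docs / DOCS_PER_NODE))


def nodes_for_queries(peak_qps: float) -> int:
    """Nodes needed so that no node serves more than the query target."""
    return max(1, math.ceil(peak_qps / QPS_PER_NODE))


def feeding_hours(total_docs: int, docs_per_second: float = DOCS_PER_SECOND) -> float:
    """Hours to feed the whole corpus once at the given sustained rate."""
    return total_docs / docs_per_second / 3600


corpus = 100_000_000
print("content nodes:", nodes_for_content(corpus))
print("query nodes:", nodes_for_queries(120))
print("hours to feed once:", round(feeding_hours(corpus), 1))
```

Separating the content, query, and feeding calculations mirrors the slide's point that scale-out happens along several independent dimensions.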
58. FAST Search – Disk Calculation
Max item count (in Millions) | Adm | Web Analyzer | Crawl DB Server | Indexer | Indexer (HD)
1 | 1 x 72 GB | 1 x 5 GB | 1 x 10 GB | 1 x 120 GB | 1 x 120 GB
10 | 1 x 72 GB | 1 x 50 GB | 1 x 40 GB | 1 x 1.2 TB | 1 x 1.2 TB
40 | 1 x 72 GB | 1 x 60 GB | 1 x 150 GB | 3 x 2.0 TB | 1 x 4.8 TB
100 | 1 x 72 GB | 2 x 75 GB | 1 x 350 GB | 6 x 2.0 TB | 3 x 4.8 TB
150 | 1 x 72 GB | 4 x 75 GB | 1 x 500 GB | 10 x 2.0 TB | 4 x 4.8 TB
200 | 1 x 72 GB | 5 x 75 GB | 2 x 350 GB | 14 x 2.0 TB | 5 x 4.8 TB
500 | 1 x 72 GB | 9 x 75 GB | 2 x 500 GB | 34 x 2.0 TB | 13 x 4.8 TB
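To show how the table can be read, the Python sketch below stores each row as (server count, disk per server in GB) and totals the disk for a given item count. The names `DISK_TABLE`, `disk_row`, and `total_disk_gb` are hypothetical, converting TB to GB at 1 TB = 1000 GB is an assumption, and "HD" is read as the High Density mode mentioned in the recap slide.

```python
# Each entry: (number of servers, disk per server in GB); 1 TB taken as 1000 GB.
DISK_TABLE = {
    1:   {"adm": (1, 72), "web_analyzer": (1, 5),  "crawl_db": (1, 10),
          "indexer": (1, 120),   "indexer_hd": (1, 120)},
    10:  {"adm": (1, 72), "web_analyzer": (1, 50), "crawl_db": (1, 40),
          "indexer": (1, 1200),  "indexer_hd": (1, 1200)},
    40:  {"adm": (1, 72), "web_analyzer": (1, 60), "crawl_db": (1, 150),
          "indexer": (3, 2000),  "indexer_hd": (1, 4800)},
    100: {"adm": (1, 72), "web_analyzer": (2, 75), "crawl_db": (1, 350),
          "indexer": (6, 2000),  "indexer_hd": (3, 4800)},
    150: {"adm": (1, 72), "web_analyzer": (4, 75), "crawl_db": (1, 500),
          "indexer": (10, 2000), "indexer_hd": (4, 4800)},
    200: {"adm": (1, 72), "web_analyzer": (5, 75), "crawl_db": (2, 350),
          "indexer": (14, 2000), "indexer_hd": (5, 4800)},
    500: {"adm": (1, 72), "web_analyzer": (9, 75), "crawl_db": (2, 500),
          "indexer": (34, 2000), "indexer_hd": (13, 4800)},
}


def disk_row(items_millions: float) -> dict:
    """Return the first table row whose max item count covers the corpus."""
    for max_items in sorted(DISK_TABLE):
        if items_millions <= max_items:
            return DISK_TABLE[max_items]
    raise ValueError("item count exceeds the largest row in the table")


def total_disk_gb(items_millions: float, high_density: bool = False) -> int:
    """Total disk across all roles, using either the Indexer or Indexer (HD) column."""
    row = disk_row(items_millions)
    indexer_key = "indexer_hd" if high_density else "indexer"
    roles = ("adm", "web_analyzer", "crawl_db", indexer_key)
    return sum(count * size_gb for count, size_gb in (row[r] for r in roles))


print(total_disk_gb(100))
print(total_disk_gb(100, high_density=True))
```

Comparing the two totals for the same item count shows the trade-off the table implies: High Density mode uses fewer indexer servers, each with more disk.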
59. SharePoint Search/FAST Search Recap
• Search is the most demanding service in SP 2010 –
plan accordingly
• All components involved in querying and steady-
state crawling support HA
• High Density mode may be an attractive
alternative
• Sizing models are based on thorough testing – find
one that fits your scenario
61. 2010 Upgrade improvements
• Detect issues early
– Provide O12 tools to admins
– Report critical issues at start of upgrade
• Keep the administrator informed
• No data loss
– Keep content and settings
• Continue when possible
• Be reentrant
– Upgrade should not be a catch-22
62. 2010 Upgrade Overview
New
• Upgrade Preparation Tools
• Windows PowerShell Upgrade Cmdlets
• Feature Upgrade
• Visual Upgrade
Changed
• Upgrade Methods
Improved
• Upgrade Status Reporting
• Upgrade Logging
Removed
• Gradual Upgrade
• Side By Side Installation
63. 2010 Upgrade Scenarios and Methods
Supported Scenarios
• In-Place Upgrade
• Database Attach Upgrade:
– Content Database
– Profile Service Database
Unsupported Scenarios
• Upgrade from earlier than WSS v3 SP2/MOSS 2007 SP2
• Direct upgrade from WSS v2/SPS 2003 or earlier
• Side by side installation
• Gradual upgrade
64. In-Place
• Next, next, finished
• Advancements
– Restartable!
– Common blocking time outs removed
65. In-Place Pros/Cons
Pros
• Farm wide settings are preserved and upgraded
• Customizations are available in the environment after the upgrade if they are v4 compatible
Cons
• Servers and farms are offline while the upgrade is in progress
• The upgrade proceeds continuously
• Existing v3 farm must support (64 bit and performance
67. Database Attach
• Databases that can be attached
– Content database
– Profile service database
– Project service database
• V3 databases that cannot be attached
– Configuration
– Search
69. DB Attach Pros/Cons
Pros
• Upgrade multiple content databases at the same time
• Combine multiple farms into one farm
Cons
• The server and farm settings are not upgraded
• Customizations must be transferred manually
• Missing customizations
71. Hybrid Pros/Cons
Pros
• Farm wide settings preserved
• Customizations already in place
• Multiple content databases at the same time
• Non-upgraded sites (in read-only mode) while you upgrade the content
Cons
• Labor intensive
• Direct access to the database servers
• x86 is a lot of work
• Existing hardware may need replacing
72. Upgrading FBA Web Apps
• Convert Web applications to claims-based
authentication
• Update web.config with necessary connection
information for your provider
• Use PowerShell to migrate users and permissions
74. SSP
• O12 SSPs and service settings =
Flexible shared services model
• Service Applications = part of Foundation
• Notification of new services after
in-place upgrade
• Backup/restore of individual services
+ Provisioning offbox
75. What is “Visual Upgrade”
• A feature that separates data upgrade
from UI upgrade
– Data and code upgrade happens all at once
– Site UI has two modes: this version and
previous version
– Pages and components make the decision
at runtime, and it’s safe by default
76. Summary
• SharePoint 2010 Search/FAST Search
– Capabilities
– Architecture
– Search First Migration
– High Availability and Sizing considerations
• Migration options for migrating MOSS 2007 to
SPS 2010