4. Search
Search connects people
to the information
they need to get their jobs done.
5. Search
• “I know what I’m searching for and know how to do that”
• “I know what I’m searching for but I don’t know how to do that”
• “I don’t know what I’m searching for”
• “Am I searching?...”
6. Enterprise Search
• The enterprise is no longer confined within the firewall
• Relevance is critical
• Search within the organization
• “Transparent” Search
• Search Driven Applications
8. Search Based Application (SBA)
• Software Application
• Built on a Search Engine backbone rather
than a database infrastructure
• Purpose is not classic information
retrieval, but rather mission-oriented
information access, analysis or
discovery
15. Challenges
• User Experience Challenges
– Multiple search interfaces, systems, and logons
– No unified search results
• Data and Expertise Challenges
– Files and email on desktops
– Structured and unstructured data silos
– Untapped expertise
• Enterprise and IT Challenges
– Relevance and ranking
– Security, privacy & compliance
– Scalability, manageability & extensibility
17. Customizations for Search Driven Applications
Building on an extensible platform
• Configure
– User Context
– LOB Connectivity
– Content Processing
– Business Language
– Federation Sources
– UI Look & Feel
– …
• Extend
– Relevance Profiles
– UI & Web Parts
– Result Rollup
– Visual Elements
– Workflows
– Analytics
– …
• Create
– Custom Elements
– Work Environments
– New Innovations
– …
19. Content Sources in SBAs
• Combine (join) data
• Connect data
– Existing relationships in the source system
– Newly discovered, cross-system relationships
• Aggregate data
• Visualize data
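The combine and aggregate steps above can be illustrated with a minimal sketch. It assumes two hypothetical source systems (a CRM export and a ticketing export) that share a `customer_id` key; the record layouts and all names are invented for illustration and do not come from any specific product.

```python
from collections import Counter

# Hypothetical records from two separate source systems.
crm_customers = [
    {"customer_id": 1, "name": "Contoso"},
    {"customer_id": 2, "name": "Fabrikam"},
]
support_tickets = [
    {"ticket": "T-1", "customer_id": 1},
    {"ticket": "T-2", "customer_id": 1},
    {"ticket": "T-3", "customer_id": 2},
]


def combine(customers, tickets):
    """Join customers to tickets on customer_id and aggregate a ticket count."""
    counts = Counter(t["customer_id"] for t in tickets)
    return [
        {**customer, "ticket_count": counts.get(customer["customer_id"], 0)}
        for customer in customers
    ]


combined = combine(crm_customers, support_tickets)
```

Each combined record could then be indexed as one item, so a single query returns the customer together with the aggregated figure.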
20. Data Collection / Crawling
• Crawler:
– Connects to the Content Source
– Enumerates the content
– Reads the content items
– Extracts the metadata
– Sends the collected info back to the Indexer
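The five crawler steps above can be sketched roughly as follows, assuming the content source is a plain file share. `CrawledItem`, `InMemoryIndexer`, and the metadata names are hypothetical stand-ins for illustration, not the API of any real search product.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class CrawledItem:
    path: str
    text: str
    metadata: dict = field(default_factory=dict)


class InMemoryIndexer:
    """Hypothetical indexer stand-in that only collects what it receives."""

    def __init__(self):
        self.items = []

    def submit(self, item):
        self.items.append(item)


def crawl_file_share(root, indexer):
    """Walk a directory tree and hand every file to the indexer."""
    # 1. Connect to the content source (here: check that the folder exists).
    if not os.path.isdir(root):
        raise FileNotFoundError(root)
    # 2. Enumerate the content.
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            # 3. Read the content item.
            with open(path, encoding="utf-8", errors="replace") as handle:
                text = handle.read()
            # 4. Extract the metadata.
            metadata = {
                "FileName": name,
                "FileExtension": os.path.splitext(name)[1].lower(),
                "Size": os.stat(path).st_size,
            }
            # 5. Send the collected info back to the indexer.
            indexer.submit(CrawledItem(path, text, metadata))
    return len(indexer.items)


def demo_crawl():
    """Crawl a temporary folder containing one small text file."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "report.txt"), "w", encoding="utf-8") as f:
            f.write("Quarterly report")
        indexer = InMemoryIndexer()
        crawl_file_share(root, indexer)
        return indexer.items
```

A real connector would also handle security descriptors, incremental crawls, and binary formats; this sketch leaves all of that out.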
21. Data Collection / Crawling
• Connector: enables access to different types of content
• OOTB:
– SharePoint
– File Share
– Web site
– Exchange Public Folders
– Custom Connectors
– (Lotus Notes)
– (Documentum)
22. Natural Language Processing
• Crawl/Index Time
– Language Detection
– Tokenization
– Stemming and Lemmatization
• Query Time
– Approximate Spelling
– Phonetic Spelling
– Word Truncation
– Regular Expressions
– Semantic Expansion
– Rules-based Matching
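A few of these techniques can be shown with a small sketch that reuses the examples from the speaker notes (rob, re.ort, plane vs. airplane). The vocabulary and the synonym table are made up for illustration, and real engines use language-specific tokenizers and dictionaries rather than these one-liners.

```python
import re

# Hypothetical index vocabulary and synonym table.
VOCABULARY = ["report", "resort", "robert", "robin", "robust", "airplane"]
SYNONYMS = {"plane": ["airplane"]}


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def truncation_match(prefix, vocabulary):
    """Word truncation: every indexed word that starts with the prefix."""
    return [word for word in vocabulary if word.startswith(prefix.lower())]


def regex_match(pattern, vocabulary):
    """Regular expressions: every indexed word the whole pattern matches."""
    compiled = re.compile(pattern)
    return [word for word in vocabulary if compiled.fullmatch(word)]


def expand(term):
    """Semantic expansion: the term plus any configured synonyms."""
    return [term] + SYNONYMS.get(term, [])
```

With this vocabulary, `truncation_match("rob", VOCABULARY)` yields robert, robin, and robust, and `regex_match("re.ort", VOCABULARY)` yields report and resort.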
23. Processing: Crawled and
Managed Properties
• Crawled property: metadata extracted
from the documents/items during the
crawl.
• Managed property: a crawled property mapped
into the search schema so it can be queried
and used to refine results, helping users
perform more successful queries
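One way to picture the relationship is a mapping table from managed property names to the crawled properties that feed them. The sketch below assumes the first non-empty crawled value wins; the property names and that precedence rule are illustrative assumptions, not the schema of a particular installation.

```python
# Hypothetical mapping: managed property -> crawled properties, in priority order.
MANAGED_PROPERTY_MAP = {
    "Author": ["ows_Author", "mail_from", "doc_creator"],
    "Title": ["ows_Title", "doc_title"],
}


def to_managed(crawled, mapping=MANAGED_PROPERTY_MAP):
    """Build managed properties from the raw crawled properties of one item."""
    managed = {}
    for managed_name, crawled_names in mapping.items():
        for crawled_name in crawled_names:
            value = crawled.get(crawled_name)
            if value:  # take the first non-empty value, ignore the rest
                managed[managed_name] = value
                break
    return managed
```

Crawled properties that are not mapped to anything stay out of the result, which is why they cannot be used in refinements.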
25. Processing: Ranking
• Ranking: produces results ordered by a
computed relevance score
• Dynamic: Based on weighted managed
properties (title, body, social tags, etc.)
• Static:
– File Type
– Click-through relevancy
– Depth
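The split between dynamic and static ranking can be sketched as two score components that are added together. All weights, boosts, and field names below are invented for illustration; the actual ranking model of a search engine is considerably more elaborate.

```python
import re

# Hypothetical weights for the dynamic (query-dependent) part.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0, "social_tags": 2.0}
# Hypothetical boosts for the static (query-independent) part.
FILE_TYPE_BOOST = {".docx": 0.3, ".pdf": 0.2}


def dynamic_score(query, doc):
    """Weighted count of query-term hits in each managed property."""
    terms = set(re.findall(r"\w+", query.lower()))
    score = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        words = re.findall(r"\w+", doc.get(field_name, "").lower())
        score += weight * sum(1 for word in words if word in terms)
    return score


def static_score(doc):
    """File type boost plus click-through bonus minus a URL depth penalty."""
    boost = FILE_TYPE_BOOST.get(doc.get("file_type", ""), 0.0)
    clicks = 0.1 * doc.get("clicks", 0)
    depth_penalty = 0.1 * doc.get("url_depth", 0)
    return boost + clicks - depth_penalty


def rank(query, docs):
    """Order documents by combined score, highest first."""
    return sorted(
        docs,
        key=lambda doc: dynamic_score(query, doc) + static_score(doc),
        reverse=True,
    )
```

The static part can be computed once at index time, while the dynamic part has to be evaluated for every query.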
28. User Interface
• OOTB Web Parts
– Refinement Panel
– Core Results Web Part
• Federation
• People Search
• Scopes
• Custom Web Parts
– Visual Navigation
– Mashups
– Etc.
• Workflows – Act on Items Immediately
29. Search Federation
• Uses a remote index to answer queries
• Location type:
– SharePoint Search index
– FAST index
– OpenSearch 1.0/1.1
31. Search Federation
• Benefits:
– No resources needed for indexing
– Custom Credentials
– Usage restrictions
– Prefix / Pattern match
– Query Template
• {searchTerms} scope:Documents
• {searchTerms} type:.doc type:.docx type:.docm
• BUT:
– Live Internet connection is required
– Bandwidth
– No control over results (order, relevance, etc.)
– Separated Web Parts
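The query templates listed above work by substituting the user's query for the `{searchTerms}` placeholder before the request goes to the remote index. The sketch below shows that substitution plus a simple prefix trigger; the function names are hypothetical, and the trigger behavior is an assumption about what "Prefix match" means here.

```python
def apply_query_template(template, search_terms):
    """Replace the {searchTerms} placeholder with the user's query."""
    return template.replace("{searchTerms}", search_terms.strip())


def prefix_trigger(query, prefix):
    """Return the rest of the query if it starts with the prefix, else None."""
    if query.lower().startswith(prefix.lower()):
        return query[len(prefix):].strip()
    return None


documents_template = "{searchTerms} scope:Documents"
word_template = "{searchTerms} type:.doc type:.docx type:.docm"
federated_query = apply_query_template(documents_template, "annual report")
```

Here `federated_query` becomes `annual report scope:Documents`, which is the string the remote location would receive.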
33. Summary
• Search Based Applications?
– Need to Aggregate Heterogeneous Content
– Need to Process Large Volume of Data
– Need for Real Time Information
– Need for Ad Hoc Reporting
• Customer service and support
• Logistical track and trace
• Contextual advertising
• Decision intelligence
• E-Discovery
Built by Customer and Microsoft Services: Dow Jones
Investment portfolio analysis application
MOCKUP ONLY: Innovation portal
MOCKUP ONLY: Wealth Management Advisor portal
Time: 2 minutes.
Speaker Notes: There are three levels of search customization that cover the spectrum:
• Configuring out-of-the-box behavior
• Extending existing components (e.g. Web Parts)
• Creating brand new components
The actual tools (SharePoint, SharePoint Designer (SPD), Visual Studio (VS)) are provided as *examples* of the tools that you would work with at each of these levels.
• Language Detection: English, French, ...?
• Tokenization: splitting text into a sequence of individual words (grammar, punctuation, word separation rules)
• Stemming: applying language-specific suffixing rules to remove common suffixes
• Lemmatization: morphological analysis (mice -> mouse)
• Approximate Spelling
• Phonetic Spelling
• Word Truncation: rob = robust, robert, robin
• Regular Expressions: re.ort = report, resort
• Semantic Expansion: plane vs. airplane
• Rules-based Matching