2. AGENDA
› A short look at past and modern search engines
› Case Evira
› Environments & Cloud
› CMS and Elasticsearch combination
› More practical stuff
4. SEARCH ENGINES 2014 (AND BEFORE)
› We (Solita) had 3 different types of search solutions
1. Google Search Appliance
• Only in your own data center
• Requires an up-front investment (the box)
2. Episerver Find
• Only in cloud (in Ireland)
• Pricing depends on content items, languages and QPS
3. Lucene search
• Only on one server
• Built into Episerver
• Free
5. OPTIONS AND STRENGTHS
› Google Search Appliance
• Crawler-driven search engines are excellent for multi-platform
environments (web, extranet, document bank, blogs)
• Excellent statistics
› Episerver Find
• Easy to install and start using
• No need to host any servers
• Excellent API
› Lucene search
• OOTB in Episerver
• Requires only files
6. OPTIONS AND WEAKNESSES
› Google Search Appliance
• Speaks only XML
• Sort by metadata: “The sorting occurs only on the 1000 most
relevant results for the specific query”
› Episerver Find
• It’s a bit expensive
• Limited dev options
› Lucene search
• Built into Episerver, so hard to customize
• Error-prone
• Full-text search only
7. SEARCH ENGINES 2017
1. Google Search Appliance
• Google ended development in 2016
2. Episerver Find
• Find has become a key component of many websites
and of the DXC platform
3. Lucene search
• Episerver’s focus is on Find
4. Elasticsearch
• Elastic has built a whole family of products around
Elasticsearch
8. CRAWLER VS EVENT-DRIVEN
› Event-driven engines fit a wider variety of use cases (projects)
• For example, access rights management and real-time updates
• Might need more time to install
› Crawler-driven engines often have lots of easy-to-use OOTB features
• Don’t need that much customizing
• Customizing is expensive
Event-driven search engines fit
nicely into Episerver projects
9. EPISERVER FIND
› First released under the name Truffler
at the end of 2011
› Event-driven search engine
› Built on top of Elasticsearch, which is
built on top of Lucene
› Created by Joel Abrahamsson
10. ELASTICSEARCH
› Open source project
› Built on top of Lucene and Java
› Allows communication only through REST API and JSON
› Client libraries ease the communication on various platforms
(.NET, Java, JavaScript)
› Built to be a distributed and scalable search engine
› Elasticsearch is the key product in a whole family of products
(Kibana, Logstash, Beats, Monitoring, Alerting, Machine Learning)
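To make the "REST API and JSON" point concrete, here is a minimal sketch of what a client library does under the hood: it builds a JSON query body and posts it to the index's `_search` endpoint. The class name `RawSearchRequest`, the node address, and the index and field names are assumptions for illustration only. The request is built but not sent.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical helper, not part of any Elasticsearch client library.
public static class RawSearchRequest
{
    // Builds (but does not send) the HTTP request for a simple full-text search.
    public static HttpRequestMessage Build(Uri node, string index, string field, string text)
    {
        // Query DSL body: the first 10 hits of a "match" query on one field.
        var body = new
        {
            from = 0,
            size = 10,
            query = new { match = new Dictionary<string, string> { [field] = text } }
        };

        // Search requests go to /{index}/_search on any node of the cluster.
        return new HttpRequestMessage(HttpMethod.Post, new Uri(node, $"{index}/_search"))
        {
            Content = new StringContent(
                JsonSerializer.Serialize(body), Encoding.UTF8, "application/json")
        };
    }
}
```

Usage, assuming a local node: `RawSearchRequest.Build(new Uri("http://localhost:9200"), "tweets", "content", "elasticsearch")`, then pass the result to `HttpClient.SendAsync`.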
13. CASE: EVIRA
› Evira is the Finnish Food Safety Authority.
• Lots of official documentation
• Lots of content editors
• Contains mostly text, documents, forms and table data
• Low amount of images and rich content
› The same project also includes Evira’s intranet, so the same
architecture had to work for the intranet as well.
16. [Screenshot of the search page, with callouts:]
› Search with URL parameters
› Facet groups
› Ordering
› Search word highlights
› Customizable search results
› "Did you mean this" feature
› Filters and facets
› Easily customizable facets
› Fallback wildcard search
› File search for most common document types
18. KEY COMPONENTS
› Episerver CMS (Content editing, UI and master data store)
• Platform for content editing with many languages, versioning, document
bank, metadata, etc.
• Master data and primary data source for Elasticsearch
› Elasticsearch (Search and performance)
• Global search and efficient way to query large data sets with full-text
support
› Azure (Cloud platform and scale)
• Azure contains all the environments, files, data, backups, monitoring,
maintenance jobs, etc.
19. CUSTOMIZABLE PLATFORM
› Episerver CMS
• From medium size to very large projects
• Easily customizable front-end and pluggable/extendable back-end
› Elasticsearch
• From the smallest to very large projects
• Runs locally on your laptop, in the cloud or in a private data center
› Azure Cloud
• From the smallest to very large projects
• IaaS and PaaS options
20. SETUP OPTIONS
› Developer’s laptop
› On premise / Private cloud
› Azure IaaS on virtual machines (one or many)
› PaaS on Azure App Services and Elastic Cloud
[Diagram components: IIS Web Server, Web Server (x2), SQL Server, Search VM
(Elasticsearch), Web Site, SQL Database, Blob Storage, Elasticsearch]
21. ELASTIC IS NOT JUST FOR SEARCH
› It’s a performance tool. It makes querying large data sets much more
efficient than tools like SQL Server or many other search tools
› We use Elastic for:
• Global search
• Internal searches and listings: products, news, announcements,
comments, files, RSS, sitemap
• Handling large data sets, for example in some migrations
• Analytics and statistics (site visitor analytics, search usage analytics)
• 404 statistics
• Error logging and log analysis
• Monitoring servers
› In short: full-text search, listings and performance; analytics and statistics
25. NOT A CRAWLER
› We have integrated Elasticsearch with Episerver’s events
› Real-time (1 or 2 seconds of latency)
• Long latencies often cause multiple other problems
› We can send more data than what’s visible (for example, access rights)
Real-time is really hard to achieve
if it’s not built into the architecture
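The event-driven idea above can be sketched as follows. This is a simplified illustration, not the project's actual integration: `ContentEvents`, `PageContent`, `SearchDocument` and `InMemoryIndex` are hypothetical stand-ins for the CMS event source, CMS content, the indexed document and the Elasticsearch index. The point is that every publish or delete updates the index immediately, and that the indexed document carries access rights that are never rendered on the page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; these are not Episerver or NEST types.
public record PageContent(int Id, string Title, string Body, string[] AllowedRoles);
public record SearchDocument(int Id, string Title, string Body, string[] AllowedRoles);

// Stand-in for the CMS event source.
public class ContentEvents
{
    public event Action<PageContent> Published;
    public event Action<int> Deleted;
    public void RaisePublished(PageContent page) => Published?.Invoke(page);
    public void RaiseDeleted(int id) => Deleted?.Invoke(id);
}

// Stand-in for the search index.
public class InMemoryIndex
{
    private readonly Dictionary<int, SearchDocument> _docs = new();
    public void Upsert(SearchDocument doc) => _docs[doc.Id] = doc;
    public void Delete(int id) => _docs.Remove(id);

    // Access rights travel with the document, so results can be filtered per role.
    public IEnumerable<SearchDocument> VisibleTo(string role) =>
        _docs.Values.Where(d => d.AllowedRoles.Contains(role));
}

public static class SearchIndexer
{
    // Subscribes the index to content events; no crawler or schedule involved.
    public static void Attach(ContentEvents events, InMemoryIndex index)
    {
        events.Published += page =>
            index.Upsert(new SearchDocument(page.Id, page.Title, page.Body, page.AllowedRoles));
        events.Deleted += id => index.Delete(id);
    }
}
```

In a real project the handlers would call the search client instead of `InMemoryIndex`, but the wiring stays the same.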
27. CQRS WITH CMS (TRADITIONAL FORMAT)
› Commands: Web Site (Episerver CMS) -> SQL Server database
› Queries: Web Site -> Elasticsearch index
28. CQRS WITH CMS (AS WE USE IT)
› Commands: Web Site (Episerver CMS) -> SQL Server database
› Simple queries: Web Site (Episerver CMS) -> SQL Server database
› Queries: Web Site -> Elasticsearch index
29. CQRS WITH CMS (AS EPISERVER FIND USES IT)
› Commands: Web Site (Episerver CMS) -> SQL Server database
› Simple queries: Web Site (Episerver CMS) -> SQL Server database
› Queries: Web Site -> Episerver Find index (Elasticsearch)
• With the GetContentResult() method, the index returns only the IDs
30. FIND PROJECTIONS
› CQRS pattern in the traditional format
› Querying Find without IContent
› Does not use the Episerver cache or
database
› Not all IContent properties (for example, on BlogPost)
exist in the index, and some might not be
up to date (for example,
FriendlyUrl and AccessRights)
var result = client.Search<BlogPost>()
    .Select(x => new SearchResult {
        Title = x.Title,
        Author = x.Author.Name })
    .GetResult();
31. WHY ELASTIC WITH CMS
› Content Management Systems are generally good for managing
content, files, content relations, hierarchy, language variations,
content versions, access rights, user management, model type
management and CACHING
› They often handle content in a hierarchical structure
› So querying a page and querying its parent or child pages often
comes straight from the cache and does not even make a database
query.
› But CMSes often lack good tools for querying across hierarchies
32. CHOOSE THE BEST TOOL
› Use Episerver/CMS for simple queries
• If you need to query just one object, sibling objects or child objects from
fewer than 2 hierarchy levels
› Use Elasticsearch/Find
• For everything else
› Except don’t use Elasticsearch/Find:
• If 1-2 seconds of latency is too much
• If there are transactional requirements
• If the Find host is too far away (lag) or the feature’s SLA requirements are
higher than Find can provide (or use it with a cache)
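The rules above can be written down as a small routing function. This is only a sketch of the decision logic: the `QueryProfile` fields and the `QueryRouter` name are hypothetical, and falling back to the CMS when the index is ruled out is an assumption, since the slide only says when not to use Elasticsearch/Find.

```csharp
using System;

public enum QueryStore { Cms, SearchIndex }

// Hypothetical description of a query; the field names are illustrative only.
public record QueryProfile(
    int HierarchyLevels,         // how many hierarchy levels the query spans
    bool ToleratesIndexLatency,  // is 1-2 seconds of indexing latency acceptable?
    bool NeedsTransactions);     // does the feature have transactional requirements?

public static class QueryRouter
{
    public static QueryStore Choose(QueryProfile query)
    {
        // Exceptions first: the index is eventually consistent and not transactional.
        // Assumption: in these cases the query goes to the CMS instead.
        if (!query.ToleratesIndexLatency || query.NeedsTransactions)
            return QueryStore.Cms;

        // Simple queries (one object, siblings, or children from fewer than
        // 2 hierarchy levels) are usually served from the CMS cache.
        if (query.HierarchyLevels < 2)
            return QueryStore.Cms;

        // Everything else goes to Elasticsearch/Find.
        return QueryStore.SearchIndex;
    }
}
```

For example, a listing that spans three hierarchy levels and tolerates a short delay would be routed to the search index, while a lookup of a page's direct children stays in the CMS.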
33. ELASTIC INDEX = QUERY DATABASE
› We can always recreate the Elasticsearch index
from the SQL Server “master data”
› That’s why we don’t really need
multiple nodes or shards
› Reindex: Episerver CMS gets all the data from the SQL Server
database and writes it to the Elasticsearch index
34. ELASTICSEARCH.NET & NEST
› Official .NET Elasticsearch clients
› Elasticsearch.Net & NEST make
usage strongly typed:
• No JSON
• No typos
• Every value has a type
• The IDE will help you
› Code example:
var response = client.Search<Tweet>(s => s
    .From(0)
    .Size(10)
    .Query(q =>
        q.Term(t => t.HashTags, "elasticsearch")
    )
);
public class Tweet
{
    public string[] HashTags { get; set; }
    ...
}
35. MAPPINGS ARE LIKE SCHEMA IN DB
› NEST will automatically map most of the types,
but not all
› Separate string types:
• Text (analyzed):
the default type for strings
• Keyword (not analyzed):
keyword fields are only searchable by their exact value
› Automating the mappings will help a lot in
the long run
› Code example:
public class Tweet
{
    [Text]
    public string Content { get; set; }
    [Keyword]
    public string Url { get; set; }
    [Keyword]
    public string[] HashTags { get; set; }
    ...
}
› Mappings are normally generated automatically based on the content you insert
into the index. But sometimes you need custom mappings.
36. SCORING OPTIMIZATION
› Boosting fields is the most important scoring customization
› We normally have 3 fields which we boost with different values:
• Titles (boost 2.0)
• FullTextField (boost 1.5)
• ExtraContent (boost 1.0)
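As an illustration of field boosting, the sketch below builds the raw JSON body of a `multi_match` query where each field carries its boost with the `field^boost` syntax. The field names `title`, `fullTextField` and `extraContent` are assumed camel-cased versions of the fields listed above, and `BoostedQuery` is a hypothetical helper. The actual project may well build the query through NEST instead.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper that produces the request body for the _search endpoint.
public static class BoostedQuery
{
    public static string Build(string text)
    {
        var body = new
        {
            query = new
            {
                multi_match = new
                {
                    query = text,
                    // "field^boost": hits in the title weigh twice as much as
                    // hits in the extra content.
                    fields = new[] { "title^2.0", "fullTextField^1.5", "extraContent^1.0" }
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

Keeping the boosts in one place like this makes them easy to tune when editors report that the wrong pages rank first.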
37. SCORING OPTIMIZATION
› Script scoring allows us to boost results with certain properties:
• Search result type
• Number of internal links
• Depth in hierarchy
• Recently published / edited
• Popularity by user visits
› Requires dynamic scripting to be enabled in Elasticsearch.
Not all hosting partners allow it.
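A script-scored query could look roughly like the sketch below, which wraps a full-text query in `function_score` and multiplies the text relevance by a popularity factor. The field names `fullTextField` and `visitCount` and the helper `PopularityQuery` are assumptions for illustration. The script key is `source` in recent Elasticsearch versions, while older versions used `inline`, so check the version you run.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper that produces the request body for the _search endpoint.
public static class PopularityQuery
{
    public static string Build(string text)
    {
        var body = new
        {
            query = new
            {
                function_score = new
                {
                    // The ordinary full-text query that produces the base score.
                    query = new { match = new Dictionary<string, string> { ["fullTextField"] = text } },
                    // Script that computes a factor from a numeric field of each hit.
                    script_score = new
                    {
                        script = new { source = "Math.log(2 + doc['visitCount'].value)" }
                    },
                    // Multiply the base score by the script result.
                    boost_mode = "multiply"
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

The logarithm keeps very popular pages from drowning out everything else; the other properties listed above (result type, link count, depth, recency) would need their own indexed fields.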
38. SUMMARY
› Event-driven search engines fit nicely into Episerver projects
› Episerver Find is built on top of Elasticsearch
› Elasticsearch/Find fit with most CMSes because they lack good search
tools
› The CQRS pattern will help with performance, but choose wisely how to use it
› Invest in making your platform customizable, so that it also fits your next project.