EPISERVER AND SEARCH ENGINES
CASE: EVIRA
Episerver Meetup 19.4.2017
Mikko Huilaja
AGENDA
› A short glimpse at past and modern search engines
› Case Evira
› Environments & Cloud
› CMS and Elasticsearch combination
› More practical stuff
A SHORT GLIMPSE AT PAST AND MODERN SEARCH ENGINES
SEARCH ENGINES 2014 (AND BEFORE)
› We (Solita) had three different types of search solution
1. Google Search Appliance
• Runs only in your own data center
• Requires an up-front investment (the box)
2. Episerver Find
• Runs only in the cloud (in Ireland)
• Pricing depends on content items, languages and QPS (queries per second)
3. Lucene search
• Runs only on one server
• Built into Episerver
• Free
OPTIONS AND STRENGTHS
› Google Search Appliance
• Crawler-driven search engines are excellent for multi-platform environments (web, extranet, document bank, blogs)
• Excellent statistics
› Episerver Find
• Easy to install and start using
• No server to host yourself
• Excellent API
› Lucene search
• Out of the box (OOTB) in Episerver
• Needs only files on disk
OPTIONS AND WEAKNESSES
› Google Search Appliance
• Speaks only XML
• Sort by metadata: “The sorting occurs only on the 1000 most relevant results for the specific query”
› Episerver Find
• It’s a bit expensive
• Limited dev options
› Lucene search
• Built into Episerver, so hard to customize
• Error-prone
• Full-text search only
SEARCH ENGINES 2017
1. Google Search Appliance
• Google ended development in 2016
2. Episerver Find
• Find has become a key component of many websites and of the DXC platform
3. Lucene search
• Episerver’s focus is on Find
4. Elasticsearch
• Elastic has built a whole family of products around Elasticsearch
CRAWLER VS EVENT-DRIVEN
› Event-driven engines fit a wider variety of use cases (projects)
• Examples: access rights management, real-time updates
• Might need more time to set up
› Crawler-driven engines often have lots of easy-to-use OOTB features
• Don’t need much customizing
• Customizing is expensive
Event-driven search engines fit nicely into Episerver projects
EPISERVER FIND
› First released under the name Truffler at the end of 2011
› Event-driven search engine
› Built on top of Elasticsearch, which is built on top of Lucene
› Created by Joel Abrahamsson
ELASTICSEARCH
› Open source project
› Built on top of Lucene and Java
› Communicates only through a REST API with JSON
› Client libraries for various platforms ease the communication (.NET, Java, JavaScript)
› Built to be a distributed and scalable search engine
› Elasticsearch is the key product in a whole family of products (Kibana, Logstash, Beats, Monitoring, Alerting, Machine Learning)
Most of the search engines above use Lucene in the background
CASE EVIRA
CASE: EVIRA
› Evira is the Finnish Food Safety Authority.
• Lots of official documentation
• Lots of content editors
• Contains mostly text, documents, forms and table data
• Few images and little rich content
› The same project also includes Evira’s intranet, so the same architecture had to work for the intranet as well.
WWW.EVIRA.FI
Go and test the search!
[Screenshot callouts: search features on www.evira.fi]
› Search with URL parameters
› Facet groups
› Ordering
› Search word highlights
› Customizable search results
› “Did you mean this” feature
› Filters and facets
› Easily customizable facets
› Fallback wildcard search
› File search for most common document types
ARCHITECTURE
KEY COMPONENTS
› Episerver CMS (Content editing, UI and master data store)
• Platform for content editing with many languages, versioning, document
bank, metadata, etc.
• Master data and primary data source for Elasticsearch
› Elasticsearch (Search and performance)
• Global search and an efficient way to query large data sets with full-text support
› Azure (Cloud platform and scale)
• Azure contains all the environments, files, data, backups, monitoring,
maintenance jobs, etc.
CUSTOMIZABLE PLATFORM
› Episerver CMS
• From medium-sized to very large projects
• Easily customizable front-end and pluggable/extendable back-end
› Elasticsearch
• From the smallest to very large projects
• Runs locally on your laptop, in the cloud or in a private data center
› Azure Cloud
• From the smallest to very large projects
• IaaS and PaaS options
SETUP OPTIONS
› Developer’s laptop
› On premise / private cloud
› Azure IaaS on virtual machines (one or many)
› PaaS on Azure App Services and Elastic Cloud
[Diagram: components shown across the options: IIS web server, web servers, SQL Server, search VM running Elasticsearch, Web Site, SQL Database, Blob Storage, Elasticsearch]
ELASTIC IS NOT JUST FOR SEARCH
› It’s a performance tool. It queries large data sets much more efficiently than tools like SQL Server or many other search tools
› We use Elastic for:
• Global search
• Internal searches and listings: Products news, announcements, comments, files, RSS, sitemap
• Handling large data sets, for example in some migrations
• Analytics and statistics: site visitor analytics, search usage analytics
• 404 statistics
• Error logging and log analysis
• Monitoring servers
› In short: full-text search, listings and performance; analytics and statistics
NOT A CRAWLER
› We have integrated Elasticsearch with Episerver’s events
› Near real-time (1 or 2 seconds of latency)
• Long latencies often cause multiple other problems
› We can send more data than what’s visible (for example, access rights)
Real-time is really hard to achieve if it’s not built into the architecture
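The event-driven idea above can be sketched in a few lines. This is only an illustration under stated assumptions: ContentItem, ContentEvents and InMemoryIndex are hypothetical stand-ins, not Episerver or Elasticsearch types. In a real project the subscription would hang on the CMS's own content events and the index calls would go to Elasticsearch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a CMS content item; not an Episerver type.
public sealed class ContentItem
{
    public int Id { get; set; }
    public string Title { get; set; }
    // Data that is not visible on the page but is useful at query time.
    public string[] AllowedRoles { get; set; }
}

// Hypothetical stand-in for the CMS event source.
public sealed class ContentEvents
{
    public event Action<ContentItem> Published;
    public event Action<int> Deleted;

    public void RaisePublished(ContentItem item) => Published?.Invoke(item);
    public void RaiseDeleted(int id) => Deleted?.Invoke(id);
}

// In-memory stand-in for the search index; a real one would call Elasticsearch.
public sealed class InMemoryIndex
{
    private readonly Dictionary<int, ContentItem> _documents = new Dictionary<int, ContentItem>();

    public int Count => _documents.Count;
    public void Upsert(ContentItem item) => _documents[item.Id] = item;
    public void Remove(int id) => _documents.Remove(id);
    public ContentItem Get(int id) => _documents.TryGetValue(id, out var item) ? item : null;
}

public static class EventDrivenIndexing
{
    // Subscribe once at start-up; every publish or delete then updates the index.
    public static void Wire(ContentEvents events, InMemoryIndex index)
    {
        events.Published += index.Upsert;
        events.Deleted += id => index.Remove(id);
    }
}
```

Because the indexer receives the whole item, it can store fields such as AllowedRoles that a crawler would never see on the rendered page.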
CQRS = COMMAND QUERY RESPONSIBILITY SEGREGATION
CQRS WITH CMS (TRADITIONAL FORMAT)
[Diagram: the web site sends commands to Episerver CMS (SQL Server database) and queries to Elasticsearch (Elasticsearch index)]
CQRS WITH CMS (AS WE USE IT)
[Diagram: the web site sends commands to Episerver CMS (SQL Server database), queries to Elasticsearch (Elasticsearch index), and simple queries to Episerver CMS (SQL Server database)]
CQRS WITH CMS (AS EPISERVER FIND USES IT)
[Diagram: the web site sends commands to Episerver CMS (SQL Server database) and queries to Elasticsearch (Episerver Find index). Find returns only the IDs; the GetContentResult() method then loads the content from Episerver CMS with simple queries (SQL Server database)]
FIND PROJECTIONS
› CQRS pattern in the traditional format
› Queries Find without loading IContent
› Does not use the Episerver cache or database
› Not all IContent properties (for example on BlogPost) exist in the index, and some might not be up to date (for example FriendlyUrl and AccessRights)
    var result = client.Search<BlogPost>()
        .Select(x => new SearchResult {
            Title = x.Title,
            Author = x.Author.Name })
        .GetResult();
WHY ELASTIC WITH CMS
› Content Management Systems are generally good for managing
content, files, content relations, hierarchy, language variations,
content versions, access rights, user management, model type
management and CACHING
› They often handle content in a hierarchical structure
› So querying a page, or its parent or child pages, often comes straight from the cache without even making a database query.
› But CMSes often lack good tools for querying across hierarchies
CHOOSE THE BEST TOOL
› Use Episerver/CMS for simple queries
• If you need to query just one object, sibling objects, or child objects from fewer than 2 hierarchy levels
› Use Elasticsearch/Find
• Everything else
› Exceptions: don’t use Elasticsearch/Find
• If 1-2 seconds of latency is too much
• If there are transactional requirements
• If the Find host is too far away (lag), or the feature’s SLA requirements are higher than Find can provide (or use Find with a cache)
ELASTIC INDEX = QUERY DATABASE
› We can always recreate the Elastic index from the SQL Server “master data”
› That’s why we don’t really need multiple nodes or shards
[Diagram: Reindex. Episerver CMS gets all the data from the SQL Server database and writes it to the Elasticsearch index]
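A full reindex from master data can be sketched roughly as below. The names Reindexer, InBatches and Rebuild are hypothetical, the "index" is a plain dictionary, and sending the data in batches is an assumption of this sketch, not something the slides state. A real implementation would send one bulk request to Elasticsearch per batch.

```csharp
using System;
using System.Collections.Generic;

public static class Reindexer
{
    // Splits a sequence into batches of at most batchSize items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        // Emit the last, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }

    // Builds a brand-new index from the master data; the old index is never touched.
    public static Dictionary<int, string> Rebuild(
        IEnumerable<KeyValuePair<int, string>> masterData, int batchSize)
    {
        var freshIndex = new Dictionary<int, string>();
        foreach (var batch in InBatches(masterData, batchSize))
        {
            // A real implementation would send one bulk request per batch here.
            foreach (var document in batch)
                freshIndex[document.Key] = document.Value;
        }
        return freshIndex;
    }
}
```

Building a fresh index and switching to it only when the rebuild has finished keeps the old index serving queries during the reindex.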
ELASTICSEARCH.NET & NEST
› Official .NET Elasticsearch clients
› Elasticsearch.Net and NEST make the usage strongly typed:
• No hand-written JSON
• No typos
• Every value has a type
• The IDE will help you
› Code example:
    public class Tweet
    {
        public string[] HashTags { get; set; }
        ...
    }

    var response = client.Search<Tweet>(s => s
        .From(0)
        .Size(10)
        .Query(q =>
            q.Term(t => t.HashTags, "elasticsearch")
        )
    );
MAPPINGS ARE LIKE SCHEMA IN DB
› Mappings are normally generated automatically from the content you insert into the index, but sometimes you need custom mappings
› NEST will automatically map most types, but not all
› There are separate string types:
• Text (analyzed): the default type for strings
• Keyword (not analyzed): keyword fields are searchable only by their exact value
› Automating the mappings will help a lot in the long run
› Code example:
    public class Tweet
    {
        [Text]
        public string Content { get; set; }
        [Keyword]
        public string Url { get; set; }
        [Keyword]
        public string[] HashTags { get; set; }
        ...
    }
SCORING OPTIMIZATION
› Boosting fields is the most important scoring customization
› We normally have three fields, each boosted with a different value:
• Titles (boost 2.0)
• FullTextField (boost 1.5)
• ExtraContent (boost 1.0)
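As an illustration of field boosting, the sketch below builds a multi_match request body in which each field carries its boost in the "field^boost" form. The class name BoostedQuery and the index field names title, fullTextField and extraContent are assumptions made for this example; the slides do not show the actual query.

```csharp
using System.Globalization;
using System.Linq;
using System.Text.Json;

public static class BoostedQuery
{
    // Builds a multi_match query body with one boost per field.
    public static string Build(string searchText, params (string Field, double Boost)[] fields)
    {
        // "field^boost" is the Elasticsearch syntax for per-field boosting.
        var boostedFields = fields
            .Select(f => f.Field + "^" + f.Boost.ToString(CultureInfo.InvariantCulture))
            .ToArray();

        var body = new
        {
            query = new
            {
                multi_match = new { query = searchText, fields = boostedFields }
            }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

A call such as BoostedQuery.Build("salmonella", ("title", 2.0), ("fullTextField", 1.5), ("extraContent", 1.0)) mirrors the three boost values listed above. The invariant culture keeps the decimal separator a dot regardless of the server's locale.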
SCORING OPTIMIZATION
› Script scoring allows us to boost results with certain properties:
• Search result type
• Number of internal links
• Depth in hierarchy
• Recently published / edited
• Popularity by user visits
› Requires dynamic scripting to be enabled in Elasticsearch. Not all hosting partners allow it.
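Script scoring of the kind listed above could look roughly like the following function_score request body, here boosting by popularity. This is a sketch under assumptions: the visits and fullTextField field names are hypothetical, the formula is only an example, and the script key is written as "source", which some older Elasticsearch versions expect as "inline" instead.

```csharp
using System.Text.Json;

public static class ScriptScoreQuery
{
    // Hypothetical numeric field holding the visit count of each document.
    private const string VisitsField = "visits";

    // Builds a function_score query body that rescales relevance by popularity.
    public static string Build(string searchText)
    {
        // Painless script: scale the text relevance by a damped popularity factor.
        var script = "_score * (1 + Math.log10(1 + doc['" + VisitsField + "'].value))";

        var body = new
        {
            query = new
            {
                function_score = new
                {
                    query = new { match = new { fullTextField = searchText } },
                    script_score = new { script = new { source = script } },
                    // The script already multiplies by _score, so its result replaces the score.
                    boost_mode = "replace"
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

The logarithm damps the effect, so a page with a thousand visits ranks higher than one with ten without drowning out text relevance. The default serializer escapes the quotes and plus signs in the script as \u sequences, which is still valid JSON.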
SUMMARY
› Event-driven search engines fit nicely into Episerver projects
› Episerver Find is built on top of Elasticsearch
› Elasticsearch/Find fits most CMSes because they lack good search tools
› The CQRS pattern helps with performance, but choose wisely how to use it
› Invest in making your platform customizable, so it fits your next project too.
