Introduction To ElasticSearch (DamnData)
Jurriaan Persyn
LUCENE
• Tagline: “Proven Search Capabilities”
• Free & Open Source
• Created in 1999
• Features:
  • Indexes & Analyzes Data
    • Tokenizing, Stemming, Filtering
  • Search Queries
    • Phrases, wildcards, proximity searches, ranges, fielded searches
  • Relevance Scoring, Field Sorting
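To make “tokenizing, stemming, filtering” and the inverted index behind Lucene’s search concrete, here is a toy analyzer and index. It is a sketch, not Lucene’s API: the stop-word list, the crude suffix-stripping `stem` function and the AND-only `search` are simplifications chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real Lucene analyzers ship their own.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    """Naive suffix stripping, a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Tokenize, drop stop words, then stem."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: analyzed term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Intersect the posting sets of every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Because documents and queries pass through the same `analyze` function, a search for “drawing” finds a document containing “draws”: both reduce to the term `draw`.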
ELASTICSEARCH
• Tagline: “You know, for Search”
• Free & Open Source
• Created by Shay Banon @kimchy
• Versions
  • First public release, v0.4 in February 2010
    • A rewrite of the earlier “Compass” project, w/ scalability built in from the very core
  • Latest release 0.90.5
• In Java, so inherently cross-platform
DISTRIBUTED & HIGHLY AVAILABLE
• Multiple servers (nodes) running in a cluster
  • Acts as a single service (internal routing)
• Data is split into shards (# shards is configurable)
• Zero or more replicas
  • Replicas on different servers (server pools) for failover
  • Node in cluster goes down? Replica takes over.
• Self-managing cluster
  • Automatic master detection + failover
  • Responsible for distributing/relocating shards
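A minimal sketch of the two ideas on this slide, assuming a generic hash: a document is routed to a shard by hashing its id modulo the number of primary shards, and each shard’s copies are spread over distinct nodes so a replica can take over when a node dies. `zlib.crc32`, the round-robin placement and all function names are stand-ins, not ElasticSearch internals.

```python
import zlib


def route(doc_id: str, num_primary_shards: int) -> int:
    """Choose a shard as hash(id) modulo the number of primary shards.

    zlib.crc32 is a stand-in hash; ElasticSearch uses its own hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def allocate(num_shards: int, num_replicas: int, nodes: list) -> dict:
    """Place the copies of each shard on distinct nodes, round-robin.

    The first node in each list holds the primary, the others hold replicas.
    Copies that do not fit (more copies than nodes) are simply not placed,
    which is roughly what a 'yellow' cluster looks like.
    """
    copies = min(num_replicas + 1, len(nodes))
    return {
        shard: [nodes[(shard + i) % len(nodes)] for i in range(copies)]
        for shard in range(num_shards)
    }


def fail_node(table: dict, dead: str) -> dict:
    """Remove a dead node; where it held a primary, the next replica takes over."""
    return {shard: [n for n in held if n != dead] for shard, held in table.items()}
```

Because `route` depends on `num_primary_shards`, changing that number would send existing ids to different shards. This is why, in the versions discussed here, the number of primary shards is fixed when an index is created, while the replica count can still be changed later.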
$ cd ~/Downloads
$ wget https://download […] /elasticsearch-0.90.5.tar.gz
$ tar -xzf elasticsearch-0.90.5.tar.gz
$ cd elasticsearch-0.90.5/
$ ./bin/elasticsearch
$ curl -XPUT http://localhost:9200/reddevils/matches/1 -d \
'{"date": "2013-10-15T19:00:00Z", "opponent": "Wales", "result": "1-1"}'
{"ok":true,"_index":"reddevils","_type":"matches","_id":"1","_version":1}
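ElasticSearch accepts this document without any prior mapping because it guesses a type for each new field. Here is a rough sketch of that guessing, assuming only one timestamp layout is recognised as a date. The type names follow the `string`/`long`/`double`/`date` naming of the 0.90 era, but the detection rules are simplified and arrays are not handled.

```python
from datetime import datetime


def guess_type(value) -> str:
    """Guess a field type for one JSON value, loosely like dynamic mapping."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            # Only this one timestamp layout is recognised in the sketch.
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            return "date"
        except ValueError:
            return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"no type rule for {value!r}")


def infer_mapping(doc: dict) -> dict:
    """Map each top-level field of a document to a guessed type."""
    return {field: guess_type(value) for field, value in doc.items()}
```

For the document above, `infer_mapping` returns `{"date": "date", "opponent": "string", "result": "string"}`.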
$ curl -XPUT http://localhost:9200/reddevils/matches/2 -d \
'{"date": "2013-10-11T15:00:00Z", "opponent": "Croatia", "result": "1-2"}'
{"ok":true,"_index":"reddevils","_type":"matches","_id":"2","_version":1}
$ curl -XPUT http://localhost:9200/reddevils/matches/2 -d \
'{"date": "2013-10-11T15:00:00Z", "opponent": "Croatia", "result": "1-2", "girlfriend_attention_span": 30}'
{"ok":true,"_index":"reddevils","_type":"matches","_id":"2","_version":2}
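The `_version` in the response grows by one every time the same index/type/id is written. The toy store below models that counter plus an optional expected-version check, which is the idea behind optimistic concurrency control (ElasticSearch exposes a comparable check through a `version` request parameter). `VersionedStore` and `VersionConflict` are hypothetical names; this is an in-memory illustration, not a client library.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version does not match the stored one."""


class VersionedStore:
    """In-memory map of (index, type, id) -> (version, source)."""

    def __init__(self):
        self._docs = {}

    def put(self, index, doc_type, doc_id, source, expected_version=None):
        key = (index, doc_type, doc_id)
        current = self._docs[key][0] if key in self._docs else 0
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"expected version {expected_version}, found {current}")
        version = current + 1
        # The new source replaces the old one entirely; nothing is merged.
        self._docs[key] = (version, dict(source))
        return {"ok": True, "_index": index, "_type": doc_type, "_id": doc_id, "_version": version}

    def get(self, index, doc_type, doc_id):
        version, source = self._docs[(index, doc_type, doc_id)]
        return {"_version": version, "_source": source}
```

Writing document 2 twice, as on the slide, yields versions 1 and 2; a third write that still expects version 1 raises `VersionConflict` instead of silently overwriting the newer data.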
“Aha! A NoSQL store?!”
QUERY DSL
• Full Text Search
  • Search for “Croatia”
• Structured Search
  • Search for “All matches where outcome was ‘1-1’”
• Analytics
  • Search for “Average attention span of my girlfriend”
  • Incl. custom functions (scripts)
• … or a combination of those!
QUERY DSL (CONT’D)
• Searching in your data set …
  • queries: full text search & relevance scoring
  • filters: exact matches
• Aggregating information from your data set …
  • facets:
    • Averages
    • Sums
    • Date histograms
    • …
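The query/filter split above can be sketched in a few lines, using three of the match documents from this deck as data. A filter is a yes/no test whose result set can be cached and reused, while a query also produces a relevance score. Here `functools.lru_cache` stands in for ElasticSearch’s filter cache, and the substring count in `query` is a deliberately naive stand-in for real relevance scoring.

```python
from functools import lru_cache

# Three match documents from this deck, keyed by id.
# The toy cache is only valid because DOCS never changes; a real engine keeps
# such caches valid per immutable index segment.
DOCS = {
    1: {"opponent": "Wales", "result": "1-1"},
    2: {"opponent": "Croatia", "result": "1-2"},
    4: {"opponent": "Croatia", "result": "1-1"},
}


@lru_cache(maxsize=128)
def term_filter(field: str, value: str) -> frozenset:
    """Exact yes/no match; the resulting id set is cached for reuse."""
    return frozenset(doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value)


def query(word: str, candidates=None) -> list:
    """Scored search: score = number of fields containing `word`, best first.

    `candidates` optionally restricts the search to a filter's result set.
    """
    ids = DOCS.keys() if candidates is None else candidates
    hits = []
    for doc_id in ids:
        score = sum(word.lower() in str(v).lower() for v in DOCS[doc_id].values())
        if score:
            hits.append((doc_id, float(score)))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

Combining the two, `query("croatia", term_filter("result", "1-1"))` scores only the documents the (cached) filter lets through.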
curl -XGET 'http://localhost:9200/reddevils/matches/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "query": "croatia"
    }
  }
}'
{
  "took" : 18,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.40240064,
    "hits" : [ {
      "_index" : "reddevils",
      "_type" : "matches",
      "_id" : "2",
      "_score" : 0.40240064, "_source" : {"date": "2013-10-11T15:00:00Z", "opponent": "Croatia", "result": "1-2"}
    }, {
      "_index" : "reddevils",
      "_type" : "matches",
      "_id" : "4",
      "_score" : 0.3125, "_source" : {"date": "2012-09-11T15:00:00Z", "opponent": "Croatia", "result": "1-1"}
    } ]
  }
}
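When the same search is issued from application code, the JSON response is usually reduced to the fields you need. A minimal sketch with Python’s standard `json` module, using a trimmed copy of the response above as `SAMPLE`; the helper names are hypothetical. It also checks `timed_out` and `_shards.failed`, since a response can contain partial results when some shards fail or time out.

```python
import json


def summarize_hits(response_text: str) -> list:
    """Return (id, opponent, score) for each hit, in the order ES returned them."""
    body = json.loads(response_text)
    return [
        (hit["_id"], hit["_source"]["opponent"], hit["_score"])
        for hit in body["hits"]["hits"]
    ]


def is_complete(response_text: str) -> bool:
    """False if the search timed out or any shard failed (partial results)."""
    body = json.loads(response_text)
    return not body.get("timed_out", False) and body.get("_shards", {}).get("failed", 0) == 0


SAMPLE = """
{
  "took": 18,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 0.40240064,
    "hits": [
      {"_id": "2", "_score": 0.40240064,
       "_source": {"date": "2013-10-11T15:00:00Z", "opponent": "Croatia", "result": "1-2"}},
      {"_id": "4", "_score": 0.3125,
       "_source": {"date": "2012-09-11T15:00:00Z", "opponent": "Croatia", "result": "1-1"}}
    ]
  }
}
"""
```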
curl -XGET 'http://localhost:9200/reddevils/matches/_search?pretty=true' -d '{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "result": "1-1"
        }
      }
    }
  }
}'
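A term filter does not analyze its input: it looks for exactly the term `1-1` in the index. Whether that matches depends on how `result` was indexed. Depending on the analyzer, `1-1` may have been split into separate tokens at index time, in which case the filter finds nothing unless the field is mapped as `not_analyzed`. The two toy analyzers below (hypothetical, and simpler than Lucene’s) show both cases.

```python
import re


def standard_like_analyze(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics, roughly like a standard analyzer."""
    return re.findall(r"[0-9a-z]+", text.lower())


def keyword_analyze(text: str) -> list[str]:
    """Index the whole value as one term, as a not_analyzed field would."""
    return [text]


def term_matches(indexed_value: str, term: str, analyzer) -> bool:
    """A term filter compares its (unanalyzed) term with the indexed terms."""
    return term in analyzer(indexed_value)
```

With the splitting analyzer, `"1-1"` is indexed as `["1", "1"]`, so the term `1-1` never matches; and because indexed terms are lowercased, a term filter for `Croatia` misses while `croatia` hits.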
curl -XGET 'http://localhost:9200/reddevils/matches/_search?pretty=true' -d '{
  "size": 0,
  "facets": {
    "opponent": {
      "terms": {
        "field": "opponent"
      }
    }
  }
}'
{
  …
  "facets" : {
    "opponent" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 10,
      "other" : 0,
      "terms" : [ {
        "term" : "wales", "count" : 2
      }, {
        "term" : "serbia", "count" : 2
      }, … {
        "term" : "croatia", "count" : 2
      } ]
  …
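A rough in-memory equivalent of this terms facet, assuming the opponent field is lowercased and split on whitespace at index time (consistent with the lowercase terms in the output above). Our reading of `missing`, `total` and `other` is: documents without the field, all counted terms, and counts outside the returned top terms. Treat those exact semantics as an assumption.

```python
from collections import Counter


def terms_facet(docs, field, size=10):
    """Count, per lowercased term of `field`, how many documents contain it.

    Also reports 'missing' (docs without the field), 'total' (all counted
    terms) and 'other' (counts that fell outside the top `size` terms).
    """
    counts = Counter()
    missing = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            missing += 1
            continue
        counts.update(set(value.lower().split()))  # each term once per document
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    total = sum(counts.values())
    return {
        "missing": missing,
        "total": total,
        "other": total - sum(c for _, c in top),
        "terms": [{"term": t, "count": c} for t, c in top],
    }
```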
DOCUMENT RELATIONS
• ElasticSearch provides 2 mechanisms
• Parent/Child Documents
  • add links between documents by defining parent/child ids
  • query example: “return children where parent matches x”
  • use case: linking “product” and “offer” documents
  • query-time join
• Nested Documents
  • use case: “actions” on a “mention” (Engagor)
  • denormalized in the Lucene index
  • in the Lucene index, nested documents are stored right next to their parent
  • thus a local join, thus very fast
  • index-time join
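Why nested documents instead of a plain array of objects inside the mention? When an array of objects is indexed as ordinary object fields, each field’s values are flattened into one list, so the link between an action’s type and its delay is lost. The sketch below contrasts the two. The `mention` document and its `actions`, `type` and `delay` fields are hypothetical, loosely modelled on the Engagor use case.

```python
# Hypothetical mention with two actions taken on it.
mention = {
    "text": "Where is my parcel?",
    "actions": [
        {"type": "reply", "delay": 30},
        {"type": "assign", "delay": 300},
    ],
}


def flatten(doc):
    """How a plain object array ends up indexed: one multi-valued field per path."""
    flat = {}
    for action in doc["actions"]:
        for key, value in action.items():
            flat.setdefault(f"actions.{key}", []).append(value)
    return flat


def flat_match(doc, action_type, min_delay):
    """Both conditions are checked against the flattened lists independently."""
    flat = flatten(doc)
    return action_type in flat["actions.type"] and any(
        d >= min_delay for d in flat["actions.delay"]
    )


def nested_match(doc, action_type, min_delay):
    """Both conditions must hold within the same action (nested semantics)."""
    return any(a["type"] == action_type and a["delay"] >= min_delay for a in doc["actions"])
```

Asking for “a reply that took at least 100 seconds” wrongly matches the flattened form (the 300 belongs to the assignment), while the nested form correctly rejects it.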
EXAMPLE EXPLAINED
• range filter on publish_date
• query_string w/ (internal version of) user-defined query string
• date_histogram facet on mention-document publish_date field
• terms_stats facet per action type on “delay” field of nested document “action” of mention-document
  • result contains:
    • number of mentions with action
    • number of actions
    • total delay of actions
• facet_filter per defined facet
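To show what that single request computes, here is an in-memory sketch of the same pipeline: a date range filter, a text match (a plain substring test standing in for `query_string`), a per-day histogram and per-action-type statistics over the nested actions. The field names (`publish_date`, `text`, `actions`, `type`, `delay`) and the result layout are assumptions for illustration; this is neither Engagor’s implementation nor ElasticSearch’s facet output format.

```python
from collections import Counter, defaultdict
from datetime import datetime


def dashboard(mentions, start: datetime, end: datetime, text_query: str) -> dict:
    """Filter, histogram and per-action-type stats over in-memory mentions."""
    # Range filter on publish_date plus a naive stand-in for query_string.
    hits = [
        m for m in mentions
        if start <= m["publish_date"] < end and text_query.lower() in m["text"].lower()
    ]
    # Date histogram with a one-day interval: matching mentions per day.
    per_day = Counter(m["publish_date"].date() for m in hits)
    # Per action type: mentions having that action, number of actions, total delay.
    stats = defaultdict(lambda: {"mentions": 0, "actions": 0, "total_delay": 0})
    for m in hits:
        for action_type in {a["type"] for a in m["actions"]}:
            stats[action_type]["mentions"] += 1
        for a in m["actions"]:
            stats[a["type"]]["actions"] += 1
            stats[a["type"]]["total_delay"] += a["delay"]
    return {"histogram": dict(sorted(per_day.items())), "actions": dict(stats)}
```

Dividing `total_delay` by `actions` for an action type gives its average response time, the kind of figure shown per day on the Engagor page.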
THE ENGAGOR SETUP
• Running ES for 2 years
• 1 billion social messages, sharded by client
• 20-node cluster
  • 24 GB RAM per node, 12-18 GB reserved for ES
• Main data source
  • Other storage systems in place mainly for backup
• Usage:
  • write-heavy (indexing new data all the time, in real time)
  • fewer reads (no need for micro-optimizing read caches, yet)
  • # updates on data depends on client use case
    • social care and/or pure analytics
3 lessons learned …
1/3: INDEXING SPEED
• Bulk Indexing is faster, obviously
  • Less network overhead
• With RabbitMQ
  • Handles peaks in data
  • Allows us to slow down throughput to ES while still consuming firehoses from our 3rd party services
• Bulk w/ Timeouts
  • (so Engagor users get their messages near-realtime)
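“Bulk w/ timeouts” can be sketched as a buffer that flushes either when it holds enough actions or when its oldest action has waited too long. The body follows the bulk format (an action line plus a source line, each newline-terminated) that is sent to the `_bulk` endpoint. `BulkBuffer`, its limits and the injected `send` and `clock` callables are hypothetical.

```python
import json
import time


class BulkBuffer:
    """Collect index actions and flush them as one bulk body on size or age limits.

    `send` ships a finished body (e.g. an HTTP POST to _bulk); it is injected
    so the sketch stays testable, as is `clock`.
    """

    def __init__(self, send, max_actions=500, max_wait_s=1.0, clock=time.monotonic):
        self._send = send
        self._max_actions = max_actions
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._lines = []
        self._count = 0
        self._oldest = None  # clock time at which the oldest buffered action arrived

    def add(self, index, doc_type, doc_id, source):
        if self._oldest is None:
            self._oldest = self._clock()
        # Bulk format: an action/metadata line, then the document source.
        self._lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        self._lines.append(json.dumps(source))
        self._count += 1
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if the batch is full or its oldest action has waited too long."""
        if self._count == 0:
            return
        too_big = self._count >= self._max_actions
        too_old = self._clock() - self._oldest >= self._max_wait_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self._count:
            self._send("\n".join(self._lines) + "\n")  # body must end with a newline
        self._lines, self._count, self._oldest = [], 0, None
```

The age check only runs when `add` or `maybe_flush` is called, so a real consumer (for example the RabbitMQ consumer loop) would also call `maybe_flush` periodically to keep quiet periods near-realtime.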
2/3: CHOOSE SHARDING STRATEGY WISELY
• Plan # shards on expected growth, not on current set-up
• But, take care …
  • We have several shards per monitored topic (related to # customers and volume of data)
  • Biggest problem in our cluster right now is big # shards
  • Bugfixes in latest versions
• You can use “aliases” to create “virtual shards”/“windows on shards”
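One way to get per-client “windows” on a shared index is a filtered alias with a routing value. The sketch builds a request body for the `_aliases` endpoint (our understanding of its `actions`/`add` format) and uses a toy hash to show that a fixed routing value always maps to the same shard. The index name `messages`, the `client_id` field and the `client_<id>` alias naming are hypothetical.

```python
import zlib


def routed_alias(index: str, client_id: int) -> dict:
    """One 'add' action for the _aliases endpoint: a filtered, routed alias.

    The filter hides other clients' documents; the routing value sends the
    search to a single shard instead of all of them.
    """
    return {
        "add": {
            "index": index,
            "alias": f"client_{client_id}",
            "filter": {"term": {"client_id": client_id}},
            "routing": str(client_id),
        }
    }


def aliases_body(index: str, client_ids) -> dict:
    """Request body with one alias per client, all pointing at one shared index."""
    return {"actions": [routed_alias(index, c) for c in client_ids]}


def shard_for(routing: str, num_shards: int) -> int:
    """Toy stand-in for routing: crc32 of the routing value modulo #shards."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards
```

For the routing to help, a client’s documents also have to be indexed with the same routing value, so that they live on the shard the alias points to.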
3/3: TRY TO KEEP UP WITH RELEASES
• ElasticSearch is a young product
  • 0.90 releases
    • September 2013
    • August 2013
    • June 2013
    • May 2013
    • April 2013
  • The 1.0 release is planned for early 2014.
• Updates help you
  • Great improvements with every release
  • Much-needed bugfixes with every release
• Bonus tip: keep your JVM up to date
“filtering, free text search & analytics all in the same box”
“power of search and data-digging in the hands of your users”
flexible and powerful open source, distributed real-time search and analytics engine for the cloud
$ sudo bin/service/elasticsearch stop

Speaker notes

  1. Good afternoon. My name is Jurriaan, and I want to thank Thijs for inviting me to speak at this conference.
  2. I want to introduce ElasticSearch to you. I’ve worked with ElasticSearch for the last 2 years at a company called Engagor, a social media monitoring and management tool. We’re based in Gent, have an office in SFO and are a team of 25 people now, 10 of them technical. Instead of diving into the technical details first, I want to start by showing you one of the coolest things we’ve built with it so far.
  3. Engagor is basically a huge database of social messages (from profiles and keyword searches). Our clients use Engagor to address those messages, like replying to them, assigning them to a team member or adding metadata. This is a page in Engagor where you see the number of incoming messages per day and how often and how fast they are being replied to. The dataset is about 40k social messages, data from the last 28 days. The purple bars are the response times per day. And there are graphs with details on response times during and outside business hours. Our clients use this to evaluate the performance of the customer support they deliver, e.g. through Twitter & Facebook.
  4. This is the same page, but now with data from the last 3 months (140k messages), grouped by week. And they use it to improve their response times … as you can see here.
  5. Not only this, but our users can also drill down, search and filter the dataset. Here is a filter for messages with a certain tag, negative sentiment and from a certain region. They might want a better response time for certain types of messages, or set priorities. This filter can then be applied to the previous page, and you get statistics about that subset of data in real time.
  6. This shows the page in our debug mode, with a bunch of statistics about the page that’s rendered. This particular request took 32ms. But we see that on average, and also for bigger accounts, when searching in millions of messages, we get great performance. And it’s real time: as soon as a message comes in or an action is taken, it’s in these statistics. No pre-processing.
  7. From the what and who to the how … The main component of Engagor is ElasticSearch. What I want to do for the next 20 minutes is … quickly go over a few very basic ES things … explain how the example I started with is implemented … and finish with some of the lessons we learned.
  8. If we talk ES, we have to talk Lucene. That’s the search engine ElasticSearch is built on top of. Notice the flat design of that Lucene logo. This was made in 1999.
  9. An Apache Foundation project. Lucene is a proven technology for search indices.
  10. ES joined in 2010. And it was built to be a full-featured search product, with scalability features built in from the bottom up.
  11. What does that mean?
  12. So, how do we get started … Pretty simple: download, unzip, start.
  13. Once it’s running, you can check that in your browser. The easiest way to interact with ElasticSearch is through its REST API. It’s JSON in and JSON out.
  14. Example of adding something to ElasticSearch from your command shell via an HTTP PUT request done by curl. You can do this right after installation. No need to create an index or configure anything; just add data right away. ElasticSearch is smart about what type of data you’re giving it.
  15. Adding a second record (document).
  16. Now we’re adding some more interesting data to the mix … Updating an existing record. (Mind the new version number.) (An update is an atomic DELETE & PUT.)
  17. Want to view a document? HTTP GET. The URL consists of index, type & id.
  18. So, what we have right now is a NoSQL store.
  19. Yes, ES is a NoSQL store (and you could use it to replace your current type of data source, but I’m not saying you should). But that’s not the field where ES shines.
  20. ES is for search. So let’s do a search. Here we do a GET request (in the browser) that searches our newly created index for the word “croatia”. It returns the match from last Friday.
  21. The language for searching in your ES cluster is called the Query DSL. There are 3 different basic types of searches.
  22. In ES terms this maps to the following words …
  23. By now I’ve added all matches from the qualification campaign, and I will do a full-text search for croatia. The query string supports everything you can do with Lucene, so that includes wildcards, near searches and field restrictions.
  24. And these are the results. We played Croatia twice, and won once.
  25. This is a filtered search, where we will only get back exact matches. If you can, it’s better to use ES filters, since they can then be cached by ES.
  26. This ES HTTP request will facet our data on the opponent field, thus returning how often we played each opponent.
  27. Which, now that we’ve qualified, is obviously 2 times against each of the other 5 teams in our group. That covers the 3 basic types of searches.
  28. I need to explain one more thing before we can go back to the example we started with, and that’s document relations: the equivalent of a MySQL join. There are 2 types. We are using nested documents for mentions & actions.
  29. So, if you remember this screenshot … This is a single ES call … How is that call set up?
  30. What setup and hardware is needed to make this work for Engagor?
  31. With all of this, on a typical day this is what our ES dashboard looks like … Lots of green, with blue indicating the current master.
  32. And if by now you’re thinking … I don’t believe you. Well, you’re right.
  33. There have been days where it looked like this.
  34. [ANIMATION] And where Server Density, our monitoring platform, looked like this. We’ve seen servers with load averages up to 1800. I wonder if that’s a record? Getting the full set-up and configuration right is a bit more work than unzipping the software and starting up 20 nodes.
  35. So I want to move on to some lessons learned.
  36. Our set-up (firehose): RabbitMQ in front. Sometimes (when relocating, initializing, recovering from …) indexing slows down.
  37. I want to end with 2 quotes from a presentation from this summer on why ElasticSearch was built and what its goals were …
  38. Use the right tool for the job. ES does filtering, free text & analytics in a single box. That’s definitely easier than having to move big piles of data between different systems.
  39. When I look at the features we can build for our clients, it definitely does that for us. Thanks to ElasticSearch we’ve been able to offer our clients features like this, and it’s interesting to see which queries they’re using for their day-to-day business.
  40. And I’m not sure when exactly this happened, probably around the time of their huge funding, but the tagline is no longer “You know, for search” … it is now fully buzzword compliant. And on that bombshell …
  41. Ready to dive in?