Elasticsearch: first steps with an Aggregate-oriented database
Jug Roma, 28/11/2013
Matteo Moci
Me
Matteo Moci
@matteomoci
http://mox.fm
Software Engineer
R&D, new product development
Agenda
• 2 Use cases
• Elasticsearch Basics
• Data Design for scaling
Social Media Analytics Platform
for Marketing Agencies
Scenario

• Using Elasticsearch as:
  • Analytics engine
  • Aggregate repository
Use case 1

• count the distribution of values over time
Before

• ~10M documents
• Heaviest query: ~10 minutes
• Our staff had a problem
After

• ~10M documents
• Heaviest query: ~1 second (also with a larger dataset)
Use case 2
• Aggregate-oriented repository
• ...as in DDD

http://ptgmedia.pearsoncmg.com/images/chap10_9780321834577/elementLinks/10fig05.jpg
Elasticsearch
Distributed RESTful search and analytics
real time data and analytics
distributed
high availability
multi tenancy
full-text search
schema free
RESTful, JSON API
Elasticsearch basics
• Install
• API
• Types mapping
• Facets
• Relations
Install
$ wget https://download.elasticsearch.org/...
$ tar -xf elasticsearch-0.90.7.tar.gz
Run!
$ ./elasticsearch-0.90.7/bin/elasticsearch -f
[Diagram: cluster es with node Hulk; running the same command a second time adds node Thor]
Index a document
$ curl -X PUT localhost:9200/products/product/1 -d '{
  "name" : "Camera"
}'
Search
$ curl -X GET 'localhost:9200/products/product/_search?q=Camera'
Shards and Replicas
[Diagram sequence: the Products index with shards 1 and 2, each shown twice, first on node Hulk alone, then spread across nodes Hulk and Thor]
Integration
TransportClient
[Diagram: nodes Hulk and Thor, each reachable on port 9300]
Async Java API
// Async, non-blocking API: a listener handles the result
this.client.prepareGet("documents", "document", id)
    .execute(new ActionListener<GetResponse>() {
        @Override
        public void onResponse(GetResponse response) {
            // handle the retrieved document
        }

        @Override
        public void onFailure(Throwable e) {
            // handle the failure
        }
    });
Mapping
Mappings define how primitive types are stored and analyzed
Mapping
• JSON data is parsed on indexing
• Mapping is done when a field is first indexed
• Inferred if not configured (!)
• Types: float, long, boolean, date (+formatting), object, nested
• String type can have arbitrary analyzers
• Fields can be split up into multiple fields
"text": {
  "type": "multi_field",
  "fields": {
    "text": {
      "type": "string",
      "index": "analyzed",
      "index_analyzer": "whitespace",
      "analyzer": "whitespace"
    },
    "text_bigram": {
      "type": "string",
      "index": "analyzed",
      "index_analyzer": "bigram_analyzer",
      "search_analyzer": "bigram_analyzer"
    },
    "text_trigram": {
      "type": "string",
      "index": "analyzed",
      "index_analyzer": "trigram_analyzer",
      "search_analyzer": "trigram_analyzer"
    }
  }
}
Mapping - lessons
• schema can evolve (e.g. add fields)
• inferred if not specified (!)
• worst case: reindex
• use aliases to enable zero downtime
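The alias lesson can be sketched as follows. This is a minimal illustration, not the speaker's setup: the index names `products_v1` and `products_v2` and the alias `products` are hypothetical. It only builds the JSON body that would be posted to the `_aliases` endpoint to move an alias from an old index to a freshly reindexed one in a single request; it does not contact a cluster.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: clients always query "products", never a versioned index.
body = alias_swap_actions("products", "products_v1", "products_v2")
print(json.dumps(body, indent=2))
```

Because clients only ever see the alias, the reindex into the new index can run in the background and the switch happens in one step.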
Search with Facets
final TermsFacetBuilder userFacet = FacetBuilders.termsFacet(MENTION_FACET_NAME)
        .field(USER_ID)
        .size(maxUsersAmount);

SearchResponse response = client.prepareSearch(Indices.USERS)
        .setTypes(USER_TYPE)
        .setQuery(someQuery)
        .setSize(0)
        .setSearchType(SearchType.COUNT)
        .addFacet(userFacet)
        .execute()
        .actionGet();

final TermsFacet facets = (TermsFacet) response.getFacets()
        .facetsAsMap()
        .get(MENTION_FACET_NAME);
Query

Facets
Date Histogram Facet
The histogram facet works with numeric data by building a histogram across intervals of the field values. Each value is placed in a “bucket”.
{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "field" : "followers",
                "interval" : 10
            }
        }
    }
}
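The bucketing idea behind the histogram facet can be sketched in a few lines. This is a simplified model, assuming each value is rounded down to a multiple of the interval; the sample `followers` values are made up for illustration.

```python
from collections import Counter


def histogram_buckets(values, interval):
    """Count values per bucket; a bucket key is the value rounded down to a multiple of interval."""
    counts = Counter((v // interval) * interval for v in values)
    return dict(sorted(counts.items()))


# Made-up follower counts, bucketed with the same interval as the query above (10).
followers = [3, 9, 10, 14, 27, 120]
buckets = histogram_buckets(followers, 10)
print(buckets)
```

Empty intervals (for example 30 to 110 here) simply produce no bucket in this sketch.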
Facets - lessons
• Bug in 0.90.x: https://github.com/elasticsearch/elasticsearch/issues/1305 *
• Solutions:
  • use 1 shard
  • ask for top 100 instead of 10

* will be solved in 1.0 with the aggregation module
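Why both workarounds help can be shown with a toy model. This is a sketch under assumptions, not the actual Elasticsearch implementation: it assumes the problem comes from each shard reporting only its own top terms, which the coordinating node then sums. The shard contents and the function names are hypothetical.

```python
from collections import Counter


def shard_top_terms(term_counts, size):
    """Top `size` terms as seen by a single shard."""
    return Counter(term_counts).most_common(size)


def merged_top_terms(shards, shard_size, size):
    """Sum the per-shard top lists and keep the overall top `size` terms."""
    merged = Counter()
    for term_counts in shards:
        for term, count in shard_top_terms(term_counts, shard_size):
            merged[term] += count
    return merged.most_common(size)


# Two hypothetical shards: "y" is second on each shard but first overall.
shards = [
    {"x": 5, "y": 4},
    {"z": 5, "y": 4},
]
approx = merged_top_terms(shards, shard_size=1, size=1)   # each shard reports 1 term
exact = merged_top_terms(shards, shard_size=10, size=1)   # each shard reports every term
print(approx, exact)
```

With one shard there is nothing to merge, and asking each shard for a longer list makes it less likely that a globally frequent term is cut off locally.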
Analyzers
A Lucene analyzer consists of a tokenizer and an arbitrary number of filters (+ char filters)
{
  "index": {
    "analysis": {
      "filter": {
        "bigram_shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 2,
          "min_shingle_size": 2,
          "output_unigrams": "false",
          "output_unigrams_if_no_shingles": "false"
        },
        "trigram_shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 3,
          "min_shingle_size": 3,
          "output_unigrams": "false",
          "output_unigrams_if_no_shingles": "false"
        }
      },
      ...
      "analyzer": {
        "bigram_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "standard",
            "bigram_shingle_filter"
          ]
        },
        "trigram_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "standard",
            "trigram_shingle_filter"
          ]
        }
      } ...
    }
  }
}
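What the bigram and trigram analyzers emit can be approximated outside Elasticsearch. The sketch below is a rough stand-in, not Lucene's code: it assumes a whitespace tokenizer followed by a shingle filter with equal minimum and maximum size and no unigrams, and it ignores the "standard" filter in the chain. The sample sentence is made up.

```python
def whitespace_tokenize(text):
    """Split on runs of whitespace, like a whitespace tokenizer."""
    return text.split()


def shingles(tokens, size):
    """Return word n-grams of exactly `size` tokens, joined by a single space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


tokens = whitespace_tokenize("quick brown fox jumps")
bigrams = shingles(tokens, 2)
trigrams = shingles(tokens, 3)
print(bigrams)
print(trigrams)
```

An input shorter than the shingle size yields no terms at all here, which mirrors setting `output_unigrams_if_no_shingles` to false.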
Relations between Documents
Author 1 : N Book
• nested: faster reads, update needs reindex, cross-object match
• parent/child: same shard, no reindex on update, difficult sorting
Nested Documents
Declare the Book type as “nested” in the Author mapping
We can then query Authors by properties of their nested Books:
“Authors who published at least one book with Penguin, in the scifi genre”
curl -XGET localhost:9200/authors/nested_author/_search -d '
{
  "query": {
    "filtered": {
      "query": {"match_all": {}},
      "filter": {
        "nested": {
          "path": "books",
          "query": {
            "filtered": {
              "query": {"match_all": {}},
              "filter": {
                "and": [
                  {"term": {"books.publisher": "penguin"}},
                  {"term": {"books.genre": "scifi"}}
                ]
              }
            }
          }
        }
      }
    }
  }
}'
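The reason for the nested type can be illustrated with a small model. This is a sketch with made-up data, assuming that plain (non-nested) inner objects are matched as if the fields of all books were pooled together, while nested objects are matched one book at a time. The function names are hypothetical.

```python
def flattened_match(author, publisher, genre):
    """Pooled matching: values from different books can satisfy the two conditions."""
    publishers = {book["publisher"] for book in author["books"]}
    genres = {book["genre"] for book in author["books"]}
    return publisher in publishers and genre in genres


def nested_match(author, publisher, genre):
    """Nested matching: both conditions must hold for the same book."""
    return any(
        book["publisher"] == publisher and book["genre"] == genre
        for book in author["books"]
    )


# Made-up author: one Penguin crime novel, one scifi book from another publisher.
author = {
    "name": "A. Writer",
    "books": [
        {"publisher": "penguin", "genre": "crime"},
        {"publisher": "orbit", "genre": "scifi"},
    ],
}
print(flattened_match(author, "penguin", "scifi"))
print(nested_match(author, "penguin", "scifi"))
```

The pooled version reports a match even though no single book is both Penguin and scifi; the nested version does not.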
Parent and Child
Indexing happens separately
Specify the _parent type in the Child mapping (Book)
When indexing Books, specify the id of the parent Author
curl -XPOST localhost:9200/authors/book/_mapping -d '{
  "book": {
    "_parent": {"type": "bare_author"}
  }
}'

curl -XPOST 'localhost:9200/authors/book/1?parent=2' -d '{
  "name": "Revelation Space",
  "genre": "scifi",
  "publisher": "penguin"
}'
Parent and Child query
curl -XPOST localhost:9200/authors/bare_author/_search -d '{
  "query": {
    "has_child": {
      "type": "book",
      "query": {
        "filtered": {
          "query": {"match_all": {}},
          "filter": {
            "and": [
              {"term": {"publisher": "penguin"}},
              {"term": {"genre": "scifi"}}
            ]
          }
        }
      }
    }
  }
}'
Data Design
Index Configurations
• One index “per user”
• Single index
• SI + Routing: 1 index + custom doc routing to shards
• Time: 1 index per time window *

* we can search across indices
One Index per user
[Diagram: nodes Hulk and Thor hosting shards User1 s0, User1 s1, User2 s0]
+ different sharding per user
- small users own (and cost) at least 1 shard
Single Index
[Diagram: nodes Hulk and Thor hosting shards Users s0, Users s3, Users s2]
+ filter by user id, support growth
- search hits all shards
Single Index + routing
[Diagram: nodes Hulk and Thor hosting shards Users s0, Users s3, Users s2]
+ a user’s data is all in one shard, allows large overallocation
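Custom routing can be pictured as hashing a routing value and taking it modulo the number of shards. The sketch below is an assumption-laden model: `zlib.crc32` stands in for whatever hash function Elasticsearch really uses, and the user and document ids are made up.

```python
import zlib


def shard_for(routing_value, num_shards):
    """Pick a shard from the routing value; crc32 is a stand-in hash, not the real one."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Hypothetical documents, routed by the id of the user who owns them.
docs = [("user1", "doc-a"), ("user1", "doc-b"), ("user2", "doc-c")]
placement = {doc_id: shard_for(user_id, 4) for user_id, doc_id in docs}
print(placement)
```

Because every document of a user carries the same routing value, a search for that user needs to visit only one shard, however many shards the index was overallocated with.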
Index per time range
[Diagram: nodes Hulk and Thor hosting shards 2013_01 s1, 2013_01 s2, 2013_02 s1]
+ allows changes in future indices
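With one index per time window, a query for a date range has to name every index that overlaps it. A minimal sketch, assuming monthly indices named in the `YYYY_MM` style shown on the slide; the helper name and the dates are hypothetical, and the comma-separated index list is how a multi-index search path is usually written.

```python
from datetime import date


def monthly_indices(start, end):
    """List YYYY_MM index names for every month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# Hypothetical range spanning a year boundary.
names = monthly_indices(date(2012, 12, 15), date(2013, 2, 3))
search_path = "/" + ",".join(names) + "/_search"
print(search_path)
```

Old windows can then be dropped or given fewer replicas without touching the indices that are still being written.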
Data Design - lessons
Test, test, test your use case!
Take a single node with one shard and throw load at it, checking the shard capacity
The shard is the scaling unit: overallocate to enable future scaling
#shards > #nodes
...ES has lots of other features!
• Bulk operations
• Percolator (alerts, classification, …)
• Suggesters (“Did you mean …?”)
• Index templates (automatic index configuration)
• Monitoring API (amount of memory used, number of operations, …)
• Plugins
• ...
Thanks!
@matteomoci
http://mox.fm
