Based on the experience of an ElasticSearch implementation at bol.com, we'll discuss the consequences of different modes of operation of ElasticSearch in an environment of existing SQL databases. How can you connect ElasticSearch to change queues of other databases, how can the versioning mechanism be used to implement optimistic locking, and what are the consistency consequences of using ElasticSearch as either a free text index on external data, a data cache or as the single source-of-truth system?
2. agenda
• Introduction
• Bol.com Plaza / Square project
• Using ElasticSearch in a mixed DB landscape
– ES as a DB free-text index or as a separate DB
• Consistency issues and solutions
• Lessons learned
3. bol.com
• Leading ecommerce platform in the Netherlands and Belgium
– 5M active customers
– 1M visits every day
– 9M products
– €680M revenue
• Growing (pains)
– 750 employees, 37 scrum teams
– moving towards continuous deployment, team independence
• Plaza / Square Seller platform
– 7k sellers, 16% of total revenue
6. Square ElasticSearch
• Using ElasticSearch to combine Offer and Product information
– Offers from Oracle
– Products from MongoDB
• Replacing Oracle SQL queries
– Too slow for faceting and result sets (for sellers with over 2k offers)
• About 12M productoffer documents
• Scala, Team 1B
• ElasticSearch 1.4
– With Search, Master and Data nodes
• In production now, rolling out to sellers
9. option: right
• ElasticSearch as a free-text DB index on Offers
• A DB update also updates ES
– In the same ‘transaction’
• Benefits
– easier
• Drawbacks
– Less service independence
– Slower (b/c refresh)
[Diagram: systems SDD, PCS, STEP and SSY, with ES]
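A minimal sketch of this option, assuming in-memory stand-ins for the offer database and for ElasticSearch. The names OfferStore, SearchIndex and update_offer are hypothetical and not taken from the bol.com code. It shows the DB write and the ES write handled in one call, with the DB change undone if indexing fails.

```python
class IndexUnavailable(Exception):
    """Raised by the stand-in index when it cannot accept a write."""


class SearchIndex:
    """In-memory stand-in for ElasticSearch (hypothetical, for illustration)."""

    def __init__(self, available=True):
        self.available = available
        self.docs = {}

    def index(self, doc_id, doc):
        if not self.available:
            raise IndexUnavailable(doc_id)
        self.docs[doc_id] = dict(doc)


class OfferStore:
    """In-memory stand-in for the SQL database holding offers."""

    def __init__(self):
        self.rows = {}


def update_offer(db, es, offer_id, changes):
    """Apply changes to the DB row and the ES document as one unit of work."""
    previous = db.rows.get(offer_id)
    updated = {**(previous or {}), **changes}
    db.rows[offer_id] = updated
    try:
        es.index(offer_id, updated)
    except IndexUnavailable:
        # Undo the DB change so both stores stay in step.
        if previous is None:
            del db.rows[offer_id]
        else:
            db.rows[offer_id] = previous
        raise
    return updated
```

The coupling is visible here: the service that owns the DB write now fails whenever the index cannot be reached, which is the loss of service independence listed as a drawback.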
10. option: left
[Diagram: systems SDD, PCS, STEP and SSY, with ES]
• ElasticSearch as a separate database
• Updates from DB sent to ES via async queues
• Benefits
– Architecture more loosely coupled
– Search performance
• Drawbacks
– some latency between DB and ES: eventual consistency
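A minimal sketch of the async route, assuming an in-memory queue and plain dicts in place of the database and ElasticSearch. ChangeQueue and its methods are hypothetical names for illustration. It shows the window in which the DB already holds a change that ES has not yet seen.

```python
from collections import deque


class ChangeQueue:
    """In-memory stand-in for an async change queue between the DB and ES."""

    def __init__(self):
        self._events = deque()

    def publish(self, doc_id, doc):
        self._events.append((doc_id, dict(doc)))

    def drain(self, index, max_events=None):
        """Apply queued changes to the index; return how many were applied."""
        applied = 0
        while self._events and (max_events is None or applied < max_events):
            doc_id, doc = self._events.popleft()
            index[doc_id] = doc
            applied += 1
        return applied


db = {}
es_index = {}
queue = ChangeQueue()

# The DB write returns immediately; ES only learns about it via the queue.
db[42] = {"stock": 1}
queue.publish(42, db[42])
stale_view = es_index.get(42)      # None: the consumer has not run yet

queue.drain(es_index)
fresh_view = es_index.get(42)      # now matches the DB row
```

Between publish and drain a search would miss the change. That gap is the latency this option trades for looser coupling.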
17. “immediate” consistency?
• Relational databases
– User view vs. DB view
– Take it or leave it
– Only vertical scaling
• ElasticSearch
– Read snapshots by refresh interval
– Caching
– Write once, read many
[Timeline diagram with lanes: user 1, db, user 2]
START TRANSACTION;
UPDATE OFFERS SET STOCK=1 WHERE ID=42;
COMMIT TRANSACTION;
18. sources of temporal inconsistencies
• Internal inconsistencies
– within ElasticSearch
• External inconsistencies
– nature of ElasticSearch
– between Database and ElasticSearch
– between User expectations and Application behavior
19. send data to index API
[Sequence diagram with lanes: user, app, master, replica]
• send data to index API
• receives new data
• updates index
• quorum says ‘ok’
• got ‘ok’
curl -XPOST localhost:9200/demo/drinks -d '{"brand":"Glenlivet", "age":18}'
{"_index":"demo","_type":"drinks","_id":"AUxKuw5pxgWzNUrImnD4","_version":1,"created":true}
21. influencing search refresh
• Set index.refresh_interval
curl -XPUT localhost:9200/demo/_settings -d '{"index":{"refresh_interval":"30s"}}'
• Refresh on demand
curl -XPOST localhost:9200/demo/_refresh
• Refresh after index (be careful!)
curl -XPOST 'localhost:9200/demo/drinks?refresh=true' -d '{"brand":"Famous Grouse", "age":12}'
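To make the effect of these settings concrete, here is a toy model of near-real-time search. It is a sketch under simplifying assumptions, not how ElasticSearch is implemented: ToyIndex is a hypothetical class in which searches read a snapshot that only changes on refresh, while get-by-id reads the latest write.

```python
class ToyIndex:
    """Toy model of near-real-time search (not the real ElasticSearch internals)."""

    def __init__(self):
        self._store = {}        # every indexed doc, visible to get-by-id
        self._searchable = {}   # snapshot that searches read from

    def index(self, doc_id, doc, refresh=False):
        self._store[doc_id] = dict(doc)
        if refresh:
            self.refresh()

    def refresh(self):
        """Publish a new snapshot, as a periodic or on-demand refresh would."""
        self._searchable = dict(self._store)

    def get(self, doc_id):
        return self._store.get(doc_id)

    def search(self, brand):
        return [d for d in self._searchable.values() if d["brand"] == brand]


idx = ToyIndex()
idx.index("1", {"brand": "Glenlivet", "age": 18})
before = idx.search("Glenlivet")       # not visible to search yet
by_id = idx.get("1")                   # get-by-id already sees it
idx.refresh()
after = idx.search("Glenlivet")        # visible after the refresh
```

Passing refresh=True on every write makes each document searchable at once, at the cost of rebuilding the snapshot on every write. This is why the slide warns to be careful with it.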
22. dealing with search delay
For a user updating a single item in the UI
• On the client
– Wait until refresh_interval has passed before searching again
– Do a get-by-id for changed item (=real time)
• And only change the single item (but: aggregations out of sync)
• On the server
– Wait until refresh_interval has passed
– Show a “done” message and hope user is slow
– Refresh all searchers upon index (all searches slower!)
– Add queue priority
– Update ES too
• Or: accept eventual consistency
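The client-side get-by-id approach can be sketched as follows. This is an illustration under assumptions: patch_results and the id and stock fields are hypothetical, and the hits are plain dicts rather than real ElasticSearch responses.

```python
def patch_results(hits, fresh_doc, id_field="id"):
    """Return search hits with the just-edited item replaced by its fresh copy."""
    return [
        fresh_doc if hit[id_field] == fresh_doc[id_field] else hit
        for hit in hits
    ]


# Stale search result, fetched before the refresh interval has passed.
stale_hits = [
    {"id": 41, "stock": 7},
    {"id": 42, "stock": 5},
]
# Result of a get-by-id for the offer the user just edited.
fresh_offer = {"id": 42, "stock": 1}

shown = patch_results(stale_hits, fresh_offer)
```

Only the edited row is corrected. Totals and facet counts computed on the server still reflect the old snapshot until the next refresh.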
23. async queue issue
[Diagram: app, ES, db, queue]
Measure DB-to-ES latency
{"drinks": {"_timestamp": {"enabled": true, "store": "yes"}}}
localhost:9200/demo/_search?fields=_timestamp,_version,_source
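Once both timestamps are available, the lag can be summarised with a few lines of code. This sketch assumes that each document also carries the database's last-modified time (an assumption, the slide only shows the _timestamp mapping) and that both values are epoch milliseconds. lag_summary and the sample values are hypothetical.

```python
import math


def lag_summary(samples):
    """Summarise DB-to-ES lag from (db_modified_ms, es_timestamp_ms) pairs."""
    lags = sorted(es_ms - db_ms for db_ms, es_ms in samples)
    if not lags:
        raise ValueError("no samples")
    # Nearest-rank 95th percentile.
    p95 = lags[math.ceil(0.95 * len(lags)) - 1]
    return {"min": lags[0], "max": lags[-1], "p95": p95}


# Hypothetical samples: DB last-modified time and ES _timestamp, in epoch ms.
samples = [
    (1427279177904, 1427279178004),   # 100 ms behind
    (1427279178000, 1427279178350),   # 350 ms behind
    (1427279179000, 1427279181000),   # 2000 ms behind
]
summary = lag_summary(samples)
```

Clock differences between the database host and the ES nodes feed directly into these numbers, so small or negative lags should be read with care.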
26. queue order issue
[Diagram: app, ES, db, queue]
• Only update if newer (w/ optimistic locking)
– read (with _version), update index (with expected _version), retry on conflict
• version_type=external, use DB last-modified timestamp
curl -XPUT 'localhost:9200/demo/drinks/1?version=1427279177904&version_type=external' -d '{"brand": "Glenlivet", "age": 12}'
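The effect of external versioning on out-of-order queue messages can be sketched with a toy model. It assumes the rule that an externally versioned write is accepted only when its version is strictly greater than the stored one. ExternallyVersionedIndex, VersionConflict and apply_changes are hypothetical names, not part of any ElasticSearch client.

```python
class VersionConflict(Exception):
    """Stand-in for the conflict error ES returns on a rejected write."""


class ExternallyVersionedIndex:
    """Toy model of version_type=external: a write wins only if it is newer."""

    def __init__(self):
        self.docs = {}   # doc_id -> (version, doc)

    def put(self, doc_id, doc, version):
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            raise VersionConflict(f"{doc_id}: {version} <= {current[0]}")
        self.docs[doc_id] = (version, dict(doc))


def apply_changes(index, events):
    """Apply queue events in arrival order, dropping stale ones."""
    dropped = 0
    for doc_id, doc, db_modified_ms in events:
        try:
            index.put(doc_id, doc, version=db_modified_ms)
        except VersionConflict:
            dropped += 1   # an older change arrived late; keep the newer doc
    return dropped


index = ExternallyVersionedIndex()
# The newer change (age 12) arrives before the older one (age 18).
events = [
    ("1", {"brand": "Glenlivet", "age": 12}, 1427279177904),
    ("1", {"brand": "Glenlivet", "age": 18}, 1427279170000),
]
dropped = apply_changes(index, events)
```

Using the DB last-modified timestamp as the version means the consumer needs no read before each write. Two changes within the same millisecond would share a version, though, and the second would be rejected.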
27. conclusions
• Compromises hurt someone
• Are you sure you want an eventually consistent database?
– Lots of patch work needed by bol.com…
– Choose left, make it look like you chose right
• In real life, consistency concerns
– more than just ES writes
– also ES reads
– how you get data in and keep it fresh influences consistency
[Diagrams: DB and ES, one per option]
right: as a free-text index
left: as a separate DB
28. ES Consistency
knobs to control “consistency level”
[Chart: axes from eventual to immediate and from faster to slower, with points 1 to 4]
1. Optimistic locking & refresh=true
2. -
3. -
4. Eventually consistent
31. lessons learned
• Make assumptions even more clear
• There is more to eventual consistency than you think
– User-oriented round-trip consistency latency in a mixed DB context
• Use the ES knobs and dials to make it
– as consistent as you need
– while keeping it as fast as you can
• You have to know what you’re doing
32. thank you
@anneveling
it's a matter of patience
calmly waiting for the day
that all of Holland talks Elasticsearch
that all of Holland talks Elasticsearch
eventually: Elasticsearch.