ElasticSearch 5.x - New Tricks - 2017-02-08 - Elasticsearch Meetup
1. Rome – 8 February 2017
presented by Alberto Paro, Seacom
ElasticSearch 5.x
New Tricks
2. Alberto Paro
Degree in Computer Engineering (POLIMI)
Author of 3 books on ElasticSearch, covering versions 1 to 5.x, plus 6 technical reviews
I work mainly in Scala and on big data technologies
(Akka, Spray.io, Playframework, Apache Spark) and NoSQL
(Accumulo, Cassandra, ElasticSearch and MongoDB)
Evangelist for the Scala and Scala.JS languages
3. Tip 1: Shrink - 1/5
Why?
The wrong number of shards was chosen during the
initial design sizing. Sizing the shards without
knowing the actual data/text distribution often
leads to too many shards
Reducing the number of shards reduces memory
and resource usage
Reducing the number of shards speeds up
searching
4. Tip 1: Shrink - 2/5 - Where is your data?
We can retrieve it via the _nodes API:
curl -XGET 'http://localhost:9200/_nodes?pretty'
In the result there will be a similar section:
.... "nodes" : {
"5Sei9ip8Qhee3J0o9dTV4g" : {
"name" : "Gin Genie",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1",
"version" : "5.1.1",....
The name of my node is Gin Genie
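As a minimal sketch of reading that response programmatically, the snippet below pulls the node names out of an already-parsed _nodes result. The helper name node_names is hypothetical, and the sample dictionary simply mirrors the fragment shown above; fetching the JSON from the cluster is left out.

```python
from typing import Dict, List


def node_names(nodes_response: Dict) -> List[str]:
    """Return the sorted node names found in a parsed _nodes response."""
    nodes = nodes_response.get("nodes", {})
    return sorted(info["name"] for info in nodes.values())


# Sample data copied from the fragment of the _nodes output shown on the slide.
sample = {
    "nodes": {
        "5Sei9ip8Qhee3J0o9dTV4g": {
            "name": "Gin Genie",
            "transport_address": "127.0.0.1:9300",
            "host": "127.0.0.1",
            "ip": "127.0.0.1",
            "version": "5.1.1",
        }
    }
}

print(node_names(sample))  # prints ['Gin Genie']
```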
5. Tip 1: Shrink - 3/5 - Relocate your data
We can change the index settings to force all the shards of our index onto
a single node and to block writes to the index.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '
{
"settings": {
"index.routing.allocation.require._name": "Gin Genie",
"index.blocks.write": true
}
}'
We can check for the green status:
curl -XGET 'http://localhost:9200/_cluster/health?pretty'
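The relocation step can be sketched in Python as follows. This is only an illustration: the function names relocation_settings and ready_to_shrink are hypothetical, and the check assumes the parsed cluster health response exposes its usual "status" and "relocating_shards" fields. Sending the requests is left to whatever HTTP client you use.

```python
import json
from typing import Dict


def relocation_settings(node_name: str) -> str:
    """Build the settings body that pins the index to one node and blocks writes."""
    body = {
        "settings": {
            "index.routing.allocation.require._name": node_name,
            "index.blocks.write": True,
        }
    }
    return json.dumps(body, indent=2)


def ready_to_shrink(health: Dict) -> bool:
    """True when the parsed health response is green and no shard is still moving."""
    return health.get("status") == "green" and health.get("relocating_shards", 0) == 0


print(relocation_settings("Gin Genie"))
```

Building the body with json.dumps rather than by hand avoids the quoting mistakes that are easy to make when pasting JSON into a shell command.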
6. Tip 1: Shrink - 4/5 – Shrink our shards
We need to block writes to the index via:
curl -XPUT 'http://localhost:9200/myindex/_settings?index.blocks.write=true'
The shrink call that creates reduced_index is:
curl -XPOST 'http://localhost:9200/myindex/_shrink/reduced_index' -d '{
"settings": {
"index.number_of_replicas": 1,
"index.number_of_shards": 1,
"index.codec": "best_compression"
},
"aliases": {"my_search_indices": {}}
}'
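One constraint worth keeping in mind when picking index.number_of_shards for the shrink call: the target shard count has to be a factor of the source shard count. The sketch below lists the admissible targets; the helper name valid_shrink_targets is hypothetical and it does not talk to Elasticsearch at all.

```python
from typing import List


def valid_shrink_targets(source_shards: int) -> List[int]:
    """List the shard counts an index with source_shards primaries can shrink to."""
    if source_shards < 1:
        raise ValueError("an index has at least one primary shard")
    # A valid target is a proper divisor of the source shard count.
    return [n for n in range(1, source_shards) if source_shards % n == 0]


print(valid_shrink_targets(8))  # prints [1, 2, 4]
print(valid_shrink_targets(5))  # prints [1]
```

An index with a prime number of shards can therefore only be shrunk to a single shard.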
7. Tip 1: Shrink - 5/5 – Post Shrinking
We can also wait for a yellow status to know when the index is ready to work:
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=yellow'
Now we can remove the read-only state by changing the index settings:
curl -XPUT 'http://localhost:9200/myindex/_settings?index.blocks.write=false'
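A stray space or a wrong quote in the query string is enough to break these calls, so it can help to assemble the URL in code. The following is a small sketch under that assumption; health_url is a hypothetical helper, and the timeout value is only an illustrative default.

```python
from urllib.parse import urlencode

ALLOWED_STATUSES = ("green", "yellow", "red")


def health_url(base_url: str, wait_for_status: str, timeout: str = "30s") -> str:
    """Build a _cluster/health URL that waits for the given status."""
    if wait_for_status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {wait_for_status}")
    query = urlencode({"wait_for_status": wait_for_status, "timeout": timeout})
    return f"{base_url}/_cluster/health?{query}"


print(health_url("http://localhost:9200", "yellow"))
```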
8. Tip 2: Reindex - 1/2
Why?
Changing an analyzer for a mapping
Adding a new subfield to a mapping, which requires
reprocessing all the records to make the new
subfield searchable
Removing an unused mapping
Changing a record structure that requires a new
mapping
10. Tip 3: Update By Query with painless
Add a new Field
1. Create your mapping (e.g. modified: date)
2. Call an update by query
curl -XPOST "http://$server/$index/$mapping/_update_by_query" -d '{
"script": {
"inline": "ctx._source.modified=\"2015-10-06T00:00:00.000+00:00\"",
"lang": "painless"
},
"query": {
"bool": {"must_not":[{"exists":{"field":"modified"} }]}
}
}'
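The nested quotes in the inline script are the fragile part of this request. As a hedged sketch, the body can be generated with json.dumps so the escaping is done for you; add_field_body is a hypothetical helper, and it assumes the value is a plain string without double quotes.

```python
import json


def add_field_body(field: str, value: str) -> str:
    """Build an update-by-query body that sets field on documents lacking it."""
    if '"' in value:
        raise ValueError("value must not contain double quotes in this sketch")
    # Painless assignment; the inner quotes are escaped by json.dumps below.
    script = f'ctx._source.{field}="{value}"'
    body = {
        "script": {"inline": script, "lang": "painless"},
        "query": {"bool": {"must_not": [{"exists": {"field": field}}]}},
    }
    return json.dumps(body)


print(add_field_body("modified", "2015-10-06T00:00:00.000+00:00"))
```

The must_not/exists query keeps the update limited to documents that do not have the field yet, so running it twice does not overwrite existing values.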
12. Tip 5: Reindex for a remote node – 1/2
Why?
The backup is a safe Lucene index copy, so it depends on the
Elasticsearch version used. If you are switching from a version
of Elasticsearch that is prior to version 5.x, it's not possible to
restore old indices.
It's not possible to restore backups of a newer Elasticsearch
version in an older version. The restore is only forward-
compatible.
It's not possible to restore partial data from a backup.
13. Tip 5: Reindex for a remote node – 2/2
In config/elasticsearch.yml add:
reindex.remote.whitelist: ["192.168.1.227:9200"]
Then:
curl -XPOST "http://$server/_reindex" -d '{
"source": {
"remote": { "host": "http://192.168.1.227:9200" },
"index": "test-source"
},
"dest": {
"index": "test-dest"
}
}'
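Note that the whitelist entry is host:port without a scheme, while the "host" in the reindex body is a full URL. The sketch below builds the body and checks the two against each other before anything is sent; whitelist_entry and remote_reindex_body are hypothetical helper names, and the whitelist is passed in by hand rather than read from elasticsearch.yml.

```python
import json
from typing import List
from urllib.parse import urlparse


def whitelist_entry(remote_host: str) -> str:
    """Turn a remote host URL into the host:port form used by the whitelist."""
    return urlparse(remote_host).netloc


def remote_reindex_body(remote_host: str, source_index: str,
                        dest_index: str, whitelist: List[str]) -> str:
    """Build a _reindex body for a remote source, refusing hosts not whitelisted."""
    entry = whitelist_entry(remote_host)
    if entry not in whitelist:
        raise ValueError(f"{entry} is not in reindex.remote.whitelist")
    body = {
        "source": {"remote": {"host": remote_host}, "index": source_index},
        "dest": {"index": dest_index},
    }
    return json.dumps(body, indent=2)


print(remote_reindex_body("http://192.168.1.227:9200", "test-source",
                          "test-dest", ["192.168.1.227:9200"]))
```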
14. Tip 6: Ingest Pipeline – 1/2
Why?
Adding/Removing fields without changing your code
Manipulating your records before ingesting
Computed fields
Also supports scripting
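To make the first point concrete, here is a rough sketch of a pipeline definition that adds and removes fields using the set and remove processors. The helper pipeline_body and the field names in the example are hypothetical; the resulting JSON would be stored with a PUT to _ingest/pipeline/<id> and applied by indexing documents with the pipeline=<id> parameter.

```python
import json
from typing import Dict, List


def pipeline_body(description: str, add_fields: Dict[str, object],
                  remove_fields: List[str]) -> str:
    """Build an ingest pipeline definition from fields to set and fields to drop."""
    processors = [{"set": {"field": name, "value": value}}
                  for name, value in add_fields.items()]
    processors += [{"remove": {"field": name}} for name in remove_fields]
    return json.dumps({"description": description, "processors": processors},
                      indent=2)


# Hypothetical field names, chosen only for illustration.
print(pipeline_body("tag and clean records", {"source": "meetup"}, ["tmp"]))
```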