This slide deck is a condensed theoretical overview of Elasticsearch (ES), prepared by working through the official ES Definitive Guide and Practical Guide.
3. Elasticsearch: An Introduction
Written in Java, open source, based on Apache Lucene
https://github.com/elastic/elasticsearch
Document storage
Format: JSON
Full-text search engine
Full-text search?
Every document, every word
Searches large datasets within a few seconds
How?
Via an inverted index and a distributed architecture
Analytics Platform
Aggregations and analysis
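The inverted index mentioned above can be illustrated with a small sketch. It is a simplified illustration, not how Lucene stores its index: the function names and the sample documents are made up for this example, and real analysis (stemming, stop words, scoring) is left out.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "Basics of Elastic Stack",
    2: "Searching and Analytics",
    3: "Elastic searching in practice",
}
index = build_inverted_index(docs)
hits = search(index, "elastic searching")
```

A lookup only touches the posting sets of the query words, never the full text of every document, which is why this structure keeps searches fast on large datasets.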
4. Use Cases Where ES Overshadows DB
Full-text search is more efficient in ES due to flexible indexing.
Relevance based searching
5. Use Cases Where ES Overshadows DB
Searching when the entered spelling is wrong
Synonym based search
Phonetic based search
Use of distributed architecture
Works well with unstructured data
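Searching with misspelled input is usually explained in terms of edit distance: a term matches if it can be turned into an indexed term with a small number of single-character edits. The sketch below uses plain Levenshtein distance over a made-up vocabulary; it illustrates the idea only and is not the algorithm Elasticsearch runs internally.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, vocabulary, max_edits=2):
    """Return vocabulary words within max_edits of the (possibly misspelled) term."""
    return [word for word in vocabulary if edit_distance(term, word) <= max_edits]


# Hypothetical vocabulary of indexed terms.
vocabulary = ["elastic", "search", "cluster", "shard"]
matches = fuzzy_match("serach", vocabulary)
```

The `max_edits=2` default is an assumption chosen for the example; a smaller value makes matching stricter.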
7. How does Elasticsearch Work?
Querying Document
Via JSON Based REST API
HTTP request methods (GET, PUT, POST, DELETE)
[Diagram: a REST client (e.g. Insomnia) sends JSON requests through the REST API to Elasticsearch and receives JSON responses back]
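As a rough sketch of the request/response flow, the snippet below builds an HTTP request with Python's standard library instead of a GUI client. The helper name `build_request` is made up for this example, and `http://localhost:9200` assumes a node running locally on the default port. The request is only constructed here; the commented lines show how it could be sent.

```python
import json
import urllib.request


def build_request(method, path, body=None, host="http://localhost:9200"):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(host + path, data=data, headers=headers, method=method)


health_request = build_request("GET", "/_cluster/health")

# To send it against a running node and read the JSON response:
# with urllib.request.urlopen(health_request) as response:
#     print(json.load(response))
```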
8. All in All
Easy to get started with
A complex technology once you want to use its full potential
A widely adopted search engine backed by a large community
11. When Not To Use ES: Use Cases
Data Storage
No/Rare/Simple Analysis
Analysis on single-value text fields (usernames, zip codes), value lookups
Huge computations (extensive preprocessing and transformations)
13. Types of Scaling
Vertical Scaling                    Horizontal Scaling
Scaling up                          Scaling out
Increasing the size of a machine    Having multiple machines
Has limits                          The real power of a distributed system comes from here
15. Architecture of Elasticsearch
Nodes
Can carry out indexing and searching
Every node is aware of every other node in the cluster.
Every node can forward request to any other node in the cluster.
Every node can accept HTTP request from REST clients.
Every node has its own unique ID (a UUID).
By default, the first seven characters of that ID are used as the node name. The ID persists even after a restart.
A node is a running instance of Elasticsearch.
Categories of Dedicated Nodes:
Master Node
Data Node
Ingest Node
Coordinating Node
By default, a node is a master-eligible, data, and ingest node.
16. Architecture of Elasticsearch
Indices and Types
Parallel concepts between databases and Elasticsearch
Change in the latest ES version: 6.5
Database / Table -> Index
Table -> Type
17. Index name, type name and field name rules
Lowercase only
Cannot include \ , / , * , ? , " , < , > , | , space (the character, not the word), comma ( , ), #
Indices prior to 7.0 could contain a colon ( : ), but that has been deprecated and is not supported in 7.0+
Cannot start with - , _ , +
Cannot be . or ..
Cannot be longer than 255 bytes (multi-byte characters reach the limit faster).
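The naming rules above can be collected into a small checker. This is a sketch of the rules as listed on this slide, not the validation code Elasticsearch runs; the function name and messages are made up, and the length limit is measured in UTF-8 bytes here, which is the stricter reading of the 255 limit.

```python
# Characters the slide lists as forbidden: \ / * ? " < > | space , #
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')
FORBIDDEN_PREFIXES = ("-", "_", "+")


def index_name_problems(name):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    if any(ch in FORBIDDEN_CHARS for ch in name):
        problems.append("contains a forbidden character")
    if ":" in name:
        problems.append("contains a colon (unsupported in 7.0+)")
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("starts with -, _ or +")
    if name in (".", ".."):
        problems.append("cannot be . or ..")
    # Assumption: the limit is counted in UTF-8 bytes.
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems


# Hypothetical index names.
ok = index_name_problems("products-2019")
bad = index_name_problems("_My Index")
```

Here `ok` is an empty list, while `bad` reports the uppercase letters, the space, and the leading underscore.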
18. Sharding
The size of a single index can exceed the physical capacity of the available nodes
Example:
Each Node: 512 MB
Size of Index: 1 TB
Sharding comes to the rescue in such bottleneck cases: the index is split into smaller pieces (shards) that can be spread across nodes.
19. Sharding
Advantages:
Lets an index keep up with a growing amount of data
Better throughput when shards are distributed across multiple nodes
Queries can be executed in parallel across nodes
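How a document ends up on a particular shard can be sketched with the commonly documented routing rule, shard = hash(routing value) % number of primary shards, where the routing value defaults to the document ID. The snippet uses `zlib.crc32` purely as a stand-in hash (Elasticsearch uses a different hash function), and the function name and document IDs are made up for the example.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Stand-in for: shard = hash(routing_value) % number_of_primary_shards."""
    digest = zlib.crc32(routing_value.encode("utf-8"))  # stand-in hash, not the real one
    return digest % number_of_primary_shards


# Hypothetical document IDs spread over an index with 3 primary shards.
doc_ids = ["1", "2", "3", "4", "5", "6"]
placement = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
```

Because the result depends on the number of primary shards, changing that number after documents are indexed would send lookups to the wrong shard, which is why the primary shard count is fixed at index creation.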
20. Replication
What if a node fails?
Is there any fault tolerance mechanism in ES?
YES, via Replication
Replication means duplicating available shards
For high availability/ fault tolerance
For better throughput (provided hardware is available)
Shard that is replicated -> Primary Shard
Replicated copy of a shard -> Replica Shard
Replication Group = Primary Shard + Its Replicas
21. Defaults
Cluster Name: elasticsearch
Number of shards per index: 5
Number of replicas: 1 for each shard
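The arithmetic behind these defaults is easy to sketch. The helper names are made up for this example; the second function relies on the fact that a replica is never allocated to the same node as its primary, so every copy in a replication group needs its own node.

```python
def total_shards(primaries, replicas_per_primary):
    """Every primary shard plus all of its replica copies."""
    return primaries * (1 + replicas_per_primary)


def nodes_needed_for_all_copies(replicas_per_primary):
    """Copies of the same shard never share a node, so each needs its own node."""
    return 1 + replicas_per_primary


default_total = total_shards(5, 1)             # defaults from this slide
default_nodes = nodes_needed_for_all_copies(1)
```

With the defaults listed here, a single-node setup holds the 5 primaries but has nowhere to place the 5 replicas until a second node joins.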
27. Monitoring Cluster Health
localhost:9200/_cluster/health
Status    Reason
Green     All shards are properly assigned/allocated to nodes.
Yellow    Some or all replica shards are unassigned.
Red       A specific primary shard is unassigned/unallocated.
At shard level:
Index Health: Worst Shard Status
Cluster Health: Worst Index Status
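The "worst status wins" roll-up can be sketched in a few lines. The index names and shard statuses below are invented for the example; a real cluster reports these values through the health endpoint shown at the top of this slide.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def worst(statuses):
    """Return the most severe status; an empty collection counts as green."""
    return max(statuses, key=SEVERITY.__getitem__, default="green")


# Hypothetical shard statuses per index.
shard_statuses = {
    "products": ["green", "yellow", "green"],
    "logs": ["green", "green"],
}
index_health = {name: worst(statuses) for name, statuses in shard_statuses.items()}
cluster_health = worst(index_health.values())
```

One yellow shard is enough to turn its index yellow, and one yellow index is enough to turn the whole cluster yellow.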
29. Document Management
Simple Index Creation
PUT /<index-name>
Similar to creating a table in a database (as of ES 6.x)
Creating Index with Setting
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
30. File Directory Structure
The first time you install ES and run it, you are running an instance of ES, i.e., a node.
data
  Elasticsearch
    Nodes
      0
        _state
          global-<version>.st (contains node/cluster settings)
        node.lock (so that only one ES instance writes to the directory at a time)
31. Index Creation Leads To
Inside the node folder, a new indices folder appears.
indices
  <index-name>/<uuid> (you can find this uuid in localhost:9200/_cluster/state -> metadata key -> indices key)
    0 … 4 (one folder per shard; 5 shards by default)
    _state
      state-<version>.st (that index's metadata/settings)
32. Document Management
Creating/Indexing/Inserting a new document
PUT /<index-name>/_doc/1
{
"name": "Basics of Elastic Stack",
"course": "Searching and Analytics",
"price": 500
}
POST /<index-name>/_doc
{
"name": "Umagi",
"course": "Fiction",
"price": 2000
}
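The two requests above differ in who chooses the document ID: PUT names the ID in the path, while POST lets Elasticsearch generate one. The helper below is a sketch of that choice only; its name and the index name `courses` are made up, and it returns the request parts rather than sending anything.

```python
import json


def index_document_request(index_name, document, doc_id=None):
    """Return (method, path, body) for indexing a document."""
    body = json.dumps(document)
    if doc_id is None:
        # No ID supplied: POST, and Elasticsearch assigns an ID.
        return "POST", f"/{index_name}/_doc", body
    # Explicit ID: PUT to the document's own path.
    return "PUT", f"/{index_name}/_doc/{doc_id}", body


book = {"name": "Umagi", "course": "Fiction", "price": 2000}
with_id = index_document_request("courses", book, doc_id=1)
auto_id = index_document_request("courses", book)
```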
33. What actually happens when we create a new document?
In-Memory Indexing Buffer
Transaction Log
File System Cache
Disk
• Refresh interval (default 1 sec): how often buffered documents are written to a new, searchable segment
{"settings": {"refresh_interval": "30s"}}
• File system cache: segment creation
• Disk: segments flushed into a commit point
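The write path above can be mimicked with a toy model. It is a deliberately simplified sketch: the class and attribute names are invented, segments are plain Python lists, and real behaviour such as segment merging or translog replay after a crash is left out.

```python
class ToyShard:
    """Toy model of the write path: buffer -> refresh -> flush."""

    def __init__(self):
        self.buffer = []              # in-memory indexing buffer (not searchable yet)
        self.translog = []            # transaction log, kept for durability
        self.cached_segments = []     # segments in the file system cache (searchable)
        self.committed_segments = []  # segments written to disk at a commit point

    def index(self, doc):
        """A new document goes to the buffer and to the transaction log."""
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        """Turn buffered documents into a new searchable segment."""
        if self.buffer:
            self.cached_segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        """Commit cached segments to disk and clear the transaction log."""
        self.refresh()
        self.committed_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        self.translog.clear()

    def searchable_docs(self):
        segments = self.committed_segments + self.cached_segments
        return [doc for segment in segments for doc in segment]


shard = ToyShard()
shard.index({"name": "Umagi"})
visible_before_refresh = shard.searchable_docs()
shard.refresh()
visible_after_refresh = shard.searchable_docs()
shard.flush()
```

The document is invisible to search until the refresh happens, which is why a longer refresh interval trades search freshness for indexing throughput.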