Elasticsearch: An Overview

Basics on Elasticsearch
Ruby Shrestha

Elasticsearch: An Introduction
 Written in Java, open source, based on Apache Lucene
 https://github.com/elastic/elasticsearch
 Document storage
 Format: JSON
 Full-text search engine
 Full-text search?
 Every doc, every word
 Searches large datasets in a few seconds
 How?
 Via Inverted Index, Distributed Nature
 Analytics Platform
 Aggregations and analysis

Use Cases Where ES Overshadows DB
 Full-text search is more efficient in ES due to flexible indexing.
 Relevance based searching

Use Cases Where ES Overshadows DB
 Searching even when the entered spelling is wrong
 Synonym based search
 Phonetic based search
 Use of distributed architecture
 Works well with unstructured data

How does Elasticsearch Work?
 Data stored as document
 Format: JSON

How does Elasticsearch Work?
 Querying Document
 Via JSON Based REST API
 HTTP request methods (GET, PUT, POST, DELETE)
[Diagram: a REST client (e.g., Insomnia) sends a JSON request to the REST API, which passes it to Elasticsearch; the JSON response returns along the same path.]
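
The request/response cycle above can be sketched with Python's standard library. This is a minimal illustration, not a definitive client: the index name `quotes`, the field `statement`, and the helper `build_search_request` are assumptions made for the example. The request is only built, not sent, so no running cluster is needed.

```python
import json
import urllib.request


def build_search_request(index, field, text, host="http://localhost:9200"):
    """Build (but do not send) a JSON search request for the REST API."""
    # A "match" query runs a full-text search on one field.
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "quotes" and "statement" are hypothetical names used only for illustration.
req = build_search_request("quotes", "statement", "winter")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Against a running node, passing `req` to `urllib.request.urlopen` would return the JSON response.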

All in All
 Easy to get started with
 Complex technology if its full potential is to be used
 One of the most popular search engines on the market, backed by a large community

When Not To Use ES: Use Cases
 Data Storage
 No/Rare/Simple Analysis
 Analysis on single-value text fields (usernames, zip codes), value lookups
 Huge computations (extensive preprocessing and transformations)

Types of Scaling
Vertical Scaling                 | Horizontal Scaling
Scaling up                       | Scaling out
Increasing the size of a machine | Having multiple machines
Has limits                       | Real power of a distributed system comes from here

Architecture of Elasticsearch
 Cluster

 Nodes
 Can carry out indexing and searching
 Every node is aware of each other
 Every node can forward request to any other node in the cluster.
 Every node can accept HTTP request from REST clients.
 Every node has its own unique name (UUID).
 First seven characters used as node id. Persists even after restart.
 A node is a running instance of Elasticsearch
 Categories of Dedicated Nodes:
 Master Node
 Data Node
 Ingest Node
 Coordinating Node
 By default, a node is a master-eligible, data, and ingest node

 Indices and Types
Parallel concepts between databases and Elasticsearch:
Database -> Index
Table -> Type
Change in latest ES version (6.5): Table -> Index

Index name, type name and field name rules
 Lowercase only
 Cannot include \ , / , * , ? , " , < , > , | , space (the character, not the word), comma ( , ), #
 Indices prior to 7.0 could contain a colon ( : ), but that has been deprecated and won't be supported in 7.0+
 Cannot start with - , _ , +
 Cannot be . or ..
 Cannot be longer than 255 characters.

Sharding
 Needed when the size of a single index exceeds the physical capacity of any single available node
 Example:
 Each Node: 512 MB
 Size of Index: 1 TB
 Sharding comes to the rescue in such bottleneck cases.

Sharding
 Advantages:
 Enables adjusting with growing amount of data
 Better throughput in cases where shards are distributed to multiple nodes
 Parallel execution of queries across nodes possible
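
How documents end up spread across shards can be sketched as a hash of the document id taken modulo the number of primary shards. This is a toy sketch under stated assumptions: Elasticsearch uses its own hash function internally, and `zlib.crc32` is swapped in here only as a stand-in; `pick_shard` and the `doc-N` ids are hypothetical.

```python
import zlib
from collections import Counter


def pick_shard(doc_id, num_primary_shards):
    """Toy routing: hash the document id, then take it modulo the shard count."""
    # zlib.crc32 is a stand-in hash, not the one Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Route 1000 hypothetical documents to 5 shards and count how many land on each.
counts = Counter(pick_shard(f"doc-{i}", 5) for i in range(1000))
print(sorted(counts.items()))
```

Because the shard is derived from the shard count, changing the number of primary shards would send existing documents to different shards, which is why that number is fixed when the index is created.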

Replication
 What if a node fails?
 Is there any fault tolerance mechanism in ES?
 YES, via Replication
 Replication means duplicating available shards
 For high availability/ fault tolerance
 For better throughput (provided hardware is available)
 Shard that is replicated -> Primary Shard
 Replicated version of a shard -> Replica Shard
 Replication Group = Primary shard + Its Replicas
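
A replica only adds fault tolerance if it lives on a different node than its primary. The toy allocator below sketches that rule; it is not Elasticsearch's actual allocation algorithm, and the function name `allocate` and the node names are assumptions for illustration.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocator: place every copy of a shard on a different node.

    Returns (layout, unassigned), where layout maps each node to a list of
    (shard_id, role) pairs and unassigned lists the copies that could not be
    placed without sharing a node with another copy of the same shard.
    """
    layout = {node: [] for node in nodes}
    unassigned = []
    for shard_id in range(num_primaries):
        roles = ["primary"] + ["replica"] * num_replicas
        for offset, role in enumerate(roles):
            if offset < len(nodes):
                # Rotate the starting node per shard to spread the load.
                node = nodes[(shard_id + offset) % len(nodes)]
                layout[node].append((shard_id, role))
            else:
                unassigned.append((shard_id, role))
    return layout, unassigned


# 3 primaries with 2 replicas each on 3 nodes: all 9 copies fit.
layout, unassigned = allocate(3, 2, ["node-a", "node-b", "node-c"])
print(layout, unassigned)

# 5 primaries with 1 replica each on a single node: replicas stay unassigned.
single_layout, single_unassigned = allocate(5, 1, ["node-a"])
print(len(single_unassigned))
```

The second call mirrors a single-node setup: every primary is placed, but no replica can be, because it would sit next to its own primary.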

Defaults
 Cluster Name: elasticsearch
 Number of shards per index: 5
 Number of replicas: 1 for each shard

Characteristics of ES
 Near-real Time Searching
 Indexing
 Distributed Nature
 Multi-Tenancy

Indexing in Elasticsearch
{
"statement": "Winter is coming"
}
{
"statement": "Ours is the fury"
}
{
"statement": "The choice is yours"
}
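
An inverted index for the three statements above can be sketched as follows. This is a simplified illustration: real analyzers do more than lowercasing and splitting on whitespace, and the document ids 1 to 3 and the function name `build_inverted_index` are assumptions.

```python
from collections import defaultdict

statements = ["Winter is coming", "Ours is the fury", "The choice is yours"]


def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs, start=1):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


inverted = build_inverted_index(statements)
print(inverted["is"])      # [1, 2, 3]
print(inverted["the"])     # [2, 3]
print(inverted["winter"])  # [1]
```

A search for a term becomes a dictionary lookup instead of a scan over every document, which is where the speed of full-text search comes from.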

Let’s get started practically!

Monitoring Cluster Health
 localhost:9200/_cluster/health
At the shard level:
Status | Reason
Green  | All the shards are properly assigned/allocated to nodes.
Yellow | Some/all of the shard's replicas are unassigned.
Red    | A specific primary shard is unassigned/unallocated.
Index Health: Worst Shard Status
Cluster Health: Worst Index Status
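
The "worst status wins" roll-up from shard to index to cluster can be sketched as below. The index names and shard states are made up for illustration, and `shard_status` and `worst` are hypothetical helpers, not part of any Elasticsearch API.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def shard_status(primary_assigned, replicas_assigned, replicas_total):
    """Status of one shard: red without a primary, yellow with missing replicas."""
    if not primary_assigned:
        return "red"
    if replicas_assigned < replicas_total:
        return "yellow"
    return "green"


def worst(statuses):
    """Return the most severe status in a collection."""
    return max(statuses, key=SEVERITY.__getitem__)


# Hypothetical cluster: "books" has one shard whose replica is unassigned.
indices = {
    "books": [shard_status(True, 1, 1), shard_status(True, 0, 1)],
    "logs": [shard_status(True, 1, 1)],
}
index_health = {name: worst(shards) for name, shards in indices.items()}
cluster_health = worst(index_health.values())
print(index_health)    # {'books': 'yellow', 'logs': 'green'}
print(cluster_health)  # yellow
```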

Cluster State
 localhost:9200/_cluster/state

Document Management
 Simple Index Creation
 PUT /<index-name>
 Similar to creating a table in a database (as of ES 6.x)
 Creating Index with Setting
 { "settings" : {
"number_of_shards" : 3,
"number_of_replicas" : 2
} }
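
Creating an index with the settings above could look like this in Python. This is a minimal sketch using only the standard library; the index name `courses` and the helper `build_create_index_request` are assumptions, and the request is built but not sent.

```python
import json
import urllib.request


def build_create_index_request(index, shards, replicas, host="http://localhost:9200"):
    """Build (but do not send) a PUT /<index> request carrying shard settings."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "courses" is a hypothetical index name.
req = build_create_index_request("courses", shards=3, replicas=2)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Each primary has 2 replicas, so the index holds 3 * (1 + 2) shard copies.
total_copies = 3 * (1 + 2)
print(total_copies)  # 9
```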

File Directory Structure
 The first time you install ES and run it, you are running an instance of ES, i.e., a node.
 data
 Elasticsearch
 Nodes
 0
 _state
 global-<version>.st (contains node/cluster settings)
 node.lock (so that only one ES instance writes to
the directory at a time)

Index Creation Leads To
 Inside the node folder, a new indices folder appears.
 indices
 <index-name>/<uuid> (you can find this uuid in localhost:9200/_cluster/state -> metadata key -> indices key)
 0 … 4 (one folder per shard; 5 shards by default)
 _state
 state-<version>.st (that index's metadata/settings)

Document Management
 Creating/Indexing/Inserting a new document
 PUT /<index-name>/_doc/1
{"name": "Basics of Elastic Stack",
"course": "Searching and Analytics",
"price": 500}
 POST /<index-name>/_doc
{
"name": "Umagi",
"course": "Fiction",
"price": 2000
}
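
The two forms above differ in who picks the document id: PUT names it in the path, while POST leaves it to Elasticsearch. A minimal sketch of that choice, with the index name `courses` and the helper `build_index_doc_request` as assumptions (the requests are built, not sent):

```python
import json
import urllib.request


def build_index_doc_request(index, doc, doc_id=None, host="http://localhost:9200"):
    """PUT to /<index>/_doc/<id> when an id is given, else POST to /<index>/_doc."""
    if doc_id is None:
        method, url = "POST", f"{host}/{index}/_doc"
    else:
        method, url = "PUT", f"{host}/{index}/_doc/{doc_id}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Explicit id, as in the PUT example above.
with_id = build_index_doc_request(
    "courses",
    {"name": "Basics of Elastic Stack", "course": "Searching and Analytics", "price": 500},
    doc_id=1,
)
# No id, as in the POST example above.
auto_id = build_index_doc_request(
    "courses", {"name": "Umagi", "course": "Fiction", "price": 2000}
)
print(with_id.get_method(), with_id.full_url)
print(auto_id.get_method(), auto_id.full_url)
```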

What actually happens when we create a
new document?
In-Memory Indexing Buffer
Transaction Log
File System Cache
Disk
• Refresh Rate (Default 1 sec)
{"settings": {"refresh_interval": "30s"}}
• File System Cache: Segment Creation
• Disk: Segments flushed into commit point
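
The write path above can be modelled with a toy class. This is a deliberately simplified sketch, not how Elasticsearch or Lucene is implemented; `ToyShard` and its attributes are hypothetical names that mirror the stages listed above.

```python
class ToyShard:
    """Toy model of the write path: buffer + translog -> segments -> commit."""

    def __init__(self):
        self.buffer = []             # in-memory indexing buffer (not searchable)
        self.translog = []           # operations not yet covered by a commit
        self.segments = []           # segments in the file system cache (searchable)
        self.committed_segments = 0  # segments covered by the last commit point

    def index(self, doc):
        """A new document goes to the buffer and the transaction log."""
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        """Turn the buffered documents into a new searchable segment."""
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        """Refresh, record a commit point for all segments, clear the translog."""
        self.refresh()
        self.committed_segments = len(self.segments)
        self.translog.clear()

    def search(self, word):
        """Only documents in segments are visible to search."""
        return [doc for seg in self.segments for doc in seg if word in doc.lower()]


shard = ToyShard()
shard.index("Winter is coming")
print(shard.search("winter"))  # [] because nothing is visible before a refresh
shard.refresh()
print(shard.search("winter"))  # ['Winter is coming']
shard.flush()
print(len(shard.translog), shard.committed_segments)  # 0 1
```

The gap between `index` and `refresh` is why searching is described as near-real time: a longer refresh interval means fewer, larger segments but a longer wait before new documents become searchable.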
