Basics on Elasticsearch
Ruby Shrestha
Overview Session
Elasticsearch: An Introduction
• Written in Java, open source, based on Apache Lucene
  ◦ https://github.com/elastic/elasticsearch
• Document storage
  ◦ Format: JSON
• Full-text search engine
  ◦ Full-text search? Every word of every document is searchable
  ◦ Searches large datasets within a few seconds
  ◦ How? Via the inverted index and its distributed nature
• Analytics platform
  ◦ Aggregations and analysis
Use Cases Where ES Overshadows DB
• Full-text search is more efficient in ES due to flexible indexing
• Relevance-based searching (results are ranked by how well they match the query)
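Relevance-based searching means every hit gets a score and results are sorted by it. Since version 5.0, the default similarity in Elasticsearch is BM25, inherited from Lucene. The sketch below implements the textbook BM25 formula with the usual defaults k1 = 1.2 and b = 0.75; Lucene's actual implementation differs in details (constant factors, how document length is encoded), and the lowercase-and-split tokenizer is a simplified stand-in for the standard analyzer, so treat the scores as illustrative only.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Simplified analyzer (assumption): lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())

def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with textbook BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

statements = ["Winter is coming", "Ours is the fury", "The choice is yours"]
scores = bm25_scores("winter", statements)  # only the first statement gets a non-zero score
```

A term that appears in every document ("is") contributes far less than a rare one ("winter"), and with equal term frequency a shorter document outranks a longer one.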
Use Cases Where ES Overshadows DB
• Searching even when the entered spelling is wrong
• Synonym-based search
• Phonetic search
• Use of a distributed architecture
• Works well with unstructured data
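Typo-tolerant search rests on edit distance: a fuzzy query (for example a `match` query with `"fuzziness": "AUTO"`) matches terms that are only a few single-character edits away. The sketch below is the classic Levenshtein dynamic program, shown only to illustrate the idea; Elasticsearch's own implementation (Lucene automata, counting a transposition as one edit by default) is different, and `fuzzy_match` is a hypothetical helper. `auto_fuzziness` follows the documented `AUTO` rule: exact match for terms of 1 to 2 characters, one edit for 3 to 5, two edits for longer terms.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]

def auto_fuzziness(term: str) -> int:
    # Edit budget by term length, mirroring "fuzziness": "AUTO".
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

def fuzzy_match(query_term: str, indexed_terms: list[str]) -> list[str]:
    """Hypothetical helper: indexed terms within the AUTO edit budget of the query term."""
    budget = auto_fuzziness(query_term)
    return [t for t in indexed_terms if levenshtein(query_term.lower(), t) <= budget]

# The misspelled "elasticserch" is one edit away from "elasticsearch".
matches = fuzzy_match("elasticserch", ["elasticsearch", "kibana", "logstash"])
```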
How does Elasticsearch Work?
• Data is stored as documents
  ◦ Format: JSON
How does Elasticsearch Work?
• Querying documents
  ◦ Via a JSON-based REST API
  ◦ HTTP request methods: GET, PUT, POST, DELETE
REST client (e.g., Insomnia) -> JSON request -> REST API -> Elasticsearch
Elasticsearch -> JSON response -> REST API -> REST client (e.g., Insomnia)
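The request/response cycle above can be reproduced with any HTTP client, not just Insomnia. A minimal standard-library sketch, assuming a node listening on `http://localhost:9200` and a hypothetical index named `quotes`: `build_request` prepares a JSON request (here a `match` query against the `_search` endpoint) and `send` would post it and decode the JSON response. Only building is exercised below; `send` needs a running node.

```python
import json
import urllib.request
from typing import Optional

ES_URL = "http://localhost:9200"  # assumption: a local node on the default port

def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request to the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{ES_URL}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )

def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires a running node)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# JSON request: a full-text query against the hypothetical "quotes" index.
search = build_request("POST", "/quotes/_search", {"query": {"match": {"statement": "winter"}}})
health = build_request("GET", "/_cluster/health")
```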
All in All
• Easy to get started with
• A complex technology if its full potential is to be used
• One of the most popular search engines on the market, used by a huge community
Elastic Stack
When Not To Use ES: Use Cases
• As the primary data storage
• No, rare or only simple analysis
• Analysis on single-value text fields (usernames, zip codes) and plain value lookups
• Huge computations (extensive preprocessing and transformations)
Conceptual Details
Types of Scaling
• Vertical scaling (scaling up): increasing the size of a single machine; has limits
• Horizontal scaling (scaling out): having multiple machines; the real power of a distributed system comes from here
Architecture of Elasticsearch
• Cluster
Architecture of Elasticsearch
• Nodes
  ◦ Can carry out indexing and searching
  ◦ Every node is aware of all the other nodes
  ◦ Every node can forward requests to any other node in the cluster
  ◦ Every node can accept HTTP requests from REST clients
  ◦ Every node has its own unique, randomly generated ID (a UUID)
  ◦ By default, the first seven characters of this ID are used as the node name; the ID persists across restarts, so the default name does too
  ◦ A node is a running instance of Elasticsearch
• Categories of dedicated nodes:
  ◦ Master node
  ◦ Data node
  ◦ Ingest node
  ◦ Coordinating node
• By default, a node is master-eligible, a data node and an ingest node
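Architecture of Elasticsearch
• Indices and types
Parallel concepts between databases and Elasticsearch:
• Earlier versions: database ≈ index, table ≈ type
• Change in recent versions (6.x; 6.5 was the latest when these slides were made): table ≈ index, since an index can hold only one type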
Index name rules
• Lowercase only
• Cannot include \, /, *, ?, ", <, >, |, space (the character, not the word), comma (,), #
• Indices prior to 7.0 could contain a colon (:), but that has been deprecated and is not supported in 7.0+
• Cannot start with -, _, +
• Cannot be . or ..
• Cannot be longer than 255 bytes (multi-byte characters count toward the limit faster)
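These naming rules translate directly into a client-side check. A minimal sketch with a hypothetical helper `is_valid_index_name`; Elasticsearch itself stays the authority and rejects invalid names when the index is created. The sketch assumes 7.0+ behaviour (colon forbidden) and measures the length limit in UTF-8 bytes, as the Elasticsearch documentation specifies.

```python
# Characters that may not appear anywhere in an index name (colon: 7.0+ behaviour).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')

def is_valid_index_name(name: str) -> bool:
    """Hypothetical client-side check mirroring the index naming rules."""
    if not name or name in (".", ".."):
        return False                      # empty, "." and ".." are not allowed
    if name != name.lower():
        return False                      # lowercase only
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False                      # forbidden characters
    if name[0] in "-_+":
        return False                      # forbidden leading characters
    if len(name.encode("utf-8")) > 255:
        return False                      # limit is in bytes, not characters
    return True

print(is_valid_index_name("courses"), is_valid_index_name("My Index"))
```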
Sharding
• Problem: the size of a single index exceeds the physical capacity of the available nodes
  ◦ Example: each node has 512 MB; the index is 1 TB
• Sharding comes to the rescue in such bottleneck cases: the index is split into smaller pieces (shards) that can be spread across nodes
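Working through the slide's example (binary units: 1 TB = 2^40 bytes, 512 MB = 2^29 bytes, ignoring replicas and per-node overhead) shows how far the index exceeds any single node:

```latex
\frac{1\,\text{TB}}{512\,\text{MB}}
  = \frac{2^{40}\,\text{B}}{2^{9}\cdot 2^{20}\,\text{B}}
  = 2^{40-29} = 2^{11} = 2048
```

So the data needs at least 2048 nodes of that size, and therefore at least 2048 shards, since a single shard cannot span more than one node.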
Sharding
• Advantages:
  ◦ Lets an index adjust to a growing amount of data
  ◦ Better throughput when shards are distributed across multiple nodes
  ◦ Queries can be executed in parallel across nodes
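To distribute documents across shards, Elasticsearch picks a shard from the document's routing value (the `_id` by default) with a formula of the form `shard = hash(_routing) % number_of_primary_shards`. The sketch below uses `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses and ignores later refinements that support index splitting; it only shows why routing is deterministic and why the number of primary shards cannot simply be changed after the index is created.

```python
import zlib
from collections import Counter

def route(doc_id: str, number_of_primary_shards: int) -> int:
    """Pick a shard: hash of the routing value modulo the primary shard count.
    zlib.crc32 stands in for Elasticsearch's Murmur3 hash (assumption)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

# 1,000 hypothetical document IDs spread over 5 primary shards.
spread = Counter(route(f"doc-{i}", 5) for i in range(1000))

# With 6 primaries instead of 5, most documents would map to a different shard,
# so existing data could no longer be found by the same formula.
moved = sum(route(f"doc-{i}", 5) != route(f"doc-{i}", 6) for i in range(1000))
```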
Replication
• What if a node fails?
• Is there any fault-tolerance mechanism in ES?
  ◦ Yes, via replication
• Replication means keeping copies of shards
  ◦ For high availability / fault tolerance
  ◦ For better search throughput, since replicas can also serve read requests (provided extra hardware is available)
• The shard that is replicated -> primary shard
• A replicated copy of a shard -> replica shard
• Replication group = primary shard + its replicas
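A replica only adds fault tolerance if it lives on a different node from its primary, and Elasticsearch never allocates a replica to the same node as its primary. Below is a toy round-robin allocator that respects that rule; `allocate` and the node names are hypothetical, and the real shard allocator also weighs disk usage, shard balance and allocation-awareness settings.

```python
def allocate(num_primaries: int, num_replicas: int, nodes: list[str]) -> dict[int, list[str]]:
    """Toy allocator: map each shard number to the nodes of its replication group.
    Element 0 holds the primary, the rest hold replicas. Copies that cannot get a
    distinct node are left unassigned (as in a yellow cluster)."""
    groups: dict[int, list[str]] = {}
    for shard in range(num_primaries):
        copies = min(1 + num_replicas, len(nodes))  # at most one copy per node
        groups[shard] = [nodes[(shard + c) % len(nodes)] for c in range(copies)]
    return groups

three_nodes = allocate(3, 1, ["node-a", "node-b", "node-c"])
single_node = allocate(5, 1, ["node-a"])  # replicas cannot be placed anywhere
```

With a single node every replication group contains only its primary, which is exactly the situation the cluster health API reports as yellow.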
Defaults
• Cluster name: elasticsearch
• Number of primary shards per index: 5 (changed to 1 in Elasticsearch 7.0)
• Number of replicas: 1 for each primary shard
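These defaults translate into simple shard arithmetic: each primary carries `number_of_replicas` copies, so an index holds `primaries × (1 + replicas)` shards in total. Because no two copies of the same shard may share a node, all replicas can only be assigned once the cluster has at least `replicas + 1` data nodes, which is why a single-node cluster with one replica per shard reports yellow health. A small sketch:

```python
def total_shards(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus all of their replica copies."""
    return number_of_shards * (1 + number_of_replicas)

def min_nodes_for_green(number_of_replicas: int) -> int:
    """Each copy of a shard needs a distinct node, so replicas + 1 nodes are required."""
    return number_of_replicas + 1

pre_7_defaults = total_shards(5, 1)  # 10: 5 primaries + 5 replicas
custom = total_shards(3, 2)          # 9: 3 primaries with 2 replicas each
```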
Keeping Replicas in Sync
Complete Architecture
Characteristics of ES
• Near-real-time searching
• Indexing
• Distributed nature
• Multi-tenancy
Indexing in Elasticsearch
{ "statement": "Winter is coming" }
{ "statement": "Ours is the fury" }
{ "statement": "The choice is yours" }
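Indexing these three documents produces an inverted index: a map from each term to the documents that contain it, so a search looks terms up instead of scanning every document. A minimal sketch, with document IDs 1 to 3 assumed and a simplified analyzer (lowercase, split on non-word characters) standing in for Elasticsearch's standard analyzer; real Lucene postings also store term frequencies and positions.

```python
import re
from collections import defaultdict

docs = {
    1: "Winter is coming",
    2: "Ours is the fury",
    3: "The choice is yours",
}

def analyze(text: str) -> list[str]:
    # Simplified analyzer: lowercase, then split on anything that is not a word character.
    return re.findall(r"\w+", text.lower())

def build_inverted_index(documents: dict[int, str]) -> dict[str, set[int]]:
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """IDs of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*postings) if postings else set()

inverted = build_inverted_index(docs)
# inverted["is"]  -> {1, 2, 3}
# inverted["the"] -> {2, 3}
```

"ours" and "yours" end up as different terms, so a search for "ours" finds only the second document; closing gaps like this is what analyzers, synonyms and fuzzy queries are for.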
Let’s get started practically!
Monitoring Cluster Health
• GET localhost:9200/_cluster/health

Status   Reason
Green    All shards are properly assigned/allocated to nodes.
Yellow   All primary shards are assigned, but some or all replica shards are unassigned.
Red      At least one primary shard is unassigned/unallocated.

Health is computed from the shard level upwards:
• Index health = worst shard status in the index
• Cluster health = worst index status in the cluster
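The "worst status wins" rule can be sketched as a small roll-up. This is a simplified model with hypothetical index names: each shard's status is derived from whether its primary and replicas are assigned, an index takes its worst shard status, and the cluster takes its worst index status (an empty cluster is treated as green).

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

def shard_status(primary_assigned: bool, replicas_assigned: int, replicas_expected: int) -> str:
    if not primary_assigned:
        return "red"       # data of this shard is not available at all
    if replicas_assigned < replicas_expected:
        return "yellow"    # data available, but redundancy is missing
    return "green"

def worst(statuses) -> str:
    return max(statuses, key=SEVERITY.__getitem__)

def cluster_health(indices: dict[str, list[str]]) -> str:
    """indices maps an index name to the (non-empty) list of its shard statuses."""
    index_health = {name: worst(shards) for name, shards in indices.items()}
    return worst(index_health.values()) if index_health else "green"

status = cluster_health({
    "courses": ["green", "green"],
    "quotes": [shard_status(True, 0, 1), "green"],  # one replica unassigned
})
```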
Cluster State
• GET localhost:9200/_cluster/state
Document Management
• Simple index creation
  ◦ PUT /<index-name>
  ◦ Similar to creating a table in a database (from ES 6.x onwards)
• Creating an index with settings
  ◦ PUT /<index-name>
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}
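Sent over HTTP, index creation with settings is a PUT whose JSON body carries the settings. A standard-library sketch that only builds the request; the index name `courses` and the local URL are assumptions, and passing the result to `urllib.request.urlopen` would send it to a running node.

```python
import json
import urllib.request

def create_index_request(name: str, shards: int, replicas: int) -> urllib.request.Request:
    """Build a PUT /<index-name> request carrying shard and replica settings."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"http://localhost:9200/{name}",  # assumption: local node on the default port
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = create_index_request("courses", shards=3, replicas=2)
```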
File Directory Structure
• The first time you install and run ES, you are running an instance of ES, i.e., a node.
data/
  Elasticsearch/
    nodes/
      0/
        _state/
          global-<version>.st (contains node/cluster settings)
        node.lock (ensures only one ES instance writes to the directory at a time)
Index Creation Leads To
• Inside the node folder, a new indices folder appears:
indices/
  <index-uuid>/ (named after the index's UUID, which you can find at localhost:9200/_cluster/state -> metadata -> indices)
    0/ … 4/ (one folder per shard; 5 shards by default before 7.0)
    _state/
      state-<version>.st (the index's metadata/settings)
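Given an index's UUID, the shard folders can be derived mechanically. A `pathlib` sketch under stated assumptions: the index folder is named after the UUID alone (as in recent versions), the node directory `data/nodes/0` and the UUID value are hypothetical placeholders, and the cluster-state shape used by `uuid_from_cluster_state` (metadata -> indices -> name -> settings -> index -> uuid) is assumed rather than taken from the slides. `GET /_cat/indices?v` also lists each index's `uuid`.

```python
from pathlib import PurePosixPath

def uuid_from_cluster_state(state: dict, index_name: str) -> str:
    # Assumed shape of a parsed GET /_cluster/state response.
    return state["metadata"]["indices"][index_name]["settings"]["index"]["uuid"]

def index_paths(node_dir: str, index_uuid: str, number_of_shards: int) -> list[PurePosixPath]:
    """Shard folders 0 .. n-1 plus the index-level _state folder."""
    index_dir = PurePosixPath(node_dir) / "indices" / index_uuid
    shard_dirs = [index_dir / str(n) for n in range(number_of_shards)]
    return shard_dirs + [index_dir / "_state"]

# Hypothetical, heavily abbreviated cluster state for an index named "courses".
sample_state = {"metadata": {"indices": {"courses": {"settings": {"index": {"uuid": "hypothetical-uuid"}}}}}}
paths = index_paths("data/nodes/0", uuid_from_cluster_state(sample_state, "courses"), 5)
```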
Document Management
• Creating/indexing/inserting a new document
  ◦ With an explicit ID: PUT /<index-name>/_doc/1
{
  "name": "Basics of Elastic Stack",
  "course": "Searching and Analytics",
  "price": 500
}
  ◦ With an auto-generated ID: POST /<index-name>/_doc
{
  "name": "Umagi",
  "course": "Fiction",
  "price": 2000
}
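The only difference between the two calls is who chooses the ID: PUT names it in the URL, while POST lets Elasticsearch generate one and return it in the response. A sketch that builds either request; `index_document`, the local URL and the index name `courses` are assumptions for illustration.

```python
import json
import urllib.request
from typing import Optional

BASE = "http://localhost:9200"  # assumption: local node on the default port

def index_document(index: str, doc: dict, doc_id: Optional[str] = None) -> urllib.request.Request:
    """PUT /<index>/_doc/<id> with an explicit ID, or POST /<index>/_doc for a generated one."""
    if doc_id is not None:
        url, method = f"{BASE}/{index}/_doc/{doc_id}", "PUT"
    else:
        url, method = f"{BASE}/{index}/_doc", "POST"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )

put = index_document("courses", {"name": "Basics of Elastic Stack",
                                 "course": "Searching and Analytics", "price": 500}, "1")
post = index_document("courses", {"name": "Umagi", "course": "Fiction", "price": 2000})
```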
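What actually happens when we create a new document?
In-memory indexing buffer + transaction log -> file system cache -> disk
• The document goes into the in-memory indexing buffer and is recorded in the transaction log
• Refresh interval (default: 1 second), configurable per index, e.g.: {"settings": {"refresh_interval": "30s"}}
• File system cache: on each refresh, the buffer is written into a new segment, which makes its documents searchable
• Disk: segments are flushed to disk into a commit point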
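The write path can be modelled as a toy class to show why Elasticsearch is near-real-time: a document is recorded in the transaction log immediately but becomes searchable only after the next refresh turns the buffer into a segment. This is a conceptual sketch, not how Lucene stores data; `ToyShard` and its methods are hypothetical, and in this model a flush first refreshes, then commits all segments and clears the transaction log.

```python
class ToyShard:
    def __init__(self) -> None:
        self.buffer: list[dict] = []                     # in-memory indexing buffer (not searchable)
        self.translog: list[dict] = []                   # transaction log, used for recovery
        self.cached_segments: list[list[dict]] = []      # refreshed segments in the FS cache
        self.committed_segments: list[list[dict]] = []   # segments in the last commit point on disk

    def index(self, doc: dict) -> None:
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self) -> None:
        """Turn the buffer into a new searchable segment."""
        if self.buffer:
            self.cached_segments.append(self.buffer)
            self.buffer = []

    def flush(self) -> None:
        """Refresh, persist every segment in a commit point, then clear the translog."""
        self.refresh()
        self.committed_segments.extend(self.cached_segments)
        self.cached_segments = []
        self.translog = []

    def search(self, field: str, value) -> list[dict]:
        segments = self.committed_segments + self.cached_segments
        return [d for seg in segments for d in seg if d.get(field) == value]

shard = ToyShard()
shard.index({"name": "Umagi", "course": "Fiction", "price": 2000})
before_refresh = shard.search("name", "Umagi")  # [] until the next refresh
```

A longer `refresh_interval` such as "30s" means fewer, larger segments and cheaper indexing, at the cost of documents taking longer to show up in search results.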
