Elasticsearch: Distributed Multitenant NoSQL Datastore and Search Engine
Disclaimer
NoSQL is not a silver bullet
SQL is not a silver bullet
Data Storage Types
SQL
• Relational DB
Principles:
ACID - Atomicity, Consistency, Isolation, Durability
NoSQL (Not Only SQL)
• Key Value Store
• Document Store
• Column Family (Column Store)
Principles:
CAP theorem - Consistency, Availability, Partition tolerance
BASE - Basically Available, Soft state, Eventual consistency
Overview
• Based on Lucene
• Developed in Java
• Schema free JSON
• Index and Search
• Apache License (Open Source, Free)
• RESTful API
• Supports Faceted search
• Supports Idempotency
• Distributed and built for the cloud
• First version released in February 2010
• Current supported versions 2.x and 5.x
• Hosted options: Amazon Elasticsearch Service (AWS), Elastic Cloud
Overview
• Query with scores
• Filter with params
• Bool query to combine filters
• Usually not the primary data storage
• Does not support ACID transactions out of the box
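As a rough illustration of how a scored query and non-scoring filters can be combined in one bool query, the sketch below builds a request body in Python. The field names (title, user, postDate) are borrowed from the blog sample later in this deck, and the helper name build_bool_query is hypothetical. It only constructs the JSON body; sending it to a cluster is left out.

```python
import json


def build_bool_query(text, user, date_from, date_to):
    """Hypothetical helper: one scored match clause plus two filter clauses."""
    return {
        "query": {
            "bool": {
                # "must" clauses contribute to the relevance score
                "must": [{"match": {"title": text}}],
                # "filter" clauses only include or exclude documents
                "filter": [
                    {"term": {"user": user}},
                    {"range": {"postDate": {"gte": date_from, "lte": date_to}}},
                ],
            }
        }
    }


body = build_bool_query("search", "dilbert", "2011-12-10", "2011-12-15")
print(json.dumps(body, indent=2))
```

The printed JSON could be used as the body of a _search request such as the range search shown in the samples.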
Available Clients
• JavaScript
• PHP
• Perl
• Ruby
• Curl
• Java
• C#
• Python
Users
• Wikimedia
• Adobe Systems
• Facebook
• Mozilla
• Quora
• Foursquare
• SoundCloud
• GitHub
• CERN
• Stack Exchange
• Netflix
• Amadeus IT Group
Concepts
Field
• Smallest unit of data
• Has a type: boolean, string, array, integer and so on
• A collection of fields is a document
• Field names cannot start with special characters and cannot contain dots
Concepts
Document
• JSON object, the base unit of storage
• Can be compared to a row in an RDBMS table
• No limit on the number of documents you can store in an index
• Contains key-value fields
• Contains reserved fields, e.g. _index, _type, _id
Concepts
Type
• Represents a unique class of documents.
• Consists of a name and a mapping. Each document records its type in the _type field, which can then be used to filter queries to a specific type.
• An index can have any number of types, and documents belonging to these types are stored in the same index.
Concepts
Index
• Largest unit of data
• A logical partition of documents, comparable to a database in an RDBMS
• You can define as many indices in Elasticsearch as you want
• Contains types, mappings, documents and fields
Concepts
Mapping
• Like a schema in an RDBMS
• Defines each field's data type (such as string or integer)
• Defines how the fields should be indexed and stored
• Can be defined explicitly
• Can be generated automatically when a document is indexed
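To make the idea of an automatically generated mapping more concrete, here is a deliberately simplified Python sketch that guesses a type for each field of a JSON document. It is not Elasticsearch's actual algorithm: the function names are hypothetical, the date check only recognises the yyyy-MM-dd form used in this deck's samples, and the type labels ("string", "long", "double") are assumptions that vary between versions.

```python
from datetime import datetime


def guess_field_type(value):
    """Hypothetical, simplified type guess for a single JSON value."""
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list) and value:
        # an array takes the type of its elements
        return guess_field_type(value[0])
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "string"
    raise TypeError(f"cannot guess a type for {value!r}")


def infer_mapping(doc):
    """Build a mapping-like dict: field name -> {"type": guessed type}."""
    return {field: {"type": guess_field_type(value)} for field, value in doc.items()}


doc = {"user": "dilbert", "postDate": "2011-12-15", "likes": 3, "draft": False}
mapping = infer_mapping(doc)
print(mapping)
```

An explicit mapping would replace these guesses with types chosen by the index designer.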
Concepts
Shards
• Shards are the building blocks of Elasticsearch and are what make it scalable
• An index can be split horizontally into pieces called shards. This lets you distribute operations across shards and nodes to improve performance.
• When you create an index, you define how many shards it has. Each shard is an independent Lucene index that can be hosted on any node in the cluster.
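A minimal sketch of how a document could be assigned to a shard, assuming the commonly documented rule shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document _id. The hash used here (zlib.crc32) is only a stand-in, since Elasticsearch applies its own hash function, and pick_shard is a hypothetical name.

```python
import zlib
from collections import Counter


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document _id) to a shard number."""
    # crc32 is a stand-in hash chosen because it is deterministic and built in
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


doc_ids = [f"post-{n}" for n in range(1000)]

# How 1000 ids spread over 5 primary shards
spread = Counter(pick_shard(doc_id, 5) for doc_id in doc_ids)
print(sorted(spread.items()))

# The same id can map to a different shard once the shard count changes
moved = sum(pick_shard(d, 5) != pick_shard(d, 6) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} ids would move going from 5 to 6 shards")
```

The second print hints at why the number of primary shards is chosen at index creation: changing it would send lookups for existing documents to the wrong shard.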
Concepts
Replica
• Replicas are a fail-safe mechanism: they are copies of your index's shards
• They act as a backup when a node crashes
• They serve read requests, so adding replicas increases search performance
• To ensure high availability, a replica is never placed on the same node as its original (primary) shard
• As with shards, the number of replicas can be defined per index when the index is created
• Unlike the number of shards, the number of replicas can be changed at any time after the index is created
Concepts
Node
• A node is a single Elasticsearch instance. It is the heart of any ELK setup and has the crucial task of storing and indexing data.
• By default, each node is automatically assigned a unique identifier, or name, that is used for management purposes and becomes even more important in a multi-node, or clustered, environment.
Concepts
Cluster
• An Elasticsearch cluster consists of one or more Elasticsearch nodes. As with nodes, each cluster has a unique identifier that must be used by any node attempting to join the cluster.
• One node in the cluster is the “master” node, which is in charge of cluster-wide management and configuration actions (such as adding and removing nodes). This node is chosen automatically by the cluster, and a new master is elected if it fails.
• As a cluster grows, it reorganizes itself to spread the data.
Scaling
• Vertical - more hardware resources for one server
• Horizontal - more servers
Horizontal scaling
An Elasticsearch cluster is not limited to a single machine: you can keep adding nodes to handle higher traffic and larger data sets.
Horizontal scaling
Each index is made up of shards spread across one or more nodes. In this example, the Elasticsearch cluster has two nodes, two indices (properties and deals) and five shards on each node.
We have here three primary shards and three replica shards. Primary shards are where a write happens first. A primary shard can have zero or more replica shards that simply duplicate its data. Primary shards are not confined to a single node, which is a testament to the distributed nature of the system. If one node fails, replica shards on a functioning node can be promoted to primary automatically. Data must be written to a primary shard before it is duplicated to replica shards. Data can be read from both primary and replica shards.
Index status
“Green” means that all primary shards are available and all of their replicas are allocated.
“Yellow” means that all primary shards are available, but at least one replica is not allocated.
“Red” means that not all primary shards are available.
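The three colours can be expressed as a small decision function. This is a sketch over a hypothetical input shape (a list of shard copies with "primary" and "assigned" flags), not a call to the cluster health API, and it assumes the usual definition in which green requires every configured replica to be allocated.

```python
def index_health(shards):
    """Return "green", "yellow" or "red" for a list of shard copies.

    Each shard copy is a dict with two boolean keys (hypothetical shape):
    "primary"  - True for a primary shard, False for a replica
    "assigned" - True if the copy is allocated to a live node
    """
    primaries_ok = all(s["assigned"] for s in shards if s["primary"])
    replicas_ok = all(s["assigned"] for s in shards if not s["primary"])
    if not primaries_ok:
        return "red"  # some data is unavailable
    return "green" if replicas_ok else "yellow"


healthy = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": True},
]
degraded = [
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},
]
broken = [
    {"primary": True, "assigned": False},
    {"primary": False, "assigned": True},
]

print(index_health(healthy), index_health(degraded), index_health(broken))
```

A single-node setup with one replica configured falls into the second case, because the replica cannot be placed on the same node as its primary.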
Conclusion of theoretical part
• Nodes make up a cluster and contain shards;
• Shards contain documents that you’re searching through;
• Elasticsearch routes requests through nodes;
• The nodes then merge results from shards (Lucene
indices) together to create a search result.
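The last two points describe a scatter-gather pattern: each shard returns its own best hits and the coordinating node merges them. The sketch below shows only the merge step, with made-up scores and document ids; merge_shard_hits is a hypothetical name, not an Elasticsearch function.

```python
import heapq


def merge_shard_hits(shard_results, size):
    """Merge per-shard (score, doc_id) hits into the global top `size` hits."""
    all_hits = (hit for hits in shard_results for hit in hits)
    # nlargest returns the hits sorted by score, highest first
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])


# Made-up results from three shards
shard_results = [
    [(2.1, "post-1"), (0.4, "post-7")],
    [(1.7, "post-2")],
    [(3.0, "post-9"), (0.9, "post-3")],
]

top = merge_shard_hits(shard_results, size=3)
print(top)
```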
Amazon Elasticsearch Service
• Multiple configurations of CPU, memory, and storage capacity, known as instance types
• Storage volumes for your data using Amazon EBS volumes
• Multiple geographical locations for your resources, known as regions and Availability Zones
• Cluster node allocation across two Availability Zones in the same region, known as zone awareness
• Security with AWS Identity and Access Management (IAM) access control
• Dedicated master nodes to improve cluster stability
• Domain snapshots to back up and restore Amazon ES domains and replicate domains across Availability Zones
• Data visualization using the Kibana tool
• Integration with Amazon CloudWatch for monitoring Amazon ES domain metrics
• Integration with AWS CloudTrail for auditing configuration API calls to Amazon ES domains
• Integration with Amazon S3, Amazon Kinesis, and Amazon DynamoDB for loading streaming data into Amazon ES
ELK: Elasticsearch, Logstash, Kibana
Typical requests
Show domain info:
GET /

Show all domain indices:
GET /_cat/indices?v

Show stats:
GET /_stats

Create index with name "test_data":
PUT /test_data

Search example:
GET /test_data/_search?source={ "query" : { "match" : { "name" : "T1xq" } } }
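When the search body is passed in the source query parameter, the JSON has to be percent-encoded before it goes into a URL. A minimal Python sketch of that step follows; the base URL http://localhost:9200 is taken from the samples in this deck, and search_url is a hypothetical helper that only builds the URL without sending a request.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def search_url(base, index, query_body):
    """Hypothetical helper: put a JSON query into the `source` URL parameter."""
    params = urlencode({"source": json.dumps(query_body)}, quote_via=quote)
    return f"{base}/{index}/_search?{params}"


query = {"query": {"match": {"name": "T1xq"}}}
url = search_url("http://localhost:9200", "test_data", query)
print(url)

# Decoding the parameter again gives back the original query
decoded = json.loads(parse_qs(urlsplit(url).query)["source"][0])
print(decoded == query)
```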
Sample
curl -XPUT 'http://localhost:9200/blog/user/dilbert' -d '{ "name" : "Dilbert Brown" }'
curl -XPUT 'http://localhost:9200/blog/post/1' -d '
{
"user": "dilbert",
"postDate": "2011-12-15",
"body": "Search is hard. Search should be easy." ,
"title": "On search"
}'
curl -XPUT 'http://localhost:9200/blog/post/2' -d '
{
"user": "dilbert",
"postDate": "2011-12-12",
"body": "Distribution is hard. Distribution should be easy." ,
"title": "On distributed search"
}'
Sample
Find all blog posts by Dilbert:
curl 'http://localhost:9200/blog/post/_search?q=user:dilbert&pretty=true'

All posts whose title does not contain the term search:
curl 'http://localhost:9200/blog/post/_search?q=-title:search&pretty=true'

Retrieve the title of all posts whose title contains search and not distributed:
curl 'http://localhost:9200/blog/post/_search?q=%2Btitle:search%20-title:distributed&pretty=true&fields=title'

A range search on postDate:
curl -XGET 'http://localhost:9200/blog/_search?pretty=true' -d '
{
  "query" : {
    "range" : {
      "postDate" : { "gte" : "2011-12-10", "lte" : "2011-12-12" }
    }
  }
}'

Bulk operations
curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
'
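The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and a final newline at the end. A small Python sketch that assembles the same body as the curl example is shown here; bulk_body is a hypothetical helper and nothing is sent to a server.

```python
import json


def bulk_body(actions):
    """Build a bulk request body from (action, metadata, source) tuples.

    `source` is None for actions without a second line, such as "delete".
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:
            lines.append(json.dumps(source))
    # every line, including the last one, is terminated by a newline
    return "\n".join(lines) + "\n"


actions = [
    ("index", {"_index": "test", "_type": "type1", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_type": "type1", "_id": "2"}, None),
    ("create", {"_index": "test", "_type": "type1", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_type": "type1", "_id": "1"}, {"doc": {"field2": "value2"}}),
]

payload = bulk_body(actions)
print(payload, end="")
```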
Idempotent index
Create or update:
curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
'

Create only if the document does not exist:
curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
'
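The difference between the two actions can be modelled with a tiny in-memory store. This is only a sketch of the semantics, not Elasticsearch code: TinyIndex and ConflictError are hypothetical names, and the version counter is a simplification.

```python
class ConflictError(Exception):
    """Raised when "create" targets an _id that already exists."""


class TinyIndex:
    """In-memory stand-in for one index: _id -> (version, source)."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, source):
        # "index" creates the document or overwrites it; safe to repeat
        version = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (version, source)
        return "created" if version == 1 else "updated"

    def create(self, doc_id, source):
        # "create" succeeds only if the _id is not taken yet
        if doc_id in self.docs:
            raise ConflictError(doc_id)
        self.docs[doc_id] = (1, source)
        return "created"


store = TinyIndex()
first = store.index("1", {"field1": "value1"})
second = store.index("1", {"field1": "value1"})  # same request sent again
print(first, second, store.docs["1"][1])

try:
    store.create("1", {"field1": "value1"})
    outcome = "created"
except ConflictError:
    outcome = "conflict"
print(outcome)
```

Repeating the index request leaves the stored content unchanged, which is what makes retries safe, while a repeated create is rejected.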
Why Elasticsearch?
• Easy to Scale
• Everything is One JSON Call Away
• Unleashed Power of Lucene Under the Hood
• Excellent Query DSL
• Multi-Tenancy
• Support for Advanced Search Features
• Configurable and Extensible
• Percolation
• Custom Analyzers and On-the-Fly Analyzer Selection
• Rich Ecosystem
• Active Community
• Proactive Company
Links
• https://dou.ua/lenta/articles/nosql-vs-sql/
• https://dou.ua/lenta/articles/not-only-sql/
• https://dou.ua/lenta/columns/dont-use-rdbms/
• http://logz.io/blog/10-elasticsearch-concepts/
• https://buildingvts.com/elasticsearch-architectural-overview-a35d3910e515#.78kiybh6b
• https://habrahabr.ru/company/oleg-bunin/blog/319052/
• https://www.amazon.com/Elasticsearch-Definitive-Guide-Clinton-Gormley/dp/1449358543
