OpenSearch
-Abhi Jain
Agenda
● OpenSearch
○ What is it?
○ Benefits/ Uses
○ How to use it
○ Features
● Migrate from Elastic to OpenSearch
● Tools & Plugins
About Me
● Lead Dev
● Located in Florida
● Trainer
● Presenter
● .NET Developer
● Youtuber: Coach4Dev
● Husband/ Father
Amazon Elasticsearch
● Launched in 2015
● Gained popularity for log analytics usage
● Used open-source Elasticsearch under Apache License v2
● Jan 2021
○ Elastic NV changed its licensing strategy
○ Releases after Elasticsearch 7.10.2 & Kibana 7.10.2
■ No longer released under Apache License v2
■ Released under the Elastic License
OpenSearch
● Sep 2021:
○ Amazon Elasticsearch Service renamed to Amazon OpenSearch Service
● Open-source fork of Elasticsearch 7.10.2 and Kibana 7.10.2
● Highly scalable
● Fast access & response to large volumes of data
● Powered by Apache Lucene Search library
Apache Lucene
● Apache Lucene project develops open-source search software
○ Releases a core search library named Lucene core
● Lucene Core
○ Java Library providing powerful indexing and search features
Apache Solr
● Open source search platform
● Built on Apache Lucene
Solr vs ElasticSearch
● Similar performance mostly.
● ES has better support for scalability
○ due to horizontal scaling
■ Better cloud support too
● ES can support multiple doc types in a single index better
○ More difficult to do this in Solr
● ES supports native DSL (Domain Specific Language)
○ Need to program queries in Solr
● https://mindmajix.com/elasticsearch-vs-solr
Why OpenSearch
● Huge amount of machine generated data these days
○ Growing exponentially
● Getting insights is important
● Interactive log analytics
● Real-time application monitoring
● Website Search, etc.
OpenSearch Features
● Easy to set-up and configure
● In-place upgrades
● Enables data monitoring & setting alerts based on thresholds
● Supports authentication, encryption & compliance requirements
OpenSearch vs ElasticSearch
● OpenSearch was forked from Elasticsearch
○ Now they are separate from each other
● Each is adding features separately
● OpenSearch
○ Inbuilt support from AWS
OpenSearch features not in ES (free version)
● Centralized user accounts / access control
● Cross-cluster replication
● IP filtering
● Configurable retention period
● Anomaly detection
● Tableau connector
● JDBC driver
● ODBC driver
● Machine learning features such as regression and classification
● Link
ElasticSearch Features
● Based on subscription levels
● https://www.elastic.co/subscriptions
OpenSearch & ElasticSearch Version Support
● Currently supports the following OpenSearch versions:
○ 1.3, 1.2, 1.1, 1.0
● And supports the following ElasticSearch versions:
○ 7.10, 7.9, 7.8, 7.7, 7.4, 7.1
○ 6.8, 6.7, 6.5, 6.4, 6.3, 6.2, 6.0
○ 5.6, 5.5, 5.3, 5.1
○ 2.3
○ 1.5
What is Kibana
● Free & open front end application
● Charting tool for Elastic Stack
● Sits on top of Elastic Stack
● Sample Dashboard
OpenSearch Dashboards
● Default visualization tool for data in OpenSearch
● Filter data with queries
● Comes with OpenSearch Service
Terminologies
OpenSearch Cluster
● Synonymous with domain
● Domains are clusters with
○ settings,
○ instance types,
○ instance counts,
○ and storage resources that you specify.
● Group of nodes
○ With same cluster.name attribute
OpenSearch Node
● Member of a cluster
● A distinct host
● With IP address
Getting Started
● Create a domain
● Size the domain appropriately for your workload
● Control access to your domain using a domain access policy or fine-grained access control
● Index data manually or from other AWS services
● Use OpenSearch Dashboards to search your data and create visualizations
Custom Endpoint
● Use when you want an easier-to-read or custom domain name
● Can use HTTPS
○ Upload an SSL certificate
Run OpenSearch locally
● Install docker
● wsl -d docker-desktop
● sysctl -w vm.max_map_count=262144
● Ctrl+C
● docker-compose up
● Visit http://localhost:5601/
● Use admin/admin to login and explore
● Link
Upload Data
● One at a time
● Bulk
Upload Data One At a time
● curl -XPUT -u "master:XXXX" "https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_doc/1" -H "Content-Type: application/json" -d '{"director": "Burton, Tim", "genre": ["Comedy","Sci-Fi"], "year": 1996, "actor": ["Jack Nicholson","Pierce Brosnan","Sarah Jessica Parker"], "title": "Mars Attacks!"}'
Upload Data Bulk
● curl -XPOST -u "master:XXXXX" "https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/_bulk" --data-binary @bulk_movies.txt -H "Content-Type: application/json"
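The contents of bulk_movies.txt are not shown on the slide. As a minimal sketch, assuming the usual newline-delimited _bulk format (an action line followed by a document line), a file like it could be generated as follows. The helper name build_bulk_body and the sample documents are hypothetical.

```python
import json

def build_bulk_body(index, docs):
    """Build a newline-delimited _bulk payload: one action line, then one source line per doc."""
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical sample documents, for illustration only.
sample_docs = {
    "1": {"title": "Mars Attacks!", "year": 1996},
    "2": {"title": "Spirited Away", "year": 2001},
}
bulk_body = build_bulk_body("movies", sample_docs)
print(bulk_body, end="")
```

Writing bulk_body to a file gives something that can be passed to curl with --data-binary, which preserves the newlines the format depends on.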
How to Query?
Searching Data
● URI Searches
● Command Line
● OpenSearch Dashboards
Searching Data - URI
● GET Request
● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_search?q=rebel&pretty=true
● Searches all fields of the movies index
URI Search Specific fields
● Search movies index and title property
● GET https://search-my-domain.us-west-1.es.amazonaws.com/movies/_search?q=title:house
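URI searches like the ones above can also be assembled in code. The sketch below only builds the URL and sends nothing; the function name uri_search_url and the example.com endpoint are placeholders, not part of the original deck.

```python
from urllib.parse import urlencode

def uri_search_url(endpoint, index, query, field=None, pretty=True):
    """Return a URI-search URL; `field` restricts the query to one field (q=field:term)."""
    q = f"{field}:{query}" if field else query
    params = {"q": q}
    if pretty:
        params["pretty"] = "true"
    # urlencode percent-encodes reserved characters such as ':' in the query value.
    return f"{endpoint}/{index}/_search?{urlencode(params)}"

# Placeholder endpoint, not a real domain.
url = uri_search_url("https://search-my-domain.example.com", "movies", "house", field="title")
print(url)
```

Percent-encoding the query value keeps terms containing spaces or punctuation from breaking the URL.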
Get Search Results - Command Line
● curl -XGET -u "master:XXXXX" "https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_search?q=rebel&pretty=true"
Query DSL
● For more complex queries
○ OpenSearch Domain Specific Language (DSL)
● POST request with query body
Get Search Results - Dev Tools
● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/_dashboards/app/dev_tools#/console
GET _search
{
  "query": {
    "match_all": {}
  }
}
Search on only specific fields
GET _search
{
  "size": 20,
  "query": {
    "multi_match": {
      "query": "U.S.",
      "fields": ["title", "actor", "director"]
    }
  }
}
Search - Boosting fields
GET _search
{
  "size": 20,
  "query": {
    "multi_match": {
      "query": "john",
      "fields": ["title^4", "actor", "director^4"]
    }
  }
}
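The field^N suffix multiplies the relevance score contributed by matches in that field. As a small sketch, the boosted field list can be generated from a weight table; boosted_fields and boosted_query are hypothetical helper names.

```python
def boosted_fields(weights):
    """Render {'title': 4, 'actor': 1} as ['title^4', 'actor']; a boost of 1 is the default and is omitted."""
    return [name if boost == 1 else f"{name}^{boost}" for name, boost in weights.items()]

def boosted_query(term, weights, size=20):
    """Build a multi_match search body with per-field boosts."""
    return {
        "size": size,
        "query": {"multi_match": {"query": term, "fields": boosted_fields(weights)}},
    }

body = boosted_query("john", {"title": 4, "actor": 1, "director": 4})
print(body["query"]["multi_match"]["fields"])
```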
Search - Pagination
GET _search
{
  "from": 0,
  "size": 1,
  "query": {
    "multi_match": {
      "query": "Drama",
      "fields": ["genre"]
    }
  }
}
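Here "from" is the zero-based offset of the first hit and "size" is the number of hits returned. A minimal sketch of turning a page number into that pair, with paged_search_body as a hypothetical helper:

```python
def paged_search_body(term, fields, page, page_size):
    """Translate a 1-based page number into the from/size pair used by _search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
        "query": {"multi_match": {"query": term, "fields": fields}},
    }

body = paged_search_body("Drama", ["genre"], page=3, page_size=10)
print(body["from"], body["size"])
```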
Query - With Highlights in Response
GET _search
{
  "size": 20,
  "query": {
    "multi_match": {
      "query": "Manchurian",
      "fields": ["title^4", "actor", "director"]
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    },
    "pre_tags": "<strong>",
    "post_tags": "</strong>",
    "fragment_size": 200,
    "boundary_chars": ".,!? "
  }
}
Query - Count
GET movies/_count
{
  "query": {
    "multi_match": {
      "query": "Manchurian",
      "fields": ["title^4", "actor", "director"]
    }
  }
}
Dashboard Query Language
● Use DQL in Dashboards
○ Search for data and visualizations
● Terms Query
○ Search for any text
■ E.g. www.example.com
○ Access object’s nested field
■ E.g. coordinates.lat:43.7102
○ Leading and trailing wildcards
■ host.keyword:*.example.com/*
● Operators
○ AND
○ OR
Dashboard Query Language
● Date and range Queries
○ bytes >= 15 and memory < 15
○ @timestamp > "2020-12-14T09:35:33"
● Nested field query
○ superheroes: {hero-name: Superman}
Dashboard Plugins
Query Workbench
● SQL
○ Run SQL
○ Treat indices as tables
● PPL
○ Piped Processing Language
○ Commands delimited by pipes
Reporting
● Multiple file formats
● On demand/ Scheduled
● Generate from
○ Dashboard
○ Visualization
○ Discover
Anomaly Detection
● Detect unusual behavior in time series data
● Anomaly Grade
● Confidence Score
Notifications
● Supported
○ Amazon Chime
○ SNS
○ SES
○ SMTP
○ Slack
○ Custom Webhooks
Observability plugin
● Visualize/Query time series data
● Event analytics
● Compare the data the way you like
Index Management
● Create ISM policy
● To manage your indexes
Security plugin
● Set up RBAC
Migrate from ElasticSearch to OpenSearch
Three major approaches
● Snapshot
● Rolling Upgrade
● Cluster Restart
Snapshot Method
● Generate snapshot in ElasticSearch
● Save in shared directory
● Restore in OpenSearch
● Snapshot
○ Backup of entire cluster state
○ Useful for recovery from failure and migration
● Link
Snapshot Method
● Check Index compatibility
○ E.g.: Can't restore a 7.6.0 snapshot into a 7.5.0 cluster
● Link
● Fastest
● Easiest
● Most efficient
Rolling Upgrade
● Official way to migrate cluster
● Without interruption
● Rolling upgrades are supported:
○ Between minor versions
○ From 5.6 to 6.8
○ From 6.8 to 7.14.1
○ From any version since 7.14.0 to 7.14.1
Rolling Upgrade
● Shut down one node at a time
○ Minimal disruption
Cluster Restart Upgrades
● Shut down all nodes
● Perform the upgrade
● Restart the cluster
Mapping
OpenSearch Mapping
● Dynamic
○ When you index a document
○ OpenSearch adds fields automatically
○ It deduces their types by itself
● Explicit
○ If you know your data types
○ Preferred way of doing things
OpenSearch Mapping
● If you do not define a mapping ahead of time, OpenSearch dynamically creates a mapping for you.
● If you do decide to define your own mapping, you can do so at index creation.
● ONE mapping is defined per index. Once the index has been created, we can only add new fields to a mapping. We CANNOT change the mapping of an existing field.
● If you must change the type of an existing field, you must create a new index with the desired mapping, then reindex all documents into the new index.
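As a sketch of the create-then-reindex flow described above, the request bodies might be built like this. The index names, the field list, and the helper reindex_body are assumptions chosen for illustration; only the overall shape (a "mappings" section at index creation, then a _reindex call) comes from the text.

```python
import json

# Hypothetical explicit mapping for a new movies index: text for full-text
# search, keyword for exact matches, sorting, and aggregations.
create_index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "genre": {"type": "keyword"},
            "year": {"type": "integer"},
        }
    }
}

def reindex_body(source_index, dest_index):
    """Body for POST _reindex, copying all documents from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}

# Print the two requests in console form; nothing is sent to a cluster.
print("PUT movies-v2")
print(json.dumps(create_index_body, indent=2))
print("POST _reindex")
print(json.dumps(reindex_body("movies", "movies-v2"), indent=2))
```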
Text vs keyword data types
● Text type
○ Full text searches
● Keyword type
○ Exact searches
○ Aggregations
○ Sorting
Text vs Keyword
● Inverted Index
Aggregations
OpenSearch Aggregations
● Analyze data
○ In real time too
● Extract statistics
● More expensive than queries
○ On CPU and memory
○ In general
Aggregation Query
● Use aggs or aggregations
Example
● Get average of
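The slide's example is cut off, so the field used here is a stand-in: a minimal sketch of an average aggregation over the year field of the movies documents shown earlier. The helper name and the aggregation name avg_year are hypothetical.

```python
import json

def avg_aggregation_body(field, name):
    """Search body that returns only the average of `field`, with no hits."""
    return {
        "size": 0,  # skip the hits, return just the aggregation result
        "aggs": {name: {"avg": {"field": field}}},
    }

body = avg_aggregation_body("year", "avg_year")
print("GET movies/_search")
print(json.dumps(body, indent=2))
```

In the response, the computed value would be found under aggregations.avg_year.value.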
Data Streams
Data Streams in OpenSearch
● Ingesting time series data
○ Logs
○ Events
○ Metrics, etc.
● Number of documents grows rapidly
● Append Only data
● Don't need to update older documents (Very rarely)
Rollover
● If data is growing rapidly
● Write to an index up to a certain threshold
○ Then create a new index
○ And start writing to it
● Optimize the active index for high ingest rates on high-performance hot nodes.
● Optimize for search performance on warm nodes.
● Shift older, less frequently accessed data to less expensive cold nodes.
● Delete data according to your retention policies by removing entire indices.
Index Template
● A data stream requires an index template, which contains:
○ A name or wildcard (*) pattern for the data stream.
○ The data stream’s timestamp field. This field must be mapped as a date or date_nanos field data type and must be included in every document indexed to the data stream.
○ The mappings and settings applied to each backing index when it’s created.
ILM Policy
● Index Lifecycle Management Policy (the Elasticsearch name; OpenSearch calls the equivalent feature Index State Management, ISM)
● Can be applied to any number of indices
● Usage
○ Allocate
○ Delete
○ Rollover
○ Read Only
○ Wait for snapshot
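ILM is the Elasticsearch feature name; OpenSearch provides Index State Management (ISM) for the same purpose. As a minimal sketch, assuming the ISM policy format of states with actions and transitions, a policy that rolls an index over and later deletes it might look like this. The function name, state names, and ages are illustrative assumptions.

```python
import json

def ism_rollover_delete_policy(rollover_age="1d", delete_age="30d"):
    """Build an ISM policy body: roll over in the 'hot' state, then move to 'delete'."""
    return {
        "policy": {
            "description": "Roll over the write index, then delete old indices",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [{"rollover": {"min_index_age": rollover_age}}],
                    "transitions": [
                        {"state_name": "delete", "conditions": {"min_index_age": delete_age}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
        }
    }

policy = ism_rollover_delete_policy()
print(json.dumps(policy, indent=2))
```

The resulting JSON would be sent as the body of PUT _plugins/_ism/policies/<policy_id>.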
ILM Policy
● Create a policy:
● Link
Create ILM Policy
Index Template
● Tells ElasticSearch how to configure an index when it is created
● For data streams
○ Configures the stream’s backing indices
○ Configured prior to index creation
Templates Types
● Component Templates
○ Reusable building blocks that configure
■ mappings,
■ settings, and
■ Aliases
○ Not directly applied to indices
● Index Template
○ Collection of component templates
○ Directly applied to indices
○ Some defaults: metrics-*-*, logs-*-*
Create Component Template
● Link
Create Index Template
● Data Stream requires matching index template
● PUT _index_template/{template_name}
Create Index Template
● Link
Create data stream
● Documents must contain timestamp field
● PUT _data_stream/my-data-stream
● Stream’s name must match one of your index template’s index patterns
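A rough sketch of the matching index template for the stream above, plus a local check that the stream name fits the pattern. The helper names and the pattern are assumptions; the empty "data_stream" object is what marks the template as a data stream template, and OpenSearch then uses @timestamp as the timestamp field unless told otherwise. The local check uses fnmatch-style wildcards, which only approximates the server's pattern matching.

```python
from fnmatch import fnmatchcase

def data_stream_template(pattern, priority=100):
    """Body for PUT _index_template/<name>; the empty data_stream object enables data streams."""
    return {"index_patterns": [pattern], "data_stream": {}, "priority": priority}

def matches_template(stream_name, template):
    """Rough local check that a stream name matches one of the template's patterns."""
    return any(fnmatchcase(stream_name, p) for p in template["index_patterns"])

template = data_stream_template("my-data-stream*")
print(matches_template("my-data-stream", template))
print(matches_template("other-stream", template))
```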
Get Info About Data Stream
● GET _data_stream/my-data-stream
Delete Data Stream
● DELETE _data_stream/my-data-stream
Cross Cluster Replication
Cross Cluster Replication
● Cross Cluster replication plugin
○ Replicates indexes, mapping & metadata from one cluster to another
● Advantages
○ Continue to handle search requests if there is an outage
○ Can help reduce latency in application
■ Replicating data across geographically distant data centers
Replication
● Active passive model
○ Follower index pulls data from leader index
● It can be
○ Started
○ Paused
○ Stopped
○ Resumed
● Can be secured
○ Security plugin
○ Encrypt cross cluster traffic
Exercise
● Create 2 domains in AWS OpenSearch
● Link
Exercise
● Source Domain Connections Tab -> Outbound ->
○ Create Connection to Destination Domain
● Set access policy on destination domain:
● Link
Exercise
● Get Connection status
○ GET _plugins/_replication/connect1/_status
● Start syncing
○ PUT _plugins/_replication/connect1/_start
{
  "leader_alias": "Connect1",
  "leader_index": "movies",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}
Plugins
OpenSearch Plugins
● Standalone components
○ That add features and capabilities
● Huge number of plugins available
● E.g.
○ Replication Plugin
○ Security plugin
○ Notification plugin
SQL Plugin
● Lets you run SQL queries on ESDB
● Add data
○ PUT movies/_doc/1
○ { "title": "Spirited Away" }
● Query data
○ POST _plugins/_sql
{
  "query": "SELECT * FROM movies LIMIT 50"
}
SQL Plugin
● Delete data from ESDB Index
● Enable Delete via SQL plugin
○ PUT _plugins/_query/settings
{
  "transient": {
    "plugins.sql.delete.enabled": "true"
  }
}
SQL Plugin - Delete
● To delete the data
○ POST _plugins/_sql
{
  "query": "DELETE FROM movies"
}
Asynchronous Search
● Large volumes of data
● Can take longer to search
● Async
○ Run searches in the background
○ Monitor progress of these searches
○ Get back partial results as they become available
Asynchronous Search
● POST _plugins/_asynchronous_search
● Response contents:
○ ID
■ Can be used to track the state of the search
■ Get partial results
○ State
■ Running
■ Completed
■ Persisted
● Link
OpenSearch Clients
Clients
● OpenSearch Python client
● OpenSearch JavaScript (Node.js) client
● OpenSearch .NET clients
● OpenSearch Go client
● OpenSearch PHP client
OpenSearch Client for .NET
● OpenSearch.Net
○ Low level client
● OpenSearch.Client
○ High level client
● Sample code: Link
Exercise
● Create a .NET application
● Add a document to OpenSearch using the .NET Application
○ OpenSearch.Client (.NET High level client)
Agents and Ingestion Tools
Beats
● Data shippers
● Agents on servers
● Send data to ES/ Logstash
Grafana
● An open source visualization tool
● Various sources can be used as data source:
○ InfluxDB
○ MySQL
○ ElasticSearch
○ PostgreSQL
● Better suited for metrics visualizations
● Does not allow full text data querying
Logstash
● Free/ Open-Source
● Data processing pipeline
● Ingests data from multitude of sources
● Transforms it
● Sends it to your favorite stash
Logstash - Ingestion
● Data of all shapes, sizes, and sources
○ Can be ingested
● It can parse/ transform your data
Logstash - Output
● ElasticSearch
● MongoDB
● S3
● Etc.
● Link
AWS OpenSearch Security
● Use multi-factor authentication (MFA) with each account.
● Use SSL/TLS to communicate with AWS resources. We recommend TLS 1.2 or later.
● Set up API and user activity logging with AWS CloudTrail.
● Use AWS encryption solutions, along with all default security controls within AWS services.
● Use advanced managed security services such as Amazon Macie, which assists in discovering and securing personal data that is stored in Amazon S3.
● If you require FIPS 140-2 validated cryptographic modules when accessing AWS through a command line interface or an API, use a FIPS endpoint.
Summary
● OpenSearch
○ Open Source Search solution
● Upcoming and supported by AWS
● Caters to most search use cases
○ Great Query performance
● Powerful tools
● Community Support
Connect with me
● Trainings on various tech topics
● For any questions:
○ https://linkedin.com/in/coach4dev

Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 

OpenSearch: A Guide to the Powerful Open Source Search and Analytics Engine

  • 2. Agenda ● OpenSearch ○ What is it? ○ Benefits/ Uses ○ How to use it ○ Features ● Migrate from Elastic to OpenSearch ● Tools & Plugins
  • 3. About Me ● Lead Dev ● Located in Florida ● Trainer ● Presenter ● .NET Developer ● Youtuber: Coach4Dev ● Husband/ Father
  • 4. Amazon Elasticsearch ● Launched in 2015 ● Gained popularity for log analytics usage ● Used open-source Elastic under Apache License v2 ● Jan 2021 ○ Elastic NV changed licensing strategy ○ Versions after ElasticSearch 7.10.2 & Kibana 7.10.2 ■ Not released under Apache License v2 ■ Released under Elastic License
  • 5. OpenSearch ● Sep 2021: ○ Amazon Elasticsearch Service renamed to Amazon OpenSearch Service ● Open-source fork of ElasticSearch 7.10.2 and Kibana 7.10.2 ● Highly scalable ● Fast access & response to large volumes of data ● Powered by the Apache Lucene search library
  • 6. Apache Lucene ● Apache Lucene project develops open-source search software ○ Releases a core search library named Lucene core ● Lucene Core ○ Java Library providing powerful indexing and search features
  • 7. Apache Solr ● Open source search platform ● Built on Apache Lucene
  • 8. Solr vs ElasticSearch ● Similar performance mostly. ● ES has better support for scalability ○ due to horizontal scaling ■ Better cloud support too ● ES can support multiple doc types in a single index better ○ More difficult to do this in Solr ● ES supports native DSL (Domain Specific Language) ○ Need to program queries in Solr ● https://mindmajix.com/elasticsearch-vs-solr
  • 9. Why OpenSearch ● Huge amount of machine generated data these days ○ Growing exponentially ● Getting insights is important ● Interactive log analytics ● Real-time application monitoring ● Website Search, etc.
  • 10. OpenSearch Features ● Easy to set-up and configure ● In-place upgrades ● Enables data monitoring & setting alerts based on thresholds ● Supports authentication, encryption & compliance requirements
  • 11. OpenSearch vs ElasticSearch ● OpenSearch was forked from Elastic Search ○ Now they are separate from each other ● Each is adding features separately ● OpenSearch ○ Inbuilt support from AWS
  • 12. OpenSearch features not in ES (free version) ● Centralized user accounts / access control ● Cross-cluster replication ● IP filtering ● Configurable retention period ● Anomaly detection ● Tableau connector ● JDBC driver ● ODBC driver ● Machine learning features such as regression and classification ● Link
  • 13. ElasticSearch Features ● Based on subscription levels ● https://www.elastic.co/subscriptions
  • 14. OpenSearch & ElasticSearch Version Support ● Amazon OpenSearch Service currently supports the following OpenSearch versions: ○ 1.3, 1.2, 1.1, 1.0 ● And supports the following ElasticSearch versions: ○ 7.10, 7.9, 7.8, 7.7, 7.4, 7.1 ○ 6.8, 6.7, 6.5, 6.4, 6.3, 6.2, 6.0 ○ 5.6, 5.5, 5.3, 5.1 ○ 2.3 ○ 1.5
  • 15. What is Kibana ● Free & open front end application ● Charting tool for Elastic Stack ● Sits on top of Elastic Stack ● Sample Dashboard
  • 16. OpenSearch Dashboards ● Default visualization tool for data in OpenSearch ● Filter data with queries ● Comes with OpenSearch Service
  • 18. OpenSearch Cluster ● Synonymous with domain ● Domains are clusters with ○ settings, ○ instance types, ○ instance counts, ○ and storage resources that you specify. ● Group of nodes ○ With same cluster.name attribute
  • 19. OpenSearch Node ● Member of a cluster ● A distinct host ● With IP address
  • 20. Getting Started ● Create a domain ● Size the domain appropriately for your workload ● Control access to your domain using a domain access policy or fine-grained access control ● Index data manually or from other AWS services ● Use OpenSearch Dashboards to search your data and create visualizations
  • 21. Custom Endpoint ● For an easier-to-read or custom domain name ● Can use HTTPS ○ Upload SSL certificate
  • 22. Run OpenSearch locally ● Install docker ● wsl -d docker-desktop ● sysctl -w vm.max_map_count=262144 ● Ctrl+C ● docker-compose up ● Visit http://localhost:5601/ ● Use admin/admin to login and explore ● Link
  • 23. Upload Data ● One at a time ● Bulk
  • 24. Upload Data One At a time ● curl -XPUT -u "master:XXXX" "https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_doc/1" -d '{"director": "Burton, Tim", "genre": ["Comedy","Sci-Fi"], "year": 1996, "actor": ["Jack Nicholson","Pierce Brosnan","Sarah Jessica Parker"], "title": "Mars Attacks!"}' -H "Content-Type: application/json"
  • 25. Upload Data Bulk ● curl -XPOST -u "master:XXXXX" "https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/_bulk" --data-binary @bulk_movies.txt -H "Content-Type: application/json"
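
The contents of bulk_movies.txt are not shown on the slide. As a minimal sketch, assuming the standard `_bulk` layout of one action line followed by one document line, the file could be generated like this. The sample documents and the helper name `build_bulk_payload` are made up for illustration.

```python
import json

def build_bulk_payload(index, docs):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        # Action line: index this document into `index` under the given id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical sample documents, not taken from the slides.
movies = [
    ("2", {"title": "Interstellar", "year": 2014}),
    ("3", {"title": "Star Trek Beyond", "year": 2016}),
]
payload = build_bulk_payload("movies", movies)
print(payload, end="")
```

Writing `payload` to bulk_movies.txt would give a file that the curl command above can send with `--data-binary`, which preserves the newlines.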
  • 27. Searching Data ● URI Searches ● Command Line ● OpenSearch Dashboards
  • 28. Searching Data - URI ● GET Request ● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_search?q=rebel&pretty=true ● Searches all fields of the movies index
  • 29. URI Search Specific fields ● Search movies index and title property ● GET https://search-my-domain.us-west-1.es.amazonaws.com/movies/_search?q=title:house
  • 30. Get Search Results - Command Line ● curl -XGET -u "master:XXXXX" "https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_search?q=rebel&pretty=true"
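
The URI searches above can also be assembled in code. This is a rough sketch using only the Python standard library; the helper name `uri_search_url` is hypothetical, and the endpoint is the placeholder domain from slide 29.

```python
from urllib.parse import urlencode

def uri_search_url(endpoint, index, term, field=None, pretty=False):
    """Build a URI search URL of the form <endpoint>/<index>/_search?q=..."""
    # "field:term" restricts the search to one field; a bare term searches all fields.
    q = f"{field}:{term}" if field else term
    params = {"q": q}
    if pretty:
        params["pretty"] = "true"
    # urlencode percent-encodes reserved characters such as ":" and spaces.
    return f"{endpoint.rstrip('/')}/{index}/_search?{urlencode(params)}"

endpoint = "https://search-my-domain.us-west-1.es.amazonaws.com"
all_fields = uri_search_url(endpoint, "movies", "rebel", pretty=True)
title_only = uri_search_url(endpoint, "movies", "house", field="title")
print(all_fields)
print(title_only)
```

The colon in `title:house` comes out as `%3A`; the server decodes it back before parsing the query string.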
  • 31. Query DSL ● For more complex queries ○ OpenSearch Domain Specific Language (DSL) ● POST request with query body
  • 32. Get Search Results - Dev Tools ● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/_dashboards/app/dev_tools#/console ● GET _search { "query": { "match_all": {} } }
  • 33. Search on only specific fields GET _search { "size": 20, "query": { "multi_match": { "query": "U.S.", "fields": ["title", "actor", "director"] } } }
  • 34. Search - Boosting fields GET _search { "size": 20, "query": { "multi_match": { "query": "john", "fields": ["title^4", "actor", "director^4"] } } }
  • 35. Search - Pagination GET _search { "from": 0, "size": 1, "query": { "multi_match": { "query": "Drama", "fields": ["genre"] } } }
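
In the pagination request, `from` is the offset of the first hit and `size` is the page length. A small sketch of turning a 1-based page number into those two values follows; the helper `page_query` is a hypothetical name, not part of any client library.

```python
def page_query(text, fields, page, page_size):
    """Return a multi_match search body for the given 1-based page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        # Skip all hits that belong to earlier pages.
        "from": (page - 1) * page_size,
        "size": page_size,
        "query": {"multi_match": {"query": text, "fields": fields}},
    }

first_page = page_query("Drama", ["genre"], page=1, page_size=1)
third_page = page_query("Drama", ["genre"], page=3, page_size=20)
print(first_page)
print(third_page["from"], third_page["size"])
```

`first_page` reproduces the body shown on the slide. By default `from + size` cannot exceed the index's `index.max_result_window` setting (10,000), so this style suits shallow paging.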
  • 36. Query -With Highlights In Response GET _search { "size": 20, "query": { "multi_match": { "query": "Manchurian", "fields": ["title^4", "actor", "director"] } }, "highlight": { "fields": { "title": {} }, "pre_tags": "<strong>", "post_tags": "</strong>", "fragment_size": 200, "boundary_chars": ".,!? " } }
  • 37. Query - Count GET movies/_count { "query": { "multi_match": { "query": "Manchurian", "fields": ["title^4", "actor", "director"] } } }
  • 38. Dashboard Query Language ● Use DQL in Dashboards ○ Search for data and visualizations ● Terms Query ○ Search for any text ■ E.g. www.example.com ○ Access object’s nested field ■ E.g. coordinates.lat:43.7102 ○ Leading and trailing wildcards ■ host.keyword:*.example.com/* ● Operators ○ AND ○ OR
  • 39. Dashboard Query Language ● Date and range Queries ○ bytes >= 15 and memory < 15 ○ @timestamp > "2020-12-14T09:35:33" ● Nested field query ○ superheroes: {hero-name: Superman}
  • 41. Query Workbench ● SQL ○ Run SQL ○ Treat indices as tables ● PPL ○ Piped Processing Language ○ Commands delimited by pipes
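
Outside the Workbench UI, both languages are sent as a JSON body with a single `query` string. A minimal sketch follows, assuming the SQL plugin's `_plugins/_sql` and `_plugins/_ppl` endpoints and the `movies` index from the upload slides; the helper names are hypothetical.

```python
def sql_request(statement):
    """Return (method, path, body) for a SQL query; the index is used as the table name."""
    return "POST", "_plugins/_sql", {"query": statement}

def ppl_request(source, *commands):
    """Return (method, path, body) for a PPL query built from pipe-delimited commands."""
    # A PPL query starts with its source and chains further commands with "|".
    pipeline = " | ".join([f"source={source}", *commands])
    return "POST", "_plugins/_ppl", {"query": pipeline}

sql = sql_request("SELECT title, year FROM movies WHERE year > 1990")
ppl = ppl_request("movies", "where year > 1990", "fields title, year")
print(sql)
print(ppl)
```

Both requests ask for the same rows: SQL states the result declaratively, while PPL reads left to right as a sequence of processing steps.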
  • 42. Reporting ● Multiple file formats ● On demand/ Scheduled ● Generate from ○ Dashboard ○ Visualization ○ Discover
  • 43. Anomaly Detection ● Detect unusual behavior in time series data ● Anomaly Grade ● Confidence Score
  • 44. Notifications ● Supported ○ Amazon Chime ○ SNS ○ SES ○ SMTP ○ Slack ○ Custom Webhooks
  • 45. Observability plugin ● Visualize/Query time series data ● Event analytics ● Compare the data the way you like
  • 46. Index Management ● Create ISM policy ● To manage your indexes
  • 47. Security plugin ● Set up RBAC
  • 48. Migrate from ElasticSearch to OpenSearch
  • 49. Three major approaches ● Snapshot ● Rolling Upgrade ● Cluster Restart
  • 50. Snapshot Method ● Generate snapshot in ElasticSearch ● Save in shared directory ● Restore in OpenSearch ● Snapshot ○ Backup of entire cluster state ○ Useful for recovery from failure and migration ● Link
  • 51. Snapshot Method ● Check Index compatibility ○ E.g.: Can't restore a 7.6.0 snapshot into a 7.5.0 cluster ● Link ● Fastest ● Easiest ● Most efficient
  • 52. Rolling Upgrade ● Official way to migrate cluster ● Without interruption ● Rolling upgrades are supported: ○ Between minor versions ○ From 5.6 to 6.8 ○ From 6.8 to 7.14.1 ○ From any version since 7.14.0 to 7.14.1
  • 53. Rolling Upgrade ● Shut down one node at a time ○ Minimal disruption
  • 54. Cluster Restart Upgrades ● Shut down all nodes ● Perform the upgrade ● Restart the cluster
  • 56. OpenSearch Mapping ● Dynamic ○ When you index a document ○ OpenSearch adds fields automatically ○ It deduces their types by itself ● Explicit ○ If you know your data types ○ Preferred way of doing things
  • 57. OpenSearch Mapping ● If you do not define a mapping ahead of time, OpenSearch dynamically creates a mapping for you. ● If you do decide to define your own mapping, you can do so at index creation. ● ONE mapping is defined per index. Once the index has been created, we can only add new fields to a mapping. We CANNOT change the mapping of an existing field. ● If you must change the type of an existing field, you must create a new index with the desired mapping, then reindex all documents into the new index.
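
The "new index, then reindex" procedure can be sketched as two requests: create the target index with the desired explicit mapping, then copy the documents with `_reindex`. The index names and field types below are assumptions for illustration, and `reindex_plan` is a hypothetical helper.

```python
def reindex_plan(old_index, new_index, properties):
    """Return the (method, path, body) requests needed to change field types."""
    return [
        # 1. Create the new index with an explicit mapping.
        ("PUT", new_index, {"mappings": {"properties": properties}}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "_reindex", {"source": {"index": old_index}, "dest": {"index": new_index}}),
    ]

# Hypothetical target mapping: `year` becomes an integer field.
steps = reindex_plan(
    "movies",
    "movies_v2",
    {"title": {"type": "text"}, "year": {"type": "integer"}},
)
for method, path, body in steps:
    print(method, path, body)
```

Once the copy finishes, clients can be pointed at the new index, for example through an alias, and the old index deleted.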
  • 58. Text vs keyword data types ● Text type ○ Full text searches ● Keyword type ○ Exact searches ○ Aggregations ○ Sorting
  • 59. Text vs Keyword ● Inverted Index
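
A toy model of the inverted index shows why the two types behave differently. This is only an illustration: the `analyze` function is a crude stand-in for a real analyzer (lowercase, split on non-alphanumerics), not how Lucene is implemented.

```python
import re
from collections import defaultdict

def analyze(text):
    """Rough stand-in for text analysis: lowercase and split into word tokens."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

def build_inverted_index(docs, as_keyword=False):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        # keyword: the whole value is a single term; text: one term per token.
        terms = [value] if as_keyword else analyze(value)
        for term in terms:
            index[term].add(doc_id)
    return dict(index)

titles = {1: "Mars Attacks!", 2: "Life on Mars"}
text_index = build_inverted_index(titles)
keyword_index = build_inverted_index(titles, as_keyword=True)

print(sorted(text_index["mars"]))            # both titles contain the token "mars"
print(keyword_index.get("mars"))             # no keyword term is exactly "mars"
print(sorted(keyword_index["Mars Attacks!"]))  # the exact value matches
```

The text index answers "which documents mention mars?", while the keyword index only answers "which documents have exactly this value?", which is what exact filters, sorting, and aggregations need.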
  • 61. OpenSearch Aggregations ● Analyze data ○ In real time too ● Extract statistics ● More expensive than queries ○ On CPU and memory ○ In general
  • 62. Aggregation Query ● Use aggs or aggregations
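
As a hedged example of an `aggs` body, the sketch below counts movies per genre with a `terms` aggregation and reads the buckets out of a response. It assumes `genre` was dynamically mapped, so a `genre.keyword` subfield exists; the sample response is invented to show the shape only.

```python
def genre_counts_query(field="genre.keyword", top_n=10):
    """Return a search body that buckets documents by the values of `field`."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {"movies_per_genre": {"terms": {"field": field, "size": top_n}}},
    }

def bucket_counts(response, agg_name="movies_per_genre"):
    """Extract {key: doc_count} from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}

body = genre_counts_query()
# Made-up response fragment illustrating the bucket structure.
sample_response = {
    "aggregations": {
        "movies_per_genre": {
            "buckets": [
                {"key": "Drama", "doc_count": 12},
                {"key": "Comedy", "doc_count": 7},
            ]
        }
    }
}
counts = bucket_counts(sample_response)
print(body)
print(counts)
```

The body would be sent as `GET movies/_search`. The aggregation targets the keyword subfield because terms aggregations need exact values rather than analyzed text.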
  • 65. Data Streams in OpenSearch ● Ingesting time series data ○ Logs ○ Events ○ Metrics, etc. ● Number of documents grows rapidly ● Append-only data ● Older documents rarely need to be updated
  • 66. Rollover ● If data is growing rapidly ● Write to index up to a certain threshold ○ Then create a new index ○ And start writing to it ● Optimize the active index for high ingest rates on high-performance hot nodes. ● Optimize for search performance on warm nodes. ● Shift older, less frequently accessed data to less expensive cold nodes. ● Delete data according to your retention policies by removing entire indices.
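
The threshold idea maps onto the rollover API's `conditions` object. A minimal sketch follows, assuming a write alias named `logs-write` and illustrative threshold values; `rollover_request` is a hypothetical helper.

```python
def rollover_request(alias, max_age=None, max_docs=None, max_size=None):
    """Return (method, path, body) for a conditional rollover of `alias`."""
    candidates = {"max_age": max_age, "max_docs": max_docs, "max_size": max_size}
    # Only send the thresholds that were actually set.
    conditions = {name: value for name, value in candidates.items() if value is not None}
    return "POST", f"{alias}/_rollover", {"conditions": conditions}

# Hypothetical alias and thresholds.
request = rollover_request("logs-write", max_age="7d", max_docs=1_000_000)
print(request)
```

The index rolls over as soon as any one of the listed conditions is met, so the values act as upper bounds rather than a combined requirement.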
  • 67. Index Template ● Data Stream requires an index template ● A name or wildcard (*) pattern for the data stream. ● The data stream’s timestamp field. This field must be mapped as a date or date_nanos field data type and must be included in every document indexed to the data stream. ● The mappings and settings applied to each backing index when it’s created.
  • 68. ILM Policy ● Index Lifecycle Management Policy ● Can be applied to any number of indices ● Usage ○ Allocate ○ Delete ○ Rollover ○ Read Only ○ Wait for snapshot
  • 69. ILM Policy ● Create a policy: ● Link
  • 73. Index Template ● Tells OpenSearch how to configure an index when it is created ● For data streams ○ Configures the stream’s backing indices ○ Configured prior to index creation
  • 74. Templates Types ● Component Templates ○ Reusable building blocks that configure ■ mappings, ■ settings, and ■ Aliases ○ Not directly applied to indices ● Index Template ○ Collection of component templates ○ Directly applied to indices ○ Some defaults: metrics-*-*, logs-*-*
  • 76. Create Index Template ● Data Stream requires matching index template ● PUT _index_template/{template_name}
  • 78. Create data stream ● Documents must contain timestamp field ● PUT _data_stream/my-data-stream ● Stream’s name must match one of your index template’s index patterns
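
Putting the template and the stream together, a rough sketch of the two requests follows. It assumes a template named `logs-template` with a `my-data-*` pattern (both invented here) and relies on the default `@timestamp` field; `fnmatch` is used only as a local approximation of wildcard pattern matching.

```python
from fnmatch import fnmatchcase

def data_stream_template(patterns, priority=100):
    """Return an index template body that marks matching names as data streams."""
    return {
        "index_patterns": patterns,
        "data_stream": {},  # empty object: use the default @timestamp field
        "priority": priority,
    }

def create_stream_requests(template_name, patterns, stream_name):
    """Return the requests that create the template and then the data stream."""
    # The stream name has to match one of the template's index patterns.
    if not any(fnmatchcase(stream_name, pattern) for pattern in patterns):
        raise ValueError(f"{stream_name!r} matches none of {patterns}")
    return [
        ("PUT", f"_index_template/{template_name}", data_stream_template(patterns)),
        ("PUT", f"_data_stream/{stream_name}", None),
    ]

requests_to_send = create_stream_requests("logs-template", ["my-data-*"], "my-data-stream")
for method, path, body in requests_to_send:
    print(method, path, body)
```

The local pattern check catches a mismatched stream name before any request is sent; the cluster would reject such a `PUT _data_stream` call anyway.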
  • 79. Get Info About Data Stream ● GET _data_stream/my-data-stream
  • 80. Delete Data Stream ● DELETE _data_stream/my-data-stream
  • 82. Cross Cluster Replication ● Cross Cluster replication plugin ○ Replicates indexes, mapping & metadata from one cluster to another ● Advantages ○ Continue to handle search requests if there is an outage ○ Can help reduce latency in application ■ Replicating data across geographically distant data centers
  • 83. Replication ● Active passive model ○ Follower index pulls data from leader index ● It can be ○ Started ○ Paused ○ Stopped ○ Resumed ● Can be secured ○ Security plugin ○ Encrypt cross cluster traffic
  • 84. Exercise ● Create 2 domains in AWS OpenSearch ● Link
  • 85. Exercise ● Source Domain Connections Tab -> Outbound -> ○ Create Connection to Destination Domain ● Set access policy on destination domain: ● Link
  • 86. Exercise ● Get Connection status ○ GET _plugins/_replication/connect1/_status ● Start syncing ○ PUT _plugins/_replication/connect1/_start { "leader_alias": "Connect1", "leader_index": "movies", "use_roles": { "leader_cluster_role": "all_access", "follower_cluster_role": "all_access" } }
  • 88. OpenSearch plugins ● Standalone components ○ That add features and capabilities ● Huge number of plugins available ● E.g. ○ Replication Plugin ○ Security plugin ○ Notification plugin
  • 89. SQL Plugin ● Lets you run SQL queries on ESDB ● Add data ○ PUT movies/_doc/1 { "title": "Spirited Away" } ● Query data ○ POST _plugins/_sql { "query": "SELECT * FROM movies LIMIT 50" }
  • 90. SQL Plugin ● Delete data from ESDB Index ● Enable Delete via SQL plugin ○ PUT _plugins/_query/settings { "transient": { "plugins.sql.delete.enabled": "true" } }
  • 91. SQL Plugin - Delete ● To Delete the data ○ POST _plugins/_sql { "query": "DELETE FROM movies" }
  • 92. Asynchronous Search ● Large volumes of data ● Can take longer to search ● Async ○ Run searches in the background ○ Monitor progress of these searches ○ Get back partial results as they become available
  • 93. Asynchronous Search ● POST _plugins/_asynchronous_search ● Response contents: ○ ID ■ Can be used to track the state of the search ■ Get partial results ○ State ■ Running ■ Completed ■ Persisted ● Link
  • 95. Clients ● OpenSearch Python client ● OpenSearch JavaScript (Node.js) client ● OpenSearch .NET clients ● OpenSearch Go client ● OpenSearch PHP client
  • 96. Open Search Client for .NET ● OpenSearch.Net ○ Low level client ● OpenSearch.Client ○ High level client ● Sample code: Link
  • 97. Exercise ● Create a .NET application ● Add a document to OpenSearch using the .NET Application ○ OpenSearch.Client (.NET High level client)
  • 99. Beats ● Data shippers ● Agents on servers ● Send data to ES/ Logstash
  • 100. Grafana ● An open source visualization tool ● Various sources can be used as data source: ○ InfluxDB ○ MySQL ○ ElasticSearch ○ PostgreSQL ● Better suited for metrics visualizations ● Does not allow full text data querying
  • 101. Logstash ● Free/ Open-Source ● Data processing pipeline ● Ingests data from multitude of sources ● Transforms it ● Sends it to your favorite stash
  • 102. Logstash - Ingestion ● Data of all shapes/ sizes/ source ○ Can be ingested ● It can parse/ transform your data
  • 103. Logstash - Output ● ElasticSearch ● MongoDB ● S3 ● Etc. ● Link
  • 104. AWS OpenSearch Security ● Use multi-factor authentication (MFA) with each account. ● Use SSL/TLS to communicate with AWS resources. We recommend TLS 1.2 or later. ● Set up API and user activity logging with AWS CloudTrail. ● Use AWS encryption solutions, along with all default security controls within AWS services. ● Use advanced managed security services such as Amazon Macie, which assists in discovering and securing personal data that is stored in Amazon S3. ● If you require FIPS 140-2 validated cryptographic modules when accessing AWS through a command line interface or an API, use a FIPS endpoint.
  • 105. Summary ● OpenSearch ○ Open-source search solution ● Up-and-coming and supported by AWS ● Caters to most search use cases ○ Great query performance ● Powerful tools ● Community support
  • 106. Connect with me ● Trainings on various tech topics ● For any questions: ○ https://linkedin.com/in/coach4dev