OpenSearch: A Guide to the Powerful Open Source Search and Analytics Engine

Agenda
● OpenSearch
○ What is it?
○ Benefits/ Uses
○ How to use it
○ Features
● Migrate from Elastic to OpenSearch
● Tools & Plugins

About Me
● Lead Dev
● Located in Florida
● Trainer
● Presenter
● .NET Developer
● YouTuber: Coach4Dev
● Husband/ Father

Amazon Elasticsearch
● Launched in 2015
● Gained popularity for log analytics usage
● Used open-source ElasticSearch under Apache License v2
● Jan 2021
○ Elastic NV changed its licensing strategy
○ Versions after ElasticSearch 7.10.2 & Kibana 7.10.2:
■ Not released under Apache License v2
■ Released under the Elastic License (and SSPL)

OpenSearch
● Sep 2021:
○ Amazon Elasticsearch Service renamed to Amazon OpenSearch Service
● Open-source fork of ElasticSearch 7.10.2 and Kibana 7.10.2
● Highly scalable
● Fast access & response to large volumes of data
● Powered by the Apache Lucene search library

Apache Lucene
● The Apache Lucene project develops open-source search software
○ Releases a core search library named Lucene Core
● Lucene Core
○ Java library providing powerful indexing and search features

Apache Solr
● Open source search platform
● Built on Apache Lucene

Solr vs ElasticSearch
● Mostly similar performance
● ES has better support for scalability
○ Due to horizontal scaling
■ Better cloud support too
● ES handles multiple document types in a single index better
○ More difficult to do this in Solr
● ES supports a native JSON-based DSL (Domain Specific Language)
○ Need to program queries in Solr
● https://mindmajix.com/elasticsearch-vs-solr

Why OpenSearch
● Huge amount of machine-generated data these days
○ Growing exponentially
● Getting insights is important
● Interactive log analytics
● Real-time application monitoring
● Website Search, etc.

OpenSearch Features
● Easy to set up and configure
● In-place upgrades
● Enables data monitoring & setting alerts based on thresholds
● Supports authentication, encryption & compliance requirements

OpenSearch vs ElasticSearch
● OpenSearch was forked from ElasticSearch
○ The two projects are now developed separately
● Each is adding features independently
● OpenSearch
○ Built-in support from AWS (Amazon OpenSearch Service)

OpenSearch features not in ES (free version)
● Centralized user accounts / access control
● Cross-cluster replication
● IP filtering
● Configurable retention period
● Anomaly detection
● Tableau connector
● JDBC driver
● ODBC driver
● Machine learning features such as regression and classification
● Link

ElasticSearch Features
● Based on subscription levels
● https://www.elastic.co/subscriptions

OpenSearch & ElasticSearch Version Support
● Amazon OpenSearch Service currently supports the following OpenSearch versions:
○ 1.3, 1.2, 1.1, 1.0
● And supports the following ElasticSearch versions:
○ 7.10, 7.9, 7.8, 7.7, 7.4, 7.1
○ 6.8, 6.7, 6.5, 6.4, 6.3, 6.2, 6.0
○ 5.6, 5.5, 5.3, 5.1
○ 2.3
○ 1.5

What is Kibana
● Free & open front-end application
● Charting tool for the Elastic Stack
● Sits on top of the Elastic Stack
● Sample Dashboard

OpenSearch Dashboards
● Default visualization tool for data in OpenSearch
● Filter data with queries
● Comes with the OpenSearch service

OpenSearch Cluster
● Synonymous with a domain (in Amazon OpenSearch Service)
● Domains are clusters with the settings, instance types, instance counts, and storage resources that you specify
● Group of nodes
○ With the same cluster.name attribute

OpenSearch Node
● Member of a cluster
● A distinct host
● Has its own IP address

Getting Started
● Create a domain
● Size the domain appropriately for your workload
● Control access to your domain using a domain access policy or fine-grained
access control
● Index data manually or from other AWS services
● Use OpenSearch Dashboards to search your data and create visualizations

Custom Endpoint
● Useful if you want an easier-to-read or custom domain name
● Can use HTTPS
○ Upload an SSL certificate

Run OpenSearch locally
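● Install Docker
● wsl -d docker-desktop
○ Windows only: opens a shell in the docker-desktop WSL distribution
● sysctl -w vm.max_map_count=262144
○ OpenSearch needs a higher memory-map limit than the Linux default
● Ctrl+C
● docker-compose up
○ Run from the folder containing the OpenSearch docker-compose.yml
● Visit http://localhost:5601/ (OpenSearch Dashboards)
● Use admin/admin to log in and explore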
● Link

Upload Data
● One at a time
● Bulk

Upload Data One At a time
● curl -XPUT -u "master:XXXX" \
  "https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_doc/1" \
  -H "Content-Type: application/json" \
  -d '{"director": "Burton, Tim", "genre": ["Comedy","Sci-Fi"], "year": 1996, "actor": ["Jack Nicholson","Pierce Brosnan","Sarah Jessica Parker"], "title": "Mars Attacks!"}'
○ Quoting shown for bash/zsh: the JSON body is single-quoted so its inner double quotes survive

Upload Data Bulk
● curl -XPOST -u "master:XXXXX"
mazonaws.com/_bulk" --data-binary @bulk_movies.txt -H "Content-Type: application/json"
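The slides do not show the contents of bulk_movies.txt. The _bulk API expects newline-delimited JSON: an action/metadata line, then the document source, repeated, with a final newline. Below is a minimal standard-library sketch that builds such a body. The document IDs are hypothetical, and the two sample documents are the ones used elsewhere in this deck.

```python
import json


def build_bulk_body(index, docs):
    """Return a newline-delimited JSON (NDJSON) body for the _bulk API.

    Every (doc_id, source) pair becomes two lines: an "index" action line
    carrying metadata, then the document itself. The body must end with a
    newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def write_bulk_file(path, index, docs):
    """Write the bulk body to a file usable with curl --data-binary @path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_bulk_body(index, docs))


if __name__ == "__main__":
    # Hypothetical IDs; the documents come from the examples in this deck.
    movies = [
        ("1", {"director": "Burton, Tim", "genre": ["Comedy", "Sci-Fi"], "year": 1996,
               "actor": ["Jack Nicholson", "Pierce Brosnan", "Sarah Jessica Parker"],
               "title": "Mars Attacks!"}),
        ("2", {"title": "Spirited Away"}),
    ]
    print(build_bulk_body("movies", movies), end="")
```

Calling write_bulk_file("bulk_movies.txt", "movies", movies) would produce a file for the curl command above.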

Searching Data
● URI Searches
● Command Line
● OpenSearch Dashboards

Searching Data - URI
● GET Request
● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/movies/_search?q=rebel&pretty=true
● Searches all fields of the movies index

URI Search Specific fields
● Search the title field of the movies index
● GET https://search-my-domain.us-west-1.es.amazonaws.com/movies/_search?q=title:house
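Both URI searches above follow the same pattern: the index goes in the path and the query goes in the q parameter, optionally as field:text. Below is a small sketch that builds these URLs safely with the standard library. The helper name is hypothetical, and the endpoint is the placeholder domain from the slide.

```python
from urllib.parse import urlencode


def uri_search_url(endpoint, index, text, field=None, pretty=False):
    """Build a URI search URL such as <endpoint>/movies/_search?q=title:house.

    Without a field, q is matched against the index's default fields (all
    fields by default); with a field, the query becomes "field:text".
    """
    q = f"{field}:{text}" if field else text
    params = {"q": q}
    if pretty:
        params["pretty"] = "true"
    # urlencode percent-encodes reserved characters, e.g. ":" becomes %3A
    return f"{endpoint.rstrip('/')}/{index}/_search?{urlencode(params)}"


if __name__ == "__main__":
    ep = "https://search-my-domain.us-west-1.es.amazonaws.com"
    print(uri_search_url(ep, "movies", "house", field="title"))
```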

Get Search Results - Command Line
● curl -XGET -u "master:XXXXX"
mazonaws.com/movies/_search?q=rebel&pretty=true"

Query DSL
● For more complex queries
○ OpenSearch Domain Specific Language (DSL)
● POST request with query body
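A Query DSL search is a POST to <index>/_search with a JSON body. Below is a standard-library sketch that builds, but does not send, such a request with the same HTTP Basic credentials that curl -u uses. The endpoint, user, and password are placeholders, and the helper name is hypothetical.

```python
import base64
import json
import urllib.request


def build_search_request(endpoint, index, body, user, password):
    """Build (without sending) a POST <index>/_search request with a JSON body.

    The Authorization header uses HTTP Basic auth, which is what
    curl -u "user:password" sends.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request(
        "https://search-my-domain.us-west-1.es.amazonaws.com",  # placeholder endpoint
        "movies",
        {"query": {"match": {"title": "house"}}},
        "master",
        "XXXXX",
    )
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req), then json.load the response.
```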

Get Search Results - Dev Tools
● https://search-test-domain-s7g5csgqurpevadhaonp75mwgm.us-west-1.es.amazonaws.com/_dashboards/app/dev_tools#/console
○ GET _search
{
  "query": {
    "match_all": {}
  }
}

Search on only specific fields
GET _search
{
  "size": 20,
  "query": {
    "multi_match": {
      "query": "U.S.",
      "fields": ["title", "actor", "director"]
    }
  }
}

Search - Boosting fields
GET _search
{
  "size": 20,
  "query": {
    "multi_match": {
      "query": "john",
      "fields": ["title^4", "actor", "director^4"]
    }
  }
}
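Boosting is written inline as field^factor, which makes matches in that field count more toward the relevance score. Below is a sketch of a helper that builds the multi_match bodies shown above from a plain field list plus an optional boost map. The helper name and its defaults are assumptions.

```python
def multi_match_query(text, fields, boosts=None, size=20):
    """Build a multi_match search body.

    boosts maps field name -> boost factor; a boosted field is emitted as
    "field^factor" so matches in it weigh more in the relevance score.
    """
    boosts = boosts or {}
    field_specs = [f"{name}^{boosts[name]}" if name in boosts else name for name in fields]
    return {
        "size": size,
        "query": {"multi_match": {"query": text, "fields": field_specs}},
    }


if __name__ == "__main__":
    print(multi_match_query("john", ["title", "actor", "director"], {"title": 4, "director": 4}))
```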

Search - Pagination
GET _search
{
  "from": 0,
  "size": 1,
  "query": {
    "multi_match": {
      "query": "Drama",
      "fields": ["genre"]
    }
  }
}
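With from/size pagination, from is the number of hits to skip. Below is a sketch that converts a 1-based page number into these two parameters. It also guards against the index.max_result_window limit, which defaults to 10,000. Going deeper than that needs search_after or scroll instead. The helper name is hypothetical.

```python
def page_params(page, page_size, max_result_window=10_000):
    """Translate a 1-based page number into the from/size search parameters.

    from + size may not exceed the index's max_result_window setting
    (10,000 by default), so such pages are rejected here.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    if offset + page_size > max_result_window:
        raise ValueError("page is beyond max_result_window")
    return {"from": offset, "size": page_size}


if __name__ == "__main__":
    print(page_params(3, 10))
```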

Query -With Highlights In Response
GET _search
{
  "size": 20,
  "query": {
    "multi_match": {
      "query": "Manchurian",
      "fields": ["title^4", "actor", "director"]
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    },
    "pre_tags": "<strong>",
    "post_tags": "</strong>",
    "fragment_size": 200,
    "boundary_chars": ".,!? "
  }
}
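In the response, highlighted fragments appear under each hit's highlight object, keyed by field, as a list of strings wrapped in the pre_tags/post_tags. Below is a sketch that picks the highlighted text when present and falls back to _source otherwise. The sample response is hypothetical and trimmed for illustration.

```python
def highlighted_values(response, field="title"):
    """Return one display string per hit, preferring highlighted fragments.

    Fragments live under hit["highlight"][field]; hits without a highlight
    for that field fall back to the raw value in _source.
    """
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        fragments = hit.get("highlight", {}).get(field)
        if fragments:
            results.append(" ... ".join(fragments))
        else:
            results.append(str(hit.get("_source", {}).get(field, "")))
    return results


if __name__ == "__main__":
    # Hypothetical, trimmed-down response for illustration only.
    sample = {"hits": {"hits": [
        {"_id": "1", "_source": {"title": "The Manchurian Candidate"},
         "highlight": {"title": ["The <strong>Manchurian</strong> Candidate"]}},
    ]}}
    print(highlighted_values(sample))
```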

Query - Count
GET movies/_count
{
  "query": {
    "multi_match": {
      "query": "Manchurian",
      "fields": ["title^4", "actor", "director"]
    }
  }
}

Dashboard Query Language
● Use DQL in Dashboards
○ Search for data and visualizations
● Terms Query
○ Search for any text
■ E.g. www.example.com
○ Access an object’s nested field
■ E.g. coordinates.lat:43.7102
○ Leading and trailing wildcards
■ E.g. host.keyword:*.example.com/*
● Operators
○ AND
○ OR

Dashboard Query Language
● Date and range Queries
○ bytes >= 15 and memory < 15
○ @timestamp > "2020-12-14T09:35:33"
● Nested field query
○ superheroes: {hero-name: Superman}

Query Workbench
● SQL
○ Run SQL
○ Treat indices as tables
● PPL
○ Piped Processing Language
○ Commands delimited by pipes

Reporting
● Multiple file formats
● On demand/ Scheduled
● Generate from
○ Dashboard
○ Visualization
○ Discover

Anomaly Detection
● Detect unusual behavior in time series data
● Anomaly grade: how severe a detected anomaly is
● Confidence score: how confident the model is in its result

Notifications
● Supported
○ Amazon Chime
○ SNS
○ SES
○ SMTP
○ Slack
○ Custom Webhooks

Observability plugin
● Visualize/Query time series data
● Event analytics
● Compare the data the way you like

Index Management
● Create an ISM (Index State Management) policy
● To manage your indexes
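An ISM policy is a small state machine: each state lists actions and the transitions out of it. Below is a sketch of a simple retention policy in which indices start in "hot", move to "delete" once older than a threshold, and are then deleted. The ism_template block attaches the policy automatically to new indices that match a pattern. The policy ID, the logs-* pattern, and the 30-day threshold are assumptions for illustration.

```python
import json


def retention_policy(index_pattern, delete_after="30d", priority=100):
    """Build an ISM policy body: "hot" -> "delete" after delete_after.

    ism_template auto-applies the policy to newly created indices whose
    names match index_pattern.
    """
    return {
        "policy": {
            "description": f"Delete indices older than {delete_after}",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {"state_name": "delete", "conditions": {"min_index_age": delete_after}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            "ism_template": {"index_patterns": [index_pattern], "priority": priority},
        }
    }


if __name__ == "__main__":
    # Would be sent as: PUT _plugins/_ism/policies/delete-after-30d (hypothetical ID)
    print(json.dumps(retention_policy("logs-*"), indent=2))
```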

Security plugin
● Set up role-based access control (RBAC)

Migrate from ElasticSearch to OpenSearch

Three major approaches
● Snapshot
● Rolling Upgrade
● Cluster Restart

Snapshot Method
● Generate snapshot in ElasticSearch
● Save in shared directory
● Restore in OpenSearch
● Snapshot
○ Backup of entire cluster state
○ Useful for recovery from failure and migration
● Link
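The three snapshot steps above map onto a handful of snapshot REST calls. Below is a sketch that lists them as (cluster, method, path, body) tuples, assuming a shared file-system ("fs") repository. That directory must be reachable from both clusters and listed under path.repo in each node's configuration. The repository name, snapshot name, location, and index are hypothetical. The repository is registered as read-only on the OpenSearch side so that only one cluster writes to it.

```python
def snapshot_migration_requests(repo, snapshot, location, indices):
    """List the REST calls for a snapshot-based migration.

    Returns (cluster, method, path, body) tuples in execution order:
    register the repository and take the snapshot on ElasticSearch, then
    register the same location (read-only) and restore on OpenSearch.
    """
    repo_body = {"type": "fs", "settings": {"location": location}}
    readonly_body = {"type": "fs", "settings": {"location": location, "readonly": True}}
    return [
        ("elasticsearch", "PUT", f"_snapshot/{repo}", repo_body),
        ("elasticsearch", "PUT", f"_snapshot/{repo}/{snapshot}?wait_for_completion=true", None),
        ("opensearch", "PUT", f"_snapshot/{repo}", readonly_body),
        ("opensearch", "POST", f"_snapshot/{repo}/{snapshot}/_restore", {"indices": indices}),
    ]


if __name__ == "__main__":
    for step in snapshot_migration_requests("migration-repo", "snap-1", "/mnt/snapshots", "movies"):
        print(step)
```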

Snapshot Method
● Check Index compatibility
○ E.g., you can't restore a 7.6.0 snapshot into a 7.5.0 cluster
● Link
● Fastest
● Easiest
● Most efficient

Rolling Upgrade
● Official way to migrate a cluster
● Without interruption
● Rolling upgrades are supported:
○ Between minor versions
○ From 5.6 to 6.8
○ From 6.8 to 7.14.1
○ From any version since 7.14.0 to 7.14.1

Rolling Upgrade
● Shut down one node at a time
○ Minimal disruption

Cluster Restart Upgrades
● Shut down all nodes
● Perform the upgrade
● Restart the cluster

OpenSearch Mapping
● Dynamic
○ When you index a document
○ OpenSearch adds fields automatically
○ And infers their types by itself
● Explicit
○ If you know your data types
○ Preferred way of doing things

OpenSearch Mapping
● If you do not define a mapping ahead of time, OpenSearch dynamically
creates a mapping for you.
● If you do decide to define your own mapping, you can do so at index creation.
● ONE mapping is defined per index. Once the index has been created, we can
only add new fields to a mapping. We CANNOT change the mapping of an
existing field.
● If you must change the type of an existing field, you must create a new index
with the desired mapping, then reindex all documents into the new index.
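Changing a field type therefore takes two requests: create a new index with the explicit mapping, then copy the documents with the reindex API. Below is a sketch of both request bodies for the movies data. The new index name movies-v2 and the chosen field types are assumptions.

```python
def movies_v2_index_body():
    """Explicit mapping for a new index (sent as PUT movies-v2).

    title is full-text searchable and also has an exact-match keyword
    sub-field; genre is keyword-only; year is an integer.
    """
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
                "director": {"type": "text"},
                "actor": {"type": "text"},
                "genre": {"type": "keyword"},
                "year": {"type": "integer"},
            }
        }
    }


def reindex_body(source_index, dest_index):
    """Body for POST _reindex: copy every document from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


if __name__ == "__main__":
    print(movies_v2_index_body())
    print(reindex_body("movies", "movies-v2"))
```

Clients can then be switched to the new index, for example through an index alias.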

Text vs keyword data types
● Text type
○ Full text searches
● Keyword type
○ Exact searches
○ Aggregations
○ Sorting

Text vs Keyword
● Inverted Index
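An inverted index maps each term to the documents that contain it. The text/keyword difference is about which terms go in. A text field is analyzed into tokens, so "mars" matches "Mars Attacks!". A keyword field is indexed as the single, exact value. Below is a toy sketch of that idea. Its analyzer, which lowercases and splits on non-alphanumerics, is a deliberate simplification and not Lucene's standard analyzer.

```python
import re


def analyze_text(value):
    """Rough stand-in for a text analyzer: lowercase, split on non-alphanumerics.

    Real analyzers (e.g. the standard analyzer) handle Unicode, stop words,
    and more; this only illustrates tokenization.
    """
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


def build_inverted_index(docs, field, keyword=False):
    """Map each term to the set of document IDs containing it.

    keyword=True indexes the whole value as one exact term (keyword type);
    otherwise the value is analyzed into tokens (text type).
    """
    index = {}
    for doc_id, doc in docs.items():
        value = doc[field]
        terms = [value] if keyword else analyze_text(value)
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


if __name__ == "__main__":
    docs = {"1": {"title": "Mars Attacks!"}, "2": {"title": "Spirited Away"}}
    print(build_inverted_index(docs, "title"))
    print(build_inverted_index(docs, "title", keyword=True))
```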

OpenSearch Aggregations
● Analyze data
○ In real time too
● Extract statistics
● More expensive than queries
○ In terms of CPU and memory
○ In general

Aggregation Query
● Use the aggs (or aggregations) key in the search request body
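Below is a sketch of an aggregation request and of reading its result for the movies data. It is a terms bucket per genre with an avg sub-aggregation on year, and it sets "size": 0 so that no hits are returned. The sketch assumes that genre was dynamically mapped as text with a keyword sub-field (hence genre.keyword). The aggregation names and the sample response are hypothetical.

```python
def genre_aggregation(top_n=10):
    """Search body: bucket documents by genre and average the year per bucket."""
    return {
        "size": 0,  # skip hits, return only aggregation results
        "aggs": {
            "by_genre": {
                "terms": {"field": "genre.keyword", "size": top_n},
                "aggs": {"avg_year": {"avg": {"field": "year"}}},
            }
        },
    }


def parse_genre_buckets(response):
    """Turn the by_genre buckets into (genre, doc_count, avg_year) tuples."""
    buckets = response["aggregations"]["by_genre"]["buckets"]
    return [(b["key"], b["doc_count"], b["avg_year"]["value"]) for b in buckets]


if __name__ == "__main__":
    # Hypothetical response shape for illustration only.
    sample = {"aggregations": {"by_genre": {"buckets": [
        {"key": "Comedy", "doc_count": 1, "avg_year": {"value": 1996.0}},
    ]}}}
    print(parse_genre_buckets(sample))
```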

Data Streams in OpenSearch
● Ingesting time series data
○ Logs
○ Events
○ Metrics, etc.
● Number of documents grows rapidly
● Append Only data
● Older documents rarely, if ever, need to be updated

Rollover
● If data is growing rapidly
● Write to an index up to a certain threshold
○ Then create a new index
○ And start writing to it
● Optimize the active index for high ingest rates on high-performance hot
nodes.
● Optimize for search performance on warm nodes.
● Shift older, less frequently accessed data to less expensive cold nodes.
● Delete data according to your retention policies by removing entire indices.
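Below is a toy model of the rollover idea: writes go to the newest backing index until a threshold is hit, then a new generation starts. Data stream backing indices are named like .ds-<stream>-000001. In a real cluster, rollover is triggered by the _rollover API or an ISM rollover action that checks conditions such as min_doc_count, min_size, or min_index_age periodically, not on every write. This class is only an illustration, with a hypothetical name and a document-count threshold.

```python
def backing_index_name(stream, generation):
    """Data stream backing indices are named like .ds-<stream>-000001."""
    return f".ds-{stream}-{generation:06d}"


class RolloverSimulator:
    """Toy model: writes go to the newest backing index until it holds
    max_docs documents, then a new generation (backing index) is created."""

    def __init__(self, stream, max_docs):
        self.stream = stream
        self.max_docs = max_docs
        self.generation = 1
        self.docs_in_current = 0
        self.indices = [backing_index_name(stream, 1)]

    def write(self):
        """Record one document and return the backing index it went to."""
        if self.docs_in_current >= self.max_docs:
            self.generation += 1
            self.indices.append(backing_index_name(self.stream, self.generation))
            self.docs_in_current = 0
        self.docs_in_current += 1
        return self.indices[-1]


if __name__ == "__main__":
    sim = RolloverSimulator("my-data-stream", max_docs=2)
    print([sim.write() for _ in range(5)])
```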

Index Template
● Data Stream requires an index template
● A name or wildcard (*) pattern for the data stream.
● The data stream’s timestamp field. This field must be mapped as a date or
date_nanos field data type and must be included in every document indexed
to the data stream.
● The mappings and settings applied to each backing index when it’s created.

ILM Policy
● Index Lifecycle Management Policy
● Can be applied to any number of indices
● Usage
○ Allocate
○ Delete
○ Rollover
○ Read Only
○ Wait for snapshot

ILM Policy
● Create a policy:
● Link

Index Template
● Tells OpenSearch how to configure an index when it is created
● For data streams
○ Configures the stream’s backing indices
○ Configured prior to index creation

Templates Types
● Component Templates
○ Reusable building blocks that configure
■ mappings,
■ settings, and
■ aliases
○ Not directly applied to indices
● Index Template
○ Collection of component templates
○ Directly applied to indices
○ Some defaults: metrics-*-*, logs-*-*

Create Component Template
● Link

Create Index Template
● Data Stream requires matching index template
● PUT _index_template/{template_name}
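Below is a sketch of the two template bodies that a data stream setup typically needs. First is a component template holding reusable settings and the @timestamp mapping. Second is an index template that marks matching names as data streams with "data_stream": {}, whose timestamp field defaults to @timestamp, and pulls in the component via composed_of. The template names, shard counts, and priority are assumptions.

```python
def component_template(shards=1, replicas=1):
    """Body for PUT _component_template/<name>: reusable settings and mappings."""
    return {
        "template": {
            "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
            "mappings": {"properties": {"@timestamp": {"type": "date"}}},
        }
    }


def data_stream_index_template(index_pattern, component_names, priority=100):
    """Body for PUT _index_template/<name>.

    "data_stream": {} turns matching names into data streams; "composed_of"
    lists the component templates to merge into each backing index.
    """
    return {
        "index_patterns": [index_pattern],
        "data_stream": {},
        "composed_of": list(component_names),
        "priority": priority,
    }


if __name__ == "__main__":
    # Hypothetical names: PUT _component_template/logs-settings,
    # then PUT _index_template/my-data-stream-template
    print(component_template())
    print(data_stream_index_template("my-data-stream*", ["logs-settings"]))
```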

Create Index Template
● Link

Create data stream
● Documents must contain timestamp field
● PUT _data_stream/my-data-stream
● Stream’s name must match one of your index template’s index patterns
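Because every document sent to a data stream must carry the timestamp field, it helps to add it in one place. Below is a sketch that stamps a document with @timestamp, the default timestamp field, in ISO 8601 form. Such a document would then be sent with POST my-data-stream/_doc. The helper name and the sample field are hypothetical.

```python
from datetime import datetime, timezone


def data_stream_document(fields, timestamp=None):
    """Return a copy of fields with the @timestamp value a data stream requires.

    Uses the current UTC time unless a timezone-aware timestamp is given.
    """
    ts = timestamp or datetime.now(timezone.utc)
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    doc = dict(fields)
    doc["@timestamp"] = ts.isoformat()
    return doc


if __name__ == "__main__":
    print(data_stream_document({"message": "login"}))
```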

Get Info About Data Stream
● GET _data_stream/my-data-stream

Delete Data Stream
● DELETE _data_stream/my-data-stream

Cross Cluster Replication
● Cross Cluster replication plugin
○ Replicates indexes, mappings & metadata from one cluster to another
● Advantages
○ Continue to handle search requests if there is an outage
○ Can help reduce application latency
■ By replicating data across geographically distant data centers

Replication
● Active-passive model
○ Follower index pulls data from leader index
● It can be
○ Started
○ Paused
○ Stopped
○ Resumed
● Can be secured
○ Security plugin
○ Encrypt cross-cluster traffic

Exercise
● Create 2 domains in AWS OpenSearch
● Link

Exercise
● Source Domain Connections Tab -> Outbound ->
○ Create Connection to Destination Domain
● Set access policy on destination domain:
● Link

Exercise
● Get Connection status
○ GET _plugins/_replication/connect1/_status
● Start syncing
○ PUT _plugins/_replication/connect1/_start
{
  "leader_alias": "Connect1",
  "leader_index": "movies",
  "use_roles": {
    "leader_cluster_role": "all_access",
    "follower_cluster_role": "all_access"
  }
}
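In the replication API, the path segment before _start names the follower index to create on the follower cluster. The body names the connection alias and the leader index. Below is a sketch that builds this request. The follower index name is hypothetical. The all_access roles mirror the slide, but narrower roles are usually preferable outside a demo.

```python
def start_replication_request(follower_index, leader_alias, leader_index,
                              leader_role="all_access", follower_role="all_access"):
    """Return (method, path, body) for starting cross-cluster replication.

    The path names the follower index on this (follower) cluster;
    leader_alias is the name of the connection to the leader domain.
    """
    body = {
        "leader_alias": leader_alias,
        "leader_index": leader_index,
        "use_roles": {
            "leader_cluster_role": leader_role,
            "follower_cluster_role": follower_role,
        },
    }
    return "PUT", f"_plugins/_replication/{follower_index}/_start", body


if __name__ == "__main__":
    print(start_replication_request("movies-follower", "Connect1", "movies"))
```

Replication status for the same follower index would then be checked with GET _plugins/_replication/<follower-index>/_status.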

OpenSearch plugins
● Standalone components
○ That add features and capabilities
● Huge number of plugins available
● E.g.
○ Replication Plugin
○ Security plugin
○ Notifications plugin

SQL Plugin
● Lets you run SQL queries against OpenSearch indices
● Add data
○ PUT movies/_doc/1
{ "title": "Spirited Away" }
● Query data
○ POST _plugins/_sql
{
  "query": "SELECT * FROM movies LIMIT 50"
}

SQL Plugin
● Delete data from an index
● First, enable DELETE in the SQL plugin
○ PUT _plugins/_query/settings
{
  "transient": {
    "plugins.sql.delete.enabled": "true"
  }
}

SQL Plugin - Delete
● To delete the data
○ POST _plugins/_sql
{
  "query": "DELETE FROM movies"
}

Asynchronous Search
● Large volumes of data
● Can take longer to search
● Async
○ Run searches in the background
○ Monitor progress of these searches
○ Get back partial results as they become available

Asynchronous Search
● POST _plugins/_asynchronous_search
● Response contents:
○ ID
■ Can be used to track the state of the search
■ Get partial results
○ State
■ Running
■ Completed
■ Persisted
● Link
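Below is a client-side sketch of the monitoring loop, which keeps polling while the search is still running. The get_status argument is a hypothetical caller-supplied function, for example one that sends GET _plugins/_asynchronous_search/<id> and returns the parsed JSON. The sketch assumes the running state is reported as the string "RUNNING" and treats any other state as finished.

```python
import time


def wait_for_async_search(get_status, search_id, poll_seconds=1.0, timeout=60.0,
                          sleep=time.sleep):
    """Poll an asynchronous search until its state is no longer "RUNNING".

    get_status(search_id) must return the parsed status response (a dict).
    Raises TimeoutError if the search is still running after `timeout` seconds.
    """
    waited = 0.0
    while True:
        response = get_status(search_id)
        if response.get("state") != "RUNNING":
            return response
        if waited >= timeout:
            raise TimeoutError(f"async search {search_id} still running after {timeout}s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Fake status source for illustration: running twice, then finished.
    states = iter([{"state": "RUNNING"}, {"state": "RUNNING"}, {"state": "SUCCEEDED"}])
    print(wait_for_async_search(lambda _id: next(states), "demo-id", sleep=lambda s: None))
```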

Clients
● OpenSearch Python client
● OpenSearch JavaScript (Node.js) client
● OpenSearch .NET clients
● OpenSearch Go client
● OpenSearch PHP client

OpenSearch Client for .NET
● OpenSearch.Net
○ Low level client
● OpenSearch.Client
○ High level client
● Sample code: Link

Exercise
● Create a .NET application
● Add a document to OpenSearch using the .NET Application
○ OpenSearch.Client (.NET High level client)

Beats
● Data shippers
● Agents on servers
● Send data to ES/ Logstash

Grafana
● An open source visualization tool
● Supports various data sources, such as:
○ InfluxDB
○ MySQL
○ ElasticSearch
○ PostgreSQL
● Better suited for metrics visualizations
● Does not allow full text data querying

Logstash
● Free/ Open-Source
● Data processing pipeline
● Ingests data from multitude of sources
● Transforms it
● Sends it to your favorite stash

Logstash - Ingestion
● Data of all shapes/ sizes/ sources
○ Can be ingested
● It can parse/ transform your data

Logstash - Output
● ElasticSearch
● MongoDB
● S3
● Etc.
● Link

AWS OpenSearch Security
● Use multi-factor authentication (MFA) with each account.
● Use SSL/TLS to communicate with AWS resources. We recommend TLS 1.2
or later.
● Set up API and user activity logging with AWS CloudTrail.
● Use AWS encryption solutions, along with all default security controls within
AWS services.
● Use advanced managed security services such as Amazon Macie, which
assists in discovering and securing personal data that is stored in Amazon S3.
● If you require FIPS 140-2 validated cryptographic modules when accessing
AWS through a command line interface or an API, use a FIPS endpoint.

Summary
● OpenSearch
○ Open-source search solution
● Upcoming and supported by AWS
● Caters to most search use cases
○ Great query performance
● Powerful tools
● Community Support

Connect with me
● Trainings on various tech topics
● For any questions:
○ https://linkedin.com/in/coach4dev
