© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Searching your data with Amazon
Elasticsearch Service
Jon Handler
Principal Solutions Architect
AWS
ANT384
Alolita Sharma
Principal Technologist
AWS
Elasticsearch – popular, open-source database engine
Open source
Fast time to value
Easy ingestion
Easy visualization
High performance and distributed
Best analytics and search
Built for search and analysis
Text search: natural language, Boolean queries, relevance
Streaming: high-volume ingest, near real time, distributed storage
Analysis: time-based visualizations, nestable statistics, time series tools
What does it do?
Application data; server, application, network, AWS, and other logs
→ Amazon Elasticsearch Service domain with index(es)
→ Application users, analysts, DevOps, security
Amazon Elasticsearch Service (Amazon ES)
Amazon ES is a fully managed service that makes it easy to
deploy, manage, and scale Elasticsearch and Kibana
Benefits of Amazon ES
Tightly Integrated with Other AWS Services
Seamless data ingestion, security,
auditing and orchestration
Supports Open-Source
APIs and Tools
Drop-in replacement with no need
to learn new APIs or skills
Easy to Use
Deploy a production-ready
Elasticsearch cluster in minutes
Scalable
Resize your cluster with a few
clicks or a single API call
Secure
Deploy into your VPC and restrict
access using security groups and
IAM policies
Highly Available
Replicate across Availability
Zones, with monitoring and
automated self-healing
Amazon ES customers
• Software & Internet
• Financial Services
• Education Technology
• Biotech and Pharma
• Media and Entertainment
• Social Media
• Telecommunications
• Travel & Transportation
• Real Estate
• Logistics & Operations
• Publishing
• Other
Foundations
Why do you need search?
Your customers want to find the products that you want them to buy
Your organization needs to share information across departments
You need a ride to the airport
Your developers need to know what's going wrong with the new feature
Your website is down, it's 3:00 a.m., and you need to diagnose and correct
the problem
Search enables locating a specific piece of
information, relevant to a goal
You create documents that are the source of indexing
and the targets of search
• You put your data in fields (analogy:
database columns)
• You send documents to an
Elasticsearch index
• Elasticsearch indexes data for each
field, according to a mapping
• Fields can have free text, numerics,
geo locations, dates, and more
• Your query specifies the fields to
search and the values to look for
and retrieves a sorted list of IDs
ID
Field: value
Field: value
Field: value
Field: value
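As a rough illustration of the mapping idea above, the following sketch declares a type for each field of a movie document. The field names follow the sample movie document in this deck; the choice of types and the helper name `movie_properties` are assumptions for illustration, and how the `properties` block is wrapped in a create-index request differs between Elasticsearch versions.

```python
import json


def movie_properties():
    """Return a hypothetical 'properties' block: one type per field."""
    return {
        # Free text that should be analyzed for full-text search.
        "title": {"type": "text"},
        "plot": {"type": "text"},
        # Whole-value strings used for exact matching and aggregations.
        "actors": {"type": "keyword"},
        "directors": {"type": "keyword"},
        "genres": {"type": "keyword"},
        # Dates and numerics.
        "release_date": {"type": "date"},
        "rating": {"type": "float"},
        "rank": {"type": "integer"},
        "running_time_secs": {"type": "integer"},
    }


print(json.dumps({"properties": movie_properties()}, indent=2))
```

Text fields feed full-text matching, while keyword, date, and numeric fields suit exact matching, sorting, and aggregations.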
Documents are JSON
{
  "id" : "tt0371746",
  "title" : "Iron Man",
  "release_date" : "2008-04-14T00:00:00Z",
  "actors" : [
    "Robert Downey Jr.",
    "Gwyneth Paltrow",
    "Terrence Howard"
  ],
  "directors" : [
    "Jon Favreau"
  ],
  "rating" : 7.9,
  "rank" : 171,
  "running_time_secs" : 7560,
  "genres" : [
    "Action",
    "Adventure",
    "Sci-Fi"
  ],
  "plot" : "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil."
}
Search APIs unlock the information
• Your application presents your users with a frame that lets them
express their goal – e.g., find a movie I want to watch
• It translates that goal to an Elasticsearch query
• Elasticsearch returns relevant results
ID
Field: value
Field: value
Field: value
Field: value
ID
Field: value
Field: value
Field: value
Field: value
ID
Field: value
Field: value
Field: value
Field: value
Writing structured
queries
Different ways to query data
Query string queries (URI/Kibana)
• Supports fielded string matching for document terms
• Lucene query syntax: AND, OR, NOT, +/-/*, etc.
• Most useful for Kibana search
Query DSL
Structure of a Query DSL query
{
"query": { ... },
"aggs" : { ... },
"size": 10,
"from": 0,
"sort": { ... },
"_source": [...],
"highlight": { ... },
"explain": true,
...
}
• Query – all of the query
clauses
• Aggs – Aggregations
• Size and from – pagination
• Sort – field-based sorting
• _source – control return
fields
• Highlight – fields to highlight
• Explain – show scoring info
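To make the pagination and return-field parts of this structure concrete, here is a minimal sketch that assembles a request body from a page number. The helper name `build_search_body` and its parameters are hypothetical; only the `query`, `from`, `size`, and `_source` keys come from the structure above.

```python
import json


def build_search_body(query, page=0, page_size=10, fields=None):
    """Assemble a Query DSL body for one page of results.

    'from' is the offset of the first hit, so page N starts at N * page_size.
    """
    body = {"query": query, "from": page * page_size, "size": page_size}
    if fields:
        # Limit the returned document to the listed fields.
        body["_source"] = list(fields)
    return body


body = build_search_body({"match_all": {}}, page=2, page_size=10,
                         fields=["title", "rating"])
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a `_search` request.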
Query DSL for querying data
GET endpoint/index/_search
{
"query": {
"simple_query_string": {
"query": "Iron Man",
"fields": ["title"]
}
}
}
Search results
• Results are JSON
documents – retrieve any
or all fields
• Overview
• Hits
Kinds of queries (the "query" part of Query DSL)
Full-text search ("match")
• Blocks of natural language text are decomposed and processed to support matching
• Queries are textual and match any document that contains them
• A scoring function produces a measure of relevance for sorting results
Exact matching ("terms")
• Full-value matching for shorter strings, numeric values, locations, and dates
• Used for filtering full-text matches
• Overlaps with database functionality
Queries vs. Filters
Query
• How well does a document match a criterion?
• Calculates a _score
• No caching
Filter
• Does this document match this criterion?
• Answer is binary – yes or no
• No score is calculated
• Filters are cached
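One way to combine the two is a `bool` query, where `must` clauses contribute to the score and `filter` clauses only include or exclude documents. The sketch below is an illustration under assumptions: the function name is hypothetical, and it assumes `genres` is indexed as a keyword field and `rating` as a number.

```python
import json


def scored_and_filtered(text, genre, min_rating):
    """Score on the title text, then narrow with yes/no filters."""
    return {
        "query": {
            "bool": {
                # Scored: how well does the title match the text?
                "must": [{"match": {"title": text}}],
                # Unscored: does the document pass each criterion?
                "filter": [
                    {"term": {"genres": genre}},
                    {"range": {"rating": {"gte": min_rating}}},
                ],
            }
        }
    }


print(json.dumps(scored_and_filtered("iron man", "Action", 7.0), indent=2))
```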
Terms queries
term/terms   Matches one term or a list of terms
range        Matches a numeric range
exists       Matches documents with a non-null value for a field
prefix       Matches docs with terms that match the prefix
wildcard     Supports ? and * wildcard matching
regexp       Matches a regular expression
fuzzy        Matches terms within a specified edit distance
type/ids     Matches all docs of a type, or docs by ID (optionally within a type)
Compound queries across multiple fields
Titles
Iron Man 2
Iron Man 3
Sky Captain and the World of Tomorrow
Text search
Text Analysis
• Decomposes the text into a set of terms
• Applies natural language rules
• When querying a text field, apply the same analysis
Input Text Analyzers
Searchable
Index
Query
The standard analyzer
Standard tokenizer – Unicode segmenter
Standard token filter – noop
Lowercase token filter – lower case
Stop token filter – disabled
The standard analyzer: example output
when wealthy industrialist tony
stark is forced to build an
armored suit after a life
threatening incident he
ultimately decides to use its
technology to fight against evil
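The effect of the standard analyzer on the sample plot can be approximated in a few lines. This is only a sketch: it lowercases the text and splits on runs of word characters, which is an assumption standing in for the real Unicode segmentation rules and will differ on inputs such as apostrophes or some non-Latin scripts.

```python
import re


def approximate_standard_analyzer(text):
    """Lowercase the text and split it into word tokens.

    A simplification of tokenizer + lowercase filter; no stopwords removed.
    """
    return re.findall(r"\w+", text.lower())


plot = ("When wealthy industrialist Tony Stark is forced to build an armored "
        "suit after a life-threatening incident, he ultimately decides to use "
        "its technology to fight against evil.")
tokens = approximate_standard_analyzer(plot)
print(" ".join(tokens))
```

Note that "life-threatening" becomes two tokens and the punctuation disappears, as in the analyzed text shown for this plot.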
Improve matching - stemming
when wealthi
industrialist toni
stark is forc to build
an armored suit after a
life threaten incid, he
ultim decid to us it
technolog to fight
against evil.
Improve matching - stopwords
when wealthi
industrialist toni
stark is forc to build
an armored suit after a
life threaten incid, he
ultim decid to us it
technolog to fight
against evil.
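Stopword removal can be sketched as a simple filter over the stemmed tokens. The stopword set below is a small illustrative list chosen for this example (an assumption, not the list any particular analyzer ships with), and the stemmed tokens are copied from the text above with punctuation dropped.

```python
# Hypothetical, deliberately small stopword list for illustration.
STOPWORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop very common words that carry little meaning for matching."""
    return [t for t in tokens if t not in stopwords]


stemmed = ("when wealthi industrialist toni stark is forc to build an armored "
           "suit after a life threaten incid he ultim decid to us it technolog "
           "to fight against evil").split()
kept = remove_stopwords(stemmed)
print(" ".join(kept))
```

With this list, seven of the 28 tokens are dropped, leaving fewer and more distinctive terms to index and match.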
Match query clauses
match_all            Match all documents
match                Match words
match_phrase         Match an exact phrase, or words within a given proximity
match_phrase_prefix  Match the beginning of a phrase
multi_match          Match against more than one field
common_terms         Allow stopwords to have some influence on scoring
query_string         Use Lucene query syntax
simple_query_string  Simpler, more robust version of the query_string syntax
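A few of these clauses, written out as request bodies, might look like the sketch below. The function names are hypothetical, the field names follow the sample movie document, and the `title^2` boost is an arbitrary choice for illustration.

```python
import json


def match_title(text):
    """match: analyzed words anywhere in the title."""
    return {"query": {"match": {"title": text}}}


def phrase_in_plot(phrase, slop=0):
    """match_phrase: words in order; slop allows that many position moves."""
    return {"query": {"match_phrase": {"plot": {"query": phrase, "slop": slop}}}}


def title_or_plot(text):
    """multi_match: search several fields, weighting title matches higher."""
    return {"query": {"multi_match": {"query": text,
                                      "fields": ["title^2", "plot"]}}}


for body in (match_title("iron man"),
             phrase_in_plot("armored suit", slop=1),
             title_or_plot("steal")):
    print(json.dumps(body))
```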
Text queries
Plot
Big budget remake of the classic cartoon
about a creature intent on stealing Christmas.
A scientist in a surrealist society kidnaps
children to steal their dreams, hoping that
they slow his aging process.
The Hughes cottage vacation is violently
interrupted by a family on a murderous and
identity stealing journey, in search of the
"perfect" life.
Comic caper movie about a plan to steal a
gold shipment from the streets of Turin by
creating a traffic jam.
Aggregations
Query structure for aggregations
{
"query": { ... },
"aggs": {
"NAME": {
"terms": {
"field": "NAME",
"size": 10
}
}
}
}
Aggregations have a name
Terms – the type of the
aggregation
Field – use keyword fields to
keep cardinality low
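Filling in the placeholders, a terms aggregation over actors for a title query could be built as in this sketch. The helper name `terms_agg` is hypothetical, and it assumes `actors` is mapped as a keyword field; the name `actors_agg` is simply a label chosen by the caller.

```python
import json


def terms_agg(name, field, size=10):
    """Build one named terms aggregation: the top 'size' values of 'field'."""
    return {name: {"terms": {"field": field, "size": size}}}


body = {
    "query": {"match": {"title": "iron"}},
    "aggs": terms_agg("actors_agg", "actors"),
}
print(json.dumps(body, indent=2))
```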
Aggregation results
• Results follow the "hits"
section of the search result
• Buckets and counts translate
directly to the UI
• Send the "key" as a filter to
narrow search results
"aggregations": {
"actors_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 10,
"buckets": [
{
"key": "Gwyneth Paltrow",
"doc_count": 3
},
{
"key": "Robert Downey Jr.",
"doc_count": 3
},
{
"key": "Christopher Kirby",
"doc_count": 1
},
{
"key": "Cung Le",
"doc_count": 1
}, ...
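Turning buckets into UI facets is mostly a matter of reading `key` and `doc_count` from the response. The sketch below uses a hand-written response fragment shaped like the one above; the helper name `facet_counts` is hypothetical.

```python
def facet_counts(response, agg_name):
    """Return (key, doc_count) pairs for one named terms aggregation."""
    buckets = (response.get("aggregations", {})
                       .get(agg_name, {})
                       .get("buckets", []))
    return [(b["key"], b["doc_count"]) for b in buckets]


# Fragment of a search response, trimmed to the aggregations section.
sample_response = {
    "aggregations": {
        "actors_agg": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 10,
            "buckets": [
                {"key": "Gwyneth Paltrow", "doc_count": 3},
                {"key": "Robert Downey Jr.", "doc_count": 3},
                {"key": "Christopher Kirby", "doc_count": 1},
            ],
        }
    }
}

for key, count in facet_counts(sample_response, "actors_agg"):
    print(f"{key} ({count})")
```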
Aggregations nest
• The query matches title with
"iron"
• Top-level bucket is actors
• Each actor bucket is further
divided by directors
"aggregations": {
"actors_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 10,
"buckets": [
{
"key": "Gwyneth Paltrow",
"doc_count": 3,
"directors_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jon Favreau",
"doc_count": 2
},
{
"key": "Shane Black",
"doc_count": 1
}
]
}
} ...
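A request that would produce a nested result of this shape places a second `aggs` block inside the first aggregation. This is a sketch under assumptions: the helper name is hypothetical, and both `actors` and `directors` are assumed to be keyword fields.

```python
import json


def nested_terms_agg(outer_name, outer_field, inner_name, inner_field, size=10):
    """Bucket by outer_field, then split each bucket by inner_field."""
    return {
        outer_name: {
            "terms": {"field": outer_field, "size": size},
            # Sub-aggregations run once per bucket of the outer aggregation.
            "aggs": {
                inner_name: {"terms": {"field": inner_field, "size": size}}
            },
        }
    }


body = {
    "query": {"match": {"title": "iron"}},
    "aggs": nested_terms_agg("actors_agg", "actors",
                             "directors_agg", "directors"),
}
print(json.dumps(body, indent=2))
```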
Faceted drill down: add aggregation results as query
filters
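Drill-down can be sketched as wrapping the current query in a `bool` query and adding the selected bucket key as a `term` filter. The helper name `drill_down` is hypothetical, and the example assumes the aggregated field is a keyword field so that the bucket key matches exactly.

```python
import copy
import json


def drill_down(body, field, key):
    """Return a new request body narrowed to documents where field == key."""
    narrowed = copy.deepcopy(body)  # leave the caller's body untouched
    original_query = narrowed.get("query", {"match_all": {}})
    narrowed["query"] = {
        "bool": {
            "must": [original_query],          # keeps the original scoring
            "filter": [{"term": {field: key}}]  # yes/no narrowing, no score
        }
    }
    return narrowed


base = {"query": {"match": {"title": "iron"}}}
narrowed = drill_down(base, "actors", "Gwyneth Paltrow")
print(json.dumps(narrowed, indent=2))
```

Calling the helper again on its own output adds another level of narrowing, for example by director after actor.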
Adjusting score
Scoring text queries: Okapi BM25
• A member of the “tf-idf” family of ranking functions
Term Frequency – count of occurrences of the term in this document
Inverse Document Frequency – rarer terms count for more (roughly 1 / number of documents in the corpus that contain the term)
• Additional options such as:
Field-value-based ranking
Via rank functions that include document information
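For intuition, the commonly published form of the BM25 score for a single term can be written in a few lines. This is a sketch, not the engine's code: the function name and argument names are hypothetical, the defaults k1 = 1.2 and b = 0.75 are the usual textbook values, and the exact formula inside Lucene varies by version.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Score one query term against one document with textbook BM25."""
    # Rare terms (few documents contain them) get a larger weight.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term frequency saturates, and long documents are penalized via b.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


rare = bm25_term_score(tf=1, doc_len=10, avg_doc_len=10,
                       n_docs=1000, docs_with_term=10)
common = bm25_term_score(tf=1, doc_len=10, avg_doc_len=10,
                         n_docs=1000, docs_with_term=500)
print(rare > common)
```

Two properties distinguish this from a plain tf × idf product: repeated occurrences add less and less to the score, and a match in a short field counts for more than the same match in a long one.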
Structure of the function_score query
{
  "query": {
    "function_score": {
      "query": { ... },
      "script_score": {
        "script": {
          "inline": "..."
        }
      }
    }
  }
}
• Use the _score as part of the
function
• Use built-in functions like
gauss, linear, exponential
• Works on numbers, dates,
and geo points
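As an illustration of the built-in functions, the sketch below combines the text score with a rating factor and a date decay. It is an assumption-laden example, not the scoring used for the results in this deck: the function name, the ten-year scale, and the `score_mode` and `boost_mode` choices are all arbitrary, and the field names follow the sample movie document.

```python
import json


def title_with_rating_and_freshness(text):
    """Boost text matches by rating and by closeness of release_date to now."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "functions": [
                    # Use the numeric rating as a factor; treat missing as 1.
                    {"field_value_factor": {"field": "rating", "missing": 1}},
                    # Gaussian decay: the factor halves about ten years from now.
                    {"gauss": {"release_date": {"origin": "now",
                                                "scale": "3650d",
                                                "decay": 0.5}}},
                ],
                "score_mode": "sum",       # how the function values combine
                "boost_mode": "multiply",  # how they combine with _score
            }
        }
    }


print(json.dumps(title_with_rating_and_freshness("love"), indent=2))
```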
Results for querying for ‘love’
“title” _score
Love 5.718655
Love Sick Love 4.5949717
Love Actually 4.5949717
Stuck in Love 4.5949717
The Lovely Bones 4.5949717
Shakespeare in Love 4.5949717
To Rome with Love 4.5949717
The Look of Love 4.5949717
Love and Honor 4.5949717
The Loved Ones 4.5949717
Add in rating and freshness
Revised scores
Title                 _score     Date      Rating  Orig _score
Love Actually         47.676254  20030907  7.7     4.5949717
Shakespeare in Love   44.878765  19981203  7.2     4.5949717
Stuck in Love         44.278923  20120909  7       4.5949717
Love                  42.763493  20110202  5.5     5.718655
Love & Basketball     42.64078   20000126  6.8     4.5949717
Love Story            42.08128   19701216  6.7     4.5949717
The Lovely Bones      41.54153   20091124  6.6     4.5949717
The Loved Ones        41.536766  20090913  6.6     4.5949717
Geo points and shapes
• Store geo hashes, points, polygons
• Bounding box filtering
• Polygon filtering
• Distance sorting
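Distance filtering and sorting can be sketched as follows. The field name `location` is hypothetical and is assumed to be mapped as a geo point; the helper name, the coordinates, and the 10 km radius are illustrative choices only.

```python
import json


def nearby(lat, lon, distance="10km", field="location"):
    """Keep documents within 'distance' of a point, nearest first."""
    point = {"lat": lat, "lon": lon}
    return {
        "query": {
            "bool": {
                # Yes/no filter: is the document's point inside the radius?
                "filter": [{"geo_distance": {"distance": distance,
                                             field: point}}]
            }
        },
        # Order the surviving documents by distance from the same point.
        "sort": [{"_geo_distance": {field: point,
                                    "order": "asc",
                                    "unit": "km"}}],
    }


print(json.dumps(nearby(36.17, -115.14), indent=2))
```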
Discussion
Thank you!
Jon Handler
handler@amazon.com
Alolita Sharma
alolitas@amazon.com
Searching Your Data with Amazon Elasticsearch Service (ANT384) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Searching your data with Amazon Elasticsearch Service Jon Handler Principal Solutions Architect AWS A N T 3 8 4 Alolita Sharma Principal Technologist AWS
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Elasticsearch – popular, open-source database engine Open source Fast time to value Easy ingestion Easy visualization High performance and distributed Best analytics and search
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Built for search and analysis Natural language Boolean queries Relevance Text search High-volume ingest Near real time Distributed storage Streaming Time-based visualizations Nestable statistics Time series tools Analysis 0010110100101110001 0111100110000001100 0100110010001100110 0110100001101010011
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What does it do? Application DataServer, application, network, AWS, and other logs Amazon Elasticsearch Service Domain with index(es) Application users, analysts, DevOps, security
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon ES is a fully managed service that makes it easy to deploy, manage, and scale Elasticsearch and Kibana Amazon Elasticsearch Service (Amazon ES)
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Tightly Integrated with Other AWS Services Seamless data ingestion, security, auditing and orchestration Benefits of Amazon ES Supports Open-Source APIs and Tools Drop-in replacement with no need to learn new APIs or skills Easy to Use Deploy a production-ready Elasticsearch cluster in minutes Scalable Resize your cluster with a few clicks or a single API call Secure Deploy into your VPC and restrict access using security groups and IAM policies Highly Available Replicate across Availability Zones, with monitoring and automated self-healing
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Software & Internet Financial ServicesEducation Technology Biotech and Pharma Media and Entertainment Social Media Telecommunications Travel & Transportation Real Estate Logistics & Operations Publishing Other Amazon ES customers
  • 9. Foundations
  • 10. Why do you need search?
      • Your customers want to find the products that you want them to buy
      • Your organization needs to share information across departments
      • You need a ride to the airport
      • Your developers need to know what's going wrong with the new feature
      • Your website is down, it's 3:00 a.m., and you need to diagnose and correct the problem
      Search lets you locate a specific piece of information that is relevant to a goal.
  • 11. You create documents that are the source of indexing and the targets of search
      • You put your data in fields (analogy: database columns)
      • You send documents to an Elasticsearch index
      • Elasticsearch indexes data for each field, according to a mapping
      • Fields can have free text, numerics, geo locations, dates, and more
      • Your query specifies the fields to search and the values to look for, and retrieves a sorted list of IDs
  • 12. Documents are JSON
      {
        "id": "tt0371746",
        "title": "Iron Man",
        "release_date": "2008-04-14T00:00:00Z",
        "actors": [ "Robert Downey Jr.", "Gwyneth Paltrow", "Terrence Howard" ],
        "directors": [ "Jon Favreau" ],
        "rating": 7.9,
        "rank": 171,
        "running_time_secs": 7560,
        "genres": [ "Action", "Adventure", "Sci-Fi" ],
        "plot": "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil."
      }
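As a rough illustration of sending documents like this one to an index, the sketch below serializes them into the newline-delimited body that the Elasticsearch `_bulk` API expects: an action line followed by the document itself. The index name `movies` and the helper `to_bulk_lines` are hypothetical, and depending on the Elasticsearch version the action line or the request path may also need a document type.

```python
import json


def to_bulk_lines(index_name, docs):
    """Serialize documents into a newline-delimited _bulk request body."""
    lines = []
    for doc in docs:
        # Action line: which index and document ID to write to.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


# A shortened version of the movie document from the slide.
movie = {
    "id": "tt0371746",
    "title": "Iron Man",
    "release_date": "2008-04-14T00:00:00Z",
    "directors": ["Jon Favreau"],
    "rating": 7.9,
}

body = to_bulk_lines("movies", [movie])  # "movies" is a hypothetical index name
print(body)
```

The resulting string would be sent as the body of a `POST` to the domain's `_bulk` endpoint by whatever HTTP client the application already uses.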
  • 13. Search APIs unlock the information
      • Your application presents your users with a frame that lets them express their goal, e.g., find a movie I want to watch
      • It translates that goal to an Elasticsearch query
      • Elasticsearch returns relevant results
  • 14. Writing structured queries
  • 15. Different ways to query data: query string queries (URI/Kibana) and Query DSL. Query string queries:
      • Support fielded string matching for document terms
      • Use Lucene query syntax: AND, OR, NOT, +/-/*, etc.
      • Are most useful for Kibana search
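A query string search can be sent as a `q` parameter on the `_search` URI. The sketch below only builds and re-parses such a URL, so it runs without a cluster; the endpoint, the index name `movies`, and the helper `uri_search_url` are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def uri_search_url(endpoint, index, lucene_query, size=10):
    """Build a URI search request URL; urlencode escapes the Lucene syntax."""
    return f"{endpoint}/{index}/_search?" + urlencode({"q": lucene_query, "size": size})


# Hypothetical domain endpoint and index name.
url = uri_search_url(
    "https://search-example.us-east-1.es.amazonaws.com",
    "movies",
    "title:iron AND rating:[7 TO *]",
)

# Round-trip the query string to confirm nothing was lost in escaping.
params = parse_qs(urlsplit(url).query)
print(url)
print(params["q"][0])
```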
  • 16. Structure of a Query DSL query
      {
        "query": { ... },
        "aggs": { ... },
        "size": 10,
        "from": 0,
        "sort": { ... },
        "_source": [...],
        "highlight": { ... },
        "explain": true,
        ...
      }
      • query: all of the query clauses
      • aggs: aggregations
      • size and from: pagination
      • sort: field-based sorting
      • _source: controls which fields are returned
      • highlight: fields to highlight
      • explain: show scoring info
  • 17. Query DSL for querying data
      GET endpoint/index/_search
      {
        "query": {
          "simple_query_string": {
            "query": "Iron Man",
            "fields": ["title"]
          }
        }
      }
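An application would typically assemble this request body from user input. The following is a minimal sketch of that step, combining the `simple_query_string` clause above with the pagination and `_source` keys from the previous slide; the helper name `build_search_body` and its parameters are assumptions, not part of any client library.

```python
import json


def build_search_body(text, fields, size=10, start=0, source_fields=None):
    """Assemble a Query DSL body around a simple_query_string clause."""
    body = {
        "query": {"simple_query_string": {"query": text, "fields": fields}},
        "size": size,   # page size
        "from": start,  # offset of the first hit to return
    }
    if source_fields is not None:
        # Limit which document fields come back in each hit.
        body["_source"] = source_fields
    return body


body = build_search_body("Iron Man", ["title"], size=5, source_fields=["title", "rating"])
print(json.dumps(body, indent=2))
```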
  • 18. Search results: results are JSON documents, and you can retrieve any or all fields. The response has an overview section and a hits section.
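To make the overview and hits sections concrete, here is a small sketch that pulls the total and the per-hit fields out of a search response. The sample response is invented for illustration (IDs and scores are not from the talk); the key names follow the standard response layout, where newer Elasticsearch versions report the total as an object rather than an integer.

```python
def summarize_hits(response):
    """Return (total, [(id, score, title), ...]) from a search response."""
    hits = response["hits"]
    total = hits["total"]
    # Elasticsearch 7+ reports {"value": ..., "relation": ...};
    # earlier versions report a plain integer.
    if isinstance(total, dict):
        total = total["value"]
    rows = [(h["_id"], h["_score"], h["_source"].get("title")) for h in hits["hits"]]
    return total, rows


# Hypothetical response with made-up IDs and scores.
sample_response = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": 2,
        "max_score": 1.5,
        "hits": [
            {"_id": "a1", "_score": 1.5, "_source": {"title": "Iron Man"}},
            {"_id": "a2", "_score": 1.2, "_source": {"title": "Iron Man 2"}},
        ],
    },
}

total, rows = summarize_hits(sample_response)
print(total, rows)
```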
  • 19. Kinds of queries (the "query" part of Query DSL)
      • Full-text search ("match"): blocks of natural language text are decomposed and processed to support matching. Queries are textual and match any document that contains them. A scoring function produces a measure of relevance for sorting results.
      • Exact matching ("terms"): full-value matching for shorter strings, numeric values, locations, and dates. Used for filtering full-text matches. Overlaps with database functionality.
  • 20. Queries vs. filters
      • Query: how well does this document match this criterion? Calculates a _score. No caching.
      • Filter: does this document match this criterion? The answer is binary, yes or no. No score is calculated. Filters are cached.
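One common way to combine the two is a `bool` query, with scored clauses under `must` and yes/no clauses under `filter`. The sketch below builds such a body; the helper name is hypothetical, and `genres.keyword` assumes the index has a keyword sub-field for `genres`, which depends on the mapping.

```python
def scored_and_filtered(text, min_rating, genre):
    """Score on the title text, then narrow with unscored filters."""
    return {
        "query": {
            "bool": {
                # Scored: contributes to _score and to result ordering.
                "must": [{"match": {"title": text}}],
                # Unscored: documents either pass or are dropped.
                "filter": [
                    {"range": {"rating": {"gte": min_rating}}},
                    {"term": {"genres.keyword": genre}},  # assumed keyword sub-field
                ],
            }
        }
    }


body = scored_and_filtered("iron man", 7, "Action")
print(body)
```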
  • 21. Terms queries
      • term/terms: matches one term or a list of terms
      • range: matches a numeric range
      • exists: matches documents with a non-null value for a field
      • prefix: matches docs with terms that match the prefix
      • wildcard: supports ? and * wildcard matching
      • regexp: matches a regular expression
      • fuzzy: matches terms within a specified edit distance
      • type/ids: matches all docs of a type, or docs with the given IDs
  • 22. Compound queries across multiple fields. Example result titles: Iron Man 2; Iron Man 3; Sky Captain and the World of Tomorrow.
  • 23. Text search
  • 24. Text analysis
      • Decomposes the text into a set of terms
      • Applies natural language rules
      • When querying a text field, apply the same analysis
      (Diagram labels: input text, analyzers, searchable index, query.)
  • 25. The standard analyzer
      • Standard tokenizer: Unicode segmenter
      • Standard token filter: no-op
      • Lowercase token filter: lowercases each token
      • Stop token filter: disabled
  • 26. The standard analyzer, applied to the example plot text: when wealthy industrialist tony stark is forced to build an armored suit after a life threatening incident he ultimately decides to use its technology to fight against evil
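The effect shown on this slide can be approximated in a few lines. This is a rough stand-in, not the real analyzer: a `\w+` regular expression replaces the Unicode segmenter, which is close enough for plain English text but differs on other scripts and on edge cases.

```python
import re


def standard_analyze(text):
    """Approximate the standard analyzer: split into word tokens, then lowercase."""
    # \w+ keeps runs of letters, digits, and underscores, so hyphens and
    # punctuation act as token boundaries.
    return [token.lower() for token in re.findall(r"\w+", text)]


plot = (
    "When wealthy industrialist Tony Stark is forced to build an armored suit "
    "after a life-threatening incident, he ultimately decides to use its "
    "technology to fight against evil."
)

tokens = standard_analyze(plot)
print(" ".join(tokens))
```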
  • 27. Improve matching: stemming. The plot text after stemming: when wealthi industrialist toni stark is forc to build an armored suit after a life threaten incid, he ultim decid to us it technolog to fight against evil.
  • 28. Improve matching: stopwords. The stemmed plot text: when wealthi industrialist toni stark is forc to build an armored suit after a life threaten incid, he ultim decid to us it technolog to fight against evil.
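A stop filter simply drops very common words from the token stream. The sketch below applies one to the first part of the stemmed text above. The stop list is intended to mirror Lucene's default English set, but its exact membership should be treated as an assumption, and the function name is hypothetical.

```python
# Assumed stop list, meant to resemble Lucene's default English stop set.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
})


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop list, keeping the original order."""
    return [token for token in tokens if token not in STOP_WORDS]


stemmed = "when wealthi industrialist toni stark is forc to build an armored suit".split()
kept = remove_stop_words(stemmed)
print(kept)
```

Removing stopwords shrinks the index and keeps very common words from matching nearly every document, at the cost of losing them for phrase matching.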
  • 29. Match query clauses
      • match_all: match all documents
      • match: match words
      • match_phrase: match an exact phrase or based on proximity
      • match_phrase_prefix: match the beginning of a phrase
      • multi_match: match against more than one field
      • common_terms: allow stopwords to have some influence on scoring
      • query_string: use Lucene query syntax
      • simple_query_string: simpler, more robust version of the query_string syntax
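A few of these clauses, written out as request fragments, may make the differences easier to see. The helper functions are hypothetical conveniences; the clause shapes (`match`, `match_phrase` with `slop`, `multi_match` with a `^` field boost) follow the Query DSL.

```python
def match(field, text):
    """Match analyzed words in a single field."""
    return {"match": {field: text}}


def match_phrase(field, phrase, slop=0):
    """Match the words as a phrase; slop allows that many positions of movement."""
    return {"match_phrase": {field: {"query": phrase, "slop": slop}}}


def multi_match(text, fields):
    """Match the same text against several fields; 'title^2' boosts title."""
    return {"multi_match": {"query": text, "fields": fields}}


words = {"query": match("plot", "steal dreams")}
phrase = {"query": match_phrase("plot", "gold shipment", slop=1)}
many = {"query": multi_match("iron man", ["title^2", "plot"])}
print(words, phrase, many, sep="\n")
```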
  • 30. Text queries. Example plot values:
      • Big budget remake of the classic cartoon about a creature intent on stealing Christmas.
      • A scientist in a surrealist society kidnaps children to steal their dreams, hoping that they slow his aging process.
      • The Hughes cottage vacation is violently interrupted by a family on a murderous and identity stealing journey, in search of the "perfect" life.
      • Comic caper movie about a plan to steal a gold shipment from the streets of Turin by creating a traffic jam.
  • 31. Aggregations
  • 32. Aggregations
  • 33. Query structure for aggregations
      {
        "query": { ... },
        "aggs": {
          "NAME": {
            "terms": {
              "field": "NAME",
              "size": 10
            }
          }
        }
      }
      • Aggregations have a name
      • terms: the type of the aggregation
      • field: use keyword fields to keep cardinality low
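Filling in the template above for the movie data might look like the sketch below, which also nests a second `terms` aggregation inside the first. The aggregation names `actors_agg` and `directors_agg` match the result slides that follow; the helper `terms_agg` and the `.keyword` sub-fields are assumptions about the client code and the mapping.

```python
def terms_agg(field, size=10, sub_aggs=None):
    """Build a terms aggregation, optionally with nested sub-aggregations."""
    agg = {"terms": {"field": field, "size": size}}
    if sub_aggs:
        # Sub-aggregations sit beside "terms" under an "aggs" key.
        agg["aggs"] = sub_aggs
    return agg


body = {
    "query": {"match": {"title": "iron"}},
    "aggs": {
        "actors_agg": terms_agg(
            "actors.keyword",  # assumed keyword sub-field
            sub_aggs={"directors_agg": terms_agg("directors.keyword")},
        )
    },
}
print(body)
```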
  • 34. Aggregation results
      • Results follow the "hits" section of the search result
      • Buckets and counts translate directly to the UI
      • Send the "key" as a filter to narrow search results
      "aggregations": {
        "actors_agg": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 10,
          "buckets": [
            { "key": "Gwyneth Paltrow", "doc_count": 3 },
            { "key": "Robert Downey Jr.", "doc_count": 3 },
            { "key": "Christopher Kirby", "doc_count": 1 },
            { "key": "Cung Le", "doc_count": 1 },
            ...
  • 35. Aggregations nest
      • The query matches title with "iron"
      • Top-level bucket is actors
      • Each actor bucket is further divided by directors
      "aggregations": {
        "actors_agg": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 10,
          "buckets": [
            {
              "key": "Gwyneth Paltrow",
              "doc_count": 3,
              "directors_agg": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  { "key": "Jon Favreau", "doc_count": 2 },
                  { "key": "Shane Black", "doc_count": 1 }
                ]
              }
            }
            ...
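Reading a nested result like this one comes down to walking the outer buckets and, inside each, the inner buckets. The sketch below does that for the portion of the response visible on the slide; the function name `flatten_nested` is hypothetical.

```python
# The part of the slide's response that is visible (the rest is elided there).
response_aggs = {
    "actors_agg": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 10,
        "buckets": [
            {
                "key": "Gwyneth Paltrow",
                "doc_count": 3,
                "directors_agg": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [
                        {"key": "Jon Favreau", "doc_count": 2},
                        {"key": "Shane Black", "doc_count": 1},
                    ],
                },
            }
        ],
    }
}


def flatten_nested(aggs, outer_name, inner_name):
    """Turn two-level bucket results into (outer key, inner key, count) rows."""
    rows = []
    for outer in aggs[outer_name]["buckets"]:
        for inner in outer[inner_name]["buckets"]:
            rows.append((outer["key"], inner["key"], inner["doc_count"]))
    return rows


rows = flatten_nested(response_aggs, "actors_agg", "directors_agg")
print(rows)
```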
  • 36. Faceted drill down: add aggregation results as query filters
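A drill-down step can be sketched as a function that takes the current request body and the bucket key a user clicked, and returns a narrowed body with one more `term` filter. The function name and the `actors.keyword` field are assumptions; the original body is copied rather than modified so the unfiltered search stays available.

```python
import copy


def add_facet_filter(body, field, key):
    """Return a copy of the search body narrowed to one aggregation bucket."""
    narrowed = copy.deepcopy(body)
    query = narrowed.get("query", {"match_all": {}})
    if "bool" not in query:
        # Wrap the existing query so a filter clause can sit beside it.
        query = {"bool": {"must": [query]}}
    existing = query["bool"].get("filter", [])
    if isinstance(existing, dict):
        existing = [existing]  # a single filter may be given as an object
    query["bool"]["filter"] = existing + [{"term": {field: key}}]
    narrowed["query"] = query
    return narrowed


base = {"query": {"match": {"title": "iron"}}}
by_actor = add_facet_filter(base, "actors.keyword", "Gwyneth Paltrow")
by_actor_and_director = add_facet_filter(by_actor, "directors.keyword", "Jon Favreau")
print(by_actor_and_director)
```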
  • 37. Adjusting score
  • 38. Scoring text queries: Okapi BM25
      • A tf-idf-style ranking function
      • Term frequency: count of occurrences of the term in this document
      • Inverse document frequency: roughly the inverse of the number of documents in the corpus that contain the term
      • Additional options such as field-value-based ranking, via rank functions that include document information
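For readers who want to see how term frequency and inverse document frequency combine, here is the textbook BM25 score for a single term, with the commonly used defaults k1 = 1.2 and b = 0.75. It is a sketch of the formula, not of Elasticsearch internals; the exact constants and normalization inside a given Elasticsearch version may differ.

```python
import math


def bm25_term_score(term_freq, doc_len, avg_doc_len, num_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document."""
    # Rare terms (few documents contain them) get a larger idf.
    idf = math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Longer-than-average documents are penalized through this term.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: doubling it less than doubles the score.
    return idf * term_freq * (k1 + 1) / (term_freq + length_norm)


short_doc = bm25_term_score(1, doc_len=1, avg_doc_len=3.0, num_docs=1000, docs_with_term=10)
long_doc = bm25_term_score(1, doc_len=6, avg_doc_len=3.0, num_docs=1000, docs_with_term=10)
print(short_doc, long_doc)
```

With everything else equal, the shorter field scores higher, which is the length normalization that separates BM25 from a plain tf × idf product.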
  • 39. Structure of the function_score query
      {
        "query": {
          "function_score": {
            "query": { ... },
            "script_score": {
              "script": { "inline": "..." }
            }
          }
        }
      }
      • Use the _score as part of the function
      • Use built-in decay functions such as gauss, linear, and exp
      • Works on numbers, dates, and geo points
  • 40. Results for querying for ‘love’
      title                  _score
      Love                   5.718655
      Love Sick Love         4.5949717
      Love Actually          4.5949717
      Stuck in Love          4.5949717
      The Lovely Bones       4.5949717
      Shakespeare in Love    4.5949717
      To Rome with Love      4.5949717
      The Look of Love       4.5949717
      Love and Honor         4.5949717
      The Loved Ones         4.5949717
  • 41. Add in rating and freshness
  • 42. Revised scores
      Title                  _score      Date       Rating   Orig _score
      Love Actually          47.676254   20030907   7.7      4.5949717
      Shakespeare in Love    44.878765   19981203   7.2      4.5949717
      Stuck in Love          44.278923   20120909   7        4.5949717
      Love                   42.763493   20110202   5.5      5.718655
      Love & Basketball      42.64078    20000126   6.8      4.5949717
      Love Story             42.08128    19701216   6.7      4.5949717
      The Lovely Bones       41.54153    20091124   6.6      4.5949717
      The Loved Ones         41.536766   20090913   6.6      4.5949717
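The slides do not show the exact function used to blend rating and freshness, so the following is only one plausible way to express the idea with `function_score`: a `field_value_factor` on `rating` plus a `gauss` decay on `release_date`, summed and then multiplied into the text score. The scale of roughly ten years, the `missing` default, and the helper name are all assumptions, and this body would not reproduce the numbers in the table.

```python
def rating_and_freshness(text):
    """Sketch: boost text matches by rating and by how recent the release is."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "functions": [
                    # Use the rating field's value; 1 when a document has none.
                    {"field_value_factor": {"field": "rating", "missing": 1}},
                    # Decay toward 0.5 for releases about ten years before now.
                    {"gauss": {"release_date": {"origin": "now", "scale": "3650d", "decay": 0.5}}},
                ],
                "score_mode": "sum",       # combine the two function values
                "boost_mode": "multiply",  # then multiply into the text _score
            }
        }
    }


body = rating_and_freshness("love")
print(body)
```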
  • 43. Geo points and shapes
      • Store geo hashes, points, polygons
      • Bounding box filtering
      • Polygon filtering
      • Distance sorting
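Two of these capabilities, written as request bodies: a `geo_distance` filter with a `_geo_distance` sort, and a `geo_bounding_box` filter. The field name `location` is hypothetical and would need to be mapped as a `geo_point`; the coordinates are arbitrary example values, and the helper names are not from any library.

```python
def nearby(lat, lon, distance="5km", field="location"):
    """Filter to points within a distance, then sort nearest first."""
    point = {"lat": lat, "lon": lon}
    return {
        "query": {"bool": {"filter": [{"geo_distance": {"distance": distance, field: point}}]}},
        "sort": [{"_geo_distance": {field: point, "order": "asc", "unit": "km"}}],
    }


def in_box(top, left, bottom, right, field="location"):
    """Filter to points inside a bounding box given by two corners."""
    box = {
        "top_left": {"lat": top, "lon": left},
        "bottom_right": {"lat": bottom, "lon": right},
    }
    return {"query": {"bool": {"filter": [{"geo_bounding_box": {field: box}}]}}}


near_airport = nearby(47.45, -122.31, distance="10km")
downtown = in_box(47.62, -122.36, 47.59, -122.32)
print(near_airport, downtown, sep="\n")
```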
  • 44. Discussion
  • 45. Thank you! Jon Handler handler@amazon.com Alolita Sharma alolitas@amazon.com