Searching Your Data with Amazon Elasticsearch Service (ANT384) - AWS re:Invent 2018
2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Searching your data with Amazon
Elasticsearch Service
Jon Handler
Principal Solutions Architect
AWS
ANT384
Alolita Sharma
Principal Technologist
AWS
3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Elasticsearch – popular, open-source database engine
Open source
Fast time to value
Easy ingestion
Easy visualization
High performance and distributed
Best analytics and search
4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Built for search and analysis
Natural language
Boolean queries
Relevance
Text search
High-volume ingest
Near real time
Distributed storage
Streaming
Time-based visualizations
Nestable statistics
Time series tools
Analysis
5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What does it do?
Sources: application data; server, application, network, AWS, and other logs
Amazon Elasticsearch Service domain with index(es)
Users: application users, analysts, DevOps, security
6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon ES is a fully managed
service that makes it easy to
deploy, manage, and scale
Elasticsearch and Kibana
Amazon Elasticsearch Service (Amazon ES)
7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Tightly Integrated with
Other AWS Services
Seamless data ingestion, security,
auditing and orchestration
Benefits of Amazon ES
Supports Open-Source
APIs and Tools
Drop-in replacement with no need
to learn new APIs or skills
Easy to Use
Deploy a production-ready
Elasticsearch cluster in minutes
Scalable
Resize your cluster with a few
clicks or a single API call
Secure
Deploy into your VPC and restrict
access using security groups and
IAM policies
Highly Available
Replicate across Availability
Zones, with monitoring and
automated self-healing
8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon ES customers
Software & Internet; Financial Services; Education Technology; Biotech and Pharma
Media and Entertainment; Social Media; Telecommunications; Travel & Transportation
Real Estate; Logistics & Operations; Publishing; Other
9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Foundations
10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Why do you need search?
Your customers want to find the products that you want them to buy
Your organization needs to share information across departments
You need a ride to the airport
Your developers need to know what's going wrong with the new feature
Your website is down, it's 3:00 a.m., and you need to diagnose and correct
the problem
Search enables locating a specific piece of
information, relevant to a goal
11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
You create documents that are the source of indexing
and the targets of search
• You put your data in fields (analogy:
database columns)
• You send documents to an
Elasticsearch index
• Elasticsearch indexes data for each
field, according to a mapping
• Fields can have free text, numerics,
geo locations, dates, and more
• Your query specifies the fields to
search and the values to look for
and retrieves a sorted list of IDs
ID
Field: value
Field: value
Field: value
Field: value
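A mapping for the fields described above might look like the following sketch. The field names come from the movie document used later in this deck; the type choices and the `keyword` sub-fields are assumptions, not the presenters' actual mapping. Elasticsearch 6.x nests `properties` under a mapping type name, while 7.x and later omit the type.

```python
import json

# Hypothetical field mappings for a movie index (types are assumptions).
movie_properties = {
    "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "plot": {"type": "text"},
    "actors": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "directors": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "genres": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "release_date": {"type": "date"},
    "rating": {"type": "float"},
    "rank": {"type": "integer"},
    "running_time_secs": {"type": "integer"},
}

# Body for index creation in the typeless (7.x and later) form.
create_index_body = {"mappings": {"properties": movie_properties}}

print(json.dumps(create_index_body, indent=2))
```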
12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Documents are JSON
{
"id" : "tt0371746",
"title" : "Iron Man",
"release_date" : "2008-04-14T00:00:00Z",
"actors" : [
"Robert Downey Jr.",
"Gwyneth Paltrow",
"Terrence Howard"
],
"directors" : [
"Jon Favreau"
],
"rating" : 7.9,
"rank" : 171,
"running_time_secs" : 7560,
"genres" : [
"Action",
"Adventure",
"Sci-Fi"
],
"plot" : "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil."
}
13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Search APIs unlock the information
• Your application presents your users with a frame that lets them
express their goal – e.g., find a movie I want to watch
• It translates that goal to an Elasticsearch query
• Elasticsearch returns relevant results
ID
Field: value
Field: value
Field: value
Field: value
ID
Field: value
Field: value
Field: value
Field: value
ID
Field: value
Field: value
Field: value
Field: value
14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Writing structured
queries
15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different ways to query data
Query string queries
(URI/Kibana)
Query DSL
• Supports fielded string
matching for document
terms
• Lucene query syntax: AND,
OR, NOT, +/-/*, etc.
• Most useful for Kibana search
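A query string query can be sent as the `q` parameter of a URI search. The sketch below only builds the URL; the endpoint, the index name, and the example query are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def uri_search_url(endpoint, index, lucene_query, size=10):
    """Build a URI search URL carrying a Lucene-syntax query in `q`."""
    params = urlencode({"q": lucene_query, "size": size})
    return f"{endpoint}/{index}/_search?{params}"

# Hypothetical endpoint and index; fielded terms combined with AND / NOT.
lucene_query = 'title:iron AND actors:"Robert Downey Jr." AND NOT genres:Comedy'
url = uri_search_url("https://search-example.invalid", "movies", lucene_query)
print(url)
```

The same query text can be typed directly into the Kibana search bar.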
16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Structure of a Query DSL query
{
"query": { ... },
"aggs" : { ... },
"size": 10,
"from": 0,
"sort": { ... },
"_source": [...],
"highlight": { ... },
"explain": true,
...
}
• Query – all of the query
clauses
• Aggs – Aggregations
• Size and from – pagination
• Sort – field-based sorting
• _source – control return
fields
• Highlight – fields to highlight
• Explain – show scoring info
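As a rough illustration of how these sections fit together, the following sketch fills in each part of the body. The field names (including the `genres.keyword` sub-field) and the helper name are assumptions made for the example.

```python
import json

def build_search_body(text, page=0, page_size=10):
    """Assemble a Query DSL body; `from` is derived from the page number."""
    return {
        "query": {"match": {"title": text}},
        "aggs": {"genres_agg": {"terms": {"field": "genres.keyword", "size": 10}}},
        "size": page_size,
        "from": page * page_size,
        "sort": [{"rating": {"order": "desc"}}],
        "_source": ["title", "rating", "release_date"],
        "highlight": {"fields": {"title": {}}},
        "explain": True,
    }

body = build_search_body("iron man", page=2)
print(json.dumps(body, indent=2))
```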
17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Query DSL for querying data
GET endpoint/index/_search
{
"query": {
"simple_query_string": {
"query": "Iron Man",
"fields": ["title"]
}
}
}
18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Search results
• Results are JSON
documents – retrieve any
or all fields
• Overview
• Hits
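A minimal sketch of reading the hits out of a response follows. The sample response is trimmed and illustrative (the score and timing values are made up); only the general shape, with matches under `hits.hits` and the document under `_source`, is relied on.

```python
def summarize_hits(response):
    """Return (id, score, title) for each hit in a search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"].get("title"))
        for hit in response["hits"]["hits"]
    ]

# Illustrative response: overview fields first, then the hits.
sample_response = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "max_score": 1.0,
        "hits": [
            {
                "_index": "movies",
                "_id": "tt0371746",
                "_score": 1.0,
                "_source": {"title": "Iron Man", "rating": 7.9},
            }
        ],
    },
}

print(summarize_hits(sample_response))
```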
19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Kinds of queries (the "query" part of Query DSL)
Full-text search ("match")
• Blocks of natural language text are decomposed and processed to support matching
• Queries are textual and match any document that contains them
• A scoring function produces a measure of relevance for sorting results
Exact matching ("terms")
• Full-value matching for shorter strings, numeric values, locations, and dates
• Used for filtering full-text matches
• Overlaps with database functionality
20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Queries vs. Filters
Query
• How well does this document match this criterion?
• Calculates a _score
• No caching
Filter
• Does this document match this criterion?
• Answer is binary – yes or no
• No score is calculated
• Filters are cached
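In Query DSL, the two contexts can be combined in a `bool` query: clauses under `must` contribute to `_score`, while clauses under `filter` only include or exclude documents. The field names and the helper below are assumptions for illustration.

```python
def scored_and_filtered(text, genre, min_rating):
    """Score on plot text; restrict by genre and rating without scoring."""
    return {
        "query": {
            "bool": {
                # Query context: affects relevance.
                "must": [{"match": {"plot": text}}],
                # Filter context: yes/no only, eligible for caching.
                "filter": [
                    {"term": {"genres.keyword": genre}},
                    {"range": {"rating": {"gte": min_rating}}},
                ],
            }
        }
    }

body = scored_and_filtered("armored suit", "Action", 7.0)
print(body)
```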
21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Terms queries
term/terms Matches one term or a list of terms
range Matches a numeric range
exists Matches documents with a non-null value for a field
prefix Matches docs with terms that match the prefix
wildcard Supports ? and * wildcard matching
regexp Matches a regular expression
fuzzy Matches terms within a specified edit-distance
type/id Matches all docs of a type or docs of a type/id pair
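For reference, here is roughly what several of these clauses look like as Query DSL fragments. The field names, including the `.keyword` sub-fields, are assumptions based on the sample movie document.

```python
# One illustrative clause per term-level query; each goes inside "query"
# or inside the "filter" list of a bool query.
term_level_clauses = {
    "term": {"term": {"genres.keyword": "Action"}},
    "terms": {"terms": {"genres.keyword": ["Action", "Sci-Fi"]}},
    "range": {"range": {"running_time_secs": {"gte": 5400, "lte": 9000}}},
    "exists": {"exists": {"field": "plot"}},
    "prefix": {"prefix": {"title.keyword": "Iron"}},
    "wildcard": {"wildcard": {"title.keyword": "Iron Man ?"}},
    "regexp": {"regexp": {"title.keyword": "Iron Man [0-9]"}},
    "fuzzy": {"fuzzy": {"title.keyword": {"value": "Iron Man", "fuzziness": 1}}},
    "ids": {"ids": {"values": ["tt0371746"]}},
}

for name, clause in term_level_clauses.items():
    print(name, clause)
```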
22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Compound queries across multiple fields
Titles
Iron Man 2
Iron Man 3
Sky Captain and the World of Tomorrow
23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Text search
24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Text Analysis
• Decomposes the text into a set of terms
• Applies natural language rules
• When querying a text field, apply the same analysis
[Diagram: input text and query → analyzers → searchable index]
25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
The standard analyzer
Standard tokenizer – Unicode segmenter
Standard token filter – noop
Lowercase token filter – lower case
Stop token filter – disabled
26. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
The standard analyzer
Standard tokenizer – Unicode segmenter
Standard token filter – noop
Lowercase token filter – lower case
Stop token filter – disabled
Output:
when wealthy industrialist tony
stark is forced to build an
armored suit after a life
threatening incident he
ultimately decides to use its
technology to fight against evil
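The effect of the standard analyzer on plain English text can be approximated in a few lines. This is a toy stand-in, not the real analyzer: the actual tokenizer follows Unicode segmentation rules, while this sketch simply splits on non-word characters and lowercases.

```python
import re

def toy_standard_analyze(text):
    """Approximate standard analysis: split into word tokens, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

plot = (
    "When wealthy industrialist Tony Stark is forced to build an armored "
    "suit after a life-threatening incident, he ultimately decides to use "
    "its technology to fight against evil."
)

tokens = toy_standard_analyze(plot)
print(" ".join(tokens))
```

Applying the same function to the query text is what lets "Tony" in a query match "tony" in the index.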
27. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Improve matching - stemming
when wealthi
industrialist toni
stark is forc to build
an armored suit after a
life threaten incid, he
ultim decid to us it
technolog to fight
against evil.
28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Improve matching - stopwords
when wealthi
industrialist toni
stark is forc to build
an armored suit after a
life threaten incid, he
ultim decid to us it
technolog to fight
against evil.
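Stemming and stopword removal are configured as token filters on a custom analyzer. The sketch below shows one possible set of index settings; the analyzer name `english_stemmed` is hypothetical, and the built-in `english` analyzer is an alternative that bundles similar behavior.

```python
import json

# Index settings defining a custom analyzer: tokenize, lowercase,
# drop stopwords, then apply Porter stemming.
analysis_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "english_stemmed": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop", "porter_stem"],
                }
            }
        }
    }
}

# A text field opts in to the analyzer through its mapping.
plot_field_mapping = {"plot": {"type": "text", "analyzer": "english_stemmed"}}

print(json.dumps(analysis_settings, indent=2))
```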
29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Match query clauses
match_all Match all documents
match Match words
match_phrase Match an exact phrase or based on proximity
match_phrase_prefix Match the beginning of a phrase
multi_match Match against more than one field
common_terms Allow stopwords to have some influence on scoring
query_string Use Lucene query syntax
simple_query_string Simpler, more robust version of the query_string syntax
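A few of these clauses written out as Query DSL fragments, as a sketch. The field names and query text are assumptions; `slop` and the `^2` field boost are optional parameters.

```python
# Match any of the analyzed words, or require all of them with "and".
match_clause = {"match": {"plot": {"query": "steal christmas", "operator": "and"}}}

# Words must appear in order; slop allows them to be one position apart.
match_phrase_clause = {"match_phrase": {"title": {"query": "iron man", "slop": 1}}}

# Search several fields at once; title matches count twice as much.
multi_match_clause = {
    "multi_match": {"query": "iron man", "fields": ["title^2", "plot"]}
}

for clause in (match_clause, match_phrase_clause, multi_match_clause):
    print({"query": clause})
```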
30. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Text queries
Plot
Big budget remake of the classic cartoon
about a creature intent on stealing Christmas.
A scientist in a surrealist society kidnaps
children to steal their dreams, hoping that
they slow his aging process.
The Hughes cottage vacation is violently
interrupted by a family on a murderous and
identity stealing journey, in search of the
"perfect" life.
Comic caper movie about a plan to steal a
gold shipment from the streets of Turin by
creating a traffic jam.
31. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Aggregations
32. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Aggregations
33. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Query structure for aggregations
{
"query": { ... },
"aggs": {
"NAME": {
"terms": {
"field": "NAME",
"size": 10
}
}
}
}
Aggregations have a name
Terms – the type of the
aggregation
Field – use keyword fields to
keep cardinality low
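Filling in the template, a terms aggregation over actors might be built like this. The aggregation name, the `actors.keyword` field, and the helper are assumptions; `"size": 0` suppresses the hits when only the buckets are wanted.

```python
def terms_agg_body(query, agg_name, field, bucket_count=10):
    """Build a search body that returns only terms-aggregation buckets."""
    return {
        "size": 0,  # no hits, just aggregation results
        "query": query,
        "aggs": {agg_name: {"terms": {"field": field, "size": bucket_count}}},
    }

body = terms_agg_body({"match": {"title": "iron"}}, "actors_agg", "actors.keyword")
print(body)
```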
34. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Aggregation results
• Results follow the "hits"
section of the search result
• Buckets and counts translate
directly to the UI
• Send the "key" as a filter to
narrow search results
"aggregations": {
"actors_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 10,
"buckets": [
{
"key": "Gwyneth Paltrow",
"doc_count": 3
},
{
"key": "Robert Downey Jr.",
"doc_count": 3
},
{
"key": "Christopher Kirby",
"doc_count": 1
},
{
"key": "Cung Le",
"doc_count": 1
}, ...
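Turning the buckets into facet entries for a UI is a small transformation. The sketch below uses only the four buckets visible above; the rest of the response is elided on the slide and is not reproduced here.

```python
def facet_counts(response, agg_name):
    """Return (key, doc_count) pairs for one aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]

# The buckets shown on the slide (the slide truncates the remainder).
response = {
    "aggregations": {
        "actors_agg": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 10,
            "buckets": [
                {"key": "Gwyneth Paltrow", "doc_count": 3},
                {"key": "Robert Downey Jr.", "doc_count": 3},
                {"key": "Christopher Kirby", "doc_count": 1},
                {"key": "Cung Le", "doc_count": 1},
            ],
        }
    }
}

for key, count in facet_counts(response, "actors_agg"):
    print(f"{key} ({count})")
```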
35. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Aggregations nest
• The query matches title with
"iron"
• Top-level bucket is actors
• Each actor bucket is further
divided by directors
"aggregations": {
"actors_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 10,
"buckets": [
{
"key": "Gwyneth Paltrow",
"doc_count": 3,
"directors_agg": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Jon Favreau",
"doc_count": 2
},
{
"key": "Shane Black",
"doc_count": 1
}
]
}
} ...
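A request that could produce a nested result of this shape places the directors aggregation inside the actors aggregation. This is a sketch; the `.keyword` field names are assumptions.

```python
import json

nested_agg_body = {
    "size": 0,
    "query": {"match": {"title": "iron"}},
    "aggs": {
        "actors_agg": {
            "terms": {"field": "actors.keyword", "size": 10},
            # Sub-aggregation: evaluated separately within each actor bucket.
            "aggs": {
                "directors_agg": {
                    "terms": {"field": "directors.keyword", "size": 10}
                }
            },
        }
    },
}

print(json.dumps(nested_agg_body, indent=2))
```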
36. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Faceted drill down: add aggregation results as query
filters
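One way to implement the drill-down is to wrap the current query in a `bool` query and add the selected bucket key as a `term` filter. The helper name and field names below are hypothetical.

```python
import copy

def add_facet_filter(body, field, key):
    """Return a copy of `body` narrowed to documents where field == key."""
    narrowed = copy.deepcopy(body)
    original_query = narrowed.get("query", {"match_all": {}})
    narrowed["query"] = {
        "bool": {
            "must": [original_query],           # keeps the original scoring
            "filter": [{"term": {field: key}}]  # narrows without scoring
        }
    }
    return narrowed

base = {"query": {"match": {"title": "iron"}}}
narrowed = add_facet_filter(base, "actors.keyword", "Gwyneth Paltrow")
print(narrowed)
```

Calling the helper again on the result nests another `bool`, so each further facet selection narrows the previous one.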
37. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Adjusting score
38. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Scoring text queries: Okapi BM25
• A refinement of “tf-idf” scoring
Term Frequency – count of occurrences in this document
Inverse Document Frequency – rarer terms score higher (roughly 1 / number of documents in the corpus that contain the term)
• Additional options such as:
Field value based ranking
Via rank functions that include document information
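For intuition, the per-term BM25 score can be sketched as below, using the commonly cited formula with the default parameters k1 = 1.2 and b = 0.75. This is an illustration of the formula, not Lucene's exact implementation, and the function and argument names are made up for the example.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, docs_with_term,
                    k1=1.2, b=0.75):
    """Score one term in one document with the classic BM25 formula."""
    # Rarer terms (small docs_with_term) get a larger idf.
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term frequency saturates, and long documents are penalized.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

rare = bm25_term_score(tf=1, doc_len=3, avg_doc_len=3, doc_count=1000, docs_with_term=5)
common = bm25_term_score(tf=1, doc_len=3, avg_doc_len=3, doc_count=1000, docs_with_term=500)
print(rare, common)
```

Length normalization is why a one-word title such as "Love" can outscore longer titles containing the same term.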
39. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Structure of the function_score query
{
"query": {
"function_score": {
"query": { ... },
"script_score": {
"script": {
"inline": "..."
}
}
} } }
• Use the _score as part of the
function
• Use built-in functions like
gauss, linear, exponential
• Works on numbers, dates,
and geo points
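As an alternative to a script, built-in functions can be listed under `functions`. The sketch below boosts by a numeric field and decays by date; the field names, scale, and modes are assumptions, not the settings used for the results in this deck.

```python
import json

function_score_body = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "love"}},
            "functions": [
                # Higher-rated movies get a larger multiplier.
                {"field_value_factor": {"field": "rating", "modifier": "sqrt", "missing": 1}},
                # Gaussian decay: the multiplier halves about ten years from now.
                {"gauss": {"release_date": {"origin": "now", "scale": "3650d", "decay": 0.5}}},
            ],
            "score_mode": "multiply",  # how the functions combine with each other
            "boost_mode": "multiply",  # how the result combines with _score
        }
    }
}

print(json.dumps(function_score_body, indent=2))
```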
40. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Results for querying for ‘love’
“title”               _score
Love                  5.718655
Love Sick Love        4.5949717
Love Actually         4.5949717
Stuck in Love         4.5949717
The Lovely Bones      4.5949717
Shakespeare in Love   4.5949717
To Rome with Love     4.5949717
The Look of Love      4.5949717
Love and Honor        4.5949717
The Loved Ones        4.5949717
41. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Add in rating and freshness
42. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Revised scores
Title                 _score     Date      Rating  Orig _score
Love Actually         47.676254  20030907  7.7     4.5949717
Shakespeare in Love   44.878765  19981203  7.2     4.5949717
Stuck in Love         44.278923  20120909  7       4.5949717
Love                  42.763493  20110202  5.5     5.718655
Love & Basketball     42.64078   20000126  6.8     4.5949717
Love Story            42.08128   19701216  6.7     4.5949717
The Lovely Bones      41.54153   20091124  6.6     4.5949717
The Loved Ones        41.536766  20090913  6.6     4.5949717
43. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Geo points and shapes
• Store geo hashes, points, polygons
• Bounding box filtering
• Polygon filtering
• Distance sorting
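A bounding box filter combined with distance sorting could look like the following sketch. It assumes a hypothetical `location` field mapped as `geo_point` (the sample movie document has no such field), and the coordinates are arbitrary.

```python
import json

center = {"lat": 36.17, "lon": -115.14}  # arbitrary example point

geo_search_body = {
    "query": {
        "bool": {
            "filter": [
                {
                    "geo_bounding_box": {
                        "location": {
                            "top_left": {"lat": 36.40, "lon": -115.40},
                            "bottom_right": {"lat": 35.95, "lon": -114.90},
                        }
                    }
                }
            ]
        }
    },
    # Nearest documents first, with distances reported in kilometers.
    "sort": [{"_geo_distance": {"location": center, "order": "asc", "unit": "km"}}],
}

print(json.dumps(geo_search_body, indent=2))
```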
44. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Discussion
45. Thank you!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Jon Handler
handler@amazon.com
Alolita Sharma
alolitas@amazon.com