Session SVC302 from re:Invent 2013
Today's applications work across many different data assets - documents stored in Amazon S3, metadata stored in NoSQL data stores, catalogs and orders stored in relational database systems, raw files in filesystems, etc. Building a great search experience across all these disparate datasets and contexts can be daunting. Amazon CloudSearch provides simple, low-cost search, enabling your users to find the information they are looking for. In this session, we will show you how to integrate search with your application, including key areas such as data preparation, domain creation and configuration, data upload, integration of search UI, search performance and relevance tuning. We will cover search applications that are deployed for both desktop and mobile devices. Peter Simpkin from Elsevier provides a summary of their use of CloudSearch.
6. Hands-Off Operation
[Diagram: a grid of search instances. Index partitions (Index Partition 1, 2, ... n) are added as document quantity and size grow; copies of each partition (Copy 1, 2, ... n) are added as search request volume and complexity grow.]
9. Mobile Experience
[Screenshot: mobile app search for "Iron Man". Tab bar: Movies, Search, Social, Nearby, Account. Results:]
• Iron Man 3 (2013): When Tony Stark's world is torn apart by a formidable terrorist called the Mandarin, he starts an odyssey of rebuilding and retribution.
• Iron Man 2 (2010): Tony Stark has declared himself Iron Man and installed world peace... or so he thinks. He soon realizes that not only is there a mad man...
• Iron Man (2008): When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.
• The Man With The Iron Fists (2012): On the hunt for a fabled treasure of gold, a band of warriors, assassins, and a rogue British soldier descend upon a village in feudal China, where a humble blacksmith...
10. Agenda
• Sourcing your documents
• Retrieval and ranking
• Search user interface
• Performance and Scale
• Developer example: Peter Simpkin, Elsevier Oxford
11. CloudSearch Documents
• Unique identifier
• Version
• Fields
– Indexed according to configuration
– Source of matches
13. Bootstrap Strategy
[Diagram components: Source System, Processing Script, Amazon EC2 (twice), Amazon SQS, Amazon CloudSearch; stages labeled Queuing and Batching]
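The batching stage in the diagram above could look roughly like the following sketch, which groups documents into JSON arrays that stay under a size cap. The function name `make_batches` and the 5 MB default are assumptions made for illustration, not values taken from the slides; check the service limits before relying on them.

```python
import json

# Assumed cap for illustration only; verify against the current service limits.
MAX_BATCH_BYTES = 5 * 1024 * 1024


def make_batches(docs, max_bytes=MAX_BATCH_BYTES):
    """Group documents into JSON array strings, each at most max_bytes long."""
    batches, current, size = [], [], 2  # 2 bytes for the enclosing "[" and "]"
    for doc in docs:
        encoded = json.dumps(doc)  # ASCII output by default
        doc_len = len(encoded.encode("utf-8"))
        if doc_len + 2 > max_bytes:
            raise ValueError("single document exceeds the batch size cap")
        # One extra byte for the comma separating this document from the previous one
        extra = doc_len + (1 if current else 0)
        if current and size + extra > max_bytes:
            # Flush the full batch and start a new one with this document
            batches.append("[" + ",".join(current) + "]")
            current, size = [], 2
            extra = doc_len
        current.append(encoded)
        size += extra
    if current:
        batches.append("[" + ",".join(current) + "]")
    return batches
```

Each returned string is one upload payload; a queue consumer could call this on whatever documents it has drained from the queue.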
14. Document Construction
• One source will be the master
for each record
    determine doc id and version
    create fields
    for each auxiliary source
        gather additional data
    send or queue the document
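A minimal Python sketch of the loop above, under some assumptions: the master record is a dict with an `id` key, each auxiliary source is modeled as a callable that returns extra fields, and the output follows the `type`/`id`/`version`/`lang`/`fields` layout of an "add" document. The names `build_document` and `auxiliary_sources` are hypothetical.

```python
def build_document(master_record, auxiliary_sources, version):
    """Build one "add" document from a master record plus auxiliary lookups."""
    # Determine the document id from the master source
    doc_id = str(master_record["id"])
    # Create fields from the master record itself
    fields = {k: v for k, v in master_record.items() if k != "id"}
    # Gather additional data from each auxiliary source
    for lookup in auxiliary_sources:
        fields.update(lookup(master_record))
    return {
        "type": "add",
        "id": doc_id,
        "version": version,
        "lang": "en",
        "fields": fields,
    }
```

The returned dict can then be sent directly or placed on a queue. One common choice for `version` is a timestamp, so that later updates supersede earlier ones.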
16. S3
• Clips, images, reviews
• Apache Tika to extract content
• S3 Metadata for additional fields
17. DynamoDB
DynamoDB     CloudSearch
Table        Domain
Item         Document
Attribute    Field
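The item-to-document mapping can be sketched as below. It assumes items arrive in DynamoDB's typed JSON form (for example `{"S": "text"}` or `{"N": "2008"}`) and handles only string, number, and set attributes; `item_to_document` and `key_attribute` are hypothetical names used for illustration.

```python
def _number(text):
    """DynamoDB serializes numbers as strings; prefer int, fall back to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def item_to_document(item, key_attribute, version):
    """Map one DynamoDB item (typed attribute values) to an "add" document."""
    fields = {}
    for name, typed_value in item.items():
        if name == key_attribute:
            continue  # the key becomes the document id, not a field
        (type_code, value), = typed_value.items()
        if type_code == "S":
            fields[name] = value
        elif type_code == "N":
            fields[name] = _number(value)
        elif type_code == "SS":
            fields[name] = list(value)
        elif type_code == "NS":
            fields[name] = [_number(v) for v in value]
        # Other attribute types are skipped in this sketch
    doc_id = next(iter(item[key_attribute].values()))
    return {"type": "add", "id": doc_id, "version": version,
            "lang": "en", "fields": fields}
```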
18. [Screenshot: mobile search results for "Iron Man", repeated from slide 9]
19. Searching Show Times
id  title     description  t_name  t_street  date   time
1   Iron Man  ...          Galaxy  Main      11/11  12:30pm
2   Iron Man  ...          Galaxy  Main      11/11  1:15pm
3   Iron Man  ...          Galaxy  Main      11/11  2:45pm
4   Iron Man  ...          Galaxy  Main      11/11  6:00pm
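The table reads as one document per show time, with the movie and theater fields repeated on every row. Assuming that interpretation, a sketch of the flattening step might look like this; `showtime_documents` and the input dict shapes are hypothetical.

```python
def showtime_documents(movie, theater, showings):
    """Emit one flat record per show time, repeating movie and theater fields."""
    docs = []
    for n, (date, time) in enumerate(showings, start=1):
        docs.append({
            "id": str(n),
            "title": movie["title"],
            "description": movie["description"],
            "t_name": theater["name"],
            "t_street": theater["street"],
            "date": date,
            "time": time,
        })
    return docs
```

The repetition costs index space but lets a single query match on movie, theater, and time together.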
20. Heterogeneous Data
21. Multi Domain
22. Updating CloudSearch
[Diagram components: Users, Web Server, Update Processor, Amazon EC2 (twice), Amazon SQS, DynamoDB, Amazon RDS, Amazon S3, Amazon CloudSearch]
23. Section Summary
• Multiple sources
• Bootstrap / Update
• Heterogeneous data
24. Agenda
• Sourcing your documents
• Retrieval and ranking
• Search user interface
• Performance and Scale
• Developer example: Peter Simpkin, Elsevier Oxford
25. Good Matches
[Screenshot: mobile search results for "Iron Man" (Iron Man 3, Iron Man 2, Iron Man, The Man With The Iron Fists), annotated "Good Matches"]
26. The Search Algorithm
• Locate documents that satisfy Boolean constraints
– Usually intersection
• Relevance rank those documents
– Relevance ranking is what differentiates search from databases
28. Configuring for Search
• Text fields for individual word search
– User-generated and external text – titles, descriptions
• Literal fields for exact matches
– Application-generated text like facets
• Integer fields for range searching and ranking
29. Searching Text
http(s)://<endpoint>/2011-02-01/search?
• Simple searches
– q=<text>
• Filtering
– bq=(or title:'iron' (and description:'iron' description:'man'))
• Filtering with integer ranges
– bq=(and 'iron man' year:..2010)
• Geo filtering
– bq=(and 'iron man' latitude:12700..12900 longitude:5700..5800)
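The queries above contain spaces, quotes, and parentheses, so they need URL encoding before they go on the wire. A minimal sketch using only the standard library follows; the endpoint host in the usage note is a placeholder and `search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"  # version segment shown in the URL pattern above


def search_url(endpoint, params):
    """Build a search URL; urlencode percent-encodes quotes, spaces and parentheses."""
    return "https://{}/{}/search?{}".format(endpoint, API_VERSION, urlencode(params))
```

For example, `search_url("search-movies.example.com", {"bq": "(and 'iron man' year:..2010)"})` yields a URL whose `bq` value decodes back to the original expression.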
30. Search Results
{"rank": "-text_relevance",
 "match-expr": "(label 'iron man')",
 "hits": {
   "found": 204,
   "start": 0,
   "hit": [
     {"id": "sontsst12cf5f88b42"},
     {"id": "sopvopr12ab017f082"},
     {"id": "sorzrpw12ac468a13b"}
   ]
 },
 ...
}
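A short sketch of reading such a response, assuming the body is valid JSON with the `hits.found` and `hits.hit[].id` structure shown above; `hit_ids` is a hypothetical helper name.

```python
import json


def hit_ids(response_text):
    """Return (total matches, ids on this page) from a search response body."""
    hits = json.loads(response_text)["hits"]
    return hits["found"], [hit["id"] for hit in hits["hit"]]
```

Note that `found` is the total number of matches, while `hit` holds only the current page; the application typically uses the ids to fetch display data from its own store.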
31. Relevant Results
[Screenshot: mobile search results for "Iron Man" (Iron Man 3, Iron Man 2, Iron Man, The Man With The Iron Fists), annotated "Relevant Results"]
32. Customizing Ranking
• text_relevance and cs.text_relevance
• Rank expressions
– Compute a score for each document
– &rank=<function>
• Defined in the console
• Defined at query-time
– &q='iron-man'&rank-recency=text_relevance + year&rank=recency
34. Field Weighting
• Adjust relative importance of fields
• &rank-title=cs.text_relevance({"weights":{"title":4.0},"default_weight":1})
36. Popularity
• Convert floating point to integer
• Weight by the number of ranks
• rank-pop=text_relevance + log10(user-rating * number-user-ranks) * 10 + metascore * 3
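To get a feel for how the terms of the popularity expression trade off, the same arithmetic can be reproduced offline. This is only a sketch: the Python names use underscores in place of the hyphenated field names, and the `to_uint` scale factor of 10 is an assumed example of converting a floating-point rating to an integer.

```python
import math


def to_uint(value, scale=10):
    """Store a floating-point rating such as 7.3 as the integer 73 (assumed scale)."""
    return round(value * scale)


def popularity_rank(text_relevance, user_rating, number_user_ranks, metascore):
    """Offline copy of the rank-pop arithmetic; inputs must make the product positive."""
    return (text_relevance
            + math.log10(user_rating * number_user_ranks) * 10
            + metascore * 3)
```

Because of the logarithm, multiplying the number of ratings by ten adds only 10 points, while each metascore point adds 3.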
38. Freshness
• Exponential decay function: $r = c\,e^{-\lambda t}$
• &rank-decay=text_relevance + 200*Math.exp(-0.1*days_ago)
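The decay term can be explored offline with a few lines of Python. The defaults mirror the constants in the expression above (c = 200, λ = 0.1 per day); `freshness_boost` is a hypothetical name.

```python
import math


def freshness_boost(days_ago, scale=200.0, decay=0.1):
    """Exponential decay r = c * exp(-lambda * t), with t measured in days."""
    return scale * math.exp(-decay * days_ago)


# With lambda = 0.1 the boost halves every ln(2) / 0.1 days, roughly a week.
HALF_LIFE_DAYS = math.log(2) / 0.1
```

A brand-new document gets the full 200-point boost, and the boost halves about every 6.9 days, so λ is the knob that decides how quickly older documents stop benefiting.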
41. Location Sort
• Cartesian distance function: $\sqrt{(lat - lat_{user})^2 + (lon - lon_{user})^2}$
• &rank-geo=sqrt(pow(latitude - lat, 2) + pow(longitude - lon, 2))
• &rank=-geo
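The same planar distance can be checked in Python. This sketch assumes coordinates are stored as scaled integers, as in the geo filtering ranges shown earlier in the deck, and treats the surface as flat, which is only reasonable over short distances; `planar_distance` is a hypothetical name.

```python
import math


def planar_distance(lat, lon, user_lat, user_lon):
    """Straight-line distance in the same (scaled) units as the stored coordinates."""
    return math.sqrt((lat - user_lat) ** 2 + (lon - user_lon) ** 2)


def nearest_first(places, user_lat, user_lon):
    """Sort (name, lat, lon) tuples by distance from the user, closest first."""
    return sorted(places, key=lambda p: planar_distance(p[1], p[2], user_lat, user_lon))
```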
43. Rank Expressions: Combined
• &rank-combined=text_relevance + 2.0 * geo + 0.5 * popularity + 0.3 * freshness
• &rank=combined
44. Section Summary
• Search API basics
• Customizing ranking
– Field weighting, popularity, freshness, geo, combined
• Rank expression comparison tool
45. Agenda
• Sourcing your documents
• Retrieval and ranking
• Search user interface
• Performance and Scale
• Developer example: Peter Simpkin, Elsevier Oxford
53. [Diagram: a Movie document with fields title, description, oscar1, oscar2, oscar3]
• title: Lincoln
• description: ...
• oscar1: Awards
• oscar2: Awards/Best Actor
• oscar3: Awards/Best Actor/Daniel Day Lewis
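The per-level fields in the example above can be generated from a single category path. A minimal sketch, assuming "/" as the separator and numbered field names with the `oscar` prefix from the example; `hierarchy_fields` is a hypothetical helper.

```python
def hierarchy_fields(path, prefix="oscar", separator="/"):
    """Expand one category path into one field per level of the hierarchy."""
    parts = path.split(separator)
    return {
        "{}{}".format(prefix, depth): separator.join(parts[:depth])
        for depth in range(1, len(parts) + 1)
    }
```

Each level stores the full path up to that depth, so an exact match on the level-2 field selects everything beneath "Awards/Best Actor".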
55. Drilldown
• bq=oscar1:'Awards'
• bq=oscar2:'Awards/Best Actor'
• bq=oscar3:'Awards/Best Actor/Daniel Day Lewis'
• bq=(and 'star' oscar2:'Awards/Best Actor')
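A UI that lets users click through facet levels needs to turn the selected path back into one of these queries. The sketch below picks the field from the depth of the path; `drilldown_bq` is a hypothetical helper, and it does not escape quotes inside values, which a real application would need to handle.

```python
def drilldown_bq(path, text=None, prefix="oscar", separator="/"):
    """Build a bq value for the facet level that matches the depth of `path`."""
    depth = len(path.split(separator))
    constraint = "{}{}:'{}'".format(prefix, depth, path)
    if text is None:
        return constraint
    # Combine free-text search with the facet constraint
    return "(and '{}' {})".format(text, constraint)
```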
56. Section Summary
• Simple faceting
• Hierarchical faceting
• Hierarchical data handling
58. The Search Algorithm
• Locate documents that satisfy Boolean constraints
– Usually intersection
• Relevance rank those documents
– Relevance ranking is what differentiates search from databases
59. Performance Best Practices
• Match set size
• Text queries perform better than integer queries
• Complex relevance functions
60. Optimizing Index Size
• Trade off literal and uint for cost/performance
• Result fields matter most
• Enabling faceting increases size
61. Wrap Up
• Sourcing documents from various locations
• Building queries and ranking
• UI Components for faceting
• Getting the most out of your index
62. Agenda
• Sourcing your documents
• Retrieval and ranking
• Search user interface
• Performance and Scale
• Developer example: Peter Simpkin, Elsevier Oxford
66. Content Challenges:
• No central place for consumers to discover content
• It is not currently possible to search and retrieve atomic assets
• Assets are not reusable across products
[Diagram labels: Content Systems, Consumer Platforms]
67. Empower our product development partners
Search Opportunities:
• Create a comprehensive inventory to easily discover content Elsevier owns
• Provide on-demand access to the granular / modular content partners want
• Assets must be uniquely addressable
[Diagram label: Enterprise Content Search Engine]
68. Enterprise Content Search eco-system
[Diagram components: Amazon SWF, SDF metadata, Amazon S3, Amazon CloudSearch, DynamoDB, Federated Content Warehouse, E.U. Corporate Data Center, U.S. Corporate Data Center, Product Platform Data Center]
70. Elsevier Technical Drivers & Approach
• Fully managed, full-featured search service in the cloud
• Automatically scales for data & traffic
• Easy to set up and use
• PoC created in days
• Search Engine as a Service
• Pay-as-you-go pricing model
73. Optimised Nested Query
((not action:'D')
(or (and issn:'0022-1694' and type:'1.2'
and pubstartdate:..2013176 pubenddate:2005002..)
(and issn:'0022-1694' and type:'1.2'
and pubstartdate:2005001 pubstarttime:0..235959)
(and issn:'0022-1694' and type:'1.2'
and pubstartdate:2013177 pubstarttime:0..235959)
(and issn:'0022-1694' and type:'1.2'
and pubenddate:2005001 pubendtime:0..)
(and issn:'0022-1694' and type:'1.2'
and pubenddate:2013177 pubendtime:0..235959)))
• Response Time = 0.17ms
74. CloudSearch Observations
• Facilitates knowledge sharing on content matters across Elsevier's product platforms
• Ability to leverage content infrastructure and capabilities across Elsevier's divisions
• Easy to integrate with existing on-premise Content Systems
• Speed to market; allows developers to focus on building other core Content Strategy components
• Need to spend time optimising queries to maximise performance
75. Please give us your feedback on this presentation (SVC302)
Thank You