This presentation covers using Amazon CloudSearch to add search to an application: sourcing documents from multiple data sources, customizing retrieval and ranking, building search user interfaces with facets, and optimizing for performance and scale. It closes with a developer example from Elsevier.
6. Hands-Off Operation
[Diagram: a grid of search instances. Index partitions 1 through n are added as document quantity and size grow; copies 1 through n of each partition are added as search request volume and complexity grow.]
9. Mobile Experience
[Screenshot: mobile app search for "Iron Man", with results:]
Iron Man 3 (2013)
When Tony Stark's world is torn apart by a formidable terrorist called the Mandarin, he starts an odyssey of rebuilding and retribution.
Iron Man 2 (2010)
Tony Stark has declared himself Iron Man and installed world peace... or so he thinks. He soon realizes that not only is there a mad man...
Iron Man (2008)
When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.
The Man With The Iron Fists (2012)
On the hunt for a fabled treasure of gold, a band of warriors, assassins, and a rogue British soldier descend upon a village in feudal China, where a humble blacksmith...
[Tab bar: Movies, Search, Social, Nearby, Account]
10. Agenda
• Sourcing your documents
• Retrieval and ranking
• Search user interface
• Performance and scale
• Developer example: Peter Simpkin, Solution Architect, Elsevier
11. Amazon CloudSearch Documents
• Unique identifier
• Version
• Fields
– Indexed according to configuration
– Source of matches
13. Bootstrap Strategy
[Diagram components: Source System, Processing Script, Amazon EC2, Amazon SQS, Amazon EC2, Amazon CloudSearch; stages labeled Queuing and Batching]
14. Document Construction
• One source will be the master
for each record
    determine doc id and version
    create fields
    for each auxiliary source
        gather additional data
    send or queue the document
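The loop above can be sketched in Python. This is a minimal sketch, not the presenters' implementation: the record keys (`id`, `updated_at`, `title`), the `auxiliary_sources` callables, and the choice of a timestamp as the version are all assumptions made for illustration. The output follows the batch layout of the 2011-02-01 document API (`type`, `id`, `version`, `lang`, `fields`).

```python
import json

def build_document(record, auxiliary_sources):
    """Build one 'add' operation from a master record.

    `record` is a dict from the master source (hypothetical shape).
    `auxiliary_sources` is a list of callables, each returning extra
    fields for a record (hypothetical).
    """
    doc_id = str(record["id"]).lower()      # determine doc id from the master
    version = int(record["updated_at"])     # assumed: epoch seconds as version
    fields = {"title": record["title"]}     # create fields from the master
    for lookup in auxiliary_sources:        # gather additional data
        fields.update(lookup(record))
    return {"type": "add", "id": doc_id, "version": version,
            "lang": "en", "fields": fields}

def build_batch(records, auxiliary_sources):
    """Serialize records as one JSON batch, ready to send or queue."""
    return json.dumps([build_document(r, auxiliary_sources) for r in records])

if __name__ == "__main__":
    ratings = lambda rec: {"user_rating": 4}   # stand-in auxiliary source
    print(build_batch([{"id": "TT1300854", "updated_at": 1383264000,
                        "title": "Iron Man 3"}], [ratings]))
```

Using a monotonically increasing value such as a timestamp for the version is one common way to make sure a later update wins over an earlier one; any other increasing counter would serve the same purpose.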
16. Amazon S3
• Clips, images, reviews
• Apache Tika to extract content
• Amazon S3 Metadata for additional fields
17. Amazon DynamoDB
DynamoDB    CloudSearch
Table       Domain
Item        Document
Attribute   Field
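The item-to-document mapping can be illustrated with a small converter. This is a sketch under stated assumptions: the item uses DynamoDB's low-level typed format (`{"S": ...}`, `{"N": ...}`, `{"SS": [...]}`), numbers are whole values, and the function name `item_to_document` and the key attribute `id` are hypothetical.

```python
def item_to_document(item, key_attribute, version):
    """Map one DynamoDB item (typed attribute format) to a document dict.

    Each attribute becomes a field; the key attribute becomes the doc id.
    Only string (S), number (N) and string-set (SS) attributes are handled.
    """
    fields = {}
    for name, typed_value in item.items():
        type_code, value = next(iter(typed_value.items()))  # e.g. ("S", "Iron Man")
        if type_code in ("S", "SS"):
            fields[name.lower()] = value
        elif type_code == "N":
            fields[name.lower()] = int(value)   # assumed: whole numbers only
    doc_id = str(fields.pop(key_attribute.lower())).lower()
    return {"type": "add", "id": doc_id, "version": version,
            "lang": "en", "fields": fields}

item = {
    "id": {"S": "TT0371746"},
    "title": {"S": "Iron Man"},
    "year": {"N": "2008"},
    "genre": {"SS": ["Action", "Sci-Fi"]},
}
print(item_to_document(item, "id", 1))
```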
19. Searching Show Times
id  title     description  t_name  t_street  date   time
1   Iron Man  ...          Galaxy  Main      11/11  12:30pm
2   Iron Man  ...          Galaxy  Main      11/11  1:15pm
3   Iron Man  ...          Galaxy  Main      11/11  2:45pm
4   Iron Man  ...          Galaxy  Main      11/11  6:00pm
21. Multi Domain
22. Updating CloudSearch
[Diagram components: Users, Web Server, Update Processor, Amazon EC2, Amazon SQS, Amazon EC2, Amazon DynamoDB, Amazon RDS, Amazon S3, Amazon CloudSearch]
23. Section Summary
• Multiple sources
• Bootstrap / Update
• Heterogeneous data
25. Correct Matches
[Screenshot: the "Iron Man" search results from slide 9, annotated "Correct Matches"]
26. The Search Algorithm
• Locate documents that satisfy Boolean
constraints
– Usually intersection
• Relevance rank those documents
– Differentiated from databases by relevance
28. Configuring for Search
• Text fields for individual word search
– User-generated and external text – titles, descriptions
• Literal fields for exact matches
– Application-generated text like facets
• Integer fields for range searching and ranking
29. Searching Text
http(s)://<endpoint>/2011-02-01/search?
• Simple searches
– q=<text>
• Filtering
– bq=(and title:'iron man' genre:'Action')
• Filtering with integer ranges
– bq=(and 'iron man' year:..2010)
• Geo filtering
– bq=(and 'iron man' latitude:12700..12900 longitude:5700..5800)
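The query forms above can be assembled and URL-encoded with the standard library. This is a minimal sketch: the endpoint host name and the helper names `search_url` and `year_up_to` are hypothetical; only the `/2011-02-01/search` path and the `q` and `bq` parameters come from the slide.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"

def search_url(endpoint, **params):
    """Return a search URL for the endpoint with URL-encoded parameters."""
    return "https://{}/{}/search?{}".format(endpoint, API_VERSION, urlencode(params))

def year_up_to(text, year):
    """Boolean query: match `text` and keep documents with year <= `year`."""
    return "(and '{}' year:..{})".format(text, year)

# Hypothetical endpoint, for illustration only.
endpoint = "search-movies-example.us-east-1.cloudsearch.amazonaws.com"

print(search_url(endpoint, q="iron man"))                      # simple search
print(search_url(endpoint, bq=year_up_to("iron man", 2010)))   # integer range filter
```

Encoding matters here because Boolean queries contain spaces, quotes, colons, and parentheses, none of which are safe to place in a URL unescaped.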
30. Search Results
{"rank": "-text_relevance",
"match-expr": "(label 'iron man')",
"hits": { "found": 204, "start": 0,
"hit": [ { "id": "sontsst12cf5f88b42" },
{ "id": "sopvopr12ab017f082" },
{ "id": "sorzrpw12ac468a13b" },
] },
...
}
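A response of this shape can be reduced to a hit count and a list of document ids. The sketch below assumes the response body is complete, valid JSON with the `hits.found` and `hits.hit[].id` keys shown above; the function name `hit_ids` is hypothetical, and the sample string is the slide's response with the elided parts left out.

```python
import json

def hit_ids(response_text):
    """Return (total matches, ids on this page) from a search response body."""
    hits = json.loads(response_text)["hits"]
    return hits["found"], [hit["id"] for hit in hits["hit"]]

sample = """
{"rank": "-text_relevance",
 "match-expr": "(label 'iron man')",
 "hits": {"found": 204, "start": 0,
          "hit": [{"id": "sontsst12cf5f88b42"},
                  {"id": "sopvopr12ab017f082"},
                  {"id": "sorzrpw12ac468a13b"}]}}
"""

found, ids = hit_ids(sample)
print(found, ids)
```

Because the response carries ids rather than full records, the application would typically look the ids up in its own data store (or request return fields) to render the result list.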
31. Relevant Results
[Screenshot: the "Iron Man" search results from slide 9, annotated "Relevant Results"]
32. Customizing Ranking
• text_relevance and cs.text_relevance
• Rank expressions
– Compute a score for each document
– &rank=<function>
• Defined in the console
• Defined at query-time
– &q='iron-man'&rank-recency=text_relevance + year
&rank=recency
34. Field Weighting
• Adjust relative importance of fields
• &rank-title=cs.text_relevance({"weights":{"title":4.0},"default_weight":1})
36. Popularity
• Convert floating point to integer
• Weight by the number of ranks
• rank-pop=text_relevance +
(user-rating - 2) * log10(number-user-ranks) * 10
+ metascore * 3
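To see how the terms of the popularity expression interact, the same arithmetic can be evaluated offline. This is an illustrative sketch only: the function name and the sample values are made up, field names use underscores instead of hyphens, and at least one user rating is assumed (the logarithm is undefined for zero).

```python
import math

def popularity_rank(text_relevance, user_rating, number_user_ranks, metascore):
    """Evaluate the popularity rank expression for one document.

    The rating term is centered on 2, so ratings below 2 lower the score,
    and it is weighted by log10 of the number of ratings so that a high
    average from very few users counts for less.
    """
    return (text_relevance
            + (user_rating - 2) * math.log10(number_user_ranks) * 10
            + metascore * 3)

# Hypothetical documents: same relevance and metascore, different rating volume.
few_ratings = popularity_rank(200, 4, 10, 60)
many_ratings = popularity_rank(200, 4, 1000, 60)
print(few_ratings, many_ratings)
```

With the same average rating, the document rated by more users receives the larger boost, which is the effect the "weight by the number of ranks" bullet describes.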
38. Freshness
• Exponential decay function
$r = c\,e^{-\lambda t}$
• &rank-decay=text_relevance + 200*Math.exp(-0.1*days_ago)
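The effect of an exponential decay boost can be checked numerically. The sketch below follows the decay form $r = c\,e^{-\lambda t}$ with $c = 200$ and $\lambda = 0.1$ per day, the constants that appear in the rank expression; the function names are hypothetical.

```python
import math

def freshness_rank(text_relevance, days_ago, scale=200.0, decay_rate=0.1):
    """Text relevance plus a boost that decays exponentially with age."""
    return text_relevance + scale * math.exp(-decay_rate * days_ago)

def half_life_days(decay_rate=0.1):
    """Age at which the freshness boost has dropped to half its initial value."""
    return math.log(2) / decay_rate

for days in (0, 7, 30, 365):
    print(days, round(freshness_rank(100, days), 2))
print("half-life (days):", round(half_life_days(), 2))
```

A brand-new document receives the full boost of 200, and the boost halves roughly every week at this decay rate, so old documents end up ranked almost purely on text relevance.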
41. Location Sort
• Cartesian distance function
$\sqrt{(lat - lat_{user})^2 + (lon - lon_{user})^2}$
• &rank-geo=sqrt(pow(latitude - lat, 2) + pow(longitude - lon, 2))
• &rank=-geo
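The Cartesian distance used for location sorting can be sketched as a plain function. Assumptions: coordinates are stored as scaled integers (as in the geo filtering example, e.g. `latitude:12700..12900`), the area is small enough that treating latitude and longitude as a flat grid is acceptable, and the theater data and the name "Orbit" are invented for illustration.

```python
import math

def cartesian_distance(latitude, longitude, lat_user, lon_user):
    """Straight-line distance between a document's location and the user's."""
    return math.sqrt((latitude - lat_user) ** 2 + (longitude - lon_user) ** 2)

# Hypothetical theaters with scaled-integer coordinates.
theaters = [
    {"name": "Orbit", "latitude": 12710, "longitude": 5790},
    {"name": "Galaxy", "latitude": 12800, "longitude": 5750},
]
user = (12805, 5752)

nearest_first = sorted(
    theaters,
    key=lambda t: cartesian_distance(t["latitude"], t["longitude"], *user),
)
print([t["name"] for t in nearest_first])
```

This flat-grid approximation ignores the curvature of the Earth and the narrowing of longitude lines away from the equator, which is usually tolerable for "nearby" searches within a city.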
43. Rank Expressions: Combined
• &rank-combined=text_relevance + 2.0 * geo +
0.5 * popularity + 0.3 * freshness
• &rank=combined
44. Section Summary
• Search API basics
• Customizing ranking
– Field weighting, popularity, freshness, GEO, combined
• Rank expression comparison tool
53. Document
Movie
• title
• description
• oscar1
• oscar2
• oscar3
title: Lincoln
description: ...
oscar1: Awards
oscar2: Awards/Best Actor
oscar3: Awards/Best Actor/Daniel Day Lewis
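The numbered fields in this example follow a simple pattern: field N holds the first N levels of the hierarchy joined with "/". A minimal sketch of generating them at indexing time follows; the function name `hierarchy_fields` is hypothetical.

```python
def hierarchy_fields(prefix, path):
    """Expand a hierarchy path into one field per level.

    The field at depth N (prefix + N) holds the first N path components
    joined with '/'.
    """
    return {
        "{}{}".format(prefix, depth): "/".join(path[:depth])
        for depth in range(1, len(path) + 1)
    }

fields = {"title": "Lincoln"}
fields.update(hierarchy_fields("oscar", ["Awards", "Best Actor", "Daniel Day Lewis"]))
print(fields)
```

Storing each level in its own field lets the interface request facet counts for exactly one level at a time.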
55. Drilldown
• bq=oscar1:'Awards'
• bq=oscar2:'Awards/Best Actor'
• bq=oscar3:'Awards/Best Actor/Daniel Day Lewis'
• bq=(and 'star' oscar2:'Awards/Best Actor')
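Drilldown queries like these can be generated from the facet path the user has selected so far. This is a sketch with a hypothetical helper name, `drilldown_query`; it assumes facet values contain no single quotes, since quote escaping is not handled.

```python
def drilldown_query(prefix, path, text=None):
    """Build a bq value that restricts results to one node of the hierarchy.

    The depth of `path` selects the field (oscar1, oscar2, ...); an optional
    text term is combined with the facet constraint using `and`.
    """
    constraint = "{}{}:'{}'".format(prefix, len(path), "/".join(path))
    if text is None:
        return constraint
    return "(and '{}' {})".format(text, constraint)

print(drilldown_query("oscar", ["Awards"]))
print(drilldown_query("oscar", ["Awards", "Best Actor"], text="star"))
```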
56. Section Summary
• Simple faceting
• Hierarchical faceting
• Hierarchical data handling
58. The Search Algorithm
• Locate documents that satisfy Boolean
constraints
– Usually intersection
• Relevance rank those documents
– Differentiated from databases by relevance
59. Performance Best Practices
• Match set size
• Text queries perform better than integer queries
• Complex relevance functions
60. Optimizing Index Size
• Trade off literal and uint for cost/performance
• Result fields matter most
• Enabling faceting increases size
61. Wrap Up
• Sourcing documents from various locations
• Building queries and ranking
• UI Components for faceting
• Getting the most out of your index
62. Developer example
Peter Simpkin
Solution Architect, Elsevier
66. Content Challenges:
• No central place for consumers to discover content
• It is not currently possible to search and retrieve atomic assets
• Assets are not reusable across products
Content Systems
Consumer Platforms
67. Empower our product development partners
Search Opportunities:
• Create a comprehensive inventory to easily discover content Elsevier owns
• Provide on-demand access to the granular / modular content they want
• Assets must be uniquely addressable
Enterprise Content Search Engine
68. Enterprise Content Search eco-system
[Diagram components: Amazon SWF, SDF metadata, E.U. Corporate Data center, Amazon S3, U.S. Corporate Data center, Amazon CloudSearch, Amazon DynamoDB, Federated Content Warehouse, Product Platform Data center]
70. Elsevier Technical Drivers & Approach
• Fully-managed, full-featured search service in the cloud
• Automatically scales for data & traffic
• Easy to set up and use
• PoC created in days
• Search engine as a service
• Pay-as-you-go pricing model
73. Optimised Nested Query
((not action:'D')
(or (and issn:'0022-1694' and type:'1.2'
and pubstartdate:..2013176 pubenddate:2005002..)
(and issn:'0022-1694' and type:'1.2'
and pubstartdate:2005001 pubstarttime:0..235959)
(and issn:'0022-1694' and type:'1.2'
and pubstartdate:2013177 pubstarttime:0..235959)
(and issn:'0022-1694' and type:'1.2'
and pubenddate:2005001 pubendtime:0..)
(and issn:'0022-1694' and type:'1.2'
and pubenddate:2013177 pubendtime:0..235959)))
• Response Time = 0.17ms
74. Amazon CloudSearch Observations
• Facilitates knowledge sharing on content matters across Elsevier’s product platforms
• Ability to leverage content infrastructure and capabilities across Elsevier’s divisions
• Easy to integrate with existing on-premise content systems
• Speed to market; allows developers to focus on building other core content strategy components
• Need to spend time optimising queries to maximise performance