4. • Vision, motivations and definitions
• Use cases for ecommerce, compliance, fraud and customer support
• Fusion and the evolution of recommenders
• Demo
• Future Directions
Agenda
6. • Many companies treat search, recommendations/discovery and analytics as different
beasts, yet:
• The same inputs that make search better can also drive recommendations and better
analytics
• Engagement analytics is the key:
• Your users give you engagement signals regarding the content that is relevant to them
• Over time, patterns emerge in similarities of behavior (the simplest possible pattern is just
“popularity”)
• These signals are often the biggest factor in both search relevance AND
recommendations
• In the enterprise, this is still the case, but the types of signals are often different (email,
IM)
Three Sides of the Same Coin
7. • Content: documents that are textually similar often make good “similar items” to
recommend
• Collaborative: documents that have been engaged with by the same people (and/or in the
same search context) are also similar, in a more subtle but often more powerful way
• Multi-Modal: why choose one? Try a smooth interpolation between a content-based
similarity metric and an engagement-based one!
Defining Moments
8. Search-Driven Online Retail
Increase conversions through a
personalized shopping experience with
best-in-class reliability and
performance.
CATALOG
DYNAMIC NAVIGATION
AND LANDING PAGES
INSTANT INSIGHTS AND
ANALYTICS
PERSONALIZED
SHOPPING EXPERIENCE
PROMOTIONS
USER HISTORY
Data Acquisition
Data Processing
Smart Access API
9. Search-Driven Compliance and Surveillance
Detect and investigate activity for
regulatory compliance, from one
unified view.
DATABASE
ACCURATE REAL-TIME
INFORMATION
CONTEXTUALLY-ENRICHED
INFORMATION
MESSAGES
LOGS
DATA EXPLORATION
AND VISUALIZATION
Data Acquisition
Indexing & Streaming
Smart Access API
10. Search-Driven Customer Service
Resolve customer issues quickly with
immediate access to relevant answers.
CUSTOMER
SELF-SERVICE
KNOWLEDGE BASE
PROACTIVE ALERTS AND
RECOMMENDATIONS
EXPERT TUNED
RELEVANCY DRIVEN BY
ANALYTICS AND INSIGHTS
CRM SUPPORT TICKETS &
ISSUE TRACKING
Data Acquisition
Data Processing
Smart Access API
12. Lucidworks Fusion Is Search-Driven Everything
• Drive next-generation relevance
via Content, Collaboration and
Context
• Harness best-in-class open
source: Apache Solr + Spark
• Simplify application
development and reduce
ongoing maintenance
CATALOG
DYNAMIC NAVIGATION
AND LANDING PAGES
INSTANT INSIGHTS AND
ANALYTICS
PERSONALIZED
SHOPPING EXPERIENCE
PROMOTIONS
USER HISTORY
Data Acquisition
Indexing & Streaming
Smart Access API
Recommendations &
Alerts
Analytics & Insights
Extreme Relevancy
Access data from
anywhere to build
intelligent, data-driven
applications.
14. • Fusion
• Recommenders API
• Machine Learning pipeline stages
• Scheduling
• Solr:
• More Like This + Signals
• Spark:
• MLlib, Mahout, custom
Key Platform Tech
15. • Solr ships with a built-in query parser, MoreLikeThis, which takes a given document and:
• Extracts nontrivial terms from specified fields in it
• Builds an “OR” query to search for the closest matches (similar in effect to a cosine similarity computation)
• Offers many tuning knobs for filtering uninformative terms out of the query
• TF-IDF works well, but other representations are possible: LSI, LDA, word2vec (W2V)
Content-focused
{!mlt qf=body,suggest,subject,title mintf=2 mindf=5 minwl=3}<DOC_ID>
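To make the idea concrete, here is a minimal Python sketch of the kind of term selection described above. It is a toy approximation and not Solr's implementation: the function names, the `max_terms` parameter and the exact scoring formula are assumptions made for illustration. The thresholds mirror the `mintf`, `mindf` and `minwl` knobs from the query on this slide.

```python
import math
from collections import Counter


def mlt_local_params(doc_id, fields, mintf, mindf, minwl):
    """Build a local-params query string of the form shown on the slide."""
    return f"{{!mlt qf={','.join(fields)} mintf={mintf} mindf={mindf} minwl={minwl}}}{doc_id}"


def mlt_like_query(doc_tokens, corpus, mintf=2, mindf=5, minwl=3, max_terms=5):
    """Toy MoreLikeThis-style term selection (hypothetical, not Solr's code).

    doc_tokens: list of tokens from the source document
    corpus:     list of token lists, one per indexed document
    """
    n_docs = len(corpus)
    # Document frequency: how many corpus documents contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    tf = Counter(doc_tokens)

    scored = []
    for term, freq in tf.items():
        # Drop terms that are too rare in the doc, too rare in the corpus,
        # or too short to be informative.
        if freq < mintf or df[term] < max(mindf, 1) or len(term) < minwl:
            continue
        idf = math.log(n_docs / df[term]) + 1.0
        scored.append((freq * idf, term))

    # Keep the highest-scoring terms and OR them together, boosted by score.
    top = sorted(scored, reverse=True)[:max_terms]
    return " OR ".join(f"{term}^{score:.2f}" for score, term in top)
```

Calling `mlt_local_params("doc-42", ["body", "suggest", "subject", "title"], 2, 5, 3)` reproduces the query string above with a hypothetical document id in place of `<DOC_ID>`.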
16. “People who bought X also bought Y” / “Movies recommended for you”
Collaborative Filtering
Search User/Item Index
Top K users who’ve interacted with this Item
Search and Rollup on User/Item Index
Top Y docs
Current Doc
Filter by context
Profit
User/Item Index
Offline Tasks
User/Item Signals
Math!
17. • Fusion CF-based “documents like this” pipeline stages:
• Sub-query: search the aggregated signals index for the current doc_id,
extracting the top-K pairs of (user_id, weight)
• Sub-query: search that index again with a weighted OR query:
(user_id:user_id_1^weight_1 OR user_id:user_id_2^weight_2 OR … )
• Roll-up: topN(sum(score_i * weight_i))
• Sub-query: fetch the documents for these top-N doc_ids from the
primary Solr index
Collaborative Filtering: step by step in Fusion
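The stages above can be sketched in plain Python over an in-memory signals table. This is an illustrative approximation only: the `SIGNALS` rows and function names are hypothetical, the raw signal weight stands in for the Solr relevance score of each matching row, and the final fetch from the primary index is omitted.

```python
from collections import defaultdict

# Hypothetical aggregated signals: (user_id, doc_id, weight) rows.
SIGNALS = [
    ("u1", "docA", 3.0), ("u1", "docB", 2.0),
    ("u2", "docA", 1.0), ("u2", "docC", 4.0),
    ("u3", "docB", 5.0), ("u3", "docC", 1.0),
]


def top_users_for_doc(signals, doc_id, k):
    """Stage 1: top-K (user_id, weight) pairs for the current document."""
    pairs = [(user, weight) for user, doc, weight in signals if doc == doc_id]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]


def similar_docs(signals, doc_id, k=10, n=5):
    """Stages 2 and 3: weighted OR over those users, then roll up per doc."""
    user_boosts = dict(top_users_for_doc(signals, doc_id, k))
    totals = defaultdict(float)
    for user, doc, weight in signals:
        # A row matches the OR query if its user is one of the top-K users;
        # the current document itself is excluded from the results.
        if user in user_boosts and doc != doc_id:
            # score_i (assumed: the row's signal weight) * weight_i (the boost)
            totals[doc] += weight * user_boosts[user]
    # topN(sum(score_i * weight_i))
    return sorted(totals.items(), key=lambda p: p[1], reverse=True)[:n]
```

With the sample rows, `similar_docs(SIGNALS, "docA")` ranks `docB` (6.0) ahead of `docC` (4.0), because the user who engaged most with `docA` also engaged with `docB`.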
18. • Both content-based and CF recommenders use features of the documents to generate a
similarity metric
• Content uses the tokens in the document
• CF uses the ids of users who have engaged with it
• The two metrics can be combined as a weighted sum, allowing a “slider” between them
• More sophisticated similarity techniques that apply to a (doc, token) matrix can often be applied to a
(doc, userId) matrix, or even a joint (doc, (token or userId)) concatenated matrix
• There is a cost to such techniques: harder to maintain, harder to A/B test variations
Multi-modal
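A minimal sketch of the “slider”, assuming each document is represented by two sparse feature maps, one keyed by token and one keyed by user id. The `alpha` parameter, the dictionary layout and the use of cosine similarity for both sides are illustrative assumptions, not something the slides specify.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def blended_similarity(doc_a, doc_b, alpha):
    """Weighted sum of the two metrics.

    alpha = 0.0 gives pure content similarity (token features);
    alpha = 1.0 gives pure collaborative similarity (user-id features).
    """
    content = cosine(doc_a["tokens"], doc_b["tokens"])
    collaborative = cosine(doc_a["users"], doc_b["users"])
    return (1.0 - alpha) * content + alpha * collaborative
```

Moving `alpha` from 0 to 1 interpolates smoothly between the content-based and the engagement-based view of the same pair of documents, which also makes the blend straightforward to vary in an A/B test.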
19. • Basics:
• 26 Apache Projects registered so far plus LW web properties
• 93 datasources* including email, GitHub, JIRA*, Website and Wiki
• Fusion 2.4
• Signals everywhere
• UI based on Lucidworks View
• ASF Mail archives mirrored at: http://asfmail.lucidworks.io
Demo
http://searchhub.lucidworks.com