Webinar: Search and Recommenders

2016
OCTOBER 11-14 
BOSTON, MA
http://lucenerevolution.com

Search and Recommenders
Grant Ingersoll
@gsingers
CTO, Lucidworks
Jake Mannix
@pbrane
Lead Data Engineer, Lucidworks

• Vision, motivations and definitions
• Use cases for ecommerce, compliance, fraud and customer support
• Fusion and the evolution of recommenders
• Demo
• Future Directions
Agenda

Search-Driven Everything
Customer Service
Customer Insights
Fraud Surveillance
Research Portal
Online Retail
Digital Content

• Many companies treat search, recommendations/discovery and analytics as different beasts, yet:
• The same inputs that make search better can also drive recommendations and better analytics
• Engagement analytics is the key:
• Your users give you engagement signals regarding the content that is relevant to them
• Over time, patterns emerge in similarities of behavior (the simplest possible pattern is just “popularity”)
• These signals are often the biggest factor in both search relevance AND recommendations
• In the enterprise, this is still the case, but the types of signals are often different (email, IM)
Three Sides of the Same Coin

• Content: documents which are textually similar are often good as “similar items” to be recommended
• Collaborative: documents which have been engaged with by the same people (and/or in the same search context) are also similar in a more subtle, but often more powerful way
• Multi-Modal: why choose one? Try a smooth interpolation between a content-based similarity metric and an engagement-based one!
Defining Moments

Search-Driven Online Retail
Increase conversions with a personalized shopping experience with best in class reliability and performance.
CATALOG
DYNAMIC NAVIGATION AND LANDING PAGES
INSTANT INSIGHTS AND ANALYTICS
PERSONALIZED SHOPPING EXPERIENCE
PROMOTIONS USER HISTORY
Data Acquisition
Data Processing
Smart Access API

Search-Driven Compliance and Surveillance
Detect and investigate activity for regulatory compliance, from one unified view.
DATABASE
ACCURATE REAL-TIME INFORMATION
CONTEXTUALLY-ENRICHED INFORMATION
MESSAGES LOGS
DATA EXPLORATION AND VISUALIZATION
Data Acquisition
Indexing & Streaming
Smart Access API

Search-Driven Customer Service
Resolve customer issues quickly with immediate access to relevant answers.
CUSTOMER SELF-SERVICE
KNOWLEDGE BASE
PROACTIVE ALERTS AND RECOMMENDATIONS
EXPERT TUNED RELEVANCY DRIVEN BY ANALYTICS AND INSIGHTS
CRM SUPPORT TICKETS & ISSUE TRACKING
Data Acquisition
Data Processing
Smart Access API

Lucidworks Fusion Is Search-Driven Everything
• Drive next generation relevance via Content, Collaboration and Context
• Harness best in class Open Source: Apache Solr + Spark
• Simplify application development and reduce ongoing maintenance
CATALOG
DYNAMIC NAVIGATION AND LANDING PAGES
INSTANT INSIGHTS AND ANALYTICS
PERSONALIZED SHOPPING EXPERIENCE
PROMOTIONS USER HISTORY
Data Acquisition
Indexing & Streaming
Smart Access API
Recommendations & Alerts
Analytics & Insights
Extreme Relevancy
Access data from anywhere to build intelligent, data-driven applications.

Fusion Architecture
REST API
Worker Worker Cluster Mgr.
Apache Spark
Shards Shards
Apache Solr
HDFS (Optional)
Shared Config Mgmt
Leader Election
Load Balancing
ZK 1
Apache Zookeeper
ZK N
DATABASE WEB FILE LOGS HADOOP CLOUD
Connectors
Alerting/Messaging
NLP
Pipelines
Blob Storage
Scheduling
Recommenders/Signals
…
Core Services
Admin UI
SECURITY BUILT-IN
Lucidworks View

• Fusion
• Recommenders API
• Machine Learning pipeline stages
• Scheduling
• Solr:
• More Like This + Signals
• Spark:
• MLlib, Mahout, custom
Key Platform Tech

• Solr comes built-in with a query parser, MoreLikeThis, which takes a given document, and:
• Extracts nontrivial terms from specified fields in it
• Builds an “OR” query to search for closest matches (like a cosine similarity computation)
• Has many knobs to tune regarding “data-cleaning” non-useful terms from the query
• TF-IDF is great, but there are other metrics possible: LSI, LDA, W2V
Content-focused
{!mlt qf=body,suggest,subject,title mintf=2 mindf=5 minwl=3}<DOC_ID>
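The query string above can be assembled programmatically. The following is a minimal sketch, not Fusion's or Solr's own client code: the helper names `mlt_query` and `mlt_request_url`, the host `localhost:8983`, and the collection name `demo` are assumptions made for illustration. Only the `{!mlt ...}` local-params syntax and the `qf`, `mintf`, `mindf` and `minwl` parameters come from the slide.

```python
from urllib.parse import urlencode


def mlt_query(doc_id, fields, mintf=2, mindf=5, minwl=3):
    """Build a MoreLikeThis query-parser string for one document id.

    mintf: minimum term frequency in the source document
    mindf: minimum number of documents a term must appear in
    minwl: minimum word length for a term to be considered
    """
    qf = ",".join(fields)
    return f"{{!mlt qf={qf} mintf={mintf} mindf={mindf} minwl={minwl}}}{doc_id}"


def mlt_request_url(base_url, doc_id, fields, rows=5):
    """URL-encode the MLT query into a /select request (hypothetical endpoint)."""
    params = {"q": mlt_query(doc_id, fields), "rows": rows}
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    fields = ["body", "suggest", "subject", "title"]
    print(mlt_query("doc42", fields))
    # Host and collection name below are placeholders.
    print(mlt_request_url("http://localhost:8983/solr/demo", "doc42", fields))
```

Raising `mintf`, `mindf` or `minwl` drops more low-value terms from the generated OR query, which is the "data-cleaning" tuning the bullets refer to.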

“People who bought X also bought Y” / “Movies recommended for you”
Collaborative Filtering
Search User/Item Index
Top K users who’ve interacted with this Item
Search and Rollup on User/Item Index
Top Y docs
Current Doc
Filter by context
Profit
User/Item Index
Offline Tasks
User/Item Signals
Math!

• Fusion CF-based “documents like this” pipeline stages:
• Sub-query: search the aggregated signals index for the current doc_id, extracting the top-K pairs of (user_id, weight)
• Sub-query: search that table again with a weighted OR query: (user_id:user_id_1^weight_1 OR user_id:user_id_2^weight_2 OR … )
• Roll-up: topN(sum(score_i * weight_i))
• Sub-query: fetch the documents from the primary Solr index for these top N doc_ids
Collaborative Filtering: step by step in Fusion
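The pipeline stages on this slide can be mimicked in memory to see what the roll-up computes. This is a sketch under stated assumptions, not the Fusion pipeline or the SimpleTwoHopRecommender job: the `SIGNALS` rows and the function names are hypothetical, and the stored signal weight stands in for the Solr score (score_i) that the weighted OR query would return.

```python
from collections import defaultdict

# Hypothetical aggregated signals: (user_id, doc_id, weight) rows.
SIGNALS = [
    ("u1", "docA", 3.0),
    ("u1", "docB", 2.0),
    ("u2", "docA", 1.0),
    ("u2", "docC", 4.0),
    ("u3", "docB", 5.0),
]


def top_users_for_doc(signals, doc_id, k):
    """First sub-query: top-K (user_id, weight) pairs for the current doc."""
    pairs = [(user, w) for user, doc, w in signals if doc == doc_id]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]


def similar_docs(signals, doc_id, k=10, n=5):
    """Second sub-query plus roll-up: topN(sum(score_i * weight_i))."""
    user_weights = dict(top_users_for_doc(signals, doc_id, k))
    totals = defaultdict(float)
    for user, doc, score in signals:
        # Only rows from the top-K users match the weighted OR query;
        # the current doc itself is excluded from its own recommendations.
        if user in user_weights and doc != doc_id:
            totals[doc] += score * user_weights[user]
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    print(similar_docs(SIGNALS, "docA"))
```

The returned doc_ids would then be used for the final sub-query that fetches the full documents from the primary index.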

• Both content-based and CF recommenders use features of the documents to generate a similarity metric
• Content uses the tokens in the document
• CF uses the ids of users who have engaged with it
• Metrics can be weighted-summed, allowing a “slider” between the two
• Fancy similarity techniques that can be applied to a (doc, token) matrix can often be applied to a (doc, userId) matrix, or even a joint (doc, (token or userId)) concatenated matrix
• There is a cost to such techniques: harder to maintain, harder to A/B test variations
Multi-modal
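The “slider” between the two metrics can be sketched as a weighted sum. This is an illustrative example only: the function names, the max-normalization step, and the parameter name `alpha` are assumptions, not something the slides specify. Normalizing first matters because content scores and CF scores are usually on different scales.

```python
def normalize(scores):
    """Scale scores into [0, 1] by dividing by the largest value."""
    top = max(scores.values(), default=0.0)
    if top <= 0.0:
        return {doc: 0.0 for doc in scores}
    return {doc: s / top for doc, s in scores.items()}


def blend_scores(content_scores, cf_scores, alpha):
    """Weighted sum of two similarity metrics.

    alpha = 0.0 uses only content similarity,
    alpha = 1.0 uses only collaborative (engagement) similarity.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    content = normalize(content_scores)
    cf = normalize(cf_scores)
    docs = set(content) | set(cf)
    blended = {
        doc: (1.0 - alpha) * content.get(doc, 0.0) + alpha * cf.get(doc, 0.0)
        for doc in docs
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    content = {"a": 1.0, "b": 0.5}   # hypothetical content-based scores
    cf = {"b": 1.0, "c": 0.5}        # hypothetical CF scores
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, blend_scores(content, cf, alpha))
```

A document missing from one metric contributes zero from that side, so items with no engagement history can still surface through content similarity when `alpha` is below 1.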

• Basics:
• 26 Apache Projects registered so far plus Lucidworks web properties
• 93 datasources* including email, GitHub, JIRA*, Website and Wiki
• Fusion 2.4
• Signals everywhere
• UI based on Lucidworks View
• ASF Mail archives mirrored at: http://asfmail.lucidworks.io
Demo
http://searchhub.lucidworks.com

Implementation Details
http://github.com/lucidworks/searchhub
Branch: GH-28-doc-view
Key Source Code
UI
Angular Directives:
perdocument
recommendations
Offline Tasks
Spark Jobs:
mail_thread_signal_creation_job.json
SimpleTwoHopRecommender.scala
Fusion Pipelines
Query:
lucidfind-recommendations
cf-similar-items-batch-rec
cf-similar-items-rec

• Ensemble and Click-based approaches
• https://github.com/lucidworks/searchhub/issues/40
• Deploy live
• User registrations
Future Work

Resources
Fusion: http://www.lucidworks.com/products/fusion
Search Hub: http://searchhub.lucidworks.com
Company: http://www.lucidworks.com
Our blog: http://www.lucidworks.com/blog
Twitter: @gsingers, @pbrane
