© MapR Technologies, confidential
Introduction to Mahout
Topic For This Section
• What is recommendation?
• What makes it different?
• What is multi-model recommendation?
• How can I build it using common household items?
Oh … Also This
• Detailed break-down of a recommendation system running with Mahout on MapR
• With code examples
I may have to summarize just a bit
Part 1:
5 minutes of background
Part 2:
5 minutes: I want a pony
Part 1:
5 minutes of background
What Does Machine Learning Look Like?
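\[
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\]
\[
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]
\[
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]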
O(κ k d + k^3 d) = O(k^2 d log n + k^3 d) for small k, high quality
O(κ d log k) or O(d log κ log k) for larger k, looser quality
But tonight we’re going to show you how to keep it simple yet powerful…
Recommendations as Machine Learning
• Recommendation:
– Involves observation of interactions between people taking action (users) and items for input data to the recommender model
– Goal is to suggest additional appropriate or desirable interactions
– Applications include: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts
Part 2:
How recommenders work
(I still want a pony)
Recommendations
Recap: Behavior of a crowd helps us understand what individuals will do
Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple
Recommendations
What else would Bob like?
A puppy, of course!
You get the idea of how recommenders work…
(By the way, like me, Bob also wants a pony)
Recommendations
What if everybody gets a pony?
What else would you recommend for Amelia?
If everybody gets a pony, it’s not a very good indicator of what else to predict...
Problems with Raw Co-occurrence
• Very popular items co-occur with everything (it doesn’t help that everybody wants a pony…)
– Examples: Welcome document; Elevator music
• Widespread occurrence is not interesting
– Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
• What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to base recommendation
Get Useful Indicators from Behaviors
• Use log files to build history matrix of users x items
– Remember: this history of interactions will be sparse compared to all potential combinations
• Transform to a co-occurrence matrix of items x items
• Look for useful co-occurrence by looking for anomalous co-occurrences to make an indicator matrix
– Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
– RowSimilarityJob in Apache Mahout uses LLR
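The first two steps in the list above (log files to history matrix, history matrix to co-occurrence matrix) can be sketched in a few lines of plain Python. This is only an illustration and not Mahout's implementation: the `log` contents are hypothetical, loosely following the Alice, Bob, and Charles story, and the sparse dictionary layout is an assumption made for readability.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical interaction log: one (user, item) pair per event.
log = [
    ("Alice", "apple"), ("Alice", "puppy"), ("Alice", "pony"),
    ("Bob", "apple"), ("Bob", "pony"),
    ("Charles", "bicycle"), ("Charles", "pony"),
    ("Alice", "apple"),  # a repeated event still counts as one interaction
]

# History matrix (users x items), stored sparsely as user -> set of items.
history = defaultdict(set)
for user, item in log:
    history[user].add(item)

# Co-occurrence matrix (items x items): number of users who touched both items.
cooccurrence = defaultdict(int)
for items in history.values():
    for a, b in combinations(sorted(items), 2):
        cooccurrence[(a, b)] += 1
        cooccurrence[(b, a)] += 1

for (a, b), count in sorted(cooccurrence.items()):
    print(a, b, count)
```

In this toy data the pony co-occurs with every other item, which is the situation the anomaly test is meant to filter out.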
Log Files
[Figure: interaction log with one entry per event for Alice, Bob, and Charles]
Log Files
users: u1, u3, u2, u1, u3, u2, u1
items: t1, t4, t3, t2, t3, t3, t1
History Matrix: Users by Items
Alice ✔ ✔ ✔
Bob ✔ ✔
Charles ✔ ✔
[Figure: check marks show which items each user interacted with; item columns not recoverable]
Co-occurrence Matrix: Items by Items
[Figure: item-by-item matrix of co-occurrence counts (values 0, 1, and 2)]
How do you tell which co-occurrences are useful?
Use LLR test to turn co-occurrence into indicators…
Co-occurrence Binary Matrix
[Figure: 2 x 2 binary matrix with a "not" row and a "not" column]
Spot the Anomaly
• Root LLR is roughly like standard deviations
• In Apache Mahout, RowSimilarityJob uses LLR
        A      not A
B       13     1000
not B   1000   100,000
Root LLR: 0.90
        A      not A
B       1      0
not B   0      2
Root LLR: 1.95
        A      not A
B       1      0
not B   0      10,000
Root LLR: 4.52
        A      not A
B       10     0
not B   0      100,000
Root LLR: 14.3
What conclusion do you draw from each situation?
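The root LLR scores for the four tables can be checked with a short calculation. The sketch below uses the standard log-likelihood ratio (G²) statistic for a 2 x 2 contingency table; the function names are illustrative and this is not Mahout's own code, although it should agree with the slide values to within rounding.

```python
import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2 x 2 table of counts."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero cell contributes nothing to the sum
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total

def root_llr(k11, k12, k21, k22):
    """Square root of LLR, which reads roughly like standard deviations."""
    return math.sqrt(max(0.0, llr(k11, k12, k21, k22)))

tables = [
    (13, 1000, 1000, 100000),
    (1, 0, 0, 2),
    (1, 0, 0, 10000),
    (10, 0, 0, 100000),
]
for table in tables:
    print(table, round(root_llr(*table), 2))
```

The first table has the largest raw co-occurrence count (13) but the lowest score, because that count is close to what independence would predict.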
Co-occurrence Matrix
Recap: Use LLR test to turn co-occurrence into indicators
Indicator Matrix: Anomalous Co-Occurrence
[Figure: indicator matrix with check marks on the anomalous co-occurrences]
Result: The marked row will be added to the indicator field in the item document…
Indicator Matrix
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
Note: the indicator field is added directly to meta-data for a document in the Solr index. No need to create a separate index for indicators.
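As a rough illustration of that note, the indicator row can simply be merged into the existing item meta-data before the document is sent for indexing. The field names follow the slide; the helper `with_indicators` and the dictionary layout are assumptions made for this sketch, and no Solr client call is shown.

```python
import json

# Item meta-data as it might already exist in the catalog (from the slide).
item = {
    "id": "t4",
    "title": "puppy",
    "desc": "The sweetest little puppy ever.",
    "keywords": ["puppy", "dog", "pet"],
}

# One row of the indicator matrix: items whose co-occurrence with t4
# passed the LLR test (hypothetical storage format).
indicator_rows = {"t4": ["t1"]}

def with_indicators(item, indicator_rows):
    """Return a copy of the item document with an added indicators field."""
    doc = dict(item)
    doc["indicators"] = list(indicator_rows.get(item["id"], []))
    return doc

doc = with_indicators(item, indicator_rows)
print(json.dumps(doc, indent=2))
```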
Internals of the Recommender Engine
Looking Inside LucidWorks
What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles?
Recommendation is “1710: Chuck Berry”
Real-time recommendation query and results: Evaluation
Search-based Recommendations
• Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant ids
– Indicator industry (SIC) ids
– Indicator offers
– Indicator text
– Local top40
• Sample query
– Current location
– Recent merchant descriptions
– Recent merchant ids
– Recent SIC codes
– Recent accepted offers
– Local top40
Figure annotations: original data and meta-data; derived from cooccurrence and cross-occurrence analysis; recommendation query
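One way to picture the sample query is as an ordinary search request whose terms come from the user's recent behavior and are matched against the indicator fields. The sketch below only assembles a query string in Solr's standard `field:(term term)` syntax; the field names (`indicator_merchant_ids`, `indicator_sic_ids`, `indicator_offers`) and the function are hypothetical, and a real deployment would also handle location, text fields, and weighting.

```python
from urllib.parse import urlencode, parse_qs

def recommendation_query(recent_merchant_ids, recent_sic_codes, recent_offers, rows=10):
    """Build URL parameters for a search whose terms are recent user behavior."""
    fields = (
        ("indicator_merchant_ids", recent_merchant_ids),  # hypothetical field name
        ("indicator_sic_ids", recent_sic_codes),          # hypothetical field name
        ("indicator_offers", recent_offers),              # hypothetical field name
    )
    clauses = [
        "{}:({})".format(name, " ".join(values))
        for name, values in fields
        if values
    ]
    # With no history at all, fall back to a match-all query.
    q = " OR ".join(clauses) if clauses else "*:*"
    return urlencode({"q": q, "rows": rows})

params = recommendation_query(["m17", "m42"], ["5812"], [])
print(params)
print(parse_qs(params)["q"][0])
```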
For example
• Users enter queries (A)
– (actor = user, item = query)
• Users view videos (B)
– (actor = user, item = video)
• A^T A gives query recommendation
– “did you mean to ask for”
• B^T B gives video recommendation
– “you might like these videos”
The punch-line
• B^T A recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
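A tiny numeric sketch may make the cross-occurrence product concrete. The matrices below are made up for illustration (three users, two queries, three videos), and the helpers are plain-Python stand-ins for a linear algebra library; no LLR filtering is applied here.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

# A: users x queries (1 = the user entered that query). Hypothetical data.
A = [
    [1, 0],
    [1, 0],
    [0, 1],
]
# B: users x videos (1 = the user viewed that video). Hypothetical data.
B = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]

# B^T A: videos x queries, counting users who both entered the query
# and viewed the video.
cross = matmul(transpose(B), A)

# Score videos for someone who has just entered query 0.
h = [[1], [0]]
scores = [row[0] for row in matmul(cross, h)]
best_video = max(range(len(scores)), key=scores.__getitem__)
print(cross, scores, best_video)
```

Nothing in this calculation looks at video titles or descriptions; the ranking comes only from behavior.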
Real-life example
• Query: “Paco de Lucia”
• Conventional meta-data search results:
– “hombres de paco” times 400
– not much else
• Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff
Hypothetical Example
• Want a navigational ontology?
• Just put labels on a web page with traffic
– This gives A = users x label clicks
• Remember viewing history
– This gives B = users x items
• Cross recommend
– B^T A = label to item mapping
• After several users click, results are whatever users think they should be
Nice. But we can do better?
A Quick Simplification
• Users who do h (a vector of things a user has done)
• Also do r
• A h: A translates things into users
• A^T (A h): User-centric recommendations (transpose translates back to things)
• (A^T A) h: Item-centric recommendations (change the order of operations)
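A small worked case shows that the two orders of operation give the same result. The matrix and history vector here are made up for illustration: two users and two things, where user 1 did both things, user 2 did only thing 1, and the new user's history h contains thing 1.

```latex
\[
A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \qquad
h = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]
\[
\text{User-centric: } A h = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
A^T (A h) = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 2 \\ 1 \end{bmatrix}
\]
\[
\text{Item-centric: } A^T A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}, \qquad
(A^T A) h = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 2 \\ 1 \end{bmatrix}
\]
```

The item-centric form is the one that allows A^T A to be computed ahead of time, leaving only a product with h at query time.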
Symmetry Gives Cross Recommendations
• (A^T A) h: Conventional recommendations with off-line learning
• (B^T A) h: Cross recommendations
A: users x things
[A_1 A_2]: users x (thing type 1, thing type 2)
\[
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\]
\[
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]
\[
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]
Bonus Round:
When worse is better
The Real Issues After First Production
• Exploration
• Diversity
• Speed
• Not the last fraction of a percent
Result Dithering
• Dithering is used to re-order recommendation results
– Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual performance much better
“Made more difference than any other change”
Why Dithering Works
[Figure: real-time recommender, log files, and overnight training]
Exploring The Second Page
Simple Dithering Algorithm
• Synthetic score from log rank plus Gaussian: s = log r + N(0, log ε)
• Pick noise scale to provide desired level of mixing: Δr / r ∝ ε
• Typically ε ∈ [1.5, 3]
• Also… use floor(t/T) as seed
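The dithering recipe is short enough to sketch directly. In the version below, `log ε` is treated as the standard deviation of the Gaussian noise, which is an assumption (the slide does not say whether it is a standard deviation or a variance); the function name, the `period` parameter standing in for T, and the default values are also illustrative.

```python
import math
import random

def dither(results, epsilon=2.0, t=0.0, period=600.0):
    """Re-order ranked results using the score s = log(rank) + N(0, log(epsilon)).

    Seeding with floor(t / period) keeps the order stable within one time window.
    Assumes epsilon >= 1; epsilon == 1 leaves the order unchanged.
    """
    rng = random.Random(math.floor(t / period))
    sigma = math.log(epsilon)  # assumption: log(epsilon) is a standard deviation
    scored = []
    for rank, item in enumerate(results, start=1):
        score = math.log(rank) + rng.gauss(0.0, sigma)
        scored.append((score, rank, item))
    scored.sort()  # lowest synthetic score first; rank breaks exact ties
    return [item for _, _, item in scored]

ranked = list(range(1, 21))
print(dither(ranked, epsilon=2.0, t=0.0)[:8])
print(dither(ranked, epsilon=2.0, t=700.0)[:8])
```

Because the noise is added to the log of the rank, top results tend to move only a little while deeper results can occasionally jump onto the first page, which is the exploration effect the talk is after.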
Example … ε = 2
1 2 8 3 9 15 7 6
1 8 14 15 3 2 22 10
1 3 8 2 10 5 7 4
1 2 10 7 3 8 6 14
1 5 33 15 2 9 11 29
1 2 7 3 5 4 19 6
1 3 5 23 9 7 4 2
2 4 11 8 3 1 44 9
2 3 1 4 6 7 8 33
3 4 1 2 10 11 15 14
11 1 2 4 5 7 3 14
1 8 7 3 22 11 2 33
Lesson:
Exploration is good
Part 3:
What about that worked example?
Analyze with Map-Reduce
[Figure: complete history, cooccurrence (Mahout), item meta-data, Solr indexing with two Solr indexers, index shards]
Deploy with Conventional Search System
[Figure: user history, web tier, Solr search, Solr indexers, item meta-data, index shards]
Histor y of HAM Radio presentation slidevu2urc
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

My talk about recommendation and search to the Hive

  • 1. © MapR Technologies, confidential Introduction to Mahout
  • 2. Topic For This Section • What is recommendation? • What makes it different? • What is multi-model recommendation? • How can I build it using common household items?
  • 3. Oh … Also This • Detailed break-down of a recommendation system running with Mahout on MapR • With code examples
  • 4. I may have to summarize
  • 5. I may have to summarize just a bit
  • 6. Part 1: 5 minutes of background
  • 7. Part 2: 5 minutes: I want a pony
  • 8.
  • 9. Part 1: 5 minutes of background
  • 10. What Does Machine Learning Look Like?
  • 11. What Does Machine Learning Look Like?
     $$\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$$
     $$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
     $$r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
     $O(\kappa k d + k^3 d) = O(k^2 d \log n + k^3 d)$ for small $k$, high quality. $O(\kappa d \log k)$ or $O(d \log \kappa \log k)$ for larger $k$, looser quality. But tonight we’re going to show you how to keep it simple yet powerful…
  • 12. Recommendations as Machine Learning • Recommendation: – Involves observation of interactions between people taking action (users) and items for input data to the recommender model – Goal is to suggest additional appropriate or desirable interactions – Applications include: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts
  • 13.
  • 14.
  • 15. Part 2: How recommenders work (I still want a pony)
  • 16. Recommendations Recap: Behavior of a crowd helps us understand what individuals will do
  • 17. Recommendations: Alice got an apple and a puppy. Charles got a bicycle. (Figure labels: Alice, Charles)
  • 18. Recommendations: Alice got an apple and a puppy. Charles got a bicycle. Bob got an apple. (Figure labels: Alice, Bob, Charles)
  • 19. Recommendations: What else would Bob like? (Figure labels: Alice, Bob, Charles)
  • 20. Recommendations: A puppy, of course! (Figure labels: Alice, Bob, Charles)
  • 21. You get the idea of how recommenders work… (By the way, like me, Bob also wants a pony)
  • 22. Recommendations: What if everybody gets a pony? What else would you recommend for Amelia? (Figure labels: Alice, Bob, Charles, Amelia)
  • 23. Recommendations: If everybody gets a pony, it’s not a very good indicator of what else to predict... (Figure labels: Alice, Bob, Charles, Amelia)
  • 24. Problems with Raw Co-occurrence • Very popular items co-occur with everything (it doesn’t help that everybody wants a pony…) – Examples: Welcome document; Elevator music • Widespread occurrence is not interesting – Unless you want to offer an item that is constantly desired, such as razor blades (or ponies) • What we want is anomalous co-occurrence – This is the source of interesting indicators of preference on which to base recommendation
  • 25. Get Useful Indicators from Behaviors • Use log files to build a history matrix of users x items – Remember: this history of interactions will be sparse compared to all potential combinations • Transform to a co-occurrence matrix of items x items • Find useful co-occurrences by looking for anomalous co-occurrences, and use them to make an indicator matrix – Log Likelihood Ratio (LLR) can help judge which co-occurrences can be used with confidence as indicators of preference – RowSimilarityJob in Apache Mahout uses LLR
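The first two steps on slide 25 (log files to history matrix, history matrix to co-occurrence matrix) can be sketched in a few lines of NumPy. This is only an illustration: the log records below are hypothetical, patterned on the apple/puppy/pony cartoon, and a dense array stands in for what would be a sparse matrix at real scale.

```python
import numpy as np

# Hypothetical (user, item) log records, patterned on the cartoon example.
log = [
    ("alice", "apple"), ("alice", "puppy"), ("alice", "pony"),
    ("bob", "apple"), ("bob", "pony"),
    ("charles", "bicycle"), ("charles", "pony"),
]

users = sorted({u for u, _ in log})
items = sorted({i for _, i in log})
u_idx = {u: k for k, u in enumerate(users)}
i_idx = {i: k for k, i in enumerate(items)}

# History matrix A (users x items): 1 where the user interacted with the item.
A = np.zeros((len(users), len(items)), dtype=int)
for u, i in log:
    A[u_idx[u], i_idx[i]] = 1

# Co-occurrence matrix (items x items): entry (i, j) counts the users who
# interacted with both item i and item j.
C = A.T @ A
np.fill_diagonal(C, 0)  # self co-occurrence is not informative here
```

In this toy data the pony co-occurs with every other item, which is exactly the "popular items co-occur with everything" problem that the LLR test is meant to filter out.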
  • 26. Log Files (figure: log entries for Alice, Bob, Charles, Alice, Bob, Charles, Alice)
  • 27. Log Files (figure: users u1 u3 u2 u1 u3 u2 u1; items t1 t4 t3 t2 t3 t3 t1)
  • 28. History Matrix: Users by Items (figure: rows Alice, Bob, Charles; ✔ marks each user-item interaction)
  • 29. Co-occurrence Matrix: Items by Items (figure: item-by-item matrix of co-occurrence counts) How do you tell which co-occurrences are useful?
  • 30. Co-occurrence Matrix: Items by Items (figure: item-by-item matrix of co-occurrence counts) Use LLR test to turn co-occurrence into indicators…
  • 31. Co-occurrence Binary Matrix (figure: binary table labeled “1” and “not 1”)
  • 32. Spot the Anomaly. Four 2 × 2 tables of counts (columns: A, not A; rows: B, not B). What conclusion do you draw from each situation?
       Table         B and A   B, not A   not B, A   not B, not A
       upper left    13        1000       1000       100,000
       upper right   1         0          0          2
       lower left    1         0          0          10,000
       lower right   10        0          0          100,000
  • 33. Spot the Anomaly • Root LLR is roughly like standard deviations • In Apache Mahout, RowSimilarityJob uses LLR. What conclusion do you draw from each situation?
       Table         B and A   B, not A   not B, A   not B, not A   root LLR
       upper left    13        1000       1000       100,000        0.90
       upper right   1         0          0          2              1.95
       lower left    1         0          0          10,000         4.52
       lower right   10        0          0          100,000        14.3
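The root-LLR figures on slide 33 can be reproduced with the standard log-likelihood ratio (G²) statistic for a 2 × 2 table, written here in its entropy form. This is a minimal sketch: the function names are chosen for illustration, and it is not presented as Mahout's own code.

```python
import math

def x_log_x(x):
    # Convention: 0 * log(0) = 0
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized entropy of a set of counts: N log N - sum(k log k)
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2 x 2 table of counts.

    k11: A and B together, k12: B without A,
    k21: A without B,      k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off

tables = [
    (13, 1000, 1000, 100_000),
    (1, 0, 0, 2),
    (1, 0, 0, 10_000),
    (10, 0, 0, 100_000),
]
for t in tables:
    print(t, round(math.sqrt(llr(*t)), 2))
```

Taking the square root gives a score on the "roughly like standard deviations" scale the slide mentions, so the last table stands out even though its raw co-occurrence count (10) is smaller than the first table's (13).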
  • 34. Co-occurrence Matrix (figure: item-by-item matrix of co-occurrence counts) Recap: Use LLR test to turn co-occurrence into indicators
  • 35. Indicator Matrix: Anomalous Co-Occurrence (figure: indicator matrix with ✔ marks) Result: The marked row will be added to the indicator field in the item document…
  • 36. Indicator Matrix (figure: the marked row) Item document: id: t4; title: puppy; desc: The sweetest little puppy ever.; keywords: puppy, dog, pet; indicators: (t1). That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine. Note: the indicator field is added directly to the meta-data for a document in the Solr index. No need to create a separate index for indicators.
  • 37. Internals of the Recommender Engine
  • 38. Internals of the Recommender Engine
  • 39. Looking Inside LucidWorks. Real-time recommendation query and results: Evaluation. What to recommend if new user listened to 2122: Fats Domino & 303: Beatles? Recommendation is “1710 : Chuck Berry”
  • 40. Search-based Recommendations • Sample document – Merchant Id – Field for text description – Phone – Address – Location
  • 41. Search-based Recommendations • Sample document – Merchant Id – Field for text description – Phone – Address – Location – Indicator merchant id’s – Indicator industry (SIC) id’s – Indicator offers – Indicator text – Local top40
  • 42. Search-based Recommendations • Sample document – Merchant Id – Field for text description – Phone – Address – Location – Indicator merchant id’s – Indicator industry (SIC) id’s – Indicator offers – Indicator text – Local top40 • Sample query – Current location – Recent merchant descriptions – Recent merchant id’s – Recent SIC codes – Recent accepted offers – Local top40
  • 43. Search-based Recommendations • Sample document, original data and meta-data – Merchant Id – Field for text description – Phone – Address – Location • Sample document, derived from cooccurrence and cross-occurrence analysis – Indicator merchant id’s – Indicator industry (SIC) id’s – Indicator offers – Indicator text – Local top40 • Sample query (the recommendation query) – Current location – Recent merchant descriptions – Recent merchant id’s – Recent SIC codes – Recent accepted offers – Local top40
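The document-plus-query pattern on these slides can be illustrated without a search engine. The sketch below is a stand-in, not Solr: item documents carry an `indicators` field, the query is the user's recent history, and the score is a plain count of matching indicator terms, where a real search engine would apply its own relevance weighting. The `t4` document follows slide 36; the other documents and the `recommend` helper are hypothetical.

```python
# Item documents: original meta-data plus an indicator field.
docs = [
    {"id": "t1", "title": "apple", "indicators": []},        # hypothetical
    {"id": "t2", "title": "bicycle", "indicators": []},      # hypothetical
    {"id": "t4", "title": "puppy", "indicators": ["t1"]},    # from slide 36
]

def recommend(history, docs, limit=10):
    """Rank unseen items by how many indicator terms match the history."""
    seen = set(history)
    scored = []
    for doc in docs:
        if doc["id"] in seen:
            continue  # do not recommend what the user already has
        score = len(seen & set(doc["indicators"]))
        if score > 0:
            scored.append((score, doc["id"]))
    # Highest score first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]

print(recommend(["t1"], docs))  # a user whose history contains the apple
```

The point of the design is that recommendation becomes an ordinary search query against the indicator field, so the serving side needs no recommender-specific machinery.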
  • 44. For example • Users enter queries (A) – (actor = user, item = query) • Users view videos (B) – (actor = user, item = video) • $A^T A$ gives query recommendation – “did you mean to ask for” • $B^T B$ gives video recommendation – “you might like these videos”
  • 45. The punch-line • $B^T A$ recommends videos in response to a query – (isn’t that a search engine?) – (not quite, it doesn’t look at content or meta-data)
  • 46. Real-life example • Query: “Paco de Lucia” • Conventional meta-data search results: – “hombres de paco” times 400 – not much else • Recommendation based search: – Flamenco guitar and dancers – Spanish and classical guitar – Van Halen doing a classical/flamenco riff
  • 47. Real-life example
  • 48. Hypothetical Example • Want a navigational ontology? • Just put labels on a web page with traffic – This gives A = users x label clicks • Remember viewing history – This gives B = users x items • Cross recommend – $B^T A$ = label to item mapping • After several users click, results are whatever users think they should be
  • 49. Nice. But we can do better?
  • 50. A Quick Simplification • Users who do $h$ (a vector of things a user has done) • Also do $r$ • $A h$: $A$ translates things into users • $A^T (A h)$: user-centric recommendations (transpose translates back to things) • $(A^T A) h$: item-centric recommendations (change the order of operations)
  • 51. Symmetry Gives Cross Recommendations • $(A^T A) h$: conventional recommendations with off-line learning • $(B^T A) h$: cross recommendations
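The three products on slides 50 and 51 are easy to check numerically. In this sketch the matrices are small hypothetical examples (A is users × queries, B is users × videos, following slide 44), and raw counts are used directly; the LLR sparsification described earlier is left out to keep the algebra visible.

```python
import numpy as np

# Hypothetical binary histories for four users.
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [1, 0, 0]])   # users x queries
B = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 0]])      # users x videos

h = np.array([1, 0, 0])     # current user has issued query 0 only

# User-centric: find users who share the history, then map back to things.
user_centric = A.T @ (A @ h)

# Item-centric: same result, but A^T A can be computed off-line.
item_centric = (A.T @ A) @ h

# Cross recommendation: query history in, video scores out.
cross = (B.T @ A) @ h

print(user_centric, item_centric, cross)
```

The two orderings give identical scores because matrix multiplication is associative; the item-centric form matters in practice because $A^T A$ (or $B^T A$) can be built in an overnight batch job and only the cheap product with $h$ happens at query time.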
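  • 52. $A$ (figure: matrix with rows = users, columns = things)
  • 53. $\begin{bmatrix} A_1 & A_2 \end{bmatrix}$ (figure: rows = users; column blocks = thing type 1, thing type 2)
  • 54.
     $$\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}$$
     $$\begin{bmatrix} r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$
     $$r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \end{bmatrix}$$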
  • 55. Bonus Round: When worse is better
  • 56. The Real Issues After First Production • Exploration • Diversity • Speed • Not the last fraction of a percent
  • 57. Result Dithering • Dithering is used to re-order recommendation results – Re-ordering is done randomly • Dithering is guaranteed to make off-line performance worse • Dithering also has a near perfect record of making actual performance much better
  • 58. Result Dithering • Dithering is used to re-order recommendation results – Re-ordering is done randomly • Dithering is guaranteed to make off-line performance worse • Dithering also has a near perfect record of making actual performance much better. “Made more difference than any other change”
  • 59. Why Dithering Works (diagram labels: Real-time recommender; Overnight training; Log Files)
  • 60. Exploring The Second Page
  • 61. Simple Dithering Algorithm • Synthetic score from log rank plus Gaussian: $s = \log r + N(0, \log \epsilon)$ • Pick noise scale to provide desired level of mixing: $\Delta r / r \propto \epsilon$ • Typically $\epsilon \in [1.5, 3]$ • Also… use floor(t/T) as seed
  • 62. Example … ε = 2 (rows of original ranks, eight per row):
       1   2   8   3   9  15   7   6
       1   8  14  15   3   2  22  10
       1   3   8   2  10   5   7   4
       1   2  10   7   3   8   6  14
       1   5  33  15   2   9  11  29
       1   2   7   3   5   4  19   6
       1   3   5  23   9   7   4   2
       2   4  11   8   3   1  44   9
       2   3   1   4   6   7   8  33
       3   4   1   2  10  11  15  14
      11   1   2   4   5   7   3  14
       1   8   7   3  22  11   2  33
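The dithering rule on slide 61 can be sketched as follows. Two points are assumptions rather than anything stated on the slide: $\log \epsilon$ is treated as the standard deviation of the Gaussian noise, and the window length `T` (in seconds) is a made-up default. The function name `dither` is also just for illustration.

```python
import math
import random

def dither(results, epsilon=2.0, t=0.0, T=600.0):
    """Re-order ranked results using s = log(rank) + N(0, log(epsilon)).

    results: items in their original ranked order (best first).
    t, T:    current time and window length; floor(t / T) seeds the noise,
             so the ordering is repeatable within one time window.
    """
    rng = random.Random(math.floor(t / T))
    sigma = math.log(epsilon)  # assumption: log(epsilon) is the std deviation
    scored = []
    for rank, item in enumerate(results, start=1):
        score = math.log(rank) + rng.gauss(0.0, sigma)
        scored.append((score, item))
    scored.sort(key=lambda pair: pair[0])  # lowest synthetic score first
    return [item for _, item in scored]

ranked = list(range(1, 51))  # original ranks 1..50
print(dither(ranked, epsilon=2.0, t=0.0)[:8])
print(dither(ranked, epsilon=2.0, t=700.0)[:8])  # a later time window
```

Because the noise is added to the log of the rank, top results tend to stay near the top while items from deeper in the list occasionally surface, which is the pattern visible in the ε = 2 example rows. With `epsilon=1.0` the noise vanishes and the original order is returned unchanged.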
  • 63. Lesson: Exploration is good
  • 64. Part 3: What about that worked example?
  • 65. Analyze with Map-Reduce (diagram labels: Complete history; Cooccurrence (Mahout); Item meta-data; Solr indexing; two Solr indexers; Index shards)
  • 66. Deploy with Conventional Search System (diagram labels: User history; Web tier; Solr search; Item meta-data; two Solr indexers; Index shards)
  • 67.

Editor's notes

  1. Note to speaker: Move quickly through the first two slides, just to set the tone: familiar use cases, but somewhat complicated under-the-covers math and algorithms… You don’t need to explain or discuss these examples at this point… just mention one or two. Talk track: Machine learning shows up in many familiar everyday examples, from product recommendations to listing news topics to filtering out that nasty spam from email…
  2. Talk track: Under the covers, machine learning looks very complicated. So how do you get from here to the familiar examples? Tonight’s presentation will show you some simple tricks to help you apply machine learning techniques to build a powerful recommendation engine.
  3. Note to trainers: the next series of slides starts with a cartoon example, just to set the pattern of how to find co-occurrence and use it to find indicators of what to recommend. Of course, real examples require a LOT of user-item interaction history to actually work, so this is just an analogy to get the idea across…
  4. *Bob is the “new user” and getting apple is his history
  5. *Bob is the “new user” and getting apple is his history
  6. *Here is where the recommendation engine needs to go to work… Note to trainer: you might see if the audience calls out the answer before revealing the next slide…
  7. Now you see the idea of co-occurrence as a basis for recommendation…
  8. *Now we have a new user, Amelia. Like everybody else, she gets a pony… what should the recommender offer her based on her history?
  9. * Pony not interesting because it is so widespread that it does not differentiate a pattern
  10. Note to trainer: This situation is similar to the one we started with, with three users in our history. The difference is that now everybody got a pony. Bob has an apple and a pony but not a puppy… yet.
  11. *Binary matrix is stored sparsely
  12. *Convert by MapReduce into a binary matrix. Note to trainer: Whether to consider apple to have occurred with itself is an open question.
  13. *Convert by MapReduce into a binary matrix. Note to trainer: the diagonal gives the total occurrence for each item (self to self) and is a distraction rather than a help, so the diagonal here is left blank.
  14. Old joke: all the world can be divided into 2 categories: Scotch tape and non-Scotch tape… This is a way to think about the co-occurrence
  15. Note to trainer: Give students time to offer comments. There’s a lot to discuss here. *Upper left: In the context of A, B occurs the largest number of times: 13 times out of 1013 appearances, with over 100,000 samples. But that is only ~1.3% of all the times B appears. *Upper right: B occurs in the context of A 33% of the time, but the counts are so small as to be of concern. *Lower right: the most significant anomaly, in that B still occurs only a small number of times in over 100,000 samples, but it ALWAYS co-occurs with A when it does appear.
  16. *The test Mahout uses for this is the Log Likelihood Ratio (LLR). *The red circle marks the choice that displays the highest confidence. Note to trainer: The slide animates with a click to show LLR results. A SECOND click animates the choice that has the highest confidence.
  17. Note to trainer: we go back to the earlier matrix as a reminder…
  18. Only important co-occurrence is puppy follows apple
  19. *Take that row of the matrix and combine it with all the meta-data we might have… *The important thing to get from the co-occurrence matrix is this indicator. Cool thing: this is analogous to what a lot of recommendation engines do. *This row forms the indicator field in a Solr document containing meta-data (you do NOT have to build a separate index for the indicators). Find the useful co-occurrence and get rid of the rest. Sparsify and get the anomalous co-occurrence.
  20. Note to trainer: take a little time to explore this here and on the next couple of slides. Details enlarged on next slide
  21. *This indicator field is where the output of the Mahout recommendation engine is stored (the row from the indicator matrix that identified significant or interesting co-occurrence). *Keep in mind that this recommendation indicator data is added to the same original document in the Solr index that contains meta-data for the item in question.
  22. This is a diagnostics window in the LucidWorks Solr index (not the web interface a user would see). It’s a way for the developer to do a rough evaluation (laugh test) of the choices offered by the recommendation engine. In other words, do these indicator artists, represented by their indicator Id, make reasonable recommendations? Note to trainer: artist 303 happens to be The Beatles. Is that a good match for Chuck Berry?
  23. Here we recap what we have in the different components of the recommender. We start with the meta-data for an item stored in the Solr index.
  24. *Here we’ve added examples of indicator data for the indicator field(s) of the document
  25. *Here we show you what information might be in the sample query
  26. Note to trainer: you could ask the class to consider which data is related… for example, the first 3 bullets of the query relate to meta data for the item, not to data produced by the recommendation algorithm. The last 3 bullets refer to data in the sample query related to data in the indicator field(s) that were produced by the Mahout recommendation engine.