Search Engines for Machine Learning: Presented by Joe Blue, MapR
2. Search Engines for Machine Learning
Joseph Blue, Data Scientist, MapR
jblue@mapr.com
3. ROADMAP
• The Deployment Challenge (WANT)
• All About Recommenders (BUILD)
• Search Engine Delivers Results (DEPLOY)
• Improving Those Results (IMPROVE)
4. Recommendations
• Data: interactions between people taking action (users) and items
• Used to train a recommendation model
• The goal is to suggest additional interactions
• Example applications: movie, music or map-based restaurant choices; suggesting sale items for e-stores or via cash-register receipts
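The minimal Python sketch below turns logged (user, item) interactions into the binary user-item matrix that a recommender is trained on. The toy users and items echo the bikes-and-ponies example, and they are hypothetical, as is the function name `build_interaction_matrix`. The sketch also assumes that a repeated interaction counts only once.

```python
# Hypothetical interaction log: (user, item) pairs, e.g. parsed from log files.
interactions = [
    ("alice", "pony"), ("alice", "bike"),
    ("bob", "pony"), ("bob", "pony"),      # repeated event, still counts once
    ("charles", "pony"), ("charles", "bike"),
]

def build_interaction_matrix(pairs):
    """Return (users, items, matrix); matrix[u][i] is 1 if user u interacted with item i."""
    users = sorted({u for u, _ in pairs})
    items = sorted({i for _, i in pairs})
    user_index = {u: k for k, u in enumerate(users)}
    item_index = {i: k for k, i in enumerate(items)}
    matrix = [[0] * len(items) for _ in users]
    for u, i in pairs:
        # Binary interaction: set to 1 rather than incrementing.
        matrix[user_index[u]][item_index[i]] = 1
    return users, items, matrix

users, items, A = build_interaction_matrix(interactions)
print(users)  # ['alice', 'bob', 'charles']
print(items)  # ['bike', 'pony']
print(A)      # [[1, 1], [0, 1], [1, 1]]
```

Rows are users and columns are items. Each row is one person's history, and that row is what gets compared against other users' rows.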
5. Spend your Cycles Wisely
[Figure: two timelines of project phases, Data / Develop / Deploy versus Data / D&D (develop and deploy); axis: Time]
Take more time to understand your data, and deploy a good recommender quickly.
6. Of bikes and ponies
[Figure: users Alice, Bob and Charles, plus new user Amelia marked "?"]
What if everybody gets a pony?
What else would you recommend for new user Amelia?
7. Three Matrices
But we need a method for identifying anomalous co-occurrence…
[Figure: three matrices for users Alice, Bob and Charles: user-item interaction, item co-occurrence, and indicators]
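The item co-occurrence matrix is the user-item matrix multiplied by its own transpose, C = AᵀA. Entry (i, j) counts the users who interacted with both item i and item j. Below is a plain-Python sketch that assumes a binary matrix with one row per user; `cooccurrence` and the toy data are hypothetical. Raw counts like these cannot, on their own, flag an anomalous co-occurrence, because a popular item co-occurs with almost everything.

```python
def cooccurrence(matrix):
    """Item co-occurrence counts C = A^T A for a binary user-item matrix A.

    C[i][j] is the number of users who interacted with both item i and item j;
    the diagonal C[i][i] is the number of users who interacted with item i.
    """
    n_items = len(matrix[0])
    c = [[0] * n_items for _ in range(n_items)]
    for row in matrix:  # one row per user
        for i in range(n_items):
            if row[i]:
                for j in range(n_items):
                    if row[j]:
                        c[i][j] += 1
    return c

# Hypothetical toy data: rows are users, columns are items.
A = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]
C = cooccurrence(A)
print(C)  # [[2, 2, 1], [2, 3, 2], [1, 2, 2]]
```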
8. Log Likelihood Two Ways
• Size = # users who interact with that item
• Overlap = # users who interacted with both items
• LL = f(size, overlap, number of users)
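The three quantities in these bullets are enough to fill the 2×2 table of user counts that a log-likelihood test works on. The sketch below follows that idea; `contingency_table` and its parameter names are hypothetical.

```python
def contingency_table(size_a, size_b, overlap, num_users):
    """2x2 table of user counts for items A and B.

    size_a, size_b: users who interacted with A (resp. B)
    overlap:        users who interacted with both A and B
    num_users:      total number of users
    """
    if not (0 <= overlap <= min(size_a, size_b)):
        raise ValueError("overlap must lie between 0 and min(size_a, size_b)")
    k11 = overlap                                # A and B
    k12 = size_a - overlap                       # A, not B
    k21 = size_b - overlap                       # B, not A
    k22 = num_users - size_a - size_b + overlap  # neither A nor B
    if k22 < 0:
        raise ValueError("sizes are inconsistent with num_users")
    return [[k11, k12], [k21, k22]]

table = contingency_table(size_a=20, size_b=30, overlap=10, num_users=1000)
print(table)  # [[10, 10], [20, 960]]
```

The four cells always sum to the number of users. The number of users therefore matters as much as the overlap: the same overlap of 10 is far more surprising among a million users than among a thousand.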
Items will be shared by users, but how much is too much?
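[Figure: two example 2×2 count tables (item vs. "not" item) with LL scores of 14.3 and 0.90]

$$\mathrm{LL} = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} y_{ij} \log\left(\frac{y_{ij}}{\mu_{ij}}\right)$$

where $y_{ij}$ is the observed count in cell $(i, j)$ and $\mu_{ij}$ is the count expected if the two items occurred independently.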
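This formula is the G² (log-likelihood ratio) statistic over a 2×2 table. The Python sketch below assumes natural logarithms and takes μᵢⱼ = (row i total × column j total) / N as the count expected under independence. `log_likelihood_ratio` is a hypothetical name, not Mahout's API.

```python
import math

def log_likelihood_ratio(table):
    """G^2 = 2 * sum_ij y_ij * ln(y_ij / mu_ij) for a 2x2 table of counts.

    mu_ij = row_i * col_j / N is the expected count if the items were independent.
    Cells with y_ij == 0 contribute 0 (the limit of y * ln y as y -> 0).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            y = table[i][j]
            if y > 0:  # y > 0 implies both totals > 0, so mu > 0
                mu = row_totals[i] * col_totals[j] / n
                g2 += y * math.log(y / mu)
    return 2.0 * g2

print(log_likelihood_ratio([[1, 1], [1, 1]]))  # 0.0 (counts exactly as expected)
```

A table that matches independence scores 0, and the score grows as the observed overlap departs from what independence predicts. G² is two-sided, so it is also large when two items co-occur *less* often than expected.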
9. Updating the metadata = deployment
Indicator Solr document for "puppy":
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
Note: data for the indicator field is added directly to the metadata of a document in the Apache Solr collection. You don't need to create a separate index for the indicators.
[Figure: complete indicator matrix from log-likelihood…]
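The sketch below covers the deployment step. It attaches each item's retained indicators to that item's metadata document and serializes the batch as JSON. It rests on several hypothetical assumptions:

* indicators arrive as (item id, LLR score) pairs;
* a fixed score threshold decides which ones to keep;
* the `indicators` field is a whitespace-tokenized text field.

The field names follow the slide's "puppy" document. A JSON array of documents is the shape Solr's JSON update handler accepts, but actually posting it to the collection is left out.

```python
import json

def indicator_documents(metadata, indicators, threshold):
    """Attach an 'indicators' field to each item's metadata document.

    metadata:   {item_id: {field: value}}  (hypothetical item metadata)
    indicators: {item_id: [(other_item_id, llr_score), ...]}
    threshold:  minimum LLR score for an item to be kept as an indicator
    """
    docs = []
    for item_id, fields in metadata.items():
        kept = [other for other, score in indicators.get(item_id, []) if score >= threshold]
        # Whitespace-separated ids, assuming a tokenized text field in the schema.
        docs.append({"id": item_id, **fields, "indicators": " ".join(kept)})
    return docs

metadata = {
    "t4": {"title": "puppy", "desc": "The sweetest little puppy ever.",
           "keywords": ["puppy", "dog", "pet"]},
}
# Hypothetical LLR scores; only t1 passes the threshold.
indicators = {"t4": [("t1", 15.0), ("t7", 0.5)]}

docs = indicator_documents(metadata, indicators, threshold=10.0)
payload = json.dumps(docs)  # JSON array of documents, ready to send to Solr
print(docs[0]["indicators"])  # t1
```

The indicators ride along in the same document as the title and keywords, so the existing collection serves as the recommendation index.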
10. Example Workflow
[Figure: example workflow on a MapR cluster, split into OFFLINE and ON-LINE parts with numbered steps 1, 2 and 3. Components: log files and item metadata (ingested easily via NFS); Mahout analysis, Pig and Python (Python used directly via NFS); the Solr collection; the web tier; new user history; and recommendations.]
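The online part of the workflow reduces to a search: the user's recent history becomes a query against the indicator field. The sketch below builds that query string using Lucene/Solr standard syntax with field grouping. It then adds a deliberately crude local stand-in for ranking, which simply counts matches. The function names, the `indicators` field name and the toy documents are hypothetical, and Solr's own relevance scoring is more sophisticated than match counting.

```python
def build_query(history, field="indicators"):
    """Standard-query-parser string matching any item in the user's history."""
    # With the default OR operator, a document matches if any history item appears.
    return f"{field}:(" + " ".join(history) + ")"

def rank_locally(docs, history, field="indicators", exclude_seen=True):
    """Crude in-memory stand-in for the search engine: score = number of
    history items found in a document's indicator field."""
    seen = set(history)
    scored = []
    for doc in docs:
        if exclude_seen and doc["id"] in seen:
            continue  # don't recommend what the user already has
        hits = len(seen & set(doc.get(field, "").split()))
        if hits:
            scored.append((hits, doc["id"]))
    # Highest score first; ties broken by id for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

docs = [
    {"id": "t4", "indicators": "t1"},
    {"id": "t5", "indicators": "t1 t3"},
    {"id": "t6", "indicators": "t2"},
]
history = ["t1", "t3"]
print(build_query(history))         # indicators:(t1 t3)
print(rank_locally(docs, history))  # ['t5', 't4']
```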
11. But we can do better…
The indicated items are returned when we query the collection with a user's history, but not all user behaviors are created equal. Items with opposite polarity may turn your recommendations into a spam generator.

Example: consider the difference in future purchases after viewing or purchasing razor blades vs. a Blu-ray movie…
12. Knowing your Data moves the Needle
[Figure: user-item interaction matrices for clicks and for purchases, with the indicators derived from them]
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
purchase indicators: (t1)
click indicators: (t2)
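The sketch below shows one way to compute the two indicator fields. It assumes that purchase indicators come from purchases co-occurring with purchases (PᵀP), and click indicators from purchases cross-occurring with clicks (PᵀK). In both cases the counts would then be scored with the same log-likelihood test before an item is kept as an indicator. The names and data are hypothetical.

```python
def cross_occurrence(primary, secondary):
    """Counts C = P^T K: C[i][j] = users who did the primary action on item i
    (e.g. purchase) and the secondary action on item j (e.g. click).

    Both matrices have one row per user, in the same user order.
    """
    if len(primary) != len(secondary):
        raise ValueError("both matrices must have the same users (rows)")
    n_p, n_k = len(primary[0]), len(secondary[0])
    c = [[0] * n_k for _ in range(n_p)]
    for p_row, k_row in zip(primary, secondary):
        for i in range(n_p):
            if p_row[i]:
                for j in range(n_k):
                    if k_row[j]:
                        c[i][j] += 1
    return c

# Hypothetical toy data: 3 users x 3 items.
purchases = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
clicks = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 1],
]
purchase_cooc = cross_occurrence(purchases, purchases)  # P^T P
click_cross = cross_occurrence(purchases, clicks)       # P^T K
print(purchase_cooc)  # [[2, 1, 0], [1, 1, 0], [0, 0, 1]]
print(click_cross)    # [[1, 2, 1], [0, 1, 1], [0, 1, 1]]
```

Keeping the two results in separate fields lets a query match a user's purchase history against purchase indicators and their click history against click indicators, instead of mixing behaviors of different polarity.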
13. More information is available…
https://www.mapr.com/products/mapr-sandbox-hadoop
https://www.mapr.com/resources/white-papers#e-books