Intelligent Search
(or at least really clever)

Some Preliminaries
• Text retrieval = matrix multiplication
A: our corpus
documents are rows
terms are columns

for each document d:
    for each term t:
        s_d += a_{d,t} q_t

s_d = Σ_t a_{d,t} q_t

s = A q
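
As a minimal sketch of the loop form and the matrix form of the scoring step, the following Python scores a toy corpus against a query vector. The matrix `A`, the query `q`, and the helper names `score_loops` and `score_matvec` are illustrative assumptions, not part of the talk.

```python
# Toy corpus: 3 documents (rows) x 4 terms (columns); values are term counts.
A = [
    [1, 0, 2, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
# Query vector over the same 4 terms.
q = [1, 0, 1, 0]


def score_loops(A, q):
    """Explicit double loop: s_d += a_dt * q_t."""
    s = [0] * len(A)
    for d, row in enumerate(A):
        for t, a_dt in enumerate(row):
            s[d] += a_dt * q[t]
    return s


def score_matvec(A, q):
    """The same computation written as the product s = A q."""
    return [sum(a * x for a, x in zip(row, q)) for row in A]


scores = score_loops(A, q)
```

Both functions return one score per document, so ranking documents is just sorting the entries of `scores`.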

More Preliminaries
• Recommendation = Matrix multiply
A: our users’ histories
users are rows
items are columns

Users who bought the items in list h also bought the items in list r

for each user u:
    for each item t1:
        for each item t2:
            r_{t1} += a_{u,t1} a_{u,t2} h_{t2}

r_{t1} = Σ_{t2} Σ_u a_{u,t1} a_{u,t2} h_{t2}

r = A’ (A h)

r = (A’ A) h

r = (A’ A) h ish!
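
The two groupings of the product can be checked on a toy example. This is a sketch only: the history matrix `A`, the history vector `h`, and the helpers `transpose`, `matvec`, and `matmul` are assumptions made for illustration, and none of the "ish" adjustments are applied.

```python
# Toy history matrix: 4 users (rows) x 3 items (columns); 1 = user bought item.
A = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
h = [1, 0, 0]  # the current user's history: bought item 0 only


def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(M, N):
    Nt = transpose(N)
    return [[sum(a * b for a, b in zip(row, col)) for col in Nt] for row in M]


At = transpose(A)
r_user_first = matvec(At, matvec(A, h))  # A' (A h): find matching users, then sum their items
r_item_first = matvec(matmul(At, A), h)  # (A' A) h: item-item counts first, then apply to h
```

Both orders give the same scores. The second grouping is the useful one in practice because the items x items product does not depend on `h` and can be computed ahead of time.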

Why so ish?

• In real life, ish happens because:
• Big data ... so we selectively sample
• Sparse data ... so we smooth
• Finite computers ... so we sparsify
• Top-40 effect ... so we use some stats

The same in spite of ish

• The shape of the computation is unchanged
• The cost of the computation is unchanged
• Broad algebraic conclusions still hold

Dyadic Structure
● Functional
– Interaction: actor -> item*
● Relational
– Interaction ⊆ Actors x Items
● Matrix
– Rows indexed by actor, columns by item
– Value is count of interactions
● Predict missing observations
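
The three views of the same dyadic data can be sketched side by side. The interaction log and all variable names below are hypothetical.

```python
from collections import Counter

# Hypothetical interaction log of (actor, item) observations.
log = [("alice", "x"), ("alice", "y"), ("bob", "x"), ("alice", "x")]

# Functional view: each actor maps to a sequence of items (actor -> item*).
functional = {}
for actor, item in log:
    functional.setdefault(actor, []).append(item)

# Relational view: a subset of Actors x Items.
relational = set(log)

# Matrix view: rows indexed by actor, columns by item, value = interaction count.
counts = Counter(log)
actors = sorted({a for a, _ in log})
items = sorted({i for _, i in log})
matrix = [[counts[(a, i)] for i in items] for a in actors]
```

A zero in `matrix` is a missing observation, which is what the prediction step tries to fill in.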

Fundamental Algorithmics
● Cooccurrence: K = AʼA

● A is actors x items, K is items x items
● Product has general shape of matrix
● K tells us “users who interacted with x also
interacted with y”
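
A minimal sketch of cooccurrence counting over sparse histories, assuming a binary A stored as one item set per actor. The `histories` data and the `cooccurrence` helper are illustrative; the diagonal (an item with itself) is skipped here.

```python
from collections import Counter
from itertools import permutations

# Hypothetical histories: each actor's set of items (a binary A in sparse form).
histories = {
    "u1": {"x", "y"},
    "u2": {"x", "y", "z"},
    "u3": {"y", "z"},
}


def cooccurrence(histories):
    """For each ordered pair (i, j), i != j, count the actors who touched both items."""
    K = Counter()
    for items in histories.values():
        for i, j in permutations(sorted(items), 2):
            K[(i, j)] += 1
    return K


K = cooccurrence(histories)
```

Reading a row of `K` answers "users who interacted with x also interacted with y", with the count as the raw strength of the association.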

Fundamental Algorithmic Structure
● Cooccurrence

● Matrix approximation by factoring

● LLR (log-likelihood ratio)
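
As a sketch of the LLR step, the following computes the log-likelihood ratio (G²) score for a 2x2 contingency table of cooccurrence counts, written in its entropy form. The function names and the example counts are assumptions for illustration; `k11` is the number of actors with both items, `k12` and `k21` the numbers with only one of them, and `k22` the number with neither.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy: N*ln(N) - sum(k*ln(k)) for counts k summing to N."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """G^2 = 2 * (H(rows) + H(columns) - H(cells)) for a 2x2 table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    cells = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - cells))


independent = llr(10, 10, 10, 10)  # no association between the two items
associated = llr(10, 0, 0, 10)     # the two items always occur together
```

A score near zero means the cooccurrence is about what chance would produce; large scores mark pairs worth keeping when the cooccurrence matrix is sparsified.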

But Wait ...
Does it have to be that way?

What we have:
● For a user who watched/bought/listened to this
● Sum over all other users who watched/bought/...
● Add up what they watched/bought/listened to
● And recommend that (ish)

But wait, we can do that faster

But why not ...
● Why just dyadic learning?
● Why not triadic learning?
● Why not p-adic learning?

For example
● Users enter queries (A)
– (actor = user, item=query)
● Users view videos (B)
– (actor = user, item=video)
● AʼA gives query recommendation
– “did you mean to ask for”
● BʼB gives video recommendation
– “you might like these videos”

The punch-line
● BʼA recommends videos in response to a query
– (isnʼt that a search engine?)
– (not quite, it doesnʼt look at content or meta-data)
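
The cross product BʼA can be sketched by counting, per user, every (video, query) pair. The logs, the helper names `cross_cooccurrence` and `videos_for`, and the ranking by raw count are all assumptions for illustration; no LLR scoring or sparsification is applied here.

```python
from collections import Counter

# Hypothetical logs keyed by user: queries entered (A) and videos viewed (B).
queries = {
    "u1": {"flamenco"},
    "u2": {"flamenco", "guitar"},
    "u3": {"guitar"},
}
views = {
    "u1": {"v1"},
    "u2": {"v1", "v2"},
    "u3": {"v2", "v3"},
}


def cross_cooccurrence(views, queries):
    """Entry (video, query) of B'A: users who both viewed the video and entered the query."""
    counts = Counter()
    for user, vids in views.items():
        for v in vids:
            for term in queries.get(user, ()):
                counts[(v, term)] += 1
    return counts


def videos_for(query, counts):
    """Rank videos by raw cross-cooccurrence count with the query (ties broken by name)."""
    scored = [(n, v) for (v, term), n in counts.items() if term == query]
    return [v for n, v in sorted(scored, key=lambda p: (-p[0], p[1]))]


BtA = cross_cooccurrence(views, queries)
```

Calling `videos_for("flamenco", BtA)` returns videos ordered by how many users both issued that query and watched the video, without ever looking at video content or meta-data.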

Real-life example
● Query: “Paco de Lucia”
● Conventional meta-data search results:
– “hombres del paco” times 400
– not much else
● Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff

System Diagram
[Diagram: log processing on Hadoop]
● Inputs
– Viewing logs (t, user, video)
– Search logs (t, user, query-term)
● Stages
– Selective sampler
– Join on user
– Count
– llr + sparsify
● Outputs
– Related videos: v => v1 v2...
– Related terms: v => t1 t2...

Indexing
[Diagram: index construction, Hadoop feeding Lucene (+Katta?)]
● Inputs
– Related terms: v => t1 t2...
– Related videos: v => v1 v2...
– Video meta: v => url title...
● Join on video
● Output: Lucene index

Hypothetical Example
● Want a navigational ontology?
● Just put labels on a web page with traffic
– This gives A = users x label clicks
● Remember viewing history
– This gives B = users x items
● Cross recommend
– BʼA = click to item mapping
● After several users click, results are whatever
users think they should be

Resources
● My blog
– http://tdunning.blogspot.com/

● The original LLR in NLP paper
– Accurate Methods for the Statistics of Surprise and Coincidence
(check on citeseer)
● Source code
– Mahout project
– contact me (tdunning@apache.org)

