On Sampling Strategies for Neural Network-based Collaborative Filtering
1. On Sampling Strategies for Neural Network-based Collaborative Filtering
Ting Chen, Yizhou Sun, Yue Shi, Liangjie Hong
2. Outlines
• Neural Network-based Collaborative Filtering
• Computation Challenges and Limitations of Existing
Methods
• Two Sampling Strategies and Their Combination
• Empirical Evaluations
6. Embedding Functions
• If we have no additional features for users and items (reduced to conventional MF): r_uv = u_u^T v_v, where the embedding vector u_u = f(x_u) = W^T x_u is a lookup from the id-based one-hot vector x_u.
• If we have text features for items: r_uv = u_u^T g(x_v), where g(.) is a neural network over the item text x_v.
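To make the two scoring functions concrete, here is a minimal PyTorch-style sketch (my own illustration, not the authors' code; class and argument names are assumptions):

```python
import torch
import torch.nn as nn

class FunctionalEmbeddingScorer(nn.Module):
    """Scores r_uv = u_u^T v_v (plain MF) or r_uv = u_u^T g(x_v) (text items)."""
    def __init__(self, n_users, n_items, dim, text_encoder=None):
        super().__init__()
        # f(x_u) = W^T x_u with a one-hot x_u is just an embedding lookup
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)   # v_v, used in the MF case
        self.g = text_encoder                        # g(.), e.g. a CNN or LSTM

    def forward(self, users, items, item_text=None):
        u = self.user_emb(users)                                    # (B, dim)
        v = self.item_emb(items) if self.g is None else self.g(item_text)
        return (u * v).sum(dim=-1)                                  # r_uv, (B,)
```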
7. Text Embedding Function g(.)
• Convolutional Neural Networks [Y. Kim, AAAI'14]
• Recurrent Neural Networks (LSTM) [Christopher Olah]
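As a rough illustration of a convolutional g(.) (a sketch under my own assumptions about input shapes and layer sizes, not the paper's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Kim-style CNN text encoder: embed tokens, convolve, max-pool over time."""
    def __init__(self, vocab_size, emb_dim=128, out_dim=64,
                 widths=(3, 4, 5), channels=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, channels, w) for w in widths])
        self.fc = nn.Linear(channels * len(widths), out_dim)

    def forward(self, token_ids):                  # (B, L) padded token ids
        h = self.emb(token_ids).transpose(1, 2)    # (B, emb_dim, L)
        pooled = [F.relu(conv(h)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # (B, out_dim) item embedding
```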
8. Implicit Feedback and Loss Functions
• We define the loss based on implicit feedback [Hu'08, Rendle'09]
• Interactions are treated as positive
• Non-interactions are treated as negative
• Pointwise loss: a (user, item) pair as a data point
• Pairwise loss: a (user, item+, item-) triple as a data point
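For concreteness, hedged sketches of the two loss types (function names and signatures are mine):

```python
import torch
import torch.nn.functional as F

def pointwise_loss(r_pos, r_neg):
    """(user, item) data points: interactions labeled 1, non-interactions 0."""
    scores = torch.cat([r_pos, r_neg])
    labels = torch.cat([torch.ones_like(r_pos), torch.zeros_like(r_neg)])
    return F.binary_cross_entropy_with_logits(scores, labels)

def pairwise_loss(r_pos, r_neg):
    """(user, item+, item-) data points: BPR-style ranking loss [Rendle'09]."""
    return -F.logsigmoid(r_pos - r_neg).mean()
```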
10. Outlines
• Neural Network-based Collaborative Filtering
• Computation Challenges and Limitations of Existing
Methods
• Two Sampling Strategies and Their Combination
• Empirical Evaluations
11. Computation Cost Using Different
Embedding Functions
Computation cost is dominated by the neural network
computation (forward / backward) for items/texts.
12. Major Computation Cost Breakdown
Very rough order-of-magnitude estimates of time units (both forward and backward; depending on the specific configuration):
• User function computation: t_f ≈ 10
• Item function computation: t_g ≈ 100
• Interaction function (dot product) computation: t_i ≈ 1
13. Computation Cost in a Graph View
The loss functions are defined over interactions/links, but the major computation burden is on the nodes.
(Figure: user-item interaction graphs under the pointwise loss and the pairwise loss)
14. Mini-batch Sampling Matters
• Certain data points (links/interactions) share the same computations (on the nodes).
• Therefore, different mini-batch sampling strategies can result in very different amounts of computation.
15. Existing Mini-batch Sampling Approaches
• IID Sampling [Bottou'10]
• Draw positive links uniformly at random
• Draw negative links according to a negative distribution
• Negative Sampling [Rendle'09, Mikolov'13]
• Draw positive links uniformly at random
• Draw k negative links for each positive link by replacing its item
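The two existing samplers can be sketched as follows (an illustrative simplification; variable names and the uniform negative distribution are my assumptions):

```python
import random

def iid_sampling(pos_links, all_items, b, k):
    """Draw b positive links and b*k independent negative links."""
    batch = [(u, v, 1) for (u, v) in random.sample(pos_links, b)]
    for _ in range(b * k):
        u, _ = random.choice(pos_links)                  # user of a random link
        batch.append((u, random.choice(all_items), 0))   # random negative item
    return batch

def negative_sampling(pos_links, all_items, b, k):
    """For each positive link, keep the user and replace the item k times."""
    batch = []
    for (u, v) in random.sample(pos_links, b):
        batch.append((u, v, 1))
        batch += [(u, random.choice(all_items), 0) for _ in range(k)]
    return batch
```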
16. Cost Model Analysis for IID and Negative Sampling
• Assume we sample a batch of b positive links, and k negative links for each positive link.
• t_f, t_g, t_i are the unit computation costs for the user/item/interaction functions.
• Computation cost: almost the same for the two approaches.
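A back-of-the-envelope reconstruction of the two costs (my derivation; it is consistent with the totals on slide 29, using b=256, k=20, t_f=10, t_g=100, t_i=1):

```latex
% IID sampling: each of the b(1+k) links pays its own user, item and interaction cost.
C_{\text{IID}} = b(1+k)(t_f + t_g + t_i) = 256 \cdot 21 \cdot 111 \approx 597\text{k}

% Negative sampling: the k negatives reuse the user of their positive link,
% so only b user computations are needed, but item computations are not shared.
C_{\text{NEG}} = b\,t_f + b(1+k)(t_g + t_i) = 2{,}560 + 5{,}376 \cdot 101 \approx 546\text{k}
```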
17. Limitations of Existing Approaches
• IID sampling assumes computation costs are independent among data points (links).
• So the computation cost cannot be amortized, and training is thus very computation-intensive.
• Negative sampling cannot do better, since the item function computation is the most expensive part.
18. Outlines
• Neural Network-based Collaborative Filtering
• Computation Challenges and Limitations of Existing
Methods
• Two Sampling Strategies and Their Combination
• Empirical Evaluations
19. The Proposed Strategies
• Strategy one: Stratified Sampling.
• Group loss function terms by the shared "heavy-lifting" node, i.e., amortize the computation cost.
• Strategy two: Negative Sharing.
• Once a batch of (user, item) tuples is sampled, we add additional links at little extra cost.
• The two strategies can be further combined.
20. Proposed Strategy 1: Stratified Sampling
• Node computation cost can be amortized if multiple links in a mini-batch share the same node.
• That is, we group links (i.e., loss function terms) by certain "heavy-lifting" nodes.
• We first draw items, then draw the associated positive and negative links (a sketch follows below).
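A minimal sketch of the item-stratified sampler (my own assumptions: s positive users and s*k shared negative users per drawn item):

```python
import random

def stratified_by_item(items, users_of, all_users, n_strata, s, k):
    """Draw items first; each item's expensive computation is shared by (1+k)*s links."""
    batch = []
    for v in random.sample(items, n_strata):
        pos_users = random.sample(users_of[v], min(s, len(users_of[v])))
        neg_users = [random.choice(all_users) for _ in range(len(pos_users) * k)]
        batch.append((v, pos_users, neg_users))   # one stratum per item
    return batch
```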
22. Cost Model Analysis for Stratified Sampling
• Assume we sample a batch of b positive links, and k negative links for each positive link.
• t_f, t_g, t_i are the unit computation costs for the user/item/interaction functions.
• Speedup: ~(1+k)s times (on the dominating item-function term).
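Plugging the same numbers into my reconstruction of the stratified cost model (b positive links drawn as b/s item strata; consistent with the 72k on slide 29):

```latex
C_{\text{STRAT}} = b(1+k)\,t_f + \tfrac{b}{s}\,t_g + b(1+k)\,t_i
                 = 53{,}760 + 12{,}800 + 5{,}376 \approx 72\text{k}
% The item term shrinks from b(1+k) t_g to (b/s) t_g: a (1+k)s-fold
% reduction of the dominating cost, hence the quoted ~(1+k)s speedup.
```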
23. Proposed Strategy 2: Negative Sharing
• Interaction computation is much cheaper than
(item) node computation (according to our
assumption).
• Once user/item nodes are given in a batch, adding
more links among them may not increase
computation cost much.
• Only need to draw positive links!
24. Proposed Strategy 2: Negative Sharing
Implementation detail: use an efficient matrix multiplication to compute the complete set of in-batch interactions (sketched below).
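A minimal illustration of that implementation detail (my own sketch): with B users and B items in the batch, all B*B scores come from one matrix multiplication instead of B*(1+k) separate dot products.

```python
import torch

def all_pair_scores(user_vecs, item_vecs):
    """user_vecs: (B, d), item_vecs: (B, d) -> all B*B interaction scores."""
    scores = user_vecs @ item_vecs.t()     # one dense matmul, shape (B, B)
    pos = scores.diag()                    # the sampled positive links
    # off-diagonal entries act as shared negatives for every user in the batch
    return scores, pos
```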
25. Cost Model Analysis for Negative Sharing
• Assume we sample a batch of b positive links, and k negative links for each positive link.
• t_f, t_g, t_i are the unit computation costs for the user/item/interaction functions.
• Speedup: (1+k) times, with many more negative links per batch.
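Under my reading of the cost model (only b positive links are drawn, and the b^2 in-batch interactions come from one dense matmul whose per-pair cost is far below a separate t_i), the total is consistent with the 28k on slide 29:

```latex
C_{\text{SHARE}} \approx b\,t_f + b\,t_g + (\text{cheap } b^2 \text{ matmul})
                 \approx 2{,}560 + 25{,}600 \approx 28\text{k}
% Item computations drop from b(1+k) to b, hence the quoted (1+k)-times speedup.
```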
26. Limitations of Both Proposed Strategies
• Stratified sampling:
• Cannot work well with ranking-based loss functions
• Negative sharing:
• Too many negative interactions, with diminishing returns
• Have-your-cake-and-eat-it solution:
• Combine both strategies to overcome their shortcomings, while keeping their advantages.
• Draw positive links using Stratified Sampling, generate negative links using Negative Sharing.
28. Cost Model Analysis for Stratified Sampling with Negative Sharing
• Assume we sample a batch of b positive links, and k negative links for each positive link.
• t_f, t_g, t_i are the unit computation costs for the user/item/interaction functions.
• Speedup: (1+k)s times, with many more negative links per batch.
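The same reconstruction for the combined strategy (item computations fall to b/s, users to b, negatives again shared via the matmul; roughly matching the 16k on slide 29):

```latex
C_{\text{COMB}} \approx b\,t_f + \tfrac{b}{s}\,t_g + (\text{cheap matmul})
                \approx 2{,}560 + 12{,}800 \approx 15\text{k}\text{--}16\text{k}
% The dominating item term is (1+k)s times smaller than under negative sampling.
```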
29. Summary of Cost Model Analysis
• Computation cost estimation (using b=256, k=20, t_f=10, t_g=100, t_i=1, s=2)
• IID sampling: 597k
• Negative sampling: 546k
• Stratified sampling (by item): 72k
• Negative Sharing: 28k
• Stratified sampling with negative sharing: 16k
(all in time units)
31. Outlines
• Neural Network-based Collaborative Filtering
• Computation Challenges and Limitations of Existing
Methods
• Two Sampling Strategies and Their Combination
• Empirical Evaluations
32. Datasets and Setup
• We use the CiteULike and Yahoo News datasets.
• Test data consists of item texts never seen during training.
38. Conclusions
• We propose a functional embedding framework with neural networks for collaborative filtering, which generalizes several state-of-the-art models.
• We establish the connection between the loss functions and the user-item interaction graph, which introduces computation cost dependencies between links (i.e., loss function terms).
• Based on this understanding, we propose three novel mini-batch sampling strategies that speed up model training significantly while at the same time improving performance.