Large-scale Recommendation Systems on Just a PC

Large-scale Recommender Systems on Just a PC
LSRS 2013 keynote
(RecSys ’13 Hong Kong)

Aapo Kyrölä
Ph.D. candidate @ CMU
http://www.cs.cmu.edu/~akyrola
Twitter: @kyrpov

Big Data – small machine

My Background
• Academic: 5th year Ph.D. @ Carnegie Mellon.
Advisors: Guy Blelloch, Carlos Guestrin (UW)
+ Shotgun: Parallel L1-regularized regression solver (ICML 2011).
+ Internships at MSR Asia (2011) and Twitter (2012)

• Startup Entrepreneur
Habbo: founded 2000

Outline of this talk
1. Why single-computer computing?
2. Introduction to graph computation and GraphChi
3. Recommender systems with GraphChi
4. Future directions & Conclusion

Why on a single machine?

Can’t we just use the Cloud?

Why use a cluster?
Two reasons:
1. One computer cannot handle my problem in a reasonable time.
Our work expands the space of (graph) problems that are feasible on one machine:
- Our experiments use the same graphs as, or bigger graphs than, previous papers on distributed graph computation. (We can even process the Twitter graph on a laptop.)
- Most data is not that “big”.

2. I need to solve the problem very fast.
Our work raises the bar on the performance required to justify a “complicated” system.

Benefits of single-machine systems
Assuming it can handle your big problems…
1. Programmer productivity
– Global state
– Can use “real data” for development

2. Inexpensive to install and administer; uses less power.
3. Scalability.

Efficient Scaling
[Figure: tasks completed over time T. A distributed graph system going from 6 to 12 machines gets (significantly) less than 2x throughput with 2x machines. Single-computer systems (capable of big tasks), each running its own task, get exactly 2x throughput with 2x machines.]

GRAPH COMPUTATION AND
GRAPHCHI

Why graphs for recommender systems?
• Graph = matrix: edge(u,v) = M[u,v]
– Note: always sparse graphs

• Intuitive, human-understandable
representation
– Easy to visualize and explain.

• Unifies collaborative filtering (typically matrix-based) with recommendation in social networks.
– Random walk algorithms.

• Local view → vertex-centric computation
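To make the “graph = matrix” view concrete, the sketch below stores a sparse ratings matrix as adjacency dictionaries in plain Python, so that edge (u, v) holds M[u, v] and storage grows with the number of ratings rather than with users × items. It is not GraphChi code; the function names and the sample users and ratings are hypothetical.

```python
from collections import defaultdict


def ratings_to_graph(ratings):
    """Build a sparse directed graph (dict of dicts) from (user, item, rating) triples.

    Edge (u, v) carries the value M[u][v]; absent edges are implicit zeros,
    so memory is proportional to the number of ratings.
    """
    graph = defaultdict(dict)
    for user, item, rating in ratings:
        graph[user][item] = rating
    return graph


def matrix_entry(graph, u, v):
    """Return M[u, v], treating a missing edge as 0 (uses .get, so nothing is inserted)."""
    return graph.get(u, {}).get(v, 0.0)


# Hypothetical toy data.
ratings = [("alice", "City of God", 5.0),
           ("alice", "La Dolce Vita", 4.0),
           ("bob", "City of God", 3.0)]
g = ratings_to_graph(ratings)
print(matrix_entry(g, "alice", "City of God"))  # 5.0
print(matrix_entry(g, "bob", "La Dolce Vita"))  # 0.0
num_edges = sum(len(neighbors) for neighbors in g.values())
```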

Vertex-Centric Computational Model
• Graph G = (V, E)
– directed edges: e = (source,
destination)
– each edge and vertex
associated with a value
(user-defined type)
– vertex and edge values can
be modified
• (structure modification also
supported)

[Figure: two vertices, A and B, with a Data value attached to each vertex and edge.]
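As a rough illustration of the model above, here is a minimal in-memory Python data structure for a directed graph whose vertices and edges each carry a mutable, user-defined value, with edge insertion standing in for structure modification. The class and method names are hypothetical and do not reflect GraphChi's actual (disk-based) storage.

```python
from collections.abc import Hashable
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Graph:
    """Directed graph G = (V, E) with a user-defined, mutable value on every vertex and edge."""
    vertex_values: Dict[Hashable, Any] = field(default_factory=dict)
    edge_values: Dict[Tuple[Hashable, Hashable], Any] = field(default_factory=dict)
    out_edges: Dict[Hashable, List[Hashable]] = field(default_factory=dict)
    in_edges: Dict[Hashable, List[Hashable]] = field(default_factory=dict)

    def add_vertex(self, v, value=None):
        """Add v if it is new; an existing vertex keeps its current value."""
        if v not in self.vertex_values:
            self.vertex_values[v] = value
            self.out_edges[v] = []
            self.in_edges[v] = []

    def add_edge(self, source, destination, value=None):
        """Structure modification: add (or overwrite) the directed edge e = (source, destination)."""
        self.add_vertex(source)
        self.add_vertex(destination)
        if (source, destination) not in self.edge_values:
            self.out_edges[source].append(destination)
            self.in_edges[destination].append(source)
        self.edge_values[(source, destination)] = value


g = Graph()
g.add_vertex("A", 1.0)
g.add_edge("A", "B", 0.5)
g.edge_values[("A", "B")] = 0.7  # edge values can be modified in place
g.vertex_values["B"] = 2.0       # ... and so can vertex values
```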


Vertex-centric Programming
• “Think like a vertex”
• Popularized by the Pregel and GraphLab
projects
[Figure: an update function MyFunc(vertex) { // modify neighborhood } runs on one vertex and reads and modifies the Data on its neighboring edges and vertices.]
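MyFunc(vertex) on the slide is a placeholder. The sketch below shows what such an update function might look like in Python, using PageRank as the example computation (our choice; the slide names none): the function reads the values on a vertex's in-edges, updates the vertex, and writes new values to its out-edges. The damping factor 0.85 and the unnormalized form 0.15 + 0.85 · Σ(in-edge values) are assumptions, and the function and parameter names are hypothetical.

```python
def pagerank_update(v, ranks, edge_values, in_edges, out_edges, damping=0.85):
    """Hypothetical vertex-centric update in the spirit of MyFunc(vertex).

    Reads the values on v's in-edges, recomputes v's own value, and writes
    v's new rank share onto each of its out-edges (its "neighborhood").
    """
    incoming = sum(edge_values[(u, v)] for u in in_edges[v])
    ranks[v] = (1.0 - damping) + damping * incoming
    if out_edges[v]:
        share = ranks[v] / len(out_edges[v])
        for w in out_edges[v]:
            edge_values[(v, w)] = share


def run_pagerank(num_vertices, edges, iterations=50):
    """Apply the update function to every vertex, in id order, for a fixed number of sweeps."""
    in_edges = {v: [] for v in range(num_vertices)}
    out_edges = {v: [] for v in range(num_vertices)}
    for u, w in edges:
        out_edges[u].append(w)
        in_edges[w].append(u)
    ranks = {v: 1.0 for v in range(num_vertices)}
    edge_values = {(u, w): 1.0 / len(out_edges[u]) for u, w in edges}
    for _ in range(iterations):
        for v in range(num_vertices):
            pagerank_update(v, ranks, edge_values, in_edges, out_edges)
    return ranks


ranks = run_pagerank(3, [(0, 2), (1, 2), (2, 0)])
```

Vertex 1 has no in-edges and settles at 0.15, while vertex 2, which receives links from both other vertices, ends up with the highest value.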

What is GraphChi

Both in OSDI’12!

The Main Challenge of Disk-based Graph Computation: Random Access

• Hard disk: 100s of reads/writes per sec
• SSD: ~100K reads/sec (commodity); ~1M reads/sec (high-end arrays)
• Both are << the 5-10M random edges/sec needed to achieve “reasonable performance”.

Details: Kyrola, Blelloch, Guestrin: “Large-scale graph computation on just a PC” (OSDI 2012)
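A back-of-the-envelope calculation shows why random access is the bottleneck. The sketch assumes one random access per edge, a 1.5B-edge graph (twitter-2010, the graph in the triangle-counting comparison), the lower end of each rate quoted above, and device labels (hard disk, SSD) that are our reading of the slide; the names are hypothetical.

```python
def pass_time_seconds(num_edges, accesses_per_second):
    """Time for one pass over the graph if every edge costs one random access."""
    return num_edges / accesses_per_second


NUM_EDGES = 1.5e9  # twitter-2010

# Access rates quoted on the slide; the device labels are assumptions.
RATES = {
    "hard disk, ~100 random reads/s": 100,
    "commodity SSD, ~100K reads/s": 100e3,
    "high-end SSD array, ~1M reads/s": 1e6,
    "needed for reasonable performance, ~5M edges/s": 5e6,
}

for label, rate in RATES.items():
    minutes = pass_time_seconds(NUM_EDGES, rate) / 60
    print(f"{label}: {minutes:,.0f} minutes per pass")
```

It prints about 250,000 minutes (roughly 174 days) per pass at 100 reads/s, 250 minutes at 100K reads/s, 25 minutes at 1M reads/s, and 5 minutes at 5M edges/s.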

Parallel Sliding Windows

Only P large reads for each interval (sub-graph).

P² reads on one full pass.
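The P² count follows from the shard layout described in the OSDI paper: vertices are split into P intervals, and shard p holds every edge whose destination falls in interval p, sorted by source. Processing interval p means reading shard p in full (its in-edges) plus one contiguous block, a “sliding window”, from each of the other P − 1 shards (its out-edges): P reads per interval, P² per pass. Below is a toy in-memory Python simulation under those assumptions, where each list slice stands in for one large sequential read; the function names and the equal-size intervals are hypothetical simplifications.

```python
from bisect import bisect_left


def make_intervals(num_vertices, P):
    """Split vertex ids 0..num_vertices-1 into P contiguous intervals [lo, hi)."""
    size = -(-num_vertices // P)  # ceiling division
    return [(i * size, min((i + 1) * size, num_vertices)) for i in range(P)]


def make_shards(edges, intervals):
    """Shard p holds the edges whose destination lies in interval p, sorted by source."""
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for p, (lo, hi) in enumerate(intervals):
            if lo <= dst < hi:
                shards[p].append((src, dst))
                break
    for shard in shards:
        shard.sort()
    return shards


def psw_pass(shards, intervals):
    """Simulate one full pass; return (block reads, [(in-edges, out-edge windows)] per interval)."""
    reads = 0
    loaded = []
    for p, (lo, hi) in enumerate(intervals):
        in_edges = shards[p]  # memory shard: read in full
        reads += 1
        windows = []
        for q, shard in enumerate(shards):
            if q == p:
                continue  # out-edges inside shard p are already in memory
            # Sorted by source, so interval p's out-edges form one contiguous block.
            start = bisect_left(shard, (lo,))
            end = bisect_left(shard, (hi,))
            windows.extend(shard[start:end])
            reads += 1
        loaded.append((in_edges, windows))
    return reads, loaded


edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (0, 5), (5, 0)]
intervals = make_intervals(6, 3)
reads, loaded = psw_pass(make_shards(edges, intervals), intervals)
```

With P = 3 this performs 9 block reads, and every interval sees all of its in-edges and out-edges.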

GraphChi Program Execution
For T iterations:
    For p = 1 to P:
        For v in interval(p):
            updateFunction(v)

For T iterations:
    For v = 1 to V:
        updateFunction(v)

“Asynchronous”: updates are immediately visible (vs. bulk-synchronous).
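To see what “asynchronous” buys, the sketch below runs the same update function in both modes: bulk-synchronous (every update reads the values from the previous iteration) and asynchronous (updates are visible immediately, as in the loops above). The algorithm, connected components by minimum-label propagation on a path, is our choice for illustration; the slide does not name one, and the function name is hypothetical.

```python
def connected_components(neighbors, asynchronous):
    """Min-label propagation; returns (labels, number of sweeps until no label changes)."""
    labels = {v: v for v in neighbors}
    sweeps = 0
    changed = True
    while changed:
        changed = False
        sweeps += 1
        # Bulk-synchronous: read from a snapshot of the previous iteration.
        # Asynchronous: read the live labels, so earlier updates in this sweep are visible.
        source = labels if asynchronous else dict(labels)
        for v in sorted(neighbors):
            best = min([source[v]] + [source[u] for u in neighbors[v]])
            if best < labels[v]:
                labels[v] = best
                changed = True
    return labels, sweeps


n = 6
path = {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}
async_labels, async_sweeps = connected_components(path, asynchronous=True)
sync_labels, sync_sweeps = connected_components(path, asynchronous=False)
```

On this 6-vertex path, with vertices visited in id order, the asynchronous run finishes after 2 sweeps (one to propagate, one to confirm nothing changed), while the bulk-synchronous run needs 6, since the minimum label moves only one hop per iteration. The size of the advantage depends on the visiting order.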

Performance

GraphChi can compute on the full Twitter follow-graph with just a standard laptop,
about as fast as a very large Hadoop cluster!
(Size of the graph in fall 2013: > 20B edges [Gupta et al. 2013])

GraphChi is Open Source
• C++ and Java versions on GitHub:
http://github.com/graphchi
– The Java version has a Hadoop/Pig wrapper.
• If you really, really want to use Hadoop.

RECSYS MODEL TRAINING
WITH GRAPHCHI

Overview of Recommender Systems for GraphChi
• Collaborative Filtering toolkit (next slide)
• Link prediction in large networks
– Random-walk based approaches (Twitter)
– Talk on Wednesday.

GraphChi’s Collaborative Filtering Toolkit
• Developed by Danny Bickson (CMU / GraphLab Inc)
• Includes:
– Alternating Least Squares (ALS)
– Sparse-ALS
– SVD++
– LibFM (factorization machines)
– GenSGD
– Item-similarity based methods
– PMF
– CliMF (contributed by Mark Levy)
– …

See Danny’s blog for more information:
http://bickson.blogspot.com/2012/12/collaborativefiltering-with-graphchi.html
Note: in the C++ version. A Java version is in development by a CMU team.

TWO EXAMPLES: ALS AND
ITEM-BASED CF

Example: Alternating Least Squares Matrix Factorization (ALS)
• Task: Predict ratings for items (movies) by
users.
• Model:
– Latent factor model (see next slide)

Reference: Y. Zhou, D. Wilkinson, R. Schreiber, R. Pan: “Large-Scale
Parallel Collaborative Filtering for the Netflix Prize” (2008)

ALS: Product – Item bipartite graph
[Figure: bipartite graph linking users to the movies they rated (Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita). Edges carry ratings; each vertex carries a vector of latent factors.]

User’s rating of a movie modeled as a dot-product:
<factor(user), factor(movie)>
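The dot-product model is short enough to write out directly. The Python sketch below predicts a rating from two factor vectors and computes the root-mean-square error over observed ratings; the D = 2 factor values and the names are hypothetical.

```python
import math


def predict(user_factor, movie_factor):
    """Predicted rating = <factor(user), factor(movie)>."""
    return sum(a * b for a, b in zip(user_factor, movie_factor))


def rmse(ratings, user_factors, movie_factors):
    """Root-mean-square error of the dot-product model over observed (user, movie, rating) edges."""
    squared_error = sum((r - predict(user_factors[u], movie_factors[m])) ** 2
                        for u, m, r in ratings)
    return math.sqrt(squared_error / len(ratings))


user_factors = {"u1": [1.0, 2.0]}           # hypothetical, D = 2
movie_factors = {"City of God": [1.5, 1.0]}
print(predict(user_factors["u1"], movie_factors["City of God"]))   # 3.5
print(rmse([("u1", "City of God", 4.0)], user_factors, movie_factors))  # 0.5
```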

ALS: GraphChi implementation
• The update function handles one vertex at a time (user or movie)
• For each user:
– Estimate latent(user): minimize the least-squares error of the dot-product predicted ratings

• GraphChi executes the update function for each vertex (in parallel) and loads the edges (ratings) from disk
– Latent factors are kept in memory: needs O(V) memory.
– If the factors don’t fit in memory, they can be replicated to the edges and thus stored on disk.
→ Scales to very large problems!
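For fixed movie factors, the user update is an ordinary regularized least-squares problem: with q_m the factors of the movies the user rated and r_m the ratings, the new factor x solves (Σ q_m q_mᵀ + λI) x = Σ r_m q_m. The Python sketch below implements one such vertex update with a small Gaussian-elimination solver. A plain ridge penalty λ is an assumption (Zhou et al. scale λ by the number of ratings, “weighted-λ-regularization”), and the names are hypothetical; in GraphChi the neighbor factors would come from adjacent vertices and the ratings from the edges loaded from disk.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A is small, D x D)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def als_update(neighbor_factors, ratings, lam):
    """One ALS vertex update: x minimizing sum (r - <x, q>)^2 + lam * |x|^2."""
    d = len(neighbor_factors[0])
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for q, r in zip(neighbor_factors, ratings):
        for i in range(d):
            b[i] += q[i] * r
            for j in range(d):
                A[i][j] += q[i] * q[j]
    return solve(A, b)


movies = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical movie factors
x_exact = als_update(movies, [1.0, 2.0, 3.0], lam=0.0)
x_ridge = als_update(movies, [1.0, 2.0, 3.0], lam=1.0)
```

With λ = 0 and ratings that the factor [1, 2] explains exactly, the update recovers about [1.0, 2.0]; with λ = 1 it shrinks the solution to about [0.875, 1.375].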

ALS: Performance
[Chart: Matrix Factorization (Alternating Least Squares), Netflix (99M edges), D=20. Running time in minutes (axis 0–12) for GraphChi (Mac Mini) and GraphLab v1 (8 cores).]

Remark: Netflix is not a big problem, but GraphChi’s running time will scale at most linearly with input size (ALS is CPU-bound, so it should be sub-linear in the number of ratings).

Example: Item-based CF
• Task: compute a similarity score (e.g., Jaccard) for each movie pair that has at least one viewer in common.
– Similarity(X, Y) ~ # common viewers
– Output the top K similar items for each item to a file.
– … or: create an edge between X and Y containing the similarity.

• Problem: enumerating all pairs takes too much time.

[Figure: viewers connected to the movies they watched: Women on the Verge of a Nervous Breakdown, The Celebration, City of God, Wild Strawberries, La Dolce Vita.]

Solution: Enumerate all triangles of the graph.

New problem: how to enumerate triangles if the graph does not fit in RAM?
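For reference, here is a straightforward in-memory Python version of the task: it enumerates co-viewed movie pairs per user, computes Jaccard similarity |viewers(X) ∩ viewers(Y)| / |viewers(X) ∪ viewers(Y)|, and keeps the top K per movie. It assumes everything fits in RAM, which is exactly what breaks at scale; the function name and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def item_similarities(movies_by_user, top_k):
    """Jaccard similarity for every movie pair with at least one viewer in common."""
    viewers = defaultdict(set)  # movie -> set of users who watched it
    for user, movies in movies_by_user.items():
        for movie in movies:
            viewers[movie].add(user)
    common = defaultdict(int)  # (x, y) with x < y -> number of common viewers
    for movies in movies_by_user.values():
        for x, y in combinations(sorted(set(movies)), 2):
            common[(x, y)] += 1
    similar = defaultdict(list)
    for (x, y), c in common.items():
        jaccard = c / (len(viewers[x]) + len(viewers[y]) - c)
        similar[x].append((jaccard, y))
        similar[y].append((jaccard, x))
    # Keep the top K most similar movies for each movie.
    return {m: sorted(pairs, reverse=True)[:top_k] for m, pairs in similar.items()}


top = item_similarities({"u1": ["A", "B"], "u2": ["A", "B", "C"], "u3": ["C"]}, top_k=1)
```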

Enumerating Triangles (Item-CF)
• Triangles with edge (u, v) = intersection(neighbors(u), neighbors(v))
• Iterative, memory-efficient solution:

Algorithm:
• Let pivots be a subset of the vertices;
• Load all neighbor lists (adjacency lists) of the pivots into RAM
• Now use GraphChi to load all vertices from disk, one by one, and compare their adjacency lists to the pivots’ adjacency lists (similar to a merge).
• Repeat with a new subset of pivots.
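The pivot scheme can be sketched in Python for triangle counting in an undirected graph. Two details are our additions, not from the slide: each edge is kept only in the direction from smaller to larger vertex id, so every triangle is counted exactly once, and an in-memory dict stands in for the adjacency lists GraphChi would stream from disk. The names are hypothetical.

```python
def orient(adjacency):
    """Keep each undirected edge once, pointing from the smaller to the larger id."""
    return {v: sorted(w for w in nbrs if w > v) for v, nbrs in adjacency.items()}


def merge_count(a, b):
    """Size of the intersection of two sorted lists (merge-style scan)."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def count_triangles(adjacency, pivots_per_round):
    """Count triangles, loading only pivots_per_round adjacency lists into 'RAM' per round."""
    forward = orient(adjacency)
    vertices = sorted(forward)
    total = 0
    for start in range(0, len(vertices), pivots_per_round):
        # Load the adjacency lists of this round's pivots into memory.
        pivots = {p: forward[p] for p in vertices[start:start + pivots_per_round]}
        # Stream every vertex (GraphChi would read these from disk, one by one).
        for v in vertices:
            for p in forward[v]:
                if p in pivots:
                    # Triangle a < b < c is found once: v = a, pivot p = b, common neighbor c.
                    total += merge_count(forward[v], pivots[p])
    return total


k4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
```

The complete graph on four vertices (k4) has 4 triangles and the 4-cycle (square) has none, whatever pivots_per_round is; that parameter only trades memory for the number of passes.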

Triangle Counting Performance
[Chart: Triangle Counting on twitter-2010 (1.5B edges). Running time in minutes (axis 0–450) for GraphChi (Mac Mini) and Hadoop (1636 machines).]

FUTURE DIRECTIONS & FINAL
REMARKS

Single-Machine Computing in Production?
• GraphChi supports incremental computation with dynamic graphs:
– Can keep on running indefinitely, adding new edges to the graph → constantly fresh model.
– However, requires engineering – not included
in the toolkit.

• Compare to a cluster-based system (such
as Hadoop) that needs to compute from
scratch.
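A toy Python illustration of the difference, not GraphChi's mechanism: item co-view counts (the numerator of the item-based similarity) can be updated as each new rating edge arrives, at a cost proportional to that user's history, and they match a full recomputation from scratch. The class and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class CoViewCounts:
    """Item co-view counts kept fresh as new (user, movie) edges arrive."""

    def __init__(self):
        self.history = defaultdict(set)  # user -> movies seen so far
        self.common = defaultdict(int)   # (x, y) with x < y -> number of common viewers

    def add_edge(self, user, movie):
        """Incremental update: only pairs involving the new movie change."""
        if movie in self.history[user]:
            return  # duplicate edge; nothing changes
        for other in self.history[user]:
            self.common[tuple(sorted((movie, other)))] += 1
        self.history[user].add(movie)


def from_scratch(edges):
    """Batch recomputation over all edges, as a from-scratch job would do."""
    history = defaultdict(set)
    for user, movie in edges:
        history[user].add(movie)
    common = defaultdict(int)
    for movies in history.values():
        for x, y in combinations(sorted(movies), 2):
            common[(x, y)] += 1
    return dict(common)


edges = [("u1", "A"), ("u2", "A"), ("u1", "B"), ("u2", "C"), ("u2", "B"), ("u1", "B")]
model = CoViewCounts()
for user, movie in edges:
    model.add_edge(user, movie)  # in production, edges would keep streaming in
```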

Unified Recsys Platform for GraphChi?
• Working with master’s students at CMU.
• Goal: ability to easily compare different algorithms and parameters
– Unified input, output.
– General programmable API (not just file-based)
– Evaluation process: Several evaluation metrics;
Cross-validation, held-out data…
– Run many algorithm instances in parallel, on
same graph.
– Java.

• Scalable from the get-go.

DataDescriptor
data definition
column1: categorical
column2: real
column3: key
column4: categorical

[Figure: proposed pipeline. Input data described by a DataDescriptor; for each algorithm, an Algorithm Input Descriptor with map(input: DataDescriptor); a GraphChi Preprocessor that writes GraphChi input and aux data to disk; training programs for Algorithms X, Y and Z; an Algorithm X Predictor evaluated on held-out data (test data), producing training metrics and test quality metrics.]
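One way the DataDescriptor idea could look in code is sketched below in Python: a descriptor declares each column's type, and a per-algorithm input descriptor maps typed columns to the (source, target, value) edges a GraphChi-style program consumes. Everything here is hypothetical, including the class names, the column roles in the example (user key, item, rating), and the validation rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class ColumnType(Enum):
    CATEGORICAL = "categorical"
    REAL = "real"
    KEY = "key"


@dataclass
class DataDescriptor:
    """Data definition: the declared type of each input column, in order."""
    columns: Dict[str, ColumnType]

    def parse(self, row: List[str]) -> Dict[str, object]:
        """Convert one raw row into typed values: floats for REAL columns, strings otherwise."""
        if len(row) != len(self.columns):
            raise ValueError(f"expected {len(self.columns)} columns, got {len(row)}")
        return {name: (float(raw) if ctype is ColumnType.REAL else raw)
                for (name, ctype), raw in zip(self.columns.items(), row)}


@dataclass
class AlgorithmInputDescriptor:
    """Hypothetical per-algorithm mapping from columns to graph edges."""
    source: str
    target: str
    value: str

    def map(self, descriptor: DataDescriptor, row: List[str]) -> Tuple[object, object, float]:
        if descriptor.columns[self.value] is not ColumnType.REAL:
            raise ValueError(f"edge value column {self.value!r} must be real-valued")
        parsed = descriptor.parse(row)
        return parsed[self.source], parsed[self.target], parsed[self.value]


descriptor = DataDescriptor({
    "column1": ColumnType.CATEGORICAL,
    "column2": ColumnType.REAL,
    "column3": ColumnType.KEY,
    "column4": ColumnType.CATEGORICAL,
})
als_input = AlgorithmInputDescriptor(source="column3", target="column1", value="column2")
edge = als_input.map(descriptor, ["movie42", "4.5", "user7", "drama"])
```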

Recent developments: Disk-based Graph Computation
• Two disk-based graph computation systems were published recently:
– TurboGraph (KDD’13)
– X-Stream (SOSP’13, in October)

• Significantly better performance than GraphChi on many problems
– They avoid preprocessing (“sharding”)
– But GraphChi can do some computation that X-Stream cannot (triangle counting and related); TurboGraph requires an SSD
– Hot research area!

Do you need GraphChi – or any system?
• Heck, for many algorithms, you can just
mmap() over your (binary) adjacency list /
sparse matrix, and write a for-loop.
– See Lin, Chau, Kang: “Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC” (Big Data ’13)

• Obviously good to have a common API
– And some algos need more advanced solutions (like GraphChi, X-Stream, TurboGraph)
Beware of the hype!
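In that spirit, a minimal Python sketch: store the edge list as packed little-endian int32 (source, destination) pairs, mmap() the file, and compute in-degrees with a for-loop. The binary layout and function names are assumptions for illustration, not the format used by Lin et al.

```python
import mmap
import os
import struct
import tempfile

EDGE = struct.Struct("<ii")  # one edge = (source, destination) as two little-endian int32


def write_edges(path, edges):
    """Write the edge list as a flat binary file."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))


def in_degrees(path, num_vertices):
    """mmap the binary edge list and compute in-degrees with a plain for-loop."""
    degrees = [0] * num_vertices
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for offset in range(0, len(mm), EDGE.size):
            _, dst = EDGE.unpack_from(mm, offset)
            degrees[dst] += 1
    return degrees


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.bin")
    write_edges(path, [(0, 1), (0, 2), (1, 2), (2, 0)])
    degrees = in_degrees(path, 3)
print(degrees)  # [1, 1, 2]
```

The operating system pages the file in on demand, so the same loop works when the edge file is larger than RAM (note that mmap rejects an empty file).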

Conclusion
• Very large recommender algorithms can now be run on just your PC or laptop.
– Additional performance from multi-core parallelism.
– Great for productivity – scale by replicating.

• In general, good single-machine scalability requires care with data structures and memory management → natural with C/C++; with Java (etc.) you need low-level byte massaging.
– Frameworks like GraphChi hide the low level.

• More work is needed to “productize” current work.

Thank you!

Aapo Kyrölä
Ph.D. candidate @ CMU – soon to graduate! (Currently visiting UW)
http://www.cs.cmu.edu/~akyrola
Twitter: @kyrpov
