Alexey Zinoviev presented this talk at the Highload++ conference: http://www.highload.ru/2014/abstracts/1516.html
This talk covers the following topics: Pregel, Graph Theory, Giraph, Okapi, GraphX, GraphChi, Spark, Shortest Path Problem, Road Network, Road Graph
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
1. Thorny path to the Large-Scale Graph Processing
Zinoviev Alexey
2. About
• I am a <graph theory, machine learning, traffic jams prediction, BigData algorithms> scientist
• But I'm a <Java, JavaScript, Android, NoSQL, Hadoop, Spark>
programmer
4. Big Data of old times
• Astronomy
• Weather
• Trading
• Sea routes
• Battles
5. And now ...
• Web graph
• Facebook friend network
• Gmail email graph
• EU road network
• Citation graph
• PayPal transaction graph
6.
Graph                    | Number of vertices | Number of edges | Volume | Data per day
Web-graph                | 1.5 * 10^12        | 1.2 * 10^13     | 100 PB | 300 TB
Facebook (friends graph) | 1.1 * 10^9         | 160 * 10^9      | 1 PB   | 15 TB
Road graph of EU         | 18 * 10^6          | 42 * 10^6       | 20 GB  | 50 MB
Road graph of this city  | 250 000            | 460 000         | 500 MB | 100 KB
7. Problems
• Popularity rank (page rank)
• Determining popular users, news, jobs, etc.
• Shortest paths
• Max flow
• How are users, groups connected?
• Clustering, semi-clustering
• Max clique, triangle closure, label propagation algorithms
• Finding related people, groups, interests
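One of the algorithms from the last bullet, label propagation, fits in a few lines. A toy single-machine sketch for community detection (the graph, the deterministic tie-breaking rule, and all names are illustrative, not from the talk):

```python
from collections import Counter

def label_propagation(adj, max_iterations=20):
    """Toy label propagation: every vertex repeatedly adopts the most
    frequent label among its neighbours; ties are broken by taking the
    largest label so the result is deterministic."""
    labels = {v: v for v in adj}  # start: each vertex is its own community
    for _ in range(max_iterations):
        changed = False
        for v, neighbours in adj.items():
            if not neighbours:
                continue
            counts = Counter(labels[u] for u in neighbours)
            top = max(counts.values())
            best = max(l for l, c in counts.items() if c == top)
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:  # no vertex changed its label: converged
            break
    return labels

# two 4-cliques joined by a single edge: two communities emerge
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
```

Production variants randomize the visiting order and tie-breaks; the deterministic version above is only meant to show the shape of the iteration.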
12. Node Centrality Problem
• Vertices with high impact
• Removal of important vertices reduces the reliability
Cases:
• Bioinformatics
• Social connections
• Road network
• Spam detection
• Recommendation system
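As a first approximation of "vertices with high impact", degree centrality is the simplest measure (a sketch; serious centrality analysis for the cases above typically uses betweenness or PageRank-style measures instead):

```python
def degree_centrality(adj):
    """Fraction of all other vertices each vertex touches directly;
    high values flag candidates for the 'important vertex' removal
    experiments mentioned above."""
    n = len(adj)
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}

# star graph: the hub touches everyone, leaves touch only the hub
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```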
14. Small World Problem
Network                 | Avg. distance | Vertices | Edges
Facebook                | 4.74          | 712 M    | 69 G
Twitter                 | 3.67          | ----     | 5 G follows
MSN Messenger (1 month) | 6.6           | 180 M    | 1.3 G arcs
16. Think like a vertex…
• Majority of graph algorithms are iterative and traverse the graph in
some way
• Classic map-reduce overheads (job startup/shutdown, reloading data
from HDFS, shuffling)
• High complexity of reducing graph problems to the key-value model
• Iterative algorithms become multiple chained M/R jobs, with full saving
and re-reading of the state between iterations
17. Why not use MapReduce/Hadoop?
• Example: PageRank, Google‘s
famous algorithm for measuring the
authority of a webpage based on the
underlying network of hyperlinks
• defined recursively: each vertex
distributes its authority to its neighbors
in equal proportions
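The recursive definition above maps directly onto a simple iterative computation. A minimal single-machine sketch (toy graph and parameter values are illustrative; this is not the distributed version):

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank: each vertex distributes its authority to its
    out-neighbours in equal proportions, plus a uniform teleport term."""
    n = len(links)
    rank = {v: 1.0 / n for v in links}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in links}
        for v, outs in links.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new_rank[u] += share
            else:
                # dangling vertex: spread its rank uniformly
                for u in links:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank

# toy web: pages b, c, d all link to a, so a ends up most authoritative
links = {'a': ['b'], 'b': ['a'], 'c': ['a'], 'd': ['a']}
```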
18. Google Pregel
• Distributed system especially developed for large scale graph
processing
• Bulk Synchronous Parallel (BSP) as execution model
• Supersteps are atomic units of parallel computation
• Any superstep can be restarted from a checkpoint (need not be user
defined)
• A new superstep provides an opportunity for rebalancing of
components among available resources
20. Vertex-centric BSP
• Each vertex has an id, a value, a list of its adjacent vertex ids and the
corresponding edge values
• Each vertex is invoked in each superstep, can recompute its value and
send messages to other vertices, which are delivered over superstep
barriers
• Advanced features: termination votes, combiners, aggregators,
topology mutations
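The model above can be simulated on one machine in a few lines, using single-source shortest paths, the classic Pregel example (a framework-free sketch; all function and variable names are illustrative):

```python
import math

def bsp_sssp(edges, source, max_supersteps=30):
    """Vertex-centric BSP sketch: vertices receive candidate distances
    as messages, keep the minimum, and forward improvements to their
    neighbours; empty inboxes everywhere means every vertex halts."""
    value = {v: math.inf for v in edges}
    inbox = {v: [] for v in edges}
    inbox[source] = [0]
    while any(inbox.values()) and max_supersteps > 0:
        outbox = {v: [] for v in edges}
        for v, messages in inbox.items():      # compute() for active vertices
            if not messages:
                continue
            candidate = min(messages)
            if candidate < value[v]:
                value[v] = candidate
                for u, w in edges[v]:          # propagate the improvement
                    outbox[u].append(candidate + w)
        inbox = outbox                         # superstep barrier
        max_supersteps -= 1
    return value

edges = {'a': [('b', 1), ('c', 4)], 'b': [('c', 2)], 'c': []}
```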
24. Why Apache Giraph
Pregel is proprietary, but:
• Apache Giraph is an open source implementation of Pregel
• Runs on standard Hadoop infrastructure
• Computation is executed in memory
• Can be a job in a pipeline (MapReduce, Hive)
• Uses Apache ZooKeeper for synchronization
26. Why Apache Giraph
• No locks: message-based communication
• No semaphores: global synchronization
• Iteration isolation: massively parallelizable
29. ZooKeeper in Apache Giraph
ZooKeeper: responsible for
computation state
• Partition/worker mapping
• Global state: superstep
• Checkpoint paths, aggregator
values, statistics
30. Master in Apache Giraph
Master: responsible for coordination
• Assigns partitions to workers
• Coordinates synchronization
• Requests checkpoints
• Aggregates aggregator values
• Collects health statuses
31. Worker in Apache Giraph
Worker: responsible for vertices
• Invokes active vertices
compute() function
• Sends, receives and assigns
messages
• Computes local aggregation
values
33. Fault tolerance
No single point of failure from Giraph threads
• With multiple master threads, if the current master dies, a new
one will automatically take over.
• If a worker thread dies, the application is rolled back to a
previously checkpointed superstep.
• If a ZooKeeper server dies, as long as a quorum remains, the
application can proceed
Hadoop single points of failure still exist (NameNode, JobTracker)
37. MapReduce vs Giraph
6 machines with 2x8core Opteron CPUs, 4x1TB disks and 32GB RAM each, ran 1
Giraph worker per core
Wikipedia page link graph (6 million vertices, 200 million edges)
PageRank on Hadoop/Mahout
• 10 iterations approx. 29 minutes
• average time per iteration: approx. 3 minutes
PageRank on Giraph
• 30 iterations took approx. 15 minutes
• average time per iteration: approx. 30 seconds
10x performance improvement
38. Okapi
• Apache Mahout for graphs
• Graph-based recommenders: ALS,
SGD, SVD++, etc.
• Graph analytics: Graph
partitioning, Community Detection,
K-Core, etc.
40. Spark
• MapReduce in memory
• Up to 50x faster than Hadoop
• Support for Shark (like Hive), MLlib
(Machine learning), GraphX (graph
processing)
• RDD is a basic building block
(immutable distributed collections of
objects)
43. GraphChi
• Asynchronous Disk-based version of GraphLab
• Utilizing parallel sliding window
• Very small number of non-sequential accesses to the disk
• Graph does not fit in memory
• Input graph is split into P disjoint intervals to balance edges,
each associated with a shard
• Designed to run on a single consumer PC
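The interval/shard scheme above can be sketched as follows (simplified: real GraphChi picks interval boundaries to balance edge counts across shards, while here the intervals are uniform; all names are illustrative):

```python
def build_shards(edges, num_shards):
    """GraphChi-style sharding sketch: vertex ids are split into
    `num_shards` disjoint intervals; shard i holds every edge whose
    destination falls in interval i, sorted by source so each shard
    can be read almost sequentially from disk."""
    max_v = max(max(src, dst) for src, dst in edges)
    size = max_v // num_shards + 1            # uniform interval width
    shards = [[] for _ in range(num_shards)]
    for src, dst in edges:
        shards[dst // size].append((src, dst))
    for shard in shards:
        shard.sort()                          # order by source vertex
    return size, shards

edges = [(0, 3), (2, 1), (1, 3), (3, 0), (0, 1)]
```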
47. Definition
• Edge weights > 0
• A few classes of roads
• Lat/Lon attributes for each vertex
• Subgraphs for cross-roads
• Not as big as the web graph
• Static
53. We need a fast system!
• Response < 10 ms (with high accuracy)
• Shortest path (SP) with O(n)
• Preprocessing phase
• Don’t keep all SPs precomputed - that is O(n^2) space
• Use geo attributes
• Using compression and recoding for
disk storage
• Network is stable
54. EU Road network
Method | Dijkstra  | ALT    | RE    | HH    | CH   | TN  | HL
Time   | 2 008 300 | 24 656 | 2 444 | 462.0 | 94.0 | 1.8 | 0.3
• ALT: [Goldberg & Harrelson 05], [Delling & Wagner 07]
• RE: [Gutman 05], [Goldberg et al. 07]
• HH: [Sanders & Schultes 06]
• CH: [Geisberger et al. 08]
• TN: [Geisberger et al. 08]
• HL: [Abraham et al. 11]
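For scale, the Dijkstra baseline in the numbers above is just the textbook algorithm with a binary heap (a standard sketch, not tied to any of the cited implementations):

```python
import heapq

def dijkstra(graph, source):
    """Textbook Dijkstra with a priority queue; `graph` maps each
    vertex to a list of (neighbour, edge_weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float('inf')):
            continue                          # stale heap entry, skip
        for u, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(u, float('inf')):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

graph = {'a': [('b', 1), ('c', 4)], 'b': [('c', 2), ('d', 6)],
         'c': [('d', 3)], 'd': []}
```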
57. Transit nodes (TN)
• Divide graph G into subgraphs G_i
• Find R (subset of G_i) for each G_i
• All shortest paths leaving G_i pass through R
• Build pairs (v_i, r_k) for each v_i where
r_k is closest Transit Node
• Calculate shortest paths between transit
nodes in R
• Save it!
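Once the preprocessing steps above are done, a long-distance query reduces to a small minimum over precomputed distances: d(s, t) = min over access nodes u of s and v of t of d(s, u) + D(u, v) + d(v, t). A toy sketch (the names and sample data are illustrative; real implementations also handle local queries where s and t share a cell):

```python
def tn_query(local_dist, access, table, s, t):
    """Transit-node query: combine the precomputed local distances to
    access (transit) nodes with the all-pairs table between transit
    nodes, and take the minimum over access-node pairs."""
    return min(local_dist[s][u] + table[u][v] + local_dist[t][v]
               for u in access[s] for v in access[t])

# toy precomputed data: s has two access nodes, t has one
access = {'s': ['r1', 'r3'], 't': ['r2']}
local_dist = {'s': {'r1': 2, 'r3': 1}, 't': {'r2': 3}}
table = {'r1': {'r2': 5}, 'r3': {'r2': 9}}
```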
60. Optimization problems
• Unstable graph
• Preprocessing phase becomes meaningless
• How to invest 1B $ in road network to minimize human time in
traffic jams
• How to invest 1M $ in road network to improve reliability before
the flooding
61. Last steps ...
• I/O Efficient Algorithms and Data Structures
• Graphs and Memory Errors