This talk gives an overview of the complex architecture of the Pregel framework and points out where bottlenecks can arise when writing a Pregel algorithm.
3. Michael Hackstein
ArangoDB Core Team
Web Frontend
Graph visualisation
Graph features
Host of cologne.js
Master’s Degree (spec. Databases and Information Systems)
6. Graph Algorithms
Pattern matching
Search through the entire graph
Identify similar components
=> Touch all vertices and their neighbourhoods
Traversals
Define a specific start point
Iteratively explore the graph
=> History of steps is known
Global measurements
Compute one value for the graph, based on all its vertices or edges
Compute one value for each vertex or edge
=> Often require a global view on the graph
7. Pregel
A framework to query distributed, directed graphs.
Known as “Map-Reduce” for graphs
Uses the same phases
Runs in several iterations
Aims at:
Operate all servers at full capacity
Reduce network traffic
Good at calculations touching all vertices
Bad at calculations touching a very small number of vertices
23. Worker ≙ Map
“Map” a user-defined algorithm over all vertices
Output: set of messages to other vertices
Available parameters:
The current vertex and its outbound edges
All incoming messages
Global values
Allows modifications on the vertex:
Attach a result to this vertex and its outgoing edges
Delete the vertex and its outgoing edges
Deactivate the vertex
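The worker contract above can be sketched in plain JavaScript. This is a minimal illustration, not the ArangoDB Pregel API: the object shapes (`result`, `outEdges`, `active`), the `{ to, data }` message format, and the function name `maxValueWorker` are all assumptions made for this example. The worker spreads the largest value it has seen to its neighbours and deactivates its vertex when nothing changed.

```javascript
// Hypothetical worker: propagate the maximum value through the graph.
// `vertex`, `incoming` and `global` are plain illustrative objects,
// not the objects the ArangoDB framework passes in.
function maxValueWorker(vertex, incoming, global) {
  const outgoing = [];
  // In step 0 every vertex announces its own value once.
  let changed = global.step === 0;
  for (const value of incoming) {
    if (value > vertex.result) {
      vertex.result = value; // attach a new result to the vertex
      changed = true;
    }
  }
  if (changed) {
    // Output: one message per outbound edge.
    for (const target of vertex.outEdges) {
      outgoing.push({ to: target, data: vertex.result });
    }
  } else {
    vertex.active = false; // nothing new to report: deactivate
  }
  return outgoing;
}

// A single call for one vertex in step 0.
const v = { id: "a", result: 3, outEdges: ["b", "c"], active: true };
const sent = maxValueWorker(v, [], { step: 0 });
console.log(sent);
```

Returning the messages instead of sending them directly keeps the sketch free of any transport details; a real framework would hand the worker a sending facility instead.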
24. Combine ≙ Reduce
“Reduce” all generated messages
Output: An aggregated message for each vertex.
Executed on sender as well as receiver.
Available parameters:
One new message for a vertex
The stored aggregate for this vertex
Typical combiners are SUM, MIN or MAX
Reduces network traffic
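To make the sender-side and receiver-side combining concrete, here is a small sketch in plain JavaScript. The helper names `combineMessages` and `toMessages`, the `{ to, data }` message shape, and the two-server setup are assumptions for illustration only. Each combiner takes one new message and the stored aggregate, as listed above.

```javascript
// Typical combiners: (new message, stored aggregate) -> new aggregate.
const combiners = {
  SUM: (message, aggregate) => message + aggregate,
  MIN: (message, aggregate) => Math.min(message, aggregate),
  MAX: (message, aggregate) => Math.max(message, aggregate),
};

// Hypothetical helper: fold a list of { to, data } messages into one
// aggregate per target vertex.
function combineMessages(messages, combine) {
  const aggregates = new Map();
  for (const { to, data } of messages) {
    aggregates.set(to, aggregates.has(to) ? combine(data, aggregates.get(to)) : data);
  }
  return aggregates;
}

// Turn the aggregates back into messages that can be shipped.
const toMessages = (aggregates) =>
  [...aggregates].map(([to, data]) => ({ to, data }));

// Messages produced by the workers on two (hypothetical) servers.
const server1 = [{ to: "x", data: 1 }, { to: "x", data: 2 }, { to: "x", data: 3 }];
const server2 = [{ to: "x", data: 4 }, { to: "x", data: 5 }, { to: "y", data: 1 }];

// Sender side: each server combines before anything crosses the network.
const onTheWire = [
  ...toMessages(combineMessages(server1, combiners.SUM)),
  ...toMessages(combineMessages(server2, combiners.SUM)),
];

// Receiver side: combine what arrives into the stored aggregate per vertex.
const received = combineMessages(onTheWire, combiners.SUM);
console.log(onTheWire.length, received.get("x"), received.get("y"));
```

In this example six generated messages shrink to three shipped messages, which is where the traffic reduction comes from. This only works when the combiner is associative and commutative, as SUM, MIN and MAX are.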
25. Activity ≙ Termination
Execute several rounds of Map/Reduce
Count active vertices and messages
Start next round if one of the following is true:
At least one vertex is active
At least one message is sent
Terminate if no vertex is active and no messages were sent
Store all non-deleted vertices and edges as resulting graph
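The termination rule above can be sketched as a single-process loop. This is a simplified simulation under stated assumptions, not how the distributed framework is implemented: `runPregel`, `minLabelWorker`, the `send` callback and the vertex object shape are hypothetical names, a received message is assumed to reactivate a vertex, and storing the resulting graph is left out.

```javascript
// Hypothetical driver: run rounds until no vertex is active and no
// message was sent. Returns the number of rounds executed.
function runPregel(vertices, worker, combine) {
  let inbox = new Map(); // vertex id -> combined message from the last round
  let step = 0;
  while (true) {
    const outbox = new Map();
    let messagesSent = 0;
    const send = (to, data) => {
      messagesSent += 1;
      outbox.set(to, outbox.has(to) ? combine(data, outbox.get(to)) : data);
    };
    for (const vertex of vertices.values()) {
      const message = inbox.get(vertex.id);
      if (message !== undefined) vertex.active = true; // assumption: messages wake a vertex
      if (vertex.active) worker(vertex, message, { step }, send);
    }
    const activeCount = [...vertices.values()].filter((v) => v.active).length;
    step += 1;
    if (activeCount === 0 && messagesSent === 0) return step;
    inbox = outbox;
  }
}

// Example worker: every vertex adopts the smallest label it hears about.
function minLabelWorker(vertex, message, global, send) {
  const improved = message !== undefined && message < vertex.result;
  if (improved) vertex.result = message;
  if (global.step === 0 || improved) {
    for (const target of vertex.outEdges) send(target, vertex.result);
  }
  vertex.active = false; // deactivate; a new message reactivates the vertex
}

// A directed cycle a -> b -> c -> a with initial labels 3, 1, 2.
const vertices = new Map([
  ["a", { id: "a", result: 3, outEdges: ["b"], active: true }],
  ["b", { id: "b", result: 1, outEdges: ["c"], active: true }],
  ["c", { id: "c", result: 2, outEdges: ["a"], active: true }],
]);
const rounds = runPregel(vertices, minLabelWorker, Math.min);
console.log(rounds, [...vertices.values()].map((v) => v.result));
```

All vertices are inactive after every round here, so the loop keeps going only because messages are still in flight. That is the second of the two conditions for starting another round.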
26. Pregel at ArangoDB
Started as a side project in free hack time
Experimental on an operational database
Implemented as an alternative to traversals
Makes use of the flexibility of JavaScript:
No strict type system
No pre-compilation, on-the-fly queries
Native JSON documents
Really fast development
28. Pagerank for TinkerPop3
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
// TinkerPop3 imports (VertexProgram, MessageType, Memory, ...) are omitted, as on the slide.

public class PageRankVertexProgram implements VertexProgram<Double> {

    private MessageType.Local messageType = MessageType.Local.of(() -> GraphTraversal.<Vertex>of().outE());

    public static final String PAGE_RANK = Graph.Key.hide("gremlin.pageRank");
    public static final String EDGE_COUNT = Graph.Key.hide("gremlin.edgeCount");
    private static final String VERTEX_COUNT = "gremlin.pageRankVertexProgram.vertexCount";
    private static final String ALPHA = "gremlin.pageRankVertexProgram.alpha";
    private static final String TOTAL_ITERATIONS = "gremlin.pageRankVertexProgram.totalIterations";
    private static final String INCIDENT_TRAVERSAL = "gremlin.pageRankVertexProgram.incidentTraversal";

    private double vertexCountAsDouble = 1;
    private double alpha = 0.85d;
    private int totalIterations = 30;

    private static final Set<String> COMPUTE_KEYS = new HashSet<>(Arrays.asList(PAGE_RANK, EDGE_COUNT));

    private PageRankVertexProgram() {}

    // Restore the program's parameters from a configuration.
    @Override
    public void loadState(final Configuration configuration) {
        this.vertexCountAsDouble = configuration.getDouble(VERTEX_COUNT, 1.0d);
        this.alpha = configuration.getDouble(ALPHA, 0.85d);
        this.totalIterations = configuration.getInt(TOTAL_ITERATIONS, 30);
        try {
            if (configuration.containsKey(INCIDENT_TRAVERSAL)) {
                final SSupplier<Traversal> traversalSupplier = VertexProgramHelper.deserialize(configuration, INCIDENT_TRAVERSAL);
                VertexProgramHelper.verifyReversibility(traversalSupplier.get());
                this.messageType = MessageType.Local.of((SSupplier) traversalSupplier);
            }
        } catch (final Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

    // Write the program's parameters into a configuration.
    @Override
    public void storeState(final Configuration configuration) {
        configuration.setProperty(GraphComputer.VERTEX_PROGRAM, PageRankVertexProgram.class.getName());
        configuration.setProperty(VERTEX_COUNT, this.vertexCountAsDouble);
        configuration.setProperty(ALPHA, this.alpha);
        configuration.setProperty(TOTAL_ITERATIONS, this.totalIterations);
        try {
            VertexProgramHelper.serialize(this.messageType.getIncidentTraversal(), configuration, INCIDENT_TRAVERSAL);
        } catch (final Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

    @Override
    public Set<String> getElementComputeKeys() {
        return COMPUTE_KEYS;
    }

    @Override
    public void setup(final Memory memory) {

    }

    // Called once per vertex and iteration.
    @Override
    public void execute(final Vertex vertex, Messenger<Double> messenger, final Memory memory) {
        if (memory.isInitialIteration()) {
            // First iteration: every vertex starts with rank 1 / vertexCount.
            double initialPageRank = 1.0d / this.vertexCountAsDouble;
            double edgeCount = Double.valueOf((Long) this.messageType.edges(vertex).count().next());
            vertex.singleProperty(PAGE_RANK, initialPageRank);
            vertex.singleProperty(EDGE_COUNT, edgeCount);
            messenger.sendMessage(this.messageType, initialPageRank / edgeCount);
        } else {
            // Later iterations: sum the incoming rank shares and apply the damping factor.
            double newPageRank = StreamFactory.stream(messenger.receiveMessages(this.messageType)).reduce(0.0d, (a, b) -> a + b);
            newPageRank = (this.alpha * newPageRank) + ((1.0d - this.alpha) / this.vertexCountAsDouble);
            vertex.singleProperty(PAGE_RANK, newPageRank);
            messenger.sendMessage(this.messageType, newPageRank / vertex.<Double>property(EDGE_COUNT).orElse(0.0d));
        }
    }

    // Stop after a fixed number of iterations.
    @Override
    public boolean terminate(final Memory memory) {
        return memory.getIteration() >= this.totalIterations;
    }
}
29. Pagerank for ArangoDB
var pageRank = function (vertex, message, global) {
  var total, rank, edgeCount, send, edge, alpha, sum;
  total = global.vertexCount;
  edgeCount = vertex._outEdges.length;
  alpha = global.alpha;
  sum = 0;
  if (global.step > 0) {
    // Sum the rank shares received from incoming neighbours.
    while (message.hasNext()) {
      sum += message.next().data;
    }
    rank = alpha * sum + (1 - alpha) / total;
  } else {
    // First step: every vertex starts with the same rank.
    rank = 1 / total;
  }
  vertex._setResult(rank);
  if (global.step < global.MAX_STEPS) {
    // Distribute the rank evenly over all outgoing edges.
    send = rank / edgeCount;
    while (vertex._outEdges.hasNext()) {
      edge = vertex._outEdges.next();
      message.sendTo(edge._getTarget(), send);
    }
  } else {
    vertex._deactivate();
  }
};

// SUM combiner: add the new message to the stored aggregate.
var combiner = function (message, oldMessage) {
  return message + oldMessage;
};

var Runner = require("org/arangodb/pregelRunner").Runner;
var runner = new Runner();
runner.setWorker(pageRank);
runner.setCombiner(combiner);
runner.start("myGraph");
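Both listings compute the same update rule, which can be written out as follows. The symbols are chosen here for illustration and do not appear on the slides: N is the vertex count (`total` / `vertexCountAsDouble`), α is the damping factor (`alpha`), and outdeg(u) is the number of outgoing edges of u (`edgeCount`).

```latex
\[
  PR_0(v) = \frac{1}{N},
  \qquad
  PR_{t+1}(v) = \alpha \sum_{u \to v} \frac{PR_t(u)}{\operatorname{outdeg}(u)} + \frac{1 - \alpha}{N}
\]
% Worked step with assumed values: alpha = 0.85, N = 4, and a vertex v
% with two in-neighbours of out-degree 1 and 2, both holding rank 1/4.
\[
  PR_1(v) = 0.85 \left( \frac{0.25}{1} + \frac{0.25}{2} \right) + \frac{0.15}{4}
          = 0.85 \cdot 0.375 + 0.0375
          = 0.35625
\]
```

The sum inside the parentheses is the value the SUM combiner delivers to the worker, so the worker itself only applies the damping factor. The value α = 0.85 is the default from the TinkerPop3 listing; the four-vertex graph is a made-up example.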
30. Thank you
Further Questions?
Follow me on twitter/github: @mchacki
Write me a mail: mchacki@arangodb.com
Follow @arangodb on Twitter
Join our google group:
https://groups.google.com/forum/#!forum/arangodb
Visit our blog https://www.arangodb.com/blog
Slides available at https://www.slideshare.net/arangodb