PROCESSING LARGE-SCALE GRAPHS WITH GOOGLE(TM) PREGEL 
MICHAEL HACKSTEIN 
FRONT END AND GRAPH SPECIALIST ARANGODB
Processing large-scale graphs
with Google(TM) Pregel
November 17th 
Michael Hackstein 
@mchacki 
www.arangodb.com
Michael Hackstein 
ArangoDB Core Team 
Web Frontend 
Graph visualisation 
Graph features 
Host of cologne.js 
Master’s Degree (spec. Databases and Information Systems)
Graph Algorithms
Pattern matching
Search through the entire graph
Identify similar components
⇒ Touch all vertices and their neighbourhoods
Traversals
Define a specific start point
Iteratively explore the graph
⇒ History of steps is known
Global measurements
Compute one value for the graph, based on all its vertices or edges
Compute one value for each vertex or edge
⇒ Often require a global view on the graph
Pregel
A framework to query distributed, directed graphs.
Known as “Map-Reduce” for graphs
Uses the same phases
Has several iterations
Aims at:
Operate all servers at full capacity
Reduce network traffic
Good at calculations touching all vertices
Bad at calculations touching a very small number of vertices
Example – Connected Components
[Figure: step-by-step animation of the connected components computation on a small graph with vertices numbered 1 to 7. Legend: active vertex, inactive vertex, forward message, backward message.]
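The connected components example above can be sketched as a small single-process simulation. This is only a sketch, assuming the usual min-label formulation: every vertex starts with its own id as label, sends that label forward along outgoing edges and backward along incoming edges, adopts the smallest label it receives, and stays inactive until a new message arrives. The function name connectedComponents, the edge-list input format and the sample edges are made up for illustration; they are neither the ArangoDB API nor the exact graph from the slides.

```javascript
// Single-process sketch of Pregel-style connected components
// (min-label propagation). Names and input format are hypothetical.
function connectedComponents(vertexIds, edges) {
  // Messages travel in both directions along each directed edge.
  const neighbours = new Map(vertexIds.map((id) => [id, []]));
  for (const [from, to] of edges) {
    neighbours.get(from).push(to); // forward message target
    neighbours.get(to).push(from); // backward message target
  }

  const label = new Map(vertexIds.map((id) => [id, id]));
  let inbox = new Map(); // vertex id -> smallest incoming label
  let active = new Set(vertexIds); // all vertices are active in superstep 0
  let superstep = 0;

  while (active.size > 0 || inbox.size > 0) {
    const outbox = new Map();
    // A vertex computes if it is active or was woken up by a message.
    const toCompute = new Set([...active, ...inbox.keys()]);
    for (const id of toCompute) {
      const incoming = inbox.get(id);
      const improved = incoming !== undefined && incoming < label.get(id);
      if (improved) label.set(id, incoming);
      if (superstep === 0 || improved) {
        for (const target of neighbours.get(id)) {
          // MIN combiner: keep only the smallest label per target.
          const current = outbox.get(target);
          if (current === undefined || label.get(id) < current) {
            outbox.set(target, label.get(id));
          }
        }
      }
    }
    // Every vertex deactivates after computing; only messages reactivate it.
    active = new Set();
    inbox = outbox;
    superstep += 1;
  }
  return { label, supersteps: superstep };
}

// Made-up sample graph with two components: {1, 2, 3, 4} and {5, 6, 7}.
const demo = connectedComponents(
  [1, 2, 3, 4, 5, 6, 7],
  [[1, 2], [2, 3], [4, 3], [5, 6], [6, 7]]
);
console.log([...demo.label.entries()]);
```

Vertices that end up with the same label belong to the same component. The computation stops on its own once a superstep produces no messages.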
Pregel – Sequence
[Figure: Pregel execution sequence, shown in five build steps.]
Worker ≙ Map
“Map” a user-defined algorithm over all vertices
Output: set of messages to other vertices
Available parameters:
The current vertex and its outbound edges
All incoming messages
Global values
Allows modifications to the vertex:
Attach a result to this vertex and its outgoing edges
Delete the vertex and its outgoing edges
Deactivate the vertex
Combine ≙ Reduce
“Reduce” all generated messages
Output: An aggregated message for each vertex.
Executed on sender as well as receiver.
Available parameters:
One new message for a vertex
The stored aggregate for this vertex
Typical combiners are SUM, MIN or MAX
Reduces network traffic
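A minimal sketch of how a combiner reduces traffic, assuming a combiner is a two-argument function that takes one new message and the stored aggregate for the same target vertex. The helper combineMessages and the sample messages are made up for illustration; they are not part of any real Pregel API.

```javascript
// Hypothetical helper: fold each new message into the stored aggregate for
// its target vertex, as a combiner would before messages cross the network.
function combineMessages(messages, combiner) {
  const aggregates = new Map(); // target vertex -> stored aggregate
  for (const { to, data } of messages) {
    aggregates.set(
      to,
      aggregates.has(to) ? combiner(data, aggregates.get(to)) : data
    );
  }
  return aggregates;
}

// The three typical combiners named above.
const SUM = (message, oldMessage) => message + oldMessage;
const MIN = (message, oldMessage) => Math.min(message, oldMessage);
const MAX = (message, oldMessage) => Math.max(message, oldMessage);

// Five outgoing messages addressed to two vertices (made-up data).
const outgoing = [
  { to: "a", data: 4 },
  { to: "b", data: 7 },
  { to: "a", data: 1 },
  { to: "a", data: 3 },
  { to: "b", data: 2 },
];

const summed = combineMessages(outgoing, SUM);   // a: 8, b: 9
const smallest = combineMessages(outgoing, MIN); // a: 1, b: 2
const largest = combineMessages(outgoing, MAX);  // a: 4, b: 7
console.log(outgoing.length, "messages combined into", summed.size);
```

Running the combiner on the sender shrinks five messages to two before anything is sent. Running it again on the receiver merges aggregates arriving from different servers.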
Activity ≙ Termination
Execute several rounds of Map/Reduce
Count active vertices and messages
Start next round if one of the following is true:
At least one vertex is active
At least one message is sent
Terminate if neither a vertex is active nor messages were sent
Store all non-deleted vertices and edges as resulting graph
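The termination rule above can be sketched as a small driver loop. This is an illustrative sketch, not ArangoDB's implementation: the function runRounds, the worker signature and the toy chain are all made up. It also assumes, as in the original Pregel model, that an incoming message reactivates an inactive vertex.

```javascript
// Hypothetical driver: runs rounds until no vertex is active and no message
// was sent. maxRounds is a safety limit added for this sketch only.
function runRounds(vertices, worker, maxRounds = 1000) {
  let inbox = new Map(); // vertex id -> array of incoming messages
  let round = 0;
  while (round < maxRounds) {
    const outbox = new Map();
    let messagesSent = 0;
    const send = (target, data) => {
      if (!outbox.has(target)) outbox.set(target, []);
      outbox.get(target).push(data);
      messagesSent += 1;
    };
    for (const vertex of vertices) {
      const incoming = inbox.get(vertex.id) || [];
      // Assumption: a message reactivates an inactive vertex.
      if (incoming.length > 0) vertex.active = true;
      if (vertex.active) worker(vertex, incoming, send, round);
    }
    const activeCount = vertices.filter((v) => v.active).length;
    round += 1;
    // Terminate only if neither a vertex is active nor messages were sent.
    if (activeCount === 0 && messagesSent === 0) break;
    inbox = outbox;
  }
  return round;
}

// Toy worker (made up): pass a token down the chain a -> b -> c.
const chain = [
  { id: "a", next: "b", active: true, seen: false },
  { id: "b", next: "c", active: false, seen: false },
  { id: "c", next: null, active: false, seen: false },
];
const rounds = runRounds(chain, (vertex, incoming, send, round) => {
  if (round === 0 || incoming.length > 0) {
    vertex.seen = true;
    if (vertex.next !== null) send(vertex.next, "token");
  }
  vertex.active = false; // deactivate; only a new message wakes it up
});
console.log("rounds executed:", rounds);
```

In the first two rounds every vertex is already inactive at the end of the round, but a message is still in flight, so the next round starts. The loop stops after the round in which nothing is active and nothing was sent.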
Pregel at ArangoDB
Started as a side project in free hack time
Experimental on operational database
Implemented as an alternative to traversals
Makes use of the flexibility of JavaScript:
No strict type system
No pre-compilation, on-the-fly queries
Native JSON documents
Really fast development
PageRank for Giraph
public class SimplePageRankComputation extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
  public static final int MAX_SUPERSTEPS = 30;

  @Override
  public void compute(
      Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {
    // From the second superstep on, recompute the rank from incoming messages.
    if (getSuperstep() >= 1) {
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      DoubleWritable vertexValue =
          new DoubleWritable((0.15f / getTotalNumVertices()) + 0.85f * sum);
      vertex.setValue(vertexValue);
    }
    // Send an equal share of the rank along every edge, or halt at the end.
    if (getSuperstep() < MAX_SUPERSTEPS) {
      long edges = vertex.getNumEdges();
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / edges));
    } else {
      vertex.voteToHalt();
    }
  }

  public static class SimplePageRankWorkerContext extends WorkerContext {
    @Override
    public void preApplication()
        throws InstantiationException, IllegalAccessException { }
    @Override
    public void postApplication() { }
    @Override
    public void preSuperstep() { }
    @Override
    public void postSuperstep() { }
  }

  public static class SimplePageRankMasterCompute
      extends DefaultMasterCompute {
    @Override
    public void initialize()
        throws InstantiationException, IllegalAccessException {
    }
  }

  // Generates a synthetic input graph: each vertex gets one outgoing edge.
  public static class SimplePageRankVertexReader extends
      GeneratedVertexReader<LongWritable, DoubleWritable, FloatWritable> {
    @Override
    public boolean nextVertex() {
      return totalRecords > recordsRead;
    }

    @Override
    public Vertex<LongWritable, DoubleWritable, FloatWritable>
        getCurrentVertex() throws IOException {
      Vertex<LongWritable, DoubleWritable, FloatWritable> vertex =
          getConf().createVertex();
      LongWritable vertexId = new LongWritable(
          (inputSplit.getSplitIndex() * totalRecords) + recordsRead);
      DoubleWritable vertexValue = new DoubleWritable(vertexId.get() * 10d);
      long targetVertexId = (vertexId.get() + 1)
          % (inputSplit.getNumSplits() * totalRecords);
      float edgeValue = vertexId.get() * 100f;
      List<Edge<LongWritable, FloatWritable>> edges = Lists.newLinkedList();
      edges.add(EdgeFactory.create(new LongWritable(targetVertexId),
          new FloatWritable(edgeValue)));
      vertex.initialize(vertexId, vertexValue, edges);
      ++recordsRead;
      return vertex;
    }
  }

  public static class SimplePageRankVertexInputFormat extends
      GeneratedVertexInputFormat<LongWritable, DoubleWritable, FloatWritable> {
    @Override
    public VertexReader<LongWritable, DoubleWritable, FloatWritable>
        createVertexReader(InputSplit split, TaskAttemptContext context)
        throws IOException {
      return new SimplePageRankVertexReader();
    }
  }

  // Writes each vertex as an (id, rank) pair of text values.
  public static class SimplePageRankVertexOutputFormat extends
      TextVertexOutputFormat<LongWritable, DoubleWritable, FloatWritable> {
    @Override
    public TextVertexWriter createVertexWriter(TaskAttemptContext context)
        throws IOException, InterruptedException {
      return new SimplePageRankVertexWriter();
    }

    public class SimplePageRankVertexWriter extends TextVertexWriter {
      @Override
      public void writeVertex(
          Vertex<LongWritable, DoubleWritable, FloatWritable> vertex)
          throws IOException, InterruptedException {
        getRecordWriter().write(new Text(vertex.getId().toString()),
            new Text(vertex.getValue().toString()));
      }
    }
  }
}
PageRank for TinkerPop3
public class PageRankVertexProgram implements VertexProgram<Double> {
  private MessageType.Local messageType =
      MessageType.Local.of(() -> GraphTraversal.<Vertex>of().outE());
  public static final String PAGE_RANK = Graph.Key.hide("gremlin.pageRank");
  public static final String EDGE_COUNT = Graph.Key.hide("gremlin.edgeCount");
  private static final String VERTEX_COUNT =
      "gremlin.pageRankVertexProgram.vertexCount";
  private static final String ALPHA = "gremlin.pageRankVertexProgram.alpha";
  private static final String TOTAL_ITERATIONS =
      "gremlin.pageRankVertexProgram.totalIterations";
  private static final String INCIDENT_TRAVERSAL =
      "gremlin.pageRankVertexProgram.incidentTraversal";
  private double vertexCountAsDouble = 1;
  private double alpha = 0.85d;
  private int totalIterations = 30;
  private static final Set<String> COMPUTE_KEYS =
      new HashSet<>(Arrays.asList(PAGE_RANK, EDGE_COUNT));

  private PageRankVertexProgram() {}

  // Restore the program's parameters from the job configuration.
  @Override
  public void loadState(final Configuration configuration) {
    this.vertexCountAsDouble = configuration.getDouble(VERTEX_COUNT, 1.0d);
    this.alpha = configuration.getDouble(ALPHA, 0.85d);
    this.totalIterations = configuration.getInt(TOTAL_ITERATIONS, 30);
    try {
      if (configuration.containsKey(INCIDENT_TRAVERSAL)) {
        final SSupplier<Traversal> traversalSupplier =
            VertexProgramHelper.deserialize(configuration, INCIDENT_TRAVERSAL);
        VertexProgramHelper.verifyReversibility(traversalSupplier.get());
        this.messageType = MessageType.Local.of((SSupplier) traversalSupplier);
      }
    } catch (final Exception e) {
      throw new IllegalStateException(e.getMessage(), e);
    }
  }

  // Persist the program's parameters into the job configuration.
  @Override
  public void storeState(final Configuration configuration) {
    configuration.setProperty(GraphComputer.VERTEX_PROGRAM,
        PageRankVertexProgram.class.getName());
    configuration.setProperty(VERTEX_COUNT, this.vertexCountAsDouble);
    configuration.setProperty(ALPHA, this.alpha);
    configuration.setProperty(TOTAL_ITERATIONS, this.totalIterations);
    try {
      VertexProgramHelper.serialize(this.messageType.getIncidentTraversal(),
          configuration, INCIDENT_TRAVERSAL);
    } catch (final Exception e) {
      throw new IllegalStateException(e.getMessage(), e);
    }
  }

  @Override
  public Set<String> getElementComputeKeys() {
    return COMPUTE_KEYS;
  }

  @Override
  public void setup(final Memory memory) {

  }

  @Override
  public void execute(final Vertex vertex, Messenger<Double> messenger,
      final Memory memory) {
    if (memory.isInitialIteration()) {
      // First iteration: uniform rank, remember the edge count for later.
      double initialPageRank = 1.0d / this.vertexCountAsDouble;
      double edgeCount = Double.valueOf(
          (Long) this.messageType.edges(vertex).count().next());
      vertex.singleProperty(PAGE_RANK, initialPageRank);
      vertex.singleProperty(EDGE_COUNT, edgeCount);
      messenger.sendMessage(this.messageType, initialPageRank / edgeCount);
    } else {
      // Later iterations: sum incoming shares and apply the damping factor.
      double newPageRank = StreamFactory
          .stream(messenger.receiveMessages(this.messageType))
          .reduce(0.0d, (a, b) -> a + b);
      newPageRank = (this.alpha * newPageRank)
          + ((1.0d - this.alpha) / this.vertexCountAsDouble);
      vertex.singleProperty(PAGE_RANK, newPageRank);
      messenger.sendMessage(this.messageType,
          newPageRank / vertex.<Double>property(EDGE_COUNT).orElse(0.0d));
    }
  }

  @Override
  public boolean terminate(final Memory memory) {
    return memory.getIteration() >= this.totalIterations;
  }
}
PageRank for ArangoDB
var pageRank = function (vertex, message, global) {
  var total, rank, edgeCount, send, edge, alpha, sum;
  total = global.vertexCount;
  edgeCount = vertex._outEdges.length;
  alpha = global.alpha;
  sum = 0;
  if (global.step > 0) {
    // Sum up the rank shares received from inbound neighbours.
    while (message.hasNext()) {
      sum += message.next().data;
    }
    rank = alpha * sum + (1 - alpha) / total;
  } else {
    // First step: start from a uniform distribution.
    rank = 1 / total;
  }
  vertex._setResult(rank);
  if (global.step < global.MAX_STEPS) {
    // Split the rank evenly across all outgoing edges.
    send = rank / edgeCount;
    while (vertex._outEdges.hasNext()) {
      edge = vertex._outEdges.next();
      message.sendTo(edge._getTarget(), send);
    }
  } else {
    vertex._deactivate();
  }
};

// Combiner: sums all messages addressed to the same vertex.
var combiner = function (message, oldMessage) {
  return message + oldMessage;
};

var Runner = require("org/arangodb/pregelRunner").Runner;
var runner = new Runner();
runner.setWorker(pageRank);
runner.setCombiner(combiner);
runner.start("myGraph");
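To check the arithmetic of the worker above without a database, the same update rule can be replayed in plain JavaScript. This is a sketch under stated assumptions: the function simulatePageRank, the adjacency-list format and the three-vertex graph are made up for illustration and do not use the ArangoDB runner. Every vertex is assumed to have at least one outgoing edge.

```javascript
// Plain-JavaScript sketch that replays the worker's arithmetic in one process.
// simulatePageRank and its input format are hypothetical.
function simulatePageRank(outEdges, alpha = 0.85, maxSteps = 30) {
  const ids = Object.keys(outEdges);
  const total = ids.length;
  let rank = {};
  let inbox = {}; // vertex id -> SUM-combined incoming message

  for (let step = 0; step <= maxSteps; step += 1) {
    const outbox = {};
    const nextRank = {};
    for (const id of ids) {
      // Step 0 starts from a uniform rank; later steps apply the damping rule.
      nextRank[id] = step > 0
        ? alpha * (inbox[id] || 0) + (1 - alpha) / total
        : 1 / total;
      if (step < maxSteps) {
        // Split the rank evenly across all outgoing edges.
        const share = nextRank[id] / outEdges[id].length;
        for (const target of outEdges[id]) {
          outbox[target] = (outbox[target] || 0) + share; // SUM combiner
        }
      }
    }
    rank = nextRank;
    inbox = outbox;
  }
  return rank;
}

// Made-up graph: a -> b, a -> c, b -> c, c -> a.
const ranks = simulatePageRank({ a: ["b", "c"], b: ["c"], c: ["a"] });
const totalRank = Object.values(ranks).reduce((sum, r) => sum + r, 0);
console.log(ranks, totalRank);
```

Because every vertex in this sample passes on its full rank, the ranks keep summing to 1 up to floating-point error. Vertex c ranks higher than b because it receives shares from both a and b.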
Thank you 
Further Questions? 
Follow me on Twitter/GitHub: @mchacki
Write me a mail: mchacki@arangodb.com
Follow @arangodb on Twitter
Join our Google group:
https://groups.google.com/forum/#!forum/arangodb 
Visit our blog https://www.arangodb.com/blog 
Slides available at https://www.slideshare.net/arangodb 
17TH ~ 18th NOV 2014 
MADRID (SPAIN)

Más contenido relacionado

La actualidad más candente

Reactive programming with RxAndroid
Reactive programming with RxAndroidReactive programming with RxAndroid
Reactive programming with RxAndroidSavvycom Savvycom
 
Reactive programming with RxJava
Reactive programming with RxJavaReactive programming with RxJava
Reactive programming with RxJavaJobaer Chowdhury
 
Reactive Programming in Java and Spring Framework 5
Reactive Programming in Java and Spring Framework 5Reactive Programming in Java and Spring Framework 5
Reactive Programming in Java and Spring Framework 5Richard Langlois P. Eng.
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015Holden Karau
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedDatabricks
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...Spark Summit
 
Deep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s OptimizerDeep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s OptimizerDatabricks
 
Introduction to Retrofit and RxJava
Introduction to Retrofit and RxJavaIntroduction to Retrofit and RxJava
Introduction to Retrofit and RxJavaFabio Collini
 
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengChallenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengDatabricks
 
The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016Frank Lyaruu
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Lucidworks
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applicationsKexin Xie
 
Hector v2: The Second Version of the Popular High-Level Java Client for Apach...
Hector v2: The Second Version of the Popular High-Level Java Client for Apach...Hector v2: The Second Version of the Popular High-Level Java Client for Apach...
Hector v2: The Second Version of the Popular High-Level Java Client for Apach...zznate
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_i(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_iNico Ludwig
 
Using spark for timeseries graph analytics
Using spark for timeseries graph analyticsUsing spark for timeseries graph analytics
Using spark for timeseries graph analyticsSigmoid
 
Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Rajeev Rastogi (KRR)
 

La actualidad más candente (20)

Reactive programming with RxAndroid
Reactive programming with RxAndroidReactive programming with RxAndroid
Reactive programming with RxAndroid
 
Reactive programming with RxJava
Reactive programming with RxJavaReactive programming with RxJava
Reactive programming with RxJava
 
Reactive Programming in Java and Spring Framework 5
Reactive Programming in Java and Spring Framework 5Reactive Programming in Java and Spring Framework 5
Reactive Programming in Java and Spring Framework 5
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think Vectorized
 
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
A Scalable Hierarchical Clustering Algorithm Using Spark: Spark Summit East t...
 
Deep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s OptimizerDeep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0’s Optimizer
 
Introduction to Retrofit and RxJava
Introduction to Retrofit and RxJavaIntroduction to Retrofit and RxJava
Introduction to Retrofit and RxJava
 
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengChallenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
 
The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016
 
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
Solr Troubleshooting - Treemap Approach: Presented by Alexandre Rafolovitch, ...
 
AWS Java SDK @ scale
AWS Java SDK @ scaleAWS Java SDK @ scale
AWS Java SDK @ scale
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
Hector v2: The Second Version of the Popular High-Level Java Client for Apach...
Hector v2: The Second Version of the Popular High-Level Java Client for Apach...Hector v2: The Second Version of the Popular High-Level Java Client for Apach...
Hector v2: The Second Version of the Popular High-Level Java Client for Apach...
 
Forgive me for i have allocated
Forgive me for i have allocatedForgive me for i have allocated
Forgive me for i have allocated
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_i(2) c sharp introduction_basics_part_i
(2) c sharp introduction_basics_part_i
 
Using spark for timeseries graph analytics
Using spark for timeseries graph analyticsUsing spark for timeseries graph analytics
Using spark for timeseries graph analytics
 
Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2
 
Reactive Java (33rd Degree)
Reactive Java (33rd Degree)Reactive Java (33rd Degree)
Reactive Java (33rd Degree)
 

Destacado

How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...Big Data Spain
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQLArangoDB Database
 
Extensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software ArchitectureExtensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software ArchitectureMax Neunhöffer
 
Deep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBDeep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBArangoDB Database
 
Big Data the potential for data to improve service and business management by...
Big Data the potential for data to improve service and business management by...Big Data the potential for data to improve service and business management by...
Big Data the potential for data to improve service and business management by...Big Data Spain
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Location analytics by Marc Planaguma at Big Data Spain 2014
 Location analytics by Marc Planaguma at Big Data Spain 2014 Location analytics by Marc Planaguma at Big Data Spain 2014
Location analytics by Marc Planaguma at Big Data Spain 2014Big Data Spain
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...Big Data Spain
 
Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014
 Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014 Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014
Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014Big Data Spain
 
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012Big Data Spain
 
Intro to the Big Data Spain 2014 conference
Intro to the Big Data Spain 2014 conferenceIntro to the Big Data Spain 2014 conference
Intro to the Big Data Spain 2014 conferenceBig Data Spain
 
Getting the best insights from your data using Apache Metamodel by Alberto Ro...
Getting the best insights from your data using Apache Metamodel by Alberto Ro...Getting the best insights from your data using Apache Metamodel by Alberto Ro...
Getting the best insights from your data using Apache Metamodel by Alberto Ro...Big Data Spain
 
Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...
 Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ... Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...
Dataflows: The abstraction that powers Big Data by Raul Castro Fernandez at ...Big Data Spain
 
ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...
ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...
ToroDB: Scaling PostgreSQL like MongoDB by Álvaro Hernández at Big Data Spain...Big Data Spain
 
CloudMC: A cloud computing map-reduce implementation for radiotherapy. RUBEN ...
CloudMC: A cloud computing map-reduce implementation for radiotherapy. RUBEN ...CloudMC: A cloud computing map-reduce implementation for radiotherapy. RUBEN ...
CloudMC: A cloud computing map-reduce implementation for radiotherapy. RUBEN ...Big Data Spain
 
Convergent Replicated Data Types in Riak 2.0
Convergent Replicated Data Types in Riak 2.0Convergent Replicated Data Types in Riak 2.0
Convergent Replicated Data Types in Riak 2.0Big Data Spain
 
How to integrate Big Data onto an analytical portal, Big Data benchmarking fo...
How to integrate Big Data onto an analytical portal, Big Data benchmarking fo...How to integrate Big Data onto an analytical portal, Big Data benchmarking fo...
How to integrate Big Data onto an analytical portal, Big Data benchmarking fo...Big Data Spain
 
A new streaming computation engine for real-time analytics by Michael Barton ...
A new streaming computation engine for real-time analytics by Michael Barton ...A new streaming computation engine for real-time analytics by Michael Barton ...
A new streaming computation engine for real-time analytics by Michael Barton ...Big Data Spain
 
IAd-learning: A new e-learning platform by José Antonio Omedes at Big Data Sp...
IAd-learning: A new e-learning platform by José Antonio Omedes at Big Data Sp...IAd-learning: A new e-learning platform by José Antonio Omedes at Big Data Sp...
IAd-learning: A new e-learning platform by José Antonio Omedes at Big Data Sp...Big Data Spain
 

Destacado (20)

How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQL
 
Extensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software ArchitectureExtensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software Architecture
 
ArangoDB
ArangoDBArangoDB
ArangoDB
 
Deep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDBDeep dive into the native multi model database ArangoDB
Deep dive into the native multi model database ArangoDB
 
Big Data the potential for data to improve service and business management by...
Big Data the potential for data to improve service and business management by...Big Data the potential for data to improve service and business management by...
Big Data the potential for data to improve service and business management by...
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Location analytics by Marc Planaguma at Big Data Spain 2014
 Location analytics by Marc Planaguma at Big Data Spain 2014 Location analytics by Marc Planaguma at Big Data Spain 2014
Location analytics by Marc Planaguma at Big Data Spain 2014
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 
Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014
 Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014 Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014
Data warehouse modernization programme by TOBY WOOLFE at Big Data Spain 2014
 
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
 
Intro to the Big Data Spain 2014 conference
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at Big Data Spain 2014

  • 1. PROCESSING LARGE-SCALE GRAPHS WITH GOOGLE(TM) PREGEL MICHAEL HACKSTEIN FRONT END AND GRAPH SPECIALIST ARANGODB
  • 2. Processing large-scale graphs with Google(TM) Pregel. November 17th. Michael Hackstein, @mchacki, www.arangodb.com
  • 3. Michael Hackstein: ArangoDB Core Team; Web Frontend; Graph visualisation; Graph features; Host of cologne.js; Master’s Degree (spec. Databases and Information Systems)
  • 4. Graph Algorithms. Pattern matching: search through the entire graph; identify similar components ⇒ touch all vertices and their neighbourhoods.
  • 5. Graph Algorithms. Pattern matching: search through the entire graph; identify similar components ⇒ touch all vertices and their neighbourhoods. Traversals: define a specific start point; iteratively explore the graph ⇒ history of steps is known.
  • 6. Graph Algorithms. Pattern matching: search through the entire graph; identify similar components ⇒ touch all vertices and their neighbourhoods. Traversals: define a specific start point; iteratively explore the graph ⇒ history of steps is known. Global measurements: compute one value for the graph, based on all its vertices or edges; compute one value for each vertex or edge ⇒ often require a global view on the graph.
  • 7. Pregel. A framework to query distributed, directed graphs. Known as “Map-Reduce” for graphs: uses the same phases; has several iterations. Aims at: operating all servers at full capacity; reducing network traffic. Good at calculations touching all vertices. Bad at calculations touching a very small number of vertices.
  • 8–17. Example – Connected Components [Figure, built up step by step over ten slides: a graph whose vertices are numbered 1 to 7, with component labels passed along the edges in each round. Legend: active vertex, inactive vertex, forward message, backward message.]
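The connected-components example can be approximated by the following sketch. It assumes the usual minimum-ID label propagation (every vertex adopts the smallest ID it has heard of and forwards it), and it treats edges as undirected so that labels travel both forward and backward. The function name `connectedComponents` and the edge list are invented for illustration; this is a plain in-memory simulation, not ArangoDB's Pregel API.

```javascript
// Hypothetical in-memory simulation of connected components by
// minimum-ID propagation. Not a real Pregel API.
function connectedComponents(vertexIds, edges) {
  // Treat every edge as undirected: labels travel forward and backward.
  const neighbours = new Map(vertexIds.map((id) => [id, []]));
  for (const [from, to] of edges) {
    neighbours.get(from).push(to);
    neighbours.get(to).push(from);
  }

  // Each vertex starts with its own ID as component label.
  const label = new Map(vertexIds.map((id) => [id, id]));

  // MIN combiner: keep only the smallest label per target vertex.
  const send = (box, target, value) => {
    if (!box.has(target) || value < box.get(target)) box.set(target, value);
  };

  // First round: every vertex is active and announces its own ID.
  let inbox = new Map();
  for (const id of vertexIds) {
    for (const n of neighbours.get(id)) send(inbox, n, id);
  }

  // Later rounds: only vertices that received a smaller label do any work.
  while (inbox.size > 0) {
    const outbox = new Map();
    for (const [id, candidate] of inbox) {
      if (candidate < label.get(id)) {
        label.set(id, candidate);
        for (const n of neighbours.get(id)) send(outbox, n, candidate);
      }
      // Otherwise the vertex stays inactive and sends nothing.
    }
    inbox = outbox;
  }
  return label;
}

// Invented example graph with two components: {1, 2, 3, 4} and {5, 6, 7}.
const labels = connectedComponents(
  [1, 2, 3, 4, 5, 6, 7],
  [[1, 2], [2, 3], [3, 4], [5, 6], [6, 7]]
);
console.log([...labels.entries()]);
```

When the loop ends, every vertex carries the smallest vertex ID of its component, so vertices with equal labels belong to the same component.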
  • 23. Worker ≙ Map. “Map” a user-defined algorithm over all vertices. Output: a set of messages to other vertices. Available parameters: the current vertex and its outbound edges; all incoming messages; global values. Allowed modifications on the vertex: attach a result to this vertex and its outgoing edges; delete the vertex and its outgoing edges; deactivate the vertex.
  • 24. Combine ≙ Reduce. “Reduce” all generated messages. Output: an aggregated message for each vertex. Executed on the sender as well as the receiver. Available parameters: one new message for a vertex; the stored aggregate for this vertex. Typical combiners are SUM, MIN or MAX. Reduces network traffic.
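A small sketch may make the "sender as well as receiver" point concrete. Only the combiner shape (one new message plus the stored aggregate) follows the slide; the helper `combine`, the `{ target, data }` message layout and the two example servers are assumptions made for this illustration.

```javascript
// SUM combiner: takes one new message and the stored aggregate.
const sumCombiner = (message, oldMessage) => message + oldMessage;

// Hypothetical helper: fold a list of { target, data } messages
// into one aggregate per target vertex.
function combine(messages, combiner) {
  const aggregates = new Map();
  for (const { target, data } of messages) {
    aggregates.set(
      target,
      aggregates.has(target) ? combiner(data, aggregates.get(target)) : data
    );
  }
  return aggregates;
}

// Two sending servers, each holding several messages for the same targets.
const serverA = [
  { target: "v1", data: 1 },
  { target: "v1", data: 2 },
  { target: "v2", data: 5 },
];
const serverB = [
  { target: "v1", data: 4 },
  { target: "v2", data: 1 },
  { target: "v2", data: 1 },
];

// Sender side: each server combines locally before anything is sent.
const sentOverNetwork = [serverA, serverB].flatMap((outbox) =>
  [...combine(outbox, sumCombiner)].map(([target, data]) => ({ target, data }))
);

// Receiver side: the same combiner merges what arrives from all servers.
const received = combine(sentOverNetwork, sumCombiner);

console.log(sentOverNetwork.length); // messages that crossed the network
console.log([...received.entries()]); // one aggregate per target vertex
```

In this example six generated messages shrink to four on the wire, and the receiver ends up with a single aggregate per vertex. This only works because SUM, MIN and MAX are associative and commutative, so combining in two stages gives the same result as combining everything at once.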
  • 25. Activity ≙ Termination. Execute several rounds of Map/Reduce. Count active vertices and messages. Start the next round if one of the following is true: at least one vertex is active; at least one message was sent. Terminate if no vertex is active and no messages were sent. Store all non-deleted vertices and edges as the resulting graph.
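The termination rule can be sketched as a small driver loop. Everything here is hypothetical: `runSupersteps`, the vertex fields (`id`, `value`, `out`, `active`) and the maximum-value worker are made up for the example, and the rule that an incoming message reactivates a vertex is an assumption borrowed from the general Pregel model rather than something stated on the slide.

```javascript
// Hypothetical driver: runs rounds until no vertex is active and
// no message was sent. Returns the number of rounds executed.
function runSupersteps(vertices, worker) {
  let inbox = new Map(); // vertex id -> messages delivered this round
  let step = 0;
  for (;;) {
    const outbox = new Map();
    const send = (target, data) => {
      if (!outbox.has(target)) outbox.set(target, []);
      outbox.get(target).push(data);
    };
    for (const vertex of vertices.values()) {
      const messages = inbox.get(vertex.id) || [];
      // Assumption: an incoming message reactivates a deactivated vertex.
      if (messages.length > 0) vertex.active = true;
      if (vertex.active) worker(vertex, messages, send, step);
    }
    const activeCount = [...vertices.values()].filter((v) => v.active).length;
    step += 1;
    // Terminate only if neither a vertex is active nor a message was sent.
    if (activeCount === 0 && outbox.size === 0) return step;
    inbox = outbox;
  }
}

// Example worker: spread the largest value through the graph.
const maxWorker = (vertex, messages, send, step) => {
  const best = Math.max(vertex.value, ...messages);
  if (step === 0 || best > vertex.value) {
    vertex.value = best;
    for (const target of vertex.out) send(target, best);
  }
  vertex.active = false; // sleep until a new message arrives
};

// Invented directed ring a -> b -> c -> a.
const vertices = new Map([
  ["a", { id: "a", value: 3, out: ["b"], active: true }],
  ["b", { id: "b", value: 6, out: ["c"], active: true }],
  ["c", { id: "c", value: 2, out: ["a"], active: true }],
]);
const rounds = runSupersteps(vertices, maxWorker);
console.log(rounds, [...vertices.values()].map((v) => v.value));
```

Note that every vertex deactivates itself in every round here, so it is the pending messages alone that keep the computation going until the maximum has reached all vertices.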
  • 26. Pregel at ArangoDB. Started as a side project in free hack time. Experimental on an operational database. Implemented as an alternative to traversals. Makes use of the flexibility of JavaScript: no strict type system; no pre-compilation, on-the-fly queries; native JSON documents; really fast development.
  • 27. PageRank for Giraph

```java
public class SimplePageRankComputation
    extends BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
  public static final int MAX_SUPERSTEPS = 30;

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {
    if (getSuperstep() >= 1) {
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      DoubleWritable vertexValue =
          new DoubleWritable((0.15f / getTotalNumVertices()) + 0.85f * sum);
      vertex.setValue(vertexValue);
    }
    if (getSuperstep() < MAX_SUPERSTEPS) {
      long edges = vertex.getNumEdges();
      sendMessageToAllEdges(vertex, new DoubleWritable(vertex.getValue().get() / edges));
    } else {
      vertex.voteToHalt();
    }
  }

  public static class SimplePageRankWorkerContext extends WorkerContext {
    @Override
    public void preApplication() throws InstantiationException, IllegalAccessException { }
    @Override
    public void postApplication() { }
    @Override
    public void preSuperstep() { }
    @Override
    public void postSuperstep() { }
  }

  public static class SimplePageRankMasterCompute extends DefaultMasterCompute {
    @Override
    public void initialize() throws InstantiationException, IllegalAccessException {
    }
  }

  public static class SimplePageRankVertexReader
      extends GeneratedVertexReader<LongWritable, DoubleWritable, FloatWritable> {
    @Override
    public boolean nextVertex() {
      return totalRecords > recordsRead;
    }

    @Override
    public Vertex<LongWritable, DoubleWritable, FloatWritable> getCurrentVertex()
        throws IOException {
      Vertex<LongWritable, DoubleWritable, FloatWritable> vertex = getConf().createVertex();
      LongWritable vertexId = new LongWritable(
          (inputSplit.getSplitIndex() * totalRecords) + recordsRead);
      DoubleWritable vertexValue = new DoubleWritable(vertexId.get() * 10d);
      long targetVertexId = (vertexId.get() + 1) % (inputSplit.getNumSplits() * totalRecords);
      float edgeValue = vertexId.get() * 100f;
      List<Edge<LongWritable, FloatWritable>> edges = Lists.newLinkedList();
      edges.add(EdgeFactory.create(new LongWritable(targetVertexId),
          new FloatWritable(edgeValue)));
      vertex.initialize(vertexId, vertexValue, edges);
      ++recordsRead;
      return vertex;
    }
  }

  public static class SimplePageRankVertexInputFormat
      extends GeneratedVertexInputFormat<LongWritable, DoubleWritable, FloatWritable> {
    @Override
    public VertexReader<LongWritable, DoubleWritable, FloatWritable> createVertexReader(
        InputSplit split, TaskAttemptContext context)
        throws IOException {
      return new SimplePageRankVertexReader();
    }
  }

  public static class SimplePageRankVertexOutputFormat
      extends TextVertexOutputFormat<LongWritable, DoubleWritable, FloatWritable> {
    @Override
    public TextVertexWriter createVertexWriter(TaskAttemptContext context)
        throws IOException, InterruptedException {
      return new SimplePageRankVertexWriter();
    }

    public class SimplePageRankVertexWriter extends TextVertexWriter {
      @Override
      public void writeVertex(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex)
          throws IOException, InterruptedException {
        getRecordWriter().write(
            new Text(vertex.getId().toString()),
            new Text(vertex.getValue().toString()));
      }
    }
  }
}
```
  • 28. PageRank for TinkerPop3

```java
public class PageRankVertexProgram implements VertexProgram<Double> {
  private MessageType.Local messageType =
      MessageType.Local.of(() -> GraphTraversal.<Vertex>of().outE());
  public static final String PAGE_RANK = Graph.Key.hide("gremlin.pageRank");
  public static final String EDGE_COUNT = Graph.Key.hide("gremlin.edgeCount");
  private static final String VERTEX_COUNT = "gremlin.pageRankVertexProgram.vertexCount";
  private static final String ALPHA = "gremlin.pageRankVertexProgram.alpha";
  private static final String TOTAL_ITERATIONS = "gremlin.pageRankVertexProgram.totalIterations";
  private static final String INCIDENT_TRAVERSAL = "gremlin.pageRankVertexProgram.incidentTraversal";
  private double vertexCountAsDouble = 1;
  private double alpha = 0.85d;
  private int totalIterations = 30;
  private static final Set<String> COMPUTE_KEYS =
      new HashSet<>(Arrays.asList(PAGE_RANK, EDGE_COUNT));

  private PageRankVertexProgram() {}

  @Override
  public void loadState(final Configuration configuration) {
    this.vertexCountAsDouble = configuration.getDouble(VERTEX_COUNT, 1.0d);
    this.alpha = configuration.getDouble(ALPHA, 0.85d);
    this.totalIterations = configuration.getInt(TOTAL_ITERATIONS, 30);
    try {
      if (configuration.containsKey(INCIDENT_TRAVERSAL)) {
        final SSupplier<Traversal> traversalSupplier =
            VertexProgramHelper.deserialize(configuration, INCIDENT_TRAVERSAL);
        VertexProgramHelper.verifyReversibility(traversalSupplier.get());
        this.messageType = MessageType.Local.of((SSupplier) traversalSupplier);
      }
    } catch (final Exception e) {
      throw new IllegalStateException(e.getMessage(), e);
    }
  }

  @Override
  public void storeState(final Configuration configuration) {
    configuration.setProperty(GraphComputer.VERTEX_PROGRAM, PageRankVertexProgram.class.getName());
    configuration.setProperty(VERTEX_COUNT, this.vertexCountAsDouble);
    configuration.setProperty(ALPHA, this.alpha);
    configuration.setProperty(TOTAL_ITERATIONS, this.totalIterations);
    try {
      VertexProgramHelper.serialize(
          this.messageType.getIncidentTraversal(), configuration, INCIDENT_TRAVERSAL);
    } catch (final Exception e) {
      throw new IllegalStateException(e.getMessage(), e);
    }
  }

  @Override
  public Set<String> getElementComputeKeys() {
    return COMPUTE_KEYS;
  }

  @Override
  public void setup(final Memory memory) {

  }

  @Override
  public void execute(final Vertex vertex, Messenger<Double> messenger, final Memory memory) {
    if (memory.isInitialIteration()) {
      double initialPageRank = 1.0d / this.vertexCountAsDouble;
      double edgeCount = Double.valueOf((Long) this.messageType.edges(vertex).count().next());
      vertex.singleProperty(PAGE_RANK, initialPageRank);
      vertex.singleProperty(EDGE_COUNT, edgeCount);
      messenger.sendMessage(this.messageType, initialPageRank / edgeCount);
    } else {
      double newPageRank = StreamFactory.stream(messenger.receiveMessages(this.messageType))
          .reduce(0.0d, (a, b) -> a + b);
      newPageRank = (this.alpha * newPageRank) + ((1.0d - this.alpha) / this.vertexCountAsDouble);
      vertex.singleProperty(PAGE_RANK, newPageRank);
      messenger.sendMessage(this.messageType,
          newPageRank / vertex.<Double>property(EDGE_COUNT).orElse(0.0d));
    }
  }

  @Override
  public boolean terminate(final Memory memory) {
    return memory.getIteration() >= this.totalIterations;
  }
}
```
  • 29. PageRank for ArangoDB

```javascript
var pageRank = function (vertex, message, global) {
  var total, rank, edgeCount, send, edge, alpha, sum;
  total = global.vertexCount;
  edgeCount = vertex._outEdges.length;
  alpha = global.alpha;
  sum = 0;
  if (global.step > 0) {
    while (message.hasNext()) {
      sum += message.next().data;
    }
    rank = alpha * sum + (1 - alpha) / total;
  } else {
    rank = 1 / total;
  }
  vertex._setResult(rank);
  if (global.step < global.MAX_STEPS) {
    send = rank / edgeCount;
    while (vertex._outEdges.hasNext()) {
      edge = vertex._outEdges.next();
      message.sendTo(edge._getTarget(), send);
    }
  } else {
    vertex._deactivate();
  }
};

var combiner = function (message, oldMessage) {
  return message + oldMessage;
};

var Runner = require("org/arangodb/pregelRunner").Runner;
var runner = new Runner();
runner.setWorker(pageRank);
runner.setCombiner(combiner);
runner.start("myGraph");
```
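Read as a formula, the ArangoDB worker above appears to compute the update rule below. The symbols are introduced here for illustration only: $N$ stands for `global.vertexCount`, $\alpha$ for `global.alpha`, $s$ for `global.step`, $d^{+}(u)$ for the number of outbound edges of vertex $u$, and $r_v^{(s)}$ for the result stored on vertex $v$ in step $s$.

```latex
r_v^{(0)} = \frac{1}{N},
\qquad
r_v^{(s)} = \alpha \sum_{u \to v} \frac{r_u^{(s-1)}}{d^{+}(u)} + \frac{1 - \alpha}{N}
\quad \text{for } s > 0 .
```

The sum over incoming edges $u \to v$ is exactly what the SUM combiner pre-aggregates: each sender $u$ contributes $r_u^{(s-1)} / d^{+}(u)$ per outbound edge, and the worker only has to add up what arrives.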
  • 30. Thank you. Further questions? Follow me on Twitter/GitHub: @mchacki. Write me a mail: mchacki@arangodb.com. Follow @arangodb on Twitter. Join our Google group: https://groups.google.com/forum/#!forum/arangodb Visit our blog: https://www.arangodb.com/blog Slides available at https://www.slideshare.net/arangodb
  • 31. 17TH ~ 18th NOV 2014 MADRID (SPAIN)