SlideShare a Scribd company logo
1 of 30
Download to read offline
Vasia Kalavri
Apache Flink PMC, PhD student @KTH
@vkalavri - vasia@apache.org
August 27, 2015
Gelly:
Large-Scale Graph Processing
with Apache Flink
Overview
- Why Graph Processing with Flink?
- Gelly: Flink’s Graph Processing API
- Iterative Graph Processing in Gelly
- What’s next, Gelly?
2
Why Graph Processing
with Flink?
4
A data analysis pipeline
load clean
create
graph
analyze
graph
clean transformload result
clean transformload
5
A more user-friendly pipeline
load
load result
load
6
General-purpose or specialized?
- fast application
development and
deployment
- easier maintenance
- non-intuitive APIs
- time-consuming
- use, configure and
integrate different
systems
- hard to maintain
- rich APIs and features
general-purpose specialized
what about performance?
7
Native Iterations
● the runtime is aware of the iterative execution
● no scheduling overhead between iterations
● caching and state maintenance are handled automatically
Caching Loop-invariant DataPushing work
“out of the loop”
Maintain state as index
8
Flink Iteration Operators
Iterate IterateDelta
9
Input
Iterative
Update Function
Result
Replace
Workset
Iterative
Update Function
Result
Solution Set
State
Many iterative algorithms
exhibit sparse
computational
dependencies and
asymmetric convergence
Delta Iterations
10
Example: Connected Components
11
Only vertices that change
their value are in the
workset for the next
iteration.
Beyond Iterations
12
Performance & Scalability
● own memory management
● own serialization framework
● operate on binary data when possible
Automatic Optimizations
● choose best execution strategy
● cache invariant data
Gelly
the Flink Graph API
● Java & Scala Graph APIs on top of Flink
● Ships already with Flink 0.9 (Beta)
● Can be seamlessly mixed with the DataSet
Flink API
→ easily implement applications that use
both record-based and graph-based analysis
Meet Gelly
14
Gelly in the Flink stack
15
Scala API
(batch and streaming)
Java API
(batch and streaming)
ML library Gelly
Runtime and Execution Environments
Data storage
Table API Python API
Transformations
and Utilities
Iterative Graph
Processing
Graph Library
Hello, Gelly!
// create a graph from a Dataset of Vertices and a DataSet of Edges
Graph<String, Long, Double> graph = Graph.fromDataSet(vertices, edges, env);
16
// create a graph from a Collection of Edges and initialize the Vertex values
Graph<String, Long, Double> graph = Graph.fromCollection(edges,
new MapFunction<String, Long>() {
public Long map(String vertexId) { return 1l; }
}, env);
vertex ID
type
vertex
value type
edge value
type
edges is a Java
collection
assigned vertex
value
● Graph Properties
○ getVertexIds
○ getEdgeIds
○ numberOfVertices
○ numberOfEdges
○ getDegrees
○ isWeaklyConnected
○ ...
Available Methods
● Transformations
○ map, filter, join
○ subgraph, union,
difference
○ reverse, undirected
○ getTriplets
● Mutations
○ add vertex/edge
○ remove vertex/edge
17
Example: mapVertices
18
// increment each vertex value by one
Graph<Long, Long, Long> updatedGraph = graph.mapVertices(
new MapFunction<Vertex<Long, Long>, Long>() {
public Long map(Vertex<Long, Long> value) {
return value.getValue() + 1;
}
});
4
2
8
5
5
3
1
7
4
5
Example: subGraph
19
Graph<Long, Long, Long> subgraph = graph.subgraph(
new FilterFunction<Vertex<Long, Long>>() {
public boolean filter(Vertex<Long, Long> vertex) {
// keep only vertices with positive values
return (vertex.getValue() > 0);
}
},
new FilterFunction<Edge<Long, Long>>() {
public boolean filter(Edge<Long, Long> edge) {
// keep only edges with negative values
return (edge.getValue() < 0);
}
})
- Apply a reduce function to the 1st-hop
neighborhood of each vertex in parallel
Neighborhood Methods
3
4
7
4
4
graph.reduceOnNeighbors(new MinValue(), EdgeDirection.OUT);
3
9
7
4
5
20
Iterative Graph Processing
Gelly offers iterative graph processing
abstractions on top of Flink’s Delta iterations
● vertex-centric (similar to Pregel, Giraph)
● gather-sum-apply (similar to PowerGraph)
21
MessagingFunction: what messages
to send to other vertices
VertexUpdateFunction: how to
update a vertex value, based on the
received messages
The workset contains only active
vertices (received a msg in the
previous superstep)
Vertex-centric Iterations
22
Vertex-centric SSSP
shortestPaths = graph.runVertexCentricIteration(
new DistanceUpdater(), new DistanceMessenger()).getVertices();
DistanceUpdater: VertexUpdateFunction DistanceMessenger: MessagingFunction
23
sendMessages(K key, Double newDist) {
for (Edge edge : getOutgoingEdges()) {
sendMessageTo(edge.getTarget(),
newDist + edge.getValue());
}
updateVertex(Vertex<K, Double> vertex,
MessageIterator msgs) {
Double minDist = Double.MAX_VALUE;
for (double msg : msgs) {
if (msg < minDist)
minDist = msg;
}
if (vertex.getValue() > minDist)
setNewVertexValue(minDist);
}
Gather: how to transform each of
the edges and neighbors of each
vertex
Sum: how to aggregate the partial
values of Gather to a single value
Apply: how to update the vertex
value, based on the new aggregate
and the current value
The workset contains the active
vertices (user-defined activation)
Gather-Sum-Apply Iterations
24
Gather-Sum-Apply SSSP
shortestPaths = graph.runGatherSumApplyIteration(new CalculateDistances(),
new ChooseMinDistance(), new UpdateDistance()).getVertices();
CalculateDistances: Gather
25
gather(Neighbor<Double, Double> neighbor) {
return neighbor.getNeighborValue() + neighbor.getEdgeValue();
}
ChooseMinDistance: Sum
ChooseMinDistance: Apply
sum(Double newValue, Double currentValue) {
return Math.min(newValue, currentValue);
}
apply(Double newDistance, Double oldDistance) {
if (newDistance < oldDistance) { setResult(newDistance); }
}
Iteration Configuration
● Parallelism
● Aggregators
● Broadcast Variables
● Messaging Direction (IN, OUT, ALL)
● Access vertex degrees and number of
vertices
26
Vertex-Centric or GSA?
27
vertex-centric GSA
when message creation is independent &
expensive (parallelized in Gather)
when the graph is skewed (~ 1.5x faster)
if update is associative-commutative
when all messages are needed for
the update
when a vertex needs to send
messages to non-neighbors
● PageRank*
● Single Source Shortest Paths*
● Label Propagation
● Weakly Connected Components*
● Community Detection
Upcoming: Triangle Count, HITS, Affinity Propagation,
Graph Summarization
graphWithRanks = inputGraph.run(new PageRank(maxIterations, dampingFactor));
*: both vertex-centric and GSA implementations
Library of Algorithms
28
What’s next, Gelly?
● Integration with the Flink Streaming API
● Specialized Operators for Skewed Graphs
● Partitioning Methods
● More graph processing models: neighborhood-centric,
partition-centric
Check our Roadmap!
https://cwiki.apache.org/confluence/display/FLINK/Flink+Gelly
29
Feeling Gelly? Try it!
● Grab the Flink master: https://github.com/apache/flink
● Read the Gelly programming guide: https://ci.apache.
org/projects/flink/flink-docs-master/libs/gelly_guide.html
● Follow the Gelly School tutorials (Beta): http://gellyschool.com
● Read our blog post: https://flink.apache.
org/news/2015/08/24/introducing-flink-gelly.html
● Come to Flink Forward (October 12-13, Berlin): http://flink-
forward.org/
30

More Related Content

What's hot

Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015
Andra Lungu
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Somnath Mazumdar
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward
 

What's hot (20)

Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015Flink Gelly - Karlsruhe - June 2015
Flink Gelly - Karlsruhe - June 2015
 
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduceBlock Sampling: Efficient Accurate Online Aggregation in MapReduce
Block Sampling: Efficient Accurate Online Aggregation in MapReduce
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
 
Demystifying Distributed Graph Processing
Demystifying Distributed Graph ProcessingDemystifying Distributed Graph Processing
Demystifying Distributed Graph Processing
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
 
Apache Flink Internals: Stream & Batch Processing in One System – Apache Flin...
Apache Flink Internals: Stream & Batch Processing in One System – Apache Flin...Apache Flink Internals: Stream & Batch Processing in One System – Apache Flin...
Apache Flink Internals: Stream & Batch Processing in One System – Apache Flin...
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
Deep Stream Dynamic Graph Analytics with Grapharis - Massimo Perini
Deep Stream Dynamic Graph Analytics with Grapharis -  Massimo PeriniDeep Stream Dynamic Graph Analytics with Grapharis -  Massimo Perini
Deep Stream Dynamic Graph Analytics with Grapharis - Massimo Perini
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
m2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reusem2r2: A Framework for Results Materialization and Reuse
m2r2: A Framework for Results Materialization and Reuse
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
 
First Flink Bay Area meetup
First Flink Bay Area meetupFirst Flink Bay Area meetup
First Flink Bay Area meetup
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
 
Flink Streaming Berlin Meetup
Flink Streaming Berlin MeetupFlink Streaming Berlin Meetup
Flink Streaming Berlin Meetup
 

Viewers also liked

Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
Albert Bifet
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
Oh Chan Kwon
 

Viewers also liked (20)

Baymeetup-FlinkResearch
Baymeetup-FlinkResearchBaymeetup-FlinkResearch
Baymeetup-FlinkResearch
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
Bay Area Apache Flink Meetup Community Update August 2015
Bay Area Apache Flink Meetup Community Update August 2015Bay Area Apache Flink Meetup Community Update August 2015
Bay Area Apache Flink Meetup Community Update August 2015
 
Designing and Testing Accumulo Iterators
Designing and Testing Accumulo IteratorsDesigning and Testing Accumulo Iterators
Designing and Testing Accumulo Iterators
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
 
Flink. Pure Streaming
Flink. Pure StreamingFlink. Pure Streaming
Flink. Pure Streaming
 
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile ActorsCloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
 
Big data processing systems research
Big data processing systems researchBig data processing systems research
Big data processing systems research
 
Like a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web TrackersLike a Pack of Wolves: Community Structure of Web Trackers
Like a Pack of Wolves: Community Structure of Web Trackers
 
Graph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsGraph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analytics
 
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
 
The shortest path is not always a straight line
The shortest path is not always a straight lineThe shortest path is not always a straight line
The shortest path is not always a straight line
 
MapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open IssuesMapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open Issues
 
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
Large-scale graph processing with Apache Flink @GraphDevroom FOSDEM'15
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
 
ETL into Neo4j
ETL into Neo4jETL into Neo4j
ETL into Neo4j
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
 
Rule Engine Evaluation for Complex Event Processing
Rule Engine Evaluation for Complex Event ProcessingRule Engine Evaluation for Complex Event Processing
Rule Engine Evaluation for Complex Event Processing
 

Similar to Gelly in Apache Flink Bay Area Meetup

Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit
 

Similar to Gelly in Apache Flink Bay Area Meetup (20)

Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 
Funtional Reactive Programming with Examples in Scala + GWT
Funtional Reactive Programming with Examples in Scala + GWTFuntional Reactive Programming with Examples in Scala + GWT
Funtional Reactive Programming with Examples in Scala + GWT
 
Processing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) PregelProcessing large-scale graphs with Google(TM) Pregel
Processing large-scale graphs with Google(TM) Pregel
 
Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...
Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...
Frank Celler – Processing large-scale graphs with Google(TM) Pregel - NoSQL m...
 
CodiLime Tech Talk - Katarzyna Ziomek-Zdanowicz: RxJS main concepts and real ...
CodiLime Tech Talk - Katarzyna Ziomek-Zdanowicz: RxJS main concepts and real ...CodiLime Tech Talk - Katarzyna Ziomek-Zdanowicz: RxJS main concepts and real ...
CodiLime Tech Talk - Katarzyna Ziomek-Zdanowicz: RxJS main concepts and real ...
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
SLE2015: Distributed ATL
SLE2015: Distributed ATLSLE2015: Distributed ATL
SLE2015: Distributed ATL
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
CS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingCS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and Culling
 
OpenGL Transformations
OpenGL TransformationsOpenGL Transformations
OpenGL Transformations
 
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLabAdvanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
 
RxJS Operators - Real World Use Cases (FULL VERSION)
RxJS Operators - Real World Use Cases (FULL VERSION)RxJS Operators - Real World Use Cases (FULL VERSION)
RxJS Operators - Real World Use Cases (FULL VERSION)
 
Dafunctor
DafunctorDafunctor
Dafunctor
 
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
 
Overview of Chainer and Its Features
Overview of Chainer and Its FeaturesOverview of Chainer and Its Features
Overview of Chainer and Its Features
 
2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks2011.10.14 Apache Giraph - Hortonworks
2011.10.14 Apache Giraph - Hortonworks
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
 

Recently uploaded

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 

Recently uploaded (20)

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 

Gelly in Apache Flink Bay Area Meetup

  • 1. Vasia Kalavri Apache Flink PMC, PhD student @KTH @vkalavri - vasia@apache.org August 27, 2015 Gelly: Large-Scale Graph Processing with Apache Flink
  • 2. Overview - Why Graph Processing with Flink? - Gelly: Flink’s Graph Processing API - Iterative Graph Processing in Gelly - What’s next, Gelly? 2
  • 4. 4
  • 5. A data analysis pipeline load clean create graph analyze graph clean transformload result clean transformload 5
  • 6. A more user-friendly pipeline load load result load 6
  • 7. General-purpose or specialized? - fast application development and deployment - easier maintenance - non-intuitive APIs - time-consuming - use, configure and integrate different systems - hard to maintain - rich APIs and features general-purpose specialized what about performance? 7
  • 8. Native Iterations ● the runtime is aware of the iterative execution ● no scheduling overhead between iterations ● caching and state maintenance are handled automatically Caching Loop-invariant DataPushing work “out of the loop” Maintain state as index 8
  • 9. Flink Iteration Operators Iterate IterateDelta 9 Input Iterative Update Function Result Replace Workset Iterative Update Function Result Solution Set State
  • 10. Many iterative algorithms exhibit sparse computational dependencies and asymmetric convergence Delta Iterations 10
  • 11. Example: Connected Components 11 Only vertices that change their value are in the workset for the next iteration.
  • 12. Beyond Iterations 12 Performance & Scalability ● own memory management ● own serialization framework ● operate on binary data when possible Automatic Optimizations ● choose best execution strategy ● cache invariant data
  • 14. ● Java & Scala Graph APIs on top of Flink ● Ships already with Flink 0.9 (Beta) ● Can be seamlessly mixed with the DataSet Flink API → easily implement applications that use both record-based and graph-based analysis Meet Gelly 14
  • 15. Gelly in the Flink stack 15 Scala API (batch and streaming) Java API (batch and streaming) ML library Gelly Runtime and Execution Environments Data storage Table API Python API Transformations and Utilities Iterative Graph Processing Graph Library
  • 16. Hello, Gelly! // create a graph from a Dataset of Vertices and a DataSet of Edges Graph<String, Long, Double> graph = Graph.fromDataSet(vertices, edges, env); 16 // create a graph from a Collection of Edges and initialize the Vertex values Graph<String, Long, Double> graph = Graph.fromCollection(edges, new MapFunction<String, Long>() { public Long map(String vertexId) { return 1l; } }, env); vertex ID type vertex value type edge value type edges is a Java collection assigned vertex value
  • 17. ● Graph Properties ○ getVertexIds ○ getEdgeIds ○ numberOfVertices ○ numberOfEdges ○ getDegrees ○ isWeaklyConnected ○ ... Available Methods ● Transformations ○ map, filter, join ○ subgraph, union, difference ○ reverse, undirected ○ getTriplets ● Mutations ○ add vertex/edge ○ remove vertex/edge 17
  • 18. Example: mapVertices 18 // increment each vertex value by one Graph<Long, Long, Long> updatedGraph = graph.mapVertices( new MapFunction<Vertex<Long, Long>, Long>() { public Long map(Vertex<Long, Long> value) { return value.getValue() + 1; } }); 4 2 8 5 5 3 1 7 4 5
  • 19. Example: subGraph 19 Graph<Long, Long, Long> subgraph = graph.subgraph( new FilterFunction<Vertex<Long, Long>>() { public boolean filter(Vertex<Long, Long> vertex) { // keep only vertices with positive values return (vertex.getValue() > 0); } }, new FilterFunction<Edge<Long, Long>>() { public boolean filter(Edge<Long, Long> edge) { // keep only edges with negative values return (edge.getValue() < 0); } })
  • 20. - Apply a reduce function to the 1st-hop neighborhood of each vertex in parallel Neighborhood Methods 3 4 7 4 4 graph.reduceOnNeighbors(new MinValue(), EdgeDirection.OUT); 3 9 7 4 5 20
  • 21. Iterative Graph Processing Gelly offers iterative graph processing abstractions on top of Flink’s Delta iterations ● vertex-centric (similar to Pregel, Giraph) ● gather-sum-apply (similar to PowerGraph) 21
  • 22. MessagingFunction: what messages to send to other vertices VertexUpdateFunction: how to update a vertex value, based on the received messages The workset contains only active vertices (received a msg in the previous superstep) Vertex-centric Iterations 22
  • 23. Vertex-centric SSSP shortestPaths = graph.runVertexCentricIteration( new DistanceUpdater(), new DistanceMessenger()).getVertices(); DistanceUpdater: VertexUpdateFunction DistanceMessenger: MessagingFunction 23 sendMessages(K key, Double newDist) { for (Edge edge : getOutgoingEdges()) { sendMessageTo(edge.getTarget(), newDist + edge.getValue()); } updateVertex(Vertex<K, Double> vertex, MessageIterator msgs) { Double minDist = Double.MAX_VALUE; for (double msg : msgs) { if (msg < minDist) minDist = msg; } if (vertex.getValue() > minDist) setNewVertexValue(minDist); }
  • 24. Gather: how to transform each of the edges and neighbors of each vertex Sum: how to aggregate the partial values of Gather to a single value Apply: how to update the vertex value, based on the new aggregate and the current value The workset contains the active vertices (user-defined activation) Gather-Sum-Apply Iterations 24
  • 25. Gather-Sum-Apply SSSP shortestPaths = graph.runGatherSumApplyIteration(new CalculateDistances(), new ChooseMinDistance(), new UpdateDistance()).getVertices(); CalculateDistances: Gather 25 gather(Neighbor<Double, Double> neighbor) { return neighbor.getNeighborValue() + neighbor.getEdgeValue(); } ChooseMinDistance: Sum ChooseMinDistance: Apply sum(Double newValue, Double currentValue) { return Math.min(newValue, currentValue); } apply(Double newDistance, Double oldDistance) { if (newDistance < oldDistance) { setResult(newDistance); } }
  • 26. Iteration Configuration ● Parallelism ● Aggregators ● Broadcast Variables ● Messaging Direction (IN, OUT, ALL) ● Access vertex degrees and number of vertices 26
  • 27. Vertex-Centric or GSA? 27 vertex-centric GSA when message creation is independent & expensive (parallelized in Gather) when the graph is skewed (~ 1.5x faster) if update is associative-commutative when all messages are needed for the update when a vertex needs to send messages to non-neighbors
  • 28. ● PageRank* ● Single Source Shortest Paths* ● Label Propagation ● Weakly Connected Components* ● Community Detection Upcoming: Triangle Count, HITS, Affinity Propagation, Graph Summarization graphWithRanks = inputGraph.run(new PageRank(maxIterations, dampingFactor)); *: both vertex-centric and GSA implementations Library of Algorithms 28
  • 29. What’s next, Gelly? ● Integration with the Flink Streaming API ● Specialized Operators for Skewed Graphs ● Partitioning Methods ● More graph processing models: neighborhood-centric, partition-centric Check our Roadmap! https://cwiki.apache.org/confluence/display/FLINK/Flink+Gelly 29
  • 30. Feeling Gelly? Try it! ● Grab the Flink master: https://github.com/apache/flink ● Read the Gelly programming guide: https://ci.apache. org/projects/flink/flink-docs-master/libs/gelly_guide.html ● Follow the Gelly School tutorials (Beta): http://gellyschool.com ● Read our blog post: https://flink.apache. org/news/2015/08/24/introducing-flink-gelly.html ● Come to Flink Forward (October 12-13, Berlin): http://flink- forward.org/ 30