SlideShare una empresa de Scribd logo
1 de 20
Descargar para leer sin conexión
Big Graph on small computer
Hadoop meeting
Michael Leznik King.Com
Inspired by Aapo Kryola
What is GraphChi
• GraphChi, a disk-based system for computing
efficiently on graphs with billions of edges.
• GraphChi can compute on the full Twitter
follow -graph with just a standard laptop
~ as fast as a very large Hadoop cluster!
Why one needs it?
• To use existing graph frameworks, one is faced
with the challenge of partitioning the graph
across cluster nodes. Finding efficient graph cuts
that minimize communication between nodes,
and are also balanced, is a hard problem. More
generally, distributed systems and their users
must deal with managing a cluster, fault
tolerance, and often unpredictable performance.
From the perspective of programmers, debugging
and optimizing distributed algorithms is hard.
Would it be possible to do advanced
graph computation on just a personal
computer?
Aapo Kyrola PhD
Candidate at Carnegie
Melon University
How Does It Work
• By using a well-known method to break
large graphs into small parts, and a novel
parallel sliding windows method,
GraphChi is able to execute several
advanced data mining, graph mining, and
machine learning algorithms on very
large graphs, using just a single
consumer-level computer
What Is Parallel Sliding Window?
• Graph 𝐺 = 𝑉, 𝐸 𝑉 − vertex, 𝐸 − edge
• Each edge and vertex are associated with a
user defined type value (Integer, Double e.tc.)
A B
e
E is an out-edge of A and in-edge of B
e = 𝑠𝑜𝑢𝑟𝑐𝑒, 𝑑𝑒𝑠𝑡𝑖𝑛𝑎𝑡𝑖𝑜𝑛 ∈ 𝐸
Vertex Centric Computation
• Algorithm: 𝑉𝑒𝑟𝑡𝑒𝑥 𝑢𝑝𝑑𝑎𝑡𝑒 − 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛
Update vertex begin
𝑥 ← read values of in−and−out edges of vertex
vertex. value ← 𝑓 𝑥 ;
foreach edge of vertex do
edge.value ← g(vertex.value, edge.value);
end
end
Graphs and Subgraphs
• Under the PSW method, the vertices V of graph
G = (V, E) are split into P disjoint intervals. For
each interval we associate a shard, which stores
all the edges that have destination in the
interval. Edges are stored in the order of their
source. Intervals are chosen to balance the
number of edges in each shard; the number of
intervals, P , is chosen so that any one shard can
be loaded completely into memory.
Graphs and Subgraphs
shard(1)
interval(1) interval(2) interval(P)
shard(2) shard(P)
Graphs and Subgraphs
Shard 1
Shards small enough to fit in memory; balance size of shards
Shard: in-edges for interval of vertices; sorted by source-id
in-edgesforvertices1..100
sortedbysource_id
Vertices
1..100
Vertices
101..700
Vertices
701..1000
Vertices
1001..10000
Shard 2 Shard 3 Shard 4Shard 1
Parallel Sliding Windows
Visualization of the stages of one iteration of the Parallel Sliding Windows method. In
this example, vertices are divided into four intervals, each associated with a shard. The
computation proceeds by constructing a subgraph of vertices one interval a time. In-
edges for the vertices are read from the memory-shard (in dark color) while out-edges
are read from each of the sliding shards. The current sliding window is pictured on top of
each shard.
Parallel Sliding Windows
foreach iteration do
shards ← 𝐼𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑒𝑆ℎ𝑎𝑟𝑑𝑠 𝑃
𝐟𝐨𝐫 interval ← 1 𝑡𝑜 𝑃 𝐝𝐨
subgraph ← 𝐿𝑜𝑎𝑑𝑆𝑢𝑏𝐺𝑟𝑎𝑝ℎ 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙
𝐩𝐚𝐫𝐚𝐥𝐥𝐞 𝐟𝐨𝐫𝐞𝐚𝐜𝐡 𝑣𝑒𝑟𝑡𝑒𝑥 ∈ 𝑠𝑢𝑏𝑔𝑟𝑎𝑝ℎ. 𝑣𝑒𝑟𝑡𝑒𝑥 𝐝𝐨
𝑈𝐷𝐹_𝑢𝑝𝑑𝑎𝑡𝑒𝑉𝑒𝑟𝑡𝑒𝑥(𝑣𝑒𝑟𝑡𝑒𝑥)
end
for 𝑠 ∈ 1, … , 𝑃, 𝑠 ≠ 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝐝𝐨
𝑠ℎ𝑎𝑟𝑑𝑠 𝑠 . 𝑈𝑝𝑑𝑎𝑡𝑒𝑙𝑎𝑠𝑡𝑊𝑖𝑛𝑑𝑜𝑤𝑇𝑜𝐷𝑖𝑠𝑘()
end
end
end
Parallel Sliding Windows
Input: Interval index number p
Subgraph of vertices in the interval p
𝑎 ← 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝑝 . 𝑠𝑡𝑎𝑟𝑡
𝑏 ← 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝑝 . 𝑒𝑛𝑑
𝐺 ← 𝐼𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑒𝑆𝑢𝑏𝑔𝑟𝑎𝑝ℎ 𝑎, 𝑏
Graph Data Format
• Adjacency List Format: (can’t have values)
src1 3 dst1 dst2 dst3
src2 2 dst4 dst 6
• Edge List Format:
src1 dst1 value1
src2 dst2 value2
src3 src3 value3
Performance
• Laptop, 16 GB, CPU i7@2.7 x4, SSD 160Gb
• Algorithm: Connected Components
• Graph Size: 7,294,535 vertices (1.4 gb)
• Creating Shards: 61.45 sec
• Analysing graph: 177.24 sec
Experiment Setting
• Mac Mini (Apple Inc.)
– 8 GB RAM
– 256 GB SSD, 1TB hard drive
– Intel Core i5, 2.5 GHz
• Experiment graphs:
Graph Vertices Edges P (shards) Preprocessing
live-journal 4.8M 69M 3 0.5 min
netflix 0.5M 99M 20 1 min
twitter-2010 42M 1.5B 20 2 min
uk-2007-05 106M 3.7B 40 31 min
uk-union 133M 5.4B 50 33 min
yahoo-web 1.4B 6.6B 50 37 min
Comparison to Existing Systems
Notes: comparison results do not include time to transfer the data to cluster, preprocessing, or the time to
load the graph from disk. GraphChi computes asynchronously, while all but GraphLab synchronously.
PageRank WebGraph Belief Propagation (U Kang et al.)
Matrix Factorization (Alt. Least Sqr.) Triangle Counting
GraphLab
v1 (8 cores)
GraphChi
(Mac Mini)
0 2 4 6 8 10 12
Minutes
Netflix (99B edges)
Spark (50
machines)
GraphChi
(Mac Mini)
0 2 4 6 8 10 12 14
Minutes
Twitter-2010 (1.5B edges)
Pegasus /
Hadoop
(100
machines)
GraphChi
(Mac Mini)
0 5 10 15 20 25 30
Minutes
Yahoo-web (6.7B edges)
Hadoop
(1636
machines)
GraphChi
(Mac Mini)
0 100 200 300 400 500
Minutes
twitter-2010 (1.5B edges)
PowerGraph Comparison
• PowerGraph / GraphLab 2
outperforms previous systems
by a wide margin on natural
graphs.
• With 64 more machines, 512
more CPUs:
– Pagerank: 40x faster than
GraphChi
– Triangle counting: 30x faster than
GraphChi.
GraphChi has state-of-the-
art performance / CPU.
vs.
GraphChi
Conclusion
• Parallel Sliding Windows algorithm enables
processing of large graphs with very few non-
sequential disk accesses.
• For the system researchers, GraphChi is a solid
baseline for system evaluation
– It can solve as big problems as distributed systems.
• Takeaway: Appropriate data structures as an
alternative to scaling up.
Source code and examples: http://graphchi.org
License: Apache 2.0

Más contenido relacionado

La actualidad más candente

High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingNesreen K. Ahmed
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learningjie cao
 
Generalized Linear Models with H2O
Generalized Linear Models with H2O Generalized Linear Models with H2O
Generalized Linear Models with H2O Sri Ambati
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmYu Liu
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsNishant Gandhi
 
DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorit...
DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorit...DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorit...
DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorit...Menlo Systems GmbH
 
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...Nesreen K. Ahmed
 
Data streaming algorithms
Data streaming algorithmsData streaming algorithms
Data streaming algorithmsSandeep Joshi
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 
Parallel External Memory Algorithms Applied to Generalized Linear Models
Parallel External Memory Algorithms Applied to Generalized Linear ModelsParallel External Memory Algorithms Applied to Generalized Linear Models
Parallel External Memory Algorithms Applied to Generalized Linear ModelsRevolution Analytics
 
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...asimkadav
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReducePietro Michiardi
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLMLconf
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016MLconf
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)Lynn Cherny
 
Data Streaming Algorithms
Data Streaming AlgorithmsData Streaming Algorithms
Data Streaming Algorithms宇 傅
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16MLconf
 
Streaming Algorithms
Streaming AlgorithmsStreaming Algorithms
Streaming AlgorithmsJoe Kelley
 
Co-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and SparkCo-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and Sparksscdotopen
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBArangoDB Database
 

La actualidad más candente (20)

High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learning
 
Generalized Linear Models with H2O
Generalized Linear Models with H2O Generalized Linear Models with H2O
Generalized Linear Models with H2O
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize Algorithm
 
Map reduce programming model to solve graph problems
Map reduce programming model to solve graph problemsMap reduce programming model to solve graph problems
Map reduce programming model to solve graph problems
 
DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorit...
DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorit...DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorit...
DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorit...
 
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
 
Data streaming algorithms
Data streaming algorithmsData streaming algorithms
Data streaming algorithms
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
Parallel External Memory Algorithms Applied to Generalized Linear Models
Parallel External Memory Algorithms Applied to Generalized Linear ModelsParallel External Memory Algorithms Applied to Generalized Linear Models
Parallel External Memory Algorithms Applied to Generalized Linear Models
 
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReduce
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)
 
Data Streaming Algorithms
Data Streaming AlgorithmsData Streaming Algorithms
Data Streaming Algorithms
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
 
Streaming Algorithms
Streaming AlgorithmsStreaming Algorithms
Streaming Algorithms
 
Co-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and SparkCo-occurrence Based Recommendations with Mahout, Scala and Spark
Co-occurrence Based Recommendations with Mahout, Scala and Spark
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
 

Destacado

Graphlab under the hood
Graphlab under the hoodGraphlab under the hood
Graphlab under the hoodZuhair khayyat
 
Machine Learning in the Cloud with GraphLab
Machine Learning in the Cloud with GraphLabMachine Learning in the Cloud with GraphLab
Machine Learning in the Cloud with GraphLabDanny Bickson
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processingjins0618
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013Amazon Web Services
 
Jeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, AdaptrisJeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, AdaptrisMLconf
 
Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - GraphlabAmir Payberah
 
Graph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXGraph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXAmir Payberah
 
Technical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal WabbitTechnical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal Wabbitjakehofman
 

Destacado (10)

Graphlab under the hood
Graphlab under the hoodGraphlab under the hood
Graphlab under the hood
 
Machine Learning in the Cloud with GraphLab
Machine Learning in the Cloud with GraphLabMachine Learning in the Cloud with GraphLab
Machine Learning in the Cloud with GraphLab
 
GraphLab
GraphLabGraphLab
GraphLab
 
PowerGraph
PowerGraphPowerGraph
PowerGraph
 
Ling liu part 01:big graph processing
Ling liu part 01:big graph processingLing liu part 01:big graph processing
Ling liu part 01:big graph processing
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
 
Jeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, AdaptrisJeff Bradshaw, Founder, Adaptris
Jeff Bradshaw, Founder, Adaptris
 
Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - Graphlab
 
Graph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXGraph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphX
 
Technical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal WabbitTechnical Tricks of Vowpal Wabbit
Technical Tricks of Vowpal Wabbit
 

Similar a GraphChi big graph processing

Graph processing
Graph processingGraph processing
Graph processingyeahjs
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedTuri, Inc.
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsFlink Forward
 
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...NAVER D2
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMfnothaft
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMfnothaft
 
Intel’S Larrabee
Intel’S LarrabeeIntel’S Larrabee
Intel’S Larrabeevipinpnair
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Ontico
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Alexey Zinoviev
 
Ch22 parallel d_bs_cs561
Ch22 parallel d_bs_cs561Ch22 parallel d_bs_cs561
Ch22 parallel d_bs_cs561Shobhit Saxena
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profilepramodbiligiri
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataData Con LA
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad ranaData Con LA
 
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsToyotaro Suzumura
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-isctembreternitz
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Spark Summit
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...Xavier Llorà
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scalesamthemonad
 

Similar a GraphChi big graph processing (20)

Graph processing
Graph processingGraph processing
Graph processing
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
 
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the CloudsGreg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
Greg Hogan – To Petascale and Beyond- Apache Flink in the Clouds
 
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAM
 
Intel’S Larrabee
Intel’S LarrabeeIntel’S Larrabee
Intel’S Larrabee
 
Graph chi
Graph chiGraph chi
Graph chi
 
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
Thorny Path to the Large Scale Graph Processing, Алексей Зиновьев (Тамтэк)
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
 
Ch22 parallel d_bs_cs561
Ch22 parallel d_bs_cs561Ch22 parallel d_bs_cs561
Ch22 parallel d_bs_cs561
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
Hadoop Network Performance profile
Hadoop Network Performance profileHadoop Network Performance profile
Hadoop Network Performance profile
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and Snappydata
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph AnalyticsScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
ScaleGraph - A High-Performance Library for Billion-Scale Graph Analytics
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscte
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
 
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study us...
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us...
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 

Más de huguk

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifactahuguk
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introhuguk
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...huguk
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...huguk
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watsonhuguk
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink huguk
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...huguk
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitchinghuguk
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoringhuguk
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startuphuguk
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapulthuguk
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysishuguk
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analyticshuguk
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Socialhuguk
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligencehuguk
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive huguk
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...huguk
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 

Más de huguk (20)

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Último (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

GraphChi big graph processing

  • 1. Big Graph on small computer Hadoop meeting Michael Leznik King.Com Inspired by Aapo Kryola
  • 2. What is GraphChi • GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. • GraphChi can compute on the full Twitter follow -graph with just a standard laptop ~ as fast as a very large Hadoop cluster!
  • 3. Why one needs it? • To use existing graph frameworks, one is faced with the challenge of partitioning the graph across cluster nodes. Finding efficient graph cuts that minimize communication between nodes, and are also balanced, is a hard problem. More generally, distributed systems and their users must deal with managing a cluster, fault tolerance, and often unpredictable performance. From the perspective of programmers, debugging and optimizing distributed algorithms is hard.
  • 4. Would it be possible to do advanced graph computation on just a personal computer?
  • 5. Aapo Kyrola PhD Candidate at Carnegie Melon University
  • 6. How Does It Work • By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer
  • 7. What Is Parallel Sliding Window? • Graph 𝐺 = 𝑉, 𝐸 𝑉 − vertex, 𝐸 − edge • Each edge and vertex are associated with a user defined type value (Integer, Double e.tc.) A B e E is an out-edge of A and in-edge of B e = 𝑠𝑜𝑢𝑟𝑐𝑒, 𝑑𝑒𝑠𝑡𝑖𝑛𝑎𝑡𝑖𝑜𝑛 ∈ 𝐸
  • 8. Vertex Centric Computation • Algorithm: 𝑉𝑒𝑟𝑡𝑒𝑥 𝑢𝑝𝑑𝑎𝑡𝑒 − 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 Update vertex begin 𝑥 ← read values of in−and−out edges of vertex vertex. value ← 𝑓 𝑥 ; foreach edge of vertex do edge.value ← g(vertex.value, edge.value); end end
  • 9. Graphs and Subgraphs • Under the PSW method, the vertices V of graph G = (V, E) are split into P disjoint intervals. For each interval we associate a shard, which stores all the edges that have destination in the interval. Edges are stored in the order of their source. Intervals are chosen to balance the number of edges in each shard; the number of intervals, P , is chosen so that any one shard can be loaded completely into memory.
  • 10. Graphs and Subgraphs shard(1) interval(1) interval(2) interval(P) shard(2) shard(P)
  • 11. Graphs and Subgraphs Shard 1 Shards small enough to fit in memory; balance size of shards Shard: in-edges for interval of vertices; sorted by source-id in-edgesforvertices1..100 sortedbysource_id Vertices 1..100 Vertices 101..700 Vertices 701..1000 Vertices 1001..10000 Shard 2 Shard 3 Shard 4Shard 1
  • 12. Parallel Sliding Windows Visualization of the stages of one iteration of the Parallel Sliding Windows method. In this example, vertices are divided into four intervals, each associated with a shard. The computation proceeds by constructing a subgraph of vertices one interval a time. In- edges for the vertices are read from the memory-shard (in dark color) while out-edges are read from each of the sliding shards. The current sliding window is pictured on top of each shard.
  • 13. Parallel Sliding Windows foreach iteration do shards ← 𝐼𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑒𝑆ℎ𝑎𝑟𝑑𝑠 𝑃 𝐟𝐨𝐫 interval ← 1 𝑡𝑜 𝑃 𝐝𝐨 subgraph ← 𝐿𝑜𝑎𝑑𝑆𝑢𝑏𝐺𝑟𝑎𝑝ℎ 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝐩𝐚𝐫𝐚𝐥𝐥𝐞 𝐟𝐨𝐫𝐞𝐚𝐜𝐡 𝑣𝑒𝑟𝑡𝑒𝑥 ∈ 𝑠𝑢𝑏𝑔𝑟𝑎𝑝ℎ. 𝑣𝑒𝑟𝑡𝑒𝑥 𝐝𝐨 𝑈𝐷𝐹_𝑢𝑝𝑑𝑎𝑡𝑒𝑉𝑒𝑟𝑡𝑒𝑥(𝑣𝑒𝑟𝑡𝑒𝑥) end for 𝑠 ∈ 1, … , 𝑃, 𝑠 ≠ 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝐝𝐨 𝑠ℎ𝑎𝑟𝑑𝑠 𝑠 . 𝑈𝑝𝑑𝑎𝑡𝑒𝑙𝑎𝑠𝑡𝑊𝑖𝑛𝑑𝑜𝑤𝑇𝑜𝐷𝑖𝑠𝑘() end end end
  • 14. Parallel Sliding Windows Input: Interval index number p Subgraph of vertices in the interval p 𝑎 ← 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝑝 . 𝑠𝑡𝑎𝑟𝑡 𝑏 ← 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝑝 . 𝑒𝑛𝑑 𝐺 ← 𝐼𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑒𝑆𝑢𝑏𝑔𝑟𝑎𝑝ℎ 𝑎, 𝑏
  • 15. Graph Data Format • Adjacency List Format: (can’t have values) src1 3 dst1 dst2 dst3 src2 2 dst4 dst 6 • Edge List Format: src1 dst1 value1 src2 dst2 value2 src3 src3 value3
  • 16. Performance • Laptop, 16 GB, CPU i7@2.7 x4, SSD 160Gb • Algorithm: Connected Components • Graph Size: 7,294,535 vertices (1.4 gb) • Creating Shards: 61.45 sec • Analysing graph: 177.24 sec
  • 17. Experiment Setting • Mac Mini (Apple Inc.) – 8 GB RAM – 256 GB SSD, 1TB hard drive – Intel Core i5, 2.5 GHz • Experiment graphs: Graph Vertices Edges P (shards) Preprocessing live-journal 4.8M 69M 3 0.5 min netflix 0.5M 99M 20 1 min twitter-2010 42M 1.5B 20 2 min uk-2007-05 106M 3.7B 40 31 min uk-union 133M 5.4B 50 33 min yahoo-web 1.4B 6.6B 50 37 min
  • 18. Comparison to Existing Systems Notes: comparison results do not include time to transfer the data to cluster, preprocessing, or the time to load the graph from disk. GraphChi computes asynchronously, while all but GraphLab synchronously. PageRank WebGraph Belief Propagation (U Kang et al.) Matrix Factorization (Alt. Least Sqr.) Triangle Counting GraphLab v1 (8 cores) GraphChi (Mac Mini) 0 2 4 6 8 10 12 Minutes Netflix (99B edges) Spark (50 machines) GraphChi (Mac Mini) 0 2 4 6 8 10 12 14 Minutes Twitter-2010 (1.5B edges) Pegasus / Hadoop (100 machines) GraphChi (Mac Mini) 0 5 10 15 20 25 30 Minutes Yahoo-web (6.7B edges) Hadoop (1636 machines) GraphChi (Mac Mini) 0 100 200 300 400 500 Minutes twitter-2010 (1.5B edges)
  • 19. PowerGraph Comparison • PowerGraph / GraphLab 2 outperforms previous systems by a wide margin on natural graphs. • With 64 more machines, 512 more CPUs: – Pagerank: 40x faster than GraphChi – Triangle counting: 30x faster than GraphChi. GraphChi has state-of-the- art performance / CPU. vs. GraphChi
  • 20. Conclusion • Parallel Sliding Windows algorithm enables processing of large graphs with very few non- sequential disk accesses. • For the system researchers, GraphChi is a solid baseline for system evaluation – It can solve as big problems as distributed systems. • Takeaway: Appropriate data structures as an alternative to scaling up. Source code and examples: http://graphchi.org License: Apache 2.0