VENUS: Vertex-Centric Streamlined
Graph Computation on a Single PC
Jiefeng Cheng¹, Qin Liu², Zhenguo Li¹,
Wei Fan¹, John C.S. Lui², Cheng He¹
¹ Huawei Noah’s Ark Lab
² The Chinese University of Hong Kong
ICDE’15
Graphs are everywhere
▪ We have large graphs
• Web graph
• Social graph
• User-movie ratings graph
• …
▪ Graph computation
• PageRank
• Community detection
• ALS for collaborative filtering
• …
Mining from Big Graphs:
two feasible ways
▪ Distributed systems
• Pregel [SIGMOD’10], GraphLab [OSDI’12], GraphX [OSDI’14], Giraph, ...
• Expensive clusters, complex setup, need to write distributed programs
▪ Single-machine systems
• Disk: GraphChi [OSDI’12], X-Stream [SOSP’13]
• SSD: TurboGraph [KDD’13], FlashGraph [FAST’15]
• Computation time close to distributed systems
  • PageRank on Twitter graph (41M nodes, 1.4B edges)
  • Spark: 8.1 min with 50 machines (each with 2 CPUs, 7.5 GB RAM) [Stanton KDD’12]
  • VENUS: 8 min on a single machine with a quad-core CPU and 16 GB RAM
• Affordable, easy to program/debug
Existing Systems
▪ Vertex-centric programming model: popularized by Pregel / GraphLab / GraphChi
• Each vertex updates itself based on its neighborhood
▪ GraphChi
• Updated data on each vertex must be propagated to its neighbors through disk
• Extensive disk I/O
▪ X-Stream
• Different API: edge-centric programming
• Less expressive; common algorithms must be re-implemented
• Also uses disk to propagate updates
Our Contributions
▪ Design and implement a disk-based system, VENUS
• A new vertex-centric streamlined processing model
• Separates mutable vertex data from immutable edge data
• Reads and writes less data than other systems
▪ Evaluation on large graphs
• Outperforms GraphChi and X-Stream
• Verifies that our design reduces data access
Vertex-Centric Programming
▪ Consider GraphChi
for each iteration
    for each vertex v
        update(v)

void update(v)
    fetch data from each in-edge
    update data on v
    spread data to each out-edge

[Figure: vertex v and its edges, labeled "Duplicated data"]
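The loop above can be made concrete with a small sketch. It illustrates the model as described on this slide (mutable values stored on edges) and is not GraphChi's actual API; the toy graph, the `edge_value` dictionary, and the PageRank-style update are assumptions chosen for illustration.

```python
# Toy directed graph (hypothetical): edges as (source, destination) pairs.
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
vertices = sorted({v for e in edges for v in e})

in_edges = {v: [e for e in edges if e[1] == v] for v in vertices}
out_edges = {v: [e for e in edges if e[0] == v] for v in vertices}

# In this model every edge carries a mutable value.
edge_value = {e: 1.0 / len(out_edges[e[0]]) for e in edges}
vertex_value = {v: 1.0 for v in vertices}
edge_writes = 0  # counts how many edge values are rewritten


def update(v, damping=0.85):
    """PageRank-style update: gather from in-edges, scatter to out-edges."""
    global edge_writes
    # fetch data from each in-edge
    incoming = sum(edge_value[e] for e in in_edges[v])
    # update data on v
    vertex_value[v] = (1 - damping) + damping * incoming
    # spread data to each out-edge: the same number is copied onto every out-edge
    share = vertex_value[v] / len(out_edges[v])
    for e in out_edges[v]:
        edge_value[e] = share
        edge_writes += 1


for iteration in range(10):
    for v in vertices:
        update(v)

print(vertex_value, edge_writes)
```

Each iteration rewrites one value per edge, and all out-edges of a vertex hold the same copy, which is the duplication the figure label points at.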
Vertex-Centric Programming
▪ VENUS:
• Only store mutable values on vertices
▪ Pros
• Less data access
• Enables “streamlined” processing
▪ Cons
• Limited expressiveness

void update(v)
    fetch data from each in-edge
    update data on v
    spread data to each out-edge

[Figure labels: in-neighbor, v]
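A minimal sketch of the same kind of computation when mutable values live only on vertices, assuming (as the "in-neighbor" label suggests) that update reads the values of in-neighbors directly. The toy graph, the function names, and the synchronous iteration scheme are assumptions for illustration, not the VENUS API.

```python
# Toy directed graph (hypothetical): edges as (source, destination) pairs.
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
vertices = sorted({v for e in edges for v in e})

# Immutable structure: in-neighbor lists and out-degrees never change.
in_neighbors = {v: [s for s, d in edges if d == v] for v in vertices}
out_degree = {v: sum(1 for s, _ in edges if s == v) for v in vertices}

# Mutable state: exactly one value per vertex, nothing stored on edges.
value = {v: 1.0 for v in vertices}


def update(v, damping=0.85):
    """PageRank-style update that reads in-neighbor values directly."""
    incoming = sum(value[u] / out_degree[u] for u in in_neighbors[v])
    return (1 - damping) + damping * incoming


for iteration in range(10):
    # Synchronous step (an assumption): new values come from the previous iteration.
    value = {v: update(v) for v in vertices}

print(value)
```

Nothing is written per edge here; only one value per vertex changes in each iteration.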
VENUS Architecture
[Figure: architecture diagram]
VENUS Architecture
▪ Disk storage (offline)
• Sharding
• Separation of edge data and vertex data
▪ Computing model (online)
• Load edge data sequentially
• Execute the update function on each vertex
• How to load vertex data and propagate updates
Sharding
▪ Graph cannot fit in RAM?
• Split the graph into shards
▪ Each shard corresponds to an interval of vertices:
• G-shard: immutable structure of graph
  • In-edges of nodes in the interval
• V-shard: mutable vertex values
  • Vertex values of all vertices in the shard
▪ Structure table: all g-shards
▪ Value table: all vertex data
Vertex ID  1  2  3  4  5  6  7  8  9  10  11  12
Data       (one cell per vertex; values not shown)
Interval   I1=[1,4]          I2=[5,8]          I3=[9,12]
G-shard    7,9,10 → 1        6,7,8,11 → 5      2,3,4,10,11 → 9
           6,10 → 2          1,10 → 6          11 → 10
           1,2,6 → 3         3,10,11 → 7       4,6 → 11
           1,2,6,7,10 → 4    3,6,11 → 8        2,3,9,10,11 → 12
V-shard    I1∪{6,7,9,10}     I2∪{1,3,10,11}    I3∪{2,3,4,6}
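The table can be reproduced with a short sketch: each g-shard holds the in-edges of its interval, and each v-shard is the interval plus every source vertex of those in-edges. The in-edge lists and intervals are copied from the example; `build_shards` and the dictionary layout are hypothetical names, not the on-disk format used by VENUS.

```python
# In-edge lists copied from the example: destination -> list of sources.
in_edges = {
    1: [7, 9, 10], 2: [6, 10], 3: [1, 2, 6], 4: [1, 2, 6, 7, 10],
    5: [6, 7, 8, 11], 6: [1, 10], 7: [3, 10, 11], 8: [3, 6, 11],
    9: [2, 3, 4, 10, 11], 10: [11], 11: [4, 6], 12: [2, 3, 9, 10, 11],
}
intervals = [(1, 4), (5, 8), (9, 12)]


def build_shards(in_edges, intervals):
    """Return one (g_shard, v_shard) pair per interval."""
    shards = []
    for lo, hi in intervals:
        interval = range(lo, hi + 1)
        # G-shard: in-edges of the vertices in the interval.
        g_shard = {v: in_edges[v] for v in interval}
        # V-shard: interval vertices plus all sources of those in-edges.
        sources = {u for srcs in g_shard.values() for u in srcs}
        v_shard = sorted(set(interval) | sources)
        shards.append((g_shard, v_shard))
    return shards


shards = build_shards(in_edges, intervals)
for (lo, hi), (g_shard, v_shard) in zip(intervals, shards):
    extra = [u for u in v_shard if not lo <= u <= hi]
    print(f"[{lo},{hi}] vertices outside the interval: {extra}")
```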
Vertex-Centric Streamlined
Processing
▪ V-shards are much smaller than g-shards
• Load each v-shard entirely into memory
▪ Scan each g-shard sequentially
• Execute the update function in parallel
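One pass of this processing model might look like the sketch below: for each interval, load its v-shard into memory, scan its g-shard once, then write back only the interval's values. The graph is the 12-vertex example from the sharding slides; the in-memory `value_table`, the minimum-label update, and the sequential (not parallel) execution are simplifying assumptions.

```python
# In-edge lists of the example graph: destination -> list of sources.
in_edges = {
    1: [7, 9, 10], 2: [6, 10], 3: [1, 2, 6], 4: [1, 2, 6, 7, 10],
    5: [6, 7, 8, 11], 6: [1, 10], 7: [3, 10, 11], 8: [3, 6, 11],
    9: [2, 3, 4, 10, 11], 10: [11], 11: [4, 6], 12: [2, 3, 9, 10, 11],
}
intervals = [(1, 4), (5, 8), (9, 12)]
# Stand-in for the on-disk value table: each vertex starts with its own ID as label.
value_table = {v: v for v in in_edges}


def run_iteration(value_table):
    """Process every shard once; return (changed, values_loaded, edges_scanned)."""
    values_loaded = edges_scanned = 0
    changed = False
    for lo, hi in intervals:
        interval = range(lo, hi + 1)
        # Load the whole v-shard: interval vertices plus their in-neighbors.
        needed = set(interval) | {u for v in interval for u in in_edges[v]}
        v_shard = {u: value_table[u] for u in needed}
        values_loaded += len(v_shard)
        # Scan the g-shard sequentially, one destination vertex at a time.
        new_values = {}
        for v in interval:
            edges_scanned += len(in_edges[v])
            new_values[v] = min([v_shard[v]] + [v_shard[u] for u in in_edges[v]])
        # Write back only the values of the interval's own vertices.
        for v, val in new_values.items():
            changed |= val != value_table[v]
            value_table[v] = val
    return changed, values_loaded, edges_scanned


changed, loaded, scanned = run_iteration(value_table)
while changed:
    changed, _, _ = run_iteration(value_table)
print(loaded, scanned, value_table)
```

In this toy graph the three v-shards are not much smaller than the g-shards; the slide's claim concerns large graphs, where edges far outnumber vertices.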
Execution
Load v-shard 1
    7,9,10 → 1
    6,10 → 2
    1,2,6 → 3
    1,2,6,7,10 → 4
Update v-shard 1
Load v-shard 2
    6,7,8,11 → 5
    1,10 → 6
    3,10,11 → 7
    3,6,11 → 8
Update v-shard 2
Load v-shard 3
    2,3,4,10,11 → 9
    11 → 10
    4,6 → 11
    2,3,9,10,11 → 12
Update v-shard 3
[Figure legend: Loading, Execution]
Parallelize execution and loading
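One way to overlap loading with execution is to prefetch the next v-shard on a background thread while the current shard is being processed. The sketch below shows that pattern only; whether VENUS is implemented this way is an assumption, and `load_v_shard` and `execute` are hypothetical stand-ins for disk reads and the update step.

```python
from concurrent.futures import ThreadPoolExecutor


def load_v_shard(i):
    """Hypothetical loader; a real system would read shard i from disk here."""
    return {"shard": i, "values": list(range(i * 4 + 1, i * 4 + 5))}


def execute(shard):
    """Hypothetical update step: here it just sums the loaded values."""
    return sum(shard["values"])


def run(num_shards):
    results = []
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_v_shard, 0)
        for i in range(num_shards):
            shard = pending.result()  # wait until shard i is in memory
            if i + 1 < num_shards:
                # Start loading shard i+1 before executing shard i.
                pending = loader.submit(load_v_shard, i + 1)
            results.append(execute(shard))  # runs while the prefetch is in flight
    return results


print(run(3))
```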
Load and Update v-shards
▪ Two I/O-efficient algorithms
• Algorithm 1: Extension of PSW (Parallel Sliding Windows) in GraphChi (skip)
• Algorithm 2: Merge-Join
  • Load: merge-join between value table and v-shard
  • Update: write values of [1,4] back to value table
▪ Use value buffer to cache value table
Value table on disk:
ID    1  2  3  4  5  6  7  8  9  10  11  12
Data  (values not shown)
Vertices in v-shard 1 on disk:
ID    1  2  3  4  6  7  9  10
Loaded v-shard 1:
ID    1  2  3  4  6  7  9  10
Data  (values not shown)
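The load and update steps of Algorithm 2 can be sketched as a merge-join over two ID-sorted sequences. The sketch assumes the value table is a list of `(id, value)` pairs sorted by ID and the v-shard is a sorted list of IDs; the function names and the float placeholder values are hypothetical.

```python
def merge_join_load(value_table, v_shard_ids):
    """Single forward pass over both sorted inputs; returns matching (id, value) rows."""
    loaded = []
    rows = iter(value_table)
    row = next(rows, None)
    for wanted in v_shard_ids:
        # Skip value-table rows whose ID is smaller than the next wanted ID.
        while row is not None and row[0] < wanted:
            row = next(rows, None)
        if row is not None and row[0] == wanted:
            loaded.append(row)
    return loaded


def write_back(value_table, new_values, lo, hi):
    """Replace only the values of vertices in the interval [lo, hi]."""
    return [(i, new_values[i]) if lo <= i <= hi else (i, val)
            for i, val in value_table]


value_table = [(i, float(i)) for i in range(1, 13)]  # placeholder values
v_shard_1 = [1, 2, 3, 4, 6, 7, 9, 10]

loaded = merge_join_load(value_table, v_shard_1)
updated = write_back(value_table, {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}, 1, 4)
print(loaded)
print(updated[:5])
```

Because both inputs are sorted by ID, the load touches each value-table row at most once and never seeks backwards.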
Evaluation of VENUS
▪ Setup: a commodity PC
• Quad-core 3.4 GHz CPU
• 16 GB RAM and 4 TB hard disk
▪ Main competitors:
• GraphChi and X-Stream
▪ Applications:
• PageRank
• WCC: weakly connected components
• CD: community detection
• ALS: alternating least squares for collaborative filtering
• Shortest path, label propagation, etc.
PageRank on Twitter
▪ Twitter follow-graph: 41M nodes, 1.4B edges
Cost of Updates Propagation:
Data Write and Read
Applications: WCC, CD, ALS
Failed to implement CD on X-Stream,
due to its edge-centric programming
model
Web-Scale Graph
▪ ClueWeb12: web-scale graph
• 978 million nodes, 42.5 billion edges
• 402 GB on disk
• 2 iterations of PageRank
▪ Computation time
• GraphChi: 4.3 hours
• X-Stream: 7.4 hours
• VENUS-I: 2 hours
• VENUS-II: 1.8 hours
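The reported times translate into relative running times and rough throughput as follows. The hours, edge count, and iteration count are taken from this slide; the throughput figure assumes that each PageRank iteration scans every edge exactly once, which the slide does not state.

```python
# Reported running times in hours (from the slide).
hours = {"GraphChi": 4.3, "X-Stream": 7.4, "VENUS-I": 2.0, "VENUS-II": 1.8}
edges = 42.5e9      # edges in ClueWeb12
iterations = 2      # PageRank iterations

# Running time of each system relative to VENUS-II.
relative_time = {name: t / hours["VENUS-II"] for name, t in hours.items()}

# Rough throughput, assuming one full edge scan per iteration (an assumption).
edges_per_second = {name: edges * iterations / (t * 3600)
                    for name, t in hours.items()}

for name in hours:
    print(f"{name}: {relative_time[name]:.2f}x the VENUS-II time, "
          f"{edges_per_second[name] / 1e6:.1f}M edges/s")
```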
Conclusion
▪ Present a disk-based graph computation system, VENUS
▪ Our design of graph storage and execution can reduce data access and I/O
▪ Evaluations show it outperforms GraphChi and X-Stream
▪ VENUS can also handle billion-scale problems
Thank you!
Q&A
