Pregel: A System for Large-Scale Graph Processing
2014/5/14
Ishikawa Yasutaka

About this Paper
• Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski
• A paper from Google
• Published in the Proceedings of the 2010 International Conference on Management of Data (SIGMOD '10)

Outline
• Introduction
• Model of computation
• Pregel’s API
• Implementation
• Application
• Experiments
• Conclusion

Today’s problems of graph processing
• Poor locality of memory access
• Very little work per vertex

Methods of graph processing (1/2)
1. Crafting a custom distributed infrastructure
→ typically requires a substantial implementation effort
2. Relying on an existing distributed computing platform (e.g., MapReduce)
→ can lead to suboptimal performance and usability issues

Methods of graph processing (2/2)
3. Using a single-computer graph algorithm library
→ limits the scale of problems that can be addressed
4. Using an existing parallel graph system
→ such systems do not address fault tolerance or other issues that are important for very large scale distributed systems

What is Pregel
• A scalable graph processing model
- Based on BSP (Bulk Synchronous Parallel)
- Designed for efficient, scalable and fault-tolerant implementation on clusters
- Distribution-related details are hidden behind an abstract API
• Not open-source software
- Apache Giraph is an open-source implementation of Pregel

Bulk Synchronous Parallel
• A bridging model for designing parallel algorithms
• BSP computes in a sequence of supersteps and synchronizes all processes at the end of each superstep

BSP’s algorithm
1. Concurrent computation: each thread processes its own data concurrently and independently
2. Communication: the threads pass messages to one another
3. Barrier synchronisation: each thread waits until all other threads have finished message passing, then the next superstep begins
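
The three BSP phases can be sketched with Python threads. This is only an illustration under stated assumptions, not code from the paper: the ring-shaped message pattern, the worker count and the names `run_bsp`, `worker` and `inboxes` are all made up for the example.

```python
import threading

def run_bsp(num_workers=4, num_supersteps=3):
    """Each worker adds what its left neighbour sent in the previous superstep."""
    values = list(range(num_workers))            # local data, one slot per worker
    inboxes = [[] for _ in range(num_workers)]   # messages awaiting the next superstep
    barrier = threading.Barrier(num_workers)

    def worker(rank):
        for _ in range(num_supersteps):
            # 1. Concurrent computation: consume messages from the previous superstep.
            values[rank] += sum(inboxes[rank])
            inboxes[rank].clear()
            # Extra barrier needed only because this sketch reuses a single inbox:
            # nobody may send until every inbox has been drained.
            barrier.wait()
            # 2. Communication: send the local value to the right-hand neighbour.
            inboxes[(rank + 1) % num_workers].append(values[rank])
            # 3. Barrier synchronisation: wait until every worker has finished sending.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values

print(run_bsp(4, 3))   # [8, 4, 4, 8]
```

The result is deterministic despite the threads, because the barriers keep each superstep's reads and writes apart.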

Pregel’s input and output
• Input: a graph
• Output: a graph
• The computation iterates supersteps, each consisting of a user-defined function and message passing
[Figure: input graph → superstep → superstep → superstep → output graph]

Graph component
• A Pregel graph consists of vertices and edges
• Vertex:
- Consists of a unique identifier and a user-defined value
- Its value and its outgoing edges are modifiable
• Edge:
- Consists of a source vertex, a target vertex and a user-defined value
- The user-defined value is modifiable
- Not a first-class citizen
[Figure: a vertex’s value is modifiable; its outgoing edges and their values are modifiable]
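
State of vertex
• A vertex has two states: active and inactive
• When an inactive vertex receives a message, it changes state to active
• When a vertex votes to halt, it changes state to inactive and stays inactive until it receives a message
[Figure: state machine. Active → Inactive on "vote to halt"; Inactive → Active on "message received"]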

Pregel’s Superstep
1. In superstep S, each active vertex V runs the user-defined function on the messages sent to it in superstep S-1
2. V sends messages to other vertices; these are received in superstep S+1
3. V may modify its own state
4. When all vertices have finished steps 1 to 3, superstep S+1 begins
• The algorithm terminates, and produces its output, when every vertex is inactive and no messages are in transit
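
The superstep loop can be sketched as a small single-process simulation. It is a sketch under assumptions, not Pregel's implementation: `run_supersteps`, the shape of the `compute` callback, and the rule that one message goes to all out-neighbours are simplifications chosen for the example.

```python
def run_supersteps(out_edges, values, compute):
    """Run supersteps until every vertex is inactive and no messages are in transit.

    compute(vertex, value, messages, superstep) returns
    (new_value, message_or_None, vote_to_halt); a message goes to all out-neighbours.
    """
    active = set(out_edges)                      # every vertex starts active
    inbox = {v: [] for v in out_edges}           # messages sent in superstep S-1
    superstep = 0
    while active or any(inbox.values()):
        next_inbox = {v: [] for v in out_edges}  # messages to deliver in S+1
        for v in out_edges:
            if inbox[v]:
                active.add(v)                    # a message reactivates the vertex
            if v not in active:
                continue
            values[v], msg, halt = compute(v, values[v], inbox[v], superstep)
            if msg is not None:
                for target in out_edges[v]:
                    next_inbox[target].append(msg)
            if halt:
                active.discard(v)                # vote to halt
        inbox = next_inbox                       # barrier: S ends, S+1 begins
        superstep += 1
    return values, superstep


def count_in_degree(vertex, value, messages, superstep):
    if superstep == 0:
        return value, 1, True                    # send 1 along every out-edge, then halt
    return value + len(messages), None, True     # count the messages that arrived

edges = {"a": ["b", "c"], "b": ["c"], "c": []}
result, steps = run_supersteps(edges, {v: 0 for v in edges}, count_in_degree)
print(result, steps)   # {'a': 0, 'b': 1, 'c': 2} 2
```

Vertex "a" receives nothing and so is never woken for superstep 1, which shows that halted vertices cost no work.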

Example: maximum value
Vertex values in each superstep:
Superstep 0: 3 6 2 1
Superstep 1: 6 6 2 6
Superstep 2: 6 6 6 6
Superstep 3: 6 6 6 6
[Figure: the four vertices at each superstep, shaded to show which are active and which are inactive]
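
The maximum-value example can be reproduced with a short simulation. The edge set below is an assumption: the figure's edges did not survive in the text, so a topology was chosen that yields the same values in each superstep. The names `propagate_max`, `out_edges` and `history` are illustrative.

```python
def propagate_max(out_edges, initial):
    """Return one snapshot of all vertex values per superstep."""
    values = dict(initial)
    active = set(values)
    inbox = {v: [] for v in values}
    history = []
    superstep = 0
    while active or any(inbox.values()):
        outbox = {v: [] for v in values}
        for v in values:
            if inbox[v]:
                active.add(v)                    # reactivated by a message
            if v not in active:
                continue
            largest = max(inbox[v] + [values[v]])
            if superstep == 0 or largest > values[v]:
                values[v] = largest              # learned a larger value: tell neighbours
                for target in out_edges[v]:
                    outbox[target].append(largest)
            else:
                active.discard(v)                # nothing new: vote to halt
        history.append(list(values.values()))
        inbox = outbox
        superstep += 1
    return history

# Assumed topology: A <-> B, B -> D, C <-> D
out_edges = {"A": ["B"], "B": ["A", "D"], "C": ["D"], "D": ["C"]}
history = propagate_max(out_edges, {"A": 3, "B": 6, "C": 2, "D": 1})
for step, snapshot in enumerate(history):
    print(f"Superstep {step}: {snapshot}")
# Superstep 0: [3, 6, 2, 1]
# Superstep 1: [6, 6, 2, 6]
# Superstep 2: [6, 6, 6, 6]
# Superstep 3: [6, 6, 6, 6]
```

Superstep 3 changes nothing; it is the round in which the last vertex sees no larger value and halts, so the loop ends.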

Vertex class
• Writing a Pregel program involves subclassing the predefined Vertex class
• The Compute() method is executed at each active vertex in every superstep

Message Passing
• The type of message sent by a vertex is specified by the user as a template parameter of the Vertex class
• The order of messages in the iterator is not guaranteed, but every message is guaranteed to be delivered
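
Subclassing a vertex base class and sending messages can be sketched in Python as follows. Pregel's real API is C++; this base class is a hypothetical analogue, and `send_message_to`, `vote_to_halt`, `outbox` and `DegreeVertex` are names assumed for the illustration.

```python
from abc import ABC, abstractmethod

class Vertex(ABC):
    """Hypothetical Python analogue of a Pregel-style vertex base class."""

    def __init__(self, vertex_id, value, out_edges):
        self.vertex_id = vertex_id
        self.value = value
        self.out_edges = dict(out_edges)   # target id -> edge value
        self.halted = False
        self.outbox = []                   # (target id, message) pairs for the next superstep

    @abstractmethod
    def compute(self, messages, superstep):
        """Called once per superstep on every active vertex."""

    def send_message_to(self, target_id, message):
        self.outbox.append((target_id, message))

    def vote_to_halt(self):
        self.halted = True


class DegreeVertex(Vertex):
    """Tells each neighbour how many out-edges this vertex has."""

    def compute(self, messages, superstep):
        if superstep == 0:
            for target in self.out_edges:
                self.send_message_to(target, len(self.out_edges))
        else:
            # Delivery order is unspecified, so use only order-independent operations.
            self.value = sum(messages)
        self.vote_to_halt()

v = DegreeVertex("a", 0, {"b": 1.0, "c": 2.5})
v.compute([], superstep=0)
print(v.outbox)   # [('b', 2), ('c', 2)]
```

Because message order is not guaranteed, `compute` should reduce its messages with operations such as sum, min or max rather than rely on their position.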

Combiners
• Sending a message to a vertex on another machine incurs some overhead
• In some cases, a combiner can reduce the number of messages by merging several messages bound for the same vertex into one
• To enable this, the user subclasses the Combiner class
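
The effect of a combiner can be shown with a small function that merges messages per destination before they leave a worker. This is a sketch, not the Combiner class itself: `combine_outgoing` and the sample messages are hypothetical, and it assumes the combining operation is commutative and associative (here `min`).

```python
def combine_outgoing(messages, combine):
    """Merge messages bound for the same target vertex into a single message.

    messages: iterable of (target_vertex, value) pairs
    combine:  commutative, associative function of two values
    """
    combined = {}
    for target, value in messages:
        if target in combined:
            combined[target] = combine(combined[target], value)
        else:
            combined[target] = value
    return combined

outgoing = [("v1", 7), ("v2", 3), ("v1", 4), ("v1", 9), ("v2", 5)]
merged = combine_outgoing(outgoing, min)
print(merged)   # {'v1': 4, 'v2': 3}
```

Five messages shrink to two. This is valid only when the receiver needs just the combined result, as with the minimum in a shortest-path computation.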

Aggregators (1/2)
• Pregel aggregators are a mechanism for global communication
• Each vertex can provide a value in superstep S, and the aggregated result is made available to all vertices in superstep S+1
[Figure: a sum aggregator counting edges. The vertices provide 4, 2 and 1 in superstep S; every vertex sees 4+2+1 = 7 in superstep S+1]

Aggregators (2/2)
• To define a new aggregator, the user subclasses the predefined Aggregator class
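
A sum aggregator with the one-superstep delay described above might look like this. The class and method names (`SumAggregator`, `aggregate`, `end_superstep`, `visible`) are assumptions for the sketch and do not come from Pregel's Aggregator class.

```python
class SumAggregator:
    """Collects values during superstep S and exposes the total in superstep S+1."""

    def __init__(self):
        self.visible = 0     # total from the previous superstep, readable by every vertex
        self._pending = 0    # contributions collected during the current superstep

    def aggregate(self, value):
        self._pending += value

    def end_superstep(self):
        """Called at the barrier: publish this superstep's total and reset."""
        self.visible, self._pending = self._pending, 0


out_degrees = {"a": 4, "b": 2, "c": 1}
edge_count = SumAggregator()

for degree in out_degrees.values():      # superstep S: each vertex contributes its out-degree
    edge_count.aggregate(degree)
seen_during_s = edge_count.visible       # still the old value

edge_count.end_superstep()               # barrier between S and S+1
seen_in_s_plus_1 = edge_count.visible    # 4 + 2 + 1

print(seen_during_s, seen_in_s_plus_1)   # 0 7
```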

Topology Mutations (1/2)
• Some graph algorithms need to change the graph’s topology
- Clustering algorithms
- Minimum spanning tree algorithms
• The user’s Compute() function can issue requests to add or remove vertices or edges
- Requests issued in the same superstep can conflict with one another

Topology Mutations (2/2)
• Conflicts are resolved by two mechanisms
- Partial ordering: edge removal → vertex removal → vertex addition → edge addition
- Handlers: for the conflicts that remain, the user can define handler methods in the Vertex subclass; otherwise the system picks one request arbitrarily
• Partial ordering yields deterministic results for most conflicts
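
The partial ordering can be illustrated by sorting a superstep's mutation requests before applying them. Everything here is a hypothetical sketch: the request encoding, `apply_mutations`, and the choices to drop all edges touching a removed vertex and to ignore an edge whose endpoint is missing are assumptions, not Pregel behaviour taken from the slides.

```python
# Lower number = applied earlier, following the ordering on the slide.
ORDER = {"remove_edge": 0, "remove_vertex": 1, "add_vertex": 2, "add_edge": 3}

def apply_mutations(vertices, edges, requests):
    """vertices: set of ids; edges: set of (source, target); requests: list of (kind, payload)."""
    # sorted() is stable, so requests of the same kind keep their original order.
    for kind, payload in sorted(requests, key=lambda r: ORDER[r[0]]):
        if kind == "remove_edge":
            edges.discard(payload)
        elif kind == "remove_vertex":
            vertices.discard(payload)
            edges -= {e for e in edges if payload in e}   # assumption: drop incident edges
        elif kind == "add_vertex":
            vertices.add(payload)
        elif kind == "add_edge":
            if payload[0] in vertices and payload[1] in vertices:
                edges.add(payload)                        # assumption: ignore dangling edges
    return vertices, edges

requests = [
    ("add_edge", ("a", "c")),      # issued first, but applied last
    ("add_vertex", "c"),
    ("remove_vertex", "b"),
    ("remove_edge", ("a", "b")),
]
vertices, edges = apply_mutations({"a", "b"}, {("a", "b")}, requests)
print(sorted(vertices), sorted(edges))   # ['a', 'c'] [('a', 'c')]
```

The edge to "c" is requested before "c" exists, yet the outcome is the same for any request order, because additions follow removals and vertices are added before edges.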

Input and output
• Pregel supports many file formats for input and output
- It decouples the task of interpreting an input file from the task of graph computation
- The library provides readers and writers
- Users can write their own by subclassing the Reader and Writer classes
[Figure: file formats A and B → Reader → Compute → Writer → file formats C and D]

Basic architecture (1/2)
• The Pregel library divides a graph into partitions
• The assignment of a vertex to a partition depends solely on the vertex ID
- The default partitioning function is hash(ID) mod N, where N is the number of partitions
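
The default partitioning rule, hash(ID) mod N, is easy to sketch. The choice of CRC32 as the hash is an assumption made so that the result is stable across processes; the slides do not say which hash function Pregel uses, and `partition_of` is a hypothetical name.

```python
import zlib

def partition_of(vertex_id: str, num_partitions: int) -> int:
    """Map a vertex ID to a partition index in [0, num_partitions)."""
    # zlib.crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions

# Any worker can compute the owner of a vertex from its ID alone,
# so no lookup table has to be shared between machines.
owners = {vid: partition_of(vid, 8) for vid in ("page/1", "page/2", "page/3")}
print(owners)
```

Since the mapping depends only on the ID, a worker can route a message without asking the master where the destination vertex lives.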

Basic architecture (2/2)
• The execution of a Pregel program consists of several stages
1. Many copies of the user program begin executing on a cluster of machines. One of these copies acts as the master
2. The master determines how many partitions the graph will have and assigns one or more partitions to each worker
3. The master assigns a portion of the user’s input to each worker
4. The master instructs each worker to perform a superstep

Fault tolerance (1/2)
• Fault tolerance is achieved through checkpointing
• The master instructs workers to save the state of their partitions to persistent storage
- This includes vertex values, edge values and incoming messages
- The master separately saves the aggregator values

Fault tolerance (2/2)
• Worker failures are detected using regular “ping” messages that the master issues to workers
• When one or more workers fail, the master reassigns their graph partitions to the workers that are still available
- The workers reload the state from the most recent checkpoint and repeat the missing supersteps

Worker implementation
• A worker machine maintains the state of its portion of the graph in memory
• There are two copies of the active flag and of the incoming message queue
- One for the current superstep and another for the next superstep
• Message sending follows one of two patterns, depending on whether the destination vertex is remote or local

Master implementation
• The master assigns a unique identifier to each worker at the time of registration
• The master maintains a list of all workers known to be active
• If any worker fails, the master enters recovery mode
• The master runs an HTTP server that displays statistics about the progress of the computation

[1] PageRank
• The PageRank algorithm determines the importance of web pages
• The algorithm is modeled on how papers are evaluated
- A good paper is likely to be cited by many other papers
- A paper that is cited by papers that are themselves cited by many papers is likely to be good
• It is named after one of Google’s founders, Larry Page
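
A vertex-centric PageRank can be sketched as a single-process simulation. The slide's own code did not survive in the text, so this is an illustration under assumptions: a damping factor of 0.85, 30 message-passing supersteps, a tiny hypothetical graph in which every vertex has at least one out-edge, and the name `pagerank`.

```python
def pagerank(out_edges, num_supersteps=30, damping=0.85):
    """out_edges: vertex -> list of targets (every vertex needs at least one out-edge)."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}       # initial value in superstep 0
    inbox = {v: [] for v in out_edges}
    for superstep in range(num_supersteps + 1):
        outbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            if superstep >= 1:
                # Combine the rank shares received from in-neighbours.
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            if superstep < num_supersteps:
                # Split the current rank evenly over the out-edges.
                for target in targets:
                    outbox[target].append(rank[v] / len(targets))
            # In the final superstep a vertex sends nothing and would vote to halt.
        inbox = outbox
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print({v: round(r, 3) for v, r in ranks.items()})
```

With no dangling vertices the ranks keep summing to 1, which is a handy sanity check. Vertex "b" ends up lowest because only half of "a"'s rank flows into it.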

[2] Shortest Path (1/6)
• Shortest-path problem: find the shortest path between two given nodes of a weighted graph
• There are several variants of the shortest-path problem
- The single-source shortest paths problem
- The s-t shortest path problem
- The all-pairs shortest paths problem
• The paper focuses on the single-source shortest paths problem

[2] Shortest Path (2/6 to 6/6)
[Figure: a weighted graph with seven vertices, showing each vertex’s tentative distance from the source in every superstep]
Tentative distances, listed in the same vertex order for every superstep:
Superstep 0: ∞ ∞ 0 ∞ ∞ ∞ ∞
Superstep 1: 5 ∞ 0 3 ∞ ∞ ∞
Superstep 2: 4 6 0 3 6 ∞ 5
Superstep 3: 4 5 0 3 6 9 5
Edge weights shown in the figure: 5, 3, 1, 4, 3, 2, 1, 2, 4
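
Single-source shortest paths in the vertex-centric style can be sketched as below. The figure's topology is not recoverable from the text, so the graph here is hypothetical, as are the name `shortest_paths` and the trick of seeding the source with a message of 0 instead of giving it a special initial value.

```python
import math

def shortest_paths(out_edges, source):
    """out_edges: vertex -> {target: weight}. Returns vertex -> distance from source."""
    dist = {v: math.inf for v in out_edges}
    inbox = {v: [] for v in out_edges}
    inbox[source].append(0)            # assumption: stands in for the source's distance of 0
    while any(inbox.values()):
        outbox = {v: [] for v in out_edges}
        for v, messages in inbox.items():
            if not messages:
                continue               # inactive: nothing received this superstep
            candidate = min(messages)
            if candidate < dist[v]:
                dist[v] = candidate    # found a shorter path: tell the neighbours
                for target, weight in out_edges[v].items():
                    outbox[target].append(candidate + weight)
            # Either way the vertex now votes to halt until another message arrives.
        inbox = outbox
    return dist

graph = {
    "s": {"a": 5, "b": 3},
    "a": {"c": 1},
    "b": {"a": 1, "c": 4},
    "c": {},
    "d": {"s": 2},                     # cannot be reached from "s"
}
distances = shortest_paths(graph, "s")
print(distances)   # {'s': 0, 'a': 4, 'b': 3, 'c': 5, 'd': inf}
```

As on the slides, a tentative distance can improve in a later superstep: "a" is first reached at cost 5 and then lowered to 4 via "b".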

Experiment details
• Three experiments with the single-source shortest paths algorithm
• Run on a cluster of 300 multicore commodity PCs
• Runtimes are reported for binary trees and log-normal random graphs
- Binary tree, varying the number of worker tasks
- Binary tree, varying the graph size
- Log-normal random graphs, varying the graph size

[1] 1-billion-vertex binary tree: varying number of worker tasks
• Setting
- A billion vertices, with the number of Pregel workers varying from 50 to 800
• Result
- Using 16 times as many workers yields a speedup of about 10
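
The reported result can be read as a parallel efficiency, that is, the achieved speedup divided by the ideal linear speedup. The helper below is only a worked calculation on the two figures quoted above; `parallel_efficiency` is a hypothetical name.

```python
def parallel_efficiency(speedup: float, worker_ratio: float) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / worker_ratio

worker_ratio = 800 / 50                 # 16 times as many workers
efficiency = parallel_efficiency(10, worker_ratio)
print(efficiency)                       # 0.625
```

A speedup of about 10 on 16 times the workers thus corresponds to roughly 60% of linear scaling.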

[2] Binary tree: varying graph sizes on 800 worker tasks
• Setting
- Graph size varying from a billion to 50 billion vertices, using a fixed number of 800 worker tasks
• Result
- As the tree size grows from a billion to 50 billion vertices, the runtime increases from 17.3 to 702 seconds

[3] Log-normal random graphs: varying graph sizes on 800 worker tasks (1/2)
• Binary trees are not representative of graphs encountered in practice
• The experiment uses a log-normal distribution of outdegrees
$$p(d) = \frac{1}{\sqrt{2\pi}\,\sigma d}\, e^{-(\ln d - \mu)^2 / 2\sigma^2}$$
• In this experiment, μ = 4 and σ = 1.3
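
The log-normal outdegree density and its mean can be evaluated directly. This is a small sketch using the parameters quoted on the slide; the function name `outdegree_density` is hypothetical, and the mean uses the standard closed form exp(μ + σ²/2) for a log-normal distribution.

```python
import math

MU, SIGMA = 4.0, 1.3   # parameters quoted on the slide

def outdegree_density(d: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """Log-normal density p(d) for an outdegree d > 0."""
    exponent = -((math.log(d) - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (math.sqrt(2 * math.pi) * sigma * d)

mean_outdegree = math.exp(MU + SIGMA ** 2 / 2)   # closed-form mean of a log-normal
print(round(mean_outdegree, 1))
```

With these parameters the mean outdegree comes out at about 127, so such graphs are far denser than the binary trees used in the first two experiments.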

[3] Log-normal random graphs: varying graph sizes on 800 worker tasks (2/2)
• Setting
- Graph size varying from 10 million to a billion vertices
• Result
- The largest graph took a little over 10 minutes

Conclusion
• The authors propose a computing model that is suited to graph processing and is scalable and fault-tolerant
• They state that programmers can implement graph processing algorithms easily with Pregel
Más contenido relacionado

La actualidad más candente

Divide and conquer - Quick sort
Divide and conquer - Quick sortDivide and conquer - Quick sort
Divide and conquer - Quick sortMadhu Bala
 
Message queue architecture
Message queue architectureMessage queue architecture
Message queue architectureMajdee Zoabi
 
Operating system 30 preemptive scheduling
Operating system 30 preemptive schedulingOperating system 30 preemptive scheduling
Operating system 30 preemptive schedulingVaibhav Khanna
 
Merge sort analysis and its real time applications
Merge sort analysis and its real time applicationsMerge sort analysis and its real time applications
Merge sort analysis and its real time applicationsyazad dumasia
 
L attribute in compiler design
L  attribute in compiler designL  attribute in compiler design
L attribute in compiler designkhush_boo31
 
Lexical analyzer generator lex
Lexical analyzer generator lexLexical analyzer generator lex
Lexical analyzer generator lexAnusuya123
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop DataWorks Summit/Hadoop Summit
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platformsSyed Zaid Irshad
 
Ch 18 intro to network layer - section 3
Ch 18   intro to network layer - section 3Ch 18   intro to network layer - section 3
Ch 18 intro to network layer - section 3Hossam El-Deen Osama
 
Recurrence relation solutions
Recurrence relation solutionsRecurrence relation solutions
Recurrence relation solutionssubhashchandra197
 

La actualidad más candente (20)

Microkernel
MicrokernelMicrokernel
Microkernel
 
Divide and conquer - Quick sort
Divide and conquer - Quick sortDivide and conquer - Quick sort
Divide and conquer - Quick sort
 
Message queue architecture
Message queue architectureMessage queue architecture
Message queue architecture
 
Min-Max algorithm
Min-Max algorithmMin-Max algorithm
Min-Max algorithm
 
Operating system 30 preemptive scheduling
Operating system 30 preemptive schedulingOperating system 30 preemptive scheduling
Operating system 30 preemptive scheduling
 
Merge sort analysis and its real time applications
Merge sort analysis and its real time applicationsMerge sort analysis and its real time applications
Merge sort analysis and its real time applications
 
L attribute in compiler design
L  attribute in compiler designL  attribute in compiler design
L attribute in compiler design
 
2. microkernel new
2. microkernel new2. microkernel new
2. microkernel new
 
Lexical analyzer generator lex
Lexical analyzer generator lexLexical analyzer generator lex
Lexical analyzer generator lex
 
Congestion control in TCP
Congestion control in TCPCongestion control in TCP
Congestion control in TCP
 
Parallel Processing Concepts
Parallel Processing Concepts Parallel Processing Concepts
Parallel Processing Concepts
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
LISP: Introduction to lisp
LISP: Introduction to lispLISP: Introduction to lisp
LISP: Introduction to lisp
 
Error control
Error controlError control
Error control
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platforms
 
Sequential consistency model
Sequential consistency modelSequential consistency model
Sequential consistency model
 
Ch 18 intro to network layer - section 3
Ch 18   intro to network layer - section 3Ch 18   intro to network layer - section 3
Ch 18 intro to network layer - section 3
 
Ch5 answers
Ch5 answersCh5 answers
Ch5 answers
 
Recurrence relation solutions
Recurrence relation solutionsRecurrence relation solutions
Recurrence relation solutions
 

Similar a Pregel reading circle

Mining quasi bicliques using giraph
Mining quasi bicliques using giraphMining quasi bicliques using giraph
Mining quasi bicliques using giraphHsiao-Fei Liu
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationAhmad El Tawil
 
Cloud computing_processing frameworks
Cloud computing_processing frameworksCloud computing_processing frameworks
Cloud computing_processing frameworksReem Abdel-Rahman
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce scriptHaripritha
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfTSANKARARAO
 
MapReduce Programming Model
MapReduce Programming ModelMapReduce Programming Model
MapReduce Programming ModelAdarshaDhakal
 
Large Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache GiraphLarge Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache Giraphsscdotopen
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance IssuesAntonios Katsarakis
 
Hpg2011 papers kazakov
Hpg2011 papers kazakovHpg2011 papers kazakov
Hpg2011 papers kazakovmistercteam
 
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Universal Graph Transformer ...
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Universal Graph Transformer ...NS-CUK Joint Journal Club: V.T.Hoang, Review on "Universal Graph Transformer ...
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Universal Graph Transformer ...ssuser4b1f48
 
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsC-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsQian Lin
 

Similar a Pregel reading circle (20)

Mining quasi bicliques using giraph
Mining quasi bicliques using giraphMining quasi bicliques using giraph
Mining quasi bicliques using giraph
 
Pregel
PregelPregel
Pregel
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Cloud computing_processing frameworks
Cloud computing_processing frameworksCloud computing_processing frameworks
Cloud computing_processing frameworks
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce script
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduce
 
MapReduce Programming Model
MapReduce Programming ModelMapReduce Programming Model
MapReduce Programming Model
 
PREDIcT
PREDIcTPREDIcT
PREDIcT
 
Large Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache GiraphLarge Scale Graph Processing with Apache Giraph
Large Scale Graph Processing with Apache Giraph
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
MapReduce
MapReduceMapReduce
MapReduce
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
Hpg2011 papers kazakov
Hpg2011 papers kazakovHpg2011 papers kazakov
Hpg2011 papers kazakov
 
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Universal Graph Transformer ...
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Universal Graph Transformer ...NS-CUK Joint Journal Club: V.T.Hoang, Review on "Universal Graph Transformer ...
NS-CUK Joint Journal Club: V.T.Hoang, Review on "Universal Graph Transformer ...
 
Map reduce
Map reduceMap reduce
Map reduce
 
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsC-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
 
try
trytry
try
 

Último

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Último (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Pregel reading circle

  • 1. Pregel: A System for Large- Scale Graph Processing 2014 / 5 /14 Ishikawa Yasutaka
  • 2. About this Paper • Authers:Malewicz, GrzegorzAustern, Matthew HBik, Aart J.CDehnert, James CHorn, IlanLeiser, NatyCzajkowski, Grzegorz • Google’s paper • Proceedings of the 2010 international conference on Management of data - SIGMOD '10 2
  • 3. Outline • Introduction • Model of computation • Pregel’s API • Implementation • Application • Experiments • conclusion 3
  • 4. Outline • Introduction • Model of computation • Pregel’s API • Implementation • Application • Experiments • conclusion 4
  • 5. Today’s problems of graph processing • Poor locality of memory access • Very little work ver vertex 5
  • 6. Methods of graph processing…(1/2) 1. Crafting a custom distributed infrastructure →typically requiring a substantial implementation effort 2. Relying on an existing distributed computing platform(e.g.,MapReduce) →this can lead to suboptimal performance and usability issues. 6
  • 7. Methods of graph processing…(2/2) 3. Using a single-computer graph algorithm library →limiting the scale of problems 4. Using an existing parallel graph system →do not address fault tolerance or other issues that are important for very large scale distributed systems 7
  • 8. What is Pregel • Scalable graph processing model - Based on BSP(Bulk Synchronous Parallel) - Designed for efficient,scalable and fault- tolerant Implementation on clusters - Distribution-related details are hidden behind an abstract API • Not open source software - Apach Giraph is a open source software implementation of Pregel 8
  • 9. Bulk Synchronous Parallel • Bridging model for designing parallel algorithms • BSP computes in a sequence of supersteps and synchronizes all processes at the end of each superstep
  • 10. BSP’s algorithm (1/3) 1. Concurrent computation 2. Communication 3. Barrier synchronisation • Step 1: each thread processes its own data concurrently and independently
  • 11. BSP’s algorithm (2/3) 1. Concurrent computation 2. Communication 3. Barrier synchronisation • Step 2: the threads pass messages to each other
  • 12. BSP’s algorithm (3/3) 1. Concurrent computation 2. Communication 3. Barrier synchronisation • Step 3: each thread waits until all other threads have completed their message passing, then the next superstep begins
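The three BSP phases can be sketched with ordinary threads. This is only an illustration under assumptions, not how Pregel itself is implemented: the function name `run_bsp`, the ring-shaped communication pattern and the double-buffered inboxes are all invented for this example.

```python
import threading


def run_bsp(initial, supersteps):
    """Toy BSP run: one thread per value, arranged in a ring (hypothetical setup)."""
    n = len(initial)
    values = list(initial)
    # Double buffering: messages written in this superstep are read in the next one.
    state = {"inbox": [[] for _ in range(n)], "outbox": [[] for _ in range(n)]}

    def swap_buffers():
        # Runs once per superstep, before any thread is released from the barrier.
        state["inbox"] = state["outbox"]
        state["outbox"] = [[] for _ in range(n)]

    barrier = threading.Barrier(n, action=swap_buffers)

    def worker(i):
        for _ in range(supersteps):
            # 1. Concurrent computation on local data and last superstep's messages.
            values[i] += sum(state["inbox"][i])
            # 2. Communication: send the new value to the right-hand neighbour.
            state["outbox"][(i + 1) % n].append(values[i])
            # 3. Barrier synchronisation: wait until every thread has sent.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values


if __name__ == "__main__":
    print(run_bsp([1, 2, 3], supersteps=2))
```

Because messages are only read after the barrier, the result does not depend on thread scheduling.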
  • 14. Pregel’s input and output • Input: a graph • Output: a graph • The computation iterates supersteps, each consisting of a user-defined function and message passing • Figure: input graph → superstep → superstep → superstep → output graph
  • 15. Graph components • A Pregel graph consists of vertices and edges • Vertex: - Consists of a unique identifier and a user-defined value - Its value and its outgoing edges are modifiable • Edge: - Consists of a source vertex, a target vertex and a user-defined value - The user-defined value is modifiable - Not a first-class citizen • Figure: vertex values are modifiable; outgoing edges and edge values are modifiable
  • 16. State of a vertex • A vertex has two states: Active and Inactive • A vertex becomes Inactive when it votes to halt • An Inactive vertex becomes Active again when it receives a message • Figure: Active → Inactive (vote to halt); Inactive → Active (message received)
  • 17. Pregel’s superstep 1. In superstep S, each active vertex V runs the user-defined function on the messages sent to it in superstep S-1 2. V sends messages to other vertices, which will be received in superstep S+1 3. V may modify its own state 4. When all vertices have finished steps 1 to 3, go to superstep S+1 • When no vertex changes in a superstep (all vertices have voted to halt and no messages are in transit), the algorithm terminates with its output
  • 18. Example: maximum value (1/4) • Figure (vertex values; legend: active / inactive) • Superstep 0: 3 6 2 1
  • 19. Example: maximum value (2/4) • Superstep 0: 3 6 2 1 • Superstep 1: 6 6 2 6
  • 20. Example: maximum value (3/4) • Superstep 0: 3 6 2 1 • Superstep 1: 6 6 2 6 • Superstep 2: 6 6 6 6
  • 21. Example: maximum value (4/4) • Superstep 0: 3 6 2 1 • Superstep 1: 6 6 2 6 • Superstep 2: 6 6 6 6 • Superstep 3: 6 6 6 6
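The maximum-value example can be replayed with a small simulation of the superstep loop. The transcript keeps only the vertex values, not the edges of the figure, so the edge set below is an assumption chosen to be consistent with the values shown; the function name `max_value_supersteps` is also made up for this sketch.

```python
def max_value_supersteps(values, out_edges):
    """Propagate the maximum value; returns the vertex values after each superstep."""
    values = dict(values)
    # Superstep 0: every vertex is active and sends its value along its out-edges.
    history = [dict(values)]
    inbox = {v: [] for v in values}
    for v, val in values.items():
        for target in out_edges[v]:
            inbox[target].append(val)

    # Later supersteps: only vertices that received messages run.
    while any(inbox.values()):
        next_inbox = {v: [] for v in values}
        for v, messages in inbox.items():
            if not messages:
                continue  # inactive and no message: not executed
            best = max(messages)
            if best > values[v]:
                values[v] = best  # value changed: stay active and tell neighbours
                for target in out_edges[v]:
                    next_inbox[target].append(best)
            # otherwise the vertex votes to halt by sending nothing
        history.append(dict(values))
        inbox = next_inbox
    return history


if __name__ == "__main__":
    # Assumed topology (not recoverable from the transcript).
    edges = {"A": ["B"], "B": ["A", "D"], "C": ["B", "D"], "D": ["C"]}
    start = {"A": 3, "B": 6, "C": 2, "D": 1}
    for step, snapshot in enumerate(max_value_supersteps(start, edges)):
        print("Superstep", step, list(snapshot.values()))
```

The loop ends when a superstep produces no messages, which is the point where every vertex has voted to halt.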
  • 23. Vertex class • Writing a Pregel program involves subclassing the predefined Vertex class • The Compute() method is executed at each active vertex in every superstep
  • 24. Message passing • The type of the messages a vertex sends is specified by the user as a template parameter of the Vertex class • There is no guaranteed order of messages in the iterator, but it is guaranteed that messages will be delivered
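To make the "subclass Vertex and override Compute()" idea concrete, here is a Python analogue. It is a sketch only: Pregel's real API is a C++ template class, and the names `Vertex`, `send_message_to`, `vote_to_halt`, `InDegreeVertex`, `run` and `in_degrees` below are hypothetical stand-ins written for this example.

```python
class Vertex:
    """Hypothetical base class; a user program subclasses it and overrides compute()."""

    def __init__(self, vertex_id, value, out_edges):
        self.id = vertex_id
        self.value = value
        self.out_edges = list(out_edges)
        self.active = True
        self.outbox = []  # (target_id, message) pairs produced in this superstep

    def compute(self, messages, superstep):
        raise NotImplementedError

    def send_message_to(self, target_id, message):
        self.outbox.append((target_id, message))

    def vote_to_halt(self):
        self.active = False


class InDegreeVertex(Vertex):
    """Each vertex ends up holding the number of edges pointing at it."""

    def compute(self, messages, superstep):
        if superstep == 0:
            for target in self.out_edges:
                self.send_message_to(target, 1)
        else:
            self.value = sum(messages)  # message order does not matter here
        self.vote_to_halt()


def run(vertices):
    """Minimal single-process superstep loop (no partitions, no fault tolerance)."""
    inbox = {v.id: [] for v in vertices}
    superstep = 0
    while True:
        for v in vertices:
            if inbox[v.id]:
                v.active = True  # a message reactivates a halted vertex
        runnable = [v for v in vertices if v.active]
        if not runnable:
            return superstep
        for v in runnable:
            v.compute(inbox[v.id], superstep)
        inbox = {v.id: [] for v in vertices}
        for v in vertices:
            for target, message in v.outbox:
                inbox[target].append(message)
            v.outbox = []
        superstep += 1


def in_degrees(graph):
    vertices = [InDegreeVertex(vid, 0, targets) for vid, targets in graph.items()]
    run(vertices)
    return {v.id: v.value for v in vertices}


if __name__ == "__main__":
    print(in_degrees({"a": ["b", "c"], "b": ["c"], "c": []}))
```

The user code lives entirely in `compute()`; the loop in `run` stands in for what the framework would do.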
  • 25. Combiners • Sending a message to a vertex on another machine incurs some overhead • In some cases, combiners can reduce the number of messages • To enable this, the user subclasses the Combiner class • Figure: reduction of messages
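A combiner can be pictured as a function that merges all messages bound for the same destination before they leave the machine. The helper below is a hypothetical sketch (Pregel's actual Combiner is a class the user subclasses); it assumes the combining operation is commutative and associative, such as `min` or a sum.

```python
def combine_outgoing(messages, combine):
    """Merge (target, value) messages per target using a two-argument combine function."""
    combined = {}
    for target, value in messages:
        if target in combined:
            combined[target] = combine(combined[target], value)
        else:
            combined[target] = value
    return sorted(combined.items())


if __name__ == "__main__":
    outgoing = [("v1", 5), ("v2", 7), ("v1", 3), ("v1", 9)]
    # Four messages shrink to two when only the minimum per target matters.
    print(combine_outgoing(outgoing, min))
```

With `min` as the combiner, a shortest-path style computation would send one message per destination vertex instead of one per edge.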
  • 26. Aggregators (1/2) • Pregel aggregators are a mechanism for global communication • Each vertex can provide a value in superstep S, and the combined value is made available to all vertices in superstep S+1 • Figure: sum aggregator counting the number of edges; in superstep S the vertices provide 4, 2, 1, …, and in superstep S+1 every vertex sees 7 (4+2+1…)
  • 27. Aggregators (2/2) • To define a new aggregator, a user subclasses the predefined Aggregator class
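The one-superstep delay of an aggregator can be sketched as follows. The class name `SumAggregator` and its methods are invented for this illustration and are not Pregel's Aggregator interface; the values 4, 2 and 1 are the ones shown in the slide's figure.

```python
class SumAggregator:
    """Hypothetical sum aggregator: values given in superstep S are visible in S+1."""

    def __init__(self):
        self._collecting = 0  # contributions made during the current superstep
        self._visible = 0     # result of the previous superstep

    def aggregate(self, value):
        self._collecting += value

    def end_superstep(self):
        # Called once at the barrier between two supersteps.
        self._visible = self._collecting
        self._collecting = 0

    def get(self):
        return self._visible


if __name__ == "__main__":
    edge_count = SumAggregator()
    for out_degree in (4, 2, 1):      # superstep S: each vertex reports its out-degree
        edge_count.aggregate(out_degree)
    edge_count.end_superstep()
    print(edge_count.get())           # superstep S+1: every vertex reads the same total
```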
  • 28. Topology mutations (1/2) • Some graph algorithms need to change the graph’s topology - Clustering algorithms - Minimum spanning tree algorithms • The user’s Compute() function can issue requests to add or remove vertices or edges - Multiple requests issued in the same superstep can conflict with each other
  • 29. Topology mutations (2/2) • Conflicts are resolved using two mechanisms - Partial ordering: edge removal → vertex removal → vertex addition → edge addition - Handlers: by default one of the conflicting requests is picked arbitrarily; users can define handler methods in their Vertex subclass • Partial ordering yields deterministic results for most conflicts
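The partial ordering can be sketched by sorting the mutation requests of a superstep before applying them. Everything here is an assumption made for illustration: the request format, the name `apply_mutations`, and the simplification that removing a vertex also drops every edge touching it. Handlers for the remaining conflicts are not modelled.

```python
# Lower number = applied earlier, following the order given on the slide.
MUTATION_ORDER = {"remove_edge": 0, "remove_vertex": 1, "add_vertex": 2, "add_edge": 3}


def apply_mutations(vertices, edges, requests):
    """Apply (kind, argument) requests to a vertex set and an edge set in partial order."""
    vertices, edges = set(vertices), set(edges)
    # sorted() is stable, so requests of the same kind keep their issue order.
    for kind, arg in sorted(requests, key=lambda request: MUTATION_ORDER[request[0]]):
        if kind == "remove_edge":
            edges.discard(arg)
        elif kind == "remove_vertex":
            vertices.discard(arg)
            edges = {(src, dst) for src, dst in edges if arg not in (src, dst)}
        elif kind == "add_vertex":
            vertices.add(arg)
        elif kind == "add_edge":
            edges.add(arg)
    return vertices, edges


if __name__ == "__main__":
    requests = [
        ("add_edge", ("c", "a")),   # issued first, but applied last
        ("add_vertex", "c"),
        ("remove_vertex", "b"),
        ("remove_edge", ("a", "b")),
    ]
    print(apply_mutations({"a", "b"}, {("a", "b")}, requests))
```

Because additions run after removals, the edge to the new vertex `c` is only added once `c` exists, regardless of the order in which the requests were issued.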
  • 30. Input and output • Pregel supports many file formats for input and output - It decouples the task of interpreting an input file from the task of graph computation - The library provides readers and writers - Users can write their own by subclassing Reader and Writer • Figure: file formats A, B → Reader → Compute → Writer → file formats C, D
  • 32. Basic architecture (1/2) • The Pregel library divides a graph into partitions • Assignment of a vertex to a partition depends solely on the vertex ID - The default partitioning function is hash(ID) mod N, where N is the number of partitions
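The default partitioning rule, hash(ID) mod N, is easy to sketch. The choice of CRC32 as the hash is an assumption made here so that the result is stable across runs (Python's built-in `hash` is randomised for strings); the slide does not say which hash function Pregel uses, and the helper names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_of(vertex_id, num_partitions):
    """Map a vertex ID to a partition index in [0, num_partitions)."""
    digest = zlib.crc32(str(vertex_id).encode("utf-8"))  # stand-in hash function
    return digest % num_partitions


def partition_graph(vertex_ids, num_partitions):
    """Group vertex IDs by partition; depends only on the IDs, not on the edges."""
    partitions = defaultdict(list)
    for vertex_id in vertex_ids:
        partitions[partition_of(vertex_id, num_partitions)].append(vertex_id)
    return dict(partitions)


if __name__ == "__main__":
    print(partition_graph(range(10), num_partitions=3))
```

Since the partition depends only on the ID, any worker can work out which partition owns a vertex without a lookup table.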
  • 33. Basic architecture (2/2) • The execution of a Pregel program consists of several stages 1. Many copies of the user program begin executing on a cluster of machines; one of them acts as the master 2. The master determines how many partitions the graph will have and assigns partitions to each worker 3. The master assigns a portion of the user’s input to each worker 4. The master instructs each worker to perform a superstep
  • 34. Fault tolerance (1/2) • Fault tolerance is achieved through checkpointing • The master instructs workers to save the state of their partitions to persistent storage - Including vertex values, edge values and incoming messages - The master separately saves the aggregator values
  • 35. Fault tolerance (2/2) • Worker failures are detected using regular “ping” messages that the master issues to workers • When one or more workers fail, the master reassigns their graph partitions to the currently available workers - The workers reload partition state from the most recent checkpoint and repeat the missing supersteps
  • 36. Worker implementation • A worker machine maintains the state of its portion of the graph in memory • There are two copies of the active flags and the incoming message queue: one for the current superstep and one for the next superstep • Message sending follows one of two patterns, depending on whether the destination vertex is remote or local
  • 37. Master implementation • The master assigns a unique identifier to each worker at the time of registration • The master maintains a list of all workers known to be active • If any worker fails, the master enters recovery mode • The master runs an HTTP server that displays statistics about the progress of the computation
  • 39. [1] PageRank (1/2) • The PageRank algorithm determines the importance of web pages • The algorithm is based on how academic papers are evaluated - A good paper is likely to be cited by many other papers - A paper cited by papers that are themselves cited by many papers is likely to be a good paper • It is named after one of Google’s founders, Larry Page
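The second PageRank slide is not part of this transcript, so the following is only a sketch of how the citation idea above can be written in a vertex-centric style. The damping factor 0.85, the fixed 30 supersteps and the function name `pagerank` are assumptions of this example, and it further assumes that every vertex has at least one outgoing edge.

```python
def pagerank(out_edges, supersteps=30, damping=0.85):
    """Vertex-centric PageRank sketch; out_edges maps each vertex to its targets."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(supersteps):
        # Each vertex sends rank / out-degree along every out-edge;
        # incoming messages are summed, as a sum combiner would do.
        incoming = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():
            share = rank[v] / len(targets)
            for target in targets:
                incoming[target] += share
        # Each vertex then recomputes its value from the summed messages.
        rank = {v: (1.0 - damping) / n + damping * incoming[v] for v in out_edges}
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

A page that is linked from well-ranked pages receives larger shares, which mirrors the "cited by well-cited papers" intuition.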
  • 41. [2] Shortest path (1/6) • Shortest-path problem: find the shortest path between two given nodes of a weighted graph • There are several variants of the shortest-path problem - The single-source shortest paths problem - The s-t shortest path problem - The all-pairs shortest paths problem • This paper focuses on the single-source shortest paths problem
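  • 42. [2] Shortest path (2/6) • Figure, superstep 0 • Vertex distances: ∞ ∞ 0 ∞ ∞ ∞ ∞ • Edge weights: 5 3 1 4 3 2 1 2 4
  • 43. [2] Shortest path (3/6) • Figure, superstep 1 • Vertex distances: 5 ∞ 0 3 ∞ ∞ ∞ • Edge weights: 5 3 1 4 3 2 1 2 4
  • 44. [2] Shortest path (4/6) • Figure, superstep 2 • Vertex distances: 4 6 0 3 6 ∞ 5 • Edge weights: 5 3 1 4 3 2 1 2 4
  • 45. [2] Shortest path (5/6) • Figure, superstep 3 • Vertex distances: 4 5 0 3 6 9 5 • Edge weights: 5 3 1 4 3 2 1 2 4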
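The single-source shortest paths computation can be sketched in the same superstep style. The graph in the figure cannot be reconstructed from the transcript, so the small graph below is a made-up example, and `sssp` is a hypothetical name. The sketch assumes non-negative edge weights.

```python
import math


def sssp(edges, source):
    """Superstep-style single-source shortest paths.

    edges maps each vertex to a list of (target, weight) pairs.
    """
    dist = {v: math.inf for v in edges}
    inbox = {source: [0]}  # the source starts with a candidate distance of 0
    while inbox:
        next_inbox = {}
        for v, candidates in inbox.items():
            best = min(candidates)  # a min combiner would deliver just this value
            if best < dist[v]:
                dist[v] = best
                # Improved distance: offer new candidates to the neighbours.
                for target, weight in edges[v]:
                    next_inbox.setdefault(target, []).append(best + weight)
            # no improvement: the vertex votes to halt and sends nothing
        inbox = next_inbox
    return dist


if __name__ == "__main__":
    graph = {
        "s": [("a", 5), ("b", 3)],
        "a": [("c", 1)],
        "b": [("a", 1), ("c", 6)],
        "c": [],
    }
    print(sssp(graph, "s"))
```

As in the slides, a distance can be lowered more than once: vertex `a` first gets 5 directly from the source and is later improved to 4 via `b`.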
  • 48. Experiment details • Three experiments with the single-source shortest paths implementation • Run on a cluster of 300 multicore commodity PCs • Runtimes are reported for binary trees and log-normal random graphs - Binary tree, varying the number of worker tasks - Binary tree, varying the graph size - Log-normal random graphs, varying the graph size
  • 49. [1] 1-billion-vertex binary tree: varying number of worker tasks • Setting - A billion vertices, with the number of Pregel workers varying from 50 to 800 • Result - Using 16 times as many workers yields a speedup of about 10
  • 50. [2] Binary tree: varying graph sizes on 800 worker tasks • Setting - Graph size varying from a billion to 50 billion vertices, using a fixed number of 800 worker tasks • Result - As the tree grows from a billion to 50 billion vertices, the runtime increases from 17.3 to 702 seconds
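A quick piece of arithmetic helps to read the two binary-tree results. The helper names below are made up for this sketch; the inputs are the figures quoted on the slides (16 times the workers for a speedup of about 10, and 17.3 versus 702 for a 50-fold larger graph).

```python
def parallel_efficiency(speedup, worker_ratio):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return speedup / worker_ratio


def growth_ratio(small, large):
    """How many times larger the second measurement is than the first."""
    return large / small


if __name__ == "__main__":
    # 50 -> 800 workers is a factor of 16; the reported speedup is about 10.
    print("efficiency:", parallel_efficiency(10, 800 / 50))
    # 1 -> 50 billion vertices is a factor of 50; runtime goes from 17.3 to 702.
    print("runtime growth:", round(growth_ratio(17.3, 702), 1))
```

So the speedup is roughly 60% of linear, and the runtime grows by a factor of about 41 for a 50-fold larger tree.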
  • 51. [3] Log-normal random graphs: varying graph sizes on 800 worker tasks (1/2) • Binary trees are not representative of graphs encountered in practice • Use a log-normal distribution of outdegrees: $p(d) = \frac{1}{\sqrt{2\pi}\,\sigma d}\, e^{-(\ln d - \mu)^2 / 2\sigma^2}$ • In this experiment, μ = 4, σ = 1.3
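To get a feel for these parameters, the log-normal density and its mean can be evaluated directly. The function names are hypothetical; the closed form for the mean, exp(μ + σ²/2), is the standard result for a log-normal distribution.

```python
import math


def lognormal_pdf(d, mu, sigma):
    """Log-normal density p(d) for an outdegree d > 0."""
    exponent = -((math.log(d) - mu) ** 2) / (2.0 * sigma ** 2)
    return math.exp(exponent) / (math.sqrt(2.0 * math.pi) * sigma * d)


def lognormal_mean(mu, sigma):
    """Mean of a log-normal distribution: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2.0)


if __name__ == "__main__":
    mu, sigma = 4.0, 1.3
    print("mean outdegree:", round(lognormal_mean(mu, sigma), 1))
    print("density at d = 50:", lognormal_pdf(50, mu, sigma))
```

With μ = 4 and σ = 1.3 the mean outdegree comes out at roughly 127, far above the median of e⁴ ≈ 55, which reflects the heavy right tail of the distribution.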
  • 52. [3] Log-normal random graphs: varying graph sizes on 800 worker tasks (2/2) • Setting - Graph size varying from 10 million to a billion vertices • Result - The largest graph took a little over 10 minutes
  • 54. Conclusion • The authors propose a computing model that is suitable for graph processing and is scalable and fault-tolerant • They report that programmers can implement graph processing algorithms easily with Pregel