   Resource Sharing
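       When processes at different sites are connected, user A can use user B's resources
   Computation Speedup
       A hard, complex problem is divided among several processors that work on it together
   Reliability
       Each processor has its own independent memory, so the failure of one processor need not stop the others
   Communication
       Any connected users can communicate and consult with one another over the network
   Data Migration
       Site A ─Data→ Site B
       How much data is transferred depends on need, but the format must be consistent to avoid losing data
   Computation Migration
       The user sends a command over the network to a remote processor
       The remote processor executes it using its local resources
       The result is then returned to the user
   Process Migration
       A process is sent over the network to execute remotely. Reasons for doing so: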
           Load Balancing
           Computation Speedup
           Hardware / Software Preference
           Data Access
C Socket for Windows
   Server.c
#include <stdio.h>
#include <winsock2.h>   // link with ws2_32.lib

int main() {
  SOCKET server_sockfd, client_sockfd;
  int server_len, client_len;
  struct sockaddr_in server_address, client_address;
  // Register the Winsock DLL
  WSADATA wsadata;
  if (WSAStartup(MAKEWORD(2, 2), &wsadata) != 0) {
    printf("WSAStartup failed\n");
    return 1;
  }
  // Create the server socket
  // AF_INET (IPv4); SOCK_STREAM; 0 (i.e. TCP)
  server_sockfd = socket(AF_INET, SOCK_STREAM, 0);

  server_address.sin_family = AF_INET;
  server_address.sin_addr.s_addr = inet_addr("");   // the address string is empty in the slide; put the server's IPv4 address here
  server_address.sin_port = htons(1234);            // port in network byte order
  server_len = sizeof(server_address);

  bind(server_sockfd, (struct sockaddr *)&server_address, server_len);

  listen(server_sockfd, 5);   // 5 = length of the pending-connection queue

  while (1) {
    char ch;
    printf("Server waiting...\n");
    client_len = sizeof(client_address);
    client_sockfd = accept(server_sockfd, (struct sockaddr *)&client_address, &client_len);
    recv(client_sockfd, &ch, 1, 0);       // receive 'A'
    ch++;                                 // 'A' -> 'B'
    send(client_sockfd, &ch, 1, 0);       // send 'B'
    closesocket(client_sockfd);           // done with this client
  }
}
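C Socket for Windows
   Client.c
#include <stdio.h>
#include <winsock2.h>   // link with ws2_32.lib

int main() {
  SOCKET sockfd;
  int len, result;
  struct sockaddr_in address;
  char ch = 'A';
  // Register the Winsock DLL
  WSADATA wsadata;
  if (WSAStartup(MAKEWORD(2, 2), &wsadata) != 0) {
    printf("WSAStartup failed\n");
    return 1;
  }
  sockfd = socket(AF_INET, SOCK_STREAM, 0);
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = inet_addr("");   // the address string is empty in the slide; put the server's IPv4 address here
  address.sin_port = htons(1234);            // port in network byte order
  len = sizeof(address);
  result = connect(sockfd, (struct sockaddr *)&address, len);
  if (result == SOCKET_ERROR) {
    printf("connect failed\n");
    return 1;
  }
  send(sockfd, &ch, 1, 0);                   // send 'A'
  recv(sockfd, &ch, 1, 0);                   // receive 'B'
  printf("char from server = %c\n", ch);
  closesocket(sockfd);
  WSACleanup();
  return 0;
}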
Client and server with threads
   [Figure: client with Thread 1 (generates results) and Thread 2 (makes requests to server); server with receipt & queuing in front of N threads. Source: Distributed Systems: Concepts and Design]
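Alternative server threading architectures
   a. Thread-per-request: an I/O thread passes requests to workers, which access the remote objects
   b. Thread-per-connection: per-connection threads access the remote objects
   c. Thread-per-object: an I/O thread passes requests to per-object threads, one for each remote object
   [Figure source: Distributed Systems: Concepts and Design]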
C Thread
   pthread.c

#include <stdio.h>
#include <pthread.h>
void *thread_func(void *arg);
char message[] = "Hello World";
int main() {
   pthread_t thread;
   void *thread_result;
   pthread_create(&thread, NULL, thread_func, (void *)message);
   printf("Waiting for thread to finish...\n");
   pthread_join(thread, &thread_result);   // wait for the thread and collect its return value
   printf("Thread joined, it returned %s\n", (char *)thread_result);
   return 0;
}
void *thread_func(void *arg) {
   printf("thread %s is running\n", (char *)arg);
   pthread_exit("Thank you for the CPU time\n");
}
Java TCP Socket (per-connection threads)

import java.net.*;
import java.io.*;
public class Client {
    public static void main (String args[]) {
        Socket s = null;
        try {
            int serverPort = 1234;
            s = new Socket("localhost", serverPort);
            DataInputStream in = new DataInputStream(s.getInputStream());
            DataOutputStream out = new DataOutputStream(s.getOutputStream());
            out.writeUTF("Hello");   // placeholder text; the send line is missing from the slide
            String data = in.readUTF();
            System.out.println("Received: " + data);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } finally {
            if (s != null) try { s.close(); } catch (IOException e) {}
        }
    }
}
Java TCP Socket (per-connection threads)

import java.net.*;
import java.io.*;
public class Server {
    public static void main(String args[]) {
        try {
            int serverPort = 1234;
            ServerSocket listenSocket = new ServerSocket(serverPort);
            while (true) {
                Socket clientSocket = listenSocket.accept();
                Connection c = new Connection(clientSocket);   // one thread per connection
            }
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
Java TCP Socket (per-connection threads)
import java.net.*;
import java.io.*;
class Connection extends Thread {
    DataInputStream in;
    DataOutputStream out;
    Socket clientSocket;
    public Connection (Socket ClientSocket) {
        try {
            clientSocket = ClientSocket;
            in = new DataInputStream(clientSocket.getInputStream());
            out = new DataOutputStream(clientSocket.getOutputStream());
            this.start();   // run() executes in the new thread
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
    public void run() {
        try {
            String data = in.readUTF();
            out.writeUTF("client data is " + data);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } finally {
            try {
                clientSocket.close();
            } catch (IOException e) {}
        }
    }
}
   External
       Synchronize all clocks against a single one, usually
        the one with external, accurate time information
   Internal
       Synchronize all clocks among themselves

   At least time monotonicity must be preserved
   External (accuracy):
       Each system clock Ci differs by at most Dext, at every point in the
        synchronization interval, from an external UTC source S:
        |S - Ci| < Dext for all i
   Internal (agreement):
       Any two system clocks Ci and Cj differ by at most Dint, at every point
        in the synchronization interval, from each other:
        |Cj - Ci| < Dint for all i and j
   Dext and Dint are synchronization bounds
   Dint <= 2Dext
   Max-Synch-interval = Dint / 2Dext
   It means:
       If two events have single-value timestamps which
        differ by less than some value, we CAN'T SAY in
        which order the events occurred.
       With interval timestamps, when intervals overlap, we
        CAN'T SAY in which order the events occurred.
 [Figure: A's and B's clock times plotted against real time; A sends its time TA, which reaches B at TA + Ttrans]
Tmin < Ttrans < Tmax
Estimating Ttrans as (Tmin + Tmax)/2 is wrong by at most (Tmax - Tmin)/2
If A sends its clock time TA to B
→ B can set its clock to TA + (Tmin + Tmax)/2
→ then A and B are synchronized with bound (Tmax - Tmin)/2
 [Figure: the interval from Tmin to Tmax with midpoint (Tmin + Tmax)/2, which lies (Tmax - Tmin)/2 from either end]
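The one-way estimate above can be checked with a few lines of arithmetic. This is only a sketch: the class name `SyncBound`, its method names, and the millisecond values in `main` are assumptions made for illustration.

```java
// Sketch of one-way clock synchronization in a synchronous system.
// Names and example values are illustrative, not taken from the slides.
class SyncBound {
    // Value B sets its clock to after receiving A's timestamp tA.
    static double estimate(double tA, double tMin, double tMax) {
        return tA + (tMin + tMax) / 2.0;
    }

    // Worst-case error of that estimate: half the width of [tMin, tMax].
    static double bound(double tMin, double tMax) {
        return (tMax - tMin) / 2.0;
    }

    public static void main(String[] args) {
        double tA = 1000.0, tMin = 2.0, tMax = 10.0; // assumed example values (ms)
        System.out.println("B sets its clock to " + estimate(tA, tMin, tMax));
        System.out.println("error bound = " + bound(tMin, tMax));
    }
}
```

Whatever the true transmission time is, it lies within `bound` of the midpoint that B assumed, which is where the synchronization bound comes from.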
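 [Figure: A sends a request at TA, which reaches B at TA + Ttrans; B replies with its time TB; A receives the reply at T'A and estimates B's current time as TB + Tround/2]
   In an asynchronous system, we have no Tmax
   How can A synchronize with B?
   By using the round-trip time Tround = T'A - TA in Cristian's algorithm:
    A sets its clock to TB + Tround/2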
JAVA RMI (External Clock Synchronize)
import java.rmi.*;
public interface Clock extends Remote {
          String getTime() throws RemoteException;
}
import java.rmi.*;
import java.rmi.server.*;
import java.util.*;
public class ClockImpl extends UnicastRemoteObject implements Clock {
          public ClockImpl() throws RemoteException {
                    super();
          }
          public String getTime() {
                    Date d = new Date();
                    return d.toString();
          }
}
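JAVA RMI (External Clock Synchronize)

import java.rmi.*;
public class ClockServer {
         public ClockServer() {
                  try {
                           Clock c = new ClockImpl();
                           // register under the name the client looks up (needs a running rmiregistry)
                           Naming.rebind("//localhost/ClockService", c);
                  } catch (Exception e) {
                           System.out.println(e);
                  }
         }
         public static void main(String args[]) {
                  new ClockServer();
         }
}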
JAVA RMI (External Clock Synchronize)

import java.rmi.*;
public class ClockClient {
         public static void main(String args[]) {
             try {
                 Clock c = (Clock)Naming.lookup("//localhost/ClockService");
                 System.out.println(c.getTime());   // print the server's time
             } catch (Exception e) {
                 System.out.println(e);
             }
         }
}
Logical time
   One aspect of clock synchronization is to provide a mechanism
    whereby systems can assign sequence numbers (“timestamps”) to
    messages upon which all cooperating processes can agree.
   Leslie Lamport (1978) showed that clock synchronization need
    not be absolute, and made two important points:
     First point:
           If two processes do not interact, it is not necessary that their
            clocks be synchronized
               they can operate concurrently without fear of interfering with each other
       Second (critical) point:
           It is not important that all processes agree on time, but
            rather, that they agree on the order in which events occur
     Such “clocks” are referred to as Logical Clocks
   Logical time is based on happens-before relationship
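A logical clock in Lamport's sense can be sketched as a counter with two rules. The class `LamportClock` and its method names are assumptions made for this illustration; the rules themselves (increment on a local or send event, take max plus one on receive) are the standard ones.

```java
// Minimal Lamport logical clock; names are illustrative.
class LamportClock {
    private long time = 0;

    // Internal event or send event: advance the local counter.
    long tick() {
        return ++time;
    }

    // Receive event: move past the sender's timestamp, then count the receive.
    long receive(long senderTime) {
        time = Math.max(time, senderTime) + 1;
        return time;
    }

    long now() {
        return time;
    }

    public static void main(String[] args) {
        LamportClock p = new LamportClock();
        LamportClock q = new LamportClock();
        long sendTs = p.tick();            // p sends a message stamped 1
        q.tick();
        q.tick();                          // q performs two local events, reaching 2
        long recvTs = q.receive(sendTs);   // max(2, 1) + 1 = 3
        System.out.println("send=" + sendTs + " receive=" + recvTs);
    }
}
```

The receive timestamp is always larger than the send timestamp, so timestamps respect the happens-before order even though the two counters were never synchronized.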
Event Ordering
   Happens-before and concurrent events illustrated
       No causal path from e1 to e2 or from e2 to e1:
        e1 and e2 are concurrent
       No causal path from e1 to e6 or from e6 to e1:
        e1 and e6 are concurrent
       No causal path from e2 to e6 or from e6 to e2:
        e2 and e6 are concurrent
   Types of events
       Internal (change of state)
Co-ordination
   Difficulties in distributed systems
       Centralised solutions not appropriate
           communications bottleneck
       Fixed master-slave arrangements not appropriate
           process crashes
       Varying network topologies
           ring, tree, arbitrary; connectivity problems
       Failures must be tolerated if possible
           link failures
           process crashes
       Impossibility results
           in presence of failures, esp asynchronous model
Mutual Exclusion
   Requirements
       Safety
           At most one process may execute in CS at any time
       Liveness
           Every request to enter and exit a CS is eventually granted
       Ordering (desirable)
           Requests to enter are granted according to causality order (FIFO)

                                      Centralized                   Distributed
   Based on mutual exclusion          Central process               Circulating token
   No mutual exclusion                Physical clock, event count   Physical clocks, logical clocks
Mutual Exclusion
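   Three classes of implementation
       Centralized Approach
           When P1 wants to enter the critical section, it sends a Request message to the coordinator C. C accepts the Request and,
            if the critical section is free, sends a Reply (permission) message, whereupon P1 may enter
           If P2 now also wants to enter the critical section, C places P2's Request in the Waiting Queue
           When P1 leaves the critical section, it sends a Release message to C, and C sends a Reply to the next process in the Waiting Queue
       Distributed Approach
           Compare timestamps
           Each node must know the names of all nodes on the network and must announce its own name to the others, so nodes should be added as rarely as possible
           When a node fails, the system should notify the other nodes at once and repair it, so every node must be routinely maintained to keep it running
           A process that has not entered the critical section is repeatedly paused while it waits on the actions of other processes
       Token Passing Approach
           Choose a suitable path so that no node suffers starvation
           If the token is lost, the system must generate a replacement token
           If a node on the path fails, the system must rebuild the best new path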
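Atomicity
Two-Phase Commit Protocol
   Phase 1
       The coordinator sends prepare(T) and logs <prepare T>
       Each participant replies ready(T) and logs <ready T>, or replies abort(T) and logs <no T>
   Phase 2
       The coordinator sends commit(T) and logs <commit T>, or sends abort(T) and logs <abort T>
       Each participant replies acknowledge(T)
       The coordinator then logs <complete T>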
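The coordinator's choice between commit(T) and abort(T) can be sketched as a function of the votes it collected. The names `TwoPhaseCommit`, `Vote` and `Decision` are assumptions made for this illustration, and timeouts, logging and recovery are deliberately left out.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the coordinator's decision rule in two-phase commit.
// All names are illustrative; failure handling is not modelled.
class TwoPhaseCommit {
    enum Vote { READY, ABORT }
    enum Decision { COMMIT, ABORT }

    // Commit only if every participant answered ready(T) in phase 1.
    static Decision decide(List<Vote> votes) {
        for (Vote v : votes) {
            if (v == Vote.ABORT) {
                return Decision.ABORT;
            }
        }
        return Decision.COMMIT;
    }

    public static void main(String[] args) {
        System.out.println(decide(Arrays.asList(Vote.READY, Vote.READY)));
        System.out.println(decide(Arrays.asList(Vote.READY, Vote.ABORT)));
    }
}
```

A single abort vote is enough to abort the whole transaction, which is what makes the outcome atomic across sites.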
Failure Handling in 2PC
Deadlock Prevention and Avoidance
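   Resource Ordering Algorithm
       Impose a global ordering on all resources in the network, according to the work we expect to run, and give each resource a unique number
       A process that currently holds resource i may not then request any resource numbered lower than i, so no circular wait can form
       Simple to implement; requires little overhead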
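The ordering rule can be enforced with a small per-process check. This is a sketch only: `OrderedResources` and its methods are hypothetical names, and resources are represented by their global numbers.

```java
import java.util.TreeSet;

// Per-process check for the resource-ordering rule; names are illustrative.
class OrderedResources {
    private final TreeSet<Integer> held = new TreeSet<>();

    // Refuse any request numbered lower than a resource already held.
    boolean request(int resource) {
        if (!held.isEmpty() && resource < held.last()) {
            return false;
        }
        held.add(resource);
        return true;
    }

    void release(int resource) {
        held.remove(resource);
    }

    public static void main(String[] args) {
        OrderedResources p = new OrderedResources();
        System.out.println(p.request(3)); // allowed: nothing held yet
        System.out.println(p.request(5)); // allowed: 5 is above 3
        System.out.println(p.request(4)); // refused: 4 is below the held 5
    }
}
```

Because every process acquires resources in increasing number order, no cycle of processes each waiting for a lower-numbered resource can arise.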
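   Banker's Algorithm
       The distributed system elects the most suitable process to act as the banker, which manages all the resources on the network

   (New) Timestamp Priority Algorithm
       Each process's timestamp (TS) is used as its priority number
       The smaller a process's TS, the higher its priority (it started earlier)
       Only a higher-priority process may request a resource from a lower-priority one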
Timestamp Priority Algorithm
   [Figure: two example pairs of timestamps, TR=5 with TR=10 and TR=10 with TR=15]
Deadlock Detection

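    Local Wait-For Graph   Global Wait-For Graph

   Centralized Approach
   Distributed Approach
   Computational Rounds
       Synchronous algorithms: rounds are counted by a timer
       Asynchronous algorithms: rounds are determined by the number of waves of events propagated through the network
   Local Running Time
   Space
       Global → the total space used by all computers
       Local → how much space each computer needs
   Message complexity
       The total number of messages sent by the computers
           A message M sent across p edges contributes p|M| to the message complexity, where |M| is the length of M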
   Ring Leader
   Tree Leader
   BFS
   MST
Ring Leader
   Each process sends its id to the next process in the ring, then repeatedly:
       Receives an id from the previous process
       Compares that id with its own
       Sends the smaller of the two values to the next process in the ring
  Input: The unique identifier, id, for the processor running this algorithm
  Output: The smallest identifier of a processor in the ring
  M ← [Candidate is id]
  Send message M to the successor processor in the ring
  done ← false
  repeat
    Get message M from the predecessor processor in the ring.
    if M = [Candidate is i] then
       if i = id then
          M ← [Leader is id]
          done ← true
       else
          m ← min{i, id}
          M ← [Candidate is m]
    else
       {M is a "Leader is" message}
       done ← true
    Send message M to the next processor in the ring
  until done
  return M
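The ring pseudocode above can be exercised with a round-by-round simulation in a single program. This is a sketch under assumptions: `RingLeader.elect` is a hypothetical name, ids are unique, processor p sends to processor (p + 1) mod n, and all processors act in lock step.

```java
import java.util.Arrays;

// Lock-step simulation of the ring leader election; names are illustrative.
class RingLeader {
    // Returns the leader id learned by processor 0 (all processors learn the same id).
    static int elect(int[] ids) {
        int n = ids.length;
        int[] out = ids.clone();               // value each processor sends this round
        boolean[] outLeader = new boolean[n];  // true if that message is "Leader is"
        boolean[] sending = new boolean[n];    // true if the processor sends this round
        boolean[] done = new boolean[n];
        int[] result = new int[n];
        Arrays.fill(sending, true);            // round 0: everyone sends [Candidate is id]
        int finished = 0;
        while (finished < n) {
            int[] nextOut = new int[n];
            boolean[] nextLeader = new boolean[n];
            boolean[] nextSending = new boolean[n];
            for (int p = 0; p < n; p++) {
                int pred = (p + n - 1) % n;
                if (done[p] || !sending[pred]) continue;
                int i = out[pred];
                if (!outLeader[pred] && i != ids[p]) {
                    nextOut[p] = Math.min(i, ids[p]);   // forward the smaller candidate
                } else {
                    // Own id came back, or a "Leader is" message arrived.
                    nextOut[p] = i;
                    nextLeader[p] = true;
                    done[p] = true;
                    result[p] = i;
                    finished++;
                }
                nextSending[p] = true;
            }
            out = nextOut;
            outLeader = nextLeader;
            sending = nextSending;
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(elect(new int[] {5, 2, 9}));
    }
}
```

Only the smallest id survives a full trip around the ring, so the processor that sees its own id return knows it is the leader and starts the "Leader is" pass.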
   Computational Rounds
       O(2N)
   Local Running Time
       O(N)
   Local Space
       O(1)
   Message Complexity
       O(N^2)
Tree Leader
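   Assume the network is a free tree
       Natural starting points
       External nodes
   Asynchronous
       Message Check
           Tests whether a message has been sent along a particular edge and has arrived at the node
       Two phases
           Accumulation Phase
               ids flow in from the external nodes of the tree, and each node records the smallest id seen
               The leader is found
           Broadcast Phase
               The leader id is broadcast out to the external nodes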
  Input: The unique identifier, id, for the processor running this algorithm
  Output: The smallest identifier of a processor in the tree
  {Accumulation Phase}
  Let d be the number of neighbors of processor id
  m ← 0         {counter for messages received}
  ℓ ← id        {tentative leader}
  repeat
    {begin a new round}
    for each neighbor j do
       check if a message from processor j has arrived
       if a message M = [Candidate is i] from j has arrived then
          ℓ ← min{i, ℓ}
          m ← m + 1
  until m ≥ d-1
  if m = d then
    M ← [Leader is ℓ]
    for each neighbor j do
       send message M to processor j
    return M {M is a "Leader is" message}
  else
    M ← [Candidate is ℓ]
    send M to the neighbor k that has not sent a message yet
  {Broadcast Phase}
  repeat
    {begin a new round}
    check if a message from processor k has arrived
    if a message M from k has arrived then
       m ← m + 1
       if M = [Candidate is i] then
          ℓ ← min{i, ℓ}
          M ← [Leader is ℓ]
          for each neighbor j do
             send message M to processor j
       else
          {M is a "Leader is" message}
          for each neighbor j ≠ k do
             send message M to processor j
  until m = d
  return M      {M is a "Leader is" message}
•   d_i is the number of neighbors of processor i
   Computational Rounds
       O(D)
   Local Running Time
       O(d_i D)
   Local Space
       O(d_i)
   Message Complexity
       O(N)
Tree Leader

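   Synchronous
       Like the ripples spreading after a stone is thrown into a pond
       The diameter is the length of the longest path between any two nodes in the graph
       The number of rounds equals the diameter
       Two phases
         Accumulation Phase: inward toward the center
         Broadcast Phase: outward from the center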
Breadth-first Search
   Designate s as the source node
   Synchronous
       The search spreads outward in waves
       The BFS tree is built top-down, one level at a time
       Each node v sends a message to the neighbors that have not previously contacted v
       Every node v must choose another node u as its parent
  Input: The identifier v of the node (processor) executing this algorithm and
  the identifier s of the start node of the BFS traversal
  Output: For each node v, its parent in a BFS tree rooted at s
  repeat
        {begin a new round}
        if v=s or v has received a message from one of its neighbors then
                set parent(v) to be a node requesting v to become its child
                 (or null, if v=s)
                for each node w adjacent to v that has not contacted v yet do
                         send a message to w asking w to become a child of v
  until v=s or v has received a message
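The wave-by-wave construction can be imitated in one program by treating each loop iteration as a round. This is a centralized sketch, not a distributed implementation: `SyncBfs.parents` is a hypothetical name, nodes are numbered from 0, and a node that is asked by several neighbors in the same round accepts the first request processed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Round-by-round simulation of synchronous BFS; names are illustrative.
class SyncBfs {
    // adj.get(v) lists the neighbors of v. Returns parent[v], with -1 for the root s.
    static int[] parents(List<List<Integer>> adj, int s) {
        int[] parent = new int[adj.size()];
        Arrays.fill(parent, -2);               // -2 marks "not contacted yet"
        parent[s] = -1;
        List<Integer> frontier = new ArrayList<>();
        frontier.add(s);
        while (!frontier.isEmpty()) {          // one iteration = one round
            List<Integer> next = new ArrayList<>();
            for (int v : frontier) {
                for (int w : adj.get(v)) {
                    if (parent[w] == -2) {     // w has not been contacted yet
                        parent[w] = v;         // w becomes a child of v
                        next.add(w);
                    }
                }
            }
            frontier = next;
        }
        return parent;
    }

    public static void main(String[] args) {
        List<List<Integer>> adj = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(0, 3),
            Arrays.asList(0, 3), Arrays.asList(1, 2));
        System.out.println(Arrays.toString(parents(adj, 0)));
    }
}
```

Each round extends the tree by exactly one level, so the number of rounds tracks the depth of the BFS tree.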
   n nodes, m edges
   Computational Rounds
   Local Running Time
   Local Space
   Message complexity
       O(n+m)
Breadth-first Search
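   Asynchronous
     Requires every processor to know the total number of processes in the network
     The root node s sends out a "pulse" message to trigger the other processes
     Combines
           Pulse-down messages, passed from the root s down the BFS tree
           Pulse-up messages, passed from the external nodes of the BFS tree up to the root s
       The next step begins only after the pulse-up signal has been received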
  Input: The identifier v of the node (processor) executing this
  algorithm and the identifier s of the start node of the BFS
  Output: For each node v, its parent in a BFS tree rooted at s
  C ← ∅ {verified BFS children for v}
  set A to be the set of neighbors of v
  repeat
       {begin a new round}
       if parent(v) is defined or v=s then
               if parent(v) is defined then
                        wait for pulse-down message from parent(v)
               if C is not empty then
                        {v is an internal node in the BFS tree}
                        send a pulse-down message to all nodes in C
                        wait for a pulse-up message from all nodes in C
               else
                        {v is an external node in the BFS tree}
                        for each node u in A do
                                 send a make-child message to u
                        for each node u in A do
                                 get a message M from u and remove u from A
                                 if M is an accept-child message then
                                          add u to C
               send a pulse-up message to parent(v)
       else
               {v ≠ s has no parent yet}
               for each node w in A do
                       if w has sent v a make-child message then
                               remove w from A
                               {w is no longer a candidate child for v}
                               if parent(v) is undefined then
                                       parent(v) ← w
                                       send an accept-child message to w
                               else
                                       send a reject-child message to w
  until (v has received message done)
    or (v=s and has pulsed-down n-1 times)
  send a done message to all the nodes in C
•   n nodes, m edges
   Computational Rounds
   Local Running Time
   Local Space
   Message complexity
       O(n^2 + m)
Minimum Spanning Tree
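   Barůvka's algorithm is an efficient sequential method for finding an MST
   Distributed Barůvka algorithm in the synchronous model
       Determine all the connected components
       For each connected component, find the minimum-weight edge
       that joins it to another component, and add that edge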
Barůvka Algorithm
  Input: A simple connected weighted graph G
  with n vertices and m edges
  Output: A minimum spanning tree T for G
  for each vertex v in G do
         define an elementary cluster C(v) ← {v}
  initialize a priority queue Q to contain all edges in G,
  using the weights as keys
  T ← ∅
  while T has fewer than n-1 edges do
       (u,v) ← the minimum-weight edge removed from Q
       Let C(v) be the cluster containing v,
       Let C(u) be the cluster containing u.
       if C(v) ≠ C(u) then
               Add edge (v,u) to T.
               Merge C(v) and C(u) into one cluster,
                       that is, union C(v) and C(u).
  return tree T
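The cluster-merging pseudocode above takes edges from a priority queue in increasing weight order, which is the strategy usually credited to Kruskal. The following is a sequential sketch of exactly that loop, assuming vertices are numbered 0 to n-1 and each edge is an array {u, v, weight}; `ClusterMst` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Sequential sketch of the priority-queue, cluster-merging MST loop.
// Names and the edge representation are illustrative.
class ClusterMst {
    // Finds the representative of v's cluster, shortening the path as it goes.
    static int find(int[] cluster, int v) {
        while (cluster[v] != v) {
            cluster[v] = cluster[cluster[v]];
            v = cluster[v];
        }
        return v;
    }

    // Returns the total weight of a minimum spanning tree.
    static int mstWeight(int n, int[][] edges) {
        int[] cluster = new int[n];
        for (int v = 0; v < n; v++) {
            cluster[v] = v;                     // elementary cluster C(v) = {v}
        }
        PriorityQueue<int[]> q =
            new PriorityQueue<>((a, b) -> Integer.compare(a[2], b[2]));
        q.addAll(Arrays.asList(edges));
        int total = 0;
        int used = 0;
        while (used < n - 1 && !q.isEmpty()) {
            int[] e = q.poll();                 // lightest remaining edge
            int cu = find(cluster, e[0]);
            int cv = find(cluster, e[1]);
            if (cu != cv) {                     // endpoints lie in different clusters
                cluster[cu] = cv;               // merge the two clusters
                total += e[2];
                used++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 1, 1}, {1, 2, 2}, {2, 3, 3}, {0, 3, 4}, {0, 2, 5} };
        System.out.println(mstWeight(4, edges));
    }
}
```

The distributed version differs in that every component picks its lightest outgoing edge in the same round, which is where the O(log n) round count comes from.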
•   n nodes, m edges
   Computational Rounds
       O(log n)
   Local Running Time
   Local Space
       O(m)
   Message complexity
       O(m log n)
Synchronization Algorithms
    Multicast
        Uses a central time server to synchronize clocks
    Cristian's algorithm (centralised)
    Berkeley algorithm (centralised)
    The Network Time Protocol (decentralised)

Cristian's Algorithm (1989)
   Uses a time server to synchronize clocks; the server keeps the reference time
   Clients ask the time server for time
       period depends on maximum clock drift and accuracy required
   Clients receive the value and may:
       use it as it is
       add the known minimum network delay
       add half the time between this send and receive
   For links with symmetrical latency:
       RTT = resp.-received-time – req.-sent-time
       adjusted-local-time =
           server-timestamp + minimum network delay or
           server-timestamp + (RTT / 2) or
           server-timestamp + (RTT – server-latency) /2
       local-clock-error = adjusted-local-time – local-time
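The last adjustment formula and the clock-error line can be written out directly. This is a sketch with assumed names (`Cristian`, `adjustedTime`, `clockError`) and assumed example values, all in milliseconds.

```java
// Arithmetic of Cristian's algorithm for a symmetric link; names are illustrative.
class Cristian {
    // server-timestamp + (RTT - server-latency) / 2
    static long adjustedTime(long reqSent, long respReceived,
                             long serverTimestamp, long serverLatency) {
        long rtt = respReceived - reqSent;
        return serverTimestamp + (rtt - serverLatency) / 2;
    }

    // adjusted-local-time - local-time
    static long clockError(long adjustedLocalTime, long localTime) {
        return adjustedLocalTime - localTime;
    }

    public static void main(String[] args) {
        long reqSent = 1000, respReceived = 1020;     // local clock readings (assumed)
        long serverTimestamp = 5000, serverLatency = 4; // assumed server values
        long adjusted = adjustedTime(reqSent, respReceived, serverTimestamp, serverLatency);
        System.out.println("adjusted = " + adjusted);
        System.out.println("error = " + clockError(adjusted, respReceived));
    }
}
```

Setting `serverLatency` to 0 gives the simpler server-timestamp + RTT/2 variant.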
Berkeley algorithm (Gusella & Zatti, 1989)
   if no machines have receivers, …
   Berkeley algorithm uses a designated server to synchronize the machines

   The designated server polls or broadcasts
    to all machines for their time,
    adjusts times received for RTT & latency,
    averages times, and tells each machine how to adjust.

   Polling is done using Cristian's algorithm

   Avg. time is more accurate, but still drifts
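The averaging step can be sketched as follows, assuming the reported times have already been corrected for RTT and latency and are expressed as plain numbers (minutes in the example). `Berkeley.adjustments` is a hypothetical name, and the sketch averages all machines without discarding outliers.

```java
import java.util.Arrays;

// Averaging step of the Berkeley algorithm; names and values are illustrative.
class Berkeley {
    // clocks[0] is the designated server's own time; the rest are the times
    // reported by the other machines. Returns the offset each machine applies.
    static long[] adjustments(long[] clocks) {
        long sum = 0;
        for (long c : clocks) {
            sum += c;
        }
        long average = sum / clocks.length;
        long[] adjust = new long[clocks.length];
        for (int i = 0; i < clocks.length; i++) {
            adjust[i] = average - clocks[i];   // positive: advance, negative: slow down
        }
        return adjust;
    }

    public static void main(String[] args) {
        // 3:00, 3:25 and 2:50 expressed in minutes (assumed example values)
        System.out.println(Arrays.toString(adjustments(new long[] {180, 205, 170})));
    }
}
```

The server sends each machine an offset rather than an absolute time, so the delay in delivering the answer matters less.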
Network Time Protocol
    NTP is the best-known and most widely implemented
     decentralised algorithm
    Used for time synchronization on the Internet
    [Figure: NTP synchronization hierarchy]
        Level 1: primary server, direct synchronization
        Level 2: secondary servers, synchronized by the primary server
        Level 3: tertiary servers, synchronized by the secondary servers
   Each pair of processes is connected by reliable
    channels (such as TCP).
   Messages are eventually delivered to recipients' input buffers.
   Processes will not fail.
   There is agreement on how a resource is identified
       Pass identifier with requests
Exclusive Access Algorithm
   Centralized Algorithm
   Token Ring Algorithm
   Lamport Algorithm
     (Timestamp Approach)
       Ricart & Agrawala Algorithm
   Leader Election Algorithms
       Bully Algorithm
       Ring Algorithm
   Chang&Roberts Algorithm
   Itai&Rodeh Algorithm
Centralized Algorithm
Operations
    1.       Request resource
             Send request to coordinator to enter CS
    2.       Wait for response
    3.       Receive grant
              Coordinator grants permission to enter CS and
              keeps a queue of requests to enter the CS.
    4.       Access resource
    5.       Release resource
             Send release message to inform coordinator
   Safety, liveness and order are guaranteed

Delay
        Client and Synchronization
            one round trip time (release + grant)
   [Figure: process P sends Request(R) to coordinator C, receives Grant(R), and later sends Release(R); a second diagram shows the coordinator, with its queue of requests, exchanging Request, Grant and Release messages with P1 to P4]
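The coordinator's bookkeeping can be sketched as a holder plus a FIFO queue. The class `Coordinator` and its methods are assumptions made for this illustration; messages are modelled as method calls and processes as integer ids.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Coordinator state for the centralized algorithm; names are illustrative.
class Coordinator {
    private Integer holder = null;                        // process in the CS, or null
    private final Queue<Integer> waiting = new ArrayDeque<>();

    // Handles Request: returns true if a Grant is sent immediately.
    boolean request(int pid) {
        if (holder == null) {
            holder = pid;
            return true;
        }
        waiting.add(pid);                                 // queue the request
        return false;
    }

    // Handles Release: returns the process granted next, or null if nobody waits.
    Integer release(int pid) {
        if (holder == null || holder != pid) {
            throw new IllegalStateException("release from a process that is not in the CS");
        }
        holder = waiting.poll();
        return holder;
    }

    public static void main(String[] args) {
        Coordinator c = new Coordinator();
        System.out.println(c.request(1));   // granted at once
        System.out.println(c.request(2));   // queued
        System.out.println(c.release(1));   // process 2 is granted next
    }
}
```

The FIFO queue is what gives the ordering guarantee, and the single `holder` field is what gives safety.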
Token Ring Algorithm
       For each CS a token is used.
       Only the process holding the token can enter the CS.
       To exit the CS, the process sends the token onto its neighbor.
       If a process does not require to enter the CS when it receives the
        token, it forwards the token to the next neighbor.
   Only one process holds the token at any time, which guarantees mutual exclusion
   The order is well defined, so starvation cannot occur
   If the token is lost (e.g. the process holding it died), it must be regenerated
   Safety & liveness are guaranteed, but ordering is not.
   Delay
       Client: 0 to N message transmissions.
       Synchronization: between one process's exit from the CS and the next
        process's entry there are between 1 and N message transmissions.
Lamport Algorithm
   A total ordering of requests is established by logical timestamps
   Each process maintains request Queue (mutual exclusion requests)
   Requesting CS, Pi
       multicasts “request” (i, Ti) to all processes (Ti is local Lamport time).
       Places request on its own queue
       waits until all processes “reply”
   Entering CS, Pi
       receives message (ack or release) from every other process with a
        timestamp larger than Ti
   Releasing CS , Pi
       Remove request from its queue
       Send a timestamped release message
       This may cause its own entry have the earliest timestamp in the
        queue, enabling it to access the critical section
Ricart & Agrawala Algorithm
   Using reliable multicast and logical clocks
   Process wants to enter critical section
       Compose message containing
         Identifier (machine ID, process ID)
         Name of resource
         Current time
       Send request to all processes ,wait until everyone gives permission
   When process receives request
       If receiver not interested →Send OK to sender
       If receiver is in critical section →Do not reply; add request to queue
       If receiver just sent a request as well:
           Compare timestamps: received & sent msgs→Earliest wins
           If receiver is loser then send OK else receiver is winner, do not reply, queue
       When done with critical section→Send OK to all queued requests
Ricart & Agrawala Algorithm
On initialization
  state := RELEASED;
To enter the critical section
  state := WANTED;
  Multicast request to all processes;      {request processing deferred here}
  T := request's timestamp;
  Wait until (number of replies received = (N – 1));
  state := HELD;
On receipt of a request <Ti, pi> at pj (i≠ j)
  if (state = HELD) or ((state = WANTED) and ((T, pj) < (Ti, pi)))
  then queue request from pi without replying;
  else reply immediately to pi;
To exit the critical section
  state := RELEASED;
  reply to any queued requests;
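The test applied on receipt of a request can be isolated as a pure function. This is a sketch: `RicartAgrawala.mustQueue` is a hypothetical name, and the pairs (T, pj) and (Ti, pi) are compared by timestamp first and by process id second.

```java
// Receipt rule of the Ricart & Agrawala algorithm; names are illustrative.
class RicartAgrawala {
    enum State { RELEASED, WANTED, HELD }

    // Evaluated at process pj, whose own pending request has timestamp t,
    // when a request <ti, pi> arrives. True means "queue without replying".
    static boolean mustQueue(State state, long t, int pj, long ti, int pi) {
        if (state == State.HELD) {
            return true;
        }
        if (state == State.WANTED) {
            return t < ti || (t == ti && pj < pi);   // (T, pj) < (Ti, pi)
        }
        return false;                                 // RELEASED: reply immediately
    }

    public static void main(String[] args) {
        // P2 wants the CS with timestamp 78 and receives P1's request stamped 87.
        System.out.println(mustQueue(State.WANTED, 78, 2, 87, 1));
        // P1 wants the CS with timestamp 87 and receives P2's request stamped 78.
        System.out.println(mustQueue(State.WANTED, 87, 1, 78, 2));
    }
}
```

Because the comparison is a total order on (timestamp, id) pairs, exactly one of two competing processes defers to the other.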
Ricart & Agrawala Algorithm
   Safety, liveness, and ordering are guaranteed.
   It takes 2(N-1) messages per entry operation (N-1 multicast
    requests + N-1 replies); N messages if the underlying network
    supports multicast. [3(N-1) in Lamport's algorithm]
       Client
           one round-trip time
       Synchronization
           one message transmission time.
   [Figure: P1 (message timestamp 87) and P2 (message timestamp 78) request at the same time, alongside P3. P2 cannot send a Reply to P1 because P1's timestamp is larger than P2's; P2 changes to "held", and P1 remains in "wanted" until P2 sends "reply".]
Leader Election Algorithms
   The problem
     N processes, may or may not have unique IDs (UIDs)
     for simplicity assume no crashes
     must choose unique master coordinator amongst processes
   Requirements
     Every process knows P, identity of leader, where P is unique
      process id (usually maximum) or is yet undefined.
     All processes participate and eventually discover the identity
      of the leader (cannot be undefined).
     When a coordinator fails, the algorithm must elect that active
      process with the largest priority number
   Two types of algorithm
       Bully: “the biggest guy in town wins”
       Ring: a logical, cyclic grouping
Bully Algorithm
   Assumptions
       Synchronous system
           All messages arrive within Ttrans units of time.
           A reply is dispatched within Tprocess units of time of the receipt of a message.
           If no response is received in 2Ttrans + Tprocess, the node is assumed to be dead.

   A process that knows it has the highest id elects itself coordinator
   When process P notices that the coordinator has gone too long without answering a request, it initiates an election
   On starting an election, P sends an election message to the other processes
       If nobody responds, P becomes the coordinator
       If a higher-numbered process answers, P's job is done
Bully Algorithm
   Performance
       Best case scenario: The process with the second highest id
        notices the failure of the coordinator and elects itself.
           N-2 coordinator messages are sent.
           Turnaround time is one message transmission time.
       Worst case scenario: When the process with the least id
        detects the failure.
           N-1 processes altogether begin elections, each sending messages to
            processes with higher ids.
           The message overhead is O(N^2).
           Turnaround time is approximately 5 message transmission times.
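The outcome of a bully election can be sketched without modelling individual messages. Assumptions in this sketch: process ids are 0 to n-1, a higher id means higher priority, `alive[id]` says whether a process is up, and `Bully.elect` is a hypothetical name.

```java
// Outcome of a bully election; names are illustrative, messages are not modelled.
class Bully {
    // Returns the coordinator chosen when `initiator` starts an election.
    static int elect(boolean[] alive, int initiator) {
        int current = initiator;
        while (true) {
            // Look for a live process with a higher id that would answer.
            int responder = -1;
            for (int id = current + 1; id < alive.length; id++) {
                if (alive[id]) {
                    responder = id;
                    break;
                }
            }
            if (responder == -1) {
                return current;        // nobody higher answered: current wins
            }
            current = responder;       // the responder takes over and runs its own election
        }
    }

    public static void main(String[] args) {
        boolean[] alive = {true, true, true, true, false};   // old coordinator 4 has failed
        System.out.println(elect(alive, 1));
    }
}
```

Whoever starts the election, the result is the highest live id, which matches the "biggest guy in town wins" description.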
Ring Algorithm
   No token is used in this algorithm
   When the algorithm ends, every process holds an active list (consisting of all the
    priority numbers of all active processes in the system)
   If process Pi detects a coordinator failure, it creates a new, initially empty active
    list, sends the message elect(i) to its right neighbor, and adds the number i to the list
   If Pi receives a message elect(j) from the process on its left, it must respond in one of three ways:
       If this is the first elect message it has seen or sent, Pi creates a new
        active list with the numbers i and j and sends the message elect(i), followed by elect(j)
       If i ≠ j, then Pi adds j to its active list and forwards the message
        to its right neighbor
       If i = j, then Pi has received the message elect(i): the active list for Pi
        now contains the numbers of all the active processes in the system, and Pi can
        determine the largest number in the list to identify the new coordinator process.
Chang&Roberts Algorithm
   Assume
       Unidirectional ring
       Asynchronous system
       Each Process has UID

   Election
       initially each process non-participant
       determine leader (election message):
          initiator becomes participant and passes own UID on to neighbour
          when non-participant receives election message, forwards maximum
           of own and the received UID and becomes participant
          participant forwards a received UID larger than its own, but does
           not forward a smaller one
       announce winner (elected message):
          when participant receives election message with own UID, becomes
           leader and non-participant, and forwards UID in elected message
          otherwise, records the leader's UID, becomes non-participant and
           forwards it
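The election phase can be traced for the simple case of a single initiator. This is a sketch under assumptions: UIDs are unique, the message travels from ring position k to (k + 1) mod n, and `ChangRoberts.elect` is a hypothetical name. The elected-message pass that announces the winner is not simulated.

```java
// Election phase of Chang & Roberts with one initiator; names are illustrative.
class ChangRoberts {
    // Returns {leader UID, number of election messages sent}.
    static int[] elect(int[] uids, int initiator) {
        int n = uids.length;
        int carried = uids[initiator];   // UID currently inside the election message
        int pos = initiator;
        int messages = 0;
        while (true) {
            pos = (pos + 1) % n;         // deliver the message to the next process
            messages++;
            if (carried == uids[pos]) {
                // The process received its own UID: it becomes the leader.
                return new int[] {carried, messages};
            }
            if (carried < uids[pos]) {
                carried = uids[pos];     // forward the maximum of own and received UID
            }
            // With one initiator the carried UID never decreases, so the case of a
            // participant receiving a smaller UID does not arise in this sketch.
        }
    }

    public static void main(String[] args) {
        int[] result = elect(new int[] {3, 7, 5, 1}, 0);
        System.out.println("leader=" + result[0] + " messages=" + result[1]);
    }
}
```

The largest UID is the only one that can travel all the way around the ring, so its owner is the process that eventually sees its own UID return.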
Itai&Rodeh Algorithm
   Assume
       Unidirectional ring
       Synchronous system
       Processes do not have UIDs

   Election
       each process selects ID at random from set {1,..K}
           non-unique! but fast
       process pass all IDs around the ring
       after one round, if there exists a unique ID then
        elect maximum unique ID
       otherwise, repeat

   How do we know the algorithm terminates?
       from probabilities: it terminates with probability 1, in the same way
        that if you keep flipping a fair coin, a run of heads is eventually
        broken by tails
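
The coin-flip argument can be made slightly more concrete. The following is a sketch, assuming the rounds are independent and that each round produces a unique ID with the same probability p > 0. The symbols p, r and R (the number of rounds until termination) are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
\Pr[\text{no tails in } r \text{ fair flips}] &= 2^{-r} \to 0 \quad (r \to \infty) \\
\Pr[\text{election still running after } r \text{ rounds}] &= (1 - p)^{r} \to 0 \quad (r \to \infty) \\
\mathbb{E}[R] &= \sum_{r \ge 1} r \, p \, (1 - p)^{r - 1} = \frac{1}{p}
\end{align*}
```

For example, with two processes a round fails only when both draw the same ID, so p = 1 - 1/K and the expected number of rounds is K/(K - 1).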

  • 2. 優點  Resource Sharing  不同地區的Process連通時→USER A可使用USER B的資源  Computation Speedup  困難複雜的問題分派多個處理器綜合處理  Reliability  因各處理器有各自獨立的Memory→當有一個處理器受損 時,將不致影響其他處理器之作業;同時互相幫忙修補  Communication  任何連通的USER皆可藉由網路互相通訊和諮詢
  • 3. 作業系統的類型  資料傳輸  Site A ─Data→ Site B  資料可視需求而定,但格式需一致,避免遺失資料  計算傳輸  使用者將指令藉由網路傳送至遠端處理器  由遠端處理器以Local Resources執行  再將執行結果回傳予使用者  行程傳輸  將Process藉由網路傳送至遠端執行,用此執行的理由:  Load Balancing  Computation Speedup  Hardware / Software Preference  Data Access
  • 4. C Socket for Windows
  • 5. C Socket for Windows  Server.c #include<winsock2.h> #include<stdio.h> int main() { SOCKET server_sockfd, client_sockfd; int server_len, client_len; struct sockaddr_in server_address , sockaddr_in client_address; // 註冊 Winsock DLL WSADATA wsadata; WSAStartup(0x101,(LPWSADATA)&wsadata) // 產生 server socket server_sockfd = socket(AF_INET, SOCK_STREAM, 0); // AF_INET(使用IPv4); SOCK_STREAM; 0(即TCP)
  • 6. C Socket for Windows  Server.c server_address.sin_family = AF_INET; server_address.sin_addr.s_addr = inet_addr(""); server_address.sin_port = 1234; server_len = sizeof(server_address); bind(server_sockfd, (struct sockaddr *) &server_address, server_len); listen(server_sockfd, 5); // 5(即佇列數)
  • 7. C Socket for Windows  Server.c while(1) { char ch; printf("Server waiting...n"); client_len = sizeof(client_address); client_sockfd = accept(server_sockfd, (struct sockaddr *) &client_address, &client_len); recv(client_sockfd, &ch, 1, 0); // 接收‟A‟ ch++; // „A‟→‟B‟ send(client_sockfd, &ch, 1, 0); // 傳送‟B‟ closesocket(client_sockfd); WSACleanup(); } }
  • 8. C Socket for Windows  Client.c #include<winsock2.h> #include<stdio.h> int main() { SOCKET sockfd; int len , result; struct sockaddr_in address; char ch = 'A'; WSADATA wsadata; WSAStartup(0x202,(LPWSADATA)&wsadata); sockfd = socket(AF_INET, SOCK_STREAM, 0); address.sin_family = AF_INET;
  • 9. C Socket for Windows  Client.c address.sin_addr.s_addr = inet_addr(""); address.sin_port = 1234; len = sizeof(address); connect(sockfd, (struct sockaddr *)&address, len); send(sockfd, &ch, 1, 0); recv(sockfd, &ch, 1, 0); printf("char from server = %cn", ch); closesocket(sockfd); WSACleanup(); system("pause"); }
  • 10. Client and server with threads Thread 2 makes requests to server Input-output Receipt & Thread 1 queuing generates results T1 Requests N threads Client Server Distributed Systems: Concepts and Design
  • 11. Alternative server threading architectures workers per-connection threads per-object threads I/O remote I/O remote remote objects objects objects a. Thread-per-request b. Thread-per-connection c. Thread-per-object Distributed Systems: Concepts and Design
  • 12. C Thread -lpthreadGC2
  • 13. C Thread  pthread.c #include <stdio.h> #include <pthread.h> void *thread_func(void *arg); char message[] = "Hello World"; int main() { pthread_t thread; void *thread_result; pthread_create(&thread,NULL,thread_func,(void *)message); printf("Waiting for thread to finish...n");
  • 14. C Thread  pthread.c pthread_join(thread,&thread_result); printf("Thread joined, it returned %sn",(char *)thread_result); system("pause"); } void *thread_func(void *arg) { printf("thread %s is runningn",(char *)arg); sleep(3); pthread_exit("Thange you use CPU Timen"); }
  • 15. Java TCP Socket (per-connection threads)  String data = in.readUTF(); import*; System.out.println("Received: "+ data) ; import*; s.close(); public class Client { }catch (IOException e){ public static void main (String args[]) { System.out.println(e.getMessage()); Socket s = null; }finally { try{ if(s!=null) int serverPort = 1234; try {s.close();} s = new Socket("localhost", serverPort); catch (IOException e){} DataInputStream in = new DataInputStream( s.getInputStream()); } DataOutputStream out = new } DataOutputStream( s.getOutputStream()); } out.writeUTF(“Hello");
  • 16. Java TCP Socket (per-connection threads)  import*; import*; public class Server { public static void main(String args[]) { try{ int serverPort = 1234; ServerSocket listenSocket = new ServerSocket(serverPort); while(true) { Socket clientSocket = listenSocket.accept(); Connection c = new Connection(clientSocket); } } catch(IOException e) { System.out.println(e.getMessage()); } } }
  • 17. Java TCP Socket (per-connection threads)  this.start(); } catch(IOException e){ import*; System.out.println(e.getMessage());} import*; } class Connection extends Thread { public void run(){ DataInputStream in; try { DataOutputStream out; String data = in.readUTF(); Socket clientSocket; out.writeUTF("client data is " + data); public Connection (Socket ClientSocket) { } catch(IOException e) { try { System.out.println(e.getMessage()); clientSocket = ClientSocket; } finally { in = new try { DataInputStream( clientSocket.getInputStream()); clientSocket.close(); out = new } catch (IOException e) {} DataOutputStream( clientSocket.getOutputStream()); } } }
  • 18. 時間同步的類型  External  Synchronize all clocks against a single one, usually the one with external, accurate time information  Internal  Synchronize all clocks among themselves  At least time monotonicity must be preserved
  • 19. 時間同步的類型  External (accuracy) : 同步於驗證來源的時間  Each system clock Ci S differs at most Dext at every point in the synchronization interval from an external UTC source S: |S - Ci| < Dext for all i C1 C3 C2
  • 20. 時間同步的類型  Internal (agreement) : 彼此間合力同步時間  Any two system clocks C1 C3 Ci and Cj differs at most Dint at every point C2 in the synchronization interval from each other: | Cj - Ci| < Dint for all i and j
  • 21. 時間同步的類型  Dext and Dint are synchronization bounds  Dint <= 2Dext  Max-Synch-interval = Dint / 2Dext  It means:  If two events have single-value timestamps which differ by less than some value,we CAN‟T SAY in which order the events occurred.  With interval timestamps, when intervals overlap, we CAN‟T SAY in which order the events occurred.
  • 22. 同步系統時間 TB B B‟s clock time TA TA+Ttrans A A‟s clock time Ttrans real time Tmin < Ttrans < Tmax Ttrans= (Tmin+ Tmax)/2 is at most wrong by (Tmin- Tmax)/2 If A sends its clock time TA to B → B can set its clock to TA + (Tmin+ Tmax)/2 → then A and B are synchronized with bound (Tmin- Tmax)/2 Tmin (Tmin+ Tmax)/2 Tmax Ttrans (Tmin- Tmax)/2(Tmin- Tmax)/2
  • 23. 非同步系統時間 TB TB +Tround/2 B B‟s clock time TA TA+Ttrans T‟A A A‟s clock time Tround  In asynchronous system, we have no Tmax  How can A synchronize with B?  By using the round-trip time Tround=TA-T‟A in Cristian‟s algorithm: TB= TB+ Tround/2
  • 24. JAVA RMI (External Clock Synchronize)
  • 25. JAVA RMI (External Clock Synchronize)  import java.rmi.*; public interface Clock extends Remote{ String getTime() throws RemoteException; }  import java.rmi.*; import java.rmi.server.*; import java.util.*; public class ClockImpl extends UnicastRemoteObject implements Clock { public ClockImpl() throws RemoteException { super(); } public String getTime() { Date d = new Date(); return d.toString(); } }
  • 26. JAVA RMI (External Clock Synchronize)  import java.rmi.*; public class ClockServer { public ClockServer() { try { Clock c = new ClockImpl(); Naming.rebind("//localhost/ClockService",c); } catch (Exception e) { System.out.print(e.getMessage()); } } public static void main(String args[]) { new ClockServer(); } }
  • 27. JAVA RMI (External Clock Synchronize)  import java.rmi.*; import*; public class ClockClient { public static void main(String args[]) { try { Clock c = (Clock)Naming.lookup("//localhost/ClockService"); System.out.println(c.getTime()); } catch (Exception e) { System.out.print(e.getMessage()); } } }
  • 28. Logical time  One aspect of clock synchronization is to provide a mechanism whereby systems can assign sequence numbers (“timestamps”) to messages upon which all cooperating processes can agree.  Leslie Lamport (1978) showed that clock synchronization need not be absolute and L. Lamport„s two important points lead to “causality”  First point:  If two processes do not interact, it is not necessary that their clocks be synchronized  they can operate concurrently without fear of interferring with each other  Second (critical) point:  It is not important that all processes agree on time, but rather, that they agree on the order in which events occur  Such “clocks” are referred to as Logical Clocks  Logical time is based on happens-before relationship
  • 29. 事件序列 Event Ordering  Happens before and concurrent events illustrated No causal path neither from e1 to e2 nor from e2 to e1 e1 and e2 are concurrent from e1 to e6 nor from e6 to e1 e1 and e6 are concurrent from e2 to e6 nor from e6 to e2 e2 and e6 are concurrent Types of events Send Receive Internal (change of state)
  • 30. 協調 Co-ordination  對於分散式系統的困難點  Centralised solutions not appropriate  communications bottleneck  Fixed master-slave arrangements not appropriate  process crashes  Varying network topologies  ring, tree, arbitrary; connectivity problems  Failures must be tolerated if possible  link failures  process crashes  Impossibility results  in presence of failures, esp asynchronous model
  • 31. Mutual Exclusion  要求  Safety  At most one process may execute in CS at any time  Liveness  Every request to enter and exit a CS is eventually granted  Ordering (desirable)  Requests to enter are granted according to causality order (FIFO) Synchronization Centralized Distributed scheme Based on mutual Central Circulating exclusion process token No mutual Physical Clock Physical clocks exclusion Event Count Logical clocks
  • 32. Mutual Exclusion  執行分三大類  Centralized Approach  P1有意進入Critical Section時→傳遞一個意願訊息Request→C接受意願訊息Request → 若Critical Section允許Process進入→傳遞一個允許訊息Reply→P1就能進入  此時當P2也有意願進行Critical Section →C將P2之意願訊息置入至Waiting Queue  當P1離開臨界區時→傳遞一個釋出訊息Release至C→C將傳遞一個允許訊息Reply至Waiting Queue中的下一個意訊願訊息的擁有者Process  Distributed Approach  比較Timestamp  要知道網路上所有Node的Name及也要將本身的Name告知其它節點,降低增加節點的頻率  當Node故障,系統應立刻通知其它Node且進行修復後,故應經常維護各Node正常運作  Process未進入Critical Section,必會頻頻停頓等待其他Process之操作  Token Passing Approach  適當的路徑,避免Node發生Starvation  若Token遺失,系統應重新設定一個Token補救  若路徑有Node故障,系統應重組最佳新路徑
  • 34. Two-Phase Commit Protocol  prepare(T) <prepare T> ready(T) abort(T) <ready T> <no T>
  • 35. Two-Phase Commit Protocol  commit(T) abort(T) <commit T> <abort T> acknowledge(T) acknowledge(T) <complete T>
  • 38. Deadlock Prevention and Avoidance  資源編碼演算法Resources Ordering Algorithm  將網路上所有的資料源依我們想像的工作進行Global Resources- ordering ,並給予唯一的編號  當某Process當時正佔有資源i時,不得再對於小於i的資源提出要求,如此 可降低循環等待的機會  Simple to implement; requires little overhead  銀行家演算法Banker‟s Algorithm  分散式系統選出一個最適當的Process擔任銀行家Banker,管理網路上所有 的資源及對商上各Process作最適當的資源分配  (New)時間戳記優先演算法Timestamp Priority Algorithm  網路上所有Process的TS均設定為各Process之Priority Number  TS愈小的Process其優先等級愈高(愈早發生)  唯有優先等級較高的Process,可以向優先等級低的提出資源要求
  • 39. Timestamp Priority Algorithm  TR=5 TR=10 TR=10 TR=15
  • 40. Deadlock Detection 區域等待圖Local Wait For Graph 全域等待圖Global Wait For Graph  集中式執行Centralized Approach  分散式執行Distributed Approach
  • 42. 複雜度測量  Computational Rounds  同步將以計時器度量回合數  非同步演算法將以透過網路散播事件的次數waves來決 定回合數  Local Running Time  Spaced  Global→所有電腦使用空間的總和  Local→每台電腦需要使用多少空間  Message complexity  電腦傳送的總訊息數  訊息M透過p個邊傳輸→訊息複雜度為p|M|,|M|代表M的長度
  • 43. 基本分散式演算法  Ring Leader  Tree Leader  BFS  MST
  • 44. Ring Leader  每Process將它的id傳送到環狀裡的下一個Process 之後的回合裡,每個Process將執行如下的計算:  從上一個Process收到一個識別號碼id  將id與自己的識別號碼比較  把兩值之中的最小值,傳送到環狀裡的下一個Process
  • 45. Algorithm RingLeader(id): Input:The unique identifier, id, for the processor running Output:The smallest identifier of a processor in the ring M←[Candidate is id] Send message M to the successor processor in the ring done←false repeat Get message M from the predecessor processor in the ring. if M=[Candidate is i] then if i=id then M←[Leader is id] done←true
  • 46. Algorithm else m←min{i,id} M←[Candidate is m] else {M is a “Leader is” message} done←true Send message M to the next processor in the ring until done return M
  • 47. Analysis  Computational Rounds  O(2N)  Local Running Time  O(N)  Local Spaced  O(1)  Message Complexity  O(N2)
  • 48. Tree Leader  假設網路是一個自由樹狀圖  自然起始點  外部節點  非同步  訊息檢查Message Check  特定邊是否已送出訊息且到達該節點  二階段  Accumulation Phase  id自樹的外部節點流入,記錄最小id的節點  找出Leader  Broadcast Phase  廣播Leader id至各外部節點
  • 49. Algorithm TreeLeader(id): Input:The unique identifier, id, for the processor running Output:The smallest identifier of a processor in the ring {Accumulation Phase} Let d be the number of neighbors of processor id m ←0 {counter for messages received} ℓ ←id {tentative leader} repeat {begin a new round} for each neighbor j do check if a message from processor j has arrived if a message M = [Candidate is i] from j has arrived then ℓ←min{i. ℓ} m←m+1
  • 50. Algorithm until m > d-1 if m=d then M←[Leader is ℓ] for each neighbor i≠k do send message M to processor j return M {M is a “leader is ” message} else M←[Candidate is ℓ] send M to the neighbor k that has not sent a message yet
  • 51. Algorithm {Broadcast Phase} repeat {begin a new round} check if a message from processor k has arrived if a message M from k has arrived then m←m+1 if M=[Candidate is i] then ℓ←min{i,ℓ} M←[Leader is ℓ] for each neighbor j do send message M to process j
  • 52. Algorithm else {M is a “leader is” message} for each neighbor j≠k do send message M to processor j until m=d return M {M is a “leader is” message}
  • 53. Analysis • di為處理器i的相鄰Process之數量  Computational Rounds  O(D)  Local Running Time  O(diD)  Local Spaced  O(di)  Message Complexity  O(N)
  • 54. Tree Leader  同步  一塊石頭被丟池塘內後引起的漣漪  直徑Diameter為圖中任兩個節點之間最長之路徑之長度  回合數為Diameter  二階段  Accumulation Phase:中心  Broadcast Phase:向外傳播
  • 55. Breadth-first Search  認定s為source node  同步  以波wave的型態向外散播  一層層由上往下建構BFS Tree  每部節點v傳送訊息給先前沒有與v有所接觸的鄰居  任一節點v必須選擇另一個節點v當父節點
  • 56. Algorithm SynchronousBFS(v,s): Input: The identifier v of the node (processor) executing this algorithm and the identifier s of the start node of the BFS traversal Output: For each node v, its parent in a BFS tree rooted at s repeat {begin a new round} if v=s or v has received a message from one of its neighbors then set parent(v) to be a node requesting v to become its child (or null, if v=s) for each node w adjacent to v that has not contacted v yet do send a message to w asking w to become a child of v until v=s or v has received a message
  • 57. Analysis  n個節點,m個邊  Computational Rounds  Local Running Time  Local Spaced  Message complexity  O(n+m)
  • 58. Breadth-first Search  非同步  要求每個處理器知道在網路中的Process總數  根節點s送出的一個「脈衝」訊息,來觸發其他Process 開始進行整體計算的下一回合  合併  向下脈衝從根節點s傳遞至BFS Tree  向上脈衝從BFS Tree的外部節點一直到根節點s  先收到向上脈衝信號之後, 才會發出一個新的向下脈衝信號
  • 59. Algorithm AsynchronousBFS(v,s): Input: The identifier v of the node (processor) executing this algorithm and the identifier s of the start node of the BFS traversal Output: For each node v, its parent in a BFS tree rooted at s C←ø {verified BFS children for v} set A to be the set of neighbors of v repeat {begin a new round} if parent(v) is defined or v=s then if parent(v) is defined then wait for pulse-down message from parent(v)
  • 60. Algorithm
              if C is not empty then
                  {v is an internal node in the BFS tree}
                  send a pulse-down message to all nodes in C
                  wait for a pulse-up message from all nodes in C
              else
                  {v is an external node in the BFS tree}
                  for each node u in A do
                      send a make-child message to u
  • 61. Algorithm
                  for each node u in A do
                      get a message M from u and remove u from A
                      if M is an accept-child message then
                          add u to C
              send a pulse-up message to parent(v)
          else
              {v ≠ s has no parent yet}
              for each node w in A do
                  if w has sent v a make-child message then
                      remove w from A {w is no longer a candidate child for v}
  • 62. Algorithm
                      if parent(v) is undefined then
                          parent(v) ← w
                          send an accept-child message to w
                      else
                          send a reject-child message to w
      until (v has received message done) or (v = s and has pulsed-down n-1 times)
      send a done message to all the nodes in C
  • 63. Analysis
      n nodes, m edges
      Computational Rounds
      Local Running Time
      Local Space
      Message Complexity  O(n² + m)
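  • 64. Minimum Spanning Tree
      Builds on the efficient sequential formulation of Barůvka's algorithm for finding the MST
      Distributed Barůvka algorithm in the synchronous model
          Determine all connected components
          For each connected component, find the edge of minimum weight
          Add that edge to join the component to another component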
  • 65. Kruskal Algorithm
      Algorithm KruskalMST(G):
      Input: A simple connected weighted graph G with n vertices and m edges
      Output: A minimum spanning tree T for G
      for each vertex v in G do
          define an elementary cluster C(v) ← {v}
      initialize a priority queue Q to contain all edges in G, using the weights as keys
      T ← ø
  • 66. Kruskal Algorithm
      while T has fewer than n-1 edges do
          (u,v) ← Q.removeMin()
          Let C(v) be the cluster containing v
          Let C(u) be the cluster containing u
          if C(v) ≠ C(u) then
              Add edge (v,u) to T
              Merge C(v) and C(u) into one cluster, that is, union C(v) and C(u)
      return tree T
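A minimal sequential sketch of KruskalMST in C follows. It makes two substitutions that should be read as assumptions rather than as the slides' method: the priority queue Q is replaced by sorting the edge array once with `qsort`, and the clusters C(v) are kept in a small union-find array. The names `Edge`, `find_root`, `by_weight` and `kruskal_mst`, and the limit of 64 vertices, are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int u, v, w; } Edge;   /* endpoints and weight */

/* Returns the representative of the cluster containing x (with path halving). */
static int find_root(int *cluster, int x)
{
    while (cluster[x] != x) {
        cluster[x] = cluster[cluster[x]];
        x = cluster[x];
    }
    return x;
}

/* qsort comparator: ascending edge weight. */
static int by_weight(const void *a, const void *b)
{
    const Edge *ea = a, *eb = b;
    return (ea->w > eb->w) - (ea->w < eb->w);
}

/* Sorts edges[] in place, writes the n-1 tree edges to tree[],
 * and returns the total weight of the spanning tree. */
int kruskal_mst(int n, Edge *edges, int m, Edge *tree)
{
    int cluster[64];
    int count = 0, total = 0;

    assert(n > 0 && n <= 64);
    for (int v = 0; v < n; v++) cluster[v] = v;       /* C(v) <- {v} */
    qsort(edges, (size_t)m, sizeof *edges, by_weight);

    for (int i = 0; i < m && count < n - 1; i++) {    /* plays the role of removeMin */
        int cu = find_root(cluster, edges[i].u);
        int cv = find_root(cluster, edges[i].v);
        if (cu != cv) {                               /* C(u) != C(v) */
            tree[count++] = edges[i];
            total += edges[i].w;
            cluster[cu] = cv;                         /* merge the two clusters */
        }
    }
    assert(count == n - 1);                           /* holds when G is connected */
    return total;
}
```

For a four-vertex graph with edges (0,1,1), (1,2,2), (2,3,3), (0,3,4) and (0,2,5), the three lightest edges already connect everything, so the returned weight is 6.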
  • 67. Analysis
      n nodes, m edges
      Computational Rounds  O(log n)
      Local Running Time
      Local Space  O(m)
      Message Complexity  O(m log n)
  • 69. Synchronization Algorithms
      Multicast
      Uses a central time server to synchronize clocks
      Cristian's algorithm (centralised)
      Berkeley algorithm (centralised)
      The Network Time Protocol (decentralised)
  • 70. Cristian's Algorithm (1989)
      Uses a time server to synchronize clocks; the server holds the reference time
      Clients ask the time server for the time
          The polling period depends on the maximum clock drift and the accuracy required
      Clients receive the value and may:
          use it as it is
          add the known minimum network delay
          add half the time between this send and receive
      For links with symmetrical latency:
          RTT = resp.-received-time − req.-sent-time
          adjusted-local-time =
              server-timestamp + minimum network delay, or
              server-timestamp + (RTT / 2), or
              server-timestamp + (RTT − server-latency) / 2
          local-clock-error = adjusted-local-time − local-time
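The last adjustment formula can be worked through in a few lines of C. This is a sketch only: the function names and the choice of milliseconds stored in `long` are assumptions for illustration, and symmetric link latency is taken for granted as on the slide.

```c
#include <assert.h>

/* All times are in milliseconds (an assumption of this sketch).
 * Implements adjusted-local-time = server-timestamp + (RTT - server-latency) / 2. */
long cristian_adjusted_time(long req_sent, long resp_received,
                            long server_timestamp, long server_latency)
{
    long rtt = resp_received - req_sent;   /* RTT measured on the client clock */
    assert(server_latency >= 0 && rtt >= server_latency);
    return server_timestamp + (rtt - server_latency) / 2;
}

/* local-clock-error = adjusted-local-time - local-time */
long cristian_clock_error(long adjusted_local_time, long local_time)
{
    return adjusted_local_time - local_time;
}
```

For example, a request sent at local time 1000 and answered at 1020 gives RTT = 20. With a server timestamp of 5000 and 4 ms spent inside the server, the adjusted time is 5000 + (20 − 4) / 2 = 5008, so the local clock (1020 at that moment) is 3988 ms behind.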
  • 71. Berkeley algorithm (Gusella & Zatti, 1989)
      if no machines have receivers, …
      The Berkeley algorithm uses a designated server to synchronize
      The designated server polls or broadcasts to all machines for their time, adjusts the times received for RTT and latency, averages the times, and tells each machine how to adjust
      Polling is done using Cristian's algorithm
      The average time is more accurate, but still drifts
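The averaging step can be sketched as follows. The sketch assumes that the polled clock values have already been corrected for RTT and latency, that the server's own clock is included in the array, and that integer division is precise enough; the name `berkeley_offsets` is hypothetical.

```c
#include <assert.h>

/* clocks[i] is the (already RTT-corrected) time reported by machine i.
 * Writes into offsets[i] the amount machine i should add to its clock
 * and returns the average time. */
long berkeley_offsets(const long *clocks, int n, long *offsets)
{
    long sum = 0;
    assert(n > 0);
    for (int i = 0; i < n; i++) sum += clocks[i];
    long avg = sum / n;
    for (int i = 0; i < n; i++) offsets[i] = avg - clocks[i];  /* may be negative */
    return avg;
}
```

With clocks of 180, 205 and 170 (minutes past midnight, say), the average is 185 and the machines are told to adjust by +5, −20 and +15. Sending an offset rather than the absolute time means the instruction stays valid even if it is delayed in transit.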
  • 72. Network Time Protocol
      NTP is the best known and most widely implemented decentralised algorithm
      Used for time synchronization on the Internet
      [Figure: server hierarchy]
          1  Primary server, direct synchronization
          2  Secondary server, synchronized by the primary server
          3  Tertiary server, synchronized by the secondary server
  • 74. Assumptions
      Each pair of processes is connected by reliable channels (such as TCP)
      Messages are eventually delivered to the recipient's input buffer
      Processes will not fail
      There is agreement on how a resource is identified
          Pass the identifier with requests
  • 75. Exclusive Access Algorithm
      Centralized Algorithm
      Token Ring Algorithm
      Lamport Algorithm (Timestamp Approach)
      Ricart & Agrawala Algorithm
      Leader Election Algorithms
          Bully Algorithm
          Ring Algorithm
          Chang & Roberts Algorithm
          Itai & Rodeh Algorithm
  • 76. Centralized Algorithm
      Operations: Request(R), Grant(R), Release(R)
      Process P
          1. Request resource: send a request to the coordinator to enter the CS
          2. Wait for response
          3. Receive grant
          4. Access resource
          5. Release resource: send a release message to inform the coordinator
      Coordinator C
          Grants permission to enter the CS
          Keeps a queue of requests to enter the CS
      Safety, liveness and order are guaranteed
      Delay
          Client and Synchronization: one round-trip time (release + grant)
      [Figure: coordinator with its queue of requests, exchanging Request, Grant and Release messages with processes P1 to P4]
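The coordinator's bookkeeping can be sketched as a small state machine. Network messages are replaced by plain function calls here, and the names `Coordinator`, `coord_request`, `coord_release`, `NOBODY` and the queue limit are all hypothetical; the sketch only shows how a FIFO queue yields the ordering guarantee.

```c
#include <assert.h>

#define MAX_WAITING 16   /* hypothetical queue capacity */
#define NOBODY (-1)

typedef struct {
    int holder;               /* process currently in the CS, or NOBODY */
    int queue[MAX_WAITING];   /* circular FIFO of waiting process ids */
    int head, count;
} Coordinator;

void coord_init(Coordinator *c)
{
    c->holder = NOBODY;
    c->head = 0;
    c->count = 0;
}

/* Request(R): returns 1 if the grant is sent immediately, 0 if the request is queued. */
int coord_request(Coordinator *c, int pid)
{
    if (c->holder == NOBODY) {
        c->holder = pid;
        return 1;
    }
    assert(c->count < MAX_WAITING);
    c->queue[(c->head + c->count) % MAX_WAITING] = pid;
    c->count++;
    return 0;
}

/* Release(R): returns the process that is granted the CS next, or NOBODY. */
int coord_release(Coordinator *c, int pid)
{
    assert(c->holder == pid);          /* only the holder may release */
    if (c->count == 0) {
        c->holder = NOBODY;
        return NOBODY;
    }
    c->holder = c->queue[c->head];     /* oldest waiting request wins */
    c->head = (c->head + 1) % MAX_WAITING;
    c->count--;
    return c->holder;
}
```

If P1 requests first and P4 and P2 follow while P1 is in the CS, then P1's release grants the resource to P4, and P4's release grants it to P2, in arrival order.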
  • 77. Token Ring Algorithm
      Operations
          For each CS a token is used
          Only the process holding the token can enter the CS
          To exit the CS, the process sends the token on to its neighbor
          If a process does not need to enter the CS when it receives the token, it forwards the token to the next neighbor
      Only one process holds the token at any time, which guarantees mutual exclusion
      The ring order is well defined, so starvation cannot occur
      If the token is lost (e.g. the process holding it died), it must be regenerated
      Safety and liveness are guaranteed, but ordering is not
      Delay
          Client: 0 to N message transmissions
          Synchronization: between one process's exit from the CS and the next process's entry there are between 1 and N message transmissions
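How far the token travels before somebody uses it can be illustrated with a tiny helper. The ring is assumed to be the index order 0, 1, ..., n−1 wrapping around, `wants[i]` is a hypothetical flag meaning process i currently needs the CS, and the function name is invented for this sketch.

```c
#include <assert.h>

/* Returns the number of token transmissions needed until the token reaches
 * a process that wants the CS, starting at the current holder.
 * Returns 0 if the holder itself wants the CS and -1 if nobody does. */
int token_hops(const int *wants, int n, int holder)
{
    assert(n > 0 && holder >= 0 && holder < n);
    for (int k = 0; k < n; k++) {
        if (wants[(holder + k) % n]) return k;   /* k forwards along the ring */
    }
    return -1;                                   /* the token just keeps circulating */
}
```

With five processes, the token at process 1 and only process 0 waiting, the token must be forwarded four times (1→2→3→4→0), which is why the delay depends on the position in the ring and not on when the request was made.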
  • 78. Lamport Algorithm
      A total ordering of requests is established by logical timestamps
      Each process maintains a request queue (mutual exclusion requests)
      Requesting the CS, Pi
          multicasts "request" (i, Ti) to all processes (Ti is the local Lamport time)
          places the request on its own queue
          waits until all processes "reply"
      Entering the CS, Pi
          has received a message (ack or release) from every other process with a timestamp larger than Ti
      Releasing the CS, Pi
          removes the request from its queue
          sends a timestamped release message
          This may cause the receiver's own entry to have the earliest timestamp in the queue, enabling it to access the critical section
  • 79. Ricart & Agrawala Algorithm
      Uses reliable multicast and logical clocks
      A process wants to enter the critical section
          Compose a message containing
              Identifier (machine ID, process ID)
              Name of the resource
              Current time
          Send the request to all processes and wait until everyone gives permission
      When a process receives a request
          If the receiver is not interested → send OK to the sender
          If the receiver is in the critical section → do not reply; add the request to the queue
          If the receiver has just sent a request as well:
              Compare the timestamps of the received and sent messages → the earliest wins
              If the receiver is the loser, it sends OK; if the receiver is the winner, it does not reply and queues the request
      When done with the critical section → send OK to all queued requests
  • 80. Ricart & Agrawala Algorithm
      On initialization
          state := RELEASED;
      To enter the critical section
          state := WANTED;
          Multicast request to all processes;    {request processing deferred here}
          T := request's timestamp;
          Wait until (number of replies received = (N − 1));
          state := HELD;
      On receipt of a request <Ti, pi> at pj (i ≠ j)
          if (state = HELD) or ((state = WANTED) and ((T, pj) < (Ti, pi))) then
              queue request from pi without replying;
          else
              reply immediately to pi;
      To exit the critical section
          state := RELEASED;
          reply to any queued requests;
  • 81. Ricart & Agrawala Algorithm
      Safety, liveness, and ordering are guaranteed
      It takes 2(N-1) messages per entry operation (N-1 multicast requests + N-1 replies); N messages if the underlying network supports multicast [3(N-1) in Lamport's algorithm]
      Delay
          Client: one round-trip time
          Synchronization: one message transmission time
      [Figure: P1, P2 and P3; P1's request carries timestamp 87 and P2's request carries timestamp 78]
          P2 does not send a reply to P1 yet, because P1's timestamp is larger than P2's
          P2 changes to "held"
          P1 remains in "wanted" until P2 sends its reply
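The receive rule of the Ricart & Agrawala pseudocode comes down to one comparison of (timestamp, process id) pairs. A minimal sketch in C, with hypothetical names `State`, `precedes` and `should_defer`, and with ties on the timestamp broken by the lower process id:

```c
#include <assert.h>

typedef enum { RELEASED, WANTED, HELD } State;

/* Returns 1 if the pair (t1, p1) is ordered before (t2, p2):
 * lower timestamp first, ties broken by the lower process id. */
int precedes(long t1, int p1, long t2, int p2)
{
    return t1 < t2 || (t1 == t2 && p1 < p2);
}

/* Decision taken at pj on receipt of a request <ti, pi>.
 * own_t is the timestamp T of pj's own pending request (used only when WANTED).
 * Returns 1 to queue the request without replying, 0 to reply immediately. */
int should_defer(State state, long own_t, int pj, long ti, int pi)
{
    assert(pi != pj);
    return state == HELD || (state == WANTED && precedes(own_t, pj, ti, pi));
}
```

Applied to the figure's numbers: P2 (own timestamp 78) receives P1's request with timestamp 87 and defers it, while P1 (own timestamp 87) receives P2's request with timestamp 78 and replies at once, so P2 enters first.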
  • 82. Leader Election Algorithms
      The problem
          N processes, which may or may not have unique IDs (UIDs)
          For simplicity, assume no crashes
          A unique master coordinator must be chosen among the processes
      Requirements
          Every process knows P, the identity of the leader, where P is a unique process id (usually the maximum) or is as yet undefined
          All processes participate and eventually discover the identity of the leader (it cannot remain undefined)
          When a coordinator fails, the algorithm must elect the active process with the largest priority number
      Two types of algorithms
          Bully: "the biggest guy in town wins"
          Ring: a logical, cyclic grouping
  • 83. Bully Algorithm
      Assumptions
          Synchronous system
          All messages arrive within T_trans units of time
          A reply is dispatched within T_process units of time of the receipt of a message
          If no response is received within 2 T_trans + T_process, the node is assumed to be dead
      If a process knows that it has the highest id, it elects itself coordinator and sends a coordinator message to all processes with lower ids
      When a process P notices that the coordinator has not responded to requests for too long, it initiates an election
      When P holds an election, it sends an election message to the other processes (those with higher ids)
          If nobody responds, P becomes the coordinator
          If a higher-numbered process answers, P's job is done
  • 84. Bully Algorithm
      Performance
      Best case scenario: the process with the second highest id notices the failure of the coordinator and elects itself
          N-2 coordinator messages are sent
          Turnaround time is one message transmission time
      Worst case scenario: the process with the lowest id detects the failure
          N-1 processes altogether begin elections, each sending messages to processes with higher ids
          The message overhead is O(N²)
          Turnaround time is approximately 5 message transmission times
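The best-case and worst-case message counts can be checked with a small counting sketch. It assumes process ids are exactly 0..n−1, that every live process reached by an election message starts its own election, and that answer messages are not counted; `BullyResult` and `bully_elect` are hypothetical names.

```c
#include <assert.h>

typedef struct {
    int coordinator;        /* id of the elected process */
    int election_msgs;      /* election messages sent in total */
    int coordinator_msgs;   /* coordinator announcements sent by the winner */
} BullyResult;

/* alive[i] is nonzero if process i is running; starter detected the failure. */
BullyResult bully_elect(const int *alive, int n, int starter)
{
    BullyResult r = { starter, 0, 0 };
    assert(starter >= 0 && starter < n && alive[starter]);

    /* The starter and every live process above it hold an election. */
    for (int p = starter; p < n; p++) {
        if (!alive[p]) continue;          /* dead processes never answer */
        r.election_msgs += n - 1 - p;     /* one message to every higher id */
        r.coordinator = p;                /* highest live id seen so far */
    }
    r.coordinator_msgs = r.coordinator;   /* one message per lower id */
    return r;
}
```

With N = 5 and the old coordinator (id 4) dead, an election started by id 3 produces a single election message and N − 2 = 3 coordinator messages, while one started by id 0 produces 4 + 3 + 2 + 1 = 10 election messages, which grows as O(N²).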
  • 85. Ring Algorithm
      No token is used in this algorithm
      When the algorithm ends, every process holds an active list (consisting of the priority numbers of all active processes in the system)
      If process Pi detects a coordinator failure, it creates a new, initially empty active list, sends the message elect(i) to its right neighbor, and adds the number i to its active list
      If Pi receives a message elect(j) from the process on its left, it must respond in one of three ways:
          If this is the first elect message it has seen or sent, Pi creates a new active list with the numbers i and j, then sends the message elect(i) followed by elect(j)
          If i ≠ j, Pi adds j to its active list and forwards the message to its right neighbor
          If i = j, Pi has received the message elect(i); the active list for Pi now contains the numbers of all the active processes in the system, and Pi can determine the largest number in the active list to identify the new coordinator process
  • 86. Chang & Roberts Algorithm
      Assume
          Unidirectional ring
          Asynchronous system
          Each process has a UID
      Election
          Initially each process is a non-participant
          Determine the leader (election message):
              The initiator becomes a participant and passes its own UID on to its neighbour
              When a non-participant receives an election message, it forwards the maximum of its own and the received UID and becomes a participant
              A participant does not forward the election message
          Announce the winner (elected message):
              When a participant receives an election message with its own UID, it becomes the leader and a non-participant, and forwards its UID in an elected message
              Otherwise, a process records the leader's UID, becomes a non-participant and forwards the message
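The election phase can be traced with a short simulation for the case of a single initiator. Two assumptions are worth stating: UIDs are unique, and a participant still passes on a UID larger than its own and only drops smaller ones, which is the usual reading of the algorithm and slightly more permissive than the slide's one-line rule. The function name and the 64-process limit are hypothetical, and the elected messages of the announcement phase are not modelled.

```c
#include <assert.h>

/* Simulates the election phase on a unidirectional ring 0 -> 1 -> ... -> n-1 -> 0
 * with one initiator. Returns the index of the leader and stores the number of
 * election messages in *msgs. */
int chang_roberts(const int *uid, int n, int initiator, int *msgs)
{
    int participant[64] = {0};
    int pos = initiator;
    int carried = uid[initiator];      /* UID inside the election message */

    assert(n > 0 && n <= 64 && initiator >= 0 && initiator < n);
    participant[pos] = 1;
    *msgs = 0;

    for (;;) {
        pos = (pos + 1) % n;           /* the message moves to the next neighbour */
        (*msgs)++;
        if (carried == uid[pos]) return pos;   /* own UID came back: leader */
        if (!participant[pos]) {
            participant[pos] = 1;
            if (uid[pos] > carried) carried = uid[pos];  /* forward the maximum */
        }
        /* A participant forwards the larger UID unchanged. With one initiator
         * the carried UID is never smaller than a participant's own. */
    }
}
```

For UIDs 3, 7, 2, 5 with process 0 as initiator, the message picks up 7 at process 1, completes the ring, and returns to process 1 after five transmissions, making process 1 the leader.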
  • 87. Itai & Rodeh Algorithm
      Assume
          Unidirectional ring
          Synchronous system
          Processes do not have UIDs
      Election
          Each process selects an ID at random from the set {1,..,K}
              Not necessarily unique, but fast
          Processes pass all IDs around the ring
          After one round, if a unique ID exists, elect the process with the maximum unique ID
          Otherwise, repeat
      How do we know the algorithm terminates?
          From probability: if you keep flipping a fair coin, then after a run of heads you will eventually get tails with probability 1
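One round of the election, and the repeat-until-unique loop around it, can be sketched as follows. The ring traversal is not modelled: the sketch assumes every process has already seen all IDs of the round. The function names are hypothetical, and `rand()` from the C standard library stands in for each process's private random choice.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the index of the process holding the largest ID that occurs
 * exactly once in ids[], or -1 if every ID is duplicated. */
int max_unique_id(const int *ids, int n)
{
    int winner = -1;
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        int copies = 0;
        for (int j = 0; j < n; j++) {
            if (ids[j] == ids[i]) copies++;
        }
        if (copies == 1 && (winner < 0 || ids[i] > ids[winner])) winner = i;
    }
    return winner;
}

/* Repeats rounds until some ID is unique; returns the winner's index and
 * stores the number of rounds used in *rounds. */
int itai_rodeh_elect(int n, int k, int *rounds)
{
    int ids[64];
    assert(n > 0 && n <= 64 && (n == 1 || k >= 2));
    *rounds = 0;
    for (;;) {
        for (int i = 0; i < n; i++) ids[i] = rand() % k + 1;  /* pick from {1..K} */
        (*rounds)++;
        int w = max_unique_id(ids, n);
        if (w >= 0) return w;      /* a unique ID exists: election done */
    }
}
```

For the IDs 4, 4, 2, 3, 3 the only unique value is 2, so the process at index 2 wins even though larger IDs exist. A larger K makes a repeat round less likely, at the cost of longer IDs in each message.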