Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications
Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan
About Chord

•   Protocol and algorithm for distributed hash
    table (DHT)

•   Structured-decentralized P2P overlay
    network
About Chord
•   Supports just one operation: lookup(key)

    •   Given a key, it maps the key onto a node

    •   Returns the IP-address of the node

•   Associate a value with each key and store
    the pair on the resulting node:

    •   Distributed hash table

•   Uses consistent hashing for this operation
Chord properties (1)
•   Load balancing:

    •   spreads keys evenly over nodes

•   Decentralized:

    •   fully distributed, all nodes equally
        important

    •   does not take into account heterogeneity
        of nodes
Chord properties (2)
•   Scalability:
    •   lookup cost: O(log(N))
•   Availability:
    •   automatically updates lookup tables as
        nodes join, leave, fail
•   Flexible naming:
    •   Flat key-space, no constraint on structure
        of keys
Traditional hashing

•   Given an object, assign it to a bin from a set
    of N bins

•   Each bin: roughly the same number of objects

•   Problem:

    •   If N changes, nearly all objects need to be
        reassigned
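
A minimal sketch of the reassignment problem, assuming the common "hash modulo N" bin assignment (the slide does not name a specific scheme). The object names and the helper names `bin_for` and `fraction_moved` are made up for illustration.

```python
import hashlib


def bin_for(obj: str, n_bins: int) -> int:
    """Traditional hashing: hash the object, then reduce modulo the bin count."""
    digest = hashlib.sha1(obj.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_bins


def fraction_moved(objects, old_n: int, new_n: int) -> float:
    """Fraction of objects whose bin changes when the bin count changes."""
    moved = sum(1 for o in objects if bin_for(o, old_n) != bin_for(o, new_n))
    return moved / len(objects)


# Hypothetical workload: 10,000 objects, one bin added to an existing 10.
objects = [f"object-{i}" for i in range(10_000)]
print(f"moved when N goes 10 -> 11: {fraction_moved(objects, 10, 11):.1%}")
```

An object keeps its bin only when its hash has the same remainder modulo 10 and modulo 11, which holds for roughly 1 hash in 11, so the vast majority of objects move.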
Consistent hashing
•   Evenly distributes K objects across N bins,
    about K/N each

•   When N changes:

    •   Not all objects need to be moved

    •   Only O(K/N) need to move

•   Uses a base hash function, such as SHA-1
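
The behaviour above can be sketched with a sorted list of hash points. This is an illustrative toy, not Chord itself: bin names, object names and the helpers `h`, `make_ring` and `owner` are assumptions, and full 160-bit SHA-1 values are used as ring positions.

```python
import bisect
import hashlib


def h(name: str) -> int:
    """Base hash function: SHA-1 interpreted as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def make_ring(bins):
    """Place every bin on the ring at the position given by its hash."""
    return sorted((h(b), b) for b in bins)


def owner(key: str, ring) -> str:
    """Return the bin whose point is the first at or after the key's hash."""
    points = [p for p, _ in ring]
    idx = bisect.bisect_left(points, h(key)) % len(ring)  # wrap past the end
    return ring[idx][1]


bins = [f"bin-{i}" for i in range(10)]          # hypothetical bins
keys = [f"object-{i}" for i in range(10_000)]    # hypothetical objects

before = make_ring(bins)
after = make_ring(bins + ["bin-10"])
moved = [k for k in keys if owner(k, before) != owner(k, after)]
print(f"{len(moved)} of {len(keys)} objects moved, all of them to bin-10")
```

Adding a bin only splits one arc of the ring, so every object that moves lands on the new bin and all other assignments stay untouched.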
Chord protocol
•   Assign each node and key an m-bit
    identifier using SHA-1:

    •   node: hash the IP-address

    •   key: hash the key

•   Identifiers are ordered on an identifier circle
    modulo 2^m, the Chord ring:

    •   circle of numbers from 0 to 2^m - 1
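
A small sketch of identifier assignment. It assumes the m-bit identifier is obtained by reducing the SHA-1 digest modulo 2^m, which is one way to get a short identifier for demonstration purposes. The IP address, the key and the name `chord_id` are hypothetical.

```python
import hashlib

M = 6  # identifier length in bits; small only to keep the example readable


def chord_id(data: str, m: int = M) -> int:
    """Map a node address or a key to an m-bit identifier on the ring."""
    digest = hashlib.sha1(data.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


node_id = chord_id("192.0.2.17")   # hypothetical node IP address
key_id = chord_id("report.pdf")    # hypothetical application key
print(f"node id = {node_id}, key id = {key_id} (both in 0..{2 ** M - 1})")
```

Nodes and keys share one identifier space, which is what lets a key be compared directly with node identifiers on the ring.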
Assigning keys to nodes

 •   Assigning key k to a node:

     •   first node whose identifier is equal to or
         follows the identifier of key k in the
         identifier space

     •   successor node of k: successor(k)

     •   first node clockwise from k on Chord ring
Chord ring (m=6)
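[Figure: Chord ring with m = 6. Nodes: N1, N8, N14, N21, N32, N38, N42, N48, N51, N56.
Keys and the nodes that store them: K10 at N14, K24 and K30 at N32, K38 at N38, K54 at N56.]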
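
The key assignment on the m = 6 example ring can be checked with a short sketch. It assumes a global, sorted view of all node identifiers, which a real Chord node does not have; the name `successor` mirrors the slides, the rest is illustrative.

```python
import bisect

NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # node ids on the example ring
KEYS = [10, 24, 30, 38, 54]                      # key ids on the example ring


def successor(k: int, nodes=NODES) -> int:
    """First node whose identifier is equal to or follows k on the ring."""
    idx = bisect.bisect_left(nodes, k)
    return nodes[idx % len(nodes)]  # wrap around past the highest node


assignment = {k: successor(k) for k in KEYS}
print(assignment)  # {10: 14, 24: 32, 30: 32, 38: 38, 54: 56}
```

An identifier above the highest node, such as 60, wraps around to N1.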
Minimal disruption
•   Minimal disruption when node joins/leaves

•   n joins:

    •   certain keys assigned to successor(n) are
        now assigned to n

•   n leaves:

    •   all keys assigned to n are now assigned
        to successor(n)
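
A sketch of which keys change hands on a join, using the join example from this deck (N26 joining between N21 and N32). The helper names `in_half_open` and `keys_to_transfer` are assumptions made for this illustration.

```python
def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], moving clockwise from a."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the interval wraps past zero


def keys_to_transfer(successor_keys, predecessor_id: int, new_id: int):
    """Keys the joining node takes over from its successor."""
    return [k for k in successor_keys if in_half_open(k, predecessor_id, new_id)]


# N26 joins between N21 and N32; N32 currently stores K24 and K30.
moved = keys_to_transfer([24, 30], predecessor_id=21, new_id=26)
print(moved)  # [24]
```

Only keys in the arc between the new node's predecessor and the new node move; every other node's keys are unaffected.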
Simple Chord lookup
•   Simplest Chord lookup algorithm:

    •   Each node only needs to know its
        successor on Chord ring

    •   Query for identifier id is passed around
        Chord ring until successor(id) is found

    •   Result returned along reverse path

•   Easy but not efficient: # messages O(N)
Simple Chord lookup
(a) Simple (but slow) pseudocode to find the successor node of an identifier id:

    // ask node n to find the successor of id
    n.find_successor(id)
      if (id ∈ (n, successor])
        return successor;
      else
        // forward the query around the circle
        return successor.find_successor(id);

(b) The path taken by a query from node 8 for key 54, using the pseudocode in Figure 3(a): lookup(K54) is issued at N8 and forwarded from successor to successor (N14, N21, N32, N38, N42, N48, N51) until N51 finds that K54 lies between itself and its successor N56.
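
The simple lookup can be traced in a few lines of Python. This is a single-process sketch: the dictionary `SUCC` stands in for each node's successor pointer, and the recursive call stands in for a remote procedure call.

```python
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
# Each node knows only its immediate successor on the ring.
SUCC = {n: NODES[(i + 1) % len(NODES)] for i, n in enumerate(NODES)}


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def find_successor(n: int, ident: int, path=None):
    """Walk successor pointers until ident falls in (n, successor]."""
    path = [n] if path is None else path + [n]
    succ = SUCC[n]
    if in_half_open(ident, n, succ):
        return succ, path
    return find_successor(succ, ident, path)  # forward around the circle


owner, path = find_successor(8, 54)
print(owner, path)  # 56 [8, 14, 21, 32, 38, 42, 48, 51]
```

Eight of the ten nodes handle the query, which illustrates why the message count grows linearly with the number of nodes.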
Finger table
•   Chord maintains additional routing info:

    •   not essential for correctness

    •   but allows acceleration of lookups

•   Each node maintains finger table with m
    entries:

    •   n.finger[i] = successor((n + 2^(i-1)) mod 2^m), for 1 ≤ i ≤ m
Finger table
Finger table of node N8 on the example ring (m = 6):

    Start      Finger
    N8 + 1     N14
    N8 + 2     N14
    N8 + 4     N14
    N8 + 8     N21
    N8 + 16    N32
    N8 + 32    N42

[Figure: the ring N1, N8, N14, N21, N32, N38, N42, N48, N51, N56 with arcs from N8 at distances +1, +2, +4, +8, +16 and +32.]
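
The finger table of N8 can be reproduced from the definition. The sketch assumes a global sorted list of node identifiers in order to compute `successor`; in Chord each entry would instead be filled in by a lookup.

```python
import bisect

M = 6
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]


def successor(k: int) -> int:
    """First node at or after identifier k, wrapping around the ring."""
    return NODES[bisect.bisect_left(NODES, k) % len(NODES)]


def finger_table(n: int, m: int = M) -> list:
    """finger[i] = successor((n + 2^(i-1)) mod 2^m) for i = 1..m."""
    return [successor((n + 2 ** (i - 1)) % 2 ** m) for i in range(1, m + 1)]


print(finger_table(8))   # [14, 14, 14, 21, 32, 42]
print(finger_table(42))  # [48, 48, 48, 51, 1, 14]
```

The table for N42 shows the modulo in action: 42 + 32 = 74 wraps to identifier 10, whose successor is N14.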
Improved lookup
•   Lookup for id now works as follows:

    •   If id falls in the interval (n, successor(n)], successor(n) is
        returned

    •   Otherwise, lookup is performed at node n’, which is the
        node in the finger table of n that most immediately
        precedes id

•   Since each node has finger entries at power-of-two
    intervals around the identifier circle, each node can
    forward a query at least halfway along the remaining
    distance between the node and the target identifier.

•   Thus only O(log(N)) nodes need to be contacted
Chord protocol

(a) Scalable lookup pseudocode using the finger table:

    // ask node n to find the successor of id
    n.find_successor(id)
      if (id ∈ (n, successor])
        return successor;
      else
        n′ = closest_preceding_node(id);
        return n′.find_successor(id);

    // search the local table for the
    // highest predecessor of id
    n.closest_preceding_node(id)
      for i = m downto 1
        if (finger[i] ∈ (n, id))
          return finger[i];
      return n;

(b) The path of lookup(54) started at N8, whose finger table is N8 + 1 → N14, N8 + 2 → N14, N8 + 4 → N14, N8 + 8 → N21, N8 + 16 → N32, N8 + 32 → N42. The query is forwarded from N8 to N42, then to N51, which returns N56, the successor of K54.
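
A runnable sketch of the finger-based lookup on the example ring. It is a single-process simulation under stated assumptions: `FINGERS` is precomputed from a global node list, recursion replaces remote calls, and the `between` helper and the extra guard against a node forwarding to itself are additions for this sketch.

```python
import bisect

M = 6
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]


def successor(k: int) -> int:
    return NODES[bisect.bisect_left(NODES, k) % len(NODES)]


# FINGERS[n][i] corresponds to finger[i + 1] in the pseudocode.
FINGERS = {n: [successor((n + 2 ** i) % 2 ** M) for i in range(M)] for n in NODES}


def between(x: int, a: int, b: int, inclusive_right: bool = False) -> bool:
    """Ring interval test: x in (a, b), or (a, b] if inclusive_right."""
    if x == b:
        return inclusive_right
    if a < b:
        return a < x < b
    return x > a or x < b


def closest_preceding_node(n: int, ident: int) -> int:
    """Highest finger of n that lies strictly between n and ident."""
    for f in reversed(FINGERS[n]):
        if between(f, n, ident):
            return f
    return n


def find_successor(n: int, ident: int, path=None):
    path = (path or []) + [n]
    succ = FINGERS[n][0]
    if between(ident, n, succ, inclusive_right=True):
        return succ, path
    nxt = closest_preceding_node(n, ident)
    if nxt == n:  # defensive guard, not part of the pseudocode
        return succ, path
    return find_successor(nxt, ident, path)


print(find_successor(8, 54))  # (56, [8, 42, 51])
```

Three nodes handle the query here, compared with eight for the successor-only walk on the same ring.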
Coping with joining/leaving

 •   Essential that successor pointers stay up to
     date

 •   Each node periodically runs stabilization
     protocol which updates Chord’s successor
     pointers and finger tables

 •   For node n to join, it must know a node n’ of
     the Chord ring
Coping with joining/leaving
 •   To join, n performs join()

     •   node n asks n’ to find_successor(n), which
         n sets as its successor

 •   Each node periodically performs stabilize() to
     learn about newly joined nodes

     •   n asks its successor s for s’s predecessor p

     •   decides whether p should be n’s successor
         instead
Coping with joining/leaving
•   stabilize() also notifies n’s successor s of n’s
    existence

    •   s might change its predecessor to n

•   Each node periodically calls fix_fingers()

    •   makes sure fingers are up to date

    •   is how new nodes acquire their finger tables

    •   is how old nodes add new nodes to their
        tables
Coping with joining/leaving
    // join a Chord ring containing node n′.
    n.join(n′)
      predecessor = nil;
      successor = n′.find_successor(n);

    // called periodically. verifies n’s
    // immediate successor, and tells the
    // successor about n.
    n.stabilize()
      x = successor.predecessor;
      if (x ∈ (n, successor))
        successor = x;
      successor.notify(n);

    // n′ thinks it might be our
    // predecessor.
    n.notify(n′)
      if (predecessor is nil or
          n′ ∈ (predecessor, n))
        predecessor = n′;

    // called periodically. refreshes
    // finger table entries. next stores
    // the index of the next finger to fix.
    n.fix_fingers()
      next = next + 1;
      if (next > m)
        next = 1;
      finger[next] = find_successor(n + 2^(next−1));
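
The join and stabilization steps can be simulated in one process. This sketch makes several assumptions: nodes are plain Python objects, "periodic" calls are explicit loops, `find_successor` is the simple successor walk, `fix_fingers` is left out, and a `None` check on the successor's predecessor is added because the pseudocode does not spell it out.

```python
def in_open(x, a, b):
    """x in the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def in_half_open(x, a, b):
    """x in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self      # a lone node is its own successor
        self.predecessor = None

    def find_successor(self, ident):
        node = self
        while not in_half_open(ident, node.id, node.successor.id):
            node = node.successor
        return node.successor

    def join(self, known):
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        if self.predecessor is None or in_open(
            candidate.id, self.predecessor.id, self.id
        ):
            self.predecessor = candidate


# Build a two-node ring N21 -> N32, then let N26 join via N21.
n21, n32 = Node(21), Node(32)
n32.join(n21)
for _ in range(2):
    for node in (n32, n21):
        node.stabilize()

n26 = Node(26)
n26.join(n21)
for _ in range(2):
    for node in (n26, n21, n32):
        node.stabilize()

print(n21.successor.id, n26.successor.id, n32.successor.id)  # 26 32 21
```

N26 only sets its own successor when it joins; it is N21's later stabilize() call that discovers N26 through N32's predecessor pointer and splices it into the ring.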
Coping with joining/leaving

[Figure: example illustrating the join operation. Node 26 joins the system between nodes 21 and 32. The arcs represent the successor relationship.
(a) Node 21 points to node 32, which stores K24 and K30.
(b) Node 26 finds its successor (i.e., node 32) and points to it.
(c) Node 26 copies all keys less than 26 from node 32, so K24 is now also held by N26.
(d) The successor of node 21 is updated to node 26; N32 keeps only K30.]
Coping with joining/leaving
 •   Lookup will only fail if successor pointers are
     not up to date => retry after a pause

 •   As soon as successor pointers are correct,
     lookup will work

     •   over time, finger tables will be up to date
         and linear search is no longer needed

 •   Old finger tables can still be used and in
     general lookup will remain O(log(N)) even if
     not all finger tables are up to date
Successor list
•   Lookup fails if a node doesn’t know its correct
    successor

    •   can occur when nodes fail

•   Solution: each node maintains a successor
    list:

    •   contains the node’s first r successors

    •   if first successor does not respond, try the
        rest of the list
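
A sketch of the fallback rule. The liveness check is passed in as a function because the slides do not say how a node detects an unresponsive successor; the failed-node set and the name `first_live_successor` are hypothetical.

```python
def first_live_successor(successor_list, is_alive):
    """Return the first responsive node in the successor list, or None."""
    for node in successor_list:
        if is_alive(node):
            return node
    return None


successor_list = [14, 21, 32]   # r = 3 successors of N8 on the example ring
failed = {14, 21}               # hypothetical failed nodes
nxt = first_live_successor(successor_list, lambda n: n not in failed)
print(nxt)  # 32
```

The lookup only becomes impossible if all r entries fail at once, which is why a longer list tolerates more simultaneous failures.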
Replication

•   Successor list can also be used for replication

    •   application on top of Chord might store
        replicas of data associated with a key on k
        nodes from n’s successor list

    •   Chord can inform application when this
        successor list changes, so application can
        create new replicas
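
One way an application might react to a successor-list change, as a sketch. It assumes replicas live on the first k entries of the list; the slides leave the exact placement to the application, and `replica_targets` and `new_replicas_needed` are hypothetical names.

```python
def replica_targets(successor_list, k: int):
    """Nodes that should hold a replica: the first k list entries (assumed)."""
    return successor_list[:k]


def new_replicas_needed(old_list, new_list, k: int):
    """Nodes that must receive a fresh copy after the list changes."""
    old = set(replica_targets(old_list, k))
    return [n for n in replica_targets(new_list, k) if n not in old]


# N21 fails, so the successor list of N8 shifts from [14, 21, 32] to [14, 32, 38].
print(new_replicas_needed([14, 21, 32], [14, 32, 38], k=2))  # [32]
```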
Chord in practice
•   In practice, Chord ring will never be in stable
    state:

    •   Joins and leaves occur continuously

    •   Interleaved with runs of the stabilization algorithm

    •   Not enough time to stabilize before new
        change

•   If stabilization is run at a certain rate:

    •   Chord ring will be continuously “almost
        stable” and lookup will be fast and correct
Virtual nodes
•   Chord load balancing can be improved
    through virtual nodes:

    •   node ids do not uniformly cover entire id
        space

    •   to make (# keys / node) more uniform,
        associate keys with virtual nodes

    •   map r virtual nodes, with unrelated ids, to
        each real node

•   Tradeoff: each real node needs r times as
    much space to store finger table
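
A sketch for comparing key load with and without virtual nodes. Node names, key names and the `name#index` scheme for deriving virtual identifiers are assumptions; full 160-bit SHA-1 values are used as identifiers. No output is quoted because the exact counts depend on the hash values.

```python
import bisect
import hashlib
from collections import Counter


def h(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def load_per_node(real_nodes, keys, r: int) -> Counter:
    """Count keys per real node when each real node runs r virtual nodes."""
    ring = sorted((h(f"{node}#{v}"), node) for node in real_nodes for v in range(r))
    points = [p for p, _ in ring]
    load = Counter({node: 0 for node in real_nodes})
    for key in keys:
        idx = bisect.bisect_left(points, h(key)) % len(ring)
        load[ring[idx][1]] += 1
    return load


real_nodes = [f"node-{i}" for i in range(20)]   # hypothetical real nodes
keys = [f"key-{i}" for i in range(10_000)]      # hypothetical keys

for r in (1, 50):
    load = load_per_node(real_nodes, keys, r)
    print(f"r={r}: min={min(load.values())}, max={max(load.values())}")
```

With more virtual nodes per real node, the gap between the least and most loaded node typically narrows, at the cost of r times as many ring positions to maintain.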
Network latency
•   Chord path length is O(log(N)), but latency
    can be quite large

    •   Nodes close in id space can be far away in
        the underlying network

•   Idea:

    •   choose next-hop finger based on progress
        in id space and latency in network

        •   maximize progress in id space

        •   minimize latency
Network latency
•   Different approach to improving latency:

    •   each entry in finger table can point to list of
        nodes instead of a single node

    •   nodes have similar ids and are equivalent
        for routing purposes

    •   latency of the nodes could be different

    •   use closest node in terms of network
        distance in the underlying network
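
A minimal sketch of this second approach, assuming each finger entry already carries a measured latency for every equivalent node. The node names, the millisecond values and the function name `pick_next_hop` are hypothetical.

```python
def pick_next_hop(candidates: dict) -> str:
    """Among routing-equivalent nodes, pick the one with the lowest latency."""
    return min(candidates, key=candidates.get)


# One finger entry holding three nodes with similar ids (latency in ms).
finger_entry = {"node-a": 85.0, "node-b": 12.5, "node-c": 40.0}
print(pick_next_hop(finger_entry))  # node-b
```

Because the candidates have similar ids, choosing any of them makes nearly the same progress in id space, so latency can be the deciding factor.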
Applications of Chord
•   Chord File System (CFS):

    •   Chord locates storage blocks

•   Distributed indexes:

    •   solves Napster’s central server issue
•   Large-scale combinatorial search:

    •   candidate solutions are keys, Chord maps these to
        machines which test them

•   Cooperative mirroring:

    •   Uses Chord to provide load balancing

•   Time-shared storage:

    •   Uses Chord to provide availability
?

Chord presentation

  • 1. Chord: A Scalable Peer- to-peer Lookup Protocol for Internet Applications Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek Hari Balakrishnan
  • 2. About Chord • Protocol and algorithm for distributed hash table (DHT) • Structured-decentralized P2P overlay network
  • 3. About Chord • Supports just one operation: lookup(key) • Given a key, it maps the key onto a node • Returns the IP-address of the node • Associate a value with each key and store the pair on the resulting node: • Distributed hash table • Uses consistent hashing for this operation
  • 4. Chord properties (1) • Load balancing: • spreads keys evenly over nodes • Decentralized: • fully distributed, all nodes equally important • does not take into account heterogeneity of nodes
  • 5. Chord properties (2) • Scalability: • lookup cost: O(log(N)) • Availability: • automatically updates lookup tables as nodes join, leave, fail • Flexible naming: • Flat key-space, no constraint on structure of keys
  • 6. Traditional hashing • Given an object, assign it to a bin from a set of N bins • Each bin: roughly same amount of objects • Problem: • If N changes, all objects need to be reassigned
  • 7. Consistent hashing • Evenly distributes K objects across N bins, about K/N each • When N changes: • Not all objects need to be moved • Only O(K/N) need to • Uses a base hash function, such as SHA-1
  • 8. Chord protocol • Assign each node and key an m-bit identifier using SHA-1: • node: hash the IP-address • key: hash the key • Identifiers are ordered on an identifier circle module 2m, the Chord ring: • circle of numbers from 0 to 2m-1
  • 9. Assigning keys to nodes • Assigning key k to a node: • first node whose identifier is equal to or follows the identifier of key k in the identifier space • successor node of k: successor(k) • first node clockwise from k on Chord ring
  • 10. Chord ring (m=6) N1 N56 N8 K10 K54 N51 N14 N48 N21 N42 K24 K38 N38 N32 K30
  • 11. Minimal disruption • Minimal disruption when node joins/leaves • n joins: • certain keys assigned to successor(n) are now assigned to n • n leaves: • all keys assigned to n are now assigned to successor(n)
  • 12. Simple Chord lookup • Simplest Chord lookup algorithm: • Each node only needs to know its successor on Chord ring • Query for identifier id is passed around Chord ring until successor(id) is found • Result returned along reverse path • Easy but not efficient: # messages O(N)
  • 13. Simple Chord lookup • Figure (a): simple (but slow) pseudocode to find the successor node of an identifier id:
        // ask node n to find the successor of id
        n.find_successor(id)
          if (id ∈ (n, successor])
            return successor;
          else
            // forward the query around the circle
            return successor.find_successor(id);
    • Figure (b): the path taken by a query from node 8 for key 54 (lookup(K54)), using the pseudocode in (a), on the ring N1, N8, N14, N21, N32, N38, N42, N48, N51, N56
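The simple lookup can be made runnable with local objects standing in for remote nodes. This is a sketch under that assumption: there are no real remote procedure calls, and the hop counter and the helper `in_half_open` are additions for illustration.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past zero (a == b covers the whole ring)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self  # a lone node is its own successor

    def find_successor(self, ident, hops=0):
        """Return (responsible node, number of times the query was forwarded)."""
        if in_half_open(ident, self.id, self.successor.id):
            return self.successor, hops
        # forward the query around the circle
        return self.successor.find_successor(ident, hops + 1)

ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
nodes = [Node(i) for i in ids]
for node, nxt in zip(nodes, nodes[1:] + nodes[:1]):
    node.successor = nxt  # link each node to the next one clockwise

owner, hops = nodes[1].find_successor(54)  # query issued at N8
print(f"K54 is stored on N{owner.id}, query forwarded {hops} times")
```

Starting at N8, the query visits every node up to N51 before N56 is returned, which illustrates the O(N) message cost.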
  • 14. Finger table • Chord maintains additional routing info: • not essential for correctness • but allows acceleration of lookups • Each node maintains finger table with m entries: • n.finger[i] = successor(n + 2^(i-1)), for 1 ≤ i ≤ m, with arithmetic modulo 2^m
  • 15. Finger table • Figure: finger table of node N8 on the example Chord ring (m=6): • N8 + 1 → N14 • N8 + 2 → N14 • N8 + 4 → N14 • N8 + 8 → N21 • N8 + 16 → N32 • N8 + 32 → N42
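The finger table of N8 can be recomputed from the definition n.finger[i] = successor(n + 2^(i-1)). The sketch below assumes the m=6 example ring and global knowledge of the node ids; `finger_table` is a hypothetical helper name.

```python
from bisect import bisect_left

M = 6
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

def successor(k):
    """First node id equal to or following k (mod 2^M) on the ring."""
    return NODES[bisect_left(NODES, k % 2**M) % len(NODES)]

def finger_table(n):
    """List of (i, start, node) with start = (n + 2^(i-1)) mod 2^M."""
    table = []
    for i in range(1, M + 1):
        start = (n + 2 ** (i - 1)) % 2**M
        table.append((i, start, successor(start)))
    return table

for i, start, node in finger_table(8):
    print(f"finger[{i}]: N8 + {2 ** (i - 1):2d} = {start:2d} -> N{node}")
```

The first finger is always the node's immediate successor, so the successor pointer needs no separate storage in this representation.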
  • 16. Improved lookup • Lookup for id now works as follows: • If id falls between n and successor(n), successor(n) is returned • Otherwise, the lookup is forwarded to node n’, the node in n’s finger table that most immediately precedes id • Since each node has finger entries at power-of-two intervals around the identifier circle, each node can forward a query at least halfway along the remaining distance between itself and the target identifier • Thus only O(log(N)) nodes need to be contacted
  • 17. Chord protocol • Figure: finger table of node N8 (N8 + 1 → N14, N8 + 2 → N14, N8 + 4 → N14, N8 + 8 → N21, N8 + 16 → N32, N8 + 32 → N42) and the path of lookup(54) starting at N8 • Scalable lookup pseudocode:
        // ask node n to find the successor of id
        n.find_successor(id)
          if (id ∈ (n, successor])
            return successor;
          else
            n′ = closest_preceding_node(id);
            return n′.find_successor(id);

        // search the local table for the highest predecessor of id
        n.closest_preceding_node(id)
          for i = m downto 1
            if (finger[i] ∈ (n, id))
              return finger[i];
          return n;
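The finger-based lookup can be traced on the example ring. This sketch assumes fully correct finger tables built from global knowledge and uses local function calls in place of remote procedure calls; the `path` list and the helper `between` are additions for illustration.

```python
from bisect import bisect_left

M = 6
IDS = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]

def succ_of(k):
    """First node id equal to or following k (mod 2^M)."""
    return IDS[bisect_left(IDS, k % 2**M) % len(IDS)]

def between(x, a, b, inclusive_right=False):
    """Circular interval test: x in (a, b), or in (a, b] if inclusive_right."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, ident):
        self.id = ident
        self.finger = []  # finger[0] is the immediate successor

ring = {i: Node(i) for i in IDS}
for node in ring.values():
    node.finger = [ring[succ_of(node.id + 2 ** (i - 1))] for i in range(1, M + 1)]

def closest_preceding_node(n, ident):
    """Highest finger of n that lies strictly between n and ident."""
    for f in reversed(n.finger):
        if between(f.id, n.id, ident):
            return f
    return n

def find_successor(n, ident, path):
    """Follow fingers towards ident, recording every node contacted in path."""
    path.append(n.id)
    successor = n.finger[0]
    if between(ident, n.id, successor.id, inclusive_right=True):
        path.append(successor.id)
        return successor
    return find_successor(closest_preceding_node(n, ident), ident, path)

path = []
owner = find_successor(ring[8], 54, path)
print(" -> ".join(f"N{i}" for i in path))  # N8 -> N42 -> N51 -> N56
```

Compared with walking successor pointers from N8, the query skips most of the ring by jumping to the farthest finger that does not overshoot the key.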
  • 18. Coping with joining/leaving • Essential that successor pointers stay up to date • Each node periodically runs stabilization protocol which updates Chord’s successor pointers and finger tables • For node n to join, it must know a node n’ of the Chord ring
  • 19. Coping with joining/leaving • To join, n performs join() • node n asks n’ to find_successor(n), and sets the result as its successor • Each node periodically performs stabilize() to learn about newly joined nodes • n asks its successor s for s’s predecessor p • n then decides whether p should be its successor instead
  • 20. Coping with joining/leaving • stabilize() also notifies n’s successor s of n’s existence • s might change its predecessor to n • Each node periodically calls fix_fingers() • makes sure fingers are up to date • is how new nodes acquire their finger tables • is how old nodes add new nodes to their tables
  • 21. Coping with joining/leaving • Pseudocode for join, stabilization and finger maintenance:
        // join a Chord ring containing node n′.
        n.join(n′)
          predecessor = nil;
          successor = n′.find_successor(n);

        // called periodically. verifies n’s immediate
        // successor, and tells the successor about n.
        n.stabilize()
          x = successor.predecessor;
          if (x ∈ (n, successor))
            successor = x;
          successor.notify(n);

        // n′ thinks it might be our predecessor.
        n.notify(n′)
          if (predecessor is nil or n′ ∈ (predecessor, n))
            predecessor = n′;

        // called periodically. refreshes finger table entries.
        // next stores the index of the next finger to fix.
        n.fix_fingers()
          next = next + 1;
          if (next > m)
            next = 1;
          finger[next] = find_successor(n + 2^(next−1));
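The interaction of join(), stabilize() and notify() can be simulated in a single process. The sketch below makes several assumptions not in the slides: nodes are local objects rather than remote peers, find_successor walks successor pointers instead of using fingers, fix_fingers() is omitted, a nil check is added in stabilize(), and the round-based `stabilize_all` driver is hypothetical.

```python
def in_open(x, a, b):
    """True if x lies in the circular open interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self    # a one-node ring points at itself
        self.predecessor = None

    def find_successor(self, ident):
        # Linear walk along successor pointers (no finger tables in this sketch).
        n = self
        while not (ident == n.successor.id or in_open(ident, n.id, n.successor.id)):
            n = n.successor
        return n.successor

    def join(self, known):
        self.predecessor = None
        self.successor = known.find_successor(self.id)

    def stabilize(self):
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x   # a closer successor has appeared
        self.successor.notify(self)

    def notify(self, candidate):
        if self.predecessor is None or in_open(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

def stabilize_all(nodes, rounds=3):
    """Stand-in for the periodic timer: every node stabilizes once per round."""
    for _ in range(rounds):
        for n in nodes:
            n.stabilize()

n21, n26, n32 = Node(21), Node(26), Node(32)
n32.join(n21)
stabilize_all([n21, n32])        # ring: N21 -> N32 -> N21
n26.join(n21)                    # N26 learns its successor N32 immediately
stabilize_all([n21, n26, n32])   # N21 learns about N26 via stabilize()

for n in (n21, n26, n32):
    print(f"N{n.id}: successor N{n.successor.id}, predecessor N{n.predecessor.id}")
```

Note that join() alone only fixes the new node's successor pointer; the predecessor's pointer is repaired later by its own stabilize() call.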
  • 22. Coping with joining/leaving • Figure: illustrating the join operation. Node 26 joins the system between nodes 21 and 32. The arcs represent the successor relationship. (a) Initially, node 21 points to node 32; (b) node 26 finds its successor (i.e., node 32) and points to it; (c) node 26 copies all keys less than 26 from node 32; (d) the successor of node 21 is updated to node 26.
  • 23. Coping with joining/leaving • Lookup will only fail if successor pointers are not up to date => retry after a pause • As soon as successor pointers are correct, lookup will work • over time, finger tables will be up to date and linear search is no longer needed • Old finger tables can still be used and in general lookup will remain O(log(N)) even if not all finger tables are up to date
  • 24. Successor list • Lookup fails if a node doesn’t know its correct successor • can occur when nodes fail • Solution: each node maintains a successor list: • contains the node’s first r successors • if first successor does not respond, try the rest of the list
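A minimal sketch of the failover rule, assuming the m=6 example ring, r = 3, and a set of live nodes standing in for real liveness probes. Both helper names are hypothetical.

```python
def successor_list(n, ring, r):
    """The r nodes that follow n clockwise on the ring."""
    ring = sorted(ring)
    start = ring.index(n)
    return [ring[(start + j) % len(ring)] for j in range(1, r + 1)]

def first_live_successor(candidates, alive):
    """First entry of the successor list that still responds, or None."""
    for node in candidates:
        if node in alive:  # stands in for a timeout-based liveness probe
            return node
    return None

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
n8_list = successor_list(8, ring, r=3)
print(n8_list)  # [14, 21, 32]

alive = set(ring) - {14, 21}  # N14 and N21 have failed
print(first_live_successor(n8_list, alive))  # 32
```

The lookup only fails outright if all r entries fail at the same time, in which case this sketch returns `None`.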
  • 25. Replication • Successor list can also be used for replication • application on top of Chord might store replicas of data associated with a key on k nodes from n’s successor list • Chord can inform application when this successor list changes, so application can create new replicas
  • 26. Chord in practice • In practice, Chord ring will never be in stable state: • Joins and leaves occur continuously • Interleaved by stabilization algorithm • Not enough time to stabilize before new change • If stabilization is run at a certain rate: • Chord ring will be continuously “almost stable” and lookup will be fast and correct
  • 27. Virtual nodes • Chord load balancing can be improved through virtual nodes: • node ids do not uniformly cover entire id space • to make (# keys / node) more uniform, associate keys with virtual nodes • map r virtual nodes, with unrelated ids, to each real node • Tradeoff: each real node needs r times as much space to store finger table
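The effect of virtual nodes on load balance can be explored with a small experiment. This is a sketch under stated assumptions: identifiers are SHA-1 truncated to 32 bits, the `"name#j"` labelling of virtual nodes is hypothetical, and the node names, key names and value r = 16 are made up for illustration.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

M = 32  # identifier bits used in this sketch (an assumption)

def ident(label: str) -> int:
    """SHA-1 of the label, truncated to M bits."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest(), "big") % (1 << M)

def load_per_real_node(real_nodes, keys, r):
    """Number of keys stored on each real node when it hosts r virtual nodes."""
    points = sorted((ident(f"{name}#{j}"), name) for name in real_nodes for j in range(r))
    ids = [p[0] for p in points]
    load = Counter({name: 0 for name in real_nodes})
    for key in keys:
        i = bisect_left(ids, ident(key)) % len(points)  # successor on the ring
        load[points[i][1]] += 1
    return load

real_nodes = [f"node-{i}" for i in range(20)]
keys = [f"key-{i}" for i in range(10000)]

for r in (1, 16):
    load = load_per_real_node(real_nodes, keys, r)
    mean = len(keys) / len(real_nodes)
    print(f"r={r:2d}: max load / mean load = {max(load.values()) / mean:.2f}")
```

The cost named on the slide shows up here as the size of `points`: each real node contributes r entries instead of one.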
  • 28. Network latency • Chord path length is O(log(N)), but latency can be quite large • Nodes close in id space can be far away in the underlying network • Idea: • choose next-hop finger based on progress in id space and latency in network • maximize progress in id space • minimize latency
  • 29. Network latency • Different approach to improving latency: • each entry in finger table can point to list of nodes instead of a single node • nodes have similar ids and are equivalent for routing purposes • latency of the nodes could be different • use closest node in terms of network distance in the underlying network
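The second approach amounts to a per-entry choice among equivalent candidates. The sketch below is purely illustrative: the candidate node names and the latency figures are invented, and `pick_next_hop` is a hypothetical helper, not part of Chord.

```python
def pick_next_hop(candidates, latency_ms):
    """Among nodes that are equivalent for routing, use the lowest-latency one."""
    return min(candidates, key=lambda node: latency_ms[node])

# One finger entry holding a list of nodes with similar ids (hypothetical data).
finger_entry = ["N42", "N43", "N44"]
latency_ms = {"N42": 120.0, "N43": 15.0, "N44": 60.0}

print(pick_next_hop(finger_entry, latency_ms))  # N43
```

Because the candidates have similar ids, choosing the closest one in the underlying network costs little progress in the identifier space.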
  • 30. Applications of Chord • Chord File System (CFS): • Chord locates storage blocks • Distributed indexes: • solves Napster’s central server issue • Large-scale combinatorial search: • candidate solutions are keys, Chord maps these to machines which test them • Cooperative mirroring: • Uses Chord to provide load balancing • Time-shared storage: • Uses Chord to provide availability
  • 31. ?
