08 Distributed Algorithms

Distributed Algorithms
A distributed algorithm computes over more than one process, using message passing through a network.
Examples:
- Communication protocols
- Flow control
- Routing
- Resource allocation
- Leader election: choosing a synchronization process

Complexity Analysis
Complexity analysis is a cost analysis. Internal computation in a node is normally negligible compared to message times, and secondary memory time is also mostly negligible. Complexity analysis for distributed algorithms therefore considers:
- The number of messages sent, compared to the number of participating processes (nodes).
- The least number of message jumps between nodes until the algorithm terminates, compared to the number of participating processes (nodes): the cost of waiting.
- The highest number of message jumps for any message before the algorithm terminates, compared to the number of participating processes (nodes): the cost on the network.
- Bit complexity, if the algorithm uses a large amount of data: the amount of data sent (times message jumps), compared to the number of participating processes (nodes).
1 (49) - DISTRIBUTED SYSTEMS
Distributed Algorithms - Sven Arne Andreasson - Computer Science and Engineering
Complexity Analysis (2)
A choice between:
- Cheapest algorithm: minimum number of total messages, nice to the network.
- Fastest algorithm: allowing more messages than necessary, not nice to the network.

Algorithms for Information Distribution
- Flooding Algorithms: fast, expensive.
- Echo Algorithms: not so fast, cheaper.
- Virtual Ring Algorithms: slow, cheap.
Flooding Algorithms
Broadcast algorithm for a mesh network. Basic algorithm: when a node wants to send information, it sends it in a message to all its neighbor nodes. Each node getting such a message sends a copy of it to all its neighbor nodes except the node it came from. A node retrieves the information from the message the first time it arrives. This spreads the information through the whole network to all nodes that are reachable. The problem is to stop the message passing, so that the message does not take up the whole network. This can be done using different techniques:
1. Each node retransmits a message only the first time it sees it. This will not work if a message transmission fails: the sending node still thinks the message was passed on and will not try again.
2. Put a time limit on the messages. After a given time the messages are no longer retransmitted by the nodes. This requires a common notion of time within the system.
3. Put a limit on the number of allowed jumps for each message. Each time a message is retransmitted this limit is decremented by one, and when it reaches zero the message is not retransmitted at all.
Flooding Complexity Analysis
In all cases the time to termination, i.e. until all nodes have got the information, depends on the network diameter. The total resource utilization of the network for the three termination rules is:
1. The algorithm costs at most 2l message jumps if l is the number of links in the network, since each link carries the message at most once in each direction.
2. This highly utilizes the network. The number of message jumps depends on how far away the time limit is set and on the fork degree (fan-out) of the nodes. The more connected the network is, the more it costs.
3. This depends on the fork degree (fan-out) of the nodes. The more connected the network is, the more it costs. The worst case is a totally connected network. Then a message makes n-1 jumps on its first hop, and every copy is forwarded to n-2 nodes on each further hop, giving (n-1)(1 + (n-2) + ... + (n-2)^(k-1)) jumps if n is the number of nodes and k is the maximum number of jumps. The maximum number of jumps k must be set according to the network diameter D. An upper limit on the number of jumps is then (n-1)(1 + (n-2) + ... + (n-2)^(D-1)), which is of the order of n^D.
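The cost of termination rules 1 and 3 can be checked with a small simulation. This is a hypothetical Python sketch, not part of the original material: the network is a dict of neighbour sets, every transmission succeeds (no failures), and the time limits of rule 2 are not modelled. The function names `flood` and `complete_graph` are illustrative.

```python
from collections import deque


def flood(graph, source, max_jumps=None):
    """Simulate flooding a message from `source` over an undirected network.

    graph: dict mapping each node to the set of its neighbours (symmetric).
    max_jumps=None -> rule 1: a node forwards only the first copy it receives.
    max_jumps=k    -> rule 3: every copy is forwarded while it has made < k jumps.
    Returns (set of informed nodes, total number of message jumps).
    """
    informed = {source}
    jumps = 0
    # Each queue entry is one transmission: (sender, receiver, jumps made incl. this one).
    queue = deque((source, nb, 1) for nb in graph[source])
    while queue:
        sender, node, made = queue.popleft()
        jumps += 1                      # this transmission crosses one link
        first_time = node not in informed
        informed.add(node)
        forward = first_time if max_jumps is None else made < max_jumps
        if forward:
            for nb in graph[node]:
                if nb != sender:        # never copy back to the node it came from
                    queue.append((node, nb, made + 1))
    return informed, jumps


def complete_graph(n):
    """Totally connected network with nodes 0..n-1."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


informed, jumps = flood(complete_graph(4), 0)             # rule 1, l = 6 links
print(sorted(informed), jumps)                           # [0, 1, 2, 3] 9
_, jumps = flood(complete_graph(5), 0, max_jumps=3)      # rule 3, n = 5, k = 3
print(jumps)                                             # 52
```

With rule 1 the four-node totally connected network (l = 6) uses 9 jumps, below the 2l bound, because no node copies the message back on the link it arrived on. With rule 3 and n = 5, k = 3 the count is 4 · (1 + 3 + 9) = 52, i.e. (n-1)(1 + (n-2) + (n-2)^2).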
Resource Allocation
- Centralized (client-server architecture): one single synchronization process for a given object.
- Dynamic centralized: requires that one of the functioning nodes is chosen as synchronization node, so there must be an election algorithm:
  - Echo Algorithm
  - Virtual Ring Algorithm
  - Logical Clock Algorithm
  - Voting Algorithm
- Circular control: Virtual Control-Token Ring.
- Fully distributed (peer-to-peer architecture):
  - Logical Clock Algorithm
  - Voting Algorithm
Echo Algorithms
Mainly for mesh networks (Ernest Chang 1979); a refinement of flooding. Can be used for:
- Election algorithms
- Broadcast in mesh networks
Presumptions:
- Each node has a unique name (identifier).
- No shared memory; processes use message passing.
- FIFO on communication links.
- A node does not know about all nodes in the network, only its neighbors.
Variants:
- Single source: one initiating node.
- Multi source: there might be several initiating nodes concurrently.

Traversal Algorithm
Two phases: a forward phase and an echo phase.
1. The initiating node, IN, sends an Explorer Message, EM, on all its outgoing links.
2. When a node gets its first EM, it marks the corresponding link as First Link, FL.
3. If the node doesn't have more links (it is a leaf), it sends an echo message, ECHO, back to the node that sent the EM.
4. If the node has more links, it sends an EM on these links. Then the node waits for an ECHO on each of these links.
5. If a node gets another EM, it sends an ECHO on the corresponding link.
6. When a node gets an ECHO, the corresponding link is marked as ready.
7. When the node has got an ECHO on all links except its First Link, FL, it sends an ECHO back on its FL.
8. When IN has got an ECHO on all its links, the algorithm terminates.
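The forward and echo phases can be followed in a small event-driven simulation. This is a hypothetical Python sketch, not from the source: a single global FIFO queue stands in for the network (which makes every link FIFO), there are no failures, and the helper names are illustrative. The `segall` flag switches to the improved traversal algorithm (Segall) presented later in this section, where a later EM just marks its link as ready instead of being answered with an ECHO.

```python
from collections import deque


def echo(graph, initiator, segall=False):
    """Simulate the echo traversal from a single initiating node (IN).

    graph: dict node -> set of neighbour nodes (undirected, connected).
    segall=False: Chang's version, every later EM is answered with an ECHO.
    segall=True:  Segall's variant, a later EM just marks the link ready.
    Returns (parent, messages, terminated): parent maps each node to the
    neighbour on its First Link (the P-tree), messages counts EMs and ECHOs.
    """
    parent = {initiator: None}
    pending = {initiator: set(graph[initiator])}  # links not yet marked ready
    queue = deque()
    messages = 0
    terminated = False

    def send(kind, src, dst):
        nonlocal messages
        messages += 1
        queue.append((kind, src, dst))

    def mark_ready(node, link):
        nonlocal terminated
        pending[node].discard(link)
        if not pending[node]:
            if node == initiator:
                terminated = True                  # IN has heard on all its links
            else:
                send("ECHO", node, parent[node])   # echo back on the First Link

    for nb in graph[initiator]:
        send("EM", initiator, nb)

    while queue:
        kind, src, node = queue.popleft()
        if kind == "EM" and node not in parent:    # first EM: src is the First Link
            parent[node] = src
            pending[node] = set(graph[node]) - {src}
            for nb in pending[node]:
                send("EM", node, nb)
            if not pending[node]:                  # leaf: echo straight back
                send("ECHO", node, src)
        elif kind == "EM" and not segall:
            send("ECHO", node, src)                # Chang: answer a later EM
        else:
            mark_ready(node, src)                  # an ECHO, or a later EM (Segall)
    return parent, messages, terminated


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(echo(triangle, 0))               # ({0: None, 1: 0, 2: 0}, 8, True)
print(echo(triangle, 0, segall=True))  # ({0: None, 1: 0, 2: 0}, 6, True)
```

In Chang's version each P-tree link carries one EM and one ECHO while every other link carries two EMs and two ECHOs, so a run costs 4l - 2(n-1) messages, within the 4l bound (8 for the triangle). Segall's variant uses exactly one message in each direction per link, 2l in total.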

Echo Algorithm
Traversal Execution Tree (TET): the tree of links formed by all the Explorer Messages, EMs.
P-tree: the tree of links formed by all the First Links, FLs.
[Figure: a network with nodes A, B, C, D and E, its Traversal Execution Tree (TET) and its P-tree]
Echo Algorithm Complexity analysis
The algorithm requires at most 4l messages if l is the number of links in the network, since each link carries at most two EMs and two ECHOs. If the speed is roughly the same on all links, the algorithm takes roughly 2D+2 time units, where D is the network diameter.

Echo Algorithm Applications
- Distribution of a list among nodes
- Calculation of the nodes' maximum value (identity)
- Election algorithm
Distribution of List among the nodes in a Mesh Network
Each node in the network should be given a unique number. The numbers are distributed by a node S, the initiating node, which doesn't know which nodes there are in the network when the algorithm starts.
Phase 1:
1. S starts an Echo Algorithm.
2. Each leaf node returns the value 1 in its ECHO.
3. Each ECHO sent back on a link that is not an FL returns the value 0.
4. Each node registers the return values of the ECHOs on the corresponding links.
5. When a node has got all ECHO messages, it returns an ECHO on its FL containing the sum of all its return values + 1 (for itself).
6. When the initiating node, S, has got ECHOs on all its links, it knows how many nodes are present in the network at that moment.
Phase 2:
Distribution of List among the nodes in a Mesh Network (2)
7. The initiating node creates as many unique identifiers as there are nodes in the network, keeps one for itself and sends the rest on its links to its neighbors. Each link gets as many identifiers as were reported in the corresponding ECHO.
8. Each other node gets a message with unique identities on its FL (the FL of Phase 1). The node keeps one of the identities and sends the rest on its links according to the registered number of nodes: each link gets as many as it indicated in its ECHO.
9. Echo messages can optionally be sent back so that the initiating node gets a confirmation that the algorithm has terminated.
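The two phases can be sketched on top of a ready-made P-tree. This is a hypothetical Python sketch, not the author's code: to stay short it builds the P-tree with a breadth-first search (`p_tree`), an assumption standing in for the First Links an echo run would produce, and it runs each phase as one pass over the tree instead of simulating the individual messages.

```python
from collections import deque


def p_tree(graph, root):
    """Stand-in for the echo algorithm's First Links: a BFS tree (assumption)."""
    parent, order = {root: None}, [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(graph[u]):
            if v not in parent:
                parent[v] = u
                order.append(v)
                queue.append(v)
    return parent, order


def distribute_ids(graph, root):
    """Return (number of nodes counted by S, identifier given to each node)."""
    parent, order = p_tree(graph, root)
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)

    # Phase 1: ECHOs on First Links carry subtree sizes (a leaf reports 1);
    # ECHOs on other links carry 0 and therefore add nothing.
    count = {}
    for v in reversed(order):            # children before their parent
        count[v] = 1 + sum(count[c] for c in children[v])

    # Phase 2: S creates count[root] identifiers; every node keeps one and
    # hands each child link as many as that child reported in its ECHO.
    assigned, inbox = {}, {root: list(range(count[root]))}
    for v in order:                      # parents before their children
        ids = inbox.pop(v)
        assigned[v] = ids[0]
        rest = ids[1:]
        for c in children[v]:
            inbox[c], rest = rest[:count[c]], rest[count[c]:]
    return count[root], assigned


mesh = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(distribute_ids(mesh, 0))           # (4, {0: 0, 1: 1, 2: 2, 3: 3})
```

Because each link's share equals the count reported in its ECHO, the identifiers handed down always match the subtree sizes exactly and no identifier is used twice.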
Calculation of nodes maximum value (identity)
All nodes have unique names (numbers) that are totally ordered. The task is to decide the highest of the names of the nodes present in the network. Each node has a Superior variable, which initially holds the node's own name. Any node can start the algorithm and then becomes the Initiating Node, IN, of an echo algorithm identified by the node's name.
1. One node starts the algorithm by sending an EM to all its neighbors. These messages contain the node's identity. The node sets its Superior variable to its own identity.
2. If an inactive node gets an EM with a value lower than the node's own identity, the message is ignored. Instead the node starts a new echo algorithm as described in 1.
3. If an active node gets an EM with a value lower than the node's Superior variable, the message is ignored.
4. If an active or an inactive node gets an EM with a value higher than the node's Superior variable, the node updates its Superior variable. The incoming link is marked as FL and any previously existing FL is unmarked. Then the node sends an EM on all links except the FL. These messages contain the value of the Superior variable.
5. If an active node gets an EM with a value equal to its Superior variable, it sends an ECHO message back on the corresponding link. This message contains the value of the Superior variable.
6. If a node gets an ECHO message with a value lower than the node's Superior variable, the message is ignored.
7. If a node gets an ECHO message with the same value as the node's Superior variable, it is book-kept. When a node has got an ECHO message with the same value as the Superior variable on each link except the FL, it sends a corresponding ECHO message on its FL. If the node's identity is the same as the Superior variable, the algorithm terminates and the node is elected.
8. If a node gets an ECHO message with a value higher than the node's Superior variable, this indicates a programming error!
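Rules 1 to 8 can be exercised in a small simulation. This is a hypothetical Python sketch, not the author's code: node names are integers, a single global FIFO queue delivers all messages, there are no failures, and `starters` (an assumed parameter) lists the nodes that start spontaneously. Only the node holding the maximum name can collect echoes from every node, so it is the only one that ends up elected.

```python
from collections import deque


def elect_max(graph, starters):
    """Echo-based election of the highest name (rules 1-8).

    graph: dict name -> set of neighbour names; names unique and totally ordered.
    starters: nodes that start the algorithm spontaneously (assumed parameter).
    Returns the set of nodes that found themselves elected.
    """
    sup = {v: v for v in graph}              # Superior variable, own name at start
    active = {v: False for v in graph}
    fl = {v: None for v in graph}            # First Link of the wave being followed
    pending = {v: set() for v in graph}      # links still owing an ECHO for that wave
    queue = deque()                          # (kind, value, sender, receiver)
    elected = set()

    def start(v):                            # rule 1: start an own echo wave
        active[v], sup[v], fl[v] = True, v, None
        pending[v] = set(graph[v])
        for nb in graph[v]:
            queue.append(("EM", v, v, nb))

    for v in starters:
        start(v)

    while queue:
        kind, val, src, v = queue.popleft()
        if kind == "EM":
            if val > sup[v]:                 # rule 4: follow the higher wave
                active[v], sup[v], fl[v] = True, val, src
                pending[v] = set(graph[v]) - {src}
                for nb in pending[v]:
                    queue.append(("EM", val, v, nb))
                if not pending[v]:           # no other links: echo at once
                    queue.append(("ECHO", val, v, src))
            elif val == sup[v]:              # rule 5: same wave, answer with ECHO
                queue.append(("ECHO", val, v, src))
            elif not active[v]:              # rule 2: ignore it, start own wave
                start(v)
            # rule 3: an active node ignores a lower EM
        else:
            assert val <= sup[v], "rule 8: ECHO above Superior means a bug"
            if val < sup[v]:                 # rule 6: echo of an abandoned wave
                continue
            pending[v].discard(src)          # rule 7: book-keep the ECHO
            if not pending[v]:
                if fl[v] is None:            # own wave complete: elected
                    elected.add(v)
                else:
                    queue.append(("ECHO", val, v, fl[v]))
    return elected


net = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 6}, 4: {2, 5}, 5: {4, 6}, 6: {3, 5}}
print(elect_max(net, starters=[2]))          # {6}
```

Lower waves are never completed: the node with the maximum name never echoes them, so their initiators keep waiting, while the maximum wave reaches and collects echoes from every node.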
Calculation of nodes maximum value (identity) Example
Assume six nodes N1, N2, N3, N4, N5 and N6 with identifications ordered as N1 < N2 < N3 < N4 < N5 < N6. When the algorithm starts, the nodes don't know the current network configuration.
[Figure sequence: N2 starts the algorithm and sends <N2>; N1 has got the message; N6 has got the message and starts a new echo algorithm; N3 has got N6's message; N2, N4 and N5 have got N6's message; N1 has got N6's message; all EM messages have reached their destination and <Echo:N6> messages travel back towards N6.]
[Figure: the algorithm has terminated and N6 is elected.]

Broadcast in a Mesh Network using Echo Algorithm
The P-tree can be used for broadcast message distribution: the messages follow the tree. With one tree for each sender, this gives a reasonable-cost broadcast in a mesh network. Broadcasts from different nodes might reach the receivers in different orders.
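Broadcasting along the P-tree costs one message per tree link, n - 1 in total, compared with up to 2l for flooding. A minimal, hypothetical Python sketch, assuming the P-tree rooted at the sender is already known as a parent map (for example from an echo run); the function name is illustrative.

```python
def tree_broadcast(parent, sender):
    """Forward a broadcast from `sender` down its P-tree.

    parent: dict node -> parent on the P-tree rooted at `sender` (None at the root).
    Returns (nodes in the order they receive the message, messages sent).
    """
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    delivered, messages = [sender], 0
    frontier = [sender]
    while frontier:                      # one tree level per round
        next_level = []
        for v in frontier:
            for c in children[v]:        # each tree link is used exactly once
                messages += 1
                delivered.append(c)
                next_level.append(c)
        frontier = next_level
    return delivered, messages


p_tree_of_0 = {0: None, 1: 0, 2: 0, 3: 2}
print(tree_broadcast(p_tree_of_0, 0))    # ([0, 1, 2, 3], 3)
```

Each sender uses its own tree, so nothing in this scheme orders broadcasts from different senders relative to each other.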
Improved Traversal Algorithm (Segall)
An improved traversal algorithm (Segall), a little more efficient:
1. The initiating node, IN, sends an Explorer Message, EM, on all its outgoing links.
2. When a node gets its first EM, it marks the corresponding link as First Link, FL.
3. If the node doesn't have more links (it is a leaf), it sends an echo message, ECHO, back to the node that sent the EM.
4. If the node has more links, it sends an EM on these links. Then the node waits for an EM or ECHO on these links.
5. If a node gets another EM, the corresponding link is marked as ready.
6. If a node gets an ECHO, the corresponding link is marked as ready.
7. When the node has got an EM or ECHO on all its links, it sends an ECHO back on its FL.
8. When IN has got an EM or ECHO on all its links, the algorithm terminates.

Logical Clocks Distributed Resource Allocation Algorithm
- Lamport algorithm: already shown.
- Ricart and Agrawala algorithm: given in the textbook (6.3.4).
Virtual Ring Algorithms
A special message, a control-token, is sent among the nodes. The node possessing the token has the right to perform operations that must be done under mutual exclusion. The nodes must be able to:
- assure that there is one and only one token
- create a new token
- discover if the ring is broken
- create a new ring

Example of a Virtual Ring
[Figure: a network whose nodes are linked into the virtual ring A-B-E-H-F-G-D-C-A]
Algorithm that guarantees exactly one control-token on the ring (Le Lann 1978)
All nodes have unique names, Ni, which are totally ordered. A special message, a control-token CT, is sent among the nodes. Another special message, an election token ET(Ni), created by node Ni, is also used by the algorithm. Each node has a timer that is used for time-outs if no token arrives within a given time limit. All tokens circulate in a given order, FIFO. A node can be in normal state or election state.

In each node the following algorithm is executed:
- Each time the CT passes the node, its timer is restarted with a given time-out value. The node keeps (or changes to) normal state.
- At time-out at node Ni, i.e. when the timer signals that no CT or ET has passed within the time-out limit: the node creates a new token, an election token ET(Ni), which contains the node's identification. The node changes to election state and restarts its timer as it sends ET(Ni) on the ring.
- Each time an ET(Nj) arrives at Ni:
  - If the node is in normal state, the timer is restarted; if Nj ≠ Ni, the ET is sent further on the ring.
  - If the node is in election state, the node compares the ET originator's identity with its own:
    - if Nj < Ni, the node Ni changes to normal state, sends the ET further on the ring and restarts its timer.
    - if Nj > Ni, the node sends the ET further on the ring and restarts its timer.
    - if Nj = Ni, i.e. it is the node's own ET, the node converts it into a control-token CT, which is then sent further on the ring, and restarts its timer.
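The election-token rules can be traced with a hypothetical Python sketch. It makes simplifying assumptions that are not in the source: several nodes time out at the same moment because the CT has been lost, tokens advance one link per step through a single FIFO queue, no token is lost during the election, timers are not modelled after the initial time-outs, and the new CT is only recorded, not circulated. It also assumes, as the rules read, that a node in normal state does not forward its own ET. Under these rules the lowest identity among the electing nodes wins: its ET sends every other electing node back to normal state before that node's own ET completes a lap.

```python
from collections import deque


def le_lann(ring, timed_out):
    """Election-token part of Le Lann's token-regeneration algorithm.

    ring: unique, totally ordered node names in ring order (tokens move forward).
    timed_out: nodes whose timers expire at the same moment after the CT was lost.
    Returns the nodes that converted their own ET into a new CT.
    """
    n = len(ring)
    succ = {ring[i]: ring[(i + 1) % n] for i in range(n)}   # next node on the ring
    state = {v: "N" for v in ring}                           # N = normal, E = election
    queue = deque()                                          # (receiving node, ET originator)
    creators = []
    for v in timed_out:                  # time-out: create ET(v), enter election state
        state[v] = "E"
        queue.append((succ[v], v))
    while queue:
        node, origin = queue.popleft()
        if state[node] == "N":
            if origin != node:           # normal state: pass other nodes' ETs on
                queue.append((succ[node], origin))
        elif origin < node:              # lower ET: give up, back to normal state
            state[node] = "N"
            queue.append((succ[node], origin))
        elif origin > node:              # higher ET: pass it on, stay in election
            queue.append((succ[node], origin))
        else:                            # own ET after one lap: turn it into a CT
            creators.append(node)
            state[node] = "N"
    return creators


print(le_lann([3, 7, 1, 9, 4], timed_out=[7, 4, 9]))   # [4]
```

Only node 4, the lowest of the three nodes that timed out, gets its own ET back while still in election state, so exactly one CT is created.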
Proof that the algorithm is correct
The algorithm behavior can be described by the following state transition table:

Event   In state N   In state E
ATO     E            E
ACT     N            N
AET<    N            N
AET>    N            E
AET=    N            C, then N

The following events are defined:
- ATO: time-out
- ACT: reception of a CT
- AET<: reception of an ET with an identification less than the node's own identification
- AET>: reception of an ET with an identification greater than the node's own identification
- AET=: reception of an ET with an identification equal to the node's own identification

A node's state has the following notations:
- N: normal state, the node is waiting and the timer is on
- E: election state, the timer is on and the node waits for its own ET
- C: the creation of a new CT and then a direct change to normal state N

The different points of time are notated as:
- AET(x): the event that node x creates ET(x)
- AET(x),y: the event that node y receives ET(x)
- AET(x),x: the event that node x receives ET(x) after it has made one lap on the ring
- ACT(x): the event that node x creates CT(x)
- ACT(x),y: the event that node y receives CT(x)
In our proof ¬k means that k has not happened.
Proposal: Two CTs cannot be created on the ring.
Proof: We make a contradictory assumption and show that it leads to a contradiction, i.e. we assume that two nodes x and y can both create a CT concurrently. We also assume that id(x) < id(y), i.e. x < y. Below, a → b means that event a happens before event b.
Since y creates a CT, y must be in state E at event AET(y),y (¬ACT and ¬AET< hold at y between the events AET(y) and AET(y),y). The same holds for x (¬ACT and ¬AET< hold at x between the events AET(x) and AET(x),x).
Since an ET makes one lap on the ring, it holds that AET(y) → AET(y),x → AET(y),y and AET(x) → AET(x),y → AET(x),x.
Since id(x) < id(y), ¬AET< at y between AET(y) and AET(y),y gives AET(y),y → AET(x),y (otherwise ET(x) would reach y while y is in state E and move it to state N).
And since x doesn't leave state E (¬ACT at x between AET(x) and AET(x),x): AET(x),x → ACT(y),x.
FIFO means that AET(y),y → AET(x),y implies ACT(y),x → AET(x),x: y sends CT(y) before it passes ET(x) on, and FIFO keeps that order on the way to x.
This is a contradiction, since we have now shown both ACT(y),x → AET(x),x and AET(x),x → ACT(y),x. Thus the contradictory assumption that two nodes x and y can both create a CT concurrently must be wrong.
Note that this algorithm, like both the Logical Clocks algorithm and the Echo Algorithm election, requires that the nodes are given unique identifications that are totally ordered.
Election algorithm
The unique control-token algorithm can be used for election purposes: the node that was allowed to create a CT is the elected one. A whole family of election algorithms has been designed on this basis, with different modifications:
- let ET messages traverse both directions on the ring
- let ET messages randomly choose which direction to be sent
- and so on ... 30 years of research!!

Voting Algorithms (1)
Voting can be used for resource allocation or election. A group of nodes cooperates in some way and needs to make decisions together. A node that wants to be elected (or use a resource) sends a request message to all other nodes in the group. A node that gets a request message answers the originator:
- yes, if no other node has requested since the last release (of the resource)
- no, otherwise
The requesting node is elected (can use the resource) when it gets a majority of yes answers to its request. Here a majority means more than half of the group. If the requesting node doesn't get a majority of yes answers, it is not elected.
- Doesn't require FIFO on network links.
- Doesn't require totally ordered identifications on nodes.
- Doesn't require an answer from all nodes.
- Drawback: voting might lead to no one being elected (able to use the resource).
After each voting there can be two different states: one node is elected, or no one is elected. It is important to distinguish between these states. Therefore the winner must send a message to all other nodes so they know that one node was elected. That no one is elected can only be determined by time-out. When a node wants to release a resource, a Release message must be sent; then the other nodes can start a new election. When an election times out, another node can start a new election. Then there is a need to distinguish between different elections. This can be done using the node name and an ordering number as the identity of the request. All answers to that request, as well as the eventual Release, must then carry this identity.
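A minimal Python sketch of one voting round, hypothetical and not from the source: each answering node says yes only to the first request it sees, since no Release arrives during the round, and no to the rest; requests are identified by (node name, ordering number) pairs; nodes that do not answer simply contribute no vote. The function name `vote_round` is illustrative.

```python
from collections import Counter


def vote_round(group_size, arrival_orders):
    """One round of majority voting.

    group_size: number of nodes in the group.
    arrival_orders: for each node that answers, the request identities in the
    order that node received them; identities are (node name, ordering number).
    A node answers yes only to the first request it sees and no to the rest.
    Returns the request identity with a majority of yes answers, or None.
    """
    yes = Counter(order[0] for order in arrival_orders if order)
    for request, count in yes.items():
        if 2 * count > group_size:       # majority: more than half of the group
            return request
    return None


a, b, c = ("A", 1), ("B", 1), ("C", 1)
print(vote_round(5, [[a, b], [a, b], [a, b], [b, a], [b, a]]))  # ('A', 1)
print(vote_round(5, [[a], [a], [b], [b], [c]]))                 # None
```

The second round shows the drawback listed above: the votes are split, nobody reaches a majority, and only a time-out can reveal that no one was elected.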
The Bully Algorithm
The Bully Algorithm is an election algorithm (Garcia-Molina 1982). It can handle process crashes.
Presumptions:
- All processes have unique identities which are totally ordered.
- Every process knows about all other processes in the network.
- The system is synchronous, i.e. there is a maximal time limit T within which a request will be answered if the requested process is alive.
Algorithm:
The Bully Algorithm (2)
1. The process that wants an election sends an election-message to all processes with a higher identity than itself and then waits for answer-messages.
   - If no answer-message arrives within the time limit T, the process considers itself elected and sends a coordinator-message to all processes with a lower identity.
   - If there are one or more answer-messages, the process waits a further time period T for a coordinator-message. If no such message arrives, the process restarts the algorithm.
2. A process receiving an election-message returns an answer-message and starts the algorithm from the beginning, if it has not done so before.
3. A process receiving a coordinator-message registers the sender's identity and considers it elected.
4. When a faulty process restarts, it also starts the algorithm.
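A round-based Python sketch, hypothetical and not from the source: the synchronous time limit T is modelled by letting every process either answer at once or never (crashed), no process crashes during the election, the starter is assumed alive, and the message count includes election-, answer- and coordinator-messages, with coordinator-messages counted for every lower process whether crashed or not.

```python
from collections import deque


def bully(ids, alive, starter):
    """Synchronous sketch of the Bully algorithm.

    ids: identities of all processes (every process knows them all).
    alive: the processes that have not crashed; crashed ones never answer.
    starter: the (alive) process that wants an election.
    Returns (coordinator, number of messages sent).
    """
    messages = 0
    started = set()
    queue = deque([starter])
    coordinator = None
    while queue:
        p = queue.popleft()
        if p in started:                 # rule 2: start only if not done before
            continue
        started.add(p)
        higher = [q for q in ids if q > p]
        messages += len(higher)          # election-messages
        answering = [q for q in higher if q in alive]
        messages += len(answering)       # answer-messages from live processes
        if answering:
            queue.extend(answering)      # each answering process runs the algorithm
        else:                            # no answer within T: p is elected
            coordinator = p
            messages += sum(1 for q in ids if q < p)   # coordinator-messages
    return coordinator, messages


print(bully(range(1, 7), alive={1, 2, 3, 4, 5}, starter=2))   # (5, 20)
```

Here process 6 has crashed. Process 2 starts; processes 3, 4 and 5 answer and run the algorithm themselves, and process 5, hearing nothing from 6 within T, becomes coordinator.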
Local Networks
Communication on local networks:
- faster
- cheaper
- cheap broadcast
- might guarantee atomic broadcast: all nodes get all broadcasts in the same order

Skansholm Algorithm for Resource Allocation
Utilizes atomic broadcast with the same cost as a single message. Networks:
- control-token ring, ETHERNET: the network is the synchronization tool
- or in general: any network with an atomic broadcast service, but then the broadcast might be expensive
Skansholm Algorithm for Resource Allocation (2)
Each node has a copy of the Request Queue. A node which wants to allocate a resource sends a Request Message as a broadcast to all nodes. Since there can only be one message at a time on the network, all nodes receive these messages in the same order, and this order is the order in the Request Queue. Note that this also holds for the sending node: it should not put the request in its queue until its Request Message has actually been transmitted on the network. When a node's request is first in the local Request Queue, it can be processed. After processing, a Release message is sent to all. Then the first message in each local queue is removed and the next message can be processed.
This algorithm uses Broadcast (Multicast) but since it is a single local network the broadcast has the same cost as a single message. Thus this algorithm can have a very high performance.
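A minimal Python sketch of the queue handling, hypothetical: the local network's atomic broadcast is modelled by a function that delivers one message at a time to every node, the sender included, in the same order, so every local copy of the Request Queue stays identical. The class and function names are illustrative.

```python
class Node:
    """A node holding its own copy of the Request Queue."""

    def __init__(self, name):
        self.name = name
        self.queue = []                  # local copy of the Request Queue

    def deliver(self, kind, sender):
        if kind == "REQUEST":
            self.queue.append(sender)    # queued in network (broadcast) order
        elif kind == "RELEASE":
            assert self.queue and self.queue[0] == sender   # only the head releases
            self.queue.pop(0)

    def may_use_resource(self):
        return bool(self.queue) and self.queue[0] == self.name


def atomic_broadcast(nodes, kind, sender):
    """Stand-in for the local network: one message at a time, delivered to
    every node, the sender included, so all nodes see the same order."""
    for node in nodes:
        node.deliver(kind, sender)


nodes = [Node(name) for name in "ABC"]
atomic_broadcast(nodes, "REQUEST", "B")
atomic_broadcast(nodes, "REQUEST", "A")
print([n.name for n in nodes if n.may_use_resource()])   # ['B']
atomic_broadcast(nodes, "RELEASE", "B")
print([n.name for n in nodes if n.may_use_resource()])   # ['A']
```

The sender learns its own place in the queue only when its request comes back from the network, which is why the source insists that a node must not queue its request before it has actually been transmitted.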
