Atomic Broadcast Optimistic Generic Broadcast Conclusion
Latency-optimal fault-tolerant replication
Piotr Zieliński
Inference Group, Cavendish Laboratory, University of Cambridge
February 1, 2006
Replication example Naive approach Chandra-Toueg algorithm
Hotel booking system
Protocol
1. client → server: “book room 5”
2. server → client: “room booked”
Fault tolerance by replication
Problem: a single server crash blocks the entire system.
Solution: introduce many servers, so that the system is still usable despite some servers being down.
Consistency problems
Problem: messages might reach the replicas in different orders. Replicas A and B book the room for one client, replica C for the other. Results: unpredictable.
Solution: ensure that replicas receive requests in the same order, by using Atomic Broadcast to disseminate requests.
Atomic Broadcast
  clients atomically broadcast messages, and replicas atomically deliver them
  replicas atomically deliver all messages in the same order
  fault-tolerant
Goal: minimizing latency in common scenarios
Naive leader-based algorithm
1. clients send their messages to the main replica A
2. the main replica assigns sequence numbers k = 1, 2, ... to them, and broadcasts them to the other replicas
3. replicas deliver messages in the order of their sequence numbers
4. if the main replica fails, another takes over
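The failure-free part of the naive algorithm (steps 1 to 3) can be sketched as follows. This is only an illustration under assumptions: the class names `Leader` and `Replica`, the message strings, and the in-memory "network" are hypothetical, and leader takeover (step 4) is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Delivers numbered messages strictly in sequence-number order."""
    name: str
    next_k: int = 1                              # next sequence number to deliver
    pending: dict = field(default_factory=dict)  # k -> message received too early
    delivered: list = field(default_factory=list)

    def receive(self, k, message):
        self.pending[k] = message
        # Deliver for as long as the next expected number is available.
        while self.next_k in self.pending:
            self.delivered.append(self.pending.pop(self.next_k))
            self.next_k += 1


class Leader:
    """Main replica: assigns consecutive sequence numbers k = 1, 2, ..."""

    def __init__(self):
        self.k = 0

    def assign(self, message):
        self.k += 1
        return self.k, message


leader = Leader()
first = leader.assign("book room 5 for client X")   # hypothetical requests
second = leader.assign("book room 5 for client Y")

a, b = Replica("A"), Replica("B")
for numbered in (first, second):   # A receives the messages in order
    a.receive(*numbered)
for numbered in (second, first):   # B receives them out of order
    b.receive(*numbered)
```

Replica B holds back the message numbered 2 until the message numbered 1 arrives, so both replicas end up with the same delivery sequence.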
Leader ensures the same order
Example
  main replica A assigns 1 to the first message and 2 to the second
  replicas A and C receive them in this order and deliver both straight away
  replica B receives message 2 first, and waits with delivering it until it has delivered message 1
  all replicas deliver the two messages in the same order
When the leader fails
Examples
Case 1: no failures
  A assigns 1 to the first message, and 2 to the second
  all replicas deliver the first message followed by the second
Case 2: Leader fails, no message loss
  A assigns 1 to the first message, and fails
  B takes over and assigns 2 to the second message
  all replicas deliver the first message before (possibly) the second
Case 3: Leader fails, message loss occurs
  A crashes and its messages to the others are lost
  B does not know about the message that A numbered 1, so it assigns 1 to another message
  A delivers one message, replicas B and C deliver a different one
Case 4: Leader is just very slow
  A is correct but very slow
  B thinks A crashed, so it assigns 1 to a message other than the one that A numbered 1
  A delivers one message first, replicas B and C deliver the other first
Chandra-Toueg Atomic Broadcast algorithm
The Chandra-Toueg algorithm uses a sequence of Consensus instances Cons1, Cons2, ...
In each instance Cons_i, replicas
1. propose the first i messages
2. decide on some set {m_1, ..., m_i}
3. atomically deliver m_1, ..., m_i
No message is delivered twice.
Instances Cons_i can run in parallel.
Chandra-Toueg algorithm
Example
  Cons1: all propose {m1}, decide on {m1}, and deliver m1
  Cons2: all propose {m1, m2}, decide on {m1, m2}, and deliver m2
  all replicas deliver m1 before m2 (even if failures occur)
Chandra-Toueg algorithm
Code
    when a client atomically broadcasts m do
        broadcast m to all replicas

    task proposing at every replica is
        for k = 1, 2, ... do
            wait for some message m_k
            propose M_k = {m_1, ..., m_k} to Consensus instance k

    task delivery at every replica is
        for k = 1, 2, ... do
            wait until Consensus instance k decides on some M_k
            atomically deliver all undelivered messages in M_k in order
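The two tasks in the pseudocode can be simulated in a single process. The sketch below rests on several assumptions that are not from the slides: `Consensus` is a stand-in that simply decides the first value proposed (it is not a fault-tolerant protocol), "in order" is taken to mean sorted order within a decided set, and the message names "x" and "y" are hypothetical.

```python
class Consensus:
    """Stand-in for one Consensus instance: the first proposal wins."""

    def __init__(self):
        self.decision = None

    def propose(self, value):
        if self.decision is None:
            self.decision = value


class Replica:
    def __init__(self, instances):
        self.instances = instances  # shared dict: k -> Consensus instance k
        self.received = []          # m_1, m_2, ... in local arrival order
        self.delivered = []
        self.next_k = 1             # next instance whose decision is awaited

    def receive(self, m):
        # Task "proposing": on the k-th message, propose M_k = {m_1, ..., m_k}.
        self.received.append(m)
        k = len(self.received)
        self.instances.setdefault(k, Consensus()).propose(frozenset(self.received))

    def deliver_ready(self):
        # Task "delivery": process decisions of instances 1, 2, ... in turn.
        while (self.next_k in self.instances
               and self.instances[self.next_k].decision is not None):
            for m in sorted(self.instances[self.next_k].decision):
                if m not in self.delivered:  # no message is delivered twice
                    self.delivered.append(m)
            self.next_k += 1


instances = {}
a, b, c = Replica(instances), Replica(instances), Replica(instances)
b.receive("x")
c.receive("x")
a.receive("y")   # the copy of "x" sent to A is delayed
b.receive("y")
c.receive("y")
a.receive("x")
for replica in (a, b, c):
    replica.deliver_ready()
```

Although A proposes a different set to the first instance than B and C do, every replica follows the same decisions and therefore delivers the same sequence.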
Chandra-Toueg algorithm
Call the two messages m1 and m2, where the copy of m1 sent to A is delayed.

Replica   proposes to Cons1   proposes to Cons2
A         {m2}                {m2, m1}
B         {m1}                {m1, m2}
C         {m1}                {m1, m2}

Comments
  message m1 to A is delayed
  replicas start instances of Consensus at different times
  Cons1: A proposes {m2}, B and C propose {m1}
  Cons2: all replicas propose {m1, m2} (since {m2, m1} = {m1, m2}), and decide on {m1, m2}
  If Cons1 decides on {m1}, then replicas deliver m1 followed by m2 (and symmetrically if it decides on {m2}).
Latency
Latency: the number of communication steps from atomically broadcasting a message to its atomic delivery.
  Direct Broadcast: 1 step
  Chandra-Toueg: 1 step + Consensus
  Chandra-Toueg with a two-step Consensus: 1 + 2 = 3 steps
  Chandra-Toueg with a one-step Consensus: 1 + 1 = 2 steps
Problem 1: Agreement on message order Problem 2: Infinitely many instances of Consensus Problem 3: Cycle resolution
System assumptions
Messages
  message passing: communication by messages
  reliable channels: no message loss between correct processes
  asynchrony: no time bounds for messages, no clocks
Processes
  crash-stop model: only crash failures, no malicious processes
  Ω leader elector: leader eventually correct and fixed
  n > 3f: less than a third of the servers can crash
  Consensus implementable in one step
Goal: minimizing latency
Goal: Atomic Broadcast with minimum latency if the leader is correct and does not change.
Generic Broadcast
Messages: r = read x, w = write to x
Observations
  ordering all messages is expensive (Atomic Broadcast)
  not all messages have to be ordered (Generic Broadcast)
  r1 r2 w1 w2 = r2 r1 w1 w2 ≠ r1 r2 w2 w1 (swapping the two reads gives an equivalent order, swapping the two writes does not)
Generic Broadcast
Meta-solution
1. Define the conflict relation. Only conflicting messages must be delivered in the same order.
2. Determine the partial order “→” of conflicting messages.
3. Deliver messages in any total order consistent with “→”.
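The idea that only conflicting messages need a common order can be made concrete with a small check. The sketch assumes a hypothetical conflict relation in which two operations on x conflict unless both are reads, and hypothetical message names `r1`, `r2`, `w1`, `w2`; neither is taken from the slides.

```python
def conflicts(a, b):
    """Hypothetical conflict relation: a pair conflicts unless both are reads."""
    return not (a.startswith("r") and b.startswith("r"))


def conflicting_order(seq):
    """Ordered pairs (a, b) of conflicting messages with a delivered before b."""
    return {
        (a, b)
        for i, a in enumerate(seq)
        for b in seq[i + 1:]
        if conflicts(a, b)
    }


def equivalent(seq1, seq2):
    """Two delivery orders are equivalent if they contain the same messages
    and order every conflicting pair in the same way."""
    return (sorted(seq1) == sorted(seq2)
            and conflicting_order(seq1) == conflicting_order(seq2))


reads_swapped = equivalent(["r1", "r2", "w1", "w2"], ["r2", "r1", "w1", "w2"])
writes_swapped = equivalent(["r1", "r2", "w1", "w2"], ["r1", "r2", "w2", "w1"])
```

Under this relation, swapping the two reads leaves the order of all conflicting pairs unchanged, whereas swapping the two writes does not.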
Example
[Figure: partial order on the conflicting messages m1, ..., m6]
order 1: m2, m3, m1, m4, m5, m6
order 2: m1, m2, m3, m4, m5, m6
Delivery rule: Deliver a message when all undelivered conflicting messages are its successors.
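The delivery rule can be sketched as a loop that repeatedly delivers the first deliverable message in local arrival order. The edges of the figure are not recoverable from the text, so the graph below is a hypothetical one chosen to be consistent with both orders above; it also assumes that every conflicting pair already has a decided direction, so "all undelivered conflicting messages are its successors" reduces to "no predecessor is undelivered".

```python
# Hypothetical partial order: (a, b) means a -> b, i.e. a and b conflict
# and a must be delivered before b.
EDGES = {
    ("m2", "m3"),
    ("m3", "m4"),
    ("m1", "m4"),
    ("m4", "m5"),
    ("m5", "m6"),
}


def deliverable(m, delivered, edges):
    """True when m is undelivered and all its predecessors are delivered."""
    return m not in delivered and all(
        a in delivered for (a, b) in edges if b == m
    )


def delivery_order(arrival, edges):
    """Deliver messages one by one, always picking the first deliverable
    message in this replica's arrival order."""
    delivered = []
    while len(delivered) < len(arrival):
        for m in arrival:
            if deliverable(m, delivered, edges):
                delivered.append(m)
                break
        else:
            raise ValueError("no message is deliverable (cycle in the order)")
    return delivered


order1 = delivery_order(["m2", "m3", "m1", "m4", "m5", "m6"], EDGES)
order2 = delivery_order(["m1", "m2", "m3", "m4", "m5", "m6"], EDGES)
```

The two replicas deliver in different total orders, yet both respect every edge of the partial order, which is all that Generic Broadcast requires.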
Problems
1. Different processes perceive different orders
     use a separate Consensus instance for each pair of messages, all executed in parallel
2. The graph contains all possible messages
     infinitely many parallel instances of Consensus
     most of them identical, only finitely many different
     implementable with finite resources
3. Cycles
     if no failures, the leader dictates the order, no cycles
     if failures occur, a cycle-resolution algorithm is used
Agreement on message order: two messages
Problem
  messages m1 and m2 conflict
  different processes see different orders
Solution
  Consensus to decide on the order
  each replica proposes the first message received
  if the decision is m1, then m1 → m2
  if the decision is m2, then m2 → m1
Agreement on message order: many messages

Consensus instance for each pair of messages:

      m1            m2            m3            m4
m1    no conflict   m1 ↔ m2       m1 ↔ m3       m1 ↔ m4
m2    m1 ↔ m2       no conflict   no conflict   m2 ↔ m4
m3    m1 ↔ m3       no conflict   no conflict   m3 ↔ m4
m4    m1 ↔ m4       m2 ↔ m4       m3 ↔ m4       no conflict

Comments
  many parallel Consensus instances mi ↔ mj
  one instance for each pair of conflicting messages mi and mj
  message pairs are unordered: mi ↔ mj ≡ mj ↔ mi
Decisions of the instances (each entry is the message ordered first; ? = not yet decided):

      m1            m2            m3            m4
m1    no conflict   m1            m1            m1
m2    m1            no conflict   no conflict   m2
m3    m1            no conflict   no conflict   ?
m4    m1            m2            ?             no conflict
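One way to picture the pairwise instances is a table keyed by unordered pairs, where a replica that receives a message proposes it to every instance whose other message it has not yet seen. This is a sketch under assumptions: `Consensus` is a first-proposal-wins stand-in rather than a real protocol, the class names are hypothetical, and the set of conflicting pairs is the one from the four-message table.

```python
class Consensus:
    """Stand-in for one Consensus instance: the first proposal wins."""

    def __init__(self):
        self.decision = None

    def propose(self, value):
        if self.decision is None:
            self.decision = value


# Unordered pairs: frozenset makes mi <-> mj and mj <-> mi the same key.
CONFLICTING = [
    frozenset(pair)
    for pair in [("m1", "m2"), ("m1", "m3"), ("m1", "m4"),
                 ("m2", "m4"), ("m3", "m4")]
]


class Replica:
    def __init__(self, instances):
        self.instances = instances  # shared dict: pair -> Consensus
        self.seen = set()

    def receive(self, m):
        for pair, cons in self.instances.items():
            if m in pair:
                (other,) = pair - {m}
                if other not in self.seen:
                    # m is the first message of this pair received here.
                    cons.propose(m)
        self.seen.add(m)


instances = {pair: Consensus() for pair in CONFLICTING}
a, b = Replica(instances), Replica(instances)
a.receive("m1")
b.receive("m2")   # B sees m2 before m1, but m1 <-> m2 is already decided
a.receive("m2")
b.receive("m1")

decisions = {tuple(sorted(p)): c.decision for p, c in instances.items()}
```

After m1 and m2 have been received, every instance involving m1 has decided m1, the instance for m2 and m4 has decided m2, and the instance for m3 and m4 is still undecided.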
Infinitely many instances at the same time
Example. All processes receive only m1, m2, m3, in this order.

      m1   m2   m3   m4   m5   m6   ...
m1    -    m1   m1   m1   m1   m1   ...
m2    m1   -    m2   m2   m2   m2   ...
m3    m1   m2   -    m3   m3   m3   ...
m4    m1   m2   m3   -    ?    ?    ...
m5    m1   m2   m3   ?    -    ?    ...
m6    m1   m2   m3   ?    ?    -    ...
...   ...  ...  ...  ...  ...  ...  ...

  infinitely many instances mi ↔ mj
  identical instances share state
  finitely many different instances
  finite resources sufficient
Latency
Same order
  conflicting messages received in the same order
  same Consensus proposals
  delivery in 1 + 1 = 2 steps
Different orders
  conflicting messages received in different orders
  different Consensus proposals
  delivery in 1 + 2 = 3 steps
Related work
Generic Broadcast: latency in communication steps

Algorithm                    same order   no conflicts   other
Chandra and Toueg [1996]     3 steps      3 steps        3 steps
Pedone and Schiper [1998]    2 steps      4 steps        4 steps
Pedone and Schiper [1999]    4 steps      2 steps        4 steps
Aguilera et al. [2000]       4 steps      2 steps        4 steps
This work                    2 steps      2 steps        3 steps

The latency of our algorithm is provably optimal.
Cycles
Observation: If the leader does not change, cycles do not appear.
Proof by contradiction. Suppose a cycle such as m3 → m4 → m5 → m3 appears.
1. Consensus instances adopt the leader’s proposal, so
2. the leader proposed m3 → m4 → m5 → m3, so
3. the leader received m3 before m4 before m5 before m3, which is impossible.
Cycles
Observation: If cycles appear, they must be resolved.
Cycle resolution
  messages in cycles and their successors are blocked (grey)
  break cycles by delivering the first message (m3)
  use Atomic Broadcast to agree on a total order on messages
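The set of blocked messages (those on a cycle, plus their successors) can be computed with a plain reachability search. The graph below is hypothetical: it contains the cycle m3 → m4 → m5 → m3 mentioned in the proof, and the remaining edges are assumptions made for illustration. How the cycle is then broken, using the total order from Atomic Broadcast, is not modelled here.

```python
# Hypothetical order graph: (a, b) means a -> b.
EDGES = {
    ("m1", "m3"),
    ("m2", "m4"),
    ("m3", "m4"),
    ("m4", "m5"),
    ("m5", "m3"),   # closes the cycle m3 -> m4 -> m5 -> m3
    ("m5", "m6"),
}


def reachable(start, edges):
    """All messages reachable from `start` by following one or more edges."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)
    seen, stack = set(), list(successors.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(successors.get(node, ()))
    return seen


def blocked(messages, edges):
    """Messages that lie on a cycle, together with all their successors."""
    on_cycle = {m for m in messages if m in reachable(m, edges)}
    result = set(on_cycle)
    for m in on_cycle:
        result |= reachable(m, edges)
    return result


MESSAGES = ["m1", "m2", "m3", "m4", "m5", "m6"]
blocked_set = blocked(MESSAGES, EDGES)
```

In this graph m3, m4 and m5 are blocked because they form the cycle, m6 is blocked as a successor of m5, and m1 and m2 remain deliverable by the ordinary rule.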
Example
[Figure: order graph on the messages m1, ..., m6, with blocked messages shown in grey]
Delivery rule: Deliver a message when all undelivered conflicting messages
1. succeed it in the partial order, or
2. succeed it in the total order and are blocked
Example m1
m3
m5
m1
m3
m5
m2
m4
m6
m2
m4
m6
m1 , m2 , m3 , m4 , m5 , m6 Delivery rule Deliver a message when all undelivered conflicting messages 1
succeed it in the partial order, or
2
succeed it in the total order and are blocked Piotr Zieli´ nski
Latency-optimal fault-tolerant replication
Atomic Broadcast Optimistic Generic Broadcast Conclusion
Problem 1: Agreement on message order Problem 2: Infinitely many instances of Consensus Problem 3: Cycle resolution
Example m1
m3
m5
m1
m3
m5
m2
m4
m6
m2
m4
m6
m1 , m2 , m3 , m4 , m5 , m6 Delivery rule Deliver a message when all undelivered conflicting messages 1
succeed it in the partial order, or
2
succeed it in the total order and are blocked Piotr Zieli´ nski
Latency-optimal fault-tolerant replication
Atomic Broadcast Optimistic Generic Broadcast Conclusion
Problem 1: Agreement on message order Problem 2: Infinitely many instances of Consensus Problem 3: Cycle resolution
Example m1
m3
m5
m1
m3
m5
m2
m4
m6
m2
m4
m6
m1 , m2 , m3 , m4 , m5 , m6 Delivery rule Deliver a message when all undelivered conflicting messages 1
succeed it in the partial order, or
2
succeed it in the total order and are blocked Piotr Zieli´ nski
Latency-optimal fault-tolerant replication
Atomic Broadcast Optimistic Generic Broadcast Conclusion
Problem 1: Agreement on message order Problem 2: Infinitely many instances of Consensus Problem 3: Cycle resolution
Example m1
m3
m5
m1
m3
m5
m2
m4
m6
m2
m4
m6
m1 , m2 , m3 , m4 , m5 , m6 Delivery rule Deliver a message when all undelivered conflicting messages 1
succeed it in the partial order, or
2
succeed it in the total order and are blocked Piotr Zieli´ nski
Latency-optimal fault-tolerant replication
Atomic Broadcast Optimistic Generic Broadcast Conclusion
Problem 1: Agreement on message order Problem 2: Infinitely many instances of Consensus Problem 3: Cycle resolution
Example m1
m3
m5
m1
m3
m5
m2
m4
m6
m2
m4
m6
m1 , m2 , m3 , m4 , m5 , m6 Delivery rule Deliver a message when all undelivered conflicting messages 1
succeed it in the partial order, or
2
succeed it in the total order and are blocked Piotr Zieli´ nski
Latency-optimal fault-tolerant replication
Atomic Broadcast Optimistic Generic Broadcast Conclusion
Problem 1: Agreement on message order Problem 2: Infinitely many instances of Consensus Problem 3: Cycle resolution
Example m1
m3
m5
m1
m3
m5
m2
m4
m6
m2
m4
m6
m1 , m2 , m3 , m4 , m5 , m6 Delivery rule Deliver a message when all undelivered conflicting messages 1
succeed it in the partial order, or
2
succeed it in the total order and are blocked Piotr Zieli´ nski
Latency-optimal fault-tolerant replication
Atomic Broadcast Optimistic Generic Broadcast Conclusion
Problem 1: Agreement on message order Problem 2: Infinitely many instances of Consensus Problem 3: Cycle resolution
Example m1
m3
m5
m1
m3
m5
m2
m4
m6
m2
m4
m6
m1 , m2 , m3 , m4 , m5 , m6 Delivery rule Deliver a message when all undelivered conflicting messages 1
succeed it in the partial order, or
2
succeed it in the total order and are blocked Piotr Zieli´ nski
Latency-optimal fault-tolerant replication
Atomic Broadcast Optimistic Generic Broadcast Conclusion
Problem 1: Agreement on message order Problem 2: Infinitely many instances of Consensus Problem 3: Cycle resolution
Example m1
m3
m5
m1
m3
m5
m2
m4
m6
m2
m4
m6
m1 , m2 , m3 , m4 , m5 , m6 Delivery rule Deliver a message when all undelivered conflicting messages 1
succeed it in the partial order, or
2
succeed it in the total order and are blocked Piotr Zieli´ nski
Latency-optimal fault-tolerant replication
Atomic Broadcast Optimistic Generic Broadcast Conclusion
Problem 1: Agreement on message order Problem 2: Infinitely many instances of Consensus Problem 3: Cycle resolution
Example
[Figure: two graphs over messages m1 to m6]
m1, m2, m3, m4, m5, m6
m3, m1, m4, m5, m6, m2
Delivery rule: deliver a message when all undelivered conflicting messages
1. succeed it in the partial order, or
2. succeed it in the total order and are blocked
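The delivery rule can be written as a predicate over one message. The sketch below is one possible reading and rests on assumptions: each conflicting message may satisfy either condition independently, the partial order is given as a set of pairs `precedes`, and the set of blocked messages is supplied by the caller. All names and the example data are hypothetical.

```python
def deliverable(m, undelivered, conflicts, precedes, total_order, blocked):
    """Check the delivery rule for message `m`.

    conflicts   -- set of frozenset({a, b}) pairs that conflict
    precedes    -- set of (a, b) pairs: a is before b in the partial order
    total_order -- list of messages in the agreed total order
    blocked     -- set of currently blocked messages
    """
    rank = {x: i for i, x in enumerate(total_order)}
    for other in undelivered:
        if other == m or frozenset((m, other)) not in conflicts:
            continue
        after_in_partial = (m, other) in precedes
        after_in_total = rank[other] > rank[m] and other in blocked
        if not (after_in_partial or after_in_total):
            return False
    return True


# Hypothetical cycle m3 -> m4 -> m5 -> m3 with all three messages conflicting.
conflicts = {frozenset(p) for p in [("m3", "m4"), ("m4", "m5"), ("m3", "m5")]}
precedes = {("m3", "m4"), ("m4", "m5"), ("m5", "m3")}
total = ["m3", "m4", "m5"]
pending = {"m3", "m4", "m5"}
ready = [m for m in total
         if deliverable(m, pending, conflicts, precedes, total, pending)]
print(ready)
```

Under this reading only m3, the first message of the cycle in the total order, is deliverable. Once it is delivered, the remaining edge m4 → m5 contains no cycle, so nothing is blocked and condition 1 alone lets m4 go next.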
Problems
1. Different processes perceive different orders
   use a separate Consensus instance for each pair of messages, all executed in parallel
2. Cycles
   if no failures occur, the leader dictates the order, so there are no cycles
   if failures occur, a cycle-resolution algorithm is used
3. The graph contains all possible messages
   infinitely many parallel instances of Consensus
   most of them identical, only finitely many different
   implementable with finite resources
Frequently Asked Questions
Is it possible to ...
1. deliver some messages faster than in two steps?
   No, this would mean delivering before receiving feedback from others.
2. drop the requirement n > 3f?
   No, 2-step delivery requires n > 3f. [Pedone and Schiper, 2004]
3. not use the oracle for 2-step deliveries (thriftiness)?
   No, at least not with Consensus-based implementations. [Guerraoui and Raynal, 2003]
4. deliver all messages in two steps in all runs?
   Yes, but only in closed groups, with no failures, and with perfectly synchronized clocks. Otherwise, no. [Zieliński, 2005]
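As a quick arithmetic check of the requirement n > 3f, the helpers below compute the largest f a group of n processes can tolerate and the smallest n for a given f. The function names are hypothetical and serve only to illustrate the inequality.

```python
def max_tolerated_failures(n: int) -> int:
    """Largest f such that n > 3f, for n >= 1 processes."""
    return (n - 1) // 3


def min_processes(f: int) -> int:
    """Smallest n such that n > 3f."""
    return 3 * f + 1


for n in (3, 4, 7):
    print(n, max_tolerated_failures(n))
```

For example, tolerating a single failure with 2-step delivery needs at least four processes; three are not enough.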
Summary of the talk
Optimistic Generic Broadcast
  based on agreed partial order
  delivery in 2 or 3 steps
  latency provably optimal
  subsumes Generic Broadcast and Optimistic Atomic Broadcast
Contributions
1. 1-2-step Consensus
2. cycle resolution
3. infinitely many instances
References
Marcos Kawazoe Aguilera, Carole Delporte-Gallet, Hugues Fauconnier, and Sam Toueg. Thrifty Generic Broadcast. Lecture Notes in Computer Science, 1914:268–282, 2000.
Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267, 1996.
Rachid Guerraoui and Michel Raynal. The information structure of indulgent Consensus. Technical Report PI-1531, IRISA, April 2003.
Fernando Pedone and André Schiper. On the inherent cost of Generic Broadcast. Technical Report IC/2004/46, Swiss Federal Institute of Technology (EPFL), May 2004.
Fernando Pedone and André Schiper. Optimistic Atomic Broadcast. In Proceedings of the 12th International Symposium on Distributed Computing, pages 318–332, September 1998.
Fernando Pedone and André Schiper. Generic Broadcast. In Proceedings of the 13th International Symposium on Distributed Computing, pages 94–108, 1999.