Optimistically Terminating Consensus: designing reusable low-latency agreement protocols
Piotr Zieliński
Cavendish Laboratory, University of Cambridge, United Kingdom
6 July 2006

Outline: Introduction, Consensus Crash Course, Unifying framework
Introduction
Agreement problems
Agreement on:
- whether a transaction succeeded or not (Atomic Commit)
- which client's request arrived first (State Machine Replication)
- which server is the master (Leader Election)
Agreement problems are common but difficult because of failures.
System assumptions:
- message passing: communication by messages
- process failures: servers can crash
- message loss: messages can get lost
- asynchrony: no time bounds for messages, no clocks
Consensus
[Figure: processes A, B and C propose values to a Consensus box and receive its decisions.]
Processes propose values and make decisions:
- validity: the decision is one of the proposals
- agreement: all decisions are the same
- termination: all correct processes decide
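The three properties can be checked mechanically on a finished run. The sketch below is only an illustration: the function name `check_consensus` and the dictionary-based representation of a run are assumptions made here, not part of the talk.

```python
def check_consensus(proposals, decisions, correct):
    """Check one finished run against the three Consensus properties.

    proposals: dict process -> proposed value
    decisions: dict process -> decided value (absent if the process never decided)
    correct:   set of processes that did not crash
    """
    decided_values = set(decisions.values())
    return {
        # validity: every decided value was proposed by someone
        "validity": decided_values <= set(proposals.values()),
        # agreement: at most one distinct value was decided
        "agreement": len(decided_values) <= 1,
        # termination: every correct process has a decision
        "termination": all(p in decisions for p in correct),
    }


everyone = {"A", "B", "C"}
ok = check_consensus({"A": 1, "B": 2, "C": 1}, {"A": 1, "B": 1, "C": 1}, everyone)
not_proposed = check_consensus({"A": 1, "B": 2, "C": 2}, {"A": 3, "B": 3, "C": 3}, everyone)
split = check_consensus({"A": 1, "B": 2, "C": 2}, {"A": 1, "B": 2, "C": 2}, everyone)
```

Here `ok` satisfies all three properties, `not_proposed` violates validity (nobody proposed 3), and `split` violates agreement.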
[Figure, panel "Valid outcomes": example runs of processes A, B and C.]
[Figure, panel "Invalid outcomes": example runs of processes A, B and C.]
[Figure, panel "Fault-tolerance": example runs of processes A, B and C.]
Consensus: current state
Current state:
- dozens of protocols in existence
- slight changes in assumptions (e.g. malicious participants) require new protocols
- 10+ pages of correctness proofs!
- highly non-trivial to design
Conclusion: wasted effort → unified approach necessary
Consensus Crash Course
Democracy vs. Dictatorship
Democracy: majority wins
[Figure: example runs of processes A, B and C under majority voting.]
- decision depends on all inputs: not recoverable with any failure
Dictatorship: leader decides
[Figure: example runs of processes A, B and C with a single leader.]
- decision depends on one input: not recoverable when leader fails
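The contrast between the two rules can be made concrete with a small sketch. The function names and the modelling choice (a missing input is simply absent from the `votes` dictionary) are assumptions for illustration only.

```python
from collections import Counter


def democracy(votes, n):
    """Majority wins, but the outcome is only known once all n inputs arrived."""
    if len(votes) < n:
        return None  # one missing input is enough to block the decision
    # With an odd number of binary votes there is no tie; otherwise the
    # first value encountered among the most common ones is returned.
    return Counter(votes.values()).most_common(1)[0][0]


def dictatorship(votes, leader):
    """The leader's input is the decision; other inputs are ignored."""
    return votes.get(leader)  # None if the leader's message never arrived


all_inputs = {"A": 1, "B": 1, "C": 2}
without_c = {"A": 1, "B": 1}
without_a = {"B": 1, "C": 2}
```

With `all_inputs`, both rules yield 1 (taking A as the leader). Losing C blocks `democracy`, while losing the leader A blocks `dictatorship`.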
Two-step approach
[Figure: a two-step run of processes A, B and C.]
Algorithm:
1. broadcast the message from the leader (A)
2. decide when received the same value (1) from a majority (A, B)
Assume a majority of processes are correct:
- a majority contains a correct process → recovery always possible
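The decision rule of the second step can be sketched as follows. This is a minimal illustration of "decide on the same value from a majority"; the names `majority` and `two_step_decide` and the dictionary of received rebroadcasts are assumptions, and message transport is not modelled.

```python
from collections import Counter


def majority(n):
    """Smallest number of processes that forms a majority of n."""
    return n // 2 + 1


def two_step_decide(echoes, n):
    """Decide once the same value was rebroadcast by a majority.

    echoes: dict process -> leader's value as rebroadcast by that process
            (only the rebroadcasts received so far)
    """
    if not echoes:
        return None
    value, count = Counter(echoes.values()).most_common(1)[0]
    return value if count >= majority(n) else None
```

For three processes, `two_step_decide({"A": 1, "B": 1}, 3)` returns 1 even if C has crashed, whereas a single rebroadcast or two different values yield no decision.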
[Figure: three further runs of the two-step approach, labelled "no decision", "possibly 1" and "decision 1".]
Everything together
[Figure: a run of processes A, B and C over round 1, round 2 and round 3.]
Complete Consensus algorithm:
- each round as before: leader proposes, decision by majority
- if not successful, new round with new leader
- do not propose values conflicting with previous decisions
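The round structure can be simulated in a few lines. The sketch below is not the algorithm from the talk: the rule used to avoid conflicts (the new leader hears from a majority and adopts the most recently rebroadcast value it learns about) is one conservative choice assumed here, and `run_rounds` with its `schedule` argument is a hypothetical test harness rather than a network protocol.

```python
def majority(n):
    return n // 2 + 1


def run_rounds(processes, proposals, schedule):
    """Simulate rounds with rotating leaders and return all decisions reached.

    schedule: one (heard, echoed) pair of sets per round, where
      heard  = processes the new leader hears from before proposing
      echoed = processes that receive the proposal and rebroadcast it
    """
    n = len(processes)
    accepted = {}   # process -> (round, value) it last rebroadcast
    decisions = []  # (round, value) for every round that reached a majority
    for rnd, (heard, echoed) in enumerate(schedule, start=1):
        leader = processes[(rnd - 1) % n]
        if leader not in heard or len(heard) < majority(n):
            continue  # leader crashed or cannot reach a majority: next round
        # Avoid conflicts with a possible earlier decision: reuse the value
        # rebroadcast in the most recent round among the processes heard from.
        earlier = [accepted[p] for p in heard if p in accepted]
        if earlier:
            value = max(earlier, key=lambda rv: rv[0])[1]
        else:
            value = proposals[leader]
        for p in echoed:
            accepted[p] = (rnd, value)
        if len(echoed) >= majority(n):
            decisions.append((rnd, value))
    return decisions


procs = ["A", "B", "C"]
props = {"A": 1, "B": 2, "C": 2}
schedule = [
    ({"A", "B"}, {"B"}),       # round 1: leader A proposes 1, only B rebroadcasts
    ({"B", "C"}, {"B", "C"}),  # round 2: leader B re-proposes 1 instead of its own 2
    ({"A", "C"}, {"A", "C"}),  # round 3: leader C also stays with 1
]
result = run_rounds(procs, props, schedule)
```

Round 1 reaches no majority, but value 1 might have been decided, so the later leaders keep proposing 1 and every decision in `result` carries the same value.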
Unifying Framework
Optimistically Terminating Consensus (OTC)
[Figure: a single round with processes A, B and C, with the second step drawn as an OTC box.]
Single round of Consensus, with the second step isolated into OTC:
- propose the value (1) from the leader
- decide if receive the same from ≥ 2 processes
- 2 is a parameter
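A threshold-parameterised decision rule of this kind might look as follows from the point of view of one process. The class name `ThresholdOTC` and its methods are hypothetical and cover only the "decide" side; the recovery side of OTC is not modelled.

```python
from collections import Counter


class ThresholdOTC:
    """One OTC instance as seen by a single process (illustrative only)."""

    def __init__(self, threshold):
        self.threshold = threshold  # the parameter (2 in the three-process example)
        self.received = {}          # sender -> proposed value

    def on_proposal(self, sender, value):
        # Keep only the first proposal from each sender.
        self.received.setdefault(sender, value)

    def decision(self):
        """Return a value proposed by at least `threshold` senders, else None."""
        for value, count in Counter(self.received.values()).items():
            if count >= self.threshold:
                return value
        return None


otc = ThresholdOTC(threshold=2)
otc.on_proposal("A", 1)
first = otc.decision()   # one proposal is below the threshold
otc.on_proposal("A", 1)  # a repeated message from A does not count twice
second = otc.decision()
otc.on_proposal("B", 1)
third = otc.decision()
```

Raising the threshold makes a decision harder to reach but leaves fewer ways for two processes to decide differently; the sketch assumes the threshold exceeds half the number of processes, so at most one value can reach it.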
[Figure: a multi-round run of processes A, B and C with each round's agreement step drawn as an OTC box.]
OTC as a black box:
- one OTC per Consensus round
- decision if all correct processes propose the same value
- the decision is recoverably unique
Malicious participants
Opportunities for cheating:
- leader can send different proposals
- processes can modify forwarded messages
- recovery phase
Full algorithms are very complicated.
[Figure: runs of processes A, B, C and D labelled "decision 1", "decision 2" and "what happened?".]
Who is cheating?
- maybe A broadcast 1 to B, and 2 to C?
- maybe B really received 2, and A is just slow?
- maybe C really received 1, and A is just slow?
Impossible to determine.
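The three explanations can be compared by computing what an observer actually sees in each of them. In the sketch below the observer is taken to be D and A's own message to D is assumed to be delayed; both choices, and the helper `view_of_d`, are assumptions made for illustration.

```python
def view_of_d(a_sends_b, a_sends_c, b_forward, c_forward):
    """What D sees while A's direct message is delayed: only the forwards."""
    return {"B": b_forward(a_sends_b), "C": c_forward(a_sends_c)}


def honest(value):
    """An honest process forwards exactly what it received."""
    return value


scenarios = {
    "A sent 1 to B and 2 to C": view_of_d(1, 2, honest, honest),
    "A sent 2 to both, B changed it to 1": view_of_d(2, 2, lambda v: 1, honest),
    "A sent 1 to both, C changed it to 2": view_of_d(1, 1, honest, lambda v: 2),
}

# Collapse the three views into a set: identical views leave a single element.
distinct_views = {tuple(sorted(view.items())) for view in scenarios.values()}
```

All three scenarios give D the same view (1 from B, 2 from C), so `distinct_views` has a single element and D cannot tell which process misbehaved.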
[Figure: a three-step run of processes A, B, C and D.]
Malicious participants:
- three steps necessary [Castro and Liskov, 1999]
- 2nd and 3rd steps are OTCs
- no need to look inside OTCs to prove correctness (blackbox)
- composition of two OTCs is also an OTC
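As a rough intuition for chaining two threshold steps, the sketch below feeds the output of one threshold check into a second one. It is not the formal OTC composition from the talk: it assumes four processes, a threshold of 3, full message delivery, and that only the leader misbehaves; `threshold_output` and `two_stage` are hypothetical names.

```python
from collections import Counter


def threshold_output(values, threshold):
    """Return a value that occurs at least `threshold` times, else None."""
    for value, count in Counter(values).items():
        if count >= threshold:
            return value
    return None


def two_stage(first_step_values, threshold):
    """Chain two threshold checks after the leader's broadcast.

    first_step_values: dict process -> value received from the leader (step 1)
    """
    # Step 2: everyone rebroadcasts the leader's value; each process applies
    # the first threshold check to what it collected.
    stage1 = {
        p: threshold_output(first_step_values.values(), threshold)
        for p in first_step_values
    }
    # Step 3: processes with a stage-1 output broadcast it; the second
    # threshold check runs on those outputs.
    stage2_inputs = [v for v in stage1.values() if v is not None]
    return threshold_output(stage2_inputs, threshold)


consistent_leader = {"A": 1, "B": 1, "C": 1, "D": 1}
equivocating_leader = {"A": 1, "B": 1, "C": 2, "D": 2}
```

With a consistent leader the chain outputs 1; with a leader that sends 1 to half of the processes and 2 to the other half, neither value reaches the threshold and nothing is decided.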
[Figure: a run of processes A, B, C and D containing the label "SLOW".]
Reconstructed algorithms

Algorithm                         Steps   Round 1   Processes
Chandra and Toueg [1996]          2       2         n > 2f
Lamport and Massa [2004]          2       2         n > f
Brasileiro et al. [2001]          1       3         n > 3f
cheap one-step (new)              1       3         n > 2f
cheap one-step Byzantine (new)    1       4         n > 3f
Martin and Alvisi [2004]          2       5         n > 5f
Castro and Liskov [1999]          3       3 3       n > 3f
Dutta et al. [2004]               2/3     3 3       n > 3f
multi-step Byzantine (new)        2/3/4   3 3 3     n > 3f

n: number of processes; f: number of failures
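The "Processes" column translates directly into a minimum system size for a given number of failures. The helper below is a small worked example, assuming each bound has the form n > k·f with the multiplier k read off the table; `min_processes` is a name introduced here.

```python
def min_processes(k, f):
    """Smallest integer n that satisfies n > k * f."""
    return k * f + 1


# Multiplier k in the bound n > k*f, as listed in the "Processes" column.
bound_multiplier = {
    "Chandra and Toueg [1996]": 2,
    "Lamport and Massa [2004]": 1,
    "Brasileiro et al. [2001]": 3,
    "cheap one-step (new)": 2,
    "cheap one-step Byzantine (new)": 3,
    "Martin and Alvisi [2004]": 5,
    "Castro and Liskov [1999]": 3,
    "Dutta et al. [2004]": 3,
    "multi-step Byzantine (new)": 3,
}

# Minimum number of processes needed to tolerate a single failure (f = 1).
needed_for_one_failure = {
    name: min_processes(k, 1) for name, k in bound_multiplier.items()
}
```

For f = 1 this gives, for example, 3 processes for the n > 2f rows, 4 for the n > 3f rows and 6 for n > 5f.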
OTC Summary
Simple implementation:
- broadcast and wait for a given number of replies
- simple to extend to custom failure models
- automatic verification and discovery possible [Zieliński, 2006]
Reconstructs all known Consensus protocols:
- even with malicious participants
- no overhead in latency or processes
- lower bounds attained
Modularity:
- precisely defined interface (blackbox)
- dramatically reduces development time and proofs
Applicable to similar problems:
- non-blocking Atomic Commit in 2 communication steps
References
Francisco Brasileiro, Fabíola Greve, Achour Mostéfaoui, and Michel Raynal. Consensus in one communication step. Lecture Notes in Computer Science, 2127:42–50, 2001.
Miguel Castro and Barbara Liskov. Practical Byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation, pages 173–186, New Orleans, Louisiana, February 1999. USENIX Association.
Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267, 1996.
Partha Dutta, Rachid Guerraoui, and Marko Vukolic. Asynchronous Byzantine Consensus: Complexity, resilience and authentication. Technical Report 200479, EPFL, September 2004.
Leslie Lamport and Mike Massa. Cheap Paxos. In Proceedings of 2004 International Conference on Dependable Systems and Networks, pages 307–314, Florence, Italy, June 2004.
Jean-Philippe Martin and Lorenzo Alvisi. Fast Byzantine Paxos. Technical Report TR-04-07, University of Texas at Austin, Department of Computer Science, 2004.
Piotr Zieliński. Minimizing latency of agreement protocols. PhD thesis, 2006.