Skip to main content
Skip main navigation
No Access

Byzantine consensus in asynchronous message-passing systems: a survey

Published Online:pp 141-161https://doi.org/10.1504/IJCCBS.2011.041257

Consensus is a classical distributed systems problem with both theoretical and practical interest. Asynchronous Byzantine consensus is currently at the core of some solutions for the implementation of highly-resilient computing services. This paper surveys Byzantine consensus in message-passing distributed systems, by presenting the main theoretical results in the area, the main classes of algorithms and by discussing important issues like the performance and resilience of these algorithms.

Keywords

distributed algorithms, distributed systems, consensus, Byzantine faults, arbitrary faults, asynchronous model, message passing, Byzantine consensus

References

  • 1. M. Aguilera, '‘A pleasant stroll through the land of infinitely many creatures’' ACM SIGACT News (2004) Google Scholar
  • 2. E.A.P. Alchieri, A.N. Bessani, J.S. Fraga, F. Greve, '‘Byzantine consensus with unknown participants’' in Proceedings of the 12th International Conference on Principles of Distributed Systems (2008) Google Scholar
  • 3. N. Alon, M. Merrit, O. Reingold, G. Taubenfeld, R. Wright, '‘Tight bounds for shared memory systems accessed by Byzantine processes’' Distributed Computing (2005) Google Scholar
  • 4. B. Alpern, F. Schneider, '‘Defining liveness’' (1984) Google Scholar
  • 5. P.C. Attie, '‘Wait-free Byzantine consensus’' Information Processing Letters (2002) Google Scholar
  • 6. R. Baldoni, J-M. Hèlary, S.T. Piergiovanni, '‘A methodology to design arbitrary failure detectors for distributed protocols’' Journal of Systems Architecture (2008) Google Scholar
  • 7. R. Baldoni, J-M. H´elary, M. Raynal, L. Tanguy, '‘Consensus in Byzantine asynchronous systems’' Journal of Discrete Algorithms (2003) Google Scholar
  • 8. M. Ben-Or, '‘Another advantage of free choice: completely asynchronous agreement protocols’' in Proceedings of the 2nd ACM Symposium on Principles of Distributed Computing (1983) Google Scholar
  • 9. M. Ben-Or, '‘Fast asynchronous Byzantine agreement’' in Proceedings of the 4th ACM Symp. on Principles of Distributed Computing (1985) Google Scholar
  • 10. A.N. Bessani, M. Correia, J.S. Fraga, L.C. Lung, '‘Sharing memory between Byzantine processes using policy-enforced tuple spaces’' IEEE Transactions on Parallel and Distributed Systems (2009) Google Scholar
  • 11. A.N. Bessani, P. Sousa, M. Correia, N.F. Neves, P. Verissimo, '‘The CRUTIAL way of critical infrastructure protection’' IEEE Security & Privacy (2008) Google Scholar
  • 12. G. Bracha, '‘An asynchronous (n - 1)/3-resilient consensus protocol’' in Proceedings of the 3rd ACM Symposium on Principles of Distributed Computing (1984) Google Scholar
  • 13. C. Cachin, K. Kursawe, V. Shoup, '‘Random oracles in Contanstinople: practical asynchronous Byzantine agreement using cryptography’' in Proceedings of the 19th ACM Symposium on Principles of Distributed Computing (2000) Google Scholar
  • 14. C. Cachin, K. Kursawe, F. Petzold, V. Shoup, J. Kilian Ed., '‘Secure and efficient asynchronous broadcast protocols (extended abstract)’' Advances in Cryptology: CRYPTO 2001, Lecture Notes in Computer Science (2001) Google Scholar
  • 15. R. Canetti, T. Rabin, '‘Fast asynchronous Byzantine agreement with optimal resilience’' in Proceedings of the 25th Annual ACM Symposium on Theory of Computing (1993) Google Scholar
  • 16. M. Castro, B. Liskov, '‘Practical Byzantine fault tolerance and proactive recovery’' ACM Transactions on Computer Systems (2002) Google Scholar
  • 17. T. Chandra, S. Toueg, '‘Unreliable failure detectors for reliable distributed systems’' Journal of the ACM (1996) Google Scholar
  • 18. T. Chandra, V. Hadzilacos, S. Toueg, '‘The weakest failure detector for solving consensus’' Journal of the ACM (1996) Google Scholar
  • 19. S. Chaudhuri, '‘More choices allow more faults: set consensus problems in totally asynchronous systems’' Information and Computation (1993) Google Scholar
  • 20. G. Chockler, I. Keidar, R. Vitenberg, '‘Group communication specifications: a comprehensive study’' ACM Computing Surveys (2001) Google Scholar
  • 21. B-G. Chun, P. Maniatis, S. Shenker, J. Kubiatowicz, '‘Attested append-only memory: making adversaries stick to their word’' in Proceedings of the 21st ACM Symposium on Operating Systems Principles (2007) Google Scholar
  • 22. M. Correia, A.N. Bessani, P. Verissimo, '‘On Byzantine generals with alternative plans’' Journal of Parallel and Distributed Computing (2008) Google Scholar
  • 23. M. Correia, N.F. Neves, P. Verissimo, '‘How to tolerate half less one Byzantine nodes in practical distributed systems’' in Proceedings of the 23rd IEEE Symposium on Reliable Distributed Systems (2004) Google Scholar
  • 24. M. Correia, N.F. Neves, P. Verissimo, '‘From consensus to atomic broadcast: time-free Byzantine-resistant protocols without signatures’' Computer Journal (2006) Google Scholar
  • 25. M. Correia, N.F. Neves, L.C. Lung, P. Verissimo, '‘Low complexity Byzantineresilient consensus’' Distributed Computing (2005) Google Scholar
  • 26. M. Correia, N.F. Neves, L.C. Lung, P. Verissimo, '‘Worm-IT – a wormholebased intrusion-tolerant group communication system’' Journal of Systems and Software (2007) Google Scholar
  • 27. M. Correia, P. Verissimo, N.F. Neves, '‘The design of a COTS real-time distributed security kernel’' in Proceedings of the Fourth European Dependable Computing Conference (2002) Google Scholar
  • 28. M. Correia, G.S. Veronese, L.C. Lung, '‘Asynchronous Byzantine consensus with 2f + 1 processes’' in Proceedings of the 25th Annual ACM Symposium on Applied Computing (2010) Google Scholar
  • 29. J. Cowling, D. Myers, B. Liskov, R. Rodrigues, L. Shrira, '‘HQ-replication: a hybrid quorum protocol for Byzantine fault tolerance’' in Proceedings of 7th Symposium on Operating Systems Design and Implementations (2006) Google Scholar
  • 30. F. Cristian, C. Fetzer, '‘The timed asynchronous system model’' in Proceedings of the 28th IEEE International Symposium on Fault-Tolerant Computing (1998) Google Scholar
  • 31. R. de Prisco, D. Malki, M. Reiter, '‘On k-set consensus problems in asynchronous systems’' in Proceedings of the 18th ACM Symposium on Principles of Distributed Computing (1999) Google Scholar
  • 32. Y. Deswarte, K. Kanoun, J.C. Laprie, '‘Diversity against accidental and deliberate faults’' Computer Security, Dependability, & Assurance: From Needs to Solutions (1998) Google Scholar
  • 33. D. Dolev, C. Dwork, L. Stockmeyer, '‘On the minimal synchronism needed for distributed consensus’' Journal of the ACM (1987) Google Scholar
  • 34. A. Doudou, B. Garbinato, R. Guerraoui, '‘Encapsulating failure detection: from crash-stop to Byzantine failures’' (2002) Google Scholar
  • 35. A. Doudou, B. Garbinato, R. Guerraoui, H.B. Diab Ed., A.Y. Zomaya Ed., '‘Tolerating arbitrary failures with state machine replication’' Dependable Computing Systems Paradigms, Performance Issues and Applications (2005) Google Scholar
  • 36. A. Doudou, B. Garbinato, R. Guerraoui, A. Schiper, '‘Muteness failure detectors: specification and implementation’' in Proceedings of the Third European Dependable Computing Conference (1999) Google Scholar
  • 37. P. Dutta, R. Guerraoui, M. Vukolic, '‘Best-case complexity of asynchronous Byzantine consensus’' (2005) Google Scholar
  • 38. C. Dwork, N. Lynch, L. Stockmeyer, '‘Consensus in the presence of partial synchrony’' Journal of the ACM (1988) Google Scholar
  • 39. C. Fetzer, F. Cristian, '‘On the possibility of consensus in asynchronous systems’' in Proceedings of the Pacific Rim International Symposium on Fault-Tolerant Systems (1995) Google Scholar
  • 40. M.J. Fischer, N.A. Lynch, M.S. Paterson, '‘Impossibility of distributed consensus with one faulty process’' Journal of the ACM (1985) Google Scholar
  • 41. R. Friedman, A. Mostefaoui, M. Raynal, '‘Simple and efficient oracle-based consensus protocols for asynchronous Byzantine systems’' IEEE Transactions on Dependable and Secure Computing (2005) Google Scholar
  • 42. R. Friedman, A. Mostefaoui, S. Rajsbaum, M. Raynal, '‘Distributed agreement and its relation with error-correcting codes’' in Proceedings of the 16th International Conference on Distributed Computing (2002) Google Scholar
  • 43. R. Guerraoui, '‘Indulgent algorithms’' in Proceedings of the 19th ACM Symposium on Principles of Distributed Computing (2000) Google Scholar
  • 44. R. Guerraoui, M. Raynal, '‘The information structure of indulgent consensus’' IEEE Transactions on Computers (2004) Google Scholar
  • 45. R. Guerraoui, A. Schiper, '‘Consensus: the big misunderstanding’' in Proceedings of the IEEE International Workshop on Future Trends in Distributed Computing Systems (1997) Google Scholar
  • 46. R. Guerraoui, A. Schiper, '‘The generic consensus service’' IEEE Transactions on Software Engineering (2001) Google Scholar
  • 47. R. Guerraoui, M. Hurfin, A. Mostefaoui, R. Oliveira, M. Raynal, A. Schiper, S. Krakowiak Ed., S. Shrivastava Ed., '‘Consensus in asynchronous distributed systems: a concise guided tour’' Advances in Distributed Systems, Lecture Notes in Computer Science (2000) Google Scholar
  • 48. V. Hadzilacos, S. Toueg, '‘A modular approach to fault-tolerant broadcasts and related problems’' (1994) Google Scholar
  • 49. C.H. Hauser, D.E. Bakken, I. Dionysiou, K.H. Gjermundrød, V.S. Irava, A. Bose, '‘Security, trust and QoS in next-generation control and communication for large power systems’' International Journal of Critical Infrastructures (2008) Google Scholar
  • 50. M. Herlihy, '‘Wait-free synchronization’' ACM Transactions on Programming Languages and Systems (1991) Google Scholar
  • 51. I. Keidar, '‘Challenges in evaluating distributed algorithms’' in Proceedings of the International Workshop on Future Directions in Distributed Computing (2002) Google Scholar
  • 52. K.P. Kihlstrom, L.E. Moser, P.M. Melliar-Smith, '‘The Secure Ring group communication system’' ACM Transactions on Information and System Security (2001) Google Scholar
  • 53. K.P. Kihlstrom, L.E. Moser, P.M. Melliar-Smith, '‘Byzantine fault detectors for solving consensus’' The Computer Journal (2003) Google Scholar
  • 54. L. Lamport, '‘The part-time parliament’' ACM Transactions Computer Systems (1998) Google Scholar
  • 55. L. Lamport, '‘Paxos made simple’' SIGACT News (2001) Google Scholar
  • 56. L. Lamport, R. Shostak, M. Pease, '‘The Byzantine generals problem’' ACM Transactions on Programming Languages and Systems (1982) Google Scholar
  • 57. B. Lampson, '‘The ABCD’s of Paxos’' in Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing (2001) Google Scholar
  • 58. D. Levin, J.R. Douceur, J.R. Lorch, T. Moscibroda, '‘TrInc: small trusted hardware for large distributed systems’' in Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (2009) Google Scholar
  • 59. B. Littlewood, L. Strigini, P. Samarati Ed., P. Rian Ed., D. Gollmann Ed., R. Molva Ed., '‘Redundancy and diversity in security’' (2004) Google Scholar
  • 60. N. Lynch, '‘A hundred impossibility proofs for distributed computing’' in Proceedings of the 8th Annual ACM Symposium on Principles of Distributed Computing (1989) Google Scholar
  • 61. N. Lynch, Distributed Algorithms (1996) Google Scholar
  • 62. D. Malkhi, M. Reiter, '‘Unreliable intrusion detection in distributed computations’' in Proceedings of the 10th Computer Security Foundations Workshop (1997) Google Scholar
  • 63. D. Malkhi, M. Merrit, M. Reiter, G. Taubenfeld, '‘Objects shared by Byzantine processes’' Distributed Computing (2003) Google Scholar
  • 64. J.P. Martin, L. Alvisi, '‘Fast Byzantine consensus’' in Proceedings of the IEEE International Conference on Dependable Systems and Networks (2005) Google Scholar
  • 65. H. Moniz, N.F. Neves, M. Correia, P. Verissimo, '‘Experimental comparison of local and shared coin randomized consensus protocols’' in Proceedings of the 25th IEEE Symposium on Reliable Distributed Systems (2006a) Google Scholar
  • 66. H. Moniz, N.F. Neves, M. Correia, P. Verissimo, '‘Randomized intrusion-tolerant asynchronous services’' in Proceedings of the International Conference on Dependable Systems and Networks (2006b) Google Scholar
  • 67. A. Mostefaoui, E. Mourgaya, M. Raynal, '‘Asynchronous implementation of failure detectors’' in Proceedings of the IEEE International Conference on Dependable Systems and Networks (2003a) Google Scholar
  • 68. A. Mostefaoui, S. Rajsbaum, M. Raynal, '‘Conditions on input vectors for consensus solvability in asynchronous distributed systems’' Journal of the ACM (2003b) Google Scholar
  • 69. A. Mostefaoui, S. Rajsbaum, M. Raynal, M. Roy, '‘Condition based consensus solvability: a hierarchy of conditions and efficient protocols’' Distributed Computing (2004) Google Scholar
  • 70. A. Mostefaoui, M. Raynal, F. Tronel, '‘From binary consensus to multivalued consensus in asynchronous message-passing systems’' Information Processing Letters (2000) Google Scholar
  • 71. A. Mostefaoui, M. Raynal, C. Travers, S. Patterson, D. Agrawal, A.E. Abbadi, '‘From static distributed systems to dynamic systems’' in Proceedings of the 24th IEEE Symposium on Reliable Distributed Systems (2005) Google Scholar
  • 72. N.F. Neves, M. Correia, P. Verissimo, '‘Wormhole-aware Byzantine protocols’' in 2nd Bertinoro Workshop on Future Directions in Distributed Computing: Survivability – Obstacles and Solutions (2004) Google Scholar
  • 73. N.F. Neves, M. Correia, P. Verissimo, '‘Solving vector consensus with a wormhole’' IEEE Transactions on Parallel and Distributed Systems (2005) Google Scholar
  • 74. R.R. Obelheiro, A.N. Bessani, L.C. Lung, M. Correia, '‘How practical are intrusion-tolerant distributed systems?’' (2006) Google Scholar
  • 75. M. Pease, R. Shostak, L. Lamport, '‘Reaching agreement in the presence of faults’' Journal of the ACM (1980) Google Scholar
  • 76. F. Pedone, A. Schiper, P. Urb´an, D. Cavin, '‘Solving agreement problems with weak ordering oracles’' in Proceedings of the Fourth European Dependable Computing Conference (2002) Google Scholar
  • 77. D. Powell, '‘Fault assumptions and assumption coverage’' in Proceedings of the 22nd IEEE International Symposium of Fault-Tolerant Computing (1992) Google Scholar
  • 78. M.O. Rabin, '‘Randomized Byzantine generals’' in Proceedings of the 24th Annual IEEE Symposium on Foundations of Computer Science (1983) Google Scholar
  • 79. M. Reiter, '‘Secure agreement protocols: reliable and atomic group multicast in Rampart’' in Proceedings of the 2nd ACM Conference on Computer and Communications Security (1994) Google Scholar
  • 80. M.K. Reiter, '‘A secure group membership protocol’' IEEE Transactions on Software Engineering (1996) Google Scholar
  • 81. F.B. Schneider, '‘Implementing faul-tolerant services using the state machine approach: a tutorial’' ACM Computing Surveys (1990) Google Scholar
  • 82. S. Toueg, '‘Randomized Byzantine agreements’' in Proceedings of the 3rd ACM Symposium on Principles of Distributed Computing (1984) Google Scholar
  • 83. Trusted Computing Group, '‘Trusted computing platform module main specification, version 1.2, revision 94’' (2006) Google Scholar
  • 84. R. Turpin, B.A. Coan, '‘Extending binary Byzantine agreement to multivalued Byzantine agreement’' Information Processing Letters (1984) Google Scholar
  • 85. P. Verissimo, '‘Uncertainty and predictability: can they be reconciled?’' in Future Directions in Distributed Computing, Lecture Notes in Computer Science (2003) Google Scholar
  • 86. P. Verissimo, '‘Travelling through wormholes: a new look at distributed systems models’' SIGACT News (2006) Google Scholar
  • 87. P. Verissimo, A.N. Bessani, M. Correia, N.F. Neves, P. Sousa, '‘Designing modular and redundant cyber architectures for process control: lessons learned’' in Proceedings of the 42nd Annual Hawaii International Conference on System Sciences (2009) Google Scholar
  • 88. P. Verissimo, N.F. Neves, M. Correia, '‘The CRUTIAL reference critical information infrastructure architecture: a blueprint’' International Journal of System of Systems Engineering (2008) Google Scholar
  • 89. G.S. Veronese, M. Correia, A.N. Bessani, L.C. Lung, '‘Highly-resilient services for critical infrastructures’' in Proceedings of the Embedded Systems and Communications Security Workshop (2009b) Google Scholar
  • 90. G.S. Veronese, M. Correia, A.N. Bessani, L. Chung, P. Verissimo, '‘Minimal Byzantine fault tolerance: algorithm and evaluation’' (2009a) Google Scholar
  • 91. J. Yin, J. Martin, A. Venkataramani, L. Alvisi, M. Dahlin, '‘Separating agreement from execution for Byzantine fault tolerant services’' in Proceedings of the 19th ACM Symposium on Operating Systems Principles (2003) Google Scholar
  • 92. P. Zielinski, '‘Paxos at war’' (2004) Google Scholar