Summary
Joachim Neu (a16z) presents a comprehensive treatment of BFT consensus under varying client and validator models at MEV-SBC 2025. The key contribution is a 16-model taxonomy (2 client dimensions × 2 validator dimensions × 2 network assumptions) that explains why known protocols achieve different safety/liveness tradeoffs. The talk also constructs explicit protocols for the “sleepy + communicating” model, which previously had no dedicated analysis. Main results: under partial synchrony, the client model doesn’t matter; under synchrony with sleepy communicating clients, a protocol can achieve (99% safety, 49% liveness) or (99% liveness, 49% safety), but not 99% for both at once.
The 16-Model Taxonomy
Two Validator Dimensions
Validator availability:
- Always-on validators: all validators are online and participating at all times (classical BFT assumption)
- Sleepy validators: validators may go offline at any time; the adversary chooses when they are online (“sleepy” model from Pass-Shi 2017)
Validator communication:
- Silent validators: validators only read the blockchain; they don’t broadcast messages themselves
- Communicating validators: validators broadcast messages, participate in gossip, can attest
Two Client Dimensions
Client state:
- Stateless clients: light clients / wallets that don’t store full state; rely on proofs from full nodes
- Stateful clients: full nodes that maintain the full blockchain state
Client communication:
- Read-only clients: clients only read finalized state; don’t participate in consensus
- Communicating clients: clients receive gossip and participate in the p2p network
Two Network Assumptions
- Synchrony: messages are delivered within a known bounded delay Δ
- Partial synchrony: messages are eventually delivered, but the delay bound Δ is only guaranteed to hold after an unknown Global Stabilization Time (GST); before GST, delays may exceed Δ arbitrarily
Key finding: under partial synchrony, the client model doesn’t matter. All 8 partial-synchrony models achieve the same fundamental limits (safety with f < N/3 Byzantine validators). This explains why the Byzantine/partial-synchrony literature has largely ignored client models: they genuinely don’t affect the results.
Under synchrony, client models matter significantly.
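The f < N/3 bound quoted for partial synchrony can be turned into a concrete fault budget. The following is a minimal sketch; the function name `max_byzantine_faults` is illustrative and not taken from the talk.

```python
def max_byzantine_faults(n: int) -> int:
    """Return the largest integer f satisfying f < n/3 (equivalently 3f < n)."""
    if n < 1:
        raise ValueError("n must be a positive number of validators")
    # (n - 1) // 3 is the largest f with 3f <= n - 1, i.e. 3f < n.
    return (n - 1) // 3


if __name__ == "__main__":
    for n in (4, 99, 100):
        print(n, max_byzantine_faults(n))
```

For example, 4 validators tolerate 1 Byzantine fault, while 99 validators tolerate 32 and 100 tolerate 33.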
Key Results Under Synchrony
The Sleepy + Communicating Model
The most interesting and previously understudied model: validators may go offline (sleepy), but when online they actively participate in gossip (communicating). This matches real Ethereum validator behavior more closely than either extreme.
Fundamental tradeoff (new result):
- (99% safety, 49% liveness): honest clients never confirm conflicting chains even if up to 99% of validators are adversarial; progress is guaranteed only while the adversary controls less than 49%. Above that threshold consensus may stall, BUT the honest minority can detect the attack (liveness fails gracefully)
- (99% liveness, 49% safety): consensus always makes progress even with up to 99% adversarial stake; safety is guaranteed only while the adversary controls less than 49%, and may fail above that threshold
You cannot simultaneously have >50% fault tolerance for both safety and liveness, even under synchrony.
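The two corner points and the impossibility statement above can be written down as data. This sketch only encodes what the summary states; the names `Resilience` and `violates_tradeoff` are hypothetical, and a pair that passes the check is not thereby shown to be achievable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resilience:
    safety: float    # largest adversarial fraction under which safety holds
    liveness: float  # largest adversarial fraction under which liveness holds


# The two corner points named in the summary.
ALWAYS_SAFE = Resilience(safety=0.99, liveness=0.49)
ALWAYS_LIVE = Resilience(safety=0.49, liveness=0.99)


def violates_tradeoff(r: Resilience) -> bool:
    """True if the pair claims more than 50% tolerance for both properties."""
    return r.safety > 0.5 and r.liveness > 0.5


if __name__ == "__main__":
    print(violates_tradeoff(ALWAYS_SAFE))              # False
    print(violates_tradeoff(ALWAYS_LIVE))              # False
    print(violates_tradeoff(Resilience(0.99, 0.99)))   # True
```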
Always-Safe Protocol Construction
For applications that must never confirm a conflicting chain (safety-critical):
Construction:
- Run a standard 49/49 SMR (State Machine Replication) protocol underneath (any BFT protocol with f < N/2 tolerance under synchrony)
- Wrap with a “gossip-then-wait-Δ” mechanism: after seeing a potential confirmation, wait Δ (the known network delay) before finalizing
- If during the wait period a conflicting proposal arrives, don’t finalize and escalate
Safety property: a conflicting finalization requires the adversary to have > 49% stake AND suppress honest validator messages for longer than Δ — which is impossible under synchrony.
Liveness property: if the adversary controls < 49% stake, the protocol always makes progress. Above that threshold progress may stall, so liveness tolerance is 49% while safety is preserved.
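The gossip-then-wait-Δ wrapper described above can be sketched as a small client-side state machine. This is an illustration under stated assumptions, not the protocol from the talk: the class `WaitingClient`, its method names, and the use of a `conflicted` set as the "escalate" step are all hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class WaitingClient:
    delta: float  # assumed known network delay bound, in seconds
    candidates: dict = field(default_factory=dict)  # height -> (block_id, first_seen)
    conflicted: set = field(default_factory=set)    # heights escalated, never finalized here
    finalized: dict = field(default_factory=dict)   # height -> block_id

    def on_candidate(self, height: int, block_id: str, now: float) -> None:
        """Record a potential confirmation heard from the inner protocol or via gossip."""
        if height in self.finalized:
            return
        seen = self.candidates.get(height)
        if seen is None:
            self.candidates[height] = (block_id, now)
        elif seen[0] != block_id:
            # A conflicting proposal arrived during the wait: do not finalize.
            self.conflicted.add(height)

    def tick(self, now: float) -> list:
        """Finalize every candidate that has waited at least delta without a conflict."""
        newly_finalized = []
        for height, (block_id, first_seen) in sorted(self.candidates.items()):
            if height in self.finalized or height in self.conflicted:
                continue
            if now - first_seen >= self.delta:
                self.finalized[height] = block_id
                newly_finalized.append((height, block_id))
        return newly_finalized


if __name__ == "__main__":
    client = WaitingClient(delta=2.0)
    client.on_candidate(1, "A", now=0.0)
    print(client.tick(now=1.0))  # still waiting
    print(client.tick(now=2.0))  # height 1 finalized
    client.on_candidate(2, "B", now=3.0)
    client.on_candidate(2, "C", now=4.0)  # conflict inside the wait window
    print(client.tick(now=10.0), client.conflicted)
```

The design choice worth noting is that a conflict permanently blocks finalization at that height in this sketch; what "escalate" means in the actual construction is not specified in this summary.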
Always-Live Protocol Construction
For applications that must always make progress (censorship resistance):
Construction:
- Run standard consensus for normal operation
- Maintain a side channel: “overdue unconfirmed transactions” list
- If a transaction remains unconfirmed for longer than T_overdue, inject it directly into the log regardless of what the consensus protocol decided
Liveness property: a transaction submitted to any honest node will be confirmed within T_overdue time, regardless of adversary behavior.
Safety property: the injected transactions are still validated (invalid transactions are rejected). But the ordering of injected transactions may not match consensus ordering → safety can fail if the adversary creates conflicting transaction sets.
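The overdue-injection idea can be sketched as follows. This is a simplified, single-node illustration; `AlwaysLiveLog`, its methods, and the `is_valid` callback are hypothetical names, and the sketch does not model how different nodes might inject in different orders, which is where the safety risk described above comes from.

```python
from dataclasses import dataclass, field


@dataclass
class AlwaysLiveLog:
    t_overdue: float                              # maximum tolerated confirmation delay
    log: list = field(default_factory=list)       # confirmed transactions, in order
    pending: dict = field(default_factory=dict)   # tx -> submission time

    def submit(self, tx: str, now: float) -> None:
        """Track a transaction until consensus confirms it or it becomes overdue."""
        if tx not in self.log:
            self.pending.setdefault(tx, now)

    def on_consensus_output(self, txs: list) -> None:
        """Append transactions confirmed by the underlying consensus protocol."""
        for tx in txs:
            if tx not in self.log:
                self.log.append(tx)
            self.pending.pop(tx, None)

    def inject_overdue(self, now: float, is_valid=lambda tx: True) -> list:
        """Inject every valid transaction pending for longer than t_overdue."""
        overdue = sorted(
            (submitted, tx)
            for tx, submitted in self.pending.items()
            if now - submitted > self.t_overdue
        )
        injected = []
        for _, tx in overdue:
            del self.pending[tx]
            if is_valid(tx):  # invalid transactions are rejected, not injected
                self.log.append(tx)
                injected.append(tx)
        return injected


if __name__ == "__main__":
    node = AlwaysLiveLog(t_overdue=5.0)
    node.submit("a", now=0.0)
    node.submit("b", now=1.0)
    node.on_consensus_output(["a"])
    print(node.inject_overdue(now=5.5))  # "b" is not yet overdue
    print(node.inject_overdue(now=7.0))  # "b" is injected
    print(node.log)
```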
Implications for Ethereum
FOCIL and Censorship Resistance
The always-live protocol construction is essentially a formal version of what FOCIL implements:
- “Overdue unconfirmed transactions” ≈ transactions on inclusion lists
- “Inject directly regardless of consensus” ≈ FOCIL’s fork-choice enforcement, under which blocks that violate the ILs are treated as invalid
- T_overdue ≈ 1 slot (FOCIL transactions that appeared in the previous IL must be included)
The theoretical result provides a rigorous foundation: to the extent that FOCIL matches the always-live construction, it can guarantee liveness against up to 99% adversarial stake, at the cost of safety being guaranteed only below 49% adversarial stake.
Finality Gadgets (Casper FFG)
Ethereum’s finality gadget (Casper FFG) is an always-safe protocol applied on top of an always-live base layer (LMD-GHOST); the combination of the two is Gasper:
- Base layer (LMD-GHOST): always-live, ensures blocks keep being produced
- Finality layer (FFG): always-safe, only finalizes when 2/3 supermajority agrees
The theory predicts: safety failure of the finality layer requires >1/3 equivocating validators; liveness failure of the base layer requires >50% adversarial stake. This matches Ethereum’s known security properties, validating the framework.
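The 2/3 supermajority rule, and the reason two conflicting finalizations need roughly one third of the stake to vote twice, can be checked with integer arithmetic. This is a sketch with illustrative function names; it treats stake as plain integers and ignores everything else about FFG (checkpoints, epochs, slashing conditions).

```python
def has_supermajority(voting_stake: int, total_stake: int) -> bool:
    """True when voting_stake is at least 2/3 of total_stake."""
    if total_stake <= 0:
        raise ValueError("total_stake must be positive")
    # Compare 3 * votes with 2 * total to avoid floating-point rounding.
    return 3 * voting_stake >= 2 * total_stake


def min_double_voting_stake(total_stake: int) -> int:
    """Smallest possible overlap between two supermajorities of the same stake."""
    if total_stake <= 0:
        raise ValueError("total_stake must be positive")
    smallest_supermajority = (2 * total_stake + 2) // 3  # ceil(2 * total / 3)
    # Two sets of this size drawn from total_stake must share at least this much.
    return 2 * smallest_supermajority - total_stake


if __name__ == "__main__":
    print(has_supermajority(66, 100))    # False
    print(has_supermajority(67, 100))    # True
    print(min_double_voting_stake(99))   # 33
    print(min_double_voting_stake(100))  # 34
```

With 99 units of stake, two conflicting supermajorities of 66 each must overlap in at least 33 units, which is the one-third figure quoted above.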
Validator Concentration Risk
The “sleepy” model captures a real Ethereum risk: if a large validator pool goes offline (e.g., cloud provider outage), the adversary temporarily has higher effective stake among online validators. The always-safe protocol’s gossip-then-wait mechanism provides a defense: it detects the anomalous offline period and delays finalization.
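The effect of an outage on the adversary’s effective share is simple arithmetic. The sketch below uses made-up numbers purely for illustration; `effective_adversarial_fraction` is a hypothetical helper, and it assumes the adversary’s validators all stay online.

```python
from fractions import Fraction


def effective_adversarial_fraction(adversarial: int, honest_online: int) -> Fraction:
    """Adversarial share of the stake that is currently online."""
    online = adversarial + honest_online
    if online <= 0:
        raise ValueError("at least one validator must be online")
    return Fraction(adversarial, online)


if __name__ == "__main__":
    # Hypothetical numbers: 30 adversarial units, 70 honest units in total.
    print(effective_adversarial_fraction(30, 70))  # 3/10 with everyone online
    # 45 honest units go offline, leaving 25 honest units online.
    print(effective_adversarial_fraction(30, 25))  # 6/11, now above one half
```

In this example an adversary holding 30% of total stake ends up with a majority of the online stake once enough honest stake drops off.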
Open Questions
❓ What is the concrete Δ required for the always-safe protocol to work on Ethereum mainnet? (Current gossip delay is ~500ms-2s globally.)
❓ Can the always-live protocol’s “overdue injection” mechanism be formalized well enough for automated verification?
❓ How do the theoretical results change under adversarial network conditions (Δ is unknown or unbounded)?
Timeline
2025-08-08: Presented by Joachim Neu (a16z) at MEV-SBC 2025
See Also
- Finality in Ethereum: Gasper, Gloas, and the Engine API — Casper FFG and Gasper as always-safe + always-live combination
- Decoupled Consensus: Goldfish, Majorum, and Dynamic Availability — Goldfish and Majorum as alternative consensus designs
- FOCIL: Fork-Choice Enforced Inclusion Lists (EIP-7805) — Practical implementation of the always-live construction