Summary

Joachim Neu (a16z) presents a comprehensive treatment of BFT consensus under varying client and validator models at MEV-SBC 2025. The key contribution is a 16-model taxonomy, spanning client, validator, and network assumptions, that explains why known protocols achieve different safety/liveness tradeoffs. The talk also constructs explicit protocols for the “sleepy + communicating” model, which previously had no dedicated analysis. Main results: under partial synchrony, the client model doesn’t matter; under synchrony with sleepy communicating clients, you can achieve (99% safety, 49% liveness) OR (99% liveness, 49% safety), but not both simultaneously.

The 16-Model Taxonomy

Two Validator Dimensions

Validator availability:

  • Always-on validators: all validators are online and participating at all times (classical BFT assumption)
  • Sleepy validators: validators may go offline at any time; the adversary chooses when they are online (“sleepy” model from Pass-Shi 2017)

Validator communication:

  • Silent validators: validators only read the blockchain; they don’t broadcast messages themselves
  • Communicating validators: validators broadcast messages, participate in gossip, can attest

Two Client Dimensions

Client state:

  • Stateless clients: light clients / wallets that don’t store full state; rely on proofs from full nodes
  • Stateful clients: full nodes that maintain the full blockchain state

Client communication:

  • Read-only clients: clients only read finalized state; don’t participate in consensus
  • Communicating clients: clients receive gossip and participate in the p2p network

Two Network Assumptions

  • Synchrony: messages are delivered within a known bounded delay Δ
  • Partial synchrony: messages are eventually delivered, but the bound may temporarily be exceeded (GST model)

Key finding: under partial synchrony, the client model doesn’t matter. All 8 partial-synchrony models achieve the same fundamental limits (safety with f < N/3 Byzantine validators). This explains why the Byzantine/partial-synchrony literature has largely ignored client models — it genuinely doesn’t affect the results.
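
As a small worked example of the f < N/3 bound above, the following sketch computes the largest number of Byzantine validators a committee of size N can tolerate under a strict "f < N/d" bound. The function name `max_faults` is hypothetical and used only for illustration; d = 3 corresponds to the partial-synchrony limit stated above.

```python
def max_faults(n: int, denominator: int) -> int:
    """Largest integer f that satisfies the strict bound f < n / denominator."""
    if n <= 0 or denominator <= 0:
        raise ValueError("n and denominator must be positive")
    # f < n/d  <=>  f*d <= n - 1  <=>  f <= (n - 1) // d
    return (n - 1) // denominator


if __name__ == "__main__":
    for n in (4, 7, 100):
        print(n, "validators tolerate", max_faults(n, 3), "Byzantine faults under f < N/3")
```

With d = 2 the same helper gives the f < N/2 bound that appears later for synchronous protocols.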

Under synchrony, client models matter significantly.

Key Results Under Synchrony

The Sleepy + Communicating Model

The most interesting and previously understudied model: validators may go offline (sleepy), but when online they actively participate in gossip (communicating). This matches real Ethereum validator behavior more closely than either extreme.

Fundamental tradeoff (new result):

  • (99% safety, 49% liveness): safety holds even if up to 99% of validators are adversarial, while liveness is guaranteed only if the adversary controls less than 49%; beyond that, progress can stall, but the honest minority can detect the attack (liveness fails gracefully)
  • (99% liveness, 49% safety): consensus keeps making progress even with up to 99% adversarial stake, while safety is guaranteed only if the adversary controls less than 49%; beyond that, safety may fail

You cannot simultaneously have >50% fault tolerance for both safety and liveness under synchrony.
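
To make the two resilience pairs concrete, here is a minimal sketch that reports which guarantees survive at a given adversarial fraction. The names `Resilience`, `ALWAYS_SAFE`, `ALWAYS_LIVE`, and `guarantees` are hypothetical, and treating the 49% and 99% figures as inclusive thresholds is a simplifying assumption, not something stated in the talk.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Resilience:
    safety: Fraction    # safety assumed to hold while adversarial fraction <= this
    liveness: Fraction  # liveness assumed to hold while adversarial fraction <= this


# The two operating points named in the text (thresholds treated as inclusive).
ALWAYS_SAFE = Resilience(safety=Fraction(99, 100), liveness=Fraction(49, 100))
ALWAYS_LIVE = Resilience(safety=Fraction(49, 100), liveness=Fraction(99, 100))


def guarantees(r: Resilience, adversarial_fraction: Fraction) -> dict:
    """Report which properties still hold at the given adversarial fraction."""
    return {
        "safety": adversarial_fraction <= r.safety,
        "liveness": adversarial_fraction <= r.liveness,
    }


if __name__ == "__main__":
    for frac in (Fraction(30, 100), Fraction(60, 100)):
        print(frac, "always-safe:", guarantees(ALWAYS_SAFE, frac))
        print(frac, "always-live:", guarantees(ALWAYS_LIVE, frac))
```

At 60% adversarial stake, each operating point keeps exactly one of the two properties, which is the tradeoff described above.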

Always-Safe Protocol Construction

For applications that must never confirm a conflicting chain (safety-critical):

Construction:

  1. Run a standard 49/49 SMR (State Machine Replication) protocol underneath (any BFT protocol with f < N/2 tolerance under synchrony)
  2. Wrap with a “gossip-then-wait-Δ” mechanism: after seeing a potential confirmation, wait Δ (the known network delay) before finalizing
  3. If during the wait period a conflicting proposal arrives, don’t finalize and escalate
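
The wait-and-check rule in steps 2 and 3 can be sketched as follows. This is an illustrative model only: `finalize_after_wait` and its parameters are hypothetical names, timestamps are plain numbers, and the underlying SMR protocol and the escalation path are left out.

```python
from typing import Iterable


def finalize_after_wait(
    candidate_seen_at: float,
    conflict_arrivals: Iterable[float],
    delta: float,
) -> bool:
    """Return True if the candidate may be finalized, False if it must be escalated.

    A candidate confirmation observed at `candidate_seen_at` is finalized only
    if no conflicting proposal has arrived by `candidate_seen_at + delta`.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    deadline = candidate_seen_at + delta
    # Any conflict known by the deadline (including earlier ones) blocks finalization.
    return not any(t <= deadline for t in conflict_arrivals)


if __name__ == "__main__":
    print(finalize_after_wait(10.0, [], delta=2.0))      # no conflict seen
    print(finalize_after_wait(10.0, [11.5], delta=2.0))  # conflict inside the wait
    print(finalize_after_wait(10.0, [12.5], delta=2.0))  # conflict after the wait
```

The sketch is conservative by design: a conflict that arrives before the candidate is also treated as blocking.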

Safety property: a conflicting finalization requires the adversary to have > 49% stake AND suppress honest validator messages for longer than Δ — which is impossible under synchrony.

Liveness property: if the adversary controls < 49% stake, the protocol always makes progress. Above that threshold, progress can stall, so liveness tolerance is capped at 49% adversarial stake.

Always-Live Protocol Construction

For applications that must always make progress (censorship resistance):

Construction:

  1. Run standard consensus for normal operation
  2. Maintain a side channel: “overdue unconfirmed transactions” list
  3. If a transaction remains unconfirmed for longer than T_overdue, inject it directly into the log regardless of what the consensus protocol decided

Liveness property: a transaction submitted to any honest node will be confirmed within T_overdue time, regardless of adversary behavior.

Safety property: the injected transactions are still validated (invalid transactions are rejected). But the ordering of injected transactions may not match consensus ordering → safety can fail if the adversary creates conflicting transaction sets.
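
A minimal sketch of the overdue-injection rule, under stated assumptions: `build_log`, `pending`, and `is_valid` are hypothetical names, transactions are plain strings, and injected transactions are appended in submission order. The injection order is an arbitrary choice made for this sketch; the source only says it may differ from consensus ordering.

```python
from typing import Callable, Dict, List


def build_log(
    consensus_log: List[str],
    pending: Dict[str, float],
    now: float,
    t_overdue: float,
    is_valid: Callable[[str], bool],
) -> List[str]:
    """Return the consensus log with overdue, valid, unconfirmed transactions appended.

    `pending` maps each submitted transaction to its submission time.
    """
    log = list(consensus_log)
    confirmed = set(log)
    # Collect transactions that are still unconfirmed and older than T_overdue.
    overdue = sorted(
        (submitted, tx)
        for tx, submitted in pending.items()
        if tx not in confirmed and now - submitted > t_overdue
    )
    for _, tx in overdue:
        if is_valid(tx):  # injected transactions are still validated
            log.append(tx)
    return log


if __name__ == "__main__":
    log = build_log(
        consensus_log=["a"],
        pending={"a": 0.0, "b": 1.0, "c": 9.0, "bad": 0.0},
        now=10.0,
        t_overdue=5.0,
        is_valid=lambda tx: tx != "bad",
    )
    print(log)
```

Two nodes that observe different pending sets would append different transactions here, which illustrates how ordering disagreements, and hence safety failures, can arise.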

Implications for Ethereum

FOCIL and Censorship Resistance

The always-live protocol construction is essentially a formal version of what FOCIL implements:

  • “Overdue unconfirmed transactions” ≈ transactions on inclusion lists
  • “Inject directly regardless of consensus” ≈ the FOCIL enforcement rule that blocks violating inclusion lists (ILs) are invalid
  • T_overdue ≈ 1 slot (FOCIL transactions that appeared in the previous IL must be included)

The theoretical result provides a rigorous foundation: an always-live design of this kind can keep guaranteeing liveness (inclusion) beyond 49% adversarial stake, at the cost of potential safety failure in those extreme adversarial scenarios.

Finality Gadgets (Casper FFG)

Ethereum’s finality gadget (Casper FFG) is an always-safe protocol applied on top of an always-live base layer (LMD-GHOST); together they form Gasper:

  • Base layer (LMD-GHOST): always-live, ensures blocks keep being produced
  • Finality layer (FFG): always-safe, only finalizes when a 2/3 supermajority agrees

The theory predicts: safety failure of the finality layer requires >1/3 equivocating validators; liveness failure of the base layer requires >50% adversarial stake. This matches Ethereum’s known security properties, validating the framework.
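
The 2/3 and 1/3 figures are linked by simple counting, sketched below with hypothetical helper names. `has_supermajority` checks the 2/3 threshold with integer arithmetic, and `min_overlap` gives the least stake that must appear in both of two voting sets; this is why two conflicting supermajorities imply that roughly a third of the stake voted on both sides.

```python
def has_supermajority(attesting_stake: int, total_stake: int) -> bool:
    """True if attesting stake is at least 2/3 of total stake (integer arithmetic)."""
    if total_stake <= 0 or not 0 <= attesting_stake <= total_stake:
        raise ValueError("require 0 <= attesting_stake <= total_stake and total_stake > 0")
    return 3 * attesting_stake >= 2 * total_stake


def min_overlap(stake_a: int, stake_b: int, total_stake: int) -> int:
    """Minimum stake that must be present in both voting sets (pigeonhole bound)."""
    return max(0, stake_a + stake_b - total_stake)


if __name__ == "__main__":
    print(has_supermajority(67, 100))
    print(has_supermajority(66, 100))
    # Two conflicting supermajorities of 67 out of 100 must share at least this much stake:
    print(min_overlap(67, 67, 100))
```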

Validator Concentration Risk

The “sleepy” model captures a real Ethereum risk: if a large validator pool goes offline (e.g., cloud provider outage), the adversary temporarily has higher effective stake among online validators. The always-safe protocol’s gossip-then-wait mechanism provides a defense: it detects the anomalous offline period and delays finalization.
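
A small worked example of the effect described above, assuming for illustration that all adversarial stake stays online while part of the honest stake goes offline. The function name and the 20%/80% split are hypothetical numbers chosen for the example, not figures from the talk.

```python
from fractions import Fraction


def effective_adversarial_fraction(adversarial_stake, honest_online_stake) -> Fraction:
    """Adversarial share of the stake that is currently online."""
    online = adversarial_stake + honest_online_stake
    if online <= 0:
        raise ValueError("no stake online")
    return Fraction(adversarial_stake) / online


if __name__ == "__main__":
    # Hypothetical scenario: adversary holds 20 units of stake, honest validators hold 80.
    print(effective_adversarial_fraction(20, 80))  # everyone online
    print(effective_adversarial_fraction(20, 40))  # half of the honest stake offline
```

In this hypothetical scenario the adversary's share of online stake rises from 1/5 to 1/3 without the adversary acquiring any additional stake.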

Open Questions

❓ What is the concrete Δ required for the always-safe protocol to work on Ethereum mainnet? (Current gossip delay is ~500ms-2s globally.)

❓ Can the always-live protocol’s “overdue injection” mechanism be formalized well enough for automated verification?

❓ How do the theoretical results change under adversarial network conditions (Δ is unknown or unbounded)?

Timeline

  • 2025-08-08 — Presented by Joachim Neu (a16z) at MEV-SBC 2025

See Also