Summary

Private Information Retrieval (PIR) allows a client to read data from a server without the server learning which item was accessed. Applied to Ethereum, PIR enables wallet nodes and light clients to query chain state (balances, storage slots, contract bytecode) without revealing to RPC providers which addresses or contracts they are monitoring. The 2026 sharded PIR design by Ali Atiia and Keewoo Lee matches each slice of Ethereum data to the most efficient PIR scheme, achieving practical privacy with manageable overhead.

Why Privacy Matters for Chain Queries

Standard Ethereum light clients query full nodes or RPC providers (Infura, Alchemy) for account balances, storage reads, and transaction receipts. These queries reveal:

  • Which addresses a user monitors (inferring wallet holdings)
  • Which contracts a user interacts with (inferring DeFi positions)
  • Timing of balance checks (inferring trading intent)

Even when transaction data is public, the pattern of queries can de-anonymize users more thoroughly than on-chain analysis alone.

PIR Taxonomy

Four categories of PIR scheme, differentiated by where state is held:

| Category | Client State | Server State | Representative Schemes | Best For |
|---|---|---|---|---|
| Stateless client / Stateful server | None | Full history | SealPIR, Spiral | Small, append-only DB |
| Stateless client / Stateless server | None | None (amortized) | YPIR, HintlessPIR | Large, infrequently updated DB |
| Stateful client / Download-hint | Hint (one-time download) | None | SimplePIR | Large DB, rare updates |
| Stateful client / Interactive-hint | Hint (online refresh) | None | Piano, Plinko | Large DB, frequent updates |

Hint: a compact summary of the database that allows sublinear communication. Download once, refresh incrementally.

Key Performance Parameters

  • Communication overhead: bytes exchanged per query beyond the plain data size
  • Server computation: per-query work on the server side (often dominates)
  • Client storage: size of hint maintained by client
  • Update cost: overhead to incorporate database changes

Sharded PIR Design

The key insight: Ethereum’s state is not a monolithic database. Different slices have radically different access patterns, sizes, and update frequencies. Matching each slice to the optimal PIR scheme reduces overall overhead significantly.

Ethereum Data Slices

| Slice | Size | Update Frequency | Churn Rate | Optimal Scheme |
|---|---|---|---|---|
| Contract bytecode | Small | None (immutable) | Zero | Stateless client / stateless server (YPIR, HintlessPIR) |
| Latest block headers | Small | Per-block | Full | Stateless client / stateful server (SealPIR, Spiral) |
| Hot state trie (active accounts/storage) | Large | Per-block | Low | Sidecar pattern (Piano/Plinko with hint refresh) |
| Historical state / archives | Very large | Append-only | None (immutable once finalized) | Sidecar pattern (SimplePIR with one-time hint) |

Sidecar Pattern

For large databases with per-block updates (hot state trie), the sidecar pattern separates:

  1. Sidecar: small, frequently updated index of new/changed entries
  2. Main DB: large, rarely changing base

Queries check the sidecar first (cheap, standard PIR); if not found, query the main DB (more expensive but infrequent). This dramatically reduces per-query cost for databases where most entries are unchanged across blocks.

Ideal for: the Ethereum state trie, with ~15M active accounts and ~1M changed per block. A client typically queries at most one account per block, yet the full database holds 15M entries.
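The lookup order described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `make_plain_lookup` and `sidecar_query` are hypothetical names, and a plain dictionary lookup stands in for a real PIR round trip so that only the control flow is shown.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a PIR round trip. A real client would send an
# encrypted query; a dict lookup is used here only to show the control flow.
PirLookup = Callable[[bytes], Optional[int]]


def make_plain_lookup(table: Dict[bytes, int]) -> PirLookup:
    """Return a lookup function over `table` (no privacy, illustration only)."""
    return lambda key: table.get(key)


def sidecar_query(key: bytes, sidecar: PirLookup, main_db: PirLookup) -> Optional[int]:
    """Check the small, frequently updated sidecar first, then the main DB."""
    value = sidecar(key)
    if value is not None:
        return value       # entry changed recently; the sidecar holds the latest value
    return main_db(key)    # unchanged entry; served from the large base


# Assumed toy data: bob's balance changed in a recent block, alice's did not.
main = make_plain_lookup({b"alice": 100, b"bob": 50})
side = make_plain_lookup({b"bob": 75})

print(sidecar_query(b"bob", side, main))    # 75
print(sidecar_query(b"alice", side, main))  # 100
print(sidecar_query(b"carol", side, main))  # None
```

Whether the fallback query itself leaks information (the server can observe that a second query was sent) depends on the deployment and is not addressed by this sketch.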

Privacy Equivalence Theorem

For sharded PIR to achieve the same privacy as monolithic PIR:

Construction: for a query to shard i, the client sends:

  • 1 genuine PIR query to shard i’s engine
  • Decoy PIR queries to all other shard engines (j ≠ i)

Privacy: from any single engine’s view, the client sent exactly one query — indistinguishable from a genuine query to that shard. The server cannot determine which shard contains the accessed item.

Cost: N PIR queries per access (N = number of shards). Because each shard uses an engine optimized for its data type, the aggregate cost can still be lower than a monolithic PIR over the full database.
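A minimal sketch of the construction above, assuming each shard is addressed by a numeric index. The function name `build_query_plan`, the shard names, and the shard sizes are hypothetical; in a real client each index would be encoded as a PIR query for that shard's engine, and the `is_genuine` flag would never leave the client.

```python
import secrets
from typing import Dict, List, Tuple


def build_query_plan(
    shard_sizes: Dict[str, int], target_shard: str, target_index: int
) -> List[Tuple[str, int, bool]]:
    """Return one (shard, index, is_genuine) entry per shard.

    The target shard gets the genuine index; every other shard gets a
    uniformly random decoy index. `is_genuine` is client-side bookkeeping.
    """
    if target_shard not in shard_sizes:
        raise KeyError(f"unknown shard: {target_shard}")
    plan = []
    for shard, size in shard_sizes.items():
        if shard == target_shard:
            plan.append((shard, target_index, True))
        else:
            plan.append((shard, secrets.randbelow(size), False))
    return plan


# Assumed shard layout, for illustration only.
sizes = {"bytecode": 1_000, "headers": 256, "hot_state": 15_000_000}
plan = build_query_plan(sizes, "hot_state", 42)

# Every shard receives exactly one query, so N = len(sizes) queries per access.
for shard, index, is_genuine in plan:
    print(shard, index, "genuine" if is_genuine else "decoy")
```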

Ethereum-Specific Optimizations

Merkle Proof PIR

A user querying account balance often also needs a Merkle proof (for eth_getProof). Standard PIR returns a single leaf; proving the path requires per-level PIR queries.

MPT (Merkle Patricia Trie): 9 levels on average for a state trie of size 2²⁵. Retrieving a proof with one PIR query per level carries ~48× total storage overhead.

UBT (Unified Binary Tree): recent EIP proposals replace the MPT with a binary tree. Per-level PIR overhead is lower, at ~9× total overhead for proof retrieval, so UBT simplifies PIR significantly.

SNARK-ified Archival State

Historical state (e.g., “what was account X’s balance at block 15M?”) requires archival nodes. PIR over the full archive is expensive. Alternative: the archival node generates a SNARK proof of the historical value, and the client verifies the proof rather than accessing the archive directly. Privacy: the prover still learns which historical value was queried, so the query is hidden from everyone except the prover; the proof itself reveals nothing about why the value was requested.

DEPIR (Doubly Efficient PIR)

Theoretically achieves sublinear server computation by distributing the database across multiple non-colluding servers. Each server holds an encoded share; the client combines responses.

Practical limitation: current DEPIR constructions are concretely impractical — the encoding overhead exceeds the savings from sublinear computation for Ethereum-scale databases. Listed as a research direction, not a deployed solution.

Decision Tree for PIR Scheme Selection

Is the database small (<1GB)?
├── Yes → Is it frequently updated?
│   ├── Yes → Stateless client / stateful server (SealPIR, Spiral)
│   └── No  → Stateless client / stateless server (YPIR, HintlessPIR)
└── No  → Is the churn rate low (<1% per update)?
    ├── Yes → Sidecar pattern
    │   ├── Append-only → SimplePIR (one-time hint download)
    │   └── Updates → Piano / Plinko (interactive hint refresh)
    └── No  → Multiple databases needed; re-apply decision tree per slice
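The decision tree above can be written as a small function. This is a sketch that mirrors the branches as stated; the name `select_scheme` and its parameters are hypothetical, and the example inputs are assumed values chosen only to exercise each branch.

```python
def select_scheme(
    size_gb: float,
    frequently_updated: bool,
    churn_rate: float = 0.0,
    append_only: bool = False,
) -> str:
    """Map a database slice to a PIR scheme family, following the tree above.

    churn_rate is the fraction of entries changed per update (0.01 = 1%).
    """
    if size_gb < 1.0:  # small database
        if frequently_updated:
            return "Stateless client / stateful server (SealPIR, Spiral)"
        return "Stateless client / stateless server (YPIR, HintlessPIR)"
    if churn_rate < 0.01:  # large database, low churn: sidecar pattern
        if append_only:
            return "Sidecar pattern: SimplePIR (one-time hint download)"
        return "Sidecar pattern: Piano / Plinko (interactive hint refresh)"
    return "Multiple databases needed; re-apply decision tree per slice"


# Assumed example slices (sizes and churn rates are illustrative, not measured).
print(select_scheme(0.5, frequently_updated=True))
print(select_scheme(0.5, frequently_updated=False))
print(select_scheme(200.0, frequently_updated=True, churn_rate=0.001))
print(select_scheme(5000.0, frequently_updated=False, append_only=True))
print(select_scheme(200.0, frequently_updated=True, churn_rate=0.2))
```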

Practical Overhead

Current estimates for Ethereum state PIR (approximate, scheme-dependent):

  • Single balance query (SimplePIR with downloaded hint): ~1-2 KB communication, <1ms server compute
  • Full account proof (per-level PIR, MPT): ~200-400 KB total (9 levels × ~25 KB per level)
  • Full account proof (per-level PIR, UBT): ~50-100 KB total (9 levels × ~6 KB per level)
  • One-time hint download (SimplePIR, hot state): ~500 MB (once; amortized over many queries)
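The per-level arithmetic behind these estimates can be checked with a few lines. The per-level and hint figures are taken from the list above; the helper names, the 1 MB = 1000 KB conversion, and the query count of 10,000 are assumptions made for illustration.

```python
def proof_size_kb(levels: int, per_level_kb: float) -> float:
    """Total communication for a proof fetched with one PIR query per level."""
    return levels * per_level_kb


def amortized_hint_kb(hint_mb: float, num_queries: int) -> float:
    """Hint download cost spread evenly over num_queries (1 MB = 1000 KB)."""
    return hint_mb * 1000 / num_queries


LEVELS = 9  # average trie depth quoted above

print(proof_size_kb(LEVELS, 25))       # 225 (MPT, within the ~200-400 KB range)
print(proof_size_kb(LEVELS, 6))        # 54  (UBT, within the ~50-100 KB range)
print(amortized_hint_kb(500, 10_000))  # 50.0 KB of hint per query at 10,000 queries
```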

Integration with Ethereum Clients

PIR for Ethereum requires:

  1. State availability: full nodes or dedicated PIR servers expose state databases in PIR-queryable format
  2. Protocol support: light client protocols (Portal Network, snap/2) need PIR-compatible query interfaces
  3. Client implementation: wallets replace direct eth_getBalance calls with PIR queries to non-colluding servers
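To make the idea of querying non-colluding servers concrete, here is a toy version of the classic two-server XOR-based PIR. It is not one of the schemes named in this document and is far from production-ready; the function names and the balance table are hypothetical. Each server sees only a uniformly random bit vector, so neither learns the index unless the two collude.

```python
import secrets
from typing import List, Tuple


def make_queries(n: int, index: int) -> Tuple[List[int], List[int]]:
    """Build two bit vectors that differ only at `index`.

    q1 is uniformly random; q2 is q1 with one bit flipped. Each vector
    alone is uniformly distributed and reveals nothing about `index`.
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[index] ^= 1
    return q1, q2


def answer(db: List[int], query: List[int]) -> int:
    """Server side: XOR together the records selected by the query bits."""
    acc = 0
    for record, bit in zip(db, query):
        if bit:
            acc ^= record
    return acc


def reconstruct(a1: int, a2: int) -> int:
    """Client side: all shared records cancel, leaving db[index]."""
    return a1 ^ a2


# Assumed toy balance table, replicated on two non-colluding servers.
balances = [100, 50, 75, 0, 12]
q1, q2 = make_queries(len(balances), 2)
result = reconstruct(answer(balances, q1), answer(balances, q2))
print(result)  # 75
```

Note that each server still scans the whole table per query, which matches the earlier observation that server computation often dominates.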

Currently research/prototype stage. No production wallet implements PIR. The Portal Network’s sharded history design is architecturally compatible with sharded PIR.

Open Questions

❓ What is the concrete overhead of Piano/Plinko for Ethereum’s hot state trie at 15M account scale?

❓ Can a single PIR server suffice, or does multi-server PIR require non-collusion assumptions that are hard to guarantee?

❓ How does UBT adoption (if EIP accepted) change PIR feasibility for Merkle proof queries?

❓ What is the minimum server trust assumption? Most schemes assume honest-but-curious servers; can PIR protect against malicious servers?

Timeline

  • 2026-03-30 — Sharded PIR for Ethereum State paper published by Ali Atiia and Keewoo Lee

See Also