Private Information Retrieval (PIR) for Ethereum

Summary

Private Information Retrieval (PIR) allows a client to read data from a server without the server learning which item was accessed. Applied to Ethereum, PIR enables wallets and light clients to query chain state (balances, storage slots, contract bytecode) without revealing to RPC providers which addresses or contracts they are monitoring. The 2026 sharded PIR design by Ali Atiia and Keewoo Lee matches each slice of Ethereum data to the most efficient PIR scheme, achieving practical privacy with manageable overhead.

Why Privacy Matters for Chain Queries

Standard Ethereum light clients query full nodes or RPC providers (Infura, Alchemy) for account balances, storage reads, and transaction receipts. These queries reveal:

Which addresses a user monitors (inferring wallet holdings)
Which contracts a user interacts with (inferring DeFi positions)
Timing of balance checks (inferring trading intent)

Even when transaction data is public, the pattern of queries can de-anonymize users beyond what on-chain analysis alone reveals, for example by linking several addresses that the same client checks together.

PIR Taxonomy

Four categories of PIR scheme, differentiated by where state is held:

Category	Client State	Server State	Representative Schemes	Best For
Stateless client / Stateful server	None	Full history	SealPIR, Spiral	Small, frequently updated DB
Stateless client / Stateless server	None	None (amortized)	YPIR, HintlessPIR	Large, infrequently updated DB
Stateful client / Download-hint	Hint (one-time download)	None	SimplePIR	Large DB, rare updates
Stateful client / Interactive-hint	Hint (online refresh)	None	Piano, Plinko	Large DB, frequent updates

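Hint: a compact, query-independent summary of the database, obtained ahead of time, that lets each online query use less communication or server computation than it otherwise would. Download it once, then refresh it incrementally as the database changes.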

Key Performance Parameters

Communication overhead: bytes exchanged per query beyond the plain data size
Server computation: per-query work on the server side (often dominates)
Client storage: size of hint maintained by client
Update cost: overhead to incorporate database changes

Sharded PIR Design

The key insight: Ethereum’s state is not a monolithic database. Different slices have radically different access patterns, sizes, and update frequencies. Matching each slice to the optimal PIR scheme reduces overall overhead significantly.

Ethereum Data Slices

Slice	Size	Update Frequency	Churn Rate	Optimal Scheme
Contract bytecode	Small	None (immutable)	Zero	Stateless client / stateless server (YPIR, HintlessPIR)
Latest block headers	Small	Per-block	Full	Stateless client / stateful server (SealPIR, Spiral)
Hot state trie (active accounts/storage)	Large	Per-block	Low	Sidecar pattern (Piano/Plinko with hint refresh)
Historical state / archives	Very large	Append-only	None (immutable once finalized)	Sidecar pattern (SimplePIR with one-time hint)

Sidecar Pattern

For large databases with per-block updates (hot state trie), the sidecar pattern separates:

Sidecar: small, frequently updated index of new/changed entries
Main DB: large, rarely changing base

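Every lookup queries both databases: the sidecar (small, so cheap to serve with standard PIR) and the main DB. The client uses the sidecar's answer when it holds the entry and the main DB's answer otherwise; querying the main DB only after a sidecar miss would reveal to the server whether the entry changed recently. The saving is on the update side: per-block changes touch only the sidecar, so the main DB and its hint need refreshing only when the sidecar is periodically merged back. This dramatically reduces cost for databases where most entries are unchanged across blocks.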

Ideal for: Ethereum's state trie, with ~15M active accounts of which only a small fraction change in any given block. A client typically queries only a handful of accounts per block, yet every query must range over all ~15M entries.
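A toy model of the sidecar access pattern, assuming a hypothetical ToyServer whose pir_read_* methods stand in for real PIR queries: they log only which database was touched, which is what PIR leaks. The sidecar is a small hashed table of recently changed entries; SIDECAR_SLOTS and the no-collision assumption are illustrative simplifications. The sketch queries both databases on every lookup, because querying the main DB only after a sidecar miss would tell the server whether the entry changed recently.

```python
from typing import Dict, List, Optional, Tuple

SIDECAR_SLOTS = 8  # hypothetical sidecar capacity (toy value)

class ToyServer:
    """Stand-in for the PIR servers. It records which database was queried
    but not the index, mirroring what a real PIR protocol reveals."""

    def __init__(self, main: List[int], changes: Dict[int, int]):
        self.main = main
        self.sidecar: List[Optional[Tuple[int, int]]] = [None] * SIDECAR_SLOTS
        for key, value in changes.items():
            slot = key % SIDECAR_SLOTS
            # Simplification: a real sidecar would need collision handling.
            assert self.sidecar[slot] is None, "toy sidecar assumes no collisions"
            self.sidecar[slot] = (key, value)
        self.transcript: List[str] = []

    def pir_read_sidecar(self, slot: int) -> Optional[Tuple[int, int]]:
        self.transcript.append("sidecar")
        return self.sidecar[slot]

    def pir_read_main(self, index: int) -> int:
        self.transcript.append("main")
        return self.main[index]

    def merge(self) -> None:
        """Periodic compaction: fold sidecar entries into the main DB.
        In a hint-based scheme this is when the main DB's hint is refreshed."""
        for entry in self.sidecar:
            if entry is not None:
                key, value = entry
                self.main[key] = value
        self.sidecar = [None] * SIDECAR_SLOTS

def lookup(server: ToyServer, key: int) -> int:
    # Query BOTH databases on every lookup so the access pattern is the
    # same whether or not `key` changed recently.
    hit = server.pir_read_sidecar(key % SIDECAR_SLOTS)
    base = server.pir_read_main(key)
    if hit is not None and hit[0] == key:
        return hit[1]  # the recent value from the sidecar takes precedence
    return base
```

Whether a key is a sidecar hit or a miss, the server's transcript is the same pair of queries.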

Privacy Equivalence Theorem

For sharded PIR to achieve the same privacy as monolithic PIR:

Construction: for a query to shard i, the client sends:

1 genuine PIR query to shard i’s engine
Decoy PIR queries to all other shard engines (j ≠ i)

Privacy: from any single engine’s view, the client sent exactly one query — indistinguishable from a genuine query to that shard. The server cannot determine which shard contains the accessed item.

Cost: N PIR queries per access (N = number of shards). But since shards use different engines optimized for their data type, the aggregate cost is lower than a monolithic PIR over the full database.
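The decoy construction can be sketched as follows. The function name sharded_queries and the shard names are hypothetical; in a real deployment each (shard, index) pair would be encoded by that shard's own PIR engine, so the plaintext indices below are exactly what each engine is prevented from seeing.

```python
import random
from typing import Dict, List, Tuple

def sharded_queries(shard_sizes: Dict[str, int], target_shard: str,
                    target_index: int, rng: random.Random) -> List[Tuple[str, int]]:
    """Return exactly one (shard, index) pair per shard engine.

    The target shard gets the genuine index; every other shard gets a
    uniformly random decoy index. Each engine therefore receives one query
    regardless of which shard actually holds the item."""
    if target_shard not in shard_sizes:
        raise KeyError(target_shard)
    queries = []
    for shard, size in shard_sizes.items():
        index = target_index if shard == target_shard else rng.randrange(size)
        queries.append((shard, index))
    return queries
```

The total cost of one access is the sum of one query per engine, which is why the per-shard engines must be cheap for the aggregate to beat a single monolithic PIR.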

Ethereum-Specific Optimizations

Merkle Proof PIR

A user querying an account balance often also needs a Merkle proof (as returned by eth_getProof). A single PIR query returns a single entry (the leaf); retrieving the proof path requires one PIR query per trie level.

MPT (Merkle Patricia Trie): about 9 levels on average for a state trie with 2²⁵ entries. With per-level PIR, proof retrieval carries roughly 48× total storage overhead.

UBT (Unified Binary Tree): recent EIPs propose replacing the MPT with a binary tree. Per-level PIR is cheaper for binary nodes, giving roughly 9× total overhead for proof retrieval, so UBT simplifies PIR significantly.
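A sketch of per-level proof retrieval over a balanced binary Merkle tree, assuming each tree level is stored as its own database and pir_read is a placeholder for a real PIR query. SHA-256 and the balanced layout are stand-ins chosen for brevity; neither the MPT nor the UBT proposal is laid out exactly this way. It shows that the sibling position at every level follows from the leaf index alone, so the client can issue one query per level (in parallel) and verify the path against the root.

```python
import hashlib
from typing import List

def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """levels[0] = leaves, levels[-1] = [root]. Each level is a separate
    database that a PIR server could serve, indexed by node position."""
    assert leaves and len(leaves) & (len(leaves) - 1) == 0, "toy: power-of-two leaves"
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def pir_read(level_db: List[bytes], index: int) -> bytes:
    # Placeholder for a real PIR query against one level's database.
    return level_db[index]

def fetch_proof(levels: List[List[bytes]], leaf_index: int) -> List[bytes]:
    """One PIR query per level, fetching the sibling of the path node."""
    proof, idx = [], leaf_index
    for level in levels[:-1]:
        proof.append(pir_read(level, idx ^ 1))  # sibling position
        idx >>= 1
    return proof

def verify(root: bytes, leaf: bytes, leaf_index: int, proof: List[bytes]) -> bool:
    node, idx = leaf, leaf_index
    for sibling in proof:
        node = h(node, sibling) if idx % 2 == 0 else h(sibling, node)
        idx >>= 1
    return node == root
```

In a hexary MPT each proof level carries a branch node with up to 16 child hashes rather than a single sibling hash, which is one reason per-level PIR costs more there.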

SNARK-ified Archival State

Historical state (e.g., “what was account X’s balance at block 15M?”) requires archival nodes, and PIR over the full archive is expensive. Alternative: the archival node generates a SNARK proof of the historical value, and the client verifies the proof rather than accessing the archive directly. Privacy caveat: the prover still learns which historical value was queried, although the proof reveals nothing about why it was requested.

DEPIR (Doubly Efficient PIR)

Theoretically achieves sublinear server computation without client state: the server preprocesses the database once into a large encoded form, then answers each encrypted query by reading only a small part of that encoding, and the client decodes the response.

Practical limitation: current DEPIR constructions are concretely impractical — the encoding overhead exceeds the savings from sublinear computation for Ethereum-scale databases. Listed as a research direction, not a deployed solution.

Decision Tree for PIR Scheme Selection

Is the database small (<1GB)?
├── Yes → Is it frequently updated?
│   ├── Yes → Stateless client / stateful server (SealPIR, Spiral)
│   └── No  → Stateless client / stateless server (YPIR, HintlessPIR)
└── No  → Is the churn rate low (<1% per update)?
    ├── Yes → Sidecar pattern
    │   ├── Append-only → SimplePIR (one-time hint download)
    │   └── Updates → Piano / Plinko (interactive hint refresh)
    └── No  → Multiple databases needed; re-apply decision tree per slice
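The decision tree translates directly into a function. The thresholds (1 GB, 1% churn) come from the tree; the Slice fields and the returned labels are hypothetical names for this sketch, and any sizes passed in are caller-supplied inputs, not measurements.

```python
from dataclasses import dataclass

GB = 10**9

@dataclass
class Slice:
    name: str
    size_bytes: int
    frequently_updated: bool
    churn_rate: float   # fraction of entries changed per update
    append_only: bool

def choose_scheme(s: Slice) -> str:
    """Walk the decision tree above for one data slice."""
    if s.size_bytes < 1 * GB:
        if s.frequently_updated:
            return "stateless client / stateful server (SealPIR, Spiral)"
        return "stateless client / stateless server (YPIR, HintlessPIR)"
    if s.churn_rate < 0.01:
        if s.append_only:
            return "sidecar + SimplePIR (one-time hint download)"
        return "sidecar + Piano / Plinko (interactive hint refresh)"
    return "multiple databases needed; re-apply the tree per slice"
```

For instance, choose_scheme(Slice("hot state", 50 * GB, True, 0.001, False)) walks the large, low-churn, non-append-only branch and returns the Piano / Plinko label.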

Practical Overhead

Current estimates for Ethereum state PIR (approximate, scheme-dependent):

Single balance query (SimplePIR with downloaded hint): ~1-2 KB communication, <1ms server compute
Full account proof (per-level PIR, MPT): ~200-400 KB total (9 levels × ~25 KB per level)
Full account proof (per-level PIR, UBT): ~50-100 KB total (9 levels × ~6 KB per level)
One-time hint download (SimplePIR, hot state): ~500 MB (once; amortized over many queries)
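These figures follow simple arithmetic, sketched here with the estimates above as inputs: a per-level proof costs levels × per-level size, and a one-time hint amortizes as hint size divided by the number of queries it serves. The function names are illustrative.

```python
KB, MB = 1_000, 1_000_000

def proof_bytes(levels: int, per_level_bytes: int) -> int:
    """Per-level PIR: one query/response exchange per trie level."""
    return levels * per_level_bytes

def amortized_per_query(hint_bytes: int, online_bytes: int, n_queries: int) -> float:
    """Spread a one-time hint download over n_queries online queries."""
    return hint_bytes / n_queries + online_bytes

print(proof_bytes(9, 25 * KB))                            # MPT estimate
print(proof_bytes(9, 6 * KB))                             # UBT estimate
print(amortized_per_query(500 * MB, 2 * KB, 1_000_000))   # long-lived client
```

With a 500 MB hint and ~2 KB per online query, one million queries amortize to about 2.5 KB each, while a thousand queries still cost about 502 KB each, so the download-hint approach pays off mainly for long-lived clients.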

Integration with Ethereum Clients

PIR for Ethereum requires:

State availability: full nodes or dedicated PIR servers expose state databases in PIR-queryable format
Protocol support: light client protocols (Portal Network, snap/2) need PIR-compatible query interfaces
Client implementation: wallets replace direct eth_getBalance calls with PIR queries (to a single server, or to non-colluding servers for multi-server schemes)

PIR for Ethereum is currently at the research/prototype stage, and no production wallet implements it. The Portal Network’s sharded history design is architecturally compatible with sharded PIR.

Open Questions

❓ What is the concrete overhead of Piano/Plinko for Ethereum’s hot state trie at 15M account scale?

❓ Is single-server PIR sufficient, or is multi-server PIR needed despite relying on non-collusion assumptions that are hard to guarantee?

❓ How does UBT adoption (if the EIP is accepted) change PIR feasibility for Merkle proof queries?

❓ What is the minimum server trust assumption? Most schemes assume honest-but-curious servers; can PIR protect against malicious servers?

Timeline

2026-03-30 — Sharded PIR for Ethereum State paper published by Ali Atiia and Keewoo Lee