Citation

Seoev, A., Belousov, D., Smirnova, A., Kurinova, K., Smirnov, A., Fedyanin, D., Yanovich, Y. “The Origins of MEV: Systematic Attribution of Arbitrage Opportunity Creation at Scale.” arXiv:2604.27979v1 [cs.DC] (Apr 30, 2026). MEV-X / MIPT / HSE / Skoltech. Targeting SIGCOMM ’26.

Core Question

Most MEV research focuses on extraction — how searchers identify and capture value. This paper inverts the question: which transactions create the opportunities that searchers then capture? Without attribution, we cannot answer:

  • Which protocols generate the most arbitrage opportunities?
  • Which users unintentionally create value for extractors?
  • How is MEV creation distributed across the supply chain?

Central Hypothesis

In competitive MEV markets, arbitrage opportunities are predominantly created by single transactions, not multi-transaction sequences. The reasoning: searchers extract as soon as they detect an opportunity rather than waiting for a hypothetically better state that later transactions might compose, so an opportunity is typically closed right after the one transaction that opened it.
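The mechanism behind the hypothesis can be made concrete with two constant-product (x·y = k) pools quoting the same pair. In this minimal sketch, the pool sizes, the 0.3% fee, and the Uniswap-V2-style fee-on-input rule are illustrative assumptions, not the paper's setup. One swap is enough to push the two prices apart by more than the round-trip fees, which is the kind of single-source opportunity a searcher would close immediately. The closed-form price update is also the kind of bonding-curve model the coefficient-based method relies on to estimate price impact without replay.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    """Constant-product (x * y = k) pool for one token pair X/Y (illustrative)."""
    x: float              # reserve of X
    y: float              # reserve of Y
    fee: float = 0.003    # assumed 0.3% fee charged on the input amount

    def price(self) -> float:
        """Marginal price of X in units of Y, before fees."""
        return self.y / self.x

    def swap_x_for_y(self, dx: float) -> float:
        """Sell dx of X into the pool and return the amount of Y paid out."""
        dx_after_fee = dx * (1 - self.fee)
        dy = self.y * dx_after_fee / (self.x + dx_after_fee)
        self.x += dx      # the fee stays in the pool, so the full dx is added
        self.y -= dy
        return dy

def arbitrage_open(a: Pool, b: Pool) -> bool:
    """True if buying X in the cheaper pool and selling it in the dearer one
    is profitable at the margin once both input fees are paid."""
    cheap, dear = sorted((a, b), key=Pool.price)
    return dear.price() * (1 - dear.fee) > cheap.price() / (1 - cheap.fee)

# Two pools start with the same price: no opportunity.
a = Pool(x=1_000.0, y=2_000_000.0)
b = Pool(x=500.0, y=1_000_000.0)
print(arbitrage_open(a, b))   # False
# A single user swap into pool A moves its price on its own.
a.swap_x_for_y(20.0)
print(arbitrage_open(a, b))   # True
```

The condition compares marginal prices only; the profit-maximizing trade size would need a separate optimization, which is omitted here.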

Empirical validation: 96.7% of atomic arbitrage opportunities can be attributed to a single source transaction.

Four Attribution Methods

| Method | Approach | Trade-offs |
|---|---|---|
| Bot-data-driven | Uses searcher bidding behavior as a ground-truth proxy | Most accurate when data is available; depends on observable bidding |
| Simulation-based | Replays transactions and isolates the state perturbation introduced by each candidate source | Most rigorous; computationally heaviest; requires an archive node |
| Coefficient-based / amplification | Applies analytical models (e.g., AMM bonding curves) to estimate price impact | Cheapest; analytical approximations may miss compositional effects |
| Shapley-based | Cooperative game theory: distributes MEV value across all candidate sources by Shapley value | Most principled; expensive (combinatorial); fair across multi-source scenarios |
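To make the Shapley-based row concrete: treat the candidate sources as players and let v(S) be the arbitrage profit available when only the sources in S are replayed. This sketch computes exact Shapley values by enumerating coalitions. The lookup-table value function is a hypothetical stand-in for block replays, and none of this is the paper's implementation. For n sources the enumeration needs on the order of n·2^(n−1) evaluations of v, which is the combinatorial cost noted in the table.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence

def shapley_attribution(sources: Sequence[str],
                        value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of each candidate source.

    phi_i = sum over S not containing i of
            |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
    """
    n = len(sources)
    phi = {src: 0.0 for src in sources}
    for i in sources:
        others = [src for src in sources if src != i]
        for k in range(n):                              # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                members = frozenset(coalition)
                # Weighted marginal contribution of i to this coalition.
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi

# Hypothetical value tables: profit available when only the keyed sources are replayed.
single_source = {frozenset(): 0.0, frozenset({"A"}): 10.0,
                 frozenset({"B"}): 0.0, frozenset({"A", "B"}): 10.0}
joint_source = {frozenset(): 0.0, frozenset({"A"}): 0.0,
                frozenset({"B"}): 0.0, frozenset({"A", "B"}): 10.0}

print(shapley_attribution(["A", "B"], single_source.__getitem__))  # {'A': 10.0, 'B': 0.0}
print(shapley_attribution(["A", "B"], joint_source.__getitem__))   # {'A': 5.0, 'B': 5.0}
```

In the single-source case Shapley assigns all credit to one transaction, so it agrees with simpler methods; it only splits credit when sources are jointly necessary, which is where its extra cost pays off.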

Methodology Foundation

The attribution framework rests on the deterministic state-machine property of EVM-style chains: replaying an alternative transaction sequence yields a precisely defined alternative state. This makes counterfactual reasoning tractable, even though parallel execution (Solana) and sharding with cross-shard interactions (TON) raise additional concerns on other chains.
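The counterfactual step this property enables, and that the simulation-based method builds on, can be sketched with a toy deterministic state machine. Everything here is hypothetical: the state is two fee-less constant-product pools, transactions are pure functions, and a transaction's effect is measured leave-one-out, as the change in the final price gap when that transaction is omitted from the replay. A real implementation would replay EVM transactions against archive-node state.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, Tuple[float, float]]   # pool name -> (reserve_x, reserve_y)
Tx = Callable[[State], State]            # a transaction is a pure state transition

def swap(pool: str, dx: float) -> Tx:
    """Hypothetical fee-less constant-product swap selling dx of X into `pool`."""
    def apply(state: State) -> State:
        x, y = state[pool]
        new_state = dict(state)          # copy so replays never share mutations
        new_state[pool] = (x + dx, x * y / (x + dx))
        return new_state
    return apply

def noop(state: State) -> State:
    """Stands in for a transaction that touches neither pool (e.g. a transfer)."""
    return state

def replay(state: State, txs: List[Tx]) -> State:
    """Apply txs in order; the same start state and sequence give the same result."""
    for tx in txs:
        state = tx(state)
    return state

def price_gap(state: State) -> float:
    """Absolute difference between the Y/X prices of pools A and B."""
    (xa, ya), (xb, yb) = state["A"], state["B"]
    return abs(ya / xa - yb / xb)

def leave_one_out_effect(start: State, txs: List[Tx], idx: int) -> float:
    """Final price gap in the actual history minus the gap in the
    counterfactual history where txs[idx] is omitted."""
    actual = price_gap(replay(start, txs))
    counterfactual = price_gap(replay(start, txs[:idx] + txs[idx + 1:]))
    return actual - counterfactual

start = {"A": (100.0, 200.0), "B": (100.0, 200.0)}
block = [swap("A", 10.0), noop]
print(leave_one_out_effect(start, block, 0) > 0)    # True: the swap opened the gap
print(leave_one_out_effect(start, block, 1) == 0)   # True: the no-op contributed nothing
```

Leave-one-out is only one way to isolate a source's perturbation; it works cleanly in the single-source case and is exactly where the Shapley-based method differs when several sources interact.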

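For atomic arbitrage specifically (the paper’s focus), the framework adopts the formal criteria of Vostrikov et al.: at least two swaps, a non-negative net change (Δ ≥ 0) in every asset, and profit > 0 after fees and priority bids. Qualifying arbitrages are atomic (executed within a single transaction), fully on-chain, and involve no off-chain coordination.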
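The three conditions can be written as a predicate over a decoded transaction. This is a minimal sketch under stated assumptions: the `Swap` and `TxTrace` types are hypothetical, amounts are integer base units, and gas fees and the priority bid are assumed to be already converted into the profit token.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Swap:
    token_in: str
    amount_in: int      # integer base units (e.g. wei)
    token_out: str
    amount_out: int

@dataclass
class TxTrace:
    """Hypothetical decoded view of one transaction."""
    swaps: List[Swap]
    profit_token: str
    gas_fee: int        # assumed already converted into profit_token units
    priority_bid: int   # likewise

def net_deltas(swaps: List[Swap]) -> Dict[str, int]:
    """Net balance change per asset across all swaps in the transaction."""
    deltas: Dict[str, int] = {}
    for s in swaps:
        deltas[s.token_in] = deltas.get(s.token_in, 0) - s.amount_in
        deltas[s.token_out] = deltas.get(s.token_out, 0) + s.amount_out
    return deltas

def is_atomic_arbitrage(tx: TxTrace) -> bool:
    if len(tx.swaps) < 2:                          # (1) multi-swap
        return False
    deltas = net_deltas(tx.swaps)
    if any(d < 0 for d in deltas.values()):        # (2) per-asset delta >= 0
        return False
    profit = deltas.get(tx.profit_token, 0) - tx.gas_fee - tx.priority_bid
    return profit > 0                              # (3) profit after fees and bid

cycle = [Swap("WETH", 1_000, "USDC", 2_000_000), Swap("USDC", 2_000_000, "WETH", 1_010)]
print(is_atomic_arbitrage(TxTrace(cycle, "WETH", gas_fee=3, priority_bid=2)))      # True
print(is_atomic_arbitrage(TxTrace(cycle, "WETH", gas_fee=3, priority_bid=7)))      # False
print(is_atomic_arbitrage(TxTrace(cycle[:1], "WETH", gas_fee=0, priority_bid=0)))  # False
```

The second call shows why condition (3) matters: a gross 10-unit gain is not an arbitrage once fees and the priority bid consume all of it.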

Empirical Setup

  • Data: 1,050,000 Polygon blocks (March 2026)
  • Atomic arbitrage events identified using the formal three-condition test
  • Compares all four methods on the same dataset for accuracy/cost trade-offs
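Comparing methods on the same dataset requires some scoring rule. One plausible choice, assumed here rather than taken from the paper, treats bot-data attributions as reference labels (consistent with their role as a ground-truth proxy) and reports, for each other method, how many reference arbitrages it covers and how often it names the same source. The arbitrage and transaction ids are hypothetical strings.

```python
from typing import Dict, Tuple

def score_against_reference(predicted: Dict[str, str],
                            reference: Dict[str, str]) -> Tuple[float, float]:
    """Return (coverage, accuracy) of one method's attributions.

    coverage: share of reference arbitrages the method attributed at all.
    accuracy: among covered arbitrages, share where it names the same source tx.
    """
    if not reference:
        return 0.0, 0.0
    covered = [arb for arb in reference if arb in predicted]
    coverage = len(covered) / len(reference)
    if not covered:
        return coverage, 0.0
    hits = sum(predicted[arb] == reference[arb] for arb in covered)
    return coverage, hits / len(covered)

# Hypothetical labels: arbitrage id -> attributed source transaction id.
bot_reference = {"arb1": "tx9", "arb2": "tx4", "arb3": "tx7", "arb4": "tx2"}
simulation = {"arb1": "tx9", "arb2": "tx5", "arb3": "tx7"}
print(score_against_reference(simulation, bot_reference))
```

Reporting coverage separately matters because bot data exists only where bidding is observable, so a method can look accurate while silently skipping hard cases.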

Key Findings

  1. 96.7% of atomic arbitrage opportunities trace to a single source transaction — single-source hypothesis confirmed.
  2. MEV creation is highly concentrated — a small subset of protocols generates most opportunities (concrete protocol breakdown in the paper’s Section 5).
  3. Method comparison: bot data is the most accurate where bidding data exists; simulation is the rigorous default when bot data is unavailable; the coefficient-based (cheapest) and Shapley-based (most expensive) methods occupy different points on the cost-accuracy curve.
  4. Polygon-specific empirical work, but the formal framework is chain-agnostic for any deterministic-state-machine blockchain with replayable history.
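Findings 1 and 2 reduce to simple statistics over attribution output. This sketch assumes each arbitrage's attribution is a list of source transaction ids and that each source carries a protocol label. Top-k share and the Herfindahl-Hirschman index (HHI) are standard concentration measures chosen here for illustration; this summary does not say which measure the paper reports, and the data below is made up.

```python
from collections import Counter
from typing import Sequence

def single_source_share(attributions: Sequence[Sequence[str]]) -> float:
    """Fraction of arbitrages whose attribution names exactly one source tx."""
    if not attributions:
        return 0.0
    return sum(len(sources) == 1 for sources in attributions) / len(attributions)

def top_k_share(protocols: Sequence[str], k: int) -> float:
    """Share of attributed sources belonging to the k most frequent protocols."""
    counts = Counter(protocols)
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total if total else 0.0

def hhi(protocols: Sequence[str]) -> float:
    """Herfindahl-Hirschman index of protocol shares (1.0 = a single protocol)."""
    counts = Counter(protocols)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values()) if total else 0.0

attributions = [["tx1"], ["tx2"], ["tx3", "tx4"], ["tx5"]]   # hypothetical
protocols = ["dexA"] * 6 + ["dexB"] * 3 + ["lendC"]        # protocol of each source
print(single_source_share(attributions))   # 0.75
print(top_k_share(protocols, 1))           # 0.6
print(hhi(protocols))
```

HHI ranges from 1/N for N equally sized protocols up to 1.0, so it summarizes concentration in one number, while top-k share answers the more direct "how much comes from the biggest few" question.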

Why This Matters

  • For protocol designers: identifies which DEX, lending, and oracle patterns systematically leak MEV, so targeted protocol-level fixes (oracle-first ordering, priority-update registry, ACE patterns) can be evaluated as measurable improvements.
  • For validators / proposers: transaction-ordering decisions can take into account which sources are likely to create the most downstream MEV, as measured by Shapley value.
  • For analysts: ecosystem-health metrics can now include per-protocol MEV-creation footprint, not just per-searcher MEV-capture.
  • For research: closes a major gap in the MEV literature — moves the field from extraction-side to supply-chain-side analysis.


Open Questions

❓ Does the 96.7% single-source rate hold on Ethereum mainnet as it does on Polygon? (The paper’s empirics are Polygon-only.)

❓ Can attribution be extended in real-time to feed into ACE/oracle-first ordering decisions, not just retrospective analysis?

❓ What fraction of “MEV creation” comes from informed traders (intentional, fee-paying) versus uninformed (retail, victim) sources? The answer has distributional implications for “MEV is just a tax” framings.

See Also