Citation
Seoev, A., Belousov, D., Smirnova, A., Kurinova, K., Smirnov, A., Fedyanin, D., Yanovich, Y. “The Origins of MEV: Systematic Attribution of Arbitrage Opportunity Creation at Scale.” arXiv:2604.27979v1 [cs.DC] (Apr 30, 2026). MEV-X / MIPT / HSE / Skoltech. Targeting SIGCOMM ’26.
Core Question
Most MEV research focuses on extraction — how searchers identify and capture value. This paper inverts the question: which transactions create the opportunities that searchers then capture? Without attribution, we cannot answer:
- Which protocols generate the most arbitrage opportunities?
- Which users unintentionally create value for extractors?
- How is MEV-creation distributed across the supply chain?
Central Hypothesis
In competitive MEV markets, arbitrage opportunities are predominantly created by single transactions, not multi-transaction sequences. The reasoning: searchers extract as soon as they detect an opportunity rather than wait for a hypothetically better composite state to form.
Empirical validation: 96.7% of atomic arbitrage opportunities can be attributed to a single source transaction.
Four Attribution Methods
| Method | Approach | Tradeoffs |
|---|---|---|
| Bot-data-driven | Use searcher bidding behavior as ground-truth proxy | Most accurate when data available; depends on observable bidding |
| Simulation-based | Replay transactions, isolate state perturbations introduced by each candidate source | Most rigorous; computationally heaviest; requires archive node |
| Coefficient-based / amplification | Apply analytical models (e.g., AMM bonding curves) to estimate price impact | Cheapest; analytical approximations may miss compositional effects |
| Shapley-based | Cooperative game theory: distribute MEV value across all candidate sources by Shapley value | Most principled; expensive (combinatorial); fair across multi-source scenarios |
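
The Shapley row can be illustrated with a small worked example. The sketch below is not the paper's implementation. It computes exact Shapley values by averaging each candidate's marginal contribution over every ordering, which is where the combinatorial cost comes from. The transaction names `tx_a` and `tx_b` and the profit table are hypothetical values chosen only for illustration.

```python
from fractions import Fraction
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    Cost grows as n! in the number of players, so this is only viable
    for a handful of candidate source transactions.
    """
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}

# Hypothetical characteristic function: arbitrage profit (arbitrary units)
# that would exist if only this subset of candidate transactions had executed.
PROFIT = {
    frozenset(): 0,
    frozenset({"tx_a"}): 90,
    frozenset({"tx_b"}): 0,
    frozenset({"tx_a", "tx_b"}): 100,
}

def profit(coalition):
    return PROFIT[frozenset(coalition)]

phi = shapley_values(["tx_a", "tx_b"], profit)
print(phi)  # tx_a receives 95, tx_b receives 5; the shares sum to 100
```

In this toy case `tx_a` creates almost all of the opportunity on its own, so it receives almost all of the attributed value. This is the near-single-source situation the central hypothesis describes.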
Methodology Foundation
The attribution framework rests on the deterministic state machine property of EVM-style chains: replaying an alternative transaction sequence yields a precisely defined alternative state. This makes counterfactual reasoning tractable, although parallel execution (Solana, TON) and sharding (cross-shard interactions) complicate the picture.
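The counterfactual idea can be sketched on a toy state machine. The example below is not an EVM and is not the paper's simulator. It assumes two fee-free constant-product pools over the same token pair, and it uses a relative price gap as a stand-in for the opportunity size. It then replays the block with each transaction left out and records how much the gap shrinks. `Swap`, `price_gap`, and `leave_one_out` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Swap:
    pool: str    # pool the swap trades against ("A" or "B")
    x_in: float  # amount of token X sold into the pool

def apply_swap(state, tx):
    """Deterministic transition: constant-product swap of X for Y, no fee."""
    x, y = state[tx.pool]
    new_x = x + tx.x_in
    new_y = x * y / new_x  # keeps x * y constant
    new_state = dict(state)  # copy so the input state is never mutated
    new_state[tx.pool] = (new_x, new_y)
    return new_state

def replay(genesis, txs):
    """Apply the transactions in order, starting from the genesis state."""
    state = genesis
    for tx in txs:
        state = apply_swap(state, tx)
    return state

def price_gap(state):
    """Opportunity proxy (an assumption): relative gap in Y-per-X price."""
    pa = state["A"][1] / state["A"][0]
    pb = state["B"][1] / state["B"][0]
    return abs(pa - pb) / min(pa, pb)

def leave_one_out(genesis, txs, metric):
    """For each tx, how much of the metric disappears if that tx is removed."""
    full = metric(replay(genesis, txs))
    return [
        full - metric(replay(genesis, txs[:i] + txs[i + 1:]))
        for i in range(len(txs))
    ]

genesis = {"A": (1000.0, 1000.0), "B": (1000.0, 1000.0)}
block = [Swap("A", 100.0), Swap("B", 1.0)]
deltas = leave_one_out(genesis, block, price_gap)
source_index = max(range(len(block)), key=lambda i: deltas[i])
print(source_index, deltas)
```

Here the large swap against pool A accounts for essentially the whole gap, while the small swap against pool B slightly narrows it and so receives a negative contribution. On a real chain the same replay would need archive state, which is the cost noted in the methods table.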
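For atomic arbitrage, the paper’s focus, the formal criteria come from Vostrikov et al.: the event involves at least two swaps, the net balance change in every asset is non-negative (per-asset Δ ≥ 0), and profit is positive after fees and priority bids. The arbitrage is atomic and entirely on-chain, with no off-chain coordination.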
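The three conditions can be written as a small predicate. This is a minimal sketch under stated assumptions, not the definition from Vostrikov et al.: the swap tuple layout, the integer base-unit amounts, and measuring profit in a single `numeraire` token are all choices made here for illustration.

```python
from collections import defaultdict

def is_atomic_arbitrage(swaps, fees, priority_bid, numeraire):
    """Check the three conditions on the swaps of one candidate event.

    swaps: list of (token_in, amount_in, token_out, amount_out), integer units.
    fees, priority_bid: costs expressed in units of the numeraire token.
    """
    # Condition 1: at least two swaps.
    if len(swaps) < 2:
        return False

    # Net balance change per asset across all swaps.
    delta = defaultdict(int)
    for token_in, amount_in, token_out, amount_out in swaps:
        delta[token_in] -= amount_in
        delta[token_out] += amount_out

    # Condition 2: no asset ends with a negative net balance change.
    if any(d < 0 for d in delta.values()):
        return False

    # Condition 3: positive profit after fees and priority bids.
    # Assumption: only the numeraire gain is counted as profit.
    return delta[numeraire] - fees - priority_bid > 0

cycle = [("USDC", 1000, "WETH", 5), ("WETH", 5, "USDC", 1010)]
print(is_atomic_arbitrage(cycle, fees=2, priority_bid=3, numeraire="USDC"))  # True
print(is_atomic_arbitrage(cycle, fees=2, priority_bid=9, numeraire="USDC"))  # False
```

The second call fails only the profit condition: the cycle still gains 10 USDC, but the fee and the priority bid together cost 11.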
Empirical Setup
- Data: 1,050,000 Polygon blocks (March 2026)
- Atomic arbitrage events identified using the formal three-condition test
- Compares all four methods on the same dataset for accuracy/cost trade-offs
Key Findings
- 96.7% of atomic arbitrage opportunities trace to a single source transaction — single-source hypothesis confirmed.
- MEV creation is highly concentrated — a small subset of protocols generates most opportunities (concrete protocol breakdown in the paper’s Section 5).
- Method comparison: bot-data is most accurate where bidding data exists; simulation is the rigorous default when bot data is unavailable; the coefficient-based and Shapley-based methods occupy different cost-accuracy points.
- The empirical work is Polygon-specific, but the formal framework applies to any blockchain that is a deterministic state machine with replayable history.
Why This Matters
- For protocol designers: identifies which DEX/lending/oracle patterns systematically leak MEV. Targeted protocol-level fixes (oracle-first ordering, priority-update registry, ACE patterns) become measurable improvements.
- For validators / proposers: transaction ordering can be informed by which sources are attributed the most downstream MEV, for example by Shapley value.
- For analysts: ecosystem-health metrics can now include per-protocol MEV-creation footprint, not just per-searcher MEV-capture.
- For research: closes a major gap in the MEV literature — moves the field from extraction-side to supply-chain-side analysis.
Connection to Wiki
- Complements Arbitrage: CEX-DEX and AMM Arb and Exclusive Order Flow and the Builder Flywheel by explaining the origin side rather than the capture side.
- Validates the single-source / immediate-extraction assumption that Paper: Timing Games — Probabilistic Backrunning and Spam (Flashbots/Offchain Labs) and the probabilistic-backrunning literature (Mazorra et al.) implicitly rely on.
- The Shapley-based method dovetails with mechanism-design proposals like Boost+ (Paper: Boost+ — Equitable, Incentive-Compatible Block Building) that need to attribute revenue fairly.
Open Questions
❓ Does the 96.7% single-source rate hold on Ethereum mainnet as it does on Polygon? The paper’s empirics are Polygon-only.
❓ Can attribution run in real time to feed ACE/oracle-first ordering decisions, rather than serving only retrospective analysis?
❓ What fraction of “MEV creation” comes from informed traders (intentional, fee-paying) vs uninformed (retail, victim) sources? Distributional implications for “MEV is just a tax” framings.
See Also
- Arbitrage: CEX-DEX and AMM Arb — atomic arbitrage taxonomy and CEX-DEX arb scale
- Exclusive Order Flow and the Builder Flywheel — capture side of the same supply chain
- Paper: Boost+ — Equitable, Incentive-Compatible Block Building — a mechanism that needs fair revenue attribution
- Paper: Timing Games — Probabilistic Backrunning and Spam (Flashbots/Offchain Labs) — equilibrium model that assumes single-shot extraction