Written by Ron Rothblum
The overhead of proving over computing
A Succinct Non-Interactive Argument of Knowledge (SNARK) lets an untrusted party prove they did a computation correctly by handing over a small receipt that anyone can check in a fraction of the time it would take to redo the work themselves.
But proving comes at a cost: generating the proof is more expensive than simply doing the computation. There is no single yardstick to measure this cost, and measures won’t always agree on the details, but they agree on the verdict. The overhead is formidable.
The question, then, is how small does the proof actually have to be?
As Justin Thaler lays out, for state-of-the-art zkVMs, proving a computation can cost ten thousand to a million times more than simply performing it. As steep as that sounds, proving general computation was barely feasible at any speed only a few years ago. Thanks to leaps in engineering and research, the overhead has fallen steadily.
Flock is a continued step in reducing proving cost.
Flock: standard hashes at small overhead
Flock is a SNARK for proving batches of Boolean computations, developed together with Benedikt Bünz (Espresso, NYU) and William Wang (NYU).
Focusing on Boolean computations is already a departure from how most SNARKs work, which natively prove arithmetic computation over large prime fields. Flock operates directly over Boolean circuits — the bit-level AND and XOR operations that real computation and standard cryptography are made of. We specifically target the workloads that dominate in practice: large batches of standard cryptographic hash functions, such as Keccak, SHA-256, and BLAKE3.
On a single core of an Apple M4 Max, Flock proves standard hash functions at less than 250× the cost of computing them. This measures wall-clock time, which is a tricky yardstick that depends on the underlying hardware and a handful of measurement choices. First, the native baseline does not use Apple's dedicated SHA-256 and SHA-3 instructions. They accelerate only these specific functions, but would not apply to other Boolean computations. Second, the native hashes are run one at a time, which makes it harder to exploit SIMD.
The exact figure also varies by hash. Roughly 170x for SHA-256 and 245× for Keccak. Caveats and all, we find it striking that proving a standard hash costs only a couple hundred times more than computing it. Concretely, on one core Flock proves:
- 82,100 BLAKE3 compressions per second,
- 42,100 SHA-256 compressions per second,
- 30,700 Keccak-f[1600] permutations per second,
The implementation parallelizes cleanly: on ten cores, BLAKE3 throughput climbs past 660,000 compressions per second.
Flock is, to our knowledge, the fastest prover for each of these standard hashes:
- On SHA-256, about 8.4× faster than Binius64, the previous state of the art.
- On BLAKE3, about 14× faster than both Binius64 and Plonky3.
- On Keccak, about 1.8× faster than Hashcaster on a single core (around 4× multi-threaded).
- Prime-field and group-based systems are far behind: Flock is roughly 10–40× faster than Plonky3, and hundreds of times faster than Spartan.
The gap widens sharply against general-purpose zkVMs, which run four to six orders of magnitude over native execution. Flock gives up generality outright, proving batches of a fixed Boolean circuit instead of arbitrary programs. For many applications dominated by standard hashing, that is a tradeoff well worth making. Moreover, we believe the ideas behind Flock can carry over to zkVMs in time.
Why Boolean, why standard, why batch?
Strip almost any SNARK that runs in production down to where it spends its time, and the same thing turns up underneath it all: a cryptographic hash, evaluated over and over. Hashing is the workhorse of the systems people actually run, so the target that matters is proving batches of hashes, and proving them fast. That's what Flock is built for. From that goal, three design choices follow: the Boolean circuit model, standardized hash functions, and batched proving.
Why Boolean circuits. SNARK constructions rely on deep mathematical structure, and the operations native to them are, usually, arithmetic over a large prime field. Emulating non-native operations — e.g. anything bit-level — carries a steep tax. The standard cryptographic hashes were not designed for SNARK-friendliness. They are bit-shuffling primitives at heart, and forcing them through a prime-field SNARK is exactly the case where that tax is highest. Representing them as Boolean circuits, by contrast, is trivial. Better still, the same thing that makes a computation cheap to run — fewer bit operations — is what makes it cheap for Flock to prove. The effort already spent optimizing the native code carries straight through to the prover.
Why standard hashes, not "SNARK-friendly" ones. Much of the SNARK literature reaches instead for hash functions co-designed with the proof system — Poseidon, Rescue, MiMC — chosen precisely because they are cheap to prove. They bring two problems.
- Security: these designs are far less battle-tested than SHA-256, Keccak, or BLAKE3, and their heavy algebraic structure has repeatedly opened the door to attacks.
- Lock-in. A SNARK-friendly hash is cheap to prove only inside a proof system built over the one field it was designed for. It effectively enshrines that field. The efficiency gains disappear if the proof systems are swapped or compose over different fields, forcing the application designer into a false choice: optimize the hash for the prover and pay for it every time the hash runs natively, or optimize for native speed and pay for it in the prover.
Targeting a standard hash dissolves that choice. The hash is fixed by the application and runs identically whether or not a proof is ever produced, so the only question left is how cheaply it can be proved.
Why batch. The last choice is to prove a batch — many independent runs of the same circuit — rather than one arbitrary computation. This may sound like a restriction, but it is quite pragmatic a typical SNARK applications spend the vast majority of their time proving correctness of many hashes. Batching is also what keeps the prover fast and the system simple by sidestepping the expensive permutation and memory-checking machinery, and lets the verifier's work stay tied to a single instance, even as the batch grows into the millions.
Under the hood
Our starting point is the line of work on "proving as fast as computing" (RR24, RR25, HR22, ARR25), which showed that repeated Boolean computation can be proved nearly as fast as it is run. Flock relies on binary fields rather than the more popular prime-order fields used in most other projects. While binary fields were used beforehand in these theoretical works, their remarkable practical utility was first demonstrated by the Binius line of work (Binius, Binius64); in particular, Flock relies on their elegant ring-switching technique.
A fair amount of the speedup comes from very simple design choices: (1) binary fields, (2) the batch setting, and (3) using R1CS to reduce the witness size. These choices strip away much of what other proof-systems pay and focus the inner machinery onto a single bottleneck: a simple zerocheck protocol for checking the logical AND of bit vectors. Once the zerocheck is isolated as the one thing worth attacking, the returns are outsized. Continuing a line of work on optimizing the sumcheck protocol, Flock introduces new ideas that substantially speed up the bit-zerocheck protocol.
It is worth noting however that Flock is a research prototype and we do not recommend it for production use yet. There are a handful of important limitations, which are discussed in detail in the paper.
Hash-based signatures for post quantum
The throughput above can translate into real, application-level rates. The one we are most excited about is the post-quantum transition.
A large enough quantum computer would break the elliptic-curve signatures that secure essentially every blockchain today. The most conservative and most trusted replacements are hash-based signatures (Lamport, Winternitz, XMSS) schemes whose security rests on nothing beyond the hash function itself, and whose cost is almost entirely hashing. Verifying one is a few hundred hash evaluations; verifying and aggregating the thousands in a block is exactly the batched-hashing workload Flock is built for.
The Ethereum Foundation's Lean VM (leanvm) is being designed specifically to carry the post-quantum transition through SNARKs, with hashing as its dominant cost. A single signature in the multi-signature scheme proposed for Ethereum's quantum transition costs about 160 hash invocations (DKKW25). Instantiated with BLAKE3, Flock's ten-core throughput of 660,000 compressions per second is enough to prove the hashing for roughly 4,000 transactions per second — comfortably past the 200,000 hashes per second that Vitalik Buterin has estimated (Buterin) would suffice for a post-quantum Ethereum. Bitcoin, more conservative, will likely reach for SHA-256-based signatures; there, Flock can prove the hashing for 5,000 signatures — a full block — in under 3 seconds on consumer hardware. These figures count only the hashing inside signature verification — the dominant cost, but not the whole transaction.
What’s next?
A ~250× overhead over native execution is, we believe, a significant milestone. However it is now just a new starting point for additional innovation. We are already exploring ways to drive it down substantially.
Flock is a research prototype, and we don't recommend it for production use yet. However, the code is open source and the paper has the full protocol and benchmarks. If you are working on post-quantum signatures, verifiable delay functions, or anything dominated by standard hashing, please try it and help push the overhead lower.
Lastly, we are exploring how to fold Flock's core ideas into SP1, bringing the same speedups to a general-purpose zkVM where they accelerate succinct and zero-knowledge proving for arbitrary programs, not just batched hashing.
Read the full paper and review the code here.
References
- Thaler — Justin Thaler. The path to secure and efficient zkVMs: How to track progress. a16z crypto, 2025.
- RR24 — Noga Ron-Zewi and Ron Rothblum. Local Proofs Approaching the Witness Length. J. ACM, 2024.
- RR25 — Noga Ron-Zewi and Ron Rothblum. Proving as Fast as Computing: Succinct Arguments with Constant Prover Overhead. J. ACM, 2025.
- HR22 — Justin Holmgren and Ron D. Rothblum. Faster Sounder Succinct Arguments and IOPs. CRYPTO 2022.
- ARR25 — Noor Athamnah, Noga Ron-Zewi, and Ron D. Rothblum. Linear Prover IOPs in Log Star Rounds. TCC 2025.
- Binius — Benjamin E. Diamond and Jim Posen. Succinct Arguments over Towers of Binary Fields. EUROCRYPT 2025.
- Binius64 — Irreducible Team. Binius64. GitHub, 2025.
- leanvm — lean Ethereum. leanVM: A Minimal zkVM for Lean Ethereum. GitHub, 2025.
- DKKW25 — Justin Drake, Dmitry Khovratovich, Mikhail Kudinov, and Benedikt Wagner. Hash-Based Multi-Signatures for Post-Quantum Ethereum. Cryptology ePrint Archive 2025/055.
- Buterin — Vitalik Buterin. Possible Futures of the Ethereum Protocol, Part 4: The Verge. 2024.
- CFW26 — Alessandro Chiesa, Giacomo Fenzi, and Guy Weissenberg. Zero-Knowledge IOPPs for Constrained Interleaved Codes. Cryptology ePrint Archive 2026/391.
- DHRR26 — Rahul Dalal, Tamir Hemo, Eugene Rabinovich, and Ron D. Rothblum. VEIL: Lightweight Zero-Knowledge for Hash-Based Multilinear Proof Systems. Cryptology ePrint Archive 2026/683.