SP1 Turbo: the world’s fastest zkVM just got faster

SP1 Turbo (v4.0.0) is the latest upgrade to SP1 and offers significant cost and latency improvements. SP1 Turbo is a blazing fast zkVM with best-in-class performance for a variety of ZK workloads, including rollups (zkEVMs), light clients, signature verification, and other blockchain computations.

  • SP1 Turbo offers the fastest speeds & cheapest costs for a wide variety of ZK workloads.
  • SP1 Turbo has unrivaled latency when running across a cluster of parallelized GPUs, inching closer to real-time Ethereum proving.
  • This release introduces new precompiles, such as secp256r1 and RSA signature verification.
  • Get started with SP1 Turbo today by reading our docs.

SP1 Turbo Performance 

SP1 offers best-in-class performance across cost, latency, and throughput benchmarks on a wide variety of ZK workloads. This is possible thanks to hardcore performance engineering by our team, who spent months on CUDA optimizations for better GPU utilization. 

The graph below shows how SP1’s performance has progressed since its initial release in February, a “ZK Moore’s law” in action. Note that the points in this graph aren’t fully comparable: SP1’s initial release only supported CPU proving, whereas recent versions are benchmarked on GPU nodes, and performance varies by program. Nonetheless, the trend is one of exponential improvement.

Caption: SP1 performance over time [1].

Single GPU Performance Benchmarks

To demonstrate how performant SP1 Turbo is on a single GPU node, we compare its proving time on a set of benchmarking programs against SP1 V3.0.0. For consistency, we benchmark both versions on the same cloud GPU machine (AWS g6.xlarge). We measure the end-to-end proving times with recursion/constant proof sizes enabled.

| Program | SP1 Turbo E2E Proving Time | SP1 V3.0.0 Proving Time | Improvement |
| --- | --- | --- | --- |
| Loop-1M | 2.04 secs | 13.1 secs | 6.4x |
| Loop-100M | 95.7 secs | 239.7 secs | 2.5x |
| Fibonacci | 6.1 secs | 23.2 secs | 3.8x |
| Tendermint | 21.4 secs | 38.5 secs | 1.8x |
| Reth | 682.6 secs | 1120 secs [2] | 1.6x |
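The Improvement column in the table above is simply the V3.0.0 proving time divided by the Turbo proving time, rounded to one decimal place. A minimal sketch that recomputes it from the published times (the dictionary and function names are ours, chosen for illustration):

```python
# Proving times in seconds, copied from the table above: (SP1 Turbo, SP1 V3.0.0).
TIMES = {
    "Loop-1M": (2.04, 13.1),
    "Loop-100M": (95.7, 239.7),
    "Fibonacci": (6.1, 23.2),
    "Tendermint": (21.4, 38.5),
    "Reth": (682.6, 1120.0),
}


def improvement(turbo_secs: float, v3_secs: float) -> float:
    """Speedup of Turbo over V3.0.0, rounded to one decimal place."""
    return round(v3_secs / turbo_secs, 1)


for program, (turbo, v3) in TIMES.items():
    print(f"{program}: {improvement(turbo, v3)}x")
```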

SP1 Turbo consistently outperforms SP1 V3.0.0 across a wide range of programs and workload sizes, both with and without precompiles. Additionally, SP1 Turbo achieves proving costs as low as a few cents for typical Ethereum mainnet blocks.

A more comprehensive set of benchmarks is available on the SP1 Datasheet. The benchmarks are reproducible using our benchmarking repository.

Multi GPU Performance Benchmarks

Benchmarks run on a single GPU provide a standardized, reproducible environment for comparing zkVMs, and they are useful for extrapolating proving costs (by multiplying proving time by instance cost). However, they do not tell the whole story when it comes to latency, which is the end-to-end time required to prove a program.
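As a rough illustration of that cost extrapolation, the sketch below multiplies a proving time by an hourly instance price. The function name is ours, and the hourly rate is a placeholder rather than a real cloud price, so substitute the current rate for your instance type.

```python
def proving_cost_usd(proving_secs: float, hourly_rate_usd: float) -> float:
    """Estimate proving cost as (time the instance is busy) x (instance price)."""
    return proving_secs / 3600.0 * hourly_rate_usd


# PLACEHOLDER_RATE is an assumption for illustration, not an actual AWS price.
PLACEHOLDER_RATE = 1.00  # USD per hour

# 682.6 secs is the single-GPU Reth proving time from the table above.
print(f"${proving_cost_usd(682.6, PLACEHOLDER_RATE):.4f}")
```

This estimate assumes the instance is billed only for the time it spends proving; idle time would raise the effective cost.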

Parallelizing proof generation for a single program across many GPUs can reduce latency well below what a single machine achieves. Note that the proving costs stay relatively constant, as we use a larger number of machines, each for a shorter amount of time [3]. Parallelizing proof generation is natural for SP1: it already “shards” the proof of a long program into chunks of ~2 million cycles, generates an individual proof for each shard, and then combines them recursively into a single final STARK.
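The latency and cost trade-off can be sketched with a deliberately simplified model. It assumes every shard takes the same time to prove, that shards are handed out to GPUs in even rounds, and it ignores the recursive aggregation step and any scheduling overhead. The `secs_per_shard` value is hypothetical, not a measured SP1 figure.

```python
import math

SHARD_CYCLES = 2_000_000  # approximate shard size described above


def num_shards(total_cycles: int) -> int:
    """Number of ~2M-cycle shards needed to cover the program."""
    return math.ceil(total_cycles / SHARD_CYCLES)


def estimate(total_cycles: int, n_gpus: int, secs_per_shard: float):
    """Return (latency_secs, gpu_seconds) under the simplified model."""
    shards = num_shards(total_cycles)
    rounds = math.ceil(shards / n_gpus)   # waves of shards proven in parallel
    latency = rounds * secs_per_shard     # wall-clock time to finish all shards
    gpu_seconds = shards * secs_per_shard # total work, independent of n_gpus
    return latency, gpu_seconds


# A ~400M-cycle program with a hypothetical 1 second per shard.
print(estimate(400_000_000, n_gpus=1, secs_per_shard=1.0))
print(estimate(400_000_000, n_gpus=50, secs_per_shard=1.0))
```

In this model the total GPU-seconds (and therefore the cost) is the same in both calls, while the latency shrinks with the number of GPUs.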

SP1 Turbo offers incredible speeds when proving on a cluster of many GPUs in the cloud. Note that single-machine performance of SP1 varies between 900 kHz and a few MHz depending on the workload and machine instance, showing that parallelizing proving across many GPUs can offer significant latency gains. Developers wishing to take advantage of this can use Succinct’s Prover Network Beta today.

SP1 Turbo gets us close to real-time Ethereum proving, proving real Ethereum mainnet blocks in under 40 seconds. Check out the proving time of some real Ethereum blocks on our cluster.

New Precompiles in SP1 Turbo

We have brand new precompiles in this release, including secp256r1 and bigint arithmetic for efficient RSA signature verification. Precompiles are specialized circuits for intensive cryptographic operations. They offer an order of magnitude performance gain for blockchain workloads that make heavy use of cryptography like hash functions and signature verification. Precompiles are a large reason why SP1 is the fastest zkVM for zkEVM and other rollup workloads.

With these precompiles, SP1 is the market-leading zkVM in terms of the variety of supported precompiles.

SP1 Turbo Behind the Scenes: Hardcore Performance Engineering and New Cryptography

In SP1 Turbo we use a new memory consistency argument built on elliptic-curve-based multiset hash functions. Memory consistency is a major component of a zkVM: it ensures that whenever data is read from a given memory address during execution, the value seen there is the value that was last written to that address. In SP1 V1, we followed Spartan's off-line memory checking approach (itself based on Blum et al. [BEG+94]).

The approach used in SP1 Turbo has significant advantages over other approaches that have been considered:

  • Merkleizing memory - introduces a significant overhead from opening Merkle paths.
  • Fingerprinting + grand product arguments (aka logUp and GKR/Thaler, used in SP1 V1) - these solutions require verifier randomness, which is obtained from Fiat-Shamir. This means the computation must be completed before the memory argument can be processed. In our current solution, we run the memory argument on-the-fly as the computation proceeds.
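To give a feel for the on-the-fly idea, here is a toy sketch of off-line memory checking with a multiset hash, in the spirit of Blum et al. It is not SP1's implementation: for brevity it swaps the elliptic curve group for multiplication modulo a prime, and every name in it (`CheckedMemory`, `hash_to_group`, and so on) is hypothetical. Each access folds one tuple into a "read" accumulator and one into a "write" accumulator as execution proceeds, with no verifier randomness; at the end, the two accumulators match only if every read returned the last value written.

```python
import hashlib

# Toy stand-in group: nonzero integers modulo a prime, under multiplication.
# SP1 Turbo uses an elliptic-curve group; this swap only keeps the sketch short.
P = 2**127 - 1


def hash_to_group(addr: int, value: int, ts: int) -> int:
    """Map an (address, value, timestamp) tuple to a nonzero group element."""
    digest = hashlib.sha256(f"{addr}:{value}:{ts}".encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


class CheckedMemory:
    def __init__(self, initial: dict):
        self.clock = 0
        self.read_acc = 1   # multiset hash of all tuples read so far
        self.write_acc = 1  # multiset hash of all tuples written so far
        self.cells = {}     # untrusted storage: addr -> (value, timestamp)
        for addr, value in initial.items():
            self.cells[addr] = (value, 0)
            self.write_acc = self.write_acc * hash_to_group(addr, value, 0) % P

    def _access(self, addr: int, new_value=None) -> int:
        value, ts = self.cells[addr]  # whatever the untrusted memory claims
        self.clock += 1
        if ts >= self.clock:
            raise ValueError("timestamp from memory is not in the past")
        # Fold the old tuple into the read set, the new tuple into the write set.
        self.read_acc = self.read_acc * hash_to_group(addr, value, ts) % P
        stored = value if new_value is None else new_value
        self.cells[addr] = (stored, self.clock)
        self.write_acc = self.write_acc * hash_to_group(addr, stored, self.clock) % P
        return value

    def read(self, addr: int) -> int:
        return self._access(addr)

    def write(self, addr: int, value: int) -> None:
        self._access(addr, value)

    def finalize(self) -> bool:
        """Read every cell one last time; consistent memory makes the sets equal."""
        acc = self.read_acc
        for addr, (value, ts) in self.cells.items():
            acc = acc * hash_to_group(addr, value, ts) % P
        return acc == self.write_acc


mem = CheckedMemory({0: 5, 1: 7})
mem.write(1, 9)
print(mem.read(1), mem.finalize())
```

If the stored value is altered behind the checker's back (for example by overwriting `mem.cells[1]` directly), the tampered tuple lands in the read set without a matching entry in the write set, and `finalize()` returns `False` except with negligible probability.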

We have a detailed note on this topic written by our Head of Cryptography @ronrothblum and Security Lead @rkm0959.

Use it Today and Reach Out

SP1 Turbo is available today and is ready for production usage. Developers can get started with it locally or by using our prover network beta. Learn more from our documentation. We also have a migration guide for users upgrading from V3 to Turbo.

Please reach out if you’re interested in trying out SP1 Turbo, using any of our integrations, or building anything else with SP1.

Citations

[BEG+94] Blum, M., Evans, W., Gemmell, P., Kannan, S., & Naor, M. (1994). Checking the correctness of memories. Algorithmica, 12, 225-244.

Footnotes

[1] MHz is computed for SP1 V1, V3 and Turbo on a single 4090 GPU instance, on a Fibonacci program with ~400M cycles, without recursion. With recursion, the performance for SP1 V3/V4 is 0.7/3.0 MHz respectively. The SP1 v.06 and v.01 performance numbers are taken from earlier blog posts and are not fully comparable, because they were measured on a CPU instance and the benchmarked program is a Fibonacci workload with a much smaller number of cycles.

[2] This number is from an older version of our Ethereum block prover known as SP1 Reth. The proving time was extrapolated using the difference in cycle count.

[3] This assumes constant utilization of the many GPUs, which requires some minimal amount of throughput (cycles proven per second) to keep the GPUs saturated. If your workload is very bursty (for example, proving a program with 15B cycles in 1 minute, followed by no proofs for 20 minutes), then spinning up many machines and having them sit idle means proof generation costs more.
