🚢 Succinct Ships: Optimized bn254 & bls12-381 Precompiles in SP1

TL;DR: Our intern "Bhargav the Great" went wild coding on SP1 for his summer internship and now Fred Ehrsam follows him on X.

  • He shipped new precompiles for accelerating bn254 and bls12-381 elliptic curve operations, making SP1 the only production-ready zkVM with these available today.
  • These precompiles enable fast proving of the following:
    • verification of Groth16 & PlonK-KZG proofs within SP1 programs
    • fast bls12-381 EC arithmetic for KZG & blob operations required in Ethereum
    • fast bn254 arithmetic and pairing computation for proving EVM execution (using revm), which gives a massive performance boost to RSP and OP-Succinct
    • optimized Ethereum ZK light client with SP1 Helios
  • You can start using them today with SP1 v2.0.0 or v3.0.0. Reach out if you have any questions!

Background on bn254, bls12-381 & SP1 precompiles

The bn254 and bls12-381 elliptic curves are widely used in the Ethereum ecosystem. bls12-381 is used for digital signatures in Ethereum’s consensus and in various ZKP protocols (like ZCash). The bn254 curve is so popular that it has been enshrined as an Ethereum precompile, and it is commonly used for EVM verification of Groth16 and PlonK-KZG proofs.

Examples of bn254 and bls12-381 usage across crypto ecosystems

| Curve | Application |
|---|---|
| bn254 | Groth16 proof verification; PlonK-KZG proof verification; revm bn254 precompiles (add, mul, pairing) |
| bls12-381 | Ethereum light client (verifying aggregate bls12-381 signature of validators); KZG verification for Ethereum blob verification |

Due to their prevalence in the Ethereum ecosystem, these elliptic curve operations are also commonly used in SP1 programs. But elliptic curve arithmetic implemented in plain Rust can be computationally expensive inside SP1, leading to long proof generation times. Thankfully, SP1’s precompiles solve this problem by providing a major performance boost.

SP1 Precompiles to the Rescue

SP1's precompiles are specialized STARKs designed to efficiently compute a common operation, like a hash function or elliptic curve arithmetic. A single precompile invocation replaces the many individual CPU instructions otherwise needed to carry out a complex task. The specialized precompile circuit proves the computation far more efficiently than the equivalent cycles in our RISC-V CPU table, which incurs overhead from moving data to and from registers and memory, and also pays prover overhead for accommodating all possible instructions.

By significantly reducing the number of RISC-V cycles, precompiles typically cut the computational overhead of proving by an order of magnitude or more (check the Appendix for benchmarks).

An Interesting Aside on Elliptic Curve Arithmetic

In traditional CPU execution environments, it is far more efficient to do elliptic curve arithmetic in projective space (P²) because of the high cost of finite field inversions and square roots. In the zkVM context, however, these otherwise costly operations cost only a little more than a field multiplication, a direct consequence of the general philosophy of "verify, don't execute": we simply need to check that a witnessed value is indeed the correct square root or inverse, rather than explicitly computing the value inside of SP1.
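The "verify, don't execute" idea can be sketched in a few lines of Rust. This is a toy illustration, not SP1's precompile code: the modulus `P` is a small 61-bit prime standing in for the much larger bn254 and bls12-381 base fields, and the names `hint_inverse`, `check_inverse` and `check_sqrt` are hypothetical. The expensive exponentiation happens on the hint side, while the check that would need to be proven is a single multiplication.

```rust
/// Toy prime modulus (2^61 - 1). An assumption for illustration only;
/// the real curves use ~254-bit and ~381-bit primes.
const P: u64 = (1 << 61) - 1;

/// Modular multiplication, widening to u128 to avoid overflow.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Square-and-multiply exponentiation: the "expensive" path,
/// costing on the order of log2(P) multiplications.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Hint side (not part of what gets checked): compute a^(-1)
/// via Fermat's little theorem, a^(P-2) mod P.
fn hint_inverse(a: u64) -> u64 {
    pow_mod(a, P - 2)
}

/// Checked side: one multiplication confirms the witnessed inverse.
fn check_inverse(a: u64, witness: u64) -> bool {
    a % P != 0 && mul_mod(a % P, witness % P) == 1
}

/// Checked side: one multiplication confirms a witnessed square root.
fn check_sqrt(a: u64, witness: u64) -> bool {
    mul_mod(witness % P, witness % P) == a % P
}
```

Calling `check_inverse(a, hint_inverse(a))` accepts for any nonzero `a`, and a wrong witness is rejected by the same single multiplication.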

Dropping the Cycle Count with Blazing Fast Precompiles

Blazing fast Groth16 + PlonK-KZG proof verification

Using our bn254 precompiles, we built sp1-snark-verifier, a library for verifying Groth16 or PlonK proofs in SP1, which is useful for generic proof aggregation. SP1 itself also generates Groth16 or PlonK proofs for EVM verification, so you can use this library to have SP1 recursively verify proofs of itself!

Using the bn254 precompiles boosts performance by ~20x for both Groth16 and PlonK proof verification.

| ZKP System | Before Precompiles (cycles) | After Precompiles (cycles) |
|---|---|---|
| PlonK Verification | 187,227,852 | 8,078,761 |
| Groth16 Verification | 173,953,261 | 9,390,640 |

Blazing fast kzg-rs blob verification

The KZG polynomial commitment scheme is a cryptographic method used to efficiently prove polynomial evaluations. In Ethereum, KZG commitments are required for EIP-4844, a recent update that enables Ethereum scaling through an ephemeral “blob” data type. Blobs are essential for Layer 2 rollups to function efficiently because they let rollups publish off-chain transaction data while keeping the consensus chain lightweight.
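To make "proving a polynomial evaluation" concrete, here is a minimal sketch of the algebra behind a KZG opening. It is not the kzg-rs API and involves no elliptic curves: it works over a toy 31-bit prime field, and every name in it (`eval`, `quotient`, `check_opening`, `tau`) is hypothetical. The claim p(z) = y holds exactly when p(X) - y is divisible by (X - z), so the prover supplies the quotient q(X) as the proof. Real KZG checks the same identity at a secret point, hidden inside curve points, using a pairing on bls12-381, which is why fast pairing and curve arithmetic matter here.

```rust
/// Toy prime modulus (2^31 - 1); an assumption for illustration only.
const P: u64 = 2_147_483_647;

// Field helpers. Inputs are assumed to be already reduced below P.
fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P } // product fits in u64 since P < 2^31

/// Horner evaluation; coefficients are stored lowest degree first.
fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
}

/// Synthetic division: returns q(X) = (p(X) - p(z)) / (X - z).
fn quotient(coeffs: &[u64], z: u64) -> Vec<u64> {
    let n = coeffs.len();
    let mut q = vec![0u64; n.saturating_sub(1)];
    let mut carry = 0u64;
    for i in (1..n).rev() {
        carry = add(coeffs[i], mul(carry, z));
        q[i - 1] = carry;
    }
    q
}

/// Checks p(tau) - y == q(tau) * (tau - z) at a single point `tau`.
/// In real KZG, `tau` is a trusted-setup secret and this comparison
/// is performed on commitments via a pairing, not on raw field values.
fn check_opening(p: &[u64], proof_q: &[u64], z: u64, y: u64, tau: u64) -> bool {
    sub(eval(p, tau), y) == mul(eval(proof_q, tau), sub(tau, z))
}
```

For p(X) = 2X³ + 3X + 5 and z = 7, `quotient` yields 2X² + 14X + 101, and `check_opening` accepts the claimed value 712 while rejecting 713.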

kzg-rs is our pure Rust crate for KZG blob and batched blob verification for EIP-4844. It is valuable for multiple reasons. First, it is a pure Rust implementation of c-kzg-4844, a popular KZG implementation that relies on C bindings; avoiding those bindings improves auditability and portability. Additionally, kzg-rs is optimized to run inside of SP1 and is currently employed for blob verification and KZG point evaluation in OP Succinct, Reth Succinct Processor (RSP) and Taiko's rollup implementation.

| Metric | Non-precompile Version (cycles) | Precompile Version (cycles) |
|---|---|---|
| Verify KZG Proof | 212,709,402 | 9,391,832 |
| Verify Blob KZG Proof | 265,322,934 | 27,960,797 |
| Verify Blob KZG Proof Batch (10 Proofs) | 1,228,277,089 | 270,655,817 |

Blazing fast Ethereum light client using bls12-381 precompiles

We took Helios, a portable Ethereum light client written in Rust, and combined it with SP1 to create SP1-Helios, a ZK Ethereum light client that is useful for cross-chain interoperability.

With our optimized bls12-381 precompiles for signature verification, verifying 512 signatures from the Ethereum sync committee inside SP1 went from roughly 6.7 billion cycles to under 50 million.

| Metric | Non-precompile Version (cycles) | Precompile Version (cycles) |
|---|---|---|
| Total Cycles | 6,732,566,139 | 49,387,331 |
| Verify Update Cycles | 3,371,549,554 | 29,968,821 |
| Verify Finality Update Cycles | 3,361,016,585 | 19,418,510 |
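To read the SP1-Helios numbers above as ratios, the following small sketch divides the non-precompile cycle count by the precompile cycle count for each row. The figures are copied from the table; the names `ROWS`, `speedup` and `report` are only illustrative.

```rust
/// (label, cycles without precompiles, cycles with precompiles),
/// copied from the SP1-Helios table above.
const ROWS: [(&str, u64, u64); 3] = [
    ("Total", 6_732_566_139, 49_387_331),
    ("Verify Update", 3_371_549_554, 29_968_821),
    ("Verify Finality Update", 3_361_016_585, 19_418_510),
];

/// Cycle-count reduction factor: how many times fewer RISC-V cycles
/// the precompile version needs.
fn speedup(before_cycles: u64, after_cycles: u64) -> f64 {
    before_cycles as f64 / after_cycles as f64
}

/// One human-readable line per table row.
fn report() -> Vec<String> {
    ROWS.iter()
        .map(|&(label, before, after)| {
            format!("{}: {:.1}x fewer cycles", label, speedup(before, after))
        })
        .collect()
}
```

This works out to roughly 136x fewer cycles in total, about 112x for the update verification and about 173x for the finality update. Note that these are cycle-count ratios, which are a proxy for proving cost rather than a direct wall-clock measurement.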

Blazing fast bn254 pairing in revm

Our fork of the original substrate-bn is a drop-in replacement that accelerates all bn254 precompiles in revm. The substrate-bn patch is already incorporated into many of our integrations, including op-succinct and rsp. With zero lines of code changed in revm, we significantly accelerate all bn254 EVM precompiles, including a speedup of more than 20x in the alt_bn128_pair operation.

| Operation | Non-Precompile Cycles | Precompile Cycles |
|---|---|---|
| alt_bn128_add | 170,298 | 7,616 |
| alt_bn128_mul | 1,860,836 | 141,824 |
| alt_bn128_pair | 155,016,170 | 6,598,690 |

Accelerate Your SP1 Program Today

Program too slow? Use SP1, the only zkVM with bn254 and bls12-381 precompiles, and experience the dramatic, order-of-magnitude speedup yourself. Don’t forget to check out our long list of other precompiles, including keccak256, sha256, secp256k1 and many more. Please get in touch with us here if:

  • You’re interested in creating your own precompile or want a specific precompile to accelerate your program. We love collaborating with teams.
  • You’re interested in use cases that leverage these precompiles, such as KZG verification, RSP, OP-Succinct, SP1-Helios, etc.

Code: Our code is fully open-source with an MIT license: check out SP1 here and feel free to contribute.

Appendix

Precompiles in Action

Below, we show examples of various elliptic curve operations in SP1 by comparing the raw Rust cycle count (number of RISC-V cycles) vs. the cycle count with precompiles.

Base Field (𝐹ₚ) Operations

| Operation | Non-Precompile Cycles | Precompile Cycles |
|---|---|---|
| Addition | 402 | 541 |
| Multiplication | 402 | 552 |
| Subtraction | 402 | 541 |
| Square Root | 1,829,272 | 1,647 |
| Inversion | 1,826,741 | 1,599 |

Quadratic Extension Field (𝐹ₚ²) Operations

| Operation | Non-Precompile Cycles | Precompile Cycles |
|---|---|---|
| Addition | 1,284 | 824 |
| Multiplication | 12,782 | 842 |
| Subtraction | 1,588 | 824 |
| Square Root | 11,481,439 | 6,329,089 |
| Inversion | 1,839,096 | 2,838 |

𝐹ₚ⁶ Operations

| Operation | Non-Precompile Cycles | Precompile Cycles |
|---|---|---|
| Addition | 3,299 | 1,908 |
| Multiplication | 85,356 | 3,639 |
| Subtraction | 4,214 | 1,910 |
| Inversion | 1,979,974 | 10,532 |

𝐹ₚ¹² Operations

| Operation | Non-Precompile Cycles | Precompile Cycles |
|---|---|---|
| Addition | 7,211 | 4,440 |
| Multiplication | 272,757 | 12,515 |
| Subtraction | 8,062 | 3,459 |
| Inversion | 2,273,149 | 32,516 |

𝐺₁ Projective Curve Operations

| Operation | Non-Precompile Cycles | Precompile Cycles |
|---|---|---|
| Projective Point Addition | 5,883,275 | 325,849 |
| Projective Scalar Multiplication | 19,569,843 | 2,931,398 |
| Projective Point Subtraction | 7,740,573 | 343,158 |

𝐺₂ Projective Curve Operations

| Operation | Non-Precompile Cycles | Precompile Cycles |
|---|---|---|
| Projective Point Addition | 40,631,751 | 13,124,100 |
| Projective Scalar Multiplication | 77,193,504 | 1,719,549 |
| Projective Point Subtraction | 29,107,536 | 6,784,692 |

1 | Also known as bn128 or alt-bn128

2 | Behind the scenes this is a RISC-V ecall (i.e. system call instruction)