SP1 Security Update

John Guibas

Jan 27, 2025 — 5 min read

TL;DR Two vulnerabilities were found in SP1 V3: one by Aligned, LambdaClass, and 3MI Labs, and one by Succinct. A third vulnerability was found in Plonky3, a critical dependency of SP1, by Lev Soukhanov and Onur Kilic. We sincerely thank these researchers for reporting these issues. All three vulnerabilities are now patched in SP1 Turbo, the latest production version of SP1. We recommend that all SP1 users upgrade to Turbo as soon as possible, and we have frozen the routers to the SP1 verifier contracts deployed on mainnets. Succinct has communicated with all known customers about these security issues and notified all users on the SP1 Telegram. Details on how to migrate to SP1 Turbo are explained below. Starting February 15th, 2025, the Succinct Prover Network will stop supporting SP1 V2 and V3. The disclosure of these vulnerabilities is available in our GitHub security advisory here.

Upgrade Instructions

We strongly encourage users to migrate to SP1 Turbo as soon as possible. Succinct has prepared an upgrade guide for users of SP1 V2 or V3 here. Our team is available in the SP1 Telegram chat to assist users with this migration.

Fixed Issues

Missing Validation of Chip Ordering

Impact: Medium, soundness issue

In SP1’s STARK prover/verifier system, the prover provides a HashMap called chip_ordering to let the verifier know which polynomial evaluation claims correspond to which chip. The verifier uses this information to fetch the index of the chips that have preprocessed columns. However, prior to SP1 Turbo, the verifier did not validate that chip_ordering provides these indices correctly. An attacker could exploit this to use incorrect openings of preprocessed commitments in SP1 proofs, leading to a soundness issue.

In the recursive verifier, all valid verifier programs are generated in advance and committed into a Merkle tree. Later, the verifier keys are checked for validity within the circuit by requiring a Merkle proof to the precomputed Merkle root. As all precomputed verifier programs were generated with honest values of chip_ordering, compressed proofs and the on-chain verifier are not affected by this vulnerability. This bug only affects direct verification of core shard proofs.

This code underwent two audits: one by KALOS and another by Cantina for SP1 V1. The Succinct team discovered this bug while preparing SP1 Turbo. In SP1 Turbo, this was fixed by making the verifier check that the indexed chip’s name is equal to the name of the chip in question.
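The shape of this check can be sketched as follows. This is a minimal illustration rather than the actual SP1 verifier code: the function `checked_chip_index`, the `ordered_chip_names` slice (the chip names in the order the verifier sees their openings), and the error type are all hypothetical names introduced for this example.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ChipOrderingError {
    MissingChip,
    IndexOutOfRange,
    NameMismatch,
}

/// Looks up `name` in the prover-supplied `chip_ordering` and accepts the
/// index only if the chip at that position really carries the same name.
fn checked_chip_index(
    chip_ordering: &HashMap<String, usize>,
    ordered_chip_names: &[&str],
    name: &str,
) -> Result<usize, ChipOrderingError> {
    // The map comes from the prover, so every step below treats it as untrusted.
    let idx = *chip_ordering
        .get(name)
        .ok_or(ChipOrderingError::MissingChip)?;
    let actual = ordered_chip_names
        .get(idx)
        .ok_or(ChipOrderingError::IndexOutOfRange)?;
    // The check that was missing: the indexed chip must be the chip in question.
    if *actual != name {
        return Err(ChipOrderingError::NameMismatch);
    }
    Ok(idx)
}
```

Without the final comparison, a forged map could point a preprocessed chip's name at the opening of a different chip and the lookup would still succeed.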

Missing Check in Recursive Verifier

Impact: High, soundness issue

In the recursion circuits, the is_complete boolean flag denotes whether the current recursive proof covers a complete program execution. The correctness of this flag is constrained by the assert_complete function, but prior to SP1 Turbo the call to this function was missing in parts of our recursive verifier, including the first layer of recursion. An attacker could exploit this to have proofs of partial execution verified as proofs of complete execution. The computed hash of committed values is constrained to equal the public values’ committed_value_digest only at the end of the program, just before it halts, so a partial execution never performs this check. Passing such a partial execution off as a complete one would therefore allow an incorrect committed_value_digest to be proved, a soundness issue. This attack works directly against the Rust SDK when verifying compressed proofs, and a similar idea leads to an attack on the soundness of the on-chain verifier for deferred proofs. We note that the final proof used for the on-chain verifier does include correct constraints for is_complete.

The issue was found through a collaborative effort by Aligned, LambdaClass, and 3MI Labs, and Succinct independently identified it while preparing SP1 Turbo. In SP1 Turbo, this was fixed by adding appropriate calls to the assert_complete function in our recursion circuits.
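To make the role of assert_complete concrete, here is a heavily simplified sketch in plain Rust. It is not the SP1 recursion circuit: the struct fields, the specific completeness conditions (start at the first shard and at the entry point, end with a program counter of 0), and the `verify_layer` wrapper are assumptions chosen for illustration only.

```rust
/// Hypothetical, heavily simplified public values of one recursive proof.
struct RecursionPublicValues {
    start_pc: u32,    // program counter at the start of the proven segment
    next_pc: u32,     // program counter after the segment; assumed 0 once halted
    start_shard: u32, // first shard covered; assumed to be 1 for a full run
    is_complete: bool,
}

/// If `is_complete` is claimed, require that the segment spans the whole
/// execution. The exact conditions are assumptions made for this sketch.
fn assert_complete(pv: &RecursionPublicValues, entry_pc: u32) -> Result<(), &'static str> {
    if !pv.is_complete {
        return Ok(()); // no completeness claim, nothing to enforce
    }
    if pv.start_shard != 1 {
        return Err("segment does not begin at the first shard");
    }
    if pv.start_pc != entry_pc {
        return Err("segment does not begin at the program entry point");
    }
    if pv.next_pc != 0 {
        return Err("program has not halted");
    }
    Ok(())
}

/// Every recursion layer has to make this call; a layer that skips it
/// accepts whatever value of `is_complete` the prover supplies.
fn verify_layer(pv: &RecursionPublicValues, entry_pc: u32) -> Result<(), &'static str> {
    // ... verification of the underlying proofs elided ...
    assert_complete(pv, entry_pc)
}
```

In this model, a segment that stops before halting but sets `is_complete` to true is rejected only because `verify_layer` calls `assert_complete`.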

Plonky3 Vulnerability

Impact: High, soundness issue

In SP1’s STARK prover/verifier system, the verifier checks that the polynomial evaluation claims by the prover are correct by using a FRI-based polynomial commitment scheme. This part of the code relied on Plonky3, one of SP1’s core dependencies. In the STARK verifier, the Plonky3 library was directly used, and in the recursive version of the STARK verifier, the logic inside Plonky3 was ported over. In Plonky3, these polynomial evaluation claims were batched using a random linear combination. Prior to SP1 Turbo, the individual evaluation claims were not observed into the Fiat-Shamir challenger before sampling the coefficient for the random linear combination. This allowed incorrect polynomial evaluation claims to be verified, so incorrect chip proofs could pass verification, a soundness issue in SP1.

This issue was found by Lev Soukhanov and Onur Kilic, and we have worked closely with the Plonky3 team to mitigate this vulnerability. The Plonky3 team has also issued an advisory for this issue recently. In SP1 Turbo, this bug was fixed by observing all evaluation claims into the Fiat-Shamir challenger correctly before sampling the random coefficient.
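The ordering problem can be illustrated with a toy transcript. This is a sketch under loud assumptions, not Plonky3 or SP1 code: `ToyChallenger` and `sample_batching_coefficient` are hypothetical names, values are plain `u64`s rather than field elements, and the standard library's `DefaultHasher` stands in for a cryptographic hash, which it is not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Toy Fiat-Shamir transcript. `DefaultHasher` is NOT cryptographic; it only
/// stands in for a real hash so that the example needs no dependencies.
struct ToyChallenger {
    state: DefaultHasher,
}

impl ToyChallenger {
    fn new() -> Self {
        Self { state: DefaultHasher::new() }
    }

    /// Absorb a prover message into the transcript.
    fn observe(&mut self, value: u64) {
        self.state.write_u64(value);
    }

    /// Derive a challenge from everything observed so far.
    fn sample(&mut self) -> u64 {
        let challenge = self.state.finish();
        // Feed the challenge back so that later samples differ.
        self.state.write_u64(challenge);
        challenge
    }
}

/// Samples the coefficient used to batch the evaluation claims.
/// `observe_claims = false` models the bug: the claims never enter the transcript.
fn sample_batching_coefficient(commitment: u64, claimed_evals: &[u64], observe_claims: bool) -> u64 {
    let mut challenger = ToyChallenger::new();
    challenger.observe(commitment);
    if observe_claims {
        for &eval in claimed_evals {
            challenger.observe(eval);
        }
    }
    challenger.sample()
}
```

When the claims are not observed, the coefficient is the same no matter which evaluations the prover later claims, so the prover can learn it first and then choose claims that satisfy the batched check. Observing the claims first ties the coefficient to them.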

Our response to other reported security issues

We believe these issues have little to no impact on developers using SP1.

SP1 is compliant with the RISC-V specification

There was a report from Aligned/LambdaClass/3MI Labs claiming that SP1’s implementation of RISC-V deviates from standards, and that this deviation may lead to vulnerabilities.

After receiving the report, we obtained an audit from Zellic of our RV32IM standards compliance. The audit report can be found here, and our documentation on it can be found in SP1 Docs here. In summary, RISC-V is a skeleton rather than an immutable standard, and SP1’s requirements on memory access alignment and reserved memory regions are within RISC-V compliance. The example proof of concept from the report also involves an unsafe program that writes to an invalid memory address. As zkVMs simply provide a proof of execution of a program, it is critical that the guest program itself is secure. This is true for many VMs: the EVM allows writing and deploying vulnerable code, and RISC Zero’s security model clearly specifies that the zkVM technology cannot prevent security issues in guest programs.

We agreed that making the details of our RISC-V implementation clearer and better documented would enhance SP1’s security. As mentioned, we have added more documentation and obtained an audit to confirm that our implementation is compliant. We have also added more checks at the executor level, and added constraints in our memory-related AIR circuits so that memory reserved for registers cannot be accessed by memory loads and stores.
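An executor-level check of this kind might look roughly like the sketch below. The function name, the error type, and especially the `RESERVED_END` bound are hypothetical placeholders, and the alignment rule (an access of N bytes must be N-byte aligned) is an assumption. The actual requirements are the ones described in the SP1 Docs.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum MemoryAccessError {
    UnsupportedWidth,
    Misaligned,
    ReservedRegion,
}

/// Placeholder bound: addresses below this value are treated as reserved for
/// registers. The real layout is defined by SP1, not by this constant.
const RESERVED_END: u32 = 32;

/// Rejects loads and stores that are misaligned or that touch the reserved region.
fn check_memory_access(addr: u32, width_bytes: u32) -> Result<(), MemoryAccessError> {
    // Only byte, half-word, and word accesses are modeled here.
    if !matches!(width_bytes, 1 | 2 | 4) {
        return Err(MemoryAccessError::UnsupportedWidth);
    }
    // Assumed rule: an access of N bytes must start at a multiple of N.
    if addr % width_bytes != 0 {
        return Err(MemoryAccessError::Misaligned);
    }
    if addr < RESERVED_END {
        return Err(MemoryAccessError::ReservedRegion);
    }
    Ok(())
}
```

A check like this in the executor fails fast during execution, while the corresponding AIR constraints are what make the same rule binding on the prover.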

Users must use the SP1 entrypoint that ensures proper usage of the HALT opcode

There was a report from Aligned/LambdaClass/3MI Labs that made two claims, one of which is the second vulnerability in the disclosure above. The other claim was that if the user program directly calls the HALT syscall, it ends before calling the COMMIT syscall that constrains the committed_value_digest, which would allow incorrect public values to be verifiable. This claim is true on its own, and the same idea is used in the proof of concept for the is_complete vulnerability above. However, by itself it is not a vulnerability. As mentioned above, it’s the developer’s responsibility to make sure that guest programs are secure. There is also no reason for a developer to invoke the HALT syscall directly in a guest program, especially since the SP1 toolchain automatically handles the invocation of syscall_halt. Syscalls are delicate, and operating on them directly should be treated like using unsafe Rust: developers who don’t have a very clear understanding of a syscall shouldn’t use it, and it’s up to users to write a safe guest program.

We do agree that clearly documenting this rule about using syscalls directly enhances SP1’s security. We have improved our documentation on this here.

Acknowledgements

We’d like to express our gratitude to Aligned, LambdaClass, 3MI Labs, Lev Soukhanov, and Onur Kilic for their responsible disclosure of vulnerabilities in SP1 and Plonky3 respectively. Their contributions were invaluable in strengthening the security of SP1 and the broader ZK space. All contributors have been compensated through bug bounty programs by either Succinct or Polygon, and we deeply appreciate their efforts.