Formal Verification of SP1 Hypercube

We are happy to announce that the Nethermind Security Formal Verification team and Succinct Labs have joined forces to formally verify the correctness of all of the core RV-64 chips from SP1 Hypercube (https://blog.succinct.xyz/sp1-hypercube/) RISC-V zkVM in the Lean proof assistant. We would also like to thank the Ethereum Foundation for their generous support in this endeavour.
In particular, the behaviour of the relevant RISC-V opcodes was verified against the official 64-bit RISC-V Sail specification (https://github.com/opencompl/sail-riscv-lean), extracted to Lean as part of another Ethereum-Foundation-funded grant. To our knowledge, this is the first time this specification was successfully used for a large-scale verification effort, demonstrating its viability for real-world use.
In this brief post, we summarise the outcomes, reflect on the technical challenges, and discuss the relevant caveats. An in-depth technical blog will be published in the near future.
Design principles
The first step in formally verifying any system is to give an exact specification of the intended behavior. SP1 Hypercube proves the validity of computations encoded as programs with the RISC-V 64bit instruction set. Therefore, it was important to us that the source of truth for the intended behavior must be the official specification of the RISC-V instruction set.
Verification Outcomes
We have used the Lean proof assistant to verify the correctness of the entire core of the 64-bit SP1 Hypercube RISC-V zkVM with respect to the official RISC-V Sail specification. This covers the following 62 opcodes:
- 41 ALU-related opcodes: ADD, ADDW, ADDI, ADDIW, SUB, SUBW, XOR, XORI, OR, ORI, AND, ANDI, SLL, SLLI, SLLW, SLLIW, SRL, SRLI, SRLW, SRLIW, SRA, SRAI, SRAW, SRAIW, SLT, SLTI, SLTU, SLTIU, MUL, MULH, MULHU, MULHSU, MULW, DIV, DIVU, DIVW, DIVUW, REM, REMU, REMW, REMUW
- 10 control-flow-related opcodes: JAL, JALR, BEQ, BNE, BLT, BGE, BLTU, BGEU, LUI, AUIPC
- 11 memory-related opcodes: SB, SH, SW, SD, LB, LBU, LH, LHU, LW, LWU, LD
Technical Challenges
This project presented several technical challenges, both on the level of SP1 tooling and changes needed to be made to Lean libraries used, including upstreaming some improvements to the RISC-V specification.
Challenge 1: Connection with the Lean RISC-V specification
The formal specification of the RISC-V instruction set written in the ISA specification language Sail. The Lean backend for Sail provides the SailM monad, a combined error and state monad that represents execution of an abstract RISC-V machine, where the state includes the current values in registers and memory. Running a computation in this monad gives a function from initial machine states to final machine states, representing how the memory and registers are changed.
To verify correctness of a given opcode, we model the SP1 implementations in this monad and compare them against the actual specifications provided by Sail, showing that if all the extracted constraints hold then the two have the same effect on any valid initial machine state. This approach allows us to reason about the monadic behavior and arithmetic constraints independently: we can first use monadic reasoning to reduce to a concrete statement about register and memory values, then use the constraints to establish a correspondence between the values in the spec and in SP1. See the proofs for add for a standard example.
For most cases, a valid initial state here just means that each register in the state is initialized to some (unspecified/garbage) value, which greatly simplifies the amount of case analysis needed to reason about computations. For memory based opcodes, however, we also need a few additional assumptions about specific register values (for example the plat_ram_base register, which sets the starting index for memory addressing, is assumed to be zero).
Because the Lean backend extraction for Sail is both automated and new, in performing the verification we discovered and helped solve a number of upstream issues that arose during the verification, such as ambiguous type annotations, opaque definitions of looping, and a number of operations that could not be safely unfolded by Lean’s kernel.
Challenge 2: Constraint Extraction
In order to get an accurate representation of SP1’s constraints to use in these proofs, the next challenge was to extract the constraints from source code into a format that Lean can understand. SP1 constraints are written in Rust, using the Air abstraction. Luckily, as this abstraction is agnostic to the backend, as encoded by the AirBuilder trait, we were able to extract constraints simply by creating a new AirBuilder implementation that keeps track of the constraints and interactions. Such a plain extraction yields a collection of equations and interactions per chip.
One requirement that presented a challenge in this regard was the need for modularity of the extraction. In principle, formal verification greatly benefits from modularity, especially in the case of proving complex statements, as this means that the same proofs do not have to be written in full over and over again. The original Rust code itself is modular, in that it identifies a number of important independent parts that it isolates into functions and then re-uses through function calls, but the initial extraction code would “forget” this structure. These functions capture very specific behaviors and would ideally be used as essential lemmas for proving the main theorems for various chip constraints. For these reasons,we modified the extraction code to reflect this extra structure, which required a number of changes to the Rust constraint API.
Challenge 3: Reasoning about opcode correctness
After overcoming the challenges related to the Lean RISC-V specification, what was left was to actually prove that the constraints associated with the targeted opcodes actually describe the desired behaviours.
This involved, first of all, developing an infrastructure for reasoning about, byte- and 16-bit level operations performed on 32-bit and 64-bit machine integers.
Secondly, we also needed to reason about finite-field arithmetic, first for the BabyBear and then for the KoalaBear fields. This, in particular, was complicated at times given that the size of these fields is slightly under 2^32, meaning that various field-specific techniques had to be used for expressing constraints and operations that would be expressed straightforwardly otherwise. Luckily, Lean comes with Mathlib, a comprehensive library of facts about finite fields/natural numbers/integers/bit-vectors, as well as with the omega, bv_decide, aesop, and grind automated reasoning procedures, all of which we have found to be very helpful.
Finally, our modular approach to defining chip operations proved invaluable when verifying more complex opcodes. A prime example of this is the DivRem chip, which uses, among others, the multiplication, addition, and comparison operations. As we had already proven the specifications for these fundamental operations individually, we could simply reuse those specifications within the larger DivRem proofs. Without this modularity, we would have been forced to repeat the entire logical reasoning for each sub-operation whenever it was used, creating bloated, unmanageable proofs that would have likely overwhelmed Lean in the process. By relying on the modularity, not only have we saved immense effort but we have also kept our proofs clean and maintainable.
Assumptions and caveats
It is important to note that the verification was subject to a number of caveats and assumptions.
Firstly we have assumptions about the external and internal tooling used:
- We assumed the correctness of Lean’s kernel
- We assumed the correctness of the implemented constraint extraction mechanism: proving its correctness would require formal reasoning about the Rust code itself
- We assumed the correctness of the 64-bit Lean Sail RISC-V specification with respect to the official RISC-V standard
We also have a number of assumptions about the representation of SP1’s constraints and interactions in Lean, and the semantics that they imply (see this file for the assumptions in Lean):
- We assumed the correctness of the constraints derived from the content of the SP1 lookup bus and associated air interactions
- We assumed the correctness of the theory and implementation of the SP1 lookup bus and the underlying proving system
Finally we have a number of limitations we hope to handle in future work:
- We proved the basic connection between the operation of the chips and the RISC-V specification: this connection should, in future, be strengthened to go higher up the RISC-V hierarchy of execution functions
- We have not yet proven any results related to memory consistency
- The semantics of certain constant table lookups are assumed correct without proof. Integration with a more general circuit representation in Lean could allow straightforward verification of their correctness