An Experiment in Formal Verification with Claude

by Succinct May 20, 2026 5 min read

A few weeks ago, we announced VEIL, a compiler that adds zero-knowledge to hash-based multilinear proof systems with only ~3% prover overhead.

The compiler relies on thirty pages of new cryptographic arguments that have not been peer-reviewed yet. As an additional check, we experimented with Anthropic’s Claude to formalize VEIL’s main theorems in the Lean 4 theorem prover.

While not a suitable substitute for a thorough peer review, we found that Claude produced strong evidence that the paper's theorems follow from their stated assumptions. You can see the audit document with the Lean formalization at veil.succinct.xyz.

What is VEIL

Modern proof systems like SP1 are designed for succinctness. Verifiers quickly check for correctness for large computations against small proofs. This is valuable in settings where verification speed is critical, like fast finality for blockchains. But these systems are not natively zero-knowledge (ZK) and offer no privacy guarantees, ruling out use cases dealing with sensitive data, like content authenticity, private access control, or responsible vulnerability disclosure.

VEIL (Verifiable Encapsulation of Interactive proofs with Low overhead) is a compiler that adds zero-knowledge to hash-based proof systems. The resulting protocol reveals nothing about the witness while preserving soundness and succinctness.

VEIL works by decoupling the algebraic interaction in a proof system from the cryptographic hashing, applying cheap targeted blinding to the hash-touching parts and wrapping only the small algebraic interaction with an inner ZK system. The resulting protocol reveals nothing about the witness while preserving soundness and succinctness.

Why Formal Verification?

VEIL's soundness and zero-knowledge guarantees depend on about 30 pages of delicate, non-peer-reviewed technical arguments. That's more unreviewed math than we're comfortable shipping, so we pursued a stronger correctness check.

One very powerful check is formal verification, where mathematical proofs are rewritten in a special programming language that compiles only if the proof is correct. Unfortunately, writing formal proofs in this way is extremely time-consuming. Even formal verification experts often take 10x times as long to formalize a proof as to write it out by hand, as we did in this paper.

We decided to use VEIL as a test case to see whether modern LLMs could make formal verification practical without sacrificing rigor. We found the process was quite efficient, with Claude doing most of the work autonomously under human supervision. As models improve, this kind of LLM-assisted formalization should become a standard tool for verifying new cryptographic protocols.

How we worked with Claude

We worked with Claude Opus 4.6 and 4.7 to generate the theorem statements and key definitions. The same models generated the proofs largely autonomously using the paper as a reference.

We did not use a sophisticated harness for this work. Instead, we instantiated a claude.md file with simple guidelines, such as never updating definitions or theorem statements without explicit instructions. Agents would periodically update this file with useful argument patterns they discovered or note recurring writing pitfalls to avoid. The result, written with the Lean 4 theorem prover, can be found at veil.succinct.xyz.

The total human time was roughly 1-2x spent writing the paper itself – a marked improvement over the 10x it commonly takes. Most of the human labor went into generating statements and definitions, which required careful human review over dozens of iterations. This is because formal verification has a fundamental limitation: it cannot guarantee that a formal statement matches its informal intent, so this correspondence has to be checked manually. For a math theorem in a few lines, this is not too onerous. But a full cryptographic protocol can require auditing a thousand lines of Lean. Nevertheless, manually checking Claude’s output was significantly easier than verifying by hand that each argument in the paper was mistake-free.

To further speed up the review process, we had Claude mock up a prototype of an interactive audit document with the Lean source presented side-by-side with English theorem statements. Visually lining up the code with the desired meaning made the review significantly easier.

Left: English theorems as presented in the paper. Right: Theorem written in the Lean programming language

Using Claude for proofs was much more convenient. Proofs could be generated autonomously without much human involvement since an incorrect proof simply fails to type-check.

Lessons Learned

Verification surfaced a real bug

Type-checking served its role perfectly when the verification surfaced a technical error buried within 30 pages of dense math. One of our protocols was statistically zero-knowledge against an honest verifier, not perfectly zero-knowledge as we had claimed^[1].

Note that this doesn’t matter for practical implementations since they are further compiled through the Fiat-Shamir transform. ↩︎

Writing the “trusted base” was harder than writing the proofs

The first step to formalization is to create a "trusted base" of definitions and theorem statements written in Lean. This is the part that needs to be manually audited against the informal meanings in the paper.

Generating the trusted base ended up being the main human bottleneck due to three compounding problems:

First. LLMs currently produce sloppy code that works but reads poorly. The trusted base is where legibility matters most, as human auditors must manually read each line of code. Most of our time went into revising Claude’s definitions until they were uncluttered and legible.
Second, Claude struggled to invent the right abstractions for complicated structural concepts. For example, it could not formalize the notion of an "interactive protocol." We ended up designing those abstractions by hand and letting the model fill them in, rather than asking it to choose its own framing.
Third, the trusted base contained errors that were easy to miss on inspection. Quantifier order, in particular, consistently tripped Claude up. In a zero-knowledge claim, for instance, the model would silently swap a ∀∃ for an ∃∀ — easy to gloss over without careful reading. Claude also defined malicious provers that were too weak, with state in round k that didn't include all the information a real adversary would have by then.

In retrospect, we would recommend writing the trusted base by hand and using the LLM only for proofs.

Claude did not cut corners

As far as we could tell, Claude Opus was well-behaved throughout. We never caught it taking shortcuts that would compromise the verification. It did not quietly introduce axioms or tactic-level cheats. Even when a mistake in the trusted base weakened a theorem, Claude would internally prove the stronger version that the paper actually claimed even though the weaker one would have been easier.

Build the custom audit tooling early

Presentation mattered significantly more than we expected. We realized that LLMs let us personalize the code review process at negligible cost, making the experience as painless as possible. We found most of the issues described above only after building our audit tool, and recommend building something like this as early as possible.

Conclusion

Formal verification of VEIL took only 1-2x as long as writing this paper, surfaced a real bug, and dramatically improved our trust in the math behind VEIL. While this does not replace peer review, LLM-assisted formalization is an efficient and effective way to secure novel cryptographic protocols. We expect this kind of check to become standard practice as the models continue to improve.

The Lean formalization and full audit document are at veil.succinct.xyz. To learn more about this experiment, you can reach out to cryptography@succinct.xyz.