ProofCoop: Collaborative Automated Formal Verification"/> ProofCoop: Collaborative Automated Formal Verification"/>
Formal verification using proof assistants, such as Coq or Lean, is an effective way of ensuring software correctness. Recent research has shown that machine learning can automate proof synthesis, but such approaches are only successful a fraction of the time. Methods for improving such proof synthesis include training more precise models, improving the model-driven synthesis search mechanisms, and effectively combining multiple models into synthesis search. This paper focuses on the latter and develops ProofCoop, for the first time demonstrating collaborative cooperation between models. Diva, the state-of-the-art method for combining models, uses a disparate search for each model. On the CoqGym benchmark of 68.5K theorems from 124 open-source Coq projects a Diva combination of five models automatically proves 3,287 (27.5%) of CoqGym's 11.9K test set theorems, while the best individual model proves only 2,604 (21.8%). By contrast, our method, ProofCoop can use the same five models to fully automatically prove 3,932 (33.0%) theorems, meaning that ProofCoop is 19.6% more likely to prove a theorem, on average, than the prior state of the art combination method. ProofCoop allows its models to build on each other's contributions and demonstrates that such collaboration significantly increases synthesis success. ProofCoop enables six different types of collaboration: joint model next-step prediction at each search step; preferential next-step prediction via voting, bidding, and stacking; sharing lemmas proven across models mid-search, and models completing each other's partial proofs. Together with CoqHammer, ProofCoop synthesizes proofs for 36.0% of the theorems. Our research demonstrates that creative uses of learned models can lead to collaborative synthesis that is more effective than prior approaches, suggesting a powerful new research direction in automated formal verification.
@inproceedings{Kaufman26icse,
author = {Zhanna Kaufman and Emily First and Alex Sanchez-Stern and
Kyle Thompson and Sorin Lerner and Yuriy Brun},
title =
{ProofCoop: Collaborative Automated Formal Verification},
booktitle = {Proceedings of the 48th International Conference on Software Engineering (ICSE)},
venue = {ICSE},
address = {Rio de Janeiro, Brazil},
month = {April},
date = {15--17},
year = {2026},
pages = {1339--1352},
doi = {10.1145/3744916.3787783},
note = {ACM artifact badges granted:
Artifact Available,
Artifact Reusable.
DOI:
10.1145/3744916.3787783},
accept = {$\frac{160}{809} \approx 20\%$ (2nd cycle)},
abstract = {Formal verification using proof assistants, such as Coq or
Lean, is an effective way of ensuring software correctness. Recent research
has shown that machine learning can automate proof synthesis, but such
approaches are only successful a fraction of the time. Methods for
improving such proof synthesis include training more precise models,
improving the model-driven synthesis search mechanisms, and effectively
combining multiple models into synthesis search. This paper focuses on the
latter and develops ProofCoop, for the first time demonstrating
collaborative cooperation between models. Diva, the state-of-the-art method
for combining models, uses a disparate search for each model. On the CoqGym
benchmark of 68.5K theorems from 124 open-source Coq projects a Diva
combination of five models automatically proves 3,287 (27.5%) of CoqGym's
11.9K test set theorems, while the best individual model proves only 2,604
(21.8%). By contrast, our method, ProofCoop can use the same five models to
fully automatically prove 3,932 (33.0%) theorems, meaning that ProofCoop is
19.6% more likely to prove a theorem, on average, than the prior state of
the art combination method. ProofCoop allows its models to build on each
other's contributions and demonstrates that such collaboration
significantly increases synthesis success. ProofCoop enables six different
types of collaboration: joint model next-step prediction at each search
step; preferential next-step prediction via voting, bidding, and stacking;
sharing lemmas proven across models mid-search, and models completing each
other's partial proofs. Together with CoqHammer, ProofCoop synthesizes
proofs for 36.0% of the theorems. Our research demonstrates that creative
uses of learned models can lead to collaborative synthesis that is more
effective than prior approaches, suggesting a powerful new research
direction in automated formal verification.},
fundedBy = {NSF CCF-2210243,
Defense Advanced Research Projects Agencies (DARPA) under Contract No. HR0011-24-2-0307},
}