Entrants' System Descriptions

CASC-J13

LEO-II 1.7.0
Prover9 1109a
Vampire 4.8
Vampire 5.0
Zipperposition 2.1.9999

CASC-J13

LEO-II 1.7.0

Alexander Steen
University of Greifswald, Germany

Architecture

LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system based on extensional higher-order resolution. More precisely, LEO-II employs a refinement of extensional higher-order RUE resolution [Ben99]. LEO-II is designed to cooperate with specialist systems for fragments of higher-order logic. By default, LEO-II cooperates with the first-order ATP system E [Sch02]. LEO-II is often too weak to find a refutation amongst the steadily growing set of clauses on its own. However, some of the clauses in LEO-II's search space attain a special status: they are first-order clauses modulo the application of an appropriate transformation function. Therefore, LEO-II launches a cooperating first-order ATP system every n iterations of its (standard) resolution proof search loop (e.g., 10). If the first-order ATP system finds a refutation, it communicates its success to LEO-II in the standard SZS format. Communication between LEO-II and the cooperating first-order ATP system uses the TPTP language and standards.

Strategies

LEO-II employs an adapted "Otter loop". Moreover, LEO-II uses some basic strategy scheduling to try different search strategies or flag settings. These search strategies also include some different relevance filters.

Implementation

LEO-II is implemented in OCaml 4, and its problem representation language is the TPTP THF language [BRS08]. In fact, the development of LEO-II has largely paralleled the development of the TPTP THF language and related infrastructure [SB10]. LEO-II's parser supports the TPTP THF0 language and also the TPTP languages FOF and CNF.

Unfortunately the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.

The LEO-II system is distributed under a BSD style license, and it is available from

    http://www.leoprover.org

Expected Competition Performance

LEO-II is not actively being developed anymore, hence there are no expected improvements to last year's CASC results.

Prover9 1109a

Bob Veroff on behalf of William McCune
University of New Mexico, USA

Architecture

Prover9, Version 2009-11A, is a resolution/paramodulation prover for first-order logic with equality. Its overall architecture is very similar to that of Otter-3.3 [McC03]. It uses the "given clause algorithm", in which not-yet-given clauses are available for rewriting and for other inference operations (sometimes called the "Otter loop").

Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.

Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, proof of transformation to clauses is not given.

Completeness is not guaranteed, so termination does not indicate satisfiability.

Strategies

Prover9 has available many strategies; the following statements apply to CASC.

Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.

For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.

For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.

Implementation

Prover9 is coded in C, and it uses the LADR libraries. Some of the code descended from EQP [McC97]. (LADR has some AC functions, but Prover9 does not use them). Term data structures are not shared (as they are in Otter). Term indexing is used extensively, with discrimination tree indexing for finding rewrite rules and subsuming units, FPA/Path indexing for finding subsumed units, rewritable terms, and resolvable literals. Feature vector indexing [Sch04] is used for forward and backward nonunit subsumption. Prover9 is available from

    http://www.cs.unm.edu/~mccune/prover9/

Expected Competition Performance

Prover9 is the CASC fixed point, against which progress can be judged. Each year it is expected do worse than the previous year, relative to the other systems.

Vampire 4.8

Michael Rawson
TU Wien, Austria

There have been a number of changes and improvements since Vampire 4.7, although it is still the same beast. Most significant from a competition point of view are long-awaited refreshed strategy schedules. As a result, several features present in previous competitions will now come into full force, including new rules for the evaluation and simplification of theory literals. A large number of completely new features and improvements also landed this year: highlights include a significant refactoring of the substitution tree implementation, the arrival of encompassment demodulation to Vampire, and support for parametric datatypes.

Vampire's higher-order support has also been re-implemented from the ground up. The new implementation is still at an early stage and its theoretical underpinnings are being developed. There is currently no documentation of either.

Architecture

Vampire [KV13] is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic. Vampire implements the calculi of ordered binary resolution, and superposition for handling equality. It also implements the Inst-gen calculus and a MACE-style finite model builder [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing [RSV16]. Vampire implements many useful preprocessing transformations including the SinE axiom selection algorithm. When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

Vampire 4.8 provides a very large number of options for strategy selection. The most important ones are:

Choices of saturation algorithm:
- Limited Resource Strategy [RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
Splitting via AVATAR [Vor14]
A variety of optional simplifications.
Parameterized reduction orderings.
A number of built-in literal selection functions and different modes of comparing literals [HR+16].
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection. This has been extended with a layered clause selection approach [GS20].
Set-of-support strategy with extensions for theory reasoning.
For theory-reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions [RSV21].
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent splitting branches [RB+16].
- Specialised theory instantiation and unification [RSV18].
- Extensionality resolution with detection of extensionality axioms

The schedule for the new HOL implementation was developed using Snake, a strategy schedule construction tool described in more detail last year. The Snake schedule will this year fully embrace Vampire randomisation support [Sud22] and, in particular, every strategy will independently shuffle the input problem, to nullify (in expectation) the effect of problem scrambling done by the organisers.

Implementation

Vampire 4.8 is implemented in C++. It makes use of fixed versions of Minisat and Z3. See the website for more information and access to the GitHub repository.

Expected Competition Performance

Vampire 4.8 is the CASC-29 FNT winner.

Vampire 5.0

Michael Rawson
University of Southampton, United Kongdom

Vampire 5.0 remains similar in spirit to all previous versions, but a bumper crop of changes have been merged this competition cycle. Various non-competition improvements to Vampire including a program synthesis mode [HA+24] and partial support for the polymorphic SMT-LIB 2.7 standard landed, but for the competition we mention:

ALASCA [KK+23] for reasoning with linear arithmetic, with further VIRAS extensions [SKK24] for quantifier elimination.
Partial redundancy calculi [HKV25]
Stabilised and greatly enhanced runtime-specialised unidirectional term ordering checks [HC+25]
A variant of the ground joinability redundancy elimination rule, used in forward simplification.
Subsumption (resolution) via code trees.
Integration of the CaDiCaL SAT solver [BF+24] alongside Minisat.
More detailed output, including proofs that are (more) TSTP-compliant, reporting non-trivial preprocessing in saturations, and producing completely faithful finite models of the input.
Portability: Vampire is much more standards-compliant and portable than previously, with much-reduced dependence on platform-specific APIs and hardware architectures, aided by C++17.

Vampire's higher-order support remains very similar to last year, although a re-implementation intended for mainline Vampire is being merged in stages.

Architecture

Vampire [BB+25] is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic. Vampire implements several extensions of a core superposition calculus. It also implements a MACE-style finite model builder for finding finite counter-examples [RSV16]. Splitting in saturation-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. Substitution tree and code tree indices are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form: problems not already in CNF are clausified during preprocessing [RSV16]. Vampire implements many preprocessing transformations, including the SInE axiom selection algorithm for large theories and blocked clause elimination.

Strategies

Vampire 5.0 provides a very large number of options for strategy selection. The most important ones are:

Choices of saturation algorithm:
- Limited Resource Strategy [RV03]
- DISCOUNT loop
- Otter loop
- MACE-style finite model building with sort inference
Splitting via AVATAR [Vor14]
A variety of optional simplifications.
Parameterized reduction orderings KBO and LPO.
A number of built-in literal selection functions and different modes of comparing literals [HR+16].
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection. This has been extended with a layered clause selection approach [GS20].
The set-of-support strategy with extensions for theory reasoning.
For theory reasoning:
- Specialised calculi such as ALASCA.
- Addition of theory axioms and evaluation of interpreted functions [RSV21].
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent splitting branches [RB+16].
- Specialised theory instantiation and unification [RSV18].
- Extensionality resolution with detection of extensionality axioms

Implementation

Vampire 5.0 is implemented in C++. It makes use of fixed versions of Minisat, CaDiCaL, GMP, VIRAS, and Z3. See the GitHub repository and associated wiki for more information.

Expected Competition Performance

Vampire 5.0 is the CASC-30 THF, FOF, and UEQ winner.

Zipperposition 2.1.9999

Jasmin Blanchette
Ludwig-Maximilians-Universität München, Germany

Architecture

Zipperposition is a superposition-based theorem prover for typed first-order logic with equality and for higher-order logic. It is a pragmatic implementation of a complete calculus for full higher-order logic [BB+21]. It features a number of extensions that include polymorphic types, user-defined rewriting on terms and formulas ("deduction modulo theories"), a lightweight variant of AVATAR for case splitting [EBT21], and Boolean reasoning [VN20]. The core architecture of the prover is based on saturation with an extensible set of rules for inferences and simplifications. Zipperposition uses a full higher-order unification algorithm that enables efficient integration of procedures for decidable fragments of higher-order unification [VBN20]. The initial calculus and main loop were imitations of an earlier version of E [Sch02]. With the implementation of higher-order superposition, the main loop had to be adapted to deal with possibly infinite sets of unifiers [VB+21].

Strategies

The system uses various strategies in a portfolio. The strategies are run in parallel, making use of all CPU cores available. We designed the portfolio of strategies by manual inspection of TPTP problems. Zipperposition's heuristics are inspired by efficient heuristics used in E. Various calculus extensions are used by the strategies [VB+21]. The portfolio mode distinguishes between first-order and higher-order problems. If the problem is first-order, all higher-order prover features are turned off. In particular, the prover uses standard first-order superposition calculus and disables collaboration with the backend prover (described below). Other than that, the portfolio is static and does not depend on the syntactic properties of the problem.

Implementation

The prover is implemented in OCaml. Term indexing is done using fingerprints for unification, perfect discrimination trees for rewriting, and feature vectors for subsumption. Some inference rules such as contextual literal cutting make heavy use of subsumption. For higher-order problems, some strategies use the E prover as an end-game backend prover.

Zipperposition's code can be found at

    https://github.com/sneeuwballen/zipperposition

and is entirely free software (BSD-licensed).

Zipperposition can also output graphic proofs using graphviz. Some tools to perform type inference and clausification for typed formulas are also provided, as well as a separate library for dealing with terms and formulas [Cru15].

Expected Competition Performance

The prover is expected to perform well on THF, about as well as last year's version. We expect to beat E.

ProoVer 2026

GDV 2.0

Geoff Sutcliffe
University of Miami, USA

Overview

GDV 2.0 [Sut06] is a verifier for derivations in classical first-order and typed first-order logic, written in the TPTP format. GDV checks a derivation in four verification phases: structural verification, leaf verification, rule-specific verification, and inference verification.

Structural verification deals with non-logical aspects of a proof, including checking the syntax, that formulae are uniquely named, the derivation is acyclic, refutations have false roots, etc.
Leaf verification ensures that the leaves of the derivation match formulae in the original input problem, and that introduced formulae such as definitions meet requirements.
Rule-specific verification deals with special cases, e.g., splitting as implemented in SPASS [Wei01]. Special techniques are available for verifying correctly documented Skolemization steps; see Section 2.3 of [SBB25].
Inference verification uses external trusted ATP systems to verify each inference step, based on the SZS status [Sut08] of the inference record. For example, inference steps often have the SZS status thm indicating that the inferred formula is (supposed to be) a theorem of the parent formulae. In this case a proof obligation with the parent formulae as the axioms and in the inferred formula as the conjecture is created, and discharged (or not) using a trusted theorem proving ATP system. Other SZS status values are treated with variants of that process.

Implementation

GDV is implemented in C. It is available from:

    https://github.com/TPTPWorld/GDV

GDV relies heavily on the JJParser library, which has to be downloaded separately into the same directory as GDV:

    https://github.com/TPTPWorld/JJParser

The external ATP systems run remotely, through the SystemOnTPTP service [Sut00].

Expected Competition Performance

The short time limit of 30s per proof, and the relatively slow process of running the remote ATP systems, means that GDV is likely to timeout often. GDV is sound, slow, but very sure of itself.

GDV-LP 2.0

Frédéric Blanqui
ENS Paris-Saclay, INRIA, France

Overview

GDV-LP 2.0 [SBB25] is a verifier for derivations in classical first-order and typed first-order logic, written in the TPTP format. GDV-LP checks a derivation in two steps:

Standard GDV (as described above) is run, using ZenonModulo [DD+13] as the trusted ATP system for discharging proof obligations. ZenonModulo is configured to output a LambdaPi term [HB20] for each discharge proof.
GDV-LP produces the necessary files that declare the formulae, the signatures of the symbols in the terms, a LambdaPi term for the root of the proof, and a lambdapi.pkg package file. ZenonModulo's LambdaPi terms are chained together from the root term, and passed to the lambdapi checker to be checked. The key strength of this added layer is that it is not necessary to trust the "trusted ATP system", here ZenonModulo. Additionally, tools other than lambdapi can be used to check the LambdaPi terms, e.g., dkcheck [Sai15] and kontroli [Far22].

Implementation

GDV-LP is implemented in C. It is available from:

    https://github.com/TPTPWorld/GDV

GDV relies heavily on the JJParser library, which has to be downloaded separately into the same directory as GDV:

    https://github.com/TPTPWorld/JJParser

ZenonModulo is implemented in OCaml. It is available from:

    https://github.com/Deducteam/zenon_modulo

lambdapi is written in OCaml. It is available from:

    https://github.com/Deducteam/lambdapi.git

ZenonModulo, other external ATP systems, and lambdapi run remotely, through the SystemOnTPTP service [Sut00].

Expected Competition Performance

The short time limit of 30s per proof, and the relatively slow process of running the remote ATP systems, means that GDV-LP is likely to timeout often, probably in the first non-LambdaPi step. GDV-LP is sound, slow, but very very sure of itself.