Entrants' System Descriptions

Drodi 4.1.0
iProver 3.9
LastButNotLeast 0
Prover9 1109a
Vampire 4.4
Vampire 4.9
Zipperposition 2.1.9999

Drodi 4.1.0

Oscar Contreras
Amateur Programmer, Spain

Architecture

Drodi 4.1.0 is a very basic and lightweight automated theorem prover. It implements the following main features:

Ordered resolution and equality paramodulation inferences as well as demodulation and some other standard simplifications.

A basic implementation of clausal normal form conversion as in [NW01].
AVATAR architecture with a SAT solver [Vor14].
Limited Resource Strategy [RV03].
Discrimination trees.
KBO, non recursive and lexicographic reduction orderings. KBO has been rewriten using the polynomial-time algorithm in [BS23].
Literal selection including lookahead as in [HR+16].
SInE distance for clauses and symbols as in [Sud19].
Layered clause selection as in [GS20].
Stochastic strategy inspired in [Sud22].
Drodi produces a (hopefully) verifiable proof in TPTP format.

Strategies

Drodi has a fair number of selectable strategies including but not limited to the following:

Otter, Discount and Limited Resource Strategy [RV03] saturation algorithms.
A basic implementation of AVATAR architecture [Vor14].
Several literal and term reduction orderings.
Several literal selection options [HR+16].
Several layered clause selection heuristics with adjustable selection ratios [GS20].
Classical clause relevancy pruning.
Drodi V4 has new strategy portfolio inspired (but not exactly equal than) by [BS23]. The strategies have now unequal time slices and the set of strategies used depends on the problem features detected during the preprocessing.
Some strategies are run a second time with a previously applied randomization to the problem [Sud22].
Drodi can generate learning data from successful proofs and use the data to guide clause selection strategy. It is based in the enhanced ENIGMA method. However, unlike ENIGMA, the learning data is completely general and can be used with any kind of problems. This generality allows the use of the same learning data in both FOF and UEQ CASC competition divisions. The learning data is generated over a set of TPTP problems before the CASC competition using built-in Drodi functions that include a L2 Support Vector Machine. Drodi integrated learning functions are a generalization of ENIGMA [JU17, JU18]. Literals polarity, equality, skolem and variable occurrences are stored in clause feature vectors. Unlike ENIGMA, instead of storing the specific functions and predicates themselves only the SinE distance and arity of functions and non equality predicates are stored in clause feature vectors with different features assigned to predicates and functions.

Implementation

Drodi is implemented in C. It includes discrimination trees and hashing indexing. All the code is original, without special code libraries or code taken from other sources.

Expected Competition Performance

Drodi 4.1.0 solves around 5% more problemas than last year version. Anyway we expect that performance will be similar to last year's version.

iProver 3.9

Konstantin Korovin
University of Manchester, United Kingdom

Architecture

iProver [Kor08, DK20] is a theorem prover for quantified first-order logic with theories. iProver interleaves instantiation calculus Inst-Gen [Kor13, Kor08, GK03] with ordered resolution and superposition calculi [DK20]. iProver approximates first-order clauses using propositional abstractions that are solved using MiniSAT [ES04] or Z3 [dMB08] and refined using model-guided instantiations. iProver also implements a general abstraction-refinement framework for under-and over-approximations of first-order clauses [HK18, HK19]. First-order clauses are exchanged between calculi during the proof search.

Recent features in iProver include:

Ground joinability and connectedness in the superposition calculus [DK22].
Support for quantified reasoning with arithmetic and arrays.
AC joinability and AC normalisation [DK21].
Superposition calculus with simplifications including: demodulation, light normalisation, subsumption, subsumption resolution and global subsumption. iProver's simplification set up [DK20] is tunable via command line options and generalises common architectures such as Discount or Otter.
HOS-ML framework for learning heuristics using combination of hyper-parameter optimisation and dynamic clustering together with schedule optimisation using constraint solving [HK21, HK19]

Strategies

iProver has around 100 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT/SMT solvers, simplifications, and options for combination of instantiation with resolution and superposition. For the competition HOS-ML [HK21] was used to build a multi-core schedule from heuristics learnt over a sample of FOF problems. Some theories and fragments are recognised such as EPR, UEQ, Horn, groups, rings and lattices for which options are adapted accordingly.

Implementation

iProver is implemented in OCaml. For the ground reasoning uses MiniSat [ES04] and Z3 [dMB08]. iProver accepts FOF, TFF and CNF formats. Vampire [KV13, RSV15] and E prover [Sch13] are used for proof-producing clausification of FOF/TFF problems. Vampire is also used for SInE axiom selection [HV11] in the LTB division and for theory axioms in the TFA division. iProver is available at:

    https://gitlab.com/korovin/iprover

Expected Competition Performance

iProver 3.9 is the CASC-J12 TFN winner.

LastButNotLeast 0

Julie Cailler
University of Lorraine, CNRS, Inria, LORIA, Nancy, France

Architecture

LastButNotLeast: The fastest way to give up - now with 0 bugs! Probably the fastest prover in the competition:

Runs flawlessly.
Achieves absolutely nothing.
100% success rate in failure.
Zero proof, infinite potential.
Proudly earns you a cool CASC T-shirt.

Unfortunately, designing this prover may force you to share a meal with very weird people - but hey, that's the price of true innovation.

Strategies

The strategy is simple: always return GaveUp. It's fast, reliable, and guarantees we never solve anything - by design.

Implementation

#!/usr/bin/env python

print("% LastButNotLeast: The fastest way to give up!")
print("% For best results, do not expect results.")
print("% SZS status GaveUp")
print("% It's not a bug - it's a philosophical stance.")
print("% Thanks for trying LastButNotLeast :)")

Expected Competition Performance

The prover is expected to come last, and any other result would be genuinely surprising - possibly even concerning. In fact, LastButNotLeast proudly exists to ensure that no other prover - no matter how underwhelming - has to carry the burden of being last.

Prover9 1109a

Bob Veroff on behalf of William McCune
University of New Mexico, USA

Architecture

Prover9, Version 2009-11A, is a resolution/paramodulation prover for first-order logic with equality. Its overall architecture is very similar to that of Otter-3.3 [McC03]. It uses the "given clause algorithm", in which not-yet-given clauses are available for rewriting and for other inference operations (sometimes called the "Otter loop").

Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.

Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, proof of transformation to clauses is not given.

Completeness is not guaranteed, so termination does not indicate satisfiability.

Strategies

Prover9 has available many strategies; the following statements apply to CASC.

Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.

For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.

For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.

Implementation

Prover9 is coded in C, and it uses the LADR libraries. Some of the code descended from EQP [McC97]. (LADR has some AC functions, but Prover9 does not use them). Term data structures are not shared (as they are in Otter). Term indexing is used extensively, with discrimination tree indexing for finding rewrite rules and subsuming units, FPA/Path indexing for finding subsumed units, rewritable terms, and resolvable literals. Feature vector indexing [Sch04] is used for forward and backward nonunit subsumption. Prover9 is available from

    http://www.cs.unm.edu/~mccune/prover9/

Expected Competition Performance

Prover9 is the CASC fixed point, against which progress can be judged. Each year it is expected do worse than the previous year, relative to the other systems.

Vampire 4.4

Giles Reger
University of Manchester, United Kingdom

There are no major changes to the main part of Vampire since 4.4, beyond some new proof search heuristics and new default values for some options. The biggest addition is support for higher-order reasoning via translation to applicative form and combinators, addition of axioms and extra inference rules, and a new form of combinatory unification.

Architecture

Vampire [KV13] 4.4 is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus and a MACE-style finite model builder [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. Both resolution and instantiation based proof search make use of global subsumption.

A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing. Vampire implements many useful preprocessing transformations including the SinE axiom selection algorithm.

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF. Vampire 4.4 provides a very large number of options for strategy selection. The most important ones are:

Choices of saturation algorithm:
- Limited Resource Strategy [RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
Splitting via AVATAR [Vor14]
A variety of optional simplifications.
Parameterized reduction orderings.
A number of built-in literal selection functions and different modes of comparing literals [HR+16].
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
Set-of-support strategy.
For theory-reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions.
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent splitting branches [RB+16].
- Specialised theory instantiation and unification [RSV18].
- Extensionality resolution with detection of extensionality axioms
For higher-order problems:
- Translation to applicative and combinator form.
- Addition of combinator axioms.
- Addition of shortcut inference rules that encode axioms.
- Proof search heuristics targetting the growth of combinator axioms.
- Restricted combinatory unification [RSV18].

Implementation

Vampire 4.4 is implemented in C++. It makes use of Minisat and Z3.

Expected Competition Performance

Vampire 4.4 is the CASC-27 EPR winner.

Vampire 4.9

Michael Rawson
TU Wien, Austria

There have been a number of improvements since Vampire 4.8, although it is still the same beast. For the first time this year, Vampire's schedules were constructed mostly using the Snake strategy selection tool, although a return of the traditional Spider is still possible in future. Improvements from the past year include:

A runtime-specialised version of unidirectional term ordering checks
Improvements to unification with abstraction
Surprising improvements to Vampire's basic routines such as renaming and unification
A simple interactive mode
Revitalisation of code trees
Experimental features not yet fully understood, mostly aimed at unit-equational reasoning.
Portability: Vampire is much more standards-compliant and portable than previously, with much-reduced dependence on platform-specific APIs and hardware architectures, aided by C++17

Vampire's higher-order support remains very similar to last year, although a re-implementation intended for mainline Vampire is already underway.

Architecture

Vampire [KV13] is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic. Vampire implements the calculi of ordered binary resolution, and superposition for handling equality. It also implements a MACE-style finite model builder for finding finite counter-examples [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing [RSV16]. Vampire implements many useful preprocessing transformations including the SInE axiom selection algorithm. When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

Vampire 4.9 provides a very large number of options for strategy selection. The most important ones are:

Choices of saturation algorithm:
- Limited Resource Strategy [RV03]
- DISCOUNT loop
- Otter loop
- MACE-style finite model building with sort inference
Splitting via AVATAR [Vor14]
A variety of optional simplifications.
Parameterized reduction orderings.
A number of built-in literal selection functions and different modes of comparing literals [HR+16].
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection. This has been extended with a layered clause selection approach [GS20].
Set-of-support strategy with extensions for theory reasoning.
For theory-reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions [RSV21].
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent splitting branches [RB+16].
- Specialised theory instantiation and unification [RSV18].
- Extensionality resolution with detection of extensionality axioms

Implementation

Vampire 4.9 is implemented in C++. It makes use of fixed versions of Minisat and Z3. See the GitHub repository and associated wiki for more information.

Expected Competition Performance

Vampire 4.9 is the CASC-J12 THF, TFA, FOF, UEQ, SLH, and ICU winner.

Zipperposition 2.1.9999

Jasmin Blanchette
Ludwig-Maximilians-Universität München, Germany

Architecture

Zipperposition is a superposition-based theorem prover for typed first-order logic with equality and for higher-order logic. It is a pragmatic implementation of a complete calculus for full higher-order logic [BB+21]. It features a number of extensions that include polymorphic types, user-defined rewriting on terms and formulas ("deduction modulo theories"), a lightweight variant of AVATAR for case splitting [EBT21], and Boolean reasoning [VN20]. The core architecture of the prover is based on saturation with an extensible set of rules for inferences and simplifications. Zipperposition uses a full higher-order unification algorithm that enables efficient integration of procedures for decidable fragments of higher-order unification [VBN20]. The initial calculus and main loop were imitations of an earlier version of E [Sch02]. With the implementation of higher-order superposition, the main loop had to be adapted to deal with possibly infinite sets of unifiers [VB+21].

Strategies

The system uses various strategies in a portfolio. The strategies are run in parallel, making use of all CPU cores available. We designed the portfolio of strategies by manual inspection of TPTP problems. Zipperposition's heuristics are inspired by efficient heuristics used in E. Various calculus extensions are used by the strategies [VB+21]. The portfolio mode distinguishes between first-order and higher-order problems. If the problem is first-order, all higher-order prover features are turned off. In particular, the prover uses standard first-order superposition calculus and disables collaboration with the backend prover (described below). Other than that, the portfolio is static and does not depend on the syntactic properties of the problem.

Implementation

The prover is implemented in OCaml. Term indexing is done using fingerprints for unification, perfect discrimination trees for rewriting, and feature vectors for subsumption. Some inference rules such as contextual literal cutting make heavy use of subsumption. For higher-order problems, some strategies use the E prover as an end-game backend prover.

Zipperposition's code can be found at

    https://github.com/sneeuwballen/zipperposition

and is entirely free software (BSD-licensed).

Zipperposition can also output graphic proofs using graphviz. Some tools to perform type inference and clausification for typed formulas are also provided, as well as a separate library for dealing with terms and formulas [Cru15].

Expected Competition Performance

The prover is expected to perform well on THF, about as well as last year's version. We expect to beat E.