# ADR-011: alternative SMT encoding using arrays

author | revision |
---|---|

Rodrigo Otoni | 1.8 |

This ADR describes an alternative encoding of the KerA+ fragment of TLA+ into SMT. Compound data structures, e.g. sets, are currently encoded using the core theory of SMT, with the goal being to encode them using arrays with extensionality instead. The hypothesis is that this will lead to increased solver performance and more compact SMT instances. We target the Z3 solver and will use the SMT-LIB Standard (Version 2.6) in conjunction with Z3-specific operators, e.g. constant arrays.

For an overview of the current encoding check the TLA+ Model Checking Made Symbolic paper, presented at OOPSLA'19. In the remainder of the document the use of the new encoding and the treatment of different TLA+ operators are described. For further details on the new encoding check the Symbolic Model Checking for TLA+ Made Faster paper, presented at TACAS'23.

## 1. CLI encoding option

The encoding using arrays is to be an alternative, not a replacement, to the already existing encoding.
Given this, a new option is to be added to the `check`

command of the CLI. The default encoding will be
the existing one. The option description is shown below. The envvar `SMT_ENCODING`

can also be used to
set the encoding, see the model checking parameters for details. In addition to the `arrays`

encoding,
which uses SMT arrays to encode TLA+ sets and functions, we also have the `funArrays`

encoding, which
restricts itself to encoding only TLA+ functions as SMT arrays.

```
--smt-encoding : the SMT encoding: oopsla19, arrays (experimental), funArrays (experimental), default: oopsla19 (overrides envvar SMT_ENCODING)
```

### Code changes

The following changes will be made to implement the new CLI option:

- Add new string variable to class
`CheckCmd`

to enable the new option. - Add new
`smtEncoding`

field to`SolverConfig`

. - Add new class
`SymbStateRewriterImplWithArrays`

, which extends class`SymbStateRewriterImpl`

. - Use the new option to set the
`SolverConfig`

encoding field and select between different`SymbStateRewriter`

implementations in classes`BoundedCheckerPassImpl`

and`SymbStateRewriterAuto`

. - The infrastructure changes made for the
`funArrays`

encoding mirror the ones made for the`arrays`

one. See PR 2027 for details.

## 2. Testing the new encoding

The new encoding should provide the same results as the existing one, the available test suite
will thus be used to test the new encoding. To achieve this, the unit tests needs to be made parametric
w.r.t. the `SolverConfig`

encoding field and the implementations of `SymbStateRewriter`

, and the
integration tests need to be tagged to run with the new encoding.

### Code changes

The following changes will be made to implement the tests for the new encoding:

- Refactor the classes in
`tla-bmcmt/src/test`

to enable unit testing with different configurations of`SolverConfig`

and implementations of`SymbStateRewriter`

. - Add unit tests for the new encoding, which should be similar to existing tests, but use a
different solver configuration and
`SymbStateRewriterImplWithArrays`

instead of`SymbStateRewriterImpl`

. - Add integration tests for the new encoding by tagging existing tests with
`array-encoding`

, which will be run by the CI with envvar`SMT_ENCODING`

set to`arrays`

.

## 3. Encoding sets

Sets are currently encoded in an indirect way. Consider a sort `some_sort`

and distinct elements `elem1`

,
`elem2`

, and `elem3`

of type `someSort`

, as defined below.

```
(declare-sort some_sort 0)
(declare-const elem1 some_sort)
(declare-const elem2 some_sort)
(declare-const elem3 some_sort)
(assert (distinct elem1 elem2 elem3))
```

A set `set1`

containing `elem1`

, `elem2`

, and `elem3`

is currently represented by a constant of type
`set_of_some_Sort`

and three membership predicates, as shown below.

```
(declare-sort set_of_some_Sort 0)
(declare-const set1 set_of_some_Sort)
(declare-const elem1_in_set1 Bool)
(declare-const elem2_in_set1 Bool)
(declare-const elem3_in_set1 Bool)
(assert elem1_in_set1)
(assert elem3_in_set1)
(assert elem2_in_set1)
```

The new encoding has each set encoded directly as an array whose domain and range equal the set's sort
and the Boolean sort, respectively. SMT arrays can be thought of as a functions, as this is exactly how
they are represented internally in Z3. Set membership of an element `elem`

is thus attained by simply
setting the array at index `elem`

to `true`

.

One important point in the new encoding is the handling of set declarations, since declaring an empty set requires the setting of all array indexes to false. This can be easily achieved for finite sets by explicitly setting each index, but falls outside the quantifier-free fragment of first-order logic in the case of infinite sets, e.g. the set of integers. To handle declarations of infinite sets we rely on Z3's constant arrays, which map all indexes to a fixed value. Below is an example using the new encoding.

```
(declare-const set2_0 (Array some_sort Bool))
(declare-const set2_1 (Array some_sort Bool))
(declare-const set2_2 (Array some_sort Bool))
(declare-const set2_3 (Array some_sort Bool))
(assert (= set2_0 ((as const (Array some_sort Bool)) false)))
(assert (= set2_1 (store set2_0 elem1 true)))
(assert (= set2_2 (store set2_1 elem2 true)))
(assert (= set2_3 (store set2_2 elem3 true)))
```

The `store`

operator handles array updates and receives the array to be updated, the index, and the new
value, returning the updated array. For array access, the `select`

operator can be used, which receives
an array and an index and returns the value at the given index, as shown below.

```
(assert (= (select set2_2 elem1) true)) ; SAT
(assert (= (select set2_2 elem2) true)) ; SAT
(assert (= (select set2_2 elem3) true)) ; UNSAT
(assert (= (select set2_3 elem1) true)) ; SAT
(assert (= (select set2_3 elem2) true)) ; SAT
(assert (= (select set2_3 elem3) true)) ; SAT
```

For consistency, the new encoding uses constant arrays to declare both finite and infinite arrays.

### Code changes

The following changes will be made to implement the new encoding of sets:

- Add alternative rewriting rules for sets when appropriate, by extending the existing rules.
- All alternative rules will be suffixed with
`WithArrays`

. - The new rules will not rely on
`LazyEquality`

and will aim to use SMT equality directly. - Only the generation of SMT constraints will be modified by the new rules, the other Arena elements will remain unchanged.

- All alternative rules will be suffixed with
- In class
`SymbStateRewriterImplWithArrays`

, add the new rules to`ruleLookupTable`

by overriding the entries to their older versions. - Add four new Apalache IR operators in
`ApalacheOper`

,`Builder`

,`ConstSimplifierForSmt`

, and`PreproSolverContext`

, to represent the array operations.- The
`selectInSet`

IR operator represents the SMT`select`

. - The
`storeInSet`

IR operator represents the SMT`store`

. - The
`unchangedSet`

IR operator represents an equality between the current and new SSA array representations. This is required to constraint the array representation as it evolves. It is important to note that this operator assumes that all arrays are initially empty, so an element not explicitly added is assumed to not be in the array. To check absence of an element,`selectInSet`

should be used with negation. - The
`smtMap`

IR operator represents the use of SMT map.

- The
- In class
`Z3SolverContext`

, add/change appropriate methods to handle SMT constraints over arrays.- The main changes will de done in
`declareCell`

and the new`mkSelect`

,`mkStore`

, and`mkUnchangedSet`

methods, as these methods are directly responsible for creating the SMT constraints representing sets and set membership. - With the new IR operators, the "in-relation" concept, which underpins
`declareInPredIfNeeded`

and`getInPred`

, will not be applied to the new encoding. Cases for the new IR operators will be added to`toExpr`

, which will default to`TlaSetOper.in`

and`TlaSetOper.notin`

for the existing encoding. - The
`smtMap`

IR operator will be used to encode the TLA+ set filter operation. It constructs a temporary array that contains the evaluation of the filter's predicate for each set element and uses SMT map to compute the intersection of the set being filtered and the set represented by the temporary array constructed. - Cases for
`FinSetT`

and`PowSetT`

will be added to`getOrMkCellSort`

, as these types are no longer represented by uninterpreted constants. `cellCache`

will be changed to contain a list of cells, in order to handle the effects of`push`

and`pop`

in the SSA assignment of sets. The following examples illustrates this need.`(assert (= set_0 ((as const (Array Int Bool)) false))) (assert (= set_1 (store set_0 5 true))) (push) (assert (= set_2 (store set_1 6 true))) (push) (assert (= set_3 (store set_2 7 true))) (assert (= (select set_3 7) true)) (pop 2) (assert (= (select set_1 7) false)) ; Without the list we would query set_3 here`

- The main changes will de done in

## 4. Encoding functions and sequences

Functions are currently encoded as sets of pairs, with each pair representing a mapping present in
the function. The first element of a pair is a tuple containing some function arguments and the second
element is the return value given by such arguments. The handling of functions is thus given by
operations over sets and tuples. Sequences of type `T`

are currently encoded as tuples of form
`⟨start,end,fun⟩`

, where `start`

and `end`

are integers and `fun`

is a function from integers to `T`

.
The new encoding of functions will thus encompass sequences, as their tuple representations is
intended to be kept.

The new encoding will, like the current one, also map tuples of arguments to return values, but
will do so natively instead of simply relying on sets. A function will be represented by two SMT
arrays. The first array will store the domain of the function and will be encoded as a standard
TLA+ set. The second array will store the mappings, having sort `<S1,...,Sn>`

as its domain, with
`Si`

being the sort of argument `i`

, and the sort of the function's codomain as its range.
The sorts of the array domain and range can be infinite, but the domain of the function itself,
and by implication the number of mappings tuples, will always be finite.

To encode the TLA+ function `finSucc = [x \in {1,2,3} |-> x + 1 ]`

, which computes the successors
of integers from `1`

to `3`

, we first have to declare its domain, as shown below; tuples are
represented here as per the OOPSLA'19 encoding.

```
(declare-sort Tuple_Int 0) ; Sort of <Int>
(declare-const tuple_with_1 Tuple_Int) ; <1>
(declare-const tuple_with_2 Tuple_Int) ; <2>
(declare-const tuple_with_3 Tuple_Int) ; <3>
(declare-const finSucc_domain_0 (Array Tuple_Int Bool))
(declare-const finSucc_domain_1 (Array Tuple_Int Bool))
(declare-const finSucc_domain_2 (Array Tuple_Int Bool))
(declare-const finSucc_domain_3 (Array Tuple_Int Bool))
(assert (= finSucc_domain_0 ((as const (Array Tuple_Int Bool)) false))) ; {}
(assert (= finSucc_domain_1 (store finSucc_domain_0 tuple_with_1 true))) ; {<1>}
(assert (= finSucc_domain_2 (store finSucc_domain_1 tuple_with_2 true))) ; {<1>,<2>}
(assert (= finSucc_domain_3 (store finSucc_domain_2 tuple_with_3 true))) ; {<1>,<2>,<3>}
```

The array storing the function's domain is used to guard the definition of the array storing the
function's mappings, since mappings should only be present for values in the domain. The array
storing the mappings of `finSucc`

is shown below.

```
(declare-const finSucc_0 (Array Tuple_Int Int))
(declare-const finSucc_1 (Array Tuple_Int Int))
(declare-const finSucc_2 (Array Tuple_Int Int))
(declare-const finSucc_3 (Array Tuple_Int Int))
(assert (ite (select finSucc_domain_3 tuple_with_1)
(= finSucc_1 (store finSucc_0 tuple_with_1 2))
(= finSucc_1 finSucc_0)))
(assert (ite (select finSucc_domain_3 tuple_with_2)
(= finSucc_2 (store finSucc_1 tuple_with_2 3))
(= finSucc_2 finSucc_1)))
(assert (ite (select finSucc_domain_3 tuple_with_3)
(= finSucc_3 (store finSucc_2 tuple_with_3 4))
(= finSucc_3 finSucc_2)))
```

Note that, unlike with the new encoding for sets, we do not use constant arrays. The reason is that the function's domain cannot be altered, so the array has to constrain only the values in said domain. Function application can be done by simply accessing the array at the index of the passed arguments. A function application with arguments outside the function's domain leads to an unspecified result in TLA+, which is perfectly captured by unconstrained entries in the SMT array. Below are some examples of function application.

```
(assert (= (select finSucc_3 tuple_with_1) 2)) ; SAT
(assert (= (select finSucc_3 tuple_with_2) 3)) ; SAT
(assert (= (select finSucc_3 tuple_with_3) 4)) ; SAT
(declare-const tuple_with_4 Tuple_Int) ; <4>
(assert (= (select finSucc_3 tuple_with_4) 16)) ; SAT
```

Although a function's domain cannot be altered, its image can be changed via the TLA+ function update operator. The update will be encoded as a guarded array update, as illustrated below; attempting to update an entry outside the function's domain will lead to no change happening.

```
(declare-const finSucc_4 (Array Tuple_Int Int))
(declare-const finSucc_5 (Array Tuple_Int Int))
(assert (ite (select finSucc_domain_3 tuple_with_1) ; [finSucc EXCEPT ![1] = 9]
(= finSucc_4 (store finSucc_3 tuple_with_1 9))
(= finSucc_4 finSucc_3)))
(assert (ite (select finSucc_domain_3 tuple_with_4) ; [finSucc EXCEPT ![4] = 25]
(= finSucc_5 (store finSucc_4 tuple_with_4 25))
(= finSucc_5 finSucc_4)))
(assert (= (select finSucc_5 tuple_with_1) 2)) ; UNSAT
(assert (= (select finSucc_5 tuple_with_1) 9)) ; SAT
(assert (= (select finSucc_5 tuple_with_4) 16)) ; SAT
```

In contrast to the current encoding, which produces a number of constraints that is linear in the size of the set approximating the function when encoding both function application and update, the new encoding will produce a single constraint for each operation. This will potentially lead to a significant increase in solving performance.

### Code changes

The following changes will be made to implement the new encoding of functions:

- Add alternative rewriting rules for functions when appropriate, by extending the existing rules. The
same caveats stated for the rewriting rules for sets will apply here.
- The sets of pairs used in the current encoding are the basis for the counter-example generation in
`SymbStateDecoder`

. In order to continue having counter-examples, these sets will keep being produced, but will not be present in the SMT constraints. They will be carried only as metadata in the`Arena`

.

- The sets of pairs used in the current encoding are the basis for the counter-example generation in
- Update class
`SymbStateRewriterImplWithArrays`

with the rules for functions. - Update the
`storeInSet`

IR operator to also store function updates. It will have the value resulting from the update as an optional argument.- Since functions will be encoded as SMT arrays, the
`selectInSet`

,`storeInSet`

, and`unchangedSet`

IR operators will be used when handling them. A future refactoring may rename these operators.

- Since functions will be encoded as SMT arrays, the
- Update class
`Z3SolverContext`

to handle the new SMT constraints over arrays.- A case for
`FunT`

will be added to`getOrMkCellSort`

. - In
`declareCell`

, functions will be declared as arrays, but will be left unconstrained. - The
`mkStore`

method will be updated to also handle functions. It will have an additional optional argument containing the value to be stored in the range of the array. The new argument's default value is`true`

, for the handling of sets. - The
`mkNestedSelect`

method is added to support set membership in function sets, i.e.,`f \in [S -> T]`

. The nesting has first`funAppRes = f[s \in S]`

, followed by`funAppRes \in T`

.

- A case for

## 5. Encoding the remaining TLA+ features

The use of SMT arrays will be restricted to TLA+ sets and functions for the moment. The encoding of additional features using SMT arrays, or potentially ADTs, will be left for the future.