## Folding Sets and Sequences

**Folds are an efficient replacement for recursive operators and functions.**

Apalache natively implements two operators users might be familiar with from the community modules
or functional programming.
Those operators are `ApaFoldSet`

and `ApaFoldSeqLeft`

.
This brief introduction to fold operators highlights the following:

- What are the semantics of fold operators?
- How do I use these operators in Apalache?
- Should I use folding or recursion?
- Examples of common operators defined with folds

### Syntax

The syntax of the fold operators is as follows:

```
\* @type: ( (a, b) => a, a, Set(b) ) => a;
ApaFoldSet( operator, base, set )
\* @type: ( (a, b) => a, a, Seq(b) ) => a;
ApaFoldSeqLeft( operator, base, seq )
```

### Semantics of fold operators

Folding refers to iterative application of a binary operator over a collection.
Given an operator `Op`

, a base value `b`

and a collection of values `C`

,
the definition of folding `Op`

over `C`

starting with `b`

depends on the type of the collection `C`

.

#### Semantics of `ApaFoldSeqLeft`

In the case of folding over sequences, `C`

is a sequence `<<a_1, ..., a_n>>`

. Then, `ApaFoldSeqLeft( Op, b, C )`

is defined as follows:

- If
`C`

is empty, then`ApaFoldSeqLeft( Op, b, <<>> ) = b`

, regardless of`Op`

- If
`C`

is nonempty, we establish a recursive relation between folding over`C`

and folding over`Tail(C)`

in the following way:`ApaFoldSeqLeft( Op, b, C ) = ApaFoldSeqLeft( Op, Op(b, Head(C)), Tail(C) )`

.

#### Semantics of `ApaFoldSet`

In the case of folding over sets, `C`

is a set `{a_1, ..., a_n}`

. Then, `ApaFoldSet( Op, b, C )`

is defined as follows:

- If
`C`

is empty, then`ApaFoldSet( Op, b, {} ) = b`

, regardless of`Op`

- If
`C`

is nonempty, we establish a recursive relation between folding over`C`

and folding over some subset of`C`

in the following way:`ApaFoldSet( Op, b, C ) = ApaFoldSet( Op, Op(b, x), C \ {x} )`

, where`x`

is some arbitrary member of`C`

(e.g.`x = CHOOSE y \in C: TRUE`

). Note that Apalache does not guarantee a deterministic choice of`x`

, unlike what using`CHOOSE`

would imply.

Note that the above are definitions of a *left fold* in the literature. Apalache does not implement a right fold.

For example, if `C`

is the sequence `<<x,y,z>>`

, the result is equal to `Op( Op( Op(b, x), y), z)`

.
If `C = {x,y}`

, the result is either `Op( Op(b, x), y)`

or `Op( Op(b, y), x)`

.
Because the order of elements selected from a set is not predefined, users should be careful,
as the result is only uniquely defined in the case that the operator is both associative
(`Op(Op(a,b),c) = Op(a,Op(b,c))`

) and commutative
(`Op(a,b) = Op(b,a)`

).

For example, consider the operator `Op(p,q) == 2 * p + q`

, which is non-commutative, and the set `S = {1,2,3}`

.
The value of `ApaFoldSet(Op, 0, S)`

depends on the order in which Apalache selects elements from S:

Order | ApaFoldSet value |
---|---|

1 -> 2 -> 3 | 11 |

1 -> 3 -> 2 | 12 |

2 -> 1 -> 3 | 13 |

2 -> 3 -> 1 | 15 |

3 -> 1 -> 2 | 16 |

3 -> 2 -> 1 | 17 |

Because Apalache does not guarantee deterministic choice in the order of iteration, users should treat all the above results as possible outcomes.

### Using fold operators in Apalache

As shown by the type signature, Apalache permits a very general form of folding,
where the types of the collection elements and the type of the base element/return-type of the operator do not have to match.
Again, we urge users to exercise caution when using `ApaFoldSet`

with an operator,
for which the types `a`

and `b`

are different,
as such operators cannot be commutative or associative,
and therefore the result is not guaranteed to be unique and predictable.

The other component of note is `operator`

, the *name* (not definition) of some binary operator, which is available in this context.
The following are examples of valid uses of folds:

```
PlusOne(p,q) == p + q + 1
X == ApaFoldSet( PlusOne, 0, {1,2,3} ) \* X = 9
X == LET Count(p,q) == p + 1 IN ApaFoldSeqLeft( Count, 0, <<1,2,3>> ) \* X = 3
```

while these next examples are considered invalid:

```
\* LAMBDAS in folds are not supported by Apalache
\* Define a LET-IN operator Plus(p,q) == p + q instead
X == ApaFoldSet( LAMBDA p,q: p + q, 0, {1,2,3} )
\* Built-in operators cannot be called by name in Apalache
\* Define a LET-IN operator Plus(p,q) == p + q instead
X == ApaFoldSet( + , 0, {1,2,3} )
```

Local LET definitions can also be used as closures:

```
A(x) == LET PlusX(p,q) == p + q + x IN ApaFoldSeqLeft( PlusX, 0, <<1,2,3>> )
X == A(1) \* X = 9
```

### Folding VS recursion

While TLA+ allows users to write arbitrary recursive operators, they are, in our experience, mostly used to implement collection traversals.
Consider the following implementations of a `Max`

operator, which returns the largest element of a sequence:

```
\* Max(<<>>) = -inf, but integers are unbounded in TLA+,
\* so there is no natural minimum like MIN_INT in programming languages
CONSTANT negInf
RECURSIVE MaxRec(_)
MaxRec(seq) == IF seq = <<>>
THEN negInf
ELSE LET tailMax == MaxRec(Tail(seq))
IN IF tailMax > Head(seq)
THEN tailMax
ELSE Head(seq)
MaxFold(seq) == LET Max(p,q) == IF p > q THEN p ELSE q
IN ApaFoldSeqLeft( Max, negInf, seq )
```

The first advantage of the fold implementation, we feel, is that it is much more clear and concise.
It also does not require a termination condition, unlike the recursive case.
One inherent problem of using recursive operators with a symbolic encoding, is the inability to estimate termination.
While it may be immediately obvious to a human, that `MaxRec`

terminates after no more than `Len(seq)`

steps,
automatic termination analysis is, in general, a rather complex and incomplete form of static analysis.
Apalache addresses this by finitely unrolling recursive operators and requires users to provide unroll limits (`UNROLL_LIMIT_MaxRec == ...`

),
which serve as a static upper bound to the number of recursive re-entries, because in general,
recursive operators may take an unpredictable number of steps
(e.g. computing the Collatz sequence) or never terminate at all.
Consider a minor adaptation of the above example, where the author made a mistake in implementing the operator:

```
RECURSIVE MaxRec(_)
MaxRec(seq) == IF seq = <<>>
THEN negInf
ELSE LET tailMax == MaxRec( seq ) \* forgot Tail!
IN IF tailMax > Head(seq)
THEN tailMax
ELSE Head(seq)
```

Now, `MaxRec`

never terminates, but spotting this error might not be trivial at a glance.
This is where we believe folds hold the second advantage:
`ApaFoldSet`

and `ApaFoldSeqLeft`

*always terminate* in `Cardinality(set)`

or `Len(seq)`

steps,
and each step is simple to describe, as it consists of a single operator application.

In fact, the vast majority of the traditionally recursive operators can be equivalently rewritten as folds, for example:

```
RECURSIVE Cardinality(_)
Cardinality(set) == IF set = {}
THEN 0
ELSE LET x == CHOOSE y \in set: TRUE
IN 1 + Cardinality( set \ {x} )
CardinalityFold(set) == LET Count(p,q) == p + 1 \* the value of q, the set element, is irrelevant
IN ApaFoldSet( Count, 0, set )
```

Notice that, in the case of sets, picking an arbitrary element `x`

,
to remove from the set at each step, utilizes the `CHOOSE`

operator.
This is a common trait shared by many operators that implement recursion over sets.
Since the introduction of folds, the use of `CHOOSE`

in Apalache is heavily discouraged as it is both inefficient,
and nondeterministic (unlike how `CHOOSE`

is defined in TLA+ literature).
For details, see the discussion in issue 841.

So the third advantage of using folds is the ability to, almost always, avoid using the `CHOOSE`

operator.

The downside of folding, compared to general recursion, is the inability to express non-primitively recursive functions. For instance, one cannot define the Ackermann function, as a fold. We find that in most specifications, this is not something the users would want to implement anyway, so in practice, we believe it is almost always better to use fold over recursive functions.

### Folding VS quantification and `CHOOSE`

Often, folding can be used to select a value from a collection,
which could alternatively be described by a predicate and selected with `CHOOSE`

.
Let us revisit the `MaxFold`

example:

```
MaxFold(seq) == LET Max(p,q) == IF p > q THEN p ELSE q
IN ApaFoldSeqLeft( Max, negInf, seq )
```

The fold-less case could, instead of using recursion, compute the maximum as follows:

```
MaxChoose(seq) ==
LET Range == {seq[i] : i \in DOMAIN seq}
IN CHOOSE m \in Range : \A n \in Range : m >= n
```

The predicate-based approach might result in a more compact specification,
but that is because specifications have no notion of execution or complexity.
Automatic verification tools, such as Apalache,
the job of which includes finding witnesses to the predicates, can work much faster with the fold approach.
The reason is that *evaluating* `CHOOSE x \in S : \A y \in S: P(x,y)`

is quadratic in the size of `S`

(in a symbolic approach this is w.r.t. the number of constraints).
For each candidate `x`

, the entire set `S`

must be tested for `P(x,_)`

.
On the other hand, the fold approach is linear in the size of `S`

, since each element is visited exactly once.

In addition, the fold approach admits no undefined behavior.
If, in the above example, `seq`

was an empty sequence,
the value of the computed maximum depends on the value of `CHOOSE x \in {}: TRUE`

, which is undefined in TLA+,
while the fold-based approach allows the user to determine behavior in that scenario (via the initial value).

Our general advice is to use folds over `CHOOSE`

with quantified predicates wherever possible,
if you're willing accept a very minor increase in specification size in exchange for a decrease in Apalache execution time,
or, if you wish to avoid `CHOOSE`

over empty sets resulting in undefined behavior.

### Examples: The versatility of folds

Here we give some examples of common operators, implemented using folds:

```
----- MODULE FoldDefined -----
EXTENDS Apalache
\* Sum of all values of a set of integers
Sum(set) == LET Plus(p,q) == p + q IN ApaFoldSet( Plus, 0, set )
\* Re-implementation of UNION setOfSets
BigUnion(setOfSets) == LET Union(p,q) == p \union q IN ApaFoldSet( Union, {}, setOfSets )
\* Re-implementation of SelectSeq
SelectSeq(seq, Test(_)) == LET CondAppend(s,e) == IF Test(e) THEN Append(s, e) ELSE s
IN ApaFoldSeqLeft( CondAppend, <<>>, seq )
\* Quantify the elements in S matching the predicate P
Quantify(S, P(_)) == LET CondCount(p,q) == p + IF P(q) THEN 1 ELSE 0
IN ApaFoldSet( CondCount, 0, S )
\* The set of all values in seq
Range(seq) == LET AddToSet(S, e) == S \union {e}
IN LET Range == ApaFoldSeqLeft( AddToSet, {}, seq )
\* Finds the the value that appears most often in a sequence. Returns elIfEmpty for empty sequences
Mode(seq, elIfEmpty) == LET ExtRange == Range(seq) \union {elIfEmpty}
IN LET CountElem(countersAndCurrentMode, e) ==
LET counters == countersAndCurrentMode[1]
currentMode == countersAndCurrentMode[2]
IN LET newCounters == [ counters EXCEPT ![e] == counters[e] + 1 ]
IN IF newCounters[e] > newCounters[currentMode]
THEN << newCounters, e >>
ELSE << newCounters, currentMode >>
IN ApaFoldSeqLeft( CountElem, <<[ x \in ExtRange |-> 0 ], elIfEmpty >>, seq )[2]
\* Returns TRUE iff fn is injective
IsInjective(fn) ==
LET SeenBefore( seenAndResult, e ) ==
IF fn[e] \in seenAndResult[1]
THEN [ seenAndResult EXCEPT ![2] = FALSE ]
ELSE [ seenAndResult EXCEPT ![1] = seenAndResult[1] \union {fn[e]} ]
IN ApaFoldSet( SeenBefore, << {}, TRUE >>, DOMAIN fn )[2]
================================
```

For the sake of comparison, we rewrite the above operators using recursion, `CHOOSE`

or quantification:

```
----- MODULE NonFoldDefined -----
EXTENDS Apalache
RECURSIVE Sum(_)
Sum(S) == IF S = {}
THEN 0
ELSE LET x == CHOOSE y \in S : TRUE
IN x + Sum(S \ {x})
RECURSIVE BigUnion(_)
BigUnion(setOfSets) == IF setOfSets = {}
THEN {}
ELSE LET S == CHOOSE x \in setOfSets : TRUE
IN S \union BigUnion(setOfSets \ {x})
RECURSIVE SelectSeq(_,_)
SelectSeq(seq, Test(_)) == IF seq = <<>>
THEN <<>>
ELSE LET tail == SelectSeq(Tail(seq), Test)
IN IF Test( Head(seq) )
THEN <<Head(seq)>> \o tail
ELSE tail
RECURSIVE Quantify(_,_)
Quantify(S, P(_)) == IF S = {}
THEN 0
ELSE LET x == CHOOSE y \in S : TRUE
IN (IF P(x) THEN 1 ELSE 0) + Quantify(S \ {x}, P)
RECURSIVE Range(_)
Range(seq) == IF seq = <<>>
THEN {}
ELSE {Head(seq)} \union Range(Tail(seq))
Mode(seq, elIfEmpty) == IF seq = <<>>
THEN elIfEmpty
ELSE LET numOf(p) == Quantify( DOMAIN seq, LAMBDA q: q = p )
IN CHOOSE x \in Range(seq): \A y \in Range(seq) : numOf(x) >= numOf(y)
IsInjective(fn) == \A a,b \in DOMAIN fn : fn[a] = fn[b] => a = b
================================
```

In most cases, recursive operators are much more verbose,
and the operators using `CHOOSE`

and/or quantification mask double iteration (and thus have quadratic complexity).
For instance, the evaluation of the fold-less `IsInjective`

operator actually requires the traversal of all domain pairs,
instead of the single domain traversal with fold.
In particular, `Mode`

, the most verbose among the fold-defined operators,
is still very readable (most LET-IN operators are introduced to improve readability, at the cost of verbosity) and quite efficient,
as its complexity is linear w.r.t. the length of the sequence
(the mode could also be computed directly, without a sub-call to `Range`

, but the example would be more difficult to read),
unlike the variant with `CHOOSE`

and `\A`

, which is quadratic.