Last week, the programmer Hillel posted a challenge (the link points to a partial postmortem of the provided solutions) on Twitter for someone to prove a correct implementation of three small programming problems: Leftpad, Unique, and Fulcrum.
This was a good opportunity to compare the SPARK language and its expressiveness and proof power to other systems and paradigms, so I took on the challenge. The good news is that I was able to prove all three solutions and that the SPARK proofs of each complete in no more than 10 seconds. I also believe the Fulcrum solution in particular shows some aspects of SPARK that are especially nice.
I will now explain my solutions to each problem, briefly for Leftpad and Unique and in detail for Fulcrum. At the end, I discuss my takeaways from this challenge.
The specification of Leftpad, according to Hillel, is as follows:
Takes a padding character, a string, and a total length, returns the string padded to that length with that character. If length is less than the length of the string, does nothing.
It is always helpful to start with translating the specification to a SPARK contract. To distinguish between the two cases (padding required or not), we use contract cases, and arrive at this specification (see the full code in this github repository):
function Left_Pad (S : String; Pad_Char : Character; Len : Natural) return String with Contract_Cases => (Len > S'Length => Left_Pad'Result'Length = Len and then (for all I in Left_Pad'Result'Range => Left_Pad'Result (I) = (if I <= Len - S'Length then Pad_Char else S (I - (Len - S'Length + 1) + S'First))), others => Left_Pad'Result = S);
In the case where padding is required, the spec also nicely shows how the result string is composed of both padding chars and chars from the input string.
The implementation in SPARK is of course very simple; we can even use an expression function to do it:
function Left_Pad (S : String; Pad_Char : Character; Len : Natural) return String is ((1 .. Len - S'Length => Pad_Char) & S);
The problem description, as defined by Hillel:
Takes a sequence of integers, returns the unique elements of that list. There is no requirement on the ordering of the returned values.
An explanation in this blog wouldn’t add anything to the commented code, so I suggest you check out the code for my solution directly.
The Fulcrum problem was the heart of the challenge. Although the implementation is also just a few lines, quite a lot of specification work is required to get it all proved.
The problem description, as defined by Hillel:
Given a sequence of integers, returns the index i that minimizes |sum(seq[..i]) - sum(seq[i..])|. Does this in O(n) time and O(n) memory.
(Side note: It took me quite a while to notice it, but the above notation seq[..i] apparently means the slice up to and excluding the value at index i. I have taken it instead to mean the slice up to and including the value at index i. Consequently, I used seq[i+1..] for the second slice. This doesn't change the nature or difficulty of the problem.)
I’m pleased with the solution I arrived at, so I will present it in detail in this blog post. It has two features that I think none of the other solutions has:
It runs in O(1) space, compared to the O(n) permitted by the problem description and required by the other solutions;
It uses bounded integers and proves absence of overflow; all other solutions use unbounded integers to side-step this issue.
Again, our first step is to transform the problem statement into a subprogram specification (please refer to the full solution on github for all the details: there are many comments in the code). For this program, we use arrays to represent sequences of integers, so we first need an array type and a function that can sum all integers in an array. My first try looked like this:
type Seq is array (Integer range <>) of Integer; function Sum (S : Seq) return Integer is (if S’Length = 0 then 0 else S (S’First) + Sum (S (S’First + 1 .. S’Last)));
So we are just traversing the array via recursive calls, summing up the cells as we go. Assuming an array S of type Seq and an array index I, we could now write the required difference between the left and right sums as follows (assuming S is non-empty, which we assume throughout):
abs (Sum (S (S’First .. I)) - Sum (S (I + 1 .. S’Last)))
However, there are two problems with this code, and both are specific to how SPARK works. The first problem could be seen as a limitation of SPARK: SPARK allows recursive functions such as Sum, but can’t use them to prove code and specifications that refer to them. This would result in a lot of unproved checks in our code. But my colleague Claire showed me a really nice workaround, which I now present .
The idea is to not compute a single sum value, but instead an array of partial sums, where we later can select the partial sum we want. Because such an array can’t be computed in a single expression, we define a new function Sum_Acc as a regular function (not an expression function) with a postcondition:
function Sum_Acc (S : Seq) return Seq with Post => (Sum_Acc'Result'Length = S'Length and then Sum_Acc'Result'First = S'First and then Sum_Acc'Result (S'First) = S (S'First) and then (for all I in S'First + 1 .. S'Last => Sum_Acc'Result (I) = Sum_Acc'Result (I - 1) + S (I)));
The idea is that each cell of the result array contains the sum of the input array cells, up to and including the corresponding cell in the input array. For an input array (1,2,3), the function would compute the partial sums (1,3,6). The last value is always the sum of the entire array. In the postcondition, we express that the result array has the same length and bounds as the input array, that the first cell of the result array is always equal to the first cell of the input array, and that each following cell is the sum of the previous cell of the result array, plus the input cell at the same index.
The implementation of this Sum_Acc function is straightforward and can be found in the code on github. We also need a Sum_Acc_Rev function which computes the partial sum starting from the end of the array. It is almost the same as Sum_Acc, but the initial (in fact last) value of the array will be zero, owing to the asymmetric definition of the two sums in the initial problem description. You can also find its specification and implementation in the github repository.
For our array S and index I, the expression to compute the difference between left and right sum now becomes:
abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I))
The second problem is that we are using Integer both as the type of the array cells and the sum result, but Integer is a bounded integer type (32 or 64 bits, depending on your platform), so the sum function will overflow! All other Fulcrum solutions referred to in Hillel’s summary post use unbounded integers, so they avoid this issue. But in many contexts, unbounded integers are unacceptable, because they are unpredictable in space and time requirements and require dynamic memory handling. This is why I decided to increase the difficulty of the Fulcrum problem a little and include proof of absence of overflow.
To solve the overflow problem, we need to bound the size of the values to be summed and how many we can sum them. We also need to take into account negative values, so that we also don’t exceed the minimum value. Luckily, range types in SPARK make this simple:
subtype Int is Integer range -1000 .. 1000; subtype Nat is Integer range 1 .. 1000; type Seq is array (Nat range <>) of Int; type Partial_Sums is array (Nat range <>) of Integer;
We will use Int as the type for the contents of arrays, and Nat for the array indices (effectively limiting the size of arrays to at most 1000). The sums will still be calculated in Integer, so we need to define a new array type to hold the partial sums (we need to change the return type of the partial sums functions to Partial_Sums to make this work). So if we were to sum the largest possible array with 1000 cells, containing only the the highest (or lowest) value 1000 (or -1000), the absolute value of the sum would not exceed 1 million, which even fits easily into 32 bits. Of course, we could also choose different, and much larger, bounds here.
We now can formulate the specification of Find_Fulcrum as follows:
function Find_Fulcrum (S : Seq) return Nat with Pre => S'Length > 0, Post => (Find_Fulcrum'Result in S'Range and then (for all I in S'Range => abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I)) >= abs (Sum_Acc (S) (Find_Fulcrum'Result) - Sum_Acc_Rev (S) (Find_Fulcrum'Result))));
The implementation of Fulcrum
The Sum_Acc and Sum_Acc_Rev functions we already defined suggest a simple solution to Fulcrum that goes as follows:
call Sum_Acc and Sum_Acc_Rev and store the result;
compute the index where the difference between the two arrays is the smallest.
The problem with this solution is that step (1) takes O(n) space but we promised to deliver a constant space solution! So we need to do something else.
In fact, we notice that every program that calls Sum_Acc and Sum_Acc_Rev will be already O(n) in space, so we should never call these functions outside of specifications. The Ghost feature of SPARK lets the compiler check this for us. Types, objects and functions can be marked ghost, and such ghost entities can only be used in specifications (like the postcondition above, or loop invariants and other intermediate assertions), but not in code. This makes sure that we don’t call these functions accidentally, slowing down our code. Marking a function ghost can be done just by adding with Ghost to its declaration.
The constant space implementation idea is that if, for some index J - 1, we have the partial sums Left_Sum = Sum_Acc (J - 1) and Right_Sum = Sum_Acc_Rev (J - 1), we can compute the values for the next index J + 1 by simply adding S (J) to Left_Sum and subtracting it from Right_Sum. This simple idea gives us the core of the implementation:
for I in S'First + 1 .. S'Last loop Left_Sum := Left_Sum + S (I); Right_Sum := Right_Sum - S (I); if abs (Left_Sum - Right_Sum) < Min then Min := abs (Left_Sum - Right_Sum); Index := I; end if; end loop; return Index;
To understand this code, we also need to know that Min holds the current minimal difference between the two sums and Index gives the array index where this minimal difference occurred.
In fact, the previous explanations of the code can be expressed quite nicely using this loop invariant (it also holds at the beginning of the loop):
pragma Loop_Invariant (Left_Sum = Sum_Acc (S) (I - 1) and then Right_Sum = Sum_Acc_Rev (S) (I - 1) and then Min = abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index)) and then (for all K in S'First .. I - 1 => abs (Sum_Acc (S) (K) - Sum_Acc_Rev (S) (K)) >= abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index))));
The only part that’s missing is the initial setup, so that the above conditions hold for I = S’First. For the Right_Sum variable, this requires a traversal of the entire array to compute the initial right sum. For this we have written a helper function Sum which is O(n) time and O(1) space. So we end up with this initialization code for Find_Fulcrum:
Index : Nat := S'First; Left_Sum : Integer := S (S'First); Right_Sum : Integer := Sum (S); Min : Integer := abs (Left_Sum - Right_Sum);
and it can be seen that these initial values establish the loop invariants for the first iteration of the loop (where I = S’First + 1, so I - 1 = S’First).
It took me roughly 15 minutes to come up with the Leftpad proof. I don’t have exact numbers for the two other problems, but I would guess roughly one hour for Unique and 2-3 hours for Fulcrum.
Fulcrum is about 110 lines of code, including specifications, but excluding comments and blank lines. An implementation without any contracts would be at 35 lines, so we have a threefold overhead of specification to code … though that’s typical for specification of small algorithmic problems like Fulcrum.
All proofs are done automatically, and SPARK verifies each example in less than 10 seconds.
Some thoughts about the challenge
First of all, many thanks to Hillel for starting that challenge and to Adrian Rueegsegger for bringing it to my attention. It was fun to do, and I believe SPARK did reasonably well in this challenge.
Hillel’s motivation was to counter exaggerated praise of functional programming (FP) over imperative programming (IP). So he proved these three examples in imperative style and challenged the functional programming community to do the same. In this context, doing the problems in SPARK was probably besides the point, because it’s not a functional language.
Going beyond the FP vs IP question asked in the challenge, I think we can learn a lot by looking at the solutions and the discussion around the challenge. This blog post by Neelakantan Krishnaswami argues that the real issue is the combination of language features, in particular aliasing in combination with side effects. If you have both, verification is hard.
One approach is to accept this situation and deal with it. In Frama-C, a verification tool for C, it is common to write contracts that separate memory regions. Some tools are based on separation logic, which directly features separating conjunction in the specification language. Both result in quite complex specifications, in my opinion.
Or you can change the initial conditions and remove language features. Functional programming removes side effects to make aliasing harmless. This has all kinds of consequences, for example the need for a garbage collector and the inability to use imperative algorithms from the literature. SPARK keeps side effects but removes aliasing, essentially by excluding pointers and a few other language rules. SPARK makes up for the loss of pointers by providing built-in arrays and parameter modes (two very common usages of pointers in C), but shared data structures still remain impossible to write in pure SPARK: one needs to leave the SPARK subset and go to full Ada for that. Rust keeps pointers, but only non-aliasing ones via its borrow checker; however, formal verification for Rust is still in very early stages as far as I know. The SPARK team is also working on support for non-aliasing pointers.
Beyond language features, another important question is the applicability of a language to a domain. Formal verification of code matters most where a bug would have a big impact. Such code can be found in embedded devices, for example safety-critical code that makes sure the device is safe to use (think of airplanes, cars, or medical devices). Embedded programming, in particular safety-critical embedded programming, is special because different constraints apply. For example, execution times and memory usage of the program must be very predictable, which excludes languages that have a managed approach to memory (memory usage becomes less predictable) with a GC (often, the GC can kick in at any time, which makes execution times unpredictable). In these domains, functional languages can’t really be applied directly (but see the use of Haskell in sel4). Unbounded integers can’t be used either because of the same issue - memory usage can grow if some computation yields a very large result, and execution time can vary as well. These issues were the main motivation for me to provide a O(1) space solution that uses bounded integers for the Fulcrum problem and proving absence of overflow.
Programming languages aren’t everything either. Another important issue is tooling, in particular proof automation. Looking at the functional solutions of Fulcrum (linked from Hillel’s blog post), they contain a lot of manual proofs. The Agda solution is very small despite this fact, though it uses a simple quadratic algorithm; I would love to see a variant that’s linear.
I believe that for formal verification to be accepted in industrial projects, most proofs must be able to be completed automatically, though some manual effort is acceptable for a small percentage of the proof obligations. The Dafny and SPARK solutions are the only ones (as far as I could see) that fare well in this regard. Dafny is well-known for its excellent proof automation via Boogie and Z3. SPARK also does very well here, all proofs being fully automatic.