# Manual Proof with Ghost Code in SPARK 2014

## by Claire Dross – Jun 26, 2014

Guiding automatic solvers by adding intermediate assertions is a commonly used technique.

We can go further in this direction, by adding complete pieces of code doing nothing, generally called ghost code, to guide the automated reasoning.

Let us look at an example. Assume that we want to prove the correctness of a sorting procedure on arrays implementing a selection sort, and, more precisely, that it always returns a permutation of the original array. Proving that it returns a sorted array is an orthogonal issue, actually easier to prove, that we won't consider here. For this, we use a Swap procedure that swaps the values stored at two different indexes in an array. We put as a postcondition of Swap the fact that it always returns a permutation of its array parameter:

type Nat_Array is array (Index range <>) of Natural;

procedure Swap (Values : in out Nat_Array;
X      : in     Index;
Y      : in     Index)
with
Pre  => (X in Values'Range and then
Y in Values'Range and then
X /= Y),
Post => Is_Perm (Values'Old, Values)
is
Temp : Natural;
begin
Temp       := Values (X);
Values (X) := Values (Y);
Values (Y) := Temp;
end Swap;

Now, we need to define the proof function Is_Perm. A common way to define permutations is to use the number of occurrences of elements in the array:

function Is_Perm (A, B : Nat_Array) return Boolean is
(for all E in Natural => Occ (A, E) = Occ (B, E));

The number of occurrences Occ can then be defined inductively over the size of its array parameter:

--  Function returning A without its last element
function Remove_Last (A : Nat_Array) return Nat_Array is
(A (A'First .. A'Last - 1))
with Pre  => A'Length > 0;

function Occ_Def (A : Nat_Array; E : Natural) return Nb_Occ is
(if A'Length = 0 then 0
elsif A (A'Last) = E then Occ_Def (Remove_Last (A), E) + 1
else Occ_Def (Remove_Last (A), E))
with Post => Occ_Def'Result <= A'Length;

function Occ (A : Nat_Array; E : Natural) return Nb_Occ is
(Occ_Def (A, E))
with Post => Occ'Result <= A'Length;

Note that we introduce Occ as a wrapper around the recursive definition of Occ_Def. This is to work around a current limitation of the tool that only introduces axioms for postconditions of non-recursive functions (to avoid possibly introducing unsound axioms that would not be detected by the tool).

To be able to prove the postcondition of Swap, we need two properties of Occ, namely, the preservation of Occ on extensionally equal arrays (that is, arrays which are equal at every index) and the way Occ is modified when changing a value of the array. We would like to introduce axioms for these two properties. There is no native construction for axioms in SPARK 2014 (on purpose! to avoid the dangerous introduction of unsound axioms), but, if we introduce a ghost subprogram with our axiom as a postcondition, then it will be available whenever the subprogram is called. We call such subprograms "lemma subprograms" (following a terminology used in other verification systems). Notice that we will still need to call the lemma subprogram with the proper arguments each time we want to get an instance of our axiom, like in manual proofs in an interactive theorem prover. Here are two lemma subprograms that can be used to assert our two properties:

procedure Lemma_Occ_Eq (A, B : Nat_Array; E : Natural) with
Pre  => A = B,
Post => Occ (A, E) = Occ (B, E);

--  Function returning A where the element at index I has been set to E
function Set (A : Nat_Array; I : Index; V : Natural) return Nat_Array with
Pre  => I in A'Range,
Post => Set'Result'First = A'First and then
Set'Result'Last = A'Last and then
Set'Result (I) = V and then
(for all J in A'Range => (if I /= J then Set'Result (J) = A (J)));

procedure Lemma_Occ_Set (A : Nat_Array; I : Index; V, E : Natural) with
Pre  => I in A'Range,
Post =>
(if V = A (I) then Occ (Set (A, I, V), E) = Occ (A, E)
elsif V = E then Occ (Set (A, I, V), E) = Occ (A, E) + 1
elsif A (I) = E then Occ (Set (A, I, V), E) = Occ (A, E) - 1
else Occ (Set (A, I, V), E) = Occ (A, E));

Using these two "axioms", I can now prove the implementation of Swap. Since I need to call the lemma subprograms for every natural, I use a loop. The inductive proof necessary to prove the universally quantified formula is then achieved thanks to the loop invariant, playing the role of an induction hypothesis:

procedure Swap (Values : in out Nat_Array;
X      : in     Index;
Y      : in     Index) is
Temp   : Natural;

--  Ghost variables used to store intermediate values of Values
Init   : constant Nat_Array (Values'Range) := Values;
Interm : Nat_Array (Values'Range);

--  Ghost function that proves the postcondition
procedure Prove_Post with
Pre  => X in Init'Range and then Y in Init'Range and then
Interm = Set (Init, X, Init (Y)) and then
Values = Set (Interm, Y, Init (X)),
Post => Is_Perm (Init, Values) is
begin
for E in Natural loop
--  The loop-invariant works as an induction hypothesis
--  For E > Natural'First, we know at this point that
--    (for all F in Natural'First .. E - 1 =>
--                       Occ (Init, F) = Occ (Values, F))

Lemma_Occ_Set (Init, X, Init (Y), E);
Lemma_Occ_Eq (Interm, Set (Init, X, Init (Y)), E);

Lemma_Occ_Set (Interm, Y, Init (X), E);
Lemma_Occ_Eq (Values, Set (Interm, Y, Init (X)), E);

pragma Loop_Invariant
(for all F in Natural'First .. E =>
Occ (Init, F) = Occ (Values, F));
end loop;
end Prove_Post;

begin
Temp       := Values (X);
Values (X) := Values (Y);

--  Ghost code
Interm := Values;

Values (Y) := Temp;

--  Ghost code
Prove_Post;
end Swap;

The procedure Swap can be verified using GNATprove, with provers Alt-Ergo and CVC4, in less than 5 seconds per verification condition. For now, I have used normal variables and functions for implementing ghost code as ghost variables are not yet part of the language (but this should be available soon). We can then use Swap to implement the sorting procedure, which can be verified by GNATprove in a similar fashion without additional information:

procedure Selection_Sort (Values : in out Nat_Array) with
Post => Is_Perm (Values'Old, Values);
is
Smallest : Index;
begin
if Values'Length = 0 then
return;
end if;

for Current in Values'First .. Values'Last - 1 loop

--  Get the index of the smallest value in the unsorted part...
Smallest := Index_Of_Minimum (Values (Current .. Values'Last));

--  ...and swap it with the current index.
if Smallest /= Current then
Swap (Values => Values,
X      => Current,
Y      => Smallest);
end if;

pragma Loop_Invariant (Is_Perm (Values'Loop_Entry, Values));
end loop;

end Selection_Sort;

Now, we may want to prove our two lemmas using the definition of Occ. Since this definition is recursive, these proofs will require induction, which is not normally in the reach of an automated prover. Still, we can have GNATprove verify these two lemmas using recursive calls on themselves to assert the induction hypothesis. Note that the proofs of the lemmas are then conditioned to the termination of our lemma functions, which currently cannot be verified by GNATprove. Here is how we can achieve the demonstration of our two lemmas:

procedure Lemma_Occ_Eq (A, B : Nat_Array; E : Natural) is
begin
if A'Length = 0 then
return;
end if;

--  Assume that Lemma_Occ_Eq holds for Remove_Last (A) and Remove_Last (B)
Lemma_Occ_Eq (Remove_Last (A), Remove_Last (B), E);
--  Do a case analysis on the value of A (A'Last) when proof is done per path
if A (A'Last) = E then
pragma Assert (B (B'Last) = E);
else
pragma Assert (B (B'Last) /= E);
end if;
end Lemma_Occ_Eq;

procedure Lemma_Occ_Set (A : Nat_Array; I : Index; V, E : Natural) is
begin
if A'Length = 0 then
return;
end if;

if I = A'Last then
--  Remove_Last (A) and Remove_Last (Set (A, I, V)) are equal.
--  Use Lemma_Occ_Eq to get the result
Lemma_Occ_Eq (Remove_Last (A), Remove_Last (Set (A, I, V)), E);

else
--  Assume Lemma_Occ_Set holds for Remove_Last (A)
Lemma_Occ_Set (Remove_Last (A), I, V, E);
Lemma_Occ_Eq (Remove_Last (Set (A, I, V)), Set (Remove_Last (A), I, V), E);
end if;
end Lemma_Occ_Set;

GNATprove proves automatically all checks on the final program, with a small timeout of 5s for the automatic prover. Although this kind of use of ghost code to guide automatic provers is not trivial (it took me a few days to complete this example, in particular to come up with an appropriate definition of permutations), the net benefit of this approach is that no manual proof in a proof assistant is needed!

Posted in #Formal Verification    #SPARK