Handling Aliasing through Pointers in SPARK

by Claire Dross – Mar 14, 2022

As I explained in a blog post a couple of years ago, pointers are subjected to a strict ownership policy in SPARK. It prevents aliasing and allows for an efficient formal verification model. Of course, it comes at the cost of restrictions which might not be applicable to all usage. In particular, while ownership makes it possible to represent certain recursive data-structures, those involving cycles or sharing are de-facto forbidden. This is a choice, and not every proof tool did the same. For example, the WP plug-in of Frama-C supports pointers with arbitrary aliasing. If some information about the separation of memory cells is necessary to verify a program, then the user shall give the annotation explicitly. I have investigated modeling pointers with aliasing in SPARK as indices in a big memory array. I will present the results of my experiments in this blog post. We will see that, while such a representation is indeed possible modulo some hiding in SPARK, it can quickly become rather heavy in practice.

First of all, as pointers in SPARK are always subjected to ownership, we need to hide the access type from the analysis tool. For that, we can use a private type whose full view is annotated with SPARK_Mode => Off. Since we might want to declare pointers designating different types, I have declared this private type inside a generic package:

generic
   type Object (<>) is private;
package Pointers_With_Aliasing with SPARK_Mode is
   type Pointer is private;

   procedure Create (O : Object; P : out Pointer);
   function Deref (P : Pointer) return Object;
   procedure Assign (P : Pointer; O : Object);
   procedure Dealloc (P : in out Pointer);

private
   pragma SPARK_Mode (Off);
   type Pointer is access Object;
end Pointers_With_Aliasing;

The functionalities are the same as basic pointers: they can be allocated, dereferenced, assigned, and deallocated. Deallocation takes a parameter of mode IN OUT, because it resets it to null, just like normal deallocation in Ada. Note that the Create subprogram, used for the allocation, is a procedure and not a function. Indeed, functions cannot have side-effects in SPARK, and we want allocations to have some global effect on the memory. To model the effects of these subprograms, I have declared a Memory object that I can use inside global contracts. I could have used an abstract state instead, but an actual object is more appropriate as it makes it possible to copy a previous value of the memory into a ghost constant to refer to it later if necessary:

Memory : Memory_Type;
procedure Create (O : Object; P : out Pointer) with
   Global => (In_Out => Memory);
function Deref (P : Pointer) return Object with
   Global => Memory;

Now let's consider some functional specifications. We represent the memory as a big map from addresses to objects: a pointer is valid if and only if its address, computed through an Address function, is associated with an object in the map, in which case it designates the object. This model can be used to annotate our primitives in a mostly straight-forward way. However, we can already get a taste at this stage of the annotation burden. For all procedures which update the memory, we must not only describe the effect on the modified cells, but also state that all the other cells are preserved. To make it easier, I have defined three helper functions, Writes, Allocates, and Deallocates. They take as a parameter a representation of the memory foot-print of the subprogram. For example, here is the contract of Assign. It states that no cells are either allocated or deallocated by Assign, and that all the previous mappings are preserved except for the address of P:

procedure Assign (P : Pointer; O : Object) with
   Global => (In_Out => Memory),
   Pre  => Valid (Memory, Address (P)),
   Post => Get (Memory, Address (P)) = O
     and then Allocates (Memory'Old, Memory, None)
     and then Deallocates (Memory'Old, Memory, None)
     and then Writes (Memory'Old, Memory, Only (Address (P)));

There is no magic behind the helper functions, they are simply defined using quantified formulas on the memory. The function Writes for example states that the mappings of all valid cells which are not in the memory foot-print are preserved. Foot-prints are represented here as a big array from addresses to a boolean value. An address is included in the foot-print if the array associates it with the value True:

function Writes (M1, M2 : Memory_Type; Target : Addresses) return Boolean is
   (for all A in Address_Type =>
      (if not Target (A) and Valid (M1, A) and Valid (M2, A)
       then Get (M1, A) = Get (M2, A)))
with Ghost;

Note that this representation of pointers has some draw-backs with respect to the use of normal (ownership-based) pointers, some of which can be alleviated. First, there is no way with these pointers to use SPARK to check the absence of memory leaks. As a work-around, it is probably possible to use some sort of ref-counting mechanism in the implementation of the library, but this mechanism will not be verified by SPARK. I have not tried it. Another disadvantage is the lack of a possible way to update a value in place (calling Deref and then Assign will copy the designated value twice). To alleviate this concern, I have introduced Constant_Reference and Reference functions which turn one of our pointers into an ownership pointer while observing or borrowing the whole memory. It still remains a bit more cumbersome to use than a direct update inside a visible pointer though. Finally, using a big memory object as a representation of the designated data might prevent SPARK from verifying that the program is thread safe. Indeed, if all pointers to a type are conceptually designating the same memory object, then all threads using these pointers are considered to access this object, potentially leading to a race condition. We would need some higher-level memory model to handle this case, maybe based on separation logic.

Now that we have defined a modelisation for our new pointer with aliasing, let's try to write and verify a few programs to assess their usability. We start with a very simple example, a Swap procedure. It is well known that it is not necessary to have two distinct pointers for swap to work as expected. Let's check it. First, we need to choose a type of objects and instantiate our generic to declare the pointer type. I have chosen a small record type with two fields. Swap can be defined straightforwardly from there:

type Object is record
   F : Integer;
   G : Integer;
end record;
package Pointers_To_Obj is new Pointers_With_Aliasing (Object);

procedure Swap (X, Y : Pointer) with
   Pre => Valid (Memory, Address (X)) and Valid (Memory, Address (Y)),
   Post => Deref (X) = Deref (Y)'Old and Deref (Y) = Deref (X)'Old
     and Allocates (Memory'Old, Memory, None)
     and Deallocates (Memory'Old, Memory, None)
     and Writes (Memory'Old, Memory, [for A in Address_Type => A in Address (X) | Address (Y)])
is
    Tmp : constant Object := Deref (X);
begin
    Assign (X, Deref (Y));
    Assign (Y, Tmp);
end Swap;

We can see that I have annotated Swap with already a fair amount of specifications, especially considering its size and its complexity. The precondition states that X and Y are valid pointers in the memory. It is necessary since aliasing may result in the pointer referencing some deallocated data in one of the parameters. In the first line of the postcondition, I use Deref to state that the values designated by the pointers have indeed been swapped. The following 3 lines are here to express the frame condition of Swap: all other memory cells but the ones taken as parameters by Swap are preserved. These 3 lines would not have been needed if we were using ownership pointers, as the separation is handled in a built-in way by the tool. Still, this is not horrendous, and the SPARK analysis tool can check Swap without any issues. We can call it on the same memory cell for both parameters, which would not have been possible if we were using ownership pointers.

Let's try our modelisation on a (slightly) more complex example: a simply linked list. Linked lists can be defined using pointers with ownership too, but let's assume that we want to allow sharing between linked lists. The first issue we encounter is technical, we cannot instantiate our generic package with an incomplete type, so we cannot construct our recursive data-structure directly. This means that we need to use a class-wide type for our designated element, so we can declare a pointer before the actual designated object:

type Object is tagged null record;
package Pointers_To_Obj is new Pointers_With_Aliasing (Object'Class);
type L_Cell is new Object with record
   V : Natural;
   N : Pointers_To_Obj.Pointer;
end record;

In addition, as our memory model uses the equality operator on the object type, we'd better know what it is. To this effect, we introduce a Valid_Memory predicate which states that all the objects in the memory are list cells, and not some other type derived from Object for which the "=" operator could behave unexpectedly:

function Valid_Memory (M : Memory_Type) return Boolean is
   (for all A in M => Get (M, A) in L_Cell);

As our memory could theoretically contain cycles, it seems safer to store the length of the list on the side. It makes it easy to define what it means for an address to designate a valid (acyclic) list of a given length as follows:

function Valid_List (L : Address_Type; N : Natural; M : Memory_Type) return Boolean is
   --  if L is a null pointer, the list is empty
   (if L = 0 then N = 0
   --  otherwise, L designates a valid pointer in M and L.N is a list of length N - 1
    else N /= 0
      and then Valid (M, L)
      and then Valid_List (Address (L_Cell (Get (M, L)).N), N - 1, M))
with Pre => Valid_Memory (M);

type List is record
   Length : Natural;
   Values : Pointer;
end record;
--  Type for lists. We store the length together with the pointer.

function Valid_List (L : List) return Boolean is
  (Valid_List (Address (L.Values), L.Length, Memory))
with Pre => Valid_Memory (Memory);

Now that we have defined our linked list objects, let's try to write a utility program for them. To keep it simple, I have chosen to consider an Append procedure which takes a list and inserts it at the end of another list. We don't consider the actual elements stored in the list and focus on the memory safety only. As an example, let's assume that we have three valid lists L1, L2, L3 and that we want to be able to prove that it is safe to use Append to concatenate L2 to both L1 and L3 so we get two valid lists which share a sublist:

declare
  L1 : List;
  L2 : List;
  L3 : List;
begin
  --  Create L1, L2, and L3
  ...
  pragma Assert (Valid_List (L1));
  pragma Assert (Valid_List (L2));
  pragma Assert (Valid_List (L3));

  Append (L1, L2);
  Append (L3, L2);
  pragma Assert (Valid_List (L1));
  pragma Assert (Valid_List (L2));
  pragma Assert (Valid_List (L3));
end;

Let's try to come up with a minimal contract for Append that would allow us to prove this kind of code. The first thing we need is to be able to say that two lists are disjoint (they do not share any memory cell). This is necessary as Append should only be called on disjoint lists if we do not want to create a cycle. To express it, we define a reachability predicate which returns True when there is a path from an address to another in the list structure. Using this predicate, we can say that two lists are disjoint if there is no cell reachable from both:

function Reachable (L : Address_Type; N : Natural; A : Address_Type; M : Memory_Type) return Boolean is
  --  A is reachable in the acyclic list starting at L in M iff:
  --  L is not null,
  (L /= 0 and then
  --  and either L is A or A is reachable from L.N
     (L = A
      or else Reachable (Address (L_Cell (Get (M, L)).N), N - 1, A, M)))
with Pre => Valid_Memory (M) and then Valid_List (L, N, M);

function Disjoint (L1, L2 : List) return Boolean is
  (for all A in Address_Type =>
     (if Reachable (Address (L1.Values), L1.Length, A, Memory)
      then not Reachable (Address (L2.Values), L2.Length, A, Memory)))
with Pre => Valid_Memory (Memory) and then Valid_List (L1) and then Valid_List (L2);

Using the above definitions, I have written the following specification for Append, which I have tried to make as simple as possible. In the precondition, I state that L1 and L2 are disjoint valid lists. In the postcondition, I need to express that L1 is still a valid list, the fact that other, disjoint lists are preserved is expressed through the frame condition: we only update memory cells reachable from L1 before the call. Finally, to be able to continue tracking the partitioning of memory after the call, I need to describe the cells which are reachable from L1 after the call. Note that I could have been more precise here, and constrain the list structure in a stronger way:

procedure Append (L1 : in out List; L2 : List) with
  Global => (In_Out => Memory),
  Pre => Valid_Memory (Memory)
     --  L1 and L2 are valid lists
     and then Valid_List (L1) and then Valid_List (L2)
     --  L1 and L2 are disjoint
     and then Disjoint (L1, L2)
     --  the sum of their lengths is a natural
     and then Natural'Last - L1.Length >= L2.Length,

  Post => Valid_Memory (Memory)
     --  L1 is a valid list
     and then Valid_List (L1)
     --  It is long as L1 + L2
     and then L1.Length = L1.Length'Old + L2.Length'Old
     --  The new list contains the same pointers as the 2 input lists
     and then (for all A in Address_Type => Reachable (Address (L1.Values), L1.Length, A, Memory) =
                 (Reachable (Address (L1.Values)'Old, L1.Length'Old, A, Memory'Old)
                  or Reachable (Address (L2.Values), L2.Length, A, Memory'Old)))
     --  Nothing has been allocated or deallocated
     and then Allocates (Memory'Old, Memory, None)
     and then Deallocates (Memory'Old, Memory, None)
     --  Only cells reachable from L1 before the call have been modified
     and then Writes (Memory'Old, Memory, Reachable_Locations (L1)'Old);

To prove the natural implementation of Append, I had to write 6 lemmas, all involving proofs by induction, which are currently out-of-reach of the automatic solvers at the backend of SPARK. Mostly, they state what happens to the validity of lists and reachability on preserved parts of the memory. Applications of the same lemmas were also necessary to prove the memory safety of the two consecutive applications of Append above. Here again, the complexity turned out to be tractable, even if definitely non-trivial. Note that an Append function on simply linked lists using regular SPARK pointers does not require any contract or ghost code to be proven correct, so the advantages of using regular pointers are obvious here. Whether they are coming from ownership or the fact that the pointers are supported in a built-in way by the tool is less clear. The need for a classwide type to construct the list would definitely disappear with built-in support, but the reachability predicate, the frame conditions, as well as some auxiliary lemmas would most probably remain, as can be seen on list examples in the Frama-C framework.

We have reached the end of this post. If you are still with me, I think that we have demonstrated that, while it is possible to define and use pointers with aliasing in SPARK, it definitely does not come for free. If you are interested, you can find the complete example in the spark testsuite.

Posted in #SPARK #Data Structures #Formal Verification

About Claire Dross

Claire Dross has a PhD in deductive verification of programs using a satisfiability modulo theory solver with the Universite Paris-Sud. She also has an engineering degree from the Ecole Polytechnique and an engineering degree from the Ecole Nationale Superieure des Telecommunications. At AdaCore, she works full-time on the formal verification SPARK 2014 toolset.