Pointer Based Data-Structures in SPARK

by Claire Dross – Oct 08, 2019

As seen in a previous post, it is possible to use pointers (or access types) in SPARK provided the program abides by a strict memory ownership policy designed to prevent aliasing. In this post, we will demonstrate how to define pointer-based data-structures in SPARK and how to traverse them without breaking the ownership policy.

Pointer-based data structures can be defined in SPARK as long as they do not introduce cycles. For example, singly-linked lists and trees are supported whereas doubly-linked lists are not. As an example for this post, I have chosen to use a map encoded as a singly-linked list of pairs of a key and a value. To define a recursive data structure in Ada, you need to use an access type. This is what I did here: I declared an incomplete view of the type Map, and then an access type Map_Acc designating this view. Then, I could give the complete view of Map as a record with a component Next of type Map_Acc:

type Map;
   type Map_Acc is access Map;
   type Element_Acc is not null access Element;
   type Map is record
      Key   : Positive;
      Value : Element_Acc;
      Next  : Map_Acc;
   end record;

Note that I also used an access type for the element that is stored in the map (and not for the key). Indeed, we assume here that the elements stored in the map can be big, so that we may want to modify them in place rather than copying them. We will see later how this can be done.

To be able to write specifications on my maps, I need to define some basic properties for them. To describe precisely my maps, I will need to speak about:

whether there is a mapping for a given key in my map, and
the value associated to a key by the map, knowing that there is a mapping for this key.

These concepts are encoded as the following functions:

function Model_Contains (M : access constant Map; K : Positive) return Boolean;

   function Model_Value (M : access constant Map; K : Positive) return Element
   with  Pre => Model_Contains (M, K);

Model_Contains should return True when there is an occurrence of the key K in the map M, and Model_Value should retrieve the element associated with the first occurrence of K in M.

Now, I need to give a definition to my functions. The natural way to express the meaning of properties on linked data-structures is through recursion. This is what I have done here for both Model_Contains and Model_Value:

function Model_Contains (M : access constant Map; K : Positive) return Boolean is
     (M /= null and then (M.Key = K or else Model_Contains (M.Next, K)));
   --  A key K has a mapping in a map M if either K is the key of the first mapping of M or if
   --  K has a mapping in M.Next.

   function Model_Value (M : access constant Map; K : Positive) return Element is
     (if M.Key = K then M.Value.all else Model_Value (M.Next, K));
   --  The value mapped to key K by a map M is either the value of the first mapping of M if
   --  K is the key this mapping or the value mapped to K in M.Next otherwise.

Note that we do not need to care about cases where there is no mapping for K in M in the definition of Model_Value, as we have put as a precondition that it cannot be used in such cases.

As these functions are recursive, GNATprove cannot determine that they will terminate. As a result, it will not use their definition in some cases, lest they may be incorrect. To avoid this problem, I have added Terminating annotations to their declarations. These annotations cannot be verified by the tool, but they can easily be discharged by review, so I accepted the check messages with appropriate justification. I have also marked the two functions as ghost, as I will assume that, for efficiency reasons or to control the stack usage, we don't want to use recursive functions in normal code:

function Model_Contains (M : access constant Map; K : Positive) return Boolean
   with
     Ghost,
     Annotate => (GNATprove, Terminating);
   pragma Annotate
     (GNATprove, False_Positive,
      "subprogram ""Model_Contains"" might not terminate",
      "Recursive calls occur on strictly smaller structure");

   function Model_Value (M : access constant Map; K : Positive) return Element
   with
     Ghost,
     Annotate => (GNATprove, Terminating),
     Pre => Model_Contains (M, K);
   pragma Annotate
     (GNATprove, False_Positive,
      "subprogram ""Model_Value"" might not terminate",
      "Recursive calls occur on strictly smaller structure");

Now that we have defined our specification properties for maps, let's try to implement some functionality. We will start with the easiest one, a Contains function which checks whether there is mapping for a given key in a map. In terms of functionality, it should really return the same value as Model_Contains. However, implementation-wise, we would like to use a loop to avoid introducing recursion. Here, we need to be careful. Indeed, traversing a linked data-structure using a loop generally involves an alias between the traversed structure and the pointer used for the traversal. SPARK supports this use case through the concepts of local observers and borrowers (borrowed from Rust's regular and mutable borrows). What we need here is an observer. Intuitively, it is a local variable which is granted for the duration of its lifetime read-only access to a component of another object. When the observer stays in scope, the actual owner of the object also has read-only access on the object. When the observed goes out of scope, it regains complete ownership (provided it used to have it, and there are no other observers in scope). Let us see how we can use it in our example:

function Contains (M : access constant Map; K : Positive) return Boolean with
     Post => Contains'Result = Model_Contains (M, K);

   function Contains (M : access constant Map; K : Positive) return Boolean is
      C : access constant Map := M;
   begin
      while C /= null loop
         pragma Loop_Invariant (Model_Contains (C, K) = Model_Contains (M, K));
         if C.Key = K then
            return True;
         end if;
         C := C.Next;
      end loop;
      return False;
   end Contains;

Because it is declared with an anonymous access-to-constant type, the assignment to C at its declaration is not considered to be a move, but an observe. In the body of Contains, C will be a read-only alias of M. M itself will retain read-only access to the data it designates, and will regain full ownership at the end of Contains (note that here, it does not change much, as M itself is an access-to-constant type, so it has read-only access to its designated data to begin with).

In the body of Contains, we use a loop which searches for a key equal to K in M. We see that, even though C is an observer, we can still assign to it. It is because it is the data designated by C which is read-only, and not C itself. When we assign directly into C, we do not modify the underlying structure, we simply modify the handle. Note that, as usual with loops in SPARK, I had to provide a loop invariant to help GNATprove verify my code. Here it states that K is contained in M, if and only if, it is contained in C. This is true because we have only traversed values different from K until now. We see that, in the invariant, we are allowed to mention M. This is because M retains read-only access to the data it designated during the observe.

Let's now write a function to retrieve in normal code the value associated to a key in a map. Since elements can be big, we don't want to copy them, so we should not return the actual value, but rather a pointer to the object stored in the map. For now, let's assume that we are interested in a read-only access to the element. As we have seen above, in the ownership model of SPARK, a read-only access inside an existing data-structure is an observer. So here, we want a function which computes and returns an observer of the input data-structure. This is supported in SPARK, provided the traversed data-structure is itself an access type, using what we call "traversal functions". An observing traversal function takes an access type as its first parameter and returns an anonymous access-to-constant object which should be a component of this first parameter.

function Constant_Access (M : access constant Map; K : Positive) return not null access constant Element
   with
     Pre  => Model_Contains (M, K),
     Post => Model_Value (M, K) = Constant_Access'Result.all;

   function Constant_Access (M : access constant Map; K : Positive) return not null access constant Element
   is
      C : access constant Map := M;
   begin
      while C /= null loop
         pragma Loop_Invariant (Model_Contains (C, K) = Model_Contains (M, K));
         pragma Loop_Invariant (Model_Value (C, K) = Model_Value (M, K));
         if C.Key = K then
            return C.Value;
         end if;
         C := C.Next;
      end loop;
      raise Program_Error;
   end Constant_Access;

The return type of Constant_Access is rather verbose. It states that it computes an anonymous access-to-constant object (an observer in SPARK) which is not allowed to be null. Indeed, since we know that we have a mapping for K in M when calling Constant_Access, we are sure that there will always be an element to return. This also explains why we have a raise statement as the last statement of Constant_Access: GNATprove is able to prove that this statement is unreachable, and we need that statement to confirm that the bottom of the function cannot be reached without returning a value.

The contract of Constant_Access is straightforward, as we are again reimplementing a concept that we already had in the specification (Constant_Access returns a pointer to the result of Model_Value). In the body of Constant_Access, we create a local variable C which observes the data-structure designated by M, just like we did for Contains. When the correct mapping is found, we return an access to the corresponding value in the data-structure. GNATprove will make sure that the structure returned is a part of the first parameter (here M) of the function.

Note that this function does not breach the ownership policy of SPARK as what is computed by the Constant_Access is an observer, so it will not be possible to assign it to a component inside another data-structure for example.

Now let's assume that we want to modify the value associated to a given key in the data-structure. We can provide a Replace_Element procedure which takes a map M, a key K, and an element V and replaces the value associated to K in M by V. We will write in its postcondition that the value associated to K in M after the call is V (this is not complete, as we do not say anything about other mappings, but for the sake of the explanation, let's stick to something simple).

procedure Replace_Element (M : access Map; K : Positive; V : Element) with
     Pre  => Model_Contains (M, K),
     Post => Model_Contains (M, K) and then Model_Value (M, K) = V;

In its body, we want to loop, find the matching pair, and replace the element by the new value V. Here we cannot use an observer to search for the key K, as we want to modify the mapping afterward. Instead, we will use a local borrower. Just like an observer, a borrower takes the ownership of a part of an existing data-structure for the duration of its life-time, but it takes full ownership, in the sense that a borrow has the right to both read and modify the borrowed object. While the borrower is in scope, the borrowed object cannot be accessed directly (there is a provision for reading it in the RM that is used by the tool in particular cases, we will see later). At the end of the scope of the borrower, the ownership returns to the borrowed object. A borrower in SPARK is introduced by the declaration of an object of an anonymous access-to-variable type (note the use of "access Map" instead of "access constant Map"):

procedure Replace_Element (M : access Map; K : Positive; V : Element) is
      X : access Map := M;
   begin
      while X /= null loop
         pragma Loop_Invariant (Model_Contains (X, K));
         pragma Loop_Invariant
           (Pledge (X, (if Model_Contains (X, K) then Model_Contains (M, K)
                                 and then Model_Value (M, K) = Model_Value (X, K))));

         if X.Key = K then
            X.Value.all := V;
            return;
         end if;
         X := X.Next;
      end loop;
   end Replace_Element;

The body of Replace_Element is similar to the body of Contains, except that we modify the borrower before returning from the procedure. However, we see that the loop invariant is more involved. Indeed, as we are modifying M using X, we need to know how the modification of X will affect M. Usually, GNATprove can track this information without help, but when loops are involved, this information needs to be supplied in the loop invariant. To describe how a structure and its borrower are affected, I have used a pledge. The notion of pledges was introduced by researchers from ETH Zurich to verify Rust programs (see the preprint of their work to be published at OOPSLA this year). Conceptually, a pledge is a property which will always hold between the borrowed object and the borrower, no matter the modifications that are made to the borrower. As pledges are not yet supported at a language level in SPARK, it is possible to mark (a part of) an assertion as a pledge by using an expression function which is annotated with a Pledge Annotate pragma:

function Pledge (Borrower : access constant Map; Prop : Boolean) return Boolean is
     (Prop)
   with Ghost,
     Annotate => (GNATProve, Pledge);
   --  Pledge for maps

Note that the name of the function could be something other than "Pledge", but the annotation should use the string "Pledge". A pledge function is a ghost expression function which takes a borrower and a property and simply returns the property. When GNATprove encounters a call to such a function, it knows that the property given as a second parameter to the call must be handled as a pledge of the local borrower given as a first parameter. It will attempt to verify that, no matter the modification which may be done to Borrower, the property will still hold. Inside Prop, it is necessary to be able to mention the borrowed object. This is why there is provision for reading it in the SPARK RM, and in fact, it is the only case where the tool will allow a borrowed object to be mentioned.

In our example, the pledge of X states that, no matter how X will be modified afterward, if it happens that X has a mapping for K, then M will have the same mapping for K. This is true because we have not encountered K in the previous iterations of the loop, so if we find a mapping in X for K, it will be the first such mapping in M too. Note that a more precise pledge, like Pledge (X, Model_Contains (X, K)), would not be correct. Given a borrower aliasing the root of a subtree in a borrowed object, the pledge relationship expresses what necessarily holds for that subtree, independently of any modifications that may occur to it through the borrower. This is why we need to state the pledge here with an if-expression: "if the borrower X still contains the key K, then M necessarily contains the key K, and both agree on the associated value".

After the loop, M has been modified through X. GNATprove does not know anything about it but what can be deduced from the current value of X and information about the relation between M and X supplied by the loop-invariants that contain pledges. These invariants don't completely define the relation between X and M, but they give enough information to deduce the postcondition when a mapping for K is found in X. Since at the last iteration X.Key = K, GNATprove can deduce that Model_Contains (X, K) and Model_Value (X, K) = V holds after the loop. Using the pledges from the loop-invariants, it can infer that we also have Model_Contains (M, K) and Model_Value(M, K) = V.

Before we reach the end of this post, we will go one step further. We now have a way to replace an element with another one in the map. It can be used if we want to do a modification inside an element of the map, but it won't be efficient, as the element will need to be copied. We would like to provide a way to find the value associated to a given key in a map, and return an access to it so that it can be modified in-place. As for Constant_Reference, this can be done using a traversal function:

function Pledge (Borrower : access constant Element; Prop : Boolean) return Boolean is
     (Prop)
   with Ghost,
     Annotate => (GNATProve, Pledge);
   --  Pledge for elements

   function Reference (M : access Map; K : Positive) return not null access Element
   with
     Pre  => Model_Contains (M, K),
     Post => Model_Value (M, K) = Reference'Result.all and then
         Pledge (Reference'Result, Model_Contains (M, K) and then
                 Model_Value (M, K) = Reference'Result.all);

Reference returns a mutable access inside the data-structure M. We can see that, in its postcondition, I have added a pledge. Indeed, since the result of Reference is an access-to-variable object, a user of my function can use it to modify M. If I want the tool to be able to deduce anything about such a modification, I need to describe the link between the result of a call to the Reference function, and its first parameter. Here my pledge gives the same information as the postcondition of Replace_Element, that is, that the value designated by the result of the call will be the one mapped from K in M.

function Reference (M : access Map; K : Positive) return not null access Element 
   is
      X : access Map := M;
   begin
      while X /= null loop
         pragma Loop_Invariant (Model_Contains (X, K));
         pragma Loop_Invariant (Model_Value (X, K) = Model_Value (M, K));
         pragma Loop_Invariant
           (Pledge (X, (if Model_Contains (X, K) then Model_Contains (M, K)
            and then Model_Value (M, K) = Model_Value (X, K))));

         if X.Key = K then
            return X.Value;
         end if;
         X := X.Next;
      end loop;
      raise Program_Error;
   end Reference;

The body of Reference contains the same loop as Constant_Reference, except that I have added a loop invariant, similar to the one in Replace_Element to supply a pledge to X.

The specification and verification of pointer-based data-structures is a challenge in deductive verification whether the "pointer" part is implemented as a machine pointer or as an array index (as an example, see our previous post about verifying insertion inside a red-black tree). In addition, SPARK has a strict ownership policy which will prevent completely the use of some (doubly-linked) data-structures, and complicate the writing of usual algorithms on others. However, I think I have demonstrated in this post that is still feasible to write and verify in SPARK some of these algorithms, with comparatively few user-supplied annotations.

Posted in #SPARK #Formal Verification

About Claire Dross

Claire Dross has a PhD in deductive verification of programs using a satisfiability modulo theory solver with the Universite Paris-Sud. She also has an engineering degree from the Ecole Polytechnique and an engineering degree from the Ecole Nationale Superieure des Telecommunications. At AdaCore, she works full-time on the formal verification SPARK 2014 toolset.