AdaCore Blog

Relaxing the Data Initialization Policy of SPARK

by Claire Dross

SPARK always being under development, new language features make it in every release of the tool, be they previously unsupported Ada features (like access types) or SPARK specific developments. However, new features generally take a while to make it into actual user code. The feature I am going to present here is in my experience an exception, as it was used both internally and by external users before it made it into any actual release. It was designed to enhance the verification of data initialization, whose limitations have been a long standing issue in SPARK.

In the assurance ladder, data initialization is associated with the bronze level, that is, the easiest to reach through SPARK. Indeed, most of the time, the verification of correct data initialization is achieved automatically without much need for user annotations or code modifications. However, once in a while, users encounter cases where the tool cannot verify the correct initialization of some data in their program, even though it is correct. Until recently, there were no good solutions for this problem. No additional annotation efforts could help, and users had to either accept the check messages and verify proper initialization by other means, or perform unnecessary initialization to please the tool. This has changed in the most recent releases of SPARK (SPARK community 2020 and recent previews of SPARK Pro 21). In this post, I describe a new feature, called Relaxed_Initialization, designed to help in this situation.

First, let's get some insight on the problem. SPARK performs several analyses. Among them, flow analysis is used to follow the flow of information through variables in the program. It is fast and scales well, but it is not sensitive to values in the program. Said otherwise, it follows variable names through the control flow, but does not try to track their values.
The other main analysis is formal proof. It translates the program into logical formulas that are then verified by an automated solver. It is precise, as it models values of variables at every program point, but it is potentially slow and requires user annotations to summarize the effect of subprograms in contracts. Verifications done by flow analysis are in general easier to complete, and so are associated with the bronze level in the assurance ladder, whereas verifications done by proof require more user inputs and are associated with levels silver or higher.

In SPARK, data initialization is in general handled by flow analysis. Indeed, most of the time, it is enough to look at the control flow graph to decide whether something has been initialized or not. However, using flow analysis for verifying data initialization induces some limitations. Most notably:

  • Arrays are handled as a whole, because flow analysis would need to track values to know which indexes have been written by a component assignment. As a result, SPARK is sometimes unable to verify code which initializes an array by part (using a loop for example, as opposed to a single assignment through an aggregate).
  •  As it does not require user annotations for checking data initialization, SPARK enforces a strict data initialization policy at subprogram boundary. In a nutshell, all inputs should be entirely initialized on subprogram entry, and all outputs should be entirely initialized on subprogram return.

In recent releases of SPARK, it is possible to use proof instead of flow analysis to verify the correct initialization of data. This has the effect of increasing the precision of the analysis, at the cost of a slower verification process and an increased annotation effort. Since this is a trade-off, SPARK allows users to choose if they want to use flow analysis or proof in a fine grained manner on a per variable basis. By default, the lighter approach is preferred, and initialization checks are handled by flow analysis. To use proof instead, users should annotate their variables with the Relaxed_Initialization aspect.

To demonstrate how this can be used to lift previous limitations, let us look at an example. As stated above, arrays are treated as a whole by flow analysis. Since initializing an array using a loop is a regular occurrence, flow analysis has some heuristics to recognize the most common cases. However, this falls short as soon as the loop does not cover the whole range of the array, elements are initialized more than one at a time, or the array is read during the initialization. In particular, this last case occurs if we try to describe the behavior of a loop using a loop invariant. As an example, Add computes the element-wise addition of two arrays of natural numbers:

   type Nat_Array is array (Positive range 1 .. 100) of Natural;

   function Add (A, B : Nat_Array) return Nat_Array with
     Pre => (for all E of A => E < 10000)
       and then (for all E of B => E < 10000),
     Post => (for all K in A'Range => Add'Result (K) = A (K) + B (K))
   is
      Res : Nat_Array;
   begin
      for I in A'Range loop
         Res (I) := A (I) + B (I);
         pragma Loop_Invariant
           (for all K in 1 .. I => Res (K) = A (K) + B (K));
      end loop;
      return Res;
   end Add;

The correct initialization of Res cannot be verified by flow analysis, because it cannot make sure that the invariant only reads initialized values. If we remove the invariant, then the initialization is verified, but of course the postcondition is not... Until now, the only solution to work around this problem was to add a (useless) initial value to Res using an aggregate. This was less than satisfactory... In recent versions of SPARK, I can instead specify that I want the initialization of Res to be verified by proof using the Relaxed_Initialization aspect:

      Res : Nat_Array with Relaxed_Initialization;

With this additional invariant, my program is entirely verified. Note that, when Relaxed_Initialization is used, the bronze level of the assurance ladder is no longer enough to ensure the correct initialization of data. We now need to reach the silver level, which may require adding more contracts and doing more code refactoring.

Let's now consider the second major limitation of the classical handling of initialization in SPARK: the data initialization policy. As I have mentioned earlier, it requires that inputs and outputs of subprograms are entirely initialized at subprogram boundaries. As an example, I can consider the following piece of code which tries to read several natural numbers from a string using a Read_Natural procedure. It as an Error output which is used to signal errors occurring during the read:

   type Error_Kind is (Empty_Input, Cannot_Read, No_Errors);

   subtype Size_Range is Natural range 0 .. 100;

   procedure Read_Natural
     (Input    : String;
      Result   : out Natural;
      Num_Read : out Natural)
  with Post => Num_Read <= Input'Length;
  --  Read a number from Input. Return in Num_Read the number of characters read.

   procedure Read
     (Input  : String;
      Buffer : out Nat_Array;
      Size   : out Size_Range;
      Error  : out Error_Kind)
   is
      Num_Read : Natural;
      Start    : Positive range Input'Range;

   begin
      --  If Input is empty, set the error code appropriately and return

      if Input'Length = 0 then
         Size := 0;
         Error := Empty_Input;
         return;
      end if;

      --  Otherwise, call Read_Natural until either Input is entirely read,
      --  or we have reached the end of Buffer.

      Start := Input'First;

      for I in Buffer'Range loop
         Read_Natural (Input (Start .. Input'Last), Buffer (I), Num_Read);

         --  If nothing can be read from Input, set the error mode and return

         if Num_Read = 0 then
            Size := 0;
            Error := Cannot_Read;
            return;
         end if;

         --  We have reached the end of Input

         if Start > Input'Last - Num_Read then
            Size := I;
            Error := No_Errors;
            return;
         end if;

         Start := Start + Num_Read;
      end loop;

      --  We have completely filled Buffer

      Size := 100;
      Error := No_Errors;
   end Read;

This example is not following the data initialization policy of SPARK, as I don't initialize Buffer when returning with an error. In addition, if Input contains less than 100 numbers, Buffer will only be initialized up to Size. If I launch SPARK on this example, flow analysis complains, stating that it cannot ensure that Buffer is initialized at the end of Read. To silence it, I can add a dummy initialization for Buffer at the beginning, for example setting every element to 0. However this is not what I want. Indeed, not only might this initialization be costly, but callers of Read may forget to check the error status and read Buffer, and SPARK won't detect it. Instead, I want SPARK to know which parts of Buffer are meaningful after the call, and to check that those only are accessed by callers.

Here again, I can use the Relaxed_Initialization aspect to exempt Buffer from the data initialization policy of SPARK. To annotate a formal parameter, I need to supply the aspect on the subprogram and mention the formal as a parameter:

   procedure Read
     (Input  : String;
      Buffer : out Nat_Array;
      Size   : out Natural;
      Error  : out Error_Kind)
   with Relaxed_Initialization => Buffer;

Now my procedure is successfully verified by SPARK. Note that I have initialized Size even when the call completes with errors. Indeed, Ada says that copying an uninitialized scalar, for example when giving it as an actual parameter to a subprogram call, is a bounded error. So the Relaxed_Initialization aspect wouldn't help here, as I would still need to initialize Size on all paths before returning from Read.

Let's write some user code to see if everything works as expected. Use_Read reads up to 100 numbers from a string and prints them to the standard output:

   procedure Use_Read (S : String) is
      Buffer : Nat_Array;
      Error  : Error_Kind;
      Size   : Natural;
   begin
      Read (S, Buffer, Size, Error);
      for N of Buffer loop
         Ada.Text_IO.Put_Line (N'Image);
      end loop;
   end Use_Read;

Here SPARK complains that Buffer might not be initialized on the call to Read. Indeed, as the local Buffer variable does not have the Relaxed_Initialization aspect set to True, SPARK attempts to verify that it is entirely initialized by the call. This is not what I want, so I annotate Buffer with Relaxed_Initialization:

      Buffer : Nat_Array with Relaxed_Initialization;

Now, if I run SPARK again on my example, I have another failed initialization check, this time on the call to Put_Line inside my loop. This one is expected, as I do not check the error status after my call to read. So I now fix my code so that it only accesses indices of Buffer which have been initialized by my read:

   procedure Use_Read (S : String) is
      Buffer : Nat_Array with Relaxed_Initialization;
      Error  : Error_Kind;
      Size   : Natural;
   begin
      Read (S, Buffer, Size, Error);
      if Error = No_Errors then
         for N of Buffer (1 .. Size) loop
            Ada.Text_IO.Put_Line (N'Image);
         end loop;
      end if;
   end Use_Read;

Unfortunately, it does not help, and the failed initialization check on the call to Put_Line remains the same. This is because I have not given any information about the initialization of Buffer in the contract of Read. With the usual data initialization policy of SPARK, nothing is needed, because SPARK enforces that all outputs are initialized after the call. However, since I have opted out of this policy for Buffer, I now need to use a postcondition to describe its initialization status after the call. This can be done easily using the 'Initialized attribute:

   procedure Read
     (Input  : String;
      Buffer : out Nat_Array;
      Size   : out Size_Range;
      Error  : out Error_Kind)
   with Relaxed_Initialization => Buffer,
     Post => (if Error = No_Errors then Buffer (1 .. Size)'Initialized
              else Size = 0);

The postcondition states that if no errors occurred, then Buffer has been initialized up to Size. If I want my code to prove, I also need to supply an invariant at the end of the loop inside read:

         pragma Loop_Invariant (Buffer (1 .. I)'Initialized);

Now both Read and Use_Read are entirely proved, and if I tweak Use_Read to access a part of Buffer with no meaningful values, SPARK will produce a failed initialization check.

The Relaxed_Initialization aspect provides a way to opt out of the strict data initialization policy of SPARK and work around the inherent imprecision of flow analysis on value sensitive checks. It enables the verification of valid programs which used to be out of the scope of the proof technology offered by SPARK. You can find more information in the user guide. Don't hesitate to try it in your project, and tell us if you think it is useful and how we can improve it!

Posted in #Formal Verification    #SPARK   

About Claire Dross

Claire Dross has a PhD in deductive verification of programs using a satisfiability modulo theory solver with the Universite Paris-Sud. She also has an engineering degree from the Ecole Polytechnique and an engineering degree from the Ecole Nationale Superieure des Telecommunications. At AdaCore, she works full-time on the formal verification SPARK 2014 toolset.