AdaCore Blog

A Little Exercise With Strings

by Johannes Kanig

Recently, I was looking at code which implemented basic functionality for string handling similar to the unix tools head and tail, which can select the start or the end of a file based on lines. I came across this function:

 function From_Line
     (S : String; LineNo : Natural) return String
   is
      Pos : Integer := S'First;
      Current_Line : Integer := 1;
   begin
      while Current_Line < LineNo and then Pos <= S'Last loop
         if S (Pos) = ASCII.LF then
            Current_Line := Current_Line + 1;
         end if;

         Pos := Pos + 1;
      end loop;

      return S (Pos .. S'Last);
   end From_Line;

The function From_Line tries to implement the following functionality: Given a string S, return the substring of S starting from the LineNo'th line in S. The function has a subtle bug, can you spot it?

Let's first try to find the problem manually. Looking at the code with specific test cases in mind, one will find that the function deals nicely with two corner cases. First, provided the empty string, the loop condition will be false before going into the first iteration, and the empty string will be returned, as expected. Second, if the string contains less line numbers than requested, the variable Pos will contain S'Last + 1. One may think that this is a problem for the slice taken from S in the last line, but it is not, because then we know that Pos .. S'Last is the empty range, so the indices are not required to be valid indices of the string.

Now let's try using SPARK 2014. SPARK reports three messages: it cannot prove that when accessing S inside the loop, Pos is always in the bounds of the string S, that no overflow occurs when incrementing Pos, and that the slice expression at the end is free of runtime errors.

The first message is surprising, clearly Pos is in the array bounds, because we initialize it with S'First, and only ever increment it, until we stop at S'Last. The problem is that SPARK cannot do this kind of reasoning, which is also called inductive. We have to help SPARK by writing the condition which we are trying to maintain as a loop invariant, namely that inside the loop, Pos is always in the bounds of S:

 pragma Loop_Invariant(Pos in S'Range);

There is another post which explains loop invariants in more detail.

Putting the loop invariant in, SPARK 2014 proves both the array access and the slice, but the overflow remains. How can Pos overflow? The problem actually lies in one of the cases that we already discussed: requesting more lines than are in the string. But the problem only appears when S'Last is equal to Integer'Last. In this case, we have Pos = S'Last + 1, and S'Last = Integer'Last, in other words, Pos has overflowed. That rarely happens in practice, but it may occur with specially crafted strings, so it is a security issue of the code. In other words, SPARK just has spotted a real problem in our code.

An easy way to get around this problem would be to add a precondition to From_Line to disallow strings with S'Last = Integer'Last. However, this would require every caller of From_Line to guarantee this. It would be preferable to make our function work on any string, and that's the topic of the rest of this blog post. We get bonus points if we can prove absence of runtime errors using SPARK.

The first idea is to avoid the overflow. Looking at the loop, the problem is that Pos is always incremented, even if we are at the last index of the string. We can fix this by breaking up the loop as follows:

while Current_Line < LineNo loop
         pragma Loop_Invariant(Pos in S'Range);
         if S (Pos) = ASCII.LF then
            Current_Line := Current_Line + 1;
         end if;
         exit when Pos = S'Last;
         Pos := Pos + 1;
      end loop;

Now Pos will never be set to a value outside of the string, which is good. A problem with this code is after the loop, some complex code would need to be written to detect which of the two loop exit conditions was the reason to exit the loop. In this case, at the exit statement, we know already what the return value must be: the empty string. So we can simplify the code as is:

      while Current_Line < LineNo loop
         pragma Loop_Invariant(Pos in S'Range);
         if S (Pos) = ASCII.LF then
            Current_Line := Current_Line + 1;
         end if;
         if Pos = S'Last then
            return "";
         end if;
         Pos := Pos + 1;
      end loop;

the return statement after the loop is unchanged and correct.

However, our loop invariant is actually not true when entering the loop, if the input string was empty. This is easy to check at the beginning of the function:

if S = "" then
      return "";
   end if;

We are done, and SPARK proves the absence of run-time errors in this code.

Here is the complete code for reference:

 function From_Line
     (S : String; LineNo : Natural) return String
   is
      Pos : Integer := S'First;
      Current_Line : Integer := 1;
   begin
      if S = "" then
         return "";
      end if;
      while Current_Line < LineNo loop
         pragma Loop_Invariant (Pos in S'Range);
         if S (Pos) = ASCII.LF then
            Current_Line := Current_Line + 1;
         end if;
         if Pos = S'Last then
            return "";
         end if;
         Pos := Pos + 1;
      end loop;

      return S (Pos .. S'Last);
   end From_Line;

Posted in #Formal Verification    #SPARK   

About Johannes Kanig

Johannes Kanig is a senior software engineer at AdaCore and developer of the SPARK technology. He received his PhD in formal verification from the University Paris-Sud, France.