A Little Exercise With Strings
by Johannes Kanig –
Recently, I was looking at code which implemented basic functionality for string handling similar to the unix tools head
and tail
, which can select the start or the end of a file based on lines. I came across this function:
function From_Line
(S : String; LineNo : Natural) return String
is
Pos : Integer := S'First;
Current_Line : Integer := 1;
begin
while Current_Line < LineNo and then Pos <= S'Last loop
if S (Pos) = ASCII.LF then
Current_Line := Current_Line + 1;
end if;
Pos := Pos + 1;
end loop;
return S (Pos .. S'Last);
end From_Line;
The function From_Line tries to implement the following functionality: Given a string S
, return the substring of S starting from the LineNo
'th line in S. The function has a subtle bug, can you spot it?
Let's first try to find the problem manually. Looking at the code with specific test cases in mind, one will find that the function deals nicely with two corner cases. First, provided the empty string, the loop condition will be false before going into the first iteration, and the empty string will be returned, as expected. Second, if the string contains less line numbers than requested, the variable Pos will contain S'Last + 1
. One may think that this is a problem for the slice taken from S
in the last line, but it is not, because then we know that Pos .. S'Last
is the empty range, so the indices are not required to be valid indices of the string.
Now let's try using SPARK 2014. SPARK reports three messages: it cannot prove that when accessing S
inside the loop, Pos
is always in the bounds of the string S
, that no overflow occurs when incrementing Pos
, and that the slice expression at the end is free of runtime errors.
The first message is surprising, clearly Pos
is in the array bounds, because we initialize it with S'First
, and only ever increment it, until we stop at S'Last
. The problem is that SPARK cannot do this kind of reasoning, which is also called inductive. We have to help SPARK by writing the condition which we are trying to maintain as a loop invariant, namely that inside the loop, Pos
is always in the bounds of S
:
pragma Loop_Invariant(Pos in S'Range);
There is another post which explains loop invariants in more detail.
Putting the loop invariant in, SPARK 2014 proves both the array access and the slice, but the overflow remains. How can Pos
overflow? The problem actually lies in one of the cases that we already discussed: requesting more lines than are in the string. But the problem only appears when S'Last
is equal to Integer'Last
. In this case, we have Pos = S'Last + 1
, and S'Last = Integer'Last
, in other words, Pos
has overflowed. That rarely happens in practice, but it may occur with specially crafted strings, so it is a security issue of the code. In other words, SPARK just has spotted a real problem in our code.
An easy way to get around this problem would be to add a precondition to From_Line
to disallow strings with S'Last = Integer'Last
. However, this would require every caller of From_Line
to guarantee this. It would be preferable to make our function work on any string, and that's the topic of the rest of this blog post. We get bonus points if we can prove absence of runtime errors using SPARK.
The first idea is to avoid the overflow. Looking at the loop, the problem is that Pos
is always incremented, even if we are at the last index of the string. We can fix this by breaking up the loop as follows:
while Current_Line < LineNo loop
pragma Loop_Invariant(Pos in S'Range);
if S (Pos) = ASCII.LF then
Current_Line := Current_Line + 1;
end if;
exit when Pos = S'Last;
Pos := Pos + 1;
end loop;
Now Pos
will never be set to a value outside of the string, which is good. A problem with this code is after the loop, some complex code would need to be written to detect which of the two loop exit conditions was the reason to exit the loop. In this case, at the exit statement, we know already what the return value must be: the empty string. So we can simplify the code as is:
while Current_Line < LineNo loop
pragma Loop_Invariant(Pos in S'Range);
if S (Pos) = ASCII.LF then
Current_Line := Current_Line + 1;
end if;
if Pos = S'Last then
return "";
end if;
Pos := Pos + 1;
end loop;
the return statement after the loop is unchanged and correct.
However, our loop invariant is actually not true when entering the loop, if the input string was empty. This is easy to check at the beginning of the function:
if S = "" then
return "";
end if;
We are done, and SPARK proves the absence of run-time errors in this code.
Here is the complete code for reference:
function From_Line
(S : String; LineNo : Natural) return String
is
Pos : Integer := S'First;
Current_Line : Integer := 1;
begin
if S = "" then
return "";
end if;
while Current_Line < LineNo loop
pragma Loop_Invariant (Pos in S'Range);
if S (Pos) = ASCII.LF then
Current_Line := Current_Line + 1;
end if;
if Pos = S'Last then
return "";
end if;
Pos := Pos + 1;
end loop;
return S (Pos .. S'Last);
end From_Line;