I can’t believe that I can prove that it can sort

by Yannick Moy , Lionel Matias – Jun 23, 2022

Sorting algorithms are to computer science what “Hello World!” is to programming. A way for beginners to get their hands dirty. Which also means that most programmers don’t write “Hello World!” programs past their studies, and computer scientists don’t look at sorting algorithms past their PhD.

Which made it surprising that a new “interesting” sorting algorithm was published at the end of 2021, whose appeal drew attention from both computer scientists and programmers. Here is the algorithm in full details:

As an expert user of Ada, and an anecdotal user of SPARK, one of us (Lionel) took it as a challenge to try some functional proof with SPARK. Despite a hopeful start, this ended up badly. As a SPARK expert thus contacted to help with the challenge, one of us (Yannick) took it as a way to show how functional proof should be approached in SPARK. Including some false starts, this ended up well (and under an hour). This is the story of this challenge, and the tips we think are important to share with those who aim at functional proof with SPARK.

I am Lionel, I’ll start.

I recently stumbled upon a tweet about a paper on an “unbelievable” sorting algorithm:

I read the paper intro, and couldn’t realize what was wrong with it instinctively, but quickly seeing all the replies to the tweet brought back memories of learning programming, writing cool (at the time) programs for years, and then later on getting some formal computer science education.

I dug up my first ever sorting algorithm. I’d used it in a silly 3D renderer that I’d implemented in 2000 on a Pentium 75, all implemented from scratch from what some might call nowadays “first principles” (i.e. I had absolutely no clue what I was doing). In a 3D renderer, once you’ve got your list of triangles to paint, and the direction of the camera set up, you want to z-sort the triangles so you can run the Painter algorithm (Wikipedia here is very generous, and calls it “depth-sort algorithm”). I needed a sort algorithm and I had no books, just my C compiler, SDL headers and some old French magazines about setting up VESA…

So I built one. I mean I built this one, and to me it looked like it worked. If you look at this video, it makes sense, somehow, especially for small array sizes, for a beginner that has no notion of smart sorting, big-O notation and no internet. My “unit tests” (generous term again) from the time show how little I understood about testing because they seem to be chosen for this algorithm (to validate this implementation and not sorting in general).

I’d put it together without videos and all the fancy visualizations one can use when learning sorting algorithms, and I didn’t think about this code for years afterwards.

When I later learned Ada, first year of formal studies, I needed a sort function for a project and I just ported that algorithm and went with it (and I lived with it until I some month later I had to sort 2 millions entries and after a night at 100% CPU with almost no progress, I caved and cracked my copy of Sedgewick and started learning the science and not just the hacking-stuff-up).

Paraphrasing the old code, it looked very simple:

procedure Stupid_Sort
is
   type A_Type is array (Natural range 1 .. 5) of Natural;

   procedure Sort (A : in out A_Type)
   is
   begin
      for I in A'Range loop
         for J in A'Range loop
            if A (I) < A (J) then
               declare
                  Tmp : constant Natural := A (I);
               begin
                  A (I) := A (J);
                  A (J) := Tmp;
               end;
           end if;
         end loop;
      end loop;
   end Sort;

   A : A_Type;
begin
   for I in A'Range loop 
     A (I) := A'Last - I + 1;
   end loop;

   Sort (A);
end Stupid_Sort;

Tip: Start small (simple small data types, a single subprogram)

When I read the tweet and all the replies, I admit I felt compassion for old me (well, young me), even though I saw other people admitting they’d come up with this sorting algorithm once upon a time. I wondered whether applying modern tech (other than having a sorting algorithm in the standard library and knowing about it…) blindly would comfort such a developer in his or her “wrong” version. I decided to launch myself a challenge: proving the algorithm using SPARK, without looking at the proof in the paper. Should be easy enough, right (famous last words)?

I fired up the ultimate Ada IDE (vim) and just copy-pasted the old code, compiled it (gnatmake -gnatA stupid_sort.adb) and ran it (through gdb to get the content of A after Sort returns). It works, for that input.

Tip: Start with a passing test

Then I wanted to prove that Sort sorts all possible arrays so I added the SPARK pixie dust:

procedure Stupid_Sort with SPARK_Mode => On

and the necessary post-condition for Sort:

procedure Sort (A : in out A_Type)
   with Post => (for all J in A'First .. A'Last - 1 => A (J) <= A (J + 1))

Which reads:

when the Sort procedure returns
for every element A(J) of the array (except the last one)
- the element A(J) is smaller or equal to the next element A(J+1)

Then I created a GPR project to run GNATprove:

project Stupid_Sort is

   for Main use ("stupid_sort.adb");
   for Source_Files use ("stupid_sort.adb");

   package Compiler is
      for Switches ("Ada") use
           ("-gnata",     -- enable assertions and runtime checks
            "-gnat2022",  -- for newest forms of expressions in Ada 2022
            "-g", "-O0"); -- for debugging
    end Compiler;

end Stupid_Sort;

And on I went:

> gnatprove -Pstupid_sort.gpr -j0
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...

stupid_sort.adb:6:58: medium: postcondition might fail, cannot prove A (J) <= A (J + 1)
    6 |     with Post => (for all J in A'First .. A'Last - 1 => A (J) <= A (J + 1))
      |                                                         ^~~~~~~~~~~~~~~~~

Which was kind of expected, but there’s a --level knob, if you’re lazy like me you’ll just try it whenever you find something GNATprove balks at (sadly it only goes to 4, you can’t turn it to eleven… yet):

> gnatprove -Pstupid_sort.gpr -j0 --level=2
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...

And… GNATprove managed to prove the functional correctness of that postcondition!

Tip: Use proof automation (and turn up the knob)

Let’s check the synthesis of what GNATprove did (the gnatprove.out file):

Summary of SPARK analysis
=========================

-------------------------------------------------------------------------------------------
SPARK Analysis results        Total       Flow   CodePeer    Provers   Justified   Unproved
-------------------------------------------------------------------------------------------
Data Dependencies                 .          .          .          .           .          .
Flow Dependencies                 .          .          .          .           .          .
Initialization                    1          1          .          .           .          .
Non-Aliasing                      .          .          .          .           .          .
Run-time Checks                   .          .          .          .           .          .
Assertions                        .          .          .          .           .          .
Functional Contracts              1          .          .     1 (Z3)           .          .
LSP Verification                  .          .          .          .           .          .
Termination                       .          .          .          .           .          .
Concurrency                       .          .          .          .           .          .
-------------------------------------------------------------------------------------------
Total                             2    1 (50%)          .    1 (50%)           .          .


max steps used for successful proof: 11967

Analyzed 1 unit
in unit stupid_sort, 2 subprograms and packages out of 2 analyzed
  Stupid_Sort at stupid_sort.adb:1 flow analyzed (0 errors, 0 checks, 0 warnings and 0 pragma Assume statements) and proved (0 checks)
  Stupid_Sort.Sort at stupid_sort.adb:5 flow analyzed (0 errors, 0 checks, 0 warnings and 0 pragma Assume statements) and proved (1 checks)

Here GNATprove tells us it managed to prove our postcondition (a functional contract) with Z3. So it works! SPARK can prove the algorithm! Victory? Not so fast.

Let’s try to go for larger array sizes, e.g. 1 .. 10. Now after 16 seconds, GNATprove goes back to:

stupid_sort.adb:6:58: medium: postcondition might fail, cannot prove A (J) <= A (J + 1)
    6 |     with Post => (for all J in A'First .. A'Last - 1 => A (J) <= A (J + 1))
      |                                                         ^~~~~~~~~~~~~~~~~

Turning the level up to the max (--level=4) doesn’t get us better results, but takes more than 3 minutes, for the same result. So back to square one.

The first reflex I had (wrongly) ingrained was to try and state obvious things through assertions. Keep in mind I didn’t want to read the paper, with the many juicy proofs and insights it might contain. So there I went:

First I put the swap code in its own procedure:

procedure Swap (A : in out A_Type; I : A_Index_Type; J : A_Index_Type)
  with Post => A (J) = A'Old (I) and A (I) = A'Old (J)
is
   Tmp : constant Natural := A (I);
begin
   A (I) := A (J);
   A (J) := Tmp;
end Swap;

… which (spoiler) wasn’t a very good idea (see “framing conditions”, later). I didn’t get better results, so I went on adding assertions.

At some point I ended up with lots of tautological asserts, and it felt more and more like I really didn’t understand what the problem was.

if A (I) < A (J) then
   pragma Assert (A (I) < A (J)); -- *that* should always be true, right?
   Swap (A, I, J);
   pragma Assert (A (I) >= A (J)); -- *that* too, no?
end if;
pragma Assert (A (I) >= A (J)); -- doubting everything…

And still, no progress on the proof of the postcondition.

Tip: Avoid the Assertocalypse

The message from GNATprove was hinting at a Loop_Invariant:

stupid_sort.adb:22:14: medium: postcondition might fail
   22 |     Post => Sorted (A, A'First, A'Last)
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  possible fix: loop at line 25 should mention A in a loop invariant
   25 |      for I in A'range loop
      |                       ^ here

So I looked at the videos of the sort again, and came up with two (very wrong) loop invariants:

pragma Loop_Invariant (if I > A'First then A (I) >= A (A'First));
pragma Loop_Invariant (for all K in A'First .. I - 1 => A(I) >= A(K));

This made GNATprove very mad:

stupid_sort.adb:16:59: medium: postcondition might fail, cannot prove A (J) <= A (J + 1)
   16 |      with Post => (for all J in A'First .. A'Last - 1 => A (J) <= A (J + 1))
      |                                                          ^~~~~~~~~~~~~~~~~

stupid_sort.adb:28:53: medium: loop invariant might not be preserved by an arbitrary iteration, cannot prove A (I) >= A (A'first)
   28 |         pragma Loop_Invariant (if I > A'First then A (I) >= A (A'First));
      |                                                    ^~~~~~~~~~~~~~~~~~~

stupid_sort.adb:29:66: medium: loop invariant might fail in first iteration, cannot prove A(I) >= A(K)
   29 |         pragma Loop_Invariant (for all K in A'First .. I - 1 => A(I) >= A(K));
      |                                                                 ^~~~~~~~~~~

stupid_sort.adb:29:66: medium: loop invariant might not be preserved by an arbitrary iteration, cannot prove A(I) >= A(K)
   29 |         pragma Loop_Invariant (for all K in A'First .. I - 1 => A(I) >= A(K));

Tip: Understand tool messages

That’s when I decided to call up my local SPARK friend Yannick, to teach me about loop invariants, and how you prove such an algorithm with SPARK.

Tip: Have an expert on call

I am Yannick, I’ll jump in.

Let’s return first to the addition of the Swap procedure in Lionel’s code. Remember the postcondition he wrote for Swap:

procedure Swap (A : in out A_Type; I : A_Index_Type; J : A_Index_Type)
   with Post => A (J) = A'Old (I) and A (I) = A'Old (J)

That’s true, but not sufficient. Indeed, all the provers know about variable A after a call to Swap (since this parameter is modified in Swap) is what the postcondition of Swap says about it. And… it says nothing about all the values of A outside of indexes I and J! So there is no chance that GNATprove will be able to prove our sorting code.

This need to identify what has changed during a call with enough precision is known as the frame condition, and it’s a typical beginner’s mistake to forget it. Here, a suitable postcondition would be:

procedure Swap (A : in out A_Type; I : A_Index_Type; J : A_Index_Type)
   with Post => A = (A’Old with delta I => A(J)’Old, J => A(I)’Old)

which states exactly the content of A after a call to Swap.

Tip: Beware the frame condition

Ironically, you get the same result if you don’t specify a postcondition at all on Swap, because GNATprove will inline the call in that case! Inlining of calls and unrolling of loops are powerful techniques for automating the proof of programs, without the need for a user to specify contracts and loop invariants. But as usual with automation, the risk is that, when the situation gets more complex, automation fails and the user is left with a complex situation that she does not understand (a.k.a. the curse of automation). Loop unrolling explains why Lionel was initially "feeling lucky!" with Z3 proving the postcondition of Sort for small array sizes, without having to write loop invariants. But the curse of automation stroke back when increasing the array size, as the loops are not unrolled anymore, or the unrolling leads to unmanageable formulas for automatic provers.

One way to remain aware of the choices in terms of automation made by GNATprove is to use the switch --info which outputs such information:

stupid_sort.adb:11:14: info: local subprogram "Sort" only analyzed in the context of calls
  add a contract to analyze it separately from calling contexts
stupid_sort.adb:33:21: info: unrolling loop
stupid_sort.adb:39:04: info: analyzing call to "Sort" in context
stupid_sort.adb:39:04: info: in inlined body at line 18
  unrolling loop
stupid_sort.adb:38:04: info: in inlined body at line 17
  unrolling loop

Tip: Use the right tool configuration (and switches!)

Contrary to the heroic (masochistic?) Lionel, I did not aim at rediscovering how the algorithm worked by looking at the code. Instead, I took the time to read the short article, and to convince myself that I understood why it worked.

Tip: Understand the code that you want to prove

Well, at least I thought I understood the algorithm. More on that later.

I started by defining suitable types for the index of the array, one being slightly larger than the other, in order to accommodate empty ranges of values (when we start the iteration):

type I_Type_Base is new Integer range 0 .. 5;
subtype I_Type is I_Type_Base range 1 .. I_Type_Base'Last;
type A_Type is array (I_Type) of Natural;

Tip: Define suitable types with the tightest constraints

Types are the best specifications, because you get their properties everywhere a value of the type is used, without having to repeat the properties in assertions, preconditions, etc.

Then, I defined expression functions for the important properties used in the article: that the array over a given range is sorted, and that the maximum of the array over a given range is at a given index.

function Sorted (A : A_Type; From : I_Type; To : I_Type_Base) return Boolean 
is
   (for all I in From .. To =>
      (for all J in I .. To =>
         A(I) <= A(J)));

function Is_Max (M : I_Type; A : A_Type; From, To : I_Type) return Boolean 
is
  (for all I in From .. To => A (I) <= A (M))
with
  Pre => M in From .. To;

Tip: Define suitable expression functions for important properties

In general, it’s better to give a name to important properties that will be used multiple times in specifications, assertions and ghost code, because it makes the specification and proof more readable, and because it can help automatic provers. Here, it’s all the more helpful for automatic provers because it isolates quantifiers “for all”.

The form of these properties is also quite important. Instead of using the natural expression of sortedness as “every element is less or equal to the next”, I have expressed it as the equivalent transitive closure of this property, that is, “every element is less or equal to every element that follows”. That makes a big difference for automatic provers, as establishing the latter from the former requires inductive reasoning, which automatic provers are poor at, while the former is immediately deduced from the latter.

Similarly, I could have defined Is_Max as a function that takes as input the maximum Max of the array over a given range, instead of the index M of the maximum, and computes the conjunction:

(for all I in From .. To => A (I) <= Max)
   and then
(for some M in From .. To => Max = A (M))

But this uses an existential quantification “for some” which is hard to establish for automatic provers, as this requires exhibiting the “witness” index M here. So I went for the definition that did not require an existential quantification, by passing the index M as parameter instead.

Tip: Use idiomatic definitions of properties that help automatic provers

With these definitions, the loop invariant of the external loop can be expressed very easily, based on the properties of that loop described in the article:

pragma Loop_Invariant (Sorted (A, A'First, I));
pragma Loop_Invariant (Is_Max (I, A, A'First, A'Last));

It gets a bit trickier for the internal loop, as the explanations in the article in terms of values of K before and after I cannot be directly translated into loop invariants. Plus the article talks of going from index I to index I+1 while the code goes from index I-1 and index I (when expressing the loop invariant). That’s where I tried out various loop invariants in my head, with the support of pen-and-paper to understand how the algorithm really worked. And it went 💡 the array stayed sorted at every iteration of the internal loop for all indexes lower than I!

pragma Loop_Invariant (Sorted (A, A'First, I));

And during the internal loop, the maximum value over the whole array was either located at index I-1 (at the beginning of the iteration) or at index I (after getting to iteration J = I-1):

pragma Loop_Invariant
   (declare
       M : constant I_Type := (if J < I then I-1 else I);
    begin
       Is_Max (M, A, A'First, A'Last));

Tip: Use pen-and-paper to really understand the code that you want to prove

It’s all too easy to “understand” that something works by going through the steps of an explanation/demonstration (like in the article), without being able to really understand why it works. Proof requires us to understand why it works.

As GNATprove could not prove the loop invariant of the inner loop, even at level 4 (all provers get called at that level, with a substantial timeout of 60 seconds per check), I tried running the program through its test with assertions enabled, and… the loop invariant failed at runtime! No wonder GNATprove could not prove it.

Tip: Execute assertions during tests to help debug them

Just looking at the failing loop invariant, I realized that it could not be true during the first iteration of the loop, where we have not yet identified the maximum value of the array. So I added a special case for I=A’First:

pragma Loop_Invariant
  (if I = A'First then
     Is_Max (I, A, A'First, J)
   else
      (declare
          M : constant I_Type := (if J < I then I-1 else I);
       begin
          Is_Max (M, A, A'First, A'Last)));

But the test was still failing at runtime! This time, I ran the test in the debugger (gdb inside GNAT Studio), to display values of all variables when hitting the failing loop invariant. That was a case of off-by-one error, the test “J < I” in the definition of constant M should be “J < I-1”. With that, the test was running without error.

Tip: Debug failing assertions by running tests with assertions in the debugger

I reran GNATprove on the code. It reported cases of runtime errors when calling functions Sorted/Is_Max, which I fixed. But GNATprove could still not prove the loop invariant of the internal loop stating that A remained sorted from A’First to I-1. I read again the explanations in the article, which confirmed that it was true. Yet it was not proved. So I looked at the code, to see how the loop invariant at iteration J can be deduced from the loop invariant at iteration J-1 and the execution of the current iteration. And it was not provable! Because we were missing the information that, up to value J=I-1, the value at index I is greater than all values seen so far:

pragma Loop_Invariant
   (if J < I then
      (for all K in A'First .. J => A(K) <= A(I)));

Now, GNATprove proves the code easily (at level 2).

Tip: Debug failing loop invariants by reasoning inductively, from one iteration to the next

All that remained was to generalize the array index type to range over all positive integers instead of the range 1..5, and to allow unconstrained arrays whose length is not known statically:

type I_Type_Base is new Integer range 0 .. Integer'Last;
subtype I_Type is I_Type_Base range 1 .. I_Type_Base'Last;
type A_Type is array (I_Type range <>) of Natural;

Because we’ve used so far relative attributes A’First/A’Last instead of the equivalent magic numbers 0/5, the adjustments needed are minimal.

Tip: Use language features to facilitate the generalization of assertions and ghost code

By reviewing the final code, I realized that the last loop invariant could be written without quantification, as we already get that the array is sorted up to index I-1:

pragma Loop_Invariant (if J < I then A(J) <= A(I));

Tip: Simplify ghost code once the program is proved

That concludes with the functional proof of this program.

Or does it? We have not proved here that the result of sorting is a shuffling of the entry. This is doable with SPARK, but just… not easy. In 99.9% of the cases, in practice you’d stop here because:

It is obvious from looking at the code to see that the result is a shuffling of the entry, as A is only modified by swapping two of its elements. So this can be verified by review easily.
Doing it by proof brings not much more assurance than the review, but a high cost both for the initial proof and for maintenance of the contracts and lemmas as the code or the tools evolve.

Tip: Don’t prove what you don’t need to prove

To recap, we saw 18 tips that could greatly facilitate your use of proof with SPARK (or any similar program proof environment):

Tip: Start small (simple small data types, a single subprogram)
Tip: Start with a passing test
Tip: Use proof automation (and turn up the knob)
Tip: Avoid the Assertocalypse
Tip: Understand tool messages
Tip: Have an expert on call
Tip: Beware the frame condition
Tip: Use the right tool configuration (and switches!)
Tip: Understand the code that you want to prove
Tip: Define suitable types with the tightest constraints
Tip: Define suitable expression functions for important properties
Tip: Use pen-and-paper to really understand the code that you want to prove
Tip: Execute assertions during tests to help debug them
Tip: Debug failing assertions by running tests with assertions in the debugger
Tip: Debug failing loop invariants by reasoning inductively, from one iteration to the next
Tip: Use language features to facilitate the generalization of assertions and ghost code
Tip: Simplify ghost code once the program is proved
Tip: Don’t prove what you don’t need to prove

Here is the final code for our version of the Stupid Sort algorithm:

pragma Ada_2022;

procedure Stupid_Sort with SPARK_Mode => On
is
   type I_Type_Base is new Integer range 0 .. Integer'Last;
   subtype I_Type is I_Type_Base range 1 .. I_Type_Base'Last;
   type A_Type is array (I_Type range <>) of Natural;

   function Sorted (A : A_Type; From, To : I_Type_Base) return Boolean is
      (for all I in From .. To =>
         (for all J in I .. To =>
            A(I) <= A(J)))
   with
     Pre => From in A'Range
       and then To <= A'Last;

   function Is_Max (M : I_Type; A : A_Type; From, To : I_Type_Base) return Boolean is
     (for all I in From .. To => A (I) <= A (M))
   with
     Pre => M in From .. To
       and then From in A'Range
       and then To in A'Range;

   procedure Sort (A : in out A_Type)
   with
     Post => (if A'Length > 0 then Sorted (A, A'First, A'Last))
   is
   begin
      for I in A'range loop
         for J in A'range loop
            if A(I) < A(J) then
               declare
                  Tmp : constant Natural := A(I);
               begin
                  A (I) := A (J);
                  A (J) := Tmp;
               end;
            end if;

            pragma Loop_Invariant (if J < I then A(J) <= A(I));
            pragma Loop_Invariant (Sorted (A, A'First, I-1));
            pragma Loop_Invariant
              (if I = A'First then
                Is_Max (I, A, A'First, J)
              else
                (declare
                   M : constant I_Type := (if J < I-1 then I-1 else I);
                 begin
                   Is_Max (M, A, A'First, A'Last)));
         end loop;

         pragma Loop_Invariant (Sorted (A, A'First, I));
         pragma Loop_Invariant (Is_Max (I, A, A'First, A'Last));
      end loop;
   end Sort;

   A : A_Type (1 .. 1000);
begin
   for I in A'Range loop
     A (I) := Integer (A'Last - I + 1);
   end loop;

   Sort (A);
end Stupid_Sort;

Posted in #SPARK #Proof #Algorithms

About Yannick Moy

Yannick Moy is Head of the Static Analysis Unit at AdaCore. Yannick contributes to the development of SPARK, a software source code analyzer aiming at verifying safety/security properties of programs. He frequently talks about SPARK in articles, conferences, classes and blogs (in particular blog.adacore.com). Yannick previously worked on source code analyzers for PolySpace (now The MathWorks) and at Université Paris-Sud.

About Lionel Matias

Lionel's been a real-time software engineer for over 10 years, working on real-time safety-critical systems. Fan of everything Ada, he's a believer in the bright future of SPARK and that the technology to build safe software should be in the hands of everyone. He's been experimenting in the recent years with automated testing, static analysis and proof technology, and his new motto is "just use my cycles !".