I have already described in another post why users must write loop invariants to use the formal verification tool GNATprove on SPARK subprograms with loops. We have shown in previous posts what loop invariants are necessary to prove loops over arrays and containers. But we've not talked yet about how you come up with the right loop invariant for your code. That's the purpose of this post.
It is not easy to start writing loop invariants. In fact, it is widely considered as one of the main hardships for beginners in formal verification. As a consequence, this is one of the topics on which we spend the most time during our training courses on SPARK. The difficulty comes from the fact that one needs some understanding of four widely different topics to write a good loop invariant:
- the general theory behind loop invariants,
- the semantics of the programming language used,
- the specific application of the general theory to the programming language used in a given tool, and
- the detailed knowledge of what the loop performs.
So it seems at first sight that one needs to be a bit of a logician, a language expert, a tool builder, and a programmer, to write a good loop invariant, but not everyone can be Donald Knuth! In fact, this skill is accessible to any programmer, thanks to a combination of tool support and appropriate training, as my colleagues from Altran have experienced during the past 10 years teaching SPARK to programmers with no prior experience in formal verification.
I can even try to summarize the required knowledge on all four topics above in one sentence for each, in the case of the tool GNATprove that we develop for SPARK 2014:
- general theory: A loop invariant should provide all the necessary information about variables modified in the loop, which is otherwise lost for proving properties inside the loop (including the loop invariant itself) and after the loop.
- language semantics: Between two iterations of the loop, the only information available for variables written to directly in the loop, or through calls inside the loop (including global output variables, and output parameters), and only those, is what the loop invariant mentions.
- tool GNATprove: The logical encoding of SPARK additionally preserves some information about modified variables across loop iterations: the range of a for-loop, the condition of a while-loop, the bounds of an array variable, the discriminants of a record variable.
- loop behavior: The programmer should mention in the loop invariant not only what properties the loop modifies, but also what properties the loop maintains up to the nth iteration, when these properties involved variables modified in the loop.
This may still sound a little theoretic if you've never used a formal verification tool. Nothing beats trying it out with a tool like GNATprove, to get quick feedback on properties proved on not. To make it easier, we have defined a methodology in four steps, that I have documented recently in the SPARK 2014 Toolset User's Guide. These four steps correspond to the four properties that a good loop invariant should fulfill, in increasing order of difficulty:
- [INIT] It should be provable in the first iteration of the loop.
- [INSIDE] It should allow proving absence of run-time errors and local assertions inside the loop.
- [AFTER] It should allow proving absence of run-time errors, local assertions and the subprogram postcondition after the loop.
- [PRESERVE] It should be provable after the first iteration of the loop.
With GNATprove, these four properties conveniently correspond to different sets of checks to prove, so it's very easy to assess your progress! The idea is to progressively add information in the loop invariant so that it fulfills INIT first, then INSIDE, eventually AFTER, and finally PRESERVE. The User's Guide details the application of this methodology to binary search. I propose to do the same here on a simpler map function over an array. The map function takes in parameter an array of (possibly absent) data elements to treat, and treats them when present. Here is the spec, with contracts that express these properties:
package Database_Operations with SPARK_Mode is type Index is range 1 .. 100_000; type Data is record Present : Boolean; Treated : Boolean; end record; type Database is array (Index range <>) of Data; procedure Treat (X : in out Data) with Pre => not X.Treated, Post => X.Treated; procedure Map (DB : in out Database) with Pre => (for all J in DB'Range => not DB (J).Treated), Post => (for all J in DB'Range => (if DB'Old (J).Present then DB (J).Treated)); end Database_Operations;
The implementation is a simple loop over the array, to treat elements that are present:
package body Database_Operations with SPARK_Mode is procedure Treat (X : in out Data) is begin X.Treated := True; end Treat; procedure Map (DB : in out Database) is begin for J in DB'Range loop if DB (J).Present then Treat (DB (J)); end if; end loop; end Map; end Database_Operations;
If we run GNATprove on this code, it does not manage to prove either that the precondition of the call to Treat is respected, or that the postcondition of Map holds:
database_operations.adb:13:13: warning: precondition might fail database_operations.ads:15:14: info: postcondition proved database_operations.ads:18:48: info: index check proved database_operations.ads:20:45: warning: postcondition might fail, requires DB (J).Treated database_operations.ads:20:29: info: index check proved database_operations.ads:20:49: info: index check proved
This is expected: as the code above does not contain a loop invariant, GNATprove knows nothing about the content of array DB, since the latter is modified inside the loop by passing it as an input-output parameter to procedure Treat. A natural loop invariant that comes to mind in that case would state that, up to the current loop index J, all elements of the array which were present on entry have been treated. This can be expressed by adding the following pragma Loop_Invariant at the end of the loop body:
pragma Loop_Invariant (for all K in DB'First .. J => (if DB'Loop_Entry (K).Present then DB (K).Treated));
Note the use of attribute Loop_Entry above (presented in this post) to refer to the value of DB prior to any modification. Although it is true that component Present is not modified by the loop, GNATprove does not know it unless this information is added to the loop invariant, so here we prefer to simply use the value of array DB before modifications.
This loop invariant is sufficient to prove the subprogram postcondition (property AFTER), and it is proved in the first iteration of the loop (property INIT), but it is still not sufficient to prove the precondition of the call to Treat (property INSIDE) and it is not proved after the first iteration (property PRESERVE):
database_operations.adb:13:13: warning: precondition might fail database_operations.adb:15:10: info: loop invariant initialization proved database_operations.adb:17:50: warning: loop invariant might fail after first iteration, requires DB (K).Treated database_operations.adb:17:34: info: index check proved database_operations.adb:17:54: info: index check proved database_operations.ads:15:14: info: postcondition proved database_operations.ads:18:48: info: index check proved database_operations.ads:19:14: info: postcondition proved database_operations.ads:20:29: info: index check proved database_operations.ads:20:49: info: index check proved
Indeed, a crucial missing piece of information is that the part of DB that has not been visited yet in the nth iteration has not been modified by the loop, and still respects the property stated in the precondition of Map. As mentioned above, GNATprove does not know this: as DB is modifid in the loop, GNATprove can only use the information given in the loop invariant. So let's add a new pragma after the previous one:
pragma Loop_Invariant (for all K in J + 1 .. DB'Last => DB (K) = DB'Loop_Entry (K));
With this additional loop invariant, GNATprove manages to prove all checks in Map:
database_operations.adb:13:13: info: precondition proved database_operations.adb:15:10: info: loop invariant initialization proved database_operations.adb:15:10: info: loop invariant preservation proved database_operations.adb:17:34: info: index check proved database_operations.adb:17:54: info: index check proved database_operations.adb:19:28: info: overflow check proved database_operations.adb:19:50: info: index check proved database_operations.adb:19:70: info: index check proved database_operations.ads:15:14: info: postcondition proved database_operations.ads:18:48: info: index check proved database_operations.ads:19:14: info: postcondition proved database_operations.ads:20:29: info: index check proved database_operations.ads:20:49: info: index check proved
I believe the four steps methodology is a powerful tool to master the skill of loop invariant writing. I personally use it each time I have to prove a program with loops. For further reading/watching on loop invariants, you can look at:
- this other blog post on loop invariants in C programs for the Escher C Verifier.
- a fun introduction to the theory and practice of loop variants in this video by the Microsoft researcher Rustan Leino, that I warmly recommend.
- this article about the classification of loop invariants, by the creator of the Eiffel programming language, Betrand Meyer, which he also presented during this seminar.