Project Hi-Lite started in 2010 with the objective of making it cost-effective to use formal verification instead of testing for critical software. Just a few months before, Airbus had presented their five “must-have” criteria for adopting a formal method: soundness, applicability to the code, usability by “normal” engineers on “normal” computers, improve on classical methods, certifiability. We have had these five criteria very much in mind during these three years, and here are the results for SPARK 2014 and the associated tool GNATprove.
soundness: This is the cornerstone of the language and the tools. It is not limited to saying: "we use Hoare logic". What we want to achieve is "maximum" soundness: not only the tools cannot state wrong properties when you use them appropriately, but it should not be easy to get a false sense of confidence in the software by misusing the tools (for example by stating a wrong axiom). Many decisions were taken to ensure maximum soundness, like forbidding users from stating axioms.
###strongSPARK 2014 is the largest possible subset of Ada that still makes it possible to do formal verification. In particular, it includes generics, tagged types, discriminants, dynamic types, early returns, and many other features that were excluded from SPARK 2005. Astrium case study validated that SPARK 2014 fits real critical code. And we have made it possible to combine formal verification with testing, for those parts of the code that cannot be formally analyzed. So it's not an all-or-nothing technology, but one that you can introduce in any Ada project really.
usability by normal engineers: This is the issue that made us most busy all these years. First, we have achieved a level of automation where most proofs go through automatically, in particular the proofs of absence of run-time error. Secondly, we have tightly integrated the tools in the developer workflow (for example, through use of project files and automatically computed dependencies) and development environment (for example, the ability to run GNATprove on a specific subprogram or line od code in GPS, and the display of problematic program paths in GPS).
improve on testing: The fact that formal verification is exhaustive makes it an ideal replacement for the costly unit-testing activity mandated for critical software. The main challenge here was to facilitate progressive adoption of formal verification. Because the subprogram contracts needed for formal verification are already useful for testing (some lesson from the Astrium case study), and because GNATprove can be applied separately to individual SPARK subprograms in any Ada program, switching to formal verification can be done in a matter of days.
certifiability: When we started project Hi-Lite, the formal methods supplement for the avionics standard DO-178 was not yet finalized. Since then, it has been published with the new version of the standard, DO-178C, so today, GNATprove can be used instead of testing for critical embedded programs in planes developed at the highest levels A or B. I have co-authored paper in a paper in IEEE Software (published this month) on how to deal with the subtle issue of coverage in that case. Of course, projects in other domains in which formal methods are already recognized (for example, railway) may also use GNATprove.
The final project meeting was the occasion to question what still needs to be improved, to better address users's needs. Regarding soundness, the modeling of floating-points as reals by GNATprove was pointed out by Astrium as problematic. Even the smallest theoretical concern that a property could be proved although it fails at run-time due to roundings, could be blocking for some users who wish to replace testing with formal verification. We agreed to investigate a true modeling of floating-points in GNATprove before the first release. Regarding applicability to the code, Astrium managed to develop its case study with almost 100% SPARK 2014 code. Regarding usability, both Altran and Astrium case studies emphasized the good level of automation of proof, although some efforts are still needed to make the technology truly usable by any engineer. In particular, Astrium case study required the insertion of assertions in the code to help the prover, as well as writing heavy loop invariants. We have ideas to improve on these specific issues before release 1.
All the slides presented during the final project meeting, as well as the final reports, are available from the project page. The slides and report on Astrium and Altran case studies are particularly interesting.