# SPARK Crate of the Year: Unbounded containers in SPARK
by Manuel Hatzl (B&R)
## Why I created `spark_unbound`
In this blog entry, I try to cover my experience using Ada/SPARK so far, why I created the `spark_unbound` library, and what this library is about.
### Discovering Ada/SPARK and formal verification
I discovered Ada while going through publications of one of my professors at university. Knowing a bit of C and embedded development from my work at B&R, Ada's syntax seemed to solve a lot of problems I was facing. Creating specific types, in-out parameters, and much more were very helpful. However, with hyped languages like Rust and Ada's smaller open source community, it is currently hard for me to choose Ada for hobbyist projects that do not run in safety-critical systems or on embedded platforms. AdaCore's recent announcement that it wants to improve the interoperability between Rust and Ada will definitely make it easier to choose Ada in the future.
The big advantage Ada has over other languages, in my opinion, is SPARK. SPARK, being a subset of Ada, allows you to formally verify that your project is free of runtime errors. From what I have found, SPARK is the most convenient way to write formally verified software, and it is really well integrated into Ada and GNAT Studio. One might think it is crazy to recommend SPARK for open source and/or hobbyist projects, given that even fewer projects are written in SPARK than in Ada. However, I would argue that a formally verified open source project is more likely to be used by big companies, as formal verification can significantly help to prevent supply chain attacks; that is, assuming the necessary lower-level projects and libraries are formally verified as well. Sure, having lots of tests is good, but writing good tests is hard and tedious (especially if you need a lot of mocks), and they must be kept in sync with the current source code. Otherwise, those tests are mere gimmicks, and I think it is harder to evaluate a project with lots of tests than a formally verified one. Note that I do not consider projects that have neither tests nor proofs, as they are doomed to fail over time. With formal verification as integrated in SPARK, the normal source code you write already takes away the need to write detailed unit tests. This lets you undertake low-level changes during development with far more confidence than unit tests could ever give you.
Having pre-/post-conditions and asserts directly embedded in the source code also felt very natural to me while programming, as I always try to narrow down what is allowed at a certain line; in other languages, that usually means a lot of if-statements.
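As a minimal illustration (a hypothetical example, not code from `spark_unbound`), such a contract might look like this:

```ada
-- Hypothetical example: the contract narrows down what is allowed
-- at this point, instead of defensive if-statements in the body.
procedure Increment (Value : in out Natural)
  with Pre  => Value < Natural'Last,  -- callers must prove this holds
       Post => Value = Value'Old + 1  -- GNATprove checks the body ensures this
is
begin
   Value := Value + 1;
end Increment;
```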
The problem of fewer open source projects remains, though. But since there are so few, you have a lot of room to create your own basic low-level project. Bigger projects will have to wait for now, or you should be prepared for them to take way longer than expected.
### How to unbind SPARK
Searching through those few open source projects written in SPARK, I discovered that all of them are bounded in at least one place. That seemed odd to me, as some of those projects parse JSON files, which may range from one or two objects to several thousand, including nesting. So why this limitation?
This is probably because heap allocation was not possible in SPARK until 2019, as pointers were not supported before then. With the introduction of pointers in SPARK, using the heap for unbounded programs became possible.
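As a generic illustration of what this enables (a sketch in plain Ada, not code from any specific project), heap allocation lets a structure grow with its input instead of being limited by a fixed bound:

```ada
procedure Pointer_Demo is
   -- Sketch only: a heap-allocated linked structure whose size is
   -- not fixed at compile time.
   type Node;
   type Node_Acc is access Node;
   type Node is record
      Value : Integer;
      Next  : Node_Acc;
   end record;

   Head : Node_Acc := new Node'(Value => 1, Next => null);
begin
   -- Each allocation may raise Storage_Error if memory runs out.
   Head.Next := new Node'(Value => 2, Next => null);
end Pointer_Demo;
```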
Since I could not find an open source project that used these new possibilities to provide unbounded data structures, and since the formal containers should not be used in production due to their performance, I decided to create `spark_unbound`.
### Preparing to handle `Storage_Error`
Dynamic allocation in Ada is done using the `new` keyword. This is also the case in SPARK. At this point you might ask yourself: how does SPARK handle cases where there is not enough memory left to allocate?

In short, it doesn't. Instead, a `Storage_Error` gets raised.
But I thought SPARK assures absence of runtime errors and does not allow handling exceptions?
SPARK can assure the absence of all runtime errors except `Storage_Error`, and since exception handling is not included in SPARK, every call to `new` is tricky to handle, as it might raise this error.
The solution I chose was proposed by Fabien Chouteau in the linked GitHub issue: SPARK allows excluding certain code regions from analysis, and in those regions it is possible to handle exceptions.
With that solution in place, SPARK programs are truly free of runtime errors (when additionally using GNATstack to monitor the needed stack size).
Below is the function body template taken from the issue, showing how a `Storage_Error` may be handled:
```ada
function Alloc return Name is
   pragma SPARK_Mode (Off); -- Only the allocation has to be excluded from SPARK
begin
   return new Object;
exception
   when Storage_Error =>
      return null; -- Callers check for null instead of handling the exception
end Alloc;
```
## Current state of `spark_unbound`
### Overall structure
All provided packages are generic, and I tried to put as few constraints on user types as possible. At the moment, `spark_unbound` only provides the `Unbound_Array` type, which mimics the `Vector` type in `Ada.Containers.Vectors`, and the `Safe_Alloc` package, a wrapper that handles `Storage_Error` as discussed above. The `Safe_Alloc` package provides two child packages, one for definite types and one for array types. Because exception handling cannot be proven by SPARK, AUnit has been used to validate the functionality of `Alloc()`.
Below is an example of how to use `Safe_Alloc`:
```ada
with Spark_Unbound.Safe_Alloc;

procedure Test is
   type Alloc_Record is record
      V1 : Integer;
      V2 : Natural;
      V3 : Positive;
   end record;
   type Record_Acc is access Alloc_Record;

   package Record_Alloc is new Spark_Unbound.Safe_Alloc.Definite
     (T => Alloc_Record, T_Acc => Record_Acc);

   Rec_Acc : Record_Acc;
begin
   Rec_Acc := Record_Alloc.Alloc; -- Note: no `new` keyword is needed here
   -- Check that Rec_Acc is NOT null and then do something with it
   Record_Alloc.Free(Rec_Acc);
end Test;
```
### Unbound Array
As mentioned, `Unbound_Array` in the package `Spark_Unbound.Arrays` mimics the functionality of the `Vector` type in `Ada.Containers.Vectors`. Internally, an array is allocated to fit all appended elements. If this array gets too small, a new one is allocated and all elements are copied over, as sketched below. At the moment, not all procedures and functions that could be proven have been implemented. Inserting and deleting elements anywhere other than at the end of the array is not yet possible.
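The following sketch shows the general grow-and-copy idea in plain Ada; the names and structure are hypothetical and simplified compared to the actual implementation:

```ada
with Ada.Unchecked_Deallocation;

procedure Grow_Demo is
   -- Hypothetical sketch of the grow-and-copy idea, not spark_unbound code.
   type Int_Array is array (Positive range <>) of Integer;
   type Int_Array_Acc is access Int_Array;
   procedure Free is new Ada.Unchecked_Deallocation (Int_Array, Int_Array_Acc);

   procedure Grow (Arr : in out Int_Array_Acc; New_Capacity : Positive) is
      Larger : constant Int_Array_Acc := new Int_Array (1 .. New_Capacity);
   begin
      Larger (Arr'Range) := Arr.all; -- copy all existing elements
      Free (Arr);                    -- release the old, smaller array
      Arr := Larger;
   end Grow;

   Data : Int_Array_Acc := new Int_Array'(1 .. 3 => 0);
begin
   Grow (Data, 6); -- e.g. double the capacity once the array is full
   Free (Data);
end Grow_Demo;
```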
Procedures and functions that cannot be proven have been removed and will not be implemented. For example, the add operator `+` could fail when the underlying array would need to be resized to fit all elements, but there is not enough contiguous memory.

All available procedures and functions are proven by GNATprove, achieving platinum level.
Below is an example of how to use `Unbound_Array`:
```ada
with Spark_Unbound.Arrays;

procedure Test is
   package UA_Integer is new Spark_Unbound.Arrays
     (Element_Type => Integer, Index_Type => Positive);

   Test_UA : UA_Integer.Unbound_Array :=
     UA_Integer.To_Unbound_Array(Initial_Capacity => 3);
   Success : Boolean;
begin
   -- Fill Array
   UA_Integer.Append(Test_UA, 1, Success);
   UA_Integer.Append(Test_UA, 2, Success);
   UA_Integer.Append(Test_UA, 3, Success);

   -- Now Append() needs to resize
   UA_Integer.Append(Test_UA, 4, Success);
end Test;
```
Note: You should check `Success` after every call to `Append()`. Otherwise, you won't be able to prove correct use.
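Building on the example above, such a check might look like this (what to do in each branch is left open):

```ada
UA_Integer.Append(Test_UA, 5, Success);
if Success then
   null; -- element was appended; proofs can rely on this branch
else
   null; -- allocation failed; handle the out-of-memory case here
end if;
```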
### Repository and workflows
`spark_unbound` is available on GitHub, with automatically generated documentation at mhatzl.github.io/spark_unbound/ using GNATdoc. To get this documentation, the configuration for GNATdoc was modified to only treat comments starting with `---` as documentation comments. The documentation is generated by a GitHub action and hosted via GitHub Pages.
```ada
package Documentation is
   for Doc_Pattern use "^-";
   -- This considers comments beginning with "---" to be documentation.
   -- Needed to ignore commented Ghost functions that would break GNATdoc.
end Documentation;
```
Besides the documentation, two more workflows are defined:
- Build and prove the crate
- Run the AUnit tests
## Next steps
First, I would like to address the following points, but I am quite stuck on how to solve them:
- Making the fields of the `Unbound_Array` record private: I was not able to write some proofs with the fields being private.
- Getting rid of the proof warning "memory accessed through objects of access type": this is probably due to the allocated array not being initialized on allocation. It is not a problem if one only uses the provided functions, but it bothers me.
Further goals are then to:

- make the crate thread safe
- implement more procedures for `Unbound_Array` to get closer to `Vector` functionality
- implement the `Generic_Sorting` package
- add a B-Tree structure
I am also considering two changes to the current implementation:
- Rename `Unbound_Array` to `Vector`: since the structure mimics the `Vector` type in `Ada.Containers.Vectors`, renaming it to `Vector` feels more natural.
- Remove exhaustive `Append()` functionality: at the moment, `Append()` tries to double the allocated memory every time the procedure is called when `Capacity = Length`. If this doubled memory cannot be allocated, the requested memory is reduced linearly. Therefore, `Append()` only fails if `current capacity + 1` cannot be allocated (see the sketch below). While this squeezes out every last free memory region, it adds a significant performance overhead in those edge cases. My proposed solutions are to either fail if doubling is not possible, or to reduce the requested memory logarithmically. Any feedback regarding this is greatly appreciated.
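To make the current behavior concrete, here is a minimal sketch of the doubling-then-linear-fallback strategy; the names and structure are hypothetical and heavily simplified compared to the actual `Append()`:

```ada
procedure Append_Growth_Demo is
   -- Hypothetical sketch: try to double the capacity and, on allocation
   -- failure, shrink the request linearly down to Capacity + 1.
   type Int_Array is array (Positive range <>) of Integer;
   type Int_Array_Acc is access Int_Array;

   -- Stand-in for Safe_Alloc: returns null instead of raising Storage_Error.
   function Try_Alloc (Size : Positive) return Int_Array_Acc is
      pragma SPARK_Mode (Off);
   begin
      return new Int_Array (1 .. Size);
   exception
      when Storage_Error => return null;
   end Try_Alloc;

   procedure Grow (Arr      : in out Int_Array_Acc;
                   Capacity : Positive;
                   Success  : out Boolean) is
      Request : Natural := 2 * Capacity; -- first attempt: double the capacity
   begin
      Success := False;
      while Request > Capacity loop
         declare
            Larger : constant Int_Array_Acc := Try_Alloc (Request);
         begin
            if Larger /= null then
               -- (copying the old elements and freeing Arr is omitted here)
               Arr     := Larger;
               Success := True;
               return;
            end if;
         end;
         Request := Request - 1; -- linear reduction of the request
      end loop;
      -- Success stays False only if even Capacity + 1 could not be allocated.
   end Grow;

   Data    : Int_Array_Acc := Try_Alloc (3);
   Success : Boolean;
begin
   Grow (Data, 3, Success);
end Append_Growth_Demo;
```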
I hope `spark_unbound` helps to bring more people to SPARK, either by using it directly or as a template.
If you have further questions about `spark_unbound` or myself, check out the issues and discussions pages of `spark_unbound`, or contact me directly via Twitter @ManuelHatzl or LinkedIn.