AdaCore Blog

SPARK Crate of the Year: Unbounded containers in SPARK

by Manuel Hatzl (B&R), Guest Author

Why I created spark_unbound #

In this blog entry, I try to cover my experience using Ada/SPARK so far, why I created the spark_unbound library, and what this library is about.

Discovering Ada/SPARK and formal verification #

I discovered Ada when going through publications of one of my professors at my university. Knowing a bit of C and embedded projects from my work at B&R, Ada's syntax seemed to solve a lot of problems I was facing. Creating specific types, in-out parameters, and much more were very helpful. However, with hyped languages like Rust and Ada's smaller open source community, it is at the moment hard for me to choose Ada for hobbyist projects that do not run in safety-critical systems or on embedded platforms. The recent announcement by AdaCore of efforts to improve the interoperability between Rust and Ada will definitely make it easier to choose Ada in the future.

The big advantage Ada has over other languages, in my opinion, is SPARK. SPARK, being a subset of Ada, allows you to formally verify that your project is free of runtime errors. From what I have found, SPARK is the most convenient way to write formally verified software, and it is really well integrated with Ada and GNAT Studio. One might think that it is crazy to recommend SPARK for open source and/or hobbyist projects, given that even fewer projects are written in SPARK. However, I would argue that a formally verified open source project is more likely to be used by big companies, as formal verification can significantly help to prevent supply chain attacks (assuming that the necessary lower-level projects/libraries are formally verified as well). Sure, having lots of tests is good, but writing good tests is hard and tedious (especially if you need a lot of mocks), and tests must be kept in sync with the current source code. Otherwise, those tests are mere gimmicks, and I think it is harder to evaluate a project with lots of tests than one that is formally verified. Note that I do not consider projects that have neither tests nor proofs, as they are doomed to fail over time. With formal verification as it is integrated in SPARK, the normal source code you write already takes away the need for detailed unit tests. This lets you undertake low-level changes during development with far more confidence than unit tests could ever give you.

Having pre-/postconditions and asserts directly embedded in the source code also felt very natural to me while programming, as I always try to narrow down what is allowed at a certain line; in other languages, that just means a lot of if-statements.
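
As a first taste, here is a minimal sketch of such a contract (hypothetical names, not taken from spark_unbound): the precondition states what a caller must guarantee, and the postcondition what the function promises in return.

-- Minimal sketch of a SPARK contract (hypothetical example)
function Clamp (Value, Low, High : Integer) return Integer with
  Pre  => Low <= High,                   -- what the caller must guarantee
  Post => Clamp'Result in Low .. High;   -- what the function promises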

The problem of fewer open source projects remains, though. But since there are so few, you have a lot of room to create your own basic low-level project. Bigger projects will have to wait for now, or be prepared to take way longer than expected.

How to unbind SPARK #

Searching through those few open source projects written in SPARK, I discovered that all of them are bounded at least at one point. That seemed odd to me, as some of those projects were parsing JSON files, which may range from one or two objects to several thousands, including nesting. So why this limitation?

This is probably because heap allocation was not possible in SPARK until 2019, as pointers were not supported before. With the introduction of pointers in SPARK, using the heap for unbounded programs became possible.

Since I could not find an open source project that used these new possibilities to provide unbounded data structures, and the formal containers should not be used in production due to their performance, I decided to create spark_unbound.

Preparing to handle Storage_Error #

Dynamic allocation in Ada is done using the new keyword, and this is also the case in SPARK. At this point you might ask yourself: how does SPARK handle cases where there is not enough memory left to allocate?

In short, it doesn't. Instead, a Storage_Error is raised.

But I thought SPARK assures the absence of runtime errors and does not allow handling exceptions?
SPARK can assure the absence of all runtime errors except Storage_Error, and since exception handling is not included in SPARK, every call to new is tricky to handle, as it might raise this error.

The solution I chose was proposed by Fabien Chouteau in the linked GitHub issue. SPARK allows excluding certain code regions from analysis, and in those regions it is possible to handle exceptions.
With that solution in place, SPARK programs are truly free of runtime errors (when additionally using GNATstack to monitor the needed stack size).

Below is the function body template taken from the issue, showing how a Storage_Error may be handled:

function Alloc return Name is
  pragma SPARK_Mode (Off); -- Only the allocation itself is excluded from SPARK analysis
begin
  return new Object;
exception
  when Storage_Error =>
    return null; -- Signal allocation failure via null instead of propagating the exception
end Alloc;

Current state of spark_unbound #

Overall structure #

All provided packages are generic, and I tried to put as few constraints on user types as possible. At the moment, spark_unbound only provides an Unbound_Array type that mimics the Vector type in Ada.Containers.Vectors, and the Safe_Alloc package, which wraps allocation to handle Storage_Error as discussed above. The Safe_Alloc package provides two child packages, for definite and array types. Because exception handling cannot be proven by SPARK, AUnit has been used to validate the functionality of Alloc().
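
Inferred from the instantiation in the example below (so a sketch, not the exact source; the parameter name Ptr is my assumption), the spec of the Definite child package looks roughly like this:

-- Rough sketch of the Definite generic, inferred from how it is instantiated
generic
  type T is private;       -- any definite user type
  type T_Acc is access T;  -- access type designating T
package Spark_Unbound.Safe_Alloc.Definite is
  function Alloc return T_Acc;          -- returns null if allocation fails
  procedure Free (Ptr : in out T_Acc);  -- deallocates and resets the pointer
end Spark_Unbound.Safe_Alloc.Definite;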

Below is an example of how to use Safe_Alloc:

with Spark_Unbound.Safe_Alloc;

procedure Test is
  type Alloc_Record is record
    V1 : Integer;
    V2 : Natural;
    V3 : Positive;
  end record;

  type Record_Acc is access Alloc_Record;
          
  package Record_Alloc is new Spark_Unbound.Safe_Alloc.Definite(T => Alloc_Record, T_Acc => Record_Acc);
  Rec_Acc : Record_Acc;
begin
  Rec_Acc := Record_Alloc.Alloc; -- Note: no new keyword appears here; Alloc wraps the allocation

  -- check if Rec_Acc is NOT null and then do something

  Record_Alloc.Free(Rec_Acc);
end Test;

Unbound Array #

As mentioned, Unbound_Array in the package Spark_Unbound.Arrays mimics the functionality of the Vector type in Ada.Containers.Vectors. Internally, an array is allocated to fit all appended elements. If this array gets too small, a new one is allocated and all elements are copied over. At the moment, not all procedures and functions that can be proven have been implemented. Inserting and deleting elements anywhere other than at the end of the array is not yet possible.
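
The grow-and-copy step behaves roughly like the following sketch (all names are illustrative, not the library's actual code):

-- Hypothetical sketch of the grow-and-copy step
procedure Grow (Self : in out Unbound_Array; Success : out Boolean) is
  New_Arr : constant Array_Acc := Alloc_Array (New_Size => 2 * Self.Arr'Length);
begin
  if New_Arr = null then
    Success := False; -- allocation failed; the old array stays valid
    return;
  end if;
  New_Arr (Self.Arr'Range) := Self.Arr.all; -- copy existing elements
  Free_Array (Self.Arr);
  Self.Arr := New_Arr;
  Success := True;
end Grow;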

Procedures and functions that cannot be proven have been removed and will not be implemented. For example, the add operator + could fail when the underlying array would need to be resized to fit all elements but there is not enough contiguous memory.

All available procedures and functions are proven by GNATprove to achieve the platinum level.

Below is an example of how to use Unbound_Array:

with Spark_Unbound.Arrays;

procedure Test is
  package UA_Integer is new Spark_Unbound.Arrays(Element_Type => Integer, Index_Type => Positive);
  Test_UA : UA_Integer.Unbound_Array := UA_Integer.To_Unbound_Array(Initial_Capacity => 3);
  Success : Boolean;
begin
  -- Fill Array
  UA_Integer.Append(Test_UA, 1, Success);
  UA_Integer.Append(Test_UA, 2, Success);
  UA_Integer.Append(Test_UA, 3, Success);

  -- Now Append() needs to resize
  UA_Integer.Append(Test_UA, 4, Success);
end Test;

Note: You should check Success after every call to Append(). Otherwise, you won't be able to prove correct use.
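
In real code, each append would therefore be guarded, for instance like this:

UA_Integer.Append(Test_UA, 1, Success);
if not Success then
  return; -- allocation failed; handle the error here
end if;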

Repository and workflows #

spark_unbound is available on GitHub, with automatically generated documentation at mhatzl.github.io/spark_unbound/ using GNATdoc.
To get this documentation, the GNATdoc configuration was modified to only treat comments starting with --- as documentation comments. The documentation is generated using a GitHub action and hosted using GitHub Pages.

package Documentation is
  for Doc_Pattern use "^-";
  -- This considers comments beginning with "---" to be documentation
  -- Needed to ignore commented Ghost functions that would break GNATdoc
end Documentation;

Besides the documentation, two more workflows are defined:

  1. Build and prove the crate
  2. Run the AUnit tests

Next steps #

First, I would like to address the following points, but I am quite stuck on how to solve them.

  • Making the fields of the Unbound_Array record private

    I was not able to write some of the proofs with the fields being private.

  • Getting rid of the proof warning "memory accessed through objects of access type"

    This is probably due to the allocated array not being initialized on allocation. It is not a problem if one only uses the provided functions, but it bothers me.

Further goals are then to

  • make the crate thread-safe
  • implement more procedures for Unbound_Array to get closer to Vector functionality
  • implement the Generic_Sorting package
  • add a B-Tree structure

I am also considering two changes to the current implementation:

  1. Rename Unbound_Array to Vector

    Since the structure mimics the Vector type in Ada.Containers.Vectors, renaming it to Vector feels more natural.

  2. Remove exhaustive Append() functionality

    At the moment, when Capacity = Length, Append() tries to double the allocated memory, and if this doubled memory cannot be allocated, the requested memory is reduced linearly. Therefore, Append() only fails if current capacity + 1 cannot be allocated, as sketched below.
    While this squeezes out every last free memory region, it adds a significant performance overhead in those edge cases. My solutions would be to either fail if doubling is not possible, or reduce the requested memory in a logarithmic way. Any feedback regarding this is greatly appreciated.
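
Sketched in code, the current fallback strategy behaves roughly like this (all names are illustrative):

-- Hypothetical sketch of the current allocation fallback in Append()
Request := 2 * Capacity;           -- first, try to double the capacity
while Request > Capacity loop
  exit when Try_Alloc (Request);   -- assumed allocation attempt; stops on success
  Request := Request - 1;          -- linear reduction on failure
end loop;
-- The last attempted request is Capacity + 1, so Append() only fails
-- if even Capacity + 1 cannot be allocated.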

I hope spark_unbound helps to bring more people to SPARK, either by using it directly or as a template.

If you have further questions about spark_unbound or myself, check out the issues and discussions pages of spark_unbound, or contact me directly on Twitter @ManuelHatzl or LinkedIn.

Posted in #SPARK #Unbound #Data Structures

About Manuel Hatzl

Manuel is studying computer science at the University of Salzburg and working part time at B&R. One of his goals is to increase the use of SPARK for open source projects.