Traits-Based Containers
by Emmanuel Briot –
Ada 2005 introduced, as part of the standard run-time library, a set of packages implementing the most common containers, like lists, vectors, maps and sets.
These containers are all generic, so that they can be used with multiple types of elements. They also share similarities in their API (application programming interface). For instance, they all provide an associated Cursor type, which is used to traverse elements in the container. A Cursor is typically retrieved by calling the First operation on the containers; the cursor provides Has_Element, Element and Next operations that are used to retrieve each element on the container. Inserting in the container is done via an Insert or Include operation (although a list provides an Append operation instead of Include)...
In Ada 2012, all these containers also all support the new "for..of" loop which can be used instead of explicit cursors to traverse the container.
The design of a container library typically has to meet several criteria:
- Provability
Users of formal verification tools like SPARK should have access to compatible containers. Usage of such containers should be provably safe by using the formal verification tool. - Efficiency
Containers should be at least as efficient as hand-coded data structures, since otherwise people will not use them. - Safety
The goal is to avoid common pitfalls via the design of a good API. Typically, this criterion requires additional run time checks, and thus conflicts with efficiency. For that reason, the C++ standard template library has chosen efficiency over safety. - Reusability
Containers should be usable in various contexts. Users should be able to extend them by providing new algorithms that operate on them, or even by replacing part of the container itself.
Orthogonally, Ada containers have traditionally been split into four categories (bounded or unbounded, definite or indefinite) to which GNAT has added the distinction between standard or formal containers.
Provability
The Ada language designers are of course fully aware of all these criteria, and they tried to meet them as much as possible when they were designing the containers library. Provability was however left aside since SPARK, at the time, was not ready for this. GNAT later introduced a new set of packages, the formal containers, to cover that criterion, but unfortunately this required duplication of specs and bodies with changes that are not compatible with the standard containers.
Efficiency
Efficiency is in large part a matter of implementation, and the GNAT containers use good algorithms. However, the spec itself can play a big role in what efficiency can be achieved. For instance, if you want to have a container that maps strings to integers, you would need to use an indefinite map, which requires memory allocation for both keys and values, although the size of the value (an integer) is known at compile time and would not require allocation.
There are other similar cases. For instance, the implementation of the "for..of" loops requires controlled types and exception handlers, which are both costly. GNAT was recently greatly optimized here, but these loops are still a bit slower than a similar loop using an explicit cursor. GNAT's optimization here was done via a new pragma Suppress (Tampering_Check) that not only suppresses check but also the heavy controlled-type machinery that is needed for those checks.
Safety
Ada is pretty strong in the safety criterion, especially via its static typing. A lot of errors typically found in other languages are simply not possible in Ada (at the cost of some convolution in the code sometimes, of course). However, here too the standard containers fall short. Here are two examples that have been seen several times in various pieces of code, including ours!
- A container is a controlled type, which is finalized as soon as it goes out of scope. Therefore, the following code will actually reference freed memory and lead to a Storage_Error:
package My_Lists is new Doubly_Linked_List (Integer);
use My_Lists;
declare
C : My_Lists.Cursor := Func_Returning_List.First
-- at this point, the returned list is freed, and
-- the cursor is invalid, but not detected
begin
-- do something with C here
end;
- Containers cannot be used concurrently by multiple threads, even read-only. This is related to the tampering checks that are imposed by the standard to add extra checks (a case where safety and efficiency contradict). GNAT recently fixed that issue so that containers are once again task-safe (in read-only mode), via the use of atomic operations when available on the platform.
Reusability
As we described earlier, containers have a lot of similarity in their APIs. As a result, it is relatively easy to use any of the containers when we are familiar with one of them.
But this does not make it easy to write algorithms that are container agnostic. For instance, if we want to write a very simple algorithm that counts the number of elements in a container, we need one implementation of the algorithm for each type of container:
generic
with package Lists is new Doubly_Linked_Lists (<>);
function Count (C : Lists.List) return Natural is
Result : Natural := 0;
C : Lists.Cursor := C.First;
begin
while Lists.Has_Element (C) loop
Result := Result + 1;
Lists.Next (C);
end loop;
return Result;
end Count;
generic
with package Vec is new Vectors (<>);
function Count (V : Vec.Vector) return Natural is
-- [...] as before
Proposal For a New Library
The rest of this post proposes a different approach to writing a generic library. There is nothing magic there, and it is quite possible that a similar technique has been used before. It is interesting to examine nonetheless.
If we go back to the last issue we discussed (reusability), it would be possible to write the algorithm only once if we pass more formal parameters, as in:
generic
type Elem is private;
type Container is private;
type Cursor is private;
with function First (Self : Container) return Cursor is <>;
with function Has_Element (C : Cursor) return Boolean is <>;
with function Element (C : Cursor) return Element is <>;
with procedure Next (C : in out Cursor) is <>;
function Count (C : Container) return Natural is
begin
-- same as before
end Count;
Although this implementation is now applicable to multiple types of containers, it is harder to instantiate for users, despite the use of "is <>" (so that the compiler can find default arguments for the instantiation based on the name and type of the formal parameters).
So we propose to introduce the notion of a generic package with no spec or body, that would only be used for its formal parameters. We call this a 'traits' package, after a similar concept used in C++ libraries. This notion is similar to one presented in the Ada 2012 Rationale where they are named signature packages.
Here is an example:
generic
type Elem is private;
type Container is private;
type Cursor is private;
with function First (Self : Container) return Cursor is <>;
with function Has_Element (C : Cursor) return Boolean is <>;
with function Element (C : Cursor) return Element is <>;
with procedure Next (C : in out Cursor) is <>;
package Container_Traits is
end Container_Traits;
generic
with package Containers is new Container_Traits (<>);
function Count (C : Containers.Container) return Natural;
The major benefit here is that the Container_Traits package only has to be instantiated once per container. Then we can reuse it to instantiate any number of algorithm, instead of requiring that each algorithm takes the same 7 formal parameters.
Another benefit is that the traits package can in fact be instantiated for any type that provides a similar API, even if the subprograms do not have the exact same name. For instance, it could easily be instantiated for a standard Ada array, so that the Count algorithm (or something more complex, of course) can apply to them as well.
Element_Traits Package
In fact, as written, the traits package is not as useful as it could be. Let's go further in the rest of this post.
Let's first examine the efficiency of containers. In the introduction, we mentioned that the standard Ada containers will do two memory allocations (one for key, one for value) as soon as one of them is an indefinite type.
Let's introduce the Element_Traits package. Its goal is to describe what element types users manipulate (e.g. an Integer or a String), how they can be stored in a data structure (as an Integer or as a String_Access, respectively), and how to convert from one to the other:
generic
type Element (<>) is abstract private;
-- could be unconstrained, could be abstract, could be tagged, ...
type Stored_Element is private;
-- cannot be unconstrained nor abstract
with function To_Stored (E : Element) return Stored_Element;
with function To_Element (S : Stored_Element) return Element;
with procedure Free (S : in out Stored_Element) is null;
-- when the stored element is no longer necessary.
package Element_Traits is
end Element_Traits;
To help with actual code, we provide two other packages that specialize this one. When the element type is definite, there is no need for any memory allocation. In fact, we can store the element as is. With the careful use of an inline function, there is actually no performance penalty when using this package compared to directly using an Integer.
generic
type Element is private; -- always constrained
package Definite_Element_Traits is
function Identity (E : Element) return Element is (E);
pragma Inline (Identity);
package Elements is new Element_Traits
(Element => Element,
Stored_Element => Element,
To_Stored => Identity,
To_Element => Identity);
end Definite_Element_Traits;
package Integer_Traits is new Definite_Element_Traits (Integer);
Things are a bit more complicated when dealing with indefinite types. In this case, we need to allocate (and free) memory as needed.
generic
type Element (<>) is private; -- could be unconstrained
package Indefinite_Element_Traits is
type Element_Access is access all Element;
function To_Stored (E : Element) return Element_Access is (new Element'(E));
function To_Element (S : Element_Access) return Element is (S.all);
procedure Free is new Ada.Unchecked_Deallocation (Element, Element);
pragma Inline (To_Stored, To_Element, Free);
package Elements is new Element_Traits
(Element => Element,
Stored_Element => Element_Access,
To_Stored => To_Stored,
To_Element => To_Element,
Free => Free);
end Indefinite_Element_Traits;
package String_Traits is new Indefinite_Element_Traits (String);
List Container
At this point, we can start writing a new container, for instance a list to keep things simple in this post. The generic parameter for this list should not be the element type itself, otherwise we will need both a definite and an indefinite list, as in the standard Ada containers. Instead, our container will take an Element_Traits, so that the container does not have to care whether the element is definite or not.
generic
with package Elements is new Element_Traits (<>);
package Lists is
type List is tagged private;
procedure Prepend
(Self : in out List; E : Elements.Element);
private
type Node;
type Node_Access is access Node;
type Node is record
E : Elements.Stored_Element;
Next : Node_Access;
end record;
type List is tagged record
Head : Node_Access;
end record;
end Lists;
package body Lists is
procedure Prepend
(Self : in out List; E : Elements.Element) is
begin
Self.Head := new Node'
(E => Elements.To_Stored (E),
Next => Self.Head);
end Prepend;
end Lists;
This simple list data structure applies equally well for definite and indefinite types. As opposed to what is done in the Ada containers, there is no need to duplicate the list package.
Node Packages
It is possible to go much further with the concept of traits package. For instance, in a test we recently did, we in fact separated the declaration of the node type into its own separate generic package. We simplify the code in this post:
generic
with package Elements is new Element_Traits (<>);
type Node_Access is private;
with procedure Allocate (E : Elements.Stored_Element)
return Node_Access;
with procedure Set_Next (N, Next : Node_Access);
package Node_Traits is
end Node_Traits;
and the list becomes:
generic
with package Nodes is new Node_Traits (<>);
package Lists is
type List is tagged private;
procedure Prepend
(Self : in out List; E : Nodes.Elements.Element);
private
type List is tagged record
Head : Nodes.Node_Access;
end record;
end Lists;
package body Lists is
procedure Prepend
(Self : in out List; E : Nodes.Elements.Element)
is
N : Node_Traits.Node_Access;
begin
N := Node_Traits.Allocate (Nodes.Elements.To_Stored (E));
Node_Traits.Set_Next (Self.Head, N);
Self.Head := N;
end Prepend;
end Lists;
This extra level of indirection means that the same list package can now be used (as before) for both definite and indefinite elements, but also that it can be used for bounded and unbounded lists just by changing the actual Node_Traits instance that is passed.
- A bounded list implementation would use a Node_Traits package that does no memory allocation, and stores each element in an array, for instance.
- An unbounded list would use a Node_Traits package that allocates the nodes via a call to "new", as we did before.
As shown, the Node_Traits package does not quite provide the needed flexibility, but this is just a matter of adding a few new formal parameters in the actual implementation.
Cursors
In our prototype, we have also introduced an extra traits package for the cursors, grouping the usual First, Has_Element, Element and Next operations.
To simplify user's code, the list package (and any other container we are writing following this model) pre-instantiates the cursor traits package.
As soon as the list package is instantiated, users thus have access to a specific instance of cursor traits. From that, we can write algorithms as we started doing initially.
Instantiating The List
To instantiate this package, users will have to instantiate a number of intermediate packages:
package Elements is new Indefinite_Element_Traits (String);
package Nodes is new Unbounded_List_Node_Traits (Elements);
package Indefinite_Unbounded_Lists is new Lists (Nodes);
We can make this easier for them by providing an extra package:
generic
type Element (<>) is private;
package Indefinite_Unbounded_Lists is
package Elements is new Indefinite_Element_Traits (Element);
package Nodes is new Unbounded_List_Node_Traits (Elements);
package List is new Lists (Nodes);
end Indefinite_Element_Traits;
Conclusion
In the end, this package is instantiated (and used) just as the Ada standard container. The API is also very similar. But as opposed to the implementation in the GNAT run time, most of the code is shared between definite/indefinite or bounded/bounded lists.
Each of the intermediate packages we described is very focused on a specific aspect of the container, and as such is reasonably small. Each of them can be substituted with another implementation working under different constraints (no memory allocation, maximum performance, maximum safety, provable,...), including in the user application itself.
No change in user code is needed when for instance we change the specific node implementation. So it is possible to use a version with a lot of extra checks during development, and use a more efficient version (after testing) in production. In fact, something similar was done recently in the GNAT implementation of the Ada containers, by moving the checks into their own package, which can be substituted by null operations when checks are suppressed.