The AdaCore Blog http://blog.adacore.com/ An insight into the AdaCore ecosystem

The Make with Ada competition is back! http://blog.adacore.com/the-make-with-ada-competition-is-back Thu, 12 Sep 2019 12:50:57 +0000 Emma Adby http://blog.adacore.com/the-make-with-ada-competition-is-back

AdaCore’s fourth annual Make with Ada competition launched this week with over $8K in cash and prizes to be awarded for the most innovative embedded systems projects developed using Ada and/or SPARK. 

The contest runs from September 10, 2019, to January 31, 2020, and participants can register on the Hackster.io developer platform here.

What’s new?

Based on feedback from previous participants, we changed the competition evaluation criteria so projects will now be judged on:

  • Software quality - Does the software meet its requirements?

  • Openness - Is the project open source?

  • “Buzz factor” - Does it have the wow effect to appeal to the software community?

Further information about the judging criteria is available here.

We’ve also increased the number of prizes to give more projects a chance to win:

  • One First Prize, in the amount of 2000 (two thousand) USD

  • Ten Finalist Prizes, in the amount of 600 (six hundred) USD each

  • One Student-only Prize (an Analog Discovery 2 Pro Bundle worth 299.99 USD) will go to the best-ranking student finalist. A project submitted by a student is eligible for both the Student-only Prize and the cash prizes. 

Award winners will be announced in March 2020, and project submissions will be evaluated by a judging panel consisting of Bill Wong, Senior Technology Editor at Electronic Design, and Fabien Chouteau, AdaCore software engineer and author of the Make with Ada blog post series.

Don’t forget that the new and enhanced GNAT Community 2019 is also available for download for use in your projects! 

First Ada Virtual Conference organized by and for the Ada community http://blog.adacore.com/ada-virtual-conferences Thu, 05 Sep 2019 13:34:25 +0000 Maxim Reznik http://blog.adacore.com/ada-virtual-conferences

The Ada Community recently gathered around an exciting new initiative - an Ada Virtual Conference - to present Ada-related topics in a 100% remote environment.

The first such conference took place on August 10th, 2019, on the topic of the new features in Ada 202x. The conference took the form of a video/audio chat based on the open source platform jitsi.org. No registration was required: attendees just needed a web browser or the mobile application, and could participate anonymously. The presentation is just short of 25 minutes and is available on YouTube, Vimeo or via DropBox.

Note that while the talk presents the current draft for Ada 202x, some features are still under discussion and may not make it into the standard, or may end up with a different syntax or set of rules. That's in particular the case for all parallelism-related features and iterators (that is, up to slide 16 in the talk), which are being revisited by a group of people within AdaCore in order to submit recommendations to the Ada Rapporteur Group next year. 

As the talk concludes, please contribute to the new Ada/SPARK RFCs website if you have ideas about the future of the language!

Now the Ada Virtual Conference has a dedicated website, where anyone can vote for the topic of the next event. We invite you to participate!

Image by Tomasz Mikołajczyk from Pixabay.

Secure Use of Cryptographic Libraries: SPARK Binding for Libsodium http://blog.adacore.com/secure-use-of-cryptographic-libraries-spark-binding-for-libsodium Tue, 03 Sep 2019 11:54:00 +0000 Isabelle Vialard http://blog.adacore.com/secure-use-of-cryptographic-libraries-spark-binding-for-libsodium

The challenge faced by cryptography APIs is to make building functional and secure programs easy for the user. Even with good documentation and examples, this remains a challenge, especially because incorrect use is still possible. I made bindings for two C cryptography libraries, TweetNaCl (pronounced Tweetsalt) and Libsodium, with the goal of making these bindings easier to use than the original APIs, by making it possible to automatically detect a large set of incorrect uses. In order to do this, I made two bindings for each library: a low-level binding in Ada, and a higher level one in SPARK, which I call the interface. I used Ada's strong typing and SPARK proofs to enforce a safe and functional use of the subprograms in the library.

In this post I will explain the steps I took to create these bindings, and how to use them.

Steps to create a binding

I will use one subprogram as an example: Crypto_Box_Easy from Libsodium, which encrypts a message.

First, I generated a binding using the Ada spec dump compiler:

gcc -c -fdump-ada-spec -C ./sodium.h

This gives the following function declaration:

  function crypto_box_easy
     (c : access unsigned_char;
      m : access unsigned_char;
      mlen : Extensions.unsigned_long_long;
      n : access unsigned_char;
      pk : access unsigned_char;
      sk : access unsigned_char) return int  -- ./sodium/crypto_box.h:61
   with Import => True, 
        Convention => C, 
        External_Name => "crypto_box_easy";

Then I modified this binding. First, I changed the types used and added in and out parameter modes, and I removed the access parameters. Scalar parameters with out mode are passed using a temporary pointer on the C side, so this works even without explicit pointers. For unconstrained arrays like Block8 it is more complex: in Ada, unconstrained arrays are represented by what are called fat pointers, that is, a pointer to the bounds of the array together with a pointer to its first element, whereas in C the expected parameter is simply a pointer to the first element of the array. So a simple binding like this one should not work. What saves the situation is this line, which forces passing the pointer to the first element directly:

with Convention => C;

Thus we go from a low-level language where the memory is indexed by pointers to a typed language like Ada:

  function crypto_box_easy
     (c    :    out Block8;
      m    : in     Block8;
      mlen : in     Uint64;
      n    : in out Block8;
      pk   : in     Block8;
      sk   : in     Block8) return int  -- ./sodium/crypto_box.h:61
   with Import => True,
        Convention => C,
        External_Name => "crypto_box_easy";

After this, I created an interface in SPARK that uses this binding: the goal is to provide the same subprogram as in the binding, with some modifications. Some redundant parameters can be removed: for instance, the C function often asks for both an array and its length (like m and mlen), which is unnecessary since the length can be obtained with the 'Length attribute. Functions with out parameters must be changed into procedures to comply with SPARK rules. Finally, new types can be created to take advantage of strong typing, as well as preconditions and postconditions.

  procedure Crypto_Box_Easy
     (C  :    out Cipher_Text;
      M  : in     Plain_Text;
      N  : in out Box_Nonce;
      PK : in     Box_Public_Key;
      SK : in     Box_Secret_Key)
   with
       Pre => C'Length = M'Length + Crypto_Box_MACBYTES
       and then Is_Signed (M)
       and then Never_Used_Yet (N);

How to use strong typing, and why

Most of the parameters required by these subprograms are arrays: some arrays for messages, others for keys, etc. The type Block8 could be enough to represent them all:

type Block8 is array (Index range <>) of uint8;

But then anyone could use a key as a message, or a message as a key. To avoid that, in Libsodium I derived new types from Block8. For instance, Box_Public_Key and Box_Secret_Key are the key types used by the Crypto_Box_* subprograms. The type for messages is Plain_Text, and the type for messages after encryption is Cipher_Text. Thus I take advantage of Ada's strong typing to enforce the correct use of the subprograms and their parameters.
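
For illustration, the derived types can be declared along these lines (a sketch; the names of the size constants are assumptions mirroring the Libsodium C constants):

type Box_Public_Key is new Block8 (1 .. Crypto_Box_PUBLICKEYBYTES);
type Box_Secret_Key is new Block8 (1 .. Crypto_Box_SECRETKEYBYTES);

type Plain_Text  is new Block8;  --  unconstrained: messages of any length
type Cipher_Text is new Block8;  --  unconstrained: encrypted messages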

With TweetNaCl, I did things a bit differently: I created the different types directly in the binding, for the same result. Since TweetNaCl is a very small library, it was faster that way. In Libsodium I chose to let my first binding stay as close as possible to the generated one, and to focus on the interface where I use strong-typing and contracts (preconditions and postconditions).

Preconditions and postconditions

Preconditions and postconditions serve the same purpose as derived types: they enforce a specific use of the subprograms. There are two kinds of conditions:

The first kind are mostly conditions on array lengths. They are there to ensure the program will not fail. For instance:

Pre => C'Length = M'Length + Crypto_Box_MACBYTES;

This precondition says that the cipher text should be exactly Crypto_Box_MACBYTES bytes longer than the message that we want to encrypt. If this condition is not fulfilled, then execution will fail. Note that we can reference C'Length in the precondition, even though C is an out parameter, because the length attribute of an out parameter of an array type is available when the call is made, so we can reason about it in our precondition.

The other kind of condition is used to avoid an unsafe use of the subprograms. For instance, Crypto_Box uses a Nonce. A Nonce is a small array used as a complement to a key. In theory, to be safe, a key should be long, and used only for one message. However, it is costly to generate a new long key for each message. So we use one long key for every message, together with a Nonce which is different for each message but easy to generate. Thus the encryption is safe, but only if we remember to use a different Nonce for each message. To ensure that, I wrote this precondition:

Pre => Never_Used_Yet (N)

Never_Used_Yet is a ghost function, that is, a function that doesn't affect the program's behavior:

type Box_Nonce is limited private;
function Never_Used_Yet (N: Box_Nonce) return Boolean with Ghost;
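
The procedure that generates the Nonce advertises this property in its postcondition; as a sketch (the actual declaration in the interface may differ slightly):

procedure Randombytes (N : out Box_Nonce) with
  Post => Never_Used_Yet (N);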

When GNATprove is used, it sees that Randombytes (N : out Box_Nonce), the procedure that randomly generates the Nonce, has the postcondition Never_Used_Yet (N). So it deduces that when the Nonce is first generated, Never_Used_Yet (N) is true. Thus the first time N is used by Crypto_Box_Easy, the precondition is valid. But if N is used a second time, GNATprove cannot prove that Never_Used_Yet (N) is still true (because N is an in out parameter, so it could have been changed). That's why it cannot prove a program that calls Crypto_Box_Easy twice with the same Nonce:

GNATprove returns this when a Nonce is used on more than one message

There are ways around this condition: for instance, one could copy a randomly generated Nonce many times to use it on different messages. To avoid that, Box_Nonce is declared as limited private: it cannot be copied, and GNATprove cannot prove that a copied Nonce has the same Never_Used_Yet property as a generated one.

Box_Nonce and Never_Used_Yet are declared in the private part of the package with SPARK_Mode Off so that GNATprove treats them as opaque entities:

private
   pragma SPARK_Mode (Off);

   type Box_Nonce is new Block8 (1 .. Crypto_Box_NONCEBYTES);

   function Never_Used_Yet (N : Box_Nonce) return Boolean is (True);

Never_Used_Yet always returns True. It is a fake implementation; what matters is that it is hidden from the proof. It works at runtime because it is always used as a positive condition: in contracts it always appears as "Never_Used_Yet (N)" and never as "not Never_Used_Yet (N)", so the conditions are always valid and Never_Used_Yet doesn't affect the program's behavior, even if contracts are executed at runtime.

Another example of a precondition built with a ghost function is Is_Signed (M : Plain_Text). When you send an encrypted message to someone, you want that person to be able to check that the message is really from you, so that no one can steal your identity. To do this, you have to sign your message with Crypto_Sign_Easy before encrypting it with Crypto_Box_Easy. Trying to skip the signing step leads to a proof error:

GNATprove returns this when an unsigned message is encrypted

How to use Libsodium_Binding

The repository contains:

    • The project file libsodium.gpr

    • The library directory lib

    • The common directory which contains: 

        ◦ The libsodium_binding package, a low-level binding in Ada made from the files generated using the Ada spec dump compiler.

        ◦ The libsodium_interface package, a higher level binding in SPARK which uses libsodium_binding.

    • include, a directory which contains the headers of libsodium.

    • libsodium_body, a directory which contains the bodies.

    • outside_src, a directory which contains the headers that were removed from include, to fix a problem of double definition.

    • thin_binding, a directory which contains the binding generated using the Ada spec dump compiler.

    • The test directory which contains tests for each group of functions.

    • A testsuite which verifies the same tests as the ones in the test directory.

    • The examples directory, with examples that use different groups of programs together. It also contains a program where a Nonce is used twice, and as expected it fails at proof stage.

outside_src and thin_binding are not used for the binding, but I left them in the repository because they show what I changed from the original libsodium sources and the generated Ada binding.

This project is a library project, so the lib directory is the only thing necessary to use it.
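
A client project can then simply with the library project; here is a minimal sketch (the path is an assumption):

with "path/to/libsodium_binding/libsodium.gpr";

project Crypto_Client is
   for Source_Dirs use ("src");
   for Main use ("main.adb");
end Crypto_Client;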

How to use TweetNaCl_Binding

The repository contains:

    • The project file tweetnacl.gpr

    • The common directory which contains:

        ◦ The tweetnacl_binding package, a low-level binding in Ada made from the files generated using the Ada spec dump compiler.

        ◦ The tweetnacl_interface package, a higher level binding in SPARK which uses tweetnacl_binding.

        ◦ tweetnacl.h and tweetnacl.c, the header and the body of the library

        ◦ randombytes.c, which provides randombytes, a function used to generate random arrays.

    • The test directory: test1 and test1b are functional examples of how to use the main TweetNaCl subprograms; the others are examples of what happens if you give an array with the wrong size, if you try to use the same nonce twice, etc. They fail either at execution or at proof stage.

To use this binding, you just have to include the common directory in the Sources of your project file.
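
For example, a project file along these lines would do (a sketch; the path is an assumption, and C is listed so that tweetnacl.c and randombytes.c get compiled):

project My_Crypto_App is
   for Source_Dirs use ("src", "path/to/tweetnacl_binding/common");
   for Languages use ("Ada", "C");
   for Main use ("main.adb");
end My_Crypto_App;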

Proving a simple program doing I/O ... with SPARK http://blog.adacore.com/proving-a-simple-program-doing-io Tue, 09 Jul 2019 12:37:05 +0000 Joffrey Huguet http://blog.adacore.com/proving-a-simple-program-doing-io

The functionality of many security-critical programs is directly related to Input/Output (I/O). This includes command-line utilities such as gzip, which might process untrusted data downloaded from the internet, but also any servers that are directly connected to the internet, such as webservers, DNS servers and so on. However, I/O has received little attention from the formal methods community, and SPARK also does not help the programmer much when it comes to I/O. SPARK has been used to debug functions from the standard library Ada.Text_IO, but this approach lacked support for error handling and didn't allow going up to the application level.

As an example, take a look at the current specification of Ada.Text_IO.Put, which only recently has been annotated with some SPARK contracts:

procedure Put
  (File : File_Type;
   Item : String)
with
  Pre    => Is_Open (File) and then Mode (File) /= In_File,
  Global => (In_Out => File_System);

(We have suppressed the postcondition of this procedure, which talks about line and page length, a functionality of Ada.Text_IO which is not relevant to this blog post.)

We can see that Put has a very light contract that protects against some errors, such as calling it on a file that has not been opened for writing, but not against the many other possible errors related to writing, such as a full disk or missing file permissions. If such an error occurs, Put raises an exception, but SPARK does not allow catching exceptions. So in the above form, even a proved SPARK program that uses Put may terminate with an unhandled exception.

Moreover, Put does not specify what exactly is written. For example, one cannot prove in SPARK that two calls to Put with the same File argument write the concatenation of the two Item arguments to the file. 

This second problem can probably be solved using a stronger contract (though it would be difficult to do so in Ada.Text_IO, whose interface must respect the Ada standard), but together with the first point it means that, even if our program is annotated with a suitable contract and proved, we can only say something informal like “if the program doesn’t crash because of unexpected errors, it respects its contract”. 

In this blogpost, we propose to solve these issues as follows. We replace Put by a procedure Write which reports errors via its output Has_Written:

procedure Write
  (Fd          : int;
   Buf         : Init_String;
   Num_Bytes   : Size_T;
   Has_Written : out ssize_t);

We now can annotate Write with a suitable postcondition that explains exactly what has been written to the file descriptor, and that a negative value of Has_Written signals an error. 

It is no accident that the new procedure looks like the POSIX system call write. In fact we decided to base this experiment on the POSIX API that consists of open, close, read and write, and simply added very thin SPARK wrappers around those system calls. The advantage of this approach is that system calls never crash (unless there are bugs in the kernel, of course) - they simply flag errors via their return value, or in our case, the out parameter Has_Written. This means that, assuming our program contains a Boolean variable Error that is updated accordingly after invoking the system calls, we can now formally write down the contract of our program in the form:

if not Error then ...

And if one thinks about it, this is really the best any program dealing with I/O can hope for, because some errors cannot be predicted or avoided by the programmer, such as trying to write to a full disk or trying to open a file that does not exist.

We could further refine the above condition to distinguish the behavior of the program depending on the error that was encountered, but we did not do this in the work described here.

The other advantage of using a POSIX-like API is that porting existing programs from C to this API would be simpler.

We validated this approach by writing a SPARK clone of the “cat” utility. We were able to prove that cat copies the content of the files in its argument to stdout, unless errors were encountered. The project is available following this link.

How to represent the file system?

The main interest of the library is to be able to write properties about content, which means that we have to represent this content through something (global variables, state abstraction…). We decided to use maps that link a file descriptor to an unbounded string representing the content of the corresponding file. Formal_Hashed_Maps were convenient to use because they come with many functions that compare two different maps (e.g. Keys_Included_Except or Elements_Equal_Except).

In Ada, even Unbounded_String (from Ada.Strings.Unbounded) is in fact bounded: the maximum length of an Unbounded_String is Natural'Last, a length that could be exceeded by certain files. We had to create another library, relying on Ada.Strings.Unbounded, that accepts appending two unbounded strings even when their combined length exceeds that bound. The choice we made was to drop any character that would make the length overflow; e.g., the append function has the following contract:

function "&" (L, R : My_Unbounded_String) return My_Unbounded_String with
  Global         => null,
  Contract_Cases =>
    (Length (L) = Natural'Last               =>
       To_String ("&"'Result) = To_String (L),
     --  When L already has maximal length

     Length (L) < Natural'Last
       and then
     Length (R) <= Natural'Last - Length (L) =>
       To_String ("&"'Result) = To_String (L) & To_String (R),
     --  When R can be appended entirely

     others                                  =>
       To_String ("&"'Result) 
       = To_String (L) & To_String (R) (1 .. Natural'Last - Length (L)));
     --  When R cannot be appended entirely

Also, what you can see here is that all properties of unbounded strings are expressed through conversion to the String type: this allows us to use the existing axiomatization of strings (through arrays) instead of redefining one.

Another design choice was made here: what’s the content of a file we opened in read mode? In stdio.ads, we considered that the content of this file is strictly what we read from it. Because there is no way to know the full content of the file, we decided to implement cat as a procedure that "writes whatever it reads". Any given file could be modified while it is being read, or, in the case of stdin, the parent process may also read part of the data.

The I/O library

As said before, the library consists of thin bindings to system calls. Interfacing the scalar types (int, size_t, ssize_t) was easy. We also wanted to model the initialization of buffers more precisely. Indeed, when calling the Read procedure, the buffer might not be entirely initialized after the call; it is also possible to call Write on a partially initialized String, to copy only the first n values that are initialized. The current version of SPARK requires that arrays (and thus Strings) be fully initialized. More details are available in the SPARK User's Guide. A proposed evolution of the language (see here) is already prototyped in SPARK and allows safely manipulating partially initialized data. In the prototype, we declare a new type, Init_Char, that has the Init_By_Proof annotation.

subtype Init_Char is Character;
pragma Annotate(GNATprove, Init_By_Proof, Init_Char);

type Init_String is array (Positive range <>) of Init_Char;

With this type declaration, it is now possible to use the attribute 'Valid_Scalars on Init_Char variables or slices of Init_String variables in Ghost code to specify which scalars have been initialized.
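
For instance, a postcondition on Read can use the attribute to state that exactly the bytes that were read are known to be initialized (a sketch only, not necessarily the contract used in the library):

procedure Read (Fd : int; Buf : out Init_String; Has_Read : out ssize_t) with
  Post =>
    (if Has_Read > 0 then
       Buf (Buf'First .. Buf'First - 1 + Natural (Has_Read))'Valid_Scalars);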

The next part was writing the bindings to C functions. An example, for Read, is the following:

function C_Read (Fd : int; Buf : System.Address; Size : size_t; Offset : off_t) return ssize_t;
pragma Import (C, C_Read, "read");

procedure Read (Fd : int; Buf : out Init_String; Has_Read : out ssize_t) is
begin
   Has_Read := C_Read (Fd, Buf'Address, Buf'Length, 0);
end Read;

Other than hiding the use of addresses from SPARK, this part was not very difficult.

The final part was adding the contracts to our procedures.

Firstly, there are no preconditions: system calls may return errors, but they accept any parameter as input. The only precondition we have in the library is on Write; it states that the characters we want to write to the file are initialized. 

Secondly, every postcondition is a case expression, where we give properties for each possible return value, e.g.:

procedure Open (File : char_array; Flags : int; Fd : out int) with
  Global => (In_Out => (FD_Table, Errors.Error_State, Contents)),
  Post   =>
    (case Fd is
       when -1                      => Contents'Old = Contents,
       when 0 .. int (OPEN_MAX - 1) =>
         Length (Contents'Old) + 1 = Length (Contents)
           and then
         Contains (Contents, Fd)
           and then
         Length (Element (Contents, Fd)) = 0
           and then
         not Contains (Contents'Old, Fd)
           and then
         Model (Contents'Old) <= Model (Contents)
           and then
         M.Keys_Included_Except (Model (Contents),
                                 Model (Contents'Old),
                                 Fd),
       when others                  => False);

The return value of Open will be either -1, which corresponds to an error, or a natural value (in my case, OPEN_MAX is 1024, the maximum number of files that can be open at the same time on my machine, so valid file descriptors range from 0 to 1023). If an error occurred, the Contents map is the same as before. If an appropriate file descriptor is returned, the postcondition states that the new Contents map has the same elements as before, plus a new empty unbounded string associated with the file descriptor. With regard to the functional model, these contracts are complete.

The Cat program

The cat program is split into two parts: the main program, which opens/closes the file(s) given as arguments and calls Copy_To_Stdout; and the Copy_To_Stdout procedure itself, which reads from the input file and writes what it reads to stdout.

Since errors in I/O can happen, we propagate them using a status flag from nested subprograms to the main program and handle them there. This point is the first difference with Ada.Text_IO.
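
As an illustration of this plumbing, the pattern looks roughly as follows (a hypothetical sketch modeled on the types described above, not the project's actual code):

procedure Copy_Once (Input : int; Err : out int) is
   Buf         : Init_String (1 .. 1024);
   Has_Read    : ssize_t;
   Has_Written : ssize_t;
begin
   Err := 0;
   Read (Input, Buf, Has_Read);
   if Has_Read < 0 then
      Err := 1;  --  read error: report it to the caller
      return;
   end if;
   --  Stdout is assumed to be the library's constant for file descriptor 1;
   --  only the first Has_Read (initialized) characters of Buf are written.
   Write (Stdout, Buf, Size_T (Has_Read), Has_Written);
   if Has_Written < 0 then
      Err := 1;  --  write error: report it to the caller
   end if;
end Copy_Once;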

The other difference is the presence of postconditions about data, for example in postcondition of Copy_To_Stdout:

procedure Copy_To_Stdout (Input : int; Err : out int) with
  Post =>
    (if Err = 0 then
       Element (Contents, Stdout)
         = Element (Contents'Old, Stdout) & Element (Contents, Input));

This postcondition is the only functional contract we have for cat, which is why it is so important. It states that, if no error occurred, the content of stdout is equal to its value before the call to Copy_To_Stdout, concatenated with the content we read from the input.

If we wanted to write a more precise contract, the main difficulty would be to handle the cases where an error occurred. For example, if we call cat on three different files, and an error occurs when copying the second file to stdout, we have no contract about the content of Stdout, and everything becomes unprovable. Adding contracts for these cases would require work on slices and sliding, and even more case-splitting, which would add more difficulty for the provers. 

Type definitions and helper subprograms to define the library take about 200 lines of code. The library itself has around 100 lines of contracts. The cat program has 100 lines of implementation and 1200 lines of Ghost code in order to prove everything. Around 1000 verification conditions for the entire project (I/O library + cat + lemmas) are discharged by the solvers to prove everything with auto-active proof. 

Conclusion

Cat looks like a simple program, and it is. But being able to prove correctness of cat shows that we are able to reason about contents and copying of data (here, between file descriptors), something which is necessary and largely identical for more complex applications like network drivers or servers that listen on sockets. We will be looking at extending our approach to code manipulating (network) sockets.

Using Ada for a Spanish Satellite Project http://blog.adacore.com/using-ada-for-a-spanish-satellite-project Tue, 18 Jun 2019 13:52:00 +0000 Juan Zamorano http://blog.adacore.com/using-ada-for-a-spanish-satellite-project

I am an Associate Professor at the Polytechnic University of Madrid (Universidad Politécnica de Madrid / UPM), in the Department of Architecture and Technology of Computer Systems. For the past several years I have been directing a team of colleagues and students in the development of the UPMSat-2 microsatellite. The project originally started in 2013 as a follow-on to the UPM-SAT 1, launched by an Ariane-4 in 1995.

The UPMSat-2 weighs 50kg, and its geometric envelope is a parallelepiped with a base measuring 0.5m x 0.5m and a height of 0.6m. The microsatellite is scheduled to be launched September 9, 2019 on a Vega launcher, and is expected to be operational for two years.

The primary goals of the project were:

  • to improve the knowledge of the project participants, both professors and students;
  • to demonstrate UPM’s capabilities in space technology;
  • to design, develop, integrate, test, launch and operate a microsatellite in orbit from within a university environment; and
  • to develop a qualified space platform that can be used for general purpose missions aimed at educational, scientific and technological demonstration applications.

The project encompasses development of the software together with the platform, thermal control, attitude control, and other elements. In 2014, we selected AdaCore’s GNAT cross-development environment for the UPMSat-2 microsatellite project’s real-time on-board and ground control software.

While Java is the primary language used to teach programming at UPM, Ada was chosen as the main programming language for our project because we considered it the most appropriate to develop high-integrity software. In total, the on-board software consists of more than 100 Ada packages, comprising over 38K lines of code. For the attitude control subsystem, we used C code that was generated automatically from Simulink® models (there are 10 source files in C, with a total of about 1,600 lines of code). We also used database and XML interfaces to support the development of the ground control software.

Ada is Easy to Learn

Since most of our students are last-year or graduate students, they generally have programming experience. However, they did not have experience with embedded systems or real-time programming, and none of them had any previous experience with GNAT, SPARK or Ada.

To teach Ada to our students, we provided them with John Barnes’ Programming in Ada 2012 textbook and spent a fair amount of time with it in the laboratory. The difficult part was not in understanding and using Ada, but rather in understanding the software issues and programming style associated with concurrency, exceptions, real-time scheduling, and large system design. Fortunately, Ada’s high-level concurrency model, simple exception facility, real-time support, and its many features for “programming in the large” (packages, data abstraction, child libraries, generics, etc.) helped to address these difficulties.

We also used a GNAT feature that saved us a lot of tedious coding time - the Scalar_Storage_Order attribute. The UPMSat-2’s on-board computer is big-endian and the ground computer is little-endian. Therefore, we had to decode and encode every telemetry and telecommand message to deal with the endianness mismatch. I learned about the Scalar_Storage_Order feature at an AdaCore Tech Day, and it works really well, even for 12-bit packet types.
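
As an illustration of the idea (not the actual UPMSat-2 code), a packet with 12-bit fields can be given a fixed big-endian layout, so that the same record declaration is interpreted identically on both computers:

with System;

package Telemetry is

   type Packet is record
      Id    : Natural range 0 .. 4095;  --  12-bit field
      Value : Natural range 0 .. 4095;  --  12-bit field
   end record
     with Bit_Order            => System.High_Order_First,
          Scalar_Storage_Order => System.High_Order_First;

   for Packet use record
      Id    at 0 range 0 .. 11;
      Value at 0 range 12 .. 23;
   end record;

end Telemetry;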

Although it would have been nice to have some additional tool support for things like database and XML interfacing, we found the GNAT environment very intuitive and especially appreciated the GPS IDE; it’s a great tool for developing software.

Why Ada?

I have been in love with Ada for a long time. I learned programming with Pascal and concurrent Pascal (as well as Fortran and COBOL); I find it frustrating that, at many academic institutions, modular, strongly-typed, concurrent languages such as Ada have sometimes been replaced by others that have much weaker support for good programming practices.

While I do not teach programming at UPM, my research group tries to use Ada whenever possible, because we consider it the most appropriate programming language for illustrating the concepts of real-time and embedded systems. I have to say that most of my students have also fallen in love with Ada. Our graduate students in particular appreciate the value and reliability that Ada brings to their final projects.

https://www.adacore.com/press/spanish-satellite-project

RFCs for Ada and SPARK evolution now on GitHub http://blog.adacore.com/rfcs-for-ada-and-spark-evolution-now-on-github Tue, 11 Jun 2019 12:46:54 +0000 Yannick Moy http://blog.adacore.com/rfcs-for-ada-and-spark-evolution-now-on-github

Ever wished that Ada was more this and less that? Or that SPARK had such-and-such feature for specifying your programs? Then you're not alone. The Ada-Comment mailing list is one venue for Ada language discussions, but many of us at AdaCore have felt the need for a more open discussion and prototyping of what goes into the Ada and SPARK languages. That's the main reason why we've set up a platform to collect, discuss and process language evolution proposals for Ada and SPARK. The platform is hosted on GitHub, and uses GitHub's built-in mechanisms to allow people to propose fixes or evolutions for Ada & SPARK, or give feedback on proposed evolutions.

For SPARK, the collaboration between Altran and AdaCore allowed us to completely redesign the language as a large subset of Ada, which now includes object orientation (added in 2015), concurrency (added in 2016) and even pointers (now available!). But we're reaching the point where catching up with Ada cannot lead us much further, and we need broader involvement from users to inform our strategic decisions. 

Regarding Ada, the language evolution has always been a collective effort by an international committee, but here too we feel that more user involvement would be beneficial to drive future evolution, including for the upcoming Ada 202X version. Note that there is no guarantee that changes discussed and eventually prototyped & implemented will ever make it into the Ada standard, even though AdaCore will do its best to collaborate with the Ada Rapporteur Group (ARG).

You will see that we've started using the RFC process internally. That's just the beginning. We plan to use this platform much more broadly within AdaCore and the Ada community to evolve Ada and SPARK in the future. Please join us in that collective effort if you are interested!

Using Pointers in SPARK http://blog.adacore.com/using-pointers-in-spark Thu, 06 Jun 2019 12:22:00 +0000 Claire Dross http://blog.adacore.com/using-pointers-in-spark

I joined the SPARK team during the big revamp leading to the SPARK 2014 version of the proof technology. Our aim was to include in SPARK all the features of Ada 2012 that did not specifically cause problems for the formal verification process. Since that time, I have always noticed the same introduction to the SPARK 2014 language: a subset of Ada, excluding features not easily amenable to formal analysis. Following this sentence was a list of the more notable excluded features. Over the years, this list of features has started to shrink, as (restricted) support for object oriented programming, tasking and others was added to the language. Up until now, the most notable feature that was still missing, to my mind, was pointer support (or support for access types, as they are called in Ada). I always thought that this was a feature we were never going to include. Indeed, absence of aliasing is a key assumption of the SPARK analysis, and removing it would induce so much additional annotation burden for users that it would make the tool hardly usable. This was what I believed, and this was what we kept explaining to users, who regretted the absence of an often-used feature of the language.

I think it was work on the ParaSail language, and the emergence of the Rust language, that first made us look again into supporting this feature. Pointer ownership models did not originate with Rust, but Rust made the restrictions associated with them look tractable, and maybe even desirable from a safety point of view. So what is pointer ownership? Basically, the idea is that an object designated by a pointer always has a single owner, which retains the right to either modify it, or (exclusive or) share it with others in a read-only way. Said otherwise, we always have either several copies of the pointer which allow only reading, or a single copy of the pointer that allows modification. So we have pointers, but in a way that allows us to ignore potential aliases… What a perfect fit for SPARK! So we began to look into how ownership rules were enforced in Rust and ParaSail, and how we could adapt some of them for Ada without introducing too many special cases and new annotations. In this post, I will show you what we came up with. Don’t hesitate to comment and tell us what you like / don’t like about this feature.

The main idea used to enforce single ownership for pointers is the move semantics of assignments. When a pointer is copied through an assignment statement, the ownership of the pointer is transferred to the left-hand side of the assignment. As a result, the right-hand side loses the ownership of the object, and therefore loses the right to access it, both for writing and for reading. In the example below, the assignment from X to Y causes X to lose ownership of the value it references. As a result, the last assertion, which reads the value of X, is illegal in SPARK, leading to an error message from GNATprove:

procedure Test is
  type Int_Ptr is access Integer;
  X : Int_Ptr := new Integer'(10);
  Y : Int_Ptr;                --  Y is null by default
begin
  Y := X;                     --  ownership of X is transferred to Y
  pragma Assert (Y.all = 10); --  Y can be accessed
  Y.all := 11;                --  both for reading and writing
  pragma Assert (X.all = 11); --  but X cannot, or we would have an alias
end Test;
test.adb:9:20: insufficient permission on dereference from "X"
test.adb:9:20: object was moved at line 6

In this example, we can see the point of these ownership rules. To correctly reason about the semantics of a program, SPARK needs to know, when a change is made, which objects are potentially impacted. Because it assumes that there can be no aliasing (at least no aliasing of mutable data), the tool can easily determine which parts of the environment are updated by a statement, be it a simple assignment or, for example, a procedure call. If we were to break this assumption, we would need to either assume the worst (that all references can be aliases of each other) or require the user to explicitly annotate subprograms to describe which references can be aliased and which cannot. In our example, SPARK can deduce that an assignment to Y cannot impact X. This is only correct because of the ownership rules that prevent us from accessing the value of X after the update of Y.

Note that a variable which has been moved is not necessarily lost for the rest of the program. Indeed, it is possible to assign it again, restoring ownership. For example, here is a piece of code that swaps the pointers X and Y:

declare
  Tmp : Int_Ptr := X; --  ownership of X is moved to Tmp
                      --  X cannot be accessed.
begin
  X := Y; --  ownership of Y is moved to X
          --  Y cannot be accessed
          --  X is unrestricted.
  Y := Tmp; --  ownership of Tmp is moved to Y
            --  Tmp cannot be accessed
            --  Y is unrestricted.
end;

This code is accepted by the SPARK tool. Intuitively, we can see that writing at top-level into X after it has been moved is OK, since it will not modify the actual owner of the moved value (here Tmp). However, writing in X.all is forbidden, as it would affect Tmp (don’t hesitate to look at the SPARK Reference Manual if you are interested in the formal rules of the move semantics). For example, the following variant is rejected:

declare
  Tmp : Int_Ptr := X; --  ownership of X is moved to Tmp
                      --  X cannot be accessed.
begin
  X.all := Y.all;
insufficient permission on dereference from "X"
object was moved at line 2

Moving is not the only way to transfer ownership. It is also possible to borrow the ownership of (a part of) an object for a period of time. When the borrower disappears, the borrowed object regains the ownership, and is accessible again. This is what happens, for example, for mutable parameters of a subprogram when the subprogram is called. The ownership of the actual parameter is transferred to the formal parameter for the duration of the call, and should be returned when the subprogram terminates. In particular, this disallows procedures that move some of their parameters away, as in the following example:

type Int_Ptr_Holder is record
   Content : Int_Ptr;
end record;

procedure Move (X : in out Int_Ptr_Holder; Y : in out Int_Ptr_Holder) is
begin
   X := Y; --  ownership of Y.Content is moved to X.Content
end Move;
insufficient permission for "Y" when returning from "Move"
object was moved at line 3

Note that I used a record type for the type of the parameters. Indeed, SPARK RM has a special wording for in out parameters of an access type, stating that they are not borrowed but moved on entry and on exit of the subprogram. This allows us to move in out access parameters, which otherwise would be forbidden, as borrowed top-level access objects cannot be moved.

The SPARK RM also allows declaring local borrowers in a nested scope by using an anonymous access type:

declare
  Y : access Integer := X;    --  Y borrows the ownership of X
                              --  for the duration of the declare block
begin
  pragma Assert (Y.all = 10); --  Y can be accessed
  Y.all := 11;                --  both for reading and writing
end;
pragma Assert (X.all = 11);   --  The ownership of X is restored, 
                              --  it can be accessed again

But this is not supported yet by the proof tool, as it raises the complex issue of tracking modifications of X that were done through Y during its lifetime:

local borrower of an access object is not yet supported

It is also possible to share a single reference between several readers. This mechanism is called observing. When a variable is observed, both the observed object and the observer retain the right to read the object, but neither can modify it. As with borrowing, when the observer disappears, the observed object regains the permissions it had before (read-write or read-only). Here is an example. We have a list L, defined as a recursive pointer-based data structure in the usual way. We then observe its tail by introducing a local observer N using an anonymous access-to-constant type, and we do it again to observe the tail of N.
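
As a sketch, the list type can be declared like this (the actual declaration may differ):

type List;
type List_Acc is access List;
type List is record
   Val  : Integer;
   Next : List_Acc;
end record;

L : List_Acc :=
  new List'(Val  => 1,
            Next => new List'(Val  => 2,
                              Next => new List'(Val  => 3,
                                                Next => null)));

With L declared this way, the observers are introduced as follows: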

declare
   N : access constant List := L.Next; -- observe part of L
begin
   declare
      M : access constant List := N.Next; -- observe again part of N
   begin
      pragma Assert (M.Val = 3); --  M can be read
      pragma Assert (N.Val = 2); --  but we can still read N
      pragma Assert (L.Val = 1); --  and even L
   end;
end;
L.Next := null; --  all observers are out of scope, we can modify L

We can see that the three variables retain the right to read their content. But it is OK as none of them is allowed to update it. When no more observers exist, it is again possible to modify L.

In addition to single ownership, SPARK restricts the use of access types in several ways. The most notable one is that SPARK does not allow general access types. The reason is that we did not want to deal with accesses to variables defined on the stack and accessibility levels. Also, access types cannot be stored in subcomponents of tagged types, to avoid having access types hidden in record extensions.
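
For example, a declaration like the following, which could designate a variable allocated on the stack, is rejected (a sketch for illustration):

type Int_General_Ptr is access all Integer;  --  general access type: not allowed in SPARK

X : aliased Integer := 0;
P : Int_General_Ptr := X'Access;             --  would designate a stack object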

To get convinced that the rules enforced by SPARK still allow common use cases, I think the best is to look at an example.

A common use case for pointers in Ada is to store indefinite types inside data-structures. Indefinite types are types whose subtype is not known statically. It is the case for example for unconstrained arrays. Since the size of an indefinite type is not known statically, it is not possible to store it inside a data-structure, such as another array, or a record. For example, as strings are arrays, it is not possible to create an array that can hold strings of arbitrary length in Ada. The usual work-around consists in adding an indirection via the use of pointers, storing pointers to indefinite elements inside the data structure. Here is an example of how this can now be done in SPARK, for a minimal implementation of a dictionary. A simple vision of a dictionary is an array of strings. Since strings are indefinite, I need to define an access type to be allowed to store them inside an array:

type Word is not null access String;
type Dictionary is array (Positive range <>) of Word;

We can then search for a word in a dictionary. The function below is successfully verified in SPARK. In particular, SPARK is able to verify that no null pointer dereference may happen, due to Word being an access type with null exclusion:

function Search (S : String; D : Dictionary) return Natural with
  Post => (Search'Result = 0 and then
             (for all I in D'Range => D (I).all /= S))
  or else (Search'Result in D'Range and then D (Search'Result).all = S) is
begin
   for I in D'Range loop
      pragma Loop_Invariant
        (for all K in D'First .. I - 1 => D (K).all /= S);
      if D (I).all = S then
         return I;
      end if;
   end loop;
   return 0;
end Search;

Now imagine that I want to modify one of the words stored in my dictionary. The words may not have the same length, so I need to replace the pointer in the array. For example:

My_Dictionary (1) := new String'("foo");
pragma Assert (My_Dictionary (1).all = "foo");

But this is not great, as now I have a memory leak. Indeed, the value previously stored in My_Dictionary is no longer accessible and it has not been deallocated. The SPARK tool does not currently complain about this problem, even though the SPARK definition says it should (this check has not been implemented yet). But let’s try to correct our code nevertheless, by storing the value previously in the dictionary in a temporary and deallocating it afterward. First I need a deallocation procedure. In Ada, one can be obtained by instantiating the generic procedure Ada.Unchecked_Deallocation with the appropriate types (note that, as access objects are set to null after deallocation, I had to introduce a base type for Word without the null exclusion constraint):

type Word_Base is access String;
subtype Word is not null Word_Base;
procedure Free is new Ada.Unchecked_Deallocation (String, Word_Base);

Then, I can try to do the replacement:

declare
   Temp : Word_Base := My_Dictionary (1);
begin
   My_Dictionary (1) := new String'("foo");
   Free (Temp);
   pragma Assert (My_Dictionary (1).all = "foo");
end;

Unfortunately, this does not work: the SPARK tool complains with:

test.adb:36:37: insufficient permission on dereference from "My_Dictionary"
test.adb:36:37: object was moved at line 31

Here, line 31 is the line where Temp is defined and line 36 is the assertion. So, what is happening? In fact, this is due to the way the checking of single ownership is done in SPARK. As the analysis used for this verification is not value dependent, when a cell of an array is moved, the tool is never able to determine whether or not ownership of an array cell has been regained. As a result, if an element of an array is moved away, the array will never become readable again unless it is assigned as a whole. Better to avoid moving elements of an array in these conditions, right? So what can we do? If we cannot move, what about borrowing... Let us try with an auxiliary Swap procedure:

procedure Swap (X, Y : in out Word_Base) with
  Pre => X /= null and Y /= null,
  Post => X /= null and Y /= null
  and X.all = Y.all'Old and Y.all = X.all'Old
is
   Temp : Word_Base := X;
begin
   X := Y;
   Y := Temp;
end Swap;

declare
   Temp : Word_Base := new String'("foo");
begin
   Swap (My_Dictionary (1), Temp);
   Free (Temp);
   pragma Assert (My_Dictionary (1).all = "foo");
end;

Now everything is fine. The ownership on My_Dictionary (1) is temporarily transferred to the X formal parameter of Swap for the duration of the call, and it is restored at the end. Now the SPARK tool can ensure that My_Dictionary indeed has the full ownership of its content after the call, and the read inside the assertion succeeds. This small example is also verified by the SPARK tool.

I hope this post gave you a taste of what it would be like to program using pointers in SPARK. If you now feel like using them, a preview is available in the Community 2019 edition of GNAT+SPARK. Don’t hesitate to come back to us with your findings, either on GitHub or by email.

GNAT Community 2019 is here! http://blog.adacore.com/gnat-community-2019-is-here Wed, 05 Jun 2019 12:56:00 +0000 Nicolas Setton http://blog.adacore.com/gnat-community-2019-is-here

We are pleased to announce that GNAT Community 2019 has been released! See https://www.adacore.com/download.

This release is supported on the same platforms as last year:

  • Windows, Linux, and Mac 64-bit native

  • RISC-V hosted on Linux

  • 32-bit ARM hosted on 64-bit Linux, Mac, and Windows

GNAT Community now includes a number of fixes and enhancements, most notably:

  • The installer for Windows and Linux now contains pre-built binary distributions of Libadalang, a very powerful language tooling library for Ada and SPARK.

Check out the README for some additional platform-specific notes.

We hope you enjoy using SPARK and Ada!

Bringing Ada To MultiZone http://blog.adacore.com/bringing-ada-to-multizone Wed, 29 May 2019 21:30:00 +0000 Boran Car http://blog.adacore.com/bringing-ada-to-multizone

Introduction

C is the dominant language of the embedded world, almost to the point of exclusivity. Due to its age, and its goal of being a “portable assembler”, it deliberately lacks the type safety that languages like Ada provide. This lack of type safety in C is one of the reasons embedded device exploits are so common. Proposed solutions include partitioning the application into smaller intercommunicating blocks designed with the principle of least privilege in mind, and rewriting the application in a type-safe language. We believe that both approaches are complementary and want to show you how to combine the separation and isolation provided by MultiZone with iteratively rewriting parts in Ada.

We will take the MultiZone SDK demo and rewrite one of the zones in Ada. The full demo simulates an industrial application with a robotic arm. It runs on the Arty A7-35T board and interfaces with the PC and a robotic arm (OWI-535 Robotic Arm) via an SPI to USB converter. More details are available from the MultiZone Security SDK for Ada manual (https://github.com/hex-five/multizone-ada/blob/master/manual.pdf). We will just be focusing on the porting process here.

MultiZone SDK demo with 3 zones

MultiZone Security

MultiZone(TM) Security is the first Trusted Execution Environment for RISC-V - it enables development of a simple, policy-based security environment for RISC-V that supports rich operating systems through to bare metal code. It is a culmination of the embedded security best practices developed over the last decade, now applied to RISC-V processors. Instead of splitting the system into secure and non-secure domains, MultiZone(TM) Security provides policy-based, hardware-enforced separation for an unlimited number of security domains, with full control over data, code and peripherals.

MultiZoneTM Security consists of the following components:

  • MultiZone(TM) nanoKernel - lightweight, formally verifiable, bare metal kernel providing policy-driven hardware-enforced separation of ram, rom, i/o and interrupts.
  • InterZone(TM) Messenger - communications infrastructure to exchange secure messages across zones on a no-shared-memory basis.
  • MultiZone(TM) Configurator - combines fully linked zone executables with policies and kernel to generate the signed firmware image.
  • MultiZone(TM) Signed Boot - 2-stage signed boot loader to verify integrity and authenticity of the firmware image (sha-256 / ECC)

Contrary to traditional solutions, MultiZone(TM) Security requires no additional hardware, dedicated cores or clunky programming models. Open source libraries, third party binaries and legacy code can be configured in minutes to achieve unprecedented levels of safety and security. See https://hex-five.com/ for more details or check out the MultiZone SDK repository on GitHub - https://github.com/hex-five/multizone-sdk.

Ada on MultiZone

We port zone 3, the zone controlling the robotic arm, to Ada. The zone communicates with other zones via MultiZone APIs and with the robotic arm by bit-banging GPIO pins.

New Runtime

MultiZone zones differ from bare metal applications in that access to resources is restricted – a zone has only a portion of the RAM and FLASH and can only access some of the peripherals. Looking at our configuration (https://github.com/hex-five/multizone-ada/blob/master/bsp/X300/multizone.cfg), zone3 has the following access privileges:

Zone = 3 #
    base = 0x20430000; size =   64K; rwx = rx # FLASH
    base = 0x80003000; size =    4K; rwx = rw # RAM
    base = 0x0200BFF8; size =   0x8; rwx = r  # RTC
    base = 0x10012000; size = 0x100; rwx = rw # GPIO

In the Ada world, this translates to having a separate runtime that we need to create. Luckily, AdaCore has released the sources of their existing runtimes on GitHub - https://github.com/adacore/bb-runtimes and they have also included a how-to for creating new runtimes - https://github.com/AdaCore/bb-runtimes/tree/community-2018/doc/porting_runtime_for_cortex_m. Here’s how we create our customized HiFive1 runtime:

./build_rts.py --bsps-only --output=build --prefix=lib/gnat hifive1

This creates sources for building a runtime using a mix of sources from bb-runtimes and from the compiler itself thanks to the --bsps-only flag. Without this switch, we would need the original GNAT repository, which is not publicly available. Notice we don’t use --link, so our runtime sources are a proper copy and can be checked into a new git repo - https://github.com/hex-five/multizone-ada/tree/master/bsp/X300/runtime. Our runtime needs to be compiled and installed before it can be used, and we do that automatically as part of the Makefile for zone3 - https://github.com/hex-five/multizone-ada/blob/master/zone3/Makefile:

BSP_BASE := ../bsp
PLATFORM_DIR := $(BSP_BASE)/$(BOARD)
RUNTIME_DIR := $(PLATFORM_DIR)/runtime
GPRBUILD := $(abspath $(GNAT))/bin/gprbuild
GPRINSTALL := $(abspath $(GNAT))/bin/gprinstall

.PHONY: all
all:
    $(GPRBUILD) -p -P $(RUNTIME_DIR)/zfp_hifive1.gpr
    $(GPRINSTALL) -f -p -P $(RUNTIME_DIR)/zfp_hifive1.gpr --prefix=$(RUNTIME_DIR)
    $(AR) cr $(RUNTIME_DIR)/lib/gnat/zfp-hifive1/adalib/libgnat.a \
        $(RUNTIME_DIR)/hifive1/zfp/obj/*.o
    $(GPRBUILD) -f -p -P zone3.gpr
    $(OBJCOPY) -O ihex obj/main zone3.hex --gap-fill 0x00

Note that we use a combination of GPRbuild and Make to minimize code differences between the two repositories as much as possible. GPRbuild is limited to zone3 only.

Board support

We use the Ada Drivers Library on GitHub (https://github.com/AdaCore/Ada_Drivers_Library) as a starting point as it provides examples for a variety of boards and architectures. Our target is the X300, itself a modified HiFive1/FE310/FE300. The differences between FE310/FE300 and X300 are detailed on the multizone-fpga GitHub repository (https://github.com/hex-five/multizone-fpga).

We need to change the drivers for our application slightly: we want LD0 to indicate the status of the robotic arm, blinking red when disconnected and showing green when connected:

with FE310.Device; use FE310.Device;
with SiFive.GPIO; use SiFive.GPIO;

package Board.LEDs is

    subtype User_LED is GPIO_Point;

    Red_LED   : User_LED renames P01;
    Green_LED : User_LED renames P02;
    Blue_LED  : User_LED renames P03;

    procedure Initialize;
    -- MUST be called prior to any use of the LEDs

    procedure Turn_On (This : in out User_LED) renames SiFive.GPIO.Set;
    procedure Turn_Off (This : in out User_LED) renames SiFive.GPIO.Clear;
    procedure Toggle (This : in out User_LED) renames SiFive.GPIO.Toggle;

    procedure All_LEDs_Off with Inline;
    procedure All_LEDs_On with Inline;
end Board.LEDs;

Having the package named Board allows us to change the target board at compile time simply by providing a different folder containing the Board package. This is in line with what multizone-sdk does.
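
For instance, the project file can select the board folder through a scenario variable; this is a hypothetical sketch, not the actual zone3.gpr:

project Zone3 is
   type Board_Type is ("X300", "HiFive1");
   Board : Board_Type := external ("BOARD", "X300");

   --  Pick the sources of the selected board's Board package
   for Source_Dirs use ("src", "../bsp/" & Board & "/board");
   for Main use ("main.adb");
end Zone3;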

Code

MultiZone support

The MultiZone nanoKernel offers trap-and-emulate, so existing applications can be provided to MultiZone unmodified and will work as expected. Access to RAM, FLASH and peripherals needs to be allowed in the configuration file, though. MultiZone also provides functionality for increased performance, better power usage and inter-zone communication via the API in Libhexfive. Here we create an Ada wrapper around libhexfive32.a, providing the MultiZone-specific calls:

package MultiZone is

    procedure ECALL_YIELD; -- libhexfive.h:8
    pragma Import (C, ECALL_YIELD, "ECALL_YIELD");

    procedure ECALL_WFI; -- libhexfive.h:9
    pragma Import (C, ECALL_WFI, "ECALL_WFI");
    ...
end MultiZone;

A typical MultiZone optimized application will yield whenever it doesn’t have anything to do, to save on processing time and power:

with MultiZone; use MultiZone;

procedure Main is
begin
    -- Application initialization
    loop
      -- Application code
      ECALL_YIELD;
    end loop;
end Main;

If a zone wants to communicate with other zones, for example to receive commands and send back replies, it needs to use ECALL_SEND/ECALL_RECV. These send/receive a 16-byte chunk and return a status indicating whether the send/receive was successful. The prototypes are a bit special, as they take a void * parameter, which translates to System.Address:

function ECALL_SEND_C (arg1 : int; arg2 : System.Address) return int; -- libhexfive.h:11
pragma Import (C, ECALL_SEND_C, "ECALL_SEND");
function ECALL_RECV_C (arg1 : int; arg2 : System.Address) return int; -- libhexfive.h:12
pragma Import (C, ECALL_RECV_C, "ECALL_RECV");

We wrap these to provide a more Ada idiomatic alternative:

type Word is new Unsigned_32;
type Message is array (0 .. 3) of aliased Word;
pragma Pack (Message);
subtype Zone is int range 1 .. int'Last;

function Ecall_Send (to : Zone; msg : Message) return Boolean;
function Ecall_Recv (from : Zone; msg : out Message) return Boolean;

We hide the System.Address usage and provide a safer subtype for the source/destination zone, since a zone number cannot be negative or zero.

function Ecall_Send (to : Zone; msg : Message) return Boolean is
begin
    return ECALL_SEND_C (to, msg'Address) = 1;
end Ecall_Send;

function Ecall_Recv (from : Zone; msg : out Message) return Boolean is
begin
    return ECALL_RECV_C (from, msg'Address) = 1;
end Ecall_Recv;

With all the primitives in place, we can make a simple MultiZone optimized application that can respond to a ping from another zone:

with HAL;       use HAL;
with MultiZone; use MultiZone;

procedure Main is
begin
    -- Application initialization
    loop
      -- Application code
      declare
        msg : Message;
        Status : Boolean := Ecall_Recv (1, msg);
      begin
        if Status then
          if msg(0) = Character'Pos('p') and
            msg(1) = Character'Pos('i') and
            msg(2) = Character'Pos('n') and
            msg(3) = Character'Pos('g') then
            Status := Ecall_Send (1, msg);
          end if;
        end if;
      end;

      ECALL_YIELD;
    end loop;
end Main;

Keeping some legacy (OWI Robot)

We keep the SPI functionality and the Owi Task as C files and just create Ada bindings for them. spi_c.c implements the SPI protocol by bit-banging GPIO pins:

package Spi is

    procedure spi_init; -- ./spi.h:8
    pragma Import (C, spi_init, "spi_init");
    
    function spi_rw (cmd : System.Address) return UInt32; -- ./spi.h:9
    pragma Import (C, spi_rw, "spi_rw");

end Spi;

The OwiTask (owi_task.c) is a state machine containing different robot sequences. The main function is owi_task_run, while the others change the active state. owi_task_run returns the next command to send over SPI at a given moment in time:

package OwiTask is

    procedure owi_task_start_request; -- ./owi_task.h:8
    pragma Import (C, owi_task_start_request, "owi_task_start_request");

    procedure owi_task_stop_request; -- ./owi_task.h:9
    pragma Import (C, owi_task_stop_request, "owi_task_stop_request");

    procedure owi_task_fold; -- ./owi_task.h:10
    pragma Import (C, owi_task_fold, "owi_task_fold");

    procedure owi_task_unfold; -- ./owi_task.h:11
    pragma Import (C, owi_task_unfold, "owi_task_unfold");

    function owi_task_run (time : UInt64) return UInt32; -- ./owi_task.h:12
    pragma Import (C, owi_task_run, "owi_task_run");

end OwiTask;

The following Ada code runs the state machine for the OWI robot. Since the target functions are written in C, we see how Ada can interact with legacy C code:

-- OWI sequence run
if usb_state = 16#12670000# then
  declare
    cmd_word : UInt32 := owi_task_run(CLINT.Machine_Time);
    cmd_bytes : Cmd;
  begin
    if cmd_word /= -1 then
      cmd_bytes(0) := UInt8 (cmd_word and 16#FF#);
      cmd_bytes(1) := UInt8 (Shift_Right (cmd_word,  8) and 16#FF#);
      cmd_bytes(2) := UInt8 (Shift_Right (cmd_word, 16) and 16#FF#);
      rx_data := spi_rw (cmd_bytes'Address);
      ping_timer := CLINT.Machine_Time + PING_TIME;
    end if;
  end;
end if;

We now port over the OWI sequence selection:

declare
  Status : Boolean := Ecall_Recv (1, msg);
begin
  if Status then
    -- OWI sequence select
    if usb_state = 16#12670000# then
      case msg(0) is
        when Character'Pos('<') => owi_task_fold;
        when Character'Pos('>') => owi_task_unfold;
        when Character'Pos('1') => owi_task_start_request;
        when Character'Pos('0') => owi_task_stop_request;
        when others => null;
      end case;
    end if;
...

We leave it as an exercise for the reader to port the OWI sequence handler code to Ada.

LED Blinking

We want the LED color to change when the robot is connected or disconnected. Each of the LED colors is a separate pin on the board:

Red_LED   : User_LED renames P01;
Green_LED : User_LED renames P02;
Blue_LED  : User_LED renames P03;

One way of handling this is to create an Access Type that can store the currently selected LED color:

procedure Main is
    type User_LED_Select is access all User_LED;
    ...
    LED : User_LED_Select := Red_LED'Access;
    ...
begin
    ...
    -- Detect USB state every 1sec
    if CLINT.Machine_Time > ping_timer then
      rx_data := spi_rw(CMD_DUMMY'Address);
      ping_timer := CLINT.Machine_Time + PING_TIME;
    end if;

    -- Update USB state
    declare
      Status : int;
    begin
      if rx_data /= usb_state then
        usb_state := rx_data;

        if rx_data = UInt32'(16#12670000#) then
          LED := Green_LED'Access;
        else
          LED := Red_LED'Access;
          owi_task_stop_request;
        end if;
      end if;
    end;

The actual blinking is then just a matter of dereferencing the Access Type and calling the right function:

    -- LED blink
    if CLINT.Machine_Time > led_timer then
      if GPIO.Set (Red_LED) or GPIO.Set (Green_LED) or GPIO.Set (Blue_LED) then
        All_LEDs_Off;
        led_timer := CLINT.Machine_Time + LED_OFF_TIME;
      else
        All_LEDs_Off;
        Turn_On (LED.all);
        led_timer := CLINT.Machine_Time + LED_ON_TIME;
      end if;
    end if;

Closing

If you would like to know more about MultiZone and Ada, please reach out to Hex-Five Security - https://hex-five.com/contact-2/. I will also be attending the RISC-V Workshop in Zurich - https://tmt.knect365.com/risc-v-workshop-zurich/ if you would like to grab a coffee and discuss MultiZone and Ada on RISC-V.

]]>
Winning DTU RoboCup with Ada and SPARK http://blog.adacore.com/winning-dtu-robocup-with-ada-and-spark Thu, 16 May 2019 13:48:19 +0000 Allan Ascanius http://blog.adacore.com/winning-dtu-robocup-with-ada-and-spark

Using SPARK to prove absence of run time errors on a hobby project.

The Danish Technical University holds a yearly RoboCup where autonomous vehicles solve a number of challenges. Each solved challenge gives points and the team with the most points wins. The track is available for testing two weeks before the qualification round.

The idea behind and creation of RoadRunner for DTU RoboCup 2019.

RoadRunner is a 3D-printed robot with wheel suspension, based on the BeagleBone Blue ARM-based board and the Pixy 1 camera with custom firmware enabling real-time line detection. The code is written in Ada and formally proved correct with SPARK at the Silver level. SPARK Silver level proves that the code will execute without run-time errors like overflows, race conditions and deadlocks. During development, SPARK prevented numerous hard-to-debug errors from reaching the robot. The time-boxed testing period of DTU RoboCup made early error detection even more valuable. In our opinion the estimated time saved on debugging more than outweighs the time we spent on SPARK. The robot may still fail, but it will not be due to run-time errors in our code...

Ada was the obvious choice, as we both have years of experience with it. The BeagleBone Blue runs Debian Stable, and we cross-compile from a Debian laptop using the default GNAT FSF compiler from Debian Testing, which is easy to set up and not too old compared to GNAT Community Edition.

We initially decided not to use SPARK. According to some internet articles, SPARK had problems proving properties about floating-point values, and the Ada subset supported for tasking seemed too restricted. Support for floating point has improved since then, and tasking was extended to the Jorvik (extended Ravenscar) profile.

A race condition in the code changed the SPARK decision. A lot of valuable time was spent chasing it. The GNAT runtime on Debian armhf has no traceback info, so it was difficult to find what caused a Storage_Error. SPARK flow analysis detects race conditions, so it would prevent that kind of issue in the future.

It took a lot of effort before SPARK was able to do flow analysis without errors. SPARK rejects code with exception handlers, fixed-point to floating-point conversions, the standard Ada.Containers and task entries. Hopefully, future versions of SPARK will be able to analyze such code, or at least give a warning and continue analysis even when unsupported Ada features are present. Once the code base is in SPARK, it is quite easy to add new code in SPARK.

Experience with SPARK.

I had to start from scratch, with no previous experience with SPARK, formal methods or safety critical software. I read the online documentation to find examples on how to prove different types of properties.

SPARK is a tough code reviewer. It detects bad design decisions and forces a rewrite. Bad design is hard to test and prove, and SPARK does not miss any of it. This is very valuable for small projects without any code review.

SPARK code is easier to read. Defensive code is moved from exception handlers and if statements to preconditions. The resulting code has less branching and is easier to understand and test.

SPARK code is also more difficult to read. Loop invariants and assertions to prove loops can outnumber the Ada statements in the loop.

We used saturation as a shortcut to prove absence of overflows, in locations where some values could not be proved to stay in bounds. But this is not an ideal solution: if it does saturate, then the result is not correct.
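
As an illustration of the saturation shortcut, here is a minimal sketch - not taken from the RoadRunner code, with an illustrative range and names - of a saturating addition that SPARK can prove free of run-time errors, at the price of a clamped rather than exact result:

subtype Motor_Value is Integer range -1_000 .. 1_000;

function Saturated_Sum (A, B : Motor_Value) return Motor_Value is
   Raw : constant Integer := A + B;  --  at most 2_000 in magnitude, so no overflow
begin
   if Raw > Motor_Value'Last then
      return Motor_Value'Last;
   elsif Raw < Motor_Value'First then
      return Motor_Value'First;
   else
      return Raw;
   end if;
end Saturated_Sum;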

In our experience, proving absence of run time errors also detects real bugs, not only division by zero and overflows.

Two examples.

SPARK could not prove that a resulting value was in range: a scaling factor had been divided by instead of multiplied by.

SPARK could not prove that an array index was in range: the X and Y coordinates in a camera-image calculation had been swapped.

Stability. 

If code is in SPARK, then all changes must be analyzed by SPARK. Almost every coding error gives a run-time exception, and that happened several times on the track during testing. But when the code was analyzed with SPARK, the tool always found the exact line with the bug. After learning that lesson, we found that code analyzed with SPARK always worked the first time.

Having a stable build environment with SPARK and automated tests made it possible to make successful last-minute changes to the software just before the final run shown in the video below.


Here is the presentation video for our robot:


and the video of the final winning challenge:

]]>
Using SPARK to prove 255-bit Integer Arithmetic from Curve25519 http://blog.adacore.com/using-spark-to-prove-255-bit-integer-arithmetic-from-curve25519 Tue, 30 Apr 2019 19:00:00 +0000 Joffrey Huguet http://blog.adacore.com/using-spark-to-prove-255-bit-integer-arithmetic-from-curve25519

In 2014, Adam Langley, a well-known cryptographer from Google, wrote a post on his personal blog, in which he tried to prove functions from curve25519-donna, one of his projects, using various verification tools: SPARK, Frama-C, Isabelle... He describes this attempt as "disappointing", because he could not manage to prove "simple" things, like absence of runtime errors. I will show in this blogpost that today, it is possible to prove what he wanted to prove, and even more.

Algorithms in elliptic-curve cryptography compute numbers which are too large to fit into registers of typical CPUs. In this case, curve25519 uses integers ranging from 0 to \(2^{255} - 19\), which can't be represented as native 32- or 64-bit integers. In the original implementations from Daniel J. Bernstein and curve25519-donna, these integers are represented by arrays of 10 smaller, 32-bit integers. Each of these integers is called a "limb". Limbs are alternately 26 and 25 bits long. This forms a little-endian representation of the bigger integer, with low-weight limbs first. In section 4 of this paper, Bernstein explains both the motivation and the details of his implementation. The formula to convert an array A of 32-bit integers into the big integer it represents is: \[ \sum_{i=0}^{9} A (i) \times 2^{\lceil 25.5\,i \rceil} \] where \(\lceil \cdot \rceil\) denotes the ceiling function. I won't be focusing on implementation details here, but you can read about them in the paper above. To me, the most interesting part of this project is its proof.

First steps

Types

The first step was to define the different types used in the implementation and proof.

I'll use two index types. Index_Type, which is range 0 .. 9, is used to index the arrays that represent the 255-bit integers. Product_Index_Type is range 0 .. 18, used to index the output array of the Multiply function.

Integer_Curve25519 represents the arrays that will be used in the implementation, so 64-bit integer arrays with variable bounds in Product_Index_Type.  Product_Integer is the array type of the Multiply result. It is a 64-bit integer array with Product_Index_Type range. Integer_255 is the type of the arrays that represent 255-bit integers, so arrays of 32-bit integers with Index_Type range.
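
For reference, the declarations could look roughly like the sketch below; the exact element types and bounds in the actual project may differ, so treat this as an assumption-laden illustration rather than the real code:

subtype Index_Type         is Natural range 0 .. 9;
subtype Product_Index_Type is Natural range 0 .. 18;

type Integer_Curve25519 is
  array (Product_Index_Type range <>) of Long_Long_Integer;

subtype Integer_255     is Integer_Curve25519 (Index_Type);
subtype Product_Integer is Integer_Curve25519 (Product_Index_Type);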

Big integers library

Proving Add and Multiply in SPARK requires reasoning about the big integers represented by the arrays. In SPARK, we don't have mathematical integers (i.e. integers with infinite precision), only bounded integers, which are not sufficient for our use case. If I had tried to prove correctness of Add and Multiply in SPARK with bounded integers, the tool would have triggered overflow checks every time an array is converted to integer.

To overcome this problem, I wrote the specification of a minimal big integers library, that defines the type, relational operators and basic operations. It is a specification only, so it is not executable.

SPARK is based on Why3, which has support for mathematical integers, and there is a way to link a Why3 definition to a SPARK object directly. This is called external axiomatization; you can find more details about how to do it here.

Using this feature, I could easily provide a big integers library with basic functions like "+", "*" or To_Big_Integer (X : Integer). As previously mentioned, this library is usable for proof, but not executable (subprograms are marked as Import).  To avoid issues during binding, I used a feature of SPARK named Ghost code. It takes the form of an aspect: "Ghost" indicates to the compiler that this code cannot affect the values computed by ordinary code, thus can be safely erased. It's very useful for us, since this aspect can be used to write non-executable functions which are only called in annotations.
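
To give an idea of its shape, here is a hedged, simplified sketch of such a specification-only ghost package; the real library relies on external axiomatization and offers more operations, and the private completion below exists only to make the sketch self-contained:

package Big_Integers with Ghost is

   type Big_Integer is private;

   Zero : constant Big_Integer;

   function To_Big_Integer (X : Long_Long_Integer) return Big_Integer
     with Import, Global => null;

   function "+" (X, Y : Big_Integer) return Big_Integer
     with Import, Global => null;

   function "*" (X, Y : Big_Integer) return Big_Integer
     with Import, Global => null;

   function "=" (X, Y : Big_Integer) return Boolean
     with Import, Global => null;

private

   type Big_Integer is null record;           --  never used at run time
   Zero : constant Big_Integer := (null record);

end Big_Integers;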

Conversion function

One of the most important functions used in the proof is a function that converts the array of 10 integers to the big integer it represents, so when X is an Integer_255 then +X is its corresponding big integer.

function "+" (X : Integer_Curve25519) return Big_Integer
  renames To_Big_Integer;

After this, I can define To_Big_Integer recursively, with the aid of an auxiliary function, Partial_Conversion.

function Partial_Conversion
  (X : Integer_Curve25519;
   L : Product_Index_Type)
   return Big_Integer
is
  (if L = 0
   then (+X(0)) * Conversion_Array (0)
   else
      Partial_Conversion (X, L - 1) + (+X (L)) * Conversion_Array (L))
with
  Ghost,
  Pre => L in X'Range;

function To_Big_Integer (X : Integer_Curve25519)
  return Big_Integer
is
  (Partial_Conversion (X, X'Last))
with
  Ghost,
  Pre => X'Length > 0;

Specification of Add and Multiply

With these functions defined, it was much easier to write specifications of Add and Multiply. All_In_Range is used in preconditions to bound the parameters of Add and Multiply in order to avoid overflow.

function All_In_Range
  (X, Y     : Integer_255;
   Min, Max : Long_Long_Integer)
   return     Boolean
is
  (for all J in X'Range =>
      X (J) in Min .. Max
      and then Y (J) in Min .. Max);

function Add (X, Y : Integer_255) return Integer_255 with
  Pre  => All_In_Range (X, Y, -2**30 + 1, 2**30 - 1),
  Post => +Add'Result = (+X) + (+Y);

function Multiply (X, Y : Integer_255) return Product_Integer with
  Pre  => All_In_Range (X, Y, -2**27 + 1, 2**27 - 1),
  Post => +Multiply'Result = (+X) * (+Y);

Proof

Proof of absence of runtime errors was relatively simple: no annotation was added to Add, and only a few, very classical loop invariants were needed for Multiply. Multiply is the function where Adam Langley stopped because he couldn't prove absence of overflow. I will focus on the proof of functional correctness, which was a much more difficult step. In SPARK, Adam Langley couldn't prove it for Add because of overflow checks triggered by the tool when converting the arrays to the big integers they represent. This is precisely where the big integers library is useful: it makes it possible to manipulate big integers in proof without overflow checks.

The method I used to prove both functions is the following:

  1.  Create a function that allows tracking the content of the returned array.
  2.  Actually track the content of the returned array through loop invariant(s) that ensure equality with the function from point 1.
  3.  Prove the equivalence between the loop invariant(s) at the end of the loop and the postcondition in a ghost procedure.
  4.  Call that procedure right before the return statement.

Add

I will illustrate this example with Add. The final implementation of Add is the following.

function Add (X, Y : Integer_255) return Integer_255 is
   Sum : Integer_255 := (others => 0);
begin
   for J in Sum'Range loop
      Sum (J) := X (J) + Y (J);

      pragma Loop_Invariant (for all K in 0 .. J =>
                               Sum (K) = X (K) + Y (K));
   end loop;
   Prove_Add (X, Y, Sum);
   return Sum;
end Add;

Points 1 and 2 are ensured by the loop invariant. No new function was created since the content of the array is simple (just an addition). At the end of the loop, the information we have is: "for all K in 0 .. 9 => Sum (K) = X (K) + Y (K)", which is information about the whole array. Points 3 and 4 are ensured by Prove_Add. The specification of Prove_Add is:

procedure Prove_Add (X, Y, Sum : Integer_255) with
  Ghost,
  Pre  => (for all J in Sum'Range => Sum (J) = X (J) + Y (J)),
  Post => To_Big_Integer (Sum) = To_Big_Integer (X) + To_Big_Integer (Y);

This is what we call a lemma in SPARK. Lemmas are ghost procedures, where the preconditions are the hypotheses and the postconditions are the conclusions. Some lemmas are proved automatically, while others require the solvers to be guided. You can do this by adding a non-null body to the procedure and guiding the provers differently depending on the conclusion you want to prove.
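
As a generic skeleton of this pattern (the name and the property below are purely illustrative, not from the project):

procedure Lemma_Square_Bound (X : Natural) with
  Ghost,
  Pre  => X <= 100,          --  hypotheses of the lemma
  Post => X * X <= 10_000;   --  conclusion made available at the call site

procedure Lemma_Square_Bound (X : Natural) is null;
--  A null body means the provers must derive the conclusion from the
--  hypotheses alone; if they cannot, the body is filled with assertions
--  or with calls to other lemmas.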

I will talk about the body of Prove_Add in a few paragraphs.

Multiply

Multiply was much harder than Add had been. But I had the most fun proving it. I refer to the implementation in curve25519-donna as the "inlined" one, because it is fully explicit and doesn't go through loops. This causes a problem in the first point of my method: it is not possible to track what the routine does, except by adding assertions at every line of code. To me, it's not a very interesting point of view, proof-wise. So I decided to change the implementation, to make it easier to understand. The most natural approach for multiplication, in my opinion, is to use distributivity of product over addition. The resulting implementation would be similar to TweetNaCl's implementation of curve25519 (see M function):

function Multiply (X, Y : Integer_255) return Product_Integer is
   Product : Product_Integer := (others => 0);
begin
   for J in Index_Type loop
      for K in Index_Type loop
         Product (J + K) :=
           Product (J + K)
           + X (J) * Y (K) * (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1);
      end loop;
   end loop;
   return Product;
end Multiply;

Inside the first loop, J is fixed and we iterate over all values of K, so over all the values of Y. This is repeated over the full range of J, hence over the entire content of X. With this implementation, it is possible to track the content of the array in loop invariants through an auxiliary function, Partial_Product, that will track the value of a certain index in the array at each iteration. We add the following loop invariant at the end of the loop:

pragma Loop_Invariant (for all L in 0 .. K =>
                         Product (J + L) = Partial_Product (X, Y, J, L));

The function Partial_Product is defined recursively, because it is a sum of several factors.

function Partial_Product
  (X, Y : Integer_255;
   J, K : Index_Type)
   return Long_Long_Integer
is
  (if K = 9 or else J = 0
   then (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1) * X (J) * Y (K)
   else Partial_Product (X, Y, J - 1, K + 1)
        + (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1) * X (J) * Y (K));

As seen in the loop invariant above, the function returns Product (J + K), which is modified for all indexes L and M when J + K = L + M. The recursive call will be on the pair J + 1, K - 1, or J - 1, K + 1. Given how the function is designed in the implementation, the choice J - 1, K + 1 is preferable, because it follows the evolution of Product (J + K). The base case is when J = 0 or K = 9.

Problems and techniques

Defining functions to track content was rather easy to do for both functions. Proving that the content is actually equal to the function was a bit more difficult for the Multiply function, but not as difficult as proving the equivalence between this and the postcondition. With these two functions, we challenge the provers in many ways:

  • We use recursive functions in contracts, which means provers have to reason inductively, and they struggle with this. That's why we need to guide them in order to prove properties that involve recursive functions.
  • The context size is quite big, especially for Multiply. The context represents all the variables, axioms and theories that solvers are given to prove a check. The context grows with the size of the code, and sometimes it is too big. When this happens, solvers may be lost and unable to prove the check. If the context is reduced, it is easier for solvers, and they may be able to prove the previously unproved checks.
  • Multiply contracts need proofs related to nonlinear integer arithmetic, which provers have a lot of problems with. This problem is specific to Multiply, but it explains why this function took some time to prove. Solvers need to be guided quite a bit in order to prove certain properties.

What follows is a collection of techniques I found interesting and might be useful when trying to prove problems, even very different from this one.

Axiom instantiation

When I tried to prove loop invariants in Multiply to track the value of Product (J + K), solvers were unable to use the definition of Partial_Product. Even asserting the exact same definition failed to be proven. This is mainly due to context size: solvers are not able to find the axiom in the search space, and the proof fails. The workaround I found is to create a Ghost procedure which has the definition as its postcondition, and a null body, like this:

procedure Partial_Product_Def
  (X, Y : Integer_255;
   J, K : Index_Type)
with
  Pre  => All_In_Range (X, Y, Min_Multiply, Max_Multiply),
  Post =>
    (if K = Index_Type'Min (9, J + K)
     then Partial_Product (X, Y, J, K)
          = (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1)
            * X (J) * Y (K)
     else Partial_Product (X, Y, J, K)
          = Partial_Product (X, Y, J - 1, K + 1)
            + X (J) * Y (K)
              * (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1));
procedure Partial_Product_Def
  (X, Y : Integer_255;
   J, K : Index_Type)
is null;

In this case, the context size is reduced considerably, and provers are able to prove it with a null body. During the proof process, I had to create other recursive functions that needed this *_Def procedure in order to use their definition. It has to be instantiated manually, but it is a way to "remind" solvers of this axiom. A simple but very useful technique.
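
A hypothetical use of this technique would be to call the procedure right where the definition is needed, for instance just before the loop invariant shown earlier (the exact placement in the project may differ):

Partial_Product_Def (X, Y, J, K);
pragma Loop_Invariant (for all L in 0 .. K =>
                         Product (J + L) = Partial_Product (X, Y, J, L));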

Manual induction

When trying to prove Prove_Add, I encountered one of the cases where the provers have to reason inductively: a finite sum is defined recursively. To help them prove the postcondition, the conversion is computed incrementally in a loop, and the loop invariant tracks the evolution.

procedure Prove_Add (X, Y, Sum : Integer_255) with
  Ghost,
  Pre  => (for all J in Sum'Range => Sum (J) = X (J) + Y (J)),
  Post => To_Big_Integer (Sum) = To_Big_Integer (X) + To_Big_Integer (Y);
--  Just to remember

procedure Prove_Add (X, Y, Sum : Integer_255) with
  Ghost
is
  X_255, Y_255, Sum_255 : Big_Integer := Zero;
begin
  for J in Sum'Range loop

      X_255 := X_255 + (+X (J)) * Conversion_Array (J);
      Y_255 := Y_255 + (+Y (J)) * Conversion_Array (J);
      Sum_255 := Sum_255 + (+Sum (J)) * Conversion_Array (J);

      pragma Loop_Invariant (X_255 = Partial_Conversion (X, J));
      pragma Loop_Invariant (Y_255 = Partial_Conversion (Y, J));
      pragma Loop_Invariant (Sum_255 = Partial_Conversion (Sum, J));
      pragma Loop_Invariant (Partial_Conversion (Sum, J) =
                               Partial_Conversion (X, J) +
                               Partial_Conversion (Y, J));
   end loop;
end Prove_Add;

What makes this proof inductive is the treatment of Loop_Invariants by SPARK: it will first try to prove the first iteration, as an initialization, and then it will try to prove iteration N knowing that the property is true for N - 1.

Here, the final loop invariant is what we want to prove, because when J = 9, it is equivalent to the postcondition. The other loop invariants and variables are used to compute incrementally the values of partial conversions and facilitate proof.

Fortunately, in this case, provers do not need more guidance to prove the postcondition.  Prove_Multiply also follows this proof scheme, but is much more difficult. You can access its proof following this link.

Another case of inductive reasoning in my project is a lemma whose proof presents a very useful technique for proving algorithms that use recursive ghost functions in contracts.

procedure Equal_To_Conversion
  (A, B : Integer_Curve25519;
   L    : Product_Index_Type)
with
  Pre  =>
    A'Length > 0
    and then A'First = 0
    and then B'First = 0
    and then B'Last <= A'Last
    and then L in B'Range
    and then (for all J in 0 .. L => A (J) = B (J)),
  Post => Partial_Conversion (A, L) = Partial_Conversion (B, L);

It states that given two Integer_Curve25519, A and B, and a Product_Index_Type, L, if A (0 .. L) = B (0 .. L), then Partial_Conversion (A, L) = Partial_Conversion (B, L). The proof for us is evident because it can be proved by induction over L, but we have to help SPARK a bit, even though it's usually simple.

procedure Equal_To_Conversion
  (A, B : Integer_Curve25519;
   L    : Product_Index_Type)
is
begin
   if L = 0 then
      return;                         --  Initialization of lemma
   end if;
   Equal_To_Conversion (A, B, L - 1); --  Calling lemma for L - 1
end Equal_To_Conversion;

The body of a procedure proved by induction actually looks like induction: the first thing to add is the initialization (base case), an if statement ending with a return. The following code is then the general case: we call the same lemma for L - 1 and add assertions to prove the postcondition if necessary. In this case, calling the lemma at L - 1 was sufficient to prove the postcondition.

Guide with assertions

Even if the solvers know the relation between Conversion_Array (J + K) and Conversion_Array (J) * Conversion_Array (K), it is hard for them to prove properties requiring non-linear arithmetic reasoning. The following procedure is a nice example:

procedure Split_Product
  (Old_Product, Old_X, Product_Conversion : Big_Integer;
   X, Y                                   : Integer_255;
   J, K                                   : Index_Type)
with
  Ghost,
  Pre  =>
    Old_Product
    = Old_X * (+Y)
      + (+X (J))
        * (if K = 0
           then Zero
           else Conversion_Array (J) * Partial_Conversion (Y, K - 1))
  and then
    Product_Conversion
    = Old_Product
      + (+X (J)) * (+Y (K))
        * (+(if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1))
        * Conversion_Array (J + K),
  Post =>
    Product_Conversion
    = Old_X * (+Y)
      + (+X (J)) * Conversion_Array (J)
        * Partial_Conversion (Y, K);

The preconditions imply the postcondition by arithmetic reasoning. With a null body, the procedure is not proved. We can guide provers through assertions, by splitting the proof into two different cases:

procedure Split_Product
  (Old_Product, Old_X, Product_Conversion : Big_Integer;
   X, Y                                   : Integer_255;
   J, K                                   : Index_Type)
is
begin
   if J mod 2 = 1 and then K mod 2 = 1 then
      pragma Assert (Product_Conversion
                     = Old_Product
                     + (+X (J)) * (+Y (K))
                     * Conversion_Array (J + K) * (+2));
      pragma Assert ((+2) * Conversion_Array (J + K)
                     = Conversion_Array (J) * Conversion_Array (K));
      --  Case where Conversion_Array (J + K) * 2
      --             = Conversion_Array (J) * Conversion_Array (K).
   else
      pragma Assert (Conversion_Array (J + K)
                     = Conversion_Array (J) * Conversion_Array (K));
      --  Other case
   end if;
   if K > 0 then
      pragma Assert (Partial_Conversion (Y, K)
                     = Partial_Conversion (Y, K - 1)
                     + (+Y (K)) * Conversion_Array (K));
      --  Definition of Partial_Conversion, needed for proof
   end if;
end Split_Product;

What is interesting about this technique is that it shows that to prove something with auto-active proof, you need to understand it yourself first. I proved this lemma by hand on paper, which was not difficult, and then I rewrote my manual proof through assertions. If my manual proof is easier when I split it into two cases, it should also be easier for the provers. It allows them to choose the first good steps of the proof.

Split into subprograms

Another thing I noticed is that provers were very quickly overwhelmed by context size in my project: I had a lot of recursive functions in my contracts, but also quantifiers. I did not hesitate to split proofs into ghost procedures in order to reduce context size, and also to wrap some expressions into functions.

I have 7 subprograms that enable me to prove Prove_Multiply, which is not a lot in my opinion. It also increases readability, which is important if you want other people to read your code.

There is another method I want to share to overcome this problem. When there is only one assertion to prove, but it requires a lot of guidance, it is possible to put all the assertions, including the last one, in a begin ... end block, where the last one uses an Assert_And_Cut pragma, just like this:

--  ...surrounding code...
--  It is possible to add the keyword declare to declare variables
--  before the block, if it helps for proof.
begin
   pragma Assert (Intermediate_Fact_1);  --  one assert needed to prove the last assertion
   pragma Assert (Intermediate_Fact_2);  --  another one
   pragma Assert_And_Cut (Assertion_To_Prove_And_Remember);
end;

Assert_And_Cut asks the provers to prove the property inside it, but afterwards it removes all the context that has been created in the begin ... end block. Only this assertion is kept, which can help reduce the context.

Sadly, this workaround didn't work for my project, because the context was already too big to prove the first assertions. Adding a lot of subprograms also has drawbacks, e.g. you have to write preconditions for the procedures, and this may be more difficult than just writing the block with an Assert_And_Cut. Both are certainly useful in various projects, and I think it's nice to have these methods in mind.

Takeaways

For statistics only: Add has an 8-line implementation and around 25 lines of ghost code to prove it. Multiply has 12 lines of implementation for more than 500 lines of proof. And it's done with auto-active proof, which means that verification was all done automatically! In comparison, the verification of TweetNaCl's curve25519 needs more than 700 lines of Coq to prove Multiply for an implementation of 5 lines, although they also have a carry operation to prove on top of the product.

I would say this challenge is at an intermediate level, because it is not difficult to understand the proof of Multiply when you write it down. But it presents several techniques that can apply to various other problems, and that is the main interest of the project to me.

As in Adam Langley's blog post, I tried to prove my project with Alt-Ergo only, because it was the only prover available for SPARK back in 2014. Even today, Alt-Ergo alone is unable to prove all of the code. However, that doesn't make Alt-Ergo a bad prover; in fact, none of the provers available in SPARK (CVC4, Z3 and Alt-Ergo) is able to prove the entire project on its own. I think this shows that having multiple provers available greatly increases the chances of the code being proved.

At the beginning, working on this project was just a way to use my big integers library in proof. But in the end, I believe it is an interesting take on the challenge of verifying elliptic-curve functions, especially as similar projects appear, for example fiat-crypto or the verification of TweetNaCl's curve25519, and I had a lot of fun experimenting with SPARK to prove properties that are usually badly handled by provers. You can access the full proof of my project in this repository.

]]>
Public Ada Training Paris June 3-7, 2019 http://blog.adacore.com/public-ada-training-paris-june-3-7-2019 Thu, 07 Mar 2019 15:00:00 +0000 Pamela Trevino http://blog.adacore.com/public-ada-training-paris-june-3-7-2019

This course is geared to software professionals looking for a practical introduction to the Ada language with a focus on embedded systems, including real-time features as well as critical features introduced in Ada 2012. By attending this course you will understand and know how to use Ada for both sequential and concurrent applications, through a combination of live lectures from AdaCore's expert instructors and hands-on workshops using AdaCore's latest GNAT technology. AdaCore will provide an Ada 2012 tool-chain and ARM-based target boards for embedded workshops. No previous experience with Ada is required.

The course will be conducted in English.

Prerequisite: Knowledge of a programming language (Ada 83, C, C++, Java…)

Each participant should come with a computer running Windows.

For more information, the agenda, and to register, visit: https://www.adacore.com/public...


]]>
How Do We Use CodePeer at AdaCore http://blog.adacore.com/how-do-we-use-codepeer-at-adacore Tue, 05 Mar 2019 14:08:00 +0000 Arnaud Charlet http://blog.adacore.com/how-do-we-use-codepeer-at-adacore

A question that our users sometimes ask us is "do you use CodePeer at AdaCore and if so, how?". The answer is yes! And this blog post will hopefully give you some insight into how we are doing it for our own needs.

First, I should note that at AdaCore we are in a special situation since we are both developing lots of Ada code and, at the same time, developing and evolving CodePeer itself. One consequence is that using CodePeer on our own code actually has two purposes: one is to improve the quality of our code by using an advanced static analyzer, and the other is to eat our own dog food and improve the analyzer itself by finding limitations, sub-optimal usage, etc...

In the past few years, we've gradually added automated runs of CodePeer on many of our Ada code bases, in addition to the systematic use of the light static analysis performed by the compiler: GNAT already provides many useful built-in checks, such as all the static Ada legality rules, as well as many clever warnings fine-tuned over the past 25 years and available via the -gnatwa compiler switch, coupled with -gnatwe, which turns warnings into errors, not to mention the built-in style checks available via the -gnaty switch.
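
In a GNAT project file, enabling this compiler-side analysis typically looks like the sketch below (an illustration, not a copy of AdaCore's actual project files):

package Compiler is
   for Default_Switches ("Ada") use
     ("-gnatwa",   --  enable (almost) all warnings
      "-gnatwe",   --  treat warnings as errors
      "-gnaty");   --  enable the default style checks
end Compiler;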

GNAT

For GNAT sources (Ada front-end and runtime source code), given that this is a large and complex piece of Ada code (compilers do come with a high level of algorithmic complexity and code recursion!) and that we wanted to favor rapid and regular feedback, we've settled on running CodePeer at level 1 with some extra fine-tuning: we've found that some categories of messages didn't generate any extra value for the kind of code found in GNAT, so we've disabled these categories via the --be-messages switch. We've also excluded from the analysis some files that were taking a long time compared to other files, for little benefit, via the Excluded_Source_Files project attribute as explained in the Partial Analysis section of the User's Guide.

Given that the default build mechanism of GNAT is based on Makefiles, we've written a separate project file for performing the CodePeer analysis, coupled with a separate simple Makefile that performs the necessary setup phase (automatic generation of some source files in the case of GNAT sources).
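
The dedicated project file can stay small; its general shape is sketched below, where the source locations, excluded file name and message categories are purely illustrative and do not reflect the actual GNAT settings:

project Analyze_GNAT is

   for Source_Dirs use ("gnat_src");                 --  illustrative path
   for Object_Dir use "codepeer_obj";
   for Excluded_Source_Files use ("slow_unit.adb");  --  illustrative name

   package CodePeer is
      for Switches use ("-level", "1", "--be-messages=validity_check");
   end CodePeer;

end Analyze_GNAT;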

After this fine-tuning, CodePeer generated a few dozen useful messages, which we analyzed and which allowed us to fix potential issues as well as improve the general quality of the code. You'll find a few examples below. The most useful findings on the GNAT code base are related to redundant or dead code and potential access to uninitialized variables when complex conditions are involved, which is often the case in a compiler! CodePeer also detected useful cases of code cleanups, in particular related to wrong parameter modes: in out parameters that should be out, or out parameters that should be in out.

Once we had addressed these findings by improving the source code, we ended up in a nice situation where the CodePeer run was completely clean: no new messages found. So we decided on a strict policy similar to our no-warning policy: moving forward, no new CodePeer message should be left unaddressed. To ensure that, we've put in place a continuous run that triggers after each commit in the GNAT repository and reports back its findings within half an hour.

One of the most useful categories of potential issues found by CodePeer in our case is related to not always initializing a variable before using it (what CodePeer calls validity checks). For example in gnatchop.adb we had the following code:

   Offset_FD : File_Descriptor;
   [...]

exception
   when Failure | Types.Terminate_Program =>
      Close (Offset_FD);
      [...]

CodePeer detected that at line 6 Offset_FD may not have been initialized (precisely because an exception may have been raised before Offset_FD was assigned). We fixed this potential issue by explicitly assigning a default value and testing for it:

   Offset_FD : File_Descriptor := Invalid_FD;
   [...]

exception
   when Failure | Types.Terminate_Program =>
      if Offset_FD /= Invalid_FD then
         Close (Offset_FD);
      end if;
      [...]

CodePeer also helped detect suspicious or dead code that should not have been there in the first place. Here is an example of code smell that CodePeer detected in the file get_scos.adb:

procedure Skip_EOL is
   C : Character;
begin
   loop
      Skipc;
      C := Nextc;
      exit when C /= LF and then C /= CR;

      if C = ' ' then
         Skip_Spaces;
         C := Nextc;
         exit when C /= LF and then C /= CR;
      end if;
   end loop;
end Skip_EOL;

In the code above, CodePeer complained that the test at line 9 is always false because C is either LF or CR. And indeed, if you look closely at line 7, when the code goes past this line, C will always be either CR or LF, and therefore cannot be a space character. The code was simplified into:

   procedure Skip_EOL is
      C : Character;
   begin
      loop
         Skipc;
         C := Nextc;
         exit when C /= LF and then C /= CR;
      end loop;
   end Skip_EOL;

GPS

Analysis of the GPS sources with CodePeer is used at AdaCore both to improve the code quality and to test our integration with the SonarQube tool via our GNATdashboard integration.

For GPS, we are using CodePeer in two modes. In the first mode, developers can manually run CodePeer on their local setup at level 0. This mode takes about 3 to 5 minutes to analyze all the GPS sources and runs the first level of checks provided by CodePeer, which is a very useful complement to the compiler warnings and style checks and allows us to stay 100% clean of messages after an initial set of code cleanups.

In addition, an automated run is performed nightly on a server using level 1, further tuned in a similar way to what we did for GNAT. Here we have some remaining messages under analysis and we use SonarQube to track and analyze these messages.

Here is an example of code that looked suspicious in GPS sources:

   View : constant Registers_View := Get_Or_Create_View (...);
begin
   View.Locked := True;

   if View /= null then
      [...]
   end if;

CodePeer complained at line 5 that the test is always True since View cannot be null at this point. Why? Because at line 3 we are already dereferencing View, so CodePeer knows that after this point either an exception was raised or, if not, View cannot be null anymore.

In this case, we've replaced the test by an explicit assertion since it appears that Get_Or_Create_View can never return null:

   View : constant Registers_View := Get_Or_Create_View (...);
begin
   pragma Assert (View /= null);
   View.Locked := True;
   [...]

Run Time Certification

As part of a certification project for one of our embedded runtimes for a bare-metal target, we ran CodePeer at its highest level (4) in order to detect all potential cases of a number of vulnerabilities, in particular: validity checks, divide by zero and overflow checks, as well as confirming that the runtime did not contain dead code or unused assignments. CodePeer was run manually and then all the messages produced were reviewed and justified, as part of our certification work.
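
On the command line, such a run boils down to an invocation along these lines, where the project file name is illustrative:

codepeer -P runtime.gpr -level 4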

GNAT LLVM

As part of the GNAT LLVM project (more details on this project, if you're curious, in a future post!) in its early stages, we ran CodePeer manually on all the GNAT LLVM sources - excluding the sources common to GNAT, which are already analyzed separately - initially at level 1 and then at level 3, and we concentrated on analyzing all (and only) the potential uninitialized variables (validity checks). In this case we used the same project file used to build GNAT LLVM itself and added CodePeer settings which basically looked like:

package CodePeer is
   for Switches use ("-level", "3", "--be-messages=validity_check", "--no-lal-checkers");
   for Excluded_Source_Dirs use ("gnat_src", "obj");
end CodePeer;

which allowed us to perform a number of code cleanups and review more closely the code pointed out by CodePeer.

CodePeer

Last but not least, we also run CodePeer on its own code base! For CodePeer, given that this is another large and complex piece of Ada code and that we wanted to favor rapid and regular feedback, we've settled on a setting similar to GNAT's: level 1 with some extra fine-tuning via --be-messages. We also added a few pragma Annotate to justify some messages - pragma Annotate (CodePeer, False_Positive) - as well as to skip the analysis of some subprograms or files that CodePeer was taking too long to analyze for little benefit (via either the Excluded_Source_Files project attribute or pragma Annotate (CodePeer, Skip_Analysis)).
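
The two forms look roughly like the sketch below; the subprogram, the check name and the justification text are illustrative, and the exact placement rules are described in the CodePeer documentation:

--  Skip the analysis of a subprogram that is too costly to analyze:
procedure Heavy_Computation is
   pragma Annotate (CodePeer, Skip_Analysis);
begin
   null;  --  real body omitted
end Heavy_Computation;

--  Justify a specific message, placed right after the line that triggers it:
Total := Total + Increment;
pragma Annotate
  (CodePeer, False_Positive, "overflow check",
   "Increment is bounded by the caller");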

A CodePeer run is triggered after each change in the repository in a continuous builder and made available to the team within 30 minutes. We've found that in this case the most interesting messages were: validity checks on local variables and out parameters, tests always true/false, duplicated code and potential wrong parameter modes.

We also run CodePeer on other code bases in a similar fashion, such as analysis of the SPARK tool sources.

As part of integrating CodePeer in our daily work, we also took the opportunity to improve the documentation and describe many possible workflows corresponding to the various needs of teams wanting to analyze Ada code, with an explanation of how to put these various scenarios in place; check Chapter 5 of the User's Guide if you're curious.

What about you? Do not hesitate to tell us how you are using CodePeer and which benefits you find most useful, by commenting below.

]]>
Ten Years of Using SPARK to Build CubeSat Nano Satellites With Students http://blog.adacore.com/ten-years-of-using-spark-to-build-cubesat-nano-satellites-with-students Fri, 01 Mar 2019 19:31:01 +0000 Peter Chapin http://blog.adacore.com/ten-years-of-using-spark-to-build-cubesat-nano-satellites-with-students

My colleague, Carl Brandon, and I have been running the CubeSat Laboratory at Vermont Technical College (VTC) for over ten years. During that time we have worked with nearly two dozen students on building and programming CubeSat nano satellites. CubeSats are small (usually 10cm cube), easily launched spacecraft that can be outfitted with a variety of cameras, sensing instruments, and communications equipment. Many CubeSats are built by university groups like ours using students at various skill levels in the design and production process.

Students working in the CubeSat Laboratory at VTC have been drawn from various disciplines including computer engineering, electro-mechanical engineering, electrical engineering, and software engineering. VTC offers a masters degree in software engineering, and two of our MSSE students have completed masters projects related to CubeSat flight software. In fact, our current focus is on building a general purpose flight software framework called CubedOS.

Like all spacecraft, CubeSats are difficult to service once they are launched. Because of the limited financial resources available to university groups, and because of on-board resource constraints, CubeSats typically don't support after-launch uploading of software updates. This means the software must be fully functional and fault-free at the time of launch, with no possibility of being updated.

Many university CubeSat missions have failed due to software errors. This is not surprising considering that most flight software is written in C, a language that is difficult to use correctly. To mitigate this problem we use the SPARK dialect of Ada in all of our software work. Using the SPARK tools we work toward proving the software free of runtime error, meaning that no runtime exceptions will occur. However, in general we have not attempted to prove functional correctness properties, relying instead on conventional testing for that level of verification.

Although we do have some graduate students working in the CubeSat Laboratory, most of our students are third and fourth year undergraduates. The standard curriculum for VTC's software engineering program includes primarily the Java and C languages. Although I have taught Ada in the classroom, it has only been in specialized classes that a limited number of students take. As a result most of the students who come to the CubeSat Laboratory have no prior knowledge of SPARK or Ada, although they have usually taken several programming courses in other languages.

Normally I arrange to give new students an intensive SPARK training course that spans three or four days, with time for them to do some general exercises. After that I assign the students a relatively simple introductory task involving our codebase so they can get used to the syntax and semantics of SPARK without the distraction of complex programming problems. Our experience has been that good undergraduate students with a reasonable programming background can become usefully productive with SPARK in as little as two weeks. Of course, it takes longer for them to gain the skills and experience to tackle the more difficult problems, but the concern commonly expressed that a lack of SPARK skills in a programming team is a barrier to the adoption of SPARK is not validated by our experience.

Students are, of course, novice programmers almost by definition. Many of our students are in the process of learning basic software engineering principles such as the importance of requirements, code review, testing, version control, continuous integration, and many other things. Seeing these ideas in the context of our CubeSat work gives them an important measure of realism that can be lacking in traditional courses.

However, because of their general inexperience, and because of the high student turnover rate that is natural in an educational setting, our development process is often far from ideal. Here SPARK has been extremely valuable to us. What we lack in rigor of the development process we make up for in the rigor of the SPARK language and tools. For example, if a student refactors some code, perhaps without adequately communicating with the authors of the adjoining code, the SPARK tools will often find inconsistencies that would otherwise go unnoticed. This has resulted in a much more disciplined progression of the code than one would expect based on the team's overall culture. Moreover, the discipline imposed by SPARK and its tools can serve to educate the students about the kinds of things that can go wrong when following an overly informal approach to development.

For example, one component of CubedOS is a module that sends “tick” messages to other modules on request. These messages are largely intended to trigger slow, non-timing critical housekeeping tasks in the other modules. For flexibility the module supports both one-shot tick messages that occur only once and periodic tick messages. Many “series” of messages can be active at once. The module can send periodic tick messages with different periods to several receivers with additional one-shot tick messages waiting to fire as well.

The module has been reworked many times. At one point it was decided that tick messages should contain a counter that represents the number of such messages that have been sent in the series. Procedure Next_Ticks below is called at a relatively high frequency to scan the list of active series and issue tick messages as appropriate. I asked a new student to add support for the counters to this system as a simple way of getting into our code and helping us to make forward progress. The version produced was something like:

      procedure Next_Ticks is
         Current_Time : constant Ada.Real_Time.Time := Ada.Real_Time.Clock;
      begin
         -- Iterate through the array to see who needs a tick message.
         for I in Series_Array'Range loop
            declare
               Current_Series : Series_Record renames Series_Array(I);
            begin
               -- If we need to send a tick from this series...
               if Current_Series.Is_Used and then Current_Series.Next <= Current_Time then
                  Route_Message
                    (Tick_Reply_Encode
                       (Receiver_Domain => Domain_ID,
                        Receiver   => Current_Series.Module_ID,
                        Request_ID => 0,
                        Series_ID  => Current_Series.ID,
                        Count      => Current_Series.Count));
 
                  -- Update the current record.
                  case Current_Series.Kind is
                     when One_Shot =>
                        -- TODO: Should we reinitialize the rest of the series record?
                        Current_Series.Is_Used := False;
 
                     when Periodic =>
                        -- Advance the counter.
                        Current_Series.Count := Current_Series.Count + 1;
                        Current_Series.Next :=
                         Current_Series.Next + Current_Series.Interval;
                  end case;
               end if;
            end;
         end loop;
      end Next_Ticks;

The code iterates over an array of Series_Record objects looking for the records that are active and are due to be ticked (Current_Series.Next <= Current_Time). For those records, it invokes the Route_Message procedure on an appropriately filled in Tick Reply message as created by function Tick_Reply_Encode. The student had to modify the information saved for each series to contain a counter, modify the Tick Reply message to contain a count, and update Tick_Reply_Encode to allow for a count parameter. The student also needed to increment the count as needed for periodic messages. This involved relatively minor changes to a variety of areas in the code.

The SPARK tools ensured that the student dealt with initialization issues correctly. But they also found an issue in the code above with surprisingly far reaching consequences. In particular, SPARK could not prove that no runtime error could ever occur on the line

    Current_Series.Count := Current_Series.Count + 1;

Of course, this will raise Constraint_Error when the count overflows the range of the data type used. If periodic tick messages are sent for too long, the exception would eventually be raised, potentially crashing the system. Depending on the rate of tick messages, that might occur in as little as a couple of years, a very realistic amount of time for the duration of a space mission. It was also easy to see how the error could be missed during ordinary testing. A code review might have discovered the problem, but we did code reviews infrequently. On the other hand, SPARK made this problem immediately obvious and led to an extensive discussion about what should be done in the event of the tick counters overflowing.
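
One possible resolution - shown here only as a sketch, since the post does not say what the team finally chose, and Series_Count is a hypothetical name - is to make the counter a modular type so that the increment wraps around instead of raising an exception:

type Series_Count is mod 2**32;
--  With a modular type, the addition below is defined to wrap around,
--  so SPARK no longer reports a possible overflow on it.

Current_Series.Count := Current_Series.Count + 1;

The trade-off is that the count restarts from zero after 2**32 ticks, which the receiving modules would have to tolerate; saturating the counter or widening its range are other options with their own trade-offs.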

Our CubeSat Before Launch

In November 2013 we launched a low Earth orbiting CubeSat. The launch vehicle contained 13 other university built CubeSats. Most were never heard from. One worked for a few months. Ours worked for two years until it reentered Earth's atmosphere as planned in November 2015. Although the reasons for the other failures are not always clear, software problems were known to be an issue in at least one of them and probably for many others. We believe the success of our mission, particularly in light of the small size and experience of our student team, is directly attributable to the use of SPARK.

]]>
A Readable Introduction to Both MISRA C and SPARK Ada http://blog.adacore.com/a-readable-introduction-to-both-misra-c-and-spark-ada Thu, 21 Feb 2019 09:50:09 +0000 Yannick Moy http://blog.adacore.com/a-readable-introduction-to-both-misra-c-and-spark-ada

MISRA C is the most widely known coding standard restricting the use of the C programming language for critical software, and for good reasons. For one, its focus is entirely on avoiding error-prone programming features of the C programming language rather than on enforcing a particular programming style. In addition, a large majority of the rules it defines are checkable automatically (116 rules out of the total 159 guidelines), and many tools are available to enforce those. As a coding standard, MISRA C even goes out of its way to define a consistent sub-language of C, with its own typing rules (called the "essential type model" in MISRA C) to make up for the lack of strong typing in C.

That being said, it's still a long shot to call it a security coding standard for C, even when taking into account the 14 additional security-focused guidelines of "MISRA C:2012 - Amendment 1: Additional security guidelines for MISRA C:2012". MISRA C focuses first and foremost on software quality, which has obvious consequences for security, but programs in MISRA C remain for the most part vulnerable to the major security vulnerabilities that plague C programs.

In particular, it's hard to state what guarantees are obtained when respecting the MISRA C rules (which means essentially respecting the 116 decidable rules enforced automatically by analysis tools). In order to clarify this, and to present at the same time how guarantees can be obtained using a different programming language, we have written a book available online. Even better, we host on our e-learning website an interactive version of the book where you can compile C or Ada code, and analyze SPARK code, to experiment how a different language with its associated analysis toolset can go beyond what MISRA C allows.

That way, even if MISRA C is the best thing that could happen to C, you can decide whether C is really the best thing that could happen to your software.

]]>
AdaCore at FOSDEM 2019 http://blog.adacore.com/adacore-at-fosdem-2019 Tue, 19 Feb 2019 09:30:35 +0000 Yannick Moy http://blog.adacore.com/adacore-at-fosdem-2019

Like last year, we've sent a squad of AdaCore engineers to participate in the celebration of Open Source software at FOSDEM. Like last year, we had great interactions with the rest of the Ada and SPARK Community in the Ada devroom on Saturday. You can check the program with videos of all the talks here. This year's edition was particularly diverse, with an academic project from Austria for an autonomous train control in Ada, two talks on Database development and Web development made type-safe with Ada, distributed computing, libraries, C++ binding, concurrency, safe pointers, etc.

AdaCore engineers gave two talks in the Ada devroom:

We also had a talk in the RISC-V devroom:

And there was a related talk in the Security devroom on the use of SPARK for security:

Hope to see you at FOSDEM next year!

]]>
AdaFractal Part 2: Fixed Point and Floating Point Math Performance and Parallelization http://blog.adacore.com/adafractal-part-2-fixed-point-and-floating-point-math-performance-and-parallelization Thu, 14 Feb 2019 14:23:00 +0000 Rob Tice http://blog.adacore.com/adafractal-part-2-fixed-point-and-floating-point-math-performance-and-parallelization

In Part 1 of this blog post I discussed why I chose to implement this application using the Ada Web Server to serve the computed fractal to a web browser. In this part I will discuss a bit more about the backend of the application, the Ada part.

Why do we care about performance?

The Ada backend will compute a new image each time one is requested from the front-end. The front-end immediately requests a new image right after it receives the last one it requested. Ideally, this will present to the user as an animation of a Mandelbrot fractal changing colors. If the update is too slow, the animation will look sluggish and terrible. So we want to minimize the compute time as much as possible. We have two main ways to do that: optimize the computation, and parallelize the computation.

Parallelizing the Fractal Computation

An interesting feature of the Mandelbrot calculation is that the computation of any pixel is independent of any other pixel. That means we can completely parallelize the computation of each pixel! So if our requested image is 1920x1280 pixels, we can spawn 2,457,600 tasks, right? Theoretically, yes, we can do that. But it doesn't necessarily speed up our application compared to, let's say, 8 or 16 tasks, each of which computes a row or selection of rows. Either way, we know we need to create what's called a task pool: a group of tasks that can be queued up as needed to do some calculations. We will create our task pool by creating a task type, which will implement the actual activity that the task will complete, and we will use a Synchronous_Barrier to synchronize all of the tasks back together.

task type Chunk_Task_Type is
   pragma Priority (0);
   entry Go (Start_Row : Natural;
             Stop_Row  : Natural;
             Buf       : Stream_Element_Array_Access);
end Chunk_Task_Type;

type Chunk_Task is record
  T         : Chunk_Task_Type;
  Start_Row : Natural;
  Stop_Row  : Natural;
end record;

type Chunk_Task_Pool is array (1 .. Task_Pool_Size) of Chunk_Task;

S_Task_Pool : Chunk_Task_Pool;
S_Sync_Obj  : Synchronous_Barrier (Release_Threshold => Task_Pool_Size + 1);

This snippet is from fractal.ads. S_Task_Pool will be our task pool and S_Sync_Obj will be our synchronization object. If you notice, each Chunk_Task_Type in the S_Task_Pool takes an access to a buffer in its Go entry procedure. In our implementation, each task will have an access to the same buffer. Isn’t this a race condition? Shouldn’t we use a protected object?

The downside of protected objects

The answer to both of those questions is yes. This is a race condition and we should be using a protected object. If you were to run CodePeer on this project, it identifies this as a definite problem. But using a protected object is going to destroy our performance. The reason is that, under the hood, the protected object will be using locks each time we access data from the buffer. Each lock, unlock, and wait-on-lock call is going to make the animation of our fractal look worse and worse. However, there is a way to get around this race-condition issue. By design, we can guarantee that each task is going to access the buffer only from Start_Row to Stop_Row. So, by design, we can make sure that each task's rows don't overlap another task's rows, thereby avoiding a race condition.

Parallelization Implementation

Now that we understand the specification of the task pool, let’s look at the implementation.

task body Chunk_Task_Type
is
   Start   : Natural;
   Stop    : Natural;
   Buffer  : Stream_Element_Array_Access;
   Notified : Boolean;
begin

   loop
      accept Go (Start_Row : Natural;
                 Stop_Row : Natural;
                 Buf : Stream_Element_Array_Access) do
         Start := Start_Row;
         Stop := Stop_Row;
         Buffer := Buf;
      end Go;

      for I in Start .. Stop loop

         Calculate_Row (Y      => I,
                        Idx    => Buffer'First +
                          Stream_Element_Offset ((I - 1) *
                                Get_Width * Pixel'Size / 8),
                        Buffer => Buffer);
      end loop;
      Wait_For_Release (The_Barrier => S_Sync_Obj,
                        Notified    =>  Notified);
   end loop;
end Chunk_Task_Type;

The body of each Chunk_Task_Type waits for a Go signal from the main task. When it receives the go, it also receives an access to the buffer and the start and stop rows to work on. It then calculates the pixels for those rows. When it finishes, it calls the Wait_For_Release procedure of the Synchronous_Barrier object, which blocks until all tasks have completed their work. Once all the tasks have checked in with the Synchronous_Barrier using the Wait_For_Release call, all tasks are released. The main task grabs the buffer with the new image and hands it back off to the AWS server to be sent out, and the worker tasks go back to waiting at their entry accept statements.
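
To make that protocol concrete, here is a minimal sketch of what the dispatching side could look like. Dispatch_Frame, Get_Height and Rows_Per_Task are illustrative names and not the project's actual code; the chunking assumes the image height divides evenly by Task_Pool_Size.

procedure Dispatch_Frame (Buf : Stream_Element_Array_Access) is
   Rows_Per_Task : constant Natural := Get_Height / Task_Pool_Size;
   Row           : Natural := 1;
   Notified      : Boolean;
begin
   --  Hand each worker a disjoint range of rows, so no two tasks ever
   --  touch the same part of the buffer.
   for Chunk of S_Task_Pool loop
      Chunk.Start_Row := Row;
      Chunk.Stop_Row  := Row + Rows_Per_Task - 1;
      Chunk.T.Go (Chunk.Start_Row, Chunk.Stop_Row, Buf);
      Row := Chunk.Stop_Row + 1;
   end loop;

   --  The barrier's Release_Threshold is Task_Pool_Size + 1, so this call
   --  returns only once every worker has also called Wait_For_Release.
   Wait_For_Release (The_Barrier => S_Sync_Obj, Notified => Notified);
end Dispatch_Frame;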

The bigger the task pool…

Now we come back to the size of the task pool. Based on the implementation above, we can chunk the processing down to the granularity of a single row, i.e. at most as many tasks as there are rows requested. So in the case of a 1920x1280 image, we could have 1280 tasks! But we have to ask ourselves, is that going to give us better performance than 8 or 16 tasks? The answer, unfortunately, is probably not. If we create 8 tasks, and we have an 8 core processor, we can assume that some of our tasks are going to execute on different cores in parallel. If we create 1280 tasks and we use the same 8 core processor, we don't get much more parallelization than with 8 tasks. This is a place where tuning and best judgement will give you the best performance.
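
If you want a starting point tied to the hardware, a sketch like the one below sizes the pool to the number of available cores using the standard System.Multiprocessors package. Pool_Config is a hypothetical package name and this is not the project's actual configuration code.

with System.Multiprocessors;

package Pool_Config is
   --  One worker task per core is a reasonable default; tune from there.
   Task_Pool_Size : constant Positive :=
     Positive (System.Multiprocessors.Number_Of_CPUs);
end Pool_Config;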

Fixed vs Floating Point

Now that we have the parallelization component, let's think about optimizing the maths. In most fractal computations, floating point complex numbers are used. Based on our knowledge of processors, we can assume that in most cases floating point calculations will be slower than integer calculations. So theoretically, using fixed point numbers might give us better performance. For more information on Ada fixed point types, check out the Fixed-point types section of the Introduction to Ada course on learn.adacore.com.

The Generic Real Type

Because we are going to use the same algorithm for both floating and fixed point math, we can implement the algorithm using a generic type called Real. The Real type is defined in computation_type.ads.

generic
   type Real is private;
   with function "*" (Left, Right : Real) return Real is <>;
   with function "/" (Left, Right : Real) return Real is <>;
   with function To_Real (V : Integer) return Real is <>;
   with function F_To_Real (V : Float) return Real is <>;
   with function To_Integer (V : Real) return Integer is <>;
   with function To_Float (V : Real) return Float is <>;
   with function Image (V : Real) return String is <>;
   with function "+" (Left, Right : Real) return Real is <>;
   with function "-" (Left, Right : Real) return Real is <>;
   with function ">" (Left, Right : Real) return Boolean is <>;
   with function "<" (Left, Right : Real) return Boolean is <>;
   with function "<=" (Left, Right : Real) return Boolean is <>;
   with function ">=" (Left, Right : Real) return Boolean is <>;
package Computation_Type is    

end Computation_Type;

We can then create instances of the Julia_Set package using a floating point and a fixed point version of the Computation_Type package.

type Real_Float is new Float;

function Integer_To_Float (V : Integer) return Real_Float is
  (Real_Float (V));

function Float_To_Integer (V : Real_Float) return Integer is
  (Natural (V));

function Float_To_Real_Float (V : Float) return Real_Float is
  (Real_Float (V));

function Real_Float_To_Float (V : Real_Float) return Float is
   (Float (V));

function Float_Image (V : Real_Float) return String is
  (V'Img);

D_Small : constant := 2.0 ** (-21);
type Real_Fixed is delta D_Small range -100.0 .. 201.0 - D_Small;

function "*" (Left, Right : Real_Fixed) return Real_Fixed;
pragma Import (Intrinsic, "*");

function "/" (Left, Right : Real_Fixed) return Real_Fixed;
pragma Import (Intrinsic, "/");

function Integer_To_Fixed (V : Integer) return Real_Fixed is
  (Real_Fixed (V));

function Float_To_Fixed (V : Float) return Real_Fixed is
  (Real_Fixed (V));

function Fixed_To_Float (V : Real_Fixed) return Float is
   (Float (V));

function Fixed_To_Integer (V : Real_Fixed) return Integer is
  (Natural (V));

function Fixed_Image (V : Real_Fixed) return String is
   (V'Img);

package Fixed_Computation is new Computation_Type (Real       => Real_Fixed,
                                                   "*"        => Router_Cb."*",
                                                   "/"        => Router_Cb."/",
                                                   To_Real    => Integer_To_Fixed,
                                                   F_To_Real  => Float_To_Fixed,
                                                   To_Integer => Fixed_To_Integer,
                                                   To_Float   => Fixed_To_Float,
                                                   Image      => Fixed_Image);

package Fixed_Julia is new Julia_Set (CT               => Fixed_Computation,
                                      Escape_Threshold => 100.0);

package Fixed_Julia_Fractal is new Fractal (CT              => Fixed_Computation,
                                            Calculate_Pixel => Fixed_Julia.Calculate_Pixel,
                                            Task_Pool_Size  => Task_Pool_Size);


package Float_Computation is new Computation_Type (Real       => Real_Float,
                                                   To_Real    => Integer_To_Float,
                                                   F_To_Real  => Float_To_Real_Float,
                                                   To_Integer => Float_To_Integer,
                                                   To_Float   => Real_Float_To_Float,
                                                   Image      => Float_Image);

package Float_Julia is new Julia_Set (CT               => Float_Computation,
                                      Escape_Threshold => 100.0);

package Float_Julia_Fractal is new Fractal (CT              => Float_Computation,
                                            Calculate_Pixel => Float_Julia.Calculate_Pixel,
                                            Task_Pool_Size  => Task_Pool_Size);

We now have the Julia_Set package with both the floating point and fixed point implementations. The AWS URI router is set up to serve a floating point image if the GET request URI is “/floating_fractal” and a fixed point image if the request URI is “/fixed_fractal”.
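
For illustration, the dispatch between the two URIs might look like the sketch below, where Fractal_Response, Compute_Float_Image and Compute_Fixed_Image are hypothetical stand-ins for the project's actual router branch and image-generation calls.

function Fractal_Response (URI : String) return AWS.Response.Data is
begin
   if URI = "/floating_fractal" then
      return AWS.Response.Build
        (Content_Type => AWS.MIME.Application_Octet_Stream,
         Message_Body => Compute_Float_Image);
   else
      --  "/fixed_fractal"
      return AWS.Response.Build
        (Content_Type => AWS.MIME.Application_Octet_Stream,
         Message_Body => Compute_Fixed_Image);
   end if;
end Fractal_Response;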

The performance results

Interestingly, fixed point was not unequivocally faster than floating point in every situation. On my 64 bit Mac, the floating point was slightly faster. On a 64 bit ARM running QNX, the fixed point was faster. Another phenomenon I noticed is that the fixed point was less precise, with little performance gain in most cases. When running the fixed point algorithm, you will notice what looks like dust in the image. Those are instances of integer overflow or underflow on the pixel values. Under normal operation, these would manifest as runtime exceptions in the application, but to increase performance, I compiled with those checks turned off.

Fixed point - the tradeoff

Here’s the takeaway from this project: fixed point math performance is only better if you need lower precision OR a limited range of values. Keep in mind that this is an OR situation. You can either have a precise fixed point number with a small range, or an imprecise fixed point number with a larger range. If you try to have a precise fixed point number with a large range, the underlying integer type that is used will be quite large. If the type requires a 128 bit or larger integer type, then you lose all the performance you would have gained by using fixed point in the first place.
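
A short sketch makes the arithmetic visible (illustrative types, not from the project). With a small of 2**(-21), covering roughly -100.0 .. 201.0 needs about 30 bits, so a 32-bit underlying integer suffices; keeping the same precision over a range around 2**30 needs about 52 bits, which forces a 64-bit (or wider) underlying integer on a typical compiler.

D_Small : constant := 2.0 ** (-21);

--  Roughly 30 bits needed: fits a 32-bit underlying integer.
type Narrow_Fixed is delta D_Small range -100.0 .. 201.0 - D_Small;

--  Same precision, much larger range: roughly 52 bits, so the compiler
--  has to fall back to a 64-bit underlying integer.
type Wide_Fixed is delta D_Small range -2.0 ** 30 .. 2.0 ** 30 - D_Small;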

In the case of the fractal, the fixed point is useless because we need a high precision with a large range. In this respect, we are stuck with floating point, or weird looking dusty fixed point fractals.

Conclusion

Although this was an interesting exercise to compare the performance of the two types in a pure math situation, it isn't entirely meaningful. If this were a production application, we would be optimizing the algorithms for the types being used. In this case, the algorithm was very generic and didn't account for the limited range or precision that would be necessary for an optimized fixed point type. This is why the floating point performance was comparable in most cases to the fixed point, with better visual results.

However, there are a few important design paradigms that were interesting to implement. The task pool is a useful concept for any parallelized application such as this. And the polymorphism via generics that we used to create the Real type can be extremely useful to give you greater flexibility to have multiple build or feature configurations.

With Object Oriented programming languages like Ada and C++ it is sometimes tempting to start designing an application like this using base classes and derivation to implement new functionality. In this application that wouldn’t have made sense because we don’t need dynamic polymorphism; everything is known at compile time. So instead, we can use generics to achieve the polymorphism without creating the overhead associated with tagged types and classes.

]]>
AdaFractal Part1: Ada with a Portable GUI http://blog.adacore.com/adafractal-part1-ada-with-a-portable-gui Tue, 12 Feb 2019 14:03:00 +0000 Rob Tice http://blog.adacore.com/adafractal-part1-ada-with-a-portable-gui

The AdaFractal project is exactly what it sounds like: fractals generated using Ada. The code, which can be found here, calculates a Mandelbrot set using a fairly common calculation. Nothing fancy going on here. However, how are we going to display the fractal once it's calculated?

To Graphics Library or not to Graphics Library?

We could use GTKAda, the Ada binding to GTK, to display a window… But that means we are stuck with GTK which really is overkill for displaying our little image, and is only supported on Windows and Linux... And Solaris if you’re into that sort of thing...

What’s more portable and easier to handle than GTK? Well, doing a GUI in web based technology is very easy. With a few lines of HTML, some Javascript, and some simple CSS we can throw together an entire Instagram of fractals... Fractalgram? Does that exist already? If it doesn’t, someone should get on that.

The Server Solution

The last piece of the puzzle we need is to be able to serve our web files and fractal image to a web browser for rendering. Since we want to ensure portability, we should probably leverage the inherent portability of Ada and use AWS. No, it’s not the same aws that you may use for cloud storage. This AWS is the Ada Web Server which is a small and powerful HTTP server that has support for SOAP/WSDL, Server Push, HTTPS/SSL and client HTTP. If you want to know more about the other aws, check out the blog post Amazon Relies on Formal Methods for the Security of AWS. For the rest of this blog series, let’s assume AWS stands for Ada Web Server.

Because AWS is written entirely in Ada, it is incredibly portable. The current build targets include UNIX-like OS’s, Windows, macOS, and VxWorks for operating systems and ARM, x86, and PPC for target architectures. Just by changing the compiler I have been able to recompile and run this application, with no source code differences, for QNX on 64 bit ARM, macOS, Windows, and Linux on 64 bit x86, and VxWorks 6.9 on 32 bit ARM. That’s pretty portable!

For those who have access to GNATPro on Windows, Linux, or macOS, you can find AWS included with the toolsuite installation. For GNAT Community users, or GNATPro users who would like to compile AWS for a different target, you can build the library from the source located on the AdaCore GitHub.

Hooking up the pieces

Hooking up the fractal application to AWS was pretty easy. It only required creating a URI router which takes incoming GET requests and dispatches them to the correct worker function. The full router tree can be found in the router_cb.adb file.

function Router (Request : AWS.Status.Data) return AWS.Response.Data
is
   URI      : constant String := AWS.Status.URI (Request);
   Filename : constant String := "web/" & URI (2 .. URI'Last);
begin
   --  Main server access point
   if URI = "/" then
      --  main page
      return AWS.Response.File (AWS.MIME.Text_HTML, "web/html/index.html");
      
   --  Requests a new image from the server
   elsif URI = "/fractal" then   
      return AWS.Response.Build
        (Content_Type  => AWS.MIME.Application_Octet_Stream,
         Message_Body  => Compute_Image);       
      
   --  Serve basic files
   elsif AWS.Utils.Is_Regular_File (Filename) then
      return AWS.Response.File
        (Content_Type => AWS.MIME.Content_Type (Filename),
         Filename     => Filename);
      
   --  404 not found
   else
      Put_Line ("Could not find file: " & Filename);

      return AWS.Response.Acknowledge
        (AWS.Messages.S404,
         "<p>Page '" & URI & "' Not found.");
   end if;

end Router;

Here is a sample of a simplified router tree. The Router function is registered as a callback with the AWS framework. It is called whenever a new GET request is received by the server. If the request is for the index page (“/”), we send back the index.html page. If we receive “/fractal” we recompute the fractal image and return the pixel buffer. If the URI is a file, like a css or js file, we simply serve the file. If it’s anything else, we respond with a 404 error.

The AdaFractal app uses a slightly more complex router tree in order to handle input from the user, compute server side zoom of the fractal, and handle computing images of different sizes based on the size of the browser. On the front-end, all of these requests and responses are handled by jQuery. The image that is computed on the backend is an array of pixels using the following layout:

type Pixel is record
   Red   : Color;
   Green : Color;
   Blue  : Color;
   Alpha : Color;
end record
  with Size => 32;

for Pixel use record
   Red at 0 range 0 .. 7;
   Green at 1 range 0 .. 7;
   Blue at 2 range 0 .. 7;
   Alpha at 3 range 0 .. 7;
end record;

type Pixel_Array is array
  (Natural range <>) of Pixel
  with Pack;

A Pixel_Array with the computed image is sent as the response to the fractal request. The javascript on the front-end overlays that onto a Canvas object which is rendered on the page. No fancy backend image library required!

AWS serves our application to whichever port we specify (the default port is 8080). We can either point our browser to localhost:8080 (127.0.0.1:8080) or if we have a headless device (such as a Raspberry PI, a QNX board, or a VxWorks solution) we can point the browser from another device to the IP address of the headless device port 8080 to view the fractal.

How about the math and performance?

An interesting side experiment came from this project after trying to increase the performance of the application. After implementing a thread pool to parallelize the math, I decided to try using fixed point types for the math. The implementation of the thread pool and the fixed point versus floating point math performance will be presented in Part 2 of this blog post.

]]>
NVIDIA is joining the Ada and SPARK adopter wave http://blog.adacore.com/nvidia-blogpost-announcement Tue, 05 Feb 2019 14:02:57 +0000 Quentin Ochem http://blog.adacore.com/nvidia-blogpost-announcement

Over the past several years, a great number of public announcements have been made about companies that are either studying or adopting the Ada and SPARK programming languages. Noteworthy examples include Dolby, Denso, LASP and Real Heart, as well as the French Security Agency.

Today, NVIDIA, the inventor and market leader of the Graphical Processing Unit (GPU), joins the wave by announcing the selection of SPARK and Ada for its next generation of security-critical GPU firmware running on RISC-V microcontrollers. NVIDIA’s announcement marks a new era of industrial standards for safety- and security-critical software development. 

For more information, see the joint announcement https://www.adacore.com/press/..., or NVIDIA’s recent blog post https://blogs.nvidia.com/blog/2019/02/05/adacore-secure-autonomous-driving/

]]>
Proving Memory Operations - A SPARK Journey http://blog.adacore.com/proving-memory-operations-a-spark-journey Tue, 08 Jan 2019 13:33:00 +0000 Quentin Ochem http://blog.adacore.com/proving-memory-operations-a-spark-journey

The promise behind the SPARK language is the ability to formally demonstrate properties in your code regardless of the input values that are supplied - as long as those values satisfy specified constraints. As such, this is quite different from static analysis tools such as our CodePeer or the typical offering available for e.g. the C language, which trade completeness for efficiency in the name of pragmatism. Indeed, the problem they're trying to solve - finding bugs in existing applications - makes it impossible to be complete. Or, if completeness is achieved, then it comes at the cost of a massive amount of uncertainties ("false alarms"). SPARK takes a different approach. It requires the programmer to stay within the boundaries of a (relatively large) Ada language subset and to annotate the source code with additional information - with the benefit of being complete (or sound) in the verification of certain properties, and without inundating the programmer with false alarms.

With this blog post, I’m going to explore the extent to which SPARK can fulfill this promise when put in the hands of a software developer with no particular affinity to math or formal methods. To make it more interesting, I’m going to work in the context of a small low-level application to gauge how SPARK is applicable to embedded device level code, with some flavor of cyber-security.

The problem to solve and its properties

As a prelude, even prior to defining any behavior and any custom property on this code, the SPARK language itself defines a default property, the so-called absence of run-time errors. These include out of bounds access to arrays, out of range assignment to variables, divisions by zero, etc. This is one of the most advanced properties that traditional static analyzers can consider. With SPARK, we’re going to go much further than that, and actually describe functional requirements.

Let’s assume that I’m working on a piece of low level device driver whose job is to set and move the boundaries of two secure areas in the memory: a secure stack and a secure heap area. When set, these areas come with specific registers that prevent non-secure code from reading the contents of any storage within these boundaries. This process is guaranteed by the hardware, and I’m not modeling this part. However, when the boundaries are moved, the data that was in the old stack & heap but not in the new one is now accessible. Unless it is erased, it can be read by non-secure code and thus leak confidential information. Moreover, the code that changes the boundaries is to be as efficient as possible, and I need to ensure that I don’t erase memory still within the secure area.

What I described above, informally, is the full functional behavior of this small piece of code. This could be expressed as a boolean expression that looks roughly like: \[dataToErase = (oldStack \cup oldHeap) \cap \overline{(newStack \cup newHeap)} \]. Or in other words, the data to erase is everything that was in the previous memory (\(oldStack \cup oldHeap\)) and (\(\cap\)) not in the new memory (\(\overline{(newStack \cup newHeap)}\)).

Another way to write the same property is to use a quantifier on every byte of memory, and say that on every byte, if this byte is in the old stack or the old heap but not in the new stack or the new heap, it should be erased: \[\forall b \in memory, ((isOldStack(b) \lor isOldHeap (b)) \land \neg (isNewStack (b) \lor isNewHeap (b)) \iff isErased (b)) \] Which means that for all the bytes in the memory (\(\forall b \in memory\)) what’s in the old regions (\(isOldStack(b) \lor isOldHeap (b)\)) but not in the new ones (\(\neg (isNewStack (b) \lor isNewHeap (b))\)) has to be erased (\(\iff isErased (b)\)).

We will also need to demonstrate that the heap and the stack are disjoint.

Ideally, we’d like to have SPARK make the link between these two ways of expressing things, as one may be easier to express than the other.

When designing the above properties, it became quite clear that I needed some intermediary library with set operations, in order to be able to express unions, intersections and complement operations. This will come with its own set of lower-level properties to prove and demonstrate.

Let’s now look at how to define the specification for this memory information.

Memory Specification and Operations

The first step is to define some way to track the properties of the memory - that is, whether a specific byte is part of the heap, part of the stack, and what other properties it can be associated with (for example, has it been erased?). The interesting point here is that the final program executable should not have to worry about these values - not only would it be expensive, it wouldn't be practical either. We can't easily store and track the status of every single byte. These properties should only be tracked for the purpose of statically verifying the required behavior.

There is a way in SPARK to designate code to be solely used for the purpose of assertion and formal verification, through so-called “ghost code”. This code will not be compiled to binary unless assertions are activated at run-time. But here we’ll push this one step further by writing ghost code that can’t be compiled in the first place. This restriction imposed on us will allow us to write assertions describing the entire memory, which would be impossible to compile or run.

The first step is to model an address. To be as close as possible to the actual way memory is defined, and to have access to Ada’s bitwise operations, we’re going to use a modular type. It turns out that this introduces a few difficulties: a modular type wraps around, so adding one to the last value goes back to the first one. However, in order to prove absence of run-time errors, we want to demonstrate that we never overflow. To do that, we can define a precondition on the “+” and “-” operators, with an initial attempt to define the relevant preconditions:

function "+" (Left, Right : Address_Type) return Address_Type
is (Left + Right)
with Pre => Address_Type'Last - Left >= Right; 

function "-" (Left, Right : Address_Type) return Address_Type
is (Left - Right)
with Pre => Left >= Right;

The preconditions verify that Left + Right doesn't exceed Address_Type'Last (for "+"), and that Left - Right doesn't go below zero (for "-"). Interestingly, we could have been tempted to write the first precondition the following way:

with Pre => Left + Right <= Address_Type'Last; 

However, with wrap-around semantics inside the precondition itself, this would always be true.

There's still a problem in the above code, due to the fact that "+" is actually implemented with "+" itself (hence there is an infinite recursive call in the above code). The same goes for "-". To avoid that, we're going to introduce a new type "Address_Type_Base" to do the computation without contracts, "Address_Type" being the actual type in use. The full code, together with some size constants (assuming 32 bits), then becomes:

Word_Size    : constant := 32;
Memory_Size  : constant := 2 ** Word_Size;
type Address_Type_Base is mod Memory_Size; -- 0 .. 2**Word_Size - 1
type Address_Type is new Address_Type_Base;

function "+" (Left, Right : Address_Type) return Address_Type
is (Address_Type (Address_Type_Base (Left) + Address_Type_Base (Right)))
with Pre => Address_Type'Last - Left >= Right;

function "-" (Left, Right : Address_Type) return Address_Type
is (Address_Type (Address_Type_Base (Left) - Address_Type_Base (Right)))
with Pre => Left >= Right;

Armed with the above types, it’s now time to get started on the modeling of actual memory. We’re going to track the status associated with every byte. Namely, whether a given byte is part of the Stack, part of the Heap, or neither; and whether that byte has been Scrubbed (erased). The prover will reason on the entire memory. However, the status tracking will never exist in the program itself - it will just be too costly to keep all this data at run time. Therefore we’re going to declare all of these entities as Ghost (they are here only for the purpose of contracts and assertions), and we will never enable run-time assertions. The code looks like:

type Byte_Property is record
   Stack    : Boolean;
   Heap     : Boolean;
   Scrubbed : Boolean;
end record
with Ghost;

type Memory_Type is array (Address_Type) of Byte_Property
with Ghost;

Memory : Memory_Type 
with Ghost;

Here, Memory is a huge array declared as a global ghost variable. We can't write executable code with it, but we can write contracts. In particular, we can define a contract for a procedure that sets the heap between two address values. As a precondition for this procedure, the lower value has to be below or equal to the upper one. As a postcondition, the Heap property will be set for every byte in the range. The specification looks like this:

procedure Set_Heap (From, To : Address_Type)
with
   Pre => To >= From,
   Post => (for all B in Address_Type => (if B in From .. To then Memory (B).Heap else not Memory (B).Heap)),
   Global => (In_Out => Memory);

Note that I'm also defining a Global contract here, which states how Memory is accessed. Here it's both read and modified, hence In_Out.

While the above specification is correct, it's also incomplete. We're defining what happens for the Heap property, but not the others. What we expect here is that the rest of the memory is unmodified. Another way to say this is that only the range From .. To is updated; the rest is unchanged. This can be modelled through the 'Update attribute, turning the postcondition into:

Post => (for all B in Address_Type => 
   (if B in From .. To then Memory (B) = Memory'Old (B)'Update (Heap => True) 
    else Memory (B) = Memory'Old (B)'Update (Heap => False))),

Literally meaning “The new value of memory equals the old value of memory (Memory’Old) with changes (‘Update) being Heap = True from From to To, and False elsewhere“.

Forgetting to mention what doesn't change in the data is a common mistake when defining contracts. It is also a source of difficulties when proving code, so it's a good rule of thumb to always consider what's unchanged when writing these postconditions. Of course, the only relevant entities are those accessed and modified by the subprogram. Any variable not accessed is by definition unchanged.

Let’s now get to the meat of this requirement. We’re going to develop a function that moves the heap and the stack boundaries, and scrubs all that’s necessary and nothing more. The procedure will set the new heap boundaries between Heap_From .. Heap_To, and stack boundaries between Stack_From and Stack_To, and is defined as such:

procedure Move_Heap_And_Stack
   (Heap_From, Heap_To, Stack_From, Stack_To : Address_Type)

Now remember the expression of the requirement from the previous section:

\[\forall b \in memory, ((isOldStack(b) \lor isOldHeap (b)) \land \neg (isNewStack (b) \lor isNewHeap (b)) \iff isErased (b)) \]

This happens to be a form that can be easily expressed as a quantifier, on the Memory array described before:

(for all B in Address_Type =>
   (((Memory'Old (B).Stack or Memory'Old (B).Heap) 
         and then not 
      (Memory (B).Stack or Memory (B).Heap))
   = Memory (B).Scrubbed));

The above is literally the transcription of the property above, checking all bytes B in the address range, and then stating that if the old memory is Stack or Heap and the new memory is not, then the new memory is scrubbed, otherwise not. This contract is going to be the postcondition of Move_Heap_And_Stack. It’s the state of the system after the call.

Interestingly, for this to work properly, we’re going to need also to verify the consistency of the values of Heap_From, Heap_To, Stack_From and Stack_To - namely that the From is below the To and that there’s no overlap between heap and stack. This will be the precondition of the call:

((Stack_From <= Stack_To and then Heap_From <= Heap_To) and then
 Heap_From not in Stack_From .. Stack_To and then
 Heap_To not in Stack_From .. Stack_To and then
 Stack_From not in Heap_From .. Heap_To and then
 Stack_To not in Heap_From .. Heap_To);

That’s enough for now at this stage of the demonstration. We have specified the full functional property to be demonstrated. Next step is to implement, and prove this implementation.

Low-level Set Library

In order to implement the service, it would actually be useful to have a lower level library that implements sets of ranges, together with Union, Intersect and Complement functions. This library will be called to determine which regions to erase - it will also be the basis of the proof. Because this is going to be used at run-time, we need a very efficient implementation. A set of ranges is going to be defined as an ordered, disjoint list of ranges. Using a record with a discriminant, it looks like:

type Area is record
   From, To : Address_Type;
end record
with Predicate => From <= To;

type Area_Array is array(Natural range <>) of Area;
   
type Set_Base (Max : Natural) is record
   Size  : Natural;
   Areas : Area_Array (1 .. Max);
end record;

Note that I call this record Set_Base, and not Set. Bear with me.

You may notice above already a first functional predicate. In the area definition, the fields From and To are described such that From is always less than or equal to To. This is a check very similar to an Ada range check in terms of where it applies - but on a more complicated property. For Set, I'm also going to express the property I described before, that Areas are ordered (which can be expressed as the fact that the To value of an element N is below the From value of the element N + 1) and disjoint (the From of element N + 1 minus the To of element N is greater than 1). There's another implicit property to be specified, which is that the field Size is below or equal to the Max size of the array. Being able to name and manipulate this specific property has some use, so I'm going to name it in an expression function:

function Is_Consistent (S : Set_Base) return Boolean is
   (S.Size <= S.Max and then
      ((for all I in 1 .. S.Size - 1 => 
         S.Areas (I).To < S.Areas (I + 1).From and then 
         S.Areas (I + 1).From - S.Areas (I).To > 1)));

Now comes the predicate. If I were to write:

type Set_Base (Max : Natural) is record
   Size  : Natural;
   Areas : Area_Array (1 .. Max);
end record
with Predicate => Is_Consistent (Set_Base);

I would have a recursive predicate call. Predicates are checked in particular when passing parameters, so Is_Consistent would check the predicate of Set_Base, which is a call to Is_Consistent, which would then check the predicate and so on. To avoid this, the predicate is actually applied to a subtype:

subtype Set is Set_Base
with Predicate => Is_Consistent (Set);

As it will later turn out, this property is very fundamental to the ability for proving other properties. At this stage, it’s already nice to see some non-trivial property being expressed, namely that the structure is compact (it doesn't waste space by having consecutive areas that could be merged into one, or said otherwise, all areas are separated by at least one excluded byte).
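
As a small illustration (hypothetical test code, not from the post), the first set below is consistent, while the second is not: its two areas are adjacent, so they should have been merged into a single one.

procedure Check_Consistency_Examples is
   Ok  : constant Set_Base :=
     (Max => 2, Size => 2,
      Areas => (1 => (From => 1, To => 3), 2 => (From => 5, To => 9)));
   Bad : constant Set_Base :=
     (Max => 2, Size => 2,
      Areas => (1 => (From => 1, To => 3), 2 => (From => 4, To => 9)));
begin
   pragma Assert (Is_Consistent (Ok));       --  gap of one byte between areas
   pragma Assert (not Is_Consistent (Bad));  --  adjacent areas: not compact
   null;
end Check_Consistency_Examples;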

The formal properties expressed in the next steps will be defined in the form of inclusion - if something is included somewhere then it may or may not be included somewhere else. This inclusion property is expressed as a quantifier over all the ranges. It’s not meant to be run by the program, but only for the purpose of property definition and proof. The function defining that a given byte is included in a set of memory ranges can be expressed as follows:

function Includes (B : Address_Type; S : Set) return Boolean
is (for some I in 1 .. S.Size => 
    B in S.Areas (I).From .. S.Areas (I).To)
with Ghost;

Which means that for all the areas in the set S, B is included in the set if B is included in at least one (some) of the areas in the set.

I’m now going to declare a constructor “Create” together with three operators, “or”, “and”, “not” which will respectively implement union, intersection and complement. For each of those, I need to provide some expression of the maximum size of the set before and after the operation, as well as the relationship between what’s included in the input and in the output.

The specification of the function Create is straightforward. It takes a range as input, and creates a set where all elements within this range are contained in the resulting set. This reads:

function Create (From, To : Address_Type) return Set
with Pre => From <= To,
   Post => Create'Result.Max = 1
      and then Create'Result.Size = 1
      and then (for all B in Address_Type => 
         Includes (B, Create'Result) = (B in From .. To));

Note that, interestingly, the internal implementation of the Set isn't exposed by the property computing the inclusion. I'm only stating what should be included without giving details on how it should be included. Also note that, as in many other places, this postcondition isn't really something we'd like to execute (that would possibly be a long loop to run for a large area). However, it's a good way to model our requirement.

Let’s carry on with the “not”. A quick reasoning shows that at worst, the result of “not” is one area bigger than the input. We’ll need a precondition checking that the Size can indeed be incremented (it does not exceed the last value of the type). The postcondition is that this Size has been potentially incremented, that values that were not in the input Set are now in the resulting one and vice-versa. The operator with its postcondition and precondition reads:

function "not" (S : Set) return Set
with 
   Pre => Positive'Last - S.Size > 0,
   Post =>
      (for all B in Address_Type => 
          Includes (B, "not"'Result) /= Includes (B, S)) 
       and then "not"'Result.Size <= S.Size + 1;

The same reasoning can be applied to “and” and “or”, which leads to the following specifications:

function "or" (S1, S2 : Set) return Set
with 
   Pre => Positive'Last - S1.Size - S2.Size >= 0,
   Post => "or"'Result.Size <= S1.Size + S2.Size
      and Is_Consistent ("or"'Result)
      and (for all B in Address_Type => 
         (Includes (B, "or"'Result)) = 
            ((Includes (B, S1) or Includes (B, S2))));

function "and" (S1, S2 : Set) return Set
with 
   Pre => Positive'Last - S1.Size - S2.Size >= 0,
   Post => "and"'Result.Size <= S1.Size + S2.Size
      and (for all B in Address_Type => 
         (Includes (B, "and"'Result)) = 
            ((Includes (B, S1) and Includes (B, S2))));

Of course at this point, one might be tempted to first prove the library and then the user code. And indeed I was tempted and fell for it. However, as this turned out to be a more significant endeavor, let’s start by looking at the user code.

Move_Heap_And_Stack - the “easy” part

Armed with the Set library, implementing the move function is relatively straightforward. We’re using other services to get the heap and stack boundaries, then creating the set, using the proper operators to create the list to scrub, and then scrubbing pieces of memory one by one, then finally setting the new heap and stack pointers.

procedure Move_Heap_And_Stack
   (Heap_From, Heap_To, Stack_From, Stack_To : Address_Type)
is
   Prev_Heap_From, Prev_Heap_To, 
   Prev_Stack_From, Prev_Stack_To : Address_Type;
begin
   Get_Stack_Boundaries (Prev_Stack_From, Prev_Stack_To);
   Get_Heap_Boundaries (Prev_Heap_From, Prev_Heap_To);

   declare
      Prev : Set := Create (Prev_Heap_From, Prev_Heap_To) or
                    Create (Prev_Stack_From, Prev_Stack_To);
      Next : Set := Create (Heap_From, Heap_To) or 
                    Create (Stack_From, Stack_To);
      To_Scrub : Set := Prev and not Next;
   begin
      for I in 1 .. To_Scrub.Size loop
         Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
      end loop;
      
      Set_Stack (Stack_From, Stack_To);
      Set_Heap (Heap_From, Heap_To);
   end;
end Move_Heap_And_Stack;

Now let's dive into the proof. As a disclaimer, the proofs we're going to do from now on are hard. One doesn't need to go this far to take advantage of SPARK. As a matter of fact, defining requirements formally is already taking good advantage of the technology. Most people only prove data flow or absence of run-time errors, which is already a huge win. The next level is some functional key properties. We're going one level up and entirely proving all the functionalities. The advanced topics that are going to be introduced in this section, such as lemmas and loop invariants, are mostly needed for these advanced levels.

The first step is to reset the knowledge we have on the scrubbing state of the memory. Remember that all of the memory state is used to track the status of the memory, but it does not correspond to any real code. To reset the flag, we’re going to create a special ghost procedure, whose sole purpose is to set these flags:

procedure Reset_Scrub
with
   Post =>
      (for all B in Address_Type =>
         Memory (B) = Memory'Old(B)'Update (Scrubbed => False)),
   Ghost;

In theory, it is not absolutely necessary to provide an implementation to this procedure if it’s not meant to be compiled. Knowing what it’s supposed to do is good enough here. However, it can be useful to provide a ghost implementation to describe how it would operate. The implementation is straightforward:

procedure Reset_Scrub is
begin
   for B in Address_Type'Range loop
      Memory (B).Scrubbed := False;
   end loop;
end Reset_Scrub;

We're now going to hit our first advanced proof topic. While extremely trivial, the above code doesn't prove, and the reason is that it has a loop. Loops are difficult for provers and, as of today, they need help to break them down into sequential pieces. While the developer sees a loop, SPARK sees three different pieces of code to prove, connected by a so-called loop invariant which summarizes the behavior of the loop:

procedure Reset_Scrub is
begin
   B := Address_Type'First;
   Memory (B).Scrubbed := False;
   [loop invariant]

Then:

   [loop invariant]
   exit when B = Address_Type'Last;
   B := B + 1;
   Memory (B).Scrubbed := False;
   [loop invariant]

And eventually:

   [loop invariant]
   end loop;
end;

The difficulty is now about finding this invariant that is true on all these sections of the code, and that ends up proving the postcondition. To establish those, it’s important to look at what needs to be proven at the end of the loop. Here it would be that the entire array has Scrubbed = False, and that the other fields still have the same value as at the entrance of the loop (expressed using the attribute ‘Loop_Entry):

(for all X in Address_Type =>
   Memory (X) = Memory'Loop_Entry(X)'Update (Scrubbed => False))

Then to establish the loop invariant, the question is how much of this is true at each step of the loop. The answer here is that this is true up until the value B. The loop invariant then becomes:

(for all X in Address_Type'First .. B =>
   Memory (X) = Memory'Loop_Entry(X)'Update (Scrubbed => False))

Which can be inserted in the code:

procedure Reset_Scrub is
begin
   for B in Address_Type'Range loop
      Memory (B).Scrubbed := False;
      pragma Loop_Invariant
        (for all X in Address_Type'First .. B =>
            Memory (X) = 
            Memory'Loop_Entry(X)'Update (Scrubbed => False));
   end loop;
end Reset_Scrub;

Back to the main code. We can now insert a call to Reset_Scrub before performing the actual scrubbing. This will not do anything in the actual executable code, but it tells the prover to consider that the ghost values are reset. Next, we have another loop, scrubbing the selected sections:

for I in 1 .. To_Scrub.Size loop
   Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
end loop;

Same as before, the question here is what property is true at the end of the loop, and how to break it down into what's true at each iteration. The property true at the end is that everything included in the To_Scrub set has indeed been scrubbed:

(for all B in Address_Type => 
 Includes (B, To_Scrub) = Memory (B).Scrubbed);

This gives us the following loop with its invariant:

for I in 1 .. To_Scrub.Size loop
   Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
              
   -- everything until the current area is properly scrubbed
   pragma Loop_Invariant
     (for all B in Address_Type'First .. To_Scrub.Areas (I).To =>
      Includes (B, To_Scrub) = Memory (B).Scrubbed);
end loop;

So far, this should be enough. Establishing these loop invariants may look a little bit intimidating at first, but with a little bit of practice they rapidly become straightforward.

Unfortunately, this is not enough.

Implementing Unproven Interaction with Registers

As this is low-level code, some data will not be proven in the SPARK sense. Take for example the calls that read the memory boundaries:

procedure Get_Heap_Boundaries (From, To : out Address_Type)
with Post => (for all B in Address_Type => (Memory (B).Heap = (B in From .. To)))
   and then From <= To;

procedure Get_Stack_Boundaries (From, To : out Address_Type)
with Post => (for all B in Address_Type => (Memory (B).Stack = (B in From .. To)))
   and then From <= To;

The values From and To are possibly coming from registers. SPARK wouldn't necessarily be able to make the link with the postcondition, because the data is outside of its analysis. In this case, it's perfectly fine to just tell SPARK about a fact without having to prove it. There are two ways to do this; one is to deactivate SPARK on the entire subprogram:

 procedure Get_Heap_Boundaries
     (From, To : out Address_Type)
 with SPARK_Mode => Off
 is
 begin
   -- code
 end Get_Heap_Boundaries;

In this case, SPARK will just assume that the postcondition is correct. The issue is that there’s no SPARK analysis on the entire subprogram, which may be too much. An alternative solution is just to state the fact that the postcondition is true at the end of the subprogram:

procedure Get_Heap_Boundaries
   (From, To : out Address_Type)
is
begin 
   -- code
      
   pragma Assume (for all B in Address_Type => (Memory (B).Heap = (B in From .. To)));
end Get_Heap_Boundaries;

In this example, to illustrate the above, the registers will be modeled as global variables read and written from C - which is outside of SPARK analysis as registers would be.
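
As a sketch only (hypothetical names, not the project's code), the registers could be modeled as volatile C globals, with the getter reading them and then assuming its postcondition rather than proving it:

Heap_From_Reg : Address_Type
  with Import, Convention => C, External_Name => "heap_from_reg", Volatile;
Heap_To_Reg   : Address_Type
  with Import, Convention => C, External_Name => "heap_to_reg", Volatile;

procedure Get_Heap_Boundaries (From, To : out Address_Type) is
begin
   From := Heap_From_Reg;
   To   := Heap_To_Reg;

   --  Outside of SPARK analysis: we state the link between the registers
   --  and the ghost model instead of proving it.
   pragma Assume
     (for all B in Address_Type => (Memory (B).Heap = (B in From .. To)));
end Get_Heap_Boundaries;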

Move_Heap_And_Stack - the “hard” part

Before diving into what’s next and steering away readers from ever doing proof, let’s step back a little. We’re currently set to doing the hardest level of proof - platinum. That is, fully proving a program’s functional behavior. There is a lot of benefit to take from SPARK prior to reaching this stage. The subset of the language alone provides more analyzable code. Flow analysis allows you to easily spot uninitialized data. Run-time errors such as buffer overflow are relatively easy to clear out, and even simple gold property demonstration is reachable by most software engineers after a little bit of training.

Full functional proof - that is, complex property demonstration - is hard. It is also not usually required. But if this is what you want to do, there's a fundamental shift of mindset required. As it turns out, it took me a good week to understand that. For a week I was trying to build a proof from bottom to top, adding various assertions left and right, trying to make things fit the SPARK expectations. To absolutely no avail.

And then, in the midst of despair, the apple fell on my head.

The prover is less clever than you think. It’s like a kid learning maths. It’s not going to be able to build a complex demonstration to prove the code by itself. Asking it to prove the code is not the right approach. The right approach is to build the demonstration of the code correctness, step by step, and to prove that this demonstration is correct. This demonstration is a program in itself, with its own subprograms, its own data and control flow, its own engineering and architecture. What the prover is going to do is to prove that the demonstration of the code is correct, and as the demonstration is linked to the code, the code happens to be indirectly proven correct as well.

Now reversing the logic, how could I prove this small loop? One way to work that out is to describe scrubbed and unscrubbed areas one by one:

  • On the first area to scrub, if it doesn't start at the beginning of the memory, everything before the start has not been scrubbed.

  • When working on any area I beyond the first one, everything between the previous area I - 1 and the current one has not been scrubbed.

  • At the end of an iteration, everything beyond the current area I is unscrubbed.

  • At the end of the last iteration, everything beyond the last area is unscrubbed.

The first step to help the prover is to translate all of these into assertions, and see if these steps are small enough for the demonstration to be proven - and for the demonstration to indeed prove the code. At this stage, it's not a bad idea to write the assertions in the loop as loop invariants, as we want the information to be carried from one loop iteration to the next. This leads to the following code:

for I in 1 .. To_Scrub.Size loop                        
   Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
            
   pragma Loop_Invariant
      (if I > 1 then (for all B in Address_Type range To_Scrub.Areas (I - 1).To + 1 .. To_Scrub.Areas (I).From - 1 => 
         not Memory (B).Scrubbed));
            
   pragma Loop_Invariant
      (if To_Scrub.Areas (1).From > Address_Type'First
         then (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 => not Memory (B).Scrubbed));

   pragma Loop_Invariant
      (if To_Scrub.Areas (I).To < Address_Type'Last then
         (for all B in To_Scrub.Areas (I).To + 1 .. Address_Type'Last => not Memory (B).Scrubbed));
            
   pragma Loop_Invariant
      (for all B in Address_Type'First .. To_Scrub.Areas (I).To => Includes (B, To_Scrub) = Memory (B).Scrubbed);
end loop;
         
pragma Assert
   (if To_Scrub.Size >= 1 and then To_Scrub.Areas (To_Scrub.Size).To < Address_Type'Last then
      (for all B in To_Scrub.Areas (To_Scrub.Size).To + 1 .. Address_Type'Last => not Memory (B).Scrubbed));

Results are not too bad at first sight. Out of the 5 assertions, only 2 don't prove. This may mean that they're wrong - it may also mean that SPARK needs some more help to prove them.

Let's look at the first one in more detail:

pragma Loop_Invariant
   (if To_Scrub.Areas (1).From > Address_Type'First
      then (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 => not Memory (B).Scrubbed));

Now let's put ourselves in the shoes of SPARK. The prover doesn't believe that there's nothing scrubbed before the first element. Why would that be the case? None of these bytes are in the To_Scrub area, right? Let's check. To investigate this, the technique is to add assertions to verify intermediate steps, pretty much like what you'd do with a debugger. Let's add an assertion before:

pragma Assert
   (if To_Scrub.Areas (1).From > Address_Type'First
      then (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 => not Includes (B, To_Scrub)));

That assertion doesn't prove. But why would this be true? Recall that we have a consistency check for all sets, which is supposed to be true at this point, defined as:

function Is_Consistent (S : Set_Base) return Boolean is
   (S.Size <= S.Max and then
      ((for all I in 1 .. S.Size - 1 => 
         S.Areas (I).To < S.Areas (I + 1).From and then 
         S.Areas (I + 1).From - S.Areas (I).To > 1)));

So looking at the above, since all areas come after the first one, there should be nothing in the set before the first area. If Is_Consistent is true for To_Scrub, then the assertion ought to be true. Yet SPARK doesn't believe us.

When reaching this kind of situation, it's good practice to factor out the proof. The idea is to create a place where we say "given only these hypotheses, can you prove this conclusion?". Sometimes SPARK gets lost in the wealth of information available, and just reducing the number of hypotheses to consider is enough to get it to figure something out.

Interestingly, this activity of factoring out a piece of proof is very close to what you’d do for a regular program. It’s also easier for the developer to understand small pieces of code than a large flat program. The prover is no better than that.

These factored-out proofs are typically referred to as lemmas. They are Ghost procedures that prove a postcondition from a minimal precondition. By convention, we'll prefix all their names with Lemma_. The lemma will look like:

procedure Lemma_Nothing_Before_First (S : Set) with
   Ghost,
   Pre => Is_Consistent (S),
   Post =>
      (if S.Size = 0 then (for all B in Address_Type => not Includes (B, S))
         elsif S.Areas (1).From > Address_Type'First then
            (for all B in Address_Type'First .. S.Areas (1).From - 1 => not Includes (B, S)));

This states that if S is consistent, then either it's empty (nothing is included) or no byte before the first From is included.

Now let’s see if reducing the scope of the proof is enough. Let’s just add an empty procedure:

procedure Lemma_Nothing_Before_First (S : Set) is
begin
   null;
end Lemma_Nothing_Before_First;

Still no good. That was a good try though. Assuming we believe the lemma to be correct (and it is), the game is now to demonstrate to SPARK how to go from the hypotheses to the conclusion.

To do so we need to take into account one limitation of SPARK - it doesn’t do induction. This has a significant impact on what can be deduced from one part of the hypothesis:

(for all I in 1 .. S.Size - 1 => 
   S.Areas (I).To < S.Areas (I + 1).From and then 
   S.Areas (I + 1).From - S.Areas (I).To > 1)

If every element “I” is below the “I + 1” element, then I would like to be able to conclude that every “I” is below all the “I + N” elements after it. This ability to jump from a property proven element by element to a property over the whole set is called induction, and it happens to be extremely hard for state-of-the-art provers. Here lies the key. We’re going to introduce a new lemma that starts from the same premise, and then demonstrates that all the areas after a given one are greater:

procedure Lemma_Order (S : Set) with
   Ghost,
   Pre => (for all I in 1 .. S.Size - 1 => S.Areas (I).To < S.Areas (I + 1).From),
   Post =>
      (for all I in 1 .. S.Size - 1 => (for all J in I + 1 .. S.Size => S.Areas (I).To < S.Areas (J).From));

And we’re going to write the demonstration as a program:

procedure Lemma_Order (S : Set)
is
begin
   if S.Size = 0 then
      return;
   end if;

   for I in 1 .. S.Size - 1 loop
      for J in I + 1 .. S.Size loop
         pragma Assert (S.Areas (J - 1).To < S.Areas (J).From);
         pragma Loop_Invariant (for all R in I + 1 .. J => S.Areas (I).To < S.Areas (R).From);
      end loop;

      pragma Loop_Invariant
        ((for all R in 1 .. I => (for all T in R + 1 .. S.Size => S.Areas (R).To < S.Areas (T).From)));
   end loop;
end Lemma_Order;

As you can see here, for each area I, we’re checking that the areas I + 1 .. Size are indeed greater. This happens to prove trivially with SPARK. We can now prove Lemma_Nothing_Before_First by applying the lemma Lemma_Order. To apply a lemma, we just need to call it like a regular subprogram. Its hypotheses (precondition) will be checked by SPARK, and its conclusion (postcondition) added to the list of hypotheses available for the proof:

procedure Lemma_Nothing_Before_First (S : Set) is
begin
   Lemma_Order (S);
end Lemma_Nothing_Before_First;

This now proves trivially. Back to the main loop, applying the lemma Lemma_Nothing_Before_First looks like:

for I in 1 .. To_Scrub.Size loop                        
   Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
   Lemma_Nothing_Before_First (To_Scrub);

   pragma Loop_Invariant
      (if I > 1 then 
         (for all B in Address_Type range To_Scrub.Areas (I - 1).To + 1 .. To_Scrub.Areas (I).From - 1 => not Memory (B).Scrubbed));
            
   pragma Loop_Invariant
      (if To_Scrub.Areas (1).From > Address_Type'First
         then (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 => not Memory (B).Scrubbed));

   pragma Loop_Invariant
      (if To_Scrub.Areas (I).To < Address_Type'Last then
         (for all B in To_Scrub.Areas (I).To + 1 .. Address_Type'Last => not Memory (B).Scrubbed));
            
   pragma Loop_Invariant
      (for all B in Address_Type'First .. To_Scrub.Areas (I).To => Includes (B, To_Scrub) = Memory (B).Scrubbed);
end loop;
         
pragma Assert
   (if To_Scrub.Size >= 1 and then To_Scrub.Areas (To_Scrub.Size).To < Address_Type'Last then
      (for all B in To_Scrub.Areas (To_Scrub.Size).To + 1 .. Address_Type'Last => not Memory (B).Scrubbed));

And voila! One more loop invariant now proves properly.

More madness, in a nutshell

At this point, it’s probably not worth diving into all the details of this small subprogram - the code is available here. There’s just more of the same.

The size of this small function is relatively reasonable. Now let’s give some insights into a much more difficult problem: the Set library.

A generous implementation comes to about 250 lines of code (it could actually be less if condensed, but let’s start with this). That’s a little less than a day of work for implementation and basic testing.

For the so-called Silver level - that is, absence of run-time errors - add maybe around 50 lines of assertions and half a day of work. Not too bad.

For the Gold level, I decided to prove one key property: that Is_Consistent is true after each operation. Maybe a day of work was needed for that one, plus another 150 lines of assertions or so. Still reasonable.

Platinum is about completely proving the functionality of my subprogram. And that proved (pun intended) to be a much, much more difficult experience. See this link and this link for other similar experiences. As a disclaimer, I am an experienced Ada developer but had relatively little experience with proof. I also selected a relatively hard problem - the quantified properties and the Set structure are quite different, and proving quantifiers is known to be hard for provers to start with. With that in mind, the solution I came up with spreads over almost a thousand lines of code - and consumed about a week and a half of effort.

I’m also linking here a solution that my colleague Claire Dross came up with. She’s one of our most senior experts in formal proof, and within a day she could prove the two most complex operators in about 300 lines of code (her implementation is also more compact than mine).

The above raises a question - is it really worth it? Silver absolutely is - it is difficult to make a case against spending a little more effort in exchange for the absolute certainty of never having a buffer overflow or a range check error. There’s no doubt that the time I spent on this proof is much less than what I would have spent debugging, whether during testing or, worse, fixing errors found at later stages should this library be integrated into an actual product. Gold is also a relatively strong case. Selecting only key properties lets me concentrate on relatively easy things, and the confidence that they are enforced and will never have to be tested clearly outweighs the effort.

I also want to point out that the Platinum effort is well worth it on the user code in this example. While it looks tedious at first sight, getting these properties right is relatively straightforward, and it provides confidence in something that can’t easily be tested; that is, a property over the whole memory.

Now the question remains - is it worth the effort on the Set library, to go from maybe two days of code + proof to around a week and a half?

I can argue it either way, but having to write 700 lines of code to demonstrate to the prover that what I wrote is correct keeps haunting me. Did I really have these 700 lines of reasoning in my head when I developed the code? Did I have confidence that each of them was logically linked to the next? To be fair, I did find errors in the code while writing them, but the code wasn’t fully tested when I started the proof. Would testing have found all the corner cases? How much time would such a corner case take to debug if found in a year? (See this blog post for some insights on hard-to-find bugs removed by proof.)

Some people who certify safety-critical software against e.g. avionics and railway standards end up writing 10 times more lines of tests than code - all the while verifying only samples of the potential data. In that situation, provided that the properties under test can be modelled by SPARK assertions and that they fit what the prover knows how to do, there is a very strong case for going through this level of effort.

Anything less is open for debate. I have to admit, against all odds, it was a lot of fun and I would personally look forward to taking on the challenge again. Whether my boss would allow me to is a different question. It all boils down to the cost of a failure versus the effort to prevent said failure. Being able to make an enlightened decision might be the most valuable outcome of having gone through the effort.

]]>
​Amazon Relies on Formal Methods for the Security of AWS http://blog.adacore.com/amazon-relies-on-formal-methods-for-the-security-of-aws Tue, 23 Oct 2018 13:17:09 +0000 Yannick Moy http://blog.adacore.com/amazon-relies-on-formal-methods-for-the-security-of-aws

Byron Cook, who founded and leads the Automated Reasoning Group at Amazon Web Services (AWS) Security, gave a powerful talk at the Federated Logic Conference in July about how Amazon uses formal methods for ensuring the security of parts of AWS infrastructure. In the past four years, this group of 20+ has progressively hired well-known formal methods experts to face the growing demand inside AWS to develop tools based on formal verification for reasoning about cloud security.

At 17:25 in the recording, Cook summarizes very succinctly the challenge his team is addressing: "How does AWS continue to scale quickly and securely?"

A message that Cook hammers home numerous times in his talk is that "soundness is key". See 25:05, where he explains that some customers value the security guarantees that AWS can offer with formal verification so much that it justified their move to AWS.

Even closer to what we do with SPARK, he talks at 26:42 about source code verification, and has this amaz(on)ing quote: "Proof is an accelerator for adoption. People are moving orders of magnitude workload more because they're like 'in my own data center I don't have proof' but there they have proofs."

In the companion article that was published at the conference, Cook gives more details about what the team has achieved and where they are heading now:

In 2017 alone the security team used deductive theorem provers or model checking tools to reason about cryptographic protocols/systems, hypervisors, boot-loaders/BIOS/firmware, garbage collectors, and network designs.
In many cases we use formal verification tools continuously to ensure that security is implemented as designed. In this scenario, whenever changes and updates to the service/feature are developed, the verification tool is reexecuted automatically prior to the deployment of the new version.
The customer reaction to features based on formal reasoning tools has been overwhelmingly positive, both anecdotally as well as quantitatively. Calls by AWS services to the automated reasoning tools increased by four orders of magnitude in 2017. With the formal verification tools providing the semantic foundation, customers can make stronger universal statements about their policies and networks and be confident that their assumptions are not violated.

While AWS certainly has unique security challenges that justify a strong investment in security, it's not unique in depending on complex software for its operations. What is unique so far is the level of investment at AWS in formal verification as a means to radically eliminate some security problems, both for them and for their customers.

This is certainly an approach we're eager to support with our own investment in the SPARK technology.

]]>
It's time to Make with Ada! http://blog.adacore.com/its-time-to-make-with-ada Wed, 17 Oct 2018 04:00:00 +0000 Emma Adby http://blog.adacore.com/its-time-to-make-with-ada

The challenge

Are you ready to develop a project to the highest levels of safety, security and reliability? If so, Make with Ada is the challenge for you! We’re calling on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and are offering over $8000 in total prizes. In addition, eligible students will compete for a reward of an Analog Discovery 2 Pro Bundle worth $299.99!

For this competition, we chose not to set a project theme because we want you to be able to demonstrate your inventiveness and to work on a project that motivates you. We’re inviting you to contribute to the world of safe, secure and reliable software by using Ada/SPARK to build something that matters. Learn how to program in Ada and SPARK here – learn.adacore.com.

It's time to Make with Ada. Find out more on Hackster.io.


]]>
Public Ada Training Paris, France Dec 3 - 7, 2018 http://blog.adacore.com/public-ada-training-paris-france-dec-3-7-2018 Wed, 10 Oct 2018 13:04:00 +0000 Pamela Trevino http://blog.adacore.com/public-ada-training-paris-france-dec-3-7-2018

This course is geared to software professionals looking for a practical introduction to the Ada language with a focus on embedded systems, including real-time features as well as critical features introduced in Ada 2012. By attending this course you will understand and know how to use Ada for both sequential and concurrent applications, through a combination of live lectures from AdaCore's expert instructors and hands-on workshops using AdaCore's latest GNAT technology. AdaCore will provide an Ada 2012 tool-chain and ARM-based target boards for embedded workshops. No previous experience with Ada is required.

The course will be conducted in English.

Prerequisite: Knowledge of a programming language (Ada 83, C, C++, Java…).

Each participant should come with a computer having at least one USB port and running Windows.

For the full agenda and to register, visit: https://www.adacore.com/public...


]]>
Train control using Ada on a Raspberry Pi http://blog.adacore.com/train-control-using-ada-on-a-raspberry-pi Tue, 18 Sep 2018 12:50:43 +0000 Julia Teissl http://blog.adacore.com/train-control-using-ada-on-a-raspberry-pi

I was looking for a topic for my master’s thesis in embedded systems engineering when one of my advisors proposed the idea of programming a control system for autonomous trains in Ada. Since I am fascinated by the idea of autonomous vehicles, I agreed immediately without knowing Ada. 

The obligatory "hello world" was written within a few minutes, so nothing could stop me from writing an application for autonomous trains. I was such a fool. The first days of coding were really hard and it felt like writing a Shakespeare sonnet in a language I had never heard, but after a few days and a lot of reading in "Ada 2012" by John Barnes I recognized the beauty and efficiency of this language.

Half a year had to go by before the first lines of code for my project could be written. First, I had to define requirements and choose which hardware to use. As a start, my advisor suggested the Raspberry Pi 3 as the implementation platform. Then I had to dive into the world of model trains to develop my first model railway layout.

My railway track

A lot of soldering and wiring was required to build such a simple track (see image below). What a perfect job for a software developer, but as a prospective engineer nothing seemed impossible. It took two months until the whole hardware part was set up. Thanks to my colleagues, who are electronic engineers, we managed to set up everything as required and stabilized the signals until the interference from the turnout drives was eliminated. 


[Photos of the assembled track]

The system architecture (see below) is a bit complicated since a lot of different components had to be taken into account. 

System architecture and communication protocols

To communicate with the trains, turnouts and train position detectors I used an Electronic Solutions Ulm Command Station 50210 (ECoS). It uses the DCC Railcom protocol to communicate with the attached hardware. There are two possibilities to interact with the ECoS: either via an IP interface to send commands, or via the touchscreen. 

The Messaging Raspberry Pi (MRP) receives all messages of the system and continuously polls its interfaces to detect changes, for example if a button was pressed to call a train to a specific station. A colleague wrote the software for this part in C#. All messages are converted into a simple format, transferred to the Control Raspberry Pi, and finally passed to the Ada train control software. 

A third microcontroller (an STM32F1) triggers the attached LEDs to indicate whether a part of the track can be passed through. The respective commands are sent via UART to the STM32F1. The STM32 software is written in C, but will be rewritten in Ada next winter semester by my students during their Ada lessons. If you allow me a side note: I was so overwhelmed by the simplicity of Ada’s multitasking mechanisms that I decided to change the content of my safety programming lectures from C and the MISRA C standard to Ada and SPARK, but that is another story. 

A goal of my master’s thesis was to program an on-demand autonomous train system. Therefore, trains have to drive autonomously to the different stations, turnouts must be changed, signals must be set and trains must be detected along the track. Each train has its own task storing and processing information: it has to identify its position on the track or the next station to stop at, and to offer information like which message it expects next. One main task receives all the messages from the MRP, analyzes them and passes them to the different train tasks. The main part of the software handles and analyzes all these messages, since each component, like the signals or turnouts, has its own protocol to adhere to. Thankfully, rendezvous in Ada are very easy to implement, so it was fun to write this part of the code. 
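
To give a flavour of what this looks like, here is a minimal, hypothetical sketch of a train task with a rendezvous entry that a dispatcher calls to hand over a message. The names and types (Dispatch_Demo, Message, Train_Task, Receive) are invented for illustration and are not the actual thesis code:

with Ada.Text_IO; use Ada.Text_IO;

procedure Dispatch_Demo is

   type Message is record
      Station : Positive;
   end record;

   task type Train_Task is
      entry Receive (M : Message);  --  rendezvous point called by the dispatcher
   end Train_Task;

   task body Train_Task is
      Next : Message;
   begin
      loop
         select
            accept Receive (M : Message) do
               Next := M;  --  copy the message during the rendezvous, keeping it short
            end Receive;
            --  Process the message outside the rendezvous
            Put_Line ("Driving to station" & Positive'Image (Next.Station));
         or
            terminate;  --  let the task finish when the program ends
         end select;
      end loop;
   end Train_Task;

   Train_1 : Train_Task;

begin
   --  The dispatcher simply calls the entry of the relevant train task
   Train_1.Receive ((Station => 3));
end Dispatch_Demo;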

The following video shows a train which is called to a station and then drives to a terminus station after stopping there and picking up passengers. As the train starts from the middle station, another train has to drive to the middle station, which has to be occupied at all times because waiting times must be kept as short as possible. The signals are not working in this video because I started a new project with a bigger track and had already detached most of the parts. 

]]>
Ada on FPGAs with PicoRV32 http://blog.adacore.com/ada-on-fpgas-with-picorv32 Tue, 11 Sep 2018 12:27:00 +0000 Fabien Chouteau http://blog.adacore.com/ada-on-fpgas-with-picorv32

When I bought the TinyFPGA-BX board, I thought it would be an opportunity to play a little bit with FPGA, learn some Verilog or VHDL. But when I discovered that it was possible to have a RISC-V CPU on it, I knew I had to run Ada code on it.

The RISC-V CPU in question is the PicoRV32 from Clifford Wolf. It is written in Verilog and implements the RISC-V 32-bit instruction set (IMC extensions). In this blog post I will explain how I added support for this CPU and made an example project.

Compiler and run-time

More than a year ago I wrote a blog post about building an experimental Ada compiler and running code on the first RISC-V micro-controller, the HiFive1. Since then, we have released official support for RISC-V in the Pro and Community editions of GNAT, so you don’t even have to build the compiler anymore.

For the run-time, we will start from the existing HiFive1 run-time and change a few things to match the specs of the PicoRV32. As you will see it’s very easy.

Compared to the HiFive1, the PicoRV32 run-time will have:

  • A different memory map (RAM and flash)

  • A different text IO driver (UART)

  • Different instruction set extensions

Memory map

This step is very simple, we just use linker script syntax to declare the two memory areas of the TinyFPGA-BX chip (ICE40):

MEMORY
{
  flash (rxai!w) : ORIGIN = 0x00050000, LENGTH = 0x100000
  ram (wxa!ri)   : ORIGIN = 0x00000000, LENGTH = 0x002000
}

Text IO driver

Again this is quite simple, the UART peripheral provided with PicoRV32 only has two registers. We first declare them at their respective addresses:

   UART_CLKDIV : Unsigned_32
     with Volatile, Address => System'To_Address (16#02000004#);

   UART_Data : Unsigned_32
     with Volatile, Address => System'To_Address (16#02000008#);

Then the code just initializes the peripheral by setting the clock divider register to get a 115200 baud rate, and sends characters to the data register:

   procedure Initialize is
   begin
      UART_CLKDIV := 16_000_000 / 115200;
      Initialized := True;
   end Initialize;

   procedure Put (C : Character) is
   begin
      UART_Data := Unsigned_32 (Character'Pos (C));
   end Put;

Run-time build script

The last modification is to the Python scripts that create the run-times. To create a new run-time, we add a class that defines different properties like compiler switches or the list of files to include:

class PicoRV32(RiscV32):
    @property
    def name(self):
        return 'picorv32'

    @property
    def compiler_switches(self):
        # The required compiler switches
        return ['-march=rv32imc', '-mabi=ilp32']

    @property
    def has_small_memory(self):
        return True

    @property
    def loaders(self):
        return ['ROM']

    def __init__(self):
        super(PicoRV32, self).__init__()

        # Use the same base linker script as the HiFive1
        self.add_linker_script('riscv/sifive/hifive1/common-ROM.ld',
                               loader='ROM')

        self.add_linker_script('riscv/picorv32/memory-map.ld',
                               loader='ROM')

        # Use the same startup code as the HiFive1
        self.add_sources('crt0', ['riscv/sifive/fe310/start-rom.S',
                                  'riscv/sifive/fe310/s-macres.adb',
                                  'riscv/picorv32/s-textio.adb'])

Building the run-time

Once all the changes are made, it is time to build the run-time.

First download and install the Community edition of the RISC-V32 compiler from adacore.com/download (Linux64 host only).

Then run the script inside the bb-runtime repository to create the run-time:

$ git clone https://github.com/AdaCore/bb-runtimes
$ cd bb-runtimes
$ ./build_rts.py --bsps-only --output=temp picorv32

Compile the generated run-time:

$ gprbuild -P temp/BSPs/zfp_picorv32.gpr

And install:

$ gprinstall -p -f -P temp/BSPs/zfp_picorv32.gpr

This is it for the run-time. For a complete view of the changes, you can have a look at the commit on GitHub: here.

Example project

Now that we can compile Ada code for the PicoRV32, let’s work on an example project. I wanted to include a custom Verilog module, otherwise what’s the point of using an FPGA, right? So I made a peripheral that controls WS2812 RGB LEDs, also known as Neopixels.

I won’t explain the details of this module; I’ll just say that digital logic is difficult for the software engineer that I am :)

The hardware/software interface is a memory mapped register that once written to, sends a WS2812 data frame. To control a strip of multiple LEDs, the software just has to write to this register multiple times in a loop.

The example software is relatively simple. The Neopixel driver package is generic so that it can handle different lengths of LED strips. The address of the memory mapped register is also a parameter of the generic, so it is possible to have multiple peripherals controlling different LED strips. The memory mapped register is defined using Ada’s representation clauses and the Address attribute:

  type Pixel is record
      B, R, G : Unsigned_8;
   end record
     with Size => 32, Volatile_Full_Access;

   for Pixel use record
      B at 0 range 0 .. 7;
      R at 0 range 8 .. 15;
      G at 0 range 16 .. 23;
   end record;

   Data_Register : Pixel with Address => Peripheral_Base_Addr;
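
Putting these pieces together, a generic spec along the following lines would match that description. This is a hypothetical sketch, not the actual driver from the repository: the names (Neopixel_Sketch, Show, Number_Of_LEDs) are invented, and the Pixel declarations from above are repeated to keep it self-contained:

with System;
with Interfaces; use Interfaces;

generic
   Number_Of_LEDs       : Positive;
   Peripheral_Base_Addr : System.Address;
package Neopixel_Sketch is

   type Pixel is record
      B, R, G : Unsigned_8;
   end record
     with Size => 32, Volatile_Full_Access;

   for Pixel use record
      B at 0 range 0 .. 7;
      R at 0 range 8 .. 15;
      G at 0 range 16 .. 23;
   end record;

   subtype LED_Index is Positive range 1 .. Number_Of_LEDs;
   type Strip is array (LED_Index) of Pixel;

   --  Send one WS2812 data frame per LED by writing each pixel to the
   --  memory mapped data register, in order
   procedure Show (Pixels : Strip);

end Neopixel_Sketch;

package body Neopixel_Sketch is

   Data_Register : Pixel with Address => Peripheral_Base_Addr;

   procedure Show (Pixels : Strip) is
   begin
      for P of Pixels loop
         Data_Register := P;  --  each write sends one WS2812 frame
      end loop;
   end Show;

end Neopixel_Sketch;

Instantiating such a generic twice with different base addresses would then give two independent strips, which is exactly what the generic parameters are for.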

Then an HSV to RGB conversion function is used to implement different animations on the LED strip, like candle simulation or rainbow effect. And finally there are button inputs to select the animation and the intensity of the light.
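
For illustration, one common formulation of such a conversion is sketched below. This is not the function from the repository, which may use a different algorithm or integer arithmetic instead of Float; the names are invented:

   subtype Unit is Float range 0.0 .. 1.0;
   subtype Degrees is Float range 0.0 .. 360.0;

   --  Results R, G, B are in 0.0 .. 1.0
   procedure HSV_To_RGB (H : Degrees; S, V : Unit; R, G, B : out Float) is
      C      : constant Float := V * S;                           --  chroma
      Hp     : constant Float := H / 60.0;                        --  sector in 0 .. 6
      H_Mod2 : constant Float := Hp - 2.0 * Float'Floor (Hp / 2.0);
      X      : constant Float := C * (1.0 - abs (H_Mod2 - 1.0));
      M      : constant Float := V - C;
      Rp, Gp, Bp : Float;
   begin
      if    Hp < 1.0 then Rp := C;   Gp := X;   Bp := 0.0;
      elsif Hp < 2.0 then Rp := X;   Gp := C;   Bp := 0.0;
      elsif Hp < 3.0 then Rp := 0.0; Gp := C;   Bp := X;
      elsif Hp < 4.0 then Rp := 0.0; Gp := X;   Bp := C;
      elsif Hp < 5.0 then Rp := X;   Gp := 0.0; Bp := C;
      else                Rp := C;   Gp := 0.0; Bp := X;
      end if;
      R := Rp + M;
      G := Gp + M;
      B := Bp + M;
   end HSV_To_RGB;

Scaling R, G and B by 255 then gives the Unsigned_8 components expected by the Pixel record above.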

Both hardware and software sources can be found in this repository. I recommend following the TinyFPGA-BX User Guide first to get familiar with the board and how the bootloader works.


Feeling inspired and want to start Making with Ada? We have the perfect challenge for you!

The Make with Ada competition, hosted by AdaCore, calls on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and offers over €8000 in total prizes.

]]>
AdaCore major sponsor at HIS 2018 http://blog.adacore.com/adacore-major-sponsor-at-his-2018 Wed, 08 Aug 2018 09:51:00 +0000 Pamela Trevino http://blog.adacore.com/adacore-major-sponsor-at-his-2018

We are happy to announce that AdaCore, alongside Altran and Jaguar Land Rover, will be a major sponsor of the fifth edition of the renowned High Integrity Software Conference on the 6th of November in Bristol!

The core themes of the conference this year will be Assured Autonomous Systems, Hardware and Software Architectures, Cyber Security – People & Practice, and Languages & Applications. We will address the relationship between software vulnerability, security and safety, and the far-reaching impacts it can have.

This year's programme offers a wide and diverse selection of talks, including our keynotes presented by Simon Burton from Bosch and Dr. Andrew Blyth from the University of Wales.

Speakers from ANSSI, Assuring Autonomy International Programme - University of York, BAE Systems, CyBOK, Imperial College, IRM, Meridian Mobile, NCSC, RealHeart, Rolls-Royce, University of Edinburgh will be giving technical sessions on building trustworthy software for embedded, connected and infrastructure systems across a range of application domains.

For the full agenda and to register: https://www.his-2018.co.uk/

]]>
Learn.adacore.com is here http://blog.adacore.com/learn-adacore-com-is-here Wed, 25 Jul 2018 14:47:00 +0000 Fabien Chouteau http://blog.adacore.com/learn-adacore-com-is-here

We are very proud to announce the availability of our new Ada and SPARK learning platform: learn.adacore.com.

Following on from the AdaCoreU(niversity) e-learning platform, this website is the next step in AdaCore’s endeavour to provide a better online learning experience for the Ada and SPARK programming languages. This new website is designed for individuals who want to get up and running with Ada/SPARK, and also for teams or teachers looking for training or tutorial material based on Ada/SPARK. In designing the site, we decided to evolve from the video-based approach used for AdaCoreU and instead have created text-based, interactive content to ease the learning experience. In light of this, AdaCoreU will be decommissioned in the coming weeks, but the course videos are already available on YouTube, and the base material, including slides, remains available on GitHub for people who want to use it for their courses or training.

The main benefit of a textual approach is the greater flexibility in how you advance through the course. Now you can easily pick and choose from the course material with the opportunity to move on to more advanced sections but then refer back to previous content when needed.

We also provide greater interactivity through code snippets embedded in a widget that allows you to compile, run and even prove your code (in the case of SPARK) directly from your web browser. This allows you to experiment with the tools without having to install them, and to tweak the examples to gain a better understanding of what’s allowed and feasible.

As for course content production, this new format also allows greater interactivity with the community. The source code for the courses is in reST (reStructuredText) format, which makes it easy to edit and collaborate on. It’s also hosted on GitHub so that everyone can suggest fixes or ideas for new content.

For the release of learn.adacore.com, you will find two courses, an introduction to Ada and an introduction to SPARK, as well as an ebook “Ada for the C++ or Java Developer”. In the future we plan to add advanced Ada and SPARK courses and also a dedicated course on embedded programming, so watch this space!

Learn.adacore.com can also help you get started early and gain a competitive advantage ahead of this year’s Make with Ada competition.


]]>
GNAT Community 2018 is here! http://blog.adacore.com/gnat-community-2018 Tue, 26 Jun 2018 13:01:45 +0000 Emma Adby http://blog.adacore.com/gnat-community-2018

Calling all members of the Ada and SPARK community, we are pleased to announce that GNAT Community 2018 is here! adacore.com/download

What’s new?

BBC micro:bit first class support

We decided to adopt the micro:bit as our reference platform for teaching embedded Ada and SPARK. We chose the micro:bit for its worldwide availability, great value for a low price and the included hardware debugger. You can get one at:

We did our best to provide a smooth experience when programming the micro:bit in Ada or SPARK. Here is a quick guide on how to get started:

  • Download and install GNAT arm-elf hosted on your platform: Windows, Linux or MacOS. This package contains the ARM cross compiler as well as the required Ada run-times

  • Download and install GNAT native for your platform: Windows, Linux or MacOS. This package contains the GNAT Programming Studio IDE and an example to run on the micro:bit

  • Start GNAT Programming Studio

  • Click on “Create a new Project”

  • Select the “Scrolling Text” project under “BBC micro:bit” and click Next

  • Enter the directory where you want the project to be deployed and click Apply

  • Plug in your micro:bit board with a USB cable, and wait for the system to recognize it. This can take a few seconds

  • Back in GNAT Programming Studio, click on the “flash to board” icon

  • That’s it!

The example provided only uses the LED matrix, for support of more advanced micro:bit features, please have a look at the Ada Drivers Library project.

RISC-V support

The RISC-V open instruction set and its ecosystem are getting more interesting every week, with big names of the tech industry investing in the platform as well as cheap prototyping boards becoming available for makers and hobbyists. After a first experiment last year, AdaCore decided to develop support for RISC-V in GNAT. In this GNAT Community release we provide support for bare-metal RISC-V 32-bit hosted on Linux, in particular for the HiFive1 board from SiFive. You will also find drivers for this board in the Ada Drivers Library project.

Find SPARK included in the package by default

For the first time in the community release, SPARK is now packaged with the native compiler, making it very easy for everyone to try it out. The three standard provers Alt-Ergo, CVC4 and Z3 are included.

Windows 64bit is finally here

By popular request, we decided to change GNAT Community Windows from 32bit to 64bit.

Arm-elf hosted on Mac

In this release we also add support for ARM bare-metal hosted on MacOS (previously only Windows and Linux). This includes support for the BBC micro:bit mentioned above, as well as our usual STM32F4 and STM32F7 boards.



Feeling inspired and want to start Making with Ada today? Use the brand new GNAT Community 2018 to get a head start in this year’s Make with Ada competition! makewithada.org

]]>
Security Agency Uses SPARK for Secure USB Key http://blog.adacore.com/security-agency-uses-spark-for-secure-usb-key Mon, 25 Jun 2018 12:51:00 +0000 Yannick Moy http://blog.adacore.com/security-agency-uses-spark-for-secure-usb-key

ANSSI, the French national security agency, has published the results of their work since 2014 on designing and implementing an open-hardware & open-source USB key that provides defense-in-depth against vulnerabilities on the USB hardware, architecture, protocol and software stack. In this project called WooKey, Ada and SPARK are key components for the security of the platform. The conference paper (in English), the presentation slides (in French) and a video recording of their presentation (in French) are all available online on the website of the French security conference SSTIC 2018. The complete hardware designs and software code will be available in Q3 2018 in the GitHub project (currently empty).

Following the BadUSB vulnerability disclosure in 2014 (a USB key can be used to impersonate other devices, and permanently infect all computers it connects to, as well as devices connected to these computers), the only solution to defend against such attacks (to this day) has been to disable USB connections on computers. There are a number of commercial providers of secure USB keys, but their hardware/software stacks are proprietary, so it's not possible to evaluate their level of security. Shortly after the BadUSB disclosure, ANSSI set up an internal project to devise a secure USB key that would restore trust, by being fully open source, based on state-of-the-art practice, yet be affordable for anyone to build/use. The results, four years later, were presented at the SSTIC 2018 conference on June 13th.

What is interesting is the key role played by the use of safe languages (Ada and SPARK) as well as formal verification (SPARK) to secure the most important services of the EwoK micro-kernel on the USB key, and the combination of these with measures to design a secure software architecture and secure hardware. They were also quite innovative in their adoption of Ada/SPARK, automatically and progressively replacing units in C with their counterparts in Ada/SPARK in their build system. Something worth noting is that the team discovered Ada/SPARK as part of this project, and managed to prove the absence of runtime errors (no buffer overflows!) in their code easily.

Arnauld Michelizza from ANSSI will present their work on the EwoK micro-kernel and the software development process they adopted at the High Integrity Software conference in Bristol on November 6.


]]>
How Ada and SPARK Can Increase the Security of Your Software http://blog.adacore.com/how-ada-and-spark-can-increase-the-security-of-your-software Tue, 29 May 2018 12:15:14 +0000 Yannick Moy http://blog.adacore.com/how-ada-and-spark-can-increase-the-security-of-your-software
AdaCore Technologies for Cyber Security booklet

There is a long-standing debate about which phase in the Software Development Life Cycle causes the most bugs: the specification phase or the coding phase? Along with the information on the cost to fix these bugs, answering this question would allow better allocation of QA (Quality Assurance) resources. Furthermore, the cost of bug fixes remains the subject of much debate.

A recent study by NIST shows that, in the software industry at large, coding bugs are causing the majority of security issues. They analyzed the provenance of security bugs across all publicly disclosed vulnerabilities in the National Vulnerability Database from 2008 to 2016. They discovered that coding bugs account for two thirds of the total. As they say: 

The high proportion of implementation errors suggests that little progress has been made in reducing these vulnerabilities that result from simple mistakes, but also that more extensive use of static analysis tools, code reviews, and testing could lead to significant improvement.

Our view at AdaCore is that the above list of remedies lacks a critical component for "reducing these vulnerabilities that result from simple mistakes" and probably the most important one: pick a safer programming language! This might not be appropriate for all your software, but why not re-architect your system to isolate the most critical parts and progressively rewrite them with a safer programming language? Better still - design your next system this way in the first place. What safer language to choose? One candidate is Ada, or its SPARK subset. How can they help? We've collected the answers to that question in a booklet to help people and teams who want to use Ada or SPARK for increasing the security of their software. It is freely available here.

]]>
Taking on a Challenge in SPARK http://blog.adacore.com/taking-on-a-challenge-in-spark Tue, 08 May 2018 14:01:00 +0000 Johannes Kanig http://blog.adacore.com/taking-on-a-challenge-in-spark

Last week, the programmer Hillel posted a challenge (the link points to a partial postmortem of the provided solutions) on Twitter for someone to prove a correct implementation of three small programming problems: Leftpad, Unique, and Fulcrum.

This was a good opportunity to compare the SPARK language and its expressiveness and proof power to other systems and paradigms, so I took on the challenge. The good news is that I was able to prove all three solutions and that the SPARK proofs of each complete in no more than 10 seconds. I also believe the Fulcrum solution in particular shows some aspects of SPARK that are especially nice.

I will now explain my solutions to each problem, briefly for Leftpad and Unique and in detail for Fulcrum. At the end, I discuss my takeaways from this challenge.

Leftpad

Hillel mentioned that the inclusion of Leftpad into the challenge was kind of a joke. A retracted JavaScript package that implemented Leftpad famously broke thousands of projects back in 2016. Part of the irony was that Leftpad is so simple one shouldn’t depend on a package for this functionality.

The specification of Leftpad, according to Hillel, is as follows:

Takes a padding character, a string, and a total length, returns the string padded to that length with that character. If length is less than the length of the string, does nothing.

It is always helpful to start with translating the specification to a SPARK contract. To distinguish between the two cases (padding required or not), we use contract cases, and arrive at this specification (see the full code in this github repository):


   function Left_Pad (S : String; Pad_Char : Character; Len : Natural)
                      return String
   with Contract_Cases =>
     (Len > S'Length =>
            Left_Pad'Result'Length = Len and then
            (for all I in Left_Pad'Result'Range =>
                 Left_Pad'Result (I) =
                        (if I <= Len - S'Length then Pad_Char
                         else S (I - (Len - S'Length + 1) + S'First))),
      others => Left_Pad'Result = S);

In the case where padding is required, the spec also nicely shows how the result string is composed of both padding chars and chars from the input string.

The implementation in SPARK is of course very simple; we can even use an expression function to do it:

   function Left_Pad (S : String; Pad_Char : Character; Len : Natural)
                      return String
   is ((1 .. Len - S'Length => Pad_Char) & S);

Unique

The problem description, as defined by Hillel:

Takes a sequence of integers, returns the unique elements of that list. There is no requirement on the ordering of the returned values.

An explanation in this blog wouldn’t add anything to the commented code, so I suggest you check out the code for my solution directly.

Fulcrum

The Fulcrum problem was the heart of the challenge. Although the implementation is also just a few lines, quite a lot of specification work is required to get it all proved.

The problem description, as defined by Hillel:

Given a sequence of integers, returns the index i that minimizes |sum(seq[..i]) - sum(seq[i..])|. Does this in O(n) time and O(n) memory.

(Side note: It took me quite a while to notice it, but the above notation seq[..i] apparently means the slice up to and excluding the value at index i. I have taken it instead to mean the slice up to and including the value at index i. Consequently, I used seq[i+1..] for the second slice. This doesn't change the nature or difficulty of the problem.)

I’m pleased with the solution I arrived at, so I will present it in detail in this blog post. It has two features that I think none of the other solutions has:

  • It runs in O(1) space, compared to the O(n) permitted by the problem description and required by the other solutions;

  • It uses bounded integers and proves absence of overflow; all other solutions use unbounded integers to side-step this issue.

Again, our first step is to transform the problem statement into a subprogram specification (please refer to the full solution on github for all the details: there are many comments in the code). For this program, we use arrays to represent sequences of integers, so we first need an array type and a function that can sum all integers in an array. My first try looked like this:

type Seq is array (Integer range <>) of Integer;

function Sum (S : Seq) return Integer is
  (if S'Length = 0 then 0
   else S (S'First) + Sum (S (S'First + 1 .. S'Last)));

So we are just traversing the array via recursive calls, summing up the cells as we go. Assuming an array S of type Seq and an array index I, we could now write the required difference between the left and right sums as follows (assuming S is non-empty, which we assume throughout):

abs (Sum (S (S'First .. I)) - Sum (S (I + 1 .. S'Last)))

However, there are two problems with this code, and both are specific to how SPARK works. The first problem could be seen as a limitation of SPARK: SPARK allows recursive functions such as Sum, but can’t use them to prove code and specifications that refer to them. This would result in a lot of unproved checks in our code. But my colleague Claire showed me a really nice workaround, which I now present.

The idea is to not compute a single sum value, but instead an array of partial sums, where we later can select the partial sum we want. Because such an array can’t be computed in a single expression, we define a new function Sum_Acc as a regular function (not an expression function) with a postcondition:

   function Sum_Acc (S : Seq) return Seq 
   with Post =>
     (Sum_Acc'Result'Length = S'Length and then
      Sum_Acc'Result'First = S'First and then
      Sum_Acc'Result (S'First) = S (S'First) and then
      (for all I in S'First + 1 .. S'Last =>
            Sum_Acc'Result (I) = Sum_Acc'Result (I - 1) + S (I)));

The idea is that each cell of the result array contains the sum of the input array cells, up to and including the corresponding cell in the input array. For an input array (1,2,3), the function would compute the partial sums (1,3,6). The last value is always the sum of the entire array. In the postcondition, we express that the result array has the same length and bounds as the input array, that the first cell of the result array is always equal to the first cell of the input array, and that each following cell is the sum of the previous cell of the result array, plus the input cell at the same index.

The implementation of this Sum_Acc function is straightforward and can be found in the code on github. We also need a Sum_Acc_Rev function which computes the partial sum starting from the end of the array. It is almost the same as Sum_Acc, but the initial (in fact last) value of the array will be zero, owing to the asymmetric definition of the two sums in the initial problem description. You can also find its specification and implementation in the github repository.
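
For illustration, a straightforward implementation could look roughly like the following sketch. This is not the exact code from the repository, and the loop invariant shown is only indicative; actually proving it also has to deal with the overflow question discussed below:

   function Sum_Acc (S : Seq) return Seq is
      Result : Seq (S'Range) := (others => 0);
   begin
      --  As elsewhere in this post, S is assumed to be non-empty
      Result (S'First) := S (S'First);
      for I in S'First + 1 .. S'Last loop
         Result (I) := Result (I - 1) + S (I);
         pragma Loop_Invariant
           (Result (S'First) = S (S'First) and then
              (for all K in S'First + 1 .. I =>
                 Result (K) = Result (K - 1) + S (K)));
      end loop;
      return Result;
   end Sum_Acc;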

For our array S and index I, the expression to compute the difference between left and right sum now becomes:

abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I))

The second problem is that we are using Integer both as the type of the array cells and the sum result, but Integer is a bounded integer type (32 or 64 bits, depending on your platform), so the sum function will overflow! All other Fulcrum solutions referred to in Hillel’s summary post use unbounded integers, so they avoid this issue. But in many contexts, unbounded integers are unacceptable, because they are unpredictable in space and time requirements and require dynamic memory handling. This is why I decided to increase the difficulty of the Fulcrum problem a little and include proof of absence of overflow.

To solve the overflow problem, we need to bound the size of the values to be summed and how many of them we can sum. We also need to take into account negative values, so that we don’t exceed the minimum value either. Luckily, range types in SPARK make this simple:

   subtype Int is Integer range -1000 .. 1000;
   subtype Nat is Integer range 1 .. 1000;

   type Seq is array (Nat range <>) of Int;
   type Partial_Sums is array (Nat range <>) of Integer;

We will use Int as the type for the contents of arrays, and Nat for the array indices (effectively limiting the size of arrays to at most 1000). The sums will still be calculated in Integer, so we need to define a new array type to hold the partial sums (we need to change the return type of the partial sums functions to Partial_Sums to make this work). So if we were to sum the largest possible array with 1000 cells, containing only the highest (or lowest) value 1000 (or -1000), the absolute value of the sum would not exceed 1 million, which fits easily into 32 bits. Of course, we could also choose different, and much larger, bounds here.

We now can formulate the specification of Find_Fulcrum as follows:

  function Find_Fulcrum (S : Seq) return Nat
  with Pre => S'Length > 0,
       Post =>
         (Find_Fulcrum'Result in S'Range and then
         (for all I in S'Range =>
               abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I)) >=
               abs (Sum_Acc (S) (Find_Fulcrum'Result) - Sum_Acc_Rev (S) (Find_Fulcrum'Result))));

The implementation of Fulcrum

The Sum_Acc and Sum_Acc_Rev functions we already defined suggest a simple solution to Fulcrum that goes as follows:

  1. call Sum_Acc and Sum_Acc_Rev and store the result;

  2. compute the index where the difference between the two arrays is the smallest.

The problem with this solution is that step (1) takes O(n) space but we promised to deliver a constant space solution! So we need to do something else.

In fact, we notice that every program that calls Sum_Acc and Sum_Acc_Rev will already be O(n) in space, so we should never call these functions outside of specifications. The Ghost feature of SPARK lets the compiler check this for us. Types, objects and functions can be marked ghost, and such ghost entities can only be used in specifications (like the postcondition above, or loop invariants and other intermediate assertions), but not in code. This makes sure that we don’t call these functions accidentally, slowing down our code. Marking a function ghost can be done just by adding the Ghost aspect to its declaration.

The constant space implementation idea is that if, for some index J - 1, we have the partial sums Left_Sum = Sum_Acc (J - 1) and Right_Sum = Sum_Acc_Rev (J - 1), we can compute the values for the next index J by simply adding S (J) to Left_Sum and subtracting it from Right_Sum. This simple idea gives us the core of the implementation:

for I in S'First + 1 .. S'Last loop
    Left_Sum := Left_Sum + S (I);
    Right_Sum := Right_Sum - S (I);
    if abs (Left_Sum - Right_Sum) < Min then
       Min := abs (Left_Sum - Right_Sum);
       Index := I;
    end if;
end loop;
return Index;

To understand this code, we also need to know that Min holds the current minimal difference between the two sums and Index gives the array index where this minimal difference occurred.

In fact, the previous explanations of the code can be expressed quite nicely using this loop invariant (it also holds at the beginning of the loop):

pragma Loop_Invariant
      (Left_Sum = Sum_Acc (S) (I - 1) and then
       Right_Sum = Sum_Acc_Rev (S) (I - 1) and then
       Min = abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index)) and then
       (for all K in S'First .. I - 1 =>
               abs (Sum_Acc (S) (K) - Sum_Acc_Rev (S) (K)) >=
               abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index))));

The only part that’s missing is the initial setup, so that the above conditions hold when entering the loop. For the Right_Sum variable, this requires a traversal of the entire array to compute the initial right sum. For this we have written a helper function Sum which is O(n) in time and O(1) in space. So we end up with this initialization code for Find_Fulcrum:

Index     : Nat     := S'First;
Left_Sum  : Integer := S (S'First);
Right_Sum : Integer := Sum (S);
Min       : Integer := abs (Left_Sum - Right_Sum);

and it can be seen that these initial values establish the loop invariants for the first iteration of the loop (where I = S'First + 1, so I - 1 = S'First).
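
For reference, here is a sketch of what the Sum helper could look like. The actual code on GitHub may differ, and additional assertions about the bounds of the partial sums may be needed for the proofs to go through:

   function Sum (S : Seq) return Integer with
     Pre  => S'Length > 0,
     Post => Sum'Result = Sum_Acc (S) (S'Last);

   function Sum (S : Seq) return Integer is
      Result : Integer := 0;
   begin
      for I in S'Range loop
         Result := Result + S (I);
         --  Tie the running total back to the ghost partial sums
         pragma Loop_Invariant (Result = Sum_Acc (S) (I));
      end loop;
      return Result;
   end Sum;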

Some metrics

It took me roughly 15 minutes to come up with the Leftpad proof. I don’t have exact numbers for the two other problems, but I would guess roughly one hour for Unique and 2-3 hours for Fulcrum.

Fulcrum is about 110 lines of code, including specifications, but excluding comments and blank lines. An implementation without any contracts would be about 35 lines, so we have a threefold overhead of specification to code … though that’s typical for specifications of small algorithmic problems like Fulcrum.

All proofs are done automatically, and SPARK verifies each example in less than 10 seconds.

Some thoughts about the challenge

First of all, many thanks to Hillel for starting that challenge and to Adrian Rueegsegger for bringing it to my attention. It was fun to do, and I believe SPARK did reasonably well in this challenge.

Hillel’s motivation was to counter exaggerated praise of functional programming (FP) over imperative programming (IP). So he proved these three examples in imperative style and challenged the functional programming community to do the same. In this context, doing the problems in SPARK was probably beside the point, because it’s not a functional language.

Going beyond the FP vs IP question asked in the challenge, I think we can learn a lot by looking at the solutions and the discussion around the challenge. This blog post by Neelakantan Krishnaswami argues that the real issue is the combination of language features, in particular aliasing in combination with side effects. If you have both, verification is hard.

One approach is to accept this situation and deal with it. In Frama-C, a verification tool for C, it is common to write contracts that separate memory regions. Some tools are based on separation logic, which directly features separating conjunction in the specification language. Both result in quite complex specifications, in my opinion.

Or you can change the initial conditions and remove language features. Functional programming removes side effects to make aliasing harmless. This has all kinds of consequences, for example the need for a garbage collector and the inability to use imperative algorithms from the literature. SPARK keeps side effects but removes aliasing, essentially by excluding pointers and a few other language rules. SPARK makes up for the loss of pointers by providing built-in arrays and parameter modes (two very common usages of pointers in C), but shared data structures still remain impossible to write in pure SPARK: one needs to leave the SPARK subset and go to full Ada for that. Rust keeps pointers, but only non-aliasing ones via its borrow checker; however, formal verification for Rust is still in very early stages as far as I know. The SPARK team is also working on support for non-aliasing pointers.

Beyond language features, another important question is the applicability of a language to a domain. Formal verification of code matters most where a bug would have a big impact. Such code can be found in embedded devices, for example safety-critical code that makes sure the device is safe to use (think of airplanes, cars, or medical devices). Embedded programming, in particular safety-critical embedded programming, is special because different constraints apply. For example, execution times and memory usage of the program must be very predictable, which excludes languages that have a managed approach to memory (memory usage becomes less predictable) with a GC (often, the GC can kick in at any time, which makes execution times unpredictable). In these domains, functional languages can’t really be applied directly (but see the use of Haskell in sel4). Unbounded integers can’t be used either because of the same issue - memory usage can grow if some computation yields a very large result, and execution time can vary as well. These issues were the main motivation for me to provide an O(1) space solution for the Fulcrum problem that uses bounded integers, and to prove absence of overflow.

Programming languages aren’t everything either. Another important issue is tooling, in particular proof automation. Looking at the functional solutions of Fulcrum (linked from Hillel’s blog post), they contain a lot of manual proofs. The Agda solution is very small despite this fact, though it uses a simple quadratic algorithm; I would love to see a variant that’s linear.

I believe that for formal verification to be accepted in industrial projects, most proofs must be able to be completed automatically, though some manual effort is acceptable for a small percentage of the proof obligations. The Dafny and SPARK solutions are the only ones (as far as I could see) that fare well in this regard. Dafny is well-known for its excellent proof automation via Boogie and Z3. SPARK also does very well here, all proofs being fully automatic.

]]>
PolyORB now lives on Github http://blog.adacore.com/polyorb-now-lives-on-github Wed, 18 Apr 2018 15:52:00 +0000 Thomas Quinot http://blog.adacore.com/polyorb-now-lives-on-github

PolyORB, AdaCore's versatile distribution middleware, now lives on Github. Its new home is https://github.com/AdaCore/polyorb

PolyORB is a development toolsuite and a library of runtime components that implements several distribution models, including CORBA and the Ada 95 Distributed Systems Annex. Originally developed as part of academic research at Telecom ParisTech, it became part of the GNAT Pro family in 2003.

Since then, it has been used in a number of industrial applications in a wide variety of domains such as:
 * air traffic flow management
 * enterprise document management
 * scientific data processing in particle physics experiments

AdaCore has always been committed to involving the user community in the development of PolyORB. Over the past 15 years, many contributions from industrial as well as hobbyist users have been integrated, and community releases were previously made available in conjunction with GNAT GPL.

Today we are pleased to further this community engagement and renew our commitment to an open development process by making the PolyORB repository (including full history) available on Github. This will allow users of GNAT GPL to benefit from the latest developments and contribute fixes and improvements.

We look forward to seeing your issues and pull requests on this repository!

]]>
SPARKZumo Part 2: Integrating the Arduino Build Environment Into GPS http://blog.adacore.com/sparkzumo-part-2-integrating-the-arduino-build-environment-into-gps Wed, 04 Apr 2018 04:00:00 +0000 Rob Tice http://blog.adacore.com/sparkzumo-part-2-integrating-the-arduino-build-environment-into-gps

This is part #2 of the SPARKZumo project, where we go through how to actually integrate a CCG application with other source code and how to create GPS plugins to customize features like automating builds and flashing hardware. To read more about the software design of the project, visit the other blog post here.

The Build Process

At the beginning of our build process we have a few different types of source files that we need to bring together into one binary: Ada/SPARK, C++, C, and an Arduino sketch. During a typical Arduino build, the build system converts the Arduino sketch into valid C++ code, brings in any libraries (user and system) that are included in the sketch, synthesizes a main, compiles and links all of that together with the Arduino runtime and the selected BSP, and generates the resulting executable binary. The only step we are adding to this process is running CCG on our SPARK code to generate a C library that we can pass to the Arduino build as a valid Arduino library. The Arduino sketch then pulls the resulting library into the build via an include.

Build Steps

From the user’s perspective, the steps necessary to build this application are as follows:

  1. Run CCG on the SPARK/Ada Code to produce C files and Ada Library Information files, or ali files. For more information on these files, see the GNAT Compilation Model documentation.
  2. Copy the resulting C files into a directory structure valid for an Arduino library


    1. We will use the lib directory in the main repo to house the generated Arduino library.
  3. Run c-gnatls on the ali files to determine which runtime files our application depends on.
  4. Copy those runtime files into the Arduino library structure. 
  5. Make sure our Arduino sketch has included the header files generated by the CCG tool.
  6. Run the arduino-builder tool with the appropriate options to tell the tool where our library lives and which board we are compiling for.


    1. The arduino-builder tool will use the .build directory in the repo to stage the build
  7. Then we can flash the result of the compilation to our target board.

That seems like a lot of work to do every time we need to make a change to our software! 

Since these steps are the same every time, we can automate them. We want the automation to be as host agnostic as possible, meaning it should work on both Windows and Linux, so we should use a scripting language that runs on both. It would also be nice if we could integrate this workflow into GPS so that we can develop our code, prove our code, and build and flash our code without leaving our IDE. It is an Integrated Development Environment, after all.

Configuration Files

The arduino-builder program is the command line version of the Arduino IDE. When you build an application with the Arduino IDE it creates a build.options.json file with the options you select from the IDE. These options include the location of any user libraries, the hardware to build for, where the toolchain lives, and where the sketch lives. We can pass the same options to the arduino-builder program or we can pass it the location of a build.options.json file. 

For this application I put a build.options.json file in the conf directory of the repository. This file needs to be configured for your build system. The best way I have found to do this is to install the Arduino IDE, build one of the example applications, then find the build.options.json file generated by the IDE and copy it into the conf directory of the repository. You then only need to modify:

  1. The “otherLibrariesFolders” field to point to the absolute path of the lib folder in the repo.
  2. The “sketchLocation” field to point at the SPARKZumo.ino file in the repo.
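
If you prefer to script this last tweak instead of editing the file by hand, a minimal Python sketch could look like this. The checkout path is a placeholder, and only the two key names listed above are taken from the real file:

import json

# Hypothetical checkout path; adjust to where you cloned the repo.
repo = "/home/me/SPARKZumo"
conf = repo + "/conf/build.options.json"

with open(conf) as f:
    options = json.load(f)

# Point the two fields described above at this checkout.
options["otherLibrariesFolders"] = repo + "/lib"
options["sketchLocation"] = repo + "/SPARKZumo.ino"

with open(conf, "w") as f:
    json.dump(options, f, indent=2)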

The other conf files in the conf directory are there to configure the flash utilities. When flashing the AVR on the Arduino Uno, the avrdude flash utility is used. The flash command is configured using the information in the flash.yaml file together with the path of the avrdude.conf file, which describes the target hardware to the flashing utility. The HiFive board uses openocd as its flashing utility. The openocd.cfg file has all the necessary configuration information that is passed to the openocd tool for flashing.

The GPS Plugin

[DISCLAIMER: This guide assumes you are using version 18.1 or newer of GPS]

Under the hood, GPS, or the GNAT Programming Studio, is a combination of Ada code, graphical frameworks, and Python scripting utilities. Using the Python plugin interface, it is very easy to add functionality to our GPS environment. For this application we will add some buttons and menu items to automate the process mentioned above. We will only be using a small subset of the power of the Python interface. For a complete guide to what is possible you can visit the Customizing and Extending GPS and Scripting API Reference for GPS sections of the GPS User’s Guide.

Plugin Installation Locations

Depending on your use case you can add Python plugins in a few locations to bring them into your GPS environment. There are already a handful of plugins that come with the GPS installation. You can find the list of these plugins by going to Edit->Preferences and navigating to the Plugin tab (near the bottom of the preferences window on the left sidebar). Because these plugins are included with the installation, they live under the installation directory in <installation directory>/share/gps/plug-ins. If you would like to modify your installation, you can add your plugins here and reload GPS. They will then show up in the plugin list. However, if you reinstall GPS, it will overwrite your plugin!

There is a better place to put your plugins such that they won’t disappear when you update your GPS installation. GPS adds a folder to your Home directory which includes all your user defined settings for GPS, such as your color theme, font settings, pretty printer settings, etc. This folder, by default, lives in <user’s home directory>/.gps. If you navigate to this folder you will see a plug-ins folder where you can add your custom plugins. When you update your GPS installation, this folder persists.

Depending on your application, there may be an even better place to put your plugin. For this specific application we really only want this added functionality when we have the SPARKzumo project loaded. So ideally, we want the plugin to live in the same folder as the project, and to load only when we load the project. To get this functionality, we can name our plugin <project file name>.ide.py and put it in the same directory as our project. When GPS loads the project, it will also load the plugin. For example, our project file is named zumo.gpr, so our plugin should be called zumo.ide.py. The source for the zumo.ide.py file is located here.

The Plugin Skeleton

When GPS loads our plugin it will call the initialize_project_plugin function. We should implement something like this to create our first button:

import GPS
import gps_utils
class ArduinoWorkflow:
   def __somefunction(self):
       # do stuff here
       pass
   def __init__(self):
       gps_utils.make_interactive(
               callback=self.__somefunction,
               category="Build",
               name="Example",
               toolbar='main',
               menu='/Build/Arduino/' + "Example",
               description="Example")
def initialize_project_plugin():
   ArduinoWorkflow()

This simple class will create a button and a menu item with the text Example. When we click this button or menu item, it will call back to our __somefunction method. Our actual plugin creates a few buttons and menu items that look like this:

Buttons in GPS created by user plug-in
Menus in GPS created by user plug-in

Task Workflows

Now that we have the ability to run some scripts by clicking buttons, we are all set! But there’s a problem: when we execute a script from a button and the script takes some time to perform its actions, GPS hangs waiting for the script to complete. We really should be executing our script asynchronously so that we can still use GPS while we are waiting for the tasks to complete. Python has a nice feature called coroutines which allows us to run some tasks asynchronously. We can be super fancy and implement these coroutines using generators!

Or…

ProcessWrapper

GPS has already done this for us with the task_workflow interface. The task_workflow call wraps our function in a generator and will asynchronously execute parts of our script. We can modify our somefunction function now to look like this:

def __somefunction(self, task):
    task.set_progress(0, 1)
    try:
        proc = promises.ProcessWrapper(["script", "arg1", "arg2"], spawn_console="")
    except:
        self.__error_exit("Could not launch script.")
        return
    ret, output = yield proc.wait_until_terminate()
    if ret != 0:
        self.__error_exit("Script returned an error.")
        return
    task.set_progress(1, 1)

In this function we are going to execute a script called script and pass 2 arguments to it. We wrap the call to the script in a ProcessWrapper, which returns a promise, and then yield on the result. The process runs asynchronously and control transfers back to GPS. When the script is complete, the yield returns the exit code and output of the process. We can even feed some information back to the user about the progress of the background processes using the task.set_progress call. This registers the task in the task window in GPS. If we have many tasks to run, we can update the task window after each one to tell the user how far along we are.
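
For example, a workflow that runs several external commands in sequence could advance the progress bar after each one. This is just a sketch reusing the same ProcessWrapper pattern shown above; the command names are placeholders, not real tools:

def __multi_step(self, task):
    cmds = [["step_one"], ["step_two"], ["step_three"]]  # placeholder commands
    total = len(cmds)
    for i, cmd in enumerate(cmds):
        try:
            proc = promises.ProcessWrapper(cmd, spawn_console="")
        except:
            self.__error_exit("Could not launch %s." % cmd[0])
            return
        ret, output = yield proc.wait_until_terminate()
        if ret != 0:
            self.__error_exit("%s returned an error." % cmd[0])
            return
        # update the Tasks view after each completed step
        task.set_progress(i + 1, total)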

TargetWrapper

The ProcessWrapper interface is nice if we need to run an external script, but what if we want to trigger the build or one of the GNAT tools?

Triggering CCG

Just for that, there’s another interface: TargetWrapper. To trigger the build tools, we can run something like this:

builder = promises.TargetWrapper("Build All")
retval = yield builder.wait_on_execute()
if retval != 0:
    self.__error_exit("Failed to build all.")
    return

With this code, we are triggering the same action as the Build All button or menu item. 

Triggering GNATdoc

We can also trigger the other tools within the GNAT suite using the same technique. For example, we can run the GNATdoc tool against our project to generate the project documentation:

gnatdoc = promises.TargetWrapper("gnatdoc")
retval = yield gnatdoc.wait_on_execute(extra_args=["-P", GPS.Project.root().file().path, "-l"])
if retval != 0:
    self.__error_exit("Failed to generate project documentation.")
    return

Here we are calling gnatdoc with the arguments listed in extra_args. This command will generate the project documentation and put it in the directory specified by the Documentation_Dir attribute of the Documentation package in the project file. In this case, I am putting the docs in the docs folder of the repo so that my GitHub repo can serve them via a GitHub Pages website.

Accessing Project Configuration

The file that drives the GNAT tools is the GNAT Project file, or the gpr file. This file has all the information necessary for GPS and CCG to process the source files and build the application. We can access all of this information from the plugin as well, to find out where the source files are, where the object files are, and which build configuration we are using. For example, to access the list of source files for the project we can use the following Python command: GPS.Project.root().sources().

Another important piece of information that we would like to get from the project file is the current value assigned to the “board” scenario variable. This tells us whether we are building for the Arduino target or the HiFive target, and it changes both the build configuration that we pass to arduino-builder and which flash utility we call. We can access this information by using the following command: GPS.Project.root().scenario_variables(). This returns a dictionary of all scenario variables used in the project, so we can access the “board” variable using the typical Python dictionary syntax: GPS.Project.root().scenario_variables()['board'].
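
Putting these together, a small sketch of how the plugin might branch on the “board” value could look like this. The tool names match the conf files described earlier; the error reporting is only illustrative:

board = GPS.Project.root().scenario_variables()['board']

if board == "uno":
    flash_tool = "avrdude"    # AVR target, configured by flash.yaml and avrdude.conf
elif board == "hifive":
    flash_tool = "openocd"    # RISC-V target, configured by openocd.cfg
else:
    GPS.Console("Messages").write("Unknown board: %s\n" % board)
    flash_tool = None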

Determining Runtime Dependencies

Because we are using the Arduino build system to build the output of our CCG tool, we will need to include the runtime dependency files used by our CCG application in the Arduino library directory. To detect which runtime files we are using, we can run the c-gnatls command against the ali files generated by the CCG tool. This outputs a set of information that we can parse. The output of c-gnatls on one file looks something like this:

$ c-gnatls -d -a -s obj/geo_filter.ali 
geo_filter.ads
geo_filter.adb
<CCG install directory>/libexec/gnat_ccg/lib/gcc/x86_64-pc-linux-gnu/7.3.1/adainclude/interfac.ads
<CCG install directory>/libexec/gnat_ccg/lib/gcc/x86_64-pc-linux-gnu/7.3.1/adainclude/i-c.ads
line_finder_types.ads
<CCG install directory>/libexec/gnat_ccg/lib/gcc/x86_64-pc-linux-gnu/7.3.1/adainclude/system.ads
types.ads

When we parse this output we have to make sure we run c-gnatls against all of the ali files generated by CCG, strip out any listed files that are already part of our sources, and remove any duplicate dependencies. The c-gnatls tool also lists the Ada versions of the runtime files rather than the C versions, so we need to determine the C equivalents and then copy those into our Arduino library folder. The __get_runtime_deps function is responsible for all of this work.
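
As an illustration of the idea behind __get_runtime_deps (not its actual implementation), a simplified Python sketch might collect the unique dependencies like this, leaving out the mapping from Ada runtime files to their C equivalents:

import os
import subprocess

def runtime_deps(ali_files, project_sources):
    deps = set()
    for ali in ali_files:
        # same c-gnatls invocation as shown above
        out = subprocess.check_output(["c-gnatls", "-d", "-a", "-s", ali])
        for line in out.decode().splitlines():
            path = line.strip()
            # keep only runtime files that are not already part of our sources
            if path and os.path.basename(path) not in project_sources:
                deps.add(path)
    return deps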

Generating Lookup Tables

If you had a chance to look at the first blog post in this series, I talked a bit about code in this application that was used to do some filtering of discrete states using a graph filter. This involved mapping some states onto some physical geometry and sectioning off areas that belonged to different states. The outcome of this was to map each point in a 2D graph to some state using a lookup table.

To generate this lookup table I used a Python library called shapely to compute the necessary geometry and map points to states. Originally, I had this as a separate utility sitting in the utils folder in the repo and would copy the output of this program into the geo_filter.ads file by hand. Eventually, I was able to bring this utility into the plugin workflow using a few interesting features of GPS.
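
To give an idea of how shapely can map a point to a state region, here is a minimal sketch with made-up polygons; the real geometry is the one computed by the scripts in the utils folder:

from shapely.geometry import Point, Polygon

# Made-up regions for two of the states, for illustration only.
STATE_REGIONS = {
    "Online":     Polygon([(0, 0), (30, 0), (30, 30), (0, 30)]),
    "BranchLeft": Polygon([(-30, 0), (0, 0), (0, 30), (-30, 30)]),
}

def state_for(x, y):
    # Return the state whose region contains the averaged point.
    point = Point(x, y)
    for state, region in STATE_REGIONS.items():
        if region.contains(point):
            return state
    return "Unknown"

print(state_for(-9, 9))  # falls in the BranchLeft region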

GPS includes pip

Even though GPS has a Python environment embedded in it, you can still bring in outside packages using the pip interface. The syntax for installing an external dependency looks something like:

import pip
ret = pip.main(["install"] + dependency)

Where dependency is a list containing the package(s) you are looking to install. In the case of this plugin, I only need the shapely library, so I install it when the GPS plugin is initialized.

Accessing Ada Entities via Libadalang

The Libadalang library is now included with GPS and can be used inside your plugin. Using the Libadalang interface I was able to access the values of user-defined named numbers in the Ada files. These were then passed to the shapely application to compute the necessary geometry.

import libadalang as lal

# file_to_edit is the path of the Ada source file we want to inspect
ctx = lal.AnalysisContext()
unit = ctx.get_from_file(file_to_edit)
myVarNode = unit.root.findall(lambda n: n.is_a(lal.NumberDecl) and n.f_ids.text == 'my_var')
value = int(myVarNode[0].f_expr.text)

This snippet creates a new Libadalang analysis context, loads the information from a file and searches for a named number declaration called ‘my_var’. The value assigned to ‘my_var’ is then stored in our variable value.

I was then able to access the location where I wanted to put the output of the shapely application using Libadalang:

array_node = unit.root.findall(lambda n: n.is_a(lal.ObjectDecl) and n.f_ids.text=='my_array')
agg_start_line = int(array_node[0].f_default_expr.sloc_range.start.line)
agg_start_col = int(array_node[0].f_default_expr.sloc_range.start.column)
agg_end_line = int(array_node[0].f_default_expr.sloc_range.end.line)
agg_end_col = int(array_node[0].f_default_expr.sloc_range.end.column)

This gave me the line and column numbers of the start and end of the array aggregate initializer for the lookup table ‘my_array’.

Editing Files in GPS from the Plugin

Now that we have the computed lookup table, we could use the typical Python file open mechanism to edit the file at the location obtained from Libadalang. But since we are already in GPS, we can just use the GPS.EditorBuffer interface to edit the file. Using the information from our shapely application and the line and column information obtained from Libadalang we can do this:

buf = GPS.EditorBuffer.get(GPS.File(file_to_edit))
agg_start_cursor = buf.at(agg_start_line, agg_start_col)
agg_end_cursor = buf.at(agg_end_line, agg_end_col)
buf.delete(agg_start_cursor, agg_end_cursor)
array_str = "(%s));" % ("(%s" % ("),\n(".join([', '.join([item for item in row]) for row in array])))
buf.insert(agg_start_cursor, array_str[agg_start_col - 1:])

First we open a buffer for the file that we want to edit. Then we create editor locations for the beginning and end of the current array aggregate, using the positions we obtained from Libadalang, and remove the old information between them. Finally, we turn the array we received from our shapely application into a string and insert it into the buffer.

We have just successfully generated some Ada code from our GPS plugin!

Writing Your Own Python Plugin

Most likely, there is already a plugin in the GPS distribution that does something similar to what you want to do. For this plugin, I used as a starting point the source for the plugin that enables flashing and debugging of bare-metal STM32 ARM boards. This file can be found in your GPS installation at <install directory>/share/gps/support/ui/board_support.py. You can also see this file on the GPS GitHub repository here.

In most cases, it makes sense to search through the plugins that already exist to get a starting point for your specific application; you can then fill in the blanks from there. You can view the entire source of GPS on AdaCore’s GitHub repository.

That wraps up the overview of the build system for this application. The source for the project can be found here. Feel free to fork this project and create new and interesting things.

Happy Hacking!

]]>
A Modern Syntax for Ada http://blog.adacore.com/a-modern-syntax-for-ada Sun, 01 Apr 2018 14:00:00 +0000 Fabien Chouteau http://blog.adacore.com/a-modern-syntax-for-ada

One of the most criticized aspects of the Ada language throughout the years has been its outdated syntax. Fortunately, AdaCore has decided to tackle this issue by implementing a new, modern syntax for Ada.

The major change is the use of curly braces instead of begin/end. Also the following keywords have been shortened:

  • return becomes ret
  • function becomes fn
  • is becomes :
  • with becomes include

For instance, the following function:

with Ada.Numerics;

function Fools (X : Float) return Float is
begin
   return X * Ada.Numerics.Pi;
end;

is now written:

include Ada.Numerics;

fn Fools (X : Float) ret Float :
{
   ret X * Ada.Numerics.Pi;
};

This modern syntax is a major milestone in the adoption of Ada. John Dorab recently discovered the qualities of Ada:

I have an eye condition that prevents me from reading code without curly braces. Thanks to this new syntax, I can now benefit from the advanced type system, programming by contract, portability, functional safety, [insert more cool Ada features here] of Ada. Also, it looks like most other programming languages, so it must be better.

This new syntax is also a boost in productivity for Ada developers. Mr Fisher testifies:

I write around 10 lines of code per day. With this new syntax I save up to 30 keystrokes. That’s a huge increase to my productivity! The code is less readable for debugging, code reviews and maintenance in general, but I write a little bit more of it.

The standardization effort related to this new syntax is expected to start in the coming year and to last a few years. In order to allow early adopters to get their hands on Ada without having to wait for the next standard, we have created a font that allows you to display Ada code with the new syntax:

The font contains other useful ligatures, like displaying the Ada assignment operator “:=” as “=”, and the Ada equality operator “=” as “==”. It was created based on Courier New, using Glyphr Studio and cloudconvert. It’s work in progress, so feel free to extend it! It is attached below.

In the future, we also plan to go beyond a pure syntactic layer with Libadalang. For example, we could emulate the complex type promotions/conversions rules of C by inserting Unchecked_Conversion calls each time the programmer tries to convert a value to an incompatible type. If you have other ideas, please let us know in the comments below!


]]>
Getting Rid of Rust with Ada http://blog.adacore.com/getting-rid-of-rust-with-ada Sun, 01 Apr 2018 08:00:00 +0000 Fabien Chouteau http://blog.adacore.com/getting-rid-of-rust-with-ada

There are a lot of DIY CNC projects out there (router, laser, 3D printer, egg drawing, etc.), but I never saw a DIY CNC sandblaster. So I decided to make my own.

Hardware

The CNC frame is one of those cheap kits that you can get on eBay, for instance. Mine was around 200 euros, and it is actually good value for the price. I built the kit and then replaced the electronic controller with an STM32F469 discovery board and an Arduino CNC shield.

For the sandblaster itself, my father and I hacked this simple solution made from a soda bottle and pipes/fittings that you can find in any hardware store.

The sand falls from the tank into a small tube mostly thanks to gravity. The sand tank still needs to be pressurised to avoid air coming up from the nozzle.

The sandblaster was then mounted to the CNC frame where the engraving spindle is supposed to be, and the sand tank is somewhat fixed above the machine. As you can briefly see in the video, I’m controlling the airflow manually as I didn’t have a solenoid valve to make the machine fully autonomous.

Software

On the software side I re-used my Ada Gcode controller from a previous project. I still wanted to add something to it, so this time I used a board with a touch screen to create a simple control interface.

Conclusion

This machine is actually not very practical. The 1.5 litre soda bottle holds barely enough sand to write 3 letters and the dust going everywhere will jam the machine after a few minutes of use. But this was a fun project nonetheless!

PS: Thank you dad for letting me use your workshop once again ;)

]]>
SPARKZumo Part 1: Ada and SPARK on Any Platform http://blog.adacore.com/sparkzumo-part-1-ada-and-spark-on-any-platform Wed, 28 Mar 2018 04:00:00 +0000 Rob Tice http://blog.adacore.com/sparkzumo-part-1-ada-and-spark-on-any-platform
Pololu robot with Arduino Uno Rev 3 mounted and SiFive HiFive1

So you want to use SPARK for your next microcontroller project? Great choice! All you need is an Ada 2012 ready compiler and the SPARK tools. But what happens when an Ada 2012 compiler isn’t available for your architecture?

This was the case when I started working on a mini sumo robot based on the Pololu Zumo v1.2.

The chassis is complete with independent left and right motors with silicone tracks, and a suite of sensors including an array of infrared reflectance sensors, a buzzer, a 3-axis accelerometer, magnetometer, and gyroscope. The robot’s control interface uses a pin-out and footprint compatible with Arduino Uno-like microcontrollers. This is super convenient, because I can use any Arduino Uno compatible board, plug it into the robot, and be ready to go. But the Arduino Uno is an AVR, and there isn’t a readily available Ada 2012 compiler for AVR… back to the drawing board…

Or…

What if we could still write SPARK code and compile it into C code, then use the Arduino compiler to compile and link that code with the Arduino BSPs and runtimes? This would be ideal because I wouldn’t need to worry about writing a BSP for the board I am using, and I would only have to focus on the application layer. And I can use SPARK! Luckily, AdaCore has a solution for exactly this!

CCG to the rescue!

The Common Code Generator, or CCG, was developed to solve the issue where an Ada compiler is not available for a specific architecture, but a C compiler is readily available. This is the case for architectures like AVR, PIC, Renesas, and specialized DSPs from companies like TI and Analog Devices. CCG can take your Ada or SPARK code, and “compile” it to a format that the manufacturer’s supplied C compiler can understand. With this technology, we now have all of the benefits of Ada or SPARK on any architecture.

Note that this is not fundamentally different from what’s already happening in a compiler today. Compilation is essentially a series of translations from one language to another, each one being used for a specific optimization or analysis phase. In the case of GNAT, for example, the process is as follows:

  1. The Ada code is first translated into a simplified version of Ada (called the expanded tree). 

  2. Then into the gcc tree format which is common to all gcc-supported languages.

  3. Then into a format ideal for computing optimizations called gimple. 

  4. Then into a generic assembly language called RTL. 

  5. And finally to the actual target assembler.

With CCG, C becomes one of these intermediate languages, with GNAT taking care of the initial compilation steps and a target compiler taking care of the final ones. One important consequence of this is that the C code is not intended to be maintainable or modified. CCG is not a translator from Ada or SPARK to C, it’s a compiler, or maybe half a compiler.

Ada Compilation Steps

There are some limitations to this, though, that are important to know; they are today mostly due to the fact that the technology is very young and targets a subset of Ada. Looking at the limitations more closely, they resemble the limitations imposed by the SPARK language subset on a zero-footprint runtime. I would generally use the zero-footprint runtime in an environment where the BSP and runtime are supplied by a vendor or an RTOS, so this looks like a perfect time to use CCG to develop SPARK code for an Arduino supported board using the Arduino BSP and runtime support. For a complete list of supported and unsupported constructs you can visit the CCG User’s Guide.

Another benefit I get out of this setup is that I am using the Arduino framework as a hardware abstraction layer. Because I am generating C code and pulling in Arduino library calls, theoretically, I can build my application for many processors without changing my application code. As long as the board is supported by Arduino and is pin compatible with my hardware, my application will run on it!

Abstracting the Hardware

Left to Right: SiFive HiFive1 RISC V board, Arduino Uno Rev 3

For this application I looked at targeting two different architectures: the Arduino Uno Rev 3, which has an ATmega328p on board, and a SiFive HiFive1, which has a Freedom E310 on board. These were chosen because they are pin compatible but massively different from the software perspective. The ATmega328p is an 8-bit AVR whose int type is 16 bits, while the Freedom E310 is a 32-bit RISC-V. The system word size isn’t even the same! The source code for the project is located here.

In order to abstract the hardware differences away, two steps had to be taken:

  1. I used a target configuration file to tell the CCG tool how to represent data sizes during the code generation. By default, CCG assumes word sizes based on the default for the host OS. To compile for the AVR target, I used the target.atp file located in the base directory to inform the tool about the layout of the hardware. The configuration file looks like this:

     Bits_BE                       0
     Bits_Per_Unit                 8
     Bits_Per_Word                16
     Bytes_BE                      0
     Char_Size                     8
     Double_Float_Alignment        0
     Double_Scalar_Alignment       0
     Double_Size                  32
     Float_Size                   32
     Float_Words_BE                0
     Int_Size                     16
     Long_Double_Size             32
     Long_Long_Size               64
     Long_Size                    32
     Maximum_Alignment            16
     Max_Unaligned_Field          64
     Pointer_Size                 32
     Short_Enums                   0
     Short_Size                   16
     Strict_Alignment              0
     System_Allocator_Alignment   16
     Wchar_T_Size                 16
     Words_BE                      0
     float         15  I  32  32
     double        15  I  32  32

  2. The bsp folder contains all of the differences between the two boards that were necessary to separate out. This is also where the Arduino runtime calls were pulled into the Ada code. For example, in bsp/wire.ads you can see many pragma Import calls used to bring in the Arduino I2C calls located in wire.h.

In order to tell the project which version of these files to use during the compilation, I created a scenario variable in the main project, zumo.gpr:

type Board_Type is ("uno", "hifive");
Board : Board_Type := external ("board", "hifive");

Common_Sources := ("src/**", "bsp/");
Target_Sources := "";
case Board is
   when "uno" =>
      Target_Sources := "bsp/atmega328p";
   when "hifive" =>
      Target_Sources := "bsp/freedom_e310-G000";
end case;

for Source_Dirs use Common_Sources & Target_Sources;

Software Design

Interaction with Arduino Sketch

A typical Arduino application exposes two functions to the developer through the sketch file: setup and loop. The developer fills in the setup function with all of the code that should be run once at start-up, and populates the loop function with the actual application programming. During the Arduino compilation, these two functions get pre-processed and wrapped into a main generated by the Arduino runtime. More information about the Arduino build process can be found here.

Because we are using the Arduino runtime we cannot have the actual main entry point for the application in the Ada code (the Arduino pre-processor generates this for us). Instead, we have an Arduino sketch file called SPARKZumo.ino which has the typical Arduino setup() and loop() functions. From setup() we need to initialize the Ada environment by calling the function generated by the Ada binder, sparkzumoinit(). Then, we can call whatever setup sequence we want.

CCG maps Ada package and subprogram namespacing into C-like namespacing, so package.subprogram in Ada would become package__subprogram() in C. The setup function we are calling in the sketch is sparkzumo.setup in Ada, which becomes sparkzumo__setup() after CCG generates the files. The loop function we are calling in the sketch is sparkzumo.workloop in Ada, which becomes sparkzumo__workloop().

Handling Exceptions

Even though we are generating C code from Ada, the CCG tool can still expand the Ada code to include many of the compiler generated checks associated with Ada code before generating the C code. This is very cool because we still have much of the power of the Ada language even though we are compiling to C.

If any of these checks fail at runtime, the __gnat_last_chance_handler is called. The CCG system supplies a definition for what this function should look like, but leaves the implementation up to the developer. For this application, I put the handler implementation in the sketch file, but call back into the Ada code from the sketch to perform more actions (like blinking LEDs and shutting down the motors). If there is a range check failure, or a buffer overflow, or something similar, my __gnat_last_chance_handler will dump some information to the serial port, then call back into the Ada code to shut down the motors and flash an LED in an infinite loop. We should never need this mechanism because, since we are using SPARK in this application, we should be able to prove that none of these failures can ever occur!

Standard.h file

The minimal runtime that comes with the CCG tool can be found in the installation directory under the adalib folder. Here you will find the C versions of the Ada runtime files that you would typically find in the adainclude directory.

The important file to know about here is the standard.h file. This is the main C header file that allows you to map Ada constructs to C. For instance, this header file defines the fatptr construct used to implement Ada arrays and strings, as well as other fundamental types like Natural, Positive, and Boolean.

You can and should modify this file to fit within your build environment. For my application, I have included the Arduino.h at the top to bring in the Arduino type system and constructs. Because the Arduino framework defines things like Booleans, I have commented out the versions defined in the standard.h file so that I am consistent with the rest of the Arduino runtime. You can find the edited version of the standard.h file for this project in the src directory.

Drivers

For the application to interact with all of the sensors available on the robot, we need a layer between the runtime and BSP, and the algorithms. The src/drivers directory contains all of the code necessary to communicate with the sensors and motors. Most of the initial source code for this section was a direct port from the zumo-shield library that was originally written in C++. After porting to Ada, the code was modified to be more robust by refactoring and adding SPARK contracts.

Algorithms

Even though this is a sumo robot, I decided to start with a line follower algorithm for the proof of concept. The source code for the line follower algorithm can be found in src/algos/line_finder. The algorithm was originally a direct port of the Line Follow example in the zumo-shield examples repo.

The C++ version of this algorithm worked ok but wasn’t really able to handle occasions where the line was lost, or the robot came to a fork, or an intersection. After refactoring and adding SPARK features, I added a detection lookup so that the robot could determine what type of environment the sensors were looking at. The choices are: Lost (meaning no line is found), Online (meaning there’s a single line), Fork (two lines diverge), BranchLeft (left turn), BranchRight (right turn), Perpendicular intersection (make a decision to go left or right), or Unknown (no clue what to do, let’s keep doing what we were doing and see what happens next). After detecting a change in state, the robot would make a decision like turn left, or turn right to follow a new line. If the robot was in a Lost state, it would go into a “re-finding” algorithm where it would start to do progressively larger circles.

This algorithm worked ok as well, but was a little strange. Occasionally, the robot would decide to change direction in the middle of a line, or start to take a branch and turn back the other way. The reason for this was that the robot was detecting spurious changes in state and reacting to them instantaneously. We can call this state noise. In order to minimize this state noise, I added a state low-pass filter using a geometric graph filter. 

The Geometric Graph Filter

Example plot of geometric graph filter

If you ask a mathematician they will probably tell you there’s a better way to filter discrete states than this, but this method worked for me! Let's picture mapping 6 points corresponding to the 6 detection states onto a 2D graph, spacing them out evenly along the perimeter of a square. Now, let’s say we have a moving window average with X positions. Each time we get a state reading from the sensors, we look up the corresponding coordinate for that state in the graph and add the coordinate to the window. For instance, if we detect an Online state our corresponding coordinate is (15, 15). If we detect a Perpendicular state our coordinate is (-15, 0). And so on. If we average over the window we end up with a coordinate somewhere inside the square. If we then section off the square into areas, and assign each area to the corresponding state, the average will sit in one of the areas, which maps back to one of our states.

For an example, let’s assume our window is 5 states wide and we have detected the following list of states (BranchLeft, BranchLeft, Online, BranchLeft, Lost). If we map these to coordinates we get the following window: ((-15, 15), (-15, 15), (15, 15), (-15, 15), (-15, -15)). When we average these coordinates in the window we get a point with the coordinates (-9, 9). If we look at our lookup table we can see that this coordinate is in the BranchLeft polygon.
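
As a quick sanity check of the averaging step, the same example can be reproduced in a few lines of Python:

# Window of 5 state coordinates from the example above.
window = [(-15, 15), (-15, 15), (15, 15), (-15, 15), (-15, -15)]
avg_x = sum(x for x, _ in window) / len(window)
avg_y = sum(y for _, y in window) / len(window)
print(avg_x, avg_y)  # -9, 9: inside the BranchLeft region of the lookup table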

One issue that comes up here is that when the average point moves closer to the center of the graph, there’s high state entropy, meaning our state can change more rapidly and noise has a higher effect. To solve this, we can hold on to the previous calculated state, and if the new calculated state is somewhere in the center of the graph, we throw away the new calculation and pass along the previous calculation. We don’t purge the average window though so that if we get enough of one state, the average point can eventually migrate out to that section of the graph.  

To avoid having to calculate this geometry every time we get a new state, I generated a lookup table which maps every point in the polygon to a state. All we have to do at runtime is calculate the average over the window and do the lookup. There are some Python scripts that are used to generate most of the src/algos/line_finder/geo_filter.ads file. This script also generates a visual of the graph. For more information on these scripts, see part #2 [COMING SOON!!] of this blog post! One issue that I ran into was that I had to use a very small graph, which decreased my ability to filter. This is because the amount of RAM I had available on the Arduino Uno was very small: the larger the graph, the larger the lookup table, and the more RAM I needed.

There are a few modifications to this technique that could be made to make it more accurate and more fair. Using a square and only 2 dimensions to map all the states means that the distances between pairs of states are not all equal. For example, it’s easier to switch between BranchLeft and Online than it is to switch between BranchLeft and Fork. For the proof of concept this technique worked well, though.

Future Activity

The code still needs a bit of work to get the IMU sensors up and running. We have another project called the Certyflie which has all of the gimbal calculations to synthesize roll, pitch, and yaw data from an IMU. The Arduino Uno is a bit too weak to perform these calculations properly; one issue is that there is no floating point unit on the AVR. The RISC-V has an FPU and is much more powerful. One option is to add a Bluetooth transceiver to the robot and send the IMU data back to a terminal on a laptop for synthesis.

Another issue that came up during this development is that the HiFive board uses level shifters on all of the GPIO lines. The level shifters use internal pull-ups, which means that the processor cannot read the reflectance sensors. The reflectance sensor is actually just a capacitor that is discharged when light hits the substrate. So to read the sensor we need to pull the GPIO line high to charge the capacitor, then pull it low and measure the amount of time it takes to discharge. This tells us how much light is hitting the sensor. Since the HiFive has pull-ups on the GPIO lines, we can’t pull the line low to read the sensor; instead we are always charging the sensor. More information about this process can be found on the IR sensor manufacturer’s website under How It Works.

There will be a second post coming soon, which will describe how to actually build this crazy project. There I detail the development of the GPS plugin that I used to build everything and flash the board. As always, the code for the entire project is available here: https://github.com/Robert-Tice/SPARKZumo

Happy Hacking!

]]>
Two Days Dedicated to Sound Static Analysis for Security http://blog.adacore.com/sound-static-analysis-for-security Wed, 14 Mar 2018 15:37:00 +0000 Yannick Moy http://blog.adacore.com/sound-static-analysis-for-security

AdaCore has been working with CEA, Inria and NIST to organize a two-day event dedicated to sound static analysis techniques and tools, and how they are used to increase the security of software-based systems. The program gathers top-notch experts in the field, from industry, government agencies and research institutes, around the three themes of analysis of legacy code, use in new developments, and accountable software quality.

The theme "analysis of legacy code" is meant to all those who have to maintain an aging codebase while facing new security threats from the environment. This theme will be introduced by David A. Wheeler whose contributions to security and open source are well known. From the many articles I like from him, I recommend his in-depth analysis of Heartbleed and the State-of-the-Art Resources (SOAR) for Software Vulnerability Detection, Test, and Evaluation, a government official report detailing the tools and techniques for building secure software. David is leading the CII Best Practiced Badge Program to increase security of open source software. The presentations in this theme will touch on analysis of binaries, analysis of C code, analysis of Linux kernel code and analysis of nuclear control systems.

The theme "use in new developments" is meant to all those who start new projects with security requirements. This theme will be introduced by K. Rustan M. Leino, an emblematic researcher in program verification, who has inspired many of the profound changes in the field from his work on ESC/Modula-3 with Greg Nelson, to his work on a comprehensive formal verification environment around the Dafny language, with many others in between: ESC/Java, Spec#, Boogie, Chalice, etc. The presentations in this theme will touch on securing mobile platforms and our critical infrastructure, as well as describing techniques for verifying floating-point programs and more complex requirements.

The theme "accountable software quality" is meant to all those who need to justify the security of their software, either because they have a regulatory oversight or because of commercial/corporate obligations. This theme will be introduced by David Cok, former VP of Technology and Research at GrammaTech, who is well-known for his work on formal verification tools for Java: ESC/Java, ESC/Java2, now OpenJML. The presentations in this theme will touch on what soundness means for static analysis and the demonstrable benefits it brings, the processes around the use of sound static analysis (including the integration between test and proof results), and the various levels of assurance that can be reached.

The event will take place at the National Institute of Standards and Technology (NIST) at the invitation of researcher Paul Black. Paul co-authored a noted report last year, Dramatically Reducing Software Vulnerabilities, which highlighted sound static analysis as a promising avenue. He will introduce the two days of conference with his perspective on the issue.

The workshop will end with tutorials on Frama-C & SPARK given by the technology developers (CEA and AdaCore), so that attendees can get first-hand experience with using the tools. There will also be vendor displays where you can talk with technology providers. All in all, a unique event to attend, especially when you know that, thanks to our sponsors, participation is free! But registration is compulsory. To see the full program and register for the event, see the webpage of the event.

]]>
Secure Software Architectures Based on Genode + SPARK http://blog.adacore.com/secure-software-architectures-based-on-genode-spark Mon, 05 Mar 2018 13:19:00 +0000 Yannick Moy http://blog.adacore.com/secure-software-architectures-based-on-genode-spark

SPARK user Alexander Senier recently presented their use of SPARK for building secure mobile architectures at BOB Konferenz in Germany. What's nice is that they build on the guarantees that SPARK provides at the software level, using them to create a secure software architecture based on the Genode operating system framework. At 19:07 in the video he presents 3 interesting architectural designs (policy objects, trusted wrappers, and transient components) that make it possible to build a trustworthy system out of untrustworthy building blocks (like a Web browser or a network stack). Almost as exciting as alchemy's goal of transforming lead into gold!

Their solution is to design architectures where untrusted components must communicate through trusted ones. They use Genode to enforce the rule that no other communications are allowed and SPARK to make sure that trusted components can really be trusted. You can see an example of an application they built with these technologies at Componolit at 33:37 in the video: a baseband firewall, to protect the Android platform on a mobile device (e.g., your phone) from attacks that get through the baseband processor, which manages radio communications on your mobile.

As the title of the talk says, for security of connected devices in the modern world, we are at a time "when one beyond-mainstream technology is not enough". For more info on what they do, see the Componolit website.

]]>
Ada on the micro:bit http://blog.adacore.com/ada-on-the-microbit Mon, 26 Feb 2018 13:26:00 +0000 Fabien Chouteau http://blog.adacore.com/ada-on-the-microbit

Updated July 2018

The micro:bit is a very small ARM Cortex-M0 board designed by the BBC for computer education. It's fitted with a Nordic nRF51 Bluetooth-enabled 32-bit ARM microcontroller. At $15 it is one of the cheapest, yet most fun, pieces of kit with which to start embedded programming.

Since the initial release of this blog post we have improved the support of Ada and SPARK on the BBC micro:bit.

In GNAT Community Edition 2018, the micro:bit is now directly supported on Linux, Windows and MacOS. This means that the procedure to use the board is greatly simplified: 

  • Download and install GNAT arm-elf hosted on your platform: Windows, Linux or MacOS. This package contains the ARM cross compiler as well as the required Ada run-times

  • Download and install GNAT native for your platform: Windows, Linux or MacOS. This package contains the GNAT Programming Studio IDE and an example to run on the micro:bit

  • Start GNAT Programming Studio

  • Click on “Create a new Project”

  • Select the “Scrolling Text” project under “BBC micro:bit” and click Next

  • Enter the directory where you want the project to be deployed and click Apply

  • On Linux only: you might need privileges to access the USB ports, without which the flash program will say "No connected boards". On Ubuntu, you can grant these privileges by creating (as administrator) the file /etc/udev/rules.d/mbed.rules containing the line: SUBSYSTEM=="usb", ATTR{idVendor}=="0d28", ATTR{idProduct}=="0204", MODE:="666" and then restarting the service with $ sudo udevadm trigger
  • Plug in your micro:bit board with a USB cable and wait for the system to recognize it. This can take a few seconds

  • Back in GNAT Programming Studio, click on the “flash to board” icon

  • That’s it!

We also improved the micro:bit support and documentation in the Ada Drivers Library project. Follow this link for documented examples of the various features available on the board (text scrolling, buttons, digital in/out, analog in/out, music).

Conclusion

That’s it, your first Ada program on the Micro:Bit! If you have an issue with this procedure, please tell us in the comments section below.

In the meantime, here is an example of the kind of project that you can do with Ada on the Micro:Bit

]]>
Tokeneer Fully Verified with SPARK 2014 http://blog.adacore.com/tokeneer-fully-verified-with-spark-2014 Fri, 23 Feb 2018 09:49:00 +0000 Yannick Moy http://blog.adacore.com/tokeneer-fully-verified-with-spark-2014

Tokeneer is software for controlling physical access to a secure enclave by means of a fingerprint sensor. This software was created by Altran (Praxis at the time) in 2003, using the previous generation of the SPARK language and tools, as part of a project commissioned by the NSA to investigate the rigorous development of critical software using formal methods.

The project artefacts, including the source code, were released as open source in 2008. Tokeneer was widely recognized as a milestone in industrial formal verification. Original project artefacts, including the original source code in SPARK 2005, are available here.

We recently transitioned this software to SPARK 2014, and it allowed us to go beyond what was possible with the previous SPARK technology. The initial transition by Altran and AdaCore took place in 2013-2014, when we translated all the contracts from SPARK 2005 syntax (stylized comments in the code) to SPARK 2014 syntax (aspects in the code). But at the time we did not invest the time to fully prove the resulting translated code. This is what we have now completed. The resulting code is available on GitHub. It will also be available in future SPARK releases as one of the distributed examples.

What we did

With a few changes, we went from 234 unproved checks on Tokeneer code (the version originally translated to SPARK 2014), down to 39 unproved but justified checks. The justification is important here: there are limitations to GNATprove analysis, so it is expected that users must sometimes step in and take responsibility for unproved checks.

Using predicates to express constraints

Most of the 39 justifications in Tokeneer code are for string concatenations that involve attribute 'Image. GNATprove currently does not know that S'Image(X), for a scalar type S and a variable X of this type, returns a rather small string (as specified in Ada RM), so it issues a possible range check message when concatenating such an image with any other string. We chose to isolate such calls to 'Image in dedicated functions, with suitable predicates on their return type to convey the information about the small string result. Take for example enumeration type ElementT in audittypes.ads. We define a function ElementT_Image which returns a small string starting at 1 and with length less than 20 as follows:

   function ElementT_Image (X : ElementT) return CommonTypes.StringF1L20 is
      (ElementT'Image (X));
   pragma Annotate (GNATprove, False_Positive,
                    "range check might fail",
                    "Image of enums of type ElementT are short strings starting at index 1");
   pragma Annotate (GNATprove, False_Positive,
                    "predicate check might fail",
                    "Image of enums of type ElementT are short strings starting at index 1");

Note the use of pragma Annotate to justify the range check message and the predicate check message that are generated by GNATprove otherwise. Type StringF1L20 is defined as a subtype of the standard String type with additional constraints expressed as predicates. In fact, we create an intermediate subtype StringF1 of strings that start at index 1 and which are not "super flat", i.e. their last index is at least 0. StringF1L20 inherits from the predicate of StringF1 and adds the constraint that the length of the string is no more than 20:

   subtype StringF1 is String with
     Predicate => StringF1'First = 1 and StringF1'Last >= 0;
   subtype StringF1L20 is StringF1 with
     Predicate => StringF1L20'Last <= 20;

Moving query functions to the spec

Another crucial change was to give visibility to client code over query functions used in contracts. Take for example the API in admin.ads. It defines the behavior of the administrator through subprograms whose contracts use query functions RolePresent, IsPresent and IsDoingOp:

   procedure Logout (TheAdmin :    out T)
     with Global => null,
          Post   => not IsPresent (TheAdmin)
                      and not IsDoingOp (TheAdmin);

The issue was that these query functions, while conveniently abstracting away the details of what it means for the administrator to be present, or to be doing an operation, were defined in the body of package Admin, inside file admin.adb. As a result, the proof of client code of Admin had to consider these calls as black boxes, which resulted in many unprovable checks. The fix here consisted in moving the definition of the query functions inside the private part of the spec file admin.ads: this way, client code still does not see their implementation, but GNATprove can use these expression functions when proving client code.

   function RolePresent (TheAdmin : T) return PrivTypes.PrivilegeT is
     (TheAdmin.RolePresent);

   function IsPresent (TheAdmin : T) return Boolean is
     (TheAdmin.RolePresent in PrivTypes.AdminPrivilegeT);

   function IsDoingOp (TheAdmin : T) return Boolean is
      (TheAdmin.CurrentOp in OpT);

Using type invariants to enforce global invariants

Some global properties of the SPARK 2005 version were justified manually, like the global invariant maintained in package Auditlog over the global variables encoding the state of the files used to log operations: CurrentLogFile, NumberLogEntries, UsedLogFiles, LogFileEntries. Here is the text of this justification:

-- Proof Review file for 
--    procedure AuditLog.AddElementToLog

-- VC 6
-- C1:    fld_numberlogentries(state) = (fld_length(fld_usedlogfiles(state)) - 1) 
--           * 1024 + element(fld_logfileentries(state), [fld_currentlogfile(state)
--           ]) .
-- C1 is a package state invariant.
-- proof shows that all public routines that modify NumberLogEntries, UsedLogFiles.Length,
-- CurrentLogFile or LogFileEntries(CurrentLogFile) maintain this invariant.
-- This invariant has not been propogated to the specification since it would unecessarily 
-- complicate proof of compenents that use the facilities from this package.

We can do better in SPARK 2014, by expressing this property as a type invariant. This requires all four variables to become components of the same record type, so that a single global variable LogFiles replaces them:

   type LogFileStateT is record
      CurrentLogFile   : LogFileIndexT  := 1;
      NumberLogEntries : LogEntryCountT := 0;
      UsedLogFiles     : LogFileListT   :=
        LogFileListT'(List   => (others => 1),
                      Head   => 1,
                      LastI  => 1,
                      Length => 1);
      LogFileEntries   : LogFileEntryT  := (others => 0);
   end record
     with Type_Invariant =>
       Valid_NumberLogEntries
         (CurrentLogFile, NumberLogEntries, UsedLogFiles, LogFileEntries);

   LogFiles         : LogFilesT := LogFilesT'(others => File.NullFile)
     with Part_Of => FileState;

With this change, all public subprograms updating the state of log files can now assume the invariant holds on entry (it is checked by GNATprove on every call) and must restore it on exit (it is checked by GNATprove when returning from the subprogram). Locally defined subprograms need not obey this constraint, however, which is exactly what is needed here. One subtlety is that some of these local subprograms were accessing the state of log files as global variables. If we had kept LogFiles as a global variable, SPARK rules would have required that its invariant be checked on entry to and exit from these subprograms. Instead, we changed the signature of these local subprograms to take LogFiles as an additional parameter, on which the invariant need not hold.

Other transformations on contracts

A few other transformations were needed to make contracts provable with SPARK 2014. In particular, it was necessary to change a number of "and" logical operations into their short-circuit version "and then". See for example this part of the precondition of Processing in tismain.adb:

       (if (Admin.IsDoingOp(TheAdmin) and
              Admin.TheCurrentOp(TheAdmin) = Admin.OverrideLock)
        then
           Admin.RolePresent(TheAdmin) = PrivTypes.Guard)

The issue was that calling TheCurrentOp requires that IsDoingOp holds:

   function TheCurrentOp (TheAdmin : T) return OpT
     with Global => null,
          Pre    => IsDoingOp (TheAdmin);

Since "and" logical operation evaluates both its operands, TheCurrentOp will also be called in contexts where IsDoingOp does not hold, thus leading to a precondition failure. The fix is simply to use the short-circuit equivalent:

       (if (Admin.IsDoingOp(TheAdmin) and then
              Admin.TheCurrentOp(TheAdmin) = Admin.OverrideLock)
        then
           Admin.RolePresent(TheAdmin) = PrivTypes.Guard)

We also added a few loop invariants that were missing.

What about security?

You can read the original Tokeneer report for a description of the security properties that were provably enforced through formal verification.

To demonstrate that indeed formal verification brings assurance that some security vulnerabilities are not present, we have seeded four vulnerabilities in the code and reanalyzed it. The analysis of GNATprove (either through flow analysis or proof) detected all four: an information leak, a back door, a buffer overflow and an implementation flaw. You can see that in action in this short 4-minute video.

]]>
The Road to a Thick OpenGL Binding for Ada: Part 2 http://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada-part-2 Thu, 22 Feb 2018 06:00:00 +0000 Felix Krause http://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada-part-2

This blog post is part two of a tutorial based on the OpenGLAda project and will cover implementation details such as a type system for interfacing with C, error handling, memory management, and loading functions.

If you haven't read part one I encourage you to do so. It can be found here.

Wrapping Types

As part of the binding process we noted in the previous blog post that we will need to translate typedefs within the OpenGL C headers into Ada types so that our descriptions of C functions that take arguments or return values are accurate. Let’s begin with the basic numeric types:

with Interfaces.C;

package GL.Types is
   type Int   is new Interfaces.C.int;      --  GLint
   type UInt  is new Interfaces.C.unsigned; --  GLuint

   subtype Size is Int range 0 .. Int'Last; --  GLsizei

   type Single is new Interfaces.C.C_float; --  GLfloat
   type Double is new Interfaces.C.double;  --  GLdouble
end GL.Types;

We use Single as the name for the single-precision floating point type to avoid confusion with Ada's Standard.Float. Moreover, we can apply Ada’s powerful numerical typing system in our definition of GLsizei by defining it with a non-negative range. This affords us some extra compile-time and run-time checks without having to add any conditionals, something not possible in C.

The type list above is, of course, shortened for this post; however, two important types are explicitly declared elsewhere:

  • GLenum, which is used for parameters that take a well-defined set of values specified via #define directives in the OpenGL header. Since we want to make the Ada interface safe, we will use real enumeration types for these.
  • GLboolean, which is an unsigned char representing a boolean value. We do not want to have a custom boolean type in the Ada API because it will not add any value compared to using Ada's Boolean type (unlike e.g. the Int type, which may have a different range than Ada's Integer type).

For these types, we define another package called GL.Low_Level:

with Interfaces.C;

package GL.Low_Level is
   type Bool is new Boolean;

   subtype Enum is Interfaces.C.unsigned;
private
   for Bool use (False => 0, True => 1);
   for Bool'Size use Interfaces.C.unsigned_char'Size;
end GL.Low_Level;

We now have a Bool type that we can use for API imports and an Enum type that we will solely use to define the size of our enumeration types. Note that Bool also is an enumeration type, but uses the size of unsigned_char because that is what OpenGL defines for GLboolean.

To show how we can wrap GLenum into actual Ada enumeration types, let's examine glGetError, which is defined like this in the C header:

GLenum glGetError(void);

The return value is one of several error codes defined as preprocessor macros in the header. We translate these into an Ada enumeration and then wrap the subprogram, resulting in the following:

package GL.Errors is
   type Error_Code is
     (No_Error, Invalid_Enum, Invalid_Value, Invalid_Operation,
      Stack_Overflow, Stack_Underflow, Out_Of_Memory,
      Invalid_Framebuffer_Operation);

   function Error_Flag return Error_Code;
private
   for Error_Code use
     (No_Error                      => 0,
      Invalid_Enum                  => 16#0500#,
      Invalid_Value                 => 16#0501#,
      Invalid_Operation             => 16#0502#,
      Stack_Overflow                => 16#0503#,
      Stack_Underflow               => 16#0504#,
      Out_Of_Memory                 => 16#0505#,
      Invalid_Framebuffer_Operation => 16#0506#);
   for Error_Code'Size use Low_Level.Enum'Size;
end GL.Errors;

With the above code we encode the errors defined in the C header as representations for our enumeration values - this way, our safe enumeration type has the exact same memory layout as the defined error codes and maintains compatibility.

We then add the backend for Error_Flag as an import to GL.API:

function Get_Error return Errors.Error_Code;
pragma Import (StdCall, Get_Error, "glGetError");
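The public Error_Flag function then simply forwards to this import. A plausible body looks like the following (hedged: minor details may differ from OpenGLAda's actual sources):

with GL.API;

package body GL.Errors is
   function Error_Flag return Error_Code is
   begin
      --  The backend import already returns our enumeration type, so the
      --  public wrapper is a one-liner.
      return API.Get_Error;
   end Error_Flag;
end GL.Errors;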

Error Handling

The OpenGL specification states that whenever an error arises while calling a function of the API, an internal error flag gets set. This flag can then be retrieved with the function glGetError we wrapped above.

It would certainly be nicer, though, if these API calls would raise Ada exceptions instead. But this would mean that in every wrapper to an OpenGL function that may set the error flag we'd need to call Get_Error, and, when the returned flag is something other than No_Error, we'd subsequently need to raise the appropriate exception. Depending on what the user does with the API, this may lead to significant overhead (let us not forget that OpenGL is much more performance-critical than it is safety-critical). In fact, more recent graphics APIs like Vulkan have debugging extensions which require manual tuning to receive error messages - in other words, due to overhead, Vulkan turns off all error checking by default.

So, what we will provide is an optional feature that auto-raises exceptions whenever the error flag is set. To achieve this, we need to define Ada exceptions corresponding to OpenGL’s error flags.

Let’s add the following exception definitions to GL.Errors:

Invalid_Operation_Error             : exception;
Out_Of_Memory_Error                 : exception;
Invalid_Value_Error                 : exception;
Stack_Overflow_Error                : exception;
Stack_Underflow_Error               : exception;
Invalid_Framebuffer_Operation_Error : exception;
Internal_Error                      : exception;

Notice that the exceptions carry the same names as the corresponding enumeration values in the same package. This is not a problem because Ada is intelligent enough to know which one of the two we want depending on context. Also notice the exception Internal_Error which does not correspond to any OpenGL error – we'll see later what we need it for.

Next, we need a procedure that queries the error flag and possibly raises the appropriate exception. Since we will be using such a procedure almost everywhere in our wrapper let’s declare it in the private part of the GL package so that all of GL's child packages have access:

procedure Raise_Exception_On_OpenGL_Error;

And in the body:

procedure Raise_Exception_On_OpenGL_Error is separate;

Here, we tell Ada that this procedure is defined in a separate compilation unit, enabling us to provide different implementations depending on whether the user wants automatic exception raising enabled or not. Before we continue, though, let’s set up our project with this in mind:

library project OpenGL is
   --  Windowing_System config omitted

   type Toggle_Type is ("enabled", "disabled");
   Auto_Exceptions : Toggle_Type := external ("Auto_Exceptions", "enabled");

   OpenGL_Sources := ("src");
   case Auto_Exceptions is
      when "enabled" =>
         OpenGL_Sources := OpenGL_Sources & "src/auto_exceptions";
      when "disabled" =>
         OpenGL_Sources := OpenGL_Sources & "src/no_auto_exceptions";
   end case;
   for Source_Dirs use OpenGL_Sources;

   --  packages and other things omitted
end OpenGL;

To conform with the modifications made to the project file, we must now create two new directories inside the src folder and place the implementations of our procedure accordingly. GNAT expects both source files to be named gl-raise_exception_on_opengl_error.adb. The implementation in no_auto_exceptions is trivial:

separate (GL)
procedure Raise_Exception_On_OpenGL_Error is
begin
   null;
end Raise_Exception_On_OpenGL_Error;

And the one in auto_exceptions looks like this:

with GL.Errors;

separate (GL)
procedure Raise_Exception_On_OpenGL_Error is
begin
   case Errors.Error_Flag is
      when Errors.Invalid_Operation             => raise Errors.Invalid_Operation_Error;
      when Errors.Invalid_Value                 => raise Errors.Invalid_Value_Error;
      when Errors.Invalid_Framebuffer_Operation => raise Errors.Invalid_Framebuffer_Operation_Error;
      when Errors.Out_Of_Memory                 => raise Errors.Out_Of_Memory_Error;
      when Errors.Stack_Overflow                => raise Errors.Stack_Overflow_Error;
      when Errors.Stack_Underflow               => raise Errors.Stack_Underflow_Error;
      when Errors.Invalid_Enum                  => raise Errors.Internal_Error;
      when Errors.No_Error                      => null;
   end case;
exception
   when Constraint_Error => raise Errors.Internal_Error;
end Raise_Exception_On_OpenGL_Error;

The exception section at the end is used to detect cases where glGetError returns a value we did not know of at the time of implementing this wrapper. Ada would then try to map this value to the Error_Code enumeration, and since the value does not correspond to any value specified in the type definition, the program will raise a Constraint_Error. Of course, OpenGL is very conservative about adding error flags, so this is unlikely to happen, but it is still nice to plan for the future.

Fetching Function Pointers at Runtime

Part 1: Implementing the "Fetching" Function

As previously noted, many functions from the OpenGL API must be retrieved as function pointers at run-time instead of being linked at compile-time. The reason for this once again comes down to the concept of graceful degradation -- if some functionality exists as an extension (especially functions not part of the OpenGL core) but is unimplemented by a target graphics card driver, then the programmer can detect this case when setting the relevant function pointers during execution. Unfortunately, this creates an extra step which prevents us from simply importing the whole API, and, worse still, on Windows no functions defined in OpenGL 2.0 or later are available for compile-time linking, which makes runtime queries mandatory.

So then, the question arises: how are these function pointers to be retrieved? Sadly, this functionality is not available from within the OpenGL API or driver, but instead is provided by platform-specific extensions, or more specifically, by the windowing system supporting OpenGL. So, as with exception handling, we will use a subprogram with multiple implementations and switch to the appropriate implementation via GPRBuild:

case Windowing_System is
   when "windows" => OpenGL_Sources := OpenGL_Sources & "src/windows";
   when "x11"     => OpenGL_Sources := OpenGL_Sources & "src/x11";
   when "quartz"  => OpenGL_Sources := OpenGL_Sources & "src/mac";
end case;

...and we declare this function in the main source:

function GL.API.Subprogram_Reference (Function_Name : String)
  return System.Address;

Then finally, in the windowing-system specific folders, we place the implementation and necessary imports from the windowing system's API. Those imports and the subsequent implementations are not very interesting, so I will not discuss them at length here, but I will show you the implementation for Apple's Mac operating system to give you an idea:

with GL.API.Mac_OS_X;

function GL.API.Subprogram_Reference (Function_Name : String)
  return System.Address is

   -- OSX-specific implementation uses CoreFoundation functions
   use GL.API.Mac_OS_X;

   package IFC renames Interfaces.C.Strings;

   GL_Function_Name_C : IFC.chars_ptr := IFC.New_String (Function_Name);

   Symbol_Name : constant CFStringRef :=
     CFStringCreateWithCString
       (alloc    => System.Null_Address,
       cStr     => GL_Function_Name_C,
       encoding => kCFStringEncodingASCII);

   Result : constant System.Address :=
     CFBundleGetFunctionPointerForName
       (bundle      => OpenGLFramework,
       functionName => Symbol_Name);
begin
   CFRelease (Symbol_Name);
   IFC.Free (GL_Function_Name_C);
   return Result;
end GL.API.Subprogram_Reference;

With the above code in place, we are now able to retrieve function pointers; however, we still need to implement the loading machinery, for which there are three possible approaches:

  • Lazy: When a feature is first needed, its corresponding function pointer is loaded and stored for future use. This approach produces the least amount of work for the resulting application, although, in theory, it makes the performance of a call unpredictable. Since fetching a function pointer is a fairly trivial operation, however, this is not really a practical argument against it.
  • Eager: At some defined point in time, a call is issued to a loading function for every function pointer supported by OpenGLAda. The eager approach produces the largest amount of work for the resulting application, but again, since loading is trivial it does not noticeably slow down the application (and, even if it did, it would do so during initialization, where it is most tolerable).
  • Explicit: The user is required to specify which features they want to use and we only load the function pointers related to such features. Explicit loading places the heaviest burden on the user, since they must state which features they will be using.

Overall, the consequences of choosing one of these three possibilities are mild, so we will go with the one that is easiest to implement: the eager approach, which is also the one used by many other popular OpenGL wrappers.

Part 2: Autogenerating the Fetching Implementation

For each OpenGL function we import that must be loaded at runtime we need to create three things:

  • The definition of an access type describing the function's parameters and return types.
  • A global variable having this type to hold the function pointer as soon as it gets loaded.
  • A call to a platform-specific function which will return the appropriate function pointer from a DLL or library for storage into our global function pointer.

Implementing these segments for each subprogram is a very repetitive task, which hints at the possibility of automating it. To check whether this is feasible, let’s go over the actual information we need to write in each of these code segments for an imported OpenGL function:

  • The Ada subprogram signature
  • The name of the C function we import

As you can see, this is almost exactly the same information we would need to write an imported subprogram linked at compile time! To keep all information about imported OpenGL functions centralized, let’s craft a simple specification format in which we can list this information for each subprogram.

Since we need to define Ada subprogram signatures, it seems a good idea to use Ada-like syntax (like GPRBuild does for its project files). After writing a small parser (I will not show the details here since that is outside the scope of this post), we can now process a specification file looking like the following. We will discuss the package GL.Objects.Shaders and what it does in a bit.

with GL.Errors;
with GL.Types;
with GL.Objects.Shaders;

spec GL.API is
   use GL.Types;

   function Get_Error return Errors.Error_Code with Implicit => "glGetError";
   procedure Flush with Implicit => "glFlush";

   function Create_Shader
     (Shader_Type : Objects.Shaders.Shader_Type) return UInt
   with
     Explicit => "glCreateShader";
end GL.API;

This specification contains two imports we have already created manually and one new import – in this case we use Create_Shader as an example for a subprogram that needs to be loaded via function pointer. We use Ada 2012-like syntax for specifying the target link name with aspects and the import mode. There are two import modes:

  • Implicit - meaning that the subprogram will be imported via pragmas. This will give us a subprogram declaration that will be bound to its implementation by the dynamic library loader. So it happens implicitly and we do not actually need to write any code for it. This is what we previously did in our import of glFlush in part one.
  • Explicit - meaning that the subprogram will be provided as a function pointer variable. We will need to generate code that assigns a proper value to that variable at runtime in this case.

Processing this specification generates the following Ada compilation units for us:

with GL.Errors;
with GL.Types;
with GL.Objects.Shaders;

private package GL.API is
   use GL.Types;

   type T1 is access function (P1 : Objects.Shaders.Shader_Type) return UInt;
   pragma Convention (StdCall, T1);

   function Get_Error return Errors.Error_Code;
   pragma Import (StdCall, Get_Error, "glGetError");

   procedure Flush;
   pragma Import (StdCall, Flush, "glFlush");

   Create_Shader : T1;
end GL.API;

--  ---------------

with System;
with Ada.Unchecked_Conversion;
private with GL.API.Subprogram_Reference;
procedure GL.Load_Function_Pointers is
   use GL.API;

   generic
      type Function_Reference is private;
   function Load (Function_Name : String) return Function_Reference;
   
   function Load (Function_Name : String) return Function_Reference is
      function As_Function_Reference is
        new Ada.Unchecked_Conversion
              (Source => System.Address,
               Target => Function_Reference);

      Raw : System.Address := Subprogram_Reference (Function_Name);
   begin
      return As_Function_Reference (Raw);
   end Load;

   function Load_T1 is new Load (T1);
begin
   GL.API.Create_Shader := Load_T1 ("glCreateShader");
end GL.Load_Function_Pointers;

Notice how our implicit subprograms get imported like before, but for the explicit subprogram, a type T1 got created as an access type to the subprogram, and a global variable Create_Shader is defined to be of this type - satisfying all of our needs.

The procedure GL.Load_Function_Pointers contains the code to fill this variable with the right value by obtaining a function pointer using the platform-specific implementation discussed above. The generic load function exists so that additional function pointers can be loaded using this same code.

The only thing left to do is to expose this functionality in the public interface like the example below:

package GL is
   --  ... other code

   procedure Init;

   --  ... other code
end GL;

--  ------

with GL.Load_Function_Pointers;

package body GL is
   --  ... other code

   procedure Init renames GL.Load_Function_Pointers;
  
   --  ... other code
end GL;

Of course, we now require the user to explicitly call Init somewhere in their code... You might think that we could automatically execute the loading code at package initialization, but this would not work, because some OpenGL implementations (most prominently the one on Windows) will refuse to load any OpenGL function pointers unless there is a current OpenGL context. This context will only exist after an OpenGL surface to render on has been created, which is done programmatically by the user.

In practice, OpenGLAda includes a binding to the GLFW library as a platform-independent way of creating windows with an OpenGL surface on them, and this binding automatically calls Init whenever a window is made current (i.e. placed in foreground), so that the user does not actually need to worry about it. However, there may be other use-cases that do not employ GLFW, like, for example, creating an OpenGL surface widget with GtkAda. In that case, calling Init manually is still required given our design.
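For the manual case, the usage pattern looks roughly like this (a hedged sketch; the windowing package shown here is hypothetical and stands in for GLFW, GtkAda or any other way of obtaining a context):

with GL;
with My_Windowing;  --  hypothetical: creates a window and an OpenGL context

procedure Render_Main is
begin
   My_Windowing.Create_Window_With_GL_Context;  --  context must exist first
   GL.Init;   --  only now can the function pointers be loaded
   --  ... rendering code using the GL packages goes here ...
   GL.Flush;
end Render_Main;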

Memory Management

The OpenGL API enables us to create various objects that reside in GPU memory, for things like textures or vertex buffers. Creating such an object gives us an ID (somewhat like a memory address) which we then use to refer to the object. To avoid memory leaks, we want to manage these IDs automatically in our Ada wrapper so the underlying objects are destroyed once the last reference vanishes. Ada’s controlled types are an ideal candidate for the job. Let's start writing a package GL.Objects to encapsulate this functionality:

package GL.Objects is
   use GL.Types;

   type GL_Object is abstract tagged private;

   procedure Initialize_Id (Object : in out GL_Object);

   procedure Clear (Object : in out GL_Object);

   function Initialized (Object : GL_Object) return Boolean;
   
   procedure Internal_Create_Id
     (Object : GL_Object; Id : out UInt) is abstract;

   procedure Internal_Release_Id
     (Object : GL_Object; Id : UInt) is abstract;
private
   type GL_Object_Reference;
   type GL_Object_Reference_Access is access all GL_Object_Reference;

   type GL_Object_Reference is record
      GL_Id           : UInt;
      Reference_Count : Natural;
      Is_Owner        : Boolean;
   end record;

   type GL_Object is abstract new Ada.Finalization.Controlled with record
      Reference : GL_Object_Reference_Access := null;
   end record;

   -- Increases reference count.
   overriding procedure Adjust (Object : in out GL_Object);

   -- Decreases reference count. Destroys texture when it reaches zero.
   overriding procedure Finalize (Object : in out GL_Object);
end GL.Objects;   

GL_Object is our smart pointer here, and GL_Object_Reference is the holder of the object's ID as well as the reference count. We will derive the actual object types (of which there are quite a few) from GL_Object, so the base type can be abstract and we can define subprograms that must be overridden by the child types. Note that since the class hierarchy is based on GL_Object, all derived types have an identically-typed handle to a GL_Object_Reference object, and thus our reference counting is independent of the actual derived type.

The only thing the derived type must declare in order for our automatic memory management to work is how to create and delete the OpenGL object in GPU memory – this is what Internal_Create_Id and Internal_Release_Id in the above segment are for. Because they are abstract, they must be put into the public part of the package even though they should never be called by the user directly.

The core of our smart pointer machinery will be implemented in the Adjust and Finalize procedures provided by Ada.Finalization.Controlled. Since this topic has already been extensively covered in this Ada Gem I am going to skip over the gory implementation details.
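For reference, here is a minimal sketch of what these two procedures might look like in the package body (hedged: OpenGLAda's actual implementation handles more cases, such as the Is_Owner flag):

   --  (assumes "with Ada.Unchecked_Deallocation;" in the package body)

   overriding procedure Adjust (Object : in out GL_Object) is
   begin
      if Object.Reference /= null then
         Object.Reference.Reference_Count :=
           Object.Reference.Reference_Count + 1;
      end if;
   end Adjust;

   overriding procedure Finalize (Object : in out GL_Object) is
      procedure Free is new Ada.Unchecked_Deallocation
        (GL_Object_Reference, GL_Object_Reference_Access);
   begin
      if Object.Reference /= null then
         Object.Reference.Reference_Count :=
           Object.Reference.Reference_Count - 1;
         if Object.Reference.Reference_Count = 0 then
            --  Dispatch to the derived type's Internal_Release_Id so the
            --  object is deleted on the GPU before the record is freed.
            Internal_Release_Id
              (GL_Object'Class (Object), Object.Reference.GL_Id);
            Free (Object.Reference);
         end if;
         Object.Reference := null;
      end if;
   end Finalize;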

So, to create a new OpenGL object the user must call Initialize_Id on a smart pointer which assigns the ID of the newly created object to the smart pointer's backing object. Clear can then later be used to make the smart pointer uninitialized again (but only delete the object if the reference count reaches zero).

To test our system, let's implement a Shader object. Shader objects will hold source code and compiled binaries of GLSL (GL Shading Language) shaders. We will call this package GL.Objects.Shaders in keeping with the rest of the project's structure:

package GL.Objects.Shaders is
   pragma Preelaborate;

   type Shader_Type is
     (Fragment_Shader,
      Vertex_Shader,
      Geometry_Shader,
      Tess_Evaluation_Shader,
      Tess_Control_Shader);

   type Shader (Kind : Shader_Type) is new GL_Object with private;

   procedure Set_Source (Subject : Shader; Source : String);

   procedure Compile (Subject : Shader);

   procedure Release_Shader_Compiler;

   function Compile_Status (Subject : Shader) return Boolean;

   function Info_Log (Subject : Shader) return String;

private
   type Shader (Kind : Shader_Type) is new GL_Object with null record;

   overriding
   procedure Internal_Create_Id (Object : Shader; Id : out UInt);

   overriding
   procedure Internal_Release_Id (Object : Shader; Id : UInt);

   for Shader_Type use
     (Fragment_Shader        => 16#8B30#,
      Vertex_Shader          => 16#8B31#,
      Geometry_Shader        => 16#8DD9#,
      Tess_Evaluation_Shader => 16#8E87#,
      Tess_Control_Shader    => 16#8E88#);

   for Shader_Type'Size use Low_Level.Enum'Size;
end GL.Objects.Shaders;

The two overriding procedures are implemented like this:

overriding
procedure Internal_Create_Id (Object : Shader; Id : out UInt) is
begin
   Id := API.Create_Shader (Object.Kind);
   Raise_Exception_On_OpenGL_Error;
end Internal_Create_Id;

overriding
procedure Internal_Release_Id (Object : Shader; Id : UInt) is
   pragma Unreferenced (Object);
begin
   API.Delete_Shader (Id);
   Raise_Exception_On_OpenGL_Error;
end Internal_Release_Id;

Of course, we need to add the subprogram Delete_Shader to our import specification so it will be available in the generated GL.API package. A nice thing is that, in Ada, pointer dereference is often done implicitly so we need not worry whether Create_Shader and Delete_Shader are loaded via function pointers or with the dynamic library loader – the code would look exactly the same in both cases!
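In the specification format introduced above, that additional entry might look like this (hedged: the exact profile in OpenGLAda's real specification file may differ):

   procedure Delete_Shader (Shader : UInt) with
     Explicit => "glDeleteShader";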

Documentation

One problem we did not yet address is documentation. After all, we are adding structure and complexity to the OpenGL API that does not exist in its specification, so how is a user supposed to find the wrapper of a certain OpenGL function they want to use?

What we need to do, then, is generate a list where the name of each OpenGL function we wrap is listed and linked to its respective wrapper function in OpenGLAda's API. Of course, we do not want to generate that list manually. Instead, let’s use our import specification again and enrich it with additional information:

   function Get_Error return Errors.Error_Code with
     Implicit => "glGetError",  Wrapper => "GL.Errors.Error_Flag";
   procedure Flush with
     Implicit => "glFlush", Wrapper => "GL.Flush";

With the new "aspect-like" declarations in our template we can enhance our generator with code that writes a Markdown file listing all imported OpenGL functions and linking that to their wrappers. In theory, we could even avoid adding the wrapper information explicitly by analyzing OpenGLAda's code to detect which subprogram wraps the OpenGL function. Tools like ASIS and LibAdaLang would help us with that, but that implementation would be far more work than adding our wrapper references explicitly.

The generated list can be seen on OpenGLAda's website showing all the functions that are actually supported. It is intended to be navigated via search (a.k.a. Ctrl+F).

Conclusion

By breaking down the complexities of a large C API like OpenGL, we have gone through quite a few improvements that can be done when creating an Ada binding. Some of them were not so obvious and probably not necessary for classifying a binding as thick - for example, auto-loading our function pointers at run-time was simply an artifact of supporting OpenGL and not covered inside the scope of the OpenGL API itself.

We also discovered that when wrapping a C API in Ada we must lift the interface to a higher level, since Ada is designed to be a higher-level language than C; in this vein, it was natural to add features that are not part of the original API to make it feel more at home in an Ada context.

It might be tempting to write a thin wrapper for your Ada project to avoid overhead, but beware - you will probably still end up writing a thick wrapper. After all, the code around calls to thinly wrapped functions and the necessary data conversions do not simply vanish!

Of course, all this is a lot of work! To give you some numbers: The OpenGLAda repository contains 15,874 lines of Ada code (excluding blanks and comments, tests, and examples) while, for comparison, the C header gl.h (while missing many key features) is only around 3,000 lines.

]]>
For All Properties, There Exists a Proof http://blog.adacore.com/for-all-properties-there-exists-a-proof Mon, 19 Feb 2018 10:15:00 +0000 Yannick Moy http://blog.adacore.com/for-all-properties-there-exists-a-proof

With the recent addition of a Manual Proof capability in SPARK 18, it is worth looking at an example which cannot be proved by automatic provers, to see the options that are available for proving it with SPARK. The following code is such an example, where the postcondition of Do_Nothing cannot be proved with provers CVC4 or Z3, although it is exactly the same as its precondition:

   subtype Index is Integer range 1 .. 10;
   type T1 is array (Index) of Integer;
   type T2 is array (Index) of T1;

   procedure Do_Nothing (Tab : T2) with
     Ghost,
     Pre  => (for all X in Index => (for some Y in Index => Tab(X)(Y) = X + Y)),
     Post => (for all X in Index => (for some Y in Index => Tab(X)(Y) = X + Y));

   procedure Do_Nothing (Tab : T2) is null;

The issue is that SMT provers that we use in SPARK like CVC4 and Z3 do not recognize the similarity between the property assumed here (the precondition) and the property to prove (the postcondition). To such a prover, the formula to prove (the Verification Condition or VC) looks like the following in SMTLIB2 format:

(declare-sort integer 0)
(declare-fun to_rep (integer) Int)
(declare-const tab (Array Int (Array Int integer)))
(assert
  (forall ((x Int))
  (=> (and (<= 1 x) (<= x 10))
    (exists ((y Int))
      (and (and (<= 1 y) (<= y 10))
        (= (to_rep (select (select tab x) y)) (+ x y)))))))
(declare-const x Int)
(assert (<= 1 x))
(assert (<= x 10))
(assert
  (forall ((y Int))
    (=> (and (<= 1 y) (<= y 10))
       (not (= (to_rep (select (select tab x) y)) (+ x y))))))
(check-sat)

We see here some of the encoding from SPARK programming language to SMTLIB2 format: the standard integer type Integer is translated into an abstract type integer, with a suitable projection to_rep from this abstract type to the standard Int type of mathematical integers in SMTLIB2; the array types T1 and T2 are translated into SMTLIB2 Array types. The precondition, which is assumed here, is directly transformed into a universally quantified axiom (starting with "forall"), while the postcondition is negated and joined with the other hypotheses, as an SMT solver will try to deduce an inconsistency to prove the goal by contradiction. So the negated postcondition becomes:

   (for some X in Index => (for all Y in Index => not (Tab(X)(Y) = X + Y)));

The existentially quantified variable X becomes a constant x in the VC, with assertions stating its bounds 1 and 10, and the universal quantification becomes another axiom.

Now it is useful to understand how SMT solvers deal with universally quantified axioms. Obviously, they cannot "try out" every possible value of parameters. Here, the quantified variable ranges over all mathematical integers! And in general, we may quantify over values of abstract types which cannot be enumerated. Instead, SMT solvers find suitable "candidates" for instantiating the axioms. The main technique to find such candidates is called trigger-based instantiation. The SMT solver identifies terms in the quantified axiom that contain the quantified variables, and match them with the so-called "ground" terms in the VC (terms that do not contain quantified or "bound" variables). Here, such a term containing x in the first axiom is (to_rep (select (select tab x) y)), or simply (select tab x), while in the second axiom such a term containing y could be (to_rep (select (select tab x) y)) or (select (select tab x) y). The issue with the VC above is that these do not match any ground term, hence neither CVC4 nor Z3 can prove the VC.

Note that Alt-Ergo is able to prove the VC, using the exact same trigger-based mechanism, because it considers (select tab x) from the second axiom as a ground term in matching. Alt-Ergo uses this term to instantiate the first axiom, which in turn provides the term (select (select tab x) sko_y) [where sko_y is a fresh variable corresponding to the skolemisation of the existentially quantified variable y]. Alt-Ergo then uses this new term to instantiate the second axiom, resulting in a contradiction. So Alt-Ergo can deduce that the VC is unsatisfiable, hence proves the original (non-negated) postcondition.

In what follows, I am going to consider alternative means to prove such a property when all the SMT provers provided with SPARK fail.

solution 1 - use an alternative automatic prover

As the property to prove is an exact duplication of a known property in hypothesis, a different kind of prover, based on resolution, is a perfect fit. Here, I'm using E prover, but many others are supported by the Why3 platform used in SPARK and would be as effective. The first step is to install E prover from its website (www.eprover.org) or from its integration in your Linux distro. Then, you need to run the executable why3config to generate a suitable .why3.conf configuration file in your HOME directory, with the necessary information for Why3 to know how to generate VCs for E prover and how to call it. Currently, GNATprove cannot be called with --prover=eprover, so instead I called the underlying Why3 tool directly, and it proves the desired postcondition:

$ why3 prove -L /path/to/theories -P Eprover quantarrays.mlw
quantarrays.mlw Quantarrays__subprogram_def WP_parameter def : Valid (0.02s)

solution 2 - prove interactively

With SPARK 18 comes the possibility to prove a VC interactively inside the editor GPS. Just right-click on the message about the unproved postcondition and select "Start Manual Proof". Various panels are opened in GPS:

Manual Proof inside GPS

Here, the manual proof is really simple. We start by applying axiom H, as the conclusion of this axiom matches the goal to prove, which makes it necessary to prove the conditions for applying axiom H. Then we use the known bounds on X in axioms H1 and H2 to prove the conditions. And we're done! The following snapshot shows that GPS now confirms that the VC has been proved:

Manual Proof inside GPS

Note that it is possible to call an automatic prover by its name, like "altergo", "cvc4", or "z3" to prove the VC automatically after the initial application of axiom H.

solution 3 - use an alternative interactive prover

It is also possible to use powerful external interactive provers like Coq or Isabelle. You first need to install these on your machine. GNATprove and GPS are directly integrated with Coq, so that you can right-click on the unproved postcondition, select "Prove Check", then manually enter the switch "--prover=coq" to select Coq prover. GPS will then open CoqIDE on the VC as follows:

Start Coq Proof inside CoqIDE (called from GPS)

The proof in Coq is as simple as before. Here is the exact set of tactics to apply to reproduce what we did with manual proof in GPS:

Coq Proof inside CoqIDE (called from GPS)

Note that the tactic "auto" in Coq proves this VC automatically.

What to Remember

There are many ways forward when the automatic provers provided with GNATprove fail to prove a property. We have already presented on various occasions the use of ghost code. Here we described three other ways: using an alternative automatic prover, proving interactively, and using an alternative interactive prover.

[cover image of Kurt Gödel, courtesy of Wikipedia, who in fact demonstrated that not all true properties can ever be proved]

]]>
Bitcoin blockchain in Ada: Lady Ada meets Satoshi Nakamoto http://blog.adacore.com/bitcoin-in-ada Thu, 15 Feb 2018 13:00:00 +0000 Johannes Kanig http://blog.adacore.com/bitcoin-in-ada

Bitcoin is getting a lot of press recently, but let's be honest, that's mostly because a single bitcoin worth 800 USD in January 2017 was worth almost 20,000 USD in December 2017. However, bitcoin and its underlying blockchain are beautiful technologies that are worth a closer look. Let’s take that look with our Ada hat on!

So what's the blockchain?

“Blockchain” is a general term for a database that’s maintained in a distributed way and is protected against manipulation of the entries; Bitcoin is the first application of the blockchain technology, using it to track transactions of “coins”, which are also called Bitcoins.

Conceptually, the Bitcoin blockchain is just a list of transactions. Bitcoin transactions in full generality are quite complex, but as a first approximation, one can think of a transaction as a triple (sender, recipient, amount), so that an initial mental model of the blockchain could look like this:

Sender             | Recipient          | Amount
<Bitcoin address>  | <Bitcoin address>  | 0.003 BTC
<Bitcoin address>  | <Bitcoin address>  | 0.032 BTC
...                | ...                | ...

Other data, such as how many Bitcoins you have, are derived from this simple transaction log and not explicitly stored in the blockchain.
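To illustrate "derived, not stored", one could compute a balance by folding over the transaction log. The sketch below uses this post's simplified triple model with hypothetical types; real Bitcoin tracks unspent transaction outputs rather than per-address balances.

   type Address is new String (1 .. 34);        --  simplified address
   type Amount  is delta 0.00000001 digits 18;  --  BTC with satoshi precision

   type Transaction is record
      Sender, Recipient : Address;
      Value             : Amount;
   end record;

   type Transaction_Log is array (Positive range <>) of Transaction;

   function Balance (Log : Transaction_Log; Who : Address) return Amount is
      Result : Amount := 0.0;
   begin
      --  Walk the whole log: credits when Who receives, debits when Who sends.
      for I in Log'Range loop
         if Log (I).Recipient = Who then
            Result := Result + Log (I).Value;
         elsif Log (I).Sender = Who then
            Result := Result - Log (I).Value;
         end if;
      end loop;
      return Result;
   end Balance;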

Modifying or corrupting this transaction log would allow attackers to appear to have more Bitcoins than they really have, or, allow them to spend money then erase the transaction and spend the same money again. This is why it’s important to protect against manipulation of that database.

The list of transactions is not a flat list.  Instead, transactions are grouped into blocks. The blockchain is a list of blocks, where each block has a link to the previous block, so that a block represents the full blockchain up to that point in time:

Thinking as a programmer, this could be implemented using a linked list where each block header contains a prev pointer. The blockchain is grown by adding new blocks to the end, with each new block pointing to the previously last block, so it makes more sense to use a prev pointer instead of a next pointer. In a regular linked list, the prev pointer points directly to the memory used for the previous block. But what makes the blockchain unique is that it's a distributed data structure; it's maintained by a network of computers or nodes. Every Bitcoin full node has a full copy of the blockchain, but what happens if members of the network don't agree on the contents of some transaction or block? A simple memory corruption or malicious act could result in a client having incorrect data. This is why the blockchain has various checks built in that guarantee that corruption or manipulation can be detected.

How does Bitcoin check data integrity?

Bitcoin’s internal checks are based on a cryptographic hash function. This is just a fancy name for a function that takes anything as input and spits out a large number as output, with the following properties:

  • The output of the function varies greatly and unpredictably even with tiny variations of the input;

  • It is extremely hard to deduce an input that produces some specific output number, other than by using brute force; that is, by computing the function again and again for a large number of inputs until one finds the input that produces the desired output.

The hash function used in Bitcoin is called SHA256.  It produces a 256-bit number as output, usually represented as 64 hexadecimal digits. Collisions (different input data that produces the same output hash value) are theoretically possible, but the output space is so big that collisions on actual data are considered extremely unlikely, in fact practically impossible.
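You can get a feel for the first property with GNAT's SHA256 implementation, which this post also uses below in its binary form. A small demo (hedged: it assumes GNAT.SHA256's textual Digest function, which returns the digest as a 64-character hexadecimal string): the two digests it prints share no obvious pattern even though the inputs differ by a single character.

with Ada.Text_IO; use Ada.Text_IO;
with GNAT.SHA256;

procedure Hash_Demo is
begin
   --  Two almost identical inputs produce completely unrelated digests.
   Put_Line (GNAT.SHA256.Digest ("hello world"));
   Put_Line (GNAT.SHA256.Digest ("hello worlc"));
end Hash_Demo;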

The idea behind the first check of Bitcoin's data integrity is to replace a raw pointer to a memory region with a “safe pointer” that can, by construction, only point to data that hasn’t been tampered with. The trick is to use the hash value of the data in the block as the “pointer” to the data. So instead of a raw pointer, one stores the hash of the previous block as prev pointer:

Here, I’ve abbreviated the 256-bit hash values by their first two and last four hex digits – by design, Bitcoin block hashes always start with a certain number of leading zeroes. The first block contains a "null pointer" in the form of an all zero hash.

Given a hash value, it is infeasible to compute the data associated with it, so one can't really "follow" a hash like one can follow a pointer to get to the real data.  Therefore, some sort of table is needed to store the data associated with the hash value.

Now what have we gained? The structure can no longer easily be modified. If someone modifies any block, its hash value changes, and all existing pointers to it are invalidated (because they contain the wrong hash value). If, for example, the following block is updated to contain the new prev pointer (i.e., hash), its own hash value changes as well. The end result is that the whole data structure needs to be completely rewritten even for small changes (following prev pointers in reverse order starting from the change). In fact such a rewrite never occurs in Bitcoin, so one ends up with an immutable chain of blocks. However, one needs to check (for example when receiving blocks from another node in the network) that the block pointed to really has the expected hash. 

Block data structure in Ada

To make the above explanations more concrete, let's look at some Ada code (you may also want to have bitcoin documentation available).

A bitcoin block is composed of the actual block contents (the list of transactions of the block) and a block header. The entire type definition of the block looks like this (you can find all code in this post plus some supporting code in this github repository):

   type Block_Header is record
      Version : Uint_32;
      Prev_Block : Uint_256;
      Merkle_Root : Uint_256;
      Timestamp : Uint_32;
      Bits : Uint_32;
      Nonce : Uint_32;
   end record;

   type Transaction_Array is array (Integer range <>) of Uint_256;

   type Block_Type (Num_Transactions : Integer) is record
      Header : Block_Header;
      Transactions : Transaction_Array (1 .. Num_Transactions);
   end record;

As discussed, a block is simply the list of transactions plus the block header, which contains additional information. Of the block header fields, you only need to understand two for this blog post:

  • Prev_Block a 256-bit hash value for the previous block (this is the prev pointer I mentioned before)

  • Merkle_Root a 256-bit hash value which summarizes the contents of the block and guarantees that when the contents change, the block header changes as well. I will explain how it is computed later in this post.

The only piece of information that’s missing is that Bitcoin usually uses the SHA256 hash function twice to compute a hash. So instead of just computing SHA256(data), usually SHA256(SHA256(data)) is computed. One can write such a double hash function in Ada as follows, using the GNAT.SHA256 library and String as a type for a data buffer (we assume a little-endian architecture throughout the document, but you can use the GNAT compiler’s Scalar_Storage_Order feature to make this code portable):

with GNAT.SHA256; use GNAT.SHA256;

   function Double_Hash (S : String) return Uint_256 is
      D : Binary_Message_Digest := Digest (S);
      T : String (1 .. 32);
      for T'Address use D'Address;
      D2 : constant Binary_Message_Digest := Digest (T);

      function To_Uint_256 is new Ada.Unchecked_Conversion
        (Source => Binary_Message_Digest,
         Target => Uint_256);
   begin
      return To_Uint_256 (D2);
   end Double_Hash;

The hash of a block is simply the hash of its block header. This can be expressed in Ada as follows (assuming that the size in bits of the block header, Block_Header’Size in Ada, is a multiple of 8):

   function Block_Hash (B : Block_Type) return Uint_256 is
      S : String (1 .. Block_Header'Size / 8);
      for S'Address use B.Header'Address;
   begin
      return Double_Hash (S);
   end Block_Hash;

Now we have everything we need to check the integrity of the outermost layer of the blockchain. We simply iterate over all blocks and check that the previous block indeed has the hash used to point to it:

declare
   Cur : String :=
     "00000000000000000044e859a307b60d66ae586528fcc6d4df8a7c3eff132456";
   S : String (1 .. 64);
begin
   loop
      declare
         B : constant Block_Type := Get_Block (Cur);
      begin
         S := Uint_256_Hex (Block_Hash (B));
         Put_Line ("checking block hash = " & S);
         if not (Same_Hash (S,Cur)) then 
            Ada.Text_IO.Put_Line ("found block hash mismatch");
         end if;
         Cur := Uint_256_Hex (B.Prev_Block);
      end;
   end loop;
end;

A few explanations: the Cur string contains the hash of the current block as a hexadecimal string. At each iteration, we fetch the block with this hash (details in the next paragraph) and compute the actual hash of the block using the Block_Hash function. If everything matches, we set Cur to the contents of the Prev_Block field. Uint_256_Hex is the function to convert a hash value in memory to its hexadecimal representation for display.

One last step is to get the actual blockchain data. The size of the blockchain is now 150GB and counting, so this is actually not so straightforward! For this blog post, I added 12 blocks in JSON format to the github repository, making it self-contained. The Get_Block function reads a file with the same name as the block hash to obtain the data, starting at a hardcoded block with the hash mentioned in the code. If you want to verify the whole blockchain using the above code, you have to either query the data using some website such as blockchain.info, or download the blockchain on your computer, for example using the Bitcoin Core client, and update Get_Block accordingly.

How to compute the Merkle Root Hash

So far, we were able to verify the proper chaining of the blockchain, but what about the contents of the block?  The objective is now to come up with the Merkle root hash mentioned earlier, which is supposed to "summarize" the block contents: that is, it should change for any slight change of the input.

First, each transaction is again identified by its hash, similar to how blocks are identified. So now we need to compute a single hash value from the list of hashes for the transactions of the block. Bitcoin uses a hash function which combines two hashes into a single hash:

   function SHA256Pair (U1, U2 : Uint_256) return Uint_256 is
      type A is array (1 .. 2) of Uint_256;
      X : A := (U1, U2);
      S : String (1 .. X'Size / 8);
      for S'Address use X'Address;
   begin
      return Double_Hash (S);
   end SHA256Pair;

Basically, the two numbers are put side-by-side in memory and the result is hashed using the double hash function.

Now we could just iterate over the list of transaction hashes, using this combining function to come up with a single value. But it turns out Bitcoin does it a bit differently; hashes are combined using a scheme that's called a Merkle tree:

One can imagine the transactions (T1 to T6 in the example) being stored at the leaves of a binary tree, where each inner node carries a hash which is the combination of the two child hashes. For example, H7 is computed from H1 and H2. The root node carries the "Merkle root hash", which in this way summarizes all transactions. However, this image of a tree is just that - an image to show the order of hash computations needed to compute the Merkle root hash. There is no actual tree stored in memory.

There is one peculiarity in the way Bitcoin computes the Merkle hash: when a row has an odd number of elements, the last element is combined with itself to compute the parent hash. You can see this in the picture, where H9 is used twice to compute H11.

The Ada code for this is quite straightforward:

   function Merkle_Computation (Tx : Transaction_Array) return Uint_256 is
      Max : Integer :=
          (if Tx'Length rem 2 = 0 then Tx'Length else Tx'Length + 1);
      Copy : Transaction_Array (1 .. Max);
   begin
      if Tx'Length = 1 then
         return Tx (Tx'First);
      end if;
      if Tx'Length = 0 then
         raise Program_Error;
      end if;
      Copy (1 .. Tx'Length) := Tx;
      if (Max /= Tx'Length) then
         Copy (Max) := Tx (Tx'Last);
      end if;
      loop
         for I in 1 .. Max / 2 loop
            Copy (I) := SHA256Pair (Copy (2 * I - 1), Copy (2 * I));
         end loop;
         if Max = 2 then
            return Copy (1);
         end if;
         Max := Max / 2;
         if Max rem 2 /= 0 then
            Copy (Max + 1) := Copy (Max);
            Max := Max + 1;
         end if;
      end loop;
   end Merkle_Computation;

Note that despite the name, the input array only contains transaction hashes and not actual transactions. A copy of the input array is created at the beginning; after each iteration of the loop in the code, it contains one level of the Merkle tree. Both before and inside the loop, if statements check for the edge case of combining an odd number of hashes at a given level.

We can now update our checking code to also check for the correctness of the Merkle root hash for each checked block. You can check out the whole code from this repository; the branch “blogpost_1” will stay there to point to the code as shown here.

Why does Bitcoin compute the hash of the transactions in this way? Because it allows for a more efficient way to prove to someone that a certain transaction is in the blockchain.

Suppose you want to show someone that you sent them the required amount of Bitcoin to buy some product. They could, of course, download the entire block you indicate and check for themselves, but that’s inefficient. Instead, you could present them with the chain of hashes that leads to the root hash of the block.

If the transaction hashes were combined linearly, you would still have to show them the entire list of transactions that come after yours in the block. But with the Merkle hash, you can present them with a “Merkle proof”: that is, just the hashes required to compute the path from your transaction to the Merkle root. In our example, if your transaction is T3, it's enough to also provide H4, H7 and H11: the other person can compute the Merkle root hash from these and compare it with the “official” Merkle root hash of that block.
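A hedged sketch of such a check, reusing the SHA256Pair function from above (the Proof_Step and Merkle_Proof types are made up for this illustration; a real proof also has to encode on which side each sibling sits, done here with a Boolean):

   type Proof_Step is record
      Sibling       : Uint_256;
      Sibling_First : Boolean;  --  True if the sibling is the left operand
   end record;

   type Merkle_Proof is array (Positive range <>) of Proof_Step;

   --  For T3 in the example, the proof would contain H4, H7 and H11.
   function Proof_Is_Valid
     (Leaf  : Uint_256;
      Proof : Merkle_Proof;
      Root  : Uint_256) return Boolean
   is
      Current : Uint_256 := Leaf;
   begin
      --  Recompute the path from the leaf up to the root, respecting the
      --  left/right position of each sibling hash.
      for I in Proof'Range loop
         if Proof (I).Sibling_First then
            Current := SHA256Pair (Proof (I).Sibling, Current);
         else
            Current := SHA256Pair (Current, Proof (I).Sibling);
         end if;
      end loop;
      return Current = Root;
   end Proof_Is_Valid;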

When I first saw this explanation, I was puzzled why an attacker couldn’t modify transaction T3 to T3b and then “invent” the hashes H4b, H7b and H11b so that the Merkle root hash H12 is unchanged. But the cryptographic nature of the hash function prevents this: today, there is no known attack against the hash function SHA256 used in Bitcoin that would allow inventing such input values (but for the weaker hash function SHA1 such collisions have been found).

Wrap-Up

In this blog post I have shown Ada code that can be used to verify the data integrity of blocks from the Bitcoin blockchain. I was able to check the block and Merkle root hashes for all the blocks in the blockchain in a few hours on my computer, though most of the time was spent in Input/Output to read the data in.

There are many more rules that make a block valid, most of them related to transactions. I hope to cover some of them in later blog posts.

]]>
The Road to a Thick OpenGL Binding for Ada: Part 1 http://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada Mon, 05 Feb 2018 15:59:00 +0000 Felix Krause http://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada

This blog post is part one of a tutorial based on the OpenGLAda project and will cover some of the background of the OpenGL API and the basic steps involved in importing platform-dependent C functions.

Motivation

Ada was designed from its onset in the late 70s to be highly compatible with other languages - for example, there are currently native facilities for directly using libraries from C, FORTRAN, COBOL, C++, and even Java. However, there is still a process (although automatable to a certain extent) that must be followed to safely and effectively import an API, or to create what we will refer to here as a binding.

Additionally, foreign APIs may not be the most efficient or user-friendly for direct use in Ada, and so it is often considered useful to go above and beyond making a simple or thin binding and instead craft a small custom library (or thick binding) above the original API to solidify and greatly simplify its use within the Ada language.

In this blog post I will describe the design decisions and architecture of OpenGLAda - a custom thick binding to the OpenGL API for Ada, and, in the process, I hope to provide ideas and techniques that may inspire others to contribute their own bindings for similar libraries.

Below are some examples, based on the classic OpenGL SuperBible, of what is possible using the OpenGLAda binding; their complete source can be found in my GitHub repo for OpenGLAda here, along with instructions for setting up an Ada environment:

Screenshots of example projects from OpenGLAda

Background

OpenGL, created in 1991 by Silicon Graphics, has had a long history as an industry standard for rendering 3D vector graphics - growing through numerous revisions (currently at version 4.6) both adding new features and deprecating or removing others. As a result, the once simple API has become more complex and difficult to wield at times. Despite this and even with the competition of Microsoft’s DirectX and the creation of new APIs (like Vulkan), OpenGL still remains a big player in the Linux, Mac, and free-software world.

Unlike a typical C library, OpenGL has hundreds (maybe even thousands) of implementations, usually provided by graphics hardware vendors. While the OpenGL API itself is considered platform-independent, making use of it does depend heavily on the target platform's graphics and windowing systems. This is due to the fact that rendering requires a so-called OpenGL context, consisting of a drawing area on the screen and all associated data needed for rendering. For this reason, there exist multiple glue APIs that enable using OpenGL in conjunction with various windowing systems.

Design Challenges

A concept that permeates the design of OpenGL is graceful degradation - meaning that if some feature or function is unavailable on a target platform, the client software may supply a workaround or simply skip the part of the rendering process in which the feature is required. This makes it necessary to query for existing features at run-time. Additionally, the code for querying OpenGL features is not part of the OpenGL API itself and must be provided by us, defined separately for each platform we plan to support.

These properties pose the following challenges for our Ada binding:

  1. It must include some platform-dependent code, ideally hiding this from the user to enable platform-independent usage.
  2. It must access OpenGL features without directly linking to them so that missing features can be handled inside the application.

First Steps

Note: I started working on OpenGLAda in 2012 so it only uses the features of the Ada 2005 language level. Some code shown here could be written in a more succinct way with the added constructs in Ada 2012 (most notably aspects and expression functions).

To get started on our binding we need to translate subprogram definitions from the standard OpenGL C header into Ada. Since we are writing a thick binding and are going above and beyond directly using the original C function, these API imports should be invisible to the user. Thus, we will define a set of private packages such as GL.API to house all of these imports. A private package can only be used by the immediate parent package and its children, making it invisible for a user of the library. The public package GL and its public child packages will provide the public interface.

To translate a C subprogram declaration to Ada, we need to map all C types it uses into equivalent Ada types then essentially change the syntax from C to Ada. For the first import, we choose the following subprogram:

void glFlush();

This is a command used to tell OpenGL to execute commands currently stored in internal buffers. It is a very common command and thus is placed directly in the top-level package of the public interface. Since the command has no parameters and returns no values, there are no types involved so we don’t need to care about them for now. Our Ada code looks like this:

package GL is
   procedure Flush;
end GL;

private package GL.API is
   procedure Flush;
   pragma Import
     (Convention    => C,
      Entity        => Flush,
      External_Name => "glFlush");
end GL.API;

package body GL is
   procedure Flush is
   begin
      API.Flush;
   end Flush;
end GL;

Instead of providing an implementation of GL.API.Flush in a package body, we use the pragma Import to tell the Ada compiler that we are importing this subprogram from another library. The first parameter is the calling convention, which defines low-level details about how a subprogram call is to be translated into machine code. It is vital that the caller and the callee agree on the same calling convention; a mistake at this point is hard to detect and, in the worst case, may lead to memory corruption during run-time.

Note that when defining the implementation of the public subprogram GL.Flush, we cannot use a renames clause like we typically would, because our imported backend subprogram is within a private package.

Now, the interesting part: how do we link to the appropriate OpenGL implementation according to the system we are targeting? Not only are there multiple implementations, but their link names also differ.

The solution is to use the GPRBuild tool and define a scenario variable to select the correct linker flags:

library project OpenGL is
   type Windowing_System_Type is
      ("windows", --  Microsoft Windows
       "x11",     --  X Window System (primarily used on Linux)
       "quartz"); --  Quartz Compositor (the macOS window manager)

   Windowing_System : Windowing_System_Type :=
     external ("Windowing_System");

  for Languages use ("ada");
  for Library_Name use "OpenGLAda";
  for Source_Dirs use ("src");

   package Compiler is
      for Default_Switches ("ada") use ("-gnat05");
   end Compiler;

   package Linker is
      case Windowing_System is
         when "windows" =>
            for Linker_Options use ("-lOpenGL32");
         when "x11" =>
            for Linker_Options use ("-lGL");
         when "quartz" =>
            for Linker_Options use ("-Wl,-framework,OpenGL");
      end case;
   end Linker;
end OpenGL;

We will need other distinctions based on the windowing system later, and thus we name the scenario variable Windowing_System accordingly, although, at this point, it would also be sensible to distinguish just the operating system instead. We use Linker_Options instead of Default_Switches in the linker to tell GPRBuild what options we need when linking the final executable.

As you can see, the library we link against is called OpenGL32 on Windows and GL on Linux. On macOS, there is the concept of frameworks, which are somewhat more sophisticated software libraries. On the gcc command line, they can be given with "-framework <name>", which gcc hands over to the linker. However, this does not work easily with GPRBuild unless we use the "-Wl,option" flag, whose operation is defined as:

Pass "option" as an option to the linker. If option contains commas, it is split into multiple options at the commas. You can use this syntax to pass an argument to the option.

At this point, we have almost successfully wrapped our first OpenGL subprogram. However, there is a nasty little detail we overlooked: Windows APIs use a calling convention different from the standard C one. Usually, one only needs to care about this when linking against the Win32 API; however, OpenGL is considered part of the Windows API, as we can see in the OpenGL C header:

GLAPI void APIENTRY glFlush (void);

... and by digging through the Windows version of this header we then find somewhere, wrapped in some #ifdef's, this line:

#define APIENTRY __stdcall

This means our target C function has the stdcall calling convention, which is only used on Windows. Thankfully, GNAT supports this calling convention and, on every system that is not Windows, defines it as a synonym for the C calling convention. Thus, we can rewrite our import:

procedure Flush;
pragma Import
  (Convention    => Stdcall,
   Entity        => Flush,
   External_Name => "glFlush");

With the above code, our first wrapper subprogram is ready.


Stay tuned for part two where we will cover a basic type system for interfacing with C, error handling, memory management, and more!

Part two of this article can be found here!

]]>
AdaCore at FOSDEM 2018 http://blog.adacore.com/adacore-at-fodsem-2018 Thu, 18 Jan 2018 14:55:02 +0000 Pierre-Marie de Rodat http://blog.adacore.com/adacore-at-fodsem-2018

Every year, free and open source enthusiasts gather in Brussels (Belgium) for two days of FLOSS-related conferences. FOSDEM organizers set up several “developer rooms”, which are venues that host talks on specific topics. This year, the event will take place on the 3rd and 4th of February (Saturday and Sunday), and there is a room dedicated to the Ada programming language.

Just like last year and the year before, several AdaCore engineers will be there. We have five talks scheduled:

In the Ada devroom:

In the Embedded, mobile and automotive devroom:

In the Source Code Analysis devroom:

Note also, in the Embedded, mobile and automotive devroom, the talk from Alexander Senier about the work they are doing at Componolit, which uses SPARK and Genode to bring trust to the Android platform.

If you happen to be in the area, please come and say hi!

]]>
Leveraging Ada Run-Time Checks with Fuzz Testing in AFL http://blog.adacore.com/running-american-fuzzy-lop-on-your-ada-code Tue, 19 Dec 2017 09:36:08 +0000 Lionel Matias http://blog.adacore.com/running-american-fuzzy-lop-on-your-ada-code

Fuzzing is a very popular bug finding method. The concept, very simply, is to continuously inject random (garbage) data as input of a software component, and wait for it to crash. Google's Project Zero team made it one of their major vulnerability-finding tools (at Google scale). It is very efficient at robust-testing file format parsers, antivirus software, internet browsers, javascript interpreters, font face libraries, system calls, file systems, databases, web servers, DNS servers... When Heartbleed came out, people found out that it was, indeed, easy to find. Google even launched a free service to fuzz at scale widely used open-source libraries, and Microsoft created Project Springfield, a commercial service to fuzz your application (at scale also, in the cloud).

Writing robustness tests can be tedious, and we - as developers - are usually bad at it. But when your application is on an open network, or just user-facing, or you have thousands of appliances in the wild that might face problems (disk, network, cosmic rays :-)) you won't see in your test lab, you might want to double-, triple-check that your parsers, deserializers and decoders are as robust as possible...

In my experience, fuzzing causes so many unexpected crashes in so many software parts, that it's not unusual to spend more time doing crash triage than preparing a fuzzing session. It is a great verification tool to complement structured testing, static analysis and code review.

Ada is pretty interesting for fuzzing, since all the runtime checks (the ones your compiler couldn't enforce statically) and all the defensive code you've added (through pre-/post-conditions, asserts, ...) can be leveraged as fuzzing targets.

Here's a recipe to use American Fuzzy Lop on your Ada code.

American Fuzzy Lop

AFL is a fuzzer from Michał Zalewski (lcamtuf), from the Google security team. It has an impressive trophy case of bugs and vulnerabilities found in dozens of open-source libraries or tools.

You can even see for yourself how efficient guided fuzzing is in the now classic 'pulling jpegs out of thin air' demonstration on the lcamtuf blog.

I invite you to read the technical description of the tool to get a precise idea of the innards of AFL.

Installation instructions are covered in the Quick Start Guide, and they can be summed up as:

  1. Get the latest source
  2. Build it (make)
  3. Make your program ready for fuzzing (here you have to work a bit)
  4. Start fuzzing...

There are two main parts of the tool:

  • afl-clang/afl-gcc to instrument your binary
  • and afl-fuzz that runs your binary and uses the instrumentation to guide the fuzzing session.

Instrumentation

afl-clang / afl-gcc compiles your code and adds a simple instrumentation around branch instructions. The instrumentation is similar to gcov or profiling instrumentation but it targets basic blocks. In the clang world, afl-clang-fast uses a plug-in to add the instrumentation cleanly (the compiler knows about all basic blocks, and it's very easy to add some code at the start of a basic block in clang). In the gcc world the tool only provides a hacky solution.

The way it works is that instead of calling your usual GCC, you call afl-gcc. afl-gcc will then call your GCC to output the assembly code generated from your code. To simplify, afl-gcc patches every jump instruction and every label (jump destination) to append an instrumentation block. It then calls your assembler to finish the compilation job.

Since it is a pass on assembly code generated from GCC it can be used to fuzz Ada code compiled with GNAT (since GNAT is based on GCC). In the gprbuild world this means calling gprbuild with the --compiler-subst=lang,tool option (see gprbuild manual).
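
For example, assuming AFL was built under /home/user/afl-2.51b and your project file is my_project.gpr (both placeholders), the build command would look like:

gprbuild --compiler-subst=Ada,/home/user/afl-2.51b/afl-gcc -p -P my_project.gpr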

Note : afl-gcc will override compilation options to force -O3 -funroll-loops. The reason behind this is that the authors of AFL noticed that those optimization options helped with the coverage instrumentation (unrolling loops will add new 'jump' instructions).

With some codebases, a problem can appear with the 'rep ret' instruction. For obscure reasons, gcc sometimes inserts a 'rep ret' instruction instead of a ‘ret’ (return) instruction. There is some info in the gcc mailing list archives and, in more detail if you dare, on a dedicated website called repzret.org.

When AFL inserts its instrumentation code, the 'rep ret' instruction is not correct anymore ('as' complains). Since 'rep ret' is exactly the same instruction as ‘ret’ (except a bit slower on some AMD architectures), you can add a step in afl-as (the assembly patching module) to patch the (already patched) assembly code: add the following code at line 269 in afl-as.c (on the 2.51b or 2.52b versions):

    if (!strncmp(line, "\trep ret", 8)) {
       SAYF("[LMA patch] afl-as : replace 'rep ret' with (only) 'ret'\n");
       fputs("\tret\n", outf);
       continue;
    }

... and then recompile AFL. It then works fine, and prints a specific message whenever it encounters the problematic case. I didn't need this workaround for the example programs I chose for this post (you probably won't need it), but it can happen, so here you go...

Though a bit hacky, going through the assembly and sed-patching it seems to be the only way to do this with gcc, for now. It's obviously not available as such on any other architecture (Power, ARM), since afl-as inserts an x86-specific payload. Someone wrote a gcc plug-in once, and it would need some love to be ported to the gcc-6 (recent GNAT) or gcc-8 series (future GNAT). The plug-in approach would also allow in-process fuzzing, speed up the fuzzing process, and ease the fuzzing of programs with a large initialization/set-up time.

When you don't have the source code or changing your build chain would be too hard, the afl-fuzz manual mentions a Qemu-based option. I haven't tried it though.

The test-case generator

afl-fuzz takes a bunch of valid inputs to your application, applies a wide variety of random mutations to them, runs your application on the results, and then uses the inserted instrumentation to guide itself to new code paths and to avoid spending too much time on paths that already crash.

AFL looks for crashes. It is expecting a call to abort() (SIGABRT). Its job is to try to crash your software; its search target is "a new unique crash".

It's not very common to get a core dump (SIGSEGV/SIGABRT) in Ada with GNAT, even following an uncaught top-level exception. You'll have to help the fuzzer and provoke core dumps on the errors you want to catch: a top-level exception by itself won't do it. In the GNAT world you can dump core using the Core_Dump procedure in the GNAT-specific package GNAT.Exception_Actions. What I usually do is let all exceptions bubble up to a top-level exception handler, filter by name, and only crash/abort on the exceptions I'm interested in. And if the bug you're trying to find with fuzzing doesn't crash your application, make it a crashing bug.
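
Here is a minimal sketch of that pattern; Run_The_Parser and MY_LIB.PARSE_ERROR are placeholders for your own entry point and for an exception you consider legitimate:

with Ada.Exceptions;
with GNAT.Exception_Actions;

procedure Fuzz_Driver is
   procedure Run_The_Parser is
   begin
      null;  --  placeholder: call into the library under test here
   end Run_The_Parser;
begin
   Run_The_Parser;
exception
   when Occurrence : others =>
      --  Expected, "legitimate" exceptions are ignored; anything else
      --  dumps core, which AFL records as a crash.
      if Ada.Exceptions.Exception_Name (Occurrence) /= "MY_LIB.PARSE_ERROR" then
         GNAT.Exception_Actions.Core_Dump (Occurrence);
      end if;
end Fuzz_Driver;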

With all that said, let’s find some open-source libraries to fuzz.

Fuzzing Zip-Ada

Zip-Ada is a nice pure-Ada library to work with zip archives. It can open, extract, compress and decompress most of the possible kinds of zip files, and it even recently implemented LZMA compression. It's 100% Ada, portable, quite readable and simple to use (drop the source, use the gpr file, look up the examples and you're set). And it's quite efficient (according to my own informal benchmarks). Anyway, it's a cool project to contribute to, but I'm no compression wizard. Instead, let's try and fuzz it.

Since it's a library that can be given arbitrary files, maybe of dubious source, it needs to be robust.

I got the source of version 52 from SourceForge (or, if you prefer, on GitHub), uncompressed it and found the gprbuild file. Conveniently, Zip-Ada comes with a debug mode that enables all possible runtime checks from GNAT, including -gnatVa and -gnato. The zipada.gpr file also references a 'pragma file' (through -gnatec=debug.pra) that contains a 'pragma Initialize_Scalars;' directive, so everything is OK on the build side.

Then we need a very simple test program that takes a file name as a command-line argument, then drives the library from there. File parsers are the juiciest targets, so let's read and parse a file: we'll open and extract a zip file. For a first program what we're looking for is procedure Extract in the Unzip package:

-- Extract all files from an archive (from)
procedure Extract (From                 : String;
                   Options              : Option_set := No_Option;
                   Password             : String := "";
                   File_System_Routines : FS_Routines_Type := Null_Routines);

Just give it a file name and it will (try to) parse it as an archive and extract all the files from the archive.

We also need to give AFL what it needs (abort() / core dump) so let's add a top-level exception block that will do that, unconditionally (at first) on any exception.

The example program looks like:

with UnZip; use UnZip;
with Ada.Command_Line;
with GNAT.Exception_Actions;
with Ada.Exceptions;
with Ada.Text_IO; use Ada.Text_IO;

procedure Test_Extract is
begin
  Extract (From                 => Ada.Command_Line.Argument (1),
           Options              => (Test_Only => True, others => False),
           Password             => "",
           File_System_Routines => Null_routines);
exception
  when Occurence : others  =>
     Put_Line ("exception occured [" & Ada.Exceptions.Exception_Name (Occurence)
               & "] [" & Ada.Exceptions.Exception_Message (Occurence)
               & "] [" & Ada.Exceptions.Exception_Information (Occurence) & "]");
     GNAT.Exception_Actions.Core_Dump (Occurence);
end Test_Extract;

And to have it compile, we add it to the list of main programs in the zipada.gpr file.

Then let's build:

gprbuild --compiler-subst=Ada,/home/lionel/afl/afl-2.51b/afl-gcc -p -P zipada.gpr -Xmode=debug

We get a classic gprbuild display, with some additional lines:

...
afl-gcc -c -gnat05 -O2 -gnatp -gnatn -funroll-loops -fpeel-loops -funswitch-loops -ftracer -fweb -frename-registers -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fipa-cp-clone -ffunction-sections -gnatec../za_elim.pra zipada.adb
afl-cc 2.51b by <lcamtuf@google.com>
afl-as 2.51b by <lcamtuf@google.com>
[+] Instrumented 434 locations (64-bit, non-hardened mode, ratio 100%).
afl-gcc -c -gnat05 -O2 -gnatp -gnatn -funroll-loops -fpeel-loops -funswitch-loops -ftracer -fweb -frename-registers -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fipa-cp-clone -ffunction-sections -gnatec../za_elim.pra comp_zip.adb
afl-cc 2.51b by <lcamtuf@google.com>
afl-as 2.51b by <lcamtuf@google.com>
[+] Instrumented 45 locations (64-bit, non-hardened mode, ratio 100%).
...

The 2 additional afl-gcc and afl-as steps show up along with a counter of instrumented locations in the assembly code for each unit. So, some instrumentation was inserted.

Fuzzers are bad with checksums (http://moyix.blogspot.fr/2016/07/fuzzing-with-afl-is-an-art.html is an interesting dive into what can block afl-fuzz and what can be done about it, and John Regehr had a blog post on what AFL is bad at). For example, there’s no way for a fuzzing tool to get through a checksum test: it would need to generate only test cases that have a matching checksum. So, to make sure we get somewhere, I removed all checksum tests: there was one for the zip CRC, and another one for zip passwords, for similar reasons. After I commented out those tests, I recompiled the test program.

Then we’ll need to build a fuzzing environment:

mkdir fuzzing-session
mkdir fuzzing-session/input
mkdir fuzzing-session/output

We also need to bootstrap the fuzzer with an initial corpus that doesn't crash. If there's a test suite, put the correct files in input/.

Then afl-fuzz can (finally) be launched:

AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON \
  /home/lionel/afl/afl-2.51b/afl-fuzz -m 1024 -i input -o output ../test_extract @@

 -i dir        - input directory with test cases
 -o dir        - output directory for fuzzer findings
 -m megs       - memory limit for child process (50 MB)
  @@ to tell afl to put the input file as a command line argument. By default afl will write to the program's stdin.

The AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON prelude is to silence a warning from afl-fuzz about how your system handles core dumps (see the man page for core). For afl-fuzz it's a problem because whatever is done to handle core dumps on your system might take some time, and afl-fuzz will think the program timed out (when it actually crashed). It can also be a problem for you: on some Linux distros, the system is instructed to do something with core dumps automatically (send a UI notification, fill your /var/log/messages, send a crash report e-mail to your sysadmin, …) and you might not care for that. Maybe check first with your sysadmin… If you’re root on your machine, follow afl-fuzz’s advice and change your /proc/sys/kernel/core_pattern to something sensible.
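
If you are root, the change afl-fuzz suggests boils down to something like this (adapt it to your distribution's policies):

sudo sh -c 'echo core > /proc/sys/kernel/core_pattern'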

Let’s go:

[Screenshot: afl-fuzz status screen during the Zip-Ada fuzzing session]

In less than 2 minutes, afl-fuzz finds several crashes. While it says they’re “unique”, they in fact trigger the same 2 or 3 exceptions. After 3 hours, it “converges” to a list of crashes, and letting it run for 3 days doesn’t bring another one.

It got a string of CONSTRAINT_ERRORs:

  • CONSTRAINT_ERROR : unzip.adb:269 range check failed

  • CONSTRAINT_ERROR : zip.adb:535 range check failed

  • CONSTRAINT_ERROR : zip.adb:561 range check failed

  • CONSTRAINT_ERROR : zip-headers.adb:240 range check failed

  • CONSTRAINT_ERROR : unzip-decompress.adb:650 range check failed

  • CONSTRAINT_ERROR : unzip-decompress.adb:712 index check failed

  • CONSTRAINT_ERROR : unzip-decompress.adb:1384 access check failed

  • CONSTRAINT_ERROR : unzip-decompress.adb:1431 access check failed

  • CONSTRAINT_ERROR : unzip-decompress.adb:1648 access check failed

I sent those and the reproducers to Gautier de Montmollin (Zip-Ada's maintainer). He corrected them quickly (revisions 587 up to 599). Most of those errors are now raised as Zip-Ada-specific exceptions. He also decided to rationalize the list of exceptions that could (for legitimate reasons) be raised from the Zip-Ada decoding code.

It also got some ADA.IO_EXCEPTIONS.END_ERROR:

  • ADA.IO_EXCEPTIONS.END_ERROR : zip.adb:894

  • ADA.IO_EXCEPTIONS.END_ERROR : s-ststop.adb:284 instantiated at s-ststop.adb:402

After all the corrections and improvements, I ran another fuzzing session, which confirmed the list of exceptions.

This wasn’t a lot of work (for me), mostly spare cycles on my machine that I wasn’t using, and I got a nice thanks for contributing :-).

Fuzzing AdaYaml

AdaYaml is a library to parse YAML files in Ada.

Let’s start by cloning the GitHub repository (the one before all the corrections). For those not familiar with git (here's a tutorial):

git clone https://github.com/yaml/AdaYaml.git
git checkout 5616697b12696fd3dcb1fc01a453a592a125d6dd

Then the source code of the version I tested should be in the AdaYaml folder.

If you don't want anything to do with git, there's a feature on github to download a Zip archive of a version of a repository.


AdaYaml requires a bit more work to fuzz: we need to create a simple example program, add some compilation options to the GPR files (-gnatVa, -gnato), and add a pragma configuration file to set pragma Initialize_Scalars. This last option, combined with -gnatVa, helps surface accesses to uninitialized variables (if you don't know the option: https://gcc.gnu.org/onlinedocs/gcc-4.6.3/gnat_rm/Pragma-Initialize_005fScalars.html and http://www.adacore.com/uploads/technical-papers/rtchecks.pdf). All those options help make sure we catch as many problems as possible with runtime checks.
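
As a sketch of the kind of additions involved (the real yaml.gpr and parser_tools.gpr have their own structure, so the exact placement differs): the configuration pragmas file, called debug.pra here, contains the single line pragma Initialize_Scalars;, and the Compiler packages get switches along these lines:

package Compiler is
   for Default_Switches ("Ada") use
     ("-gnatVa", "-gnato", "-gnatec=debug.pra");
end Compiler;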

The example program looks like:

with Utils;
with Ada.Text_IO;
with Ada.Command_Line;
with GNAT.Exception_Actions;
with Ada.Exceptions;
with Yaml.Dom;
with Yaml.Dom.Vectors;
with Yaml.Dom.Loading;
with Yaml.Dom.Dumping;
with Yaml.Events.Queue;
procedure Yaml_Test
is
  S : constant String := Utils.File_Content (Ada.Command_Line.Argument (1));
begin
  Ada.Text_IO.Put_Line (S);
  declare
     V : constant Yaml.Dom.Vectors.Vector := Yaml.Dom.Loading.From_String (S);
     E : constant Yaml.Events.Queue.Reference :=
       Yaml.Dom.Dumping.To_Event_Queue (V);
     pragma Unreferenced (E);
  begin
     null;
  end;
exception
  when Occurence : others =>
     Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Information (Occurence));
     GNAT.Exception_Actions.Core_Dump (Occurence);
end Yaml_Test;

The program just reads a file and parses it, transforms it into a vector of DOM objects, then transforms those back to a list of events (see API docs).

The YAML reference spec may help explain a bit what's going on here.

Using the following diagram, and for those well-versed in YAML:

[Diagram: the YAML processing overview from the reference spec, showing Parse/Compose on the load side and Serialize/Present on the dump side]

  • the V variable (of our test program) is a "Representation" generated via the Parse -> Compose path
  • the E variable is an "Event Tree" generated from V via "Serialize" (so, going back down to a lower-level representation from the DOM tree).

For this specific fuzzing test, the idea is not to stop at the first stage of parsing but also to go a bit through the data that was decoded, and do something with it (here we stop short of a round-trip to text, we just go back to an Event Tree).

Sometimes a parser faced with incoherent input will keep on going (fail silently) and won't fill (initialize) some fields.

The GPR files to patch are yaml.gpr and the parser_tools.gpr subproject.

The first fuzzing session triggers “expected” exceptions from the parser:

  • YAML.PARSER_ERROR

  • YAML.COMPOSER_ERROR

  • LEXER.LEXER_ERROR

  • YAML.STREAM_ERROR (as it turns out, this one is actually unexpected... more on it later)

Which should happen with malformed input.

So to get unexpected crashes and only those, let’s filter them in the top-level exception handler.

exception
  when Occurence : others =>
     declare
        N : constant String := Ada.Exceptions.Exception_Name (Occurence);
     begin
        Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Information (Occurence));
        if N = "YAML.PARSER_ERROR"
          or else N = "LEXER.LEXER_ERROR"
          or else N = "YAML.STREAM_ERROR"
          or else N = "YAML.COMPOSER_ERROR"
        then
          null;
        else
          GNAT.Exception_Actions.Core_Dump (Occurence);
        end if;
     end;
end Yaml_Test;

Then, I recompiled, used some YAML example files as a startup corpus, and started fuzzing.
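
The afl-fuzz invocation mirrors the Zip-Ada one; the binary name yaml_test and its relative path are assumptions here:

AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON \
  /home/lionel/afl/afl-2.51b/afl-fuzz -m 1024 -i input -o output ../yaml_test @@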


After 4 minutes 30 seconds, the first crashes appeared.

I let it run for hours, then a day, and ended up with a list of issues. I sent all of those and the reproducers to Felix Krause (the maintainer of the AdaYaml project).

He was quick to answer and analyse all the exceptions. Here are his comments:

  • ADA.STRINGS.UTF_ENCODING.ENCODING_ERROR : bad input at Item (1)

I guess this happens when you use a unicode escape sequence that codifies a code point beyond the unicode range (0 .. 0x10ffff). Definitely an error and should raise a Lexer_Error instead.

… and he created issue https://github.com/yaml/AdaYaml/issues/4

  • CONSTRAINT_ERROR : text.adb:203 invalid data

This hints to a serious error in my custom string allocator that can lead to memory corruption. I have to investigate to be able to tell what goes wrong here.

… and then he found the problem: https://github.com/yaml/AdaYaml/issues/5

  • CONSTRAINT_ERROR : Yaml.Dom.Mapping_Data.Node_Maps.Insert: attempt to insert key already in map

This happens when you try to parse a YAML mapping that has two identical keys (rejecting this is conformant to the standard, which disallows duplicate keys). However, the error should be caught and a Compose_Error should be raised instead.

… and he opened https://github.com/yaml/AdaYaml/issues/3

  • CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:283 overflow check failed

  • CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:286 overflow check failed

  • CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:289 overflow check failed

This is, thankfully, an obvious error: Hex escape sequence in the input may have up to eight nibbles, so they represent a value range of 0 .. 2**32 - 1. I use, however, a Natural to store that value, which is a subtype of Integer, which is of platform-dependent range – in this case, it is probably 32-bit, but since it is signed, its range goes only up to 2**31 - 1. This would suffice in theory, since the largest unicode code point is 0x10ffff, but AdaYaml needs to catch cases that exceed this range.

… and attached to https://github.com/yaml/AdaYaml/issues/4

  • STORAGE_ERROR : stack overflow or erroneous memory access

 … and he created issue https://github.com/yaml/AdaYaml/issues/6 and changed the parsing mode of nested structures to avoid stack overflows (no more recursion).

There were also some “hangs”: AFL monitors the execution time of every test case, and flags large timeouts as hangs, to be inspected separately from crashes. Felix took the examples with a long execution time, and found an issue with the hashing of nodes.

With all those error cases, Felix created an issue that references all the individual issues, and corrected them.

After all the corrections, Felix gave me an analysis of the usefulness of the test:

Your findings mirror the test coverage of the AdaYaml modules pretty well:
There was no bug in the parser, as this is the most well-tested module. One bug each was found in the lexer and the text memory management, as these modules do have high test coverage, but only because they are needed for the parser tests. And then three errors in the DOM code as this module is almost completely untested.

After reading a first draft of this blog post, Felix noted that YAML.STREAM_ERROR was in fact an unexpected error in my test program.

Also, you should not exclude Yaml.Stream_Error. This error means that a malformed event stream has been encountered. Parsing a YAML input stream or serializing a DOM structure should *always* create a valid event stream unless it raises an exception – hence getting Yaml.Stream_Error would actually show that there's an internal error in one of those components. [...] Yaml.Stream_Error would only be an error with external cause if you generate an event stream manually in your code. 

I filtered this exception because I'd encountered it in the test suite available in the AdaYaml github repository (it is in fact a copy of the reference yaml test-suite). I wanted to use the complete test suite as a starting corpus, but examples 8G76 and 98YD crashed and it prevented me from starting the fuzzing session, so instead of removing the crashing test cases, I filtered out the exception...

The fact that 2 test cases from the YAML test suite make my simple program crash is interesting, but can we find more cases?

I removed those 2 files from the initial corpus, and I focused the small test program on finding cases that crash on a YAML.STREAM_ERROR:

exception
  when Occurence : others =>
     declare
        N : constant String := Ada.Exceptions.Exception_Name (Occurence);
     begin
        Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Information (Occurence));
        if N = "YAML.STREAM_ERROR" then
          GNAT.Exception_Actions.Core_Dump (Occurence);
        end if;
     end;
end Yaml_Test;

In less than 5 minutes, AFL finds 5 categories of crashes:

  • raised YAML.STREAM_ERROR : Unexpected event (expected document end): ALIAS
  • raised YAML.STREAM_ERROR : Unexpected event (expected document end): MAPPING_START
  • raised YAML.STREAM_ERROR : Unexpected event (expected document end): SCALAR
  • raised YAML.STREAM_ERROR : Unexpected event (expected document end): SEQUENCE_START
  • raised YAML.STREAM_ERROR : Unexpected event (expected document start): STREAM_END

Felix was quick to answer:

Well, seems like you've found a bug in the parser. This looks like the parser may generate some node after the first root node of a document, although a document always has exactly one root node. This should never happen; if the YAML contains multiple root nodes, this should be a Parser_Error.

I opened a new issue about this, to be checked later.

Fuzzing GNATCOLL.JSON

JSON parsers are a common fuzzing target, not that different from YAML. This could be interesting.

Following a similar pattern as other fuzzing sessions, let’s first build a simple unit test that reads and parses an input file given at the command-line (first argument), using GNATCOLL.JSON (https://github.com/AdaCore/gnatcoll-core/blob/master/src/gnatcoll-json.ads). This time I massaged one of the unit tests into a simple “read a JSON file all in memory, decode it and print it” test program, that we’ll use for fuzzing.

Note: for the exercise here I used GNATCOLL GPL 2016, because that's what I was using for a personal project. You should probably use the latest version when you do this kind of testing, at least before you report your findings.

The test program is very simple:

procedure JSON_Fuzzing_Test is
   Filename  : constant String  := Ada.Command_Line.Argument (1);
   JSON_Data : Unbounded_String := File_IO.Read_File (Filename);
begin
   declare
      Value : GNATCOLL.JSON.JSON_Value :=
        GNATCOLL.JSON.Read (Strm     => JSON_Data,
                            Filename => Filename);
   begin
      declare
         New_JSON_Data : constant Unbounded_String :=
           GNATCOLL.JSON.Write (Item => Value, Compact => False);
      begin
         File_IO.Write_File (File_Name     => "out.json",
                             File_Contents => New_JSON_Data);
      end;
   end;
end JSON_Fuzzing_Test;

The GPR file is simple, with a twist: to make sure we compile this program with gnatcoll, and that the library code is compiled with our substitution compiler when we use afl-gcc, we “with” the actual “gnatcoll_full.gpr” (the actual gnatcoll source code!) and not the project file of the precompiled library.
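
A sketch of what such a project file could look like (the project, directory and main unit names are illustrative, and the path to gnatcoll_full.gpr depends on where the gnatcoll sources live):

with "gnatcoll_full.gpr";

project Gnat_JSON_Fuzzing_Test is
   for Main use ("json_fuzzing_test.adb");
   for Source_Dirs use ("src");
   for Object_Dir use "obj";
end Gnat_JSON_Fuzzing_Test;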

Then we build the project in "debug" mode, to get all the runtime checks available:

gprbuild -p -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug 

Then I tried to find a test corpus. One example is https://github.com/nst/JSONTestSuite cited in “Parsing JSON is a minefield”. There’s a test_parsing folder there that contains 318 test cases.

Running them first on the new simple test program already shows several "crash" cases:

  • nice GNATCOLL.JSON.INVALID_JSON_STREAM exceptions

    • Numerical value too large to fit into an IEEE 754 float

    • Numerical value too large to fit into a Long_Long_Integer

    • Unexpected token

    • Expected ',' in the array value

    • Unfinished array, expecting ending ']'

    • Expecting a digit after the initial '-' when decoding a number

    • Invalid token

    • Expecting digits after 'e' when decoding a number

    • Expecting digits after a '.' when decoding a number

    • Expected a value after the name in a JSON object at index N

    • Invalid string: cannot find ending "

    • Nothing to read from stream

    • Unterminated object value

    • Unexpected escape sequence

… which is fine, since you’ll expect this specific exception when parsing user-provided JSON.

Then I got to:

  • raised ADA.STRINGS.INDEX_ERROR : a-strunb.adb:1482

    • n_string_1_surrogate_then_escape_u1.json

    • n_string_1_surrogate_then_escape_u.json

    • n_string_invalid-utf-8-in-escape.json

    • n_structure_unclosed_array_partial_null.json

    • n_structure_unclosed_array_unfinished_false.json

    • n_structure_unclosed_array_unfinished_true.json

For which I opened https://github.com/AdaCore/gnatcoll-core/issues/5

  • raised CONSTRAINT_ERROR : bad input for 'Value: "16#??????"]#"

    • n_string_incomplete_surrogate.json

    • n_string_incomplete_escaped_character.json

    • n_string_1_surrogate_then_escape_u1x.json

 For which I opened https://github.com/AdaCore/gnatcoll-core/issues/6

  • … and STORAGE_ERROR : stack overflow or erroneous memory access
    • n_structure_100000_opening_arrays.json

This last one can be worked around with ulimit -s unlimited (that is, removing the limit on the stack size). Still, beware of your stack when parsing user-provided JSON. Similar problems appeared for AdaYaml and were fixed to make it more robust, and I’m not sure whether this potential “denial of service by stack overflow” should be classified as a bug; it’s at least something to be aware of when using GNATCOLL.JSON on user-provided JSON data (which I’m guessing covers most API endpoints these days).

Those exceptions are the ones you don’t expect and maybe didn’t put a catch-all handler for. A clean GNATCOLL.JSON.INVALID_JSON_STREAM exception might be better.

Note: on all those test cases, I didn’t check whether the results of the tests were OK; I just checked for crashes. It might be very interesting to check the correctness of GNATCOLL.JSON against this test suite.

Now let’s try through fuzzing to find more cases where you don’t get a clean GNATCOLL.JSON.INVALID_JSON_STREAM.

The first step is adding a final “catch-all” exception handler to abort only on unwanted exceptions (not all of them):

exception
  -- we don’t want to abort on a “controlled” exception
  when GNATCOLL.JSON.INVALID_JSON_STREAM =>
     null;
  when Occurence : others =>
     Ada.Text_IO.Put_Line
       ("exception occured for " & Filename
        & " [" &  Ada.Exceptions.Exception_Name (Occurence)
        & "] [" & Ada.Exceptions.Exception_Message (Occurence)
        & "] [" & Ada.Exceptions.Exception_Information (Occurence) & "]");
     GNAT.Exception_Actions.Core_Dump (Occurence);
end JSON_Fuzzing_Test;

And then clean the generated executable:

gprclean -r -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug

Then rebuild it using afl-gcc:

gprbuild --compiler-subst=Ada,/home/lionel/afl/afl-2.51b/afl-gcc -p -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug

Then we generate an input corpus for AFL, by keeping only the files that didn’t generate a call to abort() with the new JSON_Fuzzing_Test test program.
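
One way to do that is a small shell loop that runs the instrumented binary on every file of the test suite and keeps only those on which it exits normally (paths are illustrative):

mkdir -p input
for f in JSONTestSuite/test_parsing/*.json; do
  ./json_fuzzing_test "$f" > /dev/null 2>&1 && cp "$f" input/
done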

On first launch (AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON /home/lionel/aws/afl/afl-2.51b/afl-fuzz -m 1024 -i input -o output ../json_fuzzing_test @@), afl-fuzz complains:

[*] Attempting dry run with 'id:000001,orig:i_number_huge_exp.json'...
[-] The program took more than 1000 ms to process one of the initial test cases.
    This is bad news; raising the limit with the -t option is possible, but
    will probably make the fuzzing process extremely slow.
    If this test case is just a fluke, the other option is to just avoid it
    altogether, and find one that is less of a CPU hog.
[-] PROGRAM ABORT : Test case 'id:000001,orig:i_number_huge_exp.json' results in a timeout
        Location : perform_dry_run(), afl-fuzz.c:2776

… and it’s true, the i_number_huge_exp.json file takes a long time to be parsed:

[lionel@lionel fuzzing-session]$ time ../json_fuzzing_test input/i_number_huge_exp.json
input/i_number_huge_exp.json:1:2: Numerical value too large to fit into an IEEE 754 float
real    0m7.273s
user    0m3.717s
sys     0m0.008s

My machine isn’t fast, but still, this is a denial of service waiting to happen. I opened a ticket just in case.

Anyway let’s remove those input files that gave a timeout before we even started the fuzzing (the other ones are n_structure_100000_opening_arrays.json and n_structure_open_array_object.json).

During this first afl-fuzz run, in the startup phase, a warning appears many times:

[!] WARNING: No new instrumentation output, test case may be useless.

AFL looks through the whole input corpus, and checks whether input files have added any new basic block coverage to the already tested examples (also from the input corpus).

The initial phase ends with:

[!] WARNING: Some test cases look useless. Consider using a smaller set.
[!] WARNING: You probably have far too many input files! Consider trimming down.

To be the most efficient, afl-fuzz needs the slimmest input corpus with the highest basic block coverage, the most representative of all the OK code paths, and the least redundant possible. You can look through the afl-cmin and afl-tmin tools to minimize your input corpus.
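
For instance, a corpus minimization pass with afl-cmin could look like this (a sketch; adjust the paths and memory limit to your setup):

/home/lionel/afl/afl-2.51b/afl-cmin -m 1024 -i input -o input.min -- ../json_fuzzing_test @@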

For this session, let’s keep the test corpus as it is (large and redundant), and start the fuzzing session.

In the first seconds of fuzzing, we already get the following state:

[Screenshot: afl-fuzz status screen for the GNATCOLL.JSON session]

Already 3 crashes, and 2 “hangs”. Looking through those, it seems afl-fuzz already found by itself examples of “ADA.STRINGS.INDEX_ERROR : a-strunb.adb:1482” and “CONSTRAINT_ERROR : bad input for 'Value: "16#?????”, although I removed from the corpus all files that showed those problems.

Same thing with the “hangs”: afl-fuzz found an example of a large float number, although I had removed all “*_huge_*” float examples.

Let’s try to focus on finding something other than the crashes we already know about.

I added the following code in the top-level exception handler:

   when Occurence : others =>
     declare
        Text : constant String := Ada.Exceptions.Exception_Information (Occurence);
     begin
        if    Ada.Strings.Fixed.Index (Source => Text, Pattern => "bad input for 'Value:") /= 0 then return;
        elsif Ada.Strings.Fixed.Index (Source => Text, Pattern => "a-strunb.adb:1482") /= 0 then return;
        end if;
     end;

It’s very hacky, but it’ll remove some noise (i.e. the crashes we already know about) from the crash bin.

Let’s restart the fuzzing session (remove the output/ directory, recreate it, and call afl-fuzz again).

Now, after 10 minutes, no crash had occurred, so I let the fuzzer run for 2 days straight, and it didn’t find any crash or hang other than the ones already triggered by the test suite.

It did, however, find some additional stack overflows (with examples that open a lot of arrays), even though I had set 1024 MB as the memory limit for afl-fuzz… Maybe something to look into...

What next?

Start fuzzing your favorite project, and report your results.

If you want to dive deeper in the subject of fuzzing with AFL, here's a short reading list for you:

  • Even simpler, if you have an extensive test case list, you can use afl-cmin (a corpus minimizer) to directly fuzz your parser or application efficiently; see the great success of AFL on SQLite.
  • The fuzzing world took up the work of lcamtuf, and you can often hear about fuzzing-specific passes in clang/llvm to help fuzzing where it's weak (checksums, magic strings, whole-string comparisons...). 
  • There's a lot of tooling around afl-fuzz: aflgo directs the focus of a fuzzing session to specific code parts, and Pythia helps evaluate the efficiency of your fuzzing session. See also afl-cov for live coverage analysis.

If you find bugs, or even just perform a fuzzing pass on your favorite open-source software, don't hesitate to get in touch with the maintainers of the project. From my experience, most of the time, maintainers will be happy to get free testing. Even if it's just to say that AFL didn't find anything in 3 days... It's already a badge of honor :-).

Thanks:

Many thanks to Yannick Moy, who sparked the idea of this blog post after I talked his ear off for a year (was it two?) about AFL and fuzzing in Ada, and who helped me proofread it. Thanks to Gautier and Felix, who were very reactive and nice about the reports, and who took some time to read drafts of this post. All your suggestions were very helpful.

]]>
Cross-referencing Ada with Libadalang http://blog.adacore.com/cross-referencing-ada-with-libadalang Mon, 18 Dec 2017 13:59:49 +0000 Pierre-Marie de Rodat http://blog.adacore.com/cross-referencing-ada-with-libadalang

Libadalang has come a long way since the last time we blogged about it. In the past 6 months, we have been working tirelessly on name resolution, a pretty complicated topic in Ada, and it is finally mature enough that we feel ready to blog about it and encourage people to try it out.

WARNING: While pretty far along, the work is still not finished. It is expected that some statements and declarations are not yet resolved. You might also run into the occasional crash. Feel free to report that on our github!

In our last blog post, we learned how to use Libadalang’s lexical and syntactic analyzers in order to highlight Ada source code. You may know websites that display source code with cross-referencing information: this makes it possible to navigate from references to declarations. For instance elixir, Free Electrons’ Linux source code explorer: go to a random source file and click on an identifier. This kind of tool makes it very easy to explore an unknown code base.

So, we extended our code highlighter to generate cross-referencing links, as a showcase of Libadalang’s semantic analysis abilities. If you are lazy, or just want to play with the code, you can find a compilable set of source files for it in Libadalang’s repository on GitHub (look for ada2web.adb). If you are interested in using name resolution in your own programs, this blog post shows how to build on Libadalang’s name resolution to extend our previous code highlighter.

Note that if you haven’t read the previous blog post, we recommend you read it first, as below we assume familiarity with topics it covers.

Where are my source files?

Unlike lexical and syntactic analysis, which process source files separately, semantic analysis works on a set of source files, or more precisely on a source file plus all its dependencies. This is logical: in order to understand an object declaration in foo.ads, one needs to know about the corresponding type, and if the type is declared in another source file (say bar.ads), both files are required for analysis.

By default, Libadalang assumes that all source files are in the current directory. That’s enough for toy source files, but not at all for real-world projects, which are generally spread over multiple directories in a complex nesting scheme. Libadalang can’t know about the file layout of all Ada projects in the world, so we created an abstraction that enables anyone to tell it how to reach source files: the Libadalang.Analysis.Unit_Provider_Interface interface type. This type has exactly one abstract primitive: Get_Unit, which, given a unit name and a unit kind (specification or body?), calls Analysis_Context’s Get_From_File or Get_From_Buffer to create the corresponding analysis unit.

In the context of a source code editor (for instance), this allows Libadalang to query a source file even if this file exists only in memory, not in a real source file, or if it’s more up-to-date in memory. Using a custom unit provider in Libadalang is easy: dynamically allocate a concrete implementation of this interface, then pass it to the Unit_Provider formal in Analysis_Context’s constructor: the Create function. Libadalang will take care of deallocating this object when the context is destroyed.

declare
   UP  : My_Unit_Provider_Access :=
      new My_Unit_Provider_Type …;
   Ctx : Analysis_Context := Create (Unit_Provider => UP);
   --  UP will be queried when performing name resolution
begin
   --  Do useful things, and then when done…
   Destroy (Ctx);
end;

Nowadays, a lot of Ada projects use GPRbuild and thus have a project file. That’s fortunate: project files give Libadalang exactly the information it needs, namely where the source files are and what their naming scheme is. Because of this, Libadalang provides a tagged type that implements this interface to deal with project files: Project_Unit_Provider_Type, from the Libadalang.Unit_Files.Projects package. In order to use it, one first needs to load the project file using GNATCOLL.Projects:

declare
   Project_File : GNATCOLL.VFS.Virtual_File;
   Project      : GNATCOLL.Projects.Project_Tree_Access;
   Env          : GNATCOLL.Projects.Project_Environment_Access;
   UP           : Libadalang.Analysis.Unit_Provider_Access;
   Ctx          : Libadalang.Analysis.Analysis_Context; 
begin
   --  First load the project file
   Project := new Project_Tree;
   Initialize (Env);

   --  Initialize Project_File, set the target, create
   --  scenario variables, …
   Project.Load (Project_File, Env);

   --  Now create the unit provider and the analysis context.
   --  Is_Project_Owner is set to True so that the project
   --  is deallocated when UP is destroyed.
   UP := new Project_Unit_Provider_Type'
     (Create (Project, Env, True));
   Ctx := Create (Unit_Provider => UP);

   --  Do useful things, and then when done…
   Destroy (Ctx);
end;

Now that Libadalang knows where the source files are, we can ask it to resolve names!

Let’s jump to definitions

Just like in the highlighter, most of the website generator will consist of asking Libadalang to parse source files (Get_From_File), checking for lexing/parsing errors (Has_Diagnostics, Diagnostics) and then dealing with AST nodes and tokens in analysis units. The new bit here is turning identifiers into hypertext links that redirect to their definition. As for highlighting classes, we do this token annotation with an array and a tree traversal:

Unit : Analysis_Unit := …;
--  Analysis unit to process

Xrefs : array (1 .. Token_Count (Unit)) of Basic_Decl :=
   (others => No_Basic_Decl);
--  For each token, the declaration to which the token should
--  link or No_Basic_Decl for no cross-reference.

function Process_Node
  (Node : Ada_Node'Class) return Visit_Status;
--  Callback for AST traversal. For string literals and
--  identifiers, annotate the corresponding in Xrefs to the
--  designated declaration, if found.

With these declarations, we can do the annotations easily:

Root (Unit).Traverse (Process_Node'Access);

But how does Process_Node do its magic? That’s easy too:

function Process_Node
  (Node : Ada_Node'Class) return Visit_Status is
begin
   --  Annotate only tokens for string literals and
   --  identifiers.
   if Node.Kind not in Ada_String_Literal | Ada_Identifier
   then
      return Into;
   end if;

   declare
      Token : constant Token_Type :=
         Node.As_Single_Tok_Node.F_Tok;
      Idx   : constant Natural := Natural (Index (Token));
      Decl  : Basic_Decl renames Xrefs (Idx);
   begin
      Decl := Node.P_Referenced_Decl;
   exception
       when Property_Error => null;
   end;

   return Into;
end Process_Node;

String literal and identifier nodes both inherit from the Single_Tok_Node abstract node, hence the conversion to retrieve the underlying token. Then we locate which cell in the Xrefs array they correspond to. And finally we fill it with the result of the P_Referenced_Decl primitive. This function tries to fetch the declaration corresponding to Node. Easy I said!

What’s the exception handler for, you might ask, though? What we call AST node properties (all functions whose name starts with P_) can raise Property_Error exceptions. These can happen if Libadalang works on invalid Ada sources and cannot find query results. As name resolution is still actively developed, it can happen that this exception is raised even for valid source code: if that happens to you, please report this bug! Note that if a property raises an exception that is not a Property_Error, this is another kind of bug: please report it too!

Bind it all together

Now we have a list of Basic_Decl nodes to create hypertext links, but how can we do that? The trick is to get the name of the source file that contains this declaration, plus its source location:

Decl_Unit : constant Analysis_Unit := Decl.Get_Unit;
Decl_File : constant String := Get_Filename (Decl_Unit);
Decl_Line : constant Langkit_Support.Slocs.Line_Number :=
   Decl.Sloc_Range.Start_Line;

Then you can turn this information into a hypertext link. For example, if you generate X.html for the X source file (foo.ads.html for foo.ads, …) and generate LY HTML anchors for line number Y:

Line_No : constant String :=
   Natural'Image (Natural (Decl_Line));
Href : constant String :=
   Decl_File & ".html#L"
   & Line_No (Line_No'First + 1 .. Line_No'Last);

Some amount of plumbing is still needed to have a complete website generator:

  • get a list of all source files to process in the loaded project, using GNATCOLL.Projects’ API (see the sketch after this list);

  • actually output HTML code: code from the previous blog can be reused and updated to do this;
  • generate an index HTML file as an entry point for navigation.
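
For the first item, a minimal sketch using GNATCOLL.Projects and GNATCOLL.VFS could look like this (it assumes Project is the Project_Tree_Access loaded earlier, and that Ada.Text_IO is visible):

declare
   use GNATCOLL.VFS;
   Files : File_Array_Access :=
      Project.Root_Project.Source_Files (Recursive => True);
begin
   for F of Files.all loop
      --  Process each source file here: parse it, emit F.html, ...
      Ada.Text_IO.Put_Line (F.Display_Full_Name);
   end loop;
   Unchecked_Free (Files);
end;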

But as usual, covering all these topics would go beyond the scope of this blog post and would make for an unreasonably long essay. So thank you once more for reading this post to the end!

]]>
Make with Ada 2017- Ada Based IoT Framework http://blog.adacore.com/make-with-ada-2017-ada-based-iot-framework Wed, 13 Dec 2017 15:41:00 +0000 Manuel Iglesias Abbatermarco http://blog.adacore.com/make-with-ada-2017-ada-based-iot-framework

Summary

The Ada IoT Stack consists of an lwIp (“lightweight IP”) stack implementation written in Ada, with an associated high-level protocol to support embedded device connectivity nodes for today’s IoT world. The project was developed for the Make With Ada 2017 competition based on existing libraries and ported to embedded STM32 devices.

Motivation

Being a huge fan of IoT designs, I originally planned to work on a real device such as an IoT node or gateway, while using the Ada language. I really enjoy writing programs but my roots are in hardware (I earned a B.S. in electronic engineering). Back then I was not really a programmer, but if you can master the intricacies of hardware I think that experience will eventually help make you a good programmer; I’m not sure it works the same way in the other direction.

With so many programming languages out there, it's difficult to gain experience with all of them. In my case I was not even aware of the Ada language until I found out about the contest. That got me to learn Ada: there is a saying that without deadlines nobody will finish on time, and the contest supplied both the motivation and a deadline. For me the best way to learn something is not to read a book chapter by chapter. Sometimes you need to read a “getting started” guide to gain basic knowledge, but after that, if you don't have a problem to solve, your motivation might come to an end. I made my choice to continue, with the IoT project.

Network Stack

When I started the IoT project I soon realized that the Ada Drivers Library didn't provide a TCP/IP stack. The Etherscope software by Mr. Carrez from the 2016 Make with Ada contest provided the Ethernet connectivity and UDP protocol but not TCP. I started to look at how to implement the TCP/IP stack, but due to my lack of experience in Ada and the fact that contest was running for almost two months it would not have worked to do something from scratch. So I wrote to Mr. Chouteau (at AdaCore) asking for information about any Ada libraries that implement a TCP stack, and he referred me to a lwIp implementation in Ada and Spark 2014. It was on a Spark 2014 git repository that had not shown up on a Google search. As far as I could tell, the code was only tested on a Linux-flavor OS using a TAP interface. I spent a couple of weeks studying the code and getting it to work on my debian box, which was a little hacky at the end since the TAP C driver implementation (system calls) didn’t work as expected; I ended up coding the TAP interface by hand.

A TAP device operates at layer 2 of the OSI model, which means Ethernet frames. That makes sense here since lwIp implements networking starting from Ethernet frames. Then I realized that I could combine the Etherscope project with the lwIp implementation. After removing several parts of the Etherscope project to get only Ethernet frames, I was ready to feed the lwIp stack. It was not so easy: I spent some time porting the lwIp Ada code to the Ada version for embedded ARM. One obstacle I found was the use of xml files to describe the network datagrams and other structures, which are used by a program called xmlada to generate some Ada body files. These describe things like bit and byte positions of TCP flags or fields within the datagram. The problem was that the ARM version doesn't provide xmlada, so I ended up copying the generated files into my project.

After quite some time I got the lwIp stack to work on my STM32F769I board. This was no easy task, especially because the STLink debugger is not so easy to work with. (For example semihosting is basically the only way to have debugger output in the form of "printf". This is really slow and basically interrupts the flow of program execution in a nasty way. The problem here is that the ST board doesn't provide a JTAG interface to the Cortex M4/M7 device, and the STlink on board doesn't have an SWO line connection.)

IoT Stack

The TCP/IP stack was just the beginning; it was really nice to see it working, but that quickly gets boring. The original lwIp implements a TCP echo server: you open a socket, connect, and then anything you send is echoed back by the server, which is not very useful for IoT. So I felt I was not making real progress, at least toward something that would give the judges a tangible project to evaluate. Again I was in a rush, this time with more knowledge of Ada but, as before, without wanting to write something from scratch.

One day I found the Simple Components Ada code by Mr. Kazakov. I really got to love it, but the problem was that after reading it a little bit I felt like I did the day I attended my first German language "Unterricht" years ago. I decided to keep spending time reading it until I finally figured out how to start porting it to my lwIp implementation. The first thing I ported was the MQTT client code, because of its simplicity in terms of dependencies on other “Simple Components” classes, if we can refer to them like that. One problem solved here was the change in paradigm: Simple Components uses Ada GNAT Sockets, and my lwIp basically uses a callback scheme. Those are two different worlds, but previous experience with sockets helps, since you know what the code is doing and what it should do in the new environment. In the end, the MQTT port consumed more time than expected, since I not only ported the Simple Components code but also added code from the existing MQTT lwIp implementation in C to make up for the lack of timers.

It was really difficult for me to figure out how the connection state can be recovered when the callback executes. For example when a connection is made, certain variables are initialized and kept in a data structure, but after the callback returns, the structure needs to be preserved so that it can be recovered by a connection event and used in the corresponding callback.

The MQTT client gave me the insights and the experience to continue working and to try the more complicated HTTP protocol, this time improving the callback association. The port of the HTTP server was where problems with the lwIp implementation started to arise. I am almost sure that the lwIp code was only tested in a specific TCP/IP echo-controlled environment; the problem is that a TCP connection can behave differently, or more precisely has different scenarios (e.g., when closing a connection), so I ended up "patching the code" to behave as closely as possible to the standard. I also fixed some memory allocation problems with the lwIp Pcb's of the original stack. Nonetheless, if you decide to try the code, please be aware that it should be treated as a "development" version.

Ultimately there was not enough time to finish my IoT Node as I had initially intended. The good part is that I really enjoy solving this kind of problem.

Work in progress

I really was impressed with Ada: it has the power to do things that in other languages like C/C++ would be much more error-prone. The lack of a good debugger was offset by the increased productivity of writing code that had fewer bugs in the first place. I hope to have some time to continue working on this project in the near future.

To see Manuel's full project log click here.

]]>
Welcoming New Members to the GNAT Pro Family http://blog.adacore.com/adacore-launches-new-gnat-pro-product-lines Wed, 29 Nov 2017 09:10:29 +0000 Jamie Ayre http://blog.adacore.com/adacore-launches-new-gnat-pro-product-lines

As we see the importance of software grow in applications, the quality of that software has become more and more important. Even outside the mission- and safety-critical arena customers are no longer accepting software failures (the famous blue screens of death, and there are many...). Ada has a very strong answer here and we are seeing more and more interest in using the language from a range of industries. It is for this reason that we have completed our product line by including an entry-level offer for C/C++ developers wanting to switch to Ada and reinforced our existing offer with GNAT Pro Assurance for programmers building the most robust software platforms with life cycles spanning decades.

This recent press release explains the positioning of the GNAT Pro family. Details of the full product range can be found here.

We’d love to chat more with you about these new products and how Ada can keep you ahead of your competitors. Drop us an email at info@adacore.com.



]]>
There's a mini-RTOS in my language http://blog.adacore.com/theres-a-mini-rtos-in-my-language Thu, 23 Nov 2017 10:06:15 +0000 Fabien Chouteau http://blog.adacore.com/theres-a-mini-rtos-in-my-language

The first thing that struck me when I started to learn about the Ada programming language was the tasking support. In Ada, creating tasks, synchronizing them, and sharing access to resources are part of the language.

In this blog post I will focus on the embedded side of things, first because it's what I like, and also because it's much simpler :)

For real-time and embedded applications, Ada defines a profile called `Ravenscar`. It's a subset of the language designed to help schedulability analysis; it is also better suited to platforms, such as micro-controllers, that have limited resources.

So this will not be a complete lecture on Ada tasking. I might do a follow-up with some more tasking features, if you ask for it in the comments ;)

Tasks

So the first thing is to create tasks, right?

There are two ways to create tasks in Ada, first you can declare and implement a single task:

   --  Task declaration
   task My_Task;
   --  Task implementation
   task body My_Task is
   begin
      --  Do something cool here...
   end My_Task;

If you have multiple tasks doing the same job or if you are writing a library, you can define a task type:

   --  Task type declaration
   task type My_Task_Type;
   --  Task type implementation
   task body My_Task_Type is
   begin
      --  Do something really cool here...
   end My_Task_Type;

And then create as many tasks of this type as you want:

   T1 : My_Task_Type;
   T2 : My_Task_Type;

One limitation of Ravenscar compared to full Ada is that the number of tasks has to be known at compile time.
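
A minimal sketch of what this looks like in practice (the array bounds and names are made up): tasks are declared statically, at library level, and their number is fixed.

   --  In a library-level package: the number of tasks is fixed at compile time
   Workers : array (1 .. 4) of My_Task_Type;

   --  By contrast, allocating tasks with "new" or declaring them inside
   --  a subprogram is not allowed by the Ravenscar profile.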

Time

The timing features of Ravenscar are provided by the package (you guessed it) Ada.Real_Time.

In this package you will find:

  •  a definition of the Time type, which represents the time elapsed since the start of the system
  •  a definition of the Time_Span type, which represents a period between two Time values
  •  a function Clock that returns the current time (a monotonic count since the start of the system)
  •  various sub-programs to manipulate Time and Time_Span values

The Ada language also provides an instruction to suspend a task until a given point in time: delay until.

Here's an example of how to create a cyclic task using the timing features of Ada.

   task body My_Task is
      Period       : constant Time_Span := Milliseconds (100);
      Next_Release : Time;
   begin
      --  Set Initial release time
      Next_Release := Clock + Period;

      loop
         --  Suspend My_Task until the Clock is greater than Next_Release
         delay until Next_Release;

         --  Compute the next release time
         Next_Release := Next_Release + Period;
         
         --  Do something really cool at 10Hz...
      end loop;

   end My_Task;

Scheduling

Ravenscar has priority-based preemptive scheduling. A priority is assigned to each task and the scheduler will make sure that the highest priority task - among the ready tasks - is executing.

A task can be preempted if another task of higher priority is released, either by an external event (interrupt) or at the expiration of its delay until statement (as seen above).

If two tasks have the same priority, they will be executed in the order they were released (FIFO within priorities).

Task priorities are static; however, we will see below that a task can have its priority temporarily escalated.

The task priority is an integer value between 1 and 256; a higher value means a higher priority. It is specified with the Priority aspect:

   task My_Low_Priority_Task
     with Priority => 1;

   task My_High_Priority_Task
     with Priority => 2;

Mutual exclusion and shared resources

In Ada, mutual exclusion is provided by the protected objects.

At run-time, the protected objects provide the following properties:

  • There can be only one task executing a protected operation at a given time (mutual exclusion)
  • There can be no deadlock

In the Ravenscar profile, this is achieved with Priority Ceiling Protocol.

A priority is assigned to each protected object, and any task calling a protected sub-program must have a priority lower than or equal to the priority of the protected object.

When a task calls a protected sub-program, its priority will be temporarily raised to the priority of the protected object. As a result, this task cannot be preempted by any of the other tasks that potentially use this protected object, and therefore the mutual exclusion is ensured.

The Priority Ceiling Protocol also provides a solution to the classic scheduling problem of priority inversion.

Here is an example of a protected object:

   --  Specification
   protected My_Protected_Object
     with Priority => 3
   is

      procedure Set_Data (Data : Integer);
      --  Protected procedures can read and/or modify the protected data
      
      function Data return Integer;
      --  Protected functions can only read the protected data

   private
   
      --  Protected data are declared in the private part
      PO_Data : Integer := 0;
   end My_Protected_Object;
   --  Implementation
   protected body My_Protected_Object is

      procedure Set_Data (Data : Integer) is
      begin
         PO_Data := Data;
      end Set_Data;

      function Data return Integer is
      begin
         return PO_Data;
      end Data;
   end My_Protected_Object;

Synchronization

Another cool feature of protected objects is the synchronization between tasks.

It is done with a different kind of operation called an entry.

An entry has the same properties as a protected procedure except it will only be executed if a given condition is true. A task calling an entry will be suspended until the condition is true.

This feature can be used to synchronize tasks. Here's an example:

   protected My_Protected_Object is
      procedure Send_Signal;
      entry Wait_For_Signal;
   private
      We_Have_A_Signal : Boolean := False;
   end My_Protected_Object;
   protected body My_Protected_Object is

      procedure Send_Signal is
      begin
          We_Have_A_Signal := True;
      end Send_Signal;    
      
      entry Wait_For_Signal when We_Have_A_Signal is
      begin
          We_Have_A_Signal := False;
      end Wait_For_Signal;
   end My_Protected_Object;

Interrupt Handling

Protected objects are also used for interrupt handling. Private procedures of a protected object can be attached to an interrupt using the Attach_Handler aspect.

   protected My_Protected_Object
     with Interrupt_Priority => 255
   is
   
   private
   
      procedure UART_Interrupt_Handler
        with Attach_Handler => UART_Interrupt;
   
   end My_Protected_Object;

Combined with an entry, it provides an elegant way to handle incoming data on a serial port, for instance:

   protected My_Protected_Object
     with Interrupt_Priority => 255
   is
      entry Get_Next_Character (C : out Character);
      
   private
      procedure UART_Interrupt_Handler
              with Attach_Handler => UART_Interrupt;
      
      Received_Char  : Character := ASCII.NUL;
      We_Have_A_Char : Boolean := False;
   end My_Protected_Object;
   protected body My_Protected_Object is

      entry Get_Next_Character (C : out Character) when We_Have_A_Char is
      begin
          C := Received_Char;
          We_Have_A_Char := False;
      end Get_Next_Character;
      
      procedure UART_Interrupt_Handler is
      begin
          Received_Char  := A_Character_From_UART_Device;
          We_Have_A_Char := True;
      end UART_Interrupt_Handler;      
   end My_Protected_Object;

A task calling the entry Get_Next_Character will be suspended until an interrupt is triggered and the handler reads a character from the UART device. In the meantime, other tasks will be able to execute on the CPU.
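
For completeness, here is a minimal sketch of such a task; the task name and the processing are made up:

   task Echo_Task;

   task body Echo_Task is
      C : Character;
   begin
      loop
         --  Suspended here until the interrupt handler has stored a
         --  character and the entry barrier becomes True
         My_Protected_Object.Get_Next_Character (C);

         --  Process the received character, e.g. echo it back...
      end loop;
   end Echo_Task;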

Multi-core support

Ada supports static and dynamic allocation of tasks to cores on multiprocessor architectures. The Ravenscar profile restricts this support to a fully partitioned approach where tasks are statically allocated to processors and there is no task migration among CPUs. These parallel tasks running on different CPUs can communicate and synchronize using protected objects.

The CPU aspect specifies the task affinity:

   task Producer with CPU => 1;
   task Consumer with CPU => 2;
   --  Parallel tasks statically allocated to different cores

Implementations

That's it for the quick overview of the basic Ada Ravenscar tasking features.

One of the advantages of having tasking as part of the language standard is portability: you can run the same Ravenscar application on Windows, Linux, macOS or an RTOS like VxWorks. GNAT also provides a small stand-alone run-time that implements Ravenscar tasking on bare metal. This run-time is available, for instance, on ARM Cortex-M micro-controllers.

It's like having an RTOS in your language.

]]>
Make with Ada 2017- A "Swiss Army Knife" Watch http://blog.adacore.com/make-with-ada-2017-a-swiss-army-knife-watch Wed, 22 Nov 2017 14:49:00 +0000 J. German Rivera http://blog.adacore.com/make-with-ada-2017-a-swiss-army-knife-watch

Summary 

The Hexiwear is an IoT wearable development board that has two NXP Kinetis microcontrollers. One is a K64F (Cortex-M4 core) for running the main embedded application software. The other one is a KW40 (Cortex M0+ core) for running a wireless connectivity stack (e.g., Bluetooth BLE or Thread). The Hexiwear board also has a rich set of peripherals, including OLED display, accelerometer, magnetometer, gyroscope, pressure sensor, temperature sensor and heart-rate sensor. This blog article describes the development of a "Swiss Army Knife" watch on the Hexiwear platform. It is a bare-metal embedded application developed 100% in Ada 2012, from the lowest level device drivers all the way up to the application-specific code, for the Hexiwear's K64F microcontroller.

I developed Ada drivers for Hexiwear-specific peripherals from scratch, as they were not supported by AdaCore's Ada drivers library. Also, since I wanted to use the GNAT GPL 2017 Ada compiler but the GNAT GPL distribution did not include a port of the Ada Runtime for the Hexiwear board, I also had to port the GNAT GPL 2017 Ada runtime to the Hexiwear. All this application-independent code can be leveraged by anyone interested in developing Ada applications for the Hexiwear wearable device.


Project Overview

The purpose of this project is to develop the embedded software of a "Swiss Army Knife" watch in Ada 2012 on the Hexiwear wearable device.

The Hexiwear is an IoT wearable development board that has two NXP Kinetis microcontrollers. One is a K64F (Cortex-M4 core) for running the main embedded application software. The other one is a KW40 (Cortex M0+ core) for running a wireless connectivity stack (e.g., Bluetooth BLE or Thread). The Hexiwear board also has a rich set of peripherals, including OLED display, accelerometer, magnetometer, gyroscope, pressure sensor, temperature sensor and heart-rate sensor.

The motivation of this project is two-fold. First, to demonstrate that the whole bare-metal embedded software of this kind of IoT wearable device can be developed 100% in Ada, from the lowest level device drivers all the way up to the application-specific code. Second, software development for this project will produce a series of reusable modules that can be used in the future as a basis for creating "labs" for teaching an Ada 2012 embedded software development class using the Hexiwear platform. Given the fact that the Hexiwear platform is a very attractive platform for embedded software development, its appeal can be used to attract more embedded developers to learn Ada.

The scope of the project will be to develop only the firmware that runs on the application microcontroller (K64F). Ada drivers for Hexiwear-specific peripherals need to be developed from scratch, as they are not supported by AdaCore’s Ada drivers library. Also, since I will be using the GNAT GPL 2017 Ada compiler and the GNAT GPL distribution does not include a port of the Ada Runtime for the Hexiwear board, the GNAT GPL 2017 Ada runtime needs to be ported to the Hexiwear board.

The specific functionality of the watch application for the time frame of "Make with Ada 2017" will include:

  • Watch mode: show current time, date, altitude and temperature
  • Heart rate monitor mode: show heart rate (when the Hexiwear is worn on the wrist)
  • G-Forces monitor mode: show G forces in the three axes (X, Y, Z).

In addition, when the Hexiwear is plugged to a docking station, a command-line interface will be provided over the UART port of the docking station. This interface can be used to set configurable parameters of the watch and to dump debugging information.

Summary of Accomplishments 

I designed and implemented the "Swiss Army Knife" watch application and all necessary peripheral drivers 100% in Ada 2012. The only third-party code used in this project, besides the GNAT GPL Ravenscar SFP Ada runtime library, is the following:

  • A font generator, leveraged from AdaCore’s Ada drivers library. 
  • Ada package specification files, generated by the svd2ada tool, containing declarations for the I/O registers of the Kinetis K64F’s peripherals.

Below are some diagrams depicting the software architecture of the "Swiss Army Knife" watch:

The current implementation of the "Swiss Army Knife" watch firmware, delivered for "Make with Ada 2017" has the following functionality:

  • Three operating modes:
    • Watch mode: show current time, date, altitude and temperature
    • Heart rate monitor mode: show the heart rate monitor's raw reading, when the Hexiwear is worn on the wrist.
    • G-Forces monitor mode: show G forces in the three axes (X, Y, Z).
  • To switch between modes, the user just needs to do a double-tap on the watch’s display. When the watch is first powered on, it starts in "watch mode".
  • To make the battery charge last longer, the microcontroller is put in deep-sleep mode (low-leakage stop mode) and the display is turned off after 5 seconds. A quick tap on the watch’s display will wake it up. Using the deep-sleep mode makes it possible to extend the battery life from 2 hours to 12 hours on a single charge. Indeed, I can now use the Hexiwear as my personal watch during the day and just need to charge it at night.
  • A command-line interface over UART, available when the Hexiwear device is plugged to its docking station. This interface is used for configuring the following attributes of the watch: 
    1. Current time (not persistent on power loss, but persistent across board resets)
    2. Current date (not persistent on power loss, but persistent across board resets)
    3. Background color (persistent on the K64F microcontroller’s NOR flash, unless firmware is re-imaged)
    4. Foreground color (persistent on the K64F microcontroller’s NOR flash, unless firmware is re-imaged)

Also, the command-line interface can be used for dumping the different debugging logs: info, error and debug.

The top-level code of the watch application can be found at: https://github.com/jgrivera67/make-with-ada/tree/master/hexiwear_watch

I ported the GNAT GPL 2017 Ravenscar Small-Foot-Print Ada runtime library to the Hexiwear board and modified it to support the memory protection unit (MPU) of the K64F microcontroller, and more specifically to support MPU-aware tasks. This port of the Ravenscar Ada runtime can be found on GitHub at https://github.com/jgrivera67/embedded-runtimes

The relevant folders are:

I developed device drivers for the following peripherals of the Kinetis K64F microcontroller as part of this project and contributed their open-source Ada code on GitHub at https://github.com/jgrivera67/make-with-ada/tree/master/drivers/mcu_specific/nxp_kinetis_k64f:

  • DMA Engine
  • SPI controller
  • I2C controller
  • Real-time Clock (RTC)
  • Low power management
  • NOR flash

These drivers are application-independent and can be easily reused for other Ada embedded applications that use the Kinetis K64F microcontroller.

I developed device drivers for the following peripherals of the Hexiwear board as part of this project and contributed their open-source Ada code in GitHub at https://github.com/jgrivera67/make-with-ada/tree/master/drivers/board_specific/hexiwear:

  • OLED graphics display (on top of the SPI and DMA engine drivers, and making use of the advanced DMA channel linking functionality of the Kinetis DMA engine)
  • 3-axis accelerometer (on top of the I2C driver)
  • Heart rate monitor (on top of the I2C driver)
  • Barometric pressure sensor (on top of the I2C driver)

These drivers are application-independent and can be easily reused for other Ada embedded applications that use the Hexiwear board.

I designed the watch application and its peripheral drivers to use the memory protection unit (MPU) of the K64F microcontroller from the beginning, not as an afterthought. Data protection at the individual data object level is enforced using the MPU for the private data structures of every Ada package linked into the application. For this, I leveraged the MPU framework I developed earlier and presented at the Ada Europe 2017 Conference. Using this MPU framework for the whole watch application demonstrates that the framework scales well to a realistic-size application. The code of this MPU framework is part of a modified Ravenscar small-foot-print Ada runtime library port for the Kinetis K64F microcontroller, whose open-source Ada code I contributed at https://github.com/jgrivera67/embedded-runtimes/tree/master/bsps/kinetis_k64f_common/bsp

I developed a CSP model of the Ada task architecture of the watch firmware with the goal of formally verifying that it is deadlock free, using the FDR4 tool. Although I ran out of time to successfully run the CSP model through the FDR tool, developing the model helped me gain confidence about the overall completeness of the Ada task architecture as well as the completeness of the watch_task’s state machine. Also, the CSP model itself is a form of documentation that provides a high-level formal specification of the Ada task architecture of the watch code.

Hexiwear "Swiss Army Knife" watch developed in Ada 2012

Open

My project’s code is under the BSD license. It’s hosted on GitHub. The project is divided into two repositories:

To do the development, I used the GNAT GPL 2017 toolchain - ARM ELF format (hosted on Windows 10 or Linux), including the GPS IDE. I also used the svd2ada tool to generate Ada code from SVD XML files for the Kinetis microcontrollers I used.

Collaborative

I designed this Ada project to make it easy for others to leverage my code. Anyone interested in developing their own flavor of "smart" watch for the Hexiwear platform can leverage code from my project. Also, anyone interested in developing any type of embedded application in Ada/SPARK for the Hexiwear platform can leverage parts of the software stack I have developed in Ada 2012 for the Hexiwear, particularly device drivers and platform-independent/application-independent infrastructure code, without having to start from scratch as I had to. All you need is to get your own Hexiwear development kit (http://www.hexiwear.com/shop/), clone my Git repositories mentioned above and get the GNAT GPL 2017 toolchain for ARM Cortex-M (http://libre.adacore.com/download/, choose ARM ELF format).

The Hexiwear platform is an excellent platform to teach embedded software development in general, and in Ada/SPARK in particular, given its rich set of peripherals and reasonable cost. The Ada software stack that I have developed for the Hexiwear platform can be used as a base to create a series of programming labs as a companion to courses in Embedded and Real-time programming in Ada/SPARK, from basic concepts and embedded programming techniques, to more advanced topics such as using DMA engines, power management, connectivity, memory protection and so on. 

Dependable

  • I used the memory protection unit to enforce data protection at the granularity of individual non-local data objects, throughout the code of the watch application and its associated device drivers.
  • I developed a CSP model of the Ada task architecture of the watch firmware with the goal of formally verifying that it is deadlock free, using the FDR4 tool. Although I ran out of time to successfully run the CSP model through the FDR tool, developing the model helped me gain confidence about the completeness of the watch_task’s state machine and the overall completeness of the Ada task architecture of the watch code.
  • I consistently used the information hiding principles to architect the code to ensure high levels of maintainability and portability, and to avoid code duplication across projects and across platforms.
  • I leveraged extensively the data encapsulation and modularity features of the Ada language in general, such as private types and child units including private child units, and in some cases subunits and nested subprograms.
  • I used gnatdoc comments to document key data structures and subprograms. 
  • I used Ada 2012 contract-based programming features and assertions extensively (see the brief sketch after this list).
  • I used range-based types extensively to leverage Ada's power to detect invalid integer values.
  • I used Ada 2012 aspect constructs wherever possible.
  • I used the GNAT coding style, with the -gnaty3abcdefhiklmnoOprstux GNAT compiler option to check compliance with it.
  • I used GNAT flags to enable rigorous compile-time checks, such as -gnato13 -gnatf -gnatwa -gnatVa -Wall.
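
As a brief illustration of the last few points, here is the kind of declaration style these features enable. The names below are invented for illustration only and are not taken from the watch code base:

   --  Range-based type: out-of-range values are rejected at compile time
   --  when static, and caught by a Constraint_Error check otherwise
   type Heart_Rate_Bpm is range 30 .. 240;

   type Color is (Black, White, Red, Green, Blue);

   --  Ada 2012 aspect syntax plus a contract: callers must supply two
   --  distinct colors, and the precondition both documents and enforces it
   procedure Set_Display_Colors (Foreground, Background : Color)
     with Pre => Foreground /= Background;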

Inventive

  • The most innovative aspect of my project is the use of the memory protection unit (MPU) to enforce data protection at the granularity of individual data objects throughout the entire code of the project (both application code and device drivers). Although the firmware of a watch is not a safety-critical application, it serves as a concrete example of a realistic-size piece of embedded software that uses the memory protection unit to enforce data protection at this level of granularity. Indeed, this project demonstrates the feasibility and scalability of using the MPU-based data protection approach that I presented at the Ada Europe 2017 conference earlier this year, for true safety-critical applications.

  • A CSP model of the Ada task architecture of the watch code, with the goal of formally verifying that the task architecture is deadlock free, using the FDR4 model checking tool. Although I ran out of time to successfully run the CSP model through the FDR tool, the model itself provides a high-level formal specification of the Ada task architecture of the watch firmware, which is useful as a concise form of documentation of the task architecture that is more precise than the task architecture diagram alone.

Future Work

Short-Term Future Plans

  • Implement software enhancements to existing features of the "Swiss Army Knife" watch:

    • Calibrate accuracy of altitude and temperature readings
    • Calibrate accuracy of heart rate monitor reading
    • Display heart rate in BPM units instead of just showing the raw sensor reading
    • Calibrate accuracy of G-force readings
  • Develop new features of the "Swiss Army Knife" watch:

    • Display compass information (in watch mode). This entails extending the accelerometer driver to support the built-in magnetometer
    • Display battery charge remaining. This entails writing a driver for the K64F’s A/D converter to interface with the battery sensor
    • Display gyroscope reading. This entails writing a driver for the Hexiwear’s Gyroscope peripheral
    • Develop Bluetooth connectivity support, to enable the Hexiwear to talk to a cell phone over Bluetooth as a slave and to talk to other Bluetooth slaves as a master. As part of this, a Bluetooth "glue" protocol will need to be developed for the K64F to communicate with the Bluetooth BLE stack running on the KW40, over a UART. For the BLE stack itself, the one provided by the chip manufacturer will be used. A future challenge could be to write an entire Bluetooth BLE stack in Ada to replace the manufacturer’s KW40 firmware.
  • Finish the formal analysis of the Ada task architecture of the watch code, by successfully running its CSP model through the FDR tool to verify that the architecture is deadlock free, divergence free and that it satisfies other applicable safety and liveness properties.

Long-Term Future Plans

  • Develop more advanced features of the "Swiss Army Knife" watch:

    • Use sensor fusion algorithms combining readings from accelerometer and gyroscope
    • Add Thread connectivity support, to enable the watch to be an edge device in an IoT network (Thread mesh). This entails developing a UART-based interface to the Hexiwear’s KW40 (which would need to be running a Thread (802.15.4) stack).
    • Use the Hexiwear device as a dash-mounted G-force recorder in a car. Sense variations of the 3D G-forces as the car moves, storing the G-force readings in a circular buffer in memory, to capture the last 10 seconds (or more, depending on available memory) of car motion. This information can be extracted over Bluetooth.
    • Remote control of a Crazyflie 2.0 drone, from the watch, over Bluetooth. Wrist movements will be translated into steering commands for the drone.
  • Develop a lab-based Ada/SPARK embedded software development course using the Hexiwear platform, leveraging the code developed in this project. This course could include the following topics and corresponding programming labs:
    1. Accessing I/O registers in Ada.
      • Lab: Layout of I/O registers in Ada and the svd2ada tool
    2. Pin-level I/O: Pin Muxer and GPIO. 
      • Lab: A Traffic light using the Hexiwear’s RGB LED
    3. Embedded Software Architectures: Cyclic Executive.
      • Lab: A watch using the real-time clock (RTC) peripheral with polling
    4. Embedded Software Architectures: Main loop with Interrupts.
      • Lab: A watch using the real-time clock (RTC) with interrupts
    5. Embedded Software Architectures: Tasks.
      • Lab: A watch using the real-time clock (RTC) with tasks
    6. Serial Console.
      • Lab: UART driver with polling and with interrupts
    7. Sensors: A/D converter.
      • Lab: Battery charge sensor
    8. Actuators: Pulse Width Modulation (PWM).
      • Lab: Vibration motor and light dimmer
    9. Writing to NOR flash.
      • Lab: Saving config data in NOR Flash
    10. Inter-chip communication and complex peripherals: I2C.
      • Lab: Using I2C to interface with an accelerometer
    11. Inter-chip communication and complex peripherals: SPI.
      • Lab: OLED display
    12. Direct Memory Access I/O (DMA).
      • Lab: Measuring execution time and making OLED display rendering faster with DMA
    13. The Memory Protection Unit (MPU).
      • Lab: Using the memory protection unit
    14. Power Management.
      • Lab: Using a microcontroller’s deep sleep mode
    15. Cortex-M Architecture and Ada Startup Code.
      • Lab: Modifying the Ada startup code in the Ada runtime library to add a reset counter
    16. Recovering from failure.
      • Lab: Watchdog timer and software-triggered resets
    17. Bluetooth Connectivity
      • Lab: transmit accelerometer readings to a cell phone over Bluetooth
]]>
Physical Units Pass the Generic Test http://blog.adacore.com/physical-units-pass-the-generic-test Thu, 16 Nov 2017 07:30:00 +0000 Yannick Moy http://blog.adacore.com/physical-units-pass-the-generic-test

The support for physical units in programming languages is a long-standing issue, which very few languages have even attempted to solve. This issue was mostly solved for Ada in 2012 by our colleagues Ed Schonberg and Vincent Pucci, who introduced special aspects for specifying physical dimensions on types. An aspect Dimension_System allows the programmer to define a new system of physical units, while an aspect Dimension allows setting the dimensions of a given subtype in the dimension system of its parent type. The dimension system in GNAT is completely checked at compile time, with no impact on the executable size or execution time, and it offers a number of facilities for defining units, managing fractional dimensions and printing out dimensioned quantities. For details, see the article "Implementation of a simple dimensionality checking system in Ada 2012" presented at the ACM SIGAda conference HILT 2012 (also attached below).
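
To give a flavor of these aspects, here is an abridged sketch that roughly follows the pattern of GNAT's System.Dim.Mks; refer to the GNAT documentation for the exact declarations, as the names below are only illustrative:

   --  A (reduced) dimension system with three base units
   type Mks_Type is new Long_Long_Float
     with Dimension_System =>
       ((Unit_Name => Meter,    Unit_Symbol => 'm',  Dim_Symbol => 'L'),
        (Unit_Name => Kilogram, Unit_Symbol => "kg", Dim_Symbol => 'M'),
        (Unit_Name => Second,   Unit_Symbol => 's',  Dim_Symbol => 'T'));

   --  Dimensioned subtypes of the root type
   subtype Length is Mks_Type
     with Dimension => (Symbol => 'm', Meter => 1, others => 0);

   subtype Speed is Mks_Type
     with Dimension => (Meter => 1, Second => -1, others => 0);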

The GNAT dimension system did not attempt to deal with generics though. As noted in the previous work by Grein, Kazakov and Wilson in "A survey of Physical Units Handling Techniques in Ada":

The conflict between the requirements 1 [Compile-Time Checks] and 2 [No Memory / Speed Overhead at Run-Time] on one side and requirement 4 [Generic Programming] on the other is the source of the problem of dimension handling.

So the solution in GNAT solved 1 and 2 while allowing generic programming but leaving it unchecked for dimensionality correctness. Here is the definition of generic programming by Grein, Kazakov and Wilson:

Here generic is used in a wider sense, as an ability to write code dealing with items of types from some type sets.  In our case it means items of different dimension. For instance, it should be possible to write a dimension-aware integration program, which would work for all valid combinations of dimensions.

This specific issue of programming a generic integration over time that would work for a variety of dimensions was investigated earlier this year by our partners from Technical University of Munich for their Glider autopilot software in SPARK. Working with them, we found a way to upgrade the dimensionality analysis in GNAT to support generic programming, which recently has been implemented in GNAT.

The goal was to apply dimensionality analysis on the instances of a generic, and to preserve dimension as much as possible across type conversions. In our upgraded dimensionality analysis in GNAT, a conversion from a dimensioned type (say, a length) to its dimensionless base type (the root of the dimension system) now preserves the dimension (length here), but will look like valid Ada code to any other Ada compiler. Which makes it possible to define a generic function Integral as follows:

    generic
        type Integrand_Type is digits <>;
        type Integration_Type is digits <>;
        type Integrated_Type is digits <>;
    function Integral (X : Integrand_Type; T : Integration_Type) return Integrated_Type;

    function Integral (X : Integrand_Type; T : Integration_Type) return Integrated_Type is
    begin
       return Integrated_Type (Mks_Type(X) * Mks_Type(T));
    end Integral;

Above, we are using the standard Mks_Type defined in System.Dim.Mks as the root type, but we could do the same with the root of a user-defined dimension system. The generic code is valid Ada, since the arguments X and T are converted to the same type (here Mks_Type) in order to multiply them without a compilation error. The result, which is now of type Mks_Type, is then converted to the final type Integrated_Type. With the original dimensionality analysis in GNAT, dimensions were lost during these type conversions.

However, with the upgraded analysis in GNAT, a conversion to the root type indicates that the dimensions of X and T have to be preserved and tracked when converting both to and from the root type Mks_Type. With this, still using the standard types defined in System.Dim.Mks, we can define an instance of this generic that integrates speed over time:

    function Velocity_Integral is new Integral(Speed, Time, Length);

The GNAT-specific dimensionality analysis will perform additional checks for correct dimensionality in all such generic instances, while for any other Ada compiler this program still passes as valid (but not dimensionality-checked) Ada code. For example, for an invalid instance as:

    function Bad_Velocity_Integral is new Integral(Speed, Time, Mass);

GNAT issues the error:

dims.adb:10:05: instantiation error at line 5
dims.adb:10:05: dimensions mismatch in conversion
dims.adb:10:05: expression has dimension [L]
dims.adb:10:05: target type has dimension [M]

One subtlety that we faced when developing the Glider software at Technical University of Munich was that, sometimes, we do want to convert a value from a dimensioned type into another dimensioned type. This was the case in particular because we defined our own dimension system in which angles had their own dimension to verify angular calculations, which worked well most of the time.

However, the angle dimension must be removed when multiplying an angle with a length, which produces an (arc) length, or when using an existing trigonometric function that expects a dimensionless argument. In these cases, a simple type conversion to the dimensionless root type is not enough, because now the dimension of the input is preserved. We found two solutions to this problem:

  • either define the root of our dimension system as a derived type from a parent type, say Base_Unit_Type, and convert to/from Base_Unit_Type to remove dimensions; or
  • explicitly insert conversion coefficients into the equations with dimensions such that the dimensions do cancel out as required.

For example, our use of an explicit Angle_Type with its own dimension (denoted A) first seemed to cause trouble because of conversions such as this one:

Distance := 2.0 * EARTH_RADIUS * darc; -- expected L, found L.A

where darc is of Angle_Type (dimension A) and EARTH_RADIUS of Length_Type (dimension L). First, we escaped the unit system as follows:

Distance := 2.0 * EARTH_RADIUS * Unit_Type(Base_Unit_Type(darc));

However, this bypasses the dimensionality checking system and can lead to dangerous mixing of physical dimensions. It would be possible to accidentally turn a temperature into a distance, without any warning. A safer way to handle this issue is to insert the missing units explicitly:

Distance := 2.0 * EARTH_RADIUS * darc * 1.0/Radian;

Here, Radian is the unit of Angle_Type, which we need to get rid of to turn an angle into a distance. In other words, the last term represents a coefficient with the required units to turn an angle into a distance. Thus, darc*1.0/Radian still carries the same value as darc, but is dimensionless as required per the equation, and GNAT can perform a dimensionality analysis also in such seemingly dimensionality-defying situations.

Moreover, this solution is less verbose than converting to the base unit type and then back. In fact, it can be made even shorter:

Distance := 2.0 * EARTH_RADIUS * darc/Radian;

With its improved dimensionality analysis, GNAT Pro 18 has solved the conflict between requirements 1 [Compile-Time Checks] and 2 [No Memory / Speed Overhead at Run-Time] on one side and requirement 4 [Generic Programming] on the other side, hopefully making Grein, Kazakov and Wilson happier! The dimensionality analysis in GNAT is a valuable feature for programs that deal with physical units. It increases readability by making dimensions more explicit and it reduces programming errors by checking the dimensions for consistency. For example, we used it on the StratoX Weather Glider from Technical University of Munich, as well as the RESSAC User Case, an example of autonomous vehicle development used as a challenge for certification.

For more information on the dimensionality analysis in GNAT, see the GNAT User's Guide. In particular, the new rules that deal with conversions are at the end of the section, and we copy them verbatim below:

  The dimension vector of a type conversion T(expr) is defined as follows, based on the nature of T:
 -  If T is a dimensioned subtype then DV(T(expr)) is DV(T) provided that either expr is dimensionless or DV(T) = DV(expr). The conversion is illegal if expr is dimensioned and DV(expr) /= DV(T). Note that vector equality does not require that the corresponding Unit_Names be the same.
    As a consequence of the above rule, it is possible to convert between different dimension systems that follow the same international system of units, with the seven physical components given in the standard order (length, mass, time, etc.). Thus a length in meters can be converted to a length in inches (with a suitable conversion factor) but cannot be converted, for example, to a mass in pounds.
 -  If T is the base type for expr (and the dimensionless root type of the dimension system), then DV(T(expr)) is DV(expr). Thus, if expr is of a dimensioned subtype of T, the conversion may be regarded as a “view conversion” that preserves dimensionality.
    This rule makes it possible to write generic code that can be instantiated with compatible dimensioned subtypes. The generic unit will contain conversions that will consequently be present in instantiations, but conversions to the base type will preserve dimensionality and make it possible to write generic code that is correct with respect to dimensionality.
 -  Otherwise (i.e., T is neither a dimensioned subtype nor a dimensionable base type), DV(T(expr)) is the empty vector. Thus a dimensioned value can be explicitly converted to a non-dimensioned subtype, which of course then escapes dimensionality analysis.

Thanks to Ed Schonberg and Ben Brosgol from AdaCore for their work on the design and implementation of this enhanced dimensionality analysis in GNAT.

]]>
Make with Ada 2017: Brushless DC Motor Controller http://blog.adacore.com/make-with-ada-2017-brushless-dc-motor-controller Tue, 14 Nov 2017 14:33:39 +0000 Jonas Attertun http://blog.adacore.com/make-with-ada-2017-brushless-dc-motor-controller

Not long after my first experience with the Ada programming language I got to know about the Make With Ada 2017 contest. And, just as it seemed it would be, it turned out to be a great way to get a little bit deeper into the language I had just started to learn.

The ada-motorcontrol project involves the design of a BLDC motor controller software platform written in Ada. These types of applications often need to run fast, and the core control software is often tightly connected to the microcontroller peripherals. Coming from an embedded systems perspective with C as the reference language, my initial concern was whether an implementation in Ada could actually meet these requirements.

It turned out, on the contrary, that Ada is very capable with respect to both of these requirements. In particular, accessing peripherals on the STM32 with the help of the Ada_Drivers_Library made the hardware-related operations even easier to use than the HAL written in C by ST.

Throughout the project I found uses for many of Ada’s features. For example, representation clauses made it simple to extract data from the received serial byte streams (and to put together the ones to transmit). Moreover, contract-based programming and object-oriented concepts such as abstract types, together with generics, provided the means to design clean, easy-to-use interfaces and a well-organized project.
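
As a small, hedged illustration of the representation clause point (the message layout and field names below are made up, not taken from ada-motorcontrol), a received byte sequence can be mapped directly onto named fields:

   --  Assumes "with Interfaces;"; a hypothetical 3-byte control request
   type Control_Request is record
      Command : Interfaces.Unsigned_8;
      Value   : Interfaces.Integer_16;  --  e.g. a requested current, raw units
   end record;

   for Control_Request use record
      Command at 0 range 0 .. 7;
      Value   at 1 range 0 .. 15;
   end record;
   for Control_Request'Size use 24;

An instance of Unchecked_Conversion (or an overlay using an Address clause) can then view the received bytes as a Control_Request, so no manual shifting and masking is needed.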

One of the objectives of the project was to provide a software platform to help developing various motor control applications, with the core functionality not being dependent on some particular hardware. Currently, however, it only supports a custom inverter board, since unfortunately I found that the HAL provided in Ada_Drivers_Library was not comprehensive enough to support all the peripheral features used. But the software is organized so as to keep driver-dependent code separate. To put this to the test, I welcome contributions to add support for other inverter boards. A good start would be the popular VESC-board.


Ada Motor Controller Project Log:

Motivation

The recent advances in electric drive technologies (batteries, motors and power electronics) have led to increasingly higher output power per cost and higher power density. This in turn has increased the performance of existing motor control applications, but has also enabled new ones, many of which are popular projects amongst DIYers and makers, e.g. electric bikes, electric skateboards, hoverboards, Segways, etc.

On a hobby level, the safety aspects related to these are mostly ignored. Professional development of similar applications, however, normally needs to fulfill some domain-specific standards that put requirements on, for example, the development process, coding conventions and verification methods. For example, the motor controller of an electric vehicle would need to be developed in accordance with ISO 26262, and if the C language is used, MISRA-C, which defines a set of programming guidelines that aims to prevent unsafe usage of the C language features.

Since the Ada programming language has been designed specifically for safety critical applications, it could be a good alternative to C for implementing safe motor controllers used in e.g. electric vehicle applications. For a comparison of MISRA-C and Ada/SPARK, see this report. Although Ada is an alternative for achieving functional safety, during prototyping it is not uncommon that a mistake leads to destroyed hardware (burned motor or power electronics). Been there, done that! The stricter compiler of Ada could prevent such accidents. 

Moreover, while Ada is not a particularly "new" language, it includes more of the features one would expect from a modern language than C provides: for example, types defined with a specified range, allowing range violations to be caught already at compile time, and built-in multitasking features. Ada also supports modularization very well, allowing e.g. easy integration of new control interfaces, which is probably the most likely change needed when using the controller for a new application.
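
For example (a tiny illustration, not taken from the project code), a ranged type lets the compiler reject obviously invalid values outright and insert checks for computed ones:

   type Duty_Cycle_Pct is range 0 .. 100;

   Full : constant Duty_Cycle_Pct := 100;    --  fine
   --  Bad : constant Duty_Cycle_Pct := 101; --  rejected at compile time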

This project should consist of and fulfill:

  • Core software for controlling brushless DC motors, mainly aimed at hobbyists and makers.
  • Support both sensored and sensorless operation.
  • Open source software (and hardware).
  • Implementation in Ada on top of the Ravenscar runtime for the stm32f4xx.
  • Should not be too difficult to port to another microcontroller.

And specifically, for those wanting to learn the details of motor control, or extend with extra features:

  • Provide a basic, clean and readable implementation.
  • Short but helpful documentation. 
  • Meaningful and flexible logging.
  • Easy to add new control interfaces (e.g. CAN, ADC, Bluetooth, whatever).

Hardware

The board that will be used for this project is a custom board that I previously designed with the intent of getting some hands-on knowledge of motor control. It is completely open source and all project files can be found on GitHub.

  • Microcontroller STM32F446, ARM Cortex-M4, 180 MHz, FPU
  • Power MOSFETs 60 V
  • Inline phase current sensing
  • PWM/PPM control input
  • Position sensor input as either hall or quadrature encoder
  • Motor and board temp sensor (NTC)
  • Expansion header for UART/ADC/DAC/SPI/I2C/CAN

It can handle power ranges in the order of what is required by an electric skateboard or electric bike, depending on the used battery voltage and cooling. 

There are other inverter boards with similar specifications. One very popular board is the VESC by Benjamin Vedder. It is probably not that difficult to port this project to work on that board as well.

Rough project plan

I thought it would be appropriate to write down a few bullets of what needs to be done. The list will probably grow...

  • Create a port of the Ravenscar runtime to the stm32f446 target on the custom board
  • Add stm32f446 as a device in the Ada Drivers Library
  • Get some sort of hello world application running to show that stuff works
  • Investigate and experiment with interrupt handling with regards to overhead
  • Create initialization code for all used mcu peripherals
  • Sketch the overall software architecture and define interfaces
  • Implementation
  • Documentation...

Support for the STM32F446

The microprocessor that will be used for this project is the STM32F446. In the current version of the Ada Drivers Library and the available Ravenscar embedded runtimes, there is no explicit support for this device. Fortunately, it is very similar to other processors in the stm32f4 family, so adding support for the stm32f446 was not very difficult once I understood the structure of the repositories. I forked these and added them as submodules in this project's repo.

Compared to the Ravenscar runtimes used by the discovery-boards, there are differences in external oscillator frequency, available interrupt vectors and memory sizes. Otherwise they are basically the same. 

An important tool needed to create the new driver and runtime variants is svd2ada. It generates device specification and body files in Ada based on an SVD file (basically XML) that describes what peripherals exist, what their registers look like, their addresses, existing interrupts, and so on. It was easy to use, but it was a little bit confusing how the flags/switches should be set when generating driver and runtime files. After some trial and error I think I got it right. I created a Makefile for generating all these files with the correct switches.

I could not find an SVD file for the stm32f446 directly from ST, but found one on the internet. It was not perfect, though. Some of the source code that uses the generated data types seems to make assumptions about the structure of these types, and depending on how the SVD file looks, svd2ada may or may not generate them in the expected way. There was also other missing and incorrect data in the SVD file, so I had to correct it manually. There are probably additional issues that I have not found yet...

It is alive!

I made a very simple application consisting of a task that is periodically delayed and toggles the two LEDs on the board each time the task resumes. The LEDs toggle with the expected period, so the oscillator seems to be initialized correctly.

Next up I need to map the different mcu pins to the corresponding hardware functionality and try to initialize the needed peripherals correctly. 

The control algorithm and its use of peripherals

There are several methods of controlling brushless motors, each with a specific use case. As a first approach I will implement sensored FOC, where the user requests a current value (or torque value). 

To simplify, this method can be divided into the following steps, repeated each PWM period (typically around 20 kHz); a code sketch follows the list:

  1. Sample the phase currents
  2. Transform the values into a rotor fixed reference frame
  3. Based on the requested current, calculate a new set of phase voltages
  4. Transform back to the stator's reference frame
  5. Calculate PWM duty cycles as to create the calculated phase voltages
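
Expressed as code, one iteration might look roughly like the sketch below, using the reference-frame types (Abc, Dq) and transforms (Clarke, Park and their inverses) introduced later in this log; the remaining subprogram and variable names are placeholders:

declare
   Iabc : Abc;  --  sampled phase currents (stator frame)
   Idq  : Dq;   --  currents in the rotor frame
   Vdq  : Dq;   --  controller output (rotor frame)
   Vabc : Abc;  --  phase voltages to apply (stator frame)
begin
   Iabc := Get_Phase_Current_Samples;                  --  step 1
   Idq  := Iabc.Clarke.Park (Angle);                   --  step 2
   Vdq  := Run_Current_Controller (Idq, Idq_Request);  --  step 3
   Vabc := Vdq.Park_Inv (Angle).Clarke_Inv;            --  step 4
   Set_Duty_Cycles (Vabc);                             --  step 5
end;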

Fortunately, the peripherals of the stm32f446 have a lot of features that make this easier to implement. For example, it is possible to trigger the ADC directly from the timers that drive the PWM. This way the sampling will automatically be synchronized with the PWM cycle. Step 1 above can thus be started immediately when the ADC triggers the corresponding conversion-complete interrupt. In fact, many existing implementations perform all of steps 1 to 5 completely within an ISR. The reason for this is simply to reduce any unnecessary overhead, since the calculations performed are somewhat lengthy. The requested current is passed to the ISR via global variables.

I would like to do this the traditional way, i.e. to spend as little time as possible in the ISR and trigger a separate Task to perform all the calculations. The sampled current values and the requested current shall be passed via Protected Objects. All this will of course create more overhead. Maybe too much? That needs to be investigated.

PWM and ADC are up and running

I have spent some time configuring the PWM and ADC peripherals using the Ada Drivers Library. All in all it went well, but I had to make some small changes to the drivers to make it possible to configure them the way I wanted.

  • PWM is complementary output, center aligned with frequency of 20 kHz
  • PWM channels 1 to 3 generates the phase voltages
  • PWM channel 4 is used to trigger the ADC, this way it is possible to set where over the PWM period the sampling should occur
  • By default the sampling occurs in the middle of the positive waveform (V7)
  • The three ADCs are configured in Triple Multi Mode, meaning they are synchronized such that all sampled phase quantities are sampled at the same time.
  • Phase currents and voltages a, b, c are mapped to the injected conversions, triggered by the PWM channel 4
  • Board temperature and bus voltage are mapped to the regular conversions, triggered by a timer at 14 kHz
  • Regular conversions are moved to a volatile array using DMA automatically after the conversions complete
  • The ADC generates an interrupt after the injected conversions are complete

The drivers always assumed that the PWM outputs are mapped to a certain GPIO, so in order to configure the trigger channel I had to add a new procedure to the drivers. Also, the scan mode of the ADCs was not set up correctly for my configuration, and the configuration of the injected sequence order was simply incorrect. I will send a pull request to get these changes merged into the master branch.

Interrupt overhead/latency

As described in previous posts, the structure used for the interrupt handling is to spend a minimum of time in the interrupt context and to signal a waiting task, which executes at a software priority level with interrupts fully enabled, to perform the calculations. The alternative method is to place all the code in the interrupt context.

This Ada Gem and its following part describe two different approaches to this type of task synchronization. Both use a protected procedure as the interrupt handler, but they signal the waiting task in different ways: the first uses an entry barrier and the second a Suspension Object. The idiom using the entry barrier has the advantage that it can pass data as an integrated part of the signaling, while the Suspension Object behaves more like a binary semaphore.

For the ADC conversion-complete interrupt, I tested both methods. The protected procedure used as the ISR reads the converted values, consisting of six uint16 values. For the entry barrier method these were passed to the task using an out-parameter. When using the second method, the task needed to collect the sampled data using a separate function of the protected object.
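
A hedged sketch of the entry-barrier variant is shown below; the type, object, priority and interrupt names are assumptions for illustration, not the actual project code:

   --  Assumes "with Interfaces;" and an ADC_Interrupt interrupt id
   type Phase_Samples is array (1 .. 6) of Interfaces.Unsigned_16;

   protected ADC_Handler
     with Interrupt_Priority => 250  --  assumed priority
   is
      entry Wait_For_Samples (Samples : out Phase_Samples);
   private
      procedure ISR
        with Attach_Handler => ADC_Interrupt;

      Latest   : Phase_Samples := (others => 0);
      New_Data : Boolean := False;
   end ADC_Handler;

   protected body ADC_Handler is

      entry Wait_For_Samples (Samples : out Phase_Samples) when New_Data is
      begin
         Samples  := Latest;
         New_Data := False;
      end Wait_For_Samples;

      procedure ISR is
      begin
         Latest   := Read_Injected_Conversions;  --  assumed driver call
         New_Data := True;
      end ISR;

   end ADC_Handler;

The waiting task then simply calls ADC_Handler.Wait_For_Samples in its loop, so the six values are copied out as part of the signaling itself.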

Overhead in this context I define as the time from when the ADC generates the interrupt to the time the event-triggered task starts running. This includes, first, an ISR wrapper that is part of the runtime and that calls the installed protected procedure; second, the execution time of the protected procedure, which reads the sampled data; and finally, the signaling to the waiting task.

I measured an approximation of the overhead by setting a pin high at the very beginning of the protected procedure and having the waiting task set it low immediately after waking up from the signaling. For the Suspension Object case, the pin was set low after the read-data function call, i.e. in both cases at the point where the sampled data had been copied to the task. The code was compiled with the -O3 flag.

The first idiom resulted in an overhead of ~8.4 us, and the second ~10 us. This should be compared to the PWM period, which at 20 kHz is 50 us. Obviously the overhead is not negligible, so I might consider using the approach more common in motor control applications of having the current control algorithm in the interrupt context instead. However, until the execution time of the algorithm is known, the entry barrier method will be assumed...

Note: "Overhead" might be the wrong term since I don't know if during the time measured the cpu was really busy. Otherwise it should be called latency I think...

Purple: Center aligned PWM at 50 % duty where the ADC triggers in the center of the positive waveform. Yellow: Pin state as described above. High means time of overhead/latency.

Reference frames

A key benefit of the FOC algorithm is that the actual control is performed in a reference frame that is fixed to the rotor. This way the sinusoidal three-phase currents, as seen in the stator's reference frame, will instead be represented as two DC values, assuming steady-state operation. The transforms used (Clarke and Park) require that the angle between the rotor and the stator is known. As a first step I am using a quadrature encoder, since it provides a very precise measurement with very low overhead thanks to the hardware support of the STM32.

Three types have been defined, each representing a particular reference frame: Abc, Alfa_Beta and Dq. Using the transforms above, one can simply write:

declare
   Iabc  : Abc;  --  Measured current (stator ref)
   Idq   : Dq;   --  Measured current (rotor ref)
   Vdq   : Dq;   --  Calculated output voltage (rotor ref)
   Vabc  : Abc;  --  Calculated output voltage (stator ref)
   Angle : constant Float := ...;
begin
   Idq := Iabc.Clarke.Park(Angle);

   --  Do the control...

   Vabc := Vdq.Park_Inv(Angle).Clarke_Inv;
end;

Note that Park and Park_Inv both use the same angle. To be precise, they both use Sin(Angle) and Cos(Angle). Now, at first, I simply implemented these by letting each transform calculate Sin and Cos locally. Of course, that is a waste for this particular application. Instead, I defined an angle object that when created also computed Sin and Cos of the angle, and added versions of the transforms to use these "ahead-computed" values instead.

declare
   --  Same...

   Angle : constant Angle_Obj := Compose (Angle_Rad); 
   --  Calculates Sin and Cos
begin
   Idq := Iabc.Clarke.Park(Angle);

   --  Do the control...

   Vabc := Vdq.Park_Inv(Angle).Clarke_Inv;
end;

This reduced the execution time somewhat (not as much as I thought, though), since the trigonometric functions are the heavy part. Using lookup table based versions instead of the ones provided by Ada.Numerics might be even faster...

It spins!

The main structure of the current controller is now in place. When a button on the board is pressed the sensor is aligned to the rotor by forcing the rotor to a known angle. Currently, the requested q-current is set by a potentiometer. 

As of now, it is definitely not tuned properly, but at least it shows that the general algorithm is working as intended.

In order to make this project easier to develop on, both for myself and any other users, I need to add some logging and tuning capabilities. This should allow a user to change and/or log variables in the application (e.g. control parameters) while the controller is running. I have written a tool for doing this (over serial) before, but then in C. It would be interesting to rewrite it in Ada. 

Contract Based Programming

So far, I have not used this feature much. But when writing code for the logging functionality I ran into a good fit for it. 

I am using Consistent Overhead Byte Stuffing (COBS) to encode the data sent over UART. This encoding results in unambiguous packet framing regardless of packet content, thus making it easy for receiving applications to recover from malformed packets. The packets are separated by a delimiter (value 0 in this case), making it easy to synchronize the receiving parser. The encoding ensures that the encoded packet itself does not contain the delimiter value.

A good feature of COBS is that, as long as the raw data length is less than 254 bytes, the overhead due to the encoding is always exactly one byte. I could of course simply state this fact as a comment on the encode/decode functions, allowing the user to make this assumption in order to simplify their code. A better way could be to write this condition as contracts.

   Data_Length_Max : constant Buffer_Index := 253;

   function COBS_Encode (Input : access Data)
                         return Data
   with
      Pre => Input'Length <= Data_Length_Max,
      Post => (if Input'Length > 0 then
                  COBS_Encode'Result'Length = Input'Length + 1
               else
                  Input'Length = COBS_Encode'Result'Length);

   function COBS_Decode (Encoded_Data : access Data)
                         return Data
   with
      Pre => Encoded_Data'Length <= Data_Length_Max + 1,
      Post => (if Encoded_Data'Length > 0 then
                  COBS_Decode'Result'Length = Encoded_Data'Length - 1
               else
                  Encoded_Data'Length = COBS_Decode'Result'Length);

Logging and Tuning

I just got the logging and tuning feature working. It is an Ada implementation of the protocol used by a previous project of mine, Calmeas. It enables the user to log and change the values of variables in the application, in real time. This is very helpful when developing systems where the debugger does not support reading and writing memory while the target is running.

The data is sent and received over UART, encoded with COBS. The interfaces of the uart and cobs packages implement an abstract stream type, meaning it is very simple to replace the UART with some other medium, and that e.g. COBS can be skipped if wanted.

Example

The user can simply do the following in order to get the variable V_Bus_Log loggable and/or tunable:

V_Bus_Log  : aliased Voltage_V;
...
Calmeas.Add (Symbol      => V_Bus_Log'Access,
             Name        => "V_Bus",
             Description => "Bus Voltage [V]");

It works for (un)signed integers of size 8, 16 and 32 bits, and for floats. 

After adding a few variables and connecting the target to the GUI:

As an example, this could be used to tune the current controller gains:

As expected, the actual current comes closer to the reference as the gain increases.

As of now, the tuning is not done in a "safe" way. Writing to the added symbols is done by a separate task named Logger, simply by performing unchecked writes to the address of the added symbol, one byte at a time. At the same time, the application reads the symbol's value from another task with higher priority. The proper way would be to pass the value through a protected type, but since the tuning is mostly for debugging purposes, I will do it the proper way later on...
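
For reference, the "safe" variant hinted at above could look roughly like the following; this is a hypothetical sketch, not the project's code, and it assumes Voltage_V is a plain floating-point type:

   --  Route the tuned value through a protected object instead of
   --  unchecked byte writes to the symbol's address.
   protected type Tunable_Value is
      procedure Set (Value : Voltage_V);
      function Get return Voltage_V;
   private
      Current : Voltage_V := 0.0;
   end Tunable_Value;

   protected body Tunable_Value is

      procedure Set (Value : Voltage_V) is
      begin
         Current := Value;
      end Set;

      function Get return Voltage_V is
      begin
         return Current;
      end Get;

   end Tunable_Value;

The Logger task would then call Set while the control task reads the value through Get, with the run-time serializing the accesses.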

Note that the host GUI is written in Python rather than Ada, and is not itself a part of this project.

Architecture overview

Here is a figure showing an overview of the software:

Summary

This project involves the design of a software platform that provides a good basis when developing motor controllers for brushless motors. It consists of a basic but clean and readable implementation of a sensored field oriented control algorithm. Included is a logging feature that simplifies development and allows users to visualize what is happening. The project shows that Ada can successfully be used for a bare-metal project that requires fast execution.

The design is, thanks to Ada's many nice features, much easier to understand than a lot of the other C implementations out there, where, in the worst case, everything is done in a single ISR. The combination of increased design readability and the strictness of Ada makes the resulting software safer and simplifies further collaborative development and reuse.

Some highlights of what has been done:

  • Ported the Ravenscar profiles to a custom board using the STM32F446
  • Added support for the STM32F446 to the Ada_Drivers_Library project
  • Added some functionality to Ada_Drivers_Library in order to fully use all peripheral features
  • Fixed a bug in Ada_Drivers_Library related to somewhat more advanced ADC usage
  • Wrote HAL-ish packages so that it is easy to port to a device other than the STM32
  • Wrote a communication package and defined interfaces in order to make it easier to add control inputs
  • Wrote a logging package that allows the developer to debug, log and tune the application in real time
  • Implemented a basic controller using sensored field oriented control
  • Documented the specifications well, with a generated HTML version

Future plans:

  • Add hall sensor support and 6-step block commutation
  • Add sensorless operation
  • Add CAN support (the PCB currently has no transceiver, though)
  • SPARK proving
  • Write some additional examples showing how to use the interfaces
  • Port the software to the popular VESC board

]]>
Prove in the Cloud http://blog.adacore.com/prove-in-the-cloud Wed, 18 Oct 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/prove-in-the-cloud

We have put together a byte (8 bits) of examples of SPARK code on a server in the cloud here (many thanks to Nico Setton for that!).

Each example consists of very few lines, a few files at most, and demonstrates an interesting feature of SPARK. The initial version is incorrect, and hitting the "Prove" button will return messages that point to the errors. By following the fix suggested in the comments, you should be able to get the code to prove automatically.

Of course, you could already do this by installing SPARK GPL yourself on a machine (download here and follow these instructions to install additional provers). The benefit of this webpage is that anyone can now experiment live with SPARK without first installing the toolset. This is very much inspired by what Microsoft Research has done with other verification tools as part of their rise4fun website.

Something particularly interesting for academics is that all the code for this widget is open source. So you can set up your own proof server for hands-on sessions, with your own exercises, in a matter of minutes! Just clone the code_examples_server project from GitHub, follow the instructions in the README, populate your server with your exercises and examples, and you're set! No need to ask IT to set up machines for your students; they just need a browser pointed at your server location. An exercise consists of a directory with:

  • a file example.yaml with the name and description of the exercise (in YAML syntax)
  • a GNAT project file main.gpr (could be almost empty, or force the use of SPARK_Mode so that all the code is analyzed)
  • the source files for this exercise

For inspiration, see the examples from the Compile_And_Prove_Demo project from GitHub, inside directory 'examples', which were used to populate our little online proof webpage.

Feel free to report problems or suggest improvements to the widget or the examples on their respective GitHub project pages.

]]>
SPARK Tutorial at FDL Conference http://blog.adacore.com/spark-tutorial-at-fdl-conference Tue, 12 Sep 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/spark-tutorial-at-fdl-conference

Researcher Martin Becker from Technische Universität München is giving a SPARK tutorial next week, on Monday the 18th, at the Forum on specification & Design Languages in Verona, Italy. Even if you cannot attend, you may find it useful to look at the material for his tutorial, with a complete cookbook to install and set up SPARK, and a 90-minute slide deck packed with rich and practical information about SPARK, as well as neat hands-on exercises to get a feel for using SPARK in practice.

It is all here with the very good slides in particular. 

]]>
New SPARK Cheat Sheet http://blog.adacore.com/new-spark-cheat-sheet Thu, 24 Aug 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/new-spark-cheat-sheet

Our good friend Martin Becker has produced a new cheat sheet for SPARK, which you may find useful as a quick reminder of syntax that you have not used for some time. It is simpler than the cheat sheets we already had in English and Japanese, and depending on your style you will prefer one or the other. It is also particularly useful for trainings! (and I'll use it for sure, since I have multiple trainings in the coming months...)

]]>
Highlighting Ada with Libadalang http://blog.adacore.com/highlighting-ada-with-libadalang Tue, 08 Aug 2017 12:30:00 +0000 Pierre-Marie de Rodat http://blog.adacore.com/highlighting-ada-with-libadalang

While we are working very hard on semantic analysis in Libadalang, it is already possible to leverage its lexical and syntactic analyzers. A useful example for this is a syntax highlighter.

In the context of programming languages, syntax highlighters make it easier for us, humans, to read source code. For example, formatting keywords with a bold font and a special color, while leaving identifiers with a default style, enables a quick understanding of a program’s structure. Syntax highlighters are so useful that they’re integrated into daily developer tools:

  • most “serious” code editors provide highlighters for tons of programming languages (C, Python, Ada, OCaml, shell scripts), markup languages (XML, HTML, BBcode), DSLs (Domain Specific Languages: SQL, TeX), etc.
  • probably all online code browsers ship syntax highlighters at least for mainstream languages: GitHub, Bitbucket, GitLab, …

From programming language theory, which many of us learned in engineering school (some even with passion!), we can distinguish two highlighting levels:

  1. Token (lexeme) based highlighting. Each token is given a style based on its token kind (keyword, comment, integer literal, identifier, …).

  2. Syntax-based highlighting. This one is higher-level: for example, give a special color for identifiers that give their name to functions and another color for identifiers that give their name to type declarations.

Most syntax highlighting engines, for instance Pygments' or Vim's, are based on regular expressions, which don't offer highlighter writers the same formalism as what we just described. Generally, regular expressions enable us to create an approximation of the two highlighting levels, which yields something a bit fuzzy. On the other hand, it is much easier to write such highlighters and to get pretty results quickly, compared to a fully fledged lexer/parser, which is probably why regular expressions are so popular in this context.

But here we already have a full lexer/parser for the Ada language: Libadalang. So why not use it to build a “perfect” syntax highlighter? This is going to be the exercise of the day. All blog posts so far only demonstrated the use of Libadalang’s Python API; Libadalang is primarily an Ada library, so let’s use its Ada API for once!

One disclaimer, first: the aim of this post is not to say that the world of syntax highlighters is broken and should be re-written with compiler-level lexers and parsers. Regexp-based highlighters are a totally fine compromise in contexts such as text editors; here we just demonstrate how Libadalang can be used to achieve a similar goal, but keep in mind that the result is not technically equivalent. For instance, what we will do below will require valid input Ada sources and will only work one file at a time, unlike editors that might need to work on smaller granularity items to keep the UI responsive, which is more important in this context than correctness.

Okay so, how do we start?

The first thing to do as soon as one wants to use Libadalang is to create an analysis context: this is an object that will enclose the set of source files to process.

with Libadalang.Analysis;

package LAL renames Libadalang.Analysis;
Ctx : LAL.Analysis_Context := LAL.Create;

Good. At the end of our program, we need to release the resources that were allocated in this context:

LAL.Destroy (Ctx);

Now everything interesting must happen in between. Let’s ask Libadalang to parse a source file:

Unit : LAL.Analysis_Unit := LAL.Get_From_File
   (Ctx, "my_source_file.adb", With_Trivia => True);

The analysis of a source file yields what we call in Libadalang an analysis unit. This unit is tied to the context used to create it.

Here, we also enable a peculiar analysis option: With_Trivia tells Libadalang not to discard “trivia” tokens. Inspired by the Roslyn Compiler, what we call a trivia token is a token that is ignored when parsing source code. In Ada, as in most (all?) programming languages, comments are trivia: developers are allowed to put any number of them anywhere between two tokens, and this will not change the validity of the program nor its semantics. Because of this, most compiler implementations just discard them: keeping them around would hurt performance for no gain. Libadalang is designed for all kinds of tools, not only compilers, so we give the user the choice of whether or not to keep trivia around.

What we are trying to do here is to highlight a source file. We want to highlight comments as well, so we need to ask to preserve trivia.

At this point, a real world program would have to check if parsing happened with no error. We are just playing here, so we’ll skip that, but you can have a look at the Libadalang.Analysis.Has_Diagnostic and Libadalang.Analysis.Diagnostics functions if you want to take care of this.

Fine, so we assume parsing went well and now we just have to go through tokens and assign a specific style to each of them. First, let’s have a look at the various token-related data types in Libadalang we have to deal with:

  • LAL.Token_Type: reference to a token/trivia in a specific analysis unit. Think of it as a cursor in a standard container. There is one special value: No_Token which, as you may guess, is used to represent the end of the token stream or just an invalid reference, like a null access.

  • LAL.Token_Data_Type: holder for the data related to a specific token. Namely: token kind, whether it’s trivia, index in the token/trivia stream and source location range.

  • Libadalang.Lexer.Token_Data_Handlers.Token_Index: a type derived from Integer to represent the indexes in token/trivia streams.

Then let’s define holders to annotate the token stream:

type Highlight_Type is (Text, Comment, Keyword, Block_Name, ...);

Instead of directly assigning colors to the various token kinds, this enumeration defines categories for highlighting. This makes it possible to provide different highlighting styles later: one set of colors for a white background, and another one for a black background, for example.
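
For example, a concrete style could later be expressed as a plain mapping from these categories to HTML color codes; this is a hypothetical sketch (the colors are arbitrary and the remaining categories simply fall back to black):

Light_Style : constant array (Highlight_Type) of String (1 .. 7) :=
  (Text       => "#000000",
   Comment    => "#808080",
   Keyword    => "#000080",
   Block_Name => "#a00000",
   others     => "#000000");

With that aside, back to the holders used to annotate the token stream: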

subtype Token_Index is
   Libadalang.Lexer.Token_Data_Handlers.Token_Index;

type Highlight_Array is
   array (Token_Index range <>) of Highlight_Type;

type Highlights_Holder (Token_Count, Trivia_Count : Token_Index) is
record
   Token_Highlights  : Highlight_Array (1 .. Token_Count);
   Trivia_Highlights : Highlight_Array (1 .. Trivia_Count);
end record;

In Libadalang, even though tokens and trivia make up a logical interleaved stream, they are stored as two separate streams, hence the need for two arrays. So here is a procedure to make the annotating process easier:

procedure Set
  (Highlights : in out Highlights_Holder;
   Token      : LAL.Token_Data_Type;
   HL         : Highlight_Type)
is
   Index : constant Token_Index := LAL.Index (Token);
begin
   if LAL.Is_Trivia (Token) then
      Highlights.Trivia_Highlights (Index) := HL;
   else
      Highlights.Token_Highlights (Index) := HL;
   end if;
end Set;

Now let’s start the actual highlighting! We begin with the token-based one as described earlier.

Basic_Highlights : constant
  array (Libadalang.Lexer.Token_Kind) of Highlight_Type :=
 (Ada_Identifier => Identifier,
      Ada_All .. Ada_Return
    | Ada_Elsif | Ada_Reverse
    | -- ...
      => Keyword,
    --  ...
  );

The above declaration associates a highlighting class with each token kind defined in Libadalang.Lexer. The only work left is to determine highlighting classes by iterating on each token in Unit:

Token : LAL.Token_Type := LAL.First_Token (Unit);

while Token /= LAL.No_Token loop
   declare
      TD : constant LAL.Token_Data_Type := LAL.Data (Token);
      HL : constant Highlight_Type :=
         Basic_Highlights (LAL.Kind (TD));
   begin
      Set (Highlights, TD, HL);
   end;
   Token := LAL.Next (Token);
end loop;

Easy, right? Once this code has run, we already have a pretty decent highlighting for our analysis unit! The second pass is just a refinement that uses syntax as described at the top of this blog post:

function Syntax_Highlight
  (Node : access LAL.Ada_Node_Type'Class) return LAL.Visit_Status;
 
LAL.Traverse (LAL.Root (Unit), Syntax_Highlight'Access);

LAL.Traverse will traverse Unit’s syntax tree (AST) and call the Syntax_Highlight function on each node. This function is a big dispatcher on the kind of the visited node:

function Syntax_Highlight
  (Node : access LAL.Ada_Node_Type'Class) return LAL.Visit_Status
is
   procedure Highlight_Block_Name
     (Name       : access LAL.Name_Type'Class;
      Highlights : in out Highlights_Holder) is
   begin
      Highlight_Name (Name, Block_Name, Highlights);
   end Highlight_Block_Name;
begin
   case Node.Kind is
      when LAL.Ada_Subp_Spec =>
         declare
            Subp_Spec : constant LAL.Subp_Spec :=
               LAL.Subp_Spec (Node);

            Params : constant LAL.Param_Spec_Array_Access :=
               Subp_Spec.P_Node_Params;

         begin
            Highlight_Block_Name
              (Subp_Spec.F_Subp_Name, Highlights);
            Highlight_Type_Expr
              (Subp_Spec.F_Subp_Returns, Highlights);
            for Param of Params.Items loop
               Highlight_Type_Expr
                 (Param.F_Type_Expr, Highlights);
            end loop;
         end;

      when LAL.Ada_Subp_Body =>
         Highlight_Block_Name
           (LAL.Subp_Body (Node).F_End_Id, Highlights);

      when LAL.Ada_Type_Decl =>
         Set (Highlights,
              LAL.Data (Node.Token_Start),
              Keyword_Type);
         Highlight_Block_Name
           (LAL.Type_Decl (Node).F_Type_Id, Highlights);

      when LAL.Ada_Subtype_Decl =>
         Highlight_Block_Name
           (LAL.Subtype_Decl (Node).F_Type_Id, Highlights);

      --  ...
   end case;
   return LAL.Into;
end Syntax_Highlight;

Depending on the nature of the AST node to process, we apply specific syntax highlighting rules. For example, the first one above: for subprogram specifications (Subp_Spec), we highlight the name of the subprogram as a “block name” while we highlight type expressions for the return type and the type of all parameters as “type expressions”. Let’s go deeper: how do we highlight names?

procedure Highlight_Name
  (Name       : access LAL.Name_Type'Class;
   HL         : Highlight_Type;
   Highlights : in out Highlights_Holder) is
begin
   if Name = null then
      return;
   end if;

   case Name.Kind is
      when LAL.Ada_Identifier | LAL.Ada_String_Literal =>
         --  Highlight the only token that this node has
         declare
            Tok : constant LAL.Token_Type :=
              LAL.Single_Tok_Node (Name).F_Tok;
         begin
            Set (Highlights, LAL.Data (Tok), HL);
         end;

      when LAL.Ada_Dotted_Name =>
         --  Highlight both the prefix, the suffix and the
         --  dot token.

         declare
            Dotted_Name : constant LAL.Dotted_Name :=
               LAL.Dotted_Name (Name);
            Dot_Token   : constant LAL.Token_Type :=
               LAL.Next (Dotted_Name.F_Prefix.Token_End);
         begin
            Highlight_Name
              (Dotted_Name.F_Prefix, HL, Highlights);
            Set (Highlights, LAL.Data (Dot_Token), HL);
            Highlight_Name
              (Dotted_Name.F_Suffix, HL, Highlights);
         end;

      when LAL.Ada_Call_Expr =>
         --  Just highlight the name of the called entity
         Highlight_Name
           (LAL.Call_Expr (Name).F_Name, HL, Highlights);

      when others =>
         return;
   end case;
end Highlight_Name;

The above may be quite long, but what it does isn't new: just as in the Syntax_Highlight function, we execute various actions depending on the kind of the input AST node. If it's a mere identifier, then we just have to highlight the node's only token. If it's a dotted name (X.Y in Ada), we highlight the prefix (X), the suffix (Y) and the dot in between as names. And so on.

At this point, we could create other syntactic highlighting rules for remaining AST nodes. This blog post is already quite long, so we’ll stop there.

There is one piece missing before our syntax highlighter can become actually useful: outputting the formatted source code. Let's output HTML, as this format is easy to produce and quite universal. We start with a helper analogous to the previous Set procedure, to deal with the dual token/trivia streams:

function Get
  (Highlights : Highlights_Holder;
   Token      : LAL.Token_Data_Type) return Highlight_Type
is
   Index : constant Token_Index := LAL.Index (Token);
begin
   return (if LAL.Is_Trivia (Token)
           then Highlights.Trivia_Highlights (Index)
           else Highlights.Token_Highlights (Index));
end Get;

And now let’s get to the output itself. This starts with a simple iteration on token, so the outline is similar to the first highlighting pass we did above:

Token     : LAL.Token_Type := LAL.First_Token (Unit);
Last_Sloc : Slocs.Source_Location := (1, 1);

Put_Line ("<pre>");
while Token /= LAL.No_Token loop
   declare
      TD         : constant LAL.Token_Data_Type :=
         LAL.Data (Token);
      HL         : constant Highlight_Type :=
         Get (Highlights, TD);
      Sloc_Range : constant Slocs.Source_Location_Range :=
         LAL.Sloc_Range (TD);

      Text : constant Langkit_Support.Text.Text_Type :=
         LAL.Text (Token);
   begin
      while Last_Sloc.Line < Sloc_Range.Start_Line loop
         New_Line;
         Last_Sloc.Line := Last_Sloc.Line + 1;
         Last_Sloc.Column := 1;
      end loop;

      if Sloc_Range.Start_Column > Last_Sloc.Column then
         Indent (Integer (Sloc_Range.Start_Column - Last_Sloc.Column));
      end if;

      Put_Token (Text, HL);
      Last_Sloc := Slocs.End_Sloc (Sloc_Range);
   end;
   Token := LAL.Next (Token);
end loop;
Put_Line ("</pre>");

The tricky part here is that tokens alone are not enough: we use the source location information (line and column numbers) associated with tokens in order to re-create line breaks and whitespace; this is what the inner while loop and if statement do. As usual, we delegate “low-level” actions to dedicated procedures:

procedure Put_Token
  (Text : Langkit_Support.Text.Text_Type;
   HL   : Highlighter.Highlight_Type) is
begin
   Put ("<span style=""color: #" & Get_HTML_Color (HL)
        & ";"">");
   Put (Escape (Text));
   Put ("</span>");
end Put_Token;

procedure New_Line is
begin
   Put ((1 => ASCII.LF));
end New_Line;

procedure Indent (Length : Natural) is
begin
   Put ((1 .. Length => ' '));
end Indent;

Writing the Escape function, which wraps special HTML characters such as < or > into HTML entities (&lt; and the like), and Get_HTML_Color, which returns a suitable hexadecimal string encoding the color that corresponds to a highlighting category (for instance: #ff0000, i.e. the red color, for keywords), is left as an exercise to the reader.
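
For completeness, here is one possible sketch of Escape; it works directly at the Wide_Wide_String level and leaves the transcoding question aside (see the note below), so treat it as an illustration rather than the version from the repository:

with Ada.Strings.Wide_Wide_Unbounded; use Ada.Strings.Wide_Wide_Unbounded;
with Langkit_Support.Text;

function Escape
  (Text : Langkit_Support.Text.Text_Type)
   return Langkit_Support.Text.Text_Type
is
   Result : Unbounded_Wide_Wide_String;
begin
   --  Replace the characters that are special in HTML, copy the rest
   for C of Text loop
      case C is
         when '<'    => Append (Result, "&lt;");
         when '>'    => Append (Result, "&gt;");
         when '&'    => Append (Result, "&amp;");
         when others => Append (Result, (1 => C));
      end case;
   end loop;
   return To_Wide_Wide_String (Result);
end Escape;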

Note that Escape must deal with a Text_Type formal. This type, which is really a subtype of Wide_Wide_String, is used to encode source excerpts in a uniform way in Libadalang, regardless of the input encoding. In order to do something useful with such text, one must transcode it, for example to UTF-8. One way to do this is to use GNATCOLL.Iconv, but this is out of the scope of this post.

So here we are! Now you know how to:

  • parse Ada source files with Libadalang;

  • iterate on the stream of tokens/trivia in the resulting analysis unit, as well as process the associated data;

  • traverse the syntax tree of this unit;

  • combine the above in order to create a syntax highlighter.

Thank you for reading this post to the end! If you are interested in pursuing this road, you can find a compilable set of sources for this syntax highlighter in Libadalang's repository on GitHub. And because we cannot decently dedicate a whole blog post to a syntax highlighter without a little demo, here is one:

Little demo of Ada source code syntax highlighting with Libadalang
]]>
Pretty-Printing Ada Containers with GDB Scripts http://blog.adacore.com/pretty-printing-ada-containers-with-gdb-scripts Tue, 25 Jul 2017 13:00:00 +0000 Pierre-Marie de Rodat http://blog.adacore.com/pretty-printing-ada-containers-with-gdb-scripts

When things don’t work as expected, developers usually do one of two things: either add debug prints to their programs, or run their programs under a debugger. Today we’ll focus on the latter activity.

Debuggers are fantastic tools. They turn our compiled programs from black boxes into glass ones: you can interrupt your program at any point during its execution, see where it stopped in the source code, inspect the value of your variables (even some specific array item), follow chains of accesses, and even modify all these values live. How powerful! However, sometimes there’s so much information available that navigating through it to reach the bit of state you want to inspect is just too complex.

Take a complex container, such as an ordered map from Ada.Containers.Ordered_Maps, for example. These are implemented in GNAT as binary trees: collections of nodes, each node holding a link to its parent, links to its left and right children, plus a key and a value. Unfortunately, finding the particular node in a debugger that corresponds to the key you are looking for is a painful task. See for yourself:

with Ada.Containers.Ordered_Maps;

procedure PP is
   package Int_To_Nat is
      new Ada.Containers.Ordered_Maps (Integer, Natural);

   Map : Int_To_Nat.Map;
begin
   for I in 0 .. 9 loop
      Map.Insert (I, 10 * I);
   end loop;

   Map.Clear;  --  BREAK HERE
end PP;

Build this program with debug information and execute it until line 13:

$ gnatmake -q -g pp.adb
$ gdb -q ./pp
Reading symbols from ./pp...done.
(gdb) break pp.adb:13
Breakpoint 1 at 0x406a81: file pp.adb, line 13.
(gdb) r
Breakpoint 1, pp () at pp.adb:13
13         Map.Clear;  --  BREAK HERE

(gdb) print map
$1 = (tree => (first => 0x64e010, last => 0x64e1c0, root => 0x64e0a0, length => 10, tc => (busy => 0, lock => 0)))

# “map” is a record that contains a bunch of accesses…
# not very helpful. We need to go deeper.
(gdb) print map.tree.first.all
$2 = (parent => 0x64e040, left => 0x0, right => 0x0, color => black, key => 0, element => 0)

# Ok, so what we just saw above is the representation of the node
# that holds the key/value association for key 0 and value 0. This
# first node has no child (see left and right above), so we need to
# inspect its parent:
(gdb) print map.tree.first.parent.all
$3 = (parent => 0x64e0a0, left => 0x64e010, right => 0x64e070, color => black, key => 1, element => 10)

# Great, we have the second element! It has a left child,
# which is our first node ($2). Now let’s go to its right
# child:
(gdb) print map.tree.first.parent.right.all
$4 = (parent => 0x64e040, left => 0x0, right => 0x0, color => black, key => 2, element => 20)

# That was the third element: this one has no left or right
# child, so we have to get to the parent of $3:
(gdb) print map.tree.first.parent.parent.all
$5 = (parent => 0x0, left => 0x64e040, right => 0x64e100, color => black, key => 3, element => 30)

# So far, so good: we already visited the left child ($4), so
# now we need to visit $5’s right child:
(gdb) print map.tree.first.parent.parent.right.all
$6 = (parent => 0x64e0a0, left => 0x64e0d0, right => 0x64e160, color => black, key => 5, element => 50)

# Key 5? Where’s the node for the key 4? Oh wait, we should
# also visit the left child of $6:
(gdb) print map.tree.first.parent.parent.right.left.all
$7 = (parent => 0x64e100, left => 0x0, right => 0x0, color => black, key => 4, element => 40)

# Ad nauseam…

Manually visiting a binary tree is much easier for computers than it is for humans, as everyone knows. So in this case, it seems easier to write a debug procedure in Ada that iterates on the container and prints each key/value association, and to call this debug procedure from GDB.
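
Such a debug procedure is short to write; here is a minimal sketch for the Int_To_Nat instantiation from the example above (declared next to the instantiation, and assuming Ada.Text_IO is withed), not something that ships with GNAT:

   --  Dump all key/value associations of the map, in key order
   procedure Debug_Print (M : Int_To_Nat.Map) is
      use Int_To_Nat;
   begin
      for C in M.Iterate loop
         Ada.Text_IO.Put_Line
           (Integer'Image (Key (C)) & " =>" & Natural'Image (Element (C)));
      end loop;
   end Debug_Print;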

But this has its own drawbacks: first, it forces you to remember to write this procedure for each container instantiation you do, possibly forcing you to rebuild your program each time you debug it. And it gets worse: if you debug your program from a core dump, it's not possible to call a debug procedure from GDB. Besides, if the state of your program is somehow corrupted, due to a stack overflow or a subtle memory handling bug (dangling pointers, etc.), calling this debug procedure will probably corrupt your process even more, making the debugging session a nightmare!

This is where GDB comes to the rescue: there’s a feature called pretty-printers, which makes it possible to hook into GDB to customize how it displays values. For example, you can write a hook that intercepts values whose type matches the instantiation of ordered maps and that displays only the “useful” content of the maps.

We developed several GDB scripts to implement such pretty-printers for the most common standard containers in Ada: not only vectors, hashed/ordered maps/sets, linked lists, but also unbounded strings. You can find them in the dedicated repository hosted on GitHub: https://github.com/AdaCore/gnat-gdb-scripts.

With these scripts properly installed, inspecting the content of containers becomes much easier:

(gdb) print map
$1 = pp.int_to_nat.map of length 10 = {[0] = 0, [1] = 10, [2] = 20,
  [3] = 30, [4] = 40, [5] = 50, [6] = 60, [7] = 70, [8] = 80,
  [9] = 90}

Note that beginning with GNAT Pro 18, GDB ships with these pretty-printers, so there's no setup other than adding the following command to your .gdbinit file:

python import gnatdbg; gnatdbg.setup()

Happy debugging!

]]>
Proving Loops Without Loop Invariants http://blog.adacore.com/proving-loops-without-loop-invariants Thu, 20 Jul 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/proving-loops-without-loop-invariants

For all the power that comes with proof technology, one sometimes has to pay the price of writing a loop invariant. Over the years, we've strived to facilitate writing loop invariants: by designing a methodology in four easy steps for writing a loop invariant, by providing loop patterns and their corresponding loop invariants, and by automatically generating the part of loop invariants that talks about unmodified parts of objects. But writing loop invariants sometimes remains difficult, in particular for beginners.

At the same time, some loops look so simple that they don't seem to require a loop invariant. Take for example the following loop that initializes an array, with value J at index J for every index:

   subtype Index is Integer range 1 .. 10;
   type Arr is array (Index) of Integer;

   procedure Init (A : out Arr) is
   begin
      for J in Index loop
         A (J) := J;
      end loop;
   end Init;

Suppose you want to prove that indeed Init ensures that A(J) is equal to J at every index J:

 procedure Init (A : out Arr) with
     Post => (for all J in Index => A(J) = J);

Previously, you would have needed a loop invariant for GNATprove to be able to prove this postcondition. This is not needed anymore. Instead, GNATprove unrolls the loop in Init as if it were defined as:

   procedure Init (A : out Arr) is
   begin
      A (1) := 1;
      A (2) := 2;
      A (3) := 3;
      A (4) := 4;
      A (5) := 5;
      A (6) := 6;
      A (7) := 7;
      A (8) := 8;
      A (9) := 9;
      A (10) := 10;
   end Init;

This allows GNATprove to prove the postcondition without a loop invariant.

Not every loop can be unrolled this way. Firstly, we need to know how many times the loop should be unrolled, so it's not possible to unroll while-loops, plain loops, or for-loops whose bounds are not known at compile time. Secondly, we don't want to unroll loops that have thousands of iterations, or it would lead to an explosion in complexity that would defeat the purpose of this feature. Hence we limit unrolling to loops with fewer than 20 iterations.

Finally, we want to give the user control over this feature, so we do not unroll loops which already contain a loop invariant (or a loop variant), which is a sign that the user wants to use the usual proof mechanism there.

This new feature in SPARK has proved very effective on the first examples we've tried it on. For example, it allowed us to prove a Tic-Tac-Toe implementation by my colleague Quentin Ochem (for a student training course) without loop invariants, instead of the 16 loop invariants that were originally needed. I also tried it on the computation of the longest common prefix from two starting points in a text, a typical beginner example in SPARK. The original code uses a maximum length of 100000 for the text array, and a while-loop to move forward in the array. Just replace that with a maximum length of 10 and a for-loop, and you get an automatic proof of the rich postcondition of this function without any loop invariant! Transforming a while-loop into a for-loop is not that hard, provided you have an upper bound N on the number of iterations. The while-loop:

   while Cond loop
      ...
   end loop;

becomes:

   for K in 1 .. N loop
      exit when not Cond;
      ...
   end loop;

Similarly, on another archetypal learning example, binary search, one can change the while-loop "while Left <= Right loop" into a simple for-loop "for J in U'Range loop", as control will exit the loop as soon as the search interval becomes empty. Again, GNATprove manages to prove the rich postcondition of binary search without a loop invariant in that case.
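
To give an idea of the shape of such a loop, here is a hedged sketch of a for-loop binary search over the small Arr/Index types declared earlier in this post; it is not the actual example from the course material:

   --  Assumes A is sorted in ascending order; returns 0 when V is absent
   function Search (A : Arr; V : Integer) return Natural is
      Left  : Index := A'First;
      Right : Index := A'Last;
   begin
      --  At most 10 iterations, so GNATprove can unroll the loop
      for J in A'Range loop
         declare
            Mid : constant Index := Left + (Right - Left) / 2;
         begin
            if A (Mid) = V then
               return Mid;
            elsif A (Mid) < V then
               exit when Mid + 1 > Right;  --  nothing left to the right
               Left := Mid + 1;
            else
               exit when Mid - 1 < Left;   --  nothing left to the left
               Right := Mid - 1;
            end if;
         end;
      end loop;
      return 0;
   end Search;

Whether GNATprove discharges a rich postcondition on this exact variant would of course need to be checked; the point here is only to illustrate the while-to-for transformation.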

Besides its interest for beginners, loop unrolling can also be convenient when exploring new algorithms, as a way to gain confidence in the implementation on a small scale before moving to the final production-level scale. Not only can the provers be relied on to prove for-loops with a low number of iterations (under 20), but counterexamples also tend to be better on unrolled loops, so interaction with the tool is improved on both counts.

That feature does not eliminate the need for loop invariants in the general case, but it makes it easier to learn and to experiment with SPARK, before the need for loop invariants pushes one to learn how to write them. To know more about this feature, see the SPARK User's Guide.

]]>
Research Corner - Focused Certification of SPARK in Coq http://blog.adacore.com/research-corner-focused-certification-of-spark-in-coq Tue, 18 Jul 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/research-corner-focused-certification-of-spark-in-coq

The SPARK toolset aims at giving guarantees to its users about the properties of the software analyzed, be it absence of runtime errors or more complex properties. But the SPARK toolset being itself a complex tool, it is not free of errors. So how do we get confidence in the results of the analysis? The established means of getting confidence in tools in industry is through a process sometimes called tool certification, sometimes tool qualification. It requires describing, at various levels of detail (depending on the criticality of the tool usage), the intended functionality of the tool, and demonstrating (usually through testing) that the tool correctly implements these functionalities.

The academic way of obtaining confidence is also called "certification", but it covers a completely different reality. It requires providing mathematical evidence, through mechanized proof, that the tool indeed performs a formally specified functionality. Examples of that level of certification are the CompCert compiler and the seL4 operating system. This level of assurance is very costly to achieve, and as a result not suitable for the majority of tools.

For SPARK, we have worked with our academic partners from Kansas State University and Conservatoire National des Arts et Métiers to achieve a middle ground, establishing mathematical evidence of the correctness of a critical part of the SPARK toolset. The part on which we focused is the tagging of nodes requiring run-time checks by the frontend of the SPARK technology. This frontend is a critical and complex part, which is shared between the formal verification tool GNATprove and the compiler GNAT. It is responsible for generating semantically annotated Abstract Syntax Trees of the source code, with special tags for nodes that require run-time checks. Then, GNATprove relies on these tags to generate formulas to prove, so a missing tag means a missing verification. Our interest in getting better assurance on this part of the technology is not theoretical: that's a part where we repeatedly had errors in the past, leading to missing verifications.

Interested in knowing more? See the attached paper, which has been accepted at the SEFM 2017 conference, or look at the initial formalization work presented at the HILT conference in 2013.

]]>
Applied Formal Logic: Searching in Strings http://blog.adacore.com/applied-formal-logic-searching-in-strings Thu, 29 Jun 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/applied-formal-logic-searching-in-strings

A friend pointed me to recent posts by Tommy M. McGuire, in which he describes how Frama-C can be used to functionally prove a brute force version of string search, and to find a previously unknown bug in a faster version of string search called quick search. Frama-C and SPARK share a similar history, techniques and goals. So it was tempting to redo the same proofs on equivalent code in SPARK, and to complete them with a functional proof of the fixed version of quick search. This is what I'll present in this post.

Contrary to strings in C, which start at index 0, standard strings in SPARK range over positive numbers, and usually start at index 1. I could have made my own strings start at index 0, but there is no reason to stick to the C convention when writing the algorithm in SPARK. At the same time, it's convenient to force the string to start at index 1 with an explicit predicate, which I do like this:

 subtype Text is String with Predicate => Text'First = 1;

Following the order of exposure of Tommy M. McGuire's posts, here is the implementation for the brute force algorithm in SPARK:

 function Brute_Force (Needle, Haystack : in Text) return Natural is
      Diff : Boolean;
   begin
      for I in 1 .. Haystack'Length - Needle'Length + 1 loop
         Diff := False;

         for J in Needle'Range loop
            Diff := Needle(J) /= Haystack(J + (I - 1));
            exit when Diff;
         end loop;

         if not Diff then
            return I;
         end if;
      end loop;

      return 0;
   end Brute_Force;

I am doing without the parameters n and h here, which were used in the C version to denote the lengths of strings needle and haystack, since these are readily available as attributes Haystack'Length and Needle'Length in SPARK. Since I'm working on strings starting at index 1, there are a few adjustments compared to the C version. The use of a temporary variable Diff is needed to detect that the inner loop was exited due to a difference between Needle and the portion of Haystack starting at J, as the for-loop in SPARK does not increment its index in the last iteration of the loop, contrary to its C version.

On this initial version, GNATprove issues one message about a possible integer overflow when computing "Haystack'Length - Needle'Length + 1". It automatically proves all other run-time checks (2 initialization checks, 1 array index check,  2 integer range checks, 2 integer overflow checks). GNATprove also provides a counterexample to understand the possible failure, which can be displayed in our IDE GPS by clicking on the magnify icon on the left of the message/line:

You have to scroll right in the IDE to see all the values, so here are the relevant ones: Haystack'First = 1 and Haystack'Last = 2147483647 and Needle'First = 1 and Needle'Last = 0. In that case, Haystack'Length is 2147483647 and Needle'Length is 0, which means that "Haystack'Length - Needle'Length + 1" is one past the largest signed 32-bit integer. Hence the overflow. One way to avoid this issue is to require that Needle is not the empty string, so that its length is at least 1:

function Brute_Force (Needle, Haystack : in Text) return Natural with
     Pre => Needle'Length >= 1;

This precondition is sufficient for GNATprove to prove all checks in Brute_Force, but I've made it stronger, as TMM did in his post, since it does not make sense to look for a needle that is longer than the haystack:

function Brute_Force (Needle, Haystack : in Text) return Natural with
     Pre => Needle'Length in 1 .. Haystack'Length;

Note that, compared to what is needed with Frama-C, we don't need to provide loop assigns or loop invariants here. GNATprove automatically computes the variables that are modified in a loop, as well as the range of for-loop indexes. Still following the order of exposure of TMM's posts, let's turn to the functional contract for searching a string. I'm directly translating here the functions partial_match_at and match_at given by TMM from C to SPARK, as well as the contract of brute_force. Functions Partial_Match_At and Match_At are ghost functions in SPARK (with aspect Ghost), which means that they can be used only in assertions/contracts and ghost code. A difference with Frama-C is that ghost code is executable like regular code in SPARK, so one must show absence of run-time errors in ghost code as well, hence the precondition on Partial_Match_At below:

 --  There is a partial match of the needle at location loc in the
   --  haystack, of length len.
   function Partial_Match_At
     (Needle, Haystack : Text; Loc : Positive; Len : Natural) return Boolean
   is
     (for all I in 1 .. Len => Needle(I) = Haystack(Loc + (I - 1)))
   with Ghost,
        Pre => Len <= Needle'Length
          and then Loc - 1 <= Haystack'Length - Len;

   --  There is a complete match of the needle at location loc in the
   --  haystack.
   function Match_At (Needle, Haystack : Text; Loc : Positive) return Boolean is
     (Loc - 1 <= Haystack'Length - Needle'Length
      and then Partial_Match_At (Needle, Haystack, Loc, Needle'Length))
   with Ghost;

The contract on Brute_Force is similar to the one in Frama-C, with a shift by one for the origin of strings, Brute_Force'Result instead of \result to denote the result of the function, and an if-expression instead of behaviors (SPARK has a similar notion of contract cases, but they must always have disjoint guards in SPARK, so are not applicable here):

function Brute_Force (Needle, Haystack : in Text) return Natural with
     Pre  => Needle'Length in 1 .. Haystack'Length,
     Post => Brute_Force'Result in 0 .. Haystack'Length - Needle'Length + 1
       and then
       (if Brute_Force'Result > 0 then
          Match_At (Needle, Haystack, Brute_Force'Result)
        else
          (for all K in Haystack'Range =>
             not Match_At (Needle, Haystack, K)));

Before we even try to prove that this contract is satisfied by the implementation of Brute_Force, it is a good idea to test it on a few inputs, to get rid of silly mistakes. Here is a test driver to do precisely that:

with String_Search; use String_Search;

procedure Test_Search is
   All_Men : constant Text :=
     "We hold these truths to be self-evident, that all men are created equal,"
     & " that they are endowed by their Creator with certain unalienable "
     & "Rights, that among these are Life, Liberty and the Pursuit of "
     & "Happiness. That to secure these rights, Governments are instituted "
     & "among Men, deriving their just powers from the consent of the governed";
begin
   pragma Assert (Brute_Force (All_Men, "just powers") > 0);
   pragma Assert (Brute_Force (All_Men, "austin powers") = 0);
end Test_Search;

Just compile the code with assertions on (switch -gnata), run it, and... it fails the precondition of Brute_Force:

raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : failed precondition from string_search.ads:24

What happened here is that I put arguments in the wrong order in the call to Brute_Force. I'm not making this up, this really happened to me (I am that bad!). Anyway, that illustrates that testing is a good idea, even if here it detected a bug in the test itself. The fix in SPARK is to use named parameters to avoid such issues. They don't have to appear in the same order as in the function signature, but it's a good idea nonetheless:

 pragma Assert (Brute_Force (Needle => "just powers", Haystack => All_Men) > 0);

Once fixed, the test passes without errors. As in the case of Frama-C, we need to add loop invariants for GNATprove to prove that Brute_Force satisfies its contract. Loop invariants in SPARK are different from the classical loop invariants used in Frama-C: you can put them anywhere in the loop, and they don't have to hold when reaching/exiting the loop, but only when execution reaches the program point of the loop invariant. I generally prefer to put loop invariants at the end of loops, because it's more natural to express what has been achieved so far:

function Brute_Force (Needle, Haystack : in Text) return Natural is
      Diff : Boolean;
   begin
      for I in 1 .. Haystack'Length - Needle'Length + 1 loop
         Diff := False;

         for J in Needle'Range loop
            Diff := Needle(J) /= Haystack(J + (I - 1));
            exit when Diff;
            pragma Loop_Invariant (Partial_Match_At (Needle, Haystack, I, J));
            pragma Loop_Invariant (Diff = (Needle(J) /= Haystack(J + (I - 1))));
         end loop;

         if not Diff then
            return I;
         end if;

         pragma Loop_Invariant
           (for all K in 1 .. I => not Match_At (Needle, Haystack, K));
      end loop;

      return 0;
   end Brute_Force;

A subtlety above is that, since we're replacing the implicit loop invariant in the inner loop (located at the start of the loop) by an explicit loop invariant at the end of the inner loop, we need to repeat in that loop invariant the information about the current value of Diff, otherwise this information is not available on the path starting from the loop invariant and exiting the loop in the last iteration. Otherwise this is similar to what was done in Frama-C. With these loop invariants, GNATprove proves all checks in Brute_Force, including its postcondition.

I kept above the implementation structure originating from the C version of brute_force, but in SPARK we can simplify it by replacing the inner loop with a direct comparison of Needle with a slice of Haystack:

   function Brute_Force (Needle, Haystack : in Text) return Natural is
   begin
      for I in 1 .. Haystack'Length - Needle'Length + 1 loop
         if Needle = Haystack(I .. I + (Needle'Last - 1)) then
            return I;
         end if;

         pragma Loop_Invariant
           (for all K in 1 .. I => not Match_At (Needle, Haystack, K));
      end loop;

      return 0;
   end Brute_Force;

This version is also completely proved by GNATprove.

Now let's turn to the more involved string search algorithm called quick search, presented in this other post by TMM. Translating the implementation, contracts and loop invariants to SPARK is quite easy. As for the brute force version, more precise types in SPARK allow us to get rid of a number of annotations:

type Shift_Table is array (Character) of Positive;

   procedure Make_Bad_Shift (Needle : Text; Bad_Shift : out Shift_Table) with
     Pre  => Needle'Length < Integer'Last,
     Post => (for all C in Character => Bad_Shift(C) in 1 .. Needle'Length + 1);

   function QS (Needle, Haystack : in Text) return Natural with
     Pre => Needle'Length < Integer'Last
       and then Haystack'Length < Integer'Last - 1
       and then Needle'Length in 1 .. Haystack'Length;

I am also getting rid of a loop in Make_Bad_Shift and a loop in QS compared to their C version, as we can directly assign and compare strings in SPARK:

   procedure Make_Bad_Shift (Needle : Text; Bad_Shift : out Shift_Table) is
   begin
      Bad_Shift := (others => Needle'Length + 1);

      for J in Needle'Range loop
         Bad_Shift(Needle(J)) := Needle'Length - J + 1;
         pragma Loop_Invariant (for all C in Character => Bad_Shift(C) in 1 .. Needle'Length + 1);
      end loop;
   end Make_Bad_Shift;

   function QS (Needle, Haystack : in Text) return Natural is
      Bad_Shift : Shift_Table;
      I : Positive;

   begin
      --  Preprocessing
      Make_Bad_Shift (Needle, Bad_Shift);

      --  Searching
      I := 1;
      while I <= Haystack'Length - Needle'Length + 1 loop
         if Needle = Haystack(I .. I + (Needle'Last - 1)) then
            return I;
         end if;
         I := I + Bad_Shift(Haystack(I + Needle'Length));  --  Shift
      end loop;

      return 0;
   end QS;

GNATprove proves all checks on the above code, including postconditions, except for the array index check when computing "Haystack(I + Needle'Length)". This is precisely the bug that was discovered by TMM, which he presents in his post. GNATprove further helps by providing a counterexample to understand the possible failure:

Indeed, when I=2 and Haystack'Last=2, "I + Needle'Length" is outside of the bounds of Haystack whenever Needle is not the empty string. We can fix that by exiting early from the loop before the assignment to I in the loop:

exit when I = Haystack'Length - Needle'Length + 1;

With this fix, GNATprove proves all checks on the code of quick search.

Now let's turn to proving the functional behavior of quick search. The postcondition of QS is the same as the one of Brute_Force, given that only the algorithm changes between the two:

function QS (Needle, Haystack : in Text) return Natural with
     Pre => Needle'Length < Integer'Last
       and then Haystack'Length < Integer'Last - 1
       and then Needle'Length in 1 .. Haystack'Length,
     Post => QS'Result in 0 .. Haystack'Length - Needle'Length + 1
       and then
       (if QS'Result > 0 then
          Match_At (Needle, Haystack, QS'Result)
        else
          (for all K in Haystack'Range =>
             not Match_At (Needle, Haystack, K)));

In order to prove the contract of QS, we'll need to specify and prove the functional behavior of Make_Bad_Shift first. As explained by TMM in his post, Make_Bad_Shift is used to align the last instance of a given character in the needle with a matching character in the haystack. So for every such character C, either it does not occur in the needle in which case Bad_Shift(C) has the value "Needle'Length + 1", or it occurs (possibly multiple times) in the needle in which case it occurs last at index "Needle'Length - Bad_Shift(C) + 1". This is what is expressed in the following postcondition:

 procedure Make_Bad_Shift (Needle : Text; Bad_Shift : out Shift_Table) with
     Pre  => Needle'Length < Integer'Last,
     Post => (for all C in Character => Bad_Shift(C) in 1 .. Needle'Length + 1)
       and then (for all C in Character =>
                   (if Bad_Shift(C) = Needle'Length + 1 then
                      (for all K in Needle'Range => C /= Needle(K))
                    else
                      Needle(Needle'Length - Bad_Shift(C) + 1) = C
                      and (for all K in Needle'Length - Bad_Shift(C) + 2 .. Needle'Last => Needle(K) /= C)
                 ));

In order to prove that the implementation of Make_Bad_Shift satisfies this postcondition, we simply have to repeat this postcondition as a loop invariant, accumulating that information as the loop index J progresses (see how occurrences of Needle'Last in the postcondition were replaced by occurrences of J in the loop invariant):

 procedure Make_Bad_Shift (Needle : Text; Bad_Shift : out Shift_Table) is
   begin
      Bad_Shift := (others => Needle'Length + 1);

      for J in Needle'Range loop
         Bad_Shift(Needle(J)) := Needle'Length - J + 1;
         pragma Loop_Invariant (for all C in Character => Bad_Shift(C) in 1 .. Needle'Length + 1);
         pragma Loop_Invariant (for all C in Character =>
                                  (if Bad_Shift(C) = Needle'Length + 1 then
                                     (for all K in 1 .. J => C /= Needle(K))
                                   else
                                      Needle(Needle'Length - Bad_Shift(C) + 1) = C
                                      and (for all K in Needle'Length - Bad_Shift(C) + 2 .. J => Needle(K) /= C)
                                ));
      end loop;
   end Make_Bad_Shift;

GNATprove proves all checks on the above code.

Now turning to QS, we need to establish a loop invariant very similar to the one used in Brute_Force, except here we want to establish the property that Needle does not match up to index "I + Bad_Shift(Haystack(I + Needle'Length)) - 1" instead of just I:

  pragma Loop_Invariant
           (for all K in 1 .. I + Bad_Shift(Haystack(I + Needle'Length)) - 1 => not Match_At (Needle, Haystack, K));

We also need to bound I in the loop invariant, as we're inserting the above loop invariant in the middle of the loop, hence we do not get "for free" that I satisfies the loop test:

 pragma Loop_Invariant (I <= Haystack'Length - Needle'Length);

With these additions, GNATprove proves all checks in QS, including its postcondition, but it does not prove its loop invariant:

string_search.adb:111:81: medium: loop invariant might fail after first iteration, cannot prove not Match_At (Needle, Haystack, K) (e.g. when Haystack = (0 => 'NUL', 5 => 'NUL', others => 'SOH') and Haystack'First = 1 and Haystack'Last = 6 and I = 4 and K = 5 and Needle = (0 => 'SOH', 3 => 'SOH', 4 => 'SOH', 6 => 'SOH', others => 'NUL') and Needle'First = 1 and Needle'Last = 2)
string_search.adb:111:81: medium: loop invariant might fail in first iteration, cannot prove not Match_At (Needle, Haystack, K) (e.g. when Haystack = (0 => 'NUL', 2 => 'NUL', others => 'SOH') and Haystack'First = 1 and Haystack'Last = 3 and I = 1 and K = 2 and Needle = (0 => 'SOH', 3 => 'SOH', others => 'NUL') and Needle'First = 1 and Needle'Last = 2)

This is expected. There is a big reasoning gap to go from the postcondition of Make_Bad_Shift to the loop invariant in QS. We are going to use ghost code to close that gap and convince GNATprove that the loop invariant holds in every iteration. What we need to show is that, for every starting position that is skipped (for K in the range I + 1 to I + Bad_Shift(Haystack(I + Needle'Length)) - 1), the needle cannot align with the haystack at that position. In fact, we know exactly at which position these alignments would fail: at the position "I + Needle'Length" in Haystack. Looking at the postcondition of Make_Bad_Shift, this corresponds to position "I + Needle'Length - K + 1" in Needle. Let's write it down just before the loop invariant:

         for K in I + 1 .. I + Bad_Shift(Haystack(I + Needle'Length)) - 1 loop
            pragma Assert (Haystack(I + Needle'Length) /= Needle(I + Needle'Length - K + 1));
            pragma Assert (not Match_At (Needle, Haystack, K));
         end loop;

GNATprove proves the above assertions, using the first one to prove the second one, so we can now accumulate this information in a loop invariant for all values of positions that are skipped:

         for K in I + 1 .. I + Bad_Shift(Haystack(I + Needle'Length)) - 1 loop
            pragma Assert (Haystack(I + Needle'Length) /= Needle(I + Needle'Length - K + 1));
            pragma Loop_Invariant
              (for all L in 1 .. K => not Match_At (Needle, Haystack, L));
         end loop;

With this addition of ghost code, GNATprove proves all checks in QS, including its postcondition and loop invariants. In the final version of that code, I'm using a local ghost procedure Prove_QS instead of inlining the ghost code in the implementation of QS. That way, GNATprove still internally inlines the implementation of Prove_QS to prove QS, but the compiler will completely get rid of the body and call to Prove_QS in the final executable built without assertions:

 function QS (Needle, Haystack : in Text) return Natural is
      Bad_Shift : Shift_Table;
      I : Positive;

      procedure Prove_QS with Ghost is
         Shift : constant Positive := Bad_Shift(Haystack(I + Needle'Length));
      begin
         for K in I + 1 .. I + Shift - 1 loop
            pragma Assert (Haystack(I + Needle'Length) /= Needle(I + Needle'Length - K + 1));
            pragma Loop_Invariant
              (for all L in 1 .. K => not Match_At (Needle, Haystack, L));
         end loop;
      end Prove_QS;

   begin
      --  Preprocessing
      Make_Bad_Shift (Needle, Bad_Shift);

      --  Searching
      I := 1;
      while I <= Haystack'Length - Needle'Length + 1 loop
         if Needle = Haystack(I .. I + (Needle'Last - 1)) then
            return I;
         end if;
         exit when I = Haystack'Length - Needle'Length + 1;

         Prove_QS;

         pragma Loop_Variant (Increases => I);
         pragma Loop_Invariant (I <= Haystack'Length - Needle'Length);
         pragma Loop_Invariant
           (for all K in 1 .. I + Bad_Shift(Haystack(I + Needle'Length)) - 1 => not Match_At (Needle, Haystack, K));

         I := I + Bad_Shift(Haystack(I + Needle'Length));  --  Shift
      end loop;

      return 0;
   end QS;

I also added a loop variant to ensure that the while-loop will terminate. For-loops always terminate in SPARK because the loop index cannot be assigned by the user (contrary to what C allows), but while-loops or plain-loops might not terminate, hence the use of a loop variant to verify their termination.

The code presented in this post is available on GitHub: spec and body. Now a challenge for Frama-C users is to translate back the functional proof of QS in SPARK into C and Frama-C!

The project SPARK-by-Example by Christophe Garion and Jérôme Hugues contains other examples of functionally proven string algorithms, which correspond to the SPARK version of the work done by Jens Gerlach with Frama-C in the ACSL-by-Example project.

]]>
The Adaroombot Project http://blog.adacore.com/the-adaroombot-project Tue, 20 Jun 2017 12:48:00 +0000 Rob Tice http://blog.adacore.com/the-adaroombot-project

Owning an iRobot RoombaⓇ is an interesting experience. For those not familiar, the RoombaⓇ is a robot vacuum cleaner that's about the diameter of a small pizza and short enough to still fit under your bed. It has two independently driven wheels, a small-ish dust bin, a rechargeable battery, and a speaker programmed with pleasant sounding beeps and bloops telling you when it's starting or stopping a cleaning job. You can set it up to clean on a recurring schedule through buttons on the robot or, with the new models, the mobile app. It picks up an impressive amount of dust and dirt, and it makes you wonder how you used to live in a home that was that dirty.

A Project to Learn Ada


I found myself new to AdaCore without any knowledge of the Ada programming language around the same time I acquired a RoombaⓇ for my cats to use as a golf cart when I wasn’t home. In order to really learn Ada, I decided I needed a good project to work on. Having come from an embedded Linux C/C++ background, I decided to do a project involving a Raspberry Pi and something robotic that it could control. It just so happens that iRobot has a STEM initiative robot called the CreateⓇ 2, which is aimed at embedded control projects. That’s how the AdaRoombot project was born.

The first goal of the project was to have a simple Ada program use the CreateⓇ 2’s serial port to perform some control algorithm. Mainly this would require the ability to send commands to the robot and receive feedback information from the robot’s numerous sensors. As part of the CreateⓇ 2 documentation package, there is a PDF detailing the serial port interface called the iRobot CreateⓇ 2 Open Interface Specification.

On the command side of things, there is a simple protocol: each command starts with a one-byte opcode specifying which command is being issued and then is followed by a number of bytes carrying the data associated with the opcode, or the payload. For example, the Reset command has an opcode of 7 and has zero payload bytes. The Set Day/Time command has an opcode of 168 and has a 3-byte payload with a byte specifying the Day, another for the Hour, and another for the Minute. The interface for the Sensor data is a little more complicated. The host has the ability to request data from individual sensors, a list of sensors, or tell the robot to just stream the list over and over again for processing. To make things simple, I chose to just receive all the sensor data on each request.
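
To make the framing concrete, here is a minimal sketch of how a command such as Set Day/Time could be assembled as raw bytes before being written to the serial device. This is an illustration rather than code taken from the project, and the value ranges for Day, Hour and Minute are assumptions to be checked against the Open Interface specification:

with Interfaces; use Interfaces;

package Roomba_Commands is

   type Byte_Array is array (Positive range <>) of Unsigned_8;

   --  Assumed value ranges; the Open Interface specification is the
   --  authoritative reference.
   subtype Day_Number    is Unsigned_8 range 0 .. 6;
   subtype Hour_Number   is Unsigned_8 range 0 .. 23;
   subtype Minute_Number is Unsigned_8 range 0 .. 59;

   Set_Day_Time_Opcode : constant Unsigned_8 := 168;

   --  One-byte opcode followed by its 3-byte payload: Day, Hour, Minute
   function Set_Day_Time
     (Day : Day_Number; Hour : Hour_Number; Minute : Minute_Number)
      return Byte_Array
   is (1 => Set_Day_Time_Opcode, 2 => Day, 3 => Hour, 4 => Minute);

end Roomba_Commands;

The resulting byte array can then be handed to the lowest layer of the program, whose only job is to write it to the serial port file.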

Because we are using a Raspberry Pi, it is quite easy to communicate with a serial port using the Linux tty interface. As with most userspace driver interfaces in Linux, you open a file and read and write byte data to the file. So, from a software design perspective, the lowest level of the program abstraction should take robot commands and transform them into byte streams to write to the file, and conversely read bytes from the file and transform the byte data into sensor packets. The next level of the program should implement some algorithm by interpreting sensor data and transmitting commands to make the robot perform some task, and the highest level of the program should start and stop the algorithm and do some sort of system monitoring.

The high-level control algorithm I used is very simple: drive straight until I hit something, then turn around and repeat. However, the lower levels of the program, where I am interfacing with peripherals, are much more exciting. In order to talk to the serial port, I needed access to file I/O and Linux’s terminal I/O APIs.


Ada has cool features

Ada has a nifty way to interface with the Linux C libraries that can be seen near the bottom of “src/communication.ads”. There I am creating Ada specs for C calls, and then telling the compiler to use the C implementations supplied by Linux using pragma Import. This is similar to using extern in C. I am also using pragma Convention, which tells the compiler to treat Ada records like C structs so that they can be passed into the imported C functions. With this, I have the ability to interface with any C call I want from Ada, which is pretty cool. Here is an example mapping the C select call into Ada:

--  #include <sys/select.h>
--  fd_set represents file descriptor sets for the select function.
--  It is actually a bit array.
type Fd_Set is mod 2 ** 32;
pragma Convention (C, Fd_Set);

--  #include <sys/time.h>
--  time_t tv_sec - number of whole seconds of elapsed time.
--  long int tv_usec - Rest of the elapsed time in microseconds.
type Timeval is record
    Tv_Sec  : C.Int;
    Tv_Usec : C.Int;
end record;
pragma Convention (C, Timeval);

function C_Select (Nfds      : C.Int;
                   Readfds   : access Fd_Set;
                   Writefds  : access Fd_Set;
                   Exceptfds : access Fd_Set;
                   Timeout   : access Timeval)
                   return C.int;
pragma Import (C, C_Select, "select");

The other neat low-level feature to note here can be seen in “src/types.ads”. The record Sensor_Collection is a description of the data that will be received from the robot over the serial port. I am using a feature called a representation clause to tell the compiler where to put each component of the record in memory, and then overlaying the record on top of a byte stream. By doing this, I don’t have to use any bit masks or bit shifts to access individual bits or fields within the byte stream. The compiler has taken care of this for me. Here is an example of a record which consists of Boolean values, or bits in a byte:

type Sensor_Light_Bumper is record
    LT_Bump_Left         : Boolean;
    LT_Bump_Front_Left   : Boolean;
    LT_Bump_Center_Left  : Boolean;
    LT_Bump_Center_Right : Boolean;
    LT_Bump_Front_Right  : Boolean;
    LT_Bump_Right        : Boolean;
end record
 with Size => 8;

for Sensor_Light_Bumper use record
    LT_Bump_Left at 0 range 0 .. 0;
    LT_Bump_Front_Left at 0 range 1 .. 1;
    LT_Bump_Center_Left at 0 range 2 .. 2;
    LT_Bump_Center_Right at 0 range 3 .. 3;
    LT_Bump_Front_Right at 0 range 4 .. 4;
    LT_Bump_Right at 0 range 5 .. 5;
end record;

In this example, LT_Bump_Left is the first bit in the byte, LT_Bump_Front_Left is the next bit, and so on. In order to access these bits, I can simply use dot notation to access members of the record, whereas in C I would have to mask and shift. Components that span multiple bytes can also include an endianness specification. This is useful because on this specific platform data is little endian, but the serial port protocol is big endian. So instead of byte swapping, I can specify certain records as having a big endian mapping. The compiler then handles the swapping.
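
As a rough sketch of what such an endianness specification can look like with GNAT (the type and component names below are made up for illustration and are not taken from the project sources), a record holding a multi-byte value can be declared with a big-endian layout using the Scalar_Storage_Order and Bit_Order aspects:

with System;

package Big_Endian_Example is

   type Unsigned_16 is mod 2 ** 16;

   --  Hypothetical 16-bit sensor value transmitted in big-endian byte
   --  order over the serial link.
   type Big_Endian_Word is record
      Value : Unsigned_16;
   end record
     with Size                 => 16,
          Bit_Order            => System.High_Order_First,
          Scalar_Storage_Order => System.High_Order_First;

   for Big_Endian_Word use record
      Value at 0 range 0 .. 15;
   end record;

end Big_Endian_Example;

Overlaying such a record on the incoming byte stream then yields the component value already in the right byte order, with no explicit swapping in the application code.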

These are some of the really cool low-level features available in Ada. On the high-level programming side, the algorithm development, Ada feels more like C++, but with some differences in philosophy. For instance, certain design patterns are more cumbersome to implement in Ada because of things like the lack of explicit constructors and destructors for Ada objects. But after a small change in mind-set, it was fairly easy to make the robot drive around the office.

Adaroombot Bottom
Adaroombot Top

The code for AdaRoombot, which is available on Github, can be compiled using the GNAT GPL cross compiler for the Raspberry Pi 2 located on the AdaCore Libre site. The directions to build and run the code are included in the README file in the root directory of the repo. The next step is to add some vision processing and make the robot chase a ball down the hallway. Stay tuned….

The code is available on GitHub: here.


Want to start creating and hacking with Ada? We have a great challenge for you!

The Make with Ada competition, hosted by AdaCore, calls on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and offers over €8000 in total prizes. Celebrating its sophomore year, the Make With Ada Competition is designed to promote the benefits of the languages to the software development community at large. For more information and to register, go to makewithada.org.

]]>
GNAT GPL 2017 is out! http://blog.adacore.com/gnat-gpl-2017-is-out Thu, 15 Jun 2017 13:20:09 +0000 Pierre-Marie de Rodat http://blog.adacore.com/gnat-gpl-2017-is-out

For those users of the GNAT GPL edition, we are pleased to announce the availability of the 2017 release of GNAT GPL and SPARK GPL.

SPARK GPL 17 offers improved automation of proofs, thanks to improvements in the underlying prover Alt-Ergo and a finer-grain splitting of conjunctions. Interaction during proof has been improved thanks to the new statistics, display and replay modes. Type invariants from Ada are now also supported in SPARK. Note that the optional provers CVC4 and Z3 are no longer distributed with SPARK GPL 2017, and should be installed separately.

This release also marks the first introduction of “future language” Ada 2020 features:

  • delta aggregates (partial aggregate notation; see the sketch after this list)
  • AI12-0150-1 class-wide invariants now employ class-wide pre-/postcondition-like semantics (static call resolution).
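
For readers who have not seen the notation, here is a minimal sketch of a delta aggregate, using a hypothetical Point record and the draft syntax as implemented in this release (the syntax in the final standard may differ): it builds a copy of an existing value with only the listed components changed.

type Point is record
   X, Y, Z : Float;
end record;

P1 : constant Point := (X => 1.0, Y => 2.0, Z => 3.0);

--  Copy of P1 with only the Y component overridden
P2 : constant Point := (P1 with delta Y => 0.0);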

This release supports the ARM ELF bare metal target, hosted on Windows and Linux, as well as the following native platforms:

  • Mac OS (64 bits)
  • Linux (64 bits)
  • Windows (32 bits)

The GNATemulator technology has been added to the bare metal target, making it easier to develop and test on those platforms.

The compiler toolchain is now based on GCC 6. The native toolchain comes with a Zero Footprint runtime, and the ARM ELF compiler comes with runtimes for a variety of boards, including support for the Raspberry Pi 2.

The latest version of the GPS IDE comes with many bug fixes and enhancements, notably in the areas of debugger integration and support for bare-metal development.

The GPL 2017 release can be downloaded:

Feeling inspired and want to start Making with Ada today? Check out the Make With Ada Competition! http://makewithada.org

]]>
Ada on the first RISC-V microcontroller http://blog.adacore.com/ada-on-the-first-risc-v-microcontroller Tue, 13 Jun 2017 12:30:00 +0000 Fabien Chouteau http://blog.adacore.com/ada-on-the-first-risc-v-microcontroller

Updated July 2018

The RISC-V open instruction set is getting more and more news coverage these days, in particular since the release of the first RISC-V microcontroller from SiFive.

Since the initial release of this blog post we have improved the support of Ada/SPARK on RISC-V and the HiFive1 board.

In GNAT Community Edition 2018, the HiFive1 is now directly supported on Linux. This means that the procedure to use the board is greatly simplified:

  • Download and install GNAT riscv32-elf hosted on Linux. This package contains the RISC-V cross compiler as well as the required Ada run-times
  • Download and install GNAT native for Linux. This package contains the GNAT Programming Studio IDE
  • Download the Ada_Drivers_Library project
  • Start GNAT Programming Studio
  • Click on "File -> Open Project..."
  • Select the "examples/HiFive1/hifive1_example.gpr" file in the Ada_Drivers_Library sources and click Ok
  • Click on the "Build all" icon to compile the example
  • Follow the instructions in the freedom-e-sdk to flash the example on the board using OpenOCD

Feeling inspired and want to start Making with Ada? We have the perfect challenge for you!

The Make with Ada competition, hosted by AdaCore, calls on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and offers over €8000 in total prizes.

]]>
Research Corner - FLOSS Glider Software in SPARK http://blog.adacore.com/research-corner-floss-glider-software-in-spark Sun, 11 Jun 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/research-corner-floss-glider-software-in-spark

Two years ago, we redeveloped the code of a small quadcopter called Crazyflie in SPARK, as a proof-of-concept to show it was possible to prove absence of run-time errors (no buffer overflows, no division by zero, etc.) on such code. Actually, this was done with very modest effort: the rewrite of the stabilization code was all done by an intern in two months. Since then, we have maintained the resulting code as FLOSS on GitHub, and it has been used, for example, by the people involved in the CAP 2018 project as a prototyping platform.

The researchers Martin Becker and Emanuel Regnath have raised the bar by developing the code for the autopilot of a small glider in SPARK in only three months. This time, we are talking about an autonomous drone operating beyond line of sight. In such a limited timeframe, they achieved both a high level of SPARK coverage (the portion of the code in SPARK) and a high level of automatic proof. They also developed their own agile process around SPARK, using scripts that you can find on this blog. They mostly targeted absence of run-time errors (the Silver level of SPARK assurance), but this is already an impressive feat! In particular, they reported on the challenges of proving floating-point computations, a topic we have already talked about on this blog.

What's even more interesting for others tempted to do something similar in academia or in industry is that they have published a paper about their experience at SAFECOMP, presented their work at the Frama-C & SPARK Day, and released their code as FLOSS. And of course they are now targeting a more ambitious project to apply the same techniques with SPARK!

]]>
Research Corner - Floating-Point Computations in SPARK http://blog.adacore.com/research-corner-floating-point-computations-in-spark Thu, 08 Jun 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/research-corner-floating-point-computations-in-spark

It is notoriously hard to prove properties of floating-point computations, including the simpler bounding properties that state safe bounds on the values taken by entities in the program. Thanks to the recent changes in SPARK 17, users can now benefit from much better provability for these programs, by combining the capabilities of different provers. For the harder cases, this requires using ghost code to state intermediate assertions proved by one of the provers, to be used by others. This work is described in an article which was accepted at VSTTE 2017 conference. In this article, we describe the mechanisms for adapting the formulas to prove to different provers based on different technologies to interpret floating-point computations. As we already presented on this blog, the improvements in particular with the abstract interpreter CodePeer and the SMT prover Z3 are very important.

One figure from the article that explains the current situation well is the following:

In order to prove the postcondition of the analyzed procedure (last line), it is necessary to insert 12 intermediate assertions in the source code, each of which is proved by one of the provers provided with SPARK Pro (CVC4, Alt-Ergo, Z3 and CodePeer). The green cells in that figure correspond to provers that prove the corresponding formula, with the time they take in seconds. Some of these assertions require the use of ghost code to define entities only meant for verification.

While this may seem like a lot of work to prove here a single postcondition, you may convince yourself by reading the article (where we give a mathematical argument for the proof) that this is not trivial either. And the improvements we observed with provers not (yet) included in SPARK (the provers AE_fpa and Colibri on the right hand side of the figure) give us much hope that the situation is going to improve a lot in the coming year.

]]>
Frama-C & SPARK Day Slides and Highlights http://blog.adacore.com/frama-c-spark-day-slides-and-highlights Fri, 02 Jun 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/frama-c-spark-day-slides-and-highlights

We had a very successful event gathering the people interested in formal program verification for C programs (with Frama-C) and for Ada programs (with SPARK). This year, 139 people registered and around 110-120 actually came. The slides are on the page of the event, under "Slides".

If you did not attend the introductory presentation by Claude Marché, I recommend it. It presents very clearly the common history of Frama-C and SPARK, and the common research topics and joint projects today that led to this shared event. As Claude says in conclusion:

Frama-C and SPARK share not only a common history but also:


  • A will to transfer academic research to the industry of critical software
  • Common challenges, approaches, technical solutions

Of particular interest for SPARK users are the presentations of Carl Brandon, Peter Chapin, Martin Becker and Stefan Berghofer:

  1. Carl Brandon and Peter Chapin presented their Lunar IceCube satellite project, which was already mentioned on this blog. It's a 15 million dollar project, with a launch valued at 18 million dollars, so Carl and Peter will need all the help they can get from SPARK to develop perfect on-board software! They are trying to open source the core platform on which Lunar IceCube operates, called CubedOS, which isolates applicative threads (14 in their case) from the underlying operating system and Ada runtime. CubedOS is similar to the core Flight System developed at NASA, but it's written in SPARK and the goal is to prove at least correct data flow and absence of run-time errors on the code, possibly even some key properties if time allows.
  2. Martin Becker presented his work on glider software in SPARK. It was a "crazy" project (as he puts it), with such a limited timeframe (3 months) and resources (2 people) that the result is even more impressive. They developed most of the software in SPARK, and achieved both a high level of SPARK coverage (the portion of the code in SPARK) and a high level of automatic proof. They also developed their own agile process around SPARK, using scripts that you can find on this blog.
  3. Stefan Berghofer presented his work on formal verification of cryptographic software in SPARK, using the bridge to interactive prover Isabelle for the more complex properties. His team at secunet, together with their colleagues from University of Rapperswil, has been at the forefront of formal program verification with SPARK for many years. Their Muen separation kernel in SPARK is described in this blog post.

I also liked a lot the presentation of Christophe Garion and Jérôme Hugues. They took a fairly large piece of critical software (10,000 sloc in Ada and 15,000 sloc in C), the PolyORB-Hi runtime for distributed software generated from an AADL description, and applied Frama-C on the C version and SPARK on the Ada version to achieve absence of run-time errors and proof of properties. Their conclusion, which matches the initial assessment they did in 2014, is that it is much easier to achieve high assurance through formal program verification in SPARK than in C, mostly because the language really supports it. What I like is that they did exactly what some reviewers keep asking (in my opinion sometimes mistakenly) when we submit articles describing use of SPARK on industrial projects: that we should also submit a comparison with the same goals achieved with other technologies, on programs in other programming languages. This is rarely feasible. Well, they did it!

Unfortunately, the very interesting presentation by Jean-Marc Mota on experiments on the adoption of SPARK at Thales is not available. This blog post presents the levels of software assurance that we defined for SPARK as the result of these experiments.


]]>
New Guidance for Adoption of SPARK http://blog.adacore.com/new-guidance-for-adoption-of-spark Sat, 27 May 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/new-guidance-for-adoption-of-spark

During 2016, AdaCore collaborated with Thales to carry out multiple experiments in applying SPARK to existing software projects in Ada. We discovered two things during these experiments:

  1. Adoption of formal verification, especially on existing projects, can be achieved in stages.
  2. Specific guidance is essential for adoption, both to define achievable objectives and to address the likely issues that will arise.

The stages we have defined are called Stone, Bronze, Silver, Gold and Platinum. They are described in the document called Implementation Guidance for the Adoption of SPARK that we co-authored with Thales. For each level, we define the benefits but also the impact on process and the costs and limitations. That's the high level view of each level. Then we give detailed guidelines on how to achieve that level on existing or new code.

What we discovered later was that the four highest levels (all except Stone) map well to the verification objectives targeted at different DAL/SIL levels at Altran UK, with the highest levels typically applied only at the highest DAL/SIL levels (DAL A/SIL 4). The levels, guidelines and mapping with DAL/SIL were presented at the recent High Confidence Software and Systems conference.

]]>
DIY Coffee Alarm Clock http://blog.adacore.com/diy-coffee-alarm-clock Tue, 16 May 2017 13:00:00 +0000 Fabien Chouteau http://blog.adacore.com/diy-coffee-alarm-clock

A few weeks ago one of my colleagues shared this kickstarter project : The Barisieur. It’s an alarm clock coffee maker, promising to wake you up with a freshly brewed cup of coffee every morning. I jokingly said “just give me an espresso machine and I can do the same”. Soon after, the coffee machine is in my office. Now it is time to deliver :)

The basic idea is to control the espresso machine from an STM32F469 board and use the beautiful screen to display the clock face and configuration interface.

Hacking the espresso machine

The first step is to be able to control the machine with the 3.3V signal of the microcontroller. To do this, I open the case to get to the two push buttons on the top. Warning! Do not open this kind of appliance if you don’t know what you are doing. First, it can be dangerous; second, these things are not made to be serviceable, so there’s a good chance you will never be able to put it back together.

The push buttons are made with two exposed concentric copper traces on a small PCB and a conductive membrane that closes the circuit when the button is pushed.

I use a multimeter to measure the voltage between two circles of one of the buttons. To my surprise the voltage is quite high, about 16V. So I will have to use a MOSFET transistor to act as an electronic switch rather than just connecting the microcontroller to the espresso machine signals.

I put that circuit on an Arduino proto shield that is then plugged behind the STM32F469 disco board. The only things left to do are to drill a hole for the wires to go out of the machine and to make a couple of metal brackets to attach to the board. Here’s a video showing the entire hacking process.

Writing the alarm clock software

For the clock face and configuration interface I will use Giza, one of my toy projects that I developed to play with the object oriented programming features of Ada. It’s a simplistic/basic UI framework.

Given the resolution of the screen (800x480) and the size of the text I want to display, it will be too slow to use software font rendering. Instead, I will take advantage of the STM32F4’s 2D graphic hardware acceleration (DMA2D) and have some bitmap images embedded in the executable. DMA2D can very quickly copy chunks of memory - typically bitmaps - but also convert them from one format to the other. This project is an opportunity to implement support for indexed bitmaps in the Ada_Drivers_Library.

I also add support for STM32F4’s real time clock (RTC) to be able to keep track of time and date and of course trigger the coffee machine at the time configured by the user.

It’s time to put it all together and ask my SO to perform in the high-budget video that you can see at the beginning of this post :)

The code is available on GitHub: here.


Feeling inspired and want to start Making with Ada today? We have the perfect challenge for you!

The Make with Ada competition, hosted by AdaCore, calls on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and offers over €8000 in total prizes. Celebrating its sophomore year, the Make With Ada Competition is designed to promote the benefits of the languages to the software development community at large. For more information and to register, go to makewithada.org.

]]>
(Many) More Low Hanging Bugs http://blog.adacore.com/many-more-low-hanging-bugs Fri, 05 May 2017 13:00:00 +0000 Yannick Moy http://blog.adacore.com/many-more-low-hanging-bugs

In a previous post, we reported our initial experiments to create lightweight checkers for Ada source code, based on the new Libadalang technology. The two checkers we described discovered 12 issues in the codebase of the tools we develop at AdaCore. In this post, we are reporting on 6 more lightweight checkers, which have discovered 114 new issues in our codebase. These 6 checkers allowed us to detect errors and code quality issues for 4 of them (check_deref_null, check_test_not_null, check_same_test, check_bad_unequal) and refactoring opportunities for 2 of them (check_same_then_else, check_useless_assign). Every checker runs in seconds on our codebase, which made it easy to improve them until the checkers had no false alarms. Currently, none of these checkers uses the recent semantic analysis capability in Libadalang, which might be useful in the future to improve their precision. In each of these checkers, we took inspiration from similar lightweight checkers in other static analysis tools, in particular PVS-Studio and its gallery of real-life examples.

Checkers on Dereference

Our first checker is a favorite of many tools. It checks whether a pointer that has been dereferenced is later tested against the null value. This is suspicious, as we'd expect the sequence of events to be the opposite. This can point either to an error (the pointer should not be dereferenced without a null check) or to a code quality issue (the null test is useless). In fact, we found both when applying the checker to the codebase of our tools. Here's an example of an error found in the GNAT compiler, in g-spipat.adb, where procedure Dump dereferences P.P at line 2088:

   procedure Dump (P : Pattern) is

      subtype Count is Ada.Text_IO.Count;
      Scol : Count;
      --  Used to keep track of column in dump output

      Refs : Ref_Array (1 .. P.P.Index);
      --  We build a reference array whose N'th element points to the
      --  pattern element whose Index value is N.

and then much later at line 2156 it checks for P.P being null:

      --  If uninitialized pattern, dump line and we are done

      if P.P = null then
         Put_Line ("Uninitialized pattern value");
         return;
      end if;

The code was fixed to declare array Refs after we know P.P is not null. And here is an example of code quality issue also in the GNAT compiler, at line 2797 of g-comlin.adb, where parameter Line is dereferenced and then tested against null:

      Sections_List : Argument_List_Access :=
                        new Argument_List'(1 .. 1 => null);
      Found         : Boolean;
      Old_Line      : constant Argument_List := Line.all;
      Old_Sections  : constant Argument_List := Sections.all;
      Old_Params    : constant Argument_List := Params.all;
      Index         : Natural;

   begin
      if Line = null then
         return;
      end if;

The code was fixed by declaring Line as a parameter of a not null access type. In some cases the dereference and the test are both in the same expression, for example in this case in our tool GNATstack, at line 97 of dispatching_calls.adb:

         if
           Static_Class.Derived.Contains (Explored_Method.Vtable_Entry.Class)
              and then
           Explored_Method /= null
              and then

The code was fixed here by checking for non-null before dereferencing Explored_Method. Overall, this checker found 11 errors in our codebase and 9 code quality issues.

A second checker in this category looks for places where a test that a pointer is null dominates a dereference of the same pointer. This is, in general, an indication of a logic error, in particular for complex boolean expressions, as shown by the examples from PVS-Studio gallery. We found no such error in our codebase, which may be an indication of our good test coverage. Indeed, any execution of such code will raise an exception in Ada.

Checkers on Tested Expressions

Our first checker on tested expressions looks for identical subexpressions being tested in a chain of if-elsif statements. It points to either errors or code quality issues. Here is an example of error it found in the GNAT compiler, at line 7380 of sem_ch4.adb:

                  if Nkind (Right_Opnd (N)) = N_Integer_Literal then
                     Remove_Address_Interpretations (Second_Op);

                  elsif Nkind (Right_Opnd (N)) = N_Integer_Literal then
                     Remove_Address_Interpretations (First_Op);
                  end if;

The code was fixed by testing Left_Opnd(N) instead of Right_Opnd(N) in the second test. Overall, this checker found 3 errors in our codebase and 7 code quality issues.

A second checker in this category looks for expressions of the form "A /= B or A /= C", where B and C are different literals; such expressions are always True. In general, "and" is meant instead of "or". This checker found one error in our QGen code generator, at line 675 of himoco-blockdiagramcmg.adb:

               if Code_Gen_Mode /= "Function"
                 or else Code_Gen_Mode /= "Reusable function"
               then
                  To_Flatten.Append (Obj);
               end if;

Checkers for Code Duplication

Our first checker for code duplication looks for identical code in different branches of an if-statement or case-statement. It may point to typos or logical errors, but in our codebase it pointed only to refactoring opportunities. Still, some of these cause code duplication of more than 20 lines of code, for example at line 1023 of be-checks.adb in CodePeer:

            elsif VN_Kind (VN) = Binexpr_VN
              and then Operator (VN) = Logical_And_Op
              and then Int_Sets.Is_In (Big_True, To_Int_Set_Part (Expect))
            then
               --  Recurse to propagate check down to operands of "and"
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (First_Operand (VN)),
                  Srcpos,
                  File_Name,
                  First_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (Second_Operand (VN)),
                  Srcpos,
                  File_Name,
                  Second_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);
...
            elsif VN_Kind (VN) = Binexpr_VN
              and then Operator (VN) = Logical_Or_Op
              and then Int_Sets.Is_In (Big_False, To_Int_Set_Part (Expect))
            then
               --  Recurse to propagate check down to operands of "and"
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (First_Operand (VN)),
                  Srcpos,
                  File_Name,
                  First_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (Second_Operand (VN)),
                  Srcpos,
                  File_Name,
                  Second_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);

or at line 545 of soap-generator-skel.adb in GPRbuild:

                  when WSDL.Types.K_Derived =>

                     if Output.Next = null then
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    => "Result",
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     else
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    =>
                                "Result."
                                  & Format_Name (O, To_String (N.Name)),
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     end if;

                  when WSDL.Types.K_Enumeration =>

                     if Output.Next = null then
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    => "Result",
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     else
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    =>
                                "Result."
                                  & Format_Name (O, To_String (N.Name)),
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     end if;

Overall, this checker found 62 code quality issues in our codebase.

Our last checker looks for useless assignments to a local variable, where the assigned value is never read subsequently. This can be very obvious, as in this case at line 1067 of be-value_numbers-factory.adb in CodePeer:

      Global_Obj.Obj_Id_Number    := Obj_Id_Number (New_Obj_Id);
      Global_Obj.Obj_Id_Number    := Obj_Id_Number (New_Obj_Id);

or more hidden, such as this case at line 895 of bt-xml-reader.adb, still in CodePeer:

                              if Next_Info.Sloc.Column = Msg_Loc.Column then
                                 Info := Next_Info;
                                 Elem := Next_Cursor;
                              end if;
                              Elem := Next_Cursor;

Overall, this checker found 9 code quality issues in our codebase.

Setup Recipe 

So you actually want to try the above scripts on your own codebase? This is possible right now with your latest GNAT Pro release or the latest GPL release for community & academic users! Just follow the instructions we described in the Libadalang repository and you will be able to run the scripts inside your favorite Python2 interpreter.

Conclusion

Summing over the 8 checkers that we have implemented so far (referenced in this post and a previous one), we've found and fixed 24 errors and 102 code quality issues in our codebase. This definitely shows that these kinds of checkers are worth integrating into static analysis tools, and we look forward to integrating these and more into our static analyzer CodePeer for Ada programs.

Another lesson is that all of these checkers were developed in a couple of hours each, thanks to the powerful Python API available with Libadalang. While we had to develop some boilerplate to traverse the AST in various directions and multiple workarounds to make up for the absence of semantic analysis in the (then available) version of Libadalang, this was relatively little work, and work that we can expect to share across similar checkers in the future. We're now looking forward to the version of Libadalang with on-demand semantic analysis, which will allow us to create even more powerful and useful checkers.

[cover image by Courtney Lemon]

]]>
A Usable Copy-Paste Detector in A Few Lines of Python http://blog.adacore.com/a-usable-copy-paste-detector-in-few-lines-of-python Tue, 02 May 2017 13:00:00 +0000 Yannick Moy http://blog.adacore.com/a-usable-copy-paste-detector-in-few-lines-of-python

After we created lightweight checkers based on the recent Libadalang technology developed at AdaCore, a colleague gave us the challenge of creating a copy-paste detector based on Libadalang. It turned out to be both easier than anticipated, and much more efficient and effective than we could have hoped for. In the near future, we plan to use this new detector to refactor the codebase of some of our tools.

First Attempt: Hashes and Repeated Suffix Trees

Our naive strategy for detecting copy-paste was to reduce it to a string problem, in order to benefit from the existing efficient algorithms on string problems. Our reasoning was that each line of code could be represented by a hash code, so that a file could be represented by a string of hash codes. After a few Web searches, we found the perfect match for this translated problem, on the Wikipedia page for the longest repeated substring problem, which helpfully points to a C implementation that solves this problem efficiently based on Suffix Trees, a data structure that efficiently represents all suffixes of a string (say, "adacore", "dacore", "acore", "core", "ore", "re" and "e" if your string is "adacore").

So we came up with an implementation in Python of the copy-paste detector, made up of 3 steps:

Step 1: Transform the source code into a string of hash codes

This is a simple traversal of the AST produced by Libadalang, producing roughly one hash for each logical line of code. Traversal is made very easy with the API offered by Libadalang, as each node of the AST is iterable in Python to get its children. For example, here is the default case of the encoding function producing the hash codes:

        # Default case, where we hash the kind of the first token for the node,
        # followed by encodings for its subnodes.
        else:
            return ([Code(hash(node.token_start.kind), node, f)] +
                    list(itertools.chain.from_iterable(
                        [enc(sub) for sub in node])))

We recurse here on the AST to concatenate the substrings of hash codes computed for subnodes. The leaf case is obtained for expressions and simple statements, for which we compute a hash of a string obtained from the list of tokens for the node. The API of Libadalang makes it very easy, using again the ability to iterate over a node to get its children. For example, here is the default case of the function computing the string from a node:

            return ' '.join([node.token_start.kind]
                            + [strcode(sub) for sub in node])

We recurse here on the AST to concatenate the kind of the first token for the node with the substrings computed for subnodes. Of course, we are not interested in representing each line of code exactly. For example, we represent all identifiers by a special wildcard character $, in order to detect copy-pastes even when identifiers are not the same.

Step 2: Construct the Suffix Tree for the string of hash codes

The algorithm by Ukkonen is quite subtle, but it was easy to translate an existing C implementation into Python. For those curious enough, a very instructive series of 6 blog posts leading to this implementation describes Ukkonen's algorithm in detail.

Step 3: Compute the longest repeated substring in the string of hash codes

For that, we look at the internal node of the Suffix Tree constructed above with the greatest height (computed in number of hashes). Indeed, this internal node corresponds to two or more suffixes that share a common prefix. For example, with string "adacore", there is a single internal node, which corresponds to the common prefix "a" for suffixes "adacore" and "acore", after which the suffixes are different. The children of this internal node in the Suffix Tree contain the information of where the suffixes start in the string (position 0 for "adacore" and 2 for "acore"), so we can compute positions in the string of hash codes where hashes are identical and for how many hash codes. Then, we can translate this information into files, lines of code and number of lines.

The steps above only allow us to detect the longest copy-paste across a codebase (in terms of number of hash codes, which may differ from the number of lines of code). Initially, we did not find a better way to detect all copy-pastes longer than a certain limit than to repeat steps 2 and 3 after removing from the string of hash codes those that correspond to the copy-paste previously detected. This algorithm ran in about one hour on the full codebase of GPS, consisting of 350 ksloc (as counted by sloccount), and it reported both very valuable copy-pastes of more than 100 lines of code and spurious ones. To be clear, the spurious ones were not bugs in the implementation, but limitations of the algorithm, which captured "copy-pastes" that were legitimate duplications of similar lines of code. Then we improved it.

Improvements: Finer-Grain Encoding and Collapsing

The imprecisions of our initial algorithm came mostly from two sources: it was sometimes ignoring too much of the source code, and sometimes too little. That was the case in particular for the abstraction of all identifiers as the wildcard character $, which led to spurious copy-pastes where the identifiers were semantically meaningful and could not be replaced by any other identifier. We fixed that by distinguishing local identifiers that are abstracted away from global identifiers (from other units) that are preserved, and by preserving all identifiers that could be the names of record components (that is, used in a dot notation like Obj.Component). Another example of too much abstraction was that we abstracted all literals by their kind, which again led to spurious copy-pastes (think of large aggregates defining the value of constants). We fixed that by preserving the value of literals.

As an example of too little abstraction, we got copy-pastes that consisted mostly of sequences of small 5-to-10-line subprograms, which could not be refactored usefully to share common code. We fixed that by collapsing sequences of such subprograms into a single hash code, so that their relative importance towards finding large copy-pastes was reduced. We made various other adjustments to the encoding function to modulate the importance of various syntactic constructs, simply by producing more or fewer hash codes for a given construct. An interesting adjustment consisted in ignoring the closing tokens of a construct (like the "end Proc;" at the end of a procedure) to avoid copy-pastes that start on such meaningless points. This seems to be a typical shortcoming of token-based approaches, which our hash-based approach solves easily by simply not producing a hash for such tokens.

After these various improvements, the analysis of GPS codebase came down to 2 minutes, an impressive improvement from the initial one hour! The code for this version of the copy-paste detector can be found in the GitHub repository of Libadalang.

Optimizations: Suffix Arrays, Single Pass 

To improve on the above running time, we looked for alternative algorithms for performing the same task. And we found one! Suffix Arrays are an alternative to Suffix Trees that is simpler to implement, and we saw that we could use them to generate all copy-pastes without regenerating the underlying data structure after each copy-paste found. We implemented in Python the C++ algorithm found in this paper, and the code for this alternative implementation can be found in the GitHub repository of Libadalang. This version found the same copy-pastes as the previous one, as expected, with a running time of 1 minute for the analysis of the GPS codebase, a 50% improvement!

Looking more closely at the bridges between Suffix Trees and Suffix Arrays (essentially you can reconstruct one from the other), we also realized that we could use the same one-pass algorithm to detect copy-pastes with Suffix Trees, instead of recreating each time the Suffix Tree for the text where the copy-paste just detected had been removed. The idea is that, instead of repeatedly detecting the longest copy-paste on a newly created Suffix Tree, we traverse the initial Suffix Tree and issue all copy-pastes with a maximal length, where copy-pastes that are not maximal can be easily recognized by checking the previous hash in the candidate suffixes. For example, if two suffixes for this copy-paste start at indexes 5 and 10 in the string of hashes, we check the hashes at indexes 4 and 9: if they are the same, then the copy-paste is not maximal and we do not report it. With this change, the running time for our original algorithm is just above 1 minute for the analysis of GPS codebase, i.e. close to the alternative implementation based on Suffix Arrays.

So we ended up with two implementations for our copy-paste detector, one based on Suffix Trees and one based on Suffix Arrays. We'll need to experiment further to decide which one to keep in a future plug-in for our GPS and GNATbench IDEs.

Results on GPS

The largest source base on which we tried this tool is our IDE GNAT Programming Studio (GPS). This is about 350'000 lines of source code. It uses object orientation and tends to have medium-sized subprograms (20 to 30 lines), although there are some much longer ones. In fact, we aim to reduce the size of the longest subprograms, and a tool like gnatmetric will help find them. We are happy to report that most of the code duplication occurred in recent code, as we are transitioning and rewriting some of the old modules.

Nonetheless, the tool helped detect a number of duplicate chunks, with very few spurious detections (corresponding to cases where the tool reports a copy-paste that turns out to be only similar code).

Let's take a look at three copy-pastes that were detected.

Example 1: Intended temporary duplication of code

gps/gvd/src/debugger-base_gdb-gdb_cli.adb:3267:1: copy-paste of 166 lines detected with code from line 3357 to line 3522 in file gps/gvd/src/debugger-base_gdb-gdb_mi.adb

This is a large subprogram used to handle the Memory view in GPS. We have recently started changing the code to use the gdb MI protocol to communicate with gdb, rather than simulate an interactive session. Since the intent is to remove the old code, the duplication is not so bad, but is useful in reminding us we need to clean things up here, preferably soon before the code diverges too much.

Example 2: Unintended almost duplication of code

gps/builder/core/src/commands-builder-scripts.adb:266:1: copy-paste of 21 lines detected with code from line 289 to line 309

This code is in the handling of the python functions GPS.File.compile() and GPS.File.make(). Interestingly enough, these two functions were not doing the same thing initially, and are also documented differently (make attempts to link the file after compiling it). Yet the code is almost exactly the same, except that GPS does not spawn the same build target (see comment in the code below). So we could definitely use an if-expression here to avoid the duplication of the code.

      elsif Command = "compile" then
         Info := Get_Data (Nth_Arg (Data, 1, Get_File_Class (Kernel)));
         Extra_Args := GNAT.OS_Lib.Argument_String_To_List
           (Nth_Arg (Data, 2, ""));

         Builder := Builder_Context
           (Kernel.Module (Builder_Context_Record'Tag));

         Launch_Target (Builder      => Builder,
                        Target_Name  => Compile_File_Target,    -- <<< use Build_File_Target here for "make"
                        Mode_Name    => "",
                        Force_File   => Info,
                        Extra_Args   => Extra_Args,
                        Quiet        => False,
                        Synchronous  => True,
                        Dialog       => Default,
                        Via_Menu     => False,
                        Background   => False,
                        Main_Project => No_Project,
                        Main         => No_File);

         Free (Extra_Args);

The tool could be slightly more helpful here by highlighting the exact differences between the two blocks. As the blocks get longer, it is harder to spot a change in one identifier (as is the case here). This is where integration into our IDEs GPS and GNATbench would be useful, possibly with some support for automatic refactoring of the code, also based on Libadalang.

Example 3: Unintended exact duplication of code

gps/code_analysis/src/codepeer-race_details_models.adb:39:1: copy-paste of 20 lines detected with code from line 41 to line 60 in file gps/code_analysis/src/codepeer-race_summary_models.adb

This one is an exact duplication of a function. The tool could perhaps be slightly more helpful by showing those exact duplicates first, since they will often be the easiest ones to remove, simply by moving the function to the spec.

   function From_Iter (Iter : Gtk.Tree_Model.Gtk_Tree_Iter) return Natural is
      pragma Warnings (Off);
      function To_Integer is
        new Ada.Unchecked_Conversion (System.Address, Integer);
      pragma Warnings (On);

   begin
      if Iter = Gtk.Tree_Model.Null_Iter then
         return 0;

      else
         return To_Integer (Gtk.Tree_Model.Utils.Get_User_Data_1 (Iter));
      end if;
   end From_Iter;

Setup Recipe

So you actually want to try the above scripts on your own codebase? This is possible right now with your latest GNAT Pro release or the latest GPL release for community & academic users! Just follow the instructions we described in the Libadalang repository, and you will then be able to run the scripts inside your favorite Python2 interpreter.

Conclusion

What we took from this experiment is that (1) it is easier than you think to develop a copy-paste detector for your favorite language, and (2) technology like Libadalang is key to facilitating the experiments that lead to an efficient and effective detector. On the algorithmic side, we think it's very beneficial to use a string of hash codes as an intermediate representation, as it lets us control precisely how much each language construct contributes.

Interestingly, we did not find other tools or articles describing this type of intermediate approach between token-based approaches and syntactic approaches, which provides an even faster analysis than token-based approaches, while avoiding their typical pitfalls, and allows fine-grained control based on the syntactic structure, without suffering from the long running time typical of syntactic approaches.

We look forward to integrating our copy-paste detector in GPS and GNATbench, obviously initially for Ada, but possibly for other languages as well (for example C and Python) as progress on langkit, Libadalang's underlying technology, allows. The integration of Libadalang in GPS was completed not long ago, so it's easier than ever.

]]>
VerifyThis Challenge in SPARK http://blog.adacore.com/verifythis-challenge-in-spark Fri, 28 Apr 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/verifythis-challenge-in-spark

This year again, the VerifyThis competition took place as part of ETAPS conferences. This is the occasion for builders and users of formal program verification platforms to use their favorite tools on common challenges. The first challenge this year was a good fit for SPARK, as it revolves around proving properties of an imperative sorting procedure.

I am going to use this challenge to show how one can reach different levels of software assurance with SPARK. I'm referring here to the five levels of software assurance that we have used in our guidance document with Thales:

  • Stone level - valid SPARK
  • Bronze level - initialization and correct data flow
  • Silver level - absence of run-time errors (AoRTE)
  • Gold level - proof of key integrity properties
  • Platinum level - full functional proof of requirements

Stone level - valid SPARK

We start with a simple translation in Ada of the simplified variant of pair insertion sort given on page 2 of the challenge sheet:

package Pair_Insertion_Sort with
  SPARK_Mode
is
   subtype Index is Integer range 0 .. Integer'Last-1;
   type Arr is array (Index range <>) of Integer
     with Predicate => Arr'First = 0;

   procedure Sort (A : in out Arr);

end Pair_Insertion_Sort;

package body Pair_Insertion_Sort with
  SPARK_Mode
is
   procedure Sort (A : in out Arr) is
      I, J, X, Y, Z : Integer;
   begin
      I := 0;
      while I < A'Length-1 loop
         X := A(I);
         Y := A(I+1);
         if X < Y then
            Z := X;
            X := Y;
            Y := Z;
         end if;

         J := I - 1;
         while J >= 0 and then A(J) > X loop
            A(J+2) := A(J);
            J := J - 1;
         end loop;
         A(J+2) := X;

         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            J := J - 1;
         end loop;
         A(J+1) := Y;
         I := I+2;
      end loop;

      if I = A'Length-1 then
         Y := A(I);
         J := I - 1;
         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            J := J - 1;
         end loop;
         A(J+1) := Y;
      end if;
   end Sort;

end Pair_Insertion_Sort;

Stone level is reached immediately on this code, as it is in the SPARK subset of Ada.

Bronze level - initialization and correct data flow

Bronze level is also reached immediately, although the flow analysis in SPARK detected a problem the first time I ran it:

pair_insertion_sort.adb:35:39: medium: "Y" might not be initialized
pair_insertion_sort.adb:39:20: medium: "Y" might not be initialized

The problem was that the line initializing Y to A(I) near the end of the program was missing from my initial version. As I copied the algorithm in pseudo-code from the PDF and removed the comments in the badly formatted result, I also removed that line of code! After restoring that line, flow analysis did not complain anymore: I had reached Bronze level.

Silver level - absence of run-time errors (AoRTE)

Then comes Silver level, which was suggested as an initial goal in the challenge: proving absence of runtime errors. As the main reason for run-time errors in this code is the possibility of indexing outside of array bounds, we need to provide bounds for the values of variables I and J used to index A. As these accesses are performed inside loops, we need to do so in loop invariants. Exactly 5 loop invariants are needed here, and with these GNATprove can prove the absence of run-time errors in the code in 7 seconds on my machine:

package body Pair_Insertion_Sort with
  SPARK_Mode
is
   procedure Sort (A : in out Arr) is
      I, J, X, Y, Z : Integer;
   begin
      I := 0;
      while I < A'Length-1 loop
         X := A(I);
         Y := A(I+1);
         if X < Y then
            Z := X;
            X := Y;
            Y := Z;
         end if;

         J := I - 1;
         while J >= 0 and then A(J) > X loop
            A(J+2) := A(J);
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            J := J - 1;
         end loop;
         A(J+2) := X;

         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            J := J - 1;
         end loop;
         A(J+1) := Y;

         pragma Loop_Invariant (I in 0 .. A'Length-2);
         pragma Loop_Invariant (J in -1 .. A'Length-3);
         I := I+2;
      end loop;

      if I = A'Length-1 then
         Y := A(I);
         J := I - 1;
         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            pragma Loop_Invariant (J in 0 .. A'Length-2);
            J := J - 1;
         end loop;
         A(J+1) := Y;
      end if;
   end Sort;

end Pair_Insertion_Sort;

Gold level - proof of key integrity properties

Then comes Gold level, which is the first task of the verification challenge: proving that A is sorted on return. This can be expressed easily with a ghost function Sorted:

 function Sorted (A : Arr; I, J : Integer) return Boolean is
      (for all K in I .. J-1 => A(K) <= A(K+1))
   with Ghost,
        Pre => J > Integer'First
          and then (if I <= J then I in A'Range and J in A'Range);

   procedure Sort (A : in out Arr) with
     Post => Sorted (A, 0, A'Length-1);

Then, we need to augment the previous loop invariants with enough information to prove this sorting property. Let's look at the first (inner) loop. The invariant of this loop is that it maintains the array sorted up to the current high bound I+1, except for a hole of one index at J+1. That's easily expressed with the ghost function Sorted:

            pragma Loop_Invariant (Sorted (A, 0, J));
            pragma Loop_Invariant (Sorted (A, J+2, I+1));

For these loop invariants to be proved, we need to know that the next value possibly passed across the hole at the next iteration (from index J-1 to J+1) is lower than the value currently at J+2. Well, we know that the value at index J-1 is lower than the value at index J thanks to the property Sorted(A,0,J). All we need to add is that A(J+2)=A(J):

 pragma Loop_Invariant (A(J+2) = A(J));

This proves the loop invariant. Next, after the loop, value X is inserted at index J+2 in the array. For the sorting property to hold from J+2 upwards after the loop, X needs to be less than the value at J+2 in the loop invariant. As A(J+2) and A(J) are equal, we can write:

 pragma Loop_Invariant (A(J) > X);

All the other loops are similar. With these additional loop invariants, GNATprove can prove the sorting property in the code in 12 seconds on my machine:

package body Pair_Insertion_Sort with
  SPARK_Mode
is
   procedure Sort (A : in out Arr) is
      I, J, X, Y, Z : Integer;
   begin
      I := 0;
      while I < A'Length-1 loop
         X := A(I);
         Y := A(I+1);
         if X < Y then
            Z := X;
            X := Y;
            Y := Z;
         end if;

         J := I - 1;
         while J >= 0 and then A(J) > X loop
            A(J+2) := A(J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J));
            pragma Loop_Invariant (Sorted (A, J+2, I+1));
            pragma Loop_Invariant (A(J+2) = A(J));
            pragma Loop_Invariant (A(J) > X);
            J := J - 1;
         end loop;
         A(J+2) := X;

         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J));
            pragma Loop_Invariant (Sorted (A, J+1, I+1));
            pragma Loop_Invariant (A(J+1) = A(J));
            pragma Loop_Invariant (A(J) > Y);
            J := J - 1;
         end loop;
         A(J+1) := Y;

         --  loop invariant for absence of run-time errors
         pragma Loop_Invariant (I in 0 .. A'Length-2);
         --  loop invariant for sorting
         pragma Loop_Invariant (J in -1 .. A'Length-3);
         pragma Loop_Invariant (Sorted (A, 0, I+1));
         I := I+2;
      end loop;

      if I = A'Length-1 then
         Y := A(I);
         J := I - 1;
         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-2);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J));
            pragma Loop_Invariant (Sorted (A, J+1, A'Length-1));
            pragma Loop_Invariant (A(J+1) = A(J));
            pragma Loop_Invariant (A(J) > Y);
            J := J - 1;
         end loop;
         A(J+1) := Y;
      end if;
   end Sort;

end Pair_Insertion_Sort;

Platinum level - full functional proof of requirements

Then comes Platinum level, which is the second task of the verification challenge: proving that A on return is a permutation of its value on entry. For that, we are going to use the ghost function Is_Perm that my colleague Claire Dross presented in this blog post, which expresses that the number of occurrences of any integer in arrays A and B coincides (that is, they represent the same multiset, which is a way to express that they are permutations of one another):

function Is_Perm (A, B : Arr) return Boolean is
     (for all E in Integer => Occ (A, E) = Occ (B, E));

and the procedure Swap that she presented in the same blog post, for which I simply give the contract here:

  procedure Swap (Values : in out Arr;
                   X      : in     Index;
                   Y      : in     Index)
   with
     Pre  => X in Values'Range
       and then Y in Values'Range
       and then X /= Y,
     Post => Is_Perm (Values'Old, Values)
       and then Values (X) = Values'Old (Y)
       and then Values (Y) = Values'Old (X)
       and then (for all Z in Values'Range =>
                   (if Z /= X and Z /= Y then Values (Z) = Values'Old (Z)))
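
Occ itself also comes from that post; as a rough intuition only (this sketch is not the actual SPARK definition, which is written in a form amenable to proof), Occ simply counts how many times a value appears in the array:

   function Occ (A : Arr; E : Integer) return Natural is
      Count : Natural := 0;
   begin
      for K in A'Range loop
         if A (K) = E then
            Count := Count + 1;   --  one more occurrence of E found
         end if;
      end loop;
      return Count;
   end Occ;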

The only changes I made with respect to her initial version were to change Natural to Integer in various places, as arrays in our challenge store integers instead of natural numbers. In order to use the same strategy that she showed for selection sort on our pair insertion sort, we need to rewrite the algorithm a bit to swap array cells (as suggested in the challenge specification). For example, instead of the assignment in the first (inner) loop:

  A(J+2) := A(J);

we now have:

 Swap (A, J+2, J);

This has an effect on the loop invariants seen so far, which must be slightly modified. Then, we need to express that every loop maintains in A a permutation of the entry value for A. For that, we create a ghost constant B that stores the entry value of A:

B : constant Arr(A'Range) := A with Ghost;

and use this constant in loop invariants of the form:

pragma Loop_Invariant (Is_Perm (B, A));

We also need to express that the values X and Y that are pushed down the array are indeed the ones found around index J in all loops. For example in the first (inner) loop:

pragma Loop_Invariant ((A(J) = X and A(J+1) = Y) or (A(J) = Y and A(J+1) = X));

With these changes, GNATprove can prove the permutation property in the code in 27 seconds on my machine (including all the ghost code copied from Claire's blog post):

   procedure Sort (A : in out Arr) is
      I, J, X, Y, Z : Integer;
      B : constant Arr(A'Range) := A with Ghost;
   begin
      I := 0;
      while I < A'Length-1 loop
         X := A(I);
         Y := A(I+1);
         if X < Y then
            Z := X;
            X := Y;
            Y := Z;
         end if;

         J := I - 1;
         while J >= 0 and then A(J) > X loop
            Swap (A, J+2, J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J-1));
            pragma Loop_Invariant (Sorted (A, J+2, I+1));
            pragma Loop_Invariant (if J > 0 then A(J+2) >= A(J-1));
            pragma Loop_Invariant (A(J+2) > X);
            --  loop invariant for permutation
            pragma Loop_Invariant (Is_Perm (B, A));
            pragma Loop_Invariant ((A(J) = X and A(J+1) = Y) or (A(J) = Y and A(J+1) = X));
            J := J - 1;
         end loop;
         if A(J+2) /= X then
            Swap (A, J+2, J+1);
         end if;

         while J >= 0 and then A(J) > Y loop
            Swap (A, J+1, J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J-1));
            pragma Loop_Invariant (Sorted (A, J+1, I+1));
            pragma Loop_Invariant (if J > 0 then A(J+1) >= A(J-1));
            pragma Loop_Invariant (A(J+1) > Y);
            --  loop invariant for permutation
            pragma Loop_Invariant (Is_Perm (B, A));
            pragma Loop_Invariant (A(J) = Y);
            J := J - 1;
         end loop;

         --  loop invariant for absence of run-time errors
         pragma Loop_Invariant (I in 0 .. A'Length-2);
         --  loop invariant for sorting
         pragma Loop_Invariant (J in -1 .. A'Length-3);
         pragma Loop_Invariant (Sorted (A, 0, I+1));
         --  loop invariant for permutation
         pragma Loop_Invariant (Is_Perm (B, A));
         I := I+2;
      end loop;

      if I = A'Length-1 then
         Y := A(I);
         J := I - 1;
         while J >= 0 and then A(J) > Y loop
            Swap (A, J+1, J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-2);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J-1));
            pragma Loop_Invariant (Sorted (A, J+1, A'Length-1));
            pragma Loop_Invariant (if J > 0 then A(J+1) >= A(J-1));
            pragma Loop_Invariant (A(J+1) > Y);
            --  loop invariant for permutation
            pragma Loop_Invariant (Is_Perm (B, A));
            pragma Loop_Invariant (A(J) = Y);
            J := J - 1;
         end loop;
      end if;
   end Sort;

The complete code for this challenge can be found on GitHub.

Conclusions

Two features of SPARK were particularly useful for debugging unprovable properties. The first was counterexamples, which produce messages such as the following:

pair_insertion_sort.adb:20:16: medium: array index check might fail (e.g. when A = (others => 1) and A'Last = 3 and J = 2)

Here it allowed me to realize that the upper bound I set for J was too high, so that J+2 was outside of A's bounds.

The second very useful feature was the ability to execute assertions and contracts, issuing messages such as the following:

raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : Loop_Invariant failed at pair_insertion_sort.adb:51

Here it allowed me to realize that a loop invariant which I was trying to prove was in fact incorrect!
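
For instance, here is a hypothetical test driver (assuming Arr is declared as an unconstrained array of Integer, as suggested by the challenge code above) that exercises Sort; compiling the whole program with assertion checks enabled (for example with GNAT's -gnata switch) makes the contracts and loop invariants execute at run time:

with Pair_Insertion_Sort; use Pair_Insertion_Sort;

procedure Test_Sort is
   A : Arr := (3, 1, 2, 5, 4, 0);
begin
   Sort (A);
   --  check that the result is sorted, mirroring the Sorted ghost function
   pragma Assert (for all K in A'First .. A'Last - 1 => A (K) <= A (K + 1));
end Test_Sort;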

As this challenge shows, the five levels of software assurance in SPARK do not require the same level of effort at all. Stone and Bronze levels are rather easy to achieve on small codebases (although they might require refactoring on large codebases), Silver level requires some modest effort, Gold level requires more expertise to drive automatic provers, and finally Platinum level requires a much larger effort than all the previous levels (including possibly some rewriting of the algorithm to make automatic proof possible like here).

Thanks to the organisers of this year's VerifyThis competition, and congrats to the winning teams, two of which used Why3!

]]>
GPS for bare-metal developers http://blog.adacore.com/gps-for-bare-metal-development Wed, 19 Apr 2017 12:57:03 +0000 Anthony Leonardo Gracio http://blog.adacore.com/gps-for-bare-metal-development

In my previous blog article, I exposed some techniques that helped me rewrite the Crazyflie’s firmware from C into Ada and SPARK 2014, in order to improve its safety.

I was still an intern at that time and, back then, the support for bare-metal development in GPS was fairly minimal: daily activities like flashing and debugging my own firmware on the Crazyflie were a bit painful because they required going outside of GPS.

This is not the case anymore. GPS now comes with a number of features regarding bare-metal development that make it very easy for newcomers (as I was) to develop their own software for a particular board.

Bare-metal Holy Grail: Build, Flash, Debug

Building your modified software in order to flash it or debug it on a board is a very common workflow in bare-metal development. GPS offers support for all these different steps, allowing you to perform them with a single click.

In particular, GPS now supports two different tools for connecting to your remote board in order to flash and/or debug it:

  • ST-Link utility tools (namely st-util and st-flash) for STM32-based boards

  • OpenOCD, a connection tool supporting various types of boards and probes, specifically the ones that use a JTAG interface

Once installed on your host, using these tools in order to flash or debug your project directly from GPS is very easy. As pictures are worth a thousand words, here is a little tutorial video showing how to set up your bare-metal project from GPS in order to build, flash and debug it on a board:



Monitoring the memory usage

When it comes to bare-metal development, flashing and debugging your project on the targeted board is already a pretty advanced step: it means that you have already been able to compile and, above all, to link your software correctly.

The linking phase can be a real pain due to the limited memory resources of these boards: writing software that does not fit in the board's available memory can happen pretty quickly as your project grows.

To address these potential issues, a Memory Usage view has been introduced. By default, this view is automatically spawned each time you build your executable and displays a complete view of the static memory usage consumed by your software, even when the linking phase has failed. The Memory Usage view uses a map file generated by the GNU ld linker to report the memory usage consumed at three different levels:

  1. Memory regions, which correspond to the MEMORY blocks defined in the linker script used to link the executable
  2. Memory sections (e.g: .data, .bss etc.)
  3. Object files

Having a complete report of the memory usage consumed at each level makes it very convenient to identify which parts of your software are consuming too much memory for the available hardware resources. This is the case in the example below, where we can see that the .bss section of our executable is just too big for the board's RAM, due to the huge uninitialized array declared in leds.ads.
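
For illustration only, the kind of declaration that blows up the .bss section could look like the following sketch (a hypothetical leds.ads, not the actual one): a large array with no initialization is placed in .bss and must fit entirely in the board's RAM.

package Leds is

   type Intensity is mod 2**8;

   --  Uninitialized data: it ends up in the .bss section, so it must fit in RAM
   Frame_Buffer : array (1 .. 512 * 1024) of Intensity;

end Leds;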

Conclusion

Bare-metal development support has been improved greatly in GPS, making it easier for newcomers to build, flash and debug their software. Moreover, the Memory Usage view allows the bare-metal developers to clearly identify the cause of memory overflows. 

We don't want to stop our efforts regarding bare-metal development support, so don't hesitate to submit improvement ideas on our GitHub repository!

]]>
User-friendly strings API http://blog.adacore.com/user-friendly-strings-api Mon, 10 Apr 2017 13:47:00 +0000 Emmanuel Briot http://blog.adacore.com/user-friendly-strings-api

User friendly strings API

In a previous post, we described the design of a new strings package, with improved performance compared to the standard Ada unbounded strings implementation. That post focused on various programming techniques used to make that package as fast as possible.

This post is a followup, which describes the design of the user API for the strings package - showing various solutions that can be used in Ada to make user-friendly data structures.

Tagged types

One of the features added in Ada 2005 is a prefix dot notation when calling primitive operations of tagged types. For instance, if you have the following declarations, you can call the subprograms Slice in one of two ways:

declare
   type XString is tagged private;
   procedure Slice (Self : in out XString; Low, High : Integer);   --  a primitive operation
   S : XString;
begin
   S.Slice (1, 2);      --  the two calls do the same thing, here using a prefix notation
   Slice (S, 1, 2);
end;

This is a very minor change. But in practice, people tend to use the prefix notation because it feels more "natural" to those who have read or written code in other programming languages.

In fact, it is so popular that there is some demand to extend this prefix notation, in some future version of the language, to types other than tagged types. Using a tagged type has a cost, since it makes variables slightly bigger (they now have a hidden access field).

In our case, though, an XString was already a tagged type since it is also controlled, so there is no additional cost.

Indexing

The standard Ada strings are quite convenient to use (at least once you understand that you should use declare blocks since you need to know their size in advance). For instance, one can access characters with expressions like:

A : constant Character := S (2);      --  assuming 2 is a valid index
B : constant Character := S (S'First + 1);   --  better, of course
C : constant Character := S (S'Last);

The first line of the above example hides one of the difficulties for newcomers: strings can have any index range, so using just "2" is likely to be wrong here. Instead, the second line should be used. The GNATCOLL strings avoid this pitfall by always indexing from 1. As was explained in the first blog post, this is both needed for the code (so that internally we can reallocate strings as needed without changing the indexes manipulated by the user), and more intuitive for a lot of users.

The Ada unbounded string has a similar approach, and all strings are indexed starting at 1. But you can't use the same code as above, and instead you need to write the more cumbersome: 

S := To_Unbounded_String (...);
A : constant Character := Element (S, 2);     --  second character of the string, always
B : constant Character := Ada.Strings.Unbounded.Element (S, 2);   --  when not using use-clauses

GNATCOLL.Strings takes advantage of some new aspects introduced in Ada 2012 to provide custom indexing functions. So the spec looks like:

type XString is tagged private
    with Constant_Indexing  => Get;

function Get (Self : XString; Index : Positive) return Character;

S : XString := ...;
A : constant Character := S (2);    --   always the second character

After adding this simple Constant_Indexing aspect, we are now back to the simple syntax we were using for standard String, using parentheses to point to a specific character. But here we also know that "2" is always the second character, so we do not need to use the 'First attribute.

Variable indexing

There is a similar aspect, named Variable_Indexing, which can be used to modify a specific character of the string, as we would do with a standard string. So we can write:

type XString is tagged private
    with Variable_Indexing  => Reference;

type Character_Reference (Char : not null access Char_Type) is limited null record
    with Implicit_Dereference => Char;

function Reference
    (Self  : aliased in out XString;
     Index : Positive) return Character_Reference;

S : XString := ...;
S (2) := 'A';
S (2).Char := 'A';    --  same as above, but not taking advantage of Implicit_Dereference

Although this is simple to use for users of the library, the implementation is actually much more complex than for Constant_Indexing.

First of all, we need to introduce a new type, called a Reference Type (in our example this is the Character_Reference), which basically is a type with an access discriminant and the Implicit_Dereference aspect. This type acts as a safer replacement for an access type (for instance it can't be copied quite as easily). Through this type, we have an access to a specific character, and therefore users can replace that character easily.

But the real difficulty is exactly because of this access. Imagine that we assign the string to another one. At that point, they share the internal buffer when using the copy-on-write mode described in the first post. But if we then modify a character, we modify it in both strings, although from the user's point of view these are two different instances! Here are three examples of code that would fail:


begin
    S2 := S;
    S (2) := 'A';
    --  Now both S2 and S have 'A' as their second character
end;

declare
    R : Character_Reference := S.Reference (2);
begin
    S2 := S;    --   now sharing the internal buffer
    R := 'A';   -- Both S2 and S have 'A' as their second character
end;

declare
    R : Character_Reference := S.Reference (2);
begin
    S.Append ("ABCDEF");
    R := 'A';   --  might point to deallocated memory
end;

Of course, we want our code to be safe, so the implementation needs to prevent the above errors. As soon as we take a Reference to a character, we must ensure the internal buffer is no longer shared. This fixes the first example above: since the call to S (2) no longer shares the buffer, we are indeed only modifying S and not S2.

The second example looks similar, but in fact the sharing is done after we have taken the reference. So when we take a Reference we also need to make the internal buffer unshareable. This is done by setting a special value for the refcounting, so that the assignment always makes a copy of S's internal buffer, rather than sharing it.

There is a hidden cost to using Variable_Indexing, since we are no longer able to share the buffer as soon as a reference has been taken. This is unfortunately unavoidable, and is one of those cases where convenience of use runs counter to performance...

The third example is much more complex. Here, we have a reference to a character in the string, but when we append data to the string we are likely to reallocate memory. The system could be allocating new memory, copying the data, and then freeing the old one. And our existing reference would point to the memory area that we have just freed!

I have not found a solution here (and C++ implementations seem to have similar limitations). We cannot simply prevent reallocation (i.e. changing the size of the string), since the user might simply have taken a reference, changed the character, and dropped the reference. At that point, reallocating would be safe. For now, we rely on documentation and on the fact that preserving a reference as we did in the example takes some extra work; it is much more natural to save the index and then take another reference when needed, as in:


declare
    Saved : constant Integer := 2;
begin
    S.Append ("ABCDEF");
    S (Saved) := 'A';     --  safe
end;

Iteration

Finally, we want to make it easy to traverse a string and read all its characters (and possibly modify them, of course). With the standard strings, one would do:

declare
    S : String (1 .. 10) := ...;
begin
    for Idx in S'Range loop
        S (Idx) := 'A';
    end loop;

    for C of S loop
       Put (C);
    end loop;
end;

declare
    S : Unbounded_String := ....;
begin
    for Idx in 1 .. Length (S) loop
       Replace_Element (S, Idx, 'A');
       Put (Element (S, Idx));
    end loop;
end;

Obviously, the version with unbounded strings is a lot more verbose and less user-friendly. It is possible, since Ada 2012, to provide custom iteration for our own strings, via some predefined aspects. As the gem shows, this is a complex operation...

GNAT provides a much simpler aspect that fulfills all practical needs, namely the Iterable aspect. We use it as such:

type XString is tagged private
    with Iterable => (First => First,
                      Next => Next,
                      Has_Element => Has_Element,
                      Get => Get);

function First (Self : XString) return Positive is (1);
function Next (Self : XString; Index : Positive) return Positive is (Index + 1);
function Has_Element (Self : XString; Index : Positive) return Boolean is (Index <= Self.Length);
function Get (Self : XString; Idx : Positive) return Character;   --   same one we used for indexing

These functions are especially simple here because we know the range of indexes. So basically they are declared as expression functions and made inline. This means that the loop can be very fast in practice. For consistency with the way the "for...of" loop is used with standard Ada containers, we chose to have Get return the character itself, rather than its index.  And now we can use:

S : XString := ...;

for C of S loop
    Put (C);
end loop;

As explained, the for..of loop returns the character itself, which means we can't change the string directly. So we need to provide a second iterator that will return the indexes. We do this via a new small type that takes care of everything, as in:

 type Index_Range is record
    Low, High : Natural;
 end record
 with Iterable => (First       => First,
                   Next        => Next,
                   Has_Element => Has_Element,
                   Element     => Element);

 function First (Self : Index_Range) return Positive is (Self.Low);
 function Next (Self : Index_Range; Index : Positive) return Positive  is (Index + 1);
 function Has_Element (Self : Index_Range; Index : Positive)  return Boolean is (Index <= Self.High);
 function Element
    (Self : Index_Range; Index : Positive) return Positive
    is (Index);

function Iterate (Self : XString) return Index_Range
    is ((Low => 1, High => Self.Length));

S : XString := ...;

for Idx in S.Iterate loop
    S (Idx) := 'A';
end loop;

The type Index_Range is very similar to the previous use of the Iterable aspect. We just need to introduce one extra primitive operation for the string, which is used to initiate the iteration.

In fact, we could make Iterate and Index_Range more complex, so that for instance we can iterate on every other character, or on every 'A' in the string, or any kind of iteration scheme. This would of course require a more complex Index_Range, but opens up nice possibilities.
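
As a sketch of that idea (the names Stepped_Range and Every_Other are hypothetical, not part of GNATCOLL), here is what an iterator over every other character could look like, following exactly the same Iterable pattern as above:

 type Stepped_Range is record
    Low, High : Natural;
    Step      : Positive;
 end record
 with Iterable => (First       => First,
                   Next        => Next,
                   Has_Element => Has_Element,
                   Element     => Element);

 function First (Self : Stepped_Range) return Positive is (Self.Low);
 function Next (Self : Stepped_Range; Index : Positive) return Positive is (Index + Self.Step);
 function Has_Element (Self : Stepped_Range; Index : Positive) return Boolean is (Index <= Self.High);
 function Element (Self : Stepped_Range; Index : Positive) return Positive is (Index);

 function Every_Other (Self : XString) return Stepped_Range
    is ((Low => 1, High => Self.Length, Step => 2));

S : XString := ...;

for Idx in S.Every_Other loop
    S (Idx) := '*';   --  modifies characters 1, 3, 5, ...
end loop;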

Slicing

With standard strings, one can get a slice by using notation like:

  declare
      S : String (1 .. 10) := ...;
   begin
      Put (S (2 .. 5));
      S (2 .. 5) := "ABCD";
   end;

Unfortunately, there is no possible equivalent for GNATCOLL strings, because the range notation ("..") cannot be redefined. We have to use code like:

declare
    S : XString := ...;
begin
    Put (S.Slice (2, 5));
     S.Replace (2, 5, "ABCD");
end;

It would be nice if Ada had a new aspect to let us map the ".." notation to a function, for instance something like:

type XString is tagged private
    with  Range_Indexing  => Slice,           --   NOT Ada 2012
          Variable_Range_Indexing => Replace;    --   NOT Ada 2012

function Slice (Self : XString; Low, High : Integer) return XString;
--  Low and High must be of the same type, but that type could be anything
--  The return value could also be anything

procedure Replace (S : in out XString; Low, High : Integer; Replace_With : String);
--  Likewise, Low and High must be of the same type, but this could be anything,
--  and Replace_With could be any type

Perhaps in some future version of the language?

Conclusion

Ada 2012 provides quite a number of new aspects that can be applied to types, and make user API more convenient to use. Some of them are complex to use, and GNAT sometimes provides a simpler version.

]]>
GNATprove Tips and Tricks: Proving the Ghost Common Divisor (GCD) http://blog.adacore.com/gnatprove-tips-and-tricks-proving-the-ghost-common-denominator-gcd Thu, 06 Apr 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/gnatprove-tips-and-tricks-proving-the-ghost-common-denominator-gcd

Euclid's algorithm for computing the greatest common divisor of two numbers is one of the first ones we learn in school (source: myself), and also one of the first algorithms that humans devised (source: WikiPedia). So it's quite appealing to try to prove it with an automatic proving toolset like SPARK. It turns out that proving it automatically is not so easy, just like understanding why it works is not so easy. To prove it, I am going to use a fair amount of ghost code, to convey the necessary information to GNATprove, the SPARK proof tool.

Let's start with the specification of what the algorithm should do: computing the greatest common divisor of two positive numbers. The algorithm can be described for natural numbers or even negative numbers, but we'll keep things simple by using positive inputs. Given that Euclid himself only considered positive numbers, that should be enough for us here. A common divisor C of two integers A and B should divide both of them, which we usually express efficiently in software as (A mod C = 0) and (B mod C = 0). This gives the following contract for GCD:

 function Divides (A, B : in Positive) return Boolean is (B mod A = 0) with Ghost;

   function GCD (A, B : in Positive) return Positive with
     Post => Divides (GCD'Result, A)
       and then Divides (GCD'Result, B);

Note that we're already using some ghost code here: the function Divides is marked as Ghost because it is only used in specifications. The problem with the contract above is that it is satisfied by a very simple implementation that does not actually compute the GCD:

 function GCD (A, B : in Positive) return Positive is
   begin
      return 1;
   end GCD;

And GNATprove manages to prove that easily. So we need to specify the GCD function more precisely, by stating that it returns not only a common divisor of A and B, but the greatest of them:

 function GCD (A, B : in Positive) return Positive with
     Post => Divides (GCD'Result, A)
       and then Divides (GCD'Result, B)
       and then (for all X in GCD'Result + 1 .. Integer'Min (A, B) =>
                   not (Divides (X, A) and Divides (X, B)));

Here, I am relying on the property that the GCD is lower than both A and B, to only look for a greater divisor between the value returned (GCD'Result) and the minimum of A and B (Integer'Min (A, B)). That contract uniquely defines the GCD, and indeed GNATprove cannot prove that the previous incorrect implementation of GCD implements this contract.

Let's implement now the GCD, starting with a very simple linear search that looks for the greatest common divisor of A and B:

  function GCD (A, B : in Positive) return Positive is
      C : Positive := Integer'Min (A, B);
   begin
      while C > 1 loop
         exit when A mod C = 0 and B mod C = 0;
         pragma Loop_Invariant (for all X in C .. Integer'Min (A, B) =>
                                  not (Divides (X, A) and Divides (X, B)));
         C := C - 1;
      end loop;

      return C;
   end GCD;

We start at the minimum of A and B, and check if this value divides A and B. If so, we're done and we exit the loop. Otherwise we decrease this value by 1 and we continue. This is not efficient, but this computes the GCD as desired. In order to prove that implementation, we need a simple loop invariant that states that no common divisor of A and B has been encountered in the loop so far. GNATprove proves easily that this implementation implements the contract of GCD (with --level=2 -j0 in less than a second).
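
As a quick sanity check (hypothetical test driver; the package name Math_Simple is assumed here and may differ from the actual unit in the repository), we can also exercise GCD at run time, with assertions enabled:

with Ada.Text_IO;  use Ada.Text_IO;
with Math_Simple;  use Math_Simple;

procedure Test_GCD is
begin
   pragma Assert (GCD (12, 8) = 4);   --  4 divides both 12 and 8; 5 .. 8 do not
   pragma Assert (GCD (9, 7)  = 1);   --  9 and 7 are coprime
   Put_Line ("GCD (12, 8) =" & Positive'Image (GCD (12, 8)));
end Test_GCD;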

Can we increase the efficiency of the algorithm? If we had to do this computation by hand, we would try (Integer'Min (A, B) / 2) rather than (Integer'Min (A, B) - 1) as the second possible value for the GCD, because we know that no value in between has a chance of dividing both A and B. Here is an implementation of that idea:

function GCD (A, B : in Positive) return Positive is
      C : Positive := Integer'Min (A, B);
   begin
      if A mod C = 0 and B mod C = 0 then
         return C;

      else
         C := C / 2;

         while C > 1 loop
            exit when A mod C = 0 and B mod C = 0;
            pragma Loop_Invariant (for all X in C .. Integer'Min (A, B) =>
                                     not (Divides (X, A) and Divides (X, B)));
            C := C - 1;
         end loop;

         return C;
      end if;
   end GCD;

I kept the same loop invariant as before, because we need that loop invariant to prove the postcondition of GCD. But GNATprove does not manage to prove that this invariant holds the first time that it enters the loop:

math_simple_half.ads:9:20: medium: postcondition might fail, cannot prove not (Divides (X, A) and Divides (X, B)

Indeed, we know that the GCD cannot lie between (Integer'Min (A, B) / 2) and Integer'Min (A, B) in that case, but provers don't. We have to provide the necessary information for GNATprove to deduce this fact. This is where we are going to use ghost code more extensively. As we need to prove a quantified property, we are going to establish it progressively, through a loop, accumulating the knowledge gathered in a loop invariant:

      ...
         C := C / 2;
         for J in C + 1 .. Integer'Min (A, B) - 1 loop
            pragma Loop_Invariant (for all X in C + 1 .. J =>
                                     not Divides (X, Integer'Min (A, B)));
         end loop;

         while C > 1 loop
         ...

With the help of this ghost code, the loop invariant in our original loop is now fully proved! But the loop invariant of the ghost loop itself is not proved, for iterations beyond the initial one:

math_simple_half.adb:24:38: medium: loop invariant might fail after first iteration, cannot prove not Divides (X, Integer'min)

The property we need here is that, given a value J between (Integer'Min (A, B) / 2 + 1) and (Integer'Min (A, B) - 1), J cannot divide Integer'Min (A, B). We are going to establish this property in a separate ghost lemma, which we'll call inside the loop. The lemma is an abstraction of the property just stated:

  procedure Lemma_Not_Divisor (Arg1, Arg2 : Positive) with
     Ghost,
     Global => null,
     Pre  => Arg1 in Arg2 / 2 + 1 .. Arg2 - 1,
     Post => not Divides (Arg1, Arg2)
   is
   begin
      null;
   end Lemma_Not_Divisor;

Even better, by isolating the desired property from its context of use, GNATprove can now prove it! Here, it is the underlying prover Z3 which manages to prove the postcondition of Lemma_Not_Divisor. It remains to call this lemma inside the loop:

         ...
         C := C / 2;
         for J in C + 1 .. Integer'Min (A, B) - 1 loop
            Lemma_Not_Divisor (J, Integer'Min (A, B));
            pragma Loop_Invariant (for all X in C + 1 .. J =>
                                     not Divides (X, Integer'Min (A, B)));
         end loop;

         while C > 1 loop
         ...

We're almost done. The only check that GNATprove does not manage to prove is the range check on the decrement to C in our original loop:

math_simple_half.adb:32:20: medium: range check might fail

It looks trivial though. The loop test states that (C > 1), so it should be easy to prove that C can be decremented and remain positive, right? Well, it helps here to understand how proof is carried on loops, by creating an artificial loop starting from the loop invariant in an iteration and ending in the same loop invariant at the next iteration. Because the loop invariant is not the first statement of the loop here, the loop test (C > 1) is not automatically added to the loop invariant (which would be incorrect in general, if the first statement modified C), so provers do not see this hypothesis when trying to prove the range check later. All that is needed here is to explicitly carry the loop test inside the loop invariant. Here is the final implementation of the more efficient GCD, which is proved easily by GNATprove (with --level=2 -j0 in less than two seconds):

 procedure Lemma_Not_Divisor (Arg1, Arg2 : Positive) with
     Ghost,
     Global => null,
     Pre  => Arg1 in Arg2 / 2 + 1 .. Arg2 - 1,
     Post => not Divides (Arg1, Arg2)
   is
   begin
      null;
   end Lemma_Not_Divisor;

   function GCD (A, B : in Positive) return Positive is
      C : Positive := Integer'Min (A, B);
   begin
      if A mod C = 0 and B mod C = 0 then
         return C;

      else
         C := C / 2;
         for J in C + 1 .. Integer'Min (A, B) - 1 loop
            Lemma_Not_Divisor (J, Integer'Min (A, B));
            pragma Loop_Invariant (for all X in C + 1 .. J =>
                                     not Divides (X, Integer'Min (A, B)));
         end loop;

         while C > 1 loop
            exit when A mod C = 0 and B mod C = 0;
            pragma Loop_Invariant (C > 1);
            pragma Loop_Invariant (for all X in C .. Integer'Min (A, B) =>
                                     not (Divides (X, A) and Divides (X, B)));
            C := C - 1;
         end loop;

         return C;
      end if;
   end GCD;

It's time to try to prove Euclid's variant of the algorithm, still with the same contract on GCD:

 function GCD (A, B : in Positive) return Positive is
      An : Positive := A;
      Bn : Natural := B;
      C  : Positive;
   begin
      while Bn /= 0 loop
         C  := An;
         An := Bn;
         Bn := C mod Bn;
      end loop;

      return An;
   end GCD;

The key insight that justifies the large jumps in the values of An and Bn is that, at each iteration of the loop, any common divisor of A and B is still a common divisor of An and Bn, and conversely. So the greatest common divisor of A and B turns out to be also the greatest common divisor of An and Bn, which is just An when Bn reaches zero. Here is the loop invariant that expresses that property:

         pragma Loop_Invariant
           (for all X in Positive =>
              (Divides (X, A) and Divides (X, B))
                =
              (Divides (X, An) and (Bn = 0 or else Divides (X, Bn))));

In order to prove this loop invariant, we need to state a ghost lemma like before, that helps prove the property needed at a particular iteration of the loop:

 procedure Lemma_Same_Divisor_Mod (A, B : Positive) with
     Ghost,
     Global => null,
     Post => (for all X in Positive =>
                (Divides (X, A) and Divides (X, B))
                  =
                (Divides (X, B) and then (Divides (B, A) or else Divides (X, A mod B))))

Because this lemma is more complex, it is not proved automatically like the previous one. Here, we need to implement it with ghost code, extracting three further lemmas that help prove this one. Here is the complete code for this implementation:

procedure Lemma_Divisor_Mod (A, B, X : Positive) with
     Ghost,
     Global => null,
     Pre  => Divides (X, A) and then Divides (X, B) and then not Divides (B, A),
     Post => Divides (X, A mod B)
   is
   begin
      null;
   end Lemma_Divisor_Mod;

   procedure Lemma_Divisor_Transitive (A, B, X : Positive) with
     Ghost,
     Global => null,
     Pre  => Divides (B, A) and then Divides (X, B),
     Post => Divides (X, A)
   is
   begin
      null;
   end Lemma_Divisor_Transitive;

   procedure Lemma_Divisor_Mod_Inverse (A, B, X : Positive) with
     Ghost,
     Global => null,
     Pre  => not Divides (B, A) and then Divides (X, A mod B) and then Divides (X, B),
     Post => Divides (X, A)
   is
   begin
      null;
   end Lemma_Divisor_Mod_Inverse;

   procedure Lemma_Same_Divisor_Mod (A, B : Positive) with
     Ghost,
     Global => null,
     Post => (for all X in Positive =>
                (Divides (X, A) and Divides (X, B))
                  =
                (Divides (X, B) and then (Divides (B, A) or else Divides (X, A mod B))))
   is
   begin
      for X in Positive loop
         if Divides (X, A) and then Divides (X, B) and then not Divides (B, A) then
            Lemma_Divisor_Mod (A, B, X);
         end if;
         if Divides (B, A) and then Divides (X, B) then
            Lemma_Divisor_Transitive (A, B, X);
         end if;
         if not Divides (B, A) and then Divides (X, A mod B) and then Divides (X, B) then
            Lemma_Divisor_Mod_Inverse (A, B, X);
         end if;
         pragma Loop_Invariant
           (for all Y in 1 .. X =>
              (Divides (Y, A) and Divides (Y, B))
                =
              (Divides (Y, B) and then (Divides (B, A) or else Divides (Y, A mod B))));
      end loop;
   end Lemma_Same_Divisor_Mod;

   function GCD (A, B : in Positive) return Positive is
      An : Positive := A;
      Bn : Natural := B;
      C  : Positive;
   begin
      while Bn /= 0 loop
         C  := An;
         An := Bn;
         Bn := C mod Bn;
         Lemma_Same_Divisor_Mod (C, An);
         pragma Loop_Invariant
           (for all X in Positive =>
              (Divides (X, A) and Divides (X, B))
                =
              (Divides (X, An) and (Bn = 0 or else Divides (X, Bn))));
      end loop;

      pragma Assert (Divides (An, An));
      pragma Assert (Divides (An, A));
      pragma Assert (Divides (An, B));

      pragma Assert (for all X in An + 1 .. Integer'Min (A, B) =>
                       not (Divides (X, A) and Divides (X, B)));
      return An;
   end GCD;

The implementation of GCD as well as the main lemma Lemma_Same_Divisor_Mod are proved easily by GNATprove (with --level=2 -j0 in less than ten seconds). The three elementary lemmas Lemma_Divisor_Mod, Lemma_Divisor_Transitive and Lemma_Divisor_Mod_Inverse are not proved automatically. Currently I only reviewed them and convinced myself that they were true. A better solution in the future will be to prove them in Coq, which should not be difficult given that there are similar existing lemmas in Coq, and include them in the SPARK lemma library.

All the code for these versions of GCD can be found in the GitHub repository of SPARK 2014, including another version of GCD where I changed the definition of Divides to reflect more accurately the mathematical notion of divisibility:

   function Divides (A, B : in Positive) return Boolean is
     (for some C in Positive => A * C = B)
   with Ghost;

This is also proved automatically by GNATprove, and this proof also requires the introduction of suitable lemmas that relate the mathematical notion of divisibility to its programming counterpart (B mod A = 0).
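
For the record, and as a purely hypothetical sketch (this is not code from the repository, and run-time overflow in the quantified expression is ignored), such a bridging lemma could take a shape like the following, stating that the two notions agree on positive numbers:

   procedure Lemma_Divides_Def (A, B : Positive) with
     Ghost,
     Global => null,
     Post => (B mod A = 0) = (for some C in Positive => A * C = B)
   is
   begin
      null;
   end Lemma_Divides_Def;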

]]>
New strings package in GNATCOLL http://blog.adacore.com/new-strings-package-in-gnatcoll Tue, 04 Apr 2017 13:30:00 +0000 Emmanuel Briot http://blog.adacore.com/new-strings-package-in-gnatcoll

New strings package in GNATCOLL

GNATCOLL has recently acquired two new packages, namely GNATCOLL.Strings and GNATCOLL.Strings_Impl. The latter is a generic package, one instance of which is provided as GNATCOLL.Strings.

But why a new strings package? Ada already has quite a lot of ways to represent and manipulate strings, is a new one needed?

This new package is an attempt at finding a middle ground between the standard String type (which is efficient but inflexible) and unbounded strings (which are flexible, but could be more efficient).

GNATCOLL.Strings therefore provides strings (named XString, as in extended-strings) that can grow as needed (up to Natural'Last, like standard strings), yet are faster than unbounded strings. They also come with an extended API, which includes all primitive operations from unbounded strings, in addition to some subprograms inspired from GNATCOLL.Utils and the Python and C++ programming languages.

Small string optimization

GNATCOLL.Strings uses a number of tricks to improve on the efficiency.  The most important one is to limit the number of memory allocations.  For this, we use a trick similar to what all C++ implementations do nowadays, namely the small string optimization.

The idea is that when a string is short, we can avoid all memory allocations altogether, while still keeping the string type itself small. We therefore use an Unchecked_Union, where a string can be viewed in two ways:

    Small string

      [f][s][ characters of the string 23 bytes               ]
         f  = 1 bit for a flag, set to 0 for a small string
         s  = 7 bits for the size of the string (i.e. number of significant
              characters in the array)

    Big string

      [f][c      ][size     ][data      ][first     ][pad    ]
         f = 1 bit for a flag, set to 1 for a big string
         c = 31 bits for half the capacity. This is the size of the buffer
             pointed to by data, and which contains the actual characters of
             the string.
         size = 32 bits for the size of the string, i.e. the number of
             significant characters in the buffer.
         data = a pointer (32 or 64 bits depending on architecture)
         first = 32 bits, see the handling of substrings below
         pad = 32 bits on a 64 bits system, 0 otherwise.
             This is because of alignment issues.

So in the same amount of memory (24 bytes), we can either store a small string of 23 characters or less with no memory allocations, or a big string that requires allocation. In a typical application, most strings are smaller than 23 bytes, so we are saving very significant time here.
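
Stripped of the representation clauses and of the actual field sizes, a much simplified sketch of this Unchecked_Union (with hypothetical names, not the real GNATCOLL declarations) could look like this:

   type Char_Buffer_Access is access all String;

   type Big_String_Record is record
      Half_Capacity : Natural;
      Size          : Natural;
      Data          : Char_Buffer_Access;
      First         : Positive;
   end record;

   type String_Data (Is_Big : Boolean := False) is record
      case Is_Big is
         when False =>
            Small : String (1 .. 23);   --  characters stored in place, no allocation
         when True =>
            Big   : Big_String_Record;  --  header describing the allocated buffer
      end case;
   end record
   with Unchecked_Union;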

This representation has to work on both 32 bits systems and 64 bits systems, so we have careful representation clauses to take this into account. It also needs to work on both big-endian and little-endian systems. Thanks to Ada's representation clauses, this is in fact relatively easy to achieve (well, okay, after trying a few different approaches to emulate what's done in C++, which did not work elegantly). In fact, emulating via bit-shift operations ended up with code that was less efficient than letting the compiler do it automatically based on our representation clauses.

Character types

Applications should be able to handle the whole set of unicode characters. In Ada, these are represented as the Wide_Character type, rather than Character, and stored on 2 bytes rather than 1. Of course, for a lot of applications it would be wasting memory to always store 2 bytes per character, so we want to give flexibility to users here.

So the package GNATCOLL.Strings_Impl is a generic. It has several formal parameters, among which:

   * Character_Type is the type used to represent each character. Typically,  it will be Character, Wide_Character, or even possibly Wide_Wide_Character. It could really be any scalar type, so for instance we could use this package to represent DNA with its 4-valued nucleobases.

   * Character_String is an array of these characters, as would be represented in Ada. It will typically be a String or a Wide_String. This type is used to make this package work with the rest of the Ada world.

Note about unicode: we could also always use a Character, and use UTF-8 encoding internally. But this makes all operations (from taking the length to moving the next character) slower, and more fragile. We must make sure not to cut a string in the middle of a multi-byte sequence. Instead, we manipulate a string of code points (in terms of unicode). A similar choice is made in Ada (String vs Wide_String), Python and C++.

Configuring the size of small strings

The above is what is done for most C++ implementations nowadays.  The maximum 23 characters we mentioned for a small string depends in fact on several criteria, which impact the actual maximum size of a small string:

   * on 32 bits system, the size of the big string is 16 bytes, so the maximum size of a small string is 15 bytes.
   * on 64 bits system, the size of the big string is 24 bytes, so the maximum size of a small string is 23 bytes.
   * If using a Character as the character type, the above are the actual number of characters in the string. But if you are using a Wide_Character, this is double the maximum length of the string, so a small string is either 7 characters or 11 characters long.

This is often a reasonable number, and given that applications mostly use small strings, we are already saving a lot of allocations. However, in some cases we know that the typical length of strings in a particular context is different. For instance, GNATCOLL.Traces builds messages to output in the log file. Such messages will typically be at most 100 characters, although they can of course be much larger sometimes.

We have added one more formal parameter to GNATCOLL.Strings_Impl to control the maximum size of small strings. If for instance we decide that a "small" string is anywhere from 1 to 100 characters long (i.e. we do not want to allocate memory for those strings), it can be done via this parameter. Of course, in such cases the size of the string itself becomes much larger.

In this example it would be 101 bytes long, rather than 24 bytes. Although we are saving on memory allocations, we are also spending more time copying data when the string is passed around, so you'll need to measure the performance here.

The maximum size for the small string is 127 bytes however, because this size and the 1-bit flag need to fit in a single byte in the representation clauses we showed above. We tried to make this more configurable, but this makes things significantly more complex between little-endian and big-endian systems, and having large "small" strings would not make much sense in terms of performance anyway.

Typical C++ implementations do not make this small size configurable.

Task safety

Just like unbounded strings, the strings in this package are not thread safe. This means that you cannot access the same string (read or write) from two different threads without somehow protecting the access via a protected type, locks,...

In practice, sharing strings would rarely be done, so if the package itself did its own locking we would end up with very bad performance in all cases, for the few cases where it might prove useful.

As we'll discuss below, it is possible to use two different strings that actually share the same internal buffer, from two different threads. Since this is an implementation detail, this package takes care of guaranteeing the integrity of the shared data in such a case.

Copy on write

There is one more formal parameter, to configure whether this package should use copy-on-write or not. When copy on write is enabled, you can have multiple strings that internally share the same buffer of characters. This means that assigning a string to another one becomes a reasonably fast operation (copy a pointer and increment a refcount). Whenever the string is modified, a copy of the buffer is done so that other copies of the same string are not impacted.

But in fact, there is one drawback with this scheme: we need reference counting to know when we can free the shared data, or when we need to make a copy of it. This reference counting must be thread safe, since users might be using two different strings from two different threads, but they share data internally.

Thus the reference counting is done via atomic operations, which have some impact on performance. Since multiple threads try to access the same memory addresses, this is also a source of contention in multi-threaded applications.

For this reason, the current C++ standard prevents the use of copy-on-write for strings.

In our case, we chose to make this configurable in the generic, so that users can decide whether to pay the cost of the atomic operations, but save on the number of memory allocations and copies of the characters. Sometimes it is better to share the data, but sometimes it is better to systematically copy it. Again, actual measurements of the performance are needed for your specific application.
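
As an illustration of the kind of operation involved, the sketch below shows what releasing a reference to a shared buffer might look like, using GNAT's System.Atomic_Counters package (the surrounding types and the Refcount_Sketch/Release names are hypothetical; this is not the actual GNATCOLL code):

with System.Atomic_Counters;

package Refcount_Sketch is

   type Shared_Buffer is limited record
      Refcount : aliased System.Atomic_Counters.Atomic_Counter;
      Data     : String (1 .. 128);
   end record;

   type Shared_Buffer_Access is access Shared_Buffer;

   --  Drop one reference; frees the buffer when the last reference goes away
   procedure Release (Buffer : in out Shared_Buffer_Access);

end Refcount_Sketch;

with Ada.Unchecked_Deallocation;

package body Refcount_Sketch is

   procedure Free is new Ada.Unchecked_Deallocation
     (Shared_Buffer, Shared_Buffer_Access);

   procedure Release (Buffer : in out Shared_Buffer_Access) is
   begin
      --  Decrement is atomic; it returns True when the counter drops to zero,
      --  meaning we held the last reference and can safely free the buffer.
      if System.Atomic_Counters.Decrement (Buffer.Refcount) then
         Free (Buffer);
      end if;
      Buffer := null;
   end Release;

end Refcount_Sketch;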

Growth strategy

When the current size of the string becomes bigger than the available allocated memory (for instance because you are appending characters), this package needs to reallocate memory. There are plenty of strategies here, from allocating only the exact amount of memory needed (which saves on memory usage, but is very bad in terms of performance), to doubling the current size of the string until we have enough space, as currently done in the GNAT unbounded strings implementation.

The latter approach would therefore allocate space for two characters, then for 4, then 8 and so on.

This package has a slightly different strategy. Remember that we only start allocating memory past the size of small strings, so we will for instance first allocate 24 bytes. When more memory is needed, we multiply this size by 1.5, which some researchers have found to be a good compromise between wasted memory and the number of allocations. For very large strings, we always allocate multiples of the memory page size (4096 bytes), since this is what the system will make available anyway. So we will basically allocate the following: 24, 36, 54, 82, 122,...

An additional constraint is that we only ever allocate an even number of bytes. This is called the capacity of the string. In the layout of the big string, as shown above, we store half that capacity, which saves one bit that we use for the flag.
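
As a rough illustration of this strategy (not the actual GNATCOLL code: the exact rounding, the overflow handling and the interaction with the small-string size are simplified), the capacity computation could look like this:

   Page_Size : constant := 4096;

   function Next_Capacity (Needed : Natural) return Natural is
      Capacity : Natural := 24;                    --  first size past small strings
   begin
      while Capacity < Needed loop
         Capacity := Capacity * 3 / 2;             --  grow by a factor of 1.5
         Capacity := Capacity + (Capacity mod 2);  --  keep the capacity even
      end loop;
      if Capacity >= Page_Size then
         --  large strings: round up to a multiple of the memory page size
         Capacity := ((Capacity + Page_Size - 1) / Page_Size) * Page_Size;
      end if;
      return Capacity;
   end Next_Capacity;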

Growing memory

This package does not use the Ada new operator. Instead, we use functions from System.Memory directly, in part so that we can use the realloc system call. This is much more efficient when we need to grow the internal buffer, since in most cases we won't have to copy the characters at all. Saving on those additional copies has a significant impact, as we'll see in the performance measurements below.

Substrings

One other optimization performed by this package (which is not done for unbounded strings or various C++ implementations) is the handling of substrings when also using copy-on-write.

We simply store the index within the shared buffer of the first character of the string, instead of always starting at one.

From the user's point of view, this is an implementation detail. Strings are always indexed from 1, and internally we convert to an actual position in the buffer. This means that if we need to reallocate the buffer, for instance when the string is modified, we transparently change the index of the first character, but the indexes the user was using are still valid.

This results in very significant savings, as shown below in the timings for Trim for instance. Also, we can do an operation like splitting a string very efficiently.

For instance, the following code doesn't allocate any memory, besides setting the initial value of the string. It parses a buffer containing some "key=value" lines, with optional spaces, and possibly empty lines:



     declare
        S, Key, Value : XString;
        L             : XString_Array (1 .. 2);
        Last          : Natural;
     begin
        S.Set (".......");

        --  Get each line
        for Line in S.Split (ASCII.LF) loop

           --  Split into at most two substrings
           Line.Split ('=', Into => L, Last => Last);

           if Last = 2 then
              Key := L (1);
              Key.Trim;    --  Removing leading and trailing spaces

              Value := L (2);
              Value.Trim;

           end if;
        end loop;
     end;

Conclusion

We use various tricks to improve the performance over unbounded strings.  From not allocating any memory when a string is 24 bytes (or a configurable limit) or less, to growing the string as needed via a careful growth strategy, to sharing internal data until we need to modify it, these various techniques combine to provide a fast and flexible implementation.

Here are some timing experiments done on a laptop (multiple operating systems lead to similar results). The exact timings are irrelevant, so the results are given as a percentage of the time unbounded strings take to perform similar operations.

We configured the GNATCOLL.Strings package in different ways, either with or without copy-on-write, and with a small string size from the default 0..23 bytes or a larger 0..127 bytes.

    Setting a small string multiple times (e.g. Set_Unbounded_String)

        unbounded_string                 = 100 %
        xstring-23 without copy on write =  11 %
        xstring-23 with copy on write    =  12 %
        xstring-127 with copy on write   =  17 %

        Here we see that not doing any memory allocation makes XString
        much faster than unbounded string. Most of the time is spent
        copying characters around, via memcpy.

    Setting a large string multiple times (e.g. Set_Unbounded_String)

        unbounded_string                 = 100 %
        xstring-23 without copy on write =  41 %
        xstring-23 with copy on write    =  50 %
        xstring-127 with copy on write   =  32 %

        Here, XString apparently proves better at reusing already
        allocated memory, although the timings are similar when creating
        new strings instead. Most of the difference is probably related to the
        use of realloc instead of alloc.

    Assigning small strings (e.g.   S2 := S1)

        unbounded_string                 = 100 %
        xstring-23 without copy on write =  31 %
        xstring-23 with copy on write    =  27 %
        xstring-127 with copy on write   =  57 %

    Assigning large strings (e.g.   S2 := S1)

        unbounded_string                 = 100 %
        xstring-23 without copy on write = 299 %
        xstring-23 with copy on write    =  63 %
        xstring-127 with copy on write   =  60 %

        When not using copy-on-write (which unbounded strings do), we need
        to reallocate memory, which shows on the second line.

    Appending to large string  (e.g.    Append (S, "...."))

        unbounded_string                 = 100 %
        xstring-23 without copy on write =  39 %
        xstring-23 with copy on write    =  48 %
            same, with tasking           = 142 %
        xstring-127 with copy on write   =  49 %

        When we use tasking, XStrings use atomic operations for the reference
        counter, which slows things down. They become slower than unbounded
        strings, but only because the latter have a bug when two different
        strings that share data are used from two different threads (they
        try to save on an atomic operation); this bug is being worked on.

    Removing leading and trailing spaces (e.g.   Trim (S, Ada.Strings.Both))

        unbounded_string                 = 100 %
        xstring-23 without copy on write =  50 %
        xstring-23 with copy on write    =  16 %
        xstring-127 with copy on write   =  18 %

        Here we see the benefits of the substrings optimization, which shares
        data for the substrings.
]]>
Simics helps run 60 000 GNAT Pro tests in 24 hours http://blog.adacore.com/simics-helps-run-60-000-gnat-pro-tests-in-24-hours Fri, 31 Mar 2017 14:06:00 +0000 Jerome Guitton http://blog.adacore.com/simics-helps-run-60-000-gnat-pro-tests-in-24-hours

This post has been updated in March 2017 and was originally posted in March 2016.

A key aspect of AdaCore’s GNAT Pro offering is the quality of the product we’re delivering and our proactive approach to resolving issues when they appear. To do so, we need both intensive testing before delivering anything to our customers and to produce “wavefront” versions every day for each product we offer. Doing so each and every day is a real challenge, considering the number of supported configurations, the number of tests to run, and the limit of a 24-hour timeframe. At AdaCore, we rely heavily on virtualization as part of our testing strategy. In this article, we will describe the extent of our GNAT Pro testing on VxWorks, and how Simics helped us meet these challenges.

Broad Support for VxWorks Products

The number of tests to run is proportional to the number of configurations that we support. We have an impressive matrix of configurations to validate:

  • Versions: VxWorks, VxWorks Cert, and VxWorks 653… available on the full range of versions that we support (e.g. 5.5, 6.4 to 6.9, 653 2.1 to 2.5...)
  • CPUs: arm, ppc, e500v2, and x86...
  • Program type: Real Time Process, Downloadable Kernel Module, Static Kernel Module, vThreads, ARINC 653 processes, and with the Cert subsets…
  • Ada configuration variants: zero cost exceptions versus setjmp/longjmp exceptions and Ravenscar tasking profile versus full Ada tasking...

Naturally, there are some combinations in this matrix of possibilities that are not supported by GNAT, but the reality is that we cover most of it. So the variety of configurations is very high.

The matrix is growing fast. Between 2013 and 2015, we widened our offer to support the new VxWorks ports (VxWorks 7, VxWorks 653 3.0.x) on a large range of CPUs (arm, e500v2, ppc...), including new configurations that were needed by GNAT Pro users (x86_64). This represents 32 new supported configurations added to the 168 existing ones. It was obviously a challenge to qualify all these new versions against our existing test suites, with the goal of supporting the new VxWorks versions as soon as possible.

Simics supports a wide range of VxWorks configurations, and has a good integration with the core OS itself. This made Simics a natural solution for our testing strategy of GNAT Pro. On VxWorks 7 and 653 3.0.x, it allowed us to quickly set up an appropriate testing environment, as predefined material exists to make it work smoothly with most Wind River OSes; we could focus on our own technology right away, instead of spending time on developing, stabilizing and maintaining a new testing infrastructure from scratch.

Representative Testing on Virtual Hardware

Another benefit of Simics is that it not only supports all versions of VxWorks, but also emulates a wide range of hardware platforms. This allows us to test on platforms which are representative of the hardware that will be used in production by GNAT Pro users.

A stable framework for QA led to productivity increases

Stability is an important property of the testing framework; otherwise, “glitches” caused by the framework start causing spurious failures, often randomly, that the QA team then needs to analyze each time. Multiplied by the very large number of tests we run each day, and the large number of platforms we test on, a lack of stability can quickly lead to an unmanageable situation. Trust in the test framework is a key factor for efficiency.

In that respect, and despite the heavy parallelism required in order to complete the validation of all our products in time, Simics proved to be a robust solution and behaved well under pressure, thus helping us focus our energy on differences caused by our tools, rather than on spurious differences caused by the testing framework.

Test execution speed allowed extensive testing

To give an idea of how broad our testing is: on VxWorks, we mostly have three sets of testsuites:

  • The standard Ada validation testsuite (ACATS): around 3,600 tests;
  • The regression testing of GNAT: around 15,000 tests;
  • Tool-specific testsuites: around 1,800 tests.

All in all, just counting the VxWorks platforms, we are running around 350,000 tests each day. 60,000 of them are run on Simics, mostly on the more recent ports (VxWorks 7 and 653 3.x).

In order to run all these tests, an efficient testing platform is needed. With Simics, we were able to optimize the execution by:

  • Proper tuning of the simulation target;
  • Stopping and restarting the platform at an execution point where tests can be run right away, using checkpoints;
  • Developing additional plugins to have efficient access to the host filesystem from the simulator.

We will give more technical details about these optimizations in a future article.

March 2017 Update

AdaCore’s use of Simics has not substantially changed since last year: it is still a key component of GNAT Pro’s testing strategy on VxWorks platforms. In 2017, GNAT Pro has been ported to PowerPC 64 VxWorks 7; with its broad support for VxWorks products, Simics has been a natural solution to speed up this port.

April 2018 Update

This year we smoothly switched to Simics 5 on all supported configurations; Simics has been of great help to speed up the new port of GNAT Pro to AArch64 VxWorks 7.


]]>
Two Projects to Compute Stats on Analysis Results http://blog.adacore.com/two-projects-to-compute-stats-on-analysis-results Thu, 30 Mar 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/two-projects-to-compute-stats-on-analysis-results

The project by Daniel King allows you to extract the results from the log file gnatprove.out generated by GNATprove, into an Excel spreadsheet. What's nice is that you can then easily sort units according to the metrics you are following inside your spreadsheet viewer.

The project by Martin Becker allows you to extract the results from the JSON files generated by GNATprove for each unit analyzed, aggregate these results, and output them into either textual or JSON format. What's nice is that you can then integrate the generated JSON with your automated scripts/setup.

I have installed both on my laptop, and used them on examples such as Tokeneer, and I am happy to report that the results are consistent! :-) I don't know yet which one will become more useful with my setup, but both are worth a try.

]]>
GNATcoverage moves to GitHub http://blog.adacore.com/gnatcoverage-moves-to-github Wed, 29 Mar 2017 13:13:51 +0000 Pierre-Marie de Rodat http://blog.adacore.com/gnatcoverage-moves-to-github

Following the current trend, the GNATcoverage project moves to GitHub! Our new address is: https://github.com/AdaCore/gnatcoverage

GNATcoverage is a tool we developed to analyze and report program coverage. It supports the Ada and C programming languages, several native and embedded platforms, as well as various coverage criteria, from object code level instruction and branch coverage up to source level decision or MC/DC coverage, qualified for use in avionics certification contexts. For source-level analysis, GNATcoverage works hand-in-hand with the GNAT Pro compilers.

Originally developed as part of the Couverture research project, GNATcoverage became a supported product for a first set of targets in the 2009/2010 timeframe.

Since the beginning of the project, the development happened on the OpenDO forge. This has served us well, but we are now in the process of moving all our projects to GitHub. What does this change for you?

  • We will now use GitHub issues and pull requests for discussions that previously happened on mailing lists. We hope these places will be more visible and community-friendly.

In the near future, we’ll close the project on the OpenDO forge. We are keen to see you on GitHub!

]]>
Writing on Air http://blog.adacore.com/writing-on-air Mon, 27 Mar 2017 13:00:00 +0000 Jorge Real http://blog.adacore.com/writing-on-air

While searching for motivating projects for students of the Real-Time Systems course here at Universitat Politècnica de València, we found a curious device that produces a fascinating effect. It holds a 12 cm bar from its bottom and makes it swing, like an upside-down pendulum, at a frequency of nearly 9 Hz. The free end of the bar holds a row of eight LEDs. With careful and timely switching of those LEDs, and due to visual persistence, it creates the illusion of text... floating in the air!

The web shows plenty of references to different realizations of this idea. They are typically used for displaying date, time, and also rudimentary graphics. Try searching for "pendulum clock LED", for example. The picture in Figure 1 shows the one we are using.

Figure 1. The pendulum, speaking about itself

The software behind this toy is a motivating case for the students, and it contains enough real-time and synchronisation requirements to also make it challenging. 

We have equipped the lab with a set of these pendulums, from which we have disabled all the control electronics and replaced them with STM32F4 Discovery boards. We use also a self-made interface board (behind the Discovery board in Figure 1) to connect the Discovery with the LEDs and other relevant signals of the pendulum. The task we propose our students is to make it work under the control of a Ravenscar program running on the Discovery. We use GNAT GPL 2016 for ARM hosted on Linux, along with the Ada Drivers Library.

There are two different problems to solve: one is to make the pendulum bar oscillate with a regular period; the other one is to then use the LEDs to display some text.

Swing that bar!

The bar is fixed from the bottom to a flexible metal plate (see Figure 2). The stable position of the pendulum is vertical and still. There is a permanent magnet attached to the pendulum, so that the solenoid behind it can be energised to cause a repulsion force that makes the bar start to swing.

Figure 2. Pendulum mechanics
Figure 3. Detail of barrier pass detector

At startup, the solenoid control is completely blind to the whereabouts of the pendulum. An initial sequence must be programmed with the purpose of moving the bar enough to make it cross the barrier (see detail in Figure 3), a pass detector that uses an opto-coupler sensor located slightly off-center with respect to the pendulum run. This asymmetry is crucial, as we'll soon see.

Once the bar crosses the barrier at least three times, we have an idea of the pendulum's position over time and we can then apply a more precise control sequence to keep the pendulum swinging regularly. The situation is pretty much like pushing a kid's swing: you need to give it a small, regular push, at the right time. In our case, that time is when the pendulum enters the solenoid area on its way to the right side, since the solenoid repels the pendulum rightwards. That happens at about one sixth of the pendulum cycle, so we first need to know when the cycle starts and what duration it has. And for that, we need to pay close attention to the only input of the pendulum: the barrier signal.

Figure 4 sketches a chronogram of the barrier signal. Due to its asymmetric placement, the signal captured from the opto-coupler is also asymmetric.

Figure 4. Chronogram of the barrier signal and correspondence with extreme pendulum positions

To determine the start time and period of the next cycle, we take note of the times when rising and falling edges of the barrier signal occur. This is easy work for a small Finite State Machine (FSM), triggered by barrier interrupts to the Discovery board. Once we have collected the five edge times T1 to T5 (which normally correspond to two full barrier crossings plus the start of a third one), we can calculate the period as T5 - T1. Regarding the start time of the next cycle, we know the pendulum initiated a new cycle (reached its left-most position) just in between the two closest pulses, at the midpoint between times T1 and T4. So, based on the information gathered, we estimate that the next cycle will start at time T5 + (T4 - T1) / 2.

But… all we know when we detect a barrier edge is whether it is rising or falling. So, when we detect the first rising edge of Barrier, we can't be sure whether it corresponds to T1 (the second barrier crossing) or T3 (the first). We have arbitrarily guessed it is T1, so we must verify this guess and fix things if it was incorrect. This check is possible precisely due to the asymmetric placement of the pass detector: if our guess was correct, then T3 - T1 should be less than T5 - T3. Otherwise we need to re-assign our measurements (T3, T4 and T5 become T1, T2 and T3) and then move on to the adequate FSM state (waiting for T4).

Once we know when the pendulum will be in the left-most position (the cycle start time) and the estimated duration of the next cycle, we can give a solenoid pulse at the cycle start time plus one sixth of the period. The pulse duration, within reasonable range, affects mostly the amplitude of the pendulum run, but not so much its period. Experiments with pulse durations between 15 and 38 milliseconds showed visible changes in amplitude, but period variations of only about 100 microseconds, for a period of 115 milliseconds (less than 0.1%). We found 18-20 ms to work well.
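In code, this timing arithmetic is straightforward with Ada.Real_Time. Here is a minimal sketch, assuming the FSM has already captured the relevant edge times as values of type Time; the subprogram and variable names are illustrative, not the actual project code:

with Ada.Real_Time; use Ada.Real_Time;

procedure Cycle_Arithmetic_Sketch (T1, T3, T4, T5 : Time) is
   --  With the asymmetric detector, a correct labelling of the edges
   --  satisfies T3 - T1 < T5 - T3; otherwise the FSM re-labels them.
   Guess_Was_Correct : constant Boolean   := T3 - T1 < T5 - T3;
   Period            : constant Time_Span := T5 - T1;
   Next_Cycle_Start  : constant Time      := T5 + (T4 - T1) / 2;
   Pulse_Time        : constant Time      := Next_Cycle_Start + Period / 6;
begin
   null;  --  the real program hands these values to the solenoid and display code
end Cycle_Arithmetic_Sketch;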

So, are we done with the pendulum control? Well... almost, but no, we’re not: we are also concerned with robustness. The software must be prepared for unexpected situations, such as someone or something suddenly stopping the bar. If our program ultimately relies on barrier interrupts and they do not occur, then it is bound to hang. A timeout timing event is an ideal mechanism to revive a dying pendulum. If the timeout expires, then the barrier-based control is abandoned and the initialisation phase is engaged again, and again if needed, until the pendulum makes sufficient barrier crossings to let the program resume normal operation. After adding this recovery mechanism, we can say we are done with the pendulum control: the bar will keep on swinging while powered.
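A minimal sketch of such a watchdog, using Ada.Real_Time.Timing_Events; the names, the timeout value and the (empty) recovery action are illustrative, not taken from the actual project:

with Ada.Real_Time;               use Ada.Real_Time;
with Ada.Real_Time.Timing_Events; use Ada.Real_Time.Timing_Events;

package Pendulum_Watchdog is

   Timeout : constant Time_Span := Milliseconds (500);

   protected Watchdog is
      procedure Expired (Event : in out Timing_Event);
   end Watchdog;

   --  Call after every barrier interrupt to push the deadline back
   procedure Rearm;

end Pendulum_Watchdog;

package body Pendulum_Watchdog is

   Timeout_Event : Timing_Event;

   protected body Watchdog is
      procedure Expired (Event : in out Timing_Event) is
      begin
         null;  --  abandon barrier-based control here and restart the
                --  initialisation sequence
      end Expired;
   end Watchdog;

   procedure Rearm is
   begin
      Set_Handler (Timeout_Event, Clock + Timeout, Watchdog.Expired'Access);
   end Rearm;

end Pendulum_Watchdog;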

Adding lyrics to that swing

Once the pendulum is moving at a stable rate, we are ready to tackle the second part of the project: using the eight LEDs to display some text. Knowing the cycle start time and estimated period duration, one can devise a plan to display each line of a character at the proper period times. We have already calculated the next cycle start time and duration for the pendulum control. All we need to do now is to timely provide that information to a displaying task.

Figure 5. Time to display an exclamation mark!

The pendulum control functions described above are implemented by a package with the following (abbreviated) specification:

        with STM32F4;       use STM32F4;
        with Ada.Real_Time; use Ada.Real_Time;

        package Pendulum_IO is

           --  Set LEDs using byte pattern (1 => On, 0 => Off)
           procedure Set_LEDs (Pattern : in Byte);  

           --  Synchronization point with start of new cycle
           procedure Wait_For_Next_Cycle (Init_Time      : out Time; 
                                          Cycle_Duration : out Time_Span);

        private
              task P_Controller with Storage_Size => 4 * 1024;
        end Pendulum_IO;

The specification includes subprograms for setting the LEDs (only one variant shown here) and procedure Wait_For_Next_Cycle, which in turn calls a protected entry whose barrier (in the Ada sense, this time) is opened by the barrier signal interrupt handler, when the next cycle timing is known. This happens at time T5 (see Figure 4), when the current cycle is about to end but with sufficient time before the calling task must start switching LEDs. The P_Controller task in the private part is the one in charge of keeping the pendulum oscillating.

Upon completion of a call to Wait_For_Next_Cycle, the caller knows the start time and period of the next pendulum cycle (parameters Init_Time and Cycle_Duration). By dividing the period, we can also determine at what precise times we need to switch the LEDs. Each character is encoded using an 8-tall by 5-wide dot matrix, and we want to fit 14 characters in the display. Adding some left and right margins to avoid the slowest segments, and a blank space to the right of each character, we subdivide the period into 208 lines. These lines represent time windows to display each particular character chunk. Since the pendulum period is around 115 milliseconds, it takes just some 550 microseconds for the pendulum to traverse one line.

If that seems tight, there is an even tighter requirement than this inter-line delay. The LEDs must be switched on only for an interval of between 100 and 200 microseconds. Otherwise we would see segments, rather than dots, due to the pendulum speed. This must also be taken into account when designing the plan for the period, because the strategy changes slightly depending on the current pendulum direction. When it moves from left to right, the first 100 microseconds of a line correspond to its left part, whereas the opposite holds in the other direction.
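Putting the pieces together, the display side can be sketched roughly as follows, assuming the Pendulum_IO spec shown above. Pattern_For is a hypothetical placeholder returning the 8-bit column pattern for a given line, and the left/right asymmetry just described is ignored for brevity:

with Ada.Real_Time; use Ada.Real_Time;
with STM32F4;       use STM32F4;
with Pendulum_IO;   use Pendulum_IO;

procedure Display_Sketch is
   Lines_Per_Cycle : constant := 208;
   LED_On_Time     : constant Time_Span := Microseconds (150);

   --  Hypothetical placeholder: the column pattern to show on a given line
   function Pattern_For (Line : Positive) return Byte is (2#0101_0101#);

   Start  : Time;
   Period : Time_Span;
   Slot   : Time;
begin
   loop
      Wait_For_Next_Cycle (Start, Period);
      for L in 1 .. Lines_Per_Cycle loop
         Slot := Start + (Period * (L - 1)) / Lines_Per_Cycle;
         delay until Slot;                --  reach the line's time window
         Set_LEDs (Pattern_For (L));      --  on only briefly, so dots stay dots
         delay until Slot + LED_On_Time;
         Set_LEDs (0);
      end loop;
   end loop;
end Display_Sketch;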


Dancing with the runtime

Apart from careful planning of the sequence to switch the LEDs, this part is possibly less complex, due to the help of Wait_For_Next_Cycle. However, the short delays imposed by the pendulum have put us in front of a limitation of the runtime support. The first try to display some text was far from satisfactory. Often times, dots became segments. Visual glitches happened all the time as well. Following the track to this issue, we ended up digging into the Ravenscar runtime (the full version included in GNAT GPL 2016) to eventually find that the programmed precision for timing events and delay statements was set to one millisecond. This setting may be fine for less demanding applications, and it causes a relatively low runtime overhead; but it was making it impossible for us to operate within the pendulum’s tight delays. Things started to go well after we modified and recompiled the runtime sources to make delays and timing events accurate to 10 microseconds. It was just a constant declaration, but it was not trivial to find it! Definitely, this is not a problem we ask our students to solve: they use the modified runtime.

If you come across the same issue and the default accuracy of 1 millisecond is insufficient for your application, look for the declaration of constant Tick_Period in the body of package System.BB.Board_Support (file s-bbbosu.adb in the gnarl-common part of either the full or the small footprint versions of the runtime). For an accuracy of 10 microseconds, we set the constant to Clock_Frequency / 100_000.

More fun

There are many other things that can be done with the pendulum, such as scrolling a text longer than the display width, or varying the scrolling speed by pressing the user button in the Discovery board (both features are illustrated in the video below, best viewed in HD); or varying the intensity of the text by changing the LEDs flashing time; or displaying graphics rather than just text... 

One of the uses we have given the pendulum is as a chronometer display for times such as the pendulum period, the solenoid pulse width, or other internal program delays. This use has proved very helpful to better understand the process at hand and also to diagnose the runtime delay accuracy issue. 

The pendulum can also be used as a rudimentary oscilloscope. Figure 6 shows the pendulum drawing the chronograms of the barrier signal and the solenoid pulse. The top two lines represent these signals, respectively, as the pendulum moves rightwards. The two bottom lines are for the leftwards semi-period and must be read leftwards. In Figure 7, the two semi-periods are chronologically re-arranged. The result cannot be read as in a real oscilloscope, because of the varying pendulum speed; but knowing that, it is indicative.

Figure 6. Pendulum used as an oscilloscope (original image)
Figure 7. Oscilloscope image, chronologically re-arranged

Want to see it?

I plan to take a pendulum with me to the Ada-Europe 2017 Conference in Vienna. It will be on display during the central days of the conference (13, 14 and 15 June) and I'll be around for questions and suggestions.

Credit, where it's due

My colleague Ismael Ripoll was the one who called my attention to the pendulum, back in 2005. We implemented the text part only (we did not disconnect the solenoid from the original pendulum's microcontroller). Until porting (and extending) this project to the Discovery board, we had been using an industrial PC with a digital I/O card to display text on the pendulum. The current setup is about two orders of magnitude cheaper. It also fits much better with the subject's new focus on real-time and embedded systems.

I'm thankful to Vicente Lliso, technician at the DISCA department of UPV, for the design and implementation of the adapter card connecting the Discovery board with the pendulum, for his valuable comments and for patiently attending my requests for small changes here and there.

My friend and amateur photographer Álvaro Doménech produced excellent photographical material to decorate this entry, as well as the pendulum video. Álvaro is however not to be blamed for the oscilloscope pictures, which I took with just a mobile phone camera.

And many thanks to Pat Rogers, from AdaCore, who helped me with the delay accuracy issue and invited me to write this entry. It was one of Pat's excellent tutorials at the Ada-Europe conference that pushed me into giving a new angle to this pendulum project... and to others in the near future!

]]>
SPARK Tetris on the Arduboy http://blog.adacore.com/spark-tetris-on-the-arduboy Mon, 20 Mar 2017 13:00:00 +0000 Fabien Chouteau http://blog.adacore.com/spark-tetris-on-the-arduboy

One of us got hooked on the promise of a credit-card-size programmable pocket game under the name of Arduboy and participated in its kickstarter in 2015. The kickstarter was successful (but late) and delivered the expected working board in mid 2016. Of course, the idea from the start was to program it in Ada, but this is an 8-bit AVR microcontroller (the ATmega32u4 by Atmel), which is not supported anymore by GNAT Pro. One solution would have been to rebuild our own GNAT compiler for 8-bit AVR from the GNAT FSF repository and use the AVR-Ada project. Another solution, which we explore in this blog post, is to use the SPARK-to-C compiler that we developed at AdaCore to turn our Ada code into C and then use the Arduino toolchain to compile for the Arduboy board.

This is in fact a solution we are now proposing to those who need to compile their code for a target where we do not propose an Ada compiler, in particular small microcontrollers used in industrial automation and automotive industries. Thanks to SPARK-to-C, you can now develop your code in SPARK, compile it to C, and finally compile the generated C code to your target. We have built the universal SPARK compiler! This product will be available to AdaCore customers in the coming months.

We started from the version of Tetris in SPARK that we already ported to the Atmel SAM4S, Pebble-Time smartwatch and Unity game engine. For the details on what is proved on Tetris, see the recording of a talk at FOSDEM 2017 conference. The goal was to make this program run on the Arduboy.

SPARK-to-C Compiler

What we call the SPARK-to-C compiler in fact accepts both less and more than the SPARK language as input. It allows pointers (which are not allowed in SPARK) but rejects tagged types and tasking (which are allowed in SPARK). The reason is that it’s easy to compile Ada pointers into C pointers but much harder to support object-oriented or concurrent programming.

SPARK-to-C supports, in particular, all of Ada’s scalar types (enumeration, integer, floating-point, and fixed-point types) as well as access types, records, arrays, and subtypes of these. More importantly, it can generate all the run-time checks to detect violations of type constraints, such as integer and float range checks, checks for out-of-bounds array accesses, and checks for dereferences of null or invalid pointers. Therefore, you can program in Ada and get the guarantee that the executable compiled from the C code generated by SPARK-to-C preserves the integrity of the program, as if you had compiled it directly from Ada with GNAT.

Compiling Ada into C poses interesting challenges. Some of them are resolved by following the same strategy used by GNAT during compilation to binary code. For example, bounds of unconstrained arrays are bundled with the data for the array in so-called "fat pointers", so that both code that directly references Array'First and Array'Last as well as runtime checks for array accesses can access the array bounds in C. This is also how exceptions, both explicit in the code and generated for runtime checks, are handled. Raising an exception is translated into a call to the so-called "last chance handler", a function provided by the user that can perform some logging before terminating the program. This is exactly how exceptions are handled in Ada for targets that don’t have runtime support. In general, SPARK-to-C provides very little runtime support, mostly for numerical computations (sin, cosine, etc.), accessing a real time clock, and outputting characters and strings. Other features require specific source-to-source transformations of Ada programs. For example, functions that return arrays in Ada are transformed into procedures with an additional output parameter (a pointer to some preallocated space in the caller) in C.
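For instance, the last chance handler is an ordinary user-provided subprogram exported under the name that the generated C code calls (the __gnat_last_chance_handler calls are visible in the generated code further down). A minimal sketch, written in Ada here although it could equally be supplied directly in C, might look like this:

--  Spec (e.g. last_chance_handler.ads)
with System;
procedure Last_Chance_Handler (Msg : System.Address; Line : Integer);
pragma Export (C, Last_Chance_Handler, "__gnat_last_chance_handler");

--  Body (e.g. last_chance_handler.adb)
procedure Last_Chance_Handler (Msg : System.Address; Line : Integer) is
   pragma Unreferenced (Msg, Line);
begin
   --  Log the failure if the target allows it, then stop: a last
   --  chance handler must never return.
   loop
      null;
   end loop;
end Last_Chance_Handler;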

The most complex part of SPARK-to-C deals with unnesting nested subprograms because, while GCC supports nested functions as an extension, this is not part of standard C. Hence C compilers cannot be expected to deal with nested functions. Unnesting in SPARK-to-C relies on a tight integration of a source-to-source transformation of Ada code in the GNAT frontend, with special handling of nested subprograms in the C-generation backend. Essentially, the GNAT frontend creates an 'activation record' that contains a pointer field for each uplevel variable referenced in the nested subprogram. The nested subprogram is then transformed to reference uplevel variables through the pointers in the activation record passed as additional parameters. A further difficulty is making this work for indirect references to uplevel variables and through references to uplevel types based on these variables (for example the bound of an array type). SPARK-to-C deals also with these cases: you can find all details in the comments of the compiler file exp_unst.ads

Compiling Tetris from SPARK to C

Once SPARK-to-C is installed, the code of Tetris can be compiled into C with the version of GPRbuild that ships with SPARK-to-C:

$ gprbuild -P<project> --target=c

For example, the SPARK expression function Is_Empty from Tetris code:

function Is_Empty (B : Board; Y : Integer; X : Integer) return Boolean is
      (X in X_Coord and then Y in Y_Coord and then B(Y)(X) = Empty);

is compiled into the C function tetris_functional__is_empty, with explicit checking of array bounds before accessing the board:

boolean tetris_functional__is_empty(tetris_functional__board b, integer y, integer x) {
  boolean C123s = false;
  if ((x >= 1 && x <= 10) && (y >= 1 && y <= 50)) {
    if (!((integer)y >= 1 && (integer)y <= 50))
      __gnat_last_chance_handler(NULL, 0);
    if (!((integer)x >= 1 && (integer)x <= 10))
      __gnat_last_chance_handler(NULL, 0);
    if ((b)[y - 1][x - 1] == tetris_functional__empty) {
      C123s = true;
    }
  }
  return (C123s);
}

or into the following simpler C function when using compilation switch -gnatp to avoid runtime checking:

boolean tetris_functional__is_empty(tetris_functional__board b, integer y, integer x) {
  return (((x >= 1 && x <= 10) && (y >= 1 && y <= 50)) && ((b)[y - 1][x - 1] == tetris_functional__empty));
}

SPARK to C Tetris

Running on Arduboy

To interface the SPARK Tetris implementation with the C API of the Arduboy, we use the standard language interfacing method of SPARK/Ada:

procedure Arduboy_Set_Screen_Pixel (X : Integer; Y : Integer);
pragma Import (C, Arduboy_Set_Screen_Pixel, "set_screen_pixel");

A procedure Arduboy_Set_Screen_Pixel is declared in Ada but not implemented. The pragma Import tells the compiler that this procedure is implemented in C with the name “set_screen_pixel”.

SPARK-to-C will translate calls to the procedure “Arduboy_Set_Screen_Pixel” to calls to the C function “set_screen_pixel”. We use the same technique for all the subprograms that are required for the game (button_right_pressed, clear_screen, game_over, etc.).

The program entry point is in the Arduino sketch file SPARK_Tetris_Arduboy.ino (link). In this file, we define and export the C functions (set_screen_pixel() for instance) and call the SPARK/Ada code with _ada_main_tetris().
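On the Ada side, the subprogram called from the sketch is simply a library-level procedure exported with C convention under that link name; the Ada procedure name below is an assumption, only the exported name matters:

procedure Main_Tetris;
pragma Export (C, Main_Tetris, "_ada_main_tetris");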

It’s that simple :)

If you have an Arduboy, you can try this demo by first following the quick start guide, downloading the project from GitHub, loading the Arduino sketch SPARK_Tetris_Arduboy/SPARK_Tetris_Arduboy.ino, and then clicking the upload button.

]]>
Research Corner - Auto-active Verification in SPARK http://blog.adacore.com/research-corner-auto-active-verification-in-spark Thu, 09 Mar 2017 05:00:00 +0000 Claire Dross http://blog.adacore.com/research-corner-auto-active-verification-in-spark

GNATprove performs auto-active verification, that is, verification is done automatically, but usually requires annotations by the user to succeed. In SPARK, annotations are most often given in the form of contracts (pre and postconditions). But some language features, in particular ghost code, allow proof guidance to be much more involved. As an example of how far we can go to guide the proof, see the paper we are presenting at NASA Formal Methods symposium 2017. It describes how an imperative red black tree implementation in SPARK was verified using intensive auto-active verification. The code presented in this paper is available as a distributed example in the SPARK github repository and is described in the SPARK user guide (see 'red_black_trees').

]]>
Rod Chapman on Software Security http://blog.adacore.com/rod-chapman-on-software-security Tue, 07 Mar 2017 05:00:00 +0000 Yannick Moy http://blog.adacore.com/rod-chapman-on-software-security

Rod Chapman gave an impactful presentation at the Bristech conference last year. His subject: programming Satan's computer! That is his way of pointing out how difficult it is to produce secure software. Of course, it would not be Rod Chapman if he did not also drop a few hints at how they have done it at Altran UK over the years. And SPARK is central to this solution, even though it is never mentioned explicitly in the talk (Rod does lift the cover when answering a question at the end).

Really worth seeing, Rod is a captivating speaker.

]]>
AdaCore attends FOSDEM http://blog.adacore.com/adacore-attends-fosdem Wed, 22 Feb 2017 05:00:00 +0000 AdaCore Admin http://blog.adacore.com/adacore-attends-fosdem

Earlier this month AdaCore attended FOSDEM in Brussels, an event focused on the use of free and open source software. Two members of our technical team were there to give talks: "Prove with SPARK: No Math, Just Code" from Yannick Moy and "64 bit Bare Metal Programming on RPI-3" from Tristan Gingold.

Yannick Moy presented how to prove key properties of Tetris in SPARK and run it on ARM Cortex M. The presentation focused on the accessibility of the proof technology to software engineers, who simply have to code the specification, which does not require a specific math background. Click here to watch Yannick's presentation in full.

Tristan Gingold presented the Raspberry Pi 3 board, how to write and build a first example, and a demo of a more advanced multi-core application. With almost no tutorials available on the internet for this, he addressed the main new feature of the popular RPI-3 board: four 64-bit cores. Click here to watch Tristan's presentation in full.

For more information and updates on future events that AdaCore will be attending, please visit our website or follow our Twitter @AdaCoreCompany.

]]>
Getting started with the Ada Drivers Library device drivers http://blog.adacore.com/getting-started-with-the-ada-drivers-library-device-drivers Tue, 14 Feb 2017 14:00:00 +0000 Pat Rogers http://blog.adacore.com/getting-started-with-the-ada-drivers-library-device-drivers

The Ada Drivers Library (ADL) is a collection of Ada device drivers and examples for ARM-based embedded targets. The library is maintained by AdaCore, with development originally (and predominantly) by AdaCore personnel but also by the Ada community at large.  It is available on GitHub and is licensed for both proprietary and non-proprietary use.

The ADL includes high-level examples in a directory at the top of the library hierarchy. These examples employ a number of independent components such as cameras, displays, and touch screens, as well as middleware services and other device-independent interfaces. The stand-alone components are independent of any given target platform and appear in numerous products. (A previous blog entry examined one such component, the Bosch BNO055 inertial measurement unit (IMU).) Other examples show how to create high-level abstractions from low-level devices. For instance, one shows how to create abstract data types representing serial ports.

In this entry we want to highlight another extremely useful resource: demonstrations for the low-level device drivers. Most of these drivers are for devices located within the MCU package itself, such as GPIO, UART/USART, DMA, ADC and DAC, and timers. Other demonstrations are for some of the stand-alone components that are included in the supported target boards, for example gyroscopes and accelerometers. Still other demonstrations are for vendor-defined hardware such as a random number generator.

These demonstrations show a specific utilization of a device, or in some cases, a combination of devices. As such they do not have the same purpose as the high-level examples. They may just display values on an LCD screen or blink LEDs. Their purpose is to provide working examples that can be used as starting points when incorporating devices into client applications. As working driver API references they are invaluable.

Approach

Typically there are multiple, independent demonstration projects for each device driver because each is intended to show a specific utilization. For example, there are five distinct demonstrations for the analog-to-digital conversion (ADC) driver. One shows how to set up the driver to use polling to get the converted value. Another shows how to configure the driver to use interrupts instead of polling. Yet another shows using a timer to trigger the conversions, and another builds on that to show the use of DMA to get the converted values to the user. In each case we simply display the resulting values on an LCD screen rather than using them in some larger application-oriented sense.

Some drivers, specifically the I2C and SPI communication drivers, do not have dedicated demonstrations of their own. They are instead used to implement drivers for devices that use those protocols, i.e., the drivers for the stand-alone components. The Bosch BNO055 IMU mentioned earlier is an example.

Some of the demonstrations illustrate vendor-specific capabilities beyond typical functionality. The STM32 timers, for example, have direct support for quadrature motor encoders. This support provides CPU-free detection of motor rotation to a resolution of a fraction of a degree. Once the timer is configured for this purpose the application merely samples a register to get the encoder count. The timer will even provide the rotation direction. See the encoder demonstration if interested.

Implementation

All of the drivers and demonstration programs are written in Ada 2012. They use preconditions and postconditions, especially when the driver is complicated. The preconditions capture API usage requirements that are otherwise expressed only within the documentation, and sometimes not expressed at all. Similarly, postconditions help clients understand the effects of calls to the API routines, effects that are, again, only expressed in the documentation. Some of the devices are highly sophisticated -- a nice way of saying blindingly complicated -- and their documentation is complicated too. Preconditions and postconditions provide an ideal means of capturing information from the documentation, along with overall driver usage experience. The postconditions also help with the driver implementation itself, acting as unit tests to ensure implementer understanding. Other Ada 2012 features are also used, e.g., conditional and quantified expressions.

The STM32.Timers package uses preconditions and postconditions extensively because the STM timers are "highly sophisticated." STM provides several kinds of timer with significantly different capabilities. Some are defined as "basic," some "advanced," and others are "general purpose." The only way to know which is which is by the timer naming scheme ("TIM" followed by a number) and the documentation.  Hence TIM1 and TIM8 are advanced timers, whereas TIM6 and TIM7 are basic timers.  TIM2 through TIM5 are general purpose timers but not the same as TIM9 through TIM14, which are also general purpose. We use preconditions and postconditions to help keep it all straight. For example, here is the declaration of the routine for enabling an interrupt on a given timer. There are several timer interrupts possible, represented by the enumeration type Timer_Interrupt. The issue is that basic timers can only have one of the possible interrupts specified, and only advanced timers can have two of those possible. The preconditions express those restrictions to clients.

procedure Enable_Interrupt
   (This   : in out Timer;
    Source : Timer_Interrupt)
with  
   Pre =>
      (if Basic_Timer (This) then Source = Timer_Update_Interrupt) and
      (if Source in Timer_COM_Interrupt | Timer_Break_Interrupt then Advanced_Timer (This)),
   Post => Interrupt_Enabled (This, Source);

The preconditions reference Boolean functions Basic_Timer and Advanced_Timer in order to distinguish among the categories of timers. They simply compare the timer specified to a list of timers in those categories. 

The postcondition tells us that the interrupt will be enabled after the call returns. That is useful for the user but also for the implementer because it serves as an actual check that the implementation does what is expected. When working with hardware, though, we have to keep in mind that the hardware may clear the tested condition before the postcondition code is called. For example, a routine may set a bit in a register in order to make the attached device do something, but the device may clear the bit as part of its response. That would likely happen before the postcondition code could check that the bit was set. When looking throughout the drivers code you may notice some "obvious" postconditions are not specified. That may be the cause.

The drivers use compiler-dependent facilities only when essential. In particular, they use an AdaCore-defined aspect specifying that access to a given memory-mapped register is atomic even when only one part of it is read or updated. This access reflects the hardware requirements and simplifies the driver implementation code considerably.
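The aspect in question is the GNAT-specific Volatile_Full_Access. A small illustrative declaration follows; the register name, layout, and address are made up for the example:

with System;

package Register_Sketch is

   --  Illustrative only: a 32-bit memory-mapped register that must
   --  always be read and written as a whole, even when the code only
   --  touches one of its fields.
   type Status_Register is record
      Ready : Boolean;
      Error : Boolean;
   end record
     with Volatile_Full_Access, Size => 32;

   for Status_Register use record
      Ready at 0 range 0 .. 0;
      Error at 0 range 1 .. 1;
   end record;

   Status : Status_Register
     with Import, Address => System'To_Address (16#4000_0000#);

end Register_Sketch;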

Organization

The device driver demonstrations are vendor-specific because the corresponding devices exist either within the vendor-defined MCU packages or outside the MCU on the vendors' target boards. The first vendor supported by the library was STMicroelectronics (STM), although other vendors are beginning to be represented too. As a result, the device driver demonstrations are currently for MCU products and boards from STM and are, therefore, located in a library subdirectory specific to STM. Look for them in the /Ada_Drivers_Library/ARM/STM32/driver_demos/ subdirectory of your local copy from GitHub. There you will see some demonstrations immediately; these are for drivers that are shared across an entire MCU family. Others are located in further subdirectories, containing either a unique device's driver or devices that do exist across multiple MCUs but nonetheless differ in some significant way.

Let's look at one of the demonstration projects, the "demo_LIS3DSH_tilt" project, so that we can highlight the more important parts.  This program demonstrates basic use of the LIS3DSH accelerometer chip. The four LEDs surrounding the accelerometer will come on and off as the board is moved, reflecting the directions of the accelerations measured by the device.

The first thing to notice is the "readme.md" file. As you might guess, this file explains what the project demonstrates and, if necessary, how to set it up. In this particular case the text also mentions the board that is intended for execution, albeit implicitly, because the text mentions the four LEDs and an accelerometer that are specific to one of the STM32 Discovery boards. In other words, the demo is intended for a specific target board. At the time of this writing, all the STM demonstration projects run on either the STM32F4 or the STM32F429I Discovery boards from STMicroelectronics. They are very inexpensive, amazingly powerful boards. Some demonstrations will run on either one because they do not use board-specific resources.

But even if a demonstration does not require a specific target board, it still matters which board you use because the demo's project file (the "gpr file") specifies the target. If you use a different target the executable will download but may not run correctly, perhaps not at all.

The executable may not run because the specified target's runtime library is used to build the binary executable. These libraries have configurations that reflect the differences in the target board, especially memory and clock rates, so using the runtime that matches the board is critical. This is the first thing to check when the board you are using simply won't run the demonstration at all.

The demonstration project file specifies the target by naming another project in a with-clause. This other project represents a specific target board. Here is the elided content of this demonstration's project file. Note the second with-clause that specifies a gpr file for the STM32F407 Discovery board. That is one of the two lines to change if you want to use the F429I Discovery instead.

with "../../../../boards/common_config.gpr";
with "../../../../boards/stm32f407_discovery.gpr";

project Demo_LIS3DSH_Tilt extends "../../../../examples/common/common.gpr" is

   ...
   for Runtime ("Ada") use STM32F407_Discovery'Runtime("Ada");
   ...

end Demo_LIS3DSH_Tilt;

The other line to change in the project file is the one specifying the "Runtime" attribute. Note how the value of the attribute is specified in terms of another project's Runtime attribute. That other project is the one named in the second with-clause, so when we change the with-clause we must change the name of the referenced project too.
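Concretely, targeting the F429I Discovery instead would amount to something like this (the board project file names may differ slightly between ADL revisions, so check the boards directory of your checkout):

with "../../../../boards/common_config.gpr";
with "../../../../boards/stm32f429_discovery.gpr";

project Demo_LIS3DSH_Tilt extends "../../../../examples/common/common.gpr" is

   ...
   for Runtime ("Ada") use STM32F429_Discovery'Runtime("Ada");
   ...

end Demo_LIS3DSH_Tilt;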

That's really all you need to change in the gpr file. GPS and the builder will handle everything else automatically.

There is, however, another effect of the with-clause naming a specific target. The demonstration programs must refer to the target MCU in order to use the devices in the MCU package. They may also need to refer to devices on the target board. Different MCU packages have differing numbers of devices (eg, USARTs) in the package. Similarly, different boards have different external components (accelerometers versus gyroscopes, for example). We don't want to limit the code in the ADL to work with specific boards, but that would be the case if the code referenced the targets by name, via packages representing the specific MCUs and boards. Therefore, the ADL defines two packages that represent the MCU and the board indirectly. These are the STM32.Device and STM32.Board packages, respectively. The indirection is then resolved by the gpr file named in the with-clause. In this demonstration the clause names the STM32F407_Discovery project so that is the target board represented by the STM32.Board package. That board uses an STM32F407VG MCU so that is the MCU represented by the STM32.Device package. Each package contains declarations for objects and constants representing the specific devices on that specific MCU and target board.
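For instance, board-independent application code can drive the LEDs through STM32.Board without ever naming a specific board; which physical LEDs these are is decided by the board project named in the gpr file. The subprogram names below follow the ADL board packages, but check the STM32.Board spec in your checkout for the exact API:

with STM32.Board;

procedure LEDs_Sketch is
begin
   STM32.Board.Initialize_LEDs;
   STM32.Board.All_LEDs_On;
end LEDs_Sketch;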

You'll also see a file named ".gdbinit" at the same level as the readme.md and gpr files. This is a local gdb script that automatically resets the board when debugging. It is convenient but not essential.

At that same level you'll also see a "gnat.adc" file containing configuration pragmas. These files contain a single pragma that, among other things, ensures all interrupt handlers are elaborated before any interrupts can trigger them. It is not essential for the correct functioning of these demonstrations but is a good idea in general.

Other than those files you'll see subdirectories for the source files and compiler's products (the object and ALI files, and the executable file).

And that's it. Invoke GPS on the project file and everything will be handled by the IDE.

Application Use

We mentioned that you must change the gpr file if you want to use a different target board. That assumes you are running the demonstration programs themselves. There is no requirement that you do so. You could certainly take the bulk of the code and use it on some other target that has the same MCU family inside. That's the whole point of the demonstrations: showing how to use the device drivers! The Certyflie project, also on the AdaCore GitHub, is just such a project. It uses these device drivers so it uses an STM32.Device package for the on-board STM32F405 MCU, but the target board is a quad-copter instead of one of the Discovery kits.

Concluding Remarks

Finally, it must be said that not all available devices have drivers in the ADL, although the most important ones do. More drivers and demonstrations are needed. For example, the hash processor and the cryptographic processor on the STM Cortex-M4 MCUs do not yet have drivers. Other important drivers are missing as well. CAN and Ethernet support is either minimal or lacking entirely. And that's not even mentioning the other possible vendors. We need the active participation of the Ada community and hope you will join us!

]]>
Proving Tetris With SPARK in 15 Minutes http://blog.adacore.com/proving-tetris-with-spark-in-15-minutes Fri, 10 Feb 2017 05:00:00 +0000 Yannick Moy http://blog.adacore.com/proving-tetris-with-spark-in-15-minutes

Here is the page of the talk with the slides and recording.

]]>
Going After the Low Hanging Bug http://blog.adacore.com/going-after-the-low-hanging-bug Mon, 30 Jan 2017 14:00:00 +0000 Raphaël Amiard http://blog.adacore.com/going-after-the-low-hanging-bug

At AdaCore, we've been developing deep static analysis tools (CodePeer and SPARK) since 2008. And if you factor in that some of their developers had been working on these tools (or on similar ones) for decades before that, it's fair to say that we've got deep expertise in deep static analysis.

At the same time, many Web companies have adopted light-weight static analysis integrated in their agile code-review-commit cycle. Some of these tools are deployed for checking all commits in the huge codebases of Google (Tricorder) or Facebook (Infer). Others are commercial tools implementing hundreds of small checkers (SonarLint, PVS-Studio, Flawfinder, PC-lint, CppCheck). The GNAT compiler implements some of these checkers through its warnings, and our coding standard checker GNATcheck implements others, but our technology is still missing some useful checkers that rely on intraprocedural or interprocedural control and data flow analysis, typically out of reach of a compiler or a coding standard checker.

Some of these checkers are in fact implemented with much greater precision in our deep static analysis tools CodePeer and SPARK, but running these tools requires developing a degree of expertise in the underlying technology to be used effectively, and their use can be costly in terms of resources (machines, people). Hence, these tools are typically used for high assurance software, where the additional confidence provided by deep static analysis outweighs the costs. In addition, our tools mostly target the absence of run-time errors, plus a few logical errors such as unused variables or statements, and a few suspicious constructs. Thus they don't cover the full spectrum of checkers implemented in light-weight static analyzers.

Luckily, the recent Libadalang technology, developed at AdaCore, provides an ideal basis on which to develop such light-weight static analysis, as it can parse and analyze thousands of lines of code in seconds. As an experiment, we implemented two simple checkers using the Python binding of Libadalang, and we found a dozen bugs in the codebases of the tools we develop at AdaCore (including the compiler and static analyzers). That's what we describe in the following.


Checker 1: When Computing is a Waste of Cycles

The first checker detects arguments of some arithmetic and comparison operators which are syntactically identical, in cases where this could be expressed with a constant instead (like "X - X" or "X <= X"):

import sys
import libadalang as lal

def same_tokens(left, right):
    return len(left) == len(right) and all(
        le.kind == ri.kind and le.text == ri.text
        for le, ri in zip(left, right)
    )

def has_same_operands(binop):
    return same_tokens(list(binop.f_left.tokens), list(binop.f_right.tokens))

def interesting_oper(op):
    return not isinstance(op, (lal.OpMult, lal.OpPlus, lal.OpDoubleDot,
                               lal.OpPow, lal.OpConcat))

source_file = sys.argv[1]
c = lal.AnalysisContext('utf-8')
unit = c.get_from_file(source_file)
for b in unit.root.findall(lal.BinOp):
    if interesting_oper(b.f_op) and has_same_operands(b):
        print 'Same operands for {} in {}'.format(b, source_file)

(Full source available at https://github.com/AdaCore/lib...)

Despite all the extensive testing that is done on our products, this simple 20-line checker found 1 bug in the GNAT compiler, 3 bugs in the CodePeer static analyzer and 1 bug in the GPS IDE! We show them below so that you can convince yourself that they are true bugs, really worth finding.

The bug in GNAT is on the following code in sem_prag.adb:

            --  Attribute 'Result matches attribute 'Result

            elsif Is_Attribute_Result (Dep_Item)
              and then Is_Attribute_Result (Dep_Item)
            then
               Matched := True;

One of the references to Dep_Item should really be Ref_Item. Here is the correct version:

            --  Attribute 'Result matches attribute 'Result

            elsif Is_Attribute_Result (Dep_Item)
              and then Is_Attribute_Result (Ref_Item)
            then
               Matched := True;

Similarly, one of the three bugs in CodePeer can be found in be-ssa-value_numbering-stacks.adb:

            --  Recurse on array base and sliding amounts;
            if VN_Kind (Addr_With_Index) = Sliding_Address_VN
              and then Num_Sliding_Amounts (Addr_With_Index) =
                         Num_Sliding_Amounts (Addr_With_Index)
            then

where one of the references to Addr_With_Index above should really be to Addr_With_Others_VN, and the other two are in be-value_numbers.adb:

         return VN_Global_Obj_Id (VN2).Obj_Id_Number =
           VN_Global_Obj_Id (VN2).Obj_Id_Number
           and then VN_Global_Obj_Id (VN2).Enclosing_Module =
           VN_Global_Obj_Id (VN2).Enclosing_Module;

where two of the four references to VN2 should really be to VN1.

The bug in GPS is in language-tree-database.adb:

               if Get_Construct (Old_Obj).Attributes /=
                 Get_Construct (New_Obj).Attributes
                 or else Get_Construct (Old_Obj).Is_Declaration /=
                 Get_Construct (New_Obj).Is_Declaration
                 or else Get_Construct (Old_Obj).Visibility /=
                 Get_Construct (Old_Obj).Visibility

The last reference to Old_Obj should really be New_Obj.


Checker 2: When Testing Gives No Information

The second checker detects syntactically identical expressions which are chained together in a chain of logical operators, so that one of the two identical tests is useless (as in "A or B or A"):

import sys
import libadalang as lal

def list_operands(binop):
    def list_sub_operands(expr, op):
        if isinstance(expr, lal.BinOp) and type(expr.f_op) is type(op):
            return (list_sub_operands(expr.f_left, op)
                    + list_sub_operands(expr.f_right, op))
        else:
            return [expr]

    op = binop.f_op
    return (list_sub_operands(binop.f_left, op)
            + list_sub_operands(binop.f_right, op))

def is_bool_literal(expr):
    return (isinstance(expr, lal.Identifier)
            and expr.text.lower() in ['true', 'false'])

def has_same_operands(expr):
    ops = set()
    for op in list_operands(expr):
        tokens = tuple((t.kind, t.text) for t in op.tokens)
        if tokens in ops:
            return op
        ops.add(tokens)

def same_as_parent(binop):
    par = binop.parent
    return (isinstance(binop, lal.BinOp)
            and isinstance(par, lal.BinOp)
            and type(binop.f_op) is type(par.f_op))

def interesting_oper(op):
    return isinstance(op, (lal.OpAnd, lal.OpOr, lal.OpAndThen, lal.OpOrElse,
                           lal.OpXor))

source_file = sys.argv[1]
c = lal.AnalysisContext('utf-8')
unit = c.get_from_file(source_file)
for b in unit.root.findall(lambda e: isinstance(e, lal.BinOp)):
    if interesting_oper(b.f_op) and not same_as_parent(b):
        oper = has_same_operands(b)
        if oper:
            print 'Same operand {} for {} in {}'.format(oper, b, source_file)

(Full source available at https://github.com/AdaCore/lib...)

Again, this simple 40-line checker found 4 code quality issues in the GNAT compiler, 2 bugs in the CodePeer static analyzer, 1 bug and 1 code quality issue in the GPS IDE and 1 bug in the QGen code generator. Ouch!

The four code quality issues in GNAT are simply duplicated checks that are not useful. For example in par-endh.adb:

         --  Cases of normal tokens following an END

          (Token = Tok_Case   or else
           Token = Tok_For    or else
           Token = Tok_If     or else
           Token = Tok_Loop   or else
           Token = Tok_Record or else
           Token = Tok_Select or else

         --  Cases of bogus keywords ending loops

           Token = Tok_For    or else
           Token = Tok_While  or else

The test "Token = Tok_For" is present twice. Probably better for maintenance to have it once only. The three other issues are similar.

The two bugs in CodePeer are in utils-arithmetic-set_arithmetic.adb:

      Result  : constant Boolean
        := (not Is_Singleton_Set (Set1)) and then (not Is_Singleton_Set (Set1))
        and then (Num_Range_Pairs (Set1, Set2) > Range_Pair_Limit);

The second occurrence of Set1 should really be Set2.

The code quality issue in GPS is in mi-parser.adb:

           Token = "traceframe-changed" or else
           Token = "traceframe-changed" or else

The last line is useless. The bug in GPS is in vcs.adb:

      return (S1.Label = null and then S2.Label = null
              and then S2.Icon_Name = null and then S2.Icon_Name = null)

The first reference to S2 on the last line should really be S1. Note that this issue had already been detected by CodePeer, which is run as part of GPS quality assurance, and it had been fixed on the trunk by one of the GPS developers. Interestingly here, two tools using either a syntactic heuristic or a deep semantic interpretation (allowing CodePeer to detect that "S2.Icon_Name = null" is always true when reaching the last subexpression) reach the same conclusion on that code.

Finally, the bug in QGen is in himoco-change_buffers.ads:

   procedure Apply_Update (Self : in out Change_Buffer)
   with Post =>
   --  @req TR-CHB-Apply_Update
   --  All signals, blocks and variables to move shall
   --  be moved into their new container. All signals to merge shall be
   --  merged.
     (
        (for all Elem of Self.Signals_To_Move =>
             (Elem.S.Container.Equals (Elem.Move_Into.all)))
      and then
        (for all Elem of Self.Blocks_To_Move =>
             (if Elem.B.Container.Is_Not_Null then
                  Elem.B.Container.Equals (Elem.Move_Into.all)))
      and then
        (for all Elem of Self.Signals_To_Move =>
             (Elem.S.Container.Equals (Elem.Move_Into.all)))
      and then

The first and third conjuncts in the postcondition are the same. After checking with the QGen developers, it turned out that the final conjunct was actually meant to refer to Self.Variables_To_Move instead of Self.Signals_To_Move. So we detected a bug in the specification (expressed as a contract), using a simple syntactic checker!


Setup Recipe

So you actually want to try the above scripts on your own codebase? This is possible right now with your latest GNAT Pro release or the latest GPL release for community & academic users! Just follow the instructions we described in the Libadalang repository, and you will then be able to run the scripts inside your favorite Python 2 interpreter.
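
As an illustration, here is one way such a checker could be wrapped to process every Ada file passed on the command line. This is our own sketch, not part of the official scripts: it assumes the helper functions of the second checker above (interesting_oper, same_as_parent, has_same_operands) are defined in the same module.

import sys
import libadalang as lal

# Hypothetical driver: the checker functions defined above are assumed to be
# in scope (for instance, in the same module).
ctx = lal.AnalysisContext('utf-8')
for source_file in sys.argv[1:]:
    unit = ctx.get_from_file(source_file)
    for b in unit.root.findall(lambda e: isinstance(e, lal.BinOp)):
        if interesting_oper(b.f_op) and not same_as_parent(b):
            oper = has_same_operands(b)
            if oper:
                print 'Same operand {} for {} in {}'.format(oper, b, source_file)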


Conclusion

Overall, this little experiment was eye-opening, in particular for us who develop these tools, as we did not expect such gross mistakes to have gone through our rigorous reviews and testing. We will continue investigating what benefits lightweight static analysis might provide, and if this investigation is successful, we will certainly include this capability in our tools. Stay tuned!

[cover image by Max Pixel, Creative Commons Zero - CC0]

]]>
Hash it and Cache it http://blog.adacore.com/hash-it-and-cache-it Tue, 24 Jan 2017 05:00:00 +0000 Johannes Kanig http://blog.adacore.com/hash-it-and-cache-it

GNATprove already uses quite a few techniques to avoid doing the same work over and over again. This older blog post explains the main mechanisms that are in play. In this blog post, we describe yet another mechanism that can decrease the work done by GNATprove considerably. In this other post, we have described how we changed the architecture of the SPARK tools to achieve more parallelism.

When GNATprove analyses your SPARK program, it internally creates an intermediate file in the Why3 format, and then spawns a program called "gnatwhy3" to process this file. It is usually in the gnatwhy3 process that GNATprove spends most of its analysis time. Couldn't we save a lot of this time if we "cached" calls to gnatwhy3? That is, we could detect that the intermediate Why3 file is the same as last time and that no other relevant conditions have changed, and simply reuse the analysis results from that previous run.

This is exactly what this new SPARK feature does. It computes a hash of the intermediate file and all relevant parameters, and uses a memcached server to check if a previous run of gnatwhy3 exists with this hash. If yes, the result of the last analysis is retrieved from the server. If not, the gnatwhy3 tool is run as usual.
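
To make the mechanism concrete, here is a rough Python sketch of the general hash-and-cache pattern. It is not GNATprove's actual implementation, and the way gnatwhy3 is invoked below is purely illustrative; the sketch assumes the python-memcached package.

import hashlib
import subprocess
import memcache

def cached_gnatwhy3(why3_file, options, server='localhost:11211'):
    # Hash the intermediate file together with all relevant parameters.
    h = hashlib.sha1()
    with open(why3_file, 'rb') as f:
        h.update(f.read())
    h.update(' '.join(options).encode('utf-8'))
    key = 'gnatwhy3:' + h.hexdigest()

    mc = memcache.Client([server])
    result = mc.get(key)
    if result is None:
        # Cache miss: run the tool as usual (hypothetical command line)
        # and store its output for the next run.
        result = subprocess.check_output(['gnatwhy3'] + options + [why3_file])
        mc.set(key, result)
    return result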

The nice property of this new feature is that the memcached server can be shared across members of a team. All you need is to set up a machine on your network which has such a server running. Memcached is free, open source and very easy to set up. Then you can specify this server to your invocation of GNATprove by adding ``--memcached-server=hostname:portnumber`` to the command line. If your team uses this option, and a fellow developer has just run the SPARK tools on a checkout of your project, you should see speedups in your own usage of the tools (and vice versa).

]]>
Introducing Libadalang http://blog.adacore.com/introducing-libadalang Mon, 23 Jan 2017 14:00:00 +0000 Raphaël Amiard http://blog.adacore.com/introducing-libadalang

Show me your abstract syntax tree

AdaCore is working on a host of tools that work on Ada code. The compiler, GNAT, is the most famous and prominent one, but it is far from being the only one.

Over the years we have built tools with a variety of requirements regarding the processing of Ada code, which we can place on a spectrum:

  • Some tools, like a compiler, or some static analyzers, will need to ensure that they are working on correct Ada code, both from a syntactic and semantic point of view.

  • Some tools, like a source code pretty-printer, can relax some of those constraints. While the semantic correctness of the code can be used by the pretty-printer, it is not necessary per se. We can even imagine a pretty-printer working on a syntactically incorrect source, as long as it ensures not to change its intended meaning.

  • Some other tools, like an IDE, will need to work on code that is incorrect, and even evolving dynamically.

At AdaCore, we already have several interleaved tools to process Ada code: the GNAT compiler, the ASIS library, GNAT2XML, the GPS IDE. A realization of the past years, however, has been that we were lacking a unified solution to process code at the end of the previously described spectrum: potentially evolving, potentially incorrect Ada code.

Libadalang

Libadalang is meant to fill that gap, providing an easy-to-use library to syntactically and semantically analyze Ada code. The end goal is both to use it internally, in our tools and IDEs, to provide the Ada-aware engine, and to propose it to customers and Ada users, so that they can create their own custom Ada-aware tools.

Unlike the tools we currently offer for implementing your own tools, Libadalang will provide different levels of analysis, allowing a user to work at a purely syntactic level if needed, or to access more semantic information.

We will also provide interfaces to multiple languages:

  • You will be able to use Libadalang from Ada of course.

  • But you will also be able to use it from Python, for users who want to prototype easily, or do one-off analyses/statistics about their code.

We will, in a following series of blog posts, showcase how to use Libadalang to solve concrete problems, so that interested people can get a feel for how to use it.
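
In the meantime, here is a small sketch of what a one-off analysis could look like with the Python API (a sketch only: the API is still evolving, and the file name below is just an example):

import collections
import libadalang as lal

ctx = lal.AnalysisContext('utf-8')
unit = ctx.get_from_file('my_file.adb')   # any Ada source file

# Count how often each identifier occurs in the unit and print the most
# common ones -- the kind of quick statistics Python makes easy.
counts = collections.Counter(
    node.text.lower()
    for node in unit.root.findall(lambda n: isinstance(n, lal.Identifier)))

for name, occurrences in counts.most_common(5):
    print name, occurrences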

Stay tuned

Libadalang is not ready for public consumption yet, but you can see the progress being made on GitHub: https://github.com/AdaCore/lib...

Stay tuned!

]]>
New Year's Resolution for 2017: Use SPARK, Say Goodbye to Bugs http://blog.adacore.com/new-years-resolution-for-2017-no-bugs-with-spark Wed, 04 Jan 2017 13:14:00 +0000 Yannick Moy http://blog.adacore.com/new-years-resolution-for-2017-no-bugs-with-spark

NIST has recently published a report called "Dramatically Reducing Software Vulnerabilities" in which they single out five approaches which have the potential for creating software with 100 times fewer vulnerabilities than we do today. One of these approaches is formal methods. In the introduction of the document, the authors explain that they selected the five approaches that meet the following three criteria:

  • Dramatic impact,
  • 3 to 7-year time frame and
  • Technical activities.

The dramatic impact criterion is where they aim at "reducing vulnerabilities by two orders of magnitude". The 3 to 7-year time frame was meant to select "existing techniques that have not reached their full potential for impact". The technical criterion narrowed the selection to the technical area.

Among formal methods, the report highlights strong suits of SPARK, such as "Sound Static Program Analysis" (the raison d'être of SPARK), "Assertions, Pre- and Postconditions, Invariants, Aspects and Contracts" (all of which are available in SPARK), and "Correct-by-Construction". The report also cites the SPARK projects Tokeneer and iFACTS as examples of mature uses of formal methods.

Another of the five approaches selected by NIST to dramatically reduce software vulnerabilities is what they call "Additive Software Analysis Techniques", where results of analysis techniques are combined. This has been on our radar since 2010, when we first planned an integration between our static analysis tool CodePeer and our formal verification toolset SPARK. We have finally achieved a first step in the integration of the two tools in SPARK 17, by using CodePeer as a first-level proof tool inside SPARK Pro.

Paul Black, who led the work on this report, was interviewed a few months ago, and he talks specifically about formal methods at 7:30 in the podcast. His host Kevin Greene, from US Homeland Security, mentions that "There has been a lot of talk especially in the federal community about formal methods." To which Paul Black answers later that "We do have to get a little more serious about formal methods."

NIST is not the only one to support the use of SPARK. Editor Bill Wong from Electronic Design has included SPARK in his "2016 Gifts for the Techie", saying:

It is always nice to give something that is good for you, so here is my suggestion (and it’s a cheap one): Learn SPARK. Yes, I mean that Ada programming language subset.

For those who'd like to follow NIST or Bill Wong's advice, here is where you should start:

]]>
SPARK and CodePeer, a Good Match! http://blog.adacore.com/spark-and-codepeer-a-good-match Sat, 31 Dec 2016 05:00:00 +0000 Johannes Kanig http://blog.adacore.com/spark-and-codepeer-a-good-match

As a reader of this blog, you probably know that SPARK, to prove the absence of runtime errors in SPARK programs, translates the program into logical formulas that are then passed to powerful tools called SMT solvers, which can prove the validity of the logical formulas. Simply put, for every check in your SPARK program, the tool generates a logical formula such that, if the SMT solver manages to prove it, the runtime check cannot happen.

The CodePeer tool for the static analysis of Ada programs works in a vastly different way: it tracks the ranges of variables and expressions in your program. When it encounters a runtime check (which often can also be expressed as a range), it verifies whether the range of the variable and the range of the check allow for a potential runtime failure or not.

Though the applied techniques and use cases of CodePeer and SPARK are very different, ultimately, both tools will analyse the user code and try, for each potential runtime check, to prove that it cannot happen. If they fail to achieve this proof, a message will be issued to the user.  We always felt that there was some kind of synergy possible between CodePeer and SPARK, but we never quite saw how they would fit together, given their very different use cases and underlying techniques. 

However, with the SPARK 17.1 release, we have finally brought the two technologies together. In hindsight, the idea is very simple, but isn't it always like that? We simply run the CodePeer tool as part of the SPARK analysis. Every check that has already been identified as "cannot fail" by CodePeer is not translated to logical formulas, and is directly reported as proved. From the user's point of view, CodePeer simply becomes another proof tool.

In our tests, we have found CodePeer to be very effective. It quickly discharges all the simple runtime checks that can be proved simply by looking at the ranges of the involved variables and expressions. It is particularly efficient when applied to runtime checks involving floating-point variables.

Starting from SPARK Pro version 17.1 targeted for release in February, the CodePeer engine will be part of the SPARK Pro package.  It can be enabled on the command line with the switch "--codepeer=on", or by selecting the corresponding checkbox in the GPS and GNATBench plug-ins.

Note that CodePeer and SPARK are still separate products, because, as mentioned above, they cater to different use cases.

For more information, see the SPARK User's Guide.

]]>
SPARK Cheat Sheets (en & jp) http://blog.adacore.com/spark-cheat-sheets-en-jp Fri, 16 Dec 2016 05:00:00 +0000 Yannick Moy http://blog.adacore.com/spark-cheat-sheets-en-jp

Here is the SPARK cheat sheet usually distributed in trainings, both in English and in Japanese.

Attachments

]]>
Make with Ada: DIY instant camera http://blog.adacore.com/make-with-ada-diy-instant-camera Mon, 12 Dec 2016 09:00:00 +0000 Fabien Chouteau http://blog.adacore.com/make-with-ada-diy-instant-camera

There are moments in life where you find yourself with an AdaFruit thermal printer in one hand, and an OpenMV camera in the other. You bought the former years ago, knowing that you would do something cool with it, and you are playing with the latter in the context of a Hackaday Prize project. When that moment comes — and you know it will come — it’s time to make a DIY instant camera. For me it was at the end of a warm Parisian summer day. The idea kept me awake until 5am, putting the pieces together in my head, designing an enclosure that would look like a camera. Here’s the result:


The Hardware

On the hardware side, there’s nothing too fancy. I use a 2 cell LiPo battery from my drone. It powers the thermal printer and a 5V regulator. The regulator powers the OpenMV module and the LCD screen. There’s a push button for the trigger and a slide switch for the mode selection, both are directly plugged in the OpenMV IOs. The thermal printer is connected via UART, while the LCD screen uses SPI. Simple.

The Software

For this project I added support for the OpenMV camera in the Ada_Drivers_Library. It was the opportunity to write the digital camera interface (DCMI) driver, as well as drivers for two OmniVision camera sensors, the ST7735 LCD and the thermal printer.

The thermal printer is only capable of printing black-or-white pixel bitmaps (not even grayscale), which is not great for a picture. Fortunately, the printing head has 3 times more pixels than the height of a QQVGA image, which is the format I get from the OpenMV camera. If I also multiply the width by 3, for each RGB565 pixel from the camera I can have 9 black or white pixels on the paper (from 160x120 to 480x360). This means I can use a dithering algorithm (a very naive one) to produce a better quality image. A pixel of grayscale value X from 0 to 255 will be transformed into a 3x3 black-and-white matrix, like this:

This is quick and dirty dithering; one could greatly improve image quality by using the Floyd–Steinberg algorithm or similar (at the cost of more processing and more memory).
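
For the curious, here is a rough Python model of that naive expansion step. It is only an illustration: the project's real code is written in Ada (in the Ada_Drivers_Library), and the threshold values below are made up for the example.

# Rough model of the naive dithering described above: expand one RGB565
# pixel into a 3x3 pattern of black/white dots.

def rgb565_to_gray(pix):
    # Expand the 5/6/5-bit channels to 0..255 and average them.
    r = ((pix >> 11) & 0x1F) * 255 // 31
    g = ((pix >> 5) & 0x3F) * 255 // 63
    b = (pix & 0x1F) * 255 // 31
    return (r + g + b) // 3

# One threshold per dot of the 3x3 pattern, spread over 0..255: the darker
# the pixel, the more of the nine dots come out black.
THRESHOLDS = ((28, 85, 141),
              (198, 255, 226),
              (170, 113, 56))

def expand_pixel(pix):
    gray = rgb565_to_gray(pix)
    return [[gray < t for t in row] for row in THRESHOLDS]   # True = black dot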

As always, the code is on GitHub.

Have fun!

]]>
Building High-Assurance Software without Breaking the Bank http://blog.adacore.com/formal-methods-webinar-building-high-assurance-software-without-breaking-the-bank Tue, 06 Dec 2016 05:00:00 +0000 AdaCore Admin http://blog.adacore.com/formal-methods-webinar-building-high-assurance-software-without-breaking-the-bank

AdaCore will be hosting a joint webcast next Monday 12th December 2pm ET/11am PT with SPARK experts Yannick Moy and Rod Chapman. Together, they will present the current status of the SPARK solution and explain how it can be successfully adopted in your current software development processes.

Attendees will learn:

  • How to benefit from formal program verification
  • Lessons learned from SPARK projects
  • How to integrate SPARK into existing projects
  • Where to learn about SPARK
  • Why "too hard, too costly, too risky" is a myth

The late computer scientist Edsger Dijkstra once famously said "Program testing can be used to show the presence of bugs, but never to show their absence." This intrinsic drawback has become more acute in recent years, with the need to make software "bullet proof" against increasingly complex requirements and pervasive security attacks. Testing can only go so far. Fortunately, formal program verification offers a practical complement to testing, as it addresses security concerns while keeping the cost of testing at an acceptable level.

Formal verification has a proven track record in industries where reliability is paramount, and among the available technologies, the SPARK toolset features prominently. It has been used successfully for developing high confidence software in industries including Aerospace, Defense, and Air Traffic Management. SPARK tools can address specific requirements for robustness (absence of run-time errors) and functional correctness (contract-based verification) that are relevant in critical systems, including those that are subject to certification requirements.

You can join us for this webinar by registering here.


]]>
Make With Ada Winners Announced! http://blog.adacore.com/make-with-ada-winners-announced Tue, 29 Nov 2016 05:00:00 +0000 AdaCore Admin http://blog.adacore.com/make-with-ada-winners-announced

Judging for the first annual Make with Ada competition has come to an end and we can now reveal the results.

Huge congratulations to Stephane Carrez who came in 1st place this year, gaining a prize of €5000, with his EtherScope project!

The EtherScope monitoring tool analyses Ethernet traffic: it reads network packets (TCP, UDP, IGMP, etc.), performs real-time analysis, and displays the results on a 480x272 touch panel. It runs on an STM32F746 micro-controller from STMicroelectronics, and its interface can filter results at different levels and report various types of information.

All sources for the application are available on GitHub here.


In 2nd place, winning €2000, is German Rivera with his Autonomous Car Framework! The framework was written in Ada 2012 for developing control software for the NXP Cup race car. The system is made up of a ‘toy size’ race car chassis, paired with a FRDM-KL25Z board and a TFC-shield board. The car kit also has two DC motors for the rear wheels, a steering servo for the front wheels and a line scan camera.

In 3rd place, winning €1000, is Shawn Nock with his Bluetooth Beacons project! His “iBeacon” project targeted a Nordic Semiconductor nRF51822 System-on-a-Chip (a Cortex-M0 part with integrated 2.4GHz Radio) with a custom development board.    

The two runners up received a Crazyflie 2.0 nano drone each as special prizes. The Ada Lovelace Special Prize was awarded to Sébastien Bardot for inventiveness with his explorer and mapper robot.

The Robert Dewar Special Prize was awarded to German Rivera for dependability with his IoT Networking Stack for the NXP FRDM-K64F board.

Participants had to design an embedded software project for an ARM Cortex M or R processor, using Ada and/or SPARK as the main programming language and put it into action. 

Well done to all the winners and participants, we received some great projects and got the chance to review some interesting ideas this year.

To keep up to date with future competitions and all things Ada, follow @adaprogrammers on Twitter or visit the official Make with Ada website at www.makewithada.org.

]]>
GNATprove Tips and Tricks: a Lemma for Sorted Arrays http://blog.adacore.com/gnatprove-tips-and-tricks-a-lemma-for-sorted-arrays Mon, 28 Nov 2016 05:00:00 +0000 Sylvain Dailler http://blog.adacore.com/gnatprove-tips-and-tricks-a-lemma-for-sorted-arrays

A few weeks ago, we began working on a new feature that might help users prove the correctness of their programs manipulating sorted arrays. We report here on the creation of the first lemma of a new unit on arrays in the SPARK lemma library.

The objective is to provide users with a way to prove some complex properties on programs (complex for SMT-based automatic provers). The lemmas are written as ghost procedures with no side-effects. They are proved in SPARK and their postcondition can be reused by provers by calling the procedure.

In particular, our first lemma is on the transitivity of the order in arrays. It is used to refine this (simplified) assertion on an array Arr that states that the array is sorted by comparing adjacent elements:

(for all I in Arr'Range =>
  (if I /= Arr'First then
    Arr (I - 1) < Arr (I)))

into this equivalent expression that the array is sorted by comparing any two elements:

(for all I in Arr'Range =>
  (for all J in Arr'Range =>
    (if I < J then Arr (I) < Arr (J))))

This new condition is logically equivalent to the first one, but in practice it can be used by SMT provers when we need to compare two elements of the sorted array that are not adjacent. In some rare cases, provers like CVC4 and Z3 are able to prove that the two conditions are equivalent, but this requires inductive reasoning, which is only very partially supported in such provers. So, in general, SMT provers are just unable to prove it. This is the reason for us providing this new lemma, which we proved once and for all (using the Coq proof assistant).

How to use it:

First, import the spark_lemmas library into your project file.

with "spark_lemmas";

To use this lemma on an array Arr, you first need to instantiate the package SPARK.Unconstrained_Array_Lemmas with your own array index type (Index_Type), array element type (Element_T), ordering function on Element_T (Less), and an unconstrained array type declared as follows:

type A is array (Index_Type range <>) of Element_T;

You can now instantiate the generic package:

package Test is new SPARK.Unconstrained_Array_Lemmas
     (Index_Type => Index_Type,
      Element_T  => Element_T,
      A          => A,
      Less       => "<");

To use it on an array Arr, you only need to convert your array to type A and apply the lemma:

Test.Lemma_Transitive_Order (A (Arr));

Safety conditions:

Note that this lemma relies on a meta-argument of correctness about the types and function used to instantiate it. For usability reasons, we used SPARK_Mode => Off on the library code so that users are not polluted with Coq-proved conditions (they would also have needed to install Coq to get a complete proof). We proved this lemma completely in Coq for instances of the types and a transitive order. And because our proof does not depend on the size and bounds of the types, which are the only semantic differences visible from the lemma implementation, user-instantiated lemmas should also be correct.

So, if you are a user of SPARK and find difficult proofs in your code that would fit in a library like this one, please contact us; we might be able to provide lemmas for your problems in some cases.

]]>
Integrate new tools in GPS (2) http://blog.adacore.com/integrate-new-tools-in-gps-2 Tue, 22 Nov 2016 10:23:00 +0000 Emmanuel Briot http://blog.adacore.com/integrate-new-tools-in-gps-2

Customizing build target switches

In the first post in this series (Integrate new tools in GPS) we saw how to create new build targets in GPS to spawn external tools via a menu or toolbar button, and then display the output of that tool in its own console, as well as show error messages in the Locations view.

The customization left to the user is however limited in this first version. If the build target's Launch Mode was set to Manual With Dialog, a dialog is displayed when the menu or button is selected. This dialog lets users edit the command line just before the target is executed.

The next time that the dialog is open, it will have the modified command line by default.

This is however not very user-friendly, since the user has to remember all the valid switches for the tool. For the standard tools, GPS provides nicer dialogs with one widget for each valid switch, so let's do that for our own tool.

Whenever we edit targets interactively in GPS, via the /Build/Settings/Targets menu, GPS saves the modified build target in a local file, which takes precedence over the predefined targets and those defined in plug-ins. This ensures that your modifications will always apply, but has the drawback that any change in the original target is not visible to the user.

So let's remove the local copy that GPS might have made of our build target. The simplest way, for now, is to remove the file:

  • Windows:  %USER_PROFILE%\.gps\targets.xml
  • Linux and Mac: ~/.gps/targets.xml

Describing the switches

We now have to describe what the valid switches for our tool are.

For a given tool (for instance my_style_checker), we could have multiple build targets:

  • one that is run manually by the user via a toolbar button
  • and another build target with a slightly different command line that is run every time the user saves a file.

To avoid duplication, GPS introduces the notion of a target model. This is very similar to a build target, but includes information that is shared by all related build targets. We will therefore modify the plugin we wrote in the first part to move some information to the target model, and in addition add a whole section describing the valid switches.

import GPS
GPS.parse_xml("""
    <target-model name="my_style_checker_model">
        <description>Basis for all targets that spawn my_style_checker</description>
        <iconname>gps-custom-build-symbolic</iconname>
        <switches command="my_style_checker" columns="2" lines="1">
           <title column="1" line="1">Styles</title>
           <check label="Extended checks"
                  switch="--extended"
                  tip="Enable more advanced checks"
                  column="1" line="1" />
           <check label="Experimental checks"
                  switch="--experimental"
                  column="1" line="1" />
           <combo switch="--style" separator="=" noswitch="gnat">
              <combo-entry label="GNAT" value="gnat" />
              <combo-entry label="Style 1" value="style1" />
              <combo-entry label="Style 2" value="style2" />
           </combo>

           <title column="2" line="1">Processing</title>
           <spin label="Multiprocessing"
                 switch="-j"
                 min="0"  max="100"  default="1"
                 column="2" line="1" />
        </switches>
    </target-model>

    <target model="my_style_checker_model"
            category="File"
            name="My Style Checker">
        <in-toolbar>TRUE</in-toolbar>
        <in-menu>TRUE</in-menu>
        <launch-mode>MANUALLY_WITH_DIALOG</launch-mode>
        <command-line>
           <arg>my_style_checker</arg>
           <arg>%F</arg>
        </command-line>
        <output-parsers>
           output_chopper
           utf_converter
           location_parser
           console_writer
           end_of_build
        </output-parsers>
    </target>
""")

Let's go over the details.

  • on line 3, we create our target model. To avoid confusion, we give it a name different from that of the build target itself. This name will be visible in the /Build/Settings/Targets dialog, when users interactively create new targets.
  • on line 6, we describe the set of switches that we want to make configurable graphically. The switches will be organized into two columns, each of which contains one group of switches (a "line" in the XML, for historical reasons).
  • on line 7, we add a title at the top of the left column (column "1").
  • on line 8 and 12, we add two check boxes. When they are selected, the corresponding switch is added to the command line. On the screenshot below, you can see that "Extended checks" is enabled, and thus the "--extended" switch has been added. These two switches are  also in column "1".
  • on line 15, we add a combo box, which is a way to let users choose one option among many. When no switch is given on the command line, this is the equivalent of "--style=gnat". We need to specify that an extra equal sign needs to be inserted between the switch and its value.
  • on line 22, we add a spin box, as a way to select a numerical value. No switch on the command line is the equivalent of "-j1".
  • on line 29, we alter the definition of our build target by referencing our newly created target model, instead of the previous "execute", and then remove duplicate information which is already available in the model.

At this point, when the user executes the build target via the toolbar button, we see the following dialog that lets users modify the command line more conveniently.

The same switches are visible in the /Build/Settings/Targets dialog.

Conclusion

In this post, we saw how to further integrate our tool in GPS, by letting users set the command line switches interactively.

We have now reached the limits of what is doable via just build targets. Later posts will extend the plugin with python code, to add support for preferences and custom menus. We will also see how to chain multiple actions via workflows (save all files, compile all code, then run the style checker for instance).

]]>
Automatic Generation of Frame Conditions for Array Components http://blog.adacore.com/automatic-generation-of-frame-conditions-for-array-components Mon, 21 Nov 2016 05:00:00 +0000 Claire Dross http://blog.adacore.com/automatic-generation-of-frame-conditions-for-array-components

One of the most important challenges for SPARK users is to come up with adequate contracts and annotations, allowing GNATprove to verify the expected properties in a modular way. Among the annotations mandated by the SPARK toolset, the hardest to come up with are probably loop invariants.

As explained in a previous post, loop invariants are assertions that are used as cut points by the GNATprove tool to verify loop statements. In general, it is necessary to state explicitly in a loop invariant all the information computed in the loop statement; otherwise only the information computed in the last iteration will be available for proof.

A good loop invariant not only needs to describe how parts of variables modified by the loop evolve during the loop execution, but should also explicitly state the preservation of unmodified parts of modified variables. This last part is often forgotten, as it seems obvious to developers.

A previous post explains how GNATprove can automatically infer loop invariants for the preservation of unmodified record components, and does so even if the record is itself nested inside a record or an array. Recently, this generation was improved to also support the simplest cases of partial array updates.

For a loop that updates an array variable (or an array component of a composite variable) by assigning to one (or more) of its indexes, GNATprove can infer preservation of unmodified fields in two cases:

  • If an array is assigned at an index which is constant through the loop iterations, all the other components are preserved.
  • If an array is assigned at the loop index, all the following components (or preceding components if the loop is reversed) are preserved.

This can be demonstrated on an example. Let us consider the following loop which swaps lines 5 and 7 of a square matrix of size 10:

type Matrix is array (Positive range <>, Positive range <>) of Natural;
   M : Matrix (1 .. 10, 1 .. 10) := (5      => (others => 1),
                                     7      => (others => 2),
                                     others => (others => 0));
   Tmp : Natural;

   for C in M'Range (2) loop
      pragma Loop_Invariant (for all I in M'First (2) .. C - 1 =>
                               M (5, I) = M'Loop_Entry (7, I)
                             and M (7, I) = M'Loop_Entry (5, I));
      Tmp := M (5, C);
      M (5, C) := M (7, C);
      M (7, C) := Tmp;
   end loop;

   pragma Assert (for all I in 1 .. 10 =>
                    (for all J in 1 .. 10 =>
                       (if I = 5 then M (I, J) = 2
                        elsif I = 7 then M (I, J) = 1
                        else M (I, J) = 0)));

We only describe in the invariant the effect of the computation on the modified part of M, that is, that it has effectively swapped the two lines up to index C. To discharge the following assertion, we also need to know that elements stored at other lines are preserved by the loop. As the first array index in the assignments is constant, GNATprove infers an invariant stating that components M (I, J) are not modified if I is not 5 or 7:

 pragma Loop_Invariant (for all I in M'Range (1) =>
                              (if I /= 5 and I /= 7 then
                                (for all J in M'Range (2) =>
                                   M (I, J) = M'Loop_Entry (I, J))));

But it is not enough to verify our example. We also need to know that, at line 5 and 7, components stored after column C have not been modified yet, or we won't be able to verify the loop invariant itself. Luckily, as the second index of the assignments is the loop index, GNATprove can also infer this invariant, coming up with the additional:

      pragma Loop_Invariant (for all I in M'Range (1) =>
                              (if I = 5 or I = 7 then
                                (for all J in C .. M'Last (2) =>
                                   M (I, J) = M'Loop_Entry (I, J))));

The above example illustrates a case where GNATprove is able to successfully infer adequate loop invariants, leaving the user with only the most meaningful part to write. However, loop invariant generation for preservation of array components in GNATprove relies on heuristics, which makes it unable to deal with more complex cases. As an example, let us consider a slight variation of the above, which does not iterate directly on the column number, but introduces a shift of 1:

   M : Matrix (1 .. 10, 1 .. 10) := (5      => (others => 1),
                                     7      => (others => 2),
                                     others => (others => 0));
   Tmp : Natural;

   for C in M'First (2) - 1 .. M'Last (2) - 1 loop
      pragma Loop_Invariant (for all I in M'First (2) .. C =>
                               M (5, I) = M'Loop_Entry (7, I)
                             and M (7, I) = M'Loop_Entry (5, I));
      Tmp := M (5, C + 1);
      M (5, C + 1) := M (7, C + 1);
      M (7, C + 1) := Tmp;
   end loop;

   pragma Assert (for all I in 1 .. 10 =>
                    (for all J in 1 .. 10 =>
                       (if I = 5 then M (I, J) = 2
                        elsif I = 7 then M (I, J) = 1
                        else M (I, J) = 0)));

Like in the previous case, GNATprove can infer that lines different from 5 and 7 are preserved. Unfortunately, as the second index in the update is now a variable computation, it cannot come up with the second part of the invariant, and reports a failed attempt at verifying the loop invariant. The counter-example provided by GNATprove informs us that the verification could not succeed at the second line of the invariant (at line 5), for index I = 1, when updating column 2 (C = 1). Clearly, we are missing information about the preservation of components following column C at lines 5 and 7. If we add this invariant manually:

 pragma Loop_Invariant (for all I in C + 1 .. M'Last (2) =>
                                 M (5, I) = M'Loop_Entry (5, I)
                               and M (7, I) = M'Loop_Entry (7, I));

the proof is achieved again.

Another case for which GNATprove's heuristics are not yet precise is relational updates. As an example, consider the following loop, which increments the diagonal of a matrix:

 M : Matrix (1 .. 10, 1 .. 10) := (others => (others => 0));

   for I in M'Range (1) loop
      pragma Loop_Invariant (for all K in M'First (1) .. I - 1 =>
                               M (K, K) = M'Loop_Entry (K, K) + 1);
      M (I, I) := M (I, I) + 1;
   end loop;

   pragma Assert (for all I in 1 .. 10 =>
                    (for all J in 1 .. 10 =>
                       (if I = J then M (I, J) = 1
                        else M (I, J) = 0)));

Here again, we only specify the interesting part of the invariant, namely, that the elements encountered so far on the diagonal of the matrix have been incremented by one. Since we are updating the matrix exactly at the loop index, we could hope that GNATprove would be able to infer the appropriate invariant. Unfortunately, this is not the case, as the tool reports a failed attempt at verifying the assertion following the loop. Note that the loop invariant itself is proved, which means that GNATprove could infer that, at each iteration, the indexes following the current index had not been modified yet. So, what went wrong? The counter-example provided by GNATprove, here again, can be of some help. It indicates that the verification was not completed when I is 8 and J is 9. Unsurprisingly, it means that the tool was not able to infer an invariant stating that elements outside the diagonal but located on columns smaller than the current index were preserved by the loop. Indeed, this would require handling specifically the case where both indexes of an update are the same value, which GNATprove does not do yet. If the following invariant is added manually to the loop, the verification succeeds:

 pragma Loop_Invariant (for all K in M'First (1) .. I - 1 =>
                                 (for all H in M'First (1) .. I - 1 =>
                                      (if K /= H then
                                            M (K, H) = M'Loop_Entry (K, H))));

GNATprove also handles the preservation of array components nested inside record or array objects, as well as some cases of updates through a slice. See the corresponding section in the user guide for more information.
]]>
GNATprove Tips and Tricks: What’s Provable for Real Now? http://blog.adacore.com/gnatprove-tips-and-tricks-whats-provable-for-real-now Thu, 17 Nov 2016 05:00:00 +0000 Yannick Moy http://blog.adacore.com/gnatprove-tips-and-tricks-whats-provable-for-real-now

One year ago, we presented on this blog what was provable about fixed-point and floating-point computations (the two forms of real types in SPARK). I used as examples programs computing approximations of Pi using the Leibniz formula and a refinement of it through the Shanks transformation. GNATprove managed to prove the exact result of the computation in only one case, the simple algorithm implemented with fixed-point values.

Since then, we have integrated static analysis in SPARK. Static analyzer CodePeer can be called as part of SPARK analysis using the switch --codepeer=on. CodePeer analysis is particularly interesting when analyzing code using floating-point computations, as CodePeer is both fast and precise for proving bounds of floating-point operations. For more details about this feature, see the SPARK User's Guide.

During the last year, we have also completely changed the way floating-point numbers are seen by SMT provers. In particular, we are now using the native support for floating-point numbers in the prover Z3. So Z3 can reason precisely about floating-point computations, instead of relying on a partial axiomatization like CVC4 and Alt-Ergo still do. Improved support for floating-point computations is also expected to be available in CVC4 and Alt-Ergo in 2017.

Both of these features lead to dramatic changes in provability for code doing fixed-point and floating-point computations. On the example codes that I presented last year, GNATprove manages to prove all the results of the four computations (with fixed-points or floating-points, with Leibniz or Shanks) when CodePeer is used (without Z3). GNATprove manages to prove three of the four computations when Z3 is used (without CodePeer). The one that is not proved is the computation of the Shanks algorithm implemented using fixed-point values. In fact it is not provable with Z3 because this algorithm involves a division between fixed-point values that needs to be rounded, and the Ada standard specifies that either the fixed-point value above or the fixed-point value below is a suitable rounding. In SPARK, we strive to stay independent of such compiler-specific implementation choices, so we don't specify which value is chosen between the two different results. The reason why CodePeer manages to prove it is because CodePeer follows the choices made by GNAT in this case, as documented in the SPARK User's Guide. In addition, using either CodePeer or Z3 is sufficient to prove all the other run-time checks in the example codes (range checks, overflow checks and divisions by zero checks).

These features were included in the preview release SPARK 17.0, and initial feedback from customers shows that in some cases they do improve the results of proof on code doing fixed-point and floating-point computations. In some other cases, we have witnessed regressions in proof due to the fact that CVC4 and Alt-Ergo now prove fewer checks. We are actively working to revert these regressions, and we think that the new combination of analyses will lead to large improvements for proof on such codebases in the future.

]]>
Integrate new tools in GPS http://blog.adacore.com/integrate-new-tools-in-gps Tue, 15 Nov 2016 15:10:03 +0000 Emmanuel Briot http://blog.adacore.com/integrate-new-tools-in-gps

Integrate external tools in GPS

The GNAT Programming Studio is a very flexible platform. Its goal is to provide a graphical user interface (GUI) on top of existing command line tools. Out of the box, it integrates support for compilers (via makefiles or gprbuild) for various languages (Ada, C, C++, SPARK, Python, ...), verification tools (like CodePeer), coverage analysis for gcov and gnatcoverage, version control systems (git, subversion, ...), models (via QGen) and more.

But it can't possibly integrate all the tools you are using daily.

This post describes how such tools can be integrated in GPS, from basic integration to more advanced Python-based plugins. It is the first in a series that will get you started in writing plug-ins for GPS.

Basic integration: build targets

Every time GPS spawns an external tool, it does this through what is called a Build Target. Build targets can be edited via the /Build/Settings/Targets menu (see also the online documentation).

Build Targets dialog

On the left of the dialog, you will see a list of all existing build targets, organized into categories. Clicking on any of these shows, in the right panel, the command line used to execute that target, including the name of the tool, the switches,...

The top of the right panel describes how this build target integrates in GPS. For instance, the Display Target section tells GPS whether to add a button to the main toolbar, an entry in the /Build menu, or perhaps an entry in some of the contextual menus.

Selecting any of the resulting toolbar buttons or menus will either execute the action directly, no question asked (if the Launch Mode is set to "Manually with no dialog"), or display a dialog to let users modify the switches for that one specific run (if the Launch Mode is set to "Manually with Dialog"). It is even possible to execute that build target every time the user saves a file, which is useful if you want to run some kind of checks.

Creating a new target

To create a new build target, click on the [+] button on the left. This pops up a small dialog asking for the name of the target (as displayed in GPS), a target model (which presets a number of things for the target -- I recommend using "execute -- Run an executable" as a starting point), and finally the category in which the target is displayed.


Create new target

Click on [OK].

In the right panel, you can then edit the command line to run, where the target should be displayed, and how it is run.


Edit target properties

The command line supports a number of macros, like %F, that will be replaced with information from the current context when the target is run. For instance, using "%F" will insert the full path name for the current file. Hover the mouse on the command line field to see all other existing macros.

Press [OK] when you are done modifying all the settings.

(On some versions of GPS, you need to restart GPS before you can execute the build target, or macro expansion will not be done properly)

Target added to toolbar
Target added to menu

Running the target

When you select either the toolbar button or the menu, GPS will open a new console and show the output of your tool. In this example, we have created a simple program that outputs a dummy error on the first line of the file passed in argument.

Running the target

Advanced integration via target models

As you saw in the previous screenshot, the result of running our custom command is that its output is sent to a new window named "Run". But nothing is clickable. In our example, we would like users to be able to click on an error message and have GPS jump to the corresponding source location.

This is not doable just via the GUI, so we'll need to write a small plugin. Let's remove the target we created earlier, by once again opening the /Build/Settings/Targets dialog, selecting our new target, and clicking on [-].

Let's then create a new file

  • Windows: %USER_PROFILE%\.gps\plug-ins\custom_target.py
  • Linux and Mac: ~/.gps/plug-ins/custom_target.py

The initial contents of this file is:

import GPS
GPS.parse_xml("""
    <target model="execute" category="File" name="My Style Checker">
        <in-toolbar>TRUE</in-toolbar>
        <in-menu>TRUE</in-menu>
        <launch-mode>MANUALLY</launch-mode>
        <iconname>gps-custom-build-symbolic</iconname>
        <command-line>
            <arg>my_style_checker</arg>
            <arg>%F</arg>
        </command-line>
        <output-parsers>
           output_chopper
           utf_converter
           location_parser
           console_writer
           end_of_build
        </output-parsers>
   </target>
 """)

This file is very similar to what the GUI dialog we used in the first part generated (the GUI created a file ~/.gps/targets.xml or %USER_PROFILE%\.gps\targets.xml). See the online documentation.

One major difference though is the list of output parsers on line 12. They tell GPS what should be done with whatever the tool outputs. In our case:

  1. we make sure that the output is not split in the middle of lines (via "output_chopper");
  2. we convert the output to UTF-8 (via "utf_converter");
  3. more importantly for us, we then ask GPS, with "location_parser" to detect error messages and create entries in the Locations view for them, so that users can click them;
  4. we also want to see the full output of the tool in its own console ("console_writer");
  5. Finally, we let GPS perform various cleanups with "end_of_build". Do not forget this last one, since it is also responsible for expanding the %F macro.

The full list of predefined output parsers can be found in the online documentation.

If we then restart GPS and execute our build target, we now get two consoles showing the output of the tool, as happens for compilers for instance.

Locations view

Conclusion

With the little plugin we wrote, our tool is now much better integrated in GPS. We now have:

  • menus to spawn the tool, with arguments that depend on the current context
  • the tool's output displayed in its own console
  • error messages that users can click to jump to the proper source locations

In later posts, we will show how to make this more configurable, via custom GPS menus and preferences. We will also explain what workflows are and how they can be used to chain multiple commands.

]]>