The AdaCore Blog http://blog.adacore.com/ An insight into the AdaCore ecosystem en-us Sun, 27 May 2018 11:49:21 +0000 Sun, 27 May 2018 11:49:21 +0000 Taking on a Challenge in SPARK http://blog.adacore.com/taking-on-a-challenge-in-spark Tue, 08 May 2018 14:01:00 +0000 Johannes Kanig http://blog.adacore.com/taking-on-a-challenge-in-spark

Last week, the programmer Hillel posted a challenge (the link points to a partial postmortem of the provided solutions) on Twitter for someone to prove a correct implementation of three small programming problems: Leftpad, Unique, and Fulcrum.

This was a good opportunity to compare the SPARK language and its expressiveness and proof power to other systems and paradigms, so I took on the challenge. The good news is that I was able to prove all three solutions and that the SPARK proofs of each complete in no more than 10 seconds. I also believe the Fulcrum solution in particular shows some aspects of SPARK that are especially nice.

I will now explain my solutions to each problem, briefly for Leftpad and Unique and in detail for Fulcrum. At the end, I discuss my takeaways from this challenge.

Leftpad

Hillel mentioned that the inclusion of Leftpad into the challenge was kind of a joke. A retracted JavaScript package that implemented Leftpad famously broke thousands of projects back in 2016. Part of the irony was that Leftpad is so simple one shouldn’t depend on a package for this functionality.

The specification of Leftpad, according to Hillel, is as follows:

Takes a padding character, a string, and a total length, returns the string padded to that length with that character. If length is less than the length of the string, does nothing.

It is always helpful to start with translating the specification to a SPARK contract. To distinguish between the two cases (padding required or not), we use contract cases, and arrive at this specification (see the full code in this github repository):


   function Left_Pad (S : String; Pad_Char : Character; Len : Natural)
                      return String
   with Contract_Cases =>
     (Len > S'Length =>
            Left_Pad'Result'Length = Len and then
            (for all I in Left_Pad'Result'Range =>
                 Left_Pad'Result (I) =
                        (if I <= Len - S'Length then Pad_Char
                         else S (I - (Len - S'Length + 1) + S'First))),
      others => Left_Pad'Result = S);

In the case where padding is required, the spec also nicely shows how the result string is composed of both padding chars and chars from the input string.

The implementation in SPARK is of course very simple; we can even use an expression function to do it:

   function Left_Pad (S : String; Pad_Char : Character; Len : Natural)
                      return String
   is ((1 .. Len - S'Length => Pad_Char) & S);
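
As a quick illustration (my own examples, not from the challenge code), both contract cases can be exercised like this:

   Padded    : constant String := Left_Pad ("abc", '*', 5);
   --  Padded = "**abc": the first contract case applies and two pad
   --  characters are prepended.
   Unchanged : constant String := Left_Pad ("abc", '*', 2);
   --  Len is smaller than the string length, so the "others" case applies
   --  and the input is returned unchanged: Unchanged = "abc".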

Unique

The problem description, as defined by Hillel:

Takes a sequence of integers, returns the unique elements of that list. There is no requirement on the ordering of the returned values.

An explanation in this blog wouldn’t add anything to the commented code, so I suggest you check out the code for my solution directly.
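
For readers who just want the flavor of the specification, here is a minimal sketch of what a contract for Unique might look like. This is my own illustration, not the author's actual code; the Seq type and the Contains helper are assumptions made for the sketch.

package Unique_Spec is

   type Seq is array (Positive range <>) of Integer;

   --  Helper: does the sequence contain the value V?
   function Contains (S : Seq; V : Integer) return Boolean is
     (for some I in S'Range => S (I) = V);

   --  The result holds exactly the values of S, each at most once;
   --  nothing is said about their order.
   function Unique (S : Seq) return Seq
   with Post =>
     (for all I in Unique'Result'Range =>
        Contains (S, Unique'Result (I)))
     and then
     (for all I in S'Range => Contains (Unique'Result, S (I)))
     and then
     (for all I in Unique'Result'Range =>
        (for all J in Unique'Result'Range =>
           (if I /= J then Unique'Result (I) /= Unique'Result (J))));

end Unique_Spec;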

Fulcrum

The Fulcrum problem was the heart of the challenge. Although the implementation is also just a few lines, quite a lot of specification work is required to get it all proved.

The problem description, as defined by Hillel:

Given a sequence of integers, returns the index i that minimizes |sum(seq[..i]) - sum(seq[i..])|. Does this in O(n) time and O(n) memory.

(Side note: It took me quite a while to notice it, but the above notation seq[..i] apparently means the slice up to and excluding the value at index i. I have taken it instead to mean the slice up to and including the value at index i. Consequently, I used seq[i+1..] for the second slice. This doesn't change the nature or difficulty of the problem.)

I’m pleased with the solution I arrived at, so I will present it in detail in this blog post. It has two features that I think none of the other solutions has:

  • It runs in O(1) space, compared to the O(n) permitted by the problem description and required by the other solutions;

  • It uses bounded integers and proves absence of overflow; all other solutions use unbounded integers to side-step this issue.

Again, our first step is to transform the problem statement into a subprogram specification (please refer to the full solution on github for all the details: there are many comments in the code). For this program, we use arrays to represent sequences of integers, so we first need an array type and a function that can sum all integers in an array. My first try looked like this:

type Seq is array (Integer range <>) of Integer;

function Sum (S : Seq) return Integer is
  (if S'Length = 0 then 0
   else S (S'First) + Sum (S (S'First + 1 .. S'Last)));

So we are just traversing the array via recursive calls, summing up the cells as we go. Assuming an array S of type Seq and an array index I, we could now write the required difference between the left and right sums as follows (assuming S is non-empty, which we assume throughout):

abs (Sum (S (S'First .. I)) - Sum (S (I + 1 .. S'Last)))

However, there are two problems with this code, and both are specific to how SPARK works. The first problem could be seen as a limitation of SPARK: SPARK allows recursive functions such as Sum, but can’t use them to prove code and specifications that refer to them. This would result in a lot of unproved checks in our code. But my colleague Claire showed me a really nice workaround, which I now present.

The idea is to not compute a single sum value, but instead an array of partial sums, where we can later select the partial sum we want. Because such an array can’t be computed in a single expression, we define a new function Sum_Acc as a regular function (not an expression function) with a postcondition:

   function Sum_Acc (S : Seq) return Seq 
   with Post =>
     (Sum_Acc'Result'Length = S'Length and then
      Sum_Acc'Result'First = S'First and then
      Sum_Acc'Result (S'First) = S (S'First) and then
      (for all I in S'First + 1 .. S'Last =>
            Sum_Acc'Result (I) = Sum_Acc'Result (I - 1) + S (I)));

The idea is that each cell of the result array contains the sum of the input array cells, up to and including the corresponding cell in the input array. For an input array (1,2,3), the function would compute the partial sums (1,3,6). The last value is always the sum of the entire array. In the postcondition, we express that the result array has the same length and bounds as the input array, that the first cell of the result array is always equal to the first cell of the input array, and that each following cell is the sum of the previous cell of the result array, plus the input cell at the same index.

The implementation of this Sum_Acc function is straightforward and can be found in the code on github. We also need a Sum_Acc_Rev function which computes the partial sum starting from the end of the array. It is almost the same as Sum_Acc, but the initial (in fact last) value of the array will be zero, owing to the asymmetric definition of the two sums in the initial problem description. You can also find its specification and implementation in the github repository.
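
For reference, here is a rough sketch of what a body for Sum_Acc can look like (my reconstruction, not necessarily the exact code from the repository); the loop invariant simply restates the postcondition for the prefix processed so far, and the potential overflow of the additions is exactly the issue addressed below:

   function Sum_Acc (S : Seq) return Seq is
      Result : Seq (S'Range) := (others => 0);
   begin
      --  S is assumed non-empty, as everywhere in this post.
      Result (S'First) := S (S'First);
      for I in S'First + 1 .. S'Last loop
         Result (I) := Result (I - 1) + S (I);
         --  Restate the postcondition for the cells computed so far.
         pragma Loop_Invariant
           (Result (S'First) = S (S'First) and then
            (for all K in S'First + 1 .. I =>
               Result (K) = Result (K - 1) + S (K)));
      end loop;
      return Result;
   end Sum_Acc;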

For our array S and index I, the expression to compute the difference between left and right sum now becomes:

abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I))

The second problem is that we are using Integer both as the type of the array cells and the sum result, but Integer is a bounded integer type (32 or 64 bits, depending on your platform), so the sum function will overflow! All other Fulcrum solutions referred to in Hillel’s summary post use unbounded integers, so they avoid this issue. But in many contexts, unbounded integers are unacceptable, because they are unpredictable in space and time requirements and require dynamic memory handling. This is why I decided to increase the difficulty of the Fulcrum problem a little and include proof of absence of overflow.

To solve the overflow problem, we need to bound the size of the values to be summed and how many of them we can sum. We also need to take negative values into account, so that we don’t exceed the minimum value either. Luckily, range types in SPARK make this simple:

   subtype Int is Integer range -1000 .. 1000;
   subtype Nat is Integer range 1 .. 1000;

   type Seq is array (Nat range <>) of Int;
   type Partial_Sums is array (Nat range <>) of Integer;

We will use Int as the type for the contents of arrays, and Nat for the array indices (effectively limiting the size of arrays to at most 1000). The sums will still be calculated in Integer, so we need to define a new array type to hold the partial sums (we need to change the return type of the partial sums functions to Partial_Sums to make this work). So if we were to sum the largest possible array with 1000 cells, containing only the highest (or lowest) value 1000 (or -1000), the absolute value of the sum would not exceed 1 million, which easily fits into 32 bits. Of course, we could also choose different, and much larger, bounds here.

We now can formulate the specification of Find_Fulcrum as follows:

  function Find_Fulcrum (S : Seq) return Nat
  with Pre => S'Length > 0,
       Post =>
         (Find_Fulcrum'Result in S'Range and then
         (for all I in S'Range =>
               abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I)) >=
               abs (Sum_Acc (S) (Find_Fulcrum'Result) - Sum_Acc_Rev (S) (Find_Fulcrum'Result))));

The implementation of Fulcrum

The Sum_Acc and Sum_Acc_Rev functions we already defined suggest a simple solution to Fulcrum that goes as follows:

  1. call Sum_Acc and Sum_Acc_Rev and store the result;

  2. compute the index where the difference between the two arrays is the smallest.

The problem with this solution is that step (1) takes O(n) space but we promised to deliver a constant space solution! So we need to do something else.

In fact, we notice that every program that calls Sum_Acc and Sum_Acc_Rev will already be O(n) in space, so we should never call these functions outside of specifications. The Ghost feature of SPARK lets the compiler check this for us. Types, objects and functions can be marked ghost, and such ghost entities can only be used in specifications (like the postcondition above, or loop invariants and other intermediate assertions), but not in code. This makes sure that we don’t call these functions accidentally, slowing down our code. Marking a function ghost can be done just by adding with Ghost to its declaration.
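
For example, the Sum_Acc declaration from above, now returning Partial_Sums and marked as ghost, could look like this (a sketch that simply restates the earlier postcondition):

   function Sum_Acc (S : Seq) return Partial_Sums
   with Ghost,  --  only usable in contracts, invariants and assertions
        Post =>
          (Sum_Acc'Result'Length = S'Length and then
           Sum_Acc'Result'First = S'First and then
           Sum_Acc'Result (S'First) = S (S'First) and then
           (for all I in S'First + 1 .. S'Last =>
              Sum_Acc'Result (I) = Sum_Acc'Result (I - 1) + S (I)));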

The constant space implementation idea is that if, for some index J - 1, we have the partial sums Left_Sum = Sum_Acc (S) (J - 1) and Right_Sum = Sum_Acc_Rev (S) (J - 1), we can compute the values for the next index J by simply adding S (J) to Left_Sum and subtracting it from Right_Sum. This simple idea gives us the core of the implementation:

for I in S'First + 1 .. S'Last loop
    Left_Sum := Left_Sum + S (I);
    Right_Sum := Right_Sum - S (I);
    if abs (Left_Sum - Right_Sum) < Min then
       Min := abs (Left_Sum - Right_Sum);
       Index := I;
    end if;
end loop;
return Index;

To understand this code, we also need to know that Min holds the current minimal difference between the two sums and Index gives the array index where this minimal difference occurred.

In fact, the previous explanations of the code can be expressed quite nicely using this loop invariant (it also holds at the beginning of the loop):

pragma Loop_Invariant
      (Left_Sum = Sum_Acc (S) (I - 1) and then
       Right_Sum = Sum_Acc_Rev (S) (I - 1) and then
       Min = abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index)) and then
       (for all K in S'First .. I - 1 =>
               abs (Sum_Acc (S) (K) - Sum_Acc_Rev (S) (K)) >=
               abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index))));

The only part that's missing is the initial setup, so that the above conditions hold for I = S'First. For the Right_Sum variable, this requires a traversal of the entire array to compute the initial right sum. For this we have written a helper function Sum, which is O(n) in time and O(1) in space. So we end up with this initialization code for Find_Fulcrum:

Index     : Nat     := S'First;
Left_Sum  : Integer := S (S'First);
Right_Sum : Integer := Sum (S);
Min       : Integer := abs (Left_Sum - Right_Sum);

and it can be seen that these initial values establish the loop invariants for the first iteration of the loop (where I = S'First + 1, so I - 1 = S'First).
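
For completeness, the Sum helper itself is just a plain loop over the array: O(n) in time and O(1) in space. Here is a sketch (in the actual code it also carries a contract relating its result to the ghost partial sums, plus a loop invariant bounding the intermediate result so that absence of overflow can be proved):

   function Sum (S : Seq) return Integer is
      Result : Integer := 0;
   begin
      for I in S'Range loop
         --  In the real code, a loop invariant bounding Result goes here
         --  so that the overflow proof succeeds.
         Result := Result + S (I);
      end loop;
      return Result;
   end Sum;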

Some metrics

It took me roughly 15 minutes to come up with the Leftpad proof. I don’t have exact numbers for the two other problems, but I would guess roughly one hour for Unique and 2-3 hours for Fulcrum.

Fulcrum is about 110 lines of code, including specifications, but excluding comments and blank lines. An implementation without any contracts would be about 35 lines, so we have a threefold overhead of specification over code, though that’s typical for specifications of small algorithmic problems like Fulcrum.

All proofs are done automatically, and SPARK verifies each example in less than 10 seconds.

Some thoughts about the challenge

First of all, many thanks to Hillel for starting that challenge and to Adrian Rueegsegger for bringing it to my attention. It was fun to do, and I believe SPARK did reasonably well in this challenge.

Hillel’s motivation was to counter exaggerated praise of functional programming (FP) over imperative programming (IP). So he proved these three examples in imperative style and challenged the functional programming community to do the same. In this context, doing the problems in SPARK was probably beside the point, because it’s not a functional language.

Going beyond the FP vs IP question asked in the challenge, I think we can learn a lot by looking at the solutions and the discussion around the challenge. This blog post by Neelakantan Krishnaswami argues that the real issue is the combination of language features, in particular aliasing in combination with side effects. If you have both, verification is hard.

One approach is to accept this situation and deal with it. In Frama-C, a verification tool for C, it is common to write contracts that separate memory regions. Some tools are based on separation logic, which directly features separating conjunction in the specification language. Both result in quite complex specifications, in my opinion.

Or you can change the initial conditions and remove language features. Functional programming removes side effects to make aliasing harmless. This has all kinds of consequences, for example the need for a garbage collector and the inability to use imperative algorithms from the literature. SPARK keeps side effects but removes aliasing, essentially by excluding pointers and through a few other language rules. SPARK makes up for the loss of pointers by providing built-in arrays and parameter modes (two very common usages of pointers in C), but shared data structures still remain impossible to write in pure SPARK: one needs to leave the SPARK subset and go to full Ada for that. Rust keeps pointers, but only non-aliasing ones via its borrow checker; however, formal verification for Rust is still in very early stages as far as I know. The SPARK team is also working on support for non-aliasing pointers.

Beyond language features, another important question is the applicability of a language to a domain. Formal verification of code matters most where a bug would have a big impact. Such code can be found in embedded devices, for example safety-critical code that makes sure the device is safe to use (think of airplanes, cars, or medical devices). Embedded programming, in particular safety-critical embedded programming, is special because different constraints apply. For example, execution times and memory usage of the program must be very predictable, which excludes languages that have a managed approach to memory (memory usage becomes less predictable) with a GC (often, the GC can kick in at any time, which makes execution times unpredictable). In these domains, functional languages can’t really be applied directly (but see the use of Haskell in seL4). Unbounded integers can’t be used either because of the same issue - memory usage can grow if some computation yields a very large result, and execution time can vary as well. These issues were the main motivation for me to provide an O(1) space solution to the Fulcrum problem that uses bounded integers, and to prove absence of overflow.

Programming languages aren’t everything either. Another important issue is tooling, in particular proof automation. Looking at the functional solutions of Fulcrum (linked from Hillel’s blog post), they contain a lot of manual proofs. The Agda solution is very small despite this fact, though it uses a simple quadratic algorithm; I would love to see a variant that’s linear.

I believe that for formal verification to be accepted in industrial projects, most proofs must complete automatically, though some manual effort is acceptable for a small percentage of the proof obligations. The Dafny and SPARK solutions are the only ones (as far as I could see) that fare well in this regard. Dafny is well-known for its excellent proof automation via Boogie and Z3. SPARK also does very well here, all proofs being fully automatic.

PolyORB now lives on Github http://blog.adacore.com/polyorb-now-lives-on-github Wed, 18 Apr 2018 15:52:00 +0000 Thomas Quinot http://blog.adacore.com/polyorb-now-lives-on-github

PolyORB, AdaCore's versatile distribution middleware, now lives on Github. Its new home is https://github.com/AdaCore/polyorb

PolyORB is a development toolsuite and a library of runtime components that implements several distribution models, including CORBA and the Ada 95 Distributed Systems Annex. Originally developed as part of academic research at Telecom ParisTech, it became part of the GNAT Pro family in 2003.

Since then, it has been used in a number of industrial applications in a wide variety of domains such as:
 * air traffic flow management
 * enterprise document management
 * scientific data processing in particle physics experiments

AdaCore has always been committed to involving the user community in the development of PolyORB. Over the past 15 years, many contributions from industrial as well as hobbyist users have been integrated, and community releases were previously made available in conjunction with GNAT GPL.

Today we are pleased to further this community engagement and renew our commitment to an open development process by making the PolyORB repository (including full history) available on Github. This will allow users of GNAT GPL to benefit from the latest developments and contribute fixes and improvements.

We look forward to seeing your issues and pull requests on this repository!

SPARKZumo Part 2: Integrating the Arduino Build Environment Into GPS http://blog.adacore.com/sparkzumo-part-2-integrating-the-arduino-build-environment-into-gps Wed, 04 Apr 2018 04:00:00 +0000 Rob Tice http://blog.adacore.com/sparkzumo-part-2-integrating-the-arduino-build-environment-into-gps

This is part #2 of the SPARKZumo project, where we go through how to actually integrate a CCG application with other source code and how to create GPS plugins to customize features like automating builds and flashing hardware. To read more about the software design of the project, visit the other blog post here.

The Build Process

At the beginning of our build process we have a few different types of source files that we need to bring together into one binary: Ada/SPARK, C++, C, and an Arduino sketch. During a typical Arduino build, the build system converts the Arduino sketch into valid C++ code, brings in any libraries (user and system) that are included in the sketch, synthesizes a main, compiles and links all of that together with the Arduino runtime and selected BSP, and generates the resulting executable binary. The only step we are adding to this process is that we need to run CCG on our SPARK code to generate a C library that we can pass to the Arduino build as a valid Arduino library. The Arduino sketch then pulls the resulting library into the build via an include.

Build Steps

From the user’s perspective, the steps necessary to build this application are as follows:

  1. Run CCG on the SPARK/Ada code to produce C files and Ada Library Information files, or ali files. For more information on these files, see the GNAT Compilation Model documentation.
  2. Copy the resulting C files into a directory structure valid for an Arduino library.
     • We will use the lib directory in the main repo to house the generated Arduino library.
  3. Run c-gnatls on the ali files to determine which runtime files our application depends on.
  4. Copy those runtime files into the Arduino library structure.
  5. Make sure our Arduino sketch has included the header files generated by the CCG tool.
  6. Run the arduino-builder tool with the appropriate options to tell the tool where our library lives and which board we are compiling for.
     • The arduino-builder tool will use the .build directory in the repo to stage the build.
  7. Then we can flash the result of the compilation to our target board.

That seems like a lot of work to do every time we need to make a change to our software! 

Since these steps are the same every time, we can automate them. Because we want this to be as host-agnostic as possible, meaning we would like it to be usable on both Windows and Linux, we should use a scripting language that is fairly host-agnostic. It would also be nice if we could integrate this workflow into GPS so that we can develop our code, prove our code, and build and flash our code without leaving our IDE. It is an Integrated Development Environment after all.

Configuration Files

The arduino-builder program is the command line version of the Arduino IDE. When you build an application with the Arduino IDE it creates a build.options.json file with the options you select from the IDE. These options include the location of any user libraries, the hardware to build for, where the toolchain lives, and where the sketch lives. We can pass the same options to the arduino-builder program or we can pass it the location of a build.options.json file. 

For this application I put a build.options.json file in the conf directory of the repository. This file should be configured properly for your build system. The best way, I have found, to get this file configured properly is to install the Arduino IDE and build one of the example applications. Then find the build.options.json file generated by the IDE and copy it into the conf directory of the repository. You then only need to modify:

  1. The “otherLibrariesFolders” to point to the absolute path of the lib folder in the repo.
  2. The “sketchLocation” to point at the SPARKZumo.ino file in the repo.

The other conf files in the conf directory are there to configure the flash utilities. When flashing the AVR on the Arduino Uno, the avrdude flash utility is used. The flash command is configured using the information from the flash.yaml file and the path of the avrdude.conf file; avrdude uses the latter to learn about the target hardware. The HiFive board uses openocd as its flashing utility. The openocd.cfg file has all the necessary configuration information that is passed to the openocd tool for flashing.

The GPS Plugin

[DISCLAIMER: This guide assumes you are using version 18.1 or newer of GPS]

Under the hood, GPS, or the GNAT Programming Studio, is built from a combination of Ada, graphical frameworks, and Python scripting utilities. Using the Python plugin interface, it is very easy to add functionality to our GPS environment. For this application we will add some buttons and menu items to automate the process mentioned above. We will only be using a small subset of the power of the Python interface. For a complete guide to what is possible, you can visit the Customizing and Extending GPS and Scripting API Reference for GPS sections of the GPS User’s Guide.

Plugin Installation Locations

Depending on your use case, you can add Python plugins in a few locations to bring them into your GPS environment. There are already a handful of plugins that come with the GPS installation. You can find the list of these plugins by going to Edit->Preferences and navigating to the Plugin tab (near the bottom of the preferences window on the left sidebar). Because these plugins are included with the installation, they live under the installation directory in <installation directory>/share/gps/plug-ins. If you would like to modify your installation, you can add your plugins here and reload GPS. They will then show up in the plugin list. However, if you reinstall GPS, it will overwrite your plugin!

There is a better place to put your plugins such that they won’t disappear when you update your GPS installation. GPS adds a folder to your Home directory which includes all your user defined settings for GPS, such as your color theme, font settings, pretty printer settings, etc. This folder, by default, lives in <user’s home directory>/.gps. If you navigate to this folder you will see a plug-ins folder where you can add your custom plugins. When you update your GPS installation, this folder persists.

Depending on your application, there may be an even better place to put your plugin. For this specific application we really only want this added functionality when we have the SPARKzumo project loaded. So ideally, we want the plugin to live in the same folder as the project, and to load only when we load the project. To get this functionality, we can name our plugin <project file name>.ide.py and put it in the same directory as our project. When GPS loads the project, it will also load the plugin. For example, our project file is named zumo.gpr, so our plugin should be called zumo.ide.py. The source for the zumo.ide.py file is located here.

The Plugin Skeleton

When GPS loads our plugin it will call the initialize_project_plugin function. We should implement something like this to create our first button:

import GPS
import gps_utils
class ArduinoWorkflow:
   def __somefunction(self):
       pass  # do stuff here
   def __init__(self):
       gps_utils.make_interactive(
               callback=self.__somefunction,
               category="Build",
               name="Example",
               toolbar='main',
               menu='/Build/Arduino/' + "Example",
               description="Example")
def initialize_project_plugin():
   ArduinoWorkflow()

This simple class will create a button and a menu item with the text Example. When we click this button or menu item, it calls back to our __somefunction method. Our actual plugin creates a few buttons and menu items that look like this:

Buttons in GPS created by user plug-in
Menus in GPS created by user plug-in

Task Workflows

Now that we have the ability to run some scripts by clicking buttons, we are all set! But there’s a problem: when we execute a script from a button, and the script takes some time to perform its actions, GPS hangs waiting for the script to complete. We really should be executing our script asynchronously so that we can still use GPS while we are waiting for the tasks to complete. Python has a nice feature called coroutines which can allow us to run some tasks asynchronously. We can be super fancy and implement these coroutines using generators!

Or…

ProcessWrapper

GPS has already done this for us with the task_workflow interface. The task_workflow call wraps our function in a generator and will asynchronously execute parts of our script. We can modify our somefunction function now to look like this:

def __somefunction(self, task):
       task.set_progress(0, 1)
       try:
           proc = promises.ProcessWrapper(["script", "arg1", "arg2"], spawn_console="")
       except:
           self.__error_exit("Could not launch script.")
           return
       ret, output = yield proc.wait_until_terminate()
       if ret != 0:
           self.__error_exit("Script returned an error.")
           return
       task.set_progress(1, 1)

In this function we are going to execute a script called script and pass 2 arguments to it. We wrap the call to the script in a ProcessWrapper which returns a promise. We then yield on the result. The process will run asynchronously, and the main thread will transfer control back to the main process. When the script is complete, the yield returns the stdout and exit code of the process. We can even feed some information back to the user about the progress of the background processes using the task.set_progress call. This registers the task in the task window in GPS. If we have many tasks to run, we can update the task window after each task to tell the user if we are done yet.

TargetWrapper

The ProcessWrapper interface is nice if we need to run an external script, but what if we want to trigger the build or one of the GNAT tools?

Triggering CCG

Just for that, there’s another interface: TargetWrapper. To trigger the build tools, we can run something like this:

builder = promises.TargetWrapper("Build All")
retval = yield builder.wait_on_execute()
if retval != 0:
     self.__error_exit("Failed to build all.")
     return

With this code, we are triggering the same action as the Build All button or menu item. 

Triggering GNATdoc

We can also trigger the other tools within the GNAT suite using the same technique. For example, we can run the GNATdoc tool against our project to generate the project documentation:

gnatdoc = promises.TargetWrapper("gnatdoc")
retval = yield gnatdoc.wait_on_execute(extra_args=["-P", GPS.Project.root().file().path, "-l"])
if retval != 0:
    self.__error_exit("Failed to generate project documentation.")
    return

Here we are calling gnatdoc with the arguments listed in extra_args. This command will generate the project documentation and put it in the directory specified by the Documentation_Dir attribute of the Documentation package in the project file. In this case, I am putting the docs in the docs folder of the repo so that my GitHub repo can serve them via a GitHub Pages website.

Accessing Project Configuration

The file that drives the GNAT tools is the GNAT Project file, or the gpr file. This file has all the information necessary for GPS and CCG to process the source files and build the application. We can access all of this information from the plugin as well to inform where to find the source files, where to find the object files, and what build configuration we are using. For example, to access the list of source files for the project we can use the following Python command: GPS.Project.root().sources().

Another important piece of information that we would like to get from the project file is the current value assigned to the “board” scenario variable. This will tell us if we are building for the Arduino target or the HiFive target. This variable will change the build configuration that we pass to arduino-builder and which flash utility we call. We can access this information by using the following command: GPS.Project.root().scenario_variables(). This will return a dictionary of all scenario variables used in the project. We can then access the “board” scenario variable using the typical Python dictionary syntax GPS.Project.root().scenario_variables()['board'].

Determining Runtime Dependencies

Because we are using the Arduino build system to build the output of our CCG tool, we will need to include the runtime dependency files used by our CCG application in the Arduino library directory. To detect which runtime files we are using, we can run the c-gnatls command against the ali files generated by the CCG tool. This will output a set of information that we can parse. The output of c-gnatls on one file looks something like this:

$ c-gnatls -d -a -s obj/geo_filter.ali 
geo_filter.ads
geo_filter.adb
<CCG install directory>/libexec/gnat_ccg/lib/gcc/x86_64-pc-linux-gnu/7.3.1/adainclude/interfac.ads
<CCG install directory>/libexec/gnat_ccg/lib/gcc/x86_64-pc-linux-gnu/7.3.1/adainclude/i-c.ads
line_finder_types.ads
<CCG install directory>/libexec/gnat_ccg/lib/gcc/x86_64-pc-linux-gnu/7.3.1/adainclude/system.ads
types.ads

When we parse this output, we have to make sure we run c-gnatls against all the ali files generated by CCG, strip out any files listed that are already part of our sources, and remove any duplicate dependencies. The c-gnatls tool also lists the Ada versions of the runtime files, not the C versions, so we need to determine the C equivalents and then copy them into our Arduino library folder. The __get_runtime_deps function is responsible for all of this work.

Generating Lookup Tables

If you had a chance to look at the first blog post in this series, you saw that I talked a bit about code in this application that was used to do some filtering of discrete states using a graph filter. This involved mapping some states onto some physical geometry and sectioning off areas that belonged to different states. The outcome of this was to map each point in a 2D graph to some state using a lookup table.

To generate this lookup table I used a Python library called shapely to compute the necessary geometry and map points to states. Originally, I had this as a separate utility sitting in the utils folder in the repo and would copy the output of this program into the geo_filter.ads file by hand. Eventually, I was able to bring this utility into the plugin workflow using a few interesting features of GPS.

GPS includes pip

Even though GPS has its own embedded Python environment, you can still bring in outside packages using the pip interface. The syntax for installing an external dependency looks something like this:

import pip
ret = pip.main(["install"] + dependency)

Where dependency is a list of the package names you want to install. In the case of this plugin, I only need the shapely library and install it when the GPS plugin is initialized.

Accessing Ada Entities via Libadalang

The Libadalang library is now included with GPS and can be used inside your plugin. Using the libadalang interface I was able to access the value of user defined named numbers in the Ada files. This was then passed to the shapely application to compute the necessary geometry.

import libadalang as lal

ctx = lal.AnalysisContext()
unit = ctx.get_from_file(file_to_edit)
myVarNode = unit.root.findall(lambda n: n.is_a(lal.NumberDecl) and n.f_ids.text=='my_var')
value = int(myVarNode[0].f_expr.text)

This snippet creates a new Libadalang analysis context, loads the information from a file and searches for a named number declaration called ‘my_var’. The value assigned to ‘my_var’ is then stored in our variable value.

I was then able to access the location where I wanted to put the output of the shapely application using Libadalang:

array_node = unit.root.findall(lambda n: n.is_a(lal.ObjectDecl) and n.f_ids.text=='my_array')
agg_start_line = int(array_node[0].f_default_expr.sloc_range.start.line)
agg_start_col = int(array_node[0].f_default_expr.sloc_range.start.column)
agg_end_line = int(array_node[0].f_default_expr.sloc_range.end.line)
agg_end_col = int(array_node[0].f_default_expr.sloc_range.end.column)

This gave me the line and column numbers of the start and end of the array aggregate initializer for the lookup table ‘my_array’.

Editing Files in GPS from the Plugin

Now that we have the computed lookup table, we could use the typical python file open mechanism to edit the file at the location obtained from Libadalang. But since we are already in GPS, we could just use the GPS.EditorBuffer interface to edit the file. Using the information from our shapely application and the line and column information obtained from Libadalang we can do this:

buf = GPS.EditorBuffer.get(GPS.File(file_to_edit))
agg_start_cursor = buf.at(agg_start_line, agg_start_col)
agg_end_cursor = buf.at(agg_end_line, agg_end_col)
buf.delete(agg_start_cursor, agg_end_cursor)
array_str = "(%s));" % ("(%s" % ("),\n(".join([', '.join([item for item in row]) for row in array])))
buf.insert(agg_start_cursor, array_str[agg_start_col - 1:])

First we open a buffer to the file that we want to edit. Then we create cursors for the beginning and end of the current array aggregate, using the positions that we obtained from Libadalang, and remove the old information from the buffer. We then turn the array we received from our shapely application into a string and insert that into the buffer.

We have just successfully generated some Ada code from our GPS plugin!

Writing Your Own Python Plugin

Most likely, a plugin already exists in the GPS distribution that does something similar to what you want to do. For this plugin, I used the source for the plugin that enables flashing and debugging of bare-metal STM32 ARM boards. This file can be found in your GPS installation at <install directory>/share/gps/support/ui/board_support.py. You can also see this file on the GPS GitHub repository here.

In most cases, it makes sense to search through the plugins that already exist to get a starting point for your specific application, and then fill in the blanks from there. You can view the entire source of GPS on AdaCore's GitHub repository.

That wraps up the overview of the build system for this application. The source for the project can be found here. Feel free to fork this project and create new and interesting things.

Happy Hacking!

A Modern Syntax for Ada http://blog.adacore.com/a-modern-syntax-for-ada Sun, 01 Apr 2018 14:00:00 +0000 Fabien Chouteau http://blog.adacore.com/a-modern-syntax-for-ada

One of the most criticized aspects of the Ada language throughout the years has been its outdated syntax. Fortunately, AdaCore decided to tackle this issue by implementing a new, modern syntax for Ada.

The major change is the use of curly braces instead of begin/end. Also the following keywords have been shortened:

  • return becomes ret
  • function becomes fn
  • is becomes :
  • with becomes include

For instance, the following function:

with Ada.Numerics;

function Fools (X : Float) return Float is
begin
   return X * Ada.Numerics.Pi;
end;

is now written:

include Ada.Numerics;

fn Fools (X : Float) ret Float :
{
   ret X * Ada.Numerics.Pi;
};

This modern syntax is a major milestone in the adoption of Ada. John Dorab recently discovered the qualities of Ada:

I have an eye condition that prevents me from reading code without curly braces. Thanks to this new syntax, I can now benefit from the advanced type system, programming by contract, portability, functional safety, [insert more cool Ada features here] of Ada. Also, it looks like most other programming languages, so it must be better.

This new syntax is also a boost in productivity for Ada developers. Mr Fisher testifies:

I write around 10 lines of code per day. With this new syntax I save up to 30 keystrokes. That’s a huge increase in my productivity! The code is less readable for debugging, code reviews and maintenance in general, but I write a little bit more of it.

The standardization effort related to this new syntax is expected to start in the coming year and to last a few years. In order to allow early adopters to get their hands on Ada without having to wait for the next standard, we have created a font that allows you to display Ada code with the new syntax:

The font contains other useful ligatures, like displaying the Ada assignment operator “:=” as “=”, and the Ada equality operator “=” as “==”. It was created based on Courier New, using Glyphr Studio and cloudconvert. It’s work in progress, so feel free to extend it! It is attached below.

In the future, we also plan to go beyond a pure syntactic layer with Libadalang. For example, we could emulate the complex type promotions/conversions rules of C by inserting Unchecked_Conversion calls each time the programmer tries to convert a value to an incompatible type. If you have other ideas, please let us know in the comments below!


Getting Rid of Rust with Ada http://blog.adacore.com/getting-rid-of-rust-with-ada Sun, 01 Apr 2018 08:00:00 +0000 Fabien Chouteau http://blog.adacore.com/getting-rid-of-rust-with-ada

There are a lot of DIY CNC projects out there (router, laser, 3D printer, egg drawing, etc.), but I have never seen a DIY CNC sandblaster. So I decided to make my own.

Hardware

The CNC frame is one of those cheap kits that you can get on eBay, for instance. Mine was around 200 euros, and it is actually a good value for the price. I built the kit and then replaced the electronic controller with an STM32F469 discovery board and an Arduino CNC shield.

For the sandblaster itself, my father and I hacked together this simple solution, made from a soda bottle and pipes/fittings that you can find in any hardware store.

The sand falls from the tank into a small tube, mostly thanks to gravity. The sand tank still needs to be pressurised to avoid air coming up from the nozzle.

The sandblaster was then mounted to the CNC frame where the engraving spindle is supposed to be, and the sand tank is somewhat fixed above the machine. As you can briefly see in the video, I’m controlling the airflow manually, as I didn’t have a solenoid valve to make the machine fully autonomous.

Software

On the software side I re-used my Ada Gcode controller from a previous project. I still wanted to add something to it, so this time I used a board with a touch screen to create a simple control interface.

Conclusion

This machine is actually not very practical. The 1.5 litre soda bottle holds barely enough sand to write 3 letters and the dust going everywhere will jam the machine after a few minutes of use. But this was a fun project nonetheless!

PS: Thank you dad for letting me use your workshop once again ;)

SPARKZumo Part 1: Ada and SPARK on Any Platform http://blog.adacore.com/sparkzumo-part-1-ada-and-spark-on-any-platform Wed, 28 Mar 2018 04:00:00 +0000 Rob Tice http://blog.adacore.com/sparkzumo-part-1-ada-and-spark-on-any-platform
Pololu robot with Arduino Uno Rev 3 mounted and SiFive HiFive1

So you want to use SPARK for your next microcontroller project? Great choice! All you need is an Ada 2012 ready compiler and the SPARK tools. But what happens when an Ada 2012 compiler isn’t available for your architecture?

This was the case when I started working on a mini sumo robot based on the Pololu Zumo v1.2.

The chassis is complete with independent left and right motors with silicone tracks, and a suite of sensors including an array of infrared reflectance sensors, a buzzer, a 3-axis accelerometer, magnetometer, and gyroscope. The robot’s control interface uses a pin-out and footprint compatible with Arduino Uno-like microcontrollers. This is super convenient, because I can use any Arduino Uno compatible board, plug it into the robot, and be ready to go. But the Arduino Uno is an AVR, and there isn’t a readily available Ada 2012 compiler for AVR… back to the drawing board…

Or…

What if we could still write SPARK code and compile it into C code, then use the Arduino compiler to compile and link that code with the Arduino BSPs and runtimes? This would be ideal because I wouldn’t need to worry about writing a BSP for the board I am using, and I would only have to focus on the application layer. And I can use SPARK! Luckily, AdaCore has a solution for exactly this!

CCG to the rescue!

The Common Code Generator, or CCG, was developed to solve the issue where an Ada compiler is not available for a specific architecture, but a C compiler is readily available. This is the case for architectures like AVR, PIC, Renesas, and specialized DSPs from companies like TI and Analog Devices. CCG can take your Ada or SPARK code and “compile” it to a format that the manufacturer-supplied C compiler can understand. With this technology, we now have all of the benefits of Ada or SPARK on any architecture.

Note that this is not fundamentally different from what’s already happening in a compiler today. Compilation is essentially a series of translations from one language to another, each one being used for a specific optimization or analysis phase. In the case of GNAT, for example, the process is as follows:

  1. The Ada code is first translated into a simplified version of Ada (called the expanded tree). 

  2. Then into the gcc tree format which is common to all gcc-supported languages.

  3. Then into a format ideal for computing optimizations called gimple. 

  4. Then into a generic assembly language called RTL. 

  5. And finally to the actual target assembler.

With CCG, C becomes one of these intermediate languages, with GNAT taking care of the initial compilation steps and a target compiler taking care of the final ones. One important consequence of this is that the C code is not intended to be maintained or modified. CCG is not a translator from Ada or SPARK to C; it’s a compiler, or maybe half a compiler.

Ada Compilation Steps

There are some limitations to this, though, that are important to know; today they are mostly due to the fact that the technology is very young and targets a subset of Ada. Looking at the limitations more closely, they resemble the limitations imposed by the SPARK language subset on a zero-footprint runtime. I would generally use the zero-footprint runtime in an environment where the BSP and runtime were supplied by a vendor or an RTOS, so this looks like a perfect time to use CCG to develop SPARK code for an Arduino-supported board using the Arduino BSP and runtime support. For a complete list of supported and unsupported constructs you can visit the CCG User’s Guide.

Another benefit I get out of this setup is that I am using the Arduino framework as a hardware abstraction layer. Because I am generating C code and pulling in Arduino library calls, theoretically, I can build my application for many processors without changing my application code. As long as the board is supported by Arduino and is pin compatible with my hardware, my application will run on it!

Abstracting the Hardware

Left to Right: SiFive HiFive1 RISC V board, Arduino Uno Rev 3

For this application I looked at targeting two different architectures: the Arduino Uno Rev 3, which has an ATmega328p on board, and a SiFive HiFive1, which has a Freedom E310 on board. These were chosen because they are pin compatible but massively different from the software perspective. The ATmega328p is an 8-bit AVR and the Freedom E310 is a 32-bit RISC-V. The system word size isn’t even the same! The source code for the project is located here.

In order to abstract the hardware differences away, two steps had to be taken:

  1. I used a target configuration file to tell the CCG tool how to represent data sizes during the code generation. By default, CCG assumes word sizes based on the default for the host OS. To compile for the 8-bit AVR, I used the target.atp file located in the base directory to inform the tool about the layout of the hardware. The configuration file looks like this:
    Bits_BE                       0
    Bits_Per_Unit                 8
    Bits_Per_Word                16
    Bytes_BE                      0
    Char_Size                     8
    Double_Float_Alignment        0
    Double_Scalar_Alignment       0
    Double_Size                  32
    Float_Size                   32
    Float_Words_BE                0
    Int_Size                     16
    Long_Double_Size             32
    Long_Long_Size               64
    Long_Size                    32
    Maximum_Alignment            16
    Max_Unaligned_Field          64
    Pointer_Size                 32
    Short_Enums                   0
    Short_Size                   16
    Strict_Alignment              0
    System_Allocator_Alignment   16
    Wchar_T_Size                 16
    Words_BE                      0
    float         15  I  32  32
    double        15  I  32  32
  2. The bsp folder contains all of the differences between the two boards that were necessary to separate out. This is also where the Arduino runtime calls were pulled into the Ada code. For example, in bsp/wire.ads you can see many pragma Import calls used to bring in the Arduino I2C calls located in wire.h (see the sketch below).
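
The sketch below illustrates the mechanism only; the names are made up and the real declarations live in bsp/wire.ads:

   --  Illustrative only: bind an Ada subprogram to a C function coming from
   --  the Arduino runtime (or a small C wrapper around it).
   procedure Wire_Begin;
   pragma Import (C, Wire_Begin, "wire_begin");  --  hypothetical C symbol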

In order to tell the project which version of these files to use during the compilation, I created a scenario variable in the main project file, zumo.gpr:

type Board_Type is ("uno", "hifive");
Board : Board_Type := external ("board", "hifive");

Common_Sources := ("src/**", "bsp/");
Target_Sources := "";
case Board is
   when "uno" =>
      Target_Sources := "bsp/atmega328p";
   when "hifive" =>
      Target_Sources := "bsp/freedom_e310-G000";
end case;

for Source_Dirs use Common_Sources & Target_Sources;

Software Design

Interaction with Arduino Sketch

A typical Arduino application exposes two functions to the developer through the sketch file: setup and loop. The developer fills in the setup function with all of the code that should be run once at start-up, and then populates the loop function with the actual application programming. During the Arduino compilation, these two functions get pre-processed and wrapped into a main generated by the Arduino runtime. More information about the Arduino build process can be found here.

Because we are using the Arduino runtime we cannot have the actual main entry point for the application in the Ada code (the Arduino pre-processor generates this for us). Instead, we have an Arduino sketch file called SPARKZumo.ino which has the typical Arduino setup() and loop() functions. From setup() we need to initialize the Ada environment by calling the function generated by the Ada binder, sparkzumoinit(). Then, we can call whatever setup sequence we want.

CCG maps Ada package and subprogram namespacing into C-like namespacing, so package.subprogram in Ada would become package__subprogram() in C. The setup function we are calling in the sketch is sparkzumo.setup in Ada, which becomes sparkzumo__setup() after CCG generates the files. The loop function we are calling in the sketch is sparkzumo.workloop in Ada, which becomes sparkzumo__workloop().
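
As a sketch (the actual package in the repository may be organized differently), the Ada side can be as simple as a package with two procedures:

package SparkZumo is
   procedure Setup;     --  becomes sparkzumo__setup() in the generated C
   procedure WorkLoop;  --  becomes sparkzumo__workloop() in the generated C
end SparkZumo;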

Handling Exceptions

Even though we are generating C code from Ada, the CCG tool can still expand the Ada code to include many of the compiler generated checks associated with Ada code before generating the C code. This is very cool because we still have much of the power of the Ada language even though we are compiling to C.

If any of these checks fail at runtime, the __gnat_last_chance_handler is called. The CCG system supplies a definition for what this function should look like, but leaves the implementation up to the developer. For this application, I put the handler implementation in the sketch file, but am calling back into the Ada code from the sketch to perform more actions (like blink LEDs and shut down the motors). If there is a range check failure, or a buffer overflow, or something similar, my __gnat_last_chance_handler will dump some information to the serial port, then call back into the Ada code to shut down the motors and flash an LED in an infinite loop. We should never need this mechanism because, since we are using SPARK in this application, we should be able to prove that none of these checks will ever fail!

Standard.h file

The minimal runtime that does come with the CCG tool can be found in the installation directory under the adalib folder. Here you will find the C versions of the Ada runtime files that you would typically find in the adainclude directory.

The important file to know about here is the standard.h file. This is the main C header file that allows you to map Ada constructs to C. For instance, this header file defines the fatptr construct that underlies Ada arrays and strings, and other integral types like Natural, Positive, and Boolean.

You can and should modify this file to fit within your build environment. For my application, I have included the Arduino.h at the top to bring in the Arduino type system and constructs. Because the Arduino framework defines things like Booleans, I have commented out the versions defined in the standard.h file so that I am consistent with the rest of the Arduino runtime. You can find the edited version of the standard.h file for this project in the src directory.

Drivers

For the application to interact with all of the sensors available on the robot, we need a layer between the runtime and BSP, and the algorithms. The src/drivers directory contains all of the code necessary to communicate with the sensors and motors. Most of the initial source code for this section was a direct port from the zumo-shield library that was originally written in C++. After porting to Ada, the code was modified to be more robust by refactoring and adding SPARK contracts.

Algorithms

Even though this is a sumo robot, I decided to start with a line follower algorithm for the proof of concept. The source code for the line follower algorithm can be found in src/algos/line_finder. The algorithm was originally a direct port of the Line Follow example in the zumo-shield examples repo.

The C++ version of this algorithm worked OK but wasn’t really able to handle occasions where the line was lost, or the robot came to a fork or an intersection. After refactoring and adding SPARK features, I added a detection lookup so that the robot could determine what type of environment the sensors were looking at. The choices are: Lost (meaning no line is found), Online (meaning there’s a single line), Fork (two lines diverge), BranchLeft (left turn), BranchRight (right turn), Perpendicular intersection (make a decision to go left or right), or Unknown (no clue what to do, let’s keep doing what we were doing and see what happens next). After detecting a change in state, the robot would make a decision like turn left or turn right to follow a new line. If the robot was in a Lost state, it would go into a “re-finding” algorithm where it would start to do progressively larger circles.
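
In the Ada code these detection states naturally become an enumeration type; the following is an illustrative sketch (the actual names in the project may differ slightly):

   type Line_State is
     (Lost,           --  no line found
      Online,         --  a single line under the sensors
      Fork,           --  two lines diverge
      BranchLeft,     --  left turn
      BranchRight,    --  right turn
      Perpendicular,  --  intersection: decide to go left or right
      Unknown);       --  keep doing what we were doing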

This algorithm worked ok as well, but was a little strange. Occasionally, the robot would decide to change direction in the middle of a line, or start to take a branch and turn back the other way. The reason for this was that the robot was detecting spurious changes in state and reacting to them instantaneously. We can call this state noise. In order to minimize this state noise, I added a state low-pass filter using a geometric graph filter. 

The Geometric Graph Filter

Example plot of geometric graph filter

If you ask a mathematician they will probably tell you there’s a better way to filter discrete states than this, but this method worked for me! Let’s picture mapping 6 points corresponding to the 6 detection states onto a 2D graph, spacing them out evenly along the perimeter of a square. Now, let’s say we have a moving window average with X positions. Each time we get a state reading from the sensors we look up the corresponding coordinate for that state in the graph and add the coordinate to the window. For instance, if we detect an Online state our corresponding coordinate is (15, 15). If we detect a Perpendicular state our coordinate is (-15, 0). And so on. If we average over the window we will end up with a coordinate somewhere inside the square. If we then section off the area of the square, and assign each section to the corresponding state, we will find that our average sits in one of those sections, which maps it back to one of our states.

For an example, let’s assume our window is 5 states wide and we have detected the following list of states (BranchLeft, BranchLeft, Online, BranchLeft, Lost). If we map these to coordinates we get the following window: ((-15, 15), (-15, 15), (15, 15), (-15, 15), (-15, -15)). When we average these coordinates in the window we get a point with the coordinates (-9, 9). If we look at our lookup table we can see that this coordinate is in the BranchLeft polygon.
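
A small sketch of the window averaging in Ada (illustrative only; the window size, coordinates and names are assumptions, not the project's actual geo_filter code):

   type Point is record
      X, Y : Integer;
   end record;

   type State_Window is array (1 .. 5) of Point;

   function Average (W : State_Window) return Point is
      Sum_X, Sum_Y : Integer := 0;
   begin
      for P of W loop
         Sum_X := Sum_X + P.X;
         Sum_Y := Sum_Y + P.Y;
      end loop;
      return (X => Sum_X / W'Length, Y => Sum_Y / W'Length);
   end Average;

   --  For the example window above, Average returns (-9, 9), which the
   --  lookup table maps back to the BranchLeft state.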

One issue that comes up here is that when the average point moves closer to the center of the graph, there’s high state entropy, meaning our state can change more rapidly and noise has a higher effect. To solve this, we can hold on to the previously calculated state, and if the newly calculated state is somewhere in the center of the graph, we throw away the new calculation and pass along the previous one. We don’t purge the average window though, so that if we get enough of one state, the average point can eventually migrate out to that section of the graph.

To avoid having to calculate this geometry every time we get a new state, I generated a lookup table which maps every point in the graph to a state. All we have to do at runtime is calculate the average over the window and do the lookup. Some Python scripts are used to generate most of the src/algos/line_finder/geo_filter.ads file; they also generate a visual of the graph. For more information on these scripts, see part #2 [COMING SOON!!] of this blog post! One issue that I ran into was that I had to use a very small graph, which decreased my ability to filter. This is because the amount of RAM I had available on the Arduino Uno was very small: the larger the graph, the larger the lookup table, and the more RAM I needed.

There are a few modifications that could be made to this technique to make it more accurate and fairer. Using a square and only 2 dimensions to map all the states means that the distances between pairs of states are not all equal. For example, it's easier to switch between BranchLeft and Online than it is to switch between BranchLeft and Fork. For the proof of concept, though, this technique worked well.

Future Activity

The code still needs a bit of work to get the IMU sensors up and running. We have another project called Certyflie which has all of the gimbal calculations to synthesize roll, pitch, and yaw data from an IMU. The Arduino Uno is a bit too weak to perform these calculations properly; one issue is that there is no floating point unit on the AVR. The RISC-V has an FPU and is much more powerful. One option is to add a Bluetooth transceiver to the robot and send the IMU data back to a terminal on a laptop, where the synthesis can be done.

Another issue that came up during this development is that the HiFive board uses level shifters on all of the GPIO lines. The level shifters use internal pull-ups, which means that the processor cannot read the reflectance sensors. The reflectance sensor is essentially just a capacitor that is discharged when light hits the substrate. So to read the sensor we need to pull the GPIO line high to charge the capacitor, then pull it low and measure the amount of time it takes to discharge; this tells us how much light is hitting the sensor. Since the HiFive has the pull-ups on the GPIO lines, we can't pull the line low to read the sensor; instead we are always charging it. More information about this process can be found on the IR sensor manufacturer's website under How It Works.
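
To illustrate the charge-and-time protocol just described, here is a hedged Ada sketch. The generic formals (Drive_High, Release_Line, Line_Is_High, Wait_One_Microsecond) are stand-ins for whatever GPIO and timing primitives a given board support package provides; this is not the actual SPARKZumo driver code, only the shape of the measurement loop.

--  Hypothetical sketch: the formals below are placeholders, not a real API.
generic
   with procedure Drive_High;             --  drive the sensor line high
   with procedure Release_Line;           --  stop driving the line
   with function Line_Is_High return Boolean;
   with procedure Wait_One_Microsecond;
function Read_Reflectance return Natural;

--  ---------------

function Read_Reflectance return Natural is
   Charge_Time_Us : constant := 10;       --  time to charge the capacitor
   Timeout_Us     : constant := 3000;     --  give up on very dark readings
   Elapsed        : Natural  := 0;
begin
   Drive_High;                            --  charge the sensor capacitor
   for I in 1 .. Charge_Time_Us loop
      Wait_One_Microsecond;
   end loop;

   Release_Line;                          --  let the capacitor discharge
   while Line_Is_High and then Elapsed < Timeout_Us loop
      Wait_One_Microsecond;
      Elapsed := Elapsed + 1;
   end loop;

   return Elapsed;                        --  longer time = less reflected light
end Read_Reflectance;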

There will be a second post coming soon, which will describe how to actually build this crazy project. There, I'll detail the development of the GPS plugin that I used to build everything and flash the board. As always, the code for the entire project is available here: https://github.com/Robert-Tice/SPARKZumo

Happy Hacking!

]]>
Two Days Dedicated to Sound Static Analysis for Security http://blog.adacore.com/sound-static-analysis-for-security Wed, 14 Mar 2018 15:37:00 +0000 Yannick Moy http://blog.adacore.com/sound-static-analysis-for-security

AdaCore has been working with CEA, Inria and NIST to organize a two-day event dedicated to sound static analysis techniques and tools, and how they are used to increase the security of software-based systems. The program gathers top-notch experts in the field, from industry, government agencies and research institutes, around the three themes of analysis of legacy code, use in new developments and accountable software quality.

The theme "analysis of legacy code" is aimed at all those who have to maintain an aging codebase while facing new security threats from the environment. This theme will be introduced by David A. Wheeler, whose contributions to security and open source are well known. Of the many articles of his that I like, I recommend his in-depth analysis of Heartbleed and the State-of-the-Art Resources (SOAR) for Software Vulnerability Detection, Test, and Evaluation, an official government report detailing the tools and techniques for building secure software. David is leading the CII Best Practices Badge Program to increase the security of open source software. The presentations in this theme will touch on analysis of binaries, analysis of C code, analysis of Linux kernel code and analysis of nuclear control systems.

The theme "use in new developments" is aimed at all those who start new projects with security requirements. This theme will be introduced by K. Rustan M. Leino, an emblematic researcher in program verification, who has inspired many of the profound changes in the field, from his work on ESC/Modula-3 with Greg Nelson to his work on a comprehensive formal verification environment around the Dafny language, with many others in between: ESC/Java, Spec#, Boogie, Chalice, etc. The presentations in this theme will touch on securing mobile platforms and our critical infrastructure, as well as describing techniques for verifying floating-point programs and more complex requirements.

The theme "accountable software quality" is aimed at all those who need to justify the security of their software, either because they are under regulatory oversight or because of commercial/corporate obligations. This theme will be introduced by David Cok, former VP of Technology and Research at GrammaTech, who is well known for his work on formal verification tools for Java: ESC/Java, ESC/Java2, and now OpenJML. The presentations in this theme will touch on what soundness means for static analysis and the demonstrable benefits it brings, the processes around the use of sound static analysis (including the integration between test and proof results), and the various levels of assurance that can be reached.

The event will take place at the National Institute of Standards and Technology (NIST) at the invitation of researcher Paul Black. Paul co-authored a much-noticed report last year on Dramatically Reducing Software Vulnerabilities, which highlighted sound static analysis as a promising avenue. He will introduce the two days of the conference with his perspective on the issue.

The workshop will end with tutorials on Frama-C and SPARK given by the technology developers (CEA and AdaCore), so that attendees can get first-hand experience with the tools. There will also be vendor displays where you can talk with technology providers. All in all, a unique event to attend, especially when you know that, thanks to our sponsors, participation is free! But registration is compulsory. To see the full program and register for the event, see the webpage of the event.

]]>
Secure Software Architectures Based on Genode + SPARK http://blog.adacore.com/secure-software-architectures-based-on-genode-spark Mon, 05 Mar 2018 13:19:00 +0000 Yannick Moy http://blog.adacore.com/secure-software-architectures-based-on-genode-spark

SPARK user Alexander Senier recently presented their use of SPARK for building secure mobile architectures at BOB Konferenz in Germany. What's nice is that they build on the guarantees that SPARK provides at the software level, using them to create a secure software architecture based on the Genode operating system framework. At 19:07 in the video he presents three interesting architectural designs (policy objects, trusted wrappers, and transient components) that make it possible to build a trustworthy system out of untrustworthy building blocks (like a Web browser or a network stack). Almost as exciting as alchemy's goal of transforming lead into gold!

Their solution is to design architectures where untrusted components must communicate through trusted ones. They use Genode to enforce the rule that no other communications are allowed, and SPARK to make sure that trusted components can really be trusted. You can see an example of an application they built with these technologies at Componolit at 33:37 in the video: a baseband firewall, to protect the Android platform on a mobile device (e.g., your phone) from attacks that get through the baseband processor, which manages radio communications on your mobile.

As the title of the talk says, for the security of connected devices in the modern world, we are at a time "when one beyond-mainstream technology is not enough". For more info on what they do, see the Componolit website.

]]>
Ada on the micro:bit http://blog.adacore.com/ada-on-the-microbit Mon, 26 Feb 2018 13:26:00 +0000 Fabien Chouteau http://blog.adacore.com/ada-on-the-microbit

The micro:bit is a very small ARM Cortex-M0 board designed by the BBC for computer education. It's fitted with a Nordic nRF51 Bluetooth-enabled 32-bit ARM microcontroller. At $15 it is one of the cheapest yet most fun pieces of kit for getting started with embedded programming.

In this blog post I will explain how to start programming your micro:bit in Ada.

How to set up the Ada development environment for the Micro:Bit

pyOCD programmer

The micro:bit comes with an embedded programming/debugging probe implementing the CMSIS-DAP protocol defined by ARM. In order to use it, you have to install a Python library called pyOCD. Here is the procedure:

On Windows:

Download the binary version of pyOCD from this link:

https://launchpad.net/gcc-arm-embedded-misc/pyocd-binary/pyocd-20150430/+download/pyocd_win.exe

Plug in your micro:bit using a USB cable and run pyOCD in a terminal:

C:\Users\UserName\Downloads>pyocd_win.exe -p 1234 -t nrf51822
Welcome to the PyOCD GDB Server Beta Version
INFO:root:Unsupported board found: 9900
INFO:root:new board id detected: 9900000037024e450073201100000021000000009796990
1
INFO:root:board allows 5 concurrent packets
INFO:root:DAP SWD MODE initialised
INFO:root:IDCODE: 0xBB11477
INFO:root:4 hardware breakpoints, 0 literal comparators
INFO:root:CPU core is Cortex-M0
INFO:root:2 hardware watchpoints
INFO:root:GDB server started at port:1234

On Linux (Ubuntu):

Install pyOCD from pip:

$ sudo apt-get install python-pip
$ pip install --pre -U pyocd

pyOCD will need permissions to talk with the micro:bit. Instead of running pyOCD as a privileged user (root), it's better to add a UDEV rule saying that the device is accessible to non-privileged users:

$ sudo sh -c 'echo SUBSYSTEM==\"usb\", ATTR{idVendor}==\"0d28\", ATTR{idProduct}==\"0204\", MODE:=\"666\" > /etc/udev/rules.d/mbed.rules'
$ sudo udevadm control --reload

Now that there's a new UDEV rule, if your micro:bit was already plugged in, you have to unplug it and plug it back in again.

To run pyOCD, use the following command:

$ pyocd-gdbserver -S -p 1234
INFO:root:DAP SWD MODE initialised
INFO:root:ROM table #0 @ 0xf0000000 cidr=b105100d pidr=2007c4001
INFO:root:[0]<e00ff000: cidr=b105100d, pidr=4000bb471, class=1>
INFO:root:ROM table #1 @ 0xe00ff000 cidr=b105100d pidr=4000bb471
INFO:root:[0]<e000e000:SCS-M0+ cidr=b105e00d, pidr=4000bb008, class=14>
INFO:root:[1]<e0001000:DWT-M0+ cidr=b105e00d, pidr=4000bb00a, class=14>
INFO:root:[2]<e0002000:BPU cidr=b105e00d, pidr=4000bb00b, class=14>
INFO:root:[1]<f0002000: cidr=b105900d, pidr=4000bb9a3, class=9, devtype=13, devid=0>
INFO:root:CPU core is Cortex-M0
INFO:root:4 hardware breakpoints, 0 literal comparators
INFO:root:2 hardware watchpoints
INFO:root:Telnet: server started on port 4444
INFO:root:GDB server started at port:1234
[...]

Download the Ada Drivers Library

The Ada Drivers Library is a firmware library written in Ada. We currently have support for some ARM Cortex-M microcontrollers like the STM32F4/7 or the nRF51, as well as the HiFive1 RISC-V board.

You can download or clone the repository from GitHub:

https://github.com/AdaCore/Ada_Drivers_Library

$ git clone https://github.com/AdaCore/Ada_Drivers_Library

Install the Ada ZFP run-time

In Ada_Drivers_Library, go to the micro:bit example directory and download or clone the run-time from this GitHub repository: https://github.com/Fabien-Chouteau/zfp-nrf51

$ cd Ada_Drivers_Library/examples/MicroBit/
$ git clone https://github.com/Fabien-Chouteau/zfp-nrf51

Install the GNAT ARM ELF toolchain

If you have a GNAT Pro ARM ELF subscription, you can download the  toolchain from your GNATtracker account. Otherwise you can use the Community release of GNAT from this address: https://www.adacore.com/community

Open the example project and build it

Start GNAT Programming Studio (GPS) and open the Micro:Bit example project: "Ada_Drivers_Library/examples/MicroBit/microbit_example.gpr".

Press F4 and then press Enter to build the project.

Program and debug the board

Make sure your pyOCD session is still running, then in GPS start a debug session with the top menu "Debug -> Initialize -> main". GPS will start GDB and connect it to pyOCD.

In the gdb console, use the "load" command to program the board:

(gdb) load
Loading section .text, size 0xbd04 lma 0x0
Loading section .ARM.exidx, size 0x8 lma 0xbd04
[...]

Reset the board with this command:

(gdb) monitor reset

And finally use the "continue" command to run the program:

(gdb) continue

You can interrupt the execution with the "CTRL+backslash" shortcut and then insert breakpoints, step through the application, inspect memory, etc.

Conclusion

That’s it, your first Ada program on the Micro:Bit! If you have an issue with this procedure, please tell us in the comments section below.
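
By the way, if you are wondering roughly what the main program of such an example looks like, here is a minimal sketch. The package and procedure names below (MicroBit.Display, Scroll_Text) are assumptions made for illustration; check the sources of the example project in Ada_Drivers_Library/examples/MicroBit for the actual API.

--  Hypothetical sketch: MicroBit.Display and Scroll_Text are assumed names,
--  used here only for illustration; see the example sources for the real API.
with MicroBit.Display;

procedure Main is
begin
   --  Scroll a greeting on the 5x5 LED matrix, forever
   loop
      MicroBit.Display.Scroll_Text ("Hello, Ada!");
   end loop;
end Main;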

Note that the current support is limited, but we are working on adding tasking support (Ravenscar) and improving the library, as well as its integration into GNAT Programming Studio, so stay tuned.

In the meantime, here is an example of the kind of project that you can do with Ada on the Micro:Bit

]]>
Tokeneer Fully Verified with SPARK 2014 http://blog.adacore.com/tokeneer-fully-verified-with-spark-2014 Fri, 23 Feb 2018 09:49:00 +0000 Yannick Moy http://blog.adacore.com/tokeneer-fully-verified-with-spark-2014

Tokeneer is a software system for controlling physical access to a secure enclave by means of a fingerprint sensor. This software was created by Altran (Praxis at the time) in 2003 using the previous generation of the SPARK language and tools, as part of a project commissioned by the NSA to investigate the rigorous development of critical software using formal methods.

The project artefacts, including the source code, were released as open source in 2008. Tokeneer was widely recognized as a milestone in industrial formal verification. Original project artefacts, including the original source code in SPARK 2005, are available here.

We recently transitioned this software to SPARK 2014, and it allowed us to go beyond what was possible with the previous SPARK technology. The initial transition by Altran and AdaCore took place in 2013-2014, when we translated all the contracts from SPARK 2005 syntax (stylized comments in the code) to SPARK 2014 syntax (aspects in the code). But at the time we did not invest the time to fully prove the resulting translated code. This is what we have now completed. The resulting code is available on GitHub. It will also be available in future SPARK releases as one of the distributed examples.

What we did

With a few changes, we went from 234 unproved checks on Tokeneer code (the version originally translated to SPARK 2014), down to 39 unproved but justified checks. The justification is important here: there are limitations to GNATprove analysis, so it is expected that users must sometimes step in and take responsibility for unproved checks.

Using predicates to express constraints

Most of the 39 justifications in the Tokeneer code are for string concatenations that involve the attribute 'Image. GNATprove currently does not know that S'Image(X), for a scalar type S and a variable X of this type, returns a rather small string (as specified in the Ada RM), so it issues a possible range check message when concatenating such an image with any other string. We chose to isolate such calls to 'Image in dedicated functions, with suitable predicates on their return type to convey the information about the small string result. Take for example the enumeration type ElementT in audittypes.ads. We define a function ElementT_Image which returns a small string starting at index 1 and with length at most 20, as follows:

   function ElementT_Image (X : ElementT) return CommonTypes.StringF1L20 is
      (ElementT'Image (X));
   pragma Annotate (GNATprove, False_Positive,
                    "range check might fail",
                    "Image of enums of type ElementT are short strings starting at index 1");
   pragma Annotate (GNATprove, False_Positive,
                    "predicate check might fail",
                    "Image of enums of type ElementT are short strings starting at index 1");

Note the use of pragma Annotate to justify the range check message and the predicate check message that are generated by GNATprove otherwise. Type StringF1L20 is defined as a subtype of the standard String type with additional constraints expressed as predicates. In fact, we create an intermediate subtype StringF1 of strings that start at index 1 and which are not "super flat", i.e. their last index is at least 0. StringF1L20 inherits from the predicate of StringF1 and adds the constraint that the length of the string is no more than 20:

   subtype StringF1 is String with
     Predicate => StringF1'First = 1 and StringF1'Last >= 0;
   subtype StringF1L20 is StringF1 with
     Predicate => StringF1L20'Last <= 20;

Moving query functions to the spec

Another crucial change was to give visibility to client code over query functions used in contracts. Take for example the API in admin.ads. It defines the behavior of the administrator through subprograms whose contracts use query functions RolePresent, IsPresent and IsDoingOp:

   procedure Logout (TheAdmin :    out T)
     with Global => null,
          Post   => not IsPresent (TheAdmin)
                      and not IsDoingOp (TheAdmin);

The issue was that these query functions, while conveniently abstracting away the details of what it means for the administrator to be present, or to be doing an operation, were defined in the body of package Admin, inside file admin.adb. As a result, the proof of client code of Admin had to consider these calls as black boxes, which resulted in many unprovable checks. The fix here consisted of moving the definitions of the query functions into the private part of the spec file admin.ads: this way, client code still does not see their implementation, but GNATprove can use these expression functions when proving client code.

   function RolePresent (TheAdmin : T) return PrivTypes.PrivilegeT is
     (TheAdmin.RolePresent);

   function IsPresent (TheAdmin : T) return Boolean is
     (TheAdmin.RolePresent in PrivTypes.AdminPrivilegeT);

   function IsDoingOp (TheAdmin : T) return Boolean is
      (TheAdmin.CurrentOp in OpT);

Using type invariants to enforce global invariants

Some global properties on the version in SPARK 2005 were justified manually, like the global invariant maintained in package Auditlog over the global variables encoding the state of the files used to log operations: CurrentLogFile, NumberLogEntries, UsedLogFiles, LogFileEntries. Here is the text for this justification:

-- Proof Review file for 
--    procedure AuditLog.AddElementToLog

-- VC 6
-- C1:    fld_numberlogentries(state) = (fld_length(fld_usedlogfiles(state)) - 1) 
--           * 1024 + element(fld_logfileentries(state), [fld_currentlogfile(state)
--           ]) .
-- C1 is a package state invariant.
-- proof shows that all public routines that modify NumberLogEntries, UsedLogFiles.Length,
-- CurrentLogFile or LogFileEntries(CurrentLogFile) maintain this invariant.
-- This invariant has not been propogated to the specification since it would unecessarily 
-- complicate proof of compenents that use the facilities from this package.

We can do better in SPARK 2014, by expressing this property as a type invariant. This requires all four variables to become components of the same record type, so that a single global variable LogFiles replaces them:

   type LogFileStateT is record
      CurrentLogFile   : LogFileIndexT  := 1;
      NumberLogEntries : LogEntryCountT := 0;
      UsedLogFiles     : LogFileListT   :=
        LogFileListT'(List   => (others => 1),
                      Head   => 1,
                      LastI  => 1,
                      Length => 1);
      LogFileEntries   : LogFileEntryT  := (others => 0);
   end record
     with Type_Invariant =>
       Valid_NumberLogEntries
         (CurrentLogFile, NumberLogEntries, UsedLogFiles, LogFileEntries);

   LogFiles         : LogFilesT := LogFilesT'(others => File.NullFile)
     with Part_Of => FileState;

With this change, all public subprograms updating the state of log files can now assume the invariant holds on entry (it is checked by GNATprove on every call) and must restore it on exit (it is checked by GNATprove when returning from the subprogram). Locally defined subprograms need not obey this constraint, however, which is exactly what is needed here. One subtlety is that some of these local subprograms were accessing the state of log files as global variables. If we had kept LogFiles as a global variable, SPARK rules would have required that its invariant be checked on entry and exit of these subprograms. Instead, we changed the signature of these local subprograms to take LogFiles as an additional parameter, for which the invariant need not hold.

Other transformations on contracts

A few other transformations were needed to make contracts provable with SPARK 2014. In particular, it was necessary to change a number of "and" logical operations into their short-circuit version "and then". See for example this part of the precondition of Processing in tismain.adb:

       (if (Admin.IsDoingOp(TheAdmin) and
              Admin.TheCurrentOp(TheAdmin) = Admin.OverrideLock)
        then
           Admin.RolePresent(TheAdmin) = PrivTypes.Guard)

The issue was that calling TheCurrentOp requires that IsDoingOp holds:

   function TheCurrentOp (TheAdmin : T) return OpT
     with Global => null,
          Pre    => IsDoingOp (TheAdmin);

Since the "and" logical operation evaluates both of its operands, TheCurrentOp will also be called in contexts where IsDoingOp does not hold, thus leading to a precondition failure. The fix is simply to use the short-circuit equivalent:

       (if (Admin.IsDoingOp(TheAdmin) and then
              Admin.TheCurrentOp(TheAdmin) = Admin.OverrideLock)
        then
           Admin.RolePresent(TheAdmin) = PrivTypes.Guard)

We also added a few loop invariants that were missing.

What about security?

You can read the original Tokeneer report for a description of the security properties that were provably enforced through formal verification.

To demonstrate that formal verification does indeed bring assurance that some security vulnerabilities are not present, we seeded four vulnerabilities in the code and reanalyzed it. The analysis of GNATprove (either through flow analysis or proof) detected all four: an information leak, a back door, a buffer overflow and an implementation flaw. You can see that in action in this short 4-minute video.

]]>
The Road to a Thick OpenGL Binding for Ada: Part 2 http://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada-part-2 Thu, 22 Feb 2018 06:00:00 +0000 Felix Krause http://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada-part-2

This blog post is part two of a tutorial based on the OpenGLAda project and will cover implementation details such as a type system for interfacing with C, error handling, memory management, and loading functions.

If you haven't read part one, I encourage you to do so. It can be found here.

Wrapping Types

As part of the binding process, we noted in the previous blog post that we will need to translate typedefs within the OpenGL C headers into Ada types so that our descriptions of C functions that take arguments or return values are accurate. Let's begin with the basic numeric types:

with Interfaces.C;

package GL.Types is
   type Int   is new Interfaces.C.int;      --  GLint
   type UInt  is new Interfaces.C.unsigned; --  GLuint

   subtype Size is Int range 0 .. Int'Last; --  GLsizei

   type Single is new Interfaces.C.C_float; --  GLfloat
   type Double is new Interfaces.C.double;  --  GLdouble
end GL.Types;

We use Single as a name for the single-precision floating point type to avoid confusion with Ada's Standard.Float. Moreover, we can apply Ada's powerful numerical typing system in our definition of Size (the counterpart of GLsizei) by defining it with a non-negative range. This affords us some extra compile-time and run-time checks without having to add any conditionals – something not possible in C.

The type list above is, of course, shortened for this post, however, two important types are explicitly declared elsewhere:

  • GLenum, which is used for parameters that take a well-defined set of values specified within the #define directive in the OpenGL header. Since we want to make the Ada interface safe we will use real enumeration types for that.
  • GLboolean, which is an unsigned char representing a boolean value. We do not want to have a custom boolean type in the Ada API because it will not add any value compared to using Ada's Boolean type (unlike e.g. the Int type, which may have a different range than Ada's Integer type).

For these types, we define another package called GL.Low_Level:

with Interfaces.C;

package GL.Low_Level is
   type Bool is new Boolean;

   subtype Enum is Interfaces.C.unsigned;
private
   for Bool use (False => 0, True => 1);
   for Bool'Size use Interfaces.C.unsigned_char'Size;
end GL.Low_Level;

We now have a Bool type that we can use for API imports and an Enum type that we will solely use to define the size of our enumeration types. Note that Bool also is an enumeration type, but uses the size of unsigned_char because that is what OpenGL defines for GLboolean.

To show how we can wrap GLenum into actual Ada enumeration types, let's examine glGetError, which is defined like this in the C header:

GLenum glGetError(void);

The return value is one of several error codes defined as preprocessor macros in the header. We translate these into an Ada enumeration and then wrap the subprogram, resulting in the following:

package GL.Errors is
   type Error_Code is
     (No_Error, Invalid_Enum, Invalid_Value, Invalid_Operation,
      Stack_Overflow, Stack_Underflow, Out_Of_Memory,
      Invalid_Framebuffer_Operation);

   function Error_Flag return Error_Code;
private
   for Error_Code use
     (No_Error                      => 0,
      Invalid_Enum                  => 16#0500#,
      Invalid_Value                 => 16#0501#,
      Invalid_Operation             => 16#0502#,
      Stack_Overflow                => 16#0503#,
      Stack_Underflow               => 16#0504#,
      Out_Of_Memory                 => 16#0505#,
      Invalid_Framebuffer_Operation => 16#0506#);
   for Error_Code'Size use Low_Level.Enum'Size;
end GL.Errors;

With the above code we encode the errors defined in the C header as representations for our enumeration values - this way, our safe enumeration type has the exact same memory layout as the defined error codes and maintains compatibility.

We then add the backend for Error_Flag as import to GL.API:

function Get_Error return Errors.Error_Code;
pragma Import (StdCall, Get_Error, "glGetError");

Error Handling

The OpenGL specification states that whenever an error arises while calling a function of the API, an internal error flag gets set. This flag can then be retrieved with the function glGetError we wrapped above.

It would certainly be nicer, though, if these API calls raised Ada exceptions instead, but this would mean that in every wrapper to an OpenGL function that may set the error flag we'd need to call Get_Error, and, when the returned flag is something other than No_Error, we'd subsequently need to raise the appropriate exception. Depending on what the user does with the API, this may lead to significant overhead (let us not forget that OpenGL is much more performance-critical than it is safety-critical). In fact, more recent graphics APIs like Vulkan have debugging extensions which require manual tuning to receive error messages - in other words, due to overhead, Vulkan turns off all error checking by default.

So, what we will provide is a feature that auto-raises exceptions whenever the error flag is set, but make it optional. To achieve this, Ada exceptions derived from OpenGL’s error flags need to be defined.

Let’s add the following exception definitions to GL.Errors:

Invalid_Operation_Error             : exception;
Out_Of_Memory_Error                 : exception;
Invalid_Value_Error                 : exception;
Stack_Overflow_Error                : exception;
Stack_Underflow_Error               : exception;
Invalid_Framebuffer_Operation_Error : exception;
Internal_Error                      : exception;

Notice that the exceptions carry the same names as the corresponding enumeration values in the same package. This is not a problem because Ada is intelligent enough to know which one of the two we want depending on context. Also notice the exception Internal_Error which does not correspond to any OpenGL error – we'll see later what we need it for.

Next, we need a procedure that queries the error flag and possibly raises the appropriate exception. Since we will be using such a procedure almost everywhere in our wrapper let’s declare it in the private part of the GL package so that all of GL's child packages have access:

procedure Raise_Exception_On_OpenGL_Error;

And in the body:

procedure Raise_Exception_On_OpenGL_Error is separate;

Here, we tell Ada that this procedure is defined in a separate compilation unit, enabling us to provide different implementations depending on whether the user wants automatic exception raising to be enabled or not. Before we continue, though, let's set up our project with this in mind:

library project OpenGL is
   --  Windowing_System config omitted

   type Toggle_Type is ("enabled", "disabled");
   Auto_Exceptions : Toggle_Type := external ("Auto_Exceptions", "enabled");

   OpenGL_Sources := ("src");
   case Auto_Exceptions is
      when "enabled" =>
         OpenGL_Sources := OpenGL_Sources & "src/auto_exceptions";
      when "disabled" =>
         OpenGL_Sources := OpenGL_Sources & "src/no_auto_exceptions";
   end case;
   for Source_Dirs use OpenGL_Sources;

   --  packages and other things omitted
end OpenGL;

To conform with the modifications made to the project file, we must now create two new directories inside the src folder and place the implementations of our procedure accordingly. GNAT expects both source files to be named gl-raise_exception_on_opengl_error.adb. The implementation in no_auto_exceptions is trivial:

separate (GL)
procedure Raise_Exception_On_OpenGL_Error is
begin
   null;
end Raise_Exception_On_OpenGL_Error;

And the one in auto_exceptions looks like this:

with GL.Errors;

separate (GL)
procedure Raise_Exception_On_OpenGL_Error is
begin
   case Errors.Error_Flag is
      when Errors.Invalid_Operation             => raise Errors.Invalid_Operation_Error;
      when Errors.Invalid_Value                 => raise Errors.Invalid_Value_Error;
      when Errors.Invalid_Framebuffer_Operation => raise Errors.Invalid_Framebuffer_Operation_Error;
      when Errors.Out_Of_Memory                 => raise Errors.Out_Of_Memory_Error;
      when Errors.Stack_Overflow                => raise Errors.Stack_Overflow_Error;
      when Errors.Stack_Underflow               => raise Errors.Stack_Underflow_Error;
      when Errors.Invalid_Enum                  => raise Errors.Internal_Error;
      when Errors.No_Error                      => null;
   end case;
exception
   when Constraint_Error => raise Errors.Internal_Error;
end Raise_Exception_On_OpenGL_Error;

The exception section at the end is used to detect cases where glGetError returns a value we did not know of at the time of implementing this wrapper. Ada would then try to map this value to the Error_Code enumeration, and since the value does not correspond to any value specified in the type definition, the program will raise a Constraint_Error. Of course, OpenGL is very conservative about adding error flags, so this is unlikely to happen, but it is still nice to plan for the future.

Fetching Function Pointers at Runtime

Part 1: Implementing the "Fetching" Function

As previously noted, many functions from the OpenGL API must be retrieved as function pointers at run time instead of being linked at compile time. The reason for this once again comes down to the concept of graceful degradation -- if some functionality exists as an extension (especially functions not part of the OpenGL core) but is unimplemented by a target graphics card driver, then the programmer will be able to identify or recognize this case when setting the relevant function pointers during execution. Unfortunately though, this creates an extra step which prevents us from simply importing the whole of the API, and, worse still, on Windows no functions defined in OpenGL 2.0 or later are available for compile-time linking, making programmatic queries required.

So then, the question arises: how are these function pointers to be retrieved? Sadly, this functionality is not available from within the OpenGL API or driver, but instead is provided by platform-specific extensions, or more specifically, the windowing system supporting OpenGL. So, as with exception handling, we will use a procedure with multiple implementations and switch to the appropriate implementation via GPRBuild:

case Windowing_System is
   when "windows" => OpenGL_Sources := OpenGL_Sources & "src/windows";
   when "x11"     => OpenGL_Sources := OpenGL_Sources & "src/x11";
   when "quartz"  => OpenGL_Sources := OpenGL_Sources & "src/mac";
end case;

...and we declare this function in the main source:

function GL.API.Subprogram_Reference (Function_Name : String)
  return System.Address;

Then finally, in the windowing-system specific folders, we place the implementation and necessary imports from the windowing system's API. Those imports and the subsequent implementations are not very interesting, so I will not discuss them at length here, but I will show you the implementation for Apple's Mac operating system to give you an idea:

with GL.API.Mac_OS_X;

function GL.API.Subprogram_Reference (Function_Name : String)
  return System.Address is

   -- OSX-specific implementation uses CoreFoundation functions
   use GL.API.Mac_OS_X;

   package IFC renames Interfaces.C.Strings;

   GL_Function_Name_C : IFC.chars_ptr := IFC.New_String (Function_Name);

   Symbol_Name : constant CFStringRef :=
     CFStringCreateWithCString
       (alloc    => System.Null_Address,
       cStr     => GL_Function_Name_C,
       encoding => kCFStringEncodingASCII);

   Result : constant System.Address :=
     CFBundleGetFunctionPointerForName
       (bundle      => OpenGLFramework,
       functionName => Symbol_Name);
begin
   CFRelease (Symbol_Name);
   IFC.Free (GL_Function_Name_C);
   return Result;
end GL.API.Subprogram_Reference;

With the above code in effect, we are now able to retrieve the function pointers; however, we still need to implement the querying machinery, for which there are three possible approaches:

  • Lazy: When a feature is first needed, its corresponding function pointer is loaded and stored for future use. This approach may produce the least amount of work for the resulting application, although, theoretically, it makes the performance of a call unpredictable. Since fetching a function pointer is a fairly trivial operation, however, this is not really a practical argument against it.
  • Eager: At some defined point in time, a call gets issued to a loading function for every function pointer that is supported by OpenGLAda. The Eager approach produces the largest amount of work for the resulting application, but again, since loading is trivial it does not noticeably slow down the application (and, even if it did, it would do so during initialization, where it is most tolerable).
  • Explicit: The user is required to specify which features they want to use and we only load the function pointers related to such features. Explicit loading places the heaviest burden on the user, since they must state which features they will be using.

Overall, the consequences of choosing one of these three possibilities are mild, so we will go with the one that is easiest to implement: the eager approach, which is also the one used by many other popular OpenGL libraries.

Part 2: Autogenerating the Fetching Implementation

For each OpenGL function we import that must be loaded at runtime we need to create three things:

  • The definition of an access type describing the function's parameters and return types.
  • A global variable having this type to hold the function pointer as soon as it gets loaded.
  • A call to a platform-specific function which will return the appropriate function pointer from a DLL or library for storage into our global function pointer.

Implementing these segments for each subprogram is a very repetitive task, which hints at the possibility of automating it. To check whether this is feasible, let's go over the actual information we need to write in each of these code segments for an imported OpenGL function:

  • The Ada subprogram signature
  • The name of the C function we import

As you can see, this is almost exactly the same information we would need to write an imported subprogram loaded at compile time! To keep all information about imported OpenGL functions centralized, let's craft a simple specification format where we can list all this information for each subprogram.

Since we need to define Ada subprogram signatures, it seems a good idea to use Ada-like syntax (like GPRBuild does for its project files). After writing a small parser (I will not show details here since that is outside the scope of this post), we can now process a specification file looking like the following. We will discuss the package GL.Objects.Shaders and more about what it does in a bit.

with GL.Errors;
with GL.Types;
with GL.Objects.Shaders;

spec GL.API is
   use GL.Types;

   function Get_Error return Errors.Error_Code with Implicit => "glGetError";
   procedure Flush with Implicit => "glFlush";

   function Create_Shader
     (Shader_Type : Objects.Shaders.Shader_Type) return UInt
   with
     Explicit => "glCreateShader";
end GL.API;

This specification contains two imports we have already created manually and one new import – in this case we use Create_Shader as an example for a subprogram that needs to be loaded via function pointer. We use Ada 2012-like syntax for specifying the target link name with aspects and the import mode. There are two import modes:

  • Implicit - meaning that the subprogram will be imported via pragmas. This will give us a subprogram declaration that will be bound to its implementation by the dynamic library loader. So it happens implicitly and we do not actually need to write any code for it. This is what we previously did in our import of glFlush in part one.
  • Explicit - meaning that the subprogram will be provided as a function pointer variable. We will need to generate code that assigns a proper value to that variable at runtime in this case.

Processing this specification will generate the following Ada units for us:

with GL.Errors;
with GL.Types;
with GL.Objects.Shaders;

private package GL.API is
   use GL.Types;

   type T1 is access function (P1 : Objects.Shaders.Shader_Type) return UInt;
   pragma Convention (StdCall, T1);

   function Get_Error return Errors.Error_Code;
   pragma Import (StdCall, Get_Error, "glGetError");

   procedure Flush;
   pragma Import (StdCall, Flush, "glFlush");

   Create_Shader : T1;
end GL.API;

--  ---------------

with System;
with Ada.Unchecked_Conversion;
private with GL.API.Subprogram_Reference;
procedure GL.Load_Function_Pointers is
   use GL.API;

   generic
      type Function_Reference is private;
   function Load (Function_Name : String) return Function_Reference;
   
   function Load (Function_Name : String) return Function_Reference is
      function As_Function_Reference is
        new Ada.Unchecked_Conversion
              (Source => System.Address,
               Target => Function_Reference);

      Raw : System.Address := Subprogram_Reference (Function_Name);
   begin
      return As_Function_Reference (Raw);
   end Load;

   function Load_T1 is new Load (T1);
begin
   GL.API.Create_Shader := Load_T1 ("glCreateShader");
end GL.Load_Function_Pointers;

Notice how our implicit subprograms get imported like before, but for the explicit subprogram, a type T1 got created as an access type to the subprogram, and a global variable Create_Shader is defined to be of this type - satisfying all of our needs.

The procedure GL.Load_Function_Pointers contains the code to fill this variable with the right value by obtaining a function pointer using the platform-specific implementation discussed above. The generic load function exists so that additional function pointers can be loaded using this same code.

The only thing left to do is to expose this functionality in the public interface like the example below:

package GL is
   --  ... other code

   procedure Init;

   --  ... other code
end GL;

--  ------

with GL.Load_Function_Pointers;

package body GL is
   --  ... other code

   procedure Init renames GL.Load_Function_Pointers;
  
   --  ... other code
end GL;

Of course, we now require the user to explicitly call Init somewhere in their code... You might think that we could automatically execute the loading code at package initialization, but this would not work, because some OpenGL implementations (most prominently the one on Windows) will refuse to load any OpenGL function pointers unless there is a current OpenGL context. This context will only exist after we created an OpenGL surface to render on, which will be done programmatically by the user.

In practice, OpenGLAda includes a binding to the GLFW library as a platform-independent way of creating windows with an OpenGL surface on them, and this binding automatically calls Init whenever a window is made current (i.e. placed in foreground), so that the user does not actually need to worry about it. However, there may be other use-cases that do not employ GLFW, like, for example, creating an OpenGL surface widget with GtkAda. In that case, calling Init manually is still required given our design.

Memory Management

The OpenGL API enables us to create various objects that reside in GPU memory for things like textures or vertex buffers. Creating such objects gives us an ID (kind of like a memory address) which we can then use to refer to the object instead of a memory address. To avoid memory leaks, we will want to manage these IDs automatically in our Ada wrapper so they are automatically destroyed once the last reference vanishes. Ada’s Controlled types are an ideal candidate for the job. Let's start writing a package GL.Objects to encapsulate the functionality:

package GL.Objects is
   use GL.Types;

   type GL_Object is abstract tagged private;

   procedure Initialize_Id (Object : in out GL_Object);

   procedure Clear (Object : in out GL_Object);

   function Initialized (Object : GL_Object) return Boolean;
   
   procedure Internal_Create_Id
     (Object : GL_Object; Id : out UInt) is abstract;

   procedure Internal_Release_Id
     (Object : GL_Object; Id : UInt) is abstract;
private
   type GL_Object_Reference;
   type GL_Object_Reference_Access is access all GL_Object_Reference;

   type GL_Object_Reference is record
      GL_Id           : UInt;
      Reference_Count : Natural;
      Is_Owner        : Boolean;
   end record;

   type GL_Object is abstract new Ada.Finalization.Controlled with record
      Reference : GL_Object_Reference_Access := null;
   end record;

   -- Increases reference count.
   overriding procedure Adjust (Object : in out GL_Object);

   -- Decreases reference count. Destroys texture when it reaches zero.
   overriding procedure Finalize (Object : in out GL_Object);
end GL.Objects;   

GL_Object is our smart pointer here, and GL_Object_Reference is the holder of the object's ID as well as the reference count. We will derive the actual object types (of which there are quite a few) from GL_Object, so the base type can be abstract and we can define subprograms that the child types must override. Note that since the class hierarchy is based on GL_Object, all derived types have an identically-typed handle to a GL_Object_Reference object, and thus, our reference counting is independent of the actual derived type.

The only thing the derived type must declare in order for our automatic memory management to work is how to create and delete the OpenGL object in GPU memory – this is what Internal_Create_Id and Internal_Release_Id in the above segment are for. Because they are abstract, they must be put into the public part of the package even though they should never be called by the user directly.

The core of our smart pointer machinery will be implemented in the Adjust and Finalize procedures provided by Ada.Finalization.Controlled. Since this topic has already been extensively covered in this Ada Gem I am going to skip over the gory implementation details.

So, to create a new OpenGL object the user must call Initialize_Id on a smart pointer which assigns the ID of the newly created object to the smart pointer's backing object. Clear can then later be used to make the smart pointer uninitialized again (but only delete the object if the reference count reaches zero).

To test our system, let's implement a Shader object. Shader objects will hold source code and compiled binaries of GLSL (GL Shading Language) shaders. We will call this package GL.Objects.Shaders in keeping with the rest of the project's structure:

package GL.Objects.Shaders is
   pragma Preelaborate;

   type Shader_Type is
     (Fragment_Shader,
      Vertex_Shader,
      Geometry_Shader,
      Tess_Evaluation_Shader,
      Tess_Control_Shader);

   type Shader (Kind : Shader_Type) is new GL_Object with private;

   procedure Set_Source (Subject : Shader; Source : String);

   procedure Compile (Subject : Shader);

   procedure Release_Shader_Compiler;

   function Compile_Status (Subject : Shader) return Boolean;

   function Info_Log (Subject : Shader) return String;

private
   type Shader (Kind : Shader_Type) is new GL_Object with null record;

   overriding
   procedure Internal_Create_Id (Object : Shader; Id : out UInt);

   overriding
   procedure Internal_Release_Id (Object : Shader; Id : UInt);

   for Shader_Type use
     (Fragment_Shader        => 16#8B30#,
      Vertex_Shader          => 16#8B31#,
      Geometry_Shader        => 16#8DD9#,
      Tess_Evaluation_Shader => 16#8E87#,
      Tess_Control_Shader    => 16#8E88#);

   for Shader_Type'Size use Low_Level.Enum'Size;
end GL.Objects.Shaders;

The two overriding procedures are implemented like this:

overriding
procedure Internal_Create_Id (Object : Shader; Id : out UInt) is
begin
   Id := API.Create_Shader (Object.Kind);
   Raise_Exception_On_OpenGL_Error;
end Internal_Create_Id;

overriding
procedure Internal_Release_Id (Object : Shader; Id : UInt) is
   pragma Unreferenced (Object);
begin
   API.Delete_Shader (Id);
   Raise_Exception_On_OpenGL_Error;
end Internal_Release_Id;

Of course, we need to add the subprogram Delete_Shader to our import specification so it will be available in the generated GL.API package. A nice thing is that, in Ada, pointer dereference is often done implicitly so we need not worry whether Create_Shader and Delete_Shader are loaded via function pointers or with the dynamic library loader – the code would look exactly the same in both cases!
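
To tie the pieces together, here is a minimal client-side sketch using the packages defined above. It assumes an OpenGL context has already been made current (for example through the GLFW binding), and the GLSL source string is only a placeholder.

--  Client-side sketch; assumes a current OpenGL context.
with GL.Objects.Shaders; use GL.Objects.Shaders;

procedure Compile_One_Shader is
   Vertex : Shader (Kind => Vertex_Shader);
begin
   GL.Init;                      --  fetch the function pointers (see above)

   Vertex.Initialize_Id;         --  creates the GL object, reference count 1
   Vertex.Set_Source ("void main() { gl_Position = vec4(0.0); }");
   Vertex.Compile;

   if not Vertex.Compile_Status then
      raise Program_Error with Vertex.Info_Log;
   end if;

   --  When Vertex goes out of scope, Finalize decrements the reference
   --  count and the shader is deleted from GPU memory once it reaches zero.
end Compile_One_Shader;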

Documentation

One problem we have not yet addressed is documentation. After all, since we are adding structure and complexity to the OpenGL API that does not exist in its specification, how is a user supposed to find the wrapper of a certain OpenGL function they want to use?

What we need to do, then, is generate a list where the name of each OpenGL function we wrap is listed and linked to its respective wrapper function in OpenGLAda's API. Of course, we do not want to generate that list manually. Instead, let’s use our import specification again and enrich it with additional information:

   function Get_Error return Errors.Error_Code with
     Implicit => "glGetError",  Wrapper => "GL.Errors.Error_Flag";
   procedure Flush with
     Implicit => "glFlush", Wrapper => "GL.Flush";

With the new "aspect-like" declarations in our template, we can enhance our generator with code that writes a Markdown file listing all imported OpenGL functions and linking them to their wrappers. In theory, we could even avoid adding the wrapper information explicitly by analyzing OpenGLAda's code to detect which subprogram wraps each OpenGL function. Tools like ASIS and Libadalang would help us with that, but that implementation would be far more work than adding our wrapper references explicitly.

The generated list can be seen on OpenGLAda's website showing all the functions that are actually supported. It is intended to be navigated via search (a.k.a. Ctrl+F).

Conclusion

By breaking down the complexities of a large C API like OpenGL, we have gone through quite a few improvements that can be made when creating an Ada binding. Some of them were not so obvious and probably not necessary for classifying a binding as thick - for example, auto-loading our function pointers at run time was simply an artifact of supporting OpenGL and not something covered within the scope of the OpenGL API itself.

We also discovered that when wrapping a C API in Ada, we must lift the interface to a higher level, since Ada is designed to be a higher-level language than C; in this vein, it was natural to add features that are not part of the original API to make it feel more at home in an Ada context.

It might be tempting to write a thin wrapper for your Ada project to avoid overhead, but beware - you will probably still end up writing a thick wrapper. After all, the glue code around calls to thinly wrapped functions and the need for data conversions do not simply vanish!

Of course, all this is a lot of work! To give you some numbers: The OpenGLAda repository contains 15,874 lines of Ada code (excluding blanks and comments, tests, and examples) while, for comparison, the C header gl.h (while missing many key features) is only around 3,000 lines.

]]>
For All Properties, There Exists a Proof http://blog.adacore.com/for-all-properties-there-exists-a-proof Mon, 19 Feb 2018 10:15:00 +0000 Yannick Moy http://blog.adacore.com/for-all-properties-there-exists-a-proof

With the recent addition of a Manual Proof capability in SPARK 18, it is worth looking at an example which cannot be proved by automatic provers, to see the options that are available for proving it with SPARK. The following code is such an example, where the postcondition of Do_Nothing cannot be proved with provers CVC4 or Z3, although it is exactly the same as its precondition:

   subtype Index is Integer range 1 .. 10;
   type T1 is array (Index) of Integer;
   type T2 is array (Index) of T1;

   procedure Do_Nothing (Tab : T2) with
     Ghost,
     Pre  => (for all X in Index => (for some Y in Index => Tab(X)(Y) = X + Y)),
     Post => (for all X in Index => (for some Y in Index => Tab(X)(Y) = X + Y));

   procedure Do_Nothing (Tab : T2) is null;

The issue is that SMT provers that we use in SPARK like CVC4 and Z3 do not recognize the similarity between the property assumed here (the precondition) and the property to prove (the postcondition). To such a prover, the formula to prove (the Verification Condition or VC) looks like the following in SMTLIB2 format:

(declare-sort integer 0)
(declare-fun to_rep (integer) Int)
(declare-const tab (Array Int (Array Int integer)))
(assert
  (forall ((x Int))
  (=> (and (<= 1 x) (<= x 10))
    (exists ((y Int))
      (and (and (<= 1 y) (<= y 10))
        (= (to_rep (select (select tab x) y)) (+ x y)))))))
(declare-const x Int)
(assert (<= 1 x))
(assert (<= x 10))
(assert
  (forall ((y Int))
    (=> (and (<= 1 y) (<= y 10))
       (not (= (to_rep (select (select tab x) y)) (+ x y))))))
(check-sat)

We see here some of the encoding from SPARK programming language to SMTLIB2 format: the standard integer type Integer is translated into an abstract type integer, with a suitable projection to_rep from this abstract type to the standard Int type of mathematical integers in SMTLIB2; the array types T1 and T2 are translated into SMTLIB2 Array types. The precondition, which is assumed here, is directly transformed into a universally quantified axiom (starting with "forall"), while the postcondition is negated and joined with the other hypotheses, as an SMT solver will try to deduce an inconsistency to prove the goal by contradiction. So the negated postcondition becomes:

   (for some X in Index => (for all Y in Index => not (Tab(X)(Y) = X + Y)));

The existentially quantified variable X becomes a constant x in the VC, with assertions stating its bounds 1 and 10, and the universal quantification becomes another axiom.

Now it is useful to understand how SMT solvers deal with universally quantified axioms. Obviously, they cannot "try out" every possible value of the parameters. Here, the quantified variable ranges over all mathematical integers! And in general, we may quantify over values of abstract types which cannot be enumerated. Instead, SMT solvers find suitable "candidates" for instantiating the axioms. The main technique to find such candidates is called trigger-based instantiation. The SMT solver identifies terms in the quantified axiom that contain the quantified variables, and matches them with the so-called "ground" terms in the VC (terms that do not contain quantified or "bound" variables). Here, such a term containing x in the first axiom is (to_rep (select (select tab x) y)), or simply (select tab x), while in the second axiom such a term containing y could be (to_rep (select (select tab x) y)) or (select (select tab x) y). The issue with the VC above is that these do not match any ground term, hence neither CVC4 nor Z3 can prove the VC.

Note that Alt-Ergo is able to prove the VC, using the exact same trigger-based mechanism, because it considers (select tab x) from the second axiom as a ground term in matching. Alt-Ergo uses this term to instantiate the first axiom, which in turn provides the term (select (select tab x) sko_y) [where sko_y is a fresh variable corresponding to the skolemisation of the existentially quantified variable y]. Alt-Ergo then uses this new term to instantiate the second axiom, resulting in a contradiction. So Alt-Ergo can deduce that the VC is unsatisfiable, hence proves the original (non-negated) postcondition.

I am going to consider in the following alternative means to prove such a property, when all SMT provers provided with SPARK fail.

solution 1 - use an alternative automatic prover

As the property to prove is an exact duplicate of a known property in the hypotheses, a different kind of prover, called a resolution-based prover, is a perfect fit. Here, I'm using E prover, but many others are supported by the Why3 platform used in SPARK, and would be just as effective. The first step is to install E prover from its website (www.eprover.org) or from its integration in your Linux distro. Then, you need to run the executable why3config to generate a suitable .why3.conf configuration file in your HOME directory, with the necessary information for Why3 to know how to generate VCs for E prover, and how to call it. Currently, GNATprove cannot be called with --prover=eprover, so instead I called the underlying Why3 tool directly, and it proves the desired postcondition:

$ why3 prove -L /path/to/theories -P Eprover quantarrays.mlw
quantarrays.mlw Quantarrays__subprogram_def WP_parameter def : Valid (0.02s)

solution 2 - prove interactively

With SPARK 18 comes the possibility to prove a VC interactively inside the editor GPS. Just right-click on the message about the unproved postcondition and select "Start Manual Proof". Various panels are opened in GPS:

Manual Proof inside GPS

Here, the manual proof is really simple. We start by applying axiom H, as the conclusion of this axiom matches the goal to prove, which makes it necessary to prove the conditions for applying axiom H. Then we use the known bounds on X in axioms H1 and H2 to prove the conditions. And we're done! The following snapshot shows that GPS now confirms that the VC has been proved:

Manual Proof inside GPS

Note that it is possible to call an automatic prover by its name, like "altergo", "cvc4", or "z3" to prove the VC automatically after the initial application of axiom H.

solution 3 - use an alternative interactive prover

It is also possible to use powerful external interactive provers like Coq or Isabelle. You first need to install these on your machine. GNATprove and GPS are directly integrated with Coq, so that you can right-click on the unproved postcondition, select "Prove Check", then manually enter the switch "--prover=coq" to select Coq prover. GPS will then open CoqIDE on the VC as follows:

Start Coq Proof inside CoqIDE (called from GPS)

The proof in Coq is as simple as before. Here is the exact set of tactics to apply to reproduce what we did with manual proof in GPS:

Coq Proof inside CoqIDE (called from GPS)

Note that the tactic "auto" in Coq proves this VC automatically.

What to Remember

There are many ways forward when the automatic provers provided with GNATprove fail to prove a property. We have already presented the use of ghost code on various occasions. Here we described three other ways: using an alternative automatic prover, proving interactively, and using an alternative interactive prover.

[cover image of Kurt Gödel, courtesy of Wikipedia, who demonstrated that, in fact, not all true properties can ever be proved]

Bitcoin blockchain in Ada: Lady Ada meets Satoshi Nakamoto http://blog.adacore.com/bitcoin-in-ada Thu, 15 Feb 2018 13:00:00 +0000 Johannes Kanig http://blog.adacore.com/bitcoin-in-ada

Bitcoin is getting a lot of press recently, but let's be honest, that's mostly because a single bitcoin worth 800 USD in January 2017 was worth almost 20,000 USD in December 2017. However, bitcoin and its underlying blockchain are beautiful technologies that are worth a closer look. Let’s take that look with our Ada hat on!

So what's the blockchain?

“Blockchain” is a general term for a database that’s maintained in a distributed way and is protected against manipulation of the entries; Bitcoin is the first application of the blockchain technology, using it to track transactions of “coins”, which are also called Bitcoins.

Conceptually, the Bitcoin blockchain is just a list of transactions. Bitcoin transactions in full generality are quite complex, but as a first approximation, one can think of a transaction as a triple (sender, recipient, amount), so that an initial mental model of the blockchain could look like this:

Sender             | Recipient          | Amount
<Bitcoin address>  | <Bitcoin address>  | 0.003 BTC
<Bitcoin address>  | <Bitcoin address>  | 0.032 BTC
...                | ...                | ...

Other data, such as how many Bitcoins you have, are derived from this simple transaction log and not explicitly stored in the blockchain.
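To make this naive mental model concrete in Ada, here is a minimal, invented sketch; it is not how Bitcoin actually represents transactions, just the (sender, recipient, amount) approximation used above:

   --  Invented illustration only: a simplistic transaction log.
   type Bitcoin_Address is new String (1 .. 34);
   --  simplified: real addresses actually vary in length

   type Satoshi_Amount is range 0 .. 21_000_000 * 100_000_000;
   --  1 BTC = 10**8 satoshis; 21 million BTC is the supply cap

   type Transaction is record
      Sender    : Bitcoin_Address;
      Recipient : Bitcoin_Address;
      Amount    : Satoshi_Amount;
   end record;

   type Transaction_Log is array (Positive range <>) of Transaction;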

Modifying or corrupting this transaction log would allow attackers to appear to have more Bitcoins than they really have, or to spend money, then erase the transaction and spend the same money again. This is why it's important to protect that database against manipulation.

The list of transactions is not a flat list.  Instead, transactions are grouped into blocks. The blockchain is a list of blocks, where each block has a link to the previous block, so that a block represents the full blockchain up to that point in time:

Thinking as a programmer, this could be implemented using a linked list where each block header contains a prev pointer. The blockchain is grown by adding new blocks to the end, with each new block pointing to the previously newest block, so it makes more sense to use a prev pointer instead of a next pointer. In a regular linked list, the prev pointer points directly to the memory used for the previous block.

But the peculiarity of the blockchain is that it's a distributed data structure: it's maintained by a network of computers or nodes. Every Bitcoin full node has a full copy of the blockchain, but what happens if members of the network don't agree on the contents of some transaction or block? A simple memory corruption or malicious act could result in a client having incorrect data. This is why the blockchain has various checks built in that guarantee that corruption or manipulation can be detected.

How does Bitcoin check data integrity?

Bitcoin’s internal checks are based on a cryptographic hash function. This is just a fancy name for a function that takes anything as input and spits out a large number as output, with the following properties:

  • The output of the function varies greatly and unpredictably even with tiny variations of the input;

  • It is extremely hard to deduce an input that produces some specific output number, other than by using brute force; that is, by computing the function again and again for a large number of inputs until one finds the input that produces the desired output.

The hash function used in Bitcoin is called SHA256.  It produces a 256-bit number as output, usually represented as 64 hexadecimal digits. Collisions (different input data that produces the same output hash value) are theoretically possible, but the output space is so big that collisions on actual data are considered extremely unlikely, in fact practically impossible.
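To make the first property concrete, here is a tiny illustration (not taken from the post's repository) using the GNAT.SHA256 package that the code below also relies on: hashing two inputs that differ by a single character yields two completely unrelated digests.

with Ada.Text_IO; use Ada.Text_IO;
with GNAT.SHA256; use GNAT.SHA256;

procedure Avalanche_Demo is
   A : constant Message_Digest := Digest ("hello world");
   B : constant Message_Digest := Digest ("hello world!");
begin
   Put_Line (A);  --  two completely different 64-digit hex strings
   Put_Line (B);
end Avalanche_Demo;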

The idea behind the first check of Bitcoin's data integrity is to replace a raw pointer to a memory region with a “safe pointer” that can, by construction, only point to data that hasn’t been tampered with. The trick is to use the hash value of the data in the block as the “pointer” to the data. So instead of a raw pointer, one stores the hash of the previous block as prev pointer:

Here, I’ve abbreviated the 256-bit hash values by their first two and last four hex digits – by design, Bitcoin block hashes always start with a certain number of leading zeroes. The first block contains a "null pointer" in the form of an all zero hash.

Given a hash value, it is infeasible to compute the data associated with it, so one can't really "follow" a hash like one can follow a pointer to get to the real data.  Therefore, some sort of table is needed to store the data associated with the hash value.
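As a minimal, invented sketch (not from the post's repository) of such a table, one could keep a hashed map from the 64-hex-digit block hash to the raw block data it designates:

with Ada.Containers.Indefinite_Hashed_Maps;
with Ada.Strings.Hash;

package Block_Store is
   package Block_Maps is new Ada.Containers.Indefinite_Hashed_Maps
     (Key_Type        => String,   --  hex representation of a block hash
      Element_Type    => String,   --  raw serialized block data
      Hash            => Ada.Strings.Hash,
      Equivalent_Keys => "=");

   Blocks : Block_Maps.Map;
   --  e.g. Blocks.Insert (Hash_Hex, Raw_Data) and later Blocks (Hash_Hex)
end Block_Store;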

Now what have we gained? The structure can no longer easily be modified. If someone modifies any block, its hash value changes, and all existing pointers to it are invalidated (because they contain the wrong hash value). If, for example, the following block is updated to contain the new prev pointer (i.e., hash), its own hash value changes as well. The end result is that the whole data structure needs to be completely rewritten even for small changes (following prev pointers in reverse order starting from the change). In fact such a rewrite never occurs in Bitcoin, so one ends up with an immutable chain of blocks. However, one needs to check (for example when receiving blocks from another node in the network) that the block pointed to really has the expected hash. 

Block data structure in Ada

To make the above explanations more concrete, let's look at some Ada code (you may also want to have bitcoin documentation available).

A bitcoin block is composed of the actual block contents (the list of transactions of the block) and a block header. The entire type definition of the block looks like this (you can find all code in this post plus some supporting code in this github repository):

   type Block_Header is record
      Version : Uint_32;
      Prev_Block : Uint_256;
      Merkle_Root : Uint_256;
      Timestamp : Uint_32;
      Bits : Uint_32;
      Nonce : Uint_32;
   end record;

   type Transaction_Array is array (Integer range <>) of Uint_256;

   type Block_Type (Num_Transactions : Integer) is record
      Header : Block_Header;
      Transactions : Transaction_Array (1 .. Num_Transactions);
   end record;

As discussed, a block is simply the list of transactions plus the block header which contains additional information. With respect to the fields for the block header, for this blog post you only need to understand two fields:

  • Prev_Block: a 256-bit hash value for the previous block (this is the prev pointer I mentioned before)

  • Merkle_Root: a 256-bit hash value which summarizes the contents of the block and guarantees that when the contents change, the block header changes as well. I will explain how it is computed later in this post.

The only piece of information that’s missing is that Bitcoin usually uses the SHA256 hash function twice to compute a hash. So instead of just computing SHA256(data), usually SHA256(SHA256(data)) is computed. One can write such a double hash function in Ada as follows, using the GNAT.SHA256 library and String as a type for a data buffer (we assume a little-endian architecture throughout the document, but you can use the GNAT compiler’s Scalar_Storage_Order feature to make this code portable):

with GNAT.SHA256; use GNAT.SHA256;
with Ada.Unchecked_Conversion;

   function Double_Hash (S : String) return Uint_256 is
      D : Binary_Message_Digest := Digest (S);
      T : String (1 .. 32);
      for T'Address use D'Address;
      D2 : constant Binary_Message_Digest := Digest (T);

      function To_Uint_256 is new Ada.Unchecked_Conversion
        (Source => Binary_Message_Digest,
         Target => Uint_256);
   begin
      return To_Uint_256 (D2);
   end Double_Hash;

The hash of a block is simply the hash of its block header. This can be expressed in Ada as follows (assuming that the size in bits of the block header, Block_Header’Size in Ada, is a multiple of 8):

   function Block_Hash (B : Block_Type) return Uint_256 is
      S : String (1 .. Block_Header'Size / 8);
      for S'Address use B.Header'Address;
   begin
      return Double_Hash (S);
   end Block_Hash;

Now we have everything we need to check the integrity of the outermost layer of the blockchain. We simply iterate over all blocks and check that the previous block indeed has the hash used to point to it:

declare
   Cur : String :=
     "00000000000000000044e859a307b60d66ae586528fcc6d4df8a7c3eff132456";
   S : String (1 .. 64);
begin
   loop
      declare
         B : constant Block_Type := Get_Block (Cur);
      begin
         S := Uint_256_Hex (Block_Hash (B));
         Put_Line ("checking block hash = " & S);
         if not (Same_Hash (S, Cur)) then
            Ada.Text_IO.Put_Line ("found block hash mismatch");
         end if;
         Cur := Uint_256_Hex (B.Prev_Block);
      end;
   end loop;
end;

A few explanations: the Cur string contains the hash of the current block as a hexadecimal string. At each iteration, we fetch the block with this hash (details in the next paragraph), compute the actual hash of the block using the Block_Hash function, and report a mismatch if it differs from Cur. We then set Cur to the contents of the Prev_Block field and continue with the previous block. Uint_256_Hex is the function that converts a hash value in memory to its hexadecimal representation for display.

One last step is to get the actual blockchain data. The size of the blockchain is now 150GB and counting, so this is actually not so straightforward! For this blog post, I added 12 blocks in JSON format to the github repository, making it self-contained. The Get_Block function reads a file with the same name as the block hash to obtain the data, starting at a hardcoded block with the hash mentioned in the code. If you want to verify the whole blockchain using the above code, you have to either query the data using some website such as blockchain.info, or download the blockchain on your computer, for example using the Bitcoin Core client, and update Get_Block accordingly.

How to compute the Merkle Root Hash

So far, we were able to verify the proper chaining of the blockchain, but what about the contents of the block?  The objective is now to come up with the Merkle root hash mentioned earlier, which is supposed to "summarize" the block contents: that is, it should change for any slight change of the input.

First, each transaction is again identified by its hash, similar to how blocks are identified. So now we need to compute a single hash value from the list of hashes for the transactions of the block. Bitcoin uses a hash function which combines two hashes into a single hash:

   function SHA256Pair (U1, U2 : Uint_256) return Uint_256 is
      type A is array (1 .. 2) of Uint_256;
      X : A := (U1, U2);
      S : String (1 .. X'Size / 8);
      for S'Address use X'Address;
   begin
      return Double_Hash (S);
   end SHA256Pair;

Basically, the two numbers are put side-by-side in memory and the result is hashed using the double hash function.

Now we could just iterate over the list of transaction hashes, using this combining function to come up with a single value. But it turns out Bitcoin does it a bit differently; hashes are combined using a scheme that's called a Merkle tree:

One can imagine the transactions (T1 to T6 in the example) being stored at the leaves of a binary tree, where each inner node carries a hash which is the combination of the two child hashes. For example, H7 is computed from H1 and H2. The root node carries the "Merkle root hash", which in this way summarizes all transactions. However, this image of a tree is just that - an image to show the order of hash computations that need to be done to compute the Merkle root hash. There is no actual tree stored in memory.

There is one peculiarity in the way Bitcoin computes the Merkle hash: when a row has an odd number of elements, the last element is combined with itself to compute the parent hash. You can see this in the picture, where H9 is used twice to compute H11.

The Ada code for this is quite straightforward:

   function Merkle_Computation (Tx : Transaction_Array) return Uint_256 is
      Max : Integer :=
          (if Tx'Length rem 2 = 0 then Tx'Length else Tx'Length + 1);
      Copy : Transaction_Array (1 .. Max);
   begin
      if Tx'Length = 1 then
         return Tx (Tx'First);
      end if;
      if Tx'Length = 0 then
         raise Program_Error;
      end if;
      Copy (1 .. Tx'Length) := Tx;
      if (Max /= Tx'Length) then
         Copy (Max) := Tx (Tx'Last);
      end if;
      loop
         for I in 1 .. Max / 2 loop
            Copy (I) := SHA256Pair (Copy (2 * I - 1), Copy (2 * I));
         end loop;
         if Max = 2 then
            return Copy (1);
         end if;
         Max := Max / 2;
         if Max rem 2 /= 0 then
            Copy (Max + 1) := Copy (Max);
            Max := Max + 1;
         end if;
      end loop;
   end Merkle_Computation;

Note that despite the name, the input array only contains transaction hashes and not actual transactions. A copy of the input array is created at the beginning; after each iteration of the loop in the code, it contains one level of the Merkle tree. Both before and inside the loop, if statements check for the edge case of combining an odd number of hashes at a given level.
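As a small usage sketch (not code from the repository), checking a block's Merkle root against its header then boils down to comparing the recomputed value with the Merkle_Root field, reusing the declarations above:

   --  Hedged sketch: assumes at least one transaction in the block,
   --  since Merkle_Computation raises Program_Error on an empty array.
   function Merkle_OK (B : Block_Type) return Boolean is
     (Merkle_Computation (B.Transactions) = B.Header.Merkle_Root);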

We can now update our checking code to also check for the correctness of the Merkle root hash for each checked block. You can check out the whole code from this repository; the branch “blogpost_1” will stay there to point to the code as shown here.

Why does Bitcoin compute the hash of the transactions in this way? Because it allows for a more efficient way to prove to someone that a certain transaction is in the blockchain.

Suppose you want to show someone that you sent them the required amount of Bitcoin to buy some product. They could, of course, download the entire block you indicate and check for themselves, but that's inefficient. Instead, you could present them with the chain of hashes that leads to the root hash of the block.

If the transaction hashes were combined linearly, you would still have to show them the entire list of transactions that come after yours in the block. But with the Merkle hash, you can present them with a "Merkle proof": that is, just the hashes required to compute the path from your transaction to the Merkle root. In our example, if your transaction is T3, it's enough to also provide H4, H7 and H11: the other person can compute the Merkle root hash from these and compare it with the "official" Merkle root hash of that block.
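In code, such a check could look like the following hedged sketch (names invented, reusing the SHA256Pair function defined earlier): the verifier recomputes H8, H10 and the root from your transaction hash H3 and the three supplied hashes, then compares the result with the block's official Merkle root.

   --  Invented illustration of verifying the Merkle proof for T3.
   function Check_Proof_For_T3
     (H3, H4, H7, H11, Official_Root : Uint_256) return Boolean
   is
      H8  : constant Uint_256 := SHA256Pair (H3, H4);
      H10 : constant Uint_256 := SHA256Pair (H7, H8);
      H12 : constant Uint_256 := SHA256Pair (H10, H11);
   begin
      return H12 = Official_Root;
   end Check_Proof_For_T3;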

When I first saw this explanation, I was puzzled why an attacker couldn’t modify transaction T3 to T3b and then “invent” the hashes H4b, H7b and H11b so that the Merkle root hash H12 is unchanged. But the cryptographic nature of the hash function prevents this: today, there is no known attack against the hash function SHA256 used in Bitcoin that would allow inventing such input values (but for the weaker hash function SHA1 such collisions have been found).

Wrap-Up

In this blog post I have shown Ada code that can be used to verify the data integrity of blocks from the Bitcoin blockchain. I was able to check the block and Merkle root hashes for all the blocks in the blockchain in a few hours on my computer, though most of the time was spent in Input/Output to read the data in.

There are many more rules that make a block valid, most of them related to transactions. I hope to cover some of them in later blog posts.

The Road to a Thick OpenGL Binding for Ada: Part 1 http://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada Mon, 05 Feb 2018 15:59:00 +0000 Felix Krause http://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada

This blog post is part one of a tutorial based on the OpenGLAda project and will cover some of the background of the OpenGL API and the basic steps involved in importing platform-dependent C functions.

Motivation

Ada was designed from its onset in the late 70's to be highly compatible with other languages - for example, there are currently native facilities for directly using libraries from C, FORTRAN, COBOL, C++, and even Java. However, there is still a process (although automatable to a certain extent) that must be followed to safely and effectively import an API or create what we will refer to here as a binding.

Additionally, foreign APIs may not be the most efficient or user-friendly for direct use in Ada, and so it is often considered useful to go above and beyond making a simple or thin binding and instead craft a small custom library (or thick binding) above the original API to solidify and greatly simplify its use within the Ada language.

In this blog post I will describe the design decisions and architecture of OpenGLAda - a custom thick binding to the OpenGL API for Ada, and, in the process, I hope to provide ideas and techniques that may inspire others to contribute their own bindings for similar libraries.

Below are some examples based on the classic OpenGL Superbible of what is possible using the OpenGLAda binding and whose complete source can be found on my Github repo for OpenGLAda here along with instructions for setting up an Ada environment:

Screenshots of example projects from OpenGLAda

Background

OpenGL, created in 1991 by Silicon Graphics, has had a long history as an industry standard for rendering 3D vector graphics - growing through numerous revisions (currently at version 4.6) both adding new features and deprecating or removing others. As a result, the once simple API has become more complex and difficult to wield at times. Despite this and even with the competition of Microsoft’s DirectX and the creation of new APIs (like Vulkan), OpenGL still remains a big player in the Linux, Mac, and free-software world.

Unlike a typical C library, OpenGL has hundreds (maybe even thousands) of implementations, usually provided by graphics hardware vendors. While the OpenGL API itself is considered platform-independent, making use of it does depend heavily on the target platform's graphics and windowing systems. This is due to the fact that rendering requires a so-called OpenGL context consisting of a drawing area on the screen and all associated data needed for rendering. For this reason, there exist multiple glue APIs that enable using OpenGL in conjunction with several windowing systems.

Design Challenges

A concept that pervades the design of OpenGL is graceful degradation - meaning that if some feature or function is unavailable on a target platform, the client software may supply a workaround or simply skip the part of the rendering process in which the feature is required. This makes it necessary to query for existing features during run-time. Additionally, the code for querying OpenGL features is not part of the OpenGL API itself and must be provided by us and defined separately for each platform we plan to support.

These properties pose the following challenges for our Ada binding:

  1. It must include some platform-dependent code, ideally hiding this from the user to enable platform-independent usage.
  2. It must access OpenGL features without directly linking to them so that missing features can be handled inside the application.

First Steps

Note: I started working on OpenGLAda in 2012 so it only uses the features of the Ada 2005 language level. Some code shown here could be written in a more succinct way with the added constructs in Ada 2012 (most notably aspects and expression functions).

To get started on our binding we need to translate subprogram definitions from the standard OpenGL C header into Ada. Since we are writing a thick binding and are going above and beyond directly using the original C function, these API imports should be invisible to the user. Thus, we will define a set of private packages such as GL.API to house all of these imports. A private package can only be used by the immediate parent package and its children, making it invisible for a user of the library. The public package GL and its public child packages will provide the public interface.

To translate a C subprogram declaration to Ada, we need to map all C types it uses into equivalent Ada types then essentially change the syntax from C to Ada. For the first import, we choose the following subprogram:

void glFlush();

This is a command used to tell OpenGL to execute commands currently stored in internal buffers. It is a very common command and thus is placed directly in the top-level package of the public interface. Since the command has no parameters and returns no values, there are no types involved so we don’t need to care about them for now. Our Ada code looks like this:

package GL is
   procedure Flush;
end GL;

private package GL.API is
   procedure Flush;
   pragma Import
     (Convention    => C,
      Entity        => Flush,
      External_Name => "glFlush");
end GL.API;

package body GL is
   procedure Flush is
   begin
      API.Flush;
   end Flush;
end GL;

Instead of providing an implementation of GL.API.Flush in a package body, we use the pragma Import to tell the Ada compiler that we are importing this subprogram from another library. The first parameter is the calling convention, which defines low-level details about how a subprogram call is to be translated into machine code. It is vital that the caller and the callee agree on the same calling convention; a mistake at this point is hard to detect and, in the worst case, may lead to memory corruption during run-time.

Note that when defining the implementation of the public subprogram GL.Flush, we cannot use a renames clause like we typically would, because our imported backend subprogram is within a private package.

Now, the interesting part: how do we link to the appropriate OpenGL implementation according to the system we are targeting? Not only are there multiple implementations, but their link names also differ.

The solution is to use the GPRBuild tool and define a scenario variable to select the correct linker flags:

library project OpenGL is
   type Windowing_System_Type is
      ("windows", --  Microsoft Windows
       "x11",     --  X Window System (primarily used on Linux)
       "quartz"); --  Quartz Compositor (the macOS window manager)

   Windowing_System : Windowing_System_Type :=
     external ("Windowing_System");

   for Languages use ("ada");
   for Library_Name use "OpenGLAda";
   for Source_Dirs use ("src");

   package Compiler is
      for Default_Switches ("ada") use ("-gnat05");
   end Compiler;

   package Linker is
      case Windowing_System is
         when "windows" =>
            for Linker_Options use ("-lOpenGL32");
         when "x11" =>
            for Linker_Options use ("-lGL");
         when "quartz" =>
            for Linker_Options use ("-Wl,-framework,OpenGL");
      end case;
   end Linker;
end OpenGL;

We will need other distinctions based on the windowing system later, and thus we name the scenario variable Windowing_System accordingly, although, at this point, it would also be sensible to distinguish just the operating system instead. We use Linker_Options instead of Default_Switches in the linker to tell GPRBuild what options we need when linking the final executable.
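As a usage note (the project file name is an assumption here: whatever you save the project above as, say opengl.gpr), the scenario is then selected on the gprbuild command line, for example on Linux:

gprbuild -P opengl.gpr -XWindowing_System=x11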

As you can see, the library we link against is called OpenGL32 on Windows and GL on Linux. On macOS, there is the concept of frameworks, which are somewhat more sophisticated software libraries. On the gcc command line, they can be given with "-framework <name>", which gcc hands over to the linker. However, this does not work easily with GPRBuild unless we use the "-Wl,option" flag, whose operation is defined as:

Pass "option" as an option to the linker. If option contains commas, it is split into multiple options at the commas. You can use this syntax to pass an argument to the option.

At this point, we have almost successfully wrapped our first OpenGL subprogram. However, there is a nasty little detail we overlooked: Windows APIs use a calling convention different from the standard C one. One usually only needs to care about this when linking against the Win32 API; however, OpenGL is considered part of the Windows API, as we can see in the OpenGL C header:

GLAPI void APIENTRY glFlush (void);

... and by digging through the Windows version of this header we then find somewhere, wrapped in some #ifdef's, this line:

#define APIENTRY __stdcall

This means our target C function has the calling convention stdcall, which is only used on Windows. Thankfully, GNAT supports this calling convention and, moreover, defines it as a synonym for the C calling convention on every system other than Windows. Thus, we can rewrite our import:

procedure Flush;
pragma Import
  (Convention    => Stdcall,
   Entity        => Flush,
   External_Name => "glFlush");

With the above code, our first wrapper subprogram is ready.


Stay tuned for part two where we will cover a basic type system for interfacing with C, error handling, memory management, and more!

Part two of this article can be found here!

AdaCore at FOSDEM 2018 http://blog.adacore.com/adacore-at-fodsem-2018 Thu, 18 Jan 2018 14:55:02 +0000 Pierre-Marie de Rodat http://blog.adacore.com/adacore-at-fodsem-2018

Every year, free and open source enthusiasts gather in Brussels (Belgium) for two days of FLOSS-related conferences. FOSDEM organizers set up several "developer rooms", which are venues that host talks on specific topics. This year, the event will take place on the 3rd and 4th of February (Saturday and Sunday), and there is a room dedicated to the Ada programming language.

Just like last year and the year before, several AdaCore engineers will be there. We have five talks scheduled:

In the Ada devroom:

In the Embedded, mobile and automotive devroom:

In the Source Code Analysis devroom:

Note also, in the Embedded, mobile and automotive devroom, the talk from Alexander Senier about the work they are doing at Componolit, which uses SPARK and Genode to bring trust to the Android platform.

If you happen to be in the area, please come and say hi!

Leveraging Ada Run-Time Checks with Fuzz Testing in AFL http://blog.adacore.com/running-american-fuzzy-lop-on-your-ada-code Tue, 19 Dec 2017 09:36:08 +0000 Lionel Matias http://blog.adacore.com/running-american-fuzzy-lop-on-your-ada-code

Fuzzing is a very popular bug finding method. The concept, very simply, is to continuously inject random (garbage) data as input of a software component, and wait for it to crash. Google's Project Zero team made it one of their major vulnerability-finding tools (at Google scale). It is very efficient at robust-testing file format parsers, antivirus software, internet browsers, javascript interpreters, font face libraries, system calls, file systems, databases, web servers, DNS servers... When Heartbleed came out, people found out that it was, indeed, easy to find. Google even launched a free service to fuzz at scale widely used open-source libraries, and Microsoft created Project Springfield, a commercial service to fuzz your application (at scale also, in the cloud).

Writing robustness tests can be tedious, and we - as developers - are usually bad at it. But when your application is on an open network, or just user-facing, or you have thousands of appliances in the wild that might face problems (disk, network, cosmic rays :-)) you won't see in your test lab, you might want to double-, triple-check that your parsers, deserializers and decoders are as robust as possible...

In my experience, fuzzing causes so many unexpected crashes in so many software parts, that it's not unusual to spend more time doing crash triage than preparing a fuzzing session. It is a great verification tool to complement structured testing, static analysis and code review.

Ada is pretty interesting for fuzzing, since all the runtime checks (the ones your compiler couldn't enforce statically) and all the defensive code you've added (through pre-/post-conditions, asserts, ...) can be leveraged as fuzzing targets.
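As a tiny, invented illustration of what such a fuzzing target looks like: with run-time checks and assertions enabled (for example -gnato and -gnata), both the subtype range check and the pragma Assert below turn bad input into an exception that the recipe in this post then converts into a crash the fuzzer can detect.

   --  Invented example, not from any of the libraries fuzzed below.
   procedure Decode_Length (Raw : Integer) is
      subtype Sane_Length is Integer range 0 .. 4096;
      Length : Sane_Length;
   begin
      Length := Raw;  --  range check fails on out-of-range input
      pragma Assert (Length mod 4 = 0, "length must be a multiple of 4");
   end Decode_Length;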

Here's a recipe to use American Fuzzy Lop on your Ada code.

American Fuzzy Lop

AFL is a fuzzer from Michał Zalewski (lcamtuf), from the Google security team. It has an impressive trophy case of bugs and vulnerabilities found in dozens of open-source libraries or tools.

You can even see for yourself how efficient guided fuzzing is in the now classic 'pulling jpegs out of thin air' demonstration on the lcamtuf blog.

I invite you to read the technical description of the tool to get a precise idea of the innards of AFL.

Installation instructions are covered in the Quick Start Guide, and they can be summed up as:

  1. Get the latest source
  2. Build it (make)
  3. Make your program ready for fuzzing (here you have to work a bit)
  4. Start fuzzing...

There are two main parts of the tool:

  • afl-clang/afl-gcc to instrument your binary
  • and afl-fuzz that runs your binary and uses the instrumentation to guide the fuzzing session.

Instrumentation

afl-clang / afl-gcc compiles your code and adds a simple instrumentation around branch instructions. The instrumentation is similar to gcov or profiling instrumentation but it targets basic blocks. In the clang world, afl-clang-fast uses a plug-in to add the instrumentation cleanly (the compiler knows about all basic blocks, and it's very easy to add some code at the start of a basic block in clang). In the gcc world the tool only provides a hacky solution.

The way it works is that instead of calling your GCC of predilection, you call afl-gcc. afl-gcc will then call your GCC to output the assembly code generated from your code. To simplify, afl-gcc patches every jump instruction and every label (jump destination) to append an instrumentation block. It then calls your assembler to finish the compilation job.

Since it is a pass on assembly code generated from GCC it can be used to fuzz Ada code compiled with GNAT (since GNAT is based on GCC). In the gprbuild world this means calling gprbuild with the --compiler-subst=lang,tool option (see gprbuild manual).

Note : afl-gcc will override compilation options to force -O3 -funroll-loops. The reason behind this is that the authors of AFL noticed that those optimization options helped with the coverage instrumentation (unrolling loops will add new 'jump' instructions).

With some codebases, a problem can appear with the 'rep ret' instruction. For obscure reasons gcc sometimes inserts a 'rep ret' instruction instead of a 'ret' (return) instruction. There is some info on the gcc mailing list archives and, in more detail if you dare, on a dedicated website called repzret.org.

When AFL inserts its instrumentation code, the 'rep ret' instruction is not correct anymore ('as' complains). Since 'rep ret' is exactly the same instruction (except a bit slower on some AMD arch) as ‘ret’, you can add a step in afl-as (the assembly patching module) to patch the (already patched) assembly code: add the following code at line 269 in afl-as.c (on 2.51b or 2.52b versions):

    if (!strncmp(line, "\trep ret", 8)) {
       SAYF("[LMA patch] afl-as : replace 'rep ret' with (only) 'ret'\n");
       fputs("\tret\n", outf);
       continue;
    }

... and then recompile AFL. It then works fine, and prints a specific message whenever it encounters the problematic case. I didn't need this workaround for the example programs I chose for this post (you probably won't need it), but it can happen, so here you go...

Though a bit hacky, going through the assembly and sed-patching it seems to be the only way to do this with gcc, for now. It's obviously not available as such on any other architecture (power, arm), as afl-as inserts an x86-specific payload. Someone wrote a gcc plug-in once, and it would need some love to be ported to the gcc-6 (recent GNAT) or gcc-8 series (future GNAT). The plug-in approach would also allow in-process fuzzing, speed up the fuzzing process, and ease the fuzzing of programs with a large initialization/set-up time.

When you don't have the source code or changing your build chain would be too hard, the afl-fuzz manual mentions a Qemu-based option. I haven't tried it though.

The test-case generator

It takes a bunch of valid inputs to your application, implements a wide variety of random mutations, runs your application on the mutated inputs, and then uses the inserted instrumentation to guide itself towards new code paths, while avoiding spending too much time on paths that already crash.

AFL looks for crashes. It is expecting a call to abort() (SIGABRT). Its job is to try and crash your software, its search target is "a new unique crash".

It's not very common to get a core dump (SIGSEGV/SIGABRT) in Ada with GNAT, even following an uncaught top-level exception. You'll have to help the fuzzer and provoke core dumps on the errors you want to catch: a top-level exception by itself won't do it. In the GNAT world you can dump core using the Core_Dump procedure in the GNAT-specific package GNAT.Exception_Actions. What I usually do is let all exceptions bubble up to a top-level exception handler, filter by name, and only crash/abort on the exceptions I'm interested in. And if the bug you're trying to find with fuzzing doesn't crash your application, make it a crashing bug.

With all that said, let’s find some open-source libraries to fuzz.

Fuzzing Zip-Ada

Zip-Ada is a nice pure-Ada library to work with zip archives. It can open, extract, compress and decompress most of the possible kinds of zip files; it even recently implemented LZMA compression. It's 100% Ada, portable, quite readable and simple to use (drop the source in, use the gpr file, look up the examples and you're set). And it's quite efficient (according to my own informal benchmarks). Anyway, it's a cool project to contribute to, but I'm no compression wizard. Instead, let's try and fuzz it.

Since it's a library that can be given arbitrary files, maybe of dubious source, it needs to be robust.

I got the source from sourceforge of version 52 (or if you prefer, on github), uncompressed it and found the gprbuild file. Conveniently Zip-Ada comes with a debug mode that enables all possible runtime checks from GNAT, including -gnatVa, -gnato. The zipada.gpr file also references a 'pragma file' (through -gnatec=debug.pra) that contains a 'pragma Initialize_Scalars;' directive so everything is OK on the build side.

Then we need a very simple test program that takes a file name as a command-line argument, then drives the library from there. File parsers are the juiciest targets, so let's read and parse a file: we'll open and extract a zip file. For a first program what we're looking for is procedure Extract in the Unzip package:

-- Extract all files from an archive (from)
procedure Extract (From                 : String;
                   Options              : Option_set := No_Option;
                   Password             : String := "";
                   File_System_Routines : FS_Routines_Type := Null_Routines)

Just give it a file name and it will (try to) parse it as an archive and extract all the files from the archive.

We also need to give AFL what it needs (abort() / core dump) so let's add a top-level exception block that will do that, unconditionally (at first) on any exception.

The example program looks like:

with UnZip; use UnZip;
with Ada.Command_Line;
with GNAT.Exception_Actions;
with Ada.Exceptions;
with Ada.Text_IO; use Ada.Text_IO;

procedure Test_Extract is
begin
  Extract (From                 => Ada.Command_Line.Argument (1),
           Options              => (Test_Only => True, others => False),
           Password             => "",
           File_System_Routines => Null_routines);
exception
  when Occurence : others  =>
     Put_Line ("exception occured [" & Ada.Exceptions.Exception_Name (Occurence)
               & "] [" & Ada.Exceptions.Exception_Message (Occurence)
               & "] [" & Ada.Exceptions.Exception_Information (Occurence) & "]");
     GNAT.Exception_Actions.Core_Dump (Occurence);
end Test_Extract;

And to have it compile, we add it to the list of main programs in the zipada.gpr file.

Then let's build:

gprbuild --compiler-subst=Ada,/home/lionel/afl/afl-2.51b/afl-gcc -p -P zipada.gpr -Xmode=debug

We get a classic gprbuild display, with some additional lines:

...
afl-gcc -c -gnat05 -O2 -gnatp -gnatn -funroll-loops -fpeel-loops -funswitch-loops -ftracer -fweb -frename-registers -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fipa-cp-clone -ffunction-sections -gnatec../za_elim.pra zipada.adb
afl-cc 2.51b by <lcamtuf@google.com>
afl-as 2.51b by <lcamtuf@google.com>
[+] Instrumented 434 locations (64-bit, non-hardened mode, ratio 100%).
afl-gcc -c -gnat05 -O2 -gnatp -gnatn -funroll-loops -fpeel-loops -funswitch-loops -ftracer -fweb -frename-registers -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fipa-cp-clone -ffunction-sections -gnatec../za_elim.pra comp_zip.adb
afl-cc 2.51b by <lcamtuf@google.com>
afl-as 2.51b by <lcamtuf@google.com>
[+] Instrumented 45 locations (64-bit, non-hardened mode, ratio 100%).
...

The 2 additional afl-gcc and afl-as steps show up along with a counter of instrumented locations in the assembly code for each unit. So, some instrumentation was inserted.

Fuzzers are bad with checksums (http://moyix.blogspot.fr/2016/07/fuzzing-with-afl-is-an-art.html is an interesting dive into what can block afl-fuzz and what can be done about it, and John Regehr had a blog post on what AFL is bad at). For example, there's no way for a fuzzing tool to get past a checksum test: it would need to generate only test cases that have a matching checksum. So, to make sure we get somewhere, I removed all checksum tests: there was one for the zip CRC, and another one for zip passwords, for similar reasons. After I commented out those tests, I recompiled the test program.

Then we'll need to set up a fuzzing environment:

mkdir fuzzing-session
mkdir fuzzing-session/input
mkdir fuzzing-session/output

We also need to bootstrap the fuzzer with an initial corpus that doesn't crash. If there's a test suite, put the correct files in input/.

Then afl-fuzz can (finally) be launched:

AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON \
  /home/lionel/afl/afl-2.51b/afl-fuzz -m 1024 -i input -o output ../test_extract @@

 -i dir        - input directory with test cases
 -o dir        - output directory for fuzzer findings
 -m megs       - memory limit for child process (50 MB)
  @@ to tell afl to put the input file as a command line argument. By default afl will write to the program's stdin.

The AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON prelude silences a warning from afl-fuzz about how your system handles core dumps (see the man page for core). For afl-fuzz it's a problem because whatever is done to handle core dumps on your system might take some time, and afl-fuzz will then think the program timed out (although it crashed). For you it can also be a problem: some Linux distros instruct the system to do something with core dumps automatically (send a UI notification, fill your /var/log/messages, send a crash report e-mail to your sysadmin, ...), and you might not care for that. Maybe check first with your sysadmin... If you're root on your machine, follow afl-fuzz's advice and change your /proc/sys/kernel/core_pattern to something sensible.

Let’s go:


In less than 2 minutes, afl-fuzz finds several crashes. While it says they’re “unique”, they in fact trigger the same 2 or 3 exceptions. After 3 hours, it “converges” to a list of crashes, and letting it run for 3 days doesn’t bring another one.

It got a string of CONSTRAINT_ERRORs:

  • CONSTRAINT_ERROR : unzip.adb:269 range check failed

  • CONSTRAINT_ERROR : zip.adb:535 range check failed

  • CONSTRAINT_ERROR : zip.adb:561 range check failed

  • CONSTRAINT_ERROR : zip-headers.adb:240 range check failed

  • CONSTRAINT_ERROR : unzip-decompress.adb:650 range check failed

  • CONSTRAINT_ERROR : unzip-decompress.adb:712 index check failed

  • CONSTRAINT_ERROR : unzip-decompress.adb:1384 access check failed

  • CONSTRAINT_ERROR : unzip-decompress.adb:1431 access check failed

  • CONSTRAINT_ERROR : unzip-decompress.adb:1648 access check failed

I sent those and the reproducers to Gautier de Montmollin (Zip-Ada's maintainer). He corrected those quickly (revisions 587 up to 599). Most of those errors now are raised as Zip-Ada-specific exceptions. He also decided to rationalize the list of raised exceptions that could (for legitimate reasons) be raised from the Zip-Ada decoding code.

It also got some ADA.IO_EXCEPTIONS.END_ERROR:

  • ADA.IO_EXCEPTIONS.END_ERROR : zip.adb:894

  • ADA.IO_EXCEPTIONS.END_ERROR : s-ststop.adb:284 instantiated at s-ststop.adb:402

I redid another fuzzing session after all the corrections and improvements confirming the list of exceptions.

This wasn't a lot of work (for me) - it mostly used spare cycles on my machine - and I got a nice thanks for contributing :-).

Fuzzing AdaYaml

AdaYaml is a library to parse YAML files in Ada.

Let's start by cloning the github repository (the one before all the corrections). For those not familiar with git (here's a tutorial):

git clone https://github.com/yaml/AdaYaml.git
git checkout 5616697b12696fd3dcb1fc01a453a592a125d6dd

Then the source code of the version I tested should be in the AdaYaml folder.

If you don't want anything to do with git, there's a feature on github to download a Zip archive of a version of a repository.


AdaYaml requires a bit more work to fuzz: we need to create a simple example program, add some compilation options to the GPR files (-gnatVa, -gnato), and add a configuration pragmas file to set pragma Initialize_Scalars. This last option, combined with -gnatVa, helps surface accesses to uninitialized variables (if you don't know the option: https://gcc.gnu.org/onlinedocs/gcc-4.6.3/gnat_rm/Pragma-Initialize_005fScalars.html and http://www.adacore.com/uploads/technical-papers/rtchecks.pdf). All those options are there to make sure we catch the most problems possible with runtime checks.
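As a reminder of what that configuration pragmas file boils down to, here is a minimal sketch; the file name is arbitrary (for Zip-Ada earlier it was debug.pra) and it is passed to the compiler through -gnatec=<file>:

--  contents of the configuration pragmas file
pragma Initialize_Scalars;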

The example program looks like:

with Utils;
with Ada.Text_IO;
with Ada.Command_Line;
with GNAT.Exception_Actions;
with Ada.Exceptions;
with Yaml.Dom;
with Yaml.Dom.Vectors;
with Yaml.Dom.Loading;
with Yaml.Dom.Dumping;
with Yaml.Events.Queue;
procedure Yaml_Test
is
  S : constant String := Utils.File_Content (Ada.Command_Line.Argument (1));
begin
  Ada.Text_IO.Put_Line (S);
  declare
     V : constant Yaml.Dom.Vectors.Vector := Yaml.Dom.Loading.From_String (S);
     E : constant Yaml.Events.Queue.Reference :=
       Yaml.Dom.Dumping.To_Event_Queue (V);
     pragma Unreferenced (E);
  begin
     null;
  end;
exception
  when Occurence : others =>
     Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Information (Occurence));
     GNAT.Exception_Actions.Core_Dump (Occurence);
end Yaml_Test;

The program just reads a file and parses it, transforms it into a vector of DOM objects, then transforms those back to a list of events (see API docs).

The YAML reference spec may help explain a bit what's going on here.

Using the following diagram, and for those well-versed in YAML:


  • the V variable (of our test program) is a "Representation" generated via the Parse -> Compose path
  • the E variable is an "Event Tree" generated from V via "Serialize" (so, going back down to a lower-level representation from the DOM tree).

For this specific fuzzing test, the idea is not to stop at the first stage of parsing but also to go a bit through the data that was decoded, and do something with it (here we stop short of a round-trip to text, we just go back to an Event Tree).

Sometimes a parser faced with incoherent input will keep on going (fail silently) and won't fill (initialize) some fields.

The GPR files to patch are yaml.gpr and the parser_tools.gpr subproject.

The first fuzzing session triggers “expected” exceptions from the parser:

  • YAML.PARSER_ERROR

  • YAML.COMPOSER_ERROR

  • LEXER.LEXER_ERROR

  • YAML.STREAM_ERROR (as it turns out, this one is also unexpected... more on this one later)

Which should happen with malformed input.

So to get unexpected crashes and only those, let’s filter them in the top-level exception handler.

exception
  when Occurence : others =>
     declare
        N : constant String := Ada.Exceptions.Exception_Name (Occurence);
     begin
        Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Information (Occurence));
        if N = "YAML.PARSER_ERROR"
          or else N = "LEXER.LEXER_ERROR"
          or else N = "YAML.STREAM_ERROR"
          or else N = "YAML.COMPOSER_ERROR"
        then
          null;
        else
          GNAT.Exception_Actions.Core_Dump (Occurence);
        end if;
     end;
end Yaml_Test;

Then, I recompiled, used some YAML example files as a startup corpus, and started fuzzing.


After 4 minutes 30 seconds, the first crashes appeared.

I let it run for hours, then a day and found a list of issues. I sent all of those and the reproducers to Felix Krause (maintainer of the AdaYaml project).

He was quick to answer and analyse all the exceptions. Here are his comments:

  • ADA.STRINGS.UTF_ENCODING.ENCODING_ERROR : bad input at Item (1)

I guess this happens when you use a unicode escape sequence that codifies a code point beyond the unicode range (0 .. 0x10ffff). Definitely an error and should raise a Lexer_Error instead.

… and he created issue https://github.com/yaml/AdaYaml/issues/4

  • CONSTRAINT_ERROR : text.adb:203 invalid data

This hints to a serious error in my custom string allocator that can lead to memory corruption. I have to investigate to be able to tell what goes wrong here.

… and then he found the problem: https://github.com/yaml/AdaYaml/issues/5

  • CONSTRAINT_ERROR : Yaml.Dom.Mapping_Data.Node_Maps.Insert: attempt to insert key already in map

This happens when you try to parse a YAML mapping that has two identical keys (this is conformant to the standard which disallows that). However, the error should be caught and a Compose_Error should be raised instead.

… and he opened https://github.com/yaml/AdaYaml/issues/3

  • CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:283 overflow check failed

  • CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:286 overflow check failed

  • CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:289 overflow check failed

This is, thankfully, an obvious error: Hex escape sequence in the input may have up to eight nibbles, so they represent a value range of 0 .. 2**32 - 1. I use, however, a Natural to store that value, which is a subtype of Integer, which is of platform-dependent range – in this case, it is probably 32-bit, but since it is signed, its range goes only up to 2**31 - 1. This would suffice in theory, since the largest unicode code point is 0x10ffff, but AdaYaml needs to catch cases that exceed this range.

… and attached to https://github.com/yaml/AdaYaml/issues/4

  • STORAGE_ERROR : stack overflow or erroneous memory access

 … and he created issue https://github.com/yaml/AdaYaml/issues/6 and changed the parsing mode of nested structures to avoid stack overflows (no more recursion).

There were also some “hangs”: AFL monitors the execution time of every test case, and flags large timeouts as hangs, to be inspected separately from crashes. Felix took the examples with a long execution time, and found an issue with the hashing of nodes.

With all those error cases, Felix created an issue that references all the individual issues, and corrected them.

After all the corrections, Felix gave me an analysis of the usefulness of the test:

Your findings mirror the test coverage of the AdaYaml modules pretty well:
There was no bug in the parser, as this is the most well-tested module. One bug each was found in the lexer and the text memory management, as these modules do have high test coverage, but only because they are needed for the parser tests. And then three errors in the DOM code as this module is almost completely untested.

After reading a first draft of this blog post, Felix noted that YAML.STREAM_ERROR was in fact an unexpected error in my test program.

Also, you should not exclude Yaml.Stream_Error. This error means that a malformed event stream has been encountered. Parsing a YAML input stream or serializing a DOM structure should *always* create a valid event stream unless it raises an exception – hence getting Yaml.Stream_Error would actually show that there's an internal error in one of those components. [...] Yaml.Stream_Error would only be an error with external cause if you generate an event stream manually in your code. 

I filtered this exception because I'd encountered it in the test suite available in the AdaYaml github repository (it is in fact a copy of the reference yaml test-suite). I wanted to use the complete test suite as a starting corpus, but examples 8G76 and 98YD crashed and it prevented me from starting the fuzzing session, so instead of removing the crashing test cases, I filtered out the exception...

The fact that 2 test cases from the YAML test suite make my simple program crash is interesting, but can we find more cases?

I removed those 2 files from the initial corpus, and I focused the small test program on finding cases that crash on a YAML.STREAM_ERROR:

exception
  when Occurence : others =>
     declare
        N : constant String := Ada.Exceptions.Exception_Name (Occurence);
     begin
        Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Information (Occurence));
        if N = "YAML.STREAM_ERROR" then
          GNAT.Exception_Actions.Core_Dump (Occurence);
        end if;
     end;
end Yaml_Test;

In less than 5 minutes, AFL finds 5 categories of crashes:

  • raised YAML.STREAM_ERROR : Unexpected event (expected document end): ALIAS
  • raised YAML.STREAM_ERROR : Unexpected event (expected document end): MAPPING_START
  • raised YAML.STREAM_ERROR : Unexpected event (expected document end): SCALAR
  • raised YAML.STREAM_ERROR : Unexpected event (expected document end): SEQUENCE_START
  • raised YAML.STREAM_ERROR : Unexpected event (expected document start): STREAM_END

Felix was quick to answer:

Well, seems like you've found a bug in the parser. This looks like the parser may generate some node after the first root node of a document, although a document always has exactly one root node. This should never happen; if the YAML contains multiple root nodes, this should be a Parser_Error.

I opened a new issue about this, to be checked later.

Fuzzing GNATCOLL.JSON

JSON parsers are a common fuzzing target, not that different from YAML. This could be interesting.

Following a similar pattern as other fuzzing sessions, let’s first build a simple unit test that reads and parses an input file given at the command-line (first argument), using GNATCOLL.JSON (https://github.com/AdaCore/gnatcoll-core/blob/master/src/gnatcoll-json.ads). This time I massaged one of the unit tests into a simple “read a JSON file all in memory, decode it and print it” test program, that we’ll use for fuzzing.

Note: for the exercise here I used GNATCOLL GPL 2016, because that's what I was using for a personal project. You should probably use the latest version when you do this kind of testing, at least before you report your findings.

The test program is very simple:

with Ada.Command_Line;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with GNATCOLL.JSON;
with File_IO;  --  helper package from the test harness providing Read_File/Write_File

procedure JSON_Fuzzing_Test is
   Filename  : constant String  := Ada.Command_Line.Argument (1);
   JSON_Data : Unbounded_String := File_IO.Read_File (Filename);
begin
   declare
      Value : GNATCOLL.JSON.JSON_Value :=
        GNATCOLL.JSON.Read (Strm     => JSON_Data,
                            Filename => Filename);
   begin
      declare
         New_JSON_Data : constant Unbounded_String :=
           GNATCOLL.JSON.Write (Item => Value, Compact => False);
      begin
         File_IO.Write_File (File_Name     => "out.json",
                             File_Contents => New_JSON_Data);
      end;
   end;
end JSON_Fuzzing_Test;

The GPR file is simple, with a twist: to make sure we compile this program with gnatcoll, and that when we use afl-gcc we also compile the library code with our substitution compiler, we "with" the actual "gnatcoll_full.gpr" (the actual gnatcoll source code!) and not the project file for the pre-compiled library.

Then we build the project in "debug" mode, to get all the runtime checks available:

gprbuild -p -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug 

Then I tried to find a test corpus. One example is https://github.com/nst/JSONTestSuite cited in “Parsing JSON is a minefield”. There’s a test_parsing folder there that contains 318 test cases.

Trying to run them first on the new simple test program shows already several "crash" cases:

  • nice GNATCOLL.JSON.INVALID_JSON_STREAM exceptions

    • Numerical value too large to fit into an IEEE 754 float

    • Numerical value too large to fit into a Long_Long_Integer

    • Unexpected token

    • Expected ',' in the array value

    • Unfinished array, expecting ending ']'

    • Expecting a digit after the initial '-' when decoding a number

    • Invalid token

    • Expecting digits after 'e' when decoding a number

    • Expecting digits after a '.' when decoding a number

    • Expected a value after the name in a JSON object at index N

    • Invalid string: cannot find ending "

    • Nothing to read from stream

    • Unterminated object value

    • Unexpected escape sequence

… which is fine, since you’ll expect this specific exception when parsing user-provided JSON.

Then I got to:

  • raised ADA.STRINGS.INDEX_ERROR : a-strunb.adb:1482

    • n_string_1_surrogate_then_escape_u1.json

    • n_string_1_surrogate_then_escape_u.json

    • n_string_invalid-utf-8-in-escape.json

    • n_structure_unclosed_array_partial_null.json

    • n_structure_unclosed_array_unfinished_false.json

    • n_structure_unclosed_array_unfinished_true.json

For which I opened https://github.com/AdaCore/gnatcoll-core/issues/5

  • raised CONSTRAINT_ERROR : bad input for 'Value: "16#??????"]#"

    • n_string_incomplete_surrogate.json

    • n_string_incomplete_escaped_character.json

    • n_string_1_surrogate_then_escape_u1x.json

 For which I opened https://github.com/AdaCore/gnatcoll-core/issues/6

  • … and STORAGE_ERROR : stack overflow or erroneous memory access
    • n_structure_100000_opening_arrays.json

This last one can be worked around with ulimit -s unlimited (so, removing the limit on stack size). Still, beware of your stack when parsing user-provided JSON. Similar problems appeared for AdaYaml and were addressed there; I'm not sure whether this potential "denial of service by stack overflow" should be classified as a bug, but it's at least something to know when using GNATCOLL.JSON on user-provided JSON data (which I'm guessing covers most API endpoints these days).

Those exceptions are the ones you don't expect and maybe didn't write a handler for. A clean GNATCOLL.JSON.INVALID_JSON_STREAM exception might be better.

Note: on all those test cases, I didn't check whether the results of the tests were OK; I just checked for crashes. It might be very interesting to check the correctness of GNATCOLL.JSON against this test suite.

Now let’s try through fuzzing to find more cases where you don’t get a clean GNATCOLL.JSON.INVALID_JSON_STREAM.

The first step is adding a final “catch-all” exception handler to abort only on unwanted exceptions (not all of them):

exception
  --  we don't want to abort on a "controlled" exception
  when GNATCOLL.JSON.INVALID_JSON_STREAM =>
     null;
  when Occurence : others =>
     Ada.Text_IO.Put_Line
       ("exception occured for " & Filename
        & " [" &  Ada.Exceptions.Exception_Name (Occurence)
        & "] [" & Ada.Exceptions.Exception_Message (Occurence)
        & "] [" & Ada.Exceptions.Exception_Information (Occurence) & "]");
     GNAT.Exception_Actions.Core_Dump (Occurence);
end JSON_Fuzzing_Test;

And then clean the generated executable:

gprclean -r -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug

Then rebuild it using afl-gcc:

gprbuild --compiler-subst=Ada,/home/lionel/afl/afl-2.51b/afl-gcc -p -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug

Then we generate an input corpus for AFL, by keeping only the files that didn’t generate a call to abort() with the new JSON_Fuzzing_Test test program.

On first launch (AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON /home/lionel/aws/afl/afl-2.51b/afl-fuzz -m 1024 -i input -o output ../json_fuzzing_test @@), afl-fuzz complains:

[*] Attempting dry run with 'id:000001,orig:i_number_huge_exp.json'...
[-] The program took more than 1000 ms to process one of the initial test cases.
    This is bad news; raising the limit with the -t option is possible, but
    will probably make the fuzzing process extremely slow.
    If this test case is just a fluke, the other option is to just avoid it
    altogether, and find one that is less of a CPU hog.
[-] PROGRAM ABORT : Test case 'id:000001,orig:i_number_huge_exp.json' results in a timeout
        Location : perform_dry_run(), afl-fuzz.c:2776

… and it’s true, the i_number_huge_exp.json file takes a long time to be parsed:

[lionel@lionel fuzzing-session]$ time ../json_fuzzing_test input/i_number_huge_exp.json
input/i_number_huge_exp.json:1:2: Numerical value too large to fit into an IEEE 754 float
real    0m7.273s
user    0m3.717s
sys     0m0.008s

My machine isn’t fast, but still, this is a denial of service waiting to happen. I opened a ticket just in case.

Anyway, let's remove the input files that caused a timeout before we even started fuzzing (the other ones are n_structure_100000_opening_arrays.json and n_structure_open_array_object.json).

During this first afl-fuzz run, in the startup phase, a warning appears many times:

[!] WARNING: No new instrumentation output, test case may be useless.

AFL goes through the whole input corpus and checks whether each input file adds any new basic-block coverage compared to the examples already tested (also from the input corpus).

The initial phase ends with:

[!] WARNING: Some test cases look useless. Consider using a smaller set.
[!] WARNING: You probably have far too many input files! Consider trimming down.

To be most efficient, afl-fuzz needs the slimmest input corpus with the highest basic-block coverage: as representative as possible of all the valid code paths, and with as little redundancy as possible. Have a look at the afl-cmin and afl-tmin tools to minimize your input corpus.

For this session, let’s keep the test corpus as it is (large and redundant), and start the fuzzing session.

In the first seconds of fuzzing, we already get the following state:

Fuzzing GNATCOLL.JSON

Already 3 crashes, and 2 “hangs”. Looking through those, it seems afl-fuzz already found by itself examples of “ADA.STRINGS.INDEX_ERROR : a-strunb.adb:1482” and “CONSTRAINT_ERROR : bad input for 'Value: "16#?????”, although I removed from the corpus all files that showed those problems.

Same thing with the "hang": afl-fuzz found an example of a large float number, although I had removed all the "*_huge_*" float examples.

Let's try to focus on finding something other than the issues we already know about.

I added the following code in the top-level exception handler:

   when Occurence : others =>
     declare
           Text : constant String := Ada.Exceptions.Exception_Information (Occurence);
     begin
        if    Ada.Strings.Fixed.Index (Source => Text, Pattern => "bad input for 'Value:") /= 0 then return;
        elsif Ada.Strings.Fixed.Index (Source => Text, Pattern => "a-strunb.adb:1482") /= 0 then return;
        end if;
     end;

It’s very hacky but it’ll remove some parasites (i.e. the crashes we know) from the crash bin.

Let’s restart the fuzzing session (remove the output/ directory, recreate it, and call afl-fuzz again).

Now after 10 minutes, no crash had occurred, so I let the fuzzer run for 2 days straight, and it didn't find any crash or hang other than the ones already triggered by the test suite.

It did however find some additional stack overflows (with examples that open a lot of arrays) even though I had set 1024m as a memory limit for afl-fuzz… Maybe something to look into...

What next?

Now go start fuzzing your favorite project, and report your results.

If you want to dive deeper into the subject of fuzzing with AFL, here's a short reading list for you:

  • Even simpler, if you have an extensive test case list, you can use afl-cmin (a corpus minimizer) to directly fuzz your parser or application efficiently; see the great success of AFL on SQLite.
  • The fuzzing world took up the work of lcamtuf, and you can often hear about fuzzing-specific passes in clang/llvm to help fuzzing where it struggles (checksums, magic strings, whole-string comparisons...). 
  • There's a lot of tooling around afl-fuzz: aflgo directs the focus of a fuzzing session to specific code parts, and Pythia helps evaluate the efficiency of your fuzzing session. See also afl-cov for live coverage analysis.

If you find bugs, or even just perform a fuzzing pass on your favorite open-source software, don't hesitate to get in touch with the maintainers of the project. From my experience, most of the time, maintainers will be happy to get free testing. Even if it's just to say that AFL didn't find anything in 3 days... It's already a badge of honor :-).

Thanks:

Many thanks to Yannick Moy, who sparked the idea of this blog post after I talked his ear off for a year (was it two?) about AFL and fuzzing in Ada, and who helped me proofread it. Thanks to Gautier and Felix, who were very reactive and nice about the reports, and who took some time to read drafts of this post. All your suggestions were very helpful.

]]>
Cross-referencing Ada with Libadalang http://blog.adacore.com/cross-referencing-ada-with-libadalang Mon, 18 Dec 2017 13:59:49 +0000 Pierre-Marie de Rodat http://blog.adacore.com/cross-referencing-ada-with-libadalang

Libadalang has come a long way since the last time we blogged about it. In the past 6 months, we have been working tirelessly on name resolution, a pretty complicated topic in Ada, and it has finally matured enough that we feel ready to blog about it and encourage people to try it out.

WARNING: While pretty far along, the work is still not finished. It is expected that some statements and declarations are not yet resolved. You might also run into the occasional crash. Feel free to report that on our github!

In our last blog post, we learned how to use Libadalang’s lexical and syntactic analyzers in order to highlight Ada source code. You may know websites that display source code with cross-referencing information: this makes it possible to navigate from references to declarations. For instance elixir, Free Electrons’ Linux source code explorer: go to a random source file and click on an identifier. This kind of tool makes it very easy to explore an unknown code base.

So, we extended our code highlighter to generate cross-reference links, as a showcase of Libadalang's semantic analysis abilities. If you are lazy, or just want to play with the code, you can find a compilable set of source files for it in Libadalang's repository on GitHub (look for ada2web.adb). If you are interested in how to use name resolution in your own programs, this blog post shows how Libadalang's name resolution can extend our previous code highlighter.

Note that if you haven't read the previous blog post, we recommend that you do: below, we assume familiarity with the topics it covers.

Where are my source files?

Unlike lexical and syntactic analysis, which process source files separately, semantic analysis works on a set of source files, or more precisely on a source file plus all its dependencies. This is logical: in order to understand an object declaration in foo.ads, one needs to know about the corresponding type, and if the type is declared in another source file (say bar.ads), both files are required for analysis.
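
For instance, in this made-up two-file example, resolving the declaration of Current in foo.ads requires analyzing bar.ads as well:

--  bar.ads
package Bar is
   type Temperature is new Float;
end Bar;

--  foo.ads
with Bar;
package Foo is
   Current : Bar.Temperature;
   --  Understanding this declaration requires bar.ads too
end Foo;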

By default, Libadalang assumes that all source files are in the current directory. That's enough for toy source files, but not at all for real world projects, which are generally spread over multiple directories in a complex nesting scheme. Libadalang can't know about the file layout of all Ada projects in the world, so we created an abstraction that enables anyone to tell it how to reach source files: the Libadalang.Analysis.Unit_Provider_Interface interface type. This type has exactly one abstract primitive: Get_Unit, which, given a unit name and a unit kind (specification or body?), calls Analysis_Context's Get_From_File or Get_From_Buffer to create the corresponding analysis unit.

In the context of a source code editor (for instance), this allows Libadalang to query a source file even if this file exists only in memory, not in a real source file, or if it’s more up-to-date in memory. Using a custom unit provider in Libadalang is easy: dynamically allocate a concrete implementation of this interface, then pass it to the Unit_Provider formal in Analysis_Context’s constructor: the Create function. Libadalang will take care of deallocating this object when the context is destroyed.

declare
   UP  : My_Unit_Provider_Access :=
      new My_Unit_Provider_Type …;
   Ctx : Analysis_Context := Create (Unit_Provider => UP);
   --  UP will be queried when performing name resolution
begin
   --  Do useful things, and then when done…
   Destroy (Ctx);
end;

Nowadays, a lot of Ada projects use GPRbuild and thus have a project file. That's fortunate: project files give us exactly the information Libadalang needs: where the source files are and what their naming scheme is. Because of this, Libadalang provides a tagged type that implements this interface to deal with project files: Project_Unit_Provider_Type, from the Libadalang.Unit_Files.Projects package. In order to do this, one first needs to load the project file using GNATCOLL.Projects:

declare
   Project_File : GNATCOLL.VFS.Virtual_File;
   Project      : GNATCOLL.Projects.Project_Tree_Access;
   Env          : GNATCOLL.Projects.Project_Environment_Access;
   UP           : Libadalang.Analysis.Unit_Provider_Access;
   Ctx          : Libadalang.Analysis.Analysis_Context; 
begin
   --  First load the project file
   Project := new Project_Tree;
   Initialize (Env);

   --  Initialize Project_File, set the target, create
   --  scenario variables, …
   Project.Load (Project_File, Env);

   --  Now create the unit provider and the analysis context.
   --  Is_Project_Owner is set to True so that the project
   --  is deallocated when UP is destroyed.
   UP := new Project_Unit_Provider_Type'
     (Create (Project, Env, True));
   Ctx := Create (Unit_Provider => UP);

   --  Do useful things, and then when done…
   Destroy (Ctx);
end;

Now that Libadalang knows where the source files are, we can ask it to resolve names!

Let’s jump to definitions

Just like in the highlighter, most of the website generator will consist of asking Libadalang to parse source files (Get_From_File), checking for lexing/parsing errors (Has_Diagnostics, Diagnostics) and then dealing with AST nodes and tokens in analysis units. The new bits here are about turning identifiers into hypertext links that redirect to their definition. As for highlighting classes, we do this token annotation with an array and a tree traversal:

Unit : Analysis_Unit := …;
--  Analysis unit to process

Xrefs : array (1 .. Token_Count (Unit)) of Basic_Decl :=
   (others => No_Basic_Decl);
--  For each token, the declaration to which the token should
--  link or No_Basic_Decl for no cross-reference.

function Process_Node
  (Node : Ada_Node'Class) return Visit_Status;
--  Callback for AST traversal. For string literals and
--  identifiers, annotate the corresponding entry in Xrefs with the
--  designated declaration, if found.

With these declarations, we can do the annotations easily:

Root (Unit).Traverse (Process_Node'Access);

But how does Process_Node do its magic? That's easy too:

function Process_Node
  (Node : Ada_Node'Class) return Visit_Status is
begin
   --  Annotate only tokens for string literals and
   --  identifiers.
   if Node.Kind not in Ada_String_Literal | Ada_Identifier
   then
      return Into;
   end if;

   declare
      Token : constant Token_Type :=
         Node.As_Single_Tok_Node.F_Tok;
      Idx   : constant Natural := Natural (Index (Token));
      Decl  : Basic_Decl renames Xrefs (Idx);
   begin
      Decl := Node.P_Referenced_Decl;
   exception
       when Property_Error => null;
   end;

   return Into;
end Process_Node;

String literal and identifier nodes both inherit from the Single_Tok_Node abstract node, hence the conversion to retrieve the underlying token. Then we locate which cell in the Xrefs array they correspond to. And finally we fill it with the result of the P_Referenced_Decl primitive. This function tries to fetch the declaration corresponding to Node. Easy I said!

You might ask, though: what's the exception handler for? What we call AST node properties (all functions whose name starts with P_) can raise Property_Error exceptions. These can happen when Libadalang works on invalid Ada sources and cannot compute query results. As name resolution is still actively developed, it can happen that this exception is raised even for valid source code: if that happens to you, please report this bug! Note that if a property raises an exception that is not a Property_Error, this is another kind of bug: please report it too!

Bind it all together

Now we have a list of Basic_Decl nodes to create hypertext links, but how can we do that? The trick is to get the name of the source file that contains this declaration, plus its source location:

Decl_Unit : constant Analysis_Unit := Decl.Get_Unit;
Decl_File : constant String := Get_Filename (Decl_Unit);
Decl_Line : constant Langkit_Support.Slocs.Line_Number :=
   Decl.Sloc_Range.Start_Line;

Then you can turn this information into a hypertext link. For example, if you generate X.html for the X source file (foo.ads.html for foo.ads, …) and generate LY HTML anchors for line number Y:

Line_No : constant String :=
   Natural'Image (Natural (Decl_Line));
Href : constant String :=
   Decl_File & ".html#L"
   & Line_No (Line_No'First + 1 .. Line_No'Last);

Some amount of plumbing is still needed to have a complete website generator:

  • get a list of all source files to process in the loaded project, using GNATCOLL.Projects’ API;
  • actually output HTML code: code from the previous blog can be reused and updated to do this;
  • generate an index HTML file as an entry point for navigation.

But as usual, covering all these topics would go beyond the scope of this blog post and make for an unreasonably long essay. So thank you once more for reading this post to the end!

]]>
Make with Ada 2017- Ada Based IoT Framework http://blog.adacore.com/make-with-ada-2017-ada-based-iot-framework Wed, 13 Dec 2017 15:41:00 +0000 Manuel Iglesias Abbatermarco http://blog.adacore.com/make-with-ada-2017-ada-based-iot-framework

Summary

The Ada IoT Stack consists of an lwIp (“lightweight IP”) stack implementation written in Ada, with an associated high-level protocol to support embedded device connectivity nodes for today’s IoT world. The project was developed for the Make With Ada 2017 competition based on existing libraries and ported to embedded STM32 devices.

Motivation

Being a huge fan of IoT designs, I originally planned to work on a real device such as an IoT node or gateway, while using the Ada language. I really enjoy writing programs but my roots are in hardware (I earned a B.S. in electronic engineering). Back then I was not really a programmer, but if you can master the intricacies of hardware I think that experience will eventually help make you a good programmer; I'm not sure it works the same way in the other direction.

With so many programming languages out there, it's difficult to gain experience with all of them. In my case I was not even aware of the Ada language until I found out about the contest. That got me to learn Ada: there is a saying that without deadlines nobody will finish on time, and the contest supplied both the motivation and a deadline. For me the best way to learn something is not to read a book chapter by chapter. Sometimes you need to read a "getting started" guide to gain basic knowledge, but after that, if you don't have a problem to solve, your motivation might come to an end. So I chose to continue with the IoT project.

Network Stack

When I started the IoT project I soon realized that the Ada Drivers Library didn't provide a TCP/IP stack. The Etherscope software by Mr. Carrez from the 2016 Make with Ada contest provided the Ethernet connectivity and UDP protocol but not TCP. I started to look at how to implement the TCP/IP stack, but due to my lack of experience in Ada and the fact that the contest ran for only about two months, it would not have worked to do something from scratch. So I wrote to Mr. Chouteau (at AdaCore) asking for information about any Ada libraries that implement a TCP stack, and he referred me to a lwIp implementation in Ada and Spark 2014. It was on a Spark 2014 git repository that had not shown up in a Google search. As far as I could tell, the code was only tested on a Linux-flavor OS using a TAP interface. I spent a couple of weeks studying the code and getting it to work on my Debian box, which was a little hacky in the end since the TAP C driver implementation (system calls) didn't work as expected; I ended up coding the TAP interface by hand.

A TAP device operates at layer 2 of the OSI model, which means Ethernet frames. That makes sense here since lwIp implements networking from Ethernet datagrams. Then I realized that I could combine the Etherscope project with the lwIp implementation. After removing several parts of the Etherscope project to get only Ethernet frames, I was ready to feed the lwIp stack. It was not so easy: I spent some time porting the lwIp Ada code to the Ada version for embedded ARM. One obstacle I found was the use of XML files to describe the network datagrams and other structures, which are processed by a program called xmlada to generate some Ada body files. These describe things like bit and byte positions of TCP flags or fields within the datagram. The problem was that the ARM version doesn't provide xmlada, so I ended up copying the generated files into my project.

After quite some time I got the lwIp stack to work on my STM32F769I board. This was no easy task, especially because the STLink debugger is not so easy to work with. (For example semihosting is basically the only way to have debugger output in the form of "printf". This is really slow and basically interrupts the flow of program execution in a nasty way. The problem here is that the ST board doesn't provide a JTAG interface to the Cortex M4/M7 device, and the STlink on board doesn't have an SWO line connection.)

IoT Stack

The TCP/IP stack was just the beginning; it was really nice to see it working, but that quickly gets boring. The original lwIp implements a TCP echo server: you open a socket, connect, and then anything you send is echoed back by the server, which is not very useful for IoT. So I felt I was not making real progress, at least toward something that would give the judges a tangible project to evaluate. Again I was in a rush, this time with more knowledge of Ada but, as before, without wanting to write something from scratch.

One day I found the Simple Components Ada code by Mr. Kazakov. I really got to love it, but the problem was that after reading it a little bit I felt like I did on the day I attended my first German language "Unterricht" years ago. I decided to keep spending time reading it until I finally figured out how to start porting it to my lwIp implementation. The first thing I ported was the MQTT client code, because of its simplicity in terms of dependencies on other "Simple Components" classes, if we can refer to them like that. One problem to solve here was the change in paradigm: Simple Components uses Ada GNAT Sockets, and my lwIp basically uses a callback scheme; two different worlds, but previous experience with sockets helps since you know what the code is doing and what it should do in the new environment. In the end, the MQTT port consumed more time than expected since I not only ported the Simple Components code but also added code from the existing MQTT lwIp implementation in C to make up for the lack of timers.

It was really difficult for me to figure out how the connection state can be recovered when the callback executes. For example when a connection is made, certain variables are initialized and kept in a data structure, but after the callback returns, the structure needs to be preserved so that it can be recovered by a connection event and used in the corresponding callback.

The MQTT client gave me the insights and the experience to continue working and to try the more complicated HTTP protocol, this time improving the callback association. The port of the HTTP server was where problems with the lwIp implementation started to arise. I am almost sure that the lwIp code was only tested in a specific TCP/IP echo-controlled environment; the problem is that a TCP connection can behave differently, or more precisely has different scenarios (e.g., when closing a connection), so I ended up "patching the code" to behave as closely as possible to the standard. I also fixed some memory allocation problems with the lwIp Pcb's of the original stack. Nonetheless, if you decide to try the code, please be aware that it should be treated as a "development" version.

Ultimately there was not enough time to finish my IoT Node as I had initially intended. The good part is that I really enjoy solving this kind of problem.

Work in progress

I was really impressed with Ada; it has the power to do things that in other languages like C/C++ would be much too error-prone. The lack of a good debugger was offset by the increased productivity in writing code that had fewer bugs in the first place. I hope to have some time to continue working on this project in the near future.

To see Manuel's full project log click here.

]]>
Welcoming New Members to the GNAT Pro Family http://blog.adacore.com/adacore-launches-new-gnat-pro-product-lines Wed, 29 Nov 2017 09:10:29 +0000 Jamie Ayre http://blog.adacore.com/adacore-launches-new-gnat-pro-product-lines

As we see the importance of software grow in applications, the quality of that software has become more and more important. Even outside the mission- and safety-critical arena, customers are no longer accepting software failures (the famous blue screens of death, and there are many...). Ada has a very strong answer here and we are seeing more and more interest in using the language from a range of industries. It is for this reason that we have completed our product line by including an entry-level offer for C/C++ developers wanting to switch to Ada and reinforced our existing offer with GNAT Pro Assurance for programmers building the most robust software platforms with life cycles spanning decades.

This recent press release explains the positioning of the GNAT Pro family. Details of the full product range can be found here.

We’d love to chat more with you about these new products and how Ada can keep you ahead of your competitors. Drop us an email to info@adacore.com.



]]>
There's a mini-RTOS in my language http://blog.adacore.com/theres-a-mini-rtos-in-my-language Thu, 23 Nov 2017 10:06:15 +0000 Fabien Chouteau http://blog.adacore.com/theres-a-mini-rtos-in-my-language

The first thing that struck me when I started to learn about the Ada programming language was the tasking support. In Ada, creating tasks, synchronizing them, and sharing access to resources are all part of the language.

In this blog post I will focus on the embedded side of things. First because it's what I like, and also because it's much simpler :)

For real-time and embedded applications, Ada defines a profile called `Ravenscar`. It's a subset of the language designed to ease schedulability analysis; it is also better suited to platforms with limited resources, such as micro-controllers.

So this will not be a complete lecture on Ada tasking. I might do a follow-up with some more tasking features, if you ask for it in the comments ;)

Tasks

So the first thing is to create tasks, right?

There are two ways to create tasks in Ada, first you can declare and implement a single task:

   --  Task declaration
   task My_Task;
   --  Task implementation
   task body My_Task is
   begin
      --  Do something cool here...
   end My_Task;

If you have multiple tasks doing the same job or if you are writing a library, you can define a task type:

   --  Task type declaration
   task type My_Task_Type;
   --  Task type implementation
   task body My_Task_Type is
   begin
      --  Do something really cool here...
   end My_Task_Type;

And then create as many tasks of this type as you want:

   T1 : My_Task_Type;
   T2 : My_Task_Type;

One limitation of Ravenscar compared to full Ada, is that the number of tasks has to be known at compile time.

Time

The timing features of Ravenscar are provided by the package (you guessed it) Ada.Real_Time.

In this package you will find:

  •  a definition of the Time type which represents the time elapsed since the start of the system
  •  a definition of the Time_Span type which represents a period between two Time values
  •  a function Clock that returns the current time (monotonic count since the start of the system)
  •  Various sub-programs to manipulate Time and Time_Span values

The Ada language also provides an instruction to suspend a task until a given point in time: delay until.

Here's an example of how to create a cyclic task using the timing features of Ada.

  task body My_Task is
      Period       : constant Time_Span := Milliseconds (100);
      Next_Release : Time;
   begin
      --  Set Initial release time
      Next_Release := Clock + Period;

      loop
         --  Suspend My_Task until the Clock is greater than Next_Release
         delay until Next_Release;

         --  Compute the next release time
         Next_Release := Next_Release + Period;
         
         --  Do something really cool at 10Hz...
      end loop;

   end My_Task;

Scheduling

Ravenscar has priority-based preemptive scheduling. A priority is assigned to each task and the scheduler will make sure that the highest priority task - among the ready tasks - is executing.

A task can be preempted if another task of higher priority is released, either by an external event (interrupt) or at the expiration of its delay until statement (as seen above).

If two tasks have the same priority, they will be executed in the order they were released (FIFO within priorities).

Task priorities are static; however, we will see below that a task can have its priority temporarily raised.

The task priority is an integer value between 1 and 256, higher value means higher priority. It is specified with the Priority aspect:

   task My_Low_Priority_Task
     with Priority => 1;

   task My_High_Priority_Task
     with Priority => 2;

Mutual exclusion and shared resources

In Ada, mutual exclusion is provided by protected objects.

At run-time, the protected objects provide the following properties:

  • There can be only one task executing a protected operation at a given time (mutual exclusion)
  • There can be no deadlock

In the Ravenscar profile, this is achieved with Priority Ceiling Protocol.

A priority is assigned to each protected object, and any task calling a protected sub-program must have a priority lower than or equal to the priority of the protected object.

When a task calls a protected sub-program, its priority will be temporarily raised to the priority of the protected object. As a result, this task cannot be preempted by any of the other tasks that potentially use this protected object, and therefore the mutual exclusion is ensured.

The Priority Ceiling Protocol also provides a solution to the classic scheduling problem of priority inversion.

Here is an example of protected object:

   --  Specification
   protected My_Protected_Object
     with Priority => 3
   is

      procedure Set_Data (Data : Integer);
      --  Protected procedures can read and/or modify the protected data
      
      function Data return Integer;
      --  Protected functions can only read the protected data

   private
   
      --  Protected data are declared in the private part
      PO_Data : Integer := 0;
   end;
   --  Implementation
   protected body My_Protected_Object is

      procedure Set_Data (Data : Integer) is
      begin
         PO_Data := Data;
      end Set_Data;

      function Data return Integer is
      begin
         return PO_Data;
      end Data;
   end My_Protected_Object;

Synchronization

Another cool feature of protected objects is the synchronization between tasks.

It is done with a different kind of operation called an entry.

An entry has the same properties as a protected procedure except it will only be executed if a given condition is true. A task calling an entry will be suspended until the condition is true.

This feature can be used to synchronize tasks. Here's an example:

   protected My_Protected_Object is
      procedure Send_Signal;
      entry Wait_For_Signal;
   private
      We_Have_A_Signal : Boolean := False;
   end My_Protected_Object;
   protected body My_Protected_Object is

      procedure Send_Signal is
      begin
          We_Have_A_Signal := True;
      end Send_Signal;    
      
      entry Wait_For_Signal when We_Have_A_Signal is
      begin
          We_Have_A_Signal := False;
      end Wait_For_Signal;
   end My_Protected_Object;
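
For example, here is a minimal sketch of two tasks using this object (the task names and the 500 ms pacing are just for illustration):

   task Producer;
   task Consumer;

   task body Producer is
   begin
      loop
         --  Do some work, then notify the consumer
         My_Protected_Object.Send_Signal;
         delay until Clock + Milliseconds (500);
      end loop;
   end Producer;

   task body Consumer is
   begin
      loop
         --  Suspended here until Send_Signal has been called
         My_Protected_Object.Wait_For_Signal;
         --  React to the signal...
      end loop;
   end Consumer;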

Interrupt Handling

Protected objects are also used for interrupt handling. Private procedures of a protected object can be attached to an interrupt using the Attach_Handler aspect.

   protected My_Protected_Object
     with Interrupt_Priority => 255
   is
   
   private
   
      procedure UART_Interrupt_Handler
        with Attach_Handler => UART_Interrupt;
   
   end My_Protected_Object;

Combined with an entry, it provides an elegant way to handle incoming data on a serial port, for instance:

   protected My_Protected_Object
     with Interrupt_Priority => 255
   is
      entry Get_Next_Character (C : out Character);
      
   private
      procedure UART_Interrupt_Handler
              with Attach_Handler => UART_Interrupt;
      
      Received_Char  : Character := ASCII.NUL;
      We_Have_A_Char : Boolean := False;
   end My_Protected_Object;
   protected body My_Protected_Object is

      entry Get_Next_Character (C : out Character) when We_Have_A_Char is
      begin
          C := Received_Char;
          We_Have_A_Char := False;
      end Get_Next_Character;
      
      procedure UART_Interrupt_Handler is
      begin
          Received_Char  := A_Character_From_UART_Device;
          We_Have_A_Char := True;
      end UART_Interrupt_Handler;      
   end My_Protected_Object;

A task calling the entry Get_Next_Character will be suspended until an interrupt is triggered and the handler reads a character from the UART device. In the meantime, other tasks will be able to execute on the CPU.
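
For example, the consuming task can be as simple as this sketch (Reader_Task and the processing comment are illustrative):

   task Reader_Task;

   task body Reader_Task is
      C : Character;
   begin
      loop
         --  Suspended until the interrupt handler has stored a character
         My_Protected_Object.Get_Next_Character (C);
         --  Process C, e.g. append it to a command buffer...
      end loop;
   end Reader_Task;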

Multi-core support

Ada supports static and dynamic allocation of tasks to cores on multi-processor architectures. The Ravenscar profile restricts this support to a fully partitioned approach where tasks are statically allocated to processors and there is no task migration among CPUs. These parallel tasks running on different CPUs can communicate and synchronize using protected objects.

The CPU aspect specifies the task affinity:

   task Producer with CPU => 1;
   task Consumer with CPU => 2;
   --  Parallel tasks statically allocated to different cores

Implementations

That's it for the quick overview of the basic Ada Ravenscar tasking features.

One of the advantages of having tasking as part of the language standard is portability: you can run the same Ravenscar application on Windows, Linux, macOS or an RTOS like VxWorks. GNAT also provides a small stand-alone run-time that implements the Ravenscar tasking on bare metal. This run-time is available, for instance, on ARM Cortex-M micro-controllers.

It's like having an RTOS in your language.

]]>
Make with Ada 2017- A "Swiss Army Knife" Watch http://blog.adacore.com/make-with-ada-2017-a-swiss-army-knife-watch Wed, 22 Nov 2017 14:49:00 +0000 J. German Rivera http://blog.adacore.com/make-with-ada-2017-a-swiss-army-knife-watch

Summary 

The Hexiwear is an IoT wearable development board that has two NXP Kinetis microcontrollers. One is a K64F (Cortex-M4 core) for running the main embedded application software. The other one is a KW40 (Cortex M0+ core) for running a wireless connectivity stack (e.g., Bluetooth BLE or Thread). The Hexiwear board also has a rich set of peripherals, including OLED display, accelerometer, magnetometer, gyroscope, pressure sensor, temperature sensor and heart-rate sensor. This blog article describes the development of a "Swiss Army Knife" watch on the Hexiwear platform. It is a bare-metal embedded application developed 100% in Ada 2012, from the lowest level device drivers all the way up to the application-specific code, for the Hexiwear's K64F microcontroller.

I developed Ada drivers for Hexiwear-specific peripherals from scratch, as they were not supported by AdaCore's Ada drivers library. Also, since I wanted to use the GNAT GPL 2017 Ada compiler but the GNAT GPL distribution did not include a port of the Ada Runtime for the Hexiwear board, I also had to port the GNAT GPL 2017 Ada runtime to the Hexiwear. All this application-independent code can be leveraged by anyone interested in developing Ada applications for the Hexiwear wearable device.


Project Overview

The purpose of this project is to develop the embedded software of a "Swiss Army Knife" watch in Ada 2012 on the Hexiwear wearable device.

The Hexiwear is an IoT wearable development board that has two NXP Kinetis microcontrollers. One is a K64F (Cortex-M4 core) for running the main embedded application software. The other one is a KW40 (Cortex M0+ core) for running a wireless connectivity stack (e.g., Bluetooth BLE or Thread). The Hexiwear board also has a rich set of peripherals, including OLED display, accelerometer, magnetometer, gyroscope, pressure sensor, temperature sensor and heart-rate sensor.

The motivation of this project is two-fold. First, to demonstrate that the whole bare-metal embedded software of this kind of IoT wearable device can be developed 100% in Ada, from the lowest level device drivers all the way up to the application-specific code. Second, software development for this project will produce a series of reusable modules that can be used in the future as a basis for creating "labs" for teaching an Ada 2012 embedded software development class using the Hexiwear platform. Given the fact that the Hexiwear platform is a very attractive platform for embedded software development, its appeal can be used to attract more embedded developers to learn Ada.

The scope of the project will be to develop only the firmware that runs on the application microcontroller (K64F). Ada drivers for Hexiwear-specific peripherals need to be developed from scratch, as they are not supported by AdaCore’s Ada drivers library. Also, since I will be using the GNAT GPL 2017 Ada compiler and the GNAT GPL distribution does not include a port of the Ada Runtime for the Hexiwear board, the GNAT GPL 2017 Ada runtime needs to be ported to the Hexiwear board.

The specific functionality of the watch application for the time frame of "Make with Ada 2017" will include:

  • Watch mode: show current time, date, altitude and temperature
  • Heart rate monitor mode: show heart rate (when the Hexiwear is worn on the wrist)
  • G-Forces monitor mode: show G forces in the three axes (X, Y, Z). 

In addition, when the Hexiwear is plugged to a docking station, a command-line interface will be provided over the UART port of the docking station. This interface can be used to set configurable parameters of the watch and to dump debugging information.

Summary of Accomplishments 

I designed and implemented the "Swiss Army Knife" watch application and all necessary peripheral drivers 100% in Ada 2012. The only third-party code used in this project, besides the GNAT GPL Ravenscar SFP Ada runtime library, is the following:

  • A font generator, leveraged from AdaCore’s Ada drivers library. 
  • Ada package specification files, generated by the svd2ada tool, containing declarations for the I/O registers of the Kinetis K64F’s peripherals.

Below are some diagrams depicting the software architecture of the "Swiss Army Knife" watch:

The current implementation of the "Swiss Army Knife" watch firmware, delivered for "Make with Ada 2017" has the following functionality:

  • Three operating modes:

    • Watch mode: show current time, date, altitude and temperature
    • Heart rate monitor mode: show the heart rate monitor raw reading, when the Hexiwear is worn on the wrist.
    • G-Forces monitor mode: show G forces in the three axes (X, Y, Z).
  • To switch between modes, the user just needs to do a double-tap on the watch’s display. When the watch is first powered on, it starts in "watch mode".
  • To make the battery charge last longer, the microcontroller is put in deep-sleep mode (low-leakage stop mode) and the display is turned off after 5 seconds. A quick tap on the watch’s display will wake it up. Using the deep-sleep mode makes it possible to extend the battery life from 2 hours to 12 hours on a single charge. Indeed, now I can use the Hexiwear as my personal watch during the day, and just need to charge it at night.
  • A command-line interface over UART, available when the Hexiwear device is plugged to its docking station. This interface is used for configuring the following attributes of the watch: 
    1. Current time (not persistent on power loss, but persistent across board resets)
    2. Current date (not persistent on power loss, but persistent across board resets)
    3. Background color (persistent on the K64F microcontroller’s NOR flash, unless firmware is re-imaged)
    4. Foreground color (persistent on the K64F microcontroller’s NOR flash, unless firmware is re-imaged)

Also, the command-line interface can be used for dumping the different debugging logs: info, error and debug.

The top-level code of the watch application can be found at: https://github.com/jgrivera67/make-with-ada/tree/master/hexiwear_watch

I ported the GNAT GPL 2017 Ravenscar Small-Foot-Print Ada runtime library to the Hexiwear board and modified it to support the memory protection unit (MPU) of the K64 microcontroller, and more specifically to support MPU-aware tasks. This port of the Ravenscar Ada runtime can be found on GitHub at https://github.com/jgrivera67/embedded-runtimes

The relevant folders are:

I developed device drivers for the following peripherals of the Kinetis K64 microcontroller as part of this project and contributed their open-source Ada code in GitHub at https://github.com/jgrivera67/make-with-ada/tree/master/drivers/mcu_specific/nxp_kinetis_k64f:

  • DMA Engine
  • SPI controller
  • I2C controller
  • Real-time Clock (RTC)
  • Low power management
  • NOR flash

These drivers are application-independent and can be easily reused for other Ada embedded applications that use the Kinetis K64F microcontroller.

I developed device drivers for the following peripherals of the Hexiwear board as part of this project and contributed their open-source Ada code in GitHub at https://github.com/jgrivera67/make-with-ada/tree/master/drivers/board_specific/hexiwear:

  • OLED graphics display (on top of the SPI and DMA engine drivers, and making use of the advanced DMA channel linking functionality of the Kinetis DMA engine)
  • 3-axis accelerometer (on top of the I2C driver)
  • Heart rate monitor (on top of the I2C driver)
  • Barometric pressure sensor (on top of the I2C driver)

These drivers are application-independent and can be easily reused for other Ada embedded applications that use the Hexiwear board.

I designed the watch application and its peripheral drivers to use the memory protection unit (MPU) of the K64F microcontroller from the beginning, not as an afterthought. Data protection at the individual data object level is enforced using the MPU, for the private data structures of every Ada package linked into the application. For this, I leveraged the MPU framework I developed earlier and presented at the Ada Europe 2017 conference. Using this MPU framework for the whole watch application demonstrates that the framework scales well for a realistic-size application. The code of this MPU framework is part of a modified Ravenscar small-foot-print Ada runtime library port for the Kinetis K64F microcontroller, whose open-source Ada code I contributed at https://github.com/jgrivera67/embedded-runtimes/tree/master/bsps/kinetis_k64f_common/bsp

I developed a CSP model of the Ada task architecture of the watch firmware with the goal of formally verifying that it is deadlock free, using the FDR4 tool. Although I ran out of time to successfully run the CSP model through the FDR tool, developing the model helped me gain confidence about the overall completeness of the Ada task architecture as well as the completeness of the watch_task’s state machine. Also, the CSP model itself is a form of documentation that provides a high-level formal specification of the Ada task architecture of the watch code.

Hexiwear "Swiss Army Knife" watch developed in Ada 2012

Open

My project's code is under the BSD license. It's hosted on GitHub. The project is divided into two repositories:

To do the development, I used the GNAT GPL 2017 toolchain - ARM ELF format (hosted on Windows 10 or Linux), including the GPS IDE. I also used the svd2ada tool to generate Ada code from SVD XML files for the Kinetis microcontrollers I used.

Collaborative

I designed this Ada project to make it easy for others to leverage my code. Anyone interested in developing their own flavor of "smart" watch for the Hexiwear platform can leverage code from my project. Also, anyone interested in developing any type of embedded application in Ada/SPARK for the Hexiwear platform can leverage parts of the software stack I have developed in Ada 2012 for the Hexiwear, particularly device drivers and platform-independent/application-independent infrastructure code, without having to start from scratch as I had to. All you need is to get your own Hexiwear development kit (http://www.hexiwear.com/shop/), clone my Git repositories mentioned above and get the GNAT GPL 2017 toolchain for ARM Cortex-M (http://libre.adacore.com/download/, choose ARM ELF format).

The Hexiwear platform is an excellent platform to teach embedded software development in general, and in Ada/SPARK in particular, given its rich set of peripherals and reasonable cost. The Ada software stack that I have developed for the Hexiwear platform can be used as a base to create a series of programming labs as a companion to courses in Embedded and Real-time programming in Ada/SPARK, from basic concepts and embedded programming techniques, to more advanced topics such as using DMA engines, power management, connectivity, memory protection and so on. 

Dependable

  • I used the memory protection unit to enforce data protection at the granularity of individual non-local data objects, throughout the code of the watch application and its associated device drivers.
  • I developed a CSP model of the Ada task architecture of the watch firmware with the goal of formally verifying that it is deadlock free, using the FDR4 tool. Although I ran out of time to successfully run the CSP model through the FDR tool, developing the model helped me gain confidence about the completeness of the watch_task’s state machine and the overall completeness of the Ada task architecture of the watch code.
  • I consistently used the information hiding principles to architect the code to ensure high levels of maintainability and portability, and to avoid code duplication across projects and across platforms.
  • I leveraged extensively the data encapsulation and modularity features of the Ada language in general, such as private types and child units including private child units, and in some cases subunits and nested subprograms.
  • I used gnatdoc comments to document key data structures and subprograms. 
  • I used Ada 2012 contract-based programming features and assertions extensively.
  • I used range-based types extensively to leverage Ada's power to detect invalid integer values.
  • I used Ada 2012 aspect constructs wherever possible.
  • I used the GNAT coding style, and the -gnaty3abcdefhiklmnoOprstux GNAT compiler option to check compliance with this standard.
  • I used GNAT flags to enable rigorous compile-time checks, such as -gnato13 -gnatf -gnatwa -gnatVa -Wall.

Inventive

  • The most innovative aspect of my project is the use of the memory protection unit (MPU) to enforce data protection at the granularity of individual data objects throughout the entire code of the project (both application code and device drivers). Although the firmware of a watch is not a safety-critical application, it serves as a concrete example of a realistic-size piece of embedded software that uses the memory protection unit to enforce data protection at this level of granularity. Indeed, this project demonstrates the feasibility and scalability of using the MPU-based data protection approach that I presented at the Ada Europe 2017 conference earlier this year, for true safety-critical applications.

  • The CSP model of the Ada task architecture of the watch code, developed with the goal of formally verifying that the task architecture was deadlock free, using the FDR4 model checking tool. Although I ran out of time to successfully run the CSP model through the FDR tool, the model itself provides a high-level formal specification of the Ada task architecture of the watch firmware, which is useful as a concise form of documentation of the task architecture that is more precise than the task architecture diagram alone.

Future Work

Short-Term Future Plans

  • Implement software enhancements to existing features of the "Swiss Army Knife" watch:

    • Calibrate accuracy of altitude and temperature readings
    • Calibrate accuracy of heart rate monitor reading
    • Display heart rate in BPM units instead of just showing the raw sensor reading
    • Calibrate accuracy of G-force readings
  • Develop new features of the "Swiss Army Knife" watch:

    • Display compass information (in watch mode). This entails extending the accelerometer driver to support the built-in magnetometer
    • Display battery charge remaining. This entails writing a driver for the K64F’s A/D converter to interface with the battery sensor
    • Display gyroscope reading. This entails writing a driver for the Hexiwear’s Gyroscope peripheral
    • Develop Bluetooth connectivity support, to enable the Hexiwear to talk to a cell phone over Bluetooth as a slave and to talk to other Bluetooth slaves as a master. As part of this, a Bluetooth "glue" protocol will need to be developed for the K64F to communicate with the Bluetooth BLE stack running on the KW40, over a UART. For the BLE stack itself, the one provided by the chip manufacturer will be used. A future challenge could be to write an entire Bluetooth BLE stack in Ada to replace the manufacturer’s KW40 firmware.
  • Finish the formal analysis of the Ada task architecture of the watch code, by successfully running its CSP model through the FDR tool to verify that the architecture is deadlock free, divergence free and that it satisfies other applicable safety and liveness properties.

Long-Term Future Plans

  • Develop more advanced features of the "Swiss Army Knife" watch:

    • Use sensor fusion algorithms combining readings from accelerometer and gyroscope
    • Add Thread connectivity support, to enable the watch to be an edge device in an IoT network (Thread mesh). This entails developing a UART-based interface to the Hexiwear’s KW40 (which would need to be running a Thread (802.15.4) stack).
    • Use the Hexiwear device as a dash-mounted G-force recorder in a car. Sense variations of the 3D G-forces as the car moves, storing the G-force readings in a circular buffer in memory, to capture the last 10 seconds (or more depending on available memory) of car motion. This information can be extracted over Bluetooth. 
    • Remote control of a Crazyflie 2.0 drone, from the watch, over Bluetooth. Wrist movements will be translated into steering commands for the drone.
  • Develop a lab-based Ada/SPARK embedded software development course using the Hexiwear platform, leveraging the code developed in this project. This course could include the following topics and corresponding programming labs:
    1. Accessing I/O registers in Ada.
      • Lab: Layout of I/O registers in Ada and the svd2ada tool
    2. Pin-level I/O: Pin Muxer and GPIO. 
      • Lab: A Traffic light using the Hexiwear’s RGB LED
    3. Embedded Software Architectures: Cyclic Executive.
      • Lab: A watch using the real-time clock (RTC) peripheral with polling
    4. Embedded Software Architectures: Main loop with Interrupts.
      • Lab: A watch using the real-time clock (RTC) with interrupts
    5. Embedded Software Architectures: Tasks.
      • Lab: A watch using the real-time clock (RTC) with tasks
    6. Serial Console.
      • Lab: UART driver with polling and with interrupts
    7. Sensors: A/D converter.
      • Lab: Battery charge sensor
    8. Actuators: Pulse Width Modulation (PWM).
      • Lab: Vibration motor and light dimmer
    9. Writing to NOR flash.
      • Lab: Saving config data in NOR Flash
    10. Inter-chip communication and complex peripherals: I2C.
      • Lab: Using I2C to interface with an accelerometer
    11. Inter-chip communication and complex peripherals: SPI.
      • Lab: OLED display
    12. Direct Memory Access I/O (DMA).
      • Lab: Measuring execution time and making OLED display rendering faster with DMA
    13. The Memory Protection Unit (MPU).
      • Lab: Using the memory protection unit
    14. Power Management.
      • Lab: Using a microcontroller’s deep sleep mode
    15. Cortex-M Architecture and Ada Startup Code.
      • Lab: Modifying the Ada startup code in the Ada runtime library to add a reset counter
    16. Recovering from failure.
      • Lab: Watchdog timer and software-triggered resets
    17. Bluetooth Connectivity
      • Lab: transmit accelerometer readings to a cell phone over Bluetooth
]]>
Physical Units Pass the Generic Test http://blog.adacore.com/physical-units-pass-the-generic-test Thu, 16 Nov 2017 07:30:00 +0000 Yannick Moy http://blog.adacore.com/physical-units-pass-the-generic-test

The support for physical units in programming languages is a long-standing issue, which very few languages have even attempted to solve. This issue was mostly solved for Ada in 2012 by our colleagues Ed Schonberg and Vincent Pucci, who introduced special aspects for specifying physical dimensions on types. An aspect Dimension_System allows the programmer to define a new system of physical units, while an aspect Dimension allows setting the dimensions of a given subtype in the dimension system of its parent type. The dimension system in GNAT is completely checked at compile time, with no impact on the executable size or execution time, and it offers a number of facilities for defining units, managing fractional dimensions and printing out dimensioned quantities. For details, see the article "Implementation of a simple dimensionality checking system in Ada 2012" presented at the ACM SIGAda conference HILT 2012 (also attached below).
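
To give an idea of what these aspects look like, here is a simplified sketch of a few dimensioned subtypes, close to (but not a verbatim copy of) what System.Dim.Mks provides:

--  Simplified sketch; the real declarations live in System.Dim.Mks
subtype Length is Mks_Type
  with Dimension => (Symbol => 'm', Meter => 1, others => 0);

subtype Time is Mks_Type
  with Dimension => (Symbol => 's', Second => 1, others => 0);

subtype Speed is Mks_Type
  with Dimension => (Symbol => "m/s", Meter => 1, Second => -1, others => 0);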

The GNAT dimension system did not attempt to deal with generics though. As noted in the previous work by Grein, Kazakov and Wilson in "A survey of Physical Units Handling Techniques in Ada":

The conflict between the requirements 1 [Compile-Time Checks] and 2 [No Memory / Speed Overhead at Run-Time] on one side and requirement 4 [Generic Programming] on the other is the source of the problem of dimension handling.

So the solution in GNAT solved 1 and 2 while allowing generic programming but leaving it unchecked for dimensionality correctness. Here is the definition of generic programming by Grein, Kazakov and Wilson:

Here generic is used in a wider sense, as an ability to write code dealing with items of types from some type sets.  In our case it means items of different dimension. For instance, it should be possible to write a dimension-aware integration program, which would work for all valid combinations of dimensions.

This specific issue of programming a generic integration over time that would work for a variety of dimensions was investigated earlier this year by our partners from Technical University of Munich for their Glider autopilot software in SPARK. Working with them, we found a way to upgrade the dimensionality analysis in GNAT to support generic programming, which recently has been implemented in GNAT.

The goal was to apply dimensionality analysis on the instances of a generic, and to preserve dimension as much as possible across type conversions. In our upgraded dimensionality analysis in GNAT, a conversion from a dimensioned type (say, a length) to its dimensionless base type (the root of the dimension system) now preserves the dimension (length here), but will look like valid Ada code to any other Ada compiler. Which makes it possible to define a generic function Integral as follows:

    generic
        type Integrand_Type is digits <>;
        type Integration_Type is digits <>;
        type Integrated_Type is digits <>;
    function Integral (X : Integrand_Type; T : Integration_Type) return Integrated_Type;

    function Integral (X : Integrand_Type; T : Integration_Type) return Integrated_Type is
    begin
       return Integrated_Type (Mks_Type(X) * Mks_Type(T));
    end Integral;

We are using above the standard Mks_Type defined in System.Dim.Mks as root type, but we could do the same with the root of a user-defined dimension system. The generic code is valid Ada since the arguments X and T are converted into the same type (here Mks_Type), in order to multiply them without compilation error. The result, which is now of type Mks_Type, is then converted to the final type Integrated_Type. With the original dimensionality analysis in GNAT, dimensions were lost during these type conversions. 

However, with the upgraded analysis in GNAT, a conversion to the root type indicates that the dimensions of X and T have to be preserved and tracked when converting both to and from the root type Mks_Type. With this, still using the standard types defined in System.Dim.Mks, we can define an instance of this generic that integrates speed over time:

    function Velocity_Integral is new Integral(Speed, Time, Length);
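
A call site then reads naturally; in this sketch, Current_Speed and Dt stand for hypothetical Speed and Time values:

Distance : constant Length := Velocity_Integral (Current_Speed, Dt);
--  GNAT has already checked, when instantiating Velocity_Integral,
--  that multiplying a Speed by a Time indeed yields a Length.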

The GNAT-specific dimensionality analysis will perform additional checks for correct dimensionality in all such generic instances, while for any other Ada compiler this program still passes as valid (but not dimensionality-checked) Ada code. For example, for an invalid instance as:

    function Bad_Velocity_Integral is new Integral(Speed, Time, Mass);

GNAT issues the error:

dims.adb:10:05: instantiation error at line 5
dims.adb:10:05: dimensions mismatch in conversion
dims.adb:10:05: expression has dimension [L]
dims.adb:10:05: target type has dimension [M]

One subtlety that we faced when developing the Glider software at Technical University of Munich was that, sometimes, we do want to convert a value from a dimensioned type into another dimensioned type. This was the case in particular because we defined our own dimension system in which angles had their own dimension to verify angular calculations, which worked well most of the time.
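
For illustration only, such a dimension system with an extra angle dimension could be declared roughly like this (a sketch; the actual definitions in the Glider code differ, and the chosen base units here are just an example):

--  Sketch of a user-defined dimension system with an Angle dimension
type Unit_Type is new Float
  with Dimension_System =>
    ((Unit_Name => Meter,    Unit_Symbol => 'm',   Dim_Symbol => 'L'),
     (Unit_Name => Kilogram, Unit_Symbol => "kg",  Dim_Symbol => 'M'),
     (Unit_Name => Second,   Unit_Symbol => 's',   Dim_Symbol => 'T'),
     (Unit_Name => Ampere,   Unit_Symbol => 'A',   Dim_Symbol => 'I'),
     (Unit_Name => Kelvin,   Unit_Symbol => 'K',   Dim_Symbol => '@'),
     (Unit_Name => Radian,   Unit_Symbol => "Rad", Dim_Symbol => 'A'));

subtype Angle_Type is Unit_Type
  with Dimension => (Symbol => "Rad", Radian => 1, others => 0);

Radian : constant Angle_Type := 1.0;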

However, the angle dimension must be removed when multiplying an angle by a length, which produces an (arc) length, or when using an existing trigonometric function that expects a dimensionless argument. In these cases, a simple type conversion to the dimensionless root type is not enough, because the dimension of the input is now preserved. We found two solutions to this problem:

  • either define the root of our dimension system as a derived type from a parent type, say Base_Unit_Type, and convert to/from Base_Unit_Type to remove dimensions; or
  • explicitly insert conversion coefficients into the equations with dimensions such that the dimensions do cancel out as required.

For example, our use of an explicit Angle_Type with its own dimension (denoted A) first seemed to cause trouble because of conversions such as this one:

Distance := 2.0 * EARTH_RADIUS * darc; -- expected L, found L.A

where darc is of Angle_Type (dimension A) and EARTH_RADIUS of Length_Type (dimension L). First, we escaped the unit system as follows:

Distance := 2.0 * EARTH_RADIUS * Unit_Type(Base_Unit_Type(darc));

However, this bypasses the dimensionality checking system and can lead to dangerous mixing of physical dimensions. It would be possible to accidentally turn a temperature into a distance, without any warning. A safer way to handle this issue is to insert the missing units explicitly:

Distance := 2.0 * EARTH_RADIUS * darc * 1.0/Radian;

Here, Radian is the unit of Angle_Type, which we need to get rid of to turn an angle into a distance. In other words, the last term represents a coefficient with the required units to turn an angle into a distance. Thus, darc*1.0/Radian still carries the same value as darc, but is dimensionless as required per the equation, and GNAT can perform a dimensionality analysis also in such seemingly dimensionality-defying situations.

Moreover, this solution is less verbose than converting to the base unit type and then back. In fact, it can be made even shorter:

Distance := 2.0 * EARTH_RADIUS * darc/Radian;

With its improved dimensionality analysis, GNAT Pro 18 has solved the conflict between requirements 1 [Compile-Time Checks] and 2 [No Memory / Speed Overhead at Run-Time] on one side and requirement 4 [Generic Programming] on the other side, hopefully making Grein, Kazakov and Wilson happier! The dimensionality analysis in GNAT is a valuable feature for programs that deal with physical units. It increases readability by making dimensions more explicit and it reduces programming errors by checking the dimensions for consistency. For example, we used it on the StratoX Weather Glider from Technical University of Munich, as well as the RESSAC Use Case, an example of autonomous vehicle development used as a challenge for certification.

For more information on the dimensionality analysis in GNAT, see the GNAT User's Guide. In particular, the new rules that deal with conversions are at the end of the section, and we copy them verbatim below:

  The dimension vector of a type conversion T(expr) is defined as follows, based on the nature of T:
 -  If T is a dimensioned subtype then DV(T(expr)) is DV(T) provided that either expr is dimensionless or DV(T) = DV(expr). The conversion is illegal if expr is dimensioned and DV(expr) /= DV(T). Note that vector equality does not require that the corresponding Unit_Names be the same.
    As a consequence of the above rule, it is possible to convert between different dimension systems that follow the same international system of units, with the seven physical components given in the standard order (length, mass, time, etc.). Thus a length in meters can be converted to a length in inches (with a suitable conversion factor) but cannot be converted, for example, to a mass in pounds.
 -  If T is the base type for expr (and the dimensionless root type of the dimension system), then DV(T(expr)) is DV(expr). Thus, if expr is of a dimensioned subtype of T, the conversion may be regarded as a “view conversion” that preserves dimensionality.
    This rule makes it possible to write generic code that can be instantiated with compatible dimensioned subtypes. The generic unit will contain conversions that will consequently be present in instantiations, but conversions to the base type will preserve dimensionality and make it possible to write generic code that is correct with respect to dimensionality.
 -  Otherwise (i.e., T is neither a dimensioned subtype nor a dimensionable base type), DV(T(expr)) is the empty vector. Thus a dimensioned value can be explicitly converted to a non-dimensioned subtype, which of course then escapes dimensionality analysis.

Thanks to Ed Schonberg and Ben Brosgol from AdaCore for their work on the design and implementation of this enhanced dimensionality analysis in GNAT.

]]>
Make with Ada 2017: Brushless DC Motor Controller http://blog.adacore.com/make-with-ada-2017-brushless-dc-motor-controller Tue, 14 Nov 2017 14:33:39 +0000 Jonas Attertun http://blog.adacore.com/make-with-ada-2017-brushless-dc-motor-controller

Not long after my first experience with the Ada programming language, I got to know about the Make With Ada 2017 contest. And, just as it seemed, it turned out to be a great way to get a little bit deeper into the language I had just started to learn.

The ada-motorcontrol project involves the design of a BLDC motor controller software platform written in Ada. These types of applications often need to run fast, and the core control software is often tightly connected to the microcontroller peripherals. Coming from an embedded systems perspective with C as the reference language, the initial concern was whether an implementation in Ada could actually meet these requirements.

It turned out, on the contrary, that Ada is very capable of meeting both these requirements. In particular, accessing peripherals on the STM32 with the help of the Ada_Drivers_Library made the hardware-related operations even easier than using the HAL written in C by ST.

Throughout the project I found uses for many of Ada’s features. For example, representation clauses made it simple to extract data from received serial byte streams (and to assemble the ones to transmit). Moreover, contract-based programming and object-oriented concepts such as abstract types and generics provided the means to design clean, easy-to-use interfaces and a well-organized project.
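
As a rough illustration of the representation clause idea (the message layout, field names and package name below are invented for this example and are not taken from the project), a record type can pin each field to known bits so that a received byte maps directly onto a structured header:

with Interfaces; use Interfaces;
with Ada.Unchecked_Conversion;

package Protocol_Sketch is

   type Msg_Id_Field is mod 2 ** 4;
   type Length_Field is mod 2 ** 3;

   --  One received byte interpreted as a structured message header
   type Header is record
      Msg_Id   : Msg_Id_Field;
      Is_Reply : Boolean;
      Length   : Length_Field;
   end record;

   --  The representation clause fixes the bit layout of each field
   for Header use record
      Msg_Id   at 0 range 0 .. 3;
      Is_Reply at 0 range 4 .. 4;
      Length   at 0 range 5 .. 7;
   end record;
   for Header'Size use 8;

   --  Reinterpret a raw byte from the serial stream as a header
   function To_Header is
      new Ada.Unchecked_Conversion (Unsigned_8, Header);

end Protocol_Sketch;

Decoding then amounts to calling To_Header on a raw byte, and the inverse conversion can be used when assembling bytes to transmit.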

One of the objectives of the project was to provide a software platform to help develop various motor control applications, with the core functionality not being dependent on any particular hardware. Currently, however, it only supports a custom inverter board, since unfortunately I found that the HAL provided in Ada_Drivers_Library was not comprehensive enough to support all the peripheral features used. But the software is organized so as to keep driver-dependent code separate. To put this to the test, I welcome contributions to add support for other inverter boards. A good start would be the popular VESC board.


Ada Motor Controller Project Log:

Motivation

Recent advances in electric drive technologies (batteries, motors and power electronics) have led to ever higher output power per cost and higher power density. This in turn has increased the performance of existing motor control applications, but has also enabled some new ones - many of them popular projects amongst DIYers and makers, e.g. electric bikes, electric skateboards, hoverboards, Segways, etc.

At the hobby level, the safety aspects related to these are mostly ignored. Professional development of similar applications, however, normally needs to fulfill domain-specific standards that put requirements on, for example, the development process, coding conventions and verification methods. For example, the motor controller of an electric vehicle would need to be developed in accordance with ISO 26262 and, if the C language is used, MISRA-C, which defines a set of programming guidelines that aim to prevent unsafe usage of C language features.

Since the Ada programming language has been designed specifically for safety-critical applications, it could be a good alternative to C for implementing safe motor controllers used in e.g. electric vehicle applications. For a comparison of MISRA-C and Ada/SPARK, see this report. Ada is not only an alternative for achieving functional safety: during prototyping it is not uncommon that a mistake leads to destroyed hardware (a burned motor or power electronics). Been there, done that! Ada's stricter compiler could prevent such accidents.

Moreover, while Ada is not a particularly "new" language, it includes more of the features expected of a modern language than C provides. For example, types can be defined with a specified range, allowing value range checks already at compile time, and multitasking is built into the language. Ada also supports modularization very well, allowing e.g. easy integration of new control interfaces - probably the most likely change needed when using the controller for a new application.
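
As a small illustration of the range feature (the types below are invented for this example):

   type Duty_Cycle_Pct is range 0 .. 100;
   type Temperature_C  is range -40 .. 125;

   Duty : Duty_Cycle_Pct := 150;
   --  GNAT flags this at compile time ("value not in range of type ...");
   --  an out-of-range value computed at run time raises Constraint_Error.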

This project should consist of and fulfill:

  • Core software for controlling brushless DC motors, mainly aimed at hobbyists and makers.
  • Support both sensored and sensorless operation.
  • Open source software (and hardware).
  • Implementation in Ada on top of the Ravenscar runtime for the stm32f4xx.
  • Should not be too difficult to port to another microcontroller.

And specifically, for those wanting to learn the details of motor control, or extend with extra features:

  • Provide a basic, clean and readable implementation.
  • Short but helpful documentation. 
  • Meaningful and flexible logging.
  • Easy to add new control interfaces (e.g. CAN, ADC, Bluetooth, whatever).

Hardware

The board that will be used for this project is a custom board that I previously designed with the intent of getting some hands-on knowledge in motor control. It is completely open source and all project files can be found on GitHub.

  • Microcontroller STM32F446, ARM Cortex-M4, 180 MHz, FPU
  • Power MOSFETs 60 V
  • Inline phase current sensing
  • PWM/PPM control input
  • Position sensor input as either hall or quadrature encoder
  • Motor and board temp sensor (NTC)
  • Expansion header for UART/ADC/DAC/SPI/I2C/CAN

It can handle power ranges in the order of what is required by an electric skateboard or electric bike, depending on the used battery voltage and cooling. 

There are other inverter boards with similar specifications. One very popular is the VESC by Benjamin Vedder. It is probably not that difficult to port this project to work on that board as well.

Rough project plan

I thought it would be appropriate to write down a few bullets of what needs to be done. The list will probably grow...

  • Create a port of the Ravenscar runtime to the stm32f446 target on the custom board
  • Add stm32f446 as a device in the Ada Drivers Library
  • Get some sort of hello world application running to show that stuff works
  • Investigate and experiment with interrupt handling with regards to overhead
  • Create initialization code for all used mcu peripherals
  • Sketch the overall software architecture and define interfaces
  • Implementation
  • Documentation...

Support for the STM32F446

The microprocessor that will be used for this project is the STM32F446. In the current version of the Ada Drivers Library and the available Ravenscar embedded runtimes, there is no explicit support for this device. Fortunately, it is very similar to other processors in the stm32f4 family, so adding support for the stm32f446 was not very difficult once I understood the structure of the repositories. I forked these and added them as submodules in this project's repo.

Compared to the Ravenscar runtimes used by the discovery-boards, there are differences in external oscillator frequency, available interrupt vectors and memory sizes. Otherwise they are basically the same. 

An important tool needed to create the new driver and runtime variants is svd2ada. It generates device specification and body files in Ada based on an SVD file (basically XML) that describes what peripherals exist, what their registers look like, their addresses, existing interrupts, and so on. It was easy to use, but it was a little bit confusing how the flags/switches should be set when generating driver and runtime files. After some trial and error I think I got it right. I created a Makefile for generating all these files with the correct switches.

I could not find an SVD file for the stm32f446 directly from ST, but found one on the internet. It was not perfect, though. Some of the source code that uses the generated data types makes assumptions about the structure of these types, and depending on how the SVD file looks, svd2ada may or may not generate them in the expected way. There was also other missing and incorrect data in the SVD file, so I had to correct it manually. There are probably additional issues that I have not found yet...

It is alive!

I made a very simple application consisting of a task that is periodically delayed and toggles the two LEDs on the board each time the task resumes. The LEDs toggle with the expected period, so the oscillator seems to be initialized correctly.

Next up I need to map the different mcu pins to the corresponding hardware functionality and try to initialize the needed peripherals correctly. 

The control algorithm and its use of peripherals

There are several methods of controlling brushless motors, each with a specific use case. As a first approach I will implement sensored FOC (Field Oriented Control), where the user requests a current value (or torque value).

To simplify, this method can be divided into the following steps, repeated each PWM period (typically around 20 kHz):

  1. Sample the phase currents
  2. Transform the values into a rotor fixed reference frame
  3. Based on the requested current, calculate a new set of phase voltages
  4. Transform back to the stator's reference frame
  5. Calculate PWM duty cycles as to create the calculated phase voltages

Fortunately, the peripherals of the stm32f446 have a lot of features that make this easier to implement. For example, it is possible to trigger the ADC directly from the timers that drive the PWM. This way the sampling is automatically synchronized with the PWM cycle. Step 1 above can thus be started immediately when the ADC triggers the corresponding conversion-complete interrupt. In fact, many existing implementations perform all of steps 1 to 5 completely within an ISR. The reason for this is simply to reduce any unnecessary overhead, since the calculations performed are somewhat lengthy. The requested current is passed to the ISR via global variables.

I would instead like to spend as little time as possible in the ISR and trigger a separate task to perform all the calculations. The sampled current values and the requested current shall be passed via protected objects. All this will of course create more overhead. Maybe too much? This needs to be investigated.

PWM and ADC is up and running

I have spent some time configuring the PWM and ADC peripherals using the Ada Drivers Library. All in all it went well, but I had to do some smaller changes to the drivers to make it possible to configure the way I wanted. 

  • PWM is complementary output, center aligned with frequency of 20 kHz
  • PWM channels 1 to 3 generates the phase voltages
  • PWM channel 4 is used to trigger the ADC, this way it is possible to set where over the PWM period the sampling should occur
  • By default the sampling occurs in the middle of the positive waveform (V7)
  • The three ADCs are configured in Triple Multi Mode, meaning they are synchronized such that the phase quantities are all sampled at the same time.
  • Phase currents and voltages a,b,c are mapped to the injected conversions, triggered by the PWM channel 4
  • Board temperature and bus voltage are mapped to the regular conversions, triggered by a timer at 14 kHz
  • Regular conversions are moved to a volatile array using DMA automatically after the conversions complete
  • The ADC generates an interrupt after the injected conversions are complete

The drivers always assumed that the PWM outputs are mapped to a certain GPIO, so in order to configure the trigger channel I had to add a new procedure to the drivers. Also, the Scan Mode of the ADCs was not set up correctly for my configuration, and the configuration of the injected sequence order was simply incorrect. I will send a pull request to get these changes merged into the master branch.

Interrupt overhead/latency

As described in previous posts, the structure used for the interrupt handling is to spend a minimum of time in the interrupt context and to signal an awaiting task, which executes at a software priority level with interrupts fully enabled, to perform the calculations. The alternative method is to place all code in the interrupt context.

This Ada Gem and its following part describe two different approaches to this type of task synchronization. Both use a protected procedure as the interrupt handler but signal the awaiting task in different ways. The first uses an entry barrier and the second a Suspension Object. The idiom using the entry barrier has the advantage that it can pass data as an integrated part of the signaling, while the Suspension Object behaves more like a binary semaphore.

For the ADC conversion complete interrupt, I tested both methods. The protected procedure used as the ISR reads the converted values, consisting of six uint16. For the entry barrier method these were passed to the task using an out parameter. When using the second method, the task needed to collect the sample data using a separate function in the protected object.
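
For concreteness, here is a minimal sketch of the entry barrier idiom as used here; the interrupt name, the helper that reads the ADC registers and the other identifiers are invented for this illustration and do not match the actual project code:

with System;
with Interfaces;           use Interfaces;
with Ada.Interrupts.Names;

package ADC_Sync is

   type Phase_Samples is array (1 .. 6) of Unsigned_16;

   protected Handler
     with Interrupt_Priority => System.Interrupt_Priority'Last
   is
      --  The task blocks here until the ISR has stored new samples
      entry Await_Samples (Samples : out Phase_Samples);
   private
      procedure ISR
        with Attach_Handler => Ada.Interrupts.Names.ADC_Interrupt;

      Latest       : Phase_Samples := (others => 0);
      Data_Pending : Boolean := False;
   end Handler;

end ADC_Sync;

package body ADC_Sync is

   --  Placeholder for reading the six injected conversion results
   function Read_Injected_Samples return Phase_Samples is
   begin
      return (others => 0);
   end Read_Injected_Samples;

   protected body Handler is

      procedure ISR is
      begin
         Latest       := Read_Injected_Samples;
         Data_Pending := True;  --  Opens the entry barrier
      end ISR;

      entry Await_Samples (Samples : out Phase_Samples)
        when Data_Pending is
      begin
         Samples      := Latest;
         Data_Pending := False;
      end Await_Samples;

   end Handler;

end ADC_Sync;

The Suspension Object variant would instead call Ada.Synchronous_Task_Control.Set_True from the ISR, and the task would then fetch the samples through a separate protected function.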

I define overhead in this context as the time from when the ADC generates the interrupt to the time the event-triggered task starts running. This includes, first, an ISR wrapper that is part of the runtime and that calls the installed protected procedure; second, the execution time of the protected procedure, which reads the sampled data; and finally, the signaling of the awaiting task.

I measured an approximation of the overhead by setting a pin high at the beginning of the protected procedure and then setting it low in the waiting task directly after it wakes up from the signaling. For the Suspension Object case the pin was set low after the read-data function call, i.e. in both cases once the sampled data had been copied to the task. The code was compiled with the -O3 flag.

The first idiom resulted in an overhead of ~8.4 us, and the second ~10 us. This should be compared to the period of the PWM which at 20 kHz is 50 us. Obviously the overhead is not negligible, so I might consider using the more common approach for motor control applications of having the current control algorithm in the interrupt context instead. However, until the execution time of the algorithm is known, the entry barrier method will be assumed... 

Note: "Overhead" might be the wrong term since I don't know if during the time measured the cpu was really busy. Otherwise it should be called latency I think...

Purple: Center aligned PWM at 50 % duty where the ADC triggers in the center of the positive waveform. Yellow: Pin state as described above. High means time of overhead/latency.

Reference frames

A key benefit of the FOC algorithm is that the actual control is performed in a reference frame that is fixed to the rotor. This way the sinusoidal three-phase currents, as seen in the stator's reference frame, are instead represented as two DC values, assuming steady-state operation. The transforms used (Clarke and Park) require that the angle between the rotor and the stator is known. As a first step I am using a quadrature encoder, since that provides a very precise measurement with very low overhead thanks to the hardware support of the stm32.

Three types have been defined, each representing a particular reference frame: Abc, Alfa_Beta and Dq. Using the transforms above, one can simply write:

declare
   Iabc  : Abc;  --  Measured current (stator ref)
   Idq   : Dq;   --  Measured current (rotor ref)
   Vdq   : Dq;   --  Calculated output voltage (rotor ref)
   Vabc  : Abc;  --  Calculated output voltage (stator ref)
   Angle : constant Float := ...;
begin
   Idq := Iabc.Clarke.Park(Angle);

   --  Do the control...

   Vabc := Vdq.Park_Inv(Angle).Clarke_Inv;
end;

Note that Park and Park_Inv both use the same angle. To be precise, they both use Sin(Angle) and Cos(Angle). Now, at first, I simply implemented these by letting each transform calculate Sin and Cos locally. Of course, that is a waste for this particular application. Instead, I defined an angle object that, when created, also computes Sin and Cos of the angle, and added versions of the transforms that use these "ahead-computed" values instead.

declare
   --  Same...

   Angle : constant Angle_Obj := Compose (Angle_Rad); 
   --  Calculates Sin and Cos
begin
   Idq := Iabc.Clarke.Park(Angle);

   --  Do the control...

   Vabc := Vdq.Park_Inv(Angle).Clarke_Inv;
end;

This reduced the execution time somewhat (not as much as I thought, though), since the trigonometric functions are the heavy part. Using lookup table based versions instead of the ones provided by Ada.Numerics might be even faster...
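
As a rough idea of what such a lookup table version could look like (the table size and names are arbitrary, and no interpolation is done between entries):

package Fast_Trig is
   function Fast_Sin (Angle_Rad : Float) return Float;
end Fast_Trig;

with Ada.Numerics;                      use Ada.Numerics;
with Ada.Numerics.Elementary_Functions; use Ada.Numerics.Elementary_Functions;

package body Fast_Trig is

   Table_Size : constant := 256;
   Table      : array (0 .. Table_Size - 1) of Float;

   function Fast_Sin (Angle_Rad : Float) return Float is
      --  Map the angle onto one revolution and quantize to a table index
      Turns : constant Float := Angle_Rad / (2.0 * Pi);
      Frac  : constant Float := Turns - Float'Floor (Turns);
      Idx   : constant Natural :=
        Natural (Float'Floor (Frac * Float (Table_Size)));
   begin
      return Table (Idx mod Table_Size);
   end Fast_Sin;

begin
   --  Precompute the table once at elaboration
   for I in Table'Range loop
      Table (I) := Sin (2.0 * Pi * Float (I) / Float (Table_Size));
   end loop;
end Fast_Trig;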

It spins!

The main structure of the current controller is now in place. When a button on the board is pressed the sensor is aligned to the rotor by forcing the rotor to a known angle. Currently, the requested q-current is set by a potentiometer. 

As of now, it is definitely not tuned properly, but at least it shows that the general algorithm is working as intended.

In order to make this project easier to develop on, both for myself and for other users, I need to add some logging and tuning capabilities. This should allow a user to change and/or log variables in the application (e.g. control parameters) while the controller is running. I have written a tool for doing this (over serial) before, but that was in C. It would be interesting to rewrite it in Ada.

Contract Based Programming

So far, I have not used this feature much. But when writing code for the logging functionality I ran into a good fit for it. 

I am using Consistent Overhead Byte Stuffing (COBS) to encode the data sent over UART. This encoding results in unambiguous packet framing regardless of packet content, thus making it easy for receiving applications to recover from malformed packets. The packets are separated by a delimiter (value 0 in this case), making it easy to synchronize the receiving parser. The encoding ensures that the encoded packet itself does not contain the delimiter value.

A good feature of COBS is that, as long as the raw data length is less than 254 bytes, the overhead due to the encoding is always exactly one byte. I could of course simply state this fact as a comment on the encode/decode functions, allowing the user to make this assumption in order to simplify their code. A better way is to write this condition as contracts:

   Data_Length_Max : constant Buffer_Index := 253;

   function COBS_Encode (Input : access Data)
                         return Data
   with
      Pre => Input'Length <= Data_Length_Max,
      Post => (if Input'Length > 0 then
                  COBS_Encode'Result'Length = Input'Length + 1
               else
                  Input'Length = COBS_Encode'Result'Length);

   function COBS_Decode (Encoded_Data : access Data)
                         return Data
   with
      Pre => Encoded_Data'Length <= Data_Length_Max + 1,
      Post => (if Encoded_Data'Length > 0 then
                  COBS_Decode'Result'Length = Encoded_Data'Length - 1
               else
                  Encoded_Data'Length = COBS_Decode'Result'Length);

Logging and Tuning

I just got the logging and tuning feature working. It is an Ada implementation of the protocol used by a previous project of mine, Calmeas. It enables the user to log and change the value of variables in the application in real time. This is very helpful when developing systems where the debugger cannot read and write memory while the target is running.

The data is sent and received over UART, encoded by COBS. The interfaces of the uart and cobs packages implement an abstract stream type, meaning it is very simple to replace the UART with some other medium, and that e.g. COBS can be skipped if desired.
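
A hedged sketch of what such an abstract stream interface could look like (the names are illustrative, not the actual package names in the project); both the UART driver and the COBS layer can implement it:

with Interfaces;

package Stream_Interface is

   type Data is array (Positive range <>) of Interfaces.Unsigned_8;

   type Base_Stream is limited interface;

   --  Send a buffer over the underlying medium
   procedure Write (Stream : in out Base_Stream;
                    Buffer : Data) is abstract;

   --  Receive into a buffer; Last is the index of the last byte read
   procedure Read (Stream : in out Base_Stream;
                   Buffer : out Data;
                   Last   : out Natural) is abstract;

end Stream_Interface;

A COBS stream implementing Base_Stream would then hold a reference to another Base_Stream'Class (the UART, or any other medium) and encode/decode before delegating to it.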

Example

The user can simply do the following in order to make the variable V_Bus_Log loggable and/or tunable:

V_Bus_Log  : aliased Voltage_V;
...
Calmeas.Add (Symbol      => V_Bus_Log'Access,
             Name        => "V_Bus",
             Description => "Bus Voltage [V]");

It works for (un)signed integers of size 8, 16 and 32 bits, and for floats. 

After adding a few variables and connecting the target to the GUI:

As an example, this could be used to tune the current controller gains:

As expected, the actual current comes closer to the reference as the gain increases.

As of now, the tuning is not done in a "safe" way. The writing to added symbols is done by a separate task named Logger, simply by doing unchecked writes to the address of the added symbol, one byte at a time. At the same time, the application is reading the symbol's value from another task with higher priority. The optimal way would be to pass the value through a protected type, but since the tuning is mostly for debugging purposes, I will do it the proper way later on...
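
A minimal sketch of that proper way could be a small generic like the following (invented for this illustration), where the Logger task calls Set and the control task calls Get:

generic
   type Value_Type is private;
   Initial : Value_Type;
package Tunable_Value is

   protected Holder is
      procedure Set (V : Value_Type);
      function Get return Value_Type;
   private
      Value : Value_Type := Initial;
   end Holder;

end Tunable_Value;

package body Tunable_Value is

   protected body Holder is

      procedure Set (V : Value_Type) is
      begin
         Value := V;
      end Set;

      function Get return Value_Type is
      begin
         return Value;
      end Get;

   end Holder;

end Tunable_Value;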

Note that the host GUI is not written in Ada (but Python), and is not itself a part of this project. 

Architecture overview

Here is a figure showing an overview of the software:

Summary

This project involves the design of a software platform that provides a good basis when developing motor controllers for brushless motors. It consists of a basic but clean and readable implementation of a sensored field oriented control algorithm. Included is a logging feature that simplifies development and allows users to visualize what is happening. The project shows that Ada can successfully be used for a bare-metal project that requires fast execution.

The design is, thanks to Ada's many nice features, much easier to understand than a lot of the other C implementations out there, where, in the worst case, everything is done in a single ISR. The combination of increased design readability and the strictness of Ada makes the resulting software safer and simplifies further collaborative development and reuse.

Some highlights of what has been done:

  • Porting of the Ravenscar profiles to a custom board using the STM32F446
  • Adding support for the STM32F446 to Ada_Drivers_Library project
  • Adding some functionality to Ada_Drivers_Library in order to fully use all peripheral features
  • Fixing a bug in Ada_Drivers_Library related to a bit more advanced ADC usage
  • Written HAL-ish packages so that it is easy to port to a device other than the STM32
  • Written a communication package and defined interfaces in order to make it easier to add control inputs.
  • Written a logging package that allows the developer to debug, log and tune the application in real-time.
  • Implemented a basic controller using sensored field oriented control
  • Well documented specifications with a generated html version

Future plans:

  • Add hall sensor support and 6-step block commutation
  • Add sensorless operation
  • Add CAN support (the pcb has currently no transceiver, though)
  • SPARK proving
  • Write some additional examples showing how to use the interfaces.
  • Port the software to the popular VESC-board.

]]>
Prove in the Cloud http://blog.adacore.com/prove-in-the-cloud Wed, 18 Oct 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/prove-in-the-cloud

We have put together a byte (8 bits) of examples of SPARK code on a server in the cloud here (many thanks to Nico Setton for that!).

Each example consists of very few lines, a few files at most, and demonstrates an interesting feature of SPARK. The initial version is incorrect, and hitting the "Prove" button returns messages that point to the errors. By following the fix suggested in the comments, you should be able to get the code to prove automatically.

Of course, you could already do this by installing SPARK GPL yourself on a machine (download here and follow these instructions to install additional provers). The benefit of this webpage is that anyone can now experiment live with SPARK without first installing the toolset. This is very much inspired by what Microsoft Research has done with other verification tools as part of their rise4fun website.

Something particularly interesting for academics is that all the code for this widget is open source. So you can set up your own proof server for hands-on sessions, with your own exercises, in a matter of minutes! Just clone the code_examples_server project from GitHub, follow the instructions in the README, populate your server with your exercises and examples, and you're set! No need to ask IT to set up machines for your students; they just need a browser pointed at your server location. An exercise consists of a directory with:

  • a file example.yaml with the name and description of the exercise (in YAML syntax)
  • a GNAT project file main.gpr (could be almost empty, or force the use of SPARK_Mode so that all the code is analyzed)
  • the source files for this exercise

For inspiration, see the examples from the Compile_And_Prove_Demo project from GitHub, inside directory 'examples', which were used to populate our little online proof webpage.

Feel free to report problems or suggest improvements to the widget or the examples on their respective GitHub project pages.

]]>
SPARK Tutorial at FDL Conference http://blog.adacore.com/spark-tutorial-at-fdl-conference Tue, 12 Sep 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/spark-tutorial-at-fdl-conference

Researcher Martin Becker from Technische Universität München is giving a SPARK tutorial next week, on Monday 18th, at the Forum on specification & Design Languages in Verona, Italy. Even if you cannot attend, you may find it useful to look at the material for his tutorial, with a complete cookbook to install and set up SPARK, and a 90-minute slide deck packed with rich and practical information about SPARK, as well as neat hands-on exercises to get a feel for using SPARK in practice.

It is all here with the very good slides in particular. 

]]>
New SPARK Cheat Sheet http://blog.adacore.com/new-spark-cheat-sheet Thu, 24 Aug 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/new-spark-cheat-sheet

Our good friend Martin Becker has produced a new cheat sheet for SPARK, which you may find useful as a quick reminder of syntax that you have not used for some time. It is simpler than the cheat sheets we already had in English and Japanese, and depending on your style you will prefer one or the other. It is also particularly useful for trainings! (And I'll use it for sure, since I have multiple trainings in the coming months...)

]]>
Highlighting Ada with Libadalang http://blog.adacore.com/highlighting-ada-with-libadalang Tue, 08 Aug 2017 12:30:00 +0000 Pierre-Marie de Rodat http://blog.adacore.com/highlighting-ada-with-libadalang

While we are working very hard on semantic analysis in Libadalang, it is already possible to leverage its lexical and syntactic analyzers. A useful example for this is a syntax highlighter.

In the context of programming languages, syntax highlighters make it easier for us, humans, to read source code. For example, formatting keywords with a bold font and a special color, while leaving identifiers with a default style, enables a quick understanding of a program’s structure. Syntax highlighters are so useful that they’re integrated into daily developer tools:

  • most “serious” code editors provide highlighters for tons of programming languages (C, Python, Ada, OCaml, shell scripts), markup languages (XML, HTML, BBcode), DSLs (Domain Specific Languages: SQL, TeX), etc.
  • probably all online code browsers ship syntax highlighters at least for mainstream languages: GitHub, Bitbucket, GitLab, …

From programming language theory, that many of us learned in engineering school (some even with passion!), we can distinguish two highlighting levels:

  1. Token (lexeme)-based highlighting. Each token is given a style based on its token kind (keyword, comment, integer literal, identifier, …).

  2. Syntax-based highlighting. This one is higher-level: for example, give a special color for identifiers that give their name to functions and another color for identifiers that give their name to type declarations.

Most syntax highlighting engines, for instance pygments’s or vim’s, are based on regular expressions, which don’t offer highlighter writers the same formalism as what we just described. Generally, regular expressions only enable an approximation of the two highlighting levels, which yields something a bit fuzzy. On the other hand, it is much easier to write such highlighters and to get pretty results quickly than with a fully fledged lexer/parser, which is probably why regular expressions are so popular in this context.

But here we already have a full lexer/parser for the Ada language: Libadalang. So why not use it to build a “perfect” syntax highlighter? This is going to be the exercise of the day. All blog posts so far only demonstrated the use of Libadalang’s Python API; Libadalang is primarily an Ada library, so let’s use its Ada API for once!

One disclaimer, first: the aim of this post is not to say that the world of syntax highlighters is broken and should be re-written with compiler-level lexers and parsers. Regexp-based highlighters are a totally fine compromise in contexts such as text editors; here we just demonstrate how Libadalang can be used to achieve a similar goal, but keep in mind that the result is not technically equivalent. For instance, what we will do below will require valid input Ada sources and will only work one file at a time, unlike editors that might need to work on smaller granularity items to keep the UI responsive, which is more important in this context than correctness.

Okay so, how do we start?

The first thing to do as soon as one wants to use Libadalang is to create an analysis context: this is an object that will enclose the set of source files to process.

with Libadalang.Analysis;

package LAL renames Libadalang.Analysis;
Ctx : LAL.Analysis_Context := LAL.Create;

Good. At the end of our program, we need to release the resources that were allocated in this context:

LAL.Destroy (Ctx);

Now everything interesting must happen in between. Let’s ask Libadalang to parse a source file:

Unit : LAL.Analysis_Unit := LAL.Get_From_File
   (Ctx, "my_source_file.adb", With_Trivia => True);

The analysis of a source file yields what we call in Libadalang an analysis unit. This unit is tied to the context used to create it.

Here, we also enable a peculiar analysis option: With_Trivia tells Libadalang not to discard “trivia” tokens. Inspired by the Roslyn Compiler, what we call a trivia token is a token that is ignored when parsing source code. In Ada, as in most (all?) programming languages, comments are trivia: developers are allowed to put any number of them anywhere between two tokens; this will not change the validity of the program nor its semantics. Because of this, most compiler implementations just discard them: keeping them around would hurt performance for no gain. Libadalang is designed for all kinds of tools, not only compilers, so we give the user the choice of whether or not to keep trivia around.

What we are trying to do here is to highlight a source file. We want to highlight comments as well, so we need to ask to preserve trivia.

At this point, a real-world program would have to check that parsing completed without errors. We are just playing here, so we’ll skip that, but you can have a look at the Libadalang.Analysis.Has_Diagnostic and Libadalang.Analysis.Diagnostics functions if you want to take care of this.

Fine, so we assume parsing went well and now we just have to go through tokens and assign a specific style to each of them. First, let’s have a look at the various token-related data types in Libadalang we have to deal with:

  • LAL.Token_Type: reference to a token/trivia in a specific analysis unit. Think of it as a cursor in a standard container. There is one special value: No_Token which, as you may guess, is used to represent the end of the token stream or just an invalid reference, like a null access.

  • LAL.Token_Data_Type: holder for the data related to a specific token. Namely: token kind, whether it’s trivia, index in the token/trivia stream and source location range.

  • Libadalang.Lexer.Token_Data_Handlers.Token_Index: a type derived from Integer to represent the indexes in token/trivia streams.

Then let’s define holders to annotate the token stream:

type Highlight_Type is (Text, Comment, Keyword, Block_Name, ...);

Instead of directly assigning colors to the various token kinds, this enumeration defines categories for highlighting. This makes it possible to provide different highlighting styles later: one set of colors for a white background, and another one for a black background, for example.

subtype Token_Index is
   Libadalang.Lexer.Token_Data_Handlers.Token_Index;

type Highlight_Array is
   array (Token_Index range <>) of Highlight_Type;

type Highlights_Holder (Token_Count, Trivia_Count : Token_Index) is
record
   Token_Highlights  : Highlight_Array (1 .. Token_Count);
   Trivia_Highlights : Highlight_Array (1 .. Trivia_Count);
end record;

In Libadalang, even though tokens and trivia make up a logical interleaved stream, they are stored as two separate streams, hence the need for two arrays. So here is a procedure to make the annotating process easier:

procedure Set
  (Highlights : in out Highlights_Holder;
   Token      : LAL.Token_Data_Type;
   HL         : Highlight_Type)
is
   Index : constant Token_Index := LAL.Index (Token);
begin
   if LAL.Is_Trivia (Token) then
      Highlights.Trivia_Highlights (Index) := HL;
   else
      Highlights.Token_Highlights (Index) := HL;
   end if;
end Set;

Now let’s start the actual highlighting! We begin with the token-based one as described earlier.

Basic_Highlights : constant
  array (Libadalang.Lexer.Token_Kind) of Highlight_Type :=
 (Ada_Identifier => Identifier,
      Ada_All .. Ada_Return
    | Ada_Elsif | Ada_Reverse
    | -- ...
      => Keyword,
    --  ...
  );

The above declaration associates a highlighting class with each token kind defined in Libadalang.Lexer. The only work left is to determine highlighting classes by iterating over each token in Unit:

Token : LAL.Token_Type := LAL.First_Token (Unit);

while Token /= LAL.No_Token loop
   declare
      TD : constant LAL.Token_Data_Type := LAL.Data (Token);
      HL : constant Highlight_Type :=
         Basic_Highlights (LAL.Kind (TD));
   begin
      Set (Highlights, TD, HL);
   end;
   Token := LAL.Next (Token);
end loop;

Easy, right? Once this code has run, we already have a pretty decent highlighting for our analysis unit! The second pass is just a refinement that uses syntax as described at the top of this blog post:

function Syntax_Highlight
  (Node : access LAL.Ada_Node_Type'Class) return LAL.Visit_Status;
 
LAL.Traverse (LAL.Root (Unit), Syntax_Highlight'Access);

LAL.Traverse will traverse Unit’s syntax tree (AST) and call the Syntax_Highlight function on each node. This function is a big dispatcher on the kind of the visited node:

function Syntax_Highlight
  (Node : access LAL.Ada_Node_Type'Class) return LAL.Visit_Status
is
   procedure Highlight_Block_Name
     (Name       : access LAL.Name_Type'Class;
      Highlights : in out Highlights_Holder) is
   begin
      Highlight_Name (Name, Block_Name, Highlights);
   end Highlight_Block_Name;
begin
   case Node.Kind is
      when LAL.Ada_Subp_Spec =>
         declare
            Subp_Spec : constant LAL.Subp_Spec :=
               LAL.Subp_Spec (Node);

            Params : constant LAL.Param_Spec_Array_Access :=
               Subp_Spec.P_Node_Params;

         begin
            Highlight_Block_Name
              (Subp_Spec.F_Subp_Name, Highlights);
            Highlight_Type_Expr
              (Subp_Spec.F_Subp_Returns, Highlights);
            for Param of Params.Items loop
               Highlight_Type_Expr
                 (Param.F_Type_Expr, Highlights);
            end loop;
         end;

      when LAL.Ada_Subp_Body =>
         Highlight_Block_Name
           (LAL.Subp_Body (Node).F_End_Id, Highlights);

      when LAL.Ada_Type_Decl =>
         Set (Highlights,
              LAL.Data (Node.Token_Start),
              Keyword_Type);
         Highlight_Block_Name
           (LAL.Type_Decl (Node).F_Type_Id, Highlights);

      when LAL.Ada_Subtype_Decl =>
         Highlight_Block_Name
           (LAL.Subtype_Decl (Node).F_Type_Id, Highlights);

      --  ...
   end case;
   return LAL.Into;
end Syntax_Highlight;

Depending on the nature of the AST node to process, we apply specific syntax highlighting rules. For example, the first one above: for subprogram specifications (Subp_Spec), we highlight the name of the subprogram as a “block name” while we highlight type expressions for the return type and the type of all parameters as “type expressions”. Let’s go deeper: how do we highlight names?

procedure Highlight_Name
  (Name       : access LAL.Name_Type'Class;
   HL         : Highlight_Type;
   Highlights : in out Highlights_Holder) is
begin
   if Name = null then
      return;
   end if;

   case Name.Kind is
      when LAL.Ada_Identifier | LAL.Ada_String_Literal =>
         --  Highlight the only token that this node has
         declare
            Tok : constant LAL.Token_Type :=
              LAL.Single_Tok_Node (Name).F_Tok;
         begin
            Set (Highlights, LAL.Data (Tok), HL);
         end;

      when LAL.Ada_Dotted_Name =>
         --  Highlight both the prefix, the suffix and the
         --  dot token.

         declare
            Dotted_Name : constant LAL.Dotted_Name :=
               LAL.Dotted_Name (Name);
            Dot_Token   : constant LAL.Token_Type :=
               LAL.Next (Dotted_Name.F_Prefix.Token_End);
         begin
            Highlight_Name
              (Dotted_Name.F_Prefix, HL, Highlights);
            Set (Highlights, LAL.Data (Dot_Token), HL);
            Highlight_Name
              (Dotted_Name.F_Suffix, HL, Highlights);
         end;

      when LAL.Ada_Call_Expr =>
         --  Just highlight the name of the called entity
         Highlight_Name
           (LAL.Call_Expr (Name).F_Name, HL, Highlights);

      when others =>
         return;
   end case;
end Highlight_Name;

The above may be quite long, but what it does isn’t new: just as in the Syntax_Highlight function, we execute various actions depending on the kind of the input AST node. If it’s a mere identifier, then we just have to highlight its single token. If it’s a dotted name (X.Y in Ada), we highlight the prefix (X), the suffix (Y) and the dot in between as names. And so on.

At this point, we could create other syntactic highlighting rules for remaining AST nodes. This blog post is already quite long, so we’ll stop there.

There is one piece missing before our syntax highlighter can become actually useful: outputting the formatted source code. Let’s output HTML, as this format is easy to produce and quite universal. We start with a helper analogous to the previous Set procedure, to deal with the dual token/trivia streams:

function Get
  (Highlights : Highlights_Holder;
   Token      : LAL.Token_Data_Type) return Highlight_Type
is
   Index : constant Token_Index := LAL.Index (Token);
begin
   return (if LAL.Is_Trivia (Token)
           then Highlights.Trivia_Highlights (Index)
           else Highlights.Token_Highlights (Index));
end Get;

And now let’s get to the output itself. This starts with a simple iteration over tokens, so the outline is similar to the first highlighting pass we did above:

Token     : LAL.Token_Type := LAL.First_Token (Unit);
Last_Sloc : Slocs.Source_Location := (1, 1);

Put_Line ("<pre>");
while Token /= LAL.No_Token loop
   declare
      TD         : constant LAL.Token_Data_Type :=
         LAL.Data (Token);
      HL         : constant Highlight_Type :=
         Get (Highlights, TD);
      Sloc_Range : constant Slocs.Source_Location_Range :=
         LAL.Sloc_Range (TD);

      Text : constant Langkit_Support.Text.Text_Type :=
         LAL.Text (Token);
   begin
      while Last_Sloc.Line < Sloc_Range.Start_Line loop
         New_Line;
         Last_Sloc.Line := Last_Sloc.Line + 1;
         Last_Sloc.Column := 1;
      end loop;

      if Sloc_Range.Start_Column > Last_Sloc.Column then
         Indent (Integer (Sloc_Range.Start_Column - Last_Sloc.Column));
      end if;

      Put_Token (Text, HL);
      Last_Sloc := Slocs.End_Sloc (Sloc_Range);
   end;
   Token := LAL.Next (Token);
end loop;
Put_Line ("</pre>");

The tricky part here is that tokens alone are not enough: we use the source location information (line and column numbers) associated with tokens in order to re-create line breaks and whitespace; this is what the inner while loop and if statement do. As usual, we delegate “low-level” actions to dedicated procedures:

procedure Put_Token
  (Text : Langkit_Support.Text.Text_Type;
   HL   : Highlighter.Highlight_Type) is
begin
   Put ("<span style=""color: #" & Get_HTML_Color (HL)
        & ";"">");
   Put (Escape (Text));
   Put ("</span>");
end Put_Token;

procedure New_Line is
begin
   Put ((1 => ASCII.LF));
end New_Line;

procedure Indent (Length : Natural) is
begin
   Put ((1 .. Length => ' '));
end Indent;

Writing the Escape function, which turns special HTML characters such as < or > into HTML entities (&lt;, &gt;, and so on), and Get_HTML_Color, which returns a suitable hexadecimal string encoding the color corresponding to a highlighting category (for instance #ff0000, i.e. red, for keywords), is left as an exercise to the reader.
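
As an indication, Get_HTML_Color can be as simple as a case statement over Highlight_Type; the colors below are just one possible scheme:

function Get_HTML_Color (HL : Highlight_Type) return String is
begin
   case HL is
      when Keyword    => return "b22222";
      when Comment    => return "808080";
      when Block_Name => return "8b008b";
      when others     => return "000000";
   end case;
end Get_HTML_Color;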

Note that Escape must deal with a Text_Type formal. This type, which is really a subtype of Wide_Wide_String, is used to encode source excerpts in a uniform way in Libadalang, regardless of the input encoding. In order to do something useful with it, one must transcode it, for example to UTF-8. One way to do this is to use GNATCOLL.Iconv, but this is out of the scope of this post.

So here we are! Now you know how to:

  • parse Ada source files with Libadalang;

  • iterate on the stream of tokens/trivia in the resulting analysis unit, as well as process the associated data;

  • traverse the syntax tree of this unit;

  • combine the above in order to create a syntax highlighter.

Thank you for reading this post to the end! If you are interested in pursuing this road, you can find a compilable set of sources for this syntax highlighter on Libadalang’s repository on Github. And because we cannot decently dedicate a whole blog post to a syntax highlighter without a little demo, here is one:

Little demo of Ada source code syntax highlighting with Libadalang
]]>
Pretty-Printing Ada Containers with GDB Scripts http://blog.adacore.com/pretty-printing-ada-containers-with-gdb-scripts Tue, 25 Jul 2017 13:00:00 +0000 Pierre-Marie de Rodat http://blog.adacore.com/pretty-printing-ada-containers-with-gdb-scripts

When things don’t work as expected, developers usually do one of two things: either add debug prints to their programs, or run their programs under a debugger. Today we’ll focus on the latter activity.

Debuggers are fantastic tools. They turn our compiled programs from black boxes into glass ones: you can interrupt your program at any point during its execution, see where it stopped in the source code, inspect the value of your variables (even some specific array item), follow chains of accesses, and even modify all these values live. How powerful! However, sometimes there’s so much information available that navigating through it to reach the bit of state you want to inspect is just too complex.

Take a complex container, such as an ordered map from Ada.Containers.Ordered_Maps, for example. These are implemented in GNAT as binary trees: collections of nodes, each node holding a link to its parent, links to its left and right children, a key and a value. Unfortunately, finding the particular node in a debugger that corresponds to the key you are looking for is a painful task. See for yourself:

with Ada.Containers.Ordered_Maps;

procedure PP is
   package Int_To_Nat is
      new Ada.Containers.Ordered_Maps (Integer, Natural);

   Map : Int_To_Nat.Map;
begin
   for I in 0 .. 9 loop
      Map.Insert (I, 10 * I);
   end loop;

   Map.Clear;  --  BREAK HERE
end PP;

Build this program with debug information and execute it until line 13:

$ gnatmake -q -g pp.adb
$ gdb -q ./pp
Reading symbols from ./pp...done.
(gdb) break pp.adb:13
Breakpoint 1 at 0x406a81: file pp.adb, line 13.
(gdb) r
Breakpoint 1, pp () at pp.adb:13
13         Map.Clear;  --  BREAK HERE

(gdb) print map
$1 = (tree => (first => 0x64e010, last => 0x64e1c0, root => 0x64e0a0, length => 10, tc => (busy => 0, lock => 0)))

# “map” is a record that contains a bunch of accesses…
# not very helpful. We need to go deeper.
(gdb) print map.tree.first.all
$2 = (parent => 0x64e040, left => 0x0, right => 0x0, color => black, key => 0, element => 0)

# Ok, so what we just saw above is the representation of the node
# that holds the key/value association for key 0 and value 0. This
# first node has no child (see left and right above), so we need to
# inspect its parent:
(gdb) print map.tree.first.parent.all
$3 = (parent => 0x64e0a0, left => 0x64e010, right => 0x64e070, color => black, key => 1, element => 10)

# Great, we have the second element! It has a left child,
# which is our first node ($2). Now let’s go to its right
# child:
(gdb) print map.tree.first.parent.right.all
$4 = (parent => 0x64e040, left => 0x0, right => 0x0, color => black, key => 2, element => 20)

# That was the third element: this one has no left or right
# child, so we have to get to the parent of $3:
(gdb) print map.tree.first.parent.parent.all
$5 = (parent => 0x0, left => 0x64e040, right => 0x64e100, color => black, key => 3, element => 30)

# So far, so good: we already visited the left child ($4), so
# now we need to visit $5’s right child:
(gdb) print map.tree.first.parent.parent.right.all
$6 = (parent => 0x64e0a0, left => 0x64e0d0, right => 0x64e160, color => black, key => 5, element => 50)

# Key 5? Where’s the node for the key 4? Oh wait, we should
# also visit the left child of $6:
(gdb) print map.tree.first.parent.parent.right.left.all
$7 = (parent => 0x64e100, left => 0x0, right => 0x0, color => black, key => 4, element => 40)

# Ad nauseam…

Manually visiting a binary tree is much easier for computers than it is for humans, as everyone knows. So in this case, it seems easier to write a debug procedure in Ada that iterates over the container and prints each key/value association, and to call this debug procedure from GDB.
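
Such a debug procedure, for the Int_To_Nat instantiation above, could be as simple as the following sketch (it just needs to be compiled somewhere it can see the instantiation, so that GDB can call it):

with Ada.Text_IO; use Ada.Text_IO;

procedure Debug_Print (Map : Int_To_Nat.Map) is
begin
   for Cursor in Map.Iterate loop
      Put_Line (Integer'Image (Int_To_Nat.Key (Cursor))
                & " =>" & Natural'Image (Int_To_Nat.Element (Cursor)));
   end loop;
end Debug_Print;

It can then be invoked from the GDB prompt with the call command.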

But this has its own drawbacks: first, it forces you to remember to write such a procedure for each container instantiation you do, possibly forcing you to rebuild your program each time you debug it. But there’s worse: if you debug your program from a core dump, it’s not possible to call a debug procedure from GDB. Besides, if the state of your program is somehow corrupted, due to a stack overflow or a subtle memory handling bug (dangling pointers, etc.), calling this debug procedure will probably corrupt your process even more, making the debugging session a nightmare!

This is where GDB comes to the rescue: there’s a feature called pretty-printers, which makes it possible to hook into GDB to customize how it displays values. For example, you can write a hook that intercepts values whose type matches the instantiation of ordered maps and that displays only the “useful” content of the maps.

We developed several GDB scripts to implement such pretty-printers for the most common standard containers in Ada: not only vectors, hashed/ordered maps/sets, linked lists, but also unbounded strings. You can find them in the dedicated repository hosted on GitHub: https://github.com/AdaCore/gnat-gdb-scripts.

With these scripts properly installed, inspecting the content of containers becomes much easier:

(gdb) print map
$1 = pp.int_to_nat.map of length 10 = {[0] = 0, [1] = 10, [2] = 20,
  [3] = 30, [4] = 40, [5] = 50, [6] = 60, [7] = 70, [8] = 80,
  [9] = 90}

Note that beginning with GNAT Pro 18, GDB ships with these pretty-printers, so there’s no setup other than adding the following command to your .gdbinit file:

python import gnatdbg; gnatdbg.setup()

Happy debugging!

]]>
Proving Loops Without Loop Invariants http://blog.adacore.com/proving-loops-without-loop-invariants Thu, 20 Jul 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/proving-loops-without-loop-invariants

For all the power that comes with proof technology, one sometimes has to pay the price of writing a loop invariant. Over the years, we've strived to facilitate writing loop invariants by designing a methodology of four easy steps for writing a loop invariant, by providing loop patterns and their corresponding loop invariants, and by automatically generating the part of loop invariants that talks about unmodified parts of objects, but writing loop invariants sometimes remains difficult, in particular for beginners.

At the same time, some loops look so simple that they don't seem to require a loop invariant. Take for example the following loop that initializes an array, with value J at index J for every index:

   subtype Index is Integer range 1 .. 10;
   type Arr is array (Index) of Integer;

   procedure Init (A : out Arr) is
   begin
      for J in Index loop
         A (J) := J;
      end loop;
   end Init;

Suppose you want to prove that indeed Init ensures that A(J) is equal to J at every index J:

   procedure Init (A : out Arr) with
     Post => (for all J in Index => A(J) = J);

Previously, you would have needed a loop invariant for GNATprove to be able to prove this postcondition. This is not needed anymore. Instead, GNATprove unrolls the loop in Init as if it were defined as:

   procedure Init (A : out Arr) is
   begin
      A (1) := 1;
      A (2) := 2;
      A (3) := 3;
      A (4) := 4;
      A (5) := 5;
      A (6) := 6;
      A (7) := 7;
      A (8) := 8;
      A (9) := 9;
      A (10) := 10;
   end Init;

This allows GNATprove to prove the postcondition without loop invariant.

Not every loop can be unrolled this way. Firstly, we need to know how many times the loop should be unrolled, so it's not possible to unroll while-loops, plain-loops, or for-loops that have bounds not known at compile time. Secondly, we don't want to unroll loops that have thousands of iterations, or it would lead to an explosion in complexity that would defy the purpose of this feature. Hence we limit unrolling to loops with less than 20 iterations.

Finally, we want to give the user control over this feature, so we do not unroll loops that already contain a loop invariant (or a loop variant), which is a sign that the user wants to use the usual proof mechanism here.

This new feature in SPARK has proved very effective on the first examples we've tried it on. For example, it allowed us to prove a Tic-Tac-Toe implementation by my colleague Quentin Ochem (for a student training course) without loop invariants, instead of the 16 loop invariants that were originally needed. I also tried it on the computation of the longest common prefix from two starting points in a text, a typical beginner example in SPARK. The original code uses a maximal length of 100000 for the text array, and a while-loop to move forward in the array. Just replace that with a maximal length of 10 and a for-loop, and you get an automatic proof of the rich postcondition of this function without any loop invariant! Transforming a while-loop into a for-loop is not that hard, provided you have an upper bound N on the number of iterations. The while-loop:

   while Cond loop
      ...
   end loop;

becomes:

   for K in 1 .. N loop
      exit when not Cond;
      ...
   end loop;

Similarly, on another archetypal learning example, binary search, one can change the while-loop "while Left <= Right loop" into a simple for-loop "for J in U'Range loop", as control will exit as soon as the searched value is found or the search range becomes empty. Again, GNATprove manages to prove the rich postcondition of binary search without a loop invariant in that case.

Besides its interest for beginners, loop unrolling could also be convenient to explore new algorithms, as a way to get confidence in the implementation on a small scale before moving to the final production-level scale. Not only can the provers be relied on to prove for-loops with a low number of iterations (under 20), but counterexamples also tend to be better on unrolled loops, so interaction with the tool is doubly improved.

That feature does not eliminate the need for loop invariants in the general case, but it makes it easier to learn and to experiment with SPARK, before the need for loop invariants pushes one to learn how to write them. To learn more about this feature, see the SPARK User's Guide.

]]>
Research Corner - Focused Certification of SPARK in Coq http://blog.adacore.com/research-corner-focused-certification-of-spark-in-coq Tue, 18 Jul 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/research-corner-focused-certification-of-spark-in-coq

The SPARK toolset aims at giving guarantees to its users about the properties of the software analyzed, be it absence of runtime errors or more complex properties. But the SPARK toolset is itself a complex tool, and so is not free of errors. How, then, do we get confidence in the results of the analysis? The established means of getting confidence in tools in industry is through a process called sometimes tool certification, sometimes tool qualification. It requires describing, at various levels of detail (depending on the criticality of the tool usage), the intended functionality of the tool, and demonstrating (usually through testing) that the tool correctly implements these functionalities.

The academic way of obtaining confidence is also called "certification", but it covers a completely different reality. It requires providing mathematical evidence, through mechanized proof, that the tool indeed performs a formally specified functionality. Examples of that level of certification are the CompCert compiler and the seL4 operating system. This level of assurance is very costly to achieve, and as a result is not suitable for the majority of tools.

For SPARK, we have worked with our academic partners from Kansas State University and the Conservatoire National des Arts et Métiers to achieve a middle ground, establishing mathematical evidence of the correctness of a critical part of the SPARK toolset. The part we focused on is the tagging, by the frontend of the SPARK technology, of the nodes that require run-time checks. This frontend is a critical and complex part, shared between the formal verification tool GNATprove and the compiler GNAT. It is responsible for generating semantically annotated Abstract Syntax Trees of the source code, with special tags on the nodes that require run-time checks. GNATprove then relies on these tags to generate the formulas to prove, so a missing tag means a missing verification. Our interest in getting better assurance on this part of the technology is not theoretical: it's a part where we have repeatedly had errors in the past, leading to missing verifications.
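
To make this concrete, here is a minimal illustration (not an example taken from the paper): in the assignment below, the frontend must tag the indexing node with an index check, and GNATprove then generates a corresponding formula to prove that I is within the bounds of A; if that tag were missing, the potential out-of-bounds access would simply not be verified.

   type Arr is array (1 .. 10) of Integer;

   procedure Set (A : in out Arr; I : Integer) is
   begin
      A (I) := 0;  --  the indexing node here gets tagged with an index check on I
   end Set;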

Interested in knowing more? See the attached paper, which was accepted at the SEFM 2017 conference, or look at the initial formalization work presented at the HILT conference in 2013.

]]>
Applied Formal Logic: Searching in Strings http://blog.adacore.com/applied-formal-logic-searching-in-strings Thu, 29 Jun 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/applied-formal-logic-searching-in-strings

A friend pointed me to recent posts by Tommy M. McGuire, in which he describes how Frama-C can be used to functionally prove a brute force version of string search, and to find a previously unknown bug in a faster version of string search called quick search. Frama-C and SPARK share a similar history, techniques and goals. So it was tempting to redo the same proofs on equivalent code in SPARK, and to complete them with a functional proof of the fixed version of quick search. This is what I'll present in this post.

Contrary to strings in C, which start at index 0, standard strings in SPARK range over positive numbers and usually start at index 1. I could have made my own strings start at index 0, but there is no reason to stick to the C convention when writing the algorithm in SPARK. At the same time, it's convenient to force the string to start at index 1 with an explicit predicate, which I do as follows:

 subtype Text is String with Predicate => Text'First = 1;

Following the order of exposure of Tommy M. McGuire's posts, here is the implementation for the brute force algorithm in SPARK:

 function Brute_Force (Needle, Haystack : in Text) return Natural is
      Diff : Boolean;
   begin
      for I in 1 .. Haystack'Length - Needle'Length + 1 loop
         Diff := False;

         for J in Needle'Range loop
            Diff := Needle(J) /= Haystack(J + (I - 1));
            exit when Diff;
         end loop;

         if not Diff then
            return I;
         end if;
      end loop;

      return 0;
   end Brute_Force;

I am doing without the parameters n and h here, which were used in the C version to denote the lengths of strings needle and haystack, since these are readily available as attributes Haystack'Length and Needle'Length in SPARK. Since I'm working on strings starting at index 1, there are a few adjustments compared to the C version. The temporary variable Diff is needed to detect that the inner loop was exited due to a difference between Needle and the portion of Haystack starting at I, as the for-loop in SPARK does not increment its index in the last iteration of the loop, contrary to its C version.

On this initial version, GNATprove issues one message about a possible integer overflow when computing "Haystack'Length - Needle'Length + 1". It automatically proves all other run-time checks (2 initialization checks, 1 array index check, 2 integer range checks, 2 integer overflow checks). GNATprove also provides a counterexample to understand the possible failure, which can be displayed in our IDE GPS by clicking on the magnify icon on the left of the message/line:

You have to scroll right in the IDE to see all the values, so here are the relevant ones: Haystack'First = 1 and Haystack'Last = 2147483647 and Needle'First = 1 and Needle'Last = 0. In that case, Haystack'Length is 2147483647 and Needle'Length is 0, which means that "Haystack'Length - Needle'Length + 1" is one past the largest signed 32-bit integer. Hence the overflow. One way to avoid this issue is to require that Needle is not the empty string, so that its length is at least 1:

function Brute_Force (Needle, Haystack : in Text) return Natural with
     Pre => Needle'Length >= 1;

This precondition is sufficient for GNATprove to prove all checks in Brute_Force, but I've made it stronger, as TMM did in his post, since it does not make sense to look for a needle that is longer than the haystack:

function Brute_Force (Needle, Haystack : in Text) return Natural with
     Pre => Needle'Length in 1 .. Haystack'Length;

Note that, compared to what is needed with Frama-C, we don't need to provide loop assigns or loop invariants here. GNATprove automatically computes the variables that are modified in a loop, as well as the range of for-loop indexes. Still following the order of exposure of TMM's posts, let's turn to the functional contract for searching a string. I'm directly translating the functions partial_match_at and match_at given by TMM from C to SPARK, as well as the contract of brute_force. Functions Partial_Match_At and Match_At are ghost functions in SPARK (marked with aspect Ghost), which means that they can only be used in assertions/contracts and ghost code. A difference with Frama-C is that ghost code is executable like regular code in SPARK, so one must show absence of run-time errors in ghost code as well, hence the precondition on Partial_Match_At below:

 --  There is a partial match of the needle at location loc in the
   --  haystack, of length len.
   function Partial_Match_At
     (Needle, Haystack : Text; Loc : Positive; Len : Natural) return Boolean
   is
     (for all I in 1 .. Len => Needle(I) = Haystack(Loc + (I - 1)))
   with Ghost,
        Pre => Len <= Needle'Length
          and then Loc - 1 <= Haystack'Length - Len;

   --  There is a complete match of the needle at location loc in the
   --  haystack.
   function Match_At (Needle, Haystack : Text; Loc : Positive) return Boolean is
     (Loc - 1 <= Haystack'Length - Needle'Length
      and then Partial_Match_At (Needle, Haystack, Loc, Needle'Length))
   with Ghost;

The contract on Brute_Force is similar to the one in Frama-C, with a shift by one for the origin of strings, Brute_Force'Result instead of \result to denote the result of the function, and an if-expression instead of behaviors (SPARK has a similar notion of contract cases, but they must always have disjoint guards in SPARK, so they are not applicable here):

function Brute_Force (Needle, Haystack : in Text) return Natural with
     Pre  => Needle'Length in 1 .. Haystack'Length,
     Post => Brute_Force'Result in 0 .. Haystack'Length - Needle'Length + 1
       and then
       (if Brute_Force'Result > 0 then
          Match_At (Needle, Haystack, Brute_Force'Result)
        else
          (for all K in Haystack'Range =>
             not Match_At (Needle, Haystack, K)));

Before we even try to prove that this contract is satisfied by the implementation of Brute_Force, it is a good idea to test it on a few inputs, to get rid of silly mistakes. Here is a test driver to do precisely that:

with String_Search; use String_Search;

procedure Test_Search is
   All_Men : constant Text :=
     "We hold these truths to be self-evident, that all men are created equal,"
     & " that they are endowed by their Creator with certain unalienable "
     & "Rights, that among these are Life, Liberty and the Pursuit of "
     & "Happiness. That to secure these rights, Governments are instituted "
     & "among Men, deriving their just powers from the consent of the governed";
begin
   pragma Assert (Brute_Force (All_Men, "just powers") > 0);
   pragma Assert (Brute_Force (All_Men, "austin powers") = 0);
end Test_Search;

Just compile the code with assertions on (switch -gnata), run it, and... it fails the precondition of Brute_Force:

raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : failed precondition from string_search.ads:24

What happened here is that I put the arguments in the wrong order in the call to Brute_Force. I'm not making this up, this really happened to me (I am that bad!). Anyway, that illustrates that testing is a good idea, even if here it detected a bug in the test itself. The fix in SPARK is to use named parameters to avoid such issues. They don't have to appear in the same order as in the function signature, but it's a good idea nonetheless:

 pragma Assert (Brute_Force (Needle => "just powers", Haystack => All_Men) > 0);

Once fixed, the test passes without errors. As with Frama-C, we need to add loop invariants for GNATprove to prove that Brute_Force satisfies its contract. Loop invariants in SPARK differ from the classical loop invariants used in Frama-C: you can put them anywhere in the loop, and they don't have to hold when entering or exiting the loop, only when execution reaches the program point of the loop invariant. In general I prefer to put loop invariants at the end of loops, because it's more natural to express what has been achieved so far:

function Brute_Force (Needle, Haystack : in Text) return Natural is
      Diff : Boolean;
   begin
      for I in 1 .. Haystack'Length - Needle'Length + 1 loop
         Diff := False;

         for J in Needle'Range loop
            Diff := Needle(J) /= Haystack(J + (I - 1));
            exit when Diff;
            pragma Loop_Invariant (Partial_Match_At (Needle, Haystack, I, J));
            pragma Loop_Invariant (Diff = (Needle(J) /= Haystack(J + (I - 1))));
         end loop;

         if not Diff then
            return I;
         end if;

         pragma Loop_Invariant
           (for all K in 1 .. I => not Match_At (Needle, Haystack, K));
      end loop;

      return 0;
   end Brute_Force;

A subtlety above is that, since we're replacing the implicit loop invariant of the inner loop (located at the start of the loop) by an explicit loop invariant at the end of the inner loop, we need to repeat in that loop invariant the information about the current value of Diff; otherwise this information is not available on the path that starts from the loop invariant and exits the loop in the last iteration. Apart from that, this is similar to what was done in Frama-C. With these loop invariants, GNATprove proves all checks in Brute_Force, including its postcondition.

Above, I kept the implementation structure originating from the C version of brute_force, but in SPARK we can simplify it by replacing the inner loop with a direct comparison of Needle with a slice of Haystack:

   function Brute_Force (Needle, Haystack : in Text) return Natural is
   begin
      for I in 1 .. Haystack'Length - Needle'Length + 1 loop
         if Needle = Haystack(I .. I + (Needle'Last - 1)) then
            return I;
         end if;

         pragma Loop_Invariant
           (for all K in 1 .. I => not Match_At (Needle, Haystack, K));
      end loop;

      return 0;
   end Brute_Force;

This version is also completely proved by GNATprove.

Let's now turn to the more involved string search algorithm called quick search, presented in this other post by TMM. Translating the implementation, contracts and loop invariants to SPARK is quite easy. As for the brute force version, more precise types in SPARK make it possible to get rid of a number of annotations:

type Shift_Table is array (Character) of Positive;

   procedure Make_Bad_Shift (Needle : Text; Bad_Shift : out Shift_Table) with
     Pre  => Needle'Length < Integer'Last,
     Post => (for all C in Character => Bad_Shift(C) in 1 .. Needle'Length + 1);

   function QS (Needle, Haystack : in Text) return Natural with
     Pre => Needle'Length < Integer'Last
       and then Haystack'Length < Integer'Last - 1
       and then Needle'Length in 1 .. Haystack'Length;

I am also getting rid of a loop in Make_Bad_Shift and a loop in QS compared to their C version, as we can directly assign and compare strings in SPARK:

   procedure Make_Bad_Shift (Needle : Text; Bad_Shift : out Shift_Table) is
   begin
      Bad_Shift := (others => Needle'Length + 1);

      for J in Needle'Range loop
         Bad_Shift(Needle(J)) := Needle'Length - J + 1;
         pragma Loop_Invariant (for all C in Character => Bad_Shift(C) in 1 .. Needle'Length + 1);
      end loop;
   end Make_Bad_Shift;

   function QS (Needle, Haystack : in Text) return Natural is
      Bad_Shift : Shift_Table;
      I : Positive;

   begin
      --  Preprocessing
      Make_Bad_Shift (Needle, Bad_Shift);

      --  Searching
      I := 1;
      while I <= Haystack'Length - Needle'Length + 1 loop
         if Needle = Haystack(I .. I + (Needle'Last - 1)) then
            return I;
         end if;
         I := I + Bad_Shift(Haystack(I + Needle'Length));  --  Shift
      end loop;

      return 0;
   end QS;

GNATprove proves all checks on the above code, including postconditions, except for the array index check when computing "Haystack(I + Needle'Length)". This is precisely the bug that was discovered by TMM, which he presents in his post. GNATprove further helps by providing a counterexample to understand the possible failure:

Indeed, when I=2 and Haystack'Last=2, "I + Needle'Length" is outside of the bounds of Haystack whenever Needle is not the empty string. We can fix that by exiting early from the loop before the assignment to I in the loop:

exit when I = Haystack'Length - Needle'Length + 1;

With this fix, GNATprove proves all checks on the code of quick search.

Let's now turn to proving the functional behavior of quick search. The postcondition of QS is the same as the one of Brute_Force, given that only the algorithm changes between the two:

function QS (Needle, Haystack : in Text) return Natural with
     Pre => Needle'Length < Integer'Last
       and then Haystack'Length < Integer'Last - 1
       and then Needle'Length in 1 .. Haystack'Length,
     Post => QS'Result in 0 .. Haystack'Length - Needle'Length + 1
       and then
       (if QS'Result > 0 then
          Match_At (Needle, Haystack, QS'Result)
        else
          (for all K in Haystack'Range =>
             not Match_At (Needle, Haystack, K)));

In order to prove the contract of QS, we first need to specify and prove the functional behavior of Make_Bad_Shift. As explained by TMM in his post, Make_Bad_Shift is used to align the last occurrence of a given character in the needle with a matching character in the haystack. So for every character C, either it does not occur in the needle, in which case Bad_Shift(C) has the value "Needle'Length + 1", or it occurs (possibly multiple times) in the needle, in which case its last occurrence is at index "Needle'Length - Bad_Shift(C) + 1". This is what is expressed in the following postcondition:

 procedure Make_Bad_Shift (Needle : Text; Bad_Shift : out Shift_Table) with
     Pre  => Needle'Length < Integer'Last,
     Post => (for all C in Character => Bad_Shift(C) in 1 .. Needle'Length + 1)
       and then (for all C in Character =>
                   (if Bad_Shift(C) = Needle'Length + 1 then
                      (for all K in Needle'Range => C /= Needle(K))
                    else
                      Needle(Needle'Length - Bad_Shift(C) + 1) = C
                      and (for all K in Needle'Length - Bad_Shift(C) + 2 .. Needle'Last => Needle(K) /= C)
                 ));
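
To see what this postcondition means on a concrete input, here is a small sanity check (an illustrative snippet, not part of the code on GitHub): for Needle = "abca", Needle'Length is 4, the last 'a' is at index 4, the last 'b' at index 2, the last 'c' at index 3, and characters not occurring in the needle get the value Needle'Length + 1.

   declare
      Bad_Shift : Shift_Table;
   begin
      Make_Bad_Shift ("abca", Bad_Shift);
      pragma Assert (Bad_Shift ('a') = 1);  --  last 'a' at index 4 - 1 + 1 = 4
      pragma Assert (Bad_Shift ('b') = 3);  --  last 'b' at index 4 - 3 + 1 = 2
      pragma Assert (Bad_Shift ('c') = 2);  --  last 'c' at index 4 - 2 + 1 = 3
      pragma Assert (Bad_Shift ('z') = 5);  --  'z' does not occur: 4 + 1
   end;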

In order to prove that the implementation of Make_Bad_Shift satisfies this postcondition, we simply have to repeat this postcondition as a loop invariant, accumulating that information as the loop index J progresses (see how occurrences of Needle'Last in the postcondition were replaced by occurrences of J in the loop invariant):

 procedure Make_Bad_Shift (Needle : Text; Bad_Shift : out Shift_Table) is
   begin
      Bad_Shift := (others => Needle'Length + 1);

      for J in Needle'Range loop
         Bad_Shift(Needle(J)) := Needle'Length - J + 1;
         pragma Loop_Invariant (for all C in Character => Bad_Shift(C) in 1 .. Needle'Length + 1);
         pragma Loop_Invariant (for all C in Character =>
                                  (if Bad_Shift(C) = Needle'Length + 1 then
                                     (for all K in 1 .. J => C /= Needle(K))
                                   else
                                      Needle(Needle'Length - Bad_Shift(C) + 1) = C
                                      and (for all K in Needle'Length - Bad_Shift(C) + 2 .. J => Needle(K) /= C)
                                ));
      end loop;
   end Make_Bad_Shift;

GNATprove proves all checks on the above code.

Now turning to QS, we need to establish a loop invariant very similar to the one used in Brute_Force, except here we want to establish the property that Needle does not match up to index "I + Bad_Shift(Haystack(I + Needle'Length)) - 1" instead of just I:

  pragma Loop_Invariant
           (for all K in 1 .. I + Bad_Shift(Haystack(I + Needle'Length)) - 1 => not Match_At (Needle, Haystack, K));

We also need to bound I in the loop invariant, as we're inserting the above loop invariant in the middle of the loop, hence we do not get "for free" that I satisfies the loop test:

 pragma Loop_Invariant (I <= Haystack'Length - Needle'Length);

With these additions, GNATprove proves all checks in QS, including its postcondition, but it does not prove its loop invariant:

string_search.adb:111:81: medium: loop invariant might fail after first iteration, cannot prove not Match_At (Needle, Haystack, K) (e.g. when Haystack = (0 => 'NUL', 5 => 'NUL', others => 'SOH') and Haystack'First = 1 and Haystack'Last = 6 and I = 4 and K = 5 and Needle = (0 => 'SOH', 3 => 'SOH', 4 => 'SOH', 6 => 'SOH', others => 'NUL') and Needle'First = 1 and Needle'Last = 2)
string_search.adb:111:81: medium: loop invariant might fail in first iteration, cannot prove not Match_At (Needle, Haystack, K) (e.g. when Haystack = (0 => 'NUL', 2 => 'NUL', others => 'SOH') and Haystack'First = 1 and Haystack'Last = 3 and I = 1 and K = 2 and Needle = (0 => 'SOH', 3 => 'SOH', others => 'NUL') and Needle'First = 1 and Needle'Last = 2)

This is expected. There is a big reasoning gap to go from the postcondition of Make_Bad_Shift to the loop invariant in QS. We are going to use ghost code to close that gap and convince GNATprove that the loop invariant holds in every iteration. What we need to show is that, for every starting position that is skipped (for K in the range I + 1 to I + Bad_Shift(Haystack(I + Needle'Length)) - 1), the needle cannot align with the haystack at that position. In fact, we know exactly at which position these alignments would fail: at the position "I + Needle'Length" in Haystack. Looking at the postcondition of Make_Bad_Shift, this corresponds to position "I + Needle'Length - K + 1" in Needle. Let's write it down just before the loop invariant:

       for K in I + 1 .. I + Bad_Shift(Haystack(I + Needle'Length)) - 1 loop
            pragma Assert (Haystack(I + Needle'Length) /= Needle(I + Needle'Length - K + 1));
            pragma Assert (not Match_At (Needle, Haystack, K));
         end loop;

GNATprove proves the above assertions, using the first one to prove the second one, so we can now accumulate this information in a loop invariant for all values of positions that are skipped:

 for K in I + 1 .. I + Bad_Shift(Haystack(I + Needle'Length)) - 1 loop
            pragma Assert (Haystack(I + Needle'Length) /= Needle(I + Needle'Length - K + 1));
            pragma Loop_Invariant
              (for all L in 1 .. K => not Match_At (Needle, Haystack, L));
         end loop;

With this addition of ghost code, GNATprove proves all checks in QS, including its postcondition and loop invariants. In the final version of that code, I'm using a local ghost procedure Prove_QS instead of inlining the ghost code in the implementation of QS. That way, GNATprove still internally inlines the implementation of Prove_QS to prove QS, but the compiler will completely get rid of the body and call to Prove_QS in the final executable built without assertions:

 function QS (Needle, Haystack : in Text) return Natural is
      Bad_Shift : Shift_Table;
      I : Positive;

      procedure Prove_QS with Ghost is
         Shift : constant Positive := Bad_Shift(Haystack(I + Needle'Length));
      begin
         for K in I + 1 .. I + Shift - 1 loop
            pragma Assert (Haystack(I + Needle'Length) /= Needle(I + Needle'Length - K + 1));
            pragma Loop_Invariant
              (for all L in 1 .. K => not Match_At (Needle, Haystack, L));
         end loop;
      end Prove_QS;

   begin
      --  Preprocessing
      Make_Bad_Shift (Needle, Bad_Shift);

      --  Searching
      I := 1;
      while I <= Haystack'Length - Needle'Length + 1 loop
         if Needle = Haystack(I .. I + (Needle'Last - 1)) then
            return I;
         end if;
         exit when I = Haystack'Length - Needle'Length + 1;

         Prove_QS;

         pragma Loop_Variant (Increases => I);
         pragma Loop_Invariant (I <= Haystack'Length - Needle'Length);
         pragma Loop_Invariant
           (for all K in 1 .. I + Bad_Shift(Haystack(I + Needle'Length)) - 1 => not Match_At (Needle, Haystack, K));

         I := I + Bad_Shift(Haystack(I + Needle'Length));  --  Shift
      end loop;

      return 0;
   end QS;

I also added a loop variant to ensure that the while-loop will terminate. For-loops always terminate in SPARK because the loop index cannot be assigned by the user (contrary to what C allows), but while-loops or plain-loops might not terminate, hence the use of a loop variant to verify their termination.
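
As a minimal standalone illustration (unrelated to QS), here is how a decreasing loop variant can be used to let GNATprove verify that a simple countdown while-loop terminates:

   procedure Countdown (N : in out Natural) is
   begin
      while N > 0 loop
         pragma Loop_Variant (Decreases => N);
         N := N - 1;
      end loop;
   end Countdown;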

The code presented in this post is available on GitHub: spec and body. Now a challenge for Frama-C users is to translate back the functional proof of QS in SPARK into C and Frama-C!

The project SPARK-by-Example by Christophe Garion and Jérôme Hugues contains other examples of functionally proven string algorithms, which correspond to the SPARK version of the work done by Jens Gerlach with Frama-C in the ACSL-by-Example project.

]]>
The Adaroombot Project http://blog.adacore.com/the-adaroombot-project Tue, 20 Jun 2017 12:48:00 +0000 Rob Tice http://blog.adacore.com/the-adaroombot-project

Owning an iRobot RoombaⓇ is an interesting experience. For those not familiar, the RoombaⓇ is a robot vacuum cleaner that’s about the diameter of a small pizza and stands tall enough to still fit under your bed. It has two independently driven wheels, a small-ish dust bin, a rechargeable battery, and a speaker programmed with pleasant sounding beeps and bloops telling you when it’s starting or stopping a cleaning job. You can set it up to clean on a recurring schedule through buttons on the robot, or with the new models, the mobile app. It picks up an impressive amount of dust and dirt and it makes you wonder how you used to live in a home that was that dirty.

A Project to Learn Ada


I found myself new to AdaCore without any knowledge of the Ada programming language around the same time I acquired a RoombaⓇ for my cats to use as a golf cart when I wasn’t home. In order to really learn Ada I decided I needed a good project to work on. Having come from an embedded Linux C/C++ background I decided to do a project involving a Raspberry Pi and something robotic that it could control. It just so happens that iRobot has a STEM initiative robot called the CreateⓇ 2 which is aimed towards embedded control projects. That’s how the AdaRoombot project was born.

The first goal of the project was to have a simple Ada program use the CreateⓇ 2’s serial port to perform some control algorithm. Mainly this would require the ability to send commands to the robot and receive feedback information from the robot’s numerous sensors. As part of the CreateⓇ 2 documentation package, there is a PDF detailing the serial port interface called the iRobot CreateⓇ 2 Open Interface Specification.

On the command side of things there is a simple protocol: each command starts with a one-byte opcode specifying which command is being issued and is then followed by a number of bytes carrying the data associated with the opcode, also known as the payload. For example, the Reset command has an opcode of 7 and has zero payload bytes. The Set Day/Time command has an opcode of 168 and has a 3-byte payload, with a byte specifying the Day, another the Hour, and another the Minute. The interface for the sensor data is a little more complicated. The host has the ability to request data from individual sensors, a list of sensors, or tell the robot to just stream the list over and over again for processing. To make things simple, I chose to just receive all the sensor data on each request.
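
As a rough sketch of how such a command could be turned into bytes (illustrative types and names, not the actual AdaRoombot code), the Set Day/Time command from the specification could be built like this, with the resulting bytes then written to the serial device file:

   type Byte is mod 2 ** 8;
   type Byte_Array is array (Positive range <>) of Byte;

   --  Opcode 168 followed by its 3-byte payload: Day, Hour, Minute
   function Set_Day_Time (Day, Hour, Minute : Byte) return Byte_Array is
     (1 => 168, 2 => Day, 3 => Hour, 4 => Minute);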

Because we are using a Raspberry Pi, it is quite easy to communicate with a serial port using the Linux tty interface. As with most userspace driver interfaces in Linux, you open a file and read and write byte data to the file. So, from a software design perspective, the lowest level of the program abstraction should take robot commands and transform them into byte streams to write to the file, and conversely read bytes from the file and transform the byte data to sensor packets. The next level of the program should perform some algorithm by interpreting sensor data and transmitting commands to make the robot perform some task and the highest level of the program should start and stop the algorithm and do some sort of system monitoring.

The high level control algorithm I used is very simple: drive straight until I hit something, then turn around and repeat. However, the lower levels of the program, where I am interfacing with peripherals, are much more exciting. In order to talk to the serial port, I needed access to file I/O and Linux’s terminal I/O APIs.


Ada has cool features

Ada has a nifty way to interface with the Linux C libraries that can be seen near the bottom of “src/communication.ads”. There I am creating Ada specs for C calls, and then telling the compiler to use the C implementations supplied by Linux using pragma Import. This is similar to using extern in C. I am also using pragma Convention, which tells the compiler to treat Ada records like C structs so that they can be passed into the imported C functions. With this I have the ability to interface to any C call I want using Ada, which is pretty cool. Here is an example mapping the C select call into Ada:

--  #include <sys/select.h>
--  fd_set represents file descriptor sets for the select function.
--  It is actually a bit array.
type Fd_Set is mod 2 ** 32;
pragma Convention (C, Fd_Set);

--  #include <sys/time.h>
--  time_t tv_sec - number of whole seconds of elapsed time.
--  long int tv_usec - Rest of the elapsed time in  microseconds.
type Timeval is record
    Tv_Sec  : C.Int;
    Tv_Usec : C.Int;
end record;
pragma Convention (C, Timeval);

function C_Select (Nfds : C.Int;
                  Readfds   : access Fd_Set;
                  Writefds  : access Fd_Set;
                  Exceptfds : access Fd_Set;
                  Timeout   : access Timeval)
                  return C.int;
pragma Import (C, C_Select, "select");

The other neat low-level feature to note here can be seen in “src/types.ads”. The record Sensor_Collection is a description of the data that will be received from the robot over the serial port. I am using a feature called a representation clause to tell the compiler where to put each component of the record in memory, and then overlaying the record on top of a byte stream. By doing this, I don’t have to use any bit masks or bit shift to access individual bits or fields within the byte stream. The compiler has taken care of this for me. Here is an example of a record which consists of Boolean values, or bits in a byte:

type Sensor_Light_Bumper is record
    LT_Bump_Left         : Boolean;
    LT_Bump_Front_Left   : Boolean;
    LT_Bump_Center_Left  : Boolean;
    LT_Bump_Center_Right : Boolean;
    LT_Bump_Front_Right  : Boolean;
    LT_Bump_Right        : Boolean;
end record
 with Size => 8;

for Sensor_Light_Bumper use record
    LT_Bump_Left at 0 range 0 .. 0;
    LT_Bump_Front_Left at 0 range 1 .. 1;
    LT_Bump_Center_Left at 0 range 2 .. 2;
    LT_Bump_Center_Right at 0 range 3 .. 3;
    LT_Bump_Front_Right at 0 range 4 .. 4;
    LT_Bump_Right at 0 range 5 .. 5;
end record;

In this example, LT_Bump_Left is the first bit in the byte, LT_Bump_Front_Left is the next bit, and so on. In order to access these bits, I can simply use the dot notation to access members of the record, where with C I would have to mask and shift. Components that span multiple bytes can also include an endianness specification. This is useful because on this specific platform data is little endian, but the serial port protocol is big endian. So instead of byte swapping, I can specify certain records as having big endian mapping. The compiler then handles the swapping.
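
As a sketch of what such an endianness specification can look like with GNAT (an illustrative record, not one of the actual AdaRoombot types), a multi-byte quantity received in big-endian order over the serial port can be described with the Bit_Order and Scalar_Storage_Order aspects, so that the compiler performs the byte swapping:

   with System;

   package Roomba_Types is  --  hypothetical package, for illustration only

      type Word is mod 2 ** 16;

      --  A 16-bit sensor value transmitted big-endian by the robot.
      --  Scalar_Storage_Order (GNAT-specific) makes the compiler swap the
      --  bytes for us on a little-endian platform like the Raspberry Pi.
      type Sensor_Distance is record
         Value : Word;
      end record
        with Bit_Order            => System.High_Order_First,
             Scalar_Storage_Order => System.High_Order_First;

   end Roomba_Types;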

These are some of the really cool low level features available in Ada. On the high-level programming side, the algorithm development, Ada feels more like C++, but with some differences in philosophy. For instance, certain design patterns are more cumbersome to implement in Ada because of things like the absence of explicit constructors and destructors for Ada objects. But, after a small change in mind-set, it was fairly easy to make the robot drive around the office.

Adaroombot Bottom
Adaroombot Top

The code for AdaRoombot, which is available on Github, can be compiled using the GNAT GPL cross compiler for the Raspberry Pi 2 located on the AdaCore Libre site. The directions to build and run the code are included in the README file in the root directory of the repo. The next step is to add some vision processing and make the robot chase a ball down the hallway. Stay tuned….

The code is available on GitHub: here.


Want to start creating and hacking with Ada? We have a great challenge for you!

The Make with Ada competition, hosted by AdaCore, calls on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and offers over €8000 in total prizes. Celebrating its sophomore year, the Make With Ada Competition is designed to promote the benefits of the languages to the software development community at large. For more information and to register, go to makewithada.org.

]]>
GNAT GPL 2017 is out! http://blog.adacore.com/gnat-gpl-2017-is-out Thu, 15 Jun 2017 13:20:09 +0000 Pierre-Marie de Rodat http://blog.adacore.com/gnat-gpl-2017-is-out

For those users of the GNAT GPL edition, we are pleased to announce the availability of the 2017 release of GNAT GPL and SPARK GPL.

SPARK GPL 17 offers improved automation of proofs, thanks to improvements in the underlying prover Alt-Ergo and a finer-grain splitting of conjunctions. Interaction during proof has been improved thanks to the new statistics, display and replay modes. Type invariants from Ada are now also supported in SPARK. Note that the optional provers CVC4 and Z3 are no longer distributed with SPARK GPL 2017, and should be installed separately.

This release also marks the first introduction of “future language” Ada 2020 features:

  • delta aggregates (partial aggregate notation)
  • AI12-0150-1 class-wide invariants now employ class-wide pre-/postcondition-like semantics (static call resolution).

This release supports the ARM ELF bare metal target, hosted on Windows and Linux, as well as the following native platforms:

  • Mac OS (64 bits)
  • Linux (64 bits)
  • Windows (32 bits)

The GNATemulator technology has been added to the bare metal target, making it easier to develop and test on those platforms.

The compiler toolchain is now based on GCC 6. The native platforms come with a Zero Footprint runtime, and the ARM ELF compiler comes with runtimes for a variety of boards, including support for the Raspberry Pi 2.

The latest version of the GPS IDE comes with many bug fixes and enhancements, notably in the areas of debugger integration and support for bare-metal development.

The GPL 2017 release is available for download on the AdaCore Libre site.

Feeling inspired and want to start Making with Ada today? Check out the Make With Ada Competition! http://makewithada.org

]]>
Ada on the first RISC-V microcontroller http://blog.adacore.com/ada-on-the-first-risc-v-microcontroller Tue, 13 Jun 2017 12:30:00 +0000 Fabien Chouteau http://blog.adacore.com/ada-on-the-first-risc-v-microcontroller

The RISC-V open instruction set is getting more and more news coverage these days, in particular since the release of the first RISC-V microcontroller from SiFive and the announcement of an Arduino board at the Maker Faire Bay Area 2017.

As an Open Source software company we are very interested in this trendy, new, open platform. AdaCore tools already support an open IP core with the Leon processor family, a popular architecture in the space industry that is developed in VHDL and released under the GPL. RISC-V seems to be targeting a different market and opening new horizons.

GNAT - the Ada compiler developed and maintained by AdaCore - is part of the GCC toolchain. As a result, when a new back-end is added we can fairly quickly start targeting it and developing in Ada. In this blog post I will describe the steps I followed to build the tool chain and start programming the HiFive1 RISC-V microcontroller in Ada.

Building the tool chain

The first step is to build the compiler. SiFive - manufacturer of the MCU - provides an SDK repository with scripts to build a cross RISC-V GCC. All I had to do was to change the configure options to enable Ada support

--enable-languages=c,c++,ada

and disable libada, since this is a bare-metal target (no operating system) and we won’t use a complete run-time

--disable-libada

If you want to build the toolchain yourself, I forked and modified the freedom-e-sdk repository.

Just clone it

$ git clone --recursive https://github.com/Fabien-Chouteau/freedom-e-sdk

install a native GNAT from your Linux distribution (I use Ubuntu)

$ sudo apt-get install gnat

and start the build

$ cd freedom-e-sdk
$ make tools

If you have a problem with this procedure don’t hesitate to open an issue on GitHub, I’ll see what I can do to help.

Building the run-time

Ada programs always need a run-time library, but there are different run-time profiles depending on the constraints of the platform. In GNAT we have the so-called Zero Footprint run-time (ZFP), which provides the bare minimum and is therefore quite easy to port to a new platform (no exception propagation, no tasking, no containers, no file system access, etc.).

I started from Shawn Nock’s ZFP for the Nordic nRF51 and then simply changed the linker script and startup code; everything else is platform independent.

You can find the run-time in this repository: https://github.com/Fabien-Chouteau/zfp-hifive1

Writing the drivers

To control the board I need a few drivers. I started by writing the description of the hardware registers using the SVD format: here.

I then generated Ada mapping from this file using the SVD2Ada tool. You can find more info about this process at the beginning of this blog post.

From these register mappings it’s fairly easy to implement the drivers. So far I wrote GPIO and UART: https://github.com/AdaCore/Ada_Drivers_Library/tree/master/arch/RISC-V/SiFive/drivers

First Ada projects on the HiFive1

The first project is always a blinky. The HiFive1 has RGB LEDs, so I started by driving those. You can find this example in the Ada_Drivers_Library.

If you want to run this simple example on your board, get my fork of the freedom-e-sdk (as described above) and run:

$ make ada_blinky_build

and then

$ make ada_blinky_upload

to flash the board.

For the second project, thanks to the architecture of the Ada_Drivers_Library, I was able to re-use the thermal printer driver from my DIY instant camera and it took me only 5 minutes to print something from the HiFive1.

Conclusion

All of this is only experimental for the moment, but it shows how quickly we can start programming in Ada on new platforms. Proper support would require a run-time with tasking, interrupts, protected objects (Ravenscar profile) and of course complete testing and validation of the compiler.


Feeling inspired and want to start Making with Ada today? We have the perfect challenge for you!

The Make with Ada competition, hosted by AdaCore, calls on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and offers over €8000 in total prizes. Celebrating its sophomore year, the Make With Ada Competition is designed to promote the benefits of the languages to the software development community at large. For more information and to register, go to makewithada.org.

]]>
Research Corner - FLOSS Glider Software in SPARK http://blog.adacore.com/research-corner-floss-glider-software-in-spark Sun, 11 Jun 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/research-corner-floss-glider-software-in-spark

Two years ago, we redeveloped the code of a small quadcopter called Crazyflie in SPARK, as a proof-of-concept to show it was possible to prove absence of run-time errors (no buffer overflows, no division by zero, etc.) on such code. Actually, this was done with very modest effort: the rewrite of the stabilization code was all done by an intern in two months. Since then, we maintain the resulting code as FLOSS on GitHub, and it has been used for example by the people involved in the CAP 2018 project as a prototyping platform.

The researchers Martin Becker and Emanuel Regnath have raised the bar by developing the code for the autopilot of a small glider in SPARK in only three months. This time, we're talking about an autonomous drone operating beyond line of sight. In such a limited timeframe, they achieved both a high level of SPARK coverage (the portion of the code in SPARK) and a high level of automatic proof. They also developed their own agile process around SPARK, using scripts that you can find on this blog. They mostly targeted absence of run-time errors (the Silver level of SPARK assurance), but this is already an impressive feat! In particular they reported on the challenges with proofs of floating-point computations, a topic we have already talked about on this blog.

What's even more interesting for others tempted to do something similar in academia or in industry is that they have published a paper about their experience at SAFECOMP, presented their work at the Frama-C & SPARK Day, and released their code as FLOSS. And of course they are now targeting a more ambitious project to apply the same techniques with SPARK!

]]>
Research Corner - Floating-Point Computations in SPARK http://blog.adacore.com/research-corner-floating-point-computations-in-spark Thu, 08 Jun 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/research-corner-floating-point-computations-in-spark

It is notoriously hard to prove properties of floating-point computations, including the simpler bounding properties that state safe bounds on the values taken by entities in the program. Thanks to the recent changes in SPARK 17, users can now benefit from much better provability for these programs, by combining the capabilities of different provers. For the harder cases, this requires using ghost code to state intermediate assertions proved by one of the provers, to be used by the others. This work is described in an article which was accepted at the VSTTE 2017 conference. In this article, we describe the mechanisms for adapting the formulas to prove to different provers, based on different technologies to interpret floating-point computations. As we already presented on this blog, the improvements, in particular with the abstract interpreter CodePeer and the SMT prover Z3, are very important.

One figure from the article which explains well the current situation is the following:

In order to prove the postcondition of the procedure analyzed (last line), it is necessary to insert 12 intermediate assertions in the source code, each of which is proved by one of the provers provided with SPARK Pro (CVC4, Alt-Ergo, Z3 and CodePeer). The green cells in that figure correspond to provers which prove the corresponding formula, with the time they take to do so in seconds. Some of these assertions require the use of ghost code to define entities only meant for verification.

While this may seem like a lot of work to prove a single postcondition, you may convince yourself by reading the article (where we give a mathematical argument for the proof) that this is not trivial either. And the improvements we observed with provers not (yet) included in SPARK (the provers AE_fpa and Colibri on the right-hand side of the figure) give us much hope that the situation is going to improve a lot in the coming year.

]]>
Frama-C & SPARK Day Slides and Highlights http://blog.adacore.com/frama-c-spark-day-slides-and-highlights Fri, 02 Jun 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/frama-c-spark-day-slides-and-highlights

We had a very successful event gathering the people interested in formal program verification for C programs (with Frama-C) and for Ada programs (with SPARK). This year, 139 people registered and around 110-120 actually came. The slides are on the page of the event, under "Slides".

If you did not attend the introduction presentation by Claude Marché, I recommend it. It presents very clearly the common history of Frama-C and SPARK, and the common research topics and joint projects today, that led to this shared event. As Claude says in conclusion:

Frama-C and SPARK share not only a common history but also:

  • A will to transfer academic research to the industry of critical software
  • Common challenges, approaches, technical solutions

Of particular interest for SPARK users are the presentations of Carl Brandon, Peter Chapin, Martin Becker and Stefan Berghofer:

  1. Carl Brandon and Peter Chapin presented their Lunar IceCube satellite project that was already mentioned on this blog. It's a 15 million dollar project, with a launch valued at 18 million dollars, so Carl and Peter will need all the help SPARK can give for developing a perfect on-board software! They are trying to open source the core platform on which Lunar IceCube operates, called CubedOS, which isolates applicative threads (14 in their case) from the underlying operating system and Ada runtime. CubedOS is similar to the core Flight System developed at NASA, but it's written in SPARK and the goal is to prove at least correct data flow and absence of run-time errors on the code, possibly even some key properties if time allows.
  2. Martin Becker presented his work on glider software in SPARK. It was a "crazy" project (as he puts it) with such a limited timeframe (3 months) and resources (2 persons), so the result is even more impressive. They developed most of the software in SPARK, and achieved both a high level of SPARK coverage (the portion of the code in SPARK) and automatic proof. They also developed their own agile process around SPARK, using scripts that you can find on this blog.
  3. Stefan Berghofer presented his work on formal verification of cryptographic software in SPARK, using the bridge to interactive prover Isabelle for the more complex properties. His team at secunet, together with their colleagues from University of Rapperswil, has been at the forefront of formal program verification with SPARK for many years. Their Muen separation kernel in SPARK is described in this blog post.

I also liked a lot the presentation of Christophe Garion and Jérôme Hugues. They took a fairly large piece of critical software (10,000 sloc in Ada and 15,000 sloc in C), the PolyORB-Hi runtime for distributed software generated from an AADL description, and applied Frama-C on the C version and SPARK on the Ada version to achieve absence of run-time errors and proof of properties. Their conclusion, which matches the initial assessment they did in 2014, is that it is much easier to achieve high assurance through formal program verification in SPARK than in C, mostly because the language really supports it. What I like is that they did exactly what some reviewers keep asking (in my opinion sometimes mistakenly) when we submit articles describing use of SPARK on industrial projects: that we should also submit a comparison with the same goals achieved with other technologies, on programs in other programming languages. This is rarely feasible. Well, they did it!

Unfortunately, the very interesting presentation by Jean-Marc Mota on experiments on the adoption of SPARK at Thales is not available. This blog post presents the levels of software assurance that we defined for SPARK as the result of these experiments.


]]>
New Guidance for Adoption of SPARK http://blog.adacore.com/new-guidance-for-adoption-of-spark Sat, 27 May 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/new-guidance-for-adoption-of-spark

During 2016, AdaCore collaborated with Thales to carry out multiple experiments in applying SPARK to existing software projects in Ada. We discovered two things during these experiments:

  1. Adoption of formal verification, especially on existing projects, can be achieved in stages.
  2. Specific guidance is essential for adoption, both to define achievable objectives and to address the likely issues that will arise.

The stages we have defined are called Stone, Bronze, Silver, Gold and Platinum. They are described in the document called Implementation Guidance for the Adoption of SPARK that we co-authored with Thales. For each level, we define the benefits but also the impact on process and the costs and limitations. That's the high level view of each level. Then we give detailed guidelines on how to achieve that level on existing or new code.

What we discovered later was that the highest four levels (all except Stone) map well to the verification objectives targeted at different DAL/SIL levels at Altran UK, with the highest levels typically applied only at the highest DAL/SIL levels (DAL A/SIL 4). The levels, guidelines and mapping with DAL/SIL were presented at the recent High Confidence Software and Systems conference.

]]>
DIY Coffee Alarm Clock http://blog.adacore.com/diy-coffee-alarm-clock Tue, 16 May 2017 13:00:00 +0000 Fabien Chouteau http://blog.adacore.com/diy-coffee-alarm-clock

A few weeks ago one of my colleagues shared this Kickstarter project: The Barisieur. It’s an alarm clock coffee maker, promising to wake you up with a freshly brewed cup of coffee every morning. I jokingly said “just give me an espresso machine and I can do the same”. Soon after, the coffee machine was in my office. Now it is time to deliver :)

The basic idea is to control the espresso machine from an STM32F469 board and use the beautiful screen to display the clock face and configuration interface.

Hacking the espresso machine

The first step is to be able to control the machine with the 3.3V signal of the microcontroller. To do this, I open the case to get to the two push buttons on the top. Warning! Do not open this kind of appliance if you don’t know what you are doing. First, it can be dangerous, second, these things are not made to be serviceable so there’s a good chance you will never be able to put it back together.

The push buttons are made with two exposed concentric copper traces on a small PCB and a conductive membrane that closes the circuit when the button is pushed.

I use a multimeter to measure the voltage between two circles of one of the buttons. To my surprise the voltage is quite high, about 16V. So I will have to use a MOSFET transistor to act as an electronic switch rather than just connecting the microcontroller to the espresso machine signals.

I put that circuit on an Arduino proto shield that is then plugged behind the STM32F469 disco board. The only things left to do are to drill a hole for the wires to go out of the machine and to make a couple of metal brackets to attach to the board. Here’s a video showing the entire hacking process.

Writing the alarm clock software

For the clock face and configuration interface I will use Giza, one of my toy projects that I developed to play with the object oriented programming features of Ada. It’s a simplistic/basic UI framework.

Given the resolution of the screen (800x480) and the size of the text I want to display, it will be too slow to use software font rendering. Instead, I will take advantage of the STM32F4’s 2D graphic hardware acceleration (DMA2D) and have some bitmap images embedded in the executable. DMA2D can very quickly copy chunks of memory - typically bitmaps - but also convert them from one format to the other. This project is the opportunity to implement support for indexed bitmaps in the Ada_Drivers_Library.
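
For reference, here is roughly what an indexed bitmap boils down to (illustrative declarations, not the actual Ada_Drivers_Library types): each pixel stores a small index into a palette of full colors, which keeps the embedded images compact, and DMA2D can expand the indices to full colors while copying.

   with Interfaces; use Interfaces;

   package Indexed_Bitmaps is  --  hypothetical package, for illustration only

      type Color_Index is mod 2 ** 4;
      --  4 bits per pixel: each pixel is an index into a 16-entry palette

      type Palette is array (Color_Index) of Unsigned_32;
      --  The palette maps each index to a full 32-bit ARGB color

      type Pixel_Matrix is
        array (Natural range <>, Natural range <>) of Color_Index
        with Pack;

   end Indexed_Bitmaps;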

I also add support for STM32F4’s real time clock (RTC) to be able to keep track of time and date and of course trigger the coffee machine at the time configured by the user.

It’s time to put it all together and ask my SO to perform in the high budget video that you can see at the beginning of this post :)

The code is available on GitHub: here.


Feeling inspired and want to start Making with Ada today? We have the perfect challenge for you!

The Make with Ada competition, hosted by AdaCore, calls on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and offers over €8000 in total prizes. Celebrating its sophomore year, the Make With Ada Competition is designed to promote the benefits of the languages to the software development community at large. For more information and to register, go to makewithada.org.

]]>
(Many) More Low Hanging Bugs http://blog.adacore.com/many-more-low-hanging-bugs Fri, 05 May 2017 13:00:00 +0000 Yannick Moy http://blog.adacore.com/many-more-low-hanging-bugs

In a previous post, we reported our initial experiments to create lightweight checkers for Ada source code, based on the new Libadalang technology. The two checkers we described discovered 12 issues in the codebase of the tools we develop at AdaCore. In this post, we report on 6 more lightweight checkers, which have discovered 114 new issues in our codebase. Four of these checkers detect errors and code quality issues (check_deref_null, check_test_not_null, check_same_test, check_bad_unequal), and two detect refactoring opportunities (check_same_then_else, check_useless_assign). Every checker runs in seconds on our codebase, which made it easy to improve them until they produced no false alarms. Currently, none of these checkers uses the recent semantic analysis capability of Libadalang, which might be useful in the future to improve their precision. For each of these checkers, we took inspiration from similar lightweight checkers in other static analysis tools, in particular PVS-Studio and its gallery of real-life examples.

Checkers on Dereference

Our first checker is a favorite of many tools. It checks whether a pointer that has been dereferenced is later tested against the null value. This is suspicious, as we'd expect the sequence of events to be the opposite. It can point either to an error (the pointer should not be dereferenced without a null check) or to a code quality issue (the null test is useless). In fact, we found both when applying the checker to the codebase of our tools. Here's an example of an error found in the GNAT compiler, in g-spipat.adb, where procedure Dump dereferences P.P at line 2088:

   procedure Dump (P : Pattern) is

      subtype Count is Ada.Text_IO.Count;
      Scol : Count;
      --  Used to keep track of column in dump output

      Refs : Ref_Array (1 .. P.P.Index);
      --  We build a reference array whose N'th element points to the
      --  pattern element whose Index value is N.

and then much later at line 2156 it checks for P.P being null:

      --  If uninitialized pattern, dump line and we are done

      if P.P = null then
         Put_Line ("Uninitialized pattern value");
         return;
      end if;

The code was fixed to declare array Refs after we know P.P is not null. And here is an example of code quality issue also in the GNAT compiler, at line 2797 of g-comlin.adb, where parameter Line is dereferenced and then tested against null:

      Sections_List : Argument_List_Access :=
                        new Argument_List'(1 .. 1 => null);
      Found         : Boolean;
      Old_Line      : constant Argument_List := Line.all;
      Old_Sections  : constant Argument_List := Sections.all;
      Old_Params    : constant Argument_List := Params.all;
      Index         : Natural;

   begin
      if Line = null then
         return;
      end if;

The code was fixed by declaring Line as a parameter of a not null access type. In some cases the dereference and the test are both in the same expression, for example in this case in our tool GNATstack, at line 97 of dispatching_calls.adb:

         if
           Static_Class.Derived.Contains (Explored_Method.Vtable_Entry.Class)
              and then
           Explored_Method /= null
              and then

The code was fixed here by checking for non-null before dereferencing Explored_Method. Overall, this checker found 11 errors in our codebase and 9 code quality issues.

A second checker in this category looks for places where a test that a pointer is null dominates a dereference of the same pointer. This is, in general, an indication of a logic error, in particular in complex boolean expressions, as shown by the examples from the PVS-Studio gallery. We found no such error in our codebase, which may be an indication of our good test coverage. Indeed, any execution of such code would raise an exception in Ada.
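
The shape of code this checker flags looks like the following (an illustrative fragment, not something found in our codebase):

   type Rec is record
      Count : Natural;
   end record;
   type Rec_Access is access Rec;

   procedure Warn_If_Empty (Ptr : Rec_Access) is
   begin
      --  The null test dominates the dereference: Ptr.Count is evaluated
      --  only on the path where Ptr = null holds, so the dereference always
      --  raises Constraint_Error at run time; "/=" was most likely intended.
      if Ptr = null and then Ptr.Count = 0 then
         null;  --  report the empty case here
      end if;
   end Warn_If_Empty;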

Checkers on Tested Expressions

Our first checker on tested expressions looks for identical subexpressions being tested in a chain of if-elsif statements. It points to either errors or code quality issues. Here is an example of error it found in the GNAT compiler, at line 7380 of sem_ch4.adb:

                  if Nkind (Right_Opnd (N)) = N_Integer_Literal then
                     Remove_Address_Interpretations (Second_Op);

                  elsif Nkind (Right_Opnd (N)) = N_Integer_Literal then
                     Remove_Address_Interpretations (First_Op);
                  end if;

The code was fixed by testing Left_Opnd(N) instead of Right_Opnd(N) in the second test. Overall, this checker found 3 errors in our codebase and 7 code quality issues.

A second checker in this category looks for expressions of the form "A /= B or A /= C" where B and C are different literals; such expressions are always True, and in general "and" was meant instead of "or". This checker found one error in our QGen code generator, at line 675 of himoco-blockdiagramcmg.adb:

               if Code_Gen_Mode /= "Function"
                 or else Code_Gen_Mode /= "Reusable function"
               then
                  To_Flatten.Append (Obj);
               end if;

Checkers for Code Duplication

Our first checker for code duplication looks for identical code in different branches of an if-statement or case-statement. It may point to typos or logical errors, but in our codebase it pointed only to refactoring opportunities. Still, some of these involve duplication of more than 20 lines of code, for example at line 1023 of be-checks.adb in CodePeer:

            elsif VN_Kind (VN) = Binexpr_VN
              and then Operator (VN) = Logical_And_Op
              and then Int_Sets.Is_In (Big_True, To_Int_Set_Part (Expect))
            then
               --  Recurse to propagate check down to operands of "and"
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (First_Operand (VN)),
                  Srcpos,
                  File_Name,
                  First_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (Second_Operand (VN)),
                  Srcpos,
                  File_Name,
                  Second_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);
...
            elsif VN_Kind (VN) = Binexpr_VN
              and then Operator (VN) = Logical_Or_Op
              and then Int_Sets.Is_In (Big_False, To_Int_Set_Part (Expect))
            then
               --  Recurse to propagate check down to operands of "and"
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (First_Operand (VN)),
                  Srcpos,
                  File_Name,
                  First_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (Second_Operand (VN)),
                  Srcpos,
                  File_Name,
                  Second_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);

or at line 545 of soap-generator-skel.adb in GPRbuild:

                  when WSDL.Types.K_Derived =>

                     if Output.Next = null then
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    => "Result",
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     else
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    =>
                                "Result."
                                  & Format_Name (O, To_String (N.Name)),
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     end if;

                  when WSDL.Types.K_Enumeration =>

                     if Output.Next = null then
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    => "Result",
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     else
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    =>
                                "Result."
                                  & Format_Name (O, To_String (N.Name)),
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     end if;

Overall, this checker found 62 code quality issues in our codebase.

Our last checker looks for useless assignments to a local variable, where the assigned value is never subsequently read. This can be very obvious, such as in this case at line 1067 of be-value_numbers-factory.adb in CodePeer:

      Global_Obj.Obj_Id_Number    := Obj_Id_Number (New_Obj_Id);
      Global_Obj.Obj_Id_Number    := Obj_Id_Number (New_Obj_Id);

or more hidden, such as this case at line 895 of bt-xml-reader.adb, still in CodePeer:

                              if Next_Info.Sloc.Column = Msg_Loc.Column then
                                 Info := Next_Info;
                                 Elem := Next_Cursor;
                              end if;
                              Elem := Next_Cursor;

Overall, this checker found 9 code quality issues in our codebase.

Setup Recipe 

So you actually want to try the above scripts on your own codebase? This is possible right now with your latest GNAT Pro release or the latest GPL release for community & academic users! Just follow the instructions we described in the Libadalang repository and you will be able to run the scripts inside your favorite Python2 interpreter.

Conclusion

Summing over the 8 checkers that we implemented so far (referenced in this post and a previous one), we've found and fixed 24 errors and 102 code quality issues in our codebase. This definitely shows that these kinds of checkers are worth integrating into static analysis tools, and we look forward to integrating these and more in CodePeer, our static analyzer for Ada programs.

Another lesson is that each of these checkers was developed in a couple of hours, thanks to the powerful Python API available with Libadalang. While we had to develop some boilerplate to traverse the AST in various directions, and multiple workarounds to make up for the absence of semantic analysis in the (then available) version of Libadalang, this was relatively little work, and work that we can expect to share across similar checkers in the future. We're now looking forward to the version of Libadalang with on-demand semantic analysis, which will allow us to create even more powerful and useful checkers.
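
To give an idea of what this boilerplate looks like, here is a minimal sketch of a downward traversal, assuming only the Libadalang entry points used in these posts (AnalysisContext, get_from_file, and the fact that a node is iterable over its children); the file name and the looks_suspicious predicate are placeholders, not part of the actual checkers:

import libadalang as lal

def walk(node):
    # Yield node and all of its descendants. Children are obtained by
    # iterating over the node; absent children show up as None.
    if node is None:
        return
    yield node
    for child in node:
        for descendant in walk(child):
            yield descendant

def looks_suspicious(node):
    # Placeholder for a checker-specific predicate on a node
    return False

ctx = lal.AnalysisContext()
unit = ctx.get_from_file("g-spipat.adb")
print([n for n in walk(unit.root) if looks_suspicious(n)])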

[cover image by Courtney Lemon]

]]>
A Usable Copy-Paste Detector in A Few Lines of Python http://blog.adacore.com/a-usable-copy-paste-detector-in-few-lines-of-python Tue, 02 May 2017 13:00:00 +0000 Yannick Moy http://blog.adacore.com/a-usable-copy-paste-detector-in-few-lines-of-python

After we created lightweight checkers based on the recent Libadalang technology developed at AdaCore, a colleague gave us the challenge of creating a copy-paste detector based on Libadalang. It turned out to be both easier than anticipated, and much more efficient and effective than we could have hoped for. In the near future, we plan to use this new detector to refactor the codebase of some of our tools.

First Attempt: Hashes and Repeated Suffix Trees

Our naive strategy for detecting copy-paste was to reduce it to a string problem, in order to benefit from existing efficient string algorithms. Our reasoning was that each line of code could be represented by a hash code, so that a file could be represented by a string of hash codes. After a few Web searches, we found the perfect match for this translated problem on the Wikipedia page for the longest repeated substring problem, which helpfully points to a C implementation that solves this problem efficiently using Suffix Trees, a data structure that compactly represents all suffixes of a string (say, "adacore", "dacore", "acore", "core", "ore", "re" and "e" if your string is "adacore").

So we came up with an implementation in Python of the copy-paste detector, made up of 3 steps:

Step 1: Transform the source code into a string of hash codes

This is a simple traversal of the AST produced by Libadalang, producing roughly one hash per logical line of code. Traversal is made very easy by the API offered by Libadalang, as each node of the AST is iterable in Python to get its children. For example, here is the default case of the encoding function producing the hash codes:

        # Default case, where we hash the kind of the first token for the node,
        # followed by encodings for its subnodes.
        else:
            return ([Code(hash(node.token_start.kind), node, f)] +
                    list(itertools.chain.from_iterable(
                        [enc(sub) for sub in node])))

We recurse here on the AST to concatenate the substrings of hash codes computed for subnodes. The leaf case is obtained for expressions and simple statements, for which we compute a hash of a string obtained from the list of tokens for the node. The API of Libadalang makes it very easy, using again the ability to iterate over a node to get its children. For example, here is the default case of the function computing the string from a node:

            return ' '.join([node.token_start.kind]
                            + [strcode(sub) for sub in node])

We recurse here on the AST to concatenate the kind of the first token for the node with the substrings computed for subnodes. Of course, we are not interested in representing each line of code exactly. For example, we represent all identifiers by a special wildcard character $, in order to detect copy-pastes even when identifiers are not the same.
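
To make the shape of this step concrete, here is a deliberately simplified sketch of an encoding pass, hashing one code per syntactic node instead of roughly one per logical line as the real enc function does; the file name is a placeholder, and the only Libadalang entry points assumed are the ones already mentioned in this post:

import libadalang as lal

def encode(node):
    # Return a flat list of hash codes for the subtree rooted at node.
    # Absent children are None, so we skip them.
    if node is None:
        return []
    codes = [hash(type(node).__name__)]  # one hash per kind of syntactic node
    for child in node:
        codes.extend(encode(child))
    return codes

ctx = lal.AnalysisContext()
unit = ctx.get_from_file("example.adb")
hashes = encode(unit.root)  # the "string" of hash codes fed to Step 2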

Step 2: Construct the Suffix Tree for the string of hash codes

The algorithm by Ukkonen is quite subtle, but it was easy to translate an existing C implementation into Python. For those curious enough, a very instructive series of 6 blog posts leading to this implementation describes Ukkonen's algorithm in detail.

Step 3: Compute the longest repeated substring in the string of hash codes

For that, we look at the internal node of the Suffix Tree constructed above with the greatest height (computed in number of hashes). Indeed, this internal node corresponds to two or more suffixes that share a common prefix. For example, with string "adacore", there is a single internal node, which corresponds to the common prefix "a" for suffixes "adacore" and "acore", after which the suffixes are different. The children of this internal node in the Suffix Tree contain the information of where the suffixes start in the string (position 0 for "adacore" and 2 for "acore"), so we can compute positions in the string of hash codes where hashes are identical and for how many hash codes. Then, we can translate this information into files, lines of code and number of lines.
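
The Suffix Tree is what makes this step fast, but the property being computed is easy to state on its own. As an illustration only (not the actual implementation), here is a naive way to find the longest repeat in a small list of hash codes, by sorting all suffixes and comparing lexicographic neighbours:

def longest_repeat(codes):
    # Return (length, (i, j)): the longest run of hash codes occurring at
    # two different positions i and j in the list.
    order = sorted(range(len(codes)), key=lambda i: codes[i:])
    best_len, best_pair = 0, None
    for a, b in zip(order, order[1:]):
        # Length of the common prefix of two lexicographically adjacent suffixes
        k = 0
        while a + k < len(codes) and b + k < len(codes) and codes[a + k] == codes[b + k]:
            k += 1
        if k > best_len:
            best_len, best_pair = k, (a, b)
    return best_len, best_pair

print(longest_repeat([1, 2, 3, 9, 1, 2, 3, 7]))  # (3, (4, 0)): codes 1, 2, 3 repeat at positions 0 and 4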

The steps above only detect the longest copy-paste across a codebase (in terms of number of hash codes, which may differ from the number of lines of code). Initially, we did not find a better way to detect all copy-pastes longer than a certain limit than to repeat steps 2 and 3 after removing from the string of hash codes those that correspond to the copy-paste previously detected. This algorithm ran in about one hour on the full codebase of GPS, consisting of 350 ksloc (as counted by sloccount), and it reported both very valuable copy-pastes of more than 100 lines of code and spurious ones. To be clear, the spurious ones were not bugs in the implementation, but limitations of the algorithm, which captured "copy-pastes" that were valid duplications of similar lines of code. Then we improved it.

Improvements: Finer-Grain Encoding and Collapsing

The imprecisions of our initial algorithm came mostly from two sources: it sometimes ignored too much of the source code, and sometimes too little. That was the case in particular for the abstraction of all identifiers as the wildcard character $, which led to spurious copy-pastes where the identifiers were semantically meaningful and could not be replaced by any other identifier. We fixed that by distinguishing local identifiers, which are abstracted away, from global identifiers (from other units), which are preserved, and by preserving all identifiers that could be the names of record components (that is, used in a dot notation like Obj.Component). Another example of too much abstraction was that we abstracted all literals by their kind, which again led to spurious copy-pastes (think of large aggregates defining the value of constants). We fixed that by preserving the value of literals.
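
A sketch of the resulting encoding rule for a single token; the parameter names and the local/global/component distinction are illustrative only, not the actual implementation:

def encode_token(kind, text, is_local_identifier=False, is_dotted_component=False):
    # Decide what a token contributes to the hashed representation
    if kind == "identifier":
        if is_local_identifier and not is_dotted_component:
            return "$"   # abstract away: renamed locals should still match
        return text      # keep global names and record component names
    if kind == "literal":
        return text      # keep the value, not just the kind of literal
    return kind          # keywords, operators, etc. contribute their kind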

As an example of too little abstraction, we got copy-pastes that consisted mostly of sequences of small 5-to-10-line subprograms, which could not be usefully refactored to share common code. We fixed that by collapsing sequences of such subprograms into a single hash code, so that their relative weight in finding large copy-pastes was reduced. We made various other adjustments to the encoding function to modulate the importance of various syntactic constructs, simply by producing more or fewer hash codes for a given construct. An interesting adjustment was to ignore the closing tokens of a construct (like the "end Proc;" at the end of a procedure), to avoid copy-pastes that start at such meaningless points. This seems to be a typical shortcoming of token-based approaches that our hash-based approach solves easily, simply by not producing a hash for such tokens.

After these various improvements, the analysis of the GPS codebase came down to 2 minutes, an impressive improvement over the initial one hour! The code for this version of the copy-paste detector can be found in the GitHub repository of Libadalang.

Optimizations: Suffix Arrays, Single Pass 

To improve on the above running time, we looked for alternative algorithms for performing the same task. And we found one! Suffix Arrays are an alternative to Suffix Trees; they are simpler to implement, and we saw that they let us generate all copy-pastes without regenerating the underlying data structure after finding a given copy-paste. We implemented in Python the C++ algorithm found in this paper, and the code for this alternative implementation can be found in the GitHub repository of Libadalang. This version found the same copy-pastes as the previous one, as expected, with a running time of 1 minute for the analysis of the GPS codebase, a 50% improvement!

Looking more closely at the bridges between Suffix Trees and Suffix Arrays (essentially you can reconstruct one from the other), we also realized that we could use the same one-pass algorithm to detect copy-pastes with Suffix Trees, instead of recreating the Suffix Tree each time for the text from which the just-detected copy-paste had been removed. The idea is that, instead of repeatedly detecting the longest copy-paste on a newly created Suffix Tree, we traverse the initial Suffix Tree and issue all copy-pastes of maximal length, where copy-pastes that are not maximal can be easily recognized by checking the previous hash in the candidate suffixes. For example, if two suffixes for a copy-paste start at indexes 5 and 10 in the string of hashes, we check the hashes at indexes 4 and 9: if they are the same, then the copy-paste is not maximal and we do not report it. With this change, the running time for our original algorithm is just above 1 minute for the analysis of the GPS codebase, i.e. close to the alternative implementation based on Suffix Arrays.
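
The maximality test itself is tiny. Here is a sketch of it, assuming the candidate repeat is described by the list of positions at which it starts in the string of hash codes (the function name and arguments are illustrative):

def is_left_maximal(codes, starts):
    # A repeat is reported only if it cannot be extended to the left: either
    # one occurrence starts at position 0, or at least two occurrences are
    # preceded by different hash codes.
    if 0 in starts:
        return True
    return len(set(codes[i - 1] for i in starts)) > 1

print(is_left_maximal([1, 2, 3, 9, 1, 2, 3, 7], [1, 5]))  # False: both occurrences of 2, 3 follow a 1
print(is_left_maximal([1, 2, 3, 9, 1, 2, 3, 7], [0, 4]))  # True: 1, 2, 3 cannot be extended to the left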

So we ended up with two implementations for our copy-paste detector, one based on Suffix Trees and one based on Suffix Arrays. We'll need to experiment further to decide which one to keep in a future plug-in for our GPS and GNATbench IDEs.

Results on GPS

The largest source base on which we tried this tool is our IDE GNAT Programming Studio (GPS). This is about 350'000 lines of source code. It uses object orientation, tends to have medium-sized subprograms (20 to 30 lines), although there are some much longer ones. In fact, we aim at reducing the size of the longest subprograms, and a tool like gnatmetric will help find them. We are happy to report that most of the code duplication occurred in recent code, as we are transitioning and rewriting some of the old modules.

Nonetheless, the tool helped detect a number of duplicate chunks, with very few spurious detections (corresponding to cases where the tool reports a copy-paste that turns out to be only similar code).

Let's take a look at three copy-pastes that were detected.

Example 1: Intended temporary duplication of code

gps/gvd/src/debugger-base_gdb-gdb_cli.adb:3267:1: copy-paste of 166 lines detected with code from line 3357 to line 3522 in file gps/gvd/src/debugger-base_gdb-gdb_mi.adb

This is a large subprogram used to handle the Memory view in GPS. We have recently started changing the code to use the gdb MI protocol to communicate with gdb, rather than simulate an interactive session. Since the intent is to remove the old code, the duplication is not so bad, but the report is useful in reminding us that we need to clean things up here, preferably soon, before the code diverges too much.

Example 2: Unintended almost duplication of code

gps/builder/core/src/commands-builder-scripts.adb:266:1: copy-paste of 21 lines detected with code from line 289 to line 309

This code is in the handling of the python functions GPS.File.compile() and GPS.File.make(). Interestingly enough, these two functions were not doing the same thing initially, and are also documented differently (make attempts to link the file after compiling it). Yet the code is almost exactly the same, except that GPS does not spawn the same build target (see comment in the code below). So we could definitely use an if-expression here to avoid the duplication of the code.

      elsif Command = "compile" then
         Info := Get_Data (Nth_Arg (Data, 1, Get_File_Class (Kernel)));
         Extra_Args := GNAT.OS_Lib.Argument_String_To_List
           (Nth_Arg (Data, 2, ""));

         Builder := Builder_Context
           (Kernel.Module (Builder_Context_Record'Tag));

         Launch_Target (Builder      => Builder,
                        Target_Name  => Compile_File_Target,    -- <<< use Build_File_Target here for "make"
                        Mode_Name    => "",
                        Force_File   => Info,
                        Extra_Args   => Extra_Args,
                        Quiet        => False,
                        Synchronous  => True,
                        Dialog       => Default,
                        Via_Menu     => False,
                        Background   => False,
                        Main_Project => No_Project,
                        Main         => No_File);

         Free (Extra_Args);

The tool could be slightly more helpful here by highlighting the exact differences between the two blocks. As the blocks get longer, it is harder to spot a change in one identifier (as is the case here). This is where an integration into our IDEs GPS and GNATbench would be useful, possibly along with some support for automatic refactoring of the code, also based on Libadalang.

Example 3: Unintended exact duplication of code

gps/code_analysis/src/codepeer-race_details_models.adb:39:1: copy-paste of 20 lines detected with code from line 41 to line 60 in file gps/code_analysis/src/codepeer-race_summary_models.adb

This one is an exact duplication of a function. The tool could perhaps be slightly more helpful by showing those exact duplicates first, since they will often be the easiest ones to remove, simply by moving the function to the spec.

   function From_Iter (Iter : Gtk.Tree_Model.Gtk_Tree_Iter) return Natural is
      pragma Warnings (Off);
      function To_Integer is
        new Ada.Unchecked_Conversion (System.Address, Integer);
      pragma Warnings (On);

   begin
      if Iter = Gtk.Tree_Model.Null_Iter then
         return 0;

      else
         return To_Integer (Gtk.Tree_Model.Utils.Get_User_Data_1 (Iter));
      end if;
   end From_Iter;

Setup Recipe

So you actually want to try the above scripts on your own codebase? This is possible right now with your latest GNAT Pro release or the latest GPL release for community & academic users! Just follow the instructions we described in the Libadalang repository, and you will then be able to run the scripts inside your favorite Python2 interpreter.

Conclusion

What we took from this experiment was that (1) it is easier than you think to develop a copy-paste detector for your favorite language, and (2) technology like Libadalang is key to facilitating the experiments that lead to an efficient and effective detector. On the algorithmic side, we think it's very beneficial to use a string of hash codes as the intermediate representation, as this allows precise control over how much each language construct contributes.

Interestingly, we did not find other tools or articles describing this kind of intermediate approach between token-based and syntactic approaches. It provides an even faster analysis than token-based approaches while avoiding their typical pitfalls, and it allows fine-grained control based on the syntactic structure without suffering from the long running times typical of syntactic approaches.

We look forward to integrating our copy-paste detector in GPS and GNATbench, obviously initially for Ada, but possibly for other languages as well (for example C and Python) as progress on Langkit, Libadalang's underlying technology, allows. The integration of Libadalang in GPS was completed not long ago, so it's easier than ever.

]]>
VerifyThis Challenge in SPARK http://blog.adacore.com/verifythis-challenge-in-spark Fri, 28 Apr 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/verifythis-challenge-in-spark

This year again, the VerifyThis competition took place as part of ETAPS conferences. This is the occasion for builders and users of formal program verification platforms to use their favorite tools on common challenges. The first challenge this year was a good fit for SPARK, as it revolves around proving properties of an imperative sorting procedure.

I am going to use this challenge to show how one can reach different levels of software assurance with SPARK. I'm referring here to the five levels of software assurance that we have used in our guidance document with Thales:

  • Stone level - valid SPARK
  • Bronze level - initialization and correct data flow
  • Silver level - absence of run-time errors (AoRTE)
  • Gold level - proof of key integrity properties
  • Platinum level - full functional proof of requirements

Stone level - valid SPARK

We start with a simple translation in Ada of the simplified variant of pair insertion sort given on page 2 of the challenge sheet:

package Pair_Insertion_Sort with
  SPARK_Mode
is
   subtype Index is Integer range 0 .. Integer'Last-1;
   type Arr is array (Index range <>) of Integer
     with Predicate => Arr'First = 0;

   procedure Sort (A : in out Arr);

end Pair_Insertion_Sort;

package body Pair_Insertion_Sort with
  SPARK_Mode
is
   procedure Sort (A : in out Arr) is
      I, J, X, Y, Z : Integer;
   begin
      I := 0;
      while I < A'Length-1 loop
         X := A(I);
         Y := A(I+1);
         if X < Y then
            Z := X;
            X := Y;
            Y := Z;
         end if;

         J := I - 1;
         while J >= 0 and then A(J) > X loop
            A(J+2) := A(J);
            J := J - 1;
         end loop;
         A(J+2) := X;

         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            J := J - 1;
         end loop;
         A(J+1) := Y;
         I := I+2;
      end loop;

      if I = A'Length-1 then
         Y := A(I);
         J := I - 1;
         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            J := J - 1;
         end loop;
         A(J+1) := Y;
      end if;
   end Sort;

end Pair_Insertion_Sort;

Stone level is reached immediately on this code, as it is in the SPARK subset of Ada.

Bronze level - initialization and correct data flow

Bronze level is also reached immediately, although the flow analysis in SPARK detected a problem the first time I ran it:

pair_insertion_sort.adb:35:39: medium: "Y" might not be initialized
pair_insertion_sort.adb:39:20: medium: "Y" might not be initialized

The problem was that the line initializing Y to A(I) near the end of the program was missing from my initial version. As I copied the algorithm in pseudo-code from the PDF and removed the comments in the badly formatted result, I also removed that line of code! After restoring that line, flow analysis did not complain anymore, and I had reached Bronze level.

Silver level - absence of run-time errors (AoRTE)

Then comes Silver level, which was suggested as an initial goal in the challenge: proving absence of runtime errors. As the main reason for run-time errors in this code is the possibility of indexing outside of array bounds, we need to provide bounds for the values of variables I and J used to index A. As these accesses are performed inside loops, we need to do so in loop invariants. Exactly 5 loop invariants are needed here, and with these GNATprove can prove the absence of run-time errors in the code in 7 seconds on my machine:

package body Pair_Insertion_Sort with
  SPARK_Mode
is
   procedure Sort (A : in out Arr) is
      I, J, X, Y, Z : Integer;
   begin
      I := 0;
      while I < A'Length-1 loop
         X := A(I);
         Y := A(I+1);
         if X < Y then
            Z := X;
            X := Y;
            Y := Z;
         end if;

         J := I - 1;
         while J >= 0 and then A(J) > X loop
            A(J+2) := A(J);
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            J := J - 1;
         end loop;
         A(J+2) := X;

         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            J := J - 1;
         end loop;
         A(J+1) := Y;

         pragma Loop_Invariant (I in 0 .. A'Length-2);
         pragma Loop_Invariant (J in -1 .. A'Length-3);
         I := I+2;
      end loop;

      if I = A'Length-1 then
         Y := A(I);
         J := I - 1;
         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            pragma Loop_Invariant (J in 0 .. A'Length-2);
            J := J - 1;
         end loop;
         A(J+1) := Y;
      end if;
   end Sort;

end Pair_Insertion_Sort;

Gold level - proof of key integrity properties

Then comes Gold level, which is the first task of the verification challenge: proving that A is sorted on return. This can be expressed easily with a ghost function Sorted:

 function Sorted (A : Arr; I, J : Integer) return Boolean is
      (for all K in I .. J-1 => A(K) <= A(K+1))
   with Ghost,
        Pre => J > Integer'First
          and then (if I <= J then I in A'Range and J in A'Range);

   procedure Sort (A : in out Arr) with
     Post => Sorted (A, 0, A'Length-1);

Then, we need to augment the previous loop invariants with enough information to prove this sorting property. Let's look at the first (inner) loop. The invariant of this loop is that it maintains the array sorted up to the current high bound I+1, except for a hole of one index at J+1. That's easily expressed with the ghost function Sorted:

            pragma Loop_Invariant (Sorted (A, 0, J));
            pragma Loop_Invariant (Sorted (A, J+2, I+1));

For these loop invariants to be proved, we need to know that the next value possibly passed across the hole at the next iteration (from index J-1 to J+1) is lower than the value currently at J+2. Well, we know that the value at index J-1 is lower than the value at index J thanks to the property Sorted(A,0,J). All we need to add is that A(J+2)=A(J):

 pragma Loop_Invariant (A(J+2) = A(J));

This proves the loop invariant. Next, after the loop, value X is inserted at index J+2 in the array. For the sorting property to hold from J+2 upwards after the loop, X needs to be less than the value at J+2 in the loop invariant. As A(J+2) and A(J) are equal, we can write:

 pragma Loop_Invariant (A(J) > X);

All the other loops are similar. With these additional loop invariants, GNATprove can prove the sorting property in the code in 12 seconds on my machine:

package body Pair_Insertion_Sort with
  SPARK_Mode
is
   procedure Sort (A : in out Arr) is
      I, J, X, Y, Z : Integer;
   begin
      I := 0;
      while I < A'Length-1 loop
         X := A(I);
         Y := A(I+1);
         if X < Y then
            Z := X;
            X := Y;
            Y := Z;
         end if;

         J := I - 1;
         while J >= 0 and then A(J) > X loop
            A(J+2) := A(J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J));
            pragma Loop_Invariant (Sorted (A, J+2, I+1));
            pragma Loop_Invariant (A(J+2) = A(J));
            pragma Loop_Invariant (A(J) > X);
            J := J - 1;
         end loop;
         A(J+2) := X;

         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J));
            pragma Loop_Invariant (Sorted (A, J+1, I+1));
            pragma Loop_Invariant (A(J+1) = A(J));
            pragma Loop_Invariant (A(J) > Y);
            J := J - 1;
         end loop;
         A(J+1) := Y;

         --  loop invariant for absence of run-time errors
         pragma Loop_Invariant (I in 0 .. A'Length-2);
         --  loop invariant for sorting
         pragma Loop_Invariant (J in -1 .. A'Length-3);
         pragma Loop_Invariant (Sorted (A, 0, I+1));
         I := I+2;
      end loop;

      if I = A'Length-1 then
         Y := A(I);
         J := I - 1;
         while J >= 0 and then A(J) > Y loop
            A(J+1) := A(J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-2);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J));
            pragma Loop_Invariant (Sorted (A, J+1, A'Length-1));
            pragma Loop_Invariant (A(J+1) = A(J));
            pragma Loop_Invariant (A(J) > Y);
            J := J - 1;
         end loop;
         A(J+1) := Y;
      end if;
   end Sort;

end Pair_Insertion_Sort;

Platinum level - full functional proof of requirements

Then comes Platinum level, which is the second task of the verification challenge: proving that A on return is a permutation of its value on entry. For that, we are going to use the ghost function Is_Perm that my colleague Claire Dross presented in this blog post, which expresses that any integer has the same number of occurrences in arrays A and B (that is, they represent the same multiset, which is a way of expressing that they are permutations of one another):

function Is_Perm (A, B : Arr) return Boolean is
     (for all E in Integer => Occ (A, E) = Occ (B, E));

and the procedure Swap that she presented in the same blog post, for which I simply give the contract here:

  procedure Swap (Values : in out Arr;
                   X      : in     Index;
                   Y      : in     Index)
   with
     Pre  => X in Values'Range
       and then Y in Values'Range
       and then X /= Y,
     Post => Is_Perm (Values'Old, Values)
       and then Values (X) = Values'Old (Y)
       and then Values (Y) = Values'Old (X)
       and then (for all Z in Values'Range =>
                   (if Z /= X and Z /= Y then Values (Z) = Values'Old (Z)))

The only changes I made with respect to her initial version were to change Natural to Integer in various places, as arrays in our challenge store integers instead of natural numbers. In order to use on our pair insertion sort the same strategy that she showed for selection sort, we need to rewrite the algorithm a bit to swap array cells (as suggested in the challenge specification). For example, instead of the assignment in the first (inner) loop:

  A(J+2) := A(J);

we now have:

 Swap (A, J+2, J);

This has an effect on the loop invariants seen so far, which must be slightly modified. Then, we need to express that every loop maintains in A a permutation of the entry value for A. For that, we create a ghost constant B that stores the entry value of A:

B : constant Arr(A'Range) := A with Ghost;

and use this constant in loop invariants of the form:

pragma Loop_Invariant (Is_Perm (B, A));

We also need to express that the values X and Y that are pushed down the array are indeed the ones found around index J in all loops. For example in the first (inner) loop:

pragma Loop_Invariant ((A(J) = X and A(J+1) = Y) or (A(J) = Y and A(J+1) = X));

With these changes, GNATprove can prove the permutation property in the code in 27 seconds on my machine (including all the ghost code copied from Claire's blog post):

   procedure Sort (A : in out Arr) is
      I, J, X, Y, Z : Integer;
      B : constant Arr(A'Range) := A with Ghost;
   begin
      I := 0;
      while I < A'Length-1 loop
         X := A(I);
         Y := A(I+1);
         if X < Y then
            Z := X;
            X := Y;
            Y := Z;
         end if;

         J := I - 1;
         while J >= 0 and then A(J) > X loop
            Swap (A, J+2, J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J-1));
            pragma Loop_Invariant (Sorted (A, J+2, I+1));
            pragma Loop_Invariant (if J > 0 then A(J+2) >= A(J-1));
            pragma Loop_Invariant (A(J+2) > X);
            --  loop invariant for permutation
            pragma Loop_Invariant (Is_Perm (B, A));
            pragma Loop_Invariant ((A(J) = X and A(J+1) = Y) or (A(J) = Y and A(J+1) = X));
            J := J - 1;
         end loop;
         if A(J+2) /= X then
            Swap (A, J+2, J+1);
         end if;

         while J >= 0 and then A(J) > Y loop
            Swap (A, J+1, J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-3);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J-1));
            pragma Loop_Invariant (Sorted (A, J+1, I+1));
            pragma Loop_Invariant (if J > 0 then A(J+1) >= A(J-1));
            pragma Loop_Invariant (A(J+1) > Y);
            --  loop invariant for permutation
            pragma Loop_Invariant (Is_Perm (B, A));
            pragma Loop_Invariant (A(J) = Y);
            J := J - 1;
         end loop;

         --  loop invariant for absence of run-time errors
         pragma Loop_Invariant (I in 0 .. A'Length-2);
         --  loop invariant for sorting
         pragma Loop_Invariant (J in -1 .. A'Length-3);
         pragma Loop_Invariant (Sorted (A, 0, I+1));
         --  loop invariant for permutation
         pragma Loop_Invariant (Is_Perm (B, A));
         I := I+2;
      end loop;

      if I = A'Length-1 then
         Y := A(I);
         J := I - 1;
         while J >= 0 and then A(J) > Y loop
            Swap (A, J+1, J);
            --  loop invariant for absence of run-time errors
            pragma Loop_Invariant (J in 0 .. A'Length-2);
            --  loop invariant for sorting
            pragma Loop_Invariant (Sorted (A, 0, J-1));
            pragma Loop_Invariant (Sorted (A, J+1, A'Length-1));
            pragma Loop_Invariant (if J > 0 then A(J+1) >= A(J-1));
            pragma Loop_Invariant (A(J+1) > Y);
            --  loop invariant for permutation
            pragma Loop_Invariant (Is_Perm (B, A));
            pragma Loop_Invariant (A(J) = Y);
            J := J - 1;
         end loop;
      end if;
   end Sort;

The complete code for this challenge can be found on GitHub.

Conclusions

Two features of SPARK were particularly useful for debugging unprovable properties. The first was counterexamples, which produce messages such as the following:

pair_insertion_sort.adb:20:16: medium: array index check might fail (e.g. when A = (others => 1) and A'Last = 3 and J = 2)

Here it allowed me to realize that the upper bound I set for J was too high, so that J+2 was outside of A's bounds.

The second very useful feature was the ability to execute assertions and contracts, issuing messages such as the following:

raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : Loop_Invariant failed at pair_insertion_sort.adb:51

Here it allowed me to realize that a loop invariant which I was trying to prove was in fact incorrect!

As this challenge shows, the five levels of software assurance in SPARK do not require the same level of effort at all. Stone and Bronze levels are rather easy to achieve on small codebases (although they might require refactoring on large codebases), Silver level requires some modest effort, Gold level requires more expertise to drive automatic provers, and finally Platinum level requires a much larger effort than all the previous levels (including possibly some rewriting of the algorithm to make automatic proof possible like here).

Thanks to the organisers of this year's VerifyThis competition, and congrats to the winning teams, two of which used Why3!

]]>
GPS for bare-metal developers http://blog.adacore.com/gps-for-bare-metal-development Wed, 19 Apr 2017 12:57:03 +0000 Anthony Leonardo Gracio http://blog.adacore.com/gps-for-bare-metal-development

In my previous blog article, I presented some techniques that helped me rewrite the Crazyflie's firmware from C into Ada and SPARK 2014, in order to improve its safety.

I was still an intern at that time and, back in the day, the support for bare-metal development in GPS was a bit minimalistic: daily activities like flashing and debugging my own firmware on the Crazyflie were a bit painful, because they could not be done without going outside of GPS.

This is not the case anymore. GPS now comes with a number of features regarding bare-metal development that make it very easy for newcomers (as I was) to develop their own software for a particular board.

Bare-metal Holy Grail: Build, Flash, Debug

Building your modified software in order to flash it or debug it on a board is a very common workflow in bare-metal development. GPS offers support for all these different steps, allowing you to perform them at once with a single click.

In particular, GPS now supports two different tools for connecting to your remote board in order to flash and/or debug it:

  • ST-Link utility tools (namely st-util and st-flash) for STM32-based boards

  • OpenOCD, a connection tool supporting various types of boards and probes, specifically the ones that use a JTAG interface

Once installed on your host, using these tools in order to flash or debug your project directly from GPS is very easy. As pictures are worth a thousand words, here is a little tutorial video showing how to set up your bare-metal project from GPS in order to build, flash and debug it on a board:



Monitoring the memory usage

When it comes to bare-metal development, flashing and debugging your project on the targeted board is already a pretty advanced step: it means that you have already been able to compile and, above all, to link your software correctly.

The linking phase can be a real pain due to the limited memory resources of these boards: producing software that does not fit in the board's available memory can happen pretty quickly as your project grows.

To address these potential issues, a Memory Usage view has been introduced. By default, this view is automatically spawned each time you build your executable and displays a complete view of the static memory usage consumed by your software, even when the linking phase has failed. The Memory Usage view uses a map file generated by the GNU ld linker to report the memory usage consumed at three different levels:

  1. Memory regions, which correspond to the MEMORY blocks defined in the linker script used to link the executable
  2. Memory sections (e.g: .data, .bss etc.)
  3. Object files

Having a complete report of the memory usage consumed at each level makes it very convenient to identify which parts of your software are consuming too much memory for the available hardware resources. This is the case in the example below, where we can see that the .bss section of our executable is just too big for the board's RAM, due to the huge uninitialized array declared in leds.ads.

Conclusion

Bare-metal development support has been improved greatly in GPS, making it easier for newcomers to build, flash and debug their software. Moreover, the Memory Usage view allows bare-metal developers to clearly identify the cause of memory overflows.

We don't want to stop our efforts regarding bare-metal development support, so don't hesitate to submit improvement ideas on our GitHub repository!

]]>
User-friendly strings API http://blog.adacore.com/user-friendly-strings-api Mon, 10 Apr 2017 13:47:00 +0000 Emmanuel Briot http://blog.adacore.com/user-friendly-strings-api

User-friendly strings API

In a previous post, we described the design of a new strings package, with improved performance compared to the standard Ada unbounded strings implementation. That post focused on various programming techniques used to make that package as fast as possible.

This post is a followup, which describes the design of the user API for the strings package - showing various solutions that can be used in Ada to make user-friendly data structures.

Tagged types

One of the features added in Ada 2005 is the prefix dot notation when calling primitive operations of tagged types. For instance, if you have the following declarations, you can call the subprogram Slice in one of two ways:

declare
   type XString is tagged private;
   procedure Slice (Self : in out XString; Low, High : Integer);   --  a primitive operation
   S : XString;
begin
   S.Slice (1, 2);      --  the two calls do the same thing, here using a prefix notation
   Slice (S, 1, 2);
end;

This is a very minor change. But in practice, people tend to use the prefix notation because it is more "natural" for people who have read or written code in other programming languages.

In fact, it is so popular that there is some demand to extend this prefix notation, in some future version of the language, to types other than tagged types. Using a tagged type has a cost, since it makes variables slightly bigger (they now have a hidden access field).

In our case, though, an XString was already a tagged type since it is also controlled, so there is no additional cost.

Indexing

The standard Ada strings are quite convenient to use (at least once you understand that you should use declare blocks since you need to know their size in advance). For instance, one can access characters with expressions like:

A : constant Character := S (2);      --  assuming 2 is a valid index
B : constant Character := S (S'First + 1);   --  better, of course
C : constant Character := S (S'Last);

The first line of the above example hides one of the difficulties for newcomers: strings can have any index range, so using just "2" is likely to be wrong here. Instead, the second line should be used. The GNATCOLL strings avoid this pitfall by always indexing from 1. As was explained in the first blog post, this is both needed for the code (so that internally we can reallocate strings as needed without changing the indexes manipulated by the user), and more intuitive for a lot of users.

The Ada unbounded string has a similar approach, and all strings are indexed starting at 1. But you can't use the same code as above, and instead you need to write the more cumbersome: 

S := To_Unbounded_String (...);
A : constant Character := Element (S, 2);     --  second character of the string, always
B : constant Character := Ada.Strings.Unbounded.Element (S, 2);   --  when not using use-clauses

GNATCOLL.Strings takes advantage of some new aspects introduced in Ada 2012 to provide custom indexing functions. So the spec looks like:

type XString is tagged private
    with Constant_Indexing  => Get;

function Get (Self : XString; Index : Positive) return Character;

S : XString := ...;
A : constant Character := S (2);    --   always the second character

After adding this simple Constant_Indexing aspect, we are now back to the simple syntax we were using for the standard String, using parentheses to access a specific character. But here we also know that "2" is always the second character, so we do not need to use the 'First attribute.

Variable indexing

There is a similar aspect, named Variable_Indexing which can be used to modify a specific character of the string, as we would do with a standard string. So we can write:

type XString is tagged private
    with Variable_Indexing  => Reference;

type Character_Reference (Char : not null access Char_Type) is limited null record
    with Implicit_Dereference => Char;

function Reference
    (Self  : aliased in out XString;
     Index : Positive) return Character_Reference;

S : XString := ...;
S (2) := 'A';
S (2).Char := 'A';    --  same as above, but not taking advantage of Implicit_Derefence

Although this is simple to use for users of the library, the implementation is actually much more complex than for Constant_Indexing.

First of all, we need to introduce a new type, called a Reference Type (in our example this is the Character_Reference), which basically is a type with an access discriminant and the Implicit_Dereference aspect. This type acts as a safer replacement for an access type (for instance it can't be copied quite as easily). Through this type, we have an access to a specific character, and therefore users can replace that character easily.

But the real difficulty is exactly because of this access. Imagine that we assign the string to another one. At that point, they share the internal buffer when using the copy-on-write mode described in the first post. But if we then modify a character, we modify it in both strings, although from the user's point of view these are two different instances! Here are three examples of code that would fail:


begin
    S2 := S;
    S (2) := 'A';
    --  Now both S2 and S have 'A' as their second character
end;

declare
    R : Character_Reference := S.Reference (2);
begin
    S2 := S;    --   now sharing the internal buffer
    R := 'A';   -- Both S2 and S have 'A' as their second character
end;

declare
    R : Character_Reference := S.Reference (2);
begin
    S.Append ("ABCDEF");
    R := 'A';   --  might point to deallocated memory
end;

Of course, we want our code to be safe, so the implementation needs to prevent the above errors. As soon as we take a Reference to a character, we must ensure the internal buffer is no longer shared. This fixes the first example above: since the call to S (2) no longer shares the buffer, we are indeed only modifying S and not S2.

The second example looks similar, but in fact the sharing is done after we have taken the reference. So when we take a Reference we also need to make the internal buffer unshareable. This is done by setting a special value for the refcount, so that the assignment always makes a copy of S's internal buffer, rather than sharing it.

There is a hidden cost to using Variable_Indexing, since we are no longer able to share the buffer once a reference has been taken. This is unfortunately unavoidable, and is one of those examples where convenience of use runs counter to performance...

The third example is much more complex. Here, we have a reference to a character in the string, but when we append data to the string we are likely to reallocate memory. The system could be allocating new memory, copying the data, and then freeing the old one. And our existing reference would point to the memory area that we have just freed!

I have not found a solution here (and C++ implementations seem to have similar limitations). We cannot simply prevent reallocation (i.e. changing the size of the string), since the user might simply have taken a reference, changed the character, and dropped the reference. At that point, reallocating would be safe. For now, we rely on documentation and on the fact that it takes extra work to preserve a reference as we did in the example; it is much more natural to save the index and then take another reference when needed, as in:


declare
    Saved : constant Integer := 2;
begin
    S.Append ("ABCDEF");
    S (Saved) := 'A';     --  safe
end;

Iteration

Finally, we want to make it easy to traverse a string and read all its characters (and possibly modify them, of course). With the standard strings, one would do:

declare
    S : String (1 .. 10) := ...;
begin
    for Idx in S'Range loop
        S (Idx) := 'A';
    end loop;

    for C of S loop
       Put (C);
    end loop
end;

declare
    S : Unbounded_String := ....;
begin
    for Idx in 1 .. Length (S) loop
       Replace_Element (S, Idx, 'A');
       Put (Element (S, Idx));
    end loop;
end;

Obviously, the version with unbounded strings is a lot more verbose and less user-friendly. It is possible, since Ada 2012, to provide custom iteration for our own strings, via some predefined aspects. As the gem shows, this is a complex operation...

GNAT provides a much simpler aspect that fulfills all practical needs, namely the Iterable aspect. We use it as such:

type XString is tagged private
    with Iterable => (First => First,
                      Next => Next,
                      Has_Element => Has_Element,
                      Get => Get);

function First (Self : XString) return Positive is (1);
function Next (Self : XString; Index : Positive) return Positive is (Index + 1);
function Has_Element (Self : XString; Index : Positive) return Boolean is (Index <= Self.Length);
function Get (Self : Xstring; Idx : Positive) return Character;   --   same one we used for indexing

These functions are especially simple here because we know the range of indexes. So basically they are declared as expression functions and made inline. This means that the loop can be very fast in practice. For consistency with the way the "for...of" loop is used with standard Ada containers, we chose to have Get return the character itself, rather than its index.  And now we can use:

S : XString := ...;

for C of S loop
    Put (C);
end loop;

As explained, the for..of loop returns the character itself, which means we can't change the string directly. So we need to provide a second iterator that will return the indexes. We do this via a new small type that takes care of everything, as in:

 type Index_Range is record
    Low, High : Natural;
 end record
 with Iterable => (First       => First,
                   Next        => Next,
                   Has_Element => Has_Element,
                   Element     => Element);

 function First (Self : Index_Range) return Positive is (Self.Low);
 function Next (Self : Index_Range; Index : Positive) return Positive  is (Index + 1);
 function Has_Element (Self : Index_Range; Index : Positive)  return Boolean is (Index <= Self.High);
 function Element
    (Self : Index_Range; Index : Positive) return Positive
    is (Index);

function Iterate (Self : XString) return Index_Range
    is ((Low => 1, High => Self.Length));

S : XString := ...;

for Idx in S.Iterate loop
    S (Idx) := 'A';
end loop;

The type Index_Range is very similar to the previous use of the Iterable aspect. We just need to introduce one extra primitive operation for the string, which is used to initiate the iteration.

In fact, we could make Iterate and Index_Range more complex, so that for instance we can iterate on every other character, or on every 'A' in the string, or any kind of iteration scheme. This would of course require a more complex Index_Range, but opens up nice possibilities.

Slicing

With standard strings, one can get a slice by using notation like:

  declare
      S : String (1 .. 10) := ...;
   begin
      Put (S (2 .. 5));
      S (2 .. 5) := "ABCD";
   end;

Unfortunately, there is no possible equivalent for GNATCOLL strings, because the range notation ("..") cannot be redefined. We have to use code like:

declare
    S : XString := ...;
begin
    Put (S.Slice (2, 5));
    S.Replace (2, 5, "ABCD");
end;

It would be nice if Ada had a new aspect to let us map the ".." notation to a function, for instance something like:

type XString is tagged private
    with  Range_Indexing  => Slice,           --   NOT Ada 2012
          Variable_Range_Indexing => Replace;    --   NOT Ada 2012

function Slice (Self : XString; Low, High : Integer) return XString;
--  Low and High must be of same type, but that type could be anything
--  The return value could also be anything

procedure Replace (S : in out XString; Low, High : Integer; Replace_With : String);
--  Likewise, Low and High must be of same type, but this could be anything.
--  and Replace_With could be any type

Perhaps in some future version of the language?

Conclusion

Ada 2012 provides quite a number of new aspects that can be applied to types and make a user API more convenient to use. Some of them are complex to use, and GNAT sometimes provides a simpler version.

]]>
GNATprove Tips and Tricks: Proving the Ghost Common Divisor (GCD) http://blog.adacore.com/gnatprove-tips-and-tricks-proving-the-ghost-common-denominator-gcd Thu, 06 Apr 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/gnatprove-tips-and-tricks-proving-the-ghost-common-denominator-gcd

Euclid's algorithm for computing the greatest common divisor of two numbers is one of the first ones we learn in school (source: myself), and also one of the first algorithms that humans devised (source: Wikipedia). So it's quite appealing to try to prove it with an automatic proving toolset like SPARK. It turns out that proving it automatically is not so easy, just like understanding why it works is not so easy. To prove it, I am going to use a fair amount of ghost code to convey the necessary information to GNATprove, the SPARK proof tool.

Let's start with the specification of what the algorithm should do: computing the greatest common divisor of two positive numbers. The algorithm can be described for natural numbers or even negative numbers, but we'll keep things simple by using positive inputs. Given that Euclid himself only considered positive numbers, that should be enough for us here. A common divisor C of two integers A and B should divide both of them, which we usually express efficiently in software as (A mod C = 0) and (B mod C = 0). This gives the following contract for GCD:

 function Divides (A, B : in Positive) return Boolean is (B mod A = 0) with Ghost;

   function GCD (A, B : in Positive) return Positive with
     Post => Divides (GCD'Result, A)
       and then Divides (GCD'Result, B);

Note that we're already using some ghost code here: the function Divides is marked as Ghost because it is only used in specifications. The problem with the contract above is that it is satisfied by a very simple implementation that does not actually compute the GCD:

 function GCD (A, B : in Positive) return Positive is
   begin
      return 1;
   end GCD;

And GNATprove manages to prove that easily. So we need to specify the GCD function more precisely, by stating that it returns not only a common divisor of A and B, but the greatest of them:

 function GCD (A, B : in Positive) return Positive with
     Post => Divides (GCD'Result, A)
       and then Divides (GCD'Result, B)
       and then (for all X in GCD'Result + 1 .. Integer'Min (A, B) =>
                   not (Divides (X, A) and Divides (X, B)));

Here, I am relying on the property that the GCD is no greater than either A or B, to only look for a greater divisor between the value returned (GCD'Result) and the minimum of A and B (Integer'Min (A, B)). That contract uniquely defines the GCD, and indeed GNATprove cannot prove that the previous incorrect implementation of GCD satisfies it.

Let's now implement GCD, starting with a very simple linear search that looks for the greatest common divisor of A and B:

  function GCD (A, B : in Positive) return Positive is
      C : Positive := Integer'Min (A, B);
   begin
      while C > 1 loop
         exit when A mod C = 0 and B mod C = 0;
         pragma Loop_Invariant (for all X in C .. Integer'Min (A, B) =>
                                  not (Divides (X, A) and Divides (X, B)));
         C := C - 1;
      end loop;

      return C;
   end GCD;

We start at the minimum of A and B and check whether this value divides both A and B. If so, we're done and we exit the loop. Otherwise we decrease the value by 1 and continue. This is not efficient, but it computes the GCD as desired. In order to prove that implementation, we need a simple loop invariant stating that no common divisor of A and B has been encountered in the loop so far. GNATprove easily proves that this implementation satisfies the contract of GCD (with --level=2 -j0, in less than a second).

Can we increase the efficiency of the algorithm? If we had to do this computation by hand, we would try (Integer'Min (A, B) / 2) rather than (Integer'Min (A, B) - 1) as the second possible value for the GCD, because we know that no value in between has a chance of dividing both A and B. Here is an implementation of that idea:

function GCD (A, B : in Positive) return Positive is
      C : Positive := Integer'Min (A, B);
   begin
      if A mod C = 0 and B mod C = 0 then
         return C;

      else
         C := C / 2;

         while C > 1 loop
            exit when A mod C = 0 and B mod C = 0;
            pragma Loop_Invariant (for all X in C .. Integer'Min (A, B) =>
                                     not (Divides (X, A) and Divides (X, B)));
            C := C - 1;
         end loop;

         return C;
      end if;
   end GCD;

I kept the same loop invariant as before, because we need that loop invariant to prove the postcondition of GCD. But GNATprove does not manage to prove that this invariant holds the first time that it enters the loop:

math_simple_half.ads:9:20: medium: postcondition might fail, cannot prove not (Divides (X, A) and Divides (X, B)

Indeed, we know that the GCD cannot lie between (Integer'Min (A, B) / 2) and Integer'Min (A, B) in that case, but provers don't. We have to provide the necessary information for GNATprove to deduce this fact. This is where we are going to use ghost code more extensively. As we need to prove a quantified property, we are going to establish it progressively, through a loop, accumulating the knowledge gathered in a loop invariant:

      ...
         C := C / 2;
         for J in C + 1 .. Integer'Min (A, B) - 1 loop
            pragma Loop_Invariant (for all X in C + 1 .. J =>
                                     not Divides (X, Integer'Min (A, B)));
         end loop;

         while C > 1 loop
         ...

With the help of this ghost code, the loop invariant in our original loop is now fully proved! But the loop invariant of the ghost loop itself is not proved, for iterations beyond the initial one:

math_simple_half.adb:24:38: medium: loop invariant might fail after first iteration, cannot prove not Divides (X, Integer'min)

The property we need here is that, given a value J between (Integer'Min (A, B) / 2 + 1) and (Integer'Min (A, B) - 1), J cannot divide Integer'Min (A, B). We are going to establish this property in a separate ghost lemma, which we'll call inside the loop. The lemma is an abstraction of the property just stated:

  procedure Lemma_Not_Divisor (Arg1, Arg2 : Positive) with
     Ghost,
     Global => null,
     Pre  => Arg1 in Arg2 / 2 + 1 .. Arg2 - 1,
     Post => not Divides (Arg1, Arg2)
   is
   begin
      null;
   end Lemma_Not_Divisor;

Even better, by isolating the desired property from its context of use, GNATprove can now prove it! Here, it is the underlying prover Z3 which manages to prove the postcondition of Lemma_Not_Divisor. It remains to call this lemma inside the loop:

         ...
         C := C / 2;
         for J in C + 1 .. Integer'Min (A, B) - 1 loop
            Lemma_Not_Divisor (J, Integer'Min (A, B));
            pragma Loop_Invariant (for all X in C + 1 .. J =>
                                     not Divides (X, Integer'Min (A, B)));
         end loop;

         while C > 1 loop
         ...

We're almost done. The only check that GNATprove does not manage to prove is the range check on the decrement to C in our original loop:

math_simple_half.adb:32:20: medium: range check might fail

It looks trivial though. The loop test states that (C > 1), so it should be easy to prove that C can be decremented and remain positive, right? Well, it helps here to understand how proof is carried out on loops: an artificial path is created, starting from the loop invariant in one iteration and ending in the same loop invariant at the next iteration. Because the loop invariant is not the first statement of the loop here, the loop test (C > 1) is not automatically added to the loop invariant (which would be incorrect in general, if the first statement modified C), so provers do not see this hypothesis when trying to prove the range check later. All that is needed here is to explicitly carry the loop test inside the loop invariant. Here is the final implementation of the more efficient GCD, which is proved easily by GNATprove (with --level=2 -j0 in less than two seconds):

 procedure Lemma_Not_Divisor (Arg1, Arg2 : Positive) with
     Ghost,
     Global => null,
     Pre  => Arg1 in Arg2 / 2 + 1 .. Arg2 - 1,
     Post => not Divides (Arg1, Arg2)
   is
   begin
      null;
   end Lemma_Not_Divisor;

   function GCD (A, B : in Positive) return Positive is
      C : Positive := Integer'Min (A, B);
   begin
      if A mod C = 0 and B mod C = 0 then
         return C;

      else
         C := C / 2;
         for J in C + 1 .. Integer'Min (A, B) - 1 loop
            Lemma_Not_Divisor (J, Integer'Min (A, B));
            pragma Loop_Invariant (for all X in C + 1 .. J =>
                                     not Divides (X, Integer'Min (A, B)));
         end loop;

         while C > 1 loop
            exit when A mod C = 0 and B mod C = 0;
            pragma Loop_Invariant (C > 1);
            pragma Loop_Invariant (for all X in C .. Integer'Min (A, B) =>
                                     not (Divides (X, A) and Divides (X, B)));
            C := C - 1;
         end loop;

         return C;
      end if;
   end GCD;

It's time to try to prove Euclid's variant of the algorithm, still with the same contract on GCD:

 function GCD (A, B : in Positive) return Positive is
      An : Positive := A;
      Bn : Natural := B;
      C  : Positive;
   begin
      while Bn /= 0 loop
         C  := An;
         An := Bn;
         Bn := C mod Bn;
      end loop;

      return An;
   end GCD;

The key insight justifying the large jumps, compared to decreasing C by one at a time, is that, at each iteration of the loop, any common divisor of A and B is still a common divisor of An and Bn, and conversely. So the greatest common divisor of A and B turns out to be also the greatest common divisor of An and Bn, which is just An when Bn becomes zero. Here is the loop invariant that expresses that property:

         pragma Loop_Invariant
           (for all X in Positive =>
              (Divides (X, A) and Divides (X, B))
                =
              (Divides (X, An) and (Bn = 0 or else Divides (X, Bn))));

In order to prove this loop invariant, we need to state a ghost lemma like before, that helps prove the property needed at a particular iteration of the loop:

 procedure Lemma_Same_Divisor_Mod (A, B : Positive) with
     Ghost,
     Global => null,
     Post => (for all X in Positive =>
                (Divides (X, A) and Divides (X, B))
                  =
                 (Divides (X, B) and then (Divides (B, A) or else Divides (X, A mod B))));

Because this lemma is more complex, it is not proved automatically like the previous one. Here, we need to implement it with ghost code, extracting three further lemmas that help prove it. Here is the complete code for this implementation:

procedure Lemma_Divisor_Mod (A, B, X : Positive) with
     Ghost,
     Global => null,
     Pre  => Divides (X, A) and then Divides (X, B) and then not Divides (B, A),
     Post => Divides (X, A mod B)
   is
   begin
      null;
   end Lemma_Divisor_Mod;

   procedure Lemma_Divisor_Transitive (A, B, X : Positive) with
     Ghost,
     Global => null,
     Pre  => Divides (B, A) and then Divides (X, B),
     Post => Divides (X, A)
   is
   begin
      null;
   end Lemma_Divisor_Transitive;

   procedure Lemma_Divisor_Mod_Inverse (A, B, X : Positive) with
     Ghost,
     Global => null,
     Pre  => not Divides (B, A) and then Divides (X, A mod B) and then Divides (X, B),
     Post => Divides (X, A)
   is
   begin
      null;
   end Lemma_Divisor_Mod_Inverse;

   procedure Lemma_Same_Divisor_Mod (A, B : Positive) with
     Ghost,
     Global => null,
     Post => (for all X in Positive =>
                (Divides (X, A) and Divides (X, B))
                  =
                (Divides (X, B) and then (Divides (B, A) or else Divides (X, A mod B))))
   is
   begin
      for X in Positive loop
         if Divides (X, A) and then Divides (X, B) and then not Divides (B, A) then
            Lemma_Divisor_Mod (A, B, X);
         end if;
         if Divides (B, A) and then Divides (X, B) then
            Lemma_Divisor_Transitive (A, B, X);
         end if;
         if not Divides (B, A) and then Divides (X, A mod B) and then Divides (X, B) then
            Lemma_Divisor_Mod_Inverse (A, B, X);
         end if;
         pragma Loop_Invariant
           (for all Y in 1 .. X =>
              (Divides (Y, A) and Divides (Y, B))
                =
              (Divides (Y, B) and then (Divides (B, A) or else Divides (Y, A mod B))));
      end loop;
   end Lemma_Same_Divisor_Mod;

   function GCD (A, B : in Positive) return Positive is
      An : Positive := A;
      Bn : Natural := B;
      C  : Positive;
   begin
      while Bn /= 0 loop
         C  := An;
         An := Bn;
         Bn := C mod Bn;
         Lemma_Same_Divisor_Mod (C, An);
         pragma Loop_Invariant
           (for all X in Positive =>
              (Divides (X, A) and Divides (X, B))
                =
              (Divides (X, An) and (Bn = 0 or else Divides (X, Bn))));
      end loop;

      pragma Assert (Divides (An, An));
      pragma Assert (Divides (An, A));
      pragma Assert (Divides (An, B));

      pragma Assert (for all X in An + 1 .. Integer'Min (A, B) =>
                       not (Divides (X, A) and Divides (X, B)));
      return An;
   end GCD;

The implementation of GCD as well as the main lemma Lemma_Same_Divisor_Mod are proved easily by GNATprove (with --level=2 -j0 in less than ten seconds). The three elementary lemmas Lemma_Divisor_Mod, Lemma_Divisor_Transitive and Lemma_Divisor_Mod_Inverse are not proved automatically. For now, I have only reviewed them and convinced myself that they are true. A better solution in the future will be to prove them in Coq, which should not be difficult given that similar lemmas already exist in Coq, and to include them in the SPARK lemma library.

All the code for these versions of GCD can be found in the GitHub repository of SPARK 2014, including another version of GCD where I changed the definition of Divides to reflect more accurately the mathematical notion of divisibility:

   function Divides (A, B : in Positive) return Boolean is
     (for some C in Positive => A * C = B)
   with Ghost;

This is also proved automatically by GNATprove, and this proof also requires the introduction of suitable lemmas that relate the mathematical notion of divisibility to its programming counterpart (B mod A = 0).
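
As a rough illustration, a bridging lemma between the two definitions could be stated along these lines (the name and exact formulation here are illustrative; the actual lemmas are in the repository):

   procedure Lemma_Divides_Equivalence (A, B : Positive) with
     Ghost,
     Global => null,
     Post => Divides (A, B) = (B mod A = 0);
   --  For positive A and B, the existential definition of divisibility
   --  coincides with the modulo-based test used in the code.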

]]>
New strings package in GNATCOLL http://blog.adacore.com/new-strings-package-in-gnatcoll Tue, 04 Apr 2017 13:30:00 +0000 Emmanuel Briot http://blog.adacore.com/new-strings-package-in-gnatcoll

New strings package in GNATCOLL

GNATCOLL has recently acquired two new packages, namely GNATCOLL.Strings and GNATCOLL.Strings_Impl. The latter is a generic package, one instance of which is provided as GNATCOLL.Strings.

But why a new strings package? Ada already has quite a lot of ways to represent and manipulate strings, is a new one needed?

This new package is an attempt at finding a middle ground between the standard String type (which is efficient but inflexible) and unbounded strings (which are flexible, but could be more efficient).

GNATCOLL.Strings therefore provides strings (named XString, as in extended-strings) that can grow as needed (up to Natural'Last, like standard strings), yet are faster than unbounded strings. They also come with an extended API, which includes all primitive operations from unbounded strings, in addition to some subprograms inspired from GNATCOLL.Utils and the Python and C++ programming languages.

Small string optimization

GNATCOLL.Strings uses a number of tricks to improve efficiency. The most important one is to limit the number of memory allocations. For this, we use a trick similar to what all C++ implementations do nowadays, namely the small string optimization.

The idea is that when a string is short, we can avoid all memory allocations altogether, while still keeping the string type itself small. We therefore use an Unchecked_Union, where a string can be viewed in two ways:

    Small string

      [f][s][ characters of the string 23 bytes               ]
         f  = 1 bit for a flag, set to 0 for a small string
         s  = 7 bits for the size of the string (i.e. number of significant
              characters in the array)

    Big string

      [f][c      ][size     ][data      ][first     ][pad    ]
         f = 1 bit for a flag, set to 1 for a big string
         c = 31 bits for half the capacity. This is the size of the buffer
             pointed to by data, and which contains the actual characters of
             the string.
         size = 32 bits for the size of the string, i.e. the number of
             significant characters in the buffer.
         data = a pointer (32 or 64 bits depending on architecture)
         first = 32 bits, see the handling of substrings below
         pad = 32 bits on a 64 bits system, 0 otherwise.
             This is because of alignment issues.

So in the same amount of memory (24 bytes), we can either store a small string of 23 characters or less with no memory allocations, or a big string that requires allocation. In a typical application, most strings are smaller than 23 bytes, so we are saving very significant time here.

This representation has to work on both 32-bit and 64-bit systems, so we have careful representation clauses to take this into account. It also needs to work on both big-endian and little-endian systems. Thanks to Ada's representation clauses, this is in fact relatively easy to achieve (well, okay, after trying a few different approaches to emulate what's done in C++, which did not work elegantly). In fact, emulating the layout via bit-shift operations ended up with code that was less efficient than letting the compiler do it automatically from our representation clauses.
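
To give an idea of the shape of the type, here is a simplified sketch of the two views (field names are illustrative and all representation clauses are omitted; see the GNATCOLL sources for the real declarations):

   type Small_String is record
      Is_Big : Boolean;                    --  flag, False for a small string
      Size   : Natural range 0 .. 127;     --  number of significant characters
      Data   : String (1 .. 23);
   end record;

   type Char_Buffer_Access is access String;

   type Big_String is record
      Is_Big        : Boolean;             --  flag, True for a big string
      Half_Capacity : Natural;             --  half the allocated capacity
      Size          : Natural;             --  number of significant characters
      Data          : Char_Buffer_Access;  --  heap-allocated buffer
      First         : Positive;            --  index of the first character (substrings)
   end record;

   type XString_Data (Is_Big : Boolean := False) is record
      case Is_Big is
         when False => Small : Small_String;
         when True  => Big   : Big_String;
      end case;
   end record
     with Unchecked_Union;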

Character types

Applications should be able to handle the whole set of Unicode characters. In Ada, these are represented with the Wide_Character type, rather than Character, and stored on 2 bytes rather than 1. Of course, for a lot of applications it would be a waste of memory to always store 2 bytes per character, so we want to give users flexibility here.

So the package GNATCOLL.Strings_Impl is a generic. It has several formal parameters, among which:

   * Character_Type is the type used to represent each character. Typically,  it will be Character, Wide_Character, or even possibly Wide_Wide_Character. It could really be any scalar type, so for instance we could use this package to represent DNA with its 4-valued nucleobases.

   * Character_String is an array of these characters, as would be represented in Ada. It will typically be a String or a Wide_String. This type is used to make this package work with the rest of the Ada world.

Note about Unicode: we could also always use Character and use UTF-8 encoding internally. But this makes all operations (from taking the length to moving to the next character) slower and more fragile: we must make sure not to cut a string in the middle of a multi-byte sequence. Instead, we manipulate a string of code points (in Unicode terms). A similar choice is made in Ada (String vs Wide_String), Python and C++.

Configuring the size of small strings

The above is what most C++ implementations do nowadays. The maximum of 23 characters we mentioned for a small string in fact depends on several criteria, which impact the actual maximum size of a small string:

   * on a 32-bit system, the size of the big string is 16 bytes, so the maximum size of a small string is 15 bytes.
   * on a 64-bit system, the size of the big string is 24 bytes, so the maximum size of a small string is 23 bytes.
   * If Character is used as the character type, the above are the actual maximum number of characters in the string. But if you are using Wide_Character, each character takes 2 bytes, so a small string is at most either 7 or 11 characters long.

This is often a reasonable number, and given that applications mostly use small strings, we are already saving a lot of allocations. However, in some cases we know that the typical length of strings in a particular context is different. For instance, GNATCOLL.Traces builds messages to output in the log file. Such messages will typically be at most 100 characters, although they can of course be much larger sometimes.

We have added one more formal parameter to GNATCOLL.Strings_Impl to control the maximum size of small strings. If for instance we decide that a "small" string is anywhere from 1 to 100 characters long (i.e. we do not want to allocate memory for those strings), it can be done via this parameter. Of course, in such cases the size of the string itself becomes much larger.

In this example, the string object itself would be 101 bytes long, rather than 24 bytes. Although we are saving on memory allocations, we are also spending more time copying data when the string is passed around, so you'll need to measure the performance here.

The maximum size for a small string is 127 bytes, however, because this size and the 1-bit flag need to fit in 1 byte in the representation clauses we showed above. We tried to make this more configurable, but that makes things significantly more complex between little-endian and big-endian systems, and having large "small" strings would not make much sense in terms of performance anyway.

Typical C++ implementations do not make this small size configurable.

Task safety

Just like unbounded strings, the strings in this package are not thread safe. This means that you cannot access the same string (read or write) from two different threads without somehow protecting the access via a protected type, locks,...

In practice, sharing strings between threads is rarely done, so if the package did its own locking we would end up with very bad performance in all cases, for the benefit of only a few cases where it might prove useful.

As we'll discuss below, it is possible to use two different strings that actually share the same internal buffer, from two different threads. Since this is an implementation detail, this package takes care of guaranteeing the integrity of the shared data in such a case.

Copy on write

There is one more formal parameter, to configure whether this package should use copy-on-write or not. When copy on write is enabled, you can have multiple strings that internally share the same buffer of characters. This means that assigning a string to another one becomes a reasonably fast operation (copy a pointer and increment a refcount). Whenever the string is modified, a copy of the buffer is done so that other copies of the same string are not impacted.

But in fact, there is one drawback with this scheme: we need reference counting to know when we can free the shared data, or when we need to make a copy of it. This reference counting must be thread safe, since users might be using two different strings from two different threads, but they share data internally.

Thus the reference counting is done via atomic operations, which have some impact on performance. Since multiple threads try to access the same memory addresses, this is also a source of contention in multi-threaded applications.

For this reason, the current C++ standard prevents the use of copy-on-write for strings.

In our case, we chose to make this configurable in the generic, so that users can decide whether to pay the cost of the atomic operations and save on the number of memory allocations and copies of the characters. Sometimes it is better to share the data, and sometimes it is better to systematically copy it. Again, actual performance measurements are needed for your specific application.
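
For illustration, an instantiation tuned for, say, log messages could look like the following sketch. It follows the description above, but the formal parameter names for the small-string size and the copy-on-write switch are hypothetical, and the exact instantiation form may differ; check the GNATCOLL.Strings_Impl specification for the actual names and defaults:

   with GNATCOLL.Strings_Impl;

   package Log_Strings is new GNATCOLL.Strings_Impl
     (Character_Type   => Character,
      Character_String => String,
      Small_String_Max => 100,     --  hypothetical formal: no allocation up to 100 characters
      Copy_On_Write    => False);  --  hypothetical formal: always copy on assignment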

Growth strategy

When the current size of the string becomes bigger than the available allocated memory (for instance because you are appending characters), this package needs to reallocate memory. There are plenty of strategies here, from allocating only the exact amount of memory needed (which saves on memory usage, but is very bad in terms of performance), to doubling the current size of the string until we have enough space, as currently done in the GNAT unbounded strings implementation.

The latter approach would therefore allocate space for two characters, then for 4, then 8 and so on.

This package has a slightly different strategy. Remember that we only start allocating memory past the size of small strings, so we will for instance first allocate 24 bytes. When more memory is needed, we multiply this size by 1.5, which some researchers have found to be a good compromise between wasted memory and number of allocations. For very large strings, we always allocate multiples of the memory page size (4096 bytes), since this is what the system will make available anyway. So we will basically allocate the following: 24, 36, 54, 82, 122,...

An additional constraint is that we only ever allocate an even number of bytes. This is called the capacity of the string. In the layout of the big string, as shown above, we store half that capacity, which saves one bit that we use for the flag.
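
The strategy can be summarized with a small function like the following (illustrative only, not the actual GNATCOLL code; overflow handling and the exact rounding are simplified):

   Page_Size : constant := 4096;

   function Next_Capacity (Current, Min_Needed : Natural) return Natural is
      Result : Natural := Natural'Max (Current, 24);
   begin
      while Result < Min_Needed loop
         Result := Result * 3 / 2;              --  grow by a factor of 1.5
         Result := Result + (Result mod 2);     --  keep the capacity even
      end loop;

      if Result >= Page_Size then
         --  Very large strings: round up to a multiple of the page size
         Result := ((Result + Page_Size - 1) / Page_Size) * Page_Size;
      end if;

      return Result;
   end Next_Capacity;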

Growing memory

This package does not use the Ada new operator. Instead, we use functions from System.Memory directly, in part so that we can use the realloc system call. This is much more efficient when we need to grow the internal buffer, since in most cases we won't have to copy the characters at all. Saving on those additional copies has a significant impact, as we'll see in the performance measurements below.
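
For illustration, growing a raw buffer through System.Memory looks roughly like this (a sketch only; the real package manages capacity, copy-on-write and bounds on top of this):

   with System;        use System;
   with System.Memory; use System.Memory;

   procedure Grow_Buffer (Buffer : in out System.Address; New_Capacity : size_t) is
   begin
      if Buffer = Null_Address then
         Buffer := Alloc (New_Capacity);
      else
         --  Realloc keeps the existing characters and avoids a copy whenever
         --  the block can be extended in place
         Buffer := Realloc (Buffer, New_Capacity);
      end if;
   end Grow_Buffer;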

Substrings

One other optimization performed by this package (which is not done for unbounded strings or various C++ implementations) is to optimize substrings when also using copy-on-write.

We simply store the index, within the shared buffer, of the first character of the string, instead of always starting at one.

From the user's point of view, this is an implementation detail. Strings are always indexed from 1, and internally we convert to an actual position in the buffer. This means that if we need to reallocate the buffer, for instance when the string is modified, we transparently change the index of the first character, but the indexes the user was using are still valid.

This results in very significant savings, as shown below in the timings for Trim for instance. Also, we can do an operation like splitting a string very efficiently.

For instance, the following code doesn't allocate any memory, besides setting the initial value of the string. It parses a file containing some "key=value" lines, with optional spaces, and possibly empty lines:



     declare
        S, Key, Value : XString;
        L             : XString_Array (1 .. 2);
        Last          : Natural;
     begin
        S.Set (".......");

        --  Get each line
        for Line in S.Split (ASCII.LF) loop

           --  Split into at most two substrings
           Line.Split ('=', Into => L, Last => Last);

           if Last = 2 then
              Key := L (1);
              Key.Trim;    --  Removing leading and trailing spaces

              Value := L (2);
              Value.Trim;

           end if;
        end loop;
     end;

Conclusion

We use various tricks to improve the performance over unbounded strings.  From not allocating any memory when a string is 24 bytes (or a configurable limit) or less, to growing the string as needed via a careful growth strategy, to sharing internal data until we need to modify it, these various techniques combine to provide a fast and flexible implementation.

Here are some timing experiments done on a laptop (multiple operating systems lead to similar results). The exact timings are irrelevant, so the results are given as a percentage of the time the unbounded string takes to perform similar operations.

We configured the GNATCOLL.Strings package in different ways, either with or without copy-on-write, and with a small string size from the default 0..23 bytes or a larger 0..127 bytes.

    Setting a small string multiple times (e.g. Set_Unbounded_String)

        unbounded_string                 = 100 %
        xstring-23 without copy on write =  11 %
        xstring-23 with copy on write    =  12 %
        xstring-127 with copy on write   =  17 %

        Here we see that not doing any memory allocation makes XString
        much faster than unbounded string. Most of the time is spent
        copying characters around, via memcpy.

    Setting a large string multiple times (e.g. Set_Unbounded_String)

        unbounded_string                 = 100 %
        xstring-23 without copy on write =  41 %
        xstring-23 with copy on write    =  50 %
        xstring-127 with copy on write   =  32 %

        Here, XString apparently proves better at reusing already
        allocated memory, although the timings are similar when creating
        new strings instead. Most of the difference is probably related to the
        use of realloc instead of alloc.

    Assigning small strings (e.g.   S2 := S1)

        unbounded_string                 = 100 %
        xstring-23 without copy on write =  31 %
        xstring-23 with copy on write    =  27 %
        xstring-127 with copy on write   =  57 %

    Assigning large strings (e.g.   S2 := S1)

        unbounded_string                 = 100 %
        xstring-23 without copy on write = 299 %
        xstring-23 with copy on write    =  63 %
        xstring-127 with copy on write   =  60 %

        When not using copy-on-write (which unbounded strings do), we need
        to reallocate memory, which shows on the second line.

    Appending to large string  (e.g.    Append (S, "...."))

        unbounded_string                 = 100 %
        xstring-23 without copy on write =  39 %
        xstring-23 with copy on write    =  48 %
            same, with tasking           = 142 %
        xstring-127 with copy on write   =  49 %

        When we use tasking, XStrings use atomic operations for the reference
        counter, which slows things down. They become slower than unbounded
        strings, because the latter in fact have a bug when two different
        strings that share data are used from two different threads (they
        try to save on an atomic operation; this bug is being worked on).

    Removing leading and trailing spaces (e.g.   Trim (S, Ada.Strings.Both))

        unbounded_string                 = 100 %
        xstring-23 without copy on write =  50 %
        xstring-23 with copy on write    =  16 %
        xstring-127 with copy on write   =  18 %

        Here we see the benefits of the substrings optimization, which shares
        data for the substrings.
]]>
Simics helps run 60 000 GNAT Pro tests in 24 hours http://blog.adacore.com/simics-helps-run-60-000-gnat-pro-tests-in-24-hours Fri, 31 Mar 2017 14:06:00 +0000 Jerome Guitton http://blog.adacore.com/simics-helps-run-60-000-gnat-pro-tests-in-24-hours

This post has been updated in March 2017 and was originally posted in March 2016.

A key aspect of AdaCore’s GNAT Pro offering is the quality of the product we’re delivering and our proactive approach to resolving issues when they appear. To do so, we need both intensive testing before delivering anything to our customers and to produce “wavefront” versions every day for each product we offer. Doing so each and every day is a real challenge, considering the number of supported configurations, the number of tests to run, and the limit of a 24-hour timeframe. At AdaCore, we rely heavily on virtualization as part of our testing strategy. In this article, we will describe the extent of our GNAT Pro testing on VxWorks, and how Simics helped us meet these challenges.

Broad Support for VxWorks Products

The number of tests to run is proportional to the number of configurations that we support. We have an impressive matrix of configurations to validate:

  • Versions: VxWorks, VxWorks Cert, and VxWorks 653… available on the full range of versions that we support (e.g. 5.5, 6.4 to 6.9, 653 2.1 to 2.5...)
  • CPUs: arm, ppc, e500v2, and x86...
  • Program type: Real Time Process, Downloadable Kernel Module, Static Kernel Module, vThreads, ARINC 653 processes, and with the Cert subsets…
  • Ada configuration variants: zero cost exceptions versus setjmp/longjmp exceptions and Ravenscar tasking profile versus full Ada tasking...

Naturally, there are some combinations in this matrix of possibilities that are not supported by GNAT, but the reality is that we cover most of it. So the variety of configurations is very high.

The matrix is growing fast. Between 2013 and 2015, we widened our offer to support the new VxWorks ports (VxWorks 7, VxWorks 653 3.0.x) on a large range of CPUs (arm, e500v2, ppc...), including new configurations that were needed by GNAT Pro users (x86_64). This represents 32 new supported configurations added to the 168 existing ones. It was obviously a challenge to qualify all these new versions against our existing test suites, with the goal of supporting the new VxWorks versions as soon as possible.

Simics supports a wide range of VxWorks configurations, and has a good integration with the core OS itself. This made Simics a natural solution for our testing strategy of GNAT Pro. On VxWorks 7 and 653 3.0.x, it allowed us to quickly set up an appropriate testing environment, as predefined material exists to make it work smoothly with most Wind River OSes; we could focus on our own technology right away, instead of spending time on developing, stabilizing and maintaining a new testing infrastructure from scratch.

Representative Testing on Virtual Hardware

Another benefit of Simics is that it not only supports all versions of VxWorks, but also emulates a wide range of hardware platforms. This allows us to test on platforms which are representative of the hardware that will be used in production by GNAT Pro users.

A stable framework for QA led to productivity increases

Stability is an important property of the testing framework; otherwise, “glitches” caused by the testing framework start causing spurious failures, often randomly, that the QA team then needs to analyze each time. Multiply that by the very large number of tests we run each day and the large number of platforms we test on, and a lack of stability can quickly lead to an unmanageable situation. Trust in the test framework is a key factor for efficiency.

In that respect, and despite the heavy parallelism required in order to complete the validation of all our products in time, Simics proved to be a robust solution and behaved well under pressure, thus helping us focus our energy on differences caused by our tools, rather than on spurious differences caused by the testing framework.

Test execution speed allowed extensive testing

Here is some additional information about how broad our testing is. On VxWorks, we mostly have three sets of testsuites:

  • The standard Ada validation testsuite (ACATS): around 3,600 tests;
  • The regression testing of GNAT: around 15,000 tests;
  • Tool-specific testsuites: around 1,800 tests.

All in all, just counting the VxWorks platforms, we are running around 350,000 tests each day. 60,000 of them are run on Simics, mostly on the more recent ports (VxWorks 7 and 653 3.x).

In order to run all these tests, an efficient testing platform is needed. With Simics, we were able to optimize the execution by:

  • Proper tuning of the simulation target;
  • Stopping and restarting the platform at an execution point where tests can be run right away, using checkpoints;
  • Developing additional plugins to have efficient access to the host filesystem from the simulator.

We will give more technical details about these optimizations in a future article.

March 2017 Update

AdaCore’s use of Simics has not substantially changed since last year: it is still a key component of GNAT Pro’s testing strategy on VxWorks platforms. In 2017, GNAT Pro has been ported to PowerPC 64 VxWorks 7; with its broad support for VxWorks products, Simics has been a natural solution to speed up this port.

April 2018 Update

This year we smoothly switched to Simics 5 on all supported configurations; Simics has been of great help to speed up the new port of GNAT Pro to AArch64 VxWorks 7.


]]>
Two Projects to Compute Stats on Analysis Results http://blog.adacore.com/two-projects-to-compute-stats-on-analysis-results Thu, 30 Mar 2017 04:00:00 +0000 Yannick Moy http://blog.adacore.com/two-projects-to-compute-stats-on-analysis-results

The project by Daniel King allows you to extract the results from the log file gnatprove.out generated by GNATprove, into an Excel spreadsheet. What's nice is that you can then easily sort units according to the metrics you are following inside your spreadsheet viewer.

The project by Martin Becker allows you to extract the results from the JSON files generated by GNATprove for each unit analyzed, aggregate these results, and output them into either textual or JSON format. What's nice is that you can then integrate the generated JSON with your automated scripts/setup.

I have installed both on my laptop, and used them on examples such as Tokeneer, and I am happy to report that the results are consistent! :-) I don't know yet which one will become more useful with my setup, but both are worth a try.

]]>
GNATcoverage moves to GitHub http://blog.adacore.com/gnatcoverage-moves-to-github Wed, 29 Mar 2017 13:13:51 +0000 Pierre-Marie de Rodat http://blog.adacore.com/gnatcoverage-moves-to-github

Following the current trend, the GNATcoverage project moves to GitHub! Our new address is: https://github.com/AdaCore/gnatcoverage

GNATcoverage is a tool we developed to analyze and report program coverage. It supports the Ada and C programming languages, several native and embedded platforms, as well as various coverage criteria, from object code level instruction and branch coverage up to source level decision or MC/DC coverage, qualified for use in avionics certification contexts. For source-level analysis, GNATcoverage works hand-in-hand with the GNAT Pro compilers.

Originally developed as part of the Couverture research project, GNATcoverage became a supported product for a first set of targets in the 2009/2010 timeframe.

Since the beginning of the project, the development happened on the OpenDO forge. This has served us well, but we are now in the process of moving all our projects to GitHub. What does this change for you?

  • We will now use GitHub issues and pull requests for discussions that previously happened on mailing lists. We hope these places will be more visible and community-friendly.

In the near future, we’ll close the project on the OpenDO forge. We are keen to see you on GitHub!

]]>
Writing on Air http://blog.adacore.com/writing-on-air Mon, 27 Mar 2017 13:00:00 +0000 Jorge Real http://blog.adacore.com/writing-on-air

While searching for motivating projects for students of the Real-Time Systems course here at Universitat Politècnica de València, we found a curious device that produces a fascinating effect. It holds a 12 cm bar from its bottom and makes it swing, like an upside-down pendulum, at a frequency of nearly 9 Hz. The free end of the bar holds a row of eight LEDs. With careful and timely switching of those LEDs, and due to visual persistence, it creates the illusion of text... floating in the air!

The web shows plenty of references to different realizations of this idea. They are typically used for displaying date, time, and also rudimentary graphics. Try searching for "pendulum clock LED", for example. The picture in Figure 1 shows the one we are using.

Figure 1. The pendulum, speaking about itself

The software behind this toy is a motivating case for the students, and it contains enough real-time and synchronisation requirements to also make it challenging. 

We have equipped the lab with a set of these pendulums, from which we have disabled all the control electronics and replaced them with STM32F4 Discovery boards. We also use a self-made interface board (behind the Discovery board in Figure 1) to connect the Discovery with the LEDs and other relevant signals of the pendulum. The task we propose our students is to make it work under the control of a Ravenscar program running on the Discovery. We use GNAT GPL 2016 for ARM hosted on Linux, along with the Ada Drivers Library.

There are two different problems to solve: one is to make the pendulum bar oscillate with a regular period; the other one is to then use the LEDs to display some text.

Swing that bar!

The bar is fixed from the bottom to a flexible metal plate (see Figure 2). The stable position of the pendulum is vertical and still. There is a permanent magnet attached to the pendulum, so that the solenoid behind it can be energised to cause a repulsion force that makes the bar start to swing.

Figure 2. Pendulum mechanics
Figure 3. Detail of barrier pass detector

At startup, the solenoid control is completely blind to the whereabouts of the pendulum. An initial sequence must be programmed with the purpose of moving the bar enough to make it cross the barrier (see detail in Figure 3), a pass detector that uses an opto-coupler sensor located slightly off-center with respect to the pendulum run. This asymmetry is crucial, as we'll soon see.

Once the bar crosses the barrier at least three times, we have an idea of the pendulum's position over time and we can then apply a more precise control sequence to keep the pendulum swinging regularly. The situation is pretty much like pushing a kid's swing: you need to give it a small, regular push, at the right time. In our case, that time is when the pendulum enters the solenoid area on its way to the right side, since the solenoid repels the pendulum rightwards. That happens at about one sixth of the pendulum cycle, so we first need to know when the cycle starts and what duration it has. And for that, we need to pay close attention to the only input of the pendulum: the barrier signal.

Figure 4 sketches a chronogram of the barrier signal. Due to its asymmetric placement, the signal captured from the opto-coupler is also asymmetric.

Figure 4. Chronogram of the barrier signal and correspondence with extreme pendulum positions

To determine the start time and period of the next cycle, we take note of the times when rising and falling edges of the barrier signal occur. This is easy work for a small Finite State Machine (FSM), triggered by barrier interrupts to the Discovery board. Once we have collected the five edge times T1 to T5 (which normally correspond to two full barrier crossings plus the start of a third one), we can calculate the period as T5 - T1. Regarding the start time of the next cycle, we know the pendulum initiated a new cycle (reached its left-most position) just in between the two closest pulses (times T1 and T4). So, based on the information gathered, we estimate that the next cycle will start at time T5 + (T4 - T1) / 2.
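
In Ada.Real_Time terms, this boils down to a couple of operations on Time and Time_Span values (a sketch; T1 to T5 are the edge times captured by the FSM, and the names here are illustrative):

   Period     : constant Time_Span := T5 - T1;
   Next_Start : constant Time      := T5 + (T4 - T1) / 2;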

But… all we know when we detect a barrier edge is whether it is rising or falling. So, when we detect the first rising edge of Barrier, we can't be sure whether it corresponds to T1 (the second barrier crossing) or T3 (the first). We have arbitrarily guessed it is T1, so we must verify this guess and fix things if it was incorrect. This check is possible precisely due to the asymmetric placement of the pass detector: if our guess was correct, then T3 - T1 should be less than T5 - T3. Otherwise we need to re-assign our measurements (T3, T4 and T5 become T1, T2 and T3) and then move on to the adequate FSM state (waiting for T4).

Once we know when the pendulum will be in the left-most position (the cycle start time) and the estimated duration of the next cycle, we can give a solenoid pulse at the cycle start time plus one sixth of the period. The pulse duration, within a reasonable range, mostly affects the amplitude of the pendulum run, but not so much its period. Experiments with pulse durations between 15 and 38 milliseconds showed visible changes in amplitude, but period variations of only about 100 microseconds, for a period of 115 milliseconds (less than 0.1%). We found 18-20 ms to work well.

So, are we done with the pendulum control? Well... almost, but no, we’re not: we are also concerned by robustness. The software must be prepared for unexpected situations, such as someone or something suddenly stopping the bar. If our program ultimately relies on barrier interrupts and they do not occur, then it is bound to hang. A timeout timing event is an ideal mechanism to revive a dying pendulum. If the timeout expires, then the barrier-based control is abandoned and the initialisation phase engaged again, and again if needed, until the pendulum makes sufficient barrier crossings to let the program retake normal operation. After adding this recovery mechanism, we can say we are done with the pendulum control: the bar will keep on swinging while powered.
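
A minimal sketch of such a watchdog using Ada's timing events could look as follows (the names, the timeout value and Restart_Blind_Sequence are illustrative, not the actual lab code):

   --  Context clause: with Ada.Real_Time.Timing_Events; use Ada.Real_Time.Timing_Events;

   Timeout  : constant Time_Span := Milliseconds (500);   --  illustrative value
   Watchdog : Timing_Event;

   protected Reviver is
      procedure On_Timeout (Event : in out Timing_Event);
   end Reviver;

   protected body Reviver is
      procedure On_Timeout (Event : in out Timing_Event) is
      begin
         Restart_Blind_Sequence;   --  hypothetical: re-engage the blind startup sequence
         Set_Handler (Event, Clock + Timeout, On_Timeout'Access);
      end On_Timeout;
   end Reviver;

   --  The barrier interrupt handler re-arms the watchdog each time an edge
   --  is seen, for instance with:
   --     Set_Handler (Watchdog, Clock + Timeout, Reviver.On_Timeout'Access);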

Adding lyrics to that swing

Once the pendulum is moving at a stable rate, we are ready to tackle the second part of the project: using the eight LEDs to display some text. Knowing the cycle start time and estimated period duration, one can devise a plan to display each line of a character at the proper times within the period. We have already calculated the next cycle start time and duration for the pendulum control. All we need to do now is to provide that information to a displaying task in a timely manner.

Figure 5. Time to display an exclamation mark!

The pendulum control functions described above are implemented by a package with the following (abbreviated) specification:

        with STM32F4;       use STM32F4;
        with Ada.Real_Time; use Ada.Real_Time;

        package Pendulum_IO is

           --  Set LEDs using byte pattern (1 => On, 0 => Off)
           procedure Set_LEDs (Pattern : in Byte);  

           --  Synchronization point with start of new cycle
           procedure Wait_For_Next_Cycle (Init_Time      : out Time; 
                                          Cycle_Duration : out Time_Span);

        private
              task P_Controller with Storage_Size => 4 * 1024;
        end Pendulum_IO;

The specification includes subprograms for setting the LEDs (only one variant shown here) and procedure Wait_For_Next_Cycle, which in turn calls a protected entry whose barrier (in the Ada sense, this time) is opened by the barrier signal interrupt handler, when the next cycle timing is known. This happens at time T5 (see Figure 4), when the current cycle is about to end but with sufficient time before the calling task must start switching LEDs. The P_Controller task in the private part is the one in charge of keeping the pendulum oscillating.

Upon completion of a call to Wait_For_Next_Cycle, the caller knows the start time and period of the next pendulum cycle (parameters Init_Time and Cycle_Duration). By division of the period, we can also determine at what precise times we need to switch the LEDs. Each character is encoded using an 8-tall by 5-wide dot matrix, and we want to fit 14 characters in the display. Adding some left and right margins to avoid the slowest segments, and a blank space to the right of each character, we subdivide the period into 208 lines. These lines represent time windows in which to display each particular character chunk. Since the pendulum period is around 115 milliseconds, it takes just some 550 microseconds for the pendulum to traverse one line.
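
As a rough sketch, and ignoring the direction-dependent adjustment discussed just below, the displaying task could be organized as follows (Lines_Per_Cycle, LED_On_Time, Displayer and Pattern_For are illustrative names, not the actual lab code):

   Lines_Per_Cycle : constant := 208;
   LED_On_Time     : constant Time_Span := Microseconds (150);

   task Displayer;

   task body Displayer is
      Start  : Time;
      Period : Time_Span;
      Line_T : Time;
   begin
      loop
         Pendulum_IO.Wait_For_Next_Cycle (Start, Period);
         for Line in 1 .. Lines_Per_Cycle loop
            Line_T := Start + (Period / Lines_Per_Cycle) * (Line - 1);
            delay until Line_T;
            Pendulum_IO.Set_LEDs (Pattern_For (Line));  --  Pattern_For is hypothetical
            delay until Line_T + LED_On_Time;
            Pendulum_IO.Set_LEDs (0);                   --  all LEDs off
         end loop;
      end loop;
   end Displayer;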

If that seems tight, there is an even tighter requirement than this inter-line delay. The LEDs must be switched on only during an interval between 100 and 200 microseconds. Otherwise we would see segments, rather than dots, due to the pendulum speed. This must also be taken into account when designing the plan for the period, because the strategy changes slightly depending on the current pendulum direction. When it moves from left to right, the first 100 microseconds of a line correspond to its left part, whereas the opposite is true for the other direction.


Dancing with the runtime

Apart from careful planning of the sequence to switch the LEDs, this part is possibly less complex, thanks to the help of Wait_For_Next_Cycle. However, the short delays imposed by the pendulum put us in front of a limitation of the runtime support. The first try at displaying some text was far from satisfactory. Oftentimes, dots became segments. Visual glitches happened all the time as well. Tracking this issue down, we ended up digging into the Ravenscar runtime (the full version included in GNAT GPL 2016) to eventually find that the programmed precision for timing events and delay statements was set to one millisecond. This setting may be fine for less demanding applications, and it causes a relatively low runtime overhead; but it was making it impossible for us to operate within the pendulum’s tight delays. Things started to go well after we modified and recompiled the runtime sources to make delays and timing events accurate to 10 microseconds. It was just a constant declaration, but it was not trivial to find it! Definitely, this is not a problem we ask our students to solve: they use the modified runtime.

If you come across the same issue and the default accuracy of 1 millisecond is insufficient for your application, look for the declaration of constant Tick_Period in the body of package System.BB.Board_Support (file s-bbbosu.adb in the gnarl-common part of either the full or the small footprint versions of the runtime). For an accuracy of 10 microseconds, we set the constant to Clock_Frequency / 100_000.
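
For illustration only, the change amounts to editing that single declaration, along these lines (the exact form of the declaration varies between runtime versions, so check your runtime sources):

   --  In s-bbbosu.adb (sketch): one tick every 10 microseconds instead of
   --  the default 1 millisecond
   Tick_Period : constant := Clock_Frequency / 100_000;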

More fun

There are many other things that can be done with the pendulum, such as scrolling a text longer than the display width, or varying the scrolling speed by pressing the user button in the Discovery board (both features are illustrated in the video below, best viewed in HD); or varying the intensity of the text by changing the LEDs flashing time; or displaying graphics rather than just text... 

One of the uses we have given the pendulum is as a chronometer display for times such as the pendulum period, the solenoid pulse width, or other internal program delays. This use has proved very helpful to better understand the process at hand and also to diagnose the runtime delay accuracy issue. 

The pendulum can also be used as a rudimentary oscilloscope. Figure 6 shows the pendulum drawing the chronograms of the barrier signal and the solenoid pulse. The top two lines represent these signals, respectively, as the pendulum moves rightwards. The two bottom lines are for the leftwards semi-period and must be read leftwards. In Figure 7, the two semi-periods are chronologically re-arranged. The result cannot be read as in a real oscilloscope, because of the varying pendulum speed; but knowing that, it is indicative.

Figure 6. Pendulum used as an oscilloscope (original image)
Figure 7. Oscilloscope image, chronologically re-arranged

Want to see it?

I plan to take a pendulum with me to the Ada-Europe 2017 Conference in Vienna. It will be on display during the central days of the conference (13, 14 and 15 June) and I'll be around for questions and suggestions.

Credit, where it's due

My colleague Ismael Ripoll was the one who called my attention to the pendulum, back in 2005. We implemented the text part only (we did not disconnect the solenoid from the original pendulum's microcontroller). Until porting (and extending) this project to the Discovery board, we had been using an industrial PC with a digital I/O card to display text on the pendulum. The current setup is about two orders of magnitude cheaper. And it also fits much better with the subject's new focus on real-time and embedded systems.

I'm thankful to Vicente Lliso, technician at the DISCA department of UPV, for the design and implementation of the adapter card connecting the Discovery board with the pendulum, for his valuable comments and for patiently attending my requests for small changes here and there.

My friend and amateur photographer Álvaro Doménech produced excellent photographical material to decorate this entry, as well as the pendulum video. Álvaro is however not to be blamed for the oscilloscope pictures, which I took with just a mobile phone camera.

And many thanks to Pat Rogers, from AdaCore, who helped me with the delay accuracy issue and invited me to write this entry. It was one of Pat's excellent tutorials at the Ada-Europe conference that pushed me into giving a new angle to this pendulum project... and to others in the near future!

]]>
SPARK Tetris on the Arduboy http://blog.adacore.com/spark-tetris-on-the-arduboy Mon, 20 Mar 2017 13:00:00 +0000 Fabien Chouteau http://blog.adacore.com/spark-tetris-on-the-arduboy

One of us got hooked on the promise of a credit-card-size programmable pocket game under the name of Arduboy and participated in its Kickstarter in 2015. The Kickstarter was successful (but late) and delivered the expected working board in mid 2016. Of course, the idea from the start was to program it in Ada, but this is an 8-bit AVR microcontroller (the ATmega32u4 by Atmel) that is not supported anymore by GNAT Pro. One solution would have been to rebuild our own GNAT compiler for 8-bit AVR from the GNAT FSF repository and use the AVR-Ada project. Another solution, which we explore in this blog post, is to use the SPARK-to-C compiler that we developed at AdaCore to turn our Ada code into C and then use the Arduino toolchain to compile for the Arduboy board.

This is in fact a solution we are now proposing to those who need to compile their code for a target where we do not offer an Ada compiler, in particular small microcontrollers used in industrial automation and the automotive industry. Thanks to SPARK-to-C, you can now develop your code in SPARK, compile it to C, and finally compile the generated C code to your target. We have built the universal SPARK compiler! This product will be available to AdaCore customers in the coming months.

We started from the version of Tetris in SPARK that we had already ported to the Atmel SAM4S, the Pebble Time smartwatch and the Unity game engine. For the details of what is proved in Tetris, see the recording of a talk at the FOSDEM 2017 conference. The goal was to make this program run on the Arduboy.

SPARK-to-C Compiler

What we call the SPARK-to-C compiler in fact accepts both less and more than the SPARK language as input. It allows pointers (which are not allowed in SPARK) but rejects tagged types and tasking (which are allowed in SPARK). The reason is that it’s easy to compile Ada pointers into C pointers but much harder to support object-oriented or concurrent programming.

SPARK-to-C supports, in particular, all of Ada’s scalar types (enumerations, integers, floating-point, fixed-point, and access) as well as records and arrays and subtypes of these. More importantly, it can generate all the run-time checks to detect violations of type constraints such as integer and float range checks and checks for array accesses out of bounds and access to a null pointer or invalid pointer. Therefore, you can program in Ada and get the guarantee that the executable compiled from the C code generated by SPARK-to-C preserves the integrity of the program, as if you had compiled it directly from Ada with GNAT.

Compiling Ada into C poses interesting challenges. Some of them are resolved by following the same strategy used by GNAT during compilation to binary code. For example, bounds of unconstrained arrays are bundled with the data for the array in so-called "fat pointers", so that both code that directly references Array'First and Array'Last as well as runtime checks for array accesses can access the array bounds in C. This is also how exceptions, both explicit in the code and generated for runtime checks, are handled. Raising an exception is translated into a call to the so-called "last chance handler", a function provided by the user that can perform some logging before terminating the program. This is exactly how exceptions are handled in Ada for targets that don’t have runtime support. In general, SPARK-to-C provides very little runtime support, mostly for numerical computations (sin, cosine, etc.), accessing a real time clock, and outputting characters and strings. Other features require specific source-to-source transformations of Ada programs. For example, functions that return arrays in Ada are transformed into procedures with an additional output parameter (a pointer to some preallocated space in the caller) in C.
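
As an illustration of the last chance handler mentioned above, here is a minimal sketch of how such a handler is typically provided, whether written in Ada as below and exported under the expected name, or written directly in C on the target side (the package name and the body, which simply halts, are illustrative):

   with System;

   package Last_Chance is
      procedure Last_Chance_Handler (Msg : System.Address; Line : Integer);
      pragma Export (C, Last_Chance_Handler, "__gnat_last_chance_handler");
   end Last_Chance;

   package body Last_Chance is
      procedure Last_Chance_Handler (Msg : System.Address; Line : Integer) is
         pragma Unreferenced (Msg, Line);
      begin
         --  A real handler could log the failure first; here we simply halt
         loop
            null;
         end loop;
      end Last_Chance_Handler;
   end Last_Chance;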

The most complex part of SPARK-to-C deals with unnesting nested subprograms because, while GCC supports nested functions as an extension, this is not part of standard C. Hence C compilers cannot be expected to deal with nested functions. Unnesting in SPARK-to-C relies on a tight integration of a source-to-source transformation of Ada code in the GNAT frontend, with special handling of nested subprograms in the C-generation backend. Essentially, the GNAT frontend creates an 'activation record' that contains a pointer field for each uplevel variable referenced in the nested subprogram. The nested subprogram is then transformed to reference uplevel variables through the pointers in the activation record passed as additional parameters. A further difficulty is making this work for indirect references to uplevel variables and through references to uplevel types based on these variables (for example the bound of an array type). SPARK-to-C also deals with these cases: you can find all the details in the comments of the compiler file exp_unst.ads.

Compiling Tetris from SPARK to C

Once SPARK-to-C is installed, the code of Tetris can be compiled into C with the version of GPRbuild that ships with SPARK-to-C:

$ gprbuild -P<project> --target=c

For example, the SPARK expression function Is_Empty from Tetris code:

function Is_Empty (B : Board; Y : Integer; X : Integer) return Boolean is
      (X in X_Coord and then Y in Y_Coord and then B(Y)(X) = Empty);

is compiled into the C function tetris_functional__is_empty, with explicit checking of array bounds before accessing the board:

boolean tetris_functional__is_empty(tetris_functional__board b, integer y, integer x) {
  boolean C123s = false;
  if ((x >= 1 && x <= 10) && (y >= 1 && y <= 50)) {
    if (!((integer)y >= 1 && (integer)y <= 50))
      __gnat_last_chance_handler(NULL, 0);
    if (!((integer)x >= 1 && (integer)x <= 10))
      __gnat_last_chance_handler(NULL, 0);
    if ((b)[y - 1][x - 1] == tetris_functional__empty) {
      C123s = true;
    }
  }
  return (C123s);
}

or into the following simpler C function when using compilation switch -gnatp to avoid runtime checking:

boolean tetris_functional__is_empty(tetris_functional__board b, integer y, integer x) {
  return (((x >= 1 && x <= 10) && (y >= 1 && y <= 50)) && ((b)[y - 1][x - 1] == tetris_functional__empty));
}

SPARK to C Tetris

Running on Arduboy

To interface the SPARK Tetris implementation with the C API of the Arduboy, we use the standard language interfacing method of SPARK/Ada:

procedure Arduboy_Set_Screen_Pixel (X : Integer; Y : Integer);
pragma Import (C, Arduboy_Set_Screen_Pixel, "set_screen_pixel");

A procedure Arduboy_Set_Screen_Pixel is declared in Ada but not implemented. The pragma Import tells the compiler that this procedure is implemented in C with the name “set_screen_pixel”.

SPARK-to-C will translate calls to the procedure “Arduboy_Set_Screen_Pixel” to calls to the C function “set_screen_pixel”. We use the same technique for all the subprograms that are required for the game (button_right_pressed, clear_screen, game_over, etc.).

The program entry point is in the Arduino sketch file SPARK_Tetris_Arduboy.ino (link). In this file, we define and export the C functions (set_screen_pixel() for instance) and call the SPARK/Ada code with _ada_main_tetris().

It’s that simple :)

If you have an Arduboy, you can try this demo by first following the quick start guide, downloading the project from GitHub, loading the Arduino sketch SPARK_Tetris_Arduboy/SPARK_Tetris_Arduboy.ino, and then clicking the upload button.

]]>
Research Corner - Auto-active Verification in SPARK http://blog.adacore.com/research-corner-auto-active-verification-in-spark Thu, 09 Mar 2017 05:00:00 +0000 Claire Dross http://blog.adacore.com/research-corner-auto-active-verification-in-spark

GNATprove performs auto-active verification, that is, verification is done automatically, but usually requires annotations by the user to succeed. In SPARK, annotations are most often given in the form of contracts (pre and postconditions). But some language features, in particular ghost code, allow proof guidance to be much more involved. As an example of how far we can go to guide the proof, see the paper we are presenting at NASA Formal Methods symposium 2017. It describes how an imperative red black tree implementation in SPARK was verified using intensive auto-active verification. The code presented in this paper is available as a distributed example in the SPARK github repository and is described in the SPARK user guide (see 'red_black_trees').

]]>
Rod Chapman on Software Security http://blog.adacore.com/rod-chapman-on-software-security Tue, 07 Mar 2017 05:00:00 +0000 Yannick Moy http://blog.adacore.com/rod-chapman-on-software-security

Rod Chapman gave an impactful presentation at the Bristech conference last year. His subject: programming Satan's computer! That is his way of pointing out how difficult it is to produce secure software. Of course, it would not be Rod Chapman if he did not also drop a few hints at how they have done it at Altran UK over the years. SPARK is central to that solution, although it is not mentioned explicitly in the talk (Rod lifts the cover when answering a question at the end).

Really worth watching: Rod is a captivating speaker.

]]>
AdaCore attends FOSDEM http://blog.adacore.com/adacore-attends-fosdem Wed, 22 Feb 2017 05:00:00 +0000 AdaCore Admin http://blog.adacore.com/adacore-attends-fosdem

Earlier this month AdaCore attended FOSDEM in Brussels, an event focused on the use of free and open source software. Two members of our technical team were there to give talks: "Prove with SPARK: No Math, Just Code" from Yannick Moy and "64 bit Bare Metal Programming on RPI-3" from Tristan Gingold.

Yannick Moy presented how to prove key properties of Tetris in SPARK and run it on ARM Cortex M. The presentation focused on the accessibility of the proof technology to software engineers, who simply have to code the specification, which does not require a specific math background. Click here to watch Yannick's presentation in full.

Tristan Gingold presented the Raspberry Pi 3 board, showed how to write and build a first example, and gave a demo of a more advanced multi-core application. With almost no tutorials available on the internet, he addressed the main new feature of the popular RPI-3 board: four 64-bit cores. Click here to watch Tristan's presentation in full.

For more information and updates on future events that AdaCore will be attending, please visit our website or follow our Twitter @AdaCoreCompany.

]]>
Getting started with the Ada Drivers Library device drivers http://blog.adacore.com/getting-started-with-the-ada-drivers-library-device-drivers Tue, 14 Feb 2017 14:00:00 +0000 Pat Rogers http://blog.adacore.com/getting-started-with-the-ada-drivers-library-device-drivers

The Ada Drivers Library (ADL) is a collection of Ada device drivers and examples for ARM-based embedded targets. The library is maintained by AdaCore, with development originally (and predominantly) by AdaCore personnel but also by the Ada community at large.  It is available on GitHub and is licensed for both proprietary and non-proprietary use.

The ADL includes high-level examples in a directory at the top of the library hierarchy. These examples employ a number of independent components such as cameras, displays, and touch screens, as well as middleware services and other device-independent interfaces. The stand-alone components are independent of any given target platform and appear in numerous products. (A previous blog entry examined one such component, the Bosch BNO055 inertial measurement unit (IMU).) Other examples show how to create high-level abstractions from low-level devices. For instance, one shows how to create abstract data types representing serial ports.

In this entry we want to highlight another extremely useful resource: demonstrations for the low-level device drivers. Most of these drivers are for devices located within the MCU package itself, such as GPIO, UART/USART, DMA, ADC and DAC, and timers. Other demonstrations are for some of the stand-alone components that are included in the supported target boards, for example gyroscopes and accelerometers. Still other demonstrations are for vendor-defined hardware such as a random number generator.

These demonstrations show a specific utilization of a device, or in some cases, a combination of devices. As such they do not have the same purpose as the high-level examples. They may just display values on an LCD screen or blink LEDs. Their purpose is to provide working examples that can be used as starting points when incorporating devices into client applications. As working driver API references they are invaluable.

Approach

Typically there are multiple, independent demonstration projects for each device driver because each is intended to show a specific utilization. For example, there are five distinct demonstrations for the analog-to-digital conversion (ADC) driver. One shows how to set up the driver to use polling to get the converted value. Another shows how to configure the driver to use interrupts instead of polling. Yet another shows using a timer to trigger the conversions, and another builds on that to show the use of DMA to get the converted values to the user. In each case we simply display the resulting values on an LCD screen rather than using them in some larger application-oriented sense.

Some drivers, specifically the I2C and SPI communication drivers, do not have dedicated demonstrations of their own. They are used to implement drivers for devices that use those protocols, i.e., the drivers for the stand-alone components. The Bosch BNO055 IMU mentioned earlier is an example.

Some of the demonstrations illustrate vendor-specific capabilities beyond typical functionality. The STM32 timers, for example, have direct support for quadrature motor encoders. This support provides CPU-free detection of motor rotation to a resolution of a fraction of a degree. Once the timer is configured for this purpose the application merely samples a register to get the encoder count. The timer will even provide the rotation direction. See the encoder demonstration if interested.

Implementation

All of the drivers and demonstration programs are written in Ada 2012. They use preconditions and postconditions, especially when the driver is complicated. The preconditions capture API usage requirements that are otherwise expressed only within the documentation, and sometimes not expressed at all. Similarly, postconditions help clients understand the effects of calls to the API routines, effects that are, again, only expressed in the documentation. Some of the devices are highly sophisticated -- a nice way of saying blindingly complicated -- and their documentation is complicated too. Preconditions and postconditions provide an ideal means of capturing information from the documentation, along with overall driver usage experience. The postconditions also help with the driver implementation itself, acting as unit tests to ensure implementer understanding. Other Ada 2012 features are also used, e.g., conditional and quantified expressions.

The STM32.Timers package uses preconditions and postconditions extensively because the STM timers are "highly sophisticated." STM provides several kinds of timer with significantly different capabilities. Some are defined as "basic," some "advanced," and others are "general purpose." The only way to know which is which is by the timer naming scheme ("TIM" followed by a number) and the documentation. Hence TIM1 and TIM8 are advanced timers, whereas TIM6 and TIM7 are basic timers. TIM2 through TIM5 are general purpose timers but not the same as TIM9 through TIM14, which are also general purpose. We use preconditions and postconditions to help keep it all straight. For example, here is the declaration of the routine for enabling an interrupt on a given timer. There are several timer interrupts possible, represented by the enumeration type Timer_Interrupt. The issue is that basic timers allow only one of the possible interrupts to be specified, and two of the possible interrupts apply only to advanced timers. The preconditions express those restrictions to clients.

procedure Enable_Interrupt
   (This   : in out Timer;
    Source : Timer_Interrupt)
with  
   Pre =>
      (if Basic_Timer (This) then Source = Timer_Update_Interrupt) and
      (if Source in Timer_COM_Interrupt | Timer_Break_Interrupt then Advanced_Timer (This)),
   Post => Interrupt_Enabled (This, Source);

The preconditions reference Boolean functions Basic_Timer and Advanced_Timer in order to distinguish among the categories of timers. They simply compare the timer specified to a list of timers in those categories. 
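For illustration only (not the actual ADL code), such a category test can be written as an expression function that compares the timer's address against the base addresses of the timers in that category; TIM1_Base and TIM8_Base are hypothetical address constants here:

function Advanced_Timer (This : Timer) return Boolean is
  (This'Address = TIM1_Base or else This'Address = TIM8_Base);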

The postcondition tells us that the interrupt will be enabled after the call returns. That is useful for the user but also for the implementer because it serves as an actual check that the implementation does what is expected. When working with hardware, though, we have to keep in mind that the hardware may clear the tested condition before the postcondition code is called. For example, a routine may set a bit in a register in order to make the attached device do something, but the device may clear the bit as part of its response. That would likely happen before the postcondition code could check that the bit was set. When looking through the driver code you may notice that some "obvious" postconditions are not specified. That may be the reason.

The drivers use compiler-dependent facilities only when essential. In particular, they use an AdaCore-defined aspect specifying that access to a given memory-mapped register is atomic even when only one part of it is read or updated. This access reflects the hardware requirements and simplifies the driver implementation code considerably.
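The aspect referred to here is GNAT's Volatile_Full_Access. Here is a minimal sketch of how it is typically applied, with a hypothetical register layout and address (not taken from the ADL):

with System;

package Hypothetical_Registers is

   type Mode_Field is mod 2 ** 3;

   --  Hypothetical 32-bit control register
   type Control_Register is record
      Enable : Boolean;
      Mode   : Mode_Field;
   end record
     with Volatile_Full_Access, Size => 32;

   for Control_Register use record
      Enable at 0 range 0 .. 0;
      Mode   at 0 range 1 .. 3;
   end record;

   --  Hypothetical memory-mapped instance
   Control : Control_Register
     with Import, Address => System'To_Address (16#4000_0000#);

end Hypothetical_Registers;

With this aspect, an assignment such as Control.Enable := True reads and writes the full 32-bit register, as the hardware requires, rather than issuing a partial access.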

Organization

The device driver demonstrations are vendor-specific because the corresponding devices exist either within the vendor-defined MCU packages or outside the MCU on the vendors' target boards. The first vendor supported by the library was STMicroelectronics (STM), although other vendors are beginning to be represented too. As a result, the device driver demonstrations are currently for MCU products and boards from STM and are, therefore, located in a library subdirectory specific to STM. Look for them in the /Ada_Drivers_Library/ARM/STM32/driver_demos/ subdirectory of your local copy from GitHub. There you will see some demonstrations immediately; these are for drivers shared across an entire MCU family. Others are located in further subdirectories, containing either the driver for a device unique to one MCU, or drivers for devices that exist across multiple MCUs but nonetheless differ in some significant way.

Let's look at one of the demonstration projects, the "demo_LIS3DSH_tilt" project, so that we can highlight the more important parts.  This program demonstrates basic use of the LIS3DSH accelerometer chip. The four LEDs surrounding the accelerometer will come on and off as the board is moved, reflecting the directions of the accelerations measured by the device.

The first thing to notice is the "readme.md" file. As you might guess, this file explains what the project demonstrates and, if necessary, how to set it up. In this particular case the text also mentions the board that is intended for execution, albeit implicitly, because the text mentions the four LEDs and an accelerometer that are specific to one of the STM32 Discovery boards. In other words, the demo is intended for a specific target board. At the time of this writing, all the STM demonstration projects run on either the STM32F4 or the STM32F429I Discovery boards from STMicroelectronics. They are very inexpensive, amazingly powerful boards. Some demonstrations will run on either one because they do not use board-specific resources.

But even if a demonstration does not require a specific target board, it still matters which board you use because the demo's project file (the "gpr file") specifies the target. If you use a different target the executable will download but may not run correctly, perhaps not at all.

The executable may not run because the specified target's runtime library is used to build the binary executable. These libraries have configurations that reflect the differences in the target board, especially memory and clock rates, so using the runtime that matches the board is critical. This is the first thing to check when the board you are using simply won't run the demonstration at all.

The demonstration project file specifies the target by naming another project in a with-clause. This other project represents a specific target board. Here is the elided content of this demonstration's project file. Note the second with-clause that specifies a gpr file for the STM32F407 Discovery board. That is one of the two lines to change if you want to use the F429I Discovery instead.

with "../../../../boards/common_config.gpr";
with "../../../../boards/stm32f407_discovery.gpr";

project Demo_LIS3DSH_Tilt extends "../../../../examples/common/common.gpr" is

   ...
   for Runtime ("Ada") use STM32F407_Discovery'Runtime("Ada");
   ...

end Demo_LIS3DSH_Tilt;

The other line to change in the project file is the one specifying the "Runtime" attribute. Note how the value of the attribute is specified in terms of another project's Runtime attribute. That other project is the one named in the second with-clause, so when we change the with-clause we must also change the name of the referenced project.

That's really all you need to change in the gpr file. GPS and the builder will handle everything else automatically.
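For example, retargeting this demonstration to the STM32F429I Discovery amounts to something like the following (assuming the board project in your checkout is named stm32f429_discovery.gpr; check the boards directory for the exact file and project names):

with "../../../../boards/common_config.gpr";
with "../../../../boards/stm32f429_discovery.gpr";

project Demo_LIS3DSH_Tilt extends "../../../../examples/common/common.gpr" is

   ...
   for Runtime ("Ada") use STM32F429_Discovery'Runtime ("Ada");
   ...

end Demo_LIS3DSH_Tilt;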

There is, however, another effect of the with-clause naming a specific target. The demonstration programs must refer to the target MCU in order to use the devices in the MCU package. They may also need to refer to devices on the target board. Different MCU packages have differing numbers of devices (eg, USARTs) in the package. Similarly, different boards have different external components (accelerometers versus gyroscopes, for example). We don't want to limit the code in the ADL to work with specific boards, but that would be the case if the code referenced the targets by name, via packages representing the specific MCUs and boards. Therefore, the ADL defines two packages that represent the MCU and the board indirectly. These are the STM32.Device and STM32.Board packages, respectively. The indirection is then resolved by the gpr file named in the with-clause. In this demonstration the clause names the STM32F407_Discovery project so that is the target board represented by the STM32.Board package. That board uses an STM32F407VG MCU so that is the MCU represented by the STM32.Device package. Each package contains declarations for objects and constants representing the specific devices on that specific MCU and target board.

You'll also see a file named ".gdbinit" at the same level as the readme.md and gpr files. This is a local gdb script that automatically resets the board when debugging. It is convenient but not essential.

At that same level you'll also see a "gnat.adc" file containing configuration pragmas. This file contains a single pragma that, among other things, ensures all interrupt handlers are elaborated before any interrupts can trigger them. It is not essential for the correct functioning of these demonstrations but is a good idea in general.

Other than those files you'll see subdirectories for the source files and compiler's products (the object and ALI files, and the executable file).

And that's it. Invoke GPS on the project file and everything will be handled by the IDE.

Application Use

We mentioned that you must change the gpr file if you want to use a different target board. That assumes you are running the demonstration programs themselves. There is no requirement that you do so. You could certainly take the bulk of the code and use it on some other target that has the same MCU family inside. That's the whole point of the demonstrations: showing how to use the device drivers! The Certyflie project, also on the AdaCore GitHub, is just such a project. It uses these device drivers so it uses an STM32.Device package for the on-board STM32F405 MCU, but the target board is a quad-copter instead of one of the Discovery kits.

Concluding Remarks

Finally, it must be said that not all available devices have drivers in the ADL, although the most important ones do. More drivers and demonstrations are needed. For example, the hash processor and the cryptographic processor on the STM Cortex-M4 MCUs do not yet have drivers. Other important drivers are missing as well: CAN and Ethernet support is either minimal or lacking entirely. And that's not even mentioning the other possible vendors. We need the active participation of the Ada community and hope you will join us!

]]>
Proving Tetris With SPARK in 15 Minutes http://blog.adacore.com/proving-tetris-with-spark-in-15-minutes Fri, 10 Feb 2017 05:00:00 +0000 Yannick Moy http://blog.adacore.com/proving-tetris-with-spark-in-15-minutes

Here is the page of the talk with the slides and recording.

]]>
Going After the Low Hanging Bug http://blog.adacore.com/going-after-the-low-hanging-bug Mon, 30 Jan 2017 14:00:00 +0000 Raphaël Amiard http://blog.adacore.com/going-after-the-low-hanging-bug

At AdaCore, we've been developing deep static analysis tools (CodePeer and SPARK) since 2008. And if you factor in that some of their developers had already been working on these tools (or others like them) for decades, it's fair to say that we have deep expertise in deep static analysis tools.

At the same time, many Web companies have adopted light-weight static analysis integrated in their agile code-review-commit cycle. Some of these tools are deployed for checking all commits in the huge codebases of Google (Tricorder) or Facebook (Infer). Others are commercial tools implementing hundreds of small checkers (SonarLint, PVS-Studio, Flawfinder, PC-lint, CppCheck). The GNAT compiler implements some of these checkers through its warnings, and our coding standard checker GNATcheck implements others, but our technology is also missing some useful checkers that rely on intraprocedural or interprocedural control and data flow analysis, which is typically out of reach of a compiler or a coding standard checker.

Some of these checkers are in fact implemented with much greater precision in our deep static analysis tools CodePeer and SPARK, but running these tools requires developing a degree of expertise in the underlying technology to be used effectively, and their use can be costly in terms of resources (machines, people). Hence, these tools are typically used for high assurance software, where the additional confidence provided by deep static analysis outweighs the costs. In addition, our tools mostly target the absence of run-time errors, plus a few logical errors like unused variables or statements and a few suspicious constructs. Thus they don't cover the full spectrum of checkers implemented in light-weight static analyzers.

Luckily, the recent Libadalang technology, developed at AdaCore, provides an ideal basis on which to develop such light-weight static analysis, as it can parse and analyze thousands of lines of code in seconds. As an experiment, we implemented two simple checkers using the Python binding of Libadalang, and we found a dozen bugs in the codebases of the tools we develop at AdaCore (including the compiler and static analyzers). That's what we describe in the following.


Checker 1: When Computing is a Waste of Cycles

The first checker detects arguments of some arithmetic and comparison operators which are syntactically identical, in cases where this could be expressed with a constant instead (like "X - X" or "X <= X"):

import sys

import libadalang as lal

# The source file to analyze is passed as the first command-line argument
source_file = sys.argv[1]

def same_tokens(left, right):
    return len(left) == len(right) and all(
        le.kind == ri.kind and le.text == ri.text
        for le, ri in zip(left, right)
    )

def has_same_operands(binop):
    return same_tokens(list(binop.f_left.tokens), list(binop.f_right.tokens))

def interesting_oper(op):
    return not isinstance(op, (lal.OpMult, lal.OpPlus, lal.OpDoubleDot,
                               lal.OpPow, lal.OpConcat))

c = lal.AnalysisContext('utf-8')
unit = c.get_from_file(source_file)
for b in unit.root.findall(lal.BinOp):
    if interesting_oper(b.f_op) and has_same_operands(b):
        print 'Same operands for {} in {}'.format(b, source_file)

(Full source available at https://github.com/AdaCore/lib...)

Despite all the extensive testing that is done on our products, this simple 20-line checker found 1 bug in the GNAT compiler, 3 bugs in the CodePeer static analyzer and 1 bug in the GPS IDE! We show them below so that you can convince yourself that they are true bugs, really worth finding.

The bug in GNAT is on the following code in sem_prag.adb:

            --  Attribute 'Result matches attribute 'Result

            elsif Is_Attribute_Result (Dep_Item)
              and then Is_Attribute_Result (Dep_Item)
            then
               Matched := True;

One of the references to Dep_Item should really be Ref_Item. Here is the correct version:

            --  Attribute 'Result matches attribute 'Result

            elsif Is_Attribute_Result (Dep_Item)
              and then Is_Attribute_Result (Ref_Item)
            then
               Matched := True;

Similarly, one of the three bugs in CodePeer can be found in be-ssa-value_numbering-stacks.adb:

            --  Recurse on array base and sliding amounts;
            if VN_Kind (Addr_With_Index) = Sliding_Address_VN
              and then Num_Sliding_Amounts (Addr_With_Index) =
                         Num_Sliding_Amounts (Addr_With_Index)
            then

where one of the references to Addr_With_Index above should really be to Addr_With_Others_VN, and the other two are in be-value_numbers.adb:

         return VN_Global_Obj_Id (VN2).Obj_Id_Number =
           VN_Global_Obj_Id (VN2).Obj_Id_Number
           and then VN_Global_Obj_Id (VN2).Enclosing_Module =
           VN_Global_Obj_Id (VN2).Enclosing_Module;

where two of the four references to VN2 should really be to VN1.

The bug in GPS is in language-tree-database.adb:

               if Get_Construct (Old_Obj).Attributes /=
                 Get_Construct (New_Obj).Attributes
                 or else Get_Construct (Old_Obj).Is_Declaration /=
                 Get_Construct (New_Obj).Is_Declaration
                 or else Get_Construct (Old_Obj).Visibility /=
                 Get_Construct (Old_Obj).Visibility

The last reference to Old_Obj should really be New_Obj.


Checker 2: When Testing Gives No Information

The second checker detects syntactically identical expressions which are chained together in a chain of logical operators, so that one of the two identical tests is useless (as in "A or B or A"):

import sys

import libadalang as lal

# The source file to analyze is passed as the first command-line argument
source_file = sys.argv[1]

def list_operands(binop):
    def list_sub_operands(expr, op):
        if isinstance(expr, lal.BinOp) and type(expr.f_op) is type(op):
            return (list_sub_operands(expr.f_left, op)
                    + list_sub_operands(expr.f_right, op))
        else:
            return [expr]

    op = binop.f_op
    return (list_sub_operands(binop.f_left, op)
            + list_sub_operands(binop.f_right, op))

def is_bool_literal(expr):
    return (isinstance(expr, lal.Identifier)
            and expr.text.lower() in ['true', 'false'])

def has_same_operands(expr):
    ops = set()
    for op in list_operands(expr):
        tokens = tuple((t.kind, t.text) for t in op.tokens)
        if tokens in ops:
            return op
        ops.add(tokens)

def same_as_parent(binop):
    par = binop.parent
    return (isinstance(binop, lal.BinOp)
            and isinstance(par, lal.BinOp)
            and type(binop.f_op) is type(par.f_op))

def interesting_oper(op):
    return isinstance(op, (lal.OpAnd, lal.OpOr, lal.OpAndThen, lal.OpOrElse,
                           lal.OpXor))

c = lal.AnalysisContext('utf-8')
unit = c.get_from_file(source_file)
for b in unit.root.findall(lambda e: isinstance(e, lal.BinOp)):
    if interesting_oper(b.f_op) and not same_as_parent(b):
        oper = has_same_operands(b)
        if oper:
            print 'Same operand {} for {} in {}'.format(oper, b, source_file)

(Full source available at https://github.com/AdaCore/lib...)

Again, this simple 40-line checker found 4 code quality issues in the GNAT compiler, 2 bugs in the CodePeer static analyzer, 1 bug and 1 code quality issue in the GPS IDE, and 1 bug in the QGen code generator. Ouch!

The four code quality issues in GNAT are simply duplicated checks that are not useful. For example in par-endh.adb:

         --  Cases of normal tokens following an END

          (Token = Tok_Case   or else
           Token = Tok_For    or else
           Token = Tok_If     or else
           Token = Tok_Loop   or else
           Token = Tok_Record or else
           Token = Tok_Select or else

         --  Cases of bogus keywords ending loops

           Token = Tok_For    or else
           Token = Tok_While  or else

The test "Token = Tok_For" is present twice. Probably better for maintenance to have it once only. The three other issues are similar.

The two bugs in CodePeer are in utils-arithmetic-set_arithmetic.adb:

      Result  : constant Boolean
        := (not Is_Singleton_Set (Set1)) and then (not Is_Singleton_Set (Set1))
        and then (Num_Range_Pairs (Set1, Set2) > Range_Pair_Limit);

The second occurrence of Set1 should really be Set2.

The code quality issue in GPS is in mi-parser.adb:

           Token = "traceframe-changed" or else
           Token = "traceframe-changed" or else

The last line is useless. The bug in GPS is in vcs.adb:

      return (S1.Label = null and then S2.Label = null
              and then S2.Icon_Name = null and then S2.Icon_Name = null)

The first reference to S2 on the last line should really be S1. Note that this issue had already been detected by CodePeer, which is run as part of GPS quality assurance, and it had been fixed on the trunk by one of the GPS developers. Interestingly here, two tools using either a syntactic heuristic or a deep semantic interpretation (allowing CodePeer to detect that "S2.Icon_Name = null" is always true when reaching the last subexpression) reach the same conclusion on that code.

Finally, the bug in QGen is in himoco-change_buffers.ads:

   procedure Apply_Update (Self : in out Change_Buffer)
   with Post =>
   --  @req TR-CHB-Apply_Update
   --  All signals, blocks and variables to move shall
   --  be moved into their new container. All signals to merge shall be
   --  merged.
     (
        (for all Elem of Self.Signals_To_Move =>
             (Elem.S.Container.Equals (Elem.Move_Into.all)))
      and then
        (for all Elem of Self.Blocks_To_Move =>
             (if Elem.B.Container.Is_Not_Null then
                  Elem.B.Container.Equals (Elem.Move_Into.all)))
      and then
        (for all Elem of Self.Signals_To_Move =>
             (Elem.S.Container.Equals (Elem.Move_Into.all)))
      and then

The first and third conjuncts in the postcondition are the same. After checking with the QGen developers, it turned out that the final check here was actually meant for Self.Variables_To_Move instead of Self.Signals_To_Move. So we detected a bug in the specification (expressed as a contract), using a simple syntactic checker!


Setup Recipe

So you actually want to try the above scripts on your own codebase? This is possible right now with your latest GNAT Pro release or the latest GPL release for community & academic users! Just follow the instructions we described in the Libadalang repository, and you will then be able to run the scripts inside your favorite Python 2 interpreter.


Conclusion

Overall, this little experiment was eye-opening, in particular for those of us who develop these tools, as we did not expect such gross mistakes to have slipped through our rigorous reviews and testing. We will continue investigating what benefits light-weight static analysis might provide, and if this investigation is successful, we will certainly include this capability in our tools. Stay tuned!

[cover image by Max Pixel, Creative Commons Zero - CC0]

]]>
Hash it and Cache it http://blog.adacore.com/hash-it-and-cache-it Tue, 24 Jan 2017 05:00:00 +0000 Johannes Kanig http://blog.adacore.com/hash-it-and-cache-it

GNATprove already uses quite a few techniques to avoid doing the same work over and over again. This older blog post explains the main mechanisms that are in play. In this blog post, we describe yet another mechanism that can decrease the work done by GNATprove considerably. In this other post we have described how we changed the architecture of the SPARK tools to achieve more parallelism.

When GNATprove analyses your SPARK program, it internally creates an intermediate file in the Why3 format, and then spawns a program called "gnatwhy3" to process this file. It is usually in the gnatwhy3 process that GNATprove spends most of its analysis time. Couldn't we save a lot of this time if we "cache" calls to gnatwhy3? That is, we could detect if the intermediate Why3 file is the same as the last time, and no other relevant conditions have changed, and simply take the analysis results of last time.

This is exactly what this new SPARK feature does. It computes a hash of the intermediate file and all relevant parameters, and uses a memcached server to check if a previous run of gnatwhy3 exists with this hash. If yes, the result of the last analysis is retrieved from the server. If not, the gnatwhy3 tool is run as usual.

The nice property of this new feature is that the memcached server can be shared across members of a team. All you need is to set up a machine on your network which has such a server running. Memcached is free, open source and very easy to set up. Then you can specify this server in your invocation of GNATprove by adding ``--memcached-server=hostname:portnumber`` to the command line. If your team uses this option, and a colleague has just run the SPARK tools on a checkout of your project, you should see speedups in your own usage of the tools (and vice versa).
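For example, a full invocation might look like this (hypothetical project name, host, and port):

$ gnatprove -P my_project.gpr --memcached-server=cacheserver.example.com:11211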

]]>
Introducing Libadalang http://blog.adacore.com/introducing-libadalang Mon, 23 Jan 2017 14:00:00 +0000 Raphaël Amiard http://blog.adacore.com/introducing-libadalang

Show me your abstract syntax tree

AdaCore is working on a host of tools that work on Ada code. The compiler, GNAT, is the most famous and prominent one, but it is far from being the only one.

Over the years we have built tools with a variety of requirements for processing Ada code, which we can place on a spectrum:

  • Some tools, like a compiler, or some static analyzers, will need to ensure that they are working on correct Ada code, both from a syntactic and semantic point of view.

  • Some tools, like a source code pretty-printer, can relax some of those constraints. While the semantic correctness of the code can be used by the pretty-printer, it is not necessary per se. We can even imagine a pretty-printer working on a syntactically incorrect source, as long as it does not change the source's intended meaning.

  • Some other tools, like an IDE, will need to work on code that is incorrect, and even evolving dynamically.

At AdaCore, we already have several interleaved tools to process Ada code: the GNAT compiler, the ASIS library, GNAT2XML, the GPS IDE. A realization of the past few years, however, has been that we lacked a unified solution to process code at the far end of the spectrum described above: potentially evolving, potentially incorrect Ada code.

Libadalang

Libadalang is meant to fill that gap, providing an easy-to-use library to syntactically and semantically analyze Ada code. The end goal is both to use it internally, in our tools and IDEs, to provide the Ada-aware engine, and to offer it to customers and Ada users, so that they can create their own custom Ada-aware tools.

Unlike the tools we currently offer for implementing your own tools, Libadalang will provide different levels of analysis, allowing a user to work at a purely syntactic level if needed, or to access more semantic information.

We will also provide interfaces to multiple languages:

  • You will be able to use Libadalang from Ada of course.

  • But you will also be able to use it from Python, for users who want to prototype easily, or do one off analyses/statistics about their code.

In a following series of blog posts, we will showcase how to use Libadalang to solve concrete problems, so that interested people can get a feel for how to use it.

Stay tuned

Libadalang is not ready for public consumption yet, but you can see the progress being made on GitHub: https://github.com/AdaCore/lib...

Stay tuned!

]]>
New Year's Resolution for 2017: Use SPARK, Say Goodbye to Bugs http://blog.adacore.com/new-years-resolution-for-2017-no-bugs-with-spark Wed, 04 Jan 2017 13:14:00 +0000 Yannick Moy http://blog.adacore.com/new-years-resolution-for-2017-no-bugs-with-spark

NIST has recently published a report called "Dramatically Reducing Software Vulnerabilities" in which they single out five approaches which have the potential for creating software with 100 times fewer vulnerabilities than we do today. One of these approaches is formal methods. In the introduction of the document, the authors explain that they selected the five approaches that meet the following three criteria:

  • Dramatic impact,
  • 3 to 7-year time frame and
  • Technical activities.

The dramatic impact criterion is where they aim at "reducing vulnerabilities by two orders of magnitude". The 3 to 7-year time frame was meant to select "existing techniques that have not reached their full potential for impact". The technical criterion narrowed the selection to the technical area.

Among formal methods, the report highlights strong suits of SPARK, such as "Sound Static Program Analysis" (the raison d'être of SPARK), "Assertions, Pre- and Postconditions, Invariants, Aspects and Contracts" (all of which are available in SPARK), and "Correct-by-Construction". The report also cites the SPARK projects Tokeneer and iFACTS as examples of mature uses of formal methods.

Another of the five approaches selected by NIST to dramatically reduce software vulnerabilities is what they call "Additive Software Analysis Techniques", where results of analysis techniques are combined. This has been on our radar since 2010, when we first planned an integration between our static analysis tool CodePeer and our formal verification toolset SPARK. We have finally achieved a first step in the integration of the two tools in SPARK 17, by using CodePeer as a first-level proof tool inside SPARK Pro.

Paul Black, who led the work on this report, was interviewed a few months ago, and he talks specifically about formal methods at 7:30 in the podcast. His host Kevin Greene from US Homeland Security mentions that "There has been a lot of talk especially in the federal community about formal methods." To which Paul Black later answers that "We do have to get a little more serious about formal methods."

NIST is not the only one to support the use of SPARK. Editor Bill Wong from Electronic Design has included SPARK in his "2016 Gifts for the Techie", saying:

It is always nice to give something that is good for you, so here is my suggestion (and it’s a cheap one): Learn SPARK. Yes, I mean that Ada programming language subset.

For those who'd like to follow NIST or Bill Wong's advice, here is where you should start:

]]>
SPARK and CodePeer, a Good Match! http://blog.adacore.com/spark-and-codepeer-a-good-match Sat, 31 Dec 2016 05:00:00 +0000 Johannes Kanig http://blog.adacore.com/spark-and-codepeer-a-good-match

As a reader of this blog, you probably know that SPARK, to prove the absence of runtime errors in SPARK programs, translates the program into logical formulas that are then passed to powerful tools called SMT solvers, which can prove the validity of the logical formulas. Simply put, for every check in your SPARK program, the tool generates a logical formula such that, if the SMT solver manages to prove it, the runtime check cannot fail.

The CodePeer tool for the static analysis of Ada programs functions in a vastly different way: It tracks the ranges of variables and expressions in your program.  When it encounters a runtime check (which often can also be expressed as a range), it will verify if the range of the variable and the range of the check allow for a potential runtime failure or not. 

Though the applied techniques and use cases of CodePeer and SPARK are very different, ultimately, both tools will analyse the user code and try, for each potential runtime check, to prove that it cannot happen. If they fail to achieve this proof, a message will be issued to the user.  We always felt that there was some kind of synergy possible between CodePeer and SPARK, but we never quite saw how they would fit together, given their very different use cases and underlying techniques. 

However, with the SPARK 17.1 release, we have finally brought the two technologies together. In hindsight, the idea is very simple, but isn't it always like that? We simply run the CodePeer tool as part of the SPARK analysis. Every check that has already been identified as "cannot fail" by the CodePeer tool is not translated to logical formulas and is directly reported as proved. From the user's point of view, CodePeer simply becomes another proof tool.

In our tests, we have found CodePeer to be very effective. It quickly discharges all the simple runtime checks that can be proved simply by looking at the ranges of the involved variables and expressions. It is particularly efficient when applied to runtime checks involving floating-point variables.

Starting from SPARK Pro version 17.1 targeted for release in February, the CodePeer engine will be part of the SPARK Pro package.  It can be enabled on the command line with the switch "--codepeer=on", or by selecting the corresponding checkbox in the GPS and GNATBench plug-ins.
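A typical command line would then look like this (hypothetical project name):

$ gnatprove -P my_project.gpr --codepeer=on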

Note that CodePeer and SPARK are still separate products, because, as mentioned above, they cater to different use cases.

For more information, see the SPARK User's Guide.

]]>
SPARK Cheat Sheets (en & jp) http://blog.adacore.com/spark-cheat-sheets-en-jp Fri, 16 Dec 2016 05:00:00 +0000 Yannick Moy http://blog.adacore.com/spark-cheat-sheets-en-jp

Here is the SPARK cheat sheet usually distributed in trainings, both in English and in Japanese.

Attachments

]]>
Make with Ada: DIY instant camera http://blog.adacore.com/make-with-ada-diy-instant-camera Mon, 12 Dec 2016 09:00:00 +0000 Fabien Chouteau http://blog.adacore.com/make-with-ada-diy-instant-camera

There are moments in life where you find yourself with an AdaFruit thermal printer in one hand, and an OpenMV camera in the other. You bought the former years ago, knowing that you would do something cool with it, and you are playing with the latter in the context of a Hackaday Prize project. When that moment comes — and you know it will come — it’s time to make a DIY instant camera. For me it was at the end of a warm Parisian summer day. The idea kept me awake until 5am, putting the pieces together in my head, designing an enclosure that would look like a camera. Here’s the result:


The Hardware

On the hardware side, there’s nothing too fancy. I use a 2-cell LiPo battery from my drone. It powers the thermal printer and a 5V regulator. The regulator powers the OpenMV module and the LCD screen. There’s a push button for the trigger and a slide switch for the mode selection; both are directly plugged into the OpenMV IOs. The thermal printer is connected via UART, while the LCD screen uses SPI. Simple.

The Software

For this project I added support for the OpenMV in the Ada_Drivers_Library. It was the opportunity to write the digital camera interface (DCMI) driver, as well as drivers for two Omnivision camera sensors, the ST7735 LCD, and the thermal printer.

The thermal printer is only capable of printing black or white pixel bitmaps (not even grayscale), which is not great for a picture. Fortunately, the printing head has 3 times more pixels than the height of a QQVGA image, which is the format I get from the OpenMV camera. If I also multiply the width by 3, for each RGB565 pixel from the camera I can have 9 black or white pixels on the paper (from 160x120 to 480x360). This means I can use a dithering algorithm (a very naive one) to produce a better quality image. A pixel of grayscale value X from 0 to 255 will be transformed into a 3x3 black and white matrix.
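Here is a minimal sketch of that naive mapping, with hypothetical names (not the actual project code): the darker the pixel, the more of the nine printed dots are black, filled in a fixed order.

subtype Grayscale is Natural range 0 .. 255;

type Dot_Matrix is array (1 .. 3, 1 .. 3) of Boolean;
--  True means a black dot on the paper

function To_Dots (X : Grayscale) return Dot_Matrix is
   --  Number of black dots, from 0 (white pixel) to 9 (black pixel)
   Black  : constant Natural := ((255 - X) * 9) / 255;
   Result : Dot_Matrix := (others => (others => False));
   Count  : Natural := 0;
begin
   for I in Result'Range (1) loop
      for J in Result'Range (2) loop
         if Count < Black then
            Result (I, J) := True;
            Count := Count + 1;
         end if;
      end loop;
   end loop;
   return Result;
end To_Dots;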

This is quick and dirty dithering; one could greatly improve image quality by using the Floyd–Steinberg algorithm or something similar (it would require more processing and more memory).

As always, the code is on GitHub.

Have fun!

]]>
Building High-Assurance Software without Breaking the Bank http://blog.adacore.com/formal-methods-webinar-building-high-assurance-software-without-breaking-the-bank Tue, 06 Dec 2016 05:00:00 +0000 AdaCore Admin http://blog.adacore.com/formal-methods-webinar-building-high-assurance-software-without-breaking-the-bank

AdaCore will be hosting a joint webcast next Monday 12th December 2pm ET/11am PT with SPARK experts Yannick Moy and Rod Chapman. Together, they will present the current status of the SPARK solution and explain how it can be successfully adopted in your current software development processes.

Attendees will learn:

  • How to benefit from formal program verification
  • Lessons learned from SPARK projects
  • How to integrate SPARK into existing projects
  • Where to learn about SPARK
  • Why "too hard, too costly, too risky" is a myth

The late computer scientist Edsger Dijkstra once famously said "Program testing can be used to show the presence of bugs, but never to show their absence." This intrinsic drawback has become more acute in recent years, with the need to make software "bullet proof" against increasingly complex requirements and pervasive security attacks. Testing can only go so far. Fortunately, formal program verification offers a practical complement to testing, as it addresses security concerns while keeping the cost of testing at an acceptable level.

Formal verification has a proven track record in industries where reliability is paramount, and among the available technologies, the SPARK toolset features prominently. It has been used successfully for developing high confidence software in industries including Aerospace, Defense, and Air Traffic Management. SPARK tools can address specific requirements for robustness (absence of run-time errors) and functional correctness (contract-based verification) that are relevant in critical systems, including those that are subject to certification requirements.

You can join us for this webinar by registering here.


]]>
Make With Ada Winners Announced! http://blog.adacore.com/make-with-ada-winners-announced Tue, 29 Nov 2016 05:00:00 +0000 AdaCore Admin http://blog.adacore.com/make-with-ada-winners-announced

Judging for the first annual Make with Ada competition has come to an end and we can now reveal the results.

Huge congratulations to Stephane Carrez who came in 1st place this year, gaining a prize of €5000, with his EtherScope project!

The EtherScope monitoring tool analyses Ethernet traffic: it can read network packets (TCP, UDP, IGMP, etc.), perform real-time analysis, and display the results on a 480x272 touch panel. It runs on an STM32F746 micro-controller from STMicroelectronics, and its interface can filter results at different levels and report various types of information.

All sources for the application are available on GitHub here.


In 2nd place, winning €2000, is German Rivera with his Autonomous Car Framework! The framework was developed in Ada 2012 to build control software for the NXP Cup race car. The system is made up of a ‘toy size’ race car chassis, paired with a FRDM-KL25Z board and a TFC-shield board. The car kit also has two DC motors for the rear wheels, a steering servo for the front wheels and a line scan camera.

In 3rd place, winning €1000, is Shawn Nock with his Bluetooth Beacons project! His “iBeacon” project targeted a Nordic Semiconductor nRF51822 System-on-a-Chip (a Cortex-M0 part with integrated 2.4GHz Radio) with a custom development board.    

The two runners up received a Crazyflie 2.0 nano drone each as special prizes. The Ada Lovelace Special Prize was awarded to Sébastien Bardot for inventiveness with his explorer and mapper robot.

The Robert Dewar Special Prize was awarded to German Rivera for dependability with his IoT Networking Stack for the NXP FRDM-K64F board.

Participants had to design an embedded software project for an ARM Cortex M or R processor, using Ada and/or SPARK as the main programming language and put it into action. 

Well done to all the winners and participants, we received some great projects and got the chance to review some interesting ideas this year.

To keep up to date with future competitions and all things Ada, follow @adaprogrammers on Twitter or visit the official Make with Ada website at www.makewithada.org.

]]>
GNATprove Tips and Tricks: a Lemma for Sorted Arrays http://blog.adacore.com/gnatprove-tips-and-tricks-a-lemma-for-sorted-arrays Mon, 28 Nov 2016 05:00:00 +0000 Sylvain Dailler http://blog.adacore.com/gnatprove-tips-and-tricks-a-lemma-for-sorted-arrays

A few weeks ago, we began working on a new feature that might help users prove the correctness of their programs manipulating sorted arrays. Here we report on the creation of the first lemma of a new unit on arrays in the SPARK lemma library.

The objective is to provide users with a way to prove some complex properties of programs (complex for SMT-based automatic provers). The lemmas are written as ghost procedures with no side effects. They are proved in SPARK, and their postconditions can be reused by the provers wherever the procedures are called.

In particular, our first lemma is on the transitivity of the order in arrays. It is used to refine this (simplified) assertion on an array Arr that states that the array is sorted by comparing adjacent elements:

(for all I in Arr'Range =>
  (if I /= Arr'First then
    Arr (I - 1) < Arr (I)))

into this equivalent expression that the array is sorted by comparing any two elements:

(for all I in Arr'Range =>
  (for all J in Arr'Range =>
    (if I < J then Arr (I) < Arr (J))))

This new condition is logically equivalent to the first one, but in practice it can be used by SMT provers when we need to compare two elements of the sorted array that are not adjacent. In some rare cases, provers like CVC4 and Z3 are able to prove that the two conditions are equivalent, but this requires inductive reasoning, which is only very partially supported in such provers. So, in general, SMT provers are just unable to prove it. This is the reason we provide this new lemma, which we proved once and for all (using the Coq proof assistant).
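To give an idea of its shape, the lemma is a ghost procedure whose precondition is the adjacent-elements ordering and whose postcondition is the general ordering. Roughly, and simplified (the actual declaration in SPARK.Unconstrained_Array_Lemmas may differ):

generic
   type Index_Type is range <>;
   type Element_T is private;
   type A is array (Index_Type range <>) of Element_T;
   with function Less (Left, Right : Element_T) return Boolean;
package SPARK.Unconstrained_Array_Lemmas is

   procedure Lemma_Transitive_Order (Arr : A) with
     Ghost,
     Global => null,
     Pre  => (for all I in Arr'Range =>
                (if I /= Arr'First then Less (Arr (I - 1), Arr (I)))),
     Post => (for all I in Arr'Range =>
                (for all J in Arr'Range =>
                   (if I < J then Less (Arr (I), Arr (J)))));

end SPARK.Unconstrained_Array_Lemmas;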

How to use it:

First, import the spark_lemmas library into your project file.

with "spark_lemmas";

To use this lemma on an array Arr, you first need an unconstrained array type defined over your own array index type (Index_Type) and array element type (Element_T), together with an ordering function on Element_T (Less). Declare the array type as follows:

type A is array (Index_Type range <>) of Element_T;

You can now instantiate the generic package:

package Test is new SPARK.Unconstrained_Array_Lemmas
     (Index_Type => Index_Type,
      Element_T  => Element_T,
      A          => A,
      Less       => "<");

To use it on an array Arr, you only need to convert your array to type A and apply the lemma:

Test.Lemma_Transitive_Order (A (Arr));

Safety conditions:

Note that this lemma relies on a meta-argument about the correctness of the types and function we use to instantiate it. For usability reasons, we used SPARK_Mode => Off on the library code so that users are not polluted with Coq-proved conditions (they would also have needed to install Coq to get a complete proof). We proved this lemma completely in Coq for instances of the types and a transitive order. And because our proof does not depend on the size and bounds of the types, which are the only semantic differences visible from the lemma implementation, user-instantiated lemmas should also be correct.

So, if you are a user of SPARK and find difficult proofs in your code that would fit in a library like this one, please contact us; we might be able to provide lemmas for your problems in some cases.

]]>
Integrate new tools in GPS (2) http://blog.adacore.com/integrate-new-tools-in-gps-2 Tue, 22 Nov 2016 10:23:00 +0000 Emmanuel Briot http://blog.adacore.com/integrate-new-tools-in-gps-2

Customizing build target switches

In the first post in this series (Integrate new tools in GPS) we saw how to create new build targets in GPS to spawn external tools via a menu or toolbar button, and then display the output of that tool in its own console, as well as show error messages in the Locations view.

The customization left to the user is however limited in this first version. If the build target's Launch Mode was set to Manual With Dialog, a dialog is displayed when the menu or button is selected. This dialog lets users edit the command line just before the target is executed.

The next time that the dialog is open, it will have the modified command line by default.

This is however not very user-friendly, since the user has to remember all the valid switches for the tool. For the standard tools, GPS provides nicer dialogs with one widget for each valid switch, so let's do that for our own tool.

Whenever we edit targets interactively in GPS, via the /Build/Settings/Targets menu, GPS saves the modified build target in a local file, which takes precedence over the predefined targets and those defined in plug-ins. This ensures that your modifications will always apply, but has the drawback that any change in the original target is not visible to the user.

So let's remove the local copy that GPS might have made of our build target. The simplest approach, for now, is to remove the file:

  • Windows:  %USER_PROFILE%\.gps\targets.xml
  • Linux and Mac: ~/.gps/targets.xml

Describing the switches

We now have to describe what are the valid switches for our tool.

For a given tool (for instance my_style_checker), we could have multiple build targets. For instance:

  • one that is run manually by the user via a toolbar button
  • and another build target with a slightly different command line that is run every time the user saves a file.

To avoid duplication, GPS introduces the notion of a target model. This is very similar to a build target, but includes information that is shared by all related build targets. We will therefore modify the plugin we wrote in the first part to move some information to the target model, and in addition add a whole section describing the valid switches.

import GPS
GPS.parse_xml("""
    <target-model name="my_style_checker_model">
        <description>Basis for all targets that spawn my_style_checker</description>
        <iconname>gps-custom-build-symbolic</iconname>
        <switches command="my_style_checker" columns="2" lines="1">
           <title column="1" line="1">Styles</title>
           <check label="Extended checks"
                  switch="--extended"
                  tip="Enable more advanced checks"
                  column="1" line="1" />
           <check label="Experimental checks"
                  switch="--experimental"
                  column="1" line="1" />
           <combo switch="--style" separator="=" noswitch="gnat">
              <combo-entry label="GNAT" value="gnat" />
              <combo-entry label="Style 1" value="style1" />
              <combo-entry label="Style 2" value="style2" />
           </combo>

           <title column="2" line="1">Processing</title>
           <spin label="Multiprocessing"
                 switch="-j"
                 min="0"  max="100"  default="1"
                 column="2" line="1" />
        </switches>
    </target-model>

    <target model="my_style_checker_model"
            category="File"
            name="My Style Checker">
        <in-toolbar>TRUE</in-toolbar>
        <in-menu>TRUE</in-menu>
        <launch-mode>MANUALLY_WITH_DIALOG</launch-mode>
        <command-line>
           <arg>my_style_checker</arg>
           <arg>%F</arg>
        </command-line>
        <output-parsers>
           output_chopper
           utf_converter
           location_parser
           console_writer
           end_of_build
        </output-parsers>
    </target>
""")

Let's go over the details.

  • on line 3, we create our target model. To avoid confusion, we give it a name different from that of the build target itself. This name will be visible in the /Build/Settings/Targets dialog, when users interactively create new targets.
  • on line 6, we describe the set of switches that we want to make configurable graphically. The switches will be organized into two columns, each of which contains one group of switches (a "line" in the XML, for historical reasons).
  • on line 7, we add a title at the top of the left column (column "1").
  • on lines 8 and 12, we add two check boxes. When they are selected, the corresponding switch is added to the command line. On the screenshot below, you can see that "Extended checks" is enabled, and thus the "--extended" switch has been added. These two switches are also in column "1".
  • on line 15, we add a combo box, which is a way to let users choose one option among many. When no switch is given on the command line, this is the equivalent of "--style=gnat". We need to specify that an equal sign must separate the switch from its value.
  • on line 22, we add a spin box, as a way to select a numerical value. No switch on the command line is the equivalent of "-j1".
  • on line 29, we alter the definition of our build target by referencing our newly created target model, instead of the previous "execute", and then remove duplicate information which is already available in the model.

At this point, when the user executes the build target via the toolbar button, we see the following dialog, which lets users modify the command line more conveniently.

The same switches are visible in the /Build/Settings/Targets dialog.

Conclusion

In this post, we saw how to further integrate our tool in GPS, by letting users set the command line switches interactively.

We have now reached the limits of what is doable with just build targets. Later posts will extend the plugin with Python code, to add support for preferences and custom menus. We will also see how to chain multiple actions via workflows (save all files, compile all the code, then run the style checker, for instance).

]]>
Automatic Generation of Frame Conditions for Array Components http://blog.adacore.com/automatic-generation-of-frame-conditions-for-array-components Mon, 21 Nov 2016 05:00:00 +0000 Claire Dross http://blog.adacore.com/automatic-generation-of-frame-conditions-for-array-components

One of the most important challenges for SPARK users is to come up with adequate contracts and annotations, allowing GNATprove to verify the expected properties in a modular way. Among the annotations mandated by the SPARK toolset, the hardest to come up with are probably loop invariants.

As explained in a previous post, loop invariants are assertions that are used as cut points by the GNATprove tool to verify loop statements. In general, it is necessary to state explicitly in a loop invariant all the information computed in the loop statement; otherwise, only the information computed in the last iteration will be available for proof.

A good loop invariant not only needs to describe how parts of variables modified by the loop evolve during the loop execution, but should also explicitly state the preservation of unmodified parts of modified variables. This last part is often forgotten, as it seems obvious to developers.

A previous post explains how GNATprove can automatically infer loop invariants for the preservation of unmodified record components, even when the record is itself nested inside a record or an array. Recently, this generation was improved to also support the simplest cases of partial array updates.

For a loop that updates an array variable (or an array component of a composite variable) by assigning to one (or more) of its indexes, GNATprove can infer preservation of unmodified components in two cases:

  • If an array is assigned at an index which is constant through the loop iterations, all the other components are preserved.
  • If an array is assigned at the loop index, all the following components (or preceding components if the loop is reversed) are preserved.

This can be demonstrated on an example. Let us consider the following loop which swaps lines 5 and 7 of a square matrix of size 10:

type Matrix is array (Positive range <>, Positive range <>) of Natural;
   M : Matrix (1 .. 10, 1 .. 10) := (5      => (others => 1),
                                     7      => (others => 2),
                                     others => (others => 0));
   Tmp : Natural;

   for C in M'Range (2) loop
      pragma Loop_Invariant (for all I in M'First (2) .. C - 1 =>
                               M (5, I) = M'Loop_Entry (7, I)
                             and M (7, I) = M'Loop_Entry (5, I));
      Tmp := M (5, C);
      M (5, C) := M (7, C);
      M (7, C) := Tmp;
   end loop;

   pragma Assert (for all I in 1 .. 10 =>
                    (for all J in 1 .. 10 =>
                       (if I = 5 then M (I, J) = 2
                        elsif I = 7 then M (I, J) = 1
                        else M (I, J) = 0)));

We only describe in the invariant the effect of the computation on the modified part of M, that is, that it has effectively swapped the two lines up to index C. To discharge the following assertion, we also need to know that elements stored at other lines are preserved by the loop. As the first array index in the assignments is constant, GNATprove infers an invariant stating that components M (I, J) are not modified if I is neither 5 nor 7:

 pragma Loop_Invariant (for all I in M'Range (1) =>
                              (if I /= 5 and I /= 7 then
                                (for all J in M'Range (2) =>
                                   M (I, J) = M'Loop_Entry (I, J))));

But this is not enough to verify our example. We also need to know that, at lines 5 and 7, components stored after column C have not been modified yet, or we won't be able to verify the loop invariant itself. Luckily, as the second index of the assignments is the loop index, GNATprove can also infer this invariant, coming up with the following additional invariant:

      pragma Loop_Invariant (for all I in M'Range (1) =>
                              (if I = 5 or I = 7 then
                                (for all J in C .. M'Last (2) =>
                                   M (I, J) = M'Loop_Entry (I, J))));

The above example illustrates a case where GNATprove is able to successfully infer adequate loop invariants, leaving the user with only the most meaningful part to write. However, loop invariant generation for preservation of array components in GNATprove relies on heuristics, which makes it unable to deal with more complex cases. As an example, let us consider a slight variation of the above, which does not iterate directly on the column number, but introduces a shift of 1:

   M : Matrix (1 .. 10, 1 .. 10) := (5      => (others => 1),
                                     7      => (others => 2),
                                     others => (others => 0));
   Tmp : Natural;

   for C in M'First (2) - 1 .. M'Last (2) - 1 loop
      pragma Loop_Invariant (for all I in M'First (2) .. C =>
                               M (5, I) = M'Loop_Entry (7, I)
                             and M (7, I) = M'Loop_Entry (5, I));
      Tmp := M (5, C + 1);
      M (5, C + 1) := M (7, C + 1);
      M (7, C + 1) := Tmp;
   end loop;

   pragma Assert (for all I in 1 .. 10 =>
                    (for all J in 1 .. 10 =>
                       (if I = 5 then M (I, J) = 2
                        elsif I = 7 then M (I, J) = 1
                        else M (I, J) = 0)));

As in the previous case, GNATprove can infer that lines different from 5 and 7 are preserved. Unfortunately, as the second index in the update is now a variable computation, it cannot come up with the second part of the invariant, and reports a failed attempt at verifying the loop invariant. The counter-example provided by GNATprove informs us that the verification could not succeed at the second line of the invariant (at line 5), for index I = 1, when updating column 2 (C = 1). Clearly, we are missing information about the preservation of components following column C at lines 5 and 7. If we add this invariant manually:

 pragma Loop_Invariant (for all I in C + 1 .. M'Last (2) =>
                                 M (5, I) = M'Loop_Entry (5, I)
                               and M (7, I) = M'Loop_Entry (7, I));

the proof is achieved again.

Another case for which GNATprove's heuristics are not yet precise is relational updates. As an example, consider the following loop which increments the diagonal of a matrix:

 M : Matrix (1 .. 10, 1 .. 10) := (others => (others => 0));

   for I in M'Range (1) loop
      pragma Loop_Invariant (for all K in M'First (1) .. I - 1 =>
                               M (K, K) = M'Loop_Entry (K, K) + 1);
      M (I, I) := M (I, I) + 1;
   end loop;

   pragma Assert (for all I in 1 .. 10 =>
                    (for all J in 1 .. 10 =>
                       (if I = J then M (I, J) = 1
                        else M (I, J) = 0)));

Here again, we only specify the interesting part of the invariant, namely, that the elements encountered so far on the diagonal of the matrix have been incremented by one. Since we are updating the matrix exactly at the loop index, we could hope that GNATprove would be able to infer the appropriate invariant. Unfortunately, this is not the case, as the tool reports a failed attempt at verifying the assertion following the loop. Note that the loop invariant itself is proved, which means that GNATprove could infer that, at each iteration, the indexes following the current index had not been modified yet. So, what went wrong? The counter-example provided by GNATprove, here again, can be of some help. It indicates that the verification was not completed when I is 8 and J is 9. Unsurprisingly, it means that the tool was not able to infer an invariant stating that elements outside the diagonal but located on columns smaller than the current index were preserved by the loop. Indeed, this would require handling specifically the case where both indexes of an update are the same value, which GNATprove does not do yet. If the following invariant is added manually to the loop, the verification succeeds:

 pragma Loop_Invariant (for all K in M'First (1) .. I - 1 =>
                                 (for all H in M'First (1) .. I - 1 =>
                                      (if K /= H then
                                            M (K, H) = M'Loop_Entry (K, H))));

GNATprove also handles preserved array components of components of record or array objects, as well as some cases of updates through a slice. See the corresponding section in the user guide for more information.
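
As an illustration of the nested case, here is a hedged sketch (not taken from this post): an array that is a component of a record, updated at the loop index. Given the support described above, GNATprove should generate the frame condition that components after the current index are preserved, which is what the final assertion needs:

   type Buffer is array (1 .. 10) of Natural;
   type Holder is record
      Data  : Buffer;
      Count : Natural;
   end record;

   H : Holder := (Data => (others => 0), Count => 0);

   for I in H.Data'Range loop
      pragma Loop_Invariant
        (for all K in H.Data'First .. I - 1 => H.Data (K) = K);
      H.Data (I) := I;
   end loop;

   pragma Assert (for all K in H.Data'Range => H.Data (K) = K);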
]]>
GNATprove Tips and Tricks: What’s Provable for Real Now? http://blog.adacore.com/gnatprove-tips-and-tricks-whats-provable-for-real-now Thu, 17 Nov 2016 05:00:00 +0000 Yannick Moy http://blog.adacore.com/gnatprove-tips-and-tricks-whats-provable-for-real-now

One year ago, we presented on this blog what was provable about fixed-point and floating-point computations (the two forms of real types in SPARK). I used as examples programs computing approximations of Pi using the Leibniz formula and a refinement of it through the Shanks transformation. GNATprove managed to prove the exact result of the computation in only one case, the simple algorithm implemented with fixed-point values.
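
To give an idea of the kind of code involved, here is a hedged sketch of a Leibniz-style approximation (this is not the code from last year's post; its contracts and exact results are what the proof results below refer to):

   function Pi_Leibniz return Float is
      Result : Float := 0.0;
      Sign   : Float := 1.0;
   begin
      for K in 0 .. 999 loop
         Result := Result + Sign * 4.0 / Float (2 * K + 1);
         Sign   := -Sign;
      end loop;
      return Result;
   end Pi_Leibniz;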

Since then, we have integrated static analysis in SPARK. The CodePeer static analyzer can be called as part of SPARK analysis using the switch --codepeer=on. CodePeer analysis is particularly interesting when analyzing code using floating-point computations, as CodePeer is both fast and precise for proving bounds of floating-point operations. For more details about this feature, see the SPARK User's Guide.

During the last year, we have also completely changed the way floating-point numbers are seen by SMT provers. In particular, we are now using the native support for floating-point numbers in the Z3 prover. So Z3 can reason precisely about floating-point computations, instead of relying on a partial axiomatization like CVC4 and Alt-Ergo still do. Improved support for floating-point computations is also expected to be available in CVC4 and Alt-Ergo in 2017.

Both of these features lead to dramatic changes in provability for code doing fixed-point and floating-point computations. On the example codes that I presented last year, GNATprove manages to prove all the results of the four computations (with fixed-points or floating-points, with Leibniz or Shanks) when CodePeer is used (without Z3). GNATprove manages to prove three of the four computations when Z3 is used (without CodePeer). The one that is not proved is the computation of the Shanks algorithm implemented using fixed-point values. In fact it is not provable with Z3 because this algorithm involves a division between fixed-point values that needs to be rounded, and the Ada standard specifies that either the fixed-point value above or the fixed-point value below is a suitable rounding. In SPARK, we strive to stay independent of such compiler-specific implementation choices, so we don't specify which value is chosen between the two different results. The reason why CodePeer manages to prove it is because CodePeer follows the choices made by GNAT in this case, as documented in the SPARK User's Guide. In addition, using either CodePeer or Z3 is sufficient to prove all the other run-time checks in the example codes (range checks, overflow checks and divisions by zero checks).

These features were included in the preview release SPARK 17.0, and initial feedback from customers shows that in some cases they do improve the results of proof on code doing fixed-point and floating-point computations. In some other cases, we have witnessed regressions in proof due to the fact that CVC4 and Alt-Ergo now prove fewer checks. We are actively working to resolve these regressions, and we think that the new combination of analyses will lead to large improvements for proof on such codebases in the future.

]]>
Integrate new tools in GPS http://blog.adacore.com/integrate-new-tools-in-gps Tue, 15 Nov 2016 15:10:03 +0000 Emmanuel Briot http://blog.adacore.com/integrate-new-tools-in-gps

Integrate external tools in GPS

The GNAT Programming Studio is a very flexible platform. Its goal is to provide a graphical user interface (GUI) on top of existing command line tools. Out of the box, it integrates support for compilers (via makefiles or gprbuild) for various languages (Ada, C, C++, SPARK, Python,...), verification tools (like codepeer), coverage analysis for gcov and gnatcoverage, version control systems (git, subversion,...), models (via QGEN) and more.

But it can't possibly integrate all the tools you are using daily.

This blog post describes how such tools can be integrated in GPS, from basic integration to more advanced Python-based plugins. It is the first in a series that will get you started in writing plug-ins for GPS.

Basic integration: build targets

Every time GPS spawns an external tool, it does so through what is called a Build Target. Build targets can be edited via the /Build/Settings/Targets menu (see also the online documentation)

Build Targets dialog

On the left of the dialog, you will see a list of all existing build targets, organized into categories. Clicking on any of these shows, in the right panel, the command line used to execute that target, including the name of the tool, the switches,...

The top of the right panel describes how this build target integrates in GPS. For instance, the Display Target section tells GPS whether to add a button to the main toolbar, or an entry in the /Build menu or perhaps in some of the contextual menus.

Selecting any of the resulting toolbar buttons or menus will either execute the action directly, no question asked (if the Launch Mode is set to "Manually with no dialog"), or display a dialog to let users modify the switches for that one specific run (if the Launch Mode is set to "Manually with Dialog"). It is even possible to execute that build target every time the user saves a file, which is useful if you want to run some kind of checks.

Creating a new target

To create a new build target, click on the [+] button on the left. This pops up a small dialog asking for the name of the target (as displayed in GPS), a target model (which presets a number of things for the target -- I recommend using "execute -- Run an executable" as a starting point), and finally the category in which the target is displayed.


Create new target

Click on [OK].

In the right panel, you can then edit the command line to run, where the target should be displayed, and how it is run.


Edit target properties

The command line supports a number of macros, like %F, that will be replaced with information from the current context when the target is run. For instance, using "%F" will insert the full path name for the current file. Hover the mouse on the command line field to see all other existing macros.

Press [OK] when you are done modifying all the settings.

(On some versions of GPS, you need to restart GPS before you can execute the build target, or macro expansion will not be done properly.)

Target added to toolbar
Target added to menu

Running the target

When you select either the toolbar button or the menu, GPS will open a new console and show the output of your tool. In this example, we have created a simple program that outputs a dummy error on the first line of the file passed as argument.
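
For reference, such a checker can be as simple as the following hedged sketch; the exact message text is an assumption, formatted to look like compiler output (file:line:column: message):

   with Ada.Command_Line; use Ada.Command_Line;
   with Ada.Text_IO;      use Ada.Text_IO;

   procedure My_Style_Checker is
   begin
      --  Emit one dummy message for the file passed as argument
      if Argument_Count >= 1 then
         Put_Line (Argument (1) & ":1:1: dummy error on the first line");
      end if;
   end My_Style_Checker;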

Running the target

Advanced integration via target models

As you saw in the previous screenshot, the result of running our custom command is that its output is sent to a new window named "Run". But nothing is clickable. In our example, we would like users to be able to click on the location and have GPS display the source location.

This is not doable just via the GUI, so we'll need to write a small plugin. Let's remove the target we created earlier, by once again opening the /Build/Settings/Targets dialog, selecting our new target, and clicking on [-].

Let's then create a new file

  • Windows: %USER_PROFILE%\.gps\plug-ins\custom_target.py
  • Linux and Mac: ~/.gps/plug-ins/custom_target.py

The initial content of this file is:

import GPS
GPS.parse_xml("""
    <target model="execute" category="File" name="My Style Checker">
        <in-toolbar>TRUE</in-toolbar>
        <in-menu>TRUE</in-menu>
        <launch-mode>MANUALLY</launch-mode>
        <iconname>gps-custom-build-symbolic</iconname>
        <command-line>
            <arg>my_style_checker</arg>
            <arg>%F</arg>
        </command-line>
        <output-parsers>
           output_chopper
           utf_converter
           location_parser
           console_writer
           end_of_build
        </output-parsers>
   </target>
 """)

This file is very similar to what the GUI dialog we used in the first part did (the GUI created a file ~/.gps/targets.xml or %USER_PROFILE%\.gps\targets.xml). See the online documentation.

One major difference though is the list of output parsers on line 12. They tell GPS what should be done with whatever the tool outputs. In our case:

  1. we make sure that the output is not split in the middle of lines (via "output_chopper");
  2. we convert the output to UTF-8 (via "utf_converter");
  3. more importantly for us, we then ask GPS, with "location_parser" to detect error messages and create entries in the Locations view for them, so that users can click them;
  4. we also want to see the full output of the tool in its own console ("console_writer");
  5. finally, we let GPS perform various cleanups with "end_of_build". Do not forget this last one, since it is also responsible for expanding the %F macro.

The full list of predefined output parsers can be found in the online documentation.

If we then restart GPS and execute our build target, we now get two consoles showing the output of the tool, as happens for compilers for instance.

Locations view

Conclusion

With the little plugin we wrote, our tool is now much better integrated in GPS. We have:

  • menus to spawn the tool, with arguments that depend on the current context
  • the tool's output displayed in a new console
  • error messages that users can click to jump to the corresponding source locations

In later posts, we will show how to make this more configurable, via custom GPS menus and preferences. We will also explain what workflows are and how they can be used to chain multiple commands.

]]>
Driving a 3D Lunar Lander Model with ARM and Ada http://blog.adacore.com/3d-lunar-lander-model Thu, 10 Nov 2016 14:00:00 +0000 Pat Rogers http://blog.adacore.com/3d-lunar-lander-model

One of the interesting aspects of developing software for a bare-board target is that displaying complex application-created information typically requires more than the target board can handle. Although some boards do have amazing graphics capabilities, in some cases you need to have the application on the target interact with applications on the host. In particular, this can be due to the existence of special applications that run only (or already) on the host.

For example, I recently created an Ada driver for a 9-DOF inertial measurement unit (IMU) purchased from AdaFruit. This IMU device takes accelerometer, magnetometer, and gyroscope data inputs and produces fused values indicating the absolute orientation of the sensor in 3D space. It is sensor-fusion on a chip (strictly, a System in Package, or SiP), obviating the need to write your own sensor fusion code. You simply configure the sensor to provide the data in the format desired and then read the Euler angles, quaternions, or vectors at the rate you require.  I plan to use this orientation data in a small robot I am building that uses an STM32 Discovery board for the control system.

The device is the "AdaFruit 9-DOF Absolute Orientation IMU Fusion Breakout - BNO055" that puts the Bosch BNO055 sensor chip on its own breakout board, with 3.3V regulator, logic level shifting for the I2C pins, and a Cortex-M0 to do the actual sensor data fusion.

The breakout board containing the BNO055 sensor and Cortex-M0 processor, courtesy AdaFruit
The bottom of the BNO055 breakout board and a quarter coin, for comparison, courtesy AdaFruit

See https://www.adafruit.com/produ... for further product details.

So I have easy access to fused 3D orientation data but I don't know whether those data are correct -- i.e., whether my driver is really working. I could just display the values of the three axes on the target's on-board LCD screen but that is difficult to visualize in general. 

Again, AdaFruit provides a much more satisfying approach. They have a demonstration for their IMU breakout board that uses data from the sensor to drive a 3D object modeled on the host computer. As the breakout board is rotated, the modeled object rotates as well, providing instant (and much more fun) indication of driver correctness.

The current version of the demonstration is described in detail here, using a web-based app to display the model.  I got an earlier version that uses the "Processing" application to display a model, and instead of using their 3D model of a cat I use a model of the Apollo Lunar Excursion Module (LEM). 

The LEM model is available from NASA, but must be converted to the "obj" model data format supported by the Processing app.  

The Processing app is available here for free. Once downloaded and installed on the host, Processing can execute programs -- "sketches" -- that can do interesting things, including displaying 3D models.  The AdaFruit demo provided a sketch for displaying their cat model. I changed the hard-coded file name to specify the LEM model and changed the relative size of the model, but that was about all that was changed.

The 3D LEM model displayed by the Processing app

The sketch gets the orientation data from a serial port, and since the sensor breakout board is connected to an STM32 Discovery board, we need that board to communicate over one of the on-board USART ports. The Ada Drivers Library includes all that is necessary for doing that, so the only issue is how to connect the board's USART port to a serial port on the host. 
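
As a rough illustration (the exact line format expected by the Processing sketch is an assumption here, not something taken from the demo), the data sent over the USART is simply the fused orientation turned into text, along these lines:

   type Euler_Angles is record
      Heading, Roll, Pitch : Float;
   end record;

   function To_Serial_Line (Sample : Euler_Angles) return String is
     ("Orientation:"
      & Float'Image (Sample.Heading) & ","
      & Float'Image (Sample.Roll)    & ","
      & Float'Image (Sample.Pitch));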

For that purpose I use a USB cable specifically designed to appear as a serial port on the host (e.g., a COM port on Windows). These cables are available from many vendors, including Mouser:

  • Mouser Part No:  895-TTL-232R-5V
  • Manufacturer Part No:  TTL-232R-5V
  • Manufacturer:   FTDI

but note that the above is a 5-volt cable in case that is an issue.  There are 3V versions available.

The end of the cable is a female header, described in the datasheet (DS_TTL-232R_CABLES-217672.pdf). Header pin 4 on the cable is TXD, the transmit data output. Header pin 5 on the cable is RXD, the receive data input. The on-board software I wrote that sends the sensor data over the port uses specific GPIO pins for the serial connection, so I connected the cable header pins to the STMicro board's GPIO pins as follows:

  • header pin 1, the black wire's header slot, to a ground pin on the board
  • header pin 4, the orange wire's header slot, to PB7
  • header pin 5, the yellow wire's header slot, to PB6

On the host, just plug in the USB-to-serial cable. Once the cable is connected it will appear like a host serial port and can be selected within the Processing app displaying the model.  Apply power to the board and the app will start sending the orientation data. 

The breakout board, STM32F429 Discovery board, and USB serial cable connections are shown in the following image. (Note that I connected a green wire to the USB cable's orange header wire because I didn't have an orange connector wire available.)

When we rotate the breakout board the LEM model will rotate accordingly, as shown in the following video:

To display the LEM model, double-click on the "lander.pde" file to invoke the Processing app on that sketch file. Then press the Run button in the Processing app window.  That will bring up another window showing the LEM model. 

In the second window showing the lander, there is a pull-down at the upper left for the serial port selection.  Select the port corresponding to the USB cable attached to the STM32 board's serial port. That selection will be recorded in a file named "serialconfig.txt" located in the same directory as the model so that you don't have to select it again, unless it changes for some reason.

Note that in that second window there is a check-box labeled "Print serial data."  If you enable that option you will see the IMU data coming from the breakout board via the serial port, displayed in the main Processing app window. That data includes the current IMU calibration states so that you can calibrate the IMU (by moving the IMU board slowly along three axes). When all the calibration values are "3" the IMU is fully calibrated, but you don't need to wait for that -- you can start rotating the IMU board as soon as the model window appears.

The Ada Drivers Library is available here.

The Ada source code, the 3D LEM model, and the Processing sketch file for this example are available here.

]]>
Research Corner - SPARK on Lunar IceCube Micro Satellite http://blog.adacore.com/research-corner-spark-on-lunar-icecube-micro-satellite Wed, 09 Nov 2016 05:00:00 +0000 Yannick Moy http://blog.adacore.com/research-corner-spark-on-lunar-icecube-micro-satellite

This group from Vermont Technical College is famous for being the only one whose CubeSat micro satellite (out of 12 university CubeSats) remained functional for two years, while the second-best team only achieved four months of functioning time. They already used SPARK for this first project, and credited the use of SPARK for being essential in getting perfect software on board. You may also have seen their CubeSat on the cover of the SPARK 2014 book, which is not a surprise as Peter Chapin from this group is one of the authors!

Thanks to this past success, NASA has selected this group to develop the flight software for a much bigger system, a micro satellite that is 6 times bigger and which involves 6 partners, whose mission is to map water vapor and ice on the moon. In this project too they will use SPARK to help them get perfect software. You can see the details in their paper or their presentation at HILT 2016, just a few weeks ago. Interestingly, one of the hurdles that they mention in this paper, having to do with the restrictions of the Ravenscar profile of Ada (XDR encoding/decoding of messages), has recently been lifted in SPARK: SPARK now supports the Extended Ravenscar profile whose goal is precisely to allow the kind of code that will be useful for the Lunar IceCube mission.

If you are interested in following this project, look at their website. The Lunar IceCube will be launched in 2018.

]]>
Verifying Tasking in Extended, Relaxed Style http://blog.adacore.com/verifying-tasking-in-extended-relaxed-style Mon, 07 Nov 2016 05:00:00 +0000 Piotr Trojanek http://blog.adacore.com/verifying-tasking-in-extended-relaxed-style

Tasking was one of the big features introduced in the previous release of SPARK 2014. However, GNATprove only supported tasking-related constructs allowed by the Ravenscar profile. Now it also supports the more relaxed GNAT Extended Ravenscar profile. The GNAT Reference Manual already documents how the new profile is different from the old one and why you might like it. Here we explain how this new profile might affect your SPARK code.

First, in the GNAT Extended Ravenscar profile you are no longer restricted to one entry per protected type. In particular, you can now directly implement message stores with multiple consumers: each consumer can wait for a specific kind of message by blocking on its own entry.

Expressions in entry barriers are no longer restricted to simple Boolean variables. Now they might involve simple protected variables, literals, and predefined relational and logical operators (e.g. "<", "/=", "and"). This is more relaxed, but the new restrictions still guarantee that the evaluation of a barrier expression will raise no runtime errors and will not have side effects. (That's why they are called "pure.") Tip: if strict Ravenscar forced you to have a barrier variable "Non_Empty", you can now rename it to a more natural "Empty" and write the barrier expression as "not Empty."
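
Here is a minimal sketch (with made-up names, not code from an actual project) combining both points: a message store with one entry per consumer, and barriers using the "not" operator as in the tip above:

   protected Message_Store is
      procedure Post_Command (Value : Integer);
      procedure Post_Status  (Value : Integer);
      entry Wait_Command (Value : out Integer);
      entry Wait_Status  (Value : out Integer);
   private
      Command, Status             : Integer := 0;
      Command_Empty, Status_Empty : Boolean := True;
   end Message_Store;

   protected body Message_Store is
      procedure Post_Command (Value : Integer) is
      begin
         Command       := Value;
         Command_Empty := False;
      end Post_Command;

      procedure Post_Status (Value : Integer) is
      begin
         Status       := Value;
         Status_Empty := False;
      end Post_Status;

      --  Two entries in the same protected object, with "pure" barriers

      entry Wait_Command (Value : out Integer) when not Command_Empty is
      begin
         Value         := Command;
         Command_Empty := True;
      end Wait_Command;

      entry Wait_Status (Value : out Integer) when not Status_Empty is
      begin
         Value        := Status;
         Status_Empty := True;
      end Wait_Status;
   end Message_Store;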

Relative delay statements like "delay 1.0" are now allowed. They are much less cumbersome than the strict Ravenscar pattern:

   declare
      Now : constant Ada.Real_Time.Time := Ada.Real_Time.Clock;
   begin
      delay until Now + Seconds (1);
   end;

Finally, you are now free to use the Ada.Calendar package, which (as suggested in the GNAT Reference Manual) might be handy for log messages. The relaxed profile also allows "delay until" statements where the expression is of the type Ada.Calendar.Time, not Ada.Real_Time.Time. But here is a detail: both Ada.Calendar and Ada.Real_Time have their own abstract states called Clock_Time. This is because the Ada RM says that the time bases for Ada.Calendar and Ada.Real_Time are not necessarily the same (and SPARK conservatively assumes they are not). Of course, you may ask GNATprove to confirm that you are using the right clock: just put the expected Clock_Time state in your Global contract.
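
For example, both forms below are rejected by strict Ravenscar but accepted by the extended profile (a minimal sketch, independent of any particular runtime):

   with Ada.Calendar; use Ada.Calendar;

   procedure Delay_Examples is
      Deadline : constant Time := Clock + 1.0;
   begin
      delay 0.5;             --  relative delay
      delay until Deadline;  --  "delay until" on an Ada.Calendar.Time value
   end Delay_Examples;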

For details on how to enable the new profile and for a general introduction to concurrency in SPARK, please visit the User's Guide.

]]>
Simplifying our product versioning http://blog.adacore.com/simplifying-our-product-versioning Fri, 04 Nov 2016 09:35:00 +0000 Olivier Ramonat http://blog.adacore.com/simplifying-our-product-versioning

Looking at the list of product versions that were expected for 2017 it became clear that we had to review the way we were handling product versioning. Was it so useful to have all these different versions: 1.9.0, 4.9.0, 2.11.0, 2.4.0, 7.5.0, 3.9.0, 17.0.0, 6.3.0, 1.7.0, 3.4.0, 1.4.0, 1.5.0, 3.2.0, 1.2.0, 1.9, 2.13.0, and 2.2.0? Was it worth the cost? Did we really know which product had its version bumped to 3.9.0?

AdaCore products are released on a time-based cycle, with the preview release in October, a major release in February, and a corrective version in July. One sensible solution is to choose a date-based versioning scheme using the year for the major version number, the minor version for our release increment, and no micro version.

Starting with the 17.0 preview release all AdaCore products will have a unified version number.

By the way, can you guess what products all these version numbers above refer to?

]]>
SPARK 2014 Rationale: Support for Type Invariants http://blog.adacore.com/spark-2014-rationale-support-for-type-invariants Wed, 12 Oct 2016 04:00:00 +0000 Claire Dross http://blog.adacore.com/spark-2014-rationale-support-for-type-invariants

Type invariants are used to model properties that should always hold for users of a data type but can be broken inside the data type implementation. Type invariants are part of Ada 2012 but were not supported in SPARK until SPARK Pro 17.

To demonstrate how they can be used, let us consider an implementation of binary trees as an example. As GNATprove does not support access types, we model them using indexes inside an array.

package Binary_Trees is
   type Index_Type is range 1 .. Max;
   subtype Extended_Index_Type is Index_Type'Base range 0 .. Max;
   type Position_Type is (Left, Right, Top);

   type Tree is private;

private

   type Cell is record
      Left, Right, Parent : Extended_Index_Type := 0;
      Position            : Position_Type := Top;
   end record;

   type Cell_Array is array (Index_Type) of Cell;

   type Tree is record
      Top : Extended_Index_Type := 0;
      C   : Cell_Array;
   end record;
end Binary_Trees;

Each cell contains the index of its right and left child, as well as the index of its parent. This index is 0 if the cell has no left or right child or no parent. It also contains a position which can be Top for the root of the tree and Left or Right for the other nodes, depending on whether they are left or right children in the tree structure. A tree contains an array of cells as well as the index of the tree root.

There are properties that are imposed on the record fields by the tree structure. These properties are required for the record to represent a valid binary tree structure. For example, the root must have position Top and no parent:

 (if Top /= 0 then C (Top).Parent = 0
       and then C (Top).Position = Top)

the left child of a node I must have position Left and parent I:

    (for all I in Index_Type =>
         (if C (I).Left /= 0
          then C (C (I).Left).Position = Left
              and then C (C (I).Left).Parent = I))

All these properties represent an invariant over the structure. They can be grouped together in an expression function which can then be attached to the full view of Tree using a Type_Invariant aspect:

 type Tree is record
      Top : Extended_Index_Type := 0;
      C   : Cell_Array;
   end record
     with Type_Invariant => Tree_Structure (Tree);

   function Tree_Structure (T : Tree) return Boolean is
     ((if T.Top /= 0 then T.C (T.Top).Parent = 0
       and then T.C (T.Top).Position = Top)
      and then
        (for all I in Index_Type =>
             (if T.C (I).Left /= 0
              then T.C (T.C (I).Left).Position = Left
                and then T.C (T.C (I).Left).Parent = I))
      and then
        ...

In spirit, this means that Tree_Structure must always return True on objects of type Tree visible from outside Binary_Trees. To ensure this property, GNATprove enforces restrictions on subprograms working on trees depending on where they are declared. If the subprogram is private, like Tree_Structure, no invariant checks are required for its parameters, neither on entry nor on exit of the subprogram. In the invariant expression, only private functions should be used so as to avoid any circularity.

If the subprogram is declared outside of Binary_Trees or if it is declared in the public part of Binary_Trees, then the invariant must hold for its inputs on entry to the subprogram and for its outputs on exit from the subprogram. For example, let us consider the Insert procedure which inserts a new node into a tree. Let us assume this is a boundary subprogram for Tree, that is, it is declared in the public part of the specification of the package Binary_Trees in which Tree is declared:

 procedure Insert (T : in out Tree; I : Index_Type; D : Direction);

The invariant is required to hold on input T when entering Insert. GNATprove will check Tree's invariant every time Insert is called inside Binary_Trees to make sure this is verified. In the same way, verification conditions are generated by GNATprove to ensure that the invariant holds for T at the end of Insert. In effect, it is as if we had written:

 procedure Insert (T : in out Tree; I : Index_Type; D : Direction) with
     Pre  => Tree_Structure (T),
     Post => Tree_Structure (T);

Unlike type predicates, type invariants can be broken temporarily in the body of Insert, as long as they are restored at the end of the subprogram:

procedure Insert (T : in out Tree; I : Index_Type; D : Direction) is
      M : Model_Type := Model (T) with Ghost;
      J : Index_Type;

   begin
      --  Find an empty slot in the underlying array

      Find_Empty_Slot (T, J);

      --  Plug it as the D child of I

      T.C (J).Position := D;
      T.C (J).Parent := I;

      --  The invariant of T is broken, J is not the child of I

      if D = Left then
         T.C (I).Left := J;
      else
         T.C (I).Right := J;
      end if;

      --  Tree_Structure (T) holds again
   end Insert;

Outside of Binary_Trees, the invariant of Tree is never checked. Indeed, the rules of SPARK are enough to ensure that no invariant-breaking value can leak out of the implementation of Binary_Trees. This makes it possible to split considerations between multiple layers. For example, we can then reuse our binary trees to implement search trees. We do not need to prove the tree structure invariant anymore, but can simply rely on it to prove the remaining properties:

package Search_Trees with SPARK_Mode is
   type Search_Tree is private;

   function Mem (T : Search_Tree; V : Natural) return Boolean;

   procedure Insert  (T : in out Search_Tree; V : Natural; I : out Extended_Index_Type);

private

   type Value_Array is array (Index_Type) of Natural;

   type Search_Tree is record
      Struct : Binary_Trees.Tree;
      Values : Value_Array;
   end record
     with Type_Invariant => Ordered_Leafs (Search_Tree);

   function Ordered_Leafs (T : Search_Tree) return Boolean with Ghost;
end Search_Trees;

When calling Binary_Trees.Insert in the implementation of Search_Trees.Insert, GNATprove does not need to check that the invariant of T.Struct holds, as it is enforced at the boundary of Binary_Trees:

  procedure Insert (T : in out Search_Tree; V : Natural; I : out Extended_Index_Type) is
   begin
      if Top (T.Struct) = 0 then
         Init (T.Struct);
         I := Top (T.Struct);
         T.Values (I) := V;
         return;
      end if;

      declare
         Current  : Extended_Index_Type := Top (T.Struct);
         Previous : Extended_Index_Type := 0;
         D        : Direction := Left;
      begin
         while Current /= 0 loop
            Previous := Current;
            if V = T.Values (Previous) then
               I := 0;
               return;
            elsif V < T.Values (Previous) then
               D := Left;
            else
               D := Right;
            end if;
            Current := Peek (T.Struct, Previous, D);
         end loop;

         --  We have found the leaf where we want to insert V

         Insert (T.Struct, Previous, D);
         --  No invariant check
         --  The tree structure is preserved by Insert

         I := Peek (T.Struct, Previous, D);
         T.Values (I) := V;

         --  Check that the leaf ordering is preserved
      end;
   end Insert;

Note that, for now, GNATprove supports type invariants neither on tagged types nor on types declared in nested or child units. Therefore, we can neither change Search_Tree to derive from Tree nor move Binary_Trees as a nested or child package of Search_Trees.

]]>
GNAT On macOS Sierra http://blog.adacore.com/gnat-on-macos-sierra Tue, 11 Oct 2016 11:00:00 +0000 Emmanuel Briot http://blog.adacore.com/gnat-on-macos-sierra

Running GNAT On macOS Sierra

It is this time of the year again. Apple has released a new version of their operating system, now named macOS Sierra.

We started running some tests on that platform, and although we do not have full results yet, things are looking good.

The compiler and most tools work as expected.

However, Apple has once again further restricted what tools can do on the system (which should certainly result in a safer system). The major impact is on our command line debugger, gdb, since this is a tool whose whole purpose is to view and modify running processes on the system. Not something that macOS likes very much in general, although Apple's own debugger of course works flawlessly.

In previous versions of OSX, it was enough for us to codesign gdb. This no longer works, and so far there doesn't seem to be anything we can do on our side (other tools in the Apple ecosystem have similar unresolved issues).

The solutions differ on Sierra 10.12.0 and Sierra 10.12.1.

On Sierra 10.12.0

The solution will require you to slightly lower the security of your system by partially disabling SIP (System Integrity Protection). This can be done as follows:

  1. Reboot your system
  2. Keep command+R pressed until the Apple logo appears on the screen.
  3. Select the menu Utilities/Terminal
  4. Type "csrutil enable --without debug" in the terminal
  5. Finally, reboot your machine again

Note that disabling this will lower the security of your system, so doing the above should really be your decision.

Another impact of this change is that the DYLD_LIBRARY_PATH variable is no longer propagated when spawning new processes via the shell. This variable is used by the dynamic linker to find dynamic libraries. It takes precedence over the search path coded in the executables, so it is considered unsafe by the OS. As a result, macOS by default unsets the variable so that the executable you spawn uses its own libraries. We recommend using DYLD_FALLBACK_LIBRARY_PATH instead, which comes after the application's library search path, in case some libraries are still not found.

On Sierra 10.12.1

The solution requires a patched version of GDB, so either a recent wavefront of GNAT Pro (date >= 2016-11-08) or a fresh snapshot from FSF sources (the patch was committed today 2016-11-09). 

In addition to that, you will need to add the line 'set startup-with-shell off' at the start of your GDB session, which can be done once and for all by copying it to the .gdbinit file in your $HOME directory. The benefit of putting it in .gdbinit is that it will work with IDEs that launch GDB for you on the program (like GPS, GNATbench or Emacs).


For reference, see Apple's forum.

We will edit this post as we discover more things about macOS Sierra.

Edit: The solution for Sierra 10.12.1 still requires user action to avoid spawning a shell from GDB. We haven't found a better solution yet; we will update the post when there is one.

]]>
Verified, Trustworthy Code with SPARK and Frama-C http://blog.adacore.com/verified-trustworthy-code-with-spark-and-frama-c Wed, 05 Oct 2016 20:06:00 +0000 Yannick Moy http://blog.adacore.com/verified-trustworthy-code-with-spark-and-frama-c

Last week, a few of us at AdaCore attended a one-day workshop organized at Thales Research and Technologies, around the topic of "Verified, trustworthy code - formal verification of software". Attendees from many different branches of Thales (avionics, railway, security, networks) were given an overview of the state-of-the-practice in formal verification of software, focused on two technologies: the SPARK technology that we have been developing at AdaCore, in conjunction with our partner Altran, for programs in Ada; and the Frama-C technology developed at CEA research labs for programs in C.

The two technologies are quite close in terms of methodology and scientific background. My colleague Johannes Kanig recently published a document comparing SPARK with MISRA C and Frama-C, which gives a very good overview of both the similarities and the differences. The Frama-C open source technology is also at the core of the analysis technology developed by TrustInSoft.

The most interesting part of the day was the feedback given by three operational teams who have experimented for a few months with either SPARK (two teams) or Frama-C (one team). The lessons learned by first-time adopters of such technologies are quite valuable:

  1. The fact that the specification language is the same as the programming language (SPARK) or very close to it (Frama-C) makes it easy for developers to write specifications in a few days.
  2. The first application of such technologies in a team should be targeting a not-too-hard objective (for example absence of run-time errors rather than full functional correctness) on a small part of a codebase (for example a few units/files comprising a few thousand lines of code rather than a complete codebase with hundreds of thousands of lines of code).
  3. Expertise is key to getting started. In particular, there should be an easy way to interact with the tool development/support team. This is all the more important for the first uses, before a process and internal expertise are in place.
  4. While specifying properties is easy, proving them automatically may be hard. It is important here to be able to get feedback from the tool (like counterexamples) when the proof fails, and to have a process in place to investigate proof failures.
  5. Last but not least, a detailed process needs to be described, so that future applications of formal verification (in the same team, or other teams in the same organization) can reliably get the expected results within the expected budget (effort, schedule).


The last point is particularly important to justify the use of new technology in a project. As part of the two experiments with SPARK at Thales, we defined guidelines for the adoption of SPARK in Ada projects, which follow a scale of four levels:

  1. Stone level - valid SPARK - The goal of reaching this level is to identify as much code as possible as belonging to the SPARK subset. The stricter SPARK rules are enforced on a possibly large part of the program, which leads to better quality and maintainability.

  2. Bronze level - initialization and correct data flow - The goal of reaching this level is to make sure no uninitialized data can ever be read and, optionally, to prevent unintended access to global variables. The SPARK code is guaranteed to be free from a number of defects: no reads of uninitialized variables, no possible interference between parameters and global variables, no unintended access to global variables.

  3. Silver level - absence of run-time errors (AoRTE) - The goal of this level is to ensure that the program does not raise an exception at run time. This ensures in particular that the control flow of the program cannot be circumvented by exploiting a buffer overflow, possibly as a consequence of an integer overflow. This also ensures that the program cannot crash or behave erratically when compiled without support for run-time exceptions (compiler switch -gnatp), after an operation that would have triggered a run-time exception. 

  4. Gold level - proof of key integrity properties - The goal of the gold level is to ensure key integrity properties such as maintaining critical data invariants throughout execution, and ensuring that transitions between states follow a specified safety automaton. Together with the silver level, these goals ensure program integrity, that is, the program keeps running within safe boundaries: the control flow of the program is correctly programmed and cannot be circumvented through run-time errors, and data cannot be corrupted.

During the two experiments, we managed to reach bronze, silver and gold levels on various parts of the projects, which allowed both detecting some errors and proving properties about program behavior. A really nice outcome for a few weeks of work.

]]>
Debugger improvements in GPS 17 http://blog.adacore.com/debugger-improvements-in-gps-17 Wed, 05 Oct 2016 09:04:12 +0000 Emmanuel Briot http://blog.adacore.com/debugger-improvements-in-gps-17

The GNAT Programming Studio started as a tool named GVD (graphical visual debugger), which was just a GUI for the gdb command line debugger. Editors were showing the code, but not allowing editing.

Of course, it then evolved (a lot) and became a full-fledged integrated environment. But support for debugging activities is still a very important part of the experience. We had accumulated a number of enhancement requests, which have finally been implemented in this year's release.

Here is a quick summary.

Set breakpoints at any time

Historically, users had to initialize the debugger (i.e. enter a special mode in GPS) before they could start setting breakpoints. That's because breakpoints were only managed by the debugger itself. 

In practice though, it is generally when you are reading code that you decide that a breakpoint might be a good idea. Having to start the debugger first breaks the flow of ideas.

So it is now possible to set a breakpoint at any time in GPS, whether the debugger is running or not. To do this, simply click on the line number. GPS will display it with a blue background (and show a tooltip explaining what the color means if you hover over it). In the following example, we have clicked on line 42.

In previous versions, GPS was displaying small grey or red dots next to lines with executable code, and you had to click on those dots to set the breakpoints. We have now removed the display of these dots for several reasons: they were computed via relatively expensive calls to the debugger (and thus required that a debugger be running), they were imprecise, depending on the optimization level used to compile the code, and they were taking up valuable screen real estate.

New debugger perspective

Let's now start the debugger. This is done, as before, by selecting the menu /Debug/Initialize/ and the executable to run. This step is still necessary to let GPS know that the debugger should be started, and that you might want to do additional setup before actually running the debugger (like connecting to a board, attaching to a running process,...) We are working on simplifying that step as well, although this will not be part of the release this year.


If you look at the screenshot, you might notice that the layout of the window (what GPS calls the Perspective) is different. However, if you have used GPS before, you will not see the new layout, since GPS always tries to restore what you had set up previously. The solution, currently, is to remove the file perspectives6.xml in HOME/.gps/ or %USER_PROFILE%\.gps, depending on your platform, and then restart GPS.

Here are the changes we have done:

  • Removed the Data window, which was used to display the variables as boxes in a graphical browser. This view is still available, but has been replaced in the default perspective by the Debugger Variables view (see below)
  • Display the Breakpoints view by default

These are small changes, but they might bring attention to some of the views we think are very useful when debugging.

The Debugger Variables view

A new view has been added to display the value of variables as a tree. It comes in addition to the existing Debugger Data. Here are two screenshots showing the same information in both views.

To display information in the Debugger Variables, the simplest way is to right-click on the variable in the source editor, then select the Debug/Tree Display contextual menu. This will add the variable to the list of those displayed. You can then expand fields as you want by clicking on the arrows.

When the variable is an access (or a pointer), expanding its value will show the value it points to.

If you look at the buttons in the toolbar of the Debugger Variables, you will find some that enable you to display the value of any expression (the + button), or the list of all local variables for the current subprogram, or the value of the CPU registers.

Breakpoints view

We have redone the Breakpoints view, which displays the list of breakpoints. It used to be a very large window, but we have now simplified it significantly so that it can be kept visible at all times. And since breakpoints can indeed be set at any time, as we mentioned before, you can also display the Breakpoints view when the debugger has not been started.

The view lists each known breakpoint, with its location. You can simply double-click on any of the breakpoints to open a source editor showing its location.

If you unselect the check box, the breakpoint is disabled, and gdb will no longer stop at that location.

If you press the mouse for a short while on the breakpoint (or alternatively click on the gears icon in the toolbar), GPS opens a new dialog that lets you edit some advanced properties of the breakpoint, like the conditions that should be met for the debugger to stop, or how many times the debugger should ignore this breakpoint before it actually stops.

Performance

In the default debug perspective, GPS always displays the Call Stack window which shows, whenever the debugger stops, the current subprogram and where it was called from. This information is of course queried from gdb itself, and depending on the settings it might take a while to compute. For instance, displaying the value of the parameters can sometimes take a few seconds for large callstacks on some systems. This leads to very long delays whenever the debugger stops.

We have optimized things a bit here. If the Call Stack is not configured to show the value of the parameters, then GPS makes sure that gdb doesn't spend any time computing it.

Scripting

Like all parts of GPS, the debugger can be fully controlled via python scripts. This lets you run your own commands whenever the debugger stops for instance, or automatically display the value of some variables, or anything else you could imagine in your context.

In this release, we have added a number of new python commands to make interfacing simpler. For instance, there is now a value_of function to query the value of a variable (while cleaning up extra information output by gdb), or a breakpoints command to efficiently retrieve the current list of breakpoints (and benefit from GPS's own cache).

]]>
The Most Obscure Arithmetic Run-Time Error Contest http://blog.adacore.com/the-most-obscure-arithmetic-run-time-error-contest Thu, 22 Sep 2016 04:00:00 +0000 Yannick Moy http://blog.adacore.com/the-most-obscure-arithmetic-run-time-error-contest

Something that many developers do not realize is the number of run-time checks that occur in innocent looking arithmetic expressions. Of course, everyone knows about overflow checks and range checks (although many people confuse them) and division by zero. After all, these are typical errors that do show up in programs, so programmers are aware that they should keep an eye on these. Or do they?

Let's start with the typical examples of overflows and division by zero:

 Z := X + Y;
 Z := X / Y;

Here, X, Y and Z might be integers, fixed-point numbers or floating-point numbers. If X and Y are too big, X + Y won't fit into a machine integer or floating-point, hence the overflow error. If Y is zero, X / Y has no meaningful value, hence the division by zero error.

Let's look now at a more exotic overflow, which happens when you negate a signed integer:

 Z := -X;

In the specific case where X is the minimal machine integer of this size (say, -2147483648 for a 32 bits signed integer), negating it results in a value that's one more than the maximal integer of that size (2147483647 for a 32 bits signed integer). This is because machine signed integers are asymmetric: there is one more negative value than there are positive values.
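
A small self-contained example shows the effect, assuming overflow checking is enabled (switch -gnato, if it is not already the default for your compiler version):

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Negate_First is
      X : Integer := Integer'First;  --  -2147483648 with a 32 bits Integer
      Z : Integer;
   begin
      Z := -X;  --  2147483648 does not fit, so Constraint_Error is raised
      Put_Line (Integer'Image (Z));
   exception
      when Constraint_Error =>
         Put_Line ("overflow when negating Integer'First");
   end Negate_First;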

This error is rather common, because it is such a special case, so programmers tend to overlook it. Take for example the original C code of the Crazyflie small drone that we translated into SPARK. Its developers discovered the bug after debugging a scenario that caused the drone to spin uncontrollably.

The same reason explains why there might be an overflow on absolute value operation:

 Z := abs X;

Indeed, if X is -2147483648 here, its absolute value would be 2147483648 which is more than the maximal 32 bits machine integer.

There is an even more obscure overflow, which happens when you divide integers:

 Z := X / Y;

In the specific case where X is again the minimal machine integer of this size (-2147483648 for a 32 bits signed integer), and Y is -1, dividing X by Y is the same as negating X, so we are back to the previous run-time error.

Now, let's open the curtain on the most obscure arithmetic run-time error: the possibility that an exponentiation on a floating-point value results in a division by zero. Yes, you may have values for X and N such that X**N results in a division by zero. How? Take X=0.0 and N=-1. The Ada standard says that X**N in that case is the same as 1/(X**(-N))... but X is zero, so X**(-N) is zero, so... yes, we have a division by zero. Amazing!
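
Here is a hedged sketch of that case. GNATprove now reports a division check on the exponentiation; whether evaluating it actually raises Constraint_Error at run time may depend on the target and on how the compiler handles floating-point division by zero:

   function Reciprocal_Power (X : Float; N : Integer) return Float is
   begin
      return X ** N;  --  with X = 0.0 and N = -1, this is 1.0 / (0.0 ** 1)
   end Reciprocal_Power;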

Well, it turned out that this was also a surprise for the SPARK developers. We did not know this rule of Ada, so we did not implement the check that detects this case in SPARK. This is now fixed thanks to Rod Chapman, who noticed it. At this point, you should wonder how one can ever make sure to get rid of all run-time errors in her code. Are coding standards, reviews and testing going to detect all possible problems? I don't believe so. I believe that the only way to detect all possible such cases is to use a static analysis tool like SPARK. If we were able to miss one case, even though we earn a living focusing on such checking, no amount of attention and cleverness is going to let a developer catch all such cases in a real-life application.

]]>
Unity & Ada http://blog.adacore.com/unity-ada Mon, 19 Sep 2016 11:30:00 +0000 Quentin Ochem http://blog.adacore.com/unity-ada


Using Ada technologies to develop video games doesn’t sound like an obvious choice - although it seems like there could be an argument to be made. The reverse, however, opens some more straightforward perspectives. There’s no reason why a code base developed in Ada couldn’t benefit from a game engine to support the user interface. Think of a flight simulator for example, running a mission computer in Ada.

In the past few years, a number of game engines have been made available to a larger audience, mostly to serve the so-called indie scene. In this blog post, we’ll concentrate on Unity, which has the advantage of using the Mono environment as its scripting platform. As there is no flight mission computer at AdaCore, we’ll use a slightly modified version of the SPARK/Ada Tetris example already ported to a number of environments. So that’ll be a game in Ada after all!

Setup

You can get the code on AdaCore’s GitHub repository. It has been developed for Windows, but should be easily ported over to Linux (the directive “for Shared_Library_Prefix use "";” in the GNAT project file may be all you need to remove - see more information in the Ada project setup). The Ada folder contains what’s necessary to build the native code, while the TetrisUI folder contains the Unity project. You will need both GNAT and Unity for everything to work.

Unity can be downloaded from https://unity3d.com/. During the installation, make sure that you select the correct version - 32-bit or 64-bit - matching your GNAT compilation environment. In particular, to date, GNAT GPL is only provided with a 32-bit code generator on Windows, and 64-bit on Linux.

Once the Ada project is built, make sure to copy the resulting library into the Plugins directory under the Unity project's Assets folder. This will be described in more detail below.

Now, let’s dive into the project.

Exporting Tetris Ada to C#

We’re going to skip the section describing the development of the Tetris code; a full explanation is available elsewhere. Note that, interestingly, we’re not only interfacing Ada, but actually SPARK code. This provides a good demonstration of a typical situation with Ada, where the core safety-critical functionality (here the core of the game) is developed to a higher safety standard and can then be integrated into various environments with fewer safety constraints (here the user interface developed in Unity).

As said in the introduction, Unity runs Mono - an open-source implementation of the .NET platform - and therefore allows you to develop scripts in the C# language. So, exporting our Ada code to Unity turns out to be a matter of interfacing Ada and C#.

Step 1 - Setting up the Build

The first step is to set up a library to be loaded within Unity. We actually need several things here: the library has to be dynamic, we need it to be automatically initialized, and we need to make sure that it doesn’t depend on any other libraries, in particular the GNAT libraries. This is achieved through the following settings in the GNAT project file:

   for Library_Kind use "dynamic";
   for Library_Standalone use "encapsulated";
   for Shared_Library_Prefix use "";

We specify here the fact that the library is dynamic (for Library_Kind use "dynamic";) and that it should encapsulate all dependencies, in particular the GNAT library (for Library_Standalone use "encapsulated";). That second setting is important, otherwise our library would only work if the GNAT run-time is also provided.

By default, the build system (gprbuild) will add a “lib” prefix to the library name. I’m personally working on Windows and don’t fancy having this prefix, which doesn’t really look natural there. This behavior is cancelled by the clause (for Shared_Library_Prefix use "";). On Linux, where you’re likely to want this prefix, you may need to remove that clause.
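Putting these pieces together, a complete library project is only a few lines. The following is a minimal sketch (directory names are hypothetical; the real project file is in the repository):

   --  Minimal sketch of a standalone, encapsulated library project.
   project Tetris_Lib is
      for Source_Dirs use ("src");
      for Object_Dir  use "obj";

      for Library_Name       use "tetris";
      for Library_Dir        use "lib";
      for Library_Kind       use "dynamic";
      for Library_Interface  use ("Game_Loop");
      for Library_Standalone use "encapsulated";
      for Shared_Library_Prefix use "";   --  drop the "lib" prefix (Windows)
   end Tetris_Lib;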

Step 2 - Exporting to C, then C#

There’s no direct way to export from Ada to C#, but as the C# Platform Invoke (PInvoke) services allow us to call native C code, we’re going to go through C conventions for the interfacing. As a disclaimer, to anyone already shuddering at the idea of interfacing virtual machines and native code - fear no more. Using PInvoke is surprisingly easier than using something like the Java Native Interface for the JVM - at least for basic interfacing. In our case, on the Ada side, it will be nothing more than exporting proper symbols.

In order to precisely control what gets exported, we’re going to wrap all calls to the Tetris package (which contains the game core) into a new package Game_Loop (see https://github.com/AdaCore/UnityAdaTetris/blob/master/Ada/src/game_loop.ads and https://github.com/AdaCore/UnityAdaTetris/blob/master/Ada/src/game_loop.adb). This package will in particular have a subprogram “Cycle” responsible for moving pieces around the board. Its interface looks as follows:

  procedure Cycle
     with Export,
     Convention => C,
     External_Name => "tetris_cycle";

I’m using Ada 2012 aspects here, as opposed to the usual “pragma Export”, but the effect is the same: this declares the C calling convention and exports the procedure under the C symbol “tetris_cycle”.
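For reference, the pragma-based equivalent looks roughly like this (same effect, pre-Ada 2012 style):

   procedure Cycle;
   pragma Export (C, Cycle, "tetris_cycle");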

Importing this symbol into C# is almost painfully easy. All we need to do is create a static function with the right profile, and then associate it with the right DllImport directive. We’ll see how to set up the C# project in the next section, but here’s the code that has to be written in the class:

    [DllImport("tetris")]  
    private static extern void tetris_cycle();

And that’s it! We’ll of course need to make sure that the library is at the correct location to be loaded, but again, that’s for the next section.


One last point of importance - exceptions. The core of the game is written in SPARK and all run-time checks have been proven. So we know that when called properly (i.e. respecting preconditions) this code cannot raise any exception. However, there are no such checks at the C#-to-Ada interface level, and it’s quite possible that values passed from C# to Ada are wrong. A good example of this is board coordinates, which are constrained between 1 .. X_Size and 1 .. Y_Size but mapped to regular integers. Nothing prevents values coming from C# from being out of range, thus raising an exception when they reach Ada. The resulting behavior is quite annoying, as C# has no way to catch Ada exceptions - Unity will just crash. As it turns out, it happened to me quite a few times before I realized what was happening.

There are ways to throw a C# exception from native code - so as to translate an Ada exception into a C# one. The SWIG interface generator does it, for example. However, it’s outside the scope of this blog post, so we’ll just make sure that all calls that are part of the interface have default handlers providing default behaviors. Let’s look at another example from the interface, the code that provides the value of a cell of the board:

  function Get_Kind (X, Y : Integer) return Cell
     with Export,
     Convention => C,
     External_Name => "tetris_get_kind";

The implementation will look like:

   function Get_Kind (X, Y : Integer) return Cell is
   begin
      return Cur_Board (Y)(X);
   exception
      when others =>
         return Empty;
   end Get_Kind;

Returning the “Empty” literal is far from ideal. There’s no information passed to Unity that something wrong happened. However, this is enough to keep things going.

The C# code will look like:

    [DllImport("tetris")]
    private static extern byte tetris_get_kind(int x, int y);

Integer is quite logically mapped to int. I have to admit that I used implementation knowledge for the return type: I know that it’s a small enumeration that turns into an 8-bit unsigned integer, hence the “byte” type. As it turns out, however, this implementation knowledge is available to everyone. The -gnatceg switch of GNAT makes it possible to generate a C interface to Ada code. The resulting C header files can be used directly to develop C code or to interface with anything else (like C#). And as a matter of fact, we could even have fed these headers to the SWIG tool we mentioned before to automatically generate the C# side and - hey - use its exception handling mechanism!

Back to our work, the last piece worth mentioning here is the handling of data structures. More precisely, consider this type in the Tetris specification:

   type Piece is record
      S : Shape;
      D : Direction;
      X : PX_Coord;
      Y : PY_Coord;
   end record;

We need to be able to access this type from C#. Again, using -gnatceg we can see that the compiler generates a structure with two unsigned 8-bit integers and two regular integers. This is non-portable insight into the compiler's behavior: to do proper interfacing, it would probably have been better to declare this structure and all its component types with the C convention. But we’re taking shortcuts here for the purpose of the demonstration.
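For the record, a more portable variant would pin down the representation on the Ada side instead of relying on what the compiler happens to generate. A hypothetical sketch (not what the demo actually does, and assuming Interfaces and Interfaces.C are with'ed):

   --  Hypothetical, more portable variant: make the C layout explicit.
   type Shape_For_C     is new Interfaces.Unsigned_8;
   type Direction_For_C is new Interfaces.Unsigned_8;

   type Piece_For_C is record
      S : Shape_For_C;
      D : Direction_For_C;
      X : Interfaces.C.int;
      Y : Interfaces.C.int;
   end record
     with Convention => C;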

Interestingly, the original Piece type can be directly mapped to C#. C# has two main composite data structures, the “class” and the “struct”. A notable difference is that a “class” is passed by reference and a “struct” by copy. So a C# struct is very appropriate here, and the Ada record maps directly to:

    private struct Piece
    {
        public byte shape;
        public byte direction;
        public int x;
        public int y;
    }

So that the Ada call:

   function Get_Cur_Piece return Piece
     with Export,
     Convention => C,
     External_Name => "tetris_get_cur_piece";

Becomes:

    [DllImport("tetris")]
    private static extern Piece tetris_get_cur_piece();

One last piece of information: a number of calls return Boolean values. Recent GNAT compilers will complain when interfacing the Ada Boolean type with C - there is no such type in C. Instead of just disregarding the warning, we can silence it by using our own Boolean type (with a proper size clause this time):

     type CSharp_Bool is new Boolean with Size => 8;

This type is then used in various places in the interfacing code.
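A typical use, for a hypothetical query of the game state (the real declarations are in game_loop.ads):

   function Piece_Falling return CSharp_Bool
     with Export,
     Convention => C,
     External_Name => "tetris_piece_falling";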

In summary, interfacing Ada and C# comes with a couple of hurdles, but is overall relatively painless. The code and this article take a number of shortcuts and present alternatives for the interfacing. Using this in an industrial context would require a bit more care to make sure that the interfaces are portable and safe. Even better, automatic generation of the wrappers and interfaces would be ideal (a bit like GNAT-AJIS for Java). The question of exception handling still needs to be worked out, but for a demonstrator, that’s good enough. We now have an Ada library ready to be used from within Unity. Let’s play!

Developing the game UI in Unity

Some high level information on Unity

There are many things to know about Unity and we’re not even going to begin scratching the surface; for more detail, there is a massive number of tutorials and books available. We’re only going to provide a high-level view of the concepts deployed in this demonstrator. In order to run the example, the one thing you will need to do is build the Ada library (see the previous section) and copy it to the Plugins asset folder described below.

When a Unity project is first opened, it shows a game scene. There may be many scenes in a game, but for Tetris we’ll only need one. The objects of that scene are listed in the panel on the left. These are the statically defined ones; the couple of objects we are also going to create dynamically will be GameObject instances as well. Unity uses an interesting component-based design method, where each GameObject is itself composed of components (shape, renderer, position, scripts, etc.). Clicking on the Main Camera object, we get the list of its components in the right panel: in particular a Transform at the top, specifying the matrix transformation for this object (position, rotation...), a Camera component for what is specific to the camera (in particular switching between orthographic and perspective projections), and at the end a script called Tetris, which contains the behavior of the game. We decided to attach this script to the camera object but, as a matter of fact, it doesn't have to interact with the camera; we could just as well have placed it anywhere.

At the bottom of the screen, we have the Assets: the components and game objects used to build the scene. There are two scripts - Tetris, which we discussed already, and Cell, which we’ll see briefly in a bit - a material called “CubeMat” used to render the cubes, and the Main scene. There are also two directories, Resources and Plugins. Within the Assets folder, developers are mostly free to organize elements the way they want, apart from a couple of special directories - these two in particular.

The Plugins directory is the one that holds the dynamic libraries to be loaded by Unity. When checking the project out, it should be empty. Just drag and drop tetris.dll (or libtetris.so) there to complete the project.

Resources contains objects that can be dynamically instantiated. Here, Cell is going to be one cell on the screen. Clicking on it, you’ll see that it has a position, a collider, a renderer, a material, and a filter (responsible for the shape - or mesh - of the object). It’s also associated with a script also called Cell, which implements some specific services.

And that’s pretty much it. Clicking on the play button at the top, you should be able to launch the game, move bricks with the arrow keys or accelerate the fall with the space key. Note that stopping and relaunching the game from within the same Unity instance will not reset the board. This is because the library is loaded within the Unity environment itself, not in response to the play button. To reset, we can either relaunch Unity, or extend the implementation with some initialization code run at game start.


The Tetris behavior

The UI for this Tetris is very basic. It’s a 10 * 38 grid of pre-loaded cubes that we’re going to make appear and disappear depending on the state of the core game. It’s arguably a bit of a shame to use a tool such as Unity for such a basic rendering - but it has the benefit of sticking closely to the core developed in Ada, which can then be ported to much cruder environments. It’s all ready to be extended though!

The behavior is pretty much all contained in the “Tetris.cs” script, which you can open either from the asset list or by clicking on its entry in the Main Camera. You’ll see a bunch of DllImport clauses here, similar to those we described in the previous section. Next comes Awake(). Awake is a special function that Unity calls once the object has been created and set up. It is recognized by Unity purely from its name; there’s no overriding mechanism used here (although of course C# has support for it). The first line gets a handle on a prefab, which is one of those resources we’re going to create dynamically:

   prefabBlock = Resources.Load("Cell") as GameObject;

The complete path to the prefab is “Assets/Resources/Cell” but the common prefix is omitted. Once this handle is obtained, I can instantiate it through Instantiate calls:

   Instantiate(prefabBlock)

which returns a GameObject. As seen before, this GameObject is associated with a Cell script. In Unity, all these scripts are descendants of the type Behaviour or MonoBehaviour. What’s nice about the component-based model of Unity is that it’s always possible to get any component of a given object from any other component. In other words, I can store this object through a reference to its GameObject or through its Cell component directly.

When only one component of a specific type is available, it can be accessed through the generic GetComponent<type>() call, so:

    cells[x, y] = Instantiate(prefabBlock).GetComponent<Cell>();

actually initializes the cells array at position (x, y) with the Cell instance from the prefab I just loaded. Next we’re going to deactivate the cell for now, as the grid is empty by default. Note that we can reference the GameObject from the Cell directly:

    cells[x, y].gameObject.SetActive(false);

The rest of the code is pretty straightforward. Update() is another of these magic functions, this one called at every frame. We’re calling tetris_cycle() regularly to update the game, and subsequently activating / deactivating cells depending on the status of the board.

The Cell script has some other interesting elements. The first thing it does (in Awake) is duplicate the material, in order to be able to change its color independently of the others. The color is then changed on the Renderer component from SetKind. Last but not least, the Explode function creates an instance of the explosion particle system to add a little something when a line is destroyed.

Going Further

There are various things to play with from there. The game itself is merely a proof of concept, and many things could be added, starting with a button to reset the game, a button to exit it, a score, etc. The Ada to C# interfacing techniques can also be greatly improved, ideally through automatic binding generation. And of course, it would be nice to have a more comprehensive piece of Ada code to integrate into a larger-scale project. All the pieces are here to get started!


]]>
GNAT Programming Studio (GPS) on GitHub http://blog.adacore.com/gnat-programming-studio-gps-on-github Mon, 12 Sep 2016 08:00:00 +0000 Emmanuel Briot http://blog.adacore.com/gnat-programming-studio-gps-on-github

As we mentioned in an earlier post, AdaCore is busy making the source code for its tools available on GitHub. This is not a simple process. Some of the projects first had to be converted to git (from our historical use of subversion), and we needed to make sure that the tools can indeed be built with what is already available out there.

We started with a few libraries and tools, like GtkAda and gprbuild.

We have now managed to make the GNAT Programming Studio (also known as GPS) available. This is a large project that depends on a large number of libraries.

Getting the sources

To download the sources (in fact, to download the whole history of this 16-year-old project), head to the GPS GitHub repository. As always on GitHub, you can explore the code online. But the more convenient approach is to click on the green button at the top of the page:

Cloning the sources

Copy the URL, and in your terminal you can then type:

    git clone https://github.com/AdaCore/gps

to get the sources locally. Take a look at the INSTALL file for build instructions. Beware that you will need to first download and compile a number of other dependencies, like gprbuild, GtkAda and GnatColl. That most likely should not be the first project you try to compile.

What you can contribute

We have of course multiple reasons for making our sources available on a day-to-day basis. The most important is perhaps to help build communities around those tools. This is a great way to make sure they actually fulfill your expectations, and it gives you a chance to contribute ideas and code if you can.

In the specific case of GPS, we believe we have a great Integrated Development Environment, which can be used in a large number of settings - whether you are developing a multi-million line project, or getting started with your first embedded software, or programming in other languages like C, C++ or Python. We also try to welcome people coming from other environments by taking a look at what exists and when possible, providing similar features.

Such work can be done in GPS via the use of Python plugins, which have a great flexibility nowadays. Take a look at the GPS documentation and the existing plugins for examples.

Unfortunately, we can't do it all and you might be able to help. We have been talking for years of setting up a repository of useful plugins written by people outside of the main GPS developers. And we believe GitHub is a great step forward in that direction.

How you can contribute

The way this works on GitHub is the following:

  1. Create a GitHub account if you do not have one already
  2. Go to the GitHub page for GPS
  3. Click on the Fork button at the top-right corner. A window will pop up that lets you specify where to work. Select your account. Github clones the repository to your own copy.
  4. You can now clone your new repository to your machine.
  5. Modify the sources or add new files with standard git commands, in particular "git add" and "git commit"
  6. When you are ready, type "git push" to contribute your changes to your own repository on GitHub. 
    To make things simpler in the case of trivial changes, steps 4, 5 and 6 can be done directly from the GitHub interface by editing files in place.
  7. You can now let us, the GPS developers, know that you have made the ultimate, ground breaking and fabulous change (or, if you prefer, that you have done a small minor fix). Click on the New Pull Request button. In the resulting dialog, you can leave most of the settings with their default value (which will basically try to merge your master branch into the official GPS master branch). When you press Create Pull Request, the GPS developers will receive a notification that a change is ready for adding to GPS.
  8. We will then do code reviews and comments on your patch and, I am sure, ultimately integrate it in the official GPS sources. At this stage, you have become a GPS contributor!

See the GitHub documentation for more information.

The goal is that after including your changes in GPS, they become part of GPS forever (both the GPL release available for free to everyone, as well as the Pro releases available to our customers). We will retain your names in the commit messages (after all, you did the work).

Looking forward to your contributions!

]]>
Bookmarks in the GNAT Programming Studio (GPS) http://blog.adacore.com/bookmarks-in-the-gnat-programming-studio-gps Wed, 07 Sep 2016 14:00:00 +0000 Emmanuel Briot http://blog.adacore.com/bookmarks-in-the-gnat-programming-studio-gps

Bookmarks have been in GPS for a very long time (first added in 2005). They help you mark places in your source code that you want to easily go back to, even when the code changes.

The descriptions in this blog post only apply to very recent development versions of GPS. For our customers, you can immediately request a wavefront (nightly build of GPS) if you want to try them out. For users of the public versions, these features will be part of the next GPL 2017 release, so you will have to wait a bit :-)

Basic usage

Creating a new bookmark

The basic usage of bookmarks is as follows: you open a source editor and navigate to the line of interest. You can then create a new bookmark by either using the menu /Edit/Create Bookmark or by opening the Bookmarks view (/Tools/Views/Bookmarks) and then clicking on the [+] button in the local toolbar. In both cases, the Bookmarks view is opened, a new bookmark is created and selected so that you can immediately change its name.

The default name of a bookmark is the name of the enclosing subprogram and the initial location of the bookmark (file:line). But you can start typing a new name, and press Enter to finally create the bookmark.

In practice, this really takes just a few clicks (one on the menu, then pressing Enter to accept the name), or even just two keystrokes if you have set a keyboard shortcut for the menu via the Preferences dialog.

At any point in time, you can rename an existing bookmark by either clicking on the button in the local toolbar, or simply with a long press on the bookmark itself.

Note the goto icon on the left of the editor line 1646, which indicates there is a bookmark there, as well as the colored mark in the editor scrollbar that helps navigate in the file.

Even though the default name of the bookmark includes a file location, the major benefit of the bookmarks is that they will remain at the same location as the text is edited. In our example, if we add a new subprogram before Display_Splash_Screen, the bookmark will still point at the line containing the call to Gtk_New, even though that line might now be 1700 for instance.

Of course, GPS is not able to monitor changes that you might do through other editors, so in this case the marks might be altered and stop pointing to the expected location.


Adding more bookmarks


We can create any number of bookmarks, and these have limited impact on performance. So let's do that and create a few more bookmarks, in various files. As you can see in the scrollbar of the editor, we have two bookmarks set in the file bookmark_views.adb, and we can easily jump to them by clicking on the color mark.

But of course, it is much simpler to double-click inside the Bookmarks view itself, on the bookmark of interest to us.

At this point, we have a rather long, unorganized list of bookmarks, and it is becoming harder to find anything. This is basically what has been available in GPS since 2005, and it doesn't help keep things organized. So we have recently made a number of improvements that take us far beyond this basic usage.

Organizing bookmarks into groups

Creating groups

When we create new bookmarks, GPS adds them at the top of the list. We might want to organize them differently, which we can do simply with a drag and drop operation: select the bookmark, keep the mouse pressed, and move it to a better place in the list.

Things become more interesting when you drop a bookmark on top of another one. In this case, GPS creates a group that contains the two bookmarks (and that basically behaves like a folder for files). The group is immediately selected so that you can rename it as you see fit.

In our example, we created two groups, corresponding to two features we are working on.

Groups can be nested to any depth, providing great flexibility. So let's create two nested groups, which we'll name TODO, beneath the two we have created. This is a great way to create a short todo list: one top-level group for the name of the feature, then below one group for the todo list, and a few additional bookmarks to relevant places in the code.


Unattached bookmarks, as TODO items

To create these additional groups, we will select the Source editor group, then click on the Create New Group button in the local toolbar, and type "TODO<enter>". This will automatically add the new group beneath Source editor. Let's do the same for the Bookmarks groups. These two groups are empty for now.

Let's add new entries to them. If we already know where code should be added to implement the new todo item, we can do as before: open the editor, select the line, then click on the [+] button. Most often, though, we don't yet know where the implementation will go.

So we want to create an unattached bookmark. Using the name bookmark here is really an abuse of language, since these have no associated source location. But since they are visible in the Bookmarks view, it is convenient to name them bookmarks.

To create them, let's select one of the TODO groups, then click on Create Unattached Bookmark in the local toolbar, and immediately start typing a brief description of the todo item. As you can see in the screenshot, these bookmarks do not have a goto icon, since you cannot double-click on them to jump to a source location.

When you delete a group, all bookmarks within are also deleted. So once you are done implementing a feature, simply delete the corresponding group to clean up the bookmarks view.

Adding notes

Adding notes to bookmarks

The short name we gave the bookmark is not enough to list all the great ideas we might have for it. Fortunately, we can now add notes to bookmarks, as a way to store more information.

Let's select the "write a blog post" item, then click on the Edit Note button in the local toolbar. This opens a small dialog with a large text area where we can type anything we want. Press Apply to save the text.

Note how a new tag icon was added next to the bookmark, to indicate it has more information. You can view this information in one of three ways:

  • select the bookmark, and click again on the Edit Note button as before
  • double-click on the tag icon.
  • leave the mouse hovering over the bookmark line. This will display a tooltip with extra information about the bookmark: its name, its current location and any note it might have. This is useful if you only want to quickly glance at the notes for one or more bookmarks

Add note with drag and drop

Dragging text to add a note

Sometimes, though, you want to associate code with the note (i.e. the bookmark should not only point to a location, but also remember the code that was at that location). The simplest way to do this is to select the text in the editor, and then drag and drop the selected text directly onto the bookmark. This will create a note (if needed) or append the selected text to the existing note.

In the tooltips, we use a non-proportional font, so that the code is properly rendered and alignment preserved.




Filtering bookmarks

Filtering the list of bookmarks

If you start creating a lot of bookmarks, even if you have properly organized them into groups, it might become difficult to find them later on. So we added a standard filter in the local toolbar, as was already done for a lot of other views. As soon as you start typing text in that filter, only the bookmarks that match (by name, location or note) remain visible; all the others are hidden.






Favorite files

Bookmarks used as favorite files

GPS provides a large number of ways to navigate your code, and in particular to open source files. The most efficient one is likely the omni-search (the search field at the top-right corner).

But some users like to have a short list of favorite files that they go to frequently. The Bookmarks view can be used to implement this.

Simply create a new group (here named Favorite files), and create one new bookmark in this group for each file you are interested in. I like to create the bookmark on line 1, but I always remove the line number indication in the name of the bookmark since the exact line is irrelevant here.



Conclusion

The flexibility of the Bookmarks view has been greatly enhanced recently, providing much needed features such as groups of bookmarks, notes, todo items, enhanced tooltips, drag-and-drop operations, ...

We have described two use cases in this post (bookmarks as todo lists and bookmarks for favorite files), but I am sure there are other ways that bookmarks can be used. I would be curious to hear your own ideas in the comments...

]]>
Automatic Generation of Frame Conditions for Record Components http://blog.adacore.com/automatic-generation-of-frame-conditions-for-record-components Tue, 26 Jul 2016 04:00:00 +0000 Claire Dross http://blog.adacore.com/automatic-generation-of-frame-conditions-for-record-components

As explained in a previous post, formal verification tools like GNATprove rely on the user to provide loop invariants to describe the actions performed inside loops.

Though the preservation of variables which are not modified in the loop need not be mentioned in the invariant, it is in general necessary to state explicitly the preservation of unmodified object parts, such as record fields or array elements. These preservation properties form the loop’s frame condition. As it may seem obvious to the user, the frame condition is unfortunately often forgotten when writing a loop invariant, leading to unprovable checks.

To alleviate this problem, the GNATprove tool now automatically generates frame conditions for the unmodified fields of record variables. It also handles unmodified fields of array components, as long as they are preserved at every index in the array.

Let us look at what this means on an example. We consider a structure which is an array containing, for each element, its value, the maximum of the values of the previous elements in the array, and the maximum of the values of the next elements in the array:

type Cell is record
   Value     : Natural;
   Max_Left  : Natural;
   Max_Right : Natural;
end record;
 
type Cell_Array is array (Positive range <>) of Cell;

We create a procedure which updates the fields Max_Left and Max_Right of each element of an array A with the appropriate values. The postcondition of the procedure states that the Max_Left (resp. Max_Right) component of each cell is bigger than the elements located before (resp. after) the cell.

procedure Update_Max (A : in out Cell_Array) with
  Post => (for all I in A'Range =>
             A (I).Value = A'Old (I).Value
           and (for all J in I .. A'Last =>
                    A (I).Value <= A (J).Max_Left
                and A (J).Value <= A (I).Max_Right));

We implement Update_Max using two loops. The first one goes from left to right and only updates the Max_Left field of each array cell while the second one goes from right to left and only updates the Max_Right field.

procedure Update_Max (A : in out Cell_Array)is
   K   : Positive;
   Max : Natural;

begin
   --  If A is empty, there is nothing to do.

   if A'Length = 0 then
      return;
   end if;

   --  Loop from A'First to A'Last. Max is the maximum of the
   --  elements encountered so far. At each step, update the
   --  Max_Left field of the current index with the current value
   --  of Max.

   K := A'First;
   Max := 0;

   loop
      if A (K).Value > Max then
         Max := A (K).Value;
      end if;

      A (K).Max_Left := Max;

      exit when K = A'Last;
      K := K + 1;
   end loop;

   --  Loop from A'Last to A'First. Max is the maximum of the
   --  elements encountered so far. At each step, update the
   --  Max_Right field of the current index with the current
   --  value of Max.

   K := A'Last;
   Max := 0;

   loop
      if A (K).Value > Max then
         Max := A (K).Value;
      end if;

      A (K).Max_Right := Max;

      exit when K = A'First;
      K := K - 1;
   end loop;
end Update_Max;

For the postcondition of Update_Max to be proved, we need to add a loop invariant to both loops stating how the loop has modified A, K, and Max during the previous iterations (see the user guide to know what should be written in loop invariants). In each loop invariant, we only need to speak about the components of the array cells which we are updating:

loop
  if A (K).Value > Max then
     Max := A (K).Value;
  end if;
  A (K).Max_Left := Max;

  --  K stays in A'Range

  pragma Loop_Invariant (K in A'Range);

  --  Max is bigger than all the elements encountered so far

  pragma Loop_Invariant
    (for all I in A'First .. K => A (I).Value <= Max);

  --  For every pair of indexes I <= J updated so far, the Max_Left of J is bigger than the Value of I

  pragma Loop_Invariant
    (for all I in A'First .. K =>
       (for all J in I .. K =>
            A (I).Value <= A (J).Max_Left));

  exit when K = A'Last;
  K := K + 1;
end loop;

For this loop, GNATprove infers that components Value and Max_Right of each element of A are preserved and automatically generates the corresponding frame condition:

pragma Loop_Invariant
  (for all I in A'Range => A (I).Value = A'Loop_Entry (I).Value
               and A (I).Max_Right = A'Loop_Entry (I).Max_Right);

As a result, the above subprogram is completely proved by GNATprove with level=1.

Note that, as SPARK treats a subprogram's inputs as a whole, the frame condition could not be generated if the modifications in the loop were done to A through a subprogram call; in that case, the frame condition would need to be stated explicitly in the invariant:

procedure Set_Max_Left (Index : Positive; Value : Natural)
with Post => A = A'Old'Update
  (Index => A'Old (Index)'Update (Max_Left => Value));
   
loop
  if A (K).Value > Max then
     Max := A (K).Value;
  end if;

  --  Set_Max_Left only modifies A (K).Max_Left, but frame condition generation
  --  does not know it.

  Set_Max_Left (K, Max);

  -- Frame condition

  pragma Loop_Invariant
     (for all I in A'Range => A (I).Value = A'Loop_Entry (I).Value
                  and A (I).Max_Right = A'Loop_Entry (I).Max_Right);

  [...]

  exit when K = A'Last;
  K := K + 1;
end loop;

Also note that no frame condition is currently generated for unmodified elements of an array. For example, we cannot prove at the beginning of the loop that the Max_Left component of the current array cell is equal to its value before the loop without explicitly stating its preservation in a loop invariant:

A_Init := A;

loop
  --  The element stored at index K has not been modified so far,
  --  but no frame condition is currently generated for that.

  pragma Assert (A (K).Max_Left = A_Init (K).Max_Left);

  if A (K).Value > Max then
     Max := A (K).Value;
  end if;
  A (K).Max_Left := Max;

  -- Frame condition

  pragma Loop_Invariant
     (for all I in A'Range =>
              (if I > K then A (I).Max_Left = A'Loop_Entry (I).Max_Left));

  [...]

  exit when K = A'Last;
  K := K + 1;
end loop;

See the corresponding section in the user guide for more information.

]]>
Research Corner - SPARK 2014 vs Frama-C vs Why3 http://blog.adacore.com/research-corner-spark-2014-vs-frama-c-vs-why3 Mon, 11 Jul 2016 04:00:00 +0000 Yannick Moy http://blog.adacore.com/research-corner-spark-2014-vs-frama-c-vs-why3

There have been informal comparisons by researchers between the use of SPARK and Frama-C, but these are not enough to really understand the most important differences at the heart of these different toolsets for formal verification of programs. Another interesting point of comparison is the Why3 technology, which underlies both SPARK 2014 and Frama-C (when using the Jessie plugin instead of the WP plugin for deductive verification). Hence the interest of this article we have written for the 7th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, which goes into the deeper differences in design choices between these technologies, in terms of executability, semantics of logical parts, use of ghost code, and generation of counterexamples.

]]>
Introducing the Make With Ada competition! http://blog.adacore.com/introducing-make-with-ada Tue, 21 Jun 2016 13:46:00 +0000 AdaCore Admin http://blog.adacore.com/introducing-make-with-ada

If you’ve been looking for a way to start your next embedded project in Ada or SPARK, look no further than the Make with Ada competition!

The embedded software competition is based on the Ada and SPARK languages and on the criteria of software dependability, openness, collaborativeness, and inventiveness.

Until the 30th of September individuals and teams of all abilities are challenged to create a new Ada language based project and log their progress along the way.

Whether you want to explore a different programming approach or demonstrate expertise with the community, we encourage you to register for the first Make with Ada competition.

Why not take some Make with Ada project inspiration from the latest Ada ARM Cortex-M/R Drivers now available on our GitHub page? Or check out the recent blog posts about the candy dispenser, pen plotter and smartwatch hacks from our Paris HQ.

Simply register to take the challenge and start new projects with Ada today! You'll also be in with a chance of winning a share of the prize fund, which totals in excess of 8000 Euros!

For the latest competition updates on Twitter follow @adaprogrammers and use the hashtag #MakewithAda to share your ideas!

]]>
Research Corner - Proving Security of Binary Programs with SPARK http://blog.adacore.com/research-corner-proving-security-of-binary-programs-with-spark Thu, 16 Jun 2016 04:00:00 +0000 Yannick Moy http://blog.adacore.com/research-corner-proving-security-of-binary-programs-with-spark

The attached paper, titled "A Proof Infrastructure for Binary Programs", is the result of two years of work by researchers from Dependable Computing and Zephyr Software LLC. It is an interesting use of SPARK as an intermediate language. Other uses of SPARK as an intermediate language come from code generation for modeling languages such as Simulink (in the QGen qualifiable code generator) and SCADE (in the KCG qualified code generator). It's interesting to see the reasons for which these researchers chose SPARK as the intermediate language in this work (quoted from the article):

  • The SPARK Ada language has been designed for proof and includes syntactic structures to enable definition of the necessary verification conditions.
  • SPARK Ada is familiar to many in the community and simple to use.
  • SPARK Pro proof tools provide the capability to establish necessary proofs.
  • SPARK Pro has industrial-strength support thereby allowing the technology to be adopted by practitioners.
  • SPARK Pro provides an executable specification that can be tested.

These are indeed the main goals we have pursued with SPARK, so we're glad to see these researchers agreeing with us on these points. We also happen to agree with their conclusion:

"The SPARKPro toolchain has the advantage of being able to run multiple proofs in parallel with most proofs discharged automatically. Additionally, the SPARK Ada representation can be compiled into an executable program that could allow for verification by testing for representational accuracy. A current disadvantage of the toolchain is that, when proofs are not discharged automatically, completing the proof manually can be difficult."

Providing a smoother path from fully automatic proof to inclusion of manual proof is one of the main goals we're pursuing now, with recent advances being made in a new SPARK lemma library. For the current features in SPARK for manual proof, see examples in the SPARK User's Guide.

Legal notice: the original source for the attached article is copyrighted and published by Springer, published here by special authorization of Springer.


Attachments

]]>
C library bindings: GCC plugins to the rescue http://blog.adacore.com/bindings-gcc-plugins Mon, 13 Jun 2016 13:58:00 +0000 Pierre-Marie de Rodat http://blog.adacore.com/bindings-gcc-plugins

I recently started working on an Ada binding for the excellent libuv C library. This library provides a convenient API to perform asynchronous I/O under an event loop, which is a popular way to develop server stacks. A central part of this API is its enumeration type for error codes: most functions use it. Hence, one of the first things I had to do was to bind the enumeration type for error codes. Believe it or not: this is harder than it first seems!

The enumeration type itself is defined in uv.h and makes heavy use of C macros. Besides, the representation of each enumerator can be platform dependent. Because of this, I would like the type binding to be automatically generated. But only the enumeration type: I want to bind the rest manually, to expose an Ada API that is as idiomatic as possible.

Don’t we already have a tool for this?

The first option that came to my mind was to use the famous -fdump-ada-spec GCC flag:

subtype uv_errno_t is unsigned;
UV_E2BIG : constant uv_errno_t := -7;
UV_EACCES : constant uv_errno_t := -13;
UV_EADDRINUSE : constant uv_errno_t := -98;
--  [...]

That’s a good start but I wanted something better. First, uv_errno_t in C really is an enumeration: each value is independent, there is no bit-field game, so what I want is a genuine Ada enumerated type. Moreover, the output from -fdump-ada-spec is raw and supposed to be used as a starting base to be refined. For instance I would have to rework the casing of identifiers and remove the UV_ prefix.

Besides, the output contains the type definition… but also bindings for the rest of the API (other types, subprograms, …), so I would have to filter out the rest. Overall, given that this binding will have to be generated for each platform, using -fdump-ada-spec does not look convenient.

Cool kids handle C with Clang…

My second thought was to use Clang. Libclang’s Python API makes it possible to visit the whole parse tree, which sounds good. I would write a script to find the enumeration type definition, then iterate over its children so that I could list the couples of names and representation values.

Unfortunately, this API does not expose key data such as the value of an integer literal. This is a deal breaker for my binding generator: no integer literal means no representation value.

And now what? Building a dummy compilation unit with debug information and extracting enumeration data out of the generated DWARF looks cumbersome. No, please… don't tell me I'll have to parse preprocessed C code using regular expressions! Then I remembered something I had heard about some time ago: a technology called… GCC plugins.

Sure, I could probably have done this with Clang’s C++ API, but people working with Ada are much more likely to have GCC available rather than a Clang plugin development setup. Also, I was looking for an opportunity to check how the GCC plugins worked. ☺

… but graybeards use GCC plugins

Since GCC 4.5, it has been possible for everyone to extend the compiler without rebuilding GCC, for instance using your favorite GNU/Linux distribution’s pre-packaged compiler. The GCC wiki lists several plugins, such as the famous DragonEgg, an LLVM backend for GCC. Note that the material here assumes GCC 6; the plugin events have changed a bit since the introduction of plugins.

So how does that work? Each plugin is a mere shared library (.so file on GNU/Linux) that is expected to expose a couple of specific symbols: a license “manifest” and an entry point function. The most straightforward way to write plugins is to use the same programming language as GCC: C++. This way, access to all internals is going to look like GCC’s own code. Note that plugins exist to expose internals in other languages: see for instance MELT or Python.

Loading a plugin is easy: just add a -fplugin=/path/to/plugin.so argument to your gcc/g++ command line: see the documentation for more details.

Now, remember the compiler pipeline architectures that you learnt at school:

Typical compiler pipeline

The GCC plugin API makes it possible to run code at various points in the pipeline: before and after individual function parsing, after type specifier parsing and before each optimization pass, etc. At plugin initialization, a special function is executed and it registers which function should be executed, and when in the pipeline (see API documentation).

But what can plugins actually do? I would say: a lot… but not everything. Plugins interact directly with GCC’s internals, so they can inspect and mutate intermediate representations whenever they are run. However, they cannot change how existing passes work, so for instance plugins will not be able to hook into debug information generation.

Let’s bind

Getting back to our enumeration business: it happens that all we need to do is to process enumeration type declarations right after they have been parsed. Fantastic, let’s create a new GCC plugin. Create the following Makefile:

# Makefile
HOST_GCC=g++
TARGET_GCC=gcc
PLUGIN_SOURCE_FILES= bind-enums.cc
GCCPLUGINS_DIR:= $(shell $(TARGET_GCC) -print-file-name=plugin)
CXXFLAGS+= -I$(GCCPLUGINS_DIR)/include -fPIC -fno-rtti -O2 -Wall -Wextra

bind-enums.so: $(PLUGIN_SOURCE_FILES)
        $(HOST_GCC) -shared $(CXXFLAGS) $^ -o $@

Then create the following plugin skeleton:

/* bind-enums.cc */
#include <stdio.h>

/* All plugin sources should start including "gcc-plugin.h".  */
#include "gcc-plugin.h"
/* This let us inspect the GENERIC intermediate representation.  */
#include "tree.h"

/* All plugins must export this symbol so that they can be linked with
   GCC license-wise.  */
int plugin_is_GPL_compatible;

/* Most interesting part so far: this is the plugin entry point.  */
int
plugin_init (struct plugin_name_args *plugin_info,
             struct plugin_gcc_version *version)
{
  (void) version;

  /* Give GCC a proper name and version number for this plugin.  */
  const char *plugin_name = plugin_info->base_name;
  struct plugin_info pi = { "0.1", "Enum binder plugin" };
  register_callback (plugin_name, PLUGIN_INFO, NULL, &pi);

  /* Check everything is fine displaying a familiar message.  */
  printf ("Hello, world!\n");

  return 0;
}

Hopefully, the code and its comments are self-explanatory. The next steps are to build this plugin and actually run it:

# To run in your favorite shell
$ make
[…]
$ gcc -c -fplugin=$PWD/bind-enums.so random-source.c
Hello, world!

So far, so good. Now let’s do something useful with this plugin: handle the PLUGIN_FINISH_TYPE event to process enumeration types. There are two things to do:

  • create a function that will get executed every time something fires at this event;
  • register this function as a callback for this event.
/* To be added before "return" in plugin_init.  */
  register_callback (plugin_name, PLUGIN_FINISH_TYPE,
                     &handle_finish_type, NULL);

/* To be added before plugin_init.  */

/* Given an enumeration type (ENUMERAL_TYPE node) and a name for it
   (IDENTIFIER_NODE), describe its enumerators on the standard
   output.  */
static void
dump_enum_type (tree enum_type, tree enum_name)
{
  printf ("Found enum %s:\n", IDENTIFIER_POINTER (enum_name));

  /* Process all enumerators.  These are encoded as a linked list of
     TREE_LIST nodes starting from TYPE_VALUES and following
     TREE_CHAIN links.  */
  for (tree v = TYPE_VALUES (enum_type);
       v != NULL;
       v = TREE_CHAIN (v))
  {
    /* Get this enumerator's value (TREE_VALUE).  Give up if it's not
       a small integer.  */
    char buffer[128] = "\"<big integer>\"";
    if (tree_fits_shwi_p (TREE_VALUE (v)))
      {
        long l = tree_to_shwi (TREE_VALUE (v));
        snprintf (buffer, 128, "%li", l);
      }

    printf ("  %s = %s\n",
            IDENTIFIER_POINTER (TREE_PURPOSE (v)),
            buffer);
  }
}

/* Thanks to register_callback, GCC will call the following for each
   parsed type specifier, providing the corresponding GENERIC node as
   the "gcc_data" argument.  */
static void
handle_finish_type (void *gcc_data, void *user_data)
{
  (void) user_data;
  tree t = (tree) gcc_data;

  /* Skip everything that is not a named enumeration type.  */
  if (TREE_CODE (t) != ENUMERAL_TYPE
      || TYPE_NAME (t) == NULL)
    return;

  dump_enum_type (t, TYPE_NAME (t));
}

These new functions finally feature GCC internals handling. As you might have guessed, tree is the type name GCC uses to designate a GENERIC node. The set of kinds for nodes is defined in tree.def (TREE_CODE (t) returns the node kind) while tree attribute getters and setters are defined in tree.h. You can find a friendlier and longer introduction to GENERIC in the GCC Internals Manual.

By the way: how do we know what GCC passes as the “gcc_data” argument? Well it’s not documented… or more precisely, it’s documented in the source code!

Rebuild the plugin and then run it on a simple example:

$ make
[…]
$ cat <<EOF > test.h
> enum simple_enum { A, B };
> enum complex_enum { C = 1, D = -3};
> typedef enum { E, H } typedef_enum;
$ gcc -c -fplugin=$PWD/bind-enums.so test.h 
Found enum simple_enum:
  A = 0
  B = 1
Found enum complex_enum:
  C = 1
  D = -3

That’s good! But wait: the input example contains 3 types whereas the plugin mentions only the first two, where’s the mistake?

This is actually expected: the first two enumerations have names (simple_enum and complex_enum) while the last one is anonymous. It’s actually the typedef that wraps it that has a name (typedef_enum). The PLUGIN_FINISH_TYPE event is called on the anonymous enum type, but as it has no name, the guard skips it: see the code above: /* Skip everything that is not a named enumeration type. */.

We need names to produce bindings, so let’s process typedef nodes.

/* To be added before "return" in plugin_init.  */
  register_callback (plugin_name, PLUGIN_FINISH_DECL,
                     &handle_finish_decl, NULL);

/* To be added before plugin_init.  */

/* Like handle_finish_type, but called instead for each parsed
   declaration.  */
static void
handle_finish_decl (void *gcc_data, void *user_data)
{
  (void) user_data;
  tree t = (tree) gcc_data;
  tree type = TREE_TYPE (t);

  /* Skip everything that is not a typedef for an enumeration type.  */
  if (TREE_CODE (t) != TYPE_DECL
      || TREE_CODE (type) != ENUMERAL_TYPE)
    return;

  dump_enum_type (type, DECL_NAME (t));
}

The PLUGIN_FINISH_DECL event is triggered for all parsed declarations: functions, arguments, variables, type definitions and so on. We want to process only typedefs (TYPE_DECL) that wrap (TREE_TYPE) enumeration types, hence the above guard.

Rebuild the plugin and run it once again:

$ make
[…]
$ gcc -c -fplugin=$PWD/bind-enums.so test.h 
Found enum simple_enum:
  A = 0
  B = 1
Found enum complex_enum:
  C = 1
  D = -3
Found enum typedef_enum:
  E = 0
  H = 1

Fine, it looks like we have covered all the cases. What remains to be done now is to tune the plugin so that its output is easy to parse, for instance using the JSON format, and then to write a simple script that turns this JSON data into the expected Ada spec for our enumeration types. That's easy enough, but it goes beyond the scope of what is already a long enough post, don't you think?
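To give an idea of the target, the generated Ada spec for uv_errno_t could look like the following sketch (the type name, casing and the handful of values shown are hypothetical; note that the literals of an enumeration representation clause must appear in increasing value order):

   --  Hypothetical generator output: a genuine Ada enumeration plus a
   --  representation clause carrying the platform-specific values.
   type Errno is (EADDRINUSE, EACCES, E2BIG);
   for Errno use
     (EADDRINUSE => -98,
      EACCES     => -13,
      E2BIG      => -7);
   for Errno'Size use 32;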

Help! My plugin went mad!

So you have started to write your own plugins but something is wrong: unexpected results, GCC internal error, crash, … How can we investigate such issues?

Of course you can do like hardcore programmers do and sow debug prints all over your code. However, if you are more like me and prefer using a debugger or tools such as Valgrind, there is a way. First edit your Makefile so that it builds with debug flags: replace the occurrence of -O2 with -O0 -g3. This stands for: no optimization and debug information including macros. Then rebuild the plugin.

Running GDB over the gcc/g++ command is not the next step as it’s just a driver that will spawn subprocesses which perform the actual compilation. Instead, run your usual gcc/g++ command with additional -v -save-temps flags. This will print a lot, including the command lines for the various subprocesses involved:

$ gcc -c -fplugin=$PWD/bind-enums.so test.h -v -save-temps |& grep fplugin
COLLECT_GCC_OPTIONS='-c' '-fplugin=/tmp/bind-enums/bind-enums.so' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-pc-linux-gnu/6.1.1/cc1 -E -quiet -v -iplugindir=/usr/lib/gcc/x86_64-pc-linux-gnu/6.1.1/plugin test.h -mtune=generic -march=x86-64 -fplugin=/tmp/bind-enums/bind-enums.so -fpch-preprocess -o test.i
COLLECT_GCC_OPTIONS='-c' '-fplugin=/tmp/bind-enums/bind-enums.so' '-v' '-save-temps' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-pc-linux-gnu/6.1.1/cc1 -fpreprocessed test.i -iplugindir=/usr/lib/gcc/x86_64-pc-linux-gnu/6.1.1/plugin -quiet -dumpbase test.h -mtune=generic -march=x86-64 -auxbase test -version -fplugin=/tmp/bind-enums/bind-enums.so -o test.s --output-pch=test.h.gch

The above output contains four lines: let’s forget about line 1 and line 3; line 2 is the preprocessor invocation (note the -E flag), while line 4 performs the actual C compilation. As it's the latter that actually parses the input as C code, it's the one that triggers the plugin events. And it's the one to run under GDB:

# Run GDB in quiet mode so that we are not flooded in verbose output.
$ gdb -q --args /usr/lib/gcc/x86_64-pc-linux-gnu/6.1.1/cc1 -fpreprocessed test.i -iplugindir=/usr/lib/gcc/x86_64-pc-linux-gnu/6.1.1/plugin -quiet -dumpbase test.h -mtune=generic -march=x86-64 -auxbase test -version -fplugin=/tmp/bind-enums/bind-enums.so -o test.s --output-pch=test.h.gch
Reading symbols from /usr/lib/gcc/x86_64-pc-linux-gnu/6.1.1/cc1...(no debugging symbols found)...done.

# Put a breakpoint in your plugin. The plugin is a dynamically loaded shared
# library, so it's expected that GDB cannot find the plugin yet.
(gdb) b plugin_init
Function "plugin_init" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (plugin_init) pending.

# Run the compiler: thanks to the above breakpoint, we will get in plugin_init.
(gdb) run
Starting program: /usr/lib/gcc/x86_64-pc-linux-gnu/6.1.1/cc1 -fpreprocessed test.i -iplugindir=/usr/lib/gcc/x86_64-pc-linux-gnu/6.1.1/plugin -quiet -dumpbase test.h -mtune=generic -march=x86-64 -auxbase test -version -fplugin=/tmp/bind-enums/bind-enums.so -o test.s --output-pch=test.h.gch

Breakpoint 1, plugin_init (plugin_info=0x1de5d30, version=0x1c7fdc0) at bind-enums.cc:63
63        const char *plugin_name = plugin_info->base_name;
(gdb) # Victory!

Debugging can lead you out of your plugin and into GCC’s own code. If this happens, you will need to build your own GCC to include debug information. This is a complex task for which, as far as I know, there is unfortunately no documentation.

Final words

As this small example (final files attached) demonstrates, GCC plugins can be quite useful. This time, we just hooked into some kind of parse tree but it’s possible to deal with all intermediate representations in the compilation pipeline: go and check out what the plugins listed on GCC’s wiki can do!

Attachments

]]>
Make with Ada: ARM Cortex-M CNC controller http://blog.adacore.com/make-with-ada-arm-cortex-m-cnc-controller Wed, 01 Jun 2016 14:03:42 +0000 Fabien Chouteau http://blog.adacore.com/make-with-ada-arm-cortex-m-cnc-controller


I started this project more than a year ago. It was supposed to be the first Make with Ada project, but it became the most challenging one, on both the hardware and the software side.

CNC and Gcode

CNC stands for Computerized Numerical Control. It’s the automatic control of multi-axis machines, such as lathes, mills, laser cutters or 3D printers.

Most CNC machines are operated via the Gcode language. This language provides commands to move the machine tool along the different axes.

Here are some examples of commands:

M17 ; Enable the motors
G28 ; Homing. Go to a known position, usually at the beginning of each axis
G00 X10.0 Y20.0 ; Fast positioning motion. Move to (X, Y) position as fast as possible
G01 X20.0 Y30.0 F50.0 ; Linear motion. Move to (X, Y) position at F speed (mm/sec)
G02/G03 X20.0 Y10.0 J-10.0 I0.0 ; Circular motion. Starting from the current position, move to (X, Y) along the circle of center (Current_Position + (I, J))
M18 ; Disable the motors

The Gcode above will produce this tool motion:

Most of the time, machinists will use software to generate Gcode from a CAD design file. In my case, I used Inkscape (the vector graphics editor) and a plugin that generates Gcode from the graphics. Here is an example of Gcode which I used with my machine: make_with_ada.gcode.

Hardware

To achieve precise movements, CNC machines use a special kind of electric motor: the stepper motor. With the appropriate electronic driver, a stepper motor rotates by a small, fixed angle every time a step signal is received by the driver.

The rotational motion of the motor is translated to linear motion with a leadscrew. From the characteristics of both the motor and the leadscrew, we can determine the number of steps required to move one millimeter. This information is then used by the CNC controller to convert motion commands into a precise number of steps to be executed on each motor.
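
To make this concrete, here is a small, self-contained Ada sketch of that computation; the figures used (200 full steps per revolution, 8x microstepping, a 3 mm leadscrew pitch) are made-up values for illustration, not the actual characteristics of my machine:

with Ada.Text_IO; use Ada.Text_IO;

procedure Steps_Per_MM_Demo is
   --  Hypothetical values, for illustration only
   Steps_Per_Revolution : constant := 200;   --  Full steps per motor turn
   Microstepping        : constant := 8;     --  Driver microstepping factor
   Leadscrew_Pitch      : constant := 3.0;   --  Millimeters of travel per turn

   Steps_Per_MM : constant Float :=
     Float (Steps_Per_Revolution * Microstepping) / Leadscrew_Pitch;
begin
   Put_Line ("Steps per millimeter:" & Float'Image (Steps_Per_MM));
end Steps_Per_MM_Demo;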


To create my own small CNC machine, I used tiny stepper motors from old DVD drives and floppy disk drives. I mounted them on the enclosure of one of the DVD drives. The motors are driven by low voltage stepper drivers from Pololu and the software is running on an STM32F4 Discovery.

Software

My CNC controller is greatly inspired by the Grbl project. Besides the fact that my controller runs on an STM32F4 (ARM Cortex-M4F) while Grbl runs on an Arduino (AVR 8-bit), the main difference is the use of the tasking provided by the Ada language and the Ravenscar run-time.

The embedded application is running in 3 tasks:

  1. Gcode interpreter: This task waits for Gcode commands from the UART port, parses them and translates all the motion commands into absolute linear movements (Motion blocks). The circle interpolations are transformed into a list of linear motions that will approximate the circle.
  2. The planner: This task takes motion blocks as an input and splits each one into multiple segments. A segment is a portion of a motion block with a constant speed. By setting the speed of each segment within a block, the planner can create acceleration and deceleration profiles (as seen in the video).
  3. Stepper: This is a periodic task that generates the step signals sent to the motors. The frequency of the task depends on the feed rate required for the motion and on the acceleration profile computed by the planner. A higher frequency means more steps per second and therefore a higher motion speed (a minimal sketch of such a periodic task is shown below).
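
Here is a minimal sketch of what such a periodic task can look like with the Ravenscar run-time; Stepper_Period and Do_Step_Cycle are hypothetical placeholders, and the real code in the repository is more elaborate:

--  Minimal Ravenscar-style sketch; Ravenscar requires tasks to be declared
--  at library level, hence the package. Stepper_Period and Do_Step_Cycle
--  are placeholders, not the names used in the actual project.
package Stepper_Sketch is
   pragma Elaborate_Body;
end Stepper_Sketch;

with Ada.Real_Time; use Ada.Real_Time;

package body Stepper_Sketch is

   Stepper_Period : constant Time_Span := Microseconds (100);

   procedure Do_Step_Cycle is
   begin
      null;  --  Placeholder: compute and emit the step pulses for this period
   end Do_Step_Cycle;

   task Stepper;

   task body Stepper is
      Next_Release : Time := Clock;
   begin
      loop
         Do_Step_Cycle;
         Next_Release := Next_Release + Stepper_Period;
         delay until Next_Release;
      end loop;
   end Stepper;

end Stepper_Sketch;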

Gcode simulator and machine interface

To be able to quickly evaluate my Gcode/stepper algorithms and to control my CNC machine, I developed a native (Linux/Windows) application.

The center part of the application shows the simulated motions from the Gcode. The top left button matrix provides manual control of the CNC machine, for example ‘move left’ by 10 mm. The left text view shows the Gcode to be simulated/executed. The bottom text view (empty on this screenshot) shows the messages sent back to us by the machine.

The Ada code running in the microcontroller and the code running in the simulation tool are 100% the same. This allows for very easy development, test and validation of the Gcode interpreter, motion planner and stepper algorithm.

Give me the code!!!

As with all the Make with Ada projects, the code is available on GitHub: here. Fork it, build it, use it, improve it.

]]>
GNATprove Tips and Tricks: Using the Lemma Library http://blog.adacore.com/gnatprove-tips-and-tricks-using-the-lemma-library Wed, 25 May 2016 04:00:00 +0000 Yannick Moy http://blog.adacore.com/gnatprove-tips-and-tricks-using-the-lemma-library

A well-known result of computability theory is that the theory of arithmetic is undecidable: one cannot design an algorithm which takes any arithmetic formula (with quantifiers and the usual arithmetic operators) and returns "true" when it is an "always true" formula. This is already the case with only addition, subtraction, and multiplication as arithmetic operators (called Peano arithmetic). Only when one limits the operators to addition and subtraction (called Presburger arithmetic) do we have decision procedures, that is, algorithms to decide whether a formula is always true or not.

This has practical consequences for the automatic proof of programs which manipulate numbers. The provers that we use in SPARK have good support for addition and subtraction, but much weaker support for multiplication and division (except for multiplication by an integer constant, which is treated as repeated addition). This means that as soon as the program has multiplications and divisions, it is likely that some checks won't be proved automatically. Until recently, the only way forward was either to complete the proof using an interactive prover (like Coq or Isabelle/HOL) or to justify manually the message about an unproved check. There is now a better way to prove such checks automatically, using the recent SPARK lemma library.

Let's see how this works on an example of code, which applies the same ratio to all numbers in a sorted array:

package Math with
  SPARK_Mode
is
   subtype Value is Integer range 0 .. 10_000;
   type Index is range 1 .. 100;
   type Values is array (Index) of Value;

   function Sorted (V : Values) return Boolean is
     (for all J in Index'First .. Index'Last - 1 => V(J) <= V(J+1));

   subtype Sorted_Values is Values with
     Dynamic_Predicate => Sorted (Sorted_Values);

   procedure Apply_Ratio (V : in out Sorted_Values; Num, Denom : Value) with
     Pre  => Denom /= 0 and then Num <= Denom,
     Post => (for all J in Index => V(J) = V'Old(J) * Num / Denom);

end Math;

The implementation goes over the array, applying the ratio to each element in turn:

package body Math with
  SPARK_Mode
is
   procedure Apply_Ratio (V : in out Sorted_Values; Num, Denom : Value) is
   begin
      for J in Index loop
         V(J) := V(J) * Num / Denom;
      end loop;
   end Apply_Ratio;

end Math;

In order to prove the postcondition of Apply_Ratio, we must add a suitable loop invariant that states that:

  1. the ratio has been applied to all elements up to the current element
  2. all other elements are still equal to their value when entering the loop

Note that this kind of loop is quite common, and the most precise loop invariant for such loops (matching the two-part description above) is given in the SPARK User's Guide:

pragma Loop_Invariant
  (for all K in Index'First .. J => V(K) = V'Loop_Entry(K) * Num / Denom);
pragma Loop_Invariant
  (for all K in J + 1 .. Index'Last => V(K) = V'Loop_Entry(K));

With this loop invariant, the postcondition of procedure Apply_Ratio is proved, but two checks are not proved on the assignment to V(J) inside the loop:

math.adb:7:10: medium: predicate check might fail
math.adb:7:29: medium: range check might fail

Let's start with the unproved range check on V(J) * Num / Denom. We know that this value is between 0 and V(J), because Num / Denom is a ratio, as expressed in the precondition of Apply_Ratio: Num <= Denom. But because this expression involves multiplication and division between variables, current automatic provers have no clue! The solution here is to call a lemma that provides this information in its postcondition. Fortunately, there is such a lemma in the SPARK lemma library:

   procedure Lemma_Mult_Scale
     (Val         : Int;
      Scale_Num   : Nat;
      Scale_Denom : Pos;
      Res         : Int)
   with
     Global => null,
     Pre  => Scale_Num <= Scale_Denom and then
             Res = (Val * Scale_Num) / Scale_Denom,
     Post => abs (Res) <= abs (Val) and then
             (if Val >= 0 then Res >= 0 else Res <= 0);

Let's call this lemma in our code as follows:

for J in Index loop
   Lemma_Mult_Scale (Val         => V(J),
                     Scale_Num   => Num,
                     Scale_Denom => Denom,
                     Res         => V(J) * Num / Denom);
   V(J) := V(J) * Num / Denom;
   ...

Now the range check is proved by GNATprove. So let's turn to the predicate check. The issue is that variable V is of type Sorted_Values, which is subject to the predicate Sorted. Hence, V should still be sorted after the assignment to V(J). This holds because the array was originally sorted, so applying the same ratio to each element in increasing order (i.e. from left to right) maintains the property that the array is sorted. Mathematically, this relies on the properties of multiplication and division, which are monotonic on natural numbers: given two non-negative values A and B such that A is less than or equal to B, multiplying or dividing them by a positive value C results in two values A' and B' such that A' is less than or equal to B'. Again, current automatic provers have a hard time figuring this out, so we're going to call lemmas to provide this information. Here they are, from the SPARK lemma library:

procedure Lemma_Mult_Is_Monotonic
     (Val1   : Int;
      Val2   : Int;
      Factor : Nat)
   with
     Global => null,
     Pre  => Val1 <= Val2,
     Post => Val1 * Factor <= Val2 * Factor;

   procedure Lemma_Div_Is_Monotonic
     (Val1  : Int;
      Val2  : Int;
      Denom : Pos)
   with
     Global => null,
     Pre  => Val1 <= Val2,
     Post => Val1 / Denom <= Val2 / Denom;

Now, we need to apply the lemmas to some values. Here, we'd like to apply them to the initial values of V(J-1) and V(J), which need to be maintained in the same order while they are multiplied by Num and divided by Denom. The initial value of V(J) is available before assigning to V(J), but not the initial value of V(J-1) which has been updated already. So we'll use ghost code to make it available at the J'th iteration of the loop, by declaring a local ghost variable in procedure Apply_Ratio:

 Prev_Value : Value := 0 with Ghost;

which is updated before the assignment to V(J):

Prev_Value := V(J);

and whose value is passed on for proof to the next iteration of the loop through a loop invariant:

pragma Loop_Invariant (Prev_Value = V'Loop_Entry(J));

That's it! We can now call the lemmas in our code as follows:

for J in Index loop
   Lemma_Mult_Is_Monotonic (Val1   => Prev_Value,
                            Val2   => V(J),
                            Factor => Num);
   Lemma_Div_Is_Monotonic (Val1  => Prev_Value * Num,
                           Val2  => V(J) * Num,
                           Denom => Denom);
   pragma Assert (if J > Index'First then
                    V(J - 1) <= V(J) * Num / Denom);

   Prev_Value := V(J);
   V(J) := V(J) * Num / Denom;
   ...

Note the intermediate assertion which expresses the desired property proved by the use of lemmas. Now, GNATprove proves the intermediate assertion by using the lemmas, and it proves the predicate check by using the intermediate assertion. Here is the final code of the procedure Apply_Ratio, which is fully proved by GNATprove:

with SPARK.Integer_Arithmetic_Lemmas; use SPARK.Integer_Arithmetic_Lemmas;

package body Math with
  SPARK_Mode
is
   procedure Apply_Ratio (V : in out Sorted_Values; Num, Denom : Value) is
      Prev_Value : Value := 0 with Ghost;
   begin
      for J in Index loop
         Lemma_Mult_Scale (Val         => V(J),
                           Scale_Num   => Num,
                           Scale_Denom => Denom,
                           Res         => V(J) * Num / Denom);
         Lemma_Mult_Is_Monotonic (Val1   => Prev_Value,
                                  Val2   => V(J),
                                  Factor => Num);
         Lemma_Div_Is_Monotonic (Val1  => Prev_Value * Num,
                                 Val2  => V(J) * Num,
                                 Denom => Denom);
         pragma Assert (if J > Index'First then
                          V(J - 1) <= V(J) * Num / Denom);

         Prev_Value := V(J);
         V(J) := V(J) * Num / Denom;

         pragma Loop_Invariant (Sorted (V));
         pragma Loop_Invariant (Prev_Value = V'Loop_Entry(J));
         pragma Loop_Invariant
           (for all K in Index'First .. J => V(K) = V'Loop_Entry(K) * Num / Denom);
         pragma Loop_Invariant
           (for all K in J + 1 .. Index'Last => V(K) = V'Loop_Entry(K));
      end loop;
   end Apply_Ratio;

end Math;

[The loop invariant stating that Sorted(V) is currently needed due to how type predicates are handled around loops, but this will soon be inferred automatically by GNATprove.]

The SPARK lemma library thus fills an interesting point in the matrix of techniques between fully automated proof and interactive proof, which can be depicted on this graph:

On the Y axis, the complexity of the property to prove increases. On the X axis, the complexity for the user to apply the technique increases. So we start with the default settings (no user effort), which are good to prove properties with a (possibly complex) boolean structure and linear integer arithmetic, then continue with higher proof levels (switch --level of GNATprove with values 1, 2, 3 or 4) to prove more complex properties with quantifiers and modular arithmetic, then continue with the SPARK lemma library for nonlinear arithmetic, then with intermediate assertions and ghost code (which we used also here), then manual proof using an interactive theorem prover.

For more details about the SPARK lemma library, see the SPARK User's Guide.

]]>
Quantifying over Elements of a Container http://blog.adacore.com/quantifying-over-elements-of-a-container Tue, 17 May 2016 04:00:00 +0000 Claire Dross http://blog.adacore.com/quantifying-over-elements-of-a-container

Containers holding several items of the same type such as arrays, lists, or sets are a common occurrence in computer programs. Stating a property over such containers often involves quantifying over the elements they contain.

As seen in a previous post, in SPARK 2014, it is possible to allow quantification over any such container type using an Iterable aspect.
This aspect provides the primitives of a container type that will be used to iterate over its content. For example, if we write:

type Container is private with
     Iterable => (First       => First,
                  Next        => Next,
                  Has_Element => Has_Element);

where

function First (S : Container) return Cursor;
function Has_Element (S : Container; C : Cursor) return Boolean;
function Next (S : Container; C : Cursor) return Cursor;

then quantification over containers can be done using the type Cursor. For example, we could state:

(for all C in S => P (Element (S, C)))

to say that S only contains elements for which a property P holds. For execution, this expression is translated as a loop using the provided First, Has_Element, and Next primitives. For proof, it is translated as a logic quantification over every element of type Cursor. To restrict the property to cursors that are actually valid in the container, the provided function Has_Element is used. For example, the property stated above becomes:

(for all C : Cursor => (if Has_Element (S, C) then P (Element (S, C))))

As with the standard Ada iteration mechanism, it is possible to allow quantification directly over the elements of the container by additionally providing an Element primitive in the Iterable aspect. For example, if we write:

 type Container is private with
     Iterable => (First       => First,
                  Next        => Next,
                  Has_Element => Has_Element,
                  Element     => Element);

where

function Element (S : Container; C : Cursor) return Element_Type;

then quantification over containers can be done directly on its elements. For example, we could rewrite the above property into:

 (for all E of S => P (E))

For execution, quantification over elements of a container is translated as a loop over its cursors. In the same way, for proof, quantification over elements of a container is no more than syntactic sugar for quantification over its cursors. For example, the above property is translated using quantification over cursors:

   (for all C : Cursor => (if Has_Element (S, C) then P (Element (S, C))))

Depending on the application, this translation may be too low-level and introduce an unnecessary burden on the automatic provers. As an example, let us consider a package for functional sets:

package Sets with SPARK_Mode is

   type Cursor is private;
   type Set (<>) is private with
     Iterable => (First       => First,
                  Next        => Next,
                  Has_Element => Has_Element,
                  Element     => Element);

   function Mem (S : Set; E : Element_Type) return Boolean with
     Post => Mem'Result = (for some F of S => F = E);

   function Intersection (S1, S2 : Set) return Set with
     Post => (for all E of Intersection'Result => Mem (S1, E) and Mem (S2, E))
     and (for all E of S1 => (if Mem (S2, E) then Mem (Intersection'Result, E)));
(...)

Sets contain elements of type Element_Type. The most basic operation on sets is the membership test, here provided by the Mem subprogram. Every other operation, such as intersection here, is then specified in terms of members. The iteration primitives First, Next, Has_Element, and Element, which take a value of a private type Cursor as an argument, are provided only for the sake of quantification.

Following the scheme described previously, the postcondition of Intersection is translated for proof as:

(for all C : Cursor =>
      (if Has_Element (Intersection'Result, C) then
             Mem (S1, Element (Intersection'Result, C))
         and Mem (S2, Element (Intersection'Result, C))))
and
(for all C1 : Cursor =>
      (if Has_Element (S1, C1) then
             (if Mem (S2, Element (S1, C1)) then
                   Mem (Intersection'Result, Element (S1, C1)))))

Using the postcondition of Mem, this can be refined further into:

(for all C : Cursor =>
   (if Has_Element (Intersection'Result, C) then
        (for some C1 : Cursor =>
           Has_Element (S1, C1) and Element (Intersection'Result, C) = Element (S1, C1))
    and (for some C2 : Cursor =>
           Has_Element (S2, C2) and Element (Intersection'Result, C) = Element (S2, C2))))
and
(for all C1 : Cursor =>
   (if Has_Element (S1, C1) then
        (if (for some C2 : Cursor =>
               Has_Element (S2, C2) and Element (S1, C1) = Element (S2, C2))
         then (for some C : Cursor =>
                 Has_Element (Intersection'Result, C)
                 and Element (Intersection'Result, C) = Element (S1, C1)))))

Though perfectly valid, this translation may produce complicated proofs, especially when verifying complex properties over sets. For the SPARK formalization of the new ada-trait-based containers, we introduced a new GNATprove annotation that can be used to change the way for ... of quantification is translated. It can be used to provide GNATprove with a "Contains" function, which will then be used for quantification. For example, on our sets, we could write:

function Mem (S : Set; E : Element_Type) return Boolean;
pragma Annotate (GNATprove, Iterable_For_Proof, "Contains", Mem);

With this annotation, the postcondition of Intersection is translated in a simpler way, using logic quantification directly over elements:

(for all E : Element_Type =>
       (if Mem (Intersection'Result, E) then Mem (S1, E) and Mem (S2, E)))
and (for all E : Element_Type =>
       (if Mem (S1, E) then
              (if Mem (S2, E) then Mem (Intersection'Result, E))))

Note that care should be taken to provide an appropriate Contains function, which returns True if and only if the element E is present in S. This assumption will not be verified by GNATprove.

The annotation Iterable_For_Proof can also be used in another case. Operations over complex data structures are sometimes specified using operations over a simpler model type. In this case, it may be more appropriate to translate for ... of quantification as quantification over the model's cursors. As an example, let us consider a package of linked lists that is specified using a sequence that allows accessing the element stored at each position:

package Lists with SPARK_Mode is

   type Sequence is private with
     Ghost,
     Iterable => (...,
                  Element     => Get);
   function Length (M : Sequence) return Natural with Ghost;
   function Get (M : Sequence; P : Positive) return Element_Type with
     Ghost,
     Pre => P <= Length (M);

   type Cursor is private;
   type List is private with
     Iterable => (...,
                  Element     => Element);

   function Position (L : List; C : Cursor) return Positive with Ghost;
   function Model (L : List) return Sequence with
     Ghost,
     Post => (for all I in 1 .. Length (Model'Result) =>
                  (for some C in L => Position (L, C) = I));

   function Element (L : List; C : Cursor) return Element_Type with
     Pre  => Has_Element (L, C),
     Post => Element'Result = Get (Model (L), Position (L, C));

   function Has_Element (L : List; C : Cursor) return Boolean with
     Post => Has_Element'Result = (Position (L, C) in 1 .. Length (Model (L)));

   procedure Append (L : in out List; E : Element_Type) with
     Post => Length (Model (L)) = Length (Model (L))'Old + 1
     and Get (Model (L), Length (Model (L))) = E
     and (for all I in 1 .. Length (Model (L))'Old =>
            Get (Model (L), I) = Get (Model (L'Old), I));

   function Init (N : Natural; E : Element_Type) return List with
     Post => Length (Model (Init'Result)) = N
       and (for all F of Init'Result => F = E);
(...)

Elements of lists can only be accessed through cursors. To easily specify the effects of position-based operations such as Append, we introduce a ghost type Sequence (an unbounded array indexed by positive integers) that is used to represent logically the content of the linked list in specifications. The sequence associated with a list can be constructed using the Model function. Following the usual translation scheme for quantified expressions, the last line of the postcondition of Init is translated for proof as:

(for all C : Cursor => (if Has_Element (Init'Result, C) then Element (Init'Result, C) = E));

Using the definition of Element and Has_Element, it can then be refined further into:

(for all C : Cursor => (if Position (Init'Result, C) in 1 .. Length (Model (Init'Result))
                        then Get (Model (Init'Result), Position (Init'Result, C)) = E));

To be able to link this property with other properties specified directly on models, like the postcondition of Append, it needs to be lifted to iterate over positions instead of cursors. This can be done using the postcondition of Model, which states that there is a valid cursor in L for each position of its model. This lifting requires a lot of quantifier reasoning from the prover, thus making proofs more difficult.

The GNATprove Iterable_For_Proof annotation can be used to provide GNATprove with a "Model" function, that will be called to translate quantification on complex containers toward quantification on their model. For example, on our lists, we could write:

function Model (L : List) return Sequence;
pragma Annotate (GNATprove, Iterable_For_Proof, "Model", Entity => Model);

With this annotation, the postcondition of Init is translated directly as a quantification on the elements of the result's model:

(for all I : Positive => (if I in 1 .. Length (Model (Init'Result)) 
                          then Get (Model (Init'Result), I) = E));

As with the previous annotation, care should be taken to define the model function so that it always returns a model containing exactly the same elements as L.

On the formal version of the ada-trait-based containers, tuning the way quantification is translated for proof is really important. Indeed, their specification is container intensive, as up to three different functional containers are used as models of imperative containers, and also quite complex, as it must allow reasoning efficiently on containers at different levels. As an example, a "Contains" Iterable_For_Proof annotation is used to avoid introducing a parasitic type of cursors in the functional model of imperative sets. It is combined with a "Model" annotation on imperative sets, so that quantification over elements of a set is translated directly as a high-level, content-based property on functional sets rather than as a low-level, cursor-aware quantification. As a result, if S is an imperative set, the property:

(for all E of S => P (E))

is translated for proof as:

(for all E : Element_Type => (if Mem (Model (S), E) then P (E)))

which both makes the work of the provers easier and allows a clean separation between low-level, cursor-aware properties and high-level, content-based specifications.

]]>
SPARKSMT - An SMTLIB Processing Tool Written in SPARK - Part I http://blog.adacore.com/sparksmt-part-1 Thu, 21 Apr 2016 04:00:00 +0000 Florian Schanda http://blog.adacore.com/sparksmt-part-1

Dear Diary,

Today I will write the first article in a short series about the development of an SMTLIB processing tool in SPARK. Instead of focusing on features, I intend to focus on how I have proved the absence of run-time errors in the name table and lexer. I had two objectives: show absence of run-time errors, and avoid writing useless defensive code. Today's blog will be about the name table, a data structure found in many compilers that maps strings to unique integers and back. The next blog post will talk about the lexical analyzer.

The interface

Strings are unconstrained arrays and as such indefinite types, so keeping them around without using pointers in an Ada program is often a bit inconvenient. In particular, you will not be able to use any of the definite containers, nor can you create a formal hash table with strings as the key. Thus the name table will create a unique integer for each string, but unlike a simple hash this process should be trivially reversible.

The name table provides a new private type Name_Id (which is just a Natural), and the following two subprograms:

   procedure Lookup (S : String;
                     N : out Name_Id)
   with Global => (In_Out => Name_Table),
        Pre    => Invariant,
        Post   => Invariant;
   --  Obtain name_id for the given string.

   function To_String (N : Name_Id) return String
   with Global => Name_Table,
        Pre    => Invariant;
   --  The original string.
  

You will notice at this point the Invariant. We establish it in the elaboration of the package, and from then onward we preserve it. Maintaining such an invariant is a common pattern when the proof obligations are non-trivial. The invariant is just a ghost function:

 function Invariant return Boolean
   with Ghost,
        Global => Name_Table;
   --  Internal invariant for the name table.

The central idea for proof in the name table is that you can only ever add to it, and once you have added a string to it, that string can never be removed. In other words, any Name_Id you have will always stay valid.

The internals

The basic concept of the name table is to have an unbounded array (implemented using Ada.Containers.Formal_Vectors) containing characters, such as (f, o, o, b, a, r, ...). We then have an array that tells us how to produce strings from Name_Id objects. Such a Name_Entry is a small record:

 type Name_Entry is record
      Table_Index : Char_Table_Index;
      Length      : Positive;
   end record;
  

 And the name table (again, an instance of Formal_Vectors) is an array of these, for example:

 1 => (Table_Index => 1,
         Length      => 3),
   2 => (Table_Index => 4,
         Length      => 3),
   3 => (Table_Index => 1,
         Length      => 6),
   ...
  

In this example Name_Id 1 to 3 represent the string 'foo', 'bar' and 'foobar' respectively.

It should be fairly obvious how to create this table, but it's not terribly efficient. We either have a lot of duplicate entries (and then we lose the ability to test if two strings are equal by only comparing their Name_Id), or we need to search the entire table for duplicates. The solution of course is to add a hash table on top of this, so we extend the Name_Entry to also be a linked list for a hash bucket.

   Hash_Table_Size : constant := 256;
   subtype Hash_Table_Index_T is Hash_Type range 0 .. (Hash_Table_Size - 1);
   type Hash_Table_T is array (Hash_Table_Index_T) of Name_Id;

   Hash_Table  : Hash_Table_T := (others => 0);

   type Name_Entry is record
      Table_Index : Char_Table_Index;
      Length      : Positive;
      Next_Hash   : Name_Id;
   end record;

The process for searching a string is now: first check the Hash_Table which will point to the first Name_Entry with a matching hash. If this is not a match, we follow Next_Hash until we get to the end of this linked list.

So far we have not proven anything, so what *are* the proof obligations? The main proof obligation comes from To_String, where we wish to look into the table and then take Length characters from the character array, starting at Table_Index. We could write defensive code, but we can also tackle this with proof. This is the implementation of To_String:

 function To_String (N : Name_Id) return String
   is
      --  The only names are the ones produced by Lookup.
      pragma Assume (N <= Last_Index (Entry_Table));
   begin
      if N = 0 then
         return "";
      end if;
      declare
         E : constant Name_Entry := Element (Entry_Table, N);
         L : constant Positive   := E.Length;
      begin
         return S : String (1 .. L) do
            for I in Positive range 1 .. L loop
               S (I) := Element (Char_Table,
                                 E.Table_Index + Char_Table_Index (I - 1));
               pragma Loop_Invariant (Invariant);
            end loop;
         end return;
      end;
   end To_String;
  

The assumption is something we establish in two places: a) we cannot ever delete entries (this cannot be expressed yet in SPARK annotations), and b) when we add a new entry to the Entry_Table, the Name_Id will be equal to Last_Index (Entry_Table) (this part is proven). Taken together, this justifies the assumption.

We special-case the empty string, but then we just look into the table and pull out the elements of the character array. However, we need to show that E.Table_Index + Char_Table_Index (I - 1) is within the bounds of the container.

So, how do we do this? Essentially, we'd like to just know that any given element that we pull out of the table magically has this property. The property itself is easy enough to express:

 function Valid_Entry (E : Name_Entry) return Boolean is
      (E.Next_Hash <= Last_Index (Entry_Table) and
       Char_Table_Index (E.Length - 1) <= Last_Index (Char_Table) - E.Table_Index)
   with Ghost,
        Pre => Valid_Tables;

This predicate expresses two things: the linked list pointer will point to something inside the table (i.e. it's safe to follow the link) and that the range of characters described is within the range of the character table.

We can now easily write a predicate that expresses this property over the entire table using a quantified expression:

function Valid_Name_Table return Boolean is
      (for all I in Name_Id
         range First_Index (Entry_Table) .. Last_Index (Entry_Table) =>
         Valid_Entry (Element (Entry_Table, I)))
   with Ghost,
        Pre => Valid_Tables;
  

The precondition Valid_Tables just encodes the size limit on the two tables and is:

 function Valid_Tables return Boolean is
      (Last_Index (Entry_Table) <= Name_Id'Last and
       Last_Index (Char_Table) <= Char_Table_Index'Last)
   with Ghost;

We also need to say something about the hash table. This invariant records that any entry in the hash table will point to something sensible in the entry table (or is 0):

   function Valid_Hashes return Boolean is
      (for all H in Hash_Table_Index_T =>
         Hash_Table (H) <= Last_Index (Entry_Table))
   with Ghost;

We finally wrap this up in the single invariant:

 function Invariant return Boolean is
      ((Valid_Tables and then Valid_Name_Table) and
       Valid_Hashes);
  

Since Lookup always maintains this invariant, and it is a precondition to To_String, this is all we need to show that we will never access anything not yet allocated in the character table.

For completeness' sake, here is the code for Lookup (it's more complex, and I'll leave it as an exercise to the reader to work out how it maintains the invariant).

  procedure Lookup (S : String;
                     N : out Name_Id)
   is
      Ptr : Name_Id := 0;
      H   : constant Hash_Table_Index_T :=
        Ada.Strings.Hash (S) mod Hash_Table_Size;
   begin
      --  Special case the empty string.
      if S'Length = 0 then
         N := 0;
         return;
      end if;

      --  Look at the first element of the hash bucket.
      N := Hash_Table (H);
      if N in Valid_Name_Id then
         --  It exists...
         Ptr := N;
         loop
            --  If it matches the string, then we stop here.
            if To_String (Ptr) = S then
               N := Ptr;
               return;
            end if;

            pragma Loop_Invariant
              (Ptr in Valid_Name_Id and
               Ptr <= Last_Index (Entry_Table) and
               Invariant);

            --  Otherwise, we follow the linked list until we hit the end.
            exit when Element (Entry_Table, Ptr).Next_Hash = 0;
            Ptr := Element (Entry_Table, Ptr).Next_Hash;
         end loop;
      end if;

      --  We have not found the string, so at this point we need to add it
      --  to the table.
      Merge (S, N);

      --  And then fix up the hash table. In the first case we need to
      --  update the last link in the hash bucket; in the second case we
      --  update the top-level hash table as its the first item in the
      --  bucket.
      if Ptr in Valid_Name_Id then
         Replace_Element (Entry_Table,
                          Ptr,
                          Element (Entry_Table, Ptr)'Update (Next_Hash => N));
      else
         Hash_Table (H) := N;
      end if;
   end Lookup;

It really helped the proof effort to separate out the code that actually adds a new entry to the table. The code for Merge (which also maintains the invariant) is shown below. Note there are two more assumptions here, both expressing that we will never run out of memory (one for the character table, one for the entry table). We could avoid these assumptions if we had a special Error token to return, but here we just assume that we will not run out of space. Also observe the postcondition of Merge: it maintains the invariant and records that the generated Name_Id will be within the bounds of the entry table.

 procedure Merge (S : String;
                    N : out Name_Id)
   with Global => (In_Out   => (Char_Table, Entry_Table),
                   Proof_In => Hash_Table),
        Pre    => Invariant,
        Post   => Invariant and
                  Length (Entry_Table) = Length (Entry_Table)'Old + 1 and
                  Length (Char_Table) >= Length (Char_Table)'Old and
                  N = Last_Index (Entry_Table)
   is
      --  If this is not true, then we're out of memory on the string or
      --  entry table...
      pragma Assume (Last_Index (Char_Table)
                       < Char_Table_Index'Last - S'Length);
      pragma Assume (Last_Index (Entry_Table) < Name_Id'Last);
   begin
      --  Add the string to the character table.
      for I in Positive range 1 .. S'Length loop
         Append (Char_Table, S (I + (S'First - 1)));
         pragma Loop_Invariant
           (Invariant and
            Length (Char_Table) =
              Length (Char_Table)'Loop_Entry + Char_Tables.Capacity_Range (I));
      end loop;

      --  Construct a Name_Entry and add it to the table.
      declare
         E : constant Name_Entry :=
           (Table_Index => Last_Index (Char_Table) - (S'Length - 1),
            Length      => S'Length,
            Next_Hash   => 0);
      begin
         Append (Entry_Table, E);
      end;
      N := Last_Index (Entry_Table);
   end Merge;

Closing thoughts

Looking at the bodies of Lookup and To_String, you will notice that there is actually not that much annotation, despite the complicated nature of the name table. The main trick to achieve this was to keep the invariant as an expression function.

Also note that you get 100% automatic proof with just CVC4.

In the next blog post I will write about the lexer, focusing on the termination proof.

]]>
Certification and Qualification http://blog.adacore.com/certification-and-qualification Mon, 18 Apr 2016 13:43:00 +0000 AdaCore Admin http://blog.adacore.com/certification-and-qualification

AdaCore provides several tools with certification and qualification capabilities for the rail and avionics industries. Quentin Ochem’s presentation on “Certification and Qualification” at the 2015 AdaCore Tech Days in Boston, Massachusetts provided more information about these two standards, namely DO-178C and EN 50128:2011.

A recent case study from the UK-based industry leader UTC Aerospace Systems shows how they are using CodePeer to facilitate the DO-178B certification of their digital terrain system. Our CodePeer tool was selected because it does not report false negatives, and it also reduces the need to manually review the code thanks to its static analysis features. Ultimately, the code reviews were carried out in the most efficient and effective way. CodePeer has also been qualified as a class T2 tool for data and control flow analysis under EN 50128 for rail systems.

Our latest guide to how AdaCore's many tools and technologies can be used in conjunction with CENELEC EN 50128:2011 is now available to view on our website.

If you want to be at this year’s AdaCore Tech Days, and hear about similar topics, you can join us either on the 21st - 22nd September in Boston, Massachusetts or on the 6th October in Paris, France. More information can be found here.

]]>
Efficient use of Simics for testing http://blog.adacore.com/efficient-use-of-simics-for-testing Wed, 13 Apr 2016 10:05:25 +0000 Jérôme Lambourg http://blog.adacore.com/efficient-use-of-simics-for-testing

As seen in the previous blog article, AdaCore relies heavily on virtualisation to perform the testing of its GNAT Pro products for VxWorks.

This involves roughly 350,000 tests that are run each day, including the 60,000 tests that are run on Simics. A typical test involves the following steps:

  1. Compilation of an example, with switches and sources defined in the test;
  2. Start an emulator, transfer the compiled module or RTP onto it, potentially with some support files;
  3. Run the module or RTP;
  4. Retrieve the output, potentially with files created on the target;
  5. Exit from the emulator.

Compared to an equivalent test run on a native platform, those steps show two challenges:

  1. Feasibility: we need to add the proper support to enable those steps, in particular the file transfer between the host and the emulated target, and do so reliably.
  2. Efficiency: running the emulator and transferring the files back and forth can lead to a significant overhead, incompatible with the time constraints (24h maximum testing time). Some enhanced techniques are thus needed to speed up those critical steps.

Instrumentation overview

In order to accomplish this testing, we need instrumentation, both on the host side (e.g. on the simulator) and on the guest itself (the VxWorks kernel). The requirements are:

  • Be able to transfer files back and forth between guest and host;
  • Be able to retrieve the output;
  • Automatically execute the test scenario on the target.

File transfer

To support transferring files from/to the host, we used the Simics Builder module of Simics. This allowed us to create a dedicated peripheral with which we can interact to transfer files between the host and a RAM drive on the target.

We chose this solution as this can be very efficient (as opposed to transferring the files using a simulated network) while giving us a very high level of flexibility for the various tests.

The simulated peripheral takes the form of a small set of registers and a buffer. The implementation of such a peripheral as a Simics plugin is pretty straightforward. However, some care must be taken for:

  • The size of the individual registers (32-bit or 64-bit, depending on the target system)
  • The endianness of those registers

To copy files back and forth, the device responds to syscalls, with the following functions available:

  • OPEN
  • READ
  • WRITE
  • CLOSE
  • UNLINK
  • LSEEK
  • SHUTDOWN

Any write operation on the first register of the device triggers the syscall.

The three other registers contain the arguments, and their expected content depends on the specific syscall.

Below is the implementation of the system call of the simulated device, provided as an example of Simics plugin implementation:

static REG_TYPE do_syscall(hostfs_device_t *hfs)
{
  REG_TYPE  ID   = hfs->regs[SYSCALL_ID].value;
  REG_TYPE  arg1 = hfs->regs[ARG1].value;
  REG_TYPE  arg2 = hfs->regs[ARG2].value;
  REG_TYPE  arg3 = hfs->regs[ARG3].value;
  REG_TYPE  ret  = 0;
  char     *host_buf  = NULL;
  REG_TYPE guest_buf;
  REG_TYPE len;

  switch (ID)
    {
    case SYSCALL_OPEN:
      guest_buf = arg1;
      len = 1024; /* XXX: maximum length of filename string */
      arg2 = open_flags(arg2);

      host_buf = malloc(len);
      /* Convert guest buffer to host buffer */
      copy_from_target(hfs, guest_buf, -1, (uint8_t *)host_buf);
      ret = open(host_buf, arg2, arg3);
      free(host_buf);

      return ret;
      break;

      /* ... the handling of the other syscalls (READ, WRITE, CLOSE, UNLINK,
         LSEEK, SHUTDOWN) follows the same pattern and is elided here ... */
    }
}

On the VxWorks side, this call is implemented in the kernel:

PHYS_ADDR _to_physical_addr(VIRT_ADDR virtualAddr) {
  PHYS_ADDR physicalAddr;

  vmTranslate(NULL, virtualAddr, &physicalAddr);
  return physicalAddr;
}

#define TO_PHY(addr) _to_physical_addr((VIRT_ADDR)addr)

static uintptr_t
hfs_generic (uintptr_t syscall_id,
             uintptr_t arg1,
             uintptr_t arg2,
             uintptr_t arg3)
{
    uintptr_t *hostfs_register = (uintptr_t *)hostfs_addr();
    if (hostfs_register == 0) return -1;

    hostfs_register[1] = arg1;
    hostfs_register[2] = arg2;
    hostfs_register[3] = arg3;

    /* Write syscall_id to launch syscall */
    hostfs_register[0] = syscall_id;
    return hostfs_register[1];
}

uint32_t hfs_open (const char *pathname, uint32_t flags, uint32_t mode)
{
  VIRT_ADDR tmp = ensure_physical((VIRT_ADDR)pathname, strlen(pathname) + 1);
  VIRT_ADDR buf;

  if (tmp != (VIRT_ADDR)NULL) {
    memcpy ((void*)tmp, pathname, strlen(pathname) + 1);
    buf = tmp;
  } else {
    buf = (VIRT_ADDR)pathname;
  }

  return hfs_generic (HOSTFS_SYSCALL_OPEN, (uintptr_t) TO_PHY(buf), flags,
                        mode);
}

Performance considerations

On a typical server, our target is to run around 6,000 of these tests in less than 45 minutes, which means roughly 2 tests per second.

To achieve this goal, the first thing to do is to maximize the parallelism of the execution of the tests: each test can generally be run independently of the others, and also generally requires a single core. This means that on a server with 16 cores, we should be able to execute 16 tests in parallel. It also means that, to achieve the target of 6,000 tests in 45 minutes overall, each test on such a server should take less than 8 seconds to execute (8 * 6000 / 16 = 3000 seconds of total execution time for the testsuite, i.e. 50 minutes).

Our first experiments with Simics were pretty far from this target: depending on the simulated platform, it could take from 15 seconds to almost a minute just to start VxWorks. When trying to run several Simics instances in parallel, the numbers got even worse, as a lot of server resources were needed to start the simulation.

From what we could see, this was due to the highly configurable and scriptable nature of Simics, where the full simulated environment is built at startup. Such timings were incompatible with our target total execution time.

To address this issue, the Simics engineers pointed us to a very nice feature: The Simics checkpoint. Basically, it’s a mechanism allowing us to save the state of a Simics simulation and to restore it at will.

The restore is very fast.

So what we do now when we build a VxWorks kernel is to also create a generic Simics checkpoint at the point where the VxWorks kernel has just booted. In the Simics script we use, this looks like:

script-branch {
  local $con = $system.console
  $con.wait-for-string "Run_shell"
  stop
  write-configuration "checkpoint" -z
  quit
}

And that’s it. To load the checkpoint in our tests, we simply run:

read-configuration "checkpoint"

which restores the simulation with VxWorks already booted.

This mechanism pre-elaborates the simulation environment, and drastically reduces both the load on the server and the total startup time.

Conclusion

With this testing environment, we can successfully and efficiently test our compiler for VxWorks. As an example, a complete run of the ACATS test suite, containing ~3700 tests (ACATS is the standard test suite for the Ada language), takes roughly 31 minutes on a fast Linux server, which meets our target performance of 2 tests per second.

By using Simics, AdaCore can now quickly put in place an efficient quality assurance infrastructure when introducing a new VxWorks target to be supported by its GNAT Pro product, improving time-to-market and the overall quality of its products.

]]>
VectorCAST/Ada: Ada 2012 Language Support http://blog.adacore.com/experience-vectorcast-ada-a-look-at-the-latest-release Thu, 07 Apr 2016 12:17:00 +0000 AdaCore Admin http://blog.adacore.com/experience-vectorcast-ada-a-look-at-the-latest-release

We are pleased to announce that on April 27th our partner, Vector, will host a webinar to showcase their latest VectorCAST/Ada release!

Michael Frank, Vector Software's Ada Architect, will cover the following topics: Stub by Function, Change-Based Testing, Code Coverage Analysis, and Ada Language Support, which incorporates Ada 83, 95, 2005, and 2012.

Check out Vector's event page for more information on the topics and to sign up to the upcoming webinar!

]]>
Did SPARK 2014 Rethink Formal Methods? http://blog.adacore.com/did-spark-2014-rethink-formal-methods Sun, 03 Apr 2016 04:00:00 +0000 Yannick Moy http://blog.adacore.com/did-spark-2014-rethink-formal-methods

David Parnas is a well-known researcher in formal methods, who famously contributed to the analysis of the shut-down software for the Darlington nuclear power plant and designed the specification method known as Parnas tables and the development method called Software Cost Reduction. In 2010, the magazine CACM asked him to identify what was preventing more widespread adoption of formal methods in industry, and in this article on Really Rethinking Formal Methods he listed 17 areas that needed rethinking. The same year, we started a project to recreate SPARK with new ideas and new technology, which led to SPARK 2014 as it is today. Parnas's article influenced some critical design decisions. Six years later, it's interesting to see how the choices we made in SPARK 2014 address (or not) Parnas's concerns.

IDENTIFIERS AND VARIABLES

"A better way to deal with arrays is needed. In general, it is time to rethink how states should be represented."

Parnas was concerned here that many formal methods cannot deal adequately with the complex state that programs manipulate, with records, arrays, objects, memory, encapsulation, etc. Because contracts in SPARK are really expressions of the programming language, SPARK deals naturally with any data that can be expressed in the program. And for data that cannot be expressed due to encapsulation, SPARK has the notion of abstract state.

CONVENTIONAL EXPRESSIONS OR SOMETHING MORE STRUCTURED

"It is time to look for new forms of expressions that are designed for use with the functions implemented by digital computer programs."

Parnas was concerned that mathematical formulas are not adequate for expressing the piecewise-continuous or discrete valued functions that are used in programs. SPARK contracts can use if-expressions and case-expressions to deal with these non-continuities. SPARK contracts themselves can be expressed using the non-continuous contract cases.
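
As a small made-up illustration (not taken from any particular codebase), here is a piecewise specification written with Contract_Cases and implemented with an if-expression:

package Saturation with SPARK_Mode is

   --  Made-up example: a non-continuous, piecewise specification using
   --  Contract_Cases, implemented with an if-expression.
   function Saturating_Increment (X : Integer) return Integer is
     (if X < Integer'Last then X + 1 else Integer'Last)
   with
     Contract_Cases =>
       (X < Integer'Last => Saturating_Increment'Result = X + 1,
        others           => Saturating_Increment'Result = Integer'Last);

end Saturation;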

HIDDEN STATE: NORMAL OR EXTENSION?

"It is time to consider hidden state to be the normal case and develop methods that deal with it systematically."

In SPARK, abstract state achieves precisely this goal.
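
As a reminder of what this looks like, here is a minimal, made-up example of a package whose hidden state is exposed for analysis as an abstract state named State:

--  Made-up example: the variable Count is hidden in the body, but its
--  existence is made visible to flow analysis via the abstract state.
package Counter with
  SPARK_Mode,
  Abstract_State => State,
  Initializes    => State
is
   procedure Increment with
     Global => (In_Out => State);
end Counter;

package body Counter with
  SPARK_Mode,
  Refined_State => (State => Count)
is
   Count : Natural := 0;

   procedure Increment is
   begin
      if Count < Natural'Last then
         Count := Count + 1;
      end if;
   end Increment;
end Counter;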

TERMINATION: NORMAL OR EXCEPTION?

"Rethinking requires asking if nontermination should be treated as the normal case in a way that lets us treat terminating programs as a special case."

This is the case in SPARK. One can prove termination by forbidding recursion and specifying loop variants for loops that may not terminate.
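
As a made-up illustration, the following loop is proved to terminate because its loop variant strictly decreases at each iteration:

--  Made-up example: pragma Loop_Variant documents and proves that the loop
--  makes progress toward termination (X strictly decreases).
procedure Countdown (N : Natural) with SPARK_Mode is
   X : Natural := N;
begin
   while X > 0 loop
      pragma Loop_Variant (Decreases => X);
      X := X - 1;
   end loop;
end Countdown;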

TIME: A SPECIAL VARIABLE OR ANOTHER VARIABLE?

"Rethinking would require serious consideration of this alternative. We gain simplicity if we do not have to treat time as anything special."

Parnas was advocating that time be modeled by additional regular variables instead of a special logic. He was mostly concerned with the WCET of programs, something that is not easy to do with formal analysis of source code, and is usually done by dynamic analysis of the executable, or sometimes by formal analysis of the object code. Rather than verifying WCET with SPARK, we see an interest in verifying discrete temporal properties with SPARK, such as: "for the past 5 seconds (or 50 activations, when a program is scheduled to run every 100 ms), variable INPUT1 was zero". Such a property can be specified in SPARK using ghost variables to record the past history, and verified using the standard SPARK verification toolset.
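
Here is a minimal sketch of this idea, with made-up names: a ghost counter Zero_Streak records for how many consecutive activations Input1 has been zero, so that the temporal property above reduces to a simple test on that counter:

--  Sketch with hypothetical names: the ghost variable Zero_Streak counts
--  how many consecutive activations saw Input1 equal to zero (saturating
--  at Streak_Max), so "Input1 was zero for the past 50 activations" is
--  simply Zero_Streak = Streak_Max.
package Monitor with SPARK_Mode is

   Streak_Max : constant := 50;

   Zero_Streak : Natural range 0 .. Streak_Max := 0 with Ghost;

   procedure Step (Input1 : Integer) with
     Post => (if Input1 = 0
              then Zero_Streak = Natural'Min (Zero_Streak'Old + 1, Streak_Max)
              else Zero_Streak = 0);

end Monitor;

package body Monitor with SPARK_Mode is

   procedure Step (Input1 : Integer) is
   begin
      --  Ghost assignments: they record history without affecting the
      --  functional behavior of the program.
      if Input1 = 0 then
         Zero_Streak := Natural'Min (Zero_Streak + 1, Streak_Max);
      else
         Zero_Streak := 0;
      end if;
   end Step;

end Monitor;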

SIDE EFFECTS: NORMAL OR BAD?

"It is time to investigate methods that deal with side effects as the normal case."

In SPARK, the distinction between procedures and functions allows us to refine this goal: procedures are allowed to have side-effects by default, while functions should not have side-effects. Hence, expressions have no side-effects, but procedure calls may. And the user need not specify the presence or exact nature of the side-effects, as the tool computes them based on the implementation.

NONDETERMINISM: NORMAL OR EXTENSION?

"Perhaps a formal method should treat nondeterminism as the normal case and deterministic programs as a special case."

In SPARK, we recognized the need for the nondeterminism that programmers express in programs using volatile variables. SPARK distinguishes four cases, which are represented through volatile variables: variables that may be written asynchronously, variables that may be read asynchronously, variables for which writing a value always has an effect (even if the same value is written twice), and variables for which reading a value always has an effect. Using these flavors of volatile variables (and abstract state), it is possible to handle nondeterminism in SPARK at a fine-grained level.
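
These four cases map to the SPARK aspects Async_Writers, Async_Readers, Effective_Writes and Effective_Reads. As a small sketch (with a made-up register address), here is a sensor register that the environment may write asynchronously, together with a procedure that samples it:

with System.Storage_Elements; use System.Storage_Elements;

package Sensor with SPARK_Mode is

   --  The environment may update this register at any time, which SPARK
   --  models as nondeterminism. The address is made up for the example.
   Raw_Value : Integer with
     Volatile,
     Async_Writers,
     Address => To_Address (16#4000_0000#);

   procedure Sample (Value : out Integer) with
     Global => (Input => Raw_Value);

end Sensor;

package body Sensor with SPARK_Mode is

   procedure Sample (Value : out Integer) is
   begin
      Value := Raw_Value;
   end Sample;

end Sensor;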

SPECIFICATIONS: PROGRAMS OR PREDICATES?

"It is time to look for methods that use predicates on observable behavior as specifications."

In SPARK, contracts are precisely predicates over observable behavior.

WHAT CAN BE IGNORED?

"We should be looking for methods that do not ignore the finite limits that are one of the most frequent causes of bugs."

In SPARK, we prove the absence of run-time errors related to finite limits (of integers, arrays, etc.).
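
As a made-up example, the following function is only proved free of integer overflow thanks to its precondition:

package Safe_Add with SPARK_Mode is

   --  Made-up example: the precondition guards against overflow of X + Y,
   --  so GNATprove can prove the overflow check in the expression function.
   function Add (X, Y : Integer) return Integer is (X + Y) with
     Pre => (if Y >= 0 then X <= Integer'Last - Y
             else X >= Integer'First - Y);

end Safe_Add;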

HOW DO WE ESTABLISH CORRESPONDENCE BETWEEN MODEL AND CODE?

"We need techniques for deriving mathematical models from program text."

In SPARK, the model being analyzed is a sound abstraction of the program semantics, as described in SPARK Reference Manual.

MATHEMATICS IN DOCUMENTATION

"Mathematical documentation should be a major research area in formal methods."

In SPARK, a subprogram contract acts as documentation of the subprogram, and is displayed as such in IDEs. In a certification context, a subprogram contract can be used to express the low-level requirements implemented in the subprogram.

PRE- AND POSTCONDITIONS

"It is time to consider abandoning the idea of pre- and postconditions. Instead of two separate conditions, we need a relation between starting state and stopping state."

In SPARK, the postcondition is precisely such a relation between the pre-state and the post-state, where the attribute Old may be used to refer to the value of modified objects in the pre-state.
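
A minimal, made-up example of such a relation between entry and exit states:

--  Made-up example: the postcondition relates the final value of Total to
--  its value on entry, referred to as Total'Old.
package Accounts with SPARK_Mode is

   procedure Add_To_Total (Total : in out Natural; Amount : Natural) with
     Pre  => Total <= Natural'Last - Amount,
     Post => Total = Total'Old + Amount;

end Accounts;

package body Accounts with SPARK_Mode is

   procedure Add_To_Total (Total : in out Natural; Amount : Natural) is
   begin
      Total := Total + Amount;
   end Add_To_Total;

end Accounts;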

CORRECTNESS PROOF OR PROPER CALCULATION?

"Researchers interested in developing practical formal methods should consider the engineering viewpoint; it replaces a vague general question [correctness] with a set of specific ones."

In SPARK, we have identified eight specific objectives that can be achieved with the use of SPARK technology.

TO CONCLUDE...

If you count the items above, you'll see I've excluded four items (of the 17 listed by Parnas) because they deal with choices of technology or terminology that are of lesser interest. For the 13 others, the issues that Parnas deemed relevant in 2010 are still relevant today (or so we think), and SPARK 2014 includes a suitable answer to each. To some extent, these answers were already present in the SPARK technology predating SPARK 2014. The work we've done since Parnas's article has shifted these answers towards more unification (of programming language and specification language) and automation (of generation of contracts and proofs).

]]>
Provably safe programming at Embedded World http://blog.adacore.com/embedded-world-videos Wed, 16 Mar 2016 14:56:00 +0000 AdaCore Admin http://blog.adacore.com/embedded-world-videos

AdaCore continues to build reliable and secure software for embedded software development tools. Last month, we attended Embedded World 2016, one of the largest conferences of its kind in Europe, to present our embedded solutions and our expertise for safety, and mission critical applications in a variety of domains.

Embedded World 2016, for AdaCore, was centered around four major product releases: QGen 2.1, our customizable and qualifiable code generator; GNAT Pro 7.4, which incorporated several new embedded targets, amongst other enhancements; CodePeer, our deep static analysis tool for Ada; and SPARK Pro 16, which provides enhanced coverage of SPARK 2014 language features and more.

Our experts were on hand to talk through the demos running in Ada and SPARK, that were on display and to give further insight into the new product releases.

Day 2 of the conference saw Embedded Computing Executive VP, Rich Nass, interview AdaCore’s Commercial Team Lead, Jamie Ayre, to gain insight into the board demos on display, which were running on iOS, Android, and bare board on ARM Cortex-M4 & M7.

In addition, the run-times start from a zero footprint, depending on the customer’s needs, and in the past these run-times have been used in applications that have gone into space!



Embedded News.TV also spoke to Jamie about the game of Tetris built on an Atmel SMART SAM4S MCU, which was recently featured in a list created by Atmel naming the top 30 projects of 2015 that gamers will love; and this is how we made it!



The train demo also attracted a lot of attention, and, as discussed in the interview, the software elements are not completely obvious at first. In fact, the signalling system for the trains was written in the Ada 2012 and SPARK languages. Since SPARK is able to formally verify properties of the code, this ensures that the trains will not collide at any point, including when the signal points change.



The self-playing 2048 game, which runs on an STM32F469N Discovery board, on display at Embedded World 2016.


The Candy Dispenser with a Twist also runs on an STM32F469 Discovery board with an 800x600 LCD.

]]>
CubeSat continues to orbit the Earth thanks to Ada & SPARK! http://blog.adacore.com/carl-brandon-vid Thu, 10 Mar 2016 14:52:08 +0000 AdaCore Admin http://blog.adacore.com/carl-brandon-vid

Dr Carl Brandon of Vermont Technical College and his team of students used SPARK and Ada to successfully launch a satellite into space in 2013, and it has continued to orbit the Earth ever since! At our AdaCore Tech Days in Boston last year, Dr Brandon explained further.

The CubeSat was launched along with several other satellites as part of NASA’s 2010 CubeSat Launch Initiative (ELaNa), which supports the educational launch of micro-satellites.

Vermont Technical College’s satellite was the only one from that launch to remain in orbit.

But why?

The answer is simple: their satellite was programmed differently from the others, as it was built using Ada and SPARK code as opposed to C. This allowed them to ensure a successful launch and orbit. Ada code has only about 10% of the error rate of C code, and the SPARK 2005 toolset (the most advanced at the time of satellite production) brings that down to only 1% of the C error rate. More information can be found in a recent article comparing the two languages.

The team is now programming a new satellite, the Lunar Ice Cube, using the updated SPARK 2014 language, and we hope to see it in space in the not-too-distant future.

Dr Brandon said that his students became competent SPARK users in only a few weeks, and you can too with our free AdaCore U SPARK 2014 online courses!



If you want to be at this year’s AdaCore Tech Days, you can join us either on the 21st - 22nd of September in Boston, Massachusetts, or on the 6th of October in Paris, France. More information can be found here.


]]>
Formal Verification of Legacy Code http://blog.adacore.com/formal-verification-of-legacy-code Thu, 10 Mar 2016 14:48:00 +0000 Yannick Moy http://blog.adacore.com/formal-verification-of-legacy-code

Just a few weeks ago, one of our partners reported a strange behavior of the well-known function Ada.Text_IO.Get_Line, which reads a line of text from an input file. When the last line of the file was of a specific length like 499 or 500 or 1000, and not terminated with a newline character, then Get_Line raised an exception End_Error instead of returning the expected string. That was puzzling for a central piece of code known to have worked for the past 10 years! But fair enough, there was indeed a bug in the interaction between subprograms in this code, in boundary cases having to do with the size of an intermediate buffer. My colleague Ed Schonberg, who fixed the code of Get_Line, nonetheless had the intuition that this particular event, finding such a bug in an otherwise trusted legacy piece of code, deserved a more in-depth investigation to ensure no other bugs were hiding. So he challenged the SPARK team at AdaCore to check the correctness of the patched version. He did well: in the process, we uncovered 3 more bugs.

Identifying the Preconditions

Initially, we extracted the Get_Line function and related subprograms in order to analyze them more easily with our various static analysis tools. Running CodePeer on the code detected a possible range check failure which, upon manual analysis, corresponded to code made unreachable by the recent fix. So we removed that dead code. In order to run SPARK on the code, we started by identifying the preconditions of the various subprograms being called. A critical precondition (leading to raising exception End_Error in the original problem reported by our partner) is that procedure Get_Line is not supposed to be called on an empty file. The same is true for the local function Get_Rest used in the function Get_Line. So we added an explicit precondition in both cases:

   procedure Get_Line
     (File : in out File_Type;
      Item : out String;
      Last : out Natural)
   with
     Pre => not End_Of_File (File);

   function Get_Rest (S : String) return String with
     Pre => not End_Of_File (File);

We then ran GNATprove on the code, and it showed places where it could not prove that the precondition was satisfied... which made us discover that the initial fix was incomplete! The same fix was needed in another place in the code to ensure that the precondition above could always be satisfied.

Specifying the Functional Behavior of Get_Line

To go beyond that initial attempt, we had to transform the code to hide low-level address manipulations under SPARK subprograms with appropriate contracts, in order to show that the code calling these subprograms was achieving the desired functionality. For example, the low-level memcpy function from libc was originally imported directly:
   procedure memcpy (s1, s2 : chars; n : size_t);
   pragma Import (C, memcpy);

and called in the code of Get_Line:

   memcpy (Item (Last + 1)'Address,
           Buf (1)'Address, size_t (N - 1));

The code above does not allow writing a suitable contract for memcpy. So we hid the low-level memcpy under a typed version:

   procedure Memcpy
     (S1 : in out String;
      S2 : String;
      From1, From2 : Positive;
      N : Natural)
   is
      pragma SPARK_Mode (Off);
      procedure memcpy (s1, s2 : chars; n : size_t);
      pragma Import (C, memcpy);
   begin
      memcpy (S1 (From1)'Address, S2 (From2)'Address, size_t (N));
   end Memcpy;

The call to memcpy above now passes in the typed variables instead of their addresses:

   Memcpy (S1 => Item, From1 => Last + 1,
           S2 => Buf, From2 => 1, N => N - 1);

This allows us to add a suitable functional contract to the typed version of Memcpy, stating that when N is positive, N characters are copied from string S2 at position From2 to string S1 at position From1:

   procedure Memcpy
     (S1 : in out String;
      S2 : String;
      From1, From2 : Positive;
      N : Natural)
   with
     Pre =>
       (if N /= 0 then
          From1 in S1'Range and then
          From1 + N - 1 in S1'Range and then
          From2 in S2'Range and then
          From2 + N - 1 in S2'Range),
     Contract_Cases =>
       (N = 0  => S1 = S1'Old,
        N /= 0 => (for all Idx in S1'Range =>
                     (if Idx in From1 .. From1 + N - 1 then S1 (Idx) = S2 (Idx - From1 + From2)
                      else S1 (Idx) = S1'Old (Idx))));

Note that we also added a suitable precondition to protect against possible buffer overflows, so that proof with SPARK would allow us to guarantee that no such buffer overflow can ever happen. The above contract is a simple one that could be expressed directly in terms of the subprogram parameters. Subprograms that deal with the file system cannot be directly specified in this way. So instead we defined a ghost variable The_File to represent the file being read, and a ghost variable Cur_Position to represent the current position being read in this file:

   The_File     : String (Positive) with Ghost;
   Cur_Position : Positive := 1 with Ghost;

We could then define the functional contract of the low-level functions interacting with the file system in terms of their effect on these ghost variables. For example, here is the contract of procedure Fgetc which reads a character from a file (we used a procedure instead of a function in order to be able to update ghost variable Cur_Position in the call, something not allowed in SPARK functions):

   procedure Fgetc (Stream : File_Descr; Result : out Int) with
     Global => (Proof_In => The_File, In_Out => Cur_Position),
     Post => (if The_File (Cur_Position'Old) = EOF_Ch then
                Cur_Position = Cur_Position'Old and then
                Result = EOF
              elsif The_File (Cur_Position'Old) = ASCII.LF then
                Cur_Position = Cur_Position'Old and then
                Result = Character'Pos (ASCII.LF)
              else
                Cur_Position = Cur_Position'Old + 1 and then
                Result = Character'Pos (The_File (Cur_Position'Old)));

It says that Cur_Position is incremented unless a newline or end-of-file is reached, and that the Result integer returned is the special EOF value for end-of-file, or the position ('Pos) of the character read otherwise. The most involved contract is the one of Fgets, which gets multiple characters at a time. Here is what the documentation for function fgets says:

The fgets() function reads at most one less than the number of characters specified by size from the given stream and stores them in the string str. Reading stops when a newline character is found, at end-of-file or error. The newline, if any, is retained. If any characters are read and there is no error, a `\0' character is appended to end the string.

Upon successful completion, fgets() returns a pointer to the string. If end-of-file occurs before any characters are read, it returns NULL and the buffer contents remain unchanged. If an error occurs, it returns NULL and the buffer contents are indeterminate.

We transformed it into a procedure in SPARK, so that it can update ghost variable Cur_Position. Here is the contract we wrote to express the informal specification above:

   procedure Fgets
     (Strng   : in out String;
      N       : Natural;
      Stream  : File_Descr;
      Success : out Boolean)
   with
     Global => (Proof_In => The_File, In_Out => Cur_Position),
     Post => (if Success then

                --  Success means no error and no empty file

                (Ferror (Stream) = 0 and then Fpeek (Stream)'Old /= EOF) and then

                --  Case 1: no EOF nor newline character found

                --  N-1 characters are copied to Strng. Nul character is appended.
                --  Previous characters in Strng are preserved beyond the Nth character.
                --  Cur_Position advances N-1 characters.

                (if No_Char_In_Slice (ASCII.LF, Cur_Position'Old, Cur_Position'Old + N - 2)
                      and then
                    No_Char_In_Slice (EOF_Ch, Cur_Position'Old, Cur_Position'Old + N - 2)
                 then
                    Cur_Position = Cur_Position'Old + N - 1 and then
                    (for all Idx in 1 .. N - 1 =>
                       Strng (Idx) = The_File (Cur_Position'Old + Idx - 1)) and then
                    Strng (N) = ASCII.NUL and then
                    (for all Idx in N + 1 .. Strng'Last =>
                       Strng (Idx) = Strng'Old (Idx))

                 --  Case 2: newline character is found

                 --  Characters up to the newline are copied to Strng. Nul character is
                 --  appended. Previous characters in Strng are preserved beyond the nul
                 --  character. Cur_Position advances to the position of the newline
                 --  character.

                 elsif Has_Char_In_Slice (ASCII.LF, Cur_Position'Old, Cur_Position'Old + N - 2)
                         and then
                       No_Char_In_Slice (EOF_Ch, Cur_Position'Old, Cur_Position'Old + N - 2)
                 then
                    declare LF_Pos = Find_Char_In_Slice (ASCII.LF, Cur_Position'Old, Cur_Position'Old + N - 2) in
                      Cur_Position = LF_Pos and then
                      (for all Idx in Cur_Position'Old .. LF_Pos - 1 =>
                         Strng (Idx - Cur_Position'Old + 1) = The_File (Idx)) and then
                      Strng (LF_Pos - Cur_Position'Old + 1) = ASCII.LF and then
                      Strng (LF_Pos - Cur_Position'Old + 2) = ASCII.NUL and then
                      (for all Idx in LF_Pos - Cur_Position'Old + 3 .. Strng'Last =>
                         Strng (Idx) = Strng'Old (Idx))

                 --  Case 3: EOF is found

                 --  Characters prior to EOF are copied to Strng. Nul character is
                 --  appended. Previous characters in Strng are preserved beyond the nul
                 --  character. Cur_Position advances to the position of EOF.

                 elsif No_Char_In_Slice (ASCII.LF, Cur_Position'Old, Cur_Position'Old + N - 2)
                         and then
                       Has_Char_In_Slice (EOF_Ch, Cur_Position'Old, Cur_Position'Old + N - 2)
                 then
                    declare EOF_Pos = Find_Char_In_Slice (EOF_Ch, Cur_Position'Old, Cur_Position'Old + N - 2) in
                      Cur_Position = EOF_Pos and then
                      (for all Idx in Cur_Position'Old .. EOF_Pos - 1 =>
                         Strng (Idx - Cur_Position'Old + 1) = The_File (Idx)) and then
                      Strng (EOF_Pos - Cur_Position'Old + 1) = ASCII.NUL and then
                      (for all Idx in EOF_Pos - Cur_Position'Old + 2 .. Strng'Last =>
                         Strng (Idx) = Strng'Old (Idx)) and then
                      --  redundant proposition to help automatic provers
                      No_Char_In_String (Strng, ASCII.LF, EOF_Pos - Cur_Position'Old + 1)

                 --  Case 4: both newline and EOF appear

                 --  In our model, we choose that this cannot occur. So we consider only
                 --  cases where EOF or newline character are repeated after the first
                 --  occurrence in the file.

                 else False)

                --  Failure corresponds to those cases where low-level fgets
                --  returns a NULL pointer: an error was issued, or the file is
                --  empty. In the last case Cur_Position is not modified.

                else
                  (Ferror (Stream) /= 0 or else
                  (Fpeek (Stream)'Old = EOF and then Cur_Position = Cur_Position'Old)));

Wow! Not so bad for a simple low-level function everyone uses without paying much attention. But the contract is actually quite straightforward, if you care to look at it in detail. If you do so, you'll realize I've displayed it here using a declare-var-in-expr construct that is not yet available in Ada, but will hopefully be in the next version of the language. It comes in handy to make the contract readable here, so I'm using it just for this blog post (not in the actual code).
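
In today's Ada, one way to get a similar effect is to factor a case into a ghost expression function that receives the newline position as a parameter, and to pass it Find_Char_In_Slice (...) at the point of use. The following is only a sketch with illustrative names (Newline_Case, Strng_Old, Old_Pos), not the actual GNAT code; the conjunct Cur_Position = LF_Pos would stay inline in the postcondition, since it does not need the binding.

   --  Illustrative only: a ghost helper that expresses case 2 of the
   --  contract, with the newline position passed in as a parameter
   --  instead of being bound by a declare-expression.

   function Newline_Case
     (Strng, Strng_Old : String;
      Old_Pos, LF_Pos  : Positive) return Boolean
   is
     ((for all Idx in Old_Pos .. LF_Pos - 1 =>
         Strng (Idx - Old_Pos + 1) = The_File (Idx))
       and then Strng (LF_Pos - Old_Pos + 1) = ASCII.LF
       and then Strng (LF_Pos - Old_Pos + 2) = ASCII.NUL
       and then (for all Idx in LF_Pos - Old_Pos + 3 .. Strng'Last =>
                   Strng (Idx) = Strng_Old (Idx)))
   with Ghost;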


What's left? Well, we should write the contract we'd like to verify for Get_Line:

   procedure Get_Line
     (File : in out File_Type;
      Item : out String;
      Last : out Natural)
   with
     Global => (Input => (EOF, EOF_Ch, The_File), In_Out => (Cur_Position)),

     --  It is an error to call Get_Line on an empty file

     Pre  => not End_Of_File (File),

     Contract_Cases =>

       --  If Item is empty, return Item'First - 1 in Last

       (Item'Last < Item'First =>
          Last = Item'First - 1,

        --  Otherwise, return in Last the length of the string copied, which
        --  may be filling Item, or up to EOF or newline character.

        others =>
          (Last = Item'First - 1 or Last in Item'Range) and then
          (for all Idx in Item'First .. Last =>
             Item (Idx) = The_File (Idx - Item'First + Cur_Position'Old)) and then
          Cur_Position = Cur_Position'Old + Last - Item'First + 1 and then
          (Last = Item'Last or else
           The_File (Cur_Position) = EOF_Ch or else
           The_File (Cur_Position) = ASCII.LF));

And now, let's run GNATprove! I'm taking a bit of a shortcut here, as there are other intermediate subprograms that need contracts, and loops that require loop invariants, for the proof to be performed automatically by GNATprove. But in the process of interacting with GNATprove to get 100% proof, we discovered two other bugs in the implementation of our trusted Get_Line!

  1. One test "K + 2 > Buf'Last" was incorrect, and needed to be fixed as "K + 2 > N". This caused an incorrect value to be returned for the number of characters read in some cases.
  2. The case where Item is empty was incorrectly handled, so that Last was left uninitialized instead of being set to Item'First - 1 prior to returning.
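
To give an idea of the loop invariants mentioned above, here is a small self-contained sketch, not taken from the Get_Line sources: a copy-until-newline loop whose invariant relates the prefix of the destination filled so far to the corresponding slice of the source. This is the shape of property GNATprove needs to carry through the Get_Line loops.

   --  Illustrative sketch only (not the Get_Line body): copy characters
   --  from Source to Item up to the first newline, keeping in Last the
   --  index of the last character copied.

   procedure Copy_Until_Newline
     (Source : String;
      Item   : out String;
      Last   : out Natural)
   with
     SPARK_Mode,
     Pre  => Item'First = Source'First and then Item'Length <= Source'Length,
     Post => Last in Item'First - 1 .. Item'Last and then
             (for all Idx in Item'First .. Last => Item (Idx) = Source (Idx))
   is
   begin
      Item := (others => ' ');
      Last := Item'First - 1;
      for Idx in Item'Range loop
         exit when Source (Idx) = ASCII.LF;
         Item (Idx) := Source (Idx);
         Last := Idx;

         --  The invariant ties the prefix of Item filled so far to the
         --  corresponding slice of Source; it is what lets GNATprove
         --  conclude the postcondition once the loop is left.

         pragma Loop_Invariant (Last = Idx);
         pragma Loop_Invariant
           (for all J in Item'First .. Last => Item (J) = Source (J));
      end loop;
   end Copy_Until_Newline;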

Epilogue

After fixing the two bugs mentioned above, we were able to formally prove with SPARK that Get_Line implements its contract. Well, a slightly simplified version of Get_Line, where we removed the logic around page marks. As an additional verification, we also wrote implementations of the low-level functions memcpy, memset, fgets, etc. in terms of their effect on the model formed by ghost variables The_File and Cur_Position. This allowed us to test the not-so-simple contracts on imported low-level functions, to gain better confidence in their correctness.
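
For instance, a test-only body for Fgetc can compute its result directly from the model. This is a sketch under assumptions, not our actual test harness: since SPARK does not let a non-ghost result depend on ghost variables, such a body belongs in a test variant where The_File and Cur_Position are ordinary variables, or in code marked SPARK_Mode Off.

   --  Sketch of a model-based test body for Fgetc, assuming a test build
   --  where The_File and Cur_Position are regular (non-ghost) variables.
   --  It mirrors the contract shown earlier: return EOF at end of file,
   --  do not advance on newline or EOF, otherwise return the character
   --  and advance the position.

   procedure Fgetc (Stream : File_Descr; Result : out Int) is
      pragma Unreferenced (Stream);
      C : constant Character := The_File (Cur_Position);
   begin
      if C = EOF_Ch then
         Result := EOF;
      elsif C = ASCII.LF then
         Result := Character'Pos (ASCII.LF);
      else
         Result := Character'Pos (C);
         Cur_Position := Cur_Position + 1;
      end if;
   end Fgetc;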

Overall, we started with a trusted piece of code that had worked for many years, and we ended up fixing 3 bugs in it. The initial bug had been present since the first implementation of the function (not the procedure) Get_Line in 2005. The last two bugs were introduced by a faulty change in 2010, when the implementation of Get_Line was changed to use fgets instead of fgetc to speed up the function. Interestingly, the comment for this patch said "No change in behavior and good coverage already, so no test case required." And indeed, it's unlikely that testing would have detected these bugs. Manual review by compiler experts did not find them either. Only formal verification with SPARK allowed us to quickly pinpoint the places where the logic was subtly flawed.

[Cover image by Rama - Own work, CC BY-SA 2.0 fr, https://commons.wikimedia.org/w/index.php?curid=28...]

]]>
SPARK Prez at New Conference on Railway Systems http://blog.adacore.com/spark-prez-at-new-conference-on-railway-systems Fri, 04 Mar 2016 05:00:00 +0000 Yannick Moy http://blog.adacore.com/spark-prez-at-new-conference-on-railway-systems

RSSR is a new conference focused on the development and verification of railway systems. We will present there how SPARK can be used to write abstract software specifications, whose refinement into a concrete implementation can be proved automatically using the SPARK tools.

More precisely, we used the new functional containers that will be part of the next release of SPARK to specify the behavior of memory allocators, which are implemented in terms of arrays. We use ghost code to define the link between abstract specification and concrete implementation, in two different ways:

  1. In the simpler case, a ghost function provides an abstract view of the concrete implementation.
  2. In the more complex case, a ghost variable is defined to evolve the model together with the concrete data.

This interesting experiment showed us that the kind of refinement proofs that are usual in the B method, for example, are also applicable in SPARK, and that complete automation of proof is achievable in such cases too.
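
To give a flavor of the simpler case, here is a minimal sketch (with illustrative names and types, not the code from the paper) of an array-based allocator whose abstract view is provided by a ghost function used in its contracts:

   --  Illustrative only: the concrete data is an array of statuses, and
   --  the ghost function Is_Allocated gives the abstract view that the
   --  contracts talk about.

   package Simple_Allocator with SPARK_Mode is

      Capacity : constant := 100;
      type Resource is range 1 .. Capacity;

      type Status is (Available, Allocated);
      type Status_Array is array (Resource) of Status;

      Data : Status_Array := (others => Available);

      --  Abstract view of the concrete array, for use in specifications

      function Is_Allocated (R : Resource) return Boolean is
        (Data (R) = Allocated)
      with Ghost;

      procedure Alloc (R : out Resource; Success : out Boolean) with
        Post => (if Success then Is_Allocated (R));

   end Simple_Allocator;

   package body Simple_Allocator with SPARK_Mode is

      procedure Alloc (R : out Resource; Success : out Boolean) is
      begin
         for Candidate in Resource loop
            if Data (Candidate) = Available then
               Data (Candidate) := Allocated;
               R       := Candidate;
               Success := True;
               return;
            end if;
         end loop;
         R       := Resource'First;
         Success := False;
      end Alloc;

   end Simple_Allocator;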

The conference takes place in Paris from June 28 to June 30. For the complete conference program, see here. The full article we wrote is attached.

Attachments

]]>
Make with Ada: Candy dispenser, with a twist... http://blog.adacore.com/make-with-ada-candy-dispenser-with-twist Thu, 03 Mar 2016 15:40:00 +0000 Fabien Chouteau http://blog.adacore.com/make-with-ada-candy-dispenser-with-twist

A few months ago, my colleague Rebecca installed a candy dispenser in our kitchen here at AdaCore. I don’t remember how exactly, but I was challenged to make it more… fun.

So my idea is to add a touch screen on which people have to answer questions about Ada or AdaCore’s technology, and to modify the dispenser so that people only get candy when they give the right answer. (Evil, isn’t it?)

For this, I will use the great STM32F469 Discovery board with an 800x600 LCD and capacitive touch screen. But before that, I have to hack the candy dispenser...

Hacking the dispenser

The first thing to do is to hack the candy dispenser to be able to control the candy delivery system.

The candy dispenser is made of:

  • A container for the candies
  • A motor that turns a worm gear pushing the candies out of the machine
  • An infrared proximity sensor which detects the user’s hand

My goal is to find the signal that commands the motor and insert my system in the middle of it. When the dispenser tries to turn on the motor, it means a hand has been detected, and I can decide whether or not I actually want to turn the motor on.

To find the signal controlling the motor, I started by looking at the wire going to the motor and where it lands on the board. It is connected to the center leg of a “big” transistor. Another leg of the transistor is tied to ground, so the third one must be the signal I am looking for. By following the trace, I see that this signal is connected to an 8-pin IC: it must be the microcontroller driving the machine.

At this point, I have enough info to hack the dispenser, but I want to understand how the detection works and see what the signal is going to look like. So I hook up my logic analyser to each pin of the IC and start recording.

Here are the 3 interesting signals: infrared LED, infrared sensor, and motor control.

The detection works in 3 phases:

  • Wait: The microcontroller turns on the infrared LED (signal 01) for only 0.2 milliseconds every 100 ms. This is to save power, as the machine is designed to be battery powered.
  • Detection: When something is detected, the MCU then turns on the infrared LED 10 times to confirm that there’s actually something in front of the sensor.
  • Delivery: The motor is turned on for a maximum of 300 ms. During that period, the MCU checks every 20 ms to see whether the hand is still in front of the sensor (signal 03).

This confirms that the signal controlling the motor is where I want to insert my project. This way, I let the dispenser’s MCU handle the detection and the filtering of false positives.

I scratched the solder mask, cut the trace, and soldered a wire to each side. The green wire is now my detection signal (high when a hand is detected) and the white wire is my motor command.

I ran those two wires with an additional ground wire down to the base of the dispenser and then outside to connect them to the STM32F469 discovery board.

Software

The software side is not that complicated: I use the GPL release of GNAT and the Ravenscar run-time for ARM Cortex-M on the STM32F469. The green wire coming from the dispenser is connected to a GPIO that triggers an interrupt, so whenever the software gets an interrupt, it means that there’s a hand in front of the sensor. And using another GPIO, I can turn the dispenser motor on or off.
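
Here is a minimal Ravenscar sketch of that idea, not the actual AMCQ code: the interrupt name and the GPIO write are assumptions standing in for the real runtime name and driver call on the STM32F469.

   --  Sketch only. Assumptions: the sensor wire raises an EXTI interrupt
   --  that the runtime exposes as Ada.Interrupts.Names.EXTI9_5_Interrupt,
   --  and Set_Motor stands in for the real GPIO driver call.

   with Ada.Interrupts.Names;
   with Ada.Synchronous_Task_Control; use Ada.Synchronous_Task_Control;

   package Dispenser_Control is

      Hand_Detected : Suspension_Object;

      procedure Set_Motor (On : Boolean);
      --  Placeholder: the real code drives the motor command GPIO here

   private

      protected Sensor is
         pragma Interrupt_Priority;
      private
         procedure Handler;
         pragma Attach_Handler
           (Handler, Ada.Interrupts.Names.EXTI9_5_Interrupt);
      end Sensor;

   end Dispenser_Control;

   package body Dispenser_Control is

      Motor_On : Boolean := False with Volatile;

      procedure Set_Motor (On : Boolean) is
      begin
         --  In the real project this writes the GPIO connected to the
         --  white "motor command" wire; here we only record the request.
         Motor_On := On;
      end Set_Motor;

      protected body Sensor is
         procedure Handler is
         begin
            --  Wake up whoever is waiting for a hand in front of the sensor
            Set_True (Hand_Detected);
         end Handler;
      end Sensor;

   end Dispenser_Control;

A delivery task can then call Suspend_Until_True (Hand_Detected) and, once the right answer has been given, call Set_Motor (True) for the duration of the delivery.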

For the GUI, I used one of my toy projects called Giza: a simple graphical toolkit for basic touch-screen user interfaces. The interface has two windows. The first window shows the question, with one button for each of the possible answers. When the player clicks on the wrong answer, another question is displayed after a few seconds; when it’s the right answer, the delivery window is shown. On the delivery window, the player can abort using the “No, thanks” button or put his hand under the sensor and get his candies!
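
The question/answer flow itself boils down to a small data type and a decision function. Here is an illustrative sketch (types and names are mine, not the actual AMCQ sources):

   --  Illustrative only: a question with up to four possible answers,
   --  only one of which leads to the candy delivery window.

   type Answer_Index is range 1 .. 4;
   type Answer_Array is array (Answer_Index) of String (1 .. 32);

   type Question is record
      Text    : String (1 .. 64);
      Answers : Answer_Array;
      Correct : Answer_Index;
   end record;

   type Window_Kind is (Question_Window, Delivery_Window);

   --  Wrong answer: show another question; right answer: show delivery

   function Next_Window
     (Q : Question; Choice : Answer_Index) return Window_Kind
   is
     (if Choice = Q.Correct then Delivery_Window else Question_Window);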

The code is available on GitHub here: https://github.com/Fabien-Chouteau/AMCQ

Now let’s see what happens when I put the candy dispenser back in the kitchen:

]]>